TY - JOUR
T1 - New Avenues in Audio Intelligence
T2 - Towards Holistic Real-life Audio Understanding
AU - Schuller, Björn
AU - Baird, Alice
AU - Gebhard, Alexander
AU - Amiriparian, Shahin
AU - Keren, Gil
AU - Schmitt, Maximilian
AU - Cummins, Nicholas
N1 - Publisher Copyright:
© The Author(s) 2021.
PY - 2021
Y1 - 2021
N2 - Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities, which more appropriately mimic human-like understanding. Within an audio scene, a human listener is quickly able to interpret layers of sound at a single time-point, with each layer varying in characteristics such as location, state, and trait. Currently, integrated machine listening approaches, on the other hand, will mainly recognise only single events. In this context, this contribution aims to provide key insights and approaches, which can be applied in computer audition to achieve the goal of a more holistic intelligent understanding system, as well as identifying challenges in reaching this goal. We firstly summarise the state-of-the-art in traditional signal-processing-based audio pre-processing and feature representation, as well as automated learning such as by deep neural networks. This concerns, in particular, audio interpretation, decomposition, understanding, as well as ontologisation. We then present an agent-based approach for integrating these concepts as a holistic audio understanding system. Based on this, concluding, avenues are given towards reaching the ambitious goal of ‘holistic human-parity’ machine listening abilities.
AB - Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities, which more appropriately mimic human-like understanding. Within an audio scene, a human listener is quickly able to interpret layers of sound at a single time-point, with each layer varying in characteristics such as location, state, and trait. Currently, integrated machine listening approaches, on the other hand, will mainly recognise only single events. In this context, this contribution aims to provide key insights and approaches, which can be applied in computer audition to achieve the goal of a more holistic intelligent understanding system, as well as identifying challenges in reaching this goal. We firstly summarise the state-of-the-art in traditional signal-processing-based audio pre-processing and feature representation, as well as automated learning such as by deep neural networks. This concerns, in particular, audio interpretation, decomposition, understanding, as well as ontologisation. We then present an agent-based approach for integrating these concepts as a holistic audio understanding system. Based on this, concluding, avenues are given towards reaching the ambitious goal of ‘holistic human-parity’ machine listening abilities.
KW - audio intelligence
KW - computer audition
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85118868204&partnerID=8YFLogxK
U2 - 10.1177/23312165211046135
DO - 10.1177/23312165211046135
M3 - Article
C2 - 34751066
AN - SCOPUS:85118868204
SN - 2331-2165
VL - 25
JO - Trends in Hearing
JF - Trends in Hearing
ER -