TY - GEN
T1 - Spectral and cepstral audio noise reduction techniques in speech emotion recognition
AU - Pohjalainen, Jouni
AU - Ringeval, Fabien
AU - Zhang, Zixing
AU - Schuller, Björn
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/10/1
Y1 - 2016/10/1
AB - Signal noise reduction can improve the performance of machine learning systems dealing with time signals such as audio. Real-life applicability of these recognition technologies requires the system to uphold its performance level in variable, challenging conditions such as noisy environments. In this contribution, we investigate audio signal denoising methods in cepstral and log-spectral domains and compare them with common implementations of standard techniques. The different approaches are first compared generally using averaged acoustic distance metrics. They are then applied to automatic recognition of spontaneous and natural emotions under simulated smartphone-recorded noisy conditions. Emotion recognition is implemented as support vector regression for continuous-valued prediction of arousal and valence on a realistic multimodal database. In the experiments, the proposed methods are found to generally outperform standard noise reduction algorithms.
KW - Denoising
KW - Noise reduction
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=84994652671&partnerID=8YFLogxK
DO - 10.1145/2964284.2967306
M3 - Conference contribution
AN - SCOPUS:84994652671
T3 - MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
SP - 670
EP - 674
BT - MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 24th ACM Multimedia Conference, MM 2016
Y2 - 15 October 2016 through 19 October 2016
ER -