TY - GEN
T1 - Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks
AU - Eyben, Florian
AU - Petridis, Stavros
AU - Schuller, Björn
AU - Tzimiropoulos, George
AU - Zafeiriou, Stefanos
AU - Pantic, Maja
PY - 2011
Y1 - 2011
AB - We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach using Long Short-Term Memory (LSTM) Recurrent Neural Networks, which are highly successful dynamic sequence classifiers. The evaluation database is this year's Paralinguistic Challenge's Audiovisual Interest Corpus of human-to-human natural conversation. For video-based analysis, we compare shape-based and appearance-based features. These are combined with typical audio descriptors by early (feature-level) fusion. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
KW - Audio-visual Processing
KW - Laughter
KW - Long Short-Term Memory
KW - Non-linguistic Vocalisations
UR - http://www.scopus.com/inward/record.url?scp=80051631670&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947690
DO - 10.1109/ICASSP.2011.5947690
M3 - Conference contribution
AN - SCOPUS:80051631670
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5844
EP - 5847
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -