TY - GEN
T1 - Speaker trait characterization in web videos
T2 - 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
AU - Weninger, Felix
AU - Wagner, Claudia
AU - Wöllmer, Martin
AU - Schuller, Björn
AU - Morency, Louis-Philippe
PY - 2013/10/18
Y1 - 2013/10/18
N2 - We present a multi-modal approach to speaker characterization using acoustic, visual and linguistic features. Full realism is provided by evaluation on a database of real-life web videos and automatic feature extraction including face and eye detection, and automatic speech recognition. Different segmentations are evaluated for the audio and video streams, and the statistical relevance of Linguistic Inquiry and Word Count (LIWC) features is confirmed. Overall, late multimodal fusion delivers 73%, 92% and 73% average recall in binary age, gender and race classification, respectively, on unseen test subjects, outperforming the best single modalities for age and race.
AB - We present a multi-modal approach to speaker characterization using acoustic, visual and linguistic features. Full realism is provided by evaluation on a database of real-life web videos and automatic feature extraction including face and eye detection, and automatic speech recognition. Different segmentations are evaluated for the audio and video streams, and the statistical relevance of Linguistic Inquiry and Word Count (LIWC) features is confirmed. Overall, late multimodal fusion delivers 73%, 92% and 73% average recall in binary age, gender and race classification, respectively, on unseen test subjects, outperforming the best single modalities for age and race.
KW - computational paralinguistics
KW - multi-modal fusion
KW - speaker classification
UR - http://www.scopus.com/inward/record.url?scp=84890532851&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2013.6638338
DO - 10.1109/ICASSP.2013.6638338
M3 - Conference contribution
AN - SCOPUS:84890532851
SN - 9781479903566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3647
EP - 3651
BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Y2 - 26 May 2013 through 31 May 2013
ER -