TY - GEN
T1 - Temporal and situational context modeling for improved dominance recognition in meetings
AU - Wöllmer, Martin
AU - Eyben, Florian
AU - Schuller, Björn
AU - Rigoll, Gerhard
PY - 2012
Y1 - 2012
N2 - We present and evaluate a novel approach towards automatically detecting a speaker's level of dominance in a meeting scenario. Since previous studies reveal that audio appears to be the most important modality for dominance recognition, we focus on the analysis of the speech signals recorded in multiparty meetings. Unlike recently published techniques which concentrate on frame-level hidden Markov modeling, we propose a recognition framework operating on segmental data and investigate context modeling on three different levels to explore possible performance gains. First, we apply a set of statistical functionals to capture large-scale feature-level context within a speech segment. Second, we consider bidirectional Long Short-Term Memory recurrent neural networks for long-range temporal context modeling between segments. Finally, we evaluate the benefit of situational context incorporation by simultaneously modeling speech of all meeting participants. Overall, our approach leads to a remarkable increase of recognition accuracy when compared to hidden Markov modeling.
AB - We present and evaluate a novel approach towards automatically detecting a speaker's level of dominance in a meeting scenario. Since previous studies reveal that audio appears to be the most important modality for dominance recognition, we focus on the analysis of the speech signals recorded in multiparty meetings. Unlike recently published techniques which concentrate on frame-level hidden Markov modeling, we propose a recognition framework operating on segmental data and investigate context modeling on three different levels to explore possible performance gains. First, we apply a set of statistical functionals to capture large-scale feature-level context within a speech segment. Second, we consider bidirectional Long Short-Term Memory recurrent neural networks for long-range temporal context modeling between segments. Finally, we evaluate the benefit of situational context incorporation by simultaneously modeling speech of all meeting participants. Overall, our approach leads to a remarkable increase of recognition accuracy when compared to hidden Markov modeling.
KW - Audio feature extraction
KW - Dominance recognition
KW - Long Short-Term Memory
KW - Meeting analysis
UR - http://www.scopus.com/inward/record.url?scp=84878413746&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878413746
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 350
EP - 353
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -