TY - GEN
T1 - Dominance detection in a reverberated acoustic scenario
AU - Principi, Emanuele
AU - Rotili, Rudy
AU - Wöllmer, Martin
AU - Squartini, Stefano
AU - Schuller, Björn
PY - 2012
Y1 - 2012
N2 - This work proposes a dominance detection framework operating in reverberant environments. The framework is composed of a speech enhancement front-end, which automatically reduces the distortions introduced by room reverberation in the speech signals, and a dominance detector, which processes the enhanced signals and estimates the most and least dominant person in a segment. The front-end is composed of three cooperating blocks: speaker diarization, room impulse response identification and speech dereverberation. The dominance estimation algorithm is based on bidirectional Long Short-Term Memory networks, which allow for context-sensitive activity classification from audio feature functionals extracted via the real-time speech feature extraction toolkit openSMILE. Experiments have been performed on a suitably reverberated version of the DOME dataset: averaged over the addressed reverberated conditions, the absolute accuracy improvement is 32.68% in the most dominant person estimation task and 36.56% in the least dominant person estimation task, both with full agreement among annotators.
AB - This work proposes a dominance detection framework operating in reverberant environments. The framework is composed of a speech enhancement front-end, which automatically reduces the distortions introduced by room reverberation in the speech signals, and a dominance detector, which processes the enhanced signals and estimates the most and least dominant person in a segment. The front-end is composed of three cooperating blocks: speaker diarization, room impulse response identification and speech dereverberation. The dominance estimation algorithm is based on bidirectional Long Short-Term Memory networks, which allow for context-sensitive activity classification from audio feature functionals extracted via the real-time speech feature extraction toolkit openSMILE. Experiments have been performed on a suitably reverberated version of the DOME dataset: averaged over the addressed reverberated conditions, the absolute accuracy improvement is 32.68% in the most dominant person estimation task and 36.56% in the least dominant person estimation task, both with full agreement among annotators.
KW - Blind Channel Identification
KW - Dominance Detection
KW - Speaker Diarization
KW - Speech Dereverberation
UR - http://www.scopus.com/inward/record.url?scp=84865145664&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-31346-2_45
DO - 10.1007/978-3-642-31346-2_45
M3 - Conference contribution
AN - SCOPUS:84865145664
SN - 9783642313455
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 394
EP - 402
BT - Advances in Neural Networks, ISNN 2012 - 9th International Symposium on Neural Networks, Proceedings
T2 - 9th International Symposium on Neural Networks, ISNN 2012
Y2 - 11 July 2012 through 14 July 2012
ER -