TY - GEN
T1 - Conversational speech recognition in non-stationary reverberated environments
AU - Rotili, Rudy
AU - Principi, Emanuele
AU - Wöllmer, Martin
AU - Squartini, Stefano
AU - Schuller, Björn
PY - 2012
Y1 - 2012
AB - This paper presents a conversational speech recognition system able to operate in non-stationary reverberated environments. The system is composed of a dereverberation front-end exploiting multiple distant microphones, and a speech recognition engine. The dereverberation front-end identifies a room impulse response by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency Domain Least Mean Square algorithm. The dereverberation stage is based on the adaptive inverse filter theory and uses the identified responses to obtain a set of inverse filters which are then exploited to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments conducted on the Buckeye corpus of conversational speech report a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary one.
UR - http://www.scopus.com/inward/record.url?scp=84870312972&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-34584-5_4
DO - 10.1007/978-3-642-34584-5_4
M3 - Conference contribution
AN - SCOPUS:84870312972
SN - 9783642345838
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 50
EP - 59
BT - Cognitive Behavioural Systems - COST 2102 International Training School, Revised Selected Papers
T2 - International Training School on Cognitive Behavioural Systems, COST 2102
Y2 - 21 February 2011 through 26 February 2011
ER -