Conversational speech recognition in non-stationary reverberated environments

Rudy Rotili, Emanuele Principi, Martin Wöllmer, Stefano Squartini, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents a conversational speech recognition system able to operate in non-stationary reverberated environments. The system is composed of a dereverberation front-end exploiting multiple distant microphones, and a speech recognition engine. The dereverberation front-end identifies a room impulse response by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency Domain Least Mean Square algorithm. The dereverberation stage is based on the adaptive inverse filter theory and uses the identified responses to obtain a set of inverse filters which are then exploited to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments conducted on the Buckeye corpus of conversational speech report a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary one.
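The blind channel identification stage mentioned in the abstract belongs to a family of methods commonly built on the cross-relation between microphone channels: with a single source and two noise-free channels, filtering each observation with the other channel's impulse response gives the same signal. The following is a minimal pure-Python sketch of that property only, not the paper's frequency-domain adaptive algorithm. The source signal, the two short impulse responses, and the function names `convolve` and `cross_relation_error` are hypothetical and chosen for illustration.

```python
import random


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cross_relation_error(x1, x2, h1_est, h2_est):
    """Energy of (x1 * h2_est) - (x2 * h1_est), where * is convolution.

    The value is close to zero when the estimates equal the true
    channels up to a common scale factor.
    """
    a = convolve(x1, h2_est)
    b = convolve(x2, h1_est)
    return sum((u - v) ** 2 for u, v in zip(a, b))


random.seed(0)

# Hypothetical source signal and two short "room" impulse responses.
s = [random.gauss(0.0, 1.0) for _ in range(200)]
h1 = [1.0, 0.5, -0.2, 0.1]
h2 = [0.8, -0.3, 0.25, 0.05]

# Noise-free microphone observations.
x1 = convolve(s, h1)
x2 = convolve(s, h2)

# Error with the true channels versus deliberately swapped channels.
err_true = cross_relation_error(x1, x2, h1, h2)
err_wrong = cross_relation_error(x1, x2, h2, h1)

print("error with true channels:   ", err_true)
print("error with swapped channels:", err_wrong)
```

An adaptive identification algorithm would minimise an error of this kind over the channel estimates using only the microphone signals. Here the true responses are plugged in directly, just to show that the error vanishes at the correct solution and not at an incorrect one.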

Original language: English
Title of host publication: Cognitive Behavioural Systems - COST 2102 International Training School, Revised Selected Papers
Pages: 50-59
Number of pages: 10
State: Published - 2012
Event: International Training School on Cognitive Behavioural Systems, COST 2102 - Dresden, Germany
Duration: 21 Feb 2011 to 26 Feb 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7403 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Training School on Cognitive Behavioural Systems, COST 2102
Country/Territory: Germany
City: Dresden
Period: 21/02/11 to 26/02/11
