Conversational speech recognition in non-stationary reverberated environments

Rudy Rotili, Emanuele Principi, Martin Wöllmer, Stefano Squartini, Björn Schuller

Publication: Contribution to book/report/conference proceedings; conference contribution; peer-reviewed

Abstract

This paper presents a conversational speech recognition system able to operate in non-stationary reverberated environments. The system is composed of a dereverberation front-end exploiting multiple distant microphones, and a speech recognition engine. The dereverberation front-end identifies a room impulse response by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency Domain Least Mean Square algorithm. The dereverberation stage is based on the adaptive inverse filter theory and uses the identified responses to obtain a set of inverse filters which are then exploited to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments conducted on the Buckeye corpus of conversational speech report a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary one.
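The step from identified room impulse responses to inverse filters can be illustrated with a minimal sketch. It is not the adaptive inverse filter procedure the abstract refers to: it swaps in a batch least-squares solution (in the style of the multiple-input/output inverse theorem) and assumes the channel responses are already known, whereas the paper estimates them blindly. The function names `conv_matrix`, `inverse_filters`, and `dereverberate`, and the parameters `filt_len` and `delay`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def conv_matrix(h, n_cols):
    """Build the convolution matrix H so that H @ g equals np.convolve(h, g)
    for any filter g of length n_cols."""
    h = np.asarray(h, dtype=float)
    H = np.zeros((len(h) + n_cols - 1, n_cols))
    for j in range(n_cols):
        H[j:j + len(h), j] = h
    return H


def inverse_filters(channels, filt_len, delay=0):
    """Least-squares inverse filters g_i such that sum_i h_i * g_i
    approximates a unit impulse at position `delay`.

    Assumption: all channel responses have the same length and share no
    common zeros, and filt_len is at least (channel length - 1) for the
    two-channel case, so an exact solution exists.
    """
    H = np.hstack([conv_matrix(h, filt_len) for h in channels])
    target = np.zeros(H.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.split(g, len(channels))


def dereverberate(mic_signals, filters):
    """Filter each microphone signal with its inverse filter and sum."""
    return sum(np.convolve(x, g) for x, g in zip(mic_signals, filters))
```

With two short synthetic responses, for example `[1.0, 0.5, 0.2]` and `[1.0, -0.3, 0.1]`, and `filt_len=4`, the summed output should reproduce the source signal followed by near-zero samples. Real room responses are thousands of taps long and only approximately known, which is one reason an adaptive formulation such as the one described in the abstract is preferred in practice.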

Original language: English
Title: Cognitive Behavioural Systems - COST 2102 International Training School, Revised Selected Papers
Pages: 50-59
Number of pages: 10
Publication status: Published - 2012
Event: International Training School on Cognitive Behavioural Systems, COST 2102 - Dresden, Germany
Duration: 21 Feb 2011 to 26 Feb 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7403 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: International Training School on Cognitive Behavioural Systems, COST 2102
Country/Territory: Germany
City: Dresden
Period: 21/02/11 to 26/02/11
