Stream fusion for multi-stream automatic speech recognition

Hesam Sagha, Feipeng Li, Ehsan Variani, José del R. Millán, Ricardo Chavarriaga, Björn Schuller

Research output: Contribution to journal, Article, peer-review

Abstract

Multi-stream automatic speech recognition (MS-ASR) has been shown to improve recognition performance in noisy conditions. In such a system, the generation and the fusion of the streams are the essential parts, and they must be designed to reduce the effect of noise on the final decision. This paper shows how to improve the performance of MS-ASR by addressing two questions: (1) how many streams to combine, and (2) how to combine them. First, we propose a novel approach based on stream reliability to select the number of streams to be fused. Second, we introduce a fusion method based on Parallel Hidden Markov Models. Applying the method to two datasets (TIMIT and RATS) with different noise types, we show an improvement in MS-ASR performance.

Original language: English
Pages (from-to): 669-675
Number of pages: 7
Journal: International Journal of Speech Technology
Volume: 19
Issue number: 4
State: Published - 1 Dec 2016
Externally published: Yes

Keywords

  • Classifier ensemble creation and fusion
  • Multi-stream speech recognition
  • Performance monitor
