Synchronized Forward-Backward Transformer for End-to-End Speech Recognition

Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Recently, various approaches have utilized transformer networks, which apply a new concept of self-attention, in end-to-end speech recognition. These approaches mainly focus on the self-attention mechanism to improve the performance of transformer models. In our work, we demonstrate the benefit of adding a second transformer network during the training phase, which is optimized on time-reversed target labels. This new transformer receives a future context, which is usually not available for standard transformer networks. We integrate this future context information into the standard transformer network by proposing two novel synchronization terms. Since we only require the newly added transformer network during training, we do not change the complexity of the final network and only add training time. We evaluate our approach on the publicly available dataset TEDLIUMv2, where we achieve relative improvements of 9.8% on the dev set and 6.5% on the test set when we employ synchronization terms with Euclidean metrics.
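
The abstract does not spell out the two synchronization terms, so the following is only a minimal sketch of what a Euclidean synchronization term between a forward and a backward decoder could look like. Everything in it is an assumption made for illustration: the function names `euclidean_sync_term` and `combined_training_loss`, the choice to compare per-step decoder states, the averaging over time steps, and the weighting factor `sync_weight` are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean_sync_term(
    forward_states: Sequence[Vector],
    backward_states: Sequence[Vector],
) -> float:
    """Mean Euclidean distance between forward states and re-aligned backward states.

    Assumption: the backward network is trained on time-reversed labels, so its
    states are reversed in time before being compared with the forward states.
    """
    if len(forward_states) != len(backward_states):
        raise ValueError("both networks must produce the same number of steps")
    if not forward_states:
        return 0.0

    # Undo the time reversal so that step t of both networks refers to the same label.
    realigned = list(reversed(backward_states))

    total = 0.0
    for fwd, bwd in zip(forward_states, realigned):
        if len(fwd) != len(bwd):
            raise ValueError("state vectors must have the same dimension")
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(fwd, bwd)))
    return total / len(forward_states)


def combined_training_loss(
    forward_loss: float,
    backward_loss: float,
    sync_term: float,
    sync_weight: float = 0.1,  # hypothetical weighting, not from the paper
) -> float:
    """Training objective: both recognition losses plus the weighted sync term."""
    return forward_loss + backward_loss + sync_weight * sync_term
```

In this sketch only the training objective involves the backward network; at inference time the forward network would be used on its own, which matches the abstract's statement that the complexity of the final network is unchanged.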

Original language: English
Title of host publication: Speech and Computer - 22nd International Conference, SPECOM 2020, Proceedings
Editors: Alexey Karpov, Rodmonga Potapova
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 646-656
Number of pages: 11
ISBN (Print): 9783030602758
DOIs
State: Published - 2020
Event: 22nd International Conference on Speech and Computer, SPECOM 2020 - St. Petersburg, Russian Federation
Duration: 7 Oct 2020 to 9 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12335 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Speech and Computer, SPECOM 2020
Country/Territory: Russian Federation
City: St. Petersburg
Period: 7/10/20 to 9/10/20

Keywords

  • Forward-backward transformer
  • Regularization
  • Speech recognition
  • Synchronization
  • Transformer
