Induced Local Attention for Transformer Models in Speech Recognition

Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Transformer models and their variants are currently considered the prime model architectures in speech recognition, since they yield state-of-the-art results on several datasets. Their main strength lies in the self-attention mechanism, which allows the models to compute a score over the whole input sequence and to focus on its essential parts. However, the attention score has a drawback: it is heavily global, since it takes the whole sequence into account and normalizes along the sequence length. Our work presents a novel approach for a dynamic fusion between the global attention score and a local one based on a Gaussian mask. The small networks that learn the fusion process and the Gaussian masks require only a few additional parameters and are simple to add to current transformer architectures. In an exhaustive evaluation, we determine the effect of localness in the encoder layers and examine the most effective fusion approach. The results on the TEDLIUMv2 dataset demonstrate a steady improvement on the dev and test sets for the base transformer model equipped with our proposed fusion procedure for local attention.
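The abstract describes fusing the global self-attention score with a Gaussian-masked local score through small learned networks. The sketch below is a minimal, hypothetical PyTorch rendering of that idea, assuming a per-query Gaussian window width and a sigmoid fusion gate; the module and parameter names (`sigma_net`, `gate_net`) and all tensor shapes are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianLocalAttentionFusion(nn.Module):
    """Sketch: fuse global attention weights with Gaussian-masked local
    weights via a learned gate (hypothetical realization of the abstract)."""

    def __init__(self, d_model: int):
        super().__init__()
        # Small networks predicting, per query position, the Gaussian
        # window width (sigma) and the fusion gate -- an assumption about
        # how the "small networks" in the abstract could be implemented.
        self.sigma_net = nn.Linear(d_model, 1)
        self.gate_net = nn.Linear(d_model, 1)

    def forward(self, query: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
        # query:  (batch, T, d_model) per-position query representations
        # scores: (batch, T, T)       raw global attention logits (QK^T / sqrt(d))
        batch, T, _ = scores.shape
        pos = torch.arange(T, device=scores.device, dtype=scores.dtype)
        # Squared distance of each key position j from each query position i.
        dist2 = (pos.view(1, T, 1) - pos.view(1, 1, T)) ** 2  # (1, T, T)

        # Predicted window width per query position, kept strictly positive.
        sigma = F.softplus(self.sigma_net(query))              # (batch, T, 1)
        gauss_mask = -dist2 / (2.0 * sigma ** 2 + 1e-6)        # (batch, T, T)

        global_attn = F.softmax(scores, dim=-1)
        local_attn = F.softmax(scores + gauss_mask, dim=-1)

        # Learned convex combination of global and local attention weights.
        g = torch.sigmoid(self.gate_net(query))                # (batch, T, 1)
        return g * local_attn + (1.0 - g) * global_attn
```

Because the Gaussian mask is added to the logits before the softmax, distant positions are smoothly down-weighted rather than hard-masked, and the gate lets each query position choose how much to rely on the local window versus the full sequence.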

Original language: English
Title of host publication: Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings
Editors: Alexey Karpov, Rodmonga Potapova
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 795-806
Number of pages: 12
ISBN (Print): 9783030878016
DOIs
State: Published - 2021
Event: 23rd International Conference on Speech and Computer, SPECOM 2021 - Virtual, Online
Duration: 27 Sep 2021 - 30 Sep 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12997 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Speech and Computer, SPECOM 2021
City: Virtual, Online
Period: 27/09/21 - 30/09/21

Keywords

  • Attention fusion
  • Local attention
  • Speech recognition
  • Transformer
