TY - GEN
T1 - Synchronized Forward-Backward Transformer for End-to-End Speech Recognition
AU - Watzel, Tobias
AU - Kürzinger, Ludwig
AU - Li, Lujun
AU - Rigoll, Gerhard
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Recently, various approaches have utilized transformer networks, which apply the concept of self-attention, in end-to-end speech recognition. These approaches mainly focus on the self-attention mechanism to improve the performance of transformer models. In our work, we demonstrate the benefit of adding a second transformer network during the training phase, which is optimized on time-reversed target labels. This new transformer receives future context, which is usually not available to standard transformer networks. We integrate this future context information into the standard transformer network by proposing two novel synchronization terms. Since the newly added transformer network is only required during training, we do not change the complexity of the final network and only add training time. We evaluate our approach on the publicly available TEDLIUMv2 dataset, where we achieve relative improvements of 9.8% on the dev set and 6.5% on the test set, respectively, when we employ synchronization terms with Euclidean metrics.
AB - Recently, various approaches have utilized transformer networks, which apply the concept of self-attention, in end-to-end speech recognition. These approaches mainly focus on the self-attention mechanism to improve the performance of transformer models. In our work, we demonstrate the benefit of adding a second transformer network during the training phase, which is optimized on time-reversed target labels. This new transformer receives future context, which is usually not available to standard transformer networks. We integrate this future context information into the standard transformer network by proposing two novel synchronization terms. Since the newly added transformer network is only required during training, we do not change the complexity of the final network and only add training time. We evaluate our approach on the publicly available TEDLIUMv2 dataset, where we achieve relative improvements of 9.8% on the dev set and 6.5% on the test set, respectively, when we employ synchronization terms with Euclidean metrics.
KW - Forward-backward transformer
KW - Regularization
KW - Speech recognition
KW - Synchronization
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85092925105&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-60276-5_62
DO - 10.1007/978-3-030-60276-5_62
M3 - Conference contribution
AN - SCOPUS:85092925105
SN - 9783030602758
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 646
EP - 656
BT - Speech and Computer - 22nd International Conference, SPECOM 2020, Proceedings
A2 - Karpov, Alexey
A2 - Potapova, Rodmonga
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd International Conference on Speech and Computer, SPECOM 2020
Y2 - 7 October 2020 through 9 October 2020
ER -