TY - JOUR
T1 - Light-weight self-attention augmented generative adversarial networks for speech enhancement
AU - Li, Lujun
AU - Lu, Zhenxing
AU - Watzel, Tobias
AU - Kürzinger, Ludwig
AU - Rigoll, Gerhard
N1 - Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
PY - 2021/7/1
Y1 - 2021/7/1
AB - Generative adversarial networks (GANs) have shown their superiority for speech enhancement. Nevertheless, most previous attempts used convolutional layers as the backbone, which may obscure long-range dependencies across an input sequence due to the convolution operator’s local receptive field. One popular solution is substituting recurrent neural networks (RNNs) for convolutional neural networks, but RNNs are computationally inefficient because their temporal iterations cannot be parallelized. To circumvent this limitation, we propose an end-to-end system for speech enhancement that applies the self-attention mechanism to GANs. We aim for a system that is flexible in modeling both long-range and local interactions while remaining computationally efficient. Our work proceeds in three phases: first, we apply the stand-alone self-attention layer to speech enhancement GANs. Second, we employ locality modeling on the stand-alone self-attention layer. Last, we investigate the functionality of self-attention augmented convolutional speech enhancement GANs. Systematic experimental results indicate that, equipped with the stand-alone self-attention layer, the system outperforms the baseline systems across classic evaluation criteria with up to 95% fewer parameters. Moreover, locality modeling offers a parameter-free approach to further performance improvement, and self-attention augmentation also outperforms all baseline systems with an acceptable increase in parameters.
KW - Generative adversarial networks
KW - Self-attention mechanism
KW - Speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85108885436&partnerID=8YFLogxK
U2 - 10.3390/electronics10131586
DO - 10.3390/electronics10131586
M3 - Article
AN - SCOPUS:85108885436
SN - 2079-9292
VL - 10
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 13
M1 - 1586
ER -