TY - GEN
T1 - Augmenting generative adversarial networks for speech emotion recognition
AU - Latif, Siddique
AU - Asim, Muhammad
AU - Rana, Rajib
AU - Khalifa, Sara
AU - Jurdak, Raja
AU - Schuller, Björn W.
N1 - Publisher Copyright: Copyright © 2020 ISCA
PY - 2020
Y1 - 2020
N2 - Generative adversarial networks (GANs) have shown potential in learning emotional attributes and generating new data samples. However, their performance is usually hindered by the limited availability of large speech emotion recognition (SER) datasets. In this work, we propose a framework that utilises the mixup data augmentation scheme to augment the GAN in feature learning and generation. To show the effectiveness of the proposed framework, we present results for SER on (i) synthetic feature vectors, (ii) augmentation of the training data with synthetic features, and (iii) encoded features in compressed representation. Our results show that the proposed framework can effectively learn compressed emotional representations and generate synthetic samples that help improve performance in within-corpus and cross-corpus evaluation.
KW - Data augmentation
KW - Feature learning
KW - Generative adversarial networks
KW - Mixup
KW - Speech emotion recognition
KW - Synthetic feature generation
UR - http://www.scopus.com/inward/record.url?scp=85092674248&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-3194
DO - 10.21437/Interspeech.2020-3194
M3 - Conference contribution
AN - SCOPUS:85092674248
SN - 9781713820697
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 521
EP - 525
BT - Interspeech 2020
PB - International Speech Communication Association
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -