TY - JOUR
T1 - Connecting subspace learning and extreme learning machine in speech emotion recognition
AU - Xu, Xinzhou
AU - Deng, Jun
AU - Coutinho, Eduardo
AU - Wu, Chen
AU - Zhao, Li
AU - Schuller, Björn W.
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2019/3
Y1 - 2019/3
N2 - Speech emotion recognition (SER) is a powerful tool for endowing computers with the capacity to process information about the affective states of users in human–machine interactions. Recent research has shown the effectiveness of graph embedding-based subspace learning and extreme learning machine applied to SER, but there are still various drawbacks in these two techniques that limit their application. Regarding subspace learning, the change from linearity to nonlinearity is usually achieved through kernelization, whereas extreme learning machines only take label information into consideration at the output layer. In order to overcome these drawbacks, this paper leverages extreme learning machines for dimensionality reduction and proposes a novel framework to combine spectral regression-based subspace learning and extreme learning machines. The proposed framework contains three stages—data mapping, graph decomposition, and regression. At the data mapping stage, various mapping strategies provide different views of the samples. At the graph decomposition stage, specifically designed embedding graphs provide a possibility to better represent the structure of data through generating virtual coordinates. Finally, at the regression stage, dimension-reduced mappings are achieved by connecting the virtual coordinates and data mapping. Using this framework, we propose several novel dimensionality reduction algorithms, apply them to SER tasks, and compare their performance to relevant state-of-the-art methods. Our results on several paralinguistic corpora show that our proposed techniques lead to significant improvements.
KW - Extreme learning machine
KW - Graph embedding
KW - Spectral regression
KW - Speech emotion recognition
KW - Subspace learning
UR - http://www.scopus.com/inward/record.url?scp=85051789661&partnerID=8YFLogxK
U2 - 10.1109/TMM.2018.2865834
DO - 10.1109/TMM.2018.2865834
M3 - Article
AN - SCOPUS:85051789661
SN - 1520-9210
VL - 21
SP - 795
EP - 808
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 3
M1 - 8440079
ER -