TY - GEN
T1 - Temporal Oriented ResNet for Gaming Dimensional Emotion Prediction
AU - Song, Meishu
AU - Jing, Xin
AU - Parada-Cabaleiro, Emilia
AU - Yang, Zijiang
AU - Yamamoto, Yoshiharu
AU - Schuller, Björn W.
N1 - Publisher Copyright:
© 2024 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Voice interfaces are increasingly popular in games, and players’ emotional expressions change rapidly and continuously during game-play. To adapt and customise player experiences, designers would therefore benefit greatly from emotion recognition tools able to capture fine-grained changes in player emotions along continuous dimensions. To this end, we utilised our previous speech dataset, the Multimodal Frustration Game Database (MFGD), which was originally collected for binary classification of frustration in game-play. In this study, we added new annotations describing continuous levels of valence and arousal. To extract more robust features shared between both dimensions, we developed a multi-task learning framework that jointly learns valence-arousal representations. Furthermore, we propose a Temporal Oriented ResNet to evaluate the effectiveness of the system. The proposed framework effectively predicts players’ arousal and valence from speech, as shown by the obtained Mean Absolute Error (MAE) (arousal: 0.0055, valence: 0.0055), which significantly outperforms the conventional ResNet18 baseline (arousal: 0.0081, valence: 0.0129).
KW - arousal
KW - emotion recognition
KW - game
KW - multi-task learning
KW - valence
UR - http://www.scopus.com/inward/record.url?scp=85208443018&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85208443018
T3 - European Signal Processing Conference
SP - 596
EP - 600
BT - 32nd European Signal Processing Conference, EUSIPCO 2024 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 32nd European Signal Processing Conference, EUSIPCO 2024
Y2 - 26 August 2024 through 30 August 2024
ER -