TY - JOUR
T1 - Enrolment-based personalisation for improving individual-level fairness in speech emotion recognition
AU - Triantafyllopoulos, Andreas
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - The expression of emotion is highly individualistic. However, contemporary speech emotion recognition (SER) systems typically rely on population-level models that adopt a 'one-size-fits-all' approach for predicting emotion. Moreover, standard evaluation practices measure performance also on the population level, thus failing to characterise how models work across different speakers. In the present contribution, we present a new method for capitalising on individual differences to adapt an SER model to each new speaker using a minimal set of enrolment utterances. In addition, we present novel evaluation schemes for measuring fairness across different speakers. Our findings show that aggregated evaluation metrics may obfuscate fairness issues on the individual-level, which are uncovered by our evaluation, and that our proposed method can improve performance both in aggregated and disaggregated terms.
AB - The expression of emotion is highly individualistic. However, contemporary speech emotion recognition (SER) systems typically rely on population-level models that adopt a 'one-size-fits-all' approach for predicting emotion. Moreover, standard evaluation practices measure performance also on the population level, thus failing to characterise how models work across different speakers. In the present contribution, we present a new method for capitalising on individual differences to adapt an SER model to each new speaker using a minimal set of enrolment utterances. In addition, we present novel evaluation schemes for measuring fairness across different speakers. Our findings show that aggregated evaluation metrics may obfuscate fairness issues on the individual-level, which are uncovered by our evaluation, and that our proposed method can improve performance both in aggregated and disaggregated terms.
KW - computational paralinguistics
KW - deep learning
KW - fairness
KW - personalisation
KW - speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85214811912&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-98
DO - 10.21437/Interspeech.2024-98
M3 - Conference article
AN - SCOPUS:85214811912
SN - 2308-457X
SP - 3729
EP - 3733
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 25th Interspeech Conferece 2024
Y2 - 1 September 2024 through 5 September 2024
ER -