TY - JOUR
T1 - Speech-Based Classification of Defensive Communication
T2 - 24th Annual Conference of the International Speech Communication Association, Interspeech 2023
AU - Amiriparian, Shahin
AU - Christ, Lukas
AU - Kushtanova, Regina
AU - Gerczuk, Maurice
AU - Teynor, Alexandra
AU - Schuller, Björn W.
N1 - Publisher Copyright:
© 2023 International Speech Communication Association. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Defensive communication is known to have detrimental effects on the quality of social interactions. Hence, recognising and reducing defensive behaviour is crucial to improving professional and personal communication. We introduce DefComm-DB, a novel multimodal dataset comprising video recordings in which one of the following types of defensive communication is present: (i) verbally attacking the conversation partner, (ii) withdrawing from the communication, (iii) making oneself greater, and (iv) making oneself smaller. Subsequently, we present a machine learning approach for the automatic classification of DefComm-DB. In particular, we utilise wav2vec2, autoencoders, a pre-trained CNN and openSMILE for feature extraction from the audio modality. For the text stream, we apply ELECTRA and SBERT. On the unseen test set, our models achieve an Unweighted Average Recall of 49.4 % and 52.2 % for the audio and text modalities, respectively, showing the feasibility of the introduced challenge.
KW - Transformers
KW - computational paralinguistics
KW - defensive communication
KW - speech processing
UR - http://www.scopus.com/inward/record.url?scp=85171559902&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2023-76
DO - 10.21437/Interspeech.2023-76
M3 - Conference article
AN - SCOPUS:85171559902
SN - 2308-457X
VL - 2023-August
SP - 2703
EP - 2707
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 20 August 2023 through 24 August 2023
ER -