TY - JOUR
T1 - Towards Multimodal Prediction of Spontaneous Humor
T2 - A Novel Dataset and First Results
AU - Christ, Lukas
AU - Amiriparian, Shahin
AU - Kathan, Alexander
AU - Muller, Niklas
AU - Konig, Andreas
AU - Schuller, Bjorn W.
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Humor is a substantial element of human social behavior, affect, and cognition. Its automatic understanding can facilitate a more naturalistic human-AI interaction. Current methods of humor detection have been exclusively based on staged data, making them inadequate for 'real-world' applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humor recognition is analyzed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can be best modeled via text-based features. Further, we experiment with different multimodal approaches to humor recognition, including decision-level fusion and MulT, a multimodal Transformer approach. In this context, we propose a novel multimodal architecture that yields the best overall results. Finally, we make our code publicly available at https://www.github.com/lc0197/passau-sfch. The Passau-SFCH dataset is available upon request.
AB - Humor is a substantial element of human social behavior, affect, and cognition. Its automatic understanding can facilitate a more naturalistic human-AI interaction. Current methods of humor detection have been exclusively based on staged data, making them inadequate for 'real-world' applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humor recognition is analyzed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can be best modeled via text-based features. Further, we experiment with different multimodal approaches to humor recognition, including decision-level fusion and MulT, a multimodal Transformer approach. In this context, we propose a novel multimodal architecture that yields the best overall results. Finally, we make our code publicly available at https://www.github.com/lc0197/passau-sfch. The Passau-SFCH dataset is available upon request.
KW - Affective computing
KW - computer audition
KW - computer vision
KW - dataset
KW - humor
KW - multimedia
KW - natural language processing
KW - sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85207140767&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2024.3475736
DO - 10.1109/TAFFC.2024.3475736
M3 - Article
AN - SCOPUS:85207140767
SN - 1949-3045
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
ER -