TY - JOUR
T1 - A Wide Evaluation of ChatGPT on Affective Computing Tasks
AU - Amin, Mostafa M.
AU - Mao, Rui
AU - Cambria, Erik
AU - Schuller, Björn W.
N1 - Publisher Copyright: © 2010-2012 IEEE.
PY - 2024
Y1 - 2024
AB - With the rise of foundation models, a new artificial intelligence paradigm has emerged, by simply using general purpose foundation models with prompting to solve problems instead of training a separate machine learning model for each problem. Such models have been shown to have emergent properties of solving problems that they were not initially trained on. The studies for the effectiveness of such models are still quite limited. In this work, we widely study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems, namely aspect extraction, aspect polarity classification, opinion extraction, sentiment analysis, sentiment intensity ranking, emotions intensity ranking, suicide tendency detection, toxicity detection, well-being assessment, engagement measurement, personality assessment, sarcasm detection, and subjectivity detection. We introduce a framework to evaluate the ChatGPT models on regression-based problems, such as intensity ranking problems, by modelling them as pairwise ranking classification. We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers. The results demonstrate the emergent abilities of the ChatGPT models on a wide range of affective computing problems, where GPT-3.5 and especially GPT-4 have shown strong performance on many problems, particularly the ones related to sentiment, emotions, or toxicity. The ChatGPT models fell short for problems with implicit signals, such as engagement measurement and subjectivity detection.
KW - ChatGPT
KW - GPT-4
KW - affective computing
KW - aspect-based sentiment analysis
KW - emotions intensity ranking
KW - engagement measurement
KW - foundation models
KW - personality assessment
KW - sarcasm detection
KW - sentiment analysis
KW - sentiment intensity ranking
KW - subjectivity detection
KW - suicide tendency detection
KW - toxicity detection
KW - well-being assessment
UR - http://www.scopus.com/inward/record.url?scp=85197057514&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2024.3419593
DO - 10.1109/TAFFC.2024.3419593
M3 - Article
AN - SCOPUS:85197057514
SN - 1949-3045
VL - 15
SP - 2204
EP - 2212
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 4
ER -