Abstract
Despite recent advances in speech emotion recognition (SER) models, state-of-the-art deep learning (DL) approaches are constrained by the limited availability of annotated data. The advent of large language models (LLMs) has revolutionised our understanding of natural language, introducing emergent properties that broaden comprehension in language, speech, and vision. This paper explores the potential of LLMs, such as ChatGPT, to annotate abundant speech data with the goal of advancing the state of the art in SER. Specifically, it proposes a method that integrates audio representations and gender information with textual prompts to enhance the annotation process using LLMs. Our evaluation covers single-shot and few-shot scenarios, revealing variability in SER performance. Notably, this work achieves improved results through data augmentation, by incorporating ChatGPT-annotated samples into existing datasets. Our work also uncovers new frontiers in speech emotion classification, highlighting the growing significance of LLMs in this field.
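The abstract does not specify the prompt format. As a rough illustration only, the sketch below shows one way a text prompt could combine a transcript, the speaker's gender, and a textual summary of acoustic cues, with labelled examples for one-shot or few-shot prompting. It also includes a filter that keeps only LLM replies parsing to a valid label before merging them into an existing dataset. The label set, the `Utterance` fields, the `audio_desc` wording, and all function names are hypothetical. This is not the authors' method, and no LLM API is called.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical label set; the paper's actual emotion classes may differ.
EMOTIONS = ("angry", "happy", "neutral", "sad")


@dataclass
class Utterance:
    transcript: str
    gender: str                   # speaker gender, e.g. "female" or "male"
    audio_desc: str               # hypothetical text summary of acoustic features
    label: Optional[str] = None   # known emotion, only needed for prompt examples


def describe(u: Utterance) -> str:
    """Render one utterance as prompt text (transcript, gender, acoustic cues)."""
    return (f'Transcript: "{u.transcript}"\n'
            f"Speaker gender: {u.gender}\n"
            f"Acoustic cues: {u.audio_desc}")


def build_prompt(query: Utterance, examples: Sequence[Utterance] = ()) -> str:
    """Build a prompt; passing k labelled examples yields a k-shot prompt."""
    header = ("Classify the emotion of the final utterance as one of: "
              + ", ".join(EMOTIONS) + ". Answer with one word.")
    parts = [header]
    for ex in examples:
        if ex.label not in EMOTIONS:
            raise ValueError(f"example label {ex.label!r} is not in the label set")
        parts.append(describe(ex) + f"\nEmotion: {ex.label}")
    # The query is left open so the model completes the "Emotion:" line.
    parts.append(describe(query) + "\nEmotion:")
    return "\n\n".join(parts)


def parse_label(reply: str) -> Optional[str]:
    """Map a free-text LLM reply to a label, or None if it is not recognised."""
    text = reply.strip()
    if not text:
        return None
    word = text.split()[0].strip(".,!").lower()
    return word if word in EMOTIONS else None


def augment(gold: List[Tuple[str, str]],
            llm: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Append LLM-annotated samples whose reply parsed to a valid label."""
    return gold + [(x, y) for x, y in llm if y is not None]
```

Discarding unparseable replies, rather than forcing them into a class, is one simple way to keep obviously invalid annotations out of the augmented training set. Whether the paper filters this way is not stated in the abstract.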
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 66-77 |
| Number of pages | 12 |
| Journal | IEEE Computational Intelligence Magazine |
| Volume | 20 |
| Issue number | 1 |
| DOIs | |
| State | Published - 2025 |
| Externally published | Yes |
Title: Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers [Research Frontier]