ParaCLAP - Towards a general language-audio model for computational paralinguistic tasks

Xin Jing, Andreas Triantafyllopoulos, Björn Schuller

Publication: Contribution to journal › Conference article › Peer review

Abstract

Contrastive language-audio pretraining (CLAP) has recently emerged as a method for making audio analysis more generalisable. Specifically, CLAP-style models are able to 'answer' a diverse set of language queries, extending the capabilities of audio models beyond a closed set of labels. However, CLAP relies on a large set of (audio, query) pairs for pretraining. While such sets are available for general audio tasks, like captioning or sound event detection, there are no datasets with matched audio and text queries for computational paralinguistic (CP) tasks. As a result, the community relies on generic CLAP models trained for general audio with limited success. In the present study, we explore training considerations for ParaCLAP, a CLAP-style model suited to CP, including a novel process for creating audio-language queries. We demonstrate its effectiveness on a set of computational paralinguistic tasks, where it is shown to surpass the performance of open-source state-of-the-art models. Our code and resources are publicly available at: https://github.com/KeiKinn/ParaCLAP.
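To illustrate how a CLAP-style model 'answers' language queries, the sketch below shows the usual zero-shot inference pattern: audio and candidate text queries are mapped into a shared embedding space and the best-matching query is selected by cosine similarity. The encoder functions, embedding dimensionality, and example queries are placeholders chosen for illustration only; they do not reflect ParaCLAP's actual interface, which is available in the linked repository.

```python
# Minimal sketch of CLAP-style zero-shot query matching (illustrative only).
# encode_audio/encode_text stand in for a trained audio tower and text tower;
# here they return deterministic random vectors so the example runs end to end.
import numpy as np

EMB_DIM = 512  # assumed joint embedding dimensionality

def encode_audio(waveform: np.ndarray) -> np.ndarray:
    """Placeholder audio encoder: returns an L2-normalised embedding."""
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % 2**32)
    v = rng.standard_normal(EMB_DIM)
    return v / np.linalg.norm(v)

def encode_text(query: str) -> np.ndarray:
    """Placeholder text encoder: returns an L2-normalised embedding."""
    rng = np.random.default_rng(abs(hash(query)) % 2**32)
    v = rng.standard_normal(EMB_DIM)
    return v / np.linalg.norm(v)

# Candidate language queries for a hypothetical speech-emotion task.
queries = [
    "a person speaking in a happy voice",
    "a person speaking in an angry voice",
    "a person speaking in a neutral voice",
]

audio = np.zeros(16_000, dtype=np.float32)  # stand-in for one second of audio
audio_emb = encode_audio(audio)
text_embs = np.stack([encode_text(q) for q in queries])

# Cosine similarity between the audio embedding and each query embedding;
# the highest-scoring query is taken as the model's 'answer'.
scores = text_embs @ audio_emb
print(queries[int(np.argmax(scores))])
```

Because classification reduces to ranking free-form text queries, the same pretrained model can be applied to new paralinguistic label sets without retraining, which is the property the paper evaluates.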

Original language: English
Pages (from - to): 1155-1159
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
Publication status: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sept. 2024 - 5 Sept. 2024
