Neural Compression Augmentation for Contrastive Audio Representation Learning

Zhaoyu Wang, Haohe Liu, Harry Coppock, Björn Schuller, Mark D. Plumbley

Research output: Contribution to journal › Conference article › peer-review

Abstract

The choice of data augmentation is pivotal in contrastive self-supervised learning. Current augmentation techniques for audio data, such as the widely used Random Resize Crop (RRC), underperform in pitch-sensitive music tasks and lack generalisation across various types of audio. This study aims to address these limitations by introducing Neural Compression Augmentation (NCA), an approach based on lossy neural compression. We use the Audio Barlow Twins (ABT), a contrastive self-supervised framework for audio, as our backbone. We experiment with both NCA and several baseline augmentation methods in the augmentation block of ABT and train the models on AudioSet. Experimental results show that models integrated with NCA considerably surpass the original performance of ABT, especially in the music tasks of the HEAR benchmark, demonstrating the effectiveness of compression-based augmentation for audio contrastive self-supervised learning.
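The core idea described in the abstract is to generate the two "views" required by a contrastive framework by passing the waveform through a lossy codec, so that the views differ by compression artifacts rather than by crops. The paper uses a neural codec; the sketch below substitutes mu-law companding with coarse quantization as a simple non-neural stand-in, purely to illustrate the view-generation pattern. The function names, the choice of codec, and the randomized bit depth are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lossy_compress_augment(x, mu=255, bits=6):
    """Lossy stand-in for a neural codec: mu-law companding + coarse
    quantization, then expansion back to a waveform in [-1, 1].
    (Illustrative only; the paper applies a learned neural codec.)"""
    # Mu-law companding (compress dynamic range)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Quantize to 2**bits levels -- this is the lossy step
    levels = 2 ** bits - 1
    q = np.round((y + 1) / 2 * levels)
    y_hat = q / levels * 2 - 1
    # Mu-law expansion (invert the companding)
    return np.sign(y_hat) * (1 / mu) * ((1 + mu) ** np.abs(y_hat) - 1)

def make_contrastive_views(x, rng):
    """Produce two differently-degraded views of the same clip by
    sampling a random bit depth per view (hypothetical setup)."""
    b1, b2 = rng.integers(4, 9), rng.integers(4, 9)
    return lossy_compress_augment(x, bits=b1), lossy_compress_augment(x, bits=b2)

rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of A4 tone
view_a, view_b = make_contrastive_views(clip, rng)
```

Both views keep the pitch content of the clip intact (unlike Random Resize Crop, which rescales the time-frequency axes), which is consistent with the abstract's claim that compression-based views help on pitch-sensitive music tasks.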

Original language: English
Pages (from-to): 687-691
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
State: Published - 2024
Externally published: Yes
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 - 5 Sep 2024

Keywords

  • audio compression
  • data augmentation
  • self-supervised learning
