TY - JOUR
T1 - Neural Compression Augmentation for Contrastive Audio Representation Learning
AU - Wang, Zhaoyu
AU - Liu, Haohe
AU - Coppock, Harry
AU - Schuller, Björn
AU - Plumbley, Mark D.
N1 - Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - The choice of data augmentation is pivotal in contrastive self-supervised learning. Current augmentation techniques for audio data, such as the widely used Random Resize Crop (RRC), underperform in pitch-sensitive music tasks and lack generalisation across various types of audio. This study aims to address these limitations by introducing Neural Compression Augmentation (NCA), an approach based on lossy neural compression. We use Audio Barlow Twins (ABT), a contrastive self-supervised framework for audio, as our backbone. We experiment with both NCA and several baseline augmentation methods in the augmentation block of ABT and train the models on AudioSet. Experimental results show that models integrated with NCA considerably surpass the original performance of ABT, especially in the music tasks of the HEAR benchmark, demonstrating the effectiveness of compression-based augmentation for audio contrastive self-supervised learning.
AB - The choice of data augmentation is pivotal in contrastive self-supervised learning. Current augmentation techniques for audio data, such as the widely used Random Resize Crop (RRC), underperform in pitch-sensitive music tasks and lack generalisation across various types of audio. This study aims to address these limitations by introducing Neural Compression Augmentation (NCA), an approach based on lossy neural compression. We use Audio Barlow Twins (ABT), a contrastive self-supervised framework for audio, as our backbone. We experiment with both NCA and several baseline augmentation methods in the augmentation block of ABT and train the models on AudioSet. Experimental results show that models integrated with NCA considerably surpass the original performance of ABT, especially in the music tasks of the HEAR benchmark, demonstrating the effectiveness of compression-based augmentation for audio contrastive self-supervised learning.
KW - audio compression
KW - data augmentation
KW - self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85214802403&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-1336
DO - 10.21437/Interspeech.2024-1336
M3 - Conference article
AN - SCOPUS:85214802403
SN - 2308-457X
SP - 687
EP - 691
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 25th Interspeech Conference 2024
Y2 - 1 September 2024 through 5 September 2024
ER -