End-to-end audio classification with small datasets - Making it work

Maximilian Schmitt, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Deep end-to-end learning is a promising approach for many types of audio classification tasks. However, in fields such as health care and medical diagnosis, training data can be scarce, which makes training a neural network from the raw waveform to the target a challenge. In this work, we focus on a public dataset of human snore sounds, categorised into four classes, where one particular class has only a few training samples. We emphasise the pitfalls that need to be taken into account when working with such data and propose an end-to-end model whose performance is similar to that of other deep and non-deep approaches. Furthermore, we show that a model using only convolutional layers outperforms a model that also employs recurrent layers.

Original language: English
Title of host publication: EUSIPCO 2019 - 27th European Signal Processing Conference
Publisher: European Signal Processing Conference, EUSIPCO
ISBN (Electronic): 9789082797039
State: Published - Sep 2019
Externally published: Yes
Event: 27th European Signal Processing Conference, EUSIPCO 2019 - A Coruna, Spain
Duration: 2 Sep 2019 to 6 Sep 2019

Publication series

Name: European Signal Processing Conference
Volume: 2019-September
ISSN (Print): 2219-5491

Conference

Conference: 27th European Signal Processing Conference, EUSIPCO 2019
Country/Territory: Spain
City: A Coruna
Period: 2/09/19 to 6/09/19

Keywords

  • Audio classification
  • End-to-end learning
  • Representation learning
  • Scarce data
  • Snore sounds
