Deep unsupervised representation learning for audio-based medical applications

Shahin Amiriparian, Maximilian Schmitt, Sandra Ottl, Maurice Gerczuk, Björn Schuller

Publication: Contribution to book/report/conference proceedings › Chapter › Peer-reviewed

1 citation (Scopus)


Feature learning denotes a set of approaches for transforming raw input data into representations that can be effectively utilised in solving machine learning problems. Classifiers and regressors require training data in a form that is computationally suitable to process. However, real-world data are remarkably variable and complex, e.g., an audio recording of a group of people talking in a park while a dog barks and a musician plays the guitar in the background, or health-related data such as coughing and sneezing recorded by consumer smartphones. For understanding such data, developing expert-designed, hand-crafted features often demands an extensive amount of time and resources. Another disadvantage of such features is their lack of generalisation, i.e., new features must be engineered for each new task. Automatic representation learning methods are therefore needed. In this chapter, we first discuss the preliminaries of contemporary representation learning techniques for computer audition tasks, differentiating between approaches based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We then introduce and evaluate three state-of-the-art deep learning systems for unsupervised representation learning from raw audio: (1) pre-trained image classification CNNs, (2) a deep convolutional generative adversarial network (DCGAN), and (3) a recurrent sequence-to-sequence autoencoder (S2SAE). For each of these algorithms, the representations are obtained from the spectrograms of the input audio data.
Finally, we evaluate the efficacy of the deep representations on a range of audio-based machine learning tasks, including abnormal heart sound classification, snore sound classification, and bipolar disorder recognition. The representations are: (i) the activations of the fully connected layers of the pre-trained CNNs, (ii) the activations of the discriminator in the case of the DCGAN, and (iii) the activations of a fully connected layer between the encoder and decoder units in the case of the S2SAE.
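The abstract states that all three systems take spectrograms of the input audio as their starting point. As an illustration of that input step only, the following is a minimal sketch of a log-magnitude spectrogram computed with the Python standard library. The frame length, hop size, Hann window, decibel scaling, and the synthetic test tone are assumptions made for this example and are not taken from the chapter; a naive DFT is used purely to keep the sketch self-contained.

```python
import cmath
import math


def log_spectrogram(samples, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames; each frame is a list of log-magnitude
    values (in dB) for frequency bins 0 .. frame_len // 2."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT for bin k (O(N^2); an FFT would be used in practice).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # eps avoids log10(0) for empty bins.
            spectrum.append(20 * math.log10(abs(acc) + eps))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a pure tone that falls exactly on one DFT bin.
sample_rate = 8000   # assumed sampling rate in Hz
tone_hz = 1000       # 1000 / (8000 / 64) = bin 8
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
          for t in range(512)]

spec = log_spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

The resulting time-frequency matrix (`spec`, one row per frame) is the kind of image-like input from which a CNN, DCGAN discriminator, or sequence-to-sequence autoencoder could then extract activations as features; those models are not sketched here.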

Title: Intelligent Systems Reference Library
Publisher: Springer
Publication status: Published - 2020
Externally published: Yes


Name: Intelligent Systems Reference Library
ISSN (Print): 1868-4394
ISSN (Electronic): 1868-4408

