Snore sound classification using image-based deep spectrum features

Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn Schuller

Research output: Contribution to journal › Conference article › peer-review

209 Scopus citations


In this paper, we propose a method for automatically detecting various types of snore sounds using image classification convolutional neural network (CNN) descriptors extracted from audio file spectrograms. The descriptors, denoted as deep spectrum features, are derived from forwarding spectrograms through very deep task-independent pre-trained CNNs. Specifically, activations of fully connected layers from two common image classification CNNs, AlexNet and VGG19, are used as feature vectors. Moreover, we investigate the impact of differing spectrogram colour maps and two CNN architectures on the performance of the system. Results presented indicate that deep spectrum features extracted from the activations of the second fully connected layer of AlexNet using a viridis colour map are well suited to the task. This feature space, when combined with a support vector classifier, outperforms the more conventional knowledge-based features of 6 373 acoustic functionals used in the INTERSPEECH ComParE 2017 Snoring sub-challenge baseline system. In comparison to the baseline, unweighted average recall is increased from 40.6 % to 44.8 % on the development partition, and from 58.5 % to 67.0 % on the test partition.
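
The abstract reports its results as unweighted average recall (UAR), which is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `unweighted_average_recall` and the toy labels "A" and "B" are assumptions made for illustration only.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of each class, then a plain (unweighted) mean across classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 8 samples of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the recall of "A" is 8/8 and the recall of "B" is 1/2, so the UAR is 0.75 while plain accuracy is 0.9. The gap shows why a class-balanced metric is informative when class sizes differ.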

Original language: English
Pages (from-to): 3512-3516
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2017
Externally published: Yes
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017


  • Computational paralinguistics
  • Convolutional neural networks
  • Deep learning
  • Snore sound
  • Spectral features

