SMILENets: Audio Representation Learning via Neural Knowledge Distillation of Traditional Audio-Feature Extractors

Mina A. Nessiem, Mostafa M. Amin, Björn W. Schuller

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Considerable work has been invested over the years into crafting features for different modalities to facilitate the modelling of information. This work, and the knowledge it is based on, has thus far been unused as a prior in Deep Neural Networks (DNNs) trained to learn representations of different modalities. The representations that DNNs have extracted thus far are based only on the data at hand, which may not yield the best possible representations, or they depend on pre-training the DNNs using data with a combined length of thousands to tens of thousands of hours. In this paper, we introduce SMILENets, DNNs that are trained as students to distil the feature knowledge of audio features, such as the ComParE challenge features, as extracted by the teacher openSMILE toolkit. We explore SMILENets in terms of their ability to extract said audio features for different datasets under in- and out-of-training-distribution conditions, and in terms of their utility as neural feature extractors for neural networks that would model datasets under the same validation conditions. We show that SMILENets learn to distil, and thus extract, features accurately for in-distribution conditions, while exhibiting close results for out-of-distribution scenarios where the SMILENet has been trained on a representative training dataset.
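
The teacher-student setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the teacher below is a hypothetical stand-in that computes two simple hand-crafted descriptors (mean energy and mean absolute first difference) rather than the openSMILE ComParE features, and the student is a tiny one-hidden-layer network, chosen only to keep the example self-contained. All names (`teacher_features`, `StudentNet`, `run_demo`) and hyperparameters are assumptions made for illustration.

```python
import math
import random


def teacher_features(frame):
    """Hypothetical stand-in teacher (NOT openSMILE): two hand-crafted features."""
    energy = sum(x * x for x in frame) / len(frame)
    delta = sum(abs(b - a) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)
    return [energy, delta]


class StudentNet:
    """Tiny one-hidden-layer regressor trained to mimic the teacher's outputs."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Gradient of 0.5 * squared error with respect to each output.
        err = [p - t for p, t in zip(y, target)]
        # Backpropagate through tanh before the output weights are updated.
        grad_h = [sum(err[k] * self.w2[k][j] for k in range(len(err))) * (1.0 - h[j] ** 2)
                  for j in range(len(h))]
        for k, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * e * hj
            self.b2[k] -= lr * e
        for j, g in enumerate(grad_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def distillation_loss(student, frames):
    """Mean squared error between student outputs and teacher features."""
    total = 0.0
    for frame in frames:
        _, y = student.forward(frame)
        t = teacher_features(frame)
        total += sum((p - q) ** 2 for p, q in zip(y, t)) / len(t)
    return total / len(frames)


def run_demo(seed=0, n_frames=64, frame_len=8, epochs=100, lr=0.02):
    rng = random.Random(seed)
    # Synthetic "audio" frames; real experiments would use recorded speech.
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(frame_len)] for _ in range(n_frames)]
    student = StudentNet(frame_len, 16, 2, rng)
    before = distillation_loss(student, frames)
    for _ in range(epochs):
        for frame in frames:
            student.train_step(frame, teacher_features(frame), lr)
    after = distillation_loss(student, frames)
    return before, after


before, after = run_demo()
print(f"distillation MSE before training: {before:.4f}, after: {after:.4f}")
```

The design point the sketch tries to convey is that the student never sees task labels: its only supervision is the teacher's feature vector, so the training loss is a regression error against the extractor's output. Evaluating the same loss on frames drawn from a different distribution would correspond, loosely, to the out-of-distribution condition mentioned in the abstract.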

Original language: English
Title of host publication: 2023 8th International Conference on Frontiers of Signal Processing, ICFSP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 32-37
Number of pages: 6
ISBN (Electronic): 9798350308792
DOIs
State: Published - 2023
Externally published: Yes
Event: 8th International Conference on Frontiers of Signal Processing, ICFSP 2023 - Hybrid, Corfu, Greece
Duration: 23 Oct 2023 to 25 Oct 2023

Publication series

Name: 2023 8th International Conference on Frontiers of Signal Processing, ICFSP 2023

Conference

Conference: 8th International Conference on Frontiers of Signal Processing, ICFSP 2023
Country/Territory: Greece
City: Hybrid, Corfu
Period: 23/10/23 to 25/10/23

Keywords

  • audio feature extraction
  • computational paralinguistics
  • knowledge distillation
  • representation learning
  • supervised learning
