TY - GEN
T1 - SMILENets
T2 - 8th International Conference on Frontiers of Signal Processing, ICFSP 2023
AU - Nessiem, Mina A.
AU - Amin, Mostafa M.
AU - Schuller, Björn W.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Considerable work has been invested over the years into crafting features for different modalities to facilitate the modelling of information. This work, and the knowledge it is based on, has thus far gone unused as a prior in Deep Neural Networks (DNNs) trained to learn representations of different modalities. The representations that DNNs have extracted thus far are based only on the data at hand, and may therefore not be the best representations possible, or may depend on pre-training the DNNs on data with a combined length of thousands to tens of thousands of hours. In this paper, we introduce SMILENets, DNNs trained as students to distil the knowledge of audio features, such as the ComParE challenge features, as extracted by the openSMILE toolkit acting as the teacher. We evaluate SMILENets on their ability to extract said audio features for different datasets under in-distribution and out-of-distribution conditions, and on their utility as neural feature extractors for neural networks modelling datasets under the same validation conditions. We show that SMILENets learn to distil, and thus extract, features accurately under in-distribution conditions, while exhibiting close results in out-of-distribution scenarios when the SMILENet has been trained on a representative training dataset.
AB - Considerable work has been invested over the years into crafting features for different modalities to facilitate the modelling of information. This work, and the knowledge it is based on, has thus far gone unused as a prior in Deep Neural Networks (DNNs) trained to learn representations of different modalities. The representations that DNNs have extracted thus far are based only on the data at hand, and may therefore not be the best representations possible, or may depend on pre-training the DNNs on data with a combined length of thousands to tens of thousands of hours. In this paper, we introduce SMILENets, DNNs trained as students to distil the knowledge of audio features, such as the ComParE challenge features, as extracted by the openSMILE toolkit acting as the teacher. We evaluate SMILENets on their ability to extract said audio features for different datasets under in-distribution and out-of-distribution conditions, and on their utility as neural feature extractors for neural networks modelling datasets under the same validation conditions. We show that SMILENets learn to distil, and thus extract, features accurately under in-distribution conditions, while exhibiting close results in out-of-distribution scenarios when the SMILENet has been trained on a representative training dataset.
KW - audio feature extraction
KW - computational paralinguistics
KW - knowledge distillation
KW - representation learning
KW - supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85183472593&partnerID=8YFLogxK
U2 - 10.1109/ICFSP59764.2023.10372936
DO - 10.1109/ICFSP59764.2023.10372936
M3 - Conference contribution
AN - SCOPUS:85183472593
T3 - 2023 8th International Conference on Frontiers of Signal Processing, ICFSP 2023
SP - 32
EP - 37
BT - 2023 8th International Conference on Frontiers of Signal Processing, ICFSP 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 October 2023 through 25 October 2023
ER -