TY - JOUR
T1 - Recognition of echolalic autistic child vocalisations utilising convolutional recurrent neural networks
AU - Amiriparian, Shahin
AU - Baird, Alice
AU - Julka, Sahib
AU - Alcorn, Alyssa
AU - Ottl, Sandra
AU - Petrović, Sunčica
AU - Ainger, Eloise
AU - Cummins, Nicholas
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2018 International Speech Communication Association. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Autism spectrum conditions (ASC) are a set of neuro-developmental conditions partly characterised by difficulties with communication. Individuals with ASC can show a variety of atypical speech behaviours, including echolalia, the 'echoing' of another's speech. We herein introduce a new dataset of 15 Serbian ASC children in a human-robot interaction scenario, annotated for the presence of echolalia amongst other ASC vocal behaviours. From this, we propose a four-class classification problem and investigate the suitability of applying a 2D convolutional neural network augmented with a recurrent neural network with bidirectional long short-term memory cells to solve the proposed task of echolalia recognition. In this approach, log Mel-spectrograms are first generated from the audio recordings and then fed as input into the convolutional layers to extract high-level spectral features. The subsequent recurrent layers are applied to learn the long-term temporal context from the obtained features. Finally, we use a feed-forward neural network with softmax activation to classify the dataset. To evaluate the performance of our deep learning approach, we use leave-one-subject-out cross-validation. Key results indicate the suitability of our approach, which achieves an unweighted average recall of 83.5%.
KW - Autism spectrum conditions
KW - Convolutional recurrent neural network
KW - Echolalia
KW - Vocal abnormalities
UR - http://www.scopus.com/inward/record.url?scp=85055002994&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2018-1772
DO - 10.21437/Interspeech.2018-1772
M3 - Conference article
AN - SCOPUS:85055002994
SN - 2308-457X
VL - 2018-September
SP - 2334
EP - 2338
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Y2 - 2 September 2018 through 6 September 2018
ER -
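The abstract above names its evaluation protocol (leave-one-subject-out cross-validation) and its metric (unweighted average recall). As a minimal sketch of those two ideas, the following illustrative Python is not from the paper; the subject IDs and labels are hypothetical placeholders:

```python
# Sketch of leave-one-subject-out (LOSO) cross-validation splitting and
# unweighted average recall (UAR), the protocol and metric named in the
# abstract. All data below is illustrative, not from the paper's dataset.
from collections import defaultdict

def loso_splits(subject_ids):
    """Yield (train_indices, test_indices), holding out one subject at a time."""
    by_subject = defaultdict(list)
    for idx, subj in enumerate(subject_ids):
        by_subject[subj].append(idx)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [i for s, idxs in by_subject.items()
                 if s != held_out for i in idxs]
        yield train, test

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Toy usage: five samples from three hypothetical subjects
subjects = ["s1", "s1", "s2", "s3", "s3"]
for train, test in loso_splits(subjects):
    print(train, test)
```

UAR is the standard metric in computational paralinguistics for imbalanced class distributions, since averaging recalls per class prevents a majority class from dominating the score.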