TY - GEN
T1 - Assessing the prosody of non-native speakers of English
T2 - 10th International Conference on Language Resources and Evaluation, LREC 2016
AU - Coutinho, Eduardo
AU - Hönig, Florian
AU - Zhang, Yue
AU - Hantke, Simone
AU - Batliner, Anton
AU - Nöth, Elmar
AU - Schuller, Björn
PY - 2016
Y1 - 2016
N2 - In this paper, we describe a new database with audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers for assessing the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups modelling prosody in general (without speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English as L2 can be reliably assessed using supra-segmental audio features; prosodic features seem to be the most important ones.
KW - Feature evaluation
KW - Non-native speech
KW - Prosody
UR - http://www.scopus.com/inward/record.url?scp=85016645446&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85016645446
T3 - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
SP - 1328
EP - 1332
BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Grobelnik, Marko
A2 - Odijk, Jan
A2 - Piperidis, Stelios
A2 - Maegaard, Bente
A2 - Mariani, Joseph
PB - European Language Resources Association (ELRA)
Y2 - 23 May 2016 through 28 May 2016
ER -