TY - GEN
T1 - Perception of paralinguistic traits in synthesized voices
AU - Baird, Alice
AU - Jørgensen, Stina Hasse
AU - Parada-Cabaleiro, Emilia
AU - Hantke, Simone
AU - Cummins, Nicholas
AU - Schuller, Björn
N1 - Publisher Copyright: © 2017 Copyright held by the owner/author(s).
PY - 2017/8/23
Y1 - 2017/8/23
N2 - Along with the rise of artificial intelligence and the Internet of Things, synthesized voices are now common in daily life, providing us with guidance, assistance, and even companionship. From formant to concatenative synthesis, the synthesized voice continues to be defined by the same traits we ascribe to ourselves. When the recorded voice is synthesized, does our perception of its new machine embodiment change, and can we consider an alternative, more inclusive form? To begin evaluating the impact of aesthetic design, this study presents a first-step perception test to explore the paralinguistic traits of the synthesized voice. Using a corpus of 13 synthesized voices, constructed from acoustic concatenative speech synthesis, we assessed the responses of 23 listeners from differing cultural backgrounds. To evaluate whether perception shifts from the defined traits, we asked listeners to assign traits of age, gender, accent origin, and human-likeness. Results show a difference in perception for age and human-likeness across voices, and general agreement across listeners for both gender and accent origin. Connections found between age, gender, and human-likeness call for further exploration into a more participatory and inclusive synthesized vocal identity.
AB - Along with the rise of artificial intelligence and the Internet of Things, synthesized voices are now common in daily life, providing us with guidance, assistance, and even companionship. From formant to concatenative synthesis, the synthesized voice continues to be defined by the same traits we ascribe to ourselves. When the recorded voice is synthesized, does our perception of its new machine embodiment change, and can we consider an alternative, more inclusive form? To begin evaluating the impact of aesthetic design, this study presents a first-step perception test to explore the paralinguistic traits of the synthesized voice. Using a corpus of 13 synthesized voices, constructed from acoustic concatenative speech synthesis, we assessed the responses of 23 listeners from differing cultural backgrounds. To evaluate whether perception shifts from the defined traits, we asked listeners to assign traits of age, gender, accent origin, and human-likeness. Results show a difference in perception for age and human-likeness across voices, and general agreement across listeners for both gender and accent origin. Connections found between age, gender, and human-likeness call for further exploration into a more participatory and inclusive synthesized vocal identity.
KW - Human-Machine Interaction
KW - Humanisation of Synthesis
KW - Paralinguistic Traits
KW - Personification Debate
KW - Synthesized Voice
UR - http://www.scopus.com/inward/record.url?scp=85038367773&partnerID=8YFLogxK
U2 - 10.1145/3123514.3123528
DO - 10.1145/3123514.3123528
M3 - Conference contribution
AN - SCOPUS:85038367773
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 12th International Audio Mostly Conference
PB - Association for Computing Machinery
T2 - 12th International Audio Mostly Conference, AM 2017
Y2 - 23 August 2017 through 26 August 2017
ER -