TY - CPAPER
T1 - Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
AU - Hutiri, Wiebke
AU - Papakyriakopoulos, Orestis
AU - Xiang, Alice
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/6/3
Y1 - 2024/6/3
AB - The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States, where anonymous perpetrators create synthetic voices that call police officers to close down schools and hospitals, or to violently gain access to innocent citizens' homes. Incidents like this demonstrate that multimodal generative AI risks and harms do not exist in isolation, but arise from the interactions of multiple stakeholders and technical AI systems. In this paper we analyse speech generation incidents to study how patterns of specific harms arise. We find that specific harms can be categorised according to the exposure of affected individuals, that is to say whether they are a subject of, interact with, suffer due to, or are excluded from speech generation systems. Similarly, specific harms are also a consequence of the motives of the creators and deployers of the systems. Based on these insights we propose a conceptual framework for modelling pathways to ethical and safety harms of AI, which we use to develop a taxonomy of harms of speech generators. Our relational approach captures the complexity of risks and harms in sociotechnical AI systems, and yields a taxonomy that can support appropriate policy interventions and decision making for the responsible development and release of speech generation models.
KW - Deepfakes
KW - Generative AI
KW - Harms
KW - Multimodal
KW - Speech Generation
KW - Speech Synthesis
KW - Taxonomy
KW - Voice Cloning
UR - http://www.scopus.com/inward/record.url?scp=85196664674&partnerID=8YFLogxK
U2 - 10.1145/3630106.3658911
DO - 10.1145/3630106.3658911
M3 - Conference contribution
AN - SCOPUS:85196664674
T3 - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
SP - 359
EP - 376
BT - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
PB - Association for Computing Machinery, Inc
T2 - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Y2 - 3 June 2024 through 6 June 2024
ER -