Bias in word embeddings

Orestis Papakyriakopoulos, Simon Hegelich, Juan Carlos Medina Serrano, Fabienne Marco

Publication: Contribution to book/report/conference proceedings; Conference contribution; Peer-reviewed

71 citations (Scopus)

Abstract

Word embeddings are a widely used set of natural language processing techniques that map words to vectors of real numbers. These vectors are used to improve the quality of generative and predictive models. Recent studies demonstrate that word embeddings contain and amplify biases present in data, such as stereotypes and prejudice. In this study, we provide a complete overview of bias in word embeddings. We develop a new technique for bias detection for gendered languages and use it to compare bias in embeddings trained on Wikipedia and on political social media data. We investigate bias diffusion and prove that existing biases are transferred to further machine learning models. We test two techniques for bias mitigation and show that the generally proposed methodology for debiasing models at the embeddings level is insufficient. Finally, we employ biased word embeddings and illustrate that they can be used for the detection of similar biases in new data. Given that word embeddings are widely used by commercial companies, we discuss the challenges and required actions towards fair algorithmic implementations and applications.
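
The abstract does not spell out the detection technique itself. As a generic illustration of how bias in embeddings can be measured, here is a minimal sketch that scores words by their cosine similarity to a "he" minus "she" direction. It is a common textbook approach and not necessarily the method developed in this paper. The vectors in `toy_embeddings` and the helper `bias_score` are hypothetical and invented purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, made up for this example (assumption:
# real embeddings would come from a trained model and have far more dimensions).
toy_embeddings = {
    "he": [1.0, 0.0, 0.2],
    "she": [-1.0, 0.0, 0.2],
    "engineer": [0.6, 0.8, 0.1],
    "nurse": [-0.6, 0.8, 0.1],
    "table": [0.0, 0.1, 1.0],
}


def bias_score(word, emb, pos="he", neg="she"):
    """Hypothetical helper: similarity of `word` to the pos-minus-neg direction.

    Positive values lean towards `pos`, negative towards `neg`,
    values near zero indicate no measurable association.
    """
    direction = [a - b for a, b in zip(emb[pos], emb[neg])]
    return cosine(emb[word], direction)


scores = {w: bias_score(w, toy_embeddings) for w in ("engineer", "nurse", "table")}
for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

With real embeddings, the same score computed over a list of occupation or attribute words would give a rough picture of which terms lean towards either end of the chosen direction.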

Original language: English
Title: FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
Publisher: Association for Computing Machinery, Inc
Pages: 446-457
Number of pages: 12
ISBN (electronic): 9781450369367
Publication status: Published - 27 Jan. 2020
Event: 3rd ACM Conference on Fairness, Accountability, and Transparency, FAT* 2020 - Barcelona, Spain
Duration: 27 Jan. 2020 to 30 Jan. 2020

Publication series

Name: FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency

Conference

Conference: 3rd ACM Conference on Fairness, Accountability, and Transparency, FAT* 2020
Country/Territory: Spain
City: Barcelona
Period: 27/01/20 to 30/01/20
