Bias in word embeddings

Orestis Papakyriakopoulos, Simon Hegelich, Juan Carlos Medina Serrano, Fabienne Marco

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

73 Scopus citations

Abstract

Word embeddings are a widely used set of natural language processing techniques that map words to vectors of real numbers. These vectors are used to improve the quality of generative and predictive models. Recent studies demonstrate that word embeddings contain and amplify biases present in data, such as stereotypes and prejudice. In this study, we provide a complete overview of bias in word embeddings. We develop a new technique for bias detection for gendered languages and use it to compare bias in embeddings trained on Wikipedia and on political social media data. We investigate bias diffusion and prove that existing biases are transferred to further machine learning models. We test two techniques for bias mitigation and show that the generally proposed methodology for debiasing models at the embeddings level is insufficient. Finally, we employ biased word embeddings and illustrate that they can be used for the detection of similar biases in new data. Given that word embeddings are widely used by commercial companies, we discuss the challenges and required actions towards fair algorithmic implementations and applications.

Original language: English
Title of host publication: FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
Publisher: Association for Computing Machinery, Inc
Pages: 446-457
Number of pages: 12
ISBN (Electronic): 9781450369367
State: Published - 27 Jan 2020
Event: 3rd ACM Conference on Fairness, Accountability, and Transparency, FAT* 2020 - Barcelona, Spain
Duration: 27 Jan 2020 to 30 Jan 2020

Publication series

Name: FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency

Conference

Conference: 3rd ACM Conference on Fairness, Accountability, and Transparency, FAT* 2020
Country/Territory: Spain
City: Barcelona
Period: 27/01/20 to 30/01/20

Keywords

  • Bias
  • Detection
  • Diffusion
  • Fairness
  • Homophobia
  • Mitigation
  • Racism
  • Sexism
  • Word embeddings
