Investigating Annotator Bias in Abusive Language Datasets

Maximilian Wich, Christian Widmer, Gerhard Hagerer, Georg Groh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Nowadays, social media platforms use classification models to cope with hate speech and abusive language. The problem of these models is their vulnerability to bias. A prevalent form of bias in hate speech and abusive language datasets is annotator bias, caused by the annotators' subjective perception and the complexity of the annotation task. In our paper, we develop a set of methods to measure annotator bias in abusive language datasets and to identify different perspectives on abusive language. We apply these methods to four different abusive language datasets. Our proposed approach supports annotation processes of such datasets and future research addressing different perspectives on the perception of abusive language.
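The abstract does not spell out the measurement methods themselves. As a minimal sketch of the general idea, assuming a dataset where several annotators label the same comments, one common proxy for annotator bias is pairwise inter-annotator agreement such as Cohen's kappa; the toy data and the agreement-based heuristic below are illustrative assumptions, not the authors' actual approach.

# Illustrative sketch only: pairwise Cohen's kappa as a generic proxy for
# annotator bias; the data and the heuristic are hypothetical, not the
# method proposed in the paper.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical toy data: rows = comments, columns = annotators
# (1 = abusive, 0 = not abusive).
labels = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
])

n_annotators = labels.shape[1]

# Pairwise agreement between annotators.
for a, b in combinations(range(n_annotators), 2):
    kappa = cohen_kappa_score(labels[:, a], labels[:, b])
    print(f"kappa(annotator {a}, annotator {b}) = {kappa:.2f}")

# An annotator whose mean agreement with all others is unusually low may
# reflect a systematically different perception of abusive language.
mean_kappa = [
    np.mean([cohen_kappa_score(labels[:, a], labels[:, b])
             for b in range(n_annotators) if b != a])
    for a in range(n_annotators)
]
print("mean kappa per annotator:", np.round(mean_kappa, 2))

In practice, such agreement scores only flag candidate annotators for inspection; distinguishing genuine bias from legitimate differences in perspective requires the kind of dedicated analysis the paper addresses.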

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP 2021
Subtitle of host publication: Deep Learning for Natural Language Processing Methods and Applications - Proceedings
Editors: Galia Angelova, Maria Kunilovskaya, Ruslan Mitkov, Ivelina Nikolova-Koleva
Publisher: Incoma Ltd
Pages: 1515-1525
Number of pages: 11
ISBN (Electronic): 9789544520724
DOIs
State: Published - 2021
Event: International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021 - Virtual, Online
Duration: 1 Sep 2021 – 3 Sep 2021

Publication series

Name: International Conference Recent Advances in Natural Language Processing, RANLP
ISSN (Print): 1313-8502

Conference

Conference: International Conference on Recent Advances in Natural Language Processing: Deep Learning for Natural Language Processing Methods and Applications, RANLP 2021
City: Virtual, Online
Period: 1/09/21 – 3/09/21
