TY - GEN
T1 - German abusive language dataset with focus on COVID-19
AU - Wich, Maximilian
AU - Räther, Svenja
AU - Groh, Georg
N1 - Publisher Copyright: © 2021 KONVENS 2021 - Proceedings of the 17th Conference on Natural Language Processing. All Rights Reserved.
PY - 2021
Y1 - 2021
AB - The COVID-19 pandemic has had a significant impact on human lives globally. As a result, it is unsurprising that it has influenced hate speech and other forms of abusive language on social media. Machine learning models have been designed to automatically detect such posts and messages, but they require a significant amount of labeled data. Despite the relevance of the COVID-19 topic in the field of abusive language detection, no annotated datasets with this focus are available. To close this gap, we create such a dataset. Our contributions are as follows: (1) a methodology for collecting Twitter data containing a substantial amount of abusive and hateful content, and (2) a German abusive language dataset with 4,960 annotated tweets centered on COVID-19. Both the methodology and the dataset are intended to aid researchers in improving abusive language detection.
UR - http://www.scopus.com/inward/record.url?scp=85115883620&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85115883620
T3 - KONVENS 2021 - Proceedings of the 17th Conference on Natural Language Processing
SP - 247
EP - 252
BT - KONVENS 2021 - Proceedings of the 17th Conference on Natural Language Processing
PB - KONVENS
T2 - 17th Conference on Natural Language Processing, KONVENS 2021
Y2 - 6 September 2021 through 9 September 2021
ER -