NLP-based feature extraction for the detection of COVID-19 misinformation videos on YouTube

Juan Carlos Medina Serrano, Orestis Papakyriakopoulos, Simon Hegelich

Research output: Contribution to journal › Conference article › peer-review

40 Scopus citations

Abstract

We present a simple NLP methodology for detecting COVID-19 misinformation videos on YouTube by leveraging user comments. We use pre-trained transfer-learning models to build a multi-label classifier that can categorize conspiratorial content. We then use the percentage of misinformation comments on each video as a new feature for video classification. We show that including this feature in simple models yields an accuracy of up to 82.2%. Furthermore, we verify the significance of the feature by performing a Bayesian analysis. Finally, we show that adding the first hundred comments as tf-idf features raises the video classifier's accuracy to as much as 89.4%.
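
The two comment-derived features described in the abstract can be illustrated with a minimal Python sketch. It assumes that each comment has already been labelled as misinformation or not by some upstream classifier, which is not shown here. The function names `misinformation_share` and `tfidf_first_comments` are hypothetical, and the unsmoothed tf-idf weighting is an illustrative choice, not necessarily the variant the authors used.

```python
import math
from collections import Counter


def misinformation_share(comment_labels):
    """Percentage (0-100) of a video's comments flagged as misinformation.

    `comment_labels` is a list of booleans, assumed to come from an
    upstream comment classifier (hypothetical, not shown here).
    """
    if not comment_labels:
        return 0.0
    flagged = sum(1 for label in comment_labels if label)
    return 100.0 * flagged / len(comment_labels)


def tfidf_first_comments(videos, max_comments=100):
    """Return one {term: tf-idf weight} dict per video.

    Each video is a list of comment strings; only the first
    `max_comments` comments are used. Plain unsmoothed tf-idf with
    whitespace tokenisation is an assumption made for illustration.
    """
    docs = [" ".join(comments[:max_comments]).lower().split()
            for comments in videos]
    n_docs = len(docs)
    # Document frequency: number of videos whose comments contain the term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy usage with made-up data: two videos, two comments each.
share = misinformation_share([True, False, False, True])
vectors = tfidf_first_comments([
    ["5g causes virus", "wake up"],
    ["great video", "virus info"],
])
```

In a setup like this, the share would be appended to each video's feature vector alongside the tf-idf weights before training a simple classifier. Terms that occur in every video, such as `virus` in the toy data, receive a weight of zero under this unsmoothed weighting.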
