Monitoring Digital Wildfires: a Large-Scale Dataset of COVID-19 Conspiracy Tweets Created via Fast NLP Inference using the Graphcore IPU

Abstract

Large-scale classification of social media content is a crucial technique for finding, studying, and analyzing misinformation in online social networks. Based on a manually labeled dataset of COVID-19-related conspiracy tweets, we train an NLP classifier and test methods for performing inference at scale using both GPUs and the Graphcore IPU AI accelerator. We apply our methods to a large dataset of about 2.5 billion tweets, demonstrating that large-scale inference is possible on affordable research infrastructure. Furthermore, we find that the IPU, due to its tile-centric design, is especially suited for such inference tasks. As a result, we obtain the AICO dataset of around 18 million tweets related to COVID-19 conspiracy theories that were posted between January 2020 and November 2021. We make it available to other researchers interested in studying the topic further at https://huggingface.co/datasets/Jlangguth/AICO.

Category

Academic article

Language

Other

Author(s)

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • University of Bergen
  • University of Inland Norway
  • Simula Research Laboratory

Date

13.08.2025

Year

2025

Published in

IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW)

Page(s)

241 - 250

View this publication at Norwegian Research Information Repository