Monitoring Digital Wildfires: a Large-Scale Dataset of COVID-19 Conspiracy Tweets Created via Fast NLP Inference using the Graphcore IPU

Abstract

Large-scale classification of social media content is a crucial technique for finding, studying, and analyzing misinformation in online social networks. Based on a manually labeled dataset of COVID-19 related conspiracy tweets, we train an NLP classifier and test methods for performing inference at scale using both GPUs and the Graphcore IPU AI accelerator. We apply our methods to a large dataset of about 2.5 billion tweets, demonstrating that large-scale inference is possible on affordable research infrastructure. Furthermore, we find that the IPU, due to its tile-centric design, is especially suited for such inference tasks. As a result, we obtain the AICO dataset of around 18 million tweets related to COVID-19 conspiracy theories that were posted between January 2020 and November 2021. We make it available to other researchers interested in studying the topic further at https://huggingface.co/datasets/Jlangguth/AICO.

Category

Academic chapter

Language

English

Author(s)

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • BI Norwegian Business School
  • Simula Research Laboratory

Date

13.08.2025

Year

2025

Publisher

IEEE (Institute of Electrical and Electronics Engineers)

Book

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 3–7 June 2025, Milan, Italy

ISBN

9798331526436

Page(s)

241–250