Monitoring Digital Wildfires: a Large-Scale Dataset of COVID-19 Conspiracy Tweets Created via Fast NLP Inference using the Graphcore IPU

Abstract

Large-scale classification of social media content is a crucial technique for finding, studying, and analyzing misinformation in online social networks. Based on a manually labeled dataset of COVID-19-related conspiracy tweets, we train an NLP classifier and test methods for performing inference at scale using both GPUs and the Graphcore IPU AI accelerator. We apply our methods to a large dataset of about 2.5 billion tweets, demonstrating that large-scale inference is possible on affordable research infrastructure. Furthermore, we find that the IPU, due to its tile-centric design, is especially suited for such inference tasks. As a result, we obtain the AICO dataset of around 18 million tweets related to COVID-19 conspiracy theories that were posted between January 2020 and November 2021, which we make available at https://huggingface.co/datasets/Jlangguth/AICO for other researchers interested in studying the topic further.

Language

English

Author(s)

Rohullah Akbari
Daniel Thilo Schroeder
Petra Filkukova
Johannes Langguth

Affiliation

SINTEF Digital / Sustainable Communication Technologies
BI Norwegian Business School
Simula Research Laboratory

Date

13.08.2025

Year

2025

Publisher

IEEE (Institute of Electrical and Electronics Engineers)

Book

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 3–7 June 2025, Milan, Italy

ISBN

9798331526436

Page(s)

241–250

DOI

https://doi.org/10.1109/ipdpsw66978.2025.00047
