Abstract
Large-scale classification of social media content is a crucial technique for finding, studying, and analyzing misinformation in online social networks. Based on a manually labeled dataset of COVID-19-related conspiracy tweets, we train an NLP classifier and test methods for performing inference at scale using both GPUs and the Graphcore IPU AI accelerator. We apply our methods to a large dataset of about 2.5 billion tweets, demonstrating that large-scale inference is feasible on affordable research infrastructure. Furthermore, we find that the IPU, due to its tile-centric design, is especially suited for such inference tasks. As a result, we obtain the AICO dataset of around 18 million tweets related to COVID-19 conspiracy theories that were posted between January 2020 and November 2021, which we make available to other researchers interested in studying the topic further at https://huggingface.co/datasets/Jlangguth/AICO.