Abstract
This work presents the methodology for constructing a novel dataset of fact-checked news articles in Norwegian Bokmål, a language
with relatively limited publicly available resources for natural language processing. To the best of our knowledge, this is the first dataset
of its kind that pairs the text of each news article with its veracity label. The source of the data is Faktisk.no, the only Norwegian
fact-checking organization. Each of their fact-checks is published with a detailed assessment of a claim, including a link to the original
article in which the claim first appeared, a verdict on a five-point scale (completely true, partially true, not sure, partially false,
and completely false), and a justification based on factual evidence. The dataset creation process involves several filtering steps. First,
all the links to the articles with the original claim were validated. Articles that had been deleted, often due to the claim being flagged as
false, were excluded. Non-textual content, such as video and audio, was identified using keywords in the link URLs and removed.
Articles that were behind hard paywalls were also removed. From the initial pool of 423 articles, approximately 200 valid instances were
retained. Each article was manually reviewed to ensure that the claim being assessed was still present in the current version of the source
article. A key challenge in compiling such datasets is that false claims are frequently deleted or edited after being fact-checked, resulting
in many articles being unusable. The final dataset includes, for each instance, the claim under evaluation, the corresponding article text,
its title, and its veracity label. This collection is intended to support future research on the language of fake news as well as mis- and
disinformation detection in low-resource languages.
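The filtering steps described above could be sketched roughly as follows. This is a minimal illustration only: the keyword list, record fields, and function names are assumptions for the sake of the example, not the authors' actual pipeline.

```python
# Hypothetical sketch of the candidate-filtering process: keep only links
# that are live, point to textual content, and are not hard-paywalled.
# All names and fields below are illustrative assumptions.

MEDIA_KEYWORDS = ("video", "audio", "podkast", "tv")  # assumed URL markers


def is_textual(url: str) -> bool:
    """Reject links whose URL suggests non-textual content (video/audio)."""
    return not any(kw in url.lower() for kw in MEDIA_KEYWORDS)


def filter_candidates(records):
    """Keep records that are live, textual, and not behind a hard paywall.

    Each record is a dict with assumed keys: 'url', 'alive' (the link
    still resolves), and 'paywalled' (behind a hard paywall).
    """
    return [
        r for r in records
        if r["alive"] and not r["paywalled"] and is_textual(r["url"])
    ]


candidates = [
    {"url": "https://example.no/artikkel/1", "alive": True, "paywalled": False},
    {"url": "https://example.no/video/2", "alive": True, "paywalled": False},
    {"url": "https://example.no/artikkel/3", "alive": False, "paywalled": False},
    {"url": "https://example.no/artikkel/4", "alive": True, "paywalled": True},
]
kept = filter_candidates(candidates)
print([r["url"] for r in kept])  # only the first article survives
```

In the real process, the "alive" check would involve resolving each URL and the manual review step (verifying the claim is still present in the article) would follow this automatic filtering.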