ClaimPT: A Portuguese Dataset of Annotated Claims in News Articles

This abstract has open access
Abstract Summary
Fact-checking remains a demanding and time-consuming task, still largely dependent on manual verification and unable to match the rapid spread of misinformation online. This gap matters because corrections typically take longer to reach consumers than the original misinformation does, so accelerating them through automation can help combat misinformation more effectively. Although many organizations perform manual fact-checking, this approach is difficult to scale given the growing volume of digital content. These limitations have motivated interest in automating fact-checking, where identifying claim sentences is a crucial first step. However, progress has been uneven across languages, with English dominating due to abundant annotated data. European Portuguese, like other low-resource languages, still lacks accessible and licensed datasets, limiting both research and the development of Natural Language Processing (NLP) tools. In this paper, we introduce ClaimPT, a new dataset of annotated claims from European Portuguese news articles, comprising 1308 articles and 6875 individual annotations. Unlike most existing resources, which are based on social media or parliamentary transcripts, ClaimPT focuses on journalistic content, collected through a partnership with LUSA, the Portuguese News Agency. To ensure high-quality annotations, each article was manually annotated by two trained annotators and validated by a curator, following a newly proposed annotation scheme. We also provide baseline models for claim detection, establishing initial performance benchmarks and enabling future applications of NLP and Information Retrieval (IR) techniques. By releasing ClaimPT, we aim to advance research on low-resource fact-checking and enhance understanding of misinformation in news media.
Abstract ID:
NKDR131