Reducing Human Effort to Validate LLM Relevance Judgements via Stratified Sampling

Abstract Summary
Information Retrieval (IR) evaluation relies heavily on human relevance judgments. To overcome the high cost of collecting such judgments, a potential solution is to use LLMs as judges in place of human annotators. However, validating the LLM-generated judgments is essential for their informed use. Standard validation approaches typically draw a sample of the LLM-generated judgments using simple sampling techniques and estimate the LLM's agreement with humans on that sample. In this work, we propose stratified sampling, a more sophisticated strategy that, by leveraging appropriate stratification features, reduces human involvement in the validation process while still providing statistical guarantees on the human-LLM agreement estimate. Through an analysis of several candidate features, we identify the LLM-generated judgments themselves as the most promising one. Our approach achieves up to an 85% reduction in the human involvement required for validation.
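The core idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `human_oracle` callable (modeling the costly human annotation step), and the proportional allocation of the budget across strata are all illustrative assumptions; the abstract only specifies that strata are built from the LLM-generated judgments themselves.

```python
import random
from collections import defaultdict

def stratified_agreement_estimate(llm_labels, human_oracle, budget, seed=0):
    """Estimate human-LLM agreement via stratified sampling.

    Strata are the LLM-generated labels themselves (the stratification
    feature the abstract identifies as most promising). `human_oracle(i)`
    returns the human judgment for item i; calling it is the expensive
    step the sampling budget limits. Returns (estimate, standard error).
    """
    rng = random.Random(seed)

    # Group item indices by their LLM-assigned label (the strata).
    strata = defaultdict(list)
    for i, label in enumerate(llm_labels):
        strata[label].append(i)

    N = len(llm_labels)
    estimate, variance = 0.0, 0.0
    for label, items in strata.items():
        # Proportional allocation of the human-annotation budget
        # (an assumption; Neyman allocation is another common choice).
        n_h = min(len(items), max(1, round(budget * len(items) / N)))
        sample = rng.sample(items, n_h)

        # Within-stratum agreement rate between human and LLM labels.
        agree = [1.0 if human_oracle(i) == llm_labels[i] else 0.0
                 for i in sample]
        p_h = sum(agree) / len(agree)

        w_h = len(items) / N  # stratum weight
        estimate += w_h * p_h

        # Within-stratum variance contribution (finite-population
        # correction omitted for brevity).
        if len(agree) > 1:
            s2 = sum((a - p_h) ** 2 for a in agree) / (len(agree) - 1)
            variance += (w_h ** 2) * s2 / len(agree)

    return estimate, variance ** 0.5
```

The gain over simple random sampling comes from the within-stratum homogeneity: when agreement rates differ across LLM labels, the stratified estimator attains the same confidence-interval width with fewer human annotations.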
Abstract ID: NKDR45
Authors
Ph.D. Student, University of Padova
Full Professor, University of Padova

Abstracts With Same Type

Abstract ID | Abstract Title | Abstract Topic | Submission Type | Primary Author
NKDR52 | | Search and ranking | Full papers | Emmanouil Georgios Lionis
NKDR51 | | Search and ranking; Societally-motivated IR research | Full papers | Martim Baltazar
NKDR15 | | Applications; Machine Learning and Large Language Models | Full papers | Saeedeh Javadi
NKDR49 | | Societally-motivated IR research; User aspects in IR | Full papers | Niall McGuire
NKDR177 | | Applications; Search and ranking | Full papers | Danyang Hou
NKDR184 | | Applications; Evaluation research | Full papers | Danyang Hou
NKDR193 | | Applications; Search and ranking | Full papers | Danyang Hou
NKDR39 | | Applications; Machine Learning and Large Language Models | Full papers | Sarmistha Das