SuiteEval: Simplifying Retrieval Benchmarks

Abstract Summary
Information retrieval evaluation often suffers from fragmented practices (varying dataset subsets, aggregation methods, and pipeline configurations) that undermine reproducibility and comparability, especially for foundation embedding models, which require robust out-of-domain performance. We introduce SuiteEval, a unified framework that offers automatic end-to-end evaluation, dynamic indexing that reuses on-disk indices to minimise disk usage, and built-in support for major benchmarks (BEIR, LoTTE, MS MARCO, NanoBEIR, and BRIGHT). Users need only supply a pipeline generator; SuiteEval handles data loading, indexing, ranking, metric computation, and result aggregation, and new benchmark suites can be added in a single line. By reducing boilerplate and standardising evaluations, SuiteEval facilitates reproducible IR research at a time when evaluation over a broader set of benchmarks is increasingly expected.
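As a rough illustration of the workflow the abstract describes, the sketch below shows how a pipeline-generator interface might look. Every name in it (the suiteeval module, the Suite class, the bm25 constructor) is an assumption made for illustration, not SuiteEval's documented API.

```python
# Purely illustrative sketch: the module, class, and method names below
# are assumptions for illustration, not SuiteEval's documented API.
from suiteeval import Suite  # hypothetical import


def pipeline_generator(index):
    # The user supplies only this: given an index that the framework has
    # built (or reused from disk), yield the retrieval pipeline(s) to
    # evaluate against it.
    yield index.bm25()  # hypothetical placeholder pipeline


# Hypothetically, a built-in benchmark suite is selected by name,
suite = Suite("NanoBEIR")

# and evaluation then runs end to end: data loading, indexing (reusing
# on-disk indices where possible), ranking, metric computation, and
# result aggregation across the suite's datasets.
results = suite(pipeline_generator)
print(results)
```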
Abstract ID: NKDR165
Submission Type: Demos

Senior Lecturer, University of Glasgow

Abstracts With Same Type (all Demos)

Abstract ID | Primary Author | Submission Topics
NKDR143 | Trung Vo | Applications; Machine Learning and Large Language Models; Recommender systems; Search and ranking
NKDR166 | Rodrigo Silva | Applications; Machine Learning and Large Language Models; Search and ranking; Societally-motivated IR research
NKDR168 | Rishiraj Saha Roy | (none listed)
NKDR156 | Quang Hieu Vu | Applications; Machine Learning and Large Language Models; Search and ranking; System aspects
NKDR159 | Rodrigo Duarte | Applications; Machine Learning and Large Language Models; Search and ranking
NKDR160 | Markos Dimitsas | Applications; Conversational search and recommender systems; Societally-motivated IR research