
Resource III: Evaluation Tooling for Retrieval and RecSys

Session Information

  • CoRECT: A Framework for Evaluating Embedding Compression Techniques at Scale

    Laura Caspari, Michael Dinzinger, Kanishka Ghosh Dastidar, Christofer Fellicious, Jelena Mitrović and Michael Granitzer

  • GREAT: Group Recommender Evaluation and Analysis Tool

    Ariel Smith, David Contreras, Maria Salamo and Ludovico Boratto

  • Evaluating the Efficiency and Effectiveness of Learned Sparse Retrieval with the lsr_benchmark

    Maik Fröbe, Ferdinand Schlatt, Cosimo Rulli, Tim Hagen, Jan Heinrich Merker, Gijs Hendriksen, Carlos Lassance, Franco Maria Nardini, Rossano Venturini and Martin Potthast

  • An Open SERP Mining Infrastructure for the Archive Query Log

    Jan Heinrich Merker, Simon Ruth, Harrisen Scells and Martin Potthast

  • RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation

    Eugene Yang, Andrew Yates, Dawn Lawrie, James Mayfield and Trevor Adriaanse

Apr 01, 2026 14:30 - 16:00(Europe/Amsterdam)
Venue : Chemie



Sub Sessions

CoRECT: A Framework for Evaluating Embedding Compression Techniques at Scale

Resource | Evaluation research | Machine Learning and Large Language Models | Search and ranking | 02:30 PM - 04:00 PM (Europe/Amsterdam)
Dense retrieval systems have proven to be effective across various benchmarks, but require substantial memory to store large search indices. Recent advances in embedding compression show that index sizes can be greatly reduced with minimal loss in ranking quality. However, existing studies often overlook the role of corpus complexity -- a critical factor, as recent work shows that both corpus size and document length strongly affect dense retrieval performance. In this paper, we introduce CoRECT (Controlled Retrieval Evaluation of Compression Techniques), a framework for large-scale evaluation of embedding compression methods, supported by a newly curated dataset collection. To demonstrate its utility, we benchmark eight representative types of compression methods. Notably, we show that non-learned compression achieves substantial index size reduction, even on up to 100M passages, with statistically insignificant performance loss. However, selecting the optimal compression method remains challenging, as performance varies across models. Such variability highlights the necessity of CoRECT to enable consistent comparison and informed selection of compression methods. All code, data, and results are available on GitHub and HuggingFace.
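The non-learned compression the abstract refers to can be illustrated with a generic technique such as int8 scalar quantization; the sketch below is illustrative only and does not reflect CoRECT's actual API or the methods it benchmarks.

```python
# Illustrative sketch of non-learned embedding compression via int8
# scalar quantization (a generic technique, not CoRECT's actual API).
import numpy as np

def quantize(emb: np.ndarray):
    """Map float32 embeddings to int8, keeping min and step for decoding."""
    lo, hi = float(emb.min()), float(emb.max())
    scale = (hi - lo) / 255.0 or 1.0  # quantization step size
    q = np.round((emb - lo) / scale - 128.0).astype(np.int8)
    return q, lo, scale

def dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 embeddings."""
    return (q.astype(np.float32) + 128.0) * scale + lo

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8)).astype(np.float32)
q, lo, scale = quantize(emb)
recovered = dequantize(q, lo, scale)
# int8 storage is 4x smaller than float32; rounding error stays within
# one quantization step.
assert q.dtype == np.int8
assert np.abs(recovered - emb).max() <= scale
```

Methods like this need no training data, which is one reason index-size reductions of this kind can transfer across corpora; whether ranking quality survives is exactly what a controlled framework has to measure.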
Presenters
LC
Laura Caspari
University Of Passau
Co-Authors
MD
Michael Dinzinger
University Of Passau
KD
Kanishka Ghosh Dastidar
CF
Christofer Fellicious
JM
Jelena Mitrović
MG
Michael Granitzer
University Of Passau

GREAT: Group Recommender Evaluation and Analysis Tool

Resource | 02:30 PM - 04:00 PM (Europe/Amsterdam)
Previous research on group recommender systems (GRSs) has shown that group dynamics strongly influence decision-making, yet collaborative filtering (CF)-based GRSs rarely account for social interactions, largely because suitable analytical tools are lacking. This paper introduces a community resource for studying live groups as they engage with a CF-based recommender system through a domain-independent graphical interface that records interaction signals (such as suggestions, views, and favorites) and integrates them into the recommendation process. A live user study with 72 participants organized into 18 groups demonstrates the system's effectiveness in capturing and analyzing user interactions. Results show that incorporating interaction awareness enhances group satisfaction and reveals underlying social dynamics, offering new opportunities for adaptive GRSs responsive to real-time user behavior. The source code and dataset are available online.
Presenters
AS
Ariel Smith
Universidad Arturo Prat
Co-Authors
DC
David Contreras
Smart Society Research Group, La Salle-Universitat Ramon Llull
MS
Maria Salamo
Departament Of Mathematics And Computer Science, Universitat De Barcelona
LB
Ludovico Boratto
Associate Professor Of Computer Science, University Of Cagliari

Evaluating the Efficiency and Effectiveness of Learned Sparse Retrieval with the lsr_benchmark

Resource | 02:30 PM - 04:00 PM (Europe/Amsterdam)
Different learned sparse retrieval (LSR) models offer different trade-offs between effectiveness and efficiency. However, while there are standardized and interoperable tools to assess LSR effectiveness, there is no agreed-upon methodology for evaluating efficiency, and datasets with high-quality relevance judgments are too large for repeated efficiency experiments, e.g., across different hardware. To promote the evaluation of LSR models for effectiveness and efficiency, we introduce the lsr_benchmark, which measures retrieval effectiveness and efficiency of each step in an LSR pipeline (document embedding, indexing, query embedding, and retrieval). To ensure tractability and extensibility, we apply current corpus subsampling methods to eleven TREC tasks, precompute embeddings with eleven LSR models per task, and provide eight retrieval systems as baselines. For the benchmark's hosted version, a modular API and tools for evaluating effectiveness and efficiency make submitting new approaches easy. Our experiments show that the chosen embedding model significantly affects the efficiency of a retrieval system and that LSR is more effective but less efficient than BM25, an efficiency gap our benchmark helps to track as new LSR models are published.
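The four pipeline steps the benchmark times can be sketched with toy term-weight dictionaries standing in for a learned model's sparse vectors; this is a conceptual illustration, not the lsr_benchmark's own code.

```python
# Toy sketch of the four LSR pipeline steps the benchmark measures:
# document embedding, indexing, query embedding, and retrieval.
from collections import defaultdict

# "Document embedding": in a real LSR model these term weights would be
# produced by a neural encoder; here they are hand-picked toy values.
docs = {"d1": {"sparse": 2.0, "retrieval": 1.5},
        "d2": {"dense": 1.8, "retrieval": 0.5}}

# Indexing: build an inverted index over the sparse document vectors.
index = defaultdict(list)
for doc_id, vec in docs.items():
    for term, w in vec.items():
        index[term].append((doc_id, w))

def retrieve(query_vec, top_k=10):
    """Score documents by sparse dot product and return the best top_k."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index[term]:
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda x: -x[1])[:top_k]

# Query embedding would come from the same learned model; toy vector here.
print(retrieve({"sparse": 1.0, "retrieval": 1.0}))
# d1 scores 2.0 + 1.5 = 3.5, d2 scores 0.5
```

Timing each of these stages separately is what lets a benchmark attribute an efficiency gap to the encoder, the index, or the scoring loop rather than to the pipeline as a whole.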
Presenters
MF
Maik Fröbe
PhD Student, Friedrich-Schiller-Universität Jena
Co-Authors
FS
Ferdinand Schlatt
Friedrich-Schiller-Universität Jena
CR
Cosimo Rulli
Researcher, ISTI-CNR
TH
Tim Hagen
Research Assistant And PhD Student, University Of Kassel
Jan Heinrich Merker
Friedrich-Schiller-Universität Jena
GH
Gijs Hendriksen
PhD Candidate, Radboud University
CL
Carlos Lassance
Cohere
FN
Franco Maria Nardini
Research Director, ISTI-CNR
RV
Rossano Venturini
University Of Pisa
Martin Potthast
University Of Kassel, Hessian.AI, And ScaDS.AI

An Open SERP Mining Infrastructure for the Archive Query Log

Resource | 02:30 PM - 04:00 PM (Europe/Amsterdam)
Query logs are key resources for studying search engine interactions and improving retrieval effectiveness, but are rarely publicly available. In the past, search providers only shared small subsets of their own logs to curb competition and to ensure privacy. The Archive Query Log (AQL) offers an open alternative: mining query logs from archived search engine result pages (SERPs). While the AQL-22 prototype demonstrated the feasibility of this approach, its limited scalability and maintainability hindered widespread adoption by the research community. We re-implement the crawling and parsing of the AQL on open infrastructure, using standard tools, a new framework for storing SERPs, and following FAIR data principles. The extended and continuously crawled AQL-25 corpus contains 553 million SERPs from 775 search providers, mined from six web archives, of which 223 million SERPs (44 TB; 40%) have so far been downloaded and parsed. We demonstrate the use of this new AQL mining framework in two typical analysis scenarios: a temporal analysis now implemented as a single Elasticsearch query, and a batch-processing analysis using Ray. Our resource equips researchers with all the tools needed to analyze SERPs.
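A temporal analysis expressed as a single Elasticsearch query might look like the request body below, built here as a Python dict. The index and field names are illustrative assumptions, not the AQL's actual schema.

```python
# Hedged sketch of a temporal SERP analysis as one Elasticsearch request
# body. Field names ("query_text", "capture_date") are assumptions for
# illustration, not the AQL-25 corpus's real mapping.
query = {
    "query": {"match": {"query_text": "weather"}},
    "aggs": {
        "serps_over_time": {
            "date_histogram": {
                "field": "capture_date",
                "calendar_interval": "year",
            }
        }
    },
    "size": 0,  # return only aggregation buckets, no individual hits
}
# An Elasticsearch client would POST this body to the index's _search
# endpoint; the response buckets give per-year counts of matching SERPs.
```

Pushing the whole analysis into one aggregation keeps the heavy lifting server-side, which matters at the scale of hundreds of millions of SERPs.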
Presenters
Jan Heinrich Merker
Friedrich-Schiller-Universität Jena
Co-Authors
SR
Simon Ruth
University Of Kassel
HS
Harry Scells
Assistant Professor, University Of Tübingen
Martin Potthast
University Of Kassel, Hessian.AI, And ScaDS.AI

RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation

Resource | Applications | 02:30 PM - 04:00 PM (Europe/Amsterdam)
Retrieval models are key components of Retrieval-Augmented Generation (RAG) systems, which generate search queries, process the documents returned, and generate a response. RAG systems are often dynamic and may involve multiple rounds of retrieval. While many state-of-the-art retrieval methods are available through academic IR platforms, these platforms are typically designed for the Cranfield paradigm in which all queries are known up front and can be batch processed offline. This simplification accelerates research but leaves state-of-the-art retrieval models unable to support downstream applications that require online services, such as arbitrary dynamic RAG pipelines that involve looping, feedback, or even self-organizing agents. In this work, we introduce RoutIR, a Python package that provides a simple and efficient HTTP API that wraps arbitrary retrieval methods, including first-stage retrieval, reranking, query expansion, and result fusion. By providing a minimal JSON configuration file specifying the retrieval models to serve, RoutIR can be used to construct and query retrieval pipelines on-the-fly using any available models (e.g., fusing the results of several first-stage retrieval methods followed by reranking). The API automatically performs asynchronous query batching and result caching by default. While many state-of-the-art retrieval methods are already supported by the package, RoutIR is also easily expandable by implementing the Engine abstract class. The package is publicly available on GitHub: http://github.com/hltcoe/routir.
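One of the pipeline stages mentioned above, result fusion, can be illustrated with generic reciprocal rank fusion (RRF) over several first-stage ranked lists. This is a standard technique sketched for illustration, not RoutIR's actual implementation; the constant k=60 is a common default from the RRF literature.

```python
# Generic reciprocal rank fusion (RRF) sketch: merge several ranked
# doc-id lists into one ranking. Not RoutIR's code; k=60 is a common
# default constant for RRF.
def rrf(rankings, k=60):
    """Fuse ranked doc-id lists; a higher fused score means a better rank."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank) for every doc it returns.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d3", "d1", "d2"]    # hypothetical first-stage run
dense = ["d1", "d2", "d3"]   # hypothetical second first-stage run
print(rrf([bm25, dense]))
```

Because RRF only needs ranks, not scores, it composes cleanly behind a single HTTP endpoint that fans a query out to several engines and fuses whatever comes back.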
Presenters
Eugene Yang
Research Scientist, Human Language Technology Center Of Excellence, Johns Hopkins University
Co-Authors
AY
Andrew Yates
Johns Hopkins University, HLTCOE
DL
Dawn Lawrie
Senior Research Scientist, HLTCOE At Johns Hopkins University
James Mayfield
Principal Computer Scientist, JHU HLTCOE
TA
Trevor Adriaanse
Johns Hopkins University