Poster Session (Short Papers)

Session Information

Short papers:

LLM-Assisted Pseudo-Relevance Feedback
David Otero and Javier Parapar

Adversarial Edge Perturbation Framework in Graph-based Retrieval
Amir Khosrojerdi, Radin Hamidi Rad and Ebrahim Bagheri

EmbMerge: A Transformer-based Method for Fusing CDR Lists
Mehmet Erdeniz Aydoğdu, Yağmur Duru Tüfekçioğlu, Ismail Sengor Altingovde, Pinar Karagoz and Ismail Hakki Toroslu

Enhancing Attention-based Context Attribution via Token Selection and Think-Twice Mechanism
Tz-Huan Hsu, Sian-Yao Huang, Che-Yu Lin and Cheng-Lin Yang

Beyond Persuasiveness: A User-Centric Evaluation Framework of Explanations for Food Recommendation
Yurou Zhao, Yiding Sun, Ruidong Han, Fei Jiang, Wei Lin and Jiaxin Mao

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
Nikita Severin, Danil Kartushov, Vladislav Urzhumov, Vladislav Kulikov, Oksana Konovalova, Alexey Grishanov, Anton Klenitskiy, Artem Fatkulin, Alexey Vasilev, Andrey Savchenko and Ilya Makarov

Beyond Correlations: A Downstream Evaluation Framework for Query Performance Prediction
Payel Santra, Partha Basuchowdhuri and Debasis Ganguly

Trust Me on This: A User Study of Explainability for AI-Generated Responses
Weronika Łajewska and Krisztian Balog

Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction
Debashish Chakraborty, Eugene Yang, Daniel Khashabi, Dawn Lawrie and Kevin Duh

Structure-aware Pre-Retrieval Performance Prediction on Query Affinity Graphs
Abbas Saleminezhad, Negar Arabzadeh, Seyed Mohammad Hosseini, Soosan Beheshti and Ebrahim Bagheri

Controlling Gender Bias in Retrieval via a Backpack Architecture
Amirabbas Afzali, Amirreza Velae, Iman Ahmadi and Mohammad Aliannejadi

Knowledge-enhanced Multi-Agent for LLM-based Recommendation
Zeyuan Meng, Zixuan Yi and Iadh Ounis

From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA
Kimia Abedini, Farzad Shami and Gianmaria Silvello

Aligning Instruction-Tuned LLMs for Event Extraction with Multi-objective Reinforcement Learning
Omar Adjali, Siting Liang, Omair Shahzad Bhatti and Daniel Sonntag

Topological Metric for Unsupervised Embedding Quality Evaluation
Aleksei Shestov, Anton Klenitskiy, Daria Denisova, Amurkhan Dzagkoev, Daniil Petrovich, Andrey Savchenko and Maksim Makarenko

Generative Retrieval via Few-shot Indexing
Arian Askari, Chuan Meng, Mohammad Aliannejadi, Zhaochun Ren, Evangelos Kanoulas and Suzan Verberne

Correct but Incomplete: Why Chain-of-Thought Cannot Currently Support Auditable Reasoning
Edward Richards, Javier Sanz-Cruzado and Richard McCreadie


Mar 31, 2026, 13:30 - 14:30 (Europe/Amsterdam)
Venue: Chemie & Chaos


Sub Sessions

Knowledge-enhanced Multi-Agent for LLM-based Recommendation

Short papers · Large Language Models · Recommender systems | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Zeyuan Meng
PhD Student, University Of Glasgow
Co-Authors
Zixuan Yi
University Of Glasgow
Iadh Ounis
Professor, University Of Glasgow

Controlling Gender Bias in Retrieval via a Backpack Architecture

Short papers · Machine learning · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
The presence of social biases in large language models (LLMs) has become a significant concern in AI research. These biases, often embedded in training data, can perpetuate harmful stereotypes and distort decision-making processes. When LLMs are integrated into ranking systems, they can propagate these biases, leading to unfair outcomes in critical applications such as search engines and recommendation systems. Backpack Language Models, unlike traditional transformer-based models that treat text sequences as monolithic structures, generate outputs as weighted combinations of non-contextual, learned word aspects, also known as senses. Leveraging this architecture, we propose a framework for debiasing ranking tasks. Our experimental results show that this framework effectively mitigates gender bias in text retrieval and ranking with minimal degradation in performance.
Presenters
Amirabbas Afzali
Sharif University Of Technology
Co-Authors
Amirreza Velae
Sharif University Of Technology
Iman Ahmadi
Sharif University Of Technology
Mohammad Aliannejadi
Assistant Professor, University Of Amsterdam

Structure-aware Pre-Retrieval Performance Prediction on Query Affinity Graphs

Short papers · IR evaluation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Abbas Saleminezhad
PhD Student, Toronto Metropolitan University
Co-Authors
Negar Arabzadeh
University Of California, Berkeley
Seyed Mohammad Hosseini
Toronto Metropolitan University
Soosan Beheshti
Ebrahim Bagheri
University Of Toronto

From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA

Short papers · Large Language Models | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Comprehending genomic information is essential for biomedical research, yet extracting data from complex distributed databases remains challenging. Large language models (LLMs) offer potential for genomic Question Answering (QA) but face limitations due to restricted access to domain-specific databases. GeneGPT is the current state-of-the-art system that enhances LLMs by utilizing specialized API calls, though it is constrained by rigid API dependencies and limited adaptability. We replicate GeneGPT and propose GenomAgent, a multi-agent framework that efficiently coordinates specialized agents for complex genomics queries. Evaluated on nine tasks from the GeneTuring benchmark, GenomAgent outperforms GeneGPT by 12% on average, and its flexible architecture extends beyond genomics to various scientific domains needing expert knowledge extraction.
Presenters
Kimia Abedini
Student, University Of Padova
Co-Authors
Farzad Shami
Aalto University
Gianmaria Silvello
Professor, University Of Padova

Aligning Instruction-Tuned LLMs for Event Extraction with Multi-objective Reinforcement Learning

Short papers · IR applications · Large Language Models | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Event extraction (EE) aims to identify event triggers and their corresponding arguments from unstructured text, providing structured knowledge essential for many downstream tasks. Despite the success of instruction-tuned large language models (LLMs), current methods often produce inconsistent formats, semantically drifted outputs, and event types that deviate from predefined schemas. These issues arise partly because supervised fine-tuning relies on static loss functions that fail to reflect task-specific objectives such as schema alignment. To address these limitations, we introduce a reinforcement learning framework based on Group Relative Policy Optimization (GRPO) designed to optimize instruction-tuned LLMs for event and argument extraction. We propose three complementary reward functions: a format reward to enforce syntactic and structural validity, a BM25-based reward to enhance lexical and semantic consistency with the input text, and a task-specific supervision reward that directly aligns optimization with task-level performance. Extensive experiments on three standard EE datasets demonstrate that our approach consistently and significantly improves EE performance over strong baselines.
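As an illustration of how three such reward signals might combine, here is a minimal sketch. The GRPO policy update itself is omitted; the BM25 reward is replaced by a simple term-overlap proxy, and the 0.2/0.3/0.5 weights are illustrative assumptions, not the authors' configuration.

```python
import json

def format_reward(output: str) -> float:
    # Format reward: output must be valid JSON with the expected keys.
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if {"trigger", "arguments"} <= obj.keys() else 0.0

def lexical_reward(terms, source: str) -> float:
    # Proxy for the BM25-based reward: fraction of extracted terms
    # that actually occur in the input text.
    src = set(source.lower().split())
    return sum(t.lower() in src for t in terms) / max(len(terms), 1)

def total_reward(output: str, source: str, gold_trigger: str) -> float:
    if format_reward(output) == 0.0:
        return 0.0  # structurally invalid outputs earn nothing
    obj = json.loads(output)
    r_lex = lexical_reward([obj["trigger"], *obj["arguments"]], source)
    r_task = 1.0 if obj["trigger"] == gold_trigger else 0.0  # supervision reward
    return 0.2 * 1.0 + 0.3 * r_lex + 0.5 * r_task  # illustrative weights

src = "The company acquired the startup yesterday"
good = '{"trigger": "acquired", "arguments": ["company", "startup"]}'
print(round(total_reward(good, src, "acquired"), 2))  # high reward
print(total_reward("not json", src, "acquired"))      # invalid format: 0.0
```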
Presenters
Omar Adjali
Researcher, DFKI
Co-Authors
Siting Liang
DFKI
Omair Shahzad Bhatti
DFKI
Daniel Sonntag
DFKI

Correct but Incomplete: Why Chain-of-Thought Cannot Currently Support Auditable Reasoning

Short papers · IR evaluation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Large Language Models (LLMs) are increasingly promoted for knowledge-intensive reasoning tasks. Effective oversight requires faithful reasoning traces that show how answers are actually produced. Chain-of-Thought (CoT) prompting is positioned as a technique that promotes both accuracy and transparency by providing reasoning traces of how solutions are reached. Recent studies have shown that CoT traces, while plausible, are unfaithful to how the answer was derived. However, we argue there is a second, more subtle issue with CoT that requires more investigation: even logically correct CoT explanations can conceal key facts used to produce the answer, thereby misleading the reader. In this paper we illustrate this behavior in six LLMs answering questions across three question answering (QA) datasets of different types (arithmetic, factual QA, and multi-choice reasoning). In particular, we show that injecting a key fact into the prompt increased QA accuracy by 11% to 36% (as expected), yet the models omitted this fact from otherwise sound CoT explanations in up to 56% of cases. This provides further evidence that researchers and developers should be wary of relying on CoT explanations, as even those that appear logically correct may be misleading.
Presenters
Edward Richards
PhD Student, School Of Computing Science, University Of Glasgow
Co-Authors
Javier Sanz-Cruzado
University Of Glasgow
Richard McCreadie
University Of Glasgow

Generative Retrieval via Few-shot Indexing

Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Arian Askari
Leiden University
Co-Authors
Chuan Meng
Postdoctoral Researcher, The University Of Edinburgh
Mohammad Aliannejadi
Assistant Professor, University Of Amsterdam
Zhaochun Ren
Leiden University
Evangelos Kanoulas
University Of Amsterdam
Suzan Verberne
Leiden University

Topological Metric for Unsupervised Embedding Quality Evaluation

Short papers · IR evaluation · Machine learning · Recommender systems | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Modern representation learning increasingly relies on unsupervised and self-supervised methods trained on large-scale unlabeled data. While these approaches achieve impressive generalization across tasks and domains, evaluating embedding quality without labels remains an open challenge. In this work, we propose Persistence, a topology-aware metric based on persistent homology that quantifies the geometric structure and topological richness of embedding spaces in a fully unsupervised manner. Unlike metrics that assume linear separability or rely on covariance structure, Persistence captures global and multi-scale organization. Empirical results across diverse domains show that Persistence consistently achieves top-tier correlations with downstream performance, outperforming existing unsupervised metrics and enabling reliable model and hyperparameter selection.
Presenters
Aleksei Shestov
Senior AI Researcher, SB AI Lab
Co-Authors
Anton Klenitskiy
SB AI Lab
Daria Denisova
SB AI Lab
Amurkhan Dzagkoev
SB AI Lab
Daniil Petrovich
HSE University
Andrey Savchenko
SB AI Lab
Maksim Makarenko
SB AI Lab

Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction

Short papers · Explainability methods · Large Language Models · Retrieval-Augmented Generation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Debashish Chakraborty
Researcher, Johns Hopkins University
Co-Authors
Eugene Yang
Research Scientist, Human Language Technology Center Of Excellence, Johns Hopkins University
Daniel Khashabi
Dawn Lawrie
Senior Research Scientist, HLTCOE At Johns Hopkins University
Kevin Duh
Johns Hopkins University Human Language Technology Center Of Excellence (HLTCOE)

Trust Me on This: A User Study of Explainability for AI-Generated Responses

Short papers · Explainability methods · Retrieval-Augmented Generation · User aspects in IR | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Weronika Łajewska
Applied Scientist, Amazon
Co-Authors
Krisztian Balog
Professor, University Of Stavanger

EmbMerge: A Transformer-based Method for Fusing CDR Lists

Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
To overcome the problem of data sparsity in recommendation systems, cross-domain recommendation (CDR) models leverage user data from a source domain to improve recommendations in a target domain. However, different CDR models capture unique and often complementary aspects of user preferences. This paper introduces EmbMerge, a supervised method that fuses the outputs of these diverse CDR models to produce a better, unified ranking. Traditional fusion methods are mainly based on sparse features that do not capture deep semantic relationships. EmbMerge instead employs compact dense vector representations for models, items, ranks, and scores, processed by a transformer encoder to generate rich, context-aware embeddings. We propose two architectural variants and conduct experiments on two datasets using ranked lists from three different state-of-the-art CDR systems. Our results demonstrate that EmbMerge outperforms four baseline fusion methods, validating its effectiveness as a technique for combining the strengths of various cross-domain recommendation systems.
Presenters
Mehmet Erdeniz Aydoğdu
Middle East Technical University
Co-Authors
Yağmur Duru Tüfekçioğlu
Middle East Technical University
Ismail Sengor Altingovde
Middle East Technical University
Pinar Karagoz
Middle East Technical University
Ismail Hakki Toroslu
Middle East Technical University

Adversarial Edge Perturbation Framework in Graph-based Retrieval

Short papers · Machine learning · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Amir Khosrojerdi
Student, University Of Toronto
Co-Authors
Radin Hamidi Rad
University Of Toronto
Ebrahim Bagheri
University Of Toronto

Enhancing Attention-based Context Attribution via Token Selection and Think-Twice Mechanism

Short papers · Explainability methods · Retrieval-Augmented Generation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Tz-Huan Hsu
Data Scientist, CyCraft Technology Corporation Taiwan Branch
Co-Authors
Sian-Yao Huang
Data Scientist Technical Lead, CyCraft Technology Corporation Taiwan Branch
Che-Yu Lin
Cheng-Lin Yang
CyCraft AI Lab

Beyond Persuasiveness: A User-Centric Evaluation Framework of Explanations for Food Recommendation

Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Yurou Zhao
PhD Student, Renmin University Of China
Co-Authors
Yiding Sun
Ruidong Han
Fei Jiang
Wei Lin
Jiaxin Mao

Beyond Correlations: A Downstream Evaluation Framework for Query Performance Prediction

Short papers · IR evaluation · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
The standard practice of query performance prediction (QPP) evaluation is to measure a set-level correlation between the estimated retrieval qualities and the true ones. However, this correlation-based evaluation measure neither quantifies QPP effectiveness at the level of individual queries nor connects to a downstream application, meaning that QPP methods yielding high correlation values may not find a practical application in query-specific decisions in an IR pipeline. In this paper, we propose a downstream-focussed evaluation framework where a distribution of QPP estimates across the lists of top documents retrieved by several rankers is used as priors for IR fusion. On the one hand, a distribution of these estimates closely matching that of the true retrieval qualities indicates the quality of the predictor; on the other hand, their usage as priors indicates a predictor's ability to make informed choices in an IR pipeline. Our experiments firstly establish the importance of QPP estimates in weighted IR fusion, yielding significant improvements of over 4.5% over unweighted CombSUM and RRF fusion strategies, and secondly reveal the new insight that the downstream effectiveness of QPP does not correlate well with standard correlation-based QPP evaluation.
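As a minimal sketch of the fusion idea, QPP estimates can weight each ranker's contribution in a CombSUM-style sum. The scores, QPP weights, and the plain weighted sum below are assumptions for illustration, not the paper's exact prior formulation.

```python
# QPP-weighted CombSUM: each ranker's document scores are scaled by a
# per-ranker QPP estimate before summation, so rankers predicted to
# perform well dominate the fused ranking.

def qpp_weighted_combsum(ranked_lists, qpp_weights):
    """ranked_lists: list of {doc_id: score}; qpp_weights: one weight per ranker."""
    fused = {}
    for scores, w in zip(ranked_lists, qpp_weights):
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

ranker_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
ranker_b = {"d3": 0.8, "d2": 0.7, "d1": 0.2}
# A high QPP estimate for ranker A shifts the fused list toward its ordering.
print(qpp_weighted_combsum([ranker_a, ranker_b], qpp_weights=[0.9, 0.3]))
```

With the high QPP weight on ranker A, the fused ordering follows A's ranking (d1, d2, d3) despite ranker B preferring d3.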
Presenters
Payel Santra
Ph.D Student, Indian Association For The Cultivation Of Science, Kolkata
Co-Authors
Partha Basuchowdhuri
Debasis Ganguly
University Of Glasgow

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

Short papers · Large Language Models · Recommender systems · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Sequential recommender systems have achieved significant success in modeling temporal user behavior but remain limited in capturing rich user semantics beyond interaction patterns. Large Language Models (LLMs) present opportunities to enhance user understanding with their reasoning capabilities, yet existing integration approaches incur prohibitive real-time inference costs. To address these limitations, we present a novel knowledge distillation method that distills textual user profiles generated by pre-trained LLMs into sequential recommenders without requiring LLM inference at serving time. The resulting approach maintains the inference efficiency of traditional sequential models while requiring neither architectural modifications nor LLM fine-tuning.
Presenters
Nikita Severin
Independent Researcher
Co-Authors
Alexey Grishanov
Sber AI Lab
Anton Klenitskiy
SB AI Lab
Artem Fatkulin
Middle AI Researcher, Sber AI Lab
Alexey Vasilev
Sber AI Lab, HSE University

LLM-Assisted Pseudo-Relevance Feedback

Short papers · Large Language Models · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Query expansion is a long-standing technique to mitigate vocabulary mismatch in ad hoc Information Retrieval. Pseudo-relevance feedback methods, such as RM3, estimate an expanded query model from the top-ranked documents, but remain vulnerable to topic drift when early results include noisy or tangential content. Recent approaches instead prompt Large Language Models to generate synthetic expansions or query variants. While effective, these methods risk hallucinations and misalignment with collection-specific terminology. We propose a hybrid alternative that preserves the robustness and interpretability of classical PRF while leveraging LLM semantic judgement. Our method inserts an LLM-based filtering stage prior to RM3 estimation: the LLM judges the documents in the initial top-$k$ ranking, and RM3 is computed only over those accepted as relevant. This simple intervention improves over blind PRF and a strong baseline across several datasets and metrics.
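The control flow described above (LLM filtering before RM3 estimation) can be sketched roughly as follows. `llm_judge` is a hypothetical stand-in for the paper's LLM relevance call (here a naive term-overlap check), and RM3 is reduced to a toy pooled term-frequency model, so this illustrates the pipeline shape only, not the authors' implementation.

```python
from collections import Counter

def llm_judge(query, doc):
    # Placeholder for an LLM relevance judgement; here: naive term overlap.
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) > 0

def filtered_rm3_terms(query, top_k_docs, n_expansion=3):
    # Keep only documents the "LLM" accepts as relevant.
    accepted = [d for d in top_k_docs if llm_judge(query, d)]
    # Toy relevance model: pooled term frequencies over accepted docs only.
    counts = Counter()
    for doc in accepted:
        counts.update(doc.lower().split())
    for t in query.lower().split():  # drop the original query terms
        counts.pop(t, None)
    return [t for t, _ in counts.most_common(n_expansion)]

docs = [
    "neural ranking models for retrieval",
    "cooking pasta recipes",  # tangential document: likely rejected
    "retrieval with relevance feedback models",
]
print(filtered_rm3_terms("retrieval models", docs))
```

The tangential document never contributes expansion terms, which is the drift-control effect the filtering stage is meant to provide.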
Presenters
David Otero
Universidade Da Coruña
Co-Authors
Javier Parapar
Associate Professor, Universidade Da Coruña