Poster Session (Short Papers)

Session Information

Short papers:

LLM-Assisted Pseudo-Relevance Feedback
David Otero and Javier Parapar

Adversarial Edge Perturbation Framework in Graph-based Retrieval
Amir Khosrojerdi, Radin Hamidi Rad and Ebrahim Bagheri

EmbMerge: A Transformer-based Method for Fusing CDR Lists
Mehmet Erdeniz Aydoğdu, Yağmur Duru Tüfekçioğlu, Ismail Sengor Altingovde, Pinar Karagoz and Ismail Hakki Toroslu

Enhancing Attention-based Context Attribution via Token Selection and Think-Twice Mechanism
Tz-Huan Hsu, Sian-Yao Huang, Che-Yu Lin and Cheng-Lin Yang

Beyond Persuasiveness: A User-Centric Evaluation Framework of Explanations for Food Recommendation
Yurou Zhao, Yiding Sun, Ruidong Han, Fei Jiang, Wei Lin and Jiaxin Mao

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
Nikita Severin, Danil Kartushov, Vladislav Urzhumov, Vladislav Kulikov, Oksana Konovalova, Alexey Grishanov, Anton Klenitskiy, Artem Fatkulin, Alexey Vasilev, Andrey Savchenko and Ilya Makarov

Beyond Correlations: A Downstream Evaluation Framework for Query Performance Prediction
Payel Santra, Partha Basuchowdhuri and Debasis Ganguly

Trust Me on This: A User Study of Explainability for AI-Generated Responses
Weronika Łajewska and Krisztian Balog

Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction
Debashish Chakraborty, Eugene Yang, Daniel Khashabi, Dawn Lawrie and Kevin Duh

Structure-aware Pre-Retrieval Performance Prediction on Query Affinity Graphs
Abbas Saleminezhad, Negar Arabzadeh, Seyed Mohammad Hosseini, Soosan Beheshti and Ebrahim Bagheri

Controlling Gender Bias in Retrieval via a Backpack Architecture
Amirabbas Afzali, Amirreza Velae, Iman Ahmadi and Mohammad Aliannejadi

Knowledge-enhanced Multi-Agent for LLM-based Recommendation
Zeyuan Meng, Zixuan Yi and Iadh Ounis

From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA
Kimia Abedini, Farzad Shami and Gianmaria Silvello

Aligning Instruction-Tuned LLMs for Event Extraction with Multi-objective Reinforcement Learning
Omar Adjali, Siting Liang, Omair Shahzad Bhatti and Daniel Sonntag

Topological Metric for Unsupervised Embedding Quality Evaluation
Aleksei Shestov, Anton Klenitskiy, Daria Denisova, Amurkhan Dzagkoev, Daniil Petrovich, Andrey Savchenko and Maksim Makarenko

Generative Retrieval via Few-shot Indexing
Arian Askari, Chuan Meng, Mohammad Aliannejadi, Zhaochun Ren, Evangelos Kanoulas and Suzan Verberne

Correct but Incomplete: Why Chain-of-Thought Cannot Currently Support Auditable Reasoning
Edward Richards, Javier Sanz-Cruzado and Richard McCreadie


Mar 31, 2026, 13:30 - 14:30 (Europe/Amsterdam)
Venue: Chemie & Chaos


Sub Sessions

Knowledge-enhanced Multi-Agent for LLM-based Recommendation

Short papers · Large Language Models · Recommender systems | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Zeyuan Meng
PhD Student, University Of Glasgow
Co-Authors
Zixuan Yi
University Of Glasgow
Iadh Ounis
Professor, University Of Glasgow

Controlling Gender Bias in Retrieval via a Backpack Architecture

Short papers · Machine learning · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
The presence of social biases in large language models (LLMs) has become a significant concern in AI research. These biases, often embedded in training data, can perpetuate harmful stereotypes and distort decision-making processes. When LLMs are integrated into ranking systems, they can propagate these biases, leading to unfair outcomes in critical applications such as search engines and recommendation systems. Backpack Language Models, unlike traditional transformer-based models that treat text sequences as monolithic structures, generate outputs as weighted combinations of non-contextual, learned word aspects, also known as senses. Leveraging this architecture, we propose a framework for debiasing ranking tasks. Our experimental results show that this framework effectively mitigates gender bias in text retrieval and ranking with minimal degradation in performance.
Presenters
Amirabbas Afzali
Sharif University Of Technology
Co-Authors
Amirreza Velae
Sharif University Of Technology
Iman Ahmadi
Sharif University Of Technology
Mohammad Aliannejadi
Assistant Professor, University Of Amsterdam

Structure-aware Pre-Retrieval Performance Prediction on Query Affinity Graphs

Short papers · IR evaluation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Abbas Saleminezhad
PhD Student, Toronto Metropolitan University
Co-Authors
Negar Arabzadeh
University Of California, Berkeley
Seyed Mohammad Hosseini
Toronto Metropolitan University
Soosan Beheshti
Ebrahim Bagheri
University Of Toronto

From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA

Short papers · Large Language Models | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Comprehending genomic information is essential for biomedical research, yet extracting data from complex distributed databases remains challenging. Large language models (LLMs) offer potential for genomic Question Answering (QA) but face limitations due to restricted access to domain-specific databases. GeneGPT is the current state-of-the-art system that enhances LLMs by utilizing specialized API calls, though it is constrained by rigid API dependencies and limited adaptability. We replicate GeneGPT and propose GenomAgent, a multi-agent framework that efficiently coordinates specialized agents for complex genomics queries. Evaluated on nine tasks from the GeneTuring benchmark, GenomAgent outperforms GeneGPT by 12% on average, and its flexible architecture extends beyond genomics to various scientific domains needing expert knowledge extraction.
Presenters
Kimia Abedini
Student, University Of Padova
Co-Authors
Farzad Shami
Aalto University
Gianmaria Silvello
Professor, University Of Padova

Aligning Instruction-Tuned LLMs for Event Extraction with Multi-objective Reinforcement Learning

Short papers · IR applications · Large Language Models | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Event extraction (EE) aims to identify event triggers and their corresponding arguments from unstructured text, providing structured knowledge essential for many downstream tasks. Despite the success of instruction-tuned large language models (LLMs), current methods often produce inconsistent formats, semantically drifted outputs, and event types that deviate from predefined schemas. These issues arise partly because supervised fine-tuning relies on static loss functions that fail to reflect task-specific objectives such as schema alignment. To address these limitations, we introduce a reinforcement learning framework based on Group Relative Policy Optimization (GRPO) designed to optimize instruction-tuned LLMs for event and argument extraction. We propose three complementary reward functions: a format reward to enforce syntactic and structural validity, a BM25-based reward to enhance lexical and semantic consistency with the input text, and a task-specific supervision reward that directly aligns optimization with task-level performance. Extensive experiments on three standard EE datasets demonstrate that our approach consistently and significantly improves EE performance over strong baselines.
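As an illustration of how three such reward signals might combine, here is a minimal sketch. The GRPO policy update itself is omitted; the BM25 reward is replaced by a simple term-overlap proxy, and the 0.2/0.3/0.5 weights are illustrative assumptions, not the authors' configuration.

```python
import json

def format_reward(output: str) -> float:
    # Format reward: output must be valid JSON with the expected keys.
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if {"trigger", "arguments"} <= obj.keys() else 0.0

def lexical_reward(terms, source: str) -> float:
    # Proxy for the BM25-based reward: fraction of extracted terms
    # that actually occur in the input text.
    src = set(source.lower().split())
    return sum(t.lower() in src for t in terms) / max(len(terms), 1)

def total_reward(output: str, source: str, gold_trigger: str) -> float:
    if format_reward(output) == 0.0:
        return 0.0  # structurally invalid outputs earn nothing
    obj = json.loads(output)
    r_lex = lexical_reward([obj["trigger"], *obj["arguments"]], source)
    r_task = 1.0 if obj["trigger"] == gold_trigger else 0.0  # supervision reward
    return 0.2 * 1.0 + 0.3 * r_lex + 0.5 * r_task  # illustrative weights

src = "The company acquired the startup yesterday"
good = '{"trigger": "acquired", "arguments": ["company", "startup"]}'
print(round(total_reward(good, src, "acquired"), 2))  # high reward
print(total_reward("not json", src, "acquired"))      # invalid format: 0.0
```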
Presenters
Omar Adjali
Researcher, DFKI
Co-Authors
Siting Liang
DFKI
Omair Shahzad Bhatti
DFKI
Daniel Sonntag
DFKI

Correct but Incomplete: Why Chain-of-Thought Cannot Currently Support Auditable Reasoning

Short papers · IR evaluation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Large Language Models (LLMs) are increasingly promoted for knowledge-intensive reasoning tasks. Effective oversight requires faithful reasoning traces that show how answers are actually produced. Chain-of-Thought (CoT) prompting is positioned as a technique that promotes both accuracy and transparency by providing reasoning traces of how solutions are reached. Recent studies have shown that CoT traces, while plausible, are unfaithful to how the answer was derived. However, we argue there is a second, more subtle issue with CoT that requires more investigation: even logically correct CoT explanations can conceal key facts used to produce the answer, thereby misleading the reader. In this paper we illustrate this behavior in six LLMs answering questions across three question answering (QA) datasets of different types (arithmetic, factual QA, and multi-choice reasoning). In particular, we show that injecting a key fact into the prompt increased QA accuracy by 11% to 36% (as expected), yet the models omitted this fact from otherwise sound CoT explanations in up to 56% of cases. This provides further evidence that researchers and developers should be wary of relying on CoT explanations, as even those that appear logically correct may be misleading.
Presenters
Edward Richards
PhD Student, School Of Computing Science, University Of Glasgow
Co-Authors
Javier Sanz-Cruzado
University Of Glasgow
Richard McCreadie
University Of Glasgow

Generative Retrieval via Few-shot Indexing

Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Arian Askari
Leiden University
Co-Authors
Chuan Meng
Postdoctoral Researcher, The University Of Edinburgh
Mohammad Aliannejadi
Assistant Professor, University Of Amsterdam
Zhaochun Ren
Leiden University
Evangelos Kanoulas
University Of Amsterdam
Suzan Verberne
Leiden University

Topological Metric for Unsupervised Embedding Quality Evaluation

Short papers · IR evaluation · Machine learning · Recommender systems | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Modern representation learning increasingly relies on unsupervised and self-supervised methods trained on large-scale unlabeled data. While these approaches achieve impressive generalization across tasks and domains, evaluating embedding quality without labels remains an open challenge. In this work, we propose Persistence, a topology-aware metric based on persistent homology that quantifies the geometric structure and topological richness of embedding spaces in a fully unsupervised manner. Unlike metrics that assume linear separability or rely on covariance structure, Persistence captures global and multi-scale organization. Empirical results across diverse domains show that Persistence consistently achieves top-tier correlations with downstream performance, outperforming existing unsupervised metrics and enabling reliable model and hyperparameter selection.
Presenters
Aleksei Shestov
Senior AI Researcher, SB AI Lab
Co-Authors
Anton Klenitskiy
SB AI Lab
Daria Denisova
SB AI Lab
Amurkhan Dzagkoev
SB AI Lab
Daniil Petrovich
HSE University
Andrey Savchenko
SB AI Lab
Maksim Makarenko
SB AI Lab

Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction

Short papers · Explainability methods · Large Language Models · Retrieval-Augmented Generation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Debashish Chakraborty
Researcher, Johns Hopkins University
Co-Authors
Eugene Yang
Research Scientist, Human Language Technology Center Of Excellence, Johns Hopkins University
Daniel Khashabi
Dawn Lawrie
Senior Research Scientist, HLTCOE At Johns Hopkins University
Kevin Duh
Johns Hopkins University Human Language Technology Center Of Excellence (HLTCOE)

Trust Me on This: A User Study of Explainability for AI-Generated Responses

Short papers · Explainability methods · Retrieval-Augmented Generation · User aspects in IR | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Weronika Łajewska
Applied Scientist, Amazon
Co-Authors
Krisztian Balog
Professor, University Of Stavanger

EmbMerge: A Transformer-based Method for Fusing CDR Lists

Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
To overcome the problem of data sparsity in recommendation systems, cross-domain recommendation (CDR) models leverage user data from a source domain to improve recommendations in a target domain. However, different CDR models capture unique and often complementary aspects of user preferences. This paper introduces EmbMerge, a supervised method that fuses the outputs of these diverse CDR models to produce a better, unified ranking. Traditional fusion methods are mainly based on sparse features that do not capture deep semantic relationships. EmbMerge instead employs compact dense vector representations for models, items, ranks, and scores, processed by a transformer encoder to generate rich, context-aware embeddings. We propose two architectural variants and conduct experiments on two datasets using ranked lists from three different state-of-the-art CDR systems. Our results demonstrate that EmbMerge outperforms four baseline fusion methods, validating its effectiveness as a technique for combining the strengths of various cross-domain recommendation systems.
Presenters
Mehmet Erdeniz Aydoğdu
Middle East Technical University
Co-Authors
Yağmur Duru Tüfekçioğlu
Middle East Technical University
Ismail Sengor Altingovde
Middle East Technical University
Pinar Karagoz
Middle East Technical University
Ismail Hakki Toroslu
Middle East Technical University

Adversarial Edge Perturbation Framework in Graph-based Retrieval

Short papers · Machine learning · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Amir Khosrojerdi
Student, University Of Toronto
Co-Authors
Radin Hamidi Rad
University Of Toronto
Ebrahim Bagheri
University Of Toronto

Enhancing Attention-based Context Attribution via Token Selection and Think-Twice Mechanism

Short papers · Explainability methods · Retrieval-Augmented Generation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Tz-Huan Hsu
Data Scientist, CyCraft Technology Corporation Taiwan Branch
Co-Authors
Sian-Yao Huang
Data Scientist Technical Lead, CyCraft Technology Corporation Taiwan Branch
Che-Yu Lin
Cheng-Lin Yang
CyCraft AI Lab

Beyond Persuasiveness: A User-Centric Evaluation Framework of Explanations for Food Recommendation

Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Presenters
Yurou Zhao
PhD Student, Renmin University Of China
Co-Authors
Yiding Sun
Ruidong Han
Fei Jiang
Wei Lin
Jiaxin Mao

Beyond Correlations: A Downstream Evaluation Framework for Query Performance Prediction

Short papers · IR evaluation · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
The standard practice of query performance prediction (QPP) evaluation is to measure a set-level correlation between the estimated retrieval qualities and the true ones. However, this correlation-based evaluation measure neither quantifies QPP effectiveness at the level of individual queries nor connects to a downstream application, meaning that QPP methods yielding high correlation values may not find a practical application in query-specific decisions in an IR pipeline. In this paper, we propose a downstream-focussed evaluation framework where a distribution of QPP estimates across the lists of top documents retrieved by several rankers is used as priors for IR fusion. On the one hand, a distribution of these estimates closely matching that of the true retrieval qualities indicates the quality of the predictor; on the other hand, their usage as priors indicates a predictor's ability to make informed choices in an IR pipeline. Our experiments firstly establish the importance of QPP estimates in weighted IR fusion, yielding significant improvements of over 4.5% over unweighted CombSUM and RRF fusion strategies, and secondly reveal the new insight that the downstream effectiveness of QPP does not correlate well with standard correlation-based QPP evaluation.
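As a minimal sketch of the fusion idea, QPP estimates can weight each ranker's contribution in a CombSUM-style sum. The scores, QPP weights, and the plain weighted sum below are assumptions for illustration, not the paper's exact prior formulation.

```python
# QPP-weighted CombSUM: each ranker's document scores are scaled by a
# per-ranker QPP estimate before summation, so rankers predicted to
# perform well dominate the fused ranking.

def qpp_weighted_combsum(ranked_lists, qpp_weights):
    """ranked_lists: list of {doc_id: score}; qpp_weights: one weight per ranker."""
    fused = {}
    for scores, w in zip(ranked_lists, qpp_weights):
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

ranker_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
ranker_b = {"d3": 0.8, "d2": 0.7, "d1": 0.2}
# A high QPP estimate for ranker A shifts the fused list toward its ordering.
print(qpp_weighted_combsum([ranker_a, ranker_b], qpp_weights=[0.9, 0.3]))
```

With the high QPP weight on ranker A, the fused ordering follows A's ranking (d1, d2, d3) despite ranker B preferring d3.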
Presenters
Payel Santra
Ph.D Student, Indian Association For The Cultivation Of Science, Kolkata
Co-Authors
Partha Basuchowdhuri
Debasis Ganguly
University Of Glasgow

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

Short papers · Large Language Models · Recommender systems · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Sequential recommender systems have achieved significant success in modeling temporal user behavior but remain limited in capturing rich user semantics beyond interaction patterns. Large Language Models (LLMs) present opportunities to enhance user understanding with their reasoning capabilities, yet existing integration approaches incur prohibitive real-time inference costs. To address these limitations, we present a novel knowledge distillation method that distills textual user profiles generated by pre-trained LLMs into sequential recommenders without requiring LLM inference at serving time. The resulting approach maintains the inference efficiency of traditional sequential models while requiring neither architectural modifications nor LLM fine-tuning.
Presenters
Nikita Severin
Independent Researcher
Co-Authors
Alexey Grishanov
Sber AI Lab
Anton Klenitskiy
SB AI Lab
Artem Fatkulin
Middle AI Researcher, Sber AI Lab
Alexey Vasilev
Sber AI Lab, HSE University

LLM-Assisted Pseudo-Relevance Feedback

Short papers · Large Language Models · Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/31 11:30:00 UTC - 2026/03/31 12:30:00 UTC
Query expansion is a long-standing technique to mitigate vocabulary mismatch in ad hoc Information Retrieval. Pseudo-relevance feedback methods, such as RM3, estimate an expanded query model from the top-ranked documents, but remain vulnerable to topic drift when early results include noisy or tangential content. Recent approaches instead prompt Large Language Models to generate synthetic expansions or query variants. While effective, these methods risk hallucinations and misalignment with collection-specific terminology. We propose a hybrid alternative that preserves the robustness and interpretability of classical PRF while leveraging LLM semantic judgement. Our method inserts an LLM-based filtering stage prior to RM3 estimation: the LLM judges the documents in the initial top-$k$ ranking, and RM3 is computed only over those accepted as relevant. This simple intervention improves over blind PRF and a strong baseline across several datasets and metrics.
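The control flow described above (LLM filtering before RM3 estimation) can be sketched roughly as follows. `llm_judge` is a hypothetical stand-in for the paper's LLM relevance call (here a naive term-overlap check), and RM3 is reduced to a toy pooled term-frequency model, so this illustrates the pipeline shape only, not the authors' implementation.

```python
from collections import Counter

def llm_judge(query, doc):
    # Placeholder for an LLM relevance judgement; here: naive term overlap.
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) > 0

def filtered_rm3_terms(query, top_k_docs, n_expansion=3):
    # Keep only documents the "LLM" accepts as relevant.
    accepted = [d for d in top_k_docs if llm_judge(query, d)]
    # Toy relevance model: pooled term frequencies over accepted docs only.
    counts = Counter()
    for doc in accepted:
        counts.update(doc.lower().split())
    for t in query.lower().split():  # drop the original query terms
        counts.pop(t, None)
    return [t for t, _ in counts.most_common(n_expansion)]

docs = [
    "neural ranking models for retrieval",
    "cooking pasta recipes",  # tangential document: likely rejected
    "retrieval with relevance feedback models",
]
print(filtered_rm3_terms("retrieval models", docs))
```

The tangential document never contributes expansion terms, which is the drift-control effect the filtering stage is meant to provide.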
Presenters
David Otero
Universidade Da Coruña
Co-Authors
Javier Parapar
Associate Professor, Universidade Da Coruña