Multi-Step Semantic Reasoning in Generative Retrieval Steven Dong, Yubao Tang and Maarten de Rijke
SSEmb: A Joint Structural and Semantic Embedding Framework for Mathematical Formula Retrieval Ruyin Li and Xiaoyu Chen
On the Viability of Exploiting Large Language Models for Misinformation Annotation Pablo Landrove, Marcos Fernandez-Pichel and David E. Losada
Incorporating Q&A Nuggets into Retrieval-Augmented Generation Laura Dietz, Bryan Li, Gabrielle Liu, Jia-Huei Ju, Eugene Yang, Dawn Lawrie, William Walden and James Mayfield
Evolving Mixture of Low-Rank Experts for Continual User Modeling Jeevan Thapa, Sinan Zhao and Koyoshi Shindo
Personalized Autocompletion of Interactions with LLM-based Chatbots Shani Goren, Nachshon Cohen, Oren Kalinsky, Tomer Stav, Yaron Fairstein, Yuri Rapoport, Ram Yazdi, Alex Libov and Guy Kushilevitz
Evaluating Large Language Models as Domain-Specific Retrieval Agents: A Study on Cybersecurity Challenge Benchmarks Omed Abed, Md. Samiul Haque, Patrick-Benjamin Bök and Matteo Große-Kampmann
Large Language Models as Assessors: On the Impact of Relevance Scales Riccardo Zamolo, Riccardo Lunardi, Michael Soprano, Gianluca Demartini, Stefano Mizzaro and Kevin Roitero
Analyzing AI Evaluation Benchmarks Through Information Retrieval and Network Science Gaia Simeoni, Michael Soprano, Riccardo Lunardi, Kevin Roitero and Stefano Mizzaro
Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries Gabrielle Liu, Bryan Li, Arman Cohan, William Walden and Eugene Yang
DARE: A Dialectical Framework for Adversarial and Evidence-Aware RAG Saisab Sadhu, Dwaipayan Roy and Tannay Basu
Do We Still Need Text for Video Retrieval in the Era of Vision-Language Models? Jiaqi Zhan, Xinyu Zhang, Shengyao Zhuang, Xueguang Ma and Jimmy Lin
Query Performance Prediction using a Child-focused Definition of Relevance Hrishita Chakrabarti and Maria Soledad Pera
ReFormeR: Learning and Applying Explicit Query Reformulation Patterns Amin Bigdeli, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke and Ebrahim Bagheri
One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking Tanmay Karmakar, Sourav Saha, Debapriyo Majumdar and Surjyanee Halder
Text vs. Speech? Detecting Audio Deepfakes on Instagram Karla Schäfer
MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes Rodrigo Batista, Filipe Cunha, Purificação Silvano, Nuno Guimarães, Alípio Jorge, Evelin Amorim and Ricardo Campos
Revisiting Human-vs-LLM judgments on the TREC Podcast Track Watheq Mansour, J. Shane Culpepper, Joel Mackenzie and Andrew Yates
Forward Index Compression for Learned Sparse Retrieval Sebastian Bruch, Martino Fontana, Franco Maria Nardini, Cosimo Rulli and Rossano Venturini
Analyzing AI Evaluation Benchmarks Through Information Retrieval and Network Science
Short papers | IR evaluation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Many analyses of Information Retrieval (IR) evaluation benchmarks have been performed, using many different approaches. Benchmarking also plays a central role in evaluating the capabilities of Large Language Models (LLMs). However, recent concerns have emerged regarding the robustness of benchmarks and the reliability of leaderboard rankings. In this paper, we apply an IR approach to LLM evaluation: we analyze LLM benchmark results through the lens of network science. Adapting a method developed for TREC test collections, we construct a bipartite graph between models and benchmark questions and apply Kleinberg's HITS algorithm to uncover latent structure in the evaluation data. In this framework, model hubness quantifies a model's tendency to perform well on easy questions, while question hubness captures a question's ability to discriminate between more and less effective models. This graph-based view provides a principled way to diagnose benchmark-induced biases and to assess the reliability of model rankings. We conduct experiments on seven multiple-choice QA benchmarks with a pool of 34 LLMs. Through this IR-inspired approach, we show that model rankings on leaderboards are strongly influenced by subsets of easy questions.
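For readers unfamiliar with the adaptation, the sketch below runs the standard HITS recursion on a bipartite model-question graph built from a binary correctness matrix. It is an illustrative simplification under assumed variable names and toy data, not the authors' code.

    import numpy as np

    def hits_bipartite(correct, iters=50):
        """correct[i, j] = 1 if model i answers benchmark question j correctly."""
        model_score = np.ones(correct.shape[0])      # hub-like score per model
        question_score = np.ones(correct.shape[1])   # authority-like score per question
        for _ in range(iters):
            model_score = correct @ question_score      # models credited for the questions they solve
            question_score = correct.T @ model_score    # questions credited for the models that solve them
            model_score /= np.linalg.norm(model_score) + 1e-12
            question_score /= np.linalg.norm(question_score) + 1e-12
        return model_score, question_score

    # Toy example: 3 models, 4 questions; question 0 is "easy" (every model solves it).
    correct = np.array([[1, 1, 1, 0],
                        [1, 1, 0, 0],
                        [1, 0, 0, 0]], dtype=float)
    print(hits_bipartite(correct))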
Large Language Models as Assessors: On the Impact of Relevance Scales
Short papers | IR evaluation | Large Language Models | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Traditionally, the task of relevance judgment has relied on human annotators, but recent advances in Large Language Models (LLMs) have prompted growing interest in their use as proxies for fully- or semi-automated relevance judgments. In this setting, a key yet underexplored factor is the relevance scale adopted when judging relevance. Relevance scales range from binary to fine-grained, and their impact on the effectiveness of LLM-based judgments, the effect of scale conversions, and their role in the presence of potential data contamination remain unknown. In this paper, we systematically investigate how different scales, as well as conversions between them, affect the ability of LLMs to provide reliable point-wise relevance judgments, exploring this across multiple prompting strategies and model sizes. Using a popular TREC collection, we compare model outputs against both crowd and expert annotations, analyzing their alignment, stability, and potential data contamination issues.
Kevin Roitero Tenure Track Assistant Professor, University Of Udine
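As a concrete illustration of the scale conversions studied in this paper, the snippet below collapses a graded 0-3 relevance label into binary and three-level labels. The threshold and mapping are assumptions for illustration, not the paper's protocol.

    def graded_to_binary(label, threshold=2):
        # Treat grades >= threshold as relevant (the threshold is an assumed choice).
        return int(label >= threshold)

    def graded_to_three_level(label):
        # Merge the two middle grades of a 0-3 scale (again, an assumed mapping).
        return {0: 0, 1: 1, 2: 1, 3: 2}[label]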
SSEmb: A Joint Structural and Semantic Embedding Framework for Mathematical Formula Retrieval
Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Formula retrieval is an important topic in Mathematical Information Retrieval. We propose SSEmb, a novel embedding framework capable of capturing both the structural and the semantic features of formulas. Structurally, we employ Graph Contrastive Learning to encode formulas represented as Shared-substructure Operator Graphs. To enhance structural diversity while preserving the mathematical validity of these formula graphs, we introduce a graph data augmentation approach based on a substitution strategy. Semantically, we utilize Sentence-BERT to encode the surrounding text of formulas. Finally, for each query and its candidates, structural and semantic similarities are calculated separately and then fused through a weighted scheme. In the ARQMath-3 Formula Retrieval Task, SSEmb outperforms existing embedding-based methods by over 5 percentage points on P'@10 and nDCG'@10. Furthermore, SSEmb improves the performance of all runs from other methods and achieves state-of-the-art results when combined with Approach0.
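The weighted fusion step can be pictured as in the sketch below, which combines a structural and a semantic cosine similarity with a single weight. The weight name alpha and the cosine choice are illustrative assumptions, not the released SSEmb code.

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def ssemb_style_score(struct_q, struct_c, sem_q, sem_c, alpha=0.5):
        # struct_*: graph-encoder embeddings of the query / candidate formulas;
        # sem_*: Sentence-BERT embeddings of their surrounding text.
        return alpha * cosine(struct_q, struct_c) + (1.0 - alpha) * cosine(sem_q, sem_c)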
ReFormeR: Learning and Applying Explicit Query Reformulation Patterns
Short papers | Generative IR | Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
We present ReFormeR, a pattern-guided approach for query reformulation. Instead of prompting a language model to generate reformulations of a query directly, ReFormeR first elicits short reformulation patterns from pairs of initial queries and empirically stronger reformulations, consolidates them into a compact library of transferable reformulation patterns, and then selects an appropriate reformulation pattern for a new query given its retrieval context. The selected pattern constrains query reformulation to controlled operations such as sense disambiguation, vocabulary grounding, or discriminative facet addition. Our approach thus makes the reformulation policy explicit through these reformulation patterns, guiding the LLM towards targeted and effective query reformulations. Our extensive experiments on TREC DL 2019, DL 2020, and DL Hard show consistent improvements over classical feedback methods and recent LLM-based query reformulation and expansion approaches.
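To make the pattern-guided idea concrete, a hypothetical sketch might look like this; the pattern names, scoring function, and prompt wording are all assumptions, not the ReFormeR implementation.

    PATTERNS = {
        "sense_disambiguation": "Make the intended sense of ambiguous terms explicit.",
        "vocabulary_grounding": "Replace colloquial terms with the corpus vocabulary.",
        "facet_addition": "Add one discriminative facet that narrows the intent.",
    }

    def select_pattern(query, retrieved_snippets, similarity):
        # similarity(text_a, text_b) -> float; any text matcher can play this role.
        context = query + " " + " ".join(retrieved_snippets)
        return max(PATTERNS, key=lambda name: similarity(PATTERNS[name], context))

    def reformulation_prompt(query, pattern_name):
        return (f"Rewrite the query using only this operation: {PATTERNS[pattern_name]}\n"
                f"Query: {query}\nReformulation:")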
Query Performance Prediction using a Child-focused Definition of Relevance
Short papers | IR applications | IR evaluation | Societally-motivated IR research | User aspects in IR | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Query performance prediction (QPP) methods have primarily been tailored to mainstream users and thus rely on the traditional concept of relevance. For children, however, relevance goes beyond content-based resource-query matching, which is why we gauge the performance of existing QPP methods in estimating the fit of resources retrieved in response to child-formulated queries. Outcomes from our empirical exploration of various QPP methods, using a traditional and a child-focused definition of relevance on two datasets, reveal limitations in how well existing methods adapt to the context of children's information retrieval.
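As one example of the kind of post-retrieval predictor evaluated in such studies, the sketch below computes NQC and correlates predictions with per-query effectiveness. The choice of NQC and Kendall's tau is an assumption for illustration, not a claim about this paper's exact setup.

    import numpy as np
    from scipy.stats import kendalltau

    def nqc(top_scores, corpus_score):
        """Normalized Query Commitment: std of the top-k retrieval scores,
        normalized by the query's corpus-level score."""
        return float(np.std(top_scores) / (abs(corpus_score) + 1e-12))

    def predictor_quality(predictions, per_query_effectiveness):
        # per_query_effectiveness can be computed under either definition of
        # relevance (traditional or child-focused) to compare the two settings.
        tau, _ = kendalltau(predictions, per_query_effectiveness)
        return tau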
Text vs. Speech? Detecting Audio Deepfakes on Instagram
Short papers | Explainability methods | IR applications | Societally-motivated IR research | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
With the growing use of AI, deepfakes are becoming an increasingly prevalent threat. At the same time, the performance of most detectors drops significantly on unseen data, while generation models keep improving and leave fewer artefacts. We examined deepfakes published on Instagram using the SocialDF dataset. In addition to analysing the deepfakes in the frequency domain with audio deepfake detectors, we transcribed the speech and analysed the text (e.g., emotion and topics) and the audio content (e.g., emotion and music genre). We found that audio deepfake detectors struggle to identify real-world deepfakes on Instagram. Furthermore, current audio deepfake detection relies on audio artefacts alone and does not use content. We suggest using both the speech recording and its content: this approach improves results on real-world data and provides an explanation for the classification. Using content information, we outperformed frequency-based detection with an F1-score of 74.3%.
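The proposed combination of signal- and content-based evidence can be sketched as a small late-fusion classifier. The feature set and toy data below are purely illustrative assumptions, not the paper's pipeline.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fused_features(detector_score, text_emotion_score, music_present):
        # detector_score: output of a frequency-domain audio deepfake detector;
        # text_emotion_score / music_present: content features from the transcript and audio.
        return [detector_score, text_emotion_score, float(music_present)]

    # Toy training data (labels: 1 = deepfake, 0 = genuine).
    X = np.array([fused_features(0.9, 0.1, 0), fused_features(0.2, 0.7, 1),
                  fused_features(0.8, 0.6, 1), fused_features(0.1, 0.3, 0)])
    y = np.array([1, 0, 1, 0])
    clf = LogisticRegression().fit(X, y)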
MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes
Short papers | IR applications | Machine learning | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Municipal meeting minutes are official documents of local governance, exhibiting heterogeneous formats and writing styles. Effective information retrieval (IR) requires identifying metadata such as meeting number, date, location, participants, and start/end times, elements that are rarely standardized or easy to extract automatically. Existing named entity recognition (NER) models are ill-suited to this task, as they are not adapted to such domain-specific categories. In this paper, we propose a two-stage pipeline for metadata extraction from municipal minutes. First, a question answering (QA) model identifies the opening and closing text segments containing metadata. Transformer-based models (BERTimbau and XLM-RoBERTa, with and without a CRF layer) are then applied for fine-grained entity extraction and enhanced through delexicalization. To evaluate the proposed pipeline, we benchmark both open-weight (Phi) and closed-weight (Gemini) LLMs, assessing predictive performance, inference cost, and carbon footprint. Our results demonstrate strong in-domain performance, better than that of larger general-purpose LLMs. However, cross-municipality evaluation reveals reduced generalization, reflecting the variability and linguistic complexity of municipal records. This work establishes the first benchmark for metadata extraction from municipal meeting minutes, providing a solid foundation for future research in this domain.
Ricardo Campos Professor, University Of Beira Interior / INESC TEC
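A minimal sketch of the two-stage idea using Hugging Face pipelines is shown below; the default models are placeholders, whereas the paper fine-tunes BERTimbau / XLM-RoBERTa variants for this domain.

    from transformers import pipeline

    # Stage 1: a QA model locates the segment that carries the metadata.
    qa = pipeline("question-answering")          # placeholder default model
    # Stage 2: a token-classification (NER) model labels entities in that segment.
    ner = pipeline("token-classification", aggregation_strategy="simple")

    def extract_metadata(minutes_text):
        opening = qa(question="Which passage opens the meeting?", context=minutes_text)
        segment = minutes_text[opening["start"]:opening["end"]]
        return ner(segment)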
Forward Index Compression for Learned Sparse Retrieval
Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Text retrieval using learned sparse representations of queries and documents has, over the years, evolved into a highly effective approach to search. Thanks to recent advances in approximate nearest neighbor search, with the emergence of highly efficient algorithms such as the inverted index-based Seismic and the graph-based HNSW, retrieval with sparse representations has become viable in practice. In this work, we scrutinize the efficiency of sparse retrieval algorithms and focus particularly on the size of a data structure that is common to all algorithmic flavors and that constitutes a substantial fraction of the overall index size: the forward index. In particular, we seek compression techniques that reduce the storage footprint of the forward index without compromising search quality or inner product computation latency. Examining various integer compression techniques, we find that StreamVByte achieves the best trade-off between memory footprint, retrieval accuracy, and latency. We then improve on StreamVByte by introducing DotVByte, a new algorithm tailored to inner product computation. Experiments on MS MARCO show that our improvements lead to significant space savings while maintaining retrieval efficiency.
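To illustrate the general idea (plain variable-byte coding here, not StreamVByte or the authors' DotVByte), a forward-index entry can store compressed term ids that are decoded on the fly during inner-product computation.

    def vbyte_encode(values):
        out = bytearray()
        for v in values:
            while v >= 128:
                out.append((v & 0x7F) | 0x80)
                v >>= 7
            out.append(v)
        return bytes(out)

    def vbyte_decode(data):
        values, cur, shift = [], 0, 0
        for byte in data:
            cur |= (byte & 0x7F) << shift
            if byte & 0x80:
                shift += 7
            else:
                values.append(cur)
                cur, shift = 0, 0
        return values

    def inner_product(query_weights, doc_term_bytes, doc_weights):
        # query_weights: dict term_id -> weight; the document's term ids are
        # decoded from the compressed forward-index entry as we go.
        return sum(query_weights.get(t, 0.0) * w
                   for t, w in zip(vbyte_decode(doc_term_bytes), doc_weights))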
Revisiting Human-vs-LLM judgments on the TREC Podcast Track
Short papers | Large Language Models | Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Using large language models (LLMs) to annotate relevance is an increasingly important problem in the information retrieval community. While some studies demonstrate that LLMs can achieve high agreement with ground truth (human) judgments, other studies have argued for the opposite conclusion. To the best of our knowledge, these studies have primarily focused on classic ad-hoc text search scenarios. In this paper, we analyze agreement between LLMs and human experts and explore the impact that disagreement has on system rankings. In contrast to prior studies, we focus on a collection composed of audio files transcribed into two-minute segments: the TREC 2020 and 2021 Podcast Tracks. We employ five different LLMs to re-assess all of the query-segment pairs originally annotated by TREC assessors. Furthermore, we re-assess a small subset of pairs on which the LLMs and TREC assessors disagree most, and find that human experts tend to agree with the LLMs more than with the TREC assessors. Our results reinforce Sormunen's insight from 2002 that relying on a single assessor leads to lower agreement.
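Agreement between LLM and TREC judgments can be quantified with a chance-corrected statistic such as Cohen's kappa; the toy labels below are illustrative, and kappa is one common choice rather than necessarily the measure used in this paper.

    from sklearn.metrics import cohen_kappa_score

    trec_judgments = [2, 0, 1, 2, 0, 1]   # toy graded labels from TREC assessors
    llm_judgments  = [2, 0, 0, 2, 1, 1]   # toy LLM labels for the same query-segment pairs
    print(cohen_kappa_score(trec_judgments, llm_judgments, weights="quadratic"))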
Do We Still Need Text for Video Retrieval in the Era of Vision-Language Models?
Short papers | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Effective video retrieval has historically relied heavily on textual descriptions and metadata. However, recent advances in vision-language models (VLMs) prompt the question: Are text features still essential for effective video retrieval? In this work, we investigate this question using a unified multimodal retrieval framework based on advanced VLM embeddings. Evaluating on the comprehensive and multilingual MultiVENT 2.0 dataset from the MAGMaR shared task, we show that multimodal retrieval systems, combining visual frames, audio signals, and textual descriptions, surpass traditional text-only retrieval performance. Remarkably, our results demonstrate that retrieval based solely on non-text modalities (vision and audio) achieves performance comparable to text-based methods, indicating that explicit text input may no longer be strictly necessary.
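A unified multimodal score of the kind described here can be obtained by late fusion over whichever modality embeddings are present; the sketch below is an illustrative simplification, and dropping the text entry yields the non-text variant.

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def video_score(query_emb, modality_embs):
        # modality_embs: {"frames": vec, "audio": vec, "text": vec or None}
        sims = [cosine(query_emb, v) for v in modality_embs.values() if v is not None]
        return sum(sims) / len(sims)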
DARE: A Dialectical Framework for Adversarial and Evidence-Aware RAG
Short papers | Generative IR | IR applications | Large Language Models | Retrieval-Augmented Generation | System aspects | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Retrieval-Augmented Generation (RAG) systems are susceptible to factual inconsistencies when retrieved evidence is conflicting, a common issue with open-domain sources. Prevailing multi-agent approaches attempt to resolve this through unstructured debates that treat all information sources as equally credible. Concurrently, reliability-aware systems address source quality, but typically only as a weighting factor during final aggregation, failing to integrate this crucial signal into the reasoning process itself. This paper proposes DARE (a Dialectical Adversarial RAG Engine), a novel framework that implements a formal dialectical process to resolve such conflicts through an evidence-aware adversarial agent that initiates a structured cross-examination of claims made by other agents. This process forces each claim to be defended against the complete set of source documents, allowing the system to dynamically infer an argument's credibility from its logical resilience. By structuring the debate as a formal dialectic, DARE provides a more robust and principled mechanism for synthesizing truth from unreliable and conflicting information. Our empirical analysis confirms this: DARE outperforms state-of-the-art baselines in terms of exact-match accuracy.
Tannay Basu Indian Institute Of Science Education And Research Bhopal
Multi-Step Semantic Reasoning in Generative Retrieval
Short papers | Generative IR | Search and ranking | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Generative retrieval (GR) models encode a corpus within model parameters and generate relevant document identifiers directly for a given query. While this paradigm shows promise in retrieval tasks, existing GR models struggle with complex queries in numerical contexts, such as those involving semantic reasoning over financial reports, due to limited reasoning capabilities. This limitation leads to suboptimal retrieval accuracy and hinders practical applicability. We propose ReasonGR, a framework designed to enhance multi-step semantic reasoning in numerical contexts within GR. ReasonGR employs a structured prompting strategy combining task-specific instructions with stepwise reasoning guidance to better address complex retrieval queries. Additionally, it integrates a reasoning-focused adaptation module to improve learning of reasoning-related parameters. Experiments on the FinQA dataset, which contains financial queries over complex documents, demonstrate that ReasonGR improves retrieval accuracy and consistency, indicating its potential for advancing GR models in reasoning-intensive retrieval scenarios.
Presenters: Steven Dong, Student, University of Amsterdam. Co-authors: Yubao Tang, University of Amsterdam; Maarten de Rijke, Distinguished University Professor, University of Amsterdam.
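ReasonGR's structured prompting strategy (task-specific instructions plus stepwise reasoning guidance) can be pictured roughly as below; the wording and step list are illustrative assumptions, not the actual prompt.

    def reasongr_style_prompt(query):
        return (
            "Task: generate the identifier of the document that answers the financial query.\n"
            "Reason step by step before answering:\n"
            "1. Identify the entities and reporting period mentioned in the query.\n"
            "2. Determine which numerical quantities are required.\n"
            "3. Decide which report section would contain them.\n"
            f"Query: {query}\n"
            "Document identifier:"
        )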
On the Viability of Exploiting Large Language Models for Misinformation Annotation
Short papers | IR evaluation | Large Language Models | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
This paper investigates the potential of LLMs for automatically annotating the usefulness, supportiveness, and credibility of search results. These aspects, while essential to the construction of misinformation benchmarks, are expensive and difficult to obtain at scale. Our comparative study suggests that, under certain conditions, LLMs can provide reasonable estimates of usefulness and supportiveness. In contrast, credibility judgments generated by LLMs show almost no agreement with human assessments. This raises concerns about using LLMs to assist in the construction of collections that require annotations going beyond relevance.
Evolving Mixture of Low-Rank Experts for Continual User Modeling
Short papers | Machine learning | Recommender systems | User aspects in IR | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Building a user model that incorporates diverse tasks remains a major challenge. While continual learning offers an alternative to multi-task learning by eliminating the need for retraining on all past tasks, prior work trains the whole network backbone along with task-specific masks, which is computationally inefficient. Recent prompt-based parameter-efficient continual user modeling (PECUM) addresses this challenge by training only a few parameters, thus reducing the training cost. However, prompt tuning can yield homogeneous task embeddings and converge slowly compared to adapters. Hence, we propose a novel framework that integrates SVD-decomposed low-rank adapters into continual user modeling, which can be interpreted as a relaxed mixture of rank-1 experts. We further develop a novel attention mechanism that selectively weighs experts trained on semantically similar past tasks, and we jointly learn their attention coefficients along with newly added adapters, enabling interference-free knowledge transfer. We show the effectiveness of our proposed method on two real-world datasets.
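The relaxed mixture of rank-1 experts can be sketched as an attention-weighted sum of outer products added to a frozen weight; the shapes and names below are illustrative assumptions, not the paper's code.

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def adapted_weight(W, U, V, task_query, task_keys):
        # W: (d_out, d_in) frozen backbone weight.
        # U: (d_out, r), V: (d_in, r); columns are the rank-1 experts u_i, v_i.
        # task_keys: (r, k), one key per expert; task_query: (k,) for the new task.
        attn = softmax(task_keys @ task_query)      # attention over past-task experts
        delta = (U * attn) @ V.T                    # sum_i attn_i * u_i v_i^T
        return W + delta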
Personalized Autocompletion of Interactions with LLM-based Chatbots
Short papers | Conversational search and recommendation | Large Language Models | Recommender systems | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Composing messages in chatbot interactions is often time-consuming, making autocompletion an appealing way to reduce user effort. Different users have different preferences and therefore different expectations from autocompletion solutions. We study how personalization can improve the autocompletion process, evaluating four schemes defined along two axes: generation vs. ranking, and prior messages vs. external features. Experiments on the WildChat and PRISM datasets with the Mistral-7B and Phi-3.5-mini models show consistent gains. Our results highlight personalization as a key factor in building effective chatbot autocomplete systems, and assist researchers and practitioners in deciding where and how to invest in improving these solutions.
Evaluating Large Language Models as Domain-Specific Retrieval Agents: A Study on Cybersecurity Challenge Benchmarks
Short papers | IR evaluation | Large Language Models | System aspects | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Large Language Models are increasingly used as retrieval and reasoning agents in specialized domains. This study evaluates their performance on cybersecurity Capture-the-Flag challenges, reframed as structured retrieval tasks where models must infer information from textual and code-based evidence. Using three public benchmarks, NYU CSAW, CyBench, and InterCode-CTF, we compare five recent LLMs within a unified and reproducible evaluation framework. Results show significant variation across datasets and task categories, with performance differences across models. The proposed benchmark provides a foundation for assessing domain-specific retrieval and reasoning.
Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries
Short papers | Large Language Models | Retrieval-Augmented Generation | 01:30 PM - 02:30 PM (Europe/Amsterdam) | 2026/03/30 11:30:00 UTC - 2026/03/30 12:30:00 UTC
Real-world use cases often present RAG systems with complex queries for which relevant information is missing from the corpus or is incomplete. In these settings, RAG systems must be able to reject unanswerable, out-of-scope queries and identify failures of retrieval and multi-hop reasoning. Despite this, existing RAG benchmarks rarely reflect realistic task complexity for multi-hop or out-of-scope questions, which can often be cheated via disconnected reasoning (i.e., solved without genuine multi-hop inference) or require only simple factual recall. This limits the ability of such benchmarks to uncover the limitations of existing RAG systems. To address this gap, we present the first pipeline for automatic, difficulty-controlled creation of uncheatable, realistic, unanswerable, and multi-hop queries (CRUMQs), adaptable to any corpus and domain. We use our pipeline to create CRUMQs over two popular RAG datasets and demonstrate its effectiveness via benchmark experiments on leading retrieval-augmented LLMs. Results show that, compared to prior RAG benchmarks, CRUMQs are highly challenging for RAG systems and achieve up to an 81.0% reduction in cheatability scores. More broadly, our pipeline offers a simple way to enhance benchmark difficulty and realism and to drive development of more capable RAG systems.
William Walden, Human Language Technology Center of Excellence, Johns Hopkins University; Eugene Yang, Research Scientist, Human Language Technology Center of Excellence, Johns Hopkins University