ECIR 2026 · Reproducibility II: Retrieval
2026-03-30, 16:00 - 17:00 (Europe/Amsterdam)
Contact: conference-secretariat@blueboxevents.nl
Fast, Compact, Dynamic Indexing for Learned Sparse Retrieval Systems
Reproducibility · 04:00 PM - 05:00 PM (Europe/Amsterdam) · 2026/03/30 14:00:00 UTC - 2026/03/30 15:00:00 UTC
Learned sparse retrieval (LSR) is an emerging paradigm that uses pretrained language models to assign learned weights to the terms of a document, enabling practitioners to deploy next-generation rankers within their existing lexical retrieval pipelines. Although LSR systems have been found to provide strong increases in effectiveness over traditional statistical approaches, this boost comes at the cost of both indexing and retrieval efficiency. In this work, we explore the application of LSR to a practical online setting where new documents must be indexed and searchable as soon as they arrive. In particular, we create a clean-room re-implementation of the current state-of-the-art linked block dynamic indexing approach, and propose a set of important augmentations that enable efficient online indexing and query processing to generalize to the learned sparse regime. Our results over two traditional and two LSR models, and a multitude of experimental settings, demonstrate the practicality of our approach, allowing new documents to be queried at a reasonable latency while maintaining fast insertion ability.
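To make the setting concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of a dynamic inverted index over learned term weights: newly inserted documents become searchable immediately, and queries are scored by the sparse dot product between query and document weights. The class and method names are assumptions for illustration only.

```python
from collections import defaultdict


class DynamicLSRIndex:
    """Toy inverted index for learned sparse vectors (illustrative only)."""

    def __init__(self):
        # term -> list of (doc_id, learned_weight) postings
        self.postings = defaultdict(list)

    def insert(self, doc_id, term_weights):
        """Append postings so the document is searchable as soon as it arrives."""
        for term, weight in term_weights.items():
            self.postings[term].append((doc_id, weight))

    def search(self, query_weights, k=10):
        """Rank documents by the sparse dot product of query and document weights."""
        scores = defaultdict(float)
        for term, q_w in query_weights.items():
            for doc_id, d_w in self.postings.get(term, []):
                scores[doc_id] += q_w * d_w
        return sorted(scores.items(), key=lambda x: -x[1])[:k]
```

A real system would organize postings into blocks and apply dynamic-pruning optimizations during query processing; the append-only insertion path above is only meant to convey why online indexing and fast retrieval pull in opposite directions.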
Joel Mackenzie, Senior Lecturer, The University of Queensland
Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs"
Reproducibility · 04:00 PM - 05:00 PM (Europe/Amsterdam) · 2026/03/30 14:00:00 UTC - 2026/03/30 15:00:00 UTC
Driven by recent breakthrough advances in neural representation learning, approximate near-neighbor (ANN) search over vector embeddings has emerged as a critical computational workload. With the introduction of the seminal Hierarchical Navigable Small World (HNSW) algorithm, graph-based indexes have established themselves as the overwhelmingly dominant paradigm for efficient and scalable ANN search. As the name suggests, HNSW searches a layered hierarchical graph to quickly identify neighborhoods of similar points to a given query vector. But is this hierarchy even necessary? A rigorous experimental analysis to answer this question would provide valuable insights into the nature of algorithm design for ANN search and motivate directions for future work in this increasingly crucial domain. To that end, we conduct an extensive reproducibility study covering more large-scale datasets than prior investigations of this question. We ultimately find that a flat navigable small world graph retains all of the benefits of HNSW on high-dimensional datasets, with latency and recall performance essentially identical to the original algorithm but with less memory overhead. Furthermore, we go a step beyond prior works and study why the hierarchy of HNSW provides no benefit in high dimensions, hypothesizing that navigable small world graphs contain a well-connected, frequently traversed highway of hub nodes that maintain the same purported function as the hierarchical layers. We present compelling empirical evidence that the Hub Highway Hypothesis holds for real datasets and investigate the mechanisms by which the highway forms. The implications of this hypothesis may also provide future research directions in developing enhancements to graph-based ANN search.
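The core operation the abstract refers to, for both HNSW and a flat navigable small world graph, is a greedy best-first traversal toward the query. The sketch below shows that traversal on a flat graph with no hierarchy; the function name, `ef` parameter, and graph encoding are illustrative assumptions, not the authors' code.

```python
import heapq


def greedy_search(graph, vectors, query, entry, ef=4):
    """Best-first search over a flat navigable small world graph (toy sketch).

    graph:   node -> list of neighbor nodes
    vectors: node -> point (tuple of floats)
    Returns up to `ef` nodes as (squared_distance, node), nearest first.
    """
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(vectors[v], query))

    visited = {entry}
    candidates = [(dist(entry), entry)]          # min-heap: frontier to expand
    best = [(-dist(entry), entry)]               # max-heap: current top-ef results
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0] and len(best) >= ef:  # frontier worse than worst result
            break
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                d_nb = dist(nb)
                if len(best) < ef or d_nb < -best[0][0]:
                    heapq.heappush(candidates, (d_nb, nb))
                    heapq.heappush(best, (-d_nb, nb))
                    if len(best) > ef:
                        heapq.heappop(best)
    return sorted((-d, n) for d, n in best)
```

HNSW runs essentially this loop once per layer, using each layer's result as the next layer's entry point; the paper's finding is that on high-dimensional data the flat traversal alone, seeded from a fixed entry point, already reaches the same neighborhoods, plausibly because highly connected hub nodes play the routing role the upper layers were designed for.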
Multivector Reranking in the Era of Strong First-Stage Retrievers
Reproducibility · 04:00 PM - 05:00 PM (Europe/Amsterdam) · 2026/03/30 14:00:00 UTC - 2026/03/30 15:00:00 UTC
Learned multivector representations power modern search systems with strong retrieval effectiveness, but their real-world use is limited by the high cost of exhaustive token-level retrieval. Therefore, most systems adopt a gather-and-refine strategy, where a lightweight gather phase selects candidates for full scoring. However, this approach requires expensive searches over large token-level indexes and often misses the documents that would rank highest under full similarity. In this paper, we reproduce several state-of-the-art multivector retrieval methods on two publicly available datasets, providing a clear picture of the current multivector retrieval field and observing the inefficiency of token-level gathering. Building on that, we show that replacing the token-level gather phase with a single-vector document retriever, specifically a learned sparse retriever (LSR), produces a smaller and more semantically coherent candidate set. This recasts the gather-and-refine pipeline into the well-established two-stage retrieval architecture. As retrieval latency decreases, query encoding with two neural encoders becomes the dominant computational bottleneck. To mitigate this, we integrate recent inference-free LSR methods, demonstrating that they preserve the retrieval effectiveness of the dual-encoder pipeline while substantially reducing query encoding time. Finally, we investigate multiple reranking configurations that balance efficiency, memory, and effectiveness, and we introduce two optimization techniques that prune low-quality candidates early. Empirical results show that these techniques improve retrieval efficiency by up to 1.8 times with no loss in quality. Overall, our two-stage approach achieves a speedup of over 24 times compared to state-of-the-art multivector retrieval systems, while maintaining comparable or superior retrieval quality.
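The two-stage pipeline described above can be sketched in a few lines: a single-vector first stage produces a candidate set, and an exact late-interaction (MaxSim) score refines it. This is a schematic sketch under assumed data layouts (score dictionaries and per-document token vectors), not the paper's system; the function names and cutoff parameters are hypothetical.

```python
def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query token contributes its best-matching
    document token's dot product, summed over all query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


def gather_and_refine(lsr_scores, doc_token_vecs, query_token_vecs,
                      k_gather=100, k_final=10):
    """Stage 1 (gather): take the top candidates from a single-vector LSR ranking.
    Stage 2 (refine): rerank those candidates with the exact MaxSim score."""
    candidates = sorted(lsr_scores, key=lsr_scores.get, reverse=True)[:k_gather]
    reranked = sorted(
        candidates,
        key=lambda d: maxsim(query_token_vecs, doc_token_vecs[d]),
        reverse=True,
    )
    return reranked[:k_final]
```

The efficiency argument in the abstract corresponds to the shapes here: the expensive `maxsim` computation touches only `k_gather` documents selected by the cheap first stage, rather than every posting in a token-level index.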
Temporal Fact Conflicts in LLMs: Reproducibility Insights from Unifying DYNAMICQA and MULAN
Reproducibility · 04:00 PM - 05:00 PM (Europe/Amsterdam) · 2026/03/30 14:00:00 UTC - 2026/03/30 15:00:00 UTC
Large Language Models (LLMs) often struggle with temporal fact conflicts due to outdated or evolving information in their training data. Two recent studies with accompanying datasets report opposite conclusions on whether external context can effectively resolve such conflicts. DYNAMICQA evaluates how effective external context is in shifting the model's output distribution, finding that temporal facts are more resistant to change. In contrast, MULAN examines how often external context changes memorised facts, concluding that temporal facts are easier to update. In this reproducibility paper, we first reproduce experiments from both benchmarks. We then reproduce the experiments of each study on the dataset of the other to investigate the source of their disagreement. To enable direct comparison of findings, we standardise both datasets to align with the evaluation settings of each study. Importantly, using an LLM, we synthetically generate realistic natural language contexts to replace MULAN's programmatically constructed statements when reproducing the findings of DYNAMICQA. Our analysis reveals strong dataset dependence: MULAN's findings generalise under both methodological frameworks, whereas applying MULAN's evaluation to DYNAMICQA yields mixed outcomes. Finally, while the original studies only considered 7B LLMs, we reproduce these experiments across LLMs of varying sizes, revealing how scale influences the encoding and updating of temporal facts. Our results highlight how dataset design, evaluation metrics, and model size shape LLM behaviour in the presence of temporal knowledge conflicts. Our code and data are publicly available at https://anonymous.4open.science/r/Temporal-Conflict-Repro-30B7.