Open Web Indexes for Remote Querying

This abstract has open access

Abstract Summary

We propose to redesign how Web-scale indexes are accessed. Instead of running custom search engine software and hiding access behind an API or a user interface, we store the inverted file in a standard, open-source file format (Parquet) on publicly accessible (and cheap) object storage. Users perform retrieval by fetching the postings for their query terms and ranking the results locally. By using standard data formats and cloud infrastructure, we (a) natively support a wide range of downstream clients and (b) directly benefit from improvements in analytical query processing engines. We demonstrate the viability of this approach through a series of experiments on the ClueWeb corpora. Although our approach naturally has higher latency than dedicated search APIs, we show that results can still be obtained in reasonable time (usually within 10 to 20 seconds). We therefore argue that the increased accessibility and reduced deployment costs make this setup well suited to cooperation in IR research through publicly shared large indexes.
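The client-side retrieval step described above (fetch the postings for the query terms, then rank locally) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the postings schema (term, doc_id, tf), the document-length table, the toy data, and the choice of BM25 with k1=0.9 and b=0.4 are all hypothetical, since the abstract specifies neither the Parquet schema nor the ranking function. The remote Parquet read is replaced by an in-memory filter so the example runs offline.

```python
import math
from collections import defaultdict

# Hypothetical postings rows (term, doc_id, tf). In the described setup these
# would come from a Parquet file on object storage, e.g. via
# pyarrow.parquet.read_table(url, filters=[("term", "in", terms)]).
POSTINGS = [
    ("open", "d1", 2), ("open", "d2", 1),
    ("index", "d1", 1), ("index", "d3", 3),
    ("parquet", "d2", 4),
]
# Hypothetical per-document lengths, assumed to be published with the index.
DOC_LENGTHS = {"d1": 10, "d2": 8, "d3": 12}


def fetch_postings(terms):
    """Stand-in for the remote fetch: keep only the rows for the query terms."""
    wanted = set(terms)
    return [row for row in POSTINGS if row[0] in wanted]


def bm25_rank(postings, doc_lengths, k1=0.9, b=0.4):
    """Score documents locally with BM25 (an assumed ranking function)."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    # Document frequency per term; exact because all postings of each
    # query term were fetched.
    df = defaultdict(int)
    for term, _, _ in postings:
        df[term] += 1
    scores = defaultdict(float)
    for term, doc_id, tf in postings:
        # Lucene-style BM25 idf (always positive).
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
        scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


query = ["open", "index"]
results = bm25_rank(fetch_postings(query), DOC_LENGTHS)
for doc_id, score in results:
    print(f"{doc_id}\t{score:.3f}")
```

Only the postings for the query terms ever reach the client, so transfer cost scales with the query rather than the index. One plausible design choice, not stated in the abstract, is to sort the postings by term so that Parquet's per-row-group min/max statistics let a reader skip most of the file.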

Abstract ID: NKDR43

Submission Type

Submission Topics

Search and ranking; System aspects

Associated Sessions

RAG: Retrieval Utility, Scaling & Infrastructure

Author and Co-Authors

Gijs Hendriksen, PhD Candidate, Radboud University

Prof. Djoerd Hiemstra, Radboud University (Mastodon: https://idf.social/@djoerd)

Arjen De Vries, Radboud University

Abstracts With Same Type

Abstract ID | Abstract Title | Abstract Topic | Submission Type | Primary Author

NKDR52 | An Empirical Study of Model Casing in Learned Sparse Retrieval | Search and ranking | Full papers | Emmanouil Georgios Lionis

NKDR58 | Breaking Flat: A Generalised Query Performance Prediction Evaluation Framework | | Full papers | Ms. PAYEL SANTRA

NKDR51 | Bribery-Resistant Ranking Systems: A Multipartite User-Agnostic Framework for AI Act Compliance | Search and ranking; Societally-motivated IR research | Full papers | Martim Baltazar

NKDR15 | Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare | Applications; Machine Learning and Large Language Models | Full papers | Saeedeh Javadi

NKDR49 | Cross-Sensory Brain Passage Retrieval: Scaling Beyond Visual to Audio | Societally-motivated IR research; User aspects in IR | Full papers | Niall McGuire

NKDR177 | Event-aware Video Corpus Moment Retrieval | Applications; Search and ranking | Full papers | Danyang Hou

NKDR184 | Event-aware Video Corpus Moment Retrieval | Applications; Evaluation research | Full papers | Danyang Hou

NKDR193 | Event-aware Video Corpus Moment Retrieval | Applications; Search and ranking | Full papers | Danyang Hou

NKDR39 | ExpertMix: Aspect and Severity Detection in Conversational Complaints | Applications; Machine Learning and Large Language Models | Full papers | Sarmistha Das
