Open Web Indexes for Remote Querying

This abstract has open access
Abstract Summary
We propose to redesign access to Web-scale indexes. Instead of using custom search engine software and hiding access behind an API or a user interface, we store the inverted file in a standard, open-source file format (Parquet) on publicly accessible, inexpensive object storage. Users perform retrieval by fetching the relevant postings for the query terms and ranking them locally. By using standard data formats and cloud infrastructure, we (a) natively support a wide range of downstream clients, and (b) benefit directly from improvements in analytical query processing engines. We show the viability of our approach through a series of experiments using the ClueWeb corpora. While our approach naturally has a higher latency than dedicated search APIs, we show that results can still be obtained in reasonable time (usually within 10-20 seconds). We therefore argue that the increased accessibility and decreased deployment costs make this a suitable setup for cooperation in IR research through publicly shared large indexes.
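The retrieval flow described above (fetch the postings for each query term, then rank on the client) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The in-memory `POSTINGS` and `DOC_LENGTHS` tables, the `fetch_postings` helper, and the choice of BM25 as the local ranking function are all assumptions made for the example. In the proposed setup, the postings would instead be read from Parquet files on object storage.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for the remote index. In the proposed setup these rows
# would be fetched from Parquet files on object storage; here they live in memory.
# Each posting is a (doc_id, term_frequency) pair.
POSTINGS = {
    "open": [("d1", 2), ("d3", 1)],
    "web": [("d1", 1), ("d2", 3), ("d3", 1)],
    "index": [("d2", 1), ("d3", 2)],
}
DOC_LENGTHS = {"d1": 100, "d2": 120, "d3": 80}


def fetch_postings(term):
    """Return the postings list for one term (empty if the term is unknown)."""
    return POSTINGS.get(term, [])


def bm25_rank(query_terms, doc_lengths, k1=1.2, b=0.75, top_k=10):
    """Score documents on the client with BM25, using only the fetched postings."""
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query_terms:
        postings = fetch_postings(term)
        if not postings:
            continue  # term not in the index: contributes nothing
        df = len(postings)  # document frequency of the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings:
            # Length normalisation: longer documents are penalised.
            norm = k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


results = bm25_rank(["open", "index"], DOC_LENGTHS)
for doc_id, score in results:
    print(f"{doc_id}\t{score:.3f}")
```

Only the postings lists of the query terms are touched, which is what makes the approach feasible over remote storage: the client never needs to download the full index.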
Abstract ID: NKDR43
Submission Type
PhD Candidate, Radboud University
Find me on Mastodon: https://idf.social/@djoerd, Radboud University
Radboud University
