An Open SERP Mining Infrastructure for the Archive Query Log

Abstract Summary
Query logs are key resources for studying how users interact with search engines and for improving retrieval effectiveness, but they are rarely publicly available. In the past, search providers shared only small subsets of their logs, both to curb competition and to protect user privacy. The Archive Query Log (AQL) aims to provide an open alternative by mining query logs from archived search engine result pages (SERPs). While the AQL-22 prototype demonstrated the feasibility of this approach, its limited scalability and maintainability hindered widespread adoption by the research community. We re-implement the crawling and parsing of the AQL on open infrastructure, using standard tools and a new framework for storing SERPs, and following the FAIR data principles. The extended and continuously crawled AQL-25 corpus contains 553 million SERPs from 775 search providers, mined from six web archives; so far, 223 million of these SERPs (40%; 44 TB) have been downloaded and parsed. We demonstrate the new AQL mining framework in two typical analysis scenarios: a temporal analysis, now implemented as a single Elasticsearch query, and a batch-processing analysis using Ray. Our resource equips researchers with all the tools needed to analyze SERPs.
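To illustrate what a temporal analysis "as a single Elasticsearch query" can look like, here is a minimal sketch that builds an Elasticsearch search request body counting SERPs per calendar month with a `date_histogram` aggregation. It assumes a hypothetical index whose documents carry a date field named `timestamp` and a keyword field named `provider_domain`; the actual AQL-25 index mapping may use different names, and this is not the authors' query.

```python
import json


def temporal_histogram_query(interval="month", time_field="timestamp", provider=None):
    """Build an Elasticsearch request body that counts SERPs per time bucket.

    Field names ("timestamp", "provider_domain") are hypothetical assumptions,
    not the confirmed AQL-25 mapping.
    """
    body = {
        "size": 0,  # return only aggregation buckets, no individual SERP hits
        "aggs": {
            "serps_over_time": {
                "date_histogram": {
                    "field": time_field,
                    "calendar_interval": interval,  # e.g. "month", "year"
                }
            }
        },
    }
    if provider is not None:
        # Optionally restrict the histogram to a single search provider.
        body["query"] = {"term": {"provider_domain": provider}}
    return body


if __name__ == "__main__":
    print(json.dumps(temporal_histogram_query(provider="google.com"), indent=2))
```

The resulting JSON can be sent to the `_search` endpoint of the (hypothetical) SERP index; because `size` is 0, Elasticsearch computes the per-month counts server-side and returns only the buckets, so no SERPs need to be transferred to the client.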
Abstract ID:
NKDR132
Submission Type
Resource
Friedrich-Schiller-Universität Jena
University of Kassel
Assistant Professor
University of Tübingen
University of Kassel, Hessian.AI, and ScaDS.AI

Abstracts With Same Type

Abstract ID | Abstract Topic | Submission Type | Primary Author
NKDR140 | User aspects in IR | Resource | Saber Zerhoudi
NKDR129 | Machine Learning and Large Language Models; Societally-motivated IR research | Resource | Ricardo Campos
NKDR131 | Machine Learning and Large Language Models; Societally-motivated IR research | Resource | Ricardo Campos
NKDR93 | Evaluation research; Machine Learning and Large Language Models; Search and ranking | Resource | Laura Caspari
NKDR125 | Evaluation research; Recommender systems | Resource | Ludovico Boratto