Website Segmentation Beyond Structure: A Benchmark on Functional and Digital Maturity Classes

This abstract has open access
Abstract Summary
Segmentation is a crucial prerequisite for effective and efficient information retrieval on websites, as it enables the structured interpretation of heterogeneous content. Recently, a novel dataset has been released that provides two complementary segmentation schemes: a broad functional segmentation and a niche segmentation based on website digital maturity. While the former captures general structural elements, the latter targets a more specialized classification task, creating an interesting challenge for state-of-the-art segmentation approaches. In this paper, we present the first comprehensive evaluation of visual and textual models on this dataset, ranging from basic rule-based methods to large language models. We assess their performance across both segmentation frameworks using multiple evaluation scores. Our results show that visual approaches, despite limited training data, are generally more successful at generalizing across website structures and consistently outperform textual models. Notably, ResNet18 achieves the strongest performance in both functional and maturity-based segmentation, which we attribute to its ability to effectively capture and integrate both global and local context of a webpage. These findings establish important baselines for future research and underscore the importance of developing models that can perform robustly in niche settings and under data-scarce conditions.
Abstract ID :
NKDR63
Submission Type
Institute of Computer Science, Zurich University of Applied Sciences (ZHAW)
Zurich University of Applied Sciences
Institute of Computer Science, Zurich University of Applied Sciences (ZHAW)
University of Konstanz, Germany and Thurgau Institute for Digital Transformation, Switzerland
