Abstract
Transformer-based cross-encoders such as MonoT5 achieve state-of-the-art performance on several retrieval tasks. MonoT5 is a sequence-to-sequence model trained on MS MARCO for passage re-ranking: given a query and a passage, it predicts the relevance of the passage to the query. To understand what MonoT5 learns, we analyse how its parameters are updated during training. We observe that the largest shifts occur in a small set of parameters, less than 1% of the model, while the rest of the network remains largely unchanged. Motivated by this finding, we propose Light-MonoT5, a parameter-efficient variant of MonoT5 that updates only this small set of parameters during training and leaves the rest of the network frozen. Extensive evaluation on both in-domain and out-of-domain benchmarks shows that Light-MonoT5 is statistically equivalent in effectiveness to MonoT5. Since relevance can be captured by updating only a subset of T5 parameters, we hypothesise that MonoT5, which updates all of the original model's parameters, primarily learns to evaluate passage quality rather than to explicitly assess the relevance of a passage to the query. To test this hypothesis, we employ QT5, a T5-based quality estimation model, to prune low-quality passages before indexing. On the pruned collection, Light-MonoT5 achieves performance on par with MonoT5, indicating that MonoT5's strong performance is largely attributable to quality assessment, with minimal adaptation required once low-quality content is removed.
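To make the parameter-efficient idea concrete, the sketch below illustrates how one could freeze a T5 checkpoint and re-enable gradients only for a small subset of parameters, as Light-MonoT5 does. Which parameters are actually selected is defined in the paper; the choice of layer-norm weights here is purely an illustrative assumption, not the paper's recipe.

```python
# Minimal sketch (assumption: HuggingFace transformers; the selected subset is
# NOT the paper's actual choice, layer norms are used only as a placeholder).
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Freeze every parameter of the pre-trained model.
for param in model.parameters():
    param.requires_grad = False

# Re-enable gradients for a small, illustrative subset of parameters.
trainable = 0
for name, param in model.named_parameters():
    if "layer_norm" in name:  # hypothetical subset; Light-MonoT5's subset differs
        param.requires_grad = True
        trainable += param.numel()

total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,} "
      f"({100 * trainable / total:.2f}%)")
```

Training then proceeds with a standard re-ranking objective; only the unfrozen parameters receive gradient updates, keeping the memory and storage footprint of fine-tuning well below that of full MonoT5 training.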