Abstract
Instruction-aware retrievers are retrieval models that use natural language instructions to specify fine-grained retrieval conditions beyond the original query. These retrievers, built on large language models, are trained with contrastive learning to capture both the relevance between a query and a document and the relevance between an instruction-augmented query and a document. However, during training, instruction-augmented queries are learned solely from relevance signals associated with related documents, without explicitly considering the original query. As a result, retrievers often struggle to distinguish between the query and the instruction, producing results that either do not follow the instruction or are irrelevant to the original query. To address this issue, we propose a query-preserving regularization method integrated into contrastive learning. The proposed method aligns the document relevance distributions induced by the original query and by the internal query representation within the instruction-augmented query, ensuring that the model preserves the original query's semantics while using the instruction to guide relevance learning from related documents. Experiments on two instruction-following retrieval benchmarks demonstrate that our method improves over the existing state-of-the-art instruction-aware retriever. Furthermore, our model achieves strong performance on standard retrieval tasks without instructions, in both in-domain and out-of-domain scenarios.
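To make the abstract's training objective concrete, below is a minimal PyTorch sketch of one plausible instantiation: InfoNCE contrastive losses for both the original and instruction-augmented queries, plus a query-preserving regularizer that aligns the document-relevance distribution of the original query with that of the query representation extracted from inside the instruction-augmented input. All names (`encode`-style embeddings `q_orig`, `q_aug`, `q_in_aug`, the temperature `tau`, the weight `lambda_qp`) and the choice of KL divergence as the alignment measure are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(q_emb, d_embs, pos_idx, tau=0.05):
    """Standard contrastive loss over a batch of candidate documents.

    q_emb:  (B, H) query embeddings
    d_embs: (N, H) document embeddings (positives + in-batch negatives)
    pos_idx: (B,) index of each query's positive document
    """
    sims = q_emb @ d_embs.T / tau          # (B, N) similarity logits
    return F.cross_entropy(sims, pos_idx)

def query_preserving_loss(q_orig, q_in_aug, d_embs, tau=0.05):
    """Align the relevance distribution induced by the original query with
    the one induced by the query representation taken from within the
    instruction-augmented input (KL divergence is an assumed choice)."""
    log_p = F.log_softmax(q_in_aug @ d_embs.T / tau, dim=-1)  # regularized
    q = F.softmax(q_orig @ d_embs.T / tau, dim=-1)            # reference
    return F.kl_div(log_p, q, reduction="batchmean")

def training_loss(q_orig, q_aug, q_in_aug, d_embs, pos_idx, lambda_qp=0.1):
    # Contrastive terms for the plain query and the instruction-augmented query.
    loss = info_nce(q_orig, d_embs, pos_idx) + info_nce(q_aug, d_embs, pos_idx)
    # Regularizer keeping the original query's semantics intact under the instruction.
    return loss + lambda_qp * query_preserving_loss(q_orig, q_in_aug, d_embs)
```

Here `q_in_aug` would be obtained by pooling only the query tokens of the instruction-augmented input, so the regularizer penalizes the model when the instruction distorts the query's own relevance profile over the document set; the hyperparameter `lambda_qp` trades off instruction following against query preservation.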