Abstract
Information retrieval systems are becoming increasingly multimodal, and one natural modality for human information interaction is audio. Audio data is rapidly expanding in both volume and popularity: as of September 2025, there are 4.52 million podcasts available worldwide, with over 584 million listeners, indicating global growth in attention to this industry. However, searching this vast space of data raises several challenges, including content diversity, length variability, and indexing considerations such as retrieval segmentation and the choice between text and audio retrieval modalities. In this work, we aim to address some of these challenges and provide a useful base solution for the information retrieval community interested in this domain. We plan to achieve this goal through three main projects. In the first, we develop and consolidate the relevant text-based techniques and resources; this includes building baseline retrieval systems, identifying suitable automatic speech recognition (ASR) models, and surveying and expanding existing test collections. In the second, we conduct a systematic failure analysis of the systems resulting from the first project, with the aim of highlighting common failure cases and providing practical recommendations for avoiding them. Finally, we plan to devise novel approaches informed by these recommendations, develop audio-based solutions, and explore the effect of fusing the best of both modalities, i.e., text and audio.