
Tutorial on Mechanistic Interpretability

Session Information

This tutorial introduces mechanistic interpretability, a growing research area within the broader interpretability community that seeks to reverse-engineer model components to understand how neural models perform tasks. While this area has rapidly advanced in NLP, yielding insights into the inner workings of large Transformer-based models and enabling model diagnostics, controllability, and safety, it remains largely unexplored in IR.

This tutorial provides a foundational overview of mechanistic interpretability in NLP, covering its key goals and core methods. We then zoom in on its early applications in IR, examining the few existing studies in depth and discussing how these methods can be adapted to retrieval settings. Through an interactive coding session, participants will gain a practical understanding of how to design, implement, and analyze mechanistic interpretability experiments.
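
To give a flavor of the kind of experiment the coding session covers, the following is a minimal activation-patching sketch in PyTorch. It is illustrative only and not taken from the tutorial materials: the model (GPT-2), the choice of block to patch, and the prompts are arbitrary assumptions. The idea is to cache an activation from a "clean" run, splice it into a "corrupted" run, and measure how much the patch restores the clean behavior.

# A minimal activation-patching sketch, illustrating one core method of
# mechanistic interpretability. The model (GPT-2), the patched block, and the
# prompts are arbitrary illustrative choices, not the tutorial's materials.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tokenizer("The Eiffel Tower is in", return_tensors="pt")
corrupt = tokenizer("The Colosseum is in", return_tensors="pt")

block = model.transformer.h[6]  # an arbitrary mid-network Transformer block
cache = {}

def save_hook(module, inputs, output):
    # Cache the block's output hidden states from the clean run.
    cache["clean"] = output[0].detach()

def patch_hook(module, inputs, output):
    # Splice the clean activation into the corrupted run at the last position.
    patched = output[0].clone()
    patched[:, -1, :] = cache["clean"][:, -1, :]
    return (patched,) + output[1:]

with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    model(**clean)
    handle.remove()

    corrupt_logits = model(**corrupt).logits[0, -1]

    handle = block.register_forward_hook(patch_hook)
    patched_logits = model(**corrupt).logits[0, -1]
    handle.remove()

# If the patched activation carries the "Eiffel Tower" information, the logit
# of " Paris" should rise relative to the unpatched corrupted run.
paris_id = tokenizer(" Paris")["input_ids"][0]
print("logit gain for ' Paris':",
      (patched_logits[paris_id] - corrupt_logits[paris_id]).item())

In retrieval settings the same pattern carries over, with a neural ranker's relevance score replacing next-token logits as the behavioral metric.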

By the end, attendees will be equipped with the conceptual and practical foundation needed to initiate their own research and help strengthen the emerging interpretability and explainability community within IR.

Learning Objectives

  • Understand the key goals and motivations of mechanistic interpretability as a research area.
  • Become familiar with the core methods that have led to influential findings in ML and NLP.
  • Explore how these methods can be applied to IR and identify open directions for future work.
  • Gain hands-on experience in designing, implementing, and analyzing mechanistic interpretability experiments.


Website: https://mech-interp-tutorial-ecir26.github.io/

Mar 29, 2026, 09:00–12:30 (Europe/Amsterdam)
Venue: Commissiekamer 3


Contact: conference-secretariat@blueboxevents.nl (ECIR 2026 conference secretariat)

Presenters

Catherine Chen, PhD Candidate, Brown University

Co-Authors

Maria Heuss, Assistant Professor, University of Amsterdam
Carsten Eickhoff, University of Tübingen
