
Tutorial on Mechanistic Interpretability

Session Information

This tutorial introduces mechanistic interpretability, a growing research area within the broader interpretability community that seeks to reverse-engineer model components to understand how neural models perform tasks. While this area has rapidly advanced in NLP, yielding insights into the inner workings of large Transformer-based models and enabling model diagnostics, controllability, and safety, it remains largely unexplored in IR.

This tutorial provides a foundational overview of mechanistic interpretability in NLP, covering its key goals and core methods. We then zoom in on its early applications in IR, examining the few existing studies in depth and discussing how these methods can be adapted to retrieval settings. Through an interactive coding session, participants will gain a practical understanding of how to design, implement, and analyze mechanistic interpretability experiments.
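
To give a flavor of the kind of experiment the coding session covers, the following is a minimal activation-patching sketch in PyTorch. It is illustrative only and not taken from the tutorial materials: the model (GPT-2), the choice of block to patch, and the prompts are arbitrary assumptions. The idea is to cache an activation from a "clean" run, splice it into a "corrupted" run, and measure how much the patch restores the clean behavior.

# A minimal activation-patching sketch, illustrating one core method of
# mechanistic interpretability. The model (GPT-2), the patched block, and the
# prompts are arbitrary illustrative choices, not the tutorial's materials.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tokenizer("The Eiffel Tower is in", return_tensors="pt")
corrupt = tokenizer("The Colosseum is in", return_tensors="pt")

block = model.transformer.h[6]  # an arbitrary mid-network Transformer block
cache = {}

def save_hook(module, inputs, output):
    # Cache the block's output hidden states from the clean run.
    cache["clean"] = output[0].detach()

def patch_hook(module, inputs, output):
    # Splice the clean activation into the corrupted run at the last position.
    patched = output[0].clone()
    patched[:, -1, :] = cache["clean"][:, -1, :]
    return (patched,) + output[1:]

with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    model(**clean)
    handle.remove()

    corrupt_logits = model(**corrupt).logits[0, -1]

    handle = block.register_forward_hook(patch_hook)
    patched_logits = model(**corrupt).logits[0, -1]
    handle.remove()

# If the patched activation carries the "Eiffel Tower" information, the logit
# of " Paris" should rise relative to the unpatched corrupted run.
paris_id = tokenizer(" Paris")["input_ids"][0]
print("logit gain for ' Paris':",
      (patched_logits[paris_id] - corrupt_logits[paris_id]).item())

In retrieval settings the same pattern carries over, with a neural ranker's relevance score replacing next-token logits as the behavioral metric.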

By the end, attendees will be equipped with the conceptual and practical foundation needed to initiate their own research and help strengthen the emerging interpretability and explainability community within IR.

Learning Objectives

  • Understand the key goals and motivations of mechanistic interpretability as a research area.
  • Become familiar with the core methods that have led to influential findings in ML and NLP.
  • Explore how these methods can be applied to IR and identify open directions for future work.
  • Gain hands-on experience in designing, implementing, and analyzing mechanistic interpretability experiments.


Website: https://mech-interp-tutorial-ecir26.github.io/

Mar 29, 2026, 09:00–12:30 (Europe/Amsterdam)
Venue: Commissiekamer 3


Contact: conference-secretariat@blueboxevents.nl (ECIR 2026 conference secretariat)

Presenters

Catherine Chen, PhD Candidate, Brown University

Co-Authors

Maria Heuss, Assistant Professor, University of Amsterdam
Carsten Eickhoff, University of Tübingen
