Debiasing CLIP with Neural Interventions

This abstract has open access
Abstract Summary
This paper presents an inference-time method for mitigating demographic bias in CLIP-like vision-language models through targeted neural interventions on their internal attention mechanisms. We first identify "expert" attention heads that encode demographic information by systematically analyzing CLIP's internal representations in response to labeled inputs. At inference time, we intervene on these heads, either replacing their activations with demographic prototypes or neutralizing them (zero ablation). We intervene specifically at the CLS token because it aggregates information globally across image patches and directly determines the final image embedding. Across multiple evaluation frameworks, our results show that these targeted interventions can significantly reduce both gender and ethnicity biases in cross-modal retrieval and zero-shot classification without compromising model performance.
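The two interventions described in the abstract (prototype replacement and zero ablation, applied only at the CLS token) can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the nested-list layout `acts[token][head]`, the helper names `mean_prototype` and `intervene_on_cls`, and the choice to define a prototype as the mean of a head's CLS activations over labeled examples are all assumptions made for illustration. Plain Python lists are used so the example has no dependencies.

```python
from typing import Dict, List, Optional

Vector = List[float]
# Hypothetical layout for one layer's per-head attention outputs:
# acts[token][head] is a vector of length head_dim.
LayerActs = List[List[Vector]]

CLS_INDEX = 0  # CLIP's ViT image encoder places the class token first


def mean_prototype(samples: List[Vector]) -> Vector:
    """Average a head's CLS activations over labeled examples.

    Defining the prototype as a mean is an assumption of this sketch.
    """
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def intervene_on_cls(
    acts: LayerActs,
    expert_heads: List[int],
    prototypes: Optional[Dict[int, Vector]] = None,
) -> LayerActs:
    """Return a copy of acts with expert heads overwritten at the CLS token.

    Heads listed in expert_heads receive their prototype when one is given;
    otherwise they are zero-ablated. Patch tokens and non-expert heads are
    left untouched.
    """
    out = [[list(vec) for vec in token] for token in acts]  # deep copy
    for head in expert_heads:
        head_dim = len(out[CLS_INDEX][head])
        if prototypes is not None and head in prototypes:
            replacement = list(prototypes[head])
        else:
            replacement = [0.0] * head_dim
        out[CLS_INDEX][head] = replacement
    return out


# Toy layer: 3 tokens (CLS + 2 patches), 2 heads, head_dim = 2.
acts = [
    [[1.0, 2.0], [3.0, 4.0]],  # CLS token
    [[5.0, 6.0], [7.0, 8.0]],  # patch 1
    [[9.0, 1.0], [2.0, 3.0]],  # patch 2
]

# Prototype for head 1 from two made-up labeled CLS activations.
proto = mean_prototype([[0.0, 2.0], [2.0, 4.0]])

ablated = intervene_on_cls(acts, expert_heads=[1])
swapped = intervene_on_cls(acts, expert_heads=[1], prototypes={1: proto})
```

In an actual model, the same overwrite would presumably be applied inside the attention layer, to the per-head outputs before they are combined by the output projection; how the expert heads are selected is not specified in the abstract and is left out of this sketch.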
Abstract ID: NKDR191

PhD student, Computer Vision Center
Researcher, Computer Vision Center