Abstract
This paper presents an inference-time method to mitigate demographic bias in CLIP-like vision-language models through targeted interventions in their internal attention mechanisms. We first identify ``expert'' attention heads that encode demographic information by systematically analyzing CLIP's internal representations in response to labeled inputs. At inference, we intervene on these heads, either replacing their activations with demographic prototypes or neutralizing them via zero ablation. We intervene specifically at the CLS token because it aggregates information globally across image patches and directly determines the final image embedding. Results across multiple evaluation frameworks show that these targeted interventions significantly reduce both gender and ethnicity biases in cross-modal retrieval and zero-shot classification without compromising model performance.
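The core intervention described above can be sketched as a simple tensor operation. The following is a minimal illustration, not the paper's implementation: it assumes per-head activations are available as an array of shape `(batch, seq_len, n_heads, head_dim)` with the CLS token at position 0, and the function name `intervene_heads` and its parameters are hypothetical stand-ins for however the actual model exposes its attention outputs.

```python
import numpy as np

def intervene_heads(head_acts, expert_heads, prototype=None, cls_index=0):
    """Intervene on 'expert' attention heads at the CLS token.

    head_acts: (batch, seq_len, n_heads, head_dim) per-head activations.
    expert_heads: indices of heads found to encode demographic attributes.
    prototype: optional (head_dim,) demographic prototype vector; when None,
        the expert heads are zero-ablated instead of replaced.
    cls_index: position of the CLS token in the sequence (assumed 0 here).
    """
    out = head_acts.copy()
    for h in expert_heads:
        if prototype is None:
            # Zero ablation: neutralize the head's contribution at CLS.
            out[:, cls_index, h, :] = 0.0
        else:
            # Prototype replacement: overwrite with a neutral demographic prototype.
            out[:, cls_index, h, :] = prototype
    return out

# Example: ablate heads 1 and 3 at the CLS token only.
acts = np.ones((2, 5, 4, 8))
ablated = intervene_heads(acts, expert_heads=[1, 3])
```

Note that only the CLS position is modified; patch-token activations at other sequence positions pass through unchanged, which is what keeps the rest of the representation intact.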