From Global to Local: Social Bias Transfer in CLIP
October 1, 2025
Ryan Ramos
Yusuke Hirota
Yuta Nakashima
Noa Garcia
Abstract
CLIP models are often used as backbones for training new models on different vision-language tasks. However, these encoders exhibit social biases in their representation spaces, raising the question of whether such biases have harmful effects when the encoders are used to train models for downstream applications. We investigate the mechanics of pre-training bias in CLIP and its potential to affect downstream bias via bias transfer. First, we examine how pre-training bias levels change between global and local views of the data. Second, we train multiple models with different CLIP backbones and search for correlations between pre-training biases and downstream biases across multiple social attributes and bias metrics. Last, we investigate why the correlation trends appear as they do, based on the common paradigm of adapting backbones to frozen language models. Our experiments show 1) that pre-training bias measurement is highly data-dependent; 2) a lack of consistency in bias transfer trends between pre-training and downstream bias; and 3) that backbones converge in their representation spaces under the current paradigm for training new models for downstream tasks.
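To make the idea of measuring bias in a representation space concrete, here is a minimal sketch of one common family of bias metrics: an association score that compares how strongly a set of image embeddings aligns with one attribute's text embedding versus another's. This is an illustrative example, not the paper's specific metric; the toy 3-d vectors, the prompt strings in the comments, and the function name `association_bias` are all assumptions for demonstration (real CLIP embeddings are 512- or 768-dimensional and L2-normalised).

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association_bias(image_embs, text_emb_a, text_emb_b):
    """Mean difference in cosine similarity toward attribute A vs. attribute B.

    A score near 0 suggests no systematic preference; a positive score
    means the images, on average, lie closer to attribute A's embedding.
    (Illustrative metric, not the specific one used in the paper.)
    """
    diffs = [cosine(v, text_emb_a) - cosine(v, text_emb_b) for v in image_embs]
    return float(np.mean(diffs))

# Toy 3-d stand-ins for embeddings, chosen so the result is easy to verify.
imgs = np.array([[1.0, 0.1, 0.0],
                 [0.9, 0.2, 0.1]])
attr_a = np.array([1.0, 0.0, 0.0])  # e.g. text embedding for one attribute prompt
attr_b = np.array([0.0, 1.0, 0.0])  # e.g. text embedding for the contrasting prompt

print(round(association_bias(imgs, attr_a, attr_b), 3))  # → 0.825
```

Because the score depends on which images and prompts are chosen, the same backbone can yield very different bias estimates on different data, which is one intuition behind the paper's finding that pre-training bias measurement is highly data-dependent.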
Type
Published in
*Proc. of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW2025)*