From Global to Local: Social Bias Transfer in CLIP
October 1, 2025
Ryan Ramos
Yusuke Hirota
Yuta Nakashima
Noa Garcia
Abstract
CLIP models are often used as backbones for training new models on different vision-language tasks. However, these encoders exhibit social biases in their representation spaces, raising the question of whether such biases have harmful effects when the encoders are used to train models for downstream applications. We investigate the mechanics of pre-training bias in CLIP and its potential to affect downstream bias via bias transfer. First, we examine how pre-training bias levels change between global and local views of the data. Second, we train multiple models with different CLIP backbones and search for correlations between pre-training biases and downstream biases across multiple social attributes and bias metrics. Last, we investigate why the correlation trends appear as they do, based on the common paradigm of adapting backbones to frozen language models. Our experiments show 1) that pre-training bias measurement is highly data-dependent; 2) a lack of consistency in bias transfer trends between pre-training and downstream bias; and 3) that backbones converge in their representation spaces under the current paradigm for training new models for downstream tasks.
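To make the idea of measuring bias in a representation space concrete, here is a minimal sketch of one common family of bias metrics: an association score that compares how strongly a set of image embeddings aligns with one attribute's text embedding versus another's. This is an illustrative example, not the paper's specific metric; the toy 3-d vectors, the prompt strings in the comments, and the function name `association_bias` are all assumptions for demonstration (real CLIP embeddings are 512- or 768-dimensional and L2-normalised).

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association_bias(image_embs, text_emb_a, text_emb_b):
    """Mean difference in cosine similarity toward attribute A vs. attribute B.

    A score near 0 suggests no systematic preference; a positive score
    means the images, on average, lie closer to attribute A's embedding.
    (Illustrative metric, not the specific one used in the paper.)
    """
    diffs = [cosine(v, text_emb_a) - cosine(v, text_emb_b) for v in image_embs]
    return float(np.mean(diffs))

# Toy 3-d stand-ins for embeddings, chosen so the result is easy to verify.
imgs = np.array([[1.0, 0.1, 0.0],
                 [0.9, 0.2, 0.1]])
attr_a = np.array([1.0, 0.0, 0.0])  # e.g. text embedding for one attribute prompt
attr_b = np.array([0.0, 1.0, 0.0])  # e.g. text embedding for the contrasting prompt

print(round(association_bias(imgs, attr_a, attr_b), 3))  # → 0.825
```

Because the score depends on which images and prompts are chosen, the same backbone can yield very different bias estimates on different data, which is one intuition behind the paper's finding that pre-training bias measurement is highly data-dependent.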
Type
Published in
*Proc. of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW2025)*