Auditing Image-based NSFW Classifiers for Content Filtering
Abstract
This paper examines NSFW (Not Safe For Work) image classifiers for content filtering. Through an audit of three prevalent NSFW classifiers, we analyze the relationship between NSFW predictions and three demographic factors: gender, skin-tone, and age. Our study reveals that women are disproportionately more frequently misclassified as NSFW compared to men, even when they appear conducting common daily-life activities. Additionally, we find that NSFW classifiers tend to mispredict images of people with lighter skin-tones and images depicting younger people. We explore the causes of such mispredictions by analyzing the explanatory pixel maps, which reveal some of the reasons behind the misclassifications. Overall, the implications of our findings become particularly salient when considering the application of filters based on NSFW classifiers, which we identified to have a direct impact on image datasets, computer vision models, generative AI, user experience, and artistic creativity. In summary, we hope our study brings attention to the inherent biases within NSFW classifiers and underscores the importance of addressing these issues to ensure fair and equitable outcomes in content filtering.
Type
Publication
Proc. ACM Conference on Fairness, Accountability, and Transparency (FAccT)