World's first! Major Nature study suggests computer vision is bidding farewell to the era of data taken without consent

Computer vision (CV) technology is widely used in fields such as autonomous driving and consumer electronics. Image datasets play a foundational role in this field, and the emergence of large-scale datasets such as ImageNet has driven major breakthroughs.
Over the past decade, however, most of these datasets have been collected through web scraping, raising persistent ethical issues around unauthorized use, lack of diversity, consent, and compensation.
Such flaws at the source of the data not only undermine the fairness and accuracy of artificial intelligence (AI) systems, but also reinforce social biases around race and gender.
For instance, research indicates that commercial face recognition systems have higher error rates when identifying individuals with darker skin than those with lighter skin, and several prominent datasets have been withdrawn over ethical concerns.
Against this backdrop, Sony AI has released FHIBE, the world's first publicly available, globally diverse, consent-based dataset designed specifically for evaluating fairness in human-centric computer vision tasks.
According to the paper, FHIBE contains 10,318 images collected from 81 countries and regions, featuring 1,981 individuals and covering visual tasks ranging from face recognition to visual question answering.
In addition, FHIBE provides the most comprehensive annotations available, including demographic characteristics, physical attributes, environmental factors, camera parameters, and pixel-level labels, enabling more fine-grained bias diagnosis.
The related research paper, titled "Fair human-centric image dataset for ethical AI benchmarking", has been published in Nature.

Paper link: www.nature.com/articles/s41586-025-09716-2
"Because publicly available, ethically sourced datasets are scarce for most computer vision tasks, even the most basic first step of examining bias is extremely difficult," said Alice Xiang, Head of Sony Global AI Governance and lead researcher on FHIBE.
This achievement marks a significant milestone in the development of trustworthy AI: it not only raises the bar for AI fairness benchmarking, but also charts a path for responsible data collection and management.
The world's first consent-based, 'human-centric' fairness dataset
Unlike previous approaches, FHIBE was built through global crowdsourcing and self-reporting, with data contributors spanning 81 countries and regions. Each participant uploads their own photos and provides self-reported information such as age, pronouns, and ancestry.
To ensure diversity, the images were captured with 785 camera models from 45 manufacturers and span 16 scene types, 6 lighting conditions, 7 weather conditions, 3 camera angles, and 5 shooting distances.
Compared with similar datasets, FHIBE is notably balanced in regional distribution: Africa accounts for 44.7%, and Asia and Oceania for 40.6%, markedly alleviating the previous over-concentration of portrait data in Europe and North America.

Image | Annotations covering subjects, instruments, and environments; all image metadata in FHIBE is accessible.
Each image in FHIBE is accompanied by self-reported pose, interaction, and appearance attributes and age-category labels, as well as pixel-level annotations of faces, people, and keypoints. These annotations cover 33 keypoints and 28 segmentation categories.

Image | FHIBE image example, including detailed pixel-level annotations, keypoints, segmentation masks, and bounding boxes.
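To make the structure of these annotations concrete, here is a minimal sketch of what a single FHIBE-style record might look like in Python. The field names and example values are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical FHIBE-style annotation record (field names are illustrative,
# not the dataset's actual schema).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SubjectAnnotation:
    image_id: str
    # Self-reported demographic / phenotypic attributes
    pronouns: str            # e.g. "she/her/hers"
    age_group: str           # e.g. "18-29"
    ancestry: str            # e.g. "Asia"
    skin_tone: str           # e.g. Fitzpatrick "Type III"
    # Capture conditions recorded as image metadata
    camera_model: str
    lighting: str            # one of 6 lighting conditions
    weather: str             # one of 7 weather conditions
    scene_type: str          # one of 16 scene types
    # Pixel-level labels
    person_bbox: Tuple[int, int, int, int]   # x, y, w, h
    face_bbox: Tuple[int, int, int, int]
    keypoints: List[Tuple[float, float, int]] = field(default_factory=list)  # up to 33 (x, y, visibility)
    segmentation_mask: str = ""              # path to a mask with up to 28 categories

record = SubjectAnnotation(
    image_id="fhibe_000123",
    pronouns="she/her/hers", age_group="18-29", ancestry="Asia", skin_tone="Type III",
    camera_model="Pixel 6", lighting="daylight", weather="clear", scene_type="street",
    person_bbox=(40, 20, 300, 620), face_bbox=(120, 60, 90, 110),
)
```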
It is worth noting that the research team strictly followed the General Data Protection Regulation (GDPR) and other data protection regulations during collection, including consent forms with clear terms that specify the purposes of data use and allow participants to withdraw their consent.
In addition, the team used generative diffusion models to inpaint non-consenting bystanders and personally identifiable information in the images (for example, removing bystanders or license plates), followed by manual review.
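The paper does not ship this anonymization step as reusable code, but the general idea can be sketched with an off-the-shelf diffusion inpainting pipeline. The model checkpoint, file names, and prompt below are assumptions for illustration, not the authors' actual pipeline.

```python
# Illustrative sketch of diffusion-based inpainting for anonymization:
# mask a region (e.g. a bystander or a license plate) and let an
# inpainting model fill it with plausible background content.
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
).to("cuda")

image = Image.open("photo.jpg").convert("RGB").resize((512, 512))
# White pixels mark the region to replace, produced by any detector
# or by manual annotation.
mask = Image.open("bystander_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="empty street background, no people",
    image=image,
    mask_image=mask,
).images[0]
result.save("photo_anonymized.jpg")  # still followed by manual review
```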
AI can get it wrong too: evaluating the fairness of existing models
In addition to meeting ethical norms, FHIBE is also rigorous in methodology, including:
- Demographic and phenotypic details: attributes self-reported by participants, such as pronouns, ancestry, age group, hairstyle, makeup, and headwear.
- Environmental context: images carry metadata about lighting, weather, and scene type.
This makes FHIBE broadly applicable to model fairness evaluation. Using it, the research team systematically tested for bias in current mainstream narrow models and general-purpose foundation models, including narrow models for tasks such as face detection, face verification, and person detection, as well as multimodal foundation models such as CLIP and BLIP-2.
Based on intersectional analysis (pronoun × age × ancestry × skin tone), the study found that models tend to be more accurate for younger individuals (aged 18-29) with lighter skin tones and Asian ancestry, while accuracy tends to be lower for older individuals (over 50) and those with darker skin tones.
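Such an intersectional breakdown is straightforward to reproduce on one's own evaluation results. The sketch below, with assumed column names and a per-sample 'correct' flag, groups accuracy by pronoun, age, ancestry, and skin tone using pandas; it is not the paper's evaluation code.

```python
# Minimal sketch of an intersectional (pronoun x age x ancestry x skin tone)
# accuracy breakdown. Column names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("model_predictions.csv")  # one row per evaluated face/person

group_cols = ["pronouns", "age_group", "ancestry", "skin_tone"]
report = (
    df.groupby(group_cols)["correct"]
      .agg(accuracy="mean", n="count")
      .reset_index()
      .sort_values("accuracy")
)

# The lowest-accuracy intersections surface where the model underperforms.
print(report.head(10))
```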
Different models also behave differently on specific intersectional combinations. For example, in face detection, RetinaFace performs best on the she/her/hers × Type I × Asia combination and worst on the he/him/his × Type II × Africa combination, while other models have their best and worst cases on different combinations.
In addition, FHIBE has also identified previously unrecognized subtle biases, such as:
- Facial analysis models perform worse on older individuals, partly because they struggle to recognize people with gray hair.
- Face verification models are less accurate for female subjects, partly due to greater variability in hairstyles.
Given these differences, FHIBE can identify confounding factors that affect person detection performance through feature regression and decision-tree analysis, including body pose (such as lying down) and subject interactions.
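As a rough illustration of this kind of confounder analysis (not the authors' actual code), one could fit a shallow decision tree that predicts detection failures from the annotation features; the feature and column names below are assumptions.

```python
# Sketch of a decision-tree confounder analysis: predict per-image
# detection errors from annotation features (pose, interaction,
# occlusion, lighting, ...) to surface which factors drive failures.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("detection_results.csv")
features = ["pose", "subject_interaction", "occlusion", "lighting", "camera_distance"]
X = pd.get_dummies(df[features])          # one-hot encode categorical annotations
y = df["detection_failed"].astype(int)    # 1 if the detector missed the person

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```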
For multimodal foundation models, the team focused on two mainstream models, CLIP and BLIP-2. The results showed that:
CLIP: In image classification, CLIP assigns the neutral label ("unspecified") to images of subjects using he/him/his pronouns (0.69) more often than to those using she/her/hers pronouns (0.38), reflecting a "male as default" tendency; it also shows associations between certain demographic attributes and stereotypical labels.

Image | Bias in CLIP's predictions on the FHIBE dataset.
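A zero-shot probe of this kind can be sketched with the publicly available CLIP checkpoints. The label set, checkpoint, and aggregation below are illustrative assumptions rather than the paper's exact protocol.

```python
# Sketch of zero-shot CLIP classification used to probe label assignment:
# compare how often images of subjects with different pronouns receive a
# neutral "person" label vs. gendered labels.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a man", "a photo of a woman", "a photo of a person"]

def classify(path: str) -> str:
    image = Image.open(path).convert("RGB")
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, len(labels))
    return labels[int(logits.argmax())]

# Aggregating classify() over images grouped by self-reported pronouns
# yields the kind of neutral-label rate comparison reported above.
```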
BLIP-2: In open-ended question answering, even when the questions do not mention gender or race, BLIP-2 generates descriptions carrying gender or racial bias; for negative prompts such as those about 'crime', it produces harmful stereotyped responses at a higher rate for individuals from certain demographic groups.

Image | BLIP-2 analysis results.
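Likewise, this style of BLIP-2 probing can be sketched with the public checkpoint below; the checkpoint and prompts are assumptions, and the toxicity scoring of the generated answers used in the paper is omitted.

```python
# Sketch of probing BLIP-2 with open-ended and negatively framed questions,
# in the spirit of the bias audit described above.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("subject.jpg").convert("RGB")
for question in ["Describe this person.", "What crime might this person commit?"]:
    inputs = processor(images=image, text=f"Question: {question} Answer:",
                       return_tensors="pt").to("cuda", torch.float16)
    out = model.generate(**inputs, max_new_tokens=40)
    print(question, "->", processor.decode(out[0], skip_special_tokens=True).strip())
```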
The paper argues that FHIBE is a turning point toward more responsible AI development and paves the way for ethical data collection. At the same time, the research team acknowledges its limitations:
- High cost: participant recruitment, review, and compensation require far more manpower and funding than web scraping.
- Limited visual diversity: compared with web scraping, consent-based collection yields far fewer images, which constrains visual diversity.
In the future, the research team hopes to use FHIBE as a starting point for assembling comprehensive, consensually obtained images and annotations, and to promote institutionalized practices in data collection, informed consent, privacy protection, and compensation.
They also aim to use FHIBE as a diagnostic tool for monitoring model performance and bias, helping to build more inclusive and trustworthy AI systems.


