By John McCormick

Facebook Inc. on Thursday made publicly available a dataset designed to help artificial-intelligence researchers evaluate their computer-vision and audio models for potential algorithmic bias.

The dataset, called Casual Conversations, consists of videos of some 3,000 participants of various skin tones sharing their age and gender.

Cristian Canton Ferrer, Facebook AI's research manager who supervised the effort, said the dataset tries to address two problems: "The critical need within the AI community to [improve] the fairness of AI systems" and "the lack of high-quality data sets that are designed to help measure this fairness in AI." Facebook AI is the social network's artificial-intelligence organization.

Artificial-intelligence systems are trained on large sets of data. Facial-recognition systems, for instance, are fed mountains of facial images, which allow the system to find patterns in faces that it can use to make a match. If the dataset used to train a system consists mostly of photos of people from one demographic group, the system may be less accurate at identifying people from other groups. AI systems have been shown to be less accurate at identifying the faces of dark-skinned women, for example.

Facebook said that having participants provide their own ages and genders for labeling, rather than having a third party or a computer system estimate that information, yields a relatively unbiased dataset of people's actual ages and genders.

The Casual Conversations dataset also includes labels of participants' apparent skin tones that were developed by trained annotators using the Fitzpatrick scale, a skin classification system. The annotators also marked videos with ambient lighting conditions, which can help measure how AI systems treat different skin tones under low-light conditions.
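As a rough illustration of what such labels might look like in practice, here is a minimal Python sketch of a per-video annotation record. The field names and structure are assumptions for illustration, not Facebook's actual schema; only the six Fitzpatrick types (I through VI) and the lighting flag come from the scale and the description above.

    from dataclasses import dataclass
    from enum import IntEnum


    class FitzpatrickType(IntEnum):
        """The six skin types of the Fitzpatrick scale, lightest (I) to darkest (VI)."""
        I = 1
        II = 2
        III = 3
        IV = 4
        V = 5
        VI = 6


    @dataclass
    class VideoAnnotation:
        """Hypothetical per-video label record; field names are illustrative."""
        video_id: str
        self_reported_age: int               # provided by the participant
        self_reported_gender: str            # provided by the participant
        apparent_skin_type: FitzpatrickType  # assigned by a trained annotator
        low_light: bool                      # ambient-lighting flag added by annotators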

In all, participants made an average of 15 videos each, engaging in unscripted conversations, for a total of more than 45,000 videos. The videos were originally gathered as part of an earlier project Facebook participated in called the Deepfake Detection Challenge, which was set up to accelerate research for detecting and preventing manipulated media.

Many companies have released tools in recent years designed to check algorithms for bias. The LinkedIn Fairness Toolkit, introduced last year by Microsoft Corp.'s professional social-network unit, analyzes the attributes of a dataset, such as its racial and gender makeup, and compares its findings with an algorithm's results. If a dataset is split nearly evenly by gender, for example, but a search algorithm built on that dataset returns results in which only a quarter are women, the system will flag the gap.
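For readers who want a concrete picture of that kind of check, the short Python sketch below compares the gender makeup of a dataset with the makeup of an algorithm's results and flags any large gap. It illustrates only the general idea described above, not the LinkedIn toolkit's actual code or API, and the 20-percentage-point threshold is an arbitrary assumption.

    from collections import Counter

    def gender_share(records, key="gender"):
        """Return the fraction of records carrying each gender label."""
        counts = Counter(r[key] for r in records)
        total = sum(counts.values())
        return {g: n / total for g, n in counts.items()}

    def flag_disparity(dataset, results, max_gap=0.2):
        """Report genders whose share of the results falls more than
        `max_gap` below their share of the dataset (illustrative threshold)."""
        data_share = gender_share(dataset)
        result_share = gender_share(results)
        return {
            g: (data_share[g], result_share.get(g, 0.0))
            for g in data_share
            if data_share[g] - result_share.get(g, 0.0) > max_gap
        }

    # Example: a dataset split roughly 50/50 by gender, but search results
    # that are only about a quarter women, gets flagged.
    dataset = [{"gender": "woman"}] * 50 + [{"gender": "man"}] * 50
    results = [{"gender": "woman"}] * 25 + [{"gender": "man"}] * 75
    print(flag_disparity(dataset, results))  # {'woman': (0.5, 0.25)}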

The standard way of evaluating AI models' performance today is to measure them against a test set after the models have been trained and validated, Facebook said. But that test set, the company said, may contain the same shortcomings as the training sets because it may be collected from similar sources.

Svetlana Sicular, a vice president and analyst at technology research and advisory firm Gartner Inc., said such a second set of eyes can help AI developers validate the fairness of their systems.

The dataset, for instance, allows a company building a product with a facial-recognition feature to perform additional algorithmic-bias testing. In an initial test, the system might appear to perform equally well across ages and genders. Run against Facebook's Casual Conversations dataset, however, where actual ages, genders and annotated skin tones are known, a second test might reveal that the system performs less consistently for people with a certain skin tone.
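A minimal Python sketch of what that second, subgroup-level test could look like appears below. The `model.predict` call, the field names and the 5-percentage-point tolerance are placeholders chosen for illustration, not Facebook's evaluation code.

    from collections import defaultdict

    def accuracy_by_group(model, samples, group_key="skin_type"):
        """Compute accuracy separately for each labeled subgroup.

        `samples` is a list of dicts with an input, a ground-truth label and a
        demographic label such as annotated skin type; `model.predict` stands in
        for whatever inference call the system exposes.
        """
        correct, total = defaultdict(int), defaultdict(int)
        for s in samples:
            group = s[group_key]
            total[group] += 1
            if model.predict(s["input"]) == s["label"]:
                correct[group] += 1
        return {g: correct[g] / total[g] for g in total}

    def lagging_groups(per_group_accuracy, tolerance=0.05):
        """Flag subgroups whose accuracy trails the best-performing group
        by more than `tolerance` (an illustrative threshold)."""
        best = max(per_group_accuracy.values())
        return {g: a for g, a in per_group_accuracy.items() if best - a > tolerance}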

"That's where these data sets might be helpful -- to allow you to measure how fair you are with respect to a bunch of different categories, " Mr. Ferrer said.

If a problem is identified, he said, the developer could add more images of people with that skin tone to the software's training set to improve the AI system's ability to recognize them.
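One simple way to rebalance a training set along those lines is to oversample the underrepresented group until every group is the same size, as in the Python sketch below. This is purely an illustration under assumed field names; in practice, teams would more likely collect genuinely new images than duplicate existing ones.

    import random
    from collections import defaultdict

    def oversample_underrepresented(training_set, group_key="skin_type", seed=0):
        """Duplicate examples from smaller skin-tone groups (with replacement)
        so every group is as large as the biggest one. Illustrative only."""
        rng = random.Random(seed)
        by_group = defaultdict(list)
        for example in training_set:
            by_group[example[group_key]].append(example)

        target = max(len(examples) for examples in by_group.values())
        balanced = []
        for examples in by_group.values():
            balanced.extend(examples)
            balanced.extend(rng.choices(examples, k=target - len(examples)))
        rng.shuffle(balanced)
        return balanced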

But the dataset is just a step, Mr. Ferrer said. The company is allowing outside developers to access the dataset to find ways to improve it. For instance, he said, the dataset's videos were all captured in the U.S.; an outsider, he suggested, could enrich it by adding videos of people outside the U.S.

Write to John McCormick at john.mccormick@wsj.com

(END) Dow Jones Newswires

04-08-21 0914ET