AI Bias and Risk Models

January 9, 2025
Illustration by Leif Parson

The world is rife with risk-prediction algorithms. Algorithms tell lenders whether a borrower is likely to default. Risk-assessment algorithms predict how likely it is that someone who has committed a crime will reoffend. All such algorithms have one thing in common: They rely on data.

That reliance on data is what led Julian Nyarko, professor of law and associate director at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), to study the effectiveness of risk-prediction models. At stake is whether risk-assessment models actually predict the outcomes they purport to predict. Nyarko and two colleagues from Harvard University have published a new paper in Science Advances showing that many risk models may not be all they are cracked up to be, not because they have too little data but because they have too much. The authors call the conventional wisdom in the field, the belief that more data is always better, the “kitchen sink” approach.

“The thinking goes, ‘Let’s just give the model access to as much data as possible. It can’t hurt, right? If the data say shoe size or the price of coffee are good predictors of recidivism, researchers should want to know that and to use that information in their models,’” Nyarko says.

But as Nyarko explains, “This rationale assumes that we actually have historical data on the outcome we are trying to predict. But this is rarely the case. For instance, while we are generally interested in predicting whether a suspect would reoffend if released, all we can train our models on is historical data about rearrests. The two often differ, e.g., because of differences in policing activity across various areas. Under circumstances like this, we show that training on less data can actually be the better approach.”
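The mechanism Nyarko describes, a model trained on a proxy label (rearrest) that also depends on where policing is concentrated, can be illustrated with a small simulation. This is a hypothetical toy example, not the authors' method or data: the risk factor `x`, the neighborhood flag `area`, the rates in `BASE_RISK` and `ARREST_RATE`, the lookup-table "model" built by `fit_cell_means`, and the use of AUC as the yardstick are all assumptions chosen for illustration. In this setup, true reoffense depends only on `x`, while the chance that a reoffense shows up as a rearrest depends only on how heavily `area` is policed.

```python
import random
from collections import defaultdict

# Hypothetical toy parameters (illustrative assumptions, not from the paper).
BASE_RISK = {0: 0.2, 1: 0.5}    # P(reoffend | risk factor x)
ARREST_RATE = {0: 0.3, 1: 0.9}  # P(rearrest | reoffended, area); area 1 = heavily policed


def simulate(n_people, rng):
    """Return (x, area, reoffended, rearrested) tuples for a toy population."""
    rows = []
    for _ in range(n_people):
        x = rng.randint(0, 1)
        area = rng.randint(0, 1)
        reoffended = rng.random() < BASE_RISK[x]
        # A reoffense is only recorded if it leads to a rearrest.
        rearrested = reoffended and rng.random() < ARREST_RATE[area]
        rows.append((x, area, reoffended, rearrested))
    return rows


def fit_cell_means(rows, features):
    """Lookup-table 'model': observed rearrest rate for each combination of features."""
    counts = defaultdict(lambda: [0, 0])  # feature values -> [rearrests, people]
    for row in rows:
        key = tuple(row[i] for i in features)
        counts[key][0] += row[3]
        counts[key][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic, counting tied scores as 1/2."""
    by_score = defaultdict(lambda: [0, 0])  # score -> [positives, negatives]
    for score, label in zip(scores, labels):
        by_score[score][0 if label else 1] += 1
    total_pos = sum(c[0] for c in by_score.values())
    total_neg = sum(c[1] for c in by_score.values())
    wins, neg_below = 0.0, 0
    for score in sorted(by_score):
        pos, neg = by_score[score]
        wins += pos * neg_below + 0.5 * pos * neg
        neg_below += neg
    return wins / (total_pos * total_neg)


rng = random.Random(0)
train, test = simulate(100_000, rng), simulate(100_000, rng)

kitchen_sink = fit_cell_means(train, features=(0, 1))  # uses x and area
restricted = fit_cell_means(train, features=(0,))      # uses x only
ks_scores = [kitchen_sink[(r[0], r[1])] for r in test]
small_scores = [restricted[(r[0],)] for r in test]

rearrests = [r[3] for r in test]   # the proxy label we can observe
reoffenses = [r[2] for r in test]  # the outcome we care about (unobservable in practice)

for name, scores in [("kitchen sink", ks_scores), ("restricted", small_scores)]:
    print(f"{name:>12}: AUC vs rearrest {auc(scores, rearrests):.3f}, "
          f"vs reoffense {auc(scores, reoffenses):.3f}")
```

With these assumed numbers, the kitchen-sink model (which also sees `area`) should score higher against the recorded rearrests, roughly 0.73 versus 0.64, but lower against true reoffense, roughly 0.58 versus 0.66, because it learns to rank people from heavily policed areas as riskier. In this toy setup, dropping a feature that affects only the measurement, not the outcome itself, is what makes the smaller model the better predictor of reoffense.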