Do Predictive Algorithms Have A Place In Public Policy?
Algorithms are finding their way into all areas of our lives, from determining what news stories pop up in your social media feed to suggesting new music or restaurants. But should they have a hand in shaping jail sentences and public policy? Government agencies are now using algorithms and data mining to predict outcomes and behaviors in individuals, and to aid decision-making.
Although there is an uptick in algorithmic use, the code behind these tools is not always made public. In a study out this week in Science Advances, computer scientist Hany Farid and a colleague examined one tool used for predicting recidivism risk, the likelihood that criminals will commit a crime again. According to their analysis, the algorithm was no more accurate in predicting recidivism rates than untrained individuals.
Farid and Ellen Goodman, co-director of the Rutgers Institute for Information Policy and Law, discuss how to create a fair and accurate algorithm, and how these tools might be able to improve government and criminal justice decision-making.
Hany Farid is a professor of computer science at Dartmouth College. He’s based in Hanover, New Hampshire.
Ellen Goodman is a professor of Law and the Co-director of the Rutgers Institute for Information Policy & Law at Rutgers University in Camden, New Jersey.
IRA FLATOW: This is Science Friday. I’m Ira Flatow.
An algorithm is a series of steps a computer takes in arriving at a decision. Algorithms are used to figure out what posts pop up in your social media feed. They can suggest a song you might like or recommend a spot for dinner– not something that gets your attention.
But what about when algorithms are used in higher stakes decision-making, like determining a jail sentence, or where to send police on foot patrols? This is the stuff that might make you think a bit more deeply.
And with that in mind, researchers wanted to understand how some of those algorithms work and compare them to how people actually think. They tested a tool that is used to predict whether a defendant is at risk of committing another crime. And they found that a group of randomly sampled, untrained people predicted reoffending about as accurately as the algorithm. They published their study in the journal Science Advances.
Well, could these algorithms help in public policy decision-making? How much do we know about their accuracy? And what does it mean to program fairness and justice into the algorithm? That’s what we’re going to be talking about this hour.
Our number– 844-724-8255. You can also tweet us– @scifri. And we have up there on our website a question– would you trust an algorithm over what a person says? Maybe we’ll get some results of that query by the end of the program.
Let me introduce my first guest. Hany Farid is one of the authors on that study. He’s also a professor of computer science at Dartmouth College. Welcome to Science Friday.
HANY FARID: Good to be here.
IRA FLATOW: The tool you evaluated that I talked about is called COMPAS, or Correctional Offender Management Profiling for Alternative Sanctions.
HANY FARID: That’s correct.
IRA FLATOW: A lot of words.
HANY FARID: A lot of words, and in fairly wide use– since its inception, it has analyzed over a million defendants in the US. So it’s actually something that has been fairly broadly used in the courts. It is not the only software. There are many programs like this.
But perhaps one of the most important things– before we talk about efficacy, and accuracy, and fairness– is that this is a commercial entity, and the details of the algorithm are very tightly kept secret. So these decisions are being made inside of a black box, where we don’t know what is actually happening. And I think that given what you said earlier, given the stakes here, I think that should be a little concerning to us.
IRA FLATOW: And so as a tool used to determine bail and sentencing, as you say, it’s proprietary software, as we might call it.
HANY FARID: That’s exactly right.
IRA FLATOW: You tried to get a look inside the algorithm by developing your own and testing it with a randomly selected pool of people. What did you find?
HANY FARID: Right. We found a couple of interesting things. I mean, anytime you’re going to deploy algorithms like this to take decision-making away from people– and let me say it, that’s not a fundamentally bad thing. We don’t object to using artificial intelligence and data to make more objective and data-driven decisions. You know, that’s a good thing. It’s a worthy cause.
But before you do that, you should ask the question, what is baseline? Where are the humans? And then where are the algorithms above that, and where do they fail, and where do they succeed?
And what we found– and I will say, much to our surprise– and I should mention this is the undergraduate thesis work of Julia Dressel from here at Dartmouth– is that when we polled 400 workers online responding to a survey on Amazon’s Mechanical Turk– it’s an online workspace– they had essentially indistinguishable accuracy from this commercial software that is being used in the courts. They were as accurate, as fair, and as inaccurate as the software.
And right out of the gate, that was really interesting. Because these are people who we’re paying $1 to respond to 50 questions. And all they see is a very short paragraph about a defendant. They know nothing else, and they’re basically as accurate as the software.
IRA FLATOW: Now when you say as accurate, from what I understand, they’re accurate about 70% of the time. Is that right?
HANY FARID: 65% is about where they are.
IRA FLATOW: I would expect that something with a computer involved in it would be accurate more than 65% of the time. You’d want it to be accurate. If I have a calculator and I add 150 and 350, I want it to come out to the same number all the time.
HANY FARID: Yeah. Well, I think that’s the right question to ask. Because if we’re going to deploy this type of technology in the courts to make, let’s be honest, life-altering decisions for defendants– whether that’s bail, sentencing, or parole– I think it’s reasonable to say 65 is a relatively low accuracy.
Now, what is the right accuracy? I don’t know. But I think it’s reasonable to say that seems too low. Because 35% of the time you’re getting it wrong, and the stakes are too high to make that many mistakes.
IRA FLATOW: Because the whole point of having something is to be more accurate than what the people might be.
HANY FARID: That’s exactly right. And so we were really surprised by this. But then it gets even more interesting and more surprising after that.
So Julia and I sat down, and we thought, well, first of all, how can this be? How can random people online responding to a survey be as good as this commercial software being used in the courts to make life-altering decisions? And we said, well, you know what– and we understand it’s a black box– we understand it’s proprietary– let’s see if we can dig into that and really understand the algorithms.
And here’s what we did. We took a very, very simple algorithm, the simplest possible machine learning classification algorithm, the kind of thing that we’ve known about for decades, and gave it just two pieces of data: how old somebody is and how many prior convictions they’ve had. With just that, you can get 65% accuracy, the same as the commercial software, and the same as our users online.
Now what’s interesting is when you look at how it’s making the decision, it turns out it’s completely obvious. So here it is. If you are young and have committed a lot of prior crimes, you’re high risk. And if you are old and you have very few priors, you’re low risk.
And the point here is– that makes, by the way, perfect sense– but there are a couple of issues here. So one is that the number of prior convictions is correlated with race. There are asymmetries in our society in the frequency of arrest, prosecution, and conviction based on race. So the number of convictions is a proxy for race. So even though it looks like the algorithm should be race-blind, it is not necessarily race-blind.
And the other thing, and in some ways I think maybe the more important, is the following. Imagine you are a judge and I tell you I have this proprietary software based on big data and big algorithms that can make a risk assessment. You can imagine that you would give a fair amount of weight to that. But if I now tell you that 12 people online said this person is high or low risk, you would give that a different weight.
And so our point is not so much use the software, don’t use the software. It’s there should be more transparency. There should be more understanding of what the algorithm does so that we can give a proportional weight to it– so that we can say, you know what, this is a pretty simple thing. I’ve got two numbers. How old are they? How many prior convictions? I’m aware of prior convictions having a racial asymmetry. I’m going to use those numbers in a way that is proportional to my confidence in this estimation.
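The kind of two-number classifier Farid describes can be sketched as a logistic regression, one of the simplest linear classifiers. This is a minimal sketch under stated assumptions, not the study’s code: the toy records are invented for illustration (they are not COMPAS or court data), the helper name fit_two_feature_logistic is hypothetical, and the model is fit by plain batch gradient descent on standardized features.

```python
import math

# Hypothetical toy records: (age, prior_convictions, reoffended). Invented for
# illustration only; these are NOT COMPAS or court data.
TOY_DATA = (
    [(20, 6, 1)] * 3 + [(20, 6, 0)] * 1
    + [(20, 0, 1)] * 2 + [(20, 0, 0)] * 2
    + [(50, 6, 1)] * 2 + [(50, 6, 0)] * 2
    + [(50, 0, 1)] * 1 + [(50, 0, 0)] * 3
)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_two_feature_logistic(data, lr=0.5, epochs=2000):
    """Fit p(reoffend) = sigmoid(b + w_age * z_age + w_priors * z_priors).

    Hypothetical helper. Both features are standardized (z-scores), and the
    weights are learned by batch gradient descent on the log-loss.
    """
    n = len(data)
    ages = [a for a, _, _ in data]
    priors = [p for _, p, _ in data]
    mu_a, mu_p = sum(ages) / n, sum(priors) / n
    sd_a = math.sqrt(sum((a - mu_a) ** 2 for a in ages) / n)
    sd_p = math.sqrt(sum((p - mu_p) ** 2 for p in priors) / n)

    b = w_age = w_priors = 0.0
    for _ in range(epochs):
        g_b = g_age = g_priors = 0.0
        for age, n_priors, label in data:
            z_age = (age - mu_a) / sd_a
            z_priors = (n_priors - mu_p) / sd_p
            # Gradient of the log-loss with respect to the linear score is (p - y).
            err = sigmoid(b + w_age * z_age + w_priors * z_priors) - label
            g_b += err
            g_age += err * z_age
            g_priors += err * z_priors
        b -= lr * g_b / n
        w_age -= lr * g_age / n
        w_priors -= lr * g_priors / n

    def predict(age, n_priors):
        """Predicted probability of reoffending for one hypothetical defendant."""
        z_age = (age - mu_a) / sd_a
        z_priors = (n_priors - mu_p) / sd_p
        return sigmoid(b + w_age * z_age + w_priors * z_priors)

    return predict, (w_age, w_priors)


if __name__ == "__main__":
    predict, (w_age, w_priors) = fit_two_feature_logistic(TOY_DATA)
    print(f"age weight = {w_age:.3f}, priors weight = {w_priors:.3f}")
    print(f"risk(age 20, 6 priors) = {predict(20, 6):.2f}")
    print(f"risk(age 50, 0 priors) = {predict(50, 0):.2f}")
```

On this toy data the fitted age weight comes out negative and the priors weight positive, so a 20-year-old with six priors gets a predicted risk of about 0.75 and a 50-year-old with none about 0.25: the same "young with many priors means high risk" pattern described above. Any accuracy figure for such a model depends entirely on the real data it is trained on, including whatever racial asymmetries are baked into the conviction counts.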
IRA FLATOW: I understand. I want to bring in another guest to talk about how these algorithms are used and about transparency. Ellen Goodman is co-director of the Rutgers Institute for Information Policy and Law at Rutgers in Camden, New Jersey. And she joins us here in our WNYC studios. Welcome to Science Friday.
ELLEN GOODMAN: Thanks. Good to be here.
IRA FLATOW: The fact that we don’t know what’s inside the black boxes of these things– is that important, the transparency issue?
ELLEN GOODMAN: Yeah. I mean, I agree with Dr. Farid, and from a legal perspective, that really is the critical issue.
These algorithms are being used not just in criminal justice, but also in human services to identify households at high risk of child abuse. They’re being used to assess teachers. And there have been a lot of lawsuits. We’re seeing increasing numbers of individuals whose fate is determined by these algorithms, and they can’t challenge them because they are black boxes.
So there are legal dimensions. And then there are also just– we expect our governments to be accountable, and we can’t hold them accountable unless we understand how they are making decisions.
IRA FLATOW: You wrote in a Wired article that “governments have not made the shift to understanding this is policy-making.” What did you mean by that? What’s your concern there?
ELLEN GOODMAN: Take this criminal justice decision. We would like it to be perfectly accurate. But none of these decisions, human or algorithmic, is perfectly accurate. Therefore, they are tuned one way or another to privilege certain policies.
So in the criminal justice context, we may want more false positives than false negatives, right? We may want to be conservative about sentencing, so that even people who pose a lower risk are locked up rather than let go and possibly commit another crime.
So that’s a policy choice, right– exactly how you tune that. And the same thing in all these domains– those preferences are built into the algorithm, but we don’t know what they are.
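The tuning Goodman describes can be illustrated with a cutoff on a risk score. This is a minimal sketch with invented scores and outcomes, and confusion_at_threshold is a hypothetical helper: lowering the threshold flags more people as high risk, which trades false negatives for false positives.

```python
def confusion_at_threshold(scores, outcomes, threshold):
    """Return (false_positives, false_negatives) when everyone whose score is
    at or above `threshold` is labeled high risk. Hypothetical helper."""
    false_pos = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, outcomes) if s < threshold and y == 1)
    return false_pos, false_neg


# Invented risk scores and outcomes (1 = reoffended), for illustration only.
SCORES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
OUTCOMES = [0, 0, 1, 0, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    # A lower threshold is the "conservative" setting: more people flagged,
    # more false positives, fewer false negatives.
    for t in (0.25, 0.5, 0.75):
        fp, fn = confusion_at_threshold(SCORES, OUTCOMES, t)
        print(f"threshold {t}: {fp} false positives, {fn} false negatives")
```

With these made-up numbers, a threshold of 0.25 gives two false positives and no false negatives, 0.5 gives one of each, and 0.75 gives no false positives but three false negatives. The code only computes the trade-off; choosing where to sit on it is the policy decision.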
IRA FLATOW: And speaking of not knowing what the algorithms are, I understand that you did a study where you filed 42 requests to get these algorithms. What did you find out about how these contracts are handled, and how much do the people using these tools actually know?
ELLEN GOODMAN: Right. So our requests were to state and local jurisdictions across a range of domains for the predictive algorithms they were using. And I should be clear– we didn’t expect actually to get the software. Because as Dr. Farid said, this is proprietary and closely guarded.
But we were looking even for things around the software, outside of the actual code– for example, what were your high level objectives? What policies did you incorporate? By and large, we got very little in response, mostly replies that they had no responsive documents. Sometimes we got the contracts, but sometimes not even that. So we think the claims were essentially either that they had no information, because cities are not bargaining for this information, or that the information was protected as a trade secret.
IRA FLATOW: Is there not any open source software that could be used and then actually have a mandate by a city or a state that we’re going to only buy and use open source software? So everybody can see what’s inside.
ELLEN GOODMAN: Yeah. I mean, New York City has just gone through this process. And they now have a new ordinance where they are going to undertake a transparency initiative for their predictive algorithms. And I think their first thought was, we’re going to mandate open source. And I think they walked back from that just because there are so many commercial products that are not open source that they might want to use.
But that would be one way to handle it, although I don’t think even open source gets you all the way there. Because first of all, all this software has to be audited, and adjusted, and tuned, and changed over time. And so it’s a moving target. And then also there are implementation questions. A judge sees the prediction, sees the scores– do they implement them, or do they override them?
We need to know more than just what’s in the black box. We need to know how these tools are actually being implemented.
IRA FLATOW: Right. Hany, reaction?
HANY FARID: I agree with Dr. Goodman that open source is sort of one extreme from where we are now. I think there’s a middle ground. So you can imagine, for example, the National Institute of Standards and Technology, NIST, could say if you want to use these algorithms in these high stakes areas– criminal justice– then they need to be vetted by federal government agencies using double-blind assessment, big data sets, and very careful analysis.
Dr. Goodman raises a good point too– this is a moving target. So this is not something you do once and then stop. So I think there’s a middle ground of protecting the commercial entities but also protecting the public from these black box algorithms.
IRA FLATOW: Let’s go to the phones. Let’s head out to Lexi in Boston. Welcome to Science Friday.
CALLER: Hi. I’m calling to ask about how we prevent human biases from creeping into these algorithms. So if you’re training a machine learning algorithm with a data set that has inherent human biases, say from policing areas, how do we keep those biases from then informing the algorithms that we’re using?
IRA FLATOW: Good question. Hany, do you want to answer that?
HANY FARID: Yeah, that’s a great question. That’s sort of the right question too. And in some ways this is really hard. Because we have the sense that since this is big data and AI, it doesn’t reflect the biases. But the fact is, the data we shove into these algorithms reflect societal biases, as I mentioned in terms of prior convictions. So in some ways it’s very hard to eliminate those biases from the data.
Now, you can work hard at the algorithmic side and make sure you understand what are the mistakes that you’re making as a function of race, gender, age, ethnicity– whatever it is– and then try to balance the algorithms. But the data is the data. And so I think the burden is going to be on the underlying algorithms to make sure they are making fair assessments.
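One way to do the checking Farid mentions is to compute error rates separately for each group. This is a minimal sketch using invented records for two hypothetical groups and a hypothetical error_rates_by_group helper; a real audit would use far larger data sets and more than one fairness measure.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Hypothetical helper. records: iterable of (group, predicted_high_risk, reoffended).

    Returns {group: (false_positive_rate, false_negative_rate)}, where the false
    positive rate is the share of non-reoffenders flagged high risk and the false
    negative rate is the share of reoffenders flagged low risk.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1
    return {
        group: (
            c["fp"] / c["neg"] if c["neg"] else float("nan"),
            c["fn"] / c["pos"] if c["pos"] else float("nan"),
        )
        for group, c in counts.items()
    }


# Invented records: (group, predicted_high_risk, reoffended), illustration only.
RECORDS = [
    ("A", True, False), ("A", False, False), ("A", False, False), ("A", False, False),
    ("A", True, True), ("A", True, True), ("A", False, True), ("A", True, True),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", True, True), ("B", True, True), ("B", True, True),
]

if __name__ == "__main__":
    for group, (fpr, fnr) in sorted(error_rates_by_group(RECORDS).items()):
        print(f"group {group}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

In this made-up example both groups are classified correctly 6 times out of 8, yet group B’s false positive rate is 0.50 against 0.25 for group A, while its false negative rate is 0.00 against 0.25. Equal overall accuracy can hide very different kinds of mistakes.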
IRA FLATOW: This is Science Friday from PRI, Public Radio International, talking about computer algorithms with Hany Farid and Ellen Goodman.
Ellen, you know, I think about this as, well, if I have a real person talking in court, I might have an opportunity to cross-examine that person about what their decisions are, but I have no opportunity to cross-examine the reasoning of an algorithm.
ELLEN GOODMAN: Right. I mean, you can take that too far, right? So another thing you might say is, I never get to know what’s inside a judge’s head. A judge’s head is a black box, so is a superintendent’s evaluation of my teaching. So why should we get to know that now when we didn’t get to know that before?
And I think there are two ways to think about that. One is that this is one of the virtues of algorithms: we can know what’s inside, so let’s take advantage of that. The second is, as Dr. Farid said, that with these algorithms you have a few competitors, but they are being rolled out across the country to hundreds of jurisdictions. So if you do have an error or a bias, it scales, and it gets repeated over and over in a way that a bias or a defect in a single human mind doesn’t.
IRA FLATOW: Do you agree, Hany?
HANY FARID: It’s a great point. These algorithms can work on scale the way humans don’t. And again, I want to be clear that I don’t think I’m saying– and I don’t think Dr. Goodman is saying either– that we should not use big data, and we should not use algorithms.
I think we’re saying more that there should be transparency, that we should understand them, and that you shouldn’t be able to hide behind the black box. Because once we understand them, then we can deal with the limitations and the strengths.
And look, in the end– 10, 20 years from now– they may do better than humans. They may eliminate some of the biases that we know exist. But I don’t think we are there yet. And in the process, I think people are suffering because of failures of these algorithms.
IRA FLATOW: So what would make it better, make us get there?
HANY FARID: I can tell you at least from the technology side– I mean, there is hope that the machine learning algorithms will get better. We are seeing a real revolution in artificial intelligence and machine learning. We are seeing a revolution in the access to data. And I think that there is hope that you can build better algorithms that are more fair and more accurate.
I will say, however, that we should keep in mind that predicting the future is really hard. And we are asking these algorithms to look at a relatively small amount of information and make predictions about the next two years of a person’s life. And that is not an easy task. And so I think at some level we might want to ask ourselves, if this is not possible at a high level of accuracy, what else should we rely on in the criminal justice system?
IRA FLATOW: Ellen, any final comments about that?
ELLEN GOODMAN: Yeah, I mean, I think there is a misconception right now. We can think about these algorithms as being in something like the early days of the automobile industry: we didn’t have seat belts, we didn’t have safety protections, and so we need to put those in place. There’s also a misperception that this is all going to be cheap, that it’s cheaper to use these algorithms than to use human beings.
And I think the problem is that to do it ethically, it shouldn’t be that cheap. Because cities and other public jurisdictions are going to need to demand the right kinds of data, information, records, and checks. And then we’re going to need auditing, and we’re going to need a lot more transparency, and it’s going to cost money.
IRA FLATOW: And we’re going to be paying the people who make the algorithms. Is that what you mean– it’s going to cost money?
ELLEN GOODMAN: Paying the private vendors is, I think, built in. We know that. But it’s these other things– these public expenditures– that are going to have to be made in order to do it ethically. And if it’s done on the cheap, you know, I think ultimately it’s not going to be cheap because we’re going to have a lot of litigation around it, because these due process concerns are not going to be handled.
IRA FLATOW: Well that’s something we’ll continue to follow. Maybe I’ll have you all back to talk about it later.
Thank you both for taking time to be with us today. Hany Farid is professor of computer science at Dartmouth, and Ellen Goodman is co-director of the Rutgers Institute for Information Policy and Law at Rutgers University.
Alexa Lim was a senior producer for Science Friday. Her favorite stories involve space, sound, and strange animal discoveries.