11/20/2015

Why Machines Discriminate—and How to Fix Them

27:50 minutes

Server room, from Shutterstock
Some believers in big data have claimed that, in big data sets, “the numbers speak for themselves.” Or in other words, the more data available to them, the closer machines can get to achieving objectivity in their decision-making. But data researcher Kate Crawford says that’s not always the case, because big data sets can perpetuate the same biases present in our culture, teaching machines to discriminate when scanning resumes or approving loans, for example.

And when algorithms do discriminate, computer scientist Suresh Venkatasubramanian says he tends to hear expressions of disbelief, such as, “Algorithms are just code—they only do what you tell them.” But the decisions that machine-learning algorithms spit out are a lot more complicated and opaque than people think, he says, which makes tracking down an offending line of code a near impossibility.

One solution, he says, is to screen algorithms more rigorously, testing them on subsets of data to see if they produce the same high-quality results for different populations of people. And Crawford says it might be worth training computer scientists differently, too, in order to raise their awareness of the pitfalls of machine learning with regard to race, gender, bias, and discrimination.

Segment Guests

Kate Crawford

Kate Crawford is a principal researcher for Microsoft Research and a visiting professor at the MIT Center for Civic Media in Cambridge, Massachusetts.

Suresh Venkatasubramanian

Suresh Venkatasubramanian is an associate professor at the School of Computing at the University of Utah in Salt Lake City, Utah.

Segment Transcript

IRA FLATOW: This is Science Friday. I’m Ira Flatow.

Let’s say you apply for a new job. Would you rather have your resume judged by a person or an algorithm, software, computer? The fact is you may have no choice. If you’ve applied for a job at a big company recently, chances are you may have already been subjected to this trial by machine.

And in theory that should be a good thing, right? Take the human out of the equation– all the bias, the baggage, the discrimination. Be judged purely on your merits. But it turns out that machines resemble their human makers much more than we might have imagined.

My next guest says that if our algorithms are fed data from a prejudiced world, they can end up with many of the same biases that we do. Kate Crawford is a principal researcher at Microsoft Research, Visiting Professor at the MIT Center for Civic Media. Welcome to Science Friday.

KATE CRAWFORD: It’s a pleasure to be here, thanks, Ira.

IRA FLATOW: And we want to ask our listeners, we want to hear from them– would they rather be judged by a human or an algorithm? Has an algorithm ever discriminated against you, to your knowledge? So give us a call, 844-724-8255. Also, you can tweet us @scifri.

Kate, there is a lot of hype about the power of big data. Why, in your view, do we have to treat the data and what machines are doing with a bit more skeptical eye?

KATE CRAWFORD: Well, big data is very exciting for lots of reasons. It gives us capacities that we simply didn’t have before. But there’s also this little trap that we can tend to look at large collections of data as somehow being more objective and more representative when this is not necessarily the case.

And we can make the same sort of miscalculation if you will about algorithms– that because they’re systematic we assume that they’re somehow more objective than humans. But what we’re starting to see is that we can have elements of bias in both the data sets that we’re using and in the algorithms themselves.

IRA FLATOW: Give us an idea. What kind of real-life examples?

KATE CRAWFORD: Sure. Well, I first started researching this several years ago now when I was looking at the way that we use social media data to understand natural disasters. And what you find if you’re using say, for example, Twitter data, is that in many cases Twitter tends to be used by people who have a little bit more money, often a smartphone, often they’re urban, often they skew younger. So if we’re starting to use that data to really try to understand where to direct resources in a natural disaster, you’re missing part of the picture.

Then I started to look at how apps might be reproducing this form of bias. So there’s an app that’s actually quite fascinating called Street Bump which is used in the Cambridge area. Now, this was designed for really good reasons. It’s actually very smart. What it does is it basically tracks your accelerometer and your GPS data as you’re traveling down the road so when you hit a bump it’ll say, oh, there’s a bump, and if a few other people hit the same bump, then the city knows to send out a road crew and–

IRA FLATOW: There’s a pothole there.

KATE CRAWFORD: And repair that pothole. Absolutely right.

But what I started to do is to look at who owned smartphones in the Boston area, and, unsurprisingly enough, ownership tends to map to people with more disposable income and also younger audiences. Particularly if you look at the over-65s, you’ll find that in some cases smartphone penetration is as low as 16%.

So what you end up doing is having an app which in many cases is accentuating the signals from areas that have younger, wealthier residents, and we’re not getting the signals from areas where we have people who have less money or generally older populations. So what that could mean is cities could start to be resourced and repaired based on signals that are really skewed toward those who are already a pretty privileged base.
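The skew described here can be made concrete with a little arithmetic. The sketch below is only an illustration: the neighborhood names, the pothole and driver counts, the 80% ownership figure, and the `hit_rate` parameter are all assumptions, and only the 16% figure echoes the number mentioned above. It shows how two areas with identical road damage can send very different volumes of reports.

```python
# Hypothetical neighborhoods with the SAME number of potholes and drivers.
# Only smartphone ownership differs. The 16% figure echoes the low end
# mentioned in the transcript; the 80% figure is an assumption.
NEIGHBORHOODS = {
    "younger_wealthier": {"potholes": 100, "drivers": 1000, "smartphone_rate": 0.80},
    "older_lower_income": {"potholes": 100, "drivers": 1000, "smartphone_rate": 0.16},
}


def report_share(neighborhoods, hit_rate=0.05):
    """Return each neighborhood's share of all expected bump reports.

    hit_rate is the assumed chance that a given driver hits a given pothole.
    """
    expected = {
        name: n["potholes"] * n["drivers"] * n["smartphone_rate"] * hit_rate
        for name, n in neighborhoods.items()
    }
    total = sum(expected.values())
    return {name: count / total for name, count in expected.items()}


shares = report_share(NEIGHBORHOODS)
for name, share in shares.items():
    # Both areas hold half of the real potholes, whatever their report share.
    print(f"{name}: {share:.0%} of reports, 50% of actual potholes")
```

Under these assumed numbers, a city that ranked repairs by report volume alone would direct most crews to the first neighborhood even though both have the same number of potholes.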

So I started to think about both how the data collection can produce particular kinds of, shall we say, black holes and really boosting the signal of populations who are already, let’s face it, very well-connected.

IRA FLATOW: So if you were to look at that example and you say how– knowing that the data could be biased– how do I unbias it, maybe then you put the detectors on garbage trucks that go everywhere.

KATE CRAWFORD: And that’s exactly right, and that’s actually exactly what they did. And the group– the New Urban Mechanics who built this app are actually very sensitive to making sure that this data is equitable and they’re getting a good picture of the city.

So what they did is they put this app along all kinds of council vehicles, from garbage trucks through to police cars so that they get a wider signal. That was a really good example of a quick response and thinking about how you try to unbias your data. But in many cases that’s a lot harder to do, particularly if you’re using historical data.

IRA FLATOW: Now, everybody– the first thing they do when they want to know something is that they Google it. Are we getting biased data coming back from a Google search?

KATE CRAWFORD: Well, it’s really interesting. So a group of CMU researchers recently published a paper looking at what happens if you Google for particular jobs. And they actually created a very clever system that was automated and they created a lot of fake profiles, half of which were female and half of which were male. And they used this to test what kind of job ads were being shown. And very problematically what they found was that all of the really high-paying jobs– the CEO level jobs, the things that were 200k plus– were being shown far more to men and they weren’t being shown to women.

So that produces a very serious bias that, of course, reinforces itself because not only do we have a problem with not enough women in the C-suite, now they’re not even seeing the ads for those positions. So absolutely we’ve seen particular forms of search bias across all of the search engines, and this is something obviously that’s very serious for technology companies to think about.

IRA FLATOW: I want to bring in another guest. Suresh Venkatasubramanian is an associate professor in the School of Computing at the University of Utah in Salt Lake City. He joins us from KUER there today. Welcome to Science Friday, Suresh.

SURESH VENKATASUBRAMANIAN: Thanks for having me.

IRA FLATOW: Let’s talk about– before we get ahead of ourselves too much– let’s get into the nitty gritty about the software itself which at its heart contains these algorithms. What actually is an algorithm? You had a great article in Medium comparing algorithms to recipes. Let’s talk about that.

SURESH VENKATASUBRAMANIAN: Sure. So one of the things we do in our undergraduate classes in computer science is to teach people what it means for something to be an algorithm. And the usual story we tell them goes something like this– that you should think of an algorithm as a set of instructions that takes some inputs and produces some outputs, and the best way to think about this is like a recipe.

So you have a recipe for making something. It has a set of instructions that are hopefully well-defined and easy to follow– not always, but most of the time. And your inputs are the ingredients and the output, hopefully, is the food you want to make.

And so we’ve been telling this story for a long time, and I think it’s a very good story. It captures a vast majority of the kinds of algorithms we work with, but what it does not do anymore is capture the kinds of machine-learning algorithms that we’re actually using in the cases that Kate mentioned.

By the way, Kate, honor to meet you. I loved your article on the six provocations of big data, by the way.

KATE CRAWFORD: Oh, that’s very kind.

SURESH VENKATASUBRAMANIAN: So–

KATE CRAWFORD: Thanks, Suresh.

SURESH VENKATASUBRAMANIAN: So the better analogy and what I wrote about in the article was something a bit more involved. It looks something like this, if you permit me to tell a little story.

So Thanksgiving is coming up and we’re all going to be making, among other things, stuffing. And you could, of course, get a stuffing recipe from a book and make it. But let’s suppose you didn’t do it that way. You didn’t use the algorithm for making stuffing that way. You did something different.

What you did was you called 10 of your friends over to your house three days before Thanksgiving and said, bring me your examples of stuffing that you like and if you can, bring me an example of stuffing you don’t like. So they come and bring the examples and they leave their bowls there marked off with what they like and don’t like and they go away.

And you taste the different stuffings and you have some idea, OK, I need these ingredients. I need maybe, I don’t know, bread crumbs or I need cornmeal or I need celery, and you make something. And you make something and they come back the next day and tell you what they like and didn’t like and you do this over and over again.

Eventually you come up with a set of instructions that might involve mixing the ingredients in a bowl, twirling three times on one foot, spinning it over your head, and it works great. You write these instructions down very faithfully and you send it out to everyone saying these are the instructions for making stuffing. And they make the stuffing and it works, and if they come and ask you, well, why do I have to twirl three times, your answer is, I don’t know. It just seemed to work.

And, now, if you did this in Salt Lake, you might get one recipe. You did this in New York City you might get a different recipe. They will all be recipes for stuffing but they’re all very different. They all come from different sources, and it’s very hard to tell why they are the way they are. And that’s what these algorithms are nowadays. That’s exactly how they look.

IRA FLATOW: And does that mean– does that explain how the algorithms get written? Is the software actually writing its own algorithm then if it’s different in every city?

SURESH VENKATASUBRAMANIAN: Essentially yes.

IRA FLATOW: Wow.

SURESH VENKATASUBRAMANIAN: They are being trained so the algorithm takes in some data, it trains itself on some data. It has what I was describing– it plays a game of 50-dimensional roulette essentially trying to find a spot to land in a very, very high-dimensional space and it lands somewhere and where it lands is the final algorithm.
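The stuffing story, in which the "recipe" is learned from examples rather than written down in advance, can be sketched in a few lines. The following is a minimal illustration and not anything the guests describe running: it assumes a simple logistic-regression model, two made-up features (amounts of bread crumbs and cornmeal), and two hypothetical sets of taste-test verdicts. The same training procedure lands on different weights depending on whose examples it sees.

```python
import math


def train(examples, steps=2000, lr=0.1):
    """Fit logistic-regression weights by plain gradient descent.

    examples: list of (features, label) pairs, label 1 = liked, 0 = disliked.
    Returns (weights, bias), the learned "recipe".
    """
    n_features = len(examples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in examples:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted chance of "liked"
            err = p - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(examples)
    return w, b


# Hypothetical verdicts. Features are [bread_crumbs, cornmeal].
salt_lake = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.1, 0.9], 0), ([0.0, 1.0], 0)]
new_york = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.1, 0.9], 1), ([0.0, 1.0], 1)]

slc_w, slc_b = train(salt_lake)
nyc_w, nyc_b = train(new_york)
print("Salt Lake weights:", [round(v, 2) for v in slc_w])
print("New York weights: ", [round(v, 2) for v in nyc_w])
```

Nobody wrote "prefer bread crumbs" into the code. The preference shows up only in the learned numbers, which is part of why the result is hard to inspect, and a real system has far more than two of them.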

IRA FLATOW: Wow. That’s many things. So that’s how then it might filter down to a resume because we’re feeding in all these biases of our own, all the different recipes and that pops something that we don’t recognize or we don’t really want.

SURESH VENKATASUBRAMANIAN: Yes. And one of the problems is that humans are very good at trying to find patterns in data. Machines are also very good at trying to find patterns in data, whether or not they’re meaningful. So if finding a pattern gives you some edge in terms of prediction accuracy or how well you can match the results that the human told you you should match– the stuffing that your friends told you was good for example– you will find these small patterns.

So an algorithm that scans resumes might even say, oh, I notice that this– when people use this kind of font it has a high correlation with being productive, so we should– this is an important feature. Is it or isn’t it? I don’t know. Maybe it is. Maybe it isn’t. But it could do things like that and then it’s hard to understand why it’s doing that.

IRA FLATOW: And, Kate, and big companies are doing resume scanning like this?

KATE CRAWFORD: Absolutely. This is really common at the moment, and I think the big problem is that, as Suresh points out, there can be frankly just errors in how they’re understanding what productivity looks like.

In many ways we could think about the process of being hired as always having an algorithm. You met somebody who would look at your CV. They would decide what a productive person looks like and acts like and talks like. It’s just that now there’s algorithms basically within systems– in many cases, a black box– so you really don’t see how you’re being judged.

And that’s actually part of the problem. I like this idea of the cooking analogy because I think one of my favorite descriptions of how we can think about these kinds of big scale data systems comes from Geoff Bowker who said that raw data is both an oxymoron and a bad idea. That in fact your data should be cooked with care. And in many of these cases, these are systems where the data is being cooked but we don’t know necessarily the level of care that’s being brought to it.

IRA FLATOW: I like that. I’m going to go to the phones. Back in the day when I was studying computers, it was garbage in, garbage out.

Let’s go to Tod in Oakland, California. Hi, welcome to Science Friday.

TOD: Hi, thanks for having me.

IRA FLATOW: Go ahead.

TOD: So, yeah. So I work at Google and a while ago I built a hiring algorithm that did a pretty good job at predicting who’d be successful, so I thought I would just weigh in a bit.

KATE CRAWFORD: I’d love to hear what kind of criteria you’re using to determine what success looks like there.

SURESH VENKATASUBRAMANIAN: And what algorithm you’re using.

TOD: Yeah. So I can’t give away the secret algorithm, but I can say that we looked at both whether you were successful in getting hired and also, once you got hired, at 30 different performance metrics, from performance scores to organizational citizenship behaviors to all sorts of other performance metrics, to see whether, once you’re there, you’re a good performer.

KATE CRAWFORD: And what were some of the unexpected findings in terms of things that would make somebody likely to stick around?

TOD: So we thought going in that GPA would be an important variable but what we found was it’s only important for the first three years after somebody graduates and then after that it loses all its predictive power. So we kind of changed what we were looking for to once you’ve been out of school for three years we don’t really look at your GPA.

IRA FLATOW: So what was the most successful predictor?

TOD: A really interesting one was the age you got into computers compared to your peers. And we didn’t ask you what the absolute age was. We just asked compared to your peers. So if you’re 50 and you’re getting into computers and all your friends don’t get into it for another 10 years, you actually get in younger, and that actually tended to correlate really well with success on the job.

IRA FLATOW: Let me just remind everybody that I’m Ira Flatow. This is Science Friday from PRI– Public Radio International, talking about–

SURESH VENKATASUBRAMANIAN: So I have a question for Tod, actually–

IRA FLATOW: I know. You’re chomping at the bit, Suresh. Go ahead.

SURESH VENKATASUBRAMANIAN: Sorry. I was wondering what kind of training data do you use?

TOD: We used everybody that had come to our systems for I think about a year and then looked at their performance data a couple years later.

SURESH VENKATASUBRAMANIAN: OK.

IRA FLATOW: Did you have to change anything? Do you now have a fully cooked– because we’re using a recipe analogy here– do you have a fully cooked method down?

TOD: Well, I agree with the other two callers that it’s used as an input in the hiring process and we get a lot of pieces of information and we don’t ever just turn to the algorithm and say who do you want to hire, but it’s better at combining some of the pieces that we do have and then we still turn it back to the human to decide who should we hire.

KATE CRAWFORD: Can I ask a little question about where you might think bias could emerge in a system like this? And I’m thinking, for example, about gender bias which is obviously a very big issue in the technology sector and we’re trying to increase the number of female engineers and programmers but it’s still at a very low level.

I’m thinking about the fact that quite often boys are encouraged to get into computers earlier than girls for a whole range of reasons. Do you think that might also skew that particular marker that you have that says age that you get interested in computers relative to your peers, do you think that might then have some sort of gender flow on effects in terms of who’s being seen as a hireable candidate?

TOD: Well, it’s actually not something we ask about when we hire people. What it shows is interest in technology, and that was one way in the validation study that we asked it. But when people come in and do interviews we ask everybody all sorts of different questions that just get into whether you’re curious about technology. So it didn’t just ask you when did you buy your first computer but did you take apart your phone at home and other ways that you got into technology.

IRA FLATOW: How do you address jobs that are not technological– teaching, welding, anything like that, or are you just talking about technological jobs?

TOD: Well, for us we’re more interested in that because we hire technologists. But I think what we’re looking for is a passion in the subject matter that you’re being hired for. So if you’re hiring welders you might say what age did you get interested in welding stuff together or paper hangers or different things. It’s just kind of passion for the thing that you’re being hired for.

SURESH VENKATASUBRAMANIAN: So one of the interesting things that comes up in this– and I’m not implying that this is directly affecting the algorithm that you guys are using at Google– is that even if you don’t look at attributes like gender or race, it’s well known that it’s easy to essentially pick up on correlated attributes that could give you this information effectively. And, in fact, one of the things that we’ve been looking at in our research is can you make some assessment of the potential for bias by discovering these correlations?

For example, say you’re not looking at gender but you are looking at things like when you first got into computing, as Kate mentioned. If there’s a disparity in when boys and girls first start getting into computing, it’s quite conceivable that with some collection of attributes you could essentially predict the gender attribute, which the algorithm might effectively be doing even if you’re not explicitly asking it to do so. And I’m wondering if you’ve ever investigated any possibilities for things like that?

TOD: Well, the comparison group is peer group, so that’s kind of one of them, so I guess if your peers were fellow girls and you still got in younger, that would help. But I think the larger point is that we ask when people apply to us for a whole ton of different pieces of information– work samples, they come in and interview, they code while they’re there, and all these are different signals. And then at the end of the day we still are asking a hiring committee, a group of peers, to look at all the pieces of information and still decide.
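The proxy concern raised in this exchange, that a feature which never mentions gender can still reveal it, can be checked with a very simple test: see how well the feature alone predicts the protected attribute. The sketch below is an assumption-laden illustration, not the guests' research code. The applicant data is invented, and a single-threshold rule stands in for whatever model a real audit would use.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy achievable when predicting a binary label from one
    numeric feature using a single cut-off, tried in both directions."""
    n = len(values)
    best = 0.0
    for t in sorted(set(values)):
        predicted = [v <= t for v in values]
        correct = sum(p == bool(label) for p, label in zip(predicted, labels))
        # Flipping the rule turns every miss into a hit, so check both.
        best = max(best, correct / n, (n - correct) / n)
    return best


# Invented applicants: age at which each first got into computers, and a
# gender flag that the hiring model is never shown.
age_started = [8, 9, 10, 10, 11, 13, 14, 14, 15, 16]
is_female = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

base_rate = max(sum(is_female), len(is_female) - sum(is_female)) / len(is_female)
proxy_accuracy = best_threshold_accuracy(age_started, is_female)
print(f"guessing the majority gender: {base_rate:.0%} correct")
print(f"guessing from age_started alone: {proxy_accuracy:.0%} correct")
```

When the second number sits far above the first, as it does for this made-up data, the feature is carrying gender information whether or not anyone intended it to. A peer-relative version of the question, like the one Tod describes, would need to be tested the same way rather than assumed safe.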

IRA FLATOW: All right, thanks for calling us, Tod, and don’t be a stranger.

TOD: All right, thanks.

IRA FLATOW: Have a good holiday. Interesting people listening to Science Friday.

All right, we’re going to come back– take a short break and come back and talk about some more about hiring and maybe see if you’d rather have a computer or a person do your job interview. I don’t know. Maybe it doesn’t matter anymore.

Well, we’ll talk about it. Stay with us. We’ll be right back after this break.

This is Science Friday. I’m Ira Flatow. In case you just joined us, you’ve joined us in a good hour. We’re talking this hour about algorithms and prejudice hidden in computer code with Kate Crawford, principal researcher at Microsoft Research, Visiting Professor at MIT Center for Civic Media, and Suresh Venkatasubramanian, an associate professor in the School of Computing at the University of Utah in Salt Lake City. Our number, 844-724-8255. Lots of discussion.

Kate, let me ask you, let’s say a company found that its algorithms were indeed discriminating against job applicants but it saves a ton of money to use machines and wants to stay with the machines. Is there any incentive for the company to change this? Any way for it to do that?

KATE CRAWFORD: This is a big risk right now that there just simply isn’t, in some cases, enough incentive for them to stop. If it’s really saving them that much money and they get a few error cases here and there, it’s tempting to stay with the system. And this is part of the reason that, certainly in my work, one of the things I’ve been calling for is how do we have due process with big data systems?

And due process is an idea that’s been around since the Magna Carta. We’ve had it since 1215. But when it comes to big data systems, what does due process look like? And at the very least, I think it has to begin with letting people know that they are being judged by these large, algorithmically-driven systems, and it also should allow people to see the data that’s being used about them and, in some cases, correct the record if it’s bad data.

We all know that there are many systems that have just incorrect data about us. I like taking that browser test which predicts how old you are and what gender you are and I quite frequently come up as a 29-year-old male. I think that says something more about my music tastes than anything else and possibly my film taste. But we know that incorrect data is out there, so how do you let people know that they’re being judged by systems and that they can actually intervene. I think that’s really important.

IRA FLATOW: Let’s go to the phones. Let’s go to Michaela in Kalamazoo, Michigan. Hi there.

MICHAELA: Hi, thanks for taking my call.

So I work at the writing center at my college. I’m a college senior. And as a person I feel like there’s very much a formula for writing a resume. There’s words you’re supposed to use and not use and I hate the idea of a computer weeding through that even more because as a person we worked so hard on these resumes and like that’s all they see us as is this piece of paper.

So my method is to try to use the same words that are going to get you through but have a different voice, something that’s going to catch the person who’s reading it, catch their attention. But if it’s a computer, can they pick up on a voice? Can they really get more of who you are?

KATE CRAWFORD: It’s a great question and in some cases there are interesting ways that you can write algorithms to look for unusual turns of phrase. But perhaps what’s more interesting here is that you are being judged for things that you’re probably not even thinking about in your resume like, for example, your address.

So there was one HR firm that has been using an algorithmically-driven system that gives people extra credit if they live within a close radius of the workplace, if they’re sort of within a 15 minute commute, because their data showed that if you had a longer commute you were more likely to quit or to be fired within a year, which basically means that the investment for them is being lost.

So what that also means is that they just start to hire people who live nearby, which can have a whole range of other discriminatory functions. Particularly if you’re in an area where it’s expensive to live there, you’re then immediately tending to hire people who have more money and more access to urban areas. So there’s a lot of questions that we might ask about how that resume scanning works. But keep in mind that when you’re trying to find a beautiful turn of phrase, they might just be looking at your address and then that’s actually the real problem in terms of discrimination.

IRA FLATOW: Suresh, you got a comment on that?

SURESH VENKATASUBRAMANIAN: Yeah. Another example I’ve heard of is where the system will scan the layout of the various text blocks on the page and use that because of some pattern they found: that a resume whose blocks are laid out in certain ways is a better resume, with a greater likelihood of someone being a good employee than if they lay it out in some other fashion.

It sounds ludicrous, but again this is one of the things that there’s such a vast choice of patterns that a machine can pick up on. It’s not just the text and it’s not just the words you use. But having said that, there are algorithms that, yes.

For example, in music, there are algorithms that claim they can pick up on tone, they can pick up on characterization. So I think there’s a lot of work on trying to pick up on things like tone and, of course, can you tell someone’s gender from the text they use and the words they use? This is something that’s been around for a long– so there are lots of things you can do with that.

IRA FLATOW: Is there any research that shows that you get a better interview by a person or by computer? Your chances? And should you request to be interviewed by a person if you’re going for a job?

KATE CRAWFORD: I’ve yet to see this get down to the interview stage I have to say, Ira. I think at this point there’s still humans at that interface. But it’s the back end that I’m more worried about.

IRA FLATOW: Could you ask in your resume to be looked at by a person?

SURESH VENKATASUBRAMANIAN: No, but in fact there are companies that are offering the following service where rather than go in for an interview you sit down with Skype and talk to a machine. It sends you questions, you answer. They record the video, they analyze the video, they create a bunch of features and a bunch of estimators, and then give it to a human.

IRA FLATOW: See, I’m ahead of my time.

KATE CRAWFORD: You are.

SURESH VENKATASUBRAMANIAN: So this is happening now, yes.

IRA FLATOW: Wow. Wow. But beyond thinking of a job or a loan application, are algorithms being used by law enforcement in many ways that might actually–

SURESH VENKATASUBRAMANIAN: Oh, yes, everywhere. So I just came back from a workshop that Data & Society ran in DC on data and civil rights and it was all about use of data in various stages of the criminal justice pipeline, going from predictive policing to sentencing guidelines to parole guidelines, recidivism rate estimation. It’s been used all over the place.

And that’s, I think, in some ways an even more serious and consequential use of algorithms than in hiring. Not that hiring is less important, but– and there’s a lot of work being done. It’s being pushed all over the place, and there’s a lot of questions about the effectiveness and the value and the ethical use of these methods.

IRA FLATOW: Well, and now with terrorism attacks are we going to see more of our algorithms being used, do you suspect, Suresh? Or–

[INTERPOSING VOICES]

SURESH VENKATASUBRAMANIAN: Probably. They already are being used. I don’t think it’s– they’re already being used all over the place. I don’t think there’s a question of more. There’s a lot that’s happening that we don’t know about and there is definitely a lot of profiling that’s happening inside at that level. The NSA has a gigantic facility out here in Utah, among other things. So, yes, I think they are going to be doing a lot more of it and that’s exactly the problem I think that.

The thing is this– going back to the example with the caller from Google. Google in many ways in what he describes is doing the right thing. They have a number of different features. They have a number of different predictors. And the hope is that if you put enough of these things together that you hope and, again, you hope that biases may cancel out. Again, you don’t know but you hope so.

The problem is– and as Kate mentioned– when you have bad data, right? You have incorrect data.

There was this horrific story I heard a couple of years ago on This American Life about someone whose name was similar to another known terrorist and was put into an FBI watchlist and 15 years later, even though the FBI had cleared him as being uninvolved with any activities, could not get his name off the watchlist and that bit that was set up with his name was propagating into credit applications and loan– things like that, and he just couldn’t get it out of there.

And so what Kate is saying makes perfect sense. You need a way to verify in some ways whether the data being used for you is accurate in any form. And this is a big problem, I think.

KATE CRAWFORD: Yeah, and I think that’s absolutely right, Suresh. And I’m really interested in what we do about it because I’m concerned about the kind of discrimination we’re seeing against entire groups– be they African American, be they women, be they people who live in rural areas, you name it. We’re seeing a form of group discrimination often occur in these kinds of systems.

But there are things we can do about it. I think one of the things we’ve been discussing today is how do you have internal systems that are checking the discriminatory outcomes? A lot of technology companies are looking into that. Another thing you can do is external audits. Can we actually really test who’s getting a loan from this company, who’s getting housing from this company, who’s getting jobs over here, and that’s another way to look at it.
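An outcome audit of the kind described here needs no access to the algorithm itself, only to who was selected. The sketch below is one hedged way to do it. The group names and counts are invented, and the 0.8 cut-off is the "four-fifths" rule of thumb used in US employment guidance, which the guests do not mention and which is a screening signal rather than proof of discrimination.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (group, was_selected) pairs.
    Returns the fraction selected within each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest group selection rate divided by the highest.
    Assumes at least one group has a non-zero selection rate."""
    return min(rates.values()) / max(rates.values())


# Invented audit log: 100 applicants per group.
decisions = (
    [("group_a", True)] * 50 + [("group_a", False)] * 50
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)

rates = selection_rates(decisions)
ratio = impact_ratio(rates)
print(rates)
print(f"impact ratio: {ratio:.2f}", "(below 0.8, worth investigating)" if ratio < 0.8 else "")
```

The same two functions work whether the decisions came from a person, a model, or a mix of both, which is what makes this style of audit usable from the outside.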

But perhaps the thing that I think is most exciting to me right now that I think we can do pretty much immediately is to really think about this concept of data ethics. How do we train a data scientist to be thinking about these questions of discrimination and fairness? When they’re in school, when they’re actually learning how to use these techniques.

It’s actually very powerful to think about these questions in an educational setting so that when you’re out in the world designing these systems these are the first questions you ask yourself. Because, as we know, these systems do encode forms of human blindness and sometimes human bias. And if we’re basically just thinking about who is affected by the system and who might I not be including, that can actually be very powerful.

IRA FLATOW: All right.

SURESH VENKATASUBRAMANIAN: And I’d like to go further than that even. I think along with the education that is incredibly important– and I have been educated myself just by looking at this– we have to look at the algorithms very carefully. Some of the research that I and others and many of us have been looking at is how do you instrument the algorithms to do exactly what Kate’s asking– detect potential for bias and even try to repair bias?

There actually are ways now where you can take a black box algorithm that someone’s using and modify the data that’s going into it to essentially remove signals that could be potentially discriminatory. Now, there are a lot of technical details in how this works but this is some of the research that I’ve been doing.

And there are ways– we’re still very in the early days of doing this research– but there are ways we can do this and essentially we’re trying to formulate a mathematical way of describing bias and describing how to be fair– how algorithms could be fair and trying to implement that into algorithms. So there are lots of things we can do and I think we need a lot more study of this and there is a growing interest in the technical side of things in how to do this.

IRA FLATOW: And we’ll be following this.

I want to thank both of you for taking time to be with us today. Suresh Venkatasubramanian is Associate Professor in the School of Computing at the University of Utah in Salt Lake. Kate Crawford, Principal Researcher at Microsoft Research and Visiting Professor at the MIT Center for Civic Media. Thank you both.

KATE CRAWFORD: Thank you, Ira.

[INTERPOSING VOICES]

IRA FLATOW: –have a happy holiday season.

Copyright © 2015 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies.

Meet the Producer

About Christopher Intagliata

Christopher Intagliata is Science Friday’s senior producer. He once served as a prop in an optical illusion and speaks passable Ira Flatowese.

  • Stephanie Heacox

    In response to the Google guy – the question regarding when you got into tech relative to your peers is not only a problem for women, it’s a problem for poor people. When you got your first computer can have an awful lot to do with whether your parents could afford one…

  • chrisnfolsom

    People and experience can make leaps and inferences AND look for false positives. Using the stuffing analogy if you don’t account for different types of stuffing and mixed bread with rice and we all know just blindly mixing things together is not the best way to find an answer – of course you might come up with something entirely new of which a computer, again, might not recognize. Man is where he is because of his ability to see patterns, adapt and I am sure computers will get better at that, but just as a committee is notoriously bad at creativity I believe a computer will also suffer from this too. Being in America we were allowed to fail, regroup and try again. If you weigh your algorithm too high for safety you will never see any further.

  • Legend79

    Anecdote re: employment screening tests/surveys. A few years ago, I took one online for a big box retailer, and for some odd reason I answered the questions honestly, and never got thru. When I took the same survey a few months later and answered as I thought the Company would prefer, I got called in for an interview the next day! (It was one of those “strongly agree, strongly disagree” type of Q&A.)

    Then I looked it up, read some “How to land an Interview” website/blogs discussing these screening surveys and discovered that these surveys are seeking the extreme answer candidates, meaning you “strongly agree” or “strongly disagree”, and never answer in the middle, and as I did the first time – honestly!

    Apparently extremism, and not honesty is what employers are seeking. Go figure.

  • Henry Robinson

    The National Weather Service has had thirty or more years experience with these algorithms (called Model Output Statistics) and has faced the ethical aspects years ago. They solved the problem and made it more useful to people. To find out how, mail me at henry,[email protected] or [email protected].

  • chrisnfolsom

    ALL data or analysis is “bad”, or not perfect; the question is if the data and its analysis can find a point that is good enough. Politically data is much easier to use top down as we can use it for police to “stop” crime, but we need to use it to prevent crime from happening. Will public policy be more likely to use algorithms to build more prisons, or build more schools? Will businesses be more likely to use algorithms to limit prospects or develop those that are just under some data point to be better? My fear is that people will use algorithms to be lazy and profile just as we have done with many other profiling methods over the years.

  • jimmyt

    math , logic, and statistics are rayciss