Decrypting Big Tech’s Data Hoard
The era of Big Data promised large-scale analytics of complex sets of information, harnessing the predictive power of finding patterns in the real-world behaviors of millions of people.
But as new documentaries like The Social Dilemma, Coded Bias, and other recent critiques point out, the technologies we’ve built to collect data have created their own new problems. Even as powerhouses like Google say they’re done tracking and targeting individual users in the name of better advertising, educational institutions, housing providers, and countless others haven’t stopped.
Ira talks to two researchers, mathematician Cathy O’Neil and law scholar Rashida Richardson, about the places our data is collected without our knowing, the algorithms that may be changing our lives, and how bias can creep into every digital corner.
Rashida Richardson is a visiting scholar at Rutgers Law School-Camden and the Rutgers Institute for Information Policy and Law in Camden, New Jersey.
Cathy O’Neil is a data scientist and mathematician. She is CEO of ORCAA and author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.
IRA FLATOW: This is Science Friday. I’m Ira Flatow. One of my favorite sayings is, “The road to hell is paved with good intentions.” And when the internet was unleashed to the public, I assume that the designers of email, social communities, smartphones and watches had the best of intentions, wonderful, fun meeting places where we could gather to meet and greet and discuss stuff. I was willing to sign those multi-page user agreements in exchange for the services of this brave new world.
But somewhere along that road, the internet became very commercialized, all about the money, collecting details of your life, tracking your movements online and where you’ll walk and whom you’ll talk to, selling all of this big data unregulated to advertisers, government agencies, and whomever would pay for it. It became evident that I was not the customer. I was the product. And the more I investigated, the more troubled I became about biased algorithms, democracy, and social justice.
I wanted to talk to some of the people who’ve been sounding the alarm about the harms of big data collection and the tools used to pry into our private lives, so I’ve asked a couple of them to be on our show today. And let me introduce them, Professor Rashida Richardson, a visiting scholar at both Rutgers Law School and the Rutgers Institute for Information Policy and Law, and Dr. Cathy O’Neil, mathematician, data scientist, and author of the book Weapons of Math Destruction– How Big Data Increases Inequality and Threatens Democracy. She’s CEO of the algorithmic auditing company ORCAA. Welcome to Science Friday.
CATHY O’NEIL: Thanks for having us.
RASHIDA RICHARDSON: Yeah. Thanks, Ira.
IRA FLATOW: Let me begin with one of the reasons I felt I needed to talk about this was the phrase, as I said, you’re not the customer; you are the product. Rashida, is that how we are now?
RASHIDA RICHARDSON: In some regards, yes. But in many regards, it’s that we have a limited economy of choice. We have to use many services and products that do collect a lot of our data to then commodify that data. But there seems to be less and less choice amongst consumers about whether or not you can opt in or out.
IRA FLATOW: Cathy, how do you react to that?
CATHY O’NEIL: Yeah. I think about it similarly. When I was doing my research for my book Weapons of Math Destruction, I focused on the algorithms that we interact with by necessity, like when we try to get a job, when we go to college, when we get insurance, apply for credit or a mortgage. We don’t have the option to not be judged by an algorithm in those situations. And we don’t have the option not to have our data used against us or for us.
They profile us in deciding whether we deserve these things. We can think about opting in or opting out of certain types of things. But for the most part, we always will interact with those bureaucracies, and those bureaucracies will be run by algorithms.
IRA FLATOW: Your work looks at sectors where we might not expect our data to be used against us, Rashida: as tenants in rental housing, even in education contexts. Can you say a bit more about this?
RASHIDA RICHARDSON: Yeah. There’s a lot of data that’s collected on a daily and annual basis by government entities through administrative practices. And they also get access to private data.
And that data can be used and applied to make decisions regarding where students are assigned to go to school, whether or not you’re eligible for public benefits and the amount of public benefits you should receive, where police will patrol or who is likely to be a criminal, which children are likely to be subject to abuse.
And these are all decisions that, one, are highly subjective and high risk, but two, I really question whether data enables us to make more informed decisions about them. And often when applied in algorithmic systems, there is a level of opacity that then makes highly subjective and often discriminatory decisions less visible, and certainly reinforces this notion of these decisions being fair and neutral.
IRA FLATOW: Give me an example, Cathy, if you can, of how our data is used against us.
CATHY O’NEIL: Well, the FICO score, the credit score that we’re all used to, that’s like an old-school version of data being used against us, if you will. It’s reasonable that it’s being used in a lot of instances, because it tracks whether we’re paying our bills. There are all sorts of new-fangled versions of classical credit scores.
And they are being used with all sorts of kinds of data, like our social media data, who we’re friends with, what our chances of getting sick are, what kind of illnesses we might have. The way data protection works in this country, as long as data is available for sale, it’s not considered protected medical data, and it’s allowed to be used pretty much by anybody to decide our options.
IRA FLATOW: Mm-hmm. Rashida, let’s go back to some news that came out recently, that being that Google says that they will no longer track individual users. They want to phase out cookies and use more anonymous groups or categories to target ads. So we’ll get some privacy back, right? Is that good news, Rashida?
RASHIDA RICHARDSON: It’s good news that they recognize their data collection practices are problematic, but I don’t think this one change in policy and practice is now going to make it so that we all have greater privacy protection. Google has been around for over two decades. All of the data that they’ve collected over that time, they still have, and they can still use in a variety of ways. And there are also a lot of ways that Google controls the data that we are able to see or have access to. And that constrains a lot of the choices and opportunities available to us.
IRA FLATOW: Cathy, you’ve written about targeted ads themselves as being bad for us. Why is that?
CATHY O’NEIL: A lot of people do want targeted ads. I personally enjoy all the glistening, gem-colored yarns that I’m offered on a daily basis because they’re beautiful. So it’s not always a terrible thing.
I think there are two exceptions to this as a service that we need to consider. And the first one is that a lot of the ads are actually predatory. The way the online ecosystem for advertising works is it gives services to lucky people.
So it makes lucky people luckier, if you will. And then it preys on unlucky people. So it makes unlucky people unluckier. And the way you see that is you can see gambling houses will prey on people that might have gambling addictions.
You’ll see the same things with for-profit colleges. They specifically microtarget people who don’t really know the difference between private colleges and for-profit colleges and don’t know that they’re not going to get as good an education. So there’s the predatory side of it. And then the other thing that we have to keep in mind is the political landscape. For political campaign ads, that microtargeting has a different problem, which is a little bit harder to measure. But it essentially is destroying democracy, if you will.
IRA FLATOW: You just can’t say that and then not tell us more about that. Oh, it’s destroying democracy, if you will. In what ways?
CATHY O’NEIL: The most obvious way is because the political campaigns have all the control. They have all the information about the people. They can literally tell one thing to one group of people and another thing to another group of people. They could actually just say different things about the candidates’ policies.
Now, I don’t think that happens too, too often. But what I do think happens is they choose different things to show to different kinds of people. Basically, what it comes down to is that instead of the voter becoming informed, they are exposed to the very things that the campaign wants them to be exposed to, and to nothing else.
And that includes that it might not even be information at all. It might be simply emotional manipulation. And if you want to go to an extreme example, think about the way in 2016, we discovered after the election that Trump’s campaign actually suppressed the African-American vote with microtargeted ads on Facebook trying to convince Black voters, do not vote, it’s not worth it.
IRA FLATOW: Yeah. That’s certainly an example. And going back, Rashida, to this decades of data, Google has stopped collecting some data now, but how many other places on the internet do we still have to worry about our data being scooped up without us knowing it?
RASHIDA RICHARDSON: Well, I think it’s better to understand data collection and consumer surveillance on maybe a sector level, in that it’s not all on the internet. We have data collection and surveillance happening in financial services, as Cathy touched upon, and telecommunications. All our location data is tracked. And then basically in any physical environment we’re in, our information is being collected, whether it’s a workplace, school, or even our own home, depending on which type of listening devices you have in your home.
IRA FLATOW: Not to mention facial recognition.
RASHIDA RICHARDSON: That’s why I said basically any physical space. In public spaces, we have tons of CCTV cameras and other technologies that can collect both aggregate data about us as well as very minute and personal data.
IRA FLATOW: And all these companies, they make big money off of selling this data. Shouldn’t they be paying us back something for the use of all that data?
RASHIDA RICHARDSON: Well, I think it’s a little more complicated. Because you also have to understand that while there is some data that is very specific to us as individuals, a lot of data is relational. So simply saying that each of us as individuals can own data and then sell it on an open market is not necessarily a solution either.
Because that tends to reinforce any type of social inequities that exist in society, in that as a Black woman, I know my data is not going to be worth the same as a white man. And what does that mean when there’s different values to data and the primary means of protecting it is selling it?
CATHY O’NEIL: Rashida’s point about how much is my data worth– I mean, one of the things I discovered in my research on for-profit colleges– and those for-profit colleges often targeted single Black mothers– those clicks were worth a lot of money. I’m not saying that a white man’s clicks are not worth money. I’m just saying that you’d be surprised.
I think the real issue is that the bargaining power isn’t there, right? The bargaining power of most of the people in society– they don’t have the time nor the understanding of what their data is worth to actually make the negotiation work in their favor.
It points to a larger problem, of course, which is that going back to your imagination, Ira, the beginning of what the internet was going to be, on the internet, we are not citizens. We are consumers. So it’s all about money all the time. If we go in there thinking we’re in a town square being able to have a conversation, we’re wrong. We’re in a rented space. And we are paying that rent with our data.
IRA FLATOW: And it’s important that we understand that, Cathy, right?
CATHY O’NEIL: Yeah. We have to understand that. Because one of the trickiest things is, how do we change that? What’s the new vision, where it is more like a town square, we’re not being constantly measured and sold?
IRA FLATOW: Does that bring us to government regulation now? Is that one of the pathways we might head down, Cathy, or Rashida?
RASHIDA RICHARDSON: I think we need multifaceted approaches. Because yes, government regulation is one part of this conversation in that you can regulate the tech sector or even enforce antitrust regulations to have specific outcomes. But I think the reason why there is not a simple silver bullet solution to all of this is because some of this comes down to societal values. So if we only believe in rugged individualism and free markets as means of addressing everything in society, then that shuts out a lot of marginalized communities and individuals or allows for more predatory practices and situations to emerge for certain groups.
IRA FLATOW: What about cities and states, then, that have talked about banning facial recognition or banning certain kinds of algorithms, Rashida?
RASHIDA RICHARDSON: I think those are necessary steps in that there are certain technologies, like facial recognition and some forms of predictive analytics, that have only demonstrated harm in society. But I don’t think the whack-a-mole approach of banning or putting moratoria on the most egregious examples of bad technology is necessarily our way out of this. Because some of this stems from structural inequalities in society. And not all of these problems are specific to just technology, but they amplify and compound a lot of the problems that have preexisted in society.
CATHY O’NEIL: I would even argue that we don’t need new laws so much as we need to enforce existing laws. One of the things that kills me about algorithms is that they are currently bypassing a lot of really important anti-discrimination laws in the regulated sectors of insurance, credit, and hiring simply because the regulators don’t know how to decide whether an algorithm is compliant. And by the way, the answer is no, it’s probably not compliant.
That’s actually what I do in my day job. I audit algorithms for things like racial bias and gender bias and things like that. And since the data is biased, the algorithms are biased. So all I’m saying is that instead of thinking about, what new laws do we need, I would start with, what about enforcing the existing laws that we have?
RASHIDA RICHARDSON: But I actually want to complicate this a little, because I think there are also problems with how we view some of these problems. So a lot of the anti-discrimination laws are based on intentional discrimination or discriminatory intent. And that’s just a problematic framework, in that there’s tons of discrimination that happens in our society on a daily basis where, if you ask the person who is actually discriminating, they would say, that wasn’t intentional, or that wasn’t my intent.
I agree with Cathy in that I do think we have some laws that just lack the enforcement. But I also think some of our legal frameworks really need to be revised not only in light of our big data society, but being realistic about how societal problems like discrimination actually operate in society.
IRA FLATOW: Just a quick reminder. This is Science Friday from WNYC Studios. Cathy, you wrote your book Weapons of Math Destruction in 2016. And this year, Netflix documentaries like The Social Dilemma and the upcoming Coded Bias, they’re trying to act as wake-up calls about the downsides of the digital age. Is there something special about this moment in time that these wake-up calls are getting louder and more prominent?
CATHY O’NEIL: I think the answer to your question, Ira, is that the obvious failures of some of the algorithms are becoming so much more obvious. It’s undeniable. And they’re becoming PR fiascos. So facial recognition is an example where it gets to be pretty clear how it’s being used and how it’s failing, thanks in large part to the Gender Shades study by Joy Buolamwini, Deb Raji, and Timnit Gebru, by the way. I would also caution, though, that there are a lot of really problematic algorithms that are not public facing, and we aren’t hearing about them.
IRA FLATOW: And Rashida, are people paying attention to the right problems in digital surveillance or algorithms? What would you want on lawmakers’ minds as we talk about reforming tech?
RASHIDA RICHARDSON: I think just what Cathy said, to have a more expansive view of what the nature of the problem is. I think a lot of our public discourse is about private sector practices and uses. But a lot of my research, and what I think is the worst stuff, is what’s happening in government and in the public sector. Because often, we see data surveillance and data applications being used to make high-stakes decisions about people that can completely throw off the trajectory of their life or inhibit any type of opportunities they have access to.
And I think the way we often talk about data, as well, presumes this level of objectivity, or that the data reflects reality in some way, rather than it being very value-laden and subjective. And then that type of subjective framing of data is applied in circumstances where it feels like fairer, more neutral decisions are being made. And there are no neutral arbiters, whether it’s an algorithm or a judge. And I think we just need to be a little bit more honest about those realities.
IRA FLATOW: We’ve been talking about this for quite some time now, and I would like to thank both of you for taking time to be with us today.
RASHIDA RICHARDSON: Thanks, Ira.
CATHY O’NEIL: Thank you, Ira.
IRA FLATOW: Professor Rashida Richardson, a visiting scholar at both Rutgers Law School and the Rutgers Institute for Information Policy and Law, and Dr. Cathy O’Neil, mathematician, data scientist, and author of the book Weapons of Math Destruction– How Big Data Increases Inequality and Threatens Democracy. She is CEO of the algorithmic auditing company ORCAA.