Is It Time For Big Data To Get Smaller?

The era of Big Data promised large-scale analytics of complex sets of information, harnessing the predictive power of finding patterns in the real-world behaviors of millions of people.

But as new documentaries like The Social Dilemma, Coded Bias, and other recent critiques point out, the technologies we’ve built to collect data have created their own new problems. Even as powerhouses like Google say they’re done tracking and targeting individual users in the name of better advertising, educational institutions, housing providers, and countless others haven’t stopped.

Ira talks to two researchers, mathematician Cathy O’Neil and law scholar Rashida Richardson, about the places our data is collected without our knowing, the algorithms that may be changing our lives, and how bias can creep into every digital corner.

Segment Guests

Rashida Richardson

Rashida Richardson is a visiting scholar at Rutgers Law School-Camden and the Rutgers Institute for Information Policy and Law in New York, New York.

Segment Transcript

IRA FLATOW: This is Science Friday. I’m Ira Flatow. One of my favorite sayings is, “The road to hell is paved with good intentions.” And when the internet was unleashed to the public, I assume that the designers of email, social communities, smartphones, and watches had the best of intentions: wonderful, fun meeting places where we could gather to meet and greet and discuss stuff. I was willing to sign those multi-page user agreements in exchange for the services of this brave new world.

But somewhere along that road, the internet became very commercialized, all about the money, collecting details of your life, tracking your movements online and where you’ll walk and whom you’ll talk to, selling all of this big data unregulated to advertisers, government agencies, and whomever would pay for it. It became evident that I was not the customer. I was the product. And the more I investigated, the more troubled I became about biased algorithms, democracy, and social justice.

I wanted to talk to some of the people who’ve been sounding the alarm about the harms of big data collection and the tools used to pry into our private lives, so I’ve asked a couple of them to be on our show today. And let me introduce them, Professor Rashida Richardson, a visiting scholar at both Rutgers Law School and the Rutgers Institute for Information Policy and Law, and Dr. Cathy O’Neil, mathematician, data scientist, and author of the book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. She’s CEO of the algorithmic auditing company ORCAA. Welcome to Science Friday.

CATHY O’NEIL: Thanks for having us.

RASHIDA RICHARDSON: Yeah. Thanks, Ira.

IRA FLATOW: Let me begin with one of the reasons I felt I needed to talk about this: the phrase, as I said, “You’re not the customer; you are the product.” Rashida, is that what we are now?

RASHIDA RICHARDSON: In some regards, yes. But in many regards, it’s that we have a limited economy of choice. We have to use many services and products that do collect a lot of our data to then commodify that data. But there seems to be less and less choice amongst consumers about whether or not you can opt in or out.

IRA FLATOW: Cathy, how do you react to that?

CATHY O’NEIL: Yeah. I think about it similarly. When I was doing my research for my book Weapons of Math Destruction, I focused on the algorithms that we interact with by necessity, like when we try to get a job, when we go to college, when we get insurance, apply for credit or a mortgage. We don’t have the option to not be judged by an algorithm in those situations. And we don’t have the option not to have our data used against us or for us.

They profile us in deciding whether we deserve these things. We can think about opting in or opting out of certain types of things. But for the most part, we always will interact with those bureaucracies, and those bureaucracies will be run by algorithms.

IRA FLATOW: Rashida, your work looks at sectors where we might not expect our data to be used against us: as tenants in rental housing, even in education contexts. Can you say a bit more about this?

RASHIDA RICHARDSON: Yeah. There’s a lot of data that’s collected on a daily and annual basis by government entities through administrative practices. And they also get access to private data.

And that data can be used and applied to make decisions regarding where students are assigned to go to school, whether or not you’re eligible for public benefits and the amount of public benefits you should receive, where police will patrol or who is likely to be a criminal, which children are likely to be subject to abuse.

And these are all decisions that, one, are highly subjective and high-risk, and two, I really question whether or not data enables us to make more informed decisions about. And often, when applied in algorithmic systems, there is a level of opacity that makes highly subjective and often discriminatory decisions less visible and certainly reinforces the notion that these decisions are fair and neutral.

IRA FLATOW: Give me an example, Cathy, if you can, of how our data is used against us.

CATHY O’NEIL: Well, the FICO score, the credit score that we’re all used to, is like an old-school version of data being used against us, if you will. In a lot of instances it’s reasonable that it’s being used, because it tracks whether we’re paying our bills. But there are all sorts of newfangled versions of classical credit scores.

And they are being used with all sorts of data, like our social media data, who we’re friends with, what our chances of getting sick are, what kind of illnesses we might have. The way data protection works in this country, as long as data is available for sale and it’s not considered protected medical data, it’s allowed to be used by pretty much anybody to decide our options.

IRA FLATOW: Mm-hmm. Rashida, let’s go back to some news that came out recently, that being that Google says that they will no longer track individual users. They want to phase out cookies and use more anonymous groups or categories to target ads. So we’ll get some privacy back, right? Is that good news, Rashida?

RASHIDA RICHARDSON: It’s good news that they recognize their data collection practices are problematic, but I don’t think this one change in policy and practice is going to make it so that we all have greater privacy protection. Google has been around for over two decades. All of the data that they’ve collected over that time, they still have, and they can still use in a variety of ways. And there are also a lot of ways that Google controls the data that we are able to see or have access to. And that constrains a lot of the choices and opportunities available to us.

IRA FLATOW: Cathy, you’ve written about targeted ads themselves as being bad for us. Why is that?

CATHY O’NEIL: A lot of people do want targeted ads. I personally enjoy all the glistening, gem-colored yarns that I’m offered on a daily basis because they’re beautiful. So it’s not always a terrible thing.

I think there are two exceptions to this as a service that we need to consider. The first one is that a lot of the ads are actually predatory. The way the online advertising ecosystem works is that it gives services to lucky people.

So it makes lucky people luckier, if you will. And then it preys on unlucky people, so it makes unlucky people unluckier. And the way you see that is that gambling houses will prey on people who might have gambling addictions.

You’ll see the same thing with for-profit colleges. They specifically microtarget people who don’t really know the difference between private colleges and for-profit colleges and don’t know that they’re not going to get as good an education. That’s the predatory side of it. And then the other thing we have to keep in mind is the political landscape. Microtargeting for political campaign ads has a different problem, which is a little bit harder to measure. But it essentially is destroying democracy, if you will.

IRA FLATOW: You just can’t say that and then not tell us more about that. Oh, it’s destroying democracy, if you will. In what ways?

CATHY O’NEIL: The most obvious way is because the political campaigns have all the control. They have all the information about the people. They can literally tell one thing to one group of people and another thing to another group of people. They could actually just say different things about the candidates’ policies.

Now, I don’t think that happens too, too often. But what I do think happens is they choose different things to show the different kinds of people. Basically, what it comes down to is instead of the voter becoming informed, they are exposed to the very things that the campaign wants them to be exposed to and to nothing else.

And it might not even be information at all. It might simply be emotional manipulation. And if you want an extreme example, think about 2016, when we discovered after the election that Trump’s campaign actually suppressed the African-American vote with microtargeted ads on Facebook trying to convince Black voters not to vote, that it’s not worth it.

IRA FLATOW: Yeah. That’s certainly an example. And going back, Rashida, to these decades of data: Google has stopped collecting some data now, but how many other places on the internet do we still have to worry about our data being scooped up without our knowing it?

RASHIDA RICHARDSON: Well, I think it’s better to understand data collection and consumer surveillance on a sector level, in that it’s not all on the internet. We have data collection and surveillance happening in financial services, as Cathy touched upon, and in telecommunications. All our location data is tracked. And then basically in any physical environment we’re in, our information is being collected, whether it’s a workplace, a school, or even our own home, depending on which type of listening devices you have in your home.

IRA FLATOW: Not to mention facial recognition.

RASHIDA RICHARDSON: That’s why I said basically any physical space. In public spaces, we have tons of CCTV cameras and other technologies that can collect both aggregate data about us as well as very minute and personal data.

IRA FLATOW: And all these companies, they make big money off of selling this data. Shouldn’t they be paying us back something for the use of all that data?

RASHIDA RICHARDSON: Well, I think it’s a little more complicated. Because you also have to understand that while there is some data that is very specific to us as individuals, a lot of data is relational. So simply saying that each of us as individuals can own data and then sell it on an open market is not necessarily a solution either.

Because that tends to reinforce any type of social inequities that exist in society, in that as a Black woman, I know my data is not going to be worth the same as a white man’s. And what does that mean when there are different values to data and the primary means of protecting it is selling it?

CATHY O’NEIL: On Rashida’s point about how much my data is worth, one of the things I discovered in my research on for-profit colleges is that they often targeted single Black mothers. Those clicks were worth a lot of money. I’m not saying that white men’s clicks are not worth money. I’m just saying that you’d be surprised.

I think the real issue is that the bargaining power isn’t there, right? Most of the people in society have neither the time nor the understanding of what their data is worth to actually make the negotiation work in their favor.

It points to a larger problem, of course, which is that, going back to what you imagined at the beginning, Ira, of what the internet was going to be: on the internet, we are not citizens. We are consumers. So it’s all about money all the time. If we go in there thinking we’re in a town square being able to have a conversation, we’re wrong. We’re in a rented space. And we are paying that rent with our data.

IRA FLATOW: And it’s important that we understand that, Cathy, right?

CATHY O’NEIL: Yeah. We have to understand that. Because one of the trickiest things is, how do we change that? What’s the new vision, where it’s more like a town square and we’re not being constantly measured and sold?

IRA FLATOW: Does that mean we’re talking about government regulation now? Is that one of the pathways we might head down, Cathy or Rashida?

RASHIDA RICHARDSON: I think we need multifaceted approaches. Because yes, government regulation is one part of this conversation in that you can regulate the tech sector or even enforce antitrust regulations to have specific outcomes. But I think the reason why there is not a simple silver bullet solution to all of this is because some of this comes down to societal values. So if we only believe in rugged individualism and free markets as means of addressing everything in society, then that shuts out a lot of marginalized communities and individuals or allows for more predatory practices and situations to emerge for certain groups.

IRA FLATOW: What about cities and states, then, that have talked about banning facial recognition or banning certain kinds of algorithms, Rashida?

RASHIDA RICHARDSON: I think those are necessary steps in that there are certain technologies, like facial recognition and some forms of predictive analytics, that have only demonstrated harm in society. But I don’t think the whack-a-mole approach of banning or putting moratoria on the most egregious examples of bad technology is necessarily our way out of this. Because some of this stems from structural inequalities in society. And not all of these problems are specific to just technology, but they amplify and compound a lot of the problems that have preexisted in society.

CATHY O’NEIL: I would even argue that we don’t need new laws so much as we need to enforce existing laws. One of the things that kills me about algorithms is that they are currently bypassing a lot of really important anti-discrimination laws in the regulated sectors of insurance, credit, and hiring simply because the regulators don’t know how to decide whether an algorithm is compliant. And by the way, the answer is no, it’s probably not compliant.

That’s actually what I do in my day job. I audit algorithms for things like racial bias and gender bias and things like that. And since the data is biased, the algorithms are biased. So all I’m saying is that instead of thinking about, what new laws do we need, I would start with, what about enforcing the existing laws that we have?

RASHIDA RICHARDSON: But I actually want to complicate this a little, because I think there are also problems with how we view some of these problems. A lot of the anti-discrimination laws are based on intentional discrimination or discriminatory intent. And that’s just a problematic framework, in that there’s tons of discrimination that happens in our society on a daily basis where, if you asked the person who is actually discriminating, they would say, that wasn’t intentional, or that wasn’t my intent.

I agree with Cathy in that I do think we have some laws that just lack the enforcement. But I also think some of our legal frameworks really need to be revised not only in light of our big data society, but being realistic about how societal problems like discrimination actually operate in society.

IRA FLATOW: Just a quick reminder. This is Science Friday from WNYC Studios. Cathy, you wrote your book Weapons of Math Destruction in 2016. And this year, Netflix documentaries like The Social Dilemma and the upcoming Coded Bias, they’re trying to act as wake-up calls about the downsides of the digital age. Is there something special about this moment in time that these wake-up calls are getting louder and more prominent?

CATHY O’NEIL: I think the answer to your question, Ira, is that the failures of some of the algorithms are becoming so much more obvious. It’s undeniable. And they’re becoming PR fiascos. So facial recognition is an example where it gets to be pretty clear how it’s being used and how it’s failing, thanks in large part to the work of the Gender Shades study with Joy Buolamwini, Deb Raji, and Timnit Gebru, by the way. I would also caution, though, that there are a lot of algorithms that are not public-facing that are really problematic, and we aren’t hearing about them.

IRA FLATOW: And Rashida, are people paying attention to the right problems in digital surveillance or algorithms? What would you want on lawmakers’ minds as we talk about reforming tech?

RASHIDA RICHARDSON: I think just what Cathy said, to have a more expansive view of what the nature of the problem is. I think a lot of our public discourse is about private sector practices and uses. But a lot of my research, and what I think is the worst stuff, is what’s happening in government and in the public sector. Because often, we see data surveillance and data applications being used to make high-stakes decisions about people that can completely throw off the trajectory of their lives or inhibit any type of opportunities they have access to.

And I think the way we often talk about data, as well, presumes a level of objectivity, or that the data reflects reality in some way, rather than it being very value-laden and subjective. And then that type of subjective framing of data is applied in circumstances where it feels as if fairer, more neutral decisions are being made. And there are no neutral arbiters, whether it’s an algorithm or a judge. And I think we just need to be a little bit more honest about those realities.

IRA FLATOW: We’ve been talking about this for quite some time now, and I would like to thank both of you for taking time to be with us today.

RASHIDA RICHARDSON: Thanks, Ira.

CATHY O’NEIL: Thank you, Ira.

IRA FLATOW: Professor Rashida Richardson, a visiting scholar at both Rutgers Law School and the Rutgers Institute for Information Policy and Law, and Dr. Cathy O’Neil, mathematician, data scientist, and author of the book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. She is CEO of the algorithmic auditing company ORCAA.

Copyright © 2021 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/

Meet the Producers and Host

About Christie Taylor

@ctaylsaurus

Christie Taylor was a producer for Science Friday. Her days involved diligent research, too many phone calls for an introvert, and asking scientists if they have any audio of that narwhal heartbeat.

About Ira Flatow

Ira Flatow is the founder and host of Science Friday. His green thumb has revived many an office plant at death’s door.