Ride-Sharing Data Will Be Available to All. Will Privacy Be Protected?
It’s no secret that ride-sharing company Uber collects large amounts of data on its users and their ride requests. So far, Uber has resisted requests from cities like Seattle and New York City that it share its data with city planners, who want to use it to assess traffic patterns and more. Now, Uber is voluntarily releasing some kinds of ride data to anyone who wants it. Its online tool, Movement, will initially offer data like ride durations only to city planners, but the company says it intends to open the site to the general public in the next few months.
Uber is also a company that has dealt with complaints about its privacy practices over the years, ranging from outright breaches of privacy to an app update that continues tracking users’ locations after their rides conclude.
In this edition of Good Thing/Bad Thing, Marc Rotenberg of the Electronic Privacy Information Center analyzes the pros and cons of sharing users’ data.
Marc Rotenberg is President of the Electronic Privacy Information Center in Washington, D.C.
JOHN DANKOSKY: Well, now it’s time to play Good Thing, Bad Thing.
Because every story has a flip side, ride-sharing company Uber has a gift for us this year. It’s data from more than 2 billion trips taken using the app. It’s called Uber Movement. The tool will be available to city planners now, but maybe to the rest of us a few months from now.
It sounds pretty cool and useful, but not everyone is embracing it. Here to explain the good and the bad of this big data dump is Marc Rotenberg. He’s president and executive director of the Electronic Privacy Information Center in Washington, DC. Marc, welcome back to the show.
MARC ROTENBERG: Thank you, nice to be with you.
JOHN DANKOSKY: So first of all, what kind of ways could Uber’s data be beneficial to cities?
MARC ROTENBERG: Well, we can think about data in aggregate terms, like census data, for example. It can be used to help better plan transportation decision-making. Where to put roads, how to do load leveling, how to ensure that there’s less traffic, for example. Big issue for city management, particularly as we move into an era of smart cities.
But key, of course, to all of this is ensuring that the privacy of individual user data will be protected. So the focus has been on the collection and use of aggregate data, anonymized data. That’s really what the debate is about.
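The aggregation Rotenberg describes can be sketched in code. The snippet below is a minimal illustration, not Uber’s actual pipeline: it assumes a hypothetical trip record of (origin zone, destination zone, duration in minutes), rolls individual trips up into zone-to-zone average durations, and suppresses any cell with too few trips so small groups can’t be singled out.

```python
from collections import defaultdict

def aggregate_durations(trips, min_count=5):
    """Roll individual trips up into (origin_zone, dest_zone) average durations.

    Cells with fewer than `min_count` trips are suppressed, a simple
    guard against exposing rare, potentially identifying trip patterns.
    `trips` is an iterable of (origin, dest, duration_minutes) tuples
    (a hypothetical schema for illustration).
    """
    cells = defaultdict(list)
    for origin, dest, minutes in trips:
        cells[(origin, dest)].append(minutes)
    # Keep only cells large enough to publish; report the mean duration.
    return {
        pair: sum(durations) / len(durations)
        for pair, durations in cells.items()
        if len(durations) >= min_count
    }
```

Suppression alone is a weak guarantee, which is why the rest of the conversation turns to whether aggregate data can be provably deidentified.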
JOHN DANKOSKY: Well, and Uber has resisted giving this data to cities like New York City in the past, talking about user privacy. So why now? Why is all this information coming out now?
MARC ROTENBERG: That’s a very interesting question. Uber, of course, has also been under a lot of pressure from the cities. They’ve come into a lot of cities where there are established taxi services. The incumbents are resisting Uber’s presence. And I think the cities may feel a little bit that if they get some of this user data from Uber for their planning purposes, there’s now a benefit that they didn’t previously have.
And that’s what’s creating the tension in this particular policy proposal. Uber’s being asked to give some information to the cities. If it’s actually personally identifiable, it will be hugely problematic because, of course, most people don’t want the personal data associated with the rides to be turned over to local government. So they have to find a way to do this, if it goes forward, to protect user identity.
JOHN DANKOSKY: And how exactly would they do that? I mean, every single trip is attached to a person. The rider is really the data that they’re tracking. So how do you split those two things apart?
MARC ROTENBERG: That is the hard problem in this whole debate. And there are some very smart people working in the field right now with analytics and deidentification and anonymization, trying to see if it is truly possible to take information that begins as personally identifiable, which most certainly the ride information associated with the Uber service is, and transform it in a way so that even with lots of technology and lots of processing power, you can’t reconstruct the original identity information you might have had.
It turns out, as people such as Professor Latanya Sweeney have demonstrated, that with very little information, it may be quite easy to reconstruct identity. You have other researchers such as Cynthia Dwork who have developed techniques like differential privacy that try to help people assess what the risk of reidentification is. But I think the hard problem here still is to show provably that they will be able to deidentify the user data before they turn it over to the cities.
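The differential-privacy technique Rotenberg attributes to Dwork can be illustrated with its simplest instance, the Laplace mechanism: publish a count plus noise calibrated to how much one person can change that count. This is a minimal sketch under assumed parameters (the trip counts and epsilon value are hypothetical), not a description of anything Uber actually deployed.

```python
import numpy as np

def dp_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a differentially private count via the Laplace mechanism.

    Adding or removing one rider changes the count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon
    yields epsilon-differential privacy for this single query.
    """
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon  # smaller epsilon -> more noise, more privacy
    return true_count + rng.laplace(0.0, scale)

# Hypothetical example: trips between two zones in one hour.
rng = np.random.default_rng(42)
true_trips = 1_238
noisy_trips = dp_count(true_trips, epsilon=0.5, rng=rng)
```

The appeal of this framework, as Rotenberg notes, is that it lets you quantify reidentification risk: epsilon is an explicit, auditable privacy budget rather than a hope that the data is "anonymous enough."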
JOHN DANKOSKY: Could you maybe give us a specific example of what you’d be concerned about? I mean, what would an individual Uber user have to worry about here?
MARC ROTENBERG: Well, I would be concerned if it turned out, in fact, that the data could not be provably deidentified. That would be a threshold problem a little bit like saying that we couldn’t assure you that the drinking water in your town was safe to drink. Now, if you want to ask me the next question, which is, well, what are the actual consequences of having unsafe drinking water, then we could talk about concrete examples.
But you see, from a privacy perspective, once you lose the ability to assure that the data is deidentified, now you have to be considering everything from surveillance, stalking, cyberhacking, credit card theft, identity theft, financial fraud. There’s a long list of potential risks to the users of the Uber service. And that’s why you need to deal with a threshold problem, which is the deidentification issue.
JOHN DANKOSKY: So obviously, that’s something you would need in order to feel good about this. Is there something in this, though, Marc, where maybe by releasing all of this data, Uber is now going to be held to a slightly different account? For a while, cities have been asking for data like this. Now they’re saying you can have it. Are we going to be able to hold the company that gives so many of us rides more accountable for the way they do their work?
MARC ROTENBERG: So that’s an excellent question. And what we’ve said throughout the debate about deidentification is that we don’t think you can leave it to Uber or to the cities, for that matter, to ensure that the data will be properly deidentified to protect against these risks. I think you actually need a third-party independent ombudsman, essentially representing the privacy interests of Uber customers, to be able to determine whether these techniques are working as they’re supposed to work.
An even better approach, by the way, might be simply to have a state law or a federal law which says to Uber that if in fact you do disclose personally identifiable information, there’ll be some liability. And I think that would keep both Uber and the cities operating in a way that’s more aligned with the interests of the Uber customers.
JOHN DANKOSKY: So really, put a law on the books like that as Uber comes into new markets, or as cities and states negotiate with Uber for how it works within their jurisdictions?
MARC ROTENBERG: Absolutely. And returning to the census example, I don’t think it’s the case that aggregate data can’t be useful. Obviously census data is very useful and it should be made available.
But the key to the census, as we know from our own history, is what happens if you do get to the point in time where people start to dig down and say, well, we’ve got this tract data, we’ve got this aggregate data, but we actually want to find out, for example, during the Second World War where the Japanese are living, or we want to learn after 9/11 where the Muslims are. Suddenly there’s a real risk in that aggregate data, and that’s the problem we’re anticipating here, a possible misuse of aggregate data.
JOHN DANKOSKY: And I would say there’s a lot more to talk about that we’ve run out of time for. But I want to thank Marc Rotenberg. He’s president and executive director of the Electronic Privacy Information Center.