Ride-Sharing Data Will Be Available to All. Will Privacy Be Protected?
It’s no secret that ride-sharing company Uber collects large amounts of data on its users and their ride requests. So far, Uber has resisted requests from cities like Seattle and New York City that it share its data with city planners, who want to use it to assess traffic patterns and more. Now, Uber is voluntarily releasing some kinds of ride data to anyone who wants it. Its online tool, Movement, will initially offer data like ride durations only to city planners, but the company says it intends to open the site to the general public in the next few months.
Uber is also a company that has dealt with complaints about its privacy practices over the years, ranging from outright breaches of privacy to an app update that continues tracking users’ locations after their rides conclude.
In this edition of Good Thing/Bad Thing, Marc Rotenberg of the Electronic Privacy Information Center analyzes the pros and cons of sharing users’ data.
Marc Rotenberg is President of the Electronic Privacy Information Center in Washington, D.C.
JOHN DANKOSKY: Well, now it’s time to play Good Thing, Bad Thing.
Because every story has a flip side, ride-sharing company Uber has a gift for us this year. It’s data from more than 2 billion trips taken using the app. It’s called Uber Movement. The tool will be available to city planners now, but maybe to the rest of us a few months from now.
It sounds pretty cool and useful, but not everyone is embracing it. Here to explain the good and the bad of this big data dump is Marc Rotenberg. He’s president and executive director of the Electronic Privacy Information Center in Washington, DC. Marc, welcome back to the show.
MARC ROTENBERG: Thank you, nice to be with you.
JOHN DANKOSKY: So first of all, what kind of ways could Uber’s data be beneficial to cities?
MARC ROTENBERG: Well, we can think about data in aggregate terms, like census data, for example. It can be used to help better plan transportation decision-making. Where to put roads, how to do load leveling, how to ensure that there’s less traffic, for example. Big issue for city management, particularly as we move into an era of smart cities.
But key, of course, to all of this is ensuring that the privacy of individual user data will be protected. So the focus has been on the collection and use of aggregate data, anonymized data. That’s really what the debate is about.
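The aggregation Rotenberg describes can be sketched in code. The snippet below is a minimal illustration, not Uber’s actual pipeline: it assumes a hypothetical trip record of (origin zone, destination zone, duration in minutes), rolls individual trips up into zone-to-zone average durations, and suppresses any cell with too few trips so small groups can’t be singled out.

```python
from collections import defaultdict

def aggregate_durations(trips, min_count=5):
    """Roll individual trips up into (origin_zone, dest_zone) average durations.

    Cells with fewer than `min_count` trips are suppressed, a simple
    guard against exposing rare, potentially identifying trip patterns.
    `trips` is an iterable of (origin, dest, duration_minutes) tuples
    (a hypothetical schema for illustration).
    """
    cells = defaultdict(list)
    for origin, dest, minutes in trips:
        cells[(origin, dest)].append(minutes)
    # Keep only cells large enough to publish; report the mean duration.
    return {
        pair: sum(durations) / len(durations)
        for pair, durations in cells.items()
        if len(durations) >= min_count
    }
```

Suppression alone is a weak guarantee, which is why the rest of the conversation turns to whether aggregate data can be provably deidentified.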
JOHN DANKOSKY: Well, and Uber has resisted giving this data to cities like New York City in the past, talking about user privacy. So why now? Why is all this information coming out now?
MARC ROTENBERG: That’s a very interesting question. Uber, of course, has also been under a lot of pressure from the cities. They’ve come into a lot of cities where there are established taxi services. The incumbents are resisting Uber’s presence. And I think the cities may feel a little bit that if they get some of this user data from Uber for their planning purposes, there’s now a benefit that they didn’t previously have.
And that’s what’s creating the tension in this particular policy proposal. Uber’s being asked to give some information to the cities. If it’s actually personally identifiable, it will be hugely problematic because, of course, most people don’t want the personal data associated with the rides to be turned over to local government. So they have to find a way to do this, if it goes forward, to protect user identity.
JOHN DANKOSKY: And how exactly would they do that? I mean, every single trip is attached to a person. The rider is really the data that they’re tracking. So how do you split those two things apart?
MARC ROTENBERG: That is the hard problem in this whole debate. And there are some very smart people working in the field right now with analytics and deidentification and anonymization, trying to see if it is truly possible to take information that begins as personally identifiable, which most certainly the ride information associated with the Uber service is, and transform it in a way so that even with lots of technology and lots of processing power, you can’t reconstruct the original identity information you might have had.
It turns out, as people such as Professor Latanya Sweeney have demonstrated, that with very little information, it may be quite easy to reconstruct identity. You have other researchers such as Cynthia Dwork who have developed techniques like differential privacy that try to help people assess what the risk of reidentification is. But I think the hard problem here still is to show provably that they will be able to deidentify the user data before they turn it over to the cities.
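The differential-privacy technique Rotenberg attributes to Dwork can be illustrated with its simplest instance, the Laplace mechanism: publish a count plus noise calibrated to how much one person can change that count. This is a minimal sketch under assumed parameters (the trip counts and epsilon value are hypothetical), not a description of anything Uber actually deployed.

```python
import numpy as np

def dp_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a differentially private count via the Laplace mechanism.

    Adding or removing one rider changes the count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon
    yields epsilon-differential privacy for this single query.
    """
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon  # smaller epsilon -> more noise, more privacy
    return true_count + rng.laplace(0.0, scale)

# Hypothetical example: trips between two zones in one hour.
rng = np.random.default_rng(42)
true_trips = 1_238
noisy_trips = dp_count(true_trips, epsilon=0.5, rng=rng)
```

The appeal of this framework, as Rotenberg notes, is that it lets you quantify reidentification risk: epsilon is an explicit, auditable privacy budget rather than a hope that the data is "anonymous enough."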
JOHN DANKOSKY: Could you maybe give us a specific example of what you’d be concerned about? I mean, what would an individual Uber user have to worry about here?
MARC ROTENBERG: Well, I would be concerned if it turned out, in fact, that the data could not be provably deidentified. That would be a threshold problem a little bit like saying that we couldn’t assure you that the drinking water in your town was safe to drink. Now, if you want to ask me the next question, which is, well, what are the actual consequences of having unsafe drinking water, then we could talk about concrete examples.
But you see, from a privacy perspective, once you lose the ability to assure that the data is deidentified, now you have to be considering everything from surveillance, stalking, cyberhacking, credit card theft, identity theft, financial fraud. There’s a long list of potential risks to the users of the Uber service. And that’s why you need to deal with a threshold problem, which is the deidentification issue.
JOHN DANKOSKY: So obviously, that’s something you would need in order to feel good about this. Is there something in this, though, Marc, where maybe by releasing all of this data, Uber is now going to be held to a slightly different account? For a while, cities have been asking for data like this. Now they’re saying you can have it. Are we going to be able to hold the company that gives so many of us rides more accountable for the way they do their work?
MARC ROTENBERG: So that’s an excellent question. And what we’ve said throughout the debate about deidentification is that we don’t think you can leave it to Uber or to the cities, for that matter, to ensure that the data will be properly deidentified to protect against these risks. I think you actually need a third-party independent ombudsman, essentially representing the privacy interests of Uber customers, to be able to determine whether these techniques are working as they’re supposed to work.
An even better approach, by the way, might be simply to have a state law or a federal law which says to Uber that if in fact you do disclose personally identifiable information, there’ll be some liability. And I think that would keep both Uber and the cities operating in a way that’s more aligned with the interests of the Uber customers.
JOHN DANKOSKY: So really, put a law on the books like that as Uber comes into new markets, or as cities and states negotiate with Uber for how it works within their jurisdictions?
MARC ROTENBERG: Absolutely. And returning to the census example, I don’t think it’s the case that aggregate data can’t be useful. Obviously census data is very useful and it should be made available.
But the key to the census, as we know from our own history, is what happens if you do get to the point in time where people start to dig down and say, well, we’ve got this tract data, we’ve got this aggregate data, but we actually want to find out, for example, during the Second World War where the Japanese are living, or we want to learn after 9/11 where the Muslims are. Suddenly there’s a real risk in that aggregate data, and that’s the problem we’re anticipating here, a possible misuse of aggregate data.
JOHN DANKOSKY: And I would say there’s a lot more to talk about that we’ve run out of time for. But I want to thank Marc Rotenberg. He’s president and executive director of the Electronic Privacy Information Center.