From Strava to Facebook to Venmo, You May Be Leaking Data
Last weekend, an Australian researcher pointed out on Twitter that a “heat map” of popular running locations released by the fitness app Strava could be used to help identify the locations of military installations in deserted areas. By examining the map for activity in places where you normally wouldn’t expect to find a group of younger, fitness-minded western individuals working out, analysts were able to highlight a number of likely military sites, including some that had not been previously disclosed.
Strava released their global heatmap. 13 trillion GPS points from their users (turning off data sharing is an option). https://t.co/hA6jcxfBQI … It looks very pretty, but not amazing for Op-Sec. US Bases are clearly identifiable and mappable pic.twitter.com/rBgGnOzasq
— Nathan Ruser (@Nrg8000) January 27, 2018
It’s not just Strava, though. Even if you think you’re being careful about what you reveal online, the apps and services you use may be exposing bits of data about your habits. People have given away their location by posting a geo-tagged image on social media. Snapchat’s Snap Map feature can reveal the locations of your friends. A glance at your Venmo feed can give outsiders information about your personal life. Even the patterns of which posts you like and interact with on Facebook can be used to help draw inferences about your private world.
Techno-sociologist Zeynep Tufekci and open-source intelligence expert Gavin Sheridan join Ira to talk about what information we’re all leaking, and what, if anything, can be done about protecting our privacy in an online, socially-networked world.
Zeynep Tufekci is an associate professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill.
Gavin Sheridan is co-founder of Vizlegal and an open-source intelligence specialist based in Dublin, Ireland.
IRA FLATOW: This is Science Friday. I’m Ira Flatow.
Last weekend an Australian researcher pointed out on Twitter that a heat map of popular running locations released by the fitness app Strava could reveal the locations of military installations in deserted areas around the world.
And if you looked at the map for runners’ activity where you normally wouldn’t expect to find young, fitness-minded Westerners working out, you could uncover a number of likely military sites, including some that had not been previously disclosed.
It’s not just Strava though, even if you think you’re being careful about what you reveal online, the apps and the services you use may be exposing bits of data about your habits.
A couple of examples. People have been giving away their location by posting a geotagged image on a Craigslist ad. Snapchat’s Snap Map feature can reveal the location of your friends. A glance at your Venmo feed can give outsiders information about your personal life. Even the patterns of which posts you like and interact with on Facebook can be used to help draw inferences about your private world.
So what’s going on here? Is there anything we can do to help keep our lives more private? Joining me now is Zeynep Tufekci. She’s an associate professor in the School of Information and Library Science at the University of North Carolina in Chapel Hill and contributing opinion writer at the New York Times who wrote a really interesting piece recently about the Strava case. Welcome to Science Friday.
ZEYNEP TUFEKCI: Thank you for inviting me.
IRA FLATOW: You’re welcome. Also Gavin Sheridan is an open source intelligence specialist, and also co-founder of Vizlegal. He joins me via Skype from Dublin, Ireland. Welcome to Science Friday.
GAVIN SHERIDAN: Thank you for having me.
IRA FLATOW: Gavin, just briefly, what happened in this Strava case?
GAVIN SHERIDAN: So Strava published a map showing the activity of people who contribute data to Strava’s platform, which allows you to see, all over the world, where people were running or cycling and sharing their data with Strava. And you could zoom in on any part of the world to see where people are running and what kind of routes they take. It gives you this amazing kind of visualization, I guess, of people’s activity.
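The aggregation Gavin describes — turning many users’ GPS points into hot spots on a map — can be illustrated in a few lines. This is a toy sketch, not Strava’s actual pipeline; the grid size and route coordinates are invented:

```python
from collections import Counter

def heatmap_bins(points, cell_deg=0.001):
    # Map each (lat, lon) point to an integer grid-cell index and count
    # points per cell; a 0.001-degree cell is roughly 100 m at the equator.
    counts = Counter()
    for lat, lon in points:
        counts[(int(lat // cell_deg), int(lon // cell_deg))] += 1
    return counts

# A popular running route shows up as a short run of hot cells,
# even with no names or user IDs attached to the points.
route = [(35.0001 + i * 0.0001, 45.0002) for i in range(30)]
hot = heatmap_bins(route)
```

The point of the toy is the one Gavin makes: the map carries no identities, yet a concentration of activity in an unexpected place is itself information.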
IRA FLATOW: So this was really not a hack. This was not someone releasing data everyone thought was private. But people using Strava knew they were giving out this information.
GAVIN SHERIDAN: Precisely. Yeah, it actually comes from Strava Labs. So they collect all this data from all their users, and they had this idea to create a map to allow anybody to come in and see where people are running and what kind of routes people take.
And you know, it’s very interesting to look at, because you can see where people are cycling, where people are running, and where people are swimming. And you can kind of get an idea of the kinds of activities people are doing and where they do them.
IRA FLATOW: But, of course, when they turned it on, they didn’t realize that the rest of the world would be watching them also.
GAVIN SHERIDAN: Well, to some extent, yeah. I mean, they put it out there. And I guess the first thing people do in this situation is zoom in on places that they’re familiar with. So people will usually, whether it’s a Google satellite image or a Strava map, zoom in on places they’re familiar with. They look at where they live. They look at places that they’ve run themselves, and they’ll see what other people are doing. But the problem with that is essentially that they published it for the whole world, so it means that we can see where everybody is running.
IRA FLATOW: Zeynep, so who’s to blame here? Is it the user? Is it the company? I mean, is it the military people, the officers who were not telling their soldiers to turn this stuff off when they’re running? I mean–
ZEYNEP TUFEKCI: I think this is a really interesting case that shows we’re all to blame and nobody is to blame. Because it’s very hard to predict what any piece of data will reveal, not on its own, but when it’s joined with all sorts of other data. So it might make perfect sense for somebody, say, who’s running in Seattle to let this be public, and they look at the heat map and discover new routes. There might be a perfectly reasonable use case for it. But obviously, you zoom in on Yemen, and there’s this rectangle that’s clearly the perimeter of something. And it’s probably not somebody from Yemen with an iPhone running. It looks like a military base. So these kinds of inadvertent revelations come from the fact that the data doesn’t live by itself but gets joined with millions and millions of other data points. This has 13 trillion GPS points. And it shows more than what anybody bargained for.
IRA FLATOW: Because you don’t know. You write in your story, you don’t know what you’re bargaining for. And in fact, you say, the company doesn’t know what it’s bargaining for.
ZEYNEP TUFEKCI: Absolutely not. In fact I don’t think Strava went out to say, let’s expose the alleged CIA annex at Mogadishu airport. Right? So this is a problem. Our privacy protections depend on this alleged informed consent. But the companies are in no position to inform us because they don’t know what the data is going to be doing out in the world. So we’re not in any position to consent to what we cannot comprehend is going to happen. It’s really a difficult moment.
IRA FLATOW: Yeah. You really don’t know what you’re consenting to when you press the EULA consent button.
ZEYNEP TUFEKCI: Also, a lot of the companies’ business models depend on collecting all our data. So even if we assume they’re trying to inform us, they cannot. And very often it’s just legalese, pages and pages of stuff. We just click “I agree,” and it’s not really meaningful.
IRA FLATOW: Gavin, let’s talk about some examples of how people can inadvertently reveal information. You have an example with people taking a selfie on the first day of their new job. Tell us what’s wrong with that.
GAVIN SHERIDAN: So what’s really interesting is the platforms that people use. Everybody who’s listening would be familiar with posting a photograph to Instagram or to Facebook or other social platforms, not just applications like Strava. What’s interesting is that those platforms usually have what’s called an API, an application programming interface. What that allows people like me, or people who are involved in software, to do is query those APIs not by the person you’re interested in, but by the location you’re interested in.

So, for example, if I’m interested in the Pentagon as a location, I could draw a circle around the Pentagon and then query social APIs like Instagram’s or Facebook’s or Twitter’s. And I could see who is sharing location information from that specific location. Say, within 100 meters of the center of the Pentagon, who is sharing information? Then I can aggregate that information. I can draw a picture of all the people who have ever geotagged a picture or video or tweet from that location over time.

And anybody who uses Instagram will often say, well, OK, I’m sharing my information. I’m taking a photograph, and I’ll geotag it to say, hey, I’m at whatever location. Often that’s some kind of social proof: I’m on vacation, here’s an amazing location. What’s also interesting, though, is that when you’re looking at it at the data level, I can query all those APIs at the same time and extract the activities of all the different people who visit that location. And that gives me a certain level of understanding. Somebody on their first day at work at a company like Google or Apple might take a photograph of their ID badge and say, hey, it’s my first day, and geotag it to that company or the location that they’re at.
The problem with that, to some extent, is that somebody could be watching. In theory, and in practice, people are. They can see all the people who take photographs at that location, and then they can understand, to some extent, who might be being hired by the company, because it’s their first day. And you can infer things from that. What area does that person work in, based on their data and profile? Where in the world are they from? Is it somebody that this company has brought in from outside the US, or somebody from inside the US? And does that give me some intelligence about what the behavior of that company is at the moment?
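The location-radius query Gavin describes — draw a circle around a point and keep only the geotagged posts inside it — boils down to a distance test against each post. A minimal sketch; the posts are invented, and real platform APIs would apply the radius filter server-side:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two lat/lon points.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * asin(sqrt(a))

def posts_near(posts, lat, lon, radius_m=100):
    # Keep only geotagged posts within radius_m of the target point.
    return [p for p in posts
            if haversine_m(p["lat"], p["lon"], lat, lon) <= radius_m]

# Hypothetical geotagged posts near a target point (~38.8719 N, 77.0563 W).
posts = [
    {"user": "a", "lat": 38.8719, "lon": -77.0563},  # at the point
    {"user": "b", "lat": 38.8720, "lon": -77.0565},  # ~20 m away
    {"user": "c", "lat": 38.8900, "lon": -77.0563},  # ~2 km away
]
nearby = posts_near(posts, 38.8719, -77.0563)
```

Aggregating such queries over time is what turns individually harmless posts into a picture of who frequents a location.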
IRA FLATOW: Wow. Does that bother you, too?
ZEYNEP TUFEKCI: Well, not only that. Of course that bothers me. And it’s very hard to predict the future uses. This doesn’t even bring in what artificial intelligence can infer. So, for example, when you post pictures on Instagram, you’re probably not thinking that a machine learning algorithm could predict the onset of your depression before clinical symptoms. But it can.

So when you’re posting online, there are all sorts of ways that artificial intelligence can now statistically infer things about you. Just looking at your Facebook likes, we have published research showing that your likes alone can statistically, fairly reliably, reveal your race, your gender, your sexual orientation, your substance abuse potential, whether you’re depressed, your personality traits. Even if you did not disclose them. It’s not that you joined some depression support group; it’s just looking at the pattern and inferring private things about you that you didn’t realize you were disclosing.

So when you put your data out in the world, it’s not just the data you’re putting out. You’re letting machine intelligence, these algorithms, kind of churn through it and figure things out about you. It’s being done.
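The kind of inference Zeynep describes — predicting private traits from like patterns alone — can be illustrated with a toy model. This is not the method of the published research she cites, just a minimal sketch of the underlying idea: pages liked disproportionately by one group become statistical signals for membership in that group. The pages and labels are invented:

```python
from collections import Counter

def train_like_model(users):
    # users: list of (set_of_liked_pages, label). For each page, score
    # how much more often it is liked by label-1 users than label-0
    # users (add-one smoothed ratio of like rates).
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for likes, label in users:
        if label:
            pos.update(likes); n_pos += 1
        else:
            neg.update(likes); n_neg += 1
    pages = set(pos) | set(neg)
    return {p: ((pos[p] + 1) / (n_pos + 2)) /
               ((neg[p] + 1) / (n_neg + 2)) for p in pages}

def predict(model, likes):
    # Multiply the scores of the pages a user likes; > 1.0 means the
    # like pattern leans toward the label-1 group.
    score = 1.0
    for p in likes:
        score *= model.get(p, 1.0)
    return score > 1.0

# Invented training data: liked pages labeled by some private trait.
training = [
    ({"pageA", "pageB"}, 1),
    ({"pageA"}, 1),
    ({"pageC"}, 0),
    ({"pageC", "pageB"}, 0),
]
model = train_like_model(training)
```

Note that no user ever stated the trait; the model only sees which pages they liked.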
IRA FLATOW: And to what purpose? Why is it being done?
ZEYNEP TUFEKCI: Well, at the moment it is mostly being done to sell us stuff and to make us click on things. But in countries like China, and maybe, at some point, plausibly here too, it will be done for social control. Right? Who’s a dissident? If you’re hiring, who might be prone to unionizing, who might be prone to get pregnant in the next few years, who might be somebody you think is going to cause your company medical expenses? There are all these ways in which it can be used for authoritarian purposes, social control purposes, corporate purposes.

And the scary thing for me is that even if the consent form says you’re revealing to the public, it may be used to, say, look at your picture, which you already know, and you’re like, OK, you can look at my picture. What you can’t predict is what that picture, brought together with other data about you, will reveal. It’s moving very fast. Every week there are new papers and new things coming out. And I think this is a moment in which there has to be a really good reason for data to be stored at all, rather than storing it just because, since we don’t have a handle on what’s going on.
IRA FLATOW: Gavin do you agree? I mean, this is–
GAVIN SHERIDAN: I totally agree. I think Europe has, over here in Europe, we’re kind of grappling with this issue quite significantly because a new law will come into effect later this year called the General Data Protection Regulation. And it deals with these issues of privacy in a kind of a fundamental way about what you can and can’t do, and about the consent of people and how they consent to their data being used.
And to build on Zeynep’s point about what’s possible through how people publicly share: what people don’t imagine, I think, is that when you have an Instagram account, and you follow a few hundred people and you have a few hundred followers and your account is public, and maybe you’ve posted a few hundred pictures over the past year, what machines can do with that today is extremely significant.
I can take all your photographs and apply a machine learning algorithm or a machine vision algorithm to it. And I can classify your photographs based on what you took photographs of.
So, you know, I’m here in Dublin, and I could say, give me all the photographs of people who’ve taken pictures of pints of Guinness over the past 12 months at certain locations. And then I could pull my way through all the photographs that person has taken over time, wherever they’ve geotagged information. I could look through who they follow and who follows them, look at the interconnectedness of their social graph, and figure out something about the person: who they are, what they’re interested in, how old they are. Without that person necessarily realizing that this information could be derived from what they’ve shared. They don’t necessarily understand how what they think is relatively inane or innocuous information can be, kind of, extrapolated. I think it’s a serious concern that people are not really aware of this.
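The social-graph step Gavin mentions — measuring how interconnected two accounts are — can be as simple as comparing their follow sets. A toy sketch with invented accounts, using Jaccard similarity as the overlap measure:

```python
def mutual_overlap(follows, user_a, user_b):
    # Jaccard similarity of two users' follow sets: the share of accounts
    # they follow in common, a crude signal of how interconnected the two
    # accounts are in the social graph.
    a, b = follows[user_a], follows[user_b]
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented accounts and follow lists.
follows = {
    "alice": {"club_x", "cafe_y", "bob"},
    "bob":   {"club_x", "cafe_y", "alice"},
    "carol": {"news_z"},
}
```

High overlap between accounts that also share geotags is the kind of pattern that lets an analyst cluster strangers into social circles.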
IRA FLATOW: But what I hear you and Zeynep saying is that this information can be weaponized.
ZEYNEP TUFEKCI: Absolutely. And your mental model of what you’re doing is you’re just sharing on Instagram with a couple of hundred people. And that’s your mental model. But that’s not the reality.
That information is being collected for a reason. At the moment it’s mostly to profile you for corporate things. But history tells us this is such a tempting political target to profile people. And it’s already being used to sort of decide who to hire and decide what to do.
It gets scarier. These algorithms that do all this discernment are developing so fast that we don’t even really understand how they’re doing it. The way they work, they sort of classify things. Let’s say they look at people a company is about to hire and say, oh, these people are prone to depression and these people are not. But we don’t really know what piece of data came together with what other piece of data.
So it’s not like you can say, OK now let’s go pull out, say, the Instagram color profile and then you won’t be able to do it, because we don’t understand how the classification is being done. Not only can we not foresee future uses, we can’t even decide which piece of data to hold back that’s going to provide the critical threshold for classification because the machines are doing it on their own without our understanding of what exactly they’re doing. It’s really potent and powerful.
IRA FLATOW: I hear you. This is Science Friday, from PRI, Public Radio International, talking about your privacy with Gavin Sheridan and Zeynep Tufekci. This is scary stuff.
ZEYNEP TUFEKCI: It is.
IRA FLATOW: I mean, and now we have facial recognition everywhere.
ZEYNEP TUFEKCI: Absolutely.
IRA FLATOW: Right. From the photos that you post, from being on the street, your picture is taken 100 times a day. So you put all this stuff together. Do you just throw up your hands and say privacy is over?
ZEYNEP TUFEKCI: I don’t think so. For one thing, I want to say, even if you cover your face, these machine learning algorithms can recognize your gait. How you walk. So they’re really powerful.
But there was a time in which we had lead in paint. Right? And we didn’t have seat belts in cars. And it might have seemed crazy to say that we would not have lead in paint, and we wouldn’t use asbestos, and all of those things. So I actually am quite hopeful, because we’ve just begun.
And there are technical ways in which we can have all the conveniences and the nice things that all of this data gives, because obviously it’s not all downside. If you look at Strava’s heat map and you’re a runner, you can find new paths. So there are all these positive things. There are technical ways in which we could collect some data, encrypt it in particular ways, and bring it together in very particular ways so that we get the conveniences and the power without this kind of individual identification.
The problem is, at the moment Silicon Valley is basically minting money with the current collect everything and do whatever you want model. So they’re not really incentivized to provide us with these services without the surveillance.
IRA FLATOW: But doesn’t the Facebook problem that just occurred give them a little bit of a hint that there’s a problem with–
ZEYNEP TUFEKCI: I think there is a problem. But when you’re a half-a-trillion-dollar company with your stock going up all the time, the hint isn’t overriding the incentive. So the reason I’m hopeful is that, if we change the incentive structure, I think we have the resources and the technical means to keep a huge amount of the benefits and just not do things this way.
IRA FLATOW: Gavin do you agree? Is there hope for us?
GAVIN SHERIDAN: I think there is. I think there’s also an awful lot of interesting technology coming that will cause more problems. For example, the splicing of photographs on top of videos. Say you’ve posted a photograph of yourself on Instagram. That picture is then put on a video to make it look like you’re in a video that you’re not actually in. And it looks very realistic. It looks like it’s you in the video, but it’s not. And that kind of technique is already being used online.
But I think, overall, there are a couple of questions here. One is a question for platforms like Facebook and Google and others: how transparent and open are they being about the algorithms that they’re using and how those algorithms are being applied? I think that’s a really important question for anybody who is using social media at all.
And why are they seeing the things that they see in their feed? Why are they getting suggested things? Why are ads being targeted the way they are at that person?
IRA FLATOW: So you’re saying we should be asking these questions.
GAVIN SHERIDAN: I think it’s going to be a combination of things in the future. I think it’s like we’re in year zero of social media. Where we think it’s been around for a long time. But actually we’re at the very start.
I think one thing is how we interrogate the platforms that we’re using, to oblige them, perhaps, to tell us what they know about us. That’s one thing the GDPR in Europe is going to cause some problems for the social platforms over. But also whether, to some extent, regulation is inevitable about what these social platforms are allowed to do with our data. And I think that’s the question for the next five years: how will the platforms be proactive about telling us what they’re doing in real time, not just retrospectively? But also, how will legislatures deal with this?
IRA FLATOW: All right. We’ve run out of time. Gavin Sheridan, co-founder of Vizlegal in Dublin, and Zeynep Tufekci, associate professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill. Thank you both for taking time to be with us today.
ZEYNEP TUFEKCI: Thank you for inviting us.
GAVIN SHERIDAN: Thank you for having me.
IRA FLATOW: You’re welcome. We’re going to take a break. We’ll be right back after this short break.