What Would You Pay For Faster, Smarter Government Data?
Whether we’re aware of it or not, “the cloud” has changed our lives forever. It’s where we watch movies, share documents, and store passwords. It’s quick, efficient, and we wouldn’t be able to live our fast-paced, internet-connected lives without it.
Now, federal agencies are storing much of their data in the cloud. For example, NASA is trying to make 20 petabytes of data available to the public for free. But to do that, they need some help from a commercial cloud provider, a company like Amazon or Microsoft or Google. But will the government’s policy of open data clash with the business model of Silicon Valley? Mariel Borowitz, Assistant Professor at Georgia Tech, and Katya Abazajian, Open Cities Director with the Sunlight Foundation, join guest host John Dankosky to discuss the trade-offs to faster, smarter government data in the cloud.
Mariel Borowitz is an assistant professor at the Sam Nunn School of International Affairs of Georgia Institute of Technology in Atlanta, Georgia.
Katya Abazajian is the Open Cities Director for The Sunlight Foundation, in Washington, D.C.
JOHN DANKOSKY: This is Science Friday. I’m John Dankosky. Whether we’re aware of it or not, the cloud has changed our lives forever. It’s where we watch movies, share documents, and store passwords.
It’s quick, it’s efficient, and we wouldn’t be able to live our fast-paced, internet-connected lives without it. And now federal agencies are finally catching up with the 21st century and storing much of their data in the cloud too.
For example, NASA is trying to make 20 petabytes of data available to the public for free, but to do that, they need some help from a commercial cloud provider, a company like Amazon or Microsoft or Google. But will the government’s open data policy clash with the business model of Silicon Valley, and what are the trade-offs to faster, smarter, easier access to government data in the cloud?
Joining me now to talk about this is Mariel Borowitz, assistant professor at the Sam Nunn School of International Affairs at Georgia Tech. Mariel, welcome to Science Friday.
MARIEL BOROWITZ: Hi. Thank you for having me.
JOHN DANKOSKY: So let’s start off the bat by just getting real clear here. We’re not talking about data that needs to be kept secret or private, anything that’s classified. So what kind of data are we talking about, and who’s using it?
MARIEL BOROWITZ: Sure. So there’s all sorts of different types of government data, but a lot of the agencies that are running into this issue first are the science agencies. So groups like NASA that have lots of satellite data, Earth observation data. NOAA collects all sorts of different weather data from satellites and other sources. NIH is collecting a lot of genetic data and other health-related data. So lots of different sources of data– but especially in these science agencies.
JOHN DANKOSKY: And stuff that they want people to be able to use that’s open for people to use in research of their own.
MARIEL BOROWITZ: Exactly. So there are many, many users of this data already. With NASA Earth Science, they had over 4 million users last year accessing their data.
JOHN DANKOSKY: So what’s prompted these federal agencies to start using the cloud for data storage?
MARIEL BOROWITZ: It’s really out of necessity. So the amount of data that they’re collecting had just gotten to be so large that they can’t make it all available with a traditional– just put it on a server and put it up on a website portal. So NOAA, for example, right now through their online portals, you can only get at about 10% of their data. And they want to make all of the data available so you really have to move to the cloud to do that.
JOHN DANKOSKY: Tell us about some of the benefits to the user for having this data in the cloud. What would be different from the way that maybe they stored it in the past?
MARIEL BOROWITZ: Sure. So there’s a couple of benefits. One, if you– just to actually store this much data, to have accessibility to it and make sure you can actually get to all of it and download whichever part that you want, say, you want just some element of it, that’s one piece of it. So like I mentioned, with the current system, NOAA can only get 10% of its data out there. There’s just too much data to make it all available on the web portal. So one is just access.
And then the second piece is analysis. So if you do want to analyze a large part of this data set, you can’t just download that onto your own laptop computer and run that analysis. You really need to do that type of analysis in a cloud environment.
JOHN DANKOSKY: So tell us more about the government’s open data policy. And I don’t know, would it be breaking any laws if it didn’t make this data free and open to people? If, for instance, they didn’t have a cloud solution to get it all up there, I mean, what are the consequences?
MARIEL BOROWITZ: Sure. So in the US, the Obama administration had an open government directive in 2009. That’s what started the data.gov movement and an idea of getting all this data up online and available to people. And that’s really spread globally, so there’s already more than 70 countries that have similar initiatives.
And so the agencies, even without the cloud computing capability, they’re still abiding by open data. They’re making their data freely available. They’re not trying to charge for it or make revenue, which is something that had been done in the past. So it will be open, and that’s actually one of the tricky things about this move to the cloud environment. It actually doesn’t go against the technical rules of being open, because the agency itself is not charging for the actual data.
What you would be paying potentially is you’d be paying Amazon, for example, the cost of downloading the data or the cost of using their Cloud Analysis product. But in reality, from the user perspective, what you get is a situation where the data you used to work with for free, now you have to pay some type of fee to access that.
JOHN DANKOSKY: So that’s one of the models. Maybe you could explain this, because that’s not the only model being explored. There’s one that’s kind of a fee for service. You’d pay a little bit of something to the cloud computing system, and there’s another model in which it would just be free and open to people to use anywhere. Explain these different models, if you would.
MARIEL BOROWITZ: Right. Absolutely. So the agencies have control over how they set this up, depending on what their budget is and what their capabilities are. So with the pilot program that NOAA is doing right now, for example, it’s set up the way we just described, where the NOAA data is free from NOAA’s perspective, but through the commercial providers they’re working with, you would actually pay the commercial providers if you wanted to download the data or analyze it in the cloud.
NASA, on the other hand, in their initial program on the cloud, NASA actually covers the costs of the user downloading the data or the user doing some of the Cloud Analysis. So it’s only up to a certain point. They can’t cover endless amounts of what people want to do, but for the most part, for most users, that’s a free– [INAUDIBLE] just be completely free.
JOHN DANKOSKY: Do you see that there’s a possible impact from how many people are going to try to use the data and whether or not it costs anything? I mean, is there some sort of a limitation there if there’s any cost applied?
MARIEL BOROWITZ: Yes, there is. So if you look historically, when we had costs imposed on data, it really did significantly decrease the amount that people accessed and used that data. So one of the examples I like to point to, the US has this Landsat satellite system that collects just remote sensing imagery– so imagery of the Earth all around the globe. It was made freely available online in 2008.
Before that happened, the largest number of images they ever sold was about 25,000. Within a couple of years, after making the data freely available online, they were distributing about 250,000 images a month.
JOHN DANKOSKY: So that’s a big difference.
MARIEL BOROWITZ: Yes. Exactly. Yeah. It certainly makes a difference.
JOHN DANKOSKY: One of the questions we have is, why don’t these agencies just develop their own cloud systems? It seems as though they’re smart enough. They certainly have the technical ability. We wanted to check in with someone from NASA about this. Kevin Murphy is program executive for Earth Science Data Systems at NASA’s Goddard Space Flight Center, and here’s what he told us.
KEVIN MURPHY: We have about 3 million users a year that use our products. And building out systems which are capable and having the right security policies enabled for people to openly access government products is very difficult. Utilizing these commercial environments allows us to have a security enclave within them, which is accessible by anyone, but managed by NASA.
So one, it’s costly. We don’t have the same size or efficiency as these commercial cloud providers. The second thing is that by moving to these commercial cloud providers, the data becomes more accessible by people who don’t have government credentials.
JOHN DANKOSKY: And again, that’s Kevin Murphy from NASA. Mariel, maybe you could just respond to that. It sounds as though he lays out a pretty convincing argument for why this might work for NASA.
MARIEL BOROWITZ: Right, and I think he touched on a lot of the important points here. So there certainly is a debate going back and forth in agencies around the world about whether they should build their own cloud system or go with these commercial options.
And as Kevin mentioned, some of the benefits of going commercial are the system’s already out there, so you can just go ahead and start using it right away, which is a benefit in terms of time. But then also, those companies have huge workforces, big physical infrastructure that’s much larger than anything one agency is ever going to be able to replicate. So they can really take advantage of all of that and make sure they’re staying on the cutting edge of that technology.
JOHN DANKOSKY: The cutting edge and really provide that type of access that people expect now from cloud computing systems, which are just everywhere. It’s what we’re used to dealing with in our personal lives.
MARIEL BOROWITZ: Exactly.
JOHN DANKOSKY: I want to bring in another guest. Katya Abazajian is Open Cities director with the open data nonprofit the Sunlight Foundation, based in Washington, DC. Katya, welcome to Science Friday. Thanks for being here.
KATYA ABAZAJIAN: Hi. Thank you for having me.
JOHN DANKOSKY: So you help cities and local governments develop their own open data policies. I guess I’m wondering what your take is on these sorts of plans, hiring big third party groups to store this really important data.
KATYA ABAZAJIAN: Yeah. So the fundamental belief behind open data policies is that the public has a right to public information and that ultimately, citizens are the owners of public data. And so it becomes tricky when you start working with either partnering or contracting with commercial data providers because essentially, what you’re doing is giving a private entity control over a public resource. And as we know of Silicon Valley business models, data is an extremely profitable resource for them. And so there is a lot that goes into crafting those agreements, and I think we advocate for agreements that remain transparent and accessible. And that’s a huge priority for government staff as they’re making these decisions about where to put their data.
JOHN DANKOSKY: I mean, we could probably imagine some ourselves, but maybe you could walk me through one or two of the potential pitfalls here. What are the things that you worry that Silicon Valley might do if they control and house all this data?
KATYA ABAZAJIAN: Sure. Well, there are the basics, like the operational issues that come with partnering with a private entity. And one is just that they have control over how the data is presented and provided to the public, and they can choose to charge a fee, of course, in the agreement that they set up with the government. But also, they might provide the data in an [? inaccessible ?] format. They might make it more difficult for users to use it.
So that’s on the very basic operational level, but then, on a more complicated level, private entities have a stake in analyzing that data and generating insights that will then be profitable or interesting to third parties. And so that’s what we’re seeing with a lot of private and sensitive data that gets shared on a personal level. And it’s not always a risk, but it’s something that should be considered as government staff are crafting these agreements is how is the data going to be reused by the partner who is also participating in this relationship.
JOHN DANKOSKY: We actually talked to NASA’s Kevin Murphy about crafting these arrangements and how they approach this. Let’s take a listen.
KEVIN MURPHY: You know, today, even without the use of the cloud, we have to pay for storage, and we maintain the ownership of the data in that storage. So what I’d say is that this isn’t like a brand new relationship that we’re entering into. We’ve been buying storage and we’ve been buying hardware for a long time.
We look at these commercial cloud entities as other vendors of storage and hardware, and that’s who will operate and own everything that it purchases within those environments. So we’re trying to make systems which really don’t show much difference to the user communities that we currently have. Maybe a better functionality, but NASA will continue to operate, manage, and own everything in them.
JOHN DANKOSKY: Katya, what do you say to that? How do you respond?
KATYA ABAZAJIAN: Yeah, that question of ownership is a really central one, and I think it’s really great that NOAA and NASA both have taken that into account in crafting their agreements with their partners. It does become more of an issue when you’re working with regional or local level governments, which I do. And there are a wide variety of agencies that go into partnerships that maybe aren’t as protective of public data ownership, and so that’s an extremely essential point to cover in the agreements.
JOHN DANKOSKY: Mariel, I’m wondering if you can talk about how to make sure to preserve that ownership, because what we do know as consumers, when we’re dealing with big cloud [? entities, ?] it’s really hard to know what we own and what the cloud provider owns. I mean, talk about making sure ownership is clear.
MARIEL BOROWITZ: Right. So I think the role for this is really for the agency, when they’re setting up these agreements, to be very clear in terms of who owns the data, what the commercial entity is allowed to do with the data. That, in the existing agreements, for example, those entities cannot put licenses on the data that would restrict who can use it or anything like that. So attention to all of those details, to the exact licensing, to the price, all of that is going to be very important.
JOHN DANKOSKY: Do you have any concerns that there may be some sort of conflict of interest? I mean, you’ve got companies like Google that provide these services that are also deeply enmeshed within the economy. They’re doing all sorts of their own science. I guess I’ll ask you first Mariel, and maybe you could follow up, Katya, but do you have concerns about, maybe, say, Google being the [? stored ?] of this really important data?
MARIEL BOROWITZ: So I think to the extent that they’re working very closely with agencies and you have a very close relationship in terms of making sure the data is usable and following these kind of plans of the agency, I think it’s all right. And I’ll point out also that these agencies, just like any other user in the United States or in the world, already has access to this data. So it’s open data, not just to researchers or nonprofits, but also to commercial entities. And so there are cases already separate from these agreements with government where groups like Amazon or Google download the government data, just like any other user would, and then make it available on their platforms as a way to kind of bring people in and get them to use their platform.
JOHN DANKOSKY: I’m John Dankosky, and this is Science Friday from WNYC Studios. Katya, I’m wondering could you pick up on that? Do you have any thoughts?
KATYA ABAZAJIAN: Yes. I think that– I agree that a lot of these agreements do already take into account a lot of the nuance in the licensing that’s necessary. And actually where we see this become more of a conflict is at the local level where there isn’t necessarily as much nuance in the oversight of the agreements and how those agreements are crafted.
For example, you have entities like Sidewalk Labs in Toronto and also working in New York that are installing urban tech infrastructure that are collecting data, and they say that they will share open data, but that’s a completely different use case. And so it does become more nuanced depending on which agency you’re talking about. But broadly speaking, I do think that a lot of the details of those agreements at the federal level are fairly good about protecting the ownership and the rights of the public data.
JOHN DANKOSKY: Katya, I’m wondering if we could loop back with you to the question we had earlier about maybe even the small processing fee that you might have to pay. Does that, in your mind, pass muster with the idea of open data, if you have to pay a fee to an Amazon, say, to get any sort of information from NOAA or NASA?
KATYA ABAZAJIAN: I mean, frankly, no, that’s not open data. If you have to pay a fee to access it, then it doesn’t become public data anymore. And that is, like I said earlier, one of the absolutely fundamental elements of open data policy is that it’s accessible to the public and that it’s usable for free.
JOHN DANKOSKY: I’m wondering, Mariel, are any of these agencies working together to figure out how to do this, or is everyone casting off on their own, cutting deals with a Google or an Amazon, or are they working together to try to figure out the best way to solve this big data problem?
MARIEL BOROWITZ: Sure. So it’s a little bit of both. Certainly the agencies have their independent programs. They do have independent negotiations with these companies. It’s not just one government negotiation with Amazon, for example. But they are in communication with each other, so certainly, NASA and NOAA and NIH, NSF, these agencies, and even internationally are communicating with each other, not just about how to set up these arrangements, but also how to ensure usability, how to get the word out about how things might be changing. So there is some communication there.
JOHN DANKOSKY: Katya, how would this system look different if you could design it? I mean, what would you say is the best possible way to solve the problem of this costing an awful lot of money, making sure that we get data that is open and accessible, and is available to people through the cloud, but it does all the protections that you’re hoping for?
KATYA ABAZAJIAN: Mm-hmm. Well, I think that the use cases for each different agency are unique, right, and so there’s always going to have to be a certain level of customization and consideration of what we hope that these agencies will do. I mean, we want them to be sophisticated in the analysis that they’re able to do. We want them to have the flexibility to be creative. So we definitely want these partnerships to be able to happen.
But at the same time, I think, in my view, it would be ideal if the public could have access to the content of these agreements in a way where there could be at least some accountability and oversight into the way that private entities are then sharing the data or publishing it. So the element of public oversight, I think, is really crucial because many of these agreements don’t have to be public proactively.
JOHN DANKOSKY: Katya Abazajian is the Open Cities director with the Sunlight Foundation, based in Washington, DC. Thank you so much for joining us. I really appreciate it.
KATYA ABAZAJIAN: Thank you.
JOHN DANKOSKY: Thanks also to Mariel Borowitz, assistant professor at the Sam Nunn School of International Affairs at Georgia Tech. She’s been writing about this big issue, and I’m really glad you brought it to our attention and that you could join us today. Thank you, Mariel.
MARIEL BOROWITZ: Thank you.
JOHN DANKOSKY: Thanks also to Kevin Murphy, program executive for Earth Science Data Systems at NASA’s Goddard Space Flight Center for sharing his comments with us.
John Dankosky works with the radio team to create our weekly show, and is helping to build our State of Science Reporting Network. He’s also been a long-time guest host on Science Friday. He and his wife have four cats, thousands of bees, and a yoga studio in the sleepy Northwest hills of Connecticut.