Plugging Into DNA for Digital Data Storage

Luis Ceze, the University of Washington Torode Family Career Development Professor of Computer Science & Engineering, and research scientist Lee Organick prepare DNA containing digital data for sequencing, which allows them to “read” and retrieve the original files. Credit: Tara Brown Photography/University of Washington

DNA is the storage system for the biological code of the human genome. Now, engineers are tapping into this natural code to store digital data. For instance, Georg Seelig, an engineer from the University of Washington, and his team were able to store and retrieve digitized photos on strands of DNA. Seelig discusses how to translate binary code into the four nucleotide bases.

Segment Guests

Georg Seelig

Georg Seelig is an associate professor in Electrical Engineering and Computer Science & Engineering at the University of Washington in Seattle, Washington.

Segment Transcript

IRA FLATOW: Speaking of DNA, DNA is the storage locker for our genetic code. And why not convert the ones and zeroes of a digital code into the AGTCs of the genetic code and store that digital information in our DNA? A team of engineers did just that. The engineers were able to turn digitized photos and videos into DNA code. And then they were able to retrieve the info out of DNA, which is very important if you’re going to use it as a storage medium, with 100% accuracy. Very important step.

So why not create a hard drive out of DNA? My next guest is here to tell us how to do that. Georg Seelig is part of that team. And he’s an Associate Professor of Electrical and Computer Science at the University of Washington. He joins us from KUOW. Welcome.

GEORG SEELIG: Welcome, it’s good to be here.

IRA FLATOW: So the first question is why do you do this? We have thumb drives that can store terabytes of information. Is DNA a great storage medium?

GEORG SEELIG: I think it is. There’s three main reasons why that’s true. The first one is it’s really small. It’s really dense. So you can just cram a lot of information into a very small space. The second is that it lasts for a long time. I mean, we can retrieve DNA from very old, 100,000 year old, fossils. And the third one is that DNA, as you’ve just talked about, is our genetic material. And so we’ll always be interested in reading DNA. So it’s not like a floppy disk that 10 years from now nobody will be able to look at.

IRA FLATOW: In your experimentation, tell us what you do. What kind of photos? How were you able to digitize the DNA?

GEORG SEELIG: OK, yes, so the process is actually quite simple. So anytime you store image or video or something on a computer, you’ve already digitized it into zeroes and ones. And then what we need to do is translate these zeroes and ones to A’s and C’s and G’s and T’s. So you could do that by just saying, an A is 0-0. T is 0-1. G is 1-0. And C is 1-1.

And so you just translate zeros and ones to letters. Then you essentially ask a DNA synthesis company to make strands of DNA for you that have the right series of letters. You get the DNA in the mail. Actually, you store it in a fridge as of now, but in different settings in the future. And then you sequence it back to read the information in the test tube.
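The two-bit mapping Seelig describes (A is 00, T is 01, G is 10, C is 11) can be sketched in a few lines of Python. This is a minimal illustration of that simple mapping only, not necessarily the encoding used in the team's paper; the function names `encode` and `decode` are hypothetical.

```python
# Mapping quoted in the interview: A = 00, T = 01, G = 10, C = 11.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of nucleotide letters, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate nucleotide letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # TAGATGGT
print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, so the roughly 150 kilobytes mentioned later in the interview would correspond to about 600,000 bases under this mapping, before any redundancy is added.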

IRA FLATOW: Wow, and it will stay for how long?

GEORG SEELIG: It really depends on how you treat it. DNA can be very, very stable, like thousands of years if it’s kept away from water. If it’s in water, it won’t last very long.

IRA FLATOW: I’m Ira Flatow. This is Science Friday from PRI, Public Radio International, talking with Georg Seelig at the University of Washington in Seattle. How do you then get the information back out? What kind of mechanisms can you use to retrieve it from the DNA?

GEORG SEELIG: OK, that’s a 2-step process. First, you use a technique called PCR, which allows you to essentially pick out, if you have in your pool of DNA many different files and you just want to read one. You can use PCR to do random access, and pick out just the specific file you like and amplify that. And then you take what you amplify and you put it on the DNA sequencer, which is exactly the same type of device that Dr. Venter is using. And so then that allows you to read the information back.
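The random-access step can be pictured as a keyed lookup over a mixed pool of strands. The sketch below is only an analogy and rests on assumptions not stated in the interview: that every strand of a file begins with a file-specific primer sequence, and that selection can be modeled as a prefix match. The primer sequences, pool contents, and the function name `select_by_primer` are hypothetical; real PCR amplifies the matching strands chemically rather than filtering a list.

```python
def select_by_primer(pool, primer):
    """Return the payloads of strands that begin with the given primer."""
    return [strand[len(primer):] for strand in pool if strand.startswith(primer)]


# Hypothetical primers, one per stored file.
PRIMER_FILE_1 = "ACGTAC"
PRIMER_FILE_2 = "TGCATG"

# A mixed pool: strands from both files sit together in one "test tube".
pool = [
    PRIMER_FILE_1 + "TAGATGGT",
    PRIMER_FILE_2 + "GGCCTTAA",
    PRIMER_FILE_1 + "CCATTAGA",
]

# "Amplify" only the strands belonging to file 1, then hand them to the sequencer.
file_1_payloads = select_by_primer(pool, PRIMER_FILE_1)
print(file_1_payloads)  # ['TAGATGGT', 'CCATTAGA']
```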

IRA FLATOW: And give me an idea how much data you can store in DNA.

GEORG SEELIG: So in our paper, we just stored about 150 kilobytes, which is not a lot, obviously. I think it was four images. But I think that’s really changing very rapidly. I think just like 10 years ago, 20 years ago, people were able to synthesize maybe a few strands of DNA corresponding to a few characters that you could store. And I think in just a few years, we’ll be able to store orders of magnitude more data.

IRA FLATOW: Give me an idea what you mean. How many pictures, photos, videos? Is it terabytes we’re talking about?

GEORG SEELIG: Oh, I think eventually for sure. I mean, I think eventually exabytes are realistic. So I think we can store, not next year, but maybe 10 years from now or so, I think we can store the data center sized amounts of information in DNA.

IRA FLATOW: A whole data center size, how much space would that take?

GEORG SEELIG: An exabyte, if you just look at the DNA alone with a little bit of packaging, you could argue that it would be like a sugar cube. I think realistically, you’ll have to build infrastructure around that sugar cube. So the effective density will be less than that. But it could be really small. It’s definitely denser than any storage material that’s out there, including magnetic tape which is currently the gold standard.

IRA FLATOW: So you could take a whole data center and put it in a sugar cube sized thing.

GEORG SEELIG: I think eventually that will happen. I think that the potential is there. The key thing that needs to happen to get there is that you need to make writing DNA, so DNA synthesis, much, much cheaper, like a million times cheaper. And there’s a lot of questions as to how to do that, but I think it can be done. And if we do it, then this becomes realistic.

IRA FLATOW: And you’re working with Microsoft on this, are you not?

GEORG SEELIG: Exactly. I don’t actually work– yeah.

IRA FLATOW: They’ve got so much business in the cloud now, they must be interested in data storage.

GEORG SEELIG: Exactly. So we’re actually a team. My expertise is on the DNA side. I’m actually a synthetic biologist by training. And then I’m working with people at Microsoft, Karin Strauss, Doug Carmean, and in my own department in computer science, with Luis Ceze, and they’re computer architects. And really, what they are thinking about is how to improve storage. So I think there’s a real interest commercially in making storage cheaper and denser.

IRA FLATOW: Well, Georg Seelig, thank you very much for taking the time to be with us today.

GEORG SEELIG: It’s my pleasure.

IRA FLATOW: Associate Professor of Electrical and Computer Science at the University of Washington in Seattle.

Copyright © 2016 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of ScienceFriday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies.
