Scientists Release The First Fully Complete Human Genome

Two decades ago, scientists announced they had sequenced the human genome. What you might not know is that there were gaps in that original sequence—about 8% was completely blank.

Now, after a years-long global collaboration, scientists have finally released the first fully complete assembly of the human genome. Researchers believe these missing pieces might be the key to understanding how DNA varies between people.

Six scientific papers on the topic were published in a special edition of the academic journal Science this week.

Ira talks with Karen Miga and Adam Phillippy, co-founders of the Telomere to Telomere Consortium, an international effort that led to the assembly of this new fully complete human genome.

Karen Miga is an assistant professor of biomolecular engineering and the associate director of the UC Santa Cruz Genomics Institute, based in Santa Cruz, California. Adam Phillippy is head of the Genome Informatics Section and senior investigator in the computational and statistical genomics branch at the National Human Genome Research Institute at the National Institutes of Health, based in Bethesda, Maryland.

Segment Guests

Karen Miga

Karen Miga is an assistant professor of Biomolecular Engineering and the Associate Director of the UC Santa Cruz Genomics Institute in Santa Cruz, California.

Segment Transcript

IRA FLATOW: This is Science Friday. I’m Ira Flatow. Two decades ago scientists announced a monumental scientific achievement. They had sequenced the human genome. But what you might not know is that there were gaps in that original sequence. About 8% of the sequence was completely blank and a lot of that used to be dismissively called junk DNA. Well, now after a years-long global collaboration, scientists have finally released the first fully complete assembly of the human genome. Researchers believe these missing pieces might be the key to understanding how DNA varies between people. Six scientific papers on the topic were published in a special edition of the journal Science this week. And joining me now are my guests to talk about it. Karen Miga, assistant professor of biomolecular engineering and associate director of the UC Santa Cruz Genomics Institute. And Adam Phillippy, senior investigator at the National Human Genome Research Institute, that’s at NIH in Bethesda, Maryland. Welcome to Science Friday.

KAREN MIGA: Yeah, thanks so much for having us.

ADAM PHILLIPPY: Thanks, Ira, pleasure to be here.

IRA FLATOW: Nice to have you. Let me begin with this question of this Telomere to Telomere Consortium that you have founded, an international effort that led to the assembly of this new fully complete human genome. Dr. Miga, tell us the significance of that name.

KAREN MIGA: Your listeners may recognize that the telomeres are at the ends of our chromosomes. And so we chose telomere to telomere to really illustrate that we were trying to complete an entire chromosome in one assembly end to end.

IRA FLATOW: Not just broken pizza pieces.

KAREN MIGA: Exactly. Yeah, and it’s been really wonderful because it really does create a full view of a human chromosome for the first time, which is exciting.

IRA FLATOW: So let’s get into this. What’s on these newly sequenced parts of the human genome, Dr. Miga?

KAREN MIGA: Right, so the new sequences represent essentially regions of our genome that are known to be important for fundamental cellular processes. When we talk about regions like the centromere, which is pretty exciting for my own research, we know they’re responsible for how our chromosomes are transmitted every single time our cells divide. So changes in these sites in our genome could actually cause errors that could lead to all kinds of health outcomes. In total, we’re talking about 200 million bases, so it’s a lot.

IRA FLATOW: That is a lot. That’s a large percentage. We said 8%. Could you make a whole new genome out of that kind of material?

KAREN MIGA: Well, I think when we talk about a chromosome’s worth, 200 million bases is about the size of one of our largest chromosomes. If we look at the information, it’s our third largest. So it’s slightly bigger than chromosome 3.

IRA FLATOW: Now, are these parts of the genome that scientists used to refer to as junk DNA? Is that what you have actually identified?

KAREN MIGA: I think it would be hard to consider the sequences in these regions to be junk. I think that word– Ira, you’ll probably agree with me– is probably outdated and it’s just used as a way to explain processes we don’t yet understand.

IRA FLATOW: How dumb was that?

KAREN MIGA: I really think that these regions are misunderstood. They don’t fall in our standard textbook definition of how genomes are organized. They do have genes in these regions, we do have these standard organizations, but they’re really enriched with a unique structure where you have a sequence that’s found in a head to tail, head to tail orientation for millions of bases. And why our genome is arranged in this way in the corners of our genome I think remains an unknown.

IRA FLATOW: And this has been your life’s work, hasn’t it?

KAREN MIGA: It has. I’ve been passionate about satellite DNAs since graduate school. And so I was really lucky to be able to pair up with such an amazing scientist like Adam Phillippy to make this dream come true because he’s really the other side of this, where I’ve been focusing so much on the satellite DNAs and the biology of these unique genetic elements– having that type of mastery over assembly really brought us to where we are now.

IRA FLATOW: Well, let me talk to Adam. Dr. Phillippy, why did it take so long to sequence this final 8% of the genome?

ADAM PHILLIPPY: After the Human Genome Project finished in 2004, the holes that were left were the most repetitive bits of the genome. So imagine you have a puzzle and there’s a Bowl of Skittles over in the corner of that puzzle. And that’s the hardest bit to put together because all of the Skittles look the same. Those types of repeats make jigsaw puzzles hard, just like they make putting a genome back together again difficult. And so it was those repeats that had us interested from a computational perspective and gave us a big challenge in putting this back together again.

So the reason that they weren’t done originally is the technology just wasn’t up to the task back in the early 2000s. We could only read small bits of the genome at a time. And when the puzzle is made of small pieces, it’s a lot harder to put together than when they’re made of big pieces. And so for this project we came in with new sequencing technologies that have been developed over the last decades that can read up to a million bases of sequence at a time compared to in the early 2000s where we were limited to a few hundred bases.
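The puzzle analogy above can be made concrete with a toy sketch. This is an illustration only, not the consortium's assembly method: the function name `ambiguous_fraction`, the made-up flanking sequences, and the repeat unit `AATTC` are all assumptions chosen for the example. It counts how many reads of a given length could be placed at more than one position in a small sequence containing a tandem repeat.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of reads of length read_len that also occur elsewhere in genome."""
    # Take every possible read of the given length (one per start position).
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    # A read is ambiguous if the same string appears at more than one position.
    return sum(1 for r in reads if counts[r] > 1) / len(reads)


# Hypothetical toy sequence: unique flank + tandem repeat + unique flank.
left_flank = "ACGTTGCA"
right_flank = "GGATCCAT"
repeat_unit = "AATTC"
genome = left_flank + repeat_unit * 6 + right_flank

short = ambiguous_fraction(genome, 5)   # reads shorter than the repeat array
long_ = ambiguous_fraction(genome, 40)  # reads that span the whole repeat array

print(f"short reads ambiguous: {short:.2f}")
print(f"long reads ambiguous:  {long_:.2f}")
```

In this toy case most of the 5-base reads match more than one position, while none of the 40-base reads do, because each long read spans the whole repeat plus some unique flanking sequence. Real genomes, real repeats, and real assemblers are far more complex than this sketch.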

IRA FLATOW: Now knowing what we know now, if you have the total sequence, how does this move us forward in learning about DNA?

ADAM PHILLIPPY: Well, now that we’ve figured out how to do it and we can reconstruct these repetitive regions for the first time, it allows us to do that again now for many more human genomes. Or if we have a patient come into the clinic, we can sequence their complete genome, line it up against this new complete reference sequence, and we’re able to get a more comprehensive picture of all of the potential variants that they have within their genome. And then over time we’ll be able to link those newly discovered variants to potential disease associations, for instance.

IRA FLATOW: Is there one disease out there or one treatment that was waiting for this total sequence to be unraveled, do you think, and now it’s in the crosshairs?

ADAM PHILLIPPY: The one I would probably point to first are the so-called Robertsonian translocations. These occur in one in 1,000 births and it’s a fusion essentially of two different chromosomes. And we’ve revealed for the first time five entirely new chromosome arms, and they are directly related to this type of chromosomal anomaly. And a lot of our collaborators that are interested in these translocations now have the base precise sequence that they can look into and try to understand how these form and what the potential repercussions could be.

IRA FLATOW: Will it also tell us how we’re different from other animals, other primates close to us?

ADAM PHILLIPPY: Yeah, absolutely. In fact, these repetitive regions are some of the most dynamic, the most variable regions of the genome compared to our nearest primate relatives, the most variable between individual humans. So we have some hope that they’ll be very exciting discoveries within these regions that might hold the key to what makes our genome uniquely human.

KAREN MIGA: It’s a contrast to how we think about function, with everything having to be deeply conserved. And these regions which we know are functional or are placed in these critically functional regions, they’re, as Adam mentioned, extremely dynamic and in many cases human-specific. So it’s in contrast to what we’re used to thinking about in terms of how we think about evolution and conservation.

IRA FLATOW: Now that we have these new tools you’re talking about, how far out are we from each one of us getting our own individual genomes mapped?

ADAM PHILLIPPY: That’s definitely the goal of this consortium is to help develop the technology to a point that a project like this, to get a complete human genome, can be replicated and become routine. And I think within 10 years it will be routine to have your complete diploid genome as just part of your medical record.

IRA FLATOW: At a cost of what?

ADAM PHILLIPPY: So for the original Human Genome Project, just to put things in perspective in today’s dollars, I think it was around $5 billion and a 10-year plus effort. This project, we estimate maybe a couple of million from start to end. But the technologies that we developed along the way and the technologies that have come from industry and elsewhere have driven this number down so that if we were to redo this project today, we could probably get it done in a month for tens of thousands of dollars. But the trajectory of technology advances just continued on this exponential pace for 30 years. And I think within the next 10 years we can easily get it to under a day and very likely this mystical $1,000 genome.

KAREN MIGA: In addition to this economic benefit of making it more affordable and more scalable– but I think that in the process of moving in that direction, we’re giving the research community time to study these sequences and balance it with the benefit, going back against that statement that this is junk DNA. Now providing the genomic community and the research community with these sequences for the first time, hopefully they’ll see why it’s so useful to have this type of comprehensive variant scan.

IRA FLATOW: Interesting. Dr. Phillippy, I know that part of this research is that your team has mapped some missing pieces in the Y chromosome. What was missing and why is this such a big achievement?

ADAM PHILLIPPY: So in the papers that are coming out this week, we actually didn’t describe the Y chromosome. The particular cell line that we chose to sequence initially has two copies of the X chromosome. But in the year that followed since we completed that genome, we moved on and got a different cell line that had a Y chromosome and replicated the same effort for this particular one.

There’s 50% missing in the current Y chromosome reference to date from the Human Genome Project in 2004. A lot of that is this highly repetitive DNA that Karen was mentioning earlier. And it’s important for the same reasons that we’ve completed the rest of the genome. This is filling in all of the missing pieces in the puzzle and now we can look at those regions and identify the variants and understand the functional consequences of the sequence in those regions.

IRA FLATOW: And the Y chromosome is passed paternally, right, so that’s quite important to know about.

ADAM PHILLIPPY: Yeah, that’s correct. It’s commonly used in genealogical studies because of that fact, used to build family trees. And anybody that’s used 23andMe and any of those other services will have benefited from that.

IRA FLATOW: Karen, why is this part so important?

KAREN MIGA: Well, the Y chromosome, I think that’s a huge question. I think we haven’t quite figured out what it means to lose a giant amount of tandem repeats that exist on the Q arm, half of the chromosome. I do know that as we age, these parts of our genome that are typically repetitive, they change in the way our cells regulate how they’re organized. And over time the Y chromosome is sometimes lost, so it does offer some new insight that something about these particular sequences and all the proteins that are binding to them present a huge unknown for people to start thinking about what it’s doing in the cell and how it could be influenced with the gain or loss of a Y.

ADAM PHILLIPPY: On these very large tandem repeats on the Y, it’s not just those tandem repeats that we’ve added, but also added a number of genes nearby and around those tandem repeats, increasing, not only the sequence content, but the [INAUDIBLE] content of the Y as well.

IRA FLATOW: Adam, you started this project to complete the human genome back in 2018. But the second half of the project, the computational end, took place during the pandemic and, in fact, a big breakthrough happened right in the middle in the spring of 2020. Can you tell us about that?

ADAM PHILLIPPY: Yeah, we were in many ways fortunate that all of the sequencing and lab work that generated the data for this project happened before the pandemic. And so we were sitting on that data spring of 2020 when the COVID outbreak hit. And a postdoc in my lab, Sergey Nurk, who was leading the computational analysis, came to me with the early look at that data, trying to assemble it for the first time. And when we looked at it, he put it up on his screen, and showed me, and everybody in the room saw that, and thought, wow, we actually have a chance at succeeding here. It gave us the clearest picture we had seen to date.

And so we rounded up all of our colleagues that were experts in this genome assembly process and worked over the course of that summer for about three months and didn’t really expect to complete every chromosome. I would have been happy if we just got five done, but, come August, all of it had snapped together. We had all of the chromosomes complete and ready to validate. And it was just a tremendously exciting summer and gave us something positive to focus on during those difficult times.

IRA FLATOW: Do you think that everyone working from home and focused on this project– because it was a worldwide project– did that perhaps get you to where you could get to the endpoint faster?

ADAM PHILLIPPY: Yeah, it’s always tricky to speculate, but I usually have a busy travel schedule and I was home in my basement every day working. And so I definitely could focus a lot more on the work at hand and not be diverted with all of these other administrative tasks, travel tasks. And also just all of the collaborative tools– we were on Slack and Zoom all day long talking to each other constantly. It definitely helped make progress fast.

IRA FLATOW: I’m Ira Flatow and this is Science Friday from WNYC Studios. Dr. Miga, how many scientists internationally worked on this project?

KAREN MIGA: Well, when you look over our author list in the main paper, I think we’re approaching 100 scientists in total. When we started, it was just, more or less, Adam and myself asking is it possible and just starting to sequence and work together. But when we opened the consortium, it was really a grassroots effort. It was an open door, anyone could join. And we soon had contributors from around the world.

IRA FLATOW: Now, I understand you published the complete genome via preprint server last summer. Are scientists already using it in the lab?

KAREN MIGA: For sure. I mean, we’ve had hundreds of people already download our preprint, citing our preprint. So I think that this is really demonstrating the utility of our work and the fact that there’s going to be new discoveries that will be made and announced in the future.

ADAM PHILLIPPY: This kind of philosophy of the group to be very open and inviting to everyone has given us those new directions. And it’s also given us a lot of confidence that what we’re looking at is correct because we’ve had, Karen mentioned, hundreds of people looking at every corner of this genome over the past three years and that gives us a lot of confidence that we’ve done it correctly.

IRA FLATOW: I remember when the project was first announced 20 years ago when they talked about, hey, we’ve got the human genome figured out. The scientists were saying, well, wait a minute, that was actually the easy part. The difficult– the real work is going to come into figuring out what the functionality is and how we apply it. Dr. Phillippy, do you think this is where we are again? What comes next?

ADAM PHILLIPPY: Yeah, exactly. We’ve spent 20 years digging into what was produced by the Human Genome Project and have just scratched the surface of that. And now we’re at this again where we’ve got another 8% and we’ve been looking at the same parts of the genome for the past 20 years. So this represents a new 200 million bases to be investigated and so, yeah, we’re starting over with this. It’s brand new, unknown sequence and the same excitement will repeat itself now and another 20 years of digging into this new sequence.

IRA FLATOW: And, Dr. Miga, what do you hope this new sequence will bring?

KAREN MIGA: I hope that these new sequences will bring some new insight into what these repetitive sequences are contributing to in terms of how our cell functions, how it contributes to cell identity in early development, and how it contributes towards human disease. I think that there’s so many open questions here, and I think they’ve just had a roadblock because of the lack of a reference genome. And, in fact, many scientists and researchers around the world probably already have data now that they could just map to our reference genome without even doing another experiment and start to find new discoveries and new information just because they’ve been ignoring it for so long.

IRA FLATOW: It’s so hard to portray– for we on this side of the microphone to portray the excitement that must be going on with scientists who have completed this. Would I be correct in assuming that?

ADAM PHILLIPPY: Well, it’s been the most exciting point of my career, for sure.

KAREN MIGA: Same. This is really great. This is the most joy I’ve had in my career.

IRA FLATOW: Thank you both for taking time to be with us today. And I hope you’ll come back and tell us about these exciting times when they happen.

KAREN MIGA: Happy to. Thanks so much for having us.

ADAM PHILLIPPY: Yeah, if you’re still here in 20 years, Ira, we’ll be back.

IRA FLATOW: It’s a date. We’ll meet here. Thank you both for the work you do and for taking time to be with us today. Karen Miga, assistant professor of biomolecular engineering, associate director of the UC Santa Cruz Genomics Institute. And Adam Phillippy, senior investigator, head of the genome informatics section at the National Human Genome Research Institute that’s at NIH in Bethesda, Maryland.

Copyright © 2022 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/

Meet the Producers and Host

About Shoshannah Buxbaum

Shoshannah Buxbaum is a producer for Science Friday. She’s particularly drawn to stories about health, psychology, and the environment. She’s a proud New Jersey native and will happily share her opinions on why the state is deserving of a little more love.

About Ira Flatow

Ira Flatow is the founder and host of Science Friday. His green thumb has revived many an office plant at death’s door.
