Scientists Release The First Fully Complete Human Genome
After a years-long global collaboration, scientists have finally released the first fully complete assembly of the human genome, filling in the roughly 8% of the sequence that was left blank when the human genome was first sequenced two decades ago. Researchers believe these missing pieces might be the key to understanding how DNA varies between people.
Six scientific papers on the topic were published in a special edition of the academic journal Science this week.
Ira talks with Karen Miga and Adam Phillippy, co-founders of the Telomere to Telomere Consortium, an international effort that led to the assembly of this new fully complete human genome.
Karen Miga is an assistant professor of biomolecular engineering and the associate director of the UC Santa Cruz Genomics Institute, based in Santa Cruz, California. Adam Phillippy is head of the Genome Informatics Section and senior investigator in the computational and statistical genomics branch at the National Human Genome Research Institute at the National Institutes of Health, based in Bethesda, Maryland.
IRA FLATOW: This is Science Friday. I’m Ira Flatow. Two decades ago, scientists announced a monumental scientific achievement. They had sequenced the human genome. But what you might not know is that there were gaps in that original sequence. About 8% of the sequence was completely blank, and a lot of that used to be dismissively called junk DNA. Well, now, after a years-long global collaboration, scientists have finally released the first fully complete assembly of the human genome. Researchers believe these missing pieces might be the key to understanding how DNA varies between people. Six scientific papers on the topic were published in a special edition of the journal Science this week. And joining me now are my guests to talk about it. Karen Miga, assistant professor of biomolecular engineering and associate director of the UC Santa Cruz Genomics Institute. And Adam Phillippy, senior investigator at the National Human Genome Research Institute, that’s at NIH in Bethesda, Maryland. Welcome to Science Friday.
KAREN MIGA: Yeah, thanks so much for having us.
ADAM PHILLIPPY: Thanks, Ira, pleasure to be here.
IRA FLATOW: Nice to have you. Let me begin with this question of this Telomere to Telomere Consortium that you have founded, an international effort that led to the assembly of this new fully complete human genome. Dr. Miga, tell us the significance of that name.
KAREN MIGA: Your listeners may recognize that telomeres are the ends of our chromosomes. And so we chose "telomere to telomere" to really illustrate that we were trying to complete an entire chromosome in one assembly, end to end.
IRA FLATOW: Not just broken pizza pieces.
KAREN MIGA: Exactly. Yeah, and it’s been really wonderful because it really does create a full view of a human chromosome for the first time, which is exciting.
IRA FLATOW: So let’s get into this. What’s on these newly sequenced parts of the human genome, Dr. Miga?
KAREN MIGA: Right, so the new sequences essentially represent regions of our genome that are known to be important for fundamental cellular processes. When we talk about regions like the centromere, which is pretty exciting for my own research, we know they’re responsible for how our chromosomes are transmitted every single time our cells divide. So changes in these sites in our genome could actually cause errors that could lead to all kinds of health outcomes. In total, we’re talking about 200 million bases, so it’s a lot.
IRA FLATOW: That is a lot. That’s a large percentage. We said 8%. Could you make a whole new genome out of that kind of material?
KAREN MIGA: Well, I think when we talk about a chromosome’s worth, 200 million bases is about the size of one of our largest chromosomes. If we ranked it alongside our chromosomes, it would be our third largest. So it’s slightly bigger than chromosome 3.
IRA FLATOW: Now, are these parts of the genome that scientists used to refer to as junk DNA? Is that what you have actually identified?
KAREN MIGA: I think it would be hard to consider the sequences in these regions to be junk. I think that word– Ira, you’ll probably agree with me– is probably outdated and it’s just used as a way to explain processes we don’t yet understand.
IRA FLATOW: How dumb was that?
KAREN MIGA: I really think that these regions are misunderstood. They don’t fall in our standard textbook definition of how genomes are organized. They do have genes in these regions, we do have these standard organizations, but they’re really enriched with a unique structure where you have a sequence that’s found in a head to tail, head to tail orientation for millions of bases. And why our genome is arranged in this way in the corners of our genome I think remains an unknown.
IRA FLATOW: And this has been your life’s work, hasn’t it?
KAREN MIGA: It has. I’ve been passionate about satellite DNAs since graduate school. And so I was really lucky to be able to pair up with such an amazing scientist as Adam Phillippy to make this dream come true, because he’s really the other side of this. I’ve been focusing so much on the satellite DNAs and the biology of these unique genetic elements, and having his type of mastery over assembly really brought us to where we are now.
IRA FLATOW: Well, let me talk to Adam. Dr. Phillippy, why did it take so long to sequence this final 8% of the genome?
ADAM PHILLIPPY: After the Human Genome Project finished in 2004, the holes that were left were the most repetitive bits of the genome. So imagine you have a puzzle and there’s a bowl of Skittles over in the corner of that puzzle. That’s the hardest bit to put together, because all of the Skittles look the same. Those types of repeats make jigsaw puzzles hard, just like they make putting a genome back together again difficult. And so it was those repeats that had us interested from a computational perspective and gave us a big challenge in putting this back together again.
So the reason that they weren’t done originally is the technology just wasn’t up to the task back in the early 2000s. We could only read small bits of the genome at a time. And when the puzzle is made of small pieces, it’s a lot harder to put together than when they’re made of big pieces. And so for this project we came in with new sequencing technologies that have been developed over the last decades that can read up to a million bases of sequence at a time compared to in the early 2000s where we were limited to a few hundred bases.
IRA FLATOW: Now knowing what we know now, if you have the total sequence, how does this move us forward in learning about DNA?
ADAM PHILLIPPY: Well, now that we’ve figured out how to do it and we can reconstruct these repetitive regions for the first time, it allows us to do that again now for many more human genomes. Or if we have a patient come into the clinic, we can sequence their complete genome, line it up against this new complete reference sequence, and we’re able to get a more comprehensive picture of all of the potential variants that they have within their genome. And then over time we’ll be able to link those newly discovered variants to potential disease associations, for instance.
IRA FLATOW: Is there one disease out there or one treatment that was waiting for this total sequence to be unraveled, do you think, and now it’s in the crosshairs?
ADAM PHILLIPPY: The one I would probably point to first is the so-called Robertsonian translocation. These occur in one in 1,000 births, and they’re essentially a fusion of two different chromosomes. We’ve revealed for the first time five entirely new chromosome arms, and they are directly related to this type of chromosomal anomaly. A lot of our collaborators who are interested in these translocations now have the base-precise sequence that they can look into to try to understand how these form and what the potential repercussions could be.
IRA FLATOW: Will it also tell us how we’re different from other animals, other primates close to us?
ADAM PHILLIPPY: Yeah, absolutely. In fact, these repetitive regions are some of the most dynamic, most variable regions of the genome, both compared to our nearest primate relatives and between individual humans. So we have some hope that there’ll be very exciting discoveries within these regions that might hold the key to what makes our genome uniquely human.
KAREN MIGA: It’s a contrast to how we think about function, with everything having to be deeply conserved. And these regions which we know are functional or are placed in these critically functional regions, they’re, as Adam mentioned, extremely dynamic and in many cases human-specific. So it’s in contrast to what we’re used to thinking about in terms of how we think about evolution and conservation.
IRA FLATOW: Now that we have these new tools you’re talking about, how far out are we from each one of us getting our own individual genomes mapped?
ADAM PHILLIPPY: That’s definitely the goal of this consortium: to help develop the technology to a point where a project like this, getting a complete human genome, can be replicated and become routine. And I think within 10 years it will be routine to have your complete diploid genome as just part of your medical record.
IRA FLATOW: At a cost of what?
ADAM PHILLIPPY: So for the original Human Genome Project, just to put things in perspective, in today’s dollars I think it was around $5 billion and a 10-year-plus effort. This project, we estimate, cost maybe a couple of million dollars from start to end. But the technologies that we developed along the way, and the technologies that have come from industry and elsewhere, have driven this number down so that if we were to redo this project today, we could probably get it done in a month for tens of thousands of dollars. And technology has advanced at this exponential pace for 30 years. I think within the next 10 years we can easily get it to under a day, and very likely to this mystical $1,000 genome.
KAREN MIGA: There’s the economic benefit of making it more affordable and more scalable, but I think that in the process of moving in that direction, we’re also giving the research community time to study these sequences and weigh their benefit, pushing back against that statement that this is junk DNA. Now that we’re providing the genomic community and the research community with these sequences for the first time, hopefully they’ll see why it’s so useful to have this type of comprehensive variant scan.
IRA FLATOW: Interesting. Dr. Phillippy, I know that part of this research is that your team has mapped some missing pieces in the Y chromosome. What was missing and why is this such a big achievement?
ADAM PHILLIPPY: So in the papers that are coming out this week, we actually didn’t describe the Y chromosome. The particular cell line that we chose to sequence initially has two copies of the X chromosome. But in the year since we completed that genome, we moved on, got a different cell line that had a Y chromosome, and replicated the same effort for that one.
About 50% of the current Y chromosome reference, which dates from the Human Genome Project in 2004, is missing. A lot of that is the highly repetitive DNA that Karen was mentioning earlier. And it’s important for the same reasons that we’ve completed the rest of the genome. This is filling in all of the missing pieces in the puzzle, and now we can look at those regions, identify the variants, and understand the functional consequences of the sequence in those regions.
IRA FLATOW: And the Y chromosome is passed paternally, right, so that’s quite important to know about.
ADAM PHILLIPPY: Yeah, that’s correct. It’s commonly used in genealogical studies because of that fact, to build family trees. And anybody who’s used 23andMe or any of those other services will have benefited from that.
IRA FLATOW: Karen, why is this part so important?
KAREN MIGA: Well, the Y chromosome, I think that’s a huge question. I think we haven’t quite figured out what it means to lose the giant amount of tandem repeats that exist on the q arm, half of the chromosome. I do know that as we age, the way our cells regulate how these typically repetitive parts of our genome are organized changes. And over time the Y chromosome is sometimes lost. So it does offer some new insight: something about these particular sequences, and all the proteins that bind to them, presents a huge unknown for people to start thinking about what it’s doing in the cell and how it could be influenced by the gain or loss of a Y.
ADAM PHILLIPPY: On the Y, it’s not just those very large tandem repeats that we’ve added. We’ve also added a number of genes nearby and around those tandem repeats, increasing not only the sequence content but the gene content of the Y as well.
IRA FLATOW: Adam, you started this project to complete the human genome back in 2018. But the second half of the project, the computational end, took place during the pandemic and, in fact, a big breakthrough happened right in the middle in the spring of 2020. Can you tell us about that?
ADAM PHILLIPPY: Yeah, we were in many ways fortunate that all of the sequencing and lab work that generated the data for this project happened before the pandemic. And so we were sitting on that data in the spring of 2020 when the COVID outbreak hit. A postdoc in my lab, Sergey Nurk, who was leading the computational analysis, came to me with an early look at that data, trying to assemble it for the first time. He put it up on his screen and showed me, and everybody in the room saw it and thought, wow, we actually have a chance at succeeding here. It gave us the clearest picture we had seen to date.
And so we rounded up all of our colleagues that were experts in this genome assembly process and worked over the course of that summer for about three months and didn’t really expect to complete every chromosome. I would have been happy if we just got five done, but, come August, all of it had snapped together. We had all of the chromosomes complete and ready to validate. And it was just a tremendously exciting summer and gave us something positive to focus on during those difficult times.
IRA FLATOW: Do you think that everyone working from home and focused on this project– because it was a worldwide project– did that perhaps get you to where you could get to the endpoint faster?
ADAM PHILLIPPY: Yeah, it’s always tricky to speculate, but I usually have a busy travel schedule and I was home in my basement every day working. And so I definitely could focus a lot more on the work at hand and not be diverted with all of these other administrative tasks, travel tasks. And also just all of the collaborative tools– we were on Slack and Zoom all day long talking to each other constantly. It definitely helped make progress fast.
IRA FLATOW: I’m Ira Flatow and this is Science Friday from WNYC Studios. Dr. Miga, how many scientists internationally worked on this project?
KAREN MIGA: Well, when you look over our author list in the main paper, I think we’re approaching 100 scientists in total. When we started, it was just, more or less, Adam and myself asking is it possible and just starting to sequence and work together. But when we opened the consortium, it was really a grassroots effort. It was an open door, anyone could join. And we soon had contributors from around the world.
IRA FLATOW: Now, I understand you published the complete genome on a preprint server last summer. Are scientists already using it in the lab?
KAREN MIGA: For sure. I mean, we’ve already had hundreds of people download and cite our preprint. So I think that this is really demonstrating the utility of our work and the fact that there are going to be new discoveries made and announced in the future.
ADAM PHILLIPPY: The group’s philosophy of being very open and inviting to everyone has given us those new directions. And it’s also given us a lot of confidence that what we’re looking at is correct, because, as Karen mentioned, we’ve had hundreds of people looking at every corner of this genome over the past three years.
IRA FLATOW: I remember when the project was first announced 20 years ago, when they talked about, hey, we’ve got the human genome figured out. The scientists were saying, well, wait a minute, that was actually the easy part. The real work is going to come in figuring out what the functionality is and how we apply it. Dr. Phillippy, do you think this is where we are again? What comes next?
ADAM PHILLIPPY: Yeah, exactly. We’ve spent 20 years digging into what was produced by the Human Genome Project and have just scratched the surface of that. And now we’re at it again, where we’ve got another 8%, and we’ve been looking at the same parts of the genome for the past 20 years. So this represents a new 200 million bases to be investigated, and so, yeah, we’re starting over with this. It’s brand-new, unknown sequence, and the same excitement will repeat itself now, with another 20 years of digging into this new sequence.
IRA FLATOW: And, Dr. Miga, what do you hope this new sequence will bring?
KAREN MIGA: I hope that these new sequences will bring some new insight into what these repetitive sequences are contributing to in terms of how our cell functions, how it contributes to cell identity in early development, and how it contributes towards human disease. I think that there’s so many open questions here, and I think they’ve just had a roadblock because of the lack of a reference genome. And, in fact, many scientists and researchers around the world probably already have data now that they could just map to our reference genome without even doing another experiment and start to find new discoveries and new information just because they’ve been ignoring it for so long.
IRA FLATOW: It’s so hard for us on this side of the microphone to portray the excitement that must be going on with the scientists who have completed this. Would I be correct in assuming that?
ADAM PHILLIPPY: Well, it’s been the most exciting point of my career, for sure.
KAREN MIGA: Same. This is really great. This is the most joy I’ve had in my career.
IRA FLATOW: Thank you both for taking time to be with us today. And I hope you’ll come back and tell us about these exciting times when they happen.
KAREN MIGA: Happy to. Thanks so much for having us.
ADAM PHILLIPPY: Yeah, if you’re still here in 20 years, Ira, we’ll be back.
IRA FLATOW: It’s a date. We’ll meet here. Thank you both for the work you do and for taking time to be with us today. Karen Miga, assistant professor of biomolecular engineering, associate director of the UC Santa Cruz Genomics Institute. And Adam Phillippy, senior investigator, head of the genome informatics section at the National Human Genome Research Institute that’s at NIH in Bethesda, Maryland.