08/01/25

65 Genomes Expand Our Picture Of Human Genetics

17:04 minutes

The first complete draft of the human genome was published back in 2003. Since then, researchers have worked both to improve the accuracy of human genetic data, and to expand its diversity, looking at the genetics of people from many different backgrounds. Three genetics experts join Host Ira Flatow to talk about a recent close examination of the genomes of 65 individuals from around the world, and how it may help researchers get a better understanding of genomic functioning and diversity.



Segment Guests

Christine Beck

Dr. Christine Beck is an associate professor of genetics and genome sciences at the University of Connecticut Health Center and the Jackson Laboratory.

Glennis Logsdon

Dr. Glennis Logsdon is an assistant professor of genetics and a core member of the Epigenetics Institute at the University of Pennsylvania.

Adam Phillippy

Dr. Adam Phillippy is a senior investigator in the Center for Genomics and Data Science Research at the National Human Genome Research Institute at the NIH.

Segment Transcript

IRA FLATOW: This is Science Friday. I’m Ira Flatow. Remember the Human Genome Project? Well, the initial draft was declared complete back in 2003, but researchers then realized that one genome doesn’t paint a complete picture of the human race. So fast forward a decade or so, there came the 1000 Genomes Project, an attempt to expand the picture by sampling people from all over the world with different backgrounds and try to get a fuller look at how we’re the same or how we’re different.

Writing this month in the journal Nature, two teams of researchers take another look at some of those 1,000 genomes, resequencing, reassembling with more advanced techniques to lessen the number of typos, and really firm up how the pieces of the genome puzzle fit together.

Joining me now to talk about how it’s going are my guests, Dr. Christine Beck, associate professor of genetics and genome sciences at the University of Connecticut Health Center and the Jackson Laboratory. She’s in Farmington, Connecticut; Dr. Glennis Logsdon, assistant professor of genetics and core member of the Epigenetics Institute at the University of Pennsylvania in Philly; and Dr. Adam Phillippy, a senior investigator in the Center for Genomics and Data Science Research at the National Human Genome Research Institute at the NIH in Bethesda, Maryland. Welcome all of you to Science Friday.

GLENNIS LOGSDON: Thanks.

ADAM PHILLIPPY: Thanks, Ira. Great to be here.

IRA FLATOW: Nice to have you all. Dr. Logsdon, let me start with you. What’s the thousand-mile view of this paper? What were you all trying to do here?

GLENNIS LOGSDON: Yeah, the goal of the paper was really to generate complete sequence assemblies of 65 diverse humans from around the world. And we were mainly interested in trying to resolve all sequences within all the 46 chromosomes within these genomes. And that includes the most challenging regions of the genomes that have been kind of plaguing scientists for decades to try to resolve their regions.

And the point in doing this is to understand the genetic and epigenetic variation of these genomes, to understand how we differ in our sequences, our structures, and trying to understand how proteins also differ amongst us. Some of the most interesting regions that we resolved are the centromeres, which are these essential chromosomal regions found on every single chromosome in our genome.

And they’re important for ensuring that our chromosomes are equally and accurately segregated during mitosis and meiosis. And so, for the first time, we were able to solve about 1,200 centromeres from among these 65 genomes and understand how they differ between each of us and what that difference might mean in terms of function.

IRA FLATOW: Does that mean the older genome sequences had problems with them?

GLENNIS LOGSDON: Yes, that’s exactly what that means. Sequencing technologies from about a decade ago weren’t really able to resolve these regions of the genome, and that’s because they’re really highly repetitive and very large. And the sequencing data that was generated back then was smaller than the size of the repeat itself. But now we’re able to actually resolve these regions in their entirety, traverse them from one side of the chromosome to the other, and finally get complete maps at high resolution of these regions of the genome.

IRA FLATOW: Interesting. Dr. Beck, tell me more about the 65 individuals that Dr. Logsdon talked about. Where did these samples come from?

CHRISTINE BECK: Sure. The 65 samples were actually part of the 1000 Genomes Project. So the samples are from around the world. And basically, with data from previous sequencing projects, we had a good handle on how much variation there was between individual samples and a reference genome. So therefore, we chose cell lines from individuals that would maximize the amount of novel sequence variation that was discovered in our work because if we sequenced a bunch of individuals that were really, really similar, we’d have less return on our investment for sequencing each individual person.

So we sequenced these 65 people, and from them, we discovered a large amount of DNA variation from person to person. And part of the reason why that’s important is because we don’t really have a good handle on how much DNA variation there is in some of these complex regions of the genome. So without a good understanding of that kind of background topology of the genome, it’s really, really hard to separate benign differences in the population from pathogenic.

IRA FLATOW: So you really did find diversity, similarity, dissimilarity in all these different genomes?

CHRISTINE BECK: So we looked between all of these genomes, and between them, we cataloged 188,000-plus variations between people that were greater than 50 base pairs in length.

IRA FLATOW: Is that surprising, all those?

CHRISTINE BECK: To a certain degree, it’s not. So previous studies looking at an individual genome versus the reference assembly, we were able to find a decent number of variants, but in these complex regions, like Glennis was talking about in other loci comprised largely of repeat sequences, we were able to uncover vast amounts of differences between individuals in the population that had heretofore been undiscovered because of the quality of sequencing.

So just as a quick side by side example, just four years ago, we published the human genome structural variant consortium– at the time published another study where they cataloged variants between people. And with that sequencing technology, there were almost 2,000 fewer variants between every individual and the reference genome than there are in this recent compendium.

IRA FLATOW: Wow. Dr. Phillippy, what is it that is letting you do this work now? Is it better machinery to actually do sequence? Is it better tools to assemble all the data? What is it?

ADAM PHILLIPPY: Yeah, all of the above, including better computers. You mentioned, at the top, the 1000 Genomes Project, which was initiated almost 20 years ago now. That was the original collection and sequencing of these samples. But it wasn’t just typos that we had at the time. There was, if you want to continue the book of life analogy, entire pages missing from each of these individual genomes. And as was noted by Glennis, a lot of this was due to repeats. And so these repeating pages, repeating sentences, so to speak, in the genome, are just like when you’re doing a jigsaw puzzle– hard to put back together again when they’re highly repetitive.

IRA FLATOW: So how complete do you think you’ve gotten? Are we finished?

ADAM PHILLIPPY: Yeah, I was on a number of years ago talking about this Telomere to Telomere Project, which was the first completion of an entire human genome. And we estimated, at the time, that that filled in about 8% of what was missing after the initial Human Genome Project from the early 2000s. And I would say that number holds about the same for these genomes. And so for all of the genomes presented here, each of them has about 8% more sequence than the initial product of the Human Genome Project from 2003.

And the technology, the sequencing methods are able to read a longer stretch of DNA at a time. That helps. The computational methods have advanced. And we have better and more accurate methods of putting those puzzles back together again. And just handling this sheer quantity of data, generating millions and millions of sequencing reads from all of these individual genomes and putting it back together again, is really only possible with the advance of computing that we’ve seen over the past couple of decades as well.

IRA FLATOW: So how close are we to a final number or a final end to all the sequencing?

ADAM PHILLIPPY: Well, how many billions of people deep would you like to go? Obviously, we’re just scratching the surface here, but as Christine said, we’re trying to do it in a way that maximizes our return on investment. And so we can go into a population of people and pick out the ones that look most different from one another, sequence those first. And then, over time, we start to saturate the amount of variation that we return. So now we’re talking about 50-ish genomes. In the next year or two, we’ll be talking about thousands of genomes. And if this field just continues to increase exponentially like it has over the past two decades, yeah, the sky’s the limit.

IRA FLATOW: Now you mentioned that this really helped fill in the bits of the genome that repeat over and over. What does that tell us? Why does it repeat over and over? Is there information there?

GLENNIS LOGSDON: Yeah, there’s absolutely information in the repetitive regions of the genome. And not only just information, there’s function. I mentioned earlier the centromeres. They’re some of the most mutable, highly dynamic regions of our genome. And they’re so mutable that I haven’t yet seen two centromeres that look identical across humans. And despite this variability, we can see quite a bit of variation that, in fact, affects function. So we find that when certain regions of the centromeres are deleted or expanded or duplicated, this could actually affect the way that the chromosomes segregate during meiosis and mitosis.

IRA FLATOW: Is this repetitive stuff what we once called junk DNA?

GLENNIS LOGSDON: It is. It is exactly what you would call junk DNA. But we know for sure that it’s not junk DNA. In fact, it’s very functional, important regions of our genome. It’s important for life. If we didn’t have these regions of the genome, then we wouldn’t be able to live.

CHRISTINE BECK: I think that that’s kind of an important part to touch on because I think repeats of all classes really shine with these novel techniques and novel sequencing modalities, as well as the assemblies. So both the centromeric repeats that Glennis studied, as well as segmental duplications and complex kind of different ways of arraying those puzzle pieces, from beginning to end, have begun to come to light with these new sequences.

And from that, you can infer whether or not the mutations or the differences between these people have actually affected the coding sequences of genes embedded in these repeats or whether or not it might have changed the cis-regulatory landscape. Like, let’s say, the ability to turn a gene off or turn it up to 11 is also altered between some of these genomes. So getting a good picture of that repetitive nature of the underlying sequence is really, really key to understanding differences in function downstream.

IRA FLATOW: Turning a gene up to 11 is something we haven’t spoken about before. How much data do you need to have to tell if something is, quote unquote, “normal” genetically? I just throw that out to any of you.

ADAM PHILLIPPY: I think that’s a great question. And I think it’s really the power of this type of fundamental knowledge generation that we’re doing in these types of projects. Being trained as a computer scientist, I think a lot from that lens.

And in a similar way that something like AlphaFold succeeded at protein prediction, based on this foundation of the protein data bank that was decades in progress, we’re building now this foundation of what typical human genomes look like. And I think, in the next few years, we’ll see genomic language models, so to speak, trained on that data and be able to predict associations quite accurately between atypical sequences and their disease associations.

Exactly how many sequences you need and how many people with diseases and without diseases you need in that training set always depends on the type of the disease, how complex those associations are, and so forth. But I think we’re rapidly approaching a tipping point in being able to make very accurate predictions off of this genomic data alone.

IRA FLATOW: And what kinds of predictions are we talking about?

ADAM PHILLIPPY: So imagine, as a thought experiment, we just mutate a random base in your genome. How well do you think we can predict the effect of that mutation, whether it will be deleterious or not? We’re not quite that good at it, compared to some other aspects of prediction.

But with these resources, we’re getting much, much better– in particular in the noncoding regions of the genome that Dr. Beck was just mentioning. A large fraction of those mutations– you have many millions of them in your genome compared to a typical reference genome, and the vast majority of them are benign. But the few that matter are the important ones. And we’re going to get much better in the coming years at making those predictions and being able to spot, basically at birth with DNA sequencing, those predictions, those variants that will likely result in some form of genetic disease.

IRA FLATOW: Would I be wrong in assuming, Dr. Phillippy, that, as a computer scientist, you’re using a lot of AI here?

ADAM PHILLIPPY: More and more, it’s embedded into a lot of the things we do. The sequencing technologies that we’re using to read off the DNA are using state-of-the-art AI methods to make a prediction from the electrical current or the optical image that you’re seeing to the actual As, Cs, Gs, and Ts. So that translation process uses AI. And yes, these kind of DNA models that I was referring to are also coming of age now, and people are actively using them to make predictions of the suspected pathogenicity of a variant that you see in one genome compared to another.

IRA FLATOW: Final question. I’ll send it to you, Dr. Beck. I remember when the Human Genome Project was announced. It was hailed as a major breakthrough in helping to cure illnesses down the road. How has that been working out? How would you grade the success so far and looking forward?

CHRISTINE BECK: Oh, nice, a softball.

[LAUGHTER]

So I think that at the end of the day, I think that the sequencing of the human genome has allowed a lot of inference into Mendelian diseases. So the architecture of diseases that are highly penetrant in the population, where you have a clear variant and effect– so a cause and effect that you can tie together very clearly– those things have really been helped astronomically by the development of the human genome reference sequence.

And then stepping into the more murkier territory of complex disease genetics, I think that there’s still a lot of work to be done to figure out the underlying genetic architecture of those diseases and understanding the combinatorics of alleles and variants that come together to equal the predisposition to diseases with environmental factors added to them.

So I think that getting back to what Dr. Phillippy said earlier, I think that an understanding of this is probably going to be borne out by a much better understanding of variation in genomes, which we’re gaining with studies like ours, mixed with machine learning approaches to plumb the depths of those data for variants that might, in aggregate or individually, contribute to these complex diseases. So long story short, I think that there has been a lot of progress. But I also think, in the future, there’s a lot of work and progress to be done.

ADAM PHILLIPPY: I think I would be remiss to not give credit to the initial Human Genome Project that we’re building on here that finished up, as you said, about two decades ago now. And I find it really informative to look back and realize that project took about 10 years and, in today’s dollars, about $5 billion. Each of these individual genomes that we’re doing now at a better quality can be done in basically a few days for around $5,000. And so just do the simple math. That’s a million-fold reduction in the costs to sequence a human genome thanks to these research investments that have been made over the past 25 years by the NIH and by my home institute, NHGRI.

And it’s just amazing to reflect on the progress that this field has undergone over the past 20 years with those investments. And so if you look back at the economic impact on that, there was a study in 2013 that estimated the economic impact of the Human Genome Project at $1 trillion, and that was 10 years ago. Imagine what those returns are now. So this Human Genome Project is just a gift that keeps giving, both in terms of economic terms and in terms of quality of life.

IRA FLATOW: Well, I want to thank all of you for taking time to be with us. This has been informative. I imagine you’re all very, very hopeful about the future.

CHRISTINE BECK: Yeah, absolutely.

IRA FLATOW: Please come back and tell us more about where this is heading when you get a chance.

CHRISTINE BECK: Will do, thanks.

ADAM PHILLIPPY: Thanks much, Ira.

GLENNIS LOGSDON: Thank you so much.

IRA FLATOW: You’re welcome. Dr. Adam Phillippy at the National Human Genome Research Institute– that is at NIH in Bethesda. Dr. Christine Beck of the University of Connecticut Health Center and the Jackson Laboratory, and Dr. Glennis Logsdon at the University of Pennsylvania. Thank you all for taking time, as I say, to be with us today.

Copyright © 2025 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/

Meet the Producers and Host

About Charles Bergquist

As Science Friday’s director and senior producer, Charles Bergquist channels the chaos of a live production studio into something sounding like a radio program. Favorite topics include planetary sciences, chemistry, materials, and shiny things with blinking lights.

About Ira Flatow

Ira Flatow is the founder and host of Science Friday. His green thumb has revived many an office plant at death’s door.
