Two Decades Beyond The First Full Map Of Human DNA
In February 2001, the international group of scientists striving to sequence the human genome in its entirety hit a milestone: a draft of the complete sequence was published in the journals Nature and Science.
The project took 13 years to complete: In that time, genome sequencing became faster and cheaper, and computational biology ascended as a discipline. It laid the groundwork for the greater cooperation and open data practices that have made rapid vaccine development possible during the pandemic. In the decades since, researchers have been trying to better understand how genetics impact health. We’re still working toward the dream of personalized treatments based on a person’s specific genetic risks.
Ira Flatow looks back at the project with molecular biologist Shirley Tilghman. Then, with bioinformatician Dina Zielinski and Indigenous geneticist-bioethicist Krystal Tsosie, he looks to the contemporary hurdles for genetic research, including privacy, commercialization, and the sovereignty of Indigenous peoples over their own genetic data.
Shirley Tilghman is a molecular biologist. She’s the former president of Princeton University, and a former member of the National Advisory Council for the Human Genome Project at the National Institutes of Health.
Krystal Tsosie is a PhD candidate at Vanderbilt University and co-founder of the Native BioData Consortium, and is based in Phoenix, Arizona.
Dina Zielinski is a bioinformatician in the Paris Transplant Group, lead scientist at Cibiltech, and a PhD candidate in bioinformatics at Sorbonne University in Paris, France.
IRA FLATOW: This is Science Friday. I’m Ira Flatow. When you consider the history of science, the modern field of genetics is quite young. Genetic engineering, which we take for granted, dates back just to the early 1970s. Then in the late 1980s when an international team of scientists decided to press forward to create a full sequence of the human genome, it morphed into a monumental moonshot-like effort that would cost $3 billion and take 13 years from start to finish.
This month marks 20 years since the first draft of the genome was published simultaneously in the journals Nature and Science. In 2005, we were still discussing the Human Genome Project on this program in terms of its potential. Here’s geneticist Huntington Willard talking about how genomic sequencing could change medicine.
HUNTINGTON WILLARD: If we can evaluate a given individual’s entire genome at the cost of a thousand or maybe a few thousand dollars, that fundamentally changes the way we address disease where you would bring people into the health care system, scan their genome, and look for the variants that might predispose to different types of disease. If we could do that genome-wide for thousands and thousands of diseases for literally everyone in the country that was entering the health care system, that would fundamentally change the way we provide health care in this country and around the world into a much more personalized form of health care.
IRA FLATOW: So now in 2021, what can we say about what we’ve actually gained and at what price? First here to look back with us is Dr. Shirley Tilghman, a molecular biologist at Princeton University. Former president of Princeton, she served on the advisory council that oversaw the project from start to finish. Welcome, Dr. Tilghman.
SHIRLEY TILGHMAN: Thank you. Glad to be here.
IRA FLATOW: Nice to have you. Please take us back to a time when this monumental project was first kicking off. Did you think it would succeed the way it has?
SHIRLEY TILGHMAN: Well, I thought it would succeed. I never questioned that if we put our mind to it, we would be able to organize a project to determine the order of the three billion bases in the human genome. Biology, unlike physics for example, had been a very cottage industry kind of science. It had never embarked on something very grand, very large. And so could I have anticipated back in the late 1980s the impact it has had? Probably not.
IRA FLATOW: Let’s start with some of the numbers. Can you put a number for us on how many scientists were involved in this project from start to finish?
SHIRLEY TILGHMAN: From start to finish, I would have to say that we are talking about hundreds if not thousands of scientists who were involved. The project did not begin in 1988 with the sequencing of the human genome. It began by sequencing organisms as simple as bacteria and yeast. And it was during that time when so many scientists who never for a minute thought they were interested in the human genome began becoming interested in how to do large scale organismal sequencing.
IRA FLATOW: Was that the story of your involvement too, first getting involved in the simple sequences of those small organisms and then thinking, oh, maybe we can expand to the human?
SHIRLEY TILGHMAN: It was actually my interest. And I think as I participated in those early, early deliberations on even whether we should sequence the genome, for one thing the idea of sequencing three billion bases of human genome was just daunting at the time. So it made a great deal of sense for us to take a much simpler organism with a much smaller genome and say, well, let’s learn how to do this properly by taking on a small project and expanding to larger and larger organisms, larger genomes, and thereby by the time we really began in earnest sequencing the human genome, we had had at least a dozen years of learning how to do cost effective and timely DNA sequencing.
IRA FLATOW: I’m trying to remember the figure that was thrown around back then the first time they talked about the sequencing, what it would cost to sequence the first genome, something like it was in the millions, wasn’t it?
SHIRLEY TILGHMAN: So probably the best way to think about what happened to the cost is that when I joined the National Academy Committee in 1987, ’88, I had been sequencing myself. And it was costing roughly $100 a base to do good sequencing at that time and to sequence let’s say 1,000 bases would take you at least a week.
We knew that if you extrapolated those numbers and they didn’t improve, there was no way we were sequencing the human genome. Today, we’re at a place where an entire three billion base human genome can be sequenced for under $1,000 and in well less than a week. So that’s the magnitude of the technology advances that have happened since those early days of the Genome Project.
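To give a sense of the scale of that extrapolation, here is a rough back-of-the-envelope sketch in Python. The constants are simply the figures quoted in the conversation, the function name is made up for illustration, and the calculation assumes a single bench sequencing one stretch after another, so treat it as an illustration of scale rather than a historical estimate.

```python
# Figures as quoted in the interview (assumptions, not independent data).
GENOME_BASES = 3_000_000_000   # roughly three billion bases
COST_PER_BASE_USD = 100        # quoted late-1980s cost per base
BASES_PER_WEEK = 1_000         # quoted late-1980s throughput
COST_TODAY_USD = 1_000         # quoted present-day cost per genome


def extrapolate(genome_bases, cost_per_base, bases_per_week):
    """Scale the quoted per-base cost and weekly throughput to a whole genome."""
    total_cost = genome_bases * cost_per_base
    years = genome_bases / bases_per_week / 52  # 52 weeks per year
    return total_cost, years


cost, years = extrapolate(GENOME_BASES, COST_PER_BASE_USD, BASES_PER_WEEK)
print(f"Extrapolated cost: ${cost:,}")
print(f"Extrapolated time: about {years:,.0f} years")
print(f"Cost ratio versus today: {cost // COST_TODAY_USD:,}x")
```

With those inputs, the unimproved technology would have needed hundreds of billions of dollars and tens of thousands of years, which is why the early work concentrated on making sequencing cheaper and faster.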
IRA FLATOW: There were a lot of surprises that came out of the sequence. And I think the biggest one that I can recall at the moment as we speak is specifically how small it was or is.
SHIRLEY TILGHMAN: And if by small you mean how many actual genes there are in the genome, absolutely. This came as an enormous surprise and, if I may say so, a humbling surprise to those of us who had always seen humans at the very top of a very large evolutionary tree. And to discover that to be human did not require many more genes than it took to be a fruit fly or a soil worm was quite a shock I think to everybody involved in the community.
IRA FLATOW: And not only that, but this concept of junk DNA.
SHIRLEY TILGHMAN: Yeah.
IRA FLATOW: I remember in the early days when we talked about it on Science Friday 20 years ago, I said we can’t call it junk DNA. It’s been preserved for how many thousands and thousands of years. It’s got to be doing something.
SHIRLEY TILGHMAN: Correct. And one of the most famous statements about junk DNA was made by the great molecular biologist Sidney Brenner who said you have to distinguish junk from garbage. Junk is what you put in your attic until its reuse becomes evident to you. And Sidney was absolutely right. The regions of the genome that did not appear to encode genes themselves have turned out to be some of the most important regions of the genome to understand because those are the regions that are controlling the activity of genes. Whether a gene turns on or turns off in the right place and at the right time, that’s what’s being controlled by what we used to call junk.
IRA FLATOW: Let’s talk about other surprises. I think one thing that surprised me to learn is that there are still portions of the genome that we haven’t fully sequenced. How can that still be?
SHIRLEY TILGHMAN: I know. It seems amazing 20 years later that there are still unknown parts of the genome. The parts of the genome that are most challenging to sequence are what are called repetitive regions of the genome. And these are regions where there’s a simple sequence and it’s repeated over and over and over and over again. And so when people break up the genome and go in to sequence it, there’s no way to know once you’ve broken it up whether the repeat came from here, or came from here, or came from here. So there’s still a little bit of uncertainty in these very repeated regions about what the actual sequence is.
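The placement problem described here can be sketched with a toy example in Python. The genome string, the two reads, and the helper `find_all` are all invented for illustration; real assemblers work on far longer sequences with far more sophisticated methods. The point is only that a short fragment taken from inside a simple repeat matches many positions, while a fragment that overlaps unique flanking sequence matches one.

```python
def find_all(genome, read):
    """Return every start position where `read` occurs in `genome` (overlaps allowed)."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy genome: unique flanking sequence on both sides of a simple "CA" repeat.
genome = "GATTACAGT" + "CA" * 10 + "TTGCCGGA"

repeat_read = "CACACA"   # fragment taken entirely from inside the repeat
unique_read = "AGTCAC"   # fragment that overlaps the unique left flank

print("repeat read matches at:", find_all(genome, repeat_read))
print("unique read matches at:", find_all(genome, unique_read))
```

The repeat fragment has eight equally good placements in this toy genome, so nothing in the fragment itself says where it came from. Reads long enough to span the whole repeat and reach unique sequence on both sides avoid that ambiguity.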
IRA FLATOW: And one other thing that we thought we would learn is we would find out there was a gene for this. There was a single gene for that. But it turns out there are very few diseases where one gene is actually going to change your outcome.
SHIRLEY TILGHMAN: And I think the even bigger surprise has come from an expectation going in that if you take very common diseases, like heart disease, or stroke, or hypertension, the thought was, well, maybe there are going to be five, or six, or seven genes that are important in determining whether you have a high or a low risk for those diseases. There are very, very few common diseases for which that is true. For these very common diseases, the genes that are affecting your likelihood are probably in the hundreds and maybe even in the thousands, which makes it very, very difficult to identify any one that’s really important.
IRA FLATOW: Interesting. Now you were a biologist before we had a sequenced human genome. How did work like yours change in the aftermath of all of this?
SHIRLEY TILGHMAN: I think the genome has profoundly changed the way in which many, many biologists go about their work. For the vast majority of my career, I studied one gene at a time. I picked a gene that I thought was interesting and important. I studied it to death till I knew absolutely everything about it. And when I was done, I would then go on to another gene.
What scientists do today is very different. You can now design an experiment in which you don’t ask what one gene does in an organism. You can ask what all 22,000 genes are doing. And you can ask it all at once in one experiment. And that has radically changed the way in which we think about and do experiments in biology. So it’s really changed the landscape completely.
IRA FLATOW: Another big outcome of this project was the precedent it set for data transparency. Can you talk about how this set the stage for a lot of how science works today?
SHIRLEY TILGHMAN: I think one of the proudest moments I had during my involvement was the decision to adopt what were indeed called Bermuda rules because they were actually voted upon by the community at a meeting in Bermuda. And that was the decision to make the DNA sequences that were flowing off sequencing machines available every 24 hours and to anybody, as I said earlier, who has a computer and can access the information so that the sequences were there for everyone to see, and to work on, and to begin to understand, not just the people who were generating the sequence.
IRA FLATOW: We’re looking back at years of genomics advances right now. Has it accelerated to the point where if you want to predict what the next 20 years is like it’s almost impossible?
SHIRLEY TILGHMAN: Yes. I think that’s exactly correct, Ira. I think it’s very hard to see where this field is going because things are moving so quickly. The field that has benefited the most from the Genome Project is certainly cancer. We now understand cancer so much more comprehensively than we did before the Genome Project. And the wonderful thing is that the knowledge is turning into therapies, and therapies that are far more exquisitely designed to stop the growth of the cancer than the old fashioned ways in which we used to tackle cancer through radiation and chemotherapy.
I think the whole face of cancer, and cancer research, cancer therapy has been changed by the Genome Project. It’s had less of an impact on other branches of medicine but I have no doubt that there are going to be advances that come as a consequence of our understanding what exactly is in this blueprint of life called our genome.
IRA FLATOW: Well, I couldn’t think of a better way to end our segment here than with that kind of statement. And I want to thank you very much, Dr. Tilghman, for your work and for taking time to be with us today.
SHIRLEY TILGHMAN: It was a pleasure. Thank you for asking me.
IRA FLATOW: Dr. Shirley Tilghman, professor of molecular biology at Princeton, former advisor on the Human Genome Project. I’m Ira Flatow. And this is Science Friday from WNYC Studios.
We’ve been talking about the legacy of the Human Genome Project, which published its first draft of the sequenced human genome 20 years ago this month. But what about the next 20 years of genomic research? Here to unpack some of the challenges they see as newer arrivals to the field, let me introduce them, Dina Zielinski, a PhD candidate at the Sorbonne, a bioinformatician with the Paris Transplant Group, and a lead scientist at Cibiltech. Welcome, Dina.
DINA ZIELINSKI: Thank you. Thanks for having me.
IRA FLATOW: And Krystal Tsosie, a member of the Navajo Nation and a geneticist bioethicist. She’s a PhD candidate at Vanderbilt University and co-founder of the Native Biodata Consortium. Welcome, Krystal.
KRYSTAL TSOSIE: Thanks for having me.
IRA FLATOW: You’re welcome. Dina, let me start with you, since you wrote a piece specifically about the concern of privacy and genomic surveillance as we look to the future of genetic research. What are you most concerned about?
DINA ZIELINSKI: So it’s great to reflect back now on the Human Genome Project. It’s hard to believe it’s been 20 years. And I was actually taking high school biology at the time. And I think that one of the main advantages of the Human Genome Project was that it really laid out data sharing principles. And this is really critical in order to advance in research.
And I started working in the field of genomics 10 years ago. And one of the first things I had to do was complete this human subjects research training. And so working with human data, I’ve always been very aware and very grateful, to be honest, to have access to that data. And so moving forward, I think we’ve done a pretty good job so far. But one thing that I think that is really changing is that individuals have access to their own genome. Before it was just these big consortiums that would collect data and anonymize the data. But now many of us have had, and myself included, we’ve had these direct to consumer tests like 23andMe, Ancestry, MyHeritage.
And so our DNA, one, is everywhere but now we actually have the full sequence sitting on our hard drives, or in our emails, or in these accounts. And so I think moving forward, we just have to continually adapt to the situation. I mean privacy has always been a challenge not just in genomics. And the thing is our DNA is probably our most valuable, our most private information in many ways.
IRA FLATOW: Well, you say our most private information but now it’s no longer private.
DINA ZIELINSKI: Exactly. And I mean it never really was. Your DNA is everywhere. But now it’s very easy to sequence. It takes less than a day to sequence an entire genome, on the order of hours now, and less than $1,000.
IRA FLATOW: You know, we saw one example of this ownership or non-ownership of DNA in the conviction of the Golden State Killer a few years ago in California when data from one of his relatives in an ancestry database helped them track him down. I mean is this the kind of thing that’s going to keep happening over and over again?
DINA ZIELINSKI: Honestly, I hope so. I say that with a bit of reservation. I think that genomics has been an incredible tool for forensics but I hope that it rests with forensics. And I am a bit concerned that it will be exploited for unnecessary surveillance. But in terms of using it for forensics to convict people who committed violent crimes, I share my DNA. I consent to have my DNA shared in these databases.
If my DNA can help find someone who committed a violent crime, I’m all for sharing it. That being said, I think, and I understand people’s concerns, when we share our DNA we’re not just sharing our own DNA. I’m sharing that of my siblings, my mother, my family members. So it’s really a tough call. I’m very much open to sharing data. But I am a bit wary, to be honest.
IRA FLATOW: This is Science Friday. I’m Ira Flatow. In case you’re just joining us, we’re talking about challenges for genetic research 20 years after the first draft of the human genome was published with my guests, Dina Zielinski, a bioinformatician with the Paris Transplant Group and a lead scientist for Cibiltech, and Krystal Tsosie, an Indigenous geneticist bioethicist with Vanderbilt University and the Native Biodata Consortium.
Krystal, I introduced you as a co-founder of the Native Biodata Consortium, which gets to an issue we’ve talked about in different ways on this program in the past, Indigenous sovereignty over genetic data. Please remind us how big an issue this is.
KRYSTAL TSOSIE: Yeah. So when we talk about precision medicine and health, we’re always promising that the next advantages and innovations will be conferred to those individuals that contribute the genomic information. The pandemic has shown that preventative health care and structural barriers to access to health care probably reveal more about health disparities than these supposed future advantages of health care. Indigenous peoples have willingly or unwillingly contributed their DNA for the supposed betterment of humankind.
Need I remind everybody what happened after the completion of the Human Genome Project? We had the completion of large scale diversity projects such as the Human Genome Diversity Project and 1000 Genomes Project, which were denounced by over 600 Indigenous nations worldwide that went to the United Nations because they were concerned about privatization, and commercialization, and exploitation of Indigenous genomes.
And what has happened to those biomarkers collected from Indigenous peoples from Central and South America? Those biomarkers are now freely and openly accessible to companies such as AncestryDNA and 23andMe. AncestryDNA has posted revenues over a billion dollars every holiday quarter since 2017. So we always have to ask ourselves, what exactly are the protections related to this data privacy and commercialization?
The rate of technology outpaces our regulation of these new technologies. And while we think that these protections are conferred by laws, such as the Genetic Information Nondiscrimination Act, laws change. Companies are bought and sold. So we have to ask ourselves, what’s the commercial value of the data that we are being asked to freely give away? And how can we look to communities and empower communities to self-direct the decisions that are being made using their data?
IRA FLATOW: Dina, you contributed your data and you gave it away freely. Do you not feel the same kind of threat here that exists?
DINA ZIELINSKI: Not quite in the same way, no. Individuals of European ancestry make up the vast majority of genetic studies. And that’s really problematic because they only make up 6% of the population. And I completely understand the threats to underrepresented populations. We should be sequencing these underrepresented populations, but we should be sequencing them with the idea of making genomics research more equitable, of giving back to these communities, not just taking from them.
That being said, I can’t even explain how useful data like that from the 1000 Genomes Project has been. I’ve used it in most of my projects. I have whole human genomes at the tip of my fingers. When I’m accessing this data, as well as other scientists, I think we generally have good intentions. So I currently use it in a study to better understand Parkinson’s disease.
That being said, I think in many cases, a lot of this data has restricted or limited access for researchers versus commercial entities. I agree here that we really should limit what industry can or cannot do with our data.
IRA FLATOW: Krystal, you mentioned preventive care and the pandemic. The Human Genome Project I remember promised to tell us everything about our genome. Doesn’t this sort of tell people, hey, we know everything about you now and ignore the nurture part of the nature nurture debate?
KRYSTAL TSOSIE: What I can tell you as a geneticist, my first skepticism and what I always tell tribal leaders, is that genetic data is just the easiest type of data to collect. But genetic data does not predict as much about disease risk as we think. Other things such as access to care, cultural factors, colonial factors relating to health probably contribute more to the health differences and outcomes than actual genetics itself. Things like diet, environment, and lifestyle are things that we should be looking at and definitely socioeconomic status factors. But these are the hardest bits of data to collect. And so we really can’t build truly robust models without looking at these other factors related to health.
So looking at genetics and biological factors is sometimes a little bit of a copout. We don’t necessarily properly convey the limitations of genetics and biological research to the lay public. There’s a lot of unfortunately disinformation related to how much biology actually contributes to health. And it creates these false relationships between, for instance, genetics and ancestry and genetics and identity, especially when we as geneticists use terms like genetic ancestry, not telling everyone though that these are statistical inferences made out of small bits of the genome.
IRA FLATOW: Dina, do you see these same limitations?
DINA ZIELINSKI: It’s very interesting actually to bring up ancestry. Ancestry is pretty complicated. And I think, like Krystal said, our DNA is rarely deterministic. There are very few mutations that we know of that will lead 100% to a trait or disease. And ancestry is very complicated. We basically will match an individual’s DNA to a database of people who are defined by geographic borders. And DNA doesn’t necessarily respect these borders. What ancestry can tell you is simply where in the world you happen to share different percentages of your DNA. It does not define who you are.
IRA FLATOW: If the big scientific advance of 20 years ago was reading out the full sequence of the human genome, and the big advance of the last 20 years has been this large scale analysis of how groups of genes relate to health and, as we are talking about, environmental factors, Dina, what do you think the next big advance is?
DINA ZIELINSKI: One, I think equitable research and sequencing more than the 6% of the population that is of European ancestry is critical. And I think that is happening. I’m really happy to see that there are initiatives to sequence underrepresented populations. And I think they will be transformative in making genetics research and the results that we find available to all populations.
And the other thing is I think that with all of the advances in computational tools, and Krystal mentioned this, we can’t keep up with the technology in many ways, even in the last 10 years, the technology has improved enormously. We have now sequencers that can sequence very long stretches of DNA.
One of the main limitations of the technology was that DNA is often only read in these short sentences of about 100 to a few hundred letters called nucleotides. I think the next step will be finally sorting out the link between genes, and traits, and diseases. It’s going to happen at a much faster rate than it has happened in the past 20 years. But I think if we continue going the way that we have been going that we will get there eventually.
IRA FLATOW: And Krystal, do you agree with that?
KRYSTAL TSOSIE: That is the concerning part is that what we’re seeing are ethical questions of the past that still have not been resolved that are being born yet again today as if they’re new questions.
The next big discoveries to advance genomics technology will likely be found in small or as yet untested populations as we move from looking at common variants contributing to disease to rare variants. These are the same populations that have been historically oppressed. And even for members of the majority population, genetic privacy is threatened. And these concerns are compounded for small communities like us. And it’s really unfortunate for me and really frustrating for me because we have seen all of the events that have occurred related to discussions of racial justice and injustice last year.
And now I’m seeing terms like diversity being conflated with equity and justice in genomics where diversity is really only limited to just including more members of diverse and underrepresented peoples on a plate without giving them agency over what happens to their DNA and data. That’s not justice.
And what we really need to be doing is talking about what it means to partner with communities and individuals to give greater decision-making autonomy and authority to those people that are contributing the information. If this means benefit sharing in terms of actually giving back to the people that contributed the information, that would be a positive direction that I would like to see in the next 20 years. But the only way that we’ll do that is if we have more people who are from those underrepresented communities actually doing the research, and actually directing the research, and actually provided a seat at the table in which our voices are actually listened to.
IRA FLATOW: Great points. I want to thank you for taking time to be with us and wish you all great luck in your research.
DINA ZIELINSKI: Thank you, Ira.
KRYSTAL TSOSIE: [NON-ENGLISH], which means thank you.
IRA FLATOW: Dina Zielinski, a bioinformatician with the Paris Transplant Group and Krystal Tsosie, an Indigenous geneticist bioethicist with Vanderbilt University and the Native Biodata Consortium.