Proteins are crucial for life. They’re made of amino acids that “fold” into millions of different shapes. And depending on their structure, they do radically different things in our cells. For a long time, predicting those shapes for research was considered a grand biological challenge.
But in 2020, Google’s AI lab DeepMind released AlphaFold, a tool that was able to accurately predict many of the structures necessary for understanding biological mechanisms in a matter of minutes. In 2024, the AlphaFold team was awarded a Nobel Prize in chemistry for the advance.
Five years after its release, Host Ira Flatow checks in on the state of that tech and how it’s being used in health research with John Jumper, one of the lead scientists responsible for developing AlphaFold.
Segment Guests
John Jumper is a Distinguished Scientist at Google DeepMind and a co-recipient of the 2024 Nobel Prize in chemistry.
Segment Transcript
IRA FLATOW: This is Science Friday. I’m Ira Flatow. You’re all familiar with proteins, right? They’re made of amino acids. They do many important jobs in the body. And they can take millions of different shapes. And depending on their structure, they do radically different things in our cells. And for a long time, predicting those shapes for research was considered a grand biological challenge.
But in 2020, Google’s AI lab, DeepMind, released an early version of AlphaFold. That’s a tool that was able to accurately predict many of these structures necessary for understanding biology in a matter of minutes. In 2024, the AlphaFold team was awarded a Nobel Prize in chemistry for the advance. That’s how important they deemed it.
Five years after its initial release, we’re checking in on the state of that tech and how it’s being used in health research with one of the lead scientists responsible for developing AlphaFold, John Jumper, a scientist at Google DeepMind and co-recipient of the 2024 Nobel Prize in Chemistry. John, welcome to Science Friday.
JOHN JUMPER: Oh, it’s great to be here.
IRA FLATOW: It’s nice to have you. All right. Let’s begin at the beginning. Tell us what exactly protein folding is, why it’s important to understand how it works.
JOHN JUMPER: Well, so one of the really important things to say is, the cell has a lot to do. It’s, in some sense, a factory or a machine. It has many, many parts. And those parts are all encoded in the DNA. We talk about DNA as the instruction manual for the cell. And one of the big things that it does is that it gives instructions on how to make different proteins. These are a couple thousand atom machines, really, really tiny nanomachines that do jobs like pump things in and out of the cell.
So when your nerves fire, for example, they let ions in and out. They copy DNA. They repair DNA. So they do all the functions of the cell. And there’s a machine made largely of proteins that converts our DNA into RNA. And then there’s the ribosome, which converts our RNA into proteins. And so it makes this long chain of amino acids.
And then just due to the laws of physics, this chain folds up into a really intricate functional shape. The analogy I kind of like for it is, it’s almost like if you had your, say, IKEA bookshelf. And as soon as you open the box, the bookshelf just built itself. So these proteins are encoded in the sequence of the protein. The amino acids in order is effectively the shape. But what we didn’t know how to do was get that shape, or at least not very well. In fact, quite a few Nobel prizes have been awarded for determining the shape– or normally, we call it the structure– of a single protein.
But reading the structure requires extraordinarily difficult experiments still, something like a year or more of a PhD student’s time, and maybe $100,000 in expense to go to, what is that structure? And that structure is the thing that is actually functional. And there’s quite a bit that you understand from the structure of proteins.
IRA FLATOW: Right. And so how does AlphaFold come into this?
JOHN JUMPER: So AlphaFold is an AI system, a deep learning system, not exactly like a chatbot, specially trained just for this task. First, I should say that scientists have studied the structure of proteins. Even though I just said it costs something like $100,000, people have actually done it a lot because it’s really important for biological research. So there’s about 200,000 known protein structures that have been collected by scientists over more than 50 years and deposited in what’s called the Protein Data Bank, this openly available resource of all these protein structures.
So you can think of it as an enormous amount of societal investment in these. And what we did was train a machine learning system using the inputs that would come from DNA– here’s the protein sequence– and predicting the 3D structures that scientists have measured and, in fact, predicting both the structure and how confident we are in that structure.
And we weren’t the first to try and do this or to think maybe a computer program could be useful here. But what we really developed was massively more accurate algorithms from the same data that gave a very, very precise prediction of this structure to something comparable, in many cases, to the accuracy of the experiments themselves. Because, of course, no experiment is exactly perfect.
IRA FLATOW: And so what can you do with this? What have scientists used it for so far?
JOHN JUMPER: So there’s a couple of really big things that scientists can do with it. And I think it’s worth saying that one thing scientists can do is just use it to understand biology. Here’s this protein that I want to understand that’s maybe associated with a disease. And I want to change that protein. Or I want to understand why this mutation in this patient might cause someone to be sick.
And you might say, well, what is the structural context of that mutation? Is that maybe where this protein sticks to another? So a lot of times what scientists will do is, they will use AlphaFold to predict a structure. And then they’ll look at it and they’ll say, ah, these data that I had before that didn’t make sense, now they make sense. Or they’ll use it when they’re designing new proteins.
For example, at Oxford, they’re working on developing a vaccine, I believe, for malaria. And they used AlphaFold to understand the structure of the protein. And they said, oh, well, this part of the protein from malaria is probably going to be really good for a vaccine. So they design their vaccine based on the parts of the sequence that AlphaFold says produce a meaningful bit of structure. Scientists are also using it as part of drug discovery, both AlphaFold2, which predicted the structure of proteins, and then AlphaFold3, which predicts a wider range of things, for example, how a drug-like molecule– what position it might stick on a protein.
And so they’ll use it as part of understanding and basically always, in a sense, using it to find the hypothesis for what’s the next experiment to do. And they will use it also to understand things like evolutionary history. Basically, what you should almost think of is, the structure is a map that helps you make better hypotheses about proteins. So everyone that works with proteins gains something from understanding the structure that helps them design better experiments. Of course, they still test it in the lab when they’re done.
IRA FLATOW: Right. So how good is this at actually predicting?
JOHN JUMPER: So I think there’s a couple ways to think about it. One is a numerical number. We’re about 90% correct according to a certain scale called GDT. But I don’t really think that’s the right way to think about it. What I would more say is that it tends to generate very, very reliable hypotheses. And it says when it’s not sure.
IRA FLATOW: So where does AlphaFold need more work? Are there proteins that it doesn’t predict well?
JOHN JUMPER: That’s an interesting thing. So some proteins don’t have a structure at all. So I told you that proteins fold up into a structured thing. And that’s true for a majority of proteins. But actually, there are lots of regions that are intrinsically floppy, often because of their function. So in that sense, there’s no answer for AlphaFold to give.
The other thing that is a really strong determinant of AlphaFold accuracy is that it uses information not just from that particular protein but from the evolutionary history of that protein. And that evolutionary history is really saying, oh, well, here’s this protein in human. But there’s a very similar protein in mice. There may be even a very similar protein in yeast– many, many different evolutionarily-related species that have similar proteins. And all of those are used jointly to make the prediction.
And so proteins that are evolving extraordinarily quickly, like some viral proteins or proteins from very obscure organisms, where we don’t have many other similar organisms, tend to be much harder to predict.
IRA FLATOW: Let’s talk a bit about AI drug discovery. DeepMind has its own spin-off company called Isomorphic Labs that uses the AlphaFold tech and AI to discover new kinds of drugs. This space has been active for over a decade. But a lot of companies have folded, no pun intended. And AI hasn’t actually produced a really marketable drug, has it? Why is this so hard?
JOHN JUMPER: My expertise is not AI drug discovery. I know some. I obviously talk with the Isomorphic people. But the first thing I’ll say is that you’re asking me at the five-year anniversary, why don’t we have a drug yet? And even AlphaFold, that’s really for AlphaFold2, which is a protein technology. And then AlphaFold3, which is protein-drug interactions, came a couple of years later. So I think, first of all, we’re still very early in this story.
I do think there’s a second part, though, that making a drug, you have to optimize many factors. So first, you have to understand the biology really well. There’s lots of diseases in which we simply don’t understand the biology. What we do expect is that really atomic things, like, how do drugs bind? How do we make them bind better? Those are the kinds of problems that you would expect would get a lot better with AlphaFold-derived technologies. Because it has a very good understanding of how things come together, how they bind.
And I think we are seeing relatively rapid advances– not yet drugs. The drug timeline is typically seven-plus years. But we are seeing relatively rapid advances in how you use AI to do it. And of course, it’s not just AlphaFold. You have to build many more technologies. And then you have to solve a lot more problems. You have to understand things like, how soluble is my drug going to be? Will it penetrate the cell membrane? Will it get metabolized by various enzymes? Will it be toxic in various ways?
There are all these problems that you have to tackle in order to make drugs faster and faster. And I think people are seeing progress against these problems, but they require more work. When it comes to AlphaFold for structure prediction, it’s pretty clearly both a breakthrough in the science– we now understand how to use AI to do this. And it’s a kind of black-box piece of software, technological artifact that predicts protein structure really well.
When you think about extending this to drug development, it’s still useful in terms of structure prediction. But I said a structure is about $100,000. A drug costs about a billion. So you can’t have– one structure prediction isn’t likely the gap to making a drug. The numbers just don’t even work out. But what you see is that this kind of breakthrough that’s showing you can use AI to solve these problems that it didn’t used to be able to do, or that we didn’t used to have any computational ability to do by any method– we had experimental ability. This should accelerate. But you have to build this into more and more technology.
So I think we’re seeing a technological build-out taking inspiration from this and others. And how do we build new technologies that will help us do different phases? Of course, we still have the problem that in no sense is biology solved. Or in no sense should we expect drug design to be easy just because we can do structure prediction. There’s so much more to it. I like to think, in terms of AlphaFold itself, maybe we made the field of structural biology 5% or 10% faster as a whole, which is really extraordinary when you think about how many scientists work in it, how much work is done.
But there’s still a lot left to be done. There’s so much more. I think AlphaFold especially is an important technology. But it’s also a directional indicator that we should expect more powerful technologies to continue coming within this field. But that still takes time. It takes time to work out. It takes time to do trials. But everything I hear from people very directly in the AI drug discovery industry is that they’re still very, very excited, that there still looks like there’s progress being made and improvements being made. But we have a lot of problems left. And we still have giant biology challenges.
[RELAXING MUSIC]
IRA FLATOW: We have to take a quick break, but don’t go away– more on this when we come back.
JOHN JUMPER: How do we put those together? How do we get, say, a language model that can talk about protein structure, that can reason over it and have these linguistic capabilities?
[CONTEMPLATIVE MUSIC]
IRA FLATOW: Well, let’s talk about what the future looks like. The new version of AlphaFold, AlphaFold3, is being used right now. What’s the big advancement with this model?
JOHN JUMPER: So we started off by talking about proteins. I told you about– they’re really important in the cell. But of course, proteins are not the only things in the cell. Our DNA is in the cell. In fact, proteins stick to DNA. That’s how they read them. There’s RNA in the cell. RNA has structure. Proteins stick to it. There’s small molecules, either drugs or natural molecules– for example, adrenaline, it’s a natural molecule, et cetera. So the Protein Data Bank is named the Protein Data Bank. But it’s really the structural biology data bank. It contains a lot of data about the other atomic components of the cell. How do they associate?
And so we extended the ideas of AlphaFold and said, let’s not just make it about proteins. Let’s do the protein cinematic universe. Let’s do DNA, RNA, small molecules, ions, all of this. And somewhat surprisingly, it worked quite well at a lot of these. There’s a lot less data on everything other than proteins. But we are able to develop, I think, reasonably accurate predictors of different types of, especially, binding and interaction.
And this tells the wider story of how all of these pieces fit together in order to make larger cell machines. And that’s been the focus of the AlphaFold3 technical advance. And it’s become also– it’s a really, really important technology for getting to things like drug discovery. Because previously, you could only talk about how proteins stick together in AlphaFold2. But now we can talk about things like, how does protein stick to a drug?
IRA FLATOW: Let’s move on to a related area that I think is maybe concerning to a lot of people. And I’m talking about tech companies developing AI, Google included. They’re building out a ton of data centers for their AI efforts. And they require a huge amount of electrical power and water to run. Are people going to be competing with AI for their electricity?
JOHN JUMPER: On the large language model– it’s not really my area of expertise– I can tell you about AlphaFold. AlphaFold2 was on 128 GPUs. AlphaFold3 was trained on about 256. Actually, AlphaFold1 was TPUs– so quite a bit smaller than what I would say is large language models. The other thing that I’ll say that’s been kind of interesting in the AlphaFold context– or we’ve looked at– is the energy of other things.
Like, experimental structure determination takes place at synchrotrons, which have enormous energy consumption. So actually, I can’t speak to the large language model case. And of course, that’s very complicated in economic analysis. It’s very complicated. But in the structural biology and AlphaFold case, AlphaFold is a lot, lot, lot less electricity than the equivalent ways to get protein structures. So I think you do at least have to talk about substitution economic effect and others. But you’ll have to talk to economists for the details of other cases.
IRA FLATOW: OK. So what is the next AlphaFold, the next killer app for AI in science?
JOHN JUMPER: I think I’m excited about a couple of things. First, I’m very excited about the maturing of these technologies for drug design, for protein designs. But I think the other part about science is, it will get into large language models. It will get into this question of, OK, we can learn from these big, well-curated data sets, like the Protein Data Bank. But how are we going to learn from the scientific literature itself? How are we going to learn to reason better and better?
And I think we’re already seeing large language models are shockingly effective at scientific discourse. They’re starting to get scientific reasoning. People are looking at them for scientific workflows. So I think that’s going to be a really big deal. And then the question I think that will become a key one– or at least one I’m interested in in the long term– is, how are we going to fuse these two technologies?
We have this AI applied to narrow problems, like protein structure prediction. And we have these generalist language models that are not as good as a specialist human in their field but are still a really exciting and powerful development. How do we put those together? How do we get, say, a language model that can talk about protein sequence, protein structure, that can reason over it and have these linguistic capabilities but also have this extraordinary performance that comes from learning from scientific data? So I think that, in a few years, if we make that technology work, will be really, really transformative.
IRA FLATOW: Well, thank you very much for taking the time to be with us today, and good luck.
JOHN JUMPER: Thank you.
IRA FLATOW: John Jumper is a scientist at Google DeepMind. And we’ll be checking back as things develop.
[INSPIRATIONAL MUSIC]
This episode was produced by Dee Peterschmidt. I’m Ira Flatow. Thanks for listening.
[INSPIRATIONAL MUSIC]
Copyright © 2025 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/
Meet the Producers and Host
About Dee Peterschmidt
Dee Peterschmidt is a producer, host of the podcast Universe of Art, and composes music for Science Friday’s podcasts. Their D&D character is a clumsy bard named Chip Chap Chopman.
About Ira Flatow
Ira Flatow is the founder and host of Science Friday. His green thumb has revived many an office plant at death’s door.