04/10/26

Why so many studies can’t be replicated

How do we know what we know? That’s where science comes in—it gives us a method for testing our ideas and getting trustworthy results. But some researchers have warned that many scientific studies can’t be replicated.

To find out how deep the problem goes, the US Defense Advanced Research Projects Agency funded one of the largest analyses of social science, called the SCORE project. They checked the results of thousands of papers across economics, education, and psychology—and found that only half of them could be replicated.

Joining Host Ira Flatow to discuss the findings are Tim Errington, one of the leads on this project, and economist Abel Brodeur, who recently released the results of a separate replication study that found more encouraging results than SCORE did.



Segment Guests

Abel Brodeur

Dr. Abel Brodeur is a professor of economics at the University of Ottawa and founder of the Institute for Replication.

Tim Errington

Dr. Tim Errington is senior director of research at the Center for Open Science in Charlottesville, Virginia.

Segment Transcript

[THEME MUSIC] IRA FLATOW: Hi, I’m Ira Flatow, and you’re listening to Science Friday. How do we know what we know? That’s where science comes in. It gives us a method for testing our assumptions and getting trustworthy results. And that’s really the definition of research.

First, you search with an initial study, then you research with follow-up studies to confirm. But some researchers have warned that many scientific studies cannot be replicated. This is what’s called the replication crisis.

To find out how deep the problem goes, the US Defense Advanced Research Projects Agency– that is, DARPA– funded one of the largest analyses of social science, called the SCORE project. With the help of hundreds of researchers, they checked the results of thousands of papers across economics, education, and psychology.

And the results? Researchers could only replicate half the papers analyzed.

Here to talk about the project and give an update on how the scientific world is trying to change itself is Dr. Tim Errington, Senior Director of Research at the Center for Open Science, one of the leads on this project. Joining him is Dr. Abel Brodeur, Professor of Economics at the University of Ottawa and Founder of the Institute for Replication, who recently released the results of a separate replication study.

Welcome both of you to Science Friday.

TIM ERRINGTON: Thanks for having me.

ABEL BRODEUR: Thanks for the invite.

IRA FLATOW: All right. Thanks. Thank you for joining us. We have covered replication on this show in the past. So what made this project different, and why was DARPA involved?

TIM ERRINGTON: Yeah, so there’s a couple of aspects to that to describe. One reason that DARPA was involved– and I can’t speak on behalf of them– is that there were prior projects that reported challenges in terms of having confidence in research. How much confidence should we have in anything that’s being published? I think the wrong way to approach that is as binary– published equals true, not published equals not true. That’s definitely not the case.

And DARPA was paying attention to this. They use research. They use social and behavioral science research a lot. And they were trying to figure out methods to sort through how much confidence they should have in any given research finding. And so because this replicability issue makes it hard to know how much confidence we should have in something, they wanted to invest in seeing if there were ways to develop automated tools to assist with that.

And so SCORE is a project that was designed to not just repeat the experiments, but to use that as a ground truth in the development of AI tools to assist with confidence assessment. Again, something that’s largely done just by humans. So that’s one key aspect. This was started in 2019, so before a lot of that AI discussion that’s going on now.

The other part that makes this quite unique is the breadth– the breadth of this in terms of disciplines across the social and behavioral sciences, as you mentioned. And also the volume. We looked at 10 years’ worth of papers across 62 different journals. So that is a much larger scale than any prior project.

IRA FLATOW: What’s at stake here? I mean, how are the papers you analyzed being used in making policy decisions? I imagine that’s an important part of it.

TIM ERRINGTON: Yeah, there’s a lot of– I think, in terms of the breadth of the research, it’s used in a lot of different ways, from policy to, I think, even our own individual actions. So let me give you some examples of the type of research that was included in this project.

It could be something looking at how public employees leave the US Civil Service. You can see how that would have an impact on policy decision-making. Or, does being the victim of a crime spur political participation? Again, you can see the impacts that would have in terms of various policy or decision-making. And those are the types of research across the social and behavioral sciences that this project was investigating.

IRA FLATOW: Abel, do you have anything to add to that?

ABEL BRODEUR: Yeah. I think one of the big problems is there’s a lot of good research out there, and unfortunately, there’s also research that is less good. Just to put this in context: a paper might not replicate for many reasons. It can be that there’s data missing. You’re trying to run it, but somehow, it produces different results. Maybe things are not robust when you start playing with the data, or maybe you use completely new data and you get a different result. And I think what’s really cool about all these projects coming out right now is we get a much better idea that actually, there are problems everywhere, and they compound.

IRA FLATOW: Tim, what were some of the common trends then, that you saw in the SCORE project?

TIM ERRINGTON: Right. One would be just data sharing. So if you want to ask the question– somebody publishes a finding, they report some statistic– can I have confidence in just the reporting of that statistic?

Well, in order to do that, I need the data. And I need the method that they used just to repeat that simple reproduction step. But it’s hard to do that when you don’t share data. Now–

IRA FLATOW: Are you saying they don’t– just to jump in there. Are you saying they don’t share the methodology, either, I mean, so you can reproduce what they’ve done?

TIM ERRINGTON: Right. So there’s a lot of dimensions here that don’t always get shared, which is interesting. Because you think of that hallmark of science as I share everything.

IRA FLATOW: Yeah.

TIM ERRINGTON: Then that’s– actually, that’s where I have confidence in it. And there’s a lot of nuance here.

So I’ll first give the high-level statement, which is: yeah, across the board, there’s probably not as much sharing as one would expect or hope for. That includes the data, the analysis, even the methods of collecting the data, the sources of data. It starts to get really complex. And so when you don’t have that information, you’re stuck doing one of a couple of things.

You just trust it at face value. It got published. It must be true. But that’s not how science works. Or you’re left making assumptions to fill in the gaps. Both of those are not ideal.

So one thing that we can do is start sharing more. And you have to incentivize people to share data. It doesn’t mean it’s more reliable. It just means that now you can interrogate it the way that Abel was just talking about.
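To make that interrogation step concrete: a computational reproduction, in its simplest form, is just rerunning the reported calculation on the deposited data. Here is a minimal sketch in Python– the data file, column names, and published value below are hypothetical placeholders, not from any real study.

```python
# Minimal sketch of a computational reproduction check. Every name and
# number here is a hypothetical placeholder for illustration only.
import pandas as pd

REPORTED_MEAN = 0.42  # hypothetical statistic as reported in the paper

df = pd.read_csv("shared_study_data.csv")  # the authors' deposited data

# Repeat the stated method: mean outcome among the treated group.
recomputed = df.loc[df["treated"] == 1, "outcome"].mean()

# The reproduction "succeeds" if the recomputed value matches the paper
# within rounding.
if abs(recomputed - REPORTED_MEAN) < 0.005:
    print(f"Reproduced: {recomputed:.3f} matches the reported value.")
else:
    print(f"Discrepancy: got {recomputed:.3f}, paper reports {REPORTED_MEAN}.")
```

Trivial as it looks, this check is impossible when the data or the described method is not shared.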

IRA FLATOW: And, Abel, you also just released a new study analyzing economic and political research. I’m wondering how yours was different from SCORE, and what did you find?

ABEL BRODEUR: I think the results are much more positive, and there’s a lot of reasons for that. And I think the main one is we looked at recent papers. So we started this project in ’22, so we looked at papers published in ’22 through the end of 2023. And data sharing is going up. There’s less and less hesitancy, I would say, in economics and political science in terms of data sharing.

That doesn’t mean everything is perfect. We still find coding errors in 15% to 20% of papers. Results are robust maybe 75% of the time. So that’s better than, let’s say, 50%, or some of the rates that were documented. So I would say things are just getting better.

That masks a lot of things, though. So for instance, at the Institute for Replication, we mass-reproduce studies in economics, political science, psychology, public health, and environmental research. And data sharing practices are completely different across fields. So I would say things are getting better– that’s mostly what we document, I think– but it’s definitely far from perfect.

IRA FLATOW: Are they getting better because we recognize the problem or did they just get better for some other reason?

ABEL BRODEUR: I think that’s part of the story. I look back at my own research that I was doing back in 2010 as a master’s student, and my coding was terrible. And I just–

IRA FLATOW: I’m not laughing. I’m sorry.

ABEL BRODEUR: And that’s fine. I laugh at it too, because sometimes I look back at it, and I’m like, dear God, that was bad. But also, I remind myself that back then it was virtually impossible to look at other people’s code. There was no code online. Whereas I look at my own students, PhD students, nowadays, and they have access to so much code from other researchers.

IRA FLATOW: And coding, for the layperson, what does that mean?

ABEL BRODEUR: So let’s say you’re doing a study. You’re interested in the effect of the minimum wage on, I don’t know, the unemployment rate, or the effect of a policy on deforestation in the Amazon rainforest. You’re going to gather data, maybe satellite data, to understand whether deforestation is happening. Then you have another data set on the policy change, but then you have some control variables to make sure that you get a causal effect. But then you need to merge all these data together. So you need to code that– in Excel, or it could be in R, Python, Stata, et cetera, statistical software.

But then you need to run the regression. You need to actually do the analysis in the statistical software. And you can make stupid mistakes. It could be duplicates– you have the same individual again and again by accident. It can be that you say something in a paper, but actually, in your code, you did something else. You didn’t really look at deforestation using this specific way of measuring it.
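As an illustration of the pipeline Abel describes– merge the data sources, guard against accidental duplicates, then run the regression– here is a minimal sketch in Python using pandas and statsmodels. The file names, variables, and deforestation example details are hypothetical placeholders, not from any actual paper.

```python
# Illustrative sketch of the merge-check-regress pipeline. All file and
# column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

deforestation = pd.read_csv("satellite_deforestation.csv")  # outcome data
policy = pd.read_csv("policy_changes.csv")                  # treatment data

# Merge the two sources on region and year.
df = deforestation.merge(policy, on=["region", "year"], how="inner")

# The "stupid mistake" mentioned above: the same unit appearing twice.
# A one-line check catches it before it silently inflates the sample.
assert not df.duplicated(subset=["region", "year"]).any(), "duplicate rows!"

# Regression of deforestation on the policy, with control variables.
model = smf.ols("deforestation_rate ~ policy + rainfall + population", data=df)
print(model.fit().summary())
```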

But I think, just code review– like, it’s kind of crazy. But imagine you do a research paper. You have your research assistant doing the coding. You’re done. You submit this to a journal, and they accept it. And it went through peer review. There were external experts that looked at it.

And then it’s published. During the entire process, nobody ever looked at your data and code. They trust 100% what you’ve done.

And what we’re trying to do is to go after that and be like, well, let’s have a look at the data and code to see whether there are errors and things like this. And you would think this happens– it should happen throughout the process of publishing– but the norms are just not there. So things need to change, and I think they will at one point, because of AI.

IRA FLATOW: Yeah. Tim, what do you think of this?

TIM ERRINGTON: Yeah. You asked a really good question: is it just happening on its own, or is this discussion actually part of what’s causing these changes? Because I absolutely do agree. I think things are improving. And I think there’s a couple of reasons for it.

One is we’re talking about it. We’re talking about it here on Science Friday. So this is an example of it getting to the point where it’s more common to have these discussions.

I completely agree with the point that the norms are the biggest driver. In many cases, I wish my graduate education had taught me how to replicate someone else’s results as part of my educational practice.

IRA FLATOW: Interesting.

TIM ERRINGTON: Because the best way to learn how to document the methods you used, or the way that you analyzed your data, is to repeat what someone else did and see if you can get the same result. It’s easier to see someone else’s error than it is to see your own, and the best way to do it is to replicate. But it’s not going to be just the researchers. So to be really clear, there’s a lot of actors in the system. The journals have a role. So as they change their policies, that helps.

Institutions have a role in this as well– what they hold their researchers accountable to, as well as how they train students. And funders have a role. A lot of this, if we’re talking in the US, that’s largely NSF and NIH, for the most part. But every funder has a role in terms of what they are asking of their researchers and how they support the research.

To say that differently: if you just support the research and the paper, but you don’t support the rigor behind it in terms of sharing and documentation, then you don’t get that rigor and documentation. And now we’re essentially having a challenge– as we go back through the research one more time, we’re having a hard time finding things because it wasn’t prioritized.

IRA FLATOW: Isn’t that eventually going to bite you later?

TIM ERRINGTON: You mean in the sense of I’m a researcher, I publish something, I don’t–

IRA FLATOW: Yeah.

TIM ERRINGTON: –maybe share everything. Yeah. I mean, so there’s a couple of thoughts here. So one would be, oh, my gosh, there’s somebody shady hiding something. There’s always bad actors out there. I’m sure that’s the case.

I think a lot of this is honest– just, we’re busy people. This is really hard. Research is really, really hard. And I think the vast majority of what we’re finding is just, wow– when I kind of rushed through, trying my best, honest mistakes– little ones– can pile up over and over. And especially since we’re so driven by positive results and that kind of really positive storytelling, it’s very easy to miss a mistake when you find something exciting. It’s only when you don’t find what you expect that you scrutinize.

So I think that’s actually what’s going on for the most part. At some point, we’re investing in the wrong spot at the wrong point in time. Which, again, when we get back to the point of replication, what DARPA was trying to do– that’s the big million-dollar question: how much confidence do I have in a given result at that given moment in time? It’s never going to be 100%.

IRA FLATOW: After the break, how big a role is AI going to play in replication? Stay with us.

[THEME MUSIC]

IRA FLATOW: AI was mentioned a little while ago in our discussion. Tim, what role do you see AI playing in the future of replication work?

TIM ERRINGTON: That’s a great question. So I think I see two futures, and I can’t quite tell which one we’re going towards. We’re probably going towards both. I see one where– we’re kind of entering it a bit– which is, it’s easy to see AI-generated anything these days. And especially since a lot of the scientific process is communicated through the written word in journals, it’s very easy to have AI generate that. Which means it’s really challenging, even more so, to say how replicable this research is when you’re like, wait, did a human actually do this, or is this just AI-generated language? Because it’s really clever.

So this is going to cause– and it already is causing– problems in terms of trying to understand what we know from AI versus non-AI. But I actually think there’s a huge promise at the exact same time. So when we think about some of the low-hanging-fruit challenges we have: part of the challenge of figuring out how to share your data is how to describe your data or where to deposit your data. Those can get improved with AI. And if we do have access to data and we have access to code, you can actually start to have AI agents run the reproductions themselves. That’s very simple.

IRA FLATOW: Wow.

TIM ERRINGTON: In fact, I already know tools that can do that. If you give them the code and the data, they can run it themselves again.

Now, this is where it starts to go a little farther. If you want to develop AI agents that can try different plausible analytical strategies– which I think is amazing, that robustness check– well, we’re getting there. There’s a large universe of plausible analyses.

The trick is going to be, do AI agents just do everything? And in that case, sometimes it’s really good designs. Other times, it’s just gobbledygook. It’s like, I don’t know– they just threw a bunch of variables into an algorithm and popped out an answer. But that would be an amazing tool, because, again, we as humans are really good at pattern recognition. But if we’re only picking one analysis, we know that’s not looking at the possible universe of plausible approaches to test a hypothesis. But AI could help us.
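One way to picture that universe of plausible analyses is a specification sweep: re-estimate the same effect under every plausible combination of control variables and see whether the sign and significance survive. Below is a minimal sketch, reusing the hypothetical deforestation example from earlier; the data set and variable names are again placeholders.

```python
# Illustrative specification sweep: rerun the same regression under every
# subset of candidate controls. All names are hypothetical placeholders.
from itertools import combinations

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("merged_study_data.csv")
controls = ["rainfall", "population", "gdp_per_capita"]

results = []
# Try every subset of the candidate controls (2^3 = 8 specifications).
for k in range(len(controls) + 1):
    for subset in combinations(controls, k):
        formula = "deforestation_rate ~ policy"
        if subset:
            formula += " + " + " + ".join(subset)
        fit = smf.ols(formula, data=df).fit()
        results.append({"spec": formula,
                        "coef": fit.params["policy"],
                        "pvalue": fit.pvalues["policy"]})

summary = pd.DataFrame(results)
# If sign and significance hold across most specifications, the finding
# is robust; if they flip, a single published estimate would hide that.
print(summary.sort_values("coef").to_string(index=False))
```

A published paper typically reports one cell of this grid; automating the rest– whether by script or by AI agent– is exactly the kind of robustness check being described.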

IRA FLATOW: All right. As I wrap up here, for people listening, Abel, let me start with you. What do you think the takeaway is for both your work and the SCORE project? I think some people listening might think, what science headlines can I trust?

ABEL BRODEUR: I think that’s a fair question, unfortunately. And so, the way I tell people– the way I consume research personally is, if I see a new result, something innovative, the first time that I hear about something, I don’t believe it. And I wait for other researchers to find a similar result, again and again. And maybe after three, four, or five times, I start to believe it.

And there’s nothing wrong with that, I think. We like headlines, and we like progress, but I think there’s a cost to that. And the cost could be that we start doing lots of research along the same lines without making sure that the foundations of the initial result are actually strong.

My personal problem is, I don’t know which results I can really trust versus those that I cannot trust. And it’s very annoying. So I’m patient because of that. And I don’t put all my eggs in the same basket. And I wait to see whether things replicate, whether other researchers are going to find the same pattern, and so on and so on.

The same way as, I think, during COVID: the first time there was a vaccine, people were a bit skeptical. But then two or three companies came up with vaccines, and now you’re thinking, OK, maybe there’s something to it. And I think it’s the same for pretty much anything in life. You just need things to be repeated and replicated, and that’s how you build confidence in a scientific result.

IRA FLATOW: Tim, want to weigh in on that?

TIM ERRINGTON: Yeah. Science is a process. It’s really easy to forget that, and it’s really important to remember it. Each of our findings, each paper, each headline we read about, that’s just a piece of a puzzle. We’re trying our best. We’re humans, and we’re at the forefront of knowledge. It doesn’t mean that somebody publishes a paper and all of a sudden that is, quote unquote, “the truth.” All we’re doing is trying to get closer and closer, and sometimes going backwards is closer.

The second thing is, think about all the amazing benefits of research in our society around us every single day. And we just told you, it’s really not optimal– we haven’t applied the scientific process that well to how we do science itself. And so, to me, I think, wow, this is a great opportunity for us. We’re doing amazing things, and there is a lot more that we can do if we can keep improving the way that we conduct and share our research.

IRA FLATOW: Well, hopefully, shedding some light on it here on this show will help. I want to thank both of you for taking time to be with us today.

ABEL BRODEUR: Thank you so much.

TIM ERRINGTON: You’re welcome. Thanks for having us.

IRA FLATOW: Dr. Tim Errington, Senior Director of Research at the Center for Open Science, and Dr. Abel Brodeur, Founder of the Institute for Replication. This episode was produced by Dee Peterschmidt. I’m Ira Flatow. Thanks for listening.

[THEME MUSIC]

Copyright © 2026 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/

Meet the Producers and Host

About Ira Flatow

Ira Flatow is the founder and host of Science Friday. His green thumb has revived many an office plant at death’s door.

About Dee Peterschmidt

Dee Peterschmidt is Science Friday’s audio production manager, hosted the podcast Universe of Art, and composes music for Science Friday’s podcasts. Their D&D character is a clumsy bard named Chip Chap Chopman.
