Could Lowering The P-Value Threshold Benefit Research?
For decades, scientists have been using the probability value, commonly known as the p-value, to test the significance of their findings. The p-value ranges from 0 to 1, and the lower the number, the greater the chance that the findings are statistically significant and not just a coincidence.
Within the research community, the widely accepted threshold for a significant p-value has been set at 0.05 or below. But there has always been a debate about this number. Last year, a team of 72 scientists put forth a recommendation for lowering the value to 0.005, and this week the American Statistical Association is discussing the future of the p-value at the Symposium for Statistical Inference in Bethesda, Maryland.
While lowering the threshold would reduce the number of false positives (studies that report a positive effect when there is none), it could also have negative consequences. Dalmeet Singh Chawla, a science journalist who has covered the p-value debate, joins John Dankosky to talk about the benefits and drawbacks of lowering the p-value threshold.
Correction 10/27/17: A previous version of this article stated that the American Statistical Association put forth the recommendation for lowering the value to 0.005. It was actually a team of 72 scientists who put forth the recommendation, and the American Statistical Association is discussing the future of the p-value. We regret the error.
Dalmeet Singh Chawla is a science journalist based in London, UK.
JOHN DANKOSKY: And now it’s time to play good thing, bad thing.
Because every story has a flip side. If you want to spark a debate in a room full of scientists, just mention the p-value, or the probability value. When you’re studying how two phenomena might be related, a low p-value means that your findings are significant, not just a fluke. But how low is low enough? For years the p-value threshold was set at 0.05. If your p-value is less than that, well, congrats. Your findings are significant. If it’s greater, not so much.
But why 0.05? That’s the question a lot of scientists are asking. Many think it’s not low enough. Last year the American Statistical Association proposed lowering the threshold to 0.005. That didn’t sit well with everyone. Joining me to talk more about this is Dalmeet Singh Chawla, who is a science journalist based in London. Dalmeet, welcome to Science Friday.
DALMEET SINGH CHAWLA: Thanks for having me, John.
JOHN DANKOSKY: So first of all, why would lowering the p-value threshold be a good thing?
DALMEET SINGH CHAWLA: Well, what researchers in a manuscript in July argued– and this includes very prominent researchers– they argued that lowering the threshold would reduce the number of false positives being reported in literature. That basically means incorrectly saying that an effect exists when it actually doesn’t. That was like kind of the main argument.
The second main point is that it would make it harder for researchers to misuse the p-value. Now that has become an increasingly known issue to anyone exploring the literature, that increasingly what researchers are doing is instead of having a hypothesis and then collecting data to test it, they’re collecting data and actively searching for trends that meet the p-value they want just so they can report it as significant.
JOHN DANKOSKY: Well, before we get to the bad side of this, I have to ask, why this number in the first place? Why did we arrive at this?
DALMEET SINGH CHAWLA: Well, 0.05, funny enough, it just kind of existed from the early 20th century. There’s no particular reason. It just stayed around by tradition. It had just been passed down from generations of researchers. And until recently, it was just kind of a standard way of judging significance or evidence levels of research. And now in the last few years, as more problems with the lack of reproducibility of scientific research are being reported, an increasing amount of that is due to statistical flaws. And researchers have pointed out there’s been a sort of a methodological weakening, you could say, in recent years.
JOHN DANKOSKY: So tell us about the bad part then. What would be bad about lowering this threshold?
DALMEET SINGH CHAWLA: Well, lowering the threshold would increase the number of false negatives, essentially incorrectly reporting that an effect doesn’t exist when it actually does, which is a counterargument for the false positive. Now the July manuscript researchers acknowledged that. And what they suggested was we raise the sample sizes of all experiments by, on average, 70%, which is what they say would prevent the false negatives from increasing, at the same time dramatically reduce the false positives that are being reported. But increasing sample size by 70% in every experiment, only rich researchers or well-funded labs would be able to do that. And also it would exacerbate, or make worse, what we call the file drawer problem or publication bias, whereby studies with negative results are kind of stashed away and not reported in literature, which is a well-known issue among academics.
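[Editor's illustration, not part of the broadcast: the roughly 70% figure can be sanity-checked with a standard power calculation. The sketch below is only one way to arrive at a number of that size. It assumes a two-sided z-test, 80% statistical power, and an unchanged effect size, none of which are stated in the conversation, and the function name `relative_sample_size` is made up for this example.]

```python
from statistics import NormalDist


def relative_sample_size(alpha_old=0.05, alpha_new=0.005, power=0.80):
    """Return how many times larger a sample must be to keep the same power
    after tightening the significance threshold.

    Assumptions (not from the transcript): two-sided z-test, same effect size.
    Under those assumptions the required sample size is proportional to
    (z_{1 - alpha/2} + z_{power}) ** 2.
    """
    z = NormalDist().inv_cdf          # standard normal quantile function
    z_power = z(power)                # quantile tied to the desired power
    old = z(1 - alpha_old / 2) + z_power
    new = z(1 - alpha_new / 2) + z_power
    return (new / old) ** 2


ratio = relative_sample_size()
print(f"Sample size multiplier: {ratio:.2f}")
print(f"Increase needed: {100 * (ratio - 1):.0f}%")
```

[Under these assumptions the multiplier comes out near 1.7, in line with the "on average, 70%" quoted above. A different power target or test would give a different number.]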
JOHN DANKOSKY: Dalmeet Singh Chawla is a science journalist based in London. Thanks so much for sharing this good thing, bad thing with us. I really appreciate that.
DALMEET SINGH CHAWLA: Well, thanks so much for having me, John.
JOHN DANKOSKY: Now after the break we’re going to talk with famed primatologist and conservationist Jane Goodall. And if you’ve got a question for her– I know I’ve got a few– call us at 844-724-8255. That’s 844-SCI-TALK.