In A Noted Food Lab, The Glass May Be Half Empty
If you like to read about the psychology around food and eating, you’ve probably come across stories based on research from Cornell’s Food and Brand Lab, directed by Brian Wansink. Over the years, the lab has published many studies that have caught the eye of the media, from using a magically-refilling soup bowl to help study portion sizes, to work involving the effect of plate size on meals, to investigating how the pricing of meals and wine affects the perceptions of the people consuming them.
In an article published this week by BuzzFeed News, science reporter Stephanie Lee reports on a history of shoddy research practices in the lab, and a chain of emails that indicates a practice of “p-hacking,” a statistical wrangling of data aimed at making a borderline result appear to be statistically significant. Lee discusses her reporting with Ira, and talks about the challenge of reproducibility in scientific research.
Stephanie Lee is a science reporter for BuzzFeed News, based in San Francisco, California.
IRA FLATOW: This is Science Friday. I’m Ira Flatow. A bit later in the hour, medical cures from the olden days, like the radioactive jock strap– and no, don’t try that one at home. But first, if you like to read about the psychology around food and eating, you’ve probably come across stories based on research from the famous Cornell Food and Brand Lab. Over the years, the lab has published many studies that have caught the eye of the media, including Science Friday– from using a magically refilling soup bowl to help study portion sizes to work involving the effect of plate size on meals to investigating how the pricing of meals and wine affects the perceptions of the people consuming them.
But the glass may be half empty. In an article published this week by BuzzFeed News, science reporter Stephanie Lee reports on a history of shoddy research practices in the lab, and a chain of emails that indicates a practice of P-hacking. That’s a statistical wrangling of data aimed at making a borderline result appear to be statistically significant.
Stephanie Lee joins us from KQED in San Francisco. Welcome to Science Friday.
STEPHANIE LEE: Hi, thanks for having me.
IRA FLATOW: You’re welcome. Can we set the scene first? First, what is this lab? What are some of the studies people would have heard of, for example?
STEPHANIE LEE: Right. So you probably heard that eating off a small plate makes you feel fuller faster, or the shape of your wineglass determines how much you pour into it. So those findings come from Brian Wansink. And he’s the head of the Cornell Food and Brand Lab, where he’s published hundreds of studies over the years about eating behaviors, how and why we eat.
He’s very well known, and you’ve probably seen him on the Today Show or Rachael Ray or read about him in the New York Times. But more than a year ago, independent researchers started taking a closer look at his work. Their names are Tim van der Zee, Nick Brown, Jordan Anaya, and James Heathers.
And they started finding a lot of errors and things that just did not add up. And he’s had to retract six papers. And he’s corrected more than a dozen.
IRA FLATOW: To be clear, you’re not alleging outright fraud here.
STEPHANIE LEE: So I’ve been reporting on him for a few months. And recently, I got eight years of emails between him and colleagues, interviewed a student who used to work in the lab. And what I found was that, for years, they sliced and diced low quality data to get findings that could go viral, that could make headlines. And even though they acknowledged among themselves that their data was weak, they aggressively manipulated it to get findings that they wanted or that sounded interesting to an extent beyond sound scientific practice. One expert told me this is not science, it is storytelling.
IRA FLATOW: And in some cases, there’s something called an effort at P-hacking. And we in the media know a little bit about that. But tell us in a little detail what P-hacking is.
STEPHANIE LEE: Sure. So when a scientist has a hypothesis and they set out to test it and experiment and they get results, how do they show that their results are real? So a tool that’s become widely used in scientific research to show significance is called a P-value. And a P-value is basically the probability that what you found was in fact rare.
So what’s become this universal benchmark is to have a P-value that’s less than or equal to 0.05. There’s not really anything magical about the value 0.05. But it has just become this universal guideline in at least psychology and biology research– that if your results match 0.05 or are less, then that’s considered a statistically significant result.
But if you manipulate data through various means in a practice called P-hacking, you can get a value that’s at 0.05 or less, but it can just actually be a false positive. The connection may not actually be real.
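For readers who want to see what a P-value measures, here is a minimal sketch rather than anything the lab or the reporting used. It relies on one common textbook reading: the P-value is the probability of seeing a result at least as extreme as the one observed if there were no real effect at all. The function name `permutation_p_value`, the choice of a permutation test, and the ratings below are all assumptions made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided P-value for the difference in group means.

    Hypothetical helper: it shuffles the group labels many times and counts
    how often chance alone produces a gap at least as large as the real one.
    """
    rng = random.Random(seed)
    observed_gap = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        shuffled_gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so floating-point rounding does not hide exact ties.
        if shuffled_gap >= observed_gap - 1e-12:
            hits += 1
    return hits / n_permutations


if __name__ == "__main__":
    # Invented 1-to-9 meal ratings, NOT data from any study.
    group_one = [6, 7, 5, 8, 6, 7, 5, 6]
    group_two = [7, 8, 6, 8, 7, 9, 6, 7]
    print(permutation_p_value(group_one, group_two))
```

A small value says only that chance alone rarely produces a gap this large. It does not say how likely it is that the effect is real.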
So we see this being an overwhelming theme in these emails out of the Cornell Food and Brand Lab– that desire to get results that are interesting and that match that P-value just echoing over and over in their research.
So for example, they did a study where they put stickers of Elmo from Sesame Street on apples. And they put them next to cookies. And they had kids choose between them at lunch. And what they hoped would happen was that the cartoon stickers would entice kids to pick the healthy fruit over the junk food.
So they did this experiment, and they looked at the data. And the P-value was actually 0.06, which is not that threshold that they wanted. And so they said in an email, if you can get the data, and it needs some tweaking, it would be good to get that one value below 0.05. They called the 0.06 P-value a sticking point.
So they had a result that they wanted to show. And they knew it was statistically not significant. But they wanted to tweak the data to get it below that.
In another case, they did an experiment with an all-you-can-eat pizza buffet in New York, where they charged some diners $4, some diners $8. Then they asked people how they thought about their meals. And then they collected their data, and were looking at it afterwards.
And Dr. Wansink gives the data set to a young scientist from Turkey who’s visiting and says, hey, we collected this data. Why don’t you take a look at it?
And he tells her to basically run all these different combinations of tests on it, looking at all these different variables, between the diners and what they said afterwards. And he says, quote, to “squeeze some blood out of this rock.”
And so in this case, it’s not really clear if they even had a hypothesis in mind going into it. But they want to mine the data until they find a connection that is statistically significant and interesting.
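To see why running many combinations of tests on one data set is risky, here is a rough sketch under simplifying assumptions: every test is independent, and there is no real effect anywhere, so each test has a 5% chance of crossing the 0.05 line by luck. Real analyses of a single data set are usually correlated, so the exact numbers would differ. The function names are hypothetical and nothing here reproduces the lab's actual analysis.

```python
import random


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_null_studies(n_tests, n_studies=20_000, alpha=0.05, seed=0):
    """Fraction of simulated no-effect studies with at least one 'significant' test.

    Assumption: with no real effect, each P-value is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    lucky_studies = 0
    for _ in range(n_studies):
        if any(rng.random() < alpha for _ in range(n_tests)):
            lucky_studies += 1
    return lucky_studies / n_studies


if __name__ == "__main__":
    for n_tests in (1, 5, 20, 50):
        print(n_tests, chance_of_false_positive(n_tests), simulate_null_studies(n_tests))
```

Under these assumptions, a single test gives a false positive 5% of the time, but the chance of at least one false positive climbs quickly as more tests are tried on the same rock.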
IRA FLATOW: We called the lab following your article. And they said that they stand by their research. How normal is this kind of P-hacking and sifting and massaging of data in science?
STEPHANIE LEE: Well, so the scientists who helped me review these emails told me that they think this is pretty egregious. One person said, this is P-hacking on steroids. Another said, Dr. Wansink seems to be a consistent and repeated offender of statistics.
But one interesting reaction I’ve heard since my story was published is that it’s a little bit uncomfortable that what they were doing seemed to be quite egregious, but maybe is just a deeper degree of something that goes on in a lot of science labs already, just perhaps not to this extent.
IRA FLATOW: And what role does the media have here, people like Science Friday, other people who cover these sorts of stories?
STEPHANIE LEE: Yeah. I think what’s interesting about this story is that the practices at the Cornell Food and Brand Lab seem to go beyond sound scientific practice. But what they discuss is they’re responding to incentives that affect every scientist. So the hard truth for all scientists is that the more you publish and the more interesting your findings are that you publish, the more likely you are to get rewarded, in terms of getting tenure or getting funding or getting covered by the media, by journalists like us.
And these are real pressures that affect everybody in science. How you choose to respond to it is up to you. But I think that this story should hopefully lead to some soul searching on the part of everybody involved in the system– so when universities are deciding who to hire and give tenure to, not just looking at somebody who’s produced a lot of really cool work; or funding institutions, same thing there; or journals deciding to only publish the novel findings that will make a splash; in the media, when deciding what findings to cover, maybe thinking more critically about what the study actually says and has it been shown by anybody else to be true, and how much attention and skepticism to give to it.
IRA FLATOW: It’s a good point, because you know what does not get a lot of attention in science research– and we cover an awful lot of it– is reproducibility in science, how difficult it is for us to, for example, apply lab work from mouse studies to humans, which hardly ever– maybe 50% of the time can be translated.
STEPHANIE LEE: Right.
IRA FLATOW: And then the fact that if other scientists don’t want to do an experiment that someone else has done because that’s just not how we do it– we like to do our own stuff– doesn’t this speak to a larger problem of reproducibility in science?
STEPHANIE LEE: Absolutely. And I think the unfortunate thing is that, like you said, a scientist trying to reproduce somebody else’s experiment is just– there’s not that much glory in it, very little. And it takes a lot of time and effort. And the reward is just not really clear for that.
And so I think that’s starting to change a little bit, at least in psychology and a couple of other fields. Over the last few years, some psychologists have taken it upon themselves to try to replicate some of the field’s most popular widespread findings. And one massive effort was done on this in 2015. They tried to reproduce like 100 experiments. And they could only do less than half of the original findings.
So that was kind of a wake up call for the field. And I think it’s starting to spread to other fields too, now, where we have to create incentives for scientists to do good research the first time around, and then for other scientists to reproduce it on their own terms.
IRA FLATOW: And especially now, since science is under such pressure to justify itself– it takes it upon itself, in the public eye, to go out and do these things and be more reliable, to be more honest and transparent about it.
STEPHANIE LEE: Yeah. Absolutely. I think that people hopefully, with stories like those– people, consumers, members of the public– they deserve to hear about science that’s reliable and tested and done well, and not just hear headlines like “chocolate is good for you” or whatever, all these single one-off studies.
IRA FLATOW: So where do you go from here? Are you going to continue to be the watchdog looking at other things?
STEPHANIE LEE: Well, so the Cornell Food and Brand Lab stands by its work. Cornell University, the employer, is investigating the lab, and hasn’t said much more than that. So I’m continuing to watch what happens. But there is more happening with the story kind of by the day.
On Monday, Dr. Wansink issued his sixth retraction for a study. A couple of days later, I reported that one of his longtime collaborators, Dr. Collin Payne from New Mexico State University, had left the university for reasons unclear. And then The Joy of Cooking, a cookbook, tweeted about it. So it’s all been very interesting.
IRA FLATOW: Well, Stephanie, thank you for your work, and continue to keep doing it– Stephanie Lee, from KQED in San Francisco, science reporter at BuzzFeed News.
As Science Friday’s director, Charles Bergquist channels the chaos of a live production studio into something sounding like a radio program. Favorite topics include planetary sciences, chemistry, materials, and shiny things with blinking lights.