Why is so much psychological research unreplicable? Psychology calls itself a science, but it often falls short on the replication test of scientific merit. I took a closer look at the data to find out why. The journal Psychonomic Bulletin & Review will publish the findings soon, but the accepted pre-print is already available (download it here). The take-home message: psychology cannot go on like this.
The scientific journal Science called it one of the breakthroughs of the year 2015: 270 researchers combined forces to estimate the reproducibility of their field. Surprises were guaranteed, but what they found was more dispiriting than expected. Psychological science is not nearly as replicable as assumed: instead of the predicted 92% replication rate, only 36% of previously reported effects were found again in the replication studies. This result led to a media frenzy which ‘completely overwhelmed’ the lead author, Brian Nosek.
What many people do not realise is that psychologists aren’t actually sure why it was so difficult to reproduce previous studies. There are essentially two camps. One camp, most recently reinvigorated by a critical comment in the journal Science, argues that the replication teams did not reproduce all details of the original studies. In essence, they ran slightly different studies than the originals and, unsurprisingly, got different results.
A second camp argues that the original effects were so difficult to find again because many of them never existed in the first place, or were much smaller than reported. On this view, researchers apply all sorts of questionable practices to their data in order to find what they are after: they discover beautiful patterns in the chaos of the real world, but the patterns are little more than a product of their hopes and dreams of promotion.
I took a closer look at the data to see which camp got it right. My idea is simple: if an effect had already been replicated before Brian Nosek and his team began their work, then it should have been easier for the reproducibility project to reproduce it as well. In essence, if you have seen an effect twice, you are more likely to see it a third time than if you had observed it only once.
This is exactly what the first, ‘change in details’ camp predicts. Replications by the original authors are hardly ever exact, so a previous successful replication despite small differences suggests that a new replication by the reproducibility team should also succeed despite possibly small deviations from the original study.
What I found was the opposite: there is no difference in replication success between effects that the original authors had previously replicated themselves and effects they had not (see previous blog post). How can that be? I believe this result supports the second camp. If you can apply all sorts of tricks to your data once, you can also apply them twice. What looks like a replication is sometimes just trickery applied to two different data sets. So, the fact that a subset of effects which one would expect to be more easily replicated fares no better than all other effects suggests that questionable research practices are quite common in psychology.
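The logic of this argument can be made concrete with a toy simulation. The sketch below is my own illustration, not part of the published analysis: it assumes one specific questionable practice (measuring ten outcomes and reporting only the best p-value) and effects that are truly zero. Under those assumptions, a lab will "find" a null effect twice in a row far more often than an honest, preregistered replication with a single outcome will.

```python
import math
import random

random.seed(1)

def study_p(n=30):
    """One two-group study with a TRUE NULL effect. Returns the
    two-sided p-value of a z-test on the difference of means
    (the variance is known to be 1 in this simulation)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(b) / n - sum(a) / n) / math.sqrt(2 / n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def p_hacked_study(k=10):
    """Questionable practice: measure k outcomes, report only the
    smallest p-value."""
    return min(study_p() for _ in range(k))

sims = 2000
# An effect 'replicated' in-house: the original lab p-hacks twice.
twice = sum(p_hacked_study() < .05 and p_hacked_study() < .05
            for _ in range(sims)) / sims
# A preregistered replication: one outcome, one test.
honest = sum(study_p() < .05 for _ in range(sims)) / sims

print(f"null effect 'found' twice via p-hacking: {twice:.0%}")
print(f"honest replication success on null:      {honest:.0%}")
```

With ten outcome measures, each p-hacked study "succeeds" about 40% of the time even though nothing is there, so roughly one in six null effects gets "replicated" in-house, while the honest replication succeeds only at the 5% false-positive rate. That is exactly the pattern in which a prior in-house replication carries no information about whether an independent replication will work.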
Fortunately, there are already good initiatives which change scientific practices in order to improve the situation. I am confident that a few years from now psychology will fare much better in terms of reproducibility. Until then, articles like mine will hopefully convince the last doubters that psychology really should not continue using questionable research practices.
— — —
Gilbert, D., King, G., Pettigrew, S., & Wilson, T. (2016). Comment on “Estimating the reproducibility of psychological science”. Science, 351(6277), 1037. DOI: 10.1126/science.aad7243
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251). DOI: 10.1126/science.aac4716