Science is in crisis. Everyone in the scientific community knows about it but few want to talk about it. The crisis is one of honesty. A junior scientist (like me) asks himself a question similar to the one Galileo faced in 1633: how much honesty is desirable in science?
According to nearly all empirical scientific publications that I have read, scientists allegedly work like this:
Scientists call this ‘the story’ of the paper. This ‘story framework’ is so entrenched in science that the vast majority of scientific publications are required to be organised according to its structure: 1) Introduction, 2) Methods, 3) Results, 4) Discussion. My own publication is no exception.
However, virtually all scientists know that ‘the story’ is not really true. It is merely an ideal-case-scenario. Usually, the process looks more like this:
Scientists call some of the added red arrows questionable research practices (or QRP for short). The red arrows stand for (going from left to right, top to bottom):
1) adjusting the hypothesis based on the experimental set-up. This happens particularly when a) an old data-set is being re-used, b) the set-up accidentally differs from the intended one, etc.
2) changing design details (e.g., how many participants, how many conditions to include, how many/which measures of interest to focus on) depending on the results these changes produce.
3) analysing until results are easy to interpret.
4) analysing until results are statistically desirable (‘significant results’), i.e. so-called p-hacking.
5) hypothesising after results are known (so-called HARKing).
The outcome is a collection of blatantly unrealistic ‘stories’ in scientific publications. Compare this to the more realistic literature on clinical trials for new drugs. More than half the drugs fail the trial (Goodman, 2014). In contrast, nearly all ‘stories’ in the wider scientific literature are success stories. How?
Joseph Simmons and colleagues (2011) give an illustration of how to produce spurious successes. They simulated the situation of researchers engaging in the second point above (changing design details based on results). Let’s assume that the hypothesised effect is not real. How many experiments will erroneously find an effect at the conventional 5% significance criterion? Well, 5% of experiments should (scientists have agreed that this number is low enough to be acceptable). However, thanks to the questionable research practices outlined above this number can be boosted. For example, sampling participants until the result is statistically desirable leads to up to 22% of experiments reporting a ‘significant result’ even though there is no effect to be found. It is estimated that 70% of US psychologists have done this (John et al., 2012). When such a practice is combined with other, similar design changes, up to 61% of experiments falsely report a significant effect. Why do we do this?
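The optional-stopping effect described above can be reproduced in a few lines. What follows is a minimal sketch, not the simulation by Simmons and colleagues: it assumes two groups drawn from the same standard normal distribution (so there is no effect to find), uses a simple z-test with known variance in place of whatever test a real study would use, and picks 10 starting participants per group and a maximum of 50 as illustrative numbers. The function names and parameters are my own.

```python
import random
from statistics import NormalDist

ALPHA = 0.05                # conventional significance criterion
STD_NORMAL = NormalDist()   # standard normal, used for the z-test


def p_value(sum_a, sum_b, n):
    """Two-sided z-test p-value for equal group means (variance known to be 1)."""
    z = (sum_a / n - sum_b / n) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_start, n_max, peek, n_sims=2000, seed=1):
    """Share of null experiments that end up 'significant'.

    peek=False: test once, at n_max participants per group.
    peek=True:  test after every added participant from n_start onwards
                and stop as soon as p < ALPHA (optional stopping).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        for n in range(1, n_max + 1):
            # Both groups come from the same distribution: no real effect.
            sum_a += rng.gauss(0.0, 1.0)
            sum_b += rng.gauss(0.0, 1.0)
            look = (n >= n_start) if peek else (n == n_max)
            if look and p_value(sum_a, sum_b, n) < ALPHA:
                hits += 1
                break
    return hits / n_sims


fixed_rate = false_positive_rate(10, 50, peek=False)
peeking_rate = false_positive_rate(10, 50, peek=True)
print(f"fixed sample size: {fixed_rate:.3f}")
print(f"optional stopping: {peeking_rate:.3f}")
```

With a fixed sample size the false-positive rate should land near the nominal 5%, while testing after every added participant should push it well above that. The exact figures depend on the assumptions listed above and on the random seed.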
The Pope of 1633 is back
If we know that the scientific literature is unrealistic, why don’t we drop the pretense and tell it as it is? The reason is simple: because you like the scientific wonderland of success stories. If you are a scientist reader, you like to base the evaluation of scientific manuscripts on the ‘elegance’ (simplicity, straightforwardness) of the text. This leaves no room for telling you what really happened. You also like to base the evaluation of your colleagues on the quantity and the ‘impact’ of their scientific output. QRPs are essentially a career requirement in such a system. If you are a lay reader, you like the research you fund (via tax money) to be sexy, easy and simple. Scientific data are as messy as the real world but the reported results are not. They are meant to be easily digestible (‘elegant’) insights.
In 1633 it did not matter much whether Galileo admitted to the heliocentric world view which was deemed blasphemous. The idea was out there to conquer the minds of the Renaissance world. Today’s Galileo moment is also marked by an inability to admit to scientific facts (i.e. the so-called ‘preliminary’ research results which scientists obtain before applying questionable research practices). But this time the role of the Pope is played by political leaders, the lay public and scientists themselves. Actual scientific insights get lost before they can see the light of day.
There is a great movement to remedy this situation, including pressure to share data (e.g., at PLoS ONE), replication initiatives (e.g., RRR1, reproducibility project), the opportunity to pre-register experiments, etc. However, these remedies only focus on scientific practice, as if Galileo were at fault and the concept of blasphemy were fine. Maybe we should start looking into how we got into this mess in the first place. Focus on the Pope.
— — —
John, L., Loewenstein, G., & Prelec, D. (2012). Measuring the Prevalence of Questionable Research Practices with Incentives for Truth-Telling. SSRN Electronic Journal. DOI: 10.2139/ssrn.1996631
Simmons, J., Nelson, L., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22 (11), 1359-1366. DOI: 10.1177/0956797611417632
— — —
Picture: Joseph Nicolas Robert-Fleury [Public domain], via Wikimedia Commons
PS: This post necessarily reflects the field of science that I am familiar with (Psychology, Cognitive Science, Neuroscience). The situation may well be different in other scientific fields.