Month: November 2015

Are pre-registrations the solution to the replication crisis in Psychology?

Most psychology findings are not replicable. What can be done? In his Psychological Science editorial, Stephen Lindsay advertises pre-registration as a solution, writing that “Personally, I aim never again to submit for publication a report of a study that was not preregistered”. I took a look at whether pre-registrations are effective and feasible [TL;DR: maybe and possibly].

[I updated the blog post using comments by Cortex editor Chris Chambers, see below for full comments. It turns out that many of my concerns have already been addressed. Updates in square brackets.]

A recent study published in Science found that the majority of Psychological research cannot be reproduced by independent replication teams (Open Science Collaboration, 2015). I believe that this is due to questionable research practices (LINK) and that internal replications are no solution to this problem (LINK). However, might pre-registrations be the solution? I don’t think so. The reason why I am pessimistic is three-fold.

What is a pre-registration? A pre-registered study submits its design and analysis before data is acquired. After data acquisition the pre-registered data analysis plan is executed and the results can confidently be labelled confirmatory (i.e. more believable). Analyses not specified before are labelled exploratory (i.e. less believable). Some journals offer peer-review of the pre-registration document. Once it has been approved, the chances of the journal accepting a manuscript based on the proposed design and analysis are supposedly very high. [Chris Chambers: “for more info on RRs see”]


1) Pre-registration does not remove all incentives to employ questionable research practices

Pre-registrations should enforce honesty about post hoc changes in the design/analysis. Ironically, the efficacy of pre-registrations is itself dependent on the honesty of researchers. The reason is simple: including the information that an experiment was pre-registered is optional. So, if the planned analysis is optimal, a researcher can boost its impact by revealing that the entire experiment was pre-registered. If not, s/he deletes the pre-registration document and proceeds as if it had never existed, a novel questionable research practice (anyone want to invent a name for it? Optional forgetting?).

Defenders of pre-registration could counter that peer-reviewed pre-registrations are different because there is no incentive to deviate from the planned design/analysis. Publication is guaranteed if the pre-registered study is executed as promised. However, two motives remove this publication advantage:

1a) the credibility boost of presenting a successful post hoc design or analysis decision as a priori can still be achieved by publishing the paper in a different journal which is unaware of the pre-registration document.

1b) the credibility loss of a wider research agenda due to a single unsuccessful experiment can still be avoided by simply withdrawing the study from the journal and forgetting about it.

The take-home message is that one can opt in and out of pre-registration as one pleases. The maximal cost is the rejection of one peer-reviewed pre-registered paper at one journal. Given that paper rejection is the most normal thing in the world for a scientist these days, this threat is not effective.

[Chris Chambers: “all pre-registrations made now on the OSF become public within 4 years – so as far as I understand, it is no longer possible to register privately and thus game the system in the way you describe, at least on the OSF.”]

2) Pre-registrations did not clean up other research fields

Note that the argument so far assumes that when the pre-registration document is revealed, it is effective in stopping undisclosed post hoc design/analysis decisions. The medical sciences, in which randomized controlled trials have to be pre-registered since a 2004 decision by journal editors, teach us that this is not so. There are four aspects to this surprising ineffectiveness of pre-registrations:

2a) Many pre-registered studies are not published. For example, Chan et al. (2004a,b) could not locate the publications of 54% – 63% of the pre-registered studies. It’s possible that this is due to the aforementioned publication bias (see 1b above), or other reasons (lack of funding, manuscript under review…).

2b) Medical authors feel free to frequently deviate from their planned designs/analyses. For example 31% – 62% of randomized controlled trials changed at least one primary outcome between pre-registration and publication (Mathieu et al., 2009; Chan et al., 2004a,b). If you thought that psychological scientists are somehow better than medical ones, early indications are that this is not so (Franco et al., 2015).

Figure: pre-registration deviations in psych science

2c) Deviations from pre-registered designs/analyses are not discovered because 66% of journal reviewers do not consult the pre-registration document (Mathieu et al., 2013).

2d) In the medical sciences pre-registration documents are usually not peer-reviewed and quite often sloppy. For example, Mathieu et al. (2013) found 37% of trials to be post-registered (the pointless exercise of registering a study which has already taken place), and 17% of pre-registrations to be too imprecise to be useful.

[Chris Chambers: “The concerns raised by others about reviewers not checking protocols apply to clinical trial registries but this is moot for RRs because checking happens at an editorial level (if not at both an editorial and reviewer level) and there is continuity of the review process from protocol through to study completion.”]

3) Pre-registration is a practical nightmare for early career researchers

Now, one might argue that pre-registering is still better than not pre-registering. In terms of non-peer-reviewed pre-registration documents, this is certainly true. However, their value is limited because they can be written so vaguely as to be useless (see 2d) and they can simply be deleted if they ‘stand in the way of a good story’, i.e. if an exploratory design/analysis choice gets reported as confirmatory (see 1a).

The story is different for peer-reviewed pre-registrations. They are impractical because of one factor which tenured decision makers sometimes forget: time. Most research is done by junior scientists who have temporary contracts running anywhere between a few months and five years [reference needed]. These people cannot wait for a peer-review decision which, on average, takes something like one year and ten months (Nosek & Bar-Anan, 2012). This is the submission-to-publication-time distribution for one prominent researcher (Brian Nosek):


What does this mean? As a case study, let’s take Richard Kunert, a fine specimen of a junior researcher, who was given three years of funding by the Max-Planck-Gesellschaft in order to obtain a PhD. Given the experience of Brian Nosek with his articles, and assuming Richard submits three pre-registration documents on day 1 of his 3-year PhD, each individual document has an 84.6% chance of being accepted within three years. The chance that all three will be accepted is 60.6% (0.846^3). This scenario is obviously unrealistic because it leaves no time for setting up the studies and for actually carrying them out.

For the more realistic case of one year of piloting and one year of actually carrying out the studies, Richard has a 2.2% chance (0.282^3) that all three studies are peer-reviewed at the pre-registration stage and published. However, Richard is not silly (or so I have heard), so he submits 5 studies, hoping that at least three of them will be eventually carried out. In this case he has a 14% chance that at least three studies are peer-reviewed at the pre-registration stage and published. Only if Richard submits 10 or more pre-registration documents for peer-review after 1 year of piloting does he have a more than 50% chance of being left with at least 3 studies to carry out within 1 year.
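For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It takes the two per-document probabilities quoted above (84.6% within three years, 28.2% within one year) as given. It also assumes that each pre-registration document is accepted or not independently of the others, so the number of accepted documents follows a binomial distribution. That independence assumption and all names in the code are mine, for illustration only.

```python
from math import comb

# Per-document chance of acceptance, taken from the figures quoted in the post.
P_WITHIN_3_YEARS = 0.846
P_WITHIN_1_YEAR = 0.282


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k of n documents accepted.

    Assumes (for illustration) that documents are accepted independently.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All three of three documents accepted.
all_three_in_3_years = P_WITHIN_3_YEARS ** 3
all_three_in_1_year = P_WITHIN_1_YEAR ** 3

# At least three accepted when five are submitted.
three_of_five = prob_at_least(3, 5, P_WITHIN_1_YEAR)

# Smallest number of submissions giving a better than 50% chance of
# at least three acceptances within one year.
min_submissions = next(
    n for n in range(3, 100) if prob_at_least(3, n, P_WITHIN_1_YEAR) > 0.5
)

print(f"All 3 of 3 within three years: {all_three_in_3_years:.1%}")
print(f"All 3 of 3 within one year:    {all_three_in_1_year:.1%}")
print(f"At least 3 of 5 within a year: {three_of_five:.1%}")
print(f"Submissions needed for >50%:   {min_submissions}")
```

Under these assumptions the script reproduces the figures in the text: roughly 60.6%, 2.2%, 14%, and a minimum of 10 submissions.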

For all people who hate numbers, let me put it into plain words. Peer-review is so slow that requiring PhD students to only perform pre-registered studies means the overwhelming majority of PhD students will fail their PhD requirements in their funded time. In this scenario cutting-edge, world-leading science will be done by people flipping burgers to pay the rent because funding ran out too quickly.

[Chris Chambers: “Average decision times from Cortex, not including time taken by authors to make revisions: initial trial = 5 days; Stage 1 provisional acceptance = 9 weeks (1-3 rounds of in-depth review); Stage 2 full acceptance = 4 weeks”]

What to do

The arrival of pre-registration in the field of Psychology is undoubtedly a good sign for science. However, given what we know now, no one should be under the illusion that this instrument is the solution to the replication crisis which psychological researchers are facing. At the most, it is a tiny piece of a wider strategy to make Psychology what it has long claimed to be: a robust, evidence-based, scientific enterprise.


[Please do yourself a favour and read the comments below. You won’t get better people commenting than this.]

— — —

Chan AW, Krleza-Jerić K, Schmid I, & Altman DG (2004). Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ : Canadian Medical Association journal = journal de l’Association medicale canadienne, 171 (7), 735-40 PMID: 15451835

Chan, A., Hróbjartsson, A., Haahr, M., Gøtzsche, P., & Altman, D. (2004). Empirical Evidence for Selective Reporting of Outcomes in Randomized Trials. JAMA, 291 (20) DOI: 10.1001/jama.291.20.2457

Franco, A., Malhotra, N., & Simonovits, G. (2015). Underreporting in Psychology Experiments: Evidence From a Study Registry. Social Psychological and Personality Science DOI: 10.1177/1948550615598377

Lindsay, D. (2015). Replication in Psychological Science. Psychological Science DOI: 10.1177/0956797615616374

Mathieu, S., Boutron, I., Moher, D., Altman, D.G., & Ravaud, P. (2009). Comparison of Registered and Published Primary Outcomes in Randomized Controlled Trials. JAMA, 302 (9) DOI: 10.1001/jama.2009.1242

Mathieu, S., Chan, A., & Ravaud, P. (2013). Use of Trial Register Information during the Peer Review Process. PLoS ONE, 8 (4) DOI: 10.1371/journal.pone.0059910

Nosek, B., & Bar-Anan, Y. (2012). Scientific Utopia: I. Opening Scientific Communication. Psychological Inquiry, 23 (3), 217-243 DOI: 10.1080/1047840X.2012.692215

Open Science Collaboration (2015). PSYCHOLOGY. Estimating the reproducibility of psychological science. Science (New York, N.Y.), 349 (6251) PMID: 26315443
— — —

Broca’s area processes both music and language at the same time

When you read a book and listen to music, the brain doesn’t keep these two tasks nicely separated. In a new article just out, I show that there is a brain area which is busy with both tasks at the same time (Kunert et al., 2015). This brain area might tell us a lot about what music and language share.


The brain area which you see highlighted in red on this picture is called Broca’s area. Since the 19th century, many people have believed it to be ‘the language production part of the brain’. However, a more modern theory proposes that this area is responsible for combining elements (e.g., words) into coherent wholes (e.g., sentences), a task which needs to be solved to understand and produce language (Hagoort, 2013). In my most recent publication, I found evidence that at the same time as combining words into sentences, this area also combines tones into melodies (Kunert et al., 2015).

What did I do with my participants in the MRI scanner?

Take for example the sentence “The athlete that noticed the mistresses looked out of the window.” Who did the noticing? Was it the mistresses who noticed the athlete or the athlete who noticed the mistresses? In other words, how does “noticed” combine with “the mistresses” and “the athlete”? There is a second version of this sentence which uses the same words in a different way: “The athlete that the mistresses noticed looked out of the window.” If you are completely confused now, I have achieved my aim of giving you a feeling for what a complicated task language is. Combining words is generally not easy (first version of the sentence) and sometimes really hard (second version of the sentence).

Listening to music can be thought of in similar ways. You have to combine tones or chords in order to hear actual music rather than just a random collection of sounds. It turns out that this is also generally not easy and sometimes really hard. Check out the following two little melodies. The text is just the first example sentence above, translated into Dutch (the fMRI study was carried out in The Netherlands).

If these examples don’t work, see more examples on my personal website here.

Did you notice the somewhat odd tone in the middle of the second example? Some people call this a sour note. The idea is that it is more difficult to combine such a sour note with the other tones in the melody, compared to a more expected note.

So, now we have all the ingredients to compare the combination of words into a sentence (with an easy and a difficult kind of combination) and tones in a melody (with an easy and a difficult kind of combination). My participants heard over 100 examples like the ones above. The experiment was done in an fMRI scanner and we looked at the brain area highlighted in red above: Broca’s area (under your left temple).

What did I find in the brain data?

The height of the bars represents the difference in brain activity signal between the easy and difficult versions of the sentences. As you can see, the bars are generally above zero, i.e. this brain area displays more activity for more difficult sentences (not a significant main effect in this analysis actually). I show three bars because the sentences were sung in three different music versions: easy (‘in-key’), hard (‘out-of-key’), or with an unexpected loud note (‘auditory anomaly’). As you can see, neither the easy version of the melody (left bar) nor the one with the unexpected loud note (right bar) leads to much of an activity difference between easy and difficult sentences. It is the difficult version (middle bar) which does. In other words: when this brain area is trying to make a difficult combination of tones, it suddenly has great trouble with the combination of words in a sentence.

What does it all mean?

This indicates that Broca’s area uses the same resources for music and language. If you overwhelm this area with a difficult music task, there are fewer resources available for the language task. In a previous blog post, I have argued that behavioural experiments have shown a similar picture (Kunert & Slevc, 2015). This experiment shows that the music-language interactions we see in people’s behaviour might stem from the activity in this brain area.

So, this fMRI study contributes a tiny piece to the puzzle of how the brain deals with the many tasks it has to deal with. Instead of keeping everything nice and separated in different corners of the head, similar tasks appear to get bundled in specialized brain areas. Broca’s area is an interesting case. It is associated with combining a structured series of elements into a coherent whole. This is done across domains like music, language, and (who knows) beyond.

[Update 13/11/2015: added link to personal website.]

— — —
Hagoort P (2013). MUC (Memory, Unification, Control) and beyond. Frontiers in psychology, 4 PMID: 23874313

Kunert R, & Slevc LR (2015). A Commentary on: “Neural overlap in processing music and speech”. Frontiers in human neuroscience, 9 PMID: 26089792

Kunert R, Willems RM, Casasanto D, Patel AD, & Hagoort P (2015). Music and Language Syntax Interact in Broca’s Area: An fMRI Study. PloS one, 10 (11) PMID: 26536026

— — —

DISCLAIMER: The views expressed in this blog post are not necessarily shared by my co-authors Roel Willems, Daniel Casasanto, Ani Patel, and Peter Hagoort.