Are pre-registrations the solution to the replication crisis in Psychology?

Most psychology findings are not replicable. What can be done? In his Psychological Science editorial, Stephen Lindsay advertises pre-registration as a solution, writing that “Personally, I aim never again to submit for publication a report of a study that was not preregistered”. I took a look at whether pre-registrations are effective and feasible [TL;DR: maybe and possibly].

[I updated the blog post using comments by Cortex editor Chris Chambers, see below for full comments. It turns out that many of my concerns have already been addressed. Updates in square brackets.]

A recent study published in Science found that the majority of Psychological research cannot be reproduced by independent replication teams (Open Science Collaboration, 2015). I believe that this is due to questionable research practices (LINK) and that internal replications are no solution to this problem (LINK). However, might pre-registrations be the solution? I don’t think so. The reason why I am pessimistic is three-fold.

What is a pre-registration? A pre-registered study submits its design and analysis before data is acquired. After data acquisition the pre-registered data analysis plan is executed and the results can confidently be labelled confirmatory (i.e. more believable). Analyses not specified before are labelled exploratory (i.e. less believable). Some journals offer peer-review of the pre-registration document. Once it has been approved, the chances of the journal accepting a manuscript based on the proposed design and analysis are supposedly very high. [Chris Chambers: “for more info on RRs see https://osf.io/8mpji/wiki/home/”]

 

1) Pre-registration does not remove all incentives to employ questionable research practices

Pre-registrations should enforce honesty about post hoc changes in the design/analysis. Ironically, the efficacy of pre-registrations is itself dependent on the honesty of researchers. The reason is simple: including the information that an experiment was pre-registered is optional. So, if the planned analysis is optimal, a researcher can boost its impact by revealing that the entire experiment was pre-registered. If not, s/he deletes the pre-registration document and proceeds as if it had never existed, a novel questionable research practice (anyone want to invent a name for it? Optional forgetting?).

Defenders of pre-registration could counter that peer-reviewed pre-registrations are different because there is no incentive to deviate from the planned design/analysis. Publication is guaranteed if the pre-registered study is executed as promised. However, two motives remove this publication advantage:

1a) the credibility boost of presenting a successful post hoc design or analysis decision as a priori can still be achieved by publishing the paper in a different journal which is unaware of the pre-registration document.

1b) the credibility loss of a wider research agenda due to a single unsuccessful experiment can still be avoided by simply withdrawing the study from the journal and forgetting about it.

The take-home message is that one can opt in and out of pre-registration as one pleases. The maximal cost is the rejection of one peer-reviewed pre-registered paper at one journal. Given that paper rejection is the most normal thing in the world for a scientist these days, this threat is not effective.

[Chris Chambers: “all pre-registrations made now on the OSF become public within 4 years – so as far as I understand, it is no longer possible to register privately and thus game the system in the way you describe, at least on the OSF.”]

2) Pre-registrations did not clean up other research fields

Note that the argument so far assumes that when the pre-registration document is revealed, it is effective in stopping undisclosed post hoc design/analysis decisions. The medical sciences, in which randomized controlled trials have had to be pre-registered since a 2004 decision by journal editors, teach us that this is not so. There are four aspects to this surprising ineffectiveness of pre-registrations:

2a) Many pre-registered studies are not published. For example, Chan et al. (2004a,b) could not locate the publications of 54% – 63% of the pre-registered studies. It’s possible that this is due to the aforementioned publication bias (see 1b above), or other reasons (lack of funding, manuscript under review…).

2b) Medical authors feel free to frequently deviate from their planned designs/analyses. For example, 31% – 62% of randomized controlled trials changed at least one primary outcome between pre-registration and publication (Mathieu et al., 2009; Chan et al., 2004a,b). If you thought that psychological scientists are somehow better than medical ones, early indications are that this is not so (Franco et al., 2015).

[Figure: pre-registration deviations in psych science]

2c) Deviations from pre-registered designs/analyses are not discovered because 66% of journal reviewers do not consult the pre-registration document (Mathieu et al., 2013).

2d) In the medical sciences pre-registration documents are usually not peer-reviewed and quite often sloppy. For example, Mathieu et al. (2013) found 37% of trials to be post-registered (the pointless exercise of registering a study which has already taken place), and 17% of pre-registrations to be too imprecise to be useful.

[Chris Chambers: “The concerns raised by others about reviewers not checking protocols apply to clinical trial registries but this is moot for RRs because checking happens at an editorial level (if not at both an editorial and reviewer level) and there is continuity of the review process from protocol through to study completion.”]

3) Pre-registration is a practical nightmare for early career researchers

Now, one might argue that pre-registering is still better than not pre-registering. In terms of non-peer-reviewed pre-registration documents, this is certainly true. However, their value is limited because they can be written so vaguely as to be useless (see 2d) and they can simply be deleted if they ‘stand in the way of a good story’, i.e. if an exploratory design/analysis choice gets reported as confirmatory (see 1a).

The story is different for peer-reviewed pre-registrations. They are impractical because of one factor which tenured decision makers sometimes forget: time. Most research is done by junior scientists who have temporary contracts running anywhere between a few months and five years [reference needed]. These people cannot wait for a peer-review decision which, on average, takes something like one year and ten months (Nosek & Bar-Anan, 2012). This is the submission-to-publication-time distribution for one prominent researcher (Brian Nosek):

[Figure: histogram of submission-to-publication times]

What does this mean? As a case study, let’s take Richard Kunert, a fine specimen of a junior researcher, who was given three years of funding by the Max-Planck-Gesellschaft in order to obtain a PhD. Given the experience of Brian Nosek with his articles, and assuming Richard submits three pre-registration documents on day 1 of his 3-year PhD, each individual document has an 84.6% chance of being accepted within three years. The chance that all three will be accepted is 60.6% (0.846^3). This scenario is obviously unrealistic because it leaves no time for setting up the studies and for actually carrying them out.

For the more realistic case of one year of piloting and one year of actually carrying out the studies, Richard has a 2.2% chance (0.282^3) that all three studies are peer-reviewed at the pre-registration stage and published. However, Richard is not silly (or so I have heard), so he submits 5 studies, hoping that at least three of them will eventually be carried out. In this case he has a 14% chance that at least three studies are peer-reviewed at the pre-registration stage and published. Only if Richard submits 10 or more pre-registration documents for peer-review after 1 year of piloting does he have a more than 50% chance of being left with at least 3 studies to carry out within 1 year.
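For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation. It assumes that each submission is an independent trial with the same acceptance probability (the cubing above implies this, but it is an assumption), and it takes the per-document probabilities 0.846 and 0.282 directly from the text. The function and variable names are made up for illustration.

```python
from math import comb

# Per-document acceptance probabilities taken from the text above.
P_WITHIN_THREE_YEARS = 0.846  # submitted on day 1 of a 3-year PhD
P_WITHIN_ONE_YEAR = 0.282     # submitted after 1 year of piloting, 1 year left to run

def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n submissions are accepted.

    Assumes independent submissions with identical acceptance
    probability p (a binomial model).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# All three of three documents accepted.
all_three_day_one = p_at_least(3, 3, P_WITHIN_THREE_YEARS)
all_three_one_year = p_at_least(3, 3, P_WITHIN_ONE_YEAR)

# At least three of five documents accepted.
three_of_five = p_at_least(3, 5, P_WITHIN_ONE_YEAR)

# Smallest number of submissions giving a better-than-even chance
# of at least three acceptances.
min_submissions = next(
    n for n in range(3, 100) if p_at_least(3, n, P_WITHIN_ONE_YEAR) > 0.5
)

print(f"all three, submitted on day 1:  {all_three_day_one:.3f}")
print(f"all three, submitted in year 2: {all_three_one_year:.3f}")
print(f"at least 3 of 5:                {three_of_five:.3f}")
print(f"submissions needed for >50%:    {min_submissions}")
```

Under these assumptions the sketch gives roughly 61%, 2.2% and 14% for the three scenarios, and 10 as the smallest number of submissions with a better-than-even chance of three acceptances.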

For all people who hate numbers, let me put it into plain words. Peer-review is so slow that requiring PhD students to only perform pre-registered studies means the overwhelming majority of PhD students will fail their PhD requirements in their funded time. In this scenario cutting-edge, world-leading science will be done by people flipping burgers to pay the rent because funding ran out too quickly.

[Chris Chambers: “Average decision times from Cortex, not including time taken by authors to make revisions: initial triage = 5 days; Stage 1 provisional acceptance = 9 weeks (1-3 rounds of in-depth review); Stage 2 full acceptance = 4 weeks”]

What to do

The arrival of pre-registration in the field of Psychology is undoubtedly a good sign for science. However, given what we know now, no one should be under the illusion that this instrument is the solution to the replication crisis which psychological researchers are facing. At most, it is a tiny piece of a wider strategy to make Psychology what it has long claimed to be: a robust, evidence-based, scientific enterprise.

 

[Please do yourself a favour and read the comments below. You won’t get better people commenting than this.]

— — —

Chan, A. W., Krleza-Jerić, K., Schmid, I., & Altman, D. G. (2004). Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ : Canadian Medical Association journal = journal de l’Association medicale canadienne, 171 (7), 735-40. PMID: 15451835

Chan, A., Hróbjartsson, A., Haahr, M., Gøtzsche, P., & Altman, D. (2004). Empirical Evidence for Selective Reporting of Outcomes in Randomized Trials. JAMA, 291 (20). DOI: 10.1001/jama.291.20.2457

Franco, A., Malhotra, N., & Simonovits, G. (2015). Underreporting in Psychology Experiments: Evidence From a Study Registry. Social Psychological and Personality Science. DOI: 10.1177/1948550615598377

Lindsay, D. (2015). Replication in Psychological Science. Psychological Science. DOI: 10.1177/0956797615616374

Mathieu, S., Boutron, I., Moher, D., Altman, D.G., & Ravaud, P. (2009). Comparison of Registered and Published Primary Outcomes in Randomized Controlled Trials. JAMA, 302 (9). DOI: 10.1001/jama.2009.1242

Mathieu, S., Chan, A., & Ravaud, P. (2013). Use of Trial Register Information during the Peer Review Process. PLoS ONE, 8 (4). DOI: 10.1371/journal.pone.0059910

Nosek, B., & Bar-Anan, Y. (2012). Scientific Utopia: I. Opening Scientific Communication. Psychological Inquiry, 23 (3), 217-243. DOI: 10.1080/1047840X.2012.692215

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science (New York, N.Y.), 349 (6251). PMID: 26315443
— — —

11 comments

  1. Thanks for this interesting post. I think you make some reasonable points about weaknesses with non peer-reviewed pre-registration (e.g. deleting protocols that didn’t “work”, pre-registering vague methods, post-registering in clinical trials etc.), all of which are legitimate concerns, notwithstanding the fact that pre-registration in clinical trials has overall been hugely successful, with recent evidence suggesting it has aided in the publication of negative results.

    That said, your characterisation of the peer-reviewed version of pre-registration (Registered Reports, RRs) is simply factually incorrect. To explain further:

    1. Your argument against RRs in point 1 is a non sequitur because it isn’t that “peer-reviewed pre-registrations are different because there is no incentive to deviate from the planned design/analysis”, it is that it isn’t possible to deviate without committing fraud. The journal holds the Stage 1 protocol in its records; any protocol deviations are therefore transparent and, if major, would lead to rejection. This is part of the Stage 2 RR editorial criteria. You can read an example of such criteria here: http://cdn.elsevier.com/promis_misc/PROMIS%20pub_idt_CORTEX%20Guidelines_RR_29_04_2013.pdf

    2. I’m afraid your estimated timings on review of RRs are way off, therefore your argument that RRs disadvantage junior researchers doesn’t make sense. At Cortex, for instance, not counting the time for authors to revise the manuscript, it takes about 8-10 weeks for a Stage 1 protocol to be provisionally accepted. Nothing like a year.

    I guess finally I would point out that your running premise, like others that have come before, is based on the strawman proposition that pre-registration is “the” solution to some problem. Who ever argues this? I don’t know anyone – not even its strongest proponents – who would suggest that pre-registration or RRs are a one-size-fits-all cure for reproducibility problems in science.

    In case you hadn’t seen it, this Q&A on RRs helps to dispel some of the myths and misconceptions about this format: https://osf.io/8mpji/wiki/4.%20Frequently%20Asked%20Questions/

    1. Thanks for the comment. I am glad I could start a conversation.

      To be clear about the aim of this blog post, let’s spell it out: to dispel some positive myths and misconceptions about this format. The last thing we need is for the replication crisis to lead to “solutions” which turn out to be a burden on researchers without much benefit. From the analysis in this post, my impression is that pre-registration will offer a substantial benefit only if done right.

      Regarding your answers:

      1) Who checks whether deviations from pre-registration protocols happened? Mathieu et al. (2013) found alarmingly high numbers of reviewers not doing it. Claiming that reviewers will hold methodological details in mind between the end of Stage 1 and the beginning of Stage 2 is not realistic in my experience. Supervisors frequently forget method details of projects which their own PhD students are working on.

      2) The data I base my timing on is clearly specified. It is not good data, to be sure (only 1 researcher, publication times rather than Stage 1 acceptance times), but the best I could get my hands on. Do you have better data available for Cortex? I am happy to modify my blog post with better data in hand.

      I notice that you do not respond to all concerns, especially in part 1. For example, regarding concern 1b above, how many of the Stage 1 provisionally accepted studies do you never see back for Stage 2?

      1. 1. For RRs there is no requirement to keep details in memory. The action editor (and at Cortex the RR editorial team, which includes five editors) check that the Methods are unchanged at Stage 2. Our experience is that authors are very honest and we have a transparent mechanism to allow reporting of minor protocol deviations via footnotes in the Methods. The accepted Stage 1 protocol is also sent to the reviewers at Stage 2 together with the complete manuscript. The concerns raised by others about reviewers not checking protocols apply to clinical trial registries but this is moot for RRs because checking happens at an editorial level (if not at both an editorial and reviewer level) and there is continuity of the review process from protocol through to study completion.

        2. Average decision times from Cortex, not including time taken by authors to make revisions: initial triage = 5 days; Stage 1 provisional acceptance = 9 weeks (1-3 rounds of in-depth review); Stage 2 full acceptance = 4 weeks.

        3. “how many of the Stage 1 provisionally accepted studies do you never see back for Stage 2?”
        None and I suspect this will be extremely rare because failure to resubmit would be treated as a de facto study withdrawal and would thus lead to publication of a Withdrawn Registration, where the abstract from the Stage 1 submission is published along with a reason for the withdrawal; see FAQ 9D here: https://osf.io/8mpji/wiki/4.%20Frequently%20Asked%20Questions/ Also, off the topic of RRs, all pre-registrations made now on the OSF become public within 4 years – so as far as I understand, it is no longer possible to register privately and thus game the system in the way you describe, at least on the OSF.

        For more info on RRs see https://osf.io/8mpji/wiki/home/

  2. As someone who has in the past also debated the worth of pre-registration or registered reports and whom Chris Chambers has at least partially converted into a believer, I feel I should give my two cents here. Chris has already argued better than I could why RRs are dealing with many of the problems you raised in your post. I don’t think the peer review process would necessarily make it impossible to publish during your PhD – it probably won’t actually change all that much. It’s really mainly about shifting the timing of (most of) the peer review to take place before data collection and it also shuts down unreasonable reviewer demands (you know, like: “Please collect 100 more subjects” or “Please test condition X which only tenuously relates to your original hypothesis”), so in fact the process may even get faster whilst ensuring sounder experiments because a lot of the design has been scrutinised before the study commenced.

    In fact, my only remaining concern about RRs is this: A lot of the arguments seem to rest on the editors doing a good job. I have no doubt that most editors of RRs now are very conscientious and just the kind of editor you need for this to work well. But this is because most editors of RRs are presumably strong advocates of this model and believe that this is a sound way to do science. If RRs become the new standard, there will be a lot of editors and reviewers who won’t do such a good job, just as many probably don’t do the best job they could in the current model either. The question is whether this would undo a lot of the benefits of RRs or could even make things worse. I think the answer to this is probably: Yes sometimes, but it won’t outweigh the benefits of RRs. Either way it’s an empirical question that we can address only by promoting RRs and then looking at this in a few years’ time when they have become more widespread.

    1. Ha, all this talk of conversions and whatnot makes me sound like some kind of Jesuit evangelist🙂

      Agree 100% with what you say, Sam. Judicious editing is key, and having now edited for RR and non-RR formats, I do feel that RRs require more editorial attention and discussion – and most importantly THE EDITOR MUST READ THE PAPER – CAREFULLY! Actually, I’m considering putting together some form of training guidelines / video guides etc. once the format gets even more popular.

      Just last week I sent a letter around the editorial board of Royal Society Open Science (where I am RR handling editor) pointing out some of the things we’ve learned so far as RR editors. We’ve got some exciting times ahead with this format at RSOS because they are available across >200 different fields.

      In case it’s of interest, here are the five editorial challenges I noted in my letter:

      “We now have two years experience assessing and publishing RRs at different journals, and while many obstacles undoubtedly remain to be discovered, we are learning a great deal.

      We have identified five major challenges so far (not all are unique to RRs, but they seem more salient):

      1) Careful editorial triage is vital to ensure that only submissions that reach a sufficient standard are sent for in-depth Stage 1 review. Where there are obvious flaws or oversights in meeting the Stage 1 review criteria, we should reject submissions without in-depth review – and if the submissions are redeemable we have the option on Scholar One to desk-reject but invite a resubmission. Early in the life of RRs, where many authors are unfamiliar with the requirements, it is not uncommon to triage many otherwise promising submissions with the offer to resubmit. As described in the workflow above, part of my role is to perform the first editorial triage of all RR submissions, and then (for those that pass) to perform an additional triage assessment in consultation with a specialist associate editor. Only once the associate editor and I are happy with the general standard of a submission will it be sent for in-depth Stage 1 review. When my own judgment about a submission is in doubt due to a lack of specialist knowledge, I will defer to the judgment of the associate editor.

      2) The review process for RRs typically has a different tone to standard review – rather than finding reasons to reject manuscripts, reviewers at Stage 1 generally make detailed suggestions for improving and enhancing the proposed methodology. As editors we have found this to be a remarkable process that defies the conventional (often negative) norm of peer review; however, with greater pre-experimental input from reviewers comes additional editorial challenges. It is not uncommon for reviewers’ suggestions at Stage 1 to be mutually exclusive, which can require careful editorial input to help steer authors toward clarity. Because of this fine attention to detail, our impression is that the editorial involvement in Stage 1 RRs is more intense and time-consuming than for standard submissions.

      3) It is important that editors ensure that their decisions are tied closely to the review criteria and I would therefore encourage all editors to read these carefully (and please feel free to contact me at chambersc1@cardiff.ac.uk if you have any questions or concerns). At Stage 2, once the study is completed, impressions about the importance or novelty of results should never influence editorial decisions.

      4) It is not uncommon for reviewers at Stage 2 to attempt to shift the goal posts when the outcome of an approved Stage 1 protocol disagrees with their expectations. A case study: in a paper I handled as editor at another journal, a reviewer at Stage 2 who had previously approved the Stage 1 protocol demanded that the authors conduct a long series of unregistered post hoc analyses because the reviewer did not find the outcomes of the pre-registered analyses “convincing”. For conventional unregistered articles, reviewers typically have great power to compel authors to conduct such extra analyses and an editor would typically require them to do so unless there was a very good reason not to. However, for RRs, this power is focused in a different way: reviewers instead have the opportunity to help shape the methodological and analytical approach at Stage 1, before the results exist. The influence of reviewers in requiring new analyses at Stage 2 is restricted in order to overcome common biases against negative results or other outcomes that reviewers may find undesirable. Therefore, while authors of Stage 2 RRs are welcome to include additional post hoc analyses as suggested by the reviewers, they are not required to conduct them unless such analyses would be necessary to satisfy one or more of the Stage 2 review criteria (see above). For instance, a post hoc analysis may be required if the editors judge that without such an analysis, the authors’ conclusions are not justified by the evidence (Stage 2 Criterion 6). In cases where a reviewer strongly recommends a post hoc analysis, but the authors do not agree that it is necessary, we can offer the reviewer the opportunity to publish a commentary in which they use the published data to report the analysis themselves.

      5) One of the most substantial and interesting challenges in handling RRs is deciding in advance the quality checks that must be satisfied in order to conclude that the experiment was conducted to a publishable standard (see Stage 1 Criterion 6 and Stage 2 Criterion 1). Such tests must be independent of the stated hypotheses and might include positive control experiments or other checks on data quality. Although such tests should be a part of all good scientific experiments, our impression is that this process is more rigorous and intensive for Stage 1 RRs compared with conventional (unregistered) papers. This may be because in some fields there is a tendency during conventional peer review to conflate methodological quality with the outcomes of hypothesis testing. For example, the methodological quality of an experiment may be more likely to be questioned when the results fail to support the hypothesis or otherwise run counter to expectations.”

      1. Chris Chambers

        Thanks Chris, these are great observations. In particular the positive side effect that the review tone changes is great news. It’s not surprising perhaps but still not something anyone really talked about before trying RRs. This can only benefit science.

        Regarding the issue of shifting goalposts, would it make sense to explicitly ask reviewers in the review form at Stage 2 whether they have any post hoc suggestions whilst making clear that authors are not bound to them? The main problem is that sometimes a fatal flaw may only become clear when you see the final data, so some post hoc demands may be extremely useful. If that is the case, the reviewers should have to make a compelling argument though, and this cannot be excessive (so no long string of additional controls).

        I like the idea of giving reviewers the chance to reanalyse if they feel they should. In fact with open data this should always be possible even without RR. This argument should be generally used to minimise excessive goalpost shifting.

  3. Yes – that’s a good idea. So far we’ve found the reviewers tend not to hold back anyway when it comes to suggesting additional analyses (not always out of motivated reasoning, but as with normal papers just because something occurs to them in the results that they would be interested to find out more about). And yes, if the data did reveal a fatal flaw in the original (pre-registered) analysis plan then a corrective post hoc analysis could be required in order to satisfy the key Stage 2 criterion that the conclusions must be supported by the evidence (since a faulty analysis is likely to lead to a wrong conclusion).

    One thing I particularly like about the data sharing aspect is not only that it allows reviewers to dig around and report post hoc tests, but that it shifts the burden on to THEM to actually do that work. This acts as a natural counterweight to overzealous reviewer requests and it’s interesting to see how reviewers react when they say “Do these 15 extra analyses” and you (the editor) say back “Here’s the data, you’re welcome to do them yourself and publish a comment”. At that point you learn exactly how important the reviewer really believes those extra analyses to be, and there’s another potentially interesting aspect here too: inviting the reviewer to commit to writing a comment before they even know the outcome of the post hoc analysis. Also, the whole back-and-forth exchange is much more transparent and attributes the ideas for post hoc analyses to their correct sources.

  4. I’d like someone to help me understand the difference between pre-registration (PR) and registered reports (RR), since the two terms seem to get swapped during discussions. My understanding had been that PR meant submitting your methods on a website, and that this process is quick and has no peer review (as per discussions I’ve had on Twitter with Sam). An RR would involve an agreement to publish a paper, and that would involve peer review of the methods prior to running the experiments.

    However now I see these two terms thrown around interchangeably and it’s a bit confusing.

    1. RR are definitely the peer-reviewed two-stage process. However, preregistration kind of encompasses both – I don’t know if there is any other specific term for the quick non-RR version. I agree it would make sense to make the distinction clearer in discussions.

    2. You can think of pre-registration as the general approach being undertaken, and RRs as a specific manifestation of that approach in which (a) the protocol is peer reviewed before outcomes are known, and (b) the journal provisionally accepts the final paper based on the review of that protocol. Full details here: https://osf.io/8mpji/wiki/home/

      Beyond RRs, it is possible to pre-register via a non peer-reviewed route e.g. on the Open Science Framework. This has some advantages and some disadvantages. Richard’s piece does a nice job, I think, in highlighting some of the drawbacks of the free-range approach.

      It is true that in our shop-talk about these concepts we often mix them up, mostly because nerds like Sam and I already know the context in which we’re discussing them.
