The end of Brain’s Idea

This will be the very last blog post of Brain’s Idea. From my first blog post on 26 January 2012 until now, nearly six years later, it has been great fun writing about science, psychology, and the brain.

Looking back

The readership of Brain’s Idea grew year on year. I averaged about 1000 views per month during the first three years. In 2015 and 2016 my readership increased substantially, particularly around my posts on Psychology’s replication crisis. In 2016 I averaged more than 2000 views per month. Everything else I have ever written in my life pales against these numbers.


My most successful blog posts over the years were:

  1. The real reason why new pop music is so incredibly bad
  2. The biological basis of orchestra seating
  3. The mysterious appeal of too loud music
  4. Psychological principles as guidelines for effective PowerPoint presentations
  5. Why do we like sad music?

I am actually surprised that nearly all of the most read posts are music related. Many thanks to the readers of Brain’s Idea for reminding me that my PhD topic is interesting, relevant, and worth investigating.

Looking ahead

By now, I have left academia. In February of this year I successfully defended my PhD in Nijmegen (The Netherlands). Many people encouraged me to pursue a career in science and I even had concrete offers for post-doc positions. However, I turned those down and turned away from academia. Why? I think a colleague of mine put it very well when he said ‘I want to be a scientist. I just don’t want to live like a scientist.’ I was good at academic science. But academic science was not good for me. So, I left.

My arduous path to this conclusion was eloquently put into words by my PhD supervisor Roel Willems in the laudatio at my PhD defense:

[…] Also outside of Nijmegen you made your mark. In the particular niche of science called music cognition, you quickly introduced yourself, made contact with international colleagues without the help of supervisors who could introduce you.

Your presence did not go unnoticed, and I was reminded of this last summer when, at a conference, I started talking to a colleague from Leipzig whom I hadn’t met before. I told him I worked in Nijmegen and his honest and immediate answer was: ‘Ah, Nijmegen, do you know Richard Kunert?’ Just as Erasmus became an ambassador for Rotterdam, Richard, you had become an ambassador for Nijmegen. Would you ever have figured?

Your project continued, data got analyzed, manuscripts got written. With it came your doubts and disappointments. You felt like you played by the rules of science, but that someone had changed the rules without telling you.

The elusive practices of peer-review, the habit of story-telling over data, coupled with an ongoing replication crisis, made you rethink your future as an academic. With your characteristic determination you decided to leave academia, even if that meant giving up a plan, a dream as you said, which you had been pursuing ever since your bachelor’s time in Glasgow.

During your defense many have expressed their praise for your intellectual achievements, and I will do so again here. You have done a fantastic PhD project, and all the praise you get is well-deserved. But your choice to give up on your initial plan when you found out that the academic world was not your place wins you more than praise. I find it really remarkable that you chose what felt like the best thing to do, to change your initial plan, even if the outcome is unknown.

Strange as it might sound, this brings us back to Erasmus, regarded as one of the most influential thinkers of his time. More than 500 years ago Erasmus started a doctoral project in Paris. He never finished it. He didn’t like the way theology, his subject matter, was taught, and dropped out early. He too felt that academia as it was, was not the way he wanted it to be, albeit for different reasons than the replication crisis or peer review. He too sensed that his proper place was elsewhere. […]

Rich Data

I may have left Brain’s Idea, but I haven’t left blogging. My new data science blog is called Rich Data. You should check it out: rikunert.com. See you there.

The curious effect of a musical rhythm on us

Do you know the feeling of a musical piece moving you? What is this feeling? One common answer by psychological researchers is that what you feel is your attention moving in sync with the music. In a new paper I show that this explanation is mistaken.

Watch the start of the following video and observe carefully what is happening in the first minute or so (you may stop it after that).

Noticed something? Nearly everyone in the audience moved to the rhythm, clapping, moving the head etc. And you? Did you move? I guess not. You probably looked carefully at what people were doing instead. Your reaction illustrates nicely how musical rhythms affect people according to psychological researchers. One very influential theory claims that your attention moves up and down in sync with the rhythm. It treats the rhythm like you treated it. It simply ignores the fact that most people love moving to the rhythm.

The theory: a rhythm moves your attention

Sometimes we have gaps of attention. Sometimes we manage to concentrate really well for a brief moment. A very influential theory, which has been supported in various experiments, claims that these fluctuations in attention are synced to the rhythm when hearing music. Attention is up at rhythmically salient moments, e.g., the first beat in each bar. And attention is down during rhythmically unimportant moments, e.g., off-beat moments.

This makes intuitive sense. Important tones, e.g., those determining the harmonic key of a music piece, tend to occur at rhythmically salient moments. Looking at language rhythm reveals a similar picture. Stressed syllables are important for understanding language and signal moments of rhythmic salience. It makes sense to attend well during moments which include important information.

The test: faster decisions and better learning?

Together with Suzanne Jongman, I asked whether attention really is up at rhythmically salient moments. If so, people should make decisions faster when a background rhythm reaches a moment of rhythmic importance, as if they briefly concentrated better at that moment. This is indeed what we found. People are faster at judging whether a few letters on the screen are a real word or not if the letters are shown near a salient moment of a background rhythm, compared to another moment.

However, we went further. People should also learn new words better if they are shown near a rhythmically salient moment. This turned out not to be the case. Whether people have to memorise a new word at a moment when their attention is allegedly up or down (according to a background rhythm) does not matter. Learning is just as good.

What is more, even those people who react really strongly to the background rhythm in terms of speeding up a decision at a rhythmically salient moment (red square in Figure below), even those people do not learn new words better at the same time as they speed up.

It’s as if the speed-up of decisions is unrelated to the learning of new words. That’s weird because both tasks are known to be affected by attention. This makes us doubt that a rhythm affects attention. What could it affect instead?


Figure 1. Every dot is one of 60 participants. How much a background rhythm sped up responses is shown horizontally. How much the same rhythm, at the same time, facilitated pseudoword memorisation is shown on the vertical axis. The red square singles out the people who were most affected by the rhythm in terms of their decision speed. Notice that, at the same time, their learning is unaffected by the rhythm.

The conclusion: a rhythm does not move your attention, it moves your muscles

To our own surprise, a musical rhythm appears not to affect how your attention moves up and down, when your attentional lapses happen, or when you can concentrate well. Instead, it simply appears to affect how fast you can press a button, e.g., when indicating a decision whether a few letters form a word or not.

Thinking back to the video at the start, I guess this just means that people love moving to the rhythm because the urge to do so is a direct consequence of understanding a rhythm. Somewhere in the auditory and motor parts of the brain, rhythm processing happens. However, this has nothing to do with attention. This is why learning a new word shown on the screen – a task without an auditory or motor component – is not affected by a background rhythm.

The paper: the high point of my career

You may read all of this yourself in the paper (here). I have to admit that in many ways this paper is how I like to see science done, so I will shamelessly tell you of its merits. The paper is not too long (7,500 words) but includes no fewer than 4 experiments with at least 60 participants each. Each experiment tests the research question individually. However, the experiments build on each other in such a way that their combination makes the overall paper stronger than any individual experiment ever could.

In terms of analyses, we put in everything we could think of. All analyses are both Bayesian (subjective Bayes factor) and frequentist (p-values). We report hypothesis testing analyses (Bayes factors, p-values) and parameter estimation analyses (effect sizes, confidence intervals, credible intervals). If you can think of yet another analysis, go for it: we publish the raw data and analysis code alongside the article.

The most important reason why this paper represents my favoured approach to science, though, is because it actually tests a theory. A theory I and my co-author truly believed in. A theory with a more than 30-year history. With a varied supporting literature. With a computational model implementation. With more than 800 citations for two key papers. With, in short, everything you could wish to see in a good theory.

And we falsified it! Instead of thinking of the learning task as ‘insensitive’ or as ‘a failed experiment’, we dug deeper and couldn’t help but conclude that the attention theory of rhythm perception is probably wrong. We actually learned something from our data!

PS: no-one is perfect and neither is this paper. I wish we had pre-registered at least one of the experiments. I also wish the paper was open access (see a free copy here). There is room for improvement, as always.

— — —
Kunert, R., & Jongman, S. R. (2017). Entrainment to an auditory signal: Is attention involved? Journal of Experimental Psychology: General, 146(1), 77-88. PMID: 28054814

Discovering a glaring error in a research paper – a personal account

New York Magazine has published a great article about how grad student Steven Ludeke tried to correct mistakes in the research of Pete Hatemi and Brad Verhulst. Overall, Ludeke summarises his experience as ‘not recommendable’. Back in my undergraduate years I spotted an error in an article by David DeMatteo and did little to correct it. Why?

Christian Bale playing a non-incarcerated American Psycho.

David DeMatteo, assistant professor in Psychology at Drexel University, investigates psychopathy. In 2010, I was a lowly undergraduate student and noticed a glaring mistake in one of his top ten publications which has now been cited 50 times according to Google Scholar.

The error

The study investigated the characteristics of psychopaths who live among us, the non-incarcerated population. How do these psychopaths manage to avoid prison? DeMatteo et al. (2006) measured their psychopathy in terms of personality features and in terms of overt behaviours. ‘Participants exhibited the core personality features of psychopathy (Factor 1) to a greater extent than the core behavioral features of psychopathy (Factor 2). This finding may be helpful in explaining why many of the study participants, despite having elevated levels of psychopathic characteristics, have had no prior involvement with the criminal justice system.’ (p. 142)

The glaring mistake in this publication is that the Factor 2 score of 7.1 (the behavioural features of psychopathy) is actually higher than the Factor 1 score of 5.2 (the personality features of psychopathy). The numbers tell the exact opposite story to the words.


The error in short. The numbers obviously do not match up with the statement.

The numbers are given twice in the paper (p. 138 and p. 139), making a typo unlikely. Adjusting the scores for the maxima of the scales they come from (Factor 1: x/x_max = 0.325 < Factor 2: x/x_max = 0.394) or for the sample maxima (Factor 1: x/x_max_obtained = 0.433 < Factor 2: x/x_max_obtained = 0.44375) makes no difference. No outlier rejection is mentioned in the paper.
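The check above is simple arithmetic, sketched here in R. The scale and sample maxima are not quoted in this post; they are back-calculated from the ratios above (5.2/0.325 = 16, 7.1/0.394 ≈ 18, 5.2/0.433 ≈ 12, 7.1/0.44375 = 16), so treat them as assumptions rather than values taken from the paper.

```r
# Factor means reported by DeMatteo et al. (2006), as quoted above
factor_means <- c(factor1 = 5.2, factor2 = 7.1)

# Assumed maxima, back-calculated from the ratios quoted in the text
scale_max  <- c(factor1 = 16, factor2 = 18)  # maxima of the scales
sample_max <- c(factor1 = 12, factor2 = 16)  # highest scores obtained in the sample

rel_scale  <- factor_means / scale_max
rel_sample <- factor_means / sample_max
print(round(rel_scale, 3))
print(round(rel_sample, 5))

# The paper's claim needs Factor 1 > Factor 2; check it under every scaling
claim_holds <- c(raw    = factor_means[["factor1"]] > factor_means[["factor2"]],
                 scale  = rel_scale[["factor1"]]    > rel_scale[["factor2"]],
                 sample = rel_sample[["factor1"]]   > rel_sample[["factor2"]])
print(claim_holds)
```

Under all three ways of looking at the numbers, the claim comes out FALSE.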

In sum, it appears as if DeMatteo and his co-authors interpret their numbers in a way which makes intuitive sense but which is in direct contradiction to their own data. When researchers disagree with their own data, we have a real problem.

The reaction

1) Self doubt. I consulted with my professor (the late Paddy O’Donnel) who confirmed the glaring mistake.

2) Contact the author. I contacted DeMatteo in 2010 but his e-mail response was evasive and did nothing to resolve the issue. I have contacted him again, inviting him to react to this post.

3) Check others’ reactions. I found three publications which cited DeMatteo et al.’s article (Rucevic, 2010; Gao & Raine, 2010; Ullrich et al., 2008) and simply ignored the contradictory numbers. They went with the story that community-dwelling psychopaths show psychopathic personalities more than psychopathic behaviours, even though the data in the article favour the exact opposite conclusion.

4) Realising my predicament. At this point I realised my options. Either I pursued this with full force while finishing my degree and then moving on to my Master’s in a different country, or I let it go. I had a suspicion which Ludeke’s story in New York Magazine confirmed: in these situations one has much to lose and little to gain. Pursuing a mistake in the research literature is ‘clearly a bad choice’ according to Ludeke.

The current situation

And now this blog post detailing my experience. Why? Well, on the one hand, I have very little to lose from a disagreement with DeMatteo as I certainly don’t want a career in law psychology research and perhaps not even in research in general. The balance went from ‘little to gain, much to lose’ to ‘little to gain, little to lose’. On the other hand, following my recent blog posts and article (Kunert, 2016) about the replication crisis in Psychology, I have come to the conclusion that science cynicism is not the way forward. So, I finally went fully transparent.

I am not particularly happy with how I handled this whole affair. I have zero documentation of my contact with DeMatteo. So, expect his word to stand against mine soon. I also feel I should have taken a risk earlier in exposing this. But then, I used to be passionate about science and wanted a career in it. I didn’t want to make enemies before I had even started my Master’s degree.

In short, only once I stopped caring about my career in science did I find the space to care about science itself.

— — —

DeMatteo, D., Heilbrun, K., & Marczyk, G. (2006). An empirical investigation of psychopathy in a noninstitutionalized and noncriminal sample. Behavioral Sciences & the Law, 24(2), 133-146. DOI: 10.1002/bsl.667

Gao, Y., & Raine, A. (2010). Successful and unsuccessful psychopaths: A neurobiological model. Behavioral Sciences & the Law. DOI: 10.1002/bsl.924

Kunert, R. (2016). Internal conceptual replications do not increase independent replication success. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-016-1030-9

Rucević, S. (2010). Psychopathic personality traits and delinquent and risky sexual behaviors in Croatian sample of non-referred boys and girls. Law and Human Behavior, 34(5), 379-391. PMID: 19728057

Ullrich, S., Farrington, D., & Coid, J. (2008). Psychopathic personality traits and life-success. Personality and Individual Differences, 44(5), 1162-1171. DOI: 10.1016/j.paid.2007.11.008

— — —

Update 16/11/2016: corrected a numerical typo in the sentence beginning ‘Adjusting the scores for the maxima…’, pointed out to me by Tom Foulsham via Twitter (@TomFoulsh).

10 things I learned while working for the Dutch science funding council (NWO)


The way science is currently funded is very controversial. During the last 6 months I was on a break from my PhD and worked for the organisation funding science in the Netherlands (NWO). These are 10 insights I gained.


1) Belangenverstrengeling

This is the first word I learned when arriving in The Hague. There is an anal obsession with avoiding (any potential for) conflicts of interest (belangenverstrengeling in Dutch). It might not seem a big deal to you, but it is a big deal at NWO.


2) Work ethic

Work e-mails on Sunday evening? Check. Unhealthy deadline obsession? Check. Stories of burn-out diagnoses? Check. In short, I found no evidence for the mythical low work ethic of NWO. My colleagues seemed to be in a perfectly normal, modern, semi-stressful job.


3) Perks

While the career prospects at NWO are somewhat limited, there are some nice perks to working in The Hague, including: an affordable, good canteen, free fruit all day, subsidised in-house gym, free massage (unsurprisingly, with a waiting list from hell), free health check … The work atmosphere is, perhaps as a result, quite pleasant.


4) Closed access

Incredible but true, NWO does not have access to the pay-walled research literature it funds. Among other things, I was tasked with checking that research funds were appropriately used. You can imagine that this is challenging if the end-product of science funding (scientific articles) is beyond reach. Given a Herculean push to make all Dutch scientific output open access, this problem will soon be a thing of the past.


5) Peer-review

NWO itself does not generally assess grant proposals in terms of content (except for very small grants). What it does is organise peer-review, very similar to the peer-review of journal articles. My impression is that the peer-review quality is similar if not better at NWO compared to the journals that I have published in. NWO has minimum standards for reviewers and tries to diversify the national/scientific/gender background of the reviewer group assigned to a given grant proposal. I very much doubt that this is the case for most scientific journals.


6) NWO peer-reviewed

NWO itself also applies for funding, usually to national political institutions, businesses, and the EU. Got your grant proposal rejected at NWO? Find comfort in the thought that NWO itself also gets rejected.


7) Funding decisions in the making

In many ways my fears for how it is decided who gets funding were confirmed. Unfortunately, I cannot share more information other than to say: science has a long way to go before focussing rewards on good scientists doing good research.


8) Not funding decisions

I worked on grants which were not tied to some societal challenge, political objective, or business need. The funds I helped distribute are meant to simply facilitate the best science, no matter what that science is (often blue sky research, Vernieuwingsimpuls for people in the know). Approximately 10% of grant proposals receive funding. In other words, bad apples do not get funding. Good apples also do not get funding. Very good apples equally get zero funding. Only outstanding/excellent/superman apples get funding. If you think you are good at what you do, do not apply for grant money through the Vernieuwingsimpuls. It’s a waste of time. If, on the other hand, you haven’t seen someone as excellent as you for a while, then you might stand a chance.


9) Crisis response

Readers of this blog will be well aware that the field of psychology is currently going through something of a revolution related to depressingly low replication rates of influential findings (Open Science Collaboration, 2015; Etz & Vandekerckhove, 2016; Kunert, 2016). To my surprise, NWO wants to play its part to overcome the replication crisis engulfing science. I arrived at a fortunate moment, presenting my ideas of the problem and potential solutions to NWO. I am glad NWO will set aside money just for replicating findings.


10) No civil servant life for me

Being a junior policy officer at NWO turned out to be more or less the job I thought it would be. It was monotonous, cognitively relaxing, and low on responsibilities. In other words, quite different to doing a PhD. Other PhD students standing at the precipice of a burnout might also want to consider this as an option to get some breathing space. For me, it was just that, but not more than that.

— — —

This blog post does not represent the views of my former or current employers. NWO did not endorse this blog post. As far as I know, NWO doesn’t even know that this blog post exists.

— — —

Etz, A., & Vandekerckhove, J. (2016). A Bayesian Perspective on the Reproducibility Project: Psychology. PLOS ONE, 11(2). DOI: 10.1371/journal.pone.0149794

Kunert, R. (2016). Internal conceptual replications do not increase independent replication success. Psychonomic Bulletin & Review. PMID: 27068542

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251). DOI: 10.1126/science.aac4716

Psychological researchers need to change their practices: here’s why

Why is a surprising amount of psychological research unreplicable? Psychology calls itself a science but often falls short on the replication test of scientific merit. I took a closer look at the data to find out why. The journal Psychonomic Bulletin and Review will publish the findings very soon, but the accepted pre-print is already available now (download it here). The take-home message is that psychology just cannot go on like this.

The scientific journal Science called it one of the breakthroughs of the year 2015. 270 researchers combined forces to estimate the reproducibility of their field. Surprises were guaranteed but what they found was more dispiriting than anything. Psychological science is not nearly as replicable as expected: instead of the predicted 92% replication rate only 36% of previously reported effects were found in the replication studies. This take-home message led to a media frenzy which ‘completely overwhelmed’ the lead author Brian Nosek.

What many people do not realise is that psychologists aren’t actually sure why it was so difficult to reproduce previous studies. There are essentially two big camps. One camp, most recently re-invigorated by a critical commentary in the journal Science, suggests that the replication teams did not reproduce all details of previous studies. Essentially, they ran slightly different studies compared to the originals and, unsurprisingly, they got different results.

A second camp suggests that it was so difficult to find the original effects in the replication studies because many original effects never actually existed or were much smaller than reported. According to this view, psychological researchers apply all sorts of tricks to their data in order to find what they are after. Original researchers find beautiful patterns in the chaos of the real world but the patterns are not much more than a product of their hopes and dreams of promotion.

I took a closer look at the data in order to see which camp got it right. My idea is simple: if an effect was already replicated before Brian Nosek and his huge team began their work then it should be easier for the huge reproducibility team to also reproduce it. In essence, if you see an effect twice you are more likely to see it a third time compared to only having observed it once.

This is actually the prediction of the first ‘change in details’ camp. Previous replications by the original authors are hardly ever of the exact kind, so previous successful replications despite small differences suggest that a new replication by the reproducibility team will also succeed despite possibly small deviations from the original study.

What I found was the opposite. There is no difference in the replication success of psychological effects, whether they were previously replicated by the original authors or not (see previous blog post). How can that be? I believe that this result supports the second camp. If you can apply all sorts of tricks to your data once, then you can also apply them twice. What looks like a replication is sometimes just trickery applied to two different data sets. So, the fact that a subset of psychological effects which one would expect to be more easily replicated are no different to all other psychological effects suggests that questionable research practices are quite common in psychology.
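The reasoning of the two camps can be put into numbers with Bayes’ rule. The following R sketch uses made-up values: 30% of tested effects are real, honest research has a false-positive rate of .05 and 50% power, and a questionable-research-practice scenario has a false-positive rate of .60 and 95% power. It assumes the original study and the internal replication are independent once we know whether the effect is real, and computes how likely an honest independent replication with 92% power is to succeed after one versus two significant results from the original authors.

```r
prior_true  <- 0.3   # hypothetical share of tested effects that are real
indep_power <- 0.92  # power of the independent replication
indep_alpha <- 0.05  # false-positive rate of the independent replication

# Success probability of an independent replication after the original
# authors obtained k significant results, each with the given
# false-positive rate (alpha) and power
indep_success <- function(k, alpha, power) {
  p_true <- prior_true * power^k /
    (prior_true * power^k + (1 - prior_true) * alpha^k)
  p_true * indep_power + (1 - p_true) * indep_alpha
}

honest <- c(once  = indep_success(1, alpha = 0.05, power = 0.50),
            twice = indep_success(2, alpha = 0.05, power = 0.50))
qrp    <- c(once  = indep_success(1, alpha = 0.60, power = 0.95),
            twice = indep_success(2, alpha = 0.60, power = 0.95))
# Extreme case: tricks make a significant result as likely for null effects as for real ones
limit  <- c(once  = indep_success(1, alpha = 0.90, power = 0.90),
            twice = indep_success(2, alpha = 0.90, power = 0.90))
print(round(rbind(honest, qrp, limit), 3))
```

In this toy model, an internal replication raises the independent replication rate much less under questionable research practices, and not at all once the false-positive rate equals the power, because a significant result then carries no information about whether the effect is real.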

Fortunately, there are already good initiatives which change scientific practices in order to improve the situation. I am confident that in a few years from now psychology will fare much better in terms of reproducibility. Until then, articles like mine will hopefully convince the last doubters that psychology really should not continue using questionable research practices.

— — —

Gilbert, D., King, G., Pettigrew, S., & Wilson, T. (2016). Comment on “Estimating the reproducibility of psychological science”. Science, 351(6277), 1037. DOI: 10.1126/science.aad7243

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251). DOI: 10.1126/science.aac4716

Why are Psychological findings mostly unreplicable?

Take 97 psychological effects from top journals which are claimed to be robust. How many will replicate? Brian Nosek and his huge team tried it out and the results were sobering, to say the least. How did we get here? The data give some clues.

Sometimes the title of a paper just sounds incredible. Estimating the reproducibility of psychological science. No one had ever systematically, empirically investigated this for any science. Doing so would require huge resources. The countless authors on this paper which appeared in Science last week went to great lengths to try anyway and their findings are worrying.

When they tried to replicate 97 statistically significant effects with 92% power (i.e. a nominal 92% chance of finding the effect should it exist as claimed by the original discoverers), about 89 statistically significant effects should have popped up. Only 35 did. Why weren’t 54 more studies replicated?
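The expected count and the shortfall follow from a simple binomial model. This R sketch idealises each replication as an independent trial with exactly 92% power; real replications differed in power, so treat the tail probability as illustrative only.

```r
n_effects <- 97   # significant original effects that were re-tested
power     <- 0.92 # nominal power of each replication (idealised)
observed  <- 35   # significant replications actually found

expected  <- n_effects * power           # 89.24
shortfall <- round(expected) - observed  # 54

# Chance of 35 or fewer significant replications if every effect were real
# and each replication independently had 92% power
p_35_or_fewer <- pbinom(observed, size = n_effects, prob = power)
print(c(expected = expected, shortfall = shortfall, p = p_35_or_fewer))
```

Under these assumptions, 35 successes would be essentially impossible, which is why the shortfall calls for an explanation.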

The team behind this article also produced 95% confidence intervals of the replication study effect sizes. Despite their name, only about 83% of them should contain the original effect size, even if both studies estimate the same true effect. Only 47% actually did. Why were most effect sizes much smaller in the replication?
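The 83% figure can be checked analytically and by simulation. The R sketch below assumes that original and replication estimate the same true effect with equal standard errors and that the original estimate was not selected for significance; the true effect of 0.3 and standard error of 0.1 are arbitrary. Because the difference between the two estimates has a standard deviation of sqrt(2) standard errors, the replication’s 95% interval captures the original estimate with probability 2 * pnorm(1.96 / sqrt(2)) - 1, about .834.

```r
z <- qnorm(0.975)  # 1.96 for a 95% interval

# Analytic capture probability with equal standard errors
capture_analytic <- 2 * pnorm(z / sqrt(2)) - 1

# Simulation check (hypothetical true effect and standard error)
set.seed(2015)
n_sim       <- 1e5
true_effect <- 0.3
se          <- 0.1
original    <- rnorm(n_sim, mean = true_effect, sd = se)
replication <- rnorm(n_sim, mean = true_effect, sd = se)

# Does the replication's 95% CI contain the original point estimate?
captured    <- abs(original - replication) < z * se
capture_sim <- mean(captured)
print(round(c(analytic = capture_analytic, simulated = capture_sim), 3))
```

With unequal standard errors (e.g. larger replication samples), the expected capture rate differs, so 83% is a benchmark for the equal-precision case.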

One reason for poor replication: sampling until significant

I believe much has to do with so-called questionable research practices which I blogged about before. The consequences of this are directly visible in the openly available data of this paper. Specifically, I am focussing on the widespread practice of sampling more participants until a test result is statistically desirable, i.e. until you get a p-value below the arbitrary threshold of 0.05. The consequence is this:


Focus on the left panel first. The green replication studies show a moderate relation between the effect size they found and their pre-determined sample size. This is to be expected, as the replicators wanted to be sure that they had sufficient statistical power to find their effects. Expecting small effects (lower on vertical axis) makes you plan for more participants (further right on horizontal axis). The replicators simply sampled their pre-determined number and then analysed the data. Such a practice leads to only a moderate correlation between measured effect size and sample size, because the measured effect size is still uncertain when you start sampling.

The red original studies show a stronger relation between the effect size they found and their sample size. They must have done more than just smart a priori power calculations. I believe that they sampled until their effect was statistically significant, going back and forth between sampling and analysing their data. If, by chance, the first few participants showed the desired effect quite strongly, experimenters were happy with overestimating their effect size and stopped early. These would be red data values in the top left of the graph. If, on the other hand, the first few participants gave equivocal results, the experimenters continued for as long as necessary. Notice how this approach links sample size to the effect size measured in the experiment, hence the strong statistical relation. The approach by the replicators links the sample size merely to the expected effect size estimated before the experiment, hence the weaker association with the actually measured effect size.
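The two sampling strategies can be contrasted in a small R simulation. All parameters are hypothetical: true effects between d = 0.2 and 0.8, fixed-N studies planned for 90% power from a guess of the effect that is off by a log-normal factor, and optional-stopping studies that start with 10 participants per group, add 5 per group after each non-significant t-test, and give up at 200. Only optional-stopping results that are significant in the predicted direction are kept, mimicking the file drawer.

```r
set.seed(2015)
n_studies <- 300
true_d    <- runif(n_studies, 0.2, 0.8)  # hypothetical true effects

# Cohen's d with pooled SD (equal group sizes)
cohens_d <- function(x, y) (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)

# Replication style: n per group fixed in advance from a noisy guess of d
# (90% power, two-sided alpha = .05, normal approximation)
fixed_n_study <- function(d) {
  d_guess <- max(d * exp(rnorm(1, 0, 0.5)), 0.1)
  n <- ceiling(2 * ((qnorm(0.975) + qnorm(0.90)) / d_guess)^2)
  x <- rnorm(n, mean = d); y <- rnorm(n, mean = 0)
  c(n = n, d = cohens_d(x, y))
}

# Sampling until significant: test after every batch, stop at p < .05 or n_max
optional_stopping_study <- function(d, n_start = 10, step = 5, n_max = 200) {
  x <- rnorm(n_start, mean = d); y <- rnorm(n_start, mean = 0)
  p <- t.test(x, y, var.equal = TRUE)$p.value
  while (p >= 0.05 && length(x) < n_max) {
    x <- c(x, rnorm(step, mean = d)); y <- c(y, rnorm(step, mean = 0))
    p <- t.test(x, y, var.equal = TRUE)$p.value
  }
  c(n = length(x), d = cohens_d(x, y), p = p)
}

fixed    <- t(sapply(true_d, fixed_n_study))
stopping <- t(sapply(true_d, optional_stopping_study))
# File drawer: only significant results in the predicted direction survive
published <- stopping[stopping[, "p"] < 0.05 & stopping[, "d"] > 0, ]

rho_fixed    <- cor(fixed[, "n"], fixed[, "d"], method = "spearman")
rho_optional <- cor(published[, "n"], published[, "d"], method = "spearman")
print(c(fixed = rho_fixed, optional_stopping = rho_optional))
```

In runs of this kind, the optional-stopping correlation should come out clearly more negative than the fixed-N one, because the stopping rule ties each study’s final sample size to the effect it happened to observe.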

The right panel shows a Bayesian correlation analysis of the data. What you are looking at is the belief in the strength of the correlation, called the posterior distribution. The overlap of the distributions can be used as a measure of believing that the correlations are not different. The overlap is less than 7%. If you are more inclined to believe in frequentist statistics, the associated p-value is .001 (Pearson and Filon’s z = 3.355). Therefore, there is strong evidence that original studies display a stronger negative correlation between sample size and measured effect size than replication studies.

The approach which – I believe – has been followed by the original research teams should be accompanied by adjustments of the p-value (see Lakens, 2014 for how to do this). If not, you misrepresent your stats and lower the chances of replication, as shown in simulation studies (Simmons et al., 2011). It is estimated that 70% of psychological researchers have sampled until their result was statistically significant without correcting their results for this (John et al., 2012). This might very well be one of the reasons why replication rates in Psychology are far lower than what they should be.
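To see why a correction is needed, the following R sketch simulates studies with no true effect in which the data are analysed once halfway (50 participants per group) and once at the end (100 per group); this look schedule is hypothetical. Testing each look at .05 inflates the false-positive rate well above 5%. Testing each look at .0294, the standard Pocock threshold for two equally spaced looks and one example of the sequential-analysis corrections of the kind Lakens (2014) describes, keeps it near 5%.

```r
set.seed(2014)
n_sim  <- 10000
n_look <- c(50, 100)  # hypothetical participants per group at each look

# Under the null hypothesis, record the p-value at each look
# (result: a 2 x n_sim matrix, one row per look)
p_values <- replicate(n_sim, {
  x <- rnorm(max(n_look))
  y <- rnorm(max(n_look))
  sapply(n_look, function(n)
    t.test(x[seq_len(n)], y[seq_len(n)], var.equal = TRUE)$p.value)
})
# A study "finds" an effect if any look falls below the per-look threshold
min_p <- apply(p_values, 2, min)

fpr_uncorrected <- mean(min_p < 0.05)    # peeking, each look tested at .05
fpr_pocock      <- mean(min_p < 0.0294)  # Pocock threshold for two looks
print(round(c(uncorrected = fpr_uncorrected, pocock = fpr_pocock), 3))
```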

So, one approach to boosting replication rates might be to do what we claim to do anyway and what the replication studies have actually done: acquiring data first, analysing it second. Alternatively, be open about what you did and correct your results appropriately. Otherwise, you might publish nothing more than a fluke finding with no basis.

[24/10/2015: Added Bayesian analysis and changed figure. Code below is from old figure.]

[27/11/2015: Adjusted percentage overlap of posterior distributions.]

— — —
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532. PMID: 22508865

Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701-710. DOI: 10.1002/ejsp.2023

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251). PMID: 26315443

Simmons, J., Nelson, L., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359-1366. DOI: 10.1177/0956797611417632

— — —

Code for reproducing the original version of the figure (if you find mistakes, please tell me!):

## Estimating the association between sample size and effect size from data provided by the reproducibility project https://osf.io/vdnrb/

#Richard Kunert for Brain's Idea 3/9/2015
#load necessary libraries
library(httr)     # GET(), write_disk()
library(ggplot2)  # ggplot(), theme_set() and the geom_* layers
library(Hmisc)    # rcorr()
library(cocor)    # cocor(): comparing correlation coefficients

#get raw data from OSF website
info <- GET('https://osf.io/fgjvw/?action=download', write_disk('rpp_data.csv', overwrite = TRUE)) #downloads data file from the OSF
MASTER <- read.csv("rpp_data.csv")[1:167, ]
colnames(MASTER)[1] <- "ID" # Change first column name to ID to be able to load .csv file

#restrict studies to those with appropriate data
studies <- MASTER$ID[!is.na(MASTER$T_r..O.) & !is.na(MASTER$T_r..R.)] ##to keep track of which studies are which
studies <- studies[-31]##remove one problem study with absurdly high sample size (N = 23,0047)

#set font size for plotting
theme_set(theme_gray(base_size = 30))

#prepare correlation coefficients
dat_rank <- data.frame(sample_size_O = rank(cbind(MASTER$T_N_O_for_tables[studies])),
sample_size_R = rank(cbind(MASTER$T_N_R_for_tables[studies])),
effect_size_O = rank(cbind(MASTER$T_r..O.[studies])),
effect_size_R = rank(cbind(MASTER$T_r..R.[studies])))
corr_O_Spearm = rcorr(dat_rank$effect_size_O, dat_rank$sample_size_O, type = "spearman")#yes, I know the type specification is superfluous
corr_R_Spearm = rcorr(dat_rank$effect_size_R, dat_rank$sample_size_R, type = "spearman")

#compare Spearman correlation coefficients using cocor (data needs to be ranked in order to produce Spearman correlations!)
htest = cocor(formula=~sample_size_O + effect_size_O | sample_size_R + effect_size_R,
data = dat_rank, return.htest = FALSE)

#prepare data frame
dat_vis <- data.frame(study = rep(c("Original", "Replication"), each=length(studies)),
sample_size = rbind(cbind(MASTER$T_N_O_for_tables[studies]), cbind(MASTER$T_N_R_for_tables[studies])),
effect_size = rbind(cbind(MASTER$T_r..O.[studies]), cbind(MASTER$T_r..R.[studies])))

#The plotting call
ggplot(data=dat_vis, aes(x=sample_size, y=effect_size, group=study)) +#the basic scatter plot
geom_point(aes(color=study),shape=1,size=4) +#specify marker size and shape
scale_colour_hue(l=50) + # Use a slightly darker palette than normal
geom_smooth(method=lm,   # Add linear regression lines
se=FALSE,    # Don't add shaded confidence region
aes(color=study))+#colour lines according to data points for consistency
geom_text(aes(x=750, y=0.46,
label=sprintf("Spearman rho = %1.3f (p = %1.3f)",
corr_O_Spearm$r[1,2], corr_O_Spearm$P[1,2]),
color="Original", hjust=0)) +#add text about Spearman correlation coefficient of original studies
guides(color = guide_legend(title=NULL)) + #avoid additional legend entry for text
geom_text(aes(x=750, y=0.2,
label=sprintf("Spearman rho = %1.3f (p = %1.3f)",
corr_R_Spearm$r[1,2], corr_R_Spearm$P[1,2]),
color="Replication", hjust=0))+#add text about Spearman correlation coefficient of replication studies
geom_text(x=1500, y=0.33,
label=sprintf("Difference: Pearson & Filon z = %1.3f (p = %1.3f)",
htest@pearson1898$statistic, htest@pearson1898$p.value),
color="black", hjust=0)+#add text about testing difference between correlation coefficients
guides(color = guide_legend(title=NULL))+#avoid additional legend entry for text
ggtitle("Sampling until significant versus a priori power analysis")+#add figure title
labs(x="Sample Size", y="Effect size r")#add axis titles