Whose discovery? Declining Effect Sizes vs. Growing Influence

It is good practice in scientific publications to cite the original paper that established an idea, technique or model. However, there are worrying signs that such original papers exaggerate their effect sizes and are thus less trustworthy than later studies. Torn between granting credit where credit is due and writing about solid scientific findings, how should one decide?
A good example of a scientific breakthrough which established an idea comes from the neurosciences. Today, much of brain imaging is concerned with the localisation of function: Where in the brain is x? Whether we want to localise vision, intelligence or consciousness does not appear to matter to some researchers. There is much to say about this approach, its assumptions and its value. However, in this post I want to talk about its beginnings.
The first successful brain localisation was claimed in the 19th century by the anatomist Paul Broca. His discovery that the left inferior frontal cortex (the area below your left temple) is important for speech remains widely influential. One of his patients had apparently normal general mental abilities but impaired speech. After the patient died, Broca performed a post mortem and found a lesion in the left frontal cortex. The same was true for 12 other patients. This was a breakthrough in the quest to link the brain to behaviour. The brain area Broca identified as crucial for speech output is still called Broca’s area, and the speech impairment resulting from damage to it is still called Broca’s aphasia.
Discoveries like these can be game changers. However, there are worrying signs that effect sizes shrink as results get replicated. Below is an example from psycholinguistics. Two sentences can mean practically the same thing, such as ‘Peter gives the toy to Anna.’ (using a so-called Prepositional Object) and ‘Peter gives Anna the toy.’ (using a so-called Direct Object). When language users have to decide which construction to use, they show a strong tendency to unconsciously adopt the one they heard just before. This effect is called syntactic priming and is typically associated with Kathryn Bock’s 1986 article in the journal Cognitive Psychology. It has been used to argue for the convergence of syntax in dialogue, for shared mental resources for comprehension and production, and for overlap between the various languages available to a speaker.
As can be seen below, the effect size of Prepositional vs. Direct Object priming has declined since Bock’s initial characterisation. Because this is a new, unpublished finding, let me explain what the figure shows (you can skip the next two paragraphs if you trust me). The vertical axis shows a standard way of reporting the priming effect size. Note that the way this is calculated can differ between studies, so I recalculated all values, ignoring answers that could not be classified as either a Prepositional or a Direct Object. The horizontal axis is simply the publication year. The thirteen data points (effect sizes) come from all studies I could find on Web of Science in December 2011 that replicated Bock’s initial finding using the same task (picture description), totalling 1169 participants. The standard way of looking for an association between two variables (here, effect size and time) is the Pearson correlation coefficient. It is significantly negative, indicating that the later a study was published, the lower its effect size.
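To make the two quantities in the figure concrete, here is a minimal Python sketch. The post does not say which effect-size formula it uses, so this assumes one common choice purely for illustration: the proportion of Prepositional Object (PO) responses after PO primes minus the proportion after Direct Object (DO) primes, counting only responses classified as PO or DO. For the significance of r, a permutation test is swapped in because it needs no statistics library; the post does not say how its p-values were computed. The function names and all numbers are hypothetical, not the data behind the figure.

```python
import random
from math import sqrt


def priming_effect(po_after_po, do_after_po, po_after_do, do_after_do):
    """Hypothetical priming measure: P(PO | PO prime) - P(PO | DO prime).

    Arguments are response counts. Responses classified as neither PO nor
    DO are simply not passed in, i.e. they are ignored.
    """
    p_po_after_po_prime = po_after_po / (po_after_po + do_after_po)
    p_po_after_do_prime = po_after_do / (po_after_do + do_after_do)
    return p_po_after_po_prime - p_po_after_do_prime


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for Pearson r.

    Shuffling ys breaks any link with xs; the p-value is the share of
    shuffles (plus the observed data) with |r| at least the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point noise does not drop exact ties.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up counts for one study: 60 PO / 40 DO after PO primes,
# 45 PO / 55 DO after DO primes.
print(round(priming_effect(60, 40, 45, 55), 2))  # 0.15

# Made-up (year, effect size) pairs; a negative r means later = smaller.
years = [1986, 1990, 1996, 1999, 2004, 2010]
effects = [0.22, 0.18, 0.15, 0.07, 0.11, 0.06]
print(round(pearson_r(years, effects), 2), permutation_p_value(years, effects))
```

With only a handful of studies, a permutation test and the usual t-test on r (with n - 2 degrees of freedom) can give somewhat different p-values, so this sketch is not meant to reproduce the numbers reported in the post.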

One should not over-interpret this finding. It could be due to a) unusual effect sizes, b) publication bias or c) task differences. What happens if I control for these things?
a) To check whether unusually high values early on or unusually low values later on carry the effect, I replicate it with a so-called non-parametric test (Kendall’s tau). This test uses only the rank order of the values: it counts how many pairs of studies are ordered the same way on both variables (concordant) versus in opposite ways (discordant). The effect size is still significantly negatively associated with publication year, so extreme values (also called outliers) do not carry the effect.
b) Furthermore, the size of each data point shows the study’s sample size. Some may argue that small studies tend to produce more extreme values, and that the extremely low ones are not published because of publication bias, i.e. journals’ tendency to accept only significant findings. This publication pattern may have weakened over the years due to larger sample sizes or less publication pressure. To rule out this possibility, I statistically control for sample size (via a weighted correlation), and the decline effect still holds. Differences in sample size do not carry the effect.
c) Finally, two publications (three effect sizes, marked in red) differ slightly in methodology: in Bock’s original article, priming went from comprehension to production with a repetition of the prime, whereas these two used a slightly different set-up. Excluding these two studies only strengthens the negative correlations (Pearson r=-0.71 (p=0.02); Kendall’s tau=-0.69 (p=0.01); weighted r=-0.73 (p=0.02)).
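The three checks above can be bundled into one small routine. This is a minimal sketch under stated assumptions: (a) it uses the tau-b variant of Kendall’s tau, since the post does not say which variant was used (tau-b corrects for ties, which arise when several effect sizes share a publication year); (b) it weights each effect size by its study’s number of participants, since the exact weights are not stated; (c) it excludes studies by publication. Because tau only compares the ordering of pairs, inflating a single early value would leave it unchanged. The record fields (`paper`, `year`, `effect`, `n`) and all values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """(a) Kendall's tau-b: concordant vs discordant pairs, corrected for ties."""
    conc = disc = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: ignored by tau-b
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    return (conc - disc) / sqrt((conc + disc + ties_x) * (conc + disc + ties_y))


def weighted_pearson_r(xs, ys, weights):
    """(b) Pearson r in which observation i counts with weight weights[i]."""
    total = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    vx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    vy = sum(w * (y - my) ** 2 for w, y in zip(weights, ys))
    return cov / sqrt(vx * vy)


def robustness_checks(studies, excluded_papers=frozenset()):
    """Run the checks on all studies whose paper is not excluded (c)."""
    kept = [s for s in studies if s["paper"] not in excluded_papers]
    years = [s["year"] for s in kept]
    effects = [s["effect"] for s in kept]
    sizes = [s["n"] for s in kept]
    return {
        # Equal weights reduce the weighted formula to the ordinary Pearson r.
        "pearson_r": weighted_pearson_r(years, effects, [1] * len(kept)),
        "kendall_tau_b": kendall_tau_b(years, effects),
        "weighted_r": weighted_pearson_r(years, effects, sizes),
    }


# Hypothetical records; paper "D" contributes two effect sizes and is
# treated as the design deviant in the second run.
studies = [
    {"paper": "A", "year": 1986, "effect": 0.22, "n": 24},
    {"paper": "B", "year": 1990, "effect": 0.18, "n": 48},
    {"paper": "C", "year": 1996, "effect": 0.15, "n": 64},
    {"paper": "D", "year": 1999, "effect": 0.05, "n": 40},
    {"paper": "D", "year": 1999, "effect": 0.07, "n": 40},
    {"paper": "E", "year": 2004, "effect": 0.11, "n": 96},
    {"paper": "F", "year": 2010, "effect": 0.06, "n": 120},
]

print(robustness_checks(studies))
print(robustness_checks(studies, excluded_papers={"D"}))
```

Keeping the exclusion as a parameter makes it easy to report every statistic with and without the deviating studies, as the post does.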
If anyone has another idea about how this decline effect came about, I would be happy to hear it. For the moment, Bock’s syntactic priming effect appears to decline, mysteriously, with every passing year.
For Paul Broca’s finding, the decline effect has to be recast in terms of what the finding stands for: the localisation of a cognitive function in the brain. Evidence for a decline effect would therefore be that the same brain area is associated with more than one function, and that the function of speech production is associated with more than one brain area.
What sort of functions is Broca’s area associated with today? I checked Brainscanr, a free online tool developed by Voytek, which shows how likely two terms are to co-occur in the scientific literature. As can be seen below, Broca’s area is indeed related to language production. However, it is also associated with language comprehension and even with mirror neurons. The one-area-one-function mapping does not hold for Broca’s area.
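Brainscanr’s exact scoring is not described in this post, so the sketch below uses a simple stand-in to show what “how likely two terms are to co-occur” can mean: the Jaccard index, i.e. the number of papers mentioning both terms divided by the number mentioning either. This is an assumption for illustration, not necessarily Brainscanr’s measure. The paper-ID sets are invented; a real tool would obtain them from a literature database.

```python
def jaccard_cooccurrence(papers_a, papers_b):
    """Share of papers mentioning either term that mention both terms."""
    a, b = set(papers_a), set(papers_b)
    union = a | b
    if not union:
        return 0.0  # neither term appears anywhere
    return len(a & b) / len(union)


# Invented paper-ID sets for three search terms.
mentions = {
    "Broca's area": {1, 2, 3, 4, 5, 6, 7, 8},
    "language production": {1, 2, 3, 4, 9},
    "mirror neurons": {7, 8, 10, 11, 12},
}

for term in ("language production", "mirror neurons"):
    score = jaccard_cooccurrence(mentions["Broca's area"], mentions[term])
    print(f"Broca's area ~ {term}: {score:.2f}")
```

A non-zero score for more than one function is exactly the kind of one-area-many-functions pattern described above, although raw co-occurrence says nothing about why two terms appear together.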
But what about speech production? Perhaps Broca’s area handles several jobs, but each job still resides in a specific place in the brain. Again, I use a free online tool, Neurosynth, to check this. For this example, Neurosynth aggregates 44 studies showing various brain areas associated with speech production. Broca’s area is indeed present, but so is its right-hemisphere homologue, and medial areas (in the middle of the brain) also light up. So the decline effect can be shown quite nicely for the brain localisation of cognitive functions as well.
Jonah Lehrer gives more examples from research on drugs, memory and sexual selection. Note that the decline effect is not a sign of fraudulent research. Instead, it is a good example of how the scientific process refines its methods over time and edges ever closer to agreeing on some basic truths about the world. The problem is that the truth is typically more complicated (read: messier) than initially thought.
But who should take credit for the discovery, then? The original scientist, whose finding did not fully stand the test of time, or the later scientist, whose result is more believable? To me, the current consensus appears to be to credit the former (at least as long as he or she is still alive) and, if need be, to add some findings by the latter. Does this amount to good scientific practice?



Bock, J.K. (1986). Syntactic Persistence in Language Production. Cognitive Psychology, 18, 355-387. doi: 10.1016/0010-0285(86)90004-6

[12/4/2012: UPDATE]

I recently discovered a study that should have been included in the small-scale meta-analysis. The revised values are:

Correlations for the analysis as reported in Figure 1: r=-0.46 (p=0.07); tau=-0.31 (p=0.12); weighted r=-0.45 (p=0.09)

Correlations for the analysis without the two studies that slightly changed the design: r=-0.52 (p=0.07); tau=-0.37 (p=0.12); weighted r=-0.48 (p=0.12)

Thus, the original finding of a decline effect in the syntactic priming literature is at most marginally significant.


  1. Thanks a lot! I was unaware of the Nature commentary.

    It discusses the following reasons for the decline effect. Could they account for the decline effect in syntactic priming?

    – regression to the mean
    would predict a non-linear pattern: one extreme early value followed by ‘average’ values
    –> contradicted by the non-parametric test
    would predict the effect being carried by differences in sample size
    –> contradicted by the weighted correlation

    – changes in participant knowledge
    unlikely, because during debriefing participants are typically asked whether they figured out the manipulation
    Also, there is no social dimension to this task: a participant sits in a room and describes pictures after having heard or read sentences. It is not immediately clear to me how the experimenter could convey, even unconsciously, which syntactic structure to use.

    – publication bias
    would predict the data pattern observed

    – improvement of procedure with later studies (less exploratory data analysis, less selective methods/results reporting, more motivation to follow up effects which initially look unpromising)
    would predict the data pattern observed

    So, after reading the Nature commentary, I feel even more confident that the decline effect is indeed an edging towards scientific truth rather than a sign of sloppy modern science.
