Month: March 2012

Whose discovery? Declining Effect Sizes vs. Growing Influence

It is good practice in scientific publications to cite the original paper which established an idea, technique or model. However, there are worrying signs that these original papers exaggerated their effect sizes and are thus less trustworthy than later studies. Torn between a desire to grant credit where credit is due and writing about solid scientific findings, how should one decide?
A good example of a scientific breakthrough which established an idea comes from the neurosciences. Today, much of brain imaging is concerned with the localisation of function: Where in the brain is x? Whether we want to localise vision, intelligence or consciousness does not appear to matter to some researchers. There is much to say about this approach, its assumptions and its value. However, in this post I want to talk about its beginnings.
The first time a successful brain localisation was claimed was in the 19th century by anatomist Paul Broca. His discovery of the importance of the left inferior frontal cortex (the area below your left temple) for speech is still widely influential. When one of his patients, who had impaired speech but apparently normal general mental abilities, died, Broca performed a post-mortem and found a lesion in the patient’s left frontal cortex. The same was true for 12 other patients. This was a breakthrough in the quest to link the brain to behaviour. The brain area Paul Broca identified as crucial for speech output is called Broca’s area to this day. The speech impairments resulting from its damage are still called Broca’s aphasia.
Discoveries like these can be game changers. However, there are worrying signs that effect sizes shrink as results get replicated. Below is an example from psycholinguistics. Two sentences can mean practically identical things, such as ‘Peter gives the toy to Anna.’ (using a so-called Prepositional Object) and ‘Peter gives Anna the toy.’ (using a so-called Double Object). When language users have to decide which construction to use, they show a strong tendency to unconsciously adopt the one they heard just before. This effect is called syntactic priming and is typically associated with Kathryn Bock’s 1986 article in the journal Cognitive Psychology. It has been used to argue for convergence of syntax in dialogue, shared mental resources for comprehension and production, or overlap of the various languages available to a speaker.
As can be seen below, since its initial characterisation by Bock, the effect size of Prepositional vs. Double Object priming has declined. Because this is a new, unpublished finding, let me explain what the figure shows (you can skip the next two paragraphs if you trust me). The vertical axis shows a standard measure of the priming effect size. Note that the way this is calculated can differ between studies, so I recalculated all values, ignoring answers not classifiable as either a Prepositional or a Double Object. The horizontal axis simply shows the publication year. The thirteen data points represent all studies I could find on Web of Science in December 2011 which replicated Bock’s initial finding using the same task (picture description), totalling 1169 participants. The standard way of looking for associations between variables (here, effect size and time) is the Pearson correlation coefficient, and it is significantly negative, indicating that the later a study was published, the lower its effect size.
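For readers who prefer code, here is a minimal Python sketch of how such an effect size could be computed. It assumes one common definition (the share of Prepositional Object responses after a Prepositional Object prime minus the share after the other prime type), which may not be the exact formula behind the figure. The function names and all counts are made up for illustration; 'PO' and 'DO' simply label the two constructions.

```python
def po_proportion(po_count, do_count):
    """Share of PO responses among classifiable (PO or DO) responses only."""
    classifiable = po_count + do_count
    if classifiable == 0:
        raise ValueError("no classifiable responses")
    return po_count / classifiable


def priming_effect(after_po_prime, after_do_prime):
    """Difference in PO share after PO primes vs. after DO primes.

    Each argument is a dict of response counts with the keys
    'PO', 'DO' and 'other'. Responses counted under 'other' are ignored.
    """
    p_after_po = po_proportion(after_po_prime["PO"], after_po_prime["DO"])
    p_after_do = po_proportion(after_do_prime["PO"], after_do_prime["DO"])
    return p_after_po - p_after_do


# Invented counts, NOT taken from any of the studies in the post.
after_po = {"PO": 60, "DO": 40, "other": 20}
after_do = {"PO": 45, "DO": 55, "other": 20}

effect = priming_effect(after_po, after_do)
print(f"priming effect: {effect:.2f}")
```

Dropping the 'other' responses before computing the proportions is what makes values from studies with different coding schemes roughly comparable.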

One should not over-interpret this finding. It could be due to a) unusual effect sizes, b) publication bias or c) task differences. What happens if I control for these things?
a) To check whether unusually high values early on or unusually low values later on carry the effect, I replicate it with a so-called non-parametric test (Kendall’s tau). This test works on the rank order of the values rather than on the values themselves. The effect size is still significantly negatively associated with publication year. Extreme values (also called outliers) do not carry the effect.
b) Furthermore, the size of each data point reflects the study’s sample size. Some may argue that small studies tend to produce more extreme values, and that the extremely low ones go unpublished because of publication bias, i.e. journals’ tendency to accept only significant findings. This publication pattern may have weakened over the years due to larger sample sizes or less publication pressure. To rule out this possibility, I statistically control for sample size, and the decline effect still holds. Differences in sample size do not carry the effect.
c) Finally, there is a small difference in the methodology of two publications (three effect sizes marked in red), related to whether the priming was from comprehension to production with a repetition of the prime (as in Bock’s original article) or whether it was slightly different. Excluding these two studies only strengthens the negative correlations (r=-0.71 (p=0.02); τ=-0.69 (p=0.01); weighted r=-0.73 (p=0.02)).
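The three checks above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the years, effect sizes and sample sizes are invented, the function names are my own, "weighted r" is read here as a Pearson correlation weighted by sample size, and no p-values are computed.

```python
import math


def pearson(x, y, weights=None):
    """Pearson correlation; optional weights give a weighted version."""
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y))
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y))
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: ignored entirely
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((pairs + ties_x) * (pairs + ties_y))


# Invented example data, NOT the thirteen data points from the figure.
years = [1986, 1992, 1998, 2003, 2007, 2011]
effects = [0.23, 0.18, 0.20, 0.10, 0.12, 0.06]
sample_sizes = [48, 60, 72, 96, 80, 120]

print("Pearson r:   ", round(pearson(years, effects), 2))
print("Kendall tau: ", round(kendall_tau_b(years, effects), 2))
print("weighted r:  ", round(pearson(years, effects, sample_sizes), 2))
```

Because Kendall’s tau only looks at the order of the values, a single extreme effect size cannot dominate it, and the weighted correlation lets large studies count for more than small ones.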
If someone has another idea how this decline effect came about, I would be happy to hear it. For the moment it is clear that Bock’s syntactic priming effect mysteriously declines with every year.
In terms of Paul Broca’s finding of brain areas associated with speech production the decline effect has to be recast in terms of what Broca’s finding stands for: the localisation of a cognitive function in the brain. So, evidence for the decline effect would be both the fact that the same brain area can be associated with more than one function and the fact that the function of speech production is associated with more than one brain area.
What sort of functions is Broca’s area associated with today? I checked Brainscanr, a free online tool developed by Voytek which shows how likely two terms are to co-occur in the literature. As can be seen below, Broca’s area is indeed related to language production. However, it is also associated with language comprehension and even mirror neurons. The one-area-one-function mapping does not hold for Broca’s area.
But what about speech production? Perhaps Broca’s area is able to handle different jobs, but each job still resides in a specific place in the brain. Again, I use a free online tool, Neurosynth, to check this. For this example, Neurosynth aggregates 44 studies showing various brain areas associated with speech production. Broca’s area is indeed present, but so is its right-hemisphere homologue, and medial areas (in the middle of the brain) also light up. So, the decline effect can be shown quite nicely for the brain localisation of cognitive functions as well.
Jonah Lehrer gives more examples from drug, memory and sexual selection research. Note that the decline effect is not a case of fraud-ish research but instead a good example of how the scientific process refines its methods over time and edges ever closer to agreeing on some basic truths about the world. The problem is that the truth is typically more difficult (read: messier) than initially thought.
But who is to take credit for the discovery then? The original scientist whose finding did not fully stand the test of time or the later scientist whose result is more believable? To me, the current agreement appears to be to credit the former (at least as long as he or she is still alive) and possibly add some findings by the latter if need be. Does this amount to good scientific practice?



Bock, J.K. (1986). Syntactic Persistence in Language Production. Cognitive Psychology, 18, 335-387. doi: 10.1016/0010-0285(86)90004-6

[12/4/2012: UPDATE]

I recently discovered a study which should have been included in the small-scale meta-analysis. The revised values are:

correlations for analysis as reported in figure 1: r=-0.46 (p=0.07); τ=-0.31 (p=0.12); r_weighted=-0.45 (p=0.09)

correlations for analysis without the two studies which slightly changed the design: r=-0.52 (p=0.07); τ=-0.37 (p=0.12); r_weighted=-0.48 (p=0.12)

Thus, the original finding of a decline effect in the syntactic priming literature is at most marginally significant.

Gendered Language, Gendered Mind

What is so female about ships to call them she? What is so neuter about children to call them it? Now imagine that entire languages – like German, Spanish and French – are full of these arbitrary gender assignments, not allowing any genderless nouns. This has a profound effect on the way the mind works. A couple of articles published last year on the grammatical gender of nouns in different languages nicely illustrate this point.
To native speakers of gendered languages – i.e. languages whose nouns are all masculine, feminine or perhaps neuter – their language’s gender system usually appears obvious. I vividly remember sitting in a philosophy class in France while the teacher elaborated on the female gender of life (la vie). According to her, life could only ever be feminine for some forgotten reason. When a classmate pointed out that life was neuter in German and that, therefore, her reasoning was flawed, she turned to me as a native German speaker. I could only agree with the comment and watch her theory fall apart in real time (btw, life can even be masculine, as for example in Bulgarian or Hebrew). This is the first experience I can remember of a native speaker applying the mostly arbitrary grammatical gender system beyond the domain of language.
Recent research has found more examples of grammatical gender influencing how language users think about completely asexual things. In a very small experiment, an Israeli friend of mine (Rony Halevy) asked Hebrew speakers to dress up cutlery and other objects and found more feminine dresses on grammatically female items and vice versa for male items (see picture). Dutch controls, whose language does not distinguish between male and female grammatical gender, did not show a similar effect. Still, one may argue that the reference to gender was in the task already. Similarly, language-based tasks in this field could be said to only reveal an effect of grammatical gender on other linguistic processes. So, can language really influence the mind in general?
Hebrew is a gendered language and participants tend to dress up simple objects such as a spoon or a fork according to their grammatical gender. The Dutch gender system does not refer to male and female and does not show the same effect. Data based on student project by Rony Halevy.
Cubelli et al. (2011) used a categorisation task in which participants had to quickly judge whether two pictures showed objects belonging to the same category or not. Judgements were faster if the objects’ grammatical gender matched. The authors interpreted this as showing that people access the words related to the pictures even when this is not required for the task.
Even outside the laboratory the effect can be shown. Sampling images from a big online art database, Segel and Boroditsky (2011) looked at all the gendered depictions of naturally asexual entities like love, justice, or time. Depicted gender agreed with grammatical gender in 78% of the cases. The effect was replicable for Italian, French and German. On top of that, it even held when only looking at those entities whose grammatical genders are conflicting in the studied languages.
It is worth reiterating that the aforementioned behaviours were completely non-linguistic. The grammatical gender system is just a set of rules for how words change when combined. The fact that people draw on these purely linguistic rules to perform unrelated tasks shows quite powerfully what a central role language plays in our minds.
But the effect may go further than that. In English, natural gender must be included in personal pronouns (he/she). Admittedly, there are exceptions (child – it) but they are rare. In Chinese, there is no such requirement. Personal pronouns can mark gender (written forms of ta) or not (spoken ta). Chen and Su (2011, Experiment 2) presented English or Chinese participants with written English or Chinese sentences which included gendered personal pronouns. Participants were asked to match each sentence to one of two pictures, each showing a person of a different gender. English-speaking participants were faster and more accurate than Chinese speakers on these judgements. It’s as if English speakers are better trained in thinking about natural gender because English makes such thinking compulsory. Chinese participants, on the other hand, can produce pronouns without thinking of natural gender and thus have this information less readily available for their judgements.
One may argue that the effect relies on people of different native tongues showing different behaviours. These people probably differ in many ways other than their native language. Wider cultural differences could be invoked. Still, given that the effect holds for German, French, Italian, Spanish and Chinese, the most straightforward explanation indeed appears to be their language background. A way of overcoming the confounding influence of cultural upbringing may be to contrast second language learners of the same native language who learn different second languages.
Despite these shortcomings, the influence of the gender status of a language on the mind of its users is clearly measurable. This illustrates quite nicely that thought is influenced by what you must say – rather than by what you can say. It also highlights that language is not an isolated skill but instead a central part of how our minds function. Studying language use is important – not just for the sake of language.
Chen, J-Y., & Su, J-J. (2011). Differential Sensitivity to the Gender of a Person by English and Chinese Speakers. Journal of Psycholinguistic Research, 40, 195–203. doi: 10.1007/s10936-010-9164-9
Cubelli, R., Paolieri, D., Lotto, L., & Job, R. (2011). The Effect of Grammatical Gender on Object Categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 449–460. doi: 10.1037/a0021965
Segel, E., & Boroditsky, L. (2011). Grammar in art. Frontiers in Psychology, 1, 1. doi: 10.3389/fpsyg.2010.00244