Month: May 2012

Correcting for Human Researchers – The Rediscovery of Replication

We need to control for this.

You may have missed some of the discussion on fraud, errors and biases shaking the scientific community of late, so I will quickly bring you up to speed.
Firstly, a series of fraud cases (Ruggiero, Hauser, Stapel) in Psychology and related fields makes everyone wonder why only internal whistleblowers ever discover major fraud cases like these.
Secondly, a well regarded journal publishes an article by Daryl Bem (2011) claiming that we can feel the future. Wagenmakers et al. (2011) apply a different statistical analysis and claim that Bem’s evidence for precognition is so weak as to be meaningless. The debate continues. Meanwhile, the authors of a related failed replication report having trouble getting it published.
Thirdly, John Bargh criticises everyone involved in a failed replication of an effect he is particularly well known for. He criticises the experimenters, the journal, even a blogger who wrote about it.
This all happened within the last year and suddenly everyone speaks about replication. Ed Yong wrote about it in Nature, The Psychologist had a special issue on it, some researchers set up a big replication project, and the blogosphere goes crazy with it.
Some may wonder why replication was singled out as the big issue. Isn’t this about the ruthless, immoral energy of fraudsters? Or about publishers’ craving for articles that create buzz? Or about a researcher’s taste for scandal? Perhaps it is indeed about a series of individual problems related to human nature. But the solution is still a systemic one: replication. It is the only way of overcoming the unfortunate fact that science is only done by mere humans.
This may surprise some people because replication is not done all that much. And the way researchers get rewarded for their work totally goes against doing replications. The field carries on as if there were procedures, techniques and analyses that overcome the need for replication, the most common of which is inferential hypothesis testing.
This way of analysing your data simply asks whether any differences found among the people who were studied would hold up in the population at large. If so, the difference is said to be ‘statistically significant’. Usually, this is boiled down to a p-value, which reports how likely it would be to find a difference at least as large as the observed one if in truth the difference didn’t exist at all in the population. By convention, a p-value below 5% counts as significant. So, imagine that women and men in truth were equally intelligent (I have no idea whether they are). Inferential hypothesis testing then predicts that about 5% of experiments will nonetheless report a significant difference between male and female IQs. This difference won’t be replicated by the other 95% of experiments.
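The 5% figure can be checked with a small simulation. What follows is only a sketch under stated assumptions, not an analysis from any of the cited papers: the IQ scores are drawn from a normal distribution with mean 100 and SD 15 for both groups, the group size of 50 and the number of experiments are arbitrary, and a simple z-test with a known SD stands in for whatever test a real study would use. All function names are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(group_a, group_b, sd):
    """Two-sample z-test p-value, assuming equal group sizes and a known SD."""
    n = len(group_a)
    standard_error = sd * (2 / n) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=2000, n_per_group=50, alpha=0.05, seed=1):
    """Share of experiments that come out significant although no difference exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same population: the true difference is zero.
        women = [rng.gauss(100, 15) for _ in range(n_per_group)]
        men = [rng.gauss(100, 15) for _ in range(n_per_group)]
        if two_sided_p(women, men, sd=15) < alpha:
            hits += 1
    return hits / n_experiments


rate = false_positive_rate()
print(f"Share of null experiments reporting a significant difference: {rate:.3f}")
```

The printed share should land close to 0.05. Every one of those significant results is a fluke that a straight replication would most likely fail to reproduce.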
And this is where replication comes in: the significance level behind a p-value can be thought of as a prediction of how often an effect that does not exist will turn up and then fail to replicate. Needless to say, a prediction is a poor substitute for the real thing.
This was brought home to me by Luck in his great book An Introduction to the Event-Related Potential Technique (2005, p. 251). He basically says that replication is the only approach in science which is not based on assumptions needed to run the aforementioned statistical analyses.
Replication does not depend on assumptions about normality, sphericity, or independence. Replication is not distorted by outliers. Replication is a cornerstone of science. Replication is the best statistic.
In other words, it is the only way of overcoming the human factor involved in choosing how to get to a p-value. You can disagree on many things, but not on the implication of a straight replication. If the effect is consistently replicated, it is real.
For example, Simmons and colleagues (2011) report that researchers can tweak their data easily without anyone knowing. This is not really fraud but it is not something you want to admit, either. Using four ways of tweaking the statistical analysis towards a significant result – which is desirable for publication – raised the chance of reporting a statistically significant difference that does not really exist, and so would not replicate, from the nominal 5% to about 60%. Now, this wouldn’t be a problem if anyone actually bothered to do a replication – including the exact same tweaks to the data. It is very likely that the effect wouldn’t hold up.
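To get a feel for how such tweaks inflate false positives, here is a deliberately simplified simulation. It is not the simulation Simmons and colleagues ran: it uses only two tweaks (choosing among two correlated measures or their average, and adding ten more participants per group when the first look is not significant), and it assumes standard-normal scores, a correlation of 0.5 between the measures, and a z-test with known SDs. All names and numbers are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean

# Each measure: (how to read it from a participant, its known SD).
# The average of two unit-variance measures correlated at 0.5 has variance 0.75.
MEASURES = [
    (lambda p: p[0], 1.0),
    (lambda p: p[1], 1.0),
    (lambda p: (p[0] + p[1]) / 2, 0.75 ** 0.5),
]


def draw_participant(rng):
    """Two measures with unit variance and correlation 0.5, no group effect."""
    shared = rng.gauss(0, 1)
    dv1 = (shared + rng.gauss(0, 1)) / 2 ** 0.5
    dv2 = (shared + rng.gauss(0, 1)) / 2 ** 0.5
    return dv1, dv2


def p_value(a, b, sd):
    """Two-sided z-test p-value for a difference in means, known SD."""
    standard_error = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def any_significant(group_a, group_b, alpha):
    """Tweak 1: report whichever of the three measures happens to 'work'."""
    for read, sd in MEASURES:
        a = [read(p) for p in group_a]
        b = [read(p) for p in group_b]
        if p_value(a, b, sd) < alpha:
            return True
    return False


def flexible_experiment(rng, alpha=0.05, n_start=20, n_extra=10):
    """Tweak 2: if nothing is significant, collect more data and look again."""
    group_a = [draw_participant(rng) for _ in range(n_start)]
    group_b = [draw_participant(rng) for _ in range(n_start)]
    if any_significant(group_a, group_b, alpha):
        return True
    group_a += [draw_participant(rng) for _ in range(n_extra)]
    group_b += [draw_participant(rng) for _ in range(n_extra)]
    return any_significant(group_a, group_b, alpha)


def flexible_false_positive_rate(n_experiments=2000, seed=1):
    rng = random.Random(seed)
    hits = sum(flexible_experiment(rng) for _ in range(n_experiments))
    return hits / n_experiments


flexible_rate = flexible_false_positive_rate()
print(f"False positive rate with two tweaks: {flexible_rate:.3f}")
```

Even with only two of the four tweaks, the share of significant results in this sketch should come out clearly above the nominal 5%, although no effect exists in the simulated population.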
Many people believe that this is what really happened with Bem’s precognition results. They are perhaps not fraudulent, but the way they were analysed and reported inflated the chances of finding effects which are not real. Similarly, replication is what did not happen with Stapel and other fraudsters. My guess is that if anyone had actually bothered to replicate, it would have become clear that Stapel has a history of unreplicability (see my earlier blog post about the Stapel affair for clues).
So, if we continue to let humans do research, we have to address the weakness inherent in this approach. Replication is the only solution we know of.


Bem, D.J. (2011). Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect. Journal of Personality and Social Psychology, 100, 407-425. DOI: 10.1037/a0021524
Luck, S.J. (2005). An Introduction to the Event-Related Potential Technique. London: MIT Press.
Simmons, J.P., Nelson, L.D., Simonsohn, U. (2011). False Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22, 1359-1366. DOI: 10.1177/0956797611417632
Wagenmakers, E.J., Wetzels, R., Borsboom, D., van der Maas, H. (2011). Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi. Journal of Personality and Social Psychology, 100, 426-432. DOI: 10.1037/a0022790

Thought Metaphors

Is crime alive? Where is musical pitch?
Neither question makes any sense.
And nonetheless, one can answer them. Crime can be a beast haunting local neighbourhoods and it must be eradicated – a description suggesting it is alive and well. And musical pitch is high or low.
Of course, these are all just metaphors useful for quickly talking about things without having to stop for lengthy definitions. However, they are not only linguistic short cuts. They are also mental short cuts – or opportunities for manipulation, if you prefer a more racy description. Last year, a bunch of studies showed examples of how far one can go with this.

A metaphorical breeding program.

Thibodeau and Boroditsky (2011) contrasted two common Western metaphors related to crime: crime as a beast (preying on a town, lurking in the neighbourhood) and crime as a virus (infecting a town, plaguing the neighbourhood). They ‘activated’ these metaphors by using these words alongside fictional crime statistics of an unknown town. When participants were asked what to do about the town’s crime problem, those in the beast-condition were more likely to suggest law enforcement actions (capture, enforce, punish) than those in the virus-condition, who often opted for reform-measures (diagnose, treat, inoculate).
Thus, a linguistic short-cut affected how people reacted to a realistic problem in the realm of social policy. And the effects are big. As one might expect, the same researchers also found political and gender differences (US Republicans as well as men tend to be more on the enforcement side than US Democrats/Independents and women). Simply mentioning a metaphor was twice as powerful in shaping opinion as any of these variables.

A literally high pitch.

In a different set of studies, even something as basic as the height of a tone was shown to be metaphorical. Dolscheid and colleagues (2011) showed that when a tone is presented with an image of height (basically a vertical line crossed by another line at a high or low point) this influences Westerners’ pitch repetition – as would be expected by the pitch-as-height metaphor. When Dutch participants sang a tone paired with a high line, they tended to sing higher. An image of thickness (a thick or thin line) was without influence. The reverse was the case for Farsi speakers even though they lived in the same country. In Farsi, low tones are called thick and high tones are called thin. In a second step, the research team trained people for only 20 minutes with the thickness metaphor – without them knowing. Afterwards, Dutch people performed similarly to Farsi speakers who had known it all their lives.
The wider point is one I have made before: Language is not just for talking, it is also a window into the Mind. However, the metaphor research goes further by also showing how easily this window gives access to the Mind, how easily we can be manipulated. Something as important as how to address crime can be influenced by a recently encountered metaphor. The same applies to something as basic as singing back a tone.
And don’t say they can be spotted easily. Or did you notice the race metaphor written black on white at the beginning of this post?
Dolscheid, S., Shayan, S., Majid, A., & Casasanto, D. (2011). The Thickness of Musical Pitch: Psychophysical evidence for the Whorfian hypothesis. Proceedings of the 33rd annual meeting of the Cognitive Science Society, Boston, MA.
Thibodeau, P.H., & Boroditsky, L. (2011). Metaphors We Think With: The Role of Metaphor in Reasoning. PLoS ONE, 6 (2), e16782. doi:10.1371/journal.pone.0016782

Why do we like sad Music?


But I’m a creep.
I’m a weirdo.
What the hell am I doing here?
I don’t belong here.


Why would anyone want to listen to this?

Radiohead’s song Creep is not the exception in being a heartbreaking but nonetheless successful song. According to Wikipedia, several of the ten best-selling music singles ever are clearly sad songs: Elton John’s Candle in the Wind, The Ink Spots’ If I Didn’t Care, or Kenny Rogers’ Lady. Music does influence one’s mood. For that reason some psychological experiments even use it as a mood induction technique. But given that people generally strive for happiness, why would anyone willingly opt for sad music?

This is exactly what Van den Tol and Edwards asked people online (article in press at Psychology of Music). The most important function they identified in the responses was (re-)experiencing affect, i.e. listening to sad music in order to induce ‘sadness, loss or grief, and occasionally other negative feelings such as disappointment and anger’ (p. 10). Other functions were also mentioned but the take-home message is that, usually, sad music is chosen because it makes people – who are often already sad – feel sad. Very puzzling.

Even more puzzling is that these objectively negative feelings were only rarely reported as being experienced in a negative way. As if music-induced sadness is not quite like real sadness. Van den Tol and Edwards interpret their results as sad music being a sort of self-regulation tool. But how does the tool work?

No one really knows. Still, there are some ideas out there.

1) The safe distance theory

Thompson (2009; see Schubert, 1996) claims that musical sadness is unlike real sadness because, well, it isn’t actually real. It is without consequence. Therefore, one can explore a feeling without becoming engulfed in it. According to this hypothesis one can listen to Radiohead’s Creep and feel like a complete loser without actually having to be one.

It is difficult to test this because one would have to distinguish between participants’ safely distant sadness and their real sadness. I doubt that any ethical board would allow a researcher to deliberately sadden a participant for real.

2) The shared pain theory

Levitin (2008) claims that musical sadness serves to ‘[bring] us through stages of feeling understood, feeling less alone in the world, hopeful that if someone else recovered so will we’ (p. 135). As in most of his book, Levitin sees music as a social tool. On this account, the difference between musical sadness and real sadness lies in the former being shared while the latter is more private. Elton John’s Candle in the Wind is a good example. Released following Lady Diana’s death, it perhaps helped people worldwide to share an emotion which they otherwise would have had to deal with by themselves.

3) The Prolactin theory

Prolactin is a hormone associated with feelings of tranquillity, calmness, well-being, or consolation. Huron (2011) suggests that the body uses it to counteract grief and thus avoid descending into an uncontrollably depressive episode. Such hormonal counter-measures to negative environmental inputs are also found for physical pain. Physical pain is reduced by endorphins. Such a bodily mechanism can be exploited – as when heroin addicts fool the brain’s response to pain. Huron (2011) proposes that sad music can activate the counter-measures to actual sadness – i.e. prolactin production – without any real sadness being present. One gets the hormone’s consoling effect without the sadness and might thus actually enjoy it.

One should not forget that – even though it is intuitive – Huron’s Prolactin theory is not supported by a great deal of experimental evidence. But at least it is straightforward to test.

Of course, all three theories could be true. The puzzle of people’s tendency to often listen to sad music could have to do with the safe distance between musically induced sadness and one’s true emotions. This distance may allow prolactin to have an unusually positive effect because it is not balanced by the real sadness it is designed to counteract. On top of that, a more cognitive appreciation of sharing this experience with other people may aid the process. Targeted research is needed in order to test these theories.

So, people do indeed strive for happiness and therefore enjoy energetic, upbeat music. However, when times get rough it can seem better to switch gears and deal with the sadness first before moving on. It appears that this is where sad music could come in. According to the three aforementioned theories, it is not so much that gloomy music leads to bad moods. It is the other way around: bad moods require sad music.

— — —

Huron, D. (2011). Why is sad music pleasurable? A possible role for prolactin. Musicae Scientiae, 15, 146-158. doi: 10.1177/1029864911401171

Levitin, D.J. (2008). The World in Six Songs. London: Aurum Press

Thompson, W.F. (2009). Music, Thought, and Feeling: Understanding the Psychology of Music. Oxford: Oxford University Press

Van den Tol, A.J.M., Edwards, J. (in press). Exploring the rationale for choosing to listen to sad music when feeling sad. Psychology of Music. doi: 10.1177/0305735611430433