How to test for music skills

In a new article I evaluate a recently developed test for music listening skills. To my great surprise the test behaves very well. This could open the path to a better understanding of the psychology underlying music listening. Why am I surprised?

I got my first taste of how difficult it is to replicate published scientific results during my very first empirical study as an undergraduate (eventually published as Kunert & Scheepers, 2014). Back then, I used a 25-minute dyslexia screening test (the Lucid Adult Dyslexia Screener) to distinguish dyslexic from non-dyslexic participants. Previous studies had suggested an excellent sensitivity (identifying actually dyslexic readers as dyslexic) of 90% and a moderate to excellent specificity (identifying actually non-dyslexic readers as non-dyslexic) of 66% – 91% (Singleton et al., 2009; Nichols et al., 2009). My own values were worse, at 61% sensitivity and 65% specificity. In other words, the dyslexia test only flagged someone with an official dyslexia diagnosis in 11/18 cases and only categorised someone without known reading problems as non-dyslexic in 13/20 cases. The dyslexia screener didn’t perform as well as the published literature suggested, and I have been suspicious of ability tests ever since.
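The two percentages follow directly from the counts. Here is a minimal Python sketch of that arithmetic, using only the counts reported above; the function names are my own, chosen for illustration.

```python
def sensitivity(correctly_flagged, actually_dyslexic):
    """Share of actually dyslexic readers that the screener flags as dyslexic."""
    return correctly_flagged / actually_dyslexic


def specificity(correctly_cleared, actually_non_dyslexic):
    """Share of actually non-dyslexic readers that the screener clears."""
    return correctly_cleared / actually_non_dyslexic


# Counts reported in the text: 11 of 18 diagnosed participants were flagged,
# 13 of 20 participants without known reading problems were cleared.
sens = sensitivity(11, 18)
spec = specificity(13, 20)

print(f"sensitivity: {sens:.0%}")  # sensitivity: 61%
print(f"specificity: {spec:.0%}")  # specificity: 65%
```

With samples this small, a single participant shifts either value by about five percentage points, so the gap to the published figures should be read with that in mind.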

Five years later I acquired data to look at how language can influence music processing (Kunert et al., 2016) and added a newly proposed music ability measure called PROMS (Law & Zentner, 2012) to the experimental sessions to see how bad it was. I really expected the music listening ability scores derived from the PROMS to be conflated with things which on the face of it have little to do with music (digit span, i.e. the ability to repeat increasingly longer digit sequences), because previous music ability tests had that problem. Similarly, I expected people with more music training not to have much better PROMS scores. In other words, I expected the PROMS to perform worse than suggested by the people who developed the test, in line with my negative experience with the dyslexia screener.

It then came as a surprise to see that PROMS scores were hardly associated with the ability to repeat increasingly longer digit sequences (either in the same order, i.e. forward digit span, or in reverse order, i.e. backward digit span), see Figures 1A and 1B. This makes the PROMS scores surprisingly robust against variation in working memory, as you would expect from a good music ability test.


Figure 1. How the brief PROMS (vertical axis) correlates with various validity measures (horizontal axis). Each dot is one participant. Lines are best fit lines with equal weights for each participant (dark) or downweighting unusual participants (light). Inserted correlation values reflect dark line (Pearson r) or a rank-order equivalent of it which is robust to outliers (Spearman rho). Correlation values range from -1 to +1.
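To make the difference between the two correlation values in the caption concrete, here is a minimal pure-Python sketch. The data are HYPOTHETICAL numbers invented for illustration (they are not the study's data), and the helper names are my own. It computes Spearman's rho as the Pearson correlation of the ranks, which is the standard definition.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# HYPOTHETICAL example: the last participant has an unusually long training history.
training_years = [0, 1, 2, 3, 4, 5, 30]
test_scores = [10, 12, 13, 15, 16, 18, 19]

r = pearson_r(training_years, test_scores)
rho = spearman_rho(training_years, test_scores)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

In this made-up example the scores rise steadily with training, so the rank-based value is at its maximum, while the single unusual participant pulls the Pearson value down. That is the sense in which the rank-order measure is robust to outliers.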

The second surprise came when musical training was actually associated with better music skill scores, as one would expect for a good test of music skills, see Figures 1C, 1D, 1E, and 1H. To top it off, the PROMS score was also correlated with the music task performance in the experiment looking at how language influences music processing. This association between the PROMS and musical task accuracy was visible in two independent samples, see Figures 1F and 1G, which is truly surprising because the music task targets harmonic music perception which is not directly tested by the PROMS.

To conclude, I can honestly recommend the PROMS to music researchers. To my surprise it is a good test which could truly tell us something about where music skills actually come from. I’m glad that this time I have been proven wrong regarding my suspicions about ability tests.

— — —

Kunert R, & Scheepers C (2014). Speed and accuracy of dyslexic versus typical word recognition: an eye-movement investigation. Frontiers in psychology, 5 PMID: 25346708

Kunert R, Willems RM, & Hagoort P (2016). Language influences music harmony perception: effects of shared syntactic integration resources beyond attention. Royal Society open science, 3 (2) PMID: 26998339

Kunert R, Willems RM, & Hagoort P (2016). An Independent Psychometric Evaluation of the PROMS Measure of Music Perception Skills. PloS one, 11 (7) PMID: 27398805

Law LN, & Zentner M (2012). Assessing musical abilities objectively: construction and validation of the profile of music perception skills. PloS one, 7 (12) PMID: 23285071

Nichols SA, McLeod JS, Holder RL, & McLeod HS (2009). Screening for dyslexia, dyspraxia and Meares-Irlen syndrome in higher education. Dyslexia, 15 (1), 42-60 PMID: 19089876

Singleton C, Horne J, & Simmons F (2009). Computerised screening for dyslexia in adults. Journal of Research in Reading, 32 (1), 137-152 DOI: 10.1111/j.1467-9817.2008.01386.x
— — —

The 10,000-Hour rule is nonsense

Have you heard of Malcolm Gladwell’s 10,000-hour rule? The key to success in any field is practice, and not just a little. A new publication in the journal Psychological Science took a good look at all the evidence and concludes that this rule is nonsense. No Einstein in you, I am afraid.

Image: Albert Einstein, by Doris Ulmann

Did he just practice a lot?

The authors of the new publication wanted to look at all major areas of expertise where the relationship between practice and performance had been investigated: music, games, sports, professions, and education. They gathered all 88 scientific articles available at this point and performed one big analysis on the accumulated data of 11,135 participants. A meta-analysis with a huge sample.

The take-home number is 12%. The amount of practice that you do only explains 12% of your performance in a given task. From the 10,000-Hour rule I expected at least 50%. And this low number of 12% is not due to fishy methods in some low-quality articles that were included. Actually, the better the method to assess the amount of practice the lower the apparent effect of practice. The same goes for the method to assess performance on the practiced task.

However, one should differentiate between different kinds of activities. For some of them, practice has a bigger effect. For example, if the context in which the task is performed is very stable (e.g., running), 24% of performance is explained by practice. Unstable contexts (e.g., handling an aviation emergency) push this down to 4%. The area of expertise also made a difference:

  • games: 26%
  • music: 21%
  • sports: 18%
  • education: 4%
  • professions: 1%
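To get a feel for these percentages, it can help to convert them into correlation coefficients. The sketch below ASSUMES that "explains X% of performance" means the squared correlation between practice and performance (that reading is my assumption, not something stated above), so the implied correlation is simply the square root. The function name is hypothetical.

```python
from math import sqrt

# Shares of performance explained by practice, as quoted in the text.
variance_explained = {
    "overall": 0.12,
    "games": 0.26,
    "music": 0.21,
    "sports": 0.18,
    "education": 0.04,
    "professions": 0.01,
}


def implied_correlation(share):
    """Size of the correlation implied by a share of explained variance.

    ASSUMPTION: the share is a squared correlation (r squared).
    """
    return sqrt(share)


for area, share in variance_explained.items():
    r = implied_correlation(share)
    print(f"{area}: {share:.0%} explained, r of about {r:.2f}, "
          f"{1 - share:.0%} left to other factors")
```

Under that assumption, the overall 12% corresponds to a correlation of roughly 0.35, and reaching the 50% I had expected would take a correlation of about 0.71.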

In other words, the 10,000-Hour rule is nonsense. Stop believing in it. Sure, practice is important. But other factors (age? intelligence? talent?) appear to play a bigger role.

Personally, I have decided not to become a chess master by practicing chess for 10,000 hours or more. I would rather focus on activities that play to my strengths. Let’s hope that blogging is one of them.

Macnamara BN, Hambrick DZ, & Oswald FL (2014). Deliberate Practice and Performance in Music, Games, Sports, Education, and Professions: A Meta-Analysis. Psychological Science DOI: 10.1037/e633262013-474






“Albert Einstein, by Doris Ulmann” by Doris Ulmann (1882 – 1934) – Library of Congress, Prints & Photographs Division, [reproduction number LC-USZC4-4940]. Licensed under Public domain via Wikimedia Commons.