I discuss three fundamental questions underpinning the study of consonance: 1) What features cause a particular chord to be perceived as consonant? 2) How did humans evolve the ability to perceive these features? 3) Why did humans evolve to attribute particular aesthetic valences to these features (if they did at all)? The first question has been addressed by several recent articles, including Friedman, Kowalewski, Vuvan, and Neill (2021), with the common conclusion that consonance in Western listeners is driven by multiple features such as harmonicity, interference between partials, and familiarity. On this basis, it seems relatively straightforward to answer the second question: each of these consonance features seems to be grounded in fundamental aspects of human auditory perception, such as auditory scene analysis and auditory long-term memory. However, the third question is harder to resolve. I describe several potential answers, and argue that the present evidence is insufficient to distinguish between them, despite what has been claimed in the literature. I conclude by discussing what kinds of future studies might be able to shed light on this problem.
Music psychologists have long been interested in consonance, the sense in which certain combinations of tones seem to fit well together while other combinations seem to fit poorly. The psychological study of consonance is underpinned by three fundamental research questions: 1) What features cause a particular chord to be perceived as consonant? 2) How did humans evolve the ability to perceive these features? 3) Why did humans evolve to attribute particular aesthetic valences to these features (if they did at all)?
Several recent studies, including that of Friedman, Kowalewski, Vuvan, and Neill (2021), address the first question using regression modeling (Eerola & Lahdelma, 2020; Harrison & Pearce, 2020; Smit, Milne, Dean, & Weidemann, 2019). The common conclusion of these studies is that consonance in Western listeners is driven by multiple features (though see McDermott, Lehr, & Oxenham, 2010, for conflicting claims). All identify distinct contributions of harmonicity and of interference between partials, and several additionally identify contributions of familiarity, operationalized in various ways (Eerola & Lahdelma, 2020; Harrison & Pearce, 2020; Smit et al., 2019).
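To make this regression approach concrete, the following is a minimal sketch, with all feature scores, weights, and ratings simulated rather than taken from any published dataset; the variable names are invented for illustration.

```python
# Illustrative sketch only: regressing consonance ratings on chord
# features via ordinary least squares. All numbers here are simulated;
# the cited studies use real listener ratings and validated feature models.
import numpy as np

rng = np.random.default_rng(0)
n_chords = 200

# Hypothetical standardized feature scores for each chord.
harmonicity = rng.normal(size=n_chords)   # e.g., a template-matching score
interference = rng.normal(size=n_chords)  # e.g., a roughness/beating score
familiarity = rng.normal(size=n_chords)   # e.g., a corpus-frequency score

# Simulated ratings: weighted positively on harmonicity and familiarity,
# negatively on interference, plus listener noise.
ratings = (0.6 * harmonicity
           - 0.5 * interference
           + 0.3 * familiarity
           + rng.normal(scale=0.5, size=n_chords))

# Design matrix with an intercept column, fitted by least squares.
X = np.column_stack([np.ones(n_chords), harmonicity, interference, familiarity])
coefs, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print(dict(zip(["intercept", "harmonicity", "interference", "familiarity"],
               np.round(coefs, 2))))
```

In the actual studies, the analogous predictors come from computational models of each feature, and the fitted coefficients estimate each feature's distinct contribution to listeners' ratings.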
The second question also seems to have a straightforward answer: each feature seems to be grounded in fundamental aspects of human auditory perception. Harmonicity is a key cue underpinning fusion in auditory scene analysis, and is particularly useful for parsing vocalizations (e.g., McPherson et al., 2020). Interference between partials produces characteristic amplitude modulations (“beating”) that propagate through the auditory system (e.g., Vencovský, 2016). Familiarity, meanwhile, is a straightforward byproduct of auditory long-term memory.
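To illustrate the beating mechanism concretely (a standard trigonometric identity, not a result from the cited papers): two simultaneous pure tones at nearby frequencies f_1 and f_2 sum to a carrier at the mean frequency whose amplitude is modulated at the difference frequency,

\[
\sin(2\pi f_1 t) + \sin(2\pi f_2 t) = 2\,\cos\!\big(\pi (f_1 - f_2)\, t\big)\,\sin\!\big(\pi (f_1 + f_2)\, t\big),
\]

so that, for example, partials at 440 Hz and 444 Hz fuse into a tone near 442 Hz whose loudness waxes and wanes four times per second.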
The third question is less tractable. The literature suggests several speculative answers:
Auditory scene analysis
Humans evolved to like harmonicity and dislike interference because acoustic environments dominated by harmonic sounds and free of interference are easier to parse, and hence safer for the organism (Huron, 2001).
Vocal similarity
Humans evolved to like harmonicity and to dislike interference between partials because this encouraged them to attend carefully to vocalizations, which carry important information about conspecifics (Bowling, Purves, & Gill, 2018).
Social bonding
Humans evolved to prefer sounds with high harmonicity and low interference because this preference encouraged collaborative music-making, which in turn served the adaptive function of fostering social bonding (Savage et al., 2020).
These three positive hypotheses may be contrasted with the following null hypothesis:
The null hypothesis (H0)
Humans never evolved any particular aesthetic predispositions towards harmonicity or interference. The Western preference for harmonicity and dislike of interference is simply due to cultural convention (McDermott, Schultz, Undurraga, & Godoy, 2016).1
New behavioral data must be evaluated in light of all competing hypotheses. Friedman et al. (2021) write that their results support the vocal similarity hypothesis, because that hypothesis successfully predicts that humans will prefer chords with high harmonicity and low interference. This conclusion is problematic, however, because their results are equally consistent with each of the other hypotheses detailed above. Bowling et al. (2018) draw a similarly problematic conclusion in favor of the vocal similarity hypothesis without properly considering the alternatives.
In particular, before drawing any conclusions about the third question, it is essential to establish that humans are naturally predisposed towards harmonicity and against interference, rather than these preferences being solely culturally learned. McDermott et al. (2016) present important negative evidence here, showing that the Tsimane’ people do not share Western preferences for harmonicity or for the kinds of chords that Westerners perceive as consonant; likewise, the existence of beat diaphony traditions (Ambrazevičius, 2017; Florian, 1981; Vassilakis, 2005; Vyčinienė, 2002) suggests that listeners in some musical cultures can develop preference patterns inverse to those of Western listeners. However, resolving this question fully will require large-scale studies examining these preferences cross-culturally.
Large-scale cross-species studies could meanwhile deliver compelling support for one of the three positive hypotheses. If liking for harmonicity and dislike of interference were common to most animals with well-developed auditory scene analysis capacities, this would support the auditory scene analysis hypothesis. If these phenomena were mostly specific to species that make pitched vocalizations, this would support the vocal similarity hypothesis. If they were specific to species that make synchronized vocalizations, this would support the social bonding hypothesis. Unfortunately, existing research does not point to any of these outcomes: as yet, there is no compelling evidence that any non-human animals perceive consonance in the valenced way that Western listeners do (see Toro & Crespo-Bojorque, 2017, for a review, and Harrison & Pearce, 2020, for a discussion).
Large-scale cross-cultural and cross-species studies present significant logistical challenges. In the meantime, some progress could be made in disentangling the hypotheses by examining additional chord features that might predict consonance judgments. For example, if consonance judgments were shown to be meaningfully influenced by additional acoustic features specific to auditory scene analysis, this would lend some support to the auditory scene analysis hypothesis. Wright and Bregman (1987) present interesting arguments in this direction that still await comprehensive empirical investigation (though see Bonin, Trainor, Belyk, & Andrews, 2016, and Kowalewski, Friedman, Zavoyskiy, & Neill, 2019). Alternatively, if consonance judgments were shown to be meaningfully influenced by additional features specific to vocalizations, this would support the vocal similarity hypothesis. It must be acknowledged, however, that such work cannot determine whether preferences for these features truly evolved rather than being wholly culturally transmitted: for that, we still need cross-cultural studies.
On the one hand, this is a negative conclusion: properly understanding the evolutionary bases of consonance will require logistically complex cross-cultural or cross-species studies, rather than comparatively straightforward studies with Western-enculturated listeners (e.g., Bowling et al., 2018; Friedman et al., 2021). On the other hand, the current growth of interest in large-scale cross-cultural research gives good reason for optimism (e.g., Mehr et al., 2019; Savage, Brown, Sakai, & Currie, 2015). In the meantime, we must treat these kinds of evolutionary questions with caution.
Note
In fact, McDermott et al. (2016) state a slightly different version of this hypothesis. They hold that only harmonicity, not interference, contributes to consonance; accordingly, they claim only that liking for harmonicity is not universal, and make no claim that aversion to interference is not universal.