The origin of tonal consonance, the tendency to perceive some simultaneously sounded combinations of musical tones as more pleasant than others, is arguably among the most fundamental questions in music perception. For more than a century, the issue has been the subject of vigorous debate, undoubtedly fueled by the formidable complexities involved in investigating music-induced affective qualia that are not directly observable and often ineffable. The challenge of drawing definitive conclusions in this area of inquiry is well exemplified by the markedly divergent, yet equally thoughtful, responses offered in these commentaries.
According to Bowling, our findings are an important source of converging evidence for his Vocal Similarity Hypothesis (VSH), the notion that consonance derives from an evolved preference for harmonic vocal sounds (Bowling, Purves, & Gill, 2018). However, he suggests that our interpretation of the results may cast a less favorable light on the VSH than is warranted. For example, he is skeptical of our contention that spectral interference (SI) accounts for greater variance in consonance judgments than harmonicity, arguing that the high correlation between these predictors “present[s] a problem for their separation via regression.” Yet, upon examination, the correlations between the harmonicity and SI measures that we used in our regression analyses were only moderate at best for our unconventional chord stimuli (-.54). Moreover, a Variance Inflation Factor analysis (Chatterjee & Price, 2012) for all four relevant regressions yields values under 1.26, close to the lower bound of 1 that the statistic takes when predictors are uncorrelated. This suggests that the precision of our regression coefficients was not likely to have been diminished by multicollinearity. Our conclusion regarding the relative strength of the impact of SI on consonance ratings gains further credence from the work of Harrison and Pearce (2020), who reported analogous findings based on a reanalysis of four different behavioral datasets using conventional chords. Nevertheless, we agree with Bowling that consonance researchers should be wary of multicollinearity when comparing the predictive utility of different musical features, as certain harmonicity or SI metrics may indeed share substantial variance (see, e.g., Bowling, this issue, Figure 2).
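For readers less familiar with the diagnostic, the logic of a Variance Inflation Factor check can be illustrated with a minimal sketch. It is not the analysis code used in our study: the function name `variance_inflation_factors`, the variable names, and the six predictor scores are hypothetical and chosen purely for illustration. The sketch regresses each predictor on the remaining ones and applies the standard definition VIF = 1 / (1 - R²), so that values near 1 indicate little shared variance among predictors.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = observations, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n_obs, n_pred = X.shape
    vifs = []
    for j in range(n_pred):
        y = X[:, j]
        # Regress predictor j on all remaining predictors plus an intercept.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # VIF_j = 1 / (1 - R_j^2); undefined if predictor j is perfectly collinear.
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs


# Hypothetical, made-up predictor scores for six chords (NOT the study's data).
harmonicity = [0.9, 0.7, 0.8, 0.4, 0.5, 0.2]
interference = [0.2, 0.1, 0.5, 0.4, 0.8, 0.6]
predictors = np.column_stack([harmonicity, interference])

names = ["harmonicity", "interference"]
for name, vif in zip(names, variance_inflation_factors(predictors)):
    print(f"{name}: VIF = {vif:.2f}")
```

In practice, the same check would be run on the actual predictor matrix of each regression model, with every predictor entered in that model included as a column.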
Whereas Bowling suggests that our analysis and study design may have sold the VSH short by underweighting the contribution of harmonicity to consonance, Smit and Milne, as well as Harrison, argue the opposite, proposing that we may have oversold the extent to which our findings support the VSH. Indeed, Harrison argues that our results leave open at least two alternative hypotheses. First, harmonicity may be preferred, not due to an evolved preference for voice-like sounds, but because harmonicity facilitates the identification of distinct auditory sources in the environment. Second, a preference for harmonic sounds may have evolved not because it reinforced attention to conspecific vocal communications (as posited by the VSH; Bowling et al., 2018), but because it reinforced social bonding via collective music making.
Although critical details of these alternative accounts remain to be clarified, we agree that our results do not “support” the VSH in the strong sense of confirming it empirically. As we noted in our article, the primary goal of our study was to rule out the possibility that the association between consonance and harmonicity shown in Western chords was an artifact of familiarity. Our results suggest that this was unlikely to have been the case. In the absence of such evidence, the viability of the VSH would have been in grave doubt.
We concur with Harrison that it will be enormously challenging to find “positive” evidence of an evolved preference for voice-like sounds, assuming it does exist (cf. McDermott, Schultz, Undurraga, & Godoy, 2016). As noted by Bowling (this issue), “the auditory system receives harmonic stimulation from mother’s larynx as soon as it comes on-line,” making it difficult to determine whether a preference for harmonic chords derives from our evolutionary heritage or instead reflects exposure to harmonic sounds over the course of development. In addition to calling for more extensive cross-cultural research, Harrison also highlights the potential value of studies using animal models. For instance, he astutely notes that if preferences for harmonicity were shown to exist exclusively in species that emit harmonic vocalizations, this would stand to affirm the VSH. However, research of this kind may be fraught with ambiguity inasmuch as affirmative evidence for the VSH in vocal animal species (e.g., a preference for harmonicity in budgerigars; Wagner, Bowling, & Hoeschele, 2020) would stop short of confirming that human harmonicity preferences were analogously evolved rather than learned. Correspondingly, the failure to experimentally reveal such effects in non-human animals would by no means rule out the existence of an adapted preference for voice-like sounds within our own species.
Encouragingly, as reflected in all three commentaries, a consensus does appear to be emerging among music perception researchers that “no single acoustic measure can fully predict the complex experience of consonance” (Smit & Milne, this issue). In addition, as convincingly argued by Smit and Milne, the relevant predictors may extend beyond harmonicity and spectral interference. In a provocative recent study, they found that higher average pitch (which may be associated with the expression of positively valenced emotion; e.g., Friedman, Neill, Seror, & Kleinsmith, 2018) reliably bolsters consonance judgments in unconventionally tuned chords (Smit, Milne, Dean, & Weidemann, 2019). In the same study, they also found compelling evidence that higher spectral entropy, the information-theoretical unpredictability of the spectrum constituting a given chord, leads chords to sound less pleasant. In a similar vein, Harrison and Pearce (2020) have recently found that the sheer number of tones in a chord is positively associated with consonance, and Lahdelma and Eerola (2020) conceptually replicated this “numerosity” effect while providing intriguing evidence that it might depend upon participants’ particular construal of consonance (e.g., as “pleasantness” versus “harmoniousness” or “purity”).
In light of these and other findings, it appears that music perception researchers are beginning to move beyond debating which single mechanism accounts for tonal consonance and are instead turning to explaining how a multitude of mechanisms additively and/or interactively contribute to these judgments. Although such questions may be difficult to resolve given existing tools, it is also clear that bold theoretical propositions such as Bowling and his colleagues’ (2018) VSH play an invaluable role in stimulating and giving shape to fruitful avenues for research. We are very grateful for the thought-provoking commentaries that we received regarding our article and hope that this spirited yet constructive dialogue in Music Perception will help inspire additional efforts to collectively grapple with an age-old conundrum at the heart of our field.