In the article “Consonance preferences within an unconventional tuning system,” Friedman and colleagues (2021) examine consonance ratings for a wide range of dyads and triads drawn from the Bohlen-Pierce chromatic just (BPCJ) scale. The study was designed as a replication of a recent paper by Bowling, Purves, and Gill (2018), which proposes that the perceived consonance of dyads, triads, and tetrads can be predicted from their harmonic similarity to human vocalisations.
In this commentary, we correct some misinterpretations in Friedman et al.’s (2021) discussion of our paper (Smit, Milne, Dean, & Weidemann, 2019) and raise some concerns about the statistical methods they used. We also argue for a stronger emphasis on what Friedman et al. call composite models, because a range of recent evidence strongly suggests that no single acoustic measure can fully predict the complex experience of consonance.
In recent years, music perception research has made increasing use of unconventional or microtonal tuning systems (Herff, Olsen, Dean, & Prince; Leung & Dean, 2018; Loui, Wessel, & Hudson Kam, 2010; Milne, 2013; Milne, Laney, & Sharp, 2016; Smit, Milne, Dean, & Weidemann, 2019, 2020). We applaud the authors for following this path because, as they also note, prior research on consonance has been confounded with effects of familiarity, which can be reduced by using tuning systems whose intervals are unfamiliar to participants.
Friedman, Kowalewski, Vuvan, and Neill (2021) describe our 2019 study mainly in terms of how it differs from theirs, focusing in particular on the fact that it did not replicate Bowling, Purves, and Gill’s (2018) study. We emphasize that our study was not designed to replicate Bowling et al., but rather to “combine multiple novel and established extrinsic and intrinsic predictors to model affective responses in a systematic approach” (Smit et al., 2019). Our study, initially inspired by Mathews, Pierce, Reeves, and Roberts (1988), aimed to collect consonance and valence ratings for a large number of Bohlen-Pierce triads. We modeled these responses with several intrinsic perceptual features that might be culture-independent, such as harmonicity, roughness, spectral entropy, and average pitch height, as well as culture-dependent features, such as the distance from each BP chord to the closest chord in 12-tone equal temperament (12-TET).
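Two of the simpler features named above can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions: average pitch height is taken as the mean log-frequency of the chord tones, and the familiarity proxy (the hypothetical function `tet12_deviation`) sums how far each pairwise interval lies from the nearest 12-TET interval. Neither function reproduces the exact quantifications used in Smit et al. (2019), and all names are illustrative.

```python
import math

def ratio_to_cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def mean_pitch_cents(freqs_hz, ref_hz=440.0):
    """Average pitch height: mean log-frequency of the chord tones,
    expressed in cents relative to ref_hz (an arbitrary reference)."""
    return sum(ratio_to_cents(f / ref_hz) for f in freqs_hz) / len(freqs_hz)

def tet12_deviation(freqs_hz):
    """Hypothetical familiarity proxy: summed absolute deviation, in cents,
    of every pairwise interval from the nearest 12-TET interval
    (a whole multiple of 100 cents). Zero means every interval is 12-TET."""
    total = 0.0
    for i in range(len(freqs_hz)):
        for j in range(i + 1, len(freqs_hz)):
            size = abs(ratio_to_cents(freqs_hz[j] / freqs_hz[i]))
            total += abs(size - 100.0 * round(size / 100.0))
    return total

if __name__ == "__main__":
    bp_triad = [220.0, 220.0 * 5 / 3, 220.0 * 7 / 3]  # 3:5:7 triad
    major_12tet = [220.0 * 2 ** (k / 12) for k in (0, 4, 7)]
    print(round(tet12_deviation(bp_triad), 2))     # about 66.26
    print(round(tet12_deviation(major_12tet), 2))  # 0.0
    print(round(mean_pitch_cents(bp_triad), 1))
```

For the 3:5:7 triad, whose intervals 5/3, 7/3, and 7/5 are all BPCJ scale steps, the pairwise deviations are roughly 15.6, 33.1, and 17.5 cents, giving a total of about 66.3 cents, whereas a 12-TET major triad scores essentially zero. Comparing pairwise intervals keeps the proxy independent of transposition, at the cost of being only an approximation to "distance to the closest chord."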
It is important to emphasize that each of these predictors can be quantified mathematically in many different ways; there is no single measure of harmonicity. Friedman et al. (2021) state that harmonicity in Smit et al. (2019) is “gauged using a relatively comprehensive measure akin to that of Harrison and Pearce (2020).” In fact, these two quantifications of harmonicity are unrelated. Harrison and Pearce’s measure is based on the quantification of harmonicity introduced and defined in Milne (2013) and Milne et al. (2016), whereas the method specified in Smit et al. (2019) differs from it both conceptually and mathematically. It would certainly be interesting to test multiple quantifications of harmonicity against the BP ratings obtained by Friedman et al. and by Smit et al. (2019). Besides harmonicity, we also included measures of roughness and spectral entropy. Spectral entropy is particularly relevant when studying unfamiliar (microtonal) musical systems, because it captures how unpredictable the spectrum of a chord is.
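One generic way to turn the unpredictability of a chord's spectrum into a number is the Shannon entropy of a smoothed, normalized spectrum. The Python below is a minimal sketch under assumptions chosen purely for illustration (ten harmonics per tone, amplitudes falling as 1/n, Gaussian smoothing of 10 cents on a 1-cent grid); it is not the quantification used in Smit et al. (2019), and the function names are hypothetical.

```python
import math

def chord_spectrum(ratios, n_harmonics=10, sigma=10.0, step=1.0):
    """Smoothed spectrum of a chord on a log-frequency (cents) grid.

    ratios: frequency of each chord tone relative to the lowest tone.
    Each tone contributes n_harmonics harmonic partials with amplitude 1/n,
    and each partial is smeared by a Gaussian of width sigma cents as a
    crude stand-in for limited pitch resolution.
    """
    partials = [(1200.0 * math.log2(r * n), 1.0 / n)
                for r in ratios for n in range(1, n_harmonics + 1)]
    lo = min(c for c, _ in partials) - 4 * sigma
    hi = max(c for c, _ in partials) + 4 * sigma
    grid = [lo + k * step for k in range(int((hi - lo) / step) + 1)]
    return [sum(w * math.exp(-0.5 * ((x - c) / sigma) ** 2) for c, w in partials)
            for x in grid]

def spectral_entropy(ratios, **kwargs):
    """Shannon entropy, in bits, of the smoothed spectrum normalized to sum to 1.
    Energy spread over many distinct partials gives a higher value."""
    spectrum = chord_spectrum(ratios, **kwargs)
    total = sum(spectrum)
    probs = [s / total for s in spectrum]
    # Skip zero bins (including values that underflow to 0 after division).
    return -sum(p * math.log2(p) for p in probs if p > 0)

if __name__ == "__main__":
    print(spectral_entropy([1.0]))                # a single harmonic tone
    print(spectral_entropy([1.0, 5 / 3, 7 / 3]))  # the 3:5:7 triad
```

A chord whose partials are spread over many distinct frequencies yields a flatter distribution, and therefore higher entropy, than a single harmonic tone; partials that coincide, as several do in the just-tuned 3:5:7 triad, tend to pile up into fewer, taller peaks and pull the entropy down.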
Friedman et al. (2021) state that we “did not equate the average pitch heights of component chord tones,” but only by including variation in mean pitch can its effect on pleasantness be modeled explicitly. Our experimental design allowed us to show that, for our participants, mean pitch is one of the most important predictors of pleasantness; a stimulus set without such variation cannot reveal this effect. Finally, we included a measure of the distance between each tested chord and the closest Western chord, thereby controlling for specific aspects of familiarity.
We also have some concerns about the statistical methods reported in Friedman et al. (2021). In the correlation tests, there are a number of post hoc analyses in which the data are subdivided until significance is achieved. It is not clear exactly how the regressions were performed, but they appear to have been run on responses averaged over participants. We suggest that a better approach would be a mixed-effects model applied to the unaveraged data (with the intercept and slope varying across participants), in which the Likert data are modeled as ordinal. In our own study, we used a Bayesian mixed-effects model, which estimates the full posterior distribution of every effect and protects against overfitting.
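To make the suggested alternative concrete, the Python below sketches the likelihood of a cumulative-logit (ordinal) model in which each participant has their own intercept and slope deviation, evaluated on unaveraged trials. It is a minimal illustration under stated assumptions (a hypothetical 7-point scale, illustrative threshold values, hypothetical function names), not a reproduction of the Bayesian model in Smit et al. (2019); in practice such a model would be fitted with established mixed-effects software rather than by hand.

```python
import math

def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def category_probs(x, thresholds, beta, u_j=0.0, b_j=0.0):
    """P(rating = k), k = 1..K, under a cumulative-logit (ordinal) model.

    P(rating <= k) = logistic(thresholds[k-1] - eta), with
    eta = (beta + b_j) * x + u_j, where u_j and b_j are participant j's
    deviations from the population intercept and slope.
    thresholds must be strictly increasing (K - 1 of them for K categories).
    """
    eta = (beta + b_j) * x + u_j
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]

def expected_rating(x, thresholds, beta, u_j=0.0, b_j=0.0):
    """Mean rating implied by the category probabilities."""
    probs = category_probs(x, thresholds, beta, u_j, b_j)
    return sum(k * p for k, p in enumerate(probs, start=1))

def conditional_log_likelihood(trials, thresholds, beta, intercepts, slopes):
    """Log-likelihood of unaveraged trials, given each participant's effects.

    trials: iterable of (participant_id, predictor_value, rating in 1..K).
    A full mixed model would also place a distribution on the participant
    effects (and, in a Bayesian fit, priors on every parameter).
    """
    return sum(
        math.log(category_probs(x, thresholds, beta,
                                intercepts[j], slopes[j])[y - 1])
        for j, x, y in trials)

if __name__ == "__main__":
    cuts = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]  # hypothetical 7-point scale
    # Positive population slope; b_j = -2.0 gives one participant a net negative slope.
    for b in (0.0, -2.0):
        print(b, expected_rating(0.0, cuts, 1.0, b_j=b),
              expected_rating(1.0, cuts, 1.0, b_j=b))
```

With a positive population slope, a participant whose slope deviation is large and negative gives ratings that fall as the predictor rises; averaging responses over participants before fitting discards exactly this kind of heterogeneity, and treating the ratings as ordinal avoids assuming equal spacing between scale points.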
Overall, our results support the inclusion not only of roughness, harmonicity, and familiarity, but also of spectral entropy and, notably, mean pitch height. All of these are complementary predictors of pleasantness (and valence) for Bohlen-Pierce chords. Furthermore, the findings for every predictor have recently been replicated with different participants in a different experimental setting (Smit et al., 2020).
We also caution against conflating evidence for harmonicity with vocal similarity. There are other possible mechanisms by which harmonicity may predict consonance, including perceptual fluency and learned associations. Experiments such as ours and Friedman et al.’s (2021) are not designed to distinguish between such possibilities.
Music perception and emotion perception are both complex and involve a vast array of different perceptual processes. The results of Friedman et al. (2021) and of our own studies (Smit et al., 2019, 2020) highlight the need for composite models when studying the perception of affect in music.