Historical listening has long been a topic of interest for musicologists. Yet little attention has been given to the systematic study of historical listening practices before the common practice era (c. 1700–present). In the first study of its kind, this research compared a model of medieval perceptions of “sweetness,” based on the writings of medieval music theorists, with modern-day listeners’ aesthetic responses. Responses were collected through two experiments. In an implicit associations experiment, participants were primed with a more or less consonant musical excerpt and then presented with a sweet or bitter target word, or a non-word, on which to make a lexical decision. In an explicit associations experiment, participants rated on a three-point Likert scale the perceived sweetness of short musical excerpts that varied in consonance and sound quality (male, female, organ). The results from these experiments were compared with predictions from a medieval perception model to investigate whether early and modern listeners have similar aesthetic responses. Results from the implicit associations experiment were not consistent with the predictions of the model; however, results from the explicit associations experiment were. These findings indicate that the metaphor of sweetness may be useful for comparing the aesthetic responses of medieval and modern listeners.
Carruthers (2013) has argued that medieval aesthetic experience was bound to human sensation and that medieval writers drew upon a common vocabulary that privileged sensory effect. This vocabulary, in attempting to make sense of physical multisensory responses to image, literature, music, and spectacle, was predominantly metaphorical when describing a quality of sensation in response to the natural and artificial physical world. These multimodal metaphors articulated and qualified “modes of perception by means of describing effects on the perceiver” (Carruthers, 2013, p. 45). For example, the Latin terms suavis and dulcis, which are both commonly translated into English as “pleasant” and “sweet,” were often used as metaphors to describe the sensory experience of aesthetic pleasure (Carruthers, 2013, p. 61). The increasing use of these types of multimodal metaphors reflects the growing emphasis on describing and theorizing the experience of listening to music at this time (Stoessel, 2017). Such terms captured, in the conventions of written language, approbative or disapprobative appraisals of music as heard. Although emotions may arise (or may have arisen) from these sensory experiences, emotion cannot as yet be understood as directly linked to the metaphors described above. Previous research has shown that metaphors tend to serve as representations of complex, multimodal states, part of which (but not all) can be explained through emotion (e.g., Crawford, 2009; Fainsilber & Ortony, 1987). For this reason, metaphor in this paper is considered for the aesthetic concepts and experiences it may represent, rather than for the emotions with which it may be associated.
In medieval aesthetic statements, the metaphor of sweetness was used to describe what was considered “pleasing and beneficial” (Carruthers, 2013, p. 89). Music that was associated with sweetness in the writings of medieval theorists also tended to contain and to emphasize consonant sonorities. Thus, a connection exists between consonance, a metaphorical concept of sweetness, and what was considered pleasurable and beneficial in medieval cultures. This paper explores whether this relationship can be observed in the listening habits of modern-day listeners.
The last several decades have seen an increased interest in cultural and historical listening habits in many disciplines (e.g., Burnett, Fend, & Gouk, 1991), including in historical musicology (e.g., special issues of Early Music, volume 25, issue 4, 1997; World of Music, volume 39, issue 2, 1997; Musical Quarterly, volume 82, issues 3–4, 1998). Such research is made possible by the presence of historical ear-witness accounts that survive in the form of writings about music, as well as by other evidence for the use of music (e.g., visual representations, including music making in architectural or ritualistic contexts).
An ecological approach may provide insight into such aesthetic experiences. Such an approach considers the impact of the relationship between a perceiver and their environment on music perception (e.g., Clarke, 2005). It requires researchers to consider how a listener’s sound world (such as that described by Gibson & Biddle, 2016) can affect musical experience: for example, how a listener’s engagement with seemingly unrelated stimuli, such as language (Botstein, 1992), affects how they perceive music. Social and cultural practices may also shape a listener’s experience of music (Gay, 1996; Johnson, 1995). Such ecological models of listening are challenging long-held conceptions of how medieval music was engaged with and perceived by historical listeners (e.g., Bent, 2010; Clark, 2004; Zayaruznaya, 2017).
This project investigates whether the metaphor of sweetness is useful for comparing medieval aesthetic discourse about consonance with present-day listeners’ perception of consonance. We explored whether the medieval association between sweetness and consonance can be observed in modern listeners. Informed by medieval music theory, we used musical analysis to identify pertinent musical structures that emphasize consonance. These same musical structures are associated with the metaphor of sweetness by medieval music theorists. The expectations generated from these analyses were then compared to modern listeners’ responses to early music. To assist our analysis, we used the Computer-Assisted Symbolic Musical Analysis Toolbox (CASMAT). CASMAT offers an alternative approach to current computational models of music perception (e.g., Pearce & Eerola, 2016; Wiggins, Pearce, & Müllensiefen, 2011) in that it places counterpoint, which is a critical component of early music, at the center of the analysis. In this way, this paper exploits metaphor as a means for comparing modern-day listeners’ experiences of early music with a model of aesthetic judgment that can be extrapolated from early music theory. The broad intention of this paper is to take initial steps towards a framework for conducting an archaeology of early musical listening that is relevant to both historical enquiry and listener-centric approaches to early music.
An implicit associations experiment and an explicit associations experiment were conducted to address two key questions. First, medieval theorists judged the resolution to a more consonant sonority to be sweeter than the final sonority’s constituent consonances alone (Frobenius, 1971; Fuller, 2013). The implicit associations experiment tested whether modern-day listeners more readily associate the metaphor of sweetness with progressions that resolve to more consonant sonorities. This was achieved by measuring the effect of phrases that do and do not resolve to consonance on modern listeners’ lexical processing. In this experiment, participants were primed with a musical stimulus (a musical prime) and then presented with a sweet or bitter target word, or a non-word. It was predicted that, given their potential conceptual relationship, sweet target words would be recognized as words faster after a musical prime that exhibited a strong resolution to consonance (henceforth a more consonant musical prime) than after a musical prime that did not resolve to consonance (henceforth a less consonant musical prime). Second, to examine associations with the individual sonorities that make up longer phrases, an explicit associations experiment was used to determine whether modern-day listeners rate consonant sonorities as sweet. Working from the perspective of early music theory, it was predicted that participants would judge consonant sonorities to be sweeter than less consonant ones.
A Fourteenth-Century Musical Informant
The Musica (1357) of Johannes Boen informed our reconstruction of a model of contrapuntal sweetness (Frobenius, 1971, pp. 32–78). This wide-ranging music treatise was chosen because of its ample and explicit use of the metaphor of sweetness to describe contrapuntal progressions in examples of medieval music that have survived in notated form to the present day. A native of Holland, Boen seems to have lived in Oxford, and perhaps in Paris, as a scholar, and he boasts of his knowledge of the music of England, France, and Italy. In contrast to the earlier writings of Engelbert of Admont and Jacobus (previously known as Jacques de Liège) on musical listening, Boen’s “discussion of consonance is peppered with comments on aural perception and reaction” (Fuller, 1998, p. 473). Moreover, Fuller adds that Boen’s “faculty of hearing is a judicious arbiter of sound quality and a keen perceiver of musical events and contexts” (Fuller, 1998, p. 473). Although his Musica is not a treatise on counterpoint (that is, contrapunctus) per se, the medieval theorist’s mellifluous verbosity and his patent aural knowledge of the new music of his day set his writings apart from more rudimentary fourteenth-century instructional manuals on music. He is an extraordinary informant who cannot be ignored. Significantly for this study, Boen’s language connects his musical thought to the pervasive medieval discourse that uses the metaphor of sweetness to describe aesthetic responses (Carruthers, 2013). Finally, Boen’s treatise affirms a strong discursive relationship between sweetness and consonance.
In fourteenth-century music theory, consonances are described as either perfect or imperfect (Crocker, 1962; Gut, 1976; Sachs, 1974, pp. 88–103). Perfect consonances consist of unisons, octaves, and fifths and their compounds; for example, a twelfth is considered equivalent to a fifth. Imperfect consonances are thirds, sixths, and their compounds. Dissonances constitute all other types of intervals. By the late fourteenth century, theorists had recategorized the interval of a fourth as a dissonance, despite its having been considered a consonance in earlier and, in specific contexts, later periods.1 Boen affords a more complete picture of the relationship between interval qualities and aesthetic judgements of sweetness and of what he and his contemporaries considered its antithesis, bitterness.
In the fourth part of his Musica, Boen defines consonance as the “blend of high and low sounds falling pleasantly (suaviter) and uniformly on the ears” (Frobenius, 1971, p. 64). In the following passages discussing various types of consonances (Frobenius, 1971, pp. 64–67), the link between the pleasing combination of high and low sounds and the metaphor of sweetness (dulcis, dulcitudo) is explicit. Proceeding to consonances, Boen stands apart from most fourteenth-century theorists in his view that consonances exclude the unison. Given the relative exceptionality of this statement among theorists (Crocker, 1962; Gut, 1976), we set it aside. Yet, Boen accepts the octave as a perfect consonance, which he considers to be sweeter (dulcior) than the fifth. After the octave, he places the fifth as the next most perfect consonance, “just as hearing bears witness.” Of imperfect consonances, Boen notes that “in as much as the third and sixth fall short of sweetness, so too might each, namely the third and the sixth, more abundantly rejoice in the twofold arrangement of sounds.” Boen explains that the “twofold arrangement of sounds” refers to the fact that thirds and sixths may be major or minor, and that each quality invites different voice leading, such as the major sixth moving to the octave in contrast to the minor third moving to the unison. Of dissonances, Boen is less effusive, noting, however, that they should be used in an appropriate metric position and be of a duration so that “lingering on a dissonance does not burden the hearing of the ears with its bitterness” (Frobenius, 1971, p. 70).
Among the several musical compositions discussed in his Musica, Boen refers directly to a contemporary motet, Se grasse n’est/ Cum venerint/ Ite missa est (henceforth Se grasse), which concludes the famous mid-fourteenth-century Tournai Mass, one of the earliest examples of a polyphonic setting of the medieval Catholic Mass (Frobenius, 1971, pp. 67–68). The motet was transmitted widely in Western Europe. It survives in four fourteenth-century manuscripts, a relatively large number, whose origins and provenance show that the work’s dispersal ranged from Tournai in modern-day Belgium as far south as Ivrea in northern Italy and as far east as Wroclaw (Breslau) in Poland.2 Boen’s discussion of Se grasse focuses on a false minor third (augmented second) in the fourth measure of the motet (see measure 4 of Figure 1). Although Boen reports that he hears this augmented second as a defective third (indeed, he appeals explicitly to the experience of hearing), he refers to its “harshness” (asperitas). He also makes the telling remark that such an interval might be permitted in music “since it is propped up by the surrounding sweetness,” that is, by adjacent perfect consonances. Fuller dubs Boen’s justification of instances of patent dissonance by their subsumption into surrounding consonances a “doctrine of compensation” (1998, p. 474). As shown below, the doctrine of compensation, predicated on a metaphysical teleology or, as medieval writers state (Cohen, 2001), the perfection of consonance, also applies to musical relationships between different degrees of consonance and therefore to judgments of sweetness.
Example of numerical representation of the first six measures of Se grasse n’est/Cum venerint/Ite missa est. The numbers in brackets refer to: measure number, sonority index in measure, duration, aggregated sonority quality.
It is difficult to discern any direct influence of Boen on later writers, although several fifteenth-century authorities discuss consonance in terms of sweetness and dissonance in terms of bitterness, forming part of a discourse that can be situated within the broader aesthetic framework described by Carruthers. For the purpose of this article, it is sufficient to note two prominent theorists of the following century. Around 1430, Ugolino of Orvieto judged the fifth to be the “sweetest” interval, one that makes the listener leap for joy and raises the intellect to higher contemplation (Seay, 1959/1962, Vol. 1, p. 58). By contrast, he assigns the minor seventh a status somewhere between consonance and dissonance, likening its auditory effect to the taste of bitter bile mixed with the sweetness of honey (Seay, 1959/1962, Vol. 1, p. 66). In the prologue of his Book on Counterpoint (I.i.13), the late fifteenth-century music theorist Johannes Tinctoris is at pains to stress that consonance arises from the pleasure of hearing the sweetness of the musical concordance of voices and melodies (D’Agostino, 2008, p. 138). Dissonance between two voices (II.i.4), on the contrary, displeases the ear by its harshness or bitterness (D’Agostino, 2008, p. 282). Both Ugolino and Tinctoris were also composers, and their remarks reveal a real concern for music’s aesthetic effect. These concise readings of Boen, Ugolino, and Tinctoris (D’Agostino, 2008; Seay, 1959/1962) produce a mapping of interval qualities to the metaphor of sweetness, as shown in Table 1.
Table 1. A Mapping of Medieval Intervallic Qualities to the Metaphor of Sweetness
Intervals (including their compound forms) | Quality | Associated metaphor |
---|---|---|
Octaves, Fifths | Perfect Consonance | Sweet |
Thirds and Sixths | Imperfect Consonance | Neither sweet nor not sweet |
All other intervals | Dissonance | Not sweet |
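The mapping in Table 1 can be expressed as a small lookup. The sketch below is a hypothetical illustration, not CASMAT's implementation: it assumes that pitches are given as MIDI note numbers and that intervals are measured in semitones, with compounds reduced modulo the octave. Because it ignores spelling, it would classify the augmented second in Se grasse as a minor third, whereas Boen hears that interval as a defective third and calls it harsh; a spelling-aware (diatonic) measure would be needed to capture that distinction.

```python
# Hypothetical sketch of Table 1: interval -> medieval quality -> associated metaphor.
# Assumes MIDI note numbers; compound intervals are reduced modulo 12 semitones.

PERFECT = {0, 7}           # unison/octave (0) and fifth (7); Boen's exclusion of the unison is set aside
IMPERFECT = {3, 4, 8, 9}   # minor/major thirds and minor/major sixths

METAPHOR = {
    "perfect": "sweet",
    "imperfect": "neither sweet nor not sweet",
    "dissonant": "not sweet",
}


def interval_quality(lower: int, upper: int) -> str:
    """Classify the interval between two MIDI pitches by its semitone class."""
    semitones = abs(upper - lower) % 12  # e.g. a twelfth (19 semitones) reduces to a fifth (7)
    if semitones in PERFECT:
        return "perfect"
    if semitones in IMPERFECT:
        return "imperfect"
    return "dissonant"  # includes the fourth (5), following late fourteenth-century usage


print(interval_quality(48, 67), METAPHOR[interval_quality(48, 67)])  # perfect sweet
```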
These additional examples point sufficiently to the pervasiveness of metaphors of sweetness (and conversely bitterness) in medieval writings about musical consonance. They invite the central research question of this study, namely whether modern listeners make similar or different metaphorical associations with musical consonance.
Predicting Historical Listening Habits – Computer-Assisted Analysis
Boen’s treatise is of additional use for this investigation since it deals not only with the relationship between sweetness and consonance but also with the role of counterpoint in creating the expectation of sweetness. Medieval counterpoint in the strict sense consisted only of consonances. In practice, however, florid counterpoint, which is more likely to be found in composed polyphony from the Middle Ages, was judiciously peppered with dissonances. Even in strict counterpoint, not all consonances are considered equal. Perfect consonances are held to be more consonant than imperfect consonances.
The premise for modeling a relatively simple medieval listener in this instance stemmed from a principle described by Boen in which musical consonance is judged all the sweeter when it consists of perfect consonance preceded by imperfect consonance. Moreover, Boen outlines how imperfect consonance creates the expectation of resolution to perfect consonance. An oft-quoted passage from Boen’s Musica (e.g., Fuller, 1992) serves to highlight the link between sensory experience and the fulfilment of listening expectation specifically in relation to consonance:
Moderns have increased the similarity between the third and the sixth, due to their belief in their mutual interchangeability, so that, just as three thirds may follow each other in succession, so also might three sixths. They introduced this so that a song, which is judged imperfect by the presence of thirds and sixths, though not discordant, may attract and allure the ears, so that thirds and sixths (like heralds and maidservants) may announce the song’s longer expected and sweeter perfection, which shall follow by means of the fifth and the octave, as here
The moderns did not however introduce this on fifths and octaves lest the ear cease from being attentive, thinking that, with the end reached, all motion has stopped (Original Latin: Frobenius, 1971, pp. 69–70, translated by Jason Stoessel).
Like other observations in Boen, this passage illustrates the importance of the sensory experience of listening in aesthetic judgments of music by evoking the primary sense organ involved. It also reveals how aesthetic judgements about two-voice counterpoint can be expanded to textures of three or more voices by compounding the effect of each contrapuntal pair of voices. Importantly, the concept of the triad, which most musicians take for granted today, lay more than two centuries in the future at the time Boen was writing. The effect of a three-voice sonority, for example, can instead be construed in a historically informed way as the summation of its component intervals.
In an earlier study, Fuller (1986) developed a historically informed model for analyzing progressions of musical sonorities of three or more voices based upon medieval theoretical categories of intervals as either consonant or dissonant. For compositions of more than two parts, Fuller proposes that the vertical interval relations between the lowest voice in the three-part texture and the other simultaneously sounding voices can be aggregated into a single quality that describes the net sonority. For example, when both upper voices in the texture form perfect consonances with the lowest voice, the net sonority can be described as doubly perfect. Other possibilities include perfect-imperfect and doubly imperfect (see Fuller, 1986, p. 43, for examples of these first three types of consonant sonority). Fuller’s classification is extended in this study to perfect-dissonant, imperfect-dissonant, and doubly dissonant sonorities. Fuller’s model is based upon the predominant three-voice texture of fourteenth-century polyphony. The motet Se grasse, which includes short episodes in two voices, fits Fuller’s three-part model closely. Based upon the aesthetic hierarchy of musical intervals set out in Table 1 above, the sweetness metaphor predicted for each three-voice sonority is set out in Table 2.
Table 2. A Mapping of Three-Voice Sonority Qualities to Predicted Aesthetic Metaphor of Sweetness
Three-voice sonority quality | Predicted aesthetic metaphor |
---|---|
Doubly perfect | Sweet |
Perfect-imperfect | Neither sweet nor not sweet |
Doubly imperfect | Neither sweet nor not sweet |
Perfect-dissonant | Not sweet |
Imperfect-dissonant | Not sweet |
Doubly dissonant | Not sweet |
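Fuller’s aggregation, as extended here, can be sketched as a function of the two intervals that the upper voices form with the lowest voice. This is a hypothetical illustration, not the code of CASMAT’s Sonority class: it assumes that each upper voice has already been classified as perfect, imperfect, or dissonant according to Table 1, and it does not handle the two-voice episodes mentioned above.

```python
# Hypothetical sketch of Fuller's aggregation (Table 2); not CASMAT's Sonority class.
# Each argument is the quality of one upper voice measured against the lowest voice.

ORDER = ("perfect", "imperfect", "dissonant")  # most to least consonant

PREDICTED_METAPHOR = {
    "doubly perfect": "sweet",
    "perfect-imperfect": "neither sweet nor not sweet",
    "doubly imperfect": "neither sweet nor not sweet",
    "perfect-dissonant": "not sweet",
    "imperfect-dissonant": "not sweet",
    "doubly dissonant": "not sweet",
}


def sonority_type(q_a: str, q_b: str) -> str:
    """Aggregate two interval qualities into one three-voice sonority type."""
    for q in (q_a, q_b):
        if q not in ORDER:
            raise ValueError(f"unknown interval quality: {q!r}")
    if q_a == q_b:
        return f"doubly {q_a}"
    # Voice order does not matter: name the pair from more to less consonant.
    first, second = sorted((q_a, q_b), key=ORDER.index)
    return f"{first}-{second}"


kind = sonority_type("imperfect", "perfect")
print(kind, "->", PREDICTED_METAPHOR[kind])  # perfect-imperfect -> neither sweet nor not sweet
```

Naming each mixed pair from more to less consonant keeps the labels identical to the rows of Table 2, so the lookup needs only six entries.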
Drawing further on the motet Se grasse, computer-assisted analysis was used to make predictions of contrapuntal sweetness according to the model reconstructed from Boen. CASMAT was used to analyze sonority types in the selected piece. CASMAT was developed by Stoessel and members of his research teams (Stoessel, Collins, & Hill, 2020). This approach allows for systematic analysis of scores according to the framework of early music theory. Stoessel wrote the Sonority class in CASMAT, a numerical implementation of Fuller’s method, and used it to analyze the score of Se grasse and to select the excerpts for use in this study. An edition of Se grasse was prepared after that of Philippe Mercier (Pycke, Mercier, Dumoulin, & Huglo, 2016, pp. 106–109) and emended to reflect the particular accidentals and rhythmic choices found in the well-known recording by the French early music exponents Ensemble Organum (1991). The score was encoded in the MusicXML format for computational analysis. The implicit associations experiment tested the effects of selected excerpts from Ensemble Organum’s performance of Se grasse on modern-day listeners.
The example in Figure 1 illustrates how the sonorities in the first six measures of Se grasse can be represented numerically. Each sonority produced by a melodic progression in one or more voices can be labelled using the four numbers shown in brackets below the music in Figure 1. The first two numbers indicate the location of the sonority by measure number and by position within the measure. The third number indicates the duration of the sonority in ENIGMA Durational Units, where each unit is 1/1024 of a quarter note.3 The last and most important number in the bracketed group is the sonority type, whose numerical representation corresponds to the respective sonorities shown in Table 3.4
Table 3. Sonority Qualities and Their Corresponding Numerical Representation
Sonority type | Representation |
---|---|
Doubly perfect | 1 |
Doubly imperfect | 2 |
Perfect-imperfect | 3 |
Doubly dissonant | 4 |
Perfect-dissonant | 5 |
Imperfect-dissonant | 6 |
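The bracketed labels and the codes in Table 3 can be handled with a small record type. This is a sketch under assumptions, not CASMAT’s own data model: the label string format and the example values “[4, 2, 2048, 3]” are hypothetical (they are not read from Figure 1); only the conversion of 1024 units per quarter note and the code table come from the text.

```python
from dataclasses import dataclass

EDU_PER_QUARTER = 1024  # ENIGMA Durational Units per quarter note

# Numerical codes for sonority types, as listed in Table 3.
SONORITY_CODES = {
    1: "doubly perfect",
    2: "doubly imperfect",
    3: "perfect-imperfect",
    4: "doubly dissonant",
    5: "perfect-dissonant",
    6: "imperfect-dissonant",
}


@dataclass(frozen=True)
class SonorityLabel:
    measure: int        # measure number
    index: int          # position of the sonority within the measure
    duration_edu: int   # duration in ENIGMA Durational Units
    code: int           # sonority type code (Table 3)

    @property
    def quarter_notes(self) -> float:
        """Duration expressed in quarter notes."""
        return self.duration_edu / EDU_PER_QUARTER

    @property
    def type_name(self) -> str:
        return SONORITY_CODES[self.code]


def parse_label(text: str) -> SonorityLabel:
    """Parse a hypothetical label string such as '[4, 2, 2048, 3]'."""
    measure, index, duration, code = (int(x) for x in text.strip("[] ").split(","))
    return SonorityLabel(measure, index, duration, code)


label = parse_label("[4, 2, 2048, 3]")
print(label.quarter_notes, label.type_name)  # 2.0 perfect-imperfect
```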
This representation of sonority types served as the basis for the expert selection of auditory stimuli with varying degrees of consonance from the aforementioned recording of the same piece. Although an algorithmic segmentation of the resulting numerical representations in CASMAT was considered, the comparatively small amount of data in Se grasse (177 sonorities) meant that manual segmentation was the most time-efficient approach. For our purposes, we identified musical phrases that concluded with doubly perfect (1), doubly imperfect (2), or perfect-imperfect (3) sonorities. The sonorities preceding the final sonority were also factored into our selection criteria, particularly where a phrase concluding in a more consonant sonority was preceded by less consonant or even dissonant sonorities. Phrases concluding with doubly imperfect or perfect-imperfect sonorities (2, 3) were categorized as less consonant, while phrases concluding with doubly perfect sonorities (1) were categorized as more consonant. The duration of the final consonant sonority was also a factor in selecting a phrase as a prime. Phrases that concluded with longer consonant sonorities (equivalent to two or three measures in the transcription), especially where they exhibited cadence structures as described by Boen above, were prioritized. No parameters were set for phrase length, although phrases generally spanned 5 to 10 measures, equivalent to approximately 5 to 10 seconds of listening in the recording used in the experiment. Finally, some phrases were “gamed” by truncating them so that they did not conclude with, for example, a doubly perfect sonority (1) but ended “prematurely” with a less consonant sonority (2, 3).
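The selection criteria above were applied by hand, but they can be stated compactly. The sketch below is a hypothetical illustration rather than the procedure actually used: a phrase is assumed to be a list of (code, duration in EDUs) pairs using the codes of Table 3, the helper names are invented, and because the text gives no duration threshold, longer final sonorities are only ranked first rather than filtered.

```python
# Hypothetical sketch of the stated selection criteria; the actual selection was manual.
# A phrase is a list of (code, duration_edu) pairs, with codes as in Table 3.

MORE_CONSONANT_END = {1}      # doubly perfect
LESS_CONSONANT_END = {2, 3}   # doubly imperfect, perfect-imperfect


def classify_phrase(phrase):
    """Label a phrase by its final sonority, or return None if it is not eligible."""
    final_code = phrase[-1][0]
    if final_code in MORE_CONSONANT_END:
        return "more consonant"
    if final_code in LESS_CONSONANT_END:
        return "less consonant"
    return None  # phrases ending on a dissonant sonority (codes 4-6)


def truncate_to_less_consonant(phrase):
    """'Game' a phrase: end it on its last code-2 or code-3 sonority before the final one."""
    for end in range(len(phrase) - 1, 0, -1):
        if phrase[end - 1][0] in LESS_CONSONANT_END:
            return phrase[:end]
    return None  # no earlier less consonant sonority to stop on


def rank_more_consonant(phrases):
    """Order more consonant phrases so that longer final sonorities come first."""
    eligible = [p for p in phrases if classify_phrase(p) == "more consonant"]
    return sorted(eligible, key=lambda p: p[-1][1], reverse=True)


phrase = [(5, 1024), (3, 1024), (6, 512), (2, 1024), (1, 6144)]
print(classify_phrase(phrase))                              # more consonant
print(classify_phrase(truncate_to_less_consonant(phrase)))  # less consonant
```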
The last criterion (truncation) influenced the selection of Less Consonant Primes 1 and 2 in Figure 2, while Less Consonant Primes 3–5 fall into the category of phrases that conclude with less consonant sonorities. More Consonant Primes 1–5 in Figure 3 all conclude with more consonant, doubly perfect sonorities.
The five less consonant musical primes selected from the fourteenth-century motet Se grasse n’est/Cum venerint/Ite missa est.
The five more consonant musical primes selected from the fourteenth-century motet Se grasse n’est/Cum venerint/Ite missa est.
Examining the Modern-Day Listener – An Experimental Approach
This research used two experimental paradigms to test the responses of modern-day listeners to auditory stimuli. Participants’ implicit semantic associations between longer musical phrases and the metaphor of sweetness were tested using a priming paradigm. The longer musical phrases consisted of auditory stimuli from the recording of Se grasse, selected using the computational analysis described above. Existing evidence indicates that the processing of certain types of stimuli can be influenced by what precedes them, particularly in lexical decision tasks, where participants have to decide quickly whether a string of letters is a word or a non-word (McNamara, 2005, p. 4). Accuracy and/or reaction times have been found to be significantly affected by the prior presentation of words, or even strings of letters, that share (or do not share) a given characteristic (e.g., phonological similarity; Hamburger & Slowiaczek, 1996; Slowiaczek & Pisoni, 1986). This effect has also been found for semantic relationships (McNamara, 2005, p. 156). For example, people are faster at recognizing that the string of letters “nurse” is a real word if it is preceded by a semantically related word such as “doctor” than if it is preceded by a semantically unrelated word such as “lawyer” (Fromkin, Rodman, & Hyams, 2011; Tillmann & Bigand, 2002). Here, we make use of this semantic priming effect to determine whether musical primes that are considered sweet from the perspective of the medieval listener affect reaction times to words that refer to sweet things or have strong semantic associations with sweetness.
A second experiment examined participants’ explicit associations between shorter musical excerpts (dyads, three-note sonorities, and cadences) and the metaphor of sweetness using self-report measures. Self-report is a common paradigm used in music psychology studies, such as music and emotions research, for which the Likert scale is a common form of measurement (Eerola & Vuoskoski, 2013, p. 314). For this reason, this experiment used the Likert scale to gauge participants’ judgment of the level of sweetness of a musical excerpt. The following sections present the results of these experimental approaches.
Experiment One – Implicit Associations Experiment
Method
Forty-eight participants (male = 18, female = 30) took part in this study. Participants ranged from 19 to 64 years of age (M = 35.8, SD = 16.76). None of the participants had a hearing or reading impairment. Forty-one participants reported having more than five years of music training, two reported having four or fewer years of music training, and five reported having no music training. Participation was voluntary, and participants went into a draw to win one of five $50 iTunes gift vouchers. At the outset of the experiment, participants were advised to use headphones, set their volume at a comfortable level, and close all computer programs that could cause distraction.
The experiment was delivered on a computer via the online platform “Gorilla Experiment Builder” (www.gorilla.sc). Data were collected between October 14, 2018 and December 9, 2018. A priming paradigm was used in which participants were first primed with a more consonant or less consonant musical phrase and then presented with a string of letters. Participants were asked to judge as quickly as possible whether the string of letters formed a word or a non-word by pressing either the F or the J key on the keyboard (the association between the keys and the affirmative or negative responses was counterbalanced across participants). The target word was presented immediately after the musical prime ended. This timing was chosen because, according to Boen, it is the whole musical passage that contributes to the sweetness judgment. That is, it is the interplay between sonorities over time, ultimately resolving to consonance, rather than the cadence alone, that contributes most to sweetness. Therefore, it was expected that presenting the target word after the musical prime would be most likely to capture this experience.
The words and non-words used can be found in the Appendix. Non-words were selected from the English Lexicon Project database (Balota et al., 2007). The target words were of two kinds. Half were synonymous with, or associated with, sweetness. Responses to these were contrasted with responses to words that could not be considered sweet. For this purpose, words associated with bitterness were chosen, not as the opposite of sweet (indeed, sweetness and bitterness may not sit along such a bipolar scale) but because they offered a discrete category that represented something “other than” sweetness. The words describing sweetness and bitterness can all be found in medieval texts. They originally appeared in Latin and were translated into English for this experiment. These words can be used to refer to taste (e.g., “that mango was very sweet,” “I ate some ginger, it was bitter”) as well as to more abstract qualities (e.g., “her words were very sweet,” “the truth was bitter”). It was reasoned that, if an aesthetic relationship between consonance and sweetness existed for the modern-day listener, then participants would be faster and more accurate at responding to sweet words after a more consonant musical phrase than after a less consonant musical phrase, and faster and more accurate at responding to sweet words than to not sweet (i.e., bitter) words.
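The reaction-time part of this prediction can be written as difference scores on condition means: for each target type, the mean RT after a less consonant prime minus the mean RT after a more consonant prime. Under the hypothesis, the sweet-word difference should be positive and larger than the bitter-word difference. The sketch below is a minimal descriptive illustration, assuming the hypothetical labels “more”/“less” and “sweet”/“bitter” and invented toy RTs; it ignores accuracy and is not the inferential model reported in the Results.

```python
from statistics import mean


def cell_means(trials):
    """Mean reaction time (ms) per (prime, target) cell from (prime, target, rt_ms) tuples."""
    cells = {}
    for prime, target, rt_ms in trials:
        cells.setdefault((prime, target), []).append(rt_ms)
    return {cell: mean(rts) for cell, rts in cells.items()}


def priming_effects(means):
    """Less-minus-more consonant RT difference per target type, plus their difference."""
    effects = {
        target: means[("less", target)] - means[("more", target)]
        for target in ("sweet", "bitter")
    }
    # Positive values mean faster responses after the more consonant prime.
    # The sweet-minus-bitter difference is a descriptive analogue of the prime x target interaction.
    effects["sweet_minus_bitter"] = effects["sweet"] - effects["bitter"]
    return effects


# Toy reaction times (ms), invented for illustration only.
trials = [
    ("more", "sweet", 600), ("more", "sweet", 620), ("less", "sweet", 650),
    ("more", "bitter", 645), ("less", "bitter", 640),
]
print(priming_effects(cell_means(trials)))
```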
The selected words were pre-tested to ensure that modern-day listeners still considered them sweet or bitter. In this pre-test, 30 participants were presented with 25 sweet and 25 bitter words. They were asked to rate each word on two seven-point Likert scales: one for sweetness (from very sweet to not sweet) and one for bitterness (from very bitter to not bitter). The ten most highly rated sweet words and the ten most highly rated bitter words were selected for use in the experiment, with the following exceptions:
Honeysweet and sugared were rated by participants among the 10 sweetest words (1st and 3rd sweetest on the Likert scale). However, since these were very similar to honeyed and sugary, which were also among the 10 sweetest words, there was concern that familiarity could affect the results. Therefore, honeysweet and sugared were replaced with delicious and pretty (tied 12th sweetest on the Likert scale) for inclusion in the main experiment.
Acrimonious was rated by participants as among the top 10 most bitter words. However, 3 participants (10%) indicated that they were unfamiliar with this word. Therefore, acrimonious was replaced with biting (12th most bitter on the Likert scale) for inclusion in the main experiment.
Participants who took part in the pre-test did not take part in the main experiment.
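The ranking step of the pre-test, with its exceptions, amounts to sorting words by mean rating and skipping excluded items so that the next-ranked word moves up. A minimal Python sketch follows. The full ratings of all 50 pre-test words are not reported here, so the example reuses the Appendix ratings of the ten bitter words purely to show the mechanics; the function name and the `excluded` set (standing in for the judgement calls about near-duplicates and unfamiliar words) are hypothetical.

```python
def select_targets(mean_ratings, k, excluded=frozenset()):
    """Pick the k words rated most sweet (or most bitter).

    Lower mean ratings mean "more sweet"/"more bitter" on the 1-7 scale, so
    words are ranked in ascending order; ties are broken alphabetically.
    Words in `excluded` are skipped and the next-ranked word moves up.
    """
    ranked = sorted(mean_ratings, key=lambda word: (mean_ratings[word], word))
    return [word for word in ranked if word not in excluded][:k]


# Mean ratings of the ten bitter words listed in the Appendix.
BITTER_RATINGS = {
    "acidic": 2.33, "bitter": 1.63, "biting": 2.77, "cursed": 2.47,
    "harsh": 2.5, "painful": 2.53, "sour": 2.4, "stinging": 2.37,
    "tart": 2.47, "vinegary": 2.23,
}
```

For example, `select_targets(BITTER_RATINGS, 3)` ranks bitter, vinegary, and acidic highest; excluding vinegary promotes stinging into third place.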
Consonant and less consonant musical phrases were taken from the fourteenth-century motet Se grasse. Based on the model described above, five consonant and five less consonant musical phrases were selected as primes. These are shown in Figures 2 and 3.
Each word was randomly paired with one of the consonant and one of the less consonant musical primes, giving rise to two priming conditions: more consonant and less consonant. These word-prime pairs were counterbalanced across two stimulus lists such that each participant encountered and responded to each target word only once. Each list comprised 40 items: 20 experimental items (10 sweet words and 10 bitter words, half of which were paired with consonant musical primes and half with less consonant musical primes) and 20 filler items (20 non-words randomly paired with the same five consonant and five less consonant musical primes).
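The two-list counterbalancing can be sketched as follows. This is one plausible construction under stated assumptions, not the authors' procedure: the prime identifiers are hypothetical, words are split into halves after a seeded shuffle, and the 20 non-word fillers are left out for brevity.

```python
import random

# Target words as listed in the Appendix.
SWEET = ["charming", "delicious", "delightful", "honeyed", "kind",
         "pretty", "sugary", "sweet", "syrupy", "treacly"]
BITTER = ["acidic", "bitter", "biting", "cursed", "harsh",
          "painful", "sour", "stinging", "tart", "vinegary"]
# Hypothetical identifiers for the five consonant and five less consonant primes.
CONSONANT = ["C1", "C2", "C3", "C4", "C5"]
LESS_CONSONANT = ["L1", "L2", "L3", "L4", "L5"]


def build_lists(sweet, bitter, consonant, less_consonant, seed=0):
    """Return two lists of (word, condition, prime) triples.

    Each word is paired at random with one consonant and one less consonant
    prime; list A uses one pairing and list B the other, so each list holds
    every word exactly once and each word appears in both conditions across lists.
    """
    rng = random.Random(seed)
    list_a, list_b = [], []
    for words in (sweet, bitter):
        shuffled = rng.sample(words, len(words))  # shuffled copy
        half = len(shuffled) // 2
        for i, word in enumerate(shuffled):
            cons_item = (word, "consonant", rng.choice(consonant))
            less_item = (word, "less consonant", rng.choice(less_consonant))
            if i < half:
                list_a.append(cons_item)
                list_b.append(less_item)
            else:
                list_a.append(less_item)
                list_b.append(cons_item)
    return list_a, list_b


LIST_A, LIST_B = build_lists(SWEET, BITTER, CONSONANT, LESS_CONSONANT)
```

Splitting sweet and bitter words separately guarantees that, within each list, five sweet and five bitter words fall in each priming condition.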
Results
The average reaction times (RTs) for the two types of target word (sweet or bitter) in each priming condition (more consonant or less consonant) are shown in Figure 4. To determine whether the primes had produced differences in RTs to the different types of target word, depending on which kind of prime had preceded them, the RT data were modeled with linear mixed-effects regression (LME; Jaeger, 2008) using the lme4 package in R version 3.4.4 (Bates, Castellano, Rabe-Hesketh, & Skrondal, 2014; R Core Team, 2018). The model included RT as the dependent variable and prime (more consonant vs. less consonant) and target word (sweet vs. bitter), as well as their interaction, as predictors. The initial model also included a maximal random effect structure consisting of the effects of prime and target condition, and their interaction, over subjects and items. However, convergence issues forced the reduction of this random effect structure (following a backward best-path procedure; Barr, Levy, Scheepers, & Tily, 2013), which left only the intercepts over subjects and items as random factors in the final model. Neither the individual fixed effects nor their interaction were significant (all ps > .262).
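Written out, the final model described above can be sketched as follows. The coding of the prime and target predictors (e.g., sum or treatment coding) is not reported, so the symbols below are assumptions; in lme4 formula notation the structure corresponds to RT ~ prime * target + (1 | subject) + (1 | item).

```latex
\begin{aligned}
\mathrm{RT}_{si} &= \beta_0 + \beta_1\,\mathrm{Prime}_{si} + \beta_2\,\mathrm{Target}_{i}
  + \beta_3\,\mathrm{Prime}_{si}\,\mathrm{Target}_{i} + u_{s} + w_{i} + \varepsilon_{si},\\
u_{s} &\sim \mathcal{N}(0, \sigma_u^2), \qquad
w_{i} \sim \mathcal{N}(0, \sigma_w^2), \qquad
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2),
\end{aligned}
```

Here s indexes participants and i indexes target words, u_s and w_i are the by-subject and by-item random intercepts retained after simplification, and β3 is the prime × target interaction, the term that would have carried the predicted facilitation for sweet words after consonant primes.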
Average reaction times across the four conditions tested in the implicit associations experiment. No statistically significant differences were detected between conditions. Error bars indicate standard error.
Discussion
It was predicted that participants would recognize sweet target words faster after a consonant musical prime than after a less consonant musical prime. Results did not show this facilitation. In fact, the primes did not seem to have any effect on lexical access.
One reason for this may be that the chosen method was simply not able to capture the semantic relationship between consonance and sweetness. Certainly, the priming paradigm has been found to capture semantic relationships in a number of tasks (e.g., McNamara, 2005; Meyer & Schvaneveldt, 1971), and priming has been found in some musical contexts (e.g., Poulin-Charronnat, Bigand, Madurell, & Peereman, 2005). However, the priming paradigm does not always yield consistent results in semantic tasks (e.g., Caruso, Vohs, Baxter, & Waytz, 2013), and failures are more likely in studies that examine more indirect associations and/or long-lasting stimuli (Meyer, 2014). It could be that the length of the musical primes prevented effective facilitation. Although other studies of musical priming have found effects using longer musical primes (Spreadborough & Anton-Mendez, 2018), in the absence of a systematic review of the impact of stimulus length on priming effects, it is not possible to rule out that this contributed to the null effect. Additionally, this experimental paradigm (using an auditory musical prime to facilitate recognition of a written linguistic target by activating a metaphorical connection between consonance and sweetness) is novel. The complexity of this multimodal association may have precluded a semantic priming effect.
Another potential explanation of these results rests on possible differences in the way that modern-day listeners perceive consonance. Music from the common practice era onwards permits a more diverse range of voice-leading techniques and complex sonorities (based upon the relatively novel concept of the triad after c. 1558) than early music does. In particular, increasing use of dissonance is a feature of more recent musical styles. The “emancipation of dissonance” by the Second Viennese School (c. 1920–1950) (Samson, 1977), the tone clusters of György Ligeti’s style in the 1960s (Levy, 2017), and the literally ear-ringing tintinnabuli effects of Arvo Pärt’s music from the 1990s (Hillier, 1997), for example, move listening experience beyond categories of consonance and dissonance towards aesthetic categories arising from texture, timbre, and orchestral color. Since the musical primes used in this experiment were categorized as consonant and less consonant according to fourteenth-century standards, the primes intended to represent less consonant sonorities may not have been perceived as such by the modern-day listener.
Finally, linguistic changes in semantic associations may have resulted in the purported sweet target words not being perceived as overtly sweet by the modern-day listener. Although these words were pre-tested to ensure they were still considered sweet by modern-day listeners, their usage today may make them less representative of sweetness as a pleasant feeling than they were for the medieval music listener. For example, a word such as “treacly” may have been common in the Middle Ages and may have evoked pleasant sweetness at that time. Nowadays, however, this word, although clearly denoting something sweet, is neither common nor necessarily associated with pleasant sweetness. Given that our aim was to see whether the associations described for medieval music could be observed in today’s listeners, it was necessary to select targets that had actually been used in medieval texts to describe the music under study. However, it would also be useful to test the relationship between consonance and sweetness using modern equivalents of the words found in medieval texts, chosen because they elicit associations of pleasant sweetness in the modern speaker.
In sum, while these results could be interpreted as a discontinuity between the way modern-day and medieval listeners perceive consonance, and there are several ways that this difference could be explained, it is also possible that the priming task employed in this experiment was not able to detect the sought-after effect. Our second experiment was intended to shed more light on the same effect from a more explicit perspective.
Experiment Two – Explicit Associations Experiment
Method
Thirty-nine participants (13 male, 26 female), a subset of the participants from Experiment 1, took part in this study. Participants ranged from 18 to 78 years of age (M = 36.41, SD = 16.74). Participants had no hearing impairment and reported having five or more years of music training. Participation was voluntary, and participants were entered into a draw to win one of five $50 iTunes gift vouchers.
Three types of musical excerpts were tested: dyads, three-note sonorities, and cadences. These excerpts were classified as sweet, neither sweet nor not sweet, and not sweet. This resulted in the following number of stimuli per category:
nine dyads: three classified as sweet; three classified as neither sweet nor not sweet; three classified as not sweet
nine three-note sonorities: three classified as sweet; three classified as neither sweet nor not sweet; three classified as not sweet
six cadences: two classified as sweet; one classified as neither sweet nor not sweet; three classified as not sweet.
The musical excerpts are shown in Figures 5, 6, and 7, respectively. Prior to implementing the experiment, musical stimuli were assigned expected sweetness categories based on the ratings of the first author of this paper, who is an expert medieval musicologist. Ratings were informed by the principles of early music theory discussed in “Predicting Historical Listening Habits – Computer-Assisted Analysis” above. Following these models, consonant dyads (e.g., the unison) were rated sweet, whereas dissonant dyads (e.g., the second) were rated not sweet. Intervals not classified as consonant or dissonant in medieval music theory (namely thirds and sixths) were rated neither sweet nor not sweet (see the discussion around Table 1 above). Sweetness ratings for three-note sonorities were more complex, since multiple intervals sound simultaneously. In these cases, sonority qualities were calculated by the expert medieval musicologist based on the principles of CASMAT described in relation to Tables 2 and 3 above. This sonority quality was then used to rate the musical examples on the sweetness scale according to the degree of dissonance present.
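For dyads, the rule above (perfect consonance is sweet, third or sixth is neither, dissonance is not sweet) can be sketched as a lookup. This is a simplification under explicit assumptions: medieval theory names intervals diatonically, so measuring them in semitones is only an approximation; the fourth is treated here as a dissonance, a hypothetical choice on which theorists differ; and the unison is counted as perfect, as in the text, although some theorists, including Boen, did not count it as a consonance. The function names are illustrative. Three-note sonorities are not covered, since they were rated through CASMAT and expert judgement.

```python
# Simplified interval classes by size in semitones, reduced to within an octave.
# Assumption: a semitone mapping only approximates diatonic interval names,
# and the fourth (5 semitones) is treated here as a dissonance.
PERFECT = {0, 7}          # unison/octave, fifth
IMPERFECT = {3, 4, 8, 9}  # minor/major thirds, minor/major sixths

EXPECTED_SWEETNESS = {
    "perfect": "sweet",
    "imperfect": "neither sweet nor not sweet",
    "dissonant": "not sweet",
}


def interval_quality(semitones: int) -> str:
    """Classify a dyad as perfect, imperfect, or dissonant."""
    reduced = abs(semitones) % 12  # fold compound intervals into one octave
    if reduced in PERFECT:
        return "perfect"
    if reduced in IMPERFECT:
        return "imperfect"
    return "dissonant"  # seconds, fourth, tritone, sevenths


def expected_dyad_sweetness(semitones: int) -> str:
    """Map a dyad to its expected sweetness category."""
    return EXPECTED_SWEETNESS[interval_quality(semitones)]
```

For instance, a twelfth (19 semitones) reduces to a fifth and is expected to be sweet, whereas a major second is expected to be not sweet.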
Dyads rated by participants in the Likert scale. Expected sweetness ratings are given in italics above each dyad.
Three-note sonorities rated by participants in the Likert scale. Expected sweetness ratings are given in italics above each sonority.
Cadences rated by participants in the Likert scale. Expected sweetness ratings are given in italics above each cadence.
Dyads and three-note sonorities had a duration of approximately 5 seconds, and cadences approximately 10 seconds. Musical excerpts were recorded in just intonation at a pitch of A415 using three sound qualities: female voice, male voice, and organ. All stimuli were recorded live at the Early Music Studio, the University of Melbourne, using a Klop Chamber Organ and adult male and female vocalists. In this way, a total of 72 musical excerpts were recorded: (9 dyads + 9 three-note sonorities + 6 cadences) × 3 sound qualities. These were counterbalanced across three lists such that each participant heard each musical excerpt only once, in one of the three sound qualities, and encountered each sound quality an equal number of times.
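A Latin-square rotation is one standard way to meet both constraints (each excerpt heard once per participant, each sound quality heard equally often). The paper does not say how the three lists were built, so the rotation below and the excerpt identifiers are assumptions.

```python
from collections import Counter

SOUND_QUALITIES = ("female", "male", "organ")
# Hypothetical identifiers for the 24 excerpts: 9 dyads, 9 sonorities, 6 cadences.
EXCERPTS = ([f"dyad{i}" for i in range(1, 10)]
            + [f"sonority{i}" for i in range(1, 10)]
            + [f"cadence{i}" for i in range(1, 7)])


def latin_square_lists(excerpts, qualities):
    """List k plays excerpt i in quality (i + k) mod n, so across the n lists
    every excerpt occurs once in every quality."""
    n = len(qualities)
    return [
        [(excerpt, qualities[(i + k) % n]) for i, excerpt in enumerate(excerpts)]
        for k in range(n)
    ]


def quality_counts(stimulus_list):
    """How many excerpts a participant hears in each sound quality."""
    return Counter(quality for _, quality in stimulus_list)


LISTS = latin_square_lists(EXCERPTS, SOUND_QUALITIES)
```

Each list then contains 24 excerpts, 8 per sound quality, and the three lists together cover all 72 recordings. Because the excerpts of each musical type are consecutive, the rotation also balances sound quality within type (3/3/3 for dyads and three-note sonorities, 2/2/2 for cadences).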
The experiment was delivered on a computer via the online platform “Gorilla Experiment Builder” (www.gorilla.sc). Data were collected between October 14, 2018 and December 9, 2018. Participants rated the sweetness of the short musical excerpts using a three-point Likert scale (ratings = sweet, neither sweet nor not sweet, not sweet). Although a three-point Likert scale may be reductive, it was chosen to simplify the task for participants, because considering musical stimuli through the metaphor of sweetness is not common for modern-day listeners.
Participants completed this experiment after taking part in the implicit associations experiment described above so that the explicit focus on the relationship between music and sweetness did not invalidate the search for an implicit link. Participants were asked to listen to each musical excerpt and rate each on the three-point scale described above. Participants were able to listen to each musical excerpt as many times as they wished. The experiment took approximately 5 minutes to complete.
Results
The average sweetness rating compared to the expected sweetness for the different musical types (dyad, three-note sonority, cadence) and different sound qualities (male, female, organ) is shown in Figures 8 and 9.
Average sweetness rating in contrast with the expected sweetness for the different musical types. Error bars indicate standard error.
Average sweetness rating in contrast with the expected sweetness for the different sound qualities. Error bars indicate standard error.
To determine whether expected sweetness reflected perceived sweetness (sweetness ratings) and whether this differed for different musical types (dyad, three-note sonority, or cadence), sweetness ratings were modeled using LME in R (Bates et al., 2014; R Core Team, 2018). For this, the models included the fixed factors (predictors) of expected sweetness (sweet or unsweet) and musical type (dyad, three-note sonority, cadence), with sweetness ratings as the outcome. Since musical type is a categorical variable with three levels, the analysis was carried out in two phases: first the shorter musical types (i.e., dyad and three-note sonority) were contrasted against each other, and then they were contrasted together against the longer musical cadences (Piccinini, 2016). A maximal random effect structure with random intercepts and slopes over subjects and items was always attempted first (Barr et al., 2013). However, as previously, a backward best-path procedure (Barr et al., 2013) was followed to deal with non-convergence issues, and this resulted in the final models including only the random intercepts and the random slope of expected sweetness over subjects.
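The two-phase comparison of musical types can be expressed as a pair of contrast codes. A minimal sketch, assuming Helmert-style (centered, orthogonal) codes; the signs and scaling actually used in the analysis are not reported, which matters when interpreting the sign of estimates such as the one for cadences. The dictionary keys and function names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical Helmert-style contrast codes for the three musical types.
CONTRASTS = {
    "dyad_vs_sonority": {"dyad": F(-1, 2), "sonority": F(1, 2), "cadence": F(0)},
    "cadence_vs_shorter": {"dyad": F(-1, 3), "sonority": F(-1, 3), "cadence": F(2, 3)},
}


def is_centered(contrast):
    """Codes sum to zero across levels."""
    return sum(contrast.values()) == 0


def are_orthogonal(a, b):
    """Dot product of the two code vectors is zero."""
    return sum(a[level] * b[level] for level in a) == 0


def code_row(musical_type):
    """Return the two contrast predictors for one observation."""
    return tuple(c[musical_type] for c in CONTRASTS.values())
```

Because the coded groups in each contrast differ by exactly one unit, each coefficient estimates the corresponding difference in means in a balanced design (for example, cadences minus the average of dyads and three-note sonorities). The same pair of codes, relabelled, would express the female vs. organ and male vs. female/organ comparisons described below.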
There was a significant effect of expected sweetness (based on consonance) on participants’ sweetness ratings in both models (model with the dyad vs. three-note sonority contrast: estimate = 0.52, s.e. = 0.14, p = .001; model with the cadence vs. dyad/three-note sonority contrast: estimate = 0.54, s.e. = 0.12, p < .001): consonant musical stimuli were more likely to be rated sweet regardless of type. There were no overall differences in sweetness ratings between dyads and three-note sonorities (estimate = 0.06, s.e. = 0.13, p = .644), and no interaction between these musical types and expected sweetness (estimate = -0.15, s.e. = 0.33, p = .647). Cadences were considered sweeter overall than dyads/three-note sonorities (estimate = -0.47, s.e. = 0.15, p = .004), but this did not interact with expected sweetness (estimate = 0.41, s.e. = 0.33, p = .222).
Analyses were also conducted to see whether sound quality affected sweetness judgments. All three musical types were pooled together before sweetness ratings were modeled using LME in R (Bates et al., 2014; R Core Team, 2018). The models included the fixed factors (predictors) of expected sweetness (sweet or unsweet) and sound quality (male, female, organ). The analysis of sound quality, a categorical variable with three levels, was carried out in two phases: first the two sound qualities considered sweeter (that is, female and organ) were contrasted against each other, and then they were contrasted together against the male sound quality (Piccinini, 2016). A maximal random effect structure with random intercepts and slopes over subjects and items was always attempted first (Barr et al., 2013). However, non-convergence issues forced a simplification of the random effects, implemented following a backward best-path procedure (Barr et al., 2013), which resulted in the models including the random intercepts and the random slope of expected sweetness over subjects.
The contrast between the female and organ sound qualities did not contribute significantly to the model (estimate = -0.08, s.e. = 0.06, p = .141), whereas the contrast between the male sound quality and the female and organ sound qualities together was significant (estimate = 0.36, s.e. = 0.06, p < .001), indicating that the male sound quality tended to be regarded as less sweet. Expected sweetness also contributed significantly to the model (estimate = 0.39, s.e. = 0.17, p = .028), indicating that the musical excerpts expected to be perceived as sweet, based on the relationship between sweetness and consonance in medieval writings on music (namely, Johannes Boen), were indeed considered sweeter than those not expected to be sweet. The effect of expected sweetness did not interact with sound quality in either of the two models (female vs. organ contrast: estimate = -0.14, s.e. = 0.14, p = .304; female/organ vs. male contrast: estimate = -0.29, s.e. = 0.16, p = .063). In other words, there was no evidence that the effect of expected sweetness on sweetness ratings differed by sound quality.
Discussion
It was hypothesized that, if modern-day listeners exhibited aesthetic responses to consonant sonorities similar to those of medieval listeners, participants would judge consonant sonorities to be sweeter than less consonant ones. Our results support this. Musical excerpts that were expected to be sweet based on medieval music theory (e.g., Carruthers, 2013; Fuller, 2013) were in fact judged to be sweet by the modern-day listener. This finding contradicts earlier doubts expressed in musicological literature discussing the relationship between consonance and sweetness in the writings of Tinctoris (Wegman, 1995, p. 311). By drawing musical excerpts and metaphor directly from medieval theory, this study takes the first steps towards providing empirical evidence that the relationship between sweetness and consonance, which can be modeled from medieval discourses about listening, may also exist for modern-day listeners. Such preliminary results suggest that the metaphor of sweetness may be a useful concept for comparing medieval music and modern-day musical perception. However, the null results of Experiment 1 indicate that more research is needed in this area.
It was also found that the male vocal sound quality was considered less sweet than the female voice and the organ. Register may play a role here, since the male voice was recorded one octave lower than the female voice and organ stimuli (to allow the pitches to sit more comfortably in the tenor/bass range). Given that early music was mostly sung by single-gender choirs (especially in religious contexts), this relationship between sound quality and sweetness may be a fruitful avenue for future research.
General Discussion and Conclusion
This study is one of the first to explore the relationship between metaphorical sweetness and musical consonance in medieval music. It explored whether sweetness is a useful metaphor for comparing the listening experience of modern-day listeners with reconstructed models of consonance-related sweetness in medieval music. In doing so, it examined potential points of comparison between medieval aesthetic discourse and modern-day listeners’ experiences of medieval music. Results from the explicit associations experiment (Experiment 2) support our hypothesis that sweetness may provide a link between historical and modern-day listening practices. Specifically, modern-day listeners tended to rate as sweet those musical sonorities identified as sweet in medieval texts.
An additional finding was that the male voice sound quality was rated as less sweet by participants. While further research is needed to understand why, one explanation is that register may have affected ratings. At any rate, since the type of music investigated in this research was often performed by single-gender choirs, this finding may have important implications for the way medieval music is analyzed (and performed) in future. That is, if sound quality can modulate judgments of consonance, then it may be as important to consider in analyses of medieval music as the sonorities of the intervals.
Contrary to our hypothesis, in the implicit associations experiment (Experiment 1), consonant musical primes had no effect on responses to sweet target words. One reason for this (explored above) could be that the current priming paradigm was not able to capture this semantic association. The fact that a relationship between sweetness and consonance was detected in the explicit associations experiment lends some credence to this explanation. Given the nature of this research project (attempting to observe a highly nuanced semantic and multimodal association), it would not be surprising if classic tasks failed to detect these possibly subtle effects.
Alternatively, it may be that the relationship between consonance and sweetness is never implicit. That is, this association may be established by listeners only at a conscious level, as the result of a “problem-solving” process that detects enough points of contact between consonance and sweetness to make the connection when explicitly asked about it, rather than relying on implicit associations already present at an unconscious level. It would be worth devising a task that circumvents the potential shortcomings of the priming paradigm employed here to see whether an implicit association can be detected.
One limitation of this study was a lack of precise control over the tuning system used in the recorded listening stimuli. Medieval theorists repeatedly show a preference for Pythagorean tuning, which privileges perfect consonances based upon simple frequency ratios, from which arise the more complex or “rough” imperfect consonances and dissonances. The degree of competency or enculturation required to perform vocal works in this tuning is high and often restricted today to specialist professionals. Adapting behavioral and event-related brain potential experimental methods (Schön, Regnault, Ystad, & Besson, 2005) and an auditory stream analysis-based method focused on the interaction of vertical and horizontal factors (Huron, 1991; Huron, 2016) are two promising avenues for further enquiry, in which MIDI-based stimuli prepared in Pythagorean tuning could be used instead of the equal temperament of the original experiments.
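The ratios behind the Pythagorean preference mentioned above can be generated by stacking pure 3:2 fifths and folding the result into one octave. The following Python sketch shows why perfect consonances have simple ratios (3:2, 4:3, 2:1) while the Pythagorean major third is 81:64, about 22 cents wider than the just 5:4. The function names are illustrative, and converting these cents values into MIDI pitch-bend messages for stimulus preparation is not shown; that step would depend on the synthesizer's bend range.

```python
import math
from fractions import Fraction


def pythagorean_ratio(fifths: int) -> Fraction:
    """Stack `fifths` pure 3:2 fifths (negative = downwards) and fold the
    result into a single octave, i.e. into the range [1, 2)."""
    ratio = Fraction(3, 2) ** fifths
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def cents(ratio: Fraction) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))


PYTHAGOREAN_MAJOR_THIRD = pythagorean_ratio(4)   # four fifths up: 81/64
JUST_MAJOR_THIRD = Fraction(5, 4)
SYNTONIC_COMMA = PYTHAGOREAN_MAJOR_THIRD / JUST_MAJOR_THIRD  # 81/80
```

The Pythagorean major third measures about 407.82 cents, roughly 7.8 cents wider than the 400-cent equal-tempered third, whereas the just third measures about 386.31 cents.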
In conclusion, this research has taken a first step towards an archaeology of historical music listening. Its goal was twofold: to investigate whether sweetness is a useful metaphor for comparing the perception of consonance expressed in medieval discourses with that demonstrated by modern music listeners, and to identify the key questions that will lay the foundation for the next stage of research in this area. Sweetness does show potential for comparing the perception of consonance through time, at least in experimental contexts where the stimuli are brief. Further experimental research is required to address outstanding questions about the relationship between explicit and implicit semantic associations between musical and non-musical stimuli in expanded and more ecologically valid contexts. Additionally, future research could further tease out the relationship between the metaphor of sweetness and emotional experience for the modern-day listener. The findings of this study could inform such research, since they indicate that the metaphor of sweetness can be used to investigate musical perception. Finally, the model outlined in this article has the potential for further applications, including the automated algorithmic segmentation and statistical analysis of a larger selection of medieval repertoire. An additional step, the digital signal processing of actual recorded examples of medieval repertoire (as opposed to scores) for audible structures of more or less consonance, would move one step closer to the experience of musical listening and the possible reconstruction of a historical listener.
Author Note
Kristal Spreadborough undertook this research while at the University of New England.
This project was supported in part by a grant from the University of New England’s Research Investment Scheme, Faculty of Humanities, Arts, Social Sciences and Education. Ethics approval has been sought for all experiments in this paper: University of New England, Australia, Approval Numbers HE18-233, HE18-255.
Datasets relating to this research:
Stoessel, J., Antón-Méndez, I. & Spreadborough, K. L. (2018, March 20). Comparing Habits of Medieval and Modern Musical Listening (version 1) [Data files]. DOI: 10.25952/5cd4eb6e26ff5 *Mediated access
Stoessel, J., Antón-Méndez, I. & Spreadborough, K. L. (2018, November). Comparing Medieval and Modern Musical Listening Habits (version 1) [Data files]. DOI: 10.25952/5c9d76b48e931 *Mediated access
Notes
It should also be noted that some theorists, including Boen, do not consider the unison a consonance, since it admits no sweet admixture of high and low sounds according to a fundamental (and ancient) definition of consonance (Frobenius, 1971, p. 65).
The four manuscripts that transmit Se grasse/Cum venerint/Ite missa est are: Ivrea, Biblioteca Capitolare d’Ivrea, MS CXV (115) (“Ivrea Codex”); Paris, Bibliothèque nationale de France, Département des Manuscrits, NAF 23190 (“Trémoille MS”); Tournai, Chapitre de la Cathédrale, MS. 476; Wroclaw (Breslau), Biblioteka Uniwersytecka, Ak 1955/KN 195.
The ENIGMA duration unit has been used in the music typesetting software Finale™ by MakeMusic for several decades now and represents a convenient method for representing written musical durations numerically.
Numerical representations for sonority types are generated using bitwise binary arithmetic, where a perfect consonance equals 1 (binary 01), an imperfect consonance equals 2 (binary 10), and a dissonance equals 4 (binary 100). So, an imperfect-perfect sonority is 3 (binary 01 OR 10 = 11). Where the intervals in a sonority are all of the same quality, there is no difference between the numerical representation of the sonority and that of its intervals, because bitwise OR is idempotent, e.g., binary 10 OR 10 = 10. The computed numerical type used in this article should not be confused with that found in Fuller (1986), in which a perfect-imperfect type is labelled Type 2 and doubly imperfect Type 3.
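The combination rule in this note can be illustrated with a short Python sketch. The function names are hypothetical and this is not the CASMAT code; it shows only that OR-ing the interval codes yields the sonority type, including the case where identical qualities collapse to a single code.

```python
from functools import reduce
from operator import or_

# Interval-quality codes as defined in this note.
PERFECT, IMPERFECT, DISSONANT = 0b001, 0b010, 0b100


def sonority_type(interval_codes):
    """Combine the codes of a sonority's intervals with bitwise OR."""
    return reduce(or_, interval_codes, 0)


def describe(code):
    """Spell out which interval qualities a sonority-type code contains."""
    names = [(PERFECT, "perfect"), (IMPERFECT, "imperfect"), (DISSONANT, "dissonant")]
    return "-".join(name for bit, name in names if code & bit)
```

An imperfect-perfect sonority gives `sonority_type([IMPERFECT, PERFECT]) == 3`, while a doubly imperfect one stays at 2.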
References
Appendix
Target Words and Non-Words Used in the Implicit Associations Experiment
Real Word - Bitter | Mean rating* | Matched Non-Word |
acidic | 2.33 | alabic |
bitter | 1.63 | banper |
biting | 2.77 | boying |
cursed | 2.47 | crined |
harsh | 2.5 | hetch |
painful | 2.53 | prucial |
sour | 2.4 | seor |
stinging | 2.37 | shaffing |
tart | 2.47 | thit |
vinegary | 2.23 | velocaty |
Real Word - Sweet | Mean rating* | Matched Non-Word |
charming | 2.47 | capeming |
delicious | 2.63 | deauteous |
delightful | 2.17 | drulptural |
honeyed | 2.03 | hubbled |
kind | 2.63 | kend |
pretty | 2.63 | plimey |
sugary | 1.83 | salcony |
sweet | 1.93 | slent |
syrupy | 1.77 | snisky |
treacly | 2.17 | tranchy |
*Words were rated on a 7-point Likert scale. A score of 1 indicates that the word is very sweet or very bitter, and a score of 7 indicates that the word is not sweet or not bitter.