Consonance and dissonance are basic phenomena in the perception of chords that can be discriminated very early in sensory processing. Musical expertise has been shown to facilitate neural processing of various musical stimuli, but it is unclear whether this applies to detecting consonance and dissonance. Our study aimed to determine if sensitivity to increasing levels of dissonance differs between musicians and nonmusicians, using a combination of neural (electroencephalographic mismatch negativity, MMN) and behavioral measurements (conscious discrimination). Furthermore, we wanted to see if focusing attention on the sounds modulated the neural processing. We used chords composed of either highly consonant or highly dissonant intervals and further manipulated the degree of dissonance to create two levels of dissonant chords. Both groups discriminated dissonant chords from consonant ones neurally and behaviorally. The magnitude of the MMN differed only marginally between the more dissonant and the less dissonant chords. The musicians outperformed the nonmusicians in the behavioral task. As the dissonant chords elicited MMN responses for both groups, sensory dissonance seems to be discriminated at an early sensory level, irrespective of musical expertise, and the facilitating effects of musicianship for this discrimination may arise in later stages of auditory processing, appearing only in the behavioral auditory task.
Sensory consonance and dissonance are essential building blocks of Western music. Studies have shown that discrimination of different chord types occurs at the early sensory level in adults (Brattico et al., 2009), schoolchildren (Virtala, Huotilainen, Putkinen, Makkonen, & Tervaniemi, 2012), and newborn babies (Virtala, Huotilainen, Partanen, Fellman, & Tervaniemi, 2013). It is less clear how the auditory system processes different degrees of dissonance, and how the discrimination of these degrees is manifested neurally and behaviorally. Musicians have been shown to outperform nonmusicians in behaviorally discriminating chords and dissonance from consonance (see, e.g., Kung et al., 2014; Sares, Foster, Allen, & Hyde, 2018; Virtala, Huotilainen, Partanen, & Tervaniemi, 2014), but it is not altogether clear at what stage in the auditory processing stream musical expertise has an effect on consonance/dissonance discrimination.
Consonance and Dissonance
Harmony, an essential element of Western music, relies on the construction of individual and concurrent tones, intervals, and chords, and their successions (e.g., chord progressions). Western tonal music is characterized by functional harmony in which the tones, intervals, and chords have certain roles in creating music (Kopp, 1995; Rehding, 2003). The functionality of harmony is based to a large extent on how the harmonic structures are perceived as consonant versus dissonant (e.g., Rehding, 2019, p. 440). While the context of tonal, functional harmony has been central to, and prevalent in, the scientific study of consonance and dissonance, the phenomenon is not limited to Western tonality. It extends to Western musical practices before and after the so-called Western common-practice tonality, and can be extended to the study of non-Western musical traditions, non-musical sounds, and non-human subjects as well (Dibben, 1999; McDermott, Schultz, Undurraga, & Godoy, 2016; Tenney, 1988; Zarlino, 1558/1982; versus Bowling, Hoeschele, Gill, & Fitch, 2017; Parncutt & Hair, 2011; Virtala & Tervaniemi, 2017).
Despite their long history, the notions of consonance and dissonance have evaded clear consensus and exact definitions, proving challenging to study empirically (Cazden, 1980; Rehding, 2019, p. 437). There are objective acoustical properties (such as simple integer ratios, or their limited approximations, of fundamental frequencies or other partials of tones) and physiological features (such as those of the critical bandwidths of the basilar membrane) that affect judgments of consonance and dissonance (Bowling & Purves, 2015; Helmholtz, 1877/1954). However, these judgments seem to also be affected by the familiarity, context, cultural background, and musical expertise of the subject (Lahdelma & Eerola, 2016; Parncutt & Hair, 2011; Popescu et al., 2019; Rehding, 2019; Virtala & Tervaniemi, 2017). Furthermore, behavioral research on consonance/dissonance judgments is usually based on studying subjective, experiential, and preference-like ratings using guiding terms such as “pleasantness,” “stability,” “smoothness,” “compatibility” or “relaxation” versus “unpleasantness,” “instability,” “roughness,” “incompatibility” or “tension.” This practice is well motivated since the terms and concepts of “consonance” and “dissonance” might not be well understood by nonmusicians or even explicitly by musicians, and the guiding terms generally correlate with the music-theoretical descriptions and estimations of consonance/dissonance (see, e.g., Kuusi, 2001; Lahdelma & Eerola, 2016; Parncutt & Hair, 2011).
To bridge the gap between physical, physiological, psychological, and cultural aspects of consonance/dissonance studies, Parncutt and Hair (2011) proposed a holistic “conceptual structure for Western consonance/dissonance.” According to them, the consonance/dissonance of individual harmonic structures (as opposed to successive ones) has “two natural components, smoothness and harmonicity (fusion)” and “the cultural component,” that is, familiarity with music one is exposed to, as well as speech and sounds of the environment (see also Harrison & Pearce, 2020; Parncutt & Hair, 2018). Smoothness, roughness, and harmonicity are considered “natural components” because of “their apparent perceptual universality”: “they influence the everyday auditory experience of every hearing human” and other animals (Parncutt & Hair, 2011; see also Rehding, 2019). Further, harmonicity promotes consonance while roughness promotes dissonance in individual harmonic structures. Although they describe the cultural component as “simply familiarity with the music to which an individual has been exposed as well as speech and environmental sounds,” there appears to be more to it, e.g., exposure and learning, and possibly emotions as well (Parncutt & Hair, 2011; see also Arthurs, Beeston, & Timmers, 2018; Lahdelma & Eerola, 2016). However, it is not clear to what extent the two natural components versus the cultural ones affect the consonance/dissonance judgments. Furthermore, we do not have clear understanding of the roles of roughness versus harmonicity, nor of the roles of cultural components (e.g., those of familiarity or expertise versus preference) on the judgment of consonance and dissonance.
Here, consonance refers to a combination of tones (an interval or chord) that is perceived as concordant or in mutual agreement (con sonare, see, e.g., Parncutt, 1989, p. 56). Dissonance, in turn, refers to a combination of tones that is perceived as discordant, in mutual disagreement. We are carefully not defining dissonance simply as roughness (and consonance as the absence of “disturbance” or smoothness; cf. Helmholtz, 1877/1954, p. 194) or consonance as tonal fusion (and dissonance as absence of “melting”; cf. Schneider, 1997; Stumpf, 1890), nor either consonance or dissonance in terms prescribed in the practical traditions of music theory (e.g., Aldwell, Schachter, & Cadwallader, 2011, pp. 26–29; Cassiodorus, 550–562/1965, p. 89). We are also not solely defining these concepts in terms of simplicity or complexity of the (approximate) integer ratios of the fundamental frequencies of the tones (e.g., Galilei, 1638/1914, pp. 103–108; Nicomachus, as cited in Weiss & Taruskin, 1984; Parncutt & Hair, 2018). Finally, we avoid defining either term only based on subjective assessments such as pleasantness, tension, or other preferences, albeit some kind of judgment in consonance/dissonance is unavoidable. So, we wish to maintain conceptual possibility for the mutual interaction of two natural components (roughness and fusion) and the subjective and cultural components (such as familiarity, expertise, or preference), which all affect the perception and judgment of consonance and dissonance. We shall use the terms sensory consonance and sensory dissonance to refer to the natural components, and, when necessary, musical consonance and musical dissonance to refer to consonance/dissonance in the musical context, which is strongly linked to music cultures and affected by the individual’s musical expertise and musical preferences (cf. McDermott et al., 2016; Terhardt, 1984; but see Bidelman, 2013; Bowling et al., 2017; Harrison & Pearce, 2020; Kuusi, 2001; Lahdelma & Eerola, 2016; Parncutt & Hair, 2011; Virtala & Tervaniemi, 2017). Musical consonance/dissonance is inclusive of sensory consonance/dissonance, and we acknowledge that no strict demarcation might be necessary (or even possible) between sensory and musical consonance/dissonance, nor between natural and cultural components of perception and judgment of consonance/dissonance. Furthermore, our empirical focus is on consonance/dissonance of individual (isolated, vertical) harmonic structures, not on consonance/dissonance of their (horizontal) successions (e.g., chord progressions; cf. Lahdelma & Eerola, 2016, pp. 1–2).
The natural components of consonance/dissonance perception and judgment appear to be relatively universal (see Parncutt & Hair, 2011, p. 159). Newborn babies (Perani et al., 2010; Virtala et al., 2013) and even some non-human species (Izumi, 2000; Watanabe, Uozumi, & Tanaka, 2005) are able to discriminate consonance from dissonance. However, the cultural and subjective components rely on individual, contextual, historical, and cultural variability. Exposure to musical practices and the resulting musical enculturation leads to facilitated processing of music of one’s own culture, even without explicit training (Hannon & Trainor, 2007; Virtala & Tervaniemi, 2017). The process of enculturation is modified, e.g., by motivation (Virtala & Tervaniemi, 2017), and the perception and judgment of consonance/dissonance is connected to familiarity, emotions, preferences, learning, musical expertise, and knowledge of tonal hierarchy (Arthurs et al., 2018; Johnson-Laird, Kang, & Leong, 2012; Kuusi, 2015; Lahdelma & Eerola, 2016; Omigie, Dellacherie, & Samson, 2017; Parncutt & Hair, 2011; Popescu et al., 2019; Rehding, 2019; Virtala & Tervaniemi, 2017). In particular, musicians have been shown to outperform nonmusicians in behaviorally discriminating chords and dissonance from consonance (see, e.g., Kung et al., 2014; Sares et al., 2018; Virtala et al., 2014).
Brain Processing of Musical Stimuli
Mismatch negativity (MMN) is an event-related brain response measured with electroencephalography (EEG). A typical way to measure MMN is an oddball paradigm (Kujala, Tervaniemi, & Schröger, 2007) where a participant listens to a sound stream that consists of a frequent sound (typically 70 to 90 percent of all sounds) and an infrequent sound or sounds (typically 10 to 15 percent of all sounds, each). The infrequent sounds, or deviants, differ from the frequent sounds, or standards, in some acoustic feature like frequency, duration, or timbre. The MMN is defined as the difference in the responses between the standard and the deviant stimuli between 100–250 ms after the deviant onset (Näätänen, 1992; Näätänen, Gaillard, & Mäntysalo, 1978). The MMN originates from two sources: the bilateral supratemporal planes of the auditory cortices and the prefrontal cortex (Näätänen & Escera, 2000; Rinne, Alho, Ilmoniemi, Virtanen, & Näätänen, 2000). The dominating theory states that the MMN is elicited when an unexpected sound occurs in a stream of expected sounds: the prediction of the oncoming sound is violated and an MMN is elicited (Näätänen, 1992; Winkler, Denham, & Nelken, 2009; for a review, see e.g., Näätänen, Paavilainen, Rinne, & Alho, 2007).
The MMN latency and amplitude are closely related to perceptual accuracy in discriminating standard and deviant stimuli: the shorter the MMN latency and/or the larger the MMN amplitude, the better the perceptual accuracy for pitch (Novitski, Tervaniemi, Huotilainen, & Näätänen, 2004; Tiitinen, May, Reinikainen, & Näätänen, 1994) or duration (Amenedo & Escera, 2000). Based on this close correspondence, the MMN has been called an “objective” measure of perceptual accuracy that is elicited even when the participants are not able to attend to the stimulus (e.g., patients with deficits in consciousness or communication skills; newborn infants and toddlers). Moreover, the MMN is useful when investigating nonclinical populations as well since its presence is not affected by modulation of participants’ motivation or attention. These are important issues particularly when investigating clinical populations or when comparing groups of participants who might differ in their task-specific motivation (e.g., high-level experts compared with laypeople).
Some studies have suggested that the MMN amplitude is affected by attention. Typically, the MMN is measured in a passive condition where the subject is not attending to the sound stream. However, the response has been shown to be larger in active compared to passive conditions for frequency deviations (Trejo, Ryan-Jones, & Kramer, 1995), intensity deviations in dichotic conditions (Woldorff, Hackley, & Hillyard, 1991), and violations in sound patterns (Alain & Woods, 1997). Yet the MMN response in attended conditions is sometimes difficult to differentiate from overlapping responses, such as the N2b. This response occurs at the same latency but differs from the MMN in, e.g., scalp topography or sensitivity to attentional manipulation (Novak, Ritter, & Vaughan, 1992; Näätänen, Simpson, & Loveless, 1982), and in some studies where MMN and N2b have been carefully dissociated, there seems to be no sign of an effect of attention on the MMN amplitude (Gomes et al., 2000; Sussman et al., 2004; Sussman, Winkler, Huotilainen, Ritter, & Näätänen, 2002). According to Sussman (2007, 2017) the primary factor that modulates MMN is not attention to deviant stimuli, but the auditory context where the stimuli are presented; for example, some studies have reported that if attention is focused strongly on another simultaneous sound stream, the MMN responses are attenuated (Sussman, Winkler, & Wang, 2003; Woldorff et al., 1991). Alternatively, Näätänen et al. (2007) suggested in their review that attentional effects on the MMN might depend on the magnitude or the quality of the stimulus change: smaller changes may be more affected by attention than large changes. Overall, the role of attention in the MMN has not yet been unequivocally settled.
The MMN has been used as a tool for inspecting change detection in auditory perception in numerous studies, and its response to deviations in a sound stream appears to be a very robust phenomenon. For example, the MMN has been found to be elicited by changes in pitch (Näätänen, Pakarinen, Rinne, & Takegata, 2004; Putkinen, Tervaniemi, Saarikivi, de Vent, & Huotilainen, 2014; Tervaniemi, Huotilainen, & Brattico, 2014; Vuust et al., 2011), sound duration (Näätänen et al., 2004; Putkinen et al., 2014), intensity (Näätänen et al., 2004; Putkinen et al., 2014; Vuust et al., 2011), timbre (Christmann, Lachmann, & Berti, 2014; Putkinen et al., 2014; Vuust et al., 2011), and rhythm (Putkinen et al., 2014; Vuust et al., 2011), as well as chord structure (Virtala et al., 2014). More complex deviations, such as change in the musical rules (like the change in the direction of the sound pairs, or adding an out-of-key tone to the melody) also elicit an MMN response (see, Näätänen et al., 2007; Paavilainen, Simola, Jaramillo, Näätänen, & Winkler, 2001). Particularly, the MMN response (or its magnetic counterpart MMNm) has been detected for discriminating consonant from dissonant intervals (Crespo-Bojorque, Monte-Ordoño, & Toro, 2018; Itoh, Suwazono, & Nakada, 2010) and chords (Brattico et al., 2009; Virtala et al., 2011). Although some of these studies have compared the ERPs between musicians and nonmusicians (Brattico et al., 2009; Crespo-Bojorque et al., 2018; Itoh et al., 2010), detection of the degree of dissonance in a chord setting has not been investigated. Moreover, in all studies of consonance/dissonance to date, tonality and consonance/dissonance have been confounded since the chords were formed from traditional Western major/minor modes, and not from more consonant intervals, such as fourths and fifths. 
Thus, there was always a difference in the familiarity of the consonant chords (major and minor triads) and dissonant chords (e.g., triads consisting of a minor second and a diminished fifth such as C4–C♯4–G4).
Musical Expertise in Sound Processing
Musicians typically outperform their nonmusician peers in behavioral sound change detection (Kung et al., 2014; Sares et al., 2018; Tervaniemi, Just, Koelsch, Widmann, & Schröger, 2005; Virtala et al., 2014). Moreover, several electrophysiological studies have reported enhanced auditory discrimination skills in musicians (Brattico et al., 2009; Kung et al., 2014; Pantev et al., 1998; Vuust et al., 2005), while others have reported MMN responses for musically relevant stimulus change occurring only in musicians (Crespo-Bojorque et al., 2018; Virtala et al., 2014; see Putkinen & Tervaniemi, 2018) or even in musicians representing specific genre (Tervaniemi, Rytkönen, Schröger, Ilmoniemi, & Näätänen, 2001). Virtala and colleagues (2014) found that only musicians showed an MMN response to minor and inverted major chords in the context of major chords, while the study by Tervaniemi et al. (2001) suggested that only the musicians with long-term experience in playing from memory showed MMN responses to deviating melodic contours, as opposed to nonmusicians, and musicians relying on the musical score while performing. This benefit of musicianship in neural discrimination seems to be more pronounced for complex stimuli, such as small pitch deviations (Koelsch, Schröger, & Tervaniemi, 1999; Marques, Moreno, Castro, & Besson, 2007; Tervaniemi et al., 2005) or musically embedded pitch changes, compared to, e.g., changes in sinusoidal sounds (Brattico, Näätänen, & Tervaniemi, 2001; Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004). Indeed, the amplitude of the MMN response has been shown to be associated with differences in performance in behavioral paradigms such as, e.g., detecting deviating sound patterns or chord structures among the frequent standard stimuli (Tervaniemi, Ilvonen, Karma, Alho, & Näätänen, 1997; Tervaniemi et al., 2001; Virtala et al., 2014; see also, Näätänen et al., 2007).
Consonant vs. dissonant chords and intervals are detected by both musicians and nonmusicians, although musicianship seems to enhance this discrimination both behaviorally (Arthurs et al., 2018; Kung et al., 2014) and neurally (Brattico et al., 2009; Crespo-Bojorque et al., 2018; Kung et al., 2014). Recently, Arthurs et al. (2018) showed that when listening to twelve different chord types, listeners’ subjective ratings of consonance were associated with their level of experience in Western tonal music. Regarding brain measures, Crespo-Bojorque and colleagues (2018) did not find a difference in MMN amplitude between musicians and nonmusicians in neural discrimination of dissonant intervals in a consonant context, but only musicians showed an MMN response in a more atypical setting, namely, when consonant intervals were played in a dissonant context.
Previous studies have investigated the detection of dissonance vs. consonance with intervals (e.g., Arthurs et al., 2018; Crespo-Bojorque et al., 2018; Itoh et al., 2010; Kung et al., 2014) or with chords within a major/minor context (e.g., Brattico et al., 2009; Virtala et al., 2011). Furthermore, several ERP studies have deployed consonant and dissonant intervals and chords in a dichotomic way, not considering this musical feature as a continuum from very dissonant to highly consonant sound combinations (e.g., Brattico et al., 2009; Kung et al., 2014; Virtala et al., 2011). We wanted to investigate the ability to differentiate varying degrees of dissonance when compared to consonant chords. In order to do this, we employed two different degrees of dissonance in the experimental setting. We used four-tone chords comprised of either highly consonant intervals or highly dissonant intervals which do not belong to major or minor modes. The two different degrees of dissonance were created by adding more or less dissonant intervals to the chords. To enhance the ecological validity of the study, we used piano tones, as opposed to sine tones, as stimuli.
We hypothesized that: 1) the musicians would show larger MMNs and more accurate behavioral detection of dissonance vs. consonance than nonmusicians; 2) increasing the degree of dissonance would be seen in MMN responses and behavioral discrimination of chords; and 3) attending to the sounds would enhance neural discrimination of consonance vs. dissonance compared to the passive condition.
Fifteen nonmusicians (age 20–37 years, mean = 25.9, SD = 5.2, median = 24.0; 5 male) and sixteen musicians (age 20–32 years, mean = 25.7, SD = 3.6, median = 25; 6 male) participated in the study. All musicians had either graduated from (2) or were studying in (14) a university-level music program. They had received formal instrument teaching for at least 10 years (mean = 14.2, SD = 2.5, median = 14.5, min = 10, max = 18), and were able to play 2 to 6 different instruments. Only classical musicians were included in the musician group since the rules and conventions concerning dissonance are somewhat more distinct in classical music compared to, e.g., jazz music. Most of the nonmusicians (10) had not received any instrument lessons. Five nonmusicians reported having participated in instrument lessons (1, 4, 5, 5, and 6 years) during childhood or adolescence. In order to avoid too much overlap in musical expertise between the groups, one participant reporting nine years of attendance in instrument lessons was excluded from the 16 individuals originally in the nonmusician group. None of the nonmusicians regularly played an instrument at the time of the experiments.
The participants were Finnish speaking, right-handed, and reported having normal hearing and normal or corrected-to-normal vision. The education level did not differ between the groups. One further participant from the nonmusician group was excluded from the analyses of the behavioral test because they did not complete the task.
The participants signed an informed consent form and were told that they had the right to stop the experiment whenever they wanted. For participation, they were given vouchers (5€/30 minutes) to be used for cultural interests. The experiments were approved by the University of Helsinki Ethical Review Board in the Humanities and Social and Behavioural Sciences, in Helsinki, Finland, and were carried out in accordance with the committee’s guidelines and regulations, as well as with those of the Helsinki Declaration.
The chords of the experiment were created with Steinway piano sounds from the McGill University DVD sound library (Opolko & Wapnick, 2006). Standard stimuli were consonant (CON) chords (dyads with octave doubles), and two types of dissonant chords (triads with an octave double of the root and tetrads) acted as deviant stimuli (D1 and D2). The chords were constructed with four tones so that: 1) they would pare down the effects of cultural context and conventions by excluding major or minor thirds that are typically included in chords of Western tonal music; 2) the ambitus, that is, the span between the outer voices (top and bottom tones of chords), does not change as the chord type changes, in order to minimize any perceptual issues relating to melodic motions and voice leading; 3) the changes between the three chord structures are produced by only a half-step change in one tone, again to minimize larger melodic movement; 4) the consonant chord consists only of the most consonant intervals (octave, perfect fifth, and perfect fourth; see, e.g., Bowling & Purves, 2015); 5) the dissonant chords contain the clearly dissonant intervals of tritones and minor ninths (= minor second’s octave double), without emphasis on roughness as a factor contributing to dissonance (avoiding intervals of minor and major second); and 6) the more dissonant chord (D2) contains one more dissonant interval (a minor ninth) and one fewer consonant interval (an octave) than the less dissonant chord (D1).
Thus, and taking octave equivalents into account, the consonant chord (CON) consisted of tones C3-G3-C4-G4, which contains 2 octaves (C3-C4 and G3-G4), 3 perfect fifths (C3-G3, C4-G4 and C3-G4), and 1 perfect fourth (G3-C4). The first dissonant chord (D1) consisted of C3-F♯3-C4-G4, containing 1 perfect octave (C3-C4) and 2 perfect fifths (C4-G4 and C3-G4) but also 2 tritones (C3-F♯3 and F♯3-C4) and 1 minor ninth (F♯3-G4). The second dissonant chord (D2) consisted of C3-F♯3-C♯4-G4, containing no perfect octaves but 2 perfect fifths (C3-G4 and F♯3-C♯4) as consonant intervals, and 2 tritones (C3-F♯3 and C♯4-G4) and 2 minor ninths (C3-C♯4 and F♯3-G4) as dissonant intervals. In D1, the fundamental frequencies of tones C3-C4-G4 are the first, second and third partial of the harmonic series starting from the fundamental frequency of C3, while the F♯3 stands apart from the series. In D2, the fundamentals of C3 and G4 are again the first and third partial of the same harmonic series on C3, while the fundamentals of F♯3 and C♯4 are the second and third partial of the series starting on the fundamental of F♯2.
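The interval content of the three chords can be checked mechanically. The following minimal Python sketch is illustrative only and was not part of the study: it encodes each chord as MIDI note numbers (an assumed convention with C4 = 60), reduces every pairwise interval by octave equivalence, and tallies the result. The names CHORDS, NAMES, and interval_content are hypothetical.

```python
from collections import Counter
from itertools import combinations

# MIDI note numbers; C4 = 60 is an assumed convention, not taken from the text.
CHORDS = {
    "CON": [48, 55, 60, 67],  # C3-G3-C4-G4
    "D1": [48, 54, 60, 67],   # C3-F#3-C4-G4
    "D2": [48, 54, 61, 67],   # C3-F#3-C#4-G4
}

# Interval size in semitones modulo 12 (octave equivalence). In these chords
# every interval of class 1 spans 13 semitones, hence "minor ninth".
NAMES = {
    0: "octave",
    1: "minor ninth",
    5: "perfect fourth",
    6: "tritone",
    7: "perfect fifth",
}


def interval_content(notes):
    """Count the interval types formed by all pairs of tones in a chord."""
    counts = Counter()
    for low, high in combinations(sorted(notes), 2):
        counts[NAMES[(high - low) % 12]] += 1
    return dict(counts)


for name, notes in CHORDS.items():
    print(name, interval_content(notes))
```

Under these assumptions, D2 differs from D1 by one additional minor ninth and one missing octave, while the number of tritones and perfect fifths stays the same.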
Equal-tempered, natural-like piano sounds were used for all chords. The chords were further transposed to six different pitch levels, with the lowest tones being C3, D3, E3, F♯3, A♭3, and B♭3, so that the fundamental frequencies of all tones varied between C3 and F5 (130.8–698.5 Hz). All the chords are depicted and listed in Figure 1.
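The reported frequency range follows from equal temperament. As a minimal sketch, assuming a reference tuning of A4 = 440 Hz (not stated in the text) and hypothetical names throughout, the lowest and highest fundamentals across the six transpositions can be computed as follows. The top voice of every chord lies a perfect twelfth (19 semitones) above the lowest tone.

```python
def midi_to_hz(note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note; A4 = 440 Hz is an assumption."""
    return a4_hz * 2 ** ((note - 69) / 12)


# Lowest tone of each transposition as a MIDI note number (C4 = 60).
ROOTS = {"C3": 48, "D3": 50, "E3": 52, "F#3": 54, "Ab3": 56, "Bb3": 58}
TOP_OFFSET = 19  # semitones from the lowest tone to the top voice

lowest_hz = midi_to_hz(min(ROOTS.values()))                # C3
highest_hz = midi_to_hz(max(ROOTS.values()) + TOP_OFFSET)  # F5

print(round(lowest_hz, 1), round(highest_hz, 1))
```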
The same stimuli were used in both the active and passive conditions in the oddball paradigm (Figure 2) and in the behavioral test. The intensity of the stimuli was ∼60 dB SPL as measured with a sound level meter (Extech HD600, Extech Instruments, Boston, MA). Both EEG conditions were divided into four blocks, each including 360 stimuli. In the EEG paradigm, 80% of the stimuli were standards and each of the two deviant stimuli occurred in 10% of the stimuli. The stimuli were semi-randomized, so that two deviating stimuli never occurred consecutively, and two to six standard stimuli always occurred in succession before the next deviant. In addition, the transposition changed after each chord. The duration of the chord was 650 ms and the interstimulus interval was 350 ms.
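The sequencing constraints described above can be illustrated with a short sketch. It is not the procedure used in the study (the stimuli were delivered with Presentation software, and the actual randomization script is not described); it only shows one way to satisfy the stated constraints. The function make_block, its parameters, and the assumption that the interstimulus interval runs from chord offset to the next onset are all hypothetical.

```python
import random

LEVELS = ["C3", "D3", "E3", "F#3", "Ab3", "Bb3"]  # lowest tone of each transposition


def make_block(n_stimuli=360, p_each_deviant=0.10, min_run=2, max_run=6, seed=0):
    """Return a list of (chord_type, transposition) tuples for one block."""
    rng = random.Random(seed)
    n_each = round(n_stimuli * p_each_deviant)  # 36 of each deviant type
    deviants = ["D1"] * n_each + ["D2"] * n_each
    rng.shuffle(deviants)
    n_standards = n_stimuli - len(deviants)     # 288 standards
    # Start with equal runs of standards before every deviant (4 each here),
    # then move single standards between runs while staying within bounds.
    assert n_standards % len(deviants) == 0, "sketch assumes an even split"
    runs = [n_standards // len(deviants)] * len(deviants)
    for _ in range(1000):
        i, j = rng.randrange(len(runs)), rng.randrange(len(runs))
        if i != j and runs[i] < max_run and runs[j] > min_run:
            runs[i] += 1
            runs[j] -= 1
    chords = []
    for run, deviant in zip(runs, deviants):
        chords += ["CON"] * run + [deviant]
    # Draw a transposition for every chord, never repeating the previous one.
    block, previous = [], None
    for chord in chords:
        level = rng.choice([lv for lv in LEVELS if lv != previous])
        block.append((chord, level))
        previous = level
    return block


block = make_block()
# Chord duration plus interstimulus interval, in seconds per block.
block_seconds = len(block) * (650 + 350) / 1000
```

With a 650 ms chord followed by a 350 ms interval, a block of 360 stimuli lasts 360 s, that is, 6 minutes.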
The stimuli in the active condition (Figure 2B) were the same as in the passive condition (Figure 2A), with the exception that the sound stream also included 16 violin tones (C4, 261.6 Hz). The participants were instructed to push a button when hearing a violin tone. The stimulus following each violin tone was excluded from the analyses to avoid artefacts caused by muscle activity.
In a separate behavioral task, the participants were asked to discriminate between the chord types: 144 chord pairs were presented with an interstimulus interval of 800 ms between the chords of one chord pair. The participants were given 3700 ms to answer which of the two chords was more dissonant, after which the next chord pair was presented automatically. The time given for answering was based on piloting the time needed to answer the task without prolonging the test unnecessarily. The chord types in one pair were always different. In half of the chord pairs (72), two dissonant chords of different types (D1 vs. D2) were presented, and in the other half, the consonant chord (CON) was compared to either the less dissonant chord (CON vs. D1) or the more dissonant chord (CON vs. D2). The order of the chords within chord pairs was counterbalanced and the transpositions and the order of the chord pairs were randomized.
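The trial structure of the behavioral task can be sketched as follows. This is an illustration under stated assumptions, not the study's actual script: the text does not say how the 72 consonant-versus-dissonant pairs were divided between CON vs. D1 and CON vs. D2, so an even 36/36 split is assumed here, and all names (make_trials, correct_button, RANK) are hypothetical. The button mapping follows the task description (left for the first chord, right for the second).

```python
import random

RANK = {"CON": 0, "D1": 1, "D2": 2}  # higher value = more dissonant


def make_trials(n_pairs=144, seed=0):
    """Build a shuffled, order-counterbalanced list of chord pairs."""
    rng = random.Random(seed)
    # (pair, number of trials); the 36/36 split of CON pairs is an assumption.
    design = [
        (("D1", "D2"), n_pairs // 2),
        (("CON", "D1"), n_pairs // 4),
        (("CON", "D2"), n_pairs // 4),
    ]
    trials = []
    for (a, b), n in design:
        # Half of the trials in each presentation order.
        trials += [(a, b)] * (n // 2) + [(b, a)] * (n // 2)
    rng.shuffle(trials)
    return trials


def correct_button(first, second):
    """Left if the first chord is the more dissonant one, right otherwise."""
    return "left" if RANK[first] > RANK[second] else "right"


trials = make_trials()
# Two 650 ms chords, the 800 ms gap, and the full 3700 ms response window.
total_minutes = len(trials) * (650 + 800 + 650 + 3700) / 60000
```

If every trial uses the full response window, the 144 pairs take about 13.9 minutes.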
The data were collected in the laboratory of the Department of Psychology and Logopedics at the University of Helsinki. During the experiment, the participants sat in an electrically shielded room. They were instructed to sit with their eyes open and avoid unnecessary movement. EEG measurement started with the passive condition where the participants were instructed to ignore the sounds coming from the headphones and concentrate on a muted self-chosen movie with subtitles. In the subsequent active condition, the participants looked at the picture on the screen and were instructed to concentrate on the sounds and push the button whenever they heard the violin tone. The stimuli were presented via Sony Professional MDR-7506 headphones. Each of the eight blocks lasted for 6 minutes and the participants were given the opportunity to have a break between the blocks.
The behavioral task was presented after the active EEG condition. The participants were presented with two consecutive chords and were asked to decide which of the chords was more dissonant and push the button accordingly (left-hand button for the first chord, right-hand button for the second chord). The instructions were shown on the screen during the entire test. Before the test, the participants rehearsed the task: the dissonance and consonance of the chords were explained to the nonmusicians by using Finnish words for smoothness/accordance and roughness/discordance. All the participants were then played two examples of all three chord types. After this they were further played three chord pairs (consonant-more dissonant, less dissonant-more dissonant and more dissonant-less dissonant), with the experimenter stating after each pair which of the two chords was more dissonant. The participants were then allowed to listen to the chord pairs as many times as they wished to make sure that they understood the concepts of dissonance and consonance. The experiment lasted for 14 minutes and was divided into two blocks, allowing the subject to have a break in between the blocks. The whole measurement (EEG and behavioral task) took 2.5 to 3 hours in total.
Data Recording and Processing
The EEG was recorded with 64 Ag-AgCl scalp electrodes according to the international 10–20 system (Electro-Cap International Inc., Eaton, OH, United States) and the data were registered with Biosemi 7.07 (BioSemi B. V., Amsterdam, Netherlands) with the sample rate of 512 Hz. In addition to 64 scalp electrodes, four additional electrodes were placed: one on the nose as a reference electrode, one below the left eye and one behind each mastoid bone. The stimuli were presented with Presentation 17.2 software (Neurobehavioral Systems, Inc., Albany, CA) and the responses in the active EEG paradigm and the behavioral test were recorded with the Cedrus Response Pad (Model RB-834; Cedrus Corporation, San Pedro, CA, United States).
EEG data were processed with BESA 6.0 software (MEGIS Software GmbH, Gräfelfing, Germany). Noisy electrodes showing atypical signals (detected by visual inspection) were interpolated; that is, their signal was replaced by the average signal calculated (by BESA 6.0 software) over the surrounding electrodes. We further removed eye blink artefacts exceeding ±100 µV with the semi-automatic BESA PCA method. EEG epochs with amplitudes exceeding ±100 µV were excluded from the analyses, and the percentage of accepted trials averaged over all participants and blocks was 92.2% (SD = 5.3). Of the six electrodes that were used in the final analyses, on average 0.11 (SD = 0.32, median = 0, min = 0, max = 1) were interpolated.
Frequencies below 0.5 Hz and above 30 Hz were filtered out offline, and the inspected epochs were extracted from the EEG from −100 ms to 500 ms relative to stimulus onset. The electrode signals were re-referenced to the average of the mastoid electrodes. For each participant, the responses were averaged separately over each chord type, combining all transpositions, for both active and passive conditions separately. Averaged responses were then exported to MATLAB R2016 (The MathWorks Inc., Natick, MA) for further inspection.
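The epoching, amplitude-based rejection, and averaging steps can be illustrated for a single channel with the sketch below. The actual processing was done in BESA, whose exact rejection criterion may differ; this version simply drops any epoch containing a sample beyond ±100 µV. Filtering and re-referencing are omitted, and the function names are hypothetical.

```python
FS = 512  # sampling rate in Hz, as reported for the recording


def epoch_bounds(onset_sample, tmin_ms=-100, tmax_ms=500, fs=FS):
    """Sample range [start, stop) of an epoch around a stimulus onset."""
    start = onset_sample + round(tmin_ms * fs / 1000)
    stop = onset_sample + round(tmax_ms * fs / 1000)
    return start, stop


def average_epochs(signal_uv, onsets, threshold_uv=100.0):
    """Average accepted epochs of one channel; returns (average, n_accepted)."""
    kept = []
    for onset in onsets:
        start, stop = epoch_bounds(onset)
        if start < 0 or stop > len(signal_uv):
            continue  # epoch would run past the recording
        epoch = signal_uv[start:stop]
        if max(abs(s) for s in epoch) <= threshold_uv:
            kept.append(epoch)
    if not kept:
        return None, 0
    n = len(kept)
    # Average sample by sample across the accepted epochs.
    return [sum(column) / n for column in zip(*kept)], n
```

At 512 Hz, an epoch from −100 ms to 500 ms spans 307 samples in this sketch.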
The subtraction signals were created for both dissonant chord types by subtracting the averaged response for the consonant chord (CON) from the averaged responses for the dissonant chords (D1 & D2) for each participant, electrode, and condition. These subtraction signals were then averaged for both groups (musicians vs. nonmusicians), and for both conditions (active vs. passive).
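The subtraction and group averaging can be sketched in Python as follows. The function names and the toy arrays are illustrative assumptions; only the direction of the subtraction (dissonant minus consonant) is taken from the text.

```python
import numpy as np

def subtraction_signal(deviant_avg, standard_avg):
    """Deviant-minus-standard difference wave for one participant.

    Both inputs: (n_channels, n_samples) averaged responses.
    """
    return np.asarray(deviant_avg, dtype=float) - np.asarray(standard_avg, dtype=float)

def grand_average(per_participant_waves):
    """Average difference waves over the participants of one group."""
    return np.mean(np.stack(per_participant_waves), axis=0)

# Toy data: one channel, three samples, two participants
con_p1, d1_p1 = np.array([[0.0, 1.0, 2.0]]), np.array([[0.0, 0.0, 0.0]])
con_p2, d1_p2 = np.array([[0.0, 2.0, 3.0]]), np.array([[0.0, -1.0, -1.0]])
diff_p1 = subtraction_signal(d1_p1, con_p1)
diff_p2 = subtraction_signal(d1_p2, con_p2)
group_wave = grand_average([diff_p1, diff_p2])
```

The same two functions would be applied separately to D1 and D2, to each condition, and to each group.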
Six electrodes (F3, Fz, F4, C3, Cz, and C4) were included in the EEG analyses. The MMN responses were first identified within the typical latencies of 100–250 ms (Näätänen, 1992; Näätänen et al., 1978). Following prior literature, the mean amplitudes for MMN were then calculated over 30 ms time windows (e.g., Ilvonen et al., 2004; Leung, Croft, Baldeweg, & Nathan, 2007; Tervaniemi, Maury, & Näätänen, 1994; Weise, Grimm, Müller, & Schröger, 2010), covering ±15 ms around the peak amplitudes of the averaged subtraction waveforms on the electrode Cz, where the amplitudes were the largest. Time windows were assessed separately for each group, condition, and deviant type. We then calculated the mean MMN amplitude over the six electrodes (F3, Fz, F4, C3, Cz, and C4) for each participant and used this mean in the analysis. These frontal and central electrodes were chosen for averaging as the MMN typically appears on these scalp regions in EEG (see Näätänen et al., 2007, for a review). We conducted a two-way [condition (active, passive) × chord (less dissonant, more dissonant)] repeated measures ANOVA (rANOVA) with group as a between-subjects factor. Greenhouse-Geisser corrections were applied in the analyses if sphericity could not be assumed. Statistically significant main effects and interactions with more than two levels were further inspected with pairwise comparisons. To counteract the problem of multiple comparisons, Bonferroni corrections were applied.
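The windowing procedure can be sketched in Python under a few stated assumptions: the MMN peak is taken as the most negative point of the grand-average Cz difference wave within 100–250 ms, and the time axis in the demo uses a hypothetical 2 ms step rather than the 512 Hz sampling of the recording. The function names and the simulated waveform are illustrative only.

```python
import numpy as np

def mmn_window(grand_avg_cz, times_ms, search=(100.0, 250.0), half_width=15.0):
    """Return a (start_ms, end_ms) window centred on the most negative
    point of the grand-average Cz difference wave within the search range."""
    times_ms = np.asarray(times_ms, dtype=float)
    wave = np.asarray(grand_avg_cz, dtype=float)
    idx = np.flatnonzero((times_ms >= search[0]) & (times_ms <= search[1]))
    peak_idx = idx[np.argmin(wave[idx])]
    peak_ms = times_ms[peak_idx]
    return peak_ms - half_width, peak_ms + half_width

def mean_amplitude(diff_wave, times_ms, window):
    """Mean of one participant's difference wave over the window and over
    all supplied channels. diff_wave: (n_channels, n_samples)."""
    times_ms = np.asarray(times_ms, dtype=float)
    mask = (times_ms >= window[0]) & (times_ms <= window[1])
    return float(np.asarray(diff_wave, dtype=float)[:, mask].mean())

# Simulated difference wave: negative deflection peaking at 160 ms
times = np.arange(-100.0, 500.0, 2.0)
cz_wave = -2.0 * np.exp(-((times - 160.0) / 30.0) ** 2)
window = mmn_window(cz_wave, times)
six_channels = np.tile(cz_wave, (6, 1))  # stand-in for F3, Fz, F4, C3, Cz, C4
amp = mean_amplitude(six_channels, times, window)
```

With the simulated peak at 160 ms, the window spans 145–175 ms. The window is defined from the group-level waveform, whereas the mean amplitude is computed per participant, which mirrors the two-step description above.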
In the behavioral test, the percentage of correct answers in three different comparison conditions (CON vs. D1; CON vs. D2; D1 vs. D2) was compared between groups with rANOVA. The percentage of correct answers was not distributed normally for all groups and comparisons: in particular, the distribution for the musician group was skewed to the left when comparing the consonant chord to the dissonant ones. This was most likely due to a ceiling effect caused by the task being undemanding for the musicians. Because of this, the results were further inspected with the nonparametric Mann-Whitney U test. The alpha level was set at p ≤ .05. All the analyses were conducted with IBM SPSS Statistics 25.0 (IBM Corporation, NY).
Figures 3 and 4 depict standard and deviant responses averaged over both groups in the active and passive conditions on electrodes F3, Fz, F4, C3, Cz, and C4. The subtraction signals for both deviant chords are depicted in Figures 5 and 6.
Table 1 lists the mean amplitudes and the peak latencies for dissonant chords in both conditions and both groups for MMN response for the mean amplitude over frontal and central line (C3, Cz, C4, F3, Fz, and F4). All the inspected MMN amplitudes were statistically significant, indicating that both groups detected both degrees of dissonance in passive and in active conditions.
No main effects of group, F(1, 29) = 0.04, p = .839, or condition, F(1, 29) = 0.59, p = .448, were found for MMN responses. A marginally significant main effect was found for dissonant chord type, F(1, 29) = 3.67, p = .065, η2 = .112, suggesting a larger mean amplitude for the more dissonant than for the less dissonant chord type. The interaction between group and condition was significant, F(1, 29) = 7.25, p = .012, η2 = .220, with post hoc tests indicating that nonmusicians had larger MMN responses in the passive (mean = -1.52, SEM = 0.21) than in the active (mean = -1.01, SEM = 0.23) condition (p = .023). The results for all main effects and interactions are listed in Table 2.
The musicians identified the more dissonant chord accurately in 99.1% (SD = 1.3) of the CON vs. D2 chord pairs, in 96.5% (SD = 5.1) of the CON vs. D1 chord pairs and in 74.0% (SD = 8.8) of the D1 vs. D2 chord pairs. The nonmusicians were less accurate, identifying the more dissonant chord accurately in 82.1% (SD = 17.8) of the CON vs. D2 chord pairs, in 74.2% (SD = 18.2) of the CON vs. D1 chord pairs and in 63.3% (SD = 11.6) of the D1 vs. D2 chord pairs.
Group had a significant main effect on correct answers, F(1, 28) = 19.07, p < .001, η2 = .405, with musicians giving more correct answers than nonmusicians (Figure 7A). Furthermore, there was a significant main effect of chord pair, F(2, 56) = 84.06, p < .001, η2 = .750. Pairwise comparisons indicated that the fewest correct answers were given when comparing two different types of dissonant chords, followed by when comparing the less dissonant chord with the consonant chord, and the most correct answers were given when comparing the more dissonant chord with the consonant chord (D1/D2 < D1/CON, D1/CON < D2/CON; p < .001, each). Furthermore, there was a significant interaction of chord pair × group, F(1.51, 54) = 5.45, p = .013, η2 = .163. Pairwise comparisons showed that the musician group discriminated both dissonant chords from the consonant chord better than they discriminated the more and less dissonant chords from each other (p < .001, each). In addition, they discriminated the less and the more dissonant chords from the consonant chord equally well (D1 vs. CON and D2 vs. CON: p = .342) (Figure 7B). For nonmusicians, the pairwise comparisons indicated that there were fewer correct answers when comparing two different types of dissonant chords than when comparing both more dissonant and less dissonant chords with consonant chords (p < .001, p = .003, respectively) and fewer correct answers when comparing the less dissonant chord with a consonant chord than when comparing the more dissonant chord with a consonant chord (p < .001). The musicians gave more correct answers in all chord pairs than nonmusicians (D1 vs. D2, p = .008; D1 vs. CON, p < .001; D2 vs. CON, p = .001).
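The reported effect sizes can be related to the F statistics with a short Python check. This assumes that η2 denotes partial eta squared, which the text does not state explicitly; under that assumption, η2 = (F × df_effect) / (F × df_effect + df_error). The function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees
    of freedom (assumes the reported effect size is partial eta squared)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values taken from the behavioral results reported above
group_effect = partial_eta_squared(19.07, 1, 28)
chord_pair_effect = partial_eta_squared(84.06, 2, 56)
```

Under this assumption, the two main effects reproduce the reported values of .405 and .750 to three decimals.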
Group comparisons were further conducted with the nonparametric Mann-Whitney U test. The results were in accordance with the rANOVA (CON vs. D1: U = 25.00, p < .001; CON vs. D2: U = 14.00, p < .001; D1 vs. D2: U = 51.50, p = .012), indicating that the groups differed from one another in all chord discrimination conditions.
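For readers unfamiliar with the statistic, the U value can be computed by counting pairwise wins between the two groups, as in the Python sketch below. The scores are made-up percentages rather than study data, the function name is illustrative, and the sketch does not compute p values; the reported analyses were run in SPSS.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): for each group, the number of cross-group pairs
    in which its score is higher, with ties counted as half a win."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical percent-correct scores (not study data)
musicians_demo = [100, 98, 96, 100, 94]
nonmusicians_demo = [80, 96, 70, 85, 90]
u_mus, u_non = mann_whitney_u(musicians_demo, nonmusicians_demo)
u_reported = min(u_mus, u_non)  # the smaller U is conventionally reported
```

A small U relative to the number of pairs (here 1.5 out of 25) indicates little overlap between the two distributions, which is the pattern the reported U values suggest for the consonant versus dissonant comparisons.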
Our aim was to investigate the accuracy of neural and behavioral dissonance detection in chords with increasing degrees of dissonance. Our data indicate that both musicians and nonmusicians discriminated the dissonant chords from the consonant ones, neurally and behaviorally, but musicians outperformed nonmusicians in the behavioral task. While an increase in dissonance was detected by both groups behaviorally, the neural responses for the two types of dissonant chords differed only marginally (as reflected by the MMN). While attending to the stimuli did not modulate the musicians’ MMN, the nonmusicians’ MMN responses were larger in the passive than in the active condition.
Brain Basis of Consonance vs. Dissonance Detection
As there was no significant difference between the MMN responses of the musician vs. nonmusician groups, our results suggest that early sensory processing of sensory consonance vs. dissonance might not be an ability requiring explicit, or perhaps even implicit, training but is instead based on physiological properties of the auditory system. In line with this suggestion, Virtala et al. (2013) have shown that even newborn babies are sensitive to dissonant chords played among standard major chords. However, the MMN, along with the other ERP components during the first 600 ms after the stimuli, is only one means to investigate neural discrimination, and thus the absence of difference between the groups’ MMN responses does not necessarily indicate that early sensory detection is not modulated by expertise. Furthermore, most Western listeners are exposed to consonant and dissonant chords prior to birth, and it would be difficult (and unethical) to measure the contribution of natural components compared to that of implicit or explicit learning on dissonance detection.
There was only a marginal difference between the MMN responses for the two dissonant chords. The present study cannot determine whether this difference would become significant with a larger sample size (now N = 31), and it does not rule out the possibility that increasing dissonance enhances its early sensory discrimination from consonance. Prior evidence shows that an increase in the degree of dissonance is detected in early auditory processing (Bidelman & Grall, 2014; Itoh et al., 2010) and further endorses the view of dissonance as a feature that is perceived as a continuum rather than a dichotomy early in the processing stream. However, based on the present study, it seems that the difference between the experts and the non-experts in the discrimination of the degrees of dissonance is not apparent on the level of early sensory processing but only in behavioral settings where the time for sound processing is substantially longer than in typical ERP paradigms.
Previous studies have suggested that the accuracy of behavioral detection is linked to larger MMN responses (Amenedo & Escera, 2000; Jaramillo, Paavilainen, & Näätänen, 2000; Novitski et al., 2004; Tiitinen et al., 1994). Our group-level results are not in line with these studies. Further, as the behavioral test was far too easy for the musicians in the current study, it was not possible to determine whether the neural and behavioral detection of the chords correlated on an individual level. It is worth noting that the interval structures of the dissonant chords differed not only in roughness but also in harmonicity: the rougher chord (D2) contained two sets of two tones that constitute consonant intervals, while the smoother chord (D1) contained one set of three tones and one tone conflicting with that harmonic series. This, along with the differences between behavioral and MMN results, might suggest that the processing of dissonance and consonance involves factors that the MMN does not detect, such as harmonicity. The early sensory processing of degrees of dissonance thus calls for more research, with paradigms consisting of dissonant chords structured in different ways regarding roughness and harmonicity, using different chords as frequent standards and infrequent deviating stimuli, and a spectrum of event-related brain potential (ERP) methods.
It might be that different interval structures within varying chord types—instead of the difference between consonance and dissonance per se—have an impact on the MMN response. Nevertheless, previous research has shown that unlike changes in chord type, inverted major chords among root major chords do not elicit an MMN response in nonmusicians (Virtala et al., 2011; Virtala et al., 2014). In these two studies, several transpositions were used for standard and deviant chords, and all deviant chords included only tones also present in the standard chords. These results suggest that the changing interval structure within the chord does not cause a change in the MMN, at least for laypeople. As the responses did not statistically differ between the musicians and nonmusicians in our study, it seems possible that the MMN response was not evoked by the different interval structures per se used in the deviant chord. To further explore this, consonant chords with inversion should be added in future studies to the deviating stimuli to rule out the other interpretation.
Attention and Dissonance Discrimination
The present study does not support previous results suggesting that attending to stimuli elicits larger MMN amplitudes compared to the passive paradigms (Alain & Woods, 1997; Trejo et al., 1995; Woldorff et al., 1991). In contrast, the MMN response was modulated by attention in nonmusicians, who showed larger MMN amplitudes to dissonance in passive compared to active conditions. Therefore, our result might support the studies showing that focused attention attenuates the MMN response in the passively listened sound stream (Sussman et al., 2003; Woldorff et al., 1991). Although the previous studies have used dichotic listening paradigms, they do not rule out the possibility that focusing attention on the highly salient sounds within the same sound stream might affect the neural discrimination of small deviations. This is also in line with Sussman’s (2007) proposal that passive processing of non-target sounds is attenuated while focusing on target sounds. In order to avoid artefact ERPs related to pressing the button in the present study, the target sound was very different from the actual stimuli, minimizing the possibility that the participants, especially the nonmusicians, would confuse target tones with the stimuli. This setting may have led the participants to ignore the actual harmonic structures of the chords and concentrate solely on the arguably more substantial timbral differences between the stimuli and target tones. The musicians, in contrast, might have needed to concentrate less on target sounds due to their expertise in listening to musical sounds, so for them the difference between the conditions may not have been substantial. Finally, it cannot be ruled out that the musicians were listening covertly to the sounds during the passive condition, and this was why their brain responses did not differ between the conditions.
The Effects of Musical Expertise
In line with our results, Crespo-Bojorque et al. (2018) found no differences between expert and non-expert listeners in a typical Western music context. Nevertheless, they did find modulating effects of musical expertise in a less conventional context; that is, infrequent consonant intervals within standard dissonant intervals. Also, Tervaniemi et al. (2005) failed to find any difference in the MMN responses between musicians and laypeople for frequency changes (although their discrimination performance during the behavioral task differed). These studies support the suggestion that neural discrimination of relatively easy musical deviations is not affected by musical expertise beyond the implicit learning through everyday exposure to Western music. It thus seems possible that the paradigm of the present study may have been too “easy” to differentiate the groups on the early sensory processing level. The dissonant chords were not compared to each other, but both types of dissonance appeared in the middle of consonant chords.
Contradictory previous evidence using MMN as a marker for neural detection may stem from a difference in the settings: Brattico et al. (2009) used major triads and three-tone dissonant chords, whose frequency ratios do not differ from each other as much as the dissonant and consonant chords in our setting. Thus, this part of their experiment may have been more demanding for the listeners, highlighting the musician advantage. However, two previous studies differ in their results for the discrimination of minor and major chords. In Brattico et al. (2009), no difference was found in the groups’ MMN response for the deviating minor chord. In contrast, Virtala et al. (2014) found that only musicians showed MMN responses to minor and inverted major chords among major chords. The discrepancy between these two studies could be due to the subjects: Brattico et al. (2009) had smaller groups and more musical experience in the nonmusician group than in the Virtala et al. (2014) study. Moreover, only Virtala et al. (2014) used chords transposed to several frequency levels, thus making the stimulation context far more demanding than in Brattico et al. (2009).
In line with our original hypothesis, the musicians detected the dissonant chords more accurately than the nonmusicians in the behavioral task, and they also showed less within-group variability. The musicians discriminated both types of dissonant chords from the consonant one equally well. This may be explained by a ceiling effect, due to the ease of the task for the experts. The nonmusicians were less accurate. However, even the nonmusicians performed better than chance level in all discrimination tasks, showing that detecting the differences between dissonant chords is not dependent on intensive training alone.
Taken together, the group-related results in the ERPs and the behavioral test indicate that while expertise does not facilitate discrimination as assessed by the MMN, it does facilitate behavioral performance. Thus, the effects of musical experience appear to emerge somewhere between the latency of the MMN response (∼150–250 ms) and the button press in the behavioral task (∼1000–2500 ms). It may be that the better performance of the musicians in the task is also partially due to better working memory capacity in musicians (e.g., Pallesen et al., 2010; Zuk, Benjamin, Kenyon, & Gaab, 2014). This capacity could thus facilitate the conscious comparison of sounds during the behavioral task.
Limitations of the Study
The musicians in the study were all trained in Western classical music. As the individual instruments and genres may have an effect on early sensory processing, the results may not be generalizable to all musicians (Tervaniemi, Janhunen, Kruck, Putkinen, & Huotilainen, 2016; Vuust, Brattico, Seppänen, Näätänen, & Tervaniemi, 2012), or other musical cultures.
The experimental setting tested the discrimination of dissonance in a consonant context. Differences between experts and laypeople might arise in more challenging settings, such as presenting consonant or varying levels of dissonant chords in dissonant contexts (Crespo-Bojorque et al., 2018). In an ideal experiment, both contexts and more levels of dissonant chords could be explored. Unfortunately, this was beyond the scope of the current project due to a lack of sufficient resources.
It cannot be ruled out that the participants were actively listening to the sounds during the passive condition, and this may explain the lack of difference between the responses in the two conditions. In particular, the musicians might have been tempted to concentrate on the sounds in the passive condition, which was always played first. This was done to avoid the learning effects affecting the passive condition. However, participants might have been more fatigued and had more difficulties in maintaining attention in the active condition.
The present study offers new insight into the neural basis of sensory dissonance and the effects of expertise. Furthermore, it reveals the effects of expertise on the behavioral discrimination of different degrees of dissonance. Early sensory discrimination of consonance vs. dissonance, as indexed by the MMN, occurred irrespective of musical expertise. Attending to the stimuli had an impact only on the nonmusician group, which showed smaller MMN amplitudes for the active than the passive condition, suggesting that focusing strongly on specific sounds attenuates processing of unessential sounds for laypeople. As the facilitating effects of expertise were apparent only in the behavioral discrimination test, it seems that the musician benefit in discrimination of consonance vs. dissonance emerges somewhere between the latency of the MMN response and the button press in the behavioral task. In addition, the study shows marginal support for the notion of dissonance as a continuum, which appears in the early stages of auditory processing.
Authors Tanja Linnavalli, Juha Ojala, and Laura Haveri contributed equally to the manuscript.
This work was supported by the Finnish Cultural Foundation (Helsinki, Finland) and The Gyllenberg Foundation (Helsinki, Finland). The authors would like to thank laboratory engineer Tommi Makkonen for his help in technical issues, Dr. Caitlin Dawson for proof-reading of the manuscript, and Olli Saari and Kalle Toikka for their assistance in data collection.