Infants and children are able to track statistical regularities in perceptual input, which allows them to acquire structural aspects of language and music, such as syntax. However, much more is known about the development of linguistic syntax than of musical syntax. In the present study, we examined 3.5-year-olds’ implicit knowledge of Western musical pitch structure using electroencephalography (EEG). Event-related potentials (ERPs) were measured while children listened to chord sequences that either 1) followed Western harmony rules, 2) ended on a chord that went outside the key, or 3) ended on an in-key but harmonically less expected chord. Whereas adults tend to show an early right anterior negativity (ERAN) in response to unexpected chords (Koelsch, 2009), 3.5-year-olds in our study showed an immature response that was positive rather than negative in polarity. Our results suggest that very young children exhibit implicit knowledge of the pitch structure of Western music years before they have been shown to demonstrate that knowledge in behavioral tasks.
Music and language are two complex, hierarchically structured communication systems that both require extensive experience to develop. Infants and children learn the language(s) to which they are exposed without formal instruction, but they also learn the music of their culture. Although musical knowledge is often dismissed as a non-universal area of expertise, in fact, all normally developing individuals acquire sophisticated musical abilities (Brandt, Gebrian, & Slevc, 2012). For example, perceiving the emotion expressed through music, tapping along to a musical beat, and developing expectations for the order of musical events are all non-trivial abilities that emerge through development. However, far less is known about the developmental trajectory of these abilities than about the development of language. How and when do young children develop sophisticated musical knowledge?
In the present study, we focused on the development of pitch knowledge in Western music. Several studies suggest that infants are either born with, or quickly acquire, several biases that may facilitate later understanding of pitch structure. Infants prefer or show facilitated processing for consonant, pleasant-sounding pitch combinations over dissonant, unpleasant-sounding pitch combinations (Masataka, 2006; Schellenberg & Trehub, 1996a, 1996b; Trainor & Heinmiller, 1998; Trainor, Tsang, & Cheung, 2002; Zentner & Kagan, 1998; but see Plantinga & Trehub, 2014). As musical phrases tend to follow a pattern of tension (i.e., dissonance) increase followed by resolution (i.e., consonance), this early bias may contribute to higher-level processing of musical phrases, as well as to emotional perception in music. A second processing bias is for relative pitch. Infants treat a transposed melody (i.e., a melody that has been shifted either higher or lower in pitch) as sounding familiar as long as the relative distances between the pitches are the same in the original and transposed versions, even if the transposed version includes none of the original pitches (Plantinga & Trainor, 2005; Schellenberg & Trehub, 1999; Trainor & Trehub, 1992; Trehub, Bull, & Thorpe, 1984). This ability to process pitch in a relative rather than an absolute way is necessary for processing patterns rather than individual tones, which is fundamental to music perception. A third bias is for infants to process pitch patterns based on unequal-step scales (e.g., Western scales based on tones and semitones) more easily than patterns based on scales in which adjacent notes are all equally spaced apart in pitch (Trehub, Schellenberg, & Kamenetsky, 1999); unequal spacing allows each note in the scale to have a unique set of pitch relationships with the remaining notes, which allows each note to serve different functions within a melody.
Biases that are present at birth or that emerge very early in development are more likely to be universal across musical systems and cultures (Hannon & Trainor, 2007). By contrast, culture-specific knowledge of musical structure is likely to emerge later in development because although it may build on early, universal processing biases, it requires accumulating musical experience (Corrigall & Schellenberg, 2015). Furthermore, different aspects of musical structure may be acquired earlier or later depending on their complexity; for example, Western music combines a relatively complex pitch structure with a simpler metrical structure, which may explain the relatively long developmental trajectory for the acquisition of Western pitch structure. Most cultures base their music on scales, which include a small number of pitch intervals per octave, most typically 5–7. As such, one aspect of the development of culture-specific pitch knowledge is understanding which notes belong in scale and which do not; in Western music, this is referred to as knowledge of key membership. Much more rare across musical systems is the simultaneous combination of notes into chords, which forms the basis for Western harmonic structure. Thus, acquiring pitch structure knowledge in Western music includes 1) knowing which notes are included in particular scales, such as the major scale, and which are “out-of-key” notes, 2) understanding which notes are combined to form chords, and 3) developing expectations for which notes and chords follow one another at particular points in a musical phrase. The rules for ordering musical events according to a hierarchical structure are collectively referred to as musical syntax just as linguistic syntax refers to rules for ordering linguistic elements.
Behavioral methods suggest that children have some understanding of key membership by approximately 5 years of age, whereas harmonic knowledge emerges slightly later at 6 or 7 years of age. For example, infants do not demonstrate knowledge of key membership in that they readily detect both within-key and out-of-key changes in a melody (Trainor & Trehub, 1992), whereas 5-year-olds do show knowledge of key membership, more readily detecting out-of-key changes than within-key changes in a melody (Trainor & Trehub, 1994). However, it is not until children are 6 or 7 years of age that they demonstrate harmonic knowledge by judging or rating an in-key but harmonically unexpected note or chord as sounding “bad” or incomplete (Corrigall & Trainor, 2009, 2010, 2014; Cuddy & Badertscher, 1987; Krumhansl & Keil, 1982; Lamont & Cross, 1994; Martínez-Castilla, Rodríguez, & Campos, 2016; Speer & Meeks, 1985). Furthermore, school-aged children are slower to make speeded judgments about a target chord (e.g., whether it was played in piano or trumpet timbre) when that chord violates the rules of Western harmony than when it conforms (Schellenberg, Bigand, Poulin-Charronnat, Garnier, & Stevens, 2005). Familiarity may play a role, however, as 4-year-olds do show some sensitivity to key membership and harmony when very simple familiar passages are used (Corrigall & Trainor, 2009, 2010).
One issue with the use of behavioral tasks is that they require focused attention, which can be very difficult for preschoolers and toddlers. Electrophysiological methods, by contrast, place little demand on young children's attentional systems and can be used to examine implicit knowledge of pitch structure. Extensive work with adults has revealed that two event-related potential (ERP) components are typically elicited in response to chords that violate Western key or harmony rules (e.g., Koelsch, Gunter, Friederici, & Schröger, 2000; Koelsch et al., 2001; see Koelsch, 2009, for a review). The first, an early right anterior negativity (ERAN), is thought to be automatic because it can be elicited in the absence of focused attention (Koelsch et al., 2001; Koelsch, Schröger, & Gunter, 2002; Loui, Grent-'t Jong, Torpey, & Woldorff, 2005). In adults, its amplitude tends to peak between 150–200 ms after stimulus onset (Koelsch, 2009), it reverses polarity in occipital regions (e.g., Corrigall & Trainor, 2014), and it is primarily generated in the pars opercularis of the inferior fronto-lateral cortex (Koelsch, 2006). The second, later negative component (N5) is more strongly influenced by attentional focus (e.g., Koelsch et al., 2000), and peaks at approximately 500 ms after stimulus onset in adults, reversing polarity in the occipital and/or parietal regions (e.g., Corrigall & Trainor, 2014). The neural generators of the N5 remain unknown, but are suspected to be partly in the temporal lobe and partly in the inferior frontal gyrus (Koelsch, 2011). Importantly, both the ERAN and the N5 are influenced by the degree of harmonic violation (e.g., Kim, Kim, & Chung, 2011; Koelsch et al., 2003, 2000), suggesting that they are sensitive measures of implicit harmonic knowledge.
Studies show that both the ERAN and the N5 occur in response to harmonic violations in 5-, 9-, and 11-year-olds (Jentschke & Koelsch, 2009; Jentschke, Koelsch, Sallat, & Friederici, 2008; Koelsch et al., 2003), although the latency of the ERAN tends to be longer in children than it is in adults, especially at younger ages. More recently, two studies have examined electrophysiological responses in even younger children, but found discrepant results. In a recent study, we found that both key and harmony violations elicited positivities in 4.5-year-olds, in contrast to adults who showed the typical ERAN and N5 responses (Corrigall & Trainor, 2014). Nevertheless, the components were found in the same scalp regions and at similar latencies (150–250 ms for key violations, and 190–290 ms as well as 425–675 ms for harmony violations) as previous studies with older children, despite the fact that 4.5-year-olds did not show measurable behavioral sensitivity to key membership or harmony using the same stimuli. We suggested that the components found in young children reflect an immature brain response to disruptions in musical syntax. However, another recently published study with even younger children (2.5-year-olds) found that the adult-like negative ERAN response was elicited to both key and harmony violations, although no N5 was observed (Jentschke, Friederici, & Koelsch, 2014).
It is unclear why the two most recent studies on ERP responses to musical syntactic violations in young children have found inconsistent results. One possibility, suggested by Jentschke et al. (2014), is that the differences stem from the musical stimuli used in each study. The chord sequences used by Jentschke et al. may have produced stronger tonal expectations, leading to a more mature response to violations of those expectations in young children. Nevertheless, adults in our study showed the typical ERAN and N5 responses to the same stimuli, suggesting that chord sequences did in fact create sufficient and typical tonal expectations (Corrigall & Trainor, 2014). Similarly, it is possible that the discrepant findings resulted from Jentschke et al.'s use of a linked left and right mastoid reference compared to our use of a common average reference (Corrigall & Trainor, 2014); however, this explanation is again difficult to reconcile with our results from adult participants.
Another possibility is that the immature positivity is more typical in a representative sample of children, compared to a more selective group. In our study (Corrigall & Trainor, 2014), useable data (i.e., at least one fully completed block of trials without excessive artifacts) were collected from 48 of 55 children (one additional child was excluded for having formal music training), which represents an inclusion rate of 87%, and most children completed two blocks of trials. By contrast, only 62 of 96 children were included in Jentschke et al.'s (2014) final sample, equal to a 65% inclusion rate. Although children were significantly younger in Jentschke et al.'s study and therefore more prone to difficulties with attention and excessive movement, their final sample may have consisted of a more mature group of children who were able to sit still for a relatively long period of time.
A final possibility that may also contribute to the issue of selection bias is that Jentschke et al. (2014) used a conventional trial rejection method, which removes any trials with excessive artifacts from data analysis. This method is typically used with adults who have fewer problems remaining still; however, with children, it led to the inclusion of only 29–91 trials out of a possible 192 trials (Jentschke et al., 2014). In short, the included data may have been fairly selective. By contrast, we used an artifact-blocking algorithm that has been shown to be superior to the conventional trial method using infant data (Fujioka, Mourad, He, & Trainor, 2011), which allowed us to include all 120 trials per block. Immature positive responses may emerge when substantial data rejection is not required. These results also fit with other research on infants showing that immature positive responses may precede mature frontal negativities (e.g., He, Hotson, & Trainor, 2009; Tew, Fujioka, He, & Trainor, 2009; Trainor et al., 2003).
In the present study, we sought to replicate and extend our previous results with 4.5-year-olds in an age group that has not been studied thus far: 3.5-year-olds. These children were also equidistant in age from the 2.5-year-olds examined by Jentschke et al. (2014) and the 4.5-year-olds examined in our earlier study (Corrigall & Trainor, 2014). We used the same chordal stimuli as in our previous experiment, which consisted of 5-chord sequences that ended 50% of the time on the expected tonic chord, and 50% of the time on a chord in an unexpected key (in one condition) or on an in-key but harmonically unexpected chord (in a second condition). Our goal was to examine whether these younger children would also show immature positive responses rather than the adult-typical ERAN and/or N5 components.
We tested 44 3-year-olds (22 girls, 22 boys; mean age = 3.5 years, SD = 0.1 years) in one of two conditions. The final sample included 11 girls and 12 boys in the unexpected key condition, and 11 girls and 10 boys in the unexpected harmony condition. An additional six children were tested but excluded from the final analyses for the following reasons: unwilling to put or keep the EEG cap on (n = 2), excessive movement or artifacts in the EEG data (n = 4). This represents an inclusion rate of 88%. Demographic information and music/dance experience were collected via parent questionnaires; there were no group differences in average parent education (coded on a 7-point scale), t(40) = 1.08, p = .29, family income (coded on a 6-point scale), t(40) = 0.16, p = .87, cumulative duration (in months) of participation in infant and toddler music and dance classes, t(41) = 0.30, p = .77, or the number of hours children listened to music per week, t(37) = 0.57, p = .57.
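The inclusion rates quoted for the three samples follow directly from the reported counts. The short check below is only an arithmetic sketch; the function name and the dictionary labels are illustrative and not part of the original analysis.

```python
def inclusion_rate(included: int, tested: int) -> float:
    """Percentage of tested children retained in the final sample."""
    return 100.0 * included / tested


# Counts as reported in the text: (children included, children tested).
samples = {
    "present study (3.5-year-olds)": (44, 44 + 6),
    "Corrigall & Trainor (2014)": (48, 55),
    "Jentschke et al. (2014)": (62, 96),
}

for label, (included, tested) in samples.items():
    rate = inclusion_rate(included, tested)
    print(f"{label}: {included}/{tested} = {rate:.1f}%")
```

Rounded to whole percentages, these values correspond to the 88%, 87%, and 65% given in the text.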
The stimuli were identical to the chord sequences used by Corrigall and Trainor (2014) in Experiment 2. There were four different sequences, each consisting of five 4-note chords in root position, played in piano timbre. Each sequence had one standard version that always ended on the tonic chord, and two deviant versions that were identical to the standard except that they either ended 1) outside the key on a flat supertonic (unexpected key version), or 2) in-key but on the less expected subdominant chord (unexpected harmony version; see Figure 1 for the musical notation of all three versions of one example sequence). Table 1 lists the number of exact pitches (e.g., C4) and pitch classes (e.g., a C in any octave) in each target (i.e., last) chord (standard, unexpected key, unexpected harmony) that were presented in each prime sequence (the first four chords of the sequence). By definition, the flat supertonic chord never occurred in the prime, nor had most of its individual notes or pitch classes. As such, sensory priming (e.g., Bigand, Delbé, Poulin-Charronnat, Leman, & Tillmann, 2014; Bigand, Poulin, Tillmann, Madurell, & D'Adamo, 2003) could potentially explain any observed response differences between sequences ending in a tonic chord and those ending in a flat supertonic chord, as in the unexpected key condition. However, to reduce sensory priming in the unexpected harmony condition, the tonic and subdominant chords never occurred in the prime; furthermore, there were no significant differences between the standard and unexpected harmony conditions regarding the number of exact pitches of each target chord that were presented in the prime, t(6) = 1.85, p = .11, nor in the number of pitch classes, t(6) = 1.67, p = .15. Sequences were 3.6 s in duration, with each of the first four chords lasting 600 ms in duration and the final chord lasting 1,200 ms in duration. Subsequent sequences began immediately after the end of the final target chord.
Each version of each sequence was transposed to all 12 major keys.
Table 1. Number of exact pitches and number of pitch classes in each target chord (Standard, Unexpected Key, Unexpected Harmony) that were presented in the prime sequence. [Table body not recoverable.]
Children either sat next to their parent or on their parent's lap in a sound-attenuated room, facing a speaker and a monitor (approximately 1 meter away). An experimenter instructed children to be “as quiet as a mouse” and “as still as a statue” and silently reminded them of these instructions if they forgot during the experiment. While the musical stimuli played from the speaker, children watched a silent movie of their choice, and the experimenter entertained them with silent toys and bubbles if they became distracted from the silent movie.
Children were presented with 120 trials of the standard and 120 trials of one of the deviant versions of the same chord sequence, transposed to all major keys and presented in a pseudo-random order such that no two consecutive trials were presented in the same key. In other words, each child heard 10 repetitions of one of the four standard sequences in each of the 12 keys, and 10 repetitions of the deviant version of that sequence in each of the 12 keys. The experiment lasted approximately 15 min. Stimuli were presented using E-prime software (version 1.2). After the experiment, children received a certificate and a book or toy as appreciation for participating.
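The trial structure described above (10 repetitions of each version in each of 12 keys, with no two consecutive trials in the same key) can be sketched as follows. The construction used here, repeated random passes through the 12 keys, is an assumption made for illustration; the text does not say how the pseudo-random order was actually generated, and all names are hypothetical.

```python
import random

KEYS = list(range(12))           # the 12 major keys, indexed 0-11
REPS = 10                        # repetitions of each version in each key
SEQUENCE_MS = 4 * 600 + 1200     # four 600 ms chords plus one 1,200 ms chord


def make_schedule(seed=0):
    """Return a list of (version, key) trials with no immediate key repeats."""
    rng = random.Random(seed)
    n_passes = 2 * REPS          # every pass visits each key exactly once

    # For each key, decide in which passes the standard or the deviant is played.
    versions = {}
    for key in KEYS:
        labels = ["standard"] * REPS + ["deviant"] * REPS
        rng.shuffle(labels)
        versions[key] = labels

    schedule = []
    for p in range(n_passes):
        order = KEYS[:]
        rng.shuffle(order)
        # Reshuffle if this pass would start on the key that ended the last pass.
        while schedule and order[0] == schedule[-1][1]:
            rng.shuffle(order)
        schedule.extend((versions[key][p], key) for key in order)
    return schedule


schedule = make_schedule()
total_min = len(schedule) * SEQUENCE_MS / 60000
print(len(schedule), "trials, about", total_min, "minutes of stimulation")
```

With 240 sequences of 3.6 s each presented back to back, the stimulation lasts 14.4 min, consistent with the reported duration of approximately 15 min.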
DATA RECORDING AND ANALYSIS
We recorded EEG with a Geodesic Sensor net and Electrical Geodesics Inc. Netstation 4.3.1 software at 124 scalp locations. EEG was recorded online at a sampling rate of 1,000 Hz with a Cz reference, and bandpass filtered between 0.1 and 400 Hz, keeping impedances below 50 kΩ. Offline, the data were filtered between 0.5 and 20 Hz, downsampled to 200 Hz using eeprobe software, and then run through an artifact-blocking (AB) algorithm to reduce movement-related artifacts (see Fujioka et al., 2011) with Matlab software using a threshold of ± 100 μV. In eeprobe, electrodes were then digitally re-referenced to a common average, and the data were segmented into 900 ms epochs with a baseline starting 100 ms before the onset of the final chord of each trial. For each electrode site, standard and deviant trials were averaged separately relative to the 100 ms baseline. For analysis, ten groups of electrodes were formed, averaging together the channels in each group that corresponded to frontal, central, parietal, occipital, and temporal regions of the scalp for each hemisphere (see Figure 2). We then created a difference wave for each participant in each condition at each scalp region by subtracting the standard waveform from the deviant waveform. Data were analyzed using IBM SPSS Statistics software.
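The averaging and subtraction steps can be illustrated with a minimal sketch for a single electrode group. It assumes that each epoch is a list of voltages whose first samples cover the 100 ms baseline (20 samples at 200 Hz); the function names and toy values are hypothetical, and filtering, artifact blocking, and re-referencing are not shown.

```python
from statistics import mean

SRATE_HZ = 200        # sampling rate after downsampling
BASELINE_MS = 100     # pre-stimulus baseline before the final chord


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the baseline samples from every sample."""
    offset = mean(epoch[:n_baseline])
    return [v - offset for v in epoch]


def average_epochs(epochs, n_baseline):
    """Baseline-correct each epoch, then average across trials per sample."""
    corrected = [baseline_correct(e, n_baseline) for e in epochs]
    return [mean(samples) for samples in zip(*corrected)]


def difference_wave(standard_epochs, deviant_epochs, n_baseline):
    """Deviant average minus standard average, sample by sample."""
    std = average_epochs(standard_epochs, n_baseline)
    dev = average_epochs(deviant_epochs, n_baseline)
    return [d - s for d, s in zip(dev, std)]


n_baseline = BASELINE_MS * SRATE_HZ // 1000   # 20 samples

# Toy epochs (hypothetical microvolt values): 20 baseline + 4 post-onset samples.
standard = [[1.0] * 20 + [2.0] * 4, [3.0] * 20 + [4.0] * 4]
deviant = [[0.0] * 20 + [3.0] * 4, [2.0] * 20 + [5.0] * 4]
diff = difference_wave(standard, deviant, n_baseline)
print(diff[-4:])
```

A positive value in the difference wave means that the deviant elicited a more positive response than the standard, which is the direction of the effects reported for this age group.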
Grand average standard and deviant waveforms in each condition are shown in Figure 3, and difference waves in Figure 4. To follow the same procedure as our previous study with 4.5-year-olds (Corrigall & Trainor, 2014), preliminary t-tests comparing standard and deviant waves across time were conducted. They suggested an early positive component (deviants more positive than standards) in both the unexpected key and unexpected harmony conditions, and a late positive component in the unexpected key condition, but not in the unexpected harmony condition. In the unexpected key condition, the early and late components were evident primarily in right frontal and central regions and reversed polarity at the left temporal region. In the unexpected harmony condition, the early component was present primarily in both left and right frontal regions and reversed polarity at left and right occipital regions.
To analyze the peaks of the early and late components in the grand average at frontal (in both conditions) and central (in the unexpected key condition) sites, preliminary t-tests were used to specify time windows (+/- 100 ms around the peak of the grand average) in which the average amplitudes of the difference waves were calculated and used as the dependent measures in the following analyses. The average amplitudes at temporal (in the unexpected key condition) and occipital (in the unexpected harmony condition) regions were reverse-signed (such that negative average amplitudes were transformed to have a positive average amplitude and vice versa) so that the magnitude of the component across scalp regions could be analyzed. Parietal regions were not included as no significant components were present. Greenhouse-Geisser corrections to degrees of freedom were used whenever appropriate.
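The dependent measure described above, the mean amplitude of the difference wave within 100 ms on either side of the grand-average peak, with the sign reversed at polarity-reversing regions, might be computed along the following lines. This is a sketch: the function name, the inclusive window edges, and the toy waveforms are assumptions rather than details taken from the original analysis.

```python
def window_mean(wave, times_ms, peak_ms, half_width_ms=100, flip_sign=False):
    """Mean amplitude within peak_ms +/- half_width_ms (edges inclusive).

    Set flip_sign=True for regions where the component reverses polarity,
    so that magnitudes can be compared across scalp regions.
    """
    lo, hi = peak_ms - half_width_ms, peak_ms + half_width_ms
    values = [v for v, t in zip(wave, times_ms) if lo <= t <= hi]
    m = sum(values) / len(values)
    return -m if flip_sign else m


# Time axis at 200 Hz (5 ms steps), from 100 ms before to 795 ms after onset.
times_ms = list(range(-100, 800, 5))

# Toy difference waves (hypothetical microvolt values) with opposite polarity.
frontal = [1.0 if 120 <= t <= 320 else 0.0 for t in times_ms]
occipital = [-1.0 if 120 <= t <= 320 else 0.0 for t in times_ms]

print(window_mean(frontal, times_ms, peak_ms=220))
print(window_mean(occipital, times_ms, peak_ms=220, flip_sign=True))
```

After sign reversal, both toy regions yield the same positive mean amplitude, which is what allows the magnitude of a component to be entered into a single analysis across regions.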
In the unexpected key condition, the time window of the early peak in the difference wave was 120 to 320 ms after the onset of the final chord. We first conducted an ANOVA with hemisphere (left, right) and region (frontal, central, temporal) as within-subjects factors. There were no main effects, but the hemisphere by region interaction approached significance, F(2, 44) = 3.99, ε = .54, adjusted p = .054. We then conducted t-tests comparing the average amplitude of the difference wave between 120 and 320 ms in each hemisphere of each region to zero. With Bonferroni correction for multiple tests, the adjusted significance level was .008. The amplitude of the difference wave was significantly different from zero in the right frontal region, t(22) = 3.49, p = .002, and the left temporal region, t(22) = 5.28, p < .001. There was also a nonsignificant trend at the right central region, t(22) = 2.49, p = .021. Thus, 3-year-olds exhibited an early positivity that was maximal at the right frontal region and reversed polarity in the left temporal region in response to a violation of key membership.
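The Bonferroni threshold of .008 corresponds to dividing a significance level of .05 by the six tests (2 hemispheres × 3 regions); the .05 starting level is an assumption, as it is not stated explicitly. The sketch below applies that threshold to the reported p values and shows how a one-sample t statistic against zero is formed. The function name and toy data are illustrative only.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.0):
    """Return (t, df) for a one-sample t-test of the mean against mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / math.sqrt(n))
    return t, n - 1


N_TESTS = 2 * 3                # hemispheres x regions
alpha_adj = 0.05 / N_TESTS     # Bonferroni-adjusted significance level

# p values as reported; 0.001 stands in as an upper bound for "p < .001".
reported_p = {
    "right frontal": 0.002,
    "left temporal": 0.001,
    "right central": 0.021,
}
significant = {region: p < alpha_adj for region, p in reported_p.items()}
print(round(alpha_adj, 3), significant)

# Toy amplitudes (hypothetical values) to show the statistic itself.
t, df = one_sample_t([1.0, 2.0, 3.0])
print(round(t, 3), df)
```

Only the right frontal and left temporal effects fall below the adjusted level, which matches the pattern described in the text.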
The late component was measured between 360 and 560 ms. Again, the only significant effect was a hemisphere by region interaction, F(2, 44) = 5.36, p = .008. Follow-up t-tests revealed that the amplitude of the difference wave was not significantly different from zero at any region when Bonferroni corrections were applied. However, there were trends toward significance in the right frontal region, t(22) = 2.78, p = .011, and the left temporal region, t(22) = 2.84, p = .010. Thus, there was a hint of a late positivity in the right frontal region that reversed polarity in the left temporal region when a chord went outside the key of the preceding sequence, but any such effects were weak.
In the unexpected harmony condition, the early component was measured as the average amplitude of the difference wave between 160 and 360 ms. We first conducted an ANOVA with hemisphere (left, right) and region (frontal, occipital) as within-subjects factors. Although there were no significant differences between the standard and unexpected harmony conditions with regard to the number of exact pitches or pitch classes shared between the target and prime chords, inspection of Table 1 suggested that these numbers may have been more equivalent in Sequence 4 than in Sequences 1-3. As such, we also included sequence (1, 2, 3, or 4) as a between-subjects variable in this analysis to examine whether responses differed according to sequence. There were no significant main effects or interactions in the omnibus ANOVA, all ps > .198. A follow-up t-test comparing the average amplitude of the difference wave across hemispheres and frontal and occipital regions to zero was significant, t(21) = 3.17, p = .005, suggesting that 3-year-olds showed an early frontal bilateral positivity to violations of harmony. Because no late component was identified in the unexpected harmony condition in preliminary t-test analyses, no further analyses were conducted for this time window.
The present study replicates and extends our previous results with 4.5-year-olds (Corrigall & Trainor, 2014): 3.5-year-olds showed early positivities in frontal regions that peaked at approximately 220 ms after the onset of the last chord in the unexpected key condition, and at 260 ms in the unexpected harmony condition. They also showed hints of a late positivity that peaked at approximately 460 ms in the unexpected key condition, but this effect did not reach significance when we corrected for multiple comparisons. The early responses are highly similar to those of the 4.5-year-olds in our previous study, but slightly later in latency (approximately 20 ms). Previous research suggests that before the emergence of mature negative responses such as the mismatch negativity (MMN), young infants often show an early positive response to the same stimuli (e.g., He et al., 2009; Tew et al., 2009; Trainor et al., 2003). Our results suggest that a similar developmental pattern occurs in response to violations of musical syntax: an immature positive response that precedes the development of the adult-like ERAN.
The presence of an immature positive response in 3.5-year-olds suggests that they have some implicit knowledge of key membership and harmony. The stimuli were constructed such that responses cannot be explained easily by sensory priming, in which different responses are elicited to notes and chords that occur frequently compared to infrequently in the prime sequence (the chords that precede the target chord; see Bigand et al., 2014, 2003). This is because very similar results were found in the unexpected key condition, in which sensory priming could have occurred, and the unexpected harmony condition, in which sensory priming was unlikely. Furthermore, the chord stimuli were transposed to all 12 keys, and standard and deviant chord sequences were each presented 50% of the time, further limiting any influence of sensory priming. Lastly, the size of the early positivity did not differ according to the particular sequence children heard, even though priming of the tonic versus subdominant target chord was more equivalent in sequence 4 compared to sequences 1-3. As such, 3.5-year-olds’ responses most likely reflect their implicit knowledge of patterns in Western music, such as the fact that the tonic chord often follows the dominant chord at the end of a musical phrase. Nevertheless, there was a nonsignificant trend for more repeated pitches and pitch classes of tonic (standard) chords compared to subdominant (unexpected harmony) chords in the prime sequences, which may have led to a higher expectation for tonic compared to subdominant chords (i.e., learning) over the course of the experiment. To ensure that results cannot be explained by sensory priming, future research should use standard and deviant sequences that are even more closely matched with regard to priming of exact pitches and pitch classes. 
Furthermore, it may be important to present the standard and deviant versions of several different sequences to each participant rather than using the same sequence repeatedly. Although learning to expect one type of chord over another is unlikely to occur with 50% presentation of each sequence type (because each type of ending is equally likely), slight differences in the number of repeated pitches or pitch classes for a given sequence could lead to the formation of target chord expectations over hundreds of trials. As such, varying the sequences for a given participant could potentially reduce this possibility.
ERP responses appear to be particularly sensitive measures of Western tonality perception because children do not show behavioral evidence of musical syntax understanding, at least with the stimulus set used in the current study, until 5 years of age (Corrigall & Trainor, 2014). The immaturity of the ERP response at 3.5 years of age (current study) and 4.5 years of age (Corrigall & Trainor, 2014), however, provides converging evidence that knowledge of key membership and harmony develops gradually throughout childhood. Behaviorally, infants typically fail to show sensitivity to either key membership or harmony (Schellenberg & Trehub, 1999; Trainor & Trehub, 1992; Trehub, Cohen, Thorpe, & Morrongiello, 1986), 4-year-olds exhibit understanding of both in the context of a familiar song (Corrigall & Trainor, 2010), 5-year-olds show knowledge of key membership in the context of unfamiliar Western melodies and chord sequences (Corrigall & Trainor, 2014; Trainor & Trehub, 1994), and children who are 6 years old and older show evidence of harmony perception in unfamiliar Western melodies and chord sequences (Costa-Giomi, 2003; Cuddy & Badertscher, 1987; Krumhansl & Keil, 1982; Lamont & Cross, 1994; Schellenberg et al., 2005; Speer & Meeks, 1985; Trainor & Trehub, 1994). Electrophysiologically, the developmental pattern is similar: infants fail to show an ERAN response (see Koelsch, 2009), 2.5- to 4.5-year-olds show either a small ERAN with a very late latency (Jentschke et al., 2014) or a positive response with a somewhat longer latency than the ERAN that is observed in adults (Corrigall & Trainor, 2014; present data), 5-year-olds show a clear ERAN with a longer latency than has been observed in adults (Jentschke et al., 2008; Koelsch et al., 2003), and 11-year-olds show an adult-like ERAN (Jentschke & Koelsch, 2009; Koelsch, 2009).
As such, the results of the present study imply that implicit knowledge of Western tonality, as measured through EEG, emerges earlier in development than explicit knowledge that is measured behaviorally (e.g., by asking children to make judgments about whether chord sequences sound good or bad).
Musical syntax knowledge is likely acquired implicitly and driven largely by experience through domain-general learning mechanisms such as implicit or statistical learning (see Aslin & Newport, 2012; Rohrmeier & Rebuschat, 2012; Romberg & Saffran, 2010, for reviews). These learning mechanisms allow listeners to internalize the structural features of complex input by tracking statistical regularities during passive listening. For example, Saffran, Aslin, and Newport (1996) showed that after 2 minutes of exposure to a continuous stream of language syllables, 8-month-old infants recognized groups of syllables that had frequently occurred successively (e.g., the sound “go” always followed by the sound “la”). Subsequent research revealed that infants could also track statistical regularities in other kinds of input, including musical tones (e.g., Saffran, Johnson, Aslin, & Newport, 1999) and visual object sequences (e.g., Kirkham, Slemmer, & Johnson, 2002), and that they could extract more abstract patterns (algebra-like rules) from similar input (e.g., Marcus, Vijayan, Rao, & Vishton, 1999). For particularly complex information systems, such as language and music, that include multiple structural aspects (e.g., phonetic and syntactic in language; tonal and metrical in music), implicit learning likely requires years of exposure. Findings from connectionist modeling suggest that these statistical probabilities are learned incrementally over time with accumulating training or exposure to Western music, with earlier emergence of sensitivity to key/scale membership and later emergence of sensitivity to harmony (Matsunaga, Hartono, & Abe, 2015).
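One common way to quantify the kind of regularity involved in the syllable example is the transitional probability of one element given the preceding one. The sketch below is purely illustrative: the syllable stream is invented, the function name is hypothetical, and it is not a model of how infants or the cited connectionist networks actually learn.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Estimate P(next element | current element) from adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


# Invented stream in which "go" is always followed by "la",
# whereas the syllable that follows "la" varies.
stream = ["go", "la", "bu", "go", "la", "ti", "bu", "go", "la", "bu"]
tp = transitional_probabilities(stream)

print(tp[("go", "la")])   # "la" always follows "go"
print(tp[("la", "bu")])   # "bu" follows "la" on only some occasions
```

In this toy stream the transition from "go" to "la" is perfectly predictable, whereas the transitions out of "la" are not, which is the kind of contrast a statistical learner could exploit to group elements. The same computation applies to tone or chord sequences.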
Furthermore, although previous studies suggest that there is a genetic component to basic pitch discrimination (e.g., Drayna, Manichaikul, de Lange, Snieder, & Spector, 2001; Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014; Seesjärvi et al., 2016), which relies on working memory skills, a recent twin study estimated the influence of shared environment effects on tonality perception at 59%, whereas genetic effects were negligible (Seesjärvi et al., 2016). These findings provide further support for the role of experience in acquiring sensitivity to Western pitch structure.
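To make the notion of tracking statistical regularities concrete, the following is a minimal sketch of how transitional probabilities between adjacent syllables can be estimated from a continuous stream. It is an illustration only, not the procedure or stimuli of any study cited here: the three-syllable items in `WORDS`, the helper names `make_stream` and `transitional_probabilities`, and the stream length are all assumptions chosen for the example.

```python
import random
from collections import Counter

# Hypothetical three-syllable "words", invented for illustration only.
WORDS = [("go", "la", "bu"), ("tu", "pi", "ro"), ("bi", "da", "ku")]


def make_stream(n_words, seed=0):
    """Concatenate randomly ordered words into one continuous syllable stream.

    The same word is never repeated twice in a row, so each word-final
    syllable can be followed by more than one word-initial syllable.
    """
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) for each adjacent pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # the last syllable has no successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


stream = make_stream(300)
tp = transitional_probabilities(stream)

# Within a word, the next syllable is fully predictable.
print(tp[("go", "la")])  # 1.0, because "la" always follows "go"

# Across a word boundary, the next syllable is less predictable.
print(tp.get(("bu", "tu"), 0.0))
```

In this toy stream, syllable pairs inside a word have a transitional probability of 1.0, whereas pairs that span a word boundary have lower values. A learner sensitive to that contrast could, in principle, locate word boundaries without any explicit cue. The same logic could be applied to sequences of tones or chords.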
Studies on formal music exposure also support the role of experience in the development of key and harmony perception. For example, Gerry, Unrau, and Trainor (2012) found that participation in six months of baby music classes led 12-month-olds to show greater interest in music that followed Western tonal rules than in atonal music. In another study, Corrigall and Trainor (2014) found that 4- and 5-year-old girls, who were more likely to have participated in early music and dance classes, tended to perform better than boys on music but not memory tasks. These results imply that the gender differences in the music task were driven by experience rather than general cognitive abilities. Furthermore, Corrigall and Trainor (2009) followed 3- to 6-year-olds for 8 to 12 months, some of whom were beginning music lessons and some of whom were not participating in any formal music training, and found more rapid improvement in harmony understanding over this period in the music training group. Several studies have also found that the ERAN is larger in musically trained children and adults compared to individuals without any music background (Jentschke & Koelsch, 2009; Koelsch, Jentschke, Sammler, & Mietchen, 2007; Koelsch et al., 2002; see Koelsch, 2009, for a review). Taken together, improvements in sensitivity to key membership and harmony with age, as well as with formal training, suggest a strong role for music experience. However, future research should examine the association between individual differences in informal music listening experience and children's behavioral and electrophysiological responses to violations of Western pitch structure.
It remains unclear why the results of the present study, as well as our previous study with 4.5-year-olds (Corrigall & Trainor, 2014), differ from those of Jentschke et al. (2014), who found a small ERAN in 2.5-year-olds. One possibility is that our stimuli may have created weaker tonal expectations than those of Jentschke et al. because of our effort to reduce the influence of sensory priming. Weaker tonal expectations could lead to more immature electrophysiological responses, especially in young children who are only beginning to show these ERP components. The discrepancy could also stem from different re-referencing procedures (linked left and right mastoids in Jentschke et al.; common average in the present study), although this does not explain our previous findings of typical ERAN and N5 in adults (Corrigall & Trainor, 2014). Another possibility is that our use of an artifact-blocking algorithm (Fujioka et al., 2011) rather than a conventional trial rejection method (as in Jentschke et al., 2014) allowed us to include more data per child and exclude fewer children from the final analyses. As such, our results may be more representative of a typical child. Future research should examine 2.5- to 4.5-year-olds’ ERP responses to sequences that create stronger tonal expectations using an artifact-blocking algorithm to examine whether more mature responses can be elicited. It is likely that the developmental trajectory of the ERAN is not static, but rather dependent on aspects of the child (e.g., how much musical experience a child has accumulated), as well as aspects of the stimuli (e.g., how strongly the sequences create particular tonal expectations).
Our results suggest that implicit knowledge of complex musical structure is already evident in children as young as 3.5 years of age. This knowledge is likely acquired through daily, informal musical experience, such as listening to the radio or being sung to, suggesting that the development of musical knowledge is an integral part of childhood. Future research should examine the age at which an ERAN-like response emerges in order to identify the earliest point at which implicit knowledge of Western tonality can be detected. Research on the acquisition of key membership and harmony perception can help explain how infants and children naturally attend to and internalize statistical regularities in complex perceptual input.