Learning to read and play music written in standard notation, termed sight reading, is an important yet difficult aspect of early music education. However, the music contingency learning procedure produces rapid and robust early learning of the motor execution associated with note positions. In this task, nonmusicians identify a note name (e.g., “do”) written inside a note in one of the vertical positions of the musical staff with a keyboard response. Each note position is presented frequently with the matching (congruent) note name and rarely with the incongruent note names. The present work further explores this novel learning paradigm. In Experiment 1, we manipulated the proportion of congruent trials from 50 to 100%. The contingency effect, along with contingency awareness (i.e., verbalizable knowledge of note meanings), increased with a stronger contingency manipulation. In Experiment 2, half of the participants responded to the note positions (instead of the note names) with a keyboard response. A learning effect was also observed for this task, though contingency awareness was reduced in this group. These results shed more light on the properties of incidental music learning and further suggest more ideal parameters for future practical applications to supplement traditional instruction in real-world music education.
Introduction
There are many ways of representing music in a written form, some of which are easy but very instrument-specific (e.g., tablature for guitar or bass, or piano roll), but the more universal method (i.e., useable with all instruments) is standard notation. An example of standard notation is presented in the top panel of Figure 1. Standard notation represents various types of information on the staff (i.e., set of five lines), such as the key, tempo, and temporal durations of notes and rests, but for the present report we will focus on pitch. The vertical location of the note on the music staff, which we will refer to as the note position, indicates which note must be played or sung (i.e., do, ré, mi, etc.; or C, D, E, etc. in North American notation). The bottom panel of Figure 1 illustrates the note name “translations” of seven notes on the treble staff (in fixed-do solfège).
One important skill to acquire when learning music is sight reading, that is, the ability to “read” music in standard notation and rapidly produce the note on your instrument without prior practice (Lehmann & Kopiez, 2008; Sloboda, 2004; Wolf, 1976).1 A sufficiently skilled musician may therefore be able to look at a new piece of music written in standard notation and play the song while “reading” it for the first time. Sight-reading skill is important and is, indeed, one of the key competencies tested as part of an entrance exam to music conservatories. It is not easy, however, to learn how to sight read (Anderson, 1981; Hahn, 1985; Hubicki & Miles, 1991; Stewart et al., 2004). At least with standard musical training, it will take a novice musician several semesters of music training to automatize their music reading abilities to a sufficient degree to sight read (e.g., without having to laboriously search memory for the “translation” of each note and the corresponding action to execute). Musicians need sufficient practice seeing each note and producing it, similar to learning how to read a language or to automatize any other sort of skill (e.g., encoding chess positions on a chessboard; Saariluoma, 1994). Perhaps because standard music training does not typically involve such intense training, sight-reading abilities take a long time to develop.
Incidental Learning and Automaticity
In a recent report, however, we asked whether learning to sight read must take so long (Iorio et al., 2023). Often, repetition is all that is needed to automatize a skill. This automatization follows a practice curve that is so universal that it is often referred to as a “law” of human behaviour, the power law of practice (Logan, 1988; Newell & Rosenbloom, 1981; Snoddy, 1926; though the learning curve may actually be closer to an exponential function; see Heathcote et al., 2000; Myung et al., 2000). Briefly, performance at a novel task is (unsurprisingly) initially very poor/slow, improves rapidly during early practice, and then continues to improve at ever-diminishing rates toward a performance asymptote. This is illustrated in Figure 2. The formula for a power function is $k + ax^{b}$, where $k$ is the asymptote (i.e., the response time that performance approaches with infinite practice), $a$ is the difference between $k$ and initial performance (i.e., how much performance improves from Trial 1 to asymptotic performance), $b$ is a learning rate (i.e., which determines how steep the power function is), and $x$ is the current trial counter.
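As a minimal sketch of the power function just described, the following Python snippet evaluates the predicted response time at a few trial numbers. The parameter values are hypothetical and chosen only for illustration. Note that, with the formula written in this form, the curve only descends toward the asymptote when the exponent is negative, so a negative learning rate is assumed here.

```python
def power_law_rt(trial, k, a, b):
    """Predicted response time k + a * trial**b for a trial counter >= 1."""
    if trial < 1:
        raise ValueError("the trial counter starts at 1")
    return k + a * trial ** b


# Hypothetical values for illustration only (not estimates from any data).
K_ASYMPTOTE_MS = 500.0  # k: response time approached with infinite practice
A_GAIN_MS = 700.0       # a: improvement from Trial 1 to the asymptote
B_RATE = -0.5           # b: learning rate (assumed negative, see lead-in)

TRIALS = (1, 4, 16, 64)
curve = [power_law_rt(x, K_ASYMPTOTE_MS, A_GAIN_MS, B_RATE) for x in TRIALS]

# Improvement between successive sampled trials shrinks with practice.
gains = [earlier - later for earlier, later in zip(curve, curve[1:])]
```

Under these assumed values, performance on Trial 1 equals the asymptote plus the full gain, and each later step in `gains` is smaller than the one before it, which is the decelerating pattern described above.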
A wealth of literature on human contingency learning indicates that we are more than capable of learning simple stimulus-action contingencies extremely rapidly and with little effort. This is observed, for instance, in research on incidental learning. In an incidental learning task, a regularity is present but participants are not given the explicit goal to learn this regularity. For instance, in the colour-word contingency learning procedure (Schmidt et al., 2007) participants identify the print colour of neutral words (e.g., “move” in blue). Each word is presented most often in one colour (e.g., “move” very frequently in blue, but rarely in other colours). Thus, a nontarget stimulus (word) is informative of the likely target stimulus (colour), but participants are asked to simply identify the colour and are (typically) not even informed about the regularities present in the task. Any learning of the regularities between the words and colours is therefore incidental to the target-identification goal of the task. Participants do learn these regularities, as indicated by robustly faster and more accurate responses to the frequent, or high contingency, pairings (e.g., “move” in blue) relative to the infrequent, or low contingency, pairings (e.g., “move” in red; for reviews, see MacLeod, 2019; Schmidt, 2021a, 2021b). The term implicit learning is also used in this context, though implicit learning is generally considered to be both incidental and unconscious.
Robust learning is observed in a matter of minutes or even seconds in a range of incidental learning procedures, including artificial language learning (Saffran, Aslin, et al., 1996; Saffran, Newport, et al., 1996), hidden covariation detection (Lewicki, 1985, 1986; Lewicki et al., 1992), sequence learning (Nissen & Bullemer, 1987), the Hebb digits task (McKelvie, 1987), and the colour-word contingency learning task (Lin & MacLeod, 2018; Schmidt et al., 2010, 2020; Schmidt & De Houwer, 2016b). Importantly, although some learning of the associations between stimuli might be possible (Geukes et al., 2019), the response time effects primarily reflect the learning between the predictive nontarget stimulus (e.g., the word in the above example) and the response to make, as has now been established in a variety of learning tasks (Miller, 1987; Schmidt et al., 2007; Schmidt & De Houwer, 2012b, 2016a). That is, participants learn to anticipate what action to execute (i.e., response to make) on the basis of the predictive but nontarget stimulus.
Training of Sight Reading
The reason why acquisition of sight-reading abilities is typically so slow is probably not due to any sort of inherent difficulty with the materials, but rather to suboptimal training of this particular skill. Indeed, training is typically explicit (Hébert & Cuddy, 2006) and extensive practice of the “translation” process is not usual. Rather, music practice often involves blocked repetition of music scores (Barry, 1992, 2007; Maynard, 2006; Rohwer & Polk, 2006). In other words, the musician learns an excerpt, then repeats it until mastery. This implies considerable procedural repetition, but limited sight-reading practice. Although blocked repetition is often encouraged and gives a subjective sense of improvement (Kornell & Bjork, 2008), it is less effective than interleaved practice, where musicians alternate between short practice sessions of several musical scores (Carter & Grahn, 2016; Stambaugh, 2011; Stambaugh & Demorest, 2010). Randomized focal practice of sight reading on its own is not the norm. Indeed, a survey of music instructors (Hardy, 1998) indicates that the majority of instructors do not even attempt to train sight reading, either because they do not know how or because they do not have good sight-reading skills themselves. A typical music semester might be more focused on mastering a small number of music pieces, which does not expose students to the large amount of new and unfamiliar materials that would be needed to automatize sight reading.
Some research has been conducted to explore potential ways to improve sight reading. Results are somewhat inconsistent, with improvements often small and nonsignificant, depending on the learning approach (for a meta-analysis, see Mishra, 2014). Training normally takes place over several weeks at minimum to an entire academic year and again tends to focus on some type of blocked repetition. For example, Pike and Carter (2010) had participants practice only two eight-measure excerpts per testing day during the three-week training period. The improvements observed in their control and treatment groups were not remarkable.
Of course, traditional music instruction trains very important skills (including skills related to sight reading), and the present work does not aim to propose an alternative to traditional music instruction. Indeed, many such skills that are learned via traditional instruction are unlikely to benefit from the type of incidental learning explored in the current work (e.g., theory, technique, and expressivity). Rather, our goal is to explore to what extent learning about musical materials obeys similar types of principles (most importantly: rapid learning) as other types of learning with nonmusical materials.
As hinted at above, the standard narrative on sight reading seems to be that it is inherently laborious, quite unlike what has been observed with nonmusical materials. Indeed, this narrative cannot be discarded as necessarily false. Learning is often impaired if the regularity to learn is too complex. For instance, in Jiménez and colleagues (1996), participants performed a sequence learning task in which they responded to the location of a stimulus. The sequence of stimuli followed a semi-regular pattern (i.e., where the current stimulus location could be strongly predicted by previous stimulus locations). Learning immediate stimulus sequences (i.e., what stimulus should or should not follow the immediately preceding stimulus) was good, as was learning three-trial sequences (i.e., given the last two stimuli, which stimulus is likely on the current trial), but learning of four-trial sequences was absent. Similarly, the reproducibility of certain effects with hidden covariation tasks has been actively debated (Hendrickx, De Houwer, et al., 1997a, 1997b; Hendrickx, Eelen, et al., 1997; Hendrickx & De Houwer, 1997; Lewicki et al., 1997). In such tasks, the regularity is “hidden” in a rather complex stimulus display, such as pictures of women presented with text descriptions that focus on the concept of “kindness” or “capability” depending on the hair length of the pictured woman. Thus, it could similarly be proposed that standard notation involves too much complexity to be easily learned.
It is our hypothesis, however, that incidental learning could help. Although it is not the immediate goal of the current work to propose a direct application to real-world music instruction (e.g., much more research is needed; see the General Discussion), a longer-term hope is that our alternative approach might serve as a useful supplement to traditional instruction, aiding with the reinforcement of a particular skill. One aspect of sight reading, the rapid identification of note positions and execution of the corresponding response (the focus of the current work), is one such skill. In our prior report (Iorio et al., 2023), then, we proposed that important aspects of learning to sight read could occur just as quickly as in the various nonmusical learning paradigms mentioned above by using the right type of training procedure. Ultimately, the key skill to automatize is seeing a note (eventually: groups of notes) and translating it (Waters et al., 1997; Wolf, 1976), then retrieving the corresponding action and executing it (eventually: for the correct duration). We note that sight reading is a complex activity involving a combination of different abilities such as note identification/reading, rhythm, dynamics, and activation of the correct motor response. However, we are interested in assessing a paradigm which leads to fast learning and automatization of only some of these abilities (i.e., note reading and action execution). Thus, we do not test all aspects of sight reading in the current work (e.g., note timing or dynamics), but the common starting point of music reading and sight playing (see Footnote 1) is the recognition of musical notation (Schön et al., 2001).
Sight reading/playing further involves the execution of the actions indicated by this notation. Our work focuses on the learning of the recognition of the pitch names of note positions and the corresponding actions that need to be executed (e.g., what key to press on the piano). Our key notion is that seeing a series of random notes and responding to them as quickly as possible will allow for rapid automatization of this knowledge. The randomness is important to the extent that the participant needs to actively practice reading on every trial. In contrast, a musical extract is only useful once for the purpose of training sight reading: namely, the first time that the student sees it. Indeed, this is one of the difficulties identified by music instructors with teaching sight reading: one would need an enormous amount of novel scores to effectively teach sight reading (Hardy, 1998). Computerized tasks, of course, can generate an infinite series of novel materials.
Incidental Musical Learning Task
To create our learning task, we adapted the musical Stroop procedure (Grégoire et al., 2013, 2014a, 2014b, 2015, 2019; Grégoire & Poulin-Charronnat, 2019; for other variants, see Brodsky & Kessler, 2017; Crump et al., 2012; Drost et al., 2005a, 2005b; Stewart, 2005; Stewart et al., 2003; Zakay & Glicksohn, 1985). In this procedure, participants are presented with a note on the musical staff with a note name written inside of it. The task of the participant is to ignore the note position and identify/read the note name. As illustrated in Figure 3, the note name and position can be either congruent, with a matching note name and position (e.g., “fa” written inside the note for “fa”), or incongruent, with a mismatching note position and note name (e.g., “fa” written inside of the note for “ré”). Initial work with the musical Stroop task studied the automatic nature of note reading in experienced musicians, who are slower on incongruent relative to congruent trials (MacLeod, 1991; Stroop, 1935). In other words, the musical Stroop task is a useful tool for measuring the acquired bias to rapidly execute the appropriate response to the corresponding note stimulus presented on the music staff. Although the present work focuses on initial learning of sight-reading skill rather than the testing of pre-acquired skills, it is important to indicate that this sort of automatic influence of note positions on note name identification emerges gradually after years of music instruction.
Our adaptation (Iorio et al., 2023) was to create a musical learning task for training nonmusicians (who, of course, do not show a musical Stroop effect; e.g., Grégoire et al., 2013) to automatize sight reading.2 In our procedure, we introduced a contingency manipulation, illustrated in Table 1, where the congruent pairings were much more frequent than the incongruent pairings. For example, the “fa” note position was presented much more frequently with “fa” written inside of it than “do”, “ré”, “mi”, etc. To clarify, this means that congruent trials are also high contingency and incongruent trials are also low contingency, and both sets of terms will be used in the present report.3 In our learning task, the note positions were task-irrelevant (nontarget stimuli) but informative about the likely response. And the nonmusicians in our sample robustly learned these regularities, as indicated by a musical learning effect (i.e., low – high contingency response times), despite a relatively short training period (about 15 min). Indeed, learning was measured during the learning phase and was robust within this learning phase. As previously mentioned, this sort of learning procedure produces learning of the correspondences between the nontarget stimuli and the response to execute (e.g., key to press), and thus the response time effects reflect a learning of what actions to produce on the instrument for which note positions. The same learning effect was observed both in participants given the explicit instruction to try to learn the regularities while performing the task and in participants who were not informed about the regularities at all, comparable to findings with nonmusical materials (Schmidt & De Houwer, 2012a). In a subsequent phase, participants were asked to identify the note positions (rather than the note names, no longer presented) and were able to do this well above chance guessing. 
This indicates learning of the correspondences between note positions and note names (i.e., note reading). Thus, we were able to show that learning of some important aspects of early sight reading, namely, note position identification (Hubicki & Miles, 1991) and execution of a keyboard response (Emond & Comeau, 2013), can occur substantially faster than typically assumed possible with more traditional learning techniques.
Table 1 (rows: note name; columns: note position)
Note Name | do | ré | mi | fa | sol | la | si |
do | 18 | 1 | 1 | 1 | 1 | 1 | 1 |
ré | 1 | 18 | 1 | 1 | 1 | 1 | 1 |
mi | 1 | 1 | 18 | 1 | 1 | 1 | 1 |
fa | 1 | 1 | 1 | 18 | 1 | 1 | 1 |
sol | 1 | 1 | 1 | 1 | 18 | 1 | 1 |
la | 1 | 1 | 1 | 1 | 1 | 18 | 1 |
si | 1 | 1 | 1 | 1 | 1 | 1 | 18 |
Note: Numbers indicate the number of times that each note name is presented in each note position. Congruent pairings are presented much more often than incongruent pairings.
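The frequency matrix above can be turned into a trial list and checked against the stated proportion of congruent trials. The following is only an illustrative sketch: the function names, the single pass through the table, and the random seed are assumptions, not a description of the experiment software.

```python
import random

NOTE_NAMES = ["do", "ré", "mi", "fa", "sol", "la", "si"]


def build_trials(congruent_reps, incongruent_reps, names=NOTE_NAMES):
    """Return (note_position, note_name) pairs for one pass through the table."""
    trials = []
    for position in names:
        for name in names:
            reps = congruent_reps if name == position else incongruent_reps
            trials.extend([(position, name)] * reps)
    return trials


def congruent_proportion(trials):
    """Proportion of trials on which the note name matches the note position."""
    congruent = sum(1 for position, name in trials if position == name)
    return congruent / len(trials)


# Cell counts taken from the table: 18 per congruent pairing, 1 per incongruent.
table1_trials = build_trials(congruent_reps=18, incongruent_reps=1)
random.Random(0).shuffle(table1_trials)  # arbitrary seed, for reproducibility only
proportion = congruent_proportion(table1_trials)
```

With 18 congruent presentations and 6 incongruent presentations per note position, `proportion` works out to 18/24, that is, a 75% contingency.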
The Current Work
Long term, our goal is to develop a practical application of contingency learning research for novice and advanced musicians alike, such as a free-to-use learning app to supplement traditional music instruction. However, this is not the goal of the present research. Unlike the bulk of existing research on training sight-reading abilities that investigates the impact of extended training regimes on longer-term improvements in music students, the present work is more analogous to the approach of Reber (1967) in studying the incidental learning of grammatical rules in language (see the General Discussion for an extended discussion of this point). That is, we used simplified materials in naïve participants to determine what is learnable and what factors influence learning. Indeed, before imagining an eventual application, we need to determine the optimal conditions for learning. That is, while our initial report demonstrated that rapid learning of sight-reading abilities is possible, further improvement is likely possible. There are, of course, an infinite number of permutations of design factors that may help or hinder learning (see the General Discussion for other directions), but the present report focuses on two potentially important ones: contingency strength and task relevance.
In our previous study, we used a 75% contingency manipulation (see Table 1), meaning that each note position was presented 75% of the time with the congruent (or high contingency) note name and 25% of the time distributed across the remaining incongruent (or low contingency) note names. This was done for practical reasons. In particular, in order to measure the learning that did occur, high and low contingency trials could be contrasted. For instance, we would anticipate faster and/or more accurate responses for trials that obeyed the regularity (high contingency) relative to trials that violated the regularity (low contingency). Learning, of course, might be even stronger with a stronger contingency, as has been observed in some past work on nonmusical contingency learning (Forrin & MacLeod, 2018; Miller, 1987). This is not certain, however. For instance, nonperfect contingencies seem beneficial in (nonmusical) classroom settings (Hulac et al., 2016), and random reinforcement schedules produce more extinction-resistant operant conditioning (Ferster & Skinner, 1957). However, we do anticipate that learning will continue to strengthen as the regularity becomes stronger. In Experiment 1 of the current report, contingencies were manipulated across groups from 50% to 100% congruent in increments of 10%.
More theoretically, it is also important to investigate to what extent incidental learning phenomena (musical or otherwise) are influenced by contingency strength. A further interesting and currently poorly studied question is the exact form of the relationship between contingency strength and the size of the learning effect. In the study of Forrin and MacLeod (2018) using the colour-word contingency learning task, the authors compared two models: (1) a linear model, in which the contingency effect increases linearly in magnitude with contingency proportion, and (2) a quadratic model. Their results did not clearly distinguish between these two models. As we will expand on in further detail in the General Discussion, however, a power function (or perhaps exponential function) seems more likely on the basis of models of automaticity.
In our previous work, the note name was the task-relevant (target) dimension, and the note position was the task-irrelevant (nontarget) dimension. This was done so that we could measure the automatic influences of the note positions on note name identification, similar in logic to the musical Stroop task. The note positions therefore served as an informative cue of the likely target stimulus in our prior studies. However, there are many relevant things to learn about music notation, some of which are illustrated in Figure 4. Arguably, the most important is to learn how to see the note positions on the staff and play them on the instrument. It may therefore make more sense to practice responding to the note positions directly, that is, as the task-relevant dimension.
Still, learning the note names for note positions is also important. For example, the action to perform for a given note position is specific to each instrument (e.g., the actions required to produce a given note on a piano or on a guitar are very different). The note name “meanings” of the note positions are universal and generalizable across instruments (i.e., the abstract “language” of music). It is possible that learning the name-position pairings will be better with note positions as the predictive (task-irrelevant) stimulus rather than as the target (task-relevant) stimulus. In Experiment 2, we contrast these two types of learning. We also explore the degree to which performance improves with practice in accordance with the above-discussed power law of practice.
In both studies, we are especially interested in measuring the automatic influences of acquired knowledge on performance. In particular, if participants have learned the correspondences between note positions and the actions to perform for the note names, then response times should be faster to high contingency trials than to low contingency trials. For example, seeing the note position for “mi” should automatically bias a “mi” key response, facilitating performance if the correct response is, in fact, “mi” (congruent) and slowing responding if the correct response is not “mi” (incongruent). This contingency effect in response times (perhaps also in errors) may therefore increase with a stronger contingency (Experiment 1) or with the use of note-position targets (Experiment 2) if these manipulations enhance learning. We also took awareness measures in order to see both (a) whether the above-mentioned manipulations increase awareness of the contingency manipulation, and (b) whether participants are able to explicitly identify the note positions.
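The response time contingency effect described above (low contingency minus high contingency) can be sketched as follows. The trial records and their field names are hypothetical, the response times are made up for illustration, and restricting the means to correct responses is an assumption rather than a description of the analysis reported here.

```python
from statistics import mean


def contingency_effect(trials):
    """Mean low contingency RT minus mean high contingency RT, in ms.

    `trials` is a list of dicts with the hypothetical keys
    'position', 'name', 'rt_ms', and 'correct'.
    """
    # High contingency = congruent pairing; low contingency = incongruent pairing.
    high = [t["rt_ms"] for t in trials if t["correct"] and t["position"] == t["name"]]
    low = [t["rt_ms"] for t in trials if t["correct"] and t["position"] != t["name"]]
    if not high or not low:
        raise ValueError("both high and low contingency trials are required")
    return mean(low) - mean(high)


# Made-up trials for illustration only.
example_trials = [
    {"position": "mi", "name": "mi", "rt_ms": 600, "correct": True},
    {"position": "fa", "name": "fa", "rt_ms": 620, "correct": True},
    {"position": "mi", "name": "do", "rt_ms": 700, "correct": True},
    {"position": "fa", "name": "si", "rt_ms": 680, "correct": True},
    {"position": "la", "name": "ré", "rt_ms": 900, "correct": False},  # excluded
]
effect_ms = contingency_effect(example_trials)
```

A positive value indicates faster responding on high contingency trials. A trial list that contains no low contingency trials raises an error, because no contrast can be computed in that case.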
Experiment 1
In order to test the potential influence of contingency strength on learning, we created six groups of participants in Experiment 1 that experienced 50, 60, 70, 80, 90, or 100% contingency proportions. The hypotheses for the present work were the following. First, we anticipated that the contingency effect would increase in response times and errors with increasing contingency proportions. We anticipate relatively small contingency effects for weaker contingencies (e.g., 50%) and increasingly larger contingency effects for stronger manipulations. Indeed, by definition there must be some point at which a regularity is too weak to learn (e.g., no regularity at all), but whether the magnitude of the learning effect continues to increase all the way up to a perfect contingency is not yet clear. During the learning phase, of course, a contingency effect cannot be measured for the 100% contingency group (i.e., as there are no low contingency trials to contrast with high contingency trials). However, awareness can be assessed for this group.
Second, we anticipated that measures of contingency awareness would similarly reveal increases in awareness and the ability to explicitly identify note positions with a stronger manipulation. In particular, we measured subjective and objective awareness after the learning phase (Cheesman & Merikle, 1986). Subjective awareness refers to verbalizable knowledge of the presence of the contingency manipulation. We therefore described the contingency manipulation and asked participants whether they noticed that each note position was presented more frequently with a particular note name (see Methods for more details). Objective awareness refers to above-chance guessing of the regularities that were present. For each note position, the participant needed to indicate (guessing if necessary) what the corresponding note name was and to press the corresponding key. Awareness would be indicated by above-chance guessing of the position-name correspondences. The objective awareness test is also another measure of learning, only a more explicit one. Concretely, while the response time and error effects investigated in the learning phase may, wholly or in part, result from implicit/unconscious knowledge of the correct keyboard actions to execute for each of the note positions, objective awareness measures assess the ability of participants to explicitly/consciously identify the names of the note positions. Both subjective and objective awareness are anticipated to increase with a stronger contingency manipulation. Third, both awareness measures are hypothesized to correlate positively with the observed contingency effect in the response time and/or error performance measures (i.e., participants that were more aware of the contingency will produce a larger learning effect).
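To make the notion of above-chance guessing concrete, the sketch below computes an exact one-sided binomial probability against the chance level of 1/7 that follows from having seven note names. This is a per-participant illustration only: the number of test trials and the score are hypothetical, and this is not presented as the analysis used in this report.

```python
from math import comb


def binomial_tail(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


CHANCE = 1 / 7  # seven possible note names, so guessing succeeds 1 time in 7

# Hypothetical participant: 4 correct identifications out of 7 test trials.
N_TEST_TRIALS = 7
N_CORRECT = 4
p_value = binomial_tail(N_CORRECT, N_TEST_TRIALS, CHANCE)
```

A small `p_value` would suggest that a score at least this high is unlikely under pure guessing. Group-level tests against the chance proportion would be a natural alternative.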
Finally, we will test the form of the relationship between the contingency proportion manipulation and the effect for all of the above-mentioned dependent measures. Our prediction is that the contingency effect should not only, more generally, increase with a stronger contingency manipulation, but further that this increase should follow a power function. In particular, effects should be particularly large for a perfect (or near perfect) contingency proportion, then should rapidly diminish in a decelerating function as the contingency strength is reduced (i.e., levelling off at a zero intercept at a null contingency manipulation). In contrast to this prediction, if an imperfect contingency does aid learning in some way, then we might anticipate some form of an inverted-U shaped function, whereby the contingency effect increases with a stronger contingency manipulation, but only up to some critical point (i.e., where the contingency is “too strong”) after which the contingency effect decreases again.
Methods
Participants
One hundred and ninety-one participants (after removal of 1 participant with high overall errors, see Data Analysis section) were recruited online via Prolific.co, completed the study online via Pavlovia.org, and were paid £2 for their participation. The entire experiment lasted less than 20 minutes. Participants were randomly assigned to one of the six contingency groups (50, 60, 70, 80, 90, or 100%). We aimed for a minimum of 28 participants per group based on rough power estimates on data from our prior study (Iorio et al., 2023), which would represent high power (.8) to detect an effect size as small as $\eta_p^2 = .055$ for the interaction between contingency and group in response times. The final sample included 31, 29, 33, 37, 32, and 29 participants, respectively.4 All participants were native English speakers, were not musicians, and did not know how to read musical notation, as determined via the screening questions required prior to participation. Written consent was obtained before beginning the study. The study adhered to the Declaration of Helsinki.
Apparatus
The experiment was programmed in PsychoJS (i.e., JavaScript-converted PsychoPy experiment for data collection online) and was run online. The experiment worked on desktop or laptop computers. Responses for the practice, learning, and objective awareness phases were made with the D, F, G, H, J, K, and L keys for fa, sol, la, si, do, ré, and mi, respectively. Of course, we did not use a real instrument for this study, but these arbitrary keyboard responses are analogous to a piano keyboard response modality (i.e., with the same left-to-right key ordering). The Y (“yes”) and N (“no”) keys were used for the subjective awareness question.
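The key-to-note mapping described above can be written out as a small lookup table. This is an illustrative Python sketch with hypothetical names, not the PsychoJS code used to run the experiment.

```python
# Left-to-right key order D F G H J K L mapped onto fa, sol, la, si, do, ré, mi.
KEY_TO_NOTE = dict(zip("DFGHJKL", ["fa", "sol", "la", "si", "do", "ré", "mi"]))


def is_correct(key_pressed, target_note):
    """True when the pressed key maps onto the target note name."""
    # Unmapped keys return None from .get() and therefore never match.
    return KEY_TO_NOTE.get(key_pressed.upper()) == target_note


example = is_correct("j", "do")
```

Keeping the mapping in one table makes the left-to-right ordering explicit and allows the same scoring function to be reused across phases.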
Design and Procedure
For stimuli, we used the note positions illustrated in the bottom panel of Figure 1, that is, F4 (“fa”) to E5 (“mi”) and the corresponding note names (fa, sol, la, si, do, ré, and mi). As in our prior work, we used these particular note positions because they fit within the musical staff and therefore do not require additional notation (i.e., small line markings are added for notes above or below the main staff, as in the top panel of Figure 1). In all phases, stimuli were presented in black (RGB: 0,0,0) bold 48 pt Courier New font on a white (255,255,255) background, unless specified otherwise.
A depiction of the phases of Experiment 1 is presented in Figure 5. The experiment contained two practice phases, the main learning phase, and finally the awareness questions. In the initial practice phase, participants learned the key mappings for the target note names. On each trial, a fixation cross was presented for 500 ms, followed by the note name until a response was made (no time limit). If the participant made an error, the note name changed to red (255,0,0) and stayed on the screen until the participant made the correct response. There were 5 blocks of 7 trials. Each block contained one trial with each note name. Throughout the phase, the note names and the corresponding response key were presented in the bottom half of the screen. In particular, the names “fa” through “mi” were written in 32 pt Courier New font, x-axis centred and 250 pixels below fixation with five spaces between each. One line below were the corresponding response keys in uppercase (i.e., D F G H J K L). The second practice phase was identical in all respects to the first, except that the on-screen key reminders were removed and participants were asked to try to respond from memory. These practice phases were included only to familiarize participants with the stimulus-key mappings and results from these phases were not analysed.
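The practice-phase structure (5 blocks of 7 trials, one trial per note name in each block) might be generated along the following lines. The function name is hypothetical, and shuffling the order within each block is an assumption, since the ordering is not specified above.

```python
import random

NOTE_NAMES = ["fa", "sol", "la", "si", "do", "ré", "mi"]


def practice_blocks(n_blocks=5, names=NOTE_NAMES, seed=None):
    """Return a list of blocks, each containing every note name exactly once."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(names)
        rng.shuffle(block)  # assumption: order is randomized within each block
        blocks.append(block)
    return blocks


blocks = practice_blocks(seed=1)  # arbitrary seed, for reproducibility only
n_practice_trials = sum(len(block) for block in blocks)
```

Sampling without replacement within each block guarantees that every note name is practised equally often, which matches the one-trial-per-name constraint stated above.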
In the main learning phase, participants were presented with a musical staff (an image of 700 × 500 pixels), which stayed on the screen throughout the procedure. Each trial began with an empty staff for 250 ms, followed by the note (an image of 100 × 60 pixels) presented in one of the seven positions for 250 ms. The note name was then added inside of the note itself and the participant had 3000 ms to respond. After a correct response, the next trial began immediately. Following erroneous responses and trials in which participants failed to respond in 3000 ms, the note name was replaced with “XXX” in red for 1000 ms. There were 420 learning trials for all groups of participants. Six proportion groups were created with congruent-to-incongruent trial ratios of: 5:5 (50% contingency), 6:4 (60%), 7:3 (70%), 8:2 (80%), 9:1 (90%), and 10:0 (100%). Table 2 presents the stimulus pairings for the 50% contingency condition, in which each congruent pairing is presented 30 times and each incongruent pairing 5 times. The number of repetitions of each congruent and incongruent pairing, respectively, for the remaining conditions was: 36 and 4 (60%), 42 and 3 (70%), 48 and 2 (80%), 54 and 1 (90%), and 60 and 0 (100%).
Note Name \ Note Position | do | ré | mi | fa | sol | la | si |
---|---|---|---|---|---|---|---|
do | 30 | 5 | 5 | 5 | 5 | 5 | 5 |
ré | 5 | 30 | 5 | 5 | 5 | 5 | 5 |
mi | 5 | 5 | 30 | 5 | 5 | 5 | 5 |
fa | 5 | 5 | 5 | 30 | 5 | 5 | 5 |
sol | 5 | 5 | 5 | 5 | 30 | 5 | 5 |
la | 5 | 5 | 5 | 5 | 5 | 30 | 5 |
si | 5 | 5 | 5 | 5 | 5 | 5 | 30 |
Note: 50% contingency illustrated. The number of congruent pairings is increased and the number of incongruent pairings is decreased in the other lists (see main text).
Note that congruent pairings were presented much more frequently (e.g., 30 times for each note position in the 50% contingency condition illustrated in Table 2) than incongruent pairings (e.g., 5 repetitions of each pairing in Table 2). Congruent trials were therefore high contingency and incongruent trials were low contingency. The main analyses compare high and low contingency response times (within-subject factor), also as a function of group (between-subject factor). Error percentages are also analysed in the same way to ensure that no speed-accuracy trade-offs were evident.
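The pairing frequencies for each group follow from simple arithmetic: 420 learning trials over seven note positions gives 60 trials per position, which are split according to the congruent proportion, with the incongruent trials divided evenly over the six remaining note names. The following is a minimal sketch of that calculation; the function and variable names are illustrative and are not taken from the experiment code.

```python
NOTE_NAMES = ["do", "ré", "mi", "fa", "sol", "la", "si"]
TOTAL_TRIALS = 420  # learning trials per participant, as stated in the text


def pairing_counts(percent_congruent):
    """Return (congruent, incongruent) repetitions per pairing for one group."""
    per_position = TOTAL_TRIALS // len(NOTE_NAMES)  # 60 trials per note position
    congruent = per_position * percent_congruent // 100
    incongruent_total = per_position - congruent
    n_other_names = len(NOTE_NAMES) - 1  # six incongruent names per position
    # The design only works if incongruent trials divide evenly over six names.
    assert incongruent_total % n_other_names == 0
    return congruent, incongruent_total // n_other_names


def frequency_table(percent_congruent):
    """Build the name-by-position frequency table for one group."""
    congruent, incongruent = pairing_counts(percent_congruent)
    return {
        name: {pos: (congruent if name == pos else incongruent) for pos in NOTE_NAMES}
        for name in NOTE_NAMES
    }


for pct in (50, 60, 70, 80, 90, 100):
    print(pct, pairing_counts(pct))
```

Summing every cell of `frequency_table(pct)` gives 420 for each group, which is a quick check that a list is balanced.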
The awareness questions followed the learning phase. First, participants were probed for subjective awareness with the following question:
In the main part of the experiment, the note was presented in various positions on the musical staff. Each note position was presented most often with one note name written inside of it. In particular, the note position for “do” was presented most often with “do” written inside of it and rarely with other note names written inside of it. The same was true for each note position. Did you notice that each note position was presented most often with a particular note name?
Following this, there was an objective awareness phase. Each trial in this phase was similar to the learning phase, except that no note name was presented, the delay between trials was 500 ms, the on-screen response key (i.e., from the first practice phase) was presented, there was no accuracy feedback, and participants were encouraged to focus on accuracy rather than speed. Each of the seven note positions was presented once. The instructions also informed participants that the correct response was the high contingency response. The instructions for this phase were:
In this final (very short) phase, you will now see a musical staff with a note on it in each trial. However, no note names will be presented this time. Instead, try to identify the note position with the correct note name label. The correct note name is the note name that the note position was presented with most often. You do not need to respond as quickly as in the previous phases. Focus instead on which response you think is correct. You will not, however, receive any feedback as to whether your response is correct or incorrect.
Chance guessing in the objective awareness phase is 1/7 or about 14% correct. Mean accuracy above this therefore indicates objective awareness. Awareness scores across study groups were investigated. Though not the main aim of the present work, we also investigated to what degree evidence of learning without awareness was observed (e.g., a learning effect in response times despite chance guessing performance in the objective awareness test).
Data Analysis
Raw CSV data files were first merged into a single dataset using CSVDataMerge (Schmidt, 2021c) and then analysed with R (R Core Team, 2018). We performed analyses on the learning phase for mean correct response times and mean error rates. Trials on which participants failed to respond before the 3000 ms deadline were excluded from both dependent measures. All statistical tests are two-tailed. Group was treated as a linear factor (i.e., with 1 degree of freedom) in the initial analyses. In subsequent analyses on all dependent measures, we compared a linear effect with power, exponential, and quadratic functions. Bayes factors were computed with matched models using bayesfactor_inclusion from the bayestestR package for ANOVA results and with ttestBF from the BayesFactor package for t tests. BF10 is reported for tests with greater evidence for the alternative hypothesis and BF01 is reported otherwise. The nls function was used for curve fitting. For correlational analyses, Spearman’s ρ was used, as it is less sensitive to outliers than Pearson’s r. Data exclusion rules were minimalist: response times were not trimmed and participants would only be excluded if they showed near chance level guessing (1/7, around 14%) in the learning phase (> 80% errors), which only applied to one participant in the 80% group.5
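To make the two dependent measures concrete, the following sketch computes one participant's response time and error contingency effects (low − high) after dropping trials without a response before the deadline. It is an illustration only: the analyses reported here were run in R, and the trial format (a list of dicts with `contingency`, `rt`, and `correct` fields) is a hypothetical one chosen for the example.

```python
from statistics import mean

DEADLINE_MS = 3000  # response deadline in the learning phase


def contingency_effects(trials):
    """Return (rt_effect_ms, error_effect_pct), each computed as low minus high.

    Each trial is a dict with hypothetical keys:
      'contingency': 'high' or 'low'
      'rt': response time in ms, or None if no response was made
      'correct': True or False
    """
    # Trials without a response before the deadline are excluded from both measures.
    answered = [t for t in trials if t["rt"] is not None and t["rt"] < DEADLINE_MS]

    mean_rt = {}
    error_pct = {}
    for level in ("high", "low"):
        subset = [t for t in answered if t["contingency"] == level]
        # Response times are averaged over correct responses only.
        mean_rt[level] = mean(t["rt"] for t in subset if t["correct"])
        error_pct[level] = 100 * sum(not t["correct"] for t in subset) / len(subset)

    return mean_rt["low"] - mean_rt["high"], error_pct["low"] - error_pct["high"]


example_trials = [
    {"contingency": "high", "rt": 1000, "correct": True},
    {"contingency": "high", "rt": 1100, "correct": True},
    {"contingency": "high", "rt": 900, "correct": False},
    {"contingency": "low", "rt": 1200, "correct": True},
    {"contingency": "low", "rt": 1300, "correct": False},
    {"contingency": "low", "rt": None, "correct": False},  # timeout, excluded
]
print(contingency_effects(example_trials))
```

Positive values on either measure indicate better performance on high contingency trials.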
Results
Learning Phase Response Times
The response time contingency effect (low − high contingency) data for the learning phase are presented in Figure 6. We first performed a contingency (high vs. low) by group (50, 60, 70, 80, and 90% contingency) ANOVA with contingency as a within-subjects factor and group as a linear between-groups factor. We note that the 100% contingency group cannot be included in this ANOVA, because there were no low contingency trials in this group. The ANOVA revealed a significant main effect of contingency, F(1, 160) = 23.009, MSE = 3775, p < .001, ηp2 = .13, BF10 > 1000, indicating faster overall response times to high relative to low contingency trials. The main effect of group was not significant, F(1, 160) = 0.246, MSE = 160962, p = .621, ηp2 < .01, BF01 = 1.6. Most importantly, the interaction between group and contingency was significant, F(1, 160) = 11.065, MSE = 3775, p = .001, ηp2 = .06, BF10 = 23, indicating that the contingency effect increased with a stronger contingency. The individual cell response times and test statistics are presented in Table 3.
Group | High (ms) | Low (ms) | Statistic |
---|---|---|---|
50% | 1123 | 1130 | t(30) = 0.584, SEdiff = 12, p = .564, η2 = .01, BF01 = 4.5 |
60% | 1106 | 1130 | t(28) = 2.811, SEdiff = 9, p = .009, η2 = .22, BF10 = 5.0 |
70% | 1167 | 1186 | t(32) = 1.415, SEdiff = 13, p = .167, η2 = .06, BF01 = 2.2 |
80% | 1114 | 1147 | t(36) = 2.599, SEdiff = 13, p = .013, η2 = .16, BF10 = 3.3 |
90% | 1039 | 1123 | t(31) = 3.503, SEdiff = 24, p = .001, η2 = .28, BF10 = 24 |
100% | 1132 | Not applicable | Not applicable |
Learning Phase Percentage Error
Next, we performed the same contingency by group ANOVA on the error data. The data are presented in Figure 7. The ANOVA revealed a significant main effect of contingency, F(1, 160) = 11.807, MSE = 8.2, p < .001, ηp2 = .07, BF10 = 36, indicating fewer errors to high relative to low contingency trials. The main effect of group was not significant, F(1, 160) = 0.611, MSE = 447.9, p = .436, ηp2 < .01, BF01 = 1.4. The interaction between group and contingency was only marginal, F(1, 160) = 3.726, MSE = 8.2, p = .055, ηp2 = .02, BF01 = 1.2, at least numerically consistent with a contingency effect that increased with a stronger contingency. The individual cell error percentages and test statistics are presented in Table 4.
Group | High (%) | Low (%) | Statistic |
---|---|---|---|
50% | 14.4 | 14.6 | t(30) = 0.459, SEdiff = 0.5, p = .649, η2 < .01, BF01 = 4.7 |
60% | 11.3 | 12.4 | t(28) = 2.122, SEdiff = 0.5, p = .043, η2 = .14, BF10 = 1.4 |
70% | 7.0 | 8.1 | t(32) = 2.349, SEdiff = 0.5, p = .025, η2 = .15, BF10 = 2.0 |
80% | 17.9 | 18.2 | t(36) = 0.427, SEdiff = 0.7, p = .671, η2 < .01, BF01 = 5.2 |
90% | 12.9 | 15.8 | t(31) = 2.771, SEdiff = 1.0, p = .009, η2 = .20, BF10 = 4.7 |
100% | 9.9 | Not applicable | Not applicable |
Contingency Awareness
The subjective and objective awareness data are presented in Figure 8. A linear regression with group (50, 60, 70, 80, 90, and 100%) as a predictor of subjective awareness revealed a significant effect of group, t(189) = 3.344, β = 0.05975, SE = 0.01787, p < .001, η2 = .06, BF10 = 26, as did an identical regression on the objective awareness data, t(189) = 4.525, β = 0.06141, SE = 0.01357, p < .001, η2 = .10, BF10 > 1000. These results indicate an increase in contingency awareness with a stronger manipulation. Objective awareness was significantly above chance guessing (1/7 or about .143, indicated by dashed line in Figure 8) in all groups, as shown in Table 5.
Group | Subjective | Objective | Objective Awareness Statistic |
---|---|---|---|
50% | .065 | .249 | t(30) = 2.286, SE = .046, p = .029, η2 = .15, BF10 = 1.8 |
60% | .207 | .281 | t(28) = 2.794, SE = .049, p = .009, η2 = .22, BF10 = 4.8 |
70% | .152 | .312 | t(32) = 3.714, SE = .045, p < .001, η2 = .30, BF10 = 40 |
80% | .243 | .317 | t(36) = 3.235, SE = .054, p = .003, η2 = .23, BF10 = 13 |
90% | .344 | .415 | t(31) = 4.736, SE = .058, p < .001, η2 = .42, BF10 = 521 |
100% | .379 | .606 | t(28) = 6.153, SE = .075, p < .001, η2 = .57, BF10 > 1000 |
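The comparison of objective awareness against chance can be sketched as a one-sample t test on participants' proportion-correct scores, with 1/7 as the reference value. The code below is a minimal illustration using only the Python standard library (the reported analyses were run in R); the helper names and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

CHANCE = 1 / 7  # seven possible responses in the objective awareness test


def objective_score(responses, high_contingency_names):
    """Proportion of note positions labelled with their high contingency name."""
    pairs = list(zip(responses, high_contingency_names))
    return sum(r == c for r, c in pairs) / len(pairs)


def t_against_chance(scores, chance=CHANCE):
    """One-sample t test of mean score against chance: returns (t, SE, df)."""
    n = len(scores)
    se = stdev(scores) / sqrt(n)  # standard error of the mean
    return (mean(scores) - chance) / se, se, n - 1


# Hypothetical scores for six participants (number correct out of 7 positions).
scores = [k / 7 for k in (1, 2, 4, 0, 3, 5)]
t, se, df = t_against_chance(scores)
print(f"t({df}) = {t:.3f}, SE = {se:.3f}")
```

A positive t indicates that mean accuracy exceeded the 1/7 chance level.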
The correlations are shown in Table 6. There was a significant positive correlation between subjective and objective awareness. Both subjective and objective awareness significantly correlated in the positive direction with the response time and error contingency effects. The response time contingency effect was significant for subjectively unaware (19 ms), t(128) = 3.257, SE = 6, p = .001, η2 = .08, BF10 = 14, and subjectively aware participants (93 ms), t(32) = 3.886, SE = 24, p < .001, η2 = .32, BF10 = 62. The error contingency effect was marginal for subjectively unaware participants (0.5%), t(128) = 1.766, SE = 0.3, p = .080, η2 = .02, BF01 = 2.3, and significant for subjectively aware participants (3.4%), t(32) = 3.681, SE = 0.9, p < .001, η2 = .30, BF10 = 37. Given the correlations between objective awareness and the contingency effects, intercept analyses were conducted to calculate the size of the contingency effect at chance guessing in the objective awareness test. The intercept analysis indicated that the contingency effect was numerically positive, but not significant in the response time (12 ms), t(160) = 1.653, SE = 8, p = .100, η2 = .02, BF01 = 1.9, and error data (0.4%), t(160) = 1.009, SE = 0.4, p = .315, η2 < .01, BF01 = 5.8. Thus, unlike the subjective awareness data, there was no (significant) evidence of a contingency effect in the absence of objective contingency awareness.
Measure | Objective awareness | Response time effect | Error rate effect |
---|---|---|---|
Subjective awareness | .371, p < .001 | .218, p = .005 | .180, p = .022 |
Objective awareness | | .225, p = .004 | .239, p = .002 |
Note: Degrees of freedom = 189 for all tests.
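One way to picture the intercept analysis is as a regression of each participant's contingency effect on their objective awareness score, with the predictor shifted so that zero corresponds to chance (1/7). The intercept is then the predicted contingency effect for a participant guessing at chance. The sketch below implements this with ordinary least squares; it assumes this centring approach, which may differ in detail from the analysis actually run, and the example data are hypothetical.

```python
from statistics import mean

CHANCE = 1 / 7


def effect_at_chance(awareness, effects):
    """Regress effects on (awareness - chance); return (intercept, slope).

    awareness: objective awareness scores (proportion correct) per participant
    effects:   contingency effects (low - high) per participant
    """
    x = [a - CHANCE for a in awareness]  # zero now means chance guessing
    mx, my = mean(x), mean(effects)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, effects))
    slope = sxy / sxx
    intercept = my - slope * mx  # predicted effect at chance-level awareness
    return intercept, slope


# Hypothetical data: effect grows linearly with awareness.
intercept, slope = effect_at_chance([1 / 7, 3 / 7, 5 / 7], [10.0, 50.0, 90.0])
print(intercept, slope)
```

Testing whether the intercept differs from zero then asks whether a contingency effect remains when objective awareness is at chance.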
Curve Fitting
In the preceding sections, group was treated as a linear factor to determine whether the learning effect increased with stronger contingency manipulations. However, a linear effect of contingency group is unlikely to be the correct function for such a factor. Although the results clearly do not support a diminishing effect with too strong a contingency (e.g., an inverted-U shaped function where the contingency effect increases up to a certain contingency proportion, then reverses with an even stronger manipulation), there are hints that the effect increases more rapidly the stronger the contingency, as we initially predicted. In a similar task with nonmusical materials, Forrin and MacLeod (2018) tested for a quadratic fit, that is ax^2 + bx + c, and found little difference with a linear model. A quadratic model was also tested on the current dataset, though we note that we only present results for a quadratic model with the intercept, c, fixed at 0, that is ax^2 + bx. This was because (a) this simpler model always fit better and (b) a model with the added intercept produced rather nonsensical values for the intercept (e.g., a 134 ms contingency effect with a null contingency manipulation in the RT data). In addition to a linear and quadratic model, we also tested a power function, ax^b (where x = 101 – group contingency percentage), and an exponential function, e^(bx) (where x = 100 – group contingency percentage). Note that both of these latter functions could theoretically have an added intercept, k; however, we only present the simpler models because (a) a priori the intercept should be zero (i.e., no contingency manipulation = no contingency effect), and (b) the more complex model either had worse fit or did not fit at all.
Table 7 presents the AIC values for each of the four models for all dependent measures (note that, given the equivalent number of degrees of freedom for all models, BIC differences were always identical). With the exception of subjective awareness, the power function fit the data best. The power function fit line is presented in the corresponding figures above. Relative to the linear model, the power function fit the data better for response times (AIC/BIC difference: 3.78), error rates (3.77), and objective awareness (5.08). For subjective awareness, the power function did not fit well (-3.52), and the remaining two models were largely indistinguishable from the linear model. A caveat with the linear fit, however, is that the intercept was rather nonsensical (-13.3% aware at a null contingency).
Model | Response time effect | Error rate effect | Subjective awareness | Objective awareness |
---|---|---|---|---|
Linear | 1910.26 | 917.78 | 206.64 | 101.57 |
Power | 1906.48 | 914.01 | 210.16 | 96.49 |
Exponential | 1907.16 | 915.56 | 206.81 | 99.10 |
Quadratic | 1908.84 | 916.28 | 206.63 | 101.07 |
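The model comparison amounts to subtracting AIC values, where a lower AIC indicates a better fit. As a small worked example, the snippet below takes the values from Table 7 and computes, for each dependent measure, the best-fitting model and the linear-minus-power difference (positive values favour the power function). The dictionary layout and function names are illustrative.

```python
# AIC values transcribed from Table 7 (lower is better).
AIC = {
    "Response time effect": {"Linear": 1910.26, "Power": 1906.48,
                             "Exponential": 1907.16, "Quadratic": 1908.84},
    "Error rate effect": {"Linear": 917.78, "Power": 914.01,
                          "Exponential": 915.56, "Quadratic": 916.28},
    "Subjective awareness": {"Linear": 206.64, "Power": 210.16,
                             "Exponential": 206.81, "Quadratic": 206.63},
    "Objective awareness": {"Linear": 101.57, "Power": 96.49,
                            "Exponential": 99.10, "Quadratic": 101.07},
}


def best_model(measure):
    """Name of the model with the lowest AIC for a given measure."""
    return min(AIC[measure], key=AIC[measure].get)


def linear_minus(measure, model="Power"):
    """AIC of the linear model minus AIC of the comparison model."""
    return round(AIC[measure]["Linear"] - AIC[measure][model], 2)


for measure in AIC:
    print(measure, best_model(measure), linear_minus(measure))
```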
Discussion
The results of Experiment 1 replicated earlier observations of a musical learning effect in nonmusicians. Novel to the present experiment, we also observed, as hypothesized, that the size of this contingency effect increased with the strength of the contingency manipulation. That is, participants learn to automatize the keyboard response for each note position and this learning effect is augmented with a stronger manipulation of the regularities in the task. Indeed, while numerically positive in all groups, the effect was not significant in response times or errors of the group with the weakest (50%) contingency manipulation and was the largest in the group with the strongest (90%) contingency manipulation (i.e., excluding the 100% contingency group, for which a learning effect cannot be calculated). Curve fitting analyses further indicated that this increase was best modelled with a power function, consistent with the power law of practice discussed in the Introduction (a point which we will expand upon in the General Discussion). As mentioned in the Introduction, there are certainly reasons for using imperfect contingencies when studying learning (viz., to contrast high and low contingency trials as a measure of learning), but a stronger (or even perfect) contingency is better suited to producing robust learning (e.g., for an eventual practical application).
Although our main goal was to study the automatization of actions to note positions, our awareness data were also consistent with our hypotheses. Subjective and objective awareness were also observed to increase with a stronger contingency manipulation, again smallest in the 50% group and largest in the 100% group. Curve fitting again indicated that a power function best fit the objective awareness scores. Data for subjective awareness were a bit more ambiguous, however. Notably, the power function did not fit the data well, and the linear model was largely indistinguishable from exponential and quadratic fits. In any case, the objective awareness data indicate that participants developed verbalizable knowledge of the note position meanings. The contingency effect was significant in response times for subjectively unaware participants (also marginally in the errors), though evidence of a contingency effect independent of objective awareness was less clear. At minimum, clear influences of contingency awareness were observed on both the response time and error contingency effects.
Experiment 2
In Experiment 2, we investigate task relevance. Two groups of participants were created. The name-targets group were presented with task-relevant note names and task-irrelevant note positions (as in Experiment 1), whereas the position-targets group, novel to the present experiment, were presented with task-relevant note positions and task-irrelevant note names. We used a 90% contingency for both groups, given that this proportion worked best in Experiment 1. Of course, the new position-targets group does not allow for a test of the automatic influences of note positions on performance, but rather the reverse (i.e., automatic influences of note names on note position identification performance). Indeed, this is the very reason that we started this work with the name-targets manipulation, as it allows us not only to assess learning, but also automaticity. That is, we can test whether participants can learn which key corresponds to which note position, for instance in a nonspeeded verbal report, but this is arguably rather trivial. More interesting is to establish whether, after brief learning, the learned association between the note position and response is sufficiently strong that the note position, even though the task-irrelevant stimulus, rapidly retrieves and automatically biases the corresponding response.
We note that there are three related but distinct theoretical questions that we could ask about the two versions of the task. First, do both groups learn and automatize the correspondences between the task-irrelevant stimulus (names or positions, depending on the group) and the response sufficiently to produce automatic effects on response times and errors? Thus, the first goal of the present study is to determine whether using note names as predictive stimuli for target note positions is as effective in producing learning of these name-position associations as the reverse. We do anticipate a priori that a learning effect will be observed in both groups. That is, we also expect automatic influences of the note names on note position identification. A “reverse musical Stroop effect” like this is observed in experienced musicians (Grégoire et al., 2014b, 2019).
Second, do both groups acquire equally well verbalizable knowledge about the position-to-note correspondences (i.e., “music reading”, see Footnote 1)? In this vein, we will explore whether subjective and objective awareness is similar in the two groups. We note in advance that this is a theoretically interesting but less central aspect of an incidental learning procedure like the present one. As previously discussed, the main advantage of an incidental learning task is that it allows rapid automatization of actions/responses to stimuli, which is particularly useful for learning to sight read (or “sight play/sing”; see Footnote 1). Incidental learning may or may not produce verbalizable knowledge. It seems rather unlikely to us that incidental learning would be the optimal way to acquire verbalizable knowledge. Still, it is interesting to investigate to what extent verbalizable knowledge is reinforced more or less in each group. After the learning phase, subjective and objective awareness tests will also measure explicit knowledge acquired about the correspondences between the note positions and note names. Although we do anticipate sensitivity to the contingency in both groups, it is conceivable that participants will acquire less (verbalizable) knowledge about the position-to-name correspondences in the position-targets group. If so, including name-targets training for, at minimum, some proportion of training would remain useful in acquisition.
Of course, awareness of the associations between the target stimuli (i.e., task-relevant dimension) and the action (i.e., keypress response) will likely be at ceiling (i.e., participants should be able to indicate with near 100% accuracy the correct response action for each target stimulus after responding directly to the stimulus for several dozen trials). Given this, training with the note positions as the task-relevant dimension (as in our position-targets group) might seem inherently better than presenting them as the task-irrelevant stimulus (i.e., because the principal goal is to learn how to look at the note positions and play, rather than to identify the note name).
It could alternatively be the case that this assumption is wrong and that learning position-to-action correspondences is inherently hard and difficult to master (i.e., as the standard narrative on sight reading would suggest). If so, then participants may fail to automatize the key mappings in the position-targets group, the third major question of Experiment 2. That is, the mapping of the note positions to keys may be too difficult to keep straight and response times and/or error rates might fail to reduce following the typical practice curves discussed in the Introduction. Hypothetically, then, learning position-to-action mappings indirectly while responding to the easier-to-identify note names (i.e., in the name-targets group) might be more efficient than trying to practice responding to the note positions directly (i.e., in the position-targets group). We do find this alternative prediction unlikely, however. To explore this in the current experiment, we performed another type of curve fitting analysis. Specifically, we wanted to explore to what extent it is difficult to respond to note positions and note names and how rapidly participants improve with practice. Moreover, we further wanted to explore whether performance-based improvements follow the standard power law observed for a wide range of other tasks (i.e., rapid initial improvements, followed by continued but ever-decreasing improvements). To do this we fit practice curves to the practice phases of the two groups.
Methods
Participants
Seventy-three participants were recruited online in the same manner as in Experiment 1, except that there were only two groups (name-targets and position-targets). The final sample included 28 and 44 participants, respectively, after the exclusion of one participant (see Data Analysis section).6
Apparatus, Design, and Procedure
The apparatus, design, and procedure of Experiment 2 were identical to Experiment 1 with the following exceptions. A depiction of the phases for the position-targets group is presented in Figure 9. The on-screen key reminder (applicable to only certain phases), previously only text in Experiment 1, was replaced with an 845 × 118 px image file in Experiment 2, x-axis centred and 270 px below the y-axis centre. For the name-targets group, the key reminder was near identical to the text in Experiment 1. In the position-targets group, the note names in the key reminder were replaced with narrow images of the five staff lines with a note placed in one of the seven relevant positions. All other stimuli were shifted up 35 pixels (relative to Experiment 1) to accommodate a slightly larger on-screen key reminder (which needed to be larger to fit the staff images), also in the phases without a key reminder (for consistency).
During the learning phase, the note position did not appear in advance of the note name, as this would have created an inequivalence between the name-targets and position-targets group in terms of the relative onset of the task-relevant and -irrelevant information. Thus, instead of a 250 ms blank staff followed by the staff with a note for 250 ms, the staff was left blank for 500 ms before the target. After errors, instead of replacing the note name with “XXX” in red, both the note name and note form remained on the screen but changed in colour to red. In all other respects, the name-targets condition was identical to the 90% contingency group in Experiment 1.
The position-targets group was conceptually identical to the name-targets group, except with a different task-relevant stimulus. Thus, in the practice phases, instead of presenting a fixation and note name, the staff was presented with a note position. In other words, the practice phase looked identical to the learning phase without the note names. As in the other group, however, there was no time limit to respond during practice and errors needed to be corrected. After an incorrect response, the note changed colour to red and stayed on the screen until the participant made the correct response. The learning phase was identical to that for the name-targets group, with the sole exception, of course, that the note position was the task-relevant stimulus and therefore determined the correct response.
As in Experiment 1, the main analyses compare high and low contingency response times (within-subject factor), also as a function of group (between-subject factor). The only design difference, of course, is that there were only two groups in the present experiment (name-targets vs. position-targets). Error percentages were again assessed to ensure that no speed-accuracy trade-offs were evident.
The awareness phases were also identical to those in Experiment 1, except that for the position-targets group the task was reversed in the objective awareness phase. That is, instead of a staff with a note position and note name response key (name-targets group), there was a fixation followed by a note name with the position response key. This therefore looked just like the practice phase in the name-targets group (without feedback, of course). Note that the two versions of the objective awareness test effectively test the same knowledge (i.e., note name-position pairings), but it made more sense to maintain the target-response mappings from the initial practice and learning phases for each group. Indeed, asking participants in the position-targets group to identify note positions with note-name labelled keys would represent an inherent confound, as participants could largely ignore the note name labels and continue responding to the note positions with the same keys as they did in the learning phase. Of course, the instructions were also adjusted appropriately for the position-targets group. Awareness scores were again compared across study groups.
Data Analysis
Data treatments were identical to those in Experiment 1. One participant was excluded from the analyses based on the accuracy criterion. The practice curve analysis was conducted with the nlmer function of lme4 in R with a random intercept for subjects.
Results
Learning Phase Response Times
The response time data for the learning phase are presented in Figure 10. We first performed a contingency (high vs. low) by group (name-targets vs. position-targets) ANOVA with contingency as a within-subjects factor and group as a between-groups factor. The ANOVA revealed a significant main effect of contingency, F(1, 70) = 21.732, MSE = 5461, p < .001, ηp2 = .24, BF10 > 1000, indicating faster overall response times to high relative to low contingency trials. The main effect of group was also significant, F(1, 70) = 5.789, MSE = 117130, p = .019, ηp2 = .08, BF10 = 2.2, indicating faster overall responses in the position-targets group (albeit with weak Bayesian evidence). The interaction between group and contingency was not significant, F(1, 70) = 0.052, MSE = 5461, p = .821, ηp2 < .01, BF01 = 3.4. Despite the lack of an interaction, we conducted separate t tests on each group. The contingency effect was significant in the name-targets group (high: 1091 ms; low: 1147 ms; effect: 56 ms), t(27) = 2.193, SEdiff = 26, p = .037, η2 = .15, BF10 = 1.6, and in the position-targets group (high: 947 ms; low: 1009 ms; effect: 62 ms), t(43) = 5.158, SEdiff = 12, p < .001, η2 = .38, BF10 > 1000.
Learning Phase Percentage Error
Next, we performed the same contingency by group ANOVA on the error data. The data are presented in Figure 11. The ANOVA revealed a significant main effect of contingency, F(1, 70) = 6.378, MSE = 30.8, p = .014, ηp2 = .08, BF10 = 3.2, indicating fewer errors to high relative to low contingency trials. The main effect of group was not significant, F(1, 70) = 0.079, MSE = 344.2, p = .779, ηp2 < .01, BF01 = 2.2, nor was the interaction between group and contingency, F(1, 70) = 0.227, MSE = 30.8, p = .635, ηp2 < .01, BF01 = 4.0, with moderate evidence of a true null. We nevertheless conducted separate t tests on each group. The contingency effect was numerically larger but not significant in the name-targets group (high: 14.0%; low: 16.9%; effect: 2.8%), t(27) = 1.674, SEdiff = 1.7, p = .106, η2 = .09, BF01 = 1.5, and only marginal in the position-targets group (high: 13.6%; low: 15.5%; effect: 1.9%), t(43) = 1.835, SEdiff = 1.1, p = .073, η2 = .07, BF01 = 1.3.
Contingency Awareness
The subjective and objective awareness data are presented in Figure 12. A Welch two-sample t test on subjective awareness did not reveal an effect of group, t(55) = 0.634, SE = .118, p = .529, η2 < .01, BF01 = 3.4, with moderate evidence for a true null. The proportion of participants that reported being subjectively aware was .393 (11 of 28) in the name-targets group and .318 (14 of 44) in the position-targets group. A similar t test on the objective awareness data revealed a significant effect of group, t(53) = 2.336, SE = .061, p = .023, η2 = .09, BF10 = 2.7, indicating greater awareness of the contingency in the name-targets group. Objective awareness was significantly above chance guessing (1/7 or about .143) in the name-targets group (.393), t(27) = 5.037, SE = .050, p < .001, η2 = .48, BF10 = 845, and in the position-targets group (.250), t(43) = 3.000, SE = .036, p = .004, η2 = .17, BF10 = 7.9.
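The comparison against chance can be illustrated with a small Python sketch. It assumes a one-sample t statistic of the form (mean − chance) / SE, with chance fixed at 1/7 because there are seven response options; the function name is hypothetical. Because the inputs are the rounded means and standard errors from the text, the results only approximate the reported statistics.

```python
CHANCE = 1 / 7  # seven possible answers on the objective awareness test


def t_against_chance(mean, se, chance=CHANCE):
    """One-sample t statistic for a mean accuracy tested against chance."""
    return (mean - chance) / se


# Rounded group means and standard errors taken from the text.
t_name = t_against_chance(0.393, 0.050)
t_position = t_against_chance(0.250, 0.036)

print(f"name-targets: t = {t_name:.2f}; position-targets: t = {t_position:.2f}")
```

The small deviations from the reported t(27) = 5.037 and t(43) = 3.000 are what one would expect from rounding of the inputs.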
The correlations are presented in Table 8. There was a significant positive correlation between subjective and objective awareness. Apart from the significant positive correlation between objective awareness and the response time contingency effect, the awareness measures did not correlate with the learning phase contingency effects. The response time contingency effect was significant for subjectively unaware (52 ms), t(46) = 3.335, SE = 15, p = .002, η2 = .19, BF10 = 18, and subjectively aware participants (74 ms), t(24) = 3.729, SE = 20, p = .001, η2 = .37, BF10 = 33. The error contingency effect was not significant for subjectively unaware participants (2.0%), t(46) = 1.649, SE = 1.2, p = .106, η2 = .06, BF01 = 1.8, but was for subjectively aware participants (2.9%), t(24) = 2.072, SE = 1.3, p = .049, η2 = .15, BF10 = 1.3. As in Experiment 1, intercept analyses were conducted on the objective awareness data. The intercept for the response time contingency effect was positive and significant (40 ms), t(70) = 2.857, SE = 14, p = .006, η2 = .10, BF10 = 24. The intercept in the error data was positive but not significant (1.7%), t(70) = 1.544, SE = 1.1, p = .127, η2 = .03, BF01 = 1.5. Thus, there were some hints, particularly in response times, for a contingency effect in the absence of contingency awareness.
| | Objective awareness | Response time effect | Error rate effect |
|---|---|---|---|
| Subjective awareness | .267 (p = .023) | .084 (p = .486) | .115 (p = .336) |
| Objective awareness | | .333 (p = .004) | .092 (p = .442) |
Note: Degrees of freedom = 70 for all tests.
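The logic of the intercept analysis can be sketched as a regression of each participant's contingency effect on objective awareness, with awareness centred on chance (1/7) so that the intercept estimates the effect expected at chance-level awareness. The Python sketch below assumes ordinary least squares and uses entirely made-up data; the function and variable names are hypothetical and do not come from the analysis scripts.

```python
CHANCE = 1 / 7


def intercept_at_chance(awareness, effects, chance=CHANCE):
    """Regress effects on chance-centred awareness; return (intercept, slope)."""
    xs = [a - chance for a in awareness]  # zero now means "guessing at chance"
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(effects) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, effects))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustration: proportion correct on the awareness test and
# response time contingency effect (ms) for five fictional participants.
awareness = [0 / 7, 1 / 7, 2 / 7, 3 / 7, 5 / 7]
effects_ms = [30.0, 40.0, 50.0, 60.0, 80.0]

intercept, slope = intercept_at_chance(awareness, effects_ms)
print(f"predicted effect at chance: {intercept:.1f} ms")
```

A positive intercept in such a regression corresponds to a contingency effect that remains when objective awareness is at chance.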
Practice Effects
In a final analysis, we tested the extent to which automatization of name-response and position-response associations follows standard laws of practice. We conducted these analyses on the two practice phases of the experiment, separately for the name-targets and position-targets groups. Mean reaction time was calculated for each of the 10 practice blocks. Four participants were removed from the sample due to missing cells (i.e., no correct response times in a block). First, we compared two models. The first was a simple linear model, that is, with an intercept and a slope, k + ax, where k is the intercept, a is the slope, and x is the block number (coded 0-9). The second was a power function, k + ax^b, where k is the intercept (i.e., the response time to which performance decreases with infinite practice), a is the difference between initial performance and the intercept, b is a learning rate, and x is the block number (coded 1-10). Again, the linear model is almost by definition incorrect (e.g., because with any negative slope it would assume the response times will eventually become negative with sufficient practice) but serves as a baseline for comparing to the power fit. As shown in Table 9, the power function fit better than a linear function in all comparisons. In the name-targets group, the AIC was lower than in the linear model (difference: 17.5), as was the BIC (14.0). This was even clearer for the position-targets group (37.5 and 33.4, respectively).
| | Name-targets AIC | Name-targets BIC | Position-targets AIC | Position-targets BIC |
|---|---|---|---|---|
| Linear | 3870.8 | 3885.1 | 6672.5 | 6688.6 |
| Power | 3853.3 | 3871.1 | 6635.0 | 6655.2 |
Figure 13 presents the response times by practice block for both conditions. Coherent with the above analyses, a power function speedup is observed across blocks for both groups. Interestingly, however, this effect is notably more drastic in the position-targets group. Indeed, participants were much slower at the start of the practice phase in the position-targets group (742 ms difference) but improved considerably more. This can be observed in the parameters of the above-mentioned models. Notably, the a parameter, which estimates improvement from initial performance to asymptotic performance, was over four times larger in the position-targets group (1581 ms) relative to the name-targets group (455 ms). Similarly, asymptotic performance, k, was over twice as fast in the position-targets group (750 ms) as in the name-targets group (1581 ms). It might also be noted that there are some visual trends around the separation between the first practice phase (left of dashed line in figure) and the second practice phase (right of dashed line), notably a “reacceleration” in the position-targets group after the pause and a restart cost in the name-targets group. However, we did not fit each practice phase separately as there is some meaningful risk of overfitting a three-parameter power function to only five condition means. For the same reason, these visual patterns should probably be interpreted with caution.
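The model comparison described above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the analysis behind Table 9: it fits block means rather than participant-level data, uses Gaussian least-squares versions of the AIC and BIC, finds the power exponent by a coarse grid search, and runs on synthetic response times generated from k = 750 ms and a = 1581 ms with a hypothetical exponent b = −1 plus arbitrary small deviations. All function names are hypothetical.

```python
import math


def ols_line(xs, ys):
    """Closed-form least squares for y = k + a * x; returns (k, a, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    k = mean_y - a * mean_x
    rss = sum((y - (k + a * x)) ** 2 for x, y in zip(xs, ys))
    return k, a, rss


def fit_power(blocks, rts):
    """Fit k + a * x**b: grid search over b, with k and a solved by OLS for each b."""
    best = None
    for step in range(1, 301):  # candidate exponents from -0.01 to -3.00
        b = -step / 100
        k, a, rss = ols_line([x ** b for x in blocks], rts)
        if best is None or rss < best[3]:
            best = (k, a, b, rss)
    return best


def aic_bic(rss, n, n_params):
    """Least-squares AIC and BIC (additive constants shared by both models omitted)."""
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * n_params, fit_term + n_params * math.log(n)


# Synthetic block means (ms): a power curve plus arbitrary small deviations.
blocks = list(range(1, 11))
deviations = [12, -8, 5, -3, 7, -6, 2, -4, 3, -1]
rts = [750 + 1581 * x ** -1.0 + e for x, e in zip(blocks, deviations)]

# Linear model: blocks coded 0-9, two parameters (intercept and slope).
_, _, linear_rss = ols_line([x - 1 for x in blocks], rts)
linear_aic, linear_bic = aic_bic(linear_rss, len(rts), 2)

# Power model: blocks coded 1-10, three parameters (k, a, b).
k_hat, a_hat, b_hat, power_rss = fit_power(blocks, rts)
power_aic, power_bic = aic_bic(power_rss, len(rts), 3)

print(f"linear: AIC = {linear_aic:.1f}, BIC = {linear_bic:.1f}")
print(f"power:  AIC = {power_aic:.1f}, BIC = {power_bic:.1f}, b = {b_hat:.2f}")
```

With data of this shape, the power function yields the lower AIC and BIC despite its extra parameter, which is the pattern reported in Table 9. A dedicated nonlinear least-squares routine would normally replace the grid search used here for simplicity.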
Discussion
As hypothesized, the results of Experiment 2 indicated that incidental learning occurs both with note position nontargets and note name targets, as in Experiment 1, and with note name nontargets and note position targets, novel to Experiment 2. That is, participants learned and automatized the associations between nontarget note names and responses, similar to what we previously observed with nontarget note positions. The learning effect was roughly comparable in both groups. In the learning phase, participants were overall faster in the position-targets group, likely due to a compatibility in the spatial left-to-right organisation of responses and the spatial down-to-up organisation of note positions. Of course, this compatibility between standard notation and responding also exists with a piano, though not with some other types of instruments (e.g., wind). Note, however, that responding to note positions was initially more difficult during practice but was automatized much more rapidly.
Interestingly, objective awareness of the contingencies was higher in the name-targets group (.250 above the chance guessing rate; .393 − .143 = .250) than in the position-targets group (.250 − .143 = .107), a significant 133% increase (.250 / .107 = 2.33). There was also a nonsignificant 24% increase in subjective awareness (.393 / .318 = 1.24). Although not the main focus of this experiment, these results seem to indicate that acquiring verbalizable music reading skills occurs more easily with nontarget note positions.7 Though not the main interest of the present report, we also found some evidence for learning without awareness, with a significantly positive response time contingency effect intercept in the objective awareness data along with a significant contingency effect for subjectively unaware participants. Globally, the correlations between the learning effect and awareness seemed to be weaker in Experiment 2.
Curve-fitting analyses on the practice phases of the two groups indicated performance perfectly coherent with the power law of practice. Interestingly, these results also indicated that performance was initially much poorer in the position-targets group but improved much more dramatically. That is, it is harder to respond to the position targets initially, but improvement with practice is rapid. Evidently, performance only continued to improve in the learning phase where response times were even faster (i.e., continuing to approach the asymptote implied by our curve fitting analyses of the practice phase).
General Discussion
In the present article, we reported results from two experiments with nonmusicians. In the first, we manipulated the strength of the contingency manipulation from 50% congruent to 100% congruent. We observed that the contingency effect, in response times and errors, increases with a stronger manipulation, as do subjective and objective awareness of the note position meanings. In the second experiment, we tested whether participants could also incidentally learn with task-relevant note positions and task-irrelevant note names. Here, we observed a robust contingency effect in response times for participants in this position-targets group. That is, participants incidentally learned the keyboard actions required for the note name meanings while responding to note positions, and this acquired knowledge had automatic influences on note position identification. This effect was comparable in magnitude to the effect in the name-targets group. However, subjective awareness was numerically (but not significantly) reduced in the position-targets group and objective awareness was significantly reduced. It would therefore seem that verbalizable knowledge of note name-position correspondences works best with name-targets.
As mentioned in the Introduction, the goal of the present series of experiments was not only to understand what kind of learning can occur with musical materials, but also to think about optimal strategies for future practical applications to music education. The present results speak to both of these goals. Of course, some of the design decisions in our initial work (Iorio et al., 2023) were motivated by practical concerns. For instance, imperfect contingencies were used in order to have a measure of learning. In order to have a measure of what was learned, it is necessary to have events consistent with the to-be-learned regularity (high contingency) and events inconsistent with the regularity (low contingency), such that the two can be compared. The same, of course, applies to all learning procedures. For example, in artificial grammar learning (Reber, 1967) “grammatical” test items consistent with the artificial grammar are compared to ungrammatical test items, which are inconsistent with the artificial grammar. In real language learning, of course, only the “real” grammar needs to be learned (and tested). In Experiment 1, we observed that the contingency effect increases with a stronger manipulation (cf., Forrin & MacLeod, 2018). Although the contingency effect cannot be measured in the 100% contingency group, due to the lack of incongruent trials, the awareness data and the global trend in the response times and errors during the learning phase suggest that a 100% contingency is optimal in practical applications.
Relatedly, our original work used task-relevant note names and task-irrelevant note positions. This was because we wanted to measure the automatic influences of note positions on keyboard responding during note name identification. When the task is reversed (i.e., note positions as the targets), this is no longer the case. Instead, the automatic influences of the note names on keyboard responses during note position identification are measured. Practically speaking, however, a new musician might be most interested in automatizing actions to note positions (e.g., which key to press based on the note position). Learning this position-to-action mapping directly with a task-relevant note position might therefore seem ideal. Indeed, response times were rather rapid in the position-targets condition and accuracy was near ceiling. Interestingly, though, responding was initially much slower in the position-targets group, but was automatized rapidly, as illustrated in our practice curve analyses. Learning of the abstract/semantic meaning of the note positions (i.e., what note name to “read” a note position as) is also relevant, however. Experiment 2 demonstrated that some learning of the name-position pairings also occurs with task-relevant note positions, just not as much as in the name-targets condition. Though the response time and error learning effects did not differ significantly in the learning phase between the name-targets and position-targets groups, objective awareness of the name-position pairings was greater in the name-targets group. Practically, then, both types of training might be useful for learning to read standard notation.8
Though not the main question of the current report, some results suggest (albeit inconsistently) that learning without awareness might be possible. In both experiments, evidence of a contingency effect was present for contingency unaware participants. For objective awareness, a contingency effect in response times was observed even at chance guessing (i.e., with the intercept analysis) in Experiment 2 (a similar effect was observed in an evaluative conditioning version of the colour-word contingency learning task; Schmidt & De Houwer, 2019), whereas in Experiment 1 the effect (though positive numerically) was not significant.
The term implicit learning is not always used consistently, though it often refers to learning without awareness (which may or may not be unintentional), to unintentional learning (which may or may not produce conscious knowledge), or to both (Berry & Dienes, 1993; Cleeremans et al., 1998; Perruchet, 2019; Perruchet & Pacteau, 1990; Reber, 1967, 1989; Shanks, 2005). In the present work, we were more interested in incidental learning, that is, learning about a regularity despite the lack of an explicitly instructed goal to do so. In our case, the explicit goal is to identify note names (or note positions) and any learning about the task-irrelevant dimension is therefore incidental to this goal. Implicit learning has been studied with musical materials (for a review, see Rohrmeier & Rebuschat, 2012). For instance, research has studied the implicit learning of melody with tone sequences created from an artificial grammar or language (e.g., Saffran et al., 1999, 2000; Tillmann & Poulin-Charronnat, 2010). A similar logic has been used to study the learning of harmony (Bly et al., 2009; Loui et al., 2009; Rohrmeier & Cross, 2009), timbre (Bigand et al., 1998), and temporal sequences (Brandon et al., 2012; Salidis, 2001; Schultz et al., 2013; Tillmann et al., 2011). This prior work focuses on learning about musical materials that one listens to. Curiously, similar work has not been conducted on implicit (or even incidental) learning of the performance aspects of music, that is, learning to play. Future research might explore the implicitness of learning in more detail. Relatedly, there are many other features of automaticity (Moors & De Houwer, 2006) that might be studied in future research, such as whether attention or cognitive resources are needed to support learning.
As mentioned in the Introduction, sight reading is often considered to be a difficult-to-master skill. However, as also mentioned, part of the reason for this may be the lack of extensive practice. Attempting to master a piece of music, for instance, involves considerably more procedural practice than active practice of translating a new and unfamiliar score to actions on the instrument. One of the advantages of the type of incidental learning procedure explored in the present series of studies is the rapid presentation of a large number of novel stimuli. Indeed, incidental learning is often much faster than intentional learning specifically because more trials can be experienced in a shorter period of time. Automaticity depends to a much larger extent on how many times a participant has seen stimulus pairings (Grant & Logan, 1993) than on how they solved the problems (Logan & Klapp, 1991).
Globally, the results of our studies are coherent with standard laws of automaticity. Though not explored before in previous work, larger learning effects with a stronger contingency manipulation can be easily understood in terms of the number of training exemplars experienced for high and low contingency pairings, as illustrated in Figure 14. For instance, in the 90% contingency condition, each high contingency pairing is presented 54 times, further down on the practice curve, whereas low contingency trials are only presented once. This produces a large difference in practice-based improvements for high and low contingency trials. Decreasing the contingency proportion both decreases the number of times participants see high contingency pairings and increases the number of times participants see the low contingency pairings, thereby decreasing the net contingency effect. As can be seen in the figure, the implication of an instance theory, as we novelly interpret it here, is not only that response times become faster following a power curve with more and more practice (e.g., as in the curve fitting analyses in our Experiment 2), but also that the contingency effect (i.e., low – high contingency trials) will decrease following a similar power curve as the contingency proportion is reduced. This is exactly what we observed in Experiment 1.
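The exemplar-count reasoning above can be made concrete with a short Python sketch. Several values are assumptions: the total of 420 learning trials is inferred from the figures in the text (7 congruent pairings × 54 presentations + 42 incongruent pairings × 1 presentation), the three contingency proportions are illustrative, and the power-function parameters k, a, and b are arbitrary. The sketch also simplifies by evaluating response times at the final presentation count rather than averaging over the learning phase. All names are hypothetical.

```python
N_NOTES = 7          # seven note names and seven note positions
TOTAL_TRIALS = 420   # assumption: inferred from 7 * 54 + 42 * 1


def presentations(percent_congruent, total=TOTAL_TRIALS, n=N_NOTES):
    """Presentations per pairing: (each high contingency, each low contingency)."""
    congruent_trials = total * percent_congruent / 100
    incongruent_trials = total - congruent_trials
    high = congruent_trials / n               # 7 congruent pairings
    low = incongruent_trials / (n * (n - 1))  # 42 incongruent pairings
    return high, low


def power_rt(n_seen, k=700.0, a=600.0, b=-0.5):
    """Response time (ms) after n_seen presentations; parameter values are arbitrary."""
    return k + a * n_seen ** b


def predicted_effect(percent_congruent):
    """Low minus high contingency response time implied by the practice curve."""
    high, low = presentations(percent_congruent)
    return power_rt(low) - power_rt(high)


for percent in (50, 70, 90):
    high, low = presentations(percent)
    effect = predicted_effect(percent)
    print(f"{percent}%: high = {high:.0f}, low = {low:.0f}, effect = {effect:.0f} ms")
```

At 90% the sketch reproduces the 54 versus 1 presentations mentioned above, and the predicted effect shrinks as the proportion is reduced, because high contingency pairings are seen less often and low contingency pairings more often. The 100% condition is omitted because it contains no low contingency trials.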
Results are also not coherent with an alternative perspective that we considered, namely, that imperfect contingencies might aid learning in some way. If true, this perspective would have predicted a reduction (or perhaps levelling off) of the contingency effect when the manipulation becomes “too strong”. The novel analyses presented in the current work might also be applied to other (e.g., nonmusical) learning tasks to see whether the same rules apply. An analysis like that presented above in Figure 14 might also be quantified in computational models of learning and automaticity (e.g., Logan, 1988; Schmidt et al., 2016).
Similarly, in the curve fitting analyses in Experiment 2, we again showed results coherent with automaticity in both the name-targets and position-targets conditions. Specifically, initial performance is bad (i.e., while learning new stimulus-response mappings), but rapidly improves during early practice. With more and more practice we continue to improve, albeit at a decelerating rate. Learning to identify note positions on a musical staff with keypress responses does not violate this general rule in any way. Indeed, practice-based improvements were particularly large in the position-targets condition. At the start of the task, responding to note positions was particularly slow. This is coherent with the standard narrative concerning the difficulty of sight reading. However, improvements were particularly rapid in this condition.
We would like to strongly highlight the ways in which the present research approach differs markedly from the bulk of the prior research published on sight reading in the musicology and music cognition domains. Most typically, the research question is very different from the current one: how effective is an in-class (often: extended) intervention on improving real-life sight reading ability, for example, as measured by a standardized test of sight reading (e.g., Watkins & Farnum, 1962). Our research question is fundamentally different from this. In the current research, our goal was to ask to what degree aspects of sight-reading skill are learnable incidentally and whether this learning occurs as rapidly and effortlessly as in other domains. This follows similar logic as other research in the implicit learning domain concerning other competencies. For example, Reber (1967) posed the question of to what extent the grammatical rules of a language are learnable incidentally. For a few minutes, participants memorized lists of words created from an artificial “grammar” (for related work with auditory artificial speech and tone streams in adults and infants, see Saffran, Aslin, et al., 1996; Saffran et al., 1999; Saffran, Newport, et al., 1996). Stimuli were composed of pseudowords using five consonants that followed a small set of artificial grammatical rules on letter order to determine whether participants could implicitly acquire knowledge of these grammatical rules (i.e., as measured by above-chance ability to distinguish between “grammatical” strings, which followed the artificial grammar, and “nongrammatical” strings, which violated the artificial grammar). He did not train participants on an entire language (e.g., with all letters of the alphabet and all of the many grammatical rules of the language) and did not test them with standardized tests of language proficiency.
Similarly in the present work, our goal was to assess whether elements of sight-reading skill can be automatized with an incidental learning procedure and whether this is just as rapid and effortless as in more arbitrary laboratory tasks. To do this, we created a simplified task with an experimentally naïve population (nonmusicians) in a controlled setting to determine whether sight-reading materials pose some type of specific difficulty for this type of learning task.
Relatedly, while we have mentioned that a real-world application of an incidental learning procedure might eventually be imaginable, it is certainly not our aim to suggest that the specific tasks used in the present work are “classroom ready” and will outperform traditional instruction. This may eventually be the case and it is indeed a long-term goal of our research program to pose such questions. Before then, many interesting theoretical (and perhaps eventually practical) questions remain open. For instance, the current research focused on the learning of correspondences between note positions and actions (i.e., what motor action to execute to play the note) in nonmusicians. Future research is, of course, still needed to explore whether a similar approach can be effective in learning other information on the musical staff, such as timing information (e.g., how long to play a note; see Boyle, 1970; Pike & Carter, 2010). In addition, aural learning, that is, learning to associate the sounds with note positions, has previously been identified as a useful strategy in learning to sight read (see Mishra, 2014, for a meta-analysis). In currently ongoing research we are exploring how auditory stimuli influence sight reading in addition to how procedures akin to the current one can aid in developing the ability to detect pitches by ear (Iorio et al., 2022), termed absolute (or perfect) pitch (Bachem, 1955; Deutsch, 2013).
Future research should also focus on establishing whether the present approach is equally effective in learning to sight read for different instruments, each of which requires very different actions to produce a given note. In the present research we used a piano analogue. In future research, we plan on applying the same approach to some different real instruments (e.g., piano9 and guitar). The present research also focused on an experimentally pure situation of training naïve nonmusicians on completely unfamiliar materials. This is also one of the reasons why we focused on only training the pitch dimension of musical notation with single notes. Future research might therefore aim to assess the effectiveness of the current approach in improving sight-reading skills in novice or intermediate-level musicians, where it would also be feasible to use more complete tests of sight reading (e.g., Watkins & Farnum, 1962). The current learning approach can therefore be tested further to determine ideal learning parameters and further potential applications.
We also note that an eventual real-world application might look different from the present task. For instance, to have control over stimulus timing and to record individual-stimulus response times we presented one note on the staff per trial. In real-world sight reading, of course, there are many notes on the staff. This seemingly trivial detail (i.e., only one note at a time, but otherwise identical to real-world sight reading) might actually be important. Transfer of learning from one task to another that only differs in seemingly superficial ways is often shockingly bad (e.g., Owen et al., 2010). Also on a purely theoretical side, it could be interesting to not only explore how well participants learn from multiple-note staffs, but also how well they learn to process and learn about repeating sequences of notes, analogous to sequence learning studies (e.g., Nissen & Bullemer, 1987).
In sum, the present research showed that learning in the music contingency learning procedure, along with verbalizable knowledge, is augmented with a stronger manipulation of this contingency. Reversing the task and having participants respond to the note positions while ignoring the note names produces similarly robust learning, though verbalizable knowledge seems at least somewhat reduced relative to our original procedure. In any case, verbalizable knowledge is usually not the target of incidental learning (viz., as it often produces unconscious knowledge). Globally, we hope that the present work will inspire more research into the incidental learning of the performance aspects of music learning (i.e., learning to play). Of course, learning to play music is rather complex, involving the need to engage in many cognitive activities (for a review of some music cognition research, see Pearce & Rohrmeier, 2012). It is implausible to think that all necessary skills can be acquired in a purely incidental way. Explicit instruction is certainly required for many aspects of musical education (e.g., for learning music theory or technique). Other subskills, however, can benefit from simple repetition. The same is true for any complex activity, whether it be in sports, in gaming, or in other domains (e.g., the rules of a game are inevitably best learned explicitly, but perfection of techniques comes through practice). The skills that are learnable incidentally, however, can be acquired quickly and easily with the present type of approach. As discussed in the introduction, this is largely due to the fact that participants can experience large numbers of novel stimuli within a relatively brief training period. 
More narrowly, sight reading is traditionally considered difficult and slow-to-acquire, and it is hoped that the present research will inspire more investigations of ways to supplement traditional music instruction and simplify the process of early familiarization with musical materials.
Contributions
Contributed to conception and design: JRS, CI, BPC
Contributed to acquisition of data: JRS
Contributed to analysis and interpretation of data: JRS, CI, BPC
Drafted and/or revised the article: JRS, CI, BPC
Approved the submitted version for publication: JRS, CI, BPC
Funding Information
This work was supported by the French “Investissements d’Avenir” program, project ISITE-BFC (contract ANR15-IDEX-0003) to James R. Schmidt.
Competing Interests
The authors declare no competing interests.
Data Accessibility Statement
All the stimuli, presentation materials, participant data, and analysis scripts can be found in the following Open Science Framework repository: https://osf.io/42fc3/.
Footnotes
1. As a minor aside, though definitions vary, sight reading might be slightly distinguished from music reading, or the ability to read the notes without necessarily being able to play them or to play them rapidly enough to perform the song in time while reading. In contrast, sight reading requires both music reading and music production (Sergent et al., 1992); the term sight playing (or sight singing) is therefore sometimes used instead (Udtaisuk, 2005).
2. As an aside, we note that when we talk about “automatic” influences on performance, we are referring to one of the many “features of automaticity” (Moors & De Houwer, 2006). Specifically, we refer to the “automatic” impact of a task-irrelevant stimulus on identification of a task-relevant stimulus and remain agnostic regarding other automaticity features (e.g., resource, attention, or awareness requirements).
3. To clarify, “congruent” and “incongruent” refer to whether the name and position, respectively, match or mismatch in meaning, whereas “high contingency” and “low contingency” refer to whether a name-position stimulus combination occurs, respectively, frequently or infrequently in the task. In our particular task, the congruent pairings occur frequently and the incongruent pairings infrequently. Of course, nonmusicians are not sensitive to the congruency as they do not know the meanings of the note positions, but rather the contingency as they are learning.
4. The variability in sample sizes is explained by randomized group assignment.
5. We note that this (a priori) criterion is rather liberal (favouring more data over less) and does contribute to some noisiness in between-group cell means (particularly in errors). More restrictive accuracy or response speed criteria do not modify the conclusions of the current report, however.
6. The inequality in group sizes is again due to random chance in allocating participants to groups.
7. An anonymous reviewer of a previous version of this manuscript suggested that there may be a potential confound with the objective awareness test in this experiment. In particular, participants in the name-targets group see note positions and respond with (left-to-right organized) responses labelled with note names. Participants in the position-targets group do the reverse: see note names and respond with (left-to-right organized) responses labelled with note positions. The reviewer suggested that participants in the name-targets condition might be able to use spatial compatibility between the note position and response location to improve performance in the test phase (analogous to SMARC effects; see Ariga & Saito, 2019; Rusconi et al., 2006), and perhaps not (or to a lesser degree) in the position-targets condition. There are a few problems with this idea, however. First, we already tested for spatial compatibility effects like this in Experiment 3 of Iorio et al. (2023), and results barely deviate from chance guessing. This would not explain the 14.3% difference between the two groups in the current study. Indeed, Experiment 1 of the current manuscript also shows how objective awareness scores diminish with a weaker contingency manipulation, already falling below the position-targets score with a 50% contingency manipulation. The idea that the nearly 40% correct responding in Experiment 2 is explained by spatial compatibility therefore seems untenable and the increases with proportion in Experiment 1 clearly indicate learning. It may nevertheless be worthwhile to explore other ways to make the two test phases more similar in future research to avoid any potential confounding (e.g., by using noncompatible or randomized mappings).
8. As a minor aside, responding directly to the note positions was, overall, faster than responding to note names. On the other hand, note-name identification could be made much faster with a verbal naming response, as in the musical Stroop studies mentioned in the Introduction (Grégoire et al., 2013, 2014a, 2014b, 2015, 2019; Grégoire & Poulin-Charronnat, 2019), perhaps a direction for future research with our learning adaptation.
9. In an ongoing project with another doctoral student of the first author, we have already found equally-robust learning with a MIDI piano response modality.