In an experimental study, we investigated how well novices can learn from each other in situations of technology-aided musical skill acquisition, comparing joint and solo learning, and learning through imitation, synchronization, and turn-taking. Fifty-four participants became familiar, either solo or in pairs, with three short musical melodies and then individually performed each from memory. Each melody was learned in a different way: participants from the solo group were asked via an instructional video to: 1) play in synchrony with the video, 2) take turns with the video, or 3) imitate the video. Participants from the duo group engaged in the same learning trials, but with a partner. Novices in both groups performed more accurately in pitch and time when learning in synchrony and turn-taking than in imitation. No differences were found between solo and joint learning. These results suggest that musical learning benefits from a shared, in-the-moment, musical experience, where responsibilities and cognitive resources are distributed between biological (i.e., peers) and hybrid (i.e., participant(s) and computer) assemblies.
Jordan and Perry are good friends. Growing up together, they share a mutual passion for a number of things, including music and music-making. Recently, Jordan came up with the idea of taking music lessons together, about which Perry was enthusiastic. While they did not know whether this form of learning was appropriate for them, or useful, they nevertheless wanted to give it a try and keep spending time with each other. However, they had little or no money to attend music classes. And even if they did, the only music teacher available lived far away from their homes. A solution, Perry suggested, might involve watching learning videos on the Internet, where expert musicians offer online courses as well as useful tips to help develop one’s musical skills more informally. Video after video, Jordan and Perry learned a variety of exercises and techniques that improved their musicianship in a number of ways. However, this kind of training left the two unsatisfied: they were not actually learning together—the process of learning remained inherently individual. Indeed, online and informal teaching resources are mostly designed for individual learners, and often based on imitation and repetition. How can they share their learning experience and acquire new musical skills together? What is the best strategy for novices to learn music from each other while taking advantage of the possibilities offered by technology? How can they optimize the reciprocal exchange of information and enhance their musical experience? And what musical skills are best acquired and developed together?
In this paper, we report data from an original behavioral study that can help advance some preliminary answers. The study addresses the question of how well novices can learn from each other in situations of technology-aided musical skill acquisition. We expected that reciprocal interaction between musically untrained individuals can lead to enhanced learning experiences and musical outcomes when it involves an active, in-the-moment, form of participation. This can help minimize the “cognitive burden of individual responsibility” (El Zein, Bahrami, & Hertwig, 2019), and help learners co-create a “we-space” (Krueger, 2011), that is, a shared cognitive niche where skills and mental resources are dynamically negotiated and co-determined as the musical activity unfolds. We thus assessed three different forms of technology-aided peer-to-peer musical learning: synchronization (playing the same musical pattern at the same time), turn-taking (playing different musical patterns one after the other), and imitation (playing the same musical pattern one after the other), and compared them with corresponding tasks based on individual musical learning. The latter involved individual participants learning the melodies in the same conditions (synchronization, turn-taking, and imitation), alone with a computer. As such, three types of novice-novice-computer interactions were tested and compared to analogous novice-computer interactions. This research builds on previous work on collaborative musical pedagogy (see Gaunt & Westerlund, 2013; Hanken, 2016; Nielsen, Johansen, & Jørgensen, 2018), and is inspired by the conceptual tools of ‘4E’ cognitive science, a cross-disciplinary approach that conceives of mental life as Embodied, Embedded, Extended, and Enactive (Menary, 2010; Newen, DeBruin, & Gallagher, 2018).
We suggest that putting these two research avenues together can help frame our work within a broader context involving both practice-oriented (i.e., pedagogical), and theoretical interests. This can contribute a new perspective on the interactive roots of musical learning, helping Jordan and Perry find a way to learn music together.
A Problem in Mind
While there is a long tradition in music scholarship to explore the various forms of participatory music-making across cultures and styles, questions about the capacity of novices to collaboratively generate novel musical skills have rarely been posed—particularly with regard to the psychological mechanisms involved in the process. A reason for this is that the most common learning approaches in Western contexts are based on sessions of master-apprentice tuition and individual training (Hallam, 1998). “[E]stablished as a continuing core activity in Western classical instrumental learning” (Creech & Gaunt, 2018, p. 145), this pedagogical strategy contributed significant benefits for the lives and musical development of the students (see Gaunt, 2008). For instance, individual tuition is particularly useful to foster the creation of personalized learning strategies, monitor the progress of the student over time, and stimulate verbal communication in a rich variety of situations involving close personal interactions (e.g., Barrett & Gromko, 2007). The psychological model adopted to explain the teacher-student interaction often involves a representational, correspondence-based, schema (see van der Schyff, Schiavio, & Elliott, 2016). Here, pre-given cognitive mechanisms are thought to respond to external perturbations (i.e., musical stimuli, verbal instructions from the teacher, etc.) via sub-personal, rule-based computational processes. The internal elaboration of such stimuli, then, gives rise to task-specific mental representations that assist and guide the student in various ways (see Hallam & Bautista, 2018; Lehmann, 1997). 
In other words, if there is an adequate correspondence between the internal psychological predisposition of the learner and the information being communicated verbally (or transmitted by the structural antecedents intrinsic to the musical stimulus), a representation with a contextual meaning dependent on the learner’s expertise can be formed (see Lehmann & Jørgensen, 2018).
Novices like Jordan and Perry, however, can hardly provide each other with the information necessary for novel skills to be developed in such a way. They lack the expertise of a teacher, and do not know how to enhance their reciprocal capacity to engage with the musical material to be learned. And even if information is available from external sources (e.g., a teacher, an instructional video, etc.), then it seems implausible that their reciprocal interaction could play any significant role in improving their performance. In fact, one might expect that an interaction between two or more novices while learning would somehow disrupt the linear mechanisms (elaboration of external information followed by the generation of a given behavioral outcome) of the process. Their focus should arguably remain bounded to their individual activity, allowing them to accurately represent their learning goal (e.g., playing a chord), and optimize the best behavioral strategy to achieve it (e.g., the most appropriate fingering solution). But when looking at concrete musical practices, musical skills appear to be often developed collaboratively. And indeed, one can find many music pedagogies specifically oriented toward joint learning:
“These range from well-established music education approaches developed in the 20th century by Kodaly, Orff, Dalcroze, and Gordon to national interpretations and developments of these approaches, traditional, intercultural approaches, and methods developed in the context of psychology research with specific transfer aims in mind. Such methods involve group learning, shared musical experiences, synchronization, imitation, and a range of other socially interactive behaviors that are common to ‘real-world’ social musical experiences, and could perhaps be used effectively and systematically in music intervention research” (Overy, 2012, p. 66).
In such contexts two or more individuals can jointly learn from a teacher, negotiating their skills and expertise as the lesson unfolds. In a recent qualitative study, Schiavio and colleagues (2018) explored the teaching dynamics behind similar forms of collective music lessons, showing how teachers sometimes “step back” to let their students more actively participate in, and guide, their own learning. Such participatory learning dynamics, it is suggested, are functional to the students’ musical development; by being “less present,” teachers aim to foster a shared sense of responsibility in the group. This way, skills and expertise are negotiated in the moment-to-moment contingencies of the continuous interactions among peers. Similar insights are illustrated in well-known work on informal musical learning, where novices are shown to collaboratively acquire and develop musical skills without prescribed instructions (e.g., Green, 2008). Novices are seen here to explore together possibilities, meanings, and challenges that emerge with the collaborative process in itself, rather than through the intrapersonal elaboration of external stimuli. It has been argued that settings where students are asked to actively participate in group musical activities may foster important benefits in terms of negotiating differences and stimulating trust and social understandings (Higgins & Mantie, 2013), improving critical thinking as well as social skills (Gokhale, 1995), and promoting processes of social inclusion (Welch et al., 2014).
The same focus on relationships is found among the main tenets of 4E cognitive science (Ward & Stapleton, 2012). The label indicates a school of thought that conceives of mental life as Embodied, Embedded, Extended, and Enactive. Putting together insights from theoretical biology, cognitive neuroscience, artificial intelligence, philosophy of mind, and ecological psychology, this framework trades the traditional view of cognition as information-processing occurring in the head, for a more integrated perspective in which mind is understood as an emerging property of the relationship between brain-body systems and the environment in which they are situated (see Gallagher, 2017; Stewart, Gapenne, & Di Paolo, 2010; Varela, Thompson, & Rosch, 1991). The move points to the constitutive role of (inter)active, sensorimotor experience for the living system’s cognitive economy: as organisms of different biological complexity can establish meaningful relationships with their niche via recursive patterns of action and perception, the boundaries between physiology, world, and mental life become blurred (Colombetti & Thompson, 2008). Such relationships include the exchange of information between agents (e.g., speech), the optimization of certain behavioral dispositions to reach a meaningful objective (e.g., the quest for nutrition for an organism), the coupling between living systems and non-biological devices (such as computers or musical instruments), and many other possibilities. If cognitive states are fluidly integrated with bodily and ecological resources on the basis of these continuous sensorimotor loops, then explanation of the mental should entail a focus on such agent-world dynamical couplings (see Barrett, 2011; Hutto & Myin, 2013; Wilson & Golonka, 2013). 
Recent scholarship in music cognition and music pedagogy increasingly draws on similar insights, moving away from brain-centered perspectives to explore the broader network of musical bodies, brains, and niches from which mental life (and music cognition) emerges (see Bowman, 2004; Clarke, 2005; Krueger, 2014; Schiavio & van der Schyff, 2016, 2018).
In this changing landscape, empirical research inspired by the conceptual resources of the 4E approach is still rare. The present contribution aims to address this gap by quantitatively investigating two aspects of the theory that may have strong implications for musical skill acquisition and development. These are: 1) the primacy of sensorimotor experience, that is, the idea that body and action play a fundamental role in driving cognition, including learning; and 2) the coupling between agents and their environment, namely, the capacity of living systems to functionally integrate social and physical elements of their contingent milieu to achieve a specific (cognitive) task. Before presenting our study, however, we first need to explore in more detail the main features of musical settings based on collective activity and describe three main learning modalities, which are then tested empirically.
Joint Action in Practice
In learning settings, joint activities where active music-making is prioritized (whether facilitated by a teacher or emerging spontaneously within the group of novices) often involve one or more of the following modalities: synchronization, turn-taking, and imitation.
With regard to synchronization, a first thing to notice is how pervasive synchronized behavior is in human and non-human life. Consider, for example, how humans tend to synchronize their gait when walking together (van Ulzen, Lamoth, Daffertshofer, Semin, & Beek, 2008), or how non-human animals can also coordinate with each other synchronously to produce periodic signaling (Patel, 2008, p. 408). Acting in synchrony with another person can increase cohesion and social affiliation among members of a group (Hove & Risen, 2009; Stupacher, Maes, Witte, & Wood, 2017), also inducing a sense of compassion and trust (Launay, Dean, & Bailes, 2013; Valdesolo, Ouyang, & Desteno, 2010). The tendency exhibited by listeners to move together in time with a musical beat is considered a human universal, and shared musical experiences can enhance prosocial skills in both children and adults (Kirchner & Tomasello, 2010). Producing synchronous behavior in musical contexts, however, can be complicated. Interesting empirical findings by Endedijk and coworkers (2015) showed that when children between 2 and 4 years old play drums in dyads, older children are able to coordinate and adapt to their partner’s drumming significantly better than younger ones, even though all subjects were able to produce a steady beat. This aligns with studies that suggest a facilitation for the production of synchronous behaviors in individuals with musical experience (e.g., Drake, Jones, & Baruch, 2000). So, while the ability to “lock in” to temporal patterns with other individuals is understood as a natural resource that can be highly useful for the “development of executive functions and far transfer effects” (Miendlarzewska & Trost, 2014), its functional role in novices’ peer learning needs to be assessed. Can playing in synchrony with each other lead to enhanced musical outcomes in nonmusicians when compared to other modes of learning?
Joint musical activities, of course, involve more than simply playing in synchrony with others. Rather, such contexts are often based on meaningful exchanges, where implicit rules and skills are negotiated and reciprocally developed during performance. Because of this, training based on co-regulated turn-taking behaviors may also be beneficial. Like synchronized behavior, turn-taking can be found early in life. It has been shown, for example, that young infants are highly sensitive to rhythmic patterns of vocal exchanges based on turn-taking (Murray & Trevarthen, 1985). Trevarthen (1999), among others, suggests that infants and mothers are united by a single rhythm: a turn-taking in a slow Adagio (a beat every 0.9 seconds). In a similar vein, Gratier and colleagues (2015) presented evidence of early turn-taking behaviors in young infants (8 to 21 weeks), suggesting a key role for joint activity and preverbal interaction in the development of socio-cognitive skills, communication, and learning. This is particularly interesting when considering how turn-taking is instead usually associated with the dynamics of spoken conversation (Stivers et al., 2009). This joint activity, indeed, “enables the fluency and continuity of natural conversation and entails regulation of the timings of turns at talk. Over different languages and cultures, the conversation participants share an ability to exchange the speakership in tens of milliseconds and, for most of the time, without overlaps” (Hirvenkari et al., 2013). But can learning modalities based on turn-taking help two nonmusicians improve their musical skills? Are there benefits in taking turns while learning to perform a musical phrase?
Finally, imitative behaviors are perhaps the most used tool for learning with a teacher. Here, “[c]ues such as direct eye gaze and pointing signal that the expert is about to communicate something which is learning-relevant and generalizable” (McEllin, Knoblich, & Sebanz, 2018), allowing the novice to parse actions into subunits. Building on previous work by Byrne and Russon (1998), Leman (2008) distinguishes between those imitative processes that copy the organization of actions and their goal(s), and those that copy the kinematics of movements. The former (often called “true” imitation) is particularly suited for learning, as it stimulates the agent to produce an action as a function of the goal. Imitating goals might build on initial patterns of copied behavior that are then optimized through repetition and performance. While this solution might work well with expert teachers, who can show what is to be copied with confidence and precision, novices might face more difficulties in imitating each other, struggling to extract the “correct” learning-relevant information. Nevertheless, it may still serve as a complementary resource for elementary goals. By imitating each other, for example, peers can become aware of novel possibilities for action and recursively integrate such expertise with each other.
As we saw, these three modalities of joint action can play an important role in music training, social life, and development more generally, stimulating the emergence of various skills at different levels and timescales. Conceiving of musical learning as an individual process based on the realization of inner representations, however, might play down the moment-to-moment, interactive dynamics of such joint behaviors. Traditional models based on a representationalist doctrine, in other words, might display important limitations when addressing the collective emergence of skills that characterize many participatory forms of musical practices. Inspired by the main insights of 4E cognition, we thus designed an experiment that involves a learning task based on action; examines the reciprocal interaction between co-actors; and includes the manipulation of a non-biological device to aid learning (i.e., a computer). We assessed the three learning modalities discussed here (synchronization, turn-taking, and imitation), and systematically compared collective vs. solo learning with both including computer support. Because 4E accounts help eschew the dichotomy between internal psychological domains and external “objective” environment in favor of a more dynamic story based on the constant interplay of brains, bodies, and world, we expected that active forms of reciprocal participation can determine stronger agent-world couplings, leading to more accurate musical outcomes.
Fifty-four healthy participants (40 females, 14 males, mean age = 23.1 years, SD = 4.9) were recruited for this study via the University of Graz mailing list. Eighteen volunteers were randomly assigned to the solo learning group, while the remaining thirty-six formed eighteen pairs for the duo learning conditions. All participants were nonmusicians: none of them regularly performed music, attended music lessons or music schools, or informally learned to play a musical instrument. They reported listening to music on average 12.1 hours per week (SD = 8.3), with their favorite genre being pop (n = 17), rock (n = 12), indie (n = 5), hip-hop (n = 3), electronic music (n = 2), R'n'B (n = 2), classical (n = 2), folk (n = 2), tango (n = 1), country (n = 1), or jazz (n = 1). The remaining participants (n = 6) reported no preferences for specific genres. The thirty-six participants of the duo group had known each other for an average of 27.28 months. Here, the high standard deviation (SD = 56.37) is due to the fact that two participants were brothers and two participants had known each other for nine years, whereas the remaining participants were unknown to each other (n = 12), or had known each other for less than four years (n = 20). The study was conducted at the Centre for Systematic Musicology of the University of Graz, Austria. All procedures were approved by the Ethical Committee at the University of Graz and were in accordance with the statements of the Declaration of Helsinki. Participants were monetarily compensated for their involvement in the study and provided written informed consent.
Three melodies were composed by AS for the purpose of this study. All melodies involved the same limited set of pitches (F♯, G♯, and A♯), in different order, style, and rhythm. Each melody was characterized by a defining feature: Melody 1 was based on a simple melodic pattern, Melody 2 included simultaneous tones, whereas Melody 3 was rhythmically challenging. The purpose of such variety was to keep the task engaging for our participants throughout the experiment. The melodies are shown in Figure 1.
The melodies were performed by RP on a digital piano (Yamaha Clavinova CLP370), connected to a computer (MacBook) via USB. The audio was recorded with the software Reaper64. Videos were recorded with a JVC Everion digital camera placed above RP’s hand (see Figure 2). The three resulting video recordings (one per melody) were edited with iMovie software to assist participants in the different learning conditions. The videos were presented to the participants over a laptop (another MacBook) placed over the digital piano. Participants only received instructions using these videos. Music notation was not shown.
Prior to the testing, participants were assigned to one of two groups: solo (n = 18) or duo (n = 36). Participants of the solo group took part in the whole study individually, while participants of the duo group were paired with another participant (see Figure 3). The experiment involved three blocks, each divided into a learning phase and a performance phase. For the entire duration of the experiment, a researcher was present in the experimental room to assist participants, monitor the data collection, and ensure each procedure was carried out correctly.
A learning phase consisted of one of the three piano-based learning conditions (with a maximum duration of twelve minutes each): synchronization, turn-taking, and imitation. Learning phases based on synchronization involved participants playing along with the video (in the solo group), or with video and partner (in the duo group). Turn-taking and imitation conditions were divided into two sections with a maximum duration of six minutes each to allow an alternation of the different segments being performed (see Figure 4). Turn-taking consisted of the alternation of participant and computer (solo group), or participant and participant (duo group), for the realization of a musical phrase. In the imitation condition, participants were instructed not to play along with the videos. In the solo group, the video was manipulated such that after each section, a pause allowed the participants to repeat on the piano the short segment just seen on video. In the duo group, the video was only available to one participant. Here the experimenter manually changed the position of the monitor before the learning phase started so that only one participant could access the video information. For the same purpose, headphones were used in this condition to convey auditory information to only one participant. Each learning phase was associated with one of three learning conditions and one of three melodies. The presentation order of the learning conditions and their association with the melodies were pseudorandomized to avoid order artifacts. All learning phases were video recorded with a JVC Everion digital camera placed around two meters behind the participant(s). Videos were coded offline by a coder blind to the study. A representation of all learning phases is given in Table 1.
The coder watched all videos and reported the time spent by participants: 1) talking to each other (duo) or to the researcher (solo); 2) playing according to the learning condition; 3) interacting with the computer (e.g., stopping the video, replaying a specific passage, etc.); 4) observing the instructional videos; and 5) resting (e.g., periods of silence or inaction, where participants tried to remember the motor sequence).
Each learning phase was followed by a performance phase, in which the participants’ ability to perform the learned melodies was assessed. Participants of both groups individually performed the learned melody without any visual or acoustic information—in the same octave register as in the learning phase—in the original tempo (60 BPM), and in a faster tempo (90 BPM). Participants had the chance to perform each melody three times in each tempo, and then chose the two best performances (one in the original tempo, one in the fast tempo). Best performances were chosen to give participants the best opportunity of success, reducing the inclusion of noisy data due to performance errors. An additional benefit is a likely reduction of anxiety to perform accurately. The faster tempo was chosen to challenge participants and test whether learning was robust and adaptable. Playing at a faster tempo may help to bring out weaknesses in learning, more so than when exactly the same task was repeated. The tempo was given by a metronome prior to the performances.
Having obtained written informed consent, participants (individuals or pairs) were invited to enter the experimental room and were instructed about the aims of the study (i.e., they were told that their ability to perform different melodies after three learning phases would be tested). Participants were comfortably seated in front of the digital piano, above which the laptop was placed. Depending on melody and condition, the corresponding instructional video was looped on the laptop for a total of twelve minutes. In the solo group, participants played the exact same notes as in the original melody. In the duo group, participants played one octave lower or higher, depending on where they were seated. During the learning phase, participants were instructed to interact with the laptop as they liked in order to optimize their learning. For example, they were free to stop the video and focus on a particularly difficult passage, or talk to their partner about a given issue. If they felt ready before the twelve minutes ended, they could ask the experimenter to terminate the learning phase. After the third and final performance phase, each subject completed a questionnaire involving a total of 13 items (solo) or 16 items (duo). These comprised an initial section on the background and music-listening preferences of participants, followed by a series of Likert-scale items designed to assess the felt quality of their learning experience. The questionnaire for duos also featured the 7-point single-item “Inclusion of Other in the Self” (IOS) scale, originally developed by Aron and colleagues (1992), and since then adopted to measure how close respondents feel to another individual or social group (e.g., Himberg et al., 2018; Stupacher et al., 2017). A representation of the experimental procedure is depicted in Figure 5.
Two measures of accuracy of the performances of the melodies were employed as dependent variables, which captured pitch similarity and temporal similarity.
The similarity of notes played by the participants in the final performances and the original melodies was calculated with the edr (edit distance on real signals) function in MATLAB (MathWorks, Natick, MA). Notes were represented as MIDI numbers. The edr function returned the number of notes that a participant played wrong, missed, or added in comparison to the original melodies. The edr outcome was divided by the total number of notes in the original melodies, to normalize pitch similarity. In the duo group, the edr function was additionally used to assess the pitch similarity of the two performances of the paired participants. Here, the mean of the number of notes played by the participants was used to normalize the edr outcome. As Melody 2 mostly consisted of chords, notes played in a time window of 60 ms were considered as part of a chord and the order of these notes was irrelevant. To give an example, if the chord of the original melody consisted of the MIDI notes 66, 68, and 70 being played at the same time, and the participant played the notes 68, 70, and 66 within 60 ms, the order of these notes was marked as correct.
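The pitch measure described above can be sketched as follows. This is a Python illustration, not the MATLAB edr function used in the study: it swaps in a plain Levenshtein distance on integer MIDI numbers. The names group_chords, edit_distance, and normalized_pitch_distance, the (onset, pitch) input format, and the choice to anchor each 60 ms window at the first onset of a group are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Note = Tuple[float, int]  # (onset time in ms, MIDI pitch); hypothetical input format


def group_chords(notes: Sequence[Note], window_ms: float = 60.0) -> List[int]:
    """Return pitches in onset order; pitches inside one chord window are sorted,
    so the order in which chord tones were struck does not matter."""
    ordered = sorted(notes)  # sort by onset, then pitch
    pitches: List[int] = []
    i = 0
    while i < len(ordered):
        start = ordered[i][0]
        j = i
        # Assumption: a chord is every note within window_ms of the group's first onset.
        while j < len(ordered) and ordered[j][0] - start <= window_ms:
            j += 1
        pitches.extend(sorted(p for _, p in ordered[i:j]))
        i = j
    return pitches


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance: number of wrong, missing, or added notes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_pitch_distance(performed: Sequence[Note],
                              original: Sequence[Note],
                              window_ms: float = 60.0) -> float:
    """Edit distance divided by the number of notes in the original melody."""
    reference = group_chords(original, window_ms)
    played = group_chords(performed, window_ms)
    return edit_distance(played, reference) / len(reference)


# Made-up data: one chord (66, 68, 70) followed by a single note.
original = [(0.0, 66), (0.0, 68), (0.0, 70), (1000.0, 68)]
performed = [(5.0, 68), (20.0, 70), (40.0, 66), (1010.0, 66)]  # last note is wrong
score = normalized_pitch_distance(performed, original)
```

In this made-up example the chord tones are struck in a different order but within 60 ms, so only the final wrong note counts, and one error over four reference notes gives a score of 0.25.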
The similarity of the time series in performed and original melodies was calculated with the dtw (distance between signals using dynamic time warping) function in MATLAB. The function stretched the performed and original melodies such that the sum of all asynchronies between the two melodies was the smallest. This was done to account for performances that were played too fast or too slow. The outcome was the sum of all asynchronies between the performed and original melody in milliseconds. To normalize temporal similarity, the sum of the asynchronies was divided by the absolute time of the original melody. In the duo group, the dtw function was additionally used to assess the temporal similarity of the two performances of the paired participants. Here, the mean of the absolute time of performances was used to normalize the dtw outcome. For the calculation of temporal similarity between the original stimulus and the faster performance (count in metronome with 90 BPM), the time series of the original stimulus was multiplied by 0.6667 before using the dtw function.
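A minimal Python sketch of the temporal measure, again standing in for the MATLAB dtw function rather than reproducing it. It assumes that each melody is represented by its note onset times in milliseconds, that the local cost is the absolute difference between onsets, and that "absolute time" means the span from first to last onset of the (tempo-scaled) reference. The text specifies none of these details, and the function names are hypothetical.

```python
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic dynamic time warping: the smallest summed absolute difference
    over all monotonic alignments of the two sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat an element of y
                                 cost[i][j - 1],      # repeat an element of x
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def normalized_temporal_distance(performed_ms: Sequence[float],
                                 original_ms: Sequence[float],
                                 tempo_scale: float = 1.0) -> float:
    """DTW distance divided by the duration of the tempo-scaled reference.
    Use tempo_scale = 60 / 90 (about 0.6667) for the faster performances."""
    reference = [t * tempo_scale for t in original_ms]
    duration = reference[-1] - reference[0]  # assumed meaning of "absolute time"
    return dtw_distance(performed_ms, reference) / duration


# Made-up onsets: four notes, one per second at 60 BPM.
original_ms = [0.0, 1000.0, 2000.0, 3000.0]
late_second_note = [0.0, 1100.0, 2000.0, 3000.0]
slow_score = normalized_temporal_distance(late_second_note, original_ms)

# A perfectly timed 90 BPM performance matches the scaled reference exactly.
fast_perfect = [t * (60 / 90) for t in original_ms]
fast_score = normalized_temporal_distance(fast_perfect, original_ms, tempo_scale=60 / 90)
```

Scaling the reference by 60/90 before alignment mirrors the multiplication by 0.6667 described in the text, so that a performance at the faster tempo is not penalized for being faster than the stimulus.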
The distributions of pitch similarity outcomes (i.e., the normalized edit distances) were heavily right-skewed. This is because many comparisons of performance and original melody revealed no wrong notes (i.e., a value of zero). Thus, Wilcoxon signed-rank tests for paired samples were used to assess differences between the three learning conditions. As a visual inspection of the boxplots of normalized edit distances suggested no interaction between condition and performance tempo (Figure 6), comparisons between the conditions were based on both performance tempi. Effect sizes were computed as r = Z/√N. The temporal similarity outcomes (i.e., the normalized Euclidean distance) were ln-transformed to meet the assumption of normality and entered into repeated measures ANOVAs with the factors Condition (imitation, synchrony, turn-taking) and Performance tempo (original, fast). Post hoc comparisons were Bonferroni corrected.
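As a worked illustration of the effect size formula r = Z/√N, the short Python sketch below uses the Z value reported for the duo group's imitation versus synchrony comparison. The value N = 36 is an assumption made for this example (the text does not state N), and the function name is hypothetical. The absolute value is taken because the reported r values are positive even where Z is negative.

```python
import math


def wilcoxon_effect_size(z: float, n: int) -> float:
    """Effect size r = |Z| / sqrt(N) for a Wilcoxon test statistic."""
    return abs(z) / math.sqrt(n)


# Z = 5.22 is taken from the reported results; N = 36 is an assumed value.
r = wilcoxon_effect_size(5.22, 36)
```

Under that assumption the result rounds to .87, which agrees with the effect size reported for this comparison.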
Solo group - Comparison of performance and original stimulus
In the imitation condition, participants played more wrong notes than in the synchrony condition (Z = 2.03, p = .042, r = .34). The differences between imitation and turn-taking (Z = 0.47, p = .646, r = .08), and synchrony and turn-taking (Z = -1.13, p = .264, r = .19) conditions, were nonsignificant.
Duo group - Comparison of performance and original stimulus
In the imitation condition, participants played more wrong notes than in the synchrony condition (Z = 5.22, p < .001, r = .87) and the turn-taking condition (Z = 3.51, p < .001, r = .59). In the synchrony condition, participants played fewer wrong notes than in the turn-taking condition (Z = -2.90, p = .003, r = .48).
Duo group - Comparison of performances in duos
In the imitation condition, participants played more wrong notes than in the synchrony condition (Z = 4.08, p < .001, r = .68) and the turn-taking condition (Z = 2.85, p = .004, r = .48). In the synchrony condition, participants played fewer wrong notes than in the turn-taking condition (Z = -2.11, p = .034, r = .35).
Solo vs. duo group
Three individual Mann-Whitney U tests on the normalized edit distances (one per learning condition) revealed no significant difference between the solo and the duo performances (all p > .13).
Solo group - Comparison of performance and original stimulus
An ANOVA on the ln-transformed normalized Euclidean distances revealed no significant main effect of condition, F(2, 34) = 1.33, p = .278, η2 = .07, or performance tempo, F(1, 17) = 3.66, p = .073, η2 = .18, and no interaction between the two factors, F(2, 34) = 0.80, p = .804, η2 = .05.
Duo group - Comparison of performance and original stimulus
An ANOVA on the ln-transformed normalized Euclidean distances revealed a significant main effect of condition, F(2, 70) = 8.47, p < .001, η2 = .20, with larger distances for imitation than for synchrony (p = .001, d = .65) and turn-taking (p = .041, d = .43) and no significant difference between synchrony and turn-taking conditions (p = .361, d = .27). Euclidean distances also differed significantly between performance tempi, F(1, 35) = 6.61, p = .015, η2 = .16, with smaller distances for the original performance tempo compared to the faster performance tempo (p = .015, d = .43). The interaction between the two factors was nonsignificant, F(2, 70) = 1.55, p = .220, η2 = .04.
Duo group - Comparison of performances in duos
An ANOVA on the ln-transformed normalized Euclidean distances revealed a significant main effect of condition, F(2, 34) = 4.63, p = .017, η2 = .21, with larger distances for imitation than for synchrony (p = .006, d = .86). The other two post hoc comparisons were nonsignificant (both p > .30). No significant difference was found for performance tempo, F(1,17) = 0.11, p = .743, η2 = .01, and the two factors showed no significant interaction (F(2,34) = 0.54, p = .588, η2 = .03).
Solo vs. duo group
An ANOVA on the averaged ln-transformed normalized Euclidean distances of original and fast tempo performances with the within-subjects factor learning condition and the between-subjects factor group (solo vs. duo) revealed a significant main effect of condition, F(2, 104) = 6.00, p = .003, η2 = .10, but no significant main effect of group, F(1, 52) = 1.05, p = .311, η2 = .02, and no significant interaction between condition and group, F(2, 104) = 1.23, p = .297, η2 = .02. Distances were smaller for synchrony compared to imitation (p = .002, d = .49). The other two post hoc comparisons were nonsignificant (imitation vs. turn-taking, p = .169, d = .27; synchrony vs. turn-taking, p = .084, d = .31).
On average, participants of the duo group engaged in learning phases based on synchronization for 9 min 40 s (SD = 2.48); imitation for 9 min 55 s (SD = 2.41); and turn-taking for 10 min 53 s (SD = 2.52). In the solo group, participants engaged with synchronous learning for an average of 9 min 37 s (SD = 3.09); learning based on imitation for 8 min 40 s (SD = 2.84); and learning based on turn-taking for 10 min 16 s (SD = 2.43).
Activities during learning phase
Time spent playing piano: The analyses of the video coding data (Table 2) indicate that in the solo as well as in the duo group, participants spent less time playing in the imitation condition (Solo: M = 2 min 44 s, Duo: M = 4 min 58 s) than in the synchrony (Solo: M = 6 min 12 s, Duo: M = 6 min 44 s) and turn-taking (Solo: M = 3 min 13 s, Duo: M = 7 min 13 s) conditions. In the solo group participants also spent less time playing in the turn-taking condition compared to the synchrony condition. Within each learning condition, the time spent playing was not significantly correlated with pitch or temporal performance measures (Table 3).
Time spent talking: The only significant difference of talking time occurred in the comparison of synchrony and turn-taking conditions in the duo group (M = 2 min 12 s, and M = 1 min 30 s, respectively).
Time spent interacting with the computer: In the duo group, participants spent more time manipulating the video on the computer (for example, repeating a specific section of the video, or stopping the loop) in the synchrony (M = 18 s) compared to the imitation (M = 5 s) and turn-taking (M = 3 s) conditions. In the solo group, participants spent more time interacting with the computer in the synchrony (M = 21 s) compared to the turn-taking (M = 2 s) condition, whereas imitation (M = 8 s) did not differ significantly from the other two conditions.
Time spent watching the video: In the duo group, participants spent most time watching the instruction video in the turn-taking condition (M = 10 min 11 s), followed by synchrony (M = 8 min 28 s) and imitation (M = 5 min 1 s). In the solo group, participants spent more time watching the video in turn-taking (M = 9 min 31 s) and synchrony (M = 9 min 24 s) compared to imitation (M = 5 min 30 s).
We should note here that one could be engaged in two or more of such activities at the same time: e.g., one can talk while playing and watching the video.
Subjective ratings of learning and performance
Participants of both groups completed a written post-test questionnaire based on a series of Likert-scale items designed to assess the felt quality of their learning experience (range 1–7). On average, the overall enjoyment of learning was rated 6.25 (SD = 0.84), and satisfaction with their own learning 4.81 (SD = 1.38). Confidence about learning was 4.88 (SD = 1.65), while the feeling of being nervous was 4.22 (SD = 1.72). Confidence of performance was 4.81 (SD = 1.49), and nervousness while performing 4.20 (SD = 1.89). Finally, an average of 5.96 (SD = 1.03) was reported with regard to motivation to learn music in the future. These ratings are averaged across both groups and did not differ significantly between the solo and duo group (p values ranging from .14 to .99). Participants of the duo group reported a level of satisfaction with their partner’s learning ability of 5.88 (SD = 0.69) and rated the quality of their reciprocal interaction as 5.80 (SD = 0.84). The Inclusion of Other in the Self (IOS; Aron et al., 1992) in the duo group was rated as M = 4.38 (SD = 1.49) on a scale from 1 (no connectedness) to 7 (very high connectedness). IOS ratings were not significantly correlated with the performance measures (pitch similarity and temporal similarity). All participants were also asked to indicate the learning condition they preferred: synchrony was the top choice (n = 24), followed by imitation (n = 16) and turn-taking (n = 14).
Discussion and Conclusion
Our study addressed the question of whether technology-aided collaborations rooted in reciprocal interactions can help enhance musical learning in nonmusicians. We operationalized collaborative learning as three main learning conditions based on synchronization, imitation, and turn-taking; we then compared participants involved in joint learning with participants involved in corresponding tasks based on individual learning. In assessing post-training individual performances, we found that pitch and temporal cues of the newly learned musical excerpts were generally more accurate when participants engaged in synchronous learning and turn-taking than when they engaged in imitation. As no significant differences in performance measures (i.e., correct pitch and timing) were found between solo and duo groups, our results suggest that novices can maximize their learning in both individual and collective settings when they actively (co-)participate in the generation of musical material, rather than imitating external stimuli. The lack of significant differences between groups might be associated with the key role of computers in the learning process. Because of the intimate coupling developed between learners and the device they interact with, the functional role of the latter during individual training becomes comparable to that played by peers. While this does not imply that, phenomenologically speaking, peers and technology are “present” to each other in the same way, the proposed insight is consistent with the idea of fluid integration of biological and non-biological systems at the core of the extended mind thesis.
A suggestive implication of this scenario points to an original account of how learning responsibilities are distributed within the large cognitive assembly: as computers arguably become part of the learner’s cognitive ecology, the responsibilities peers can develop for each other during collaborative tasks might be efficiently offloaded into the device in the absence of the partner; this allows the hybrid network to be equally conducive to optimal learning when compared to peer-to-peer occurrences, adequately compensating for the “loss” of one of the peers.
In the performance phase, participants of the duo group played more correct notes when the learning was done in synchrony with others, when compared to learning based on turn-taking and imitation. Looking at temporal accuracy, we found that participants were more precise when the melodies were learned in synchrony or turn-taking, when compared to imitation. We suggest that the preference for contextual collaborations based on a moment-to-moment interaction points to a sense of shared responsibility⁴: when engaging in synchronous or turn-taking behaviors with a partner, a single mistake can immediately compromise the shared outcome of a given musical passage. Conversely, in imitative tasks, one could make mistakes, and repeat the pattern to be copied by the other without too much worry. Surprisingly, though, the latter makes the actors less prone to learn effectively. With regard to this point, one could ask whether our results (with imitation being the least efficient learning condition overall) might be determined by the level of expertise of peers: as they are both novices, it is not surprising to see no real benefit when it comes to imitative learning. You simply cannot show your partner how to play a melody on the piano correctly if you do not know how to do it. If there were an expert, instead, the quality of the information being transmitted from one to the other would certainly have been better, allowing the imitator to learn efficiently without any need to play in synchrony or in turn-taking with the other. If this were the case, however, we would likely have witnessed data pointing to the benefits of imitation in individual settings, where the novice can freely observe the video and learn from the expert playing in it. Instead, our data show that participants of the solo group also did not benefit from learning based on imitation.
Another possible confound is that synchronization might be regarded as the condition with the most reinforcement of the instructional video, which may have led to more effective learning. However, in both collective and individual learning, participants spent most time watching the instruction video in the turn-taking condition, suggesting that even in situations where one does not have to play continuously, technologies can provide additional audio-visual support to enhance learning. It should also be noted that participants in the duo group (like those in the solo group, reported below) spent more time playing the piano during learning phases based on turn-taking and synchronization when compared to imitation. However, this potential confound was arguably mitigated by the possible flexibility exhibited by imitative behaviors involving two participants (when compared to learning based on synchronization and turn-taking, as well as to imitations based on repeating visual information delivered via computer, such as those involved in individual learning). Moreover, the total exposure to the melodies was not reduced during imitation, as the latter included being visually informed about aspects of the stimulus that may not be easily captured by playing only.
Participants of the solo group played more wrong notes in performances following imitative training than in those following synchronous learning; and although no significant differences emerged when comparing temporal accuracy across learning conditions, our results remain somewhat surprising given that much musical practice in the Western tradition entails forms of imitative behavior, where experts produce and show an exemplary model to novices, who are then asked to imitate. A factor that could possibly have contributed to this result is that learning phases based on imitation and turn-taking (as a direct consequence of the experimental design) involved less actual playing time than learning based on synchronous playing. However, the segmentation of the musical material into chunks, central to imitative learning and turn-taking, allowed our participants to focus on specific issues of the musical passages being performed, which could not have been addressed within other settings. For instance, in light of said focus on chunks rather than overall phrases, in imitative settings one could more easily discover and repeat a particularly challenging fingering configuration. This might have compensated for the difference in time spent playing. Moreover, individual performances measured after learning based on turn-taking (where learning is also based on different chunks rather than continuous playing) were generally more accurate than those analyzed after imitative learning; this suggests that what matters is the quality (i.e., the condition) of learning, rather than time spent on the task (Williamon & Valentine, 2000).
Constituting Cognitive Assemblies for Musical Action and Learning
When looking at both individual and collective learning (with the notable exception of temporal accuracy in the solo group), it appears that musical skills are better acquired when the target melodies are performed collaboratively with the video or with the other peer, rather than linearly observed, internalized, and transformed into behavioral outcomes as in the imitation condition. We argue that this resonates with existing literature that stresses the role of collaboration, interaction, and co-presence for cognitive life more generally (see De Jaegher & Di Paolo, 2007; De Jaegher, Di Paolo, & Gallagher, 2010; Di Paolo, Cuffari, & De Jaegher, 2018). Along these lines, it has been argued that information concerning jointly performed activities might be processed more efficiently when compared to other information, resulting in enhanced accuracy for remembering and performing a given task (see Mesoudi, Whiten, & Dunbar, 2006; Smith & Semin, 2004). This might depend on the kind of co-regulation occurring “when both partners are responsive to mutual influence, resulting in the emergence of new information not previously available to participants prior to their joint engagement” (Krueger, 2011, p. 645). We suggest synchronous learning and turn-taking to be good examples of such dynamical interplay rooted in bodily action, and achieved via back and forth processes of reciprocal causation (Clark, 1997).
This is consistent with the main insights from 4E cognitive science discussed above: as living systems create their own experience through action rather than through representational recovery, much of their cognitive life is dynamically looped with social and physical resources of their bodies and surrounding environments (Chemero, 2009; Thompson, 2007; see also Malafouris, 2013, 2015). The intersection of gestures and actions emerging from learning settings based on synchrony and turn-taking appears to facilitate the collective and individual acquisition and development of skill (see Schiavio, Gesbert, Reybrouck, Hauw, & Parncutt, 2019), being implemented by “ongoing feedback loops that transform our cognitive profile in real-time and help us negotiate complex cognitive tasks” (Krueger, 2019, p. 48). This is particularly interesting from an “extended mind” perspective, which is increasingly applied to musical contexts (Cochrane, 2008; Kersten, 2017; Ryan & Schiavio, 2019): the theory holds that under certain conditions, living systems can realize part of their cognitive processes thanks to various forms of external scaffolding involving physical and social elements (Clark & Chalmers, 1998). Consider how we use musical notation to facilitate the performance of a piece that might be too demanding to be remembered. It is argued that because we use the information stored on the musical score as if the same information were stored in our biological memory, there is a functional similarity between “internal” and “external” domains, which can enrich our cognitive capacity (e.g., remembering a non-written fingering configuration while playing music written on the score). 
As such, the achievement of a cognitive task can often depend on the fluid integration of biological and non-biological resources forming a hybrid cognitive assembly.⁵ In social situations, this might involve the complex musical interplay brought forth by groups such as orchestras or ensembles, where the reciprocal interaction of all performers constitutes the final musical outcome. With regard to learning music, our study shows that the continuous integration of social and physical resources is more effective when it involves a moment-to-moment, online participation, in which action is prioritized over observation and repetition. This gives rise to the constitution of a distributed cognitive system where biological (peers) and non-biological (computers) resources are functionally coupled, possibly leading to a reduced cognitive load. If such resources are partially distributed in a shared space for musical action, tasks can be achieved more easily, resulting in enhanced musical outcomes. Such insights align with pedagogical perspectives based on an “action-first” philosophy, which place major emphasis on learning settings where students are encouraged to act musically together, and to explore their creativity, from the very beginning of their musical journey (see e.g., Borgo, 2005, 2007; Schiavio, van der Schyff, Gande, & Kruse-Weber, 2019).
Finally, our study provides further empirical grounding to an existing cross-disciplinary line of research that stresses the importance of synchrony for human artistic and social behavior (see e.g., Bowling, Herbst, & Fitch, 2013; Merker, 2000). In evolutionary terms, for example, it is argued that the “prosocial consequences of interpersonal synchrony conferred a fitness advantage on individuals in groups that practiced music” (Ravignani, Bowling, & Fitch, 2014). The importance of synchrony has also been addressed from an ontogenetic standpoint: as reported by Baimel and co-workers (2015), in learning how to synchronize with others or through spatiotemporal coordination to an external stimulus, young children can successfully (re)define the boundaries between self and other, “while simultaneously allowing for effective navigation of those boundaries in fostering efficient interpersonal coordination” (Baimel, Severson, Baron, & Birch, 2015). In all, it can be argued that synchrony contributes to the development of a shared space for action, a performative niche in which agents can co-create a unique experience, and consequently learn new skills. In intersubjective contexts, this aligns with experimental work from Paladino and colleagues (2010) on self-other merging, where it was found that the ability to distinguish one’s own face from that of a person with whom one has been synchronized is affected by synchronous multisensory stimulation. As distinctions between self and other are blurred, cognitive resources can be distributed across agents more fluidly, giving rise to enhanced behavioral outcomes. Overy and Molnar-Szakacs (2009), among others, argue that multiagent synchrony can constitute a shared musical environment where the prediction of others’ behavior is facilitated (see also Schaefer & Overy, 2015). Here music can convey a sense of agency: a feeling of presence of the other that can help anticipate and understand each other.
And indeed, “the experience of being synchronized together in time, and yet with a musical, human flexibility and variety creates a powerful sense of togetherness […]. The emerging sound is a group sound, almost ‘larger than life,’ created by a sense of shared purpose” (Overy & Molnar-Szakacs, 2009, p. 495).
We should note here that our data failed to show a relation between performance quality (most importantly temporal synchrony) and interpersonal connectedness as measured by the IOS scale. A potential reason for this might involve the focus on individual performance throughout the study. While participants in the duo group learned together, they performed individually: such discontinuity might have hindered social connectedness in the reported form. Having participants perform together, however, might have introduced additional confounding effects, such as nervousness about playing in front of the peer. This brings us to one of the main limitations of the study, that is, a lack of focus on the emotional aspects involved in the dynamics of learning and interacting. This is indeed a fruitful research avenue, which features a growing number of studies in the cognitive psychology and neuroscience of social collaboration. Here, as reported by Clark and Dumas (2015), recent work in the field provides additional evidence that concerns the recruitment of the mesolimbic dopamine reward system during peer-interaction, pointing to a mutual feeling of fulfillment and pleasure in shared activity (Pfeiffer et al., 2014; Redcay et al., 2010). As a number of recent studies show how emotionality contributes to the development of the motivational cues adopted to facilitate coordination (see Michael, 2011), new empirical work might address this point more clearly, exploring how musical emotions play a role in driving learning processes individually and collaboratively, and how they contribute to form cognitive assemblies with biological and non-biological tools from the environment (see also Colombetti, 2014). This can complement the focus of the present study on action and interaction, and offer novel fascinating possibilities for concrete musical settings and learning contexts.
For example, more attention dedicated to the emotional aspects involved in the process might provide future studies with different options for reducing stress in participants (in the present study, as reported above, we let participants choose their best performance out of three possibilities). Another possible limitation involves the differences in time spent playing by participants during the three learning conditions. These differences are a result of the design of the learning conditions and often occur in real-world practice; playing in synchrony will usually result in more time spent playing than imitating or taking turns. However, we demonstrated that within each learning condition, the time spent playing was not associated with better or worse performance measures. Moreover, our results indicate that active playing time may be more beneficial than total exposure time that partially includes active playing. To better account for this difference, future research might develop experimental paradigms that more specifically control for actual playing time during each learning trial. This work could also explore whether the key role of synchronization and turn-taking in novices’ learning experiences is also present in expert musicians. Comparing novices and experts might offer valuable insights into the cognitive mechanisms associated with musical learning, and shed new light on their coupling with the social and physical elements of the environment in which they are embedded.
We examined performance accuracy of novices who learned to play short musical pieces on the piano through individual or joint learning. We contrasted methods based on synchronization, turn-taking, and imitation (all assisted by instructional videos on a computer). We found that in both solo and duo settings, imitation is the least efficient condition overall, with learning settings based on synchrony and turn-taking leading to better pitch and tempo accuracy. These results are consistent with cross-disciplinary literature that highlights the role of co-presence and embodied action for musical learning, and that calls into question too rigid separations between the social, the individual, and the non-biological in the study of human cognition and skill acquisition. Novices like Jordan and Perry, to come back to our initial vignette, might benefit from technology-aided learning settings rooted in reciprocal interaction as much as their friends who learn music individually. As long as active musicking and moment-to-moment participation are prioritized, novices can share their musical journey from their very first lesson and participate in each other’s learning. This opens up fascinating possibilities for music educators interested in developing novel teaching resources. For example, future computer-assisted music pedagogies might include instrumental courses specifically dedicated to groups of novices. This way, those who prefer learning music with friends from the beginning, like Jordan and Perry, can find opportunities and resources similar to those offered to more individually inclined learners. This scenario seems to be particularly suited for genres such as rock, jazz, or blues, where playing together is the norm; however, it might as well be applied to Western classical repertoires. This possibility for learning resonates with a more general trend in music scholarship that increasingly places emphasis on a variety of multi-agent contexts.
In doing so, recent research explores creativity as a distributed phenomenon (see Bishop, 2018; Sawyer, 2006; van der Schyff, Schiavio, Walton, Velardo, & Chemero, 2018); considers musical agency as multiply constituted and inter-personal (Ryan & Schiavio, 2019; Schiavio & Cummins, 2015); and emphasizes the “hidden socialities” that dwell in individual musicking (Cook, 2018) without positing fundamental distinctions between genres or styles. Openly engaging with these “socialities” in joint and computer-based learning settings might help individual performers develop richer expressive nuances as well as communicate their musical intentions and ideas more clearly. An exploration of such dimensions goes well beyond the scope of the present study, but will hopefully find a home in future research. While the precise relationship between 4E cognition and the acquisition of musical skills, as well as the nature of the cognitive assemblies formed during musical learning, remain to be further addressed, an understanding of the musical mind as fluidly distributed across social and physical resources continues to be developed.
We would like to thank Elli Xypolitaki for coding the videos and Nils Meyer-Kahlen for technical support during data collection. We are grateful to the action editor and three anonymous reviewers for their helpful suggestions. AS was supported by a Lise Meitner Postdoctoral Fellowship from the Austrian Science Fund (FWF): project number M2148. JS was supported by an Erwin Schrödinger Fellowship from the Austrian Science Fund (FWF): project number J4288.
Put simply, Embodiment emphasizes the mutual specifications between a living system’s body and its ability to think, perceive, feel, act, and communicate (e.g., Gallagher, 2005); Embedded cognition focuses on the ways in which the cultural, physical, and social properties of the environment can shape mental life (e.g., Haugeland, 1998); the main concern of Extended cognition is to explore the fluid integration between internal biological resources (e.g., memory) and external tools (e.g., computers), offering insights into how agents offload (parts of) their cognitive domain into the world to serve specific cognitive functions (e.g., Clark & Chalmers, 1998). Finally, Enactive cognitive scientists argue that mind and life compose a structured unity, intimately dependent on the patterns of ecological activity of an organism and its biological complexity (e.g., Thompson, 2007).
The linear model portrayed here displays important similarities with what has been often described as the “sandwich view” of cognition, standing at the core of traditional cognitive psychology (see Hurley, 1998). The metaphor assumes that cognition—the meat—is segregated by two layers of bread, namely perception and action. By this view, the latter two categories remain at unbridgeable removes from each other and end up being only partially involved in what mental life really is: the cognitive “meat.” The same focus on this middle level can arguably be found in the learning approach based on representations described in this section.
For expert musicians this might involve finding the best fingering solution for a passage they saw performed by a famous guitarist during a concert; for novices, imitating a complex musical goal might be more complicated because of the lack of expertise in their motor repertoire.
Here, shared responsibility refers to a condition where the success of the duo (learning the melodies) is equally distributed between co-actors, and not concentrated in a single individual.
It should be noted that some scholars remain skeptical of the explanatory power and logical consistency of the extended mind approach. Among others, Adams & Aizawa (2010) argue that the theory operates under a “coupling-constitution fallacy,” namely it confuses the causal relationship between mind and environment with a constitutive one. Put simply, while it can be true that living systems are closely coupled with external tools, it would be a mistake to conceive of the latter as actual “parts” of the mind (see also Ross & Ladyman, 2010). According to Clark (2010), however, such criticism ultimately fails to appreciate the functional significance of various forms of coupling for large-scale systems, leaving the core thesis fundamentally untouched.