Considerable evidence converges on the plasticity of attention and the possibility that it can be modulated through regular training. Music training, for instance, has been correlated with modulations of early perceptual and attentional processes. However, the extent to which music training can modulate the mechanisms involved in processing information (i.e., perception and attention) remains largely unknown, particularly across sensory modalities. If training in one sensory modality leads to concomitant enhancements in other sensory modalities, this could be taken as evidence of a supramodal attentional system. Additionally, if trained musicians exhibit improved perceptual skills outside of the domain of music, this could be taken as evidence of far transfer, where training in one domain leads to improvements in another. To investigate this, we evaluated the effects of music training using tasks designed to measure simultaneity perception and temporal acuity in auditory, visual, and audio-visual conditions. Trained musicians showed significant enhancements in simultaneity perception in the visual modality, as well as generally improved temporal acuity, although not in all conditions. Visual cues directing attention influenced musicians' simultaneity perception in visual discrimination and their temporal accuracy in auditory discrimination, suggesting that musicians have selective enhancements in temporal discrimination, arguably due to increased attentional efficiency relative to nonmusicians. Implications for theory and future training studies are discussed.
When considering the scope and requirements of day-to-day activities, the human perceptual and attentional systems are impressively competent information processing systems, even by modern standards of computational technology. That is, at any given point in time, our sensory system is inundated with a plethora of stimuli, and the efficiency and selectivity of attention allow for effective navigation and selection from moment to moment, permitting goal-oriented behavior to be accomplished.
Interestingly, the neurological foundations of attention can sometimes change under certain conditions. Indeed, a more plastic understanding of brain function has become widely accepted, as opposed to a modular notion of brain functioning (Kolb, Gibb, & Robinson, 2003). For instance, this “plasticity” of the brain has been associated with slight compensations when a sensory modality is lost (Shimojo & Shams, 2001), or in adapting to conditions such as age, disease, stress, and even addiction (Kolb et al., 2003). In a behavioral and neurophysiological study, for example, Röder et al. (1999) compared congenitally blind adults with blindfolded controls on a central and peripheral sound localization task. Behavioral results showed blind participants to possess greater peripheral spatial localization abilities, dovetailing with electrophysiological (ERP) results that revealed enhanced early potentials for spatial attention. This study, in conjunction with others conducted with deaf participants (see for example, Rettenbach, Diller, & Sireteanu, 1999), suggests that the neural plasticity exhibited by the brain can partially make up for loss in one modality with compensation in another (for a tactile example, see also Borsook et al., 1998).
Evidence for plasticity can also be seen in attentional and perceptual mechanisms of populations not suffering any sensory loss. Indeed, many recent findings indicate that attentional mechanisms can be significantly modulated as side effects of specific daily activities or hobbies. For instance, research with video-game players (VGPs) using functional magnetic resonance imaging (fMRI) showed increased prefrontal cortex activity when compared with non-players during complex non-gaming tasks (e.g., one task required participants to monitor a target while moving their hands to control a cursor in the opposite direction), a change that the authors attribute to the constant demands on spatial attention while training with video games (Granek, Gorbet, & Sergio, 2010; see Bavelier et al., 2011, for a review).
Of particular importance to our understanding of training effects in different domains is whether the training results in increased performance on tasks not directly associated with the domain in which the training was completed. Despite recent claims otherwise (see, for example, the claims of numerous “brain training” companies suggesting generally improved cognitive capabilities after training on specific tasks), the concept of far transfer has been widely debated (see Zelinski, 2009). Indeed, most instances of cognition-related benefits are typically observed after extensive training and only in situations that strongly replicate the training situation. This view has recently been challenged by claims coming from research involving video game players (both experts and trained novices), who have been observed to have increased perceptual skills outside of the video game when compared to non-video game players (see Green & Bavelier, 2003; for a recent meta-analysis see Bediou et al., 2018).
While the concept of far transfer has recently been debated with contemporary examples (e.g., video game training), the possibility of harnessing music training for extra-musical applications is an important area of research, especially given that music performance, in contrast, is as old as civilization itself (Sachs, 1943). The topic of non-musical benefits (i.e., benefits outside the domain of expertise) in musicians has been the focus of much research, not least due to public interest and popular notions such as the “Mozart effect.” This commercialized phenomenon, which spans a number of products claiming to enhance development and increase intelligence in infants (e.g., Baby Bach, Baby Mozart, etc.), often misrepresents a popular study by Rauscher, Shaw, and Ky (1993), which in fact demonstrated only a temporary (15-minute) enhancement of spatial-temporal reasoning after listening to Mozart. Many similar investigations claiming performance enhancements after listening to classical music have demonstrated limited effects, or effects possibly attributable to mood and arousal or other indirect factors (Rogenmoser, Kernbach, Schlaug, & Gaser, 2018; Steele et al., 1999; Thompson, Schellenberg, & Husain, 2001). Moreover, a wide range of research has examined how music training can influence performance in areas such as mathematics (proportional reasoning scores), language (reading test scores), spatial-temporal abilities, and verbal memory (for a concise summary of findings, see Rauscher, 2003). Despite these efforts, it should nevertheless be noted that there is considerable skepticism and controversy about the existence of actual cognitive enhancements due to music training, with many arguing that the evidence is inconclusive (Črnčec, Wilson, & Prior, 2006; Schellenberg, 2001), or that the literature on training effects suffers from systematic methodological errors (see Green et al., 2019).
In examining possible training-related attentional enhancements, it should be noted that specific training parameters are often necessary. For example, the purported improvements for video game players are found exclusively when training with “action” video games, which by their very nature demand quick responses to peripheral cues, multi-tasking, and efficient perceptual processing (Cohen, Green, & Bavelier, 2008). For instance, the regular use of action video games has been found to enhance visual spatial resolution (Green & Bavelier, 2007), as well as temporal resolution (Donohue, Woldorff, & Mitroff, 2010). One could argue that music training does not share these characteristics, at least not to the same extent as playing video games.
Specifically, the demands of video game and musical activities differ in at least three respects: the modality of emphasis, vigilance requirements, and the degree of symbolic manipulation. First, although both activities involve multisensory information, video games focus on visual stimuli, whereas music is more auditory based. Second, whereas most action video games require quick responses and vigilance for dealing with unexpected occurrences (e.g., impending foes), music arguably does not require the same extent of alertness, and certainly does not involve the same level of unpredictability (even musical improvisation often occurs within structures). Third, depending on the type of performance, music can demand spontaneous and continuous manipulations of symbolic representations (e.g., improvisation) that may consume significant cognitive resources, something that may be less likely to occur during action video gameplay. Thus, although the extent of these differences is debatable, it is clear that there are contrasts between the two activities, and it remains open whether these contrasts translate into differences in information processing in other domains.
Despite the differences between training with action video games and music, experimentation has nevertheless also shown attentional enhancements for musicians. A study by Helmbold and colleagues (2005), for example, compared musicians to nonmusicians on psychometric assessments of intelligence and general mental abilities (verbal comprehension, word fluency, mental rotation, perceptual speed, reasoning, number computations, various memory tasks, etc.) using five sub-tests from the Leistungsprüfsystem (a German intelligence test). No differences in performance were found except on two tests: flexibility of closure and perceptual speed. The flexibility of closure task (identifying hidden visual patterns) involved detecting single elements in complex objects, while the measure of perceptual speed involved finding visually presented letters among digit distractors. Relevant to our interest in musicians’ attentional resources, the authors speculated that the better performance on tasks measuring perceptual speed could be explained by the demands of music training, which requires quick recognition of musical symbols or structures. In a related study, Jones and Yee (1997) found musicians to be better at discriminating timing changes in auditory rhythmic patterns, but only when these were simple patterns. Furthermore, research using a line bisection task also showed faster reaction times and fewer errors in musicians, suggesting better visual perceptual processing (Patston, Hogg, & Tippett, 2007). Collectively, these findings suggest that temporal and spatial processing might indeed be enhanced in expert musicians when compared to nonmusicians.
Reflecting perhaps the interest in ascertaining the true effects and structural differences caused by music training, there is a wealth of neuropsychological studies on the musician’s brain. A considerable amount of evidence (see Jäncke, 2009, for a review) suggests that musicians have greater neuroplasticity, resulting in both functional and anatomical differences, including increases in grey and white matter volume in particular locations within the left cerebellum. Other evidence from neurophysiological studies suggests more pronounced cortical reorganization for musically related motor activity, and larger (by 25%) evoked potential fields in response to instrumental tones when compared to controls (Gaser & Schlaug, 2003; Münte, Altenmüller, & Jäncke, 2002). Whether these musically related brain differences also lead to benefits in temporal processing is speculative (i.e., perhaps these differences already existed and led people to become musicians). However, the aforementioned behavioral evidence with musicians does point towards increased attentional and perceptual abilities on various tasks in both the auditory and visual modalities (for visual enhancements, see also Lim & Sinnett, 2011).
Using a sample of expert musicians enables a closer look at specific theories of attention. For instance, Farah, Wong, Monheit, and Morrow (1989; see also Pavani, Husain, Ládavas, & Driver, 2004) claimed that all sensory modalities have access to a single reservoir of attentional resources, while other authors have argued for a segregated system with individualized resources for specific modalities (Sinnett, Costa, & Soto-Faraco, 2006; Sinnett, Juncadella, Rafal, Azañón, & Soto-Faraco, 2007; Spence & Driver, 1996; Thesen, Vibell, Calvert, & Osterbauer, 2004; Wickens, 1984). An fMRI experiment by Macaluso, Frith, and Driver (2000) demonstrated both modality-specific and multisensory areas in the brain, and early interactions have been shown for all sensory modalities (Thesen et al., 2004). Cueing one sense has been shown to affect early modality-specific areas in the brain, but these links are likely due to back projections (Johansen-Berg, Christensen, Woolrich, & Matthews, 2000; Meyer, Wuerger, Rohrbein, & Zetzsche, 2005).
Attention influences these perceptual brain activations in a number of ways (see Vibell, Klinge, Zampini, Spence, & Nobre, 2007, 2017). Perceptual cueing depends on both temporal and spatial proximity: spatially, the cue needs to appear in the general area where the upcoming target is presented, and temporally it needs to occur within a specific time frame (Spence, Shore, & Klein, 2001). Work on the attentional blink has shown that a stimulus presented 100–500 ms before another stimulus can actually be distracting (Raymond, Shapiro, & Arnell, 1992). Otherwise, it is commonly accepted that a cue presented in the same location as an upcoming target enhances perception of that target (Posner & Rothbart, 1980), regardless of whether the cue is in the same sensory modality as the target or a different one (Spence et al., 2001).
By testing musicians for enhanced attentional and perceptual capabilities in the visual modality, we can indirectly assess whether training in one sense leads to general cognitive performance enhancements both in the same sensory modality and in another. Such a finding would closely align with investigations in which auditory enhancements were observed despite the training being predominantly visual (Donohue et al., 2010; Green, Pouget, & Bavelier, 2010). Music has been shown to activate a number of areas in the brain, including pitch-related regions for single sounds and higher antero-ventral and postero-dorsal pathways for musical patterns. Many aspects of music, such as pacing, note durations, microduration shifts, and specific emphases, play a role in musical expression. Importantly, variance in timing is one of the key components in making music sound lifelike (Bartlette, Headlam, Bocko, & Velikic, 2006). For this reason, the present study examined timing perception in musicians and nonmusicians. Critically, however, the study used naturalistic sounds with which musicians and nonmusicians should have equal experience, to avoid the possibility that the results would simply be an artifact of bias toward, or experience with, the chosen stimuli.1
The research described here has both theoretical and practical relevance. First, extending the investigation of enhanced attentional mechanisms to understudied populations (e.g., musicians) will support many of the findings pertaining to the plasticity of the human attentional system, specifically with regard to the speed at which information can be processed. Second, comparing performance across different sensory modalities enables us to provide further evidence regarding the extent to which attention may be best described by segregated or supramodal theoretical frameworks. Third, it should be noted that this is the first study with musicians to directly compare performance across identical tasks in visual, auditory, and crossmodal conditions. Before describing the approach taken to address these questions, and the findings, it is important to first discuss temporal perception in detail.
Temporal Perception
The investigation into the human capacity to discriminate temporal events is one of the oldest topics in experimental psychology. Indeed, this topic has been examined with a variety of methodologies in multiple sensory modalities (Bald, Berrien, Price, & Sprague, 1942; Boring, 1929; Exner, 1875; Titchener, 1908). The rich history and varied empirical approaches to understanding temporal perception exemplify the fundamental implications of temporal processing for how humans perceive and interact with their environment. It influences audiovisual integration, speech recognition, and how humans generally integrate multiple stimuli into coherent percepts of events (Navarra et al., 2005). For instance, audiovisual integration has been found to depend on attention and to have specific temporal constraints (Alsius, Navarra, Campbell, & Soto-Faraco, 2005; van Wassenhove, 2013; van Wassenhove, Grant, & Poeppel, 2005). This is highlighted by the seemingly simple act of speech recognition, which has been shown to rely on detecting temporal cues, thereby necessitating accurate temporal discrimination (Shannon, Zeng, Kamath, Wygonski, & Ekelid, 1995; Tallal, Miller, & Fitch, 1993).
The temporal order judgment (TOJ) task has been widely used as a tool to measure temporal processing. From this task, two measures of perceptual processing can be calculated: the just noticeable difference (JND) and the point of subjective simultaneity (PSS). The JND is a measure of the resolution or threshold of temporal discrimination, while the PSS is the amount of time by which one stimulus can be presented before the other such that they are still perceived as occurring simultaneously (e.g., in a crossmodal task, it can indicate whether auditory or visual stimuli must be presented first for them to be perceived as simultaneous). In studies examining unimodal and crossmodal (visual, auditory, and tactile) TOJs, Hirsh and Sherrick (1961) demonstrated that participants could discriminate the temporal order of stimuli (JND) presented as little as 20 ms apart. Interestingly, however, in crossmodal tasks (i.e., audio-visual presentations) the visual stimuli had to lead the auditory stimuli by approximately 40–80 ms for participants to perceive them as being presented simultaneously (PSS; Hirsh & Sherrick, 1961; Zampini, Shore, & Spence, 2003). Other research also suggests that the baseline resolution of temporal acuity is better in the auditory modality than in the visual or tactile modalities (Chen & Yeh, 2009; Gebhard & Mowbray, 1959; O'Connor & Hermelin, 1972; Vibell et al., 2007, 2017; Welch, DuttonHurt, & Warren, 1986).
Furthermore, performance on TOJ tasks is sensitive to procedural variations, changing according to response requirements (e.g., “which side came first?”, “which modality came first?”, etc.), as well as whether stimuli are presented from the same or different locations. In an audiovisual TOJ study, for instance, Zampini et al. (2003) found that participants could discriminate order down to a threshold of 22 ms when the task required them to report which “modality” occurred first (i.e., visual or auditory), compared to 62 ms when reporting which “side” occurred first (i.e., left or right). This indicates that determining temporal order in terms of spatial location may be more difficult than simply determining the order of the modalities of presentation. Additionally, Zampini et al. noted that, across modalities, an average improvement in temporal resolution of 10–30 ms could be observed when stimuli were presented from different spatial locations (i.e., first on the left and second on the right), as opposed to being presented repeatedly from the same location (i.e., first on the left and second also on the left), suggesting that spatial information can provide additional help in discriminating temporal order (see Vibell et al., 2017, for similar findings between vision and touch).
Importantly, this research indicates that our ability to determine temporal order is malleable, which raises the question of whether extensive musical experience is capable of modulating (improving) temporal acuity. Indeed, basic research has shown that many factors can influence humans’ ability to discriminate temporal order. This implies that temporal perception is perhaps dependent on attentional mechanisms, and not purely a sensory-based process. For example, deficits in temporal discrimination have been linked to stroke patients suffering from unilateral visual neglect (Husain, Shapiro, Martin, & Kennard, 1997). Here, patients were required to detect specific target letters embedded within a rapid serial visual presentation (RSVP) stream of letters. Brain-injured patients exhibited diminished performance when compared with controls at short temporal lags between targets, suggesting that their ability to detect stimuli appearing in close temporal proximity is greatly impaired. More recent work by Sinnett et al. (2007) also found deficits in both visual and auditory TOJ tasks for patients with right-hemisphere lesions, with JND scores of up to 250 ms. Finally, age has also been observed as a factor that affects TOJs, with a general decline in perceptual acuity occurring with increased age (Dinnerstein & Zlotogura, 1968). Further demonstrating the malleability of temporal processing, numerous studies have shown that enhancements can follow different types of training (Donohue et al., 2010; Green & Bavelier, 2003; Green, Li, & Bavelier, 2010; Spence & Feng, 2010). Importantly, these findings have been extended to musical conductors in an auditory TOJ task (Hodges, Hairston, & Burdette, 2005).
In a study using both fMRI and behavioral measures, Hodges et al. (2005) examined the effects of musical expertise by comparing ten conductors to age and education matched controls. Overall, conductors outperformed the control group on several tasks. For instance, they found that conductors had better pitch discrimination skills as well as shorter auditory temporal thresholds in a TOJ task. Specifically, conductors required less time between two sounds to correctly discriminate which one had occurred first, and also performed better on a crossmodal TOJ task involving visual targets and concurrent auditory clicks. It should also be noted that previous findings indicated that receiving concurrent information via additional sensory modalities (i.e., a multisensory condition) can actually enhance performance on a temporal order task (Morein-Zamir, Soto-Faraco, & Kingstone, 2003). These novel findings suggest that conductors may have more efficient and enhanced levels of multisensory processing.
If music training does indeed lead to enhancements in information processing, then temporal discrimination may be better than control performance both within and across sensory modalities, which would be manifested in lower temporal thresholds (JNDs) in the temporal order judgment task. While this would be expected in the auditory modality, as it corresponds to the medium used in music training, it is unclear as to what will happen in the visual modality (but note potential enhancements for visual processing in musicians, as observed by Lim & Sinnett, 2011) or even in crossmodal situations. Other patterns of differences (e.g., prior entry effects, seen in PSS scores) within and between musicians and controls will likewise have implications on sensory dominance as well as how attentional orienting occurs across sensory modalities.
The investigation described below compared trained musicians with controls (who did not have any music training) on a series of TOJ tasks that were presented in the visual or auditory modalities, or across modalities. Additionally, a non-predictive cue from the same or a different sensory modality was randomly introduced. Altogether this created seven different conditions (see Table 1) that were randomized and counterbalanced. For the uncued conditions, auditory, visual, and audio-visual (crossmodal) tasks were presented. For the cued conditions, four tasks were used: auditory stimuli with unimodal and crossmodal cues, and visual stimuli with unimodal and crossmodal cues. While all of the participants completed each of these tasks, for ease of presentation the conditions are described as two experiments. The first experiment discusses the visual, auditory, and crossmodal TOJ tasks that did not include spatial cues. The main interest is possible music-related modulations of detection thresholds (JND) in all conditions. The JND is a measure of the ability to detect minute differences in timing between stimuli and should be lower for trained musicians. The PSS, however, should theoretically be close to zero (i.e., show no bias) in the unimodal conditions, given the absence of a spatial cue. In the crossmodal condition, the PSS should indicate that visual stimuli need to be presented before auditory stimuli for simultaneity to be perceived (see Hirsh & Sherrick, 1961; Zampini et al., 2003). It is possible that this effect would be reduced in musicians if they do indeed process multisensory information more efficiently. The conditions discussed in the cueing experiment involve adding a spatial cue that could be in either the same or a different modality as the target task. For instance, the visual TOJ task could have a visual (peripheral flash, unimodal) or auditory (lateral beep, crossmodal) cue preceding the presentation of the stimuli. While the JND continues to be of interest here, the addition of the spatial cue enables a modulation of the PSS, thereby addressing the question of whether musical expertise potentially influences spatial attention as well as temporal attention. The cue modulates temporal perception by acting as a temporal distractor presented just before the first of the TOJ stimuli. In this way, we can evaluate how temporally distracting the cue is for musicians versus nonmusicians, and whether this distraction differs when the cue is in the same or a different sensory modality. If the human attentional system has access to segregated attentional resources (i.e., dedicated attentional resources for each sensory modality), then any potential enhancement should be restricted to the auditory modality. However, if the human attentional system operates in a supramodal manner, any enhancement for the musicians in the auditory modality could also be observed in the visual modality (see Johnson & Zatorre, 2006; Spence, Shore, & Klein, 2001). Musicians have been trained to play while simultaneously listening to competing melodies, which could make them more immune to distraction from other sounds. They have also been trained to pay attention to visual cues while playing music, and could therefore show a different influence from cues in this sensory modality.
Importantly, and given that the stimuli for this experiment were non-musical in nature, any improvements in the auditory, visual, or cross-modal conditions would be indicative of perceptual improvements outside of music (i.e., far transfer), and therefore be of great importance to our understanding of how music training might be beneficial in non-music situations.
Measuring Temporal Perception
Method
Participants
A total of 20 musicians (age = 28 ± 12 years; 5 females) were recruited from the music department at the University of Hawaii at Mānoa, local music studios, and through flyers placed throughout campus. Gender has been shown not to influence TOJs, so no effort was made to balance the sample on this basis (van Kesteren & Wiersinga-Post, 2007). To qualify as a musician, participants were required to have at least three years of formal training in music and to report a regular practice schedule of at least six hours per week over the past six months (see Appendix Table 1 for instrument types and musical experience reported). These figures were chosen based on similar cutoffs for experts used in studies looking at far transfer of training on general cognitive abilities (Bediou et al., 2018; Green & Bavelier, 2003; Helmbold et al., 2005).
Control participants (n = 20; age = 22 ± 5, 16 females)2 were recruited from undergraduate courses at the University of Hawaii at Mānoa. All control participants had little or no formal training in music and normal or corrected to normal hearing and vision, as assessed by an initial survey. Control participants were offered course credit for their participation, and musicians were given $10 in order to facilitate recruitment. Ethical approval was obtained from the Committee on Human Subjects at the University of Hawaii at Mānoa.
Stimuli and Apparatus
Visual, auditory, and crossmodal stimuli were used to compare musicians’ and controls’ responses in the various modalities. Stimuli were presented on a 21-inch Core2Duo 2.4 GHz iMac computer using DMDX software (Forster & Forster, 2003), with visual stimuli appearing on screen and auditory stimuli presented via external speakers placed directly beside the monitor. Participants were seated at an eye-to-monitor distance of approximately 60 cm. From this distance, all presented auditory stimuli occurred at approximately 75 dB, as measured by a sound meter.
Stimuli for the visual task were horizontal and vertical lines subtending 0.9° and occurring centrally within the placeholder squares. For the auditory task, processed samples of a dog and a crow sound, both lasting 350 ms, were used. The auditory stimuli were downloaded from http://www.a1freesoundeffects.com and manipulated using Cool Edit software (Syntrillium Software Corp.) in order to achieve two sounds of equivalent duration (350 ms) and average amplitude (see Sinnett et al., 2006; Sinnett, Juncadella, Rafal, Azañón, & Soto-Faraco, 2007, for more details).
In the audio-visual (crossmodal) condition, the visual stimulus consisted of a black square of width 0.9° (appearing within the placeholder), whereas the auditory stimulus was a 50 ms white noise burst. The auditory stimuli in the crossmodal condition were simplified to match the visual stimuli. There were no further attempts to match stimuli across modalities, as it is unclear which dimension is most appropriate to match on (e.g., detection latencies, discrimination latencies, subjective or objective intensities; see Spence et al., 2001, for a discussion). Responses to auditory and visual stimuli were made via key presses on a keyboard (the C or D keys for auditory stimuli, and the Z or / keys for visual responses). The fixation cross was 0.5° of visual angle wide. Visual stimuli appeared in placeholder squares that were 1.4° wide and situated 4° from the central fixation.
Procedure
The basic temporal order judgment task involves presenting participants with two stimuli separated by variable stimulus onset asynchronies (SOAs). Task difficulty was manipulated using a staircase approach to adjust SOAs following the setup of Stelmach and Herdman (1991, see also Levitt, 1971). Accordingly, the SOA for each successive trial either decreases or increases in a stepwise manner dependent on whether the participant answers the previous trial correctly. Stepwise increments occurred at intervals of 16.7 ms (monitor refresh rate). The experiment started off with a relatively easy SOA of 167 ms, and as it progressed, each trial’s SOA decreased until the stimuli became very close and the order of occurrence difficult to determine. It can be inferred then, that as time progresses, changes in stepwise direction (up and down) will increase, reflecting increasing uncertainty in the participant. The task then terminates once a cutoff number of turning points is reached (12 in this study; see West, Stevens, Pun, & Pratt, 2008, for a similar approach, albeit one with a less conservative cutoff).
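As an illustration of this procedure, a minimal sketch of the staircase logic is given below. It assumes a simple one-up/one-down rule consistent with the description above; the experiment itself was implemented in DMDX, and present_trial is a hypothetical callback that runs a single trial and reports whether the order judgment was correct.

```python
# Minimal sketch of the adaptive staircase described above (assumed one-up/
# one-down rule; present_trial is a hypothetical callback, not the authors' code).

STEP_MS = 16.7        # step size = one monitor refresh
START_SOA_MS = 167.0  # relatively easy starting SOA
MAX_REVERSALS = 12    # termination criterion used in this study


def run_staircase(present_trial):
    soa = START_SOA_MS
    last_direction = None  # "down" = SOA decreased (harder), "up" = increased (easier)
    reversals = 0
    history = []           # (SOA, correct) pairs kept for later JND/PSS fitting

    while reversals < MAX_REVERSALS:
        correct = present_trial(soa)
        history.append((soa, correct))

        direction = "down" if correct else "up"
        if last_direction is not None and direction != last_direction:
            reversals += 1  # a turning point in the staircase
        last_direction = direction

        # Decrease the SOA after a correct response, increase it after an error,
        # never letting it fall below a single refresh interval.
        soa = max(STEP_MS, soa - STEP_MS) if correct else soa + STEP_MS

    return history
```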
Prior to each trial a fixation-cross flanked by two square placeholders on the left and right was presented (see Figure 1). The procedure was identical across all three modality conditions. Participants made unspeeded responses signaling which stimulus they believed had appeared first using one of two keyboard buttons corresponding to each stimulus (e.g., horizontal or vertical line). In all three modality conditions, participants were first presented with onscreen instructions followed by a short sequence of practice trials, with accuracy feedback directly appearing after each trial (feedback was not given during the experiment proper). The experimenter monitored completion of the practice trials and ensured that participants understood the task requirements (repeating the practice session if needed). Presentation side (i.e., left or right) and stimuli order (e.g., horizontal or vertical line first) were all randomized, as was the order of experimental conditions (e.g., audio, visual, crossmodal) for each participant. Each condition took approximately 5–8 minutes to complete.
Results
The results from the TOJ task can be analyzed to determine the just noticeable difference (JND), which is the minimum amount of time that must separate two events such that they are still accurately perceived as occurring successively (and not simultaneously). In addition to determining the threshold of temporal discrimination, a second measure can be extracted from the results, referred to as the point of subjective simultaneity (PSS). The PSS is the point in time in which one stimulus can be presented before the other such that they are still perceived as being simultaneous. Although this measure is usually expected to fall at 0 ms (or close to it) in unimodal conditions (unless there is a bias in response), it is more informative in the crossmodal condition as it indicates whether auditory or visual events must be presented first for them to be perceived as simultaneous. Recall that Zampini et al. (2003) demonstrated a visual target must precede an auditory target by about 40–80 ms for simultaneity to be perceived.
The calculation of both the JND and PSS was based on approaches used in previous research (for examples of other studies using similar methodologies and analyses, see Spence, Baddeley, Zampini, James, & Shore, 2003; Stelmach & Herdman, 1991). The average proportion of “horizontal line first” (or “crow first”/“auditory first”) responses for each group (musician or control) was plotted as a function of the time by which the horizontal line preceded the vertical line (or the crow sound preceded the dog sound, etc.). For TOJ tasks, response rates typically follow a sigmoidal curve, and the data can therefore be fit with a logistic function in which the response rate is modeled as a function of the SOA (x), with two estimated parameters: central tendency (a) and slope (b; see Spence et al., 2003).
Data were fit to this function by minimizing the weighted sum of squares to obtain parameter estimates for a and b. The PSS, or the SOA at which participants considered the two stimuli to be simultaneous, corresponds to parameter a. The JND, or the smallest interval between two stimuli yielding a correct judgment 75% of the time, is directly related to parameter b (which reflects the slope of the central portion of the logistic function). Here, the relationship is that a steep slope will result in a smaller JND (i.e., better temporal resolution), and a shallow slope in a larger JND. The JND can thus be obtained directly from the slope parameter (see Spence et al., 2003).
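For concreteness, a common parameterization consistent with these definitions (assumed here; the exact form and constant used by Spence et al., 2003, may differ slightly) is

$$p(x) = \frac{1}{1 + e^{-(x - a)/b}}, \qquad \mathrm{JND} = b \ln 3,$$

where $p(x)$ is the proportion of “first” responses at SOA $x$, $a$ is the PSS (the SOA at which $p = .5$), and $b$ is inversely related to the slope at the curve’s midpoint, so that a steeper curve yields a smaller JND.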
Confidence intervals (95%) for each group statistic were calculated using a parametric bootstrap method with 999 replications (Efron & Tibshirani, 1993). For between-group comparisons, a permutation bootstrap method was used, in which data from both groups were combined and resampled to construct a distribution from which the likelihood (p value) of obtaining the observed difference between the groups’ scores under a mixed population pool was estimated (for an example of another TOJ study that employed a similar bootstrap approach, see Azañón & Soto-Faraco, 2007).
Given the unique constraints of our dataset, we used a bootstrap resampling approach for the statistical analyses because of its particular benefits over more traditional means. That is, due to the varied number of observations and different response patterns resulting from the adaptive staircase paradigm, each individual’s data points could vary substantially, and fitting the logistic function individually did not always converge or yield meaningful estimates. Combining data from all participants in each group therefore allowed for a better distribution of scores across all SOAs for the logistic fit, from which we were able to estimate the overall JND and PSS values for musicians and controls using the abovementioned functions. Furthermore, bootstrap resampling enabled a direct comparison of these parameters as well as confidence interval estimation (for a study using a similar bootstrap approach with pooled data, see Jeon, Hamid, Maurer, & Lewis, 2010). Unfortunately, however, combining scores across participants did not allow for a traditional ANOVA analysis of main effects or interactions, due to the lack of estimates for individual scores. Nevertheless, we strove to compare group scores both within and across experiments to provide the most comprehensive scope of analyses.
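To make this analysis pipeline concrete, a minimal sketch is given below. It assumes the logistic parameterization shown earlier, SciPy's curve_fit for the weighted least-squares fit, and a percentile-based parametric bootstrap; the helper names are ours, and the authors' actual software is not specified.

```python
# Minimal sketch of the pooled logistic fit and parametric bootstrap described
# above; the parameterization and use of SciPy are assumptions, not the authors' code.
import numpy as np
from scipy.optimize import curve_fit


def logistic(x, a, b):
    # Proportion of "first stimulus" responses at SOA x; a = PSS, b ~ inverse slope.
    return 1.0 / (1.0 + np.exp(-(x - a) / b))


def fit_pss_jnd(soas, props, n_per_soa):
    """Weighted least-squares fit of pooled response proportions."""
    (a, b), _ = curve_fit(logistic, soas, props, p0=[0.0, 30.0],
                          sigma=1.0 / np.sqrt(n_per_soa))
    return a, np.log(3) * b  # PSS and JND under this parameterization


def parametric_bootstrap_ci(soas, props, n_per_soa, n_boot=999, alpha=0.05):
    """Percentile CIs for (PSS, JND): simulate binomial responses from the
    fitted curve and refit each replicate."""
    rng = np.random.default_rng(0)
    pss, jnd = fit_pss_jnd(soas, props, n_per_soa)
    p_fit = logistic(np.asarray(soas, dtype=float), pss, jnd / np.log(3))
    estimates = []
    for _ in range(n_boot):
        sim_props = rng.binomial(n_per_soa, p_fit) / n_per_soa
        try:
            estimates.append(fit_pss_jnd(soas, sim_props, n_per_soa))
        except RuntimeError:      # fit failed to converge on this replicate
            continue
    lo, hi = np.quantile(np.asarray(estimates), [alpha / 2, 1 - alpha / 2], axis=0)
    return {"PSS": (lo[0], hi[0]), "JND": (lo[1], hi[1])}
```

The between-group permutation test described above would follow the same pattern: pool both groups' trials, repeatedly shuffle the group labels, refit each shuffled split, and compare the observed group difference in PSS or JND against the resulting null distribution.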
Visual Condition
In the visual condition, the average PSS score for musicians was significantly lower than that of controls by 10 ms (-14 ms, CI = -22 to 5 ms; vs. -4 ms, CI = -9 to 2 ms; p = .037, respectively), with the negative PSS values indicating a possible bias in responses towards horizontal lines for both groups. Nevertheless, the confidence intervals for both groups straddle 0 ms, and thus may not actually differ from 0 ms. Musicians’ average JND score was also significantly lower than controls’ by 18 ms (29 ms, CI = 23 to 35 ms; and 47 ms, CI = 37 to 56 ms; p = .006; see Figure 2).
Auditory Condition
In the auditory condition, the differences between musicians and controls were non-significant for PSS (2 ms, CI = -8 to 12 ms; vs. 9 ms, CI = 1 to 16 ms; p = .31; respectively) and approaching significance for JND scores (43 ms, CI = 34 to 53 ms; and 56 ms, CI = 45 to 68 ms; p = .07; see Figure 3).
Crossmodal Condition
In the crossmodal condition, musicians’ average PSS score was not significantly different from that of controls (-43 ms, CI = -60 to -25 ms; and -63 ms, CI = -93 to -30 ms; p = .261, respectively). It is also worth noting here that the negative PSS results indicate a bias in responses towards auditory stimuli. That is, the visual stimuli needed to be presented prior to the auditory stimuli in both groups for simultaneity to be perceived (see also Zampini et al., 2003). In regard to the average JND, musicians’ scores were significantly lower than controls’ by 59 ms (104 ms, CI = 80 to 127 ms; and 163 ms, CI = 112 to 207 ms; p = .021; see Figure 4).
Discussion
Not surprisingly, there were no differences between musicians and controls for auditory PSS scores, given that the PSS should be relatively close to zero unless there is a bias of some sort towards one of the targets, or unless a spatial cue is incorporated (i.e., as in the cueing experiment). Furthermore, and contrary to our initial hypothesis, the difference in temporal thresholds between musicians and controls in the auditory condition was only marginal (p = .07). This is surprising given that music training is arguably largely auditory in nature, and a more robust difference in auditory JND scores between musicians and control participants was therefore expected. That said, these stimuli were designed not to be musical in nature: the auditory stimuli were natural sounds (bark and crow) selected to avoid the possibility that any effects would be driven by familiarity or experience, rather than by a transfer of perceptual skills to non-trained stimuli. On the other hand, in the visual condition musicians did have both significantly lower PSS and JND scores compared to controls (see also Hodges et al., 2005; Lim & Sinnett, 2011). Helmbold and colleagues (2005) saw a similar visual advantage for musicians in their closure task (identifying hidden visual patterns), and Patston et al. (2007) found faster reaction times and fewer errors in musicians, suggesting better visual perceptual processing. Although visual PSS scores for both musicians and controls were biased towards the horizontal line, the significant difference between groups indicates that the degree of this bias differed (for musicians, the vertical bar needed to be presented 14 ms prior to the horizontal bar for simultaneity to be perceived, compared to 4 ms for controls). It is difficult to speculate as to why participants favored the horizontal bar overall, or why the two groups differed in this respect, although musicians are trained to look out for staves as a visual cue in music. It should be noted that PSS scores were low in general (i.e., close to zero, with confidence intervals also straddling 0 ms). More importantly, the significant JND enhancement seen in the visual modality may suggest improved resolution of visual temporal order in the musician group.
Despite the differences observed in the visual modality, musicians showed only a marginal difference compared to nonmusicians in the auditory modality. It is possible, for instance, that the “realistic” auditory stimuli (i.e., crow and dog sounds) used in our experiment were more difficult to discriminate than simpler tones, and that any effect was therefore partially masked. Furthermore, because the sounds were non-tonal (musical tones have been used in past research; see Hodges et al., 2005), musicians may not have had a distinct advantage. Future research could directly compare these auditory stimulus types. Additionally, using tones could lead to a ceiling effect. Indeed, given that humans discriminate temporal events better in the auditory modality than in the visual modality, it is possible that both musician and control participants performed similarly due to a ceiling effect (Chen & Yeh, 2009; Gebhard & Mowbray, 1959; O'Connor & Hermelin, 1972; Welch et al., 1986). Note, however, that much of this is speculation, and further research is needed.
The largest differences were seen in the crossmodal condition. As can be seen in Figure 5, PSS scores were substantially larger in the crossmodal condition than in the auditory or visual conditions, with a shift towards visual-first SOAs. That is, for both groups, visual stimuli had to precede auditory stimuli (by 43 and 63 ms, respectively) for the two events to be perceived as occurring simultaneously, although there were no statistical differences between musicians and controls for PSS despite the large numerical trend. These findings coincide with previous crossmodal TOJ findings, in which visual events generally needed to precede auditory events by 40 to 80 ms for simultaneity to be perceived (Hirsh & Sherrick, 1961; Zampini et al., 2003). Additionally, musicians had significantly lower JND scores, indicating that the temporal resolution of multisensory processing was better for the musicians who took part in this study. Speculatively, this could be due to the requirements of reading musical notation while concurrently listening to auditory information. Finally, and consistent with Zampini et al. (2003), JND scores increased nearly three-fold compared to the unimodal conditions, demonstrating that the multimodal task was more difficult.
In addition to the basic TOJ setup used in the first experiment to measure temporal perception, spatial cues can also be incorporated into these tasks, thereby allowing a measure of how attention is oriented and captured in the two groups (in addition to still being able to measure the JND). This is ideal, as the additional spatial cues not only provide an opportunity to better understand how information processing is modulated, but are also perhaps more analogous to real-world situations in which attention must be directed to a task while irrelevant stimuli are presented within and across modalities. The presentation of exogenous cues prior to stimulus onset in a TOJ task creates a “prior entry” effect, whereby attention is directed towards the cued side and subsequently affects performance on the task, regardless of whether or not the cue is predictive of location (in our task the cue is only valid half of the time; see also Shore, Spence, & Klein, 2001; Spence et al., 2001; Vibell et al., 2017; Zampini, Shore, & Spence, 2005).
Exogenous orienting can occur from any stimulus that causes a reflexive, automatic, or bottom-up orienting of attention (e.g., bright flashes, loud sounds, etc., that immediately capture attention). By presenting an exogenous cue in the TOJ task prior to the onset of the first stimulus, participants’ attention should theoretically be involuntarily directed to the cued side. If the left and right stimuli are then presented simultaneously, the effect will be that the cued side is perceived as having occurred first. The PSS score, in this case, indicates how far in advance the uncued side must be presented before the cued side for subjective simultaneity to be perceived. Analyses generally reveal the PSS as deviating from the central tendency by shifting towards the cued side (Shore et al., 2001). Thus, in the cueing experiment, we replicated the unimodal conditions of the first experiment with additional within-modality and crossmodal cues to determine whether spatial attention, as measured by the cued TOJ task, differs between musicians and nonmusicians. Based on the findings from the first experiment, it is hypothesized that musicians should show a smaller orienting effect, which would manifest in lower PSS and JND scores than controls across all conditions. That is, smaller JND and PSS scores would be indicative of improved temporal processing (smaller JND) and less influence from peripheral distraction (smaller PSS). Furthermore, to our knowledge, this is the first time multimodal cued TOJ tasks have been conducted with musicians (although see Lim & Sinnett, 2011, for preliminary findings in the visual modality using exogenous and endogenous cues).
Unimodal and Crossmodal Cueing
Method
Participants
The same participants from the first experiment also took part in the cueing experiment. All conditions from both experiments were interleaved and randomized into a battery of seven tasks for each participant in order to counterbalance for any possible training effects. The description of the experimentation is separated here only to facilitate coherence in presentation and analyses.
Stimuli and Apparatus
The stimuli and conditions were identical to those in the first experiment, except for the use of exogenous non-predictive cues in all conditions. In the visual condition, the cue was created by thickening the placeholder box of the respective side to a thickness of 4 pixels for 45 ms (see Figure 6). In the auditory condition, the cue was a laterally presented 500 Hz sinusoidal wave lasting for 45 ms, followed by a cue-target interval of 45 ms. The crossmodal condition consisted of two tasks: the first was an auditory TOJ task with visual cues, while the second was a visual TOJ task with auditory cues. All cues were randomly determined by the step staircase algorithm, and thus had an equal chance of validly or invalidly cuing the target stimuli.
Procedure
The procedure was identical to that used in the first experiment, except that participants were notified of the cues and that they were non-predictive in nature (i.e., they were instructed to make their judgments based on temporal order alone). The order of experimental conditions (e.g., auditory, visual, crossmodal) was randomized for each participant.
Results
The JND and PSS scores were calculated using similar methods as in the first experiment. First, data were pooled over musicians and control participants into two separate groups. For each of the four conditions, data from the two groups were fit to a weighted logistic function according to which stimuli were cued (e.g., horizontal/vertical bar, dog/crow sound, etc., see Figure 6 and 7). Separate PSS and JND estimates (a and b parameters) were then calculated for each of the two cued curves. The overall PSS value for each condition was computed as half the distance between each of the PSS values for the two curves,
and the average of the two JND values for each curve was then used as the overall JND score. This approach essentially calculates the PSS for each type of stimulus cued and averages the effect. In order to gauge the influence of the cue, the two fitted curves were compared against one another. Logically, if the two curves were to map directly onto one another, then the overall PSS would be 0, as would be expected if the cue did not have any effect (assuming no bias for one stimulus type or the other). Thus, the larger the difference between the logistic fits for each cue, the larger the PSS, and by extension the greater the effect of the cues in general. The same approach can be applied to the calculation of the JND (see Shore et al., 2001), although, because the JND reflects the slope of each curve rather than its shift, JND1 and JND2 are expected to be similar for each stimulus type.
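Expressed formally (the numeric subscripts simply index the two cue-type curves, following the JND1/JND2 notation above, and the absolute-value convention is our assumption), this amounts to

$$\mathrm{PSS}_{\mathrm{overall}} = \frac{\left|\mathrm{PSS}_{1} - \mathrm{PSS}_{2}\right|}{2}, \qquad \mathrm{JND}_{\mathrm{overall}} = \frac{\mathrm{JND}_{1} + \mathrm{JND}_{2}}{2},$$

so that two perfectly overlapping curves yield an overall PSS of 0 ms (no cuing effect), while greater separation between the curves yields a larger overall PSS.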
Similar to the first experiment, confidence intervals (95%) and p values were computed using appropriate bootstrap and permutation resampling methods for each statistic.
Unimodal Cues
Visual Condition
To assess whether the visual cues had any effect on temporal judgment, horizontal-cued trials were compared to vertical-cued trials. This comparison revealed significant differences in PSS scores for both musicians and controls (both p < .001), suggesting that the visual cues effectively captured attention. As observed in the first experiment, the magnitude of this effect, however, was significantly lower for musicians than controls by 29 ms (30 ms, CI = 10 to 46 ms; and 59 ms, CI = 48 to 71 ms; p = .023). On the other hand, JND scores for both groups were not significantly different from one another (80 ms, CI = 63 to 93 ms; and 84 ms, CI = 75 to 96 ms; p = .29; see Figure 7).
Auditory Condition
To assess whether the auditory cues had any effect on temporal judgment, crow-cued trials were compared to dog-cued trials. This comparison revealed significant differences in PSS scores for both musicians and controls (both p < .01), suggesting that the auditory cues effectively captured attention.3 The magnitude of this effect, however, was not significantly different between musicians and controls (23 ms, CI = 12 to 37 ms; and 29 ms, CI = 17 to 42 ms; p = .259; respectively). Similarly, there were no differences in JND scores between the two groups (92 ms, CI = 77 to 110 ms; and 109 ms, CI = 93 to 125 ms; p = .106; see Figure 8).
Crossmodal Cues
Auditory TOJ with Visual Cues
To assess whether the crossmodal cues had any effect on temporal judgment, cued trials were compared to uncued trials. This comparison revealed significant differences in PSS scores for both musicians and controls (both p < .05), suggesting that the visual cues effectively captured attention. However, the magnitude of the PSS scores did not significantly differ between musicians and controls (8 ms, CI = 1 to 14 ms; and 9 ms, CI = 1 to 15 ms; p = .39; respectively). On the other hand, JND scores were significantly lower by 16 ms for musicians compared to controls (47 ms, CI = 42 to 56 ms; and 63 ms, CI = 53 to 73 ms; p = .014; see Figure 9).
Visual TOJ with Auditory Cues
To assess whether the crossmodal cues had any effect on temporal judgment, cued stimuli (horizontal and vertical) were compared to non-cued stimuli. This comparison revealed significant differences in PSS scores for both musicians and controls (both p < .001), suggesting that the auditory cues effectively captured attention. However, the magnitude of the PSS scores did not significantly differ between musicians and controls (10 ms, CI = 6 to 14 ms; and 13 ms, CI = 8 to 18 ms; p = .19). Similarly, there were no differences in JND scores between the two groups (31 ms, CI = 26 to 35 ms; and 35 ms, CI = 29 to 38 ms; p = .088; see Figure 10).
Within vs. Across Modality Cue Comparisons
To assess any differential effects of within modality or cross modality cues, the respective TOJ tasks were compared. Comparing the auditory TOJ task that had auditory cues (i.e., unimodal) to the auditory TOJ task that had visual cues (i.e., crossmodal) revealed significantly lower PSS and JND scores for the crossmodal condition for both musicians and controls (both groups PSS, p < .05 and JND, p < .001). Comparing the visual TOJ task that had visual cues (i.e., unimodal) to the visual TOJ task that had auditory cues (i.e., crossmodal) also revealed significantly lower PSS and JND scores for the crossmodal condition for both musicians and controls (both groups PSS, p < .05 and JND, p < .001).
Cross-experiment Comparisons
Further understanding of the cuing effects can be determined by comparing the results from the cued tasks in the cueing experiment to the no-cue unimodal tasks (auditory and visual) of the first experiment. A comparison of cued JND scores to no-cue scores revealed that they differed for unimodal conditions but not for crossmodal conditions. That is, the additional cues in the cueing experiment made the unimodal tasks harder for both musicians and nonmusicians, as evidenced by longer temporal thresholds (JND). This was reflected by differences between cue conditions in both the auditory (crow- or dog-cued) and visual (horizontal or vertical bar-cued) unimodal conditions of the cueing experiment when compared with their respective counterparts in the first experiment (auditory and visual no cues condition), for both musicians and controls (all p < .01, using the same bootstrap comparison method as conducted throughout this study). However, when the cues were presented in a separate modality (i.e., the crossmodal conditions of the cueing experiment), JND scores were indistinguishable from the unimodal no cue conditions (Experiment 1) for both musicians and nonmusicians. This was reflected by a lack of differences between crossmodal TOJ tasks and their analogous no cue conditions in the first experiment (all p > .05). Collectively, this may suggest that a difficult unimodal task can be made easier when presented as a crossmodal task (Sinnett et al., 2006; Sinnett et al., 2007; Toro, Sinnett, & Soto-Faraco, 2005; Wickens, 1984).
For PSS scores, comparing the no-cue conditions of the first experiment to the cued conditions of the cueing experiment yielded less consistent results across conditions. In the unimodal auditory task, the cues had an effect in shifting PSS scores when compared to the non-cued condition for the control participants (p = .041 and p = .001 for each target type, crow and dog), whereas for musicians only the crow-cued condition differed from the no-cue condition (p = .001 and p = .643). This would suggest that for musicians, auditory cues did not have an effect on the detection of the dog stimulus. In the crossmodal auditory task (with visual cues), cued PSS scores did not differ from the no-cue condition for controls (all p > .05), but for musicians the crow-cued condition differed from the no-cue condition (p = .04), while the dog-cued condition did not (p = .66). This result is curious in and of itself, as it suggests that visual cues only had an effect on musicians when they appeared before the crow sound.
Varying trends were seen for visual PSS scores. In the unimodal visual task, horizontal-cued and vertical-cued PSS scores differed significantly from the no-cue conditions for both musicians and controls (all p < .01). Interestingly, in the crossmodal visual task, similar patterns occurred for musicians and controls, where horizontal-cued PSS scores did not differ from the no-cue condition (both p > .10), while vertical-cued PSS scores did (both p = .004). Curiously, this finding suggests that auditory cues had a stronger effect on vertical lines than on horizontal lines for both groups. It should be emphasized, however, that despite the inconsistent PSS findings across the cueing and no-cueing conditions, the cues were nevertheless effective in capturing attention, as shown by the within-experiment PSS analyses for the cued conditions.
Discussion
Robust findings from the cross-experiment analyses broadly suggest that unimodal cues have a detrimental effect on JND scores, whereas crossmodal cues do not. These results were similar for both musicians and controls. Excluding the cross-experiment analyses, the only significant differences observed between musicians and controls in the cueing experiment were the lower PSS score in the visual unimodal condition and the lower JND score in the auditory-task/visual-cues condition for musicians. The lower PSS score indicates that unimodal visual cues captured musicians’ attention to a lesser degree than nonmusicians’, while the lone JND difference seemingly suggests that crossmodal processing was easier for musicians, but only when judging temporal order for auditory targets that were cued visually (see Figure 11). This could be due to the fact that musicians are accustomed to processing visual cues (the stave) before the sound is produced (i.e., improved JND score), and perhaps have an increased ability to ignore distracting material, at least in the visual modality (i.e., improved PSS score). The lack of effect for auditory cues contrasts with previous findings observed by Helmbold and colleagues (2005). The differences could stem from either the lower amount of music training required in the current study or the non-musical stimuli used in this experiment. For instance, Helmbold and colleagues used eleven years as a cutoff to determine musical expertise, while the current study used three years. That said, over half of the participants had more than ten years of music training, and other studies using experts in music and other domains have used similar cutoffs (Chan, Ho, & Cheung, 1998; Green & Bavelier, 2003; Sadakata & Weijkamp, 2017). With respect to stimulus characteristics, the current study used naturalistic sound stimuli that were selected not to be musical in nature, so that both groups would have equal experience with the sounds. It is possible that using musically based stimuli would lead to results that align more closely with Helmbold and colleagues’ study, although such a result could then be explained by musicians’ increased familiarity with the stimuli rather than by any broadly based improvement in perceptual processing.
General Discussion
There are a number of important findings that merit discussion (see Table 1 for a summary of results). To begin with, differences between musicians and controls were mixed across both experiments in the auditory condition: musicians had significantly lower JND scores in the auditory-task/visual-cues condition as well as marginally lower JNDs in the auditory condition of the first experiment (p = .07), while the group difference in the unimodal auditory condition of the cueing experiment was not significant. Thus, we do not see as strong a trend as Hodges et al. (2005), where auditory JND scores were significantly lower for musical conductors when compared to controls. The difference in findings may be due to the use of different stimuli and experimental conditions, or to the type of musical expertise involved in each experiment. In the present experiment, realistic sounds (dog and crow) were used, while auditory tones (440 Hz and 660 Hz) were used in Hodges et al.’s studies. Given that our auditory stimuli did not differ as much in frequency (both at approximately 500 Hz), it is possible that pitch discrimination skills would not aid musicians in the auditory task used here. Furthermore, it is also possible that differences in auditory temporal processing exist between conductors and performing musicians.
Table 1. Summary of results for both experiments.

| First Experiment | PSS | JND |
|---|---|---|
| Auditory | ns | Marginally lower for musicians |
| Visual | Lower for musicians | Lower for musicians |
| Crossmodal | ns | Lower for musicians |
| Cueing Experiment | | |
| Auditory with unimodal cues | ns | ns |
| Auditory with crossmodal cues | ns | Lower for musicians |
| Visual with unimodal cues | Lower for musicians | ns |
| Visual with crossmodal cues | ns | ns |
*Note: ns = not significant
It is worth noting, however, that across all task types, JND scores for musicians were lower than those for controls. Although these differences were statistically significant in only three of the seven conditions (first experiment: visual and crossmodal; cueing experiment: auditory-task/visual-cues), it may be the case that with a larger sample size more results would reach significance. Nevertheless, we see a consistent trend toward lower temporal discrimination thresholds in musicians.
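To make the sample-size point concrete, a prospective power calculation along the lines sketched below indicates roughly how many participants per group would be needed for such trends to reach significance. The effect size, alpha, and power values used here are illustrative assumptions, not estimates derived from the present data.

```python
# Illustrative prospective power calculation for a two-group comparison of
# JND scores, assuming an independent-samples t-test. The standardized effect
# size (Cohen's d = 0.65), alpha, and target power are assumed values chosen
# for this sketch, not quantities reported in this study.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.65,         # assumed Cohen's d
    alpha=0.05,               # two-tailed significance level
    power=0.80,               # desired statistical power
    alternative='two-sided',
)
print(f"Roughly {n_per_group:.0f} participants per group would be required.")
```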
Tentative support for a supramodal account of attentional resources may also be found in the fact that musicians outperformed controls on several non-auditory tasks, including smaller capture from visual cues and lower JNDs in the visual and crossmodal conditions. Having said that, as music training involves extensive exposure to auditory stimuli, it was reasonable to expect enhancements in the auditory modality, although these were not consistently observed. Enhancements in other modalities, however, could be attributed to (1) better attentional resources, and (2) concomitant training in the visual modality from reading music while simultaneously listening to and playing music. Since we cannot rule out the second cause, these results can only be seen as tentative support for a supramodal account, pending further investigation with specific training conditions (see also footnote 1). Interestingly, the robust findings in the cueing experiment, where crossmodal PSS and JND scores were in fact lower than their unimodal counterparts (all p < .05 and p < .001, respectively), may provide stronger evidence for a segregation of attentional systems (Sinnett et al., 2006, 2007; Spence & Driver, 1996). Nevertheless, the current set of data makes it difficult to arrive at a decisive claim on either side of the debate. Indeed, many of the findings supporting one theoretical account or the other may well be constrained by the varying methodologies used.
Another related novel finding, observed in both musicians and controls, is the selective deficit in JND for unimodal cues but not crossmodal cues. That is, when a within-modality cue was added to the task, JND scores increased significantly for both musicians and control participants. However, when the cues were presented across modalities (i.e., a visual cue with an auditory TOJ task, or vice versa), performance was significantly better and, in fact, did not differ from the no-cue conditions. These results suggest that the threshold of temporal detection may be robust to crossmodal distraction while at the same time being vulnerable to distraction within the same modality. This is in line with findings showing that the unimodal and crossmodal attention systems overlap but also retain some degree of independence (Thesen et al., 2004).
While between-group differences in the auditory task of the first experiment were not observed (although they approached significance), this may suggest that auditory temporal acuity is less amenable to improvement through training (at least for the stimuli and task conditions used here), and that concomitant training effects are perhaps more robust in the visual domain, as other studies with expert populations have also shown (for an example with VGPs, see Donohue et al., 2010). Importantly, the visual enhancements in JND observed in the first experiment lend support to the idea that attentional allocation, and thereby its improvement through training, may not be constrained within particular sensory modalities but instead distributed across multiple modalities as demanded by the current task. Nevertheless, an important criticism of studies that use “trained” populations such as musicians and VGPs is the extent to which observed differences in experimental settings can actually be attributed to prior training (Boot, Blakely, & Simons, 2011; Green et al., 2019). Boot and colleagues extend this criticism to studies using pre-post training procedures, claiming that participants are often not blind to the purpose of the study, and that this awareness and the associated motivational factors may well influence performance. Unfortunately, our recruitment strategy for finding musicians did not allow us to keep them blind to the purpose of the study, and they may have been influenced by this knowledge. To this extent, our between-group conclusions must be interpreted carefully. Moreover, music training in sighted individuals is in itself a multimodal experience, and future training studies would be better equipped to draw conclusions by controlling the type of training each participant receives.
Learning an instrument is a complex task that takes years to master, as it requires a variety of complex motor, auditory, and multimodal skills (Ericsson, Krampe, & Tesch-Römer, 1993). Any study aiming to administer music training to participants in order to measure its cognitive effects must therefore choose between realistic instrumental training, which requires considerable time for progress to be made (months to years), and a shorter regimen of training that focuses on more specific aspects of music learning. It should be noted that most studies to date have examined the effects of training using longitudinal approaches, in order to realistically replicate the process that trained musicians go through (e.g., for studies with one-year training durations, see Fujioka, Ross, Kakigi, Pantev, & Trainor, 2006; Schlaug, Norton, Overy, & Winner, 2005). It would also be informative to train participants on subsets of musical tasks that may require a shorter time period to master, as this would allow a focus on the more specific effects of the various subcomponents of music training and the cognitive mechanisms they potentially invoke. Although many well-known studies have examined the effects of listening to excerpts of music on cognitive performance (e.g., Rauscher et al., 1993; Steele et al., 1999; Thompson et al., 2001), we are not aware of any studies that trained participants on actual subsets of musical skills (e.g., pitch discrimination, rhythmic training, notation reading). Although the full extent of instrumental training, and the multimodal processes involved therein, could only be acquired through realistic training, a better understanding of the subcomponents may also help to piece together what makes music training unique from a cognitive standpoint, along with any relevant side effects.
A recurring topic of discussion in studies examining far transfer of cognitive skills from areas such as music is confounding variables. For example, it is possible that more focused or intelligent individuals are attracted to music in the first place, an issue widely acknowledged as a problem (e.g., Boot et al., 2011; Green et al., 2019). Though this is a relevant concern, the current study examined low-level perception, which ought to be less susceptible to such bias; other studies, such as Hodges et al. (2005), would be susceptible to similar biases. Another important factor to consider is the number of years of training the musicians had completed prior to participating in this study. The participants in the current study all had a minimum of three years of music training. There appears to be no agreed-upon number in the literature, as researchers interested in how music training affects human cognition have recruited participants with five years (Sadakata & Weijkamp, 2017) and eleven years of experience (Helmbold et al., 2005), to name just two. That said, over half of the participants in the current study had over ten years of experience. Other studies on how training transfers to core cognitive skills have used comparable cutoffs (Chan et al., 1998), or much lower amounts in other domains (see, for example, Bediou et al., 2018; Green & Bavelier, 2003).
In summary, musicians showed a general improvement in temporal acuity (JND), especially in the visual and crossmodal conditions (the auditory condition was only marginally significant). It appears that music training enhances a general ability to make temporal discriminations across sensory modalities. In addition, visual cues seem to enhance both crossmodal simultaneity perception and crossmodal discriminability. These results begin to map out the specific components that are enhanced by musical expertise. Further studies are needed to verify these results with a larger number of participants, better-controlled expert groups, and other types of expertise, in order to further understand how expert training influences temporal perception and attention between the senses.
Notes
It should be noted, however, that any such findings must be interpreted with caution, as studies that recruit already “trained” participants such as VGPs and musicians cannot infer that the training itself is the cause of such enhancements. Only an intervention study that administers such training in a controlled setting can make strong claims regarding training-related enhancements, and specifically regarding a segregated attentional system.
Given the difference in the male-to-female ratio for controls (4:16) versus musicians (15:5), we pooled all male and all female participants across both groups for t-test comparisons. No differences in accuracy scores were found between male and female participants for any of the conditions in either experiment (all p > .10).
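As an illustration only, the pooled comparison described above is of the kind sketched below, using an independent-samples (Welch) t-test on accuracy scores; the arrays are placeholder values, not the data collected in this study.

```python
# Illustrative sketch of the pooled sex comparison: accuracy scores for all
# male vs. all female participants (collapsed across the musician and control
# groups) compared with a Welch t-test. Values are placeholders, not the
# study's data.
import numpy as np
from scipy.stats import ttest_ind

acc_male = np.array([0.91, 0.88, 0.93, 0.90, 0.87])    # placeholder accuracies
acc_female = np.array([0.90, 0.92, 0.89, 0.88, 0.91])  # placeholder accuracies

t_stat, p_value = ttest_ind(acc_male, acc_female, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```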
Note that this analysis does not examine whether the crow or the dog was preferred (i.e., analogous to horizontal bars being preferred in the visual condition of the first experiment), but merely confirms whether there was indeed a cueing effect.
References
Appendix
Musician participants

| Education | n | Percentage |
|---|---|---|
| High school graduate | 2 | 10 |
| Currently in college | 13 | 65 |
| College graduate | 3 | 15 |
| Currently in graduate school | 2 | 10 |
| Principal instrument | | |
| Guitar | 10 | 50 |
| Bass guitar | 3 | 15 |
| Piano | 3 | 15 |
| Saxophone | 1 | 5 |
| French horn | 1 | 5 |
| Voice | 1 | 5 |
| Years of study on instrument | | |
| 3 – 10 | 9 | 45 |
| 11 – 20 | 8 | 40 |
| 21 – 30 | 1 | 5 |
| Over 30 | 2 | 10 |
| Hours/week of practice | | |
| 6 – 10 | 13 | 65 |
| 11 – 20 | 4 | 20 |
| 21 – 30 | 1 | 5 |
| Over 30 | 2 | 10 |
| Video games (hours/week)* | | |
| 1 – 4 | 7 | 35 |
| None | 13 | 65 |
*Note: All video game experience reported for both musicians and controls was with non-action video games. This small amount of experience was therefore not a concern, as only action video games have been shown to lead to attentional enhancements (Green & Bavelier, 2003). The ratio of gaming to non-gaming experience was also similar across the two groups. Furthermore, the cutoff in video gaming experiments generally only assigns a participant to the gaming group if they play action video games for more than four hours a week (e.g., see Green & Bavelier, 2007).
Control participants

| Education | n | Percentage |
|---|---|---|
| Currently in college | 20 | 100 |
| Activities (> 10 hours/week) | | |
| None | 8 | 40 |
| Sports & exercise | 8 | 40 |
| Other (sewing, reading, dancing and drama) | 4 | 20 |
| Video games (hours/week) | | |
| 1 – 4 | 8 | 40 |
| None | 12 | 60 |