What are the factors that determine how long a person chooses to listen to an unfamiliar piece of music? We examined this question across three experiments in which we played participants novel repeating multi-instrument stimuli and recorded their listening times and reasons for their decisions to either continue or stop listening. To influence the habituating effects of repeating musical material drawn from a large stimulus library (> 450 items), we manipulated novelty along several musical dimensions. In Experiment 1, all instruments entered simultaneously. In Experiment 2, instrument entrances were also offset in time. In Experiment 3, we composed core multi-instrument loops and manipulated them to further minimize harmonic variability, minimize rhythmic variability, introduce spatialization, or change timbral characteristics. Novelty introduced by instrument entrances was the strongest determinant of listening times, though harmonic variability and timbral features were also important. Subjective enjoyment was the best predictor of listening times, mediating the effects of the degree of perceived groove in a stimulus, the urge to move, interest in a stimulus, perceived complexity, and congruency with current mood. We conclude that naturalistic looping musical stimuli serve well to examine the diverse psychological and musical determinants of choice behavior underlying music consumption.

In his seminal book, Aesthetics and Psychobiology, Berlyne (1971) pointed out a crucial flaw in the experimental design of most modern research concerning music: that the autonomy of subjects is removed when they are instructed to fixate on experimenter-selected stimuli, thus limiting our insights into perception and aesthetic experience. In describing common human engagement with art and other stimuli, Berlyne made reference to exploratory behavior, that is, behavior in which organisms purposefully seek out and interact with stimuli of interest in their environment. Exploratory behavior can be further divided into the extrinsic variety, in which one seeks out informational content that is biologically valuable in terms of future behavior, and the intrinsic variety, in which stimuli are sought for their own sake. The latter encompasses aesthetic behavior, which Berlyne would go on to study in detail using exploratory behavior as a dependent variable. In terms of music, one potentially rich index of such behavior is listening time: the duration for which one chooses to listen to a musical stimulus. This behavior manifests in a variety of everyday situations as consumers of music choose from many options what to listen to. Switching among radio stations in search of desired music, or choosing to skip a currently playing piece of music on a streaming music service, are two common examples of exploratory behavior and consumer decisions in the service of psychological goals.

PRIOR USE OF LISTENING TIME AS A DEPENDENT VARIABLE

Despite Berlyne's argument in its favor, the use of listening time as a dependent variable in music psychology is uncommon. Not surprisingly, most of those who did heed the call were either colleagues or contemporaries of Berlyne. In an anthology comprising their work, Berlyne (1974) posited that listening time was to be interpreted as a measure of the intensity of the orienting reaction, attention, and perceptual curiosity. Listening (or looking) time thus can serve as a dependent measure in the quantification of attention to competing perceptual targets, and has long been exploited in studies examining knowledge in infants (Colombo & Mitchell, 2009; Oakes, 2010), even though it has not been utilized extensively in studies of aesthetic judgments or stimulus preference. Early experiments that examined listening times and subjective ratings for musical sequences utilized tone sequences in which the variation in pitch, duration, and loudness dimensions was motivated by principles of information theory (Crozier, 1974). Listening times were found to increase with stimulus complexity (defined as the level of uncertainty). Moreover, 78% of the variance in listening time was accounted for by a factor that comprised semantic differentials generally associated with interestingness, whereas only 10% was explained by a factor that was associated with enjoyment.

Berlyne's theories of preference dominated the field of experimental aesthetics for decades to come (Martindale & Moore, 1989). Some researchers aimed to refine Berlyne's hypotheses, for example by addressing the inability of the Wundt curve to adequately describe medium arousal potential with regard to music (Martindale & Moore, 1989). Others expanded his research, for example by investigating relationships between judgments of complexity, pleasingness, and interestingness in real jazz songs (Russell, 1982). While progress was made on these related topics, listening time was left out of the picture.

Holbrook and Gardner (1993), from the perspective of consumer research, noticed that questions concerning the duration of consumption, specifically about listening receptivity to music, were largely unexplored. Their work employed listening time as a dependent variable, reasoning that it unobtrusively measured overall receptiveness to consumption experiences through voluntary behavioral self-exposure. In an experiment involving a selection of various jazz recordings, they discovered that although tempo strongly affected arousal, listening time peaked at intermediate levels of arousal, and pleasure shifted these peaks left and right on a non-monotonic curve. In a follow-up study, these results were replicated, and a new result emerged: the combined effects of pleasure, arousal, and task motivation (intrinsic vs. extrinsic) accounted for about a third of the variance in listening time (Holbrook & Gardner, 1998).

Finally, listening time figures importantly in the Operant Music Listening Recorder, a device that measures both exploratory choice and listening times for the different choices. Listeners freely select between available music channels. In the case of popular and classical music it was found that listening time followed familiarity and liking, but not quality ratings (Hargreaves, 1987), a pattern of results supporting a theory in which responses to music are fragmented according to cognitive (“evaluative quality”) and affective (“liking”) dimensions (Hargreaves, Messerschmidt, & Rubert, 1980). This fragmentation into cognitive versus affective dimensions echoed the earlier results of Crozier (1974) in which cognitive factors dominated listening times for synthetic random tone sequences.

EMBODIED ENGAGEMENT WITH MUSIC

Often ignored in studies of music perception and aesthetic judgments about music is the idea that meaningful engagement with music is embodied engagement with music. In other words, pleasurable experiences of music typically comprise perception and action. Action need not take the form of overt movements such as singing or dancing, but can also manifest as covert action associated with silently singing along with a piece of music, or forming expectations for forthcoming musical events. Numerous brain imaging studies have illustrated that premotor regions of the frontal lobe become active as a person listens to or imagines a familiar piece of music (Halpern & Zatorre, 1999; Herholz, Halpern, & Zatorre, 2012; Janata, 2009; Pereira et al., 2011), or listens to rhythmic stimuli even in the absence of overt movement (Grahn & Brett, 2007; Janata & Grafton, 2003; Kornysheva, von Cramon, Jacobsen, & Schubotz, 2010).

The tendency for motor areas to respond when a person listens to a piece of music may underlie the urge to move in response to music, an experiential state associated with “being in the groove” (Janata, Tomic, & Haberman, 2012; Stupacher, Hove, Novembre, Schuetz-Bosbach, & Keller, 2013). Such embodied engagement with music is generally regarded as pleasing. Among young adults, positive correlations exist between appraisals of groove in music and the enjoyment of it, including the desire to keep listening to it. Hurley, Martens, and Janata (2014) postulated that perceived groove and motoric engagement would be enhanced if the complexity of a musical stimulus increased gradually. They tested this idea by constructing musical stimuli in which the different instruments’ parts entered sequentially, and compared responses to these stimuli with responses to stimuli in which all of the instrument parts entered simultaneously. They reasoned that the entrance of each additional instrument would help to sustain attention to the stimulus by virtue of having added a novel musical element that interacts with the established musical scene, and that this would increase the urge to engage with the stimulus, ultimately resulting in greater amounts of overt movement. Indeed, they found that the urge to move, enjoyment of the music, and the desire to continue an experimental trial increased when instrument entrances were staggered and when the music consisted of multiple instrument parts, rather than a solo part. Each instrument entrance also increased the likelihood of spontaneous initiation of bimanual drumming and spontaneous head movements that were coupled to the temporal structure of the musical stimuli.

REPETITION AND HABITUATION

The last critical concepts for our set of experiments are those of repetition and habituation. Repetition within pieces of music and repeated listening to specific pieces of music are hallmarks of music and engagement with music, respectively (Margulis, 2014). The phenomenon of repetition occurs at different timescales. Within pieces of music, brief melodic or rhythmic patterns can be repeated, as can phrases spanning several seconds. These repetitions can occur in immediate succession or they can be separated by intervening musical material. Repetition also pertains to entire pieces, which can be listened to multiple times in succession or days, months, or years apart. What is the impact of these different types and timescales of repetition on our affective responses to music?

Most extensively examined has been the impact of repeated exposures of discrete musical units, whether short melodic fragments or entire pieces, on liking. In typical experiments, participants are presented with varying numbers of presentations of stimuli in randomized order. For example, Szpunar and colleagues (2004) tested the effects of incidental versus focused listening on musical stimuli that varied in ecological validity from short sequences of notes to 15 s excerpts of orchestral music from the Classical genre. They found that liking increased monotonically with the number of exposures when the music was heard incidentally in one ear as participants monitored a speech stream in the other ear. These results corroborated the basic properties of the “mere exposure” effect in which preference for a stimulus increases with the number of subliminal stimulus presentations (Bornstein, 1989; Zajonc, 1968). Ecologically valid and explicitly attended musical items showed an inverted-U function, consistent with a satiation account in which liking increases as a function of familiarization with a stimulus (up to 8 repetitions) but then decreases with additional repetitions (tested with 32) as boredom sets in (Szpunar et al., 2004). Inverted-U functions of liking or preference for a stimulus have long been postulated to underlie affective responses to artistic stimuli (Berlyne, 1971; Madison & Schiölde, 2017; Szpunar et al., 2004) with the peak positioned at some optimal level of stimulus complexity or familiarity. However, in some cases, repeated exposures to ecologically valid stimuli varying in complexity show no signs of satiation (Madison & Schiölde, 2017), indicating that Berlyne's arousal potential model (Berlyne, 1971) may be constrained by stimulus-related and psychological factors that are not yet fully understood.

Rather than examining the effect of multiple exposures separated by time and intervening stimuli, as is typical of most studies in this domain, we sought to examine the processes that drive engagement with a stimulus in the moment. Presumably, processes of familiarization, novelty, and satiation may also be at play at shorter, within-stimulus, timescales. Repetition is an indispensable compositional device (Margulis, 2014). Adding repetition into pieces of music can result in greater liking for the modified versions than for the originals (Margulis, 2013), indicating that there is an affective benefit to some repetition. Despite widespread recognition that too much intra-musical repetition is not characteristic of most musical compositions (Margulis, 2014), there has been a surprising lack of studies that aim to delimit the interaction of repetition with the psychological and musical factors that would position a listener along a decision axis stretching from a decision to continue listening to a piece of music to a decision to seek a different piece of music.

We assume that the underlying psychological and neural principle that we assayed here is the ubiquitous phenomenon of habituation, which presumably underlies boredom and is akin to, but not synonymous with, satiation (McSweeney, 2004). These phenomena manifest neurally and behaviorally throughout the animal kingdom (McSweeney, 2004). Though satiety is typically considered in regard to the satisfaction of hunger and thirst, the concept has been invoked in aesthetics to explain the sometimes-observed decrease that follows the increase in liking for stimuli to which a person is repeatedly exposed (for a discussion, see Szpunar et al., 2004). Although a concept of “affective satiation” may be appropriate when considering the repeated consumption of the same music across longer timespans of hours to days, repetition of a stimulus on the order of seconds reliably results in habituation.

Assuming that the default tendency when experiencing a repeating stimulus will be to habituate to it and disengage from it, the amount of time spent engaging with the stimulus, in our case listening to it, becomes a measure of the habituation process on which there are overlaid factors that either speed or slow this process. Though not widespread in studies of music perception and cognition, the technique of habituating participants to stimuli (e.g., in order to assess the capacity for perceptions of change along stimulus dimensions of interest) is a crucial tool for studies of infant perception and cognition (Colombo & Mitchell, 2009; Oakes, 2010). When viewed against the background of habituation, measures of engagement with a stimulus, such as the amount of time spent looking at it or listening to it, thus tap fundamental aspects of appetitive behavior in which novel objects or information are assessed, presumably with regard to their value for one's present or future state.

THE PRESENT SERIES OF EXPERIMENTS

Across three experiments we manipulated musical properties, assessed several subjective variables, and performed several types of analyses in order to better understand the musical and psychological factors that influence how long a person listens to a novel, but repeating, musical stimulus. In order to obtain listening time estimates on a per-stimulus basis that would be sensitive to our manipulations, we constructed repeating (looping) musical phrases that consisted of multiple instruments with diverse, naturally occurring timbres and rhythmic and melodic complexity. We expected participants to habituate/satiate to these stimuli, and for there to be systematic effects of our manipulations.

Given the various studies discussed above, we expected that increases in perceived groove and incremental increases in instrumentation density would increase enjoyment and listening times. Building on the findings of Holbrook and Gardner (1993), we expected the relationship between enjoyment and listening time to hold across genres, and based on Crozier (1974), we also expected subjects’ reports of stimulus complexity and interest to have a considerable positive effect on listening time. Experiments 1 and 2 primarily addressed these variables.

In the same way that listening times for synthetic random tone sequences increased as variability in pitch and duration across the items in those sequences increased (Crozier, 1974), we expected that manipulations of the amount of variability in tonal, temporal, and timbral dimensions of stimuli that more closely approximate actual music that people seek out would similarly affect listening times: reduced variability/complexity would result in shorter listening times. We tested these predictions in Experiment 3, for which we composed a core set of stimuli that we then manipulated to independently: 1) reduce tonal variability by reducing or eliminating harmonic progressions and melodic information, 2) reduce rhythmic variability by eliminating syncopation and constraining note onsets to one or two metric levels, 3) alter the timbral properties (instrumentation) by assigning different instruments to each of the parts, or 4) increase the spatial separation among instruments by panning them toward the left or right channels.

In Experiment 3 we also examined trait-level participant characteristics that may influence listening times. Hunter and Schellenberg (2011) found that the Openness-to-Experience dimension of the Big Five personality inventory explained variability in curvilinear relationships of liking of Classical music excerpts as a function of the number of exposures, with higher Openness scores more likely to be associated with linear decreases in liking after the initial exposure to unfamiliar music, or a peak in liking at fewer exposures when an inverted-U function was present. We asked participants to complete the Brief Affective Neurosciences Personality Scales (BANPS; Barrett, Robins, & Janata, 2013), a shorter and psychometrically improved version of the Affective Neurosciences Personality Scales (ANPS), which assesses personality dimensions that were conceived in terms of underlying neurobiological systems (Davis & Panksepp, 2011). The ANPS dimensions are correlated with the Big Five dimensions, though some of the ANPS dimensions, such as Seek, Play, Care, and Anger, are each correlated with multiple Big Five dimensions (Barrett et al., 2010, Figure 2). The ANPS has proven effective in explaining individual differences in behavioral and neural studies of music-evoked nostalgia (Barrett et al., 2010; Barrett & Janata, 2016). In the present context, we expected that the Seek dimension might serve to explain variability in average listening times among participants.

Experiment 1

The first experiment assessed the degree to which subjective post-stimulus self-report ratings of enjoyment, groove, the urge to move, and reported movement could predict listening times for novel repeating musical stimuli drawn from a library of 162 stimuli created for this purpose. We also sought to identify common participant-reported reasons for terminating the music or allowing it to continue.

METHOD

Participants

One hundred and five participants were recruited from an undergraduate participant pool at the University of California, Davis, in exchange for partial course credit. All participants provided informed consent in accordance with a protocol approved by the UC Davis Institutional Review Board. Approximately two-thirds of the participants participated during the end of the Winter academic quarter and one-third participated at the beginning of the Spring quarter in 2012, thus helping to mitigate possible motivational differences among individuals in the sample.

Seven participants evaluated fewer than 20 stimuli and were excluded from the analysis. Of the final 98 participants, 61.2% were female. The racial distribution of participants was 40.8% Asian, 27.6% Caucasian, 3.1% African American, 6.1% more than one racial category, 11.2% other racial categories, and 11.2% unknown. In addition, 17.4% reported their ethnicity as Hispanic. Ages ranged from 18 to 39 years (mean ± SD = 21.1 ± 3.1).

Stimuli

One hundred and sixty-two original musical stimuli were created using Logic and the Apple Loops available therein (Apple, Inc.). The objective was to create stimuli that consisted of brief musical phrases or ideas that would then repeat (loop) until a maximum listening duration of two minutes had been reached. We aimed to create a diverse library of loops, and thus created loops that we expected to vary in their degree of: 1) pleasantness, and 2) perceived groove (Hurley et al., 2014; Janata et al., 2012).

Distributions of basic characteristics of the musical loops are shown in Figure 1. Loops comprised between three and five instrument parts; in most cases four. The durations of single loop iterations were four or eight seconds, skewed heavily toward the longer duration. Loop tempo was held constant at 120 beats per minute (bpm). Genre categories from which the constituent instrument loops were drawn included blues, contemporary rock, folk, funk, jazz, R&B, and rock. It should be noted, however, that many of the loops seemed genre-atypical and so a miscellaneous category was added for loops that could not be easily categorized.

FIGURE 1.

Distributions of stimulus characteristics in each of the three experiments.

For each stimulus, a two-minute audio file was created such that the original loop duration became the cycle time within the overall stimulus. Two minutes was thus defined as the maximum listening time for any given stimulus.

For the Winter-quarter sample of participants, there were 108 stimuli. The remaining 54 stimuli were created subsequently and therefore only included for the Spring-quarter sample.

Procedure

The experiment was conducted online using Ensemble, a web-based experiment management system (Tomic & Janata, 2007). Participants could participate as long as they had a computer with Internet access, keyboard, mouse, and stereo speakers, headphones, or earbuds. Each experimental session lasted one hour.

Participants first verified that the audio was audible and set to a comfortable volume. Upon completing a set of questionnaires pertaining to different psychological styles of engaging with music,1 the following instructions were presented to the participant:

You are now going to hear a series of musical excerpts. You are free to listen to each musical excerpt for as long as you like (up to 2 minutes). When you want to stop listening to the particular musical excerpt, simply hit the “Stop Playing” button that will be on the screen. After each excerpt you will be asked a few questions before hearing the next excerpt. The overall duration of the experiment is fixed, and does not depend on how many or how few excerpts you listen to. Press Next to begin. It may take several seconds for the first excerpt to load and start playing.

Each stimulus was accompanied by the words “Playing Audio” in bold red text and a button labeled “Stop Playing” that the participant could click at any time. The listening time for each stimulus was recorded as the time elapsed from the start of stimulus playback to the click on the Stop Playing button, or as two minutes if the button was not clicked before the stimulus ended.
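The recording rule can be sketched in a few lines. This is an illustrative Python sketch, not the Ensemble implementation; the function name and the timestamp arguments are hypothetical.

```python
MAX_LISTENING_S = 120.0  # two-minute maximum stated in the text


def listening_time(onset_s, stop_click_s=None):
    """Return the listening time in seconds, capped at the two-minute maximum.

    onset_s: time at which the stimulus began playing (hypothetical timestamp).
    stop_click_s: time of the Stop Playing click, or None if it was never clicked.
    """
    if stop_click_s is None:
        return MAX_LISTENING_S
    return min(stop_click_s - onset_s, MAX_LISTENING_S)


print(listening_time(0.0, 6.2))   # participant stopped after 6.2 s
print(listening_time(0.0))        # participant never clicked Stop Playing
```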

Upon termination of the stimulus, the participant was presented with two forms containing a series of questions that were to be answered with respect to the musical stimulus that had just been heard. The first form comprised four questions. The first three questions required responses on a 7-point scale (1 = not at all; 7 = very much): “How much did you enjoy the musical excerpt?” “To what extent did you feel an urge to move while listening to the music?” and “To what extent did you feel that the musical excerpt grooved?” The fourth question was “Did you move along with the music, e.g., bob your head, tap your toes, tap your hands, sway?” which could only be answered by checking either “Yes” or “No.”

The questions on the second form pertained to reasons for stopping or continuing to listen to a loop. The first of these was “Please indicate the reasons you stopped listening (check any that apply).” The response options were intuitive statements meant to encompass a range of possible reasons: “I didn't like the combination of instruments that was playing,” “The sound of a particular instrument bothered me,” “I didn't like some of the rhythms,” “I didn't like some of the melodies,” “I kept wanting a new part to enter, but it didn't,” “I kept wanting the existing parts to change, but they didn't,” “I got bored,” “I was curious what the next musical stimulus would be,” and “Other.” The second question was “Please indicate the reasons you continued to listen as long as you did (check any that apply)” with response options: “I kept listening because I heard new things in the music,” “I kept waiting for a new part to enter,” “I liked the instruments that were playing,” “I liked the rhythms,” “I liked the melodies,” “I liked how the instruments fit together,” “I liked the complexity of the music,” “I liked the simplicity of the music,” and “Other.” The participant could check none, some, or all of the responses provided. If “Other” was selected, the participant had the opportunity to type in a short reason for stopping or continuing that was not expressed by any of the items in the given list.

Upon completion of the post-stimulus questions, a new stimulus was selected and began playing. This process repeated until the experiment ended. Stimuli were selected at random so as to avoid order effects and to sample uniformly from a stimulus library that was too large to be listened to in its entirety by any given individual. Using this strategy, each musical stimulus would tend to be evaluated approximately as many times as every other stimulus.

DATA ANALYSIS

Preprocessing

Custom MATLAB scripts were written to extract and organize the data. Those participants who did not complete a minimum of 20 trials were suspected of not paying attention to the task and were removed from further analysis. Listening to 20 stimuli for the maximum duration of two minutes each would take 40 minutes. Given the average listening times indicated below, it was very unlikely that any individual would listen to so few stimuli for two minutes each while needing a total of 15 minutes to answer the post-stimulus questions.
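The exclusion criterion amounts to a per-participant trial count. The following is a minimal Python sketch of that rule, not the MATLAB scripts used in the study; the trial log and subject identifiers are invented for illustration.

```python
from collections import Counter

MIN_TRIALS = 20  # exclusion threshold stated in the text


def participants_to_keep(trial_subject_ids, min_trials=MIN_TRIALS):
    """Return the set of subject IDs with at least `min_trials` completed trials."""
    counts = Counter(trial_subject_ids)
    return {sid for sid, n in counts.items() if n >= min_trials}


# Hypothetical trial log: one subject ID per completed trial.
trials = ["s01"] * 72 + ["s02"] * 12 + ["s03"] * 20
kept = participants_to_keep(trials)
print(sorted(kept))

# Time budget behind the threshold: 20 stimuli at the two-minute maximum.
print(MIN_TRIALS * 2, "minutes of listening at most")
```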

Statistical analyses

Statistical analyses were performed using R (R Core Development Team, 2012). Given the decaying exponential shape of the per-trial listening time distributions (Figure 2A) the logarithm (base 10) of the listening times was used for the analyses. We used the lmer function from the linear mixed-effects modeling library (lme4) (Bates, Maechler, & Bolker, 2013) with maximum likelihood estimation to fit a series of models (Table 1). We compared the fits of different models using a likelihood ratio test in order to determine significant effects of variables in the models.
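The motivation for the log transform can be illustrated with synthetic data. The sketch below is written in Python rather than R and does not use the study data; the exponential distribution with a 12 s mean, the 0.5 s floor, and the sample size are all assumptions chosen only to mimic a decaying distribution of listening times. It compares the sample skewness before and after taking the base-10 logarithm.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
# Synthetic listening times (NOT the study data): exponential with a 12 s mean,
# floored at 0.5 s (assumed minimum response latency) and capped at 120 s.
times = [min(max(rng.expovariate(1 / 12.0), 0.5), 120.0) for _ in range(5000)]
log_times = [math.log10(t) for t in times]

skew_raw = skewness(times)
skew_log = skewness(log_times)
print(round(skew_raw, 2), round(skew_log, 2))
```

Under these assumptions the transformed values are considerably less skewed in magnitude than the raw times, which is the property that makes them better suited to a linear model.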

FIGURE 2.

Distributions of per-trial listening times and per-stimulus average listening times for each of the three experiments. The stimulus denoted by the black diamond in Panel F is the initial two minutes of Chameleon by Herbie Hancock.

TABLE 1.

Model Results in Experiment 1

Model #  Model statement                               AIC     Test of            Models compared  Chi-square  df  Probability
1        listen ~ 1 + (1 | subject_id)                 4958.7
              + (1 | stimulus_id)
2        listen ~ 1 + (1 | stimulus_id)                9827.0  Subject            2, 1             4870.4          2.2e-16
3        listen ~ 1 + (1 | subject_id)                 4995.8  Stimulus           3, 1             39.10           4.0e-10
4        listen ~ 1 + enjoy + groove + urge            4446.4  All fixed effects  1, 4             518.33          2.2e-16
              + (1 | subject_id)
              + (1 | stimulus_id)
5        listen ~ 1 + groove + urge                    4548.1  Enjoy              5, 4             103.77          2.2e-16
              + (1 | subject_id)
              + (1 | stimulus_id)
6        listen ~ 1 + enjoy + urge                     4449.8  Groove             6, 4             5.45            0.020
              + (1 | subject_id)
              + (1 | stimulus_id)
7        listen ~ 1 + enjoy + groove                   4451.5  Urge               7, 4             7.15            0.008
              + (1 | subject_id)
              + (1 | stimulus_id)
8        listen ~ 1 + enjoy + groove + urge            4241.4  Random slopes      4, 8             222.96          2.2e-16
              + (enjoy + groove + urge | subject_id)
              + (1 | stimulus_id)

Note: Model statements (regression equations) are presented in the exact form in which they were specified in the R statistical software. The variable on the left side of the tilde is the dependent variable that is being predicted by the variables on the right hand side of the equation. Variables contained in parentheses are variables for which random effects estimates are obtained, whereas fixed effects estimates are obtained for variables not contained in parentheses. A “1” indicates an estimate of an intercept, which in the case of a random effects variable such as subject_id would represent the average listening time across all trials completed by an experiment participant.
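Because the models were fit by maximum likelihood, the chi-square statistic of each likelihood ratio test can be recovered from the two AIC values: chi-square = (AIC of the restricted model − AIC of the fuller model) + 2 × df. The Python sketch below applies this identity to the values in Table 1 as a consistency check. The df values are assumptions inferred from the model statements (they are not given in the table as reproduced here): one per dropped term, three for the set of fixed effects, and nine for the additional variance and covariance parameters of three correlated random slopes.

```python
# AIC values as reported in Table 1, keyed by model number.
aic = {1: 4958.7, 2: 9827.0, 3: 4995.8, 4: 4446.4,
       5: 4548.1, 6: 4449.8, 7: 4451.5, 8: 4241.4}

# (restricted model, fuller model, reported chi-square, ASSUMED df).
# The df values are inferred from the model statements, not read from the table.
comparisons = [
    (2, 1, 4870.4, 1),   # subject random intercept
    (3, 1, 39.10, 1),    # stimulus random intercept
    (1, 4, 518.33, 3),   # all three fixed effects
    (5, 4, 103.77, 1),   # enjoy
    (6, 4, 5.45, 1),     # groove
    (7, 4, 7.15, 1),     # urge
    (4, 8, 222.96, 9),   # random slopes (assumed correlated)
]


def chi_square_from_aic(aic_restricted, aic_full, df):
    """Likelihood ratio statistic implied by two AIC values under ML fitting."""
    return (aic_restricted - aic_full) + 2 * df


for restricted, full, reported, df in comparisons:
    implied = chi_square_from_aic(aic[restricted], aic[full], df)
    print(f"Model {restricted} vs {full}: implied {implied:.1f}, reported {reported}")
```

Under these assumed df values, every implied statistic agrees with the reported one to within the rounding of the AIC values.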

Given the pattern of results described below, we also performed a random effects multilevel mediation analysis, with trial-level data for the Level 1 observations, and participants as the Level 2 observations (Bauer, Preacher, & Gil, 2006). We implemented the analysis in R, based on code obtained from the UCLA Institute for Digital Research and Education (2014; “How can I perform mediation with multilevel data? (Method 2)”) in order to obtain path coefficient estimates and 95% confidence intervals. Estimation of the variance associated with the covariance estimate for the random effects of either groove or urge to move (independent variables) on enjoyment (mediator variable) and enjoyment on listening time (dependent variable) was accomplished by a bootstrap analysis using resampling (1000 iterations).
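The logic of bootstrapping an indirect (mediated) effect can be sketched on a much simpler case. The Python example below is a single-level simplification and is not the multilevel random-effects procedure described above: it ignores the nesting of trials within participants, uses synthetic data with arbitrarily chosen effect sizes, and resamples individual observations. It estimates the indirect effect as the product of the path from predictor to mediator (a) and the path from mediator to outcome controlling for the predictor (b), then forms an approximate percentile 95% interval from 1000 resamples.

```python
import random


def slope(x, y):
    """OLS slope of y on x (single predictor, with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def indirect_effect(x, m, y):
    """a*b, with a = slope of m on x and b = slope of y on m controlling for x."""
    a = slope(x, m)
    c = slope(x, y)
    mx, mm, my = (sum(v) / len(v) for v in (x, m, y))
    # Residualize mediator and outcome on x; the slope between the residuals
    # equals the partial coefficient of m in the regression y ~ x + m.
    m_res = [mi - mm - a * (xi - mx) for xi, mi in zip(x, m)]
    y_res = [yi - my - c * (xi - mx) for xi, yi in zip(x, y)]
    return a * slope(m_res, y_res)


rng = random.Random(1)
n = 400
# Synthetic data (NOT the study data); effect sizes are arbitrary assumptions.
groove = [rng.gauss(0, 1) for _ in range(n)]                      # predictor
enjoy = [0.8 * g + rng.gauss(0, 0.5) for g in groove]             # mediator
listen = [0.6 * e + 0.1 * g + rng.gauss(0, 0.5)
          for g, e in zip(groove, enjoy)]                         # outcome

estimate = indirect_effect(groove, enjoy, listen)

boot = []
for _ in range(1000):
    idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
    boot.append(indirect_effect([groove[i] for i in idx],
                                [enjoy[i] for i in idx],
                                [listen[i] for i in idx]))
boot.sort()
ci_low, ci_high = boot[24], boot[974]  # approximate 2.5th and 97.5th percentiles
print(round(estimate, 3), round(ci_low, 3), round(ci_high, 3))
```

An interval that excludes zero would be taken as evidence of mediation; a multilevel analysis would additionally account for the covariance of the random effects across participants.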

RESULTS AND DISCUSSION

Listeners rated between 20 and 162 loops each (median ± SD = 72 ± 32.8), and between 10 and 68 listeners rated each loop (median ± SD = 55 ± 20.4). The distribution of per-trial listening times was characterized by an exponential decay function, with median and mean listening times of 6.2 and 12.1 s, respectively (Figure 2A). Average listening times for the stimuli were distributed primarily between 4 and 22 s, with a mean of 11.7 s (Figure 2B). It appears, therefore, that habituation was rapid, with the average decision to terminate a loop coming within its second repetition.

A summary of the statistical models used to assess the influence of perceived groove, the urge to move, and enjoyment on listening times is presented in Table 1. The simplest models determined whether random intercepts associated with participants and stimuli contributed to the variance in observed per-trial listening times. Both did, indicating that some participants listened to the musical stimuli longer, on average, than did others, and that some stimuli were listened to for longer than were others (Figure 2B).

To address whether fixed effects associated with the 7-point Likert-like scales (enjoyment, groove, urge to move) explained a significant amount of the variance in listening times, we compared a full model containing the fixed effects and the random intercepts for participant and stimulus, with the intercepts-only model. The fixed-effect variables significantly improved model fit (Table 1, Model 4). We then determined whether the effect of each fixed-effect variable was significant, i.e., contributed to improving model fit. We did this by removing the variable of interest from the full model (i.e., Model 4), evaluating this restricted model, and then comparing the full and restricted models to determine, using a chi-square test, whether the difference in model fit was statistically significant. The individual elimination of each of the enjoyment, perceived groove, and urge to move variables in the full model significantly reduced model fit (Table 1, Models 5–7 compared with Model 4). The model coefficients from Model 4 (all fixed effects) showed that increases in enjoyment of a stimulus increased listening times (b = 0.045, se = 0.004, t = 10.34), as did, to a lesser degree, increases in perceived groove (b = 0.011, se = 0.004, t = 2.34), and the urge to move (b = 0.013, se = 0.005, t = 2.68).

We further tested whether participants varied in the slopes of the relationships between subjective variables and listening time (Table 1, Model 8). We observed significant effects of the random slopes for both enjoyment and groove. This means that subjects differed in the degree to which their personal enjoyment of a loop affected how long they would listen. For some participants a large increase in enjoyment resulted in a large increase in listening times, whereas for other participants a similar increment in enjoyment had less of an effect on listening times. The same was true for perceived groove, thus indicating that one of our initial expectations was truer for some participants than for others. We did not pursue the sources of this variability among individuals any further, though we believe that modeling individual differences will be important in future studies on this topic.

Mediation models

Perceived groove of the excerpts was a significant predictor of listening times, χ²(1) = 350.78, p < .0001, though less so when enjoyment was also in the model, χ²(1) = 14.25, p < .0002. Groove significantly predicted enjoyment, χ²(1) = 5399.3, p < .0001. The strength of the urge to move was a significant predictor of listening times, χ²(1) = 363.7, p < .0001, though less so when enjoyment was also in the model, χ²(1) = 15.95, p < .0001. Urge to move significantly predicted enjoyment, χ²(1) = 5593.2, p < .0001. Such a pattern of results is indicative of strong mediation (Baron & Kenny, 1986) by enjoyment of the effects of perceived groove and urge to move on listening times. The path coefficients and 95% confidence intervals estimated by a random effects multilevel mediation analysis are shown in Figure 3. Although there were small significant direct effects of perceived groove and urge to move on listening times, these variables primarily contributed to enjoyment, which in turn predicted listening times.

FIGURE 3.

Mediation analysis for Experiment 1. The numbers indicate beta coefficient estimates with 95% confidence intervals.

Continuation and termination reasons

Although this experiment focused on constructs of enjoyment and groove, we expected that diverse factors might influence listening times. The percentages of trials on which each termination and continuation reason was endorsed are summarized in Figure 4.

FIGURE 4.

Termination/continuation reasons for all experiments. Values indicate the percentage of trials on which the reason was endorsed.

The most commonly endorsed reasons for continuing to listen and for stopping had to do with anticipation and the desire for novelty. Boredom was the primary reason for stopping in almost half of the trials; similarly, in ~28% of the trials participants continued to listen in anticipation that a new part would enter. Likely related to boredom, the failure of new parts to appear or change was given as the termination reason in ~12% of the trials. Many participants were motivated to change because of their curiosity about what the upcoming stimuli would be, and these decisions came as soon as participants determined that the initial loop was repeating.

The second most common domain of reasons influencing decisions to continue or stop listening pertained to the instrumentation of music, both at the level of individual instruments that were playing as well as the general sense of how well they fit together. Liking of the rhythms in the music was a common continuation reason, and is generally consistent with the effects of perceived groove on listening times described above.

The option to provide “Other” reasons yielded 444 reasons for terminating listening and 100 for continuing. Despite our objective of producing original loops that were generally free of associations with other familiar music, some responses (“Sounded like Twinkle Twinkle Little Star” or “Reminded me of James Bond”) revealed that we did not succeed fully. Other responses were purely expressive (“I did not like it at all” and “Miserable”), whereas others included identification of stimulus characteristics which reflected continuation reasons (“I really don't like this type of rhythm”), pointed out new sources of reasons for stopping (“I don't like this genre”), and indicated that impressions may often be in flux (“Deciding whether I liked it or not”). Another common type of response was that the music did not match a person's mood.

Interestingly, individuals’ concepts of simplicity and complexity were endorsed as reasons for continuing to listen on less than 10% of the trials, in contrast to the earlier results of Crozier (1974), in which the complexity of monophonic melodies was a determinant of listening time. These differences must be interpreted very cautiously given the stark differences in the stimulus materials used in our respective experiments. The fact that timbral qualities influenced the decision of whether to continue listening to a stimulus was apparent in user-entered responses as well as a post-experiment assessment of rank-ordered stimulus lists. Loops containing instruments with high-pitched, harsh timbres were often at the very bottom of the list, suggesting other attributes of music such as rhythm, melody, harmony, or complexity may become less important when the instrument sound is hard to bear.

Experiment 2

In Experiment 1, participants’ reasons for deciding to continue or stop listening to the music loops indicated clearly that change in the musical stimulus across time is desired. Our next step was to quantify the effects of one particularly salient form of change in the musical scene. One of the simplest ways of introducing change into a multipart repeating musical stimulus is to manipulate the number of instruments that are playing concurrently. When used with looping stimuli, this type of manipulation preserves multiple levels of musical information (i.e., timbral, rhythmic, melodic, and harmonic) within musical streams (parts) while altering the number of streams across which a person may orient his or her attention at any given moment in time. At the beginning of a piece, a common compositional device is to stagger (delay) instrument entrances. The more general manipulation of instrument entrances and exits, as well as increases and decreases in volume, has been described as the “ramp archetype” (Huron, 1992).

Hurley et al. (2014) demonstrated that staggering entrances of instrument parts in looping stimuli, similar to those used in this experiment, increased ratings of enjoyment and also the amount of motoric engagement with the music. Thus, the aim of the second experiment was to explicitly test the hypothesis that staggering instrument entrances across successive iterations of the loops would increase listening times.

We also predicted that the degree to which participants found the musical stimuli interesting, beyond the novelty associated with entering instrument parts, would increase enjoyment and listening times. Thus we used Likert-type scales to better assess interestingness and perceived complexity on a per-stimulus basis. Finally, because some participants expressed wanting to skip a music excerpt due to incongruity of the music with their current mood, we also assessed mood congruity in more detail.

METHOD

Participants

In exchange for partial course credit, 108 participants were recruited from an undergraduate subject pool at the University of California, Davis. All participants provided informed consent in accordance with a protocol approved by the UC Davis Institutional Review Board. None of the participants had participated in the previous experiment.

Twelve participants who failed to evaluate more than 20 stimuli, and an additional 3 participants who encountered trouble with the online experiment and heard at least one stimulus more than once, were excluded from the analysis. Of the final 93 participants, 79.6% were female. The racial distribution of participants was 39.8% Asian, 28.0% Caucasian, 2.1% African American, 8.6% more than one racial category, 10.8% other racial categories, and 9.7% unknown. 19.4% reported their ethnicity as Hispanic. Ages ranged between 18 and 33 years of age (mean ± SD = 20.5 ± 3.1).

Stimuli

A starting set of 30 stimuli was drawn from among the initial 162 stimuli used in Experiment 1. Specifically, 10 were chosen from those with the 20 longest average listening times (mean ± SD = 19.9 ± 3.6 s), 10 from among the middle-ranked 80 stimuli (11.9 ± 0.7 s), and 10 from the 20 bottom-ranked stimuli (8.3 ± 1.2 s). Two variants were created for each of the starting stimuli, resulting in 90 total stimuli. Variants were created by delaying the entrance of each successive instrument by one loop cycle. One variant was labeled most optimal and the other least optimal. “Optimality” reflected an aesthetic judgment by two of the authors involved in modifying the stimuli. In most optimal arrangements, instrument entrances were ordered in a manner that seemed, to the arranger, to be congruent with the genre that the superimposed loops evoked. Least optimal arrangements were intended to serve as a control for any increase in listening time due to staggering alone. Once entered, an instrument part continued playing for the remainder of the 2-minute stimulus.

The staggered-entrance manipulation effectively prolonged the amount of time that a participant could expect to hear novel musical information. Figure 1 illustrates the distributions of times at which the final instrument to enter made its entrance. For the simultaneous entrances, these were necessarily zero. The exact values varied across stimuli because the number of instruments and the loop duration differed from stimulus to stimulus. The most common final entrance time was 24 s, which corresponded to stimuli with four instruments and 8-s loop durations.
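The relationship between instrument count, loop duration, and final entrance time can be written out as a short calculation. The sketch below assumes, consistent with the description of the variants, that the first instrument enters at 0 s and that each successive instrument is delayed by exactly one loop cycle; the function names are hypothetical.

```python
def entrance_times(n_instruments, loop_duration_s):
    """Entrance time (in seconds) of each instrument in a staggered variant.

    Assumes the first instrument enters at 0 s and each successive
    instrument enters one full loop cycle after the previous one.
    """
    return [i * loop_duration_s for i in range(n_instruments)]

def final_entrance_time(n_instruments, loop_duration_s):
    """Time at which the last instrument enters."""
    return (n_instruments - 1) * loop_duration_s

print(entrance_times(4, 8))       # [0, 8, 16, 24]
print(final_entrance_time(4, 8))  # 24
```

With four instruments and 8-s loops the last entrance falls at 24 s, the most common value reported above; a stimulus with more instruments or longer loops yields a correspondingly later final entrance.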

Procedure

The procedure mirrored that of Experiment 1. The stimuli were again presented in a random order. To avoid familiarity effects, each participant heard only one version of each stimulus, drawn from the staggered (most optimal), staggered (least optimal), or simultaneous-entry (original unaltered) stimulus categories, and therefore listened to at most 30 stimuli.
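One way to realize such a design is sketched below. The text does not state how a version was selected for each participant, so the uniform random choice used here is an assumption, as are the function and label names.

```python
import random

# Labels for the three versions of each starting stimulus (names are illustrative).
VERSIONS = ("simultaneous", "staggered_most_optimal", "staggered_least_optimal")

def assign_playlist(n_stimuli=30, rng=None):
    """Pick one version per starting stimulus, then randomize presentation order.

    Assumption: versions are chosen uniformly at random for each participant.
    """
    rng = rng or random.Random()
    playlist = [(stimulus, rng.choice(VERSIONS)) for stimulus in range(1, n_stimuli + 1)]
    rng.shuffle(playlist)
    return playlist

playlist = assign_playlist(rng=random.Random(0))
print(len(playlist))  # 30
print(playlist[:3])
```

Because each starting stimulus appears exactly once in the playlist, no participant can hear two versions of the same loop.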

The first questionnaire following each stimulus comprised four questions that required a response on a 7-point rating scale (1 = not at all; 7 = very much): “How much did you enjoy the musical excerpt?” “How well did the musical excerpt match your current mood?” “To what extent did the musical stimulus sound complex?” and “To what extent did you find the musical excerpt interesting?” The last question was, “Did you move along with the music, e.g., bob your head, tap your toes, tap your hands, sway?” and could only be answered by selecting “Yes” or “No.”

The second post-stimulus questionnaire remained nearly the same as in Experiment 1, with two possible reasons added to the question, “Please indicate the reasons you continued to listen as long as you did (check any that apply).” The choice “The music made me feel good” was added as a general endorsement of visceral positive affect separate from enjoyment. Second, given that our experiment was conducted online, the choice “I would have changed it earlier, but I got distracted” was added to assess a potential source of variance in listening time data due to the distractions of a partially uncontrolled online environment.

DATA ANALYSIS

The same data analysis procedures as in Experiment 1 were followed. Participants indicated that they would have stopped the excerpt earlier had they not been distracted on 11.7% (306/2623) of the trials. These trials were removed from the data before the analyses.

RESULTS AND DISCUSSION

Listeners each rated between 20 and 30 loops (median ± SD = 29 ± 1.8), and between 21 and 39 listeners rated each loop (median ± SD = 30 ± 3). At ~28 s, listening times in Experiment 2 were on average over twice as long as in Experiment 1 (Figure 2C, 2D). The differences in average listening times for stimuli in which all instruments entered simultaneously and those in which entrances were staggered are shown in Figure 5. Differences in the pre-assigned stimulus category accounted for a significant portion of the variance (Table 2, Model 9). Pairwise comparisons among the different entrance types (lsmeans with Tukey correction) revealed that listening times were longer for least optimal staggered stimuli than for simultaneous stimuli (0.161 ± 0.016; mean ± SE), t(2227.84) = 9.941, p < .0001, and longer for most optimal staggered stimuli than for simultaneous stimuli (0.183 ± 0.016; mean ± SE), t(2227.91) = 11.30, p < .0001. Listening times were not different between the two staggered entrance categories (−0.021 ± 0.016; mean ± SE), t(2227.75) = −1.32, p = .38, indicating that the experimenters’ opinion of stimulus optimality was not shared consistently across our sample.

FIGURE 5.

Average listening times in each of the different stimulus categories in Experiments 2 and 3. Lines with asterisks indicate statistically significant differences, * = p < .05; *** = p < .0001.

TABLE 2.

Model Results in Experiment 2

Model #  Model statement                       AIC      Test of            Models compared  Chi-Square  df  Probability
1        listen ~ (1 | subject_id)             1596.2
           + (1 | stimulus_id)
2        listen ~ (1 | stimulus_id)            3109.2   Subject            2, 1             1515            2.2e-16
3        listen ~ (1 | subject_id)             1727.0   Stimulus           3, 1             132.82          2.2e-16
4        listen ~ enjoy + mood + complex       1400.1   All fixed effects  4, 1             204.13          2.2e-16
           + interest
           + (1 | subject_id)
           + (1 | stimulus_id)
5        listen ~ mood + complex + interest    1452.5   Enjoy              5, 4             54.41           1.6e-13
           + (1 | subject_id)
           + (1 | stimulus_id)
6        listen ~ enjoy + complex + interest   1398.3   Mood               6, 4             0.22            0.6376
           + (1 | subject_id)
           + (1 | stimulus_id)
7        listen ~ enjoy + mood + interest      1403.0   Complex            7, 4             4.99            0.0255
           + (1 | subject_id)
           + (1 | stimulus_id)
8        listen ~ enjoy + mood + complex       1402.9   Interest           8, 4             4.83            0.0280
           + (1 | subject_id)
           + (1 | stimulus_id)
9        listen ~ stimulus_category            1583.6   Stimulus Category  9, 3             147.44          2.2e-16
           + (1 | subject_id)
10       listen ~ enjoy + mood + complex       1322.2   Random slopes      10, 4            105.8       14  3.6e-16
           + interest + (enjoy + mood
           + complex + interest | subject_id)
           + (1 | stimulus_id)

Together, the fixed effects for the subjective ratings of enjoyment, mood congruity, interestingness, and complexity explained a significant portion of the variance (Table 2, Model 4). Subsequent tests of the statistical significance of individual variables (Table 2, Models 5–8) indicated that removing enjoyment from the model produced a statistically significant reduction in model fit. The effects of perceived complexity and interest were rather weak when enjoyment was in the model. However, models that regressed listening time on each of the subjective variables individually indicated that every variable was statistically significant in predicting listening times (individual results not shown). We therefore explored the relationships among variables taking a multilevel mediation analysis approach as in Experiment 1.
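The model-comparison entries in Table 2 can be cross-checked with a short calculation. Because AIC = 2k − 2 log L, the likelihood-ratio chi-square for two nested models equals the AIC difference plus twice the difference in the number of parameters. The sketch below applies this to the single-variable removals (Models 5–8 versus Model 4) and recomputes the tail probabilities. It assumes that each removal changes the model by one parameter (df = 1, which the table does not list for these rows) and that the tabled AIC values come from maximum-likelihood fits; the function names are hypothetical.

```python
import math

def lrt_chi2_from_aic(aic_restricted, aic_full, df):
    """Likelihood-ratio chi-square implied by two AIC values.

    AIC = 2k - 2 logLik, so chi2 = (AIC_restricted - AIC_full) + 2 * df,
    where df is the number of parameters dropped from the full model.
    """
    return (aic_restricted - aic_full) + 2 * df

def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

AIC_FULL = 1400.1  # Table 2, Model 4
# variable removed: (AIC of restricted model, reported chi-square, reported p)
REMOVALS = {
    "enjoy": (1452.5, 54.41, 1.6e-13),
    "mood": (1398.3, 0.22, 0.6376),
    "complex": (1403.0, 4.99, 0.0255),
    "interest": (1402.9, 4.83, 0.0280),
}

results = {}
for name, (aic_restricted, chi2_reported, p_reported) in REMOVALS.items():
    chi2_implied = lrt_chi2_from_aic(aic_restricted, AIC_FULL, df=1)  # df = 1 assumed
    p_recomputed = chi2_upper_tail_df1(chi2_reported)
    results[name] = (chi2_implied, p_recomputed)
    print(f"{name:9s} implied chi2 = {chi2_implied:6.2f} (reported {chi2_reported}), "
          f"p = {p_recomputed:.3g} (reported {p_reported})")
```

Under these assumptions the implied chi-square values and recomputed probabilities agree with the tabled ones to within the rounding of the AIC and chi-square entries.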

Mediation analyses

Mood congruency of the excerpts was a significant predictor of listening times, χ²(1) = 97.44, p < .0001, but not when enjoyment was also in the model, χ²(1) = 0.06, p = .81. Mood congruency significantly predicted enjoyment, χ²(1) = 1792.7, p < .0001. The same pattern of results was found for interestingness: a model incorporating how interesting a person found each excerpt to be also significantly predicted listening times, χ²(1) = 137.57, p < .0001, but not when enjoyment was also in the model, χ²(1) = 1.76, p = .18. Interestingness significantly predicted enjoyment, χ²(1) = 2306.6, p < .0001. These sets of results indicate that the effects of mood congruency and perceived interestingness of the stimuli on listening times are fully mediated by enjoyment. The coefficients and associated 95% confidence intervals for the paths between these variables are summarized in Figure 6. Further analyses showed that perceived complexity did not significantly predict enjoyment when both mood congruency and interestingness were in the model, χ²(1) = 0.02, p = .90. Complexity was, however, a significant predictor of interestingness, χ²(1) = 1133.6, p < .0001.

FIGURE 6.

Mediation analysis for Experiment 2. The numbers indicate beta coefficient estimates with 95% confidence intervals.

Continuation and termination reasons

The percentages and rank orderings of endorsed per-trial continuation and termination reasons largely matched those observed in Experiment 1 (Figure 4).

SUMMARY

The results of Experiment 2 affirmed that college-aged listeners desire change as they listen to an unfamiliar piece of music, and that satisfying this desire by progressively introducing new parts into the musical scene increased—as predicted—the amount of time a listener would listen to it.

As in Experiment 1, listening time was best predicted at the trial level by the enjoyment that a participant experienced. Enjoyment fully mediated the effects of how interesting a person found the music to be and how much it matched his or her mood. Mood congruency and interestingness contributed independently to ratings of enjoyment, thus pointing to the importance of both musical and situational factors in deciding how long to listen to a piece of music. Even though our assessment of perceived complexity was more fine-grained than in Experiment 1, the results indicate that complexity is only a distal determinant of listening times when using a library of stimuli that are more complex than monophonic melodies. In other words, two higher-order constructs, interestingness and enjoyment, mediated the effects of perceived complexity.

Experiment 3

An aim of the third experiment was to determine whether systematic manipulations of musical features other than the entry of instrument parts would affect listening times. Because the Apple Loops that we utilized as starting material in the previous experiments are pre-recorded and cannot be manipulated easily at the level of individual events, we composed a set of loops coded as MIDI stimuli that could be manipulated according to various criteria, most notably to reduce variation in tonal and rhythmic dimensions.

METHOD

Participants

In exchange for partial course credit, 390 participants were recruited from an undergraduate subject pool at the University of California, Davis. All participants provided informed consent in accordance with a protocol approved by the UC Davis Institutional Review Board. Data collection occurred in three separate time windows with approximately 135 participants per time window: Spring quarter of 2013 (N = 161), Fall quarter of 2013 (N = 111), and Fall quarter of 2015 (N = 118). None of the participants had participated in either of the previous experiments.

Ninety-five participants were excluded for failing to evaluate more than 20 stimuli or because they endorsed statements on a self-evaluation questionnaire at the end of the experiment that suggested that their data were compromised. The questionnaire included the statements (response options in parentheses), “I muted the audio for one or more stimuli (Yes, No)” and “I performed the experiment to the best of my ability and believe my responses are valid (Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, Strongly Agree).” Participants who endorsed the audio muting question in the affirmative or the performance question with either level of disagreement were excluded from further analysis. An additional 12 participants were excluded due to starting multiple sessions and hearing at least one stimulus more than once. Of the final 283 participants, 72.1% were female. The racial distribution of participants was 46.4% Asian, 25.2% Caucasian, 1.8% African American, 6.7% more than one racial category, 11.7% other racial categories, and 7.4% unknown. 21.3% reported their ethnicity as Hispanic. Ages ranged between 18 and 30 years of age (mean ± SD = 20.5 ± 2.7).

Stimuli

Forty-seven original stimuli were composed using a MIDI keyboard and Logic by C.N. and B.K. (27 and 20 loops, respectively). Most of the loops were composed for four instruments and most had loop durations of 8 s (Figure 1). All loops were constrained to have all of the instrument parts enter within the first iteration. However, the expressive freedom granted to the composers resulted in the utilization of smaller or larger numbers of instruments, and a wide range of tempi and phrase lengths. The latter two factors resulted in a broader distribution of loop durations than in the previous two experiments. Longer phrases were generally due to the introduction of harmonic progressions (i.e., sequences of chord changes). The presence of chord progressions was a significant departure from the musical material composed from Apple Loops. In order to flexibly arrange combinations of Apple Loops without having them clash harmonically (i.e., create pervasive dissonance), there is effectively a requirement that they remain on the same chord, which greatly limits their mobility within tonal space, one of the most important feature spaces in Western tonal music (Collins, Tillmann, Barrett, Delbé, & Janata, 2014; Janata et al., 2002; Krumhansl, 1990).

By introducing harmonic variation to many of the composed loops, it became possible, for any given original parent loop, to create a modified version in which the harmonic changes were eliminated or minimized, resulting in a “reduced harmony” set of loops. Three additional sets of manipulated loops were created for each parent loop. These included a “reduced rhythm” version, in which a majority of events were aligned to an isochronous grid, a “timbral variant” version, in which different instruments were substituted for the original instruments, and a “spatial” version, in which instruments were spatially distributed by panning them by various amounts to either the left or right. We thus arrived at a library of 235 novel stimuli within 47 stimulus families. Each stimulus had a maximal duration of two minutes.

In order to provide continuity of material across experiments, we also included stimuli from Experiment 2: 10 stimuli with simultaneous entrances (five with the longest (best) and five with the shortest (worst) average listening times) and 20 stimuli with staggered entrances (10 with the longest and 10 with the shortest average listening times).

Finally, in order to address a potential concern that the looping stimuli in our library do not adequately reflect properties of commercially available popular music that participants might normally listen to, we included the opening two minutes of a famous (though largely unfamiliar to our sample) piece of music, Chameleon (Hancock, 1973). The first two minutes of this piece consist of looping instrument parts with staggered entrances. Chameleon starts with a repeating bass line, is joined by a drum kit at approximately 12 s, and continues with further instrument entrances spaced approximately 19 s apart. This naturally occurring piece of music is thus uniquely well matched to our stimulus corpus, and we therefore included the first two minutes of Chameleon as a point of reference for our corpus.

Procedure

Participants completed the experiment online at a location of their choosing, as in the previous experiments. The overall procedure was the same as in the previous experiments, with a small number of changes.

Prior to hearing any stimuli, participants completed the Brief Affective Neurosciences Personality Scales (BANPS; Barrett et al., 2013). We also included a number of new post-stimulus questions. The new first page of the questionnaire comprised six questions in total requiring a rating on a 7-point scale (1 = not at all; 7 = very much): “How much did you enjoy the musical excerpt?” “How well did the musical excerpt match your current mood?” “To what extent did the musical stimulus sound complex?” “To what extent did you find the musical excerpt interesting?” “How familiar did the music sound to you (1 = not at all; 7 = very much)?” and “To what extent did the music groove (1 = not at all; 7 = very much)?” The question about familiarity was added because a number of participants had indicated in their free responses to the termination/continuation questions that pieces sounded familiar.

On the second page of post-stimulus questions, one additional possible answer was added to the question “Please indicate the reasons you stopped listening (check all that apply).” The choice “The music didn't match my mood” was added as the complement to the “The music matched my mood” reason from the preceding question relating to continuation of listening.

The stimuli were presented in random order. In order to prevent familiarity effects due to presentation of different versions of the same composed parent stimulus, each participant heard only one stimulus version from any given stimulus family. Each participant could have heard up to 78 different stimuli.
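The stimulus counts reported for this experiment follow from simple bookkeeping, spelled out below. All numbers are taken from the Stimuli section; only the variable names are introduced here for illustration.

```python
N_FAMILIES = 47            # original composed parent loops
VERSIONS_PER_FAMILY = 5    # original, reduced harmony, reduced rhythm,
                           # timbral variant, spatial

composed_library = N_FAMILIES * VERSIONS_PER_FAMILY  # novel composed stimuli

carried_over = 10 + 20     # simultaneous + staggered stimuli from Experiment 2
reference_pieces = 1       # Chameleon

# Each participant hears at most one version per family, plus the
# carried-over stimuli and the reference piece.
max_per_participant = N_FAMILIES + carried_over + reference_pieces

print(composed_library)     # 235
print(max_per_participant)  # 78
```

The one-version-per-family constraint is what reduces the 235 composed stimuli to at most 47 per participant.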

As in the previous experiments, participants were informed that the experiment would last exactly one hour and therefore how long they chose to listen to any given stimulus had no impact on the overall duration of the experiment. Recognizing, however, that some number of participants would likely be uninterested in the experiment, we further bolstered our attempts to identify participants whose data warranted removal. Specifically, at the end of the experiment, participants completed the aforementioned 5-item self-evaluation questionnaire.

DATA ANALYSIS

The same data analysis procedures as in the previous experiments were followed. Participants indicated that they would have stopped the excerpt earlier had they not been distracted on 18.8% (2564/13656) of the trials. These trials were removed from the data before the analyses.
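The reported exclusion percentages, and the number of trials that remained for analysis, can be recomputed from the trial counts given for Experiments 2 and 3. The helper name below is hypothetical; the counts are those stated in the two Data Analysis sections.

```python
def exclusion_summary(n_distracted, n_total):
    """Return (percentage of trials excluded, number of trials retained)."""
    percent_excluded = round(100 * n_distracted / n_total, 1)
    return percent_excluded, n_total - n_distracted

exp2 = exclusion_summary(306, 2623)     # Experiment 2
exp3 = exclusion_summary(2564, 13656)   # Experiment 3

print(exp2)  # (11.7, 2317)
print(exp3)  # (18.8, 11092)
```

The retained-trial counts are derived here rather than reported in the text, and they are of the same order as the degrees of freedom reported for the t-tests in each experiment.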

RESULTS

Listeners each rated between 20 and 70 loops (median ± SD = 42 ± 18), and between 30 and 116 listeners rated each stimulus (median ± SD = 44 ± 17). Average listening times in Experiment 3 were ~22 s, thus falling between those of Experiments 1 and 2 (Figure 2E, 2F). Eighty-four participants heard Chameleon. Figure 5 illustrates the differences in average listening times for the different stimulus categories alongside those from Experiment 2. Differences in the pre-assigned stimulus category accounted for a significant portion of the variance (Table 3, Model 11). Planned contrasts of listening times for the original composed stimuli with the listening times for the modified versions showed that only listening times for reduced-harmony stimuli were significantly shorter (−0.03 ± 0.01; mean ± SE), t(10124.17) = −2.73, p = .05. Of the stimuli from the previous experiment, stimuli from both of the simultaneous entrance categories were associated with shorter listening times than the composed stimuli: worst simultaneous (−0.13 ± 0.02; mean ± SE), t(10124.54) = −6.26, p < .0001; best simultaneous (−0.12 ± 0.02; mean ± SE), t(10125.90) = −6.47, p < .0001. The “best staggered” stimuli carried over from Experiment 2 were listened to longer on average (0.10 ± 0.02; mean ± SE), t(10128.53) = 7.03, p < .0001, whereas there was no difference for either the “worst staggered” stimuli or for Chameleon, which had an average listening time of 17.5 s.

TABLE 3.

Model Results in Experiment 3

Model #  Model statement                       AIC      Test of            Models compared  Chi-Square  df  Probability
1        listen ~ (1 | subject_id)             7304.7
           + (1 | stimulus_id)
2        listen ~ (1 | stimulus_id)            15286.7  Subject            2, 1             7966            2.2e-16
3        listen ~ (1 | subject_id)             7490.5   Stimulus           3, 1             187.77          2.2e-16
4        listen ~ enjoy + mood + complex       6403.8   All fixed effects  4, 1             912.9           2.2e-16
           + interest + familiar + groove
           + (1 | subject_id)
           + (1 | stimulus_id)
5        listen ~ mood + complex + interest    6452.6   Enjoy              5, 4             50.76           1.0e-12
           + familiar + groove
           + (1 | subject_id)
           + (1 | stimulus_id)
6        listen ~ enjoy + complex + interest   6407.2   Mood               6, 4             5.41            0.020
           + familiar + groove
           + (1 | subject_id)
           + (1 | stimulus_id)
7        listen ~ enjoy + mood + interest      6406.5   Complex            7, 4             4.70            0.030
           + familiar + groove
           + (1 | subject_id)
           + (1 | stimulus_id)
8        listen ~ enjoy + mood + complex       6485.3   Interest           8, 4             83.48           2.2e-16
           + familiar + groove
           + (1 | subject_id)
           + (1 | stimulus_id)
9        listen ~ enjoy + mood + complex       6401.8   Familiar           9, 4             0.01            0.907
           + interest + groove
           + (1 | subject_id)
           + (1 | stimulus_id)
10       listen ~ enjoy + mood + complex       6402.3   Groove             10, 4            0.44            0.508
           + interest + familiar
           + (1 | subject_id)
           + (1 | stimulus_id)
11       listen ~ stimulus_category            7308.0   Stimulus Category  11, 3            200.46          2.2e-16
           + (1 | subject_id)
12       listen ~ enjoy + mood + complex       6058.0   Random slopes      12, 4            399.83      27  2.2e-16
           + interest + (enjoy + mood
           + complex + interest | subject_id)
           + (1 | stimulus_id)

The series of mixed effects models (Table 3) showed that together, the fixed effects due to enjoyment, mood congruency, perceived interestingness, complexity, groove, and subjective familiarity explained a significant portion of the variance (Table 3, Model 4). A model incorporating random slopes (that is, differences among individuals in the strengths of relationships between the subjective variables and listening times) showed that the random slopes explained a very large part of the variance beyond that explained by the fixed effects and random intercepts model, but no further attempts were made to explore these individual differences.

Of the subjective variables, enjoyment (Table 3, Model 5) and interestingness (Table 3, Model 8) independently explained large amounts of variance. Mood congruency (Table 3, Model 6) and perceived complexity (Table 3, Model 7) also explained statistically significant amounts of variance in listening times, albeit to a much lesser degree. Perceived groove (Table 3, Model 10) and a sense of familiarity (Table 3, Model 9) did not explain significant amounts of variance when all other variables were present in the model. The large amount of variance that interestingness explained independently of enjoyment contrasted with what we had observed in Experiment 2. In Experiments 1 and 2, enjoyment dominated the other subjective variables in explaining listening time. That interest in the music served to explain additional variance in listening time suggests that stimuli in the present experiment enticed participants to listen longer to the music, even if they were not being motivated purely by enjoyment. A two-sample t-test comparing interestingness ratings across stimuli and participants in Experiments 2 and 3 indicated that mean interestingness was significantly higher in Experiment 3 (3.54 ± 1.66; mean ± SD) than in Experiment 2 (3.20 ± 1.71), t(3367.2) = 8.84, p < .0001.
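The model comparisons in Table 3 can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not the analysis code used for the experiments. It relies on the standard identity AIC = 2k − 2 log L, so that for nested models the likelihood-ratio chi-square equals the AIC difference plus twice the difference in parameter count. The function names are hypothetical, and the assumption that dropping a single fixed effect costs exactly one degree of freedom is ours, because the df column of Table 3 is blank for those rows.

```python
import math


def chi_square_from_aic(aic_reduced: float, aic_full: float, df: int) -> float:
    """Likelihood-ratio chi-square implied by the AICs of two nested models.

    AIC = 2k - 2 log L, hence
    chi2 = 2 (log L_full - log L_reduced) = AIC_reduced - AIC_full + 2 * df,
    where df is the number of parameters removed from the full model.
    """
    return aic_reduced - aic_full + 2 * df


def p_value_one_df(chi_square: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# AIC values copied from Table 3; df = 1 is an assumption (one term removed).
AIC_MODEL_4 = 6403.8  # all fixed effects
AIC_MODEL_5 = 6452.6  # enjoy removed
AIC_MODEL_6 = 6407.2  # mood removed

chi_enjoy = chi_square_from_aic(AIC_MODEL_5, AIC_MODEL_4, df=1)
chi_mood = chi_square_from_aic(AIC_MODEL_6, AIC_MODEL_4, df=1)
p_mood = p_value_one_df(chi_mood)
```

Under these assumptions the implied statistics land close to the tabled values of 50.76 (Enjoy) and 5.41 (Mood); small discrepancies are expected because the tabled AICs are rounded to one decimal place.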

Mediation analyses

Figure 7 shows the results of the multi-level mediation analyses. Because enjoyment and interestingness explained such large amounts of listening time variance, we examined the potential mediating role of each of these variables separately, beginning with enjoyment.

FIGURE 7.

Mediation analysis for Experiment 3. The numbers indicate beta coefficient estimates with 95% confidence intervals.

Following on the results from Experiment 2, we separately estimated the extent to which enjoyment mediated the effects of mood congruency and interestingness on listening times. Mood congruency of the excerpts was a significant predictor of listening times, χ²(1) = 578.2, p < .0001, and to a lesser but still significant degree when enjoyment was also in the model, χ²(1) = 18.31, p < .0001. Together with the fact that mood congruency significantly predicted enjoyment, χ²(1) = 8918.2, p < .0001, the effect of mood congruency on listening times could be considered to be partially mediated by enjoyment.

Perceived interestingness of the excerpts was a significant predictor of listening times, χ²(1) = 811.94, p < .0001, and to a lesser but still significant degree when enjoyment was also in the model, χ²(1) = 103.26, p < .0001. Interestingness also significantly predicted enjoyment, χ²(1) = 10440, p < .0001, thus indicating partial mediation, as in the case of mood congruency.

In this experiment, we also assessed the sense of familiarity with the loops, even though all of the loops were composed by us and therefore unfamiliar to the participants. Perceived familiarity (3.32 ± 1.74; mean ± SD) of the excerpts was a significant predictor of listening times, χ²(1) = 333.3, p < .0001, though much less so when enjoyment was added to the model, χ²(1) = 7.35, p = .007. Familiarity significantly predicted enjoyment, χ²(1) = 4175.2, p < .0001, thus indicating almost full mediation of the effect of familiarity on listening times by enjoyment.

Perceived groove had a significant effect on listening times, χ²(1) = 439.16, p < .0001, and to a lesser, but still significant, degree when enjoyment was in the model, χ²(1) = 15.51, p < .0001. Groove significantly predicted enjoyment, χ²(1) = 5465.2, p < .0001, thus indicating partial mediation of the effect of groove on listening times by enjoyment.

Because the mediating effect of enjoyment on the influence of interestingness on listening times was only partial and because perceived complexity was found in Experiment 2 to influence interestingness, we examined a mediation model in which interestingness was evaluated as a mediator of the effect of complexity on listening times. Perceived complexity had a significant effect on listening times, χ²(1) = 345.7, p < .0001, but not when interestingness was in the model, χ²(1) = 0.35, p = .55. Complexity significantly predicted interestingness, χ²(1) = 6401, p < .0001, thus indicating full mediation of the effect of complexity on listening times by interestingness.
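The logic behind the labels "partial" and "full" mediation in the preceding paragraphs can be summarized as a simple decision rule over three tests: the total effect of the predictor on listening time, the effect of the predictor on the mediator, and the direct effect of the predictor once the mediator is in the model. The sketch below is a simplified illustration of that rule, not the multi-level estimation procedure itself. The function name and the .05 threshold are assumptions, and p values reported as "< .0001" are entered as .0001.

```python
def classify_mediation(p_total: float, p_predictor_to_mediator: float,
                       p_direct: float, alpha: float = 0.05) -> str:
    """Label a mediation pattern from three significance tests.

    p_total: predictor -> outcome, mediator absent from the model
    p_predictor_to_mediator: predictor -> mediator
    p_direct: predictor -> outcome, mediator present in the model
    """
    if p_total >= alpha or p_predictor_to_mediator >= alpha:
        return "none"      # nothing to mediate, or no path through the mediator
    if p_direct < alpha:
        return "partial"   # a direct effect remains alongside the mediated path
    return "full"          # the direct effect vanishes once the mediator is added


# p values taken from the text (".0001" stands in for "p < .0001").
complexity_via_interest = classify_mediation(0.0001, 0.0001, 0.55)
familiarity_via_enjoyment = classify_mediation(0.0001, 0.0001, 0.007)
```

Note that a binary rule of this kind labels familiarity as "partial" even though the drop in its chi-square (from 333.3 to 7.35) is what motivates the description "almost full"; the size of the reduction carries information that a threshold alone does not.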

DISCUSSION

The results of Experiment 3 confirmed that the introduction of new instrument parts extends engagement with a stimulus. Again, enjoyment of a stimulus emerged as the primary determinant of listening times, strongly mediating both how interesting and how congruent with a listener's present mood a looping stimulus was. However, interestingness also emerged as having a significant direct effect on listening times. This result could have been due to the expansion of the stimulus library, with over 40 looping stimuli that were composed for the experiment and that allowed for examining directly the effects of specific manipulations of several musical factors, including the tonal and rhythmic properties of the music. Of these manipulations, only reduction of the harmonic variability (that is, the progression from one chord to another) resulted in shorter listening times compared to the original counterparts. Thus, tonal variation seems important for sustaining interest when listening to unfamiliar musical stimuli. On average, the composed stimuli were rated as being more interesting (mean = 3.56) than were the stimuli composed using Apple Loops (mean = 3.32), t(490.96) = 2.84, p < .005, which might explain the overall longer listening times for the composed stimuli, despite the simultaneous instrument entrances, than for the previously used looping stimuli in which all of the instruments entered at the same time.

Notably, participants did not listen to Chameleon any longer, on average, than they did to the majority of the other stimuli in the experiment. Although one must be cautious in drawing inferences based on the results for only one real-world example, this result suggests to us that participants were as discerning with our stimuli as they would be with natural stimuli that share some of the same structural features, in particular, staggered entrances of looping instrument parts.

Stimulus Characteristics and Listening Times

In each of the experiments above, the random stimulus intercept explained a significant proportion of the variance in listening times, indicating that some stimuli were listened to longer on average than others. Average listening times appeared to be normally distributed across a considerable range of ~20 s (Figure 2B, 2D, 2F). The results of the experimental manipulations in Experiments 2 and 3 of staggering/delaying loop entrances clearly indicated that the entrance of new instrument parts was a strong driver of engagement with the music. The shorter listening times for stimuli in which harmonic variation had been reduced suggested that tonal variation is important for sustaining interest. To quantify the influence of stimulus factors on per-stimulus listening times more fully, we modeled the variability in average listening times across stimuli with several stimulus descriptors. These descriptors included macroscopic factors such as the number of instruments or the time at which the last instrument entered, as well as parameters obtained from calculations performed on the audio signals that were played to the participants.

METHOD

Stimulus parameters were obtained using two MATLAB toolboxes: the Music Information Retrieval Toolbox (MIR Toolbox v1.6.1; Lartillot, Toiviainen, & Eerola, 2008) and the Janata Lab Music Toolbox (JLMT; Collins et al., 2014). Rather than calculating parameters for all available representations and metrics within these toolboxes and testing each for statistical significance, we chose several a priori, as described below, based on the relationship of the features that they purportedly measure to the psychologically relevant predictors of listening time.

The MIR Toolbox was used to obtain parameters related to three timbral qualities of the stimuli: brightness, roughness, and irregularity. These timbral qualities were chosen because listeners frequently endorsed continuation/termination reasons pertaining to the way that specific instruments sounded. Brightness is a measure of the high frequency content present in the stimulus and is a salient cue in timbre perception (Grey, 1977; McAdams, Winsberg, Donnadieu, Desoete, & Krimphoff, 1995). The presence of shrill or tinny instruments would, for example, increase the brightness values for a stimulus. Roughness (calculated using the Sethares method) reflects the interactions of peak frequencies in short-term spectra, and is associated with buzzy or dirty qualities of a sound. Roughness has affective implications for musical sounds that are distinct from other spectral characteristics that influence the perception of consonance and dissonance (Cousineau, McDermott, & Peretz, 2012; McDermott, Schultz, Undurraga, & Godoy, 2016). Finally, irregularity refers to moment-to-moment changes (within 50 ms windows) in the amplitudes of peak frequencies. We included this measure because of its estimation of variability in the stimuli.
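To make the notion of brightness concrete, the following sketch computes the share of spectral magnitude lying above a cutoff frequency for a short synthetic signal. It is a from-scratch illustration in Python rather than a call into the MIR Toolbox. The 1500 Hz cutoff, the use of magnitudes rather than squared magnitudes, and all function names are assumptions made for the example.

```python
import cmath
import math


def magnitude_spectrum(signal, sample_rate):
    """Naive DFT magnitudes for the non-negative frequency bins of a short signal."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def brightness(signal, sample_rate, cutoff_hz=1500.0):
    """Fraction of total spectral magnitude at or above cutoff_hz (assumed cutoff)."""
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    total = sum(mags)
    if total == 0.0:
        return 0.0
    high = sum(m for f, m in zip(freqs, mags) if f >= cutoff_hz)
    return high / total


# Synthetic 20 ms frame: a strong 500 Hz partial plus a weaker 3000 Hz partial.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 500 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 3000 * i / sample_rate)
          for i in range(160)]
bright_value = brightness(signal, sample_rate)
```

In this toy case one third of the spectral magnitude sits above the cutoff; adding a shrill or tinny instrument would shift more magnitude into the upper bins and raise the value.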

The novelty metric was chosen because its name harks back to the evidence that novelty and variability appeared to be strong determinants of listening times. In this case, the novelty measure in the MIR Toolbox was calculated on the similarity matrix of Mel-frequency cepstral coefficient (MFCC) representations. MFCCs are commonly used in the music information retrieval (MIR) community because of their compact representation of periodicities in log-spaced frequency spectra of the audio waveforms in brief time windows (e.g., 50 ms). The novelty measure represents the variation along the diagonal of the similarity matrix calculated from the sequence of MFCCs. As stated in the MIRToolbox Users Guide 1.3, “Convolution along the main diagonal of the similarity matrix using a Gaussian checkerboard kernel yields a novelty curve that indicates the temporal locations of significant textural changes.”
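The checkerboard-kernel idea quoted above can be sketched in a few lines. The example below is a generic illustration of the technique and should not be read as the MIR Toolbox implementation: short feature vectors stand in for MFCC frames, cosine similarity is an assumed similarity measure, and the kernel size, Gaussian width, and zero-padding at the edges are arbitrary choices.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def checkerboard_kernel(half_width, sigma):
    """Gaussian-tapered checkerboard: +1 in the past/past and future/future
    quadrants, -1 in the two cross quadrants."""
    offsets = [i + 0.5 for i in range(-half_width, half_width)]
    return [[(1.0 if (r > 0) == (c > 0) else -1.0)
             * math.exp(-(r * r + c * c) / (2.0 * sigma * sigma))
             for c in offsets] for r in offsets]


def novelty_curve(frames, half_width=4, sigma=2.0):
    """Slide the kernel along the main diagonal of the self-similarity matrix."""
    n = len(frames)
    sim = [[cosine_similarity(a, b) for b in frames] for a in frames]
    kernel = checkerboard_kernel(half_width, sigma)
    curve = []
    for t in range(n):
        total = 0.0
        for i, r in enumerate(range(t - half_width, t + half_width)):
            for j, c in enumerate(range(t - half_width, t + half_width)):
                if 0 <= r < n and 0 <= c < n:  # zero-padding outside the matrix
                    total += kernel[i][j] * sim[r][c]
        curve.append(total)
    return curve


# Ten frames of one texture followed by ten frames of another.
frames = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10
curve = novelty_curve(frames)
peak_frame = curve.index(max(curve))
```

The curve peaks at the frame where the texture changes and is essentially zero inside a homogeneous stretch, which is the behavior that makes the mean novelty value a plausible index of textural variability in a stimulus.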

Because all of the MIR Toolbox functions returned time series for the different metrics, and because we were not collecting any time-series response data on a per-stimulus basis, the mean of each time-series metric was calculated so that the central tendency of each measure was used per stimulus in the analyses of listening time.

The JLMT was used to characterize the metric/rhythmic and tonal properties of the stimuli. The JLMT extends the IPEM Toolbox (Leman, Lesaffre, & Tanghe, 2001), which implements a model of the manner in which sound (audio) is converted to firing patterns of auditory nerve neurons (Van Immerseel & Martens, 1992), whereupon other functions implement analyses that focus on either temporal (metric and rhythmic) or tonal properties of the audio input. A measure of metric/rhythmic complexity, referred to here as logRatioComplexity, was obtained using the “rhythm profiler” functions of the JLMT (Tomic & Janata, 2008). Specifically, for each stimulus, we first identified the set of dominant temporal periodicities in the stimulus (i.e., the peaks in the mean periodicity profile; see Tomic & Janata, 2008, Figure 4). We then calculated the logarithm of the ratio of the energy contained in periodicities related to the periodicity with the most energy by simple integer ratios (1/8, 1/4, 1/3, 1/2, 1, 2, 3, 4, 8, 16) to the energy contained at other periodicities, following the approach for estimating the meter of musical excerpts described in Tomic & Janata (2008).
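A minimal sketch of the log-ratio calculation described above follows. It assumes that the periodicity peaks have already been extracted as (period, energy) pairs, that ratios are formed on periods relative to the peak with the most energy, and that a 5% relative tolerance decides whether a peak matches one of the listed ratios. None of these details, nor the function name, comes from the JLMT itself.

```python
import math

# Ratios listed in the text, relative to the periodicity with the most energy.
SIMPLE_RATIOS = (1 / 8, 1 / 4, 1 / 3, 1 / 2, 1, 2, 3, 4, 8, 16)


def log_ratio_complexity(peaks, tolerance=0.05):
    """Log of (energy at simply related periodicities / energy elsewhere).

    peaks: list of (period_in_seconds, energy) tuples (assumed input format).
    tolerance: relative tolerance for matching a ratio (assumed value).
    """
    dominant_period, _ = max(peaks, key=lambda peak: peak[1])
    related = 0.0
    other = 0.0
    for period, energy in peaks:
        ratio = period / dominant_period
        if any(abs(ratio - r) <= tolerance * r for r in SIMPLE_RATIOS):
            related += energy
        else:
            other += energy
    if other == 0.0:
        return math.inf  # all energy sits at simply related periodicities
    return math.log(related / other)


# Dominant peak at 0.5 s; 0.25 s and 1.0 s are related by 1/2 and 2; 0.35 s is not.
example_peaks = [(0.5, 10.0), (0.25, 4.0), (1.0, 6.0), (0.35, 5.0)]
example_value = log_ratio_complexity(example_peaks)
```

Here 20 units of energy fall at simply related periodicities and 5 units elsewhere, so the value is log(4). The direction of the ratio follows the sentence as written.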

The tonality measures reflected the amount of movement in tonal space, as represented on the surface of a torus (Collins et al., 2014; Janata, 2007; Janata et al., 2002; Krumhansl & Kessler, 1982; Krumhansl & Toiviainen, 2000). Specifically, for each stimulus, we calculated the correlation matrix (self-similarity matrix) for sequences of toroidal activations obtained using 0.1-s and 2-s time constants (Collins et al., 2014; Janata, 2007). Activations of tonal variation at these two time scales reflect moment-to-moment chord changes and key estimates based on harmonic changes in short phrases, respectively. We then calculated the mean of the off-diagonal elements of this correlation matrix to obtain tonal self-similarity measures. Lower values are indicative of greater amounts of harmonic change.
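The self-similarity summary can be illustrated with a toy example. In the sketch below each time step is represented by a short activation vector (in practice a flattened toroidal activation map would be much larger), the correlation between every pair of time steps is computed, and the off-diagonal values are averaged. The function names and the toy vectors are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def tonal_self_similarity(activations):
    """Mean of the off-diagonal entries of the time-by-time correlation matrix."""
    n = len(activations)
    values = [pearson(activations[i], activations[j])
              for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# A harmonically static excerpt versus one that alternates between two regions.
static_excerpt = [[1.0, 2.0, 3.0, 4.0]] * 4
moving_excerpt = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]] * 2
static_value = tonal_self_similarity(static_excerpt)
moving_value = tonal_self_similarity(moving_excerpt)
```

The static excerpt yields a value of 1, whereas the alternating excerpt yields a lower value, in keeping with the statement that lower values indicate more harmonic change.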

In total, nine variables served as the pool of independent variables for selection in stepwise regression models: 1) the number of instruments in the loop, 2) the total loop time, that is, the amount of time during which novel musical material was played, 3) mean brightness, 4) mean roughness, 5) mean irregularity, 6) mean novelty, 7) logRatioComplexity, 8) mean tonal self-similarity using a 100 ms time constant, and 9) mean tonal self-similarity using a 2-s time constant. Listening Time, Enjoyment, Interestingness, and Complexity each served as dependent variables in separate models. Stimuli from each experiment were modeled separately.

RESULTS AND DISCUSSION

Table 4 shows the results of the stepwise regression analyses. For each of the dependent variables of interest (listening time, enjoyment, interestingness, and complexity) and for each of the three experiments, the rows are ordered by the model step in which a stimulus parameter entered into the model. With the exception of listening times in Experiment 3, the total loop time was present in all of the models. As the duration of novel musical information increased, so did listening times. The ratios of listening times to total loop duration are plotted in Figure 8. Total loop duration is the total amount of time for which novel musical information is being presented. This would be 8 s for a version of a piece in which four instruments, each with its own 8-s loop, entered simultaneously, and 32 s for another version of the piece in which the instrument entrances were staggered. The distributions in Figure 8 show that the majority of stimuli were listened to for a minimum of one full iteration of the novel material. For example, in Experiment 1 the peak of the distribution falls between 1 and 2 full iterations.
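The 8-s and 32-s figures in the example above, and the ratio plotted in Figure 8, follow from simple bookkeeping, sketched below. The helper names are hypothetical. The staggered entry times of 0, 8, 16, and 24 s are an assumption chosen to reproduce the 32-s total, and the 12-s listening time is an invented value used only to show the ratio.

```python
def total_loop_time(entry_times, loop_durations):
    """Time at which the last instrument finishes presenting novel material."""
    return max(start + duration
               for start, duration in zip(entry_times, loop_durations))


def iterations_heard(listening_time, total_time):
    """Listening time expressed in full iterations of the novel material."""
    return listening_time / total_time


loops = [8, 8, 8, 8]  # four instruments, each with an 8-s loop

simultaneous = total_loop_time([0, 0, 0, 0], loops)  # all enter together
staggered = total_loop_time([0, 8, 16, 24], loops)   # assumed entry times

# Hypothetical trial: 12 s of listening to the simultaneous version.
ratio = iterations_heard(12, simultaneous)
```

A ratio between 1 and 2, as in this hypothetical trial, corresponds to the region where the Experiment 1 distribution peaks.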

TABLE 4.

Results of Stepwise Regression Models of Average Listening Times and Psychological Variables Using Acoustic and Musical Parameters

| Dependent variable | Experiment | Model step | Parameter | Estimate (b) | Standard error | Probability (p) |
| --- | --- | --- | --- | --- | --- | --- |
| Listening Time | 1 | 1 | Total loop time | 0.31 | 0.14 | .02 |
| Listening Time | 2 | 1 | Total loop time | 0.43 | 0.06 | < .0001 |
| Listening Time | 2 | 2 | Roughness | 0.01 | 0.002 | .01 |
| Listening Time | 3 | 1 | Brightness | −11.36 | 2.50 | < .0001 |
| Listening Time | 3 | 2 | Tonal self-similarity (tau = 2 s) | −10.83 | 3.68 | .004 |
| Enjoyment | 1 | 1 | Tonal self-similarity (tau = 2 s) | −1.01 | 0.45 | .03 |
| Enjoyment | 1 | 2 | Brightness | −0.68 | 0.27 | .01 |
| Enjoyment | 1 | 3 | Total loop time | 0.04 | 0.02 | .03 |
| Enjoyment | 2 | 1 | Total loop time | 0.02 | 0.01 | .0001 |
| Enjoyment | 2 | 2 | Novelty | −0.24 | 0.08 | .005 |
| Enjoyment | 3 | 1 | Brightness | −0.76 | 0.22 | .001 |
| Enjoyment | 3 | 2 | Total loop time | 0.01 | 0.005 | .003 |
| Enjoyment | 3 | 3 | #instruments | 0.13 | 0.05 | .005 |
| Interestingness | | 1 | Total loop time | 0.01 | 0.004 | .002 |
| Interestingness | 3 | 1 | Brightness | −0.58 | 0.19 | .003 |
| Interestingness | 3 | 2 | Total loop time | 0.01 | 0.004 | .002 |
| Interestingness | 3 | 3 | #instruments | 0.11 | 0.04 | .01 |
| Complexity | | 1 | Total loop time | 0.01 | 0.003 | .03 |
| Complexity | 3 | 1 | #instruments | 0.15 | 0.04 | < .00001 |
| Complexity | 3 | 2 | logRatioComplexity | 0.15 | 0.05 | .01 |
| Complexity | 3 | 3 | Total loop time | 0.01 | 0.003 | .03 |

FIGURE 8.

Distributions of ratios of listening time to duration of musical novelty.

In Experiment 3, as the total number of instruments in the loops increased, so did the perceived complexity, the degree of interest, and the amount of enjoyment (Table 4). The same effect was not observed in Experiments 1 or 2, likely because of insufficient variation in the number of instruments per loop in these experiments (Figure 1). Of the acoustic variables, brightness entered into the largest number of models, explaining variability in enjoyment in Experiments 1 and 3, and the degree of interestingness and listening times in Experiment 3. In all of these cases, as brightness (average frequency in the spectrum) increased, interestingness, enjoyment, and listening times decreased. This result points to a possible acoustic correlate of the third most common reason participants gave for stopping any given loop, which was, “The sound of a particular instrument bothered me.” The effects of other acoustic measures were less common, though increases in MFCC variability, as captured in the novelty variable, resulted in decreased enjoyment in Experiment 2. Nevertheless, the desire for tonal variability was evident in average enjoyment ratings in Experiment 1 and listening times in Experiment 3. As the amount of tonal self-similarity increased, enjoyment and listening times decreased.

Of interest was that the measure of rhythmic/metric complexity (logRatioComplexity) was, along with the number of instruments playing, a significant predictor of perceived complexity in Experiment 3. That a variable that represents higher order structural complexity in music only entered into a model of perceived complexity helps to establish this computed metric as a variable of interest in future experiments of subjective complexity and related dimensions. Moreover, that this variable emerged as a significant stimulus-level predictor for the Experiment 3 materials, which were considerably more diverse musically than in Experiments 1 and 2, helps to explain why interestingness also had a direct effect on listening times (instead of being mediated completely by enjoyment) and why it served as a mediator of perceived complexity on listening times. Overall, the stimulus-level analyses echoed the subjective variable and termination/continuation reason results across the three experiments, thus serving to further validate the stimulus metrics.

Personality

Aside from the state- or context-level variables described in the preceding sections (that is, those variables that are not purely a function of a stimulus, but rather depend in some manner on a participant's emotional state or listening history), there exist also trait-level participant characteristics that may influence listening times. We sought to understand whether dimensions of the BANPS would explain variability in the average listening times across participants. Specifically, we anticipated that higher scores on the Seek dimension would correspond to listening decisions driven by desire for novelty: shorter average listening times and greater proportions of trials in which participants reported that their reason for terminating the current musical excerpt was their curiosity about the upcoming stimulus (Figure 4).

To test these hypotheses, we calculated, for each person, the average log-transformed listening time and the proportion of trials on which they endorsed the “curious what the next musical stimulus would be” termination reason. These served as dependent variables in a linear multiple regression analysis with the set of seven BANPS dimensions (Play, Seek, Care, Fear, Anger, Sadness, and Spirituality) as predictors. Higher scores on Fear were associated with longer average listening times, t(272) = 3.36, p = .0009. This result was unexpected and we have no explanation for it. Higher scores on Seek were also associated with longer listening times, t(272) = 2.18, p = .03. This result ran counter to our prediction. Consequently, we postulated that longer listening times among high Seek individuals might arise from continuing to listen in anticipation of novel musical elements, as assessed by the “waiting for new parts to enter” continuation reason, or obtaining fulfillment as assessed by the “kept listening because I heard new things” reason. However, variability in the proportion of trials on which these reasons were endorsed was not explained by Seek or any of the other BANPS dimensions. As anticipated, higher Seek scores predicted a higher proportion of trials in which curiosity about the next musical stimulus was endorsed as the termination reason, t(272) = 2.77, p = .006. The Play dimension also explained variation in curiosity, t(272) = 2.00, p = .05, as did Spirituality, t(272) = 2.07, p = .04.

Our results indicate that decisions about how long to consume an unfamiliar piece of music are also driven by personality traits, although we hasten to note that our experimental context provided an abundance of musical material that could be encountered with minimal effort by the participants. In other words, there were no costs associated with having to search for new sources of music that might provide a more satisfying listening experience.

General Discussion

We sought to characterize important psychological and musical determinants underlying decisions about the consumption of realistic yet unfamiliar music. The musical properties that we manipulated, and the psychological constructs that we assessed, spanned a range of related concepts to which we now turn our attention. The concepts have been studied extensively within neuroscience, psychology, and music alike, but have not, to our knowledge, been integrated within a circumscribed series of experiments.

REPETITION, HABITUATION, AND LISTENING TIME

The unifying feature of our large stimulus library was the principle of repeating phrasal structures consisting of multiple instrument streams. Participants were asked across trials to make decisions about whether to continue listening to the currently repeating stimulus or instead answer a series of questions about their experience of the preceding stimulus prior to listening to another novel repeating stimulus.

We found that the introduction of novelty through delayed entrances of instrument parts increased listening times more than other factors did, thus extending the findings that staggered instrument entrances increased enjoyment and motoric engagement (Hurley et al., 2014). Although listeners tended to keep listening as they waited for new parts to enter (Figure 4), they chose to stop listening fairly quickly once it became apparent that a new part was unlikely to enter. Figure 8 shows the distributions of ratios obtained by dividing listening times by the maximal amount of novel information in a stimulus. In the first experiment, which used only synchronous onset stimuli, the average listening time fell between 1 and 2 iterations, suggesting that participants chose to stop listening to the current stimulus when 2 to 4 seconds of additional listening indicated with high likelihood that the same loop was repeating. Thus, habituation leading to a decision to stop listening was quite rapid.

Rapid attenuation of responses to repeating stimuli is also a common finding in neuroscience; reduction in response amplitude is often used as a basis for inferences that a brain region represents those properties of a stimulus that are being adapted. Many examples from the human neuroimaging literature show rapid adaptation (between the first and second stimulus in a sequence) of auditory cortical responses to repeating auditory and musical stimuli, whether repeating simple clicks or tones (Näätänen & Picton, 1987) or notes in a chord sequence or melody (Janata, 1995; Navarro Cebrian & Janata, 2010), both with and without lyrics (Sammler et al., 2010).

Of particular relevance is a neuroimaging study in which participants were confronted with a consumption decision about novel pieces of popular music (Salimpoor et al., 2013). Instead of listening time as a measure, the study used the amount of money that a participant bid on each musical item (30 s previews of commercially available popular music) after appraising it as a proxy for the degree to which a person desired to engage with each musical stimulus in the future. Activity within the right nucleus accumbens, a crucial node in the brain's reward circuitry, differentiated between subsequently purchased (at the highest price) or unpurchased music at approximately 18 s from the onset of the excerpt. A more dorsal region of the striatum, the caudate nucleus, showed differential activity within 10–14 s. This result suggests that the decision-driving appraisal process may have been completed within 15 seconds, a span of time that is within the range of listening times that we observed. The dorsal caudate is believed to play an important role in anticipation and expectation in music (Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011; Seger et al., 2013). Activity in the ventral striatum appears to be modulated directly by processing of a stimulus in the auditory cortex as evidenced by changes in functional connectivity during the appraisal task (Salimpoor et al., 2013). There is reduced connectivity between these regions in musical anhedonics—people who do not derive pleasure from listening to music (Martínez-Molina, Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-Pallarés, 2016). A prediction to be tested is that habituation of activity in the auditory cortex to a looping stimulus would reduce the driving of activity in the ventral striatum and result in reduced experiencing of reward with each further repetition.

ENJOYMENT, MOOD CONGRUENCY, AND FAMILIARITY

The construct of enjoyment emerged as the primary predictor of listening times. It strongly mediated the effects of an affective construct of mood congruency, a cognitive construct of interestingness, and even the more participatory motoric constructs of urge to move and groove. This result differs from an earlier finding that listening time to monophonic tone sequences was better predicted by interestingness than enjoyment (Crozier, 1974). The difference likely lies in the nature of the material that listeners were appraising: pared down melodies versus more richly instrumented musical phrases that are more representative of music that an average listener might actually choose to consume. Although interestingness fed the sense of enjoyment in our experiment, our results thus speak to the importance of looking beyond stimulus-level factors when trying to understand the degree to which a person is likely to engage with a novel aesthetic object—in this case music.

We found that listening times were also influenced by mood congruency primarily via the enjoyment of any given stimulus. Mood congruency represents a contextual factor that is expected to vary with any given person's circumstances in the moment. Although we did not appraise each person's mood at the start of each session, or how it may have fluctuated during a session, our results corroborate those of a previous study examining the effects of mood manipulations on liking (akin to enjoyment) of mood-congruent, mood-incongruent, and emotionally ambiguous pieces of music. Using pictures for mood induction, Hunter, Schellenberg, and Griffith (2011) found that congruency between the induced emotions and the emotions associated with 30 s music excerpts influenced how much a person liked any given music excerpt, such that sad pieces were liked more following a sad mood induction, and emotionally ambiguous pieces were liked more following induction of a happy mood and liked less following induction of a sad mood. State-level affect (mood) thus emerges as an important variable to be accounted for in studies and models of consumption decisions pertaining to aesthetic objects.

We also found that the degree to which each piece sounded familiar influenced listening times via enjoyment. Familiarity has long been recognized as an important contributor to the shaping of liking and preferences for novel items (Bornstein, 1989; Madison & Schiölde, 2017; Szpunar et al., 2004). Familiarity is operationalized in experiments as the number of exposures to an item in question. Because all of our musical material was novel, any sense of familiarity would have been driven by genre-typical elements or other associations a participant may have made. For example, some of the composed loops were evocative of film scores. Our results indicate that a general sense of familiarity, rather than stimulus-specific familiarity, is sufficient to influence consumption behavior of a musical stimulus. Indeed, the sense of familiarity operating in this manner may be what encourages us to listen to an unfamiliar piece of music that nonetheless has typical characteristics of a genre with which we are familiar.

MUSICAL FACTORS AND COMPLEXITY

Our stimulus-level analyses clearly showed that stimulus characteristics at multiple musical levels shape listening times. The major determinant was at the level of orchestration, with the entrance of novel instruments capable of sustaining interest in the stimulus. However, we found that even timbral characteristics influenced listening times and consumption decisions, both in terms of termination reasons endorsed by participants as well as in terms of acoustic analyses of the stimuli themselves. These results are unsurprising in light of the fact that predominantly timbral qualities of a musical stimulus underlie extremely rapid (< 1 s) recognition of musical excerpts (Krumhansl, 2010; Schellenberg, Iverson, & McKinnon, 1999) and very rapid (1–4 s) emotional responses to music excerpts (Bigand, Vieillard, Madurell, Marozeau, & Dacquet, 2005).

At a higher structural level, tonal (melodic and harmonic) variation also seems to be an important prerequisite for sustained listening. In Experiment 3, pieces of music that tended to remain in the same region of tonal space were not enjoyed as much or listened to as long on average. Our calculated metric for rhythmic complexity correlated positively with perceived complexity ratings. Together, these results accord well with a recent study that used naturalistic musical stimuli, in which overall complexity ratings were driven by ratings of melodic complexity, followed by harmonic and rhythmic complexity (Madison & Schiölde, 2017). In that study, the number of repetitions (across weeks) positively influenced liking of the stimuli, but complexity did not. Complexity did, however, correlate with endorsements of how “dull” or “odd” the music sounded, terms that are congruous with the construct of “interestingness” that we asked our participants to endorse. Our findings, in which complexity predicted interestingness, which in turn predicted enjoyment (Experiments 2 and 3) and to some extent listening time (Experiment 3), thus add to the evidence (Madison & Schiölde, 2017; Martindale & Moore, 1989) that stimulus complexity is not a primary driver of liking, enjoyment, or consumption decisions unless differences in complexity are stark (Szpunar et al., 2004) or the library of exemplars is highly restricted and not very naturalistic (Crozier, 1974).

LISTENING TIME AND AESTHETICS

We believe that earlier attempts to construe aesthetic experience in terms of attention, arousal, and novelty, by Berlyne and other influential voices in the psychology of aesthetics, warrant further examination using direct measures of consumption behavior. Although aesthetic engagement with music has been examined extensively by studying the relationship of repeated exposures to liking, in large part with the aim of distinguishing arousal theories from perceptual fluency theories (e.g., Hunter & Schellenberg, 2011; Madison & Schiölde, 2017; Szpunar et al., 2004), we believe that an approach that focuses on consumption decisions in the moment can help to understand the dynamic interactions of an individual's aesthetic and affective motivations with the aesthetic object with which she or he is faced.

Because repetition is an integral part of music and musical experience at many levels (Margulis, 2014), and because it has strong psychological and neural habituating consequences that drive a person to disengage from a stimulus, the combination of manipulating repetition in a stimulus with listening time as an objective measure of engagement/utility presents an experimental paradigm that seems well-suited to studying such dynamic interactions and the specific factors that underlie sustained engagement. Though only a first step in this direction, the series of experiments and analyses presented here demonstrates that examining more closely the timing of decisions to continue engaging with a stimulus or to switch to another is likely to yield insights into the interplay of individual-difference, contextual, and musical factors that drive aesthetic, corporeal, and visceral engagement with any given piece of music.

Notes

1. This survey instrument is still under development and is not yet a suitable psychometric tool. The responses were therefore not analyzed in any of the experiments reported in this paper.
2. As in Experiments 1 and 2, we utilized a multilevel mediation model estimation framework in which a single independent variable, single mediator variable, and single outcome variable are considered. Models incorporating additional variables and their possible interrelationships become increasingly challenging to specify and estimate. We therefore adopted an approach of estimating a series of simpler models, recognizing that the path coefficient estimates we obtained could be different if the full covariance structures were taken into account.
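
To make the single-predictor, single-mediator, single-outcome structure concrete, the following is a minimal single-level sketch in Python. It is not the multilevel estimation used in the analyses: it fits the two regressions by ordinary least squares and computes the indirect effect as the product of the two paths, and it deliberately omits random effects for participants and stimuli. The function name `simple_mediation`, the variable names, and the ratings are hypothetical and invented for illustration.

```python
from statistics import fmean


def _cross(u, v):
    """Sum of cross-products of two equally long sequences about their means."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_mediation(x, m, y):
    """Single-level X -> M -> Y mediation via ordinary least squares.

    Returns path a (X -> M), path b (M -> Y given X), the direct effect
    c_prime (X -> Y given M), the total effect c, and the indirect effect a*b.
    """
    sxx, smm = _cross(x, x), _cross(m, m)
    sxm, sxy, smy = _cross(x, m), _cross(x, y), _cross(m, y)

    a = sxm / sxx                            # slope of M regressed on X
    c = sxy / sxx                            # slope of Y regressed on X alone
    det = sxx * smm - sxm ** 2               # determinant of the 2x2 normal equations
    b = (sxx * smy - sxm * sxy) / det        # slope for M in Y ~ X + M
    c_prime = (smm * sxy - sxm * smy) / det  # slope for X in Y ~ X + M
    return {"a": a, "b": b, "c_prime": c_prime, "c": c, "indirect": a * b}


# Hypothetical ratings, invented for illustration only (not data from the study).
groove = [1, 2, 3, 4, 5, 6, 7, 8]                  # independent variable
enjoyment = [2, 2, 4, 3, 6, 5, 7, 8]               # mediator
listening_time = [10, 14, 20, 17, 31, 26, 35, 42]  # outcome, in seconds

paths = simple_mediation(groove, enjoyment, listening_time)
```

In this single-level case the total effect decomposes exactly as c = c' + ab, which gives a quick sanity check on the arithmetic. That identity need not hold exactly once random effects are introduced, which is one reason the multilevel path estimates could differ from a sketch of this kind.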

References

Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.
Barrett, F. S., Grimm, K. J., Robins, R. W., Wildschut, T., Sedikides, C., & Janata, P. (2010). Music-evoked nostalgia: Affect, memory, and personality. Emotion, 10(3), 390–403.
Barrett, F. S., & Janata, P. (2016). Neural responses to nostalgia-evoking music modeled by elements of dynamic musical structure and individual differences in affective traits. Neuropsychologia, 91, 234–246.
Barrett, F. S., Robins, R. W., & Janata, P. (2013). A brief form of the Affective Neuroscience Personality Scale. Psychological Assessment. Advance online publication.
Bates, D. M., Maechler, M., & Bolker, B. (2013). lme4: Linear mixed-effects models using S4 classes (Version 0.999999-2). Retrieved from http://CRAN.R-project.org/package=lme4
Bauer, D. J., Preacher, K. J., & Gil, K. M. (2006). Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: New procedures and recommendations. Psychological Methods, 11(2), 142–163.
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century-Crofts.
Berlyne, D. E. (Ed.). (1974). Studies in the new experimental aesthetics: Steps toward an objective psychology of aesthetic appreciation. New York: Wiley.
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition and Emotion, 19(8), 1113–1139.
Bornstein, R. F. (1989). Exposure and affect: Overview and meta-analysis of research, 1968–1987. Psychological Bulletin, 106(2), 265–289.
Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., & Janata, P. (2014). A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior. Psychological Review, 121(1), 33–65.
Colombo, J., & Mitchell, D. W. (2009). Infant visual habituation. Neurobiology of Learning and Memory, 92(2), 225–234.
Cousineau, M., McDermott, J. H., & Peretz, I. (2012). The basis of musical consonance as revealed by congenital amusia. Proceedings of the National Academy of Sciences, 109(48), 19858–19863.
Crozier, J. B. (1974). Verbal and exploratory responses to sound sequences varying in uncertainty level. In D. E. Berlyne (Ed.), Studies in the new experimental aesthetics: Steps toward an objective psychology of aesthetic appreciation (pp. 27–90). New York: Wiley.
Davis, K. L., & Panksepp, J. (2011). The brain's emotional foundations of human personality and the Affective Neuroscience Personality Scales. Neuroscience and Biobehavioral Reviews, 35(9), 1946–1958.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893–906.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61, 1270–1277.
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex, 9(7), 697–704.
Hancock, H. (1973). Chameleon. On Headhunters. New York: Columbia Records.
Hargreaves, D. J. (1987). Verbal and behavioral responses to familiar and unfamiliar music. Current Psychology, 6(4), 323–330.
Hargreaves, D. J., Messerschmidt, E., & Rubert, C. (1980). Musical preference and evaluation. Psychology of Music, 8(1), 13–18.
Herholz, S. C., Halpern, A. R., & Zatorre, R. J. (2012). Neuronal correlates of perception, imagery, and memory for familiar tunes. Journal of Cognitive Neuroscience, 24(6), 1382–1397.
Holbrook, M. B., & Gardner, M. P. (1993). An approach to investigating the emotional determinants of consumption durations: Why do people consume what they consume for as long as they consume it? Journal of Consumer Psychology, 2(2), 123–142. DOI: https://doi.org/10.1016/S1057-7408(08)80021-6
Holbrook, M. B., & Gardner, M. P. (1998). How motivation moderates the effects of emotions on the duration of consumption. Journal of Business Research, 42(3), 241–252. DOI: https://doi.org/10.1016/S0148-2963(97)00121-5
Hunter, P. G., & Schellenberg, E. G. (2011). Interactive effects of personality and frequency of exposure on liking for music. Personality and Individual Differences, 50(2), 175–179.
Hunter, P. G., Schellenberg, E. G., & Griffith, A. T. (2011). Misery loves company: Mood-congruent emotional responding to music. Emotion, 11(5), 1068–1072.
Hurley, B. K., Martens, P. A., & Janata, P. (2014). Spontaneous sensorimotor coupling with multipart music. Journal of Experimental Psychology: Human Perception and Performance, 40(4), 1679–1696.
Huron, D. (1992). The ramp archetype and the maintenance of passive auditory attention. Music Perception, 10, 83–92.
Janata, P. (1995). ERP measures assay the degree of expectancy violation of harmonic contexts in music. Journal of Cognitive Neuroscience, 7, 153–164.
Janata, P. (2007). Navigating tonal space. In W. B. Hewlett, E. Selfridge-Field, & E. Correia (Eds.), Tonal theory for the digital age (Vol. 15, pp. 39–50). Stanford, CA: Center for Computer Assisted Research in the Humanities.
Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex, 19, 2579–2594.
Janata, P., Birk, J. L., Van Horn, J. D., Leman, M., Tillmann, B., & Bharucha, J. J. (2002). The cortical topography of tonal structures underlying Western music. Science, 298(5601), 2167–2170.
Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for behaviors related to sequencing and music. Nature Neuroscience, 6(7), 682–687.
Janata, P., Tomic, S. T., & Haberman, J. M. (2012). Sensorimotor coupling in music and the psychology of the groove. Journal of Experimental Psychology: General, 141(1), 54–75.
Kornysheva, K., Von Cramon, D. Y., Jacobsen, T., & Schubotz, R. I. (2010). Tuning-in to the beat: Aesthetic appreciation of musical rhythms correlates with a premotor activity boost. Human Brain Mapping, 31(1), 48–64.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.
Krumhansl, C. L. (2010). Plink: “Thin slices” of music. Music Perception, 27, 337–354.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89(4), 334–368.
Krumhansl, C. L., & Toiviainen, P. (2000). Dynamics of tonality induction: A new method and a new model. Paper presented at the 6th International Conference on Music Perception and Cognition, Keele, United Kingdom.
Lartillot, O., Toiviainen, P., & Eerola, T. (2008). A Matlab toolbox for music information retrieval. In C. Preisach, H. Burkhardt, L. Schmidt-Thieme, & R. Decker (Eds.), Data analysis, machine learning, and applications (pp. 261–268). Berlin: Springer.
Leman, M., Lesaffre, M., & Tanghe, K. (2001). Introduction to the IPEM toolbox for perception-based music analysis. Paper presented at the Proceedings of the FWO Research Society on Foundations of Music, Ghent, Belgium.
Madison, G., & Schiölde, G. (2017). Repeated listening increases the liking for music regardless of its complexity: Implications for the appreciation and aesthetics of music. Frontiers in Neuroscience, 11, 147.
Margulis, E. H. (2013). Aesthetic responses to repetition in unfamiliar music. Empirical Studies of the Arts, 31(1), 45–57.
Margulis, E. H. (2014). On repeat: How music plays the mind. New York: Oxford University Press.
Martindale, C., & Moore, K. (1989). Relationship of musical preference to collative, ecological, and psychophysical variables. Music Perception, 6, 431–446.
Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences, 113(46), E7337–E7345.
McAdams, S., Winsberg, S., Donnadieu, S., Desoete, G., & Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research-Psychologische Forschung, 58(3), 177–192.
McDermott, J. H., Schultz, A. F., Undurraga, E. A., & Godoy, R. A. (2016). Indifference to dissonance in native Amazonians reveals cultural variation in music perception. Nature, 535(7613), 547–550.
McSweeney, F. K. (2004). Dynamic changes in reinforcer effectiveness: Satiation and habituation have different implications for theory and practice. The Behavior Analyst, 27(2), 171–188.
Näätänen, R., & Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24(4), 375–425.
Navarro Cebrian, A., & Janata, P. (2010). Electrophysiological correlates of accurate mental image formation in auditory perception and imagery tasks. Brain Research, 1342, 39–54.
Oakes, L. M. (2010). Using habituation of looking time to assess mental processes in infancy. Journal of Cognition and Development, 11(3), 255–268.
Pereira, C. S., Teixeira, J., Figueiredo, P., Xavier, J., Castro, S. L., & Brattico, E. (2011). Music and emotions in the brain: Familiarity matters. PLoS ONE, 6(11), e27241.
R FAQ (2014). How can I perform mediation with multilevel data? (Method 2). Los Angeles, CA: UCLA Statistical Consulting Group. Retrieved from http://stats.idre.ucla.edu/r/faq/how-can-i-perform-mediation-with-multilevel-data-method-2/
Russell, P. A. (1982). Relationships between judgements of the complexity, pleasingness and interestingness of music. Current Psychological Research, 2(1), 195–201.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience, 14(2), 257–U355.
Salimpoor, V. N., Van Den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science, 340(6129), 216–219.
Sammler, D., Baird, A., Valabregue, R., Clement, S., Dupont, S., Belin, P., & Samson, S. (2010). The relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic resonance adaptation study. Journal of Neuroscience, 30(10), 3572–3578.
Schellenberg, E. G., Iverson, P., & McKinnon, M. C. (1999). Name that tune: Identifying popular recordings from brief excerpts. Psychonomic Bulletin and Review, 6(4), 641–646.
Seger, C. A., Spiering, B. J., Sares, A. G., Quraini, S. I., Alpeter, C., David, J., & Thaut, M. H. (2013). Corticostriatal contributions to musical expectancy perception. Journal of Cognitive Neuroscience, 25(7), 1062–1077.
Stupacher, J., Hove, M. J., Novembre, G., Schuetz-Bosbach, S., & Keller, P. E. (2013). Musical groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition, 82(2), 127–136.
Szpunar, K. K., Schellenberg, E. G., & Pliner, P. (2004). Liking and memory for musical stimuli as a function of exposure. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(2), 370–381.
Tomic, S. T., & Janata, P. (2007). Ensemble: A web-based system for psychology survey and experiment management. Behavior Research Methods, 39(3), 635–650.
Tomic, S. T., & Janata, P. (2008). Beyond the beat: Modeling metric structure in music and performance. Journal of the Acoustical Society of America, 124(6), 4024–4041.
Van Immerseel, L. M., & Martens, J. P. (1992). Pitch and voiced/unvoiced determination with an auditory model. Journal of the Acoustical Society of America, 91(6), 3511–3526.
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(2, Pt. 2), 1–27.