Sonata and rondo movements are often defined in terms of large-scale form, yet in the classical era, rondos were also identified according to their lively, cheerful character. We hypothesized that sonatas and rondos could be categorized based on stylistic features, and that rondos would involve more acoustic cues for happiness (e.g., higher average pitch height and higher average attack rate). In a corpus analysis, we examined paired movement openings from 180 instrumental works, composed between 1770 and 1799. Rondos had significantly higher pitch height and attack rate, as predicted, and there were also significant differences related to dynamics, meter, and cadences. We then conducted an experiment involving participants with at least 5 years of formal music training or less than 6 months of formal music training. Participants listened to 120 15-second audio clips, taken from the beginnings of movements in our corpus. After a training phase, they attempted to categorize the excerpts (2AFC task). D-prime scores were significantly higher than chance levels for both groups, and in post-experiment questionnaires, participants without music training reported that rondos sounded happier than sonatas. Overall, these results suggest that classical formal types have distinct stylistic and affective conventions.

Musical form is often understood as a large-scale pattern of repetition (Caplin, 1998, p. 9; Huron, 2013). Rondo forms, for example, are defined by a recurring main section (the refrain or rondo theme) that alternates with contrasting episodes. These forms are commonly represented by patterns such as ABACA, ABACADA, and ABACABA, where each letter corresponds to a discrete section. This formal strategy presumably has psychological effects: contrasting episodes reduce boredom or inattention, functioning as dishabituation stimuli; meanwhile, the increasingly familiar refrain enhances processing fluency and, consequently, aesthetic pleasure (Huron, 2013; Margulis, 2014; see also, Reber, Schwarz, & Winkielman, 2004). But in this account, processing fluency would relate to a specific piece, not a genre. In other words, the refrain would engender veridical expectations, based on specific musical passages, not schematic expectations, involving associative networks derived from multiple pieces (see Bharucha, 1994; Huron, 2006, pp. 224–225). According to David Huron (2006, p. 208), “The differences between sonata-allegro form and rondo form are not likely to lead to different schemas…The two forms might be considered different from a formal theoretical perspective, but they almost certainly do not evoke different listening schemas such as occurs for major and minor, or for reggae and baroque.” He contrasts these classical forms with the fugue, whose monophonic beginning immediately signals its overall type. If rondo form is aurally identifiable only through the refrain’s return—an event that first occurs toward the middle of a piece—then it follows that the opening of a rondo will not reveal its large-scale form for listeners.

Nonetheless, the differences between rondo and sonata movements (as actual pieces of music) arguably go beyond the differences between rondo and sonata forms (as relatively abstract music-theoretical models). Many theories of classical form consider how large-scale form relates to local features. For example, William Caplin’s form-functional approach assumes that “formal units of a work play specific roles in articulating its overall structure” (1998, p. 3). As such, its investigation of large-scale form describes stylistic conventions that involve melody, harmony, rhythm, and texture. And James Hepokoski and Warren Darcy consider the late eighteenth-century sonata in terms of genre (Hepokoski & Darcy, 2006, p. 604). (Note that they use the term “sonata” to refer to individual sonata-form movements, not only multi-movement works.) Hepokoski and Darcy aim “to sketch the outlines of a complex set of common options or generic defaults” (p. 9). To that end, they theorize five general sonata “types” and consider how individual compositions confirm—or expressively depart from—historically situated norms. For such theorists, form and content are not independent but interrelated.

We suggest, then, that classical rondos and sonatas can be distinguished through style, as well as large-scale form. Though various musical elements might index this distinction, the present study focused on movement openings. We employed two complementary methods: first, a corpus analysis compared the opening measures of sonata and rondo movements, considering diverse features as potential stylistic discriminators; second, in an experiment, participants listened to the openings of sonata or rondo movements and attempted to categorize them. We thus investigated musical characteristics of sonata and rondo movements, as well as listeners’ evaluations of them.

Our specific hypotheses were informed by classical-era commentary on the rondo. For present purposes, historical evidence reveals two important points: first, late eighteenth-century musicians understood sonatas and rondos as flexible genres, which were not solely defined by large-scale form; second, they consistently associated rondos with a cheerful mood, and these affective associations can guide hypotheses about stylistic differences between sonata and rondo movements.

The rondo was particularly fashionable in the 1770s and 1780s, appearing in symphonies and concertos, chamber and keyboard pieces (Cole, 1969a, 2001; see Figure S1 on this project’s OSF page for the distribution of instrumental rondos by year). Composers capitalized on the rondo’s popularity: C. P. E. Bach admitted that rondos helped to sell his publications (Cole, 1964, p. 31), and Mozart, in a bid to win over Viennese audiences, wrote a new finale for his Piano Concerto No. 5 in D major, K. 175—the Rondo for Piano and Orchestra in D major, K. 382 (Cole, 2001, §3). Meanwhile, many critics complained about the trend. For Johann Nikolaus Forkel (J. S. Bach’s first biographer), most rondos had “little true inner value” (1778, quoted in Cole, 1964, p. 25), and Johann Friedrich Reichardt wrote that “one hears hundreds of mediocre rondeaus and thousands of poor rondeaus” (1782, quoted in Cole, 1964, p. 27). These negative opinions ultimately confirm the popularity of this “lighter genre” (Cramer, 1783, quoted in Cole, 1964, p. 30).

Despite their complaints, Forkel, Reichardt, and other critics attempted to theorize the rondo. Most noted its characteristic formal strategy—often referring to the poetic rondeau, which ends with its opening line—and many noted common tonal goals. However, they also linked the rondo with a particular character or mood. According to Reichardt (1782, quoted in Cole, 1964, p. 26), a rondo should create “a certain agreeable feeling.” Forkel held up the rondo theme from C. P. E. Bach’s Sonata in G major, Wq. 90/2, as an exemplar: “Extremely pleasant, cheerful, clear, and comprehensible without being barren, it is heard with new satisfaction at each repetition” (1778, quoted in Cole, 1964, p. 25). Later, in the nineteenth century, Beethoven’s student Carl Czerny explained that a rondo should be “gay, lively and brilliant” (1848, p. 81). Going further, Czerny explicitly distinguished the opening of a rondo from that of a first movement:

The commencement of the first movement of a Sonata may be either energetic, or melodious; excited, or soft and tranquil. The same may be said of the Rondo or Finale; but there must be a palpable difference between the two, in regard to the description of the leading idea: for rarely would a suitable commencement for a first movement, serve also for the theme of a Finale [emphasis added]. It is not easy to render this difference intelligible by words. (Czerny, 1848, p. 67)

To illustrate, Czerny juxtaposed the initial measures of first movements and finales from keyboard sonatas by Mozart, Clementi, Haydn, Beethoven, and others. “Every one,” he asserted, “will perceive the great difference between the beginnings of these first movements and of the Finales” (Czerny, 1848, p. 69). Historical critics, then, associated the rondo not only with a formal strategy but also with particular musical and affective content.

Rondos might even be identified on the basis of style or character, in the absence of the typical refrain-and-episodes structure. Mozart’s K. 382 Rondo, mentioned above, is not in rondo form per se; it is a set of variations. His Rondo in D, K. 485, and the Rondo from Eine Kleine Nachtmusik, K. 525, are commonly understood as instances of sonata form (Cole, 1970; Galand, 1995; Grave, 2010; Hepokoski & Darcy, 2006, p. 399). And the so-called “Rondo alla turca” from his Piano Sonata in A major, K. 331, has a distinctive formal pattern in which the opening material returns only once (ABCDECABC). These pieces, along with other “mislabeled” rondos, suggest that the large-scale formal pattern might be a typical, but not essential, feature of the category rondo. As often happens, the textbook definition would impose crisp boundaries, turning a Type 1 “natural” category into a Type 2 “artificial” one (Zbikowski, 2002, p. 42). Sonata form was also codified later in the nineteenth century, through the efforts of theorists like Adolf Bernhard Marx, who understood rondo forms as less evolved precursors of sonata form (Burnham, 1989, pp. 256–259). Late eighteenth-century concepts of rondo and sonata were richer and messier, involving an interplay of form, style, and genre.

If the rondo and the sonata are genres as much as forms, then it might be possible to distinguish these types based on stylistic features. Genre classification is an established topic in the field of music information retrieval (MIR; for a review, see Sturm, 2012). Generally, this research involves training computational models on some musical corpus, which might involve sound files (Tzanetakis & Cook, 2002), symbolic MIDI representations (McKay & Fujinaga, 2004), or even a mix of audio, textual, and visual sources, including album reviews and cover art (Oramas, Barbieri, Nieto, & Serra, 2018). The computational models are then given a categorization task, making judgments that are comparable to those of human listeners (Lippens, Martens, & De Mulder, 2004). This work can help reveal associated musical features that characterize particular musical genres.

Psychological studies, further, explore genre classification as a cognitive process. They show reasonable levels of intersubjective agreement about genre, though individuals’ responses always reflect their musical background and preferences (Gjerdingen & Perrott, 2008; Mace, Wagoner, Teachout, & Hodges, 2012). For example, accuracy is higher for favored genres. Genre recognition, moreover, occurs almost instantaneously: participants can successfully apply genre labels to excerpts as short as 125–250 ms. As such, Gjerdingen and Perrott (2008, p. 100) suggest that “the rapid recognition of musical genre occurs concomitantly with the decoding of component features. In a manner reminiscent of Gestalt effects, it would appear that listeners can achieve a global categorization of genre at least as fast as they can categorize any component feature.” This process of musical categorization seems to be fast and holistic.

With extremely brief excerpts, listeners presumably rely on timbre, because melodic, harmonic, and rhythmic information (and, of course, formal cues) are effectively unavailable. Timbre readily differentiates genres such as classical, hip hop, and rock. But sonatas and rondos use roughly the same instrumental and tonal resources and, indeed, were created by the same musicians. It seems unlikely that particular timbres or chords would separate them. What other parameters might be relevant, then?

Here the rondo’s cheerful mood suggests some stylistic differences. Music is typically perceived to be happier when it is higher, faster, and louder. These associations emerge in experiments where listeners rate the affect of musical stimuli, according to various scales (Dalla Bella, Peretz, Rousseau, & Gosselin, 2001; Eerola, Friberg, & Bresin, 2013; Friedman, Neill, Seror, & Kleinsmith, 2018; Gagnon & Peretz, 2003; Hevner, 1937), and also in experiments where participants manipulate musical parameters to express given emotions (Bresin & Friberg, 2011; Quinto & Thompson, 2012). Corpus studies have observed the same relationships, working on the assumption that major and minor modes can represent happy and sad affect respectively (Huron, 2008; Poon & Schutz, 2015; Post & Huron, 2009; Turner & Huron, 2008). That is, pieces in major keys tend to be higher, faster, and louder (see also, Broze & Huron, 2013). Moreover, these pitch, rhythmic, and dynamic attributes correlate with the acoustic properties of happy speech (Juslin & Laukka, 2003; Schutz, 2017). This literature approaches affect in various ways, via the categorical opposition of happy and sad, or a multidimensional model of affect that distinguishes between valence (positive/negative or pleasant/unpleasant) and arousal (Russell, 1980). Note that happiness and sadness are opposed in both dimensions (Russell, 1980, p. 1175): happiness involves higher arousal, which is implicated in the relative intensity of happy music. For present purposes, though, research on acoustic cues for emotion can help us to translate historical evidence into empirically testable hypotheses. To test these hypotheses, we designed a corpus study that asked whether a movement’s type (sonata or rondo) could be predicted on the basis of stylistic features from its beginning.

Corpus Study

Method

Sample

We compiled a corpus of 180 multi-movement works, composed between 1770 and 1799. Each piece includes both a sonata-form movement and a rondo. The corpus includes keyboard, chamber, and symphonic works by Haydn, Mozart, three of J. S. Bach’s sons, the young Beethoven, and their contemporaries. More than half of the rondos (55%) were labeled as rondos in eighteenth-century sources (with the Italian term, “rondo,” or the French equivalents, “rondeau” or “rondeaux”). The remaining rondos in the corpus were identified by a historian of the genre, Malcolm Cole (1964). This approach includes supposedly “mislabeled” rondos, which are not, according to modern music theory, in rondo form. It excludes standalone pieces such as the rondos from C. P. E. Bach’s “Kenner und Liebhaber” collections or Beethoven’s Rondo in G, Op. 129 (nicknamed “Rage Over a Lost Penny”). This strategy ensures that there are an equal number of sonata and rondo movements in the corpus, paired by composer, instrumentation, date, and tonic.

Procedure

For each movement in the corpus, we collected data for several categorical and numerical variables (see Table 1). We consulted published scores, using historical or critical editions wherever possible. Generally, we hypothesized that sonata and rondo movements can be distinguished on the basis of stylistic features that appear in their opening measures. More specifically, we predicted that, in relevant musical parameters, such differences will relate to acoustic cues for emotion.

Table 1.

Variables for Corpus Study


Categorical data were derived from each movement’s beginning, starting with its mode (major or minor). We predicted that, compared with sonata-form movements, rondos would more often be in the major mode. We accounted for meter via two variables—beat division (simple or compound) and beat count (single, duple, triple, or quadruple)—and noted whether the movement started with a pickup. We hypothesized that compound meters would be more common in rondos and that rondos would be more likely to begin with a pickup. As McKee (2004, p. 1) explains, pickups often relate to mood in late eighteenth-century music: “Downbeat melodies are often heard as strong and assertive; anacrustic melodies are typically softer in character and evoke a sense of lightness.”

Though texture can be quantified in terms of synchrony and pitch comodulation (De Souza, 2019; Duane, 2013; Huron, 1989), we preferred a categorical approach, coding opening textures as monophonic (with a single melodic line, played solo or in unison), homophonic (with multiple synchronized parts), or polyphonic (with multiple independent parts, including contrapuntal and melody-and-accompaniment textures). Our intuition was that polyphonic (particularly melody-and-accompaniment) textures would be more common in rondos.

We also recorded initial dynamic markings, whether original or editorial. For the statistical modeling described below, we treated Dynamic as an ordinal variable, coding pianissimo and piano as “soft,” mezzo forte, mezzo piano, and no dynamic marking as “mezzo,” and forte and fortissimo as “loud.” Given previous research on emotional cues in speech and music, we hypothesized that rondos would have a louder initial dynamic.

Numerical variables related to pitch and timing were calculated for the equivalent of the movement’s first four measures. For movements that began with a pickup, the duration of the pickup was subtracted from the end of the fourth measure. Four-measure subphrases are normative in the classical style (Caplin, 1998), and this relatively brief unit corresponds to a first impression of the piece.

We predicted that rondos would have higher average pitch height than sonata movements. To determine weighted average pitch height, we adapted a method from Poon and Schutz (2015). Whereas an earlier approach to average pitch height ignored duration (Huron, 2008), Poon and Schutz’s average is weighted so that longer notes have greater influence. Pitches are first converted into numbers, which correspond to their positions on a standard 88-key keyboard (A0 = 1, C4 = 40). Then, pitch numbers are multiplied by durations (with the value for a quarter note set at 1), and the sum of these weighted values is divided by the sum of all duration values. That is,

 
\[
\bar{x}^{*} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} \tag{1}
\]

where $\bar{x}^{*}$ is the weighted average pitch height, $x_i$ is a note’s pitch, $w_i$ is that note’s weight (i.e., duration), and $N$ is the total number of notes in the sample. Following Poon and Schutz (2015), double-stemmed notes received two separate entries. However, pitches on staves with instrumental doubling (e.g., violins, 2 oboes) were entered only once. In orchestral contexts, combined parts for “Violoncello e Basso” were counted as a single pitch (though the contrabasses sound an octave lower than the cellos). Finally, we did not include melodic ornaments (e.g., trills, mordents) in our calculations of weighted average pitch height.
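The weighted average can be illustrated with a minimal Python sketch. The function name and the three-note excerpt are hypothetical, chosen only for illustration; pitch numbers follow the keyboard numbering given above (C4 = 40), and durations are in quarter notes.

```python
def weighted_average_pitch(notes):
    """Duration-weighted mean pitch.

    notes: list of (pitch_number, duration) pairs, with pitch numbers on the
    88-key scale (A0 = 1, C4 = 40) and durations in quarter notes.
    """
    total_weight = sum(duration for _, duration in notes)
    if total_weight <= 0:
        raise ValueError("total duration must be positive")
    return sum(pitch * duration for pitch, duration in notes) / total_weight


# Hypothetical excerpt: C4 half note, E4 quarter note, G4 quarter note.
excerpt = [(40, 2.0), (44, 1.0), (47, 1.0)]
mean_pitch = weighted_average_pitch(excerpt)
print(mean_pitch)  # 42.75
```

Because the half note counts twice as much as each quarter note, the result sits closer to C4 than an unweighted mean of the three pitches would.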

Weighted average pitch height indicates a registral “center of gravity” for a group of notes—but does not show how spread out they are in pitch space. For example, imagine a two-part invention, and a similar piece where the top voice is transposed up an octave and the bottom voice, down an octave: both versions could have the same weighted average pitch height, though the latter would have greater pitch variance. We prefer pitch variance over a simpler metric for pitch range or ambitus (i.e., the distance between an excerpt’s highest and lowest pitches), which could be disproportionately affected by one extremely high or low note. Instead, our measure of variance considers all pitches in a passage and is again weighted by duration. Thus,

 
\[
s_w^{2} = \frac{\sum_{i=1}^{N} w_i \left(x_i - \bar{x}^{*}\right)^{2}}{\left(\sum_{i=1}^{N} w_i\right) - 1} \tag{2}
\]

where $s_w^{2}$ is the weighted pitch variance, $\bar{x}^{*}$ is the weighted average pitch height, $x_i$ is a note’s pitch, $w_i$ is that note’s weight (i.e., duration), and $N$ is the total number of notes in the sample. Note that this estimate is made unbiased by applying Bessel’s correction, as indicated by subtracting 1 from the sum of the unnormalized weights. We hypothesized that pitch variance would tend to be larger in sonata-form movements, which might more commonly feature dramatic changes of register. According to this hypothesis, then, rondos would have less registral variability.
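As a rough illustration of the weighted variance, the following Python sketch revisits the two-part example mentioned earlier: two voices with the same registral center but different spread. The function name and the pitch values are hypothetical.

```python
def weighted_pitch_variance(notes):
    """Duration-weighted pitch variance with Bessel's correction.

    notes: list of (pitch_number, duration) pairs. The denominator is the
    sum of the durations minus 1, so that sum must exceed 1.
    """
    total_weight = sum(duration for _, duration in notes)
    if total_weight <= 1:
        raise ValueError("sum of durations must exceed 1")
    mean = sum(pitch * duration for pitch, duration in notes) / total_weight
    squared_deviations = sum(
        duration * (pitch - mean) ** 2 for pitch, duration in notes
    )
    return squared_deviations / (total_weight - 1)


# Hypothetical two-voice sonority: C4 and E4, one quarter note each.
close_voicing = [(40, 1.0), (44, 1.0)]
# Same voices with the top moved up an octave and the bottom down an octave.
spread_voicing = [(28, 1.0), (56, 1.0)]

print(weighted_pitch_variance(close_voicing))   # 8.0
print(weighted_pitch_variance(spread_voicing))  # 392.0
```

Both voicings have a weighted average pitch height of 42, yet the variance differs by a factor of 49, which is the distinction the measure is meant to capture.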

We predicted that rondos would have higher average attack rate, which we again measured following Poon and Schutz (2015). Attack rate goes beyond tempo to account for the music’s event density. (After all, it is possible to have fast-tempo music with relatively few attacks, or slow-tempo music with many attacks.) First, we counted the number of distinct attacks in each excerpt and divided this count by the overall number of beats, giving a number that indicates average attacks per beat. Two or more simultaneous attacks count as a single event here. In simple meters (e.g., 2/4, 4/4, 2/2), we treated the quarter note as the beat for purposes of calculation; in compound meters (e.g., 6/8, 12/8), we used the dotted quarter note. Average attacks per beat is multiplied by the number of beats per second—that is, beats per minute (BPM) divided by 60—to produce average attack rate (in units of attacks per second):

 
\[
\text{average attack rate} = \left(\frac{\text{number of attacks}}{\text{number of beats}}\right) \times \left(\frac{\text{BPM}}{60}\right) \tag{3}
\]

Of course, eighteenth-century scores do not provide metronome markings. In their corpus analysis of J. S. Bach’s Well-Tempered Clavier, Poon and Schutz (2015) dealt with this by using seven sets of editorial tempo indications. This is not an option for the present study, since our larger corpus includes compositions that have not been republished in new editions. Instead, we derived BPM values from musicological research on Mozart’s tempo indications by Jean-Pierre Marty (1988). As Marty emphasizes, traditional terms for tempo (e.g., andante or allegro) cannot be directly converted to BPM numbers; instead, in eighteenth-century music, there is an interaction between tempo categories and meter. Our BPM values were drawn from his final chart of recommended metronome markings, primarily from the composite tempo category (Marty, 1988, pp. 208–210). These are approximations at best, and a range of tempos would be suitable in performance. This is not a major problem, however, since we are looking not for a single correct tempo but for a consistent way to operationalize tempo that facilitates the comparison of paired movements.
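The attack-rate calculation can be sketched in Python as follows. The function names, the onset list, and the tempo of 120 BPM are hypothetical and serve only to illustrate the arithmetic; they are not values from the corpus or from Marty’s chart.

```python
def count_distinct_attacks(onsets_in_beats):
    """Count attack points, treating simultaneous onsets as one event."""
    return len(set(onsets_in_beats))


def average_attack_rate(n_attacks, n_beats, bpm):
    """Attacks per second: (attacks per beat) * (beats per second)."""
    if n_beats <= 0 or bpm <= 0:
        raise ValueError("n_beats and bpm must be positive")
    return (n_attacks / n_beats) * (bpm / 60.0)


# Hypothetical two-beat fragment: melody and bass attack together on
# beats 0 and 1, and the melody alone attacks on each half beat.
onsets = [0, 0, 0.5, 1, 1, 1.5]
n_attacks = count_distinct_attacks(onsets)
rate = average_attack_rate(n_attacks, n_beats=2, bpm=120)
print(n_attacks, rate)  # 4 4.0
```

The same attacks per beat at 60 BPM would give a rate of 2.0 attacks per second, which shows why the measure depends on the tempo estimate as well as on event density.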

Finally, to evaluate the prevalence of harmonic closure, we went beyond the first four measures. We recorded the measure number for the first perfect authentic cadence (PAC) in each movement. Half cadences and imperfect authentic cadences were excluded here, though PACs in secondary keys were accepted. In ambiguous cases, precedence was given to the earliest possible PAC. A time to PAC value was determined by multiplying this measure number by the number of beats per measure, then multiplying by the number of seconds per beat (60/BPM). This value represents the arrival time in seconds for the measure that includes the movement’s first PAC, if the piece were performed at Marty’s recommended speed. As with average attack rate, this is simply an approximation that facilitates comparison. We predicted that PACs would generally arrive earlier in rondos. This is motivated by claims of Formenlehre theorists such as Caplin (1998, p. 231), who notes that “a rondo theme always closes with a perfect authentic cadence, never a half cadence.”
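The time-to-PAC calculation described above amounts to a single product, sketched here in Python. The function name and the example values (measure 8, tempos of 120 and 80 BPM) are hypothetical.

```python
def time_to_pac(pac_measure, beats_per_measure, bpm):
    """Approximate arrival time in seconds for the first PAC.

    Follows the description in the text: measure number of the first PAC,
    times beats per measure, times seconds per beat (60 / BPM).
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return pac_measure * beats_per_measure * (60.0 / bpm)


# Hypothetical movement in 2/4 (two quarter-note beats per measure),
# first PAC in measure 8, quarter note = 120.
print(time_to_pac(8, 2, 120))  # 8.0

# Hypothetical movement in 6/8 (two dotted-quarter beats per measure),
# first PAC in measure 8, dotted quarter = 80.
print(time_to_pac(8, 2, 80))  # 12.0
```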

Data Analysis

Our statistical analysis aimed to determine whether rondos could be discriminated from sonatas on the basis of the musical features we measured. We constructed a multiple logistic regression model in MATLAB with the dependent variable Movement Type (sonata or rondo); the categorical predictors Beat Division (simple or compound), Mode (major or minor), Pickup (noPickup or yesPickup), and Texture (homophonic, monophonic, or polyphonic); the numerical predictors Weighted Average Pitch Height, Weighted Pitch Variance, Time to PAC, Attack Rate, and BPM; and the ordinal predictor Dynamic (soft, mezzo, or loud).

We can use our logistic regression model to estimate the effect of each predictor variable on the log-odds of a piece being a rondo, to predict the probability that a movement is a rondo given its values for the independent variables, to order pieces by their probability of being a rondo, and to predict whether movements not included in our dataset are rondos. We can also use it as a binary classifier by setting a probability threshold. We have provided MATLAB code on this project’s OSF page should the reader wish to use our model to calculate the probability that a movement not included in the corpus is a sonata or rondo, given its values for the predictor variables (see Appendix). With this code, it is also possible to try other combinations of predictors or add new ones that we did not measure.

Model specification

Constructing our logistic regression required various decisions about which predictor variables to include. One difficult decision was whether to include Beat Count as a predictor in our model. We identified two reasons to exclude it. First, there were only 10 pieces with a “single” beat count (i.e., those pieces that had a 3/8 time signature, felt “in 1”). Having so few observations for this level of the Beat Count factor could make its coefficient estimate inaccurate, and single beat count pieces in our sample were always in compound meter, meaning that Beat Count is not independent of Beat Division. Second, certain distinctions here are not readily perceptible for listeners: the differences between 3/8 and 6/8 (single and duple) or 2/4 and 4/4 (duple and quadruple) are largely notational, a problem reflected in the theoretical distinction between “notated” and “real” (i.e., perceived) meter (Caplin, 1998). Nonetheless, excluding Beat Count has its disadvantages: triple meter is readily distinguished from duple or quadruple meter. Further, any systematic relationship between Beat Count and other predictors that are included in the model could lead to misattributing predictive power to those predictors instead of Beat Count; in other words, we cannot control for the independent effect of Beat Count. On balance, we excluded Beat Count from the model.

A related issue concerns Attack Rate and BPM, both of which are included in the model. Because Attack Rate is the product of Attacks per Beat and beats per second, it is proportional to BPM for any given number of attacks per beat. Despite this relationship, these variables are not completely collinear because Attack Rate contains variability that does not stem from BPM, due to Attacks per Beat (the other multiplicand in the equation for Attack Rate). Attack Rate is a meaningful construct in itself, so we have not included Attacks per Beat in the model. This means that the effect of Attack Rate may be partially due to variability in Attacks per Beat, and partially due to variability in the interaction between BPM and Attacks per Beat (i.e., Attack Rate itself).

Finally, note that we have not included interaction terms in our model. A potentially interesting question is whether the features that predict a movement being a rondo differ across instrumentation, composer, or year. However, including such factors would severely constrain the model by introducing many more coefficients, and would reduce our statistical power. There is also not enough variability within each categorical interaction term to estimate its coefficient (e.g., we do not have a full set of comparisons of the other variables within pieces by Pleyel because Pleyel contributes very few movements to the corpus). This means that for the purposes of this modeling, we are assuming a similar set of characteristics across these features.

In an initial analysis, we constructed a mixed-effects logistic regression treating the same predictors listed here as fixed effects and including Piece as a random effect (since each piece contributed two movements to the overall corpus). Though this model accounts for the paired nature of the stimuli, it did not perform better than the fixed effects–only model in terms of classification accuracy (as described below), and in the interest of parsimony, we report here the model with fixed effects only. For reference, we have included the results of the mixed model in the supplementary materials on this project’s OSF page (see Appendix).

Classification and validation

Logistic regression models can readily be turned into classification models by choosing a probability threshold. If the model outputs a probability greater than the threshold, we say that it predicts that a piece is a rondo (since rondos were arbitrarily coded as 1 and sonatas as 0), and if below the threshold, a sonata. For present purposes, we set this threshold at .50. We can then assess the performance of the model based on the proportion of correct classifications it makes (i.e., the model accuracy), by comparing these predictions against the true movement identities. Accuracy is not the only way to assess a classifier; another commonly used metric is the area under the receiver operating characteristic curve (AUC). Our discussion here relies on accuracy because we are as concerned with false positives as with false negatives. However, in the OSF supplementary materials, we provide code to calculate the AUC metric and plots of the associated receiver operating characteristic curves for reference (see Appendix and Figure S2). We also report the AUC values in the article for comparison with the accuracy values.
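To make the thresholding and the two performance metrics concrete, here is a minimal Python sketch. It is not the MATLAB code provided on OSF; the function names and the four example probabilities are hypothetical. One assumption is labeled in the code: a probability exactly equal to the threshold is treated as a sonata, a case the text does not specify. AUC is computed in its pairwise form, as the proportion of rondo–sonata pairs in which the rondo receives the higher probability.

```python
def classify(probabilities, threshold=0.5):
    """Label a movement 1 (rondo) if its probability exceeds the threshold.

    Assumption: a probability exactly at the threshold is labeled 0 (sonata).
    """
    return [1 if p > threshold else 0 for p in probabilities]


def accuracy(predicted, actual):
    """Proportion of predictions that match the true labels."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def auc(probabilities, actual):
    """Pairwise AUC: chance that a rondo outranks a sonata (ties count half)."""
    rondos = [p for p, a in zip(probabilities, actual) if a == 1]
    sonatas = [p for p, a in zip(probabilities, actual) if a == 0]
    wins = sum(
        1.0 if r > s else 0.5 if r == s else 0.0
        for r in rondos
        for s in sonatas
    )
    return wins / (len(rondos) * len(sonatas))


# Hypothetical model outputs for four movements (1 = rondo, 0 = sonata).
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 1]
predicted = classify(probs)
print(predicted)                    # [1, 0, 1, 0]
print(accuracy(predicted, labels))  # 0.5
print(auc(probs, labels))           # 0.75
```

In this toy case the two metrics diverge: accuracy depends on where the threshold falls, whereas AUC reflects only how the movements are ranked.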

We assessed our model’s performance using a leave-one-out cross-validation procedure (LOOCV). This procedure fits the logistic regression model to all but one of the movements in the corpus and uses the model coefficients to calculate the probability that the left-out piece is a rondo (based on that piece’s predictor variables). This procedure is repeated such that each of the 360 movements takes a turn as the one that is left out, resulting in 360 predictions. This is done so that a given movement is not involved in predicting itself. It was these probability values that we used to calculate the model’s accuracy and AUC values.
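The leave-one-out loop can be sketched in plain Python. This is an illustrative stand-in rather than the authors’ MATLAB procedure: the logistic regression is fit by simple gradient descent (an assumption made to keep the example self-contained), and the single-feature toy data set is hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, row):
    """Probability of class 1 (rondo); weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for row, label in zip(X, y):
            error = predict_proba(weights, row) - label
            gradient[0] += error
            for j, value in enumerate(row):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, gradient)]
    return weights


def loocv_probabilities(X, y):
    """Refit the model without each item in turn, then predict that item."""
    probabilities = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        weights = fit_logistic(X_train, y_train)
        probabilities.append(predict_proba(weights, X[i]))
    return probabilities


# Hypothetical standardized feature (e.g., attack rate) for eight movements.
X = [[-3.0], [-2.5], [-2.0], [-1.5], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
held_out = loocv_probabilities(X, y)
predictions = [1 if p > 0.5 else 0 for p in held_out]
print(len(held_out), predictions == y)
```

Each of the eight probabilities comes from a model that never saw the movement being predicted, which is the property the procedure is designed to guarantee.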

Results

Summary Statistics

Figure 1 displays boxplots of the numerical predictor variables for sonatas and rondos separately, while Figure 2 displays bar graphs of the categorical variables, as well as the ordinal variable, Dynamic. (Most pieces in the corpus began with an original dynamic marking [70.28%], 4.17% began with an editorial dynamic marking, and 25.56% began with no dynamic indication.) Note that some of the categorical variables in Figure 2 are not included in the logistic regression model, yet they still help display the characteristics of our corpus.

Figure 1.

Box plot summaries of numeric features of the full corpus. The box plot for weighted average pitch height uses pitch numbers, where C4 = 40.


Figure 2.

Bar graphs of the categorical features of pieces in our corpus. Note that Composer, Instrumentation, and Beat Count were not included in the statistical modeling.


Logistic Regression

The results of the logistic regression are reported in Table 2. We also report the odds ratios for each predictor. Note that continuous predictor values were standardized to allow for easier comparison of the coefficient magnitudes. A deviance test showed that the model was significantly better than a null model (with intercept only), χ² = 178.70, p < .001. Weighted Average Pitch Height, Weighted Pitch Variance, Time to PAC, Attack Rate, Dynamic, the presence of a Pickup, and compound Beat Division were all significant predictors of the log-odds of a piece being a rondo, with the direction of each effect given by the sign of its coefficient estimate. Mode was not significant (p = .629). Compared to Homophonic Texture, neither Polyphonic Texture (p = .713) nor Monophonic Texture (p = .077) was significant.

Table 2.

Logistic Regression Model (Full Corpus)


Model Performance and Validation

The LOOCV procedure showed that the model could correctly predict the movement type (rondo vs. sonata) 79.44% of the time (AUC = .85). The model performed significantly better than chance as assessed with a binomial test (p < .001). The confusion matrix for the model is presented in Table 3.
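As a rough check on the comparison with chance, the following Python sketch computes an exact one-sided binomial tail probability. Several details are assumptions rather than reported facts: the count of 286 correct predictions is inferred from 79.44% of 360 movements, chance is taken to be .5 because the corpus contains equal numbers of sonatas and rondos, and the test is treated as one-sided.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance=0.5):
    """Exact one-sided p-value: P(X >= successes) under Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


n_movements = 360
n_correct = 286  # assumed count; 286/360 rounds to the reported 79.44%
accuracy = n_correct / n_movements
p_value = binomial_upper_tail(n_correct, n_movements)
print(f"accuracy = {accuracy:.4f}, one-sided p = {p_value:.3g}")
```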

Table 3.

Confusion Matrix of the Logistic Regression Model


Discussion

The logistic regression classifier distinguishes very well between rondos and sonatas on the basis of our predictor variables, as indicated by high accuracy and AUC scores. This supports our overall hypothesis that rondos and sonatas can be discriminated on the basis of features from a movement’s beginning. More specifically, our model predicts that a piece is more likely to be a rondo if it has higher weighted average pitch height, lower weighted pitch variance, less time to the first PAC, higher attack rate, softer opening dynamic, an initial anacrusis, and compound beat division. Differences between sonatas and rondos, then, involve a diverse collection of musical features, including register, meter, rhythmic activity, harmonic closure, and dynamics.

Many of these findings fit with the movement types’ affective norms. For example, rondos’ higher pitch height and higher attack rate are consistent with acoustic cues for happiness in speech. With dynamics, however, our predictions were not confirmed: though prior research suggests that happy music, like happy speech, tends to be louder (Schutz, 2017; Turner & Huron, 2008), in our findings, rondos were relatively quiet. This recalls the major/fast/quiet combination that Horn and Huron (2015) characterize as “light/effervescent.” According to their corpus analysis, “light/effervescent” music was especially popular in the late eighteenth century—and the rondo vogue might contribute to that trend. Meanwhile, sonata-form movements featured more varied opening dynamics, though louder dynamics were most common in this relatively serious, dramatic movement type.

Mode has often been used as an a priori classifier in studies of emotional cues in music, yet it was not significant here. This is largely because the major mode was most prevalent in both sonata movements (93%) and rondos (96%). Note, however, that within the works in our corpus, major rondos sometimes followed minor sonata movements (n = 3), but minor rondos never followed major sonata movements—a subtle asymmetry that fits with our original hypothesis about mode.

Rondos were more likely to have compound meters, as predicted. Compound meters (3/8, 6/8, 12/8) appeared in 20% of rondos and approximately 3% of sonata-form movements. There were also differences in simple meters: 57% of rondos were in 2/4 (compared with 8% of sonata movements). Rondos were also more likely to start with a pickup.

While texture did not reach significance independent of other predictors in our model, melody-and-accompaniment textures were most common in rondos. Monophonic openings, by contrast, more commonly appeared in sonata movements. For Janet Levy (1982, pp. 497–501), solo textures function as a contextual sign in classical music, often pointing out a beginning or deferring closure. The connection between monophony and sonata form might also be interpreted in affective terms: according to Hansen and Huron (2018), solo textures in orchestral music are associated with sadness, which would make them less appropriate for rondos.

Time to PAC data indicate that strong cadences tend to arrive earlier in rondos than in sonata-form movements. While other variables focus on movement openings, time to PAC can engage longer stretches of music. It assesses cadential closure, a feature that is stylistic but also associated with formal articulation at various levels. For example, PACs mark the ends of themes (Vallières, Tan, Caplin, & McAdams, 2009, p. 29) and larger formal sections (Hepokoski & Darcy, 2006, p. 12). Two outliers for time to PAC appear in Figure 1: the sonata movement comes from Haydn’s Symphony No. 102, and the rondo comes from Mozart’s String Quintet No. 4, K. 516. Both pieces start with a slow introduction that does not include a PAC, and both cadence after 8 measures in the secondary tempo.

Our corpus included 20 movements with slow introductions followed by a new, faster tempo (19 sonatas, 1 rondo). In the late eighteenth century, slow introductions connoted formality and seriousness, and they were most common in public genres like the symphony (Hepokoski & Darcy, 2006, pp. 292–297). Indeed, 13 of the examples in our corpus appear in Haydn symphonies. Because theorists often conceptualize such introductions as occurring “before the beginning” (Caplin, 1998, p. 203), we might have excluded slow introductions and treated the sonata’s exposition or rondo’s initial refrain as a “true” beginning. Still, we aimed to assemble a corpus that was representative of late-eighteenth-century instrumental music, without excluding music based on a priori theoretical assumptions. Pieces with slow introductions are still relatively rare, comprising 5.56% of the entire corpus. It might seem that these sonata movements could disproportionately influence average BPM. But while rondos tended to have higher average attack rate, there was no significant tempo difference in our model. In other words, it appears that rondos had faster rhythms but not faster tempi overall. Even if a slow introduction strongly indicates that a movement is a sonata rather than a rondo, our results suggest that this difference involves many musical features.

Our model produces individual probability estimates that any given movement in the corpus is a rondo. A full list of the pieces with their ranks is provided in the supplementary materials (see Appendix). It is interesting to return to particular movements with these estimates in mind. For example, the piece identified as most likely to be a rondo was the rondo from Clementi’s Sonata, Op. 35, No. 1 (see Figure 3), and the piece least likely to be a rondo was the first movement from Pleyel’s Keyboard Trio in C major, B. 428 (see Figure 4). The rondo with the lowest probability of being a rondo (i.e., the most sonata-like rondo) was the finale from Mozart’s String Quintet No. 4, K. 516 (a rare minor-key rondo that, as mentioned above, starts with a slow introduction; see Figure 5). Meanwhile, the sonata movement with the highest probability of being a rondo (i.e., the most rondo-like sonata) appeared in Haydn’s Keyboard Sonata in G, Hob. XVI:40 (see Figure 6). These compositions illustrate many of the stylistic differences that our model highlights. For example, the rondo-like pieces from Clementi and Haydn include quiet dynamics, compound meter, fairly fast attack rate, and fairly high register, whereas Pleyel’s sonata movement starts forte, with simple meter, slow attack rate, and lower average pitch height.

Figure 3.

Muzio Clementi, Piano Sonata in C major (1796), Op. 35, No. 1, mvt. 3, mm. 1–4

Figure 4.

Ignaz Pleyel, Keyboard Trio in C major, B. 428 (1784), mvt. 1, mm. 1–4

Figure 5.

Wolfgang Amadeus Mozart, String Quintet No. 4 in G minor, K. 516 (1787), mvt. 4, mm. 1–4

Figure 6.

Joseph Haydn, Keyboard Sonata in G, Hob. XVI:40 (1784), mvt. 1, mm. 1–4

These examples demonstrate how individual compositions mix various musical features. They also suggest additional variables. Note, for example, that the rondo-like excerpts from Clementi and Haydn (Figures 3 and 6) both include staccato markings. This is unsurprising, since staccato articulation is associated with musical expressions of happiness (Eerola et al., 2013). Future studies might include variables related to articulation, theme types (Caplin, 1998), or rhythmic variety (see Vallières et al., 2009, p. 28; Vallières, 2011, p. 88). Such studies might also employ computational analysis techniques, involving the MIDIToolbox (Eerola & Toiviainen, 2004) or MIRToolbox (Lartillot, Toiviainen, & Eerola, 2008). Still, though our corpus study is by no means exhaustive, it does consider diverse musical features, which, in our view, effectively capture the rondo/sonata distinction.

As for the model’s external validity (i.e., whether it can do well at predicting the identity of movements not included in our corpus), the cross-validation procedure protects us to some extent against overfitting (i.e., having an accurate model for our data that does not generalize well to movements that were not involved in the model fitting). Still, claiming that our model generalizes to movements outside of our corpus relies upon the assumption that our corpus is representative of the rondo and sonata “populations.” Given our large sample size and high accuracy values, we are confident the model is robust and will perform well outside of our corpus, but exactly how well it performs is an empirical question that would rely on measuring many other pieces. We provide MATLAB code to test further pieces in the supplementary materials on this project’s OSF page (see Appendix).

Following the results of our model, we argue that sonata-form movements and rondos can be distinguished according to a suite of stylistic features on the musical surface—and moreover, from openings rather than entire movements, regardless of stylistic or formal features that appear later. But in some sense, this model is omniscient: it knows the true values of the predictors for each of the movements. Listeners would not have access to these values and might not use the same kind of information to judge the difference between sonatas and rondos. What features would they use to make the distinction? Can they distinguish between rondos and sonatas at all? These questions cannot be answered with corpus analysis alone. Instead, they must be tested experimentally. As such, we planned an experiment in which participants—with and without formal music training—judged whether pieces were sonatas or rondos. We hypothesized that both groups would be able to categorize sonata and rondo movements based on stylistic cues, and we expected trained musicians to be more sensitive to the distinction. However, if musicians could not successfully categorize sonatas and rondos, it would seem unlikely that musically untrained participants could.

Experiment

Method

Participants

Forty participants were recruited via posters on campus at the University of Western Ontario. All participants reported normal hearing, and each one was compensated with an honorarium of $10 (CAD). Twenty participants (8 women, 12 men) had at least 5 years of private music instruction (M = 15.10, SD = 4.79). For this group with music training, age ranged from 23 to 57 years (M = 34.65, SD = 10.58). The remaining participants (11 women, 9 men) reported less than 6 months of formal music training, though most had some experience with informal music-making (M = 2.37 years, SD = 2.85). For example, 7 participants played guitar and 7 played piano. Most of their everyday music listening involved pop/rock (M = 40%, SD = 28%) and R&B/hip-hop (M = 25%, SD = 27%). For the musically untrained group, age ranged from 18 to 32 years (M = 21.75, SD = 4.39). Two participants—one from each group—were excluded from the experiment because of technical malfunctions resulting in incomplete data collection.

Stimuli

Stimuli consisted of uncompressed audio clips from commercial recordings of a subset of the movements in our total corpus. For each movement, we used the first fifteen seconds of the recording, that is, the opening of the movement. The works were quasirandomly selected from the keyboard, quartet, and symphonic pieces in the corpus (20 of each). We created chronological lists of pieces with each instrumentation, then systematically selected every nth entry. When a commercial recording was not accessible for a selected piece, it was replaced by the next available piece on the list. This method ensured that excerpts from throughout our historical period were included. Both the sonata movement and the rondo from each selected work were included, so that half of the excerpts were sonata movements and the other half were rondos from the same multi-movement works. This yielded 120 movements (60 × 2), or a third of our total corpus. This set was then divided into two equal subsets, each comprising the sonatas and rondos from 30 of the pieces (10 of each type of instrumentation). These subsets were assigned as either a training or a test set, counterbalanced across participants.
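The selection procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the work names, the list length of 60, the unavailable recording, and the way the step size n is derived from the list length are all hypothetical, since the text does not specify them.

```python
def systematic_sample(chronological, n_wanted, has_recording):
    """Pick every nth work from a chronological list.

    If a selected work has no accessible recording, take the next
    available work on the list instead.
    """
    step = len(chronological) // n_wanted  # assumed way of choosing n
    chosen = []
    for start in range(0, step * n_wanted, step):
        for candidate in chronological[start:]:
            if has_recording(candidate) and candidate not in chosen:
                chosen.append(candidate)
                break
    return chosen


# Hypothetical chronological list of 60 works for one instrumentation.
works = [f"work_{i:02d}" for i in range(60)]
unavailable = {"work_03"}  # hypothetical work without a commercial recording

picked = systematic_sample(works, 20, lambda w: w not in unavailable)
print(picked[:3])
```

Because the list is chronological, stepping through it at a fixed interval spreads the selected works across the whole historical period, which is the property the procedure is meant to guarantee.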

Procedure

Participants were invited to a quiet classroom space for a one-hour session. The experiment was administered on a PC laptop, running PsychoPy, v. 1.85.2. Participants used Sony on-ear noise-cancelling headphones (MDRZX110NC), and they were allowed to adjust the volume to a comfortable level. During a training phase, participants listened to the 60 excerpts from their assigned training set, presented in a random order. On-screen text indicated each excerpt’s formal category (sonata or rondo), and after the excerpt, participants were prompted to press a key on a computer keyboard corresponding to the category. Participants used their left index fingers to press “a” for rondo and their right index fingers to press “l” for sonata.

In a subsequent testing phase, participants listened to the 60 novel excerpts from the assigned test set (30 sonatas and 30 rondos) presented in a random order and were tasked with categorizing them as either a sonata or rondo by pressing one of the same corresponding keys as in the training phase. Participants were instructed to respond as quickly and accurately as possible, and the audio stopped after the response. They did not receive feedback about the accuracy of their responses.

In a post-experiment questionnaire, participants responded to questions about their musical background and listening preferences and rated the difficulty of the experimental task on a five-point scale. The questionnaire also asked participants to name any musical excerpts that they had recognized during the experiment and posed open-ended questions about their impression of the sonata and rondo categories (i.e., “What words would you use to describe the rondo category?” and “What words would you use to describe the sonata category?”).

Data Analysis

Each participant’s responses were analyzed via the sensitivity index d′ and a measure of bias C. We compared groups via a two-sample t-test. Under the null hypothesis that sonatas and rondos are indistinguishable on the basis of listening to the opening of a movement, participants should perform at chance levels (d′ = 0). As such, we also used Bonferroni-corrected one-sample t-tests to judge whether each group differed from chance performance. We set an α level of .05 to assess significance. Data were analyzed in MATLAB.
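For readers unfamiliar with these signal detection measures, the standard formulas are d′ = z(H) − z(FA) and C = −[z(H) + z(FA)] / 2, where H is the hit rate and FA is the false alarm rate. The Python sketch below illustrates them; it is not the authors' MATLAB analysis. Which category counts as the "signal" is an assumption here, and the response counts are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(H) - z(FA)."""
    return _z(hit_rate) - _z(false_alarm_rate)


def criterion_c(hit_rate, false_alarm_rate):
    """Response bias: -(z(H) + z(FA)) / 2; negative values are liberal."""
    return -0.5 * (_z(hit_rate) + _z(false_alarm_rate))


# Hypothetical participant: 24 of 30 signal trials answered "signal" (hits)
# and 9 of 30 non-signal trials answered "signal" (false alarms).
hit_rate = 24 / 30
false_alarm_rate = 9 / 30
print(f"d' = {d_prime(hit_rate, false_alarm_rate):.2f}")
print(f"C  = {criterion_c(hit_rate, false_alarm_rate):.2f}")
```

Note that `inv_cdf` is undefined for rates of exactly 0 or 1, so a real analysis would need some correction for perfect scores; the text does not say which correction, if any, was applied.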

Content analysis

We also conducted an inductive content analysis with descriptions of sonata and rondo categories from the post-experiment questionnaire. Whereas deductive content analysis organizes qualitative data in terms of concepts that are defined a priori, inductive content analysis derives concepts from the data (Elo & Kyngäs, 2008). The second author initially coded participants’ responses. The first author reviewed this coding and proposed adjustments, and following discussion, both authors agreed on coding.

Rondo ratings

Questions about whether or not people can discriminate sonatas from rondos at all are separate from questions about what features make people think that a given excerpt is a rondo (regardless of whether or not it actually is one). Thus, for each piece included in the experiment, we calculated the proportion of responses that classed it as a rondo. If a participant indicated they knew the piece, we did not include their response in the calculation.
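The Rondo Rating calculation can be illustrated with a short Python sketch. The data layout (tuples of piece, answer, and a recognition flag) and the example responses are hypothetical.

```python
def rondo_ratings(responses):
    """Proportion of "rondo" responses per piece.

    `responses` is an iterable of (piece, answer, recognized) tuples;
    responses from participants who recognized the piece are skipped.
    """
    counts = {}
    for piece, answer, recognized in responses:
        if recognized:
            continue
        rondo, total = counts.get(piece, (0, 0))
        counts[piece] = (rondo + (answer == "rondo"), total + 1)
    return {piece: rondo / total for piece, (rondo, total) in counts.items()}


# Hypothetical responses for two pieces.
example = [
    ("piece_A", "rondo", False),
    ("piece_A", "sonata", False),
    ("piece_A", "rondo", True),  # excluded: participant knew the piece
    ("piece_B", "rondo", False),
]
ratings = rondo_ratings(example)
print(ratings)
```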

We considered these Rondo Ratings in two ways. First, we checked whether they correlated with the predictions for the subset of pieces made by the logistic regression model for the full corpus reported above. In other words, for the subset of pieces included in the experiment, does the probability of being a rondo as estimated by the model correlate with the likelihood that participants identified those pieces as rondos? To assess this relationship, we used Spearman’s rank-order correlation test to compare the rankings.
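Spearman's coefficient is the Pearson correlation between the two sets of ranks, with tied values sharing their average rank. The following Python sketch shows that computation on hypothetical values; it does not compute a p-value and is not the authors' MATLAB code.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model probabilities and Rondo Ratings for five pieces.
model_probability = [0.10, 0.35, 0.50, 0.80, 0.95]
rating = [0.20, 0.15, 0.60, 0.70, 0.90]
print(f"rs = {spearman(model_probability, rating):.2f}")
```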

A significant correlation would show that the logistic regression model reported above captures information similar to what people use to make their judgments, but it would not show whether the predictors are weighted similarly. People may not rely on the same features as the omniscient corpus model, which knows the true values of all the predictor variables. Thus, in a second, exploratory analysis, we constructed a new logistic regression model using the Rondo Ratings as the response variable. Given the reduced set of pieces, and the fact that a maximum of 20 classifications were made for each movement (since judgments were excluded if participants knew the piece, and each participant only rated half the excerpts), the Rondo Rating estimates may be unreliable, and the power of such a model is substantially reduced. To account for this, we used a stepwise procedure to find the subset of predictors that best predicted the Rondo Ratings, rather than using the full set of features. We started with an intercept-only model and added predictors from the full set included in the corpus model one at a time. At each step, the new predictor had to significantly improve the fit of the model (as assessed with a deviance test, i.e., a chi-square test on the reduction in deviance) and had to improve the model more than any other candidate predictor. This procedure was repeated until no new predictor improved the fit of the model. Note that this procedure does not require that the coefficients themselves be significant. The resulting model indicates which features of the corpus were most important for participants, but this is only one of many possible stepwise procedures (which could potentially lead to different results), and the results of this analysis should be regarded as exploratory. (A more conclusive approach is proposed in our General Discussion.)
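The control flow of such a forward stepwise procedure can be sketched in Python. This is a schematic illustration, not the authors' implementation: model fitting is replaced by a caller-supplied `deviance` function, the toy deviance values are hypothetical, and each predictor is assumed to cost one degree of freedom (so the chi-square critical value for df = 1 at α = .05 is used as the threshold).

```python
CHI2_CRIT_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05


def forward_select(candidates, deviance, threshold=CHI2_CRIT_DF1):
    """Greedy forward selection driven by deviance reduction.

    `deviance` maps a tuple of predictor names to the deviance of the
    model fitted with those predictors (supplied by the caller).
    """
    selected = []
    remaining = sorted(candidates)
    current = deviance(tuple(selected))  # intercept-only model
    while remaining:
        # Candidate whose addition lowers the deviance the most.
        best = min(remaining, key=lambda name: deviance(tuple(selected + [name])))
        drop = current - deviance(tuple(selected + [best]))
        if drop <= threshold:  # no significant improvement: stop
            break
        selected.append(best)
        remaining.remove(best)
        current -= drop
    return selected


# Toy stand-in for model fitting: hypothetical, additive deviance reductions.
TOY_GAINS = {"time_to_pac": 12.0, "pitch_variance": 6.0, "attack_rate": 1.0}


def toy_deviance(predictors):
    return 100.0 - sum(TOY_GAINS[name] for name in predictors)


chosen = forward_select(TOY_GAINS, toy_deviance)
print(chosen)
```

In a real analysis, `deviance` would refit a logistic regression for each candidate set, and categorical predictors with more than two levels would need a threshold matched to their degrees of freedom.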

Results

Accuracy rates, d′ scores, and a measure of bias (C) were calculated for each participant. In post-experiment questionnaires, six musicians identified compositions among the stimuli (10 compositions total, 0.88% of the trials for their group). Our analysis excluded their responses for these compositions.

For participants with music training, accuracy ranged from 66% to 93% correct for sonata movements (M = 81.82%, SD = 8.35%), and from 50% to 93% correct for rondos (M = 71.20%, SD = 9.74%). For participants without music training, accuracy ranged from 47% to 80% correct for sonata movements (M = 61.75%, SD = 9.32%), and from 40% to 77% correct for rondos (M = 59.47%, SD = 10.96%). Mean d′ scores were 1.51 (SD = 0.50) for the trained group and 0.56 (SD = 0.45) for the untrained group (see Figure 7), and this difference was highly significant, t(36) = 6.18, p < .001, with a large effect size (Cohen’s d = 2.00). Trained participants, then, were more sensitive to the difference between rondos and sonatas, relative to untrained participants. However, Bonferroni-corrected one-sample t-tests indicated that performance differed significantly from chance levels for both the trained group, t(18) = 13.15, p < .001, Cohen’s d = 3.02, and the untrained group, t(18) = 5.45, p < .001, Cohen’s d = 1.25.
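The reported test statistics can be approximately recovered from the rounded summary statistics. The Python sketch below does this for the d′ scores, assuming 19 participants per group after exclusions and a pooled-variance (Student's) two-sample t-test, which is consistent with the reported 36 degrees of freedom. Because the inputs are rounded means and standard deviations, the results are close to the reported values but not identical.

```python
from math import sqrt


def one_sample(mean, sd, n, mu0=0.0):
    """Return (t, Cohen's d) for a one-sample t-test against mu0."""
    d = (mean - mu0) / sd
    return d * sqrt(n), d


def two_sample_pooled(m1, sd1, n1, m2, sd2, n2):
    """Return (t, Cohen's d) for a pooled-variance two-sample t-test."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    return d / sqrt(1 / n1 + 1 / n2), d


# Rounded d' summary statistics from the text; n = 19 per group is assumed.
t_trained, d_trained = one_sample(1.51, 0.50, 19)
t_untrained, d_untrained = one_sample(0.56, 0.45, 19)
t_groups, d_groups = two_sample_pooled(1.51, 0.50, 19, 0.56, 0.45, 19)

print(f"trained vs. chance:   t(18) = {t_trained:.2f}, d = {d_trained:.2f}")
print(f"untrained vs. chance: t(18) = {t_untrained:.2f}, d = {d_untrained:.2f}")
print(f"trained vs. untrained: t(36) = {t_groups:.2f}, d = {d_groups:.2f}")
```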

Figure 7.

Box plots of d′ scores and bias (C) for participants with and without formal music training

The groups also differed in terms of response bias (C), t(36) = 2.25, p = .031, with a medium effect size (Cohen’s d = 0.73). Mean values for C were -0.17 (SD = 0.21) for the trained group and -0.03 (SD = 0.16) for the untrained group. Bonferroni-corrected one-sample t-tests showed that the trained group, on average, had a more liberal criterion for responding “sonata,” t(18) = -3.42, p = .006, Cohen’s d = -0.78, consistent with their higher accuracy for sonata movements than for rondos, whereas the untrained group was not significantly biased, t(18) = -0.81, p = .854, Cohen’s d = -0.19.

The median rating for task difficulty was 3 (Neutral) for trained participants and 2 (Difficult) for untrained participants. That is, musicians found the task easier (Mann–Whitney U = 117, p = .015).

Inductive Content Analysis

Questionnaire responses from each experiment group are presented in Tables 4 and 5. In each table, subcategories and response counts are presented within two main categories: musical features and character descriptions. Note that subcategories, such as pitch height or dynamics, contained opposed pairs (e.g., high/low, loud/quiet). In musicians’ responses, we identified a total of 60 words or phrases that described sonata-form movements and 61 words or phrases that described rondos. From this set of 121, 45 words or phrases (37%) were ambiguous and unsuitable for content pairing. Musical features accounted for 40% of responses, while 23% referred to character. In untrained participants’ responses, we identified 57 words or phrases that described sonata-form movements and 52 words or phrases that described rondos. Musical features accounted for 56% of responses, while 34% referred to character. The remaining 10% of responses, including ambiguous adjectives and imagery, did not fit into our main categories.

Table 4.

Content Analysis for Descriptions of Musical Features

Table 5.

Content Analysis for Descriptions of Character


Rondo Ratings

The likelihood that musician participants classed the compositions as rondos correlated with the probability predictions from the logistic regression for the corpus model reported above, rs = .59, p < .001. The stepwise logistic regression model procedure led to a model that was significant overall, χ2(2) = 25.28, p < .001, with two predictors significantly predicting the probability that participants would classify a piece as a rondo: Weighted Pitch Variance (B = -0.52 ± 0.21, p = .015) and Time to PAC (B = -1.23 ± 0.35, p < .001). Pieces with higher Weighted Pitch Variance and longer Time to PAC were less likely to be classed as rondos by participants.

The likelihood that untrained participants classed the compositions as rondos correlated with the probability predictions from the logistic regression for the corpus model reported above, rs = .40, p < .001. The stepwise logistic regression model procedure led to a model that was significant overall, χ2(1) = 6.15, p = .013, with only one predictor significantly predicting the probability that participants would classify a piece as a rondo: Time to PAC (B = -0.51 ± 0.22, p = .022). Pieces with longer Time to PAC were less likely to be classed as rondos by participants. Again, note that this analysis is exploratory (see General Discussion).
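The p-values for the two overall model tests follow from the chi-square distribution, which has simple closed-form tail probabilities for 1 and 2 degrees of freedom. The Python sketch below uses those closed forms to check the reported statistics; it deliberately handles only these two cases and is not a general chi-square routine.

```python
from math import erfc, exp, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic (df = 1 or 2 only)."""
    if df == 1:
        return erfc(sqrt(x / 2))  # P(Z^2 > x) for a standard normal Z
    if df == 2:
        return exp(-x / 2)  # chi-square with df = 2 is exponential
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Overall model tests reported in the text.
p_trained = chi2_sf(25.28, 2)
p_untrained = chi2_sf(6.15, 1)
print(f"trained:   chi2(2) = 25.28, p = {p_trained:.2g}")
print(f"untrained: chi2(1) = 6.15,  p = {p_untrained:.3f}")
```

The same function also recovers the conventional threshold: a chi-square value of 3.841 with 1 degree of freedom corresponds to p ≈ .05.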

Discussion

In this experiment, musicians with extensive formal training were highly sensitive to the sonata/rondo distinction. They were generally able to categorize movements based on openings alone, relying on stylistic cues rather than large-scale formal repetition. This effect should not depend on the recognition of specific compositions: we excluded responses where participants identified the musical stimuli, though it is possible that some pieces were familiar but not identified in the post-experiment questionnaire. Nonetheless, these participants relied on their practical and theoretical knowledge of late eighteenth-century music.

Participants in the group without music training had limited experience with classical music, and they received no theoretical explanation for the difference between sonata and rondo movements. Their training simply involved listening to 7.5 minutes of music from each category. Although a few participants were unable to distinguish between the categories (with d′ scores at or near 0), others correctly identified more than 3 out of every 4 excerpts. Most participants in this group rated the task as “difficult,” yet perhaps they underestimated their ability to hear, categorize, and describe the differences between sonata and rondo movements. While trained musicians were more sensitive to these differences, both groups’ performance was significantly better than chance.

Our musical stimuli were quasirandomly selected from our corpus, and they included 10 sonata movements with slow introductions. As in the corpus study, we aimed for a representative sample of late eighteenth-century repertoire, without excluding movements (or sections) based on prior theoretical assumptions. A slow starting tempo might strongly indicate that an excerpt belongs to the sonata category, but in our view, this is not a problem if that accurately reflects tempo differences between the two movement types. That said, our stepwise logistic regression model suggests that tempo—unlike pitch variance and harmonic closure (time to PAC)—was not a significant factor in participants’ judgments of rondoiness.

All the same, several participants mentioned speed in the post-experiment questionnaire. Questionnaire responses reflected participants’ experience of the categories. Participants disagreed on many points. For example, some untrained listeners associated violins with sonata-form movements or with rondos, even though we controlled for instrumentation. And many subcategories, unfortunately, had a very small number of responses. Still, some notable oppositions appeared. In terms of musical features, participants in both groups sensed that rondos were faster and higher. These responses index attack rate and pitch height—two features that are closely related to acoustic cues for emotion. In terms of character, untrained participants felt that rondos were happier and less dramatic than sonatas; musicians found rondos more energetic. This qualitative content analysis is limited and provisional, yet it implies that both musical and character traits contribute to participants’ sensitivity to the sonata/rondo distinction.

General Discussion

Our corpus study suggested that rondos and sonatas can be distinguished on the basis of perceptible, surface-level features, and it described characteristic features of each movement type. In our experiment, the likelihood of participants identifying the different movements as rondos correlated with the predictions of the corpus model. For musicians, the Spearman correlation coefficient was medium to large (.59) and statistically significant. It was smaller for untrained participants (.40), but even after minimal training, this relationship was already present and statistically significant. In other words, movements to which the full model assigned a higher probability of being rondos were also more likely to be identified as rondos by the participants.

Having determined that listeners were sensitive to differences between sonatas and rondos, we examined features that they used to make their judgments. Two variables significantly improved on the null model in predicting Rondo Ratings. For both groups of participants, Time to PAC was a significant improver of the model: pieces with lower Time to PAC were more likely to be classed as rondos. Additionally, pieces with lower pitch variance were more likely to be classed as rondos by participants with extensive music training. This suggests that the features these participants used to distinguish the rondos were related to, but not necessarily identical to, the features used in our full model.

This aspect of our analysis is exploratory, though, and we cannot draw firm conclusions about which specific factors affect participants’ judgments of whether a certain excerpt is a rondo or not. For example, the corpus study and experiment involved slightly different materials: while much of the corpus analysis was based on the first four measures from each movement, listeners heard the first fifteen seconds. The audio clips often included additional music, though this would not provide more stylistic information for many features (e.g., meter and tempo, which typically remain constant). While we assume that these differences are not significant, it would be possible to compare the audio clips and scores by extracting corresponding features via the MIRToolbox and the MIDIToolbox (Eerola & Toiviainen, 2004; Lartillot et al., 2008). The exploratory models we provide here are also limited by the reliability of the likelihood estimates (i.e., each piece was rated by only 20 people, minus any participants who recognized it or were excluded due to technological malfunctions) and the reduced size of the corpus used in the experiment (120 movements vs. the 360 used in the full corpus). Finally, there is an important conceptual limitation here: we cannot rule out that latent variables are mediating whatever effect the observed variables have on judgments. In other words, the perceptible features that we did measure could correlate with features that we did not measure (or perhaps are not directly measurable), and it would not be clear which variables have a direct, causal influence on participants’ judgments. One proposal to investigate this further would be to generate music that directly and independently manipulated the features we measured (e.g., Weighted Average Pitch Height) to see if this changed participants’ judgments. This method could more directly assess the variables that influence participants’ rondo ratings.

Conclusions

The differences between classical sonatas and rondos are both formal and stylistic. They vary in their long-range juxtaposition of thematic and tonal areas, but also, as our corpus study has shown, in their registral, dynamic, rhythmic, and perhaps textural tendencies. Our experiment further suggests that listeners can learn to categorize these movement types based on their openings. In late eighteenth-century music, then, global form seems to correlate with local style. Form and style, in other words, might be two aspects of classical subgenres.

Cognitive categories or conceptual models for such subgenres might engage other factors too. For example, all of the sonata-form examples in our corpus are first movements, and 99% of the rondos are finales. At this level, these movement types might be compared with minuets, which have distinctive metrical and phrase-structural features—but also a typical compound-ternary form and a typical location in the middle of a multi-movement work. Extending the current project to other movement types would productively supplement the sonata/rondo opposition. There are likely strong resemblances among rondos and other finale types, such as variation movements. Similarly, sonatina or sonata-rondo movements might represent distinct subgenres. The sonata-rondo, in particular, is a formal hybrid, but its stylistic and affective tendencies might be more consistent with rondos than sonatas. As Hepokoski and Darcy (2006, p. 405) note, “this hybrid typically begins with a light, square-cut, and memorable ‘rondo-style’ opening theme.” And it was understood as a kind of rondo in the late eighteenth century (Cole, 1969b). Nonetheless, individual sonata-rondo movements can be more rondo-like or more sonata-like (Hepokoski & Darcy, 2006, p. 429), and their relation to generic norms can guide analytic and hermeneutic interpretation. Here we would emphasize the interrelation of content and form or, as Naomi Waltham-Smith (2017) puts it, material and use. As Waltham-Smith argues, the expectations fostered by classical conventions permit a kind of participation, in which listeners might imagine themselves co-composing the music. Generic features would thus ground a certain kind of listening community.

Affect plays into these conventions in complex ways. Obviously, no single mood is shared by all sonata-form movements (or even all rondos). These categories, like individual compositions, have a certain emotional range. Moreover, our study focused on only one factor contributing to musical affect: acoustic cues, such as pitch height and attack rate, that are analogous to cues in emotional speech. There are many other mechanisms for stimulating affective responses in music, involving tonal expectation, timbre, and sonic analogs (Huron, 2006; Wallmark, Iacoboni, Deblieck, & Kendall, 2018; Zbikowski, 2010, 2017). Nonetheless, sonata-form pieces as a category tend to be more serious, while rondos are typically light and cheerful. Indeed, this is one of their clearest differences for both historical experts and participants in our study. Affective tendencies, at this level, seem essential to musical genre.

Relations between musical style and form are far from simple, and scholars disagree about the perceptibility of large-scale form (e.g., see Gjerdingen & Bourne, 2015, §2.3.2; Levinson, 1997; Tillmann & Bigand, 2004; Tillmann, Bigand, & Madurell, 1998). The present study does not address form perception directly, and considering “mislabeled” rondos, we have argued that the rondo as a historical cognitive category is not exclusively defined by its large-scale formal design. Yet if style and form are usually related, then local features might cue formal associations of some kind. Listeners might have a general sense of beginning or ending, or of a section’s formal stability or instability (see Granot & Jacoby, 2011). With late eighteenth-century music, such experiences of formal function might relate to particular musical topics, thematic returns, or expressive content (Caplin, 2005; Greenberg, 2017; Warrenburg & Huron, 2019). Still, further research on this topic need not be restricted to classical repertoire. For example, formal sections in rock music (e.g., verse and chorus) can be distinguished on the basis of tonal stability (de Clercq, 2017). In jazz, conventional turnarounds and tags mark returns or endings, respectively. Similarly, culturally situated listeners might respond to schematic closing gestures in gospel music (Shelley, 2017), much as they do to closing gestures in virtuoso piano music (Gjerdingen & Bourne, 2015). Ultimately, our study suggests that cognitive musicology might reconsider form, style, and genre together.

Author Note

This research was supported by a SSHRC Endowment Grant from the University of Western Ontario. Thanks to Richard Ashley, Jessica Grahn, David Temperley, and the journal’s two anonymous reviewers, and to Orlena Bray, Cohen Chaulk, Kathryn McDonald, and Jade Roth for assistance with data collection.

References

Bharucha, J. J. (1994). Tonality and expectation. In R. Aiello & J. A. Sloboda (Eds.), Musical perceptions (pp. 213–239). New York: Oxford University Press.
Bresin, R., & Friberg, A. (2011). Emotion rendering in music: Range and characteristic values of seven musical variables. Cortex, 47, 1068–1081. https://doi.org/10.1016/j.cortex.2011.05.009
Broze, Y., & Huron, D. (2013). Is higher music faster? Pitch–speed relationships in Western compositions. Music Perception, 31, 19–31.
Burnham, S. (1989). The role of sonata form in A. B. Marx’s theory of form. Journal of Music Theory, 33, 247–271.
Caplin, W. E. (1998). Classical form: A theory of formal functions for the instrumental music of Haydn, Mozart, and Beethoven. New York: Oxford University Press.
Caplin, W. E. (2005). On the relation of musical topoi to formal function. Eighteenth-Century Music, 2, 113–124. https://doi.org/10.1017/S1478570605000278
Cole, M. S. (1964). The development of the instrumental rondo finale from 1750 to 1800 (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 6500046)
Cole, M. S. (1969a). The vogue of the instrumental rondo in the late eighteenth century. Journal of the American Musicological Society, 22, 425–455.
Cole, M. S. (1969b). Sonata-rondo, the formulation of a theoretical concept in the 18th and 19th centuries. The Musical Quarterly, 55, 180–192.
Cole, M. S. (1970). Rondos, proper and improper. Music and Letters, 54, 388–399.
Cole, M. S. (2001). Rondo. In S. Sadie & J. Tyrrell (Eds.), The new Grove dictionary of music and musicians (2nd ed., Vol. 21, pp. 649–656). London: Macmillan.
Czerny, C. (1848). School of practical composition (J. Bishop, Trans.). London: Robert Cocks & Co.
Dalla Bella, S., Peretz, I., Rousseau, L., & Gosselin, N. (2001). A developmental study of the affective value of tempo and mode in music. Cognition, 80, B1–B10. https://doi.org/10.1016/S0010-0277(00)00136-0
de Clercq, T. (2017). Interactions between harmony and form in a corpus of rock music. Journal of Music Theory, 61, 143–170. https://doi.org/10.1215/00222909-4149525
De Souza, J. (2019). Texture. In A. Rehding & S. Rings (Eds.), The Oxford handbook of critical concepts in music theory (pp. 160–183). New York: Oxford University Press.
Duane, B. (2013). Auditory streaming cues in eighteenth- and early nineteenth-century string quartets: A corpus-based study. Music Perception, 31, 46–58.
Eerola, T., Friberg, A., & Bresin, R. (2013). Emotional expression in music: Contribution, linearity, and additivity of primary musical cues. Frontiers in Psychology. Retrieved from https://doi.org/10.3389/fpsyg.2013.00487
Eerola, T., & Toiviainen, P. (2004). MIDI Toolbox: MATLAB tools for music research. Jyväskylä, Finland: University of Jyväskylä.
Elo, S., & Kyngäs, H. (2008). The qualitative content analysis process. Journal of Advanced Nursing, 62, 107–115. https://doi.org/10.1111/j.1365-2648.2007.04569.x
Friedman, R. S., Neill, W. T., Seror, G. A., & Kleinsmith, A. L. (2018). Average pitch height and perceived emotional expression within an unconventional tuning system. Music Perception, 35, 518–523.
Gagnon, L., & Peretz, I. (2003). Mode and tempo relative contributions to “happy-sad” judgements in equitone music. Cognition and Emotion, 17, 25–40.
Galand, J. (1995). Form, genre, and style in the eighteenth-century rondo. Music Theory Spectrum, 17, 27–52.
Gjerdingen, R. O., & Bourne, J. (2015). Schema theory as a construction grammar. Music Theory Online, 21(2). Retrieved from http://www.mtosmt.org/issues/mto.15.21.2/mto.15.21.2.gjerdingen_bourne.html
Gjerdingen, R. O., & Perrott, D. (2008). Scanning the dial: The rapid recognition of music genres. Journal of New Music Research, 37, 93–100.
Granot, R. Y., & Jacoby, N. (2011). Musically puzzling I: Sensitivity to overall structure in the sonata form? Musicae Scientiae, 15, 365–385.
Grave, F. (2010). Mozart’s problematic rondos. Min-Ad: Israeli Studies in Musicology Online, 8(2), 134–148. Retrieved from https://www.biu.ac.il/hu/mu/min-ad/10/Floyd%20Grave.pdf
Greenberg, Y. (2017). Of beginnings and ends: A corpus-based inquiry into the rise of the recapitulation. Journal of Music Theory, 61, 171–200.
Hansen, N. C., & Huron, D. (2018). The lone instrument: Musical solos and sadness-related features. Music Perception, 35, 540–560.
Hepokoski, J., & Darcy, W. (2006). Elements of sonata theory: Norms, types, and deformations in the late-eighteenth-century sonata. New York: Oxford University Press.
Hevner, K. (1937). The affective value of pitch and tempo in music. The American Journal of Psychology, 49, 621–630.
Horn, K., & Huron, D. (2015). On the changing use of the major and minor modes, 1750–1900. Music Theory Online, 21(1). Retrieved from http://www.mtosmt.org/issues/mto.15.21.1/mto.15.21.1.horn_huron.html
Huron, D. (1989). Characterizing musical textures. In Proceedings of the 1989 International Computer Music Conference (pp. 131–134). San Francisco, CA: Computer Music Association.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Huron, D. (2008). A comparison of average pitch height and interval size in major- and minor-key themes: Evidence consistent with affect-related pitch prosody. Empirical Musicology Review, 3, 59–63.
Huron, D. (2013). A psychological approach to musical form: The habituation-fluency theory of repetition. Current Musicology, 96, 7–35.
Juslin, P., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770–814.
Lartillot, O., Toiviainen, P., & Eerola, T. (2008). A Matlab toolbox for music information retrieval. In C. Preisach, H. Burkhardt, B. Schmidt-Thieme, & R. Decker (Eds.), Data analysis, machine learning and applications (pp. 261–268). Berlin: Springer.
Levinson, J. (1997). Music in the moment. Ithaca, NY: Cornell University Press.
Levy, J. M. (1982). Texture as a sign in classic and early romantic music. Journal of the American Musicological Society, 35, 482–531.
Lippens, S., Martens, J. P., & De Mulder, T. (2004). A comparison of human and automatic musical genre classification. IEEE International Conference on Acoustics, Speech, and Signal Processing, 4, 233–236. https://doi.org/10.1109/ICASSP.2004.1326806
Mace, S. T., Wagoner, C. L., Teachout, D. J., & Hodges, D. A. (2012). Genre identification of very brief musical excerpts. Psychology of Music, 40, 112–128. https://doi.org/10.1177/0305735610391347
Margulis, E. H. (2014). On repeat: How music plays the mind. New York: Oxford University Press.
Marty, J.-P. (1988). The tempo indications of Mozart. New Haven, CT: Yale University Press.
McKay, C., & Fujinaga, I. (2004). Automatic genre classification using large high-level musical feature sets. In Proceedings of the International Conference on Music Information Retrieval (pp. 525–530). Barcelona, Spain: ICMIR.
McKee, E. (2004). Extended anacruses in Mozart’s instrumental music. Theory and Practice, 29, 1–37.
Oramas, S., Barbieri, F., Nieto, O., & Serra, X. (2018). Multimodal deep learning for music genre classification. Transactions of the International Society for Music Information Retrieval, 1, 4–21. https://doi.org/10.5334/tismir.10
Poon, M., & Schutz, M. (2015). Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech. Frontiers in Psychology. Retrieved from http://dx.doi.org/10.3389/fpsyg.2015.01419
Post, O., & Huron, D. (2009). Western classical music in the minor mode is slower (except in the Romantic period). Empirical Musicology Review, 4, 2–10. https://doi.org/10.18061/1811/36601
Quinto, L., & Thompson, W. F. (2012). Composing by listening: A computer-assisted system for creating emotional music. International Journal of Synthetic Emotions, 3(2), 48–67. https://doi.org/10.4018/jse.2012070103
Reber, R., Schwarz, N., & Winkielman, P. (2004). Processing fluency and aesthetic pleasure: Is beauty in the perceiver’s processing experience? Personality and Social Psychology Review, 8, 364–382.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178. https://doi.org/10.1037/h0077714
Schutz, M. (2017). Acoustic constraints and musical consequences: Exploring composers’ use of cues for musical emotion. Frontiers in Psychology. Retrieved from https://doi.org/10.3389/fpsyg.2017.01402
Shelley, B. (2017). Sermons in song: Richard Smallwood, the vamp, and the gospel imagination (Unpublished doctoral dissertation). University of Chicago, Chicago, IL.
Sturm, B. L. (2012). A survey of evaluation in music genre recognition. In A. Nürnberger, S. Stober, B. Larsen, & M. Detyniecki (Eds.), Adaptive multimedia retrieval: Semantics, context, and adaptation (pp. 29–66). https://doi.org/10.1007/978-3-319-12093-5_2
Tillmann, B., & Bigand, E. (2004). The relative importance of local and global structures in music perception. The Journal of Aesthetics and Art Criticism, 62, 211–222.
Tillmann, B., Bigand, E., & Madurell, F. (1998). Local versus global processing of harmonic cadences in the solution of musical puzzles. Psychological Research, 61, 157–174.
Turner, B., & Huron, D. (2008). A comparison of dynamics in major- and minor-key works. Empirical Musicology Review, 3, 64–68.
Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10, 293–302. https://doi.org/10.1109/TSA.2002.800560
Vallières, M. (2011). Beginnings, middles, and ends: Perception of intrinsic formal functionality in the piano sonatas of W. A. Mozart (Unpublished doctoral dissertation). McGill University, Montreal, Canada.
Vallières, M., Tan, D., Caplin, W. E., & McAdams, S. (2009). Perception of intrinsic formal functionality: An empirical investigation of Mozart’s materials. Journal of Interdisciplinary Music Studies, 3, 17–43.
Wallmark, Z., Iacoboni, M., Deblieck, C., & Kendall, R. A. (2018). Embodied listening and timbre: Perceptual, acoustical, and neural correlates. Music Perception, 35, 332–363.
Waltham-Smith, N. (2017). Music and belonging between revolution and restoration. New York: Oxford University Press.
Warrenburg, L., & Huron, D. (2019). Tests of contrasting expressive content between first and second musical themes. Journal of New Music Research, 48, 21–35.
Zbikowski, L. M. (2002). Conceptualizing music: Cognitive structure, theory, and analysis. New York: Oxford University Press.
Zbikowski, L. M. (2010). Music, emotion, and analysis. Music Analysis, 29, 37–60.
Zbikowski, L. M. (2017). Foundations of musical grammar. New York: Oxford University Press.

Appendix

Description of OSF Contents

This project has an associated webpage on the Open Science Framework (OSF). On this site, you can find data and supplementary materials related to this project, the contents of which we list below. You can access the page here: https://osf.io/ew9h3/?view_only=3a57011526444dad9360c328a3afca01

  1. Figure S1: Distribution of instrumental rondos from 1750–1800 (data from Cole, 1964, p. 34)

  2. publicCorpusFit_R1.mat: This MATLAB file contains the coefficients for a logistic regression model associated with the corpus analysis reported above, but fitted without standardizing the continuous predictor variables. Because this model is based on unstandardized variables, it can be used to predict the probability that a new piece (not included in our corpus) is a rondo rather than a sonata movement.

  3. publicRondoAnalysis_R1.m: This MATLAB script contains instructions and code for using the publicCorpusFit_R1.mat file. You can also use it to see how we conducted our analysis in MATLAB and to build your own models from other combinations of our predictor variables.

  4. RondoData_preprocessed_R1.mat: A data table necessary to construct the logistic regression in the publicRondoAnalysis_R1.m script.

  5. CorpusData.xls: This file lists each piece in our corpus and its associated metrics. It also contains a list of the pieces sorted by the probability of being a rondo, as predicted by our model. Note that continuous variables are not standardized in this spreadsheet.

  6. Figure S2: The ROC curve showing the performance of the logistic regression model reported above.
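
To illustrate why unstandardized coefficients matter for predicting new pieces, the following Python sketch applies a logistic model to raw feature values. It is not the MATLAB code distributed on OSF. The predictor names, units, and every numeric value are hypothetical placeholders, and the fitted coefficients in publicCorpusFit_R1.mat are not reproduced here. The second function shows the standard algebraic conversion from coefficients fitted on z-scored predictors to raw-unit coefficients, which requires the corpus means and standard deviations.

```python
import math

# Hypothetical coefficients, invented for illustration only. The fitted
# values are stored in publicCorpusFit_R1.mat and are not reproduced here.
INTERCEPT = -10.0
COEFFS = {"pitch_height": 0.12, "attack_rate": 0.5}


def rondo_probability(features, intercept=INTERCEPT, coeffs=COEFFS):
    """Return P(rondo) from raw (unstandardized) feature values."""
    z = intercept + sum(coeffs[name] * features[name] for name in coeffs)
    return 1.0 / (1.0 + math.exp(-z))


def unstandardize(intercept_std, coeffs_std, means, sds):
    """Convert coefficients fitted on z-scored predictors to raw units.

    Since z_k = (x_k - mean_k) / sd_k, each raw slope is b_k / sd_k and
    the raw intercept absorbs the terms -b_k * mean_k / sd_k.
    """
    coeffs_raw = {k: b / sds[k] for k, b in coeffs_std.items()}
    intercept_raw = intercept_std - sum(
        coeffs_std[k] * means[k] / sds[k] for k in coeffs_std
    )
    return intercept_raw, coeffs_raw


# A made-up new piece; the units (MIDI note number, attacks per second)
# are assumptions chosen for this sketch.
new_piece = {"pitch_height": 70.0, "attack_rate": 5.0}
p = rondo_probability(new_piece)
print(f"P(rondo) = {p:.3f}")
```

With a standardized model, a reader would need the corpus mean and standard deviation of each predictor before scoring a new piece. With raw-unit coefficients, measurements of the new piece can be entered directly, which is presumably why an unstandardized fit is provided.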