Perceived valence, tension, and movement of harmonic musical intervals (from the unison to the octave presented in a low- and high-register) and standard noises (brown, pink, white, blue, purple) were assessed in two studies that differed in the crossmodal procedure by which tension and movement were rated: proprioceptive device or visual analog scale. Valence was evaluated in both studies with the visual analog scale. In a preliminary study, the proprioceptive device was calibrated with a psychophysical procedure. Roughness of the stimuli was included as covariate. Tension was perceived higher in dissonant intervals and in intervals presented in the high register. The higher the high-pitch energy content in the standard noise, the higher the perceived tension. The visual analog scale resulted in higher tension ratings than the proprioceptive device. Perception of movement was higher in dissonant intervals, in intervals in the high register, and in standard noises than in musical intervals. High-pitch spectrum noises were associated with more sense of movement than low-pitch spectrum noises. Consonant intervals and low-register intervals were evaluated as more pleasant than dissonant and high-register intervals. High-pitch spectrum purple and blue noises were evaluated as more unpleasant than low-pitch spectrum noises.
Musical intervals are the building blocks of each musical composition and their distinctive acoustical properties greatly affect musical experience. In this paper we focused on the perception of tension, movement, and valence (pleasantness/unpleasantness) in harmonic musical intervals varying in pitch register. The results are compared with the same attributes related to specific standard noises. Furthermore, we wanted to explore if the evaluations of tension, movement, and valence were related to some underlying acoustical property such as roughness, or to the contrary if they exhibited distinctive and independent properties.
In the physics domain, “tension” is defined as a pulling force applied to an object. In physiology, this term mainly refers to muscle activity; namely, a state in which the muscle is contracted, as opposed to a state of muscle inactivity or relaxation. Drawing from these basic meanings, tension is widely used in psychology, at a more metaphorical level, to express an emotional state of unrest, imbalance, effort, and latent hostility. Although it is often used with a negative connotation associated with fear, concern, or distress, tension could well be a property of positive emotions, as in a strong erotic desire or the expectations for an adventurous experience (Schimmack & Grob, 2000; Schimmack & Rainer, 2002a).
Wilhelm Wundt (1896, 1911) was the first to stress the importance of the dichotomy of tension (Spannung) and resolution (Lösung) in his three-dimensional model of emotion. However, in more recent dimensional models of emotions, tension is not included as a main factor along with valence, arousal, dominance, and action tendency (Davidson, Scherer, & Goldsmith, 2002). According to Lehne and Koelsch (2015), tension is an affective state that (a) is associated with conflict, dissonance, instability, or uncertainty; (b) creates a yearning for resolution; (c) builds on future-directed processes of expectation, anticipation, and prediction. In this sense, tension cannot be assimilated to the dimension of arousal: it is possible to experience very high states of arousal without tension (e.g., winning a sport competition), because the sense of instability and uncertainty is missing, and very low states of arousal with tension (e.g., a tip-of-the-tongue state or failing to recall the right name).
According to Meyer's (1956) theory of the expression of emotions in music, musical tension is mainly due to the violation of expectations. Different studies have attempted to continuously track perceived tension in music over the whole course of a piece. The first attempt was that of Nielsen (1983) who used a pair of tongs with a spring resistance and a potentiometer placed in the axis to measure the level of tension experienced during listening to Haydn’s Symphony No. 104. The variations in tension during the listening task were explained in terms of grouping tendency, melodic movement, tonality, factors relating to compositional techniques, density as a function of instrumentation and sonority, dynamics as indicated in the musical score, and dynamics as assessed by a sound level meter. Madsen and Fredrickson (1993) replicated Nielsen’s research using a continuous response digital interface for the recording of participant perception of musical tension. Differently from Nielsen (1983) the response did not require physical effort; however, the resulting tension graph showed a high degree of concordance with the one obtained by Nielsen (1983).
Real and virtual sliders were used by Farbood (2012); Krumhansl (1996); Lehne, Rohrmeier, Gollmann, and Koelsch (2013); and Vines, Nuzzo, and Levitin (2005). Krumhansl (1996) used a digital slider to collect participants’ tension ratings while listening to Mozart’s piano sonata K. 282. Intersubject correlation of perceived tension was rather high (.42), showing a good agreement among participants. Peaks of tension were recorded at the end of segments (i.e., perceived autonomous phrases within the piece). Furthermore, the highest tension peaks occurred in measures with the slowest tempos, when melodic contour reached the highest pitch, when note density increased, and when dynamics and loudness increased.
Lehne et al. (2013) compared continuous ratings of felt musical tension for original and modified versions of two piano pieces by Mendelssohn and Mozart. Tension ratings were obtained from the position of a virtual slider presented on a computer screen that could be moved with a mouse. Modifications included versions without dynamics and/or without agogic accents, as well as versions in which the music was reduced to its melodic, harmonic, or outer voice components. The modifications that canceled dynamics and agogics largely preserved the pattern of tension resolution, even if tension ratings were significantly lower, and the tension profiles were flatter. Reducing a piece of music to the outer voices also preserved the tension pattern, showing that the outer voices embody major aspects of the musical structure. The authors also found a strong redundancy between the expressive features that affected the perception of tension, which contributed to the build-up of strong experiences. For example, the highest tension peaks reflected the main structural dominant on the harmonic level, and were prepared by a long crescendo, the rising melody line, the lowest local bass note, the fortissimo and sforzando, and repetition of the chords.
Bigand, Parncutt, and Lerdahl (1996) investigated the effect of tonal hierarchy, sensory chordal consonance, horizontal motion, and music training on perceived musical tension of short chord sequences. Participants had to evaluate the tension created by major or minor triads, major-minor seventh chords, and minor seventh chords when preceded and followed by a major triad on the same scale. The results showed that chords belonging to the key context created less tension than did nondiatonic chords. Diatonic chords falling on the first, fourth, and fifth scale degrees created a decrease in tension. The musical tension experienced on the tonic chord was weaker than that experienced on the dominant and subdominant chords. This research underlined the importance of tonal hierarchies for perceived musical tension, as theorized by Lerdahl (1988, 1996) and Lerdahl and Krumhansl (2007).
In addition to tonal hierarchy, Bigand et al. (1996) also found that minor chords and seventh chords resulted in higher tension ratings than major chords, highlighting the effect of more basic acoustical parameters such as dynamics, and timbral elements such as sensory dissonance, roughness, brightness, and density on the perception of musical tension (Farbood & Price, 2017; Hutchinson & Knopoff, 1978; Krumhansl, 1996; Nielsen, 1983; Plomp & Levelt, 1965; Pressnitzer, McAdams, Winsberg, & Fineberg, 2000). Perceived tension tends to increase with increasing dynamics (Burnsed & Sochinski, 1998; Granot & Eitan, 2011; Ilie & Thompson, 2006; Krumhansl, 1996; Misenhelter, 2001). Among low-level timbre attributes, roughness is the one most closely related to tension. Bigand et al. (1996) reported that higher roughness in tonal chord progressions was correlated with higher tension. Pressnitzer et al. (2000) showed that this effect also applies to atonal harmony. Roughness is a sensation that occurs when pairs of sinusoids are close enough in frequency that listeners experience a beating sensation. It is closely related to sensory dissonance, a term first introduced by Helmholtz (1877/1954), who proposed that the perception of dissonance corresponded to the beating between partials and fundamental frequencies of two tones. Roughness is a more general term than sensory dissonance that can be applied to all kinds of sounds, including noises (Leman, 2000). Plomp and Levelt (1965) showed that roughness and sensory dissonance were greatest when the distance between the components of a pair of pure tones was approximately one quarter of the critical band, where critical bandwidth lies in the approximate range 10%–20% of the center frequency (≅ three semitones), for center frequencies above 500–1,000 Hz, and in the approximate range of 50–100 Hz at lower frequencies (Moore & Glasberg, 1983; Plomp & Steeneken, 1968).
Later studies, based on amplitude modulated (AM) tones and noises have confirmed Plomp’s results, providing additional details. In these studies the term roughness, rather than sensory dissonance, was applied. Zwicker and Fastl (1990, p. 234) related roughness to three attributes: the degree of amplitude modulation, the frequency of the modulation, and the center frequency of the sound. The computation of roughness could be performed according to two main models: curve mapping or auditory mapping. In the first category roughness is derived from the mapping of all frequency intervals or frequency component pairs present in the spectrum of the sound. The roughness is then defined to be equal to the sum of the dissonances generated by each pair of adjacent frequency components. Sethares’s roughness index (Sethares, 2005), for example, follows this computational model. A second class of models is based on auditory modeling, which simulates cochlear mechanical filtering using an array of overlapping band-pass filters (Aures, 1985; Daniel & Weber, 1997). Vassilakis (2001) has offered a computational method that tried to correct previous models for overestimation of the contribution of sound pressure level to roughness, and underestimation of the contribution of the degree of amplitude fluctuation to roughness.
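The curve-mapping approach described above can be illustrated with a short sketch. It is not the MIRtoolbox routine used later in the paper: the constants are assumed from Sethares's published parameterization of the Plomp and Levelt curve, the sum is taken over all pairs of partials (an assumption), and the function names and the amplitude weights in `harmonic_tone` are hypothetical.

```python
import math
from itertools import combinations

# Constants assumed from Sethares's parameterization of the
# Plomp-Levelt dissonance curve (not stated in this paper).
D_STAR = 0.24            # position of maximum dissonance on the scaled axis
S1, S2 = 0.0207, 18.96   # scaling of the curve with the lower frequency
A, B = 3.5, 5.75         # decay rates of the two exponentials


def pair_dissonance(f1, a1, f2, a2):
    """Dissonance contributed by two sinusoidal partials."""
    s = D_STAR / (S1 * min(f1, f2) + S2)
    x = s * abs(f2 - f1)
    return min(a1, a2) * (math.exp(-A * x) - math.exp(-B * x))


def roughness(partials):
    """Sum pairwise dissonance over a list of (frequency, amplitude) partials."""
    return sum(pair_dissonance(f1, a1, f2, a2)
               for (f1, a1), (f2, a2) in combinations(partials, 2))


def harmonic_tone(f0, n_partials=6):
    """Fundamental plus harmonics; linearly decreasing amplitudes (hypothetical)."""
    return [(f0 * k, (n_partials - k + 1) / n_partials)
            for k in range(1, n_partials + 1)]


def dyad_roughness(f0, ratio):
    """Roughness of two harmonic tones whose fundamentals differ by `ratio`."""
    return roughness(harmonic_tone(f0) + harmonic_tone(f0 * ratio))
```

Under these assumptions, a just minor second (16/15) on C3 yields a larger value than a just perfect fifth (3/2), in line with the general account of sensory dissonance given above.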
Hutchinson and Knopoff (1978) formalized the model of Plomp and Levelt (1965) so that it could be applied to musical chords. The results showed that chords with minor thirds have greater roughness than chords with major thirds, and chords with sevenths have greater roughness than chords without sevenths.
Few studies have examined the influence of pitch register on perceived tension. Granot and Eitan (2011) found that lower register (in the range of 73–139 Hz) was strongly associated with higher tension values compared to the higher register (in the range of 247–466 Hz), but only for nonmusicians. On the other hand, Ilie and Thompson (2006) found that low-pitched music was rated less tense. However, their “low” pitch versions (mean frequency = 156.77 Hz) were only approximately four semitones below the “high” versions (mean frequency = 191.28 Hz), whereas in Granot and Eitan (2011) the two registers were two octaves apart. Farbood (2012) examined ascending or descending sequences of chords and found that ascending sequences were clearly related to an increase of perceived tension, while descending sequences were related to a decrease in perceived tension. However, in those sequences the directionality of melodic pitch covaried with register. Therefore, the evidence of the influence of pitch register on perceived tension is inconclusive. In our studies we manipulated pitch register in both Experiments 2 and 3: the “low” pitch musical intervals were 19 semitones apart from the “high” set of musical intervals.
As shown by Fredrickson (1999), having extensive familiarity with music does not greatly affect listeners’ perception of tension, and both musicians and nonmusicians tend to respond similarly in tension rating tasks (Bigand & Parncutt, 1999; Fredrickson, 2000; Fredrickson & Coggiola, 2003; Frego, 1999; Lychner, 1998), although some studies highlighted significant differences between musicians and nonmusicians. Bigand et al. (1996), for example, found that horizontal pitch motion (i.e., melodic structure) was less effective than vertical motion (i.e., changes in harmony, tonal hierarchy, and key region) in influencing the perception of tension in musicians. These results are in line with those of Parncutt (1989) who found that musicians are generally less sensitive to melodic effects and more sensitive to harmonic effects than nonmusicians. Intersubject agreement tends to be higher among musicians than nonmusicians (Bigand & Parncutt, 1999; Krumhansl, 1996). Judgements of tension also tend to be consistent for repeated trials. For example, Bigand and Parncutt (1999) noted that tension ratings were similar from the first to the fourth hearing of an excerpt.
Previous literature on the perception of musical tension has mainly focused on musical excerpts and chord sequences, neglecting more basic musical features such as musical intervals. We think that an analysis of perceived tension induced by musical intervals alone could better clarify the role of sensorial features such as consonance/dissonance, roughness, and brightness in comparison to more high-level musical features such as tonal hierarchy, melodic contour, or dynamic expression in the perception of tension. Musical intervals, either melodic or harmonic, are the basic units of every musical composition, and have a profound impact on the expressive function of music (Costa, Fine, & Ricci Bitti, 2004; Costa, Ricci Bitti, & Bonfiglioli, 2000).
Musical tension was evaluated with two cross-modal matching procedures that were then compared. In one procedure participants had to match perceived tension in musical stimuli with the muscular tension and the pulling angular movement they had to apply to a lever connected to a spring, whereas in the second procedure participants had to match perceived tension with a horizontal visual analog scale. Since the pioneering work by Nielsen (1983), only two studies have employed a proprioceptive system based on force feedback for the evaluation of auditory sensations (e.g., loudness) (Susini & McAdams, 2000; Susini, McAdams, & Smith, 2002). We suspected that the mapping of perceived tension over muscular tension, using a lever with a wide angle of rotation (60°), would lead to more accurate ratings since the two dimensions (i.e., perceived tension and muscular tension) shared the same core concept of tension. Furthermore, a cross-modal matching procedure has the advantage of avoiding biases associated with numerical ratings (Poulton, 1989) and the effects of pitch mapping on the horizontal and vertical space. In fact, with judgments expressed on a vertical or horizontal direction the rating could be affected by the sensory mapping of pitch on the vertical and horizontal space dimensions. Pitch has in fact a main space mapping on the vertical space (Bonetti & Costa, 2018; Evans & Treisman, 2010; Rusconi, Kwan, Giordano, Umiltà, & Butterworth, 2006), and secondary mapping on the horizontal dimension (high pitch-right, low pitch-left) in musicians (Rusconi et al., 2006).
The main aim of the three experiments presented in this paper was to investigate how perceived tension, perceived movement, and pleasantness varied across musical intervals and standard noises and how they were modulated by pitch register, using two cross-modal matching psychophysical procedures. In the first experiment, we performed a psychophysical calibration of a proprioceptive device used for the subsequent ratings of perceived tension. We determined the power function relating the physical force to the apparent force using a ratio production task (Stevens, 1959; Susini & McAdams, 2000) in which participants had to double or to halve a specific initial tension that was varied for each trial.
In the second and third experiments, two cross-modal matching procedures were applied to the assessment of perceived tension, perceived movement, and pleasantness in musical (harmonic) intervals and five standard noises. We considered all musical intervals within the octave, including the unison (13 intervals) and brown, pink, white, blue, and purple noises. Including standard noises gave us the opportunity to test on a wider psychoacoustic scale the role of roughness in the perception of tension, movement, and valence. If the perception of tension is mainly due to roughness, then standard noises should be perceived as extremely tense. In standard noises, in fact, many frequencies that fall within critical bandwidths are combined, giving rise to a mixing of many beating sounds. We also manipulated pitch register, comparing a set of intervals in a low-pitch register with a set of analogous intervals transposed to a high-pitch register. The timbre was a combination of fundamental and five harmonics with linearly decreasing amplitude. In the second experiment participants had to assess the perception of tension and movement with the proprioceptive device, whereas valence (pleasantness-unpleasantness) was rated with a visual analog scale. We chose the visual analog scale for valence because we could set a neutral-central point, whereas in the proprioceptive device the scale must be unipolar with a null point and a maximum point. The third experiment mirrored exactly the procedure used in the second experiment with the only exception of the cross-modal assessing device, which was a visual analog scale for all the three dependent variables: perceived tension, perceived movement, and valence.
In order to design a cross-modal matching task between perceived tension, perceived movement in musical stimuli, and muscular tension/angular movement, we developed a proprioceptive device consisting of a long lever (85 cm) with an angular displacement of 60° and a pulling range of 0 to 33.1 N, as shown in Figure 1. We decided to use a long lever with a high angular excursion to maximize the rating range. The first experiment aimed to provide a psychophysical validation of the proprioceptive device, determining the power function linking the physical force (expressed in Newtons) to the apparent force using a ratio production task (Stevens, 1959; Susini & McAdams, 2000). The procedure mirrored the one used by Susini and McAdams (2000) for the rating of loudness.
Eighteen university students took part in the experiment (8 females, M_age = 24.78 years, SD = 6.82). All participants were right-handed as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971). The students participated on a voluntary basis.
The proprioceptive device (Figure 1) consisted of a vertical lever (85 cm) rotating around a fulcrum. A harmonic-steel spring was connected to the lower end of the lever, 16 cm from the fulcrum, and joined horizontally to the metallic chassis. When the upper end of the lever was pulled, the spring created a linearly increasing tension. The maximum rotation displacement of the lever was 60°, corresponding to a force of 33.1 N. In order to continuously record the displacement of the lever, a 9V DC supply and a voltage regulator (to maintain a constant voltage) were wired to a linear 10 kΩ potentiometer mounted in the fulcrum so that the output voltage was a linear function of the lever displacement. Voltages were converted into digital values using a data acquisition (DAQ) device (National Instruments USB-6225) and recorded on a PC using a Matlab script. A red tape was applied to the top 10 cm of the lever, marking the position of the handgrip for the participants.
The linearity of the function relating the output voltage to the physical force was assessed by sampling the physical force (N) and the output voltage (V) at 20 discrete, equidistant angles (3° apart), covering the whole 60° displacement. The force was measured with a digital dynamometer (accuracy: ± 0.049 N). The resulting linear regression had an R² of .993. The linear function is reported in Equation 1, where N is the force expressed in Newtons and V is the voltage measured at the potentiometer output.
Written informed consent was obtained from all participants before the beginning of the experiment. Participants were seated comfortably on a chair in front of a computer with the lever on their right. After explaining how the proprioceptive device worked and the exact position of the handgrip, participants were asked to familiarize themselves with the lever. They were instructed about the ratio production task and asked to follow the instructions on the screen during the experimental session. Each session included two conditions (25 trials each, including 5 practice trials) for a total of 50 trials. At the beginning of each trial, participants were required to pull the lever until they heard a continuous beeping sound. Then, starting from that position (position A) they were required to double the tension (“double” condition) or to halve the tension (“halve” condition). Once they reached the target position (position B) they had to press the spacebar on a keyboard and move the lever to the initial rest position, waiting for the following trial. A five-second pause after each trial ensured muscular rest. The order of the two conditions (“double” and “halve”) was counterbalanced across participants. Position A was randomly assigned in each trial (tolerance ± 1.2°), with a specific restriction in the “double” condition. In this case, in order to avoid a ceiling effect, position A randomly varied between 0° and 27° (corresponding to 45% of the overall angle of displacement).
Voltage values recorded during the ratio production task were converted into force values (expressed in N) using Equation 1. The constant k and the exponent a for the power function were estimated using the Curve Estimation function in SPSS. The curve estimation was performed separately for the “double” and “halve” conditions. The force values (N) that matched positions A were doubled in the “double” condition and halved in the “halve” condition, then these values were regressed with the apparent forces that matched positions B. A general power function was then obtained by averaging the values of the constant k and the exponent a for the two conditions.
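The estimation step can be sketched as a least-squares fit on log-log axes, followed by averaging the two conditions. This is only an illustrative stand-in for the SPSS Curve Estimation procedure; the function names are hypothetical, and the data below are synthetic and noise-free, not the values recorded in the experiment.

```python
import numpy as np


def fit_power_function(x, y):
    """Fit y = k * x**a by linear least squares on log-transformed data."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(np.exp(intercept)), float(slope)   # (k, a)


def average_fits(fit_double, fit_halve):
    """Average the constant k and the exponent a across the two conditions."""
    k = (fit_double[0] + fit_halve[0]) / 2
    a = (fit_double[1] + fit_halve[1]) / 2
    return k, a


# Synthetic illustration only (NOT the study's data).
forces = np.linspace(1.0, 30.0, 20)
fit_double = fit_power_function(forces, 2.0 * forces ** 1.5)
fit_halve = fit_power_function(forces, 1.0 * forces ** 0.5)
k_mean, a_mean = average_fits(fit_double, fit_halve)
```

Fitting in log-log space turns the power function into a straight line whose slope is the exponent and whose intercept is the logarithm of the constant, which is why a linear fit is sufficient here.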
The mean exponent of the power function was 1.03, while the mean constant was 1.18. Thus, the resulting proprioceptive power equation is reported in Equation 2, $\psi = 1.18\,\varphi^{1.03}$, where $\psi$ designates the apparent tension and $\varphi$ the physical force. Since the exponent is greater than 1, the proprioceptive tension sensation was positively accelerated as a function of the magnitude of physical force.
Figure 2 shows the power functions of the two experimental conditions (“Double” and “Halve”) and the general power function resulting from the average of the two conditions.
In this experiment we offered a psychophysical calibration of the cross-modal proprioceptive device that was used in Experiment 2 for the evaluation of perceived tension and perceived movement to musical stimuli. The calibration followed a procedure of ratio production in which participants had to double or halve a given pulling force applied to the lever (Stevens, 1958). The power function relating physical and perceived force showed a general tendency to overestimate the force applied to the proprioceptive device, with a greater overestimation in the “double” condition compared to the “halve” condition. This effect increased as the physical force increased.
Previous studies that have investigated the power function for muscular tension are not unanimous in showing a specific exponent. Stevens (1989), for example, cited different experiments that measured psychophysical functions for isometric force with exponents that ranged from 1.5 to 1.8. Stevens (1975) measured the apparent muscular force exerted by a participant on the handle of a dynamometer using different methods of direct judgments obtaining a power law with an exponent of 1.7. In another study, a force was applied to the palm of the hand, yielding an exponent of 1.1 (Stevens, 1960). Susini and McAdams (2000) validated a proprioceptive device in which both the force and the angular displacement varied obtaining an exponent of 1.77. To the contrary, Van Doren (1996) found exponents between 0.6 and 0.8 in a halving and doubling procedure for the assessment of isometric force. The differences highlighted in previous literature could be attributed to the high variability in the methods and procedures for eliciting muscular force and to differences in the scaling techniques (Poulton, 1989).
Experiments 2 and 3
In Experiments 2 and 3 we applied two cross-modal procedures for studying the perception of tension, movement, and pleasantness/unpleasantness of harmonic musical intervals and standard noises. The device described in Experiment 1 was used for the assessment of the perceived tension and movement of musical stimuli in Experiment 2, whereas in Experiment 3 the same stimuli were evaluated using a visual analog scale. In both studies stimuli were musical intervals (all the musical intervals from the unison to the octave), and five calibrated noises: white, purple, blue, pink, and brown noise. They differ in their spectrum and emphasis on low-pitch or high-pitch frequencies.
Pitch register of the musical intervals was manipulated within participants in two levels: high-pitch register and low-pitch register. The interval sets in the two registers differed by an interval of 19 semitones (one octave and a fifth). We introduced pitch register as an independent variable because few studies have examined pitch register explicitly in relation to perceived tension (Farbood, 2012; Granot & Eitan, 2011; Ilie & Thompson, 2006).
Perceived tension, movement, and valence were also compared with the level of roughness of musical intervals and noises, computed according to Sethares (2005).
Experiment 2 Participants
Twenty-five university students (17 females, M_age = 25.47 years, SD = 6.82, and 8 males, M_age = 24.50 years, SD = 1.87) participated in the experiment. None of the participants were professional musicians. The distribution of years of music study or musical instrument practice between participants was: 0 years = 18; 1 year = 2; 3 years = 2; 5 years = 1; 6 years = 1.
Experiment 3 Participants
Twenty university students (5 females, M_age = 29.83 years, SD = 10.03, and 15 males, M_age = 27.07 years, SD = 7.89) participated in the experiment. The distribution of years of music study or musical instrument practice between participants was: 0 years = 12; 1 year = 1; 2 years = 4; 3 years = 1; 5 years = 1; 10 years = 1.
In both studies, none of the participants had hearing loss (self-reported). Participation was on a voluntary basis and written informed consent was obtained from each participant. Both studies were approved by the University of Bologna research ethics committee.
The proprioceptive device validated in Experiment 1 was used for tension and movement ratings in Experiment 2. The audio output was controlled by a USB Audio/MIDI interface (Roland UA-25). Audio stimuli were delivered over noise-isolating headphones (Sennheiser HD 2.20s). Stimulus presentation and timing were controlled by the E-Prime software. Tension and movement ratings were acquired and recorded through a Matlab script. Synchronization between the E-Prime software and the Matlab acquisition routine was guaranteed by a parallel-port connection between the two PCs. Valence ratings were acquired through a visual analog scale consisting of a horizontal line presented at the center of the screen and a cursor that could be moved along the line with the mouse. The horizontal bar subtended a viewing angle of 11.6°. Unpleasant and Pleasant were placed as anchors at the left and right extremes of the line. Psychtoolbox-3 for Matlab (Brainard, 1997) controlled the presentation of the visual analog scale and managed data recording.
In Experiment 3 the visual analog scale was used for all the ratings (tension, movement, and valence).
Two sets of thirteen musical dyads (i.e., two-note musical intervals) were digitally created using the Csound software. One set comprised the thirteen musical intervals within an octave (from the perfect unison to the perfect octave) using C3 as the lower note (low-register condition); the other set was analogously built using G4 as the lower note (high-register condition). Stimuli in the two conditions were therefore 19 semitones apart. To exclude the influence of beating due to a specific tempered tuning system, intervals were computed using just ratios between the lower and the upper voice (five-limit tuning). The frequency spectrum of each note forming the dyads was computed by adding five linearly decreasing partials to the fundamental frequency according to Formula 3.
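The construction of the dyads can be sketched as follows. This is a hedged illustration, not the Csound code used by the authors: the list of just ratios is one common five-limit choice, the amplitude weights (1, 5/6, ..., 1/6) are a hypothetical reading of "linearly decreasing", and the sampling rate, reference frequencies, linear ramp shape, and function names are all assumptions.

```python
import numpy as np
from fractions import Fraction

# One common five-limit just-intonation set (an assumption; the paper does
# not list the exact ratios that were used).
JUST_RATIOS = {
    "P1": Fraction(1, 1), "m2": Fraction(16, 15), "M2": Fraction(9, 8),
    "m3": Fraction(6, 5), "M3": Fraction(5, 4), "P4": Fraction(4, 3),
    "A4": Fraction(45, 32), "P5": Fraction(3, 2), "m6": Fraction(8, 5),
    "M6": Fraction(5, 3), "m7": Fraction(9, 5), "M7": Fraction(15, 8),
    "P8": Fraction(2, 1),
}

C3_HZ = 130.81   # equal-tempered C3 with A4 = 440 Hz (assumed reference)
G4_HZ = 392.00   # equal-tempered G4 (assumed reference)


def complex_tone(f0, t, n_partials=6):
    """Fundamental plus five harmonics with linearly decreasing amplitude."""
    tone = np.zeros_like(t)
    for k in range(1, n_partials + 1):
        amplitude = (n_partials - k + 1) / n_partials   # hypothetical weights
        tone += amplitude * np.sin(2 * np.pi * k * f0 * t)
    return tone


def dyad(lower_hz, interval, duration=1.0, fs=44100, ramp=0.05):
    """Harmonic dyad with linear onset/offset ramps, peak-normalized."""
    t = np.arange(int(duration * fs)) / fs
    upper_hz = lower_hz * float(JUST_RATIOS[interval])
    signal = complex_tone(lower_hz, t) + complex_tone(upper_hz, t)
    n_ramp = int(ramp * fs)
    envelope = np.ones_like(t)
    envelope[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
    envelope[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
    signal = signal * envelope
    return signal / np.max(np.abs(signal))
```

For example, `dyad(C3_HZ, "P5")` and `dyad(G4_HZ, "P5")` would give the same interval in the low and high registers, 19 semitones apart. Loudness equalization is not covered by this sketch.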
The standard noises included white, purple, blue, pink, and brown noises. In white noise all 20–20,000 Hz frequencies had equal power. Purple noise power density increased 6 dB per octave with increasing frequency (density proportional to f²). Blue noise power density increased 3 dB per octave with increasing frequency (density proportional to f). In purple and blue noises there was a dominance of high-register frequencies. Pink noise power density fell off by 3 dB per octave with increasing frequency (density proportional to 1/f). Its frequency spectrum was linear on a logarithmic scale. In brown noise (also Brownian or red noise) the power density decreased 6 dB per octave with increasing frequency (density proportional to 1/f²). In pink and brown noises there was a dominance of low-register frequencies. The spectra of the five noises, with a linear frequency scale on the abscissa, are reported in Figure 4.
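The five spectral slopes can be reproduced with a simple frequency-domain sketch: Gaussian white noise is generated, and its spectrum is multiplied by a gain whose square follows the stated power density. This is an assumed method, not necessarily how the authors' stimuli were produced; the sampling rate, seed, and function names are hypothetical.

```python
import numpy as np

# Exponent beta of the power density, S(f) proportional to f**beta.
NOISE_EXPONENTS = {"brown": -2, "pink": -1, "white": 0, "blue": 1, "purple": 2}


def colored_noise(color, n_samples=2 ** 16, fs=44100, seed=0):
    """Shape the spectrum of Gaussian white noise within 20-20,000 Hz."""
    beta = NOISE_EXPONENTS[color]
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
    gain = np.zeros_like(freqs)
    band = (freqs >= 20) & (freqs <= 20000)
    gain[band] = freqs[band] ** (beta / 2)   # amplitude gain = sqrt(power gain)
    signal = np.fft.irfft(spectrum * gain, n=n_samples)
    return signal / np.max(np.abs(signal))


def spectral_slope_db_per_octave(signal, fs=44100):
    """Estimate the power-density slope (dB per octave) from 20 Hz to 20 kHz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    band = (freqs >= 20) & (freqs <= 20000)
    slope, _ = np.polyfit(np.log2(freqs[band]), 10 * np.log10(power[band]), 1)
    return float(slope)
```

Because a doubling of frequency multiplies the power density by 2 raised to beta, each unit of beta corresponds to about 3 dB per octave, which matches the 3 and 6 dB per octave figures given in the text.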
Stimuli were stationary sounds with a rise- and decay-time of 50 ms; their loudness was equalized to 23.88 sones with the Matlab Genesis Loudness Toolbox (Genesis, 2009), applying the ANSI S3.4-2007 procedure (American National Standards Institute, 2007). Stimuli were presented at a sound level of 68.5 dBA (measured with a DeltaOhm HD2010 sound level meter set to the A-weighting curve). All the stimuli (musical intervals and noises) are available in the Supplementary Materials accompanying this paper at mp.ucpress.edu.
Roughness values according to Sethares’ model were computed using the MIRtoolbox for Matlab (Lartillot, Toiviainen, & Eerola, 2008). The levels of roughness for musical intervals and standard noises used in Experiments 2 and 3 are shown in Figure 5. For informational purposes we also show the brightness levels for the same stimuli (Figure 6). Brightness for each stimulus was computed with MIRtoolbox 1.7.2 (Lartillot et al., 2008). Brightness is related to the amount of energy that exceeds a specific frequency threshold, which in our case was set at 1,500 Hz. It was expressed as a proportion ranging from 0 to 1. The correlation between roughness and brightness was .47 (p < .001). The mean roughness level for noises was higher than for musical intervals (M_noise = 5952.99, M_intervals = 725.98). The difference was significant, F(1, 29) = 10.08, p = .003, η² = 0.26.
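The brightness measure described above can be sketched as the share of spectral energy above the cutoff. This is a simplified illustration, not the MIRtoolbox routine: the toolbox's windowing and normalization may differ, and the function name is hypothetical.

```python
import numpy as np


def brightness(signal, fs, cutoff_hz=1500.0):
    """Proportion (0 to 1) of spectral energy above cutoff_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return float(power[freqs > cutoff_hz].sum() / power.sum())


# Illustrative check with pure tones on either side of the 1,500 Hz cutoff.
fs = 44100
t = np.arange(fs) / fs
low_tone = np.sin(2 * np.pi * 440 * t)
high_tone = np.sin(2 * np.pi * 3000 * t)
mixed = low_tone + high_tone
```

With these inputs, the 440 Hz tone yields a value near 0, the 3,000 Hz tone a value near 1, and the equal-amplitude mixture a value near 0.5.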
Considering musical intervals only, roughness was higher for low-pitch intervals (M = 1352.73) than for high-pitch intervals (M = 99.23), F(1, 24) = 28.96, p < .001, = 0.54. Roughness parameters were not significantly different between consonant intervals (M = 904.99), imperfect consonant intervals, thirds, and sixths (M = 1006.93), and dissonant intervals (M = 763.64): p = .68.
A brief questionnaire assessed self-reported hearing problems and the years of music practice (singing or playing an instrument) and/or music study. Ratings for tension, movement, and valence were collected in three separate blocks, whose order was randomized for each participant. Each block consisted of 36 trials in which the 13 musical intervals and the five standard noises were presented twice. The order of the trials was randomized within each block. Each stimulus had a duration of 1 s, and the participant could listen to the sound again by pressing the R key on the keyboard. The interstimulus interval, computed from the emission of the response until the onset of the next stimulus, was 3.5 s. The response was not time limited.
For the ratings with the proprioceptive device, the voltages recorded as output were converted to pulling force (in newtons) through Equation 1 obtained in Experiment 1. The pulling force was then converted to perceived force with Equation 2 obtained in Experiment 1. The maximum perceived force, corresponding to pulling the lever to its upper limit, was 43.65. For the ratings on the visual analog scale, the chosen point was converted to a percentage of the line length. Evaluations were therefore expressed on a 0–100 scale, with 0 the extreme left and 100 the extreme right of the horizontal line.
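The block and trial structure can be sketched as follows. This is only an illustration of the randomization described above, not the software used in the experiments; the stimulus labels and the function name are assumptions chosen for the example.

```python
import random

# Illustrative labels for the 13 intervals and 5 noises (18 stimuli in total).
INTERVALS = ["P0", "m2", "M2", "m3", "M3", "P4", "A4",
             "P5", "m6", "M6", "m7", "M7", "P8"]
NOISES = ["brown", "pink", "white", "blue", "purple"]
ATTRIBUTES = ["tension", "movement", "valence"]


def build_session(seed=None):
    """Return a list of (attribute, trials) blocks for one participant.

    Block order is shuffled per participant; within each block every
    stimulus appears twice in random order (18 x 2 = 36 trials).
    """
    rng = random.Random(seed)
    blocks = ATTRIBUTES[:]
    rng.shuffle(blocks)
    session = []
    for attribute in blocks:
        trials = (INTERVALS + NOISES) * 2
        rng.shuffle(trials)
        session.append((attribute, trials))
    return session


if __name__ == "__main__":
    for attribute, trials in build_session(seed=1):
        print(attribute, len(trials), trials[:5])
```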
The data were analyzed considering these independent variables: (a) crossmodal procedure with two levels (proprioceptive device and visual analog scale); (b) stimulus with 18 levels (13 musical intervals and 5 noises); (c) register with two levels (low and high, only for musical intervals); (d) consonance with three levels (perfect consonances: P0, P4, P5, P8; imperfect consonances: m3, M3, m6, M6; dissonances: m2, M2, A4, m7, M7). Dependent variables were the ratings of tension, movement, and valence. Roughness and years of music studies/instrumental practice were entered as covariates. Pairwise comparisons were performed with the Tukey-HSD test.
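The factor coding for the musical intervals can be made explicit with a small lookup structure. This is a sketch of the design as described above (13 intervals crossed with two registers, grouped into three consonance levels); the names used are illustrative and do not come from the authors' analysis scripts.

```python
# Consonance levels and their member intervals, as listed in the text.
CONSONANCE = {
    "perfect consonance": ["P0", "P4", "P5", "P8"],
    "imperfect consonance": ["m3", "M3", "m6", "M6"],
    "dissonance": ["m2", "M2", "A4", "m7", "M7"],
}
REGISTERS = ["low", "high"]


def consonance_level(interval):
    """Return the consonance level of an interval label."""
    for level, members in CONSONANCE.items():
        if interval in members:
            return level
    raise ValueError(f"unknown interval: {interval}")


def interval_design():
    """Cross every interval with both registers (13 x 2 = 26 cells)."""
    return [
        {"interval": interval, "register": register, "consonance": level}
        for register in REGISTERS
        for level, members in CONSONANCE.items()
        for interval in members
    ]


if __name__ == "__main__":
    print(consonance_level("A4"), len(interval_design()))
```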
In order to compare the data collected with the proprioceptive device with those collected with the visual analog scale, we linearly remapped the proprioceptive-device data from the range 0–43.65 to the 0–100 range of the visual analog scale. We decided not to z-transform the data because their distribution in both Experiments 2 and 3 was not normal, as shown by Shapiro-Wilk tests. Figures 7, 8, and 9 show the distributions of the valence, tension, and movement ratings, respectively, for both Experiments 2 and 3. For valence, skewness was relatively small (−.01 and .005 for Experiment 2 and Experiment 3, respectively) and the levels of kurtosis were negative (−.19 and −.55 for Experiment 2 and Experiment 3) (Figure 7). For tension ratings, the distribution obtained with the proprioceptive device was positively skewed (.42), whereas the distribution obtained with the visual analog scale was negatively skewed (−.22); kurtosis was negative in both cases (−.48 and −.83 when using the proprioceptive device and the visual analog scale, respectively) (Figure 8). For movement ratings, both distributions were positively skewed (.65 when using the proprioceptive device and .18 when using the visual analog scale); the distribution was platykurtic (negative kurtosis, −.83) when using the visual analog scale and leptokurtic (positive kurtosis, .66) when using the proprioceptive device (Figure 9).
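The linear remapping and the shape statistics reported above can be sketched in a few lines. The exact estimators of skewness and kurtosis used by the authors are not stated, so the moment-based definitions below (with excess kurtosis, so that a normal distribution scores 0) are an assumption, as are the function names.

```python
from statistics import fmean

MAX_PERCEIVED_FORCE = 43.65  # upper limit of the proprioceptive device


def remap_to_vas(perceived_force, max_force=MAX_PERCEIVED_FORCE):
    """Linearly remap a value from 0..max_force to the 0..100 scale."""
    return 100.0 * perceived_force / max_force


def _central_moment(values, order):
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (assumed estimator)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (assumed estimator)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


if __name__ == "__main__":
    forces = [5.0, 10.0, 12.5, 20.0, 43.65]   # made-up values for illustration
    ratings = [remap_to_vas(f) for f in forces]
    print(ratings, skewness(ratings), excess_kurtosis(ratings))
```

Because the remapping is linear, it changes the scale of the ratings but leaves skewness and kurtosis unchanged.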
The data were analyzed with a linear mixed-effects model (Laird & Ware, 1982; Pinheiro & Bates, 2000), which is more robust in dealing with repeated measures designs with covariates (Wallace & Green, 2002). The assumption of normality of residuals was tested by visual inspection of the Q-Q plot. For each dependent variable (valence, tension, and movement) we performed two linear mixed model analyses: the first including all stimuli (musical intervals and noises) and the second including only musical intervals. The reason was that some attributes (high vs. low register and level of consonance) pertained only to musical intervals and not to noises. In both analyses participant was entered as a random effect. In the analysis involving all stimuli, the type of stimulus and the crossmodal procedure (visual analog scale vs. proprioceptive device) were entered as fixed effects, and roughness and years of music study/instrumental practice were entered as covariates. In the analysis involving musical intervals, the level of consonance (perfect consonance, imperfect consonance, dissonance), the register (high, low), and the crossmodal procedure were entered as fixed effects, whereas roughness was entered as a covariate. Each fixed effect was inserted sequentially in the model in order to test whether it significantly improved the fit of the model. Each model was fit by maximizing the log-likelihood and assessed using the Akaike information criterion (AIC). For valence, in order to increase the legibility of the results, the data, originally expressed in the range 0–100, were converted to the range −50 to +50, since the middle point of the visual analog scale was indicated as a neutral point. All statistical computations were performed using R (version 3.6.1).
Pearson’s correlations between the dependent variables and the covariate roughness are reported in Table 1. Valence was negatively correlated with tension and roughness. The correlation with movement was very low but significant. Tension was negatively correlated with valence and positively correlated with movement. The correlation between tension and roughness was not significant. Movement was slightly negatively correlated with valence and positively correlated with tension and roughness.
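Two of the computational steps above lend themselves to a brief sketch: the recentering of valence ratings and the AIC used to assess each fitted model. The AIC formula (2k − 2 log L) is standard. The log-likelihood values in the usage example are invented for illustration and are not results from this study, and the function names are hypothetical. The sketch does not fit the mixed models themselves, which were fit in R.

```python
def recenter_valence(rating_0_100):
    """Shift a 0..100 valence rating to -50..+50 (0 = neutral midpoint)."""
    return rating_0_100 - 50.0


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def best_by_aic(models):
    """Return the label of the model with the lowest AIC.

    `models` is a list of (label, log_likelihood, n_parameters) tuples.
    """
    return min(models, key=lambda m: aic(m[1], m[2]))[0]


if __name__ == "__main__":
    # Invented log-likelihoods for three nested models, for illustration only.
    candidates = [
        ("intercept only", -5200.0, 3),
        ("+ stimulus", -5050.0, 20),
        ("+ stimulus + procedure", -5049.5, 21),
    ]
    for label, ll, k in candidates:
        print(label, aic(ll, k))
    print("lowest AIC:", best_by_aic(candidates))
```

In this made-up example the last term raises the log-likelihood by only 0.5 while costing one parameter, so its AIC is higher than that of the preceding model.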
Valence: Musical intervals and noises
Table 2 shows the results of the comparison between the incremental linear mixed models that tested the fixed effects on valence ratings. The significant predictors were stimulus and the covariate roughness. The parameters and coefficients of the final linear model are presented in Table 3. The Q-Q plot of the residuals is shown in Figure 10.
Note: The fixed factors were sequentially inserted in the model. Participant was considered as random factor.
Estimated marginal means and 95% confidence intervals for valence ratings of the stimuli used in Experiments 2 and 3 are reported in Figure 11.
Valence: Musical intervals only
The linear mixed model testing the effects of consonance, register, roughness, and crossmodal procedure on valence ratings of musical intervals showed significant effects of consonance and register, as reported in Table 4, which shows the estimated parameters of the model.
Estimated means and 95% confidence intervals for the three levels of consonance as a function of register are reported in Figure 12.
Estimated marginal means (EMM) for the three consonance levels were: perfect consonance = −2.43 (SE = 1.62); imperfect consonance = −5.79 (SE = 1.62); dissonance = −11.41 (SE = 1.59). Tukey HSD tests showed that all the contrasts between the three levels were significant. High-register intervals were evaluated as more unpleasant (EMM = −11.81, SE = 1.61) than low-register intervals (EMM = −1.28, SE = 1.61), z = 10.92, p < .001.
Tension: Musical intervals and noises
Table 5 shows the results of the linear mixed model analysis applied to tension ratings. We sequentially added each fixed factor, testing each model n against model n − 1. The factors that proved significant were stimulus (musical intervals and noises) and crossmodal procedure. The final model therefore included stimulus and crossmodal procedure; the estimated parameters are shown in Table 6, whereas Figure 13 shows the Q-Q plot of the residuals.
Estimated marginal means and 95% confidence intervals for tension ratings as a function of the crossmodal procedure are reported in Figure 14.
Tension: Musical intervals only
The linear mixed model for tension rating of musical intervals including level of consonance, register, roughness, and crossmodal procedure as predictors showed that all the fixed effects and the covariate were significant, as shown in Table 7.
Tension was evaluated highest for dissonant intervals (EMM = 56.61, SE = 1.88), intermediate for imperfect consonances (EMM = 46.35, SE = 1.93), and lowest for consonant intervals (EMM = 43.85, SE = 1.93). Tukey HSD tests showed that all the contrasts between the three levels were significant.
Tension was evaluated higher for high-register intervals (EMM = 54.93, SE = 1.91) than for low-register intervals (EMM = 42.86, SE = 1.91): z = −8.67, p < .001. With reference to the crossmodal procedure, tension was rated higher with the visual analog scale (EMM = 53.75, SE = 2.51) than with the proprioceptive device (EMM = 44.11, SE = 2.51): z = −2.71, p = .006.
Movement: Musical intervals and noises
Table 8 shows the results of the linear mixed model analysis for movement perception. We sequentially added each fixed factor, testing each model n against model n − 1. The factors that proved significant were stimulus (musical intervals and noises) and roughness. The final model therefore included stimulus and roughness. The estimated parameters are shown in Table 9, whereas Figure 17 shows the Q-Q plot of the residuals.
The estimated marginal means for movement rating are reported in Figure 18.
Movement: Musical intervals only
The linear mixed model for movement rating of musical intervals including level of consonance, register, roughness, and crossmodal procedure as predictors showed that the fixed effects of consonance level and register were significant, as shown in Table 10.
Estimated marginal means and 95% confidence intervals for the three levels of consonance as a function of register are shown in Figure 19.
The perception of movement was higher for dissonant intervals (EMM = 39.78, SE = 1.95) and imperfect consonances (EMM = 39.03, SE = 1.99) in comparison to consonant intervals (EMM = 34.81, SE = 1.99). Tukey HSD tests showed that the only contrast that was not significant was the difference between dissonant and imperfect consonant intervals. The perception of movement was significantly higher for high-register intervals (EMM = 41.34, SE = 1.98) than for low-register intervals (EMM = 34.31, SE = 1.98).
This paper had five main goals: 1) to investigate the perception of tension, movement, and valence in musical intervals and specific standard noises; 2) to test the influence of acoustical roughness in the perception of tension, movement, and valence; 3) to assess the influence of register on the three attributes with reference to musical intervals only; 4) to compare tension, movement, and valence between “voiced” musical intervals and “unvoiced” standard noises that differed in their spectral emphasis on low or high frequencies; 5) to compare two crossmodal methods for the assessment of the perception of tension and movement: one that mapped tension and movement along a visual analog scale and the other that relied on a proprioceptive device in which tension and movement were mapped with muscular force and pulling angle.
The three attributes of valence, tension, and movement, as expected, were not completely independent but showed a fair degree of independence. The strongest relation was between valence and tension, and it was inverse: tension was mainly experienced with sounds perceived as unpleasant. This is an interesting aspect from the perspective of emotion theory and of research on music and emotion. Some authors have associated tension with affective arousal (Krumhansl, 1997; Trolio, 1976). Rozin (2004), for example, measured moment-to-moment “affective intensity,” and Huron's model of expectation (2006) includes an arousal-related tension component. The association of tension with arousal tends to be further promoted by the common use of a 2D arousal-valence space for collecting data on emotional response to music. Other authors proposed to differentiate between tension and energy arousal. Specifically, Thayer (1989) reconceptualized activation as varying along two dimensions: energetic arousal (awake-tired) and tense arousal (tense-calm), and this distinction was further supported by Ilie and Thompson (2006) and Schimmack and Rainer (2002). Eerola and Vuoskoski (2011) tested a 3D model for emotion in music that included valence, tension, and energy as the main dimensions. In our study tension and valence were not completely orthogonal factors, since tension was mainly associated with dissonant intervals. The highest linear coefficients in the tension model were found for the minor second, the augmented fourth, and the major seventh, which are the major dissonant intervals. For standard noises we recorded a similar inverse relation between valence and tension that was mediated by the spectrum-related energy content.
The colored noises that emphasized high frequencies, namely the purple and blue noises, were evaluated as extremely negative in valence in comparison to the brown and pink noises, whose spectral density is more concentrated on low frequencies. Pink and brown noises were also evaluated as less tense than blue and purple noises. Interestingly, while purple and blue noises were evaluated as more unpleasant than all dissonant musical intervals, the pattern for tension was not completely symmetrical, because their perceived tension was lower than that of the dissonant intervals (minor second, augmented fourth, and major seventh).
Perceived tension was strongly related to pitch register both in musical intervals and noises. Perceived tension was lower in the low-register set of musical intervals and for the colored noises that emphasized low frequencies (pink and brown noises), whose beta coefficients in the linear model of tension were considerably high. This is in line with Ilie and Thompson (2006) who, as in our study, found that low-pitched music was rated as more pleasant and less tense, and with the results of Farbood (2012), who found that sequences of descending chords were mostly associated with a decrease in tension and sequences of ascending chords were associated with an increase in tension. The use of more structured stimuli in these other studies implied that pitch register was not isolated from melodic contour. Low-pitch stimuli were also often the result of a descending melodic line and, since pitch is strongly mapped on a vertical space (Bonetti & Costa, 2018), the decrease in tension could have been the result of a perceived descending melodic line. Since in our studies the stimuli were stationary and composed of steady musical intervals and noises, and the effect of pitch register was nonetheless very pronounced, we can conclude that this factor is one of the best predictors of perceived tension. This effect was also found by McAdams, Douglas, and Vempala (2017), who investigated the perception of affective qualities of musical instrument sounds across pitch registers. They found that higher tension was carried by brighter sounds.
This association between high pitch and an increase in perceived tension could be explained in an ecological-evolutionary framework. From this perspective, musical tension is affected by those auditory features that are associated with tension in “natural” extramusical contexts. An increase in pitch height in vocal emissions is a signal of distress, fear, anger, and isolation in many species. Most alarm calls that are used by social animals to alert conspecifics to the presence of a predator are characterized by a significant increase in pitch in comparison to normal vocalizations (Fallow, Gardner, & Magrath, 2011). High-pitch vocalizations are reliably perceived as an indicator of distress in infant cries (Schuetze & Zeskind, 2001; Soltis, 2004; Zeskind & Marshall, 1988), and an increase in the pitch of the voice is frequently associated with the experience of distress and of tense emotions such as fear and anger (Sobin & Alpert, 1999).
Starting with Helmholtz (1877/1954), the beating of adjacent partials has been one of the main explanatory factors for the perception of dissonance. Plomp and Levelt (1965) further developed this theory by introducing the concepts of critical bandwidth and of sensory consonance (Terhardt, 1978), thus distinguishing the consonance due to basic physical and physiological factors from the consonance in musical situations, which is influenced by higher-order factors. The term roughness was then preferred over the expression sensory consonance because it could also be applied to amplitude-modulated tones (Zwicker & Fastl, 1990). In our studies we have further tested the role of roughness, as measured according to the model of Sethares (2005), in the perception of valence, tension, and movement, introducing a comparison between musical intervals and standard noises. A standard noise, by definition, is composed of all frequencies in the range of acoustical perception (typically 20–20,000 Hz). The energetic content of each frequency is regulated by a mathematical function, and in our case we chose five colored noises that differed in their emphasis on low- and high-pitch frequencies. Specifically, in brown and pink noises there was an emphasis on low-register frequencies; in blue and purple noises there was an emphasis on high-register frequencies. In white noise the energy was flat over the whole frequency range. The roughness level in colored noises is strongly influenced by the content of high-pitch frequencies, reaching a peak in purple noise, whose roughness level was 12.20 times greater than that of the minor second built over C3, and 37.98 times greater than that of the minor second built over G4. Nevertheless, the results showed that the rated tension for purple noise was lower than that attributed to the dissonant musical intervals of minor second, augmented fourth, and major seventh.
The results of the linear mixed model that included both musical intervals and noises showed that roughness was not a significant predictor of perceived tension, while it was a significant predictor in the case of valence. It seems, therefore, that the relation between roughness and tension is not as straightforward as in the case of pleasantness/unpleasantness. Low-register musical intervals, for example, had a significantly higher level of roughness in comparison to high-register intervals, but the perception of tension went in the opposite direction, with high-register intervals perceived as more tense than low-register intervals.
Considering musical intervals, tension was proportional to the level of dissonance of the interval. The intervals that were perceived as most tense were the seconds, the sevenths, and the augmented fourth. The rank order of tension perception in intervals strictly mirrored the rank order of intervals by consonance and dissonance in the classic study of Malmberg (1918). A similar effect, but applied to single chords, was also found by Lahdelma and Eerola (2016a, 2016b). They found that perceived tension was very high for the Neapolitan pentachord, followed by the dominant seventh sharp eleventh chord. The lowest perceived tension was found for the major triad, which was the most consonant chord in their study. Tension for minor chords was higher than tension for major chords. Tension was also affected by the position of the chord, increasing linearly from the root position to the first and second inversion. In Lahdelma and Eerola (2016b), which also focused on chords, they found a high correlation between tension and energy (.50). Augmented and diminished chords received the highest ratings for tension, followed by sevenths, and minor and major chords.
The comparison of the two crossmodal procedures used for the assessment of tension and movement in Experiments 2 and 3 showed a significant difference that was limited to the evaluation of tension. Specifically, the use of the proprioceptive device led systematically to tension evaluations that were lower than those obtained with the visual analog scale. The distribution obtained with the proprioceptive device was positively skewed, while the distribution obtained with the visual analog scale was negatively skewed. Since in the first study the proprioceptive device was tested for a linear psychophysical relation between applied force and perceived force, the distribution that we obtained in Experiments 2 and 3 could not be attributed to an intrinsic nonlinearity in force perception along the range of the device. The effect could be due to the fact that with the proprioceptive device the participant had to match perceived tension with a muscular tension. The response implied more effort and more direct feedback on the increase in the response level, whereas with the visual analog scale the mapping of tension on the horizontal line could have been less steep.
The crossmodal procedure did not alter the pattern of perceived tension as a function of the stimuli presented in Experiments 2 and 3. The pattern remained substantially the same and was simply shifted between the two procedures. The results cannot favor one procedure over the other, but are important in showing that tension ratings are highly susceptible to the methodology used for collecting the data, and that a procedure of standardization would be preferable to relying on absolute values.
Roughness was a strong predictor in the perception of movement, but only for standard noises. Purple and blue noises, which shared a high spectral content of high frequencies, were perceived as inducing a higher sense of movement in comparison to white, pink, and brown noises. For musical intervals, the linear mixed model analysis showed that the strongest predictors of movement were pitch register (movement perception was higher for high-register intervals) and the level of dissonance of the interval.
In conclusion, although related by a certain degree of commonality, the attributes of valence, tension, and movement applied to musical intervals and noises also appear to have distinctive properties that cannot be reduced to a single core explained by roughness level. For example, low-register intervals had a higher level of roughness, but they were perceived as more pleasant, less tense, and inducing a lower sense of movement than high-register intervals. Roughness was a significant predictor of valence and movement, but not of tension, when standard noises were included in the analysis. What was shared between the attributes of valence, tension, and movement was the role of register. Its influence was consistent across the three domains. The intervals and noises that were perceived as more unpleasant were also perceived as more tense and as inducing a higher sense of movement. This has interesting implications, especially for the status of tension in the theory of emotions, and of emotions in music in particular. Specifically, these results question the complete orthogonality of tension, conceptualized as a component of arousal, in comparison to valence. Although it is certainly possible to conceive states of high tension in conjunction with positive valence due to higher-order musical elements (for example, a crescendo and accelerando of a major mode melody), in the basic vocabulary of musical intervals and standard noises tension is solely associated with the experience of negative valence, while positive valence is equated with a perception of relaxation and steadiness. This paper has also shown how the attributes of valence, tension, and movement applied to musical intervals can also be applied to unvoiced acoustical stimuli such as noises. Standard noises are interesting research tools because physically they can be considered as supernormal stimuli of dissonant intervals.
While this held true in the case of movement perception, in which their evaluation exceeded that of dissonant musical intervals, in the case of tension and valence some distinctive and interesting properties emerged. Brown noise, for example, was evaluated as the calmest stimulus, and as far less tense than the perfect consonant intervals of the unison and the octave. Similarly, for valence, brown noise had the highest ratings for pleasantness. The combined study of musical intervals and noises could contribute significantly to shedding light on the processes that underlie the attribution of psychological qualities to sounds.
We would like to thank Julien Giovani for his help in collecting the data for Experiments 2 and 3.