Concertgoers, critics, teachers, and performers are often called upon to pass judgment on the performances they hear. Research to date has typically focused on the judgments themselves, with very few empirical studies of the processes and decisions that lead to them. This paper reports an investigation of the time-dependent characteristics of performance evaluation. Thirty-three participants listened to five recordings of a Bach Prelude and five of a Chopin Prelude. They rated the quality of each performance continuously, by moving a mouse cursor along a 7-point scale displayed on a computer screen, and also on written scales. The results suggest that the time taken to reach an evaluative decision was typically short (around 15–20 s); that initial and final ratings differed significantly, with ratings tending to improve as the performances progressed; and that the largest revisions of opinion took place within the first minute of the performance.
Much applied research into musical performance requires a method of quantifying differences and changes between performances; for this purpose, researchers have commonly used performance assessment schemes taken from educational contexts. This article considers some conceptual and practical problems with using judgments of performance quality as a research tool. To illustrate some of these, data are reported from a study in which three experienced evaluators watched performances given by students at the Royal College of Music, London, and assessed them according to a marking scheme based on that of the Associated Board of the Royal Schools of Music. Correlations between evaluators were only moderate, and there was some evidence of bias related to the evaluators' own instrumental experience. Strong positive correlations between items on the assessment scheme indicated that evaluators discriminated very little between its categories. Implications for the use of similar assessment systems as dependent measures in empirical work are discussed, and suggestions are made for developing scales that are more useful in such work.