Much applied research into musical performance requires a method of quantifying differences and changes between performances; for this purpose, researchers have commonly used performance assessment schemes taken from educational contexts. This article considers some conceptual and practical problems with using judgments of performance quality as a research tool. To illustrate some of these, data are reported from a study in which three experienced evaluators watched performances given by students at the Royal College of Music, London, and assessed them according to a marking scheme based on that of the Associated Board of the Royal Schools of Music. Correlations between evaluators were only moderate, and there was some evidence that evaluators' judgments were biased according to their own instrumental experience. Strong positive correlations were found between items on the assessment scheme, indicating that evaluators discriminated very little between its categories. Implications for the use of similar assessment systems as dependent measures in empirical work are discussed, and suggestions are made for developing scales with greater utility in such work.

