The use of raters as a methodological tool to detect significant differences in performances and as a means of evaluating music performance achievement is a well-established practice in music psychology, education, and performance science research. However, psychometric concerns remain about the precision with which raters apply task-specific scoring criteria. A methodology for managing rater quality in rater-mediated assessment has not been systematically developed in the field of music. The purpose of this study was to examine rater precision through an analysis of rating scale category structure across a set of raters and items within the context of large-group music performance assessment, using a Multifaceted Rasch Partial Credit (MFR-PC) Measurement Model. Estimating a separate rating scale parameterization for each rater allows variability in rater judgment to be detected more clearly and improves model-data fit, thereby enhancing the objectivity, fairness, and precision of rating quality in the music assessment process. Expert judges (N = 23) rated a set of four recordings by middle school, high school, collegiate, and professional jazz big bands. A single common expert rater evaluated all 24 jazz ensemble performances. The data suggest that raters vary significantly in severity, items vary significantly in difficulty, and rating scale category structure varies significantly across raters. Implications for the improvement and management of rater quality in music performance assessment are provided.
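For orientation, a common formulation of a many-facet Rasch model with rater-specific (partial credit) category thresholds is sketched below; the notation is illustrative and not drawn from the paper itself, but it shows what "separate parameterization of the rating scale for each rater" means in practice.

\[
\ln\!\left(\frac{P_{nmik}}{P_{nmi(k-1)}}\right) = \theta_n - \lambda_m - \delta_i - \tau_{mk}
\]

Here \(P_{nmik}\) denotes the probability that performance \(n\) receives a rating in category \(k\) from rater \(m\) on item \(i\), \(\theta_n\) is the quality of performance \(n\), \(\lambda_m\) is the severity of rater \(m\), \(\delta_i\) is the difficulty of item \(i\), and \(\tau_{mk}\) is the threshold between categories \(k-1\) and \(k\), estimated separately for each rater rather than held constant across the rater facet.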
