Investigation of the validity of the Angoff standard setting procedure for multiple-choice items

John D Mattar, University of Massachusetts Amherst


Setting passing standards is one of the major challenges in the implementation of valid assessments for high-stakes decision making in testing situations such as licensing and certification. If high-stakes pass-fail decisions are to be made from test scores, the passing standards must be valid for the assessment itself to be valid. Multiple-choice test items continue to play an important role in measurement, and the Angoff (1971) procedure continues to be widely used to set standards on multiple-choice examinations. This study focuses on the internal consistency, or underlying validity, of Angoff standard setting ratings. The Angoff procedure requires judges to estimate the proportion of borderline candidates who would answer each test question correctly. If the judges are successful at estimating the difficulty of items for borderline candidates, that success suggests an underlying validity to the procedure. This study examines this question by evaluating the relationships among Angoff standard setting ratings and actual candidate performance from professional certification tests. For each test, a borderline group of candidates was defined as those near the cutscore. The analyses focus on three aspects of judges' ratings with respect to item difficulties for the borderline group: accuracy, correlation, and variability. The results of this study demonstrate some evidence for the validity of the Angoff standard setting procedure. For two of the three examinations studied, judges were accurate and consistent in rating the difficulty of items for borderline candidates. However, the study also shows that the procedure may be less successful in its application. These results indicate that the procedure can be valid, but that its validity should be checked for each application. Practitioners should not assume that the Angoff method is valid. The results of this study also show some limitations to the procedure even when the overall results are positive. Judges are less successful at rating very difficult or very easy test items. The validity of the Angoff procedure may be enhanced by further study of methods designed to ameliorate those limitations.
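
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's actual analysis: the abstract does not state how wide the borderline band was or which statistics were computed, so the `band` parameter, the function names, and the specific measures (mean error, mean absolute error, Pearson correlation, per-item standard deviation across judges) are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def borderline_p_values(responses, scores, cutscore, band):
    """Proportion correct per item among candidates near the cutscore.

    responses: one list of 0/1 item scores per candidate.
    scores:    total score per candidate.
    band:      assumed half-width of the borderline region (hypothetical).
    """
    group = [r for r, s in zip(responses, scores) if abs(s - cutscore) <= band]
    if not group:
        raise ValueError("no candidates fall within the borderline band")
    n_items = len(group[0])
    return [mean([r[i] for r in group]) for i in range(n_items)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare_ratings(ratings, p_values):
    """Summarize judges' Angoff ratings against borderline-group difficulty.

    ratings:  one list of per-item ratings (proportions) per judge.
    p_values: observed proportion correct per item in the borderline group.
    """
    n_items = len(p_values)
    mean_rating = [mean([j[i] for j in ratings]) for i in range(n_items)]
    errors = [m - p for m, p in zip(mean_rating, p_values)]
    return {
        # Accuracy: signed and absolute gap between ratings and performance.
        "mean_error": mean(errors),
        "mean_abs_error": mean([abs(e) for e in errors]),
        # Correlation: do ratings rank items the way candidates experienced them?
        "correlation": pearson(mean_rating, p_values),
        # Variability: spread of ratings across judges, item by item.
        "judge_sd_per_item": [pstdev([j[i] for j in ratings]) for i in range(n_items)],
    }
```

Under these assumptions, a large `mean_abs_error` or a low `correlation` for a given examination would be the kind of signal that the ratings should not be taken as valid for that application without further checking.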

Subject Area

Educational evaluation

Recommended Citation

Mattar, John D, "Investigation of the validity of the Angoff standard setting procedure for multiple-choice items" (2000). Doctoral Dissertations Available from Proquest. AAI9988820.