Standard setting methods for complex licensure examinations

Mary Jean Pitoniak, University of Massachusetts Amherst


As the content and format of educational assessments evolve, the need for valid and workable standard setting methods grows as well. Although there are numerous standard setting methods available for multiple-choice items, there is a much smaller pool of methods from which to choose when constructed-response items or performance assessments are considered. In this study, four standard setting methods were evaluated. Two of the methods were used with the simulation component of a licensing examination, and two were used with the multiple-choice component. The two methods used with the simulations were the Work Classification method and the Analytic method. With the multiple-choice items, the Item Cluster method and Direct Consensus method were employed. The Item Cluster and Direct Consensus methods had each been the subject of research on two previous occasions, and the aims of the current study were to make modifications suggested by earlier findings and to seek replication of trends found earlier. The Work Classification and Analytic methods, while bearing some similarity to existing methods, are seen as new approaches specially configured to reflect the features of the simulations under consideration in the study. The results for each method were evaluated in terms of three sources of validity evidence (procedural, internal, and external), and the methods for each item type were contrasted with each other to assess their relative strengths and weaknesses. For the methods used with the simulations, the Analytic method had a procedural advantage owing to time factors, but panelists felt more positively about the Work Classification method. Internally, interrater reliability for the Analytic method was lower. Externally, the consistency of cut scores between methods was good in two of the three simulations; the larger difference on the third simulation may be explained by other factors.
For the methods used with the multiple-choice items, this study's findings support most of those from earlier research. Procedurally, the Direct Consensus method was more efficient. Internally, there was less consistency across panels with the Direct Consensus method. Externally, the Direct Consensus method produced higher cut scores. Suggestions for future research on all four methods are given.

Subject Area

Psychological tests; Educational evaluation

Recommended Citation

Pitoniak, Mary Jean, "Standard setting methods for complex licensure examinations" (2003). Doctoral Dissertations Available from Proquest. AAI3078711.