Off-campus UMass Amherst users: To download dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users, please click the view more button below to purchase a copy of this dissertation from Proquest.

(Some titles may also be available free of charge in our Open Access Dissertation Collection, so please check there first.)

The impact of judges' consensus on the accuracy of anchor-based judgmental estimates of multiple-choice test item difficulty: The case of the NATABOC Examination

Matthew DiBartolomeo, University of Massachusetts - Amherst

Abstract

Multiple factors have influenced testing agencies to more carefully consider the manner and frequency in which pretest item data are collected and analyzed. One potentially promising development is judges’ estimates of item difficulty. Accurate estimates of item difficulty may be used to reduce pretest samples sizes, supplement insufficient pretest sample sizes, aid in test form construction, assist in test form equating, calibrate test item writers who may be asked to produce items to meet statistical specifications, inform the process of standard setting, aid in preparing randomly equivalent blocks of pretest items, and/or aid in helping to set item response theory prior distributions. ^ Two groups of 11 and eight judges, respectively, provided estimates of difficulty for the same set of 33 multiple-choice items from the National Athletic Trainers’ Association Board of Certification (NATABOC) Examination. Judges were faculty in Commission on Accreditation of Athletic Training Education-approved athletic training education programs and were NATABOC-approved examiners of the former hands-on practical portion of the Examination. ^ For each item, judges provided two rounds of independent estimates of item difficulty and a third round group-level consensus estimate. Prior to providing estimates of item difficulty in rounds two and three, group discussion of the estimates provided in the preceding round was conducted. ^ In general, the judges’ estimates of test item difficulty did not improve across rounds as predicted. Two-way repeated measures analyses of variance comparing item set mean difficulty estimates by round and the item set mean empirical item difficulty revealed no statistically significant differences across rounds, groups, or the interaction of these two factors. Moreover, item set mean difficulty estimates by round gradually drifted away from the item set mean empirical item difficulty and, therefore, mean estimation bias and effect size analyses gradually increased in correspondence with the item set mean item difficulty estimates provided across rounds. Therefore, the results revealed that no item difficulty estimation round yielded statistically significantly better recovery of the empirical item difficulty values compared to the other rounds. ^

Subject Area

Education, Tests and Measurements

Recommended Citation

Matthew DiBartolomeo, "The impact of judges' consensus on the accuracy of anchor-based judgmental estimates of multiple-choice test item difficulty: The case of the NATABOC Examination" (January 1, 2010). Doctoral Dissertations Available from Proquest. Paper AAI3427517.
http://scholarworks.umass.edu/dissertations/AAI3427517

Share

COinS