Comparison of kernel equating and item response theory equating methods

Yu Meng, University of Massachusetts Amherst


The kernel method of test equating is a unified approach to test equating with some advantages over traditional equating methods. It is therefore important to evaluate comprehensively the usefulness and appropriateness of the kernel equating (KE) method, as well as its advantages and disadvantages compared with several popular item response theory (IRT) equating techniques. The purpose of this study was to evaluate the accuracy and stability of KE and IRT true score equating by manipulating several common factors that are known to influence equating results.

Three equating methods (kernel post-stratification equating, Stocking-Lord, and Mean/Sigma) were compared against an established equating criterion. A wide variety of conditions were simulated to match realistic situations that reflected differences in sample size, anchor test length, and group ability. Systematic and random equating error were summarized with bias statistics and the standard error of equating (SEE), respectively, and compared across the methods. The better-performing equating methods under specific conditions were recommended on the basis of root mean squared error (RMSE).

As expected, equating error generally decreased across all methods as the number of anchor items and the sample size increased. Aside from method effects, group differences in ability had the greatest impact on equating error in this study. The accuracy and stability of each equating method depended on the portion of the score scale where comparisons were made.

Overall, kernel equating was more stable in most situations but not as accurate as IRT equating for the conditions studied. The interactions between pairs of factors investigated in this study appeared to be more influential for, and more beneficial to, IRT equating than KE. Practical recommendations for future study were offered: for example, using alternative methods of data simulation to remove the advantage of the IRT equating methods.
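
The three error summaries named in the abstract (bias, SEE, and RMSE) can be illustrated with a minimal sketch. It is not the dissertation's code; it assumes the common simulation-study convention in which, at each raw score point, bias is the mean equated score over replications minus the criterion value, SEE is the standard deviation of the equated scores over replications, and RMSE combines the two as the square root of bias squared plus SEE squared. The function name and the input layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def equating_error_summary(replications, criterion):
    """Summarize equating error at each raw score point.

    replications: list of replications; each is a list of equated scores,
                  one per raw score point (hypothetical layout).
    criterion:    list of criterion equated scores, one per raw score point.
    """
    summary = []
    for j, target in enumerate(criterion):
        # Equated scores at score point j across all replications.
        estimates = [rep[j] for rep in replications]
        bias = mean(estimates) - target   # systematic error
        see = pstdev(estimates)           # random error (population SD, an assumption)
        rmse = math.sqrt(bias ** 2 + see ** 2)  # total error
        summary.append({"bias": bias, "see": see, "rmse": rmse})
    return summary


# Toy input: two replications, two raw score points (made-up numbers).
summary = equating_error_summary(
    replications=[[10.0, 20.0], [12.0, 20.0]],
    criterion=[10.0, 21.0],
)
for point, stats in enumerate(summary):
    print(point, stats)
```

Using the population standard deviation makes the squared RMSE equal to the mean squared deviation from the criterion across replications; a study could equally use the sample standard deviation, which changes the values slightly for small numbers of replications.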

Subject Area

Educational tests & measurements

Recommended Citation

Meng, Yu, "Comparison of kernel equating and item response theory equating methods" (2012). Doctoral Dissertations Available from Proquest. AAI3518262.