Off-campus UMass Amherst users: To download campus access dissertations, please use the following link to log into our proxy server with your UMass Amherst user name and password.

Non-UMass Amherst users: Please talk to your librarian about requesting this dissertation through interlibrary loan.

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

Date of Award


Access Type

Campus Access

Document type


Degree Name

Doctor of Education (EdD)

Degree Program

Education; Research and Evaluation Methods

First Advisor

Ronald K. Hambleton

Second Advisor

Stephen G. Sireci

Third Advisor

Daeyoung Kim

Subject Categories

Educational Assessment, Evaluation, and Research


The kernel method of test equating is a unified approach to test equating with some advantages over traditional equating methods. Therefore, it is important to evaluate in a comprehensive way the usefulness and appropriateness of the Kernel equating (KE) method, as well as its advantages and disadvantages compared with several popular item response theory (IRT) equating techniques. The purpose of this study was to evaluate the accuracy and stability of KE and IRT true score equating by manipulating several common factors that are known to influence the equating results.

Three equating methods (Kernel post-stratification equating, Stocking-Lord and Mean/Sigma) were compared with an established equating criterion. A wide variety of conditions were simulated to match realistic situations that reflected differences in sample size, anchor test length and, group ability differences. The systematic error and random error of equating were summarized with bias statistics and the standard error of equating (SEE), and compared across the methods. The overall better equating methods under specific conditions were recommended based on the root mean squared error (RMSE).

The equating results revealed that, and as expected, in general, equating error decreased as the number of anchor items was increased and sample size was increased across all the methods. Aside from method effects, group differences in ability produced the greatest impact on equating error in this particular study. The accuracy and stability of each equating method depended on the portion of the score scale range where comparisons were being made.

Overall, Kernel equating was shown to be more stable in most situations but not as accurate as IRT equating for the conditions studied. The interactions between pairs of factors investigated in this study seemed to be more influential and beneficial to IRT equating than for KE. Further practical recommendations were suggested for future study: for example, using alternate methods of data simulation to remove the advantage of the IRT equating methods.