Author ORCID Identifier



Open Access Dissertation

Document Type


Degree Name

Doctor of Education (EdD)

Degree Program


Year Degree Awarded


Month Degree Awarded


First Advisor

Ronald K. Hambleton

Second Advisor

Jennifer Randall

Subject Categories

Educational Assessment, Evaluation, and Research


Test scores are usually equated only at the total score level. If a test mainly measures a single trait, that is, if it is essentially unidimensional, equating at the total score level could be the best choice. However, when a test is composed of subtests with negligible relationships among them, separate equating for each subtest is the best choice. When the subtests are moderately correlated, equating each subtest individually may be misleading because it ignores the relationships among the subtests. This study applied and compared several subtest score equating methods based on classical test theory and item response theory (IRT), examining several factors: correlations among dimensions, proficiency distributions with skewness or mean shifts, and the numbers of items and common items. For the classical test theory methods, the results showed that when the correlations among dimensions were high, using either the total score or the anchor total score as the anchor could produce better equating results than using the anchor score from each subtest. Among the different input scores for equating (observed scores, weighted averages, and augmented scores), augmented scores yielded slightly less equating error than the other two. Under the IRT framework, concurrent calibration and separate calibration were applied, along with unidimensional IRT equating and the unidimensional approximation method using multidimensional IRT parameters. The unidimensional approximation method did not perform well compared with the unidimensional IRT methods. Proficiency distributions with relatively high skewness or mean shifts yielded the largest equating errors.
Further study is recommended in two directions: simulating item responses with models more complex than a simple-structure model, and applying direct multidimensional IRT equating in place of the two-step procedure of unidimensional approximation followed by unidimensional IRT equating.
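
As a rough illustration of what equating a subtest score through an anchor can look like, the sketch below implements chained linear equating, a standard classical method. It is not taken from the dissertation, which does not state in this abstract which classical equating functions were used. The function names (`linear_link`, `chained_linear_equate`) and the small score lists are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def linear_link(scores_from, scores_to):
    """Return a function that maps the 'from' scale onto the 'to' scale
    by matching means and standard deviations (linear linking)."""
    m_from, s_from = mean(scores_from), pstdev(scores_from)
    m_to, s_to = mean(scores_to), pstdev(scores_to)
    slope = s_to / s_from
    return lambda x: m_to + slope * (x - m_from)


def chained_linear_equate(x_new, anchor_new, anchor_old, y_old):
    """Chained linear equating (assumed design: two groups, common anchor).

    Step 1: link new-form scores X to the anchor in the new-form group.
    Step 2: link the anchor to old-form scores Y in the old-form group.
    The composition maps X onto the scale of Y.
    """
    x_to_anchor = linear_link(x_new, anchor_new)
    anchor_to_y = linear_link(anchor_old, y_old)
    return lambda x: anchor_to_y(x_to_anchor(x))


# Hypothetical toy data: subtest scores and anchor scores for two groups.
x_new = [1, 2, 3]        # new-form subtest scores (group 1)
anchor_new = [2, 4, 6]   # anchor scores (group 1)
anchor_old = [2, 4, 6]   # anchor scores (group 2)
y_old = [3, 5, 7]        # old-form subtest scores (group 2)

equate = chained_linear_equate(x_new, anchor_new, anchor_old, y_old)
equated_score = equate(2)  # new-form score of 2 expressed on the old-form scale
```

In this sketch, the choice of anchor discussed in the abstract amounts to what is passed as `anchor_new` and `anchor_old`: the common-item score from the subtest itself, the anchor total score, or the total score.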