
Author ORCID Identifier

N/A

Access Type

Open Access Dissertation

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Education

Year Degree Awarded

2018

Month Degree Awarded

September

First Advisor

Lisa Keller

Second Advisor

Craig Wells

Third Advisor

Elizabeth Harvey

Fourth Advisor

Richard J. Patz

Subject Categories

Educational Assessment, Evaluation, and Research

Abstract

Many studies have examined the quality of automated raters, but none have focused on the potential effects of systematic rater error on the psychometric properties of test scores. This simulation study examines the comparability of test scores under multiple rater bias and variability conditions, and addresses questions of their effects on test equating solutions. Effects are characterized by a comparison of equated and observed raw scores and estimates of examinee ability across the bias and variability scenarios. Findings suggest that the presence of, and changes in, rater bias and variability affect the equivalence of total raw scores, particularly at higher and lower ends of the score scale. The effects are shown to be larger where variability levels are higher, and, generally, where more constructed response items are used in the equating. Preliminary findings also suggest that consistently higher rater variability may have a slightly larger negative impact on the comparability of scores than does reducing rater bias and variability under the conditions examined here. Finally, a non-equivalent groups anchor test (NEAT) equating design may be slightly more robust to changes in rater bias and variability than a single group equating design for the bias scenarios investigated.
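To make the rater-error conditions described above concrete, the sketch below illustrates one simple way such conditions could be generated. It is not the author's simulation code; the additive score model, the sample size, the number of constructed response items, and the bias and variability values are all illustrative assumptions.

# Illustrative sketch only -- not the dissertation's simulation code.
# Assumes a simple additive model: a rater's observed score on a
# constructed response item equals the examinee's true item score plus
# a systematic bias term and random rater noise, rounded and clipped to
# the rubric range.
import numpy as np

rng = np.random.default_rng(2018)

def rate_items(true_scores, bias, sd, max_points):
    """Apply rater bias (systematic shift) and rater variability
    (random error) to a matrix of true item scores, then round and
    clip the results to the 0..max_points rubric range."""
    noise = rng.normal(loc=bias, scale=sd, size=true_scores.shape)
    rated = np.rint(true_scores + noise)
    return np.clip(rated, 0, max_points)

# Hypothetical example: 1,000 examinees and 4 constructed response
# items worth up to 4 points each, under a low and a high rater
# bias/variability condition.
true_cr = rng.uniform(0, 4, size=(1000, 4))
low_condition = rate_items(true_cr, bias=0.0, sd=0.25, max_points=4)
high_condition = rate_items(true_cr, bias=0.5, sd=0.75, max_points=4)

# Total raw scores shift and spread out under the more biased, more
# variable condition; differences of this kind are what would then be
# carried into the equating comparison.
print(low_condition.sum(axis=1).mean(), high_condition.sum(axis=1).mean())

In a full study, score matrices generated under each condition would be equated (for example, under single group and NEAT designs) and the resulting equated scores and ability estimates compared across conditions, as the abstract describes.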

DOI

https://doi.org/10.7275/12363514
