Date of Award

9-2010

Document Type

Open Access Dissertation

Degree Name

Doctor of Education (EdD)

Degree Program

Education (also CAGS)

First Advisor

Lisa A. Keller

Second Advisor

Craig S. Wells

Third Advisor

George R. Milne

Keywords

classification accuracy, equating, item parameter drift, matrix-sampling design, opportunity to learn

Subject Categories

Education

Abstract

The presence of outlying anchor items is an issue faced by many testing agencies. The decision to retain or remove an item is a difficult one, especially when removing items calls the content representation of the anchor set into question. Additionally, the reason for the aberrancy is not always clear; if the performance of the item has changed because of improvements in instruction, removing the anchor item may not be appropriate and might produce misleading conclusions about the proficiency of the examinees. This study was conducted in two parts: a simulation study and an empirical data analysis. Both parts investigated the effect on examinee classification of the decision to remove or retain aberrant anchor items. Three methods of detection were explored: (1) the delta plot, (2) IRT b-parameter plots, and (3) the RPU method. In the simulation study, the degree of aberrancy and the ability distribution of examinees were manipulated, and five aberrant item schemes were employed. In the empirical data analysis, archived statewide science achievement data that were suspected to reflect differential opportunity to learn between administrations were re-analyzed using the various item parameter drift detection methods. The results of both the simulation and the empirical study support excluding flagged items when linking assessments, provided that a matrix-sampling design is used and the anchor contains a large number of items. While neither the delta plot nor the IRT b-parameter plot method produced results that would overwhelmingly support its use, it is recommended that both methods be employed in practice until further research is conducted on alternative methods, such as the RPU method, since classification accuracy increases when such methods are employed and flagged items are removed, and, most often, growth is not misrepresented by doing so.
