Evaluating Several Multidimensional Adaptive Testing Procedures For Diagnostic Assessment

This computer simulation study investigated how formative test designs can capitalize on the dimensional structure among the proficiencies being measured, on item selection methods, and on computerized adaptive testing to improve measurement precision and classification accuracy. Four variables were manipulated to investigate the effectiveness of multidimensional adaptive testing (MAT): the number of dimensions measured by the test, the magnitude of the correlations among the dimensions, the item selection method, and the test design. Outcome measures included recovery of known proficiency scores, bias in estimation, and accuracy of proficiency classifications. In contrast to previous MAT research, the number of dimensions had no significant effect on the outcome measures. Higher correlations among the dimensions (e.g., .50 or .80) produced a moderate improvement in the outcome measures. Four item selection methods (Bayesian, Fisher, optimal, and random) were applied to compare the measurement efficiency of adaptive and non-adaptive item selection, with random selection serving as the baseline. The Bayesian item selection method showed the best results across conditions. The Fisher item selection method showed the second-best results, but the gap among adaptive item selection methods narrowed with longer tests and higher correlations among the dimensions. The optimal item selection method produced results comparable to those of the adaptive item selection methods when the focus was on the accuracy of decision making, which in many applications of diagnostic assessment is the most important criterion. In the fixed test length condition, the effect of increasing test length was apparent on all of the outcome measures.
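The three outcome measures named above can be illustrated with a minimal sketch. The abstract does not give the formulas the study used, so everything here is an assumption: recovery is taken to be root mean squared error, bias is the mean signed error, and classification accuracy is agreement between true and estimated proficiencies relative to a single hypothetical cut score `cut`. The function name and array layout are likewise illustrative.

```python
import numpy as np

def outcome_measures(theta_true, theta_hat, cut=0.0):
    """Per-dimension bias, RMSE, and classification accuracy.

    theta_true, theta_hat: arrays of shape (n_examinees, n_dims).
    cut: hypothetical cut score applied to every dimension (an assumption).
    """
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    err = theta_hat - theta_true
    bias = err.mean(axis=0)                  # mean signed error
    rmse = np.sqrt((err ** 2).mean(axis=0))  # recovery of known scores
    # Share of examinees placed on the same side of the cut score
    # by the estimate as by the true proficiency.
    accuracy = ((theta_hat >= cut) == (theta_true >= cut)).mean(axis=0)
    return bias, rmse, accuracy
```

In a simulation, `theta_true` would hold the generated proficiencies and `theta_hat` the estimates produced at the end of each simulated test.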
The results from the study suggest that the Bayesian item selection method can be quite useful when there are at least moderate correlations among the dimensions. Because these results were obtained using a good estimate of the priors, a next step should be to investigate how poor (i.e., inaccurate) prior information, such as priors that are too high, too low, or too tight, affects the validity of the Bayesian approach. We also note the very good results obtained with optimal item selection when the focus was on accuracy of proficiency classifications.
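To make the contrast between Fisher and Bayesian item selection concrete, the following sketch uses one common formulation from the MAT literature. It is not necessarily the procedure used in this study, and it is stated under several assumptions: a multidimensional two-parameter logistic model, a determinant (D-optimality) criterion, and a Bayesian variant that differs from the Fisher variant only by adding the inverse of the prior covariance matrix to the information matrix. The names `item_information` and `select_item` are hypothetical.

```python
import numpy as np

def item_information(a, d, theta):
    """Fisher information matrix of one item under an assumed M2PL model."""
    p = 1.0 / (1.0 + np.exp(-(a @ theta + d)))
    return p * (1.0 - p) * np.outer(a, a)

def select_item(A, D, theta, administered, prior_cov=None):
    """Return the index of the unused item that maximizes the determinant
    of the accumulated information matrix at the current estimate theta.

    A: (n_items, n_dims) discriminations; D: (n_items,) intercepts.
    prior_cov: if given, its inverse is added to the information matrix
    (the assumed Bayesian variant); if None, the Fisher variant is used.
    """
    n_items, n_dims = A.shape
    info = np.zeros((n_dims, n_dims))
    for j in administered:
        info += item_information(A[j], D[j], theta)
    if prior_cov is not None:
        info += np.linalg.inv(prior_cov)
    best, best_det = None, -np.inf
    for j in range(n_items):
        if j in administered:
            continue
        det = np.linalg.det(info + item_information(A[j], D[j], theta))
        if det > best_det:
            best, best_det = j, det
    return best
```

Under this formulation the off-diagonal terms of the prior covariance enter the selection criterion directly, so the correlations among the dimensions can change which item is chosen. The same dependence means a misspecified prior would also change the selections, which is the concern raised above about inaccurate prior information.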