Detecting Candidate Preknowledge of Items Using A Predictive Checking Method

In on-demand high-stakes testing programs such as the GRE and TOEFL, some items are reused across test administrations to reduce the cost of continually developing new items. Item exposure gives examinees an opportunity to gain knowledge of particular test items in advance of their administration. This poses a threat to test security and will ultimately result in invalid test scores. Therefore, many testing programs conduct quality control to monitor test compromise at the individual and/or group level. A predictive checking method is proposed in this study to detect examinee preknowledge of exposed items. We consider a scenario where a test can be divided into two subsets of items: one consisting of secure items with very low exposure rates and the other consisting of possibly compromised items (i.e., unsecure items) that have been exposed for a while. An examinee's proficiency distribution is first obtained from the secure items, and then the predictive distribution of the examinee's test score on the unsecure items is constructed. The extent of test compromise is determined by comparing an individual's observed score on the unsecure items with the predictive distribution. To evaluate the effectiveness of this approach, three studies are conducted: the first study investigates the statistical properties (i.e., type-I error and power) of this method under four factors through Monte Carlo simulation; the second study applies this method to two simulated test compromise situations that are likely to happen in practice, and compares this method to three other detection approaches; the third study applies this method to a real dataset to demonstrate its practical use. Findings from the simulation studies suggest that the predictive checking method is effective in detecting examinees' preknowledge in the unsecure subset given a moderate to large test compromise rate, while maintaining its type-I error close to or lower than the nominal level. The method also demonstrates similar or better performance than the other approaches under investigation. These results have implications for conducting quality control at the individual examinee level in an on-demand testing program.
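
The procedure described above can be illustrated with a minimal sketch. The abstract does not state the item response model, the prior, or the computational details, so everything here is an assumption made for illustration: a 2PL model with known item parameters, a standard normal prior on proficiency, a grid approximation to the posterior, and the Lord-Wingersky recursion for the score distribution. The function names (`irf`, `posterior_on_grid`, `score_distribution`, `predictive_p_value`) are hypothetical and are not taken from the study.

```python
import numpy as np

def irf(theta, a, b):
    """Assumed 2PL response function: P(correct) per grid point (rows) and item (cols)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def posterior_on_grid(resp_secure, a, b, grid):
    """Posterior weights for proficiency on a grid, using only the secure items."""
    p = irf(grid, a, b)
    loglik = (resp_secure * np.log(p) + (1 - resp_secure) * np.log1p(-p)).sum(axis=1)
    logpost = loglik - 0.5 * grid**2          # assumed standard normal prior
    w = np.exp(logpost - logpost.max())       # subtract max for numerical stability
    return w / w.sum()

def score_distribution(p):
    """Lord-Wingersky recursion: sum-score distribution at each grid point."""
    n_grid, n_items = p.shape
    dist = np.zeros((n_grid, n_items + 1))
    dist[:, 0] = 1.0                          # with zero items, the score is 0
    for j in range(n_items):
        pj = p[:, j:j + 1]
        new = dist * (1 - pj)                 # item j answered incorrectly
        new[:, 1:] += dist[:, :-1] * pj       # item j answered correctly
        dist = new
    return dist

def predictive_p_value(resp_secure, resp_unsecure, a_s, b_s, a_u, b_u, grid=None):
    """Upper-tail probability of the observed unsecure score under the predictive distribution."""
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 81)     # assumed proficiency grid
    w = posterior_on_grid(resp_secure, a_s, b_s, grid)
    conditional = score_distribution(irf(grid, a_u, b_u))
    predictive = w @ conditional              # mix conditional distributions over the posterior
    observed = int(np.sum(resp_unsecure))
    return predictive[observed:].sum(), predictive
```

Under these assumptions, a small upper-tail probability means the examinee scored higher on the unsecure items than their performance on the secure items would predict, which is the pattern expected under preknowledge. Comparing that probability with a nominal level is one plausible flagging rule; the study's actual decision rule may differ.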