Measuring Teacher Effectiveness Using Students' Test Scores

Within-state comparisons of school performance, student growth, and teacher effectiveness have become commonplace. Since the advent of the Growth Model Pilot Program in 2005, many states have adopted growth models for both evaluative purposes (to measure teacher performance or for accountability) and formative purposes (to guide instructional practice and curricular or programmatic choices). Growth model data, as applied to school accountability and teacher evaluation, are generally used to determine whether teachers and schools are moving students toward curricular proficiency and mastery. Teacher evaluation based on growth data is an increasingly popular practice in the states, and the introduction of cross-state assessment consortia in 2014 will produce data that could support this approach to teacher evaluation on a larger scale. For the first time, students in consortium member states will take shared assessments and be held accountable to shared curricular standards, setting the stage to quantify and compare teacher effectiveness across states based on student test scores. States' voluntary adoption of the Common Core State Standards (CCSS) and participation in assessment consortia speak to a new level of support for collaboration in the interest of improved student achievement. The possibility of using these data to build effectiveness and growth models that cross state lines is appealing, as states and schools might be interested in demonstrating their progress toward full student proficiency based on the CCSS. By using consortium assessment data in place of within-state assessment data for teacher evaluation, it would be possible to describe the performance of one state's teachers in reference to the performance of their own students, teachers in other states, and the consortium as a whole.
To examine what might happen if states adopt a cross-state evaluation model, the consistency of teacher effectiveness rankings based on the Student Growth Percentile (SGP) model and a value-added model is compared for teachers in two states, Massachusetts and Washington, D.C., both members of the Partnership for Assessment of Readiness for College and Careers (PARCC) assessment consortium. The teachers are first evaluated based on their students within their own state, and again when that state is situated within a sample representing students in the other member states. The purpose of the current study is to explore the reliability of teacher effectiveness classifications, as well as the validity of inferences made from student test scores to guide teacher evaluation. The results indicate that two of the models currently in use, SGPs and a covariate-adjusted value-added model, do not provide particularly reliable estimates of teacher effectiveness: more than half of the teachers are inconsistently classified in the consortium setting. The validity of the model inferences is also called into question, as neither model demonstrates a strong correlation with student test score change as estimated by a value table. The results are outlined and discussed in relation to each model's reliability and validity, along with the implications for the use of these models in making high-stakes decisions about teacher performance.
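
As a rough illustration of the consistency comparison described above, the following minimal Python sketch computes the share of teachers who receive the same effectiveness label under two reference groups (within-state and consortium). The function name, the three labels, and the four example teachers are hypothetical and are not drawn from the study, whose own classification rules are not reproduced here.

```python
def classification_consistency(within_state, consortium):
    """Return the share of teachers given the same label in both settings.

    Both arguments map a teacher identifier to an effectiveness label.
    Only teachers present in both mappings are compared.
    """
    shared = within_state.keys() & consortium.keys()
    if not shared:
        raise ValueError("no teachers in common between the two settings")
    same = sum(1 for t in shared if within_state[t] == consortium[t])
    return same / len(shared)


# Hypothetical labels, for illustration only (not data from the study).
within_state = {"T1": "high", "T2": "average", "T3": "low", "T4": "average"}
consortium = {"T1": "average", "T2": "average", "T3": "low", "T4": "high"}

rate = classification_consistency(within_state, consortium)
print(f"Consistently classified: {rate:.0%}")
```

In this made-up example, two of the four teachers keep their label when the reference group changes, so half are consistently classified.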