Propensity score applications are often used to evaluate educational program impact. However, various options are available both to estimate propensity scores and to construct comparison groups. This study used a student achievement dataset with commonly available covariates to compare different propensity score estimation methods (logistic regression, boosted regression, and Bayesian logistic regression) in combination with different methods for constructing comparison groups (nearest-neighbor matching, optimal matching, and weighting) in terms of balancing pre-existing differences and recovering a simulated treatment effect in small samples. Results indicated that applied researchers evaluating program impact should first consider standard logistic regression with nearest-neighbor or optimal matching, or boosted regression in combination with propensity score weighting. Advantages and disadvantages of the methods are discussed. Accessed 12,046 times on https://pareonline.net from November 05, 2013 to December 31, 2019.
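To make the general workflow concrete, the following is a minimal sketch of one combination named in the abstract: propensity scores from a logistic regression, followed by 1:1 nearest-neighbor matching and, separately, propensity score weighting. It is not the authors' code or data. The simulated covariates (`x1`, `x2`), the true effect of 0.4, the hand-rolled gradient-ascent fit, the greedy matching order, and the choice of ATT-style weights are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, t, lr=0.5, epochs=500):
    """Approximate logistic regression fit by batch gradient ascent.
    Returns [intercept, coef_1, ..., coef_p]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            err = ti - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def nearest_neighbor_match(scores, t):
    """Greedy 1:1 matching without replacement on the propensity score."""
    available = [i for i, ti in enumerate(t) if ti == 0]
    pairs = []
    for i in (k for k, tk in enumerate(t) if tk == 1):
        if not available:
            break
        j = min(available, key=lambda c: abs(scores[c] - scores[i]))
        pairs.append((i, j))
        available.remove(j)
    return pairs

def att_weights(scores, t):
    # Treated units get weight 1; controls get the odds e / (1 - e).
    return [1.0 if ti == 1 else e / (1.0 - e) for e, ti in zip(scores, t)]

def std_mean_diff(x, t, w):
    """Weighted standardized mean difference, scaled by the treated-group SD."""
    def wmean(group):
        num = sum(wi * xi for xi, ti, wi in zip(x, t, w) if ti == group)
        den = sum(wi for ti, wi in zip(t, w) if ti == group)
        return num / den
    tx = [xi for xi, ti in zip(x, t) if ti == 1]
    m = sum(tx) / len(tx)
    sd = math.sqrt(sum((xi - m) ** 2 for xi in tx) / (len(tx) - 1))
    return (wmean(1) - wmean(0)) / sd

# Hypothetical small-sample data: selection into treatment depends on x1, x2.
random.seed(0)
n = 200
X, t, y = [], [], []
for _ in range(n):
    x1 = random.gauss(0, 1)                       # e.g. standardized prior score
    x2 = 1.0 if random.random() < 0.4 else 0.0    # e.g. a binary demographic flag
    ti = 1 if random.random() < sigmoid(-1.0 + 0.8 * x1 + 0.5 * x2) else 0
    X.append([x1, x2])
    t.append(ti)
    y.append(0.5 * x1 + 0.3 * x2 + 0.4 * ti + random.gauss(0, 1))

coefs = fit_logistic(X, t)
scores = [sigmoid(coefs[0] + sum(c * v for c, v in zip(coefs[1:], xi))) for xi in X]

pairs = nearest_neighbor_match(scores, t)
weights = att_weights(scores, t)

# Effect estimates: mean matched-pair difference, and weighted mean difference.
matched_effect = sum(y[i] - y[j] for i, j in pairs) / len(pairs)
ctrl_w = sum(w for w, ti in zip(weights, t) if ti == 0)
weighted_effect = (sum(yi for yi, ti in zip(y, t) if ti == 1) / sum(t)
                   - sum(w * yi for w, yi, ti in zip(weights, y, t) if ti == 0) / ctrl_w)

x1_col = [xi[0] for xi in X]
print("SMD of x1, unweighted:", std_mean_diff(x1_col, t, [1.0] * n))
print("SMD of x1, weighted:  ", std_mean_diff(x1_col, t, weights))
print("Matched estimate:", matched_effect, "Weighted estimate:", weighted_effect)
```

In applied work, a dedicated package would normally replace the hand-written fit and the greedy matcher, and optimal matching or boosted regression would require different tooling than is shown here.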
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Stone, Clement A. and Tang, Yun. "Comparing Propensity Score Methods in Balancing Covariates and Recovering Impact in Small Sample Educational Program Evaluations," Practical Assessment, Research, and Evaluation: Vol. 18, Article 13.
Available at: https://scholarworks.umass.edu/pare/vol18/iss1/13