Thumbnail Image

A Comparison of Techniques for Handling Missing Data in Longitudinal Studies

Missing data are a common problem in virtually all epidemiological research, especially when conducting longitudinal studies. In these settings, clinicians may collect biological samples to analyze changes in biomarkers, which often do not conform to parametric distributions and may be censored due to limits of detection. Using complete data from the BioCycle Study (2005-2007), which followed 259 premenopausal women over two menstrual cycles, we compared four techniques for handling missing biomarker data with non-Normal distributions. We imposed increasing degrees of missing data on two non-Normally distributed biomarkers under conditions of missing completely at random, missing at random, and missing not at random. Generalized estimating equations were used to obtain estimates from complete case analysis, multiple imputation using joint modeling, multiple imputation using chained equations, and multiple imputation using chained equations and predictive mean matching on Day 2, Day 13 and Day 14 of a standardized 28-day menstrual cycle. Estimates were compared against those obtained from analysis of the completely observed biomarker data. All techniques performed comparably when applied to a Normally distributed biomarker. Multiple imputation using joint modeling and multiple imputation using chained equations produced similar estimates across all types and degrees of missingness for each biomarker. Multiple imputation using chained equations and predictive mean matching consistently deviated from both the complete data estimates and the other missing data techniques when applied to a biomarker with a bimodal distribution. When addressing missing biomarker data in longitudinal studies, special attention should be given to the underlying distribution of the missing variable. As biomarkers become increasingly Normal, the amount of missing data tolerable while still obtaining accurate estimates may also increase when data are missing at random. Future studies are necessary to assess these techniques under more elaborate missingness mechanisms and to explore interactions between biomarkers for improved imputation models.