Document Type

Campus-Only Access for Five (5) Years

Embargo Period

2-1-2017

Degree Program

Public Health

Degree Type

Master of Science (M.S.)

Year Degree Awarded

2016

Month Degree Awarded

September

Advisor Name

Kenneth

Advisor Middle Initial

P

Advisor Last Name

Kleinman

Co-advisor Name

Brian

Co-advisor Middle Initial

W

Co-advisor Last Name

Whitcomb

Third Advisor Name

Nicholas

Third Advisor Middle Initial

G

Third Advisor Last Name

Reich

Abstract

Missing data are a common problem in virtually all epidemiological research, especially in longitudinal studies. In these settings, clinicians may collect biological samples to analyze changes in biomarkers, which often do not conform to parametric distributions and may be censored due to limits of detection. Using complete data from the BioCycle Study (2005-2007), which followed 259 premenopausal women over two menstrual cycles, we compared four techniques for handling missing biomarker data with non-Normal distributions. We imposed increasing degrees of missing data on two non-Normally distributed biomarkers under conditions of missing completely at random, missing at random, and missing not at random. Generalized estimating equations were used to obtain estimates on Day 2, Day 13, and Day 14 of a standardized 28-day menstrual cycle from four approaches: complete case analysis, multiple imputation using joint modeling, multiple imputation using chained equations, and multiple imputation using chained equations with predictive mean matching. Estimates were compared against those obtained from analysis of the completely observed biomarker data. All techniques performed comparably when applied to a Normally distributed biomarker. Multiple imputation using joint modeling and multiple imputation using chained equations produced similar estimates across all types and degrees of missingness for each biomarker. Multiple imputation using chained equations with predictive mean matching consistently deviated from both the complete data estimates and the other missing data techniques when applied to a biomarker with a bimodal distribution. When addressing missing biomarker data in longitudinal studies, special attention should be given to the underlying distribution of the variable with missing values. As a biomarker's distribution approaches Normality, the amount of missing data that can be tolerated while still obtaining accurate estimates may also increase when data are missing at random. Future studies are necessary to assess these techniques under more elaborate missingness mechanisms and to explore interactions between biomarkers for improved imputation models.
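The three missingness conditions named in the abstract can be illustrated with a minimal sketch. This is not the procedure used in the thesis, which the abstract does not describe. The function name impose_missingness, the logistic model for the probability of missingness, and the use of a single fully observed covariate are all assumptions made for illustration only.

```python
import math
import random


def impose_missingness(values, covariate, mechanism, rate, seed=0):
    """Return a copy of `values` with some entries replaced by None.

    Hypothetical illustration, not the thesis's procedure.
    mechanism: "MCAR", "MAR", or "MNAR".
    rate: target fraction missing, strictly between 0 and 1
          (exact in expectation for MCAR, approximate otherwise).
    """
    if mechanism not in ("MCAR", "MAR", "MNAR"):
        raise ValueError("mechanism must be 'MCAR', 'MAR', or 'MNAR'")
    if not 0 < rate < 1:
        raise ValueError("rate must be strictly between 0 and 1")

    rng = random.Random(seed)

    def standardize(xs):
        mean = sum(xs) / len(xs)
        sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
        return [(x - mean) / sd for x in xs]

    if mechanism == "MCAR":
        # Missingness is unrelated to any observed or unobserved value.
        probs = [rate] * len(values)
    else:
        # MAR: missingness depends on a fully observed covariate.
        # MNAR: missingness depends on the biomarker value itself.
        driver = covariate if mechanism == "MAR" else values
        intercept = math.log(rate / (1 - rate))  # logit of the target rate
        probs = [1 / (1 + math.exp(-(intercept + z)))
                 for z in standardize(driver)]

    return [None if rng.random() < p else v for v, p in zip(values, probs)]
```

Under this assumed model, larger values of the driving variable are more likely to be set to missing, so under MNAR the observed values of a right-skewed biomarker tend to understate its mean, which is the kind of bias the compared techniques must contend with.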

First Advisor

Kenneth P. Kleinman

Second Advisor

Brian W. Whitcomb

Third Advisor

Nicholas G. Reich
