Statistical Methods to Accommodate Censored Covariates in Regression Analysis

The problem of censored covariates arises frequently in family history studies, in which an outcome of interest is regressed on the age of onset, as well as in longitudinal cohort studies, in which biomarkers may be measured post-baseline. Using censored covariates without any adjustment is well known to lead to biased estimates of the coefficients of interest and inflated type I error. Moreover, random censoring is more complicated to handle than fixed censoring because of the within-subject variation. We propose three statistical methods to accommodate randomly censored covariates in regression analyses. The first method is an expectation-maximization (EM) type algorithm to estimate the regression coefficients of censored covariates. The proposed EM-type algorithm is semi-parametric and so avoids misspecification of a parametric distributional assumption. The second method is a robust multiple imputation using fully conditional specification. It accommodates the substantive model, so that information from both the imputation model and the substantive model is used to improve estimation accuracy. In addition, the fully conditional specification can ensure the compatibility of the imputation and substantive models, which avoids model misspecification. The third method is a threshold regression accommodating censored covariates under the logistic model framework. It extends threshold regression under the linear model framework and uses a latent variable setup for binary outcomes. We evaluate the finite sample performance of the three methods in comprehensive simulation studies and compare them to the complete-case analysis and other available methods. The simulation results suggest that the proposed methods have satisfactory finite sample performance and offer advantages over existing methods in certain scenarios.
We also apply the proposed methods to an Alzheimer's disease study, in which the covariate of interest, the maternal onset age of dementia, is right-censored at the last time at which the patients were known by their offspring to be dementia-free. In addition, we analyze a dataset from a neuropathology study, in which the censored covariate is the duration of disease, right-censored by the global Clinical Dementia Rating (CDR) at the first clinical visit.
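The statement that using a randomly censored covariate without adjustment biases the coefficient estimates can be illustrated with a small simulation. The sketch below is not taken from the thesis: the uniform distributions, the sample size, the coefficient values, and the function name `simulate_slopes` are all assumptions chosen for illustration. It compares a naive fit that substitutes the censored value for the true covariate with a complete-case fit, and does not implement any of the three proposed methods.

```python
import numpy as np


def simulate_slopes(n=20000, beta0=2.0, beta1=1.0, seed=0):
    """Compare naive and complete-case slopes under random right-censoring.

    All distributions and parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # True covariate (e.g. an onset age on an arbitrary scale) and an
    # independent random censoring time.
    x = rng.uniform(0.0, 10.0, size=n)
    c = rng.uniform(0.0, 10.0, size=n)

    # What is actually recorded: the smaller of the two, plus an indicator
    # of whether the true covariate was observed.
    w = np.minimum(x, c)
    observed = x <= c

    # Linear substantive model in the true covariate.
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, size=n)

    # Naive analysis: treat the censored value w as if it were x.
    naive_slope = np.polyfit(w, y, 1)[0]

    # Complete-case analysis: keep only subjects with an observed covariate.
    cc_slope = np.polyfit(x[observed], y[observed], 1)[0]

    return naive_slope, cc_slope


naive_slope, cc_slope = simulate_slopes()
print(f"true slope:          {1.0:.3f}")
print(f"naive slope:         {naive_slope:.3f}")
print(f"complete-case slope: {cc_slope:.3f}")
```

In this toy setup the complete-case slope stays close to the true value only because the censoring time is generated independently of the outcome error, and it does so while discarding roughly half of the sample. The naive slope is visibly off. Methods that also use the censored observations, such as those proposed here, aim to recover that lost information.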