
Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Public Health

First Advisor

Raji Balasubramanian

Second Advisor

Andrea S. Foulkes

Third Advisor

Mahlet G. Tadesse

Interval-censored time-to-event outcomes arise when a silent event of interest is known to have occurred only within a specific time interval, bounded by the times of the last negative and the first positive diagnostic tests. The four chapters of this thesis share a common theme: in each, the outcome of interest is an interval-censored time-to-event random variable.

In Chapter 1, we describe a stratified Weibull model for interval-censored outcomes and implement it in a new R package, straweib. We compare the proposed approach with the log-linear form of the Weibull regression model implemented in the existing R package survival, and we illustrate its use by analyzing data from a longitudinal oral health study on the timing of emergence of permanent teeth in 4430 children.

In Chapter 2, we present methods to estimate the association of one or more covariates with an error-prone, self-reported time-to-event outcome. Simulation studies assess how error in self-reported outcomes affects bias in the estimate of the regression parameter of interest. We apply the proposed methods to data from the Women's Health Initiative (WHI) to evaluate the effect of statin use on incident diabetes risk.

In Chapter 3, we develop tools to calculate power and sample size for studies in which the occurrence of a silent event is determined from sequentially administered, error-prone laboratory diagnostic tests or self-reported questionnaires. We evaluate how the characteristics of the imperfect diagnostic test affect the resulting power and sample size, and we compare the relative efficiency of several study designs in the context of error-prone outcomes.

In Chapter 4, we propose a lasso approach and a Bayesian variable selection approach for high-dimensional data settings in which the outcome is error-prone and self-reported.
We perform simulation studies to compare the prediction performance of the proposed methods with that of naive methods that ignore measurement error, and we apply the proposed methods to genome-wide association study data from the WHI to select biomarkers associated with diabetes.
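To make the likelihoods behind an interval-censored analysis concrete, the following Python sketch writes out two standard constructions. The first is the interval-censored contribution S(L) - S(R) under a Weibull proportional hazards model in which the scale and shape vary by stratum while the covariate effects are shared. The second is the likelihood of one subject's sequential binary test results when each test may be misclassified, assuming a known sensitivity and specificity. The parameterization and the function names (weibull_surv, interval_loglik, misclassified_lik) are illustrative assumptions. This is not the code of the straweib package or the exact estimator developed in this thesis.

```python
import math
from typing import Mapping, Sequence


def weibull_surv(t: float, lam: float, gamma: float, lin_pred: float = 0.0) -> float:
    """Assumed Weibull PH survival: S(t) = exp(-lam * t**gamma * exp(lin_pred))."""
    if math.isinf(t):
        return 0.0  # the event has surely occurred by t = infinity
    return math.exp(-lam * t ** gamma * math.exp(lin_pred))


def interval_loglik(
    left: Sequence[float],
    right: Sequence[float],
    x: Sequence[Sequence[float]],
    strata: Sequence[str],
    beta: Sequence[float],
    lam: Mapping[str, float],
    gamma: Mapping[str, float],
) -> float:
    """Log-likelihood when each subject's event time lies in (left, right].

    Scale (lam) and shape (gamma) are looked up by stratum; beta is shared.
    Use right = math.inf for a right-censored subject; assumes left < right.
    """
    total = 0.0
    for lo, hi, xi, s in zip(left, right, x, strata):
        eta = sum(b * v for b, v in zip(beta, xi))  # linear predictor x'beta
        prob = weibull_surv(lo, lam[s], gamma[s], eta) - weibull_surv(hi, lam[s], gamma[s], eta)
        total += math.log(prob)
    return total


def misclassified_lik(
    times: Sequence[float],
    results: Sequence[int],
    sens: float,
    spec: float,
    lam: float,
    gamma: float,
    lin_pred: float = 0.0,
) -> float:
    """Likelihood of one subject's test results (1 = positive) at increasing visit times.

    Sums over the unobserved interval holding the event time:
    (0, t1], (t1, t2], ..., (tJ, infinity).
    """
    bounds = [0.0, *times, math.inf]
    total = 0.0
    for j in range(len(bounds) - 1):
        # Probability that the event falls in (bounds[j], bounds[j + 1]].
        p_event = weibull_surv(bounds[j], lam, gamma, lin_pred) - weibull_surv(
            bounds[j + 1], lam, gamma, lin_pred
        )
        p_results = 1.0
        for k, r in enumerate(results):
            # Test k (taken at times[k]) follows the event exactly when k >= j.
            p_pos = sens if k >= j else 1.0 - spec
            p_results *= p_pos if r == 1 else 1.0 - p_pos
        total += p_event * p_results
    return total


# Example: three visits, negative then positive then positive, with an imperfect test.
print(misclassified_lik([1.0, 2.0, 3.0], [0, 1, 1], sens=0.9, spec=0.95, lam=0.1, gamma=1.5))
```

With sensitivity and specificity both equal to 1, misclassified_lik collapses to S(1) - S(2), the ordinary interval-censored contribution for an event between the last negative and first positive visit. Summing over all possible result patterns gives 1, which is a quick sanity check on the construction.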


Included in

Biostatistics Commons