Date of Award


Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program


First Advisor

Anna Liu

Second Advisor

Daeyoung Kim

Third Advisor

John Staudenmayer

Subject Categories

Mathematics | Statistics and Probability


The first part of my thesis is concerned with estimation for longitudinal data using generalized semi-parametric mixed models and multilevel generalized linear mixed models (GLMMs) for a binary response. Likelihood-based inference is hindered because the marginal likelihood has no closed-form representation. Consequently, various numerical integration approaches have been proposed. We propose a spherical radial integration based approach that takes advantage of the hierarchical structure of the data, which we call the 2 SR method. Compared to Pinheiro and Chao's multilevel adaptive Gaussian quadrature, our proposed method has improved time complexity, with the number of function evaluations scaling linearly in the number of subjects and in the dimension of the random effects per level. Simulation studies show that the accuracy of our approach is similar to or better than that of Gauss-Hermite quadrature (GHQ) and better than that of penalized quasi-likelihood (PQL), especially for the variance components.

The second part of my thesis is concerned with identifying differentially expressed gene pathways (gene sets). We propose a logistic kernel machine to model the gene pathway effect with a binary response. Kernel machines were chosen because they account for gene interactions and clinical covariates. Furthermore, we establish a connection between our logistic kernel machine and GLMMs, which allows us to use ideas from the GLMM literature. For estimation and testing, we adopt Clarkson's spherical radial approach to perform the high-dimensional integrations. For estimation, our performance in simulation studies is comparable to or better than that of Bayesian approaches, at a much lower computational cost. For testing the genetic pathway effect, our restricted maximum likelihood (REML) likelihood ratio test has increased power compared to a score test for simulated non-linear pathways. Additionally, our approach has three main advantages over previous methodologies: 1) our testing approach is self-contained rather than competitive, 2) our kernel machine approach can model complex pathway effects and gene-gene interactions, and 3) we test for the pathway effect while adjusting for clinical covariates. Our work is motivated by the analysis of an acute lymphocytic leukemia data set, where we test for the genetic pathway effect and provide confidence intervals for the fixed effects.
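
To illustrate why likelihood-based inference in these models requires numerical integration, the following is a minimal sketch of the Gauss-Hermite quadrature baseline mentioned above, applied to the simplest case of a logistic model with a single random intercept per cluster. It is not the spherical radial method proposed in the thesis. The function name, argument names, and the random-intercept setup are assumptions made for illustration only.

```python
import numpy as np


def cluster_marginal_likelihood(y, eta_fixed, sigma, n_nodes=20):
    """Approximate one cluster's marginal likelihood by Gauss-Hermite quadrature.

    Illustrative model (an assumption, not taken from the thesis):
        y_j | b ~ Bernoulli(expit(eta_fixed_j + b)),   b ~ N(0, sigma^2)
    The marginal likelihood integrates the conditional likelihood over b,
    and this integral has no closed form.
    """
    y = np.asarray(y)
    eta_fixed = np.asarray(eta_fixed, dtype=float)

    # Nodes and weights for integrals of the form  int exp(-z^2) f(z) dz.
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)

    # Change of variables b = sqrt(2) * sigma * z maps the N(0, sigma^2)
    # density onto the exp(-z^2) weight, leaving a factor 1 / sqrt(pi).
    b = np.sqrt(2.0) * sigma * nodes

    # Linear predictor at every (observation, node) pair.
    eta = eta_fixed[:, None] + b[None, :]
    p = 1.0 / (1.0 + np.exp(-eta))

    # Conditional likelihood of the whole cluster at each node.
    cond_lik = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)

    return float(np.sum(weights * cond_lik) / np.sqrt(np.pi))
```

With q random effects per cluster, a product rule of this kind needs on the order of n_nodes^q function evaluations per cluster, which is the cost that motivates integration schemes scaling more favorably in the random-effects dimension.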