Author ORCID Identifier
0000-0003-0402-3771
Access Type
Open Access Dissertation
Document Type
dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Public Health
Year Degree Awarded
2022
Month Degree Awarded
February
First Advisor
Raji Balasubramanian
Second Advisor
Laura B. Balzer
Third Advisor
Roberta De Vito
Fourth Advisor
Susan E. Hankinson
Fifth Advisor
Laura D. Kubzansky
Subject Categories
Biostatistics
Abstract
Gaussian graphical models (GGMs) are useful network estimation tools for modeling direct dependencies that characterize multivariate data. The GGM modeling framework is one way to elucidate complex systems-level properties that can be difficult to detect in univariate analyses. In this dissertation, we begin by presenting a tutorial and review of the current state of the field of GGM theory and application. Next, we present a motivating application of GGMs in a study of metabolomic networks associated with chronic distress in women in the Women's Health Initiative (WHI) and in the Nurses' Health Study cohorts. In the third chapter, we present a tool called SpiderLearner, a SuperLearner-based ensemble method for GGM estimation that utilizes a range of existing GGM estimation approaches together with K-fold cross-validation to optimize a likelihood-based loss function. We show via simulation that SpiderLearner performs as well as or better than each individual method and present an application to risk prediction in ovarian cancer genomic data. In the fourth chapter, we present a factor analysis-based method that we have developed to estimate direct dependencies (GGMs) that are shared across studies (or conditions) and those that are study-specific in settings of multi-study data. We apply this method to analyze maternal response to an oral glucose tolerance test as assessed by targeted metabolomic profiles collected in the Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) study. We investigate differences in glucose metabolism across ancestry groups, constructing a GGM that is shared across four ancestry groups and a GGM specific to each of these.
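As a rough illustration of what "direct dependencies" means in a GGM, the sketch below converts an estimated precision (inverse covariance) matrix into partial correlations, which are the edge weights of the network. It is not the dissertation's code and does not implement SpiderLearner or the multi-study method. The simulated three-variable chain, the function name `partial_correlations`, and the use of an unregularized sample-covariance inverse are all assumptions made for this example; omics data with many variables would typically call for a regularized estimator instead.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision matrix into a matrix of partial correlations.

    The partial correlation between variables i and j, given all the
    others, is -P[i, j] / sqrt(P[i, i] * P[j, j]).
    """
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical toy data: a chain x0 -> x1 -> x2, so x0 and x2 are
# related only through x1 (no direct dependency between them).
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

# Unregularized estimate: invert the sample covariance matrix.
precision = np.linalg.inv(np.cov(data, rowvar=False))
pcor = partial_correlations(precision)

# Marginal (ordinary) correlations, for comparison.
marginal = np.corrcoef(data, rowvar=False)

print("marginal correlation x0-x2:", round(float(marginal[0, 2]), 3))
print("partial correlation  x0-x2:", round(float(pcor[0, 2]), 3))
```

In this toy setting the marginal correlation between `x0` and `x2` is substantial, while their partial correlation is close to zero. That gap is the kind of distinction a GGM is meant to capture and that univariate or pairwise analyses can miss.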
DOI
https://doi.org/10.7275/25702304.0
Recommended Citation
Shutta, Katherine H., "Gaussian Graphical Models for Omics Data: New Methodology and Applications" (2022). Doctoral Dissertations. 2474.
https://doi.org/10.7275/25702304.0
https://scholarworks.umass.edu/dissertations_2/2474