Author ORCID Identifier
https://orcid.org/0000-0002-6994-0557
Access Type
Open Access Dissertation
Document Type
dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Public Health
Year Degree Awarded
2021
Month Degree Awarded
September
First Advisor
Laura B. Balzer
Second Advisor
Raji Balasubramanian
Third Advisor
Iván Díaz
Subject Categories
Biostatistics | Data Science | Environmental Public Health | Epidemiology
Abstract
Many questions in public health and medicine are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. As a result, causal inference frameworks and methodologies have gained interest as promising tools to reliably answer scientific questions. However, the tasks of identifying and efficiently estimating causal effects from observed data still pose significant challenges under complex data generating scenarios. We focus on (1) high-dimensional settings, where the number of variables is orders of magnitude higher than the number of observations; and (2) multi-level settings, where study participants are grouped into clusters and the exposure is assigned at the cluster level. First, we propose a novel adaptation of the Super Learner algorithm for the task of feature selection in high-dimensional settings. In simulations and with real data, we demonstrate that our proposed approach improves accuracy in identifying potential causes of a target variable by using a novel measure of variable importance and by combining a library of feature selection algorithms. Second, we consider the task of estimating ‘biological age’ from a set of age-dependent variables of potentially high dimension (e.g., -omics data). We propose a new method for calculating biological age that is based on an adaptation of the algorithm presented in Chapter 2. Then, we develop an approach to evaluate, compare, and combine different methods of biological age estimation, with the goal of constructing age-related disease risk scores that could potentially aid in the diagnosis and prognosis of age-related diseases. Third, we turn our attention to causal mediation analysis in a multi-level setting where the exposure is assigned at the cluster level, but the mediator and outcomes are measured at the participant level. We extend the general hierarchical causal model to include mediating variables. We adapt the mediation effects that arise from the population intervention effect (PIE) via stochastic interventions on the exposure to the multi-level setting.
DOI
https://doi.org/10.7275/24318267
Recommended Citation
Saddiki, Hachem, "High-dimensional Feature Selection and Multi-level Causal Mediation Analysis with Applications to Human Aging and Cluster-based Intervention Studies" (2021). Doctoral Dissertations. 2327.
https://doi.org/10.7275/24318267
https://scholarworks.umass.edu/dissertations_2/2327
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Included in
Biostatistics Commons, Data Science Commons, Environmental Public Health Commons, Epidemiology Commons