
High-dimensional Feature Selection and Multi-level Causal Mediation Analysis with Applications to Human Aging and Cluster-based Intervention Studies

Abstract
Many questions in public health and medicine are fundamentally causal: the objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. As a result, causal inference frameworks and methodologies have gained interest as promising tools for reliably answering scientific questions. However, identifying and efficiently estimating causal effects from observed data still poses significant challenges under complex data-generating scenarios. We focus on (1) high-dimensional settings, where the number of variables is orders of magnitude larger than the number of observations; and (2) multi-level settings, where study participants are grouped into clusters and the exposure is assigned at the cluster level. First, we propose a novel adaptation of the Super Learner algorithm for the task of feature selection in high-dimensional settings. In simulations and with real data, we demonstrate that our proposed approach improves accuracy in identifying potential causes of a target variable by using a novel measure of variable importance and by combining a library of feature selection algorithms. Second, we consider the task of estimating ‘biological age’ from a set of age-dependent variables of potentially high dimension (e.g., -omics). We propose a new method for calculating biological age that is based on an adaptation of the algorithm presented in Chapter 2. Then, we develop an approach to evaluate, compare, and combine different methods of biological age estimation, with the goal of constructing age-related disease risk scores that could potentially aid in the diagnosis and prognosis of age-related diseases. Third, we turn our attention to causal mediation analysis in a multi-level setting where the exposure is assigned at the cluster level, but the mediator and outcomes are measured at the participant level. We extend the general hierarchical causal model to include mediating variables. We adapt to the multi-level setting the mediation effects that arise from the population intervention effect (PIE) via stochastic interventions on the exposure.
Type
openaccess
article
dissertation
Rights
License
http://creativecommons.org/licenses/by/4.0/