Author ORCID Identifier



Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded


Month Degree Awarded


First Advisor

David Jensen

Second Advisor

Ramesh Sitaraman

Third Advisor

Benjamin Marlin

Fourth Advisor

Krista Gile

Subject Categories

Artificial Intelligence and Robotics


Estimating the causal effect of a treatment from data has been a key goal for a large number of studies in many domains. Traditionally, researchers use carefully designed randomized experiments for causal inference. However, such experiments can be costly in terms of time and money, and they are infeasible for some causal questions. To overcome these challenges, researchers from diverse disciplines have developed methods for causal estimation from observational data, and studies using such methods account for an increasingly large share of empirical work. This growing interest has also brought together two arguably separate fields, machine learning and causal estimation, and this thesis contributes to their intersection. Specifically, in observational data researchers lack control over the data generation process. This results in a fundamental challenge: the presence of confounder variables (i.e., variables that affect both treatment and outcome). Such variables, when not adjusted for statistically, can result in biased causal estimates. When confounder variables are observed, many methods can be used to adjust for their effect. However, in most real-world observational data sets, accurately measuring all potential confounder variables is far from feasible, so important confounder variables are likely to remain unobserved. The central idea of this thesis is to explicitly account for unobserved confounders by inferring their values using a predictive model. This thesis presents three main contributions at the intersection of machine learning and causal estimation. First, we present one of the earliest applications of causal estimation methods from the social sciences to social media platforms, answering three causal questions. Second, we present a novel generative model for estimating ordinal variables with distant supervision. We also apply this model to data from the US Twitter user population and discover variation in behavior among users from different age groups. Third, we characterize the behavior of an effect restoration model based on graphical models with theoretical analysis and simulation studies. We also apply this effect restoration model with predictive models to account for unobserved confounder variables.
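
The confounding problem described in the abstract can be illustrated with a small simulation. The sketch below is not the method developed in the thesis. It assumes a single confounder that is fully observed, a linear outcome model, and a true treatment effect of 2.0. All function names, coefficients, and the sample size are hypothetical choices made for illustration. It compares a naive difference in means with an estimate that adjusts for the confounder by linear regression.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """Least-squares slope of y on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def residuals(x, y):
    """Residuals of y after a least-squares fit on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def simulate(n=20000, true_effect=2.0, seed=0):
    """Hypothetical data: confounder u raises both treatment odds and outcome."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    # Treatment is more likely when the confounder is large.
    t = [1.0 if ui + rng.gauss(0, 1) > 0 else 0.0 for ui in u]
    # Outcome depends on treatment and, more strongly, on the confounder.
    y = [true_effect * ti + 3.0 * ui + rng.gauss(0, 1) for ti, ui in zip(t, u)]
    return u, t, y


def naive_estimate(t, y):
    """Difference in mean outcome, ignoring the confounder."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1.0]
    control = [yi for ti, yi in zip(t, y) if ti == 0.0]
    return mean(treated) - mean(control)


def adjusted_estimate(u, t, y):
    """Coefficient on t in a regression of y on t and u.

    Computed by partialling u out of both t and y, then regressing
    the residuals on each other (Frisch-Waugh-Lovell theorem).
    """
    return slope(residuals(u, t), residuals(u, y))


u, t, y = simulate()
naive = naive_estimate(t, y)
adjusted = adjusted_estimate(u, t, y)
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this setup the naive estimate overstates the effect because treated units tend to have larger confounder values, while the adjusted estimate lands near the assumed true effect. The adjustment is only possible here because `u` is observed, which is exactly the assumption that fails in the setting the thesis addresses.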