Access Type
Open Access Dissertation
Document Type
dissertation
Degree Name
Doctor of Philosophy (PhD)
Degree Program
Computer Science
Year Degree Awarded
2019
Month Degree Awarded
May
First Advisor
Hanna Wallach
Subject Categories
Applied Statistics | Artificial Intelligence and Robotics | Categorical Data Analysis | International Relations | Probability | Statistical Methodology | Statistical Models
Abstract
Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific subset of Poisson factorization models that constrain $\mu$ to be a multilinear function of shared model parameters. This subset of models---hereafter referred to as allocative Poisson factorization (APF)---enjoys a significant computational advantage: posterior inference scales linearly with only the number of non-zero counts in the data set. A challenge in constructing APF models and performing inference in them is that the multilinear constraint on $\mu$---which must be non-negative, by the definition of the Poisson distribution---means that the shared model parameters must themselves be non-negative. Constructing models that capture the complex dependency structures inherent to social processes---e.g., networks with overlapping communities of actors or bursty temporal dynamics---without relying on the analytic convenience and tractability of the Gaussian distribution requires novel constructions of non-negative distributions---e.g., gamma and Dirichlet---and innovative posterior inference techniques. This thesis presents APF analogues of several widely used models, namely CP decomposition (Chapter 3), Tucker decomposition (Chapter 4), and linear dynamical systems (Chapters 5 and 6), and shows how to perform Bayesian inference in APF models under local differential privacy (Chapter 7).
Most of these chapters introduce novel auxiliary-variable augmentation schemes to facilitate posterior inference using both Markov chain Monte Carlo and variational inference algorithms. While the task of modeling international relations event data is a recurrent theme, the models presented are applicable to a wide range of tasks in many fields.
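To illustrate why inference cost can depend only on the non-zero counts, the following is a minimal sketch and not code from the dissertation. It assumes the simplest multilinear case, a two-mode (matrix) model with rate $\mu_{ij} = \sum_k \theta_{ik}\phi_{kj}$, and uses the standard Poisson-multinomial identity: each observed count can be split into per-component sub-counts by a multinomial draw, and a zero count has only zero sub-counts, so it needs no work. The function name `allocate_nonzero_counts`, the variable names `theta` and `phi`, the gamma-distributed factors, and the array sizes are all assumptions chosen for the example.

```python
import numpy as np


def allocate_nonzero_counts(Y, theta, phi, rng):
    """Split each non-zero count Y[i, j] into K latent sub-counts.

    Y:     (I, J) array of non-negative integer counts
    theta: (I, K) non-negative row factors (illustrative name)
    phi:   (K, J) non-negative column factors (illustrative name)
    Returns a dict mapping (i, j) -> length-K integer array.
    """
    rows, cols = np.nonzero(Y)  # zero counts are never visited
    allocations = {}
    for i, j in zip(rows, cols):
        # Component k contributes theta[i, k] * phi[k, j] to the rate.
        weights = theta[i, :] * phi[:, j]
        probs = weights / weights.sum()
        # Sub-counts are multinomial given their total Y[i, j].
        allocations[(int(i), int(j))] = rng.multinomial(Y[i, j], probs)
    return allocations


rng = np.random.default_rng(0)
theta = rng.gamma(1.0, 1.0, size=(50, 3))   # assumed gamma-distributed factors
phi = rng.gamma(1.0, 1.0, size=(3, 40))
Y = rng.poisson(0.05 * (theta @ phi))       # sparse synthetic counts
alloc = allocate_nonzero_counts(Y, theta, phi, rng)
```

The loop runs once per non-zero entry of `Y`, and the sub-counts for each entry sum back to the observed count. In a full sampler these sub-counts would then feed the updates of the shared parameters; that part is omitted here.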
DOI
https://doi.org/10.7275/14228535
Recommended Citation
Schein, Aaron, "Allocative Poisson Factorization for Computational Social Science" (2019). Doctoral Dissertations. 1656.
https://doi.org/10.7275/14228535
https://scholarworks.umass.edu/dissertations_2/1656
Included in
Applied Statistics Commons, Artificial Intelligence and Robotics Commons, Categorical Data Analysis Commons, International Relations Commons, Probability Commons, Statistical Methodology Commons, Statistical Models Commons