Author ORCID Identifier

https://orcid.org/0000-0002-5507-2904

Access Type

Open Access Dissertation

Document Type

dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2019

Month Degree Awarded

May

First Advisor

Hanna Wallach

Subject Categories

Applied Statistics | Artificial Intelligence and Robotics | Categorical Data Analysis | International Relations | Probability | Statistical Methodology | Statistical Models

Abstract

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \textrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific subset of Poisson factorization models that constrain $\mu$ to be a multilinear function of shared model parameters. This subset of models---hereafter referred to as allocative Poisson factorization (APF)---enjoys a significant computational advantage: posterior inference scales linearly with only the number of non-zero counts in the data set. A challenge to constructing and performing inference in APF models is that the multilinear constraint on $\mu$---which must be non-negative, by the definition of the Poisson distribution---means that the shared model parameters must themselves be non-negative. Constructing models that capture the complex dependency structures inherent to social processes---e.g., networks with overlapping communities of actors or bursty temporal dynamics---without relying on the analytic convenience and tractability of the Gaussian distribution requires novel constructions of non-negative distributions---e.g., gamma and Dirichlet---and innovative posterior inference techniques. This thesis presents the APF analogues of several widely used models, namely CP decomposition (Chapter 3), Tucker decomposition (Chapter 4), and linear dynamical systems (Chapters 5 and 6), and shows how to perform Bayesian inference in APF models under local differential privacy (Chapter 7). Most of these chapters introduce novel auxiliary-variable augmentation schemes to facilitate posterior inference using both Markov chain Monte Carlo and variational inference algorithms. While the task of modeling international relations event data is a recurrent theme, the models presented are applicable to a wide range of tasks in many fields.
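The claim that inference cost depends only on the non-zero counts can be illustrated with a small sketch. It relies on a standard property of the Poisson distribution: a count with rate $\mu = \sum_k \theta_{ik} \phi_{kj}$ can be viewed as a sum of $K$ independent Poisson counts, and, conditioned on the observed total, those $K$ counts are multinomially distributed with probabilities proportional to $\theta_{ik} \phi_{kj}$. A zero count therefore needs no allocation work at all. The code below is an illustration under simplifying assumptions (a two-mode count matrix, fixed factor values, and the hypothetical names `allocate_counts`, `theta`, `phi`, and `nonzero_counts`); it is not taken from the thesis and omits the parameter updates a full sampler would need.

```python
import random
from collections import Counter


def allocate_counts(nonzero_counts, theta, phi, rng):
    """Split each non-zero count across K latent components.

    nonzero_counts: dict mapping (i, j) -> positive integer count (sparse storage).
    theta: list of I rows, each a list of K non-negative factors (hypothetical values).
    phi: list of K rows, each a list of J non-negative factors (hypothetical values).
    Returns a dict mapping (i, j) -> list of K component counts summing to the original.
    """
    allocations = {}
    # The loop visits only the stored non-zero entries, never the zeros.
    for (i, j), y in nonzero_counts.items():
        num_components = len(theta[i])
        # Component rates; the multilinear constraint keeps them non-negative.
        rates = [theta[i][k] * phi[k][j] for k in range(num_components)]
        # Drawing y categorical labels and tallying them is a multinomial draw.
        draws = rng.choices(range(num_components), weights=rates, k=y)
        tally = Counter(draws)
        allocations[(i, j)] = [tally[k] for k in range(num_components)]
    return allocations


rng = random.Random(0)
theta = [[0.9, 0.1], [0.2, 1.5], [0.4, 0.4]]          # 3 rows, 2 components
phi = [[1.0, 0.3, 0.0, 2.0], [0.1, 0.8, 1.2, 0.5]]    # 2 components, 4 columns
nonzero_counts = {(0, 0): 3, (1, 2): 5, (2, 3): 1}    # all other entries are zero
allocations = allocate_counts(nonzero_counts, theta, phi, rng)
```

In this toy input the matrix has twelve entries but only three are visited. The entry `(1, 2)` also shows the effect of a zero factor: because `phi[0][2]` is zero, all five of its events are assigned to the second component.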

DOI

https://doi.org/10.7275/14228535
