## Doctoral Dissertations

Dissertations that have an embargo placed on them will not be available to anyone until the embargo expires.

#### Author ORCID Identifier

https://orcid.org/0000-0002-5507-2904

#### Document Type

Open Access Dissertation

#### Degree Name

Doctor of Philosophy (PhD)

#### Degree Program

Computer Science

#### Year Degree Awarded

2019

#### Month Degree Awarded

May

#### First Advisor

Hanna Wallach

#### Subject Categories

Applied Statistics | Artificial Intelligence and Robotics | Categorical Data Analysis | International Relations | Probability | Statistical Methodology | Statistical Models

#### Abstract

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \textrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific subset of Poisson factorization models that constrain $\mu$ to be a multilinear function of shared model parameters. This subset of models, hereafter referred to as allocative Poisson factorization (APF), enjoys a significant computational advantage: posterior inference scales linearly with only the number of non-zero counts in the data set. A challenge to constructing and performing inference in APF models is that the multilinear constraint on $\mu$ (which must be non-negative, by the definition of the Poisson distribution) means that the shared model parameters must themselves be non-negative. Constructing models that capture the complex dependency structures inherent to social processes (e.g., networks with overlapping communities of actors or bursty temporal dynamics) without relying on the analytic convenience and tractability of the Gaussian distribution requires novel constructions of non-negative distributions (e.g., gamma and Dirichlet) and innovative posterior inference techniques. This thesis presents the APF analogues of several widely used models, namely CP decomposition (Chapter 3), Tucker decomposition (Chapter 4), and linear dynamical systems (Chapters 5 and 6), and shows how to perform Bayesian inference in APF models under local differential privacy (Chapter 7).
Most of these chapters introduce novel auxiliary-variable augmentation schemes to facilitate posterior inference using both Markov chain Monte Carlo and variational inference algorithms. While the task of modeling international relations event data is a recurrent theme, the models presented are applicable to a wide range of tasks in many fields.
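
One way to see why a multilinear rate lets computation depend only on the non-zero counts is sketched below for the simplest two-mode (matrix) case, where $\mu_{ij} = \sum_k \theta_{ik}\,\phi_{kj}$. This is an illustration under stated assumptions, not the inference algorithm from the thesis: the names `theta`, `phi`, `loglik_dense`, and `loglik_sparse` are hypothetical, the toy data are made up, and the sketch only evaluates the Poisson log-likelihood rather than performing posterior inference. The point is that the sum of all rates factorizes, so cells with a zero count never need to be visited individually.

```python
import math

def rate(theta, phi, i, j):
    """Multilinear rate mu_ij = sum_k theta[i][k] * phi[k][j]."""
    return sum(t * phi_k[j] for t, phi_k in zip(theta[i], phi))

def loglik_dense(Y, theta, phi):
    """Poisson log-likelihood computed by visiting every cell, zeros included."""
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            mu = rate(theta, phi, i, j)
            total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total

def loglik_sparse(nonzeros, theta, phi):
    """Same quantity, visiting only the non-zero counts.

    nonzeros: list of (i, j, y) triples with y > 0.
    """
    K = len(phi)
    # Multilinearity: sum_ij mu_ij = sum_k (sum_i theta_ik) * (sum_j phi_kj),
    # so the "-mu" terms for all cells cost O((I + J) * K), not O(I * J * K).
    theta_sums = [sum(row[k] for row in theta) for k in range(K)]
    phi_sums = [sum(phi[k]) for k in range(K)]
    total = -sum(a * b for a, b in zip(theta_sums, phi_sums))
    # A zero count contributes 0 * log(mu) - log(0!) = 0, so only non-zeros remain.
    for i, j, y in nonzeros:
        total += y * math.log(rate(theta, phi, i, j)) - math.lgamma(y + 1)
    return total

# Toy data (made up for illustration): 3 x 4 count matrix, K = 2 components.
theta = [[0.5, 1.0], [2.0, 0.1], [0.3, 0.7]]
phi = [[1.0, 0.2, 0.0, 0.4], [0.1, 0.0, 3.0, 0.5]]
Y = [[0, 0, 3, 0], [2, 0, 0, 1], [0, 0, 0, 0]]
nonzeros = [(i, j, y) for i, row in enumerate(Y) for j, y in enumerate(row) if y > 0]

dense_value = loglik_dense(Y, theta, phi)
sparse_value = loglik_sparse(nonzeros, theta, phi)
```

Here `dense_value` and `sparse_value` agree up to floating-point error, even though the second function loops over only three of the twelve cells. The same factorization argument extends to more than two modes, which is the setting of tensor decompositions such as CP and Tucker.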
