Author ORCID Identifier


Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program


Year Degree Awarded


Month Degree Awarded


First Advisor

Krista Gile

Subject Categories

Applied Statistics | Statistical Methodology | Statistical Models


Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City

There is great interest in finding meaningful subgroups in attributed network data. Many methods are available for clustering complete networks. Unfortunately, much network data is collected through sampling and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations by tracing links in the underlying, unobserved social network. The resulting data therefore have a tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach that adjusts mixture models for general network clustering to account for sampling by RDS. We apply our model to data on opioid users in New York City and detect communities reflecting group characteristics of interest for intervention activities, including drug use patterns, social connections, and other community variables.

Nested Dirichlet Process for population size estimation from multi-list recapture data

Heterogeneity of response patterns is important in estimating the size of a closed population from multi-list recapture data when capture patterns differ over time and location. In this paper, we extend the one-layer Dirichlet Process mixture model proposed by Manrique-Vallier (2016) to a Nested Dirichlet Process model, with the first layer modeling individual heterogeneity and the second layer modeling location-time differences. In the Nested Dirichlet Process mixture model, location-time groups with similar recording patterns belong to the same top-layer latent class, and individuals within it are dependent. The model thereby incorporates hierarchical heterogeneity into the estimation of population size from multi-list recapture data.
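As a rough illustration of the layered structure described above, the sketch below draws mixture weights with a truncated stick-breaking construction, a common way to represent Dirichlet Process priors. It is not taken from the dissertation: the function names, the truncation levels, and the concentration parameters are all assumptions made for illustration, and the abstract does not state how the layers are actually specified or fitted.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw `truncation` mixture weights from a truncated stick-breaking prior.

    alpha is the concentration parameter (assumed value, chosen by the caller).
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a class
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last class takes whatever is left
    return weights


def nested_weights(alpha_top, alpha_bottom, top_classes, bottom_classes, rng):
    """Hypothetical two-layer draw: one set of weights over top-layer classes
    (location-time groups) and, for each top-layer class, its own set of
    weights over individual-level latent classes."""
    top = stick_breaking_weights(alpha_top, top_classes, rng)
    bottom = [
        stick_breaking_weights(alpha_bottom, bottom_classes, rng)
        for _ in range(top_classes)
    ]
    return top, bottom


rng = random.Random(0)
top, bottom = nested_weights(1.0, 1.0, top_classes=3, bottom_classes=5, rng=rng)
```

Each top-layer class carries its own distribution over individual-level classes, which is one way location-time groups in the same top-layer class could share recording patterns.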

Bayesian non-parametric latent class model for population size estimation and missing covariate imputation in multi-source recapture data

In the Bayesian non-parametric latent class model for multi-list recapture data (LCMCR), different recording patterns across latent classes are used to reflect individual heterogeneity when covariates are not available. In this paper, we add covariates, assuming that capture patterns and covariates are independent given the latent classes. In this way, individuals in each latent class are similar in both capture patterns and covariate distributions. When capture patterns and covariates are strongly associated, individual attributes reduce the uncertainty of the latent class assignments and thus the uncertainty of the population size estimate. By comparing the latent classes, we can better understand how capture patterns relate to individual characteristics. Because some covariate values are missing, we apply data augmentation to impute them during MCMC parameter estimation.
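The conditional independence assumption above can be made concrete with a small sketch. Within each latent class, the probability of an individual's record is the class weight times the capture-pattern probability times the covariate probabilities, and a missing covariate simply drops out of the product. The sketch uses a fixed, finite number of classes and categorical covariates; the function names, data layout, and numeric values are assumptions for illustration, not the dissertation's implementation.

```python
from math import prod


def class_terms(captures, covariates, weights, capture_probs, covariate_probs):
    """Per-class joint probability of one individual's record.

    captures: 0/1 indicator per list.
    covariates: category index per covariate, or None if missing.
    weights[k]: probability of latent class k.
    capture_probs[k][j]: P(captured by list j | class k).
    covariate_probs[k][c][v]: P(covariate c equals v | class k).
    """
    terms = []
    for k, weight in enumerate(weights):
        p_capture = prod(
            p if x == 1 else 1.0 - p
            for x, p in zip(captures, capture_probs[k])
        )
        # Conditional independence: covariates multiply in; missing ones are skipped.
        p_covariates = prod(
            covariate_probs[k][c][v]
            for c, v in enumerate(covariates)
            if v is not None
        )
        terms.append(weight * p_capture * p_covariates)
    return terms


def class_posterior(captures, covariates, weights, capture_probs, covariate_probs):
    """Posterior probability of each latent class given the observed record."""
    terms = class_terms(captures, covariates, weights, capture_probs, covariate_probs)
    total = sum(terms)
    return [t / total for t in terms]


# Illustrative values only: two classes, two lists, one binary covariate.
weights = [0.5, 0.5]
capture_probs = [[0.9, 0.8], [0.1, 0.2]]
covariate_probs = [[[0.7, 0.3]], [[0.2, 0.8]]]

with_covariate = class_posterior([1, 1], [0], weights, capture_probs, covariate_probs)
without_covariate = class_posterior([1, 1], [None], weights, capture_probs, covariate_probs)
```

In this toy setting the observed covariate makes the class assignment more certain than the capture pattern alone, which is the mechanism by which covariates can reduce uncertainty when they are strongly associated with capture patterns.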


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.