Author ORCID Identifier



Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program


Year Degree Awarded


Month Degree Awarded


First Advisor

Krista Gile

Second Advisor

Michael Lavine

Third Advisor

John Staudenmayer

Fourth Advisor

Leontine Alkema

Subject Categories

Biostatistics | Design of Experiments and Sample Surveys | Statistical Methodology | Statistical Models | Statistical Theory


The objective of this thesis is to develop methods to make inference about the prevalence of an outcome of interest in hard-to-reach populations. The proposed methods address issues specific to the survey strategies employed to access those populations.

One common sampling methodology used in this context is respondent-driven sampling (RDS). Under RDS, the network connecting members of the target population is used to reach its hidden members. Specialized techniques are then used to make inference from the data collected in this fashion. Our first objective is to correct traditional RDS prevalence estimators and their associated uncertainty estimators for misclassification of the outcome variable.
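
As a rough illustration of the kind of correction involved, the sketch below combines two standard textbook tools: the inverse-degree weighted (Volz-Heckathorn, or RDS-II) prevalence estimator and the Rogan-Gladen adjustment for a test with known sensitivity and specificity. This is not the estimator developed in the thesis. The function names, the toy data, and the assumption that sensitivity and specificity are known constants are all hypothetical choices made for illustration.

```python
def rds_ii_prevalence(outcomes, degrees):
    """Inverse-degree weighted (RDS-II style) prevalence estimate.

    outcomes: 0/1 observed outcome for each respondent.
    degrees:  self-reported network size of each respondent (must be > 0).
    """
    if len(outcomes) != len(degrees) or not outcomes:
        raise ValueError("outcomes and degrees must be non-empty and equal in length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


def rogan_gladen(observed, sensitivity, specificity):
    """Adjust an observed prevalence for outcome misclassification.

    Assumes sensitivity and specificity are known (an illustrative assumption)
    and that the test is informative, i.e. sensitivity + specificity > 1.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed + specificity - 1.0) / denom
    # Truncate to [0, 1], since the raw adjustment can fall outside that range.
    return min(1.0, max(0.0, corrected))


# Hypothetical toy data: four respondents with observed outcomes and degrees.
observed = rds_ii_prevalence([1, 0, 1, 0], [2, 4, 4, 2])
adjusted = rogan_gladen(observed, sensitivity=0.90, specificity=0.95)
```

The truncation to the unit interval is a pragmatic choice for the sketch. The uncertainty estimate would also need adjusting, and that is not shown here.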

RDS also has the unusual characteristic that participants drive the sampling process by recruiting other members of the population into the survey. Because the researchers forfeit control over the sampling process, the estimators are highly susceptible to biases induced by participants' behavior. Our second objective is therefore to provide a mathematical parametrization for a behavior referred to as differential recruitment and subsequently to adjust the inference for the bias it may induce.

Finally, a common issue in the application motivating this thesis, HIV prevalence estimation, is the derivation of a national prevalence estimate. Data are often collected at different study sites within a given country. Public health officials, however, commonly report national prevalence. Our last objective is therefore to use Bayesian hierarchical models to derive a national prevalence estimator from regional data.