Inference from network data in hard-to-reach populations

The objective of this thesis is to develop methods for making inference about the prevalence of an outcome of interest in hard-to-reach populations. The proposed methods address issues specific to the survey strategies used to access those populations. One sampling methodology commonly used in this context is respondent-driven sampling (RDS). Under RDS, the network connecting members of the target population is used to uncover its hidden members, and specialized techniques are then used to make inference from the data collected in this fashion. Our first objective is to correct traditional RDS prevalence estimators, and their associated uncertainty estimators, for misclassification of the outcome variable. RDS also has the unusual characteristic that participants drive the sampling process by recruiting other members into the survey. Because researchers forfeit control over the sampling process, the estimators are highly susceptible to biases induced by participants' behavior. Our second objective is therefore to provide a mathematical parametrization of one such behavior, referred to as differential recruitment, and to adjust the inference for the bias it may induce. Finally, a common issue in the application motivating this thesis, HIV prevalence estimation, is the derivation of a national prevalence estimate. Data are often collected at different study sites within a given country, whereas public health officials commonly report national prevalence. Our last objective is therefore to use Bayesian hierarchical models to derive a national prevalence estimator from regional data.
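As a rough illustration of the first objective, the sketch below combines two standard textbook tools: the inverse-degree-weighted RDS prevalence estimator (often called RDS-II or Volz-Heckathorn) and the Rogan-Gladen correction for a test with known sensitivity and specificity. This is not the method developed in the thesis, which the abstract does not specify. The function names, the toy data, and the assumption that misclassification is non-differential with known error rates are all hypothetical choices made for the example.

```python
def rds_ii_prevalence(outcomes, degrees):
    """Inverse-degree-weighted (RDS-II style) prevalence estimate.

    outcomes: observed binary outcomes (0 or 1), one per respondent.
    degrees:  self-reported network sizes (positive), one per respondent.
    """
    if len(outcomes) != len(degrees) or not outcomes:
        raise ValueError("outcomes and degrees must be non-empty and equal length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


def rogan_gladen(p_observed, sensitivity, specificity):
    """Correct an observed prevalence for outcome misclassification.

    Assumes sensitivity and specificity are known and do not vary
    across respondents (an assumption of this sketch).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    corrected = (p_observed + specificity - 1.0) / denom
    # Truncate to [0, 1], since the raw correction can fall outside it.
    return min(1.0, max(0.0, corrected))


# Hypothetical toy sample: four respondents.
observed = rds_ii_prevalence(outcomes=[1, 0, 1, 0], degrees=[2, 4, 4, 2])
corrected = rogan_gladen(observed, sensitivity=0.90, specificity=0.95)
```

The point estimate alone is not enough in practice: the abstract notes that the associated uncertainty estimators also need correcting, which this sketch does not attempt.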