Date of Award


Access Type

Open Access Dissertation

Document type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Public Health

First Advisor

Andrea S. Foulkes

Second Advisor

John P. Buonaccorsi

Third Advisor

John Staudenmayer

Subject Categories

Public Health | Statistics and Probability


Characterizing the genetic contributors to complex disease traits will inevitably require consideration of haplotypic phase, the specific alignment of alleles on a single homologous chromosome. In population-based studies, however, phase is generally unobservable, because standard genotyping techniques provide investigators only with unphased genotype data. Several statistical methods have been described for estimating haplotype frequencies and their association with a trait in the context of phase ambiguity. These methods are limited, however, to diploid populations, in which each individual has exactly two homologous chromosomes, and are thus not suitable for more general infectious disease settings. Specifically, in the context of malaria and HIV, the number of infections is also unknown. In addition, for both diploid and non-diploid settings, the challenges of high dimensionality and an unknown model of association remain. Our research includes: (1) extending the expectation-maximization approach of Excoffier and Slatkin to address the challenges of unobservable phase and unknown numbers of infections; (2) extending the method of Lake et al. to estimate simultaneously both haplotype frequencies and haplotype-trait associations in non-diploid settings; and (3) applying two Bayesian approaches to the mixed modeling framework with unobservable cluster (haplotype) identifiers, to address the challenges associated with high-dimensional data. Simulation studies are presented, as well as applications to data arising from a cohort of children multiply infected with malaria and a cohort of HIV-infected individuals at risk for antiretroviral-associated dyslipidemia. This research is joint work with Drs. S.M. Rich, R.M. Yucel, J. Staudenmayer and A.S. Foulkes.
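To make the phase-ambiguity problem concrete, the following is a minimal sketch of the standard diploid expectation-maximization idea that the abstract takes as its starting point. It is not the dissertation's extension to unknown numbers of infections. It assumes biallelic loci, genotypes coded as per-locus counts (0, 1, or 2) of one allele, and Hardy-Weinberg equilibrium; the function names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts (0, 1, or 2) per biallelic locus.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous loci are fixed (0 -> allele 0, 2 -> allele 1);
    # heterozygous loci are placeholders that are overwritten below.
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased diploid genotypes by EM."""
    pairs_by_subject = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted(
        {h for pairs in pairs_by_subject for pair in pairs for h in pair}
    )
    # Start from equal frequencies over all haplotypes seen in any compatible pair.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pairs_by_subject:
            # E-step: posterior weight of each compatible pair under
            # Hardy-Weinberg equilibrium (factor 2 for distinct haplotypes).
            weights = [
                freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs
            ]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the 2N chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Two loci: homozygotes are unambiguous, double heterozygotes (1, 1) are not.
example_genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
example_freq = em_haplotype_frequencies(example_genotypes)
```

In this toy data set the double heterozygotes could carry either haplotypes 00 and 11 or haplotypes 01 and 10; the unambiguous homozygotes pull the estimate toward the first resolution. The sketch fixes the number of chromosomes per individual at two, which is exactly the assumption the abstract describes as unsuitable for multiply infected individuals.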