Approaches to Estimation of Haplotype Frequencies and Haplotype-trait Associations

Xiaohong Li, University of Massachusetts Amherst


Characterizing the genetic contributors to complex disease traits will inevitably require consideration of haplotypic phase, the specific alignment of alleles on a single homologous chromosome. In population-based studies, however, phase is generally unobservable, as standard genotyping techniques provide investigators only with data on unphased genotypes. Several statistical methods have been described for estimating haplotype frequencies and their association with a trait in the context of phase ambiguity. These methods are limited, however, to diploid populations, in which individuals have exactly two homologous chromosomes each, and are thus not suitable for more general infectious disease settings. Specifically, in the context of malaria and HIV, the number of infections is also unknown. In addition, for both diploid and non-diploid settings, the challenges of high dimensionality and an unknown model of association remain. Our research includes: (1) extending the expectation-maximization approach of Excoffier and Slatkin to address the challenges of unobservable phase and an unknown number of infections; (2) extending the method of Lake et al. to estimate simultaneously both haplotype frequencies and haplotype-trait associations in non-diploid settings; and (3) applying two Bayesian approaches to the mixed modeling framework with unobservable cluster (haplotype) identifiers, to address the challenges associated with high-dimensional data. Simulation studies are presented, as well as applications to data arising from a cohort of children multiply infected with malaria and a cohort of HIV-infected individuals at risk for antiretroviral-associated dyslipidemia. This research is joint work with Drs. S.M. Rich, R.M. Yucel, J. Staudenmayer and A.S. Foulkes.
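
For readers unfamiliar with the diploid baseline that this work extends, the following is a minimal sketch of expectation-maximization estimation of haplotype frequencies from unphased genotypes, in the spirit of Excoffier and Slatkin. It is an illustration only, not the dissertation's method: it assumes biallelic loci coded as minor-allele counts (0, 1, 2), exactly two chromosomes per individual, and Hardy-Weinberg proportions, and it does not handle an unknown number of infections. The function names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of minor-allele counts per locus (0, 1 or 2).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci are fixed; het loci filled below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # one copy of each allele at a heterozygous locus
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM (diploid, Hardy-Weinberg assumed)."""
    pairs_per_subject = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pairs_per_subject for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pairs in pairs_per_subject:
            # E-step: posterior weight of each compatible pair given current frequencies
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq
```

A double heterozygote such as `(1, 1)` is the phase-ambiguous case: it is compatible with either the pair 00/11 or the pair 01/10, and the E-step splits its contribution between the two according to the current frequency estimates.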