Date of Award

5-13-2011

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

First Advisor

Oliver Brock

Second Advisor

Lila Gierasch

Third Advisor

David Kulp

Subject Categories

Computer Sciences

Abstract

The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. Conformation space search methods thus have to focus exploration on a small fraction of the search space. The ability to choose appropriate regions, i.e. regions that are highly likely to contain the native state, critically impacts the effectiveness of search. To make the choice of where to explore requires information, with higher quality information resulting in better choices. Most current search methods are designed to work in as many domains as possible, which leads to less accurate information because of the need for generality. However, most domains provide unique, and accurate information. To best utilize domain specific information search needs to be customized for each domain. The first contribution of this thesis customizes search for protein structure prediction, resulting in significantly more accurate protein structure predictions.

Unless information is perfect, mistakes will be made, and search will focus on regions that do not contain the native state. How search recovers from mistakes is critical to its effectiveness. To recover from mistakes, this thesis introduces the concept of adaptive balancing of exploitation with exploration. Adaptive balancing of exploitation with exploration allows search to use information only to the extent to which it guides exploration toward the native state. Existing methods of protein structure prediction rely on information from known proteins. Currently, this information is from either full-length proteins that share similar sequences, and hence have similar structures (homologs), or from short protein fragments. Homologs and fragments represent two extremes on the spectrum of information from known proteins. Significant additional information can be found between these extremes. However, current protein structure prediction methods are unable to use information between fragments and homologs because it is difficult to identify the correct information from the enormous amount of incorrect information. This thesis makes it possible to use information between homologs and fragments by adaptively balancing exploitation with exploration in response to an estimate of template protein quality. My results indicate that integrating the information between homologs and fragments significantly improves protein structure prediction accuracy, resulting in several proteins predicted with <1>°A RMSD resolution.

Share

COinS