First Principles Prediction of Protein Tertiary Structures

Joseph F. Danzer, Derek A. Debe, and William A. Goddard III

Predicting a protein's native tertiary structure from sequence alone remains one of the great unsolved problems of molecular biology. We have developed a fast hierarchical prediction method to tackle this problem. In the first stage, a fast off-lattice Monte Carlo algorithm generates a large ensemble of structures using a Cα-only model of the protein. Successive stages of the hierarchy select the best structures from this ensemble based on an energy that measures how well a structure packs its hydrophobic core. The selected structures are then refined by energy minimization with an increasingly detailed force field. Final structural refinement uses fold recognition techniques to draw structural information from the Protein Data Bank. Results for a test set of 40 proteins are presented. On a single processor, the total prediction time for one protein averages 5 hours. The method accurately predicts the overall fold of a protein and reproduces its native structure with moderate accuracy. One of our more successful predictions, for a 113-residue protein, is shown in Figure 1.

 

Figure 1. Comparison of the predicted structure to the native structure for the 113-residue protein 1hmdA. The native structure is shown in green, and the 4.0 Å predicted structure is shown in blue.
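
The first two stages of the hierarchy (generating an off-lattice Cα-only ensemble, then selecting the structures with the best-packed hydrophobic core) can be illustrated with a minimal sketch. It is not the authors' algorithm or energy function. It assumes a simple self-avoiding random chain growth in place of the Monte Carlo sampler, a fixed 3.8 Å Cα-Cα virtual bond, a 4.0 Å minimum separation between non-bonded Cα atoms, an illustrative hydrophobic residue set, and the radius of gyration of the hydrophobic residues as a stand-in packing score. All function and parameter names are hypothetical.

```python
import math
import random

BOND = 3.8                      # assumed Cα-Cα virtual bond length, in Å
MIN_DIST = 4.0                  # assumed minimum non-bonded Cα-Cα distance, in Å
HYDROPHOBIC = set("AVLIMFWC")   # illustrative hydrophobic residue set (assumption)


def random_unit_vector(rng):
    """Return a direction drawn uniformly from the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def grow_chain(n, rng, max_tries=100):
    """Grow a self-avoiding off-lattice chain of n Cα positions.

    Returns None if a residue cannot be placed within max_tries attempts.
    """
    chain = [(0.0, 0.0, 0.0)]
    while len(chain) < n:
        for _ in range(max_tries):
            d = random_unit_vector(rng)
            p = tuple(c + BOND * u for c, u in zip(chain[-1], d))
            # Reject placements that clash with any non-bonded residue.
            if all(math.dist(p, q) >= MIN_DIST for q in chain[:-1]):
                chain.append(p)
                break
        else:
            return None
    return chain


def hydrophobic_rg(chain, seq):
    """Radius of gyration of the hydrophobic residues (lower = more compact)."""
    pts = [p for p, aa in zip(chain, seq) if aa in HYDROPHOBIC]
    if not pts:
        return 0.0
    centroid = tuple(sum(c) / len(pts) for c in zip(*pts))
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in pts) / len(pts))


def generate_and_select(seq, n_structures=200, keep=10, seed=0):
    """Generate an ensemble and keep the structures with the lowest score."""
    rng = random.Random(seed)
    ensemble = []
    while len(ensemble) < n_structures:
        chain = grow_chain(len(seq), rng)
        if chain is not None:
            ensemble.append(chain)
    ensemble.sort(key=lambda c: hydrophobic_rg(c, seq))
    return ensemble[:keep]
```

Calling `generate_and_select("MKVLAAGIVFWE", n_structures=50, keep=5)` returns five Cα traces ordered from most to least compact hydrophobic core. In the method described above, structures selected at this stage would then go on to force-field refinement and fold recognition, which this sketch does not attempt.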