Although the variables discussed in the previous section could be tuned to specific problems, the same values were used for six different proteins, ranging from the 46-residue crambin to the 153-residue myoglobin. These proteins are listed in Table . Their structures differ widely, as indicated by the percentages of residues in α-helical and β-sheet secondary structure. Four of the six proteins are included in the subset of crystal structures used to develop the Monte Carlo probability grids (see Section ). Of the other two, the flavodoxin structure used here is simply a different form (oxidized) from the one in the dataset (semiquinone), while the plastocyanin studied is homologous, but not identical, to the structure in the dataset.
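The secondary-structure percentages used to characterize these proteins are simple residue fractions. A minimal sketch, assuming a DSSP-style string with one character per residue, 'H' for α-helix and 'E' for β-strand; the helper `ss_composition` and the toy string are hypothetical illustrations, not data from this study:

```python
# Hypothetical helper: secondary-structure composition from a DSSP-style
# string. Assumption: 'H' marks alpha-helix, 'E' marks beta-strand, and
# every other character is counted as "other".


def ss_composition(ss: str) -> dict:
    """Return the percentage of residues in helix, sheet, and other states."""
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary-structure string")
    helix = ss.count("H")
    sheet = ss.count("E")
    return {
        "helix": 100.0 * helix / n,
        "sheet": 100.0 * sheet / n,
        "other": 100.0 * (n - helix - sheet) / n,
    }


# Toy 10-residue assignment, not taken from any real protein.
print(ss_composition("CHHHHEEECC"))
```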
For each of these six proteins, the Cα coordinates from the listed crystal structure were used to rebuild the backbone conformation twenty times, as described in the preceding sections for crambin. In each case, all prosthetic groups, such as the myoglobin heme, were removed from the crystal structure, as were any cofactors and solvent molecules. Each of the twenty backbone conformations was compared with the crystal structure. Table lists the average rms deviation and the standard deviation (σ) over the twenty structures, together with the rms deviations of the lowest-energy conformation and of the best-fitting conformation. As with crambin, the lowest-energy conformation is never the one with the best fit to the crystal structure. Encouragingly, however, the lowest-energy conformation is better than average for five of the six proteins.
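The per-protein statistics described above (mean and σ of the rms deviation over twenty rebuilds, plus the lowest-energy and best-fit values) can be computed as in the following sketch. It assumes, as a simplification, that no superposition is needed because each model is built on the crystal Cα trace and so already shares its frame, and that σ is the sample standard deviation; the names `rmsd`, `Trial`, and `summarize` are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass

Coord = tuple[float, float, float]


def rmsd(model: list[Coord], reference: list[Coord]) -> float:
    """Root-mean-square coordinate deviation, with no superposition step.

    Assumption: the model was built on the crystal C-alpha trace, so it
    already lies in the reference frame.
    """
    if not model or len(model) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((a - b) ** 2 for p, q in zip(model, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(model))


@dataclass
class Trial:
    energy: float  # energy of one rebuilt conformation (units irrelevant here)
    rmsd: float    # its rms deviation from the crystal structure, in angstroms


def summarize(trials: list[Trial]) -> dict:
    """Mean, sigma, lowest-energy and best-fit rms deviations over the trials."""
    devs = [t.rmsd for t in trials]
    mean = statistics.mean(devs)
    lowest = min(trials, key=lambda t: t.energy)
    return {
        "mean": mean,
        "sigma": statistics.stdev(devs),  # sample standard deviation (assumed)
        "lowest_energy_rmsd": lowest.rmsd,
        "best_fit_rmsd": min(devs),
        "lowest_energy_better_than_mean": lowest.rmsd < mean,
    }


# Three toy trials (not data from this study).
trials = [Trial(-10.0, 0.70), Trial(-12.0, 0.50), Trial(-11.0, 0.45)]
print(summarize(trials))
```

In the toy data the lowest-energy trial (0.50 Å) beats the mean but is not the best fit (0.45 Å), the same pattern reported above for the real proteins.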
Comparing Tables and , it is clear that the size of the protein has little effect on the accuracy of Phase 1. In fact, the largest protein, myoglobin, is consistently modeled most accurately. This is not surprising given the crambin results, where the average backbone deviation was approximately 0.2 Å for helical residues: myoglobin, with almost 80% of its residues in helices, benefits greatly from the accuracy with which the method models helices. Plastocyanin is also modeled relatively well, even though it is a β-sheet protein with little helical content. Its large β-sheet content is probably also a favorable factor, since these conformations are likewise well represented by the probability grids. Proteins such as bovine pancreatic trypsin inhibitor (BPTI), with only about 50% helix and sheet content, are modeled relatively poorly, although even in this case the overall rms deviation is badly distorted by poor modeling of the C-terminal residues: the average rms deviation for residues 1–54 is 0.501 Å.
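The BPTI figure shows how a few badly placed terminal residues can dominate a whole-chain rms deviation. The sketch below restricts the calculation to a selection of atoms, either a residue range such as 1–54 or a secondary-structure class. The `Atom` record, the `rms_where` helper, the 'H'/'E' codes, and the toy coordinates are hypothetical, and no superposition is applied.

```python
import math
from typing import Callable, NamedTuple


class Atom(NamedTuple):
    resnum: int                        # residue number (1-based)
    ss: str                            # 'H' helix, 'E' strand, other = coil (assumed codes)
    model: tuple[float, float, float]  # rebuilt coordinates
    ref: tuple[float, float, float]    # crystal coordinates


def rms_where(atoms: list[Atom], keep: Callable[[Atom], bool]) -> float:
    """rms deviation over the atoms selected by `keep` (no superposition)."""
    chosen = [a for a in atoms if keep(a)]
    if not chosen:
        raise ValueError("selection is empty")
    sq = sum((m - r) ** 2 for a in chosen for m, r in zip(a.model, a.ref))
    return math.sqrt(sq / len(chosen))


# Toy chain: two well-modeled helical residues and one poorly placed C-terminal residue.
atoms = [
    Atom(1, "H", (0.2, 0.0, 0.0), (0.0, 0.0, 0.0)),
    Atom(2, "H", (0.0, 0.2, 0.0), (0.0, 0.0, 0.0)),
    Atom(3, "C", (2.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
]
print(rms_where(atoms, lambda a: True))                 # whole chain
print(rms_where(atoms, lambda a: 1 <= a.resnum <= 2))   # residue range, as for BPTI 1-54
print(rms_where(atoms, lambda a: a.ss == "H"))          # helical residues only
```

With the toy data the whole-chain value is about 1.17 Å, while both the first two residues and the helical subset give 0.2 Å.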
Phase 2 simulations were carried out on flavodoxin and plastocyanin, building five complete structures from each of the top five backbone conformations from Phase 1, with the same parameters as the Phase 2 simulations of crambin. The energy and all-atom rms deviation of each of the 25 conformations were evaluated. Table lists the results for these two proteins, along with those for crambin. Unlike Phase 1, the Phase 2 results depend strongly on the size of the protein, with the average deviation increasing substantially for larger proteins. In Phase 1, each residue was sampled the same number of times regardless of protein size. In Phase 2, however, each simulation used a fixed total of 1000 Monte Carlo steps. For crambin, this meant that the average residue was varied 27 times during the simulation (alanine and glycine residues are not affected). For plastocyanin, the 73 relevant dihedrals were sampled an average of 14 times; for flavodoxin, the average was only 8.5. Clearly, the sidechains of flavodoxin are not being adequately sampled. Unfortunately, the CPU time required also grows substantially with protein size: the 1000 Monte Carlo steps take seven minutes for crambin, but nearly 20 minutes for plastocyanin and over 40 minutes for flavodoxin, so increasing the number of steps for flavodoxin is computationally expensive. Nevertheless, the flavodoxin results are comparable to or better than published results obtained with other methods.
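The sampling argument is plain arithmetic on the quoted figures. The sketch below assumes (this is not stated in the text) that each Monte Carlo step perturbs one residue drawn from those whose sidechains are sampled, that Ala and Gly are never chosen, and that CPU time per step is constant for a given protein, so run time scales linearly with step count. All function names are hypothetical.

```python
# Assumptions (not from the source): one eligible residue is perturbed per
# Monte Carlo step, Ala and Gly are never chosen, and CPU time per step is
# constant for a given protein.

STEPS = 1000

# Figures quoted in the text: average visits per sampled residue and
# minutes of CPU time per 1000-step Phase 2 simulation.
QUOTED = {
    "crambin": (27.0, 7.0),
    "plastocyanin": (14.0, 20.0),
    "flavodoxin": (8.5, 40.0),
}


def samples_per_residue(sequence: str, steps: int = STEPS) -> float:
    """Average visits per residue when Ala (A) and Gly (G) are skipped."""
    eligible = sum(1 for aa in sequence.upper() if aa not in "AG")
    if eligible == 0:
        raise ValueError("no residues with sampled sidechains")
    return steps / eligible


def steps_to_match(current: float, target: float, steps: int = STEPS) -> float:
    """Steps needed to raise the average visits from `current` to `target`."""
    return steps * target / current


def minutes_for(steps: float, minutes_per_1000: float) -> float:
    """Linear CPU-time estimate from the quoted minutes per 1000 steps."""
    return steps / 1000.0 * minutes_per_1000


target = QUOTED["crambin"][0]
for name, (visits, minutes) in QUOTED.items():
    need = steps_to_match(visits, target)
    print(f"{name}: {need:.0f} steps, about {minutes_for(need, minutes):.0f} min")
```

Under these assumptions, bringing flavodoxin up to crambin's 27 visits per residue would take about 3,200 steps, or a little over two hours at the quoted 40 minutes per 1000 steps.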
The lowest-energy conformation of flavodoxin was chosen for comparison with other methods. Flavodoxin has become a standard test case for published methods of building all-atom conformations from Cα coordinates. These include both methods based on molecular mechanics and methods that use database searches to determine conformations for multiple-residue peptide fragments of the protein. Table lists several measures of the accuracy of these models. "Peptide flips" are peptide units (the planar backbone unit between consecutive Cα atoms) that are rotated by more than 90 degrees relative to the crystal structure. Our model contains seven such flips, compared with only five and four in the fragment-matching methods of Reid and Thornton and of Holm and Sander, respectively. This is the only measure by which the PGMC method appears deficient; by most other measures it is comparable to, or better than, the other published methods. The PGMC Cα Builder is currently not quite as accurate as the method of Holm and Sander, but it is comparable in most respects, even though it rests on a more general approach to protein modeling, Probability Grid Monte Carlo. The PGMC method is applicable to unconstrained systems as well as to systems constrained by a priori knowledge of the Cα coordinates.
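The peptide-flip count can be sketched as follows. Because the Cα positions are fixed, a rigid planar peptide unit can only rotate about its Cα(i)→Cα(i+1) axis, so this sketch measures the rotation as the angle between the model and crystal carbonyl O positions after projecting both perpendicular to that axis. This particular criterion is an assumption; the cited comparisons may define the rotation differently (for example, through the peptide-plane normal). `peptide_rotation` and `count_flips` are hypothetical names.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _perp(v: Vec, axis: Vec) -> Vec:
    """Component of v perpendicular to axis."""
    k = _dot(v, axis) / _dot(axis, axis)
    return (v[0] - k * axis[0], v[1] - k * axis[1], v[2] - k * axis[2])


def peptide_rotation(ca_i: Vec, ca_next: Vec, o_model: Vec, o_ref: Vec) -> float:
    """Angle in degrees between model and crystal carbonyl O positions,
    measured about the Ca(i) -> Ca(i+1) axis (assumed flip criterion)."""
    axis = _sub(ca_next, ca_i)
    u = _perp(_sub(o_model, ca_i), axis)
    v = _perp(_sub(o_ref, ca_i), axis)
    norm = math.sqrt(_dot(u, u) * _dot(v, v))
    if norm == 0.0:
        raise ValueError("O atom lies on the Ca-Ca axis")
    cos = max(-1.0, min(1.0, _dot(u, v) / norm))  # clamp rounding error
    return math.degrees(math.acos(cos))


def count_flips(units, threshold: float = 90.0) -> int:
    """Count peptide units rotated by more than `threshold` degrees.

    `units` is an iterable of (ca_i, ca_next, o_model, o_ref) tuples.
    """
    return sum(1 for u in units if peptide_rotation(*u) > threshold)
```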