The values listed in Table were used in an attempt to reproduce the structure of crambin using the Cα coordinates from the crystal structure. Twenty different backbone conformations were generated by using different random numbers to control the selection of dihedrals and to determine which conformations would be accepted and which rejected. The conformational energy of the backbone and the rms deviations in backbone atoms and dihedrals for each of these structures are listed in Table , ranked by energy. The average backbone rms deviation for these 20 simulations was 0.527 Å, in close agreement with the previous result of 0.520 Å mentioned in the preceding section. The average all-atom deviation was 1.696 Å. It is apparent that there is only a small correlation between the backbone energy and the rms fit to the crystal structure backbone. The backbone of the crystal structure itself has an energy of 759.8 kcal/mol, higher than 12 of the 20 model conformations. This is likely due both to errors in the crystal structure and to the limitations of the forcefield approach: no forcefield can be optimal for every crystal structure, even when such factors as crystal packing and solvation are considered. Nevertheless, in cases where the crystal structure is unknown, the backbone energy is the best of the selection criteria examined for choosing model structures. The other criteria considered, including the Cα constraint energy and the total energy including sidechain atoms, correlated even more poorly with the deviation in the backbone coordinates.
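The strength of the relationship between backbone energy and rms fit can be quantified with a correlation coefficient. The following is a minimal sketch, assuming the Pearson coefficient is an appropriate measure; the text does not state which statistic was used. The function name `pearson_r` and the numerical values are hypothetical placeholders, not the data from the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT the results reported in the table.
energies = [700.0, 720.0, 745.0, 760.0, 790.0]  # backbone energy, kcal/mol
rmsds = [0.55, 0.48, 0.56, 0.50, 0.54]          # backbone rms deviation, Å

r = pearson_r(energies, rmsds)
print(f"Pearson r between energy and rms deviation: {r:.3f}")
```

A value of r near zero would correspond to the weak correlation described above, whereas a value near 1 would indicate that low energy reliably identifies the best fits.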
The five lowest-energy backbone conformations from Phase 1 were used as starting points for Phase 2. For each of the five backbone conformations, five Phase 2 simulations were carried out, again using different random numbers to produce different results. Each simulation involved 1000 Monte Carlo steps using 10 probability grids and a simulation temperature of 300 K. The 25 conformations produced are listed in Table . Again, there is only a small correlation between energy and rms fit to the crystal structure. Nevertheless, the fits are quite good, with an average rms deviation from the crystal structure of 1.323 Å. All five backbone conformations were represented throughout the list of all-atom conformations, so the backbone energy was not the determining factor in the overall energy.
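The role of the random numbers and the simulation temperature in accepting or rejecting conformations can be illustrated with a short sketch. It assumes the standard Metropolis acceptance criterion, which the text does not confirm, and it omits the probability grids entirely, so it is not the procedure actually implemented. The names `metropolis_accept` and `run_chain`, the toy energy function, and the proposal step are all hypothetical.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def metropolis_accept(delta_e, temperature=300.0, rng=random):
    """Return True if a move changing the energy by delta_e (kcal/mol) is accepted.

    Assumption: plain Metropolis criterion; downhill moves are always accepted,
    uphill moves with probability exp(-delta_e / kT).
    """
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B * temperature))


def run_chain(energy_fn, propose_fn, state, n_steps=1000, temperature=300.0, seed=0):
    """Run one Monte Carlo chain and return the lowest-energy state visited."""
    rng = random.Random(seed)  # a different seed yields a different trajectory
    energy = energy_fn(state)
    best_state, best_energy = state, energy
    for _ in range(n_steps):
        trial = propose_fn(state, rng)
        trial_energy = energy_fn(trial)
        if metropolis_accept(trial_energy - energy, temperature, rng):
            state, energy = trial, trial_energy
            if energy < best_energy:
                best_state, best_energy = state, energy
    return best_state, best_energy


# Toy example (hypothetical): a single dihedral with an energy minimum at 60 degrees.
def toy_energy(angle):
    return 0.01 * (angle - 60.0) ** 2


def toy_propose(angle, rng):
    return angle + rng.uniform(-10.0, 10.0)


# Five runs from the same starting point, differing only in the random seed.
for seed in range(5):
    angle, e = run_chain(toy_energy, toy_propose, 180.0, n_steps=1000, seed=seed)
    print(f"seed {seed}: best angle {angle:.1f}, energy {e:.4f} kcal/mol")
```

Running several chains that differ only in the seed mirrors the design described above, in which each starting backbone is used for five independent simulations.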
The lowest-energy conformation from Phase 2 was chosen as the ``model'' conformation of crambin for detailed comparison to the ``true'' structure, the crystal structure. Table gives a breakdown of the rms deviation of the crambin model for different regions of the protein. Some of this information is shown graphically in Figure , where the backbone rms deviation of each residue is shown. The largest deviations occur at the carboxy terminus, where residues 45 and 46 are very poorly modeled. If these two residues are excluded, the backbone rms deviation drops from 0.543 Å to 0.361 Å. The carboxy-terminal residues are generally the worst modeled because there are fewer constraints on the structure: they usually lie on the surface of the protein, where there are fewer inter-residue contacts, and there is no following Cα to constrain the orientation of the terminal carboxyl group. In the crambin model, the Asn 46 sidechain and the terminal carboxyl group have reversed positions, giving rise to a large error even though the chemical significance is small. The backbone rms deviation is fairly consistent throughout the rest of the protein, with 34 of the 46 residues having deviations in the 0.1-0.4 Å range. The lowest backbone deviations are in the residues of the long helix, Helix 1, where the deviation in atomic coordinates is 0.209 Å and the deviation in the φ and ψ dihedrals is only 13.7°. The deviations are similarly low (0.232 Å and 13.1°) for the first seven residues of Helix 2. However, the last residue in that helix starts a turn and is poorly modeled. In general, the turn regions before and after helices are the most poorly modeled residues other than those at the C-terminus. This is very apparent from both the graph in Figure and the picture in Figure . These regions (particularly residues 5, 20, and 30) have nonstandard φ and ψ values, which have very low probabilities in the probability grids. No probability grids were specifically developed for turn regions, but such grids might prove very valuable.
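The effect of excluding poorly modeled residues on the rms deviation can be made concrete with a small sketch. It assumes that the model and crystal coordinates have already been superimposed and that atoms are stored per residue in matching order; neither detail is specified in the text. The function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(model, reference):
    """RMS deviation between two equal-length lists of (x, y, z) coordinates.

    Assumption: the two structures are already superimposed.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq_sum = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(sq_sum / len(model))


def per_residue_rmsd(model_by_res, ref_by_res):
    """Map each residue number to the rms deviation of its own atoms."""
    return {res: rmsd(model_by_res[res], ref_by_res[res]) for res in model_by_res}


def rmsd_excluding(model_by_res, ref_by_res, excluded=()):
    """RMS deviation over all atoms except those in the excluded residues."""
    model, ref = [], []
    for res in sorted(model_by_res):
        if res in excluded:
            continue
        model.extend(model_by_res[res])
        ref.extend(ref_by_res[res])
    return rmsd(model, ref)


# Toy coordinates (hypothetical): three residues with one atom each.
model = {1: [(0.0, 0.0, 0.0)], 2: [(1.0, 0.0, 0.0)], 3: [(2.0, 0.0, 0.0)]}
crystal = {1: [(0.0, 0.0, 0.0)], 2: [(1.0, 0.0, 0.0)], 3: [(5.0, 0.0, 0.0)]}

print(per_residue_rmsd(model, crystal))
print(rmsd_excluding(model, crystal))                # all residues
print(rmsd_excluding(model, crystal, excluded={3}))  # without the outlier
```

Because the deviations are squared before averaging, a single badly placed residue dominates the overall value, which is why omitting two terminal residues lowers the reported backbone deviation so markedly.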
The sidechain modeling is not as successful as the backbone modeling, with the average deviation in atomic coordinates being near 2.0 Å. This is not at all surprising, since each peptide unit in the polypeptide backbone is constrained at both ends by the positions of two consecutive Cα's, while the sidechains are usually constrained only by their attachment to a single Cα. The constraints on the sidechain conformations are primarily steric in nature: sidechains in the interior of a protein can have considerable steric overlap, and their conformations must be correlated to allow for closest packing. The rms deviation for the atomic coordinates may not be the best indication of modeling success, since it will be heavily weighted toward any poorly modeled large sidechain such as arginine. A better measure is the deviation in sidechain dihedral angles, Δχ1, defined as the absolute value of the difference between the χ1 dihedral in the model and in the crystal structure. The deviations in χ1 are shown for the crambin model in Figure . Most χ1 dihedrals have high probabilities at 60°, -60°, and 180°, so deviations would be expected to be near 0° or 120°. Of the 37 χ1's in crambin, 24 have deviations less than 30°, and 11 have deviations between 90° and 150°. Therefore, only two have deviations between 30° and 90°. It is important to note that five of the 11 poorly modeled sidechains are cysteine residues involved in disulfide bridges in the crystal structure. The Cα Builder does not currently predict the presence of disulfide bridges, so the disulfide bond is not included in the Monte Carlo energy evaluations. Such a term could be included and would certainly improve the results for these residues. RMS deviations for the different backbone and sidechain dihedrals are shown in Table .
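The dihedral deviation and its classification into ranges can be sketched as follows. The sketch assumes that the absolute difference is wrapped onto the interval from 0 to 180 degrees, so that angles of -170 and 170 degrees differ by 20 rather than 340; the text defines only the absolute value of the difference and does not mention wrapping. The function names, the bin above 150 degrees, and the sample angles are hypothetical.

```python
def dihedral_deviation(model_deg, crystal_deg):
    """Absolute difference between two dihedral angles, wrapped into [0, 180] degrees."""
    diff = abs(model_deg - crystal_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def bin_deviations(deviations):
    """Count deviations in the ranges discussed in the text (plus a bin above 150)."""
    counts = {"<30": 0, "30-90": 0, "90-150": 0, ">150": 0}
    for d in deviations:
        if d < 30.0:
            counts["<30"] += 1
        elif d < 90.0:
            counts["30-90"] += 1
        elif d <= 150.0:
            counts["90-150"] += 1
        else:
            counts[">150"] += 1
    return counts


# Hypothetical (model, crystal) angle pairs in degrees, NOT the crambin data.
pairs = [(62.0, 58.0), (-65.0, 178.0), (-170.0, 170.0), (60.0, -60.0), (100.0, 55.0)]
deviations = [dihedral_deviation(m, c) for m, c in pairs]
print(deviations)
print(bin_deviations(deviations))
```

With preferred values spaced 120 degrees apart, a sidechain placed in the wrong rotamer well falls into the 90-150 range, while one placed in the correct well falls below 30.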
Although the sidechain dihedrals are not as well modeled as the backbone, the results are not discouraging when compared with other methods. As discussed below, our method provides results for flavodoxin dihedrals that are as good as or better than those of other methods, and these results for crambin are even better.
The differences between the crambin model and the crystal structure are shown in detail in Figures , , and . Figure shows the model and crystal structure backbones for the entire protein. For most of the protein, it is very difficult to distinguish between the two structures. Only in the turn regions after the two helices is the difference readily apparent. The two following figures show the all-atom structures of the two helices of crambin. Helix 2, shown in Figure , is very well modeled, with an rms deviation of 1.03 Å for all atoms. In terms of the all-atom deviation, it is the best modeled region of the protein (see Table ). The picture shows this quite well, with both sidechain and backbone atoms showing little difference between the two structures, except for Thr 30 on the C-terminal (right) end of the helix. As explained above, this residue begins a turn in the backbone conformation and is poorly sampled during the Phase 1 backbone Monte Carlo. The Helix 1 backbone, in contrast, is modeled quite well throughout its length, including Pro 19 at its C-terminal (left) end. However, Helix 1 has many large sidechains, which are difficult to model. Large errors can be seen in Asn 14 and Arg 17. The latter has a particularly large impact on the rms deviation: excluding Arg 17, the crambin model has an rms deviation of 1.207 Å, rather than 1.386 Å. However, this incorrect conformation of Arg 17 may be energetically more favorable than other conformations more similar to the crystal structure. The next four lowest-energy conformations listed in Table all have more native-like conformations of Arg 17, but all are higher in energy.
The crambin model illustrates several general findings for simulations using the PGMC Cα Builder. The lowest-energy structures from Phases 1 and 2 are usually among the best models built, but are rarely the very best. Regardless, the backbone models from Phase 1 are consistently good, and almost any one of them provides an acceptable model of the true backbone. The model backbones are especially good in regions of regular secondary structure such as helices and sheets, but rather poor in turn regions. These results are obtained consistently in different simulations. There is a much larger variation among the results from Phase 2. This may be due to the constraints of time: the limit of 1000 Monte Carlo steps was selected largely to keep the simulation time below ten minutes, so that large numbers of different conformations could be evaluated. Better and more consistent results might be obtained from substantially longer calculations. Nevertheless, between 40% and 60% of sidechain dihedrals are modeled correctly.