Although the variables discussed in the previous section could be
tuned to specific problems, the same values were used for six
different proteins, ranging from the 46 residue crambin to myoglobin,
which has 153 residues. These proteins are listed in
Table
. The proteins have widely different structures, as
indicated by the percentages of their secondary structures which are
α-helical and β-sheet. Four of the six proteins are included in
the subset of crystal structures used to develop the Monte Carlo
probability grids (see Section
). Of the other two, the flavodoxin
structure used is merely a different form (oxidized) from the one used
in the dataset (semiquinone), while the plastocyanin studied is
homologous, but not identical, to the structure used in the dataset.
For each of these six proteins, the Cα coordinates from the listed
crystal structure were used to rebuild the backbone conformation
twenty times, as described in the preceding sections for crambin. In
each case, all prosthetic groups, such as the myoglobin heme, were
removed from the crystal structure, as were any cofactors or solvent
molecules. Each of the twenty backbone conformations was compared to
the crystal structure and the results were analyzed.
Table
lists the average rms deviation
as well as the standard deviation (σ) for the twenty
structures. Also listed are the rms deviations for the lowest energy
conformation and the conformation with the best fit. Again, it is
seen that the lowest energy conformation is never the one with the
best fit to the crystal structure. However, it is encouraging that the
lowest energy conformation was better than average for five of the six
proteins.
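The statistics reported for each protein can be illustrated with a minimal sketch. It assumes that each rebuilt backbone is already in the crystal frame (reasonable here, since the Cα coordinates are taken from the crystal structure), and that σ is the sample standard deviation; the text does not state which estimator was used. The function names and the input lists are hypothetical and are not part of the original program.

```python
import math
import statistics


def rms_deviation(model, reference):
    """RMS distance between matching atoms.

    Both arguments are equal-length lists of (x, y, z) tuples that are
    assumed to be expressed in the same coordinate frame (no superposition).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((a - b) ** 2
                for p, q in zip(model, reference)
                for a, b in zip(p, q))
    return math.sqrt(total / len(model))


def summarize(energies, rmsds):
    """Summarize a set of rebuilt conformations (twenty per protein in the text).

    energies[i] and rmsds[i] describe the same conformation.
    """
    lowest = min(range(len(energies)), key=energies.__getitem__)
    best = min(range(len(rmsds)), key=rmsds.__getitem__)
    return {
        "mean": statistics.mean(rmsds),
        "sigma": statistics.stdev(rmsds),  # sample estimator: an assumption
        "lowest_energy_rmsd": rmsds[lowest],
        "best_fit_rmsd": rmsds[best],
        "lowest_energy_is_best_fit": lowest == best,
        "lowest_energy_better_than_average": rmsds[lowest] < statistics.mean(rmsds),
    }
```

Calling summarize once per protein, with the twenty energies and rms deviations, would give the four quantities tabulated above plus the two comparisons discussed in the text.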
Comparing Tables
and
, it is clear
that the size of the protein has little effect on the accuracy of
Phase 1. In fact, the largest protein, myoglobin, is consistently
modeled most accurately. This is not surprising considering the
crambin results, where the average backbone deviation was
approximately 0.2 Å for helical residues. Myoglobin,
with almost 80% of its residues in helices, benefits greatly from
the accuracy with which the method models helices. Plastocyanin is
also modeled relatively well, even though it is a β-sheet protein with
little helical content. The large β-sheet content is probably also a
favorable factor, as these conformations are also very well
represented by the probability grids. It is proteins such as bovine
pancreatic trypsin inhibitor (BPTI), with only about 50% α-helix and
β-sheet content, which are relatively poorly modeled, though even for
this case the rms deviation is badly distorted by poor modeling of the
C-terminal residues. The average rms deviation for residues 1-54 is
0.501 Å.
Phase 2 simulations were carried out on flavodoxin and plastocyanin,
building five complete structures from each of the top five backbone
conformations from Phase 1. The same parameters were used for these
simulations as were used for Phase 2 simulations of crambin. The energy
and all-atom rms deviation for each of the 25 conformations were evaluated
and the results were analyzed. Table
lists the results
for these two proteins, along with those for crambin. Unlike Phase 1,
the results for Phase 2 are highly dependent on the size of the
protein, with the average deviation increasing substantially for
larger proteins. In Phase 1 simulations, each residue was sampled the
same number of times, regardless of the size of the protein. In the
Phase 2 simulations, however, each simulation involved a total of
1000 Monte Carlo steps. For crambin, this meant that the average
residue was varied 27 times during the simulation (alanine and glycine
residues are not affected). For plastocyanin, the 73 relevant
dihedrals were sampled an average of 14 times; for flavodoxin, the
average was 8.5. Clearly, the sidechains of flavodoxin are not being
adequately sampled. Unfortunately, the CPU time required for the
simulations also grows substantially as the size of the protein grows.
While the 1000 Monte Carlo steps take seven minutes for crambin, they
require nearly 20 minutes for plastocyanin and over 40 minutes for
flavodoxin. Therefore, it is computationally expensive to increase
the number of steps for flavodoxin. Nevertheless, the results for
flavodoxin are comparable to or better than published results using
other methods.
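The sampling argument above is simple arithmetic, sketched below. Only the 1000 steps, the 73 plastocyanin dihedrals, the averages of 27 and 8.5, and the quoted run times come from the text. The number of sampled units for flavodoxin is back-calculated from its average, and the time estimate assumes that cost grows linearly with the number of steps and treats "over 40 minutes" as exactly 40, so it is a rough lower bound at best. All function names are hypothetical.

```python
import math

TOTAL_STEPS = 1000  # Monte Carlo steps per Phase 2 simulation (from the text)


def average_samples(total_steps, n_units):
    """Average number of times each sampled unit is varied."""
    return total_steps / n_units


def implied_units(total_steps, avg_samples):
    """Number of sampled units implied by a quoted average."""
    return total_steps / avg_samples


def steps_for_target(target_avg, n_units):
    """Steps needed so that each unit is varied target_avg times on average."""
    return math.ceil(target_avg * n_units)


def scaled_minutes(minutes_per_run, steps, steps_per_run=TOTAL_STEPS):
    """CPU time for a longer run, assuming cost is linear in the step count."""
    return minutes_per_run * steps / steps_per_run


plasto_avg = average_samples(TOTAL_STEPS, 73)    # about 13.7, quoted as 14
flavo_units = implied_units(TOTAL_STEPS, 8.5)    # about 118 (back-calculated)

# Steps and time for flavodoxin to reach the crambin sampling level of 27.
flavo_steps = steps_for_target(27, round(flavo_units))
flavo_minutes = scaled_minutes(40, flavo_steps)
```

Under these assumptions, matching the crambin sampling density for flavodoxin would take roughly three times as many steps and about two hours of CPU time per structure, which is consistent with the remark that increasing the number of steps is computationally expensive.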
The lowest energy conformation of flavodoxin was chosen for comparison
with other methods. This protein has become a standard test case for
published methods of building all-atom conformations from Cα
coordinates. These include both methods based on molecular
mechanics[84] and those using database searches to
determine conformations for multiple-residue peptide fragments from the
protein[85][83]. Table
lists several
measures of the accuracy of these models. "Peptide flips" refers to
the number of peptide units (the planar backbone unit between the
Cα coordinates) which are rotated by more than 90 degrees
from the crystal structure. This occurs seven times in our model,
compared to only five and four times in the fragment-matching methods of Reid and
Thornton[83] and Holm and Sander[85], respectively. This is the only
measurement by which the PGMC method appears deficient. In most of
the other measures, the PGMC method is comparable to, or better than,
the other published methods. The PGMC Cα Builder is currently not
quite as accurate as the method of Holm and Sander[85], but
is comparable in most respects, even though it is based on a more
general approach to protein modeling: Probability Grid Monte Carlo. The
PGMC method is applicable to unconstrained systems as well as those
constrained by a priori knowledge of the Cα coordinates.
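The peptide-flip count used in the comparison above can be sketched as follows. This is one plausible reading of the definition, not necessarily the procedure used here or in the cited papers. The orientation of each peptide unit is taken from the position of its carbonyl oxygen around the axis joining consecutive Cα atoms, and a flip is counted when the model and crystal orientations differ by more than 90 degrees. The sketch assumes that the model and crystal share the same Cα coordinates, so no superposition is needed. All names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _perpendicular(v, axis):
    """Component of v perpendicular to axis."""
    scale = _dot(v, axis) / _dot(axis, axis)
    return tuple(x - scale * a for x, a in zip(v, axis))


def peptide_rotation_deg(ca_i, ca_next, o_model, o_crystal):
    """Angle (degrees) between model and crystal carbonyl oxygens,
    measured around the Cα(i)-Cα(i+1) axis."""
    axis = _sub(ca_next, ca_i)
    pm = _perpendicular(_sub(o_model, ca_i), axis)
    pc = _perpendicular(_sub(o_crystal, ca_i), axis)
    norm = math.sqrt(_dot(pm, pm) * _dot(pc, pc))
    if norm == 0.0:
        raise ValueError("carbonyl oxygen lies on the Cα-Cα axis")
    cos_angle = max(-1.0, min(1.0, _dot(pm, pc) / norm))
    return math.degrees(math.acos(cos_angle))


def count_peptide_flips(ca, o_model, o_crystal, threshold_deg=90.0):
    """Count peptide units rotated by more than threshold_deg.

    ca holds N Cα positions; o_model and o_crystal hold the N-1 carbonyl
    oxygen positions of the peptide units between consecutive Cα atoms.
    """
    flips = 0
    for i in range(len(ca) - 1):
        angle = peptide_rotation_deg(ca[i], ca[i + 1], o_model[i], o_crystal[i])
        if angle > threshold_deg:
            flips += 1
    return flips
```

Using the carbonyl oxygen as the orientation marker is a design choice made for brevity; the normal to the peptide plane would serve equally well and would give the same count for ideal planar units.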