The PGMC C Builder was developed as an extension of the BIOGRAF
program from Molecular Simulations, Incorporated[94].
All calculations reported here were run on Silicon Graphics Power
Series and Indigo workstations; all timing numbers were obtained from
simulations run on a single processor of an SGI 4D/380. During the
first stage of the model-building procedure, the protein is created
one residue at a time until the entire protein has been built. As
each residue is added, its geometry is initially built from the
standard peptide geometries in the BIOGRAF peptide library, then the
backbone ($\phi$, $\psi$) and sidechain ($\chi$) dihedrals are rotated to their most
probable conformations according to the relevant probability grids. A
Monte Carlo simulation using the $\phi$-$\psi$ probability grids is then used to
search the conformational space of a ``pulse'' of residues: the last
few residues of the current chain, ending with the newly added residue. The
residues preceding the pulse are held fixed and are not included in
the energy calculations. Simulations in which these early residues
are held fixed, but included in the energy calculation, are
considerably slower and give worse results. The sidechains are also
ignored during the chain-building phase; they are added in the second
stage after the backbone conformation has been built. The energy used
during the Monte Carlo simulations is essentially the DREIDING energy
of the backbone atoms of the pulse, plus harmonic terms constraining
the pulse C$_\alpha$ coordinates to the true coordinates. The best
conformation sampled during the Monte Carlo simulation is saved and
then optimized by conjugate gradients minimization. This process
proceeds sequentially, with each new residue being involved in several
optimization cycles before finally being held in its final position as
the pulse moves beyond it.
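The pulse bookkeeping and the restraint term described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the pulse length, the force constant `k`, and the 1/2 prefactor in the harmonic term are hypothetical choices rather than values taken from the PGMC implementation, and the DREIDING backbone energy itself is not reproduced here.

```python
def pulse_indices(n_built, pulse_len):
    """Return the indices of the residues in the pulse: the last `pulse_len`
    residues built so far (fewer while the chain is still short)."""
    start = max(0, n_built - pulse_len)
    return list(range(start, n_built))


def ca_restraint_energy(model_ca, target_ca, k=100.0):
    """Sum of harmonic terms 0.5 * k * |r - r0|^2 over the pulse C-alpha atoms.

    `k` and the 0.5 prefactor are assumed for illustration only.
    """
    energy = 0.0
    for (x, y, z), (x0, y0, z0) in zip(model_ca, target_ca):
        energy += 0.5 * k * ((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2)
    return energy


# Residues 0..4 have been built; a three-residue pulse covers residues 2..4.
# Residues 0 and 1 would be held fixed and left out of the energy.
pulse = pulse_indices(n_built=5, pulse_len=3)
```

In a full implementation this restraint would be added to the backbone force-field energy of the pulse atoms before each Monte Carlo acceptance test.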
The backbone Monte Carlo simulations are aided by pre-determination of
the secondary structure, where possible. There is a high correlation
between the dihedrals of a protein and its C$_\alpha$ coordinates, so
knowledge of the C$_\alpha$ coordinates can limit the possible $\phi$-$\psi$
values.
The most common secondary structural elements, $\alpha$ helices and
$\beta$ sheets, have very specific C$_\alpha$ configurations, as
described by the virtual angle and virtual dihedral formed by consecutive
C$_\alpha$ atoms, shown in the accompanying figure. Analysis of the
virtual angle and virtual dihedral distributions of
the proteins in our H64 dataset showed that HELIX and SHEET residues
almost always have virtual angle and virtual dihedral values within the ranges
specified in the accompanying table. Residues with
virtual angle and dihedral values in one of these two regions are assumed to have
$\phi$-$\psi$ values common to that secondary structure type; when their $\phi$
and $\psi$ conformations are sampled during the chain-building process,
the $\phi$-$\psi$ grids determined for HELIX or SHEET residues are used.
Residues with virtual angle and dihedral values falling outside these regions are sampled
using the generic $\phi$-$\psi$ probability grids. Of the residues in
the H64 dataset having $\phi$ and $\psi$ values within the
high-probability $\beta$ sheet region listed in the accompanying table, 85% also
have virtual angle and dihedral values within the specified region. The correlation is
even higher for $\alpha$ helices, where 88% of the residues with
$\alpha$ helix $\phi$-$\psi$ values have virtual angle and dihedral
values within the corresponding range. If there
were no variation in bond lengths and angles in the protein backbone,
the virtual angles and dihedrals would provide nearly sufficient
information to determine the $\phi$-$\psi$ angles, according to the method
developed by Purisima and Scheraga[82]. Unfortunately,
the variability in real conformations is too high for this exact
method to work, and $\phi$-$\psi$ angles must be derived from simulation
methods such as the one presented here. Nevertheless, the correlation
between the virtual angles and dihedrals and the $\phi$-$\psi$ angles is sufficient to determine which
residues should be sampled using the HELIX and SHEET $\phi$-$\psi$ grids. The
use of these grids for the appropriate residues improves our results
significantly.
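As a rough illustration of the classification step described above, the following Python sketch computes a virtual angle and virtual dihedral from consecutive C$_\alpha$ positions and assigns a grid type. The function names and the `PLACEHOLDER_REGIONS` ranges are hypothetical: the actual HELIX and SHEET ranges are those given in the table cited in the text and are not reproduced here, and the sign convention of the dihedral is an assumption.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_angle(p1, p2, p3):
    """Angle in degrees at p2 formed by three consecutive C-alpha atoms."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def virtual_dihedral(p1, p2, p3, p4):
    """Dihedral in degrees defined by four consecutive C-alpha atoms
    (atan2 formulation; 0 is cis, +/-180 is trans)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Placeholder ranges in degrees, for illustration only.
# These are NOT the values from the table referenced in the text.
PLACEHOLDER_REGIONS = {
    "HELIX": ((80.0, 100.0), (30.0, 70.0)),
    "SHEET": ((110.0, 140.0), (-180.0, -150.0)),
}


def grid_type(angle, dihedral, regions=PLACEHOLDER_REGIONS):
    """Return the label of the first region containing (angle, dihedral),
    or "GENERIC" if the pair falls outside every region."""
    for label, ((a_lo, a_hi), (d_lo, d_hi)) in regions.items():
        if a_lo <= angle <= a_hi and d_lo <= dihedral <= d_hi:
            return label
    return "GENERIC"
```

A residue labelled `"GENERIC"` would be sampled with the generic $\phi$-$\psi$ grids, while the other two labels select the corresponding secondary-structure grids.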
Because Monte Carlo simulations depend on random numbers, each run of
the calculation produces a different backbone conformation.
An exhaustive search would avoid this variability, but it is much more
computationally intensive,
even if only a few conformations are allowed for each residue. A
complete sampling of just the top 20 $\phi$-$\psi$ conformations for each
residue in a three-residue pulse would require evaluation of $20^3 = 8000$
different conformations. In contrast, we are able to obtain excellent
results from only 200 Monte Carlo steps. The Metropolis criterion
(discussed elsewhere in this work and in Reference [88]) rejects conformations
which produce very bad energies, allowing the conformational sampling
to focus on low-energy conformations. It is therefore possible to
quickly build backbone conformations. A typical simulation takes
approximately 15 seconds per residue on one processor of a Silicon
Graphics 4D/380 workstation, or less than 12 minutes for the 46-residue
protein crambin. Speed is crucial for simulations where different C$_\alpha$
conformations are being evaluated, for instance when numerous
conformations are generated by a lattice-based protein structure
prediction method[79]. In cases where a single set of
C$_\alpha$ coordinates is being used, it may not be necessary to limit the
calculations to a matter of minutes. In these cases, several
simulations can be run, using different random numbers for the Monte
Carlo calculation. Each will produce a slightly different backbone
conformation. From these, the lowest energy conformations are
selected for the second stage of the calculation.
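The cost comparison and the acceptance test discussed above can be made concrete with a short Python sketch. It shows the standard Metropolis test, accepting with probability min(1, exp(-ΔE/kT)); the function names and the `kT` parameter are assumptions, and the PGMC variant described in the cited reference may differ in detail.

```python
import math
import random


def exhaustive_count(n_conformations, pulse_len):
    """Number of conformations in a full enumeration of the pulse."""
    return n_conformations ** pulse_len


def metropolis_accept(delta_e, kT, rng=random):
    """Standard Metropolis test: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def build_time_minutes(n_residues, seconds_per_residue=15.0):
    """Estimated chain-building time at the per-residue cost quoted in the text."""
    return n_residues * seconds_per_residue / 60.0


# 20 conformations for each of 3 residues, versus the 200 Monte Carlo steps used.
n_exhaustive = exhaustive_count(20, 3)
crambin_minutes = build_time_minutes(46)
```

Moves that raise the energy by many multiples of `kT` are almost never accepted, which is why the sampling concentrates on low-energy conformations.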
The best-energy conformations generated in Phase 1 were evaluated
without regard to their sidechain positions. During the
chain-building process, energies were determined for only a small
pulse of residues; all previous residues were ignored. However, after
the chain is built, the energy of the entire backbone is evaluated and
this value is used to determine which backbone conformations are used
in Phase 2. The sidechain conformations are optimized by
a PGMC simulation using probability grids. In this stage, the
backbone atoms are held fixed, but are included in the energy
calculation. Because the backbone is held fixed, constraints to the
C$_\alpha$ coordinates are removed. In these calculations, at every Monte
Carlo step, one sidechain is selected at random, and a new sidechain
conformation is chosen for it according to the residue-specific
probability grid. The energy of the new conformation is calculated,
and the Metropolis criterion is used to accept or reject this
structure. Since the Metropolis acceptance probability
depends on $\Delta E$, the change in
energy, only the energy of the sidechain being modified needs to be
evaluated; all interactions not involving the sidechain being modified
can be considered constant and do not need to be evaluated. This
results in a huge speed increase over calculations which
re-evaluate the entire energy of the protein at every step. Using
this method, the second stage can be quite rapid. For the small protein
crambin, which has 46 residues and 396 atoms in the DREIDING
calculations, 1000 Monte Carlo steps require seven minutes of CPU
time, while plastocyanin, with 98 residues and 857 atoms, requires 22
minutes for 1000 steps. Like the backbone-building process, the
sidechain-modeling process is a stochastic simulation, dependent upon
random numbers. Therefore, it is useful to run the simulation several
times, using different random number seeds, and to use the
lowest-energy structures for further studies.
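The incremental energy evaluation described above can be illustrated with a toy model in Python. Everything here is a stand-in chosen for brevity: `pair_energy` is a hypothetical pairwise term rather than the DREIDING force field, each sidechain state is an integer label rather than a set of $\chi$ angles, and new states are drawn uniformly instead of from a residue-specific probability grid. The point of the sketch is only that the energy change of a single-sidechain move equals the change in the terms involving that sidechain.

```python
import math
import random


def pair_energy(i, si, j, sj):
    """Toy symmetric interaction between sidechains i and j (hypothetical)."""
    return math.cos(si + sj) / (1 + abs(i - j))


def total_energy(states):
    """Full energy: sum over all sidechain pairs."""
    n = len(states)
    return sum(pair_energy(i, states[i], j, states[j])
               for i in range(n) for j in range(i + 1, n))


def local_energy(states, k, sk):
    """Only the terms that involve sidechain k, evaluated with state sk."""
    return sum(pair_energy(k, sk, j, sj)
               for j, sj in enumerate(states) if j != k)


def delta_energy(states, k, new_state):
    """Energy change for moving sidechain k, without a full re-evaluation."""
    return local_energy(states, k, new_state) - local_energy(states, k, states[k])


def sidechain_mc(states, n_choices, steps, kT, seed):
    """Metropolis search over sidechain states; returns (best energy, best states)."""
    rng = random.Random(seed)
    states = list(states)
    energy = total_energy(states)
    best = (energy, list(states))
    for _ in range(steps):
        k = rng.randrange(len(states))       # pick one sidechain at random
        new_state = rng.randrange(n_choices)  # uniform draw standing in for grid sampling
        d = delta_energy(states, k, new_state)
        if d <= 0.0 or rng.random() < math.exp(-d / kT):
            states[k] = new_state
            energy += d
            if energy < best[0]:
                best = (energy, list(states))
    return best
```

Running `sidechain_mc` with several different `seed` values and keeping the lowest-energy result mirrors the repeated-run strategy recommended in the text.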