Several methods have been reported for modeling loop conformations. These fall into two general categories: those using databases of known loop conformations[105][104] and those using conformational searching[107][106]. A combined approach has also been described[108]. The method presented here also combines features of both approaches: it uses a conformational search method similar to that of Bruccoleri and Karplus[106], but with dihedral angles sampled from probability grids developed through an analysis of the Brookhaven Protein Data Bank.
The Probability Grid Monte Carlo (PGMC) method is described in detail
in Section
. The method samples conformational space by
modifying the dihedral angles of a polypeptide or protein according to
probability matrices determined from an analysis of known protein
structures. At each step of the simulation, one amino acid residue is
selected for modification and either its backbone or sidechain
conformation is modified. If its backbone conformation is to be
modified, new values of $\phi$ and $\psi$ are chosen from
2-dimensional grids, where each possible $(\phi,\psi)$ combination
has been assigned a specific probability. The values of $\phi$ and
$\psi$ are confined to discrete values between $-180^\circ$ and
$180^\circ$. For a spacing of $\Delta$ degrees between gridpoints,
there are $(360/\Delta)^2$ possible
conformations for each amino acid. Probabilities have been determined
for grids with spacings of 5, 10, 15, 30, and $60^\circ$. Different
probability grids were determined for three different residue types:
glycine, proline, and the 18 other standard residues, from the
$(\phi,\psi)$ distributions found among residues in a selection of
high-quality structures in the Brookhaven Protein Data Bank (PDB). Additionally,
different grids were developed for different secondary structure
types: $\alpha$-helices, $\beta$-sheets, and coil conformations.
Separate grids were not determined for $\beta$-turns because these
require specific four-residue conformations and are not represented
well by single-residue probabilities. The probabilities for coil regions were
derived from all residues not specified by the HELIX, SHEET, and TURN
designators of the PDB files. These coil probability grids are the
most pertinent to loop conformations and are, therefore, the ones used
in these simulations.
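The grid-based selection of backbone dihedrals described above can be illustrated with a minimal Python sketch. It is not the original PGMC implementation: the function names, the half-open angle range (which treats -180 and 180 degrees as the same gridpoint), and the toy weights are assumptions made for illustration, and real grids would hold probabilities derived from PDB statistics.

```python
import random

def grid_angles(spacing):
    """Gridpoint values in degrees; -180 and 180 count as one point (assumption)."""
    return list(range(-180, 180, spacing))

def sample_phi_psi(prob, spacing, rng=random):
    """Draw one (phi, psi) pair, where prob[i][j] weights (angles[i], angles[j])."""
    angles = grid_angles(spacing)
    cells = [(phi, psi) for phi in angles for psi in angles]
    weights = [prob[i][j] for i in range(len(angles)) for j in range(len(angles))]
    return rng.choices(cells, weights=weights, k=1)[0]

# Toy 60-degree grid with hypothetical weights (not PDB-derived):
# uniform background plus one strongly favored cell.
angles = grid_angles(60)
toy = [[1.0 for _ in angles] for _ in angles]
toy[angles.index(-60)][angles.index(-60)] = 50.0
phi, psi = sample_phi_psi(toy, 60)
```

With a 5-degree spacing this layout gives 72 x 72 = 5184 cells per grid, so listing the cells explicitly stays inexpensive.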
Sidechain conformations are also chosen from probability grids, but
these grids are 1- to 5-dimensional, depending upon the number of
sidechain dihedrals that are sampled in a particular residue. Only
dihedrals that affect the geometries of heavy (non-hydrogen) atoms
and are not part of ring systems are included. The number of PGMC
$\chi$ dihedrals varies from one (e.g., for serine and
threonine) to five (for arginine).
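Because the dimensionality of a sidechain grid depends on the residue, one convenient representation is a mapping from a tuple of $\chi$ angles to a weight. The sketch below assumes that layout; it is an illustrative choice, not necessarily the storage used in the PGMC program, and the residue grids and weights shown are invented toy values.

```python
import random

def sample_sidechain(grid, rng=random):
    """Draw one tuple of chi angles (degrees) from a {chi_tuple: weight} grid."""
    conformations = list(grid)
    weights = [grid[c] for c in conformations]
    return rng.choices(conformations, weights=weights, k=1)[0]

# Hypothetical 1-dimensional grid (one chi dihedral, as for serine).
serine_grid = {(-60,): 0.3, (60,): 0.5, (180,): 0.2}

# Hypothetical 5-dimensional grid (five chi dihedrals, as for arginine);
# only cells with nonzero weight need to be stored.
arginine_grid = {
    (-60, 180, 180, 180, 0): 0.6,
    (180, 180, 60, 180, 0): 0.4,
}

chi_ser = sample_sidechain(serine_grid)
chi_arg = sample_sidechain(arginine_grid)
```

A sparse mapping is attractive here because a dense grid grows quickly with dimension: if a 30-degree spacing were used for sidechains (the text does not state the sidechain spacing), a 5-dimensional grid would hold 12^5 = 248832 cells.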
The goal in loop modeling is to predict the native conformation of the loop. If a forcefield is used to evaluate conformations, and the forcefield is highly accurate, the minimum energy conformation should be very similar to the native conformation. Our simulations use the DREIDING forcefield[109] to evaluate structures. Although there is probably no forcefield in existence which guarantees that its global minimum is the native conformation, we have included the capacity to improve the results by increasing the sophistication of the calculations. Ideally, such factors as solvation and loop-protein and loop-substrate interactions would be included in the calculation. However, including such terms can increase computational time so much that only a few possible conformations can be evaluated. There must be a balance between speed and accuracy. We have addressed this need for balance by creating a hierarchical procedure, which increases in accuracy as the simulation proceeds. The first stage of the procedure creates backbone conformations which meet the endpoint criteria; these conformations are evaluated without regard to sidechain interactions. The second stage optimizes the positions of the sidechains of the loop residues. Interactions among all the sidechain and backbone atoms of the loop are considered, as are interactions with residues from other regions of the protein and, if possible, interactions with a substrate. The final stage is complete optimization of the best conformations from the second stage, using energy-minimization of all degrees of freedom of the loop. At this stage, solvent may be added to enhance the accuracy of the simulation.
The first stage of the simulation involves the generation of numerous
backbone conformations which meet the endpoint criteria established by
the constant framework. The conformation of each loop in a protein is
predicted independently. The framework residues from the template
protein are held constant, while all loop residues are removed. The
loop being modeled is then constructed using standard geometries from
the BIOGRAF[110] peptide libraries. Loop conformations
are then generated which meet the criteria that the endpoint residues
of the loop attach to the framework with the same geometry as in the
template protein. There is, theoretically, no limit to the number of
conformations of an $n$-residue loop which meet such endpoint
criteria if $n > 3$, so there is no method for directly calculating
all possible conformations. The seminal work of Go and
Scheraga[111], however, described a method for exactly
determining the conformations of three consecutive residues which
enable them to meet endpoint criteria. This is done by solving
constraint equations for the three $\phi$ and three $\psi$ dihedrals while
holding all other dihedrals, bonds, and angles fixed. These algebraic
equations do not have a solution in every case; Go and
Scheraga[111] found that the number of solutions varied
from 0 to 8.
We have implemented the chain-closure algorithm to work in
conjunction with our probability grids to generate $n$-residue loop
structures which exactly meet the endpoint criteria.
An initial conformation is generated by randomly selecting
$(\phi,\psi)$ pairs from the probability grids for the ``outer'' loop
residues, that is, those other than the central three residues. New
conformations are generated by randomly choosing one of the outer
residues and choosing a new $(\phi,\psi)$ pair from the appropriate
probability grid. After each new conformation is constructed, the
chain-closure algorithm is used to determine whether any combination
of $\phi$'s and $\psi$'s for the central three residues can close the
loop. If so, each of the solutions is constructed and the energy of
the structure is calculated. If not, the process continues with a new
loop residue selected at random and a new $(\phi,\psi)$ pair chosen
from the probability grids. The process continues until a loop is
successfully built.
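The control flow of this generation step can be sketched in Python as follows. This is an outline under stated assumptions rather than the actual program: `sample_pair` and `close_loop` are hypothetical callables supplied by the caller, with `close_loop` standing in for a Go and Scheraga style solver that returns zero or more sets of three $(\phi,\psi)$ pairs. The sketch runs for a fixed number of trials instead of stopping at the first success, and it omits the energy evaluation.

```python
import random

def generate_closed_loops(n_residues, sample_pair, close_loop, n_trials, rng=random):
    """Collect closed backbones, each a list of n_residues (phi, psi) pairs."""
    if n_residues <= 3:
        raise ValueError("need outer residues in addition to the central three")
    mid = n_residues // 2
    central = [mid - 1, mid, mid + 1]
    outer = [i for i in range(n_residues) if i not in central]
    # Initial conformation; the central entries are placeholders because
    # the closure step overwrites them.
    dihedrals = [sample_pair() for _ in range(n_residues)]
    closed = []
    for _ in range(n_trials):
        # close_loop (hypothetical) returns zero or more solutions, each a
        # list of three (phi, psi) pairs for the central residues.
        for solution in close_loop(dihedrals, central):
            trial = list(dihedrals)
            for idx, pair in zip(central, solution):
                trial[idx] = pair
            closed.append(trial)
        # Perturb one randomly chosen outer residue for the next attempt.
        dihedrals[rng.choice(outer)] = sample_pair()
    return closed
```

Passing the closure solver in as a callable keeps the sampling logic separate from the geometric closure calculation.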
As each successful loop structure is saved, its energy is calculated.
Because the first phase is only concerned with the generation of
backbone conformations, the sidechain atoms are ignored in the energy
calculations. A typical calculation produces and tests 2000 conformations
per CPU minute on a single processor of a Silicon Graphics 4D/380
workstation. On average, 20 of these 2000 structures can form a
successful loop.
The successful loops from Phase 1 are ranked by energy and the best
are saved for sidechain positioning in Phase 2. This second phase
uses the PGMC method, including selection of sidechain
conformations from probability grids, calculation of the energy of
each new conformation, and acceptance or rejection according to the
Metropolis criterion[112]. For each of the backbone
conformations saved from Phase 1, a Monte Carlo simulation is
run in which sidechain conformations are initially randomly selected
from the sidechain probability grids, and new conformations are built
by modifying one sidechain at a time. As each new conformation is
built, the energy of its sidechain and backbone atoms is calculated, including
its interactions with nearby atoms of the framework. The energy of
the new conformation is compared to the previous energy. If the
change in energy, $\Delta E$, is less than 0, the new structure is
saved. If the new structure is higher in energy, the probability of
accepting it is $e^{-\Delta E/kT}$, where $k$ is the Boltzmann
constant and $T$ is the simulation temperature. The Monte Carlo
simulation proceeds for a number of steps and the best energy
conformation is saved. A similar simulation is run for each backbone
structure.
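The acceptance rule and the surrounding Monte Carlo loop can be sketched in Python as below. This is a generic illustration of the Metropolis criterion, not the PGMC code itself: the `energy` and `propose` callables are hypothetical placeholders for the forcefield evaluation and the grid-based sidechain move, and energies are assumed to be in kcal/mol so that the Boltzmann constant is expressed in kcal/(mol K).

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption

def metropolis_accept(delta_e, temperature, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e < 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B * temperature))

def run_monte_carlo(initial, energy, propose, n_steps, temperature, rng=random):
    """Return the lowest-energy state visited and its energy."""
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    for _ in range(n_steps):
        trial = propose(current)      # e.g. change one sidechain
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, temperature, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

Tracking the best state separately from the current state matters because the Metropolis walk can move uphill after visiting its lowest-energy conformation.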
The best conformations from Phase 2 are selected for full
minimization. The lowest-energy conformation for each of the six
loops is built onto the crystal structure framework. This new
structure is then minimized using conjugate gradients minimization
with the framework atoms included in the force calculations, but only
the loop atoms allowed to move. Considerable refinement can be
achieved even for the lowest-energy conformations from Phase 2 because
bonds and angles need no longer be fixed and dihedral angles are no
longer restrained to gridpoint values, i.e., multiples of the grid
spacing. At this point, solvation models may be introduced into
the calculations. Initial work has been done to incorporate the
solvation potential of Eisenberg and McLachlan[113], but
without substantial improvement over the vacuum calculations reported
here.
In summary, the three phases of the loop-building simulations presented here are: