Several methods have been reported for modeling loop conformations. These fall into two general categories: those using databases of known loop conformations[105][104] and those using conformational searching[107][106]. A combined approach has also been described[108]. The method presented here also combines features of both approaches: it uses a conformational search method similar to that of Bruccoleri and Karplus[106], but with dihedral angles sampled from probability grids developed through an analysis of the Brookhaven Protein Data Bank.
The Probability Grid Monte Carlo (PGMC) method is described in detail in Section . The method samples conformational space by modifying the dihedral angles of a polypeptide or protein according to probability matrices determined from an analysis of known protein structures. At each step of the simulation, one amino acid residue is selected for modification and either its backbone or its sidechain conformation is modified. If its backbone conformation is to be modified, new values of $\phi$ and $\psi$ are chosen from 2-dimensional grids, where each possible $(\phi, \psi)$ combination has been assigned a specific probability. The values of $\phi$ and $\psi$ are confined to discrete values between $-180^\circ$ and $180^\circ$. The spacing between gridpoints is $\Delta$, so there are $(360^\circ/\Delta)^2$ possible conformations for each amino acid. Probabilities have been determined for grids with spacings of $5^\circ$, $10^\circ$, $15^\circ$, $30^\circ$, and $60^\circ$. Different probability grids were determined for three different residue types: glycine, proline, and the 18 other standard residues, from the distributions found among residues in a selection of high-quality structures in the Brookhaven Protein Data Bank (PDB). Additionally, different grids were developed for different secondary structure types: helices, sheets, and coil conformations. Separate grids were not determined for turns because these conformations require specific four-residue conformations and are not represented well by single-residue probabilities. The probabilities for coil regions were derived from all residues not specified by the HELIX, SHEET, and TURN designators of the PDB files. These coil probability grids are the most pertinent to loop conformations and are, therefore, the ones used in these simulations.
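The backbone sampling step described above can be illustrated with a minimal sketch. It assumes the grid is stored as a square table of non-negative weights indexed by $\phi$ (rows) and $\psi$ (columns), and that $-180^\circ$ and $180^\circ$ are treated as the same gridpoint. The function names and the toy weights are hypothetical and are not taken from the PGMC implementation or from PDB statistics.

```python
import random

def make_angles(spacing):
    """Discrete dihedral values in degrees: -180, -180 + spacing, ..., < 180."""
    return list(range(-180, 180, spacing))

def sample_phi_psi(grid, spacing, rng=random):
    """Draw one (phi, psi) pair with probability proportional to its grid weight.

    grid[i][j] is the weight for phi = angles[i] and psi = angles[j].
    """
    angles = make_angles(spacing)
    n = len(angles)
    cells = [(angles[i], angles[j]) for i in range(n) for j in range(n)]
    weights = [grid[i][j] for i in range(n) for j in range(n)]
    return rng.choices(cells, weights=weights, k=1)[0]

# Toy 60-degree grid (6 x 6 = 36 cells) with made-up weights,
# used only to show the mechanics of the draw.
spacing = 60
n_points = 360 // spacing
toy_grid = [[0.0] * n_points for _ in range(n_points)]
toy_grid[2][5] = 3.0   # phi = -60, psi = 120
toy_grid[2][2] = 1.0   # phi = -60, psi = -60

phi, psi = sample_phi_psi(toy_grid, spacing, random.Random(0))
```

In practice one grid would be kept per residue type and secondary-structure class, with the coil grids selected for loop residues as the text describes.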
Sidechain conformations are also chosen from probability grids, but these grids are 1- to 5-dimensional, depending upon the number of sidechain dihedrals that are sampled in a particular residue. Only dihedrals which affect the geometries of heavy atoms (non-hydrogen) and are not part of ring systems are included. The number of PGMC dihedrals varies from one (e.g., for serine and threonine) to five (for arginine).
The goal in loop modeling is to predict the native conformation of the loop. If a forcefield is used to evaluate conformations, and the forcefield is highly accurate, the minimum energy conformation should be very similar to the native conformation. Our simulations use the DREIDING forcefield[109] to evaluate structures. Although there is probably no forcefield in existence which guarantees that its global minimum is the native conformation, we have included the capacity to improve the results by increasing the sophistication of the calculations. Ideally, such factors as solvation and loop-protein and loop-substrate interactions would be included in the calculation. However, including such terms can increase computational time so much that only a few possible conformations can be evaluated. There must be a balance between speed and accuracy. We have addressed this need for balance by creating a hierarchical procedure, which increases in accuracy as the simulation proceeds. The first stage of the procedure creates backbone conformations which meet the endpoint criteria; these conformations are evaluated without regard to sidechain interactions. The second stage optimizes the positions of the sidechains of the loop residues. Interactions among all the sidechain and backbone atoms of the loop are considered, as are interactions with residues from other regions of the protein and, if possible, interactions with a substrate. The final stage is complete optimization of the best conformations from the second stage, using energy-minimization of all degrees of freedom of the loop. At this stage, solvent may be added to enhance the accuracy of the simulation.
The first stage of the simulation involves the generation of numerous backbone conformations which meet the endpoint criteria established by the constant framework. The conformation of each loop in a protein is predicted independently. The framework residues from the template protein are held constant, while all loop residues are removed. The loop being modeled is then constructed using standard geometries from the BIOGRAF[110] peptide libraries. Loop conformations are then generated which meet the criterion that the endpoint residues of the loop attach to the framework with the same geometry as in the template protein. There is, theoretically, no limit to the number of conformations of an $n$-residue loop which meet such endpoint criteria if $n > 3$, so there is no method for directly calculating all possible conformations. The seminal work of Go and Scheraga[111], however, described a method for exactly determining the conformations of three consecutive residues which enable them to meet endpoint criteria. This is done by solving constraint equations for the $\phi$ and $\psi$ dihedrals of the three residues while holding all other dihedrals, bonds, and angles fixed. The algebraic equations described cannot be solved for all cases; Go and Scheraga[111] found that the number of solutions varied from 0 to 8.
We have implemented the chain-closure algorithm to work in conjunction with our probability grids to generate $n$-residue loop structures which exactly meet the endpoint criteria. An initial conformation is generated by randomly selecting $(\phi, \psi)$ pairs from the probability grids for the ``outer'' loop residues, i.e., those other than the central three residues. New conformations are generated by randomly choosing one of the outer residues and choosing a new $(\phi, \psi)$ pair from the appropriate probability grid. After each new conformation is constructed, the chain-closure algorithm is used to determine whether any combination of $\phi$'s and $\psi$'s for the central three residues can close the loop. If so, each of the solutions is constructed and the energy of the structure is calculated. If not, the process continues with a new loop residue selected at random and a new $(\phi, \psi)$ pair chosen from the probability grids. The process continues until a loop is successfully built. As each successful loop structure is saved, its energy is calculated. Because the first phase is only concerned with the generation of backbone conformations, the sidechain atoms are ignored in the energy calculations. A typical calculation produces and tests 2000 conformations per cpu minute on a single processor of a Silicon Graphics 4D/380 workstation. On average, 20 of these 2000 structures can form a successful loop.
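The control flow of this first phase can be sketched as follows. This is only an outline under stated assumptions: `draw_phi_psi` and `close_chain` are hypothetical callables supplied by the caller (the latter stands in for the Go and Scheraga chain-closure solver, which is not implemented here), the "central three" residues are taken to be the middle three of the loop, and energy evaluation is omitted.

```python
import random

def generate_closed_loops(n_residues, draw_phi_psi, close_chain, n_trials, rng=random):
    """Collect backbone conformations whose central three residues can close the loop.

    draw_phi_psi(rng)  -> (phi, psi) drawn from a probability grid (hypothetical).
    close_chain(outer) -> list of solutions, each a dict mapping a central residue
                          index to its (phi, psi); empty if closure is impossible.
    """
    if n_residues < 4:
        raise ValueError("need at least one outer residue besides the central three")
    start = (n_residues - 3) // 2                     # assumed placement of the central three
    central = set(range(start, start + 3))
    outer = [i for i in range(n_residues) if i not in central]

    # Initial conformation: every outer residue gets a random grid pair.
    conformation = {i: draw_phi_psi(rng) for i in outer}
    closed = []
    for _ in range(n_trials):
        # Test the current outer conformation; keep every closing solution.
        for solution in close_chain(conformation):
            full = dict(conformation)
            full.update(solution)
            closed.append(full)
        # Modify one randomly chosen outer residue and try again.
        conformation[rng.choice(outer)] = draw_phi_psi(rng)
    return closed
```

Passing the solver in as a callable keeps the sampling loop independent of how closure is actually computed; each returned dictionary would then be built and scored with sidechains ignored, as the text describes.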
The successful loops from Phase 1 are ranked by energy and the best are saved for sidechain positioning in Phase 2. This second phase uses the PGMC method, including selection of sidechain conformations from probability grids, calculation of the energy of each new conformation, and acceptance or rejection according to the Metropolis criterion[112]. For each of the backbone conformations saved from Phase 1, a Monte Carlo simulation is run in which sidechain conformations are initially selected at random from the sidechain probability grids, and new conformations are built by modifying one sidechain at a time. As each new conformation is built, the energy of its sidechains and backbone is calculated, including their interactions with nearby atoms of the framework. The energy of the new conformation is compared to the previous energy. If the change in energy, $\Delta E$, is less than 0, the new structure is accepted. If the new structure is higher in energy, the probability of accepting it is $\exp(-\Delta E / kT)$, where $k$ is the Boltzmann constant and $T$ is the simulation temperature. The Monte Carlo simulation proceeds for a number of steps and the lowest-energy conformation is saved. A similar simulation is run for each backbone structure.
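The acceptance rule and the bookkeeping of the best conformation can be written compactly. The sketch below assumes energies in kcal/mol (hence the value of the Boltzmann constant used) and takes the move generator and energy function as hypothetical callables; it is not the PGMC code itself.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); assumes energies in kcal/mol

def metropolis_accept(delta_e, temperature, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE / kT)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B * temperature))

def run_monte_carlo(initial, propose, energy, n_steps, temperature, rng=random):
    """Run a Metropolis walk and return the lowest-energy state visited.

    propose(state, rng) -> trial state (e.g. one sidechain redrawn from its grid).
    energy(state)       -> energy of the state.
    Both callables are placeholders supplied by the caller.
    """
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    for _ in range(n_steps):
        trial = propose(current, rng)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, temperature, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

Tracking the best state separately from the current one matters because the walk is allowed to move uphill, so the final state is not necessarily the lowest-energy one encountered.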
The best conformations from Phase 2 are selected for full minimization. The lowest-energy conformation for each of the six loops is built onto the crystal structure framework. This new structure is then minimized using conjugate-gradient minimization with the framework atoms included in the force calculations, but with only the loop atoms allowed to move. Considerable refinement can be achieved even for the lowest-energy conformations from Phase 2 because bonds and angles need no longer be fixed and dihedral angles are no longer restricted to gridpoint values, i.e., multiples of the grid spacing. At this point, solvation models may be introduced into the calculations. Initial work has been done to incorporate the solvation potential of Eisenberg and McLachlan[113], but without substantial improvement over the vacuum calculations reported here.
In summary, the three phases of the loop-building simulations presented here are: