The PGMC C$_\alpha$ Builder was developed as an extension of the BIOGRAF program from Molecular Simulations, Incorporated. All calculations reported here were run on Silicon Graphics Power Series and Indigo workstations; all timing numbers were obtained from simulations run on a single processor of an SGI 4D/380. During the first stage of the model-building procedure, the protein is built one residue at a time until the entire chain is complete. As each residue is added, its geometry is first taken from the standard peptide geometries in the BIOGRAF peptide library; the backbone ($\phi$, $\psi$) and sidechain ($\chi$) dihedrals are then rotated to their most probable conformations according to the relevant probability grids. A Monte Carlo simulation using probability grids is then used to search the conformational space of a ``pulse'' of residues: the last residues of the current chain (residues through ). The residues preceding the pulse are held fixed and are not included in the energy calculations. Simulations in which these early residues are held fixed but included in the energy calculation are considerably slower and give worse results. The sidechains are also ignored during the chain-building phase; they are added in the second stage, after the backbone conformation has been built. The energy used during the Monte Carlo simulations is essentially the DREIDING energy of the backbone atoms of the pulse, plus harmonic terms constraining the pulse C$_\alpha$ coordinates to the true coordinates. The best conformation sampled during the Monte Carlo simulation is saved and then optimized by conjugate gradient minimization. This process proceeds sequentially: each new residue takes part in several optimization cycles before being frozen in its final position as the pulse moves beyond it.
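As a rough illustration of the bookkeeping described above, the following Python sketch shows how the moving pulse and the C$_\alpha$ restraint term could be expressed. It is not the BIOGRAF implementation: the function names, the force constant `k`, and the $\frac{1}{2} k d^2$ form of the harmonic term are all assumptions made for the example, and the DREIDING backbone energy is simply passed in as a precomputed number.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pulse_indices(n_built: int, pulse_size: int) -> List[int]:
    """Indices of the residues that are free to move: the last `pulse_size`
    residues of the chain built so far (fewer while the chain is still short).
    Residues before the pulse are frozen and left out of the energy."""
    start = max(0, n_built - pulse_size)
    return list(range(start, n_built))


def harmonic_restraint(ca_model: Sequence[Vec3],
                       ca_target: Sequence[Vec3],
                       k: float) -> float:
    """Sum of 0.5 * k * d^2 over the pulse C-alpha atoms, where d is the
    distance between the model and target positions (assumed functional form)."""
    total = 0.0
    for (x, y, z), (x0, y0, z0) in zip(ca_model, ca_target):
        total += 0.5 * k * ((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2)
    return total


def pulse_energy(backbone_energy: float,
                 ca_model: Sequence[Vec3],
                 ca_target: Sequence[Vec3],
                 k: float = 100.0) -> float:
    """Energy used to score a pulse conformation: the force-field energy of
    the pulse backbone atoms plus the C-alpha restraint. The default k is a
    hypothetical value, not one taken from the text."""
    return backbone_energy + harmonic_restraint(ca_model, ca_target, k)
```

For a chain of five residues and a three-residue pulse, `pulse_indices(5, 3)` selects residues 2, 3 and 4; residues 0 and 1 are frozen and contribute nothing to `pulse_energy`.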
The backbone Monte Carlo simulations are aided by pre-determination of the secondary structure, where possible. There is a high correlation between the $\phi$ and $\psi$ dihedrals of a protein and its C$_\alpha$ coordinates, so knowledge of the C$_\alpha$ coordinates can limit the possible $\phi$ and $\psi$ values. The most common secondary structural elements, helices and sheets, have very specific C$_\alpha$ configurations, as described by the virtual angle and virtual dihedral shown in Figure . Analysis of the virtual angle and virtual dihedral distributions of the proteins in our H64 dataset showed that HELIX and SHEET residues almost always have virtual angle and virtual dihedral values within the ranges specified in Table . Residues with virtual angle and virtual dihedral values in one of these two regions are assumed to have $\phi$ and $\psi$ values common to that secondary structure type; when their $\phi$ and $\psi$ conformations are sampled during the chain-building process, the grids determined for HELIX or SHEET residues are used. Residues with values falling outside these regions are sampled using the generic probability grids. Of the residues in the H64 dataset whose virtual angle and virtual dihedral values lie within the high-probability sheet region listed in Table , 85% also have $\phi$ and $\psi$ values within the specified region. The correlation is even higher for helices, where 88% of the residues with helix virtual angle and virtual dihedral values have $\phi$ and $\psi$ values within the corresponding range. If there were no variation in bond lengths and angles in the protein backbone, the virtual angles would provide nearly enough information to determine the $\phi$ and $\psi$ angles, according to the method developed by Purisima and Scheraga. Unfortunately, the variability in real conformations is too high for this exact method to work, and the $\phi$ and $\psi$ angles must be derived from simulation methods such as the one presented here. Nevertheless, the correlation between the virtual angles and the $\phi$ and $\psi$ angles is sufficient to determine which residues should be sampled using the HELIX and SHEET grids. The use of these grids for the appropriate residues improves our results significantly.
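The grid-selection step can be sketched as follows, assuming the virtual angle is the angle formed by three consecutive C$_\alpha$ atoms and the virtual dihedral is the torsion defined by four consecutive C$_\alpha$ atoms. The assignment of these values to a particular residue, the function names, and above all the numerical ranges in `EXAMPLE_REGIONS` are placeholders for illustration; they are not the ranges from the table referred to in the text.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_angle(p0, p1, p2):
    """Angle in degrees at p1 formed by three consecutive C-alpha atoms."""
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_a = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))


def virtual_dihedral(p0, p1, p2, p3):
    """Torsion in degrees, in (-180, 180], of four consecutive C-alpha atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# PLACEHOLDER regions in degrees. These are NOT the values from the paper's
# table; substitute the real ranges before using this for anything.
EXAMPLE_REGIONS = {
    "HELIX": {"angle": (80.0, 100.0), "dihedral": (30.0, 70.0)},
    "SHEET": {"angle": (110.0, 140.0), "dihedral": (-180.0, -140.0)},
}


def select_grid(angle, dihedral, regions):
    """Name of the probability grid to sample from: a secondary-structure
    grid if both values fall inside its region, otherwise the generic grid."""
    for name, region in regions.items():
        a_lo, a_hi = region["angle"]
        d_lo, d_hi = region["dihedral"]
        if a_lo <= angle <= a_hi and d_lo <= dihedral <= d_hi:
            return name
    return "GENERIC"
```

This sketch does not handle dihedral ranges that wrap around $\pm 180^\circ$; a real sheet region may need that.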
Because Monte Carlo simulations depend on random numbers, each run of the calculation produces a different backbone conformation. An exhaustive search would avoid this, but it is much more computationally intensive, even if only a few conformations are allowed for each residue. A complete sampling of just the top 20 conformations for each residue in a three-residue pulse would require evaluation of $20^3 = 8000$ different conformations. In contrast, we are able to obtain excellent results from only 200 Monte Carlo steps. The Metropolis criterion (see Section and Reference ) rejects conformations which produce very bad energies, allowing the conformational sampling to focus on low-energy conformations. It is therefore possible to build backbone conformations quickly. A typical simulation takes approximately 15 seconds per residue on one processor of a Silicon Graphics 4D/380 workstation, or less than 12 minutes for the 46-residue protein crambin. Speed is crucial for simulations where many different C$_\alpha$ conformations are being evaluated, for instance when numerous conformations are generated by a lattice-based protein structure prediction method. In cases where a single set of C$_\alpha$ coordinates is being used, it may not be necessary to limit the calculations to a matter of minutes. In these cases, several simulations can be run, using different random numbers for the Monte Carlo calculation. Each will produce a slightly different backbone conformation. From these, the lowest-energy conformations are selected for the second stage of the calculation.
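The cost comparison and the acceptance test can be made concrete with a short Python sketch. The acceptance rule below is the standard Metropolis form (always accept a downhill move; accept an uphill move with probability $e^{-\Delta E / kT}$); the function names and the value of `kT` used in any call are assumptions for illustration, and the exact equation used in this work is the one referenced in the text.

```python
import math
import random


def exhaustive_count(conformations_per_residue: int, pulse_size: int) -> int:
    """Number of conformations an exhaustive search of one pulse would visit."""
    return conformations_per_residue ** pulse_size


def estimated_build_minutes(n_residues: int,
                            seconds_per_residue: float = 15.0) -> float:
    """Approximate backbone build time, using the per-residue timing quoted
    in the text as the default."""
    return n_residues * seconds_per_residue / 60.0


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Standard Metropolis test: downhill moves are always accepted; uphill
    moves are accepted with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)
```

With these definitions, `exhaustive_count(20, 3)` gives the 8000 conformations quoted above, against 200 Monte Carlo steps, and `estimated_build_minutes(46)` gives 11.5 minutes for crambin.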
The best-energy conformations generated in Phase 1 were evaluated without regard to their sidechain positions. During the chain-building process, energies were determined for only a small pulse of residues; all previous residues were ignored. However, after the chain is built, the energy of the entire backbone is evaluated, and this value is used to determine which backbone conformations are used in Phase 2. The sidechain conformations are optimized by a PGMC simulation using probability grids. In this stage, the backbone atoms are held fixed but are included in the energy calculation. Because the backbone is held fixed, the constraints to the C$_\alpha$ coordinates are removed. In these calculations, at every Monte Carlo step, one sidechain is selected at random, and a new conformation is chosen for it according to the residue-specific probability grid. The energy of the new conformation is calculated, and the Metropolis criterion is used to accept or reject this structure. Since the Metropolis acceptance probability (Equation ()) depends only on $\Delta E$, the change in energy, only the energy terms involving the sidechain being modified need to be evaluated; all interactions not involving that sidechain are constant and cancel in the difference. This results in a huge speed increase over calculations which re-evaluate the entire energy of the protein at every step. Using this method, the second stage can be quite rapid. For the small protein crambin, which has 46 residues and 396 atoms in the DREIDING calculations, 1000 Monte Carlo steps require seven minutes of CPU time, while plastocyanin, with 98 residues and 857 atoms, requires 22 minutes for 1000 steps. Like the backbone-building process, the sidechain-modeling process is a stochastic simulation, dependent upon random numbers. Therefore, it is useful to run the simulation several times, using different random number seeds, and to use the lowest-energy structures for further studies.
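The saving from evaluating only the terms that involve the moved sidechain can be illustrated with a toy model in Python. Everything here is a simplification chosen for the example: each sidechain is reduced to a discrete state index, the energy is a sum of hypothetical one-body and symmetric two-body terms supplied by the caller, and a uniform random draw stands in for sampling from the residue-specific probability grid. It is a sketch of the incremental-energy idea, not of the PGMC code itself.

```python
import math
import random


def total_energy(states, self_e, pair_e):
    """Full energy: every one-body term plus every pair term (i < j)."""
    n = len(states)
    energy = sum(self_e(i, states[i]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e(i, states[i], j, states[j])
    return energy


def local_energy(i, states, self_e, pair_e):
    """Only the terms that involve sidechain i; pair_e must be symmetric."""
    energy = self_e(i, states[i])
    for j in range(len(states)):
        if j != i:
            energy += pair_e(i, states[i], j, states[j])
    return energy


def sidechain_mc(states, n_states, self_e, pair_e, n_steps, kT, rng):
    """Metropolis search over sidechain states using local energy differences.
    Returns (best_energy, best_states)."""
    states = list(states)
    energy = total_energy(states, self_e, pair_e)  # evaluated in full only once
    best_energy, best_states = energy, list(states)
    for _ in range(n_steps):
        i = rng.randrange(len(states))             # pick one sidechain at random
        old_state = states[i]
        e_old = local_energy(i, states, self_e, pair_e)
        states[i] = rng.randrange(n_states)        # stand-in for a grid draw
        delta = local_energy(i, states, self_e, pair_e) - e_old
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            energy += delta                        # accept: update running total
            if energy < best_energy:
                best_energy, best_states = energy, list(states)
        else:
            states[i] = old_state                  # reject: restore old state
    return best_energy, best_states
```

Each step costs on the order of $N$ pair evaluations instead of $N^2$, because terms between unmoved sidechains never enter `delta`. Running the function with several differently seeded `random.Random` instances and keeping the lowest-energy result mirrors the multiple-seed strategy described above.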