There are a considerable number of variables that affect the efficiency of the PGMC Cα builder. Several of these are listed in Table . To determine which combination of parameters was most effective, we ran numerous simulations using crambin as a model. This protein was chosen for two reasons: its small size allowed rapid calculations, and it contains helix, sheet, and turn regions. Phase 1 parameters were evaluated by running 20 simulations for each set of parameters, each building the complete crambin backbone from its Cα coordinates. The efficacy of a parameter set was measured by averaging, over the 20 runs, the root-mean-square (rms) deviations of the model backbone atoms from the crystal structure. This average correlated very well with a second measure of backbone accuracy: the rms deviations in the dihedral angles. Not every variable had a large impact on the results. In particular, the simulation temperature and the grid spacing had smaller effects than the pulse size, the harmonic constraint, or the number of Monte Carlo steps.
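The accuracy measure used above can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes the model and crystal backbone atoms are already matched one-to-one and expressed in the same frame, so no superposition step is applied. That assumption is plausible here because the model is built on the template Cα positions. The helper names `rms_deviation` and `mean_and_standard_error` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rms_deviation(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMS deviation over matched atoms; assumes a common frame (no superposition)."""
    if not model or len(model) != len(reference):
        raise ValueError("need equal-length, non-empty coordinate lists")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


def mean_and_standard_error(values: Sequence[float]) -> Tuple[float, float]:
    """Average over independent runs and its standard error (sample std / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, 0.0
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical usage: in practice one rms value per Phase 1 run, 20 runs per parameter set.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(rms_deviation(model, crystal))
print(mean_and_standard_error([0.51, 0.53, 0.52]))
```

With 20 runs, the typical standard error of 0.01 Å reported for the Monte Carlo step comparisons corresponds to a run-to-run standard deviation of roughly 0.01 × √20 ≈ 0.045 Å.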
The average rms deviations from twenty Phase 1 simulations are shown in Figure for several temperatures and pulse sizes. These simulations were run using 200 Monte Carlo steps for each pulse, a grid spacing of 10°, and a Cα constraint of 1000 (kcal/mol)/Å². There are no consistent trends with respect to temperature. For pulse lengths of three or four, the best results are obtained at 1000 K. For longer pulses, however, higher temperatures are more favorable. The pulse length itself has a much bigger impact on the results. Shorter pulses are consistently favored at all temperatures except 5000 K, where a pulse of six is better than a pulse of five. Numerous other simulations made it clear that a pulse length of three gave the best results. Four residues were slightly worse, and longer pulses were significantly worse. The number of possible conformations grows exponentially with the number of residues in the pulse. Shorter pulses are therefore favored, because a larger percentage of their conformational space can be searched during the Monte Carlo calculation. This advantage outweighs a competing consideration: in helices, important hydrogen-bonding interactions occur between residues several positions apart, which would favor a pulse length of at least four. In addition, the simulation time increases with the pulse length, so a pulse length of three is also preferable from the standpoint of speed.
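The exponential-growth argument can be made concrete with a short count. This sketch assumes, for illustration only, that each residue's backbone state is a (φ, ψ) pair on a uniform 10° grid (36 × 36 = 1296 points). It ignores the probability weighting of the grids. The numbers are therefore upper bounds on the search space, not the space PGMC actually explores.

```python
def grid_points_per_residue(spacing_deg: float = 10.0) -> int:
    """Number of (phi, psi) grid cells for one residue at the given spacing."""
    bins = round(360.0 / spacing_deg)
    return bins * bins


def pulse_conformations(pulse_length: int, spacing_deg: float = 10.0) -> int:
    """Grid conformations for a pulse: grows as (points per residue) ** pulse_length."""
    return grid_points_per_residue(spacing_deg) ** pulse_length


MC_STEPS = 200  # Monte Carlo steps per pulse, as in the Phase 1 runs above
for n in range(3, 7):
    total = pulse_conformations(n)
    # Upper bound on the fraction visited, assuming every step reaches a new conformation.
    print(f"pulse {n}: {total:.3e} conformations, fraction sampled <= {MC_STEPS / total:.1e}")
```

Under this simplified model, each extra residue multiplies the search space by 1296. A fixed budget of 200 steps therefore covers a far larger share of a three-residue pulse than of a six-residue one.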
Another important variable in these simulations is the force constant of the harmonic constraint between the Cα atoms of the protein chain being built and the input Cα coordinates. The energy of each constraint is given by the expression

$$E_i = \tfrac{1}{2}\, k_{\mathrm{C}\alpha}\, r_i^2$$

where $k_{\mathrm{C}\alpha}$ is the force constant and $r_i$ is the distance between the Cα coordinate of residue $i$ in the model and in the template. There is a constraint of this type for each residue in the pulse. An additional constraint, with a weak force constant and an offset of 2.0 Å, acts between the carbonyl carbon of the most recently added residue and the template Cα of the following residue. This helps to orient the final residue of the growing chain. Figure shows the effect of the constraint on the average rms errors in the backbone atoms (RMSB) and the Cα coordinates (RMSC). These simulations were run at 1000 K, using a grid spacing of 10° and a pulse length of three. As expected, the deviations for the Cα coordinates decrease exponentially as the force constant increases. The fit of the entire backbone, however, reaches a minimum of 0.520 Å at an intermediate force constant. That value is substantially less than a typical DREIDING force constant of 700 (kcal/mol)/Å² or more for bond stretches. Therefore, the Cα constraints do not cause distortions in the geometries during the conjugate gradients minimization stage that follows the Monte Carlo search.
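The constraint energies for one pulse can be sketched as below. This is a minimal sketch under stated assumptions:

- It uses the harmonic form $E = \tfrac{1}{2}k(r - r_0)^2$.
- It treats the 2.0 Å offset as the equilibrium distance $r_0$ of the weak carbonyl-carbon restraint. The text does not say whether the offset is instead a flat-bottom width.
- The force-constant values in the test and usage line are hypothetical, since the weak force constant is not given above.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def harmonic_restraint(r: float, k: float, r0: float = 0.0) -> float:
    """Assumed form E = 1/2 k (r - r0)^2, with k in (kcal/mol)/A^2 and r in A."""
    return 0.5 * k * (r - r0) ** 2


def pulse_restraint_energy(
    model_ca: Sequence[Coord],
    template_ca: Sequence[Coord],
    k_ca: float,
    last_carbonyl_c: Coord,
    next_template_ca: Coord,
    k_weak: float,
    offset: float = 2.0,
) -> float:
    """Sum of per-residue Calpha restraints plus the weak orienting restraint."""
    energy = sum(
        harmonic_restraint(math.dist(m, t), k_ca) for m, t in zip(model_ca, template_ca)
    )
    # Weak restraint between the newest carbonyl carbon and the next template Calpha.
    energy += harmonic_restraint(math.dist(last_carbonyl_c, next_template_ca), k_weak, offset)
    return energy


# Hypothetical values: a 0.1 A Calpha error with k = 1000 costs 0.5 * 1000 * 0.01 kcal/mol.
print(harmonic_restraint(0.1, 1000.0))
```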
As each new residue is added, the pulse of residues is optimized first by the Monte Carlo conformational search and then by 100 steps of conjugate gradients minimization. Both stages are important. The minimization provides flexibility in the bond lengths and angles of the protein model, so that the model can closely match the specific Cα geometry of the protein being built. Although the minimization makes only small adjustments to the conformation of the pulse residues, it makes a substantial difference in the results: without minimization, the errors in the backbone model build up very quickly. With minimization, one parameter set produced an average backbone deviation of 0.52 Å. With the same parameters but no minimization, the Cα builder produces crambin backbone models with an average rms deviation of 1.32 Å. The parameters were optimized for simulations that include minimization, so they probably do not give the best possible results for simulations without it. Nevertheless, it is clearly preferable to include the minimization. It is also important to include the Monte Carlo conformational search. The results using different numbers of Monte Carlo steps are shown in Figure . Simulations with one step correspond to simply using the highest-probability conformation from the grids for each residue; no other conformations are sampled. The results for this case are already good (0.60 Å rms). Even a small number of Monte Carlo steps clearly improves them, and they continue to improve as the number of steps increases. The standard error in these averages is typically 0.01 Å, so the improvements beyond 50 steps have little statistical significance. Nevertheless, we used 200 Monte Carlo steps for most simulations. This increases the number of conformations sampled while keeping the simulation time to 10 minutes per crambin backbone conformation.
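The two-stage optimization of a pulse can be outlined as follows. This is a sketch under assumptions, not the PGMC implementation:

- It uses the standard Metropolis criterion.
- Each move redraws the (φ, ψ) pair of one randomly chosen pulse residue from a supplied sampler.
- The energy function and the minimizer are caller-supplied callables. `energy`, `minimize`, and `sample_residue` are hypothetical stand-ins for the force field, the 100-step conjugate gradients stage, and the probability grids.

Following the description above, the starting conformation counts as the first step. With `n_steps=1` the function therefore just returns the starting (highest-probability) conformation after minimization.

```python
import math
import random
from typing import Callable, List, Tuple

Conformation = List[Tuple[float, float]]  # (phi, psi) in degrees for each pulse residue
KB = 0.0019872  # Boltzmann constant in kcal/(mol K)


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Accept downhill moves always; uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-delta_e / (KB * temperature))


def optimize_pulse(
    start: Conformation,
    sample_residue: Callable[[random.Random], Tuple[float, float]],
    energy: Callable[[Conformation], float],
    minimize: Callable[[Conformation], Conformation],
    n_steps: int = 200,
    temperature: float = 1000.0,
    seed: int = 0,
) -> Conformation:
    """Monte Carlo search over grid conformations, then minimize the best one found."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    # The starting conformation is step 1; each further step is one trial move.
    for _ in range(n_steps - 1):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] = sample_residue(rng)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, temperature, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    # Second stage: local minimization of the lowest-energy conformation found.
    return minimize(best)


# Hypothetical toy problem: energy is lowest at a helical (-60, -45) geometry.
def toy_energy(conf: Conformation) -> float:
    return sum(((phi + 60.0) / 10.0) ** 2 + ((psi + 45.0) / 10.0) ** 2 for phi, psi in conf)


def toy_sampler(rng: random.Random) -> Tuple[float, float]:
    return rng.choice([(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)])


start = [(-120.0, 130.0)] * 3
result = optimize_pulse(start, toy_sampler, toy_energy, minimize=lambda c: c, temperature=300.0)
print(result, toy_energy(result))
```

Passing the identity function as `minimize`, as in the toy call, corresponds to the no-minimization runs discussed above.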
The choice of grid spacing was based on simulations of the pentapeptide Met-enkephalin (see Section ), which showed that the best results were obtained with a grid spacing of 10°. The 10° dihedral spacing appears to provide the best balance between two conflicting trends that arise as the grid spacing becomes smaller. On one hand, there are far more possible conformations, so the protein can adopt more low-energy conformations. On the other, the fraction of the total conformational space that can be sampled in a given number of Monte Carlo steps decreases.
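The grid-spacing trade-off can be illustrated with a small probability-grid sampler. The sketch below is illustrative only. The smooth `toy_density` (a single peak near the α-helical region plus a flat background) is a hypothetical stand-in for the real probability grids, whose construction is not described here. It also ignores the periodicity of the dihedrals. Grid cells are sampled in proportion to their normalized weights with `random.choices`.

```python
import math
import random
from typing import Callable, List, Tuple

Cell = Tuple[float, float]  # (phi, psi) cell center in degrees


def build_grid(
    spacing_deg: float, density: Callable[[float, float], float]
) -> Tuple[List[Cell], List[float]]:
    """Tabulate a (phi, psi) density at cell centers and normalize it to probabilities."""
    n = round(360.0 / spacing_deg)
    cells: List[Cell] = []
    weights: List[float] = []
    for i in range(n):
        for j in range(n):
            phi = -180.0 + (i + 0.5) * spacing_deg
            psi = -180.0 + (j + 0.5) * spacing_deg
            cells.append((phi, psi))
            weights.append(density(phi, psi))
    total = sum(weights)
    return cells, [w / total for w in weights]


def toy_density(phi: float, psi: float) -> float:
    # Hypothetical: Gaussian peak near (-60, -45) on a small uniform background.
    return 0.01 + math.exp(-((phi + 60.0) ** 2 + (psi + 45.0) ** 2) / (2 * 20.0 ** 2))


rng = random.Random(1)
for spacing in (20.0, 10.0, 5.0):
    cells, probs = build_grid(spacing, toy_density)
    draws = rng.choices(cells, weights=probs, k=200)
    print(f"{spacing:>4} deg: {len(cells)} cells, {len(set(draws))} distinct cells in 200 draws")
```

Halving the spacing quadruples the number of cells per residue. A fixed number of draws therefore covers a smaller fraction of the grid, which is the second of the two trends described above.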
After backbone models have been built in Phase 1, the sidechains are optimized in Phase 2. In these calculations, the backbone is held fixed while the sidechains are modified by randomly choosing new conformations according to the probability grids. The most important variables for these simulations are the grid spacing, the temperature, and the number of Monte Carlo steps. A grid spacing of 10° was selected for consistency with the grid spacing chosen for Phase 1. Results improved consistently as the number of Monte Carlo steps increased, but the improvement slowed after about 500 steps; therefore, 1000 steps were used for the calculations reported below. As discussed below, this number may be insufficient for large proteins. For crambin, however, it represents more than 25 conformations per residue for the 37 non-alanine, non-glycine residues optimized during these simulations.
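The count of optimizable residues quoted above can be checked from the crambin sequence, and a Phase 2 trial move can be sketched in the same spirit. This illustration rests on some assumptions:

- Each trial move resamples the sidechain dihedrals of one randomly chosen non-Ala, non-Gly residue.
- The new dihedrals are drawn from a per-residue table of (χ-angle tuple, probability) entries.
- `rotamer_table` and `propose_sidechain_move` are hypothetical names, and the two-entry table in the usage lines is invented.

```python
import random
from typing import Dict, List, Sequence, Tuple

Chis = Tuple[float, ...]
CRAMBIN = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"  # one-letter code, 46 residues


def optimizable_residues(sequence: str) -> List[int]:
    """Indices of residues with rotatable sidechains (everything except Ala and Gly)."""
    return [i for i, aa in enumerate(sequence) if aa not in "AG"]


def propose_sidechain_move(
    chis: Dict[int, Chis],
    residues: Sequence[int],
    rotamer_table: Dict[int, Tuple[List[Chis], List[float]]],
    rng: random.Random,
) -> Dict[int, Chis]:
    """Copy the current sidechain state and redraw the chi angles of one random residue."""
    i = rng.choice(residues)
    options, weights = rotamer_table[i]
    trial = dict(chis)
    trial[i] = rng.choices(options, weights=weights, k=1)[0]
    return trial


residues = optimizable_residues(CRAMBIN)
print(len(residues), 1000 / len(residues))  # residue count and trial moves per residue

rng = random.Random(0)
table = {i: ([(-60.0,), (180.0,)], [0.7, 0.3]) for i in residues}  # hypothetical grids
state = {i: (-60.0,) for i in residues}
trial = propose_sidechain_move(state, residues, table, rng)
```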
To determine the best simulation temperature for Phase 2, ten PGMC calculations were run at each of several temperatures between 0 K and 5000 K. The starting structure for these calculations was the crambin crystal structure, with its sidechains rotated to their most probable conformations according to the 10° probability grids. This structure had an rms deviation of 1.52 Å from the crystal structure; the deviation for sidechain atoms alone was 2.34 Å. For each simulation, 1000 Monte Carlo steps were run. The lowest-energy conformation was then saved, and its overall rms deviation from the crystal structure was recorded. The average for the ten simulations at each temperature is shown in Figure . As was found for the backbone Monte Carlo simulations in Phase 1 (see Figure ), there is no large variation with temperature. This holds even though the acceptance rate for new structures rises from 7.7% at 0 K to 46.8% at 5000 K. Apparently, the much greater acceptance rate does not translate directly into the creation of more low-energy conformations. The simulations at 300 K were the most consistently accurate, so this temperature was used in the simulations reported below. Table lists the values used for the Phase 1 and Phase 2 simulations reported in the following sections.
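The rise in acceptance rate with temperature follows from the Metropolis criterion, assuming PGMC uses the standard form; the text reports the rates but not the criterion. For a hypothetical uphill move of 1 kcal/mol, the acceptance probability $\exp(-\Delta E / k_B T)$ rises steeply with temperature. At 0 K, only downhill or level moves are accepted.

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol K)


def acceptance_probability(delta_e: float, temperature: float) -> float:
    """Metropolis acceptance probability for an energy change delta_e in kcal/mol."""
    if delta_e <= 0.0:
        return 1.0
    if temperature <= 0.0:
        return 0.0
    return math.exp(-delta_e / (KB * temperature))


for t in (0.0, 300.0, 1000.0, 5000.0):
    print(f"T = {t:6.0f} K: P(accept, dE = 1 kcal/mol) = {acceptance_probability(1.0, t):.3f}")
```

Each simulation keeps only the lowest-energy conformation it encounters. Accepting more uphill moves broadens the search but does not guarantee that lower-energy structures are found. This is consistent with the weak temperature dependence reported above.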