One of the most important classes of proteins to be studied by homology modeling is the immunoglobulins. These molecules are ideal candidates for homology modeling studies because each organism can produce a huge number of immunoglobulins, or antibodies, which differ dramatically in their binding specificities, but are nearly identical in sequence and structure. The specificity is due to the great variability of six small loop regions, the ``complimentarity determining regions'' (CDR's), also known as hypervariable loops. A reliable method for predicting the conformations of the six CDR's would essentially be a method for predicting the antigen binding sites of an enormous variety of immunoglobulins and would provide valuable information about the immune system, catalytic antibodies, and on a more fundamental level, molecular recognition.
The most common of the five closely related immunoglobulin classes is immunoglobulin G (IgG), which is Y-shaped and contains two antigen-binding sites. Figure shows a schematic diagram of an IgG molecule, which contains two copies each of two different types of chains. The light chains contain a variable (V) and a constant (C) domain, while the heavy chains contain one variable (V) and three constant (C1, C2, C3) domains. All six domain types are similar in sequence and structure, containing 100-120 residues folded into two antiparallel sheets. The great diversity of antibody specificity arises from the extreme variability of three loops in each variable domain. The three variable loops in V are called H1, H2, and H3, while those in V are L1, L2, and L3. The spatial arrangement of the six loops is shown in Figure for two immunoglobulins whose F fragments have been crystallographically resolved: HyHEL-5  and McPC603 .
Because the specificity of antibodies is determined by the six CDR's, considerable work has gone into understanding the structure-sequence relationships for these small loop regions. Several crystal structures of murine and human IgG's have been solved. Most are not complete IgG's, but just the Fab fragment (see Figure ). Many are co-crystals which include the antigen or a hapten. After analyzing these crystal structures, as well as several hundred other immunoglobulin sequences, C. Chothia and coworkers proposed that a small number of ``canonical structures'' exist for each of the five loops other than H3. Most examples of these loops should conform to one of these structures. For example, as many as 95%of the L2 loops are believed to conform to a single canonical structure, while L1 has four canonical structures which account for about 70%of the total. For these loops, antigen specificity is due primarily to the sidechain conformations, so it is of paramount importance to model these correctly. Additionally, no canonical structures have been identified for the H3 loops, which show extreme variation in size and sequence. Therefore, a major challenge in modeling the antigen binding site is to model both the backbone and sidechains of the H3 loop.
The studies reported here used the PGMC-based method described above to reproduce the conformations of the hypervariable loops from the murine IgG molecules HyHEL-5 (structure 2HFL from the Brookhaven Protein Database (PDB)) and McPC603 (PDB structure 1MCP). These two cases have been used to test methodologies which use canonical conformations, as well as those using conformational search methods  and those which use a combination of database information and conformational searching. Different published reports have used slightly different definitions of the loop regions. As our method is most similar to that of Bruccoleri and coworkers, we used the loop definitions similar to those in Reference , as shown in Table . There are two differences: we include an additional residue in the L1 and L3 loops of HyHEL-5, making these six-residue loops. In these cases, increasing the size of the loops consistently improved our results.
In order to produce the largest variety of loop conformations, we used probability grids with = 5. As described above, all six loops were removed from the crystal structure, and each loop was modeled independently. In the first stage of the simulation, 1000 loops were created. The energy of the backbone atoms of each of these loops was calculated, and the top 100 were saved for Phase 2. Any repeat conformations were eliminated. These 100 best backbone conformations were each subjected to 50 steps of PGMC, using probability grids and a simulation temperature of 1000 K. The full energy of the loops, including the sidechain atoms, was used for the Monte Carlo phase. For each of the six loops, the overall lowest-energy conformation from the 100 Monte Carlo runs was saved for Phase 3. In this last stage, the six loops were assembled and were energy-minimized using the conjugate-gradients minimizer of BIOGRAF.
The results for McPC603 and HyHEL-5 are described in Tables and . The Phase 2 root-mean-square (rms) deviations are given with respect to the crystal structures. The Phase 3 rms deviations are given with respect to DREIDING reference structures, in order to reduce the effects of the choice of forcefield. The DREIDING reference structures were created by minimizing the loop conformations from the crystal structure while holding the framework atoms fixed. For all cases, hydrogen atoms were not included in the determination of rms deviations.
Results were consistently good for small and medium-sized loops (4-6 residues), with the exception of the H1 loop of HyHEL-5. For this loop, 17 of the top 25 loops from Phase 2 had C rms deviations of less than 1.5 Å, but the best loop had a deviation of 2.48 Å. It is possible that increasing the sophistication of the energy term, such as including terms for surface area or solvation, could improve the selection of more native-like conformations. As noted in other studies, the energy from vacuum calculations correlates only modestly with improved fit to the crystal structure. For example, the bests C fits from the 1000 loops generated in Phase 1 are listed in Table , clearly indicating that better loops were sampled. Nevertheless, the vacuum calculations used here did quite well for most of the medium sized loops, with C deviations ranging from 0.27 Å for H3 of HyHEL-5 to 1.75 Å for L3 of HyHEL-5. It is important to note that the backbone conformations of the two H3 loops were modeled quite well, since these loops cannot be modeled using canonical structures. Modeling of the L3 loop of McPC603 was very successful, not only in that the rms deviations were low, but that the cis peptide bond of Pro 101 was accurately predicted. Likewise, the trans peptide bond of the homologous Pro 94 of HyHEL-5 was predicted, as well. On the other hand, the large L1 and H2 loops of McPC603 were modeled poorly in comparison to the shorter loops. The conformational space of these loops is so large that even 1000 trial structures is probably too small a number for adequate sampling. Modeling loops greater than 10 residues in length is probably beyond the capabilities of both database and conformation-searching algorithms, the former because of the lack of example conformations meeting the necessary criteria and the latter because the huge conformational space cannot be adequately searched in a short period of time. Generally, the calculations here took two hours per loop. One hour was required to generate the 1000 loop conformations and one hour was required to optimize the sidechain conformations of the 100 initial conformations in Phase 2.
The C traces of the modeled and actual loops for HyHEL-5 and McPC603 are shown in Figures and . The loops are shown in the same face-on view of the antigen-binding site as used in Figure . The excellent fits of the predicted loops for H3, L1, and L2 of HyHEL-5 and H1, L2, and L3 of McPC603 are very apparent, even in the simple C trace pictures. More detail can be seen in Figures and , which show the entire backbone conformations of the model and actual loops in views perpendicular to the plane of the loops. The large errors in the backbone conformations of the L1 and H2 loops of McPC603 would prevent an adequate model of the antigen-binding pocket from being predicted, if this were a blind test where the conformation of the loops was unknown. L1 and H2 are both involved in binding the phosphocholine hapten co-crystallized with the McPC603 Fab in the PDB structure 2MCP, so better modeling would be required to understand this interaction if no crystal structure were available. In contrast, the predicted backbone conformations of the HyHEL-5 loops may be sufficient to understand its binding to lysozyme. Sidechain modeling would still require improvement. This may be possible with concurrent optimization of the sidechain loops. Currently, the Phase 2 simulations are done on each individual loop, without the other loops present. Simulations run with all six loops present should improve the sidechain packing in the binding site model.
The results are quite comparable to the results of Bruccoleri et al., who used a more brute-force conformational search algorithm for constructing possible loop structures and included surface area calculations in a variety of ways to choose particular loop conformations to use. That study predicted the HyHEL-5 loops to an rms deviation of 2.6 Å for all atoms and 1.4 Å for backbone atoms, compared to our results 2.73 Å and 1.74 Å, respectively. Our results for McPC603 were not as good, since the two large loops were modeled less successfully. Nevertheless, the results for the other four loops are quite good. The results from various methods are shown in Table . The PGMC method reported here, produces results similar to those of the published packages, despite being a highly generalized method, which does not require the loops being modeled to have any relationship to loops of known conformations, and without having taken into account any solvent effects. The method described here does not require any user input once the initial conformation and the sequence to be changed has been specified.