One of the most important classes of proteins to be studied by homology modeling is the immunoglobulins. These molecules are ideal candidates for homology modeling studies because each organism can produce a huge number of immunoglobulins, or antibodies, which differ dramatically in their binding specificities, but are nearly identical in sequence and structure. The specificity is due to the great variability of six small loop regions, the ``complimentarity determining regions'' (CDR's), also known as hypervariable loops. A reliable method for predicting the conformations of the six CDR's would essentially be a method for predicting the antigen binding sites of an enormous variety of immunoglobulins and would provide valuable information about the immune system, catalytic antibodies, and on a more fundamental level, molecular recognition.
The most common of the five closely related immunoglobulin classes is
immunoglobulin G (IgG), which is Y-shaped and contains two
antigen-binding sites. Figure
shows a schematic diagram
of an IgG molecule, which contains two copies each of two different
types of chains. The light chains contain a variable (V) and a
constant (C
) domain, while the heavy chains contain one variable
(V
) and three constant (C
1, C
2, C
3) domains. All six
domain types are similar in sequence and structure, containing 100-120
residues folded into two antiparallel
sheets. The great diversity of
antibody specificity arises from the extreme variability of three loops in
each variable domain. The three variable loops in V
are called
H1, H2, and H3, while those in V
are L1, L2, and L3. The spatial
arrangement of the six loops is shown in Figure
for two
immunoglobulins whose F fragments have been
crystallographically resolved: HyHEL-5 [114] and McPC603
[115].
Because the specificity of antibodies is determined by the six CDR's,
considerable work has gone into understanding the structure-sequence
relationships for these small loop regions. Several crystal
structures of murine and human IgG's have been
solved. Most are not complete IgG's, but just the Fab fragment (see
Figure
). Many are co-crystals which include the antigen
or a hapten. After analyzing these crystal structures, as well as
several hundred other immunoglobulin sequences, C. Chothia and
coworkers[116][104] proposed that a small
number of ``canonical structures'' exist for each of the five loops
other than H3. Most examples of these loops should conform to one of
these structures. For example, as many as 95%of the L2 loops are
believed to conform to a single canonical structure, while L1 has
four canonical structures which account for about 70%of the total.
For these loops, antigen specificity is due primarily to the sidechain
conformations, so it is of paramount importance to model these
correctly. Additionally, no canonical structures have been identified
for the H3 loops, which show extreme variation in size and sequence.
Therefore, a major challenge in modeling the antigen binding site is
to model both the backbone and sidechains of the H3 loop.
The studies reported here used the PGMC-based method described above
to reproduce the conformations of the hypervariable loops from the
murine IgG molecules HyHEL-5[114] (structure 2HFL from the
Brookhaven Protein Database (PDB)) and McPC603[115] (PDB
structure 1MCP). These two cases have been used to test methodologies
which use canonical conformations[116], as well as those
using conformational search methods [118][117] and
those which use a combination of database information and conformational
searching[108]. Different published reports have used
slightly different definitions of the loop regions. As our method is
most similar to that of Bruccoleri and
coworkers[118][106], we used the loop definitions
similar to those in Reference [118], as shown in
Table
. There are two differences: we include an
additional residue in the L1 and L3 loops of HyHEL-5, making these
six-residue loops. In these cases, increasing the size of the loops
consistently improved our results.
In order to produce the largest variety of loop conformations, we used
probability grids with = 5
. As described above, all six loops
were removed from the crystal structure, and each loop was modeled
independently. In the first stage of the simulation, 1000 loops were
created. The energy of the backbone atoms of each of these loops was
calculated, and the top 100 were saved for Phase 2. Any repeat
conformations were eliminated. These 100 best backbone conformations
were each subjected to 50 steps of PGMC, using
probability
grids and a simulation temperature of 1000 K. The full energy of the
loops, including the sidechain atoms, was used for the Monte Carlo
phase. For each of the six loops, the overall lowest-energy
conformation from the 100 Monte Carlo runs was saved for Phase 3. In
this last stage, the six loops were assembled and were
energy-minimized using the conjugate-gradients minimizer of
BIOGRAF[110].
The results for McPC603 and HyHEL-5 are described in
Tables
and
. The Phase 2
root-mean-square (rms)
deviations are given with respect to the crystal structures. The
Phase 3 rms deviations are given with respect to DREIDING reference
structures, in order to reduce the effects of the choice of
forcefield. The DREIDING reference structures were created by
minimizing the loop conformations from the crystal structure while
holding the framework atoms fixed. For all cases, hydrogen atoms were
not included in the determination of rms deviations.
Results were consistently good for small and medium-sized loops (4-6
residues), with the exception of the H1 loop of HyHEL-5. For this
loop, 17 of the top 25 loops from Phase 2 had C rms deviations of
less than 1.5 Å, but the best loop had a deviation of 2.48 Å. It
is possible that increasing the sophistication of the energy term,
such as including terms for surface area or solvation, could improve
the selection of more native-like conformations. As noted in other
studies[108][118], the energy from vacuum
calculations correlates only modestly with improved fit to the crystal
structure. For example, the bests C
fits from the 1000 loops
generated in Phase 1 are listed in Table
, clearly
indicating that better loops were sampled. Nevertheless, the vacuum
calculations used here did quite well for most of the medium sized
loops, with C deviations ranging from 0.27 Å for H3 of HyHEL-5
to 1.75 Å for L3 of HyHEL-5. It is important to note that the
backbone conformations of the two H3 loops were modeled quite well,
since these loops cannot be modeled using canonical
structures[116]. Modeling of the L3 loop of McPC603 was
very successful, not only in that the rms deviations were low, but
that the cis peptide bond of Pro 101 was accurately predicted.
Likewise, the trans peptide bond of the homologous Pro 94 of
HyHEL-5 was predicted, as well. On the other hand, the large L1 and H2
loops of McPC603 were modeled poorly in comparison to the shorter
loops. The conformational space of these loops is so large that even
1000 trial structures is probably too small a number for adequate sampling.
Modeling loops greater than
10 residues in length is probably beyond the capabilities of both
database and conformation-searching algorithms, the former because of
the lack of example conformations meeting the necessary criteria and
the latter because the huge conformational space cannot be adequately
searched in a short period of time. Generally, the calculations here
took two hours per loop. One hour was required to generate the 1000
loop conformations and one hour was required to optimize the sidechain
conformations of the 100 initial conformations in Phase 2.
The C traces of the modeled and actual loops for HyHEL-5 and
McPC603 are shown in Figures
and
.
The loops are shown in the same face-on view of the antigen-binding
site as used in Figure
. The excellent fits of the
predicted loops for H3, L1, and L2 of HyHEL-5 and H1, L2, and L3 of
McPC603 are very apparent, even in the simple C trace pictures.
More detail can be seen in Figures
and
, which show the entire backbone conformations of the
model and actual loops in views perpendicular to the plane of the
loops. The large errors in the backbone conformations of the L1 and H2
loops of McPC603 would prevent an adequate model of the
antigen-binding pocket from being predicted, if this were a blind test
where the conformation of the loops was unknown. L1 and H2 are both
involved in binding the phosphocholine hapten[119]
co-crystallized with the McPC603 Fab in the PDB structure
2MCP[120], so better modeling would be required to understand
this interaction if no crystal structure were available. In contrast,
the predicted backbone conformations of the HyHEL-5 loops may be
sufficient to understand its binding to lysozyme. Sidechain modeling
would still require improvement. This may be possible with concurrent
optimization of the sidechain loops. Currently, the Phase 2
simulations are done on each individual loop, without the other loops
present. Simulations run with all six loops present should improve the
sidechain packing in the binding site model.
The results are quite comparable to the results of Bruccoleri
et al.[118], who used a more brute-force conformational
search algorithm for constructing possible loop structures and
included surface area calculations in a variety of ways to choose
particular loop conformations to use. That study predicted the
HyHEL-5 loops to an rms deviation of 2.6 Å for all atoms and 1.4 Å
for backbone atoms, compared to our results 2.73 Å and 1.74 Å,
respectively. Our results for McPC603 were not as good, since the two
large loops were modeled less successfully. Nevertheless, the results
for the other four loops are quite good. The results from various
methods are shown in Table
. The PGMC method reported
here, produces results similar to those of the published packages,
despite being a highly generalized method, which does not require the
loops being modeled to have any relationship to loops of known
conformations, and without having taken into account any solvent
effects. The method described here does not require any user input
once the initial conformation and the sequence to be changed has been
specified.