In homology modeling studies, the X-ray crystal structure of one protein serves as a template for predicting the three-dimensional structure of a second protein that is similar in sequence but whose tertiary structure has not been determined experimentally. This method has been used successfully for a variety of systems, such as HIV-1 protease modeled from a protease of Rous sarcoma virus and amyloid precursor protease inhibitor domain modeled from bovine pancreatic trypsin inhibitor. If the sequences of the template protein (T) and the unsolved protein (U) are very similar in length and composition, the structure of protein U can be modeled simply by adopting nearly the entire three-dimensional structure of T and modifying only the coordinates of the sidechains that differ between the two proteins. Replacement of sidechain geometries is a standard facility of most molecular modeling software. A more complex modeling task arises when regions of the proteins differ in both sequence and number of residues. There is no standard method for replacing, for example, three residues in protein T with six residues from protein U. Even if such sequence length mismatches are localized to short segments of the protein, they cause significant rearrangements of the backbone conformation, which must be modeled by more sophisticated techniques.
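As an illustration of the distinction drawn above, the following minimal sketch splits a pairwise alignment of T and U into segments of equal length and segments where the residue counts differ. It assumes the two sequences have already been aligned with `-` as the gap character; the function name `split_by_length_mismatch`, the `Segment` type, and the example sequences are hypothetical and are not taken from the procedure described here.

```python
from itertools import groupby
from typing import List, NamedTuple

GAP = "-"  # assumed gap character in the aligned strings


class Segment(NamedTuple):
    kind: str      # "matched" (equal length) or "variable" (length mismatch)
    template: str  # residues of protein T in this segment, gaps removed
    target: str    # residues of protein U in this segment, gaps removed


def split_by_length_mismatch(template_aln: str, target_aln: str) -> List[Segment]:
    """Group alignment columns into gap-free runs and runs containing gaps."""
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    columns = zip(template_aln, target_aln)
    # A column is "variable" if either sequence has a gap there.
    for has_gap, group in groupby(columns, key=lambda col: GAP in col):
        t_col, u_col = zip(*group)
        segments.append(Segment(
            "variable" if has_gap else "matched",
            "".join(t_col).replace(GAP, ""),
            "".join(u_col).replace(GAP, ""),
        ))
    return segments


# Made-up sequences: U carries three extra residues relative to T.
segments = split_by_length_mismatch("MKVLAGS---TPEELR", "MKVIAGSDGKTPEDLR")
for seg in segments:
    print(seg.kind, len(seg.template), len(seg.target))
```

In matched segments only the sidechains that differ need to be replaced; in variable segments the backbone itself has to be rebuilt. A practical protocol would probably also widen each variable segment to include some flanking residues, a step this sketch omits.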
The loop-modeling procedure described here provides a methodology for making such replacements by sampling the conformational space of the variable-length sequences. Regions of the two proteins that are highly similar in both sequence and length are termed ``framework'' regions. The regions of variable length are termed ``loops.'' This is a broader use of the term ``loop'' than in the terminology of Rose and coworkers, who use the term to define regions of proteins that meet certain geometric criteria. Nevertheless, the variable regions described here are often loops in both senses. Modeling these loops requires success in two tasks: determination of the loop backbone, which must meet the geometric constraints imposed by the loop ``endpoints,'' where it attaches to the framework, and optimization of the sidechain positions. We have separated these two components into different phases. The first phase rapidly produces a large number of backbone conformations that meet the endpoint criteria, while the second phase samples sidechain conformations for the best loops from Phase 1. A third phase optimizes the best structure from Phase 2 by energy minimization of all atomic positions. This strategy allows increasing sophistication to be built into the model as the breadth of the conformational search is decreased.
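The three-phase strategy can be pictured as a funnel in which cheap tests are applied to many candidates and expensive ones to few. The sketch below shows only that control flow. The function `staged_loop_search`, its parameters, and the toy scoring functions are hypothetical stand-ins, not the scoring or sampling methods used in this work, and the convention that lower scores are better is an assumption.

```python
import heapq
import math
from typing import Callable, Iterable, TypeVar

B = TypeVar("B")  # backbone-only loop conformation
M = TypeVar("M")  # full model: backbone plus sidechains


def staged_loop_search(
    backbones: Iterable[B],
    meets_endpoints: Callable[[B], bool],
    backbone_score: Callable[[B], float],
    sample_sidechains: Callable[[B], Iterable[M]],
    model_score: Callable[[M], float],
    minimize: Callable[[M], M],
    keep: int = 10,
) -> M:
    """Three-stage funnel; lower scores are assumed to be better."""
    # Phase 1: broad, cheap search. Keep backbones that close onto the framework.
    closed = [b for b in backbones if meets_endpoints(b)]
    if not closed:
        raise ValueError("no backbone satisfied the endpoint criteria")
    best_backbones = heapq.nsmallest(keep, closed, key=backbone_score)

    # Phase 2: narrower search. Sample sidechains only for the best backbones.
    models = [m for b in best_backbones for m in sample_sidechains(b)]
    if not models:
        raise ValueError("sidechain sampling produced no models")
    best_model = min(models, key=model_score)

    # Phase 3: single candidate. Refine all atomic positions of the best model.
    return minimize(best_model)


# Toy data: 2-D traces with the end anchor at (6, 0). Purely illustrative.
END_ANCHOR = (6.0, 0.0)


def bond_strain(chain):
    # Deviation of consecutive distances from 3.8, a typical CA-CA spacing in angstroms.
    return sum(abs(math.dist(p, q) - 3.8) for p, q in zip(chain, chain[1:]))


candidates = [
    ((0.0, 0.0), (3.0, 2.0), (6.0, 0.0)),   # closes exactly
    ((0.0, 0.0), (3.0, 3.0), (6.0, 1.0)),   # misses the end anchor
    ((0.0, 0.0), (3.0, -1.0), (6.0, 0.2)),  # closes within tolerance
]

best = staged_loop_search(
    candidates,
    meets_endpoints=lambda c: math.dist(c[-1], END_ANCHOR) <= 0.5,
    backbone_score=bond_strain,
    sample_sidechains=lambda c: [(c, chi) for chi in (-60.0, 60.0, 180.0)],
    model_score=lambda m: bond_strain(m[0]) + abs(m[1] - 180.0) / 180.0,
    minimize=lambda m: m,  # placeholder for all-atom energy minimization
    keep=2,
)
```

Passing each phase in as a callable keeps the funnel independent of any particular sampling or energy function, so a more sophisticated scoring method can be substituted at a later phase without changing the earlier ones.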