ABSTRACT

  1. How do proteins fold? We have developed the Generic Protein (GP) Direct Monte Carlo method for generating large ensembles of non-overlapping polypeptide chains. Using a massively parallel implementation of this method at the National Center for Supercomputing Applications at Urbana-Champaign, Illinois, we have demonstrated that the GP Direct Monte Carlo approach can exhaustively sample all possible fold topologies for polypeptides up to 100 amino acids in length. We show that the size of this set is ~3x10^7, substantially smaller than 10^47. This result has significant implications for how a protein finds its unique native fold in vivo on time scales on the order of milliseconds (resolving the Levinthal Paradox).
  2. Ab initio tertiary structure prediction. Given the difficulty of protein structure prediction, it is important to simplify the problem using prediction approaches that incorporate predicted or experimentally determined structural information. For many prediction targets, distance restraints are available from labeling experiments, disulfide bond connectivity, or preliminary NMR data. Furthermore, methods exist for predicting local structural characteristics such as residue contacts, secondary structure, accessible surface area, and surface turns. The Restrained Generic Protein (RGP) Direct MC method is an off-lattice residue buildup procedure for generating all polypeptide topologies that are consistent with a set of inter-residue distance restraints. The RGP method is useful when only sparse structural information is available and the topology of the protein is far from uniquely specified. The method efficiently generates the complete set of topologies consistent with the restraints, even when the number of restraints is very small. As few as N/24 inter-residue restraints, where N is the number of residues, reduce the number of topologies sufficiently that a simple residue burial score can identify the native topology.

    The RGP method is the foundation of the generate-and-select hierarchical approach to the protein structure prediction problem. It can quickly generate the complete set of topologies that satisfy a very small number of inter-residue distance restraints. For 3 restraints uniformly distributed in a 72-residue protein (N/24), we demonstrate that the size of this set is ~10^4. The RGP method can generate this set of structures in less than one hour on a single-processor Silicon Graphics R10000 workstation. Following structure generation, a simple criterion that measures the burial of hydrophobic and hydrophilic residues can reliably select a reduced set of ~10^2 structures that contains the native topology. Energy minimization of the structures in the reduced set typically ranks the native topology among the five lowest-energy folds. In the final hierarchical step, full-atom models of the remaining candidate structures are created, allowing further refinement and recognition of the final structure using full-atom, full-solvation molecular dynamics procedures. Thus, using this hierarchical approach, the de novo prediction of moderate-resolution globular protein structures can be achieved in just a few hours on a single-processor workstation. We have demonstrated the successful use of this hierarchy on two protein targets: a 72-residue DNA-binding protein and 146-residue myoglobin. The results are in press in J. Phys. Chem. B.
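
The generate-and-select idea described above can be illustrated with a small toy sketch. It is not the RGP implementation: the bond length, the overlap cutoff, the plain rejection sampling, the restraint format (a maximum distance per residue pair) and the centroid-based burial score are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import random

BOND = 3.8     # assumed virtual bond length between consecutive residues (angstroms)
MIN_SEP = 4.0  # assumed minimum distance between non-adjacent residues (angstroms)


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def grow_chain(n, restraints, rng, tries_per_residue=50):
    """Off-lattice residue buildup; returns None if growth dead-ends.

    restraints maps (i, j) with i < j to a maximum allowed distance
    (an assumed, simplified restraint format).
    """
    chain = [(0.0, 0.0, 0.0)]
    while len(chain) < n:
        j = len(chain)  # index of the residue being placed
        for _ in range(tries_per_residue):
            u = random_unit_vector(rng)
            p = tuple(c + BOND * d for c, d in zip(chain[-1], u))
            # Reject overlap with any placed residue except the bonded neighbour.
            if any(math.dist(p, q) < MIN_SEP for q in chain[:-1]):
                continue
            # Reject placements that violate a restraint ending at residue j.
            if any(math.dist(p, chain[i]) > dmax
                   for (i, k), dmax in restraints.items() if k == j):
                continue
            chain.append(p)
            break
        else:
            return None  # no acceptable placement found for residue j
    return chain


def burial_score(chain, hydrophobic):
    """Toy burial score: lower when hydrophobic residues sit near the centroid."""
    centroid = tuple(sum(p[k] for p in chain) / len(chain) for k in range(3))
    return sum((1.0 if h else -1.0) * math.dist(p, centroid)
               for p, h in zip(chain, hydrophobic))


def generate_and_select(n, restraints, hydrophobic, n_attempts, keep, seed=0):
    """Generate restraint-satisfying chains, then keep the best-scoring ones."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_attempts):
        chain = grow_chain(n, restraints, rng)
        if chain is not None:
            ensemble.append(chain)
    ensemble.sort(key=lambda c: burial_score(c, hydrophobic))
    return ensemble[:keep]
```

For example, `generate_and_select(12, {(0, 11): 12.0}, [i % 2 == 0 for i in range(12)], n_attempts=200, keep=5)` grows 12-residue chains whose ends lie within 12 angstroms of each other and returns up to five with the lowest toy burial score. Plain rejection sampling like this scales poorly to long chains; it only shows the order of the steps (buildup, restraint filtering, burial-based selection), not how the RGP method achieves its efficiency.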

