Chapter 1. Introduction

Large-scale systems of thousands to millions of atoms are of great interest in many areas of chemistry, biochemistry, and materials science. Atomistic-level simulations of such systems can provide increased accuracy compared with either smaller-scale model calculations or grossly averaged macroscopic models. The ability to analyze such simulations at the atomic level can lead to greater insight into structure/function relationships, the effects of chemical modification, and, in general, the underlying physical basis of the system's behavior. Two key examples of large-scale systems are polymers and viruses.

Polymer molecular weights are typically in the millions, requiring at least hundreds of thousands of atoms to properly model a single polymer chain, let alone several chains at a time. Use of a shorter chain may lead to unphysical end effects; use of an infinite chain via periodic boundary conditions may ignore such effects or artificially limit the achievable chain conformations. In particular, amorphous or partially crystalline assemblies, typical of industrial polymers, are not readily describable using a single chain per unit cell. Studying the organization and structures of these systems and their mechanical and thermodynamic properties will thus require models with on the order of one million atoms per unit cell.

The starburst dendrimer class of polymers [1] leads to a monodisperse collection of large molecules, each having the same topology but a different packing of the branches and leaves. The limits of growth for these polymers depend critically upon how closely the branches and leaves can pack. For the most interesting case, the PAMAM dendrimers, this may require molecular dynamics studies of systems ranging from 0.25 to 0.5 million atoms.

The smallest important viruses, the picornaviruses (responsible for polio, the common cold, and hoof-and-mouth disease) [2], are composed of protein coats of about 0.5 million atoms and a nucleic acid genome of about the same size. The exterior of the protein coat has approximate icosahedral symmetry. The interior surface of the coat, which must fit around the RNA, is assuredly not highly symmetric, however, and is therefore ill-resolved in X-ray diffraction studies. It is also likely that the exterior symmetry will be broken, particularly at the interfaces between the protein subunits that make up the coat. Understanding such structural details will be important for finding specific antigenic or molecular recognition sites on the exterior surface or for devising agents that could interfere with viral assembly or disassembly.

The smallest virus for which nucleic acid structural information is known is the tobacco mosaic virus, which contains about 3 million atoms in a cigar-shaped structure [3]. The approximate helical symmetry of the coat has been used to obtain structures from X-ray fiber diffraction experiments, but determination of the true structure will require simulations with no such assumed symmetry.

The most expensive computation in standard molecular mechanics and dynamics calculations is the evaluation of the nonbonded energy. Exact computation requires O(N^2) operations, which is infeasible for large-scale systems. Truncation methods have been used to reduce the operation count to O(N), but at the cost of significant decreases in accuracy, particularly for the long-range Coulomb interaction.
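
To make the scaling concrete, the following sketch (written here in Python purely for illustration; it is not part of the codes described in this thesis) evaluates the Coulomb portion of the nonbonded energy by the direct double loop over all pairs, which is the O(N^2) computation referred to above. The optional cutoff argument illustrates the truncation approach: pairs beyond a fixed radius are simply skipped.

    import numpy as np

    def nonbonded_energy(coords, charges, cutoff=None):
        """Direct O(N^2) nonbonded Coulomb energy (illustrative sketch only).

        coords  -- (N, 3) array of atomic positions
        charges -- (N,) array of partial charges
        cutoff  -- if given, pairs farther apart than this radius are skipped
                   (the truncation approximation discussed in the text)
        """
        n = len(charges)
        energy = 0.0
        for i in range(n):                      # loop over every pair (i, j), i < j
            for j in range(i + 1, n):
                r = np.linalg.norm(coords[i] - coords[j])
                if cutoff is not None and r > cutoff:
                    continue                    # truncation: neglect long-range pairs
                # R^-1 Coulomb term, in units where the Coulomb constant is 1
                energy += charges[i] * charges[j] / r
        return energy

With a cutoff, the number of interacting neighbors per atom becomes roughly constant, which is the source of the O(N) operation count; the pairs that are dropped, however, are exactly the long-range Coulomb contributions whose neglect causes the loss of accuracy noted above.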

The Cell Multipole Method (CMM) [4] was developed to overcome these limitations in handling long-range power-law forces in molecular systems. In particular, it can be used to handle the R^-1 (or R^-2 if screened) Coulomb interaction and the R^-6 attractive portion of the usual Lennard-Jones 12-6 or exponential-6 van der Waals potentials. It is a true O(N)-operation algorithm, with substantially better accuracy than cutoff methods of the same speed. It is thus the most suitable method for handling large-scale molecular mechanics and dynamics problems.
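
For reference, the pairwise forms of these interactions are written below in standard LaTeX notation; the parameters A_ij, B_ij, and C_ij are generic force-field constants, and the specific parametrizations used later in this thesis may differ in notation.

    \[
      E_{\text{Coul}}(R_{ij}) = \frac{q_i q_j}{\epsilon R_{ij}}, \qquad
      E_{\text{LJ}}(R_{ij})   = \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}}, \qquad
      E_{\text{exp-6}}(R_{ij}) = A_{ij}\, e^{-C_{ij} R_{ij}} - \frac{B_{ij}}{R_{ij}^{6}}
    \]

With a distance-dependent (screened) dielectric, \epsilon is replaced by \epsilon R_{ij}, giving the R^-2 form mentioned above. The R^-1 (or R^-2) and R^-6 terms are the long-range contributions handled by the CMM.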

Improved algorithms are not sufficient for performing megamolecular simulations, however. Such large systems also require large amounts of memory and computation, far more than can be provided by the typical scientific workstation. These resources can be most cost-effectively provided at this time by scalable massively parallel computers.

The largest parallel computers available today are most efficiently programmed in a message-passing style. Unfortunately, it is not always easy to implement a set of mathematical equations, which do not by themselves specify appropriate data partitioning and communication patterns, in such a style.

We used a three-step strategy to deal with this problem. First, the algorithm is implemented on a standard workstation without regard to parallelization; this allows testing on simple, small cases to ensure that the method incorporates the correct physical principles. Second, a KSR-1 parallel supercomputer, which provides a global shared-memory programming model despite its physically distributed memory, is used to parallelize incrementally larger portions of the calculation by partitioning data and computation across processors. During this step, calculation and communication are separated within the code as much as possible. Third, when the entire algorithm has been parallelized efficiently in this fashion, it is reasonably simple to embed the resulting computational routines within a message-passing communications framework, producing an efficient, portable code that can run on the largest production machines, such as the Intel Paragon multicomputers or the Cray T3D.
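
The structure produced by the second and third steps can be sketched as follows. This is a minimal illustration using Python and mpi4py rather than the KSR or NX message-passing primitives used in the actual codes; the toy kernel and array sizes are purely hypothetical. The point is the pattern: data and computation are partitioned across processors, and communication is collected into separate phases rather than interleaved with the local calculation.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Data partitioning: each processor owns and updates one block of atoms.
    rng = np.random.default_rng(rank)
    my_coords = rng.random((16, 3))      # coordinates owned by this processor

    # Communication phase: exchange coordinates so every node sees the full system.
    all_coords = np.concatenate(comm.allgather(my_coords))

    # Computation phase: purely local work, with no communication inside the routine.
    def local_contribution(owned, everyone):
        """Stand-in for a local force/energy kernel over the owned atoms."""
        d = np.linalg.norm(owned[:, None] - everyone[None, :], axis=-1)
        return float(d.sum())

    partial = local_contribution(my_coords, all_coords)

    # Final communication phase: combine the per-processor partial results.
    total = comm.allreduce(partial, op=MPI.SUM)
    if rank == 0:
        print("combined result:", total)

Keeping the kernel free of communication calls is what allows the same computational routine to be reused unchanged when moving from the shared-memory KSR-1 version to a message-passing framework.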

Part I of this thesis describes the theory behind molecular dynamics and the Cell Multipole Method, as well as an extension of the CMM to systems with periodic boundary conditions, the Reduced Cell Multipole Method (RCMM) [5].

Part II then discusses the implementation of a large-scale, parallel, distributed-memory, general-purpose molecular dynamics code on the KSR-1 parallel supercomputer. The code uses the CMM and RCMM to handle the nonbonded portions of the calculation. The design of the parallel aspects of the code, particularly the parallelization of the CMM, is described. Details of the implementation of the CMM in a similar, though currently slightly less general, molecular dynamics code on the Intel Touchstone Delta and Paragon multicomputers are also presented. Performance, accuracy, and scalability results are given.

Finally, Part III begins with a discussion of computational experiments leading to a prescription for choosing the value of the free time-scale parameter in Nosé-Hoover constant-volume, constant-temperature (NVT) canonical dynamics. This is followed by several applications of the above large-scale molecular dynamics codes to problems in argon cluster structure, polymer structure, the surface tension of water drops, the diffusion of gases through polymers, and viral protein coat structure.
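
For context, a standard textbook form of the Nosé-Hoover equations of motion is reproduced below; the notation used in Part III may differ in detail. The single free parameter is the thermostat relaxation time \tau, which sets the time scale on which the friction coefficient \zeta responds to deviations of the instantaneous kinetic temperature T(t) from the bath temperature T_0.

    \[
      \dot{\mathbf{r}}_i = \frac{\mathbf{p}_i}{m_i}, \qquad
      \dot{\mathbf{p}}_i = \mathbf{F}_i - \zeta\,\mathbf{p}_i, \qquad
      \dot{\zeta} = \frac{1}{\tau^2}\left(\frac{T(t)}{T_0} - 1\right)
    \]

Choosing \tau too small overdamps the dynamics, while choosing it too large leaves the temperature poorly controlled; the computational experiments in Part III address how to select this value in practice.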

