Innovatives Supercomputing in Deutschland
inSiDE • Vol. 11 No. 2 • Autumn 2013

ls1 mardyn - a Massively Parallel Molecular Simulation Code

ls1 mardyn is designed for massively parallel molecular dynamics (MD) simulations on modern supercomputers and currently holds the world record for the largest MD simulation of all time, with over four trillion particles.

In MD simulations, interactions between molecules are evaluated based on potentials. ls1 mardyn features pair potentials, point charges, dipoles, quadrupoles, Lennard-Jones sites and the Tersoff potential. Interaction potentials are usually evaluated only up to a given cut-off radius, as the contributions of far-apart pairs of molecules are relatively small and these missing contributions can be approximated with long-range correction schemes. The cut-off reduces the computational complexity of the force calculation to O(N). In order to reduce the complexity of the whole simulation to O(N), finding neighbouring molecules within the cut-off also has to be done in O(N). This is achieved in ls1 mardyn via the linked-cell algorithm.
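
As an illustration of the linked-cell idea, the following minimal sketch (not taken from ls1 mardyn; the names Particle, LinkedCells and the box/cut-off parameters are chosen for the example) bins particles into cubic cells with an edge length of at least the cut-off radius in O(N) time:

#include <vector>

// Minimal linked-cell sketch (illustrative only, not ls1 mardyn code):
// particles are sorted into cubic cells with an edge length of at least the
// cut-off radius, so all interaction partners of a particle lie in its own
// cell or one of the 26 adjacent cells.
struct Particle { double x, y, z; };   // coordinates in [0, boxLength)

struct LinkedCells {
    double boxLength, cutoff, cellLength;
    int cellsPerDim;
    std::vector<std::vector<int>> cells;   // particle indices per cell

    LinkedCells(double box, double rc) : boxLength(box), cutoff(rc) {
        cellsPerDim = static_cast<int>(box / rc);   // edge length >= rc
        cellLength  = box / cellsPerDim;
        cells.resize(cellsPerDim * cellsPerDim * cellsPerDim);
    }

    int cellIndex(const Particle& p) const {
        int ix = static_cast<int>(p.x / cellLength);
        int iy = static_cast<int>(p.y / cellLength);
        int iz = static_cast<int>(p.z / cellLength);
        return (ix * cellsPerDim + iy) * cellsPerDim + iz;
    }

    void rebuild(const std::vector<Particle>& particles) {
        for (auto& c : cells) c.clear();
        for (int i = 0; i < static_cast<int>(particles.size()); ++i)
            cells[cellIndex(particles[i])].push_back(i);   // O(N) binning
    }
};

The force calculation then loops, for each cell, over the molecules in that cell and its 26 neighbouring cells and applies the cut-off test to each candidate pair.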

However, in order to reach the extreme scales and computational efficiency required to address length and time scales previously out of reach for simulations of highly dynamic and heterogeneous systems, highly efficient methods for neighbour search and for dynamic load balancing were developed.

Neighbour Search

In highly dynamic systems, neighbours have to be identified frequently, as the spatial arrangement changes rapidly. ls1 mardyn features an adaptive linked-cell algorithm. The basic linked-cell algorithm divides the simulation volume into equally sized cubic cells with an edge length equal to the cut-off radius. Therefore, all interaction partners of any given molecule are no further than one cell away. Nonetheless, the neighbouring cells still contain molecules which are beyond the cut-off radius. Comparing the volume of the cut-off sphere within which interactions are evaluated to the combined volume of a cell and its neighbouring cells gives a ratio of about 0.16. Thus, for a homogeneous distribution, only 16 % of the examined molecule pairs are accepted for the force calculation.
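
The quoted ratio follows directly from comparing the cut-off sphere of radius r_c with the 3 x 3 x 3 block of search cells of edge length r_c:

    V_sphere / V_cells = (4/3) π r_c^3 / (3 r_c)^3 = 4π / 81 ≈ 0.155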

Figure: Molecular simulation of vapour-liquid phase boundaries using ls1 mardyn. Green: CO2, blue: oxygen, temperature: −200 °C.

Reducing the volume which needs to be searched can therefore save a lot of computing time. Using smaller cells, e.g. with an edge length of half the cut-off radius, improves the ratio to about 0.27. However, this causes additional effort, as more cells need to be handled. The adaptive linked-cell algorithm of ls1 mardyn is capable of switching the cell size on the fly. For dense regions, where the time for computing the neighbour distances outweighs the cost of handling more cells, the cell size is thus reduced dynamically.
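
More generally, for cells of edge length r_c / n the search region comprises (2n + 1)^3 cells, so the hit ratio becomes

    V_sphere / V_cells = (4/3) π r_c^3 / ((2n + 1)^3 (r_c / n)^3) = (4/3) π n^3 / (2n + 1)^3

which gives about 0.155 for n = 1 and 0.268 for n = 2, and approaches π/6 ≈ 0.52 for ever smaller cells, at the price of an increasing number of cells to manage.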

Dynamic Load Balancing

ls1 mardyn is parallelized using domain decomposition. The simulation volume is divided into subvolumes, which are distributed to the available processing units. This method scales linearly with the number of molecules and is therefore well suited for large systems.
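
The following sketch illustrates a regular spatial domain decomposition using MPI's Cartesian topology; it shows the general idea only and is not ls1 mardyn's actual implementation (the box edge L and the variable names are chosen for the example).

#include <cstdio>
#include <mpi.h>

// Illustrative regular domain decomposition: the cubic simulation box of
// edge L is split into a 3D grid of equally sized subvolumes, one per rank.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[3] = {0, 0, 0};
    MPI_Dims_create(size, 3, dims);            // factor the ranks into a 3D grid

    int periods[3] = {1, 1, 1};                // periodic boundary conditions
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);

    int coords[3];
    MPI_Cart_coords(cart, rank, 3, coords);

    const double L = 100.0;                    // box edge length (arbitrary example value)
    double lo[3], hi[3];
    for (int d = 0; d < 3; ++d) {              // extent of this rank's subvolume
        lo[d] = L *  coords[d]      / dims[d];
        hi[d] = L * (coords[d] + 1) / dims[d];
    }
    std::printf("rank %d owns [%g,%g) x [%g,%g) x [%g,%g)\n",
                rank, lo[0], hi[0], lo[1], hi[1], lo[2], hi[2]);
    // Each rank then owns the molecules inside its subvolume and exchanges
    // halo molecules within the cut-off radius with its neighbouring ranks.

    MPI_Finalize();
    return 0;
}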

However, for heterogeneous scenarios, in which the molecules are distributed irregularly in space, the workload of equally sized subvolumes differs dramatically: it is proportional to the number of interactions and therefore grows quadratically with the local density. Simulations containing coexisting liquid and vapour phases can easily vary in local density by a factor of more than 100. Therefore, the workload of two subvolumes of equal size can differ by a factor of more than 10,000.
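
The quadratic dependence follows because, within the cut-off sphere, the number of interaction partners of each molecule is proportional to the local density ρ, so the number of pairs per unit volume scales as

    W ∝ ρ^2,   hence   ρ_liquid / ρ_vapour > 100   ⇒   W_liquid / W_vapour > 10^4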

In order to balance the load, the subvolumes must thus be chosen not with equal size, but with equal load. In ls1 mardyn this is achieved by a kd-tree decomposition. The cells of the linked-cell algorithm are used as the basic volume units for which the load costs are determined. On the basis of the computational cost of each cell, the kd-tree decomposition recursively splits the simulation domain in two, along alternating dimensions, such that both sides carry approximately equal load, until the required number of subvolumes is reached. As the simulations are highly dynamic, this decomposition is repeated at regular intervals on the fly.
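
The following is a minimal sketch of such a recursive bisection (illustrative only, not ls1 mardyn's code; Region, LoadFn and the per-cell load estimate are hypothetical): per-cell loads are summed into slabs perpendicular to the split dimension, and the domain is cut where roughly half of the total load is reached.

#include <cstddef>
#include <numeric>
#include <vector>

// Recursive bisection in the spirit of a kd-tree decomposition: a box of
// linked-cell cells is split along alternating dimensions so that both
// halves carry roughly equal load, until each region maps to one rank.
struct Region { int lo[3], hi[3]; int firstRank, numRanks; };  // cell index range of a subvolume

// Load estimate of one cell, e.g. based on the squared number of particles.
using LoadFn = double (*)(int ix, int iy, int iz);

void decompose(const Region& r, LoadFn load, int dim, std::vector<Region>& out) {
    if (r.numRanks == 1) { out.push_back(r); return; }

    // Accumulate the load of each slab perpendicular to the split dimension
    // (assumes each region spans at least two cells in that dimension).
    std::vector<double> slab(r.hi[dim] - r.lo[dim], 0.0);
    for (int ix = r.lo[0]; ix < r.hi[0]; ++ix)
        for (int iy = r.lo[1]; iy < r.hi[1]; ++iy)
            for (int iz = r.lo[2]; iz < r.hi[2]; ++iz) {
                int idx[3] = {ix, iy, iz};
                slab[idx[dim] - r.lo[dim]] += load(ix, iy, iz);
            }

    // Find the split plane where roughly half of the total load is reached.
    double total = std::accumulate(slab.begin(), slab.end(), 0.0);
    double sum = 0.0;
    int cut = r.lo[dim] + 1;
    for (std::size_t s = 0; s + 1 < slab.size(); ++s) {
        sum += slab[s];
        if (sum >= 0.5 * total) { cut = r.lo[dim] + static_cast<int>(s) + 1; break; }
    }

    Region left = r, right = r;
    left.hi[dim] = cut;                 right.lo[dim] = cut;
    left.numRanks = r.numRanks / 2;     right.numRanks = r.numRanks - left.numRanks;
    right.firstRank = r.firstRank + left.numRanks;

    // Recurse with the next dimension until every region holds one rank.
    decompose(left,  load, (dim + 1) % 3, out);
    decompose(right, load, (dim + 1) % 3, out);
}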

Scalability

Scalability studies were carried out with heterogeneous and homogeneous scenarios. Heterogeneous scenarios are very challenging; good scaling can be achieved for up to roughly 1,000 cores. Homogeneous scenarios are less challenging and show excellent scaling behaviour with ls1 mardyn. They allow for the utilization of entire state-of-the-art supercomputers, such as the Hermit system. For weak scaling, parallel efficiencies of over 90 % can be reached on more than 100,000 cores.
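
For reference, weak-scaling parallel efficiency compares the runtime on p cores, with the problem size grown proportionally to p, against a reference runtime on a small core count p_0:

    E_weak(p) = t(p_0) / t(p)

An efficiency above 90 % on more than 100,000 cores therefore means that the time per step grows by only about 10 % while the system size is scaled up accordingly.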

Partners

The massively parallel simulation code ls1 mardyn is a joint development by

  • TU Kaiserslautern
  • University of Paderborn
  • TU München
  • HLRS

For more information:

• Colin Glass
University of Stuttgart, HLRS

