ls1 mardyn: a Massively Parallel Molecular Simulation Code
ls1 mardyn is designed for massively parallel molecular dynamics (MD) simulations on modern supercomputers and currently holds the world record for the largest MD simulation of all time, with over four trillion particles.
In MD simulations, interactions between molecules are evaluated based on potentials. ls1 mardyn features pair potentials, point charges, dipoles, quadrupoles, Lennard-Jones sites and the Tersoff potential. Usually, interaction potentials are evaluated only up to a given cutoff radius, as the contributions of far-apart pairs of molecules are relatively small, and these missing contributions can be approximated with long-range correction schemes. The cutoff reduces the computational complexity of the force calculation to O(N). In order to reduce the complexity of the whole simulation to O(N), finding neighbouring molecules within the cutoff also has to be done in O(N). This is achieved in ls1 mardyn via the linked-cell algorithm.
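The linked-cell idea can be illustrated with a short sketch (a hypothetical, non-periodic Python illustration, not ls1 mardyn's actual C++ implementation): particles are binned into cells at least one cutoff radius wide, so each particle only needs to check its own cell and the 26 adjacent ones instead of all other particles.

```python
import itertools
import math
import random

def linked_cell_pairs(positions, cutoff, box):
    """Find all pairs within `cutoff` using a linked-cell grid.

    Minimal non-periodic sketch: cells have an edge length >= cutoff,
    so every interaction partner lies in the same or an adjacent cell.
    """
    n_cells = [max(1, int(box[d] // cutoff)) for d in range(3)]
    cell_edge = [box[d] / n_cells[d] for d in range(3)]
    cells = {}
    for i, p in enumerate(positions):
        idx = tuple(min(int(p[d] / cell_edge[d]), n_cells[d] - 1) for d in range(3))
        cells.setdefault(idx, []).append(i)

    pairs = set()
    for idx, members in cells.items():
        # visit the cell itself and its (up to) 26 neighbouring cells
        for off in itertools.product((-1, 0, 1), repeat=3):
            nidx = tuple(idx[d] + off[d] for d in range(3))
            for i in members:
                for j in cells.get(nidx, ()):
                    if i < j and math.dist(positions[i], positions[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs

# cross-check against the naive O(N^2) search on random particles
random.seed(0)
box = (10.0, 10.0, 10.0)
pts = [tuple(random.uniform(0, 10) for _ in range(3)) for _ in range(200)]
cutoff = 2.5
lc = linked_cell_pairs(pts, cutoff, box)
brute = {(i, j) for i in range(len(pts)) for j in range(i + 1, len(pts))
         if math.dist(pts[i], pts[j]) <= cutoff}
```

The cross-check at the end confirms that restricting the search to adjacent cells loses no pair, while the cost per particle no longer grows with the total particle count.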
However, reaching the extreme scales and computational efficiency required to address length and time scales previously out of reach for simulations of highly dynamic and heterogeneous systems demanded highly efficient methods for neighbour search and for dynamic load balancing.
Neighbour Search
In highly dynamic systems, neighbours have to be identified frequently, as the spatial arrangement changes rapidly. ls1 mardyn features an adaptive linked-cell algorithm. The basic linked-cell algorithm divides the simulation volume into equally sized cubic cells with an edge length equal to the cutoff radius. Therefore, all interaction partners for any given molecule are no further than one cell away. Nonetheless, the neighbouring cells still contain molecules beyond the cutoff radius. Comparing the sphere within which interactions are evaluated to the volume of all neighbouring cells gives a ratio of 0.16. Thus, for a homogeneous distribution, only 16 % of all molecule pairs are accepted for the force calculation.

Molecular simulation of vapour-liquid phase boundaries using ls1 mardyn. Green: CO2, Blue: Oxygen, Temperature: 200 C.
Reducing the volume that needs to be searched can therefore save a lot of computing time. Using smaller cells, e.g. with an edge length of half the cutoff, improves the ratio to 0.27. However, this causes additional effort, as more cells need to be handled. The adaptive linked-cell algorithm of ls1 mardyn is capable of switching the cell size on the fly. For dense regions, where the time for computing the neighbour distances outweighs the cost of handling more cells, the cell size is thus reduced dynamically.
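The two acceptance ratios quoted above follow directly from comparing the volume of the cutoff sphere with the searched cell volume; a quick check (plain geometry, independent of ls1 mardyn's code):

```python
import math

cutoff = 1.0  # arbitrary; the ratios are scale-independent
sphere = 4.0 / 3.0 * math.pi * cutoff ** 3

# basic linked cells: edge = cutoff, search the 3x3x3 block of cells
basic_search = (3 * cutoff) ** 3
# halved cells: edge = cutoff / 2, a 5x5x5 block is needed to cover the cutoff
half_search = (5 * cutoff / 2) ** 3

print(round(sphere / basic_search, 2))  # 0.16
print(round(sphere / half_search, 2))   # 0.27
```

Shrinking the cells further would keep improving the ratio (towards the sphere-to-circumscribing-volume limit), but the bookkeeping overhead per cell grows, which is exactly the trade-off the adaptive scheme balances.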
Dynamic Load Balancing
ls1 mardyn is parallelized using domain decomposition. The simulation volume is divided into subvolumes, which are distributed among the available processing units. This method scales linearly with the number of molecules and is therefore well suited for large systems.
However, for heterogeneous scenarios (where the molecules are distributed irregularly in space), the workload of equally sized subvolumes differs dramatically: it is proportional to the number of interactions and therefore grows quadratically with density. Simulations containing coexisting liquid and vapour phases can easily vary in local density by a factor of more than 100. Therefore, the workload for two subvolumes of equal size can differ by a factor of more than 10,000.
In order to balance the load, the subvolumes must thus be chosen not with equal size, but with equal load. In ls1 mardyn this is achieved by a kd-tree decomposition. The cells of the linked-cell algorithm are used as the basic volume units for which the load costs are determined. On the basis of the computational cost of each cell, the kd-tree decomposition recursively splits the simulation domain in two, along alternating dimensions, such that both sides have equal load, until the number of required subvolumes is reached. As the simulations are highly dynamic, this is repeated at regular intervals on the fly.
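A simplified sketch of such a load-balanced bisection (a hypothetical 2D Python illustration; ls1 mardyn's actual kd-tree decomposition works on 3D cell data and is implemented in C++): per-cell load estimates are summed, and at each recursion level the split plane is placed so that both halves carry roughly equal load.

```python
import math

def region_load(loads, r0, r1, c0, c1):
    """Total load of the cell block [r0, r1) x [c0, c1)."""
    return sum(loads[r][c] for r in range(r0, r1) for c in range(c0, c1))

def kd_decompose(loads, n_parts, r0=0, r1=None, c0=0, c1=None, dim=0):
    """Recursively bisect a 2D grid of per-cell load estimates into
    n_parts rectangles of roughly equal total load, alternating the
    split dimension at each level. Assumes each region spans at least
    two cells along the split dimension."""
    if r1 is None:
        r1, c1 = len(loads), len(loads[0])
    if n_parts == 1:
        return [(r0, r1, c0, c1)]
    n_left = n_parts // 2
    total = region_load(loads, r0, r1, c0, c1)
    target = total * n_left / n_parts

    # pick the split plane whose left-side load is closest to the target
    best_cut, best_err, acc = None, math.inf, 0.0
    lo, hi = (r0, r1) if dim == 0 else (c0, c1)
    for cut in range(lo + 1, hi):
        if dim == 0:
            acc += region_load(loads, cut - 1, cut, c0, c1)
        else:
            acc += region_load(loads, r0, r1, cut - 1, cut)
        if abs(acc - target) < best_err:
            best_cut, best_err = cut, abs(acc - target)

    nxt = 1 - dim  # alternate the split dimension
    if dim == 0:
        return (kd_decompose(loads, n_left, r0, best_cut, c0, c1, nxt)
                + kd_decompose(loads, n_parts - n_left, best_cut, r1, c0, c1, nxt))
    return (kd_decompose(loads, n_left, r0, r1, c0, best_cut, nxt)
            + kd_decompose(loads, n_parts - n_left, r0, r1, best_cut, c1, nxt))

# a dense droplet (high load) inside a dilute vapour (low load)
grid = [[1] * 8 for _ in range(8)]
for r in range(2, 5):
    for c in range(2, 5):
        grid[r][c] = 100

parts = kd_decompose(grid, 4)
per_part = [region_load(grid, *p) for p in parts]
```

The resulting subdomains around the droplet are small and the vapour subdomains large, which is the intended behaviour: equal load, not equal size.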
Scalability
Scalability studies were carried out with heterogeneous and homogeneous scenarios. Heterogeneous scenarios are very challenging; good scaling can be achieved for up to ~1,000 cores. Homogeneous scenarios are less challenging and show excellent scaling behaviour with ls1 mardyn. They allow for the utilization of entire state-of-the-art supercomputers, such as the Hermit system. For weak scaling, parallel efficiencies of over 90 % can be reached on more than 100,000 cores.
Partners
The massively parallel simulation code ls1 mardyn is a joint development by
• TU Kaiserslautern
• University of Paderborn
• TU München
• HLRS
For more information:
• Colin Glass
University of Stuttgart, HLRS
