Extreme Scaling Workshop at LRZ
July 9-11, 2013: Running Real World Applications on more than 130,000 Cores on SuperMUC
In July 2013, the Leibniz Supercomputing Center (LRZ) organized the first extreme scaling workshop on SuperMUC, the 3 PFlop/s system consisting of 18 thin-node islands with 147,456 Intel Sandy Bridge CPU cores in total. Prior to the workshop, the participants had to show that their codes scale up to 4 islands (32,768 cores). Research groups from 14 international projects attained this goal and were invited to LRZ for a three-day workshop. During that time, the participants could test the scaling capabilities of their codes on up to 16 islands (the remaining two islands continued in regular user operation). Application experts from LRZ, Intel, and IBM were present during the workshop to help resolve performance optimization and tuning issues. New techniques, such as a fast MPI startup mechanism for large-scale jobs, were successfully deployed on SuperMUC, reducing the startup time by a factor of 2-3. At the end of the third day, 6 applications were successfully running on 16 islands (131,072 cores), while the other 8 applications managed to run on 8 islands (65,536 cores).
Listed below are the names and descriptions of the applications and the maximum number of islands on which each application successfully ran (one island consists of 512 nodes with 16 physical cores each):
Figure 1: Scaling plot for Vertex.
BQCD (16 islands)
BQCD (Berlin Quantum ChromoDynamics program) is a hybrid MPI+OpenMP parallelized Monte Carlo program for simulating lattice QCD with dynamical Wilson fermions. It allows for simulating 2 as well as 2+1 fermion flavors.
CIAO (8 islands)
CIAO solves the reacting Navier-Stokes equations in the low-Mach limit. It is a second-order, semi-implicit finite difference code. It uses Crank-Nicolson time advancement with an iterative predictor-corrector scheme. Spatial and temporal staggering is used to increase the accuracy of the stencils. The Poisson equation for the pressure is solved with the multigrid solver HYPRE. The momentum equations are spatially discretized with a second-order scheme; the species and temperature equations are discretized with a fifth-order WENO scheme.
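To make the semi-implicit time advancement concrete, the following is a minimal sketch of one Crank-Nicolson step for a 1D diffusion equation. This is not CIAO code; the grid size, time step, and diffusivity are arbitrary illustration values, and a dense solve stands in for CIAO's multigrid pressure solver.

```python
import numpy as np

def crank_nicolson_step(u, nu, dt, dx):
    """Advance u by one Crank-Nicolson step of u_t = nu * u_xx (periodic)."""
    n = len(u)
    r = nu * dt / (2.0 * dx**2)
    # Discrete Laplacian with periodic boundaries.
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    L[0, -1] = L[-1, 0] = 1.0
    # Crank-Nicolson: (I - r*L) u^{n+1} = (I + r*L) u^n.
    A = np.eye(n) - r * L
    b = (np.eye(n) + r * L) @ u
    return np.linalg.solve(A, b)

x = np.linspace(0.0, 1.0, 64, endpoint=False)
u = np.sin(2 * np.pi * x)
u_new = crank_nicolson_step(u, nu=0.01, dt=0.1, dx=x[1] - x[0])
```

The scheme averages the explicit and implicit Laplacian, which is what makes it second order in time and unconditionally stable for diffusion.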
P-Gadget3-XXL (16 islands)
A highly optimized and fully MPI-parallelized TreePM-MHD-SPH code for simulating cosmological structure formation. In its current version it also allows for an effective OpenMP parallelization within each MPI task.
GROMACS (8 islands)
A versatile package to perform molecular dynamics, i.e. to simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
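Integrating the Newtonian equations of motion, as molecular dynamics packages like GROMACS do, is typically done with a velocity Verlet scheme. Below is a minimal illustrative sketch; a single harmonic force stands in for a real force field, and none of this is GROMACS code.

```python
import numpy as np

def velocity_verlet(x, v, force, m, dt):
    """One velocity Verlet step for positions x and velocities v."""
    a = force(x) / m
    x_new = x + v * dt + 0.5 * a * dt**2      # update positions
    a_new = force(x_new) / m                  # forces at new positions
    v_new = v + 0.5 * (a + a_new) * dt        # average old and new forces
    return x_new, v_new

k, m, dt = 1.0, 1.0, 0.01
force = lambda x: -k * x                      # harmonic "bond" stand-in
x, v = np.array([1.0]), np.array([0.0])
for _ in range(1000):
    x, v = velocity_verlet(x, v, force, m, dt)
```

The averaged force update is what gives the integrator its good long-term energy conservation, which matters for simulations spanning millions of steps.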
LAMMPS (16 islands)
LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is an MPI-parallelized code for simulating molecular dynamics.
Nyx (16 islands)
The code models dark matter as a system of Lagrangian fluid elements, or “particles,” gravitationally coupled to an inviscid ideal fluid representing baryonic matter. The fluid is modeled using a finite volume representation in an Eulerian framework with block-structured AMR. The mesh structure used to evolve fluid quantities is also used to evolve the particles via a particle-mesh method. In order to more accurately treat hypersonic motions, where the kinetic energy is many orders of magnitude larger than the internal energy of the gas, Nyx uses the dual energy formulation, where both the internal and total energy equations are solved on the grid during each time step.
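The core operation of a particle-mesh method is depositing particle masses onto the grid. The following is an illustrative 1D cloud-in-cell (CIC) deposition sketch, not Nyx code; the cell count, box size, and particle data are made up for the example.

```python
import numpy as np

def cic_deposit(positions, masses, n_cells, box):
    """Deposit particle masses onto a periodic 1D grid with CIC weights."""
    rho = np.zeros(n_cells)
    dx = box / n_cells
    for x, m in zip(positions, masses):
        s = x / dx - 0.5                 # position in cell-centred grid units
        i = int(np.floor(s))
        w = s - i                        # fraction assigned to the right cell
        rho[i % n_cells] += m * (1.0 - w) / dx
        rho[(i + 1) % n_cells] += m * w / dx
    return rho

rho = cic_deposit(np.array([0.5, 2.25]), np.array([1.0, 1.0]),
                  n_cells=8, box=8.0)
```

Each particle's mass is shared linearly between its two nearest cells, so total mass on the grid equals the total particle mass; the same weights are used in reverse to interpolate the gravitational force back to the particles.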
Vertex (16 islands)
A neutrino radiation hydrodynamics code that simulates from first principles the physical processes during the evolution of a supernova explosion. To this end, an MPI+OpenMP parallelized implementation of PPM hydrodynamics coupled with a ray-by-ray transport scheme is used.
Figure 2: Scaling plot for P-Gadget3-XXL.
Figure 3: Scaling plot for GROMACS.
Figure 4: Scaling plot for CIAO.
APES (8 islands)
A suite of solvers for problems common in engineering applications. It is based on a common mesh representation library, TreElM, and provides, besides the solvers, mesh generation and post-processing tools. Currently, two main solvers implementing two different numerical methods are developed within APES: Musubi and Ateles. Musubi implements a lattice Boltzmann scheme and supports various models. Besides the main incompressible Navier-Stokes model, it can also propagate passive scalars and multiple species in liquid or gas mixtures. It is mainly used for flow simulations involving complex geometries, e.g., the flow through a channel filled with obstacles for the simulation of electrodialysis. Another application is the flow of blood through aneurysms and the simulation of clotting effects. Ateles is a high-order discontinuous Galerkin solver that is currently mainly deployed for the simulation of linear conservation laws such as the Maxwell equations.
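A lattice Boltzmann scheme of the kind Musubi implements advances particle distribution functions through alternating collision and streaming steps. Below is a minimal 1D (D1Q3) BGK sketch for illustration only; it is not APES/Musubi code, and the relaxation time and grid size are arbitrary.

```python
import numpy as np

e = np.array([0, 1, -1])               # D1Q3 lattice velocities
w = np.array([2/3, 1/6, 1/6])          # lattice weights
cs2 = 1/3                              # lattice speed of sound squared

def equilibrium(rho, u):
    """Second-order equilibrium distributions for density rho, velocity u."""
    eu = np.outer(e, u)
    return w[:, None] * rho * (1 + eu/cs2 + eu**2/(2*cs2**2) - u**2/(2*cs2))

def lbm_step(f, tau):
    """One BGK collision followed by streaming on a periodic 1D lattice."""
    rho = f.sum(axis=0)                        # macroscopic density
    u = (e[:, None] * f).sum(axis=0) / rho     # macroscopic velocity
    f = f + (equilibrium(rho, u) - f) / tau    # BGK relaxation (collision)
    for i in range(3):                         # streaming along e[i]
        f[i] = np.roll(f[i], e[i])
    return f

n = 32
rho0 = 1.0 + 0.01 * np.sin(2 * np.pi * np.arange(n) / n)
f = equilibrium(rho0, np.zeros(n))
for _ in range(100):
    f = lbm_step(f, tau=0.8)
```

Because collision only relaxes toward a local equilibrium and streaming only moves values between neighboring cells, the method is almost entirely local, which is a key reason lattice Boltzmann codes scale well on large machines.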
SeisSol (8 islands)
SeisSol is one of the leading codes for earthquake scenarios, in particular for simulating dynamic rupture processes and for problems that require discretization of very complex geometries. It allows multi-physics ground motion simulation for earthquake-engineering, including the complete dynamic rupture process and 3D seismic wave propagation with frequencies resolved beyond 5 Hz. The numerics in SeisSol are based on a higher-order discontinuous Galerkin discretization and an explicit time stepping following the arbitrary high order derivatives method. In combination with flexible unstructured tetrahedral meshes for spatial adaptivity, SeisSol shows excellent scalability and time to solution on recent supercomputing architectures.
ExaML (4 islands)
Exascale Maximum Likelihood (ExaML) is an MPI application for inferring evolutionary trees of a set of species under the maximum likelihood criterion. It is an implementation of the popular RAxML search algorithm for partitioned multi-gene or whole-genome datasets.
ICON (4 islands)
The ICOsahedral Nonhydrostatic (ICON) general circulation model is a joint development of the Max Planck Institute for Meteorology in Hamburg and the Deutscher Wetterdienst. ICON is a next-generation earth system model designed to simulate atmospheric processes on multiple scales, enabling both climate simulations and numerical weather prediction. It provides the option to run locally nested, highly refined resolutions, allowing simulations at a very fine scale. ICON is a non-hydrostatic global model with a local zoom function.
All projects were able to generate scaling curves up to 8 or 16 islands. From the preliminary data, the following Flop/s rates have been obtained: 250 TFlop/s for Vertex on 16 islands and 201 TFlop/s for GROMACS on 8 islands. The measured Flop/s rates for the complete application codes correspond to 10 % or more of the peak performance of SuperMUC.
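A quick back-of-the-envelope check of these fractions, assuming for simplicity that the nominal 3 PFlop/s quoted in the introduction is spread evenly over the 18 islands (the true per-island peak differs slightly):

```python
# Approximate peak per island, in TFlop/s, under the even-split assumption.
peak_per_island = 3000.0 / 18

# Sustained fraction of peak on the islands actually used.
vertex_fraction = 250.0 / (16 * peak_per_island)
gromacs_fraction = 201.0 / (8 * peak_per_island)
```

Under these assumptions the two runs come out at roughly a tenth and a sixth of peak, respectively.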
These results, obtained in a short workshop, can definitely compete with results reported from other Top10 supercomputers such as the K computer and the Blue Waters system. They demonstrate the usability of SuperMUC for real-world applications.
LRZ Extreme Scale Benchmark and Optimization Suite
Some of the participating projects agreed to provide their codes for an automated benchmarking and validation suite, based on the DEISA benchmark and the ScalaLife validation suite (ref 1). The purpose of the package is the automatic testing of the whole machine, e.g. after system maintenance, and the identification of performance bottlenecks.
The LRZ is already planning a follow-up workshop in the near future, where the improvements and feedback from the experts will be tested.
• Momme Allalen
• Christoph Bernau
• Arndt Bode
• David Brayford
• Matthias Brehm
• Nicolay Hammer
• Herbert Huber
• Ferdinand Jamitzky
• Anupam Karmakar
• Carmen Navarrete
• Helmut Satzger
Leibniz Supercomputing Center
• Gurvan Bazin
• Klaus Dolag
Universitäts-Sternwarte, Ludwig-Maximilians-Universität München, Germany
• Jan Frederik Engels
• Wolfram Schmidt
Institute for Astrophysics, University of Göttingen, Göttingen, Germany
• Carsten Kutzner
Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
• Andreas Marek
Rechenzentrum der Max-Planck-Gesellschaft am Max-Planck-Institut für Plasmaphysik, Garching, Germany
• Philipp Trisjono
Institut für Technische Verbrennung, RWTH Aachen, Germany