Autumn 2016

HPC Landscape & Education

Gauss Centre for Supercomputing @ ISC'16 (June 19-23)

At ISC'16, the international supercomputing conference held in Frankfurt am Main (June 19–23), the 64 m² booth of the Gauss Centre for Supercomputing (booth #1310) once again proved to be a pivotal platform for the international scientific and industrial HPC community.

The open and inviting layout of the booth, designed to encourage ISC attendees to stop by and exchange ideas with representatives of the GCS centres HLRS (High Performance Computing Center Stuttgart), JSC (Jülich Supercomputing Centre), and LRZ (Leibniz Supercomputing Centre, Garching/Munich), paid off once again. It attracted numerous like-minded researchers, technology leaders, and scientists, among other visitors interested in HPC technologies and activities.

The Virtual Reality (VR) presentation set up at the booth by the HLRS visualisation team proved to be a true eye-catcher for ISC attendees. The HLRS scientists used a 3D tiled display in table configuration to interactively investigate data acquired from CT or MRI scans in combination with biomechanical simulations of bone–implant compounds, blood flow simulations, as well as anatomical structures and forensic data. One of the demonstrations displayed clinical CT data of a trauma patient. A second showed a 3D image acquired from the torso of a crime victim who had been shot three times and killed. Visitors could observe the virtually recreated body structures from all angles and follow the HLRS scientists' explanations of the biomechanical simulation results and the reconstruction of the crime scenario.

The ISC exhibits arranged by the JSC team were also extremely well received. Their in-house developed supercomputing applications and tools attracted great interest, in particular LLview, the interactive monitoring software for supercomputers, which demonstrated live the operation of various supercomputers worldwide. Video showcases by LRZ, presenting SuperMUC user projects as well as activities related to energy-efficient data centre management and operation, completed the presentations at the GCS booth. Together, they provided a comprehensive demonstration of the wide spectrum of challenging HPC activities GCS is involved in, underlining GCS's role as a global leader in high performance computing.

GCS Award 2016

At ISC's Research Paper session, Prof. Michael M. Resch, Director of HLRS, was pleased to put the spotlight on a bright young researcher from the University of Bologna. The paper "Predictive Modeling for Job Power Consumption in HPC Systems", written and submitted by Andrea Borghesi (University of Bologna) and his team of five, had been selected by the International Award Committee as the most outstanding research paper submitted at this year's ISC and was thus honoured with the coveted GCS Award 2016. The paper focuses on methods to accurately predict the power consumption of typical supercomputer workloads and provides insight into how to implement and successfully execute power-saving techniques in real working environments. Prof. Resch, as chairman of the award committee, and Dr. Claus Axel Müller, Managing Director of GCS, handed the award certificate to Mr. Borghesi, who had presented his work to the large and attentive audience attending the research paper session.

ISC Conference

In addition to its presentations on the ISC exhibition floor, GCS contributed to the HPC conference with numerous talks and presentations in tutorials, sessions, workshops, and birds-of-a-feather meetings. One of the highlights GCS brought to ISC attendees was the special conference session "Advanced Disaster Prediction and Mitigation", hosted by Prof. Arndt Bode of LRZ.

Three users introduced their research projects, carried out on the GCS HPC systems Hazel Hen, JUQUEEN, and SuperMUC, provided detailed insight into the challenges of their undertakings, and presented the results achieved so far:

  • Safety in the Underground – Coupling CFD with Pedestrian Simulations (Prof. Dr. Armin Seyfried, Jülich Supercomputing Centre & University of Wuppertal)
  • Large-Scale Multi-Physics Earthquake Scenarios with the ADER-DG Method on Modern Supercomputers (Stephanie Wollherr, Ludwig-Maximilians-Universität München)
  • Advancing Numerical Weather Prediction & Downscaling Global Climate Models with Emphasis on Weather Extremes (Prof. Dr. Christoph Kottmeier, Karlsruhe Institute of Technology/KIT)

Additionally, representatives of the GCS ­centres supported activities of GCS partners and/or related HPC initiatives, such as by the Partner­ship for Advanced Computing in Europe (PRACE), European Exascale Projects (DEEP/DEEP-ER, Mont-Blanc, EXDCI), the two European Centres of Excellence CoeGSS and POP, the Jülich-Aachen Research Alliance (JARA), the UNICORE Forum, and others.

TOP500, June 2016

While the Chinese supercomputer Sunway TaihuLight, the new #1 on the 47th edition of the TOP500 list revealed at ISC'16, surprised the global HPC community with its impressive Linpack performance of 93 Petaflops (Rmax), GCS is proud that its supercomputing installations Hazel Hen (at HLRS), JUQUEEN (at JSC), and SuperMUC Phase 1 and Phase 2 (at LRZ) continue to hold significant positions in the TOP500, the list of the world's most powerful HPC systems. Hazel Hen is registered at position 9 (Linpack: 5.64 Petaflops), defending its ranking among the ten most powerful HPC systems worldwide. Despite being in its fifth year of operation, JSC's JUQUEEN holds an impressive 14th place with a Linpack performance of 5.01 Petaflops. The supercomputing installations at LRZ, SuperMUC Phase 1 at position #27 (Linpack: 2.90 Petaflops) and SuperMUC Phase 2 (#28, Linpack: 2.81 Petaflops), complete the solid and strong showing of the GCS HPC systems in the current TOP500 rankings. Combined, GCS continues to provide by far the most powerful HPC platform in Europe for research in science and industry. Beyond its world-class performance, the GCS infrastructure excels with its complementary system technologies and architectures, which meet the most demanding needs of the individual fields of research.

contact: Regina Weigand, r.weigand[at]

  • Regina Weigand

GCS Public Relations

ISC'16 PhD Forum: And the Winner is...

For the first time, the ISC 2016 conference organized the ISC PhD Forum. This new event provided an excellent opportunity for PhD students to present their ongoing research to a wide audience of international HPC experts.

The event was conducted in a lively and inspiring setting: first, every student had to summarize their work in a lightning talk within a strict time limit of four minutes. Although several students were in the early phase of their PhD studies, all participants handled that challenge very well. The second stage consisted of a 60-minute poster session in which the students had the opportunity to present their work in more detail. Both events were very well attended, and many lively discussions evolved at the posters. The 14 PhD Forum presenters had been selected out of 28 submissions by an international committee led by Prof. Lorena Barba (George Washington University). Considering the oral and poster presentations as well as the research contributions, the committee selected Alfredo Parra Hinojosa (TU Munich) to receive the first ISC PhD Forum award, which comprised an iPad and a book voucher (sponsored by Springer). His research focuses on tolerating hard and soft faults with the sparse grid combination technique and is funded by the DFG SPPEXA priority programme through the EXAHD project. Congratulations!
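For readers unfamiliar with the technique named above: in two dimensions, the classical sparse grid combination technique approximates the full-grid solution by a weighted sum of solutions computed on coarse, anisotropically refined grids of level $(l_1, l_2)$:

```latex
u_n^{c} \;=\; \sum_{l_1 + l_2 = n} u_{l_1, l_2} \;-\; \sum_{l_1 + l_2 = n-1} u_{l_1, l_2}
```

Roughly speaking, if one component solution $u_{l_1,l_2}$ is lost to a hardware fault, the remaining solutions can be recombined with adjusted coefficients, which is the basis of the fault-tolerance approach pursued in EXAHD.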

The DFG SPPEXA priority programme supported the event by providing travel funds for all candidates, thus enabling many students from outside Germany to present at the PhD Forum.

Next year, ISC will host the PhD Forum again. The committee, led by Prof. Bill Gropp (University of Illinois at Urbana-Champaign), is looking forward to receiving many high-quality submissions.

Overview of PhD forum presentations:

contact: Prof. Dr. Gerhard Wellein, Gerhard.Wellein[at]

  • Gerhard Wellein

Department of Computer Science, University of Erlangen-Nürnberg

Vice World Champion is located in Stuttgart

The supercomputer "Hazel Hen", a Cray XC40 system located at HLRS, is the second-best system in the world in terms of performance under real application conditions. This is the result of the High Performance Geometric Multigrid (HPGMG) benchmark, published at the International Supercomputing Conference (ISC) in Frankfurt in 2016.

Hazel Hen exploits its strengths in real applications

The supercomputer surpassed much more expensive and larger systems in the HPGMG ranking. Only the system "Mira", located at Argonne National Laboratory in the USA (5.00e11 DOF/s), achieved better results than Hazel Hen (4.95e11 DOF/s). The HLRS system even finished well ahead of systems that rank higher in the TOP500.

Number 9 in the TOP500 list

As the fastest supercomputer in the EU, Hazel Hen currently ranks ninth in the worldwide TOP500 list. Unlike the Linpack benchmark, the HPGMG benchmark does not examine the theoretical performance of the system but focuses on the benefits for users in practical applications. The HPGMG benchmark is based on a geometric multigrid solver, as used, for example, in fluid dynamics calculations.
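To illustrate the kind of solver HPGMG exercises, here is a minimal sketch of a geometric multigrid V-cycle for the 1D Poisson equation −u'' = f with zero Dirichlet boundaries (all function names are ours; the real HPGMG benchmark runs a 3D solver and reports degrees of freedom solved per second, DOF/s):

```python
def residual(u, f, h):
    """r = f - A u for the 1D Poisson operator A u = -u'' (zero Dirichlet BCs)."""
    n = len(u)
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / h**2
    return r

def jacobi(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted-Jacobi smoother: damps the high-frequency error components."""
    n = len(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, n - 1):
            new[i] = (1 - omega) * u[i] + omega * 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
        u = new
    return u

def v_cycle(u, f, h):
    """One V-cycle: smooth, restrict the residual, recurse, prolongate, smooth."""
    u = jacobi(u, f, h)                       # pre-smoothing
    n = len(u)
    if n <= 3:                                # coarsest grid: smoother suffices
        return u
    r = residual(u, f, h)
    nc = (n + 1) // 2
    rc = [0.0] * nc                           # full-weighting restriction
    for i in range(1, nc - 1):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    ec = v_cycle([0.0] * nc, rc, 2 * h)       # solve coarse error equation
    for i in range(0, n, 2):                  # linear prolongation + correction
        u[i] += ec[i // 2]
    for i in range(1, n, 2):
        u[i] += 0.5 * (ec[i // 2] + ec[i // 2 + 1])
    return jacobi(u, f, h)                    # post-smoothing
```

A few V-cycles reduce the residual by orders of magnitude at O(n) cost per cycle, which is why multigrid is attractive as an application-level benchmark kernel.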

"We are proud to have defended our position in the top 10 of the TOP500 list. But our excellent result in the HPGMG benchmark, a test under real working conditions, is much more important. The benefits for our users are our top priority," says Prof. Michael Resch, Director of HLRS.

contact: Felicitas Knapp, knapp[at]

  • Felicitas Knapp

High-Performance Computing Center Stuttgart (HLRS), Germany

Prototyping next-generation supercomputing architectures: ISC’16 Workshop

At this year's ISC'16, the DEEP-ER and Mont-Blanc projects, two EU-funded FP7 Exascale initiatives, co-organised a workshop on hardware prototyping for next-generation HPC architectures. The event took place on Thursday, June 23, 2016, at the Marriott Hotel in Frankfurt, Germany.

The curve of supercomputers' computational power over time is flattening, and after hitting the frequency wall, the HPC community is facing a critical point for Moore's law. For these reasons, experimenting with novel architectures is a must. While the trend towards heterogeneous computing in the form of coprocessors, accelerators, or on-chip helper cores is more evolutionary, revolutionary approaches like neuromorphic computing are in the limelight as well. Their common goal is to increase performance while remaining energy efficient. Yet all these concepts and ideas need to be demonstrated and verified with prototypes.

With strong involvement of the GCS members JSC and LRZ, the DEEP-ER and Mont-Blanc organisation team put together a compelling workshop programme featuring the aforementioned diverse range of approaches. Whereas the first session was devoted to ARM-based prototyping, the second session focused on Intel Xeon Phi architectures, e.g. the QPACE 1 and 2 projects carried out at Universität Regensburg. Special contributions to the workshop included a keynote by Prof. Toshihiro Hanawa from the University of Tokyo and an invited talk by Prof. Steve Furber from the University of Manchester. Prof. Hanawa added an international perspective to this European workshop and talked about research projects developing innovative interconnect prototypes using FPGA technology. Prof. Furber's talk detailed the SpiNNaker project, based on ARM cores and heading in the direction of neuromorphic computing.

More information on the workshop is available on this website:

Compute Cloud Training at LRZ

On July 5, 2016, LRZ organised a training course for its Compute Cloud. The purpose of the workshop was to introduce the service to beginners and to give an overview of its functionality.

In addition to a general introduction to the concept of cloud computing and details on the LRZ service, the training featured two hands-on sessions. These sessions covered best practices addressing the most common tasks and challenges for basic and advanced use cases in the cloud.

The Compute Cloud is an additional HPC resource at the Leibniz Supercomputing Centre which was officially launched as a service in early 2015. Its main purpose is to provide scientific customers with HPC resources needed on short notice; hence, users are not required to submit an official application via a GAUSS or PRACE call. The Compute Cloud allows them to adjust resources dynamically and flexibly according to their needs and offers the possibility to create a personalised environment, as it is an Infrastructure-as-a-Service (IaaS) solution based on the open-source software OpenNebula.

The training material is accessible via this website:

More information on the service itself can be found here:

First JARA-HPC Symposium

The first JARA-HPC Symposium took place in Aachen, Germany, on October 4–5, 2016. The symposium was jointly organized by Forschungszentrum Jülich and RWTH Aachen University in the framework of the Jülich Aachen Research Alliance (JARA).

Current HPC systems are complex configurations with a huge number of components, often heterogeneous, and typically with insufficient memory. The hardware and software configuration can change dynamically due to fault recovery or power-saving procedures. Deep software hierarchies of large, complex software components are needed to use such systems efficiently. On the application side, HPC systems are increasingly used for data analytics and complex workflows. Successful application development requires collaboration between domain scientists on one side and computer science/HPC experts on the other.

JARA-HPC is the High Performance Computing section of JARA. Its scientists combine expertise in massively parallel computing on supercomputers with expert competences from their respective research fields.

JARA-HPC organized this symposium to motivate lively discussions on the various aspects of the development of HPC applications among experts. About 60 participants had the opportunity for an in-depth exchange with colleagues from different research fields who also make use of HPC systems in their scientific work.

The program comprised a keynote by Victor Eijkhout of the Texas Advanced Computing Center (TACC), followed by two days of presentations on diverse topics in scientific computing. A topical focus was placed on Aeroacoustics and CFD in HPC during a mini-workshop on the second day. The symposium closed with a panel on software engineering in HPC.

contact: Marc-Andre Hermanns, m.a.hermanns[at]

  • Marc-Andre Hermanns, Bernd Mohr

Jülich Supercomputing Centre (JSC), Germany

  • Michaela Bleuel

General Manager, JARA-HPC, Germany

Big BlueGene Week at JSC on JUQUEEN

Capability computing is a major pillar of advances in computational science. Governed by the paradigm of solving otherwise intractable single scientific problems by means of extremely large parallel computing architectures, it is distinct from, and complementary to, capacity computing. At the same time, contemporary HPC facilities aim to provide services for both demands, which can compromise in particular the potential of capability computing. JSC therefore decided to favour very large compute jobs on JUQUEEN, its IBM BlueGene/Q HPC system, in a special event lasting a whole week: from June 14 to 20, JUQUEEN was dedicated exclusively to large-scale, massively parallel computations.

The response to this offer was tremendous, with users taking the chance to execute some of their scientifically and computationally most demanding simulations in full parallelism on up to 458,752 compute cores. More than 77 % of the available time was used for true full-machine runs, and in total 22 users completed 84 jobs, amounting to about 70 million core hours. The availability of the system during the week was higher than 93 %, demonstrating a remarkable level of reliability. The event served various scientific use cases in topics such as turbulent fluid dynamics, neuroscience, elementary particle physics, molecular dynamics, and complex stencil code development. Examples of specific problems addressed include modelling the propagation of polarized light through brain tissue by means of a massively parallel three-dimensional Maxwell solver, to enhance the understanding of the structural organization of the human brain, and simulating the mixing of species in a turbulent decaying flow as it occurs in various practical situations.

JUQUEEN Big Week – case study "turbulent flows"

The turbulent motion of fluids is still one of the unsolved problems of classical physics, and its description remains challenging. Understanding turbulent flows and turbulent mixing is of great interest for many applications; prominent examples are the turbulent combustion of chemical reactants and the dynamics of the atmosphere or the oceans. As a continuum field phenomenon, turbulence is in principle infinite-dimensional and strongly non-local and non-linear. The temporal and spatial evolution of a velocity field is described by the Navier-Stokes equations, which contain all the information necessary to fully characterize the motion of turbulent flows. Although the Navier-Stokes equations are formally deterministic, turbulent dynamics are chaotic and appear random.

Even for the simplest turbulent flows, an analytical solution of the Navier-Stokes equations is not known. Therefore, a solution of the Navier-­Stokes equations can only be obtained by numerical methods. Direct numerical simulation (DNS) has become an indispensable tool in turbulence research. DNS solves the Navier-Stokes equations for all scales down to the smallest vortices and can be regarded as a numerical experiment.
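For reference, the incompressible Navier-Stokes equations mentioned above read, in standard notation with velocity $\mathbf{u}$, pressure $p$, density $\rho$, and kinematic viscosity $\nu$:

```latex
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  = -\frac{1}{\rho}\,\nabla p + \nu \nabla^{2} \mathbf{u},
\qquad \nabla \cdot \mathbf{u} = 0
```

The cost of DNS follows from the requirement to resolve the smallest (Kolmogorov-scale) vortices: the number of grid points grows roughly as $Re^{9/4}$ with the Reynolds number, which is why high-Reynolds-number DNS demands full-machine runs.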

The DNS of decaying homogeneous isotropic turbulence is a canonical case of significant interest. Performing DNS of decaying turbulence at high Reynolds numbers is challenging, as two opposing constraints need to be satisfied: the numerical grid must accurately resolve the smallest length scales, while the largest length scales must be kept small compared to the size of the numerical domain to reduce confinement effects. Based on the highly optimized simulation code psOpen/nb3dfft [1, 2], a direct numerical simulation of decaying turbulence with more than 231 billion grid points was carried out during the Big Week. With access to all 28 racks of the JUQUEEN supercomputer, it was possible to perform DNS of decaying turbulence at a thus far unreached accuracy and resolution. The high resolution is essential during the transition and the early decay phase.

This simulation contributes to a better understanding of turbulent flows and the self-similarity during decay.

Due to the positive response, JSC is considering repeating this type of event and encourages all users with suitable codes and applications to join in and report on their experiences. All users interested in further information are invited to contact JSC via


  • [1] Gauding, Michael, Goebbert, Jens Henrik, Hasse, Christian and Peters, Norbert: Line segments in homogeneous scalar turbulence. Physics of Fluids, Vol. 27, No. 9 (2015), 095102. AIP Publishing.
  • [2] Goebbert, Jens Henrik, Gauding, Michael, Ansorge, Cedrick, Hentschel, Bernd, Kuhlen, Torsten and Pitsch, Heinz: Direct Numerical Simulation of Fluid Turbulence at Extreme Scale with psOpen. Advances in Parallel Computing, Vol. 27 (2016), pp. 777-785. International Conference on Parallel Computing, ParCo 2015, Edinburgh, Scotland, UK.

International Workshop "Quantum Annealing and its Applications in Science and Industry" (QuAASI'16)

The international workshop "Quantum Annealing and its Applications in Science and Industry" (QuAASI'16) took place from 26 to 28 July 2016 in the Rotunda of the Jülich Supercomputing Centre. The goal of the two-day workshop, followed by a D-Wave Exploration Day, was to bring together researchers from different communities to discuss both the challenges in using quantum annealing to approach the solution of real-world problems and the requirements on optimization and design of existing and future quantum annealing hardware.

About 60 researchers from Germany, Switzerland, the Netherlands, the United Kingdom, the United States, and Canada participated in the workshop. The talks highlighted the history of quantum annealing and the design of D-Wave's quantum processors, the implementation of various optimization problems and machine learning on D-Wave machines, studies of the behavior and performance of D-Wave quantum computers, the various approaches designed to extend the applicability of these devices to larger, more connected optimization problems, and related topics. The D-Wave Exploration Day provided detailed insights into the hardware architecture. Available programming techniques and tools were demonstrated by remotely running examples on one of the D-Wave 2X™ quantum computers with more than 1000 qubits located at the headquarters of D-Wave Systems in Burnaby, Canada.

Discrete optimization and quantum annealing

Optimization challenges are ubiquitous. They affect the sciences and the whole of society, directly and indirectly. They comprise, among others, flight and train scheduling, vehicle routing, power trading and scheduling, supply chain network optimization, planning and scheduling of production processes, organ allocation and acceptance optimization, cancer radiation treatment scheduling, and optimizing target interactions for drug design. Optimization also lies at the heart of machine learning, artificial intelligence, computer vision, and data mining.

In many of these practical optimization problems the task is to find the best solution among a finite set of feasible solutions. Such problems are formulated as discrete optimization problems. A standard way for solving discrete optimization problems is to first construct an integer or mixed-integer programming model, involving discrete or both continuous and discrete variables, and then use a software package such as CPLEX to solve the constructed model.

The new strategy proposed is to use quantum annealing to solve those optimization problems that can be mapped to a QUBO, a quadratic unconstrained binary optimization problem. Quantum annealing is a new technique, inspired by classical simulated annealing techniques based on temperature fluctuations, for finding the global minimum of a quadratic function of binary variables by exploiting quantum fluctuations. Its main targets are combinatorial optimization problems featuring a discrete search space with many local minima. Many challenging optimization problems playing a role in scientific research and in industrial applications naturally occur as QUBOs or can be mapped to them by clever modeling strategies.
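As a concrete illustration of such a mapping, the following sketch (our own minimal example, not D-Wave code) casts number partitioning as a QUBO and solves a small instance exactly by enumeration; a quantum annealer would instead sample low-energy bit strings of the same objective:

```python
from itertools import product

def qubo_energy(Q, x):
    """E(x) = sum_ij Q[i][j] * x_i * x_j for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def solve_qubo_bruteforce(Q):
    """Exact minimisation by enumerating all 2^n bit strings (small n only)."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))

def partition_qubo(s):
    """Map number partitioning to a QUBO: minimise (sum_i s_i x_i - S/2)^2.
    Expanding the square (and dropping the constant S^2/4, using x_i^2 = x_i)
    gives Q[i][i] = s_i^2 - S*s_i and Q[i][j] = s_i*s_j for i != j."""
    S = sum(s)
    n = len(s)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = s[i] * s[i] - S * s[i]
        for j in range(n):
            if i != j:
                Q[i][j] = s[i] * s[j]
    return Q
```

For the instance `s = [3, 1, 1, 2, 2, 1]` (total 10), any minimiser selects a subset summing to 5, i.e. a perfectly balanced partition.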

D-Wave Systems

D-Wave Systems, founded in 1999, is the first company to have commercialized quantum annealers for carrying out quantum computations. Its quantum annealers are programmable artificial spin systems manufactured as integrated circuits of superconducting qubits. Qubits, or quantum bits, are the elementary building blocks of a quantum computer, analogous to the bits in a digital computer. The latest D-Wave quantum computers, D-Wave 2X™ systems, operate with more than 1000 qubits and over 3000 couplers connecting the qubits for information exchange. The D-Wave 2X™ niobium quantum processor, a complex superconducting integrated circuit containing more than 128,000 Josephson junctions, is cooled to 15 mK and shielded from external magnetic fields, vibrations, and external radio-frequency fields of any form. A D-Wave 2X™ system requires less than 25 kW of power, most of which is consumed by the refrigeration system and the front-end servers. Currently, D-Wave Systems is testing its next generation of quantum computers with more than 2000 qubits. These new systems are scheduled for release in mid-2017.

D-Wave quantum processors are capable of solving QUBOs by mapping binary variables to qubits and correlations between variables to couplings between qubits. During a "quantum computation", the system of interacting qubits evolves according to a quantum adiabatic annealing process. At the end of this process, the qubits are read out to obtain the optimal or a near-optimal solution of the optimization problem.

Current D-Wave quantum processors have a so-called Chimera graph architecture, connecting a given qubit with at most six other qubits. Solving optimization problems on such an architecture requires embedding the problems in the Chimera graph. This obviously limits the range of optimization problems that can potentially be solved on such a machine; nevertheless, some very hard real-world optimization problems might be among them. Hence, exploring the potential of quantum annealing on this operational prototype hardware for some real-world problems is a challenge that should be taken up.

contact: Kristel Michielsen, k.michielsen[at]

  • Prof. Kristel Michielsen , Prof. Thomas Lippert

Jülich Supercomputing Centre (JSC), Germany

  • Prof. Wolfgang Marquardt

Chairman of the Board of Directors, Forschungszentrum Jülich

Fourth Extreme Scale Workshop at the Leibniz Supercomputing Centre

The fourth "Extreme Scale" workshop on SuperMUC was conducted from February 29 to March 3, 2016 at LRZ, with the goal of optimizing existing and new petascale applications. Thirteen projects participated in the workshop, seven of them for the first time. Eleven projects succeeded in scaling to all nodes of SuperMUC Phase 1.

As in previous workshops, the scientists were able to perform scale-out tests on SuperMUC, this time on the whole Phase 1 partition, which consists of 9,216 nodes, each featuring two Intel Xeon E5-2680 "Sandy Bridge" processors, for a total of 147,456 cores and a total memory of 131 TB. During the Extreme Scale workshop, a total of 14.1 million CPU hours were available to the participants, which could be used for short debug and test jobs during the day and for larger production runs with a maximum runtime of 6 hours during the night.
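The quoted node and core counts are self-consistent, since the Xeon E5-2680 is an 8-core part; a quick check:

```python
# Consistency check of the SuperMUC Phase 1 figures quoted above
nodes = 9216
cores_per_node = 2 * 8          # two 8-core Intel Xeon E5-2680 processors per node
total_cores = nodes * cores_per_node
print(total_cores)              # matches the 147,456 cores stated in the text
```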

For the first time, the "Leibniz Extreme Scale Award" for the best scaling behavior of a program on SuperMUC Phase 1 was presented. The winner was the program VERTEX, which showed linear strong scaling on the whole of SuperMUC Phase 1. The award was presented by Prof. Bode (Chairman of the Board of Directors of LRZ) to the team leader of VERTEX, Dr. Andreas Marek.

The participating projects of the fourth “Extreme Scale” workshop were:

  1. INDEXA (CFD), TU München (M. Kronbichler)
  2. MPAS (Climate Simulation), KIT (D. Heinzeller)
  3. Inhouse (Material Science), TU Dresden (F. Ortmann)
  4. HemeLB (Life Science, Bloodflow Simulation), UCL (P. Coveney)
  5. KPM (Chemistry), FAU Erlangen (M. Kreutzer)
  6. SWIFT (Astro), University of Durham (M. Schaller)
  7. LISO (CFD), TU Darmstadt (S. Kraheberger)
  8. ILDBC (Lattice Boltzmann), FAU Erlangen (M. Wittmann)
  9. Walberla (Lattice Boltzmann), FAU Erlangen (Ch. Godenschwager)
  10. GPI (Parallelization Framework), Fraunhofer, (M. Kühn)
  11. GADGET (Astro), LMU München (K. Dolag)
  12. VERTEX (Astro), LMU München (T. Melson)
  13. PSC (Plasma), LMU München (K. Bamberg)

In the following, the results of two participating projects are discussed further:

Linear-Scaling Transport Approach for Innovative Electronic Materials (F. Ortmann, TU Dresden)

Topological insulators are a new state of quantum matter which exists in two and three dimensions and can be realized in certain materials and compounds. We use a time-propagation Kubo methodology implemented in a highly efficient, MPI-parallel real-space code. The order-N scaling with sample size (N) of the implemented algorithm potentially allows tackling macroscopic 3D samples and studying realistic structures, thus providing unprecedented insight into the transport physics of novel, exciting material classes.

Scaling of the Code

We demonstrate here that the dominant part of the code, the Lanczos routine for matrix-vector multiplication, scales very well beyond 32,768 cores on SuperMUC. For this intensive part, we measure a speed-up of 8.2 on 73,728 cores (9 islands) compared to a single island (grey line in figure 1). This corresponds to 91 % efficiency.

When increasing the number of processes by another factor of 2 to the full Phase 1 of SuperMUC (18 islands), we observe a speedup of only 1.3 for this last step. This leads to a significant drop in total efficiency to 59 % (speedup of 10.7 compared to 1 island). Further analysis of this effect indicates that the domain size handled by a single MPI process has dropped to only 30,912 sites (orbitals) when running on the full machine. Therefore, the work performed on each process without communication (the internal part of the matrix-vector multiplication) is strongly reduced (strong scaling), such that communication becomes much more relevant.
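The efficiency figures above follow directly from dividing the measured speedup by the ideal (linear) speedup; a quick check with the numbers quoted in the text:

```python
def parallel_efficiency(speedup, n_units):
    """Strong-scaling efficiency relative to the one-unit baseline."""
    return speedup / n_units

# Speedups measured relative to a single SuperMUC island
eff_9  = parallel_efficiency(8.2, 9)     # 9 islands (73,728 cores) -> ~0.91
eff_18 = parallel_efficiency(10.7, 18)   # 18 islands (full Phase 1) -> ~0.59
last_doubling = 10.7 / 8.2               # gain of the final 9 -> 18 island step, ~1.3
```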

Another important observation from our runs concerns a part of the code which becomes apparent in figure 1 at extreme scale. Here we identify mainly the random-number generation of the initial random-phase state and of the potential in the Hamiltonian, which take 70 s and 264 s, respectively, for the 18-island run. This is a bottleneck in the scaling which can be removed in subsequent work by using a parallel generator.

Performance analysis

With LIKWID we measured the branching rate, flops, and memory consumption of the intensive computation part performing the matrix-vector multiplications (Lanczos recursion). Thanks to a hint by LRZ staff member Fabio Baruffa, we found that the main loop could be optimized, since renaming of variables and unnecessary branching was occurring. By restructuring the loop, we were able to increase the MFLOPS value in the loop from 867 to 1097. At the same time, we observed a reduction of the corresponding memory usage per MFLOP (from 1.87 MB to 0.94 MB). Further analysis and optimization are ongoing.

Using Allinea tools, we performed a comprehensive analysis of memory usage, MPI communication, and pure computation time. We found that for large sample sizes per core, which mean a bigger workload for each core, the communication is negligible in terms of total time; the code is then computation (memory) bound. Most time is spent in the main matrix-vector loop, in which 80 % of the computation time is due to memory access. The performance can therefore be enhanced by making the loop more amenable to vectorization, which is work in progress in collaboration with LRZ experts. By lowering the sample size per core, corresponding to 'strong scaling' to a higher number of tasks, the MPI communication share increases from 2 % up to 15 %, explaining the scaling behavior at large scale. This effect is due to delays in MPI receive operations.

Reverse Time Migration with GPI-2 (M. Kühn, Fraunhofer ITWM, Kaiserslautern)

Reverse Time Migration (RTM) is a seismic imaging method delivering high-quality results for oil and gas exploration. Solving the full wave equation, it tracks steep dips and complex overburdens very well. It is applicable to wide-azimuth data sets as well as anisotropic velocity models (TTI). The zero-lag correlation of a forward-modeled acoustic source signal (shot) and the backward-modeled receiver signals delivers a high-quality partial image (figure 1). Although a typical data set consists of tens of thousands of independent shots, the quick processing of a few decisive shots is intriguing, e.g. for the interactive modeling of salt domes.

Our RTM implementation (FRTM) models the wave equation using a Finite Differences (FD) scheme on a regular grid. The parallelization approach for each shot is a static domain decomposition on a regular grid with halo exchanges. The latter are executed fully asynchronously in a data-dependency-driven scheme. This scheme completely replaces the barriers that would typically be placed between the time steps in standard implementations. It allows superior strong scaling by relaxing the synchronization between the compute nodes. Our communication library GPI-2 is the appropriate tool to implement this communication pattern efficiently [1].
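The following is a minimal sketch (not the FRTM code) of what "data-dependency-driven" means here: each subdomain advances its own time step as soon as all neighbour halos for that step have arrived, with no global barrier between steps. This is the pattern that GPI-2's one-sided communication makes efficient.

```python
from collections import defaultdict

class Subdomain:
    """Toy model of barrier-free time stepping: a subdomain advances
    when all neighbour halos for its next step are in, regardless of
    how far other, unrelated subdomains have progressed."""

    def __init__(self, name, neighbours):
        self.name = name
        self.neighbours = set(neighbours)
        self.step = 0
        self.halos = defaultdict(set)  # target step -> neighbours heard from

    def receive_halo(self, sender, for_step):
        self.halos[for_step].add(sender)

    def try_advance(self):
        # Data dependency: every neighbour's halo for the next step
        # must have arrived; no global synchronization is consulted.
        if self.halos[self.step + 1] >= self.neighbours:
            self.step += 1
            return True
        return False
```

A subdomain with fast neighbours can thus run a step ahead of a slow, distant one, which is exactly the relaxed synchronization that improves strong scaling.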

Our benchmark calculates a single shot at 15 Hz of the well-established synthetic SEAM benchmark [2]. The velocity is modeled as Tilted Transverse Isotropic (TTI), the simulation domain has 800 x 915 x 1291 voxels, and the wave equation is discretized with an 8th-order stencil in space and 2nd-order in time. The strong scaling plot in figure 1 shows almost perfect scaling up to 1024 nodes and still good scaling up to 4K nodes (72 % efficiency at 4K nodes). At 4K compute nodes we reach a run time of 2 milliseconds per simulated time step, which corresponds to an average total network bandwidth of approximately 8 terabytes per second. The maximum floating point performance (SP) is about 310 TFlop/s.
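As a back-of-the-envelope check of these figures: the article does not state the halo volume per node, but an assumed ~4 MB of halo traffic per node and time step, at 4,096 nodes and 2 ms per step, reproduces the reported ~8 TB/s aggregate bandwidth.

```python
def total_bandwidth_tb_per_s(halo_bytes_per_node, nodes, t_step_s):
    # Aggregate network bandwidth implied by the per-node halo traffic
    # exchanged within one time step.
    return halo_bytes_per_node * nodes / t_step_s / 1e12

# Assumed (not reported) halo volume of 4 MB per node and step:
bw = total_bandwidth_tb_per_s(4e6, 4096, 2e-3)  # ~8.2 TB/s
```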

Beyond 1K nodes the domain decomposition produces very small subdomains consisting of boundary elements only. This tightens the coupling between subdomains and reduces the overlap of communication with computation, which explains the drop in parallel efficiency observed in the scaling plot. However, even under these unfavorable circumstances our approach is able to scale further to a higher absolute performance.

contact: Ferdinand Jamitzky, jamitzky[at]

  • Ferdinand Jamitzky, Helmut Brüchle, M. Kühn, F. Ortmann

Leibniz Supercomputing Centre

21st VI-HPS Tuning Workshop at LRZ

On 18-22 April 2016 the Leibniz Supercomputing Centre hosted the 21st VI-HPS Tuning Workshop in a very fruitful cooperation with the Jülich Supercomputing Centre (JSC) and the VI-HPS consortium. This series of tuning workshops gives an overview of the VI-HPS performance analysis and tuning tools suite, explains the functionality of individual tools and how to use them effectively, and offers hands-on experience and expert assistance using these tools on participants’ own applications.

The Virtual Institute High-Productivity Supercomputing (VI-HPS) combines the expertise of twelve partner institutions spread around the globe, each with a strong record of high-­performance computing research. Its partners have long experience in the development and application of HPC programming tools and host well-known tool projects that are contributing leading-edge technology to this partnership. Most of these tools are open source and freely available to the HPC user communities.

The 5-day workshop attracted over 35 international participants. Talks were given by 15 lecturers from 9 VI-HPS member institutions, a record in the long history of the VI-HPS tuning workshop series, which was initiated in 2008.

The following 14 HPC tools were covered during the workshop:

  • Score-P instrumentation and measurement
  • Scalasca automated trace analysis
  • Vampir interactive trace analysis
  • Periscope/PTF automated performance analysis and optimisation
  • Extra-P automated performance modeling
  • Paraver/Extrae/Dimemas trace analysis and performance prediction
  • [k]cachegrind cache utilisation analysis
  • MAQAO performance analysis & optimisation
  • MAP+PR profiling and performance reports
  • mpiP lightweight MPI profiling
  • Open|SpeedShop profiling and tracing toolset
  • MUST runtime error detection for MPI
  • ARCHER runtime error detection for OpenMP
  • STAT stack trace analysis

The participants especially appreciated the opportunity to optimise their own codes during the many hands-on sessions, with direct help from the instructors, who in most cases were also the developers of the tools. The hands-on sessions were carried out on the symmetric multiprocessing (SMP) system SGI UltraViolet at LRZ, which was exclusively reserved for the workshop. The organisational efforts of the VI-HPS consortium were also greatly appreciated by the participants.

The social event consisted of a guided tour of the Weihenstephan Brewery, the oldest still-operating brewery in the world (see figure 1), followed by a self-paid dinner at the brewery restaurant, which encouraged intensive participant and instructor networking in a relaxed environment.

Slides of the workshop are available at:

The workshop was a PRACE Advanced Training Centre (PATC) event financially supported by the PRACE-4IP project funded by the European Commission’s Horizon 2020 research and innovation programme (2014-2020) under grant agreement 653838.

contact: Volker Weinberg, Volker.Weinberg[at]

  • Anupam Karmakar, Volker Weinberg

Leibniz Supercomputing Centre (LRZ), Germany

  • Brian Wylie

Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Germany

Fiber optics study for animation and VFX companies

The ASAPnet study addressed the lack of an affordable glass fiber network for animation and VFX studios in the Stuttgart region. A further focus of the study was improving access to the computational power of the High Performance Computing Center Stuttgart (HLRS). The various aspects of the topic, such as the short project durations typical of the animation industry, but also the obstacle of software integration, were comprehensively analysed and solutions proposed. The Media Solution Center BW (MSC), a project of HLRS, was not only involved in the preparation of the study, but is also part of those solutions.

Initial impulse

In 2014 the majority of the frames for the movie “Maya the Bee” were rendered at HLRS. At that time, the large amounts of data had to be transported on hard drives via public transport to HLRS for the rendering process. This is especially alarming as Stuttgart is one of Germany's leading cities when it comes to animation and visual effects. Since the described procedure, which found its way into the local press under the term “sneaker network”, hardly matches the political aim of digitalisation, the Economic Development Corporation for the Region of Stuttgart (Wirtschaftsförderung Region Stuttgart), in cooperation with the media and film society Baden-Württemberg GmbH (Medien- und Filmgesellschaft, MFG), initiated a study to analyse the problem in detail. SICOS BW GmbH was involved as a partner, in this context also as a representative of HLRS and as an expert for technology transfer towards small and medium enterprises.

Course of action

In individual talks with the participating animation and VFX studios, their mode of operation and their need for a faster data connection were surveyed and analysed. It turned out that almost all of the companies are interested in an expansion of their computing capacities. However, as their work is heavily project-based, the detailed planning of a fiber optic infrastructure, ideally including a connection to HLRS, is already beyond their capacity. Apart from obvious obstacles like the mentioned time-consuming planning of a fiber connection, the issues regarding the rental situation, the negotiations with telecommunication providers and, in many cases, the expensive earthwork operations, there is another immense challenge specific to this industry: whoever wants a fiber connection in order to use external render capacities also needs to adjust the usually very complex and unique software pipeline. A new partner for the animation and VFX studios, willing to join them in developing technical innovations, is the Media Solution Center BW (MSC): a cooperation project of HLRS, the Stuttgart Media University, the Filmakademie Baden-Württemberg and the ZKM in Karlsruhe. Its declared aim is to give the companies access to HLRS, but not only for classical rendering. The MSC also intends to offer the possibility of trying out new approaches and to support future-oriented ways of working.

Fiber optics suggestions

Regarding fiber optics, the ASAPnet study makes explicit recommendations: interested animation and VFX companies should form a demand pool. As a group, they are more likely to get reasonable quotes for so-called dark fiber, i.e. unlit fiber for individual and practically unlimited use.

The idea is that the lines of the individual companies converge at a central node in the centre of Stuttgart, where multiple telecommunication providers are present. The increased competition among providers then makes better prices achievable.

Another advantage of a central node is the reduction of the total cable length to HLRS. Such shortening also saves costs, since the bulk of the costs of a fiber optic connection are the laying costs (figure 2).

According to the study, the proposed model could be realised with a 10,000 Euro one-time investment and 1,000 Euro per month operating costs per studio.

The results of the study were presented at HLRS to invited guests and the press. The representatives of the involved institutions agree: if the recommendations are realised, a significant improvement of the fiber optic infrastructure can be achieved, from which sectors beyond animation can also benefit tremendously.

The whole study can be read here: (German version)

contact: Annekatrin Baumann, baumann[at]

  • Annekatrin Baumann

MSC BW project coordinator, HLRS

Symposium on Theoretical Chemistry at Ruhr-Universität Bochum: Chemistry in Solution

From September 26–29, 2016, the 52nd Symposium on Theoretical Chemistry (STC) took place at the Ruhr-Universität Bochum, organized by the Center for Theoretical Chemistry (Prof. Dr. Dominik Marx) and the DFG Cluster of Excellence RESOLV (Ruhr Explores Solvation). The STC is an annual international meeting of scientists from all areas of Theoretical Chemistry. This year's focus was on Solvation Science.

The majority of chemical reactions take place in a liquid-state environment. Solvents, water being the most prominent, are used to solvate molecular species, ranging from industrial reagents to biological molecules in living cells. Solvents also wet surfaces, such as lipid membranes or metal electrodes, thus creating extended inhomogeneities and, thereby, interfaces. It is therefore not astonishing that research into liquids, solutions and their interfaces has a long-standing tradition in Theoretical Chemistry. The aim of STC 2016 in Bochum, dedicated to the featured topic Chemistry in Solution, has thus been to advocate that much progress in Theoretical Chemistry can be achieved by a fruitful combination of the wide range of available modern methods. About 400 participants discussed new developments and research results presented in 13 high-level invited lectures given by leading international experts in the field. These covered research into chemistry in liquid-state environments from many different perspectives, from advanced electronic structure calculations to sophisticated computer simulation methods; technical challenges and possible solutions in the application of these methods on modern architectures towards exascale computing were also discussed. In addition, numerous contributed talks and posters addressed all fields of Theoretical Chemistry.

A particular highlight of each STC is the ceremony of the Hans G.-A. Hellmann award, the most prestigious prize for young researchers in the field of Theoretical Chemistry who have an outstanding scientific record but have not yet received a full professorship. This year, the Hellmann prize was awarded to Dr. Ralf Tonner (Universität Marburg) for his contributions to the understanding of chemical bonding and reactivity at surfaces using concepts from chemistry and physics. Furthermore, for the first time the new Erich Hückel Prize of the Gesellschaft Deutscher Chemiker (GDCh) for Outstanding Achievements in the Field of Theoretical Chemistry was awarded to Professor Werner Kutzelnigg (Emeritus at Ruhr-Universität Bochum) for his groundbreaking work on the nature of chemical bonding, on the description of electron correlation and magnetic properties, and for his contributions to relativistic quantum chemistry.

contact: Michael Römelt, michael.roemelt[at]

contact: Jörg Behler, joerg.behler[at]

  • Michael Römelt, Jörg Behler

Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum

TranSim: bringing philosophy and computer simulations together

Someone could ask, with genuine surprise: “what has philosophy to do with computer simulations?”. Depending on whom we ask, the answer would be a relatively direct “quite a bit more than you expect”. Before fleshing out any more elaborate answer, it is important to delimit what is being asked. It is not unusual to find that answers to this question are taken as also answering “what is philosophy, and what is it good for?” and “why should anybody, being a non-philosopher, care about it?”.

The question “what is philosophy, and what is it good for?” aims at justifying a field of knowledge, and that is definitely not the purpose of our initial question. Nevertheless, an answer to the second question could help us understand the connection between philosophy and computer simulation (the first question), and it would give us an idea of why anybody should care about philosophy (the third question).

Let us begin with some common opinions about what philosophy is, and what it is good for (on this point, see Kaminski, Andreas (2016): Art. Philosophie, in: Richter, Philipp (ed.): Professionell Ethik und Philosophie unterrichten. Ein Arbeitsbuch, Kohlhammer: Stuttgart, pp. 275–279).

A first position: Philosophy as the practice of conceptual clarification

Philosophy is a discipline hard to grasp, and even harder to define (as many others are). But among its interests we can mention clarifying concepts, exposing (false) assumptions, challenging ways of understanding our world and surroundings and, last but not least, defying our most solid beliefs. So, couldn't we say: this is what philosophy is, and what it is good for? Some philosophers are quite convinced that this characterizes philosophy. But the problem is that every discipline is eager to clarify its concepts. From this point of view, the practice of a philosopher would not be any different from that of a scientist. And worse, philosophy would not have a legitimate area of its own. Scientists would do a better job clarifying their concepts than philosophers. Why should a philosopher be in a better position to understand neutrons or the Navier–Stokes equations?

A second position: Philosophy as Weltanschauung

For some people, philosophy might appear as a Weltanschauung, a fundamental belief about the world as a whole. From this point of view, the history of philosophy appears to be, at best, a cabinet of bold and interesting opinions (Plato's opinion that all of this means x, Kant's that it means y), in principle undecidable and therefore up to one's personal beliefs. Many questions can be cleared up and answered by the empirical sciences, and the remaining part seems to be what philosophy is about. This would explain why there seems to be no progress in philosophy similar to that in science, why it has no empirical methods, and especially why the modern sciences liberated themselves from philosophy. And worse, this would make philosophy a very private opinion, hardly understandable, teachable or knowable.

A different approach: philosophy as the study of notions of reflection

Let us have a second and closer look. Philosophy examines notions like truth, justification, justice, the self, and so on. Let us imagine we would like to overcome philosophy by starting to study the structure of these notions empirically. We start an empirical study of truth. Two problems emerge. First, by an empirical study we would at best discover how people use the notion of truth. Therefore, we could discover an average usage of the notion. But what if the people we observe are using the notion in the wrong way? Second, we already require an understanding of truth in order to do our empirical study. Why? Because we have to distinguish between true and false statements within our study. This thought experiment gives us a hint of what philosophy is about. It is the examination of notions that we require for living and for doing science, and which are not, or at least not thoroughly, explorable empirically. Philosophers call these notions of reflection (Kant, Hegel). For instance, truth, justice, and the self are notions which we presuppose in our everyday life, and they have structures which are not investigable just by observing or asking people. Therefore, the history of philosophy is a history of different models (compare the correspondence, coherence, pragmatic, and other models of truth); like other models, they have advantages and limits.

Let us note that these three approaches are a general overview of philosophy, but by no means exhaust all the different ways in which philosophers relate to philosophy. Now, when it comes to our original question, “what has philosophy to do with computer simulations?”, we can put the third approach to work. Computer simulation transforms the way we try to gain and justify knowledge and to predict the future; it changes the way we handle uncertainty and make decisions. Knowledge, justification, future, uncertainty, and decision making are (at least partly) notions of reflection. Philosophy aims to analyze and to understand how this method affects these notions.

Computer simulations and philosophy

This might give us an insight into how philosophy, science, and computer simulation are connected. First, computer simulation studies and philosophy are both model-driven ways of thinking. Second, computer simulation studies involve notions like knowledge, justification, truth, and value, among others. Third, computer simulations change the way we gain and justify knowledge, understand nature, make decisions under uncertainty, involve values in science, and so forth.

The department Philosophy of Science and Technology of Computer Simulations

In 2014, Prof. Michael Resch and Dr. Andreas Kaminski created the department Philosophy of Science and Technology of Computer Simulations at HLRS, University of Stuttgart. The idea was to understand how computer simulations change science, engineering, and society. The department therefore moves the technical dimension of computer simulation to the fore (computer simulation has been studied by philosophers of science, but less so by philosophers of technology). In close collaboration with engineers, physicists, mathematicians, and others, several research areas have been identified.

The research project: Transforming Society – Transforming Simulation (TranSim)

The following year, the project Transforming Society – Transforming Simulation (TranSim) was created by Michael Resch and Andreas Kaminski. It obtained funding for four PhD students and one Post-Doc position. At the beginning of this year, all five positions were filled. Here is a brief introduction to the six projects currently running in the department of philosophy (the four PhD projects plus two projects under the Post-Doc position) and the relation between philosophy and computer simulations in each of them.

Project 1. Possibilities and limitations of simulating

The first project examines the validity of simulations from the perspective of the theory of science. In current developments, simulations are considered “computer experiments”. That is, computer simulations are understood as a kind of advanced laboratory where results are easier, quicker, cheaper, and less risky to generate. Unlike the classical experiment, however, the virtual reality of the experiment is completely dependent on the modeling. Only what is part of the model plays a role. An independent nature does not exist in this case. Hence the task of determining the boundaries of the validity of simulation results.

Project 2. The normativity of simulations

The second project addresses an essential step of modeling: selection. Models have to be selective, and selection presupposes a criterion of what is seen as relevant to the area to be simulated and what is not. The assumption is that values play an important role in that process. Economic, scientific, technical, and moral values orient modelers regarding what is relevant and what is not. This implicit use of values is to be examined in this project.

Project 3. Visualization of computer simulations

The imaging techniques in neuroscience have triggered a wide debate on the validity of their results. Analogously, computer simulations result in visualizations that make their outcomes (more) understandable and comprehensible. To avoid objections similar to those that neuroimages faced, the visualization of computer simulations is examined in this project. Are visualizations comparable to pictures or to a special kind of model? And how do scientists use visualizations? What is their function? Is it to represent an object or a process, or to make the simulation models adjustable?

Project 4. Simulations as basis for political decisions

The results of computer simulations are heavily used as input for policy-making. Examples from climate change come to mind first (e.g., the capture and storage of CO2), but more examples can be found in sociology, economics, and the like. Such studies involve the transfer of knowledge, which is typically marked by different expectations and relevance patterns. This project investigates how computer simulations influence policy decisions, which aspects of the simulations play a more prominent role, and how the communication of results is generally performed. The project also aims at developing recommendations for improved communication and at assessing problematic expectations of simulations. It holds close links to projects 1, 2 and 3, especially on issues related to the possibilities and limitations of simulation, value perspectives, and forms of visualization.

Project 5. Changes in the working world through simulation

In the industrial world (most prominently, although not limited to, the automotive and aerospace industry), computer simulations have proved their central importance beyond doubt. They simplify, accelerate, and drive the general progress of technological developments. It would not be an exaggeration to say that computer simulations have substantially and sustainably changed engineering practice. The core objective of this project is to explore the relationship between heuristic systems (e.g., those that analyze a space of possible solutions to a problem), and computer simulations.

Project 6. Changes in science by simulation

For a long time, computer simulations were conceived simply as fast (and, to a certain extent, reliable) ways to solve very complex scientific models, the kind of models that humans would not be able to solve in a lifetime, if at all. But that was it: computer simulations were more or less considered number-crunching machines. This project aims at confronting this classic picture with the actual use of computer simulations in science. Computer simulations are influencing not only our way of doing science and engineering (as discussed in project 5), but also the ways we describe the world in these fields. In simpler words, computer simulations seem to be changing scientific thinking, and examining this change is the main aim of this project.

contact: Andreas Kaminski, kaminski[at] - Juan M. Durán, duran[at]

  • Andreas Kaminski, Juan M. Durán

Department of Philosophy, HLRS, Stuttgart


The world’s largest turbulence simulations


Understanding turbulence is critical for a wide range of terrestrial and astrophysical applications. For example, turbulence on earth is responsible for the transport of pollutants in the atmosphere and determines the movement of weather patterns. Turbulence plays a central role in astrophysics as well. For instance, the turbulent motions of gas and dust particles in protostellar disks enable the formation of planets. Moreover, virtually all modern theories of star formation rest on the statistics of turbulence [5].

In particular, the theoretical assumptions about turbulence behind star formation theories allow the prediction of star formation rates in the Milky Way and in distant galaxies [2]. Interstellar turbulence shapes the structure of molecular clouds and is a key process in the formation of filaments, which are the building blocks of star-forming clouds. The key ingredient of all these models is the so-called sonic scale. The sonic scale marks the transition from supersonic to subsonic turbulence and produces a break in the turbulence power spectrum from E(k) ∝ k^−2 to E(k) ∝ k^−5/3.

While the power-law slopes of −2 and −5/3 for the supersonic and subsonic parts of the spectrum have been measured independently, no simulation so far has been capable of bridging the gap between the two regimes, because previous simulations did not have enough resolution to separate the injection scale, the sonic scale, and the dissipation scale.
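The expected shape of the spectrum can be written as a schematic broken power law, matched continuously at the sonic scale. This is only an illustrative model of the two asymptotic slopes; measuring the actual break position is precisely what the simulation is for.

```python
import numpy as np

def energy_spectrum(k, k_sonic=1.0, e0=1.0):
    """Schematic turbulence spectrum: E(k) ~ k^-2 on supersonic scales
    (k < k_sonic) and E(k) ~ k^-5/3 on subsonic scales (k > k_sonic),
    with the amplitudes matched at the sonic scale."""
    k = np.asarray(k, dtype=float)
    # Continuity at k_sonic fixes the subsonic amplitude.
    amp_sub = e0 * k_sonic ** (-2.0 + 5.0 / 3.0)
    return np.where(k < k_sonic, e0 * k ** -2.0, amp_sub * k ** (-5.0 / 3.0))
```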

The aim of the project presented in this contribution is to run the first simulation sufficiently resolved to measure the exact position of the sonic scale and the transition region from supersonic to subsonic turbulence. A simulation with the unprecedented resolution of 10,000³ grid cells is needed to resolve the transition scale.


In the framework of a GAUSS Large Scale Project, an allocation exceeding 40 million core-hours has been granted to this project on SuperMUC. The application used for this project is FLASH, a public, modular, grid-based hydrodynamical code for the simulation of astrophysical flows [3]. The parallelisation is based entirely on MPI. In the framework of the SuperMUC Phase 2 scale-out, the current code version (FLASH4) has been optimised to reduce the memory and MPI communication requirements. In particular, non-critical operations are now performed in single precision, without any significant impact on the accuracy of the results. In this way, the code uses a factor of 4.1 less memory and runs 3.6 times faster than the version used for the previous large-scale project at LRZ [1], and scales remarkably well up to the full machine on SuperMUC Phase 2 (see figure 1).
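The overall factor of 4.1 in memory combines several optimisations; the snippet below only illustrates the single-precision part of the argument, which by itself halves the footprint of every demoted field (the grid size used here is a tiny stand-in, not the production 10,048³).

```python
import numpy as np

n = 64 ** 3  # cells of a small demo block; real runs use 10,048^3 cells
field64 = np.zeros(n, dtype=np.float64)  # double-precision field
field32 = np.zeros(n, dtype=np.float32)  # same field demoted to single
ratio = field64.nbytes / field32.nbytes  # memory saved per demoted field
```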

Our current 10,048³ simulation was nearly completed at the time of writing, and data processing is in progress. An early impression of the forthcoming results can be gained from the highlights of the work of [1], based on the previous large-scale project on turbulence simulations (up to 4,096³ grid cells), selected as the SAO/NASA ADS paper of the year 2013.

Highly compressible supersonic turbulence is complex compared to the subsonic, incompressible regime, because the gas density can vary by several orders of magnitude. Using three-dimensional simulations, we have determined the power spectrum in this regime (see figure 2) and found E(k) ∝ k^−2, confirming earlier indications obtained with much lower resolution [4]. The resolution study in figure 2 shows that we would not have been able to identify this scaling at any resolution lower than 4,096³ cells. Extremely high resolution and compute power are absolutely necessary for the science done here.

Figure 3 displays the unprecedented level of detail in density structure achieved with our current 10,048³ simulation. This visualization highlights the enormous complexity of the turbulent structures on all spatial scales covered in these simulations. Simulation movies are available online (see links below).

Future Work

Turbulence has a wide range of applications in science and engineering, including the amplification of magnetic fields, star and planet formation, mixing of pollutants in the atmosphere, fuel ignition in engines, and many more. In generating the huge turbulence dataset presented here, we have begun to reach the technical limits of what is feasible on any supercomputer in the world to date. We are currently pushing the boundaries even further by running the world's first turbulence simulation with 10,048³ grid cells on SuperMUC. We hope to unravel the statistics of supersonic and subsonic magnetized turbulence in the near future, with cutting-edge supercomputing systems provided by the LRZ.

Gauss Centre for Supercomputing:

Uni Heidelberg:


  • [1] Federrath, C.:
    2013, Monthly Notices of the Royal Astronomical Society, 436, 1245
  • [2] Federrath, C., & Klessen, R. S.:
    2012, Astrophysical Journal, 761, 156
  • [3] Fryxell, B., Olson, K., Ricker, P., et al.:
    2000, Astrophysical Journal Supplement Series, 131, 273
  • [4] Kritsuk, A. G., Norman, M. L., Padoan, P., & Wagner, R.:
    2007, Astrophysical Journal, 665, 416
  • [5] Padoan, P., Federrath, C., Chabrier, G., et al.:
    2014, Protostars and Planets VI, 77

contact: Christoph Federrath, christoph.federrath[at]

  • Christoph Federrath

Research School of Astronomy and Astrophysics, Australian National University

Ralf S. Klessen

  • Zentrum für Astronomie der Universität Heidelberg, Institut für Theoretische Astrophysik
  • Universität Heidelberg, Interdisziplinäres Zentrum für Wissenschaftliches Rechnen
  • Nicolay J. Hammer

Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften

CFD based low-order modeling of the nonlinear flame dynamics and of swirl fluctuations


Despite the development of power plants using renewable energy sources, gas turbines will play an important role in future energy production. They offer operational flexibility and at the same time low emission of greenhouse gases. These are important properties for serving as a backup solution in the age of renewable energy sources, which is essential to maintain grid stability and a reliable energy supply.

Thermoacoustic oscillations limit the development of gas turbines aimed at lower emission of pollutants and higher operational flexibility. The basic mechanism behind thermoacoustic oscillations is as follows: small initial fluctuations of, say, the velocity yield fluctuations of the global heat release rate of the flame. This unsteady heat release rate acts as a volume source, which in turn creates acoustic waves. These waves are reflected at the boundaries of the burner back to the flame and perturb the flame again. This feedback can become unstable and yield very large oscillations. If the machine is not turned down, these oscillations can cause significant damage.
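The feedback loop just described can be caricatured as a delayed recurrence (this is not a burner model, just an illustration of the mechanism): a perturbation drives the heat release, and the resulting acoustic wave returns to the flame after a delay. The loop amplifies when the round-trip gain exceeds one.

```python
def feedback_response(gain, delay, steps, x0=1e-3):
    """Toy delayed-feedback loop: each new perturbation sample is the
    sample `delay` steps ago, scaled by the round-trip gain. Decays for
    |gain| < 1, grows without bound (instability) for |gain| > 1."""
    x = [x0] * delay                   # initial small perturbation
    for _ in range(steps):
        x.append(gain * x[-delay])     # wave reflected back after the delay
    return x

stable = feedback_response(0.8, 3, 60)    # perturbation dies out
unstable = feedback_response(1.2, 3, 60)  # oscillation amplitude grows
```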

The occurrence of these instabilities in a gas turbine depends on the interaction of all parts of the engine. However, due to limited computational power it is by no means possible to simulate a whole gas turbine within a single LES (Large Eddy Simulation). Low-order network models are the state-of-the-art approach for estimating these instabilities. First, a low-order model of each component of a gas turbine is determined. Then the low-order models are interconnected in order to predict the global heat release rate of the flame. Purely acoustic elements without reactive flows can be modeled with a linearized version of the Navier–Stokes equations. However, a low-order model for the flame cannot be found in this way.

Therefore, in the present project the so-called CFD/SI approach is investigated [3]: the flame is simulated with an LES. These simulations are expensive and necessitate the use of SuperMUC. In order to deduce low-order models efficiently from the LES, the LES is perturbed with a broadband excitation signal. The resulting fluctuations of a reference velocity and of the global heat release rate are recorded. The collected time series are post-processed with system identification methods in order to determine the low-order models.
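The generic idea behind such system identification can be sketched as fitting a finite-impulse-response model between the excitation and response time series by least squares. The project's actual SI method is not detailed in the article; this is a minimal, self-contained illustration with synthetic data.

```python
import numpy as np

def identify_fir(u, y, n_taps):
    """Least-squares fit of y[n] ~ sum_k h[k] * u[n-k] from an input
    (excitation) series u and an output (response) series y."""
    X = np.array([u[n - n_taps + 1 : n + 1][::-1]
                  for n in range(n_taps - 1, len(u))])
    h, *_ = np.linalg.lstsq(X, y[n_taps - 1 :], rcond=None)
    return h

# Usage: recover a known 3-tap impulse response from broadband
# (random) excitation, mimicking the LES forcing signal.
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
h_true = np.array([0.5, 0.3, 0.1])
y = np.convolve(u, h_true)[: len(u)]
h_est = identify_fir(u, y, 3)
```

The frequency response of the fitted model then plays the role of a flame transfer function in the low-order network.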

This method has already been proven to be both accurate and efficient. In the scope of the present project, two new aspects are investigated: (1) how nonlinear low-order models can be deduced, and (2) the impact of swirl waves on the flame dynamics. For this purpose, two different swirl burners are studied, one with an axial swirler (BRS burner) and the other with a radial swirler (FVV burner), shown in figures 1 and 2, respectively.

Results and Methods

The object-oriented C++ software package OpenFOAM [4] is used to perform the LES simulations. OpenFOAM employs an implicit finite volume scheme with the well-known PISO and SIMPLE algorithms. The solver is based on the standard solver reactingFoam.

The turbulence is modeled with the Smagorinsky sub-grid scale LES model. A global two-step chemistry is used to model methane-air combustion. The Thickened Flame Model is implemented to decrease the mesh resolution requirement by artificially thickening the flame. The low Mach number assumption is used in order to avoid acoustic wave reflections at the boundaries. An Adaptive Mesh Refinement capability is added to the solver to refine the mesh only in the flame region, thus saving computational time.

The BRS burner rig is used for code benchmarking; experiments and numerical results using AVBP [3] are available in the literature. In figure 3, the Flame Transfer Function measured in the experiment is compared against the numerical approach (LES-SI) with the two solvers AVBP and OpenFOAM. Good agreement with the experiment is achieved with the OpenFOAM simulations.

In order to compute a single Flame Transfer Function for the BRS burner, a time series of around 0.35 seconds is required. This can be achieved in 120,000 CPU hours on 840 processors, i.e. roughly 143 hours of wall-clock time.

On-going Research / Outlook

For the nonlinear system identification, we created long time series with broadband, high-amplitude excitation. Due to the resulting large movement of the flame, these simulations were numerically demanding. At the moment, the focus is on post-processing the obtained time series in order to assess the capability of nonlinear system identification. The objective is to investigate how well methods that have been validated for laminar flames [6] can be used to model thermoacoustic oscillations of turbulent flames. As the focus of this work is now on post-processing the CFD data, SuperMUC is no longer required for this part of the work.

On SuperMUC, our current focus is the investigation of the swirl flames. A model for the influence of swirl waves on the flame dynamics has been developed by the authors of the present project [7]. This model is to be validated against a turbulent swirl flame, which will require extensive numerical studies on SuperMUC.

A proposal continuing the work done within this project has just been submitted. The idea is to use the CFD/SI approach to deduce low-order models for combustion noise. This is of high industrial interest, as such a model, combined with a thermoacoustic network model, allows one to predict the noise emitted by the engine.

In the future, it is planned to investigate the uncertainty of the predictions in more detail. Here, physical parameters such as the wall temperature and the turbulence model, as well as numerical parameters such as the discretization scheme or the mesh, are to be investigated. These studies will require huge computational resources and thus necessitate the use of SuperMUC.


  • [3] W. Polifke:
    “Black-box system identification for reduced order model construction,” Annals of Nuclear Energy, vol. 67C, pp. 109–128, May 2014.
  • [4]
  • [6] S. Jaensch, M. Merk, E. Gopalakrishnan, S. Bomberg, T. Emmert, R. I. Sujith, and W. Polifke:
“Hybrid CFD/low-order modeling of nonlinear thermoacoustic oscillations,” submitted to the 36th Symposium of the Combustion Institute, Seoul, Korea, 2016.
  • [7] A. Albayrak, W. Polifke:
“On the propagation velocity of Swirl Waves in Annular Flows,” Proceedings of the 21st International Congress on Sound and Vibration, Beijing, China, 2014.

contact: Christian Lang, lang[at]

  • Christian Lang
  • Wolfgang Polifke
  • Stefan Jaensch
  • Alp Albayrak

Technische Universität München, Fakultät für Maschinenwesen, Professur für Thermofluiddynamik

Improving scalability for the CFD software package MGLET

The code MGLET has been designed for the numerical simulation of complex turbulent flows. MGLET uses a finite volume method to solve the incompressible Navier-Stokes equations on a Cartesian grid with a staggered arrangement of the variables, which enables an efficient formulation of the spatial approximations. An explicit third-order low-storage Runge-Kutta method is used for time integration. The pressure is computed within the framework of a fractional-step (Chorin projection) method. Therefore, at every Runge-Kutta substep a linear system of equations, a Poisson equation, has to be solved. Geometrically complex surfaces are represented by an Immersed Boundary Method.
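The time-integration building block can be sketched as follows (using Williamson's classic two-register scheme as a representative example; MGLET's actual coefficients are not given in this article, so this is not MGLET's code):

```python
import math

# Sketch of a low-storage third-order Runge-Kutta integrator (Williamson's
# classic 2N scheme). Only two storage registers are needed: the state y
# and a scratch accumulator q.

A = (0.0, -5.0 / 9.0, -153.0 / 128.0)   # Williamson's RK3 coefficients
B = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)

def rk3_step(f, y, dt):
    q = 0.0
    for a, b in zip(A, B):
        q = a * q + dt * f(y)   # accumulate right-hand side
        y = y + b * q           # update the state
    return y

# Check third-order accuracy on dy/dt = -y, y(0) = 1, integrated to t = 1.
y, dt = 1.0, 0.01
for _ in range(100):
    y = rk3_step(lambda v: -v, y, dt)
print(abs(y - math.exp(-1.0)))  # small global error, O(dt^3)
```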

The code is currently being used by several research groups. At the Fachgebiet Hydromechanik of the Technische Universität München, geometrically complex turbulent flows, flow in porous media and fibre suspensions have been investigated using MGLET. The groups of Prof. Helge Andersson and Prof. Bjørnar Pettersen (both NTNU Trondheim) use the code to predict and analyse bluff-body flows primarily using DNS and the Immersed Boundary Method. At the Institute for Atmospheric Physics (DLR Oberpfaffenhofen), aircraft wake vortices are investigated including their interaction with atmospheric boundary layers and ground effects. These applications demonstrate the power and flexibility of the code.

MGLET has always been used on high performance computing hardware, and there is a trend towards higher Reynolds numbers, more complex flow configurations and the inclusion of micro-structural effects such as particles or fibres. The simulation with the largest number of degrees of freedom so far is that of a fully turbulent channel flow of a fibre suspension [1]. It is the only published simulation using a full micro-mechanical model for the fibres’ orientation distribution function without closure.

MGLET is parallelized by a domain decomposition method using MPI as a framework. This implementation scaled well to about 2,000 processes on SuperMUC, but the performance depends strongly on the algorithms and models used. Above approximately 2,000 processes the scaling behaviour was poor, especially for the fourth-order solver. Additionally, the memory requirement per core of MGLET scales linearly with the number of processes, which is one of the design bottlenecks of the current implementation. Finally, input/output is handled by a single rank that collects/distributes data from/to the other ranks, which does not scale.

This article presents the results of the performance evaluation and scalability improvements carried out in the framework of an effort funded by KONWIHR (Bavarian Competence Network for Technical and Scientific HPC).

Analysis of the performance issues

With the aid of MPI tracing tools, we quickly discovered that there were significant problems in the communication of the ghost cell values at the boundaries of the computational grids. The exchange of these ghost cell values is the core of the domain decomposition method, and is very frequently performed, so any problems in these routines will cause major performance degradation.

Without going into too much detail, figure 1 shows the most important symptom of the problem: the average transfer rate between some processes is as low as 10 MB/second, while between others the transfer rate is above 2 GB/second. Another interesting effect is that the transfer rate is not symmetric. In figure 1 this is seen, for example, between processes 291 and 296: process 291 sends data to process 296 at a rate of 2.1 GB/second, while in the opposite direction process 296 sends data to process 291 at only 17.3 MB/second.


After discovering this bottleneck, we redesigned this part of the code completely. We still use plain non-blocking MPI_Isend and MPI_Irecv for the communication; however, we now ensure a consistent calling order and always post all calls to MPI_Irecv before any calls to MPI_Isend. This resulted in a dramatic performance improvement.

In addition to improving the ghost-cell exchange algorithm, we also reduced the number of ghost-cell exchanges and collective communications, and made some obvious, low-hanging memory improvements.

To measure the performance improvements, we have used several benchmarks, both synthetic ones (artificial and simplified flow problems) and real-life cases from the users of MGLET. All tests were performed on SuperMUC Phase 1 nodes.

Our first benchmark, shown in figure 2, is a synthetic flow case we use to quickly test MGLET. It is a very simple case, without any bodies present in the flow, no statistical sampling and no I/O, thus testing only the core of the flow solver. This case shows a huge improvement in performance over the entire range from one compute node (16 processes) to 256 compute nodes (4,096 processes). Even though the scaling beyond 1,024 processes is not perfect, this might not be a problem in practice once additional workloads that do not depend on communication are added, such as statistical sampling, immersed boundaries and I/O.

The second test case presented here is a large one with 1.6 billion grid cells. It is a real-world case from the Fachgebiet Hydromechanik at TU München and simulates the flow around a wall-mounted cylinder at a Reynolds number of Re = 78,000 by Large Eddy Simulation [2], see figure 4. As figure 3 shows, the run-time of this case was reduced by 50 % during this project. In practice this means that we can compute a case of this size with only half the CPU hours needed before. It also allows us to increase the problem sizes significantly beyond this point, which opens new and interesting research fields.


  • [1] A. Moosaie and M. Manhart:
    Direct Monte Carlo simulation of turbulent drag reduction by rigid fibers in a channel flow. Acta Mechanica, 224(10):2385–2413, 2013.
  • [2] W. Schanderl and M. Manhart:
    Reliability of wall shear stress estimations of the flow around a wall-mounted cylinder. Computers and Fluids 128:16–29, 2016.

contact: M. Allalen, allalen[at]

  • H. Strandenes

Department of Marine Technology, Norwegian University of Science and Technology; Hydromechanics Working Group, Technical University of Munich

  • M. Manhart
  • W. Schanderl

Hydromechanics Working Group, Technical University of Munich

  • M. Allalen

Leibniz Supercomputing Centre

  • I. Pasichnyk

IBM Deutschland

High Performance Computing for Welding Analysis

As an integral part of the PRACE SHAPE project “HPC Welding” [1], the parallel solvers of LS-DYNA were used by Ingenieurbüro Tobias Loose to perform a welding analysis on the Cray XC40 “Hazel Hen” at the High Performance Computing Center Stuttgart (HLRS).

A variety of test cases relevant for industrial applications were set up with DynaWeld, a welding and heat treatment pre-processor for LS-DYNA, and run on different numbers of compute cores. The explicit mechanical solver was tested on up to 4,080 cores with good scaling. As far as we know, this was the first time that a welding simulation with the LS-DYNA explicit solver was executed on 4,080 cores.

Welding simulation

Welding structure simulation is a highly sophisticated finite element (FE) application [2]. It requires a fine mesh discretisation in the weld area, so that, in combination with large assemblies and long process times, welding simulation models are very time-consuming during the solver run.

HPC with massively parallel processors (MPP) can provide a solution to this issue. From crash and forming analysis it is known that the commercial finite element code LS-DYNA, using the explicit solution algorithm, provides good performance on HPC systems. However, to the authors’ knowledge, performance benchmarking of LS-DYNA for welding simulations had never been performed prior to this study. This project has analysed the feasibility of welding analysis with parallelised LS-DYNA solvers and its performance.

In this project, a Cray-specific double-precision (I8R8) MPP version of LS-DYNA was used (revision 103287), compiled by Cray using the Intel Fortran Compiler 13.1 with SSE2 enabled. The Extreme Scalability Mode (ESM) was used.

In addition, the commercial pre-processor DynaWeld [3, 4] is used to set up the welding simulation models for the solver.

Welding tasks

The welding technique covers a very wide range of weld types, process types, clamping and assembly concepts and assembly dimensions, for example: arc welds, laser welds, slow processes, high-speed processes, thin sheets, thick plates, single welds, multi-layered welds, unclamped assemblies, fully clamped assemblies, prestress and predeformations. This illustrates that there is not one single “welding structure analysis” but a wide range of modelling techniques to cover all variants of welding. In consequence, welding simulation cannot be checked for HPC suitability in general; every modelling variant has to be checked separately.

This project considers several representative modelling variants for welding structures with the aim of covering as wide a range as possible. Figure 1, for example, shows a model of a gas metal arc welded curved girder. This model covers a complex and large industrial case with many welds. A high-speed laser welded thin sheet was the test case for the explicit analysis of the project (figure 2). This case was modelled with 200,000 shell elements (EDB) and 1 million shell elements (MDB).

Results of the project

The test cases with explicit analysis provided the following results: the scaling behaviour in the double-logarithmic scale is linear with a nearly constant gradient up to 4,080 cores (figure 3). Above 96 cores, the model MDB with 1 million elements scales better than the model EDB with 200,000 elements, due to the larger number of elements per core domain. Regarding the parallel efficiency (the ratio of speedup to number of cores), the larger model reaches a ratio of 0.45 at 768 cores and of 0.4 at the highest number of cores (4,080).
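The quoted efficiencies follow directly from the definition given above. As a quick check (the baseline run used for the speedup is the authors', which we assume rather than reproduce):

```python
# Parallel efficiency as defined in the text: E = S / p, with speedup S
# on p cores. From the reported efficiencies of the MDB model we can
# recover the implied speedups over the (assumed) baseline run.

def parallel_efficiency(speedup, cores):
    return speedup / cores

def implied_speedup(eff, cores):
    return eff * cores

s768 = implied_speedup(0.45, 768)    # implied speedup at 768 cores
s4080 = implied_speedup(0.40, 4080)  # implied speedup at 4,080 cores
print(s768, s4080)
```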

As a general result of the project, recommendations for the number of cores needed to obtain optimal performance are provided, and the expected speedup is given. Both the number of cores and the speedup depend on the model type.

With the help of this SHAPE project [1], the overall effort for welding analysis on HPC is now much better known, enabling more accurate cost estimates for welding consulting jobs. The project provides a good basis for further investigations in high performance computing for welding structure analysis.


This work was financially supported by the PRACE project funded in part by the EU’s Horizon 2020 research and innovation programme (2014-2020) under grant agreement 653838.


  • [1] Loose, T., Bernreuther, M., Große-Wöhrmann, B., Göhner, U.:
    SHAPE Project Ingenieurbüro Tobias Loose: HPCWelding: Parallelized Welding Analysis with LS-DYNA, SHAPE White Paper, 2016
  • [2] Loose, T.:
Einfluß des transienten Schweißvorganges auf Verzug, Eigenspannungen und Stabilitätsverhalten axial gedrückter Kreiszylinderschalen aus Stahl, Dissertation, Universität Karlsruhe, 2007
  • [3] Loose, T., Mokrov, O., Reisgen, U.:
    SimWeld and DynaWeld - Software tools to set up simulation models for the analysis of welded structures with LS-DYNA. In: Welding and Cutting 15, pp. 168 - 172, 2016
  • [4]

contact: Tobias Loose, loose[at]

  • Tobias Loose

Ingenieurbüro Tobias Loose

Large eddy simulation of pulverized coal and biomass combustion


Pulverized coal and biomass combustion (PCBC) is currently among the major sources of energy supply and is expected to retain an important role in the future. However, coal combustion releases large amounts of carbon dioxide. Future power plants are required to be efficient and low-polluting, which could be achieved by carbon capture and storage or by co-firing coal with biomass.

While experimental studies provide valuable and fundamental understanding of the processes of pulverized coal and biomass combustion, they cannot provide all information, due to limited optical access and other issues related to the harsh combustion environment. Simulations, such as large eddy simulations (LES), can complement experimental findings by providing large data sets that can be analyzed in great detail. However, numerical methods and modeling approaches need to be developed further to facilitate a comprehensive investigation of the physics of PCBC. Our work focuses on developing such models and methods for PCBC, in particular methods to treat particle conversion and gas phase combustion.

Results and Methods

The code used for all simulations is the in-house finite volume (FV) Fortran code PsiPhi. The code solves the implicitly filtered Navier-Stokes equations in the low Mach number limit. Continuity is enforced by a pressure-correction (projection) scheme using a Gauss-Seidel solver with successive over-relaxation. The code is parallelized by MPI with non-blocking communication, and scaling has been demonstrated for up to 128,000 cores. Cartesian, equidistant grids are used, which allow for the efficient use of large numbers of cells. A third-order Runge-Kutta scheme is used for time advancement.
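The pressure-correction core can be sketched in one dimension (our simplified illustration; PsiPhi solves the three-dimensional analogue on its Cartesian grids): a Poisson equation solved by Gauss-Seidel sweeps with successive over-relaxation (SOR).

```python
# Sketch of the pressure solver: Gauss-Seidel with successive
# over-relaxation (SOR) for a Poisson equation, here 1D for clarity.
# Model problem: p'' = -2 on (0,1), p(0) = p(1) = 0, exact p(x) = x(1-x).

n = 49                      # interior grid points
h = 1.0 / (n + 1)
f = [-2.0] * n              # right-hand side
p = [0.0] * n               # initial guess
omega = 1.8                 # over-relaxation factor, 1 < omega < 2

for _ in range(500):        # SOR sweeps
    for i in range(n):
        left = p[i - 1] if i > 0 else 0.0
        right = p[i + 1] if i < n - 1 else 0.0
        gs = 0.5 * (left + right - h * h * f[i])   # Gauss-Seidel value
        p[i] += omega * (gs - p[i])                # SOR update

x_mid = (n // 2 + 1) * h
print(p[n // 2], x_mid * (1.0 - x_mid))  # numerical vs exact at midpoint
```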

Coal and biomass particles are treated as Lagrangian parcels. Their parallelization relies on the same domain decomposition as the gas phase treated by the FV method. Coupling between particles and gas phase is facilitated by tri-linear interpolation schemes. The discrete ordinates method is used to solve for the radiative heat transfer. The code is compiled with Intel Fortran, and both IBM and Intel MPI are used. During the projects on SuperMUC, the code has been sped up by us and our collaborators through improved parallelization and algorithms.

The overall CPU hours used were around 21 million in project pr84mu and 10 million in project PRACE 2013081677. Typical coarse grids consisted of ~500,000 cells, and fine grids of up to ~1,700,000,000 cells. The number of numerical particles was typically around 40,000,000. Coarse runs were conducted on ~1,000 cores, whereas fine runs used ~15,000 cores. To reduce the issue of very long initialization times, results from coarse grids were used to initialize runs on finer grids. The largest run, conducted on 16,384 cores, generated one restart file per core with a total size of four terabytes. Data relevant for post-processing was combined into an HDF5 file of around one terabyte. The overall WORK storage required was 31 terabytes.

Large-scale coal and biomass flames in furnaces that have been studied in detail experimentally, the IST and the BYU furnaces, were used as reference cases. The classical coal and biomass conversion models, originally developed in cooperation with Imperial College, were tested, as well as improved models and strategies. The first step of particle conversion, pyrolysis, is too complex to be modeled explicitly in LES. In cooperation with TU Bergakademie Freiberg, a pre-processing strategy was developed to optimize the parameters of a simple empirical model based on the predictions of advanced pyrolysis models [1]. The massively parallel simulations provided a good description of scalar and velocity fields, confirmed by the good agreement with experiments. Flame stabilization, flame structure and particle burnout are strongly affected by the fuel properties and the fluid dynamics, and LES is able to provide insights into the phenomena occurring in this type of application that are currently not available through experimental means. Single particles were tracked over time, and instantaneous ensembles were collected to obtain a better understanding of the conditions that coal particles are subjected to [2]. The effect of conversion modeling, particularly of the empirical devolatilization model and the mode of the char combustion model, on the lift-off and length of a co-fired flame was also investigated [3].

A part of the project was to incorporate the flamelet model, which is particularly popular in computations of gaseous turbulent combustion, into the LES of coal combustion. The test case used for the flamelet LES is a semi-industrial scale furnace with a thermal power of approximately 2.5 MW. The first flamelet table used to describe the chemical state of the reacting gas phase was based on two mixture fractions for volatile and char-off gases as well as on enthalpy and variance. The flamelet table was generated before the simulation, and the chemical state variables were looked up from the table based on the four parameters obtained during the simulation. First simulations provided good results and could demonstrate the suitability of the flamelet model for such simulations [4]. A further improvement was achieved by including the scalar dissipation rate as a look-up variable [5]. The scalar dissipation rate is an important parameter in the flamelet model and can be understood as an inverse mixing time scale. The large simulations show very good results compared to the experiment and reveal a wealth of information that is yet to be analyzed. A key feature of such furnace simulations is depicted in figure 1, where the logarithmic scalar dissipation rate of the sum of the two mixture fractions is presented, showing regions of intense mixing in the volatile flame close to the inlet, but also in the shear layers between flue gases and fresh combustion air at the edges of the quarl outlet. Figure 2 shows the temperature field of approximately the first half of the furnace. Individual coal parcels are illustrated in figure 3, along with the circumferential gas velocity, for a coarser simulation of the same furnace.
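The table look-up itself can be sketched as follows (an illustration with made-up numbers; the real table is four- or five-dimensional over the two mixture fractions, enthalpy, variance and scalar dissipation rate, here reduced to one dimension for clarity):

```python
import bisect

# Sketch of a flamelet table look-up: state variables are pre-tabulated
# and interpolated during the simulation. Illustrative 1D example in
# mixture fraction only; grid nodes and temperatures are made-up values.

z_grid = [0.0, 0.05, 0.1, 0.2, 0.5, 1.0]                  # mixture fraction nodes
T_table = [300.0, 1900.0, 2200.0, 2000.0, 1200.0, 400.0]  # tabulated T [K]

def lookup_T(z):
    """Linear interpolation of temperature in mixture fraction."""
    i = bisect.bisect_right(z_grid, z) - 1
    i = max(0, min(i, len(z_grid) - 2))      # clamp to valid interval
    w = (z - z_grid[i]) / (z_grid[i + 1] - z_grid[i])
    return (1.0 - w) * T_table[i] + w * T_table[i + 1]

print(lookup_T(0.075))  # halfway between the 0.05 and 0.1 nodes
```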

On-going Research / Outlook

PCBC LES require large computations, which can only be conducted with HPC on systems like SuperMUC. Further work will be directed at improving the developed methods, for example a more elaborate flamelet model, which is currently being developed within a DFG project with collaborators from the University of Stuttgart and TU Freiberg. This is part of a follow-up project currently conducted on SuperMUC. Furthermore, the interaction between coal and biomass particles will be investigated extensively to understand the role of each fuel in co-fired flames, which is critical for the next generation of more efficient and clean boilers.

“SuperMUC Next Generation” will enable the simulation of test cases even closer to industrial applications, but also allow laboratory-scale configurations to be run at higher resolution, resolving the physics down to the scales of turbulence and particle-turbulence interaction, which provides valuable understanding of the physics of turbulent coal conversion.


  • [1] M. Rabaçal, B. Franchetti, F. Cavallo Marincola, F. Proch, M. Costa, C. Hasse, A.M. Kempf:
Large eddy simulation of coal combustion in a large-scale laboratory furnace. Proceedings of the Combustion Institute, 35, 3609–3617, 2015.
  • [2] M. Rabaçal, M. Costa, A.M. Kempf:
Particle History from Massively Parallel Large Eddy Simulations of Coal Combustion in a Large-Scale Laboratory Furnace. Fuel, in revision, 2016.
  • [3] M. Rabaçal, M. Costa, M. Vascellari, C. Hasse, M. Rieth, A.M. Kempf:
Large Eddy Simulation of Co-Firing Biomass and Coal in a Large-Scale Furnace. Submitted to the Proceedings of the Combustion Institute, 2016.
  • [4] M. Rieth, F. Proch, M. Rabacal, B. Franchetti, F. Cavallo Marincola, A.M. Kempf:
Flamelet LES of a semi-industrial scale furnace. Combustion and Flame, under revision, 2016.
  • [5] M. Rieth, F. Proch, A.G. Clements, M. Rabacal, A.M. Kempf:
Highly resolved flamelet LES of a semi-industrial scale furnace. Submitted to the Proceedings of the Combustion Institute, 2016.

contact: Martin Rieth, martin.rieth[at]

  • Andreas Kempf
  • Miriam Rabacal
  • Martin Rieth

University of Duisburg-Essen, Instituto Superior Técnico (University of Lisbon)

From the radioactive decays of mesons to the interactions of quarks and gluons

The determination of a fundamental constant of Nature

The ALPHA Collaboration has computed one of the most elusive fundamental parameters of Nature: the strong coupling, which governs the interactions of quarks and gluons. At high energies, such as those reached at the Large Hadron Collider (LHC) at CERN, many processes can be computed in terms of a Taylor series in this coupling. A precise input value for these series is thus essential to make full use of the accelerator. We have simulated the fundamental theory of the strong interactions, Quantum Chromodynamics (QCD), over a large range of energy scales in order to extract the coupling at LHC energies.

Fundamental constants of Nature and the Standard Model

Over the last few decades, particle physicists have explored the fundamental forces down to distance scales of ≈ 10⁻¹⁸ m. It was found that the experimental observations are described to very high accuracy by a theory known as the Standard Model of particle physics.

The Standard Model describes the interactions of the fundamental constituents of matter through the electromagnetic, weak and strong forces in terms of three different quantum gauge theories. It does so in terms of a few fundamental constants of Nature. Its success is not only a consequence of the mathematical simplicity of its basic equations. It is also due to the fact that the forces they describe are relatively weak at the typical energy transfers in particle physics scattering experiments of about 10–100 GeV¹. The strengths of the interactions are characterized by coupling constants. When the forces are weak, the predictions of the theory can be worked out in terms of an expansion in powers of these coupling constants, a procedure known as perturbation theory. For instance, in Quantum Electrodynamics (QED), the quantum gauge theory describing the interactions between electrons and photons, the coupling constant is the well-known fine structure constant α ≈ 1/137. Its small size guarantees that only a few terms in the power series are sufficient to predict physical quantities with high precision.

In the gauge theory of the strong force, QCD, quarks and gluons assume the rôle of electrons and photons in QED. Quarks are the constituents of the more familiar proton and neutron. QCD’s coupling constant is called αs. As a consequence of quantum physics, all coupling “constants” in the Standard Model depend on the energy transfer μ in the interaction process. In this sense they are not really constant; rather, they “run” with the energy scale. At μ ≈ 100 GeV the strong coupling is about αs ≈ 0.11. Although this is much larger than the fine structure constant of QED, perturbation theory still works well, and essential aspects of high energy scattering processes can be computed accurately.
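The running can be sketched with the textbook one-loop formula (an illustration only, not the ALPHA collaboration's non-perturbative result; the value of the QCD scale Λ used here is an assumed round number):

```python
import math

# One-loop running of the strong coupling (textbook sketch):
#   alpha_s(mu) = 12*pi / ((33 - 2*nf) * ln(mu^2 / Lambda^2))
# with nf active quark flavours and the QCD scale Lambda.
# Lambda ~ 0.2 GeV is an assumed illustrative value.

def alpha_s(mu_gev, nf=5, lam_gev=0.2):
    return 12.0 * math.pi / ((33.0 - 2.0 * nf)
                             * math.log(mu_gev ** 2 / lam_gev ** 2))

print(alpha_s(100.0))  # ~0.13 at LHC-like scales (one loop only)
print(alpha_s(1.0))    # much larger near 1 GeV: the coupling grows
```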

An example is the probability of two-jet events in proton-proton scattering, cf. figure 1.

However, as the energy scale μ decreases below 100 GeV, the value of αs increases. At μ below 1 GeV, it becomes so large that perturbation theory cannot be relied upon at all. In fact, this energy region is intrinsically non-perturbative, which means that the perturbative expressions even fail to give a qualitatively correct description. The qualitative change is due to the striking property of “confinement”: despite being the fundamental constituents of QCD, quarks and gluons are confined inside protons, neutrons, π-mesons and many other particles, collectively known as hadrons. Only hadrons are produced and leave direct tracks in experiments; examples are shown in figures 1 and 2.


We are then faced with the task of connecting the analytically accessible realm of QCD at high energies (jets, figure 1) with the properties of protons, π-mesons, K-mesons and other hadrons observed at low energies (for example figure 2). For the latter analytic methods such as perturbation theory fail completely.

Instead, computer “simulations” of QCD formulated on a discrete lattice of space-time points allow for a non-perturbative treatment of the theory in the low-energy regime. The lattice-discretized formulation of QCD contains more information than the Feynman rules of perturbation theory, and low-energy quantities like hadron masses and the matrix elements related to decays become computable. This requires the evaluation of the lattice path integral by numerical methods. As usual when numerical approximations on grids come into play, one must study sequences of progressively finer discretizations and take the lattice spacing to zero by an extrapolation. In addition, computers demand the restriction of space-time to a finite region, an approximation that has to be controlled as well.

In such simulations it is natural to use a few hadron masses as input to tune the free parameters of lattice QCD, in order to then predict everything else.

¹ In particle physics it is customary to use “natural units”, where the speed of light c and Planck’s constant ℏ are set to one and energies as well as masses are given in GeV. As an orientation, note that mproton ≈ 1 GeV, where 1 GeV = 1.602 · 10⁻¹⁰ J.

We are faced with a multi-scale problem: on the one hand, we want to probe the short-distance / high-energy properties of the theory; on the other hand, we need to make contact with the observable hadrons at much lower energy scales. This is a challenge for the computational approach using a grid discretization. The short-distance investigation needs much finer grids than the long-range part, for which, however, the overall size of the simulated space-time must be rather large. Accommodating all scales on a single lattice is not feasible. The ALPHA collaboration has therefore developed a recursive finite-size scaling technique and demonstrated that it works in simpler lattice theories.

Finite size scaling

The crucial idea of this method is to consider a sequence of sizes for the finite space-time box containing QCD (the “femto universe”) [1]. The smallest system is chosen such that, due to Heisenberg’s uncertainty relation, it corresponds to high energies where perturbative QCD applies. Within perturbative QCD it is then connected to high energy scattering in an infinite volume.

Furthermore, successive boxes differ by scale factors of two (cf. figure 4) and are related to each other by taking the continuum limit. Eventually one arrives at a box sufficiently large for hadrons to fit in. In this way the multi-scale problem is circumvented, and a physical scale ratio is implemented that grows exponentially with the number of steps. In terms of figure 3, one starts in a situation where the white window is on the right, and then moves it recursively to the left, until the one suitable for hadrons is reached. This method does not compromise by handling multiple scales on a single lattice and is thus amenable to systematic improvement and error control. Its application to QCD is now far advanced [2–6].
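The recursion of box doublings can be sketched at one loop (illustrative only; the actual ALPHA computation measures the step-scaling function non-perturbatively on the lattice rather than using this perturbative formula):

```python
import math

# One-loop sketch of the step-scaling recursion for the finite-volume
# coupling gbar^2(L):
#   1/gbar^2(2L) = 1/gbar^2(L) - 2*b0*ln(2),
#   b0 = (11 - 2*nf/3) / (16*pi^2)
# Each doubling of the box size L drives the coupling towards the infrared,
# where it grows.

def double_box(g2, nf=3, s=2.0):
    b0 = (11.0 - 2.0 * nf / 3.0) / (16.0 * math.pi ** 2)
    return 1.0 / (1.0 / g2 - 2.0 * b0 * math.log(s))

g2_values = [1.0]            # weak coupling in a small box (high energy)
for _ in range(8):           # double the box size eight times
    g2_values.append(double_box(g2_values[-1]))
print([round(v, 3) for v in g2_values])  # monotonically growing coupling
```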

New precision for the strong coupling

In particular, we have now computed the energy dependence of two different non-perturbative definitions of the QCD coupling, called the “Schrödinger functional” and the “Gradient Flow” coupling. Like all sensible definitions of the coupling, they coincide at the lowest order of the perturbative expansion, but differ by higher powers and non-perturbative physics. In figure 5 we show the β-function, the logarithmic derivative of the coupling ḡ with respect to the energy scale. It is now known with unprecedented precision in the whole region up to αs = 1, i.e. L ≈ 1 fm. The difference from perturbation theory (dashed line) is considerable.

Furthermore, we have connected the finite-volume coupling at L ≈ 1 fm to the decay rate of π- and K-mesons, which is known from low-energy experiments, cf. figure 2. Decays of these mesons into, e.g., a μ-lepton and a neutrino proceed, like radioactive β-decay, only through the weak interaction of the Standard Model. At the fundamental level such a decay proceeds by a quark of one species annihilating with an antiquark of another species, producing a very short-lived W-boson which then decays into a lepton and a neutrino. Roughly speaking, the decay rate is given in terms of the probability for the quark and antiquark to be at the same point in the meson. In precise terms, the decay rate is encoded in the “decay constant” of the meson, which is a QCD property. Inverting this, the decay constant is known from the experimentally measured decay rate. We use this particular quantity as input to connect our computer experiments to the real world, since it is computable with high precision, including all systematics, from our lattice simulations. Other inputs that enter are the masses of the π- and K-meson.

Computational Aspects

The finite-size scaling analysis illustrated in figure 5 required a couple of hundred different simulations on space-time volumes with up to 32×32×32×32 points. Some of these simulations could be done with local resources, but most of them were run on the Crays at the HLRN. The largest computational effort went into the connection of the femto universe to the meson decay constants, where very large volumes are required. These challenging simulations were done together with our colleagues in the Coordinated Lattice Simulations (CLS) consortium [6]. It took a small series of PRACE and Gauss projects on the BG/Q in Jülich and SuperMUC in Munich to make the simulations more and more realistic concerning the lattice spacing and the masses of the light quarks. The largest lattices, with 64³ × 192 lattice points, were simulated on 65,536 cores. The field configurations generated in these simulations will be used in the future to study other interesting aspects of QCD. Besides our projects at the Gauss Centre, it was essential for us to also be able to run a large number of smaller-scale simulations at the HLRN in Berlin.
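For a sense of the parallel granularity, the quoted numbers imply the following distribution of lattice sites over cores (simple arithmetic on the figures given in the text):

```python
# Site count of the largest lattice quoted above: 64^3 spatial points x 192 time slices
sites = 64 ** 3 * 192
cores = 65536

print(sites)           # 50,331,648 space-time points in total
print(sites // cores)  # 768 local lattice sites per core
```

With only 768 local sites per core, communication between neighbouring sub-lattices dominates, which is why such runs need the tightly coupled networks of machines like the BG/Q.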


The final result of our computations, just released at the annual lattice field theory symposium [7], is
αs(mZ) = 0.1179 ± 0.0010 ± 0.0002 .

It is in agreement with, and more precise than, the current world average of a number of different determinations. Most importantly, our computation in the three-flavor theory is at a new level of rigor, using perturbation theory only where αs is small while keeping excellent non-perturbative precision.

We have included only the three lighter of the six quarks in our simulations; the heavier ones have been added perturbatively. The power series describing these additions are very well behaved, and from them we have estimated the small, second uncertainty in the value given above. It will be an interesting and worthwhile project to remove any remaining doubt about this use of perturbation theory by including the next-heavier quark in the simulations.


  • [1] M. Lüscher, P. Weisz and U. Wolff:
    Nucl. Phys. B 359 (1991) 221. doi:10.1016/0550-3213(91)90298-C
  • [2] K. Jansen, C. Liu, M. Lüscher, H. Simma, S. Sint, R. Sommer, P. Weisz and U. Wolff:
    Phys. Lett. B 372 (1996) 275 doi:10.1016/0370-2693(96)00075-5 [hep-lat/9512009].
  • [3] M. Della Morte et al. [ALPHA Collaboration]:
    “Computation of the strong coupling in QCD with two dynamical flavors,” Nucl. Phys. B 713 (2005) 378 doi:10.1016/j.nuclphysb.2005.02.013 [hep-lat/0411025].
  • [4] M. Dalla Brida, P. Fritzsch, T. Korzec, A. Ramos, S. Sint and R. Sommer:
    “The accuracy of QCD perturbation theory at high energies,” arXiv:1604.06193 [hep-ph].
  • [5] M. Dalla Brida, P. Fritzsch, T. Korzec, A. Ramos, S. Sint and R. Sommer:
    “Slow running of the Gradient Flow coupling from 200 MeV to 4 GeV in Nf = 3 QCD,” arXiv:1607.06423 [hep-lat].
  • [6] M. Bruno et al.:
    “Simulation of QCD with Nf = 2 + 1 flavors of non- perturbatively improved Wilson fermions”, JHEP 1502 (2015) 043 doi:10.1007/JHEP02(2015)043 [arXiv:1411.3982 [hep-lat]].
  • [7] M. Dalla Brida, M. Bruno, P. Fritzsch, T. Korzec, A. Ramos, S. Schaefer, S. Simma, S. Sint and R. Sommer:
    “Precision determination of the strong coupling at the electroweak scale”, presentation at the 34th annual ‘International Symposium on Lattice Field Theory’,
  • Mattia Dalla Brida
  • Stefan Schaefer

John von Neumann Institute for Computing (NIC), DESY, Platanenallee 6, 15738 Zeuthen, Germany

Mattia Bruno

  • John von Neumann Institute for Computing (NIC), DESY, Platanenallee 6, 15738 Zeuthen, Germany
  • Physics Department, Brookhaven National Laboratory, Upton, NY 11973, USA

Patrick Fritzsch

Instituto de Física Teórica UAM/CSIC, Universidad Autónoma de Madrid, C/ Nicolás Cabrera 13-15, Cantoblanco, Madrid 28049, Spain

  • Tomasz Korzec

Department of Physics, Bergische Universität Wuppertal, Gaußstr. 20, 42119 Wuppertal, Germany

  • Alberto Ramos

CERN, Theory Division, Geneva, Switzerland

  • Stefan Sint

School of Mathematics, Trinity College Dublin, Dublin 2, Ireland

Rainer Sommer

  • John von Neumann Institute for Computing (NIC), DESY, Platanenallee 6, 15738 Zeuthen, Germany
  • Institut für Physik, Humboldt-Universität zu Berlin, Newtonstr. 15, 12489 Berlin, Germany

Electron-injection techniques in plasma-wakefield accelerators for driving free-electron lasers

Plasma wakefields can sustain electric fields on the order of 100 GV/m, allowing the acceleration of electrons to GeV energies over a distance of only a few centimetres. Control over the injection of electron beams that witness these high accelerating gradients is of utmost importance for applications that require excellent beam quality, as needed e.g. for free-electron lasers (FELs) in photon science. Utilising plasma wakes, it is envisaged that miniaturised FELs may be constructed, dramatically increasing the proliferation of this technology, with revolutionary consequences for applications in biology, medicine, materials science and physics.

Plasma wakefield acceleration is a quickly developing novel acceleration technology that allows for a substantial increase of the average gradient in particle accelerators compared to current state-of-the-art facilities. When focused into a plasma, an ultra-short laser pulse (laser-wakefield acceleration, LWFA [1]) or a relativistic particle beam (beam-driven plasma-wakefield acceleration, PWFA [2,3]) repels electrons from its vicinity and forms electron-density waves that follow the driver with a phase velocity close to the speed of light. This creates a cavity with simultaneous accelerating and focusing properties for charged particle beams. Inside such a plasma-accelerator module, gradients on the order of 100 GV/m can be sustained without being limited by material breakdown, outperforming conventional radio-frequency schemes by orders of magnitude. Plasma-based accelerators thus offer a unique opportunity for the production of high-brightness beams for applications such as FELs. In the future, plasma accelerators may allow for miniaturised FELs [4,5] with order-of-magnitude smaller cost and footprint than available today.

The FLASHForward project at DESY [6] is a pioneering plasma-wakefield acceleration experiment that aims to produce, in a few centimetres of plasma, beams with energies of order GeV that are of a quality sufficient to demonstrate FEL gain. To achieve this goal, FLASHForward will utilise the electron beams produced in the FLASH accelerator as drivers for the generation of strong wakefields in a novel hydrogen plasma cell at a density of around 10¹⁷ cm⁻³. The length of the accelerating cavity at these plasma conditions is on the order of 100 microns, and therefore only ultra-short beams can be created and accelerated inside these structures.
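The quoted cavity length follows directly from the plasma wavelength at this density; a quick check with the standard cold-plasma formulas (SI units; the density is the value given in the text):

```python
import math

# Physical constants (SI)
e = 1.602176634e-19      # elementary charge [C]
m_e = 9.1093837015e-31   # electron mass [kg]
eps0 = 8.8541878128e-12  # vacuum permittivity [F/m]
c = 2.99792458e8         # speed of light [m/s]

n0 = 1e17 * 1e6          # plasma density: 1e17 cm^-3 converted to m^-3

omega_p = math.sqrt(n0 * e**2 / (eps0 * m_e))  # plasma frequency [rad/s]
lambda_p = 2 * math.pi * c / omega_p           # plasma wavelength [m]
E_wb = m_e * c * omega_p / e                   # cold wave-breaking field [V/m]

print(lambda_p * 1e6)    # ~106 microns: the "order of 100 microns" cavity scale
print(E_wb / 1e9)        # ~30 GV/m; nonlinear wakes can exceed this cold limit
```

Both numbers reproduce the scales stated in the article: a cavity of order 100 μm and sustained fields of order tens of GV/m, with the nonlinear (blowout) regime pushing towards the quoted 100 GV/m.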

One of the challenging tasks of the FLASHForward project is to study and design injection techniques for the generation of high-brightness witness beams suitable for application in FELs. Owing to the highly nonlinear nature of the dynamics in plasma-wakefield accelerators, analytical treatment only allows for an approximate description. Thus, efficient numerical modelling is required in order to obtain a full description of the relevant physics. The particle-in-cell (PIC) method allows for a precise rendering of the complex dynamics at affordable computational cost. The electromagnetic fields of the system are discretised on a three-dimensional spatial grid (the cells), while the individual particles of the involved plasma or beam species (electrons, ions, etc.) are represented by numerical particles. The computational load is distributed over a large number of processors, which simultaneously solve the underlying equations in different spatial regions of the system. The parallelisation of the work among hundreds to tens of thousands of processing units and their efficient communication in supercomputers allows for high-fidelity numerical modelling of all relevant phenomena in plasma-based acceleration.
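The core PIC loop (deposit charge on the grid, solve the field equations, gather forces back to the particles, push the particles) can be illustrated with a deliberately minimal 1D electrostatic sketch in normalized units. This is a toy model for illustration only, far removed from a relativistic 3D production code such as OSIRIS; all parameter values are illustrative assumptions:

```python
import numpy as np

def pic_1d(steps=200, dt=0.05, Ng=64, Np=10000, amp=0.01):
    """Minimal 1D electrostatic particle-in-cell sketch. Normalized units:
    electron charge -1, mass 1, uniform ion background +1, so omega_p = 1."""
    L = 2 * np.pi                                   # periodic box length
    dx = L / Ng
    k_modes = 2 * np.pi * np.fft.fftfreq(Ng, d=dx)  # grid wavenumbers

    x0 = (np.arange(Np) + 0.5) * L / Np             # uniform initial positions
    x = (x0 + amp * np.sin(x0)) % L                 # small density perturbation
    v = np.zeros(Np)

    def field(x):
        # cloud-in-cell (CIC) deposition of electron charge onto the grid
        g = x / dx
        il = np.floor(g).astype(int) % Ng
        w = g - np.floor(g)
        rho = np.zeros(Ng)
        np.add.at(rho, il, -(1 - w))
        np.add.at(rho, (il + 1) % Ng, -w)
        rho = rho * Ng / Np + 1.0                   # normalize; add neutralizing ions
        # spectral Poisson solve: Gauss's law ik * E_hat = rho_hat (eps0 = 1)
        rho_hat = np.fft.fft(rho)
        E_hat = np.zeros(Ng, dtype=complex)
        nz = k_modes != 0
        E_hat[nz] = rho_hat[nz] / (1j * k_modes[nz])
        E = np.real(np.fft.ifft(E_hat))
        # gather the field back to the particles with the same CIC weights
        return (1 - w) * E[il] + w * E[(il + 1) % Ng], rho

    for _ in range(steps):
        E_part, rho = field(x)
        v += -E_part * dt                           # dv/dt = qE/m with q = -1, m = 1
        x = (x + v * dt) % L                        # periodic particle push

    return x, v, rho

x, v, rho = pic_1d()
```

Run as-is, the perturbed electrons perform plasma oscillations about the ion background; using the same interpolation for deposit and gather keeps total momentum conserved to machine precision, a standard PIC design choice.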

By employing the PIC code OSIRIS [7] on the high-end supercomputer JUQUEEN at the Jülich Supercomputing Centre (JSC), we have been able to propose and study, in three spatial dimensions (3D), several novel injection techniques for the generation of high-quality beams in the plasma wake driven by FLASH-like electron beams. The simulation results shown in figure 1 illustrate the newly proposed wakefield-induced ionisation injection technique [8]. The wakefields at the rear end of the plasma cavity are capable of ionising electrons from a well-localised neutral helium gas region coexisting with the hydrogen plasma. Once the beam and the plasma wake have crossed the helium region, the ionised electrons are trapped and form a high-quality electron beam, which is accelerated by a field of more than 100 GV/m. After injection and acceleration over a distance of about 3 cm (compared to the ~100 m required in conventional accelerators), the generated beam features a per-electron energy three times higher and a brightness ten times higher than those of the initial driver. A simulation result depicting the injection of electrons at plasma-density transitions [9] is shown in figure 2. Due to a rapid elongation of the plasma wave during the down-ramp, electrons at the crest of the plasma wave can be trapped in the accelerating region of the cavity.

Control over the injection of electron beams in plasma-wakefield accelerators is of utmost importance for the generation of high-quality electron beams. Therefore, our research project aims at exploring and analysing a number of regimes and methods for high-brightness beam production from plasma-based accelerators, with the intention to identify the most promising plasma-wakefield-accelerator design to power the next generation of miniaturised X-ray FELs.


We thank the OSIRIS consortium (IST/UCLA) for access to the OSIRIS code. Special thanks for support go to J. Vieira and R. Fonseca. Furthermore, we acknowledge the grant of computing time by the Jülich Supercomputing Centre on JUQUEEN under Project No. HHH23 and the use of the High-Performance Cluster (Maxwell) at DESY. This work was funded by the Humboldt Professorship of B. Foster, the Helmholtz Virtual Institute VH-VI-503, and the ARD program.


  • [1] Tajima and Dawson:
    Phys. Rev. Lett. 43, 267 (1979)
  • [2] Veksler:
    Proceedings of CERN Symposium on High Energy Accelerators and Pion Physics 1, 80 (1956).
  • [3] Chen et al.:
    Phys. Rev. Lett. 54, 693 (1985).
  • [4] Fuchs et al.:
    Nat. Physics 5, 826 (2009).
  • [5] Maier et al.:
    Phys. Rev. X 2, 031019 (2012).
  • [6] Aschikhin et al.:
    Nucl. Instr. Meth. Phys. Res. A 806, 175 (2016).
  • [7] Fonseca et al.:
    Lecture Notes in Computer Science 2331, 342 (2002) & Plasma Phys. Control. Fusion 50, 124034 (2008).
  • [8] Martinez de la Ossa et al.:
    Phys. Rev. Lett. 111, 245003 (2013) & Phys. Plasmas 22, 093107 (2015).
  • [9] J. Grebenyuk, et al.:
    Nucl. Instr. Meth. Phys. Res. A 740, 246 (2014).

contact: Alberto Martinez de la Ossa,[at]

A. Martinez de la Ossa

  • Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany.
  • Institut für Experimentalphysik, Universität Hamburg, Germany.

T. J. Mehrling

  • Institut für Experimentalphysik, Universität Hamburg, Germany.
  • Instituto Superior Técnico, Universidade de Lisboa, Portugal.

J. Osterhoff

Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany.

Realistic modeling of semiconductor properties by first principles calculations


For the development of new communication and computing technologies, conceptually new materials and device architectures are needed. One pathway to increasing the efficiency of, e.g., integrated transistor circuits is to add photonic functionality to the devices. With the HLRS project “GaPSi” we contribute to the design and production of optically active compound semiconductor materials that can be integrated into conventional silicon (Si) based technology.

Critical challenges are posed by the growth of nanoscale thin films of different III/V materials (i.e. compounds of elements from groups 13 and 15 of the periodic table) due to stability and kinetic effects. We analyze the reactivity of the precursor chemicals used during growth, as well as the final materials’ properties, with accurate, parameter-free calculations of the atomic and electronic structure. Chemical reactivity and other elementary processes determine whether a certain material combination can be realized as intended for implementing new phenomena into computer chips or sensors that communicate via light instead of electrons.


We apply ab initio methods without experimental input parameters, namely density functional theory and wave-function-based quantum chemical methods, in static and dynamic simulations to describe the physical properties and reaction mechanisms relevant to semiconductor epitaxy via chemical vapor deposition. To achieve the necessary accuracy, firstly, a large number of reaction trajectories is often needed: the configurational space of an adsorbate close to a substrate can be highly complex, and decomposition or surface diffusion dynamics depend strongly on a meaningful sampling of possible pathways.

Secondly, for calculating optical properties—e.g. band structures—of multinary compound semiconductor materials, large unit cells and tight sampling of reciprocal space are needed. These requirements can only be met with large computational resources and the capability of running several trajectories in parallel. The HLRS delivers optimal platforms to pursue our research and perform up-to-date calculations on a diverse range of questions in chemistry and materials science.


In recent years our focus has been on the reactivity of precursor molecules that carry III/V elements to the semiconductor substrate for functionalization. We examined decomposition mechanisms of gas-phase and adsorbate species [1-3]. Efficient elimination of hydrocarbon side groups is important in order to minimize defect contamination of the materials produced. Furthermore, crystal nucleation mechanisms are both of fundamental scientific interest and determine the material’s quality and interface structure. One aspect under investigation is the thermodynamic properties of the hydrogen passivation layer on Si substrates [4]. The coverage-dependent electronic and vibrational properties require large supercells, as shown in figure 1. We applied different models (ab initio thermodynamics, interpolated and explicit phonon dispersion relations, Einstein model) for the evaluation of the equilibrium temperatures that allow activation of the H/Si(001) surface with respect to hydrogen desorption. Gallium phosphide (GaP) can be grown lattice-matched onto activated Si and represents a possible nucleation layer for further functionalization.

We have also studied chemical growth processes related to GaP growth on H/Si(001) (precursor chemisorption in early nucleation) [5] and the resulting morphology and properties of the grown GaP-Si interface [6]. In close cooperation with chemical vapour deposition experiments, transmission electron microscopy analysis and kinetic Monte Carlo modeling of growth processes, we found that kinetic aspects enable the atomic structure to intermix within a region of eight atomic layers across the interface. On the other hand, thermodynamic stabilities cause pyramidal shapes to occur at the interface (figure 2), as was concluded from our first-principles calculations applying large supercells.

Further, we have investigated strain and chemical influences on the band gaps of the optically active semiconductor alloys Ga(NAsP) and dilute Ga(AsBi). These can be used in device superstructures based on Si substrates with a GaP nucleation layer. Figure 3 shows the charge density of the valence band for two different local configurations of the Bi atoms in the GaAs lattice of dilute Ga(AsBi). The material’s band gap largely depends on this local configuration, which we could rationalize by solid-state chemical bonding models [7].

All of our studies are compared to experimental measurements on samples grown under commercially relevant conditions. The results are thus directly related to larger-scale production of novel semiconductor materials, and they have also proven helpful in discovering new features and properties and in optimizing the growth conditions applied. Realistic models for ab initio calculations require large computational resources and reliable collaboration between theoretical and experimental scientists.


The authors gratefully acknowledge the HLRS for providing computing facilities and IT support, and the DFG (via the graduate college 1782) and the Beilstein Institute for financial support.


  • [1] A. Stegmüller, P. Rosenow, R. Tonner:
    Phys. Chem. Chem. Phys. 2014, 16, 17018.
  • [2] A. Stegmüller, R. Tonner:
    Inorg. Chem. 2015, 54, 6363.
  • [3] A. Stegmüller, R. Tonner:
    Chem. Vap. Deposition 2015, 21, 161.
  • [4] P. Rosenow, R. Tonner:
    J. Chem. Phys. 2016, 144, 204706.
  • [5] A. Stegmüller, K. Werner, M. Reutzel, A. Beyer, P. Rosenow, U. Höfer, W. Stolz, K. Volz, M. Dürr, R. Tonner:
    Chem. Eur. J. 2016, 22.
  • [6] A. Beyer, A. Stegmüller, J. O. Oelerich, K. Jandieri, K. Werner, G. Mette, W. Stolz, S. D. Baranovski, R. Tonner, K. Volz:
    Chem. Mater. 2016, 28, 3265.
  • [7] L. C. Bannow, O. Rubel, S. C. Badescu, P. Rosenow, J. Hader, J. V. Moloney, R. Tonner, S. W. Koch:
    Phys. Rev. B 2016, 93, 205202.
  • Ralf Tonner
  • Andreas Stegmüller

Fachbereich Chemie and Material Sciences Centre, Philipps-Universität Marburg, 35032 Marburg, Germany

Towards the holy grail of nuclear astrophysics with JUQUEEN

Processes involving alpha particles (4He nuclei) and alpha-like nuclei comprise a major part of stellar nucleosynthesis and of the mechanisms for thermonuclear supernovae. In an effort towards understanding alpha processes from first principles, we have performed the first ab initio calculation of alpha-alpha scattering [1]. As our tool, we have used lattice effective field theory to describe the low-energy interactions of nucleons. To reduce the eight-body system to an effective two-cluster system, we have applied a technique called the adiabatic projection method. We find good agreement between lattice results and experimental phase shifts for S-wave and D-wave scattering. The computational scaling of the A1-body + A2-body problem is roughly (A1 + A2)², mild enough to make first-principles calculations of alpha processes possible. This should be contrasted with existing methods that scale either factorially or exponentially with the number of nucleons involved. In particular, an ab initio computation of the so-called “holy grail of nuclear astrophysics” [2], the radiative capture reaction 12C(α,γ)16O, at stellar energies is now in reach.

The basic framework is the method of nuclear lattice simulations, which had its breakthrough by allowing for the first ab initio calculation of the so-called Hoyle state in the spectrum of 12C [3]. This combination of the modern approach to the nuclear force problem, based on an effective field theory, with high-performance computing methods defines a completely new way to exactly solve the nuclear A-body problem (with A the number of nucleons, that is protons and neutrons, in a nucleus). The first ingredient of this method is a systematic and precise effective field theory description of the forces between two and three nucleons, which has been worked out over the last decade by various groups worldwide. To go beyond mass number four, one has to devise a method to exactly solve the A-body problem. Such a method is given by nuclear lattice simulations. Space-time is discretized with spatial length Ls and temporal length Lt, and nucleons are placed on the lattice sites. The minimal length on the lattice, the so-called lattice spacing a, entails a maximum momentum, pmax = π/a. On this lattice, the interactions between the nucleons are represented through auxiliary fields, which are integrated over. Such a lattice representation is ideally suited for parallel computing. Given this framework, the structure and spectrum of 12C [4] and 16O [5] as well as the ground-state energies of all alpha-type nuclei up to 28Si have been calculated within 1% accuracy [6], based on the same microscopic Hamiltonian.
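For the coarse lattice used in this work (a ≃ 2 fm, see below), the momentum cutoff pmax = π/a is easily evaluated using ħc ≈ 197.3 MeV·fm:

```python
import math

hbar_c = 197.327               # conversion constant, MeV * fm
a = 2.0                        # lattice spacing in fm (the coarse lattice of this work)

p_max = math.pi * hbar_c / a   # maximum lattice momentum in MeV/c
print(round(p_max))            # ~310 MeV/c
```

A cutoff of roughly 310 MeV/c comfortably covers the low-energy momenta relevant for alpha-alpha scattering while keeping the lattice coarse and the simulations affordable.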

To calculate scattering processes on such a space-time lattice, our calculation proceeds in two steps. First, using exactly the same microscopic Hamiltonian as in the earlier nuclear structure calculations, one constructs an ab initio cluster Hamiltonian. This step is depicted in figure 1, where two clusters at large separation R are shown. Within the clusters, the full microscopic dynamics (strong and electromagnetic interactions) is included, covering polarization and deformation effects as well as the Pauli exclusion principle. As the separation R becomes very large, we can describe the system in terms of an effective cluster Hamiltonian (the free lattice Hamiltonian for two clusters) plus infinite-range interactions (like the Coulomb interaction). In the second step, we can then compute the two-cluster scattering phase shifts or reaction amplitudes using this adiabatic Hamiltonian. Here, one has to account for the strong and short-range Coulomb interactions between the protons and the neutrons in the clusters, as well as the long-range Coulomb interactions between the protons. While the first set of interactions can be accurately computed in a small volume L³ ≃ (16 fm)³, the latter requires matching to Coulomb wave functions in a much larger volume, typically L³ ≃ (100 fm)³, where the boundary conditions are imposed on a spherical wall. This method allows one to extract the scattering phase shifts and is visualized in figure 2.
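The spherical-wall idea can be demonstrated in a one-channel toy model: replace the adiabatic cluster Hamiltonian by an attractive square well (a stand-in potential, not the actual lattice interaction, and with the Coulomb tail omitted), impose a hard wall at radius R, and read off the S-wave phase shift from the standing-wave momenta via δ(p_n) = nπ − p_n R:

```python
import numpy as np

# Toy radial problem in units hbar = 2m = 1: -u'' + V(r) u = p^2 u,
# with u(0) = 0 and a hard spherical wall u(R) = 0.
R, b, V0 = 20.0, 1.0, 1.0       # wall radius, well range, well depth (no bound state)
N = 1500                        # interior grid points
r = np.linspace(0.0, R, N + 2)[1:-1]
dr = r[1] - r[0]
V = np.where(r < b, -V0, 0.0)   # attractive square well inside the wall

# Finite-difference Hamiltonian with Dirichlet boundaries at r = 0 and r = R
H = (np.diag(2.0 / dr**2 + V)
     - np.diag(np.ones(N - 1) / dr**2, 1)
     - np.diag(np.ones(N - 1) / dr**2, -1))
E = np.linalg.eigvalsh(H)[:5]   # lowest standing-wave energies in the wall
p = np.sqrt(E)                  # all positive: this shallow well has no bound state

# Outside the well u ~ sin(p r + delta); the wall condition sin(p R + delta) = 0
# turns each eigenmomentum into a phase-shift value
delta = np.array([(n + 1) * np.pi - p[n] * R for n in range(len(p))])
print(np.round(delta, 3))       # attractive well -> positive S-wave phase shifts
```

For this solvable potential the extracted values can be checked against the exact square-well result tan(δ + pb) = (p/k′) tan(k′b) with k′ = √(p² + V0); the same wall procedure, applied to the lattice adiabatic Hamiltonian with Coulomb matching, yields the α-α phase shifts shown in figures 3 and 4.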

We work with the microscopic Hamiltonian at next-to-next-to-leading order (NNLO) in the chiral expansion of the nuclear forces on a coarse lattice with a lattice spacing a ≃ 2 fm. All appearing parameters, the so-called low-energy constants, have been fixed before in the systems with two, three and four nucleons. We can thus make parameter-free predictions of the low-energy alpha-alpha phase shifts. These are shown for the S- and the D-wave, in comparison to the existing data, in figure 3 and figure 4, respectively, at next-to-leading order (NLO) and at NNLO. The error bars are computed using a jackknife analysis of the stochastic errors of the Monte Carlo data. For the S-wave, we note that the calculated 8Be ground state is bound at NNLO, though only a small fraction of an MeV away from threshold. In the D-wave, the 2+ resonance energy and width are in fairly good agreement with the experimental results. We plan to revisit these 4He+4He calculations in the future with different lattice spacings, going one order higher to next-to-next-to-next-to-leading order. These phase shifts provide useful benchmarks to assess systematic errors in calculations of higher-body nuclear systems, and our calculation further demonstrates that an ab initio calculation of the holy grail of nuclear astrophysics is in reach.

I thank my collaborators Serdar Elhatisari, Evgeny Epelbaum, Hermann Krebs, Timo Lähde, Dean Lee, Thomas Luu and Gautam Rupak for a superb collaboration and the JSC for providing the computational resources.

contact: Ulf-G. Meißner, meissner[at]

  • Ulf-G. Meißner

Universität Bonn & Forschungszentrum Jülich

Magneticum Pathfinder: Simulating the evolution of the universe with unmatched precision


Within modern cosmology, the Big Bang marks the beginning of the universe and the creation of matter, space and time about 13.8 billion years ago. Since then, the visible structures of the cosmos have developed: billions of galaxies which bind gas, dust, stars and planets with gravity and host supermassive black holes in their centres. But how could these visible structures have formed from the universe’s initial conditions?

To answer these questions, theoretical astrophysicists carry out large cosmological simulations. They translate our knowledge about the physical processes which drive the formation of our universe into models and simulate the resulting evolution of our universe across a large range of spatial scales and over billions of years. To be comparable to ongoing and future cosmological surveys, such theoretical models have to cover very large volumes, especially to host the rarest, most massive galaxy clusters, which are expected to be the lighthouses of structure formation, detectable already at early times (i.e. at high redshifts). While the universe makes its transition from dark-matter dominated to dark-energy dominated (i.e. accelerated expansion), the objects which form within it make their transition from young, dynamically active and star-formation-driven systems to the more relaxed and equilibrated systems observed at late times (i.e. low redshifts). Especially here, theoretical models in the form of complex hydrodynamical cosmological simulations are needed to disentangle the internal evolution of clusters of galaxies from the evolution of the cosmological background. Such simulations will be essential to interpret the outstanding discoveries expected from current and forthcoming astronomical surveys and instruments like PLANCK, SPT, DES and eROSITA.

In cooperation with experts of the Excellence Cluster Universe’s data centre C2PAP and of the LRZ, the world’s most elaborate cosmological simulation of the evolution of our universe was accomplished. The most comprehensive simulation within the Magneticum Pathfinder project follows the evolution of a record number of 180 billion tiny spatial elements—each representing detailed properties of the universe and containing about 500 bytes of information—in a previously unreached simulation volume spanning 12.5 billion light years (see table 1).

Results and Challenges

To perform such simulations, we incorporated a variety of physical processes in the calculations, among them three are considered particularly important for the development of the visible universe: first, the condensation of matter into stars, second, their further evolution when the surrounding matter is heated by stellar winds and supernova explosions and enriched with chemical elements, and third, the feedback of super-massive black holes that eject enormous amounts of energy into the universe.

For the first time, the numerous characteristics of the simulations performed (see figures 1 and 2) make it possible to compare cosmological simulations in detail with large-scale astronomical surveys. Surveys from space telescopes like Planck or Hubble observe a large segment of the visible universe, while sophisticated simulations so far could only model very small parts of the universe, making a direct comparison virtually impossible. Thus, Magneticum Pathfinder marks the beginning of a new era in computer-based cosmology.

This achievement was preceded by more than ten years of research and development, supported by HPC centres, especially experts from the Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities. One of the biggest challenges for such a complex problem is to find the right balance between optimizing the simulation code and developing the astrophysical modeling. While the code permanently needs to be adjusted to changing technologies and new hardware, the underlying models need to be improved by including better or additional descriptions of the physical processes that form our visible universe.

Performing the largest simulations of the Magneticum Pathfinder project took about two years, including initial preparation and testing work. The research group was supported by the physicists of the data centre C2PAP, which is operated by the Excellence Cluster Universe and located at the LRZ. Within the framework of several one-week workshops, the Magneticum Pathfinder team got the opportunity to use the LRZ’s entire highest-performance supercomputer SuperMUC for its simulation.

Overall, the Magneticum Pathfinder simulation of Box0 utilized all 86,016 computing cores and the complete usable main memory—about 155 out of a total of 194 terabytes—of the expansion stage “Phase 2” of the SuperMUC which was put into operation last year. The entire simulation required 25 million CPU hours and generated 320 terabytes of scientific data.
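These figures are mutually consistent, as a quick check shows (simple arithmetic on the numbers quoted above; the remainder of the 155 TB is presumably consumed by auxiliary data structures of the code):

```python
# Raw particle payload of the Box0 run from the quoted numbers
elements = 180e9               # resolution elements
bytes_per_element = 500        # ~500 bytes of information each
memory_tb = elements * bytes_per_element / 1e12
print(memory_tb)               # 90.0 TB -- fits within the ~155 TB of usable memory

# Average wall-clock time implied by the CPU-hour budget on all cores
cpu_hours = 25e6
cores = 86016
print(cpu_hours / cores / 24)  # ~12 days of continuous running on the full machine
```

So the stated 25 million CPU hours correspond to roughly two weeks of exclusive use of SuperMUC Phase 2, consistent with the series of dedicated full-machine workshop runs described above.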

On-going Research / Outlook

The Magneticum research collaboration will continue to analyze the large amount of data produced within this project, see for example [2], [3], [4], [5] and [6]. Furthermore, these data will be made available for interested researchers worldwide via the public web service C2PAP- CosmoSim [7, 8], which is currently in the testing phase. The Munich-based astrophysicists are already engaged in further projects: Among others, Klaus Dolag is currently collaborating with scientists from the Planck collaboration to compare observations of the Planck satellite with the results of the Magneticum simulations.


  • [2] Dolag, Gaensler, Beck & Beck:
    Constraints on the distribution and energetics of fast radio bursts using cosmological hydrodynamic simulations, 2015, MNRAS 451, 4277
  • [3] Teklu, Remus & Dolag et al.:
    Connecting Angular Momentum and Galactic Dynamics: The complex Interplay between Spin, Mass, and Morphology, The Astrophysical Journal 2015, 812, 29
  • [4] Remus, Dolag & Bachmann et al.:
    Disk Galaxies in the Magneticum Pathfinder Simulations, 2015, International Astronomical Union Symposium, Volume 309, 145-148
  • [5] Dolag, Komatsu & Sunyaev:
    SZ effects in the Magneticum Pathfinder Simulation: Comparison with the Planck, SPT, and ACT results, 2015, arXiv:1509.05134
  • [6] Bocquet, Saro, Dolag & Mohr
    Halo mass function: baryon impact, fitting formulae, and implications for cluster cosmology, 2016, MNRAS, 456, 2631
  • [7] Ragagnin, Dolag, Biffi, Cadolle Bel, Hammer, Krukau, Petkova & Steinborn:
    A web interface for hydrodynamical, cosmological simulations, submitted to Astronomy and Computing
  • [8]

contact: Nicolay J. Hammer, hammer[at]

  • Klaus Dolag

Universitäts-Sternwarte München, Fakultät für Physik der Ludwig-Maximilians-Universität

  • Nicolay J. Hammer

Leibniz-Rechenzentrum der Bayerischen Akademie der Wissenschaften

EXASTEEL – From Micro to Macro Properties

Steel Materials

It has been said that the story of materials is the story of civilization. Throughout that history, from the Iron Age to modern days, iron and steel have been among the most versatile materials known to humanity. Through processing, the mechanical properties of steel can be controlled over a very wide range, from extremely hard but brittle to ductile and formable. Indeed, many production and tempering processes have been invented throughout history to influence the strength, toughness, hardness, elasticity, and plasticity of steel materials.

At the core of steel’s adaptability to its particular application is its microstructure. Modern steel materials combine ever higher strength and better ductility with lower weight. This is achieved through a special polycrystalline grain structure at the microscale: dual-phase (DP) steels consist, at the microscopic level, of a (softer) polycrystalline ferritic matrix phase and (harder) martensitic inclusions. The resulting microstructure leads to a complicated overall material behavior characterized by finite elastoplasticity combined with isotropic and kinematic hardening; see figure 2.


In the project “EXASTEEL – From Micro to Macro Properties” of the DFG priority program SPP 1648 Software for Exascale Computing (SPPEXA), we are developing a massively parallel simulation environment for computational materials science. The focus is on microheterogeneous materials such as modern high-strength steels. By combining robust and highly scalable nonlinear solver algorithms with the well-established computational homogenization method FE² (on the algorithmic side) and including highly nonlinear material models such as crystal plasticity on the microscale (on the modeling side), we are developing an algorithmic tool serving as part of a virtual laboratory for material testing. Once completed, this will allow for the predictive numerical simulation of modern steel materials in a form not possible without massively parallel computers. The EXASTEEL project has brought together experts from computational mathematics (Axel Klawonn, Universität zu Köln; Oliver Rheinbach, TU Bergakademie Freiberg), from materials science (Daniel Balzani, TU Dresden; Jörg Schröder, Universität Duisburg-Essen), and, from computer science, experts in performance engineering (Gerhard Wellein, Universität Erlangen-Nürnberg) and in general-purpose direct sparse solvers (Olaf Schenk, Universität Lugano).


For advanced high-strength steels, the thermomechanical fields fluctuate at the microscale at length scales differing from those at the macroscale by 4 to 6 orders of magnitude. A reasonable FE discretization down to the microscale would thus require 10³–10⁹ finite elements for a three-dimensional cube with a volume of 1 μm³. Extrapolating this to a metal sheet with an area of 1 m² and a thickness of 1 mm would lead to 10¹⁸–10²⁴ finite elements. This is an enormously large implicit finite element problem even for the largest current supercomputers. Moreover, a brute-force simulation would require full knowledge of the microscale for the complete macroscopic structure, which is not feasible. It would also produce more detailed results than necessary, since the locations of the phenomena of interest, e.g., the onset of failure, do not need to be determined down to a level below micrometers. Therefore, a scale-bridging (homogenization) procedure is clearly the more desirable choice.
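As a sanity check, the extrapolation above can be reproduced with a few lines of arithmetic. The element densities and sheet dimensions are the figures quoted in the text; the script itself is purely illustrative:

```python
# Reproducing the element-count extrapolation from the text.
# All figures are those quoted above; the script is purely illustrative.

elements_per_um3 = (1e3, 1e9)     # assumed FE resolution per um^3
sheet_volume_m3 = 1.0 * 1e-3      # 1 m^2 area times 1 mm thickness
um3_per_m3 = (1e6) ** 3           # 1 m = 1e6 um, hence 1e18 um^3 per m^3

sheet_volume_um3 = sheet_volume_m3 * um3_per_m3    # 1e15 um^3

totals = [density * sheet_volume_um3 for density in elements_per_um3]
print(totals)   # about 1e18 to 1e24 finite elements, matching the text
```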

In our project, a computational homogenization method is applied which, on the one hand, takes the microstructure into account on a representative volume element (RVE) but, on the other hand, uses a radically smaller number of degrees of freedom on the macroscale. The FE² computational homogenization method uses two levels of finite element problems coupled through the macroscopic Gauss points: the macroscopic problem, which is discretized by (relatively coarse) finite elements (FE), and the microscopic problems on the RVEs, where the microstructure is resolved by a fine finite element mesh. In each Gauss point of the macroscopic FE problem, a microscopic RVE problem is then solved; see figure 1.
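Schematically, the two-level FE² coupling amounts to a loop over macroscopic Gauss points, each of which triggers a microscopic RVE solve. The following sketch is a toy illustration only; the function and data-structure names are hypothetical, not the FE2TI interfaces, and the RVE "solve" is replaced by a linear placeholder response:

```python
# Schematic FE^2 two-level coupling (illustrative only; all names and
# data structures are hypothetical stand-ins, not the FE2TI interfaces).

def solve_rve(macro_strain):
    """Placeholder for a full nonlinear FE solve on a microscopic RVE.

    Returns a homogenized stress and consistent tangent for the given
    macroscopic strain; here a linear response with an assumed modulus."""
    effective_modulus = 210.0   # illustrative value, not a computed one
    return effective_modulus * macro_strain, effective_modulus

def macro_gauss_point_loop(macro_mesh):
    # At every Gauss point of the coarse macroscopic mesh, a microscopic
    # RVE problem replaces the pointwise phenomenological material law.
    for element in macro_mesh:
        for gp in element:
            gp["stress"], gp["tangent"] = solve_rve(gp["strain"])
    # ...the macroscopic linearized system would be assembled and
    # solved here, and the Newton loop repeated until convergence...

# Tiny example: 2 macro elements with 2 Gauss points each.
mesh = [[{"strain": 0.001}, {"strain": 0.002}],
        [{"strain": 0.003}, {"strain": 0.004}]]
macro_gauss_point_loop(mesh)
print(mesh[0][0]["stress"])   # homogenized stress at the first Gauss point
```

In the real FE2TI code, the RVE problems at different Gauss points are independent of one another, which is what makes the method so well suited to massively parallel machines.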

The FE² computational scale-bridging method helps us to reduce the problem size, compared with a full resolution of the microscale, by a factor of 10³–10⁶. The resulting three-dimensional, heterogeneous, nonlinear structural mechanics problems on the RVEs can still be expected to exceed 10⁹–10¹² degrees of freedom. Therefore, state-of-the-art parallel implicit solver algorithms are also needed in our project.

Parallel Solvers

For this, in our FE2TI software, we build on parallel domain decomposition solvers combined with multigrid solvers. Domain decomposition methods are parallel divide-and-conquer algorithms for the solution of implicit problems. They rely on a geometric decomposition of the original problem into parallel problems defined on subdomains. For nonlinear problems, the problem is typically first linearized by Newton’s method and then decomposed into parallel (linear) problems on the subdomains. In recent nonlinear domain decomposition methods [1], the order of these operations is reversed, i.e., the nonlinear problem is first decomposed into parallel nonlinear problems to improve concurrency. We combine parallel nonlinear FETI-DP (Finite Element Tearing and Interconnecting) domain decomposition methods with parallel sparse direct solvers and algebraic multigrid (AMG) methods to obtain a robust and scalable family of solvers. We apply parallel AMG methods because they are able to construct a multilevel hierarchy from an assembled sparse operator without knowledge of an underlying grid.
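The reversed ordering can be illustrated on a toy scalar problem: each "subdomain" first performs its own complete nonlinear Newton solve, and only afterwards is a coupling step applied. All names and the (trivially averaged) coupling rule below are hypothetical stand-ins, not nonlinear FETI-DP itself:

```python
# Toy sketch of nonlinear domain decomposition: decompose first,
# then run independent *nonlinear* solves (hypothetical stand-ins).

def newton_solve(f, df, u, iters=20):
    """Plain Newton's method for a scalar equation f(u) = 0."""
    for _ in range(iters):
        u = u - f(u) / df(u)
    return u

# Two "subdomains", each with its own nonlinear residual.
subdomains = [
    (lambda u: u**3 - 8.0,  lambda u: 3.0 * u**2),   # local root u = 2
    (lambda u: u**3 - 27.0, lambda u: 3.0 * u**2),   # local root u = 3
]

# Independent nonlinear solves (concurrent in a real parallel code):
local_solutions = [newton_solve(f, df, 1.0) for f, df in subdomains]

# Toy "interface coupling" step applied after the local solves:
interface_value = sum(local_solutions) / len(local_solutions)
print(local_solutions, interface_value)   # roughly [2.0, 3.0] and 2.5
```

The point of the reversal is that the expensive nonlinear work happens inside the subdomains without global synchronization; only the (comparatively cheap) coupling step requires communication.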

Parallel Multiscale Simulations using 6 Million FETI-DP Subdomains

In figure 3, we present the weak scalability of our multiscale simulation environment on the complete Mira supercomputer at Argonne National Laboratory (786 432 cores and 1.5 million MPI ranks) using our FE2TI software package, for nonlinear hyperelasticity discretized with 10-noded tetrahedral finite elements. Mira is a 49 152-node, 10-petaflops Blue Gene/Q system at Argonne National Laboratory (USA) with a total of 786 432 processor cores and is ranked No. 6 in the current TOP500 list (June 2016). Each node has a Power BQC 16C 1.6 GHz processor with 16 cores and 16 GB of memory.

In our example in figure 3, the FE² multiscale approach allows us to reduce the problem size from 5×10¹⁵ degrees of freedom to 2×10¹⁰ degrees of freedom, i.e., by more than 5 orders of magnitude. The resulting problem is then solved on up to 786 432 Blue Gene/Q cores using parallel FETI-DP/AMG methods. For the largest problem, we use a total of 1 572 864 MPI ranks and more than 6 million FETI-DP subdomains.

Solver Scalability

Note that our solvers on their own, i.e., even without the FE² method, can scale to the largest current supercomputers: figure 4 shows the scalability of our nonlinear FETI-DP domain decomposition method on the complete Mira supercomputer with 786 432 processor cores. Here, the largest problem has 62.9 billion unknowns; for more details, see [1,2]. The parallel algebraic multigrid (AMG) solver is also highly scalable and can efficiently make use of more than half a million parallel processes. We have considered recent AMG variants tailored to systems of PDEs and adapted especially to elasticity problems [4]. For this, we have cooperated with the authors of BoomerAMG (Lawrence Livermore National Laboratory). In the H-AMG-LN variant, the BoomerAMG preconditioner uses a special interpolation which exactly interpolates the rigid body modes. In figure 5, we compare the different approaches U-AMG, H-AMG, and H-AMG-LN. The most recent H-AMG-LN approach, tailored to linear elasticity, clearly performs best. These experiments were carried out on the JUQUEEN BG/Q at Jülich Supercomputing Centre (JSC).


This work was supported by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA). This research used resources, i.e., Mira, of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. The authors gratefully acknowledge the Gauss Centre for Supercomputing (GCS) for providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS share of the supercomputer JUQUEEN at Jülich Supercomputing Centre (JSC). The authors also gratefully acknowledge the use of JUQUEEN during the Workshop on “Extreme Scaling on JUQUEEN” (Jülich, 02/05/2015 - 02/06/2015).


  • [1] Klawonn, A., Lanser, M., Rheinbach, O.:
    Toward Extremely Scalable Nonlinear Domain Decomposition Methods for Elliptic Partial Differential Equations, SIAM J. Sci. Comput., 37-6, pp. C667-C696, 2015
  • [2] Klawonn, A., Lanser, M., Rheinbach, O.:
    FE2TI: Computational Scale Bridging for Dual-Phase Steels, Parallel Computing: On the Road to Exascale (Proceedings of ParCo 2015), IOS Series Advances in Parallel Computing, Vol. 27, pp. 797-806, 2016
  • [3] Klawonn, A., Lanser, M., Rheinbach, O., Stengel, H., Wellein, G.:
    Hybrid MPI/OpenMP parallelization in FETI-DP methods, Recent Trends in Computational Engineering - CE2014, Lect. Notes Comput. Sci. and Eng. (ed. M. Mehl, M. Bischoff, M. Schäfer), Vol. 105, pp. 67-84, 2015
  • [4] Baker, A.H., Klawonn, A., Kolev, T., Lanser, M., Rheinbach, O., Meier Yang, U.:
    Scalability of Classical Algebraic Multigrid for Elasticity to Half a Million Parallel Tasks, Accepted for publication in Lect. Notes Comput. Sci. Eng., TU Bergakademie Freiberg, Fakultät 1, Preprint 2015-14 or Lawrence Livermore National Laboratory Report No. LLNL-PROC-679553
  • [5] Balzani, D., Gandhi, A., Klawonn, A., Lanser, M., Rheinbach, O., Schröder, J.:
    One-way and fully-coupled FE2 methods for heterogeneous elasticity and plasticity problems: Parallel scalability and an application to thermo-elastoplasticity of dual-phase steels, Accepted for publication in Lect. Notes Comput. Sci. Eng., TU Bergakademie Freiberg, Fakultät 1, Preprint 2015-13

Contact: Axel Klawonn, axel.klawonn[at] - Martin Lanser, mlanser[at] - Oliver Rheinbach, oliver.rheinbach[at]

  • Axel Klawonn
  • Martin Lanser

University of Cologne

  • Oliver Rheinbach

Technische Universität Bergakademie Freiberg


Human Brain Project: Towards a European infrastructure for brain research

In March of this year, the Human Brain Project (HBP) [1] successfully released initial versions of its six Information and Communication Technology (ICT) Platforms to users outside the project [2]. The HBP Platforms are designed to help brain researchers advance faster and more efficiently, by sharing data and results, and exploiting advanced ICT capabilities. The release marked the end of the HBP’s 2.5-year ramp-up phase and the start of the next phase, during which the HBP continues to build an open, community-driven infrastructure for brain research.

The HBP is a large-scale European project with over a hundred institutional partners from more than 20 countries in Europe and around the world. It is co-funded by the European Union (EU) within the EU’s FET (Future and Emerging Technologies) Flagships Initiative [3]. Launched in October 2013 under the 7th Framework Programme, it is now governed by a Framework Partnership Agreement (FPA), signed in October 2015 [4]. The FPA describes the HBP’s overall objectives, work plan and governance [5] for the remainder of its 10-year duration under Horizon 2020 and beyond.

Infrastructure co-design

The major goal of the HBP is the creation of a user-centric Research Infrastructure (RI) for neuroscience and brain-inspired research areas such as neuromorphic computing. This goal has become the main focus of the HBP following recommendations from reviewers [6] and a mediation addressing criticism of the project from parts of the neuroscience community [7].

The HBP RI will emerge from the HBP’s six ICT Platforms, dedicated respectively to Neuroinformatics, Brain Simulation, High Performance Analytics and Computing, Medical Informatics, Neuromorphic Computing, and Neurorobotics. The Platform versions released in March consist of a preliminary hardware infrastructure, software tools, databases and programming interfaces, all of which are now being further developed and expanded in a collaborative manner with users, and integrated within the framework of a European RI. All Platforms can be accessed via the HBP Collaboratory [8], a web portal where users can also find guidelines, tutorials and information on training seminars.

To ensure that the HBP RI meets the requirements of the user community, the HBP is promoting a co-design approach to technology development. There are currently six major HBP Co-Design Projects (CDPs) for this purpose, which are each co-led by a domain scientist and an infrastructure expert. The HBP CDPs address challenging scientific problems that cannot be addressed with traditional methods in neuroscience, but which can possibly be solved with advanced technologies developed as part of the HBP RI [9].

Federated data infrastructure

A fundamental role in the HBP RI is played by the High Performance Analytics and Computing (HPAC) Platform, which is coordinated by JSC at Forschungszentrum Jülich and CSCS, the Swiss National Supercomputing Centre in Lugano. The mission of the HPAC Platform is to provide the basic data and computing infrastructure that will enable scientists to deal with the huge amounts of data on the human brain: to store the data, integrate it into models, use it in simulations, and analyze and visualize it. To this end, the participating data centers JSC, CSCS, Cineca and Barcelona Supercomputing Center are working closely together to develop a federated data infrastructure, codenamed FENIX. While strongly driven by HBP use cases, the scope of FENIX goes beyond neuroscience, as it should also benefit other research areas with similar requirements, such as materials science.

Pilot systems for interactive supercomputing

Regarding computing resources, the HPAC Platform currently federates existing HPC systems at the participating centers, including Europe’s fastest supercomputer Piz Daint at CSCS and JUQUEEN at JSC. Two new pilot systems, which were installed at JSC over the summer, have just been integrated into the Platform. The two systems are cutting-edge demonstrators that have been developed by Cray and a consortium of IBM and NVIDIA, respectively, within a Pre-Commercial Procurement (PCP), carried out by Forschungszentrum Jülich on behalf of the HBP. The goal of the HBP PCP is to have suppliers of HPC technology competitively research, develop and integrate novel technologies in the areas of dense memory integration, scalable visualization and dynamic resource management in order to enable “interactive supercomputing”, i.e., the interactive use of supercomputers for complex workflows comprising concurrent simulation, analysis and visualization workloads. The systems are currently used for testing and benchmarking, but are also already in productive use for neuroscience applications.


The refocusing of the HBP on its infrastructure-building mission during the first phase of the project was accompanied by the introduction of a new governance structure, which is by now in place. The HBP remains open for new partners to join the Core Project through open calls for the next project phases, while Partnering Projects may use the HBP Platforms for their research and contribute to infrastructure development [10].

The HPAC Platform will continuously be improved and expanded to enable neuroscientists to address key challenges. These include the creation of high-resolution brain atlases and the processing of brain images using advanced data analytics methods. Another example is the study of synaptic plasticity as a basis for learning, by combining large-scale simulations on massively parallel HPC systems with ultra-fast simulations enabled by the Neuromorphic Computing Platform.


contact: Boris Orth, b.orth[at]

  • Thomas Lippert
  • Anna Lührs
  • Boris Orth
  • Dirk Pleiter

Jülich Supercomputing Centre (JSC), Germany

  • Colin McMurtrie
  • Thomas Schulthess

CSCS, ETH Zürich, Switzerland

Fortissimo 2

Factories of the Future Resources, Technology, Infrastructure and Services for Simulation and Modeling 2

Fortissimo 2 is a project funded by the European Commission under the H2020 Framework Programme for Research and Innovation through Grant Agreement no. 680481 and is part of the ICT Innovation for Manufacturing SMEs (I4MS) action. Fortissimo 2 is a follow-on action to the Fortissimo project established in 2013.

Project Outline

For a small or medium-sized enterprise (SME), success in bringing a new product to market depends on its ability to balance innovation, costs, time and quality during product development. With the advent of computer-aided engineering, SMEs have started to use simulations to help their engineers create the most cost-effective products, or even products that were not possible before. However, complex simulations need High Performance Computing (HPC) resources to produce results in a reasonable time, and traditionally HPC resources have not been affordable for SMEs.

The principal objective of its predecessor, the Fortissimo project, has been not only to enable European SMEs to be more competitive through the use of simulation services running on an HPC Cloud infrastructure, but also to provide them with a “one-stop shop”—the Fortissimo Marketplace—that offers convenient access to such an HPC Cloud infrastructure together with expert support, third-party applications, tools and a helpdesk in one place.

In the Fortissimo 2 project, the Fortissimo Marketplace is continued and enhanced with advanced HPC Cloud services based on High Performance Data Analytics (HPDA) and coupled HPC simulations. Any kind of sensor network, such as a weather or traffic-flow monitoring network, continuously produces large amounts of data. Many internet services store large volumes of tracking data that are later used to optimize the service itself. Complex simulations, such as coupled HPC simulations, also create big data sets in a short time. Coupled HPC simulations are nowadays used to study complex models in science, for example a heart simulation and a blood flow simulation running together for a better understanding of the human circulatory system. Somewhere in this large amount of data there is valuable information; however, the analysis required to find it is quite compute-intensive and cannot be handled by traditional data analysis tools. HPDA becomes necessary here to handle and analyse this vast amount of data in a reasonable time.

Like its predecessor, the Fortissimo 2 project is driven by the requirements of its experiments (about 35 in total), which are brought into the project in three tranches, two of them via open calls for proposals to solve real-world customer problems. An initial set of 14 experiments started on February 1, 2016, and two further sets will also start in 2016. Details on the call procedure and documentation are available on the Fortissimo website.

Fortissimo 2 Experiments

The following table summarizes the 14 initial experiments in Fortissimo 2. These experiments cover a wide range of topics in engineering and manufacturing, from simulations of gas and flame detector layouts to railway infrastructure for high-speed train simulations. The partners involved in each experiment are shown in the last column, with the leading partner in bold. The High Performance Computing Center Stuttgart of the University of Stuttgart (HLRS - USTUTT) is the leading partner in experiments E703 and E704, related to the foundry industry and aeroacoustic CFD simulations, respectively.

Role of HLRS

In the Fortissimo 2 project, the High Performance Computing Center Stuttgart (HLRS) is in charge of the development and operation of the Fortissimo Marketplace. HLRS will therefore not only operate and maintain the Fortissimo Marketplace, but will also extend it based on the requirements of the new experiments in Fortissimo 2. The main topics covered by these new requirements, obtained from the initial experiments and, later on, the open calls, focus on the use of high performance data analytics, data stream processing, and enhanced remote visualization of partial and final results, in real time where possible.


The project is coordinated by the University of Edinburgh and initially involves 38 partners. There are 13 core partners: University of Edinburgh, Scapos, University of Stuttgart, Sicos BW, Intel, Arctur, XLAB, CESGA, Gompute, Bull, Atos, SURFsara and CINECA. There are also 25 experiment partners, working on their experiments since February 1, 2016 (see the table in the Fortissimo 2 Experiments section).

Key Facts

Fortissimo 2 has a total cost of 11.1 M€ and an EU contribution of 10 M€ over a duration of three years, commensurate with achieving its ambitious goals. The project started on November 1, 2015 and will therefore finish on October 31, 2018.


contact: Jochen Buchholz, buchholz[at] - Carlos Díaz, diaz[at]

Additional contact at HLRS: Michael Gienger: gienger[at] - Nico Struckmann: struckmann[at]

  • Jochen Buchholz
  • Carlos Díaz

HLRS, Stuttgart


Foundations of a European Research Center of Excellence in High Performance Computing Systems

Project outline

The overall goal of the EuroLab-4-HPC project is to build connected and sustainable leadership in high-performance computing systems by bringing together the different leading performance-oriented communities in Europe, working across all layers of the system stack and, at the same time, fuelling new industries in HPC. To tackle the long-term challenges of HPC, the project brings together European research groups so that they can compete internationally.

The EuroLab-4-HPC project addresses the vital importance of HPC systems in Europe to the progress of science and technology. In this scope, a collaboration with ETP4HPC [2], which drives a European HPC vision towards exascale systems, will be established. The purpose of EuroLab-4-HPC is to develop a long-term research agenda promoting innovation and education for HPC systems. The EuroLab-4-HPC project will achieve its objectives by:

  • defining an HPC curriculum in HPC technologies and best-practice education/training methods to foster future European technology leaders.
  • joining HPC system research groups around a long-term HPC research agenda by forming an HPC research roadmap and joining forces behind it.
  • accelerating commercial uptake of new HPC technologies.
  • building an HPC ecosystem with researchers and other stakeholders, e.g. HPC system providers and venture capital.
  • forming a business model and organization for the EuroLab-4-HPC excellence center in HPC systems.

The EuroLab-4-HPC project takes into account the latest training activities and research by using the expertise of the involved partners and encouraging scientific exchange between EuroLab-4-HPC and projects such as PRACE [1] and ETP4HPC. Furthermore, the consortium brings in specific HPC and training expertise.

EuroLab-4-HPC addresses major gaps in implementing an effective European HPC strategy concerning industrial and academic leadership in the supply of HPC systems. EuroLab-4-HPC builds an HPC research community by taking ongoing activities into account. In this scope, ETP4HPC is taking the lead in building an HPC ecosystem. From a user perspective, PRACE offers a Europe-wide e-science infrastructure and, as such, connects the scientific users. What is lacking is a consolidated research community in HPC systems motivated to drive innovations. For a European HPC strategy to be effective, links between a wide range of HPC stakeholders need to be established. Thus, the project triggers cross-stack research and innovation. Additionally, in terms of roadmaps and technology challenges, it is important to establish an agile process for the long-term roadmapping of the major technological challenges facing HPC, as well as agile processes for bringing promising research ideas onto a fruitful path to commercialization. The EuroLab-4-HPC objectives lead to the need to enhance existing approaches, and deploy new ones, for training future HPC technology leaders.

As presented in figure 1, the EuroLab-4-HPC consortium combines aspects of research, education and innovation to achieve its goal: joining forces among excellent research institutions in HPC systems across the system stack behind a long-term research agenda that drives innovation and education. In reaching this goal, EuroLab-4-HPC takes sustainability and community building into account.

The EuroLab-4-HPC Roadmap

The EuroLab-4-HPC roadmap takes a long-term view of the HPC domain, covering 2022 to 2030. Because of this long-term view, the EuroLab-4-HPC project started with an assessment of future computing technologies that could be relevant for HPC hardware and software.

Beyond future computing technologies, there is an ever-growing need from current and new applications in the HPC domain, though this need is not restricted to HPC alone. Typically, HPC targets simulations using numerical programs, and EuroLab-4-HPC expects the scaling of such applications to continue beyond exascale computers. In general, the roadmap addresses two major trends relevant for HPC and supercomputers: first, the emergence of data analytics complementing simulation in scientific discovery and, second, the trend towards cloud computing and warehouse-scale computers.

The EuroLab-4-HPC Curriculum

A major goal of the EuroLab-4-HPC project is to establish an HPC curriculum. The curriculum is a combination of courses that can be delivered in traditional form and online. The online courses can be supported by a limited number of physical presence sessions as needed, collocated with the regularly occurring global events of the EuroLab-4-HPC project.

When considering relevant topics for the EuroLab-4-HPC curriculum, courses should cover the following contents:

  • Parallel Computer Architectures
  • Scalable Parallel Algorithms
  • Programming with MPI
  • Parallel Computing with Hadoop
  • Programming Shared Memory Parallel Systems
  • Programming Multi-core and Many-core Systems
  • Performance Engineering
  • Programming Heterogeneous Systems with OpenCL/CUDA
  • Large Scale Scientific Computation

In addition, the suggested courses of the curriculum are mapped to programmes for student training. In this scope, three programme proposals are distinguished:

  1. A two-year MSc level program for CS/ECE and Science (Physics/Math) majors consisting of 9 courses and a semester thesis.
  2. A second two-year MSc level program for CS/ECE and Science (Physics/Math) majors, likewise consisting of 9 courses and a semester thesis, but offering a partially different set of courses than the first proposal.
  3. A single-year MSc level program for CS/ECE majors. It is Bologna-aligned and includes 7 core courses, 2 elective ones and a two-course equivalent thesis (60 ECTS).

Of course, the above examples can be tailored in many ways. The Bologna agreement’s ECTS are a useful metric for gauging the level of a course, at least in terms of student effort.

Besides the MSc level programs, the curriculum covers courses for experts by identifying and proposing needed courses relevant for the HPC domain. In this scope, existing course offerings and curricula are considered, such as the ACM Computer Science Curricula 2013 [3] and the course offerings of HLRS [4], PRACE or EIT [5]. The curricula will be constantly adapted to address the latest educational and training challenges and technological developments. In addition to the curriculum, best practices will be developed in EuroLab-4-HPC, giving guidelines for offering courses and for addressing customized demands for online courses and training activities.

Role of HLRS in the Project

The role of the High Performance Computing Center Stuttgart (HLRS) in the project centres on educational aspects such as training activities. In this scope, HLRS offers its long-term experience and a broad set of courses, both in-person and online. Besides coordination activities, HLRS is deeply involved in contributing its HPC training expertise, gained through training activities offered to academia and industry as well as through projects such as PRACE and bwHPC-C5 [6].

Key Facts

The EuroLab-4-HPC project is funded by the European Commission within the Horizon 2020 FET Proactive Programme as a research and innovation action. It started on September 1, 2015 and will run until August 31, 2017.

EuroLab-4-HPC Partners

  • Chalmers University of Technology
  • Barcelona Supercomputing Center
  • Foundation for Research and Technology Hellas
  • University of Stuttgart
  • INRIA - Institut National de Recherche en Informatique et en Automatique
  • University of Manchester
  • ETHz - Eidgenössische Technische Hochschule Zürich
  • EPFL - Ecole Polytechnique Federale de Lausanne
  • Technion – Israel Institute of Technology
  • Rheinisch-Westfalische Technische Hochschule Aachen
  • Ghent University


contact: Axel Tenschert, tenschert[at]

Additional contact at HLRS: Bastian Koller, koller[at]

  • Axel Tenschert

HLRS, Stuttgart