Innovatives Supercomputing in Deutschland
inSiDE • Vol. 9 No. 1 • Spring 2011

The AstroGrid-D Use Case GEO600: A Breakthrough in Grid Computing

The GEO600 use case was part of the AstroGrid-D project, one of the first five scientific D-Grid community projects. It was contributed by the Max Planck Institute for Gravitational Physics (Albert Einstein Institute, AEI). The goal of the use case was to port to the Grid the analysis of the gravitational wave data measured with GEO600, the German-British gravitational wave detector near Hannover (www.geo600.org). The program for analyzing these data is Einstein@Home (www.einstein-athome.org), which is based on the BOINC framework (boinc.berkeley.edu).

The principle of the data analysis is as follows. Any user who wants to contribute to the analysis of the gravitational wave data registers at Einstein@Home, downloads the appropriate Einstein@Home client software to their computer, and starts the client. Whenever the computer is idle, the client requests a dataset from the server and analyzes it locally; the results are then transmitted back to the server.

Einstein@Home is an ideal candidate for a Grid application because of its multi-platform support, well-tested software base, simple resource requirements, built-in checkpoint and recovery methods, finely adjustable run time, and linear scaling with the number of nodes. The Einstein@Home client program itself was invoked as a black box. All components needed to bring this use case onto the Grid were developed within the AstroGrid-D project by the AEI: the deployment, the mechanisms that keep it running in production mode (restarting the client after a regular job end and cleaning up after recoverable errors), and the collection of job statistics.

Figure 1: Example for the daily fluctuations of computation time consumed by GEO600.

Jobs are submitted to Grid resources with the Grid middleware Globus. The deployment is triggered by a script that is sent as a Globus job to every Grid machine on which the GEO600 jobs should run. Special software packages required by GEO600 are installed automatically during the deployment.

GEO600 itself is started by a Perl script on the submission host, which submits one or more Grid tasks to Globus resources. A configuration file allows certain task submission parameters to be set individually for each Grid resource, e.g. the location of the deployment directory of the GEO600 software on the target machine, the total number of tasks to be submitted to the target resource, the number of tasks to be submitted at a time, and the walltime to be allocated for an Einstein@Home task. These settings affect only how the Einstein@Home clients are run on the Grid resources; they do not affect the Einstein@Home client itself.
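The per-resource parameters listed above could be represented as in the following minimal sketch. It is written in Python for illustration only; the original submission tooling is a Perl script, and its actual configuration file format is not described here. The INI layout, the key names (`deploy_dir`, `max_tasks`, `batch_size`, `walltime_minutes`) and all values are assumptions chosen to mirror the four parameters named in the text.

```python
import configparser
from dataclasses import dataclass
from typing import List

# Hypothetical configuration: one section per Grid resource.
# Key names and values are illustrative, not taken from the GEO600 tooling.
EXAMPLE_CONFIG = """
[lxgt2.lrz-muenchen.de]
deploy_dir = /path/to/geo600
max_tasks = 200
batch_size = 20
walltime_minutes = 720

[juggle-glob.fz-juelich.de]
deploy_dir = /path/to/geo600
max_tasks = 100
batch_size = 10
walltime_minutes = 360
"""


@dataclass(frozen=True)
class ResourceConfig:
    host: str              # Globus resource the tasks are submitted to
    deploy_dir: str        # deployment directory of the GEO600 software
    max_tasks: int         # total number of tasks allowed on this resource
    batch_size: int        # number of tasks submitted at a time
    walltime_minutes: int  # walltime requested for one Einstein@Home task


def load_resources(text: str) -> List[ResourceConfig]:
    """Parse the per-resource settings from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    resources = []
    for host in parser.sections():
        section = parser[host]
        resources.append(ResourceConfig(
            host=host,
            deploy_dir=section["deploy_dir"],
            max_tasks=section.getint("max_tasks"),
            batch_size=section.getint("batch_size"),
            walltime_minutes=section.getint("walltime_minutes"),
        ))
    return resources


if __name__ == "__main__":
    for resource in load_resources(EXAMPLE_CONFIG):
        print(resource)
```

Keeping these values outside the client reflects the point made above: they steer how tasks are placed on a resource, not what the Einstein@Home client computes.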

The submission script uses a local MySQL database to control all submitted tasks based on the task identifiers, and to save the exit code when a task has terminated. Depending on the number of currently pending and active tasks and the parameters in the GEO600 resource configuration file, the submission script can automatically determine when to submit new tasks to a Grid resource. To establish a continuous Grid task submission scheme, it is therefore sufficient to invoke the submission script periodically, e.g. as a cron job, on the submission machine.
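The submission decision described above can be sketched as follows. This is an illustrative Python sketch, not the project's Perl script: it uses an in-memory SQLite database as a stand-in for the local MySQL database, and the table layout, state names, and the `submit` callback (which would wrap the actual Globus submission) are assumptions.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the task database (SQLite stands in for MySQL in this sketch)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               task_id   TEXT PRIMARY KEY,
               resource  TEXT NOT NULL,
               state     TEXT NOT NULL,   -- 'pending', 'active' or 'done'
               exit_code INTEGER
           )"""
    )
    return conn


def count_open_tasks(conn, resource):
    """Number of tasks on a resource that are still pending or active."""
    row = conn.execute(
        "SELECT COUNT(*) FROM tasks "
        "WHERE resource = ? AND state IN ('pending', 'active')",
        (resource,),
    ).fetchone()
    return row[0]


def tasks_to_submit(open_tasks, max_tasks, batch_size):
    """Submit at most one batch, and never exceed the per-resource total."""
    free_slots = max(0, max_tasks - open_tasks)
    return min(batch_size, free_slots)


def record_exit(conn, task_id, exit_code):
    """Save the exit code once a task has terminated."""
    conn.execute(
        "UPDATE tasks SET state = 'done', exit_code = ? WHERE task_id = ?",
        (exit_code, task_id),
    )
    conn.commit()


def run_cycle(conn, resource, max_tasks, batch_size, submit):
    """One periodic invocation, e.g. from cron: top up the resource.

    `submit` is a hypothetical callable that submits one task to the
    resource and returns its task identifier.
    """
    n = tasks_to_submit(count_open_tasks(conn, resource), max_tasks, batch_size)
    for _ in range(n):
        task_id = submit(resource)
        conn.execute(
            "INSERT INTO tasks (task_id, resource, state) VALUES (?, ?, 'pending')",
            (task_id, resource),
        )
    conn.commit()
    return n
```

Because each cycle only compares the number of open tasks with the configured limits, calling `run_cycle` repeatedly is safe: once a resource is full, further invocations submit nothing until tasks terminate and their exit codes are recorded.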

When a complex software package like GEO600 is submitted to and executed on different HPC clusters in the Grid, various Grid-related errors can occur that are difficult to track and analyze by hand. The Einstein@Home client itself does not end up in an error state. Automated handling of typical Grid-related errors has been implemented, so that checking the job submissions for failures takes no more than 10 minutes a day. The GEO600 use case now runs in production mode and consumes between 100,000 and 150,000 CPU hours a day on the Globus resources of D-Grid (see Fig. 1).

Figure 2: E@H pulsar discovery plot, from Knispel et al. 2010. Left: significance S as a function of DM and spin frequency (all E@H results for the discovery beam). Right: the pulse profile at 1.5 GHz (GBT). The bar illustrates the extent of the pulse.

Currently, Einstein@Home runs on the following D-Grid Globus resources:

srvgrid01.offis.uni-oldenburg.de
udo-gt03.grid.tu-dortmund.de
lxgt2.lrz-muenchen.de
emilia.zih.tu-dresden.de
juggle-glob.fz-juelich.de
gramd1.d-grid.uni-hannover.de
gridmon.gwdg.de
iwrgt4.fzk.de
gt4-fzk.gridka.de

Two of the HPC clusters that have been used for the data analysis are operated by the GCS partners FZJ and LRZ. In addition to providing D-Grid compute resources, both GCS members provide central D-Grid services that are required for the operation of the D-Grid infrastructure and thus for the submission of jobs to D-Grid compute resources.

The GEO600 use case did not run properly in production mode at the beginning. The first approach was to store all checkpoint files of the Einstein@Home client jobs (the GEO600 tasks) in a central database at the AIP, the so-called “AstroDataServer”. However, this approach generated so much network traffic that transferring the checkpoint files from the central database to the execution hosts took longer than the subsequent execution itself. We therefore decided to store the checkpoint files on the local file systems of the execution hosts.

The “Gridification” of Einstein@Home brought a substantial breakthrough for Grid computing in Germany. In July 2010, a radio pulsar was discovered by the Einstein@Home project in data recorded with the Arecibo Observatory in Puerto Rico. The data analysis on the D-Grid clusters is one of the largest contributions to the Einstein@Home project worldwide. Conversely, Einstein@Home is the most successful scientific application of D-Grid. Without the Grid contribution to Einstein@Home, it would not have been possible to analyze sufficiently large amounts of data to discover the pulsar. Figure 2 shows the discovery plot of this pulsar, obtained from Knispel et al. 2010.

In the future, we plan to extend the Einstein@Home job submission to gLite and UNICORE based D-Grid resources as well. Making the job submission control independent of the underlying Grid middleware requires larger changes. We would like to use the Grid Application Toolkit (GAT, see: www.cs.vu.nl/ibis/javagat.html). However, this requires converting the job control mechanism from Perl to Java.

References

[1] Pulsar Discovery by Global Volunteer Computing: Knispel, B., Allen, B., Cordes, J. M., Deneva, J. S., Anderson, D., Aulbert, C., Bhat, N. D. R., Bock, O., Bogdanov, S., Brazier, A., Camilo, F., Champion, D. J., Chatterjee, S., Crawford, F., Demorest, P. B., Fehrmann, H., Freire, P. C. C., Gonzalez, M. E., Hammer, D., Hessels, J. W. T., Jenet, F. A., Kasian, L., Kaspi, V. M., Kramer, M., Lazarus, P., van Leeuwen, J., Lorimer, D. R., Lyne, A. G., Machenschalk, B., McLaughlin, M. A., Messenger, C., Nice, D. J., Papa, M. A., Pletsch, H. J., Prix, R., Ransom, S. M., Siemens, X., Stairs, I. H., Stappers, B. W., Stovall, K., Venkataraman, A.
ScienceExpress: www.sciencexpress.org, August 12, 2010, page 1, doi:10.1126/science.1195253

• Alexander Beck-Ratzka
Max Planck Institute for Gravitational Physics

