Innovatives Supercomputing in Deutschland
inSiDE • Vol. 2 No. 2 • Autumn 2004

MARMOT – an MPI Analysis and Checking Tool

Introduction

The Message Passing Interface (MPI) is widely used to write parallel programs based on message passing ([1], [2]). Due to the complexity of parallel programming, several factors increase the need for debugging MPI programs. First, the MPI standard leaves many decisions to the implementation, e.g. whether or not a standard mode send is blocking, so portability between different MPI implementations is not guaranteed. Second, parallel applications are becoming more and more complex and, especially with the introduction of optimizations such as non-blocking communication, also more error prone. Examples of incorrect MPI usage are the introduction of irreproducibility, race conditions, deadlocks, and the incorrect management of resources such as communicators, groups, datatypes and operators. MARMOT [3] is a tool that aids in the development and debugging of MPI programs by automatically verifying the standard conformance of an MPI program at runtime.

Finding errors in MPI programs is a difficult task that has been addressed in various ways by existing tools. The solutions can be roughly grouped into four approaches: classical debuggers, special MPI libraries, and other tools that perform either a run-time or a post-mortem analysis. Existing tools suffer from various disadvantages: they may not be freely available, they may require source code modifications or language parsers, they may be limited to specific platforms or language bindings, or they may be unable to catch incorrect usage of MPI at all, helping only to analyze the situation once the incorrect usage has produced an error such as a segmentation violation.


Figure 1: Meteorological application

Design of MARMOT

MARMOT is a library that is linked to the application in addition to the native MPI library and requires no modification of the application's source code. MARMOT uses the MPI profiling interface to intercept the MPI calls, together with their parameters, and to analyze them before they are passed on to the native MPI library. As this profiling interface is part of the MPI standard, MARMOT can be used with any MPI implementation that adheres to the standard.
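The profiling interface works by name shifting: every MPI routine is also callable under the prefix PMPI_, so a tool can supply its own versions of the MPI_ entry points and forward each call to the native library afterwards. The following minimal sketch illustrates the mechanism in C; the two checks shown are illustrative examples in the spirit of MARMOT's local checks, not its actual code, and the MPI-1.2 binding (without const) is assumed:

    #include <mpi.h>
    #include <stdio.h>

    /* Tool-provided wrapper: because this definition of MPI_Send is
       linked before the MPI library, it intercepts the application's
       call. The checks below are illustrative only. */
    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        if (comm == MPI_COMM_NULL)
            fprintf(stderr, "check failed: MPI_COMM_NULL passed to MPI_Send\n");
        if (count < 0)
            fprintf(stderr, "check failed: negative count (%d) in MPI_Send\n",
                    count);

        /* forward to the native implementation via the shifted entry point */
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }

Because only the linker is involved, the application itself is recompiled against the tool without any source changes, exactly as described above.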

The checking tool adds one additional MPI process for all global tasks that cannot be handled within the context of a single MPI process, such as deadlock detection. Information between the MPI processes and this additional debug process is transferred using MPI. For the application, this additional process is transparent. Another global task is the control of the execution flow, i.e. the execution is serialized if this option is chosen by the user. Local tasks are performed in a distributed way on every process of the parallel application, for example the verification of resources such as communicators, groups, datatypes and operators, or the verification of other parameters such as ranks and tags.
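The MPI standard does not prescribe how such an extra process is hidden from the application; one plausible realization, sketched below purely for illustration and not taken from MARMOT's internals, is to split MPI_COMM_WORLD so that the application runs on a communicator that excludes the debug server:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm app_comm;
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* the last rank becomes the debug server; all other ranks
           form the communicator the application actually uses */
        int is_app = (rank < size - 1);
        MPI_Comm_split(MPI_COMM_WORLD, is_app ? 0 : MPI_UNDEFINED,
                       rank, &app_comm);

        if (is_app) {
            /* application code runs here, using app_comm in place of
               MPI_COMM_WORLD */
        } else {
            /* debug server: would receive call records from the
               application processes over MPI and run global checks,
               e.g. deadlock detection */
        }

        MPI_Finalize();
        return 0;
    }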


Figure 2: Architecture of MARMOT

Current Status

MARMOT supports the complete MPI-1.2 standard and provides the functionality described above. For example, when the non-blocking call MPI_Isend is used, the tool automatically checks whether the communicator is valid, i.e. whether it is MPI_COMM_NULL or a communicator that has been created and registered with the appropriate MPI calls. Similarly, MARMOT also verifies whether the datatype is valid, i.e. whether it is MPI_DATATYPE_NULL or has been created and registered properly. The tool also inspects the validity of the count, rank and tag parameters and the proper handling of the request argument, i.e. whether an unregistered request is used or an active request is being recycled.
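Such errors are easy to produce in application code. The following deliberately erroneous fragment is hypothetical, but each call in it exhibits one of the conditions listed above that a runtime checker flags:

    #include <mpi.h>

    /* Deliberately buggy code; do not use. Each MPI_Isend below
       violates one of the rules checked at runtime. */
    void buggy_isend(int peer, MPI_Comm good_comm)
    {
        int data = 42;
        MPI_Request req;
        MPI_Status status;

        /* invalid communicator: MPI_COMM_NULL instead of a
           properly created and registered communicator */
        MPI_Isend(&data, 1, MPI_INT, peer, 0, MPI_COMM_NULL, &req);

        /* invalid datatype: MPI_DATATYPE_NULL */
        MPI_Isend(&data, 1, MPI_DATATYPE_NULL, peer, 0, good_comm, &req);

        /* recycling an active request: the handle of the first
           pending send is overwritten before it has completed */
        MPI_Isend(&data, 1, MPI_INT, peer, 0, good_comm, &req);
        MPI_Isend(&data, 1, MPI_INT, peer, 1, good_comm, &req);
        MPI_Wait(&req, &status);
    }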

The tool has been tested successfully with internal test suites, benchmarks, and applications from the CrossGrid project [8]. It has been ported to IA32 and IA64 clusters running MPICH [4], as well as to IBM, Hitachi, Cray and NEC platforms. Performance measurements show that attaching MARMOT to an application introduces an inevitable but still tolerable overhead [5, 6]. In the future we will improve MARMOT's performance, extend the number of tests and add further functionality according to the users' needs [7].

Acknowledgements

The development of MARMOT is supported by the European Union through the IST-2001-32243 project “CrossGrid” [8].

References

[1] Message Passing Interface Forum
MPI: A Message Passing Interface Standard, June 1995.
http://www.mpi-forum.org/

[2] Message Passing Interface Forum
MPI-2: Extensions to the Message Passing Interface, July 1997.
http://www.mpi-forum.org/

[3] MARMOT
http://www.hlrs.de/organization/tsc/projects/marmot/

[4] MPICH
http://www-unix.mcs.anl.gov/mpi/mpich/

[5] B. Krammer, K. Bidmon, M. S. Müller, M. M. Resch
MARMOT, an MPI Analysis and Checking Tool. In Proceedings of ParCo2003, Dresden, Germany, Elsevier

[6] B. Krammer, M. S. Müller, M. M. Resch
MPI Application Development Using the Analysis Tool MARMOT. In M. Bubak, G. D. van Albada, P. M. Sloot, and J. J. Dongarra, editors, Computational Science - ICCS 2004, Volume 3038 of Lecture Notes in Computer Science, pp. 464-471, Krakow, Poland, 2004. Springer

[7] B. Krammer, M. S. Müller, M. M. Resch
MPI I/O Analysis and Error Detection with MARMOT. In D. Kranzlmüller, P. Kacsuk, and J. J. Dongarra, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Volume 3241 of Lecture Notes in Computer Science, pp. 242-250, 11th European PVM/MPI Users' Group Meeting, Budapest, Hungary, 2004. Springer

[8] CrossGrid
http://www.eu-crossgrid.org

Bettina Krammer
Matthias Müller

High Performance Computing Center Stuttgart (HLRS)

