MARMOT - an MPI Analysis and Checking Tool
Introduction
The Message Passing Interface (MPI) is widely used
to write parallel programs using message passing ([1],
[2]). Because parallel programming is complex, MPI
programs need thorough debugging, for several reasons.
First, the MPI standard leaves many
decisions to the implementation, e.g. whether or not
a standard communication is blocking, and therefore
portability between different MPI implementations
is not guaranteed. Second, parallel applications are
becoming more and more complex and, especially with the
introduction of optimizations such as non-blocking
communication, more error prone. Examples of incorrect usage
of MPI include code that introduces irreproducibility,
race conditions or deadlocks, and incorrect management
of resources like communicators, groups, data types
and operators. MARMOT [3] is a tool to aid in the
development and debugging of MPI programs by verifying
the standard conformance of an MPI program automatically
during runtime.
Finding errors in MPI programs is a difficult task
that has been addressed in various ways by existing
tools. The solutions can be roughly grouped into four
different approaches: classical debuggers, special
MPI libraries, and other tools that perform either a
run-time or a post-mortem analysis. Existing tools have
some disadvantages: they may not be freely available,
they may require source code modification or language
parsers, they may be limited to special platforms
or language bindings, or they may be unable to catch
incorrect usage of MPI, but only help to analyze the
situation once the incorrect usage has produced an
error like a segmentation violation.

Figure 1: Meteorological application
Design of MARMOT
MARMOT is a library that has to be linked to the
application in addition to the native MPI library,
without requiring any modification of the application’s
source code. MARMOT uses the MPI profiling interface
to intercept the MPI calls with their parameters for
analysis before they are passed from the application
to the native MPI library. As this profiling interface
is part of the MPI standard, MARMOT can be used with
any MPI implementation that adheres to this standard.
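The interception idea can be illustrated without MPI at all. The following plain-Python model is only an analogy and not MARMOT's code: in a real C build, a tool of this kind would define a function such as MPI_Send itself and forward to the name-shifted entry point PMPI_Send that the profiling interface provides. The names native_send, make_checked and log are hypothetical stand-ins for those pieces.

```python
from typing import Any, Callable, List


def native_send(buf: list, count: int, dest: int, tag: int) -> str:
    """Stand-in for the native library's entry point (PMPI_Send in real MPI)."""
    return f"sent {count} item(s) to rank {dest} with tag {tag}"


def make_checked(native: Callable[..., Any], log: List[str]) -> Callable[..., Any]:
    """Return a wrapper that records the call and its parameters, then forwards it."""
    def wrapper(*args: Any) -> Any:
        # Analysis step: the tool sees every argument before the native library does.
        log.append(f"{native.__name__}{args!r}")
        # Forwarding step: the application still gets the native behaviour.
        return native(*args)
    return wrapper


log: List[str] = []
# The application keeps calling "send"; only the binding behind the name changes,
# which is why the application's source code does not have to be modified.
send = make_checked(native_send, log)
result = send([1.0, 2.0], 2, 1, 99)
print(result)
print(log)
```

In this model the wrapper only records the call; an actual checker would run its verification at the point where the sketch appends to the log.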
The checking tool adds an additional MPI process for
all global tasks that cannot be handled within the
context of a single MPI process, like deadlock detection.
Information is exchanged between the MPI processes and this
additional debug process using MPI. For the application,
this additional process is transparent. Another global
task is the control of the execution flow, i.e. the
execution will be serialized if this option is chosen
by the user. Local tasks are performed in a distributed
way on every process of the parallel application,
for example verification of resources like communicators,
groups, datatypes, operators or verification of other
parameters like ranks, tags, etc.
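To make the idea of a global task concrete, the sketch below shows one way a central process could recognize a deadlock once every application process has reported which rank it is blocked on. The text above does not say how MARMOT detects deadlocks, so the wait-for-graph cycle search used here is an assumed, textbook technique, and find_deadlock and waits_for are hypothetical names.

```python
from typing import Dict, List, Optional


def find_deadlock(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the ranks forming a wait cycle, or None if there is no cycle.

    waits_for maps a blocked rank to the single rank it is waiting on;
    ranks that are not blocked are simply absent from the mapping.
    """
    for start in sorted(waits_for):
        path: List[int] = []
        seen = set()
        rank = start
        # Follow the chain of blocked ranks until it ends or repeats.
        while rank in waits_for and rank not in seen:
            seen.add(rank)
            path.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            # The chain returned to a rank already on the path: a cycle.
            return path[path.index(rank):]
    return None


# Ranks 0 and 1 each wait for a message from the other.
print(find_deadlock({0: 1, 1: 0}))
# Rank 0 waits on 1 and rank 1 waits on 2, but rank 2 is still running.
print(find_deadlock({0: 1, 1: 2}))
```

No single process can run this check from its own state alone, because each process knows only its own pending call. That is the reason such a task is delegated to one additional process that collects the reports.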

Figure 2: Architecture of MARMOT
Current Status
MARMOT supports the complete MPI-1.2 standard and
provides the functionality described above. For example,
when using the non-blocking call MPI_Isend the tool
automatically checks whether the communicator is valid,
i.e. that it is not MPI_COMM_NULL and that it is a communicator
that has been created and registered with the appropriate
MPI calls. Similarly, MARMOT also verifies whether the data
type is valid, i.e. that it is not MPI_DATATYPE_NULL and
that it has been created and registered properly. The
tool also inspects the validity of the count, rank
and tag parameters and the proper handling of the
request argument, i.e. whether an unregistered request
is used or an active request is being recycled.
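The local checks listed for MPI_Isend can be modelled in a few lines. This is a simplified sketch under stated assumptions, not MARMOT's implementation: handles are represented as plain strings, the class IsendChecker and its registries are hypothetical, and special values such as MPI_PROC_NULL or the implementation-dependent upper bound on tags are ignored.

```python
from typing import Dict, List, Set

COMM_NULL = "MPI_COMM_NULL"          # string standing in for the MPI null handle
DATATYPE_NULL = "MPI_DATATYPE_NULL"  # string standing in for the MPI null handle


class IsendChecker:
    """Toy registry of handles; every name in this class is hypothetical."""

    def __init__(self, comm_sizes: Dict[str, int], datatypes: Set[str]) -> None:
        self.comm_sizes = dict(comm_sizes)        # registered communicator -> size
        self.datatypes = set(datatypes)           # registered data types
        self.active_requests: Set[str] = set()    # requests not yet completed

    def check(self, count: int, datatype: str, dest: int, tag: int,
              comm: str, request: str) -> List[str]:
        """Return the problems found for one simulated MPI_Isend call."""
        problems: List[str] = []
        if comm == COMM_NULL:
            problems.append("communicator is MPI_COMM_NULL")
        elif comm not in self.comm_sizes:
            problems.append("communicator was never registered")
        elif not 0 <= dest < self.comm_sizes[comm]:
            # The rank can only be judged against a valid communicator.
            problems.append("destination rank out of range")
        if datatype == DATATYPE_NULL:
            problems.append("datatype is MPI_DATATYPE_NULL")
        elif datatype not in self.datatypes:
            problems.append("datatype was never registered")
        if count < 0:
            problems.append("count is negative")
        if tag < 0:
            problems.append("tag is negative")
        if request in self.active_requests:
            problems.append("active request is being reused")
        if not problems:
            # A clean call starts a new operation, so its request becomes active.
            self.active_requests.add(request)
        return problems


checker = IsendChecker({"MPI_COMM_WORLD": 4}, {"MPI_INT"})
print(checker.check(10, "MPI_INT", 1, 0, "MPI_COMM_WORLD", "req0"))
print(checker.check(10, "MPI_INT", 1, 0, "MPI_COMM_WORLD", "req0"))
```

The second call passes the same request while the first operation is still pending, which the model reports as a reused active request.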
The tool has been tested successfully with internal
test suites, benchmarks, and applications from the
CrossGrid project [8]. It has been ported to IA32
and IA64 clusters with MPICH [4] as well as to IBM,
Hitachi, Cray and NEC platforms. Performance measurements
show that attaching MARMOT to an application introduces
an inevitable but still tolerable overhead [5, 6].
In the future we will improve MARMOT’s performance,
extend the number of tests and add further functionality
according to the users’ needs [7].
Acknowledgements
The development of MARMOT is supported by the European
Union through the IST-2001-32243 project “CrossGrid”
[8].
References
[1] Message Passing Interface Forum.
MPI: A Message Passing Interface Standard, June 1995.
http://www.mpi-forum.org/
[2] Message Passing Interface Forum.
MPI-2: Extensions to the Message Passing Interface, July 1997.
http://www.mpi-forum.org/
[3] MARMOT
http://www.hlrs.de/organization/tsc/projects/marmot/
[4] MPICH
http://www-unix.mcs.anl.gov/mpi/mpich/
[5] B. Krammer, K. Bidmon, M. S. Müller, M. M. Resch.
MARMOT, an MPI Analysis and Checking Tool. In Proceedings
of ParCo2003, Dresden, Germany. Elsevier.
[6] B. Krammer, M. S. Müller, M. M. Resch.
MPI Application Development Using the Analysis Tool
MARMOT. In M. Bubak, G. D. van Albada, P. M. Sloot,
and J. J. Dongarra, editors, Computational Science
- ICCS 2004, Volume 3038 of Lecture Notes in Computer
Science, pp. 464-471, Krakow, Poland, 2004. Springer.
[7] B. Krammer, M. S. Müller, M. M. Resch.
MPI I/O Analysis and Error Detection with MARMOT.
In D. Kranzlmüller, P. Kacsuk, and J. J. Dongarra,
editors, Recent Advances in Parallel Virtual Machine
and Message Passing Interface, Volume 3241 of Lecture
Notes in Computer Science, pp. 242-250, 11th European
PVM/MPI Users’ Group Meeting, Budapest,
Hungary, 2004. Springer.
[8] CrossGrid
http://www.eu-crossgrid.org