cl-mpi

2016-08-25

cl-mpi provides convenient CFFI bindings for the Message Passing Interface (MPI). MPI is typically used in High Performance Computing to program large parallel computers with thousands of cores. It features minimal communication overhead, with latencies in the range of microseconds. Compared to the C or FORTRAN interface of MPI, cl-mpi relieves the programmer of working with raw memory pointers and a plethora of mandatory function arguments.

If you have questions or suggestions, feel free to contact me (marco.heisig@fau.de).

cl-mpi has been tested with MPICH, MPICH2, IntelMPI and Open MPI.

Usage

An MPI program must be launched with mpirun or mpiexec. These commands spawn multiple processes depending on your system and command-line parameters. Each process is identical, except that it has a unique rank that can be queried with (MPI-COMM-RANK). The ranks are assigned from 0 to (- (MPI-COMM-SIZE) 1).
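
For illustration, a minimal program that merely reports its own rank could look like the following sketch. It assumes that, besides MPI-COMM-RANK and MPI-COMM-SIZE, the library also provides MPI-INIT and MPI-FINALIZE and that these symbols are accessible in the current package; see the examples directory for authoritative code.

   ;; hello.lisp -- hypothetical sketch, not taken from the examples directory
   (mpi-init)                             ; initialize the MPI environment
   (format t "Hello from rank ~D of ~D.~%"
           (mpi-comm-rank)                ; this process' unique rank
           (mpi-comm-size))               ; total number of processes
   (mpi-finalize)                         ; shut MPI down before exiting

Launched with mpiexec -n 4, such a program prints four greetings, one per rank, in nondeterministic order.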

A wide range of communication functions is available to transmit messages between different ranks.
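
As a rough illustration, transmitting a small buffer of double-floats from rank 0 to rank 1 might look like the sketch below. It assumes point-to-point functions named MPI-SEND and MPI-RECV that take the buffer and the peer rank as arguments, and it uses the static-vectors library to allocate a buffer at a fixed memory address; consult the examples directory for the exact calling conventions.

   ;; Hypothetical point-to-point sketch.
   (mpi-init)
   (let ((rank (mpi-comm-rank))
         (buffer (static-vectors:make-static-vector
                  3 :element-type 'double-float :initial-element 0d0)))
     (unwind-protect
          (cond ((= rank 0)
                 ;; Rank 0 fills the buffer and sends it to rank 1.
                 (replace buffer #(1d0 2d0 3d0))
                 (mpi-send buffer 1))
                ((= rank 1)
                 ;; Rank 1 receives the three numbers into its own buffer.
                 (mpi-recv buffer 0)
                 (format t "Rank 1 received ~A~%" buffer)))
       ;; Release the statically allocated buffer in any case.
       (static-vectors:free-static-vector buffer)))
   (mpi-finalize)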

To become familiar with cl-mpi, see the examples directory. When cl-launch is installed, each example can be invoked like this:

mpiexec -n 3 ./examples/ring.lisp

However, it is usually faster and more resource-friendly to first compile your Lisp application to a stand-alone executable and then launch it with mpiexec. To illustrate this, the scripts directory contains the script build-cl-mpi-application.sh (just a very thin layer on top of cl-launch), which can be used like this:

./scripts/build-cl-mpi-application.sh ./examples/ring.lisp ring
mpiexec -n 4 ring

Testing

To run the test suite:

   ./scripts/run-test-suite.sh all

or

   ./scripts/run-test-suite.sh YOUR-FAVOURITE-LISP

Performance

cl-mpi makes no additional copies of transmitted data and therefore achieves the same bandwidth as any other language (C, FORTRAN). However, the convenience of error handling, automatic inference of message types, and safe computation of memory locations adds a small overhead to each message. The exact overhead varies with the Lisp implementation and platform, but is somewhere around 1000 machine cycles.

Summary:
  • latency increase per message: 400 nanoseconds (SBCL on a 2.4 GHz Intel i7-5500U)
  • bandwidth unchanged

Authors

  • Alex Fukunaga
  • Marco Heisig

Special Thanks

This project was funded by KONWIHR (The Bavarian Competence Network for Technical and Scientific High Performance Computing) and the Chair for Applied Mathematics 3 of Prof. Dr. Bänsch at the FAU Erlangen-Nürnberg.

Big thanks to Nicolas Neuss for all the useful suggestions.

License

MIT