cl-mpi

2018-07-11

cl-mpi provides convenient CFFI bindings for the Message Passing Interface (MPI). MPI is typically used in High Performance Computing to utilize big parallel computers with thousands of cores. It features minimal communication overhead with a latency in the range of microseconds. In comparison to the C or FORTRAN interface of MPI, cl-mpi relieves the programmer from working with raw pointers to memory and a plethora of mandatory function arguments.

If you have questions or suggestions, feel free to contact me (marco.heisig@fau.de).

cl-mpi has been tested with MPICH, MPICH2, IntelMPI and Open MPI.

Usage

An MPI program must be launched with mpirun or mpiexec. These commands spawn multiple processes depending on your system and commandline parameters. Each process is identical, except that it has a unique rank that can be queried with (MPI-COMM-RANK). The ranks are assigned from 0 to (- (MPI-COMM-SIZE) 1). A wide range of communication functions is available to transmit messages between different ranks. To become familiar with cl-mpi, see the examples directory.

The easiest way to deploy and run cl-mpi applications is by creating a statically linked binary. To do so, create a separate ASDF system like this:

(defsystem :my-mpi-app
  :depends-on (:cl-mpi)
  :defsystem-depends-on (:cl-mpi-asdf-integration)
  :class :mpi-program
  :build-operation :static-program-op
  :build-pathname "my-mpi-app"
  :entry-point "my-mpi-app:main"
  :serial t
  :components
  ((:file "foo") (:file "bar")))

and simply run

(asdf:make :my-mpi-app)

on the REPL. Note that not all Lisp implementation support the creation of statically linked binaries (actually, we only tested SBCL so far). Alternatively, you can try to use uiop:dump-image to create binaries.

Further remark: If the creation of statically linked binaries with SBCL fails with something like "undefined reference to main", your SBCL is probably not built with the :sb-linkable-runtime feature. You are affected by this when (find :sb-linkable-runtime *features*) returns NIL. In that case, you have to compile SBCL yourself, which is as simple as executing the following commands (where SOMEWHERE is the desired installation folder):

git clone git://git.code.sf.net/p/sbcl/sbcl
cd sbcl
sh make.sh --prefix=SOMEWHERE --fancy --with-sb-linkable-runtime --with-sb-dynamic-core
cd tests && sh run-tests.sh
sh install.sh

Testing

To run the test suite:

   ./scripts/run-test-suite.sh all

or

   ./scripts/run-test-suite.sh YOUR-FAVOURITE-LISP

Performance

cl-mpi makes no additional copies of transmitted data and has therefore the same bandwidth as any other language (C, FORTRAN). However the convenience of error handling, automatic inference of the message types and safe computation of memory locations adds a little overhead to each message. The exact overhead varies depending on the Lisp implementation and platform but is somewhere around 1000 machine cycles.

Summary:
  • latency increase per message: 400 nanoseconds (SBCL on a 2.4GHz Intel i7-5500U)
  • bandwidth unchanged

Authors

  • Alex Fukunaga
  • Marco Heisig

Special Thanks

This project was funded by KONWIHR (The Bavarian Competence Network for Technical and Scientific High Performance Computing) and the Chair for Applied Mathematics 3 of Prof. Dr. B?nsch at the FAU Erlangen-N?rnberg.

Big thanks to Nicolas Neuss for all the useful suggestions.

Author
Marco Heisig <marco.heisig@fau.de>
License
MIT
Categories
distributed