MPI on Debian
Adam Powell
Last modified May 23, 2003
The Message Passing Interface, or MPI, is a powerful set of communication
tools on which to build parallel processing programs. Many existing packages
use it, and it's not very difficult to set up on Debian.
The first thing you should know is that there are (at least) two
implementations of the MPI standard, called mpich and lam,
which are source compatible but not binary compatible! This means that
a program built against mpich will not run with lam, and vice versa.
However, both can be installed at the same time, and since Debian packages
specify which one they depend on, you usually need not worry much about the
difference, unless you plan to build or rebuild MPI software yourself.
Opinions vary, but the general view seems to be that lam has
slightly better performance, can operate with a heterogeneous network of
machines, and can do neat tricks like adding and removing machines on the fly.
However, lam needs a couple of extra setup steps (its bhost files and the
lamboot command), as discussed below.
The HOWTO
So, you have an application which depends on mpich or lam. Now what?
- Set up NIS (or another authentication mechanism) and
NFS on your cluster machines, so your users can log in everywhere with the
same home directory.
- Install your application on all of the cluster machines. This will
automatically pull in the MPI implementation it depends on (mpich or lam),
along with rsh-client and rsh-server.*
- Edit /etc/hosts.equiv and add the names of all of the machines
in the cluster (or let update-cluster-hosts do that once
bug #194460 is fixed).
- Run "update-alternatives --display rsh" to make sure
rsh, and not ssh, is linked to
/usr/bin/rsh, and if necessary, "update-alternatives --config
rsh" to set that link properly.
- Test rsh with "rsh <machine> ls", "rsh <machine> w", or
similar, where <machine> is the name of another machine in the
cluster. You should now be able to run arbitrary commands on any
machine from any other.
- Install either the mpich or the lam3-dev package. If you
install both, run "update-alternatives --display mpi" to check
which one is active, and "update-alternatives --config mpi" to
change it.
- Build the MPI-using application.
- Edit /etc/mpich/machines.LINUX (for mpich), or /etc/lam/bhost.def
and bhost.lam (for lam), adding the names of all of the cluster machines.
If a machine has multiple processors, you can list it once per processor.
Note: the update-cluster package can manage
/etc/mpich/machines.LINUX automatically; just build the
cluster.xml file and run "update-cluster-regenerate
mpich".
- For lam, each user must run "lamboot -v" to start lam on the
cluster machines; you may want to check first with recon that lamboot
will succeed. The lamboot and recon man pages are worth reading.
- Run the application (as an ordinary user) using "mpirun -np #
<appname> <args>", where # is the number of processes to start
(typically one per processor), <appname> is the program name, and
<args> are that program's arguments. It will automatically run
on all of your cluster machines, communicating via the network. Isn't that
cool?
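The host-file edits in the steps above (one name per line in /etc/hosts.equiv, and each machine repeated once per processor in the machine list) are easy to generate from a single host table. Here is a minimal Python sketch of that, assuming a hypothetical dict mapping hostnames to processor counts; the hostnames are made up, and the output is meant to be reviewed before it goes into /etc/hosts.equiv, /etc/mpich/machines.LINUX, or lam's bhost.def.

```python
"""Sketch: render hosts.equiv and an MPI machine list from one host table.

The host table and hostnames below are hypothetical examples.
"""


def hosts_equiv(hosts):
    # hosts.equiv: one trusted machine name per line.
    return "".join(f"{name}\n" for name in hosts)


def machines_file(hosts):
    # Machine list (e.g. machines.LINUX): repeat each machine once per
    # processor, as the HOWTO suggests for multiprocessor nodes.
    lines = []
    for name, cpus in hosts.items():
        if cpus < 1:
            raise ValueError(f"{name}: processor count must be at least 1")
        lines.extend([name] * cpus)
    return "".join(f"{line}\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical cluster: node1 is a dual-processor machine.
    cluster = {"node1": 2, "node2": 1, "node3": 1}
    print(hosts_equiv(cluster), end="")
    print("----")
    print(machines_file(cluster), end="")
```

Summing the processor counts (4 in this hypothetical cluster) gives a natural value for the -np argument of mpirun.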
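Testing rsh by hand gets tedious on a larger cluster. A rough Python sketch of the same check, looping "rsh <machine> ls" over every host, might look like this; the hostnames are hypothetical, and it assumes /usr/bin/rsh really is rsh, as set up above. Since classic rsh does not pass back the remote command's exit status, a zero status here mainly shows that the connection and hosts.equiv authentication worked.

```python
"""Sketch: run "rsh <machine> <command>" against every cluster node.

Hostnames are hypothetical; assumes /usr/bin/rsh points to rsh, not ssh.
"""
import subprocess


def check_rsh(hosts, command="ls", run=subprocess.run, timeout=10):
    """Return the hosts where "rsh <host> <command>" did not succeed.

    `run` is injectable so the loop can be exercised without a cluster.
    """
    failed = []
    for host in hosts:
        try:
            result = run(["rsh", host, command],
                         stdin=subprocess.DEVNULL,  # like rsh -n
                         capture_output=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            # rsh is not installed, or the host never answered.
            failed.append(host)
            continue
        if result.returncode != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    bad = check_rsh(["node1", "node2", "node3"])
    print("rsh OK everywhere" if not bad else f"rsh failed on: {', '.join(bad)}")
```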
*SECURITY NOTE: An rsh server is a VERY dangerous thing to
install on machines with Internet access. Anyone who knows that your machine
runs it, and the name of any user on the system, can IP-spoof an rlogin/rsh
connection so that it appears to come from a trusted machine, and log in
as that user. See, for example, this old page on IP
spoofing (the 1.3.91 kernel and Netscape 2.0 it describes date from around
1995); with Microsoft including IP spoofing in Windows XP, this kind of
activity is certain to increase. Be sure to run rsh only on a private network
behind a firewall, or on a single Internet-connected
machine with only localhost in /etc/hosts.equiv. You've been
warned!
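One way to act on this warning is to check that every entry in /etc/hosts.equiv resolves to a private or loopback address. The following Python sketch does that; it assumes the simple one-name-per-line form used in this HOWTO, skips blank lines and lines starting with "#", and only flags (does not interpret) entries using the "+"/"-" syntax. A private address is of course no substitute for the firewall; this is just a quick sanity check.

```python
"""Sketch: flag /etc/hosts.equiv entries that are not private or loopback.

Assumes one hostname (optionally followed by a user) per line; "+"/"-"
entries are reported for manual review rather than interpreted.
"""
import ipaddress
import socket


def public_entries(lines, resolve=socket.gethostbyname):
    """Return (entry, reason) pairs for entries that look unsafe or unclear."""
    flagged = []
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        name = fields[0]
        if name.startswith(("+", "-")):
            flagged.append((name, "uses +/- syntax, check by hand"))
            continue
        try:
            addr = ipaddress.ip_address(resolve(name))
        except (OSError, ValueError):
            # Lookup failed, or the result was not a valid IP address.
            flagged.append((name, "does not resolve"))
            continue
        if not (addr.is_private or addr.is_loopback):
            flagged.append((name, f"public address {addr}"))
    return flagged


if __name__ == "__main__":
    try:
        with open("/etc/hosts.equiv") as f:
            problems = public_entries(f)
    except OSError as exc:
        problems = []
        print(f"cannot read /etc/hosts.equiv: {exc}")
    for name, reason in problems:
        print(f"{name}: {reason}")
```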
Share and enjoy!
Adam Powell