Astronomical Data Processing on Linux Cluster
Navtej Singh
Version 1.0
DISCLAIMER
The author has placed this work in the Public Domain, thereby relinquishing all copyrights.
Everyone is free to use, modify, republish, sell or give away this work without prior consent
from anybody.
This documentation is provided on an "as is" basis, without warranty of any kind. Use at your
own risk! Under no circumstances shall the author(s) or contributor(s) be liable for damages
resulting directly or indirectly from the use or non-use of this documentation.
Revisions:
Contents
1 Introduction
2 Requirements
2.1 Hardware
2.2 Software
4 Testing
6 Troubleshooting
References
1 Introduction
A Beowulf cluster is a collection of dedicated computing nodes made with commodity class hardware, connected using commercial off-the-shelf (COTS) network interfaces, and running open source infrastructure [1]. It can be used for High Availability (HA) or High Performance (HP) applications. Technically, a cluster of workstations isn't a Beowulf, as the workstations are not dedicated to the cluster but perform other tasks as well. For the present work, we will use "cluster of workstations" and "Beowulf cluster" interchangeably, as the processor and network load of those other tasks is minimal.
The Beowulf cluster described in this document was created for high performance computing and can easily be scaled up to include more computing nodes.
We will start by listing the hardware and software requirements for creating such a cluster (Section-2). A step-by-step procedure for constructing the cluster is discussed in Section-3. Basic sanity tests to check every part of the cluster are discussed in Section-4. Two parallel astronomical data processing programs are used to highlight the power of the cluster for such tasks (Section-5). Some of the issues that may arise during cluster construction, and their resolution, are outlined in Section-6.
2 Requirements
2.1 Hardware
Commodity hardware is used to create the Beowulf cluster. Such a cluster can be heterogeneous, i.e. computing nodes made of personal computers, laptops, headless (and diskless) machines, etc. Similarly, the network interfaces between the machines can be commercial off-the-shelf. In the present configuration, two personal computers (quad core machines) and one MacBook Pro were connected through a gigabit switch (1000Base-T). A router or hub can also be used instead of a switch, although most of the routers available for home and office use only support 10Base-T and 100Base-TX networking. Machines in the cluster were able to talk to the outside world through a router (optional). Refer to Section-3.2 for the cluster networking layout and configuration.
Hardware specifications of the machines and network devices in the cluster are listed in Table 1.
2.2 Software
Theoretically, a cluster with different operating systems (OS) on the nodes can be constructed, but to keep things simple, a 32-bit Linux operating system was used as the base OS on all the nodes. Ubuntu Linux was installed natively on Node1, whereas it was installed as a virtual machine on Node2 and Node3 using the open source virtualization software VirtualBox. The additional software needed for a functional Linux Beowulf cluster is described below.
VirtualBox allows running multiple virtual machines (operating systems) on the same machine and can utilize up to 32 virtual processor cores, although setting the number of virtual cores equal to the number of physical processor cores is recommended for better performance. Details about installing and configuring the software are discussed in Section-3.
The number of virtual processors can be set under the System configuration panel. It is recommended that the number of virtual processors be equal to the actual number of processors (or cores) in the machine. For Node2, the memory for the virtual machine was set to 3.6 GB (out of 6 GB available), and to 1.8 GB for the Node3 virtual machine (out of 4 GB available). Please refer to VirtualBox's user manual for details about VirtualBox configuration parameters and options.
As the cluster includes only three nodes, Class-C network addressing was used. For a network with hundreds or thousands of nodes, Class-B or Class-A network addressing can be used [7]. Class-C addressing uses 192.168.*.* internet protocol (IP) addresses. Nodes on the cluster were assigned static IP addresses (i.e. 192.168.1.x, where x varies from 2 to 254). To access the machines using their host names (instead of IP addresses), the following lines were added to the /etc/hosts file (on all three machines):
192.168.1.2 Node1
192.168.1.3 Node2
192.168.1.4 Node3
The network interfaces on the nodes were configured with static IP addresses using the following commands (shown for Node1) -
1. Open the network configuration file for editing (root access needed) using vi (or any other text editor) -
$ sudo vi /etc/network/interfaces
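The next few steps (the actual contents of the interfaces file) are missing from this copy; a minimal static configuration for eth0 on Node1 would look something like the lines below (the netmask and gateway values are assumptions and should match your own network) -
auto eth0
iface eth0 inet static
address 192.168.1.2
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.1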
5. Restart the networking service (or stop and start the eth0 interface) -
$ sudo /etc/init.d/networking restart
Similarly, static IP addresses were configured on Node2 and Node3 (the only difference being the IP address).
The user mpiu is created with /home/mpiu as its home directory, bash as its default shell, and mpiu as its primary group. Set the password using the following command -
$ sudo passwd mpiu
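The command used to create the user is not shown above; one way to create the mpiu group and user with these properties is sketched below (the UID/GID value 1010 is only an example) -
$ sudo groupadd -g 1010 mpiu
$ sudo useradd -u 1010 -g mpiu -d /home/mpiu -m -s /bin/bash mpiu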
1. To find the user ID (UID) and group ID (GID) of user mpiu, use the following command -
$ id mpiu
2. To change the user ID of user mpiu, use the following command -
$ sudo usermod -u new_id mpiu
where new_id is the new UID for user mpiu.
4. In case you made a mistake and want to delete the user (along with its home directory), use the following command -
$ sudo userdel -r mpiu
Important - Set the password, user ID (UID) and group ID (GID) of mpiu to be the same on all the nodes of the cluster.
Install the OpenSSH server on all three nodes from the Ubuntu repository -
$ sudo apt-get install openssh-server
It can also be built from source if the nodes are not connected to the internet.
To have password-less SSH access, public key authentication can be used: generate a private-public key pair (using either the RSA or DSA algorithm) on the machine you will be logging in from, and append the generated public key to the authorized_keys file on the machines you will be logging in to. For our purpose, we will be using RSA keys. Follow the steps below to create the private-public key pair on Node1 and append the public key to authorized_keys on Node2 and Node3:
1. Log in as the mpiu user on Node1 and generate a private-public key pair using the ssh-keygen command -
$ ssh-keygen -t rsa
Leave the passphrase empty when prompted. By default, the public and private keys are generated in the /home/mpiu/.ssh directory -
/home/mpiu/.ssh/id_rsa: Private key
/home/mpiu/.ssh/id_rsa.pub: Public key
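The steps for copying the public key to the compute nodes are missing here; from Node1, logged in as mpiu, it can be appended to authorized_keys on Node2 and Node3 with ssh-copy-id (or by appending it manually) -
$ ssh-copy-id mpiu@Node2
$ ssh-copy-id mpiu@Node3
After this, running ssh Node2 from Node1 should log you in without prompting for a password.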
The simplest way to run parallel programs on a cluster is to create a network file system (NFS) share on the master node (ideally the fastest machine, with a large hard disk) and mount it on all the other compute nodes in the cluster. In our case, Node1 is the master node running the NFS server, and its exported directory is mounted on Node2 and Node3.
Install NFS server on Node1 using the following command -
$ sudo apt-get install nfs-kernel-server
Node1, Node2 and Node3 have to be configured so that the network file system on Node1 can be mounted on Node2 and Node3 at boot time. Follow the steps below to configure the cluster nodes:
NODE1
4. Add new entries to /etc/exports to give Node2 and Node3 write access to the network file system on Node1 -
/mirror/mpiu Node2(rw,async,subtree_check,tcp,nohide)
/mirror/mpiu Node3(rw,async,subtree_check,tcp,nohide)
Note: The NFS export on Node1 can also be mounted manually (on Node2 or Node3) using the following command -
$ sudo mount -t nfs -o async,tcp Node1:/mirror/mpiu /mirror/mpiu
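To mount the export automatically at boot time (as described above), the mount point is created on Node2 and Node3 and a line along the following is added to /etc/fstab (a sketch; the mount options should mirror the export options) -
$ sudo mkdir -p /mirror/mpiu
and in /etc/fstab:
Node1:/mirror/mpiu /mirror/mpiu nfs rw,async,tcp 0 0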
There are a couple of distributed file system alternatives to NFS that were developed specifically for high performance computing. One of these open source distributed file systems is GlusterFS, which is already included in Ubuntu's software repository. Follow the steps below to install and configure the GlusterFS server and client -
NODE1
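The GlusterFS steps themselves were lost in this copy. Very roughly, and depending heavily on the GlusterFS version shipped by Ubuntu (the volume name cluster-vol and brick path /export/gluster below are assumptions), the setup looks like -
$ sudo apt-get install glusterfs-server
$ sudo gluster volume create cluster-vol Node1:/export/gluster
$ sudo gluster volume start cluster-vol
NODE2 and NODE3
$ sudo apt-get install glusterfs-client
$ sudo mount -t glusterfs Node1:/cluster-vol /mirror/mpiu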
The Image Reduction and Analysis Facility (IRAF) from the National Optical Astronomy Observatories (NOAO) is one of the leading software packages used by professional astronomers for astronomical image and data processing. Space Telescope Science Institute's PyRAF provides a Python wrapper for IRAF, which allows scripting in the user-friendly Python programming language. ESO's Scisoft combines IRAF, PyRAF and many other astronomical software packages in a single, easy-to-install bundle. Follow the steps below to install it on all the nodes -
1. Download the latest tar version of Scisoft from ESO's FTP site.
2. Scisoft is developed for Fedora Linux, and a few of its package dependencies are missing from Ubuntu. Install the following packages from the Ubuntu repository -
$ sudo apt-get install tcsh libgfortran3 libreadline5
$ sudo apt-get install libsdl-image1.2 libsdl-ttf2.0-0 unixodbc
3. Also download the following two packages from Ubuntu's archive website -
libg2c0_3.4.6-8ubuntu2_i386.deb gcc-3.4-base_3.4.6-8ubuntu2_i386.deb
Install using dpkg command -
$ sudo dpkg -i gcc-3.4-base_3.4.6-8ubuntu2_i386.deb
$ sudo dpkg -i libg2c0_3.4.6-8ubuntu2_i386.deb
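The remaining Scisoft steps are missing in this copy. Installation essentially amounts to unpacking the tarball (it unpacks under /scisoft) and sourcing the Scisoft setup script from the shell startup file; a sketch, assuming the bash setup script shipped with recent Scisoft releases -
$ cd /
$ sudo tar xvfz /path_to_scisoft/scisoft-ver.tar.gz
$ echo ". /scisoft/bin/Setup.bash" >> /home/mpiu/.bashrc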
The Message Passing Interface (MPI) protocol is one of the most common message passing systems used in parallel computing. The MPI-2 library MPICH2 from Argonne National Laboratory (ANL) is used here for its ease of use and extensive documentation. It can be installed from the Ubuntu software repository or built from the latest source code from ANL's website. It has to be installed on all the cluster nodes.
$ sudo apt-get install mpich2
Any other MPI library can also be used.
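A quick sanity check of the installation on each node (not from the original text) is to run a trivial command under mpiexec; with MPICH2's Hydra process manager, for example -
$ mpiexec -n 2 hostname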
MPI for Python (mpi4py) provides bindings of MPI for the Python programming language. The mpi4py module was chosen for its maturity and ease of use, although its documentation is scarce. Download the latest version of the software from mpi4py's Google Code website and install it on all the nodes of the cluster. Issue the following commands to install it (ver is the mpi4py version number) -
$ tar xvfz /path_to_mpi4py/mpi4py-ver.tar.gz
$ cd mpi4py-ver
$ sudo python setup.py install
It can also be installed using Python setuptools (see the Python website for more details on setuptools).
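For the setuptools route mentioned above, and as a quick check that the module is importable afterwards, something like the following can be used (a sketch, not from the original text) -
$ sudo easy_install mpi4py
$ python -c "import mpi4py; print mpi4py.__version__"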
The Torque resource manager is the open source counterpart of the commercially available PBS resource manager, and is one of the most commonly used resource managers for high performance computing. Torque has two components - server and client. The server is installed on the master node and the client on all the compute nodes of the cluster. Start the installation with the Torque server on Node1 (refer to the PBS/Torque admin manual for full details) -
1. Download the latest Torque source code from the Cluster Resources website.
2. Unarchive the source code into the /usr/local/src directory, as this will make it easy to uninstall or update the package in the future.
3. Run the following commands to compile, link and install the default libraries -
$ cd /usr/local/src/torque-ver
$ sudo ./configure
$ sudo make
$ sudo make install
By default, Torque files are installed in the /var/spool/torque directory ($TORQUEHOME henceforth). Refer to the Torque admin manual for how to install Torque in a non-default directory.
4. Use Cluster Resources' tpackage mechanism to create tarballs for the compute nodes by running the following command in the source directory on Node1 -
$ sudo make packages
Copy the mom (torque-package-mom-linux-i686.sh) and client (torque-package-clients-linux-i686.sh) packages to Node2 and Node3 and run the following commands to install the Torque client on the compute nodes -
$ sudo sh torque-package-mom-linux-i686.sh --install
$ sudo sh torque-package-clients-linux-i686.sh --install
5. Enable Torque as a service on the server (Node1) and the clients (Node1, Node2 and Node3) by copying the startup scripts from the source package -
Node1
$ sudo cp contrib/init.d/debian.pbs_server /etc/init.d/pbs_server
$ sudo update-rc.d pbs_server defaults
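The corresponding client startup script (pbs_mom) can be enabled in the same way on Node1, Node2 and Node3; a sketch, assuming the Debian init script shipped in the Torque source tree -
$ sudo cp contrib/init.d/debian.pbs_mom /etc/init.d/pbs_mom
$ sudo update-rc.d pbs_mom defaults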
6. Now we need to initialize and configure the Torque server on Node1. The Torque server's serverdb file contains the configuration information of pbs_server and its queues. Run the following commands to initialize serverdb and restart the server -
$ sudo pbs_server -t create
$ sudo /etc/init.d/pbs_server restart
This will initialize basic server parameters and create a single batch queue.
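The server and queue attributes can be inspected and adjusted through qmgr if needed; for example (illustrative commands, not from the original text) -
$ sudo qmgr -c "print server"
$ sudo qmgr -c "set server scheduling = true"
$ sudo qmgr -c "set queue batch enabled = true"
$ sudo qmgr -c "set queue batch started = true"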
7. Compute nodes can be added to the server either dynamically using qmgr or by manually updating the nodes file. The compute nodes Node1, Node2 and Node3 are added to the Torque server -
Dynamically
$ sudo qmgr -c "create node Node1"
$ sudo qmgr -c "create node Node2"
$ sudo qmgr -c "create node Node3"
Manually
Update $TORQUEHOME/server_priv/nodes file and insert the following three lines (for
the three compute nodes) -
Node1 np=4 cluster01 RAM4GB
Node2 np=4 cluster01 RAM3GB
Node3 np=2 cluster01 RAM2GB
We have assumed 4 virtual processors each for Node1 and Node2 and 2 for Node3. The number of virtual processors (np) can be greater than the number of actual processors (cores) on the node.
8. Restart the torque server on Node1 and start torque client on all the compute nodes -
Node1
$ sudo /etc/init.d/pbs_server restart
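The client-side commands for this step are missing in this copy; on each compute node, pbs_mom can be started through the init script installed earlier (a sketch) -
Node1, Node2 and Node3
$ sudo /etc/init.d/pbs_mom start
From Node1, running $ pbsnodes -a should then report all three nodes as free once the server and clients can talk to each other.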
Torque's own scheduler, pbs_sched, is very basic, so the open source Maui job scheduler for clusters and supercomputers is used instead. It only needs to be installed on the master node. Follow the steps below to install it on Node1 -
1. Download the Maui scheduler from the Cluster Resources website. Registration is required before downloading the software.
2. Unarchive the source code in the /usr/local/src directory (as root) and run the following commands to install it (ver is the Maui version number) -
$ cd /usr/local/src
$ sudo tar xvfz /path_to_maui/maui-ver.tar.gz
$ cd maui-ver
$ sudo ./configure
$ sudo make
$ sudo make install
By default, files are installed in /usr/local/maui directory.
3. Start the scheduler -
$ sudo /usr/local/bin/maui &
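Once Maui is running, its view of the cluster and the job queue can be checked with its showq utility (a quick check, not from the original text) -
$ showq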
Note - Maui can be started at boot time by creating a service script for it and placing it in the /etc/init.d/ directory.
Torque and Maui are optional and are not strictly required to run parallel jobs, but they make it much easier to administer a large number of batch jobs on bigger installations.
4 Testing
If all of the previous steps were successful, it's time to test the various components of the cluster. We will start by testing the MPI installation, followed by the Python bindings of MPI, and end with the Torque/Maui resource manager functionality. Log into Node1 as the mpiu user and follow the steps below to start testing -
MPI Testing
1. MPI test programs can be downloaded from ANL's website. There is test code in the C, C++ and FORTRAN programming languages. Place the test code in the /mirror/mpiu directory.
2. We will test the MPI installation using the standard hello-world C code. Save the following code as hello.c in /mirror/mpiu.
Listing 1: hello.c
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_rank, p, source, dest, tag = 0;
    char message[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        /* worker processes send a greeting to process 0 */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        /* process 0 collects and prints the greetings */
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    MPI_Finalize();
    return 0;
}
3. Compile the code with the MPI C compiler wrapper mpicc (mpic++ for C++ programs, mpif77 or mpif90 for FORTRAN programs) -
$ mpicc -o hello hello.c
4. To run the code in parallel mode, the hosts and the number of processes have to be provided. We will be running the code on all three compute nodes using 10 processes -
$ mpiexec -np 10 -host Node1,Node2,Node3 ./hello
This will automatically divide the total number of processes among the three nodes. To control the number of processes started on each node, create a hosts file with the following lines -
Node1:4
Node2:8
Node3:2
Now run the MPI program using the following command -
$ mpiexec -np 10 -f hosts ./hello
If the job fails, run the code on each node separately to pinpoint the problem. For example,
to run the program only on Node2 with 8 processes, execute the following command -
$ mpiexec -np 8 -host Node2 ./hello
Some basic errors encountered during MPI execution are listed in Section 6.
MPI4PY Testing
Listing 2: helloworld.py
#!/usr/bin/env python
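The body of this listing did not survive in this copy. A minimal mpi4py hello-world along the same lines would look something like the following (a sketch; the original code may have differed) -
from mpi4py import MPI

# each process reports its rank, the total number of processes
# and the host it is running on
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()

print "Hello world! I am process %d of %d on %s." % (rank, size, name)

It can be run in the same way as the C example, e.g. $ mpiexec -np 10 -f hosts python helloworld.py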
Torque/Maui Testing
4. To run hello through Torque/Maui, save the following code to a file named pbsjob -
Listing 3: pbsjob
#!/bin/bash
#PBS -N pbsjob
#PBS -q batch
#PBS -l nodes=Node1:ppn=4+Node2:ppn=4+Node3:ppn=2
#PBS -l walltime=1:00:00
#PBS -e stderr.log
#PBS -o stdout.log
#PBS -V
cd $PBS_O_WORKDIR
mpiexec -n 10 ./hello
5. The Python code helloworld.py can be run similarly using the following batch job script -
Listing 4: pbsjob
#!/bin/bash
#PBS -N pbsjob
#PBS -q batch
#PBS -l nodes=Node1:ppn=4+Node2:ppn=4+Node3:ppn=2
#PBS -l walltime=1:00:00
#PBS -e stderr.log
#PBS -o stdout.log
#PBS -V
cd $PBS_O_WORKDIR
mpiexec -n 10 python helloworld.py
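In both cases the job script is submitted to Torque with qsub and can be monitored with qstat; the job output ends up in stdout.log and stderr.log as requested in the script -
$ qsub pbsjob
$ qstat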
5.1 CRBLASTER
A parallel version of van Dokkum's L.A.Cosmic algorithm to remove cosmic rays from astronomical images was developed by Kenneth J. Mighell [8]. It uses the Message Passing Interface protocol and is written in C. We will be using this program to remove cosmic rays from an 800x800 pixel HST WFPC2 (Wide Field Planetary Camera 2) image.
Follow the steps below to install and execute CRBLASTER on our cluster (more details are available on Mighell's website) -
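Steps 1 and 2 (obtaining and unpacking the source) were lost in this copy; roughly, the CRBLASTER tarball is downloaded from Mighell's website and unarchived in the shared directory (the path below is a placeholder) -
$ cd /mirror/mpiu
$ tar xvfz /path_to_crblaster/crblaster.tar.gz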
3. Change to the crblaster directory and build the CFITSIO library (used to handle FITS (Flexible Image Transport System) image files) -
$ make cfitsio
4. Build crblaster using the following command -
$ make
5. Run CRBLASTER on an 800x800 pixel image -
$ cp images/in_800x800.fits in.fits
$ mpiexec -np 10 -f hosts ./crblaster 1 2 5
or, through the batch system,
$ qsub pbs_job
This will generate a clean output image, out.fits. Refer to CRBLASTER's website for details about the input parameters.
5.2 PIX2SKY
The IRAF package STSDAS has a task for transforming pixel coordinates in HST images to sky RA/DEC coordinates, but it processes only one coordinate pair at a time. Running it on an image that requires thousands to millions of coordinate transformations (e.g. an image of a dense globular star cluster) would take a very long time. We have developed a parallel pythonic version of this module - pix2sky.
Pix2sky uses the PyFITS module from STScI (Space Telescope Science Institute). The ESO Scisoft package already includes PyFITS; if you are not using ESO Scisoft, PyFITS can be downloaded from STScI's website and installed locally. To execute pix2sky on the cluster, follow the steps below -
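The first steps (placing the pix2sky package in the shared directory and unarchiving it) are missing here; roughly (placeholder path) -
$ cd /mirror/mpiu
$ tar xvfz /path_to_pix2sky/pix2sky.tar.gz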
3. Apart from the program, the package includes an 800x800 pixel HST image and a file with 1 million X,Y pixel coordinates to be transformed to sky RA,DEC coordinates. Change to the pix2sky directory and execute the following command -
$ mpiexec -n 10 -f hosts python pix2sky.py data/in.fits data/in_xy.cat
or, through the batch system,
$ qsub pbs_job
A PBS/Torque batch job can also be created to execute the software on the cluster. The output is a file with the X,Y pixel coordinates and the corresponding RA,DEC values.
Pix2sky not only runs on a cluster of machines (using the MPI protocol) but can also be executed on a single multicore machine (using Python's multiprocessing module). The multiprocessing module was only introduced in Python 2.6. The latest version of ESO Scisoft (version 7.5) still uses Python 2.5, and therefore the multiprocessing module is not natively available; however, a backport of the multiprocessing module exists for Python 2.4 and 2.5. To run pix2sky on a multicore machine (if using Python < 2.6), download the multiprocessing backport from the Python website and install it locally.
The program automatically detects the number of processors (cores) on the system and utilizes all of them. Execute the following command to run the program on all the cores -
$ python pix2sky_multi.py data/in.fits data/in_xy.cat
The number of processes can be controlled using the -n flag. Refer to the pix2sky help ($ python pix2sky_multi.py --help) for all the program options.
6 Troubleshooting
Some common issues faced during cluster construction and their resolution -
1. Communication error between cluster nodes. There can be many different reasons for communication errors between the nodes. A few things to check -
(a) The network file system (or GlusterFS) is not mounted on all the nodes.
(b) The SSH server is not running or not properly configured on the nodes.
(c) An error in the /etc/hosts file. A hostname should point to one and only one IP address; on many machines the hostname may also be pointing to 127.0.0.1, so comment that line out.
2. Proxy server. Installing Python packages on nodes behind a proxy server may fail. Set the environment variable http_proxy to the proxy server for the root user. In a bash shell, execute -
$ export http_proxy=proxy_server_hostname:port
3. NFS version 4. User and group are assigned to xxxxxxx rather than mpiu. This may give file permission errors while running jobs on the cluster. Set the following parameters in the /etc/default/nfs-common file on the master node (Node1) -
NEED_STATD="no"
NEED_IDMAPD="yes"
Restart the NFS server -
$ sudo /etc/init.d/nfs-kernel-server restart
The NFS server can also be set permanently to run as version 3 by making the following change to /etc/default/nfs-kernel-server -
RPCNFSDCOUNT="16 --no-nfs-version 4"
and restarting the server.
4. SSH password-less login. If ssh still asks for a password even after appending the master node's public key to authorized_keys on the compute nodes, verify that the UID and GID of the mpiu user (and hence of its home directory) on the compute nodes are the same as on the master node.
6. Python.h missing. Python header files are required for compiling the mpi4py and PyFITS Python modules. Install them -
$ sudo apt-get install python-dev
References
[1] R. G. Brown, Engineering a Beowulf-style Compute Cluster.