Tutorial Cluster Knoppix
Level: Introductory
Mayank Sharma ([email protected]), Freelance technical writer
22 Dec 2004

The cluster, a collection of computers that work together, is an important concept in leveraging computing resources because of its ability to transfer workload from an overloaded system (or node) to another system in the cluster. This article explains how to set up a load-balancing Linux cluster using Knoppix-based LiveCDs.

Supercomputer is a generic term for a computer that performs far better than an ordinary computer. A cluster is a collection of computers that are capable of (among other things) transferring workload from an overloaded unit to other computers in the cluster. This feature is called load balancing, and it's what you'll learn to set up in this article. By balancing loads effectively, a cluster improves its efficiency and earns its place in the family of supercomputers.

To pass loads around, the computers in a cluster must be connected to each other. The computers in a cluster are called nodes: a cluster has one or more master nodes and several drone (or slave) nodes. In a typical setup, the master node is where applications are initiated, and it's the master node's responsibility to migrate applications to the drones when required.

In this article, you'll see how to use a Knoppix-based LiveCD to set up your very own supercomputer. You have probably heard of LiveCDs: they're wonderful try-before-you-install complete Linux systems that boot off your CD drive. Since their inception, individuals and projects have used LiveCDs as their demonstration platform, and LiveCDs have come a long way since the early "DemoLinux" days. But first, some background on the supercomputing cluster.
What is a supercomputer?
A supercomputer is typically used for scientific and engineering applications that perform a large amount of computation, handle massive databases, or both. (The term may also refer to systems that are much slower but still impressively quick.) In reality, most supercomputing systems are multiple interlinked computers that perform parallel processing, following one of two general approaches:

- SMP, or symmetric multiprocessing
- MPP, or massively parallel processing

In SMP (also known as "tightly coupled" multiprocessing or a "shared everything" system), the processors share memory and the I/O bus or data path, and a single copy of the operating system controls them all. Sixteen processors is the usual upper limit in most SMP systems. SMP's advantage over MPP shows when performing online transaction processing (OLTP), in which many users employ a simple set of transactions to access the same database; dynamic workload balancing is what allows SMP to shine at this task.

An MPP system (also known as a "loosely coupled" or "shared nothing" system) is characterized by a number of processors, each with its own operating system and memory, that process different parts of a single program at the same time. The system uses a messaging interface and a set of data paths that let the processors communicate with each other, and up to 200 processors can be focused on a single task. Setting up an MPP system can be complicated, since it requires a lot of planning to parcel out system resources and work assignments among the processors (remember, nothing is shared). MPP systems have the advantage in applications in which users need to search a tremendous number of databases at the same time.

The IBM Blue Pacific is a good example of the high end of supercomputing. The 5,800-processor, 3.9-teraflop system (with 2.6 trillion bytes of memory) was built in partnership with Lawrence Livermore National Laboratory to simulate the physics involved in nuclear reactions. Clustering represents the lower end of supercomputing, a more build-it-yourself approach.
One of the most popular and best-known examples is the Beowulf Project, which explains how to use off-the-shelf PC processors, Fast Ethernet, and the Linux operating system to handcraft a supercomputer. See the Resources section below for more information on the Beowulf Project.

Now that you have clustering in the proper context, I'll show you how to set up your own cluster.
Netmask -- 255.255.255.0
Default Gateway -- 192.168.1.1
IP address of Master -- 192.168.1.10
IP address of Drone #1 -- 192.168.1.20

I won't go into detail on networking in Linux. There's a lot of information available; see Resources below.
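If you want to apply these settings by hand, a minimal sketch with the classic ifconfig and route tools looks like the following (the interface name eth0 is an assumption; adjust it for your hardware, or use your distribution's own network configuration utility instead):

# On the master node (eth0 is assumed; check your actual interface name)
ifconfig eth0 192.168.1.10 netmask 255.255.255.0 up
route add default gw 192.168.1.1

# On drone #1
ifconfig eth0 192.168.1.20 netmask 255.255.255.0 up
route add default gw 192.168.1.1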
openMosixview
Bring up the utility by typing its name in a root shell. It detects the number of nodes in the cluster and presents you with a nice, geeky-looking interface. At a glance, you can see the efficiency of the cluster, the load on the cluster, the memory available to the cluster, the percentage of memory used, and other information. You won't see much activity at this point, since the cluster is hardly being used. Spend some time familiarizing yourself with this application.
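For example, from a root shell (the exact binary name can vary by build, so confirm it with tab completion):

openmosixview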
openMosixmigmon
This application shows the processes that have been migrated from the master node to the drone. Move your mouse over one of the squares surrounding the circle in the center, and you'll be shown the name of the process and its ID. To migrate a particular process off the master, drag its square and drop it on the smaller circle (the drone).
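If you prefer the command line, the openMosix userland tools also include a migrate utility (this assumes those tools are installed on your node). As a sketch, to push the process with ID 1234 to node 2 you would run something like:

migrate 1234 2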
openMosixanalyzer
This simple application charts the load on the cluster, and on each individual node, over the whole time the cluster has been up.
mosmon
This command-line-based monitor shows you the load on the cluster, the memory available, memory being used, and other things in real time. Review its man page to understand how you can tailor the view.
mtop
This tool is of interest to anyone who is familiar with top, which keeps track of every process running on the computer. mtop, a cluster-aware variant of top, also displays every process, but adds a column showing the node on which each process is running.
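These monitors won't have much to show until you give the cluster some work. Any long-running, CPU-bound program will do; if you don't have a listing handy, a minimal stand-in along these lines (my own sketch, not the article's original testapp.c) gives openMosix a process worth migrating:

/* testapp.c -- a minimal CPU-bound stand-in (a sketch, not the article's original listing) */
#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    long i;

    /* Burn CPU time so openMosix has a long-running process to migrate.
       Increase the loop count if it finishes too quickly on your hardware. */
    for (i = 1; i < 2000000000L; i++)
        sum += (double)i / (double)(i + 1);

    printf("sum = %f\n", sum);
    return 0;
}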
Open any text editor, copy this program, and save it as testapp.c. Make it available on all the nodes in the cluster. To run it, issue these commands on each node. First, compile the C program:
gcc testapp.c -o testapp
Then, execute ./testapp. Run the program at least once on every node; I executed three instances on both nodes. After starting each instance, toggle back to the applications described above and notice the spurt of activity.
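To launch several instances at once on a node, a quick shell loop works (plain sh syntax; the & puts each copy in the background):

# Start three copies in the background so the cluster has work to spread around
for i in 1 2 3; do ./testapp & done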
Enjoy watching your drawing room cluster migrate processes from one node to another. Look Ma, it's balancing loads!
What's next?
Instead of running ClusterKnoppix on both nodes, you can also set up a heterogeneous cluster once you get the hang of it. In such a cluster, only the master node needs a GUI; on the slaves you can run a distribution that is openMosix-aware but not much bigger than the Linux kernel itself. CHAOS is probably the most popular choice of distribution for a drone node: it has a small memory footprint, which leaves more memory for the cluster, yet it's secure, reliable, and fast.

So what are you waiting for? Show off with your drawing room cluster!
Resources
- Beowulf clusters: e pluribus unum (developerWorks, September 2001) is a fine introduction to Beowulf-style clustering.
- The tutorial Linux clustering with MOSIX (developerWorks, December 2001) explains what it is, how you go about cluster-enabling a Linux system, and what benefits you derive from setting up a cluster.
- Creating a WebSphere Application Server V5 cluster (developerWorks, January 2004) introduces clusters for load balancing and failover support and describes how to set up a cluster with IBM WebSphere Application Server for Linux.
- Find information on Linux networking basics in the developerWorks tutorial series on Linux-powered networking.
- The openMosix Project provides details and updates on this kernel extension.
- The ClusterKnoppix site explains the distribution and offers an ongoing forum for posing questions.
- Wikipedia offers lots of information on LiveCDs.
- Find more resources for Linux developers in the developerWorks Linux zone.
- Download no-charge trial versions of IBM middleware products that run on Linux, including WebSphere Studio Application Developer, WebSphere Application Server, DB2 Universal Database, Tivoli Access Manager, and Tivoli Directory Server, and explore how-to articles and tech support, in the Speed-start your Linux app section of developerWorks.
- Get involved in the developerWorks community by participating in developerWorks blogs.