
MODULE 2

Cluster
 A computer cluster is a collection of interconnected stand-alone computers which
can work together collectively and cooperatively as a single integrated computing
resource pool.
 Clustering explores massive parallelism at the job level and achieves high
availability (HA) through stand-alone operations.

Cluster Development Trends


 Support for clustering of computers has moved from interconnecting high-end
mainframe computers to building clusters with massive numbers of x86 engines.
 Computer clustering started with the linking of large mainframe computers such
as the IBM Sysplex and the SGI Origin 3000.
 Originally, this was motivated by a demand for cooperative group computing and
to provide higher availability in critical enterprise applications.
 The clustering trend moved toward the networking of many minicomputers, such
as DEC’s VMS cluster, in which multiple VAXes were interconnected to share the
same set of disk/tape controllers.
 Tandem’s Himalaya was designed as a business cluster for fault-tolerant online
transaction processing (OLTP) applications.
 In the early 1990s, the next move was to build UNIX-based workstation clusters
represented by the Berkeley NOW (Network of Workstations) and IBM SP2 AIX-
based server cluster.
 Beyond 2000 the trend moved to the clustering of RISC or x86 PC engines.

Milestone Cluster Systems


 Clustering has been a hot research challenge in computer architecture. Fast
communication, job scheduling, SSI, and HA are active areas in cluster research.

 The NOW project addresses a whole spectrum of cluster computing issues, including
architecture, software support for web servers, single system image, I/O and file
system, efficient communication, and enhanced availability.
 The Rice University TreadMarks is a good example of a software-implemented
shared-memory cluster of workstations.
 The memory sharing is implemented with a user-space runtime library.


Design Objectives of Computer Clusters:
Scalability
 Clustering of computers is based on the concept of modular growth. To scale a
cluster from hundreds of uniprocessor nodes to a supercluster with 10,000
multicore nodes is a nontrivial task.
 The scalability could be limited by a number of factors, such as the multicore chip
technology, cluster topology, packaging method, power consumption, and cooling
scheme applied.
 The purpose is to achieve scalable performance constrained by the
aforementioned factors.

Packaging
 Cluster nodes can be packaged in a compact or a slack fashion.

 In a compact cluster, the nodes are closely packaged in one or more racks sitting
in a room, and the nodes are not attached to peripherals (monitors, keyboards,
mice, etc.).

 In a slack cluster, the nodes are attached to their usual peripherals (i.e., they are
complete SMPs, workstations, and PCs), and they may be located in different
rooms, different buildings, or even remote regions.

 Packaging directly affects communication wire length, and thus the selection of
interconnection technology used.

Control

 A cluster can be either controlled or managed in a centralized or decentralized
fashion.
 A compact cluster normally has centralized control, while a slack cluster can be
controlled either way.
 In a centralized cluster, all the nodes are owned, controlled, managed, and
administered by a central operator.
 In a decentralized cluster, the nodes have individual owners.
Homogeneity

 A homogeneous cluster uses nodes from the same platform, that is, the same
processor architecture and the same operating system; often, the nodes are from
the same vendors.
 A heterogeneous cluster uses nodes of different platforms.
 Interoperability is an important issue in heterogeneous clusters.
Security

 Intra-cluster communication can be either exposed or enclosed.


 In an exposed cluster, the communication paths among the nodes are exposed to
the outside world.
 An outside machine can access the communication paths, and thus individual
nodes, using standard protocols (e.g., TCP/IP).
 Such exposed clusters are easy to implement, but have several disadvantages:
 Being exposed, intra-cluster communication is not secure, unless the
communication subsystem performs additional work to ensure privacy and
security.
 Outside communications may disrupt intra-cluster communications in an
unpredictable fashion.
 Standard communication protocols tend to have high overhead.

Cluster Family Classification


Compute clusters

 These are clusters designed mainly for collective computation over a single large
job.
 A good example is a cluster dedicated to numerical simulation of weather
conditions.
 The compute clusters do not handle many I/O operations, such as database
services.
 When a single compute job requires frequent communication among the cluster
nodes, the cluster must share a dedicated network, and thus the nodes are mostly
homogeneous and tightly coupled.
 This type of cluster is also known as a Beowulf cluster; a minimal job-distribution
sketch follows below.
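To make job-level parallelism on a Beowulf-style compute cluster concrete, here is a minimal sketch using MPI through the mpi4py package. The midpoint-rule integral, the process count, and the file name are illustrative assumptions, not part of the original notes.

```python
# Minimal sketch of one compute job split across cluster nodes.
# Assumes an MPI runtime and mpi4py are installed; run e.g.:
#   mpirun -np 4 python compute_job.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id within the job
size = comm.Get_size()      # total number of participating processes

# Hypothetical workload: integrate f(x) = x**2 over [0, 1] by the midpoint rule.
def f(x):
    return x * x

n = 1_000_000               # total subintervals, divided evenly among the ranks
h = 1.0 / n
local_sum = sum(f((i + 0.5) * h) for i in range(rank, n, size)) * h

# The only inter-node communication: combine partial results on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"approximate integral = {total:.6f}")   # expect about 0.333333
```

Because the nodes exchange data only for the final reduction, this kind of job maps naturally onto the homogeneous, tightly coupled nodes described above.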
High-Availability clusters

 HA (high-availability) clusters are designed to be fault-tolerant and achieve HA of
services.
 HA clusters operate with many redundant nodes to sustain faults or failures.
 The simplest HA cluster has only two nodes that can fail over to each other (see the
failover sketch below). Of course, higher redundancy provides higher availability.
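The two-node failover idea can be sketched as a heartbeat loop on the standby node. Everything below is a toy model: the health check and the takeover action are hypothetical placeholders, not the API of any real HA framework.

```python
# Toy two-node failover loop (illustrative only; real HA clusters rely on
# dedicated availability middleware rather than a script like this).
import time

HEARTBEAT_INTERVAL = 2      # seconds between health checks (assumed value)
MISSED_LIMIT = 3            # missed heartbeats before declaring the peer dead

def peer_is_alive() -> bool:
    """Hypothetical health check, e.g. a ping or TCP probe to the active node."""
    ...

def take_over_service() -> None:
    """Hypothetical failover action, e.g. claim a virtual IP and start the service."""
    ...

def standby_loop() -> None:
    missed = 0
    while True:
        missed = 0 if peer_is_alive() else missed + 1
        if missed >= MISSED_LIMIT:      # active node presumed failed
            take_over_service()         # the standby node takes over the service
            return
        time.sleep(HEARTBEAT_INTERVAL)
```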

Load-balancing clusters

 These clusters shoot for higher resource utilization through load balancing among
all participating nodes in the cluster.

 All nodes share the workload or function as a single virtual machine (VM).

 Requests initiated by users are distributed among all the node computers that form
the cluster, as sketched below.
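A simple way to picture the request distribution is round-robin dispatch over the node pool; the node names below are made-up placeholders, and a production cluster would use a dedicated load balancer in front of the nodes.

```python
# Minimal round-robin dispatch sketch for a load-balancing cluster.
from itertools import cycle

NODES = ["node01", "node02", "node03", "node04"]   # hypothetical worker nodes
next_node = cycle(NODES)                           # rotate through the pool

def dispatch(request_id: int) -> str:
    """Assign an incoming user request to the next node in round-robin order."""
    node = next(next_node)
    print(f"request {request_id} -> {node}")
    return node

# Example: ten requests spread evenly across the four nodes.
for rid in range(10):
    dispatch(rid)
```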
Analysis of the Top 500 Supercomputers
 The top supercomputers are analyzed by running the Linpack benchmark program.
 The single instruction, multiple data (SIMD) machines disappeared in 1997. The
cluster architecture appeared in a few systems in 1999.
 Cluster systems now dominate the Top 500 list, with more than 400 of the 500 systems
belonging to this architecture class.
 The speed of supercomputers has increased over time.
 Linux is the operating system used by most of the Top 500 supercomputers.
 Jaguar and Hopper are among the most powerful supercomputers, but they consume a
lot of power.
 The Tianhe-1A consumes less power because it uses GPUs.

Basic cluster architecture

 A basic cluster is built with commodity components and fully supported with the
desired SSI features and HA capability.
 The processing nodes are commodity workstations, PCs, or servers.
 These commodity nodes are easy to replace or upgrade with new generations of
hardware.
 The node operating systems should be designed for multiuser, multitasking, and
multithreaded applications.
 The nodes are interconnected by one or more fast commodity networks.
 These networks use standard communication protocols and operate at a speed that
should be two orders of magnitude faster than that of the current TCP/IP speed over
Ethernet.
 The network interface card is connected to the node’s standard I/O bus (e.g., PCI).
 When the processor or the operating system is changed, only the driver software needs
to change.
 It is desirable to have a platform-independent cluster operating system sitting on top
of the node platforms.
 A cluster middleware can be used to glue together all node platforms at the user space.
An availability middleware offers HA services.
 An SSI layer provides a single entry point, a single file hierarchy, and a single point
of control.

Resource Sharing in Clusters


 Shared Nothing architecture

 The shared-nothing architecture is used in most clusters, where the nodes are
connected through the I/O bus. The shared-disk architecture is favored for small-scale
availability clusters in business applications: when one node fails, the other node
takes over.
 The shared-nothing configuration simply connects two or more autonomous computers
via a LAN such as Ethernet.
 Shared Disk architecture

 This is what most business clusters desire so that they can enable recovery support
in case of node failure.
 The shared disk can hold checkpoint files or critical system images to enhance
cluster availability; a checkpointing sketch follows at the end of this section.
 Without shared disks, checkpointing, rollback recovery, failover, and failback are
not possible in a cluster.
 Shared Memory architecture

 A shared-memory cluster is much more difficult to realize.


 The nodes could be connected by a Scalable Coherent Interface (SCI) ring, which is
connected to the memory bus of each node through an NIC module.
 In the other two architectures, the interconnect is attached to the I/O bus.
 The memory bus operates at a higher frequency than the I/O bus.
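To illustrate the recovery role of the shared disk mentioned under the shared-disk architecture, the sketch below periodically writes checkpoint files to a shared-disk path and restores the newest one after a failover. The mount point and the contents of the state are assumptions for illustration only.

```python
# Sketch of checkpointing to a shared disk for rollback recovery and failover.
# The path /mnt/shared is an assumed mount point, not taken from the notes.
import os
import pickle
import time

CHECKPOINT_DIR = "/mnt/shared/checkpoints"   # shared disk visible to all nodes

def save_checkpoint(state: dict) -> str:
    """Write the application state to the shared disk so a peer node can resume it."""
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    path = os.path.join(CHECKPOINT_DIR, f"ckpt-{int(time.time())}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path

def restore_latest():
    """After failover, the surviving node rolls back to the newest checkpoint."""
    files = sorted(os.listdir(CHECKPOINT_DIR)) if os.path.isdir(CHECKPOINT_DIR) else []
    if not files:
        return None                      # no checkpoint yet: cold start instead
    with open(os.path.join(CHECKPOINT_DIR, files[-1]), "rb") as f:
        return pickle.load(f)
```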

Example 2.1 Modular Packaging of the IBM Blue Gene/L System


 The Blue Gene/L is a supercomputer jointly developed by IBM and Lawrence
Livermore National Laboratory.
 Two chips are mounted on a computer card.
 Sixteen computer cards (32 chips or 64 processors) are mounted on a node board.
 A cabinet houses 32 node boards with an 8 x 8 x 16 torus interconnect.
 Finally, 64 cabinets (racks) form the total system.
 Customers can order any size to meet their computational needs.
 The Blue Gene cluster was designed to achieve scalable performance, reliability
through built-in testability, resilience by preserving locality of failures and checking
mechanisms, and serviceability through partitioning and isolation of fault locations.
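The packaging levels listed above determine the system totals directly; the short calculation below just multiplies them out (2 processors per chip is inferred from the "32 chips or 64 processors" figure for a node board).

```python
# Working out Blue Gene/L totals from the modular packaging levels above.
chips_per_card = 2
cards_per_board = 16
boards_per_cabinet = 32
cabinets = 64
processors_per_chip = 2                 # 32 chips == 64 processors per node board

chips_per_board = chips_per_card * cards_per_board            # 32 chips (64 processors)
total_chips = chips_per_board * boards_per_cabinet * cabinets
print(total_chips)                                            # 65,536 chips
print(total_chips * processors_per_chip)                      # 131,072 processors
```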
Crossbar Switch in Google Search Engine Cluster

 Google has many data centers using clusters of low-cost PC engines. These clusters
are mainly used to support Google’s web search business.
 A Google cluster interconnects 40 racks of PC engines via two racks of 128 x 128
Ethernet switches.
 Each Ethernet switch can handle 128 one Gbps Ethernet links.
 A rack contains 80 PCs.
 This is an earlier cluster of 3,200 PCs. Google’s search engine clusters are built
with a lot more nodes.
 Two switches are used to enhance cluster availability. The cluster works fine even
when one switch fails to provide the links among the PCs.
 The front ends of the switches are connected to the Internet via 2.4 Gbps OC 48
links.
 The 622 Mbps OC 12 links are connected to nearby data-center networks. In case
of failure of the OC 48 links, the cluster is still connected to the outside world via
the OC 12 links.
 Thus, the Google cluster eliminates all single points of failure.
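The figures quoted above can be cross-checked with a line of arithmetic; the assumption that each rack has one Gbps uplink to each of the two switches is purely illustrative, since the notes do not give the exact rack-to-switch wiring.

```python
# Back-of-the-envelope check of the Google search cluster figures above.
racks, pcs_per_rack = 40, 80
print(racks * pcs_per_rack)              # 3,200 PCs, matching the text

switch_ports = 128                       # each switch handles 128 one-Gbps links
uplinks_per_rack_per_switch = 1          # assumed wiring, not stated in the notes
print(racks * uplinks_per_rack_per_switch <= switch_ports)
# True: a single switch can attach all 40 racks, so the second switch exists
# purely for redundancy, which is why the cluster survives one switch failure.
```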
