Cluster Computing (Unit 1-5)


Unit 1

Distributed Systems:
A distributed system is one in which hardware or software components located at networked computers
communicate and coordinate their actions only by passing messages. This simple definition covers the entire
range of systems in which networked computers can usefully be deployed.
Computers that are connected by a network may be spatially separated by any distance. They may be on separate
continents, in the same building or in the same room. Our definition of distributed systems has the following
significant consequences:
Concurrency:
In a network of computers, concurrent program execution is the norm. I can do my work on my computer while
you do your work on yours, sharing resources such as web pages or files when necessary. The capacity of the
system to handle shared resources can be increased by adding more resources (for example, computers) to the
network.
No global clock:
When programs need to cooperate they coordinate their actions by exchanging messages. Close coordination
often depends on a shared idea of the time at which the programs’ actions occur. But it turns out that there are
limits to the accuracy with which the computers in a network can synchronize their clocks – there is no single
global notion of the correct time. This is a direct consequence of the fact that the only communication is by
sending messages through a network.
Independent failures:
All computer systems can fail, and it is the responsibility of system designers to plan for the consequences of
possible failures. Distributed systems can fail in new ways. Faults in the network result in the isolation of the
computers that are connected to it, but that doesn’t mean that they stop running. In fact, the programs on them
may not be able to detect whether the network has failed or has become unusually slow. Similarly, the failure
of a computer, or the unexpected termination of a program somewhere in the system (a crash), is not
immediately made known to the other components with which it communicates. Each component of the system
can fail independently, leaving the others still running.

Examples of Distributed systems


Three examples place distributed systems in a realistic context: the Internet, an intranet and mobile
computing.
1. The Internet :
 A vast interconnected collection of computer networks of many different types.
 Messages are passed by employing a common means of communication (the Internet Protocol).
 The Web is not the same as the Internet.
DISTRIBUTED MUTUAL EXCLUSION
Mutual exclusion makes sure that concurrent processes access shared resources or data in a
serialized way. If a process, say Pi, is executing in its critical section, then no other process can be
executing in its critical section.
Distributed mutual exclusion provides critical regions in a distributed environment.

Algorithms for Distributed Mutual Exclusion


Consider N processes that do not fail. The basic assumption is that the
message delivery system is reliable. The methods for the critical region are:

 enter(): enter the critical section, blocking if necessary
 resourceAccesses(): access the shared resources in the critical section
 exit(): leave the critical section; other processes can now enter.
The following are the requirements for Mutual Exclusion (ME):

 [ME1] safety: at most one process may execute in the critical section at a time
 [ME2] liveness: requests to enter and exit the critical section eventually succeed
 [ME3] happened-before ordering: if one request to enter happened-before another, entry to the critical section is granted in that order
The second requirement implies freedom from both deadlock and starvation.
Starvation is a fairness condition. A minimal sketch of this interface follows.
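The enter()/resourceAccesses()/exit() interface above can be pictured as a small abstract class. The sketch below is a minimal Python illustration; the class and function names are assumptions made for this example, not part of any standard library.

import abc

class MutualExclusion(abc.ABC):
    # Abstract interface for the three operations described above.
    @abc.abstractmethod
    def enter(self):
        """Enter the critical section, blocking if another process is inside."""

    @abc.abstractmethod
    def exit(self):
        """Leave the critical section so that other processes may enter."""

def resource_accesses(me: MutualExclusion):
    me.enter()       # [ME1] at most one process passes this point at a time
    try:
        pass         # access the shared resources here
    finally:
        me.exit()    # releasing lets waiting processes eventually enter ([ME2])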

Performance Evaluation:
The following are the criteria for performance measures:

 Bandwidth consumption, which is proportional to the number of messages sent in each
entry and exit operation.
 The client delay incurred by a process at each entry and exit operation.
 Throughput of the system: the rate at which the collection of processes as a whole can
access the critical section.
Central Server Algorithm
 This employs the simplest way to grant permission to enter the critical section: by using a
server.
 A process sends a request message to the server and awaits a reply from it.
 The reply constitutes a token signifying permission to enter the critical section.
 If no other process has the token at the time of the request, then the server replies
immediately with the token.
 If the token is currently held by another process, then the server does not reply but queues
the request.
 On exiting the critical section, the client sends a message back to the server, returning the token.

Fig : Central Server Algorithm


The central server algorithm fulfils ME1 and ME2 but not ME3; that is, safety and liveness are
ensured but ordering is not satisfied. The performance of the algorithm is measured as follows (a sketch
of the algorithm appears after this list):

 Bandwidth: measured by the entering and exiting messages. Entering takes two
messages (a request followed by a grant), which are delayed by the round-trip time.
Exiting takes one release message and does not delay the exiting process.
 Throughput: limited by the synchronization delay, the round-trip of a release message
followed by a grant message.
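A minimal sketch of the central server algorithm is shown below, using Python threads and queues in place of separate processes and network messages. The class names, message formats and use of threads are illustrative assumptions, not part of the algorithm's specification.

import queue
import threading

class CentralServer(threading.Thread):
    # Grants the single token; queues requests while the token is held.
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()       # incoming (kind, client) messages
        self.holder = None               # client currently holding the token
        self.waiting = []                # requests queued while the token is out

    def run(self):
        while True:
            kind, client = self.inbox.get()
            if kind == "request":
                if self.holder is None:
                    self.holder = client
                    client.grant.put("token")     # reply immediately with the token
                else:
                    self.waiting.append(client)   # token held elsewhere: queue it
            elif kind == "release":
                if self.waiting:
                    self.holder = self.waiting.pop(0)
                    self.holder.grant.put("token")
                else:
                    self.holder = None

class Client:
    def __init__(self, server):
        self.server = server
        self.grant = queue.Queue()

    def enter(self):
        self.server.inbox.put(("request", self))
        self.grant.get()                          # block until the token arrives

    def exit(self):
        self.server.inbox.put(("release", self))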

Ring Based Algorithm


 This provides the simplest way to arrange mutual exclusion between N processes without
requiring an additional process: arrange them in a logical ring.
 Each process pi has a communication channel to the next process in the ring,
p(i+1) mod N.
 The unique token is a message passed from process to process around the ring in a single
direction (clockwise).
 If a process does not require entry to the CS when it receives the token, it
immediately forwards the token to its neighbor.
 A process that requires the token waits until it receives it, and then retains it while in the critical section.
 To exit the critical section, the process sends the token on to its neighbor.
Fig 4.13: Ring based algorithm
This algorithm satisfies ME1 and ME2 but not ME3; that is, safety and liveness are satisfied but
not ordering. The performance measures are as follows (a sketch of the token-passing loop appears after this list):

 Bandwidth: the token continuously consumes bandwidth except when a process is inside the CS.
Exit requires only one message.
 Delay: the delay experienced by a process ranges from zero messages (it has just received the token)
to N messages (the token has just passed it).
 Throughput: the synchronization delay between one exit and the next entry is anywhere from
1 (the next process in the ring) to N (the same process again) message transmissions.
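The token-passing loop of the ring-based algorithm can be sketched as follows. Queues stand in for the channels between neighbors and threads stand in for processes; these, and the variable names, are assumptions made for illustration.

import queue
import threading
import time

N = 4
channels = [queue.Queue() for _ in range(N)]    # channel i delivers to process i
wants_cs = [False] * N                          # True when process i wants entry

def process(i):
    while True:
        token = channels[i].get()               # wait for the token to arrive
        if wants_cs[i]:
            print(f"process {i} enters the critical section")
            wants_cs[i] = False                 # ...critical section work...
            print(f"process {i} exits the critical section")
        channels[(i + 1) % N].put(token)        # forward to neighbor (i+1) mod N

for i in range(N):
    threading.Thread(target=process, args=(i,), daemon=True).start()
wants_cs[2] = True
channels[0].put("TOKEN")                        # inject the unique token
time.sleep(0.5)                                 # let the token circulate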

Multicast Synchronisation
 This achieves mutual exclusion between N peer processes using multicast (the Ricart–Agrawala algorithm).
 Processes that require entry to a critical section multicast a request message, and can enter it only
when all the other processes have replied to this message.
 The conditions under which a process replies to a request are designed to ensure that ME1, ME2 and
ME3 are met.
 Each process pi keeps a Lamport clock. Messages requesting entry are of the form <T, pi>, where T is the sender's timestamp.
 Each process records its state of RELEASED, WANTED or HELD in a variable state.
 If a process requests entry and the state of all other processes is RELEASED, then all processes reply
immediately.
 If some process is in state HELD, then that process will not reply until it has finished with the critical section.
 If some process is in state WANTED and has a smaller timestamp than the incoming request,
it will queue the request until it has finished.
 If two or more processes request entry at the same time, then whichever bears the lowest
timestamp will be the first to collect N−1 replies.
Fig : Multicast Synchronisation
 In the above figure, P1 and P2 request the CS concurrently.
 The timestamp of P1's request is 41 and of P2's request is 34.
 When P3 receives their requests, it replies immediately.
 When P2 receives P1's request, it finds that its own request has the lower timestamp, so it does not
reply and holds P1's request in its queue.
 P1, however, replies to P2, so P2 enters the CS. After P2 finishes, it replies to P1 and P1 then enters the CS.
 Granting entry takes 2(N−1) messages: N−1 to multicast the request and N−1 replies.

Performance Evaluation:
 Bandwidth consumption is high.
 Client delay is one round-trip time.
 Synchronization delay is one message transmission time.
A sketch of the reply rule used by this algorithm follows.
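The core of the multicast algorithm (Ricart and Agrawala) is the rule that decides whether a process replies to an incoming request immediately or defers it. The sketch below shows only that rule; message transport, the entry protocol and full clock handling are omitted, and all names are assumptions made for illustration.

RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.clock = 0                # Lamport clock
        self.request_ts = None        # (T, pid) of our own outstanding request
        self.deferred = []            # requests queued while we hold priority

    def on_request(self, ts, sender_pid, send_reply):
        # Reply immediately unless we are in the CS, or we also want it
        # and our own request bears the smaller (earlier) timestamp.
        self.clock = max(self.clock, ts) + 1
        defer = (self.state == HELD or
                 (self.state == WANTED and self.request_ts < (ts, sender_pid)))
        if defer:
            self.deferred.append(sender_pid)
        else:
            send_reply(sender_pid)

    def on_exit(self, send_reply):
        # Leaving the CS: answer every request we deferred.
        self.state = RELEASED
        for pid in self.deferred:
            send_reply(pid)
        self.deferred.clear()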

Maekawa’s Voting Algorithm


• In this algorithm, it is not necessary for all peers to grant access. A process need only obtain
permission to enter from subsets of its peers, as long as the subsets used by any two
processes overlap.

• Think of processes as voting for one another to enter the CS. A candidate process must
collect sufficient votes to enter.

• Processes in the intersection of two sets of voters ensure the safety property ME1 by
casting their votes for only one candidate.

• A voting set Vi associated with each process pi.

• There is at least one common member of any two voting sets, and for fairness all voting sets have
the same size.

• The optimal solution that minimizes K is K ≈ √N with M = K.

• The algorithm is summarized as follows:
Vi ⊆ {P1, P2, … PN}
such that for all i, j = 1, 2, … N:
Pi ∈ Vi
Vi ∩ Vj ≠ ∅ (there is at least one common member of any two voting sets)
|Vi| = K
Each process is contained in M of the voting sets Vi


For pi to enter the critical section
state := WANTED;
Multicast request to all processes in Vi;
Wait until (number of replies received = K);
state := HELD;
On receipt of a request from pi at pj
if (state = HELD or voted = TRUE)
then
queue request from pi without replying;
else
send reply to pi;
voted := TRUE;
end if
For pi to exit the critical section
state := RELEASED;
Multicast release to all processes in Vi;
On receipt of a release from pi at pj
if (queue of requests is non-empty)
then
remove head of queue – from pk, say;
send reply to pk;
voted := TRUE;
else
voted := FALSE;
end if
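The requirement that any two voting sets overlap can be met in several ways. One simple illustrative construction, assuming N is a perfect square, arranges the processes in a √N × √N grid and takes Vi to be the union of pi's row and column; this gives sets of size 2√N − 1, still O(√N) (Maekawa's optimal construction achieves roughly √N). The sketch below shows only this one construction.

import math

def grid_voting_sets(n):
    k = math.isqrt(n)
    assert k * k == n, "this simple construction assumes N is a perfect square"
    sets = []
    for p in range(n):
        row, col = divmod(p, k)
        row_members = {row * k + c for c in range(k)}   # pi's row
        col_members = {r * k + col for r in range(k)}   # pi's column
        sets.append(row_members | col_members)          # Vi = row union column
    return sets

V = grid_voting_sets(9)
# Any two voting sets share at least one member, as the overlap requirement demands:
assert all(V[i] & V[j] for i in range(9) for j in range(9))
print(sorted(V[0]))    # [0, 1, 2, 3, 6] for the 3x3 grid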

 ME1 is met.
 If two processes could enter the CS at the same time, the processes in the intersection of their two
voting sets would have to vote for both.
 The algorithm allows a process to make at most one vote between successive receipts
of a release message, so this cannot happen.
 This algorithm is deadlock prone.
 If three processes concurrently request entry to the CS, then it is possible for p1 to reply to itself
and hold off p2; for p2 to reply to itself and hold off p3; and for p3 to reply to itself and hold off p1.
 Each process has then received one out of two replies, and none can proceed.
 If processes queue outstanding requests in happened-before order, ME3 can be satisfied and the
algorithm will be deadlock free.

Performance Evaluation:
 Bandwidth utilization is 2√N messages per entry to the CS and √N per exit.
 Client delay is the same as for Ricart and Agrawala's algorithm: one round-trip time.
 Synchronization delay is one round-trip time, which is worse than Ricart and Agrawala's one
message transmission time.

Fault Tolerance
 Fault tolerance concerns how the algorithms behave when messages are lost or when a process
crashes.
 None of the algorithms described above would tolerate the loss of messages if the
channels were unreliable.
 The ring-based algorithm cannot tolerate any single process crash failure.
 Maekawa's algorithm can tolerate some process crash failures: a crash is tolerated if the crashed
process is not in a voting set that is required.
 The central server algorithm can tolerate the crash failure of a client process that neither holds nor
has requested the token.

Failure model
 Failure model defines and classifies the faults.
 In a distributed system both processes and communication channels may fail – That is, they may depart
from what is considered to be correct or desirable behavior.

 Types of failures:
 Omission Failures
 Arbitrary Failures
 Timing Failures

Omission failure

 Omission failures refer to cases when a process or communication channel fails to perform
actions that it is supposed to do.
 The chief omission failure of a process is to crash. In case of the crash, the process has halted and
will not execute any further steps of its program.
 Another type of omission failure is related to communication and is called a
communication omission failure.
The communication channel produces an omission failure if it does not transport a message from process p's
outgoing message buffer to process q's incoming message buffer.

 This is known as "dropping messages" and is generally caused by a lack of buffer space at the receiver or
at a gateway, or by a network transmission error, detected by a checksum carried with the message data.

Arbitrary failure

 Arbitrary failure is used to describe the worst possible failure semantics, in which any type of error may
occur.
 E.g. a process may set wrong values in its data items, or it may return a wrong value
in response to an invocation.
 Communication channels can also suffer from arbitrary failures.
E.g. message contents may be corrupted, non-existent messages may be delivered, or
real messages may be delivered more than once.

 Omission failures are classified together with arbitrary failures.

Timing failure
Timing failures are applicable in synchronous distributed systems, where time limits are set on process
execution time, message delivery time and clock drift rate.

Masking failure
 It is possible to construct reliable services from components that exhibit failures.
E.g. multiple servers that hold replicas of data can continue to provide a service when one
of them crashes.
 A service masks a failure, either by hiding it altogether or by converting it into a more acceptable
type of failure.
E.g. checksums are used to mask corrupted messages - effectively converting an
arbitrary failure into an omission failure.

Programming Paradigms
Programming distributed systems requires addressing various challenges related to concurrency, fault tolerance,
and network communication. Several programming paradigms and models are used to develop distributed
systems, each with its own set of principles and approaches. Here are some common programming paradigms
in distributed systems:
Client-Server Model: This is one of the most straightforward paradigms in distributed computing. Clients send
requests to servers, which process these requests and return responses. The client-server model is often used for
web applications, where web browsers (clients) interact with web servers to fetch resources or perform actions.
Peer-to-Peer (P2P): In P2P systems, all nodes (peers) are equal and have the same capabilities. Peers can both
request and provide services, making P2P systems decentralized. Examples include file-sharing networks like
BitTorrent and blockchain networks like Bitcoin.
Message Passing: This paradigm focuses on communication between processes in a distributed system.
Messages are used to exchange information between nodes. Message-passing models can be implemented using
various communication mechanisms, such as message queues or Remote Procedure Calls (RPC).
Remote Procedure Call (RPC): RPC allows one process to invoke a function or method in another process, as
if it were a local function call. It simplifies remote communication and can hide the complexities of distributed
systems from developers.
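As a concrete illustration, Python's standard library includes a simple XML-RPC implementation; the two fragments below run in separate processes. The host, port and the add() procedure are assumptions made for this sketch, not a prescribed API for any particular system.

# server.py – runs in one process (one address space)
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b                     # executed on the server

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")
server.serve_forever()

# client.py – runs in another process; the remote call looks like a local call
from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))               # prints 5, computed by the server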
Message-Oriented Middleware (MOM): MOM is a communication paradigm that uses message queues to
enable asynchronous communication between distributed components. It's often used in event-driven systems,
such as financial trading platforms.
Actor Model: In the Actor model, everything is an actor, which encapsulates state and behavior. Actors
communicate through message passing. This model is used in systems like Erlang and Akka for building highly
concurrent and fault-tolerant applications.
Data-Parallel Programming: This paradigm is used for distributed data processing systems like Hadoop and
Spark. It focuses on breaking down large data sets into smaller partitions that can be processed independently
across multiple nodes.
MapReduce: MapReduce is a programming model and processing framework that simplifies distributed data
processing. It divides tasks into two phases: mapping data and reducing the results. It's widely used in big data
applications.
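The essence of the model can be shown with a single-machine word count: a map function emits (key, value) pairs, the pairs are grouped by key, and a reduce function combines each group. This sketch only illustrates the programming model; a real framework such as Hadoop distributes the map and reduce tasks across cluster nodes, and the function names here are assumptions for illustration.

from collections import defaultdict

def map_phase(document):
    return [(word, 1) for word in document.split()]       # emit (word, 1) pairs

def reduce_phase(word, counts):
    return word, sum(counts)                              # combine values per key

documents = ["the quick brown fox", "the lazy dog", "the fox"]
groups = defaultdict(list)
for doc in documents:                                     # map
    for word, count in map_phase(doc):
        groups[word].append(count)                        # shuffle: group by key
results = dict(reduce_phase(w, c) for w, c in groups.items())
print(results)                                            # {'the': 3, 'quick': 1, ...}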
Service-Oriented Architecture (SOA): SOA is an architectural style that focuses on designing distributed
systems as a collection of loosely coupled services. These services are designed to be independent and
communicate through well-defined interfaces, often using protocols like HTTP or SOAP.
Microservices Architecture: Microservices are an extension of SOA, where services are smaller, more focused,
and independently deployable. Microservices use lightweight communication mechanisms like RESTful APIs
and are often containerized for easy scaling and deployment.
Event-Driven Architecture: In this paradigm, systems react to events or messages, enabling real-time
communication and processing. Event-driven systems are commonly used in applications like IoT, real-time
analytics, and chat applications.
Blockchain Smart Contracts: In blockchain systems like Ethereum, smart contracts are self-executing
programs that run on the network. These contracts automatically execute when specific conditions are met,
making them suitable for decentralized applications.
When programming in a distributed system, the choice of paradigm depends on the specific requirements of the
application and the trade-offs between factors like fault tolerance, scalability, and ease of development.
Developers often combine multiple paradigms to address different aspects of their distributed systems.

Shared memory
Shared memory in a distributed system refers to the concept of multiple processes or nodes in a distributed
environment accessing and manipulating a common memory space. Unlike traditional shared memory in a
single-machine multi-threaded or multi-process context, distributed shared memory extends the idea to multiple
machines or nodes. This allows processes running on different machines to communicate and share data as if
they were all accessing a single, shared memory space. Here are some key points about shared memory in
distributed systems:
Communication Mechanisms: In a distributed system, shared memory is typically implemented using
communication mechanisms, such as Remote Procedure Calls (RPC), message passing, or distributed memory
access protocols. These mechanisms enable processes on different nodes to read from and write to the shared
memory.
Data Consistency: Ensuring data consistency is a significant challenge in distributed shared memory systems.
When multiple processes on different nodes can access and modify shared data simultaneously, synchronization
mechanisms, like locks, semaphores, or distributed data structures, are needed to maintain data consistency.
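The need for synchronization can be seen even in a single-machine analogue: two workers incrementing a shared counter lose updates unless the read-modify-write is serialized. The sketch below uses Python's multiprocessing module as a local stand-in for the problem; it is not a distributed shared memory implementation, and the names are assumptions for illustration.

from multiprocessing import Process, Value, Lock

def worker(counter, lock):
    for _ in range(100_000):
        with lock:                    # serialize the read-modify-write
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)           # integer placed in shared memory
    lock = Lock()
    workers = [Process(target=worker, args=(counter, lock)) for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(counter.value)              # 200000 with the lock; typically less without it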
Latency and Network Overhead: Network latency and communication overhead are critical considerations in
distributed shared memory systems. Data access times can be significantly higher than in traditional shared
memory systems due to network communication delays.
Data Distribution: Data may be partitioned or replicated across multiple nodes to optimize data access in a
distributed shared memory system. Decisions about data distribution depend on factors like access patterns, data
size, and fault tolerance requirements.
Scalability: Scalability is a crucial concern in distributed shared memory systems. As the number of nodes
increases, managing shared memory becomes more complex. Design choices, such as the granularity of data
sharing and the choice of communication protocols, can impact system scalability.
Fault Tolerance: Distributed shared memory systems need to be designed with fault tolerance in mind. If a node
fails or loses connectivity, mechanisms must ensure data consistency and availability.
Programming Models: Distributed shared memory can be used with various programming models, such as the
Single Program, Multiple Data (SPMD) model, where each process appears to have its private copy of memory,
but they can read from and write to the shared memory as needed.
Examples: Distributed shared memory systems are used in various applications, such as parallel computing
clusters, where multiple nodes work together on computationally intensive tasks, and in distributed databases to
allow multiple nodes to access and modify data as if it were stored locally.
Implementing distributed shared memory can be complex, and developers need to carefully consider data
consistency, access patterns, and fault tolerance requirements. While it can simplify programming by providing
a familiar shared-memory model, it also introduces challenges related to distributed computing, such as network
communication and data synchronization. As a result, developers often choose distributed shared memory when
the benefits of shared memory-like programming outweigh the complexities of managing data in a distributed
environment.

Message Passing
Message passing is a fundamental concept in distributed computing and parallel programming. It is a
communication method that allows separate processes or entities to exchange data and synchronize their actions.
Message passing is commonly used in distributed systems, parallel computing, and interprocess communication.
Here are some key points about message passing:
Process Communication: In a distributed system or parallel computing environment, processes (which can be
threads, programs, or entities) communicate by sending and receiving messages. These messages can contain
data, instructions, or both.
Asynchronous Communication: Message passing allows processes to communicate asynchronously, meaning
that a sender can continue its work without waiting for the receiver to handle the message. This is in contrast to
shared memory systems where synchronization is often required.
Synchronization: While message passing supports asynchronous communication, it also provides mechanisms
for synchronization when necessary. For example, processes can use messages to signal each other, indicating
that certain conditions have been met or specific tasks are complete.
Point-to-Point Communication: In point-to-point communication, one process sends a message to another
specific process. This is akin to sending a private message from one person to another.
Broadcast Communication: In broadcast communication, a process sends a message to all other processes in
the system. This can be useful for distributing information or updates to multiple recipients simultaneously.
Message Queues: Systems that use message passing often employ message queues to store and manage
messages. Processes enqueue messages to a queue, and other processes dequeue and process messages from the
queue.
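A minimal producer/consumer sketch using an in-process queue illustrates the pattern; in a real distributed system the queue would be provided by a broker or middleware rather than a local data structure, and the message format below is an assumption for illustration.

import queue
import threading

mailbox = queue.Queue()

def producer():
    for i in range(3):
        mailbox.put({"seq": i, "payload": f"message {i}"})   # enqueue a message
    mailbox.put(None)                                        # end-of-stream marker

def consumer():
    while True:
        msg = mailbox.get()                                  # dequeue (blocks if empty)
        if msg is None:
            break
        print("processing", msg["payload"])

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()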

Message-Oriented Middleware (MOM): Message-oriented middleware is a software layer that simplifies
message passing in distributed systems. It provides features like message routing, message filtering, and message
persistence.

Reliability: Message passing can be designed to be reliable, ensuring that messages are not lost in transit. This
is especially important in distributed systems where network failures can occur.

Scalability: Message passing is highly scalable, making it suitable for large-scale distributed systems. As more
processes are added, they can communicate by sending messages without the need for shared memory or
centralized coordination.
Message Formats: Messages can be formatted in various ways, including plain text, binary, or structured data
like JSON or XML. The format depends on the needs of the application and the message passing middleware
used.
Message Passing in Programming: Message passing is commonly used in programming languages and
libraries designed for distributed and parallel computing. For example, MPI (Message Passing Interface) is a
widely used standard for message passing in high-performance computing.
Examples: Message passing is used in various distributed systems, such as message brokers (e.g., Apache
Kafka), distributed computing frameworks (e.g., Hadoop's MapReduce), and many networked applications
where components or nodes need to communicate.
Message passing is a versatile and widely used method for enabling communication and coordination in
distributed systems. It is particularly well-suited for situations where processes or components run on different
nodes and need to work together without shared memory or centralized control.

Workflow
A workflow refers to a sequence of tasks, processes, or steps that are designed to achieve a specific goal or
outcome. Workflows are used in various fields, including business, information technology, manufacturing, and
more, to streamline and manage work processes efficiently. Here are some key concepts related to workflows:

1. Definition: A workflow defines how work is organized, who is responsible for each task, the order in which
tasks are performed, and what triggers the transition from one task to the next.

2.Components of a Workflow:
 Tasks/Steps: These are the individual actions or activities that make up the workflow. Each task has
a specific purpose and often involves specific individuals or resources.
 Transitions: These define how tasks are connected. They specify the conditions or criteria that must
be met for a task to move to the next one.
 Participants/Roles: Workflows often involve multiple participants or roles responsible for
completing various tasks. Each participant is assigned specific responsibilities.
 Data/Information: Workflows may include data or information that needs to be processed, generated,
or transferred at various stages.
 Rules and Policies: Workflows may be governed by rules, policies, or constraints that dictate how
tasks are executed and under what conditions.
 Automation: In some cases, tasks within a workflow can be automated through software or hardware
systems to improve efficiency and reduce errors.
 Feedback and Monitoring: Workflow systems may include mechanisms for tracking progress,
collecting feedback, and monitoring the performance of tasks and participants.

3. Types of Workflows:
 Sequential Workflow: Tasks are performed in a linear sequence, with each task dependent on the
completion of the previous one.
 Parallel Workflow: Multiple tasks are performed concurrently or in parallel, without strict order or
dependency.
 State Machine Workflow: The workflow's progress is determined by the current state and transitions
based on specific conditions.
 Ad Hoc Workflow: Less structured and more flexible, allowing tasks to be performed in a less
predefined order.
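A sequential workflow with simple transition conditions can be sketched in a few lines; the task names and the order-processing example are assumptions chosen purely for illustration.

def fetch_order(ctx):
    ctx["order"] = {"id": 42, "paid": True}        # task 1: gather input data

def check_payment(ctx):
    ctx["approved"] = ctx["order"]["paid"]         # task 2: apply a business rule

def ship_order(ctx):
    print("shipping order", ctx["order"]["id"])    # task 3: final action

workflow = [
    (fetch_order,   lambda ctx: True),             # transition: always continue
    (check_payment, lambda ctx: ctx["approved"]),  # transition: only if approved
    (ship_order,    lambda ctx: True),
]

context = {}                                       # data carried between tasks
for task, may_continue in workflow:
    task(context)
    if not may_continue(context):
        print("workflow stopped after", task.__name__)
        break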

4. Workflow Management Systems (WMS): These are software tools or platforms designed to create, manage,
and automate workflows. They often provide features for defining workflows, assigning tasks, tracking progress,
and handling exceptions.
5. Business Process Management (BPM): BPM is a discipline that focuses on the design, modeling, execution,
and optimization of workflows in organizations. BPM software and methodologies are used to improve business
processes and increase efficiency.

6. Use Cases:
 Business Workflows: Used in various industries for managing processes like order processing,
customer support, project management, and more.
 Scientific Workflows: Common in scientific research and data analysis to automate complex
experiments or simulations.
 Software Development Workflows: Used in software development to manage tasks like code review,
testing, and deployment.
 Manufacturing Workflows: Used in manufacturing to control production processes and quality
assurance.

7. Workflow Notation: Various notations and standards exist for representing workflows, including Business
Process Model and Notation (BPMN) and Workflow Management Coalition (WfMC) standards.

Workflows are essential for improving efficiency, reducing errors, and ensuring that work processes are well-
structured and documented. They are widely used in both business and technical domains to manage and
optimize various processes and operations.
Unit-2 & 3

Introduction to Cluster Computing


The essence of Pfister’s [2] and Buyya’s [3] work defines clusters as follows:
A cluster is a type of parallel and distributed system, which consists of a collection of inter- connected
stand-alone computers working together as a single integrated computing resource.
Buyya defined one of the popular definitions for Grids at the 2002 Grid Planet conference, San Jose,
USA as follows:
A Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation
of geographically distributed ‘autonomous’ resources dynamically at runtime depending on their
availability, capability, performance, cost, and users’ quality-of-service requirements.
Buyya [1] proposes the cloud definition as follows:
A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and
virtualized computers that are dynamically provisioned and presented as one or more unified
computing resource(s) based on service-level agreements established through negotiation between
the service provider and consumers.
The computing power of a single sequential computer is often not enough to carry out scientific and engineering
applications. To meet the required computing power, we can improve the speed of processors,
memory and other components of the sequential computer. However, the achievable computing power is still
limited for very complex applications. A cost-effective solution is to connect multiple
sequential computers together and combine their computing power; such a system is called a parallel computer.
There are three ways to improve performance, as per Pfister [1].
Non-Technical Term – Technical Term
1. Work Harder – Faster Hardware
2. Work Smarter – More Efficiently
3. Get Help – Multiple Computers to Solve a Task

The evolution of various computing or systems is as follows.


Year Computing
1950 Multi-Processors Systems
1960-80 Supercomputers
1988 Reconfigurable Computing
1990 Cluster Computers
1998 Distributed Computing
2000 Grid Computing
2006 SOA and Web Services, Deep Computing, Multi-Core Architecture, Skeleton Based
Programming, Network Devices
2008 Cloud Computing
2009-15 Heterogeneous Multicore, General-Purpose computing on Graphics Processing Units (GPGPU), APU, Big Data

There are two eras of computing as follows [3].


1. Sequential Computing Era
2. Parallel Computing Era
Each computing era started with improvements in the following [3].
1. Hardware Architecture
2. System Software
3. Applications
4. Problem Solving Environments (PSEs)
The components of computing eras are going through the following phases [3].
1. Research and Development
2. Commercialization
3. Commodity
A cluster connects a number of computing nodes or personal computers, used as servers, via a
fast local area network. It may be a two-node system that connects two personal computers, or it may be a fast
supercomputer; indeed, a supercomputer may include many clusters.
According to the latest TOP500 list [4] (i.e., November 2014), the best 10 supercomputers are as
follows.
Rank – Site – System – Cores – Rmax (TFlop/s) – Rpeak (TFlop/s) – Power (kW)
1. National Super Computer Center in Guangzhou, China – Tianhe-2 (MilkyWay-2), TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P (NUDT) – 3,120,000 – 33,862.7 – 54,902.4 – 17,808
2. DOE/SC/Oak Ridge National Laboratory, United States – Titan, Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x (Cray Inc.) – 560,640 – 17,590.0 – 27,112.5 – 8,209
3. DOE/NNSA/LLNL, United States – Sequoia, BlueGene/Q, Power BQC 16C 1.60GHz, Custom (IBM) – 1,572,864 – 17,173.2 – 20,132.7 – 7,890
4. RIKEN Advanced Institute for Computational Science (AICS), Japan – K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect (Fujitsu) – 705,024 – 10,510.0 – 11,280.4 – 12,660
5. DOE/SC/Argonne National Laboratory, United States – Mira, BlueGene/Q, Power BQC 16C 1.60GHz, Custom (IBM) – 786,432 – 8,586.6 – 10,066.3 – 3,945
6. Swiss National Supercomputing Centre (CSCS), Switzerland – Piz Daint, Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x (Cray Inc.) – 115,984 – 6,271.0 – 7,788.9 – 2,325
7. Texas Advanced Computing Center/Univ. of Texas, United States – Stampede, PowerEdge C8220, Xeon E5-2680 8C 2.700GHz, Infiniband FDR, Intel Xeon Phi SE10P (Dell) – 462,462 – 5,168.1 – 8,520.1 – 4,510
8. Forschungszentrum Juelich (FZJ), Germany – JUQUEEN, BlueGene/Q, Power BQC 16C 1.600GHz, Custom Interconnect (IBM) – 458,752 – 5,008.9 – 5,872.0 – 2,301
9. DOE/NNSA/LLNL, United States – Vulcan, BlueGene/Q, Power BQC 16C 1.600GHz, Custom Interconnect (IBM) – 393,216 – 4,293.3 – 5,033.2 – 1,972
10. Government, United States – Cray CS-Storm, Intel Xeon E5-2660v2 10C 2.2GHz, Infiniband FDR, Nvidia K40 (Cray Inc.) – 72,800 – 3,577.0 – 6,131.8 – 1,499
Scalable Parallel Computer Architectures
The scalable parallel computer architectures are as follows.
1. Massively Parallel Processors (MPP)
 It is a shared-nothing architecture.
 Each node in an MPP runs a copy of the operating system (OS).
2. Symmetric Multiprocessors (SMP)
 It is a shared-everything architecture.
3. Cache-Coherent Nonuniform Memory Access (CC-NUMA)
4. Distributed Systems
 Each node runs its own OS.
 It is a combination of MPPs, SMPs, clusters and individual computers.
5. Clusters
The detailed comparisons of the scalable parallel computer architectures are shown below [3, 5].

Characteristic – MPP – SMP / CC-NUMA – Cluster – Distributed
Number of Nodes – 100 to 1000 – 10 to 100 – 100 or less – 10 to 1000
Node Complexity – Fine grain or medium – Medium or coarse grain – Medium grain – Wide range
Internode Communication – Message passing or shared variables for distributed shared memory – Centralized and distributed shared memory (DSM) – Message passing – Shared files, RPC, message passing and IPC
Job Scheduling – Single run queue on host – Single run queue – Multiple queues but coordinated – Independent queues
SSI Support – Partially – Always in SMP and some NUMA – Desired – No
Node OS Copies and Type – N micro-kernels, monolithic or layered OSs – One monolithic for SMP and many for NUMA – N OS platforms, homogeneous or micro-kernel – N OS platforms, homogeneous
Address Space – Multiple (single for DSM) – Single – Multiple or single – Multiple
Internode Security – Unnecessary – Unnecessary – Required if exposed – Required
Ownership – One organization – One organization – One or more organizations – Many organizations

Granularity [6] refers to the extent to which a system or material or a large entity is decomposed into
small pieces. Alternatively, it is to the extent for which smaller entities are joined to form a larger
entity. It is of two types, namely Coarse-Grained and Fine-Grained.
A coarse-grained view describes a system in terms of the large subcomponents of which it is composed. A
fine-grained view describes a system in terms of the smaller components from which the larger ones are
composed.
Message Passing [7]
 Variables have to be marshaled
 Cost of communication is obvious
 Processes are protected by having private address space
 Processes should execute at the same time
DSM [7]
 Variables are shared directly
 Cost of communication is invisible
 Processes could cause error by altering data
 Executing the processes may happen with non-overlapping lifetimes
Kernel [8] is a program that manages I/O requests from software and translates them into data
processing instructions for the CPU and other electronic components of a computer.
A Monolithic Kernel [8] executes all the OS instructions in the same address space in order to
improve the performance.
A Micro-Kernel [8] runs most of the OS’s background processes in user space to make the OS more
modular. Therefore, it is easier to maintain.
Cluster Computer and its Architecture
A Cluster consists of a collection of interconnected stand-alone computers working together as a single
computing resource. A computer node can be a single or multi-processor system such as PCs,
workstations, servers, SMPs with memory, I/O and an OS. The nodes are interconnected via a
LAN.
The cluster components are as follows.
1. Multiple High Performance Computers
2. OSs (Layered or Micro-Kernel Based)
3. High Performance Networks or Switches (Gigabit Ethernet and Myrinet)
4. Network Interface Cards (NICs)
5. Fast Communication Protocols and Services (Active and Fast Messages)
6. Cluster Middleware (Single System Image (SSI) and System Availability Infrastructure)
7. Parallel Programming Environments and Tools (Parallel Virtual Machine (PVM), Message
Passing Interface (MPI))
8. Applications (Sequential, Parallel or Distributed)

Cluster Classifications
The various features of clusters are as follows.
1. High Performance
2. Expandability and Scalability
3. High Throughput
4. High Availability
Cluster can be classified into many categories as follows.
1. Application Target
 High Performance Clusters
 High Availability Clusters
2. Node Ownership
 Dedicated Clusters
 Nondedicated Clusters
3. Node Hardware
 Cluster of PCs (CoPs) or Piles of PCs (PoPs)
 Cluster of Workstations (COWs)
 Cluster of SMPs (CLUMPs)
4. Node OS
 Linux Clusters (Beowulf)
 Solaris Clusters (Berkeley NOW)
 NT Clusters (High Performance Virtual Machine (HPVM))
 Advanced Interactive eXecutive (AIX) Clusters (IBM Service Pack 2 (SP2))
 Digital Virtual Memory System (VMS) Clusters
 HP-UX Clusters
 Microsoft Wolfpack Clusters
5. Node Configuration
 Homogeneous Clusters
 Heterogeneous Clusters
6. Levels of Clustering
 Group Clusters (No. of Nodes = 2 to 99)
 Departmental Clusters (No. of Nodes = 10 to 100s)
 Organizational Clusters (No. of Nodes = Many 100s)
 National Metacomputers (No. of Nodes = Many Departmental or Organizational
Systems or Clusters)
 International Metacomputers (No. of Nodes = 1000s to Many Millions)
Components for Clusters
The components of clusters are the hardware and software used to build clusters and nodes. They are
as follows.
1. Processors
 Microprocessor Architecture (RISC, CISC, VLIW and Vector)
 Intel x86 Processor (Pentium Pro and II)
 Pentium Pro shows a very strong integer performance in contrast to Sun’s UltraSPARC
for high performance range at the same clock speed. However, the floating-point
performance is much lower.
 The Pentium II Xeon uses a memory bus of 100 MHz. It is available with a choice of
512 KB to 2 MB of L2 cache.
 Other processors: x86 variants (AMD x86, Cyrix x86), Digital Alpha, IBM PowerPC, Sun
SPARC, SGI MIPS and HP PA.
 Berkeley NOW uses Sun’s SPARC processors in their cluster nodes.
2. Memory and Cache
 The memory present inside a PC was 640 KBs. Today, a PC is delivered with 32 or 64
MBs installed in slots with each slot holding a Standard Industry Memory Module
(SIMM). The capacity of a PC is now many hundreds of MBs.
 Cache is used to keep recently used blocks of memory for very fast access. The size of
cache is usually in the range of 8KBs to 2MBs.
3. Disk and I/O
 I/O performance is improved by carrying out I/O operations in parallel. This is supported
by parallel file systems based on hardware or software Redundant Array of
Inexpensive Disks (RAID).
 Hardware RAID is more expensive than software RAID.
4. System Bus
 Bus is the collection of wires which carries data from one component to another. The
components are CPU, Main Memory and others.
 Bus is of following types.
o Address Bus
o Data Bus
o Control Bus
 Address bus is the collection of wires which transfer the addresses of Memory or I/O
devices. For instance, Intel 8085 Microprocessor has an address bus of 16 bits. It
shows that the Microprocessor can transfer maximum 16 bit address.
 Data bus is the collection of wires which is used to transfer data within the
Microprocessor and Memory or I/O devices. Intel 8085 has a data bus of 8 bits. That’s
why Intel 8085 is called 8 bit Microprocessor.

 Control bus is responsible for issuing the control signals such as read, write or
opcode fetch to perform some operations with the selected memory location.
 Every bus has a clock speed. The initial PC bus has a clock speed of 5 MHz and it is 8
bits wide.
 In PCs, the ISA bus is replaced by faster buses such as PCI.
 The ISA bus is extended to be 16 bits wide and an enhanced clock speed of 13 MHz.
However, it is not sufficient to meet the demands of the latest CPUs, disk and other
components.
 The VESA local bus is a 32 bit bus that has been outdated by the Intel PCI bus.
 PCI bus allows 133 Mbytes/s.
5. Cluster Interconnects
 The nodes in a cluster are interconnected via standard Ethernet and these nodes are
communicated using a standard networking protocol such as TCP/IP or a low-level
protocol such as Active Messages.
 Ethernet: 10 Mbps
 Fast Ethernet: 100 Mbps
 Gigabit Ethernet
 The two main characteristics of Gigabit Ethernet are as follows.
o It preserves Ethernet's simplicity while enabling a smooth migration to
Gigabit-per-second (Gbps) speeds.
o It delivers a very high bandwidth to aggregate multiple Fast Ethernet
segments.
 Asynchronous Transfer Mode (ATM)
o It is a switched virtual-circuit technology.
o It is intended to be used for both LAN and WAN, presenting a unified approach
to both.
o It is based on small fixed-size data packets termed cells. It is designed to allow
cells to be transferred using a number of media such as copper wire and fiber
optic cables.
o CAT-5 is used with ATM, allowing upgrades of existing networks without
replacing cabling.
 Scalable Coherent Interface (SCI)
o It aims to provide a low-latency distributed shared memory across a cluster.
o It is designed to support distributed multiprocessing with high bandwidth and
low latency.
o It is a point-to-point architecture with directory-based cache coherence.
o Dolphin has produced an SCI MPI which offers less than 12 µs zero message-
length latency on the Sun SPARC platform.
 Myrinet
o It is a 1.28 Gbps full duplex interconnection network supplied by Myricom
[9].
o It uses low-latency cut-through routing switches, which are able to offer fault
tolerance.
o It supports both Linux and NT.
o It is relatively expensive when compared to Fast Ethernet, but has following
advantages. 1) Very low latency (5 µs), 2) Very high throughput, 3) Greater
flexibility.
o The main disadvantage of Myrinet is its price. The cost of Myrinet-LAN
components including the cables and switches is $1,500 per host. Switches
with more than 16 ports are unavailable. Therefore, scaling is complicated.
Cluster Middleware and Single System Image
Single System Image (SSI) is the collection of interconnected nodes that appear as a unified resource.
It creates an illusion of resources such as hardware or software that presents a single powerful
resource. It is supported by a middleware layer that resides between the OS and the user-level
environment. The middleware consists of two sub-layers, namely SSI Infrastructure and System
Availability Infrastructure (SAI). SAI enables cluster services such as checkpointing, automatic
failover, recovery from failure and fault-tolerant support.
1. SSI Levels or Layers
 Hardware (Digital (DEC) Memory Channel, Hardware DSM and SMP Techniques)
 Operating System Kernel – Gluing Layer (Solaris MC and GLUnix)
 Applications and Subsystems – Middleware
o Applications
o Runtime Systems
o Resource Management and Scheduling Software (LSF and CODINE)
2. SSI Boundaries
 Every SSI has a boundary.
 SSI can exist at different levels within a system – one able to be built on another
3. SSI Benefits
 It provides a view of all system resources and activities from any node of the cluster.
 It frees the end user from having to know where the application will run.
 It frees the operator from having to know where a resource is located.
 It allows the administrator to manage the entire cluster as a single entity.
 It allows both centralized and decentralized system management and control, avoiding
the need for skilled administrators for routine system administration.
 It simplifies system management.
 It provides location-independent message communication.
 It tracks the locations of all resources so that there is no longer any need for system
operators to be concerned with their physical location while carrying out system
management tasks.
4. Middleware Design Goals
 Transparency
 Scalable Performance
 Enhanced Availability
5. Key Service of SSI and Availability Infrastructure
 SSI Support Services
o Single Point of Entry
o Single File Hierarchy
o Single Point of Management and Control
o Single Virtual Networking
o Single Memory Space
o Single Job Management System
o Single User Interface
 Availability Support Functions
o Single I/O Space
o Single Process Space
o Checkpointing and Process Migration

Early cluster architectures


Early cluster architectures have evolved over time, and their development can be traced back to the
early days of parallel computing. Here's an overview of some early cluster architectures:

1. Beowulf Clusters:
- Beowulf clusters are one of the earliest and most well-known forms of commodity cluster
computing.
- Developed in the 1990s by researchers at NASA and the National Center for Supercomputing
Applications (NCSA).
- Beowulf clusters typically consist of off-the-shelf hardware components, such as commodity
processors, Ethernet networking, and Linux-based operating systems.
- Message Passing Interface (MPI) is often used for communication between nodes in Beowulf
clusters.

2. PVM (Parallel Virtual Machine):


- PVM is a software system for the development of parallel applications.
- It enables the coordination and communication among a network of computers to work together as
a parallel processing cluster.
- PVM was widely used in the 1980s and 1990s for building parallel applications on clusters of
workstations.

3. NOW (Networks of Workstations):


- NOW architecture emerged in the late 1980s and early 1990s as a way to harness the power of
interconnected workstations for parallel processing.
- Clusters of workstations were connected through a network, and software tools were developed to
enable parallel computing across these machines.
- NOW clusters were often used for scientific computing and research applications.

4. IBM SP2:
- IBM SP2 (Scalable Power Parallel) was a parallel supercomputer developed by IBM in the early
1990s.
- It was a pioneering system in high-performance computing and featured scalable architecture with
multiple nodes interconnected by a high-speed network.
- SP2 used a variety of processors, including PowerPC, and could be used for parallel scientific and
engineering applications.

5. Thinking Machines CM-5:


- The CM-5, developed by Thinking Machines Corporation in the early 1990s, was a massively
parallel supercomputer.
- It used a unique architecture called the Connection Machine architecture, featuring a large number
of simple processing elements connected in a hypercube network.
- The CM-5 was used for scientific simulations and modeling.

6. Intel Paragon:
- The Intel Paragon was a parallel supercomputer developed by Intel in the 1990s.
- It used a scalable architecture with multiple processors connected through a high-speed
interconnect.
- The Paragon was often used for scientific and engineering simulations.

7. Meiko Computing Surface:


- The Meiko Computing Surface, introduced in the late 1980s, was an early example of a transputer-based parallel
computer.
- Transputers were specialized processors designed for parallel processing, and Meiko clusters used
these processors interconnected in a hypercube network.
These early cluster architectures laid the groundwork for the development of modern cluster
computing systems. They contributed to the evolution of parallel processing techniques and paved the
way for the widespread use of clusters in various scientific, research, and industrial applications.

High Throughput Computing Clusters:


High Throughput Computing (HTC) clusters are designed to efficiently process a large number of
relatively independent tasks, emphasizing throughput over individual task performance. Here are
some key notes on High Throughput Computing clusters:

1. Definition of High Throughput Computing:


- High Throughput Computing focuses on efficiently processing a large number of tasks in a
parallel or distributed manner.
- Unlike High-Performance Computing (HPC), which emphasizes performance of individual tasks,
HTC is geared towards maximizing the overall throughput of a large number of tasks.

2. Task Parallelism:
- HTC clusters often deal with embarrassingly parallel problems where tasks can be executed
independently.
- Examples include parameter sweeps, data mining, and many scientific simulations.

3. Job Scheduling:
- Efficient job scheduling is crucial in HTC clusters to maximize the utilization of resources.
- Schedulers need to consider factors like task dependencies, resource availability, and priority.

4. Condor:
- Condor is a widely used software system for managing distributed computing resources in HTC
environments.
- It provides a job scheduler and a set of tools for managing and optimizing computing resources.

5. Cycle Stealing:
- Cycle stealing is a concept where idle computing resources are utilized for HTC tasks.
- It involves harnessing the spare cycles of machines that are not fully utilized for their primary
tasks.

6. Distributed Resource Management:


- HTC clusters often involve distributed resources, including machines across multiple locations or
even geographically distributed computing grids.
- Efficient resource management and task scheduling become critical in such environments.

7. Data Management:
- Effective data management is important in HTC clusters, especially when dealing with large
datasets distributed across multiple nodes.
- Ensuring data availability and minimizing data transfer times are key considerations.

8. Fault Tolerance:
- Given the large-scale nature of HTC clusters, fault tolerance mechanisms are crucial to handle
failures gracefully.
- This may involve re-scheduling failed tasks on alternative resources.

9. Workload Balancing:
- Workload balancing is essential for maximizing the overall throughput.
- Tasks should be distributed evenly across available resources to prevent bottlenecks.

10. Grid Computing:


- HTC often leverages grid computing concepts, where computing resources are shared across
different administrative domains.
- Grid middleware facilitates the coordination and utilization of resources in a distributed
environment.

11. Parameter Sweep Applications:


- Many applications in HTC clusters involve parameter sweeps, where the same computation is
performed with a range of input parameters.
- Examples include sensitivity analysis and optimization studies.
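A parameter sweep is naturally expressed as a map over a list of parameter combinations. The sketch below runs such a sweep across local cores with multiprocessing; on an HTC cluster each combination would typically be submitted as an independent job, and the simulate() function is a stand-in assumption.

from multiprocessing import Pool

def simulate(params):
    temperature, pressure = params
    return temperature, pressure, temperature * pressure   # placeholder "model"

if __name__ == "__main__":
    sweep = [(t, p) for t in (280, 290, 300) for p in (1.0, 1.5, 2.0)]
    with Pool() as pool:
        for result in pool.map(simulate, sweep):            # independent tasks
            print(result)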
12. Scientific and Research Applications:
- HTC clusters are commonly used in scientific research for applications such as bioinformatics,
climate modeling, and large-scale data analysis.
- They are also used in industries for tasks like simulation and design optimization.
High Throughput Computing clusters play a crucial role in handling large-scale, loosely-coupled
tasks efficiently. They are well-suited for scenarios where individual tasks are relatively short-lived
and can be executed independently, making them ideal for a wide range of scientific, industrial, and
research applications.

Networking:
Networking is a crucial aspect of cluster computing, as it enables communication and coordination
among the individual nodes within the cluster. Here are some key notes on networking in clusters:

1. Interconnect Topologies:
Bus Topology: Nodes are connected to a common communication bus. Simple but can lead to
contention.
Ring Topology: Nodes are connected in a circular fashion. Data travels in one direction.
Mesh Topology: Nodes are interconnected in a network. Can be partial (some nodes connected) or
complete (all nodes connected).

2. High-Speed Interconnects:
InfiniBand: A high-speed interconnect technology often used in clusters. Provides low latency and
high bandwidth.
10/25/40/100 Gigabit Ethernet: Standard Ethernet can also be used in clusters, with higher speeds
for better performance.

3. Latency and Bandwidth:


Latency: The time it takes for data to travel from the source to the destination. Low latency is
crucial for real-time and high-performance applications.
Bandwidth: The amount of data that can be transmitted in a given time. Higher bandwidth allows
for faster data transfer.
4. Message Passing Interface (MPI):
- MPI is a standard communication protocol used in parallel computing and cluster environments.
- It enables communication between nodes in a cluster, allowing them to exchange data and
coordinate tasks.
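A minimal point-to-point exchange is sketched below using the mpi4py binding as an illustration; it assumes mpi4py and an MPI implementation are installed and would be launched with something like mpirun -n 2 python demo.py.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({"greeting": "hello from rank 0"}, dest=1, tag=11)   # point-to-point send
elif rank == 1:
    msg = comm.recv(source=0, tag=11)                              # matching receive
    print("rank 1 received:", msg)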

5. TCP/IP Networking:
- Clusters often use standard TCP/IP networking for communication between nodes.
- Ethernet is a common choice for connecting nodes within a cluster.

6. Cluster Communication Libraries:


- Libraries such as OpenMPI and MPICH provide implementations of MPI and other
communication protocols for cluster computing.
- They abstract low-level networking details and make it easier to develop parallel applications.

7. Scalability:
- Cluster networking should be scalable to accommodate a growing number of nodes.
- Scalability issues may arise with the increased number of nodes, leading to bottlenecks and
reduced performance.

8. Switching Technologies:
- In larger clusters, network switches are used to connect nodes.
- Technologies like InfiniBand switches or Ethernet switches with high-speed backplanes are
employed.

9. Jumbo Frames:
- Increasing the size of standard Ethernet frames can improve efficiency by reducing the overhead
associated with smaller frames.
- Jumbo frames can lead to better performance in some cluster configurations.

10. Redundancy and Reliability:


- To enhance reliability, clusters often incorporate redundant network connections and switches.
- Redundancy helps maintain communication in the event of a network component failure.

11. Network File Systems (NFS) and Storage Area Networks (SAN):
- Clusters may use NFS for file sharing and distributed file systems.
- SANs provide high-speed storage that can be shared among nodes in the cluster.

12. Cluster Management Networks:


- Clusters may have a dedicated management network for tasks such as remote administration,
monitoring, and job scheduling.

13. Virtual LANs (VLANs):


- VLANs can be used to logically segment a physical network into multiple virtual networks.
- VLANs can enhance security and manageability in cluster environments.

14. Security Considerations:


- Network security is critical in clusters, especially in shared or cloud environments.
- Firewalls, encryption, and secure communication protocols help protect cluster data.

Effective networking in clusters is essential for achieving high performance, scalability, and
reliability. It requires careful consideration of factors such as interconnect technologies, protocols,
and network topologies to ensure optimal communication and coordination among cluster nodes.

Protocols and I/O for Clusters:


Input/Output (I/O) and communication protocols are critical components of cluster computing,
determining how data is transferred and shared among the nodes. Here are some key notes on
protocols and I/O in clusters:
Communication Protocols:
1. Message Passing Interface (MPI):
 MPI is a widely used standard for communication in parallel computing and clusters.
 It enables processes to communicate with each other by sending and receiving messages.
 Supports both point-to-point and collective communication operations.
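A small collective example with the mpi4py binding is sketched below (it assumes mpi4py and an MPI runtime are available, launched with e.g. mpirun -n 4 python reduce_demo.py): each rank contributes a local value and reduce() combines them at rank 0.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_value = rank * rank                             # each process computes its share
total = comm.reduce(local_value, op=MPI.SUM, root=0)  # collective reduction
if rank == 0:
    print("sum of squares across ranks:", total)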
2. Remote Procedure Call (RPC):
 RPC is a protocol that allows a program to cause a procedure (subroutine) to execute in
another address space.
 Commonly used in distributed computing and can be adapted for cluster communication.
3. OpenMP:
 Although OpenMP is primarily used for shared-memory parallelization, it can be combined
with other communication libraries for clusters.
 It provides directives for parallel programming in C, C++, and Fortran.
4. PGAS (Partitioned Global Address Space):
 PGAS languages like UPC (Unified Parallel C) and Co-array Fortran provide a global address
space view, simplifying programming for clusters.
 Enables explicit data placement and communication.
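As an illustration of the point-to-point operations listed under MPI above, here is a minimal sketch,
assuming an MPI implementation such as OpenMPI or MPICH is installed and the program is launched
with mpirun -np 2; rank 0 sends a single, arbitrary integer to rank 1.

/* minimal MPI point-to-point sketch: rank 0 sends one integer to rank 1.
   Compile with: mpicc send_recv.c -o send_recv    Run with: mpirun -np 2 ./send_recv */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* arbitrary data to transfer */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
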
I/O Protocols:
1. Parallel File Systems:
 Cluster environments often use parallel file systems like Lustre, GPFS (IBM Spectrum Scale),
and BeeGFS.
 These file systems are designed for high-performance parallel I/O and support multiple nodes
accessing data concurrently.
2. Network File Systems (NFS):
 NFS allows remote access to files over a network and is commonly used in clusters for shared
file access.
 It may be suitable for certain workloads but can introduce latency for I/O-intensive
applications.
3. Object Storage:
 Object storage, like Amazon S3 or OpenStack Swift, is used in cloud-based clusters.
 Provides scalable and durable storage, suitable for applications dealing with large volumes of
unstructured data.
4. Parallel I/O Libraries:
 Libraries like HDF5 and NetCDF provide abstractions for parallel I/O, facilitating efficient
storage and retrieval of large datasets.
 They often integrate with parallel file systems to optimize data access patterns.
5. MPI I/O:
 MPI includes an I/O component that allows parallel applications to perform I/O in a collective
manner.
 Enables processes to read and write data collectively, improving I/O efficiency (see the MPI-IO
sketch at the end of this section).
6. I/O Forwarding:
 In some cluster configurations, I/O forwarding mechanisms redirect I/O requests to designated
nodes, reducing the impact on compute nodes.
 Enhances the overall performance of I/O operations.
7. Data Compression:
 For large-scale data transfer in clusters, data compression techniques may be employed to
reduce the amount of data transmitted over the network.
 Compression and decompression are performed on the fly to optimize I/O performance.
8. Checkpointing:
 Checkpointing involves saving the state of a computation periodically to disk.
 Checkpointing protocols ensure data consistency and reliability in the event of failures.
9. Buffering and Caching:
 Buffering and caching strategies can be implemented to optimize I/O performance.
 Temporary storage buffers and caches help mitigate latency and improve overall throughput.
10. DMA (Direct Memory Access):
 DMA allows I/O operations to bypass the CPU and directly access memory, reducing CPU
overhead during data transfers.
 Enhances the efficiency of I/O operations in clusters.
11. Data Virtualization:
 Data virtualization tools provide a layer of abstraction, allowing applications to access data
without being concerned about its physical location.
 Improves flexibility and simplifies data management in clusters.
Efficient communication and I/O protocols are essential for achieving optimal performance in cluster
computing. The selection of protocols depends on the specific requirements of the applications, the
nature of the data, and the characteristics of the cluster environment.
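As an illustration of the MPI I/O interface mentioned above, the following minimal sketch has every
rank write its own block of integers to a shared file with a single collective call; the file name
"out.dat" and the block size are arbitrary choices for illustration.

/* minimal MPI-IO sketch: each rank writes its own block of a shared file collectively.
   Compile with mpicc; the file name and block size are illustrative only. */
#include <mpi.h>

#define N 1024

int main(int argc, char **argv) {
    int rank, buf[N];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < N; i++) buf[i] = rank;        /* fill this rank's block */

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* collective write: every rank writes N ints at its own offset in one operation */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * N * sizeof(int),
                          buf, N, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
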

Setting up and administering a cluster


Setting up and administering a cluster involves several steps, from selecting the
hardware and software components to configuring and managing the cluster's
resources. Here's a general guide to help you through the process:

1. Define Cluster Requirements:


 Identify the purpose of the cluster: high-performance computing (HPC), high-
throughput computing (HTC), data processing, etc.
 Determine the workload characteristics and performance requirements.
 Define the number of nodes, processing power, memory, and storage needed.
2. Select Hardware:
 Choose appropriate hardware components, including servers, networking
equipment, and storage.
 Ensure compatibility between components and consider scalability for future
expansion.
 Balance the performance of individual nodes with the overall system
requirements.
3. Choose Networking Infrastructure:
 Select a high-speed and reliable network interconnect, such as InfiniBand or high-
speed Ethernet.
 Ensure that the network infrastructure can handle the communication patterns of
the intended workload.
4. Install Operating System:
 Choose a cluster-compatible operating system (OS), such as a Linux distribution
(e.g., CentOS, Ubuntu).
 Install the OS on each cluster node, ensuring consistent configurations.
5. Cluster Middleware and Software:
 Install cluster management and middleware software. Common choices include
OpenHPC, Warewulf, and Bright Cluster Manager.
 Configure job scheduling and resource management tools (e.g., Slurm, Torque,
Kubernetes).
6. Configure Networking:
 Set up IP addresses, subnet masks, and hostnames for each node.
 Configure the network interconnect, ensuring proper communication between
nodes.
 Implement any required security measures, such as firewalls and VPNs.
7. Shared File System:
 Set up a shared file system for data storage and access. This can be a parallel file
system (e.g., Lustre, GPFS) or a distributed file system.
 Ensure proper permissions and access controls.
8. Node Provisioning:
 Utilize tools like PXE (Preboot Execution Environment) or IPMI (Intelligent Platform
Management Interface) to facilitate node provisioning.
 Automate the installation of the OS and cluster software across all nodes.
9. Security Considerations:
 Implement security best practices, including firewalls, secure shell (SSH)
configurations, and regular system updates.
 Secure communication between nodes and enforce user authentication and
authorization.
10. Monitoring and Logging:
 Set up monitoring tools to track cluster performance, resource usage, and
potential issues.
 Configure logging mechanisms to capture system events and errors.
11. Backup and Recovery:
 Implement a robust backup strategy for critical data and configurations.
 Establish recovery procedures in case of hardware failures or system crashes.
12. User Management:
 Set up user accounts and manage user access to the cluster.
 Configure permissions and quotas to control resource usage.
13. Documentation:
 Maintain comprehensive documentation that includes cluster architecture,
configurations, and procedures.
 Document troubleshooting steps and solutions for common issues.
14. Testing and Optimization:
 Conduct thorough testing of the cluster using benchmarking tools and sample
workloads.
 Optimize the cluster configuration based on performance evaluations.
15. Training and Support:
 Provide training for users and administrators on cluster usage and best practices.
 Establish a support system for addressing user inquiries and resolving issues.
16. Scale and Expand:
 Plan for scalability by designing the cluster to accommodate future growth.
 Implement procedures for adding new nodes and expanding storage capacity.

Setting up and administering a cluster is an iterative process that involves


continuous monitoring, optimization, and adaptation to changing requirements.
Regular maintenance and updates are essential for keeping the cluster in optimal
working condition.

Resource Management and Scheduling (RMS)


RMS is the process of distributing users’ applications among computers to maximize throughput.
The software that performs RMS has two components, namely the resource manager and the resource
scheduler. The resource manager deals with locating and allocating computational resources,
authentication, process creation and migration, whereas the resource scheduler deals with queuing
applications, resource location and assignment.
RMS is a client-server system. Jobs are submitted to the RMS environment, and the environment
is responsible for placing, scheduling and running each job in an appropriate way.
The services provided by a RMS environment are as follows.
1. Process Migration
2. Checkpointing
 Taxonomy of Checkpoint Implementation [14]
o Application-Level
 Single Threaded
 Multi Threaded
 Mig Threaded
o User-Level
 Patch
 Library
o System-Level
 Kernel-level
 Hardware-level
3. Scavenging Idle Cycles
4. Fault Tolerance
5. Minimization of Impact on Users
6. Load Balancing
7. Multiple Application Queue
There are many commercial and research packages available for RMS as follows.
1. LSF (http://www.platform.com/)
 The full form of LSF is Load Sharing Facility
 Fair Share
 Preemptive
 Backfill and Service Level Agreement (SLA) Scheduling
 High Throughput Scheduling
 Multi-cluster Scheduling
 Topology, Resource and Energy-aware Scheduling
 LSF is a job scheduling and monitoring software system developed and maintained by
Platform Computing.
 LSF is used to run jobs on the blade center.
 A job is submitted from one of the head nodes (login01, login02 for 32-bit jobs, login03
for jobs compiled to use 64-bits) and waits until resources become available on the
computational nodes.
 Jobs which ask for 4 or fewer processors and 15 minutes or less time are given a high
priority.
2. CODINE (http://www.genias.de/products/codine/tech_desc.html)
 The full form of CODINE is Computing in Distributed Networked Environments.
 Advanced Reservation
 CODINE was a grid computing computer cluster software system, developed and
supported by Sun Microsystems and later Oracle [12].

Figure 1 CODINE [13]


Programming Environments and Tools, Applications
1. Threads
 A thread is a light-weight process.
 Threads are applicable for concurrent programming on both uniprocessor and
multiprocessor computers (a short POSIX threads sketch follows this list).
2. Message Passing Systems (MPI and PVM) [10-11]
 MPI is a specification for the developers and users of message passing libraries. It is
not a library.
 MPI primarily addresses the message-passing parallel programming model. Data is
moved from the address space of one process to that of another process through
cooperative operations on each process.
 The goal of the MPI is to provide a widely used standard for writing message passing
programs. The interface attempts to be:
o practical
o portable
o efficient
o flexible
 The MPI standard has gone through a number of revisions, with the most
recent version being MPI-3.
 Interface specifications have been defined for C and Fortran90 language bindings:
o C++ bindings from MPI-1 are removed in MPI-3
o MPI-3 also provides support for Fortran 2003 and 2008 features
 Actual MPI library implementations differ in which version and features of the MPI
standard they support.
 MPI runs on virtually any hardware platform:
o Distributed Memory
o Shared Memory
o Hybrid
 Reasons for using MPI
o Standardization
o Portability
o Performance Opportunities
o Functionality
o Availability
 PVM is a software package that permits a heterogeneous collection of Unix and/or
Windows computers hooked together by a network to be used as a single large
parallel computer. Thus large computational problems can be solved more cost
effectively by using the aggregate power and memory of many computers.
 PVM software is very portable.
 PVM enables users to exploit their existing computer hardware to solve much larger
problems at minimal additional cost. Hundreds of sites around the world are using
PVM to solve important scientific, industrial and medical problems in addition to
PVM’s use as an educational tool to teach parallel programming. With tens of
thousands of users, PVM has become the de facto standard for distributed computing
world-wide.
3. Distributed Shared Memory (DSM) Systems
4. Parallel Debuggers and Profilers
5. Performance Analysis Tools
6. Cluster Administration Tools
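Returning to item 1 (threads), the following is a minimal POSIX threads sketch in which two worker
threads each sum half of an array and the main thread combines the partial results; the data and the
thread count are illustrative only.

/* minimal POSIX threads sketch: two workers each sum half of an array.
   Compile with: gcc threads.c -o threads -lpthread */
#include <stdio.h>
#include <pthread.h>

#define N 8
static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};   /* illustrative data */

struct range { int lo, hi; long sum; };

static void *worker(void *arg) {
    struct range *r = (struct range *)arg;
    r->sum = 0;
    for (int i = r->lo; i < r->hi; i++) r->sum += data[i];
    return NULL;
}

int main(void) {
    pthread_t t[2];
    struct range r[2] = { {0, N / 2, 0}, {N / 2, N, 0} };

    for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, worker, &r[i]);
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);   /* wait for both workers */

    printf("total = %ld\n", r[0].sum + r[1].sum);
    return 0;
}
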

Cluster Applications
1. Grand Challenge Applications (GCAs)
 Crystallographic and Microtomographic Structural Problems
 Protein Dynamics and Biocatalysis
 Relativistic Quantum Chemistry of Actinides
 Virtual Materials Design and Processing
 Global Climate Modeling
 Discrete Event Simulation
2. Supercomputing Applications
3. Computationally Intensive Applications
4. Data or I/O Intensive Applications
5. Transaction Intensive Applications
Representative Cluster Systems, Heterogeneous Clusters
Many projects are investigating the development of supercomputing class machines using
commodity off-the-shelf components (COTS). The popular projects are listed as follows.
Project (Organization):
 Network of Workstations (NOW): University of California, Berkeley
 High Performance Virtual Machine (HPVM): University of Illinois, Urbana-Champaign
 Beowulf: Goddard Space Flight Center, NASA
 Solaris-MC: Sun Labs, Sun Microsystems, Palo Alto, CA

NOW
 Platform: PCs and Workstations
 Communications: Myrinet
 OS: Solaris
 Tool: PVM
HPVM
 Platform: PCs
 Communications: Myrinet
 OS: Linux
 Tool: MPI
Beowulf
 Platform: PCs
 Communications: Multiple Ethernet with TCP/IP
 OS: Linux
 Tool: MPI/PVM
Solaris-MC
 Platform: PCs and Workstations
 Communications: Solaris supported
 OS: Solaris
 Tool: C++ and CORBA
Heterogeneous Clusters
 Clusters are sometimes made deliberately heterogeneous in order to exploit the higher floating-
point performance of certain architectures and the low cost of other systems.
 A heterogeneous layout means that automating administration work becomes more complex,
i.e., software packaging differs across the different systems.
 Major challenges [16]:
o Four major challenges that must be overcome so that heterogeneous computing
clusters emerge as the preferred platform for executing a wide variety of enterprise
workloads.
o First, most enterprise applications in use today were not designed to run on such
dynamic, open and heterogeneous computing clusters. Migrating these applications to
heterogeneous computing clusters, especially with substantial improvement in
performance or energy-efficiency, is an open problem.
o Second, creating new enterprise applications ground-up to execute on the new,
heterogeneous computing platform is also daunting. Writing high-performance,
energy-efficient programs for these architectures is extremely challenging due to the
unprecedented scale of parallelism, and heterogeneity in computing, interconnect and
storage units.
o Third, cost savings from the new shared-infrastructure architecture for consumption
and delivery of IT services are only possible when multiple enterprise applications can
amicably share resources (multi-tenancy) in the heterogeneous computing cluster.
However, enabling multi-tenancy without adversely impacting the stringent quality of
service metrics of each application calls for dynamic scalability and virtualization of a
wide variety of diverse computing, storage and interconnect units, and this is yet
another unsolved problem.
o Finally, enterprise applications encounter highly varying user loads, with spikes of
unusually heavy load. Meeting quality of service metrics across varying loads calls for
an elastic computing infrastructure that can automatically provision (increase or
decrease) computing resources used by an application in response to varying user
demand. Currently, no good solutions exist to meet this challenge.
Security, Resource Sharing, Locality, Dependability
Security
There is always a tradeoff between usability and security. Allowing rsh (remote shell) access from the
outside to each node just by matching usernames and hosts with each user’s .rhosts file is not good,
as a security incident on a single node compromises the security of all the systems that share that
user’s home. For instance, mail can be abused in a similar way – just change that user’s .forward
file to do the mail delivery via a pipe to an interesting executable or script. A service is not safe
unless all of the services it depends on are at least equally safe.

Connecting all the nodes directly to the external network may cause two main problems. First, we
make temporary changes and forget to restore them. Second, systems tend to have information
leaks. The operating system and its version can easily be guessed just with IP access, even with only
harmless services running, and almost all operating systems have had serious security problems in
their IP stacks in recent history.

Unencrypted Versus Encrypted Sustained Throughput
 Unencrypted Stream: >7.5 MB/s
 Blowfish Encrypted Stream: 2.75 MB/s
 IDEA Encrypted Stream: 1.8 MB/s
 3DES Encrypted Stream: 0.75 MB/s

Special care must be taken when building clusters of clusters. The usual approach for making these
metaclusters secure is to build secure tunnels between the clusters, usually from front-end to front-
end.
If intermediate backbone switches can be trusted and have the necessary software or resources, they
can set up a VLAN joining the clusters, achieving greater bandwidth and lower latency than routing
at the IP level via the front-ends.

Resource Sharing
Resource sharing needs cooperation among the processors to ensure that no processor is idle while
there are tasks waiting for service.
In Load Sharing, three location policies were studied. They are random policy, threshold policy and
shortest policy.
Threshold policy probes a limited number of nodes. It terminates the probing as soon as it finds a node
with a queue length shorter than the threshold.
Shortest policy probes several nodes and then selects the one having the shortest queue, from among
those having queue lengths shorter than the threshold.
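A minimal sketch of the threshold location policy just described is given below; queue_length() is a
stand-in for a real query of a remote node's load, and the probe limit and threshold values are
arbitrary illustrative choices.

/* sketch of the threshold location policy: probe up to PROBE_LIMIT random nodes and
   transfer a task to the first one whose queue length is below THRESHOLD. */
#include <stdio.h>
#include <stdlib.h>

#define NODES 16
#define PROBE_LIMIT 3
#define THRESHOLD 2

static int queue_length(int node) {
    (void)node;
    /* placeholder: a real system would ask the remote node for its current queue length */
    return rand() % 5;
}

static int select_node(void) {
    for (int probe = 0; probe < PROBE_LIMIT; probe++) {
        int candidate = rand() % NODES;
        if (queue_length(candidate) < THRESHOLD)
            return candidate;         /* stop probing at the first acceptable node */
    }
    return -1;                        /* no suitable node found: run the task locally */
}

int main(void) {
    srand(1);
    int target = select_node();
    if (target >= 0) printf("transfer task to node %d\n", target);
    else             printf("keep task on the local node\n");
    return 0;
}
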
In the Flexible Load Sharing Algorithm (FLS) a location policy similar to threshold is used. In contrast
to threshold, FLS bases its decisions on local information which is possibly replicated at multiple
nodes. For scalability, FLS divides a system into small subsets which may overlap. Each of these
subsets forms a cache held at a node. This algorithm supports mutual inclusion or exclusion. It is
noteworthy to mention that FLS does not attempt to produce the best possible solution, but like
threshold, it offers instead an adequate one, at a fraction of the cost.

High Availability:

High Availability (HA) in cluster technology refers to the ability of a system to remain
operational and accessible even in the presence of hardware or software failures. Here are
some key technologies and strategies used to achieve high availability in clusters:

1. Redundancy:
 Hardware Redundancy: Use redundant hardware components such as power supplies, network
interfaces, and storage devices to eliminate single points of failure.
 Node Redundancy: Deploy multiple nodes in the cluster so that if one node fails, others can take
over the workload.
2. Failover Mechanisms:
 Implement failover mechanisms to automatically redirect traffic or workload from a failed node to a
healthy one.
 Cluster software monitors the health of nodes and triggers failover when necessary.
3. Load Balancing:
 Distribute workloads evenly across cluster nodes to prevent any single node from becoming a
performance bottleneck.
 Load balancing helps in maximizing resource utilization and improves overall system responsiveness.
4. Quorum Systems:
 Quorum systems help prevent split-brain scenarios, where nodes in a cluster lose communication
with each other and may independently continue operations.
 By requiring a majority of nodes to agree on the cluster state, quorum systems ensure that only one
partition of the cluster remains active.
5. Cluster Communication Protocols:
 Use reliable and efficient communication protocols between nodes to detect failures and coordinate
actions.
 Communication protocols like Heartbeat and Corosync are often employed to monitor the health of
nodes (a simplified heartbeat-checking sketch appears at the end of this section).
6. Shared Storage:
 Implement shared storage systems to allow nodes to access the same data.
 In case of a node failure, another node can take over and access the data seamlessly.
7. Cluster File Systems:
 Utilize cluster file systems, such as GFS (Global File System) or Lustre, which are designed to provide
shared access to files among cluster nodes.
 These file systems enable concurrent access to data and facilitate failover.
8. Virtualization:
 Virtualization technologies, such as VMware High Availability (HA) or Microsoft Hyper-V Replica, can
be used to provide failover for virtual machines.
 Virtualization abstracts applications from the underlying hardware, making it easier to move
workloads between nodes.
9. Backup and Restore:
 Regularly back up critical data and configurations to facilitate quick recovery in the event of a failure.
 Automated backup solutions and off-site storage contribute to data integrity and availability.
10. Monitoring and Alerting:
 Implement comprehensive monitoring systems to track the health and performance of cluster nodes.
 Set up alerting mechanisms to notify administrators of potential issues before they escalate.
11. Power and Environmental Redundancy:
 Ensure power redundancy with uninterruptible power supplies (UPS) and backup generators.
 Implement environmental controls, such as cooling systems, to prevent hardware failures due to
overheating.
12. Automated Repair and Maintenance:
 Use automation tools to perform routine maintenance tasks and apply software updates without
causing downtime.
 Automated repair mechanisms can fix common issues without manual intervention.
13. Geographic Redundancy:
 For critical systems, consider geographical redundancy by deploying clusters in different physical
locations.
 This helps protect against regional disasters and ensures continuity of operations.
14. Database Replication:
 Implement database replication mechanisms to maintain copies of databases on multiple nodes.
 In case of a node failure, the database can be accessed from a replicated copy on another node.
15. Documentation and Training:
 Maintain detailed documentation of the HA configuration, failover procedures, and recovery
processes.
 Train administrators and support staff on the HA setup to ensure a quick and effective response to
failures.
High availability in cluster technology is a multifaceted approach that combines hardware
redundancy, failover mechanisms, and smart resource management. The goal is to minimize
downtime, ensure data integrity, and provide uninterrupted services to end-users.
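As a simplified sketch of the heartbeat monitoring mentioned under cluster communication protocols
above (real membership services such as Heartbeat or Corosync are considerably more elaborate), a
node can be suspected as failed when no heartbeat has arrived within a timeout; the node count,
timeout, and timestamp table below are illustrative assumptions.

/* sketch: heartbeat-based failure detection. last_heartbeat[] would normally be
   updated by the network layer whenever a heartbeat message arrives from a node. */
#include <stdio.h>
#include <time.h>

#define NODES 4
#define TIMEOUT 5   /* seconds without a heartbeat before a node is suspected */

static time_t last_heartbeat[NODES];

static void check_members(time_t now) {
    for (int i = 0; i < NODES; i++) {
        if (now - last_heartbeat[i] > TIMEOUT)
            printf("node %d suspected failed -> trigger failover\n", i);
    }
}

int main(void) {
    time_t now = time(NULL);
    for (int i = 0; i < NODES; i++) last_heartbeat[i] = now;
    last_heartbeat[2] = now - 10;   /* simulate a node that stopped responding */
    check_members(now);
    return 0;
}
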

Performance Model and Simulation:

Performance modeling and simulation are essential tools in computer science and engineering to
predict, analyze, and optimize the performance of systems, applications, or networks. These tools help
in understanding the behavior of complex systems and making informed decisions about design and
resource allocation. Here are key aspects of performance modeling and simulation:

Performance Modeling:

1. Definition:

- Performance modeling involves creating an abstract representation of a system to understand its
behavior under different conditions.

- It can be applied to various domains, including computer systems, networks, and software
applications.

2. Types of Models:
Analytical Models: Use mathematical equations to describe the system's behavior. Examples
include queuing models and network models.

Simulation Models: Utilize simulation software to mimic the behavior of the actual system. Monte
Carlo simulations and discrete event simulations fall into this category.

Empirical Models: Derived from observed measurements and data. Regression analysis is
commonly used to create empirical models.

3. Performance Metrics:

- Define performance metrics relevant to the system being modeled, such as response time,
throughput, and resource utilization.
- Metrics help in quantifying and comparing the performance of different system configurations.
4. Queuing Theory:
- Commonly used in modeling systems where entities (tasks, jobs, etc.) wait in line before being
processed.
- Queuing models help analyze and optimize the use of resources and predict system performance
(a worked M/M/1 example follows this list).

5. Petri Nets:

- Petri Nets are graphical and mathematical modeling languages used for the specification,
simulation, and verification of systems.
- They are particularly useful for modeling concurrent and distributed systems.
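
As a worked example of the queuing models mentioned in item 4 above, the classic M/M/1 single-server
queue (Poisson arrivals at rate \lambda, exponential service at rate \mu, with \lambda < \mu) has the
closed-form results

\[
\rho = \lambda/\mu, \qquad L = \rho/(1-\rho), \qquad W = 1/(\mu-\lambda), \qquad L = \lambda W \ \text{(Little's law)}.
\]

For instance, with \lambda = 8 requests/s and \mu = 10 requests/s, the utilization is \rho = 0.8, the mean
number of requests in the system is L = 4, and the mean response time is W = 0.5 s.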

Performance Simulation:

1. Definition:

- Performance simulation involves running a model of a system to observe and analyze its behavior
over time.

- Simulation helps in understanding system dynamics, identifying bottlenecks, and evaluating the
impact of changes.

2. Advantages:
Cost-Effective: Simulation allows experimentation in a virtual environment without the costs
associated with real-world testing.
Flexibility: Simulations can be easily modified to test various scenarios and configurations.

Risk Mitigation: Simulating potential changes or improvements allows for risk assessment before
implementation.

3. Discrete Event Simulation:


- Systems are modeled as a sequence of events that occur at distinct points in time.
- Useful for modeling complex systems with discrete, interactive components.
4. Monte Carlo Simulation:
- Uses random sampling to model the probability of different outcomes in a system.
- Particularly effective for assessing risk and uncertainty in a system (a small simulation sketch
appears at the end of this list).

5. Performance Evaluation:

- Evaluate the performance of a system by analyzing simulation results against predefined
performance metrics.
- Assess how changes in system parameters impact overall performance.

6. System Validation:
- Use simulations to validate and verify the behavior of a system before its actual implementation.
- Helps in detecting potential issues and refining the system design.

7. Application in Computer Networks:

- Performance simulation is widely used in networking to model communication protocols, evaluate
network topologies, and assess the impact of network changes.

8. Model Calibration:
- Adjust model parameters to match the simulation's output with real-world observations.
- Calibration ensures that the simulation accurately reflects the behavior of the actual system.

9. Toolsets:

- Various simulation tools are available, such as OPNET, NS-3, and Simulink, each with its specific
strengths and applications.

10. Parallel and Distributed Systems:

- Simulation is crucial for understanding the performance of parallel and distributed computing
systems.
- It helps in optimizing resource allocation and load balancing.
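
Combining the discrete-event and Monte Carlo ideas above, the following small sketch samples random
interarrival and service times for an M/M/1 queue and compares the simulated mean response time with
the analytical value 1/(mu - lambda); the arrival rate, service rate and job count are arbitrary
illustrative choices.

/* sketch: Monte Carlo / discrete-event simulation of an M/M/1 queue.
   Compile with: gcc mm1_sim.c -o mm1_sim -lm */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double exp_rand(double rate) {
    /* inverse-transform sampling of an exponential distribution */
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return -log(u) / rate;
}

int main(void) {
    const double lambda = 0.8, mu = 1.0;   /* assumed arrival and service rates */
    const long n = 1000000;                /* number of simulated jobs */
    double wait = 0.0, total_response = 0.0;

    srand(42);
    for (long i = 0; i < n; i++) {
        double service = exp_rand(mu);
        double interarrival = exp_rand(lambda);
        total_response += wait + service;                 /* response = wait + service */
        wait = fmax(0.0, wait + service - interarrival);  /* Lindley recursion */
    }
    printf("simulated mean response time: %.3f\n", total_response / n);
    printf("analytical 1/(mu-lambda):     %.3f\n", 1.0 / (mu - lambda));
    return 0;
}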
Challenges:

1. Model Accuracy:

- Achieving an accurate representation of the real system can be challenging, particularly when
dealing with complex and dynamic environments.

2. Resource Intensive:

- Some simulations can be computationally intensive, requiring significant computational resources
and time.

3. Validation and Verification:

- Ensuring that the simulation model accurately reflects the behavior of the real system requires
careful validation and verification processes.

Performance modeling and simulation play a crucial role in system design, optimization, and decision-
making processes. By providing insights into the behavior of complex systems, these tools contribute
to the development of more efficient and reliable technologies.

Process Scheduling:

1. Definition:
- Process scheduling is the mechanism used by an operating system to manage the execution of
processes on the CPU.

2. Scheduling Policies:
- First-Come-First-Served (FCFS): Processes are executed in the order they arrive.
- Shortest Job Next (SJN): The process with the shortest burst time is selected next.
- Round Robin (RR): Each process gets a fixed time slice (quantum) before the next process is selected
(a small simulation sketch follows this list).
- Priority Scheduling: Processes are assigned priority levels, and the one with the highest priority is
selected next.

3. Context Switching:

- When the operating system switches from executing one process to another, it performs a context
switch.

- Context switching involves saving the state of the current process and loading the saved state of the
next process.

4. Preemption:
- Preemptive scheduling allows a higher-priority process to interrupt and temporarily suspend the
execution of a lower-priority process.
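
A minimal sketch of the Round Robin policy from item 2 above, assuming four processes that all arrive
at time 0, hypothetical burst times, and a time quantum of 2 units:

/* sketch: round-robin scheduling simulation (all processes assumed to arrive at t = 0,
   served in strict cyclic order; burst times and quantum are illustrative) */
#include <stdio.h>

int main(void) {
    int burst[] = {5, 3, 8, 2};            /* hypothetical CPU burst times */
    const int n = 4, quantum = 2;
    int remaining[4], completion[4];
    for (int i = 0; i < n; i++) remaining[i] = burst[i];

    int time = 0, done = 0;
    while (done < n) {
        for (int i = 0; i < n; i++) {       /* cycle through the ready queue */
            if (remaining[i] == 0) continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            time += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) { completion[i] = time; done++; }
        }
    }
    for (int i = 0; i < n; i++)             /* turnaround = completion - arrival (0) */
        printf("process %d: turnaround = %d\n", i, completion[i]);
    return 0;
}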

Load Sharing and Load Balancing:

1. Load Sharing:
- Load sharing involves distributing the workload among multiple processors or nodes.
- It aims to improve overall system performance by utilizing available resources efficiently.

2. Load Balancing:
- Load balancing is the process of distributing the workload evenly across all processors or nodes in
a system.
- Ensures that no single processor is overloaded while others remain underutilized.

3. Centralized vs. Decentralized Load Balancing:


- Centralized Load Balancing: A central controller makes decisions on workload distribution.
- Decentralized Load Balancing: Nodes or processors collaborate to make load balancing decisions.

4. Static vs. Dynamic Load Balancing:


- Static Load Balancing: Workload distribution is determined in advance and remains unchanged
during execution.
- Dynamic Load Balancing: Adjusts workload distribution dynamically based on runtime conditions.

5. Algorithms:
- Round Robin: Distributes tasks across servers in a circular order.
- Weighted Round Robin: Assigns servers different weights (for example, based on their capacity), so
higher-weight servers receive proportionally more tasks.
- Least Connections: Assigns tasks to the server with the fewest active connections (a small selection
sketch follows this section).
- Randomized Load Balancing: Assigns tasks randomly to available servers.

6. Challenges:
- Dynamic changes in workload can make load balancing challenging.
- Overhead associated with monitoring and redistributing tasks.
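
A small sketch of the least-connections policy listed above; the per-server connection counts are
illustrative, and in a real load balancer they would be updated as connections open and close.

/* sketch of least-connections load balancing: dispatch the next request to the
   server that currently has the fewest active connections */
#include <stdio.h>

#define SERVERS 4

int main(void) {
    int active[SERVERS] = {12, 7, 9, 7};   /* hypothetical active-connection counts */

    int best = 0;
    for (int i = 1; i < SERVERS; i++)
        if (active[i] < active[best]) best = i;

    active[best]++;                         /* account for the newly assigned request */
    printf("dispatch next request to server %d\n", best);
    return 0;
}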

Distributed Shared Memory (DSM):

1. Definition:

- Distributed Shared Memory allows multiple nodes in a distributed system to share a common
address space, providing the illusion of a single shared memory space.

2. Architecture:
- Hardware DSM: Shared memory is implemented at the hardware level.
- Software DSM: Shared memory is implemented using software libraries and protocols.

3. Consistency Models:

Sequential Consistency: The result of any execution is the same as if the operations of all processors
were executed in some sequential order.
Causal Consistency: Preserves causality between operations initiated by different processors.
Release Consistency: Operations are grouped into critical sections, and consistency is maintained
only at the entry and exit of these sections.

4. Coherence Protocols:
Write-Once Protocol: Each block of memory is only written by one processor.
Invalidation Protocol: When a processor writes to a block, all other copies are invalidated.
Update Protocol: Updates are made directly to all copies of a block.

5. Advantages:

- Simplifies programming in distributed systems by providing a familiar shared-memory
programming model.
- Allows for easy communication and data sharing between processes running on different nodes.

6. Challenges:
- Overhead in maintaining coherence between distributed copies of memory.
- Increased latency compared to local shared memory systems.

Distributed shared memory, process scheduling, load sharing, and load balancing are critical aspects
of distributed computing. These concepts play a crucial role in improving the efficiency,
performance, and scalability of systems in distributed environments.
Unit 4

Grid Computing: Introduction to Grid Computing, Virtual Organizations, Architecture,
Applications, Computational, Data, Desktop and Enterprise Grids, Data-intensive Applications
Grid computing has emerged as an important new field, distinguished from conventional
distributed computing by its focus on large-scale resource sharing, innovative applications
and in some cases, high-performance orientation. The term “the Grid” was coined in the
mid1990s to denote a proposed distributed computing infrastructure for advanced science
and engineering. Considerable progress has since been made on the construction of such
an infrastructure, but the term “Grid” has also been conflated, at least in popular
perception, to embrace everything from advanced networking to artificial intelligence.
One might wonder whether the term has any real substance and meaning. Is there really a
distinct “Grid problem” and hence a need for new “Grid technologies”? If so, what is the
nature of these technologies, and what is their domain of applicability? While numerous
groups have interest in Grid concepts and share, to a significant extent, a common vision
of Grid architecture, we do not see consensus on the answers to these questions. The Grid
concept is indeed motivated by a real and specific problem, and there is an emerging,
well-defined Grid technology base that solves this problem. Although Grid technologies are
currently distinct from other major technology trends, such as Internet, enterprise,
distributed, and peer-to-peer computing, these other trends can benefit significantly from
growing into the problem space addressed by Grid technologies.
The real and specific problem that underlies the Grid concept is coordinated resource sharing
and problem solving in dynamic, multi-institutional virtual organizations. The sharing that
we are concerned with is not primarily file exchange but rather direct access to computers,
software, data and other resources, as is required by a range of collaborative problem-
solving and resource brokering strategies emerging in industry, science, and engineering.
This sharing is, necessarily, highly controlled, with resource providers and consumers
defining clearly and carefully just what is shared, who is allowed to share, and the
conditions under which sharing occurs. A set of individuals and/or institutions defined by
such sharing rules are called a virtual organization (VO). The following are examples of
VOs: the application service providers, storage service providers, cycle providers, and
consultants engaged by a car manufacturer to perform scenario evaluation during planning
for a new factory; members of an industrial consortium bidding on a new aircraft; a crisis
management team and the databases and simulation systems that they use to plan a
response to an emergency situation; and members of a large, international, multiyear high
energy physics collaboration. Each of these examples represents an approach to computing
and problem solving based on collaboration in computation and data rich environments.
As these examples show, VOs vary tremendously in their purpose, scope, size, duration,
structure, community, and sociology. Nevertheless, careful study of underlying technology
requirements leads us to identify a broad set of common concerns and requirements. In
particular, the need for highly flexible sharing relationships, ranging from client-server to
peer-to-peer and brokered; for complex and high levels of control over how shared
resources are used, including fine-grained access control, delegation, and application of
local and global policies; for sharing of varied resources, ranging from programs, files,
and data to computers, sensors, and networks; and for diverse usage modes, ranging from
single user to multi-user and from performance sensitive to cost-sensitive and hence
embracing issues of quality of service, scheduling, co-allocation, and accounting.

Grid Architecture:
1. Definition:

 Grid computing is a distributed computing paradigm that involves the coordinated sharing
and use of resources across multiple administrative domains.

2. Characteristics of the Grid:

 Distributed Resources: Resources like computing power, storage, and applications are
distributed across multiple locations.
 Virtual Organization: Users and resources are organized into virtual organizations,
crossing administrative boundaries.
 Dynamic Resource Allocation: Resources can be dynamically allocated and de-allocated
based on demand.
 High Performance: Grids are designed to handle computationally intensive tasks, often
involving parallel processing.

Characterization of the Grid:


1. Resource Heterogeneity:

 Grids integrate diverse resources, including different hardware architectures, operating
systems, and storage systems.

2. Virtual Organizations:

 Grid users and resources are organized into virtual organizations (VOs) based on their
requirements and affiliations.

3. Scalability:

 Grids are designed to scale horizontally, accommodating a large number of resources and
users.
4. Collaboration:

 Collaboration is a key aspect, allowing users and organizations to share resources and work
together on large-scale projects.

Grid-Related Standard Bodies:


1. Open Grid Forum (OGF):

 OGF develops open standards and specifications to facilitate the adoption of grid computing
technologies.

2. Global Grid Forum (GGF):

 The predecessor of OGF, GGF contributed to the development of grid standards and best
practices.

Grid Types:
1. Computational Grids:

 Focus on sharing computing resources for parallel processing tasks.

2. Data Grids:

 Emphasize the management and sharing of large-scale distributed data.

3. Collaborative Grids:

 Facilitate collaboration among geographically dispersed users and teams.

4. Utility Grids:

 Provide computing resources as a utility, similar to traditional utility services.

Topologies:
1. Tree Topology:

 Hierarchical structure with a central coordinator.

2. Mesh Topology:

 Nodes are interconnected, allowing for multiple communication paths.

3. Cluster Topology:

 Nodes are grouped into clusters with high interconnectivity within clusters.
4. Hybrid Topology:

 Combination of different topologies to achieve specific goals.

Components of Grid:
1. Resource Management:

 Allocates and manages resources efficiently, considering user requirements.

2. Grid Middleware:

 Software layer that enables communication and coordination among diverse resources in
the grid.

3. Grid Security:

 Ensures secure communication, authentication, and authorization in a distributed
environment.

4. Grid Applications:

 Custom applications designed to leverage the grid infrastructure for specific tasks.

Layers of Grid:
1. Fabric Layer:

 Physical infrastructure layer consisting of computing nodes, storage devices, and network
connections.

2. Connectivity Layer:

 Network infrastructure that facilitates communication among grid components.

3. Resource Layer:

 Manages and provides access to computing resources, storage, and other services.

4. Collective Layer:

 Coordinates and manages multiple resources for parallel processing and collaborative tasks.

5. Application Layer:

 Contains user applications that interact with the grid infrastructure.


Comparison with Other Approaches:
1. Grid vs. Cluster Computing:

 Grids involve distributed resources across multiple clusters, while clusters consist of
interconnected nodes within a single administrative domain.

2. Grid vs. Cloud Computing:

 Grids often focus on sharing computing resources, while cloud computing provides on-
demand access to a pool of configurable computing resources.

3. Grid vs. Peer-to-Peer (P2P) Computing:

 Grids are centrally managed and organized, whereas P2P computing involves decentralized
and distributed sharing of resources among peers.

Grid computing has played a significant role in addressing large-scale computational challenges by
enabling collaboration and resource sharing across organizational boundaries. The architecture,
characteristics, and standards associated with grids continue to evolve as technology advances.
Unit 5
System infrastructure
System infrastructure refers to the underlying foundation of hardware, software,
networking, and other components that support the functionality and operation of a
computer system or a network. It includes both physical and virtual components that
work together to provide a computing environment. Here are key components of
system infrastructure:

Physical Infrastructure:
1. Hardware:
 Servers: Powerful computers that provide services or resources to other computers
(clients) in the network.
 Storage: Devices or systems for storing data, which can include hard drives, solid-
state drives, and network-attached storage (NAS).
 Networking Equipment: Routers, switches, and other devices that enable
communication and data transfer between different components in the network.
 Computers and End-User Devices: Desktops, laptops, tablets, and other devices
used by end-users to access and interact with the system.
2. Data Centers:
 Facilities that house and manage servers, storage, and networking equipment.
 Designed to provide a secure and controlled environment for computing resources.
3. Power and Cooling Systems:
 Infrastructure to ensure a stable power supply and effective cooling for servers and
networking equipment in data centers.
Virtual Infrastructure:
1. Virtualization:
 Server Virtualization: Enables multiple virtual servers to run on a single physical
server, optimizing resource utilization.
 Desktop Virtualization: Allows multiple virtual desktops to run on a single physical
machine, providing flexibility for end-users.
 Storage Virtualization: Abstracts physical storage resources, making them appear as
a single, centralized storage pool.
2. Cloud Infrastructure:
 Infrastructure as a Service (IaaS) provides virtualized computing resources over the
internet, including virtual machines, storage, and networking.
 Platform as a Service (PaaS) offers a platform with development tools, allowing users
to build and deploy applications without managing the underlying infrastructure.
 Software as a Service (SaaS) provides access to software applications over the
internet without the need for installation.
Operating Systems:
1. Server Operating Systems:
 Manage server hardware resources and provide a platform for running server
applications.
2. Client Operating Systems:
 Run on end-user devices and provide a user interface for interacting with applications
and accessing network resources.
3. Embedded Operating Systems:
 Operating systems designed for embedded systems, such as those in IoT devices,
routers, and appliances.
Networking Infrastructure:
1. Network Protocols:
 Standardized rules for data communication between devices on a network. Examples
include TCP/IP, HTTP, and FTP.
2. Firewalls and Security Appliances:
 Devices that control and monitor network traffic, enforcing security policies to
protect against unauthorized access and threats.
3. Load Balancers:
 Distribute network traffic across multiple servers to ensure optimal resource
utilization and prevent server overload.
4. Switches and Routers:
 Switches connect devices within a local network, while routers connect different
networks, facilitating data transfer between them.
Software Infrastructure:
1. Middleware:
 Software that connects and manages communication between different software
applications or components.
2. Databases:
 Systems for storing, organizing, and retrieving data. Examples include relational
databases (e.g., MySQL, Oracle) and NoSQL databases (e.g., MongoDB, Cassandra).
3. Web Servers:
 Software that handles HTTP requests and responses, serving web pages to users.
Examples include Apache, Nginx, and Microsoft IIS.
4. Application Servers:
 Platforms that host and execute software applications, providing services such as
transaction management and security.
Management and Monitoring:
1. System Management Tools:
 Software for configuring, monitoring, and managing hardware and software
components in a system.
2. Monitoring and Logging Tools:
 Tools that track system performance, log events, and generate alerts in case of
anomalies or issues.
3. Configuration Management:
 Systems and tools for automating the configuration and management of infrastructure
components.
A well-designed system infrastructure is critical for the reliability, performance, and scalability of
computer systems. It provides the foundation for running applications, storing and managing data,
and facilitating communication within and across networks.

Traditional Paradigms for distributed computing:


Traditional paradigms for distributed computing refer to the foundational models and approaches
that have been historically used to design and implement distributed systems. These paradigms have
played a crucial role in shaping the principles and concepts underlying distributed computing. Here
are some of the traditional paradigms:

1. Client-Server Model:

- In the client-server model, tasks are divided between client and server entities.

- Clients request services or resources from servers, which respond to these requests.

- Centralized servers manage resources, and clients are responsible for user interfaces and local
processing.

2. Peer-to-Peer (P2P) Model:

- In a peer-to-peer model, nodes (peers) participate in resource sharing and collaborative
processing.

- Peers act both as clients and servers, sharing resources (files, processing power) directly with
each other.

- P2P networks can be either decentralized or have a degree of centralization.

3. Remote Procedure Call (RPC):

- RPC is a communication paradigm that allows a program to cause a procedure to execute in
another address space (commonly on another machine).

- It abstracts the communication between distributed components, making it appear as if they
were local.

4. Message Passing:

- In message passing, processes communicate by sending and receiving messages.

- This paradigm is commonly used in parallel and distributed computing to facilitate
communication between different nodes.

5. Distributed Shared Memory (DSM):


- DSM provides a shared memory abstraction in a distributed system, allowing multiple processes
to access a common address space.

- Processes can read and write to shared memory locations, even if they are physically distributed
across different nodes.

6. Object-Oriented Paradigm:

- The object-oriented paradigm involves designing distributed systems using object-oriented
principles.

- Objects encapsulate data and behavior, and distributed objects communicate through method
invocations.

7. Tuple Spaces:

- Tuple spaces provide a shared memory abstraction in which processes can store and retrieve
tuples.

- Processes communicate by adding and removing tuples from the shared space.

8. Batch Processing:

- Batch processing involves the execution of a series of jobs without user interaction.

- Distributed batch processing systems distribute tasks across multiple nodes for parallel execution.

9. Cluster Computing:

- Cluster computing involves connecting multiple computers to work together as a single system.

- Nodes in a cluster typically share a common storage system and are used for parallel processing.

10. Grid Computing:

- Grid computing extends the concept of cluster computing to a larger scale, often involving
geographically distributed resources.

- Resources from multiple administrative domains are coordinated to work together.

11. Mobile Agent Paradigm:

- Mobile agents are autonomous software entities that can migrate between systems to perform
tasks.

- This paradigm is used for distributed problem-solving and resource management.


12. Stream Processing:

- In stream processing, data is processed as it is generated, rather than being stored and processed
later.

- This paradigm is well-suited for real-time analytics and processing continuous data streams.

These traditional paradigms have laid the groundwork for the development of more modern and
specialized approaches to distributed computing, such as cloud computing, edge computing, and
microservices architectures. Each paradigm addresses specific challenges and requirements in
distributed systems, reflecting the evolution of distributed computing over time.

Web Services:
Web services are a standardized way of integrating web-based applications over the internet. They
enable communication and data exchange between different software systems, regardless of the
programming languages, platforms, or devices they are built on. Web services use standard web
protocols and formats, such as HTTP, XML, and JSON, to provide interoperability between diverse
applications. Here are key aspects of web services:

1. Key Components:

a. SOAP (Simple Object Access Protocol):

- SOAP is a protocol for exchanging structured information in web services.

- It uses XML for message formatting and relies on HTTP, SMTP, or other protocols for message
transmission.

b. REST (Representational State Transfer):

- REST is an architectural style for designing networked applications.

- It uses standard HTTP methods (GET, POST, PUT, DELETE) and is often simpler than SOAP.

2. Service Description:

a. WSDL (Web Services Description Language):

- WSDL is an XML-based language that describes web services and their available operations.

- It provides a standardized way for clients to understand the functionality offered by a web
service.
3. Communication Protocols:

a. HTTP/HTTPS:

- Most web services communicate over HTTP or its secure variant, HTTPS.

- These protocols are widely supported, making it easy for different systems to interact.

b. MIME (Multipurpose Internet Mail Extensions):

- MIME types are used to specify the nature and format of a document or file.

- They play a role in web service communication by defining the data types being exchanged.

4. Data Formats:

a. XML (eXtensible Markup Language):

- XML is a widely used format for structuring data in web services.

- It provides a standard way to represent information that is both human-readable and machine-
readable.

b. JSON (JavaScript Object Notation):

- JSON is a lightweight data interchange format that is easy for humans to read and write.

- It has gained popularity for its simplicity and ease of use in web services.

5. Web Service Operations:

a. GET:

- Retrieves information from the server.

b. POST:

- Sends data to the server to create a new resource.

c. PUT:

- Updates an existing resource on the server.


d. DELETE:

- Removes a resource from the server.
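
As a concrete illustration of invoking a RESTful GET operation, the sketch below uses libcurl; the
endpoint URL is purely illustrative and error handling is kept minimal.

/* sketch of a RESTful GET request using libcurl (the URL is illustrative only).
   Compile with: gcc rest_get.c -o rest_get -lcurl */
#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    /* GET is libcurl's default method; POST, PUT and DELETE would be selected
       with additional curl_easy_setopt() options */
    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/api/items/42");

    CURLcode res = curl_easy_perform(curl);   /* response body goes to stdout by default */
    if (res != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));

    curl_easy_cleanup(curl);
    return 0;
}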

6. Security in Web Services:

a. SSL/TLS:

- Secure Socket Layer (SSL) and Transport Layer Security (TLS) protocols are used to secure
communication between clients and web services.

b. WS-Security:

- A set of specifications for securing SOAP messages in web services.

7. Discovery and Registration:

a. UDDI (Universal Description, Discovery, and Integration):

- UDDI is a directory service where businesses can register and discover web services.

8. Types of Web Services:

a. SOAP Web Services:

- SOAP (Simple Object Access Protocol) is a protocol for exchanging structured information in web
services.

b. RESTful Web Services:

- REST (Representational State Transfer) is an architectural style that uses standard HTTP methods
and is often simpler than SOAP.

9. Web Service Life Cycle:

a. Publishing:

- Making a web service available for others to use.


b. Discovery:

- Allowing potential users to find and learn about the web service.

c. Binding:

- The process of creating a connection between a service and a client.

d. Invocation:

- The actual use of the web service by a client.

10. Web Services in Cloud Computing:

- Web services play a crucial role in cloud computing by facilitating communication and integration
between different cloud-based applications and services.

11. Advantages:

a. Interoperability:

- Web services enable interoperability between applications developed in different languages or
running on different platforms.

b. Reusability:

- Web services are designed to be reusable, allowing multiple applications to use the same service.

c. Scalability:

- Web services can be easily scaled to accommodate an increasing number of users or clients.

d. Standardization:

- The use of standard protocols and formats makes web services a widely accepted and
standardized technology.

12. Challenges:
a. Security Concerns:

- Ensuring the secure transmission of sensitive data is a significant challenge in web services.

b. Versioning:

- Managing changes and updates to web service interfaces without disrupting existing users.

c. Latency:

- The overhead of communication and data serialization can introduce latency in web service
interactions.

Web services have become a foundational technology for modern distributed systems, providing a
standardized and interoperable way for different software applications to communicate and
collaborate over the internet.

Grid Standards:
Grid computing involves the coordinated use of distributed computing resources, and the
development of standards is crucial to ensure interoperability and seamless integration of diverse
components. Several standards have been established to facilitate the implementation and
operation of grid systems. Here are some key grid standards:

1. Open Grid Services Architecture (OGSA):

- Overview: OGSA defines a service-oriented architecture for grid computing.

- Key Features:

- Service-based model with standardized interfaces.

- Emphasizes the use of web services for communication.

- Supports the creation and management of virtual organizations.

2. Web Services Resource Framework (WSRF):

- Overview: WSRF extends web services concepts to address stateful resources in a grid
environment.

- Key Features:

- Introduces the concept of stateful resources with unique identifiers.

- Defines resource properties and lifetime management.

3. Open Grid Services Infrastructure (OGSI):


- Overview: OGSI is a precursor to OGSA and provides a set of specifications for building grid
services.

- Key Features:

- Defines interfaces and protocols for grid services.

- Specifies how grid services can be discovered and composed.

4. Grid Security Infrastructure (GSI):

- Overview: GSI addresses security concerns in grid computing.

- Key Features:

- Utilizes X.509 certificates for authentication.

- Implements secure communication through protocols like SSL/TLS.

- Supports authorization mechanisms.

5. GridFTP:

- Overview: GridFTP is an extension of the File Transfer Protocol (FTP) designed for grid
environments.

- Key Features:

- Supports high-performance, secure, and reliable file transfer in grid computing.

- Implements features like parallel data transfer and third-party transfers.

6. Job Submission Description Language (JSDL):

- Overview: JSDL is a specification for describing jobs in grid and distributed computing
environments.

- Key Features:

- Defines a standardized way to describe job requirements, execution environments, and
dependencies.

- Enables interoperability between job submission systems.

7. Common Information Model (CIM):

- Overview: CIM is a standard for describing managed elements in a network environment.

- Key Features:

- Defines a common language for expressing management information.


- Supports modeling grid resources and services.

8. Resource Management Facility (RMF):

- Overview: RMF is a standard for managing and provisioning resources in a grid environment.

- Key Features:

- Specifies interfaces for resource allocation, monitoring, and scheduling.

- Addresses resource management challenges in grid systems.

9. Grid Computing Now (GCN) Standardization Framework:

- Overview: GCN is a framework that provides guidelines and recommendations for standardizing
grid computing.

- Key Features:

- Emphasizes the importance of interoperability and integration.

- Recommends the use of existing standards and the development of new ones as needed.

10. OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA):

- Overview: While initially focused on cloud computing, TOSCA is also relevant to grid computing.

- Key Features:

- Provides a standardized way to describe the topology of services and their orchestration.

- Supports the interoperability of services across diverse environments.

11. Global Grid Forum (GGF) Standards:

- Overview: GGF (now part of the Open Grid Forum) contributed to various grid computing
standards and specifications.

- Key Features:

- GGF developed standards for grid computing in areas such as resource management, security,
and data management.

12. eXtensible Access Control Markup Language (XACML):

- Overview: XACML is an XML-based language for expressing policies and access control rules.

- Key Features:
- Defines a standardized way to manage access control in grid and distributed computing
environments.

13. Network Weather Service (NWS):

- Overview: NWS is a set of specifications for collecting and disseminating information about the
computational resources in a grid environment.

- Key Features:

- Aims to provide information on the current and predicted state of resources for better resource
management.

These standards and frameworks contribute to the development and deployment of grid computing
solutions by ensuring consistency, interoperability, and security across diverse grid environments. As
technology evolves, new standards may emerge or existing ones may be updated to address the
changing landscape of grid computing.

Case Studies of Cluster Systems:


Cluster systems, where multiple computers work together as a single system, have been
implemented across various domains to enhance performance, scalability, and reliability. Here are a
few case studies highlighting the use of cluster systems in different applications:

1. Google Cluster Architecture:

Overview:

- Google uses a massive cluster architecture to power its search engine and various other services.

- The architecture consists of commodity hardware organized into clusters managed by Google's
proprietary software.

Key Features:

- Distributed File System: Google File System (GFS) is used to store and manage large amounts of
data across the cluster.

- MapReduce: Google's MapReduce programming model allows distributed processing of large datasets (see the sketch after this list).

- Datacenter Efficiency: Clusters are distributed across multiple data centers worldwide for
redundancy and reliability.
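
The MapReduce programming model can be summarized in a few lines: a map function emits key/value pairs, the framework groups the values by key, and a reduce function combines each group. The sketch below runs the classic word-count example in a single Python process; on Google's clusters the same map and reduce functions are executed by the framework across thousands of nodes, with the input sharded and the intermediate pairs shuffled between machines.

```python
# Single-process sketch of the MapReduce programming model (word count).
# On a real cluster the framework shards the input, runs map tasks in
# parallel, shuffles pairs by key, and runs reduce tasks on each group.
from collections import defaultdict

def map_fn(document: str):
    for word in document.split():
        yield word.lower(), 1          # emit (key, value) pairs

def reduce_fn(word: str, counts: list[int]):
    return word, sum(counts)           # combine all values for one key

def mapreduce(documents):
    # "Shuffle" phase: group intermediate values by key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

docs = ["the quick brown fox", "the lazy dog", "the fox"]
print(mapreduce(docs))   # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, ...}
```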

2. High-Performance Computing (HPC) Cluster in Research:


Overview:

- Many scientific and research institutions utilize high-performance computing clusters for complex
simulations and data analysis.

- Example: The National Center for Supercomputing Applications (NCSA) cluster.

Key Features:

- Parallel Processing: Clusters are designed for parallel processing to handle computationally
intensive tasks.

- Distributed Memory: Each node has its own memory, and the Message Passing Interface (MPI) is used for communication among nodes.

- Specialized Hardware: Inclusion of accelerators like GPUs for specific workloads.

3. Amazon EC2 Cluster Instances:

Overview:

- Amazon Elastic Compute Cloud (EC2) provides resizable compute capacity in the cloud, and users
can create their own clusters.

- Used by businesses and researchers for various applications, including data processing and
analytics.

Key Features:

- Scalability: Users can dynamically scale the number of instances in a cluster based on demand.

- Preconfigured AMIs: Amazon Machine Images (AMIs) provide preconfigured cluster environments
for different applications.

- Cluster Networking: EC2 clusters can be configured to use high-performance networking for low-
latency communication.
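
As an illustration of the on-demand scaling mentioned above, the sketch below uses the AWS SDK for Python (boto3) to launch a small group of instances that could serve as the compute nodes of a cluster. The AMI ID, region, and instance type are placeholders; a real deployment would also configure key pairs, security groups, and a placement group for the low-latency cluster networking described in this section.

```python
# Sketch: launching a group of EC2 instances as cluster compute nodes with
# boto3. The AMI ID, region, and instance type are placeholders; a real
# setup would also configure security groups, key pairs, and a placement
# group for low-latency networking.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI preconfigured for the cluster
    InstanceType="c5.4xlarge",
    MinCount=4,                        # launch four compute nodes
    MaxCount=4,
)

node_ids = [i["InstanceId"] for i in response["Instances"]]
print("launched compute nodes:", node_ids)
```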

4. NASA's Pleiades Supercomputer:

Overview:

- Pleiades is one of NASA's supercomputing clusters used for advanced simulations, climate
modeling, and astrophysics research.

- It consists of thousands of nodes and is among the most powerful supercomputers.


Key Features:

- High-Performance Interconnect: InfiniBand is used for high-speed inter-node communication.

- Parallel File System: Lustre is employed for efficient parallel storage.

- Diverse Workloads: Supports a wide range of scientific applications and simulations.

5. Financial Services:

Overview:

- Financial institutions often use cluster computing for risk analysis, algorithmic trading, and other
computational finance tasks.

- Example: JPMorgan's Athena platform.

Key Features:

- Real-Time Analytics: Clusters are used for real-time analytics to inform trading decisions.

- Distributed Data Processing: Handles vast amounts of financial data for modeling and analysis.

- Scalability: Easily scales to accommodate growing computational demands.

6. Weather Forecasting:

Overview:

- Meteorological agencies use cluster systems for weather modeling and prediction.

- Example: The European Centre for Medium-Range Weather Forecasts (ECMWF).

Key Features:

- Numerical Weather Prediction (NWP): Clusters simulate atmospheric conditions using NWP
models.

- Ensemble Forecasting: Multiple simulations run concurrently to provide probabilistic weather predictions (see the sketch after this list).

- Global Collaboration: Data from multiple clusters worldwide contribute to global weather models.
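
The ensemble idea can be shown with a toy model: run the same simulation many times from slightly perturbed initial conditions and turn the spread of outcomes into a probability. The "model" below is a deliberately trivial stand-in for a real NWP model; only the ensemble bookkeeping (perturb, run, aggregate) is the point, and on a cluster each member would run on its own set of nodes.

```python
# Toy illustration of ensemble forecasting: run many simulations from
# perturbed initial conditions and report the fraction that predict rain.
# The "model" is a trivial stand-in for a real NWP model; only the
# ensemble bookkeeping (perturb, run, aggregate) is the point.
import random

def toy_model(initial_humidity: float) -> bool:
    """Stand-in for an NWP run: returns True if this run predicts rain."""
    # Pretend the model amplifies small differences in the initial state.
    return initial_humidity + random.gauss(0.0, 0.05) > 0.7

def ensemble_forecast(observed_humidity: float, members: int = 50) -> float:
    rain_count = 0
    for _ in range(members):
        # Perturb the observed initial condition for each ensemble member.
        perturbed = observed_humidity + random.gauss(0.0, 0.02)
        if toy_model(perturbed):
            rain_count += 1
    return rain_count / members

probability = ensemble_forecast(observed_humidity=0.68)
print(f"probability of rain: {probability:.0%}")
```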

7. Bioinformatics and Genomic Research:


Overview:

- Cluster systems play a vital role in bioinformatics for tasks like DNA sequencing and protein
folding simulations.

- Example: The Broad Institute's Genome Analysis Toolkit (GATK) pipeline.

Key Features:

- Parallel Processing: Clusters accelerate data analysis by distributing tasks across nodes.

- Large-Scale Data Handling: Handles vast genomic datasets efficiently.

- Customized Algorithms: Clusters may use custom algorithms tailored for specific genomics tasks.
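
A common pattern in genomics pipelines is scatter-gather: split the genome into regions (for example, one per chromosome), analyze the regions in parallel, then merge the per-region results. The sketch below shows that pattern with Python's multiprocessing on a single machine; `analyze_region` is a hypothetical placeholder for a real analysis step (such as a variant-calling stage in a GATK-style pipeline), which a cluster scheduler would fan out across nodes instead of local processes.

```python
# Scatter-gather sketch for a genomics workload: split work by chromosome,
# analyze regions in parallel, then merge the results. `analyze_region` is
# a hypothetical placeholder; on a cluster, a scheduler would run each
# region on a different node rather than a local worker process.
from multiprocessing import Pool

CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]

def analyze_region(chromosome: str) -> dict:
    """Placeholder analysis: pretend to count variants in one chromosome."""
    variant_count = len(chromosome) * 1000   # dummy computation
    return {"region": chromosome, "variants": variant_count}

def run_pipeline():
    with Pool(processes=8) as pool:               # scatter across workers
        results = pool.map(analyze_region, CHROMOSOMES)
    total = sum(r["variants"] for r in results)   # gather / merge step
    return total, results

if __name__ == "__main__":
    total_variants, per_region = run_pipeline()
    print("total (dummy) variants:", total_variants)
```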

These case studies demonstrate the versatility of cluster systems across different industries and
research domains. Whether it's powering internet-scale services, advancing scientific research, or
supporting critical business operations, clusters provide a scalable and efficient solution for
demanding computing workloads.

Beowulf:
Beowulf is a term that originally referred to an Old English epic poem but has been adopted in the
field of high-performance computing to describe a particular type of clustered computing
architecture. Beowulf clusters are designed to provide parallel processing capabilities for scientific
and engineering applications. Here are key aspects of Beowulf clusters:

1. Definition:

- A Beowulf cluster is a type of commodity-based, high-performance computing (HPC) cluster that is built using off-the-shelf hardware components, typically running on Linux.

2. Origins:

- The term "Beowulf" for computing clusters was popularized by Dr. Thomas Sterling and Dr.
Donald Becker in the 1990s. The name was chosen to represent a cluster of interconnected,
independent, and inexpensive processors, akin to the warriors in the Old English epic poem
"Beowulf."

3. Key Characteristics:

- Commodity Hardware: Beowulf clusters use standard, off-the-shelf components such as Intel x86
processors, Ethernet networking, and Linux as the operating system.

- Parallel Processing: The architecture allows multiple processors to work in parallel, dividing
computational tasks among the nodes for increased performance.

- Message Passing Interface (MPI): Beowulf clusters often utilize MPI for communication between
nodes, enabling efficient parallel processing.
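
A minimal example of MPI-style parallelism on such a cluster is sketched below using the mpi4py bindings: every process computes a partial sum and the results are combined with a reduction. It assumes mpi4py and an MPI runtime (such as Open MPI) are installed on the nodes, and the script name is a hypothetical example; it would typically be launched from the master node with something like `mpirun -np 4 python partial_sum.py`.

```python
# Minimal mpi4py sketch of parallel processing on a Beowulf-style cluster:
# each process (typically one per compute node or core) sums a slice of the
# data, and MPI combines the partial sums on rank 0 with a reduction.
# Assumes mpi4py and an MPI runtime are installed; launch with, e.g.:
#   mpirun -np 4 python partial_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id
size = comm.Get_size()      # total number of processes

N = 1_000_000
# Each rank sums its own slice of 1..N (every `size`-th integer).
local_sum = sum(range(rank + 1, N + 1, size))

# Combine the partial sums; only rank 0 receives the final result.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum of 1..{N} computed by {size} processes: {total}")
```

The same structure (distribute work by rank, compute locally, reduce the results) underlies most MPI programs run on Beowulf clusters, whatever the actual computation is.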
4. Components:

- Nodes: Individual computers or servers that make up the cluster.

- Interconnect: Networking infrastructure (commonly Ethernet) connecting the nodes for communication.

- Master Node: Coordinates and manages the tasks assigned to each node.

- Compute Nodes: Perform the actual computation, running parallel tasks.

5. Software Stack:

- Linux Operating System: Beowulf clusters typically run on a Linux distribution.

- MPI Libraries: MPI is commonly used for communication and coordination between nodes.

- Cluster Management Software: Tools for managing and scheduling tasks across the cluster.

6. Applications:

- Scientific Computing: Beowulf clusters are widely used in scientific research, simulations, and
data analysis.

- Engineering Simulations: Applications in fields like computational fluid dynamics, finite element
analysis, and structural engineering.

- Bioinformatics: Genome sequencing, molecular dynamics simulations, and other bioinformatics tasks.

7. Advantages:

- Cost-Effective: Beowulf clusters are cost-effective due to the use of commodity hardware.

- Scalability: Clusters can easily scale by adding more nodes to handle increasing computational
demands.

- Customization: Users have the flexibility to customize and configure the cluster to meet specific
requirements.

8. Challenges:

- Administration: Managing and maintaining a large number of nodes can be challenging.

- Node Heterogeneity: Ensuring consistency in performance across potentially diverse hardware configurations.

- Fault Tolerance: Addressing issues related to node failures and ensuring uninterrupted operation.
9. Evolution:

- Over time, Beowulf clusters have evolved, incorporating advancements in hardware, networking,
and software technologies.

- Modern Beowulf clusters may include accelerators like GPUs for enhanced parallel processing.

10. Examples:

- Various research institutions, universities, and organizations worldwide have deployed Beowulf
clusters for diverse scientific and computational tasks.

Beowulf clusters represent a cost-effective and scalable approach to high-performance computing, making them accessible to a wide range of research and engineering applications. While the term
"Beowulf" may not be as commonly used today, the principles and concepts it represents have
influenced the design and implementation of numerous high-performance computing clusters.

COMPaS:
NanOS:
PARAM:
