
CS439 - Cloud Computing

Parallel and Distributed Systems


The path to cloud computing
 Cloud computing builds on traditional parallel and
distributed systems.
 It is the result of knowledge and wisdom accumulated over ~60
years of computing.
 Cloud applications follow the client-server model
o A thin client runs on the user end.
o Computations are carried out on the cloud.
Important factors
• Concurrency
 Concurrency in cloud computing refers to the ability of a system to handle
multiple tasks or operations simultaneously. This concept is critical in cloud
environments, where resources are shared among multiple users and
processes to optimize performance, cost, and scalability.
• Checkpoint-restart mechanism
 The checkpoint-restart mechanism is a fault-tolerance technique that
periodically saves the state of an application or process, allowing it to
resume from the last saved state after a failure or interruption. This
approach is particularly valuable in distributed systems, high-performance
computing (HPC), and cloud computing, where tasks may run for long
durations and are prone to failures.
• Communication
 The communication factor in cloud computing refers to the mechanisms,
strategies, and technologies used to facilitate data exchange and
coordination between the components of a cloud environment. Effective
communication is critical for performance, reliability, scalability, and user
experience in cloud-based systems.
Parallel computing
• Parallel hardware/software systems are used to:
 Solve problems demanding resources not available on a
single system
 Reduce the time required to obtain a solution
• The speed-up S measures the effectiveness of parallelization:
S(N) = T(1) / T(N)
where
T(1) is the execution time of the sequential computation, and
T(N) is the execution time when N parallel computations are
executed.
Parallel computing
• Amdahl's Law
 Amdahl's Law gives the theoretical maximum speed-up of a task
when a portion of the task is parallelized. If α is the fraction of the
computation that cannot be parallelized, the speed-up with N
processors is
S(N) = 1 / (α + (1 - α)/N)
and the maximum speed-up, as N → ∞, is S = 1/α.
• Gustafson's Law
 Gustafson's Law addresses the limitations of Amdahl's Law by
observing that workloads can scale with the number of processors.
The scaled speed-up with N parallel processes is
S(N) = N + α(1 - N)
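The two laws above can be compared numerically. A minimal sketch (the function names are illustrative, not from the slides), taking α as the sequential fraction of the work:

```python
# Compare Amdahl's and Gustafson's speed-up predictions for a workload
# whose sequential (non-parallelizable) fraction is alpha.

def amdahl_speedup(alpha: float, n: int) -> float:
    """S(N) = 1 / (alpha + (1 - alpha)/N); as N -> infinity, S -> 1/alpha."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

def gustafson_speedup(alpha: float, n: int) -> float:
    """Scaled speed-up: S(N) = N + alpha * (1 - N)."""
    return n + alpha * (1 - n)

if __name__ == "__main__":
    alpha = 0.1  # assume 10% of the work is inherently sequential
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(alpha, n), 2),
              round(gustafson_speedup(alpha, n), 2))
```

For α = 0.1, Amdahl's speed-up saturates near 10 no matter how many processors are added, while Gustafson's scaled speed-up keeps growing with N.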
Concurrency
• Required by system and application software:
 Reactive systems respond to external events
o e.g., operating system kernels, embedded systems.
 Improve performance
o Parallel applications partition the workload and distribute it
to multiple threads running concurrently.
 Support variable load and shorten the response time of
distributed applications, like
o Transaction management systems
o Client-server applications
Concurrency
• Concurrent execution can be challenging
• It might lead to race conditions
 A race condition occurs when two or more threads or processes access
shared resources concurrently and the final outcome depends on the
sequence or timing of their execution. This can lead to errors in a system.
• Shared resources must be protected by locks, semaphores, or
monitors to ensure serial access
• Deadlocks and livelocks are possible
 A deadlock occurs when two or more processes are unable to proceed
because each is waiting for the other to release a resource.
 A livelock occurs when processes or threads are not blocked (as in a
deadlock) but are unable to make progress because they continuously
change their states in response to each other.
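The race condition on a shared counter can be sketched as follows (the names are illustrative): two threads each increment a shared counter, and a lock serializes the read-modify-write steps.

```python
# Two threads increment a shared counter. "counter += 1" is really a
# load/add/store sequence; without the lock, the threads can interleave
# those steps and lose updates. The lock enforces serial access.
import threading

COUNT = 100_000

def safe_increments() -> int:
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(COUNT):
            with lock:            # protect the shared resource
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock the result is always 200,000; removing the `with lock:` line makes the final count nondeterministic, which is exactly the race condition described above.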
Concurrency
Four Coffman conditions for a deadlock:
 Mutual exclusion
At least one resource must be non-sharable; only one
process/thread may use the resource at any given time.
 Hold and wait
A process holding at least one resource is waiting for additional
resources that are currently being held by other processes.
 No preemption
Resources cannot be forcibly taken away from a process holding
them; they must be released voluntarily by the process.
 Circular wait
A closed chain of processes exists, where each process is waiting
for a resource held by the next process in the chain.

A deadlock on a resource can arise iff all of the above conditions hold
simultaneously in the system.
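Since a deadlock needs all four conditions, breaking any one of them prevents it. A common trick, sketched below with illustrative names, breaks circular wait: every thread acquires locks in one fixed global order, so a cycle of waits cannot form.

```python
# Breaking the circular-wait condition: both threads want both locks, but
# each acquires them in the same global order (here: by object id), so no
# cycle of waits can form and neither thread can deadlock the other.
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()

def transfer(first, second, log, name):
    # Sort the locks into a fixed global order before acquiring them.
    lo, hi = sorted((first, second), key=id)
    with lo:
        with hi:
            log.append(name)

def run() -> list:
    log = []
    # t2 asks for the locks in the opposite order; the sort neutralizes it.
    t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, log, "t1"))
    t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, log, "t2"))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return log
```

Without the sort, t1 holding lock_a and t2 holding lock_b while each waits for the other would satisfy all four Coffman conditions at once.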
Monitor
A monitor provides special procedures to access the data in a critical
section.
Other Challenges
• Livelock condition
 Two or more processes/threads continually change their state in
response to changes in the other processes; then none of the
processes can complete its execution.
• Priority inversion
 Priority inversion occurs in a concurrent system when a high-priority
task waits for a resource held by a lower-priority task, but the
lower-priority task cannot release the resource because it is preempted
by a medium-priority task. This leaves the high-priority task blocked for
an extended period, violating the expected priority order.
• Discovering parallelism is often challenging
• Development of parallel algorithms requires considerable effort
 For example, solving large systems of linear equations, or systems of
PDEs (Partial Differential Equations), requires algorithms based on
domain decomposition methods.
Parallelism
• Fine-grain parallelism
 Fine-grain parallelism divides a computational task into many small
units of work that can be executed simultaneously by different
processing elements, such as CPU cores or threads; the small tasks
require frequent communication and synchronization.
• Coarse-grain parallelism
 Coarse-grain parallelism divides a program or task into relatively large,
independent units of work that can be executed simultaneously. Unlike
fine-grain parallelism, where tasks are small and require frequent
synchronization, coarse-grain parallelism involves fewer, larger tasks,
often with minimal interaction or communication between them.
• The speed-up of applications with fine-grain parallelism is considerably
lower than that of coarse-grained applications
• Data parallelism
 Data is partitioned into several blocks and the blocks are
processed in parallel.
• Same Program Multiple Data (SPMD)
 Data parallelism in which multiple copies of the same program run
concurrently, each one on a different data block.
Parallelism levels
• Bit-level parallelism
 The number of bits processed per clock cycle (often called the
word size)
 Increased from 4-bit to 8-bit, 16-bit, 32-bit, and 64-bit
• Instruction-level parallelism
 The capability of a processor to execute multiple instructions
simultaneously, present within a single thread of execution.
• Data parallelism or loop parallelism
 The program loops can be processed in parallel
• Task parallelism
 The problem can be decomposed into tasks that can be carried
out concurrently, e.g., SPMD. Note that data dependencies cause
different flows of control in individual tasks.
Distributed systems
• A collection of
 Autonomous computers
 Connected through a network
• A distributed system is a collection of independent computers that
work together as a single system to achieve a common goal.
Characteristics of distributed systems
• Users perceive the system as a single, integrated computing facility
• Components are autonomous
• Scheduling, resource management, and security policies are
implemented by each system
• There are multiple
 Points of control
 Points of failure
• Resources may not be accessible at all times
• Can be scaled by adding additional resources
• Can be designed to maintain availability even at low levels of
hardware/software/network reliability
Desirable properties of a distributed system
• Access transparency
 Local and remote resources are accessed using identical
operations
• Location transparency
 Information objects are accessed without knowing their location.
• Concurrency transparency
 Several processes run concurrently using shared information
objects without interference among them.
• Replication transparency
 The system hides the complexity of data replication from the user
and the application.
Desirable properties of a distributed system

• Failure transparency
 Concealment of faults.
• Migration transparency
 Information objects in the system are moved
without affecting the operation performed on
them.
• Performance transparency
 System can be reconfigured based on the load
and quality of service requirements.
• Scaling transparency
 System & applications can scale without
changing the system structure and without
affecting the applications.
Processes, threads, events
• Dispatchable units of work:
 A process is a program in execution
 A thread is a light-weight process
• State of a process/thread
 The information required to restart a suspended process/thread,
e.g., the program counter and the current values of the registers.
• Event
 A change of state of a process, e.g., local or communication
events
• Process group
 A collection of cooperating processes
 Processes cooperate and communicate to reach a common
goal
• Global state of a distributed system
 A distributed system consists of several processes and
communication channels
 The global state is the union of the states of the individual
processes and of the communication channels

• Message is a structured unit of information.


• Communication channel provides the means
for processes or threads to
 Communicate with one another
 Coordinate their actions by exchanging messages
 Communication is done using send(m) and receive(m)
system calls, where m is a message
• State of a communication channel
 Given two processes pi and pj, the state of the channel ξi,j,
from pi to pj consists of messages sent by pi but not yet
received by pj
• Protocol
 A finite set of messages exchanged among processes to
help them coordinate their actions
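The channel-state definition above can be sketched as a small model (the class and method names are illustrative): the state ξi,j is exactly the queue of in-flight messages.

```python
# Model of a channel from p_i to p_j: its state is the sequence of
# messages sent by p_i but not yet received by p_j.
from collections import deque

class Channel:
    def __init__(self):
        self._in_flight = deque()      # the channel state ξ(i,j)

    def send(self, m):
        self._in_flight.append(m)      # p_i executes send(m)

    def receive(self):
        return self._in_flight.popleft()  # p_j executes receive(m)

    def state(self):
        return list(self._in_flight)   # sent but not yet received
```

Sending m1 and m2 and then receiving once leaves the channel in state [m2]: m2 has been sent but not yet received.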
Events
• Space-time diagrams display local and communication events
during a process lifetime.
 Local events are shown as small black circles.
 Communication events are connected by lines from the send
event to the receive event.
 A process is in state σ1 immediately after the occurrence of its
event e1 and remains in that state until the occurrence of
event e2.
[Figure: three space-time diagrams]
a) All events in process p1 are local.
b) Two processes p1 and p2; the second event of p1 and the third
event of p2 are communication events:
o p1 sends a message to p2
o p2 receives the message sent by p1.
c) Three processes interact by means of communication events.
Global state of a process group
• The global states of a distributed computation with n processes
form an n-dimensional lattice.
• The global state of a process group is a comprehensive snapshot
of the states of all processes within the group at a specific moment
in time. Process groups are often used in distributed systems,
parallel computing, and operating systems, and their global state is
crucial for coordination, debugging, and fault tolerance.
 How many paths exist to reach a given global state?
 The more paths there are, the harder it is to identify the events
leading to a given state
 Debugging is quite difficult if there is a large number of paths
Global state of a process group
• In the case of 2 processes, the number of paths from the initial
state Σ(0,0) to the state Σ(m,n) is
N(m,n) = (m + n)! / (m! n!)
since each path is an interleaving of the m events of p1 with the
n events of p2.
• In the 2-dimensional case, the global state Σ(m,n) can only be
reached from two states, Σ(m-1,n) and Σ(m,n-1)
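The path count is the binomial coefficient C(m+n, m), which can be checked directly (the function name is illustrative):

```python
# Number of paths from Σ(0,0) to Σ(m,n) in the 2-process lattice: choose
# which m of the m+n event slots belong to p1.
from math import comb

def path_count(m: int, n: int) -> int:
    return comb(m + n, m)
```

For example, path_count(2, 2) is 6, matching the six possible event sequences leading to the state Σ(2,2) in the figure below; the count grows very fast, which is why debugging by enumerating paths becomes impractical.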
[Figure]
(a) The lattice of global states of 2 processes, from Σ(0,0) down to
Σ(m,n), with the space-time diagram showing only the first 2 events
per process.
(b) The 6 possible sequences of events leading to the state Σ(2,2).

Process Coordination - Communication protocols
• A major challenge is to guarantee that two processes will reach
an agreement in case of channel failures
• Communication protocols ensure process coordination by
implementing:
 Error control mechanisms
o Using error detection and error correction codes.
 Flow control
o Provides feedback from the receiver; it forces the sender to
transmit only the amount of data the receiver can handle.
 Congestion control
o Ensures that the offered load of the network does not exceed
the network capacity.
Process Coordination - Time & time intervals
• Process coordination requires:
 A global concept of time shared by cooperating entities.
 The measurement of time intervals, the time elapsed
between two events.
• Two events in the global history may be unrelated
 Neither one is the cause of the other
 Such events are said to be concurrent
• Local timers provide relative time measurements
 An isolated system can be characterized by its history, i.e.,
a sequence of events
• Global agreement on time is necessary to trigger actions that
should occur concurrently
• Timestamps are often used for event ordering
 Using a global time base constructed on local virtual clocks
Logical clocks
• Logical clock (LC)
 An abstraction necessary to ensure the clock condition in the
absence of a global clock
• A process maps events to positive integers.
 LC(e) is the local variable associated with event e.
• Each process time-stamps the message m it sends with the
value of the logical clock at the time of sending: TS(m) = LC.
• The rules to update the logical clock:
 LC(e) = LC + 1 if e is a local event or a send event
 LC(e) = max(LC, TS(m)) + 1 if e is the receive event of
message m
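The update rules above can be sketched as a small class (the class name and API are illustrative):

```python
# Minimal Lamport logical clock. Local and send events tick the clock;
# a receive advances it past both the local clock and the message
# timestamp, so causally related events get increasing timestamps.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self) -> int:
        self.time += 1                       # LC = LC + 1
        return self.time

    def send(self) -> int:
        self.time += 1                       # a send is also an event
        return self.time                     # timestamp TS(m) carried by m

    def receive(self, ts: int) -> int:
        self.time = max(self.time, ts) + 1   # LC = max(LC, TS(m)) + 1
        return self.time
```

For example, if p1 performs a local event and then sends m with TS(m) = 2, a fresh p2 that receives m jumps directly to 3, skipping the values it never used.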
Logical Clocks
[Figure: three processes time-stamp their events with logical clocks]
 p1 events carry the values 1, 2, 3, 4, 5, 12; p2 events carry
1, 2, 6, 7, 8, 9; p3 events carry 1, 2, 3, 10, 11. Messages
m1 ... m5 carry the timestamps of their send events, forcing
each receiver to advance its clock past the sender's value.
Message delivery rules; causal delivery
• A real-life network might reorder messages.
• First-In-First-Out (FIFO) delivery
 Messages are delivered in the same order they are sent.
• Causal delivery
 An extension of FIFO delivery
 Used when a process receives messages from different sources.
• A communication channel typically does not guarantee FIFO
delivery
 However, FIFO delivery can be enforced by attaching a sequence
number to each message sent
 The sequence numbers are also used to reassemble messages
out of individual packets.
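Enforcing FIFO delivery with sequence numbers can be sketched as follows (the class name is illustrative): out-of-order packets are buffered until the next expected sequence number arrives.

```python
# FIFO delivery over a reordering channel: each message carries a
# sequence number; the receiver buffers gaps and delivers in order.
class FifoReceiver:
    def __init__(self):
        self.expected = 0      # next sequence number to deliver
        self.buffer = {}       # out-of-order packets, keyed by seq

    def on_packet(self, seq: int, payload) -> list:
        """Return the messages that become deliverable, in send order."""
        self.buffer[seq] = payload
        delivered = []
        while self.expected in self.buffer:
            delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return delivered
```

If packet 1 arrives before packet 0, it is held back; the arrival of packet 0 then releases both in the order they were sent.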
Atomic actions
• Parallel and distributed applications must take special precautions
for handling shared resources
• Atomic operation
 A multi-step operation should be allowed to proceed to completion
without any interruption and should not expose the state of the
system until the action is completed
 Hiding the internal state of an atomic action reduces the number of
states a system can be in
 Hence, it simplifies the design and maintenance of the system.
• Atomicity requires hardware support:
 Test-and-Set
o An instruction that writes to a memory location and returns the old
content of that memory cell, non-interruptibly.
 Compare-and-Swap
o An instruction that compares the contents of a memory location to a
given value and, only if the two values are the same, modifies the
contents of that memory location to a given new value.
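The Compare-and-Swap semantics can be sketched as follows. In hardware this is one non-interruptible instruction; the lock here only emulates that atomicity, and the class name is illustrative:

```python
# Compare-and-swap semantics: atomically compare the cell's contents to
# an expected value and, only on a match, install the new value. The
# returned old value tells the caller whether the swap succeeded.
import threading

class AtomicCell:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        """Return the old value; the swap happened iff old == expected."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old
```

A caller that gets back a value different from `expected` knows some other thread got there first and can retry, which is the basis of lock-free update loops.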
Atomicity
• Before-or-after atomicity
 The effect of multiple actions is as if these actions had occurred
one after another, in some order.
• A systematic approach to atomicity must address several delicate
questions:
 How to guarantee that only one atomic action has access to a
shared resource at any given time?
 How to return to the original state of the system when an atomic
action fails to complete?
 How to ensure that the order of several atomic actions leads to
consistent results?
All-or-nothing atomicity
• A transaction is either carried out successfully, or
the record targeted by the transaction is returned
to its original state.
• Two phases:
 Pre-commit phase
o During this phase it should be possible to back up from it without
leaving any trace.
o Commit point - the transition from the first to the second phase.
o During the pre-commit phase all steps necessary to prepare the
post-commit phase must be carried out, e.g. check permissions,
swap in main memory all pages that may be needed, mount
removable media, and allocate stack space
o During this phase no results should be exposed and no actions that
are irreversible should be carried out
 Post-commit phase
o Should be able to run to completion
o Shared resources allocated during the pre-commit cannot be
released until after the commit point.
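The two-phase structure can be sketched as follows (the names are illustrative, and a real system would use journal storage rather than an in-memory copy): the pre-commit phase works on a shadow copy and exposes nothing; the commit point atomically installs the new state.

```python
# All-or-nothing sketch: run every step of a transaction against a shadow
# copy of the record (pre-commit phase); if any step fails, the original
# state is returned untouched. Returning the shadow copy is the commit.
def run_transaction(record: dict, steps) -> dict:
    working = dict(record)        # pre-commit: no visible side effects
    try:
        for step in steps:
            step(working)         # may raise at any point
    except Exception:
        return record             # back out without leaving any trace
    return working                # commit point: expose the new state
```

A step that raises after partially mutating the shadow copy still leaves the committed state unchanged, which is exactly the all-or-nothing property.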
Storage Models
• Cell storage does not support all-or-nothing actions
 When we maintain version histories it is possible to restore the
original content
 However, we need to encapsulate the data access and provide
mechanisms to implement the two phases of an atomic
all-or-nothing action
• Journal storage does precisely that.
Consensus protocols
• Consensus
 The process of agreeing on one of several alternatives proposed
by a number of agents.
• Consensus service
 A set of n processes
 Clients send requests, propose a value, and wait for a response
 The goal is to get the set of processes to reach consensus on a
single proposed value.
Consensus protocols
• Consensus is a fundamental concept in distributed systems:
multiple nodes (or processes) must agree on a single data value
or decision to ensure consistency across the system. Achieving
consensus is critical for maintaining reliability.
• Consensus protocol assumptions:
 Processors:
1. Operate at arbitrary speeds
2. Have stable storage and may rejoin the protocol after a
failure
3. Send messages to one another.
 Network:
1. May lose, reorder, or duplicate messages
2. Messages are sent asynchronously
3. Messages may take arbitrarily long to reach the
destination.
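A core idea used by real consensus protocols (this is a toy illustration, not Paxos or Raft) is the majority quorum: a value is chosen only once a majority of the n processes accept it, and since any two majorities intersect, two different values can never both be chosen.

```python
# Toy majority-quorum rule: a value is "chosen" iff a majority of the n
# processes have accepted it. Any two majorities of n share a process,
# so at most one value can ever satisfy this predicate.
def majority(n: int) -> int:
    return n // 2 + 1

def chosen(votes: dict, n: int):
    """votes maps process name -> accepted value; return the chosen
    value if some value has a majority among the n processes, else None."""
    counts = {}
    for v in votes.values():
        counts[v] = counts.get(v, 0) + 1
    for value, count in counts.items():
        if count >= majority(n):
            return value
    return None
```

With n = 5, three accepts suffice; two accepts decide nothing, which is how the protocols tolerate lost messages and slow processors without sacrificing safety.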
Client-Server Paradigm
• This paradigm is based on enforced modularity
 Modules are forced to interact only by sending and receiving
messages.
• A more robust design
 Clients and servers are independent modules and may fail
separately.
• Servers are stateless
 They may fail and then come up without the clients being
affected, or even noticing the failure of the server.
• An attack is less likely
 It is difficult for an intruder to guess the
o Format of the messages
o Sequence numbers of the segments, when messages are
transported by TCP
Services
a) Email service
 Sender and receiver communicate asynchronously using
inboxes and outboxes
 Mail demons run at each site.
b) Event service
 An event service is a middleware architecture or platform that
facilitates the asynchronous communication of events between
producers (senders) and consumers (receivers) in a distributed
system. It acts as an intermediary to decouple components,
enabling scalability, flexibility, and coordination among system
participants.
WWW
[Figure: timeline of the messages exchanged between a browser and
a Web server]
• 3-way handshake
 The first 3 messages exchanged between the client and the
server (SYN, SYN, ACK + HTTP request) establish the TCP
connection.
• Once the TCP connection is established, the HTTP server takes
its time to construct the page to respond to the first request; the
Web page is created on the fly.
• To satisfy the second HTTP request, the HTTP server must
retrieve an image from the disk.
• The response time includes
 The Round Trip Time (RTT)
 The server residence time
 The data transmission time
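The three response-time components can be combined in a back-of-the-envelope model (the function name and the numbers in the example are illustrative):

```python
# Response time = RTT + server residence time + data transmission time.
# rtt and residence are in seconds; bandwidth is in bits per second.
def response_time(rtt: float, residence: float,
                  data_bytes: int, bandwidth_bps: float) -> float:
    transmission = (data_bytes * 8) / bandwidth_bps
    return rtt + residence + transmission
```

For instance, a 50 ms RTT, 20 ms of server residence time, and a 125 KB page over a 10 Mbps link (100 ms of transmission) give a 170 ms response; for small pages the RTT and residence time dominate, which is why the handshake and on-the-fly page construction in the figure matter.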
HTTP Communication
[Figure: three browser-server configurations, all using TCP port 80]
A Web client can:
(a) communicate directly with the HTTP server
(b) communicate through a proxy, which forwards the request to the
server and relays the response back to the client
(c) use tunneling to cross the network.