DC Unit 3

The document outlines the Faculty Orientation Program for the Final Year of AI & DS Engineering, focusing on Distributed Computing Algorithms. It discusses key concepts such as communication and coordination in distributed systems, including message passing, Remote Procedure Call (RPC), and consensus algorithms like Paxos, Raft, and ZAB. Additionally, it addresses fault tolerance and recovery mechanisms in distributed systems.

Faculty Orientation Program 2023-24 (SEM-II)

Final Year of AI & DS Engineering (2020 Course)

417531: Distributed Computing


Savitribai Phule Pune University
Ganeshkhind Rd, Ganeshkhind, Pune, Maharashtra
Presented by:

Dr. Bhandari R. R
Associate Professor & HOD AI & DS Engineering
SNJB’s Late Sau. K. B. Jain College of Engineering
Unit 3 Distributed Computing Algorithms

Communication and Coordination in Distributed System

● Communication and coordination in distributed systems are crucial aspects that determine the efficiency,
reliability, and overall performance of the system.
● Key considerations for communication and coordination in distributed systems:

1. Message Passing
2. Remote Procedure Call (RPC) and Remote Method Invocation (RMI)
3. Data Replication
4. Coordination Models
5. Distributed Transactions
6. Fault Tolerance
7. Consensus Algorithms
8. Monitoring and Logging
Message Passing

● Message passing is a flexible and scalable method for inter-node communication in distributed systems.
● It enables nodes to exchange information, coordinate activities, and share data without relying on shared
memory or direct method invocations.
● Models like synchronous and asynchronous message passing offer different synchronization and
communication semantics to suit system requirements.
● Synchronous message passing ensures sender and receiver synchronization, while asynchronous message
passing allows concurrent execution and non-blocking communication.
● Message passing has higher overhead than shared-memory communication because each message typically passes through the kernel (system calls).
● It is well suited to exchanging small amounts of data without contention.
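The two models can be sketched with an in-process mailbox, where a `queue.Queue` stands in for the network (names are illustrative, not a real transport):

```python
import queue
import threading

# A minimal sketch: an in-process mailbox models message passing.
# The sender enqueues and continues (asynchronous send); the receiver
# blocks until a message arrives (synchronous receive).
mailbox = queue.Queue()
received = []

def sender():
    mailbox.put({"type": "greeting", "body": "hello"})  # non-blocking send

def receiver():
    msg = mailbox.get(timeout=1)  # blocks until a message is available
    received.append(msg)

t_recv = threading.Thread(target=receiver)
t_send = threading.Thread(target=sender)
t_recv.start()
t_send.start()
t_send.join()
t_recv.join()
```

A fully synchronous rendezvous would additionally block the sender until the receiver has taken the message, for example with a zero-capacity channel.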
Remote Procedure Call (RPC)

● Remote Procedure Call (RPC) is a powerful technique for constructing distributed, client-server based
applications
● It is based on extending the conventional local procedure calling so that the called procedure need not
exist in the same address space as the calling procedure
● The two processes may be on the same system, or they may be on different systems with a network
connecting them.
Remote Procedure Call (RPC)

1. The calling environment is suspended, procedure parameters are transferred across the network to the environment where the procedure is to execute, and the procedure is executed there.

2. When the procedure finishes and produces its results, the results are transferred back to the calling environment, where execution resumes as if returning from a regular procedure call.
The following steps take place during an RPC:

1. A client invokes a client stub procedure, passing parameters in the usual way. The client stub resides within the client's own address space.
2. The client stub marshals (packs) the parameters into a message. Marshalling includes converting the representation of the parameters into a standard format and copying each parameter into the message.
3. The client stub passes the message to the transport layer, which sends it to the remote server machine.
4. On the server, the transport layer passes the message to a server stub, which unmarshals (unpacks) the parameters and calls the desired server routine using the regular procedure call mechanism.
5. When the server procedure completes, it returns to the server stub (e.g., via a normal procedure call return), which marshals the return values into a message. The server stub then hands the message to the transport layer.
6. The transport layer sends the result message back to the client transport layer, which hands the message back to the client stub.
7. The client stub unmarshals the return parameters, and execution returns to the caller.
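The stub and marshalling steps above are what Python's standard-library XML-RPC machinery implements for us: the client proxy marshals arguments into a message, and the server stub unmarshals them and calls the registered function. A minimal localhost sketch (illustrative only):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: register a procedure and serve in a background thread.
# Port 0 asks the OS for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy is the client stub. The call below looks like a
# local procedure call, but the work happens in the server process.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)

server.shutdown()
```

Note how the disadvantage listed below also shows up here: only marshallable values (numbers, strings, lists, dicts) can cross the boundary; pointers and open file handles cannot.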
ADVANTAGES:

1. RPC provides abstraction, i.e., the message-passing nature of network communication is hidden from the user.
2. RPC often omits many of the protocol layers to improve performance. Even a small performance improvement matters because a program may invoke RPCs often.
3. RPC enables applications to be used in a distributed environment, not only in the local environment.
4. With RPC, code re-writing/re-developing effort is minimized.
5. RPC supports both process-oriented and thread-oriented models.
Disadvantages of Remote Procedure Calls:

● In RPC, parameters are passed only by value; pointer values are not allowed, since an address is meaningless in the remote address space.
● It involves a communication system with another machine and another process, so the mechanism is prone to failure.
● The RPC concept can be implemented in a variety of ways; there is no single standard.
● Because it is interaction-based, RPC offers little flexibility with respect to hardware architecture.
● A remote procedure call costs significantly more than a local procedure call.
Remote Method Invocation

● RMI stands for Remote Method Invocation. It is a mechanism that allows an object residing in one Java Virtual Machine (JVM) to access/invoke methods of an object running in another JVM.
● RMI is used to build distributed applications; it provides remote communication between Java
programs. It is provided in the package java.rmi.

Architecture of an RMI Application


● In an RMI application, we write two programs: a server program (which resides on the server)
and a client program (which resides on the client).
● Inside the server program, a remote object is created, and a reference to that object is made
available to the client (using the registry).
● The client program requests the remote object on the server and tries to invoke its
methods.
The following diagram shows the architecture of an RMI application.
Consensus Algorithms

Consensus algorithms are fundamental to distributed computing. In a distributed system, multiple nodes communicate with
each other to perform a common task. Consensus algorithms help these nodes agree on a shared value, even when some nodes
may fail or be unreliable.

1. Paxos

● The Paxos algorithm was first described by Leslie Lamport in 1990 (published in 1998) and has since become
widely used in distributed systems.
● The algorithm works by allowing a group of nodes to agree on a single value, even if some nodes
fail or are unresponsive.
Paxos

The algorithm consists of three phases:

1. Phase 1 (Prepare Phase): A node that wants to propose a value as the agreed value sends a prepare message to all other nodes. This message contains a proposal number, which is a unique identifier for this particular proposal. Each node that receives the prepare message responds with a promise not to accept any proposals with a lower proposal number than the one in the prepare message.

2. Phase 2 (Accept Phase): Once a node receives promises from a majority of nodes, it can send an accept message to all nodes containing the proposed value and the proposal number. Each node that receives this accept message will accept the proposed value only if it has not already promised to accept a higher-numbered proposal.

3. Phase 3 (Commit Phase): Once a node receives accept messages from a majority of nodes, it can send a commit message to all nodes containing the agreed-upon value. The other nodes will then update their state with this agreed-upon value.
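The prepare/accept rules above can be sketched for a single-value Paxos acceptor. This is a simplified teaching model (class and variable names are invented, and real Paxos must also handle message loss and previously accepted values in promises):

```python
# A sketch of one Paxos acceptor's rules (hypothetical names).
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised so far
        self.accepted = None   # (number, value) last accepted, if any

    def prepare(self, n):
        # Phase 1: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, n, value):
        # Phase 2: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return "accepted"
        return "rejected"

# A proposer needs promises from a majority before sending accepts.
acceptors = [Acceptor() for _ in range(3)]
promises = [a.prepare(1) for a in acceptors]
quorum = sum(1 for p, _ in promises if p == "promise") > len(acceptors) // 2
acks = [a.accept(1, "X") for a in acceptors] if quorum else []
```

Note the asymmetry in the comparisons: `prepare` requires a strictly higher number, while `accept` allows equality, so the proposer that obtained the promises can still get its value accepted.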
Consensus Algorithms

2. Raft

● The Raft algorithm was introduced by Diego Ongaro and John Ousterhout in 2014 as an
alternative to Paxos that is easier to understand and implement.
● It follows a leader-follower architecture, where one node is designated as the leader and
the other nodes act as followers.
● The algorithm consists of three main components:


1. Leader Election: The first step in the Raft algorithm is electing a leader. Nodes in the system use a randomized timer, and the leader periodically sends out heartbeats to the other nodes. If a follower does not receive a heartbeat from the leader within a certain amount of time, it assumes that the leader has failed and initiates a new leader election.

2. Log Replication: Once a leader is elected, it can accept client requests and replicate them to the other nodes in the system. When a leader receives a client request, it appends the request to its log and sends append-entries messages to the other nodes. These messages contain information about the log entry, including the index and term number.

3. Committing Entries: Once a log entry has been replicated to a majority of nodes in the system, the leader can send a commit message to all nodes, indicating that the entry has been committed. The other nodes will then apply the entry to their state machine and respond with an acknowledgement message.
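The randomized election timeout in step 1 can be sketched as follows. This is a toy model, not a full Raft implementation: the helper name and the assumption that every node grants the first vote request are invented for illustration (the 150-300 ms range is the one suggested in the Raft paper):

```python
import random

# A sketch of Raft leader election timing (hypothetical helper).
def start_election(node_ids, seed=42):
    rng = random.Random(seed)
    # Randomized timeouts make split votes unlikely: one node almost
    # always times out first and becomes a candidate before the others.
    timeouts = {n: rng.uniform(150, 300) for n in node_ids}  # milliseconds
    candidate = min(timeouts, key=timeouts.get)
    # Simplifying assumption: every node grants the first vote request.
    votes = {n: True for n in node_ids}
    if sum(votes.values()) > len(node_ids) // 2:  # majority required
        return candidate
    return None

leader = start_election(["n1", "n2", "n3"])
```

In real Raft a node also refuses to vote if the candidate's log is behind its own, which this sketch omits.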


Consensus Algorithms

3. ZAB

The Zab (ZooKeeper Atomic Broadcast) algorithm is a consensus protocol used in distributed systems. It is specifically designed
to provide reliable and ordered message delivery among a group of nodes, ensuring consistency and fault-tolerance.
ZAB Algorithms

Analogy :

Imagine a group of friends who want to play a game and make sure that everyone gets the
same instructions in the right order. They need someone to be the leader, who will give
instructions to all the other friends. But they also want to make sure that if the leader
becomes unavailable, another friend can take over and continue giving instructions.
Consensus Algorithms: ZAB

The Zab algorithm works like this:

1. Electing the Leader: The friends need to choose a leader among themselves. They do this by raising their hands and
voting for a leader. The friend who gets the most votes becomes the leader. This leader will be responsible for sending
instructions to everyone.
2. Broadcasting Instructions: The leader starts by giving an instruction to everyone, like "Jump three times!" The
friends receive the instruction and follow it. They acknowledge that they have received the instruction.
3. Acknowledgments: Once the leader receives acknowledgments from a majority of the friends, it knows that the
instruction has been successfully delivered. This ensures that everyone has received and understood the instruction.
4. Handling Leader Failures: Sometimes the leader may become unavailable, like if they leave the game or get too
busy. In that case, the friends need to elect a new leader. They use a similar voting process to choose a new leader
among themselves. This ensures that there is always a leader to give instructions.
5. Ensuring Order: The Zab algorithm guarantees that the instructions are delivered and executed in the same order by
all friends. This means that if the leader says, "Jump three times!" and then says, "Clap your hands!" everyone will
follow the instructions in that order. This ensures consistency among all friends.
Consensus Algorithms: ZAB
1. Role of Zab in Distributed Systems: Zab is responsible for maintaining a consistent state among multiple nodes in a
distributed system. It achieves this by ensuring that all nodes receive and process messages in the same order.
2. Leader Election: Zab starts by electing a leader node among the participating nodes. The leader is responsible for
generating and broadcasting messages to all other nodes. This leader election process guarantees that there is always a
designated leader for message coordination.
3. Message Broadcasting: The leader generates messages, assigns each message a unique sequence number, and sends
them to all nodes in the system. The messages are transmitted in order, ensuring that all nodes receive them in the same
sequence.
4. Acknowledgments and Quorums: After receiving a message, each node sends an acknowledgment (ACK) back to the
leader. Once the leader receives acknowledgments from a majority of nodes, it knows that the message has been
successfully delivered and can proceed to the next message.
5. Handling Failures: Zab handles failures by electing a new leader when the current leader becomes unavailable. It uses a
variant of the Paxos algorithm to perform leader election, ensuring that a reliable leader is always present to coordinate
message delivery.
6. Atomic Broadcast Guarantee: Zab guarantees atomic broadcast, meaning that either all nodes receive a message in the
same order or none of them receive it. This ensures consistency across the distributed system and prevents
inconsistencies caused by message reordering or losses.
7. Snapshotting: In addition to message delivery, Zab also supports snapshotting, which allows a node to save and restore
its state from a particular point in time. This feature is essential for efficient recovery and replication in distributed
systems.
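The sequence-numbered broadcast and quorum acknowledgement in points 3 and 4 can be sketched like this. It is a simplified model with invented class names, not ZooKeeper's actual implementation (which also handles epochs, recovery, and persistent logs):

```python
# A sketch of Zab-style ordered broadcast (simplified, hypothetical names):
# the leader stamps each message with an increasing sequence number and
# commits it only after a quorum of the cluster acknowledges it.
class Follower:
    def __init__(self):
        self.log = []

    def deliver(self, seq, msg):
        # Reject out-of-order messages to preserve total order.
        if not self.log or seq == self.log[-1][0] + 1:
            self.log.append((seq, msg))
            return True
        return False

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.seq = 0
        self.committed = []

    def broadcast(self, msg):
        self.seq += 1
        acks = sum(f.deliver(self.seq, msg) for f in self.followers)
        # The leader counts toward the quorum (acks + 1).
        if acks + 1 > (len(self.followers) + 1) // 2:
            self.committed.append((self.seq, msg))
            return True
        return False

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.broadcast("Jump three times!")
leader.broadcast("Clap your hands!")
```

Because every follower rejects gaps in the sequence, any two followers that have both delivered a message agree on everything before it, which is the ordering guarantee described above.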
Consensus Algorithms: Proof of Work
● Proof of Work consensus is the mechanism of choice for the majority of cryptocurrencies currently in
circulation. The algorithm is used to verify transactions and create new blocks in the blockchain.
● The idea behind Proof of Work (PoW) was first published in 1993 by Cynthia Dwork and Moni Naor and was later
applied by Satoshi Nakamoto in the Bitcoin paper in 2008.
● The term "proof of work" was first used by Markus Jakobsson and Ari Juels in a publication in 1999.
● Cryptocurrencies like Litecoin and Bitcoin currently use PoW. Ethereum used the PoW mechanism but
has now shifted to Proof of Stake (PoS).
Purpose of Proof of Work
● The purpose of a consensus mechanism is to bring all the nodes into agreement, that is, to trust one another, in an
environment where the nodes don't trust each other.
● All the transactions in the new block are validated, and the new block is then added to the blockchain.
● The block is added to the chain with the greatest block height (the longest chain).
● Miners (special computers on the network) perform computational work to solve a complex mathematical
problem in order to add a block to the network, hence the name Proof of Work.
● Over time, the mathematical problem becomes more complex.
Features: Proof of Work

Features of PoW

There are mainly two features that have contributed to the wide popularity of this consensus protocol and
they are:
● It is hard to find a solution to a mathematical problem.
● It is easy to verify the correctness of that solution.
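Both features can be seen in a toy mining loop: finding a nonce takes many hash attempts, while verifying a claimed nonce takes a single hash. The difficulty here is a toy value, far below Bitcoin's real target, and the function names are illustrative:

```python
import hashlib

# Hard to solve: search for a nonce whose SHA-256 digest starts with
# `difficulty` hex zeros. There is no shortcut besides trying nonces.
def mine(block_data, difficulty=4):
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Easy to verify: one hash call checks a claimed solution.
def verify(block_data, nonce, difficulty=4):
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce, digest = mine("block-42")
```

Raising `difficulty` by one multiplies the expected search work by 16 while leaving verification cost unchanged, which is exactly the asymmetry the protocol relies on.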
How does PoW Work:

The PoW consensus algorithm involves verifying a transaction through the mining process. This section focuses on
discussing the mining process and resource consumption during the mining process.

Mining:

The Proof of Work consensus algorithm involves solving a computationally challenging puzzle in order to create new
blocks in the Bitcoin blockchain. The process is known as 'mining', and the nodes in the network that engage in
mining are known as 'miners'.
● The incentive for mining transactions lies in economic payoffs: competing miners are rewarded with
6.25 bitcoins (the block reward at the time of writing) plus a small transaction fee.
● This reward is cut in half at regular intervals (roughly every four years).
Consensus Algorithms

5. Proof-of-Stake (PoS)

● Proof of stake is a consensus mechanism used to verify new cryptocurrency transactions. Since
blockchains lack any centralized governing authorities, proof of stake is a method to guarantee
that data saved on the network is valid.

● Decentralization is at the heart of blockchain technology and cryptocurrency.

● There’s no central gatekeeper to manage a blockchain’s record of transactions and data.

● Proof of stake is the consensus mechanism that helps choose which participants get to handle this
lucrative task—lucrative because the chosen ones are rewarded with new crypto if they accurately
validate the new data and don’t cheat the system.
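Stake-weighted selection of the next validator can be sketched in a few lines. This is a simplified model (real PoS protocols add verifiable randomness, slashing penalties, and minimum-stake rules; the names and numbers here are hypothetical):

```python
import random

# A sketch of stake-weighted validator selection: the chance of being
# chosen to propose the next block is proportional to the stake held.
def choose_validator(stakes, rng):
    names = list(stakes)
    weights = [stakes[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical stakes: alice holds 60% of the total stake.
stakes = {"alice": 60, "bob": 30, "carol": 10}
rng = random.Random(7)
picks = [choose_validator(stakes, rng) for _ in range(1000)]
share = picks.count("alice") / len(picks)  # should be near 0.6
```

Over many rounds, each participant's share of blocks (and thus rewards) tracks their share of the stake, which is what makes cheating economically irrational: an attacker must first buy a majority of the stake they would then devalue.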
Consensus Algorithms

5. Proof-of-Stake (PoS)

● Solana, Terra and Cardano are among the biggest cryptocurrencies that use proof of stake.
Ethereum, the second-largest crypto by market capitalization after Bitcoin, completed its
transition from proof of work to proof of stake in September 2022 (the Merge).


Consensus Algorithms

6. Simplified Practical Byzantine Fault Tolerance (SPBFT)

● SPBFT aims to reduce communication complexity and improve performance while maintaining Byzantine fault
tolerance. It is often used in permissioned blockchain systems.
Fault Tolerance and Recovery in Distributed Systems

The ability of the system to continue operating as intended even in the event of a failure is known as fault tolerance.

Types of Faults
1. Transient Faults
2. Intermittent Faults
3. Permanent Faults

Phases of Fault Tolerance in Distributed System


Fault Tolerance and Recovery in Distributed Systems

Types of Fault Tolerance in Distributed Systems


1. Hardware Fault Tolerance:
2. Software Fault Tolerance
3. System Fault Tolerance

Recovery in Distributed Systems

Recovery in distributed systems refers to the ability of a system to recover from failures or faults and continue operating in a
consistent and reliable manner. Here are some key concepts and strategies related to recovery in distributed systems:

1. Replication & Redundancy


2. Checkpointing Mechanisms
3. Rollback recovery protocols
4. Logging and Auditing
Load Balancing and Resource Allocation Strategies:

● A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of
servers.

Load Balancing Approaches:

● Round Robin
● Least Connections
● Least Time
● Hash
● IP Hash
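Two of the listed approaches can be written as minimal functions. These are illustrative sketches only; a real load balancer also handles health checks, weights, and session affinity:

```python
import itertools

# Round Robin: hand out servers in a fixed rotation.
def round_robin(servers):
    it = itertools.cycle(servers)
    return lambda: next(it)

# Least Connections: pick the server with the fewest open connections.
def least_connections(active):  # active: server -> open connection count
    return min(active, key=active.get)

pick = round_robin(["s1", "s2", "s3"])
order = [pick() for _ in range(4)]
target = least_connections({"s1": 7, "s2": 2, "s3": 5})
```

A hash-based approach would instead map `hash(client_ip) % len(servers)` to a server, trading even distribution for the property that a given client keeps hitting the same backend.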

Classes of Load-Balancing Algorithms:


Load Balancing and Resource Allocation:

● Weighted Round Robin (WRR)
● Least Connections
● Randomized Load Balancing
● Dynamic Load Balancing
● Centralized Load Balancing
● Distributed Load Balancing
● Predictive Load Balancing
Applying AI techniques to optimize distributed computing algorithms

Artificial Intelligence (AI) techniques can be applied to optimize distributed computing algorithms in various ways, enhancing
their efficiency, adaptability, and overall performance.

● Genetic Algorithms

● Reinforcement Learning

● Machine Learning for Predictive Analytics

● Dynamic Load Balancing:

Neural Networks: Employ neural networks to learn patterns in the workload distribution and dynamically adjust the load
balancing strategy to optimize resource utilization across nodes.

● Anomaly Detection:

Utilize AI-driven anomaly detection to identify unusual behavior or potential faults in the distributed system.
Applying AI techniques to optimize distributed computing algorithms

● Deep Learning for Consensus:

Explore the use of deep learning to predict optimal consensus algorithm configurations based on historical performance
data and current system conditions.

● Energy Efficiency:

AI for Green Computing: Employ AI techniques to optimize energy consumption in distributed systems by dynamically
adjusting resource allocation and workload distribution based on energy efficiency models.


Machine Learning for Resource Allocation

Machine learning can help you learn and improve your resource allocation decisions continuously, based on the outcomes,
results, and impacts of your actions. This can help you enhance your efficiency, quality, and productivity over time, and adapt
to changing conditions and requirements.

Here are ways machine learning can be applied to optimize resource allocation:

Demand Prediction:

Regression Models: Train regression models to predict the resource requirements of applications based on historical usage patterns, workload
characteristics, and other relevant features. This enables proactive resource provisioning.
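A minimal sketch of the regression idea: fit a least-squares line to hypothetical usage history and extrapolate the next period's need. A real system would use richer features and models; all numbers here are invented:

```python
# Simple linear regression (least squares) in pure Python.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # slope and intercept

hours = [1, 2, 3, 4, 5]       # hypothetical history (hour index)
cpu = [12, 14, 16, 18, 20]    # CPU cores needed at each hour
slope, intercept = fit_line(hours, cpu)
forecast = slope * 6 + intercept  # predicted need for hour 6
```

The forecast feeds proactive provisioning: capacity for hour 6 is reserved before the demand arrives, instead of scaling reactively after queues build up.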

Classification Models:

Use classification models to determine whether auto-scaling is needed based on predicted future demand. This can automate the decision-making
process for dynamically adjusting the number of resources in response to workload changes.

Anomaly Detection:

Anomaly Detection Algorithms: Implement anomaly detection models to identify unusual behavior in resource consumption. This helps detect
abnormal resource usage patterns, indicating potential issues or changes in the application's behavior.
Machine Learning for Resource Allocation
QoS Optimization:

Reinforcement Learning: Apply reinforcement learning to optimize Quality of Service (QoS) by dynamically adjusting resource allocation
parameters. The model can learn to balance performance, cost, and other factors based on feedback from the system.

Workload Placement:

Clustering Algorithms: Use clustering algorithms to group similar workloads together. This facilitates efficient resource allocation by placing
similar workloads on the same nodes, reducing contention for resources.

Cost Prediction Models: Develop models to predict the cost associated with different resource allocation scenarios. This helps in making
informed decisions regarding resource allocation based on cost-effectiveness.

Dynamic Priority Scheduling:

Decision Trees or Neural Networks: Employ machine learning models to dynamically assign priorities to different tasks or applications. This
enables intelligent scheduling, ensuring that critical tasks receive appropriate resources.

Multi-Objective Optimization:

Evolutionary Algorithms: Use evolutionary algorithms to optimize resource allocation for multiple objectives, such as minimizing cost,
maximizing performance, and ensuring fairness among different applications.

Adaptive Configuration:

Deep Learning: Leverage deep learning models to learn optimal configurations for various applications in different contexts. This enables the
system to adaptively configure resource allocation settings based on real-time conditions.
Machine Learning for Resource Allocation

Examples of ML for resource allocation:

● Google Cloud Platform uses ML to automatically allocate resources to virtual machines.


● Amazon Web Services uses ML to optimize resource allocation in its data centers.
● Netflix uses ML to allocate bandwidth to different users based on their viewing habits.
● Uber uses ML to allocate drivers to passengers based on their location and demand.
Reinforcement Learning for Dynamic Load Balancing

Reinforcement learning (RL) can be a powerful approach for dynamic load balancing in distributed systems, allowing the system
to adapt and optimize load distribution based on real-time conditions.

State Representation

Action Space

Reward Function

RL Algorithm

Training Environment

Periodic Retraining

Integration with Monitoring
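The components above can be wired into a toy Q-learning loop: one fixed state, actions are server choices, and the reward is negative latency. The latency numbers and learning parameters are invented for illustration; a real system would have a much richer state representation:

```python
import random

# Toy environment: two servers with fixed (hypothetical) latencies.
latency = {"fast": 10, "slow": 50}

q = {a: 0.0 for a in latency}  # action-value estimates (single state)
rng = random.Random(0)
alpha, epsilon = 0.5, 0.2      # learning rate, exploration rate

for step in range(200):
    # Epsilon-greedy action selection: usually exploit, sometimes explore.
    if rng.random() < epsilon:
        action = rng.choice(list(latency))
    else:
        action = max(q, key=q.get)
    reward = -latency[action]            # lower latency -> higher reward
    q[action] += alpha * (reward - q[action])  # one-state Q update

best = max(q, key=q.get)
```

After training, the learned values approximate the negative latencies, so the greedy policy routes traffic to the faster server; in a full system the state would include current loads, and the policy would be retrained as conditions drift.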


Genetic Algorithms for Task Scheduling

Genetic Algorithms (GAs) can be effectively employed for task scheduling in distributed systems, where the goal is to assign tasks
to various processing units to optimize overall system performance. Here's how genetic algorithms can be applied to task
scheduling:

Chromosome Representation

Initial Population

Fitness Function

Selection

Crossover (Recombination)

Mutation

Replacement
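The components above, sketched end to end for a toy instance: a chromosome assigns each task to a node, and fitness is the makespan (load of the busiest node), which the GA minimizes. Task costs and GA parameters are hypothetical:

```python
import random

rng = random.Random(1)
task_cost = [4, 2, 7, 3, 5, 1]  # hypothetical task durations
n_nodes = 3

# Fitness: makespan = load of the most loaded node (lower is better).
def makespan(chrom):
    loads = [0] * n_nodes
    for task, node in enumerate(chrom):
        loads[node] += task_cost[task]
    return max(loads)

def evolve(pop_size=30, generations=60):
    # Initial population: random task-to-node assignments.
    pop = [[rng.randrange(n_nodes) for _ in task_cost]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)             # selection: keep the fittest half
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(task_cost))  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                  # mutation
                child[rng.randrange(len(child))] = rng.randrange(n_nodes)
            children.append(child)
        pop = parents + children                    # replacement
    return min(pop, key=makespan)

best = evolve()
```

For these costs (total 22 over 3 nodes) a perfectly balanced schedule has makespan 8, and the GA reliably finds a schedule at or near that optimum without enumerating all 3^6 assignments.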
Swarm Intelligence for Distributed Optimization

Swarm intelligence is a computational paradigm inspired by the collective behavior of social organisms, such as ant colonies, bird
flocks, and bee swarms. In the context of distributed optimization, swarm intelligence algorithms are used to solve complex
problems by simulating cooperation and interaction among a group of simple agents. Here are some common swarm
intelligence algorithms applied to distributed optimization:

Ant Colony Optimization (ACO):

Particle Swarm Optimization (PSO):

Bee Algorithm (BA):

Firefly Algorithm (FA):

Artificial Bee Colony (ABC):

Grey Wolf Optimizer (GWO):

Bat Algorithm (BA):
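As one concrete example, here is Particle Swarm Optimization minimizing f(x) = x^2 on a toy one-dimensional instance. The coefficients are common textbook choices, not tuned values:

```python
import random

# Minimal PSO: particles search for the minimum of f by blending their
# own best-known position with the swarm's best-known position.
def pso(f, n_particles=15, iters=80, seed=3):
    rng = random.Random(seed)
    xs = [rng.uniform(-10, 10) for _ in range(n_particles)]  # positions
    vs = [0.0] * n_particles                                 # velocities
    pbest = xs[:]               # each particle's personal best position
    gbest = min(xs, key=f)      # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            # Velocity update: inertia + cognitive pull + social pull.
            vs[i] = (0.5 * vs[i]
                     + 1.5 * rng.random() * (pbest[i] - xs[i])
                     + 1.5 * rng.random() * (gbest - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
                if f(xs[i]) < f(gbest):
                    gbest = xs[i]
    return gbest

best = pso(lambda x: x * x)
```

The same cooperate-via-shared-best pattern distributes naturally: each node runs a sub-swarm and only the global best position needs to be exchanged, which keeps communication cheap in a distributed optimization setting.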
