Merged_document
Hw/Sw Clocks
• Physical clocks in computers are realized as crystal
oscillation counters at the hardware level
– Correspond to counter register H(t)
– Used to generate interrupts
• Usually scaled to approximate physical time t, yielding
software clock C(t), C(t) = αH(t) + β
– C(t) measures time relative to some reference event, e.g., a 64-bit counter of the number of nanoseconds since the last boot
– Simplification: C(t) carries an approximation of real time
– Ideally, C(t) = t (never 100% achieved)
– Note: Two consecutive clock queries return different values only if the clock resolution (the period between clock updates) is sufficiently smaller than the processor cycle time
Problems with H/W or S/W Clock
• Drift: Difference in the rate at which two clocks count the time
– Due to physical differences in crystals, plus heat, humidity, voltage, etc.
– Accumulated drift can lead to significant skew
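To make C(t) = αH(t) + β and drift concrete, here is a minimal sketch. It assumes a nominal crystal frequency of 32,768 Hz and a crystal that runs 1 ppm (10^-6) fast; both numbers are illustrative and not taken from the slides. With α = 1/32,768 and β = 0, a perfect counter maps exactly to real time, while the fast one accumulates skew.

```python
# Illustrative model of H(t), C(t) = alpha*H(t) + beta, and drift.
NOMINAL_HZ = 32_768  # assumed nominal crystal frequency (illustrative value)


def hardware_counter(real_seconds: float, actual_hz: float) -> int:
    """H(t): number of oscillations counted after `real_seconds` of real time."""
    return int(real_seconds * actual_hz)


def software_clock(h: int, alpha: float = 1 / NOMINAL_HZ, beta: float = 0.0) -> float:
    """C(t) = alpha * H(t) + beta, in seconds since the reference event."""
    return alpha * h + beta


def skew_after(real_seconds: float, drift_rate: float) -> float:
    """Skew between a perfect clock and one whose crystal runs fast by `drift_rate`."""
    perfect = software_clock(hardware_counter(real_seconds, NOMINAL_HZ))
    fast = software_clock(hardware_counter(real_seconds, NOMINAL_HZ * (1 + drift_rate)))
    return fast - perfect


if __name__ == "__main__":
    one_day = 86_400
    print(f"skew after one day at 1 ppm: {skew_after(one_day, 1e-6):.4f} s")
```

A drift rate of 10^-6 looks negligible, but it accumulates to roughly 0.086 s per day, which is why clocks must be resynchronized periodically.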
Challenges
• Two clocks do not agree perfectly
• Skew: The time difference between two clocks
• Quartz oscillators vibrate at different rates
• Drift: The difference in rates of two clocks
• If we had two perfect clocks
– Skew = 0
– Drift = 0
Clock Skew
• Suppose we detect that a clock has a skew
• E.g., it is 5 seconds behind or 5 seconds ahead
• What can we do?
Clock Skew: Impacts & Solutions
• Suppose we detect that a clock has a skew
• E.g., it is 5 seconds behind: we can advance it by 5 seconds to correct it, or run it faster until it catches up
• Or it is 5 seconds ahead: pushing it back 5 seconds is a bad idea (time would run backwards); instead, run it slower until real time catches up
• This does not guarantee a correct clock in the future
– Need to check and adjust periodically
• Problems due to skew:
– A message appears to be received before it was sent
– A document appears to be closed before it was saved, etc.
– We want monotonicity: time always increases
How do clocks synchronize?
Obtain the time from a time server: the client sends a request for the time to the server, and the server replies with its current time (e.g., Time: 00:05:20).
Causes of Inaccurate Time
• Delays in message transmission
Clock Inaccuracies
• Clock inaccuracies cause problems when solving tasks in distributed systems.
• The clocks of different nodes need to be synchronized to limit errors.
• Synchronized clocks are needed for efficient communication and resource sharing.
• Clocks need to be monitored and adjusted continuously; otherwise, they drift apart.
• Similarly, clock skew introduces a mismatch between the time values of two clocks.
• Both drift and skew must be addressed to use the features of distributed systems efficiently.
Solutions
Synchronization using a central server is trivial: the server dictates the system time. (Drawback: the server is a single point of failure.)
Cristian's algorithm
[Figure: the client sends a request at time T0 and receives the server's reply at time T1]
Cristian's algorithm
Cristian's algorithm works between a process P and a time server S connected to a time reference source.
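A minimal sketch of the client side of Cristian's algorithm, assuming P sends the request at local time T0, receives the server's reading at local time T1, and that the reply took about half the round trip. Times are in integer milliseconds for clarity; `min_delay_ms` is a hypothetical parameter for a known minimum one-way delay, used in the usual ±(RTT/2 − min) accuracy bound.

```python
def cristian_sync(t0_ms: int, server_time_ms: int, t1_ms: int, min_delay_ms: int = 0):
    """Estimate the client's time when the server's reply arrives.

    t0_ms: client clock when the request was sent
    server_time_ms: time reported by the server S in its reply
    t1_ms: client clock when the reply arrived
    min_delay_ms: assumed minimum one-way transmission delay (hypothetical)
    Returns (estimated_time, error_bound).
    """
    rtt = t1_ms - t0_ms
    # Assume the reply spent half of the round trip in transit.
    estimate = server_time_ms + rtt / 2
    # The true time lies within estimate +/- error_bound.
    error_bound = rtt / 2 - min_delay_ms
    return estimate, error_bound


if __name__ == "__main__":
    print(cristian_sync(1000, 5000, 1040))
```

A shorter round trip gives a tighter bound, so a client may repeat the request and keep the reply with the smallest RTT.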
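In the Berkeley algorithm, the time daemon polls the other nodes. Based on their answers, it computes an average time and tells all the other nodes to advance their clocks to the new time or slow their clocks down until the specified reduction has been achieved.
[Figure: the time daemon and nodes P1 (clock 2:50) and P2 (clock 3:25) connected over a network]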
Example
The nodes answer with the difference between their time and the time at the time daemon (i.e., -10 and +25 minutes).
[Figure: time daemon at 3:00; P1 at 2:50; P2 at 3:25]
Example
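The time daemon computes the average time of all the nodes, including itself: (3:00 + 2:50 + 3:25)/3 = 9:15/3 = 3:05. Equivalently, the average offset is (0 - 10 + 25)/3 = +5 minutes, so the daemon adjusts its clock by +5, P1 by +15, and P2 by -20 minutes.
[Figure: after adjustment, the time daemon, P1, and P2 all read 3:05]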
The leader polls the followers who reply with their time in a similar
way to Cristian's algorithm.
The leader observes the round-trip time (RTT) of the messages and
estimates the time of each follower and its own.
The leader then averages the clock times, ignoring any values it
receives far outside the values of the others.
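The averaging step can be sketched as follows. This is a simplification under stated assumptions: readings are taken as already corrected for round-trip time, times are in minutes, and `max_offset` is a hypothetical threshold for "far outside the values of the others" (measured here against the daemon's own clock). Each node is told how much to adjust rather than the absolute time.

```python
def hhmm_to_minutes(text: str) -> int:
    """Convert an 'h:mm' clock reading into minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def berkeley_adjustments(daemon_time: int, node_times: dict, max_offset: int) -> dict:
    """Return the correction each clock should apply (Berkeley-style averaging).

    daemon_time: the time daemon's own clock reading
    node_times: {name: clock reading} reported by the polled nodes
    max_offset: hypothetical threshold; readings further than this from the
                daemon are treated as faulty and left out of the average
    """
    offsets = {"daemon": 0}
    for name, t in node_times.items():
        offsets[name] = t - daemon_time
    trusted = [o for o in offsets.values() if abs(o) <= max_offset]
    average = sum(trusted) / len(trusted)
    # Each clock moves by (average - its own offset); negative means "slow down".
    return {name: average - o for name, o in offsets.items()}


if __name__ == "__main__":
    nodes = {"P1": hhmm_to_minutes("2:50"), "P2": hhmm_to_minutes("3:25")}
    print(berkeley_adjustments(hhmm_to_minutes("3:00"), nodes, max_offset=60))
```

With the example's readings this yields +5, +15, and -20 minutes, so every clock ends up at 3:05.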
If a node's clock is ahead, setting it back directly would make its time run backwards. A simple solution is to halt the clock for the duration specified by the leader, but this simplistic solution can also cause problems, although they are less severe. For minor corrections, most systems slow the clock (known as "clock slew"), applying the correction over a longer period of time.
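A minimal sketch of clock slew, assuming a software clock that may use at most a fixed fraction of each tick (`max_slew`, a hypothetical parameter) for correction. Because that fraction stays below 1, the clock always moves forward, even while absorbing a negative correction, so readings remain monotonic.

```python
class SlewingClock:
    """Software clock that applies corrections gradually ("clock slew")."""

    def __init__(self, start: float, max_slew: float = 0.1):
        # max_slew < 1 guarantees every tick still advances the clock.
        if not 0 <= max_slew < 1:
            raise ValueError("max_slew must be in [0, 1)")
        self.reading = start
        self.pending = 0.0  # correction (seconds) not yet applied
        self.max_slew = max_slew

    def adjust(self, delta: float) -> None:
        """Request a correction: positive if the clock is behind, negative if ahead."""
        self.pending += delta

    def tick(self, dt: float) -> float:
        """Advance by dt of real time, folding in part of the pending correction."""
        limit = self.max_slew * dt
        step = max(-limit, min(limit, self.pending))
        self.reading += dt + step
        self.pending -= step
        return self.reading


if __name__ == "__main__":
    clock = SlewingClock(100.0)
    clock.adjust(-5.0)  # the clock is 5 seconds ahead
    print([round(clock.tick(1.0), 1) for _ in range(3)])
```

A clock 5 seconds ahead runs at 90% speed for 50 seconds instead of jumping back, so no event is ever stamped earlier than one before it.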
An Event ?
Three Conditions proposed by Lamport:
(1) If a → b (happened-before relation), then C(a) < C(b); i.e., event a is always earlier than b.
The system clock C is a function from events to the integers, where C(e) = Ci(e) if e is an event in process Pi.
Causal Functionality:
Given two events (e1, e2) where one is caused by the other (e1 contributes to e2 occurring), the timestamp of the causing event (e1) is less than that of the other event (e2).
Implementation Rules
To provide this functionality, any logical clock must implement two rules:
Rule 1: determines how a local process updates its own clock when an event occurs.
Before executing an event (excluding the event of receiving a message), increment the local clock by 1:
local_clock = local_clock + 1
Rule 2: determines how a local process updates its own clock when it receives a message from another process. This can be described as how the process brings its local clock in line with information about the global time.
When receiving a message (the message must include the sender's local clock value), set your local clock to the maximum of the received clock value and the local clock value. After this, increment your local clock by 1:
1. local_clock = max(local_clock, received_clock)
2. local_clock = local_clock + 1
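The two rules can be sketched as a small class. The names `LamportClock`, `local_event`, `send`, and `receive` are illustrative. Sends are treated as ordinary events under Rule 1, and the timestamp they return is the value carried in the message.

```python
class LamportClock:
    """Scalar logical clock following Rule 1 and Rule 2."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        # Rule 1: increment the local clock before executing an event.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned timestamp travels with the message.
        return self.local_event()

    def receive(self, received_clock: int) -> int:
        # Rule 2: take the maximum of the local and received values, then increment.
        self.time = max(self.time, received_clock)
        self.time += 1
        return self.time


if __name__ == "__main__":
    p1 = LamportClock()
    print(p1.local_event(), p1.local_event(), p1.receive(6), p1.local_event())
```

Two local events followed by receipt of a message stamped 6 and one more local event give the values 1 2 7 8, the same sequence listed for P1 in the later example. Which message P1 actually receives there is not stated, so the value 6 is only an assumption.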
Let
Pi be process i
a, b, c, … be events in processes
Ci(a) be the timestamp of event a in process Pi
IR denote an implementation rule
Clock Condition: the correctness criterion used to evaluate logical clocks: for any events a and b, if a → b then C(a) < C(b).
[Figure: example with events a, b, c, …, m; receive events are timestamped as, e.g., (2+1) and (6+1)]
Another Example
P1 events:     e11  e12  e13  e14
Clock values:  1    2    7    8
P2 events:     e21  e22  e23  e24  e25
Clock values:  1    2    3    5    6
Limitations of Lamport's Clock (Scalar Clock)
W.R.T. implementation rules 1 and 2:
[IR1]: If a → b then Ci(a) < Ci(b): True
[IR2]: If Ci(a) < Ci(b) then a → b: may or may not hold
Limitation: it is difficult to tell whether the clock value of e11 < the clock value of e31 or not.
This is called partial ordering of events (it cannot resolve clock issues with the same counter values).
[Figure: space-time diagram; P1 events e11, e12 with clock values (1), (2); with no rule applied, events on different processes can both have the value 1; annotations: "works globally & communicates", "causal dependency"]
Example: multiply the clock value of Pi by 10 and add i, so that the values differ from each other:
P2 events:          e21  e22  e23  e24  e25
Clock value*10+2:   12   22   32   52   62
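A minimal sketch of breaking ties to obtain a total order, assuming small integer process IDs. `scaled_timestamp` reproduces the slide's "clock × 10 + i" trick, which stays unambiguous only while every ID is below 10; `total_order_key` is the general form, comparing the Lamport clock first and the process ID second. Both names are illustrative.

```python
def scaled_timestamp(clock: int, pid: int, base: int = 10) -> int:
    """The slide's trick: clock * base + pid (only unambiguous while pid < base)."""
    if not 0 <= pid < base:
        raise ValueError("pid must be smaller than base")
    return clock * base + pid


def total_order_key(clock: int, pid: int) -> tuple:
    """General form: compare clocks first, then break ties by process id."""
    return (clock, pid)


if __name__ == "__main__":
    print([scaled_timestamp(c, 2) for c in [1, 2, 3, 5, 6]])
    events = [("e21", 1, 2), ("e11", 1, 1), ("e12", 2, 1)]
    print([name for name, c, p in sorted(events, key=lambda e: total_order_key(e[1], e[2]))])
```

Events e11 and e21 share the clock value 1, so they are ordered by process ID. This gives a consistent total order, but it still does not reveal whether the two events are causally related.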
Vector Clocks
Given a process (Pi) with a vector (v), Vector Clocks implement the Logical
Clock rules as follows:
Rule 1: Before executing an event, increment the element in the vector that refers to Node(i)’s local clock:
local_vector[i] = local_vector[i] + 1
Rule 2: When receiving a message (the message must include the sender's vector), loop through each element in the received vector and compare it to the local vector, updating each local element to the maximum of the local and received clock values.
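A minimal sketch of a vector clock for n processes, where process i owns index i. After Rule 2's element-wise maximum, it applies Rule 1 to the receive event itself. This is an assumption not spelled out in the rule, but it matches the clock values in the example that follows. `happened_before` and `concurrent` show what scalar clocks cannot provide: detecting concurrency.

```python
from typing import List, Sequence, Tuple


class VectorClock:
    """Vector clock for process i out of n processes."""

    def __init__(self, n: int, i: int) -> None:
        self.v: List[int] = [0] * n
        self.i = i

    def local_event(self) -> Tuple[int, ...]:
        # Rule 1: increment this process's own entry.
        self.v[self.i] += 1
        return tuple(self.v)

    def send(self) -> Tuple[int, ...]:
        # A send is an event; the returned vector travels with the message.
        return self.local_event()

    def receive(self, received: Sequence[int]) -> Tuple[int, ...]:
        # Rule 2: element-wise maximum of local and received vectors ...
        self.v = [max(a, b) for a, b in zip(self.v, received)]
        # ... then Rule 1 for the receive event itself (assumed; matches the figure).
        return self.local_event()


def happened_before(a: Sequence[int], b: Sequence[int]) -> bool:
    """a -> b iff every entry of a is <= the entry in b and the vectors differ."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def concurrent(a: Sequence[int], b: Sequence[int]) -> bool:
    """Neither event happened before the other."""
    return not happened_before(a, b) and not happened_before(b, a) and tuple(a) != tuple(b)
```

Using P2 (index 1), `local_event()` gives (0 1 0). Receiving (0 0 1) gives (0 2 1), receiving (2 0 0) gives (2 3 1), and one more local event gives (2 4 1), matching the second variant below. The two incoming messages themselves, stamped (2 0 0) and (0 0 1), are concurrent.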
[Figure: vector clock example on a time axis]
P2 events:     e21      e22      e23      e24
Clock values:  (0 1 0)  (2 2 0)  (2 3 1)  (2 4 1)
P3 events:     e31      e32
Clock values:  (0 0 1)  (0 0 2)
[Figure: a second variant of the example]
P2 events:     e21      e22      e23      e24
Clock values:  (0 1 0)  (0 2 1)  (2 3 1)  (2 4 1)
LEADER ELECTION
RDBMS:
RDBMSs rely on leader election to pick a leader database, which handles all writes and, sometimes, all reads, where election may be
Election Algorithms
Many distributed algorithms need one process to act as a
coordinator for coordinating all the activities in a distributed system
Examples:
(1) Take over the role of a failed process (Fault Tolerance)
Safety condition: only a single node enters the elected state and eventually becomes the leader.
Validity of LEA
A leader election algorithm (LEA) is valid if it meets the following conditions:
Termination: the algorithm should finish
within a finite time once the leader is
selected. In randomized approaches this
condition is sometimes weakened (for
example, requiring termination with
probability 1).
b. Ring Algorithm
Modified Ring Algorithm
Bully Algorithm: Basic Assumptions
1. The system is synchronous
Types of Messages:
Coordinator Message: to announce the victory of the election
Election Message: to initiate the election process
Alive Message: to indicate that the sending process is alive
Bully Algorithm
When a process P recovers from failure, or the failure detector indicates that the
current coordinator has failed, P performs the following actions:
Step 1: If P has the highest process ID, it sends a Victory message to all other processes and becomes the new Coordinator. Otherwise, P broadcasts an Election message to all other processes with higher process IDs than itself.
Step 2: If P receives no Answer after sending an Election message, it broadcasts a Victory message to all other processes and becomes the Coordinator.
Step 3: If P receives an Answer from a process with a higher ID, it sends no further messages for this election and waits for a Victory message. (If there is no Victory message after a period of time, it restarts the process at the beginning.)
Step 3: Process P sends an election message to every process with a higher priority number.
Step 4: It waits for responses; if no one responds within time interval T, process P elects itself as the coordinator.
Step 5: It then sends a message to all lower-priority processes announcing that it has been elected as their new coordinator.
If a higher-priority process Q does respond, process P instead waits for another time interval T to receive a message from Q announcing that Q has been elected as coordinator.
Example Bully Algorithm
[Figure: processes N3, N5, N6, N12, N32, and N80; one process has failed, and the process that detected the failure sends ELECT messages]
N32 knows the IDs of all other processes (every process knows the IDs of all other processes).
Best case
The process with the second-highest ID detects the leader's failure
It sends (N-2) coordinator messages
Completion time: 1 message transmission time
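A minimal, synchronous simulation of the bully algorithm, assuming every process knows all IDs and that a crashed process simply never answers (standing in for the timeout T). Each process that answers takes over and runs its own election. The process with no live higher-ID process wins and sends coordinator messages to the lower-ID live processes. `bully_election` and the message tuples are illustrative names.

```python
from collections import deque


def bully_election(initiator, all_ids, alive):
    """Simulate one bully election; returns (coordinator, message log).

    Assumes the initiator is alive and a crashed process never answers.
    Log entries are (message_type, sender, receiver).
    """
    log = []
    queue = deque([initiator])
    started = set()
    coordinator = None
    while queue:
        p = queue.popleft()
        if p in started:
            continue  # each process runs its own election only once
        started.add(p)
        answered = False
        for q in (q for q in all_ids if q > p):
            log.append(("ELECTION", p, q))
            if q in alive:
                log.append(("ANSWER", q, p))
                answered = True
                queue.append(q)  # q takes over and starts its own election
        if not answered:
            # No higher process replied before the timeout: p wins.
            coordinator = p
            for q in all_ids:
                if q < p and q in alive:
                    log.append(("COORDINATOR", p, q))
    return coordinator, log


if __name__ == "__main__":
    ids = [3, 5, 6, 12, 32, 80]
    alive = set(ids) - {80}  # assume N80, the old coordinator, has failed
    print(bully_election(5, ids, alive)[0])
```

Assuming N80 was the failed coordinator, an election started by N5 ends with N32 as coordinator. If N32 (the second-highest ID) starts the election itself, it gets no answer and sends N-2 = 4 coordinator messages, matching the best case. The simulation still logs the unanswered ELECTION to N80, which the best-case count leaves out.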
Impossibility
Since timeouts are built into the protocol, in the asynchronous system model:
Disadvantages of Bully Algorithm
(a) Space complexity is very large, since every process must know the identity of every other process in the system.
Improved Bully Algorithm
Presented by A. Arghavani, E. Ahmadi, and A. T. Haghighat in 2011.
MODIFIED ELECTION ALGORITHM
Presented by M. S. Kordafshari, M. Gholipour, M. Jahanshahi, and A. T. Haghighat in 2005.
3. The processes with higher priority send an OK message with their priority number to process P.
4. When process P has received all the responses, it selects the process with the highest priority number as the new coordinator and sends a grant message to it.
5. The coordinator process then broadcasts a new coordinator message to all other processes, announcing itself as the coordinator.
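Only steps 3 to 5 of the modified algorithm appear here. The sketch below therefore assumes that the earlier steps have P send an election message to every higher-priority process, and that P takes over itself if nobody answers. Under those assumptions, only the initiator collects OK replies and grants leadership, so the higher processes do not each start their own elections. The function name and message tuples are hypothetical.

```python
def modified_bully_election(p, all_ids, alive):
    """Sketch of the modified election started by p; returns (coordinator, messages)."""
    higher = [q for q in all_ids if q > p]
    # Assumed earlier step: P sends an election message to every higher-priority process.
    messages = [("ELECTION", p, q) for q in higher]
    # Step 3: live higher-priority processes reply OK with their priority number.
    oks = [q for q in higher if q in alive]
    messages += [("OK", q, p) for q in oks]
    if not oks:
        coordinator = p  # assumption: nobody answered, so P becomes coordinator
    else:
        # Step 4: P picks the highest priority that answered and grants it leadership.
        coordinator = max(oks)
        messages.append(("GRANT", p, coordinator))
    # Step 5: the coordinator announces itself to every other live process.
    messages += [("COORDINATOR", coordinator, q) for q in all_ids if q != coordinator and q in alive]
    return coordinator, messages
```

With IDs 3, 5, 6, 12, 32, and 80, where 80 has crashed and 5 starts the election, the sketch picks 32 after 12 messages.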
Disadvantages of Modified Bully Algorithm
Ring Algorithm
The algorithm applies to system organized as a ring
(logically or physically)
Algorithm: Ring
Step 1: If process P1 detects a coordinator failure, it creates a new active list, which is initially empty. It sends an election message to its right neighbor and adds its own number, 1, to its active list.
[Figure: ring of nodes whose previous coordinator has crashed; two nodes start elections, and each active list grows as the message circulates (e.g., [5], [2,3], [2,3,4]) until the lists read [5,6,0,1,2,3,4] and [2,3,4,5,6,0,1]; node 6 becomes the coordinator]
Example: Ring Algorithm
Final active lists: [5, 6, 0, 1, 2, 3, 4] and [2, 3, 4, 5, 6, 0, 1]; node 6, the highest number in both lists, becomes the Coordinator.
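A minimal sketch of the ring algorithm's election pass. It assumes the ring holds nodes 0 to 7 in order and that node 7 is the crashed former coordinator. The figure suggests this but does not state it. The election message skips crashed nodes and collects each live node's number. When it returns to the initiator, the highest number in the active list becomes coordinator.

```python
def ring_election(ring, crashed, initiator):
    """Circulate an election message around `ring`; returns (coordinator, active_list).

    ring: node numbers in ring order (right neighbour = next element)
    crashed: set of failed nodes, which are skipped
    """
    n = len(ring)
    pos = ring.index(initiator)
    active = [initiator]  # the initiator adds its own number first
    while True:
        pos = (pos + 1) % n
        node = ring[pos]
        if node in crashed:
            continue  # pass the message on to the next live neighbour
        if node == initiator:
            break  # the message has gone all the way around
        active.append(node)
    return max(active), active


if __name__ == "__main__":
    print(ring_election(list(range(8)), {7}, 2))
    print(ring_election(list(range(8)), {7}, 5))
```

Elections started by nodes 2 and 5 produce exactly the two active lists above, and both choose node 6. Every live node's number is carried around the ring, which is the overhead the modified ring algorithm reduces.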
MODIFIED RING ALGORITHM
When a node notices that the leader has crashed, it sends its ID number to its
neighboring node in the ring. Thus, it is not necessary for all nodes to send their
IDs into the ring.
The receiving node compares the received ID with its own, and forwards whichever
is the greatest. This comparison is done by all the nodes such that only the
greatest ID remains in the ring.
If a node receives its own ID back (i.e., it is the initial sender of that ID), it declares itself the leader by sending a coordinator message into the ring.
It can be observed that this method dramatically reduces the overhead involved in message passing.
Thus, if many nodes notice the absence of the leader at the same time, only the message of the node with the greatest ID circulates in the ring, preventing smaller IDs from being sent.
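A minimal sketch of the modified ring algorithm's election pass, assuming all listed nodes are alive and that the IDs are unique. The IDs in the example are hypothetical. Each node forwards the larger of the received ID and its own. The node that gets its own ID back is the leader. The count covers election messages only, not the coordinator announcement that follows.

```python
def modified_ring_election(ring, initiator_index):
    """Forward only the largest ID seen; returns (leader, election_messages)."""
    n = len(ring)
    carried = ring[initiator_index]  # the initiator sends its own ID
    pos = initiator_index
    messages = 0
    while True:
        pos = (pos + 1) % n
        messages += 1
        receiver = ring[pos]
        if carried == receiver:
            # The receiver's own ID has travelled all the way around: it is the leader.
            return receiver, messages
        carried = max(carried, receiver)  # forward whichever ID is greater


if __name__ == "__main__":
    print(modified_ring_election([3, 7, 2, 9, 5], 0))
```

Starting at ID 3 in the ring 3, 7, 2, 9, 5, the carried ID becomes 9 and returns to node 9 after 8 messages. Each message carries a single ID instead of a growing active list.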
If n{i1,i2,··· ,im} is the number of nodes that concurrently detect the absence of
• Leader election is an important component of many cloud
computing systems
Applications of Leader Election
In Wireless Networks:
Key distribution,
Routing coordination,
Sensor coordination, and
General control.
In Cloud Computing:
Resolving Conflicts During Resource
sharing
How Amazon elects a leader ?
There are many ways to elect a leader, ranging from algorithms like
Paxos, to software like Apache ZooKeeper, to custom hardware, to
leases.
Leases:
Leases are the most widely used leader election mechanism at Amazon.
For example:
Examples of systems using leader election at Amazon
Amazon EBS (Elastic Block Store) distributes reads and
writes for a volume (Solid State Drives/Hard Disk Drives)
over many storage servers.
DynamoDB:
Uses a leader election protocol to elect an AWS Management Console to monitor resource utilization and performance metrics of various operations over databases
Availability: These issues can occur if the worker thread dies or stops while the heartbeating thread holds on to the lease.
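The slides do not describe how a lease works, so the following sketch assumes the common pattern: a node is leader while it holds an unexpired lease, and it must keep renewing the lease. `LeaseTable`, `try_acquire`, and `leader` are hypothetical names for an in-memory stand-in, not an Amazon or AWS API. Time is passed in explicitly so the behaviour is deterministic.

```python
class LeaseTable:
    """Hypothetical in-memory lease store standing in for a real lock service."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node: str, now: float) -> bool:
        """Take or renew the lease if it is free, expired, or already held by `node`."""
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.duration
            return True
        return False

    def leader(self, now: float):
        """Current leader, or None once the lease has expired."""
        return self.holder if now < self.expires_at else None


if __name__ == "__main__":
    leases = LeaseTable(duration=10)
    print(leases.try_acquire("a", now=0), leases.try_acquire("b", now=5))
    print(leases.leader(now=12), leases.try_acquire("b", now=12))
```

Renewal is just the holder calling `try_acquire` again before expiry. If a heartbeating thread keeps doing that after the worker thread has died, the lease never lapses and no other node can take over. This is the availability issue described above.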
Characteristics of a Good Leader Election
Reliability: Have reliable metrics that show how much work a leader
can do versus how much it is doing now.
Scalability: Review the metrics often and make sure that there are
plans for scaling in advance of running out of capacity.
Flexibility: Make it easy to find which host is the current leader and
which host was the leader at any given time. Keep an audit trail or
log of leadership changes.