
Distributed hash table (DHT)

Lecturer: Thanh-Chung Dao


Slides by Viet-Trung Tran
School of Information and Communication Technology
Outline

• Hashing
• Distributed Hash Table
• Chord

A Hash Table (hash map)

• A data structure that implements an associative array,
mapping keys to values.
• Searching and insertion are O(1) on average.
• Uses a hash function to compute an index into an array of
buckets or slots from which the correct value can be found.
• index = f(key, array_size)
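
As a concrete illustration (not from the slides), a minimal Python sketch of
this idea; the bucket count and the chaining collision policy are illustrative
choices.

# A minimal chained hash map: an array of buckets indexed by hashing the key.
class HashMap:
    def __init__(self, array_size=8):
        self.buckets = [[] for _ in range(array_size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)    # index = f(key, array_size)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                        # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))             # new key (or collision): chain it

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = HashMap()
table.insert("alice", 42)
assert table.lookup("alice") == 42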

Hash functions

• Crucial for good hash table performance
• Can be difficult to achieve
• WANTED: uniform distribution of hash values
• A non-uniform distribution increases the number of collisions and the
cost of resolving them

Hashing for the partitioning use case

• Objective
• Given document X, choose one of k servers to use
• E.g., using modulo hashing
• Number servers 1..k
• Place X on server i = (X mod k)
• Problem? Data may not be uniformly distributed
• Place X on server i = hash(X) mod k
• Problem?
• What happens if a server fails or joins (k → k±1)?
• What if different clients have different estimates of k?
• Answer: All entries get remapped to new nodes! (see the sketch below)
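
A small Python experiment (illustrative, not from the slides) showing how much
data moves under hash(X) mod k when one server is added:

import hashlib

def server_for(key, k):
    # Place key on server i = hash(key) mod k
    h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return h % k

keys = [f"doc-{i}" for i in range(10_000)]
before = {key: server_for(key, 10) for key in keys}
moved = sum(1 for key in keys if server_for(key, 11) != before[key])
print(f"{moved / len(keys):.0%} of keys moved when k went from 10 to 11")
# Expect about 10/11 ≈ 91% of the keys to move: almost everything is remapped.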

Distributed hash table (DHT)

• A Distributed Hash Table (DHT) is similar to a hash table, but it is
spread across many hosts
• Interface
• insert(key, value)
• lookup(key)
• Every DHT node supports a single operation:
• Given a key as input, route messages to the node holding that key

DHT: basic idea

[Figure series: an overlay network of nodes, each holding a part of the
(key, value) table]
• Neighboring nodes are “connected” at the application level
• Operation: take a key as input; route messages to the node holding that key
• insert(K1, V1) is routed through the overlay to the node responsible for K1,
which stores (K1, V1)
• retrieve(K1) follows the same route and returns V1 from that node


How to design a DHT?

• State Assignment
• What “(key, value) tables” does a node store?
• Network Topology
• How does a node select its neighbors?
• Routing Algorithm:
• Which neighbor to pick while routing to a destination?
• Various DHT algorithms make different choices
• CAN, Chord, Pastry, Tapestry, Plaxton, Viceroy, Kademlia, Skipnet,
Symphony, Koorde, Apocrypha, Land, ORDI …

Chord: A scalable peer-to-peer lookup
protocol for internet applications
Credit: University of California, Berkeley and Max Planck Institute

Outline

• What is Chord?
• Consistent Hashing
• A Simple Key Lookup Algorithm
• Scalable Key Lookup Algorithm
• Node Joins and Stabilization
• Node Failures

What is Chord?

• In short: a peer-to-peer lookup system
• Given a key (data item), it maps the key onto a node (peer).
• Uses consistent hashing to assign keys to nodes.
• Solves the problem of locating a key in a collection of
distributed nodes.
• Maintains routing information even under frequent node arrivals
and departures.

Consistent hashing

• A consistent hash function assigns each node and key an m-bit
identifier.
• SHA-1 is used as the base hash function.
• A node’s identifier is defined by hashing the node’s IP
address.
• A key identifier is produced by hashing the key (Chord does not
define this; it depends on the application).
• ID(node) = hash(IP, Port)
• ID(key) = hash(key)
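
A small Python sketch of these identifiers (the 6-bit m and the address
strings are illustrative assumptions; Chord itself uses the full 160-bit
SHA-1 output):

import hashlib

M = 6  # identifier bits; an illustrative choice

def chord_id(data: str) -> int:
    # Truncate the SHA-1 digest to an m-bit identifier.
    digest = hashlib.sha1(data.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

node_id = chord_id("10.0.0.1:4000")   # ID(node) = hash(IP, Port)
key_id = chord_id("my-file.txt")      # ID(key)  = hash(key)
print(node_id, key_id)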

Consistent hashing

• In an m-bit identifier space, there are 2^m identifiers.
• Identifiers are ordered on an identifier circle modulo 2^m.
• The identifier ring is called the Chord ring.
• Key k is assigned to the first node whose identifier is equal to
or follows (the identifier of) k in the identifier space.
• This node is the successor node of key k, denoted by
successor(k).
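
The successor rule in a few lines of Python (a sketch; the node set is the
3-bit example ring from the next slide):

import bisect

def successor(key_id, node_ids):
    # First node whose identifier is equal to or follows key_id,
    # wrapping around the circle if no node follows it.
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]

nodes = [0, 1, 3]                # the m = 3 example ring
assert successor(1, nodes) == 1
assert successor(2, nodes) == 3
assert successor(6, nodes) == 0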

Consistent hashing – Successor nodes

[Figure: identifier circle for m = 3 with nodes 0, 1, and 3 and keys 1, 2,
and 6]
• successor(1) = 1
• successor(2) = 3
• successor(6) = 0

Consistent hashing – Join and departure

• When a node n joins the network, certain keys previously
assigned to n’s successor now become assigned to n.
• When node n leaves the network, all of its assigned keys are
reassigned to n’s successor.
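
A quick Python check of how little moves (illustrative; it reuses the
successor rule above on the example ring):

import bisect

def successor(key_id, node_ids):
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]                 # wrap around the circle

keys = [1, 2, 6]
base = {k: successor(k, [0, 1, 3]) for k in keys}

joined = {k: successor(k, [0, 1, 3, 6]) for k in keys}   # a node joins at 6
print({k for k in keys if base[k] != joined[k]})         # {6}: moves from 0 to 6

departed = {k: successor(k, [0, 3]) for k in keys}       # node 1 departs
print({k for k in keys if base[k] != departed[k]})       # {1}: moves from 1 to 3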

Consistent hashing – Node join

[Figure: the example ring before and after a node joins; the keys that now
map to the new node move to it from its successor]

Consistent hashing – Node departure

[Figure: the example ring before and after a node departs; the departing
node’s keys are reassigned to its successor]

A Simple key lookup

• If each node knows only how to contact its current successor
node on the identifier circle, all nodes can be visited in linear
order.
• Queries for a given identifier can be passed around the
circle via these successor pointers until they encounter the
node that holds the key.

A Simple key lookup

• Pseudocode for finding the successor:

// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else
    // forward the query around the circle
    return successor.find_successor(id);
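
A runnable Python version of this linear walk (the ring wiring is the 3-bit
example from earlier; the class and names are illustrative):

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self            # set when the ring is wired up

    def next_holds(self, key_id):
        # id ∈ (n, successor] on the circle, wrap-around aware
        n, s = self.id, self.successor.id
        return (n < key_id <= s) if n < s else (key_id > n or key_id <= s)

    def find_successor(self, key_id):
        if self.next_holds(key_id):
            return self.successor
        return self.successor.find_successor(key_id)   # O(N) hops worst case

# Wire the ring 0 -> 1 -> 3 -> 0 from the earlier example.
n0, n1, n3 = Node(0), Node(1), Node(3)
n0.successor, n1.successor, n3.successor = n1, n3, n0
assert n0.find_successor(6).id == 0
assert n0.find_successor(2).id == 3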

A Simple key lookup

• The path taken by a query from node 8 for key 54:

Scalable key location

• To accelerate lookups, Chord maintains additional routing
information.
• This additional information is not essential for correctness,
which is achieved as long as each node knows its correct
successor.

Scalable key location – Finger tables

• Each node n maintains a routing table with up to m entries
(m is the number of bits in the identifiers), called the finger
table.
• The ith entry in the table at node n contains the identity of
the first node s that succeeds n by at least 2^(i-1) on the
identifier circle.
• s = successor(n + 2^(i-1))
• s is called the ith finger of node n, denoted by n.finger(i)

Scalable key location – Finger tables

Example (m = 3; nodes 0, 1, 3; keys 1, 2, 6):

Node 0 (keys: 6)        Node 1 (keys: 1)        Node 3 (keys: 2)
start       succ.       start       succ.       start       succ.
0+2^0 = 1   1           1+2^0 = 2   3           3+2^0 = 4   0
0+2^1 = 2   3           1+2^1 = 3   3           3+2^1 = 5   0
0+2^2 = 4   0           1+2^2 = 5   0           3+2^2 = 7   0
Scalable key location – Finger tables

• A finger table entry includes both the Chord identifier and the
IP address (and port number) of the relevant node.
• The first finger of n is the immediate successor of n on the
circle.
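
Computing the example fingers in Python (a sketch; successor is the same
helper as in the earlier consistent-hashing sketch):

import bisect

def successor(key_id, node_ids):
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]

def finger_table(n, node_ids, m=3):
    # ith finger (1-based) = successor((n + 2^(i-1)) mod 2^m)
    return [successor((n + 2 ** (i - 1)) % 2 ** m, node_ids)
            for i in range(1, m + 1)]

for n in (0, 1, 3):
    print(n, finger_table(n, [0, 1, 3]))
# 0 [1, 3, 0]
# 1 [3, 3, 0]
# 3 [0, 0, 0]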

Scalable key location – Example query

• The path taken by a query for key 54 starting at node 8:

Scalable key location – A characteristic

• Since each node has finger entries at power-of-two intervals
around the identifier circle, each node can forward a query at
least halfway along the remaining distance between itself
and the target identifier. From this intuition follows a theorem:
• Theorem: With high probability, the number of nodes that must be
contacted to find a successor in an N-node network is O(log N).
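
The deck does not show the finger-based pseudocode; below is a Python sketch
of the scalable lookup as described in the Chord paper, hard-wired to the
3-bit example ring (the ring-construction helper is an illustrative shortcut,
not part of the protocol):

def in_open(x, a, b):
    # x ∈ (a, b) on the identifier circle, wrap-around aware
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.finger = []                     # finger[i] = successor(id + 2^i)

    def closest_preceding_node(self, key_id):
        # Scan fingers from farthest to nearest for the closest node before key_id.
        for f in reversed(self.finger):
            if in_open(f.id, self.id, key_id):
                return f
        return self

    def find_successor(self, key_id):
        if in_open(key_id, self.id, self.successor.id) or key_id == self.successor.id:
            return self.successor            # id ∈ (n, successor]
        n = self.closest_preceding_node(key_id)
        if n is self:                        # no closer finger known: use successor
            return self.successor.find_successor(key_id)
        return n.find_successor(key_id)      # each hop roughly halves the distance

# Hard-wire nodes 0, 1, 3 on the m = 3 circle with correct fingers.
m, ids = 3, [0, 1, 3]
nodes = {i: Node(i) for i in ids}
def succ(x):
    x %= 2 ** m
    return nodes[min((i for i in ids if i >= x), default=ids[0])]
for node in nodes.values():
    node.successor = succ(node.id + 1)
    node.finger = [succ(node.id + 2 ** i) for i in range(m)]

assert nodes[0].find_successor(6).id == 0
assert nodes[1].find_successor(2).id == 3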

Node joins and stabilizations

• The most important piece of state is the successor pointer.
• If the successor pointer is kept up to date, which is
sufficient to guarantee correctness of lookups, then the finger
table can always be verified and corrected.
• Each node runs a “stabilization” protocol periodically in the
background to update its successor pointer and finger table.

Node joins and stabilizations

• The “stabilization” protocol contains 6 functions:
• create()
• join()
• stabilize()
• notify()
• fix_fingers()
• check_predecessor()

Node Joins – join()

• When node n first starts, it calls n.join(n’), where n’ is any
known Chord node.
• The join() function asks n’ to find the immediate successor of n.
• join() does not make the rest of the network aware of n.

Node Joins – join()

// create a new Chord ring.
n.create()
  predecessor = nil;
  successor = n;

// join a Chord ring containing node n’.
n.join(n’)
  predecessor = nil;
  successor = n’.find_successor(n);

Node joins – stabilize()

• Each time node n runs stabilize(), it asks its successor for its
predecessor p, and decides whether p should be n’s successor
instead.
• stabilize() notifies node n’s successor of n’s existence, giving
the successor the chance to change its predecessor to n.
• The successor does this only if it knows of no closer
predecessor than n.

Node joins – stabilize()

// called periodically. verifies n’s immediate
// successor, and tells the successor about n.
n.stabilize()
  x = successor.predecessor;
  if (x ∈ (n, successor))
    successor = x;
  successor.notify(n);

// n’ thinks it might be our predecessor.
n.notify(n’)
  if (predecessor is nil or n’ ∈ (predecessor, n))
    predecessor = n’;

Node joins – Join and stabilization

Scenario: node n joins a ring where np precedes and ns follows n’s identifier
(initially succ(np) = ns and pred(ns) = np).

• n joins
  • predecessor = nil
  • n acquires ns as its successor via some known node n’
• n runs stabilize()
  • n notifies ns of being the new predecessor
  • ns acquires n as its predecessor (pred(ns) = n)
• np runs stabilize()
  • np asks ns for its predecessor (now n)
  • np acquires n as its successor (succ(np) = n)
  • np notifies n
  • n acquires np as its predecessor
• All predecessor and successor pointers are now correct
• Fingers still need to be fixed, but old fingers will still work
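
A Python sketch of this convergence (nodes np = 0, n = 1, ns = 3 are
illustrative, and the join() step is pre-resolved by hand):

def in_open(x, a, b):
    # x ∈ (a, b) on the identifier circle, wrap-around aware
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self            # create(): a ring of one node
        self.predecessor = None

    def stabilize(self):
        # Ask the successor for its predecessor and adopt it if it is closer.
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, n):
        # n thinks it might be our predecessor.
        if self.predecessor is None or in_open(n.id, self.predecessor.id, self.id):
            self.predecessor = n

np, n, ns = Node(0), Node(1), Node(3)
np.successor, ns.predecessor = ns, np    # the existing two-node ring
ns.successor, np.predecessor = np, ns
n.successor = ns                         # n.join(n’): find_successor(1) = 3

n.stabilize()                            # ns learns pred(ns) = n
np.stabilize()                           # np adopts succ(np) = n; n learns pred(n) = np
assert np.successor is n and n.predecessor is np
assert n.successor is ns and ns.predecessor is n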
Node Joins – fix_fingers()

• Each node periodically calls fix_fingers() to make sure its finger
table entries are correct.
• This is how new nodes initialize their finger tables.
• This is how existing nodes incorporate new nodes into their
finger tables.

Node Joins – fix_fingers()

// called periodically. refreshes finger table entries.
n.fix_fingers()
  next = next + 1;
  if (next > m)
    next = 1;
  finger[next] = find_successor(n + 2^(next-1));

// checks whether predecessor has failed.
n.check_predecessor()
  if (predecessor has failed)
    predecessor = nil;

Node failures

• The key step in failure recovery is maintaining correct successor
pointers.
• To help achieve this, each node maintains a successor list of
its r nearest successors on the ring. Hence, all r successors
would have to fail simultaneously in order to disrupt the
Chord ring.
• Successor lists are stabilized as follows:
• node n reconciles its list with its successor s by copying s’s successor
list, removing its last entry, and prepending s to it.
• If node n notices that its successor has failed, it replaces it with the
first live entry in its successor list and reconciles its successor list with
its new successor.
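
A compact Python sketch of the two list operations (the value of r and the
liveness check are illustrative assumptions):

R = 3  # keep the r = 3 nearest successors

def reconcile(successor, successor_list_of_successor):
    # Copy the successor s’s list, remove its last entry, and prepend s.
    return [successor] + successor_list_of_successor[:R - 1]

def first_live(successor_list, is_alive):
    # On failure, the first live entry in the list becomes the new successor.
    for s in successor_list:
        if is_alive(s):
            return s
    raise RuntimeError("all r successors failed at once; the ring is disrupted")

# Example: node 1’s successor is 3, whose own list is [0, 1, 3].
assert reconcile(3, [0, 1, 3]) == [3, 0, 1]
assert first_live([3, 0, 1], lambda s: s != 3) == 0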

Chord – The math

• Each node maintains O(log N) state information, and lookups
need O(log N) messages.
• For example, in a network of N = 2^20 (about one million) nodes,
a lookup takes roughly 20 messages.
• Every node is responsible for about K/N keys (N nodes, K keys).
• When a node joins or leaves an N-node network, only O(K/N)
keys change hands (and only to or from the joining or leaving
node).

Interesting simulation results

• Adding virtual nodes as an indirection layer can significantly
improve load balance.
• The average path length is about (1/2) log N.
• Maintaining a set of alternate nodes for each finger and routing
queries to the closest node according to a network proximity
metric improves routing latency effectively.
• Recursive lookup style is faster than iterative style.

Applications: Time-shared storage

• For nodes with intermittent connectivity (server only
occasionally available)
• Store others’ data while connected; in return, have their
data stored while disconnected
• Data’s name can be used to identify the live Chord node
(content-based routing)

Applications: Chord-based DNS

• DNS provides a lookup service
• keys: host names; values: IP addresses
• Chord could hash each host name to a key
• Chord-based DNS:
• no special root servers
• no manual management of routing information
• no naming structure
• can find objects not tied to particular machines

What is Chord? – Addressed problems

• Load balance: Chord acts as a distributed hash function,
spreading keys evenly over the nodes
• Decentralization: Chord is fully distributed; no node is more
important than any other, which improves robustness
• Scalability: lookup costs grow logarithmically with the
number of nodes in the network, so even very large systems are
feasible
• Availability: Chord automatically adjusts its internal tables to
ensure that the node responsible for a key can always be
found
• Flexible naming: Chord places no constraints on the structure
of the keys it looks up

Summary

• Simple, powerful protocol
• Only operation: map a key to the responsible node
• Each node maintains information about O(log N) other nodes
• Lookups via O(log N) messages
• Scales well with the number of nodes
• Continues to function correctly despite even major changes of
the system

Thanks for your attention!
