
Unit V CS1603/Distributed Systems

UNIT V P2P & DISTRIBUTED SHARED MEMORY


Peer-to-peer computing and overlay graphs: Introduction – Data indexing and overlays – Chord –
Content addressable networks – Tapestry. Distributed shared memory: Abstraction and advantages –
Memory consistency models – Shared memory mutual exclusion.

Peer-to-peer (P2P) network systems use an application-level organization of the network
overlay to flexibly share resources (e.g., files and multimedia documents) stored across
network-wide computers.
All nodes, called peers, play symmetric roles: each acts as both a client and a server, and
peers communicate directly with one another rather than through a central server.
The ongoing entry and exit of nodes, together with the dynamic insertion and deletion of
objects, is termed churn.
Desirable characteristics and performance features of P2P systems
Features:
 Self-organizing
 Distributed control
 Role symmetry for nodes
 Anonymity
 Naming mechanism
 Security, authentication, trust
Performance:
 Large combined storage, CPU power, and resources
 Fast search for machines and data objects
 Scalable
 Efficient management of churn
 Selection of geographically close servers
 Redundancy in storage and paths

NAPSTER AND ITS LEGACY


The need and the feasibility of a peer-to-peer solution were first demonstrated by the
Napster file sharing system which provided a means for users to share digital music files. Napster
became very popular for music exchange after its launch in 1999. At its peak, several million users
were registered and thousands were swapping music files simultaneously.
Napster’s architecture included centralized indexes, but users supplied the files, which
were stored and accessed on their personal computers. Napster’s method of operation is illustrated
in Figure 10.2. In step 5 clients are expected to add their own music files to the pool of shared
resources by transmitting a link to the Napster indexing service for each available file. Thus the
motivation for Napster, and the key to its success, was that it made a large, widely distributed
set of files available to users throughout the Internet by providing access to 'shared resources
at the edges of the Internet'.


Napster was shut down as a result of legal proceedings instituted against the operators of
the Napster service by the owners of the copyright in some of the material (i.e., digitally encoded
music) that was made available on it. Anonymity for the receivers and the providers of shared data
and other resources is a concern for the designers of peer-to-peer systems. If files are also
encrypted before they are placed on servers, the owners of the servers can plausibly deny any
knowledge of the contents.
Limitations
Napster used a (replicated) unified index of all available music files. Unless the access path
to the data objects is distributed, object discovery and addressing are likely to become a bottleneck.
Application dependencies
Napster took advantage of the special characteristics of the application for which it was
designed:
 Music files are never updated, avoiding any need to make sure all the replicas of files
remain consistent after updates.
 No guarantees are required concerning the availability of individual files – if a music file
is temporarily unavailable, it can be downloaded later.

APPLICATION LAYER OVERLAYS


A core mechanism in P2P networks is searching for data, and this mechanism depends on
how (i) the data, and (ii) the network, are organized. P2P search uses the P2P overlay, which is a
logical graph among the peers that is used for the object search and object storage and management
algorithms.


Note that above the P2P overlay is the application layer overlay, where communication
between peers is point-to-point (representing a logical all-to-all connectivity) once a connection is
established.

DATA INDEXING AND OVERLAYS


Data in a P2P network is identified using indexing. Data indexing provides physical data
independence from the applications.
Classification of Indexing mechanisms
a. Centralized
b. Distributed
c. Local
a. Centralized indexing entails the use of one or a few central servers to store references
(indexes) to the data on many peers. The DNS lookup as well as the lookup by some early
P2P networks such as Napster used a central directory lookup.
b. Distributed indexing involves scattering the indexes to the data at various peers across
the P2P network. Data is accessed through mechanisms such as distributed hash tables
(DHTs). DHT schemes differ in the hash mapping, search algorithm, lookup diameter,
fault tolerance, and resilience to churn.
A typical DHT uses a flat key space to associate the mapping between network
nodes and data objects/files/values. Specifically, each node address is mapped to a logical
identifier in the key space using a consistent hash function, and each data object/file/value is
mapped to the same key space using hashing. These mappings are illustrated in Figure; a
minimal consistent-hashing sketch is also given after this list.

c. Local indexing requires each peer to index only its local data objects; remote objects must
be discovered by searching. This form of indexing is typically used in unstructured overlays in
conjunction with flooding search or random walk search. Gnutella uses local indexing.
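To make the mapping concrete, the following minimal Python sketch hashes node addresses and an
object name into the same flat circular key space and locates the node responsible for the object.
The choice of SHA-1, the identifier size M, and the addresses/names are illustrative assumptions,
not part of the original notes.

import hashlib

M = 16                              # identifier bits (illustrative); key space is 0 .. 2**M - 1
SPACE = 2 ** M

def chash(name):
    # Consistent hash: map a node address or an object name into the flat key space.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % SPACE

# Node addresses and object names are hashed into the same circular key space.
node_ids = sorted(chash(addr) for addr in ["10.0.0.1:5000", "10.0.0.2:5000", "10.0.0.3:5000"])

def responsible_node(key_id):
    # The object is managed by the first node clockwise from its identifier.
    for n in node_ids:
        if n >= key_id:
            return n
    return node_ids[0]              # wrap around the circular space

obj = chash("song.mp3")
print("object", obj, "is stored at node", responsible_node(obj))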

Other Classification
Semantic index mechanism: A semantic index is human readable, for example, a document name,
a keyword, or a database key. It supports keyword searches, range searches, and approximate
searches, which are not supported by semantic-free index mechanisms.
Semantic-free index mechanism: A semantic-free index is not human readable and typically
corresponds to the index obtained by a hash mechanism, e.g., the DHT schemes.


Classification of P2P Overlay Network


CHORD


For a query on a key key at node i, if key lies between i and its successor, the key resides at the
successor, and the successor's address is returned. If key lies beyond the successor, then node i
searches through the m entries in its finger table to identify the node j that most immediately
precedes key among all the entries in the finger table. As j is the closest known node that precedes
key, j is most likely to have the most information on locating key, i.e., on locating the immediate
successor node to which key has been mapped.
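A minimal Python sketch of this finger-table lookup follows. It assumes the standard Chord
conventions (an m-bit circular identifier space in which finger[k] of a node points to the first
node succeeding its identifier plus 2^k); the small ring built at the end and all identifiers are
illustrative, not taken from these notes.

M = 6                                   # identifier bits; ring positions 0 .. 2**M - 1 (illustrative)

def between(x, a, b):
    # True if x lies in the circular interval (a, b] on the identifier ring.
    return (a < x <= b) if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = None
        self.finger = []                # finger[k] = first node succeeding (self.id + 2**k)

    def closest_preceding_finger(self, key):
        # Scan fingers from farthest to nearest for the node j that most immediately precedes key.
        for f in reversed(self.finger):
            if between(f.id, self.id, key) and f.id != key:
                return f
        return self.successor           # finger[0] is the successor, so the query always advances

    def find_successor(self, key):
        if between(key, self.id, self.successor.id):
            return self.successor       # key lies between this node and its successor
        return self.closest_preceding_finger(key).find_successor(key)

# Build a tiny 4-node ring and fill successors/fingers by brute force (illustrative only).
ids = [1, 12, 23, 45]
nodes = {i: Node(i) for i in ids}
def succ_of_point(p):                   # first node clockwise from point p
    return nodes[min(ids, key=lambda j: (j - p) % 2**M)]
for i in ids:
    nodes[i].successor = succ_of_point((i + 1) % 2**M)
    nodes[i].finger = [succ_of_point((i + 2**k) % 2**M) for k in range(M)]

print(nodes[1].find_successor(30).id)   # key 30 is managed by node 45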


CONTENT ADDRESSABLE NETWORKS (CAN)


CAN is a logical d-dimensional Cartesian coordinate space organized as a d-torus logical
topology, i.e., a virtual overlay d-dimensional mesh with wrap-around. A two-dimensional torus
is shown in Figure

The entire space is partitioned dynamically among all the nodes present, so that each node
i is assigned a disjoint region r(i) of the space. As nodes arrive, depart, or fail, both the set of
participating nodes and the assignment of regions to nodes change.
The three core components of a CAN design are the following:
A. Setting up the CAN virtual coordinate space, and partitioning it among the nodes as they join
the CAN.
B. Routing in the virtual coordinate space to locate the node that is assigned the region containing
the point p to which a given key is mapped.
C. Maintaining the CAN due to node departures and failures.

A. CAN initialization (Joining the CAN)


⚫ New node N1 attempts to locate node N2 already in the CAN, typically using the IP address
of a bootstrap node
⚫ N1 generates a random point P in the coordinate space
⚫ The request is routed through the CAN (starting from N2) to locate the zone that contains P
⚫ A JOIN message is delivered to the node N3 that owns the zone containing P
⚫ N3 splits its zone in half, assigns half to N1 by sending half of (key, value) pairs to N1,
along with neighbor information
⚫ N3 informs neighbors of space reallocation
When a node joins a CAN, only the neighboring nodes in the coordinate space are required
to participate in the joining process. The overhead is thus of the order of the number of neighbors,
which is O(d) and independent of n, the number of nodes in the CAN.
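A minimal Python sketch of the zone split that N3 performs during a join is given below. Zones are
modelled as axis-aligned boxes, the Zone class and its fields are illustrative, and the split is done
along the longest dimension for simplicity (the CAN design cycles through the dimensions in a
fixed order).

# A zone is an axis-aligned box: lo[k] <= x[k] < hi[k] in each dimension k.
class Zone:
    def __init__(self, lo, hi):
        self.lo, self.hi = list(lo), list(hi)
        self.data = {}                   # point -> value pairs owned by this zone

    def contains(self, p):
        return all(l <= x < h for x, l, h in zip(p, self.lo, self.hi))

    def split(self):
        # Split this zone in half along its longest dimension and return the new half.
        dim = max(range(len(self.lo)), key=lambda k: self.hi[k] - self.lo[k])
        mid = (self.lo[dim] + self.hi[dim]) / 2
        new_lo, new_hi = list(self.lo), list(self.hi)
        new_lo[dim] = mid                # upper half goes to the joining node
        self.hi[dim] = mid               # this node keeps the lower half
        new_zone = Zone(new_lo, new_hi)
        # Hand over the (key, value) pairs whose points now fall in the new half.
        for p in [p for p in self.data if new_zone.contains(p)]:
            new_zone.data[p] = self.data.pop(p)
        return new_zone

# Example: N3 owns the whole 2-D unit square, and N1 joins at a random point inside it.
n3_zone = Zone([0.0, 0.0], [1.0, 1.0])
n3_zone.data[(0.2, 0.7)] = "file-A"
n3_zone.data[(0.8, 0.3)] = "file-B"
n1_zone = n3_zone.split()                # N1 receives half the zone and its (key, value) pairs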

B. CAN routing
⚫ Each node stores the IP address and coordinate zone of its adjoining (neighboring) nodes
⚫ This data makes up the node's routing table
⚫ Routing uses a greedy algorithm:
    a uniform hash function maps the key to a point P
    if P is within the zone of the current node:
        return (key, value)
    else:
        forward the query to the neighbor whose coordinates are closest to P
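A runnable Python sketch of this greedy forwarding on the unit 2-D torus follows. The node/zone
layout is illustrative, and each hop forwards to the neighbor whose zone centre is closest to P,
a simplification of "the neighbor with coordinates closest to P".

class CanNode:
    def __init__(self, name, lo, hi):
        self.name = name
        self.lo, self.hi = lo, hi        # the node's zone is the box [lo, hi) in each dimension
        self.neighbors = []

    def contains(self, p):
        return all(l <= x < h for x, l, h in zip(p, self.lo, self.hi))

    def centre(self):
        return [(l + h) / 2 for l, h in zip(self.lo, self.hi)]

def torus_dist(a, b):
    # Euclidean distance on the unit torus: each coordinate wraps around at 1.0.
    return sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(a, b)) ** 0.5

def route(node, P):
    hops = 0
    while not node.contains(P):
        # Greedy step: forward to the neighbor whose zone centre is closest to P.
        node = min(node.neighbors, key=lambda nb: torus_dist(nb.centre(), P))
        hops += 1
    return node, hops

# Four nodes partition the unit square into quadrants; neighbors abut along one dimension.
a = CanNode("A", [0.0, 0.0], [0.5, 0.5])
b = CanNode("B", [0.5, 0.0], [1.0, 0.5])
c = CanNode("C", [0.0, 0.5], [0.5, 1.0])
d = CanNode("D", [0.5, 0.5], [1.0, 1.0])
a.neighbors, b.neighbors, c.neighbors, d.neighbors = [b, c], [a, d], [a, d], [b, c]

owner, hops = route(a, (0.9, 0.9))       # the key was hashed to point P = (0.9, 0.9)
print(owner.name, hops)                  # -> D 2 (A -> B -> D)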
C. CAN maintenance
When a node voluntarily departs from CAN, it hands over its region and the associated
database of (key, value) tuples to one of its neighbors. The neighbor is chosen as follows:
 If the node’s region can be merged with that of one of its neighbors to form a valid
convex region, then such a neighbor is chosen.
 Otherwise the node’s region is handed over to the neighbor whose region has the
smallest volume or load – the regions are not merged and the neighbor handles both
zones temporarily until a periodic background region reassignment process runs to
integrate the regions and prevent further fragmentation.
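A minimal Python sketch of this takeover choice is given below. Zones are modelled as axis-aligned
boxes, the purely geometric can_merge test is a simplification of CAN's zone-validity check, and all
names and structures are illustrative.

def volume(zone):
    lo, hi = zone
    v = 1.0
    for l, h in zip(lo, hi):
        v *= (h - l)
    return v

def can_merge(z1, z2):
    # Two boxes form a valid convex region if they agree in every dimension except one,
    # and abut each other along that remaining dimension.
    (lo1, hi1), (lo2, hi2) = z1, z2
    diff = [k for k in range(len(lo1)) if (lo1[k], hi1[k]) != (lo2[k], hi2[k])]
    return len(diff) == 1 and (hi1[diff[0]] == lo2[diff[0]] or hi2[diff[0]] == lo1[diff[0]])

def choose_takeover(my_zone, neighbor_zones):
    # Rule 1: prefer a neighbor whose zone merges with ours into one convex region.
    for i, nz in enumerate(neighbor_zones):
        if can_merge(my_zone, nz):
            return i, True
    # Rule 2: otherwise the smallest-volume neighbor temporarily handles both zones,
    # until the periodic background reassignment process merges regions.
    return min(range(len(neighbor_zones)), key=lambda i: volume(neighbor_zones[i])), False

# Example: the departing node owns [0, 0.5) x [0.5, 1); its sibling half can absorb it.
mine = ([0.0, 0.5], [0.5, 1.0])
nbrs = [([0.5, 0.5], [1.0, 1.0]), ([0.0, 0.0], [0.5, 0.5])]
print(choose_takeover(mine, nbrs))       # -> (0, True): merged region is [0, 1) x [0.5, 1)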
CAN optimizations
1. Multiple Dimensions
⚫ Increase number of dimensions
⚫ Reduce average path length
⚫ Reduce path latency
⚫ Increases routing table size due to greater number of neighbors
2. Multiple Realities
⚫ Increase number of Realities
⚫ Multiple coordinate spaces exist at the same time, each space is called a reality
⚫ Each node is assigned a different zone in each reality
⚫ Shorter paths, higher fault-tolerance
⚫ A (key, value) pair mapped to a point P is stored at a different node in each reality (e.g., at
three different nodes when there are three realities)
3. Multiple Hash Functions
⚫ Using multiple hash functions increases data availability and reduces query latency
⚫ Data availability is improved by mapping a single key to k points in the coordinate space
using k different hash functions
⚫ The (key, value) pair becomes unavailable only when all k replica nodes are unavailable
4. Overload Coordinate Zones
⚫ Overload the coordinate zones by assigning more than one node to share the same zone
⚫ Reduces the average path length, improved fault-tolerance
⚫ No additional neighbors
5. Delay latency (RTT ratio)
⚫ Reduce per-hop latency by taking round-trip time (RTT) into account
⚫ Each node measures the RTT to each of its neighbors
⚫ Routing favors the lower-latency paths
6. Topologically sensitive overlay
The CAN overlay described so far has no correlation to the physical proximity or to the IP
addresses of domains. Logical neighbors in the overlay may be geographically far apart, and
logically distant nodes may be physical neighbors. By constructing an overlay that accounts for
physical proximity in determining logical neighbors, the average query latency can be significantly
reduced.
CAN complexity
The time overhead for a new joiner is O(d) for updating the new neighbors in the CAN,
and O((d/4) n^(1/d)) for routing to the appropriate location in the coordinate space, where n is
the number of nodes in the CAN. This is also the overhead in terms of the number of messages.
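As a worked example of the routing cost (using the (d/4) n^(1/d) average path length above): with
d = 2 dimensions and n = 10,000 nodes, the average route is about (2/4) × 10,000^(1/2) = 50 hops,
whereas d = 4 gives (4/4) × 10,000^(1/4) = 10 hops, which is why increasing the dimensionality
(optimization 1 above) shortens paths at the cost of larger routing tables.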
