Module 3
Amirtha Preeya V
Assistant Professor/CSE
MODULE III
Although they may differ in the resources that they contribute, all
the nodes in a peer-to-peer system have the same functional
capabilities and responsibilities.
No single point of failure.
Better scalability.
The challenge of designing a P2P network: all the peers are connected
to the Internet, but how do they learn which address each peer has, and
how does a given peer make sure it is communicating with the correct
peer?
A key issue for their efficient operation is the choice of an
algorithm for the placement of data across many hosts and
subsequent access to it in a manner that balances the workload.
Peer-to-peer systems have many interesting technical aspects, such as
decentralized control, self-organization, adaptation and scalability.
In order to get the peers in the P2P network to communicate correctly, you
therefore have to solve two problems:
Peer Identification
Peer Location
Identifying peers means being able to distinguish each peer from every
other peer. Since millions of peers might be connected to the same P2P
network, you need to be able to address each peer individually. This is
done by assigning each peer a unique GUID (Globally Unique ID).
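A common way to generate such a GUID is to apply a secure hash function to some state that is unique to the peer, for example its public key or network address. A minimal sketch in Python (hashing the peer's public key is an illustrative assumption; real systems hash whatever identifying state they define):

    import hashlib

    def make_guid(peer_public_key: bytes) -> str:
        """Derive a 160-bit GUID by hashing state unique to the peer.

        Hashing yields a fixed-length identifier that is, with overwhelming
        probability, unique across millions of peers."""
        return hashlib.sha1(peer_public_key).hexdigest()

    # Example: two different peers get two different GUIDs.
    guid_a = make_guid(b"peer-A-public-key")
    guid_b = make_guid(b"peer-B-public-key")
    assert guid_a != guid_b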
Locating peers means finding the peer with a specific GUID on the
network. With potentially millions of peers in the P2P network, a peer
cannot keep a fully up-to-date list of all peers in the network. Peers are
joining and leaving the network all the time. If all peers had to know each
other, imagine how many messages they would have to send to each other
just to keep each other's lists of peers up to date. It would be practically
impossible.
Instead of keeping a full list of all peers in the network, a peer keeps a
routing table with a subset of the peers in the network.
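One common way to choose that subset is to keep peers whose GUIDs are numerically close to, or share a prefix with, the local GUID, as structured overlays such as Pastry or Kademlia do. A simplified sketch, assuming GUIDs are hex strings as above (the per-row limit of 16 is an arbitrary illustrative bound):

    def shared_prefix_len(a: str, b: str) -> int:
        """Number of leading hex digits two GUIDs have in common."""
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    class RoutingTable:
        """Keeps only a bounded subset of peers, grouped by prefix match."""

        def __init__(self, my_guid: str, per_row: int = 16):
            self.my_guid = my_guid
            self.per_row = per_row
            self.rows = {}                    # prefix length -> {guid: address}

        def add_peer(self, guid: str, address: str) -> None:
            row = self.rows.setdefault(shared_prefix_len(self.my_guid, guid), {})
            if len(row) < self.per_row:       # bounded: never the whole network
                row[guid] = address

        def next_hop(self, target_guid: str):
            """Return a known peer whose GUID is closer to the target, if any."""
            best, best_len = None, shared_prefix_len(self.my_guid, target_guid)
            for row in self.rows.values():
                for guid, addr in row.items():
                    plen = shared_prefix_len(guid, target_guid)
                    if plen > best_len:
                        best, best_len = (guid, addr), plen
            return best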
Example of Centralized P2P Systems:
Napster and its Legacy
Napster
Provided a means for users to share music files –
primarily MP3s
Launched 1999 – several million users
Not fully peer-to-peer since it used central servers to
maintain lists of connected systems and the files they
provided, while actual transactions were conducted
directly between machines
Proved feasibility of a service using hardware and data
owned by ordinary Internet users
Napster and its Legacy
Napster's architecture included centralized indexes, but users supplied
the files, which were stored and accessed on their personal systems.
Napster's method of operation, step by step (sketched below):
1. File location request
2. List of peers offering the file
3. File request
4. File delivered
5. Index update
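Only the index is centralized; the files themselves travel directly between peers. A toy sketch of the index side of that exchange (class and method names are illustrative, not Napster's actual protocol):

    class CentralIndex:
        """Steps 1-2 and 5: answers location requests and records new copies."""

        def __init__(self):
            self.index = {}                   # file name -> set of peer addresses

        def locate(self, file_name):          # 1. file location request
            return list(self.index.get(file_name, ()))   # 2. peers offering it

        def update(self, file_name, peer_addr):
            # 5. index update, after a peer has downloaded (or shared) a copy
            self.index.setdefault(file_name, set()).add(peer_addr)

    # Steps 3-4 (file request and file delivery) then happen peer-to-peer:
    # the requester opens a direct connection to one of the addresses
    # returned by locate() and fetches the file from that peer.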
Napster and its Legacy
Napster used a (replicated) unified index of all available
music files.
Functional Requirements:
Load Balancing
Security of data
1. Insertion of objects:
A node wishing to make a new object available to a peer-to-
peer service computes a GUID for the object and announces it
to the routing overlay, which then ensures that the object is
reachable by all other clients.
2. Deletion of objects:
When clients request the removal of objects from the service
the routing overlay must make them unavailable.
3. Node addition and removal:
Nodes (i.e., computers) may join and leave the service. When a node
joins the service, the routing overlay arranges for it to assume some of
the responsibilities of other nodes; when a node leaves, its
responsibilities are redistributed among the remaining nodes (sketched
below).
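One convenient way to picture "assuming some of the responsibilities of other nodes" is consistent hashing: each object GUID is handled by the live node whose GUID follows it on a ring, so arrivals and departures move only a small share of the objects. A rough sketch under that assumption (not any particular overlay's algorithm):

    import bisect

    class Overlay:
        """Maps each object GUID to the node whose GUID follows it on a ring."""

        def __init__(self):
            self.node_guids = []              # sorted "ring" of live node GUIDs

        def responsible_node(self, object_guid: str) -> str:
            if not self.node_guids:
                raise LookupError("no live nodes")
            i = bisect.bisect_left(self.node_guids, object_guid)
            return self.node_guids[i % len(self.node_guids)]

        def node_joins(self, node_guid: str) -> None:
            bisect.insort(self.node_guids, node_guid)   # takes over nearby GUIDs

        def node_leaves(self, node_guid: str) -> None:
            self.node_guids.remove(node_guid)           # successor inherits them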
Two types of overlays:
1. Unstructured
2. Structured
Unstructured systems: impose no structure (or only a loose structure) on
the overlay network.
E.g., Napster, Gnutella, Freenet, FastTrack, eDonkey2000, BitTorrent
Support complex searches based on file metadata
Low search efficiency, especially for unpopular files (see the flooding
sketch below)
Structured systems: impose a particular structure on the overlay network.
E.g., Distributed Hash Tables (DHTs)
The topology of the peer network is tightly controlled
Any file can be located in a small number of overlay hops
Structured overlays use a number of different geometries (rings,
trees, hypercubes, tori, XOR, . . . )
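The difference in search efficiency can be seen in how a lookup proceeds: an unstructured overlay floods the query to neighbours up to a hop limit, while a structured overlay routes it greedily towards the target GUID in a few hops. A simplified flooding sketch (the Peer class, neighbour map and TTL value are illustrative assumptions):

    class Peer:
        def __init__(self, name, shared_files):
            self.name = name
            self.shared_files = set(shared_files)

    def flood_search(start, file_name, neighbours, ttl=4):
        """Ask every peer reachable within ttl hops whether it has the file.

        Handles arbitrary metadata queries, but may contact many peers and
        can still miss an unpopular file lying beyond the hop limit."""
        frontier, visited, hits = [start], {start}, []
        for _ in range(ttl + 1):              # check the start peer plus ttl hops
            next_frontier = []
            for peer in frontier:
                if file_name in peer.shared_files:
                    hits.append(peer)
                for n in neighbours.get(peer, ()):
                    if n not in visited:
                        visited.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
        return hits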
Types of Routing Overlays
• DHT – Distributed Hash Tables
• put(GUID, data), remove(GUID), value = get(GUID)
• DOLR – Distributed Object Location and Routing: a layer that maps GUIDs
to the addresses of the nodes holding the corresponding objects; unlike a
DHT, objects can be stored anywhere and the layer is only responsible for
locating them (see the interface sketch below)
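The two interfaces can be summarised as abstract APIs: a DHT lets the overlay decide which nodes store the data for a GUID, whereas with a DOLR the object stays wherever its host put it and the layer only registers and locates it. A hedged sketch of the two signatures, following the usual textbook presentation (Coulouris et al.):

    from abc import ABC, abstractmethod

    class DHT(ABC):
        """The overlay itself chooses the nodes that store the data for a GUID."""

        @abstractmethod
        def put(self, guid, data): ...        # store data at the nodes responsible for guid

        @abstractmethod
        def get(self, guid): ...              # value = get(GUID)

        @abstractmethod
        def remove(self, guid): ...           # delete all copies stored under guid

    class DOLR(ABC):
        """Objects stay where their hosts put them; the layer maps GUIDs to hosts."""

        @abstractmethod
        def publish(self, guid): ...          # announce that this node holds guid

        @abstractmethod
        def unpublish(self, guid): ...        # withdraw that announcement

        @abstractmethod
        def send_to_obj(self, msg, guid): ... # route msg to a node holding guid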
Distributed File System
What is a file system?
A file system is a subsystem of the operating system that controls how
data is stored, accessed and managed.
A distributed file system should provide the following forms of transparency:
1. Access transparency
2. Location transparency
3. Mobility transparency
4. Performance transparency
5. Scaling transparency
List of file accessing models
The file accessing models of a distributed file system mainly depend on
two factors: the method used for accessing remote files and the unit of
data access:
Commit => successful completion (All)
Abort => partial results are undone (Nothing)
Transactions are delimited by two special
primitives:
Begin_transaction // or something similar
transaction operations
(read, write, open, close, etc.)
End_transaction
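A minimal sketch of how a client would bracket its operations with these primitives (the service object and its method names are assumptions for illustration, not a specific API):

    def transfer_record(service, src_file, dst_file, record):
        """Everything between the two primitives either all takes effect
        (commit) or is undone entirely (abort)."""
        tid = service.begin_transaction()
        try:
            data = service.read(tid, src_file, record)
            service.write(tid, dst_file, record, data)
            service.end_transaction(tid)      # commit: successful completion (All)
        except Exception:
            service.abort_transaction(tid)    # abort: partial results undone (Nothing)
            raise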
• Clients may obtain the UFID of a file by quoting its text name to the
directory service.
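In the flat file service model this lookup is a separate directory service operation that maps a text name within a directory to a UFID. A small sketch of the idea (the class and method names are illustrative):

    class DirectoryService:
        """Maps human-readable text names to UFIDs within one directory."""

        def __init__(self):
            self.entries = {}                 # directory UFID -> {text name: file UFID}

        def add_name(self, dir_ufid, name, file_ufid):
            self.entries.setdefault(dir_ufid, {})[name] = file_ufid

        def lookup(self, dir_ufid, name):
            """The client quotes a text name and gets back the file's UFID."""
            return self.entries[dir_ufid][name]

    # The returned UFID is then used in flat file service operations,
    # e.g. Read(UFID, i, n) or Write(UFID, i, Data).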