Module 3

The document discusses peer-to-peer services and file systems. It describes peer-to-peer networks and how they differ from client-server architectures by having equal nodes that share resources directly. It provides examples like Napster and Skype and discusses concepts like peer identification, location, routing overlays, and distributed file systems.

Uploaded by

sumanthpopuri67
Copyright
© © All Rights Reserved

DISTRIBUTED SYSTEM

PEER TO PEER SERVICES AND FILE SYSTEM

Amirtha Preeya V
Assistant Professor/CSE
MODULE III

PEER TO PEER SERVICES AND FILE SYSTEM


Peer-to-peer Systems – Introduction – Napster and its legacy –
Peer-to-peer middleware – Routing overlays. Distributed File Systems –
Introduction – File service architecture – Andrew File System. File System:
Features – File model – File accessing models – File sharing semantics.
Peer to Peer
System
What is a Peer-to-Peer (P2P) system?

In P2P applications, a node may act as both a client and a server.

• "All nodes are equals."
• Peer-to-peer is a type of architecture in which nodes are
interconnected and share resources with each other without a central
controlling server.
• Objective: to balance network traffic and reduce the load on the
primary host.
• A P2P system allows us to construct a distributed system or
application in which all resources and data are contributed by the
hosts on the network.
• Its operation does not depend on the existence of any centrally
administered system.
• Examples of P2P applications include Skype (VoIP) and applications
used for digital content distribution.
Characteristics of Peer-to-Peer systems

• Although they may differ in the resources that they contribute, all
the nodes in a peer-to-peer system have the same functional
capabilities and responsibilities.
• No single point of failure.
• Better scalability.
• The challenge of designing a P2P network: all the peers are
connected to the Internet, but how do they know which address each
peer has, and how does a given peer make sure it is communicating
with the correct peer?
• A key issue for efficient operation is the choice of an algorithm
for placing data across many hosts and subsequently accessing it in a
manner that balances the workload.
• Peer-to-peer systems have many interesting technical aspects, such as
decentralized control, self-organization, adaptation and scalability.

To get the peers in a P2P network to communicate correctly, you
therefore have to solve two problems:
• Peer identification
• Peer location

• Identifying peers means being able to distinguish each peer from
every other peer. Since millions of peers might be connected to the
same P2P network, you need to be able to address each peer
individually. This is done by assigning each peer a unique GUID
(Globally Unique ID).
• Locating peers means finding the peer with a specific GUID on the
network. With potentially millions of peers in the P2P network, a peer
cannot keep a fully up-to-date list of all peers. Peers join and leave
the network all the time. If every peer had to know every other peer,
the number of messages needed to keep each other's peer lists up to
date would make this practically impossible.
• Instead of keeping a full list of all peers in the network, a peer
keeps a routing table with a subset of the peers in the network.
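The two problems above can be sketched in a few lines of Python. This is only an illustration: deriving the GUID by hashing a peer name with SHA-1, the names `make_guid`, `Peer` and `table_size`, and the rule of keeping the numerically closest GUIDs are all assumptions made for the example, not the method of any particular P2P system.

```python
import hashlib


def make_guid(name: str) -> int:
    """Derive a 160-bit GUID from a peer name (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class Peer:
    """A peer that knows only a bounded subset of the other peers."""

    def __init__(self, name: str, table_size: int = 3):
        self.name = name
        self.guid = make_guid(name)
        self.table_size = table_size
        self.routing_table = {}  # GUID -> peer name

    def learn(self, other: "Peer") -> None:
        """Record another peer, then evict the farthest GUID if the table is full."""
        self.routing_table[other.guid] = other.name
        if len(self.routing_table) > self.table_size:
            farthest = max(self.routing_table, key=lambda g: abs(g - self.guid))
            del self.routing_table[farthest]


me = Peer("me")
for i in range(10):
    me.learn(Peer(f"peer-{i}"))
```

After meeting ten peers, `me` still holds only `table_size` entries, which is the point of the last bullet: each peer stores a partial view and relies on routing to reach the rest.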
Example of Centralized P2P Systems:
Napster and its Legacy
• Napster
  • Provided a means for users to share music files, primarily MP3s
  • Launched in 1999; attracted several million users
  • Not fully peer-to-peer, since it used central servers to maintain
  lists of connected systems and the files they provided, while actual
  transactions were conducted directly between machines
  • Proved the feasibility of a service using hardware and data owned
  by ordinary Internet users

Napster and its legacy
• Napster's architecture included centralized indexes, but users
supplied the files, which were stored and accessed on their personal
computers.
• Napster's method of operation, step by step:
1. File location request
2. List of peers offering the file
3. File request
4. File delivered
5. Index update
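The five steps can be traced in a small in-memory sketch. The class names `CentralIndex` and `NapsterPeer`, the `peers` lookup dictionary standing in for the network, and the choice of the first listed peer are assumptions for illustration; this is not Napster's actual protocol.

```python
class CentralIndex:
    """Central server: knows who offers which file, but stores no files."""

    def __init__(self):
        self.files = {}  # title -> set of peer addresses

    def register(self, address, title):
        # Step 5: index update
        self.files.setdefault(title, set()).add(address)

    def locate(self, title):
        # Steps 1-2: file location request, list of peers offering the file
        return sorted(self.files.get(title, set()))


class NapsterPeer:
    def __init__(self, address, index):
        self.address = address
        self.index = index
        self.store = {}  # title -> file contents, held on the user's machine

    def share(self, title, data):
        self.store[title] = data
        self.index.register(self.address, title)

    def serve(self, title):
        # Step 4: file delivered directly by the peer, not by the index
        return self.store[title]

    def download(self, title, peers):
        offers = self.index.locate(title)
        if not offers:
            return None
        data = peers[offers[0]].serve(title)  # Step 3: file request to a peer
        self.share(title, data)               # downloader now offers it too
        return data


index = CentralIndex()
alice, bob = NapsterPeer("alice", index), NapsterPeer("bob", index)
peers = {"alice": alice, "bob": bob}
alice.share("song.mp3", b"music bytes")
copy = bob.download("song.mp3", peers)
```

Only the lookup goes through the central server; the transfer itself happens between the two peers.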
Napster and its legacy
• Napster used a (replicated) unified index of all available music
files.
• Object discovery and addressing are likely to become a bottleneck.
• Music files are never updated, avoiding any need to make all the
replicas of a file consistent after updates.
• No guarantees are required concerning the availability of individual
files: if a music file is temporarily unavailable, it can be downloaded
later when it becomes available. This reduces the requirement for
dependability of individual computers and their connections to the
Internet.
Peer-to-Peer Middleware
• Peer-to-peer middleware
  • Provides a mechanism to access data resources anywhere in the
  network
• Functional requirements:
  • Peer clients need to locate and communicate with any available
  resource, even though resources may be widely distributed
  • Add and remove resources at will
  • Add and remove hosts at will
  • The interface to application programmers should be simple and
  independent of the types of distributed resources
Non-functional requirements
• Global scalability
  • Peer-to-peer middleware must be designed to support applications
  that access millions of objects on hundreds of thousands of hosts.
• Load balancing
  • The performance of any system designed to exploit a large number of
  computers depends upon the balanced distribution of workload across
  them.
  • For the systems we are considering, this will be achieved by a
  random placement of resources together with the use of replicas of
  heavily used resources.
• Optimization for local interactions between neighbouring peers
  • The middleware should aim to place resources close to the nodes
  that access them the most.
Non-functional requirements
• Accommodating highly dynamic host availability
  • As hosts join the system, they must be integrated into the system
  and the load must be redistributed to exploit their new resources.
  When they leave the system, the system must detect their departure
  and redistribute their load and resources.
• Security of data
  • Trust must be built up by the use of authentication and encryption
  mechanisms to ensure integrity and privacy of information.
• Anonymity, deniability, and resistance to censorship (in some
applications)
  • Hosts that hold data should be able to deny responsibility for
  holding or supplying it.
Overlay Networks
SKYPE AS OVERLAY NETWORK
• Skype is an application that provides video chat and voice call
services.
• Skype is available for Microsoft Windows, Macintosh, or Linux, as
well as Android, Blackberry, and both Apple and Windows smartphones and
tablets.
• Skype allows users to communicate over the Internet by voice using a
microphone, by video using a webcam, and by instant messaging.
• Skype-to-Skype calls to other users are free of charge.


SKYPE ARCHITECTURE
• Skype is an overlay peer-to-peer network.
• There are two types of nodes in this overlay network: ordinary hosts
and super nodes (SN).
• An ordinary host is a Skype application that can be used to place
voice calls and send text messages.
• A super node is an ordinary host's end-point on the Skype network.
Any node with a public IP address and sufficient CPU, memory, and
network bandwidth is a candidate to become a super node.
Skype Peer-to-Peer Internet Telephony Protocol
• An ordinary host must connect to a super node and must register
itself with the Skype login server for a successful login.
• The Skype login server is an important entity in the Skype network.
User names and passwords are stored at the login server.
• User authentication at login is also done at this server.
• This server also ensures that Skype login names are unique across the
Skype name space.
PROTOCOL
• Skype uses a proprietary Internet telephony (VoIP) protocol called
the Skype protocol.
• The protocol has not been made publicly available by Skype, and
official applications using the protocol are closed-source.
• The main difference between Skype and standard VoIP clients is that
Skype operates on a peer-to-peer model rather than the more usual
client–server model.
Routing Overlays
• A routing overlay is a distributed algorithm for a middleware layer
responsible for routing requests from any client to a host that holds
the object to which the request is addressed.
• Responsible for locating nodes and objects
• Implements a routing mechanism in the application layer
• Ensures that any node can access any object by routing each request
through a sequence of nodes
• Exploits knowledge at each node to locate the destination
• Peer-to-peer systems usually store multiple replicas of objects to
ensure availability.
• In that case, the routing overlay maintains knowledge of the location
of all the available replicas and delivers requests to the nearest
'live' node (i.e. one that has not failed) that has a copy of the
relevant object.
• GUIDs (Globally Unique Identifiers) are used to identify nodes and
objects. These are also known as opaque identifiers, since they reveal
nothing about the locations of the objects to which they refer.
• Assigning GUIDs: peers are assigned a GUID when they join an existing
network.
The main task of a routing overlay is the following:

o Routing of requests to objects:
A client wishing to invoke an operation on an object submits a request
including the object's GUID to the routing overlay, which routes the
request to a node at which a replica of the object resides.

The routing overlay must also perform some other tasks:

1. Insertion of objects:
A node wishing to make a new object available to a peer-to-peer
service computes a GUID for the object and announces it to the routing
overlay, which then ensures that the object is reachable by all other
clients.
2. Deletion of objects:
When clients request the removal of objects from the service, the
routing overlay must make them unavailable.
3. Node addition and removal:
Nodes (i.e., computers) may join and leave the service. When a node
joins the service, the routing overlay arranges for it to assume some
of the responsibilities of other nodes. When a node leaves, its
responsibilities are distributed among the remaining nodes.
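A toy illustration of request routing and node addition, assuming a ring-shaped overlay in which each node knows only its successor and is responsible for the GUIDs between its predecessor and itself. The names `RingNode`, `build_ring`, `in_interval` and `route` are hypothetical, and this layout takes a number of hops proportional to the number of nodes; practical overlays keep larger routing tables so that far fewer hops are needed.

```python
class RingNode:
    def __init__(self, guid):
        self.guid = guid
        self.successor = self  # a lone node is its own successor


def build_ring(guids):
    """Link nodes into a ring ordered by GUID (stands in for joins)."""
    nodes = [RingNode(g) for g in sorted(guids)]
    for i, node in enumerate(nodes):
        node.successor = nodes[(i + 1) % len(nodes)]
    return nodes


def in_interval(key, a, b):
    """True if key lies in the ring interval (a, b], allowing wrap-around."""
    if a < b:
        return a < key <= b
    return key > a or key <= b


def route(start, key):
    """Forward the request node by node until the responsible node is found."""
    node, path = start, [start.guid]
    while not in_interval(key, node.guid, node.successor.guid):
        node = node.successor
        path.append(node.guid)
    owner = node.successor
    path.append(owner.guid)
    return owner, path


ring = build_ring([10, 40, 90])
owner, path = route(ring[0], 50)       # request for the object with GUID 50

bigger = build_ring([10, 40, 60, 90])  # a node with GUID 60 has joined
new_owner, _ = route(bigger[0], 50)
```

Before the join, GUID 50 is handled by node 90; after node 60 joins, it takes over that part of the identifier space, which is the "assume some of the responsibilities of other nodes" behaviour described in item 3.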
2 types of overlays
1. Unstructured
2. Structured
Unstructured systems (impose no structure, or only a loose structure,
on the overlay network)
• E.g., Napster, Gnutella, Freenet, FastTrack, eDonkey2000, BitTorrent
• Support complex search based on file metadata
• Low search efficiency, especially for unpopular files
Structured systems (impose a particular structure on the overlay
network)
• E.g., Distributed Hash Tables (DHTs)
• The topology of the peer network is tightly controlled
• Any file can be located in a small number of overlay hops
• Structured overlays use a number of different geometries (rings,
trees, hypercubes, tori, XOR, ...)
Types of Routing Overlays
• DHT – Distributed Hash Tables
  • put(GUID, data), remove(GUID), value = get(GUID)
• DOLR – Distributed Object Location and Routing
  • publish(GUID), unpublish(GUID)
  • DOLR is a layer over the DHT that maps GUIDs to the addresses of
  nodes
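The DHT interface listed above (put, get, remove) can be sketched as a single-process toy. The class `ToyDHT`, the 16-bit identifier space, the SHA-1 hash, and the "first node GUID at or after the key" placement rule are assumptions for illustration. The class also cheats by seeing every node at once; a real DHT would route each call hop by hop through the overlay.

```python
import hashlib


def guid(data: bytes, bits: int = 16) -> int:
    """Hash bytes into a small identifier space (size chosen for readability)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (1 << bits)


class ToyDHT:
    def __init__(self, node_names):
        # Each node is represented only by its GUID and a local key/value store.
        self.nodes = {guid(name.encode()): {} for name in node_names}

    def _responsible(self, key: int) -> int:
        """Pick the first node GUID at or after the key, wrapping around."""
        ids = sorted(self.nodes)
        for node_id in ids:
            if node_id >= key:
                return node_id
        return ids[0]

    def put(self, key: int, data: bytes) -> None:
        self.nodes[self._responsible(key)][key] = data

    def get(self, key: int):
        return self.nodes[self._responsible(key)].get(key)

    def remove(self, key: int) -> None:
        self.nodes[self._responsible(key)].pop(key, None)


dht = ToyDHT(["node-a", "node-b", "node-c"])
key = guid(b"hello world")          # the object's GUID is computed from its content
dht.put(key, b"hello world")
value = dht.get(key)
```

The caller never names a node: the GUID alone decides where the data lives, which is what makes the identifier "opaque".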
Distributed File System
What is a file system?
• A file system is the subsystem of an operating system that controls
how data is stored, accessed and managed.
• A file system is a hierarchical structure (file tree) of files and
directories.
• This file tree uses directories to organize data and programs into
groups, allowing the management of several directories and files at
one time.
• Files contain both data and attributes.


Characteristics of file system
DFS
• Stands for "Distributed File System".
• A distributed file system is a file system that may have files on
more than one machine.
• A distributed file system is a client/server-based application that
allows clients to access and process data stored on the server as if
it were on their own computer.
• Even when files are stored on multiple computers, a DFS can organize
and display the files as if they were stored on one computer.
• Users can also share files by copying them to a directory in the DFS
and can update files by editing existing documents.
• Since more than one client may access the same data simultaneously,
the server must have a mechanism in place (such as maintaining
information about the times of access) to organize updates so that
clients always receive the most current version of the data and data
conflicts do not arise.
Distributed File system
requirements
Related requirements in distributed file systems are:
1. Transparency
2. Concurrent file updates
3. File Replication
4. Hardware & Operating system Heterogeneity
5. Fault tolerance
6. Consistency
7. Security
8. Efficiency
Transparency

• Transparency is defined as the concealment from the user and the
application programmer of the separation of components in a
distributed system.
• Client programs should be unaware of the distribution of files.
Forms of Transparencies
in File Services

1. Access transparency
2. Location transparency
3. Mobility transparency
4. Performance transparency
5. Scaling transparency
List of file accessing models
• The file accessing model of a distributed file system depends mainly
on two factors: the method used for accessing remote files and the
unit of data transfer.
• Accessing remote files
  • Remote service model
  • Data-caching model
• Unit of data transfer
  • File-level transfer model
  • Block-level transfer model
  • Byte-level transfer model
  • Record-level transfer model
Remote service model
• Processing of the client's request is performed at the server's node.
• The client's request is delivered to the server; the server machine
performs the operation and returns the reply to the client.
• Requests and replies are transferred across the network as messages.
• Every remote file access results in network traffic.
• The file server interface and communication protocol must be designed
carefully so as to minimize the overhead of generating the messages.
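Data caching model

• Reduces the amount of network traffic by taking advantage of the
locality of file accesses.
• If the requested data is not present locally, it is copied from the
server's node to the client's node and cached there.
• A replacement policy such as LRU (least recently used) is used to
keep the cache size bounded.
• Cache consistency problem: cached copies must be kept consistent with
the copy held at the server.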
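A minimal sketch of a client-side cache with LRU replacement, assuming the "server" is just a dictionary and whole files are the unit of caching. The class name `CachingClient` and the `remote_fetches` counter are illustrative; cache consistency is deliberately not handled here.

```python
from collections import OrderedDict


class CachingClient:
    def __init__(self, server_files, capacity=2):
        self.server = server_files   # dict standing in for the remote file server
        self.cache = OrderedDict()   # ordered from least to most recently used
        self.capacity = capacity
        self.remote_fetches = 0      # counts accesses that had to cross the network

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)   # cache hit: mark as most recently used
            return self.cache[name]
        self.remote_fetches += 1           # cache miss: fetch from the server
        data = self.server[name]
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data


client = CachingClient({"a": b"1", "b": b"2", "c": b"3"}, capacity=2)
client.read("a")
client.read("a")   # served locally, no network traffic
client.read("b")
client.read("c")   # cache is full, so "a" is evicted
```

In this run three of the four reads go to the server; the repeated read of "a" is the one that locality saves.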


UNIX semantics
Session Semantics
• For this semantics, the following file access pattern is assumed: a
client opens a file, performs a series of read/write operations on the
file and finally closes the file.
• A session is a series of file accesses made between the open and
close operations.
• Local changes to a file are not made permanent until the file is
closed. In the meantime, if another user opens the file, she gets the
original version.
• This approach is common in DFSs.
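Session semantics can be illustrated with a private working copy per open file. This is a sketch under simple assumptions: whole-file contents held as strings, no locking, and the last session to close overwrites earlier ones. `SessionFileServer` and `Session` are hypothetical names.

```python
class SessionFileServer:
    def __init__(self):
        self.files = {}  # name -> contents visible to newly opened sessions

    def open(self, name):
        return Session(self, name)


class Session:
    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.data = server.files.get(name, "")  # private copy taken at open

    def read(self):
        return self.data

    def write(self, text):
        self.data = text  # change is local to this session

    def close(self):
        self.server.files[self.name] = self.data  # made permanent only now


server = SessionFileServer()
server.files["notes.txt"] = "v1"

writer = server.open("notes.txt")
writer.write("v2")
reader = server.open("notes.txt")   # opened before the writer closes
seen_during = reader.read()
writer.close()
seen_after = server.open("notes.txt").read()
```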


Immutable shared-files semantics
• This semantics is based on the use of the immutable file model.
• An immutable file cannot be modified once it has been created.
• The only operations on a file are, effectively, create, read, and
replace.
• According to this semantics, once the creator of the file declares it
to be sharable, the file is treated as immutable, so that it cannot be
modified any more.
• Changes to the file are handled by creating a new, updated version of
the file. Each version of the file is treated as an entirely new file.
• Therefore the semantics allows files to be shared only in read-only
mode.
• If several users try to replace an existing file at the same time, one
is chosen: either the last to close, or non-deterministically.
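The create/read/replace model can be sketched as a store that never edits existing contents and only appends new versions. The class `ImmutableStore` and the integer version numbers are assumptions for illustration.

```python
class ImmutableStore:
    def __init__(self):
        self._versions = {}  # name -> list of immutable byte strings

    def create(self, name, data):
        if name in self._versions:
            raise FileExistsError(name)
        self._versions[name] = [bytes(data)]

    def read(self, name, version=-1):
        """Return the requested version (the latest by default)."""
        return self._versions[name][version]

    def replace(self, name, data):
        """Add a new version; earlier versions are left untouched."""
        self._versions[name].append(bytes(data))
        return len(self._versions[name]) - 1


store = ImmutableStore()
store.create("report", b"draft")
old = store.read("report")
new_version = store.replace("report", b"final")
```

A reader that obtained `old` before the replace keeps a consistent view, because nothing it holds is ever modified in place.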
Transaction-like semantics

• Transaction: a set of operations which must be executed entirely, or
not at all.
• Transactions will either commit or abort:
  • Commit => successful completion (All)
  • Abort => partial results are undone (Nothing)
• Transactions are delimited by two special primitives:
Begin_transaction // or something similar
transaction operations
(read, write, open, close, etc.)
End_transaction
• If the transaction successfully reaches the end statement, it
"commits" and all changes become permanent; otherwise it aborts.
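The all-or-nothing behaviour can be sketched with a Python context manager standing in for Begin_transaction and End_transaction. The class `FileTransaction` and the shadow-copy approach are assumptions made for the example; it ignores concurrency and crash recovery, which a real transactional file service would have to handle.

```python
class FileTransaction:
    def __init__(self, store):
        self.store = store  # dict of file name -> contents

    def __enter__(self):
        # Begin_transaction: work on a shadow copy, not on the real store.
        self.pending = dict(self.store)
        return self

    def read(self, name):
        return self.pending[name]

    def write(self, name, data):
        self.pending[name] = data

    def __exit__(self, exc_type, exc, tb):
        # End_transaction: commit only if no operation raised an error.
        if exc_type is None:
            self.store.clear()
            self.store.update(self.pending)
        return False  # on abort the shadow copy is dropped and the error propagates


store = {"a": 1}

with FileTransaction(store) as t:   # commits: both writes become visible
    t.write("a", 2)
    t.write("b", 3)

try:
    with FileTransaction(store) as t:   # aborts: the write is undone
        t.write("a", 99)
        raise RuntimeError("failure before End_transaction")
except RuntimeError:
    pass
```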
File Service Architecture

An architecture that offers a clear separation of the main concerns in
providing access to files is obtained by structuring the file service
as three components:
• A flat file service
• A directory service
• A client module

Flat file service
• Concerned with the implementation of operations on the contents of
files.
• Unique File Identifiers (UFIDs) are used to refer to files in all
requests for flat file service operations.
Flat File service operations
• Read
• Write
• Create
• Delete
• Get attributes
• Set attributes
Read(FileId, i, n) :
Reads a sequence of up to n items from a file starting at item i.
Write(FileId, i, Data) :
Write a sequence of Data to a file, starting at item i.
Create() :
Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId) :
Removes the file from the file store.
GetAttributes(FileId) :
Returns the file attributes for the file.
SetAttributes(FileId, Attr) :
Sets the file attributes.
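The six operations can be sketched as an in-memory class. This is illustrative only: items are taken to be bytes, UFIDs come from a simple counter (a real service would need identifiers that are unique across servers and hard to forge), and attributes are a plain dictionary.

```python
import itertools


class FlatFileService:
    def __init__(self):
        self._files = {}                  # UFID -> file contents
        self._attrs = {}                  # UFID -> attribute dictionary
        self._next = itertools.count(1)   # assumed UFID generator

    def create(self):
        ufid = next(self._next)
        self._files[ufid] = bytearray()   # new file of length 0
        self._attrs[ufid] = {}
        return ufid

    def read(self, ufid, i, n):
        """Return up to n items starting at item i."""
        return bytes(self._files[ufid][i:i + n])

    def write(self, ufid, i, data):
        """Write data starting at item i, extending the file if necessary."""
        contents = self._files[ufid]
        if i > len(contents):
            contents.extend(b"\0" * (i - len(contents)))
        contents[i:i + len(data)] = data

    def delete(self, ufid):
        del self._files[ufid]
        del self._attrs[ufid]

    def get_attributes(self, ufid):
        return dict(self._attrs[ufid])

    def set_attributes(self, ufid, attrs):
        self._attrs[ufid].update(attrs)


fs = FlatFileService()
ufid = fs.create()
fs.write(ufid, 0, b"hello")
fs.write(ufid, 3, b"p!")
fs.set_attributes(ufid, {"owner": "student"})
```

Note that every call carries the position explicitly; the service keeps no "current position" per client, so repeating a read or write has the same effect as doing it once.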
Directory service
• Provides a mapping between text names for files and their UFIDs.
• Clients may obtain the UFID of a file by quoting its text name to the
directory service.
• The directory service supports functions to add new files to
directories.
Directory service operations
• Lookup
• AddName
• UnName
• GetNames
Lookup(Dir, Name) :
Locates the text name in the directory and returns the relevant UFID.
• If Name is not in the directory, throws an exception.
AddName(Dir, Name, File) :
If Name is not in the directory, adds (Name, File) to the directory
and updates the file's attribute record.
• If Name is already in the directory, throws an exception.
UnName(Dir, Name) :
If Name is in the directory, the entry containing Name is removed from
the directory.
• If Name is not in the directory, throws an exception.
GetNames(Dir, Pattern) :
Returns all the text names in the directory that match the regular
expression Pattern.
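A small sketch of the four directory operations. The exception names `NotFound` and `NameDuplicate`, the use of a string as the directory identifier, and matching with `re.fullmatch` are assumptions for the example; updating the file's attribute record in AddName is omitted.

```python
import re


class NotFound(Exception):
    pass


class NameDuplicate(Exception):
    pass


class DirectoryService:
    def __init__(self):
        self._dirs = {}  # directory id -> {text name: UFID}

    def lookup(self, directory, name):
        entries = self._dirs.get(directory, {})
        if name not in entries:
            raise NotFound(name)
        return entries[name]

    def add_name(self, directory, name, ufid):
        entries = self._dirs.setdefault(directory, {})
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = ufid

    def un_name(self, directory, name):
        entries = self._dirs.get(directory, {})
        if name not in entries:
            raise NotFound(name)
        del entries[name]

    def get_names(self, directory, pattern):
        """Return the names in the directory that fully match the regex."""
        entries = self._dirs.get(directory, {})
        return sorted(n for n in entries if re.fullmatch(pattern, n))


ds = DirectoryService()
ds.add_name("/home", "notes.txt", 1)
ds.add_name("/home", "main.py", 2)
text_files = ds.get_names("/home", r".*\.txt")
```

The directory service stores only names and UFIDs; reading the file contents is still a separate request to the flat file service.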
Client module
• It runs on each client computer and provides an integrated service
(flat file and directory) as a single API to application programs.
• It holds information about the network locations of the flat file
and directory server processes.
ACCESS CONTROL:
In distributed implementations, access rights checks have to be
performed at the server.
HIERARCHICAL FILE SYSTEM:
A hierarchical file system consists of a number of directories
arranged in a tree structure.
FILE GROUP
• A file group is a collection of files that can be located on any server.
Andrew file system (AFS)
The Andrew File System (AFS) is a location-independent file system
that uses a local cache to reduce the workload and increase the
performance of a distributed computing environment. A first request
for data from a workstation is satisfied by the server, and the data
is placed in a local cache. A second request for the same data is
satisfied from the local cache.
The intention is to support information sharing on a large scale by
minimizing client-server communication. This is achieved by
transferring whole files between server and client computers and
caching them at clients until the server receives a more up-to-date
version.
Problem of sharing files
• Caching files in the client-side cache reduces computation at the
server side, thus enhancing performance. However, the problem of
sharing files arises. To handle this, clients holding copies of a file
that another client is modifying are not informed the moment that
client makes changes. The modifying client updates only its own copy,
and the changes are reflected in the distributed file system only
after it closes the file.
Design Characteristics
• Whole-file serving: the entire contents of directories and files are
transferred from server to client.
• Whole-file caching: when a file is transferred to a client, it is
stored on that client's local disk.
Vice: the server software, which runs as a user-level process on each
server.
Venus: the client software, which runs as a user-level process on each
client computer.
Types of Files
• The files available to user processes running on workstations are
either local or shared.
• Local files are handled as normal UNIX files. They are stored on a
workstation's disk and are available only to local user processes.
• Shared files are stored on servers, and copies of them are cached on
the local disks of workstations.
• Local files are used only for temporary files (/tmp) and processes
that are essential for workstation startup.
• Other standard UNIX files (such as those normally found in /bin, /lib
and so on) are implemented as symbolic links from local directories to
files held in the shared space.
• Users' directories are in the shared space, enabling users to access
their files from any workstation.
Cache consistency
• When Vice supplies a copy of a file to a Venus process, it also
provides a callback promise.
• Callback promise: a token issued by the Vice server that is the
custodian of the file, guaranteeing that it will notify the Venus
process when any other client modifies the file.
• A callback promise has two states: valid and cancelled.
• When a server performs a request to update a file, it notifies all of
the Venus processes to which it has issued callback promises by
sending a callback to each.
• When the Venus process receives a callback, it sets the callback
promise token for the relevant file to cancelled.
• Whenever Venus handles an open on behalf of a client, it checks the
cache. If the required file is found in the cache, then its token is
checked. If its value is cancelled, then a fresh copy of the file must
be fetched from the Vice server, but if the token is valid, then the
cached copy can be opened and used without reference to Vice.
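The callback mechanism can be sketched with two small classes. This is a single-process illustration, not AFS code: the method names (`fetch`, `store`, `callback`, `open`, `close`) and the `fetches` counter are assumptions, files are plain strings, and failures and callback renewal are ignored.

```python
class Vice:
    """Server side: holds the files and remembers who was promised a callback."""

    def __init__(self):
        self.files = {}
        self.promises = {}  # file name -> set of Venus processes holding a promise

    def fetch(self, name, venus):
        self.promises.setdefault(name, set()).add(venus)  # issue callback promise
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        for venus in self.promises.get(name, set()) - {writer}:
            venus.callback(name)           # notify every other promise holder
        self.promises[name] = {writer}


class Venus:
    """Client side: caches whole files together with a promise state."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}    # file name -> [data, "valid" or "cancelled"]
        self.fetches = 0

    def callback(self, name):
        if name in self.cache:
            self.cache[name][1] = "cancelled"

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or entry[1] == "cancelled":
            self.fetches += 1              # must contact Vice for a fresh copy
            entry = [self.vice.fetch(name, self), "valid"]
            self.cache[name] = entry
        return entry[0]

    def close(self, name, data):
        self.cache[name] = [data, "valid"]
        self.vice.store(name, data, self)  # changes reach the server on close


vice = Vice()
vice.files["f"] = "v1"
a, b = Venus(vice), Venus(vice)
a.open("f")
a.open("f")            # token still valid: served from a's cache
b.open("f")
b.close("f", "v2")     # Vice sends a callback to a
state_after_update = a.cache["f"][1]
latest = a.open("f")   # token cancelled: a fetches the fresh copy
```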
