
Requirements For Distributed File Systems

The document discusses distributed file systems. It covers requirements like transparency, performance, and fault tolerance. It describes design issues such as file sharing, caching, and architectures. Examples of components are file servers, a directory service to map names to file IDs, and client modules that maintain state like open files.

Uploaded by

myat

06-06798 Distributed Systems
Lecture 7: Distributed File Systems

Overview

• Requirements for distributed file systems
– transparency, performance, fault-tolerance, ...
• Design issues
– possible options, architectures
– file sharing, concurrent updates
– caching
• Example
– Sun NFS

Characteristics of file systems

• Operations on files (=data + attributes)
– create/delete
– query/modify attributes
– open/close
– read/write
– access control
• Storage organisation
– directory structure (hierarchical, pathnames)
– metadata (file management information)
  • file attributes
  • directory structure info, etc

File attributes

File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
User controlled: Owner, File type, Access control list

Distributed file system requirements

• Transparency (clients unaware of the distributed nature)
– access transparency (client unaware of distribution of files, same interface for local/remote files)
– location transparency (uniform file name space from any client workstation)
– mobility transparency (files can be moved from one server to another without affecting client)
– performance transparency (client performance not affected by load on service)
– scaling transparency (expansion possible if numbers of clients increase)

Distributed file system requirements (continued)

– Concurrent file updates (changes by one client do not affect another)
– File replication (for load sharing, fault-tolerance)
– Heterogeneity (interface platform-independent)
– Fault-tolerance (continues to operate in the face of client and server failures)
– Consistency (one-copy-update semantics or slight variations)
– Security (access control)
– Efficiency (performance comparable to conventional file systems)

File Service Design Options

• Stateful
– server holds information on open files, current position, file locks
– open before access, close after
– better performance - shorter message, read-ahead possible
– server failure - lose state
– client failure - tables fill up
– can provide file locks

File Service Design Options (continued)

• Stateless
– no state information held by server
– file operations idempotent, must contain all information needed (longer message)
– simpler file server design
– can recover easily from client or server crash
– locking requires extra lock server to hold state
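The contrast between the two options can be made concrete with a small sketch. Everything here is illustrative: the class names, the in-memory `files` dictionary and the method signatures are assumptions for this example, not part of any real file service.

```python
class StatefulServer:
    """Hypothetical server that keeps an open-file table with positions."""

    def __init__(self, files):
        self.files = files        # file name -> bytes
        self.open_table = {}      # handle -> [file name, current position]
        self.next_handle = 0

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_table[handle] = [name, 0]
        return handle

    def read(self, handle, n):
        # Short request, but not idempotent: each call advances the position.
        name, pos = self.open_table[handle]
        data = self.files[name][pos:pos + n]
        self.open_table[handle][1] = pos + len(data)
        return data

    def crash(self):
        # A restart loses the open-file table, so old handles become useless.
        self.open_table.clear()


class StatelessServer:
    """Hypothetical server that holds no per-client state."""

    def __init__(self, files):
        self.files = files

    def read(self, name, pos, n):
        # Longer request (name and position every time), but idempotent.
        return self.files[name][pos:pos + n]


files = {"notes.txt": b"distributed file systems"}

stateful = StatefulServer(files)
handle = stateful.open("notes.txt")
first = stateful.read(handle, 11)
retried = stateful.read(handle, 11)     # a duplicated request returns different data

stateless = StatelessServer(files)
a = stateless.read("notes.txt", 0, 11)
b = stateless.read("notes.txt", 0, 11)  # repeating the request is harmless
```

In this sketch a retransmitted stateful read silently skips data, while the stateless read can be repeated safely and survives a server restart because the server has nothing to lose.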

File server architecture

[Figure: a client computer, where application programs call the client module, communicates by RPC with a server computer running the directory service and the flat file service. Annotations: the directory service maps text names to UFIDs; the flat file service performs operations on contents, addressed by UFIDs; the client module provides the API and knows open files, positions...]

File Service Architecture

Components (for openness):
• Flat file service
– operations on file contents
– unique file identifiers (UFIDs)
– translates UFIDs to file locations
• Directory service
– mapping from text names to UFIDs
• Client module
– API for file access, one per client computer
– holds state: open files, positions
– knows network location of flat file & directory server

Flat file service RPC interface

• Used by client modules, not user programs
– FileId (UFID) uniquely identifies file
– invalid if file not present or inappropriate access
– Read/Write; Create/Delete; Get/SetAttributes
• No open/close! (unlike UNIX)
– access immediate with FileId
– Read/Write identify starting point
• Improved fault-tolerance
– operations idempotent except Create, can be repeated (at-least-once RPC semantics)
– stateless service

Access control

• In UNIX file system
– access rights are checked against the access mode (read, write, execute) in open
– user identity checked at login time, cannot be tampered with
• In distributed systems
– access rights must be checked at server
  • RPC unprotected
  • forging identity possible, a security risk
– user id typically passed with every request (e.g. Sun NFS)
– stateless
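A minimal sketch of such an interface, assuming an in-memory store and a per-file access control list, might look as follows. The class name, the `user_id` argument and the `"rw"` permission strings are assumptions made for illustration; GetAttributes, SetAttributes and Delete are omitted for brevity.

```python
import itertools


class FlatFileService:
    """Hypothetical stateless flat file service keyed by UFID."""

    def __init__(self):
        self._files = {}                       # UFID -> {"data": ..., "acl": ...}
        self._next_ufid = itertools.count(1)

    def _check(self, ufid, user_id, mode):
        # No open(): access rights are checked at the server on every request.
        # The user id is simply trusted here, which mirrors the loophole
        # noted above (a forged id would pass this check).
        entry = self._files.get(ufid)
        if entry is None:
            raise KeyError("invalid FileId: file not present")
        if mode not in entry["acl"].get(user_id, ""):
            raise PermissionError("invalid FileId: inappropriate access")
        return entry

    def create(self, user_id):
        # Not idempotent: every call makes a new file with a fresh UFID.
        ufid = next(self._next_ufid)
        self._files[ufid] = {"data": bytearray(), "acl": {user_id: "rw"}}
        return ufid

    def read(self, ufid, user_id, pos, n):
        # The request names its own starting point, so it can be repeated.
        entry = self._check(ufid, user_id, "r")
        return bytes(entry["data"][pos:pos + n])

    def write(self, ufid, user_id, pos, data):
        # Writing the same bytes at the same position twice has the same effect.
        entry = self._check(ufid, user_id, "w")
        buf = entry["data"]
        if len(buf) < pos:
            buf.extend(bytes(pos - len(buf)))  # pad any gap with zero bytes
        buf[pos:pos + len(data)] = data


service = FlatFileService()
ufid = service.create("alice")
service.write(ufid, "alice", 0, b"hello")
service.write(ufid, "alice", 0, b"hello")      # duplicate request, same result
content = service.read(ufid, "alice", 0, 5)
```

Because every request carries the UFID, the position and the user id, a client can resend a Read or Write after a timeout without changing the outcome; only Create would produce a second file.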

Directory structure

• Hierarchical
– tree-like, pathnames from root
– (in UNIX) several names per file (link operation)
• Naming system
– implemented by client module, using directory service
– root has well-known UFID
– locate file following path from root

File names

Text name (=directory pathname+file name)
• hostname:local name
– not mobility transparent
• uniform name structure (same name space for all clients)
• remote mount (e.g. Sun NFS)
– remote directory inserted into local directory
– relies on all clients maintaining consistent naming conventions
  • all clients must implement same local tree
  • must mount remote directory into the same local directory
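The remote-mount option can be sketched as a client-side mount table that rewrites a local pathname into a (server, remote pathname) pair. The mount points below are taken from the remote mount example in these slides; the function name, the table layout and the second client's `/home/students` mount point are assumptions made for illustration.

```python
def resolve(path, mounts):
    """Return (server, path on that server); server is None for local files."""
    # Try the longest mount point first so nested mounts would resolve correctly.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, remote_root = mounts[mount_point]
            return server, remote_root + path[len(mount_point):]
    return None, path


# Mount table of one client: local mount point -> (server, exported directory).
client_a = {
    "/usr/students": ("Server 1", "/export/people"),
    "/usr/staff": ("Server 2", "/nfs/users"),
}

# A second, hypothetical client that mounted the same directory elsewhere.
client_b = {
    "/home/students": ("Server 1", "/export/people"),
}

on_a = resolve("/usr/students/jon", client_a)
on_b = resolve("/usr/students/jon", client_b)
local = resolve("/usr/x", client_a)
```

On the first client the name `/usr/students/jon` reaches Server 1, while on the second client the same name is treated as a local path. This is why a uniform name space depends on every client mounting remote directories at the same local directory.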

Remote mount

[Figure: three directory trees.
Server 1 (root): export, containing people, containing big, jon, bob, ...
Client (root): ..., vmunix, usr; usr contains students, x, staff
Server 2 (root): nfs, containing users, containing jim, ann, jane, joe
Remote mounts: students on the client is mounted from people on Server 1; staff on the client is mounted from users on Server 2.]

Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.

Directory service

• Directory
– conventional file (client of the flat file service)
– mapping from text names to UFIDs
• Operations
– require the FileId of the directory (a machine-readable UFID) as parameter
– locate file (LookUp)
– add/delete file (AddName/UnName)
– match file names to regular expression (GetNames)
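The directory operations and the "follow the path from the root" naming scheme can be sketched together. This is only an illustration: directories are kept in a Python dictionary rather than as files in the flat file service, and `ROOT_UFID`, `make_directory`, `resolve_path` and all UFID values are hypothetical.

```python
import re

ROOT_UFID = 0  # assumed well-known UFID of the root directory


class DirectoryService:
    """Hypothetical directory service: maps text names to UFIDs."""

    def __init__(self):
        self._dirs = {ROOT_UFID: {}}   # directory UFID -> {text name: UFID}

    def make_directory(self, ufid):
        # Helper for this sketch only: register an empty directory.
        self._dirs[ufid] = {}

    def add_name(self, dir_ufid, name, ufid):          # AddName
        entries = self._dirs[dir_ufid]
        if name in entries:
            raise FileExistsError(name)
        entries[name] = ufid

    def un_name(self, dir_ufid, name):                 # UnName
        del self._dirs[dir_ufid][name]

    def lookup(self, dir_ufid, name):                  # LookUp
        try:
            return self._dirs[dir_ufid][name]
        except KeyError:
            raise FileNotFoundError(name) from None

    def get_names(self, dir_ufid, pattern):            # GetNames
        regex = re.compile(pattern)
        return sorted(n for n in self._dirs[dir_ufid] if regex.fullmatch(n))


def resolve_path(service, path):
    """Client-module side: one LookUp per pathname component, from the root."""
    ufid = ROOT_UFID
    for component in path.strip("/").split("/"):
        if component:
            ufid = service.lookup(ufid, component)
    return ufid


ds = DirectoryService()
ds.make_directory(1)
ds.add_name(ROOT_UFID, "usr", 1)
ds.make_directory(2)
ds.add_name(1, "staff", 2)
ds.add_name(2, "jim", 42)
ds.add_name(2, "jane", 43)

jim_ufid = resolve_path(ds, "/usr/staff/jim")
j_names = ds.get_names(2, "j.*")
```

Every operation takes the UFID of the directory it acts on, so the directory service itself needs no notion of a "current directory" or of open files.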

File sharing

Multiple clients share the same file for read/write access.
• One-copy update semantics
– every read sees the effect of all previous writes
– a write is immediately visible to clients who have the file open for reading
• Problems!
– caching: maintaining consistency between several copies difficult to achieve
– serialise access by using file locks (affects performance)
– trade-off between consistency and performance

Example: Sun NFS (1985)

• Structure of flat file & client & directory service
• NFS protocol
– RPC based, OS independent (originally UNIX)
• NFS server
– stateless (no open/close)
– no locks or concurrency control
– no replication with updates
• Virtual file system, remote mount
• Access control (user id with each request)
– security loophole (modify RPC to impersonate user…)
• Client and server caching

NFS architecture

[Figure: on the client computer, application programs make UNIX system calls into the UNIX kernel, where the virtual file system routes local requests to the UNIX file system (or another file system) and remote requests to the NFS client. On the server computer, the UNIX kernel contains the virtual file system, the NFS server and the UNIX file system. NFS client and NFS server communicate using the NFS protocol over RPC (UDP or TCP).]

File identifier (FileId)

Simple Solution
– i-node (number identifying file within file system)
– file migration requires finding and changing all FileIds
– UNIX reuses i-node numbers after file deleted (i-node gen. no)

FileId: Server address (IP address.socket) | Index (i-node number)

NFS file handle
Virtual file system uses i-node if local, file handle if remote.

File handle: File system identifier | i-node no. | i-node gener. no.
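The role of the i-node generation number can be shown with a short sketch. The three-field handle follows the layout above, but the 32-bit field widths, the `InodeTable` class and all numbers are assumptions for this example; real file handles are opaque to clients and their internal layout is up to the server.

```python
import struct
from collections import namedtuple

FileHandle = namedtuple("FileHandle", "fs_id inode generation")

# Assumed encoding: three unsigned 32-bit integers in network byte order.
_LAYOUT = struct.Struct("!III")


def pack_handle(handle):
    return _LAYOUT.pack(*handle)


def unpack_handle(raw):
    return FileHandle(*_LAYOUT.unpack(raw))


class InodeTable:
    """Hypothetical server-side table that reuses i-node numbers."""

    def __init__(self, fs_id):
        self.fs_id = fs_id
        self.generation = {}   # i-node number -> current generation number
        self.in_use = set()

    def allocate(self, inode):
        # Each reuse of an i-node number increments its generation number.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.in_use.add(inode)
        return FileHandle(self.fs_id, inode, self.generation[inode])

    def free(self, inode):
        self.in_use.discard(inode)

    def is_valid(self, handle):
        return (handle.fs_id == self.fs_id
                and handle.inode in self.in_use
                and self.generation.get(handle.inode) == handle.generation)


table = InodeTable(fs_id=7)
old = table.allocate(12)      # a file is created in i-node 12
table.free(12)                # the file is deleted
new = table.allocate(12)      # a different file reuses i-node 12
round_trip = unpack_handle(pack_handle(new))
```

A client still holding `old` refers to the same i-node number as `new`, but the generation number differs, so the server can reject the stale handle instead of returning the wrong file.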

Caching in NFS

• Indispensable for performance
• Caching
– retains recently used data (file pages, directories, file attributes) in cache
– updates data in cache for speed
– block size typically 8 kbytes
• Server caching
– cache in server memory (UNIX kernel)
• Client caching
– cache in client memory, local disk

Server caching

• Store data in server memory
• Read-ahead: anticipate which pages to read
• Delayed write
– update in cache; write to disk periodically (UNIX sync to synchronise cache) or when space needed
– which contents seen by users depends on timing
• Write through
– cache and write to disk (reliable, poor performance)
• Write on close
– write to disk only when commit received (fast but problems with files open for a long time)
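The trade-off between delayed write and write through can be sketched with a toy block cache. The `CachedDisk` class, its counters and the `crash` method are assumptions for illustration; the "disk" is just a dictionary.

```python
class CachedDisk:
    """Hypothetical server block cache with a selectable write policy."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.disk = {}         # block number -> bytes (stands in for the disk)
        self.cache = {}        # block number -> bytes held in server memory
        self.dirty = set()     # blocks changed in cache but not yet on disk
        self.disk_writes = 0

    def _write_disk(self, block):
        self.disk[block] = self.cache[block]
        self.disk_writes += 1

    def write(self, block, data):
        self.cache[block] = data
        if self.write_through:
            self._write_disk(block)    # reliable, one disk write per update
        else:
            self.dirty.add(block)      # delayed write: disk updated at sync()

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def sync(self):
        # Periodic flush, in the spirit of UNIX sync.
        for block in sorted(self.dirty):
            self._write_disk(block)
        self.dirty.clear()

    def crash(self):
        # Memory contents are lost; only what reached the disk survives.
        self.cache.clear()
        self.dirty.clear()


delayed = CachedDisk(write_through=False)
through = CachedDisk(write_through=True)
for version in (b"a", b"b", b"c"):
    delayed.write(0, version)
    through.write(0, version)
delayed.sync()

lossy = CachedDisk(write_through=False)
lossy.write(1, b"x")
lossy.crash()                  # crash before sync: the update never reached disk
```

In this sketch three updates to the same block cost one disk write under delayed write and three under write through, while the crash shows the price of the delay: an update that was only in the cache is lost.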

Client caching

• Potential consistency problems!
– different versions, portions of files, check if copy still valid
• Timestamp method
– tag with latest time of validity check and modification time
– copy valid if time since last check less than freshness interval, or modification time on server the same
– choose freshness interval adaptively
• Reads
– perform validity check, if not valid, request data from server, optimisations
• Writes
– After modification, marked as dirty and flushed
• Not truly one-copy update semantics...

Summary

• File service
– crucial to the running of a distributed system
– performance, consistency and easy recovery essential
• Design issues
– separate flat file service from directory service and client module
– stateless for performance and fault-tolerance
– caching for performance
– concurrent updates difficult with caching
– approximation of one-copy update semantics
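The timestamp method described under Client caching can be sketched as follows. Times are passed in explicitly so the example is deterministic; the `Server` and `CacheEntry` classes, the fixed freshness interval of 3 seconds and all timestamps are assumptions for illustration (the slides suggest choosing the interval adaptively).

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    t_modified: float   # modification time on the server when the copy was fetched
    t_checked: float    # time of the latest validity check


class Server:
    """Hypothetical file server holding a single file."""

    def __init__(self, data, t_modified):
        self.data, self.t_modified = data, t_modified

    def write(self, data, now):
        self.data, self.t_modified = data, now

    def get_modified_time(self):
        return self.t_modified

    def read(self):
        return self.data, self.t_modified


def cached_read(entry, server, now, freshness):
    """Return (data, contacted_server) using the timestamp validity check."""
    if now - entry.t_checked < freshness:
        return entry.data, False            # assumed valid, no server contact
    if server.get_modified_time() == entry.t_modified:
        entry.t_checked = now               # unchanged on server: revalidated
        return entry.data, True
    entry.data, entry.t_modified = server.read()   # stale: fetch a new copy
    entry.t_checked = now
    return entry.data, True


server = Server(b"v1", t_modified=100.0)
data, t_mod = server.read()
entry = CacheEntry(data=data, t_modified=t_mod, t_checked=100.0)

server.write(b"v2", now=101.0)              # another client updates the file

within = cached_read(entry, server, now=102.0, freshness=3.0)
after = cached_read(entry, server, now=104.0, freshness=3.0)
```

The read at time 102 still returns the old contents because the freshness interval has not expired, which is exactly why this scheme only approximates one-copy update semantics; the read at time 104 triggers the check and fetches the new version.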
