Chapter 1

This document provides an introduction to distributed systems, defining them as collections of independent computers that appear as single coherent systems to users. It discusses several key characteristics, including autonomy of components, transparency to users, and scalability. The goals of distributed systems are described as making resources accessible, achieving distribution transparency through various forms of transparency, openness through standard interfaces, and scalability in size, geography, and administration.

DISTRIBUTED SYSTEMS

Principles and Paradigms


Second Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN

Chapter 1
Introduction
Definition of a Distributed System

A collection of independent computers that appears to its
users as a single coherent system.

Definition of a Distributed System

Several important aspects:


1. Consists of components that are autonomous.
2. Users think they are dealing with a single system.
3. No assumptions made regarding the types of
computers or the way they are interconnected.
• They could range from high-performance mainframe computers to
small nodes in sensor networks.

Important characteristics of distributed systems
• Differences between the various computers and the ways in which
they communicate are mostly hidden from users. The same holds for
the internal organization of the distributed system.
• Users and applications can interact with a distributed system in a consistent
and uniform way, regardless of where and when interaction takes place.
• Should also be relatively easy to expand or scale. This characteristic is a
direct consequence of having independent computers, but at the same time,
hiding how these computers actually take part in the system as a whole.
• A distributed system will normally be continuously available, although
perhaps some parts may be temporarily out of order.
• Users and applications should not notice that parts are being replaced or
fixed, or that new parts are added to serve more users or applications.

Definition of a Distributed System
Distributed systems are often organized by means of a layer of software:
• It is logically placed between a higher layer consisting of users and
applications, and a lower layer consisting of operating systems and basic
communication facilities.
• This layer is called middleware.

Figure 1-1. A distributed system organized as middleware. The middleware layer
extends over multiple machines, and offers each application the same
interface.

Goals of Distributed Systems
• Making resources accessible
• Distribution transparency
• Openness
• Scalability

Making Resources Accessible

• Making it easy for users and applications to access remote resources
• Sharing remote resources in a controlled and efficient manner

Making Resources Accessible

• Benefits of sharing remote resources
– Better economics by sharing expensive resources
– Easier to collaborate and exchange information
– Connectivity of the Internet has led to numerous virtual organizations
where geographically dispersed people can work together using groupware
– Connectivity has enabled electronic commerce
• However, as connectivity and sharing increase …
– Security problems
– Eavesdropping or intrusion on communication
– Tracking of communication to build up a preference profile of a specific
user

Distribution Transparency

An important goal - hide the fact that the
processes and resources are physically
distributed across multiple computers.
Transparent - A distributed system that presents
itself to users and applications as if it were only
a single computer system.

Types of Transparency

Figure 1-2. Different forms of transparency in a
distributed system (ISO, 1995).

Types of Transparency

• Access transparency - hide differences in data representation
and the way the resources are accessed
• Hide differences in machine architectures, but more important
is that we reach agreement on how data is to be represented
by different machines and operating systems
• Ex: a distributed system may have computer systems that run
different operating systems, each having their own file-naming
conventions.

Types of Transparency

• Location transparency - users cannot tell where a resource
is physically located in the system.
• Achieved by assigning only logical names to resources, i.e.
names in which the location of a resource is not secretly
encoded
• Ex: http://www.prenhall.com/index.html

Migration transparency - resources can be moved without
affecting how those resources can be accessed.

Types of Transparency

Relocation transparency - resources can be relocated while
they are being accessed without the user or application
noticing anything.
Ex: when mobile users can continue to use their wireless
laptops while moving from place to place.

Replication transparency - hide the fact that several copies of
a resource exist.

Types of Transparency

• Concurrency transparency - sharing of resources can be
done in a concurrent way without the knowledge of the
users.
Ex: Two independent users may each have stored their files
on the same file server or may be accessing the same
tables in a shared database.
• Consistency can be achieved through locking mechanisms,
by which users are, in turn, given exclusive access to the
desired resource.

• Failure transparency - a user does not notice that a
resource fails to work properly, and that the system
subsequently recovers from that failure.

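The locking idea mentioned for concurrency transparency can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `SharedTable`, `deposit`, and `run_users` are hypothetical, and threads in one process stand in for independent users of a shared database.

```python
import threading

class SharedTable:
    """A toy shared resource; the class and method names are illustrative only."""

    def __init__(self):
        self._rows = {"balance": 0}
        self._lock = threading.Lock()  # gives one user at a time exclusive access

    def deposit(self, amount):
        # The caller never sees the lock: the sharing is hidden behind the interface.
        with self._lock:
            current = self._rows["balance"]
            self._rows["balance"] = current + amount

    def balance(self):
        with self._lock:
            return self._rows["balance"]

def run_users(table, users=2, deposits_per_user=1000):
    """Let several independent 'users' (threads) update the same table concurrently."""
    def user():
        for _ in range(deposits_per_user):
            table.deposit(1)

    threads = [threading.Thread(target=user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.balance()
```

Because every update happens while holding the lock, no deposit is lost, and neither user needs to know that the other exists.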
Degree of Transparency

Completely hiding the distribution aspects from users is not
always a good idea.
Attempting to mask a server failure before trying another one
may slow down the system.
Requiring several replicas to be always consistent means a
single update operation may take seconds to complete.
For mobile and embedded devices, it may be better to
expose distribution rather than trying to hide it.
Signal transmission is limited by the speed of light as well as
the speed of intermediate switches.

Openness

• An open distributed system offers services according to
standard rules that describe the syntax and semantics of
those services.
• Standard rules govern the format, contents, and meaning
of messages sent and received.
• Such rules are formalized in protocols.
• Services are generally specified through interfaces, which
are often described in an Interface Definition Language
(IDL).

Openness

• An interface definition - allows an arbitrary process that needs a
certain interface to talk to another process that provides that
interface.
• Allows two independent parties to build completely different
implementations of those interfaces, leading to two separate
distributed systems that operate in exactly the same way.
• Proper specifications are complete and neutral.
• Complete means that everything that is necessary to make an
implementation has indeed been specified.
• Neutral means that specifications do not prescribe what an
implementation should look like.
• Completeness and neutrality are important for interoperability
and portability.

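As a rough illustration of an interface with two independent implementations, the sketch below uses a Python abstract base class in place of a real IDL. The names `FileService`, `InMemoryFileService`, `LogStructuredFileService`, and `client` are hypothetical and chosen only for this example.

```python
from abc import ABC, abstractmethod

class FileService(ABC):
    """Hypothetical interface: it says what is offered, not how it is implemented."""

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, name: str) -> bytes: ...

class InMemoryFileService(FileService):
    """One implementation: a dictionary from file name to contents."""

    def __init__(self):
        self._files = {}

    def write(self, name, data):
        self._files[name] = bytes(data)

    def read(self, name):
        return self._files[name]  # raises KeyError for an unknown name

class LogStructuredFileService(FileService):
    """A completely different implementation: an append-only log, newest entry wins."""

    def __init__(self):
        self._log = []

    def write(self, name, data):
        self._log.append((name, bytes(data)))

    def read(self, name):
        for entry_name, data in reversed(self._log):
            if entry_name == name:
                return data
        raise KeyError(name)

def client(service: FileService) -> bytes:
    """A client written only against the interface works with either implementation."""
    service.write("notes.txt", b"v1")
    service.write("notes.txt", b"v2")
    return service.read("notes.txt")
```

The client behaves identically with both services, which is the sense in which a neutral specification leaves the implementation open.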
Openness

Interoperability - characterizes the extent to which two
implementations of systems or components from different
manufacturers can co-exist and work together by merely
relying on each other's services as specified by a common
standard.
Portability - characterizes to what extent an application
developed for a distributed system A can be executed,
without modification, on a different distributed system B
that implements the same interfaces as A.

Openness

Extensibility
- It should be easy to configure the system out of different
components
- It should be easy to add new components or replace
existing ones.

Scalability

Scalability can be measured along three dimensions.
Size: be able to easily add more users and resources to a
system
Geography: be able to handle users and resources that are
far apart
Administration: be easy to manage even if it spans many
independent administrative organizations

Size Scalability Problems

Consider scaling w.r.t. size - we are often confronted with the
limitations of centralized services, data and algorithms.

Figure 1-3. Examples of scalability limitations.

Scalability Problems

Only decentralized algorithms should be used.

Characteristics of decentralized algorithms:
• No machine has complete information about the system
state
• Machines make decisions based only on local information
• Failure of one machine does not ruin the algorithm
• There is no implicit assumption that a global clock exists

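One well-known algorithm with these characteristics is gossip-based averaging, sketched below as an assumption-laden illustration (it is not taken from the slides). Each node knows only its own value and its direct neighbors. The loop that picks which node acts next is a simulation device only, not a coordinator, and the names `gossip_average` and `ring` are hypothetical.

```python
import random

def ring(n):
    """Neighbor lists for n nodes arranged in a ring (an assumed topology)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def gossip_average(values, neighbors, rounds=2000, seed=0):
    """Simulate pairwise gossip: a node and one neighbor replace their values
    with the mean of the two. No node ever reads the global state."""
    rng = random.Random(seed)       # seeded so the simulation is repeatable
    state = list(values)
    for _ in range(rounds):
        i = rng.randrange(len(state))   # some node wakes up (no global clock implied)
        j = rng.choice(neighbors[i])    # it contacts one of its neighbors
        mean = (state[i] + state[j]) / 2
        state[i] = state[j] = mean      # a purely local decision
    return state
```

Every exchange preserves the sum of the two values involved, so all nodes drift toward the global average although none of them ever computes it directly.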
Geographical scalability Problems

• Distributed systems designed for LANs are often based on
synchronous communication.
• Designing systems for WANs using synchronous communication is
much more difficult.
• Communication in WANs is inherently unreliable, and
virtually always point-to-point.
• This makes it hard to locate a service: in a LAN a request can
simply be broadcast, which is not feasible in a WAN.
• Scaling across multiple, independent administrative
domains leads to conflicting policies w.r.t. resource
usage, payment, management and security.

Scalability Problems

Security issues:
Many components of a distributed system that reside within
a single domain may not be trusted by users in other
domains.

Scaling Techniques

How can the scalability problems be solved?


Three techniques for scaling:
• Hiding communication latencies
• Distribution
• Replication

Scaling Techniques
Hiding communication latencies:
Basic idea: try to avoid waiting for responses to remote service
requests as much as possible, for example by using asynchronous
communication and doing other useful work until the reply arrives.
In applications that cannot make effective use of asynchronous
communication, a better solution is to reduce the overall
communication.

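A minimal sketch of hiding latency with asynchronous communication, using Python's `asyncio`. The function `remote_call` is a hypothetical stand-in for a remote service request, with `asyncio.sleep` modelling network latency. No real network is involved.

```python
import asyncio
import time

async def remote_call(name, delay=0.05):
    """Stand-in for a remote service request; the sleep models network latency."""
    await asyncio.sleep(delay)
    return f"reply from {name}"

async def sequential(names):
    # Wait for each reply before sending the next request: latencies add up.
    return [await remote_call(n) for n in names]

async def overlapped(names):
    # Issue all requests first, then collect the replies: latencies overlap.
    return await asyncio.gather(*(remote_call(n) for n in names))

def timed(coro_fn, names):
    """Run one of the two strategies and return (replies, elapsed seconds)."""
    start = time.perf_counter()
    replies = asyncio.run(coro_fn(names))
    return replies, time.perf_counter() - start
```

Calling `timed(sequential, ["a", "b", "c"])` and `timed(overlapped, ["a", "b", "c"])` returns the same replies. The overlapped variant should take roughly one delay rather than three, although exact timings depend on the machine.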
Scaling Techniques

Figure 1-4. The difference between letting (a) a server
or (b) a client check forms as they are being filled.

Scaling Techniques
• Distribution: taking a component, splitting it into smaller parts,
and subsequently spreading those parts across the system.
Ex: the Internet Domain Name System (DNS).
– The DNS name space is hierarchically organized into a tree of
domains, which are divided into nonoverlapping zones.
– The names in each zone are handled by a single name server.
– Each path name is the name of a host in the Internet, and is
thus associated with a network address of that host.

Scaling Techniques
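For example, to resolve the name nl.vu.cs.flits, the name is handed
from the name server of one zone to that of the next until the
address of the host is found.

Figure 1-5. An example of dividing the DNS
name space into zones.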
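The zone-by-zone resolution of a name such as nl.vu.cs.flits can be sketched as follows. This is a toy model, not real DNS: the `NameServer` class, the one-zone-per-label layout, and the address 192.0.2.10 (taken from a documentation-only range) are all assumptions made for illustration.

```python
class NameServer:
    """Toy name server for one zone. Its table maps a label either to the
    name server of a child zone (a referral) or to a host address."""

    def __init__(self, zone):
        self.zone = zone
        self.table = {}

def resolve(root, path_name):
    """Resolve a path name label by label, following referrals between zones.
    Returns the address and the list of zones whose servers were consulted."""
    server, visited = root, [root.zone]
    labels = path_name.split(".")
    for i, label in enumerate(labels):
        entry = server.table[label]          # KeyError if the name is unknown
        if isinstance(entry, NameServer):
            server = entry                   # referral: hand on the rest of the name
            visited.append(server.zone)
        elif i == len(labels) - 1:
            return entry, visited            # last label resolved to an address
        else:
            raise LookupError(f"{label} is a host, but more labels follow")
    raise LookupError(f"{path_name} names a zone, not a host")

def build_example():
    """Build a small hierarchy for the example name (zone layout is assumed)."""
    root, nl, vu, cs = (NameServer(z) for z in ("root", "nl", "vu", "cs"))
    root.table["nl"] = nl
    nl.table["vu"] = vu
    vu.table["cs"] = cs
    cs.table["flits"] = "192.0.2.10"
    return root
```

No single server holds the whole name space. Each one answers only for its own zone, which is what lets the naming service scale.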
Scaling Techniques

• Replication: increases availability and helps balance the
load between components, leading to better performance.
• Caching: special form of replication - making a copy of the
resource, generally in the proximity of the client accessing
that resource.

Scaling Techniques

One serious drawback to caching and replication -
consistency problems: once there are multiple copies,
modifying one copy makes it different from the others.

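The consistency problem can be made concrete with a small sketch. The classes `Origin` and `Cache` are hypothetical, and the invalidation step shown is just one possible remedy, included here as an assumption rather than as something the slides prescribe.

```python
class Origin:
    """The authoritative copy of the data (a stand-in for a server)."""

    def __init__(self):
        self.data = {}
        self.caches = []

    def write(self, key, value, invalidate=False):
        self.data[key] = value
        if invalidate:
            # Assumed remedy: tell every cache to drop its copy of this key.
            for cache in self.caches:
                cache.invalidate(key)

class Cache:
    """A copy kept close to the client; it is filled on the first read."""

    def __init__(self, origin):
        self.origin = origin
        self.copies = {}
        origin.caches.append(self)

    def read(self, key):
        if key not in self.copies:
            self.copies[key] = self.origin.data[key]  # miss: fetch from the origin
        return self.copies[key]                       # hit: may be out of date

    def invalidate(self, key):
        self.copies.pop(key, None)
```

After a plain `write`, a cache that already holds the key keeps returning the old value. With `invalidate=True` the next read fetches the new one, at the price of extra messages on every update.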
Scaling Techniques

Size scalability - least problematic from a technical point of
view.
Geographical scalability is a much tougher problem.
Administrative scalability is the most difficult one, partly also
because we need to solve nontechnical problems.

Pitfalls when Developing
Distributed Systems
False assumptions made by first-time developers:
• The network is reliable
• The network is secure
• The network is homogeneous
• The topology does not change
• Latency is zero
• Bandwidth is infinite
• Transport cost is zero
• There is one administrator

These assumptions relate to properties that are unique to distributed
systems:
reliability, security, heterogeneity, and topology of the network; latency
and bandwidth; transport costs; and finally administrative domains.

TYPES OF DISTRIBUTED SYSTEMS
• Distributed computing systems
• Distributed information systems
• Distributed embedded systems

Distributed Computing Systems
Cluster Computing Systems
• In cluster computing the underlying hardware consists of a collection
of similar workstations or PCs, closely connected by means of a
high-speed local-area network.
• In addition, each node runs the same operating system.
• Used for parallel programming in which a single (compute intensive)
program is run in parallel on multiple machines.

Distributed Computing Systems
Grid Computing Systems
• Have a high degree of heterogeneity: no assumptions are made concerning
hardware, operating systems, networks, administrative domains, security
policies, etc.
• A key issue in a grid computing system is that resources from different
organizations are brought together to allow the collaboration of a group of
people or institutions.
• Such a collaboration is realized in the form of a virtual organization.
• Typically, resources consist of compute servers (including supercomputers,
possibly implemented as cluster computers), storage facilities, and databases.
• In addition, special networked devices such as telescopes, sensors, etc., can be
provided as well.

Distributed Computing Systems
Grid Computing Systems

Distributed Computing Systems
• The Fabric layer provides interfaces to local resources at a specific site. Note that these
interfaces are tailored to allow sharing of resources within a virtual organization. Typically,
they will provide functions for querying the state and capabilities of a resource, along with
functions for actual resource management (e.g., locking resources).

• The Connectivity layer consists of communication protocols for supporting
grid transactions that span the usage of multiple resources. For example, protocols are
needed to transfer data between resources, or to simply access a resource from a remote
location. In addition, the connectivity layer will contain security protocols to authenticate users
and resources. Note that in many cases human users are not authenticated; instead,
programs acting on behalf of the users are authenticated. In this sense, delegating rights from
a user to programs is an important function that needs to be supported in the connectivity
layer.

Distributed Computing Systems
• The resource layer is responsible for managing a single resource. It uses the
functions provided by the connectivity layer and calls directly the interfaces
made available by the fabric layer. For example, this layer will offer functions for
obtaining configuration information on a specific resource, or, in general, to
perform specific operations such as creating a process or reading data. The
resource layer is thus seen to be responsible for access control, and hence will
rely on the authentication performed as part of the connectivity layer.
• The collective layer deals with handling access to multiple resources and
typically consists of services for resource discovery, allocation and scheduling of
tasks onto multiple resources, data replication, and so on. Unlike the
connectivity and resource layers, which consist of a relatively small, standard
collection of protocols, the collective layer may consist of many different
protocols for many different purposes, reflecting the broad spectrum of services
it may offer to a virtual organization.
• Finally, the application layer consists of the applications that operate within a
virtual organization and which make use of the grid computing environment.

Distributed Pervasive Systems
• Devices are often characterized by being small, battery-powered, mobile,
and having only a wireless connection.
• An important feature is the general lack of human administrative control.
• At best, devices can be configured by their owners, but otherwise they
need to automatically discover their environment.
• Three requirements for pervasive applications:
1. Embrace contextual changes.
2. Encourage ad hoc composition.
3. Recognize sharing as the default.

Distributed Pervasive Systems
• Embracing contextual changes:
– means that a device must be continuously aware of the fact that its
environment may change all the time.
– One of the simplest changes is discovering that a network is no longer
available, for example, because a user is moving between base stations.
– In such a case, the application should react, possibly by automatically
connecting to another network, or taking other appropriate actions.
• Encouraging ad hoc composition:
– refers to the fact that many devices in pervasive systems will be used in very
different ways by different users.
– As a result, it should be easy to configure the suite of applications running on a
device, either by the user or through automated (but controlled) interposition.
• Recognizing sharing as the default:
– One very important aspect of pervasive systems is that devices generally join
the system in order to access (and possibly provide) information.
– This calls for means to easily read, store, manage, and share information.

Home System
• Built around one or more PCs, but can also include other
electronic devices:
– Automatic control of lighting, sprinkler systems, alarm
systems, etc.
– Network-enabled appliances
– PDAs and smart phones, etc.
Electronic Health Care Systems

Figure 1-12. Monitoring a person in a pervasive electronic health care
system, using (a) a local hub or (b) a continuous wireless connection.
Sensor Networks
• A collection of geographically distributed nodes, each
consisting of a communication device, a power source, some
kind of sensor, a small processor…
• Purpose: to collectively monitor sensory data
(temperature, sound, moisture, etc.) and transmit the
data to a base station
• “Smart environment” – the nodes may do some
rudimentary processing of the data in addition to their
communication responsibilities.
Sensor Networks

Figure 1-13. Organizing a sensor network database, while storing
and processing data (a) only at the operator’s site or …
ARCHITECTURES
• The logical organization of a distributed system into software components is also referred to as its software architecture.
• A software architecture tells us how the various software components are to be organized and how they should
interact.
Architectural Styles
• An architectural style is formulated in terms of components, the way
that components are connected to each other, the data exchanged between
components, and finally how these elements are jointly configured into a system.
• A component is a modular unit with well-defined required and provided
interfaces that is replaceable within its environment.
• A connector is generally described as a mechanism that mediates
communication, coordination, or cooperation among components.
• Important styles of architecture for distributed systems:
– Layered architectures
– Object-based architectures
– Data-centered architectures
– Event-based architectures
Architectural Styles
• Components are organized in a layered fashion where a component at
layer L is allowed to call components at the underlying layer L-1, but not
the other way around
• Control generally flows from layer to layer
• Requests go down the hierarchy whereas the results flow upward

Figure 2-1. (a) The layered architectural style.
Architectural Styles (3)

• Each object corresponds to a component, and these components
are connected through a (remote) procedure call mechanism.
• This software architecture matches the client-server system
architecture.

Figure 2-1. (b) The object-based architectural style.

Data-centered architectures
• Processes communicate through a common (passive or
active) repository.
• For example, a wealth of networked applications have
been developed that rely on a shared distributed file system in
which virtually all communication takes place through files.
• Likewise, Web-based distributed systems are largely
data-centric: processes communicate through the use of shared
Web-based data services.

Architectural Styles (4)

• Processes communicate through the propagation of events, which
optionally also carry data.
• Processes publish events, after which the middleware ensures that only
those processes that subscribed to those events will receive them.
• Advantage: processes are loosely coupled. They need not explicitly
refer to each other.

Figure 2-2. (a) The event-based
architectural style
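A minimal in-process sketch of the publish/subscribe idea behind the event-based style. The `EventBus` class plays the role of the middleware. Its name and methods are hypothetical, and a real system would deliver events across machines rather than through direct function calls.

```python
from collections import defaultdict

class EventBus:
    """Toy stand-in for event-based middleware within a single process."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, data=None):
        # Only processes that subscribed to this event type are notified.
        for callback in list(self._subscribers.get(event_type, [])):
            callback(data)
```

A publisher calls `bus.publish("temperature", 21.5)` without naming any receiver, and subscribers register with `bus.subscribe("temperature", handler)` without naming any sender. That is the loose coupling described above.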
Architectural Styles (5)

• Combining event-based with data-centered architectures yields
shared data spaces.
• Processes are now also decoupled in time:
they need not both be active when
communication takes place.
• Many shared data spaces use a SQL-like
interface to the shared repository in the
sense that data can be accessed using a
description rather than an explicit reference,
as is the case with files.

Figure 2-2. (b) The shared data-space
architectural style.
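Access by description rather than by explicit reference can be sketched with a toy tuple store. The `DataSpace` class and its wildcard convention (`None` matches any field) are assumptions made for illustration, not the interface of any particular shared data space product.

```python
class DataSpace:
    """Toy shared data space: tuples are written once and found later by pattern."""

    def __init__(self):
        self._tuples = []

    def write(self, *fields):
        # The writer needs no knowledge of who will read the tuple, or when.
        self._tuples.append(tuple(fields))

    def read(self, *pattern):
        """Return the first stored tuple matching the pattern, or None.
        A None field in the pattern acts as a wildcard."""
        for candidate in self._tuples:
            if len(candidate) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, candidate)
            ):
                return candidate
        return None
```

A producer can call `space.write("temperature", "room-12", 21.5)` and terminate. A consumer that starts later can still call `space.read("temperature", "room-12", None)`, which illustrates the decoupling in time.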
System Architectures
• Many distributed systems are actually organized by considering
where software components are placed.
• Deciding on software components, their interaction,
and their placement leads to an instance of a software
architecture, also called a system architecture.
• Two types of architecture:
– Centralized architecture
– Decentralized architecture

Centralized Architectures (1)
• In the basic client-server model, processes in a
distributed system are divided into two (possibly
overlapping) groups.
– A server is a process implementing a specific service, for
example, a file system service or a database service.
– A client is a process that requests a service from a server by
sending it a request and subsequently waiting for the server's
reply.
• This client-server interaction is also known as request-reply
behavior.

Centralized Architectures (2)

Figure 2-3. General interaction between a client and a
server.

Centralized Architectures (3)
• Communication between a client and a server can be
implemented by means of a simple connectionless
protocol when the underlying network is fairly reliable,
as in many local-area networks.
• When a client requests a service, it simply packages a
message for the server, identifying the service it wants,
along with the necessary input data. The message is
then sent to the server.
• The server will always wait for an incoming request,
subsequently process it, and package the results in a
reply message that is then sent to the client.

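The request-reply exchange over a connectionless protocol can be sketched with UDP sockets on the local machine. The message format (`"<service> <input>"`), the `upper` service, and the function names are all hypothetical. The sketch also assumes that loopback UDP is available and that no datagram is lost.

```python
import socket
import threading

def serve_once(sock):
    """Wait for one request, process it, and send the reply back to the sender."""
    request, client_addr = sock.recvfrom(1024)
    service, _, payload = request.decode().partition(" ")
    if service == "upper":
        reply = payload.upper()
    else:
        reply = "error: unknown service"
    sock.sendto(reply.encode(), client_addr)

def request(server_addr, service, payload, timeout=2.0):
    """Package the service name and input data into one message and await the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)          # a lost message shows up as a timeout
        sock.sendto(f"{service} {payload}".encode(), server_addr)
        reply, _ = sock.recvfrom(1024)
        return reply.decode()

def demo():
    """Run one server thread and one client request over loopback UDP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2.0)
    server.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    try:
        return request(server.getsockname(), "upper", "hello")
    finally:
        worker.join()
        server.close()
```

No connection is set up or torn down: each side sends a single datagram. This is why the approach is efficient, and also why nothing in the protocol itself notices a lost message.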
Centralized Architectures (4)
• Advantage of a connectionless protocol:
– Efficient
– As long as messages do not get lost or corrupted, the
request/reply protocol works fine.
• Making the protocol resistant to occasional
transmission failures is not trivial.
– Solution: the client must resend the request when no reply
message comes in.
– Problem: the client cannot detect whether the original
request message was lost, or whether transmission of the reply
failed.

Centralized Architectures (5)
• If the reply was lost, then resending a request may result in
performing the operation twice.
• Examples:
– If the operation was
"transfer $10,000 from my bank account,"
then it would be better to report an error instead.
– If the operation was
"tell me how much money I have left,"
then it would be perfectly acceptable to resend the request.
• When an operation can be repeated multiple times without
harm, it is said to be idempotent.

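The difference between the two example operations can be simulated by modelling a lost reply as "the server executes the request, the client sees nothing and resends". The `Account` class is hypothetical. The request-identifier check in `withdraw_once` is one common remedy, added here as an assumption and not taken from the slides.

```python
class Account:
    """Toy server-side state for the banking example."""

    def __init__(self, balance):
        self.balance = balance
        self._replies = {}   # request id -> reply already computed (assumed remedy)

    def get_balance(self):
        # Idempotent: repeating the request does no harm.
        return self.balance

    def withdraw(self, amount):
        # Not idempotent: every execution changes the state again.
        self.balance -= amount
        return self.balance

    def withdraw_once(self, request_id, amount):
        # A resent request with a known id gets the stored reply
        # instead of being executed a second time.
        if request_id not in self._replies:
            self._replies[request_id] = self.withdraw(amount)
        return self._replies[request_id]

def call_with_lost_reply(operation, *args):
    """Model a lost reply: the operation runs, the client times out and resends."""
    operation(*args)           # first attempt: executed, but the reply is lost
    return operation(*args)    # retransmission: executed again by the server
```

Resending `get_balance` is harmless, resending `withdraw` takes the money twice, and `withdraw_once` shows how the server can make a repeated request safe by recognising it.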
Centralized Architectures (6)
• As an alternative, many client-server systems use a
reliable connection-oriented protocol.
• Although this solution is not appropriate in a local-area
network due to relatively low performance, it
works perfectly in wide-area systems in which
communication is inherently unreliable.

Centralized Architectures (7)
Example:
Virtually all Internet application protocols are based on
reliable TCP/IP connections.
In this case, whenever a client requests a service, it first sets
up a connection to the server before sending the request.
The server generally uses that same connection to send the
reply message, after which the connection is torn down.
Trouble: setting up and tearing down a connection is
relatively costly, especially when the request and reply
messages are small.

Application Layering (1)
• Considering that many client-server applications are targeted toward
supporting user access to databases, many people have advocated a
distinction between the following three levels:
– The user-interface level: contains all that is necessary to directly
interface with the user, such as display management.
– The processing level: typically contains the applications.
– The data level: manages the actual data that is being acted on.

Application Layering (2)
• Figure 2-4. The simplified organization of an Internet
search engine into three different layers.
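A toy search engine split into the three levels might look like the sketch below. The page collection, the scoring rule (count of matching keywords), and the function names are assumptions made for illustration. They do not reproduce the contents of Figure 2-4.

```python
# Data level: the stored pages (a hypothetical, hard-coded collection).
PAGES = {
    "Middleware basics": "middleware hides distribution in distributed systems",
    "Sensor networks": "sensor nodes report data",
    "Cluster computing": "a cluster is one kind of distributed computing system",
}

def data_level_lookup(keyword):
    """Data level: return the titles of pages whose text contains the keyword."""
    return [title for title, text in PAGES.items() if keyword in text.split()]

def processing_level_search(query):
    """Processing level: turn the query into lookups and rank the results."""
    scores = {}
    for keyword in query.lower().split():
        for title in data_level_lookup(keyword):
            scores[title] = scores.get(title, 0) + 1
    # Highest score first; ties are broken alphabetically by title.
    return sorted(scores, key=lambda title: (-scores[title], title))

def user_interface_level(query):
    """User-interface level: present the ranked list to the user as text."""
    results = processing_level_search(query)
    return "\n".join(f"{rank}. {title}" for rank, title in enumerate(results, 1))
```

Each level talks only to the one below it, so the storage or the ranking rule could be replaced without touching the presentation code.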
