CS6601 UNIT I

The document provides an introduction to distributed systems, defining them as collections of independent computers that communicate through message passing. It discusses key characteristics such as concurrency, the absence of a global clock, and independent failures, along with examples like the Internet, intranets, and mobile computing. Additionally, it covers resource sharing, the structure of the World Wide Web, and the technologies that underpin it, including HTML, URLs, and HTTP.

Uploaded by

balasankar

CS6601 DS – UNIT I

UNIT I INTRODUCTION
Examples of Distributed Systems–Trends in Distributed Systems – Focus on resource sharing –
Challenges. Case study: World Wide Web
1.1 Introduction
Definition:
A distributed system is a collection of independent computers that appears to its users as a single
coherent system.
A distributed system can also be defined as one in which the hardware and software components
located at networked computers communicate and coordinate their actions only by passing messages.
Our definition of distributed systems has the following important consequences:
Concurrency:
• A distributed system supports concurrent program execution in a network of computers.
• Example: users share resources such as web pages or files when necessary.
• The capacity of the system to handle shared resources can be increased by adding more
resources to the network.
No global clock:
• Programs need to cooperate and coordinate their actions by exchanging messages.
• Close coordination often depends on a shared idea of the time at which the programs'
actions occur.
• But there is a limit to the accuracy with which the computers in a network can
synchronize their clocks: there is no single global notion of the correct time.
• This is a direct consequence of the fact that the only communication is by sending
messages through a network.
Independent failures:
• Faults in the network result in the isolation of the computers that are connected to it, but
that doesn't mean that they stop running.
• In fact, the programs on them may not be able to detect whether the network has failed or
has become unusually slow.
• Each component of the system can fail independently, leaving the others still running.
The prime motivation for constructing and using distributed systems stems from a desire
to share resources.
1.2 Examples of Distributed Systems

csenotescorner.blogspot.com Page 1

Three examples:
• The Internet
• An intranet
• Mobile and ubiquitous computing

1.2.1 The Internet


The Internet is a vast interconnected collection of computer networks of many different types.
Programs running on the computers connected to it interact by passing messages, employing a
common means of communication.

Figure 1.1 illustrates a typical portion of the Internet.

The above figure shows a collection of intranets – sub networks operated by companies and
other organizations and typically protected by firewalls. Internet Service Providers (ISPs) are
companies that provide modem links and other types of connection to individual users and small
organizations, enabling them to access services anywhere in the Internet as well as providing
local services such as email and web hosting. The intranets are linked together by backbones.

A backbone is a network link with a high transmission capacity, employing satellite
connections, fiber optic cables and other high-bandwidth circuits. Multimedia services are
available in the Internet, enabling users to access audio and video data.

1.2.2 An intranet
An intranet is a portion of the internet that is separately administered and has a boundary that can
be configured to enforce local security policies.
The following figure shows a typical intranet. It is composed of several local area networks linked
by backbone connections. The network configuration of a particular intranet is the responsibility
of the organization that administers it and may vary widely – ranging from a LAN on a single
site to a connected set of LANs belonging to branches of a company or other organization in
different countries.
An intranet is connected to the Internet via a router, which allows the users inside the intranet to
make use of services elsewhere such as web or email. It also allows the users in other intranets to
access the services it provides. Many organizations need to protect their own services from
unauthorized use by possibly malicious users elsewhere.
The role of a firewall is to protect an intranet by preventing unauthorized messages from leaving or
entering. A firewall is implemented by filtering incoming and outgoing messages, for example
according to their source or destination.
[Figure: a typical intranet. Labels: desktop computers, email server, print and other servers,
web server, file server, local area network, router/firewall, the rest of the Internet.]
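
The filtering rule described above (decide by source or destination) can be sketched in a few lines. This is only an illustration: the Message fields, the INTRANET address range and the set of open ports are assumptions made for the example and do not describe any real firewall product.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical address range of the protected intranet (an assumption).
INTRANET = ipaddress.ip_network("10.0.0.0/8")
# Hypothetical services that outsiders may reach: email (25) and web (80).
OPEN_PORTS = {25, 80}

@dataclass
class Message:
    source: str       # sender IP address
    destination: str  # receiver IP address
    port: int         # destination port

def allow(msg: Message) -> bool:
    """Decide whether a message may cross the intranet boundary."""
    src_inside = ipaddress.ip_address(msg.source) in INTRANET
    dst_inside = ipaddress.ip_address(msg.destination) in INTRANET
    if src_inside and not dst_inside:
        return True                      # outgoing traffic is permitted here
    if dst_inside and not src_inside:
        return msg.port in OPEN_PORTS    # incoming: published services only
    return src_inside and dst_inside     # purely internal traffic passes

print(allow(Message("203.0.113.5", "10.0.0.7", 80)))  # incoming web request
print(allow(Message("203.0.113.5", "10.0.0.7", 23)))  # incoming telnet attempt
```

A stricter local policy could also restrict outgoing messages; the sketch permits them all only to keep the rule set short.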

1.2.3 Mobile and ubiquitous computing

Technological advances in device miniaturization and wireless networking have led increasingly
to the integration of small and portable computing devices into distributed systems. These
devices include:
• Laptop computers.
• Handheld devices, including mobile phones, pagers, personal digital assistants (PDAs),
video cameras and digital cameras.
• Wearable devices, such as smart watches with functionality similar to a PDA.
• Devices embedded in appliances such as washing machines, hi-fi systems, cars and
refrigerators.

The portability of many of these devices, together with their ability to connect conveniently to
networks in different places, makes mobile computing possible. Mobile computing is the
performance of computing tasks while the user is on the move, or visiting places other than their
usual environment.

In mobile computing, users who are away from their 'home' intranet are still provided with
access to resources via the devices they carry with them. They can continue to access the
Internet; they can continue to access resources in their home intranet; and there is increasing
provision for users to utilize resources such as printers that are conveniently nearby as they move
around. The latter is also known as location-aware or context-aware computing.

Ubiquitous computing is the harnessing of many small, cheap computational devices that are present
in users' physical environments, including the home, office and even natural settings. The term
'ubiquitous' is intended to suggest that small computing devices will eventually become so
pervasive in everyday objects that they are scarcely noticed. The presence of computers
everywhere only becomes useful when they can communicate with one another.

[Figure: mobile computing scenario. Labels: Internet, host intranet, WAP gateway, wireless LAN,
home intranet, mobile phone, printer, laptop, camera, host site.]

Ubiquitous and mobile computing overlap, since the mobile user can in principle benefit from
computers that are everywhere. But they are distinct, in general. Ubiquitous computing could
benefit users while they remain in a single environment such as the home or a hospital. The
figure above shows a user who is visiting a host organization. It shows the user's home intranet and
the host intranet at the site that the user is visiting. Both intranets are connected to the rest of the
Internet.

The user has access to three forms of wireless connection. Their laptop has a means of
connecting to the host's wireless LAN. This network provides coverage of a few hundred meters.
It connects to the rest of the host intranet via a gateway or access point. The user also has a
mobile telephone, which is connected to the Internet. The phone gives access to pages of simple
information, which it presents on its small display. Finally, the user carries a digital camera,
which can communicate over a personal area wireless network with a device such as a printer.

1.3 Resource sharing and the web


We share hardware resources such as printers, data resources such as files, and resources with
more specific functionality such as search engines.
We share equipment such as printers and disks to reduce costs. But of far greater significance to
users is the sharing of the higher-level resources that play a part in their applications and in their
everyday work and social activities.
In practice, patterns of resource sharing vary widely in their scope and in how closely users work
together. At one extreme, a search engine on the Web provides a facility to users throughout the
world, users who need never come into contact with one another directly. At the other extreme,
in computer-supported cooperative working (CSCW), a group of users who cooperate directly
share resources such as documents in a small, closed group.

We use the term service for a distinct part of a computer system that manages a collection of
related resources and presents their functionality to users and applications. For example, we
access shared files through a file service; we send documents to printers through a printing
service; we buy goods through an electronic payment service.
The term server refers to a running program (a process) on a networked computer that accepts requests
from programs running on other computers to perform a service and responds appropriately. The
requesting processes are referred to as clients, and the overall approach is known as client-server
computing. In this approach, requests are sent in messages from clients to a server and replies
are sent in messages from the server to the clients. When the client sends a request for an
operation to be carried out, we say that the client invokes an operation upon the server. A
complete interaction between a client and a server, from the point when the client sends its
request to when it receives the server's response, is called a remote invocation.

The same process may be both a client and a server, since servers sometimes invoke operations
on other servers. Clients are active (making requests) and servers are passive (only waking up
when they receive requests); servers run continuously, whereas clients last only as long as the
applications of which they form a part. An executing web browser is an example of a client. The
web browser communicates with a web server, to request web pages from it.
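
A remote invocation can be illustrated with a pair of sockets on one machine. The sketch below is a minimal example under stated assumptions: the operation (upper-casing the request text) and the function names serve_once and invoke are invented for illustration, and both processes are simulated by threads on the loopback address.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Server side: accept one client, perform the operation, reply."""
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(1024).decode()      # request message
        conn.sendall(request.upper().encode())  # reply message

def invoke(port: int, text: str) -> str:
    """Client side: one remote invocation (send request, await reply)."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text.encode())
        return sock.recv(1024).decode()

# Bind to port 0 so that the operating system picks a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = invoke(port, "hello")
server.join()
listener.close()
print(reply)
```

The server here handles a single request and exits; a real server would loop on accept and run continuously, as the text describes.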

The World Wide Web


The World Wide Web is an evolving system for publishing and accessing resources and
services across the Internet. The Web began life at the European Centre for Nuclear Research
(CERN), Switzerland, in 1989. A key feature of the Web is that it provides a hypertext structure
among the documents that it stores. This means that documents contain links, that is, references
to other documents and resources that are also stored in the Web.
The Web is an open system: it can be extended and implemented in new ways without disturbing
its existing functionality. First, its operation is based on communication standards and document
or content standards that are published and widely implemented.

Second, the Web is open with respect to the types of resource that can be published and shared
on it. At its simplest, a resource on the Web is a web page or some other type of content that can
be presented to the user, such as media files and documents in Portable Document Format.
The Web is based on three main standard technological components:

• The Hyper Text Markup Language (HTML), a language for specifying the contents and
layout of pages as they are displayed by web browsers;
• Uniform Resource Locators (URLs), also known as Uniform Resource Identifiers (URIs),
which identify documents and other resources stored as part of the Web;
• A client-server system architecture, with standard rules for interaction (the Hyper Text
Transfer Protocol, HTTP) by which browsers and other clients fetch documents and
other resources from web servers.

HTML
The Hyper Text Markup Language is used to specify the text and images that make up the
contents of a web page, and to specify how they are arranged and formatted for presentation to
the user. A web page contains such structured items as headings, paragraphs, tables and images.
HTML is also used to specify links and which resources are associated with them.
A typical piece of HTML text follows:
<IMG SRC = "http://www.cdk5.net/WebExample/Images/earth.jpg"> 1
<P> 2
Welcome to Earth! Visitors may also be interested in taking a look at the 3
<A HREF = "http://www.cdk5.net/WebExample/moon.html">Moon</A>. 4
</P> 5

The HTML directives, known as tags, are enclosed by angle brackets, such as <P>.
Line 1 of the example identifies a file containing an image for presentation. Its URL is
http://www.cdk5.net/WebExample/Images/earth.jpg. Lines 2 and 5 are directives to begin and
end a paragraph, respectively. Lines 3 and 4 contain text to be displayed on the web page in the
standard paragraph format.
Line 4 specifies a link in the web page. It contains the word 'Moon' surrounded by two related
HTML tags, <A HREF...> and </A>. The text between these tags is what appears in the link as it
is presented on the web page. Most browsers are configured to show the text of links underlined
by default, so what the user will see in that paragraph is:
Welcome to Earth! Visitors may also be interested in taking a look at the Moon.
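
To see how a program might pick out the image and the link from the example page, the following sketch uses Python's standard html.parser module. The class name LinkAndImageFinder is hypothetical, and the page text is the example above without its line numbers; this is an illustration of reading tags, not a description of how any particular browser works.

```python
from html.parser import HTMLParser

class LinkAndImageFinder(HTMLParser):
    """Collect the URLs named by IMG and A tags (names are lower-cased)."""

    def __init__(self) -> None:
        super().__init__()
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])
        elif tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])

page = '''<IMG SRC = "http://www.cdk5.net/WebExample/Images/earth.jpg">
<P>
Welcome to Earth! Visitors may also be interested in taking a look at the
<A HREF = "http://www.cdk5.net/WebExample/moon.html">Moon</A>.
</P>'''

finder = LinkAndImageFinder()
finder.feed(page)
print(finder.images)
print(finder.links)
```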

URLs
The purpose of a Uniform Resource Locator is to identify a resource. Every URL, in its full,
absolute form, has two top-level components:
scheme : scheme-specific-identifier
The first component, the 'scheme', declares which type of URL this is. URLs are required to
identify a variety of resources. For example, mailto:[email protected] identifies a user's email
address; ftp://ftp.downloadIt.com/software/aProg.exe identifies a file that is to be retrieved using
the File Transfer Protocol (FTP).
The Web is open with respect to the types of resources it can be used to access, by virtue of the
scheme designators in URLs. If somebody invents a useful new type of 'widget' resource,
perhaps with its own addressing scheme for locating widgets and its own protocol for accessing
them, then the world can start using URLs of the form widget:....
HTTP URLs are the most widely used, for accessing resources using the standard HTTP
protocol. An HTTP URL has two main jobs: to identify which web server maintains the
resource, and to identify which of the resources at that server is required.

Fig: Web Servers and Web browsers


The above figure shows three browsers issuing requests for resources managed by three web
servers. The topmost browser is issuing a query to a search engine. The middle browser requires
the default page of another web site. The bottommost browser requires a web page that is
specified in full, including a path name relative to the server. The files for a given web server are
maintained in one or more subtrees (directories) of the server's file system, and each resource is
identified by a path name relative to the server.

In general, HTTP URLs are of the following form:


http://servername[:port][/pathName][?query][#fragment]

where items in square brackets are optional. A full HTTP URL always begins with the string
'http://' followed by a server name, expressed as a Domain Name System (DNS) name. The
server's DNS name is optionally followed by the number of the 'port' on which the server listens
for requests, which is 80 by default. Then comes an optional path name of the server's resource.
If this is absent then the server's default web page is required. Finally, the URL optionally ends
in a query component (for example, when a user submits the entries in a form such as a search
engine's query page) and/or a fragment identifier, which identifies a component of the resource.

Consider the URLs:


http://www.cdk5.net
http://www.w3.org/standards/faq.html#conformance
http://www.google.com/search?q=obama
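
The components of these three URLs can be pulled apart with urlsplit from Python's standard urllib.parse module. The helper name describe and the choice to fill in port 80 and the path "/" when they are absent are assumptions made for this sketch, mirroring the defaults described above.

```python
from urllib.parse import urlsplit

def describe(url: str) -> dict:
    """Split an HTTP URL into the components of the general form."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "server": parts.hostname,
        "port": parts.port or 80,    # 80 is the default port for http
        "path": parts.path or "/",   # an empty path means the default page
        "query": parts.query,
        "fragment": parts.fragment,
    }

for url in ("http://www.cdk5.net",
            "http://www.w3.org/standards/faq.html#conformance",
            "http://www.google.com/search?q=obama"):
    print(describe(url))
```

The first URL has only a server name, the second adds a path and a fragment, and the third adds a path and a query.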

HTTP
The Hyper Text Transfer Protocol defines the ways in which browsers and other types of client
interact with web servers.
Request-reply interactions: HTTP is a 'request-reply' protocol. The client sends a request
message to the server containing the URL of the required resource. The server looks up the path
name and, if it exists, sends back the resource's content in a reply message to the client.
Otherwise, it sends back an error response such as the familiar '404 Not Found'.
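
The request-reply exchange can be tried out locally with Python's standard http.server and http.client modules. This is a sketch under assumptions: the PAGES table, the Handler class and the fetch helper are hypothetical names, and the server runs in a background thread on the loopback address rather than on a separate machine.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical set of resources held by the server, keyed by path name.
PAGES = {"/index.html": b"<P>Welcome to Earth!</P>"}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)          # path name does not exist
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):         # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch(path: str):
    """Send one request and return (status, reason, body) from the reply."""
    conn = HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", path)
    reply = conn.getresponse()
    result = (reply.status, reply.reason, reply.read())
    conn.close()
    return result

found = fetch("/index.html")
missing = fetch("/moon.html")
server.shutdown()
server.server_close()
print(found[:2], missing[:2])
```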
Content types: Browsers are not necessarily capable of handling every type of content. When a
browser makes a request, it includes a list of the types of content it prefers; for example, in
principle it may be able to display images in 'GIF' format but not 'JPEG' format. The server
may be able to take this into account when it returns content to the browser. The server includes
the content type in the reply message so that the browser will know how to process it. The strings
that denote the type of content are called MIME types, and they are standardized in RFC 1521
(since superseded by RFCs 2045 to 2049).
For example, if the content is of type 'text/html' then a browser will understand the text as
HTML and display it; if the content type is 'application/zip' then it is data compressed in 'zip'
format, and the browser will start an external helper application to decompress it.
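
As a small illustration of how a server might choose the content type to put in its reply, the sketch below maps file names to MIME type strings with Python's standard mimetypes module. The function name content_type and the fallback to 'application/octet-stream' for unknown names are assumptions of this example; the exact mapping table can vary slightly between systems.

```python
import mimetypes

def content_type(path: str) -> str:
    """Guess the MIME type for a resource from its file name."""
    guessed, _encoding = mimetypes.guess_type(path)
    # Fall back to a generic binary type when the name gives no hint.
    return guessed or "application/octet-stream"

for name in ("moon.html", "earth.jpg", "notes.zip", "README"):
    print(name, "->", content_type(name))
```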
One resource per request: Clients specify one resource per HTTP request. If a web page
contains nine images, say, then the browser will issue a total of ten separate requests to obtain
the entire contents of the page.
Simple access control: By default, any user with network connectivity to a web server can
access any of its published resources. If users wish to restrict access to a resource, then they can
configure the server to issue a „challenge‟ to any client that requests it. The corresponding user
then has to prove that they have the right to access the resource, for example, by typing in a
password.
Dynamic pages: Much of the users' experience of the Web is that of interacting with services
rather than retrieving data. For example, when purchasing an item at an online store, the user
often fills out a web form to provide personal details or to specify exactly what they wish to
purchase. A web form is a web page containing instructions for the user and input widgets such as
text fields and check boxes. When the user submits the form, the browser sends an HTTP request
to a web server, containing the values that the user has entered.
Since the result of the request depends upon the user's input, the server has to process the user's
input. Therefore the URL or its initial component designates a program on the server, not a file.
If the user's input is reasonably short then it is usually sent as the query component of the
URL, following a "?" character. For example, a request containing the following URL invokes a
program called 'search' at www.google.com and specifies a query string of 'kindberg':
http://www.google.com/search?q=kindberg

That 'search' program produces HTML text as its output, and the user will see a listing of pages
that contain the word 'kindberg'.
A program that web servers run to generate content for their clients is referred to as a Common
Gateway Interface (CGI) program.
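
The way a server-side program receives the user's input through the query component can be sketched as follows. The function search_program is a hypothetical stand-in that only echoes the search term; it does not reproduce any real search engine or the CGI calling convention itself.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

def search_program(url: str) -> str:
    """Hypothetical stand-in for a server-side 'search' program."""
    query = parse_qs(urlsplit(url).query)   # e.g. {'q': ['kindberg']}
    term = query.get("q", [""])[0]
    # A real program would consult an index; this one only echoes the term.
    # The term is escaped so that user input cannot inject HTML tags.
    return f"<P>Results for <B>{escape(term)}</B></P>"

print(search_program("http://www.google.com/search?q=kindberg"))
```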

1.4 Challenges
The key challenges faced by the designers of distributed systems are heterogeneity, openness,
security, scalability, failure handling, concurrency and the need for transparency.

1.4.1 Heterogeneity
A distributed system must enable users to access services and run applications over a
heterogeneous collection of networks, hardware, operating systems and programming languages.
The Internet communication protocols mask the differences between networks, and middleware can
deal with the other differences.

Different programming languages use different representations for characters and data structures
such as arrays and records. Programs written by different developers cannot communicate with
one another unless they use common standards.
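
A concrete case of differing representations is the byte order of integers and the encoding of characters. The sketch below, using Python's standard struct module, shows the same value laid out in two ways and how agreeing on one convention (network byte order) avoids misreading it. The specific value and the sample string are arbitrary choices for the example.

```python
import struct

value = 1025  # 0x00000401

little = struct.pack("<I", value)  # little-endian layout
big = struct.pack(">I", value)     # big-endian layout
print(little.hex(), big.hex())

# Reading big-endian bytes as if they were little-endian gives another number.
misread = struct.unpack("<I", big)[0]
print(misread)

# Common standard: both sides agree on network byte order ("!").
wire = struct.pack("!I", value)
received = struct.unpack("!I", wire)[0]
print(received == value)

# Characters need the same agreement: one string, two byte representations.
text = "café"
print(text.encode("utf-8").hex(), text.encode("latin-1").hex())
```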

The term Middleware applies to a software layer that provides a programming abstraction as well
as masking the heterogeneity of the underlying networks, hardware, operating systems and
programming languages.
Example: Common Object Request Broker Architecture (CORBA).

The term mobile code is used to refer to program code that can be transferred from one computer
to another and run at the destination – Java applets are an example. Code suitable for running on
one computer is not necessarily suitable for running on another because executable programs are
normally specific both to the instruction set and to the host operating system. The virtual
machine approach provides a way of making code executable on a variety of host computers: the
compiler for a particular language generates code for a virtual machine instead of a particular
hardware order code.

1.4.2 Openness
The openness of a computer system is the characteristic that determines whether the system can
be extended and re-implemented in various ways.

The openness of distributed systems is determined primarily by the degree to which new
resource sharing services can be added and be made available for use by a variety of client
programs.

The first step to provide openness is to publish the interfaces of the components, but the
integration of components written by different programmers is a real challenge.

• Open systems are characterized by the fact that their key interfaces are published.
• Open distributed systems are based on the provision of a uniform communication mechanism and
published interfaces for access to shared resources.
• Open systems can be constructed from heterogeneous hardware and software.

1.4.3 Security
Security is an important issue in distributed systems. Security for information resources has three
components:

(i) Confidentiality - Protection against disclosure to unauthorized individuals.
(ii) Integrity - Protection against alteration or corruption.
(iii) Availability - Protection against interference with the means to access the resources.
The first challenge is to send sensitive information in a message over a network in a secure manner.
But security is not just a matter of concealing the contents of messages; it also involves knowing
for sure the identity of the user on whose behalf a message was sent.
The second challenge is therefore to identify a remote user.
Both of these challenges can be met by the use of encryption techniques. Encryption can be used
to provide adequate protection of shared resources and to keep sensitive information secret when
it is transmitted in messages over a network. Denial-of-service attacks, which threaten
availability, are still a problem.
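
Python's standard library has no general-purpose encryption, so the sketch below illustrates a related cryptographic technique instead: a keyed message authentication code (HMAC), which protects integrity and shows that the sender knows a shared secret. It does not conceal the message contents. The key, the message text and the function names sign and verify are assumptions made for the example.

```python
import hashlib
import hmac

SECRET = b"shared-secret-key"  # hypothetical key known to both parties

def sign(message: bytes) -> bytes:
    """Compute a tag that only a holder of SECRET can produce."""
    return hmac.new(SECRET, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Check that the message is unaltered and came from a key holder."""
    return hmac.compare_digest(sign(message), tag)

msg = b"transfer 100 to account 42"
tag = sign(msg)
print(verify(msg, tag))                             # untouched message
print(verify(b"transfer 900 to account 42", tag))   # altered in transit
```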

1.4.4 Scalability
A distributed system is scalable if the cost of adding a user is a constant amount in terms of the
resources that must be added.
In other words, a system is described as scalable if it remains effective when there is a
significant increase in the number of resources and the number of users.

The design of scalable distributed systems presents the following challenges:

Controlling the cost of physical resources: As the demand for a resource grows, it should be
possible to extend the system, at reasonable cost, to meet it.

Controlling the performance loss: Consider the management of a set of data whose size is
proportional to the number of users or resources in the system. Algorithms that use hierarchic
structures scale better than those that use linear structures. But even with hierarchic structures an
increase in size will result in some loss in performance: the time taken to access hierarchically
structured data is O(log n), where n is the size of the set of data.
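
The difference between linear and logarithmic access cost can be made visible by counting steps. In the sketch below, binary search over a sorted sequence stands in for descending a hierarchic structure; that substitution, the function names and the chosen sizes are assumptions of the example.

```python
def linear_steps(data, key) -> int:
    """Count the items inspected by a front-to-back scan."""
    for steps, item in enumerate(data, start=1):
        if item == key:
            return steps
    return len(data)

def binary_steps(data, key) -> int:
    """Count the halving steps of a binary search over sorted data."""
    lo, hi, steps = 0, len(data), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if data[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return steps

for n in (1_000, 1_000_000):
    data = range(n)  # sorted values 0 .. n-1
    print(n, linear_steps(data, n - 1), binary_steps(data, n - 1))
```

Growing the data set a thousandfold multiplies the linear cost by a thousand but adds only about ten steps to the logarithmic one.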
Preventing software resources running out:
For example, the supply of available Internet addresses is running out. There is no correct
solution to this problem, because it is difficult to predict the demand that will be put on a
system years ahead.

Avoiding performance bottlenecks:

Algorithms should be decentralized to avoid performance bottlenecks. In a large
distributed system, an enormous number of messages have to be routed over many lines.

In theory, the optimal way to do this is to collect complete information about the load on all
machines and lines, and then run a graph-theory algorithm to compute all the optimal routes. The
trouble is that collecting and transporting all of this information to one place would itself be
a bad idea, because these messages would overload part of the network.

1.4.5 Failure handling


Failures fall into two obvious categories: hardware and software. If a failure occurs, programs may
produce incorrect results or they may stop before they have completed the intended
computation. Failures in a distributed system are partial, i.e., some components fail while others
continue to function. Therefore, the handling of failures is particularly difficult. The techniques
for dealing with failures are listed below.

Detecting failures - Some failures can be detected. E.g. checksums can be used to detect
corrupted data in a message or a file.

Masking failures - Some failures that have been detected can be hidden or made less severe.
E.g. messages can be retransmitted when they fail to arrive.

Tolerating failures - It is not possible to detect and hide all the failures that might occur in a
large network. Clients can be designed to tolerate failures, which generally involves the users
tolerating them as well. Services can be made to tolerate failures by the use of redundant
components.

Recovery from failures - Recovery involves the design of software so that the state of
permanent data can be recovered or rolled back after a server has crashed.
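
Detection by checksum and masking by retransmission can be combined in a short sketch. The framing format (a 4-byte CRC-32 followed by the payload), the simulated unreliable_channel and the retry limit are all assumptions made for illustration, not a real protocol.

```python
import random
import zlib

def send(payload: bytes) -> bytes:
    """Frame a message as a 4-byte CRC-32 checksum followed by the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def receive(frame: bytes):
    """Return the payload, or None if the checksum reveals corruption."""
    checksum, payload = frame[:4], frame[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        return None                        # failure detected
    return payload

def unreliable_channel(frame: bytes, rng: random.Random) -> bytes:
    """Hypothetical lossy link: corrupts one byte about half of the time."""
    if rng.random() < 0.5:
        damaged = bytearray(frame)
        damaged[-1] ^= 0xFF
        return bytes(damaged)
    return frame

def reliable_send(payload: bytes, rng: random.Random, max_tries: int = 20):
    """Mask failures by retransmitting until a clean copy arrives."""
    for attempt in range(1, max_tries + 1):
        result = receive(unreliable_channel(send(payload), rng))
        if result is not None:
            return result, attempt
    raise TimeoutError("gave up after repeated corruption")

data, tries = reliable_send(b"hello", random.Random(1))
print(data, tries)
```

The caller of reliable_send never sees the corrupted copies, which is what masking means here; the retry limit marks the point where the failure can no longer be hidden and must be tolerated or reported.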

1.4.6 Concurrency
Both services and applications provide resources that can be shared by clients in a distributed
system.

There is therefore a possibility that several clients will attempt to access a shared resource at the
same time. The process that manages a shared resource could take one client request at a time.
But that approach limits throughput. Therefore, services and applications generally allow
multiple client requests to be processed concurrently.

For an object to be safe in a concurrent environment, its operations must be synchronized in such a
way that its data remains consistent. This can be achieved by standard techniques such as
semaphores.
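
A minimal sketch of synchronizing a shared object with a semaphore follows. The Counter class and the worker function are hypothetical; the binary semaphore from Python's standard threading module ensures that the read and the write in increment are never interleaved between threads.

```python
import threading

class Counter:
    """Shared object whose update is guarded by a binary semaphore."""

    def __init__(self) -> None:
        self.value = 0
        self._mutex = threading.Semaphore(1)

    def increment(self) -> None:
        with self._mutex:              # one thread at a time past this point
            current = self.value       # read
            self.value = current + 1   # write

def worker(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

Without the semaphore, two threads could both read the same old value and one update would be lost, leaving the data inconsistent.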

1.4.7 Transparency
It is defined as the concealment from the user and the application programmer of the separation
of components in a distributed system. Hence, the system is perceived as a whole rather than as a
collection of independent components.

The main aim of transparency is to make certain aspects of distribution invisible to the
application programmer.

Access transparency enables local and remote resources to be accessed using identical
operations.
Location transparency enables resources to be accessed without knowledge of their location.
Concurrency transparency enables several processes to operate concurrently using shared
resources without interference between them.
Replication transparency enables multiple instances of resources to be used to increase
reliability and performance without knowledge of the replicas by users or application
programmers.
Failure transparency enables the concealment of faults, allowing users and application
programs to complete their tasks despite the failure of hardware or software
components.
Mobility transparency allows the movement of resources and clients within a system without
affecting the operation of users or programs.
Performance transparency allows the system to be reconfigured to improve performance as
loads vary.
Scaling transparency allows the system and applications to expand in scale without change to
the system structure or the application algorithms.

The two most important transparencies are access and location transparency;
their presence or absence most strongly affects the utilization of distributed resources. They are
sometimes referred to together as network transparency.
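
Access and location transparency can be illustrated with a toy example in which local and remote resources offer the same read operation and are found by name alone. Everything here is hypothetical: the classes LocalFile and RemoteFile, the names table acting as a name service, and the dictionary that simulates a remote server in place of real message passing.

```python
from typing import Dict, Protocol

class Resource(Protocol):
    def read(self) -> str: ...

class LocalFile:
    def __init__(self, text: str) -> None:
        self._text = text

    def read(self) -> str:
        return self._text

class RemoteFile:
    """Stand-in for a resource held on another computer (simulated here)."""

    def __init__(self, server: Dict[str, str], key: str) -> None:
        self._server, self._key = server, key

    def read(self) -> str:
        # A real implementation would exchange request and reply messages.
        return self._server[self._key]

# Hypothetical name service: maps a name to a resource, wherever it lives.
remote_store = {"report": "quarterly figures"}
names: Dict[str, Resource] = {
    "notes.txt": LocalFile("shopping list"),
    "report.txt": RemoteFile(remote_store, "report"),
}

def open_by_name(name: str) -> Resource:
    """Location transparency: the caller supplies a name, not a location."""
    return names[name]

# Access transparency: the same read() call works for both resources.
for name in ("notes.txt", "report.txt"):
    print(name, "->", open_by_name(name).read())
```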
