DS-UNIT1 NOTES
UNIT-1
Distributed Systems
A distributed system consists of a collection of autonomous computers, connected
through a network and distribution middleware, which enables computers to coordinate their
activities and to share the resources of the system, so that users perceive the system as a single,
integrated computing facility.
Common Characteristics
Resource Sharing
Openness
Concurrency
Scalability
Fault Tolerance
Transparency
Resource Sharing
Ability to use any hardware, software or data anywhere in the system.
Openness
The system can be extended and re-implemented in various ways; its key interfaces are published.
Concurrency
Several processes operate at the same time and may access the same shared resources.
Scalability
The system remains effective when there is a significant increase in the number of users and resources.
Fault Tolerance
The system continues to function in the presence of failures, achieved through:
• recovery
• redundancy
Transparency
Concealment of the separation of components from users and application programs, so that the system is perceived as a whole rather than as a collection of independent components.
Location Transparency
Enables information objects to be accessed without knowledge of their location.
Concurrency Transparency
Enables several processes to operate concurrently using shared information objects without interference between them.
• Example: NFS
• Example: Automatic teller machine network
• Example: Database management system
Replication Transparency
Enables multiple instances of information objects to be used to increase reliability and performance without knowledge of the replicas by users or application programs.
Failure Transparency
Enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
Migration Transparency
Allows the movement of information objects within a system without affecting the operations of users or application programs.
• Example: NFS
• Example: Web Pages
Performance Transparency
Allows the system to be reconfigured to improve performance as loads vary.
Scaling Transparency
Allows the system and applications to expand in scale without change to the
system structure or the application algorithms.
• Data transfer rate is the speed at which data can be transferred between two
computers in the network once transmission has begun (Bits per Sec)
• Message transmission time = latency + length / data transfer rate, where latency is the
delay before the first bit arrives and length is the size of the message in bits.
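As a worked example of the formula above (a minimal sketch; the link parameters are illustrative assumptions, not values from the notes):

import time  # not needed for the arithmetic; shown only to keep the sketch self-contained

# message transmission time = latency + length / data transfer rate
latency_s = 0.005               # 5 ms network latency (assumed)
data_rate_bps = 100_000_000     # 100 Mbit/s link (assumed)
message_bits = 8_000_000        # 1 MB message (10^6 bytes) = 8,000,000 bits

transmission_time_s = latency_s + message_bits / data_rate_bps
print(f"Message transmission time: {transmission_time_s * 1000:.1f} ms")  # 85.0 ms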
Local Area Networks (LANs)
3. No routing of messages is required within a segment, since the medium provides direct
connections between all of the computers connected to it.
4. In local area networks, the total system bandwidth is high and latency is low, except
when message traffic is very high.
Metropolitan Area Networks (MANs)
2. A variety of technologies have been used to implement the routing of data in MANs,
ranging from Ethernet to ATM.
3. The DSL (Digital Subscriber Line) and cable modem connections now available in
many countries are an example.
4. DSL typically uses ATM switches located in telephone exchanges to route digital data
onto twisted pairs of copper wire.
Wireless Local Area Networks (WLANs)
2. WLANs connect computers within homes and office buildings to each other and the Internet.
They are in widespread use in several variants of the IEEE 802.11 standard (WiFi).
Network Principles
3. The timely delivery of audio and video streams depends upon the availability of
connections with adequate quality of service; bandwidth, latency and reliability must
all be considered.
4. Ideally, adequate quality of service should be guaranteed. ATM networks are designed
to provide high bandwidth and low latencies and to support QoS by the reservation of
network resources.
Circuit switching
1. At one time telephone networks were the only telecommunication networks.
2. When a caller dialled a number, the pair of wires from her phone to the local exchange
was connected by an automatic switch at the exchange to the pair of wires connected to
the other party’s phone.
3. For a long-distance call the process was similar but the connection would be switched
through a number of intervening exchanges to its destination.
4. This system is sometimes referred to as the plain old telephone system, or POTS. It is
a typical circuit-switching network.
Broadcast
1. Broadcasting is a transmission technique that involves no switching: a message is
transmitted to all nodes on the network, and each node is responsible for picking out the
messages addressed to it. Ethernet and most wireless networks are based on broadcasting.
Protocols
1. The term protocol is used to refer to a well-known set of rules and formats to be used
for communication between processes in order to perform a given task.
2. The definition of a protocol has two important parts to it:
• a specification of the sequence of messages that must be exchanged;
• a specification of the format of the data in the messages.
Protocols commonly used in the Internet include:
8. TELNET(Terminal Network)
9. POP3(Post Office Protocol 3)
10. IPv4
11. IPv6
12. ICMP
13. UDP
14. IMAP
15. SSH
16. Gopher
Note: Hypertext refers to the special format of the text that can contain links to other texts.
8. TELNET(Terminal Network)
TELNET is a standard TCP/IP protocol that provides a virtual terminal service. It enables
one machine to connect to and work on another: the computer being connected to is called
the remote computer, and the one initiating the connection is called the local computer. A
TELNET session displays on the local computer whatever is being performed on the remote
computer. TELNET operates on the client/server principle: the local computer runs the
telnet client program, while the remote computer runs the telnet server program.
10. IPv4
IPv4 (Internet Protocol version 4) is the fourth version of the Internet Protocol and was the
first to be widely deployed. It remains the most widely used version of the Internet Protocol
and is in charge of distributing data packets throughout the network. IPv4 uses 32-bit
addresses, which allow a maximum of 4,294,967,296 (2^32) unique addresses. The network
address and the host address are the two components of each address. The host address
identifies a particular device within the network, whereas the network address identifies the
network to which the host belongs. In the “dotted decimal” notation, which is the standard for
IPv4 addresses, each octet (8 bits) of the address is represented by its decimal value and
separated by a dot (e.g. 192.168.1.1).
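The dotted-decimal notation can be illustrated with Python's standard ipaddress module (a minimal sketch; the address and the /24 prefix length are arbitrary examples):

import ipaddress

# A 32-bit IPv4 address written in dotted-decimal notation (example address).
addr = ipaddress.ip_address("192.168.1.1")
print(int(addr))                 # 3232235777 -- the same address as a 32-bit integer

# Splitting the address into network and host parts for an assumed /24 network.
net = ipaddress.ip_network("192.168.1.0/24")
print(net.network_address)       # 192.168.1.0 -- the network part
print(addr in net)               # True -- host 192.168.1.1 belongs to this network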
11. IPv6
The most recent version of the Internet Protocol, IPv6, was created to address the IPv4
protocol’s drawbacks. A maximum of 4.3 billion unique addresses are possible with IPv4’s 32-
bit addresses. By contrast, IPv6 uses 128-bit addresses, which enable a vastly greater
number of unique addresses. This is significant because IPv4 addresses were running out and
there are an increasing number of devices that require internet access. Additionally, IPv6 offers
enhanced security features like integrated authentication and encryption as well as better
support for mobile devices. IPv6 support has spread among websites and internet service
providers, and it is anticipated to gradually displace IPv4 as the main internet protocol.
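A short Python sketch showing the difference in address size between the two versions (the addresses used are arbitrary examples; 2001:db8::/32 is the documentation prefix):

import ipaddress

v4 = ipaddress.ip_address("192.168.1.1")
v6 = ipaddress.ip_address("2001:db8::1")

print(v4.max_prefixlen)   # 32  -- IPv4 addresses are 32 bits
print(v6.max_prefixlen)   # 128 -- IPv6 addresses are 128 bits
print(2 ** 32)            # about 4.3 billion possible IPv4 addresses
print(2 ** 128)           # vastly more possible IPv6 addresses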
12. ICMP
ICMP (Internet Control Message Protocol) is a network protocol that is used to send error
messages and operational information about network conditions. It is an integral part of the
Internet Protocol (IP) suite and is used to help diagnose and troubleshoot issues with network
connectivity. ICMP messages are typically generated by network devices, such as routers, in
response to errors or exceptional conditions encountered in forwarding a datagram. Examples
of ICMP messages include Destination Unreachable, Time Exceeded, Echo Request and Echo
Reply (used by the ping utility), and Redirect.
ICMP can also be used by network management tools to test the reachability of a host and
measure the round-trip time for packets to travel from the source to the destination and back.
It should be noted that ICMP is not a secure protocol; it can be abused in some types of
network attacks, such as DDoS amplification.
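Since sending raw ICMP packets normally requires administrator privileges, a simple way to exercise ICMP from a program is to invoke the operating system's ping utility (a minimal sketch; the "-c" flag is for Linux/macOS ping, Windows uses "-n", and the target host is an arbitrary example):

import subprocess

# Send 3 ICMP Echo Requests and report reachability and round-trip times.
result = subprocess.run(["ping", "-c", "3", "example.com"],
                        capture_output=True, text=True)
print(result.stdout)     # per-packet round-trip times and summary statistics
print("reachable" if result.returncode == 0 else "unreachable")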
13. UDP
UDP (User Datagram Protocol) is a connectionless, unreliable transport layer protocol. Unlike
TCP, it does not establish a reliable connection between devices before transmitting data, and
it does not guarantee that data packets will be received in the order they were sent or that they
will be received at all. Instead, UDP simply sends packets of data to a destination without any
error checking or flow control. UDP is typically used for real-time applications such as
streaming video and audio, online gaming, and VoIP (Voice over Internet Protocol) where a
small amount of lost data is acceptable and low latency is important. UDP is faster than TCP
because it has less overhead. It doesn’t need to establish a connection, so it can send data
packets immediately. It also doesn’t need to wait for confirmation that the data was received
before sending more, so it can transmit data at a higher rate.
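The connectionless nature of UDP is easy to see in code: a sender can transmit a datagram immediately, with no handshake and no delivery guarantee (a minimal sketch using Python's standard socket module; the loopback address and port are arbitrary):

import socket

HOST, PORT = "127.0.0.1", 9999   # arbitrary local address for the demo

# Receiver: bind to a port and wait for one datagram.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind((HOST, PORT))

# Sender: no connection setup -- just send the datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", (HOST, PORT))

data, addr = receiver.recvfrom(4096)   # blocks until a datagram arrives
print(data.decode(), "from", addr)

sender.close()
receiver.close()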
14. IMAP
IMAP (Internet Message Access Protocol) is a protocol used for retrieving emails from a mail
server. It allows users to access and manage their emails on the server, rather than downloading
them to a local device. This means that the user can access their emails from multiple devices
and the emails will be synced across all devices. IMAP is more flexible than POP3 (Post Office
Protocol version 3) as it allows users to access and organize their emails on the server, and also
allows multiple users to access the same mailbox.
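Python's standard imaplib module speaks this protocol; a minimal sketch of listing the messages in an inbox (the server name and credentials are placeholders to be replaced with a real provider's values):

import imaplib

# Placeholder server and credentials -- replace with your provider's values.
with imaplib.IMAP4_SSL("imap.example.com") as mail:
    mail.login("user@example.com", "app-password")
    mail.select("INBOX")                      # the mailbox stays on the server
    status, data = mail.search(None, "ALL")   # message numbers, space-separated
    print("messages in INBOX:", len(data[0].split()))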
15. SSH
SSH (Secure Shell) is a protocol used for secure remote login and other secure network
services. It provides a secure and encrypted way to remotely access and manage servers,
network devices, and other computer systems. SSH uses public-key cryptography to
authenticate the user and encrypt the data being transmitted, making it much more secure than
traditional remote login protocols such as Telnet. SSH also allows for secure file transfers using
the SCP (Secure Copy) and SFTP (SSH File Transfer Protocol) protocols. It is widely used
in Unix-based operating systems and is also available for Windows. It is commonly used by
system administrators, developers, and other technical users to remotely access and manage
servers and other network devices.
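As an illustration, the widely used third-party paramiko library implements SSH in Python (a minimal sketch; the host name and credentials are placeholders, and paramiko is not in the standard library):

import paramiko   # third-party SSH implementation: pip install paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # demo only; verify host keys in production
client.connect("server.example.com", username="admin", password="secret")  # placeholders

# Run a command on the remote machine over the encrypted channel.
stdin, stdout, stderr = client.exec_command("uname -a")
print(stdout.read().decode())

client.close()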
16. Gopher
Gopher is a file retrieval protocol that presents downloadable files with short descriptions
for easy management, retrieval, and searching of files. All the files are arranged on a
remote computer in a hierarchical manner. It is an old protocol and is rarely used
nowadays.
Distributed Computing System Models
Distributed computing is a system where processing and data storage is distributed across
multiple devices or systems, rather than being handled by a single central device. Distributed
computing systems are commonly described using three models:
o Physical Model
o Architectural Model
o Fundamental Model
1. Physical Model
A physical model represents the underlying hardware elements of a distributed system. It
encompasses the hardware composition of a distributed system in terms of computers and other
devices and their interconnections. It is primarily used to design, manage, implement, and
determine the performance of a distributed system.
A physical model majorly consists of the following components:
1. Nodes
Nodes are the end devices that can process data, execute tasks, and communicate with the other
nodes. These end devices are generally the computers at the user end or can be servers,
workstations, etc.
Nodes provision the distributed system with an interface in the presentation layer that
enables the user to interact with other back-end devices, or nodes, that can be used for
storage and database services, processing, web browsing, etc.
Each node has an operating system, an execution environment, and different middleware
requirements that facilitate communication and other vital tasks.
2. Links
Links are the communication channels between different nodes and intermediate devices.
These may be wired or wireless. Wired links or physical media are implemented using copper
wires, fiber optic cables, etc. The choice of the medium depends on the environmental
conditions and the requirements. Generally, physical links are required for high-performance
and real-time computing. Different connection types that can be implemented are as follows:
Point-to-point links: Establish a connection and allow data transfer between only two
nodes.
Broadcast links: Enable a single node to transmit data to multiple nodes
simultaneously.
Multi-access links: Multiple nodes share the same communication channel to transfer
data. These require protocols to avoid interference during transmission.
3. Middleware
Middleware is the software installed and executed on the nodes. By running middleware on
each node, the distributed computing system achieves decentralised control and decision-making.
It handles various tasks like communication with other nodes, resource management, fault
tolerance, synchronisation of different nodes and security to prevent malicious and
unauthorised access.
4. Network Topology
This defines the arrangement of nodes and links in the distributed computing system. The most
common network topologies that are implemented are bus, star, mesh, ring or hybrid. The
choice of topology is determined by the exact use cases and requirements.
5. Communication Protocols
Communication protocols are the set of rules and procedures for transmitting data over the
links. Examples of these protocols include TCP, UDP, HTTPS and MQTT. These allow the
nodes to communicate and interpret the data.
2. Architectural Model
Architectural model in distributed computing system is the overall design and structure of the
system, and how its different components are organised to interact with each other and provide
the desired functionalities. It gives an overview of how the development, deployment and
operation of the system will take place. Construction of a good architectural model is required
for efficient cost usage and highly improved scalability of the applications.
1. Client-Server model
It is a centralised approach in which the clients initiate requests for services and servers respond
by providing those services. It mainly works on the request-response model where the client
sends a request to the server and the server processes it, and responds to the client accordingly.
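The request-response exchange can be sketched with Python sockets: the server waits for a request and sends back a response (a minimal single-machine sketch; the loopback address and port are arbitrary):

import socket
import threading
import time

HOST, PORT = "127.0.0.1", 8888   # arbitrary port for the demo

def server():
    # Server: accept one client connection, read the request, send a response.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)
            conn.sendall(b"response to: " + request)

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)   # give the server a moment to start listening

# Client: initiate the request and wait for the server's response.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    cli.sendall(b"service request")
    print(cli.recv(1024).decode())   # response to: service request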
2. Peer-to-peer model
It is a decentralised approach in which all the distributed computing nodes, known as peers,
are all the same in terms of computing capabilities and can both request as well as provide
services to other peers. It is a highly scalable model because the peers can join and leave the
system dynamically, which makes it an ad-hoc form of network.
The resources are distributed, and the peers need to locate the required resources
as and when required.
The communication is directly done amongst the peers without any intermediaries
according to some set rules and procedures defined in the P2P networks.
3. Layered model
It involves organising the system into multiple layers, where each layer will provision a specific
service. Each layer communicates with the adjacent layers using certain well-defined protocols
without affecting the integrity of the system. A hierarchical structure is obtained where each
layer abstracts the underlying complexity of lower layers.
4. Micro-services model
In this model, a complex application or task is decomposed into multiple independent services,
with these services running on different servers. Each service performs only a single function
and is focussed on a specific business-capability. This makes the overall system more
maintainable, scalable and easier to understand. Services can be independently developed,
deployed and scaled without affecting the ongoing services.
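A single-function service in this style can be sketched with Python's standard http.server module (a minimal sketch; the port and the service's one capability, returning a greeting, are arbitrary examples):

from http.server import BaseHTTPRequestHandler, HTTPServer

class GreetingService(BaseHTTPRequestHandler):
    # One service, one business capability: respond to GET /greet.
    def do_GET(self):
        if self.path == "/greet":
            body = b"hello from the greeting micro-service"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# Each such service would run on its own server/port and be deployed
# independently of the others. (Press Ctrl+C to stop.)
HTTPServer(("127.0.0.1", 8001), GreetingService).serve_forever()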
3. Fundamental Model
The fundamental model in a distributed computing system is a broad conceptual framework
that helps in understanding the key aspects of the distributed systems. These are concerned
with more formal description of properties that are generally common in all architectural
models. It represents the essential components that are required to understand a distributed
system’s behaviour. Three fundamental models are as follows:
1. Interaction Model
Distributed computing systems are full of many processes interacting with each other in highly
complex ways. Interaction model provides a framework to understand the mechanisms and
patterns that are used for communication and coordination among various processes. Different
components that are important in this model are –
Message Passing – It deals with passing messages that may contain data, instructions,
a service request, or process-synchronisation information between different computing nodes.
It may be synchronous or asynchronous depending on the types of tasks and processes.
Publish/Subscribe Systems – Also known as pub/sub systems. Here a publishing
process publishes a message on a topic, and every process subscribed to that topic
receives the message and acts on it. This pattern is especially important in
event-driven architectures (a minimal sketch is given after this list).
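A minimal in-memory sketch of the pub/sub pattern (the topic name and handlers are illustrative; real systems use message brokers such as MQTT or Kafka):

from collections import defaultdict

# topic -> list of subscriber callbacks
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    # Every process subscribed to the topic receives the message.
    for handler in subscribers[topic]:
        handler(message)

subscribe("orders", lambda msg: print("billing saw:", msg))
subscribe("orders", lambda msg: print("shipping saw:", msg))
publish("orders", "order #42 created")   # both subscribers react to the event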
2. Failure Model
This model addresses the faults and failures that occur in the distributed computing system. It
provides a framework to identify and rectify the faults that occur or may occur in the system.
Fault tolerance mechanisms are implemented so as to handle failures by replication and error
detection and recovery methods. Different failures that may occur are:
Crash failures – A process or node unexpectedly stops functioning.
Omission failures – A process or channel fails to send or receive messages that it is
supposed to.
Timing failures – A process responds outside the specified time interval.
Arbitrary (Byzantine) failures – A process behaves erratically, for example sending wrong
or inconsistent messages.
3. Security Model
Distributed computing systems may suffer malicious attacks, unauthorised access and data
breaches. Security model provides a framework for understanding the security requirements,
threats, vulnerabilities, and mechanisms to safeguard the system and its resources. Various
aspects that are vital in the security model are:
Authentication: It verifies the identity of the users accessing the system. It ensures that
only authorised and trusted entities get access. It typically involves credentials such as
passwords, digital certificates, or multi-factor authentication.
Encryption:
o It is the process of transforming data into a format that is unreadable without a
decryption key. It protects sensitive information from unauthorized access or
disclosure.
Data Integrity:
o Data integrity mechanisms protect against unauthorised modifications or
tampering of data. They ensure that data remains unchanged during storage,
transmission, or processing. Data integrity mechanisms include checksums,
cryptographic hash functions, message authentication codes (MACs) and
digital signatures (a minimal sketch follows below).
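A minimal sketch of an integrity check using a cryptographic hash from Python's standard hashlib module (the message contents are arbitrary examples):

import hashlib

message = b"transfer 100 to account 7"           # example data in transit
digest = hashlib.sha256(message).hexdigest()      # fingerprint sent alongside the data

# The receiver recomputes the hash; any tampering changes the digest.
received = b"transfer 900 to account 7"           # tampered copy
ok = hashlib.sha256(received).hexdigest() == digest
print("integrity verified" if ok else "data was modified in transit")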