Unit-4 Rtu Kota
The techniques used for scheduling processes in distributed systems are as follows:
1. Task Assignment Approach: In the Task Assignment Approach, the user-submitted process
is composed of multiple related tasks, which are scheduled to appropriate nodes in the
system to improve the performance of the system as a whole.
2. Load Balancing Approach: In the Load Balancing Approach, as the name implies, the
workload is balanced among the nodes of the system.
3. Load Sharing Approach: In the Load Sharing Approach, the system ensures that no node
sits idle while processes are waiting to be processed.
Note: The Task Assignment Approach has limited practical applicability because it assumes
that process characteristics, such as inter-process communication costs, are known in
advance.
The Load Balancing approach refers to the division of load among the processing elements of
a distributed system. The excess load of one processing element is distributed to other
processing elements that have less load according to the defined limits. In other words, the
load at each processing element is kept such that it neither becomes overloaded nor sits
idle during program execution, which maximizes system throughput, the ultimate goal of
distributed systems. This approach keeps all processing elements roughly equally busy,
speeding up the overall work so that all processors finish their tasks at approximately the
same time.
Types of Load Balancing Algorithms:
Static Load Balancing Algorithm: In the Static Load Balancing Algorithm, the current state
of the system is not taken into account while distributing the load. These algorithms are
simpler than dynamic load balancing algorithms. The types of static load balancing
algorithms are as follows:
o Deterministic: In Deterministic Algorithms, the properties of nodes and
processes are taken into account when allocating processes to nodes. Because of
their deterministic nature, these algorithms are difficult to optimize for better
results and are also more costly to implement.
o Probabilistic: In Probabilistic Algorithms, statistical attributes of the system,
such as the number of nodes and the network topology, are used to define process
placement rules. These algorithms generally give poorer performance than
deterministic ones. (Both static policies, along with a dynamic one, are illustrated
in the sketch after this list.)
Dynamic Load Balancing Algorithm: Dynamic Load Balancing Algorithm takes into
account the current load of each node or computing unit in the system, allowing for faster
processing by dynamically redistributing workloads away from overloaded nodes and
toward underloaded nodes.
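To make the difference concrete, here is a minimal sketch of the three placement policies just described. The node names, capacities and load figures are invented for illustration, and the helper functions are not part of any standard library: the deterministic policy always maps the same process ID to the same node, the probabilistic policy places processes at random weighted only by capacity, and the dynamic policy consults the current load of each node.

import hashlib
import random

# Hypothetical cluster state: node name -> (capacity, current load).
nodes = {"node-a": (4, 3.5), "node-b": (8, 1.0), "node-c": (8, 6.0)}

def pick_node_deterministic(process_id: str) -> str:
    """Static, deterministic: the same process ID always maps to the same node."""
    names = sorted(nodes)
    digest = int(hashlib.sha256(process_id.encode()).hexdigest(), 16)
    return names[digest % len(names)]

def pick_node_probabilistic() -> str:
    """Static, probabilistic: placement is random, weighted by node capacity only."""
    names = list(nodes)
    weights = [nodes[n][0] for n in names]          # ignores current load
    return random.choices(names, weights=weights, k=1)[0]

def pick_node_dynamic() -> str:
    """Dynamic: send the process to the node with the lowest load/capacity ratio."""
    return min(nodes, key=lambda n: nodes[n][1] / nodes[n][0])

if __name__ == "__main__":
    print(pick_node_deterministic("proc-42"))   # same answer on every run
    print(pick_node_probabilistic())            # varies, but ignores current load
    print(pick_node_dynamic())                  # node-b: lowest utilization

In practice the dynamic policy also needs a way to collect fresh load information from the nodes, which is where most of the real complexity of dynamic load balancing lies.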
Dynamic load sharing and transparency refer to distributing network traffic dynamically across
multiple resources while maintaining a transparent experience for users and applications. This
involves techniques like load balancing algorithms that adapt to changing conditions and ensure
efficient resource utilization.
Dynamic Load Sharing:
This approach dynamically distributes traffic based on real-time factors like server load, connection count, or
response time, ensuring optimal resource utilization and preventing overload on any single server.
Transparency:
The goal of transparent load sharing is for users and applications to be unaware of the load distribution
process. The system handles the routing and switching of traffic behind the scenes, providing a seamless
experience.
Benefits:
Dynamic load sharing and transparency enhance performance, improve reliability, and enable scalability by
distributing the workload across multiple resources. This helps prevent single points of failure and allows
systems to handle increased traffic demands efficiently.
Examples:
Load balancing algorithms like round-robin, weighted round-robin, and least-connection methods are
commonly used for dynamic load sharing.
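As a rough sketch of the first two policies named above (the server names and weights are made up for the example), round-robin simply cycles through the servers in order, while weighted round-robin repeats each server in proportion to its weight.

import itertools

servers = ["web-1", "web-2", "web-3"]          # hypothetical back-end servers
weights = {"web-1": 3, "web-2": 1, "web-3": 1} # web-1 can take three times the traffic

# Plain round-robin: each request goes to the next server in the cycle.
round_robin = itertools.cycle(servers)

# Weighted round-robin: expand the cycle so each server appears per its weight.
weighted_cycle = itertools.cycle(
    [name for name in servers for _ in range(weights[name])]
)

if __name__ == "__main__":
    print([next(round_robin) for _ in range(6)])
    # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2', 'web-3']
    print([next(weighted_cycle) for _ in range(5)])
    # ['web-1', 'web-1', 'web-1', 'web-2', 'web-3']

A least-connection policy behaves like the dynamic sketch shown earlier in this unit: it picks whichever server currently has the fewest active connections.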
Applications:
Dynamic load sharing and transparency are essential in various scenarios, including web hosting, cloud
computing, and distributed applications, where it's crucial to handle large volumes of traffic and maintain high
availability.
4. Distributed process implementation-
Distributed process implementation involves executing a task by breaking it down into smaller
parts and distributing those parts across multiple computing resources, often in a networked
environment. This approach aims to improve efficiency, performance, and reliability compared
to traditional centralized processing.
Here's a more detailed breakdown:
Key Concepts:
Distributed Processing:
A model where different parts of a task are executed simultaneously on multiple machines.
Task Decomposition:
The core process of breaking down a large task into smaller, manageable units.
Parallel Execution:
These smaller tasks are then executed concurrently across multiple nodes or machines.
Synchronization:
Mechanisms are needed to ensure that the parts of the task are executed in a coordinated manner and that
the results are combined correctly.
Networking:
Nodes communicate and exchange data over a network to facilitate the distributed execution.
Implementation Considerations:
Choosing the right architecture:
Different architectures (e.g., client-server, peer-to-peer) have different suitability for different types of
distributed processes.
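Here is a minimal sketch of the key concepts listed above, using Python's standard process pool as a stand-in for a set of worker nodes (the chunking scheme and the partial_sum function are illustrative choices, not a prescribed method): the task is decomposed into chunks, the chunks execute in parallel, and combining the partial results is the synchronization point.

from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    """Work done independently by one worker ("node")."""
    return sum(x * x for x in chunk)

def decompose(data, parts):
    """Task decomposition: split the job into roughly equal pieces."""
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(100_000))
    chunks = decompose(data, parts=4)

    # Parallel execution: each chunk runs concurrently in its own process.
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(partial_sum, chunks))

    # Synchronization: results are combined only once every worker has finished.
    print(sum(partials) == sum(x * x for x in data))   # True

In a genuinely distributed setting the same structure holds, but the workers sit on different machines and the results travel back over the network rather than through a local process pool.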
5. Distributed file system (DFS)-
A distributed file system (DFS) stores files across multiple servers and presents them to
clients as if they were on a single local file system. Data sets are replicated onto multiple
servers, which provides the redundancy needed to keep data highly available. The DFS is
located on a collection of servers, mainframes or a cloud environment over a local area
network (LAN) so multiple users can access and store unstructured data. If organizations
need to scale up their infrastructure, they can add more storage nodes to the DFS.
Clients access data on a DFS using namespaces. Organizations can group shared folders into
logical namespaces. A namespace is the shared group of networked storage on a DFS root.
Features of a DFS
Organizations use a DFS for features such as scalability, security and remote access to
data. Features of a DFS include the following:
Location independence. Users do not need to be aware of where data is stored. The DFS
manages the location and presents files as if they are stored locally.
Transparency. Transparency hides the details of one file system from other file systems
and users. There are several types of transparency in distributed file systems, such as
access, location and replication transparency.
Scalability. To scale a DFS, organizations can add file servers or storage nodes.
High availability. The DFS should continue to work in the event of a partial failure in the
system, such as a node failure or drive crash. A DFS should also create backup copies if
there are any failures in the system.
Security. Data should be encrypted at rest and in transit to prevent unauthorized access or
data deletion.
Working of DFS
There are two ways in which DFS can be implemented:
Standalone DFS namespace: It allows only for DFS roots that exist on the local
computer and do not use Active Directory. A standalone DFS namespace can only be
accessed on the computer on which it is created. It does not provide fault tolerance and
cannot be linked to any other DFS. Standalone DFS roots are rarely encountered because
of their limited advantages.
Domain-based DFS namespace: It stores the DFS configuration in Active Directory,
making the DFS namespace root accessible at \\<domainname>\<dfsroot> or
\\<FQDN>\<dfsroot>.
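To make the namespace idea concrete, the sketch below resolves a path under a hypothetical domain-based namespace root (the domain name, folder names and server shares are all invented for the example) to the file-server targets that actually hold that folder.

# Hypothetical domain-based DFS namespace: folders under the namespace root
# \\corp.example.com\public map to one or more real file-server shares.
NAMESPACE_ROOT = r"\\corp.example.com\public"
FOLDER_TARGETS = {
    "reports":  [r"\\fs01\reports", r"\\fs02\reports"],   # replicated targets
    "projects": [r"\\fs03\projects"],
}

def resolve(path: str) -> list:
    """Return the real share paths backing a namespace path."""
    if not path.lower().startswith(NAMESPACE_ROOT.lower()):
        raise ValueError("path is outside this namespace")
    rest = path[len(NAMESPACE_ROOT):].lstrip("\\")
    folder, _, remainder = rest.partition("\\")
    targets = FOLDER_TARGETS.get(folder.lower())
    if targets is None:
        raise FileNotFoundError(folder)
    return [t + ("\\" + remainder if remainder else "") for t in targets]

if __name__ == "__main__":
    # Prints both replicated targets that back the Reports folder.
    print(resolve(r"\\corp.example.com\public\Reports\2024\q1.xlsx"))

Replicated folder targets, as for the reports folder here, are what let a domain-based namespace keep working when one file server is unavailable.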
6. Transaction service and concurrency control-
Concurrency control mechanisms provide the concepts and implementations that ensure the
execution of a transaction on any node does not violate the ACID or BASE properties
(depending on the database), which would otherwise cause inconsistency and mix-ups of data
in a distributed system.
Types of Concurrency Control Mechanisms
There are two types of concurrency control mechanisms, as shown in the diagram below:
Figure: Types of Concurrency Control Mechanisms
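The diagram itself is not reproduced here; the usual division is between lock-based (pessimistic) and optimistic mechanisms. As one hedged illustration of the lock-based side, the toy lock manager below grants shared (read) and exclusive (write) locks and releases them all at commit, in the spirit of two-phase locking. The transaction and item names are invented, and a real system would add deadlock handling, distribution across nodes and recovery on top of this idea.

# Toy lock manager for lock-based (pessimistic) concurrency control.
# Shared (read) locks can coexist; an exclusive (write) lock cannot.
class LockManager:
    def __init__(self):
        self.locks = {}   # item -> {"mode": "S" or "X", "holders": set of txn ids}

    def acquire(self, txn, item, mode):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        if mode == "S" and entry["mode"] == "S":
            entry["holders"].add(txn)              # shared locks are compatible
            return True
        if entry["holders"] == {txn}:              # lock upgrade by the sole holder
            entry["mode"] = "X" if mode == "X" else entry["mode"]
            return True
        return False                               # conflict: caller must wait or abort

    def release_all(self, txn):
        """Second phase of two-phase locking: release everything at commit/abort."""
        for item in list(self.locks):
            entry = self.locks[item]
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]

if __name__ == "__main__":
    lm = LockManager()
    print(lm.acquire("T1", "x", "S"))   # True  - T1 reads x
    print(lm.acquire("T2", "x", "S"))   # True  - shared with T1
    print(lm.acquire("T2", "x", "X"))   # False - T1 still holds a shared lock
    lm.release_all("T1")
    print(lm.acquire("T2", "x", "X"))   # True  - the upgrade succeeds now
    lm.release_all("T2")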
8. Case studies-
Sun’s Network File System (NFS)
One of the first uses of distributed client/server computing was in the realm of distributed file
systems. In such an environment, there are a number of client machines and one server (or a
few); the server stores the data on its disks, and clients request data through well-formed
protocol messages. Figure 49.1 depicts the basic setup.
Figure 49.1: A Generic Client/Server System
As you can see from the picture, the server has the disks, and clients send messages across a
network to access their directories and files on those disks. Why do we bother with this
arrangement? (i.e., why don’t we just let clients use their local disks?) Well, primarily this setup
allows for easy sharing of data across clients. Thus, if you access a file on one machine (Client 0)
and then later use another (Client 2), you will have the same view of the file system. Your data is
naturally shared across these different machines. A secondary benefit is centralized
administration; for example, backing up files can be done from the few server machines instead
of from the multitude of clients. Another advantage could be security; having all servers in a
locked machine room prevents certain types of problems from arising.
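As a very rough sketch of the arrangement described above (this is not the actual NFS protocol, which uses RPC, file handles and a much richer message format), a client sends a small READ request across the network and the server answers with the file's contents; the file path and message format here are invented for the example.

import socket
import threading

# Toy file server: not NFS, just an illustration of request/response messages.
FILES = {"/home/alice/notes.txt": b"distributed systems notes\n"}

def serve(sock):
    conn, _ = sock.accept()
    with conn:
        path = conn.recv(1024).decode().removeprefix("READ ").strip()
        conn.sendall(FILES.get(path, b"ERROR: no such file"))

if __name__ == "__main__":
    server = socket.socket()
    server.bind(("127.0.0.1", 0))           # pick a free port on localhost
    server.listen(1)
    threading.Thread(target=serve, args=(server,), daemon=True).start()

    # "Client": sends a protocol message, gets the file contents back.
    client = socket.socket()
    client.connect(server.getsockname())
    client.sendall(b"READ /home/alice/notes.txt")
    print(client.recv(4096).decode())        # distributed systems notes
    client.close()
    server.close()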
The General Parallel File System (GPFS) is a high-performance clustered file system
developed by IBM. It is designed to provide rapid access to large volumes of data while
maintaining high availability and scalability. GPFS is widely used in the field of high-
performance computing (HPC) and is particularly well-suited for applications that require
parallel access to large files or datasets.
GPFS is a distributed file system that allows multiple servers or nodes to access and
manage a shared file system simultaneously.
GPFS is known for its ability to handle large files and datasets. It supports file sizes up
to several petabytes and can efficiently distribute data across multiple disks and servers
to achieve high levels of parallelism. This parallel access to storage resources allows for
faster data processing and analysis, as multiple nodes can read and write data
simultaneously.
One of the key features of GPFS is its high availability: data and metadata can be replicated
across disks and servers so the file system remains accessible even when individual
components fail.
GPFS offers a wide range of features and benefits that make it a popular choice for
organizations dealing with large-scale data processing and storage. Some of the key
features and benefits of GPFS include:
Scalability: GPFS can scale to support thousands of servers and petabytes of data,
making it suitable for environments that require massive storage and processing
capabilities.
Performance: GPFS delivers high-performance data access and processing through its
parallel architecture, enabling organizations to process large datasets quickly.
Reliability: GPFS provides built-in fault tolerance mechanisms, such as data replication
and distributed metadata, to ensure data availability and integrity even in the event of
hardware failures.
Flexibility: GPFS supports a variety of storage technologies, including traditional
spinning disks, solid-state drives (SSDs), and even cloud storage.
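To illustrate the parallel-access idea in miniature (the block size, node names and data below are invented and bear no relation to GPFS internals), a file can be striped into fixed-size blocks spread round-robin across several storage nodes and then read back in parallel.

from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4                         # tiny block size, just for the demonstration
NODES = ["nsd-1", "nsd-2", "nsd-3"]    # pretend storage nodes

def stripe(data: bytes):
    """Split data into blocks and spread them round-robin across the nodes."""
    placement = {}
    for i in range(0, len(data), BLOCK_SIZE):
        node = NODES[(i // BLOCK_SIZE) % len(NODES)]
        placement.setdefault(node, []).append((i // BLOCK_SIZE, data[i:i + BLOCK_SIZE]))
    return placement

def read_back(placement):
    """Read every node's blocks in parallel, then reassemble them in order."""
    def fetch(node):
        return placement.get(node, [])         # stands in for a network read
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        blocks = [b for node_blocks in pool.map(fetch, NODES) for b in node_blocks]
    return b"".join(chunk for _, chunk in sorted(blocks))

if __name__ == "__main__":
    data = b"general parallel file system"
    placement = stripe(data)
    print(read_back(placement) == data)        # True

Because each node serves only its own blocks, several nodes can be read (or written) at the same time, which is the source of the parallel throughput described above.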
The Andrew File System (AFS) is a distributed file system that allows multiple computers to
share files and data seamlessly. It was developed by Morris et al. in 1986 at Carnegie Mellon
University in collaboration with IBM. AFS was designed for large-scale distributed computing
environments, providing a way to share files across multiple machines in a network as if they
were local files.
Andrew File System Architecture
Vice: The Andrew File System provides a homogeneous, location-transparent file
namespace to all client workstations by utilizing a group of trustworthy servers known as
Vice. The Berkeley Software Distribution of the Unix operating system is used on both
clients and servers. Each workstation’s operating system intercepts file system calls and
redirects them to a user-level process on that workstation.
Venus: This mechanism, known as Venus, caches files from Vice and returns updated
versions of those files to the servers from which they originated. Venus communicates with
Vice only when a file is opened or closed; individual bytes of a file are read and written
directly on the cached copy, bypassing Venus.
This file system architecture was largely inspired by the need for scalability. To increase the
number of clients a server can service, Venus performs as much work as possible rather than
Vice. Vice only keeps the functionalities that are necessary for the file system’s integrity,
availability, and security. The servers are set up as a loose confederacy with little connectivity
between them.
Figure: Andrew File System
The following are the server and client components used in AFS networks:
Any computer that creates requests for AFS server files hosted on a network qualifies as a
client.
The file is saved in the client machine’s local cache and shown to the user once a server
responds and transmits a requested file.
When a user modifies a file, the client writes the changes back to the server, and the server
uses a callback mechanism to notify other clients that their cached copies are out of date.
The client machine’s local cache stores frequently used files for rapid access.
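The open/close and callback behaviour described above can be sketched as follows. The class and method names are invented for illustration and are not the real Vice/Venus interfaces: a client fetches the whole file on open and is registered for a callback, reads and writes touch only the local cache, the file is shipped back to the server on close, and the server then breaks the callbacks held by other clients so their stale cached copies are refetched on the next open.

class ViceServer:
    """Toy file server: stores files and remembers which clients hold callbacks."""
    def __init__(self):
        self.files = {"/afs/notes.txt": "v1"}
        self.callbacks = {}                       # path -> set of client objects

    def fetch(self, client, path):
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, client, path, data):
        self.files[path] = data
        for other in self.callbacks.get(path, set()) - {client}:
            other.break_callback(path)            # invalidate stale cached copies
        self.callbacks[path] = {client}

class VenusClient:
    """Toy cache manager: talks to the server only on open and close."""
    def __init__(self, server):
        self.server, self.cache = server, {}

    def open(self, path):
        if path not in self.cache:                # cache miss or broken callback
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]                   # reads/writes use this local copy

    def close(self, path, data):
        self.cache[path] = data
        self.server.store(self, path, data)       # write the whole file back

    def break_callback(self, path):
        self.cache.pop(path, None)                # forces a refetch on the next open

if __name__ == "__main__":
    vice = ViceServer()
    a, b = VenusClient(vice), VenusClient(vice)
    print(a.open("/afs/notes.txt"), b.open("/afs/notes.txt"))   # v1 v1
    a.close("/afs/notes.txt", "v2")        # a's update breaks b's callback
    print(b.open("/afs/notes.txt"))        # v2 (refetched from the server)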
Coda is a distributed file system developed as a research project at Carnegie Mellon University since 1987
under the direction of Mahadev Satyanarayanan. It descended directly from an older version of Andrew File
System (AFS-2) and offers many similar features. The InterMezzo file system was inspired by Coda.
Features-
Coda has many features that are desirable for network file systems, and several features not found
elsewhere, such as disconnected operation for mobile clients and server replication.
A distributed file system stores files on one or more computers called servers, and
makes them accessible to other computers called clients, where they appear as normal
files. There are several advantages to using file servers: the files are more widely
available since many computers can access the servers, and sharing the files from a
single location is easier than distributing copies of files to individual clients. Backups
and safety of the information are easier to arrange since only the servers need to be
backed up. The servers can provide large storage space, which might be costly or
impractical to supply to every client. The usefulness of a distributed file system
becomes clear when considering a group of employees sharing documents. However,
more is possible. For example, sharing application software is an equally good
candidate. In both cases system administration becomes easier.
There are many problems facing the design of a good distributed file system.
Transporting many files over the net can easily create sluggish performance and
latency, network bottlenecks and server overload can result. The security of data is
another important issue: how can we be sure that a client is really authorized to have
access to information and how can we prevent data being sniffed off the network?
Two further problems facing the design are related to failures.