
Unit-4

1.System performance model-


System performance modeling is a process that uses mathematical models to
analyze, optimize, and predict how a system will perform. It's a key tool in
systems engineering.
How it works: a mathematical model of the system is created, different scenarios are simulated, key performance indicators are measured, and potential problems are identified.
What it's used for
 To understand how a computer system performs
 To predict how it will perform under different conditions
 To estimate power consumption for mobile or embedded systems
 To create realistic performance test scenarios

Steps in system performance modeling


1. Define the system boundaries
2. Select the modeling approach
3. Build and validate the model
4. Perform sensitivity analysis
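
As a concrete illustration of the modeling step, the sketch below uses a simple M/M/1 queueing model (one common analytical modeling approach) to predict utilization and response time under different scenarios. The arrival and service rates and the function name mm1_metrics are illustrative assumptions, not part of any particular tool.

```python
# Minimal sketch: an M/M/1 queueing model as one possible performance model.
# Assumptions: Poisson arrivals at rate lam (requests/s), exponential service
# at rate mu (requests/s), a single server; all values are illustrative only.

def mm1_metrics(lam: float, mu: float) -> dict:
    """Return basic steady-state metrics of an M/M/1 queue."""
    if lam >= mu:
        raise ValueError("System is unstable: arrival rate must stay below service rate")
    rho = lam / mu                      # utilization
    avg_jobs = rho / (1 - rho)          # mean number of requests in the system (L)
    resp_time = 1 / (mu - lam)          # mean response time (W); Little's law gives L = lam * W
    return {"utilization": rho, "avg_jobs": avg_jobs, "response_time": resp_time}

# Simulate different scenarios by varying the arrival rate against a fixed service rate.
for lam in (50, 80, 95):
    print(lam, mm1_metrics(lam, mu=100))
```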

2.Static process scheduling with communication-

Scheduling in Distributed Systems:

The techniques that are used for scheduling the processes in distributed systems are as
follows:
1. Task Assignment Approach: In the Task Assignment Approach, the user-submitted process
is composed of multiple related tasks which are scheduled to appropriate nodes in a
system to improve the performance of a system as a whole.
2. Load Balancing Approach: In the Load Balancing Approach, as the name implies, the
workload is balanced among the nodes of the system.
3. Load Sharing Approach: In the Load Sharing Approach, it is ensured that no node sits idle while processes are waiting to be processed.
Note: The Task Assignment Approach finds little practical applicability, as it assumes that process characteristics such as inter-process communication costs are known in advance. A small cost-based sketch of this approach is given below.
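
As a rough illustration of the Task Assignment Approach, the sketch below enumerates all possible task-to-node assignments for a tiny example and picks the one that minimizes total execution cost plus inter-process communication cost. All task names, node names, and cost values are made up for illustration.

```python
# Hedged sketch of the task assignment approach: choose a node for each task so
# that execution cost plus inter-process communication cost is minimal.
from itertools import product

tasks = ["t1", "t2", "t3"]
nodes = ["n1", "n2"]

# exec_cost[task][node]: cost of running a task on a node (assumed known in advance,
# which is exactly the practical limitation noted above).
exec_cost = {"t1": {"n1": 5, "n2": 10},
             "t2": {"n1": 6, "n2": 4},
             "t3": {"n1": 4, "n2": 3}}

# comm_cost[(a, b)]: cost paid only if tasks a and b end up on different nodes.
comm_cost = {("t1", "t2"): 6, ("t2", "t3"): 2}

def total_cost(assignment):
    cost = sum(exec_cost[t][assignment[t]] for t in tasks)
    cost += sum(c for (a, b), c in comm_cost.items()
                if assignment[a] != assignment[b])
    return cost

best = min((dict(zip(tasks, combo)) for combo in product(nodes, repeat=len(tasks))),
           key=total_cost)
print(best, total_cost(best))   # the cheapest assignment and its total cost
```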

Characteristics of a Good Scheduling Algorithm:

The following are the required characteristics of a Good Scheduling Algorithm:


 The scheduling algorithms that require prior knowledge about the properties and resource
requirements of a process submitted by a user put a burden on the user. Hence, a good
scheduling algorithm does not require prior specification regarding the user-submitted
process.
 A good scheduling algorithm must exhibit the dynamic scheduling of processes as the
initial allocation of the process to a system might need to be changed with time to balance
the load of the system.
 The algorithm must be flexible enough to support process migration decisions when there is a change in the system load.

Load Balancing in Distributed Systems:

The Load Balancing approach refers to the division of load among the processing elements of
a distributed system. The excess load of one processing element is distributed to other
processing elements that have less load, according to defined limits. In other words, the load at each processing element is maintained so that it neither becomes overloaded nor sits idle during program execution, maximizing system throughput, which is the ultimate goal of distributed systems. This approach keeps all processing elements roughly equally busy, speeding up the overall job so that all processors finish at approximately the same time.
Types of Load Balancing Algorithms:
 Static Load Balancing Algorithm: In the Static Load Balancing Algorithm, while
distributing load the current state of the system is not taken into account. These algorithms
are simpler in comparison to dynamic load balancing algorithms. Types of Static Load
Balancing Algorithms are as follows:
o Deterministic: In Deterministic Algorithms, the properties of nodes and
processes are taken into account for the allocation of processes to nodes.
Because of their deterministic nature, these algorithms are difficult to optimize for better results and are also costlier to implement.
o Probabilistic: In Probabilistic Algorithms, statistical attributes of the system, such as the number of nodes and the network topology, are used to derive process placement rules. These algorithms generally give poorer performance.
 Dynamic Load Balancing Algorithm: Dynamic Load Balancing Algorithm takes into
account the current load of each node or computing unit in the system, allowing for faster
processing by dynamically redistributing workloads away from overloaded nodes and
toward underloaded nodes.

Types of Dynamic Load Balancing Algorithms are as follows:


o Centralized: In Centralized Load Balancing Algorithms, the task of handling
requests for process scheduling is carried out by a centralized server node. The
benefit of this approach is efficiency, as all the information is held at a single node, but it suffers from poor reliability because the central node has low fault tolerance and is a single point of failure. It can also become a bottleneck as the number of scheduling requests grows. A sketch of a centralized least-loaded scheduler follows this list.
o Distributed: In Distributed Load Balancing Algorithms, the decision task of
assigning processes is distributed physically to the individual nodes of the
system. Unlike Centralized Load Balancing Algorithms, there is no need to hold all the state information at a single node.
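
A minimal sketch of the centralized variant described above: a single scheduler holds load reports from every worker node and places each incoming process on the least-loaded one. The class and node names (CentralScheduler, node-a, ...) and the load metric are invented for the example.

```python
# Hedged sketch of a centralized dynamic load balancer: one scheduler node
# tracks the current load of every worker and sends each new process to the
# least-loaded worker.

class CentralScheduler:
    def __init__(self, workers):
        self.load = {w: 0 for w in workers}   # current load per worker node

    def report_load(self, worker, load):
        """Workers periodically report their current load (state information)."""
        self.load[worker] = load

    def place(self, process):
        """Assign an incoming process to the currently least-loaded worker."""
        target = min(self.load, key=self.load.get)
        self.load[target] += 1                # optimistic local update until the next report
        return target

sched = CentralScheduler(["node-a", "node-b", "node-c"])
sched.report_load("node-b", 5)
print(sched.place("p1"))   # 'node-a' (a least-loaded node), not the loaded 'node-b'
```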

3.Dynamic load sharing and transparencies-

Dynamic load sharing and transparency refer to distributing network traffic dynamically across
multiple resources while maintaining a transparent experience for users and applications. This
involves techniques like load balancing algorithms that adapt to changing conditions and ensure
efficient resource utilization.
Dynamic Load Sharing:

This approach dynamically distributes traffic based on real-time factors like server load, connection count, or
response time, ensuring optimal resource utilization and preventing overload on any single server.
 Transparency:
The goal of transparent load sharing is for users and applications to be unaware of the load distribution
process. The system handles the routing and switching of traffic behind the scenes, providing a seamless
experience.
 Benefits:
Dynamic load sharing and transparency enhance performance, improve reliability, and enable scalability by
distributing the workload across multiple resources. This helps prevent single points of failure and allows
systems to handle increased traffic demands efficiently.
 Examples:
Load balancing algorithms like round-robin, weighted round-robin, and least-connection methods are commonly used for dynamic load sharing; a small sketch of two of them follows this list.
 Applications:
Dynamic load sharing and transparency are essential in various scenarios, including web hosting, cloud
computing, and distributed applications, where it's crucial to handle large volumes of traffic and maintain high
availability.
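
The sketch below illustrates two of the algorithms named in the Examples item: static round-robin rotation and dynamic least-connections selection. Server names and connection counts are invented for the example.

```python
# Hedged sketch of two load-sharing policies: round-robin and least-connections.
from itertools import cycle

servers = ["web-1", "web-2", "web-3"]

# Round-robin: rotate through servers regardless of their current state (static).
rr = cycle(servers)
print([next(rr) for _ in range(5)])   # web-1, web-2, web-3, web-1, web-2

# Least-connections: pick the server with the fewest active connections (dynamic).
active_connections = {"web-1": 12, "web-2": 3, "web-3": 7}

def least_connections():
    target = min(active_connections, key=active_connections.get)
    active_connections[target] += 1   # the chosen server now carries one more connection
    return target

print(least_connections())            # web-2
```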
4. Distributed process implementation-
Distributed process implementation involves executing a task by breaking it down into smaller
parts and distributing those parts across multiple computing resources, often in a networked
environment. This approach aims to improve efficiency, performance, and reliability compared
to traditional centralized processing.
Here's a more detailed breakdown:

Key Concepts:
 Distributed Processing:
A model where different parts of a task are executed simultaneously on multiple machines.
 Task Decomposition:
The core process of breaking down a large task into smaller, manageable units.
 Parallel Execution:
These smaller tasks are then executed concurrently across multiple nodes or machines.
 Synchronization:
Mechanisms are needed to ensure that the parts of the task are executed in a coordinated manner and that the results are combined correctly (a minimal sketch of decomposition, parallel execution, and synchronization follows this list).
 Networking:
Nodes communicate and exchange data over a network to facilitate the distributed execution.
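
A minimal sketch (assuming Python's standard concurrent.futures module) of task decomposition, parallel execution, and synchronization: a large sum is split into chunks, the chunks run concurrently, and the partial results are combined. A local process pool stands in for the worker machines of a real distributed system.

```python
# Hedged sketch of distributed process implementation on a single machine.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    """The small, independent unit of work produced by task decomposition."""
    return sum(chunk)

def distributed_sum(data, workers=4):
    # Decompose the big task into roughly equal chunks, one per worker.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Execute the chunks concurrently and combine (synchronize) the partial results.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":            # guard required for process pools on some platforms
    print(distributed_sum(list(range(1_000_000))))
```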

Benefits of Distributed Processing:


 Increased Efficiency:
By leveraging multiple resources, tasks can be completed faster.
 Improved Performance:
Distributed systems can handle larger workloads and more complex computations than single-machine
systems.
 Enhanced Reliability:
If one node fails, the others can continue processing, making the system more resilient.
 Scalability:
Distributed systems can easily be scaled up or down by adding or removing nodes.

Implementation Considerations:
 Choosing the right architecture:
Different architectures (e.g., client-server, peer-to-peer) have different suitability for different types of
distributed processes.

 Designing for concurrency:


Processes need to be designed to handle concurrent execution without causing conflicts or bottlenecks.
 Ensuring fault tolerance:
Mechanisms need to be in place to handle node failures and ensure data integrity.
 Managing communication:
Efficient and reliable communication between nodes is crucial for distributed processes.

5. What is DFS (Distributed File System)?


A Distributed File System (DFS) is a file system that is distributed across multiple file servers or multiple locations. It allows programs to access and store remote files just as they do local ones, letting users access files from any computer on the network.
How a DFS works
A DFS clusters together multiple storage nodes and logically distributes data sets across
multiple nodes that each have their own computing power and storage. The data on a DFS can
reside on various types of storage devices, such as solid-state drives and hard disk drives.

Data sets are replicated onto multiple servers, which enables redundancy to keep data
highly available. The DFS is located on a collection of servers, mainframes or a cloud
environment over a local area network (LAN) so multiple users can access and store
unstructured data. If organizations need to scale up their infrastructure, they can add more
storage nodes to the DFS.
Clients access data on a DFS using namespaces. Organizations can group shared folders into
logical namespaces. A namespace is the shared group of networked storage on a DFS root.

Features of a DFS
Organizations use a DFS for features such as scalability, security and remote access to
data. Features of a DFS include the following:

 Location independence. Users do not need to be aware of where data is stored. The DFS
manages the location and presents files as if they are stored locally.

 Transparency. Transparency hides the details of the underlying file system from users and from other file systems. Distributed file systems provide several kinds of transparency, such as access, location, and replication transparency.

 Scalability. To scale a DFS, organizations can add file servers or storage nodes.

 High availability. The DFS should continue to work in the event of a partial failure in the
system, such as a node failure or drive crash. A DFS should also create backup copies if
there are any failures in the system.

 Security. Data should be encrypted at rest and in transit to prevent unauthorized access or
data deletion.

Working of DFS
There are two ways in which DFS can be implemented:
 Standalone DFS namespace: It allows only for DFS roots that exist on the local computer and do not use Active Directory. A standalone DFS can be accessed only on the computer on which it is created. It does not provide any fault tolerance and cannot be linked to any other DFS. Standalone DFS roots are rarely encountered because of their limited advantages.
 Domain-based DFS namespace: It stores the configuration of DFS in Active Directory, making the DFS namespace root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>.
6.Transaction service and concurrency control-

Concurrency Control in Distributed Transactions

Concurrency control mechanisms provide various concepts and implementations to ensure that the execution of transactions across nodes does not violate the ACID or BASE properties (depending on the database), which would otherwise cause inconsistency and corruption of data in the distributed system.
Types of Concurrency Control Mechanisms
There are two types of concurrency control mechanisms:

Pessimistic Concurrency Control (PCC)


Pessimistic concurrency control proceeds on the assumption that most transactions will try to access the same resource simultaneously. It prevents concurrent access to a shared resource by requiring a transaction to acquire a lock on a data item before performing any operation on it.
Optimistic Concurrency Control (OCC)
Optimistic concurrency control assumes that conflicts between transactions are rare. The problem with pessimistic schemes is that once a transaction acquires a lock on a resource, no other transaction can access it, which limits concurrency. Under OCC, transactions execute without locking and are validated at commit time; if another transaction has modified the data in the meantime, the transaction is aborted and retried. A small sketch contrasting the two mechanisms is given below.
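
A minimal sketch contrasting the two mechanisms on a single shared data item. The Account class, its version field, and the retry loop are illustrative assumptions rather than a real database API.

```python
# Hedged sketch: pessimistic (lock before access) vs. optimistic (validate at commit).
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.version = 0
        self.lock = threading.Lock()

    def withdraw_pessimistic(self, amount):
        # PCC: acquire the lock before touching the item; other transactions must wait.
        with self.lock:
            if self.balance >= amount:
                self.balance -= amount

    def withdraw_optimistic(self, amount):
        # OCC: read without locking, then validate at commit time that nobody else
        # changed the item; retry if the version has moved on.
        while True:
            seen_version, seen_balance = self.version, self.balance
            new_balance = seen_balance - amount
            with self.lock:                       # only the short commit step is protected
                if self.version == seen_version:  # validation phase
                    self.balance = new_balance
                    self.version += 1
                    return True                   # commit succeeded
            # validation failed: another transaction committed first, so retry

acct = Account(100)
acct.withdraw_pessimistic(30)
acct.withdraw_optimistic(20)
print(acct.balance)   # 50
```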

7. Replication in Distributed System-


Replication in distributed systems involves creating duplicate copies of data or services
across multiple nodes. This redundancy enhances system reliability, availability, and
performance by ensuring continuous access to resources despite failures or increased
demand.
Data replication refers to the process of creating and maintaining multiple
copies of data across different storage locations or systems. It involves
duplicating data from a source location, known as the primary or master
copy, to one or more secondary copies. These secondary copies can be
stored on separate servers, data centers, or even in different geographical
locations.

Importance of Replication in Distributed Systems


Replication plays a crucial role in distributed systems due to several important reasons:
 Enhanced Availability:
o By replicating data or services across multiple nodes in a distributed system, you
ensure that even if some nodes fail or become unreachable, the system as a
whole remains available.
o Users can still access data or services from other healthy replicas, thereby
improving overall system availability.
 Improved Reliability:
o Replication increases reliability by reducing the likelihood of a single point of
failure.
o If one replica fails, others can continue to serve requests, maintaining system
operations without interruption.
o This redundancy ensures that critical data or services are consistently
accessible.
 Reduced Latency:
o Replicating data closer to users or clients can reduce latency, or the delay in
data transmission.
o This is particularly important in distributed systems serving users across different
geographic locations.
o Users can access data or services from replicas located nearer to them,
improving response times and user experience.
 Scalability:
o Replication supports scalability by distributing the workload across multiple
nodes.
o As the demand for resources or services increases, additional replicas can be
deployed to handle increased traffic or data processing requirements.
o This elasticity ensures that distributed systems can efficiently handle varying
workloads.
There are three popular data replication algorithms:
1. Single-leader replication, also known as primary-secondary, active-passive, or master-slave replication. Replication can be either synchronous or asynchronous (see the sketch after this list).
2. Multi-leader replication, in which multiple leaders replicate the data to their followers. Replication can be either synchronous or asynchronous.
3. Leaderless replication, in which there is no single leader. Each node is responsible for both read and write operations, replicating data to other nodes as well as receiving data from other nodes.
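
A minimal sketch of single-leader replication (item 1 above) with synchronous propagation: every write is applied on the leader and then pushed to each follower before it is acknowledged. The class names Replica and Leader are illustrative only.

```python
# Hedged sketch of single-leader (primary-secondary) replication.

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Leader(Replica):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers

    def write(self, key, value):
        self.apply(key, value)                 # commit locally first
        for f in self.followers:               # synchronous replication: wait for
            f.apply(key, value)                # every follower before acknowledging

followers = [Replica(), Replica()]
leader = Leader(followers)
leader.write("x", 42)
print(followers[0].data["x"])   # 42 -- reads can be served from a follower
```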

8.Case studies-

Sun’s Network File System (NFS). One of the first uses of distributed client/server computing was in the realm of distributed file systems. In such an environment, there are a number of client machines and one server (or a few); the server stores the data on its disks, and clients request data through well-formed protocol messages. Figure 49.1 depicts the basic setup: a generic client/server system in which several clients reach the server and its disks over a network.

As you can see from the picture, the server has the disks, and clients send messages across a network to access their directories and files on those disks. Why do we bother with this arrangement? (i.e., why don’t we just let clients use their local disks?) Well, primarily this setup allows for easy sharing of data across clients. Thus, if you access a file on one machine (Client 0) and then later use another (Client 2), you will have the same view of the file system. Your data is naturally shared across these different machines. A secondary benefit is centralized administration; for example, backing up files can be done from the few server machines instead of from the multitude of clients. Another advantage could be security; having all servers in a locked machine room prevents certain types of problems from arising.
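
The toy sketch below (not the real NFS protocol) illustrates the message-based access described above: clients build well-formed read requests, the server resolves them against files stored on its disks, and every client sees the same data. All paths and message fields are made up.

```python
# Toy illustration of clients reading a shared file through protocol messages.

class FileServer:
    def __init__(self, files):
        self.files = files                      # path -> bytes stored on the server's disks

    def handle(self, request):
        op, path, offset, count = request
        if op == "READ":
            return self.files[path][offset:offset + count]

server = FileServer({"/home/alice/notes.txt": b"shared across all clients"})

# Two different client machines issue the same request and see the same data.
request = ("READ", "/home/alice/notes.txt", 0, 6)
print(server.handle(request))   # b'shared' as seen from Client 0
print(server.handle(request))   # b'shared' as seen from Client 2
```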

9.General parallel file system-

The General Parallel File System (GPFS) is a high-performance clustered file system
developed by IBM. It is designed to provide rapid access to large volumes of data while
maintaining high availability and scalability. GPFS is widely used in the field of high-
performance computing (HPC) and is particularly well-suited for applications that require
parallel access to large files or datasets.

GPFS is a distributed file system that allows multiple servers or nodes to access and
manage a shared file system simultaneously.

GPFS is known for its ability to handle large files and datasets. It supports file sizes up
to several petabytes and can efficiently distribute data across multiple disks and servers
to achieve high levels of parallelism. This parallel access to storage resources allows for
faster data processing and analysis, as multiple nodes can read and write data
simultaneously.
One of the key features of GPFS is its high availability: data and metadata can be replicated across disks and nodes, so the file system keeps operating when individual components fail.

Key Features and Benefits of GPFS

GPFS offers a wide range of features and benefits that make it a popular choice for
organizations dealing with large-scale data processing and storage. Some of the key
features and benefits of GPFS include:

Scalability: GPFS can scale to support thousands of servers and petabytes of data,
making it suitable for environments that require massive storage and processing
capabilities.
Performance: GPFS delivers high-performance data access and processing through its
parallel architecture, enabling organizations to process large datasets quickly.
Reliability: GPFS provides built-in fault tolerance mechanisms, such as data replication
and distributed metadata, to ensure data availability and integrity even in the event of
hardware failures.
Flexibility: GPFS supports a variety of storage technologies, including traditional
spinning disks, solid-state drives (SSDs), and even cloud storage.
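
A minimal sketch of the striping idea that underlies GPFS-style parallel access: a file is split into fixed-size blocks placed round-robin across several disks, so different servers can work on different blocks at the same time. The block size and disk count are tiny, made-up values, not GPFS defaults.

```python
# Hedged sketch of round-robin block striping across disks.

BLOCK_SIZE = 4          # bytes; deliberately tiny for the example
DISKS = 3

def stripe(data: bytes):
    """Map each block of the file to the disk it would be stored on."""
    layout = {}
    for block_no in range(0, len(data), BLOCK_SIZE):
        block = data[block_no:block_no + BLOCK_SIZE]
        disk = (block_no // BLOCK_SIZE) % DISKS          # round-robin placement
        layout.setdefault(disk, []).append(block)
    return layout

print(stripe(b"abcdefghijklmnopqrstuvwx"))
# {0: [b'abcd', b'mnop'], 1: [b'efgh', b'qrst'], 2: [b'ijkl', b'uvwx']}
```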

10.Windows file system-


Windows uses different file systems to manage how data is stored and accessed on disks. The
primary file system used by modern Windows is NTFS, which is the standard for most hard
drives and solid-state drives. Other file systems include FAT32, exFAT, and ReFS, each with
specific use cases and advantages.

Key Windows File Systems:


 NTFS (New Technology File System):
The default and most widely used file system in Windows, offering features like security, encryption, and
advanced metadata management. It's used on the main hard drive and is generally the best choice for
modern Windows machines.

 FAT32 (File Allocation Table 32):


A legacy file system that's still used for some older devices and in situations where compatibility is a primary
concern. It has limitations on file size and disk size compared to NTFS.
 exFAT (Extended FAT):
A more recent version of FAT designed for large drives like USB flash drives and memory cards. It supports
larger file sizes than FAT32.
 ReFS (Resilient File System):
Primarily used on Windows Server, ReFS is designed for large storage environments and offers features like
data integrity and scalability.

11. Andrew File System


The Andrew File System (AFS) is a distributed file system developed by Morris et al. at Carnegie Mellon University in collaboration with IBM and described in 1986. It was designed for large-scale distributed computing environments and allows multiple computers to share files and data seamlessly across a network, as if they were local files.
Andrew File System Architecture
 Vice: The Andrew File System provides a homogeneous, location-transparent file
namespace to all client workstations by utilizing a group of trustworthy servers known as
Vice. The Berkeley Software Distribution of the Unix operating system is used on both
clients and servers. Each workstation’s operating system intercepts file system calls and
redirects them to a user-level process on that workstation.
 Venus: This mechanism, known as Venus, caches files from Vice and returns updated
versions of those files to the servers from which they originated. Only when a file is opened
or closed does Venus communicate with Vice; individual bytes of a file are read and written directly on the cached copy, bypassing Venus.
This file system architecture was largely inspired by the need for scalability. To increase the number of clients a server can service, as much work as possible is performed by Venus rather than by Vice. Vice retains only the functionality necessary for the file system’s integrity,
availability, and security. The servers are set up as a loose confederacy with little connectivity
between them.
Figure: Andrew File System

The following are the server and client components used in AFS networks:
 Any computer that creates requests for AFS server files hosted on a network qualifies as a
client.
 The file is saved in the client machine’s local cache and shown to the user once a server
responds and transmits a requested file.
 When a cached file is modified, the client sends the changes back to the server; the server uses a callback mechanism to notify clients whose cached copies have become stale. The client machine’s local cache stores frequently used files for rapid access (a simplified sketch of this interaction follows).
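
A highly simplified sketch (not real AFS code) of the Venus/Vice interaction described above: the client fetches and caches a whole file on open, ships the updated file back on close, and the server breaks callbacks so other clients' cached copies are invalidated. All class and method names are invented.

```python
# Hedged sketch of whole-file caching with server callbacks.

class Vice:                                   # the trusted server side
    def __init__(self):
        self.files = {}
        self.callbacks = {}                   # path -> clients holding a cached copy

    def fetch(self, path, client):
        self.callbacks.setdefault(path, set()).add(client)
        return self.files.get(path, "")

    def store(self, path, data, client):
        self.files[path] = data
        for other in self.callbacks.get(path, set()) - {client}:
            other.invalidate(path)            # break the callback promise

class Venus:                                  # the client-side cache manager
    def __init__(self, server):
        self.server, self.cache = server, {}

    def open(self, path):
        if path not in self.cache:            # contact Vice only on open
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def close(self, path, data):              # ship the updated file back on close
        self.cache[path] = data
        self.server.store(path, data, self)

    def invalidate(self, path):
        self.cache.pop(path, None)

server = Vice()
a, b = Venus(server), Venus(server)
a.close("/afs/doc.txt", "v1")
print(b.open("/afs/doc.txt"))   # "v1"
a.close("/afs/doc.txt", "v2")   # b's cached copy is invalidated via the callback
print(b.open("/afs/doc.txt"))   # "v2", fetched again
```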

12. Coda file system-

Coda is a distributed file system developed as a research project at Carnegie Mellon University since 1987
under the direction of Mahadev Satyanarayanan. It descended directly from an older version of Andrew File
System (AFS-2) and offers many similar features. The InterMezzo file system was inspired by Coda.
Features-
Coda has many features that are desirable for network file systems, and several features not found elsewhere.

1. Disconnected operation for mobile computing.
2. Is freely available under the GPL.
3. High performance through client-side persistent caching
4. Server replication
5. Security model for authentication, encryption and access control
6. Continued operation during partial network failures in server network
7. Network bandwidth adaptation
8. Good scalability
9. Well defined semantics of sharing, even in the presence of network failure
Coda uses a local cache to provide access to server data when the network connection is lost.
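
A highly simplified sketch of that disconnected operation: while the connection is down, the client serves reads from its cache and logs writes locally, then replays the log to the server on reconnection (reintegration). The CodaClient class and its methods are invented for illustration, not Coda's actual API.

```python
# Hedged sketch of disconnected operation with a local cache and update log.

class CodaClient:
    def __init__(self, server_files):
        self.server = server_files            # a plain dict stands in for the Coda servers
        self.cache, self.log = {}, []
        self.connected = True

    def read(self, path):
        if self.connected:
            self.cache[path] = self.server.get(path, "")   # hoard the file into the cache
        return self.cache.get(path, "")       # served from the cache when disconnected

    def write(self, path, data):
        self.cache[path] = data
        if self.connected:
            self.server[path] = data
        else:
            self.log.append((path, data))     # remember the update for later

    def reconnect(self):
        self.connected = True
        for path, data in self.log:           # reintegration: replay logged updates
            self.server[path] = data
        self.log.clear()

server = {"/coda/report.txt": "draft"}
client = CodaClient(server)
client.read("/coda/report.txt")
client.connected = False                      # network connection lost
client.write("/coda/report.txt", "final")     # still works, logged locally
client.reconnect()
print(server["/coda/report.txt"])             # "final" after reintegration
```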

A distributed file system stores files on one or more computers called servers, and
makes them accessible to other computers called clients, where they appear as normal
files. There are several advantages to using file servers: the files are more widely
available since many computers can access the servers, and sharing the files from a
single location is easier than distributing copies of files to individual clients. Backups
and safety of the information are easier to arrange since only the servers need to be
backed up. The servers can provide large storage space, which might be costly or
impractical to supply to every client. The usefulness of a distributed file system
becomes clear when considering a group of employees sharing documents. However,
more is possible. For example, sharing application software is an equally good
candidate. In both cases system administration becomes easier.

There are many problems facing the design of a good distributed file system.
Transporting many files over the network can easily create sluggish performance and high latency; network bottlenecks and server overload can result. The security of data is
another important issue: how can we be sure that a client is really authorized to have
access to information and how can we prevent data being sniffed off the network?
Two further problems facing the design are related to failures.
