OS Unit-4
FILE:
A file is a named collection of related information that is recorded on secondary
storage. The information in a file is defined by its creator. Many different types of
information may be stored in a file: source programs, object programs, executable programs,
numeric data, text, payroll records, graphic images, sound recordings, and so on.
A file has a certain defined structure which depends on its type.
• A text file is a sequence of characters organized into lines.
• A source file is a sequence of subroutines and functions, each of which is further organized
as declarations followed by executable statements.
• An object file is a sequence of bytes organized into blocks understandable by the system's
linker.
• An executable file is a series of code sections that the loader can bring into memory and
execute.
A file system is typically organised in layers:
1. APPLICATION PROGRAM:- This is the top-most layer, where users interact
with files through applications. It provides the user interface for file operations like
creating, deleting, reading and writing. Examples include text editors and file
browsers.
2. LOGICAL FILE SYSTEM:- It manages the metadata information about a file,
i.e. all details about a file except the actual contents of the file.
6. DEVICES:- The bottom-most layer, consisting of the actual hardware devices. It
performs the actual reading and writing of data to the physical storage medium.
a) FILE IDENTIFICATION:-
i. File name:- The name of the file is part of its metadata and is used to identify and
access the file.
ii. File type:- Metadata includes information about the type of the file to ensure proper
handling and access by appropriate applications.
b) FILE ATTRIBUTES:-
i. Size:- The size of the file (in bytes) is stored in the metadata and is used to allocate
space and manage file storage.
ii. Permissions:- Information about who can read, write or execute the file. This is
essential for security and access control.
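The identification and attribute fields above can be pictured as a single metadata record per
file. Below is a minimal Python sketch of such a record; the field names and values are
illustrative, not a real inode layout.

# A minimal sketch of the metadata a logical file system might keep for one file.
from dataclasses import dataclass

@dataclass
class FileMetadata:
    name: str          # identification
    file_type: str     # e.g. "text", "executable"
    size_bytes: int    # used to allocate space and manage storage
    permissions: str   # e.g. "rw-r--r--" style access control

meta = FileMetadata(name="notes.txt", file_type="text",
                    size_bytes=2048, permissions="rw-r--r--")
print(meta)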
Directory structure
Directories are mainly used in organising files. A directory is a collection
of files. Directories store the files and the attributes of the files.
Single-level directory:
Advantages:
▪ Simple to implement.
▪ Faster file access.
➢ Disadvantages:
▪ Naming problem.
▪ Grouping problem.
Two-level directory:
Advantages:
▪ Different users can have files with the same name.
▪ Efficient searching.
➢ Disadvantages:
▪ Grouping problem.
▪ Poor file organization.
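To make the naming trade-off concrete, here is a small Python sketch contrasting a
single-level directory with a two-level (per-user) directory; the user names, file names,
and stored values are illustrative only.

# Single-level: one global name space, so two users cannot both have "a.txt".
single_level = {"a.txt": "file-1", "b.txt": "file-2"}

# Two-level: each user has a separate directory, so the same name can be reused.
two_level = {
    "alice": {"a.txt": "file-3"},
    "bob":   {"a.txt": "file-4"},   # same name as alice's file, no clash
}

def lookup(user, name):
    """Resolve a file name inside the given user's directory."""
    return two_level[user][name]

print(lookup("bob", "a.txt"))  # file-4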
ALLOCATION METHODS
Allocation methods address the problem of allocating space to files so that disk space
is utilized effectively and files can be accessed quickly.
Three methods exist for allocating disk space:
• Contiguous allocation
• Linked allocation
• Indexed allocation
a) Contiguous Allocation:-
In Contiguous Allocation, a file's data is stored in a sequence of consecutive blocks
on the storage medium. This means all the data blocks for the file are located next to each
other on the disk without gaps. The starting block and the length (number of blocks) are
recorded in the file's metadata.
Advantages:-
• Faster access
• Simple and efficient for small files.
Disadvantages-
• Fragmentation
• Wastage of Space
• Limited scalability for large and dynamic systems.
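As an illustration of how contiguous allocation works, the following Python sketch keeps a
hypothetical directory of (start, length) entries and maps a logical block number to its
physical block; the file names and block numbers are made up.

# Contiguous allocation: logical block i of a file maps directly to start + i.
directory = {
    "report.txt": {"start": 14, "length": 3},   # occupies blocks 14, 15, 16
    "image.bmp":  {"start": 30, "length": 5},   # occupies blocks 30..34
}

def physical_block(name, logical_block):
    """Translate a logical block number within a file to a disk block."""
    entry = directory[name]
    if not 0 <= logical_block < entry["length"]:
        raise IndexError("logical block outside the file")
    return entry["start"] + logical_block

print(physical_block("report.txt", 2))  # 16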
b) Linked Allocation:-
In Linked Allocation, a file is divided into multiple blocks that can be scattered
across the disk. Each block stores the data along with a pointer to the next block of the
file. This eliminates the need for the file to occupy contiguous space.
Advantages:
• No Contiguous Space Needed
• Dynamic File Size
• Efficient Space Utilization
Disadvantages:
• Slower Access
• Pointer Overhead
• Risk of Broken Links
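The following Python sketch illustrates linked allocation under the same kind of assumptions
(a made-up dictionary stands in for disk blocks): each block stores data plus a pointer to the
next block, and a file is read by following the chain from the first block recorded in the
directory.

# Linked allocation: blocks can be scattered; each block points to the next one.
disk = {
    9:  {"data": "He", "next": 16},
    16: {"data": "ll", "next": 1},
    1:  {"data": "o!", "next": None},   # None marks the last block of the file
}

directory = {"greeting.txt": 9}   # the directory stores only the first block

def read_file(name):
    """Follow the chain of pointers and concatenate the data blocks."""
    block = directory[name]
    contents = []
    while block is not None:
        contents.append(disk[block]["data"])
        block = disk[block]["next"]
    return "".join(contents)

print(read_file("greeting.txt"))  # Hello!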
c) Indexed Allocation
In Indexed Allocation, a special block (called the index block) is used to store pointers to
all the data blocks of a file. Instead of linking blocks together (as in linked allocation), all
pointers to a file's blocks are collected in one place: the index block. This approach allows the
file's blocks to be scattered anywhere on the disk, while still supporting both sequential and
random access efficiently.
Advantages:
• No Fragmentation Issues
• Efficient Access
• Supports Dynamic File Size
• Simplified Management
Disadvantages:
• Overhead of the Index Block
• Limited File Size
• Extra Access for Index Block
• Complex Implementation
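A minimal Python sketch of indexed allocation is shown below; the block numbers and file name
are illustrative. Note how the i-th block of the file is reached directly through the index
block, which is what enables efficient random access.

# Indexed allocation: one index block holds pointers to all data blocks of the file.
disk = {
    7:  [25, 3, 18],          # index block for "notes.txt"
    25: "first block ",
    3:  "second block ",
    18: "third block",
}

directory = {"notes.txt": 7}   # the directory stores the index block number

def read_block(name, i):
    """Random access: look up the i-th data block through the index block."""
    index_block = disk[directory[name]]
    return disk[index_block[i]]

print(read_block("notes.txt", 1))  # "second block "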
FREE SPACE MANAGEMENT
2. Free list
In this method, all the free blocks existing in the disk are linked together in a linked
list. The address of the first free block is stored somewhere in the memory. Each free
block contains a pointer that contains the address to the next free block. The last free
block points to null, indicating the end of the linked list.
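The following Python sketch models such a linked free list; the dictionary of next pointers
and the block numbers are illustrative. Allocation pops the block at the head of the list,
and freeing a block pushes it back onto the front.

# Linked free list: each free block stores the address of the next free block.
next_free = {7: 12, 12: 30, 30: None}   # 7 -> 12 -> 30 -> end
free_head = 7                           # address of the first free block, kept in memory

def allocate_block():
    """Remove and return the block at the head of the free list."""
    global free_head
    if free_head is None:
        return None
    block = free_head
    free_head = next_free.pop(block)
    return block

def free_block(block):
    """Push a newly freed block onto the front of the free list."""
    global free_head
    next_free[block] = free_head
    free_head = block

print(allocate_block())  # 7
free_block(99)
print(free_head)         # 99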
3. Grouping
The grouping technique is also called the "modification of a linked list" technique. In
this method, the first free block of memory contains the addresses of n free blocks.
The last of these n free blocks contains the addresses of the next n free blocks of
memory, and so on. This technique keeps the empty and occupied blocks of memory
separate.
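A small Python sketch of grouping is given below. The exact on-disk layout varies between
textbooks; here each index block is assumed to hold the addresses of a few free blocks, with
its last entry pointing to the next such index block (None marks the end). The block numbers
are illustrative.

# Grouping: chains of blocks that each list the addresses of the next group of free blocks.
N = 4  # entries per index block (illustrative)

# disk[block] -> list of addresses stored in that index block
disk = {
    10: [11, 12, 13, 20],    # 11, 12, 13 are free; 20 is the next index block
    20: [21, 22, 23, None],  # None marks the end of the chain
}

def list_free_blocks(first_index_block):
    """Walk the chain of index blocks and collect all listed free block numbers."""
    free = []
    index = first_index_block
    while index is not None:
        entries = disk[index]
        free.extend(b for b in entries[:-1] if b is not None)
        index = entries[-1]          # the last entry points to the next group
    return free

print(list_free_blocks(10))  # [11, 12, 13, 21, 22, 23]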
4. Counting
This method takes advantage of the fact that several contiguous blocks may be
allocated or freed simultaneously. In this method, a linked list is maintained but in
addition to the pointer to the next free block, a count of free contiguous blocks that
follow the first block is also maintained. Thus each free block on the disk contains
two things: a pointer to the next free block, and the number of free contiguous blocks
following it.
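The counting technique can be sketched in Python as a list of (start, count) pairs; the
numbers below are illustrative, and the allocator simply takes the first run that is large
enough.

# Counting: the free list is kept as runs of contiguous free blocks.
free_runs = [(5, 3), (20, 6), (40, 2)]   # blocks 5-7, 20-25, 40-41 are free

def allocate(n):
    """Allocate n contiguous blocks from the first run large enough (first fit)."""
    for i, (start, count) in enumerate(free_runs):
        if count >= n:
            if count == n:
                free_runs.pop(i)
            else:
                free_runs[i] = (start + n, count - n)
            return list(range(start, start + n))
    return None  # no run large enough

print(allocate(4))   # [20, 21, 22, 23]
print(free_runs)     # [(5, 3), (24, 2), (40, 2)]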
Comparison of Free Space Management Techniques
Feature         | Bit Map                          | Linked List           | Grouping                    | Counting
Structure       | Bit for each block               | List of free blocks   | Groups of multiple blocks   | Count of consecutive blocks
Efficiency      | Fast for small spaces            | Fast for small spaces | Efficient for large spaces  | Very efficient for large blocks
Space Overhead  | Low, but grows with size         | High due to pointers  | Low for large groups        | Minimal, just the count
Fragmentation   | Can cause internal fragmentation | Low fragmentation     | Reduces fragmentation       | Minimizes fragmentation
Complexity      | Simple to implement              | Moderate complexity   | More complex                | Simple but needs management
Access Time     | Constant time                    | Linear search         | Faster for groups           | Constant time for large blocks
Fragmentation
Fragmentation occurs when the available storage space in a system is divided into
smaller, non-contiguous blocks, causing inefficiency in file storage. This issue can negatively
impact system performance, making it harder to allocate or access large files effectively.
Fragmentation is categorized into two main types: Internal Fragmentation and External
Fragmentation.
FILE SYSTEM PERFORMANCE
File system performance is commonly measured using the following metrics:
• Throughput: The amount of data that can be read or written per unit of time.
• Latency: The time it takes for a read or write operation to complete.
• IOPS (Input/Output Operations Per Second): The number of read and write
requests that can be processed per second.
• Response Time: The time it takes for a request to be processed and a response to
be returned.
Several factors affect file system performance:
1. Hardware:
• Disk Speed: Faster disks (e.g., SSDs) can significantly improve performance.
• Disk Controller: A high-performance disk controller can optimize data transfer
rates.
• Memory: More memory can reduce disk I/O and improve overall system
performance.
2. Software:
• File System Type: Different file systems have different performance
characteristics. Choosing the right file system for your workload is important.
• File System Configuration: Proper configuration of file system parameters,
such as block size and journal size, can impact performance.
• Kernel Settings: Kernel parameters related to I/O scheduling and caching
can affect file system performance.
• I/O Pattern: The pattern of I/O requests (sequential, random, small, large)
can significantly impact performance.
• Concurrency: The number of concurrent I/O requests can affect performance,
especially in high-load environments.
DATA RECOVERY
Data recovery is the process of restoring data that has been lost, accidentally
deleted, corrupted, or made inaccessible.
There are three basic types of recovery:
• Instance recovery
• Crash recovery
• Media recovery
INSTANCE RECOVERY
Instance recovery is a process used to restore the consistency of a system after a failure,
such as a crash, unexpected shutdown, or hardware failure. The main steps are:
1. Failure Detection: The system identifies that an instance (e.g., a database (DBMS) or
virtual machine) has failed or crashed.
2. Checkpoint/Log Replay: The system uses saved checkpoints or logs to restore the
instance to a consistent state (see the sketch after this list).
3. Consistency Checks: The system performs integrity checks to ensure data
consistency across all structures (e.g., database tables, file systems).
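A toy Python sketch of step 2 (checkpoint/log replay) is shown below; it is not a real DBMS,
just an illustration of rebuilding a consistent state by replaying logged updates on top of
the last checkpoint. The state and log contents are made up.

# Replay a redo log over the last checkpointed state.
checkpoint = {"balance": 100}                    # state saved before the failure
redo_log = [("balance", 120), ("balance", 90)]   # updates logged after the checkpoint

def recover(checkpoint, redo_log):
    """Rebuild a consistent state by replaying the redo log over the checkpoint."""
    state = dict(checkpoint)
    for key, value in redo_log:
        state[key] = value
    return state

print(recover(checkpoint, redo_log))  # {'balance': 90}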
CRASH RECOVERY
Crash recovery in an operating system refers to the process of restoring the system to
a consistent and operational state after a crash or failure, such as a power failure, hardware
failure, or software bug.
Types of Failures
• System Crash: Occurs when the operating system halts due to a software bug or
system-level error.
• Hardware Failure: Includes issues such as disk failure or memory corruption.
• Power Failure: Results in abrupt shutdown, potentially leaving processes
incomplete.
Media Recovery
Media recovery refers to the process of restoring a system or database after a
failure that involves permanent damage to the storage medium, such as a disk
corruption or data loss.
Media Recovery Process
Failure Identification: The system detects damage to the storage medium
Data Restoration: Media recovery uses backups and archived logs to rebuild or
replace the damaged parts of the storage.
Log Replay: Just like in instance recovery, logs are replayed to restore data
consistency. This ensures that all transactions completed after the last backup are
recovered.
Consistency Checks: Integrity checks are performed to verify that the recovered data
is consistent and matches the intended structure.
• Deadlock resolution: When a deadlock is detected, a recovery scheme can be used to restore
system functionality. This can involve:
• Process termination: Terminating one or more processes involved in the deadlock.
• Rollback: Rolling back the state of certain processes to a point where the deadlock is not present.
• Resource protection: The process of protecting system resources through protocols and program
interfaces.
• Data Integrity: Data integrity ensures that the data stored in the file system remains accurate,
consistent, and unaltered except by authorized operations. Any corruption, whether caused by
software bugs, power failures, or hardware malfunctions, can lead to data loss or inconsistencies.
How it is ensured: Checksums and hashing. File systems often use checksums or cryptographic
hashes to verify data during read and write operations; these checks can detect corruption in
stored data.
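A minimal Python sketch of checksum-based verification is shown below, using hashlib from the
standard library; the block layout is an assumption made only for illustration.

# Store a hash next to the data when writing, and recompute it on every read.
import hashlib

def write_block(data: bytes):
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}

def read_block(block):
    if hashlib.sha256(block["data"]).hexdigest() != block["checksum"]:
        raise IOError("checksum mismatch - block is corrupted")
    return block["data"]

stored = write_block(b"payroll records")
print(read_block(stored))          # reads back cleanly
stored["data"] = b"tampered data"  # simulate corruption on disk
# read_block(stored)               # would now raise IOError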
Fault Tolerance: Fault tolerance refers to the file system’s ability to continue operating correctly
despite the occurrence of hardware or software failures. A fault-tolerant system can handle failures
without losing data or requiring a complete shutdown.
• Redundancy: Techniques like RAID (Redundant Array of Independent Disks) or data mirroring
provide redundancy, ensuring that if one disk fails, the system can continue operating using a backup
copy of the data.
• Snapshots and Backups: Regular snapshots and backups create restore points,
enabling data recovery in case of corruption or loss. File systems like ZFS and Btrfs allow for
snapshot-based recovery.
Mass Storage Structure
• Systems designed to store enormous volumes of data are referred to as mass storage
devices.
• Magnetic disks provide the bulk of secondary storage in modern computers.
• Drives rotate at 60 to 250 times per second, and a typical transfer rate is 100-150
MB/sec.
• Positioning time is the time to move the disk arm to the desired cylinder (seek time)
plus the time for the desired sector to rotate under the disk head (rotational latency);
typically 5-10 ms. A rough calculation is sketched after this list.
• A head crash results from the disk head making contact with the disk surface, which is bad.
• Disks can be removable.
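As noted above, here is a rough back-of-the-envelope calculation of average access time
(seek time + rotational latency + transfer time) in Python; the disk figures are illustrative
values chosen from the ranges quoted in the list above.

# Average access time for one 4 KB block on a typical magnetic disk (illustrative figures).
seek_time_ms = 5.0                           # average seek time
rpm = 7200                                   # 120 rotations per second
rotational_latency_ms = (60_000 / rpm) / 2   # half a rotation on average
transfer_rate_mb_s = 125                     # within the 100-150 MB/sec range
block_kb = 4
transfer_ms = block_kb / 1024 / transfer_rate_mb_s * 1000

access_ms = seek_time_ms + rotational_latency_ms + transfer_ms
print(f"average access time ~ {access_ms:.2f} ms")   # ~9.2 ms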
Advantages:
• Cost-Effectiveness
• Portability
• Better storage security
• Easier to manage
Disadvantages:
• Size and weight
• Power consumption
• Noise
• Sensitivity to physical damage
The higher data transfer rates of modern mass storage devices have increased
the overall throughput of data access, improving the performance of data-
intensive applications i.e. current devices can achieve transfer speeds of several
gigabytes per second, dramatically accelerating data operations.
• Enhanced Parallel Access
Modern mass storage supports advanced parallel access capabilities, allowing multiple read
and write operations to occur simultaneously. This parallelism significantly improves overall
system performance, especially in multi-user environments and high-load scenarios.
• Enhanced Reliability
Improved data integrity and fault-tolerance features of mass storage devices ensure the
reliability of stored information.
• Improved Capacity
The growing storage capacities of mass storage devices enable the efficient management and
archiving of large data sets.
• Optical Discs
Optical storage, such as CDs, DVDs, and Blu-ray discs, uses laser technology to read and
write data, offering portable and relatively inexpensive storage.
Disk Scheduling
Disk scheduling is a technique operating systems use to manage the order in which
disk I/O (input/output) requests are processed. Disk scheduling is also known as I/O
Scheduling.
Disk Scheduling Algorithms
Disk scheduling algorithms are crucial in managing how data is read from and
written to a computer’s hard disk. These algorithms help determine the order in which disk
read and write requests are processed, significantly impacting the speed and efficiency of
data access.
Key algorithms include First-Come, First-Served (FCFS), which processes requests in arrival
order but can lead to high seek times; Shortest Seek Time First (SSTF), which prioritizes
requests closest to the current head position to reduce seek time but risks starvation; SCAN
(Elevator Algorithm) and its variation C-SCAN, which sweep across the disk in one or both
directions, providing a balanced approach; and LOOK/C-LOOK, which optimize SCAN by
only traveling as far as the last request. Modern trends in disk scheduling, like real-time
scheduling and algorithms for SSDs, address unique challenges posed by evolving storage
technologies, focusing on workload efficiency and fairness.
1. FCFS (First Come First Serve)
FCFS processes requests in the order in which they arrive in the disk queue.
Advantages of FCFS
• Every request gets a fair chance
• No indefinite postponement
Disadvantages of FCFS
• Does not try to optimize seek time
• May not provide the best possible service
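A minimal Python sketch of FCFS is shown below; it simply adds up the head movement in arrival
order. The request queue and starting head position are the same illustrative values used in
the C-SCAN example later in this section.

def fcfs(requests, head):
    """Total head movement when requests are serviced strictly in arrival order."""
    movement = 0
    for track in requests:
        movement += abs(track - head)
        head = track
    return movement

print(fcfs([82, 170, 43, 140, 24, 16, 190], head=50))  # 642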
2. SSTF (Shortest Seek Time First)
SSTF services the pending request closest to the current head position first, reducing seek time.
Advantages
• The average Response Time decreases
• Throughput increases
Disadvantages
• Overhead to calculate seek time in advance
• Can cause Starvation for a request if it has a higher seek time as compared to
incoming requests
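A corresponding SSTF sketch: at each step the pending request nearest the current head position
is serviced. The request queue is the same illustrative one as above.

def sstf(requests, head):
    """Total head movement when the closest pending request is always serviced next."""
    pending = list(requests)
    movement = 0
    while pending:
        nearest = min(pending, key=lambda t: abs(t - head))
        movement += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return movement

print(sstf([82, 170, 43, 140, 24, 16, 190], head=50))  # 208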
3. SCAN
It is also called the Elevator Algorithm. In this algorithm, the disk arm moves in a
particular direction till the end, satisfying all the requests coming in its path, and then it
turns back and moves in the reverse direction, satisfying the requests coming in its path. It
works the way an elevator works: the elevator moves in one direction completely till the last
floor of that direction and then turns back.
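A minimal Python sketch of SCAN total head movement, assuming a 200-track disk (tracks 0-199)
and that the arm first moves toward larger track numbers; the request queue is the same
illustrative one used elsewhere in this section.

def scan(requests, head, disk_size=200):
    """Total head movement when the arm sweeps up to the end, then reverses."""
    below = [t for t in requests if t < head]
    movement = (disk_size - 1) - head             # sweep up to the last track
    if below:
        movement += (disk_size - 1) - min(below)  # reverse and go down to the lowest request
    return movement

print(scan([82, 170, 43, 140, 24, 16, 190], head=50))  # (199-50) + (199-16) = 332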
Advantages
• High throughput
• Low variance of response time
• Average response time
Disadvantages
• Long waiting time for requests for locations just visited by disk arm
4. C-SCAN
In the C-SCAN algorithm, the disk arm, instead of reversing its direction, goes to the
other end of the disk and starts servicing the requests from there. So the disk arm
moves in a circular fashion; the algorithm is otherwise similar to the SCAN algorithm,
hence it is known as C-SCAN (Circular SCAN).
Example:
Suppose the requests to be addressed are-82,170,43,140,24,16,190. And the
Read/Write arm is at 50, and it is also given that the disk arm should move “towards the
larger value”.
So, the total overhead movement (total distance covered by the disk arm) is calculated as:
= (199-50) + (199-0) + (43-0) = 391
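The same calculation can be checked with a small Python sketch (200-track disk assumed, arm
moving toward larger values first):

def c_scan(requests, head, disk_size=200):
    """Total head movement for C-SCAN: sweep up, jump back to track 0, continue up."""
    below = [t for t in requests if t < head]
    movement = (disk_size - 1) - head      # up to the end of the disk
    if below:
        movement += disk_size - 1          # jump from the end back to track 0
        movement += max(below)             # continue up to the highest remaining request
    return movement

print(c_scan([82, 170, 43, 140, 24, 16, 190], head=50))  # (199-50) + (199-0) + (43-0) = 391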
5. LOOK
The LOOK algorithm is similar to the SCAN disk scheduling algorithm except that the disk
arm, instead of going to the end of the disk, goes only as far as the last request to be
serviced in front of the head and then reverses its direction from there. Thus it prevents
the extra delay caused by unnecessary traversal to the end of the disk.
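A matching LOOK sketch, again assuming the arm first moves toward larger track numbers and
using the same illustrative request queue; note that the arm never travels beyond the last
request in each direction.

def look(requests, head):
    """Total head movement for LOOK: turn around at the last request in each direction."""
    above = [t for t in requests if t >= head]
    below = [t for t in requests if t < head]
    movement = 0
    if above:
        movement += max(above) - head      # up only as far as the highest request
    if below:
        turn = max(above) if above else head
        movement += turn - min(below)      # back down to the lowest request
    return movement

print(look([82, 170, 43, 140, 24, 16, 190], head=50))  # (190-50) + (190-16) = 314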
Replication
Replication in system design involves creating and maintaining
exact copies of data, resources, or processes across multiple locations
within a system. It's like having mirrors of the same information or
functionality spread out across different parts of a system. Replication
serves various purposes, such as:
• Improving performance by distributing workload across multiple
instances
• Enhancing fault tolerance by ensuring that if one copy fails, others
can take over seamlessly
• Improving accessibility by allowing users to access data or
resources from nearby replicas.
DISTRIBUTED FILE SYSTEM
A Distributed File System (DFS) is a file system that is distributed
on multiple file servers or multiple locations. It allows programs to access
or store isolated files as they do with the local ones, allowing
programmers to access files from any network or computer.
Traditional vs. Distributed File Systems
Traditional File Systems                                | Distributed File Systems
Files are stored on a local disk or a single machine.   | Files are distributed across multiple machines or locations.
Files can be accessed only from that specific machine.  | Files can be accessed from anywhere in the network.
Limited by the capacity of a single machine.            | Can scale by adding more machines to the system.
If the machine fails, files may be lost.                | Offers redundancy, so data remains accessible even if one machine fails.
Faster for local file access but limited by hardware.   | Can handle larger workloads by distributing tasks across machines.