Unit V
UNIT – IV
STORAGE MANAGEMENT: File system-Concept of a file, access methods, directory structure,
file system mounting, file sharing, protection. (T1: Ch-10) SECONDARY-STORAGE
STRUCTURE: Overview of mass storage structure, disk structure, disk attachment, disk
scheduling algorithms, swap space management, stable storage implementation, and tertiary
storage structure (T1: Ch-12).
File Concept
The operating system abstracts from the physical properties of its storage devices to define a
logical storage unit, the file. Files are mapped by the operating system onto physical devices.
These storage devices are usually nonvolatile, so the contents are persistent between system
reboots.
A file is a named collection of related information that is recorded on secondary
storage. From a user’s perspective, a file is the smallest allotment of logical secondary storage.
Commonly, files represent programs (both source and object forms) and data. Data files may
be numeric, alphabetic, alphanumeric, or binary.
The information in a file is defined by its creator. Many different types of information
may be stored in a file: source or executable programs, numeric or text data, photos, music,
video, and so on. A file has a certain defined structure, which depends on its type. A text file is
a sequence of characters organized into lines (and possibly pages). A source file is a
sequence of functions, each of which is further organized as declarations followed by
executable statements. An executable file is a series of code sections that the loader can
bring into memory and execute.
File Attributes
A file’s attributes vary from one operating system to another but typically consist of these:
• Name. The symbolic file name is the only information kept in human readable form.
• Identifier. This unique tag, usually a number, identifies the file within the file system; it is
the non-human-readable name for the file.
• Type. This information is needed for systems that support different types of files.
• Location. This information is a pointer to a device and to the location of the file on that
device.
• Size. The current size of the file (in bytes, words, or blocks) and possibly the maximum
allowed size are included in this attribute.
• Protection. Access-control information determines who can do reading, writing, executing,
and so on.
• Time, date, and user identification. This information may be kept for creation, last
modification, and last use. These data can be useful for protection, security, and usage
monitoring.
File Operations
Creating a file.
Writing a file: The system must keep a write pointer to the location in the file where the
next write is to take place. The write pointer must be updated whenever a write occurs.
Reading a file: The system needs to keep a read pointer to the location in the file where the next
read is to take place. Because a process is usually either reading from or writing to a file, the
current operation location can be kept as a per-process current-file-position pointer.
Repositioning within a file: This file operation is also known as a file seek.
Deleting a file
Truncating a file
Most of the file operations mentioned involve searching the directory for the entry associated
with the named file. To avoid this constant searching, many systems require that an open()
system call be made before a file is first used. The operating system keeps a table, called the
open-file table, containing information about all open files.
Typically, the open-file table also has an open count associated with each file to
indicate how many processes have the file open. Each close() decreases this open count, and
when the open count reaches zero, the file is no longer in use, and the file’s entry is removed
from the open-file table.
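The open-count bookkeeping described above can be sketched in a few lines of Python. This is a toy model, not how any particular kernel stores its table; the class name OpenFileTable and the file name notes.txt are illustrative assumptions.

```python
class OpenFileTable:
    """Toy open-file table keyed by file name (hypothetical model)."""

    def __init__(self):
        self.entries = {}  # file name -> {"open_count": int}

    def open(self, name):
        # The directory would be searched only on the first open();
        # later opens just increase the count.
        entry = self.entries.setdefault(name, {"open_count": 0})
        entry["open_count"] += 1

    def close(self, name):
        entry = self.entries[name]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:
            # Last close: the file is no longer in use, so drop its entry.
            del self.entries[name]


table = OpenFileTable()
table.open("notes.txt")    # first process opens the file
table.open("notes.txt")    # second process opens the same file
table.close("notes.txt")   # one process still has it open
still_open = "notes.txt" in table.entries
table.close("notes.txt")   # count reaches zero, entry is removed
```

After the first close() the entry is still present; it disappears only when the last process closes the file.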
Some operating systems provide facilities for locking an open file (or sections of a file). File
locks allow one process to lock a file and prevent other processes from gaining access to it. A
shared lock is akin to a reader lock in that several processes can acquire the lock
concurrently. An exclusive lock behaves like a writer lock; only one process at a time can
acquire such a lock.
Access Methods
Files store information. When it is used, this information must be accessed and read into
computer memory. The information in the file can be accessed in several ways. Some systems
provide only one access method for files, while others support many access methods, and
choosing the right one for a particular application is a major design problem.
There are three common ways to access a file: sequential access, direct access, and the
indexed sequential method.
Sequential Access
The simplest access method is sequential access. Information in the file is processed in
order, one record after the other. Reads and writes make up the bulk of the operations on a
file. A read operation, read_next(), reads the next portion of the file and automatically
advances a file pointer, which tracks the I/O location. Similarly, the write operation,
write_next(), appends to the end of the file and advances to the end of the newly written material
(the new end of file).
Key points:
1. Data is accessed in order, one record right after another.
2. A read returns the next record and advances the file pointer by one record.
3. A write appends at the end of the file, allocating space if needed, and moves the
pointer to the new end of the file.
4. Such a method is reasonable for tape.
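The key points above can be modelled with a small in-memory sketch. The class name SequentialFile and the reset() method (a rewind to the beginning) are illustrative assumptions, not a real API.

```python
class SequentialFile:
    """Toy sequential-access file holding a list of records."""

    def __init__(self):
        self.records = []
        self.pos = 0  # file pointer, counted in records

    def write_next(self, record):
        # Append at the end and move the pointer to the new end of file.
        self.records.append(record)
        self.pos = len(self.records)

    def reset(self):
        # Rewind to the beginning, as a tape would be rewound.
        self.pos = 0

    def read_next(self):
        # Return the next record and advance the pointer by one record.
        if self.pos >= len(self.records):
            return None  # end of file
        record = self.records[self.pos]
        self.pos += 1
        return record


f = SequentialFile()
for rec in ["a", "b", "c"]:
    f.write_next(rec)
f.reset()
read_back = [f.read_next() for _ in range(4)]  # the fourth read hits end of file
```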
Direct Access
Another method is direct access (or relative access). Here, a file is made up of fixed-length
logical records that allow programs to read and write records rapidly in no particular order.
The direct-access method is based on a disk model of a file, since disks allow random access
to any file block.
For direct access, the file is viewed as a numbered sequence of blocks or records. Thus,
we may read block 14, then read block 53, and then write block 7. There are no restrictions
on the order of reading or writing for a direct-access file. For the direct-access method, the
file operations must be modified to include the block number as a parameter. Thus, we have
read(n), where n is the block number, rather than read_next(), and write(n) rather than
write_next().
The block number provided by the user to the operating system is normally a relative
block number. A relative block number is an index relative to the beginning of the file. Thus,
the first relative block of the file is 0, the next is 1, and so on. When a file is used, its
information is read into computer memory, and there are several ways to access
this information.
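A minimal sketch of direct access with relative block numbers follows. It assumes fixed-length 16-byte records padded with zero bytes; BLOCK_SIZE, read_block and write_block are hypothetical names chosen for the illustration, and an in-memory io.BytesIO object stands in for the file.

```python
import io

BLOCK_SIZE = 16  # bytes per fixed-length logical record (assumed)


def write_block(f, n, data):
    """write(n): store data in relative block n."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("record larger than a block")
    f.seek(n * BLOCK_SIZE)                  # byte offset from start of file
    f.write(data.ljust(BLOCK_SIZE, b"\0"))  # pad to a full block


def read_block(f, n):
    """read(n): return the contents of relative block n."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE).rstrip(b"\0")


f = io.BytesIO()
write_block(f, 7, b"seven")  # blocks may be written in any order
write_block(f, 2, b"two")
second = read_block(f, 2)
seventh = read_block(f, 7)
```

The block number is multiplied by the record length to obtain a byte offset, which is why fixed-length records make this method simple.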
Partitioning is useful for limiting the sizes of individual file systems, putting multiple
file-system types on the same device, or leaving part of the device available for other uses,
such as swap space or unformatted (raw) disk space. A file system can be created on each of
these parts of the disk. Any entity containing a file system is generally known as a volume.
The volume may be a subset of a device, a whole device, or multiple devices linked together
into a RAID set. Each volume can be thought of as a virtual disk. Volumes can also store
multiple operating systems, allowing a system to boot and run more than one operating
system.
Each volume that contains a file system must also contain information about the files
in the system. This information is kept in entries in a device directory or volume table of
contents. The device directory (more commonly known simply as the directory) records
information—such as name, location, size, and type—for all files on that volume. Figure 11.7
shows a typical file-system organization.
Directory Overview
The directory can be viewed as a symbol table that translates file names into their
directory entries. If we take such a view, we see that the directory itself can be organized in
many ways. The organization must allow us to insert entries, to delete entries, to search for a
named entry, and to list all the entries in the directory.
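Viewing the directory as a symbol table, the four required operations map directly onto a dictionary. The sketch below is only an illustration; the class name Directory, its method names, and the attribute keyword arguments are assumptions.

```python
class Directory:
    """Toy directory: maps file names to their directory entries."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, **attributes):
        if name in self._entries:
            raise FileExistsError(name)
        self._entries[name] = attributes

    def delete(self, name):
        del self._entries[name]

    def search(self, name):
        # Returns the entry, or None if the name is not present.
        return self._entries.get(name)

    def list_entries(self):
        return sorted(self._entries)


d = Directory()
d.insert("a.txt", size=10)
d.insert("b.txt", size=20)
found = d.search("a.txt")
d.delete("a.txt")
remaining = d.list_entries()
```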
What is a directory?
A directory can be defined as a listing of the related files on the disk. The directory may store
some or all of the file attributes. To get the benefit of different file systems on different
operating systems, a hard disk can be divided into a number of partitions of different sizes.
The partitions are also called volumes or minidisks.
Each partition must have at least one directory in which all the files of the partition
can be listed. A directory entry is maintained for each file in the directory, and it stores all the
information related to that file.
A directory can be viewed as a file that contains the metadata of a group of files.
Disadvantages of a single-level directory
1. We cannot have two files with the same name.
2. The directory may become very large, so searching for a file may take a long time.
3. Protection cannot be implemented for multiple users.
4. There is no way to group files of the same kind.
5. Choosing a unique name for every file is difficult and limits the number of files
in the system, because most operating systems limit the number of characters
used to construct a file name.
In the two-level directory structure, each user has his own user file directory (UFD).
The UFDs have similar structures, but each lists only the files of a single user. When a user job
starts or a user logs in, the system’s master file directory (MFD) is searched. The MFD is
indexed by user name or account number, and each entry points to the UFD for that user.
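The MFD-to-UFD lookup can be pictured as a dictionary of dictionaries. This is a sketch only; the user names, file names and location strings below are made up for the illustration.

```python
# Master file directory: user name -> that user's file directory (UFD).
mfd = {
    "jim": {"test.c": "block 12"},
    "sara": {"test.c": "block 40", "book.tex": "block 41"},
}


def lookup(user, filename):
    ufd = mfd[user]            # step 1: find the user's UFD through the MFD
    return ufd.get(filename)   # step 2: search only that user's files


jim_file = lookup("jim", "test.c")
sara_file = lookup("sara", "test.c")
```

Both users own a file called test.c without any conflict, because each name is resolved inside its owner's UFD.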
Every file in the system has a path name. To name a file uniquely, a user must know the path
name of the file desired.
▪ There are two ways to specify a file path:
Absolute Path
▪ This path reaches the specified file starting from the main (root) directory.
▪ The current directory is not involved; the file path is specified starting from the
root directory.
Relative Path
▪ The directory in which the user is currently working is called the current directory.
▪ This path reaches the specified file starting from the current directory.
Each user has their own directory and cannot enter another user's directory.
However, a user has permission to read the root's data but cannot write to or modify
it. Only the system administrator has complete access to the root directory.
Searching is more efficient in this directory structure. The concept of current working
directory is used. A file can be accessed by two types of path, either relative or absolute. In
tree structured directory systems, the user is given the privilege to create the files as well as
directories.
These kinds of directory graphs can be made using links or aliases. We can have
multiple paths to the same file. Links can be either symbolic (logical) or hard (physical).
If a file gets deleted in an acyclic-graph directory system, then
1. In the case of a soft link, the file is simply deleted and the link is left as a dangling pointer.
2. In the case of a hard link, the actual file is deleted only when all the references to it have been
deleted.
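The two deletion cases can be observed on a POSIX system with Python's os module. This is a small experiment rather than a definitive test; it assumes a file system that supports both hard and symbolic links, and the file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "data.txt")
    hard = os.path.join(d, "hard.txt")
    soft = os.path.join(d, "soft.txt")

    with open(target, "w") as f:
        f.write("hello")

    os.link(target, hard)     # hard link: a second name for the same file
    os.symlink(target, soft)  # symbolic link: stores the path of the target

    links_before = os.stat(target).st_nlink  # reference count of the file

    os.remove(target)         # delete the original name

    with open(hard) as f:     # the data survives through the hard link
        hard_content = f.read()

    # The symbolic link still exists but points at nothing.
    soft_dangling = os.path.islink(soft) and not os.path.exists(soft)
```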
File Systems
A file system is the part of the operating system that is responsible for file management. It
provides a mechanism to store data and to access file contents, including data and
programs. Some operating systems, for example Ubuntu, treat everything as a file.
The File system takes care of the following issues
o File Structure
We have seen various data structures in which the file can be stored. The task of the
file system is to maintain an optimal file structure.
o Recovering Free space
Whenever a file gets deleted from the hard disk, there is a free space created in the
disk. There can be many such spaces which need to be recovered in order to reallocate
them to other files.
o Disk space assignment to files
The major concern is deciding where to store the files on the hard disk.
o Tracking data location
A file may or may not be stored within a single block. It can be stored in
non-contiguous blocks on the disk. We need to keep track of all the blocks on which
parts of the file reside.
File-System Mounting
• The basic idea behind mounting file systems is to combine multiple file systems into
one large tree structure.
• The mount command is given a file system to mount and a mount point ( directory )
on which to attach it.
• Once a file system is mounted onto a mount point, any further references to that
directory actually refer to the root of the mounted file system.
• Any files ( or sub-directories ) that had been stored in the mount point directory prior
to mounting the new file system are now hidden by the mounted file system, and are
no longer available. For this reason some systems only allow mounting onto empty
directories.
• File systems can only be mounted by root, unless root has previously configured
certain file systems to be mountable onto certain pre-determined mount points. ( E.g.
root may allow users to mount floppy file systems to /mnt or something like it. )
Anyone can run the mount command to see what file systems are currently mounted.
• File systems may be mounted read-only, or have other restrictions imposed.
Figure 11.14 - File system. (a) Existing system. (b) Unmounted volume.
File Sharing
Multiple Users
• On a multi-user system, more information needs to be stored for each file:
o The owner ( user ) who owns the file, and who can control its access.
o The group of other user IDs that may have some special access to the file.
o What access rights are afforded to the owner ( User ), the Group, and to the rest
of the world ( the universe, a.k.a. Others. )
Protection
• Files must be kept safe for reliability ( against accidental damage ), and protection
( against deliberate malicious access. ) The former is usually managed with backup
copies.
• One simple protection scheme is to remove all access to a file. However this makes the
file unusable, so some sort of controlled access must be arranged.
Access Control
An access-control list (ACL) can be associated with each file, specifying user names and the types of access allowed for each
user. When a user requests access to a particular file, the operating system checks the access
list associated with that file. If that user is listed for the requested access, the access is
allowed. Otherwise, a protection violation occurs, and the user job is denied access to the file.
This technique has two undesirable consequences:
• Constructing such a list may be a tedious and unrewarding task, especially if we do not know in
advance the list of users in the system.
• The directory entry, previously of fixed size, now must be of variable size, resulting in more
complicated space management.
These problems can be resolved by use of a condensed version of the access list. To
condense the length of the access-control list, many systems recognize three classifications of
users in connection with each file:
• Owner. The user who created the file is the owner.
• Group. A set of users who are sharing the file and need similar access is a group, or work
group.
• Universe. All other users in the system constitute the universe.
To illustrate, consider a person, Sara, who is writing a new book. She has hired three
graduate students (Jim, Dawn, and Jill) to help with the project. The text of the book is kept in
a file named book.tex. The protection associated with this file is as follows:
• Sara should be able to invoke all operations on the file.
• Jim, Dawn, and Jill should be able only to read and write the file; they should not be allowed
to delete the file.
• All other users should be able to read, but not write, the file. (Sara is interested in letting as
many people as possible read the text so that she can obtain feedback.)
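The book.tex example can be expressed as a check against the three user classes. The sketch below is a toy model; the operation names ("read", "write", "execute", "delete") and the user "guest" are assumptions made for the illustration.

```python
OWNER = "sara"
GROUP = {"jim", "dawn", "jill"}

# Operations allowed for each class of user (assumed operation names).
RIGHTS = {
    "owner": {"read", "write", "execute", "delete"},
    "group": {"read", "write"},
    "universe": {"read"},
}


def classify(user):
    if user == OWNER:
        return "owner"
    if user in GROUP:
        return "group"
    return "universe"


def is_allowed(user, operation):
    return operation in RIGHTS[classify(user)]


jim_can_write = is_allowed("jim", "write")
jim_can_delete = is_allowed("jim", "delete")
guest_can_write = is_allowed("guest", "write")
```

Only three small sets have to be stored per file, instead of one list entry per user as in a full access-control list.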
Types of Access
• The following low-level operations are often controlled:
o W ( Write ): for a file, write ( change ) the file contents; for a directory, change the
directory contents, which is required to create or delete files.
• In addition there are some special bits that can also be applied:
o The set user ID ( SUID ) bit and/or the set group ID ( SGID ) bits applied to
executable files temporarily change the identity of whoever runs the program
to match that of the owner / group of the executable program. This allows users
running specific programs to have access to files ( while running that
program ) to which they would normally be unable to access. Setting of these
two bits is usually restricted to root, and must be done with caution, as it
introduces a potential security leak.
o The sticky bit on a directory modifies write permission, allowing users to only
delete files for which they are the owner. This allows everyone to create files in
/tmp, for example, but to only delete files which they have created, and not
anyone else's.
o The SUID, SGID, and sticky bits are indicated with an s, s, and t in the positions
for execute permission for the user, group, and others, respectively. If the letter
is lower case ( s, s, t ), then the corresponding execute permission IS also
given. If it is upper case ( S, S, T ), then the corresponding execute permission
is NOT given.
o These bits can be set with the numeric form of chmod ( e.g. chmod 4755 ) or with the
symbolic form ( u+s, g+s, +t ).
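One way to check how these bits appear in a listing is Python's stat.filemode, which formats a mode value the way ls -l does. The mode values below are examples chosen for the illustration, not taken from any real file.

```python
import stat

# SUID set and owner execute permission also set.
suid_with_x = stat.filemode(stat.S_IFREG | 0o4755)     # '-rwsr-xr-x'

# SUID set but owner execute permission missing.
suid_without_x = stat.filemode(stat.S_IFREG | 0o4655)  # '-rwSr-xr-x'

# Sticky bit on a world-writable directory such as /tmp.
sticky_dir = stat.filemode(stat.S_IFDIR | 0o1777)      # 'drwxrwxrwt'
```

The leading octal digit carries the special bits: 4 for SUID, 2 for SGID, and 1 for the sticky bit.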
A read–write head “flies” just above each surface of every platter. The heads are attached to a
disk arm that moves all the heads as a unit. The surface of a platter is logically divided into
circular tracks, which are subdivided into sectors. The set of tracks that are at one arm
position makes up a cylinder. There may be thousands of concentric cylinders in a disk drive,
and each track may contain hundreds of sectors. The storage capacity of common disk drives
is measured in gigabytes.
When the disk is in use, a drive motor spins it at high speed. Most drives rotate 60 to
250 times per second, specified in terms of rotations per minute (RPM). Common drives
spin at 5,400, 7,200, 10,000, and 15,000 RPM. Disk speed has two parts. The transfer rate is
the rate at which data flow between the drive and the computer. The positioning time, or
random-access time, consists of two parts: the time necessary to move the disk arm to the
desired cylinder, called the seek time, and the time necessary for the desired sector to rotate
to the disk head, called the rotational latency.
Typical disks can transfer several megabytes of data per second, and they have seek
times and rotational latencies of several milliseconds. Other forms of removable disks include
CDs, DVDs, and Blu-ray discs as well as removable flash-memory devices known as flash
drives (which are a type of solid-state drive).
A disk drive is attached to a computer by a set of wires called an I/O bus. Several kinds
of buses are available, including advanced technology attachment (ATA), serial ATA (SATA),
eSATA, universal serial bus (USB), and fibre channel (FC). The data transfers on a bus are
carried out by special electronic processors called controllers. The host controller is the
controller at the computer end of the bus. A disk controller is built into each disk drive. To
perform a disk I/O operation, the computer places a command into the host controller,
typically using memory-mapped I/O ports.
Solid-State Disks
Sometimes old technologies are used in new ways as economics change or the technologies
evolve. An example is the growing importance of solid-state disks, or SSDs. Simply described,
an SSD is nonvolatile memory that is used like a hard drive. There are many variations of this
technology, from DRAM with a battery to allow it to maintain its state in a power failure
through flash-memory technologies like single-level cell (SLC) and multilevel cell (MLC) chips.
SSDs have the same characteristics as traditional hard disks but can be more reliable
because they have no moving parts and faster because they have no seek time or latency. In
addition, they consume less power. However, they are more expensive per megabyte than
traditional hard disks, have less capacity than the larger hard disks, and may have shorter life
spans than hard disks, so their uses are somewhat limited.
SSDs are also used in some laptop computers to make them smaller, faster, and more
energy-efficient. Because SSDs can be much faster than magnetic disk drives, standard bus
interfaces can cause a major limit on throughput.
Magnetic Tapes
Magnetic tape was used as an early secondary-storage medium. Although it is relatively
permanent and can hold large quantities of data, its access time is slow compared with that of
main memory and magnetic disk. In addition, random access to magnetic tape is about a
thousand times slower than random access to magnetic disk, so tapes are not very useful for
secondary storage.
Tapes are used mainly for backup, for storage of infrequently used information, and as
a medium for transferring information from one system to another.
Tapes and their drives are usually categorized by width, including 4, 8, and 19 millimeters
and 1/4 and 1/2 inch. Some are named according to technology, such as LTO-5 and SDLT.
Disk Structure
Modern magnetic disk drives are addressed as large one-dimensional arrays of logical blocks,
where the logical block is the smallest unit of transfer. The size of a logical block is usually
512 bytes, although some disks can be low-level formatted to have a different logical block
size, such as 1,024 bytes.
The one-dimensional array of logical blocks is mapped onto the sectors of the disk
sequentially. Sector 0 is the first sector of the first track on the outermost cylinder. The
mapping proceeds in order through that track, then through the rest of the tracks in that
cylinder, and then through the rest of the cylinders from outermost to innermost.
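The mapping order just described can be sketched as simple integer arithmetic. The sketch assumes every track holds the same number of sectors, which real zoned disks do not; the function name and the geometry values (4 surfaces, 100 sectors per track) are hypothetical.

```python
def block_to_chs(block, heads, sectors_per_track):
    """Map a logical block number to (cylinder, head, sector), all 0-based.

    Blocks fill one track, then the other tracks of the same cylinder,
    then move on to the next cylinder.
    """
    blocks_per_cylinder = heads * sectors_per_track
    cylinder = block // blocks_per_cylinder
    head = (block // sectors_per_track) % heads
    sector = block % sectors_per_track
    return cylinder, head, sector


first = block_to_chs(0, heads=4, sectors_per_track=100)       # (0, 0, 0)
next_track = block_to_chs(100, heads=4, sectors_per_track=100)  # (0, 1, 0)
next_cyl = block_to_chs(400, heads=4, sectors_per_track=100)    # (1, 0, 0)
```

In practice the translation is harder than this, because defective sectors are remapped and the number of sectors per track varies across the disk.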
Tracks in the outermost zone typically hold 40 percent more sectors than do tracks in
the innermost zone. The drive increases its rotation speed as the head moves from the outer
to the inner tracks to keep the same rate of data moving under the head. This method is used
in CD-ROM and DVD-ROM drives. Alternatively, the disk rotation
speed can stay constant; in this case, the density of bits decreases from inner tracks to outer
tracks to keep the data rate constant. This method is used in hard disks and is known as
constant angular velocity (CAV).
The number of sectors per track has been increasing as disk technology improves, and
the outer zone of a disk usually has several hundred sectors per track. Similarly, the number
of cylinders per disk has been increasing; large disks have tens of thousands of cylinders.
Disk Attachment
Computers access disk storage in two ways. One way is via I/O ports (or host-attached
storage); this is common on small systems. The other way is via a remote host in a distributed
file system; this is referred to as network-attached storage.
Host-Attached Storage
Host-attached storage is storage accessed through local I/O ports. These ports use several
technologies. The typical desktop PC uses an I/O bus architecture called IDE or ATA. This
architecture supports a maximum of two drives per I/O bus. A newer, similar protocol that
has simplified cabling is SATA.
High-end workstations and servers generally use more sophisticated I/O architectures such
as fibre channel (FC), a high-speed serial architecture that can operate over optical fiber or
over a four-conductor copper cable. It has two variants. One is a large switched fabric having
a 24-bit address space. This variant is expected to dominate in the future and is the basis of
storage-area networks (SANs). Because of the large address space and the switched nature of
the communication, multiple hosts and storage devices can attach to the fabric, allowing great
flexibility in I/O communication.
The other FC variant is an arbitrated loop (FC-AL) that can address 126 devices (drives
and controllers). A wide variety of storage devices are suitable for use as host-attached
storage. Among these are hard disk drives, RAID arrays, and CD, DVD, and tape drives.
Network-Attached Storage
➢ Network attached storage connects storage devices to computers using a remote
procedure call, RPC, interface, typically with something like NFS filesystem mounts.
This is convenient for allowing several computers in a group common access and
naming conventions for shared storage.
➢ NAS can be implemented using SCSI cabling, or with iSCSI, which uses Internet protocols and
standard network connections, allowing long-distance remote access to shared files.
➢ NAS allows computers to easily share data storage, but tends to be less efficient than
standard host-attached storage.
The technique that the operating system uses to determine which request is to be satisfied
next is called disk scheduling.
Let's discuss some important terms related to disk scheduling.
Seek Time
Seek time is the time taken to move the disk arm to the specified track where the read/write
request will be satisfied.
Rotational Latency
It is the time taken by the desired sector to rotate into the position where the R/W head
can access it.
Transfer Time
It is the time taken to transfer the data.
Disk Access Time
Disk access time is given as,
Disk Access Time = Rotational Latency + Seek Time + Transfer Time
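The formula can be tried out with a short calculation. The sketch assumes the usual convention that the average rotational latency is half of one full rotation; the seek and transfer times passed in are made-up example values.

```python
def average_rotational_latency_ms(rpm):
    ms_per_rotation = 60_000 / rpm  # one minute is 60,000 ms
    return ms_per_rotation / 2      # on average, half a rotation


def disk_access_time_ms(seek_ms, rpm, transfer_ms):
    # Disk Access Time = Rotational Latency + Seek Time + Transfer Time
    return average_rotational_latency_ms(rpm) + seek_ms + transfer_ms


latency_7200 = average_rotational_latency_ms(7200)  # about 4.17 ms
example = disk_access_time_ms(seek_ms=5.0, rpm=15000, transfer_ms=1.0)  # 8.0 ms
```

At 15,000 RPM one rotation takes 4 ms, so the average latency is 2 ms and the example access time is 2 + 5 + 1 = 8 ms.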
Disk Response Time
It is the average time each request spends waiting for its I/O operation.
Purpose of Disk Scheduling
The main purpose of a disk scheduling algorithm is to select a disk request from the queue of I/O
requests and decide when this request will be processed.
Goals of a Disk Scheduling Algorithm
o Fairness
o High throughput
o Minimal head travel time
Disk Scheduling Algorithms
The various disk scheduling algorithms are listed below. Each algorithm has its own
advantages and disadvantages. The limitations of each algorithm led to the development of a
new algorithm.
o FCFS scheduling algorithm
o SSTF (shortest seek time first) algorithm
o SCAN scheduling
o C-SCAN scheduling
o LOOK Scheduling
o C-LOOK scheduling
Scan Algorithm
It is also called the Elevator Algorithm. In this algorithm, the disk arm moves in a particular
direction to the end of the disk, satisfying all the requests in its path, and then it turns back and
moves in the reverse direction, satisfying the requests in its path.
It works the way an elevator works: the elevator moves in one direction all the way to the last
floor in that direction and then turns back.
Example
Consider the following disk request sequence for a disk with 200 tracks (numbered 0 to 199):
98, 137, 122, 183, 14, 133, 65, 78
The head starts at cylinder 54 and is moving left (towards cylinder 0). Find the total head
movement, in cylinders, using SCAN scheduling.
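A short sketch can work through this example. It assumes cylinders are numbered from 0 (the request list contains values above 100, so the disk must have more than 100 cylinders) and that the arm travels all the way to cylinder 0 before reversing; the function name scan_left is hypothetical.

```python
def scan_left(requests, head, lowest=0):
    """SCAN with the head initially moving towards the lowest cylinder."""
    left = sorted((r for r in requests if r <= head), reverse=True)
    right = sorted(r for r in requests if r > head)
    order = left + right

    movement = head - lowest           # sweep down to the end of the disk
    if right:
        movement += right[-1] - lowest  # reverse and sweep up to the last request
    return order, movement


requests = [98, 137, 122, 183, 14, 133, 65, 78]
order, movement = scan_left(requests, head=54)
# order    -> [14, 65, 78, 98, 122, 133, 137, 183]
# movement -> 54 + 183 = 237 cylinders
```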
C-SCAN algorithm
In the C-SCAN algorithm, the arm of the disk moves in a particular direction servicing requests
until it reaches the last cylinder, then it jumps to the cylinder at the opposite end of the disk
without servicing any request on the way, and resumes moving in the same direction, servicing
the remaining requests.
Example
Consider the following disk request sequence for a disk with 200 tracks (numbered 0 to 199):
98, 137, 122, 183, 14, 133, 65, 78
The head starts at cylinder 54 and is moving left (towards cylinder 0). Find the total head
movement, in cylinders, using C-SCAN scheduling.
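The example can be worked through with the sketch below. It makes two assumptions that the text leaves open: cylinders are numbered 0 to 199, and the return jump across the disk is counted as head movement (some treatments leave it out). The function name c_scan_left is hypothetical.

```python
def c_scan_left(requests, head, lowest=0, highest=199):
    """C-SCAN with the head moving towards the lowest cylinder."""
    first = sorted((r for r in requests if r <= head), reverse=True)
    second = sorted((r for r in requests if r > head), reverse=True)
    order = first + second

    movement = head - lowest               # sweep down to cylinder 0
    if second:
        movement += highest - lowest       # jump to the opposite end
        movement += highest - second[-1]   # sweep down to the last request
    return order, movement


requests = [98, 137, 122, 183, 14, 133, 65, 78]
order, movement = c_scan_left(requests, head=54)
# order    -> [14, 183, 137, 133, 122, 98, 78, 65]
# movement -> 54 + 199 + 134 = 387 cylinders
```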
Look Scheduling
It is like the SCAN scheduling algorithm, except that in this
scheduling algorithm the arm of the disk stops moving inwards (or outwards) when no more
requests exist in that direction. This algorithm tries to overcome the overhead of the SCAN
algorithm, which forces the disk arm to move in one direction to the end regardless of whether
any request exists in that direction.
Example
Consider the following disk request sequence for a disk with 200 tracks (numbered 0 to 199):
98, 137, 122, 183, 14, 133, 65, 78
The head starts at cylinder 54 and is moving left (towards cylinder 0). Find the total head
movement, in cylinders, using LOOK scheduling.
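A minimal sketch of this example follows. The function name look_left is hypothetical; the arm reverses at the lowest pending request instead of travelling to cylinder 0.

```python
def look_left(requests, head):
    """LOOK with the head initially moving towards lower cylinders."""
    left = sorted((r for r in requests if r <= head), reverse=True)
    right = sorted(r for r in requests if r > head)
    order = left + right

    movement, position = 0, head
    for r in order:
        movement += abs(r - position)  # distance to the next serviced request
        position = r
    return order, movement


requests = [98, 137, 122, 183, 14, 133, 65, 78]
order, movement = look_left(requests, head=54)
# order    -> [14, 65, 78, 98, 122, 133, 137, 183]
# movement -> 40 + 169 = 209 cylinders
```

The service order is the same as for SCAN; the saving comes from not travelling the extra 14 cylinders down to cylinder 0 and back.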
C Look Scheduling
The C-LOOK algorithm is similar to the C-SCAN algorithm to some extent. In this algorithm, the arm of
the disk moves outwards servicing requests until it reaches the highest requested cylinder, then
it jumps to the lowest requested cylinder without servicing any request on the way, and then it again
moves outwards servicing the remaining requests.
It differs from the C-SCAN algorithm in that C-SCAN forces the disk arm to move to
the last cylinder regardless of whether any request is to be serviced on that cylinder
or not.
Example
Consider the following disk request sequence for a disk with 200 tracks (numbered 0 to 199):
98, 137, 122, 183, 14, 133, 65, 78
The head starts at cylinder 54 and is moving left (towards cylinder 0). Find the total head
movement, in cylinders, using C-LOOK scheduling.
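The description above has the arm moving outwards (towards higher cylinders), while the example states a leftward start, so the sketch below takes the direction as a parameter and evaluates both readings. It counts the jump between the extreme requests as head movement, which is an assumption; the function name c_look is hypothetical.

```python
def c_look(requests, head, direction="up"):
    """C-LOOK: service in one direction only, jumping back between sweeps."""
    lower = sorted(r for r in requests if r < head)
    upper = sorted(r for r in requests if r >= head)
    if direction == "up":
        order = upper + lower              # sweep up, jump to lowest, sweep up
    else:
        order = lower[::-1] + upper[::-1]  # sweep down, jump to highest, sweep down

    movement, position = 0, head
    for r in order:
        movement += abs(r - position)
        position = r
    return order, movement


requests = [98, 137, 122, 183, 14, 133, 65, 78]
_, moving_up = c_look(requests, head=54, direction="up")      # 129 + 169 = 298
_, moving_down = c_look(requests, head=54, direction="down")  # 40 + 169 + 118 = 327
```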
Virtual memory is a combination of RAM and disk space that running processes can
use. Swap space is the portion of virtual memory that is on the hard disk, used when RAM
is full.
Swap space can be useful to a computer in various ways:
• It can be used as a single contiguous region of memory, which reduces the I/O operations
needed to read or write a file.
• Applications that are not used, or are used rarely, can be kept in the swap file.
• Having a sufficient swap file helps the system keep some physical memory free at all
times.
• The physical memory that has been freed by using swap space can be used by
the OS for other important tasks.
Operating systems such as Windows and Linux provide a certain amount of swap
space by default, which users can change according to their needs. If you do not want to
use swap space you can disable it altogether, but if the system then runs out of memory
the kernel will kill some processes in order to free a sufficient amount of
physical memory.
So it is entirely up to the user whether to use swap space or not.
Alternatively, swap space can be created in a separate raw partition. No file system or
directory structure is placed in this space. Rather, a separate swap-space storage manager is
used to allocate and deallocate the blocks from the raw partition. This manager uses
algorithms optimized for speed rather than for storage efficiency, because swap space is
accessed much more frequently than file systems (when it is used).
Stable-Storage Implementation
By definition, information residing in stable storage is never lost. To implement such storage,
we need to replicate the required information on multiple storage devices (usually disks)
with independent failure modes. We also need to coordinate the writing of updates in a way
that guarantees that a failure during an update will not leave all the copies in a damaged state
and that, when we are recovering from a failure, we can force all copies to a consistent and
correct value, even if another failure occurs during the recovery. A disk write results in one of
three outcomes:
1. Successful completion. The data were written correctly on disk.
2. Partial failure. A failure occurred in the midst of transfer, so only some of the sectors were
written with the new data, and the sector being written during the failure may have been
corrupted.
3. Total failure. The failure occurred before the disk write started, so the previous data values
on the disk remain intact.
Whenever a failure occurs during writing of a block, the system needs to detect it and
invoke a recovery procedure to restore the block to a consistent state. To do that, the system
must maintain two physical blocks for each logical block. An output operation is executed as
follows:
1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same information onto the second
physical block.
3. Declare the operation complete only after the second write completes successfully.
During recovery from a failure, each pair of physical blocks is examined. If both are the same
and no detectable error exists, then no further action is necessary. If one block contains a
detectable error then we replace its contents with the value of the other block. If neither
block contains a detectable error, but the blocks differ in content, then we replace the content
of the first block with that of the second. This recovery procedure ensures that a write to
stable storage either succeeds completely or results in no change.
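The write and recovery procedures above can be sketched as follows. This is a toy model in which a CRC-32 checksum stands in for the disk's error detection; the function names and the dictionary layout of a block are assumptions made for the illustration.

```python
import zlib


def make_block(data):
    """A physical block: the data plus a checksum for error detection."""
    return {"data": data, "crc": zlib.crc32(data)}


def is_damaged(block):
    return zlib.crc32(block["data"]) != block["crc"]


def stable_write(pair, data):
    pair[0] = make_block(data)  # 1. write the first physical block
    pair[1] = make_block(data)  # 2. then write the second physical block
    # 3. the operation is complete only after both writes succeed


def recover(pair):
    first_bad, second_bad = is_damaged(pair[0]), is_damaged(pair[1])
    if first_bad and not second_bad:
        pair[0] = dict(pair[1])   # repair the first copy from the second
    elif second_bad and not first_bad:
        pair[1] = dict(pair[0])   # repair the second copy from the first
    elif not first_bad and not second_bad and pair[0]["data"] != pair[1]["data"]:
        pair[0] = dict(pair[1])   # both readable but different: keep the second


# Simulated crash after step 1: first copy is new, second copy is still old.
pair = [make_block(b"new"), make_block(b"old")]
recover(pair)
after_crash = (pair[0]["data"], pair[1]["data"])
```

In this crash scenario the second block was never written, so recovery rolls the pair back to the old value and the write appears not to have happened at all.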
Very durable and reliable. Read-only disks, such as CD-ROM and DVD, come from the
factory with the data pre-recorded.
Tapes
Compared to a disk, a tape is less expensive and holds more data, but random access is
much slower.
Tape is an economical medium for purposes that do not require fast random access, e.g.,
backup copies of disk data, holding huge volumes of data.
Large tape installations typically use robotic tape changers that move tapes between tape
drives and storage slots in a tape library.
stacker – library that holds a few tapes
silo – library that holds thousands of tapes
A disk-resident file can be archived to tape for low cost storage; the computer can stage it
back into disk storage for active use.
IMPORTANT QUESTIONS