Operating Systems Unit 3 - Files Management

The document discusses file systems and management in operating systems, emphasizing the organization, types, and operations of files. It covers user perspectives on files, file access permissions, and various file allocation strategies such as contiguous, chained list, and indexed allocation. Additionally, it addresses security concerns, disk management, and RAID structures for reliability and performance in data storage.

Uploaded by

Henry Kanenga

File Systems and Management

By Dr D B Ntalasha

1 Operating systems - cs 225 8/24/2024


Introduction
 Much of the work within the computer system can be seen as
management of processes and files.
 Files are the primary means of accessing the information.
 The Storage of files may be in the main memory or in secondary
memory (disk memory).
 OS allows users to manage such files with a file system.



What are files?
 Irrespective of the content, any organized information is a file,
e.g. a telephone number list, web images or data logged from an
instrument are all files.
 In UNIX files are arbitrary bit (or byte) streams.
 A file system is that software which allows users and
applications to organize their files



User’s view of files
 At this stage it is important to have a perspective which is
relevant to the user, so we start with the user’s view of files.
 The first need of a user is to be able to access a file.
 The file system must be able to locate the file sought, i.e.
identify the file by its name.
 File names have extensions (e.g. .c, .obj) which define the file
type.
 Related files are organized into directories.
 Within a directory, each file must have a unique name.



Directory and File Organization
 We have a tree structure amongst directories. Files form leaves in
the tree structure of directories.



File Types
 Most OSs use file types to identify files.
 A file descriptor is the structure used by the file
management software to help the OS provide file management
services.
 Mac OS typically stores this information in its resource fork,
which lets the OS display the icons of the application
environment in which the file was created.
 PDP-11 used an octal 0407 as a magic number to identify
executable files.
 File system stores other information such as location of the
file etc.



File Operations - 1
 User performs various operations – read, write, save, retrieve,
display, append, copy, delete etc. with files.
 User changes file attributes such as access permissions of files.
 Unix provides a visual editor called vi to view text files as
well as edit them.
 Unix provides the ls command to get a file listing. The
ls command has several options for file listing.



File Operations - 2



File Access Permissions
 A file system manages access by checking a file’s permissions.
 A file may be accessed to perform read, write or execute
operations.
 The usage is determined by the context in which the file was
created.



An Example
 Consider a file supporting an application of bus schedules.
 It shall contain a bus time-table with the following restrictions:
 read-only permission for the general public,
 read, write permission for the supervisor,
 read, write and execute permissions for the management.



Who all can access a File?
 Unix recognizes three categories of users – user/owner,
group and others.
 Owner may be a person or a program or an application
or a system based utility.
 The notion of a group comes from software engineering team
operations.
 Others has the connotation of public usage.
 This information is organized as 9 bits, r w x r w x r w x, one
r w x triple each for owner, group and others. Each triple can be
written as one octal digit, e.g. the bit pattern 100 100 100
(octal 444) gives read permission to owner, group and others.
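
The 9-bit layout above can be decoded with a few lines of code. The following is a minimal sketch, not part of the original slides; the function name describe_mode is hypothetical and only the low 9 bits are considered.

```python
def describe_mode(mode: int) -> str:
    """Render the low 9 permission bits of `mode` as an rwxrwxrwx string."""
    flags = "rwx"
    out = []
    # Walk the bits from the most significant (owner read)
    # down to the least significant (others execute).
    for shift in range(8, -1, -1):
        bit_set = (mode >> shift) & 1
        out.append(flags[(8 - shift) % 3] if bit_set else "-")
    return "".join(out)


print(describe_mode(0o444))  # r--r--r--  (bits 100 100 100)
print(describe_mode(0o754))  # rwxr-xr--  (bits 111 101 100)
```

Each octal digit covers exactly one r w x triple, which is why permissions are conventionally written in octal.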



Security Concerns
 File permissions are the most elementary and effective form of
security measure in a stand alone single user system.
 Some systems provide security by having passwords for files.
 Enhanced security would be to encrypt a file with some key.
 Unix provides the crypt command to encrypt files. The syntax is:
crypt EncryptKey < InputFile > OutputFile



Information Required for File
Management - 1



Information Required for File
Management - 2



File Storage Management - 1
 An OS needs to maintain several pieces of information for
file management, e.g. access and modification times of a file.
 An audit trail records who accessed which file and what they did.
 In Unix, these trails are maintained in the syslog file – useful
to recover files after a system crash and to detect
unauthorized accesses to a system.
 Unix internally recognizes 4 different file types: ordinary,
directory, special and named.
 Ordinary files are those created by the users' programs or
utilities.



File Storage Management - 2
 Directory files organize the files hierarchically, they are
different from ordinary files.
 Inode structure is used by Unix to maintain information about
named files.



File Control Blocks
 The Microsoft counterpart of an inode is a File Control Block
(FCB).
 An FCB stores the file name, its location on secondary storage, the
length of the file in bytes, the date and time of its creation, etc.



Block Based File Organization - 1
 We next discuss organization of files as blocks of information.
 In this section we shall discuss various techniques used in the
organization of blocks.



Contiguous Allocation - 1
 If we know a priori the size of the file to be created, this
information can be given to the OS for it to follow a pre-allocation
policy and find a suitable memory block that can fit the entire
file as a contiguous block.
 Contiguous memory allocation is a classical memory allocation model
that assigns a process consecutive memory blocks.



Contiguous Allocation - 2
 The numbers 1, 2, 3 and 4 in the previous figure identify the
starting blocks of the four files.
 One advantage of the pre-allocation policy is that the retrieval
of information is very fast.
 One disadvantage of this policy is that it requires a priori
information about the size of the file.
 Another disadvantage is that it might not always be possible to
find a contiguous memory block.
 Also, note this is a static allocation.
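
The pre-allocation policy and its second disadvantage can be illustrated with a small sketch. It assumes a first-fit search over a list of blocks; the slides do not say which search strategy is used, and the names allocate_contiguous and release are hypothetical.

```python
def allocate_contiguous(disk, file_id, size):
    """Place file_id in the first run of `size` free blocks (size >= 1).

    Returns the starting block, or None if no run is long enough.
    """
    run_start, run_len = 0, 0
    for i, owner in enumerate(disk):
        if owner is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == size:
                for j in range(run_start, run_start + size):
                    disk[j] = file_id
                return run_start
        else:
            run_len = 0  # the run of free blocks was interrupted
    return None


def release(disk, file_id):
    """Free every block owned by file_id."""
    for i, owner in enumerate(disk):
        if owner == file_id:
            disk[i] = None


disk = [None] * 10
for name in ("A", "B", "C"):
    allocate_contiguous(disk, name, 3)   # A at 0, B at 3, C at 6
release(disk, "A")
release(disk, "C")
# Seven blocks are free in total, but the longest free run is only four.
print(allocate_contiguous(disk, "D", 5))  # None
```

The failed request shows that enough free space may exist overall while no single contiguous region is large enough.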



Chained List Allocation - 1
 This is a dynamic block allocation policy that overcomes the
disadvantages of pre-allocation policy.
 The disadvantage of dynamic allocation is that random access to
blocks is not possible.



Chained List Allocation - 2
• There are two reasons why
a dynamic block allocation
policy is needed.
• The first is that in most
cases it is not possible to
know a priori the size of a
file being created.
• The second is that there
are some files that already
exist and it is not easy to
find contiguous regions



Chained List Allocation - 3
 In a dynamic situation, a list of free blocks is maintained.
 Allocation is made as the need arises.
 We may even allocate one block at a time from a free space list.
 The OS maintains a chain of free blocks and allocates next free
block in the chain to an incoming file.
 This way the finally allocated files may be located at various
positions on the disk.
 The obvious overhead is the maintenance of chained links.
 But then we now have a dynamically allocated disk space.
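
A minimal sketch of this scheme follows, assuming one block is allocated at a time from the head of a free-space list. The class ChainedDisk and its method names are hypothetical and only illustrate the idea.

```python
class ChainedDisk:
    """Toy disk where each block stores a link to the next block of its file."""

    def __init__(self, n_blocks):
        self.data = [None] * n_blocks
        self.next = [None] * n_blocks        # chain link kept per block
        self.free = list(range(n_blocks))    # free-space list

    def append_block(self, last, payload):
        """Take the next free block and link it after block `last`."""
        if not self.free:
            raise MemoryError("no free blocks")
        b = self.free.pop(0)
        self.data[b] = payload
        self.next[b] = None
        if last is not None:
            self.next[last] = b
        return b

    def read_nth(self, start, n):
        """Reach the n-th block by following n links from `start`."""
        b = start
        for _ in range(n):
            b = self.next[b]
        return self.data[b]


d = ChainedDisk(6)
f_start = f_last = d.append_block(None, "f0")
g_start = g_last = d.append_block(None, "g0")
f_last = d.append_block(f_last, "f1")
g_last = d.append_block(g_last, "g1")
f_last = d.append_block(f_last, "f2")
print(d.read_nth(f_start, 2))  # f2, stored in block 4
```

Because the two files grow in turns, their blocks end up interleaved on the disk, and reaching block n of a file means walking n links, which is why random access is slow.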



Indexed Allocation - 1
 An index table for each file is maintained in its first block.
 Thus address information for each block can be obtained with
one level of indirection.
 The advantage of this method is that there is a direct access to
any part of the file.



Indexed Allocation - 2
 In an indexed allocation we maintain an index table for each file
in its very first block.
 Thus it is possible to obtain the address information for each of
the blocks with only one level of indirection, i.e. from
the index.
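
The single level of indirection can be sketched as follows. The disk is modelled as a dictionary from block number to contents, and the block numbers are made up for illustration.

```python
def read_block_indexed(disk, index_block, n):
    """Return the n-th data block of a file using its index table."""
    index_table = disk[index_block]   # one read fetches the index block
    return disk[index_table[n]]       # one more read fetches the data block


# Block 7 holds the index table; the data blocks are scattered.
disk = {7: [12, 3, 9], 12: "part0", 3: "part1", 9: "part2"}
print(read_block_indexed(disk, 7, 2))  # part2
```

Any block is reached in two reads regardless of its position in the file, in contrast to following a chain of links.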
Internal and External Fragmentation
 In mapping byte streams to blocks, block size was assumed to
be 1024 bytes.
 In the previous example, for a file size of 1145 bytes, 2
blocks were allocated: 1024 bytes fill the first block and the
remaining 121 bytes go in the second block, leaving 903 bytes of it
unused. Such non-utilization of space inside an allocated block is
called Internal Fragmentation.
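
The arithmetic behind this example can be written out as a short sketch; the function name blocks_and_waste is hypothetical and the 1024-byte block size is the one assumed above.

```python
def blocks_and_waste(file_size, block_size=1024):
    """Return (blocks allocated, bytes lost to internal fragmentation)."""
    blocks = -(-file_size // block_size)        # ceiling division
    waste = blocks * block_size - file_size     # unused bytes in the last block
    return blocks, waste


print(blocks_and_waste(1145))  # (2, 903)
```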



Internal and External Fragmentation
 Most OSs maintain a free space list to allocate blocks as needed.
 Suppose a file initially occupied 7 blocks and was later reduced
to 4 blocks; a hole of 3 blocks is produced.
 Such holes may be too small to reuse, so space runs short even
though free blocks exist. This non-utilization of holes is called
External Fragmentation.



Policies in Practice - 1
 MS-DOS and OS/2 use a FAT (File Allocation Table) strategy, in which
entries for files are stored for each directory (similar to the
index node in Unix).
 The file name is used to get the starting address of the first block
of a file.
 Each file block is chain-linked to the next block until an EOF
marker is stored for some block.
 FAT maintains a list of free block chains.
 FAT was stored in the first few blocks of disc space.
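
The lookup described above can be sketched in a few lines. This is a simplified model, not the on-disk FAT format: the directory is a dictionary from file name to first block, the table maps each block to its successor, and the EOF marker value is an assumption.

```python
EOF = -1  # assumed end-of-file marker for this sketch


def file_blocks(directory, fat, name):
    """Follow the FAT chain of `name` and return its blocks in order."""
    block = directory[name]      # the file name gives the first block
    chain = []
    while block != EOF:
        chain.append(block)
        block = fat[block]       # the table entry names the next block
    return chain


directory = {"notes.txt": 2}
fat = {2: 5, 5: 3, 3: EOF}
print(file_blocks(directory, fat, "notes.txt"))  # [2, 5, 3]
```

Keeping the links in one table rather than inside the data blocks means the chain can be followed without reading the file's data.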



Policies in Practice - 2
 FAT32, an extension of FAT, is supported on Windows 98 and higher.
FAT32 in addition supports longer filenames and file
compression.
 On Windows NT, the successor to FAT is NTFS.
 Unlike FAT, NTFS spreads the file tables throughout the disc
for their efficient management.
 Like FAT32, NTFS supports long file names and file compression.
 File access permissions are supported by NTFS.
 Windows 2000 uses NTFS.



Selecting a Disk-Scheduling Algorithm

 SSTF is common and has a natural appeal

 SCAN and C-SCAN perform better for systems that place a heavy load on the disk
 Less starvation

 Performance depends on the number and types of requests

 Requests for disk service can be influenced by the file-allocation method
and by metadata layout

 The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be
replaced with a different algorithm if necessary

 Either SSTF or LOOK is a reasonable choice for the default algorithm

 What about rotational latency?
 Difficult for OS to calculate

 How does disk-based queuing affect OS queue ordering efforts?
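
To make the comparison concrete, the following sketch computes total head movement for SSTF and LOOK on one request queue. The queue and starting cylinder are illustrative values, and this LOOK variant is assumed to sweep upward first; neither choice comes from the slides.

```python
def sstf(head, requests):
    """Shortest-seek-time-first: always serve the closest pending cylinder."""
    pending = list(requests)
    total = 0
    while pending:
        nxt = min(pending, key=lambda c: abs(c - head))
        total += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
    return total


def look(head, requests):
    """LOOK: sweep upward to the last request, then reverse direction."""
    up = sorted(c for c in requests if c >= head)
    down = sorted((c for c in requests if c < head), reverse=True)
    total = 0
    for c in up + down:
        total += abs(c - head)
        head = c
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(sstf(53, queue))  # 236
print(look(53, queue))  # 299
```

SSTF moves the head less on this queue, but under a steady stream of nearby requests it can keep postponing distant ones, which is the starvation concern noted above.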


Disk Management
 Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk
controller can read and write
 Each sector can hold header information, plus data, plus error correction code (ECC)
 Usually 512 bytes of data but can be selectable

 To use a disk to hold files, the operating system still needs to record its own data structures on the
disk
 Partition the disk into one or more groups of cylinders, each treated as a logical disk
 Logical formatting or “making a file system”
 To increase efficiency most file systems group blocks into clusters
 Disk I/O done in blocks
 File I/O done in clusters

 Boot block initializes system
 The bootstrap is stored in ROM
 Bootstrap loader program stored in boot blocks of boot partition

 Methods such as sector sparing used to handle bad blocks


Booting from a Disk in Windows 2000
Swap-Space Management
 Swap-space — Virtual memory uses disk space as an extension of main memory
 Less common now due to memory capacity increases

 Swap-space can be carved out of the normal file system, or, more commonly, it can be in a separate disk
partition (raw)

 Swap-space management
 4.3BSD allocates swap space when process starts; holds text segment (the program) and data segment
 Kernel uses swap maps to track swap-space use
 Solaris 2 allocates swap space only when a dirty page is forced out of physical memory, not when the virtual
memory page is first created
 File data written to swap space until write to file system requested
 Other dirty pages go to swap space due to no other home
 Text segment pages thrown out and reread from the file system as needed

 What if a system runs out of swap space?

 Some systems allow multiple swap spaces


Data Structures for Swapping on
Linux Systems
RAID Structure
 RAID – multiple disk drives provide reliability via
redundancy

 Increases the mean time to failure

 Frequently combined with NVRAM to improve write performance

 RAID is arranged into six different levels


RAID (Cont.)
 Several improvements in disk-use techniques involve the use of multiple
disks working cooperatively
 Disk striping uses a group of disks as one storage unit
 RAID schemes improve performance and improve the reliability of the
storage system by storing redundant data
 Mirroring or shadowing (RAID 1) keeps duplicate of each disk
 Striped mirrors (RAID 1+0) or mirrored stripes (RAID 0+1) provide
high performance and high reliability
 Block interleaved parity (RAID 4, 5, 6) uses much less redundancy
 RAID within a storage array can still fail if the array fails, so automatic
replication of the data between arrays is common
 Frequently, a small number of hot-spare disks are left unallocated,
automatically replacing a failed disk and having data rebuilt onto them
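
The parity idea behind the block-interleaved levels, and the rebuild onto a spare disk, can be sketched with XOR. This is a simplified illustration with made-up block contents, not a description of any particular RAID implementation.

```python
from functools import reduce


def parity(blocks):
    """XOR equally sized blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x0f\xaa", b"\xf0\x55", b"\x33\x33"]   # blocks on three data disks
p = parity(data)                                 # block on the parity disk

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])  # True
```

One parity block protects several data blocks, which is why parity schemes need much less redundant storage than keeping a full mirror.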
RAID Levels
RAID (0 + 1) and (1 + 0)
Extensions
 RAID alone does not prevent or detect data corruption or other errors, just disk
failures

 Solaris ZFS adds checksums of all data and metadata

 Checksums kept with pointer to object, to detect if object is the right one and whether
it changed

 Can detect and correct data and metadata corruption

 ZFS also removes volumes, partitions

 Disks allocated in pools
 Filesystems within a pool share that pool, use and release space like “malloc” and “free”
memory allocate / release calls
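
Keeping the checksum with the pointer rather than with the data can be sketched as below. This is only an illustration of the idea: the dictionary-based storage, the function names and the use of SHA-256 are assumptions, not the actual ZFS format or algorithm.

```python
import hashlib


def make_pointer(address, block):
    """Build a pointer that carries the checksum of the block it refers to."""
    return {"address": address, "checksum": hashlib.sha256(block).hexdigest()}


def read_verified(storage, pointer):
    """Read a block and verify it against the checksum held in the pointer."""
    block = storage[pointer["address"]]
    if hashlib.sha256(block).hexdigest() != pointer["checksum"]:
        raise IOError("checksum mismatch: wrong or corrupted block")
    return block


storage = {40: b"file contents"}
ptr = make_pointer(40, storage[40])
print(read_verified(storage, ptr))   # b'file contents'

storage[40] = b"file c0ntents"       # simulate silent corruption
try:
    read_verified(storage, ptr)
except IOError as err:
    print(err)
```

Because the checksum lives outside the block, a block that was overwritten or misdirected cannot vouch for itself.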
ZFS Checksums All Metadata and Data
Traditional and Pooled Storage
Stable-Storage Implementation
 Write-ahead log scheme requires stable storage

 To implement stable storage:


 Replicate information on more than one nonvolatile storage
media with independent failure modes
 Update information in a controlled manner to ensure that we
can recover the stable data after any failure during data transfer
or recovery
Tertiary Storage Devices
 Low cost is the defining characteristic of tertiary storage

 Generally, tertiary storage is built using removable media

 Common examples of removable media are floppy disks and
CD-ROMs; other types are available
Removable Disks
 Floppy disk — thin flexible disk coated with magnetic
material, enclosed in a protective plastic case
 Most floppies hold about 1 MB; similar technology is used for
removable disks that hold more than 1 GB
 Removable magnetic disks can be nearly as fast as hard disks,
but they are at a greater risk of damage from exposure
Removable Disks (Cont.)
 A magneto-optic disk records data on a rigid platter coated with
magnetic material
 Laser heat is used to amplify a large, weak magnetic field to record a
bit
 Laser light is also used to read data (Kerr effect)
 The magneto-optic head flies much farther from the disk surface
than a magnetic disk head, and the magnetic material is covered with
a protective layer of plastic or glass; resistant to head crashes
 Optical disks do not use magnetism; they employ special
materials that are altered by laser light
WORM Disks
 The data on read-write disks can be modified over and over

 WORM (“Write Once, Read Many Times”) disks can be written only
once
 Thin aluminum film sandwiched between two glass or plastic platters

 To write a bit, the drive uses a laser light to burn a small hole through
the aluminum; information can be destroyed but not altered
 Very durable and reliable

 Read-only disks, such as CD-ROM and DVD, come from the factory
with the data pre-recorded
Tapes
 Compared to a disk, a tape is less expensive and holds more data, but
random access is much slower.
 Tape is an economical medium for purposes that do not require fast
random access, e.g., backup copies of disk data, holding huge volumes
of data.
 Large tape installations typically use robotic tape changers that move
tapes between tape drives and storage slots in a tape library
 stacker – library that holds a few tapes
 silo – library that holds thousands of tapes

 A disk-resident file can be archived to tape for low cost storage; the
computer can stage it back into disk storage for active use.
Operating System Support
 Major OS jobs are to manage physical devices and to
present a virtual machine abstraction to applications

 For hard disks, the OS provides two abstractions:


 Raw device – an array of data blocks
 File system – the OS queues and schedules the interleaved
requests from several applications
Application Interface
 Most OSs handle removable disks almost exactly like fixed disks — a
new cartridge is formatted and an empty file system is generated on the
disk
 Tapes are presented as a raw storage medium, i.e., an application does
not open a file on the tape, it opens the whole tape drive as a raw
device
 Usually the tape drive is reserved for the exclusive use of that
application
 Since the OS does not provide file system services, the application must
decide how to use the array of blocks
 Since every application makes up its own rules for how to organize a
tape, a tape full of data can generally only be used by the program that
created it
Tape Drives
 The basic operations for a tape drive differ from those of a disk drive

 locate() positions the tape to a specific logical block, not an entire
track (corresponds to seek())
 The read position() operation returns the logical block
number where the tape head is
 The space() operation enables relative motion

 Tape drives are “append-only” devices; updating a block in the middle
of the tape also effectively erases everything beyond that block
 An EOT mark is placed after a block that is written
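
The operations above can be modelled with a toy class. This is a hypothetical in-memory sketch, not a real driver interface; the method names simply mirror the operations listed, and the end of the block list stands in for the EOT mark.

```python
class Tape:
    """Toy model of a tape: a list of logical blocks plus a head position."""

    def __init__(self):
        self.blocks = []
        self.pos = 0

    def locate(self, n):
        """Position the head at logical block n."""
        if not 0 <= n <= len(self.blocks):
            raise ValueError("block is beyond the end of the tape")
        self.pos = n

    def read_position(self):
        """Return the logical block number under the head."""
        return self.pos

    def space(self, delta):
        """Move the head relative to its current position."""
        self.locate(self.pos + delta)

    def write(self, payload):
        """Write at the head; everything beyond this block is lost."""
        del self.blocks[self.pos:]
        self.blocks.append(payload)
        self.pos += 1


tape = Tape()
for item in ("a", "b", "c"):
    tape.write(item)
tape.locate(1)
tape.write("B")
print(tape.blocks)            # ['a', 'B']
print(tape.read_position())   # 2
```

Rewriting block 1 discards block 2 as well, which is the append-only behaviour described above.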
File Naming
 The issue of naming files on removable media is especially
difficult when we want to write data on a removable cartridge on
one computer, and then use the cartridge in another computer.

 Contemporary OSs generally leave the name space problem
unsolved for removable media, and depend on applications and
users to figure out how to access and interpret the data.

 Some kinds of removable media (e.g., CDs) are so well
standardized that all computers use them the same way.
Hierarchical Storage Management (HSM)

 A hierarchical storage system extends the storage hierarchy
beyond primary memory and secondary storage to incorporate
tertiary storage, usually implemented as a jukebox of tapes or
removable disks.

 Usually incorporate tertiary storage by extending the file system
 Small and frequently used files remain on disk
 Large, old, inactive files are archived to the jukebox

 HSM is usually found in supercomputing centers and other large
installations that have enormous volumes of data.
Speed
 Two aspects of speed in tertiary storage are bandwidth and
latency.

 Bandwidth is measured in bytes per second.
 Sustained bandwidth – average data rate during a large
transfer; # of bytes/transfer time
(the data rate when the data stream is actually flowing)
 Effective bandwidth – average over the entire I/O time,
including seek() or locate(), and cartridge switching
(the drive’s overall data rate)
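
The difference between the two measures is easy to see with numbers. The figures below are hypothetical, chosen only to make the arithmetic clear.

```python
def sustained_bandwidth(bytes_moved, transfer_s):
    """Bytes per second while the data stream is actually flowing."""
    return bytes_moved / transfer_s


def effective_bandwidth(bytes_moved, transfer_s, overhead_s):
    """Bytes per second over the whole I/O, including positioning overhead."""
    return bytes_moved / (transfer_s + overhead_s)


size = 1_000_000_000      # 1 GB transferred (hypothetical)
transfer = 100            # seconds spent streaming data
overhead = 100            # seconds for cartridge switch plus locate()

print(sustained_bandwidth(size, transfer))             # 10000000.0
print(effective_bandwidth(size, transfer, overhead))   # 5000000.0
```

With overhead as long as the transfer itself, the effective rate is half the sustained rate.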
Speed (Cont.)
 Access latency – amount of time needed to locate data
 Access time for a disk – move the arm to the selected cylinder and
wait for the rotational latency; < 35 milliseconds
 Access on tape requires winding the tape reels until the selected
block reaches the tape head; tens or hundreds of seconds
 Generally say that random access within a tape cartridge is about a
thousand times slower than random access on disk
 The low cost of tertiary storage is a result of having many cheap
cartridges share a few expensive drives
 A removable library is best devoted to the storage of infrequently
used data, because the library can only satisfy a relatively small
number of I/O requests per hour
Reliability
 A fixed disk drive is likely to be more reliable than a
removable disk or tape drive

 An optical cartridge is likely to be more reliable than a
magnetic disk or tape

 A head crash in a fixed hard disk generally destroys the data,
whereas the failure of a tape drive or optical disk drive often
leaves the data cartridge unharmed
Cost
 Main memory is much more expensive than disk storage

 The cost per megabyte of hard disk storage is competitive with
magnetic tape if only one tape is used per drive

 The cheapest tape drives and the cheapest disk drives have had
about the same storage capacity over the years

 Tertiary storage gives a cost savings only when the number of
cartridges is considerably larger than the number of drives
Price per Megabyte of DRAM
From 1981 to 2004
Price per Megabyte of Magnetic Hard Disk
From 1981 to 2004
Price per Megabyte of a Tape Drive
From 1984-2000
Disc Partitions
 Allows better management of disc space.
 Unix maintains disc partitions to house system, kernel and user files.
Windows too partitions the hard disc.
 Disc partitions are mounted on a file system.
 A disc partition is organized into a directory structure - tree.
 This tree gets connected to some node in the overall file
system tree – mounting a file system
 This basic concept is also carried over to file servers
on a network.



The End
 Questions
