File Organization-Lec5
Data Organization
➢ There are two basic ways to organize data on a disk:
• By sector and
• By user-defined block.
Organizing Tracks by Sector
➢ The simplest view is that sectors are adjacent, fixed-sized segments of a track that
happen to hold a file.
➢ This is often a perfectly adequate way to view a file logically, but it may not be a
good way to store sectors physically.
Organizing Tracks by Sector
➢ The file manager is the part of the operating system responsible for managing files.
• The file manager maps the logical parts of the file into their physical location.
• A cluster is a fixed number of contiguous sectors.
• The file manager allocates a whole number of clusters to a file. Example: sector
size = 512 bytes, cluster size = 2 sectors.
• If a file contains 10 bytes, one full cluster (1024 bytes) is still allocated.
• There may be unused space in the last cluster of a file. This unused space contributes to
internal fragmentation.
• The clusters of a file are usually not stored contiguously on the disk, which causes
external fragmentation.
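The cluster arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the constants are the example's own values (512-byte sectors, 2 sectors per cluster).

```python
import math

SECTOR_SIZE = 512          # bytes per sector (from the example above)
SECTORS_PER_CLUSTER = 2    # cluster size in sectors (from the example above)
CLUSTER_SIZE = SECTOR_SIZE * SECTORS_PER_CLUSTER  # 1024 bytes


def clusters_needed(file_size):
    """Whole number of clusters needed to hold file_size bytes."""
    return math.ceil(file_size / CLUSTER_SIZE)


def internal_fragmentation(file_size):
    """Bytes allocated to the file but not occupied by its data."""
    return clusters_needed(file_size) * CLUSTER_SIZE - file_size


print(clusters_needed(10))         # 1 cluster, i.e. 1024 bytes allocated
print(internal_fragmentation(10))  # 1014 bytes unused in that cluster
```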
Organizing Tracks by Sector
➢ Clusters are good since they improve sequential access: reading bytes sequentially
from a cluster can be done in one revolution, seeking only once.
➢ The file manager maintains a file allocation table (FAT) that records, for each cluster
in the file, its location on disk.
➢ An extent is a group of contiguous clusters. If a file is stored in a single extent, then
seeking is done only once.
➢ If there are not enough contiguous clusters to hold a file, the file is divided into 2 or
more extents.
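To make the relation between clusters and extents concrete, here is a minimal sketch that groups a file's cluster locations into extents. The table layout (a plain list where entry i holds the physical location of the file's i-th cluster) and the sample values are assumptions made for illustration, not the format of any real FAT.

```python
def extents(cluster_locations):
    """Group a file's cluster locations into runs of contiguous clusters."""
    runs = []
    for loc in cluster_locations:
        if runs and loc == runs[-1][1] + 1:
            runs[-1][1] = loc          # extend the current extent
        else:
            runs.append([loc, loc])    # start a new extent
    return [(start, end) for start, end in runs]


# Hypothetical table: logical cluster i of the file is at physical cluster table[i].
table = [7, 8, 9, 20, 21, 40]
print(extents(table))       # [(7, 9), (20, 21), (40, 40)]
print(len(extents(table)))  # 3 extents, so roughly one seek per extent
```

A file whose table is one unbroken run (for example `[7, 8, 9]`) yields a single extent, which is the case where seeking is done only once.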
Fragmentation
➢ Due to records not fitting exactly in a sector.
• Example: Record size = 200 bytes, sector size = 512 bytes
• To avoid having a record span two sectors, we can store only 2 records in each sector (112
bytes go unused per sector).
• The alternative is to let a record span two sectors, but in this case two sectors must be
read when we need to access this record.
➢ Due to the use of clusters.
• If the file size is not a multiple of the cluster size, then the last cluster will be only
partially used.
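The record-versus-sector trade-off above can be checked with a short sketch. The function names are hypothetical; the numbers are the ones from the example (200-byte records, 512-byte sectors). The last function assumes records are packed back to back with spanning allowed.

```python
def records_per_sector(record_size, sector_size):
    """Records that fit in one sector when records may not span sectors."""
    return sector_size // record_size


def unused_bytes_per_sector(record_size, sector_size):
    """Bytes left over in each sector when records may not span sectors."""
    return sector_size % record_size


def sectors_read_for_record(index, record_size, sector_size):
    """Sectors touched by record `index` when records are packed back to back."""
    first_byte = index * record_size
    last_byte = first_byte + record_size - 1
    return last_byte // sector_size - first_byte // sector_size + 1


print(records_per_sector(200, 512))          # 2
print(unused_bytes_per_sector(200, 512))     # 112
print(sectors_read_for_record(0, 200, 512))  # 1 (bytes 0..199)
print(sectors_read_for_record(2, 200, 512))  # 2 (bytes 400..599 cross a sector boundary)
```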
How to Choose Cluster Size
➢ Some operating systems allow the system administrator to choose the cluster size.
➢ When to use large cluster size?
• When disks contain large files likely to be processed sequentially.
• Example: Updates in a master file of bank accounts (in batch mode)
Organizing Tracks by Block
➢ Rather than being divided into sectors, the disk tracks may be divided into user-
defined blocks.
➢ When the data on a track is organized by block, this usually means that the amount
of data transferred in a single I/O operation can vary depending on the needs of the
software designer (not the hardware).
Organizing Tracks by Block
➢ The blocking factor indicates the number of records that are to be stored in each
block in a file.
➢ Blocks don't have the sector-spanning and fragmentation problems of sectors, since
they vary in size to fit the logical organization of the data.
➢ A block typically contains subblocks.
• Data subblock: contains the records in this block.
➢ Each data subblock is usually accompanied by further subblocks:
• Key subblock: the key of the last record in the data subblock (the disk controller can
search for a key without loading the block into main memory).
• Count subblock: the number of bytes in the block.
Non-Data Overhead
➢ Amount of space used for extra stuff other than data.
➢ Sector-Addressable Disks
• At the beginning of each sector some info is stored, such as sector address, track
address, condition (if sector is defective);
• There is some gap between sectors.
➢ Block-Organized Disks
• Subblocks and interblock gaps are part of the extra stuff; there is more non-data overhead
than with sector addressing.
Non-Data Overhead
➢ Whether using a block or a sector organization, some space on the disk is taken up
by non-data overhead, i.e., information stored on the disk during pre-formatting.
Non-Data Overhead
➢ The greater the block size, the greater the potential amount of internal track
fragmentation.
➢ The flexibility introduced by the use of blocks rather than sectors can save time
since it lets the programmer determine, to a large extent, how the data is to be
organized physically on disk.
Example
➢ Disk characteristics
• Block-addressable Disk Drive
• Size of track = 20,000 bytes
• Nondata overhead per block = 300 bytes
➢ File Characteristics
• Record size = 100 bytes
➢ How many records can be stored per track for the following blocking factors?
• 1. Blocking factor = 10
• 2. Blocking factor = 60
Solution
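➢ Case 1:
• Blocking factor is 10
• Size of data subblock = 10 × 100 = 1000 bytes
• Size of block including overhead = 1000 + 300 = 1300 bytes
• Number of blocks that can fit in a track = 20000 / 1300 = 15.38, rounded down to 15
• Number of records per track = 15 × 10 = 150 records
➢ Case 2:
• Blocking factor is 60
• Size of data subblock = 60 × 100 = 6000 bytes
• Size of block including overhead = 6000 + 300 = 6300 bytes
• Number of blocks that can fit in a track = 20000 / 6300 = 3.17, rounded down to 3
• Number of records per track = 3 × 60 = 180 records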
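The two cases can be reproduced with a small function. This is a sketch of the arithmetic only: the function and parameter names are hypothetical, and it assumes, as the solution does, that every block carries the full 300 bytes of non-data overhead.

```python
def records_per_track(track_size, overhead_per_block, record_size, blocking_factor):
    """Records that fit on one track of a block-addressable disk."""
    data_subblock = blocking_factor * record_size      # bytes of records per block
    block_size = data_subblock + overhead_per_block    # data plus non-data overhead
    blocks_per_track = track_size // block_size        # only whole blocks fit
    return blocks_per_track * blocking_factor


print(records_per_track(20000, 300, 100, 10))  # 150
print(records_per_track(20000, 300, 100, 60))  # 180
```

The larger blocking factor stores more records because the fixed overhead is paid 3 times per track instead of 15 times, even though more space is left unused at the end of the track.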
The Cost of a Disk Access
➢ Seek Time is the time required to move the access arm to the correct cylinder.
• More costly in a multiuser environment.
➢ Rotational Delay is the time it takes for the disk to rotate so the sector we want is
under the read/write head.
➢ Transfer Time
• = (number of bytes transferred / number of bytes on a track) × rotation time
• 63 sectors per track
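The transfer-time formula can be tried out numerically. In the sketch below only the 63 sectors per track comes from the slide; the 512-byte sector size is borrowed from the earlier cluster example, and the 6 ms rotation time is a hypothetical value chosen for illustration.

```python
def transfer_time_ms(bytes_transferred, bytes_per_track, rotation_time_ms):
    """Transfer time = (bytes transferred / bytes on a track) x rotation time."""
    return bytes_transferred / bytes_per_track * rotation_time_ms


SECTORS_PER_TRACK = 63   # from the slide
SECTOR_SIZE = 512        # assumption: bytes per sector
ROTATION_TIME_MS = 6.0   # assumption: time for one full revolution

bytes_per_track = SECTORS_PER_TRACK * SECTOR_SIZE

# Transferring one sector takes 1/63 of a revolution.
one_sector = transfer_time_ms(SECTOR_SIZE, bytes_per_track, ROTATION_TIME_MS)
print(round(one_sector, 3))  # 0.095

# Transferring a whole track takes one full revolution.
print(transfer_time_ms(bytes_per_track, bytes_per_track, ROTATION_TIME_MS))  # 6.0
```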
Disk as Bottleneck
➢ Processes are often Disk-Bound, i.e., the network and the CPU often have to wait
inordinate lengths of time for the disk to transmit data.
➢ When a program reads a byte from the disk, the operating system locates the
surface, track and sector containing that byte, and reads the entire sector into a
special area in main memory called a buffer.
Various Techniques to Solve this Problem
1. Multiprogramming: (CPU works on other jobs while waiting for the disk), but:
• Multiprogramming is not always available.
• The process may not be able to afford so much time waiting for the disk.
2. Disk Striping:
• Putting different blocks of the file in different drives, then letting the separate drives
deliver parts of the file to the network simultaneously.
• Independent processes accessing the same file may not interfere with each other
(parallelism)
3. RAID (Redundant Array of Independent Disks).
4. RAM Disk (Memory Disk): Simulate the behavior of the mechanical disk in
memory.
Various Techniques to Solve this Problem
5. Disk Cache:
• Large block of memory configured to contain pages of data from a disk.
• When data is requested from disk, first the cache is checked.
• If data is not there (miss) the disk is accessed.
• Differs from cache memory, which performs the same kind of performance-enhancing
operation with respect to main memory rather than the disk.
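The check-the-cache-first behaviour described above can be sketched as follows. This is a toy model, not how any particular operating system implements its disk cache: the class name, the `read_from_disk` callback, and the least-recently-used replacement policy are all assumptions added for the example (the slide does not say which page is dropped when the cache is full).

```python
from collections import OrderedDict


class DiskCache:
    """Toy disk cache: keeps up to `capacity` pages in memory."""

    def __init__(self, read_from_disk, capacity):
        self.read_from_disk = read_from_disk  # hypothetical function: page number -> data
        self.capacity = capacity
        self.pages = OrderedDict()            # page number -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, page_no):
        if page_no in self.pages:             # hit: no disk access needed
            self.hits += 1
            self.pages.move_to_end(page_no)   # mark as most recently used
            return self.pages[page_no]
        self.misses += 1                      # miss: the disk is accessed
        data = self.read_from_disk(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:   # assumed policy: evict least recently used
            self.pages.popitem(last=False)
        return data


cache = DiskCache(read_from_disk=lambda n: f"page-{n}", capacity=2)
for page in [1, 2, 1, 3, 2]:
    cache.read(page)
print(cache.hits, cache.misses)  # 1 4
```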
RAID (Redundant Array of Independent Disks)
➢ Disk Array: Arrangement of several disks that gives abstraction of a single, large
disk. (One Disk Controller)
➢ Goals: Increase performance and reliability.
➢ Two main techniques:
• Data striping: Data is partitioned; the size of a partition is called the striping unit.
Partitions are distributed over several disks. For an 8-drive RAID, for example, the
controller receives a single block to write and breaks it into eight pieces. The first piece is
written to a particular track of the first disk, and so on. Reading is done the same way:
all the pieces are reassembled in cache, and the cache content is transmitted back through
the I/O channels.
• Redundancy: The same information is replicated on more than one disk.
• More disks mean more failures.
• Redundant information allows reconstruction of data if a disk fails.
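The two techniques can be illustrated with an in-memory sketch for the 8-drive example. Everything here is a simplification: the "disks" are just list entries, the function names are hypothetical, and redundancy is modelled as a plain second copy of each piece, which is only one of several ways a RAID can store redundant information.

```python
def stripe(block, n_disks):
    """Split a block into n_disks pieces, one per disk."""
    unit = -(-len(block) // n_disks)  # ceiling division: the striping unit in bytes
    return [block[i * unit:(i + 1) * unit] for i in range(n_disks)]


def reassemble(pieces):
    """Concatenate the pieces in disk order to rebuild the block."""
    return b"".join(pieces)


block = bytes(range(64))             # a 64-byte block to write
pieces = stripe(block, 8)            # 8-drive array
print([len(p) for p in pieces])      # [8, 8, 8, 8, 8, 8, 8, 8]
print(reassemble(pieces) == block)   # True

# Redundancy (assumed here to be a full replica of every piece).
mirror = list(pieces)
pieces[3] = None                     # simulate the failure of disk 3
recovered = [p if p is not None else m for p, m in zip(pieces, mirror)]
print(reassemble(recovered) == block)  # True
```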