File Organization-Lec5


CSW241-File Organization and Processing

Secondary Storage Devices: Magnetic Disks

Dr. Riham Moharam


Faculty of Information Technology & Computer Science
Sinai University
North Sinai, Egypt
Outline
➢ Data Organization
➢ Organizing Tracks by Sector
➢ Organizing Tracks by Block
➢ Disk Layout Strategies
➢ Non-Data Overhead
➢ The Cost of a Disk Access
➢ Disk as Bottleneck

Data Organization
➢ There are two basic ways to organize data on a disk:
• By sector
• By user-defined block

Organizing Tracks by Sector
➢ The simplest view is that sectors are adjacent, fixed-size segments of a track that
happen to hold a file.

➢ This is often a perfectly adequate way to view a file logically, but it may not be a
good way to store sectors physically.

Organizing Tracks by Sector
➢ The file manager is the part of the operating system responsible for managing files.
• The file manager maps the logical parts of the file into their physical location.
• A cluster is a fixed number of contiguous sectors.
• The file manager allocates an integer number of clusters to a file. Example: sector
size = 512 bytes, cluster size = 2 sectors.
• Even if a file contains only 10 bytes, one whole cluster (1024 bytes) is allocated.
• There may be unused space in the last cluster of a file. This unused space contributes to
internal fragmentation.
• The clusters are also not usually stored contiguously on the disk, causing external
fragmentation.
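The allocation rule above can be checked with a few lines of Python. This is only a sketch: the function name `clusters_needed` is made up for illustration, and the default sizes are the ones from the example (512-byte sectors, 2 sectors per cluster).

```python
SECTOR_SIZE = 512          # bytes per sector (value from the example above)
SECTORS_PER_CLUSTER = 2    # cluster size in sectors (value from the example above)

def clusters_needed(file_size, sector_size=SECTOR_SIZE,
                    sectors_per_cluster=SECTORS_PER_CLUSTER):
    """Return (clusters allocated, bytes allocated, unused bytes) for a file."""
    cluster_bytes = sector_size * sectors_per_cluster
    # The file manager hands out whole clusters, so round up.
    clusters = (file_size + cluster_bytes - 1) // cluster_bytes
    allocated = clusters * cluster_bytes
    # The unused bytes in the last cluster are the internal fragmentation.
    return clusters, allocated, allocated - file_size

print(clusters_needed(10))    # (1, 1024, 1014)
print(clusters_needed(2500))  # (3, 3072, 572)
```

A 10-byte file wastes 1014 of its 1024 allocated bytes, which is why small files and large clusters are a poor match.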

Organizing Tracks by Sector
➢ Clusters are good since they improve sequential access: reading bytes sequentially
from a cluster can be done in one revolution, seeking only once.
➢ The file manager maintains a file allocation table (FAT) containing, for each cluster
in the file, its location on disk.
➢ An extent is a group of contiguous clusters. If a file is stored in a single extent, then
seeking is done only once.
➢ If there are not enough contiguous clusters to hold a file, the file is divided into 2 or
more extents.
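To make the idea of extents concrete, the sketch below counts the extents of a file from the list of cluster numbers it occupies, in logical order. The list representation and the name `count_extents` are assumptions for illustration; a real file manager would read this information from its allocation table.

```python
def count_extents(cluster_numbers):
    """Count runs of physically consecutive clusters in a file's cluster list."""
    if not cluster_numbers:
        return 0
    extents = 1
    for prev, cur in zip(cluster_numbers, cluster_numbers[1:]):
        # A new extent starts whenever the next cluster is not adjacent
        # to the previous one, which costs an extra seek when reading.
        if cur != prev + 1:
            extents += 1
    return extents

print(count_extents([7, 8, 9, 10]))          # 1
print(count_extents([7, 8, 20, 21, 22, 3]))  # 3
```

Under this simplified model, the number of extents is also the number of seeks needed to read the whole file sequentially.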

Fragmentation
➢ Due to records not fitting exactly in a sector.
• Example: Record size = 200 bytes, sector size = 512 bytes.
• To prevent a record from spanning 2 sectors, we can store only 2 records in each
sector (112 bytes go unused per sector).
• The alternative is to let a record span two sectors, but then two sectors must be
read to access that record.
➢ Due to the use of clusters.
• If the file size is not a multiple of the cluster size, then the last cluster will be only
partially used.
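The trade-off in the record example can be worked out numerically. The following sketch uses illustrative function names (`sector_packing`, `sectors_for_file`) and compares the two options for a hypothetical file of 1000 records.

```python
def sector_packing(record_size, sector_size):
    """Return (records per sector, unused bytes per sector) without spanning."""
    records = sector_size // record_size
    return records, sector_size - records * record_size

def sectors_for_file(n_records, record_size, sector_size, allow_spanning):
    """Number of sectors needed to hold n_records."""
    if allow_spanning:
        # Records are packed back to back, so only total bytes matter.
        total = n_records * record_size
        return (total + sector_size - 1) // sector_size
    per_sector, _ = sector_packing(record_size, sector_size)
    return (n_records + per_sector - 1) // per_sector

print(sector_packing(200, 512))               # (2, 112)
print(sectors_for_file(1000, 200, 512, False))  # 500
print(sectors_for_file(1000, 200, 512, True))   # 391
```

Spanning saves space (391 sectors instead of 500), at the price of sometimes reading two sectors for a single record.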

How to Choose Cluster Size
➢ Some operating systems allow the system administrator to choose the cluster size.
➢ When to use large cluster size?
• When disks contain large files likely to be processed sequentially.
• Example: Updates in a master file of bank accounts (in batch mode)

➢ What about small cluster size?
• When disks contain small files and/or files likely to be accessed randomly.
• Example: online updates for airline reservations

Organizing Tracks by Block
➢ Rather than being divided into sectors, the disk tracks may be divided into user-
defined blocks.

➢ When the data on a track is organized by block, this usually means that the amount
of data transferred in a single I/O operation can vary depending on the needs of the
software designer (not the hardware).

➢ Blocks can normally be either fixed or variable in length, depending on the
requirements of the file designer and the capabilities of the operating system.

➢ A block is usually organized to contain an integral number of logical records.

Organizing Tracks by Block
➢ The blocking factor indicates the number of records that are to be stored in each
block in a file.
➢ Blocks don’t have the sector-spanning and fragmentation problem of sectors since
they vary in size to fit the logical organization of the data.
➢ A block typically consists of a data subblock accompanied by other subblocks:
• Data subblock: contains the records in this block.
• Key subblock: contains the key for the last record in the data subblock (the disk
controller can search for a key without loading the block into main memory).
• Count subblock: contains the number of bytes in the block.

Non-Data Overhead
➢ Amount of space used for extra stuff other than data.

➢ Sector-Addressable Disks
• At the beginning of each sector some info is stored, such as sector address, track
address, condition (if sector is defective);
• There is some gap between sectors.

➢ Block-Organized Disks
• Subblocks and interblock gaps are part of the extra stuff; there is more non-data
overhead than with sector addressing.

Non-Data Overhead
➢ Whether using a block or a sector organization, some space on the disk is taken up
by non-data overhead, i.e., information stored on the disk during pre-formatting.

➢ On sector-addressable disks, pre-formatting involves storing, at the beginning of
each sector, the sector address, track address and condition (usable or defective).

➢ On block-organized disks, subblocks and interblock gaps have to be provided with
every block. The relative amount of non-data space necessary for a block scheme is
higher than for a sector scheme.

Non-Data Overhead
➢ The greater the block size, the greater the potential amount of internal track
fragmentation.

➢ The flexibility introduced by the use of blocks rather than sectors can save time
since it lets the programmer determine, to a large extent, how the data is to be
organized physically on disk.

Example
➢ Disk characteristics
• Block-addressable Disk Drive
• Size of track = 20,000 bytes
• Nondata overhead per block = 300 bytes
➢ File Characteristics
• Record size = 100 bytes
➢ How many records can be stored per track for the following blocking factors?
• 1. Blocking factor = 10
• 2. Blocking factor = 60

Solution
➢ Case 1:
• Blocking factor is 10
• Size of data subblock = 10 × 100 = 1000 bytes
• Number of blocks that can fit in a track = $\lfloor 20000 / 1300 \rfloor = \lfloor 15.38 \rfloor = 15$
• Number of records per track = 15 × 10 = 150 records

➢ Case 2:
• Blocking factor is 60
• Size of data subblock = 60 × 100 = 6000 bytes
• Number of blocks that can fit in a track = $\lfloor 20000 / 6300 \rfloor = \lfloor 3.17 \rfloor = 3$
• Number of records per track = 3 × 60 = 180 records
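The same calculation can be written as a small Python function, which makes it easy to try other blocking factors. The function name and parameter names are illustrative; the numbers are those of the example.

```python
def records_per_track(track_size, overhead_per_block, record_size, blocking_factor):
    """Records that fit on one track for a given blocking factor."""
    # Each block = data subblock + non-data overhead (subblocks and gaps).
    block_size = blocking_factor * record_size + overhead_per_block
    # Only whole blocks fit on a track; the remainder is wasted.
    blocks = track_size // block_size
    return blocks * blocking_factor

print(records_per_track(20000, 300, 100, 10))  # 150
print(records_per_track(20000, 300, 100, 60))  # 180
```

The larger blocking factor spends less of the track on per-block overhead, but leaves more unused space at the end of the track (1,100 bytes here, versus 500 bytes for a blocking factor of 10).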

The Cost of a Disk Access
➢ Seek Time is the time required to move the access arm to the correct cylinder.
• More costly in a multiuser environment.

➢ Rotational Delay is the time it takes for the disk to rotate so the sector we want is
under the read/write head.

➢ Transfer Time
• Transfer time = (number of bytes transferred / number of bytes on a track) × rotation time
• 63 sectors per track
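The three cost components can be combined in a short sketch. The rotation speed (7200 rpm), the 8 ms seek time and the 512-byte sector size are assumed values chosen only for illustration, as is the usual simplification that the average rotational delay is half a revolution.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm

def transfer_time_ms(bytes_transferred, bytes_per_track, rotation_ms):
    """Transfer time = (bytes transferred / bytes on a track) x rotation time."""
    return bytes_transferred / bytes_per_track * rotation_ms

def access_time_ms(seek_ms, rotation_ms, bytes_transferred, bytes_per_track):
    """Seek time + average rotational delay + transfer time."""
    rotational_delay = rotation_ms / 2  # assumed average: half a revolution
    return seek_ms + rotational_delay + transfer_time_ms(
        bytes_transferred, bytes_per_track, rotation_ms)

# Assumed drive: 63 sectors per track, 512 bytes per sector, 7200 rpm, 8 ms seek.
bytes_per_track = 63 * 512
rot = rotation_time_ms(7200)
print(f"transfer of one 1024-byte cluster: {transfer_time_ms(1024, bytes_per_track, rot):.3f} ms")
print(f"total access time: {access_time_ms(8.0, rot, 1024, bytes_per_track):.3f} ms")
```

With numbers like these, seek time and rotational delay dominate the cost, which is why reducing the number of seeks matters more than the transfer itself.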

Disk as Bottleneck
➢ Processes are often disk-bound, i.e., the network and the CPU often have to wait
inordinate lengths of time for the disk to transmit data.

➢ When a program reads a byte from the disk, the operating system locates the
surface, track and sector containing that byte, and reads the entire sector into a
special area in main memory called a buffer.

Various Techniques to Solve this Problem
1. Multiprogramming: (CPU works on other jobs while waiting for the disk), but:
• Multiprogramming is not always available.
• The process may not be able to afford so much time waiting for the disk.
2. Disk Striping:
• Putting different blocks of the file in different drives, then letting the separate drives
deliver parts of the file to the network simultaneously.
• Independent processes accessing the same file may not interfere with each other
(parallelism)
3. RAID (Redundant Array of Independent Disks).
4. RAM Disk (Memory Disk): Simulate the behavior of the mechanical disk in
memory.

Various Techniques to Solve this Problem
5. Disk Cache:
• Large block of memory configured to contain pages of data from a disk.
• When data is requested from disk, first the cache is checked.
• If data is not there (miss) the disk is accessed.
• Differs from CPU cache memory, which performs the same kind of performance-enhancing
operation with respect to main memory.
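The check-cache-then-disk behaviour can be sketched as follows. The class name `DiskCache`, the `read_from_disk` callback and the least-recently-used replacement policy are all assumptions made for this illustration; the slides do not say which page is discarded when the cache is full.

```python
from collections import OrderedDict

class DiskCache:
    """Toy disk cache: pages kept in memory, least recently used evicted first."""

    def __init__(self, read_from_disk, capacity=4):
        self._read_from_disk = read_from_disk  # hypothetical slow disk read
        self._capacity = capacity              # maximum number of cached pages
        self._pages = OrderedDict()            # page number -> page data
        self.hits = 0
        self.misses = 0

    def read(self, page_no):
        if page_no in self._pages:             # hit: no disk access needed
            self.hits += 1
            self._pages.move_to_end(page_no)   # mark as most recently used
            return self._pages[page_no]
        self.misses += 1                       # miss: go to the disk
        data = self._read_from_disk(page_no)
        self._pages[page_no] = data
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)    # evict least recently used page
        return data

cache = DiskCache(lambda n: f"page-{n}", capacity=2)
for n in [1, 2, 1, 3, 2]:
    cache.read(n)
print(cache.hits, cache.misses)  # 1 4
```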

RAID (Redundant Array of Independent Disks)
➢ Disk Array: Arrangement of several disks that gives the abstraction of a single, large
disk. (One Disk Controller)
➢ Goals: Increase performance and reliability.
➢ Two main techniques:
• Data striping: Data is partitioned; the size of a partition is called the striping unit.
Partitions are distributed over several disks. For an 8-drive RAID, for example, the
controller receives a single block to write and breaks it into eight pieces. The first piece
is written to a particular track of the first disk, and so on. Reading is done the same way:
all the pieces are reassembled in cache, and the cache content is transmitted back through
the I/O channels.
• Redundancy: The same information is replicated on more than one disk.
• More disks mean more failures.
• Redundant information allows reconstruction of data if a disk fails.
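Data striping can be illustrated with a few lines of Python that split one block into equal pieces, one per drive, and put them back together. The functions `stripe` and `reassemble` are illustrative names, and the drives are simulated by a plain list; this is a sketch of the idea, not of how a RAID controller is actually implemented.

```python
def stripe(block, n_disks):
    """Split a block of bytes into n_disks contiguous pieces, one per disk."""
    unit = (len(block) + n_disks - 1) // n_disks  # striping unit, rounded up
    return [block[i * unit:(i + 1) * unit] for i in range(n_disks)]

def reassemble(pieces):
    """Reading: collect the pieces from all disks and join them in order."""
    return b"".join(pieces)

block = bytes(range(256)) * 16        # a 4096-byte block to write
pieces = stripe(block, 8)             # 8-drive array, as in the example
print([len(p) for p in pieces])       # [512, 512, 512, 512, 512, 512, 512, 512]
print(reassemble(pieces) == block)    # True
```

Because each drive handles only one eighth of the block, the eight transfers can proceed in parallel.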

