Reducing I/O costs
How crucial is the problem?
Disk capacity has improved 1,000X in the last 15 years, and the size of data has increased at the same rate. But
platters spin only 4X faster, and the transfer rate has improved only 40X in the same period.
Data Organization on Disks
Davood Rafiei
Thus:
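Disk accesses are becoming more precious: capacity has grown about 25 times faster than the transfer rate (1,000X vs. 40X). They are expected to become even more precious in the future.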
Physical Disk Structure
Reducing I/O costs (Cont.)
Key to lower I/O cost: reduce seek/rotation delays
Software solutions
  arrange blocks of a file sequentially on disk
  read/write in bigger chunks
  buffering
Hardware solutions
Will not be discussed.
Reducing I/O costs (Cont.)
Store pages containing related information close together on disk
Justification: if an application accesses x, it will next access data related to x with high probability
Buffering
Keep a cache of recently accessed pages in main memory
  Goal: a request for a page can be satisfied from the cache instead of the disk
Purge pages when the cache is full
  For example, use the LRU algorithm
Record the clean/dirty state of each page (clean pages don't have to be written back)
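The buffering policy above can be sketched in a few lines of Python. This is a minimal illustration, not a DBMS implementation: the class name `BufferPool`, its methods, and the use of a plain dictionary as a stand-in for the disk are all assumptions made for the example.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny LRU page cache with clean/dirty tracking (illustrative only)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity      # number of page frames in main memory
        self.disk = disk              # hypothetical disk: dict page_id -> data
        self.frames = OrderedDict()   # page_id -> [data, dirty], oldest first
        self.disk_reads = 0
        self.disk_writes = 0

    def _evict(self):
        # Purge the least recently used page; write it only if it is dirty.
        victim_id, (data, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.disk[victim_id] = data
            self.disk_writes += 1

    def _load(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # cache hit: mark most recent
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.disk_reads += 1                   # cache miss: go to disk
        frame = [self.disk[page_id], False]    # freshly read pages are clean
        self.frames[page_id] = frame
        return frame

    def read(self, page_id):
        return self._load(page_id)[0]

    def write(self, page_id, data):
        frame = self._load(page_id)
        frame[0] = data
        frame[1] = True                        # page now differs from disk


disk = {1: b"a", 2: b"b", 3: b"c"}
pool = BufferPool(capacity=2, disk=disk)
pool.read(1)
pool.read(2)
pool.read(1)          # hit, no disk access
pool.write(2, b"B")   # hit, page 2 becomes dirty
pool.read(3)          # evicts page 1 (clean, so no write)
pool.read(1)          # evicts page 2 (dirty, so it is written back)
```

In this run only the dirty page costs a disk write on eviction, which is the point of recording the clean/dirty state.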
Page size tradeoff:
  Large page size - data related to x stored in same page; hence additional page transfer can be avoided
  Small page size - reduce transfer time, reduce buffer size in main memory
  Typical page size - 4096 bytes
Example Page Size
Consider:
An IBM Deskstar disk with 40 sectors/track, 512 bytes/sector and an average seek time of 9.1 msec.
Disk platters spin at 7,200 rpm.
Average rotational delay = (1/7200)/2 minutes = 4.17 msec.
Transfer time for a sector = (1/7200)/40 minutes = 0.21 msec.
A file of 6400 256-byte records = 1638 KB, which occupies 3200 sectors of the disk.
[Figure: Accessing Data Through Cache. Labels: Application, DBMS, cache/buffer, page transfer]
Case 1:
The file is stored in 100 extents, each of size 4 pages, where each page is 8 sectors (32 sectors per extent).
Time to read the file = 100 x (9.1 + 4.17 + 32 x 0.21) msec = about 2 seconds
Case 2:
The file is stored in 3200 pages, each of size one sector.
Time to read the file = 3200 x (9.1 + 4.17 + 0.21) msec = about 43 seconds
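The two cases can be reproduced with a short calculation. The sketch below assumes, as the slide does, that every extent or page costs one average seek plus one average rotational delay before its sectors are transferred; the function and constant names are made up for this example. It uses unrounded per-sector times, so the results differ slightly from the hand calculation with 0.21 msec.

```python
SEEK_MS = 9.1              # average seek time from the example
RPM = 7200
SECTORS_PER_TRACK = 40

ROTATION_MS = 60_000 / RPM                        # one full rotation
AVG_ROT_DELAY_MS = ROTATION_MS / 2                # half a rotation on average
SECTOR_TRANSFER_MS = ROTATION_MS / SECTORS_PER_TRACK


def read_time_seconds(num_chunks, sectors_per_chunk):
    """Time to read a file stored as num_chunks separately placed chunks."""
    per_chunk_ms = (SEEK_MS + AVG_ROT_DELAY_MS
                    + sectors_per_chunk * SECTOR_TRANSFER_MS)
    return num_chunks * per_chunk_ms / 1000


case1 = read_time_seconds(100, 32)    # 100 extents of 4 pages x 8 sectors
case2 = read_time_seconds(3200, 1)    # 3200 one-sector pages

print(f"Case 1: {case1:.1f} s")       # roughly 2 seconds
print(f"Case 2: {case2:.1f} s")       # roughly 43 seconds
```

Both cases transfer the same 3200 sectors; the difference comes almost entirely from paying the seek and rotational delay 3200 times instead of 100 times.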
[Figure labels, continued: item transfer, block, page frames]
Hardware Solutions
Disk arrays: arrange several disks so that they give the abstraction of a single, large disk.
Partition data into striping units and distribute them over several disks.
Example: the student file is spread over three disks (pages 1-10, pages 11-20, pages 21-30).
Reading pages 1, 22 and 15 touches a different disk for each page.
more disks --> more failures!
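The mapping from a page to a disk in the student file example can be written down directly. The sketch below assumes a striping unit of 10 pages and 3 disks, as in the example; the round-robin placement of units beyond the first three is an assumption added here, and the function name is hypothetical.

```python
def disk_for_page(page, unit_pages=10, num_disks=3):
    """Return the 0-based disk index holding a 1-based page number."""
    unit = (page - 1) // unit_pages   # which striping unit the page falls in
    return unit % num_disks           # assumed round-robin over the disks


reads = [1, 22, 15]
placement = {page: disk_for_page(page) for page in reads}
print(placement)   # each of the three pages maps to a different disk
```

Because the three requested pages fall on three different disks, the requests do not queue behind one another on a single disk arm.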
Summary
Disks: cheap, non-volatile storage.
  Provide both sequential and random access.
  The cost of a random access depends on the location of the page on disk; it is important to arrange data sequentially to minimize seek and rotation delays.
Lowering I/O costs
software vs. hardware solutions.