Lecture 2 - IO Management
IO Management
Operating Systems
Lecture 2A:
Mass Storage Systems
Note that drive controllers have small buffers and can manage a queue
of I/O requests (of varying “depth”)
Several algorithms exist to schedule the servicing of disk I/O requests
The analysis is true for one or many platters
We illustrate scheduling algorithms with a request queue (0 - 199)
98, 183, 37, 122, 14, 124, 65, 67
Head pointer 53
Shortest Seek Time First selects the request with the minimum seek
time from the current head position
SSTF scheduling is a form of SJF scheduling; may cause starvation of
some requests
Illustration shows total head movement of 236 cylinders
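As a sketch (not from the slides), the 236-cylinder figure can be reproduced by greedily servicing the nearest pending request; `sstf_total` is an illustrative helper name:

```c
#include <stdlib.h>

/* Total head movement under SSTF: repeatedly service the pending
 * request with the minimum seek distance from the current head. */
int sstf_total(int head, const int *req, int n) {
    int *done = calloc(n, sizeof(int));
    int total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1, bestdist = 0;
        for (int i = 0; i < n; i++) {
            if (done[i]) continue;
            int d = abs(req[i] - head);
            if (best < 0 || d < bestdist) { best = i; bestdist = d; }
        }
        done[best] = 1;
        total += bestdist;
        head = req[best];
    }
    free(done);
    return total;
}
```

For the queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at cylinder 53, the service order is 65, 67, 37, 14, 98, 122, 124, 183, for a total of 236 cylinders, matching the illustration.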
The disk arm starts at one end of the disk, and moves toward the
other end, servicing requests until it gets to the other end of the disk,
where the head movement is reversed and servicing continues
The SCAN algorithm is sometimes called the elevator algorithm
Illustration shows total head movement of 208 cylinders
But note that if requests are uniformly dense, the largest density of requests is
at the other end of the disk, and those requests wait the longest
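A back-of-the-envelope check (an illustrative sketch, not from the slides): with the head at 53 moving toward 0, sweeping down to the lowest request (14) and then up to the highest (183) gives 39 + 169 = 208 cylinders; going all the way to cylinder 0 first, as pure SCAN does, would give 53 + 183 = 236, so the 208 figure corresponds to reversing at the last request in each direction (strictly, LOOK behavior). `scan_total` is an illustrative name:

```c
/* Head movement when the arm starts at `head` moving toward lower
 * cylinders and reverses at the last pending request (LOOK-style,
 * which is what the 208-cylinder illustration counts). */
int scan_total(int head, const int *req, int n) {
    int lo = head, hi = head;
    for (int i = 0; i < n; i++) {
        if (req[i] < lo) lo = req[i];
        if (req[i] > hi) hi = req[i];
    }
    return (head - lo) + (hi - lo);  /* sweep down, then sweep up */
}
```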
Raw disk access for apps that want to do their own block management,
keep OS out of the way (databases for example)
Boot block initializes system
The bootstrap is stored in ROM
Bootstrap loader program stored in boot blocks of boot partition
Methods such as sector sparing used to handle bad blocks
Swap space - virtual memory uses disk space as an extension of main memory
Less common now due to memory capacity increases
Swap space can be carved out of the normal file system, or, more commonly, it
can be in a separate disk partition (raw)
Swap space management
4.3BSD allocates swap space when process starts; holds text segment (the
program) and data segment
Kernel uses swap maps to track swap space use
Solaris 2 allocates swap space only when a dirty page is forced out of
physical memory, not when the virtual memory page is first created
File data written to swap space until write to file system requested
Other dirty pages go to swap space due to no other home
Text segment pages thrown out and reread from the file system as needed
What if a system runs out of swap space?
Some systems allow multiple swap spaces
RAID alone does not prevent or detect data corruption or other errors,
just disk failures
Solaris ZFS adds checksums of all data and metadata
Checksums kept with pointer to object, to detect if object is the right
one and whether it changed
Can detect and correct data and metadata corruption
ZFS also does away with volumes and partitions
Disks allocated in pools
Filesystems within a pool share that pool, using and releasing space like
malloc() and free() memory allocate / release calls
Operating Systems
Lecture 2B:
File System Interface
File Concept
Access Methods
Disk and Directory Structure
File System Mounting
File Sharing
Protection
[Figure: a directory structure - one directory holding files F1, F2, ..., Fn]
Single level directory - a single directory for all users
Naming problem
Grouping problem
Two level directory - separate directory for each user
Path name
Can have the same file name for different users
Efficient searching
No grouping capability
Tree structured directories
Efficient searching
Grouping capability
Current directory (working directory)
cd /spell/mail/prog
type list
Operating Systems
Lecture 2C:
File System Implementation
File structure
Logical storage unit
Collection of related information
File system resides on secondary storage (disks)
Provides a user interface to storage, mapping logical to physical
Provides efficient and convenient access to disk by allowing data
to be stored, located, and retrieved easily
Disk provides in place rewrite and random access
I/O transfers performed in blocks of sectors (usually 512 bytes)
File control block - storage structure consisting of information about
a file
Device driver controls the physical device
File system organized into layers
We have system calls at the API level, but how do we implement their
functions?
On disk and in memory structures
Boot control block contains info needed by system to boot OS from
that volume
Needed if volume contains OS, usually first block of volume
Volume control block (superblock, master file table) contains
volume details
Total # of blocks, # of free blocks, block size, free block pointers
or array
Directory structure organizes the files
Names and inode numbers, master file table
Per file File Control Block (FCB) contains many details about the file
inode number, permissions, size, dates
NTFS stores this info in the master file table using relational DB structures
Mount table storing file system mounts, mount points, file system types
The following figure illustrates the necessary file system structures
provided by the operating system
Figure (a) refers to opening a file
Figure (b) refers to reading a file
Plus buffers hold data blocks from secondary storage
Open returns a file handle for subsequent use
Data from read eventually copied to specified user process memory
address
The API is to the VFS interface, rather than any specific type of file system
An allocation method refers to how disk blocks are allocated for files
Contiguous allocation - each file occupies set of contiguous blocks
Best performance in most cases
Simple - only starting location (block #) and length (number of
blocks) are required
Problems include finding space for file, knowing file size, external
fragmentation, need for compaction offline (downtime) or
online
Mapping from logical address LA (512-byte blocks): Q = LA / 512, R = LA mod 512
Block to be accessed = Q + starting block; displacement into block = R
Many newer file systems (e.g., Veritas File System) use a modified
contiguous allocation scheme
Extent based file systems allocate disk blocks in extents
An extent is a contiguous set of disk blocks
Extents are allocated as the file needs space
A file consists of one or more extents
Linked allocation - each block contains a pointer to the next block
Mapping from logical address LA (512-byte blocks, one word holds the pointer):
Q = LA / 511, R = LA mod 511
Block to be accessed is the Qth block in the linked chain; displacement into block = R + 1
Indexed allocation
Each file has its own index block(s) of pointers to its data blocks
Logical view: an index table of pointers to the file's data blocks
Mapping from logical address LA (512-byte blocks): Q = LA / 512, R = LA mod 512
Q = displacement into index table; R = displacement into the data block
Linked scheme of index blocks (each index block holds 511 pointers plus a link):
Q1 = LA / (512 x 511), R1 = LA mod (512 x 511)
Q1 = block of index table; then Q2 = R1 / 512, R2 = R1 mod 512
Q2 = displacement into block of index table; R2 = displacement into data block
Two level index (4K blocks could store 1,024 four byte pointers in outer
index: 1,048,576 data blocks and file size of up to 4GB)
Q1 = LA / (512 x 512), R1 = LA mod (512 x 512)
Q1 = displacement into outer index; then Q2 = R1 / 512, R2 = R1 mod 512
Q2 = displacement into inner index; R2 = displacement into data block
More index blocks than can be addressed with 32 bit file pointer
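The two-level computation above can be sketched in C (a minimal sketch; `map_two_level` is an illustrative name, and the 512-pointer, 512-byte-block sizes follow the slide's LA / (512 x 512) example):

```c
/* Split a logical address LA into outer-index slot, inner-index slot,
 * and byte offset, for 512-byte blocks each holding 512 pointers. */
void map_two_level(unsigned la, unsigned *outer, unsigned *inner, unsigned *off) {
    unsigned q1 = la / (512u * 512u);  /* displacement into outer index */
    unsigned r1 = la % (512u * 512u);
    *outer = q1;
    *inner = r1 / 512u;                /* displacement into inner index */
    *off   = r1 % 512u;                /* displacement into data block  */
}
```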
Bit vector (bit map) of n blocks:
bit[i] = 1 if block[i] is free, 0 if block[i] is occupied
Block number of first free block = (number of bits per word) * (number of 0-value words) + offset of first 1 bit
CPUs have instructions to return offset within word of first “1” bit
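A sketch of that computation, assuming 32-bit words with bit 0 the lowest-numbered block in each word (`first_free` is an illustrative name; POSIX `ffs()` stands in for the CPU's "first set bit" instruction):

```c
#include <stdint.h>
#include <strings.h>  /* ffs() */

/* Return the block number of the first free block (bit value 1 = free),
 * or -1 if every block is occupied. */
int first_free(const uint32_t *map, int nwords) {
    for (int w = 0; w < nwords; w++) {
        if (map[w] != 0)
            /* bits per word * number of 0-value words + offset of first 1 bit */
            return w * 32 + (ffs((int)map[w]) - 1);
    }
    return -1;
}
```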
Grouping
Modify linked list to store address of next n-1 free blocks in first free
block, plus a pointer to next block that contains free block pointers
(like this one)
Counting
Because space is frequently contiguously used and freed (with contiguous
allocation, extents, or clustering)
Keep address of first free block and count of following free
blocks
Free space list then has entries containing addresses and
counts
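A minimal sketch of the counting representation (names are illustrative, and first-fit allocation from the list is an assumption, not something the slides specify):

```c
/* Each free-list entry: address of first free block in a run + run length. */
struct run { unsigned start; unsigned count; };

/* Take n contiguous blocks from the first run that is long enough;
 * returns 0 and the starting block in *out, or -1 if no run fits. */
int alloc_run(struct run *list, int nruns, unsigned n, unsigned *out) {
    for (int i = 0; i < nruns; i++) {
        if (list[i].count >= n) {
            *out = list[i].start;
            list[i].start += n;   /* shrink the run from the front */
            list[i].count -= n;
            return 0;
        }
    }
    return -1;
}
```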
Space Maps
Used in ZFS
Consider metadata I/O on very large file systems
Full data structures like bit maps couldn't fit in memory; reading them
would take thousands of I/Os
Divides device space into metaslab units and manages metaslabs
Given volume can contain hundreds of metaslabs
Each metaslab has associated space map
Uses counting algorithm
But records to log file rather than file system
Log of all block activity, in time order, in counting format
Metaslab activity loads the space map into memory in a balanced tree
structure, indexed by offset
Replay log into that structure
Combine contiguous free blocks into single entry
Efficiency dependent on
Disk allocation and directory algorithms
Types of data kept in file’s directory entry
Preallocation or as needed allocation of metadata structures
Fixed size or varying size data structures
Performance
Keeping data and metadata close together
Buffer cache - separate section of main memory for frequently
used blocks
Synchronous writes sometimes requested by apps or needed by
OS
No buffering / caching - writes must hit disk before
acknowledgement
Asynchronous writes more common, bufferable, faster
Free behind and read ahead - techniques to optimize sequential
access
Reads frequently slower than writes
A page cache caches pages rather than disk blocks using virtual
memory techniques and addresses
Memory mapped I/O uses a page cache
Routine I/O through the file system uses the buffer (disk) cache
This leads to the following figure
A unified buffer cache uses the same page cache to cache both
memory mapped pages and ordinary file system I/O to avoid double
caching
But which caches get priority, and what replacement algorithms to
use?
UNIX file system interface (based on the open, read, write, and close
calls, and file descriptors)
Virtual File System (VFS) layer - distinguishes local files from remote
ones, and local files are further distinguished according to their file
system types
The VFS activates file system specific operations to handle local
requests according to their file system types
Calls the NFS protocol procedures for remote requests
NFS service layer - bottom layer of the architecture
Implements the NFS protocol
Operating Systems
Lecture 2D:
I/O Systems
Overview
I/O Hardware
Application I/O Interface
Kernel I/O Subsystem
Transforming I/O Requests to Hardware Operations
STREAMS
Performance
Direct Memory Access (DMA) - used to avoid programmed I/O (one byte at a
time) for large data movement
Requires DMA controller
Bypasses CPU to transfer data directly between I/O device and
memory
OS writes DMA command block into memory
Source and destination addresses
Read or write mode
Count of bytes
Writes location of command block to DMA controller
Bus mastering of DMA controller - grabs bus from CPU
Cycle stealing from CPU but still much more efficient
When done, interrupts to signal completion
Version that is aware of virtual addresses can be even more efficient -
DVMA
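The command block the OS writes might look like this in C (a purely hypothetical layout; field names and flag values are illustrative, since real controllers define their own formats in their datasheets):

```c
#include <stdint.h>

enum { DMA_READ = 0x1, DMA_WRITE = 0x2 };  /* hypothetical mode flags */

/* Hypothetical DMA command block: source, destination, byte count, mode. */
struct dma_cmd {
    uint64_t src;    /* source address              */
    uint64_t dst;    /* destination address         */
    uint32_t count;  /* number of bytes to transfer */
    uint32_t flags;  /* read or write mode          */
};

/* Fill a command block; the OS would then write this block's address to
 * the controller's (memory-mapped) command register, omitted here. */
struct dma_cmd make_dma_cmd(uint64_t src, uint64_t dst,
                            uint32_t count, uint32_t flags) {
    struct dma_cmd c = { src, dst, count, flags };
    return c;
}
```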
Devices can be synchronous or asynchronous
Vectored I/O allows one system call to perform multiple I/O operations
For example, Unix readv() accepts a vector of multiple buffers to read
into (writev() similarly writes from a vector of buffers)
This scatter-gather method is better than multiple individual I/O calls
Decreases context switching and system call overhead
Some versions provide atomicity
Avoids, for example, worry about multiple threads changing data
as reads / writes are occurring
Scheduling
Some I/O request ordering via per device queue
Some OSs try fairness
Some implement Quality Of Service (e.g., IPQOS)
Buffering - store data in memory while transferring between devices
To cope with device speed mismatch
To cope with device transfer size mismatch
To maintain “copy semantics”
Double buffering - two copies of the data
Kernel and user
Varying sizes
Full / being processed and not full / being used
Copy on write can be used for efficiency in some cases
Kernel keeps state info for I/O components, including open file tables,
network connections, character device state
Many, many complex data structures to track buffers, memory
allocation, “dirty” blocks
Some use object oriented methods and message passing to implement
I/O
Windows uses message passing
Message with I/O information passed from user mode into
kernel
Message modified as it flows through to device driver and back
to process
Pros / cons?