Database Storage Media Overview

This document summarizes different storage media used in database systems including their characteristics like speed, cost, and reliability. It focuses on magnetic disks, describing their physical components and characteristics like seek time, rotational latency, and data transfer rate that impact performance. Optimization techniques for disk access include scheduling, file organization, buffering, and using RAID configurations to improve reliability and performance.

Uploaded by

gdeepthi

Storage and File Structure

Chapter 10, Database Systems Concepts, by Silberschatz, et al, 1997

Portions reproduced with permission

So far we have studied the DBMS at the level of the logical model. The logical model of a
database system is the correct level for the database users to focus on. The goal of a
database system is to simplify and facilitate access to data. As members of the
development staff and as potential Database Administrators, however, we need to
understand the physical level better than a typical user.

Overview of Physical Storage Media

Storage media are classified by speed of access, by the cost per unit of data to buy the
media, and by the medium's reliability. Unfortunately, as speed and cost go up, the
reliability goes down.

1. Cache is the fastest and the most costly form of storage. The type of cache
referred to here is the type that is typically built into the CPU chip and is
256KB, 512KB, or 1MB. Cache is managed by the hardware and the operating
system, not by the DBMS, so it has no application to the database, per se.

2. Main memory is the volatile memory in the computer system that is used to
hold programs and data. While prices have been dropping at a staggering rate,
the demand for memory has been growing even faster. Today's
32-bit computers are limited to 4GB of addressable memory. This may not be
sufficient to hold the entire database and all the associated programs, but
the more memory is available, the better the response time of the DBMS. There are
attempts underway to create a system with the most memory that is cost
effective, and to reduce the functionality of the operating system so that only
the DBMS is supported, so that system response can be improved. However,
the contents of main memory are lost if a power failure or system crash occurs.

3. Flash memory is also referred to as electrically erasable programmable read-only
memory (EEPROM). Since it is small (5 to 10MB) and expensive, it has
little or no application to the DBMS.

4. Magnetic-disk storage is the primary medium for long-term on-line storage
today. Prices have been dropping significantly with a corresponding increase in
capacity. New disks today are in excess of 20GB. Unfortunately, the volume of
data to be stored has been increasing even faster. The
organizations using a DBMS are always trying to keep up with the demand for
storage. This medium is the most cost-effective for on-line storage of large
databases.

5. Optical storage is very popular, especially CD-ROM systems. CD-ROM is limited
to data that is read-only. It can be reproduced at very low cost and it is
expected to grow in popularity, especially for replacing written manuals.

6. Tape storage is used for backup and archival data. It is cheaper and slower
than all of the other forms, but it does have the feature that there is no limit on
the amount of data that can be stored, since more tapes can be purchased. As
the tapes get increased capacity, however, restoration of data takes longer and
longer, especially when only a small amount of data is to be restored. This is
because the retrieval is sequential, the slowest possible method.

Magnetic Disks
A typical large commercial database may require hundreds of disks!

Physical Characteristics of Disks

Disks are actually relatively simple. There is normally a collection of platters on a
spindle. Each platter is coated with a magnetic material on both sides and the data is
stored on the surfaces. There is a read-write head for each surface, mounted on an arm
assembly that moves back and forth. A motor spins the platters at a high constant
speed (60, 90, or 120 revolutions per second).

The surface is divided into a set of tracks (circles). Each track is divided into a set
of sectors; a sector is the smallest unit of data that can be written or read at one time.
Sectors can range in size from 32 bytes to 4096 bytes, with 512 bytes being the most
common. The set of tracks at the same position on every surface of every
platter is called a cylinder.

Platters can range in size from 1.8 inches to 14 inches. Today, 5 1/4 inches and 3 1/2
inches are the most common, because they have the lowest seek times and lowest
cost.

A disk controller is the interface between the computer system and the actual hardware of the disk
drive. The controller accepts high-level commands to read or write sectors. The
controller then converts these commands into the necessary low-level commands.
The controller will also attempt to protect the integrity of the data by computing and
storing a checksum for each sector. When attempting to read the data back, the controller
recalculates the checksum and makes several attempts to correctly read the data and
get matching checksums. If the controller is unsuccessful, it will notify the operating
system of the failure.
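
The checksum-and-retry behaviour described above can be sketched as follows. This is only an illustration: real controllers do this in hardware with error-correcting codes, and the CRC-32 checksum, the function names, the `read_raw` callback, and the retry limit of three are all assumptions made for the example.

```python
import zlib


def write_sector(data: bytes):
    """Return the sector payload together with the checksum stored beside it."""
    return data, zlib.crc32(data)


def read_sector(read_raw, stored_checksum: int, max_attempts: int = 3) -> bytes:
    """Call read_raw() until the data read matches the stored checksum.

    Raising IOError stands in for the controller notifying the operating
    system that the sector could not be read.
    """
    for _ in range(max_attempts):
        data = read_raw()
        if zlib.crc32(data) == stored_checksum:
            return data
    raise IOError("sector unreadable: checksum mismatch")


# Simulate one corrupted read followed by a clean one.
payload, checksum = write_sector(b"A" * 512)
attempts = iter([b"B" + b"A" * 511, payload])
result = read_sector(lambda: next(attempts), checksum)
```

Here the first simulated read returns a damaged sector, so the function retries and accepts the second read.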

The controller can also handle the problem of bad sectors. Should a sector
go bad, the controller logically remaps the sector to one of the extra unused sectors
that disk vendors provide, so that the reliability of the disk system is higher. It is
cheaper to produce disks with a greater number of sectors than advertised and then
map out bad sectors than it is to produce disks with no bad sectors or with an extremely
limited possibility of sectors going bad.

There are many different types of disk controllers, but the most common ones today
are SCSI, IDE, and EIDE.

One other characteristic of disks that presents an interesting performance trade-off is the
distance from the read-write head to the surface of the platter. The smaller this gap,
the smaller the area in which data can be written, so the tracks can be
closer together and the disk has a greater capacity. Often the distance is measured in
microns. However, a smaller gap also increases the possibility of the head touching the
surface. When the head touches the surface while the surface is spinning at a high
speed, the result is called a "head crash", which scratches the surface and damages the
head. The bottom line is that someone must replace the disk.

Performance Measures of Disks

1. Seek time is the time to reposition the head and increases with the distance that
the head must move. Seek times can range from 2 to 30 milliseconds. Average
seek time is the average of all seek times and is normally one-third of the
worst-case seek time.

2. Rotational latency time is the time from when the head is over the correct track
until the data rotates around and is under the head and can be read. When the
rotation is 120 rotations per second, the rotation time is 8.33 milliseconds.
Normally, the average rotational latency time is one-half of the rotation time.

3. Access time is the time from when a read or write request is issued to when the
data transfer begins. It is the sum of the seek time and latency time.

4. Data-transfer rate is the rate at which data can be retrieved from the disk and
sent to the controller. This is measured in megabytes per second.

5. Mean time to failure is the number of hours (on average) until a disk fails.
Typical times today range from 30,000 to 800,000 hours (or 3.4 to 91 years).
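
The arithmetic behind these measures can be worked through in a few lines. The sketch below uses the figures quoted in the list; the function names and the 30 ms worst-case seek chosen for the access-time example are assumptions made for illustration.

```python
def rotation_time_ms(revs_per_second: float) -> float:
    """Time for one full rotation, in milliseconds."""
    return 1000.0 / revs_per_second


def average_latency_ms(revs_per_second: float) -> float:
    """On average the data is half a rotation away from the head."""
    return rotation_time_ms(revs_per_second) / 2


def average_access_time_ms(avg_seek_ms: float, revs_per_second: float) -> float:
    """Access time is seek time plus rotational latency."""
    return avg_seek_ms + average_latency_ms(revs_per_second)


def hours_to_years(hours: float) -> float:
    return hours / (24 * 365)


WORST_SEEK_MS = 30.0            # assumed worst-case seek for this example
avg_seek_ms = WORST_SEEK_MS / 3  # rule of thumb: one-third of worst case

print(round(rotation_time_ms(120), 2))                     # 8.33
print(round(average_latency_ms(120), 2))                   # 4.17
print(round(average_access_time_ms(avg_seek_ms, 120), 2))  # 14.17
print(round(hours_to_years(30_000), 1))                    # 3.4
print(round(hours_to_years(800_000), 1))                   # 91.3
```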

Optimization of Disk-Block Access

Requests for disk I/O are generated by both the file system and by the virtual memory
manager found in most systems. Each request specifies the address on the disk to be
referenced; that address is in the form of a block number. Each block is a
contiguous sequence of sectors from a single track of one platter and ranges from 512
bytes to several kilobytes of data. The lower-level file manager must convert block
addresses into the hardware-level cylinder, surface, and sector number.
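
One way the block-number conversion might work is sketched below. The numbering scheme is an assumption made for the example (blocks fill a track, then the same track on the next surface, then the next cylinder), as are the function name and the geometry parameters; real drives and file managers may lay blocks out differently.

```python
from typing import NamedTuple


class DiskAddress(NamedTuple):
    cylinder: int
    surface: int
    sector: int  # first sector of the block within its track


def block_to_address(block: int, sectors_per_block: int,
                     sectors_per_track: int, surfaces: int) -> DiskAddress:
    """Map a logical block number to (cylinder, surface, sector)."""
    if sectors_per_track % sectors_per_block != 0:
        # A block must lie entirely within one track.
        raise ValueError("track size must be a multiple of block size")
    blocks_per_track = sectors_per_track // sectors_per_block
    track_index, block_in_track = divmod(block, blocks_per_track)
    cylinder, surface = divmod(track_index, surfaces)
    return DiskAddress(cylinder, surface, block_in_track * sectors_per_block)


# Assumed geometry: 8 sectors per block, 64 sectors per track, 4 surfaces.
print(block_to_address(100, 8, 64, 4))
# DiskAddress(cylinder=3, surface=0, sector=32)
```

Filling a whole cylinder before moving to the next one keeps consecutive blocks reachable without moving the arm, which is why the surface number changes faster than the cylinder number here.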

Since access to data on disk is several orders of magnitude slower than access to data in
main memory, much attention has been paid to improving the speed of access to
blocks on the disk. This is also where more main memory can speed up the response
time, by making sure that the data needed is in memory when it is needed.

This is the same problem that is addressed in designing operating systems, to ensure
the best response time from the file system manager and the virtual memory manager.

Scheduling. Disk-arm scheduling algorithms order accesses in an
attempt to increase the number of accesses that can be processed in a given
amount of time. These might include First-Come/First-Serve, Shortest Seek First,
and the elevator algorithm.

File organization. To reduce block-access time, data could be arranged on the
disk in the same order that it is expected to be retrieved. (This would mean storing
the data on the disk in order based on the primary key.) However, the benefit
shrinks as more inserts and deletes occur. Also, we
have little control over where on the disk things get stored. The more the data gets
fragmented on the disk, the more time it takes to locate it.

Nonvolatile write buffer. Non-volatile memory (flash memory) can be
used to protect the data in memory from crashes, but it does increase the cost. It
is possible that the use of a UPS would be more effective and cheaper.

Log disk. You can dedicate a disk to writing a sequential log. Because the log is
written sequentially, the writes need almost no seeks.

Buffering. The more information you have in buffers in main memory, the
more likely you are to not have to get the information from the disk. However, it
is also more likely that more of the memory will be wasted on information that is
not needed.
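
The effect of disk-arm scheduling can be illustrated by comparing First-Come/First-Serve with the elevator algorithm on one queue of requests. The sketch below is a simplified model: the function names are invented for the example, the request queue and starting head position are made-up numbers, and the elevator is assumed to sweep toward higher cylinders first.

```python
def fcfs_order(requests, head):
    """First-Come/First-Serve: service requests in arrival order."""
    return list(requests)


def elevator_order(requests, head):
    """Sweep upward from the head position, then reverse and sweep downward."""
    upward = sorted(c for c in requests if c >= head)
    downward = sorted((c for c in requests if c < head), reverse=True)
    return upward + downward


def total_movement(order, head):
    """Total number of cylinders the arm crosses while servicing `order`."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # requested cylinders (made up)
start = 53                                   # current head position (made up)

print(total_movement(fcfs_order(queue, start), start))      # 640
print(total_movement(elevator_order(queue, start), start))  # 299
```

For this queue the elevator ordering moves the arm less than half as far as arrival order does, which is the kind of gain the scheduling algorithms aim for.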

RAID
RAIDs are Redundant Arrays of Inexpensive Disks. There are seven levels of organizing
these disks, numbered 0 through 6:

0 -- Non-redundant Striping

1 -- Mirrored Disks

2 -- Memory Style Error Correcting Codes

3 -- Bit Interleaved Parity

4 -- Block Interleaved Parity

5 -- Block Interleaved Distributed Parity

6 -- P + Q Redundancy
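
The parity idea behind levels 3 through 5 can be shown with exclusive-or: the parity block is the XOR of the data blocks, so any single lost block is the XOR of the survivors. The sketch below is a toy model with made-up 8-byte "blocks" and an invented helper name; it ignores striping layout and where the parity is placed.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, one block each (contents are made up).
data = [b"disk0blk", b"disk1blk", b"disk2blk"]
parity = xor_blocks(data)

# Simulate losing disk 1 and rebuilding it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # True
```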

Tertiary Storage
This is commonly optical disks and magnetic tapes.

Storage Access
A database is mapped into a number of different files, which are maintained by the
underlying operating system. Files are organized into blocks, and a block may contain
one or more data items.

A major goal of the DBMS is to minimize the number of block transfers between the
disk and memory. Since it is not possible to keep all blocks in main memory, we need
to manage the allocation of the space available for the storage of blocks. This is
similar to the problems encountered by the operating system, and can be in conflict
with the operating system, since the OS is concerned with all processes and the DBMS is
concerned with only one family of processes.

Buffer Manager

Programs in a DBMS make requests (that is, calls) on the buffer manager when they
need a block from a disk. If the block is already in the buffer, the requester is passed
the address of the block in main memory. If the block is not in the buffer, the buffer
manager first allocates space in the buffer for the block, throwing out some other block,
if required, to make space for the new block. If the block that is to be thrown out has
been modified, it must first be written back to the disk. The internal actions of the
buffer manager are transparent to the programs that issue disk-block requests.

Replacement strategy. When there is no room left in the buffer, a block must be
removed from the buffer before a new one can be read in. Typically, operating
systems use a least recently used (LRU) scheme. There is also a most recently
used (MRU) scheme that can be a better fit for DBMSs.

Pinned blocks. A block that is not allowed to be written back to disk is said to
be pinned. This could be used to store data that has not been committed yet.

Forced output of blocks. There are situations in which it is necessary to write
the block back to the disk, even though the buffer space is not currently
needed. This might be done during system lulls, so that when activity picks up,
a write of a modified block can be avoided in peak periods.
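
The buffer manager behaviour described in this section can be sketched as a small class. This is a minimal model, not how any particular DBMS implements it: the class and method names are invented, the "disk" is just a dictionary of block contents, and LRU is used as the replacement strategy.

```python
from collections import OrderedDict


class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk             # block number -> bytes (stands in for the disk)
        self.frames = OrderedDict()  # block number -> bytearray, least recently used first
        self.dirty = set()           # blocks modified since they were read
        self.pinned = set()          # blocks that may not be written back

    def get(self, block_no):
        """Return the in-memory copy of a block, reading it from disk if needed."""
        if block_no in self.frames:
            self.frames.move_to_end(block_no)  # now the most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[block_no] = bytearray(self.disk[block_no])
        return self.frames[block_no]

    def mark_dirty(self, block_no):
        self.dirty.add(block_no)

    def pin(self, block_no):
        self.pinned.add(block_no)

    def unpin(self, block_no):
        self.pinned.discard(block_no)

    def force_output(self, block_no):
        """Write a modified block to disk now, keeping it in the buffer."""
        if block_no in self.pinned:
            raise ValueError("pinned blocks may not be written back")
        if block_no in self.dirty:
            self.disk[block_no] = bytes(self.frames[block_no])
            self.dirty.discard(block_no)

    def _evict(self):
        """Throw out the least recently used block that is not pinned."""
        for victim in self.frames:
            if victim not in self.pinned:
                break
        else:
            raise RuntimeError("all buffered blocks are pinned")
        self.force_output(victim)  # write back first if it was modified
        del self.frames[victim]
```

A caller would ask for blocks with `get`, call `mark_dirty` after changing one, and never see whether the block came from memory or from disk.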

File Organization

Fixed-Length Records

Suppose we have a table that has the following organization:

type deposit = record
branch-name : char(22);
account-number : char(10);
balance : real;
end

If each character occupies 1 byte and a real occupies 8 bytes, then this record
occupies 40 bytes. If the first record occupies the first 40 bytes and the second
record occupies the second 40 bytes, etc. we have some problems.

It is difficult to delete a record, because there is no way to indicate that the
record is deleted. (At least one system automatically adds one byte to each
record as a flag to show if the record is deleted.)

Unless the block size happens
to be a multiple of 40 (which is extremely unlikely), some records will cross
block boundaries. It would require two block accesses to read or write such a
record.

One solution to the deletion problem might be to compress the file after each deletion.
This will incur a major amount of overhead processing, especially on larger files.
Additionally, there is the same problem on inserts!

Another solution would be to have two sets of pointers: one that links each
record to the next logical record (a linked list), and a free list (a list of free
slots). This increases the size of the file.
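
The 40-byte layout and the free-list idea can be tried out with Python's `struct` module. This is a simplified sketch: the class and function names are invented, the file is modelled as an in-memory list of slots, and the free list is kept as a separate list of slot numbers rather than as pointers stored inside the file.

```python
import struct

# branch-name char(22), account-number char(10), balance as an 8-byte real.
# "=" selects standard sizes with no padding between fields.
RECORD = struct.Struct("=22s10sd")


class FixedLengthFile:
    def __init__(self):
        self.slots = []      # each slot holds RECORD.size bytes, or None when free
        self.free_list = []  # slot numbers left behind by deleted records

    def insert(self, branch, account, balance):
        packed = RECORD.pack(branch.encode(), account.encode(), balance)
        if self.free_list:
            slot = self.free_list.pop()  # reuse a deleted slot
            self.slots[slot] = packed
        else:
            slot = len(self.slots)       # otherwise grow the file
            self.slots.append(packed)
        return slot

    def delete(self, slot):
        self.slots[slot] = None
        self.free_list.append(slot)

    def read(self, slot):
        branch, account, balance = RECORD.unpack(self.slots[slot])
        return branch.rstrip(b"\0").decode(), account.rstrip(b"\0").decode(), balance


def spans_two_blocks(slot, block_size):
    """True if the record in `slot` crosses a block boundary."""
    first_byte = slot * RECORD.size
    last_byte = first_byte + RECORD.size - 1
    return first_byte // block_size != last_byte // block_size


print(RECORD.size)                # 40
print(spans_two_blocks(12, 512))  # True: bytes 480..519 cross the 512 boundary
```

With a free list, deleting a record costs nothing beyond noting its slot, and the next insert fills the hole instead of forcing the file to be compressed.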

Variable-Length Records

Variable-length records arise in several ways:

Storage of multiple record types in one file.

Record types that allow variable lengths for one or more fields

Record types that allow repeating fields.

A simple method for implementing variable-length records is to attach a special
end-of-record symbol at the end of each record. But this has problems:

It is not easy to reuse space occupied formerly by a deleted record.

There is no space in general for records to grow. If a variable-length record is
updated and needs more space, it must be moved. This can be very costly.

These problems could be solved:

By making a variable-length record into a fixed-length record.

By representing the record as a chain of fixed-length records linked together by pointers.

As you can see, there is not an easy answer.
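
The end-of-record symbol approach, and the cost of letting a record grow, can be seen in a short sketch. The newline byte used as the end-of-record symbol, the function names, and the sample records are all assumptions for the example; the symbol must never occur inside a record.

```python
END = b"\n"  # assumed end-of-record symbol


def pack(records):
    """Store records back to back, each followed by the end-of-record symbol."""
    return b"".join(record + END for record in records)


def find_record(data, index):
    """Return (start, end) byte offsets of record `index`, found by scanning."""
    start = 0
    for _ in range(index):
        start = data.index(END, start) + 1
    return start, data.index(END, start)


def update_record(data, index, new_value):
    """Replace a record; return the new file and how many bytes had to move."""
    start, end = find_record(data, index)
    tail = data[end:]  # everything after the old record, starting at its END symbol
    moved = len(tail) if len(new_value) != end - start else 0
    return data[:start] + new_value + tail, moved


data = pack([b"aa", b"bbb", b"c"])
print(find_record(data, 1))              # (3, 6)
data, moved = update_record(data, 0, b"aaaa")
print(data, moved)                       # b'aaaa\nbbb\nc\n' 7
```

Growing the first record by two bytes forces every byte after it to shift, which is the costly move the text warns about.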

Organization of Records in Files

Heap File Organization

Any record can be placed anywhere in the file. There is no ordering of records and
there is a single file for each relation.

Sequential File Organization

Records are stored in sequential order based on the primary key.

Hashing File Organization

A hashing function is computed on some attribute of each record. The result of
the function specifies in which block of the file the record should
be placed.
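
A minimal sketch of the idea follows. CRC-32 modulo the number of blocks is used as the hashing function purely for illustration; the class name, the in-memory list standing in for disk blocks, and the sample keys are likewise assumptions, and overflow of a full block is not handled.

```python
import zlib


def block_for(key: str, num_blocks: int) -> int:
    """Hash the chosen attribute to a block number."""
    return zlib.crc32(key.encode()) % num_blocks


class HashFile:
    def __init__(self, num_blocks):
        self.blocks = [[] for _ in range(num_blocks)]  # each list models one block

    def insert(self, key, record):
        self.blocks[block_for(key, len(self.blocks))].append((key, record))

    def lookup(self, key):
        """Only the one block chosen by the hash function is examined."""
        block = self.blocks[block_for(key, len(self.blocks))]
        return [record for k, record in block if k == key]


accounts = HashFile(num_blocks=8)
accounts.insert("A-102", ("Perryridge", 400.0))
accounts.insert("A-305", ("Round Hill", 350.0))
print(accounts.lookup("A-305"))  # [('Round Hill', 350.0)]
```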

Clustering File Organization

Several different relations can be stored in the same file. Related records of the
different relations can be stored in the same block.

Data Dictionary Storage

An RDBMS needs to maintain data about the relations, such as the schema. This is
stored in a data dictionary (sometimes called a system catalog):

Names of the relations

Names of the attributes of each relation

Domains and lengths of attributes

Names of views, defined on the database, and definitions of those views

Integrity constraints

Names of authorized users

Accounting information about users

Number of tuples in each relation

Method of storage for each relation (clustered/non-clustered)

Name of the index

Name of the relation being indexed

Attributes on which the index is defined

Type of index formed
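
Part of the list above can be pictured as a small set of record types. The sketch below is only an illustration of what a catalog entry might hold: the class and field names are invented, views, users, and constraints are left out, and a real system would store this information in relations of its own rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    domain: str
    length: int


@dataclass
class Relation:
    name: str
    attributes: list
    storage: str = "non-clustered"  # method of storage
    tuple_count: int = 0


@dataclass
class Index:
    name: str
    relation: str    # name of the relation being indexed
    attributes: list  # attributes on which the index is defined
    kind: str         # type of index formed


@dataclass
class Catalog:
    relations: dict = field(default_factory=dict)
    indexes: dict = field(default_factory=dict)

    def add_relation(self, relation):
        self.relations[relation.name] = relation

    def add_index(self, index):
        if index.relation not in self.relations:
            raise KeyError("unknown relation: " + index.relation)
        self.indexes[index.name] = index

    def indexes_on(self, relation_name):
        return [i.name for i in self.indexes.values() if i.relation == relation_name]


catalog = Catalog()
catalog.add_relation(Relation("deposit", [
    Attribute("branch-name", "char", 22),
    Attribute("account-number", "char", 10),
    Attribute("balance", "real", 8),
]))
catalog.add_index(Index("deposit_acct_idx", "deposit", ["account-number"], "hash"))
print(catalog.indexes_on("deposit"))  # ['deposit_acct_idx']
```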
