
Data Storage and Access Methods: Min Song IS698

The document discusses physical database design and access methods. It describes how physical records are stored on disk using blocks and how different access methods like sequential, indexed sequential, random, and hashed access work. It also covers key physical design decisions around storage format, data arrangement, indexes, and query optimization that impact performance.
Copyright
© Attribution Non-Commercial (BY-NC)

Data Storage and Access Methods

Min Song IS698

Database Design Process


[Figure: Applications 1 to 4 each have their own external model and conceptual requirements; the conceptual requirements feed the conceptual model, which is refined into the logical model and then the internal model (physical design).]

Physical Database Design


Many physical database design decisions are implicit in the technology adopted. Also, organizations may have standards or an information architecture that specifies operating systems, DBMS, and data access languages, thus constraining the range of possible physical implementations. We will be concerned with some of the possible physical implementation issues.

Physical Database Design


The primary goal of physical database design is data processing efficiency. We will concentrate on choices often available to optimize the performance of database services. Physical database design requires information gathered during earlier stages of the design process.

Physical Design Information


Information needed for physical file and database design includes:
Normalized relations, plus size estimates for them
Definitions of each attribute
Descriptions of where and when data are used (entered, retrieved, deleted, updated) and how often
Expectations and requirements for response time, data security, backup, recovery, retention, and integrity
Descriptions of the technologies used to implement the database

Physical Design Decisions


There are several critical decisions that will affect the integrity and performance of the system:
Storage format
Physical record composition
Data arrangement
Indexes
Query optimization and performance tuning

Storage Format
Choosing the storage format of each field (attribute). The DBMS provides a set of data types that can be used for the physical storage of fields in the database. The data type (format) is chosen to minimize storage space and maximize data integrity.

Objectives of data type selection


Minimize storage space
Represent all possible values
Improve data integrity
Support all data manipulations
The correct data type should, in minimal space, represent every possible value (while eliminating illegal values) for the associated attribute and support the required data manipulations (e.g., numerical or string operations).

Access Data Types


Numeric (1, 2, 4, or 8 bytes; fixed or floating point)
Text (255 characters max)
Memo (64,000 characters max)
Date/Time (8 bytes)
Currency (8 bytes; 15 digits + 4 decimal digits)
AutoNumber (4 bytes)
Yes/No (1 bit)
OLE (limited only by disk space)
Hyperlink (up to 64,000 characters)

Access Numeric types


| Field size | Description | Decimal precision | Storage size |
|---|---|---|---|
| Byte | Stores numbers from 0 to 255 (no fractions) | None | 1 byte |
| Integer | Stores numbers from -32,768 to 32,767 (no fractions) | None | 2 bytes |
| Long Integer (default) | Stores numbers from -2,147,483,648 to 2,147,483,647 (no fractions) | None | 4 bytes |
| Single | Stores numbers from -3.402823E38 to -1.401298E-45 for negative values and from 1.401298E-45 to 3.402823E38 for positive values | 7 | 4 bytes |
| Double | Stores numbers from -1.79769313486231E308 to -4.94065645841247E-324 for negative values and from 4.94065645841247E-324 to 1.79769313486231E308 for positive values | 15 | 8 bytes |
| Replication ID | Globally unique identifier (GUID) | N/A | 16 bytes |

Designing Physical Records


A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit. Fields may be fixed length or variable length.

Data Storage
Storing data: disks, the buffer manager, and representing relational data on disk.

The Memory Hierarchy


[Figure: the memory hierarchy, trading speed against capacity. Processor cache: volatile, 512 KB, access time about 10 ns. Main memory (also acting as the disk cache): volatile, 256 MB to 1 GB, access time 10 to 100 ns. Disk: persistent, 10 to 100 GB, access time 10 to 15 ms, transfer rate 5 to 10 MB/s. Tape: persistent, 280 GB typical capacity, 1.5 MB/s transfer rate, sequential access only, not used for operational data.]

Main Memory
Fastest, most expensive (excluding cache). Today, 512 MB is common even on PCs. Many databases could fit in memory.
New industry trend: main memory databases, e.g., TimesTen.
Main issue is volatility.

Secondary Storage
Disks: slower and cheaper than main memory. Persistent! The unit of disk I/O is the block.
Typically 1 block = 4 KB. A disk block is also called a disk page, or simply a page.
Used together with a main memory buffer.

Block
The blocking factor (bfr) of a file is the average number of records stored in a disk block. Suppose the block size of a database system is 2000 bytes, the Customer table has an average record length of 190 bytes, and each block has 100 bytes of overhead.
What is the blocking factor?
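Assuming records are not spanned across blocks, only whole records count, so bfr = floor((2000 - 100) / 190) = floor(1900 / 190) = 10 records per block. The Python sketch below does that arithmetic; the function names and the 10,000-record table size are illustrative assumptions, not part of the exercise.

```python
import math


def blocking_factor(block_size: int, block_overhead: int, record_length: int) -> int:
    """Whole records that fit in the usable part of a block (unspanned records)."""
    usable = block_size - block_overhead
    return usable // record_length


def blocks_needed(num_records: int, bfr: int) -> int:
    """Blocks needed to store num_records when each block holds bfr records."""
    return math.ceil(num_records / bfr)


bfr = blocking_factor(block_size=2000, block_overhead=100, record_length=190)
print(bfr)                         # 10
print(blocks_needed(10_000, bfr))  # 1000 (hypothetical table size)
```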

The Mechanics of Disk


Mechanical characteristics: rotation speed (5400 RPM), disk heads, number of platters (1 to 30), number of tracks (<= 10,000), number of sectors (256 per track), number of bytes per sector (2^9 = 512), block size (2^12 = 4096).
[Figure: disk anatomy showing platters on a spindle, tracks, sectors, a cylinder, and the arm assembly with its arm movement.]

Important Disk Access Characteristics


Block access time = disk latency + transfer time.
Disk latency = seek time + rotational latency.
Seek time = time for the head to reach the right track: 10 ms to 40 ms.
Rotational latency = rotation time to get to the right sector. Time for one rotation = 10 ms, so average rotational latency = 10 ms / 2 = 5 ms.
Transfer time depends on the transfer rate, typically 5 to 10 MB/s.
Disks read/write one block at a time (typically 4 KB).
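Plugging these figures into the formulas shows where the time goes. This Python sketch assumes the best-case 10 ms seek, 10 ms per rotation, a 4 KB (4096-byte) block, a 5 MB/s transfer rate, and 1 MB = 10^6 bytes; the function names are illustrative. It also checks that 5400 RPM gives 60,000 / 5,400, roughly 11.1 ms per rotation, close to the rounded 10 ms used above.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full rotation, in milliseconds."""
    return 60_000 / rpm


def block_access_time_ms(seek_ms: float, rotation_ms: float,
                         block_bytes: int, transfer_mb_per_s: float) -> float:
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2  # half a rotation on average
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


print(round(rotation_time_ms(5400), 1))  # 11.1
t = block_access_time_ms(seek_ms=10, rotation_ms=10, block_bytes=4096, transfer_mb_per_s=5)
print(round(t, 2))                       # 15.82
```

Seek time and rotational latency (15 ms together) dwarf the transfer of the block itself (about 0.8 ms), which is why physical design tries to minimize the number of block accesses.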

Representing Data Elements


Relational database elements:
    CREATE TABLE Product (
        pid         INT PRIMARY KEY,
        name        CHAR(20),
        description VARCHAR(200),
        maker       CHAR(10) REFERENCES Company(name)
    );
A tuple is represented as a record.

Record Formats: Fixed Length


[Figure: fields F1, F2, F3, F4 with lengths L1, L2, L3, L4 stored contiguously from base address B; field F3 starts at address B + L1 + L2.]

Information about field types is the same for all records in a file and is stored in the system catalogs. Finding the ith field does not require a scan of the record: its address follows from the field lengths in the catalog. Note the importance of schema information!
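A Python sketch of the fixed-length layout: the field offsets are computed once from the catalog, so any field can be sliced out of a record without scanning it. The catalog below is a hypothetical fixed-length version of the Product table (pid INT, name CHAR(20), maker CHAR(10); the variable-length description is left out), and `struct` null-pads the CHAR fields rather than space-padding them as SQL would.

```python
import struct
from itertools import accumulate

# Hypothetical catalog entry: (field name, length in bytes), in storage order.
CATALOG = [("pid", 4), ("name", 20), ("maker", 10)]
RECORD_FORMAT = "<i20s10s"  # little-endian, no padding between fields

# Computed once from the catalog: offset(Fi) = L1 + ... + L(i-1).
OFFSETS = dict(zip([name for name, _ in CATALOG],
                   accumulate([length for _, length in CATALOG], initial=0)))
LENGTHS = dict(CATALOG)


def field_address(base: int, field: str) -> int:
    """Address of a field = base address B + lengths of all preceding fields."""
    return base + OFFSETS[field]


def read_field(record: bytes, field: str) -> bytes:
    """Slice a field straight out of the record; no scan of the record is needed."""
    start = OFFSETS[field]
    return record[start:start + LENGTHS[field]]


record = struct.pack(RECORD_FORMAT, 42, b"gizmo", b"acme")
print(len(record))                                 # 34
print(field_address(1000, "maker"))                # 1024, i.e. B + L1 + L2
print(read_field(record, "name").rstrip(b"\x00"))  # b'gizmo'
```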

Record Header
[Figure: a record header holding a pointer to the schema, the record length, and a timestamp, followed by fields F1 to F4 with lengths L1 to L4.]

We need the header because:
The schema may change; for a while, new and old records may coexist.
Records from different relations may coexist.

Variable Length Records


[Figure: a variable-length record: a header with the record length and other header information, followed by fields F1 to F4 with lengths L1 to L4.]

Place the fixed-length fields first (F1, F2), then the variable-length fields (F3, F4). Null values take only 2 bytes; sometimes they take 0 bytes (when they are at the end).
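One common way to implement this layout is to keep the record length and a 2-byte offset for each variable-length field in the header. The Python sketch below assumes two hypothetical fixed fields (an INT and a CHAR(10)) and two variable-length byte fields. It treats NULL as a zero-length field, so a NULL costs only its 2-byte offset; a real format would need something like a null bitmap to tell NULL apart from an empty value.

```python
import struct
from typing import Optional

HEADER = struct.Struct("<HHH")  # record length, offset of F3, offset of F4 (2 bytes each)
FIXED = struct.Struct("<i10s")  # F1: INT, F2: CHAR(10); hypothetical fixed fields


def pack_record(f1: int, f2: bytes, f3: Optional[bytes], f4: Optional[bytes]) -> bytes:
    """Header, then fixed-length fields, then variable-length fields."""
    f3 = f3 or b""  # sketch: a NULL is stored as a zero-length field,
    f4 = f4 or b""  # so it costs only its 2-byte offset in the header
    off_f3 = HEADER.size + FIXED.size
    off_f4 = off_f3 + len(f3)
    length = off_f4 + len(f4)
    return HEADER.pack(length, off_f3, off_f4) + FIXED.pack(f1, f2) + f3 + f4


def unpack_record(rec: bytes):
    length, off_f3, off_f4 = HEADER.unpack_from(rec, 0)
    f1, f2 = FIXED.unpack_from(rec, HEADER.size)  # fixed fields sit at known offsets
    f3 = rec[off_f3:off_f4]                       # variable fields are located
    f4 = rec[off_f4:length]                       # through the header offsets
    return f1, f2.rstrip(b"\x00"), f3, f4


rec = pack_record(7, b"ab", b"hello", None)
print(len(rec), unpack_record(rec))  # 25 (7, b'ab', b'hello', b'')
```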

Records With Referencing Fields


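[Figure: a record with a header (record length and other header information) followed by fields F1 to F3 with lengths L1 to L3, where fields can refer to other records.]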

For example, to represent one-to-many or many-to-many relationships.

Storing Records in Blocks


Blocks have a fixed size (typically 4 KB).
[Figure: a block holding records R1, R2, R3, R4.]

Spanning Records Across Blocks


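[Figure: two blocks, each with a block header; record R2 is split across them, with R1 and the first part of R2 in the first block and the rest of R2 and R3 in the second.]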

Used when records are very large, or even when they are of medium size: spanning saves space in blocks.

BLOB
Binary large objects, e.g., images, sounds, etc. Supported by modern database systems. Storage: attempt to cluster the blocks together.

Modifications: Insertion
If the file is unsorted: add the record to the end.
If the file is sorted:
Is there space in the right block? Yes: we are lucky, store it there.
Is there space in a neighboring block? Look 1-2 blocks to the left/right and shift records.
If everything else fails, create an overflow block.

Overflow Blocks
[Figure: blocks n-1, n, and n+1, with an overflow block attached to block n.]

After a while the file starts being dominated by overflow blocks: time to reorganize
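The insertion steps for a sorted file can be sketched in Python as follows. Assumptions: records are represented only by their keys, every main block starts out non-empty, each block holds at most four records (a hypothetical CAPACITY), and only the immediate left and right neighbors are checked before falling back to an overflow block.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # hypothetical number of records per block


class SortedFile:
    """Sketch of a sorted file made of fixed-capacity blocks plus overflow blocks."""

    def __init__(self, blocks):
        self.blocks = [sorted(b) for b in blocks]  # assumes every block is non-empty
        self.overflow = [[] for _ in self.blocks]  # one overflow chain per block

    def _target(self, key) -> int:
        """Index of the block whose key range covers `key`."""
        firsts = [b[0] for b in self.blocks]
        return max(bisect_right(firsts, key) - 1, 0)

    def insert(self, key) -> str:
        n = self._target(key)
        block = self.blocks[n]
        if len(block) < CAPACITY:  # space in the right block
            insort(block, key)
            return "block"
        if n > 0 and len(self.blocks[n - 1]) < CAPACITY:
            insort(block, key)  # shift the smallest record into the left neighbor
            self.blocks[n - 1].append(block.pop(0))
            return "left neighbor"
        if n + 1 < len(self.blocks) and len(self.blocks[n + 1]) < CAPACITY:
            insort(block, key)  # shift the largest record into the right neighbor
            self.blocks[n + 1].insert(0, block.pop())
            return "right neighbor"
        self.overflow[n].append(key)  # everything else failed
        return "overflow"


f = SortedFile([[10, 20, 30, 40], [50, 60, 70, 80], [90]])
print(f.insert(55))  # right neighbor
print(f.insert(15))  # overflow
print(f.blocks)      # [[10, 20, 30, 40], [50, 55, 60, 70], [80, 90]]
```

Every record in an overflow chain must also be scanned on lookup, which is why a file dominated by overflow blocks is due for reorganization.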

Modifications: Deletions
Free the space in the block and shift records. We may be able to eliminate an overflow block.

Modifications: Updates
If the new record is shorter than the previous one, this is easy. If it is longer, we need to shift records and possibly create overflow blocks.

Physical Addresses
Each block and each record has a physical address that consists of:
The host, the disk, the cylinder number, the track number, and the block within the track. For records, also an offset in the block (sometimes this is stored in the block's header).

Logical Addresses
Logical address: a string of bytes (10-16). More flexible: we can move blocks/records around. But we need a translation table:
Logical address L1 -> physical address P1
Logical address L2 -> physical address P2
Logical address L3 -> physical address P3

Main Memory Address


When the block is read into main memory, it receives a main memory address. The buffer manager has another translation table:
Memory address M1 -> logical address L1
Memory address M2 -> logical address L2
Memory address M3 -> logical address L3
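The two translation tables can be sketched in Python as dictionaries. The class names, the "L1" label, and all address values below are hypothetical; the point is that moving a record only changes its table entry, while its logical address stays the same.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    host: str
    disk: int
    cylinder: int
    track: int
    block: int   # block within the track
    offset: int  # record offset within the block


class AddressTables:
    """Sketch of the two translation tables: logical -> physical, memory -> logical."""

    def __init__(self):
        self.logical_to_physical = {}  # maintained by the storage layer
        self.memory_to_logical = {}    # maintained by the buffer manager

    def place(self, logical: str, physical: PhysicalAddress) -> None:
        """Store or move a record: only the table entry changes, not its logical address."""
        self.logical_to_physical[logical] = physical

    def load(self, logical: str, memory_addr: int) -> None:
        """Record that the block holding `logical` was read into memory at memory_addr."""
        self.memory_to_logical[memory_addr] = logical


tables = AddressTables()
tables.place("L1", PhysicalAddress("host1", 0, 12, 3, 7, 128))
tables.place("L1", PhysicalAddress("host1", 0, 40, 1, 2, 0))  # record moved
tables.load("L1", 0x7F00)
print(tables.logical_to_physical["L1"].cylinder)  # 40
print(tables.memory_to_logical[0x7F00])           # L1
```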

Designing the Physical/Internal Model: overview, terminology, and access methods

Physical Design
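[Figure: internal/physical model. A user request passes through Interface 1 from the external model to the DBMS (internal model); Interface 2 connects the DBMS to the internal model access methods; Interface 3 connects those, via the operating system access methods, to the database.]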

Physical Design
Interface 1: the user request to the DBMS. The user presents a query, and the DBMS determines which physical databases are needed to resolve it.
Interface 2: the DBMS uses an internal model access method to access the data stored in a logical database.
Interface 3: the internal model access methods and the OS access methods access the physical records of the database.

Physical File Design


Physical file: a portion of secondary storage (disk space) allocated for the purpose of storing physical records.
Pointer: a field of data that can be used to locate a related field or record of data.
Access method: an operating system algorithm for storing and locating data in secondary storage.
Page: the amount of data read or written in one disk input or output operation.

Internal Model Access Methods


There are many types of access methods:
Physical sequential, indexed sequential, indexed random, inverted, direct, and hashed.
They differ in:
Access efficiency and storage efficiency.

Physical Sequential
Key values of the physical records are in logical sequence. The main use is for dump and restore. The access method may be used for storage as well as retrieval. Storage efficiency is near 100%. Access efficiency is poor (unless the physical records are fixed size).
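Why fixed-size records help: when every record has the same size and the file is in key order, record i starts at byte i * record size, so a binary search can jump straight to any record instead of reading the file from the start. A Python sketch, assuming a hypothetical 16-byte record made of a 4-byte integer key and a 12-byte name:

```python
import struct

REC = struct.Struct("<i12s")  # hypothetical fixed-size record: INT key, CHAR(12) name


def build_file(rows) -> bytes:
    """Write the records back to back in key order (physical sequential)."""
    return b"".join(REC.pack(key, name) for key, name in sorted(rows))


def num_records(data: bytes) -> int:
    return len(data) // REC.size


def key_at(data: bytes, i: int) -> int:
    """Record i starts at byte i * REC.size, so it can be read directly."""
    return REC.unpack_from(data, i * REC.size)[0]


def find(data: bytes, key: int):
    """Binary search over sorted fixed-size records."""
    lo, hi = 0, num_records(data)
    while lo < hi:
        mid = (lo + hi) // 2
        if key_at(data, mid) < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < num_records(data) and key_at(data, lo) == key:
        _, name = REC.unpack_from(data, lo * REC.size)
        return name.rstrip(b"\x00")
    return None


data = build_file([(30, b"Getta"), (10, b"Adams"), (20, b"Becker")])
print(find(data, 20))  # b'Becker'
print(find(data, 25))  # None
```

With variable-length records the offset of record i is unknown, so the file has to be read sequentially.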

Indexed Sequential
Key values of the physical records are in logical sequence. The access method may be used for storage and retrieval. An index of key values is maintained, with an entry for the highest key value in each block (or group of blocks). Access efficiency depends on the number of index levels, the storage allocated for the index, the number of database records, and the amount of overflow. Storage efficiency depends on the size of the index and the volatility of the database.

Indexed Sequential: Example
Index (highest key per block): Dumpling -> block 1, Harty -> block 2, Texaci -> block 3, ...
Data file: block 1 = Adams, Becker, Dumpling; block 2 = Getta, Harty; block 3 = Mobile, Sunoci, Texaci
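A lookup in this sparse index finds the first entry whose highest key is greater than or equal to the search key, then searches that single block. A Python sketch using the example data; the `lookup` helper is hypothetical and uses `bisect_left` from the standard library.

```python
from bisect import bisect_left

# Data file from the example: records in key order, spread over blocks.
BLOCKS = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}
# Sparse index: one entry per block, holding the highest key stored in it.
INDEX = [(records[-1], number) for number, records in sorted(BLOCKS.items())]
INDEX_KEYS = [key for key, _ in INDEX]


def lookup(key: str):
    """Return the number of the block holding `key`, or None if it is absent."""
    i = bisect_left(INDEX_KEYS, key)  # first block whose highest key is >= key
    if i == len(INDEX):
        return None                   # larger than every key in the file
    block = INDEX[i][1]
    return block if key in BLOCKS[block] else None  # search within one block


print(INDEX)            # [('Dumpling', 1), ('Harty', 2), ('Texaci', 3)]
print(lookup("Getta"))  # 2
```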

Indexed Sequential: Two Levels


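Top-level index (highest key per index block): 385 -> index block 7, 678 -> index block 8, 805 -> index block 9
Index block 7: 150 -> data block 1, 385 -> data block 2
Index block 8: 536 -> data block 3, 678 -> data block 4
Index block 9: 785 -> data block 5, 805 -> data block 6
Data blocks: 1 = 001, 003, ..., 150; 2 = 251, ..., 385; 3 = 455, 480, ..., 536; 4 = 605, 610, ..., 678; 5 = 705, 710, ..., 785; 6 = 791, ..., 805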
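With two levels, each lookup reads one top-level index block, one second-level index block, and then one data block. A Python sketch of the index part, using the example's key values and block numbers; the helper names are hypothetical, and the data blocks themselves are not modeled, so the result says which block to read, not whether the key is present.

```python
from bisect import bisect_left

# Top-level index: highest key covered by each second-level index block.
TOP_LEVEL = [(385, 7), (678, 8), (805, 9)]
# Second-level index blocks: highest key in each data block -> data block.
INDEX_BLOCKS = {
    7: [(150, 1), (385, 2)],
    8: [(536, 3), (678, 4)],
    9: [(785, 5), (805, 6)],
}


def search(entries, key):
    """Address of the first entry whose (highest) key is >= key, else None."""
    i = bisect_left([k for k, _ in entries], key)
    return entries[i][1] if i < len(entries) else None


def data_block_for(key: int):
    index_block = search(TOP_LEVEL, key)            # level 1: which index block
    if index_block is None:
        return None
    return search(INDEX_BLOCKS[index_block], key)   # level 2: which data block


print(data_block_for(610))  # 4
print(data_block_for(791))  # 6
```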

Indexed Random
Key values of the physical records are not necessarily in logical sequence. The index may be stored and accessed with the indexed sequential access method. The index has an entry for every database record; these entries are in ascending order, so the index keys are in logical sequence, while the database records themselves are not necessarily in ascending sequence. The access method may be used for storage and retrieval.

Indexed Random
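Index (one entry per record, in key order): Adams -> block 2, Becker -> block 1, Dumpling -> block 3, Getta -> block 2, Harty -> block 1
Data file: block 1 = Becker, Harty; block 2 = Adams, Getta; block 3 = Dumpling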
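Because the index is dense, a binary search over it answers "which block?" for any key, and a miss in the index means the record does not exist, without reading any data block. A Python sketch with the example's entries (the Harty entry is read off the data layout); `lookup` is a hypothetical helper.

```python
from bisect import bisect_left

# Dense index: one (key, block) entry per record, kept in ascending key order.
INDEX = [("Adams", 2), ("Becker", 1), ("Dumpling", 3), ("Getta", 2), ("Harty", 1)]
# Data blocks: records are placed wherever there is room, not in key order.
BLOCKS = {1: ["Becker", "Harty"], 2: ["Adams", "Getta"], 3: ["Dumpling"]}
KEYS = [key for key, _ in INDEX]


def lookup(key: str):
    """Binary-search the dense index; a miss means the record does not exist."""
    i = bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return INDEX[i][1]
    return None


print(lookup("Getta"))  # 2
print(lookup("Fox"))    # None
```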

B-tree
[Figure: root node F | P | Z with three children B | D | F, H | L | P, and R | S | Z; the bottom level holds the team names Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, Seminoles.]
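A search descends one node per level, following the first separator that is greater than or equal to the search key. This Python sketch assumes each separator is the largest first letter allowed in its subtree, with all names stored at the bottom level (B+-tree style); under that reading the grouping of names into leaves below follows, but it is an interpretation of the figure, not the original drawing.

```python
from bisect import bisect_left

# Internal node: (separator letters, children). Leaves are lists of team names.
TREE = (["F", "P", "Z"], [
    (["B", "D", "F"], [["Aces", "Boilers"], ["Cars", "Devils"], ["Flyers"]]),
    (["H", "L", "P"], [["Hawkeyes", "Hoosiers"], [], ["Minors", "Panthers"]]),
    (["R", "S", "Z"], [[], ["Seminoles"], []]),
])


def search(node, name: str):
    """Return (name or None, number of internal levels descended)."""
    depth = 0
    while isinstance(node, tuple):       # internal node
        keys, children = node
        i = bisect_left(keys, name[0])   # first separator >= first letter
        if i == len(keys):
            return None, depth
        node = children[i]
        depth += 1
    return (name if name in node else None), depth


print(search(TREE, "Minors"))   # ('Minors', 2)
print(search(TREE, "Yankees"))  # (None, 2)
```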

Inverted
Key values of the physical records are not necessarily in logical sequence. The access method is better suited to retrieval. An index may be built for every field to be inverted. Access efficiency depends on the number of database records, the number of index levels, and the storage allocated for the index.

Inverted
Data records (address: student name, course number): 101: Adams, CH 145; 102: Becker, CS 201; 103: Dumpling, CH 145; 104: Getta, CH 145; 105: Harty, CS 623; 106: Mobile, CS 623
Inverted index on course number: CH 145 -> 101, 103, 104; CS 201 -> 102; CS 623 -> 105, 106; PH 345 -> (no records shown)
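The inverted index on course number can be built in one pass over the records. A Python sketch; the pairing of addresses 101 to 106 with students is read off the example, and the `invert` helper is hypothetical (any field position can be passed to invert a different field).

```python
from collections import defaultdict

# Student records keyed by address, as in the example.
RECORDS = {
    101: ("Adams", "CH 145"),
    102: ("Becker", "CS 201"),
    103: ("Dumpling", "CH 145"),
    104: ("Getta", "CH 145"),
    105: ("Harty", "CS 623"),
    106: ("Mobile", "CS 623"),
}


def invert(records, field: int):
    """Build an inverted index: field value -> ascending list of record addresses."""
    index = defaultdict(list)
    for address in sorted(records):
        index[records[address][field]].append(address)
    return dict(index)


by_course = invert(RECORDS, field=1)
print(by_course["CH 145"])                            # [101, 103, 104]
print([RECORDS[a][0] for a in by_course["CS 623"]])   # ['Harty', 'Mobile']
```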

Direct
Key values of the physical records are not necessarily in logical sequence. There is a one-to-one correspondence between a record key and the physical address of the record. The method may be used for storage and retrieval. Access efficiency is always 1 (a single access per record). Storage efficiency depends on the density of keys. No duplicate keys are permitted.
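A direct file can be sketched as an array with one slot per possible key, so the address is computed from the key and every lookup costs exactly one access. Assumptions in this Python sketch: integer keys in a known range (100 to 199 in the usage), address = key - minimum key, and a hypothetical `DirectFile` class. The storage efficiency shows how sparse keys waste slots.

```python
class DirectFile:
    """Direct access: each key maps to exactly one slot, one access per lookup."""

    def __init__(self, min_key: int, max_key: int):
        self.min_key = min_key
        self.slots = [None] * (max_key - min_key + 1)  # one slot per possible key

    def _address(self, key: int) -> int:
        if not self.min_key <= key < self.min_key + len(self.slots):
            raise KeyError(key)
        return key - self.min_key  # one-to-one key -> address

    def store(self, key: int, record) -> None:
        addr = self._address(key)
        if self.slots[addr] is not None:
            raise ValueError(f"duplicate key {key}")  # no duplicates permitted
        self.slots[addr] = record

    def fetch(self, key: int):
        return self.slots[self._address(key)]  # exactly one access

    def storage_efficiency(self) -> float:
        """Fraction of slots in use: depends on how dense the keys are."""
        used = sum(slot is not None for slot in self.slots)
        return used / len(self.slots)


d = DirectFile(100, 199)
d.store(101, "Adams")
d.store(150, "Getta")
print(d.fetch(150), d.storage_efficiency())  # Getta 0.02
```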

Hashing
Key values of the physical records are not necessarily in logical sequence. Many key values may share the same physical address (block). The method may be used for storage and retrieval. Access efficiency depends on the distribution of keys, the algorithm for key transformation, and the space allocated. Storage efficiency depends on the distribution of keys and the algorithm used for key transformation.
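A Python sketch of hashed access, assuming division-remainder hashing (key mod number of blocks) as the key transformation, four blocks of two records each, and overflow records kept per block; all of these are illustrative choices. Keys 101, 105, and 109 share block 1, so the third one lands in overflow.

```python
NUM_BUCKETS = 4      # hypothetical number of blocks (buckets)
BUCKET_CAPACITY = 2  # hypothetical records per block


class HashFile:
    """Sketch of hashed access with division-remainder key transformation."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]
        self.overflow = [[] for _ in range(NUM_BUCKETS)]

    def address(self, key: int) -> int:
        """Key transformation: many keys can map to the same block."""
        return key % NUM_BUCKETS

    def insert(self, key: int, record) -> None:
        b = self.address(key)
        home = self.buckets[b]
        target = home if len(home) < BUCKET_CAPACITY else self.overflow[b]
        target.append((key, record))

    def fetch(self, key: int):
        b = self.address(key)
        for k, record in self.buckets[b] + self.overflow[b]:
            if k == key:
                return record
        return None


h = HashFile()
for key, name in [(101, "Adams"), (105, "Becker"), (109, "Dumpling"), (102, "Getta")]:
    h.insert(key, name)
print(h.address(101), h.address(105), h.address(109))  # 1 1 1
print(h.fetch(109), len(h.overflow[1]))                # Dumpling 1
```

A skewed key distribution piles records into a few blocks and their overflow, which is why both efficiencies depend on the keys and the transformation.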

Comparative Access Methods


| Factor | Sequential | Indexed | Hashed |
|---|---|---|---|
| Storage space | No wasted space | No wasted space for data, but extra space for the index | More space needed for addition and deletion of records after the initial load |
| Sequential retrieval on primary key | Very fast | Moderately fast | Impractical |
| Random retrieval | Impractical | Moderately fast | Very fast |
| Multiple key retrieval | Possible, but needs a full scan | Very fast with multiple indexes | Not possible |
| Deleting records | Can create wasted space | OK if dynamic | Very easy |
| Adding records | Requires rewriting the file | OK if dynamic | Very easy |
| Updating records | Usually requires rewriting the file | Easy, but requires maintenance of indexes | Very easy |
