1 - Disk Storage - Ch13

This document discusses disk storage and basic file structures in databases. It covers topics like disk storage devices, unordered files, ordered files, and hashed files. Specifically, it describes how data is stored on magnetic disks in tracks and sectors, and how files can be organized in heap files without order, ordered sequential files sorted by a key, or hashed files that map keys to buckets using a hash function.

Chapter 13: Disk Storage, Basic File Structures, and Hashing


Chapter Outline
• Disk Storage Devices
• Files of Records
• Operations on Files
• Unordered Files
• Ordered Files
• Hashed Files
  – Dynamic and Extendible Hashing Techniques
Disk Storage Devices
• Data is stored as magnetized areas on magnetic disk surfaces.
• A disk pack contains several magnetic disks connected to a rotating spindle.
• Disks are divided into concentric circular tracks on each disk surface.
  – Track capacities typically vary from 4 to 50 Kbytes or more.
• A track is divided into smaller blocks or sectors, because it usually contains a large amount of information.


A typical hard drive

[Figure: platters on a rotating spindle; the concentric tracks on each surface line up to form cylinders; a disk head on a moving arm reads each platter surface. Arm movement and spindle rotation ("moving parts") are slow.]

Top view

[Figure: a platter seen from above, with tracks divided into sectors; "zoning" places more sectors, and thus more data, on the outer tracks. A block is the logical unit of transfer.]

Disk access time


Sum of:
• Seek time: time for disk heads to move to the
correct cylinder
• Rotational delay: time for the desired block to
rotate under the disk head
• Transfer time: time to read/write data in the block
(= time for disk to rotate over the block)
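
To make the sum concrete, here is a small Python calculation of block access time. The drive parameters (9 ms average seek, 7200 RPM, 8 KB block, 100 MB/s transfer rate) are illustrative assumptions, not figures from the slides.

```python
# A minimal sketch of the access-time formula above, using assumed drive parameters.

def disk_access_time_ms(seek_ms, rpm, block_kb, transfer_mb_per_s):
    """Access time = seek time + rotational delay + transfer time."""
    rotational_delay_ms = (60_000 / rpm) / 2                  # average delay: half a rotation
    transfer_ms = block_kb / 1024 / transfer_mb_per_s * 1000  # time to move the block's bytes
    return seek_ms + rotational_delay_ms + transfer_ms

# Example: 9 ms average seek, 7200 RPM, 8 KB block, 100 MB/s transfer rate
print(disk_access_time_ms(seek_ms=9, rpm=7200, block_kb=8, transfer_mb_per_s=100))
# roughly 9 + 4.17 + 0.08, i.e. about 13.25 ms, dominated by the mechanical "moving parts"
```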
Data on External Storage
• Data must persist on disk across program executions in a
DBMS
– Data is huge
– Must persist across executions
– But has to be fetched into main memory when DBMS processes the
data

• The unit of information for reading data from disk, or writing data to disk, is a page.
• Disks can retrieve a random page at a fixed cost.
  – But reading several consecutive pages is much cheaper than reading them in random order.
Disk Space Management
• The lowest layer of DBMS software manages space on disk.
• Higher levels call upon this layer to:
  – allocate/de-allocate a page
  – read/write a page
• Size of a page = size of a disk block = data unit
• A request for a sequence of pages is often satisfied by allocating contiguous blocks on disk.
• Space on disk is managed by the disk-space manager.
  – Higher levels don't need to know how this is done, or how free space is managed.
Buffer Management
Suppose
• 1 million pages in db, but only space for 1000 in memory
• A query needs to scan the entire file
• DBMS has to
– bring pages into main memory
– decide which existing pages to replace to make room for a new page (this is called the replacement policy)
• Managed by the buffer manager
  – Files and access methods ask the buffer manager to access a page, identifying it via its "record id" (discussed shortly)
  – The buffer manager loads the page if it is not already in memory (see the sketch below)
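
A minimal sketch of what a buffer manager does, assuming a fixed number of in-memory frames and a least-recently-used (LRU) replacement policy. The `disk` object and its `read_page` method are hypothetical stand-ins for the disk-space manager, and a real buffer manager also handles pinning and dirty pages.

```python
# A minimal sketch (not a DBMS's actual code) of a buffer manager that keeps a
# fixed number of frames in memory and evicts the least-recently-used page.
from collections import OrderedDict

class BufferManager:
    def __init__(self, disk, capacity=1000):
        self.disk = disk                  # assumed object with read_page(page_id)
        self.capacity = capacity          # e.g. 1000 frames for a 1-million-page file
        self.frames = OrderedDict()       # page_id -> page contents, kept in LRU order

    def get_page(self, page_id):
        if page_id in self.frames:        # already buffered: just refresh its LRU position
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            victim, _ = self.frames.popitem(last=False)   # evict the least-recently-used page
            # a real buffer manager would write the victim back to disk if it is dirty
        page = self.disk.read_page(page_id)               # load the requested page from disk
        self.frames[page_id] = page
        return page
```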
Files of Records
• Page or block is OK when doing I/O, but higher
levels of DBMS operate on records, and files of
records
• FILE: A collection of pages, each containing a
collection of records
• Must support:
– insert/delete/modify record
– read a particular record (specified using record id)
– scan all records (possibly with some conditions on the
records to be retrieved)
File Organization

• File organization: method of arranging a file of records on external storage
  – One file can have multiple pages
  – A record id (rid) is sufficient to physically locate the page containing the record on disk
  – Indexes are data structures that allow us to find the record ids of records with given values in index search key fields
• NOTE: there are several uses of "keys" in a database
  – Primary/foreign/candidate/super keys
  – Index search keys
Alternative File Organizations
Many alternatives exist, each ideal for some situations, and
not so good in others:
• Heap (random order) files: Suitable when typical access is a
file scan retrieving all records
• Sorted Files: Best if records must be retrieved in some
order, or only a “range” of records is needed.
• Indexes: Data structures to organize records via trees or
hashing
– Like sorted files, they speed up searches for a subset of records,
based on values in certain (“search key”) fields
– Updates are much faster than in sorted files
Unordered (Heap) Files

• The simplest file structure contains records in no particular order
• As the file grows and shrinks, disk pages are allocated and de-allocated
• To support record-level operations, we must:
  – keep track of the pages in a file
  – keep track of free space on pages
  – keep track of the records on a page
• There are many alternatives for keeping track of this
Heap File Implemented as a List
[Figure: a header page links two lists of data pages, one list of full pages and one list of pages with free space.]

• The header page id and the heap file name must be stored someplace
• Each page contains 2 `pointers' plus data
• Problem?
  – To insert a new record, we may need to scan several pages on the free list to find one with sufficient space (see the sketch below)
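
A minimal sketch of the insertion problem above, assuming simplified in-memory `Page` objects with a fixed record capacity; a real heap file would also move a page that becomes full from the free-space list to the full list, and would allocate disk pages rather than Python objects.

```python
# A minimal sketch of insertion into a heap file kept as a linked list of
# pages with free space, as in the figure above.

class Page:
    def __init__(self, capacity=4):
        self.records = []                # records stored on this page
        self.capacity = capacity         # max records per page (assumed fixed)
        self.next = None                 # next page in the same linked list

    def has_space(self):
        return len(self.records) < self.capacity

class HeapFile:
    def __init__(self):
        self.free_head = None            # head of the list of pages with free space

    def insert(self, record):
        # The problem noted above: we may scan several pages on the free list
        # before finding one with enough space for the new record.
        page = self.free_head
        while page is not None and not page.has_space():
            page = page.next
        if page is None:                 # no page with room: allocate a new page
            page = Page()
            page.next = self.free_head
            self.free_head = page
        page.records.append(record)
        # A fuller implementation would move the page to the list of full
        # pages once it has no free space left.
```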
How do we arrange a collection of
records on a page?
• Each page contains several slots
  – one for each record
• A record is identified by <page-id, slot-number> (see the sketch below)
• Fixed-Length Records
• Variable-Length Records
• For both, there are options for
  – Record formats (how to organize the fields within a record)
  – Page formats (how to organize the records within a page)
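
A minimal sketch of a page of fixed-length records addressed by <page-id, slot-number>. The bare slot array here is an assumed simplification of a real page format, which would pack records into a block and track free slots with a bitmap or free list.

```python
# A minimal sketch of a fixed-length-record page and record ids (rids).

class FixedLengthPage:
    def __init__(self, page_id, num_slots=8):
        self.page_id = page_id
        self.slots = [None] * num_slots          # one slot per record

    def insert(self, record):
        for slot_no, slot in enumerate(self.slots):
            if slot is None:                     # first free slot
                self.slots[slot_no] = record
                return (self.page_id, slot_no)   # the record id: <page-id, slot-number>
        raise RuntimeError("page full")

    def read(self, rid):
        page_id, slot_no = rid
        assert page_id == self.page_id
        return self.slots[slot_no]
```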
Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct
order.
• It is common to keep a separate unordered overflow (or transaction) file for
new records to improve insertion efficiency; this is periodically merged with
the main ordered file.
• A binary search can be used to search for a record on its ordering field value (see the sketch below).
• This requires reading and searching about log2(b) of the b file blocks on average, an improvement over a linear search, which reads about b/2 blocks on average.
• Reading the records in order of the ordering field is quite
efficient.
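
A minimal sketch of binary search over an ordered file on its ordering field. `read_block` is an assumed helper that returns the sorted records of block i, each record assumed to expose an `ordering_field` attribute; each call stands for one block access.

```python
# A minimal sketch of binary search on an ordered file: about log2(b) block
# reads instead of scanning all b blocks.

def search_ordered_file(read_block, num_blocks, key):
    lo, hi = 0, num_blocks - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        block = read_block(mid)                  # one disk block access
        if key < block[0].ordering_field:        # key is before this block
            hi = mid - 1
        elif key > block[-1].ordering_field:     # key is after this block
            lo = mid + 1
        else:                                    # key falls inside this block's range
            return next((r for r in block if r.ordering_field == key), None)
    return None
```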
Ordered Files (contd.)

[Figure not reproduced.]

Disk Storage: Average Access Times

The following table shows the average access time to access a specific record for a given type of file.

[Table not reproduced.]
Hashed Files
• Hashing for disk files is called External Hashing
• The file blocks are divided into M equal-sized buckets,
numbered bucket0, bucket1, ..., bucketM-1.
• Typically, a bucket corresponds to one (or a fixed number of) disk
block.
• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where i = h(K) and h is the hashing function, e.g., h(K) = K mod M.
• Search is very efficient on the hash key.
• Collisions occur when a new record hashes to a bucket that is
already full.
• An overflow file is kept for storing such records.
• Overflow records that hash to each bucket can be linked together.
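
A minimal in-memory sketch of external hashing with h(K) = K mod M, a fixed bucket capacity standing in for one disk block, and a per-bucket overflow chain; the values of M and the bucket capacity are assumptions for illustration.

```python
# A minimal sketch of external hashing with overflow chaining per bucket.

M = 8                 # number of buckets
BUCKET_CAPACITY = 2   # records per primary bucket "block" (assumed)

buckets = [[] for _ in range(M)]      # primary bucket blocks
overflow = [[] for _ in range(M)]     # overflow chain for each bucket

def h(key):
    return key % M                    # the hash function from the slide

def insert(key, record):
    i = h(key)
    if len(buckets[i]) < BUCKET_CAPACITY:
        buckets[i].append((key, record))
    else:                             # collision on a full bucket: use the overflow chain
        overflow[i].append((key, record))

def search(key):
    i = h(key)                        # search on the hash key is very efficient:
    for k, record in buckets[i] + overflow[i]:   # only one bucket (plus its overflow) is read
        if k == key:
            return record
    return None
```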
Hashed Files (contd.)
• There are numerous methods for collision resolution, including the
following:
• Open addressing: Proceeding from the occupied position specified by the
hash address, the program checks the subsequent positions in order until
an unused (empty) position is found.
• Chaining: For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions. In addition, a
pointer field is added to each record location. A collision is resolved by
placing the new record in an unused overflow location and setting the
pointer of the occupied hash address location to the address of that
overflow location.
• Multiple hashing: The program applies a second hash function if the first
results in a collision. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing
if necessary.
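
A minimal in-memory sketch of the first method, open addressing with linear probing: starting from the occupied position given by the hash address, check the subsequent positions in order until an unused one is found. The table size is an assumed constant.

```python
# A minimal sketch of open addressing (linear probing).

SIZE = 11
table = [None] * SIZE

def insert_open_addressing(key, record):
    pos = key % SIZE                       # the hash address
    for step in range(SIZE):
        probe = (pos + step) % SIZE        # check subsequent positions in order
        if table[probe] is None:           # first unused (empty) position
            table[probe] = (key, record)
            return probe
    raise RuntimeError("table full")
```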
Static Hashing
• Pages containing data = a collection of buckets
– each bucket has one primary page, also possibly
overflow pages
– buckets contain data entries k*

[Figure: key → h → h(key) mod N selects one of N primary bucket pages (0 .. N-1); each primary page may have a chain of overflow pages.]
Static Hashing
• # primary pages fixed
– allocated sequentially, never de-allocated, overflow pages if
needed.
• h(k) mod N = bucket to which data entry with key k
belongs
– N = # of buckets

[Figure: same static hashing diagram as above.]
Static Hashing
• The hash function works on the search key field of record r
  – Must distribute values over the range 0 ... N-1
  – h(key) = (a * key + b) usually works well
• bucket = h(key) mod N (see the sketch below)
  – a and b are constants chosen to tune h
• Advantage:
– #buckets known – pages can be allocated sequentially
– search needs 1 I/O (if no overflow page)
– insert/delete needs 2 I/O (if no overflow page) (why 2?)
• Disadvantage:
– Long overflow chains can develop if file grows and degrade performance (data
skew)
– Or waste of space if file shrinks
• Solutions:
  – keep pages only about 80% full initially
  – periodically rehash if overflow pages develop (can be expensive)
  – or use dynamic hashing
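
A minimal sketch of the static hash function described above, with assumed values for the constants a, b and for the number of buckets N.

```python
# A minimal sketch of static hashing: h(key) = a*key + b, bucket = h(key) mod N.

N = 100                       # number of primary bucket pages, fixed up front
a, b = 31, 7                  # tuning constants (values assumed for illustration)

def bucket_for(key):
    return (a * key + b) % N

# With no overflow pages, an equality search needs 1 I/O (read the primary
# bucket page); an insert needs 2 I/Os (read the page, then write it back).
```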
Extendible Hashing
• Consider static hashing
• Bucket (primary page) becomes full

• Why not re-organize the file by doubling the # of buckets?
  – Reading and writing all pages (twice the # of pages) is expensive
• Idea: use a directory of pointers to buckets
  – Double the # of buckets by doubling the directory, splitting just the bucket that overflowed
  – The directory is much smaller than the file, so doubling it is much cheaper
  – Only one page of data entries is split
  – No overflow page (a new bucket, not a new overflow page)
  – The trick lies in how the hash function is adjusted
Example: h(k) is 4 bits; 2 keys/block

[Figure: extendible hashing example, starting from a directory with global depth i = 1 holding keys such as 0001, 1001, and 1010. Inserting 1010 and 1100 overflows a bucket, so the bucket splits and the directory doubles to i = 2.]

Example continued

[Figure: further inserts (0111, 0000). A full bucket whose local depth is below the global depth splits without doubling the directory.]

Example continued

[Figure: a further insert overflows a bucket whose local depth equals the global depth, so the directory doubles to i = 3 and that bucket splits.]
When does bucket split cause
directory doubling?
• Before the insert, the local depth of the bucket = the global depth
• The insert causes the local depth to become > the global depth
• The directory is doubled by copying it over and `fixing' the pointer to the split-image page
Comments on Extendible Hashing
• If directory fits in memory, equality search answered with one
disk access (to access the bucket); else:
– Directory grows in spurts, and, if the distribution of hash values is skewed,
directory can grow large
– Multiple entries with same hash value cause problems
• Delete:
– If removal of data entry makes bucket empty, can be merged with `split
image’
– If each directory element points to same bucket as its split image, can
halve directory.
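
A minimal in-memory sketch of extendible hashing. It uses the low-order bits of Python's built-in hash and a bucket capacity of 2 entries, both assumptions for illustration (the slides' example works with the high-order bits of a 4-bit hash). Only the overflowing bucket splits, and the directory doubles only when that bucket's local depth equals the global depth.

```python
# A minimal sketch of extendible hashing with directory doubling.

class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth
        self.capacity = capacity
        self.entries = {}                              # key -> data entry

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # directory slot = low-order global_depth bits of the hash value
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, value):
        bucket = self.directory[self._index(key)]
        if len(bucket.entries) < bucket.capacity or key in bucket.entries:
            bucket.entries[key] = value
            return
        if bucket.local_depth == self.global_depth:    # must double the directory
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # split just the overflowing bucket into itself and a "split image"
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth, bucket.capacity)
        old_entries, bucket.entries = bucket.entries, {}
        for i, b in enumerate(self.directory):         # re-point slots for the image
            if b is bucket and (i >> (bucket.local_depth - 1)) & 1:
                self.directory[i] = image
        for k, v in old_entries.items():               # redistribute the old entries
            self.insert(k, v)
        self.insert(key, value)                        # retry the new entry

    def search(self, key):
        return self.directory[self._index(key)].entries.get(key)

# Usage: insert a few small integer keys (chosen only for illustration)
eh = ExtendibleHash()
for k in [1, 9, 10, 12, 7, 0]:
    eh.insert(k, f"record {k}")
print(eh.global_depth, eh.search(12))
```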
Extendible Hashing
Dynamic Hashing
