22-File Organization-06-09-2024
Indexing
A DBMS stores data on hard disks
• This means that data needs to be
– read from the hard disk into memory (RAM)
– written from the memory onto the hard disk
• Because disk I/O operations are slow, query performance depends upon how data is stored on hard disks
• The lowest component of the DBMS performs storage management activities
• Other DBMS components need not know how these low-level activities are performed
Basics of Data Storage on Hard Disk
A disk is organized into a number of blocks or pages
A page is the unit of exchange between the disk and the main memory
A collection of pages is known as a file
DBMS stores data in one or more files on the hard disk
File Organization
The physical arrangement of the data in a file into records and pages on the disk
File organization determines the set of access methods for storing and retrieving records from a file
Organization of Records in Files
• Heap – a record can be placed anywhere in the file where there is space
• Hashing – a hash function is computed on some attribute of each record; the result determines where in the file the record is placed.
The term hash indicates the splitting of a key into pieces.
Records of each relation may be stored in a separate file.
Unordered Or Heap File
Records are stored in the same order in which they are created
Insert Operation
Fast – because the incoming record is written at the end of the last page of the file
Delete Operation
Slow – because the record to be deleted must first be found by searching the file
Deleting the record creates a hole in the page
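The insert and delete behaviour of a heap file can be sketched as follows. This is a minimal in-memory illustration, not how any particular DBMS lays out pages: the class name `HeapFile`, the `page_capacity` parameter and the convention that a record is a tuple whose first element is its key are all assumptions made for the example.

```python
class HeapFile:
    """Toy heap file: a list of pages, each page a list of record slots."""

    def __init__(self, page_capacity=4):
        self.page_capacity = page_capacity  # slots per page (assumed value)
        self.pages = [[]]

    def insert(self, record):
        # Fast: write at the end of the last page; allocate a new page if it is full.
        if len(self.pages[-1]) >= self.page_capacity:
            self.pages.append([])
        self.pages[-1].append(record)

    def delete(self, key):
        # Slow: scan page by page until the record with this key is found.
        for page in self.pages:
            for slot, record in enumerate(page):
                if record is not None and record[0] == key:
                    page[slot] = None  # the freed slot is a hole in the page
                    return True
        return False
```

Holes are never reused in this sketch, which is why heap files are usually reorganized from time to time.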
Ordered or Sequential File
Records are sorted on the values of one or more fields
Ordering field – the field on which the records are sorted
Delete Operation
Fast – because searching for the record is fast
Insert Operation
Poor – because inserting the new record in its correct position requires shifting all subsequent records, on average half the records in the file
Alternatively, an ‘overflow file’ is created which contains all the new records as a heap
Periodically, the overflow file is merged with the main file
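The overflow-file approach can be sketched as below. It is a simplified in-memory model under stated assumptions: records are reduced to their ordering-field value, and the names `OrderedFile`, `main`, `overflow` and `merge` are hypothetical.

```python
import bisect


class OrderedFile:
    """Toy sequential file: a sorted main file plus an unsorted overflow file."""

    def __init__(self, records=()):
        self.main = sorted(records)  # sorted on the ordering field
        self.overflow = []           # new records, kept as a heap

    def insert(self, key):
        # Cheap: no shifting of records in the main file.
        self.overflow.append(key)

    def search(self, key):
        # Binary search in the main file, then a linear scan of the overflow file.
        i = bisect.bisect_left(self.main, key)
        if i < len(self.main) and self.main[i] == key:
            return True
        return key in self.overflow

    def merge(self):
        # Periodic reorganization: fold the overflow file into the main file.
        self.main = sorted(self.main + self.overflow)
        self.overflow = []
```

The trade-off is that searches get slower as the overflow file grows, so the merge must run often enough to keep it small.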
Sequential access vs random access
• Sequential access means that a group of elements is accessed in a predetermined, ordered sequence
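A small sketch of the two access patterns on a file of fixed-length records. Random access is taken here to mean jumping directly to a record by its position, which is an assumption since only sequential access is defined above. The 12-byte record layout and the function names are hypothetical, and an in-memory `io.BytesIO` buffer stands in for a disk file.

```python
import io
import struct

# Assumed record layout: 4-byte integer id followed by an 8-byte name field.
RECORD = struct.Struct("<i8s")


def build_file(rows):
    buf = io.BytesIO()
    for rec_id, name in rows:
        buf.write(RECORD.pack(rec_id, name.encode()))
    return buf


def read_sequential(buf):
    # Visit every record in stored order, from the first to the last.
    buf.seek(0)
    out = []
    while True:
        chunk = buf.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        rec_id, name = RECORD.unpack(chunk)
        out.append((rec_id, name.rstrip(b"\0").decode()))
    return out


def read_random(buf, n):
    # Jump straight to record n without reading the records before it.
    buf.seek(n * RECORD.size)
    rec_id, name = RECORD.unpack(buf.read(RECORD.size))
    return rec_id, name.rstrip(b"\0").decode()
```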
A bucket is a unit of storage containing one or more records
(a bucket is typically a disk block).
Hash File
Insert Operation
Fast – because the hash function computes the index of the bucket to which the record belongs
If that bucket is full, the record goes into the next free one
Search Operation
Fast – because the hash function computes the index of the bucket
Delete Operation
Fast – once again because the hash function is able to locate the record quickly
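A minimal sketch of insert and search on a hash file whose full buckets spill into the next free bucket. The hash function `key % n_buckets`, integer keys, and the names `HashFile`, `n_buckets` and `bucket_size` are assumptions for illustration. Delete is left out on purpose: with this spill rule, removing a record would break the early exit in `search` unless deleted slots were marked.

```python
class HashFile:
    """Toy hash file: fixed number of buckets, overflow goes to the next free bucket."""

    def __init__(self, n_buckets=4, bucket_size=2):
        self.n = n_buckets
        self.bucket_size = bucket_size
        self.buckets = [[] for _ in range(n_buckets)]

    def _home(self, key):
        return key % self.n  # assumed hash function for integer keys

    def insert(self, key, value):
        start = self._home(key)
        for step in range(self.n):
            idx = (start + step) % self.n
            if len(self.buckets[idx]) < self.bucket_size:
                self.buckets[idx].append((key, value))
                return idx  # index of the bucket that received the record
        raise OverflowError("all buckets are full")

    def search(self, key):
        start = self._home(key)
        for step in range(self.n):
            bucket = self.buckets[(start + step) % self.n]
            for k, v in bucket:
                if k == key:
                    return v
            if len(bucket) < self.bucket_size:
                return None  # a bucket with free space ends the probe sequence
        return None
```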
Internal Hashing:
• Open Addressing:
- Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
• Chaining:
- Various overflow locations are kept, usually by extending the array with a number of overflow positions
- A pointer field is added to each record location.
• Multiple hashing:
External Hashing:
- Hashing for disk files is called External Hashing
- The goal of a good hash function is to distribute the records uniformly over the address space so as to minimize collisions.
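Returning to internal hashing, the chaining scheme (an array extended with overflow positions, plus a pointer field per location) can be sketched as below. The class name `ChainedTable`, the sizes, the `-1` "no successor" marker and the hash function `key % m` are assumptions chosen for the example.

```python
class ChainedTable:
    """Toy internal hash table with chaining into overflow positions."""

    def __init__(self, m=5, overflow=3):
        self.m = m                           # number of main hash positions
        self.keys = [None] * (m + overflow)  # array extended with overflow positions
        self.next = [-1] * (m + overflow)    # pointer field; -1 means no successor
        self.free = m                        # next unused overflow position

    def insert(self, key):
        pos = key % self.m
        if self.keys[pos] is None:
            self.keys[pos] = key
            return pos
        if self.free >= len(self.keys):
            raise OverflowError("no overflow positions left")
        # Collision: follow the chain to its end, then link a new overflow position.
        while self.next[pos] != -1:
            pos = self.next[pos]
        new = self.free
        self.free += 1
        self.keys[new] = key
        self.next[pos] = new
        return new

    def contains(self, key):
        pos = key % self.m
        while pos != -1 and self.keys[pos] is not None:
            if self.keys[pos] == key:
                return True
            pos = self.next[pos]
        return False
```

Unlike open addressing, a collision here never occupies another key's home position; colliding keys only consume overflow positions.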
Static Hashing
Dynamic Hashing
Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand (also known as extendable hashing)
Overflow Chaining: When buckets are full, a new bucket is allocated for the same hash result and is linked after the previous one.
This mechanism is called Closed Hashing.
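Overflow chaining as described above can be sketched like this. It is an in-memory illustration only: the names `Bucket` and `ChainedHashFile`, the bucket capacity, and the hash function `key % n_buckets` on integer keys are assumptions.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # link to the next bucket in the chain, if any


class ChainedHashFile:
    """Toy static hash file with overflow chaining."""

    def __init__(self, n_buckets=3, capacity=2):
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(n_buckets)]

    def _home(self, key):
        return self.buckets[key % len(self.buckets)]  # assumed hash function

    def insert(self, key, value):
        bucket = self._home(key)
        # Walk the chain; allocate and link a new overflow bucket when all are full.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)
            bucket = bucket.overflow
        bucket.records.append((key, value))

    def search(self, key):
        bucket = self._home(key)
        while bucket is not None:
            for k, v in bucket.records:
                if k == key:
                    return v
            bucket = bucket.overflow
        return None
```

Long chains turn a lookup back into a linear scan, which is the main weakness of static hashing when the file grows.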
Hash file organization of account file, using branch_name as key
For a string search-key, the binary representations of all the characters in the string could be added, and the sum modulo the number of buckets could be returned
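The string hash function described above can be written in a few lines. This sketch assumes the "binary representation" of a character is its UTF-8 byte value(s); the function name `string_hash` and the bucket count used in the usage line are hypothetical.

```python
def string_hash(key: str, n_buckets: int) -> int:
    """Add up the byte values of the characters, then take the sum modulo n_buckets."""
    return sum(key.encode("utf-8")) % n_buckets


# Usage: pick the bucket for a branch_name value, assuming 10 buckets.
bucket_no = string_hash("Perryridge", 10)
```

Because addition ignores character order, anagrams always land in the same bucket, so this function is simple rather than ideal.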
Use of Extendable Hash Structure: Example
Initial Hash structure, bucket size = 2
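Since the figures for this example are not reproduced here, the following is a generic sketch of an extendable hash structure with bucket size 2, not a reconstruction of the slide's example. It assumes integer keys hashed by identity and a directory indexed by the low-order bits of the key (textbook presentations often use a prefix of the hash value instead). The names `ExtBucket` and `ExtendableHash` are hypothetical.

```python
class ExtBucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of key bits this bucket depends on
        self.keys = []


class ExtendableHash:
    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.global_depth = 0
        self.directory = [ExtBucket(0)]  # 2**global_depth entries

    def _index(self, key):
        # Low-order global_depth bits of the key select the directory entry.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return  # ignore duplicates (assumption: keys are unique)
        if len(bucket.keys) < self.bucket_size:
            bucket.keys.append(key)
            return
        # Bucket is full: double the directory if the bucket already uses every bit.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the bucket on its next bit and repoint half of its directory entries.
        bucket.local_depth += 1
        sibling = ExtBucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)
        for i, entry in enumerate(self.directory):
            if entry is bucket and (i & bit):
                self.directory[i] = sibling
        # Redistribute the old keys, then retry the insert (it may split again).
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)
        self.insert(key)
```

Only the overflowing bucket is split; other buckets keep their contents and may be shared by several directory entries.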
Indexing
• Index file stores each value of the index field along with pointer(s) (e.g. page no.) to the block(s) that contain record(s) with that field value, or a pointer to the record with that field value: <Indexing Field, Pointer>
• Index file is much smaller than the data file => searching will be fast.
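A minimal sketch of an index file as a sorted list of <Indexing Field, Pointer> entries, where the pointer is a page number. The function names, the in-memory list-of-pages model and the sample branch names are assumptions made up for the illustration.

```python
import bisect


def build_index(data_pages, field=0):
    """Return sorted (field value, page number) entries, one per record."""
    entries = []
    for page_no, page in enumerate(data_pages):
        for record in page:
            entries.append((record[field], page_no))
    entries.sort()
    return entries


def lookup(index, value):
    """Binary-search the index and return the page numbers holding `value`."""
    # (value,) sorts just before every (value, page_no) entry.
    i = bisect.bisect_left(index, (value,))
    pages = []
    while i < len(index) and index[i][0] == value:
        pages.append(index[i][1])
        i += 1
    return pages


# Usage with made-up data: two pages of (branch_name, balance) records.
data_pages = [
    [("Perryridge", 400), ("Brighton", 750)],
    [("Downtown", 500), ("Perryridge", 900)],
]
index = build_index(data_pages)
pages_to_read = lookup(index, "Perryridge")
```

Only the pages returned by `lookup` need to be fetched from disk, instead of scanning the whole data file.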
Choosing Indexing Technique
• Five factors are involved when choosing the indexing technique:
• access type
• access time
• insertion time
• deletion time
• space overhead
Two Types of Indices
Types of Indexes
• Indexes on ordered vs. unordered files