22-File Organization-06-09-2024

Uploaded by

Hemesh R

File Organization & Indexing

DBMS stores data on hard disks
• This means that data needs to be
– read from the hard disk into memory (RAM)
– written from memory onto the hard disk
• Because disk I/O operations are slow, query performance
depends on how data is stored on the hard disk
• The lowest-level component of the DBMS performs storage
management activities
• Other DBMS components need not know how these low-level
activities are performed

Basics of Data Storage on Hard Disk
• A disk is organized into a number of blocks or pages
• A page is the unit of exchange between the disk and main memory
• A collection of pages is known as a file
• DBMS stores data in one or more files on the hard disk
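As a rough illustration of page-at-a-time exchange, the sketch below reads and writes whole pages of a file by page number. The 4096-byte page size, the helper names `read_page` and `write_page`, and the in-memory `io.BytesIO` stand-in for a disk file are all assumptions made for this example, not part of any particular DBMS.

```python
import io

PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def read_page(f, page_no, page_size=PAGE_SIZE):
    """Return the bytes of page `page_no` (0-based) from an open binary file."""
    f.seek(page_no * page_size)
    return f.read(page_size)


def write_page(f, page_no, data, page_size=PAGE_SIZE):
    """Overwrite page `page_no` with `data`, zero-padded to a full page."""
    if len(data) > page_size:
        raise ValueError("data does not fit in one page")
    f.seek(page_no * page_size)
    f.write(data.ljust(page_size, b"\x00"))


# An in-memory stand-in for a file on disk, holding three empty pages.
disk_file = io.BytesIO(bytes(3 * PAGE_SIZE))
write_page(disk_file, 1, b"record-A|record-B")
page = read_page(disk_file, 1)  # the whole page comes back, not one record
```

Even to fetch a single record, the whole page that contains it is transferred, which is why the layout of records within pages matters for performance.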

File Organization
• The physical arrangement of the data in a file into records and pages
on the disk
• File organization determines the set of access methods for storing
and retrieving records from a file

• We study three types of file organization
– Unordered or heap files
– Ordered or sequential files
– Hash files

• We examine each of them in terms of the operations we perform on the
database
– Insert a new record
– Search for a record (or update a record)
– Delete a record

Organization of Records in Files
• Heap – a record can be placed anywhere in the file where
there is space

• Sequential – records are stored in sequential order, based on the
value of the search key of each record

• Hashing – a hash function is computed on some attribute of each record,
and the result determines where in the file the record is placed
– The term hash indicates splitting the key into pieces

• Records of each relation may be stored in a separate file

Unordered or Heap File
• Records are stored in the same order in which they are
created

• Insert operation
– Fast – because the incoming record is written at the end of the last
page of the file

• Search (or update) operation
– Slow – because a linear search is performed over the pages

• Delete operation
– Slow – because the record to be deleted must first be searched for
– Deleting the record creates a hole in the page
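The three heap-file operations can be sketched with a toy in-memory model. The class name `HeapFile`, the `records_per_page` parameter, and the dictionary records with an `"id"` field are hypothetical choices for illustration only.

```python
class HeapFile:
    """Toy heap file: a list of pages, each page a list of record slots."""

    def __init__(self, records_per_page=2):
        self.records_per_page = records_per_page
        self.pages = [[]]

    def insert(self, record):
        # Append to the last page; start a new page when it is full.
        if len(self.pages[-1]) >= self.records_per_page:
            self.pages.append([])
        self.pages[-1].append(record)

    def search(self, key):
        # Linear scan over every page; returns (page_no, slot_no) or None.
        for page_no, page in enumerate(self.pages):
            for slot_no, record in enumerate(page):
                if record is not None and record["id"] == key:
                    return page_no, slot_no
        return None

    def delete(self, key):
        # Find the record first, then leave a hole (None) in its slot.
        location = self.search(key)
        if location is None:
            return False
        page_no, slot_no = location
        self.pages[page_no][slot_no] = None
        return True


heap = HeapFile()
for record_id in (30, 10, 20):
    heap.insert({"id": record_id})
```

In this sketch the hole left by `delete` is never reused, which mirrors the slide's point that deletions fragment the pages until the file is reorganized.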

Ordered or Sequential File
• Records are sorted on the values of one or more fields
– Ordering field – the field on which the records are sorted

• Search (or update) operation
– Fast – because binary search can be performed on the sorted records

• Delete operation
– Fast – because finding the record is fast

• Insert operation
– Poor – because inserting the new record in its correct position
requires shifting, on average, half of the records in the file
– Alternatively, an "overflow file" is created, which holds all the
new records as a heap
– Periodically, the overflow file is merged with the main file
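A minimal sketch of the overflow-file approach, assuming integer keys and an in-memory list in place of disk pages. The class name `OrderedFile` and its methods are hypothetical.

```python
import bisect


class OrderedFile:
    """Toy ordered file with an unordered overflow area."""

    def __init__(self, keys=()):
        self.main = sorted(keys)  # main file, sorted on the ordering field
        self.overflow = []        # overflow file, kept as a heap (unordered)

    def insert(self, key):
        # Cheap insert: append to the overflow file instead of shifting records.
        self.overflow.append(key)

    def search(self, key):
        # Binary search on the main file, then a linear scan of the overflow file.
        i = bisect.bisect_left(self.main, key)
        if i < len(self.main) and self.main[i] == key:
            return True
        return key in self.overflow

    def merge(self):
        # Periodic reorganization: fold the overflow records into the main file.
        self.main = sorted(self.main + self.overflow)
        self.overflow = []
```

The trade-off is that searches become slower as the overflow file grows, since that part is scanned linearly until the next merge.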

Sequential access vs random access
• Sequential access means that a group of elements is accessed in a
predetermined, ordered sequence

• Random-access files are split into pieces, which are stored
wherever space is available

• A sequential file may load faster, whereas random-access files
may take more time

Hash File
• Is an array of buckets
– Given a record k, a hash function h(k) computes the index of
the bucket to which record k belongs
– h uses one or more fields in the record, called hash fields
– Hash key – the key of the file when it is used by the hash
function
– A common choice is h(K) = K mod M, where M is the number of buckets

• Example hash function
– Assume that the staff last name is used as the hash field
– Assume also that the hash file has 26 buckets, one bucket
for each letter of the alphabet
– Then a hash function can be defined that computes the bucket
address (index) from the first letter of the last name
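Both hash functions mentioned above can be written in a few lines. The function names `h_mod` and `h_first_letter` are hypothetical, and the sketch assumes last names that begin with an ASCII letter.

```python
M = 26  # number of buckets, as in the example above


def h_mod(k, m=M):
    """h(K) = K mod M for an integer key."""
    return k % m


def h_first_letter(last_name):
    """Map a last name to bucket 0..25 by its first letter (A -> 0, ..., Z -> 25)."""
    first = last_name.strip().upper()[0]
    if not "A" <= first <= "Z":
        raise ValueError("this sketch only handles names starting with A-Z")
    return ord(first) - ord("A")
```

For instance, `h_first_letter("Smith")` gives bucket 18 and `h_mod(53)` gives bucket 1. The first-letter function is easy to follow but distributes real names unevenly, since some initial letters are far more common than others.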

A bucket is a unit of storage containing one or more records
(a bucket is typically a disk block).

The hash function is used to locate records for access, insertion,
and deletion.

Hashing is an effective technique for computing the direct location
of a data record on the disk without using an index structure.

Hash File
• Insert operation
– Fast – because the hash function computes the index of
the bucket to which the record belongs
– If that bucket is full, the record goes to the next free one

• Search operation
– Fast – because the hash function computes the index of
the bucket

• Delete operation
– Fast – again because the hash function locates the record quickly

Internal Hashing:
• Open addressing:
- Proceeding from the occupied position specified by the hash address,
the program checks the subsequent positions in order until an unused (empty)
position is found.

• Chaining:
- Various overflow locations are kept, usually by extending the array
with a number of overflow positions.
- A pointer field is added to each record location.

• Multiple hashing:
- The program applies a second hash function if the first results in a collision.
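Open addressing can be sketched as follows, assuming integer keys, the hash function key mod table size, and a Python list with `None` marking empty positions. The function names are hypothetical, and deletion (which needs special markers so that probe sequences are not cut short) is left out.

```python
def insert_open_addressing(table, key):
    """Insert `key` into `table` by probing successive positions; return the slot used."""
    m = len(table)
    start = key % m
    for step in range(m):
        pos = (start + step) % m  # check subsequent positions, wrapping around
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


def search_open_addressing(table, key):
    """Return the position of `key`, or None if it is not in the table."""
    m = len(table)
    start = key % m
    for step in range(m):
        pos = (start + step) % m
        if table[pos] is None:
            return None  # an empty position ends the probe sequence
        if table[pos] == key:
            return pos
    return None
```

With a table of 7 positions, the keys 10, 17 and 24 all hash to position 3, so they end up in positions 3, 4 and 5 respectively.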

External Hashing:
- Hashing for disk files is called external hashing.
- The goal of a good hash function is to distribute the records
uniformly over the address space so as to minimize collisions.

Static Hashing

The problem with static hashing is that it does not expand or
shrink dynamically as the size of the database grows or shrinks.

Dynamic Hashing
Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on demand (for example, extendable hashing).
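One way buckets can be added on demand is extendable hashing, sketched below under several assumptions: integer keys, Python's built-in `hash`, a directory indexed by the low-order bits of the hash value, and a default bucket capacity of 2. The class names are hypothetical, and bucket removal (shrinking) is not shown.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket depends on
        self.keys = []


class ExtendableHash:
    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.global_depth = 0
        self.directory = [Bucket(0)]  # 2**global_depth pointers to buckets

    def _index(self, key):
        # Use the low-order `global_depth` bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            return
        if len(bucket.keys) < self.bucket_size:
            bucket.keys.append(key)
            return
        # Bucket is full: double the directory first if this bucket already
        # uses every directory bit, then split the bucket in two.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Directory slots that pointed at the old bucket and have the new bit
        # set are redirected to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old keys and retry the new one.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys + [key]:
            self.insert(k)
```

Only the overflowing bucket is split, so the rest of the file is untouched when it grows. A limitation of this sketch is that keys whose hash values share many low-order bits force repeated directory doubling.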

Overflow Chaining: When a bucket is full, a new bucket is allocated for the
same hash result and is linked after the previous one.
This mechanism is called Closed Hashing.

Linear Probing: When the hash function generates an address at which data is
already stored, the next free bucket is allocated to it.
This mechanism is called Open Hashing.
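Overflow chaining can be sketched as a fixed array of primary buckets, each of which may link to further buckets. The class names, the bucket capacity of 2, and the use of key mod number-of-buckets as the hash function are assumptions made for this example.

```python
class ChainedBucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # link to the next bucket in the chain


class ChainedHashFile:
    def __init__(self, num_buckets=4, capacity=2):
        self.capacity = capacity
        self.buckets = [ChainedBucket(capacity) for _ in range(num_buckets)]

    def insert(self, key):
        bucket = self.buckets[key % len(self.buckets)]
        # Walk to the first bucket in the chain that still has room,
        # allocating a new overflow bucket when the chain is exhausted.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = ChainedBucket(self.capacity)
            bucket = bucket.overflow
        bucket.records.append(key)

    def search(self, key):
        bucket = self.buckets[key % len(self.buckets)]
        while bucket is not None:
            if key in bucket.records:
                return True
            bucket = bucket.overflow
        return False

    def chain_length(self, index):
        # Number of buckets (primary plus overflow) behind one hash result.
        n, bucket = 0, self.buckets[index]
        while bucket is not None:
            n, bucket = n + 1, bucket.overflow
        return n
```

Long chains mean more bucket reads per search, which is the cost of keeping the number of primary buckets fixed.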

Hash file organization of account file, using branch_name as key

For a string search key, the binary representations of all the characters in the
string could be added, and the sum modulo the number of buckets could be
returned.
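A minimal sketch of such a string hash function, assuming the characters are taken as their UTF-8 byte values. The function name `string_hash` is hypothetical.

```python
def string_hash(key, num_buckets):
    """Sum the byte values of the string and reduce modulo the number of buckets."""
    return sum(key.encode("utf-8")) % num_buckets
```

For example, `string_hash("ab", 10)` adds 97 and 98 to get 195 and returns 5. Because addition ignores character order, `"ba"` lands in the same bucket, which shows why this simple function can produce collisions.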
Use of Extendable Hash Structure: Example

Initial hash structure, bucket size = 2

Indexing

• Index file (same idea as a textbook index): an auxiliary structure designed to
speed up access to desired data.
• Indexing field: the field on which the index file is defined.

• The index file stores each value of the indexing field along with
pointer(s) (e.g., page numbers) to the block(s) that contain record(s) with that field
value, or a pointer to the record with that field value: <Indexing Field, Pointer>

• To find a record in the data file based on a selection criterion on an
indexing field, we first access the index file, which then gives access
to the record in the data file.

• The index file is much smaller than the data file, so searching it is fast.

• Indexing is important for both file systems and DBMSs.
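The <Indexing Field, Pointer> idea can be sketched with an unordered data file and a sorted index over one field. The sample records, the `(page_no, slot_no)` pointer format, and the function name `lookup` are all made up for illustration.

```python
import bisect

# Data file: a list of pages, each holding a few (name, salary) records,
# stored in no particular order. The values are invented sample data.
data_file = [
    [("Cole", 41000), ("Adams", 52000)],
    [("Evans", 39000), ("Baker", 47000)],
]

# Index file: sorted <indexing field, pointer> pairs, where the pointer
# is (page_no, slot_no) of the record in the data file.
index_file = sorted(
    (record[0], (page_no, slot_no))
    for page_no, page in enumerate(data_file)
    for slot_no, record in enumerate(page)
)
index_keys = [entry[0] for entry in index_file]


def lookup(name):
    """Binary-search the index, then follow the pointer into the data file."""
    i = bisect.bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        page_no, slot_no = index_file[i][1]
        return data_file[page_no][slot_no]
    return None
```

The search touches only the small sorted index and then a single page of the data file, instead of scanning every page.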

Choosing Indexing Technique
• Five factors are involved when choosing the
indexing technique:
– access type
– access time
– insertion time
– deletion time
– space overhead

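Two Types of Indices

• Ordered index – entries are kept sorted by search-key value, so it is
used to access data in sorted order. (A primary, or clustering, index is an
ordered index whose search key also defines the order of the data file.)

• Hash index – used to access data that is distributed uniformly across a
range of buckets by a hash function. (A secondary, or non-clustering, index
is one whose search key does not define the order of the data file; it is
not necessarily a hash index.)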

Types of Indexes
• Indexes on ordered vs. unordered files

• Dense vs. non-dense (i.e. sparse) indexes
- Dense: an entry in the index file for each record of the data file.
- Sparse: only some of the data records are represented in the index, often
one index entry per block of the data file.

• Primary indexes vs. secondary indexes

• Ordered indexes vs. hash indexes
- Ordered indexes: indexing fields stored in sorted order.
- Hash indexes: indexing fields stored using a hash function.

• Single-level vs. multi-level
– A single-level index is an ordered file and is searched using binary search.
– Multi-level indexes are tree-structured; they improve the search but require a
more elaborate search algorithm.

• Index on a single indexing field vs. index on multiple indexing fields
(i.e. composite index).
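A sparse index with one entry per block can be sketched as follows, assuming a data file that is sorted on the indexing field. The sample keys and the function name `sparse_lookup` are hypothetical.

```python
import bisect

# Data file sorted on the indexing field, stored as blocks of records.
# The keys are invented sample data.
blocks = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

# Sparse index: one entry per block, holding the first key in that block.
sparse_index = [block[0] for block in blocks]


def sparse_lookup(key):
    """Return (block_no, slot_no) for `key`, or None if it is absent."""
    # Find the last block whose first key is <= key, then search inside it.
    b = bisect.bisect_right(sparse_index, key) - 1
    if b < 0:
        return None
    return (b, blocks[b].index(key)) if key in blocks[b] else None
```

Here the index has 3 entries for 9 records. A dense index would need 9 entries, but it could answer "not present" without reading any data block.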