Unit5 File Organization
• Primary index: an ordered index whose search key is also the sort key used for the sequential file.
Primary indexing in a DBMS is further divided into two types:
• Dense Index
• Sparse Index
What is a Dense primary index?
Each record in the main table has exactly one entry in the index table.
What is a Sparse Index?
• When database tables are large, a dense index grows with them; the sparse index is the solution to this problem.
• In a sparse index, each index entry points to a group of records in the main table. For example, one sparse index entry can point to more than one record of the main database table.
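The difference between the two index types can be sketched in Python. This is a minimal illustration under stated assumptions, not a DBMS implementation: the sorted file is modeled as a list of blocks, and the names `blocks`, `build_dense_index`, `build_sparse_index`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# A sorted "main table" split into blocks of 3 records (key, payload).
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

def build_dense_index(blocks):
    """One index entry per record: key -> (block number, slot)."""
    return {key: (b, s)
            for b, block in enumerate(blocks)
            for s, (key, _) in enumerate(block)}

def build_sparse_index(blocks):
    """One index entry per block: the first key stored in that block."""
    return [block[0][0] for block in blocks]

def sparse_lookup(sparse_index, blocks, key):
    """Find the last index entry <= key, then scan that block sequentially."""
    b = bisect_right(sparse_index, key) - 1
    if b < 0:
        return None
    for k, payload in blocks[b]:
        if k == key:
            return payload
    return None

dense = build_dense_index(blocks)
sparse = build_sparse_index(blocks)
print(len(dense), len(sparse))            # 9 entries vs 3 entries
print(sparse_lookup(sparse, blocks, 50))  # e
```

The sparse index is smaller, but a lookup has to scan inside the block it points to; the dense index goes straight to the record.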
What is the clustered index?
In a clustered index, table records are sorted physically to
match the index.
What is the Secondary Index?
• Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called a non-clustering index.
• A secondary index can be organized in multiple levels.
• Multi-level indexing is an advancement of secondary indexing in which more levels are added as the index grows.
Multilevel indexes
• The purpose of multilevel indexing is to reduce the number of block accesses required to locate a record.
• More than one level of index files is maintained.
• Each level reduces the portion of the index still to be searched by a factor of bfr (the blocking factor of the index). This factor is called the fan-out of the multilevel index.
• The first level is an ordered file with a distinct value for each K(i).
• The second level is a primary index for the first level, with one entry for each block of the first level.
• The third level is a primary index for the second level, with one entry for each block of the second level. Levels are added in this way until all entries of some index level fit in a single block. This is called the top index level.
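The fan-out argument can be worked through numerically. The sketch below assumes a first level with 30,000 entries and a fan-out of 68 entries per index block; both numbers and the function name `index_levels` are chosen only for illustration.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Return the number of index blocks at each level, bottom to top."""
    blocks_per_level = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        blocks_per_level.append(blocks)
        if blocks == 1:          # top index level fits in a single block
            break
        entries = blocks         # next level has one entry per block
    return blocks_per_level

levels = index_levels(first_level_entries=30000, fan_out=68)
print(levels)        # [442, 7, 1]
print(len(levels))   # 3
```

With three levels, a search reads one block per index level plus one data block, instead of searching through all 442 first-level blocks.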
17.3 Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Tree data structure terminology
• Tree is formed of nodes
• Each node (except root) has one parent and zero or more child nodes
• Leaf node has no child nodes
• Unbalanced if leaf nodes occur at different levels
• Nonleaf node called internal node
• Subtree of node consists of node and all descendant nodes
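The terminology above can be made concrete with a small Python sketch. The `Node` class and the helpers `leaf_depths` and `is_balanced` are hypothetical names used only for this illustration; "balanced" is taken to mean that all leaf nodes occur at the same level.

```python
class Node:
    """A tree node with zero or more child nodes."""
    def __init__(self, key, children=None):
        self.key = key
        self.children = children or []

    def is_leaf(self):
        # A leaf node has no child nodes; any other node is internal.
        return not self.children

def leaf_depths(node, depth=0):
    """Collect the set of levels at which leaf nodes occur."""
    if node.is_leaf():
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths

def is_balanced(root):
    """True when every leaf node is at the same level."""
    return len(leaf_depths(root)) == 1

balanced = Node(1, [Node(2), Node(3)])
unbalanced = Node(1, [Node(2, [Node(4)]), Node(3)])
print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```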
Difference Between B-Tree And B+ Tree
B-Tree | B+ Tree
Data is stored in leaf nodes as well as internal nodes. | Data is stored only in leaf nodes.
Searching is a bit slower as data is stored in internal as well as leaf nodes. | Searching is faster as the data is stored only in the leaf nodes.
No redundant search keys are present. | Redundant search keys may be present.
Deletion operation is complex. | Deletion operation is easy as data can be directly deleted from the leaf nodes.
Leaf nodes are not linked together. | Leaf nodes are linked together to form a linked list.
Motivation
• Tree-based data structures
– O(logN) access time (Find, Insert, Delete)
• Can we do better than this?
– If we consider the average case rather than the worst case, is there an O(1) time approach with high probability?
– Hashing is such a data structure that allows for efficient insertion,
deletion, and searching of keys in O(1) time on average.
• Numerous applications
– Symbol table of variables in Compilers
– Virtual to physical memory translation in Operating Systems
– String matching
Components of Hashing
• Hash table is an array of some fixed size, containing the items.
– Generally a search is performed on some part of the item, called the
key.
– The item could consist of a string or a number (that serves as the key)
and additional data members (for instance, a name that is part of a
large employee structure).
– The size of the table is TableSize.
• Hash function h(k) maps search key k to some location in the hash table in the range [0 .. TableSize-1]. Different keys might be mapped (hashed) to the same location; this is called a collision.
General Idea
• Insertion: Compute the location in the hash table for the input item and
insert it into the table.
• Deletion: Compute the location in the hash table for the input item and
remove it from the table.
Issues in Hashing
[Figure: the items 18, 23, 26, 9, 7 pass through the hash function key % TableSize into a hash table with slots 0 to 6 (TableSize = 7).]
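The mapping in the figure can be reproduced with a few lines of Python. This is only a sketch: the function name `h` follows the slides, while `TABLE_SIZE`, `items`, and `slots` are names introduced here for illustration.

```python
TABLE_SIZE = 7
items = [18, 23, 26, 9, 7]

def h(key, table_size=TABLE_SIZE):
    """Division-method hash function: key % TableSize."""
    return key % table_size

# Group the items by the slot they hash to.
slots = {}
for key in items:
    slots.setdefault(h(key), []).append(key)

for slot in range(TABLE_SIZE):
    print(slot, slots.get(slot, []))
```

Keys 23 and 9 both hash to slot 2, which is exactly the collision issue that a hash table design has to resolve.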
Hash-based indexes
• Hash-based indexes are best for equality selections. They cannot support range searches.
• Static and dynamic hashing techniques exist; trade-offs are similar to ISAM vs. B+ trees.
Static Hashing
In static hashing, the hash function always computes the same address for a given search key value.
For example, if a mod 4 hash function is used, it generates only 4 values (0 to 3).
The number of buckets provided remains unchanged at all times.
Bucket address = h(K): the address of the desired data item, which is used for insertion, update, and deletion operations.
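Static Hashing
[Figure: a key is passed to the hash function h, and h(key) selects one of the primary bucket pages 0 to N-1; a primary bucket page may have overflow pages attached.]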
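A minimal Python sketch of this layout follows. It assumes integer keys, h(K) = K mod N, and a fixed page capacity; the class `StaticHashFile` and its method names are hypothetical, and deletion is left out to keep the sketch short.

```python
class StaticHashFile:
    """Fixed number of buckets; full pages spill into overflow pages."""
    def __init__(self, num_buckets, page_capacity):
        self.num_buckets = num_buckets          # never changes after creation
        self.page_capacity = page_capacity
        # Each bucket is a list of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def address(self, key):
        return key % self.num_buckets           # bucket address h(K)

    def insert(self, key):
        pages = self.buckets[self.address(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])                    # allocate an overflow page
        pages[-1].append(key)

    def search(self, key):
        # Scans the primary page and every overflow page of one bucket.
        return any(key in page for page in self.buckets[self.address(key)])

f = StaticHashFile(num_buckets=4, page_capacity=2)
for k in [4, 8, 12, 16, 5]:
    f.insert(k)
print(f.buckets[0])                # [[4, 8], [12, 16]]
print(f.buckets[1])                # [[5]]
print(f.search(16), f.search(20))  # True False
```

Because the number of buckets is fixed, a skewed or growing key set makes the overflow chains longer, and each search then has to read more pages.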
Static hashing comes with the following disadvantages −
Open Hashing (Separate chaining)
• Collisions are resolved by keeping a list of the elements whose keys hash to the same location.
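Separate chaining can be sketched as follows. The class `ChainedHashTable` and its method names are assumptions made for this illustration; the hash function is key % TableSize with TableSize = 7, and each table slot holds a Python list as its chain.

```python
class ChainedHashTable:
    """Hash table where each slot holds a list (chain) of keys."""
    def __init__(self, table_size=7):
        self.table = [[] for _ in range(table_size)]

    def _slot(self, key):
        return key % len(self.table)

    def insert(self, key):
        chain = self.table[self._slot(key)]
        if key not in chain:        # ignore duplicate keys
            chain.append(key)

    def find(self, key):
        return key in self.table[self._slot(key)]

    def delete(self, key):
        chain = self.table[self._slot(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False

t = ChainedHashTable()
for k in [18, 23, 26, 9, 7]:
    t.insert(k)
print(t.table[2])                         # [23, 9]
t.delete(23)
print(t.table[2], t.find(9), t.find(23))  # [9] True False
```

Keys 23 and 9 collide in slot 2 and simply share one chain, so a collision costs a short list scan rather than a failed insert.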
Solution: First, calculate the binary forms of each of the given numbers.
16- 10000
4- 00100
6- 00110
22- 10110
24- 11000
10- 01010
31- 11111
7- 00111
9- 01001
20- 10100
26- 11010
• Initially, the global depth and local depth are both 1, so the directory has two entries (0 and 1), each pointing to its own empty bucket.
• Inserting 16:
The binary format of 16 is 10000 and global-depth is 1. The hash function returns 1 LSB of 10000 which is 0.
Hence, 16 is mapped to the directory with id=0.
• Inserting 4 and 6:
Both 4 (100) and 6 (110) have 0 as their LSB. Hence, they are hashed to the directory with id=0 as well.
• Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by directory 0 is already full. Hence, overflow occurs.
As directed by Case 1, since local depth = global depth, the bucket splits and directory expansion takes place. The numbers in the overflowing bucket are rehashed after the split. Since the global depth is incremented by 1, the global depth is now 2. Hence, 16, 4, 6, 22 are rehashed w.r.t. 2 LSBs: 16 (10000), 4 (100), 6 (110), 22 (10110).
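The rehashing step can be checked with a short Python sketch. The helper name `lsb_hash` is hypothetical; it returns the requested number of least significant bits of a key as a bit string, which is the hash function this example assumes.

```python
def lsb_hash(key, depth):
    """Return the `depth` least significant bits of key as a bit string."""
    return format(key & ((1 << depth) - 1), f"0{depth}b")

keys = [16, 4, 6, 22]

# Global depth 1: all four keys hash to directory 0.
print([lsb_hash(k, 1) for k in keys])   # ['0', '0', '0', '0']

# After the split, global depth is 2 and the keys are rehashed on 2 LSBs.
rehash = {}
for k in keys:
    rehash.setdefault(lsb_hash(k, 2), []).append(k)
print(rehash)                           # {'00': [16, 4], '10': [6, 22]}
```

After the split, 16 and 4 go to directory 00 while 6 and 22 go to directory 10, so the overflow is resolved.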