Indexing and Hashing

Indexing mechanisms like indices and hashing are used to speed up access to desired data records. There are two main types of indices: ordered indices which store search keys in sorted order, and hash indices which distribute search keys uniformly using a hash function. B-tree indexing structures are commonly used as they can efficiently support insertion, deletion and search operations in logarithmic time. B-trees keep the tree balanced through splitting nodes and propagating keys during insertion if nodes overflow.

Uploaded by

Mukul Dilwaria

Indexing and Hashing

Indexing mechanisms are used to speed up access to desired data, e.g., the author catalog in a library.
Search key: an attribute or set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form

    search-key | pointer

Index files are typically much smaller than the original file.
Two basic kinds of indices:
    Ordered indices: search keys are stored in sorted order.
    Hash indices: search keys are distributed uniformly across buckets using a hash function.
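
The two kinds of index can be contrasted with a minimal Python sketch. The names here (`records`, `ordered_lookup`, `ordered_range`, `hash_lookup`, `NUM_BUCKETS`) are illustrative assumptions, the "file" is an in-memory list, and a "pointer" is just a list position.

```python
import bisect

# Hypothetical data file: (ssn, name) records; a "pointer" is a list position.
records = [(345, "tomson"), (123, "smith"), (567, "smith"),
           (234, "jones"), (456, "stevens")]

# Ordered index: (search-key, pointer) entries kept sorted by search key.
ordered_index = sorted((ssn, pos) for pos, (ssn, _) in enumerate(records))
ordered_keys = [key for key, _ in ordered_index]

def ordered_lookup(key):
    """Binary-search the sorted index entries, then follow the pointer."""
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return records[ordered_index[i][1]]
    return None

def ordered_range(lo, hi):
    """Sorted order turns a range query into one contiguous slice of the index."""
    i = bisect.bisect_left(ordered_keys, lo)
    j = bisect.bisect_right(ordered_keys, hi)
    return [records[pos] for _, pos in ordered_index[i:j]]

# Hash index: a hash function spreads the search keys across buckets.
NUM_BUCKETS = 4  # arbitrary choice for the sketch
buckets = [[] for _ in range(NUM_BUCKETS)]
for pos, (ssn, _) in enumerate(records):
    buckets[hash(ssn) % NUM_BUCKETS].append((ssn, pos))

def hash_lookup(key):
    """Scan only the one bucket the key hashes to."""
    for k, pos in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:
            return records[pos]
    return None
```

The ordered index answers both point and range queries, while the hash index in this sketch supports point lookups only.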

DBMS: Rajeev Wankar

Main concepts
Search keys are sorted in the index file and point to the actual records.
Primary vs. secondary indices
Clustering (sparse) vs. non-clustering (dense) indices
Primary key index: on primary key (no duplicates)

    Index (Ssn): 123, 234, 345, 456, 567

    STUDENT
    Ssn   Name      Address
    123   smith     main str
    234   jones     forbes ave
    678   tomson    main str
    456   stevens   forbes ave
    345   smith     forbes ave

Secondary key index: duplicates may exist

    Address-index: forbes ave, main str

    STUDENT
    Ssn   Name      Address
    123   smith     main str
    234   jones     forbes ave
    345   tomson    main str
    456   stevens   forbes ave
    567   smith     forbes ave

Secondary key index: typically, with postings lists

    Address-index   Postings list (Ssn)
    forbes ave      234, 456, 567
    main str        123, 345

    STUDENT
    Ssn   Name      Address
    123   smith     main str
    234   jones     forbes ave
    345   tomson    main str
    456   stevens   forbes ave
    567   smith     forbes ave
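
A secondary index with postings lists can be sketched in a few lines of Python, using the STUDENT rows from the slide. The names `address_index` and `find_by_address` are assumptions made for illustration, and a posting is simply the record's position in an in-memory list.

```python
from collections import defaultdict

# STUDENT rows from the slide: (ssn, name, address).
students = [
    (123, "smith", "main str"),
    (234, "jones", "forbes ave"),
    (345, "tomson", "main str"),
    (456, "stevens", "forbes ave"),
    (567, "smith", "forbes ave"),
]

# Secondary index on Address: each distinct value maps to a postings list
# holding the positions of all records with that value (duplicates allowed).
address_index = defaultdict(list)
for position, (_ssn, _name, address) in enumerate(students):
    address_index[address].append(position)

def find_by_address(address):
    """Follow every pointer in the postings list for this address."""
    return [students[p] for p in address_index.get(address, [])]
```

Because several records share an address, one index entry per distinct value plus a postings list avoids repeating the key once per record.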

Clustering (= sparse) index: records are physically sorted on that key (and not all key values are needed in the index).
Non-clustering (= dense) index: the opposite.
E.g.: clustering/sparse index on ssn

    Index entry   Covers
    123           ssn >= 123
    456           ssn >= 456

    STUDENT
    Ssn   Name      Address
    123   smith     main str
    234   jones     forbes ave
    345   tomson    main str
    456   stevens   forbes ave
    567   smith     forbes ave
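
A lookup through the sparse index above could look roughly like the following Python sketch. The block boundaries and the names `blocks`, `sparse_keys` and `sparse_lookup` are assumptions; the point is that the index holds one entry per block and the search finishes with a short scan inside that block.

```python
import bisect

# Data file sorted on ssn and split into blocks (the split point is assumed).
blocks = [
    [(123, "smith"), (234, "jones"), (345, "tomson")],
    [(456, "stevens"), (567, "smith")],
]

# Sparse index: one entry per block, holding the first ssn stored in it.
sparse_keys = [block[0][0] for block in blocks]  # [123, 456]

def sparse_lookup(ssn):
    """Find the last index entry <= ssn, then scan only that block."""
    i = bisect.bisect_right(sparse_keys, ssn) - 1
    if i < 0:
        return None  # smaller than every key in the file
    for key, name in blocks[i]:
        if key == ssn:
            return (key, name)
    return None
```

This only works because the file is physically sorted on ssn; with an unsorted file every key needs its own index entry, which is the dense case.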

Sparse index: contains index records for only some search-key values.
Dense index: an index record appears for every search-key value in the file.
Non-clustering / dense index

    Index (Ssn): 123, 234, 345, 456, 567

    STUDENT (not sorted on Ssn)
    Ssn   Name      Address
    345   tomson    main str
    234   jones     forbes ave
    567   smith     forbes ave
    456   stevens   forbes ave
    123   smith     main str

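ISAM
What if the index is too large to search sequentially?

[Figure: an index on the index. Top-level entries 123 and 3,423 point to index blocks; the index block with entries 123 (>=123) and 456 (>=456) points to blocks of the STUDENT file.]

    STUDENT
    Ssn   Name      Address
    123   smith     main str
    234   jones     forbes ave
    345   tomson    main str
    456   stevens   forbes ave
    567   smith     forbes ave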

Multilevel Index
If the primary index does not fit in memory, access becomes expensive.
If the index is too large, store it on disk and keep an index on the index.
Usually two levels of indices, one first-level entry per disk block (why?)
What about insertions/deletions?
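
The index-on-the-index idea can be sketched as follows, under the assumption that the inner index is cut into fixed-size "disk blocks" and the outer index keeps one entry per block. `ENTRIES_PER_BLOCK`, `build_two_level` and `lookup` are hypothetical names, and the block size is arbitrary.

```python
import bisect

ENTRIES_PER_BLOCK = 3  # assumed number of index entries that fit in one disk block

def build_two_level(sorted_keys):
    """Cut the sorted inner index into blocks; the outer index has one entry per block."""
    inner = [sorted_keys[i:i + ENTRIES_PER_BLOCK]
             for i in range(0, len(sorted_keys), ENTRIES_PER_BLOCK)]
    outer = [block[0] for block in inner]  # small enough to keep in memory
    return outer, inner

def lookup(outer, inner, key):
    """Search the outer index in memory, then read and search one inner block."""
    b = bisect.bisect_right(outer, key) - 1
    if b < 0:
        return None
    block = inner[b]  # stands in for a single disk read
    i = bisect.bisect_left(block, key)
    if i < len(block) and block[i] == key:
        return (b, i)  # (block number, slot within the block)
    return None
```

With one outer entry per disk block, a lookup costs one in-memory search plus one block read for the inner index, before the data block itself is fetched.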
Index Update: Deletion
If the deleted record was the only record in the file with its particular search-key value, the search-key is also deleted from the index.

[Figure: the two-level index (entries 123 and 3,423 at the top; 123 and 456 below) over the STUDENT file, with a new record "124; peterson; fifth ave." to be inserted.]

    STUDENT
    Ssn   Name      Address
    123   smith     main str
    234   jones     forbes ave
    345   tomson    main str
    456   stevens   forbes ave
    567   smith     forbes ave

[Figure: the same index and STUDENT file, with the new record "124; peterson; fifth ave." stored in an overflow area ("overflows").]

Problems?
Overflow chains may become very long. What to do?
    Shut down and reorganize.
    Start with ~80% utilization.
If the index is too large, store it on disk and keep an index on the index (in memory).
Usually two levels of indices, one first-level entry per disk block (why?)
Indices (like ISAM) suffer in the presence of frequent updates.
Alternative indexing structure: B-trees
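
The overflow-chain problem can be made concrete with a small Python sketch of a static, ISAM-like file. This is a simplification under stated assumptions, not a faithful ISAM implementation: `IsamFile`, its methods and the block capacity are hypothetical, blocks are in-memory lists, and the key list passed in must be sorted and non-empty.

```python
import bisect

class IsamFile:
    """Static index built once; inserts that do not fit go to overflow chains."""

    def __init__(self, sorted_keys, block_capacity):
        self.capacity = block_capacity
        self.primary = [list(sorted_keys[i:i + block_capacity])
                        for i in range(0, len(sorted_keys), block_capacity)]
        self.index = [block[0] for block in self.primary]  # never rebuilt
        self.overflow = [[] for _ in self.primary]

    def _block_for(self, key):
        # Last index entry <= key; keys below the first entry go to block 0.
        return max(bisect.bisect_right(self.index, key) - 1, 0)

    def insert(self, key):
        b = self._block_for(key)
        if len(self.primary[b]) < self.capacity:
            bisect.insort(self.primary[b], key)
        else:
            self.overflow[b].append(key)  # the chain grows without bound

    def search(self, key):
        b = self._block_for(key)
        return key in self.primary[b] or key in self.overflow[b]

    def longest_chain(self):
        return max(len(chain) for chain in self.overflow)

f = IsamFile([123, 234, 345, 456, 567], block_capacity=3)
f.insert(124)  # the first block is already full, so 124 overflows
```

Every further insert into a full block lengthens its chain and slows searches in that block, which is why such a file is periodically reorganized and initially loaded below full utilization.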
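B-trees: the most successful family of index schemes (B-trees, B+-trees, B*-trees).
Can be used for primary/secondary, clustering/non-clustering indices.

B-trees
E.g., B-tree of order 3:

[Figure: B-tree of order 3 with root keys 6 and 9; the subtrees hold keys <6 (e.g., 1), keys >6 and <9 (e.g., 7), and keys >9 (e.g., 13).]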

A B/B+-tree is a rooted tree satisfying the following properties:
    All paths from root to leaf are of the same length.
    Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
    A leaf node has between ⌈(n−1)/2⌉ and n−1 values.
Special cases:
    If the root is not a leaf, it has at least 2 children.
    If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n−1) values.
O(log(N)) for everything! (ins/del/search)
Typically, if the fan-out m = 50 - 100, then 2 - 3 levels.
Utilization >= 50%, guaranteed; on average 69%.
Queries: algorithm for exact match query? (e.g., ssn=8?)

[Figure: the search descends from the root (keys 6 and 9) to a leaf in H steps.]

B-tree: print keys in sorted order?

[Figure: the B-tree of order 3 with root keys 6 and 9.]

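Solution: B+-Tree Index Files

B+-trees facilitate sequential operations. They string all leaf nodes together AND replicate keys from non-leaf nodes, to make sure every key appears at the leaf level.

[Figure: B+-tree with root keys 6 and 9 and subtrees <6, >=6 and <9, >=9; key 6 appears both in the root and at the leaf level.]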

Advantage of B+-tree index files: automatically reorganizes itself with small, local changes in the face of insertions and deletions. Reorganization of the entire file is not required to maintain performance.
Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
Advantages of B+-trees outweigh disadvantages, and they are used extensively.

B+-Tree Node Structure

Typical node: P1 | K1 | P2 | ... | Pn−1 | Kn−1 | Pn
Ki are the search-key values.
Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
The search-keys in a node are ordered:
    K1 < K2 < K3 < . . . < Kn−1
Leaf Nodes in B+-Trees
For i = 1, 2, . . ., n−1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. The bucket structure is needed only if the search-key does not form a primary key.
Pn points to the next leaf node in search-key order.
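
The role of the last leaf pointer Pn can be illustrated with a short Python sketch of a sequential scan along the leaf chain. `Leaf` and `range_scan` are hypothetical names, records are stand-in strings, and the scan is assumed to start at a leaf already located by a normal search.

```python
class Leaf:
    """A leaf node: sorted keys, one record per key, and the Pn link."""

    def __init__(self, keys, records):
        self.keys = keys
        self.records = records
        self.next = None  # plays the role of Pn: next leaf in search-key order

def range_scan(start_leaf, lo, hi):
    """Collect records with lo <= key <= hi by walking the leaf chain."""
    out = []
    leaf = start_leaf
    while leaf is not None:
        for key, record in zip(leaf.keys, leaf.records):
            if key > hi:
                return out  # keys are globally sorted, so the scan can stop
            if key >= lo:
                out.append(record)
        leaf = leaf.next
    return out

# Three chained leaves holding keys 1, 3 | 6, 7 | 9, 13.
a = Leaf([1, 3], ["r1", "r3"])
b = Leaf([6, 7], ["r6", "r7"])
c = Leaf([9, 13], ["r9", "r13"])
a.next, b.next = b, c
```

Because every key appears at the leaf level and the leaves are linked, printing keys in sorted order never has to go back up through the non-leaf nodes.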

Non-Leaf Nodes in B+-Trees

Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:
    All the search-keys in the subtree to which P1 points are less than K1.
    For 2 ≤ i ≤ m−1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki−1 and less than Ki.
    All the search-keys in the subtree to which Pm points are greater than or equal to Km−1.

[Figure: B+-tree for account file (n = 3)]

Leaf nodes must have between 2 and 4 values (⌈(n−1)/2⌉ and n−1, with n = 5).
Non-leaf nodes other than the root must have between 3 and 5 children (⌈n/2⌉ and n, with n = 5).
The root must have at least 2 children.
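
The occupancy bounds above follow directly from n and can be checked with a few lines of Python. The function name `bplus_bounds` and the dictionary keys are illustrative assumptions.

```python
import math

def bplus_bounds(n):
    """Occupancy limits for a B+-tree whose nodes hold at most n pointers."""
    return {
        # A leaf holds between ceil((n-1)/2) and n-1 values.
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # A non-leaf node other than the root has between ceil(n/2) and n children.
        "nonleaf_children": (math.ceil(n / 2), n),
    }
```

For n = 5 this gives 2 to 4 values per leaf and 3 to 5 children per non-leaf node, matching the numbers on the slide; for n = 3 it gives 1 to 2 values and 2 to 3 children.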

Observations about B+-trees

Since the inter-node connections are done by
pointers, logically close blocks need not be
physically close.
The non-leaf levels of the B+-tree form a hierarchy
of sparse indices.
The B+-tree contains a relatively small number of
levels (logarithmic in the size of the main file), thus
searches can be conducted efficiently.

Queries on B+-Trees

Find all records with a search-key value of k.
Start with the root node.
    1. Examine the node for the smallest search-key value > k.
    2. If such a value exists, assume it is Ki. Then follow Pi to the child node.
    3. Otherwise k ≥ Km−1, where there are m pointers in the node. Then follow Pm to the child node.
If the node reached by following the pointer above is not a leaf node, repeat the above procedure on the node, and follow the corresponding pointer.
Eventually reach a leaf node. If for some i, key Ki = k, follow pointer Pi to the desired record or bucket. Else no record with search-key value k exists.
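
The search procedure above translates almost line for line into Python. This is a minimal sketch: `Node` and `find` are hypothetical names, pointers are 0-indexed list positions rather than the 1-indexed Pi of the slides, and leaf pointers hold stand-in record strings.

```python
import bisect

class Node:
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # K1 < K2 < ... in sorted order
        self.pointers = pointers  # children (non-leaf) or records (leaf)
        self.is_leaf = is_leaf

def find(root, k):
    """Descend from the root to a leaf, then look for k in that leaf."""
    node = root
    while not node.is_leaf:
        # Position of the smallest key > k. If no such key exists this is
        # len(keys), which selects the last pointer (Pm in the slides).
        i = bisect.bisect_right(node.keys, k)
        node = node.pointers[i]
    i = bisect.bisect_left(node.keys, k)
    if i < len(node.keys) and node.keys[i] == k:
        return node.pointers[i]
    return None

# A small tree: root keys 6 and 9 over three leaves.
leaf1 = Node([1, 3], ["r1", "r3"], True)
leaf2 = Node([6, 7], ["r6", "r7"], True)
leaf3 = Node([9, 13], ["r9", "r13"], True)
root = Node([6, 9], [leaf1, leaf2, leaf3], False)
```

Using `bisect_right` in non-leaf nodes sends a key equal to a separator into the right-hand subtree, which matches the "greater than or equal to" rule for non-leaf pointers.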
Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time (as we shall see).
If there are K search-key values in the file, the path is no longer than ⌈log_⌈n/2⌉(K)⌉.
With 1 million search-key values and n = 100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
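
The path-length bound can be verified numerically. The sketch below uses integer arithmetic rather than floating-point logarithms to avoid rounding surprises at exact powers; `max_path_length` is a hypothetical helper name and n is assumed to be at least 3.

```python
def max_path_length(num_keys, n):
    """Smallest h with ceil(n/2)**h >= num_keys, i.e. ceil(log_{ceil(n/2)}(num_keys))."""
    fanout = -(-n // 2)  # ceil(n / 2): minimum fan-out of a non-root node
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels
```

With n = 100 the minimum fan-out is 50, and 50^3 = 125,000 is below one million while 50^4 = 6,250,000 is above it, so `max_path_length(1_000_000, 100)` is 4.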
B+-Tree Insertion
Find the leaf node in which the search-key value would appear.
If the search-key value is already there in the leaf node, the record is added to the file and, if necessary, a pointer is inserted into the bucket.
If the search-key value is not there, then add the record to the main file and create a bucket if necessary. Then:
    If there is room in the leaf node, insert the (key-value, pointer) pair in the leaf node.
    Otherwise, split the node (along with the new (key-value, pointer) entry).

Splitting a node:
Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.

Let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up.
The splitting of nodes proceeds upwards till a node that is not full is found. In the worst case the root node may be split, increasing the height of the tree by 1.
/* ATTENTION:
   a split at the LEAF level is handled by COPYING the middle key upstairs;
   a split at a higher level is handled by PUSHING the middle key upstairs */

INSERTION OF KEY K
find the leaf node L in which K belongs;
insert search-key value K into L such that the keys are in order;
if (L overflows) {
    split L;
    insert (i.e., COPY) the smallest search-key value of the new node into the parent node P;
    if (P overflows) {
        repeat the B-tree split procedure recursively;
        /* Notice: the B-TREE split (PUSH); NOT the B+-tree split */
    }
}
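
The COPY versus PUSH distinction in the pseudocode above can be made runnable as a compact Python sketch. This is an illustration under stated assumptions rather than a reference implementation: the class and method names are hypothetical, only keys are stored (no record pointers or buckets), duplicates are ignored, and a leaf split keeps the first ⌈n/2⌉ keys in the original node, so the separator it picks may differ from a figure that splits at a different position.

```python
import bisect

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes (non-leaf only)
        self.next = None    # next leaf in key order (leaf only)

class BPlusTree:
    def __init__(self, n):
        self.n = n  # max pointers per node, so at most n - 1 keys
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:  # the root was split: height grows by 1
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return None  # duplicate key: ignored in this sketch
            node.keys.insert(i, key)
            return self._split_leaf(node) if len(node.keys) > self.n - 1 else None
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        return self._split_nonleaf(node) if len(node.children) > self.n else None

    def _split_leaf(self, node):
        mid = (len(node.keys) + 1) // 2  # the first ceil(n/2) keys stay
        right = Node(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right  # COPY: the key also stays in the leaf

    def _split_nonleaf(self, node):
        mid = len(node.keys) // 2
        sep = node.keys[mid]  # PUSH: the key moves up and leaves this level
        right = Node(leaf=False)
        right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return sep, right

    def keys_in_order(self):
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out
```

Inserting 1, 3, 6, 7, 9, 13 with n = 3 yields a root with keys 6 and 9 over three leaves. Inserting 8 then splits a leaf (8 is copied up) and overflows the root, which is split in turn (8 is pushed up into a new root). The key 8 therefore ends up in both the new root and a leaf, while 6 and 9 each remain in exactly one non-leaf node.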
E.g., insert 8

[Figure sequence: inserting 8 into the B+-tree with root keys 6 and 9 (subtrees <6, >=6 and <9, >=9).
 1. The target leaf overflows and is split: COPY the middle key (7) upstairs.
 2. Non-leaf overflow: just PUSH the middle key upstairs.
 3. FINAL TREE.]

B+-Tree before and after insertion of Clearview

B+-Tree File Organization
Index file degradation problem is solved by using
B+-Tree indices.
The leaf nodes in a B+-tree file organization store
records, instead of pointers.
Since records are larger than pointers, the maximum
number of records that can be stored in a leaf node is
less than the number of pointers in a nonleaf node.
Leaf nodes are still required to be half full.
Insertion and deletion are handled in the same way
as insertion and deletion of entries in a B+-tree index.
B-Tree Index Files

Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant storage of search keys.
Search keys in nonleaf nodes appear nowhere else in
the B-tree; an additional pointer field for each search
key in a nonleaf node must be included.

B-tree (above) and B+-tree (below) on same data
