Indexing - II

Uploaded by

f20211140


Index structures/files

 Secondary Index
 Multi-Level Index
 B+-Tree
Index Structure
 A location mechanism facilitates finding the index entry for a search key value S
 Once the index entry is found, the row can be accessed directly
[Figure: search key value S → location mechanism → index entries → row (S, …)]
Dense indexes
 Every key from the data file is represented
 Entries are in the same order as that of the file
 Binary search can be used to find the required
<key, pointer>
 No. of index blocks searched is about log2 n, instead of n/2 on
average for a linear scan (n = number of index blocks)
 Example: 1,000,000 tuples, 10 tuples per 4096-byte
block, key field 30 bytes, pointer 8 bytes
   Data file takes 100,000 blocks, about 400 MB
   Index file takes 10,000 blocks at 100 entries/block, about 40 MB
   Binary search touches about log2 10,000 ≈ 13.3, i.e. 13 to 14, index blocks,
which may be held in main memory (MM)
 Memory use can be reduced by keeping only the
most frequently searched blocks in memory
 Hence a record can usually be retrieved with fewer than 14
disk I/Os
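The arithmetic of this example can be checked with a short Python sketch. The variable names are illustrative and not part of the slides; the figure of 100 entries per block is the slide's rounding of the 107 entries that would strictly fit in a 4096-byte block.

```python
import math

BLOCK_BYTES = 4096
TUPLES = 1_000_000
TUPLES_PER_BLOCK = 10
KEY_BYTES, POINTER_BYTES = 30, 8

# Data file: one block per 10 tuples.
data_blocks = TUPLES // TUPLES_PER_BLOCK        # 100,000 blocks
data_bytes = data_blocks * BLOCK_BYTES          # 409,600,000 bytes, about 400 MB

# Dense index: one <key, pointer> entry per tuple.
max_entries_per_block = BLOCK_BYTES // (KEY_BYTES + POINTER_BYTES)
entries_per_block = 100                         # rounded figure used on the slide
dense_index_blocks = math.ceil(TUPLES / entries_per_block)

# Worst-case number of index blocks probed by binary search.
binary_search_probes = math.ceil(math.log2(dense_index_blocks))

print(data_blocks, max_entries_per_block, dense_index_blocks, binary_search_probes)
```

Rounding the entries per block down to 100 slightly overestimates the index size, which keeps the estimate on the safe side.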
Sparse indexes
 Useful if dense index is too large
 Uses less space at the cost of possibly more time
to search
 Generally one record per block, usually the first, is
represented
 Sparse index for previous example would take only
1,000 blocks, 4 MB
 But it cannot quickly answer the query "does
there exist a record with key value K?"
   Answering it requires one disk I/O to fetch the data block, plus a search within the
block
 Search for K: find the index entry with the largest key ≤ K
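A minimal sketch of the sparse-index lookup described above, assuming the index is held as a sorted Python list with one key per data block (the first key in that block). The names `find_block` and `key_exists` and the sample data are hypothetical.

```python
from bisect import bisect_right

def find_block(sparse_keys, k):
    """Return the number of the data block that could hold key k.

    sparse_keys[i] is the first (smallest) key stored in data block i.
    The chosen block is the one whose index entry has the largest key <= k.
    Returns None when k is smaller than every indexed key.
    """
    i = bisect_right(sparse_keys, k) - 1
    return i if i >= 0 else None

def key_exists(sparse_keys, data_blocks, k):
    """Answer 'does a record with key k exist?'; needs one data-block read."""
    block_no = find_block(sparse_keys, k)
    if block_no is None:
        return False
    return k in data_blocks[block_no]   # search inside the fetched block

# Hypothetical data file: three blocks of two keys each, sorted on the key.
data_blocks = [[10, 20], [30, 40], [50, 60]]
sparse_keys = [block[0] for block in data_blocks]   # [10, 30, 50]

print(find_block(sparse_keys, 40), key_exists(sparse_keys, data_blocks, 35))
```

The index alone cannot rule out key 35: it only says that block 1 is the place to look, so the data block still has to be read.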
Sparse Vs Dense Index
 Dense index: index entry for each data
record
 Unclustered index must be dense
 Clustered index need not be dense
 Sparse index: index entry for each block
of data file
[Figure: a data file (Id, Name, Dept) sorted on Id, with a sparse, clustered index sorted on Id and a dense, unclustered index sorted on Name]
Clustered vs. Unclustered Index

 Clustered (main/primary) index: index entries and
rows are ordered in the same way
 An integrated storage structure is always clustered
 There can be at most one clustered index on a table
 Unclustered (secondary) index: index entries and
rows are not sorted on the same search key
 There can be many secondary indices on a table
Clustering and Non-clustering
 Non-clustering indices have to be dense.
 Indices offer substantial benefits when searching for
records.
 When a file is modified, every index on the file must
be updated. Updating indices imposes overhead on
database modification.
 Sequential scan using clustering index is efficient, but
a sequential scan using a non-clustering index is
expensive – each record access may fetch a new
block from disk.
 Block fetch from a magnetic disk requires about 5 to 10 milliseconds, versus
about 100 nanoseconds for memory access
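The cost gap between the two kinds of sequential scan can be illustrated with a small sketch. It assumes, for simplicity, that only the most recently read block is buffered; the function name and the two sample orderings are hypothetical.

```python
def block_fetches(block_sequence):
    """Count block fetches when only the most recently read block is buffered."""
    fetches = 0
    current = None
    for block_no in block_sequence:
        if block_no != current:   # record lives in a block we do not hold
            fetches += 1
            current = block_no
    return fetches

# Hypothetical file: 9 records in 3 blocks. Each list gives, in index order,
# the block that holds the next record.
clustered_order = [0, 0, 0, 1, 1, 1, 2, 2, 2]      # index order == file order
unclustered_order = [0, 2, 1, 2, 0, 1, 1, 0, 2]    # index order != file order

print(block_fetches(clustered_order), block_fetches(unclustered_order))
```

With the clustering index each block is read once; with the non-clustering index nearly every record access triggers a new fetch.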
Clustered Index
 Good for range searches
 Use location mechanism to locate index
entry at start of range
 This locates first data record.
 Subsequent data records are contiguous if
index is clustered (not so if unclustered)
 Minimizes page transfers and maximizes
likelihood of cache hits
Types of Single-Level Indexes
 Secondary Index
A secondary index provides a secondary means of
accessing a file for which some primary access already
exists.
The secondary index may be on a field which is a
candidate key and has a unique value in every record, or
a nonkey with duplicate values.
The index is an ordered file with two fields.
 The first field is of the same data type as some
nonordering field of the data file that is an indexing
field.
 The second field is either a block pointer or a record
pointer. There can be many secondary indexes (and
hence, indexing fields) for the same file.
 Includes one entry for each record in the data file;
hence, it is a dense index
Figure: A dense secondary index (with block pointers) on a nonordering key field of a file.
Secondary indexes
 SELECT name, address
FROM MovieStar
WHERE birthdate = '1952-01-01';
 CREATE INDEX BDIndex ON MovieStar(birthdate);
 Secondary indexes are always ‘dense’
 Second level index could be ‘sparse’
 Secondary indexes are usually with duplicates
Secondary Indices Example

Secondary index on balance field of account

 Index record points to a bucket that contains
pointers to all the actual records with that particular
search-key value.
[Figure: secondary index whose entries, including duplicate key values, point to records scattered over the data file]
 Pointers in one index block may refer to
multiple data blocks
   Results in more disk I/Os
   Unavoidable problem
 Using a 'bucket file' between the index file and the data
file
   Single entry <k, p> for each value k, where p
points to the location in the bucket file holding the
pointers to all records with value k
   Avoids wasting space on storing the same value k
multiple times
Definition of Bucket
 Bucket: another form of storage unit
that can store one or more records of
information.
 Buckets are used if the search key
cannot form a candidate key, or if the
file is not stored in search key order.
[Figure: index file with one entry per key value, pointing into a bucket file whose pointers lead to the matching records in the data file]

 Application of 'bucket file'
   It can help answer queries efficiently using
intersection of pointer sets
   Example
     SELECT title
FROM Movie
WHERE StudioName = 'Disney' AND year = 1995;
   Only records whose pointers appear in both buckets are fetched, which reduces the number of disk I/Os
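The pointer-set intersection can be sketched in a few lines of Python. The bucket contents below are made-up record pointers (small integers), and `matching_pointers` is a hypothetical helper, not something defined in the slides.

```python
# Hypothetical bucket files: each search-key value maps to the set of
# record pointers stored in its bucket.
studio_buckets = {"Disney": {3, 7, 12, 20}, "Fox": {1, 5}}
year_buckets = {1995: {5, 7, 20, 31}, 1996: {3, 12}}

def matching_pointers(studio, year):
    """Intersect the two buckets; only these records need to be fetched."""
    return studio_buckets.get(studio, set()) & year_buckets.get(year, set())

print(sorted(matching_pointers("Disney", 1995)))
```

Only two data records are read here, instead of the four Disney records or the four 1995 records that a single index would return.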
[Figure: Movie tuples, with buckets for studio (Disney) reached from the studio index and buckets for year (1995) reached from the year index]

Multi-level indexes
 When an index is so large that even binary
search takes too many disk I/Os
   Define a second-level index: an index on the index
   This can continue to a multi-level index structure
 Second and higher level indexes must be sparse
 A second-level index on the sparse index of the previous example
(1,000 blocks) would take only 10 blocks, 40 KB
 With the second level kept in main memory, a search involves 2 disk I/Os
(one first-level index block, one data block) plus searching within the
blocks
Multilevel Index

 If an index does not fit in memory, access becomes
expensive.
 To reduce the number of disk accesses to index records,
treat the index kept on disk as a sequential file and
construct a sparse index on it.
   outer index – a sparse index on the main index
   inner index – the main index file
 If even the outer index is too large to fit in main
memory, yet another level of index can be created,
and so on.
 Indices at all levels must be updated on insertion or
deletion from the file.
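How quickly the levels shrink can be seen with a small sketch. It assumes every level above the first is sparse, with one entry per block of the level below, and reuses the 100 entries per block of the running example; `index_levels` is a hypothetical helper.

```python
import math

def index_levels(first_level_entries, entries_per_block):
    """Return the number of blocks at each index level, innermost first.

    Levels are added until the outermost level fits in a single block.
    """
    if first_level_entries < 1 or entries_per_block < 2:
        raise ValueError("need at least one entry and two entries per block")
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / entries_per_block)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks   # the next level up has one entry per block below

# Dense first level: one entry per tuple (1,000,000 tuples).
print(index_levels(1_000_000, 100))
# Sparse first level: one entry per data block (100,000 blocks).
print(index_levels(100_000, 100))
```

Each extra level divides the block count by the number of entries per block, so a handful of levels is enough even for very large files.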
Multilevel Index (Cont.)
[Figure: outer index entries point to index blocks 0, 1, … of the inner index, whose entries point to data blocks 0, 1, …]
Multi-level indexes
 When an index is too large with even binary
search taking too many disk I/Os
 Define second level index: index on index
 This can continue to multi-level index structure
 Second and higher level indexes must be sparse
 A second-level index on the dense index of the previous example
(10,000 blocks) would take only 100 blocks, 400 KB
 Search involves 6 disk I/Os and searching in the
block
A Two-level Primary Index
Estimating Costs
 For simplicity we estimate the cost of an operation by
counting the number of blocks that are read or
written to disk.
 We ignore the possibility of blocked access which
could significantly lower the cost of I/O.
 We assume that each relation is stored in a separate
file with B blocks and R records per block.


Choosing Indexing Technique
 Five Factors involved when choosing the
indexing technique:
 access type
 access time
 insertion time
 deletion time
 space overhead
Indexing Definitions
 Access type is the type of access being used.
 Access time - time required to locate the
data.
 Insertion time - time required to insert the
new data.
 Deletion time - time required to delete the
data.
 Space overhead - the additional space
occupied by the added data structure.
Index Evaluation Metrics
 Access time for:
   Equality searches – records with a specified
value in an attribute
   Range searches – records with an attribute
value falling within a specified range.
 Insertion time
 Deletion time
 Space overhead
B+-Tree Index
A B+-tree is a rooted tree satisfying the following properties:
o All paths from root to leaf are of the same length
o Each node that is not a root or a leaf has between ⌈n/2⌉ and
n children. [Non-leaf node]
o A leaf node has between ⌈(n–1)/2⌉ and n–1 values
o Special cases:
   o If the root is not a leaf, it has at least 2 children.
   o If the root is a leaf (that is, there are no other nodes in
the tree), it can have between 0 and (n–1) values.
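The occupancy rules can be tabulated for a given n with a short sketch. The function name `bplus_bounds` and the dictionary keys are illustrative only.

```python
import math

def bplus_bounds(n):
    """Return (min, max) occupancy for each kind of node in a B+-tree.

    n is the maximum number of children (pointers) of a node.
    """
    return {
        "internal_children": (math.ceil(n / 2), n),
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        "root_children_if_not_leaf": (2, n),
        "root_values_if_leaf": (0, n - 1),
    }

for kind, (low, high) in bplus_bounds(4).items():
    print(f"{kind}: between {low} and {high}")
```

For n = 4 an internal node has 2 to 4 children and a leaf holds 2 to 3 values, so every node other than the root stays at least half full.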

B+-Tree Node Structure
 Typical node: P1 | K1 | P2 | … | Pn–1 | Kn–1 | Pn
o Ki are the search-key values
o Pi are pointers to children (for non-leaf nodes) or
pointers to records or buckets of records (for leaf
nodes).
o The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
Example of B+-Tree
B+ Tree: Most Widely Used Index
 Insert/delete at log_F N cost; keep tree height-balanced.
(F = fanout, N = # leaf pages)
 Minimum 50% occupancy (except for root). Each node
contains d <= m <= 2d entries. The parameter d is
called the order of the tree.
 Supports equality and range-searches efficiently.
[Figure: index entries (direct search) above data entries (the "sequence set")]
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
 Because of the insertion and deletion problem, most
multi-level indexes use B-tree or B+-tree data
structures, which leave space in each tree node (disk
block) to allow for new index entries
 These data structures are variations of search trees that
allow efficient insertion and deletion of search
values.
 In B-Tree and B+-Tree data structures, each node
corresponds to a disk block
 Each node is kept between half-full and completely full
Dynamic Multilevel Indexes Using B-Trees and B+-Trees (contd.)
 An insertion into a node that is not full is quite
efficient; if a node is full, the insertion causes a split
into two nodes
 Splitting may propagate to other tree levels
 A deletion is quite efficient if a node does not become
less than half full
 If a deletion causes a node to become less than half
full, it must be merged with neighboring nodes
Difference between B-tree and B+-tree
 In a B-tree, pointers to data records exist at all levels
of the tree
 In a B+-tree, all pointers to data records exist at the
leaf-level nodes
 A B+-tree can have fewer levels (or higher capacity of
search values) than the corresponding B-tree
B-tree structures. (a) A node in a B-tree with q – 1 search
values. (b) A B-tree of order p = 3. The values were
inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.
The nodes of a B+-tree. (a) Internal node of a B+-tree with
q –1 search values. (b) Leaf node of a B+-tree with q – 1
search values and q – 1 data pointers.
Observations about B+-trees
o Since the inter-node connections are done by pointers,
“logically” close blocks need not be “physically” close.
o The non-leaf levels of the B+-tree form a hierarchy of
sparse indices.
o The B+-tree contains a relatively small number of levels
   o Level below root has at least 2 * ⌈n/2⌉ values
   o Next level has at least 2 * ⌈n/2⌉ * ⌈n/2⌉ values
   o ... etc.
   o If there are K search-key values in the file, the tree
height is no more than ⌈log_⌈n/2⌉(K)⌉
   o thus searches can be conducted efficiently.
o Insertions and deletions to the main file can be handled
efficiently, as the index can be restructured in logarithmic
time.
Queries on B+-Trees
 Find all records with a search-key value of
k.
1. N = root
2. Repeat
   1. Examine N for the smallest search-key value > k.
   2. If such a value exists, assume it is Ki. Then set N =
Pi
   3. Otherwise k ≥ Kn–1. Set N = Pn
   Until N is a leaf node
3. If for some i, key Ki = k, follow pointer Pi to
the desired record or bucket.
4. Else no record with search-key value k exists.
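The steps above can be sketched in Python. This is a minimal in-memory illustration, not a disk-based implementation: the `Node` class, the `leaf` helper and the string "records" are assumptions made for the example, and the sample tree uses the keys 13, 17, 24, 30 with leaf entries 2* to 39* from the example tree in these slides.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Node:
    keys: list               # K1 < K2 < ... in sorted order
    pointers: list           # children (non-leaf) or records/buckets (leaf)
    is_leaf: bool = False

def find(root, k):
    """Return the record (or bucket) stored under key k, or None."""
    node = root
    while not node.is_leaf:
        # Index of the smallest key > k; equals len(keys) if there is none,
        # in which case the last pointer is followed.
        i = bisect_right(node.keys, k)
        node = node.pointers[i]
    for key, record in zip(node.keys, node.pointers):
        if key == k:
            return record
    return None

def leaf(*keys):
    """Build a leaf whose 'records' are placeholder strings."""
    return Node(list(keys), [f"record {key}" for key in keys], is_leaf=True)

root = Node(
    keys=[13, 17, 24, 30],
    pointers=[leaf(2, 3, 5, 7), leaf(14, 16), leaf(19, 20, 22),
              leaf(24, 27, 29), leaf(33, 34, 38, 39)],
)

print(find(root, 5), find(root, 15), find(root, 24))
```

The search for 15 ends in the leaf holding 14* and 16*, so the absence of 15 is established after reading a single root-to-leaf path.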
Example B+ Tree
 Search begins at root, and key comparisons direct it
to a leaf.
 Search for 5*, 15*, all data entries >= 24* ...
Root: 13 | 17 | 24 | 30
Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

 Based on the search for 15*, we know it is not in the tree!

Query on B+ Trees
 In processing a query, a path is traversed in the tree from
the root to some leaf node.
 If there are K search-key values in the file, the path is no
longer than ⌈log_⌈n/2⌉(K)⌉
 A node is generally the same size as a disk block, typically
4 kilobytes, and n is typically around 100 (40 bytes per
index entry).
 With 1 million search key values and n = 100, at most
⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
 Contrast this with a balanced binary tree with 1 million
search key values: around 20 nodes are accessed in a
lookup
   The difference is significant since every node access may need
a disk I/O, costing around 20 milliseconds!
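The two node counts can be reproduced with integer arithmetic, which avoids floating-point rounding in the logarithm. The helper names `ceil_log` and `max_path_length` are hypothetical.

```python
def ceil_log(base, x):
    """Smallest integer h with base**h >= x, computed without floating point."""
    if base < 2 or x < 1:
        raise ValueError("base must be >= 2 and x must be >= 1")
    h, reach = 0, 1
    while reach < x:
        reach *= base
        h += 1
    return h

def max_path_length(num_keys, n):
    """Upper bound ceil(log_{ceil(n/2)}(K)) on the root-to-leaf path length."""
    return ceil_log((n + 1) // 2, num_keys)   # (n + 1) // 2 == ceil(n / 2)

bplus_nodes = max_path_length(1_000_000, 100)   # base ceil(100 / 2) = 50
binary_nodes = ceil_log(2, 1_000_000)           # balanced binary tree depth

print(bplus_nodes, binary_nodes)
```

At roughly one disk I/O per node, the gap between 4 and 20 node accesses translates directly into lookup time.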
B+ Trees in Practice
 Typical order: 100. Typical fill-factor: 67%.
   average fanout = 133
 Typical capacities:
   Height 4: 133^4 = 312,900,721 records
   Height 3: 133^3 = 2,352,637 records
 Can often hold top levels in buffer pool:
   Level 1 = 1 page = 8 Kbytes
   Level 2 = 133 pages ≈ 1 Mbyte
   Level 3 = 17,689 pages ≈ 138 MBytes
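These capacities follow from powers of the average fanout. A small sketch, using the fanout of 133 and the 8-Kbyte page size stated on the slide (the function names are illustrative):

```python
FANOUT = 133       # average fanout from the slide
PAGE_KBYTES = 8    # page size from the slide

def records_at_height(height):
    """Number of data entries reachable through a tree of the given height."""
    return FANOUT ** height

def pages_at_level(level):
    """Level 1 is the root; each level has FANOUT times as many pages."""
    return FANOUT ** (level - 1)

def level_size_kbytes(level):
    """Buffer-pool space needed to hold one whole level, in Kbytes."""
    return pages_at_level(level) * PAGE_KBYTES

for level in (1, 2, 3):
    print(level, pages_at_level(level), level_size_kbytes(level))
print(records_at_height(3), records_at_height(4))
```

Level 2 comes to 1,064 Kbytes and level 3 to 141,512 Kbytes, so the top two levels fit easily in a modest buffer pool.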
Inserting a Data Entry into a B+ Tree
 Find correct leaf L.
 Put data entry onto L.
   If L has enough space, done!
   Else, must split L (into L and a new node L2)
     Redistribute entries evenly, copy up middle key.
     Insert index entry pointing to L2 into parent of L.
 This can happen recursively
   To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
 Splits “grow” tree; root split increases height.
   Tree growth: gets wider or one level taller at top.
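The difference between copy-up and push-up can be shown with two small functions. This is a sketch that handles keys only (child pointers and data records are left out), and the names `split_leaf` and `split_index` are hypothetical.

```python
def split_leaf(entries):
    """Split an overfull, sorted leaf.

    Returns (left, right, key_for_parent). The middle key is COPIED up:
    it goes to the parent and also stays as the first entry of the right leaf.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_index(keys):
    """Split an overfull, sorted index node.

    Returns (left, right, key_for_parent). The middle key is PUSHED up:
    it goes to the parent and appears in neither half.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

# Leaf [2, 3, 5, 7] after adding 8, with room for only four entries.
print(split_leaf([2, 3, 5, 7, 8]))
# Index node [13, 17, 24, 30] after receiving 5, with room for only four keys.
print(split_index([5, 13, 17, 24, 30]))
```

A leaf must keep every data entry, so its separator is duplicated in the parent; an index key only routes searches, so it needs to appear once.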
Updates on B+-Trees: Insertion

B+-Tree before and after insertion of “Clearview”

Inserting 8* into Example B+ Tree
 Observe how minimum occupancy is guaranteed in
both leaf and index page splits.
 Note difference between copy-up and push-up; be sure you
understand the reasons for this.
[Figure, leaf split: leaves [2* 3*] and [5* 7* 8*]; entry to be inserted in parent node: 5. (Note that 5 is copied up and continues to appear in the leaf.)]
[Figure, index split: index nodes [5 13] and [24 30]; entry to be inserted in parent node: 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)]
Example B+ Tree After Inserting 8*

Root: 17
Index nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

 Notice that root was split, leading to increase in height.
 In this example, we can avoid split by re-distributing
entries; however, this is usually not done in
practice.
Figure: An example of insertion in a B+-tree with q = 3 and p_leaf = 2. Insertion sequence: 8, 5, 1, 7, 3, 12, 9, 6.
Deleting a Data Entry from a B+ Tree
 Start at root, find leaf L where entry belongs.
 Remove the entry.
   If L is at least half-full, done!
   If L has only d-1 entries,
     Try to re-distribute, borrowing from sibling (adjacent
node with same parent as L).
     If re-distribution fails, merge L and sibling.
 If merge occurred, must delete entry (pointing to L or
sibling) from parent of L.
 Merge could propagate to root, decreasing height.
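The leaf-level decision (done, re-distribute, or merge) can be sketched as follows. The sketch is simplified and rests on stated assumptions: leaves are plain sorted lists of keys, only the right sibling is considered, d is the minimum number of entries per leaf, and updating the parent is left to the caller. The name `delete_entry` is hypothetical.

```python
def delete_entry(leaf, right_sibling, key, d):
    """Delete key from leaf and repair underflow using the right sibling.

    Returns (action, leaf, right_sibling, new_separator), where
    new_separator is the key the parent should now hold between the two
    leaves, or None if the parent key is unchanged or must be deleted.
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= d:
        return "done", leaf, right_sibling, None
    if len(right_sibling) > d:
        # Borrow the sibling's smallest entry; its new first key is copied up.
        leaf = leaf + [right_sibling[0]]
        right_sibling = right_sibling[1:]
        return "redistributed", leaf, right_sibling, right_sibling[0]
    # Sibling has no entry to spare: merge, and the parent entry is deleted.
    return "merged", leaf + right_sibling, [], None

# Replaying deletions of 19, 20 and 24 on neighbouring leaves with d = 2.
print(delete_entry([19, 20, 22], [24, 27, 29], 19, 2))
print(delete_entry([20, 22], [24, 27, 29], 20, 2))
print(delete_entry([22, 24], [27, 29], 24, 2))
```

A full implementation would also try the left sibling and would repeat the same check on the parent after a merge, since removing the parent entry can leave the parent underfull.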
Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...
Root: 17
Index nodes: [5 13] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

 Deleting 19* is easy.
 Deleting 20* is done with re-distribution. Notice how
middle key is copied up.
... And Then Deleting 24*
 Must merge.
 Observe 'toss' of index
entry (on right), and 'pull
down' of index entry
(below).
[Figure, right: index node [30] over leaves [22* 27* 29*] [33* 34* 38* 39*]]
[Figure, below: Root: [5 13 17 30]; leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]]

Figure: An example of deletion from a B+-tree.
