Database Query Efficiency and Indexing

The document discusses various data structures and indexing techniques to improve the efficiency of database queries, emphasizing the importance of selectivity and data characteristics. It covers types of indexes, dimensionality of data, and different query types, including exact match, range, and similarity queries. Additionally, it explains specific structures like R-Trees and Pyramid Indexes, detailing their operations and the implications of using them for various data types.

Uploaded by

xerodo1379

Exercises:

3.1 – Time Complexity Analysis

5.3 – Pyramid Indexes (Which queries can be accessed efficiently using PIs?)

The efficiency of database queries often depends on two things:

1. The selectivity of the query, i.e. how constrained it is (searching for a single object is much quicker than searching for a list of objects)
2. Data characteristics (how the data is distributed and how it is stored)

A simple sequential scan has a runtime of 𝑂(𝑛), but this can be improved by using various data structures.
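
To make the difference concrete, here is a minimal sketch that contrasts a sequential scan with a lookup on sorted keys. The function names and the sample keys are hypothetical and only serve as an illustration; binary search stands in for "an index" here.

```python
from bisect import bisect_left

def sequential_scan(records, key):
    """O(n): inspect the records one by one until the key is found."""
    for position, value in enumerate(records):
        if value == key:
            return position
    return -1

def sorted_index_lookup(sorted_records, key):
    """O(log n): binary search, only valid if the records are sorted."""
    position = bisect_left(sorted_records, key)
    if position < len(sorted_records) and sorted_records[position] == key:
        return position
    return -1

records = list(range(0, 1000, 2))  # hypothetical sorted keys 0, 2, 4, ...
```

Both functions return the same position for a stored key; the second one only looks at about log₂(n) entries, but it relies on the keys being kept in sorted order.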

Index Structures have the following properties:

• Common goal: improve (shorten) the processing time of a query
• It is the task of the index to lead the user to the queried data cell
• Has to grow with the stored data
• Improved time complexity:
  o commonly 𝑂(log 𝑛)
  o sometimes 𝑂(1)
  o in many cases only 𝑂(𝑛)
• Success depends on the data characteristics
• They are additional structures that consist of redundant information
• There are 2 types of indexes:
  o Primary (only one per relational table / file): physical clustering/sorting of data entries according to one selected (unique) primary key
  o Secondary: support for additional non-key attributes; creating a secondary index only requires creating a new directory, no physical sorting of data entries is required; multiple secondary index structures per relational table / file are possible
• Requirements:
  o Efficient search
  o Dynamic insertion, deletion and update of data entries
  o Order-preserving index
  o Efficient space usage of the index
  o Easy to implement
  o Adaptive towards changing data distributions
  o Parallelism

Note: all index structures promise an efficiency improvement, but whether it materializes always depends on the kind of data stored, the queries and the data distribution.

Dimensionality: depending on the dimensionality of the data, our queries behave differently:

• One-dimensional data
  o Scalar (numerical), nominal data, ordinal data, metric data
• Multi-dimensional data
  o Multivariate queries, geographic information systems (spatial data), …
• High-dimensional data
  o Images, vector spaces, time series, sensor representations
• No-dimensional data (metric data)
  o Only distance functions between data items are available (e.g. protein folding structures)
  o Important characteristics of metric data:
    1. Symmetry: 𝑑(𝑝, 𝑞) = 𝑑(𝑞, 𝑝)
    2. Definiteness: 𝑑(𝑝, 𝑞) = 0 ⇔ 𝑝 = 𝑞
    3. Triangle Inequality: 𝑑(𝑝, 𝑟) ≤ 𝑑(𝑝, 𝑞) + 𝑑(𝑞, 𝑟)
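
The three properties can be checked mechanically on a finite sample. The following is a rough sketch; `is_metric_on`, `squared_euclidean` and the sample points are hypothetical names chosen for this illustration. Passing the check on a sample does not prove that a function is a metric, but a single failure disproves it.

```python
from itertools import product
from math import dist, isclose

def is_metric_on(points, d):
    """Check the three metric properties for d on a finite sample of points."""
    for p, q in product(points, repeat=2):
        if not isclose(d(p, q), d(q, p)):        # symmetry
            return False
        if (d(p, q) == 0) != (p == q):           # definiteness
            return False
    for p, q, r in product(points, repeat=3):
        if d(p, r) > d(p, q) + d(q, r) + 1e-12:  # triangle inequality
            return False
    return True

def squared_euclidean(p, q):
    """Not a metric: squaring breaks the triangle inequality."""
    return dist(p, q) ** 2

sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
```

On this sample the Euclidean distance `dist` passes, while the squared Euclidean distance fails: going from (0, 0) to (2, 0) directly costs 4, but the detour via (1, 0) costs only 1 + 1.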

Types of Data Structuring:

1. Space-Oriented Structures: easy partitioning, but the complexity may rise unpredictably.
   Example: hashing functions.
2. Data-Oriented Structures: constant re-partitioning, but a balanced distribution.
   Example: search trees.

We can also improve efficiency by changing Internal Structures:

1. Efficient Sequential Scan: instead of storing (and scanning) all the information, we store only parts of it.
   Complexity: still 𝑂(𝑛), but quicker than a plain sequential scan.
   Example: Bitmaps, VA files.
2. Hierarchical Structures: the idea is to prune search paths.
   Complexity: 𝑂(𝑛) to 𝑂(log 𝑛).
   Example: search trees.
3. Scatter Storage Structures: the data is scattered over a fixed set of buckets, which achieves the best query complexity. However, this may also lead to degenerate tables.
   Complexity: 𝑂(𝑛) to 𝑂(1).
   Example: hash functions.

Types of Search Queries:

1. Exact Match Query
   • Specifies all k attributes exactly
   • (𝑥₁, …, 𝑥ₖ)
2. Partial Match Query
   • Specifies values only for some attributes; ∗ marks an unspecified attribute
   • e.g. (∗, 𝑥₂, …, 𝑥ᵢ, ∗, 𝑥ₖ)
3. Range Query
   • Specifies all k ranges
   • ([𝑢₁, 𝑜₁], …, [𝑢ₖ, 𝑜ₖ])
4. Partial Range Query
   • Specifies only some ranges
   • Similar to partial match queries
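
The four query types can be written as simple predicates over a record. This is only a sketch: records are assumed to be tuples of k attribute values, `None` plays the role of ∗, and all names and sample records are hypothetical.

```python
def exact_match(record, query):
    """All k attributes are specified and must be equal."""
    return record == query

def partial_match(record, query):
    """None in the query stands for '*' (attribute not specified)."""
    return all(q is None or r == q for r, q in zip(record, query))

def range_match(record, ranges):
    """One (lower, upper) pair per attribute; None leaves an attribute
    unconstrained, which turns the query into a partial range query."""
    return all(rng is None or rng[0] <= r <= rng[1]
               for r, rng in zip(record, ranges))

records = [(1, 5, 9), (1, 7, 9), (2, 5, 3)]
```

For example, the partial match query `(1, None, 9)` selects the first two records, and the range query `((1, 2), (5, 6), (0, 10))` selects the first and the third.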

Types of Similarity Queries (from most to least specific):

1. Range Queries
   • Return all objects whose distance to the query object 𝑞 is at most a specified value 𝜀
   • {𝑜 | 𝑜 ∈ 𝐷𝐵 ∧ 𝑑(𝑜, 𝑞) ≤ 𝜀}
2. k-Nearest Neighbor Queries
   • Return the object(s) 𝑜 that are closest to 𝑞, i.e. those that have the smallest distance to 𝑞 compared to all other objects 𝑜′
   • For k = 1: {𝑜 | 𝑜 ∈ 𝐷𝐵 ∧ ∀𝑜′ ∈ 𝐷𝐵: 𝑑(𝑜, 𝑞) ≤ 𝑑(𝑜′, 𝑞)}
3. Ranking Queries
   • Similar to k-nearest neighbor, but the result is ordered (e.g. ascending with respect to 𝑑(𝑜, 𝑞)) and can be large
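
The three similarity queries can be sketched without any index, using a plain scan over the database. The function names, the sample points and the choice of the Euclidean distance are assumptions made for this illustration.

```python
import heapq
from math import dist

def range_query(db, q, eps):
    """All objects within distance eps of q."""
    return [o for o in db if dist(o, q) <= eps]

def knn_query(db, q, k):
    """The k objects with the smallest distance to q."""
    return heapq.nsmallest(k, db, key=lambda o: dist(o, q))

def ranking_query(db, q):
    """Yield objects in ascending distance; the caller decides when to stop."""
    heap = [(dist(o, q), o) for o in db]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[1]

db = [(0, 0), (1, 1), (3, 0), (5, 5)]
q = (0.0, 0.5)
```

The ranking query is written as a generator because its result size is not fixed up front: the consumer can take two objects or all of them.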
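
INVERTED LISTS:

Inverted Lists work as follows: for each indexed attribute, a separate list maps every attribute value to the records that contain it.

ONLY if all attributes 𝐴₁ … 𝐴ₖ are equally important can we answer a multivariate query using Inverted Lists.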
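
A minimal sketch of this idea, assuming one inverted list per attribute and answering a multivariate query by intersecting the matching record-id sets. The helper names and the sample records are hypothetical.

```python
from collections import defaultdict

def build_inverted_lists(records, attributes):
    """One inverted list per attribute: value -> set of record ids."""
    lists = {a: defaultdict(set) for a in attributes}
    for rid, record in enumerate(records):
        for a in attributes:
            lists[a][record[a]].add(rid)
    return lists

def multi_attribute_query(lists, conditions):
    """Intersect the record-id sets of all specified attribute values."""
    id_sets = [lists[a].get(v, set()) for a, v in conditions.items()]
    return set.intersection(*id_sets) if id_sets else set()

records = [
    {"city": "Vienna", "type": "cafe"},
    {"city": "Vienna", "type": "museum"},
    {"city": "Graz", "type": "cafe"},
]
lists = build_inverted_lists(records, ["city", "type"])
```

Every attribute is treated the same way here: no list is preferred over another, which matches the "equally important" condition above.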

COMPOSITE INDEXES:

Composite Indexes, in contrast, exploit the fact that not all attributes are equally important.
The order of importance is encoded in the composite index (in the order of the attributes within the key) and doesn't change.
Example:
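
As a stand-in illustration, a composite index can be sketched as a list of tuples sorted by the leading attribute first. The attribute names and entries below are hypothetical, and a sorted Python list replaces the tree a real system would use.

```python
from bisect import bisect_left, bisect_right

# Composite key (last_name, first_name): last_name is the leading attribute.
entries = sorted([
    ("Huber", "Anna"), ("Maier", "Karl"), ("Maier", "Lena"), ("Novak", "Anna"),
])

def lookup_by_leading(entries, last_name):
    """Binary search works because entries are sorted by last_name first."""
    lo = bisect_left(entries, (last_name,))
    hi = bisect_right(entries, (last_name, chr(0x10FFFF)))
    return entries[lo:hi]

def lookup_by_trailing(entries, first_name):
    """There is no useful order on first_name alone, so this is a full scan."""
    return [e for e in entries if e[1] == first_name]
```

A query on the leading attribute touches a contiguous slice of the index, while a query on the trailing attribute alone gains nothing from the sort order. This is the sense in which the importance of the attributes is fixed by the index.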

R-TREES:

R-Trees are another data structure; they are based on storing overlapping page regions.

The height is always ≤ ⌈logₘ 𝑁⌉ − 1, where m is the minimum number of entries in a page and N is the number of stored objects.

The idea is to approximate objects by minimum bounding rectangles (MBRs) and thereby allow for quicker searching:

1. If we find that the queried range is empty, we can stop at the first level.
2. Even if we keep looking for an object that doesn't exist, we only need to go through a small number of pages per search path (≤ ⌈logₘ 𝑁⌉ − 1)
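
The height bound is easy to evaluate for concrete numbers. The sketch below uses integer arithmetic to avoid floating-point rounding in the logarithm; the function name is hypothetical, and clamping the result at 0 for very small N is an assumption of this sketch.

```python
def max_rtree_height(n_objects, m):
    """Return ceil(log_m(n_objects)) - 1, clamped at 0, using integers only."""
    if n_objects < 1 or m < 2:
        raise ValueError("need n_objects >= 1 and m >= 2")
    exponent, power = 0, 1
    while power < n_objects:  # smallest exponent with m**exponent >= n_objects
        power *= m
        exponent += 1
    return max(exponent - 1, 0)
```

With a hypothetical minimum fill of m = 50 entries per page and N = 1,000,000 objects, 50³ = 125,000 is too small and 50⁴ = 6,250,000 suffices, so the bound is 4 − 1 = 3.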

Each rectangle in a directory page covers the MBR of all rectangles in all directory or data pages stored in the respective subtree.

It should be noted:
• Only the leaf nodes contain objects; intermediate nodes only contain references to other pages
• An object is always stored in exactly one node, but a query might have to search in two nodes if the queried object is located in an overlap region
• R-Trees handle spatial data well, but it's a bad idea to use them for high-dimensional data due to degeneration: the more the regions overlap, the more degenerate the tree becomes. The "sweet spot" is somewhere in the range of 2- to 6-dimensional data.
• ONLY if all attributes 𝐴₁ … 𝐴ₖ are equally important can we use R-Trees
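
The two geometric building blocks used here, computing an MBR and testing two rectangles for overlap, can be sketched as follows for the 2-dimensional case. The `Rect` class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def intersects(self, other):
        """True if the two rectangles share at least one point."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

def mbr(rects):
    """Smallest axis-parallel rectangle that covers all given rectangles."""
    rects = list(rects)
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))
```

A directory entry would store `mbr(...)` of everything in its subtree, so a failed `intersects` test against that single rectangle rules out the whole subtree.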

The operations on R-Trees work as follows:

1. Search: start at the root node and descend into every subtree whose rectangle has a non-empty intersection with the query rectangle q (i.e. they overlap); in the leaf nodes, report all index entries that intersect q

2. Insert: insertion is performed in leaf nodes. In case of overflow, the node is split and the respective index entry is inserted into the parent node; hence, a recursive split of parent nodes is possible.
   Complexity: 𝑂(log 𝑛), assuming no split is necessary, otherwise 𝑂(log 𝑛) + the complexity of the respective split operation

3. Node Splitting:
   a. Quadratic Split:
      Complexity: 𝑂(𝑀²), where M is the maximum number of entries in a page, yet linear in the number of dimensions of the data space
      After performing this split, if we're inserting an object, we insert it into the rectangle with the smallest number of objects.
   b. Linear Split:
      Complexity: 𝑂(𝑀)
These are the main types of queries in R-Trees:

1. Point Query: results in the entire data page being returned. If a point is located in the intersection of 2 pages, both are returned
2. Rectangle Query:
   a. Intersection: returns all nodes that contain something in the query range
   b. Inclusion: returns only the page that fully contains the query
3. Range Query:
   a. Intersection
   b. Cover
4. Nearest-Neighbor Query: returns the closest data page at the lowest level (leaf node).
5. K-Nearest-Neighbors Query: returns the k leaf nodes closest to a data point.

When doing a rectangle query:

1. Start at the root node
2. Do a depth-first search for a leaf node intersecting the query
3. Continue the search in the other intersecting rectangles
4. Go one level higher and repeat from step 2
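
The steps above can be sketched as a recursive depth-first search, where returning from a recursive call corresponds to going one level higher. The `Node` class, the tuple layout `(xmin, ymin, xmax, ymax)` and the tiny hand-built tree are assumptions of this sketch, not a full R-Tree implementation.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class Node:
    def __init__(self, mbr, children=None, objects=None):
        self.mbr = mbr                  # (xmin, ymin, xmax, ymax)
        self.children = children or []  # directory node: child nodes
        self.objects = objects or []    # leaf node: (name, rectangle) pairs

def rectangle_query(node, q, result=None):
    """Collect the names of all stored rectangles that intersect q."""
    result = [] if result is None else result
    if not intersects(node.mbr, q):
        return result                   # prune the whole subtree
    for name, rect in node.objects:
        if intersects(rect, q):
            result.append(name)
    for child in node.children:         # backtracking happens via recursion
        rectangle_query(child, q, result)
    return result

leaf1 = Node((0, 0, 4, 4), objects=[("a", (0, 0, 1, 1)), ("b", (3, 3, 4, 4))])
leaf2 = Node((3, 2, 8, 6), objects=[("c", (3, 2, 5, 3)), ("d", (7, 5, 8, 6))])
root = Node((0, 0, 8, 6), children=[leaf1, leaf2])
```

The two leaves overlap, so a query rectangle placed in the overlap region, such as `(3.5, 2.5, 4.5, 3.5)`, has to visit both of them.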

PYRAMID INDEXES:

A d-dimensional key is translated into a one-dimensional key with the help of pyramids.
Every object is then assigned one specific one-dimensional key.
Example:

Each object can be indexed with just 2 values:

1. The pyramid number 𝑖 (which pyramid the object is in)
2. The pyramid value 𝑝𝑣ᵥ

Pyramid Values:

For a point 𝑣 in a pyramid 𝑖 we have to calculate the height ℎᵥ:

ℎᵥ = |0.5 − 𝑣_(𝑖 MOD 𝑑)|

with d being the number of dimensions (the data space is assumed to be normalized to [0, 1]ᵈ, so that 0.5 is its center in every dimension).

Then we can generate a one-dimensional representation of the object that can be stored in a B-Tree, the so-called pyramid value:

𝑝𝑣ᵥ = (𝑖 + ℎᵥ)

The pyramid value is thus the sum of the pyramid number and the height of the point within that pyramid.
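
A minimal sketch of this mapping, under two assumptions that the notes do not state explicitly: the data space is normalized to [0, 1]^d, and the pyramids are numbered so that pyramid j lies on the lower side of dimension j and pyramid j + d on the upper side. The function name is hypothetical.

```python
def pyramid_value(v):
    """Map a point v in [0, 1]^d to (pyramid number i, height h, value i + h)."""
    d = len(v)
    # Assumption: the dimension with the largest deviation from the center
    # (0.5) determines which pyramid the point belongs to.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    h = abs(0.5 - v[i % d])
    return i, h, i + h
```

For example, `(0.1, 0.6)` falls into pyramid 0 with height 0.4, and `(0.6, 0.9)` falls into pyramid 3 with height 0.4. The point `(0.1, 0.3)` receives exactly the same pyramid number and height as `(0.1, 0.6)`, so the one-dimensional key alone cannot tell the two apart.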

The problem with pyramid values:

A pyramid value does not identify a single object; infinitely many points map to the same value. A query (for example, a range query) might therefore return many candidate objects, and the returned set then has to be filtered (refined).

If the pyramid values (i.e. heights) differ across the queried data points, the data set can be queried efficiently. However, if the queried data objects all have the same height, pyramid indexes don't provide faster query times.
