
Lec6 QP Indexing

The document summarizes the basics of query processing and indexing in database systems. It discusses how (1) an SQL query is parsed and optimized into logical and physical query plans before execution, (2) data is stored on disk in files organized by rows, and (3) indexes can be created on attributes to enable faster retrieval and updating of records compared to scanning the entire data file sequentially. Indexes store key-pointer pairs to allow quick access to records given a search key value. Common index structures include hash tables and B+ trees.

Uploaded by

Previzsla


CMPT 354: Database System I
Lecture 6. Basics of Query Processing and Indexing
Outline
• Query Processing
• What happens when an SQL query is issued?

• Indexing
• How to speed up query performance?

Query Processing Steps
SQL query → SQL Parser → Logical Optimization → Physical Optimization → Query Execution → Disk
• Logical Optimization and Physical Optimization together make up query optimization
Example
• Offering (oID, dept, cNum, term, instructor)
• Took (sID, oID, grade)

Q: Student numbers of all students who have taken CMPT 354

SELECT sID
FROM Offering O, Took T
WHERE O.oID = T.oID
AND O.dept = 'CMPT'
AND O.cNum = '354'
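To try the example query end to end, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the sample rows (including the placeholder instructor values), and the helper name `students_who_took` are assumptions for illustration; the query text follows the slide, with the two constants passed as bound parameters.

```python
import sqlite3

def students_who_took(conn, dept, cnum):
    """Run the example query, binding the department and course number."""
    sql = """
        SELECT sID
        FROM Offering O, Took T
        WHERE O.oID = T.oID
          AND O.dept = ?
          AND O.cNum = ?
    """
    return sorted(row[0] for row in conn.execute(sql, (dept, cnum)))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Offering (oID INTEGER, dept TEXT, cNum TEXT, term TEXT, instructor TEXT);
    CREATE TABLE Took (sID INTEGER, oID INTEGER, grade TEXT);
""")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Offering VALUES (?, ?, ?, ?, ?)",
                 [(1, "CMPT", "354", "SP 2018", "instr1"),
                  (2, "CMPT", "454", "FA 2018", "instr2")])
conn.executemany("INSERT INTO Took VALUES (?, ?, ?)",
                 [(100, 1, "A"), (101, 2, "B"), (102, 1, "B+")])

print(students_who_took(conn, "CMPT", "354"))  # [100, 102]
```

Binding the constants as parameters keeps the SQL text fixed, which is also how an application would typically issue this query many times.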
SQL Parser
• From the input SQL text to a logical plan
• Schema: Offering (oID, dept, cNum, term, instructor), Took (sID, oID, grade)
• The example query (SELECT sID FROM Offering O, Took T WHERE O.oID = T.oID AND O.dept = 'CMPT' AND O.cNum = '354') becomes
$\pi_{sID}(\sigma_{dept = 'CMPT' \land cNum = '354'}(Offering \bowtie Took))$
• As a tree: $\pi_{sID}$ at the root, then $\sigma_{dept = 'CMPT' \land cNum = '354'}$, then $\bowtie$ with the leaves Offering and Took
• This relational algebra expression is also called the “logical query plan”
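Logical Optimization
• Find the optimal logical plan
• Plan 1 (selection after the join): $\pi_{sID}(\sigma_{dept = 'CMPT' \land cNum = '354'}(Offering \bowtie Took))$
• Plan 2 (selection pushed below the join): $\pi_{sID}(\sigma_{dept = 'CMPT' \land cNum = '354'}(Offering) \bowtie Took)$
• Both plans return the same result, because the selection only uses attributes of Offering; Plan 2 filters Offering first, so the join processes fewer tuples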
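The claim that both logical plans return the same answer, while the pushed-down plan does less join work, can be checked with a small sketch. It assumes tiny hypothetical in-memory relations and a naive join that counts the tuple pairs it compares; the helper names are illustrative, not part of any DBMS API.

```python
# Hypothetical in-memory relations (tuples as dicts).
offering = [
    {"oID": 1, "dept": "CMPT", "cNum": "354"},
    {"oID": 2, "dept": "CMPT", "cNum": "454"},
    {"oID": 3, "dept": "MATH", "cNum": "354"},
]
took = [
    {"sID": 100, "oID": 1}, {"sID": 101, "oID": 2},
    {"sID": 102, "oID": 1}, {"sID": 103, "oID": 3},
]

def is_cmpt354(t):
    return t["dept"] == "CMPT" and t["cNum"] == "354"

def join(left, right, counter):
    """Naive join on oID; counter[0] counts the tuple pairs compared."""
    out = []
    for l in left:
        for r in right:
            counter[0] += 1
            if l["oID"] == r["oID"]:
                out.append({**l, **r})
    return out

def plan1():
    # pi_sID(sigma(Offering JOIN Took)): filter after the join
    counter = [0]
    rows = [t for t in join(offering, took, counter) if is_cmpt354(t)]
    return sorted(t["sID"] for t in rows), counter[0]

def plan2():
    # pi_sID(sigma(Offering) JOIN Took): filter pushed below the join
    counter = [0]
    rows = join([t for t in offering if is_cmpt354(t)], took, counter)
    return sorted(t["sID"] for t in rows), counter[0]

print(plan1())  # ([100, 102], 12)
print(plan2())  # ([100, 102], 4)
```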
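Physical Optimization
• Find the optimal physical plan
• Two candidate physical plans for the logical plan $\pi_{sID}(\sigma_{dept = 'CMPT' \land cNum = '354'}(Offering) \bowtie Took)$:
• Nested loop plan: Offering and Took are each read with a File Scan; the selection on Offering is a Scan & write to T; the join is a Nested loop; $\pi_{sID}$ is applied on top
• Hash join plan: identical, except that the join is a Hash Join
• Same logical plan, different physical operators: the optimizer has to pick the cheaper one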
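The two physical plans differ only in the join algorithm. A minimal sketch of both, assuming the inputs fit in memory and are lists of Python dicts (a real DBMS works on disk blocks and may have to partition the hash table); the rows in T and Took are hypothetical.

```python
from collections import defaultdict

def nested_loop_join(outer, inner, key):
    """Compare every outer tuple with every inner tuple: |outer| * |inner| comparisons."""
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]

def hash_join(build, probe, key):
    """Hash the build input once, then look up each probe tuple: about |build| + |probe| steps."""
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    # .get avoids inserting empty lists for probe keys that have no match
    return [(b, p) for p in probe for b in table.get(p[key], [])]

# T: the result of the selection on Offering (hypothetical rows).
T = [{"oID": 1, "dept": "CMPT", "cNum": "354"}]
took = [{"sID": 100, "oID": 1}, {"sID": 101, "oID": 2}, {"sID": 102, "oID": 1}]

nl = nested_loop_join(T, took, "oID")
hj = hash_join(T, took, "oID")
print([t["sID"] for _, t in nl])  # [100, 102]
```

Both functions return the same pairs here; the nested loop compares every pair of tuples, while the hash join touches each input roughly once.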
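Query Execution
• From a physical plan to actual machine code (e.g., C++)
• Example: the hash join plan ($\pi_{sID}$ over a Hash Join of the selection $\sigma_{dept = 'CMPT' \land cNum = '354'}$ (Scan & write to T) on Offering (File Scan) with Took (File Scan)) is turned into machine code following the “Volcano Iterator Model”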
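A rough sketch of the “Volcano Iterator Model”, assuming that each operator exposes `open()`, `next()` and `close()` and that `next()` returns `None` at the end of its stream. The class names and in-memory inputs are hypothetical; unlike the plan on the slide, the selection here is pipelined instead of being written to T, and a real engine would compile the plan (e.g., to C++) rather than interpret Python objects.

```python
class Scan:
    """Leaf operator: returns stored tuples one at a time (in-memory stand-in for a File Scan)."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        pass

class Select:
    """Pulls from its child and returns only tuples satisfying the predicate."""
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None
    def close(self):
        self.child.close()

class HashJoin:
    """open() builds a hash table on the build input; next() probes it."""
    def __init__(self, build, probe, key):
        self.build, self.probe, self.key = build, probe, key
    def open(self):
        self.table = {}
        self.build.open()
        while (row := self.build.next()) is not None:
            self.table.setdefault(row[self.key], []).append(row)
        self.build.close()
        self.probe.open()
        self.pending = []  # joined tuples not yet returned
    def next(self):
        while not self.pending:
            row = self.probe.next()
            if row is None:
                return None
            self.pending = [{**b, **row} for b in self.table.get(row[self.key], [])]
        return self.pending.pop(0)
    def close(self):
        self.probe.close()

class Project:
    """Keeps only the listed attributes of each tuple."""
    def __init__(self, child, attrs):
        self.child, self.attrs = child, attrs
    def open(self):
        self.child.open()
    def next(self):
        row = self.child.next()
        return None if row is None else tuple(row[a] for a in self.attrs)
    def close(self):
        self.child.close()

def run(plan):
    """Drive the root operator: open, call next() until exhausted, close."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out

# Hypothetical data.
offering = [{"oID": 1, "dept": "CMPT", "cNum": "354"}, {"oID": 2, "dept": "CMPT", "cNum": "454"}]
took = [{"sID": 100, "oID": 1}, {"sID": 101, "oID": 2}, {"sID": 102, "oID": 1}]

plan = Project(
    HashJoin(Select(Scan(offering), lambda t: t["dept"] == "CMPT" and t["cNum"] == "354"),
             Scan(took), "oID"),
    ["sID"])
print(run(plan))  # [(100,), (102,)]
```

Each call to `next()` on the root pulls just enough tuples from the operators below it, which is what makes the model easy to compose.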
Summary
• Logical plans:
• Created by the parser from the input SQL text
• Expressed as a relational algebra tree
• Each SQL query has many possible logical plans
• Physical plans:
• Goal is to choose an efficient implementation for each
operator in the RA
• Each logical plan has many possible physical plans
• Query Optimization:
• Find the optimal logical plan
• Find the optimal physical plan
Outline
• Query Processing
• What happens when an SQL query is issued?

• Indexing
• How to speed up query performance?

Query Performance
• My database application is too slow… why?
• One of the queries is very slow… why?

• To address these problems, we need to understand:
• How data is organized on disk
• What an index is
• How to select indexes
Data Storage
sID | dept | cNum | term | instructor
10 | CMPT | 345 | SP 2018 | Jiannan
20 | CMPT | 454 | FA 2018 | Martin
… | … | … | … | …

• DBMSs store data in files
• The most common organization is row-wise storage
• On disk, a file is split into blocks
• Each block contains a set of tuples

Block 1: tuples 10, 20
Block 2: tuples 30, 40
Block 3: tuples 50, 60
Block 4: tuples 70, 80

In the example, we have 4 blocks with 2 tuples each.
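A tiny sketch of how row-wise storage packs tuples into blocks, assuming fixed-size tuples and 2 tuples per block as in the example; the function names are hypothetical. Scanning the file sequentially costs one read per block.

```python
def split_into_blocks(rows, tuples_per_block):
    """Row-wise storage: consecutive tuples are packed into fixed-size blocks."""
    return [rows[i:i + tuples_per_block] for i in range(0, len(rows), tuples_per_block)]

def block_of(position, tuples_per_block):
    """0-based number of the block that holds the tuple at a given file position."""
    return position // tuples_per_block

sids = [10, 20, 30, 40, 50, 60, 70, 80]  # one tuple per sID, as in the example
blocks = split_into_blocks(sids, 2)
print(blocks)                       # [[10, 20], [30, 40], [50, 60], [70, 80]]
print(len(blocks))                  # 4 blocks, so 4 reads for a full sequential scan
print(block_of(sids.index(60), 2))  # 2, i.e. "Block 3" in the 1-based numbering above
```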
Scanning a Data File
• Data file is stored on Disk
• Consequence: Sequential IO is MUCH FASTER than
random IO
• Good: read blocks 1, 2, 3, 4, 5
• Bad: read blocks 2342, 11, 321, 9
• Rule of thumb:
• Randomly reading 1-2% of the file ≈ sequentially scanning the entire file
Data File Types
• Heap file
• Unsorted
• Sequential file
• Sorted according to some attribute(s) called key

Note: “key” here means something different from a primary key: it just means that we order the file according to that attribute. In our example we ordered by sID. We could just as well order by instructor, if that better suits the applications running on our database.
Index Motivation (1)
Student(name, age)
• Suppose we want to search for students of a specific age
• First idea: Sort the records by age… we know how to do this fast!
• How many IO operations to search over N sorted records?
• Simple scan: O(N)
• Binary search: $O(\log_2 N)$
Could we get even cheaper search? E.g., go from $\log_2 N$ to $\log_{200} N$?
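To make these counts concrete, the sketch below counts the probes of a binary search and compares the worst-case number of steps for a fan-out of 2 (binary search) and of 200 (a hypothetical node holding about 200 keys). It assumes, pessimistically, that each probe or step costs one IO; the sorted keys are hypothetical.

```python
import math

def binary_search_probes(sorted_keys, target):
    """Binary search that also reports how many records it inspected."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes

def levels(n, fanout):
    """Worst-case steps when each step narrows the search space by a factor of `fanout`."""
    return math.ceil(math.log(n, fanout))

ages = list(range(0, 2000, 2))               # 1000 hypothetical sorted keys
print(binary_search_probes(ages, 1000))      # (True, 9)
print(levels(10**6, 2), levels(10**6, 200))  # 20 3
```

Going from a fan-out of 2 to a fan-out of 200 cuts the worst case for a million records from about 20 steps to about 3, which is the motivation for tree indexes with wide nodes.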
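Index Motivation (2)
• What if we want to insert a new student but keep the list sorted?
• Example: inserting 2 into the pages [1, 3] [4, 5] [6, 7] yields [1, 2] [3, 4] [5, 6] [7, _]: every record after the insertion point moves
• We would potentially have to shift N records, requiring up to ~2*N/P IO operations (where P = # of records per page), since each of the N/P pages is read and written once!

Could we get faster insertions?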

Index Motivation (3)
• What if we want to be able to search quickly along multiple attributes (e.g., not just age)?
• We could keep multiple copies of the records, each sorted by one attribute set… but this would take a lot of space

Can we get fast search over multiple attribute sets without taking too much space?

We’ll create separate data structures called indexes to address all these points
Index
• An additional file that allows fast access to records in the data file given a search key
• The index contains (key, value) pairs:
• The key = an attribute value (e.g., student ID or age)
• The value = a pointer to the record
• An index can store the full rows it points to (primary index) or pointers to those rows (secondary index)
• We’ll mainly consider secondary indexes
• Could have many indexes for one table
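As a rough in-memory analogue, a secondary index can be modelled as a Python dict from key to the positions (“pointers”) of the matching records, which also shows that one table can have several indexes. This is a hash-table-style sketch with hypothetical Student data and hypothetical helper names, not how a DBMS lays out an index file on disk.

```python
from collections import defaultdict

# Data file: a list of records; a record's position acts as its "pointer".
students = [("Ann", 20), ("Bob", 22), ("Cai", 20), ("Dee", 21)]  # hypothetical (name, age)

def build_index(records, attr):
    """Secondary index: attribute value -> list of positions of matching records."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[attr]].append(pos)
    return dict(index)

def lookup(index, records, key):
    """Follow the pointers stored under `key` back into the data file."""
    return [records[pos] for pos in index.get(key, [])]

name_index = build_index(students, 0)  # two indexes on the same table
age_index = build_index(students, 1)

print(lookup(age_index, students, 20))      # [('Ann', 20), ('Cai', 20)]
print(lookup(name_index, students, "Bob"))  # [('Bob', 22)]
```

The records are stored once; each index only adds small (key, pointer) entries, which is how indexes give fast search on several attributes without copying the whole table.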
Different Keys
• Primary key: uniquely identifies a tuple
• Key of the sequential file: how the data file is sorted
• Index key: how the index is organized
Example 1: Index on sID
• Index: the sorted keys 10, 20, 30, 40, 50, 60, 70, 80, each pointing to the matching record in the data file
• Data file (sorted by sID):
10 CMPT 345 SP 2018 Jiannan
20 CMPT 454 FA 2018 Martin
30 … … … …
40 …
50
60
70
80
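Example 2: Index on cNum
• Index: the sorted keys 102, 110, 225, 276, 345, 383, 454, 470, each pointing to the record with that cNum
• Data file (still sorted by sID, so not in cNum order):
10 CMPT 345 SP 2018 Jiannan
20 CMPT 454 FA 2018 Martin
30 … 110 … …
40 … 276
50 … 225
60 … 383
70 … 102
80 … 470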
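The index in Example 2 is sorted by cNum while the data file stays sorted by sID. A minimal sketch, keeping only the sID and cNum columns from the example and treating a record's position in the file as its pointer; `range_lookup` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right

# Data file, sorted by sID: (sID, cNum) pairs from the example; other columns omitted.
data_file = [(10, 345), (20, 454), (30, 110), (40, 276),
             (50, 225), (60, 383), (70, 102), (80, 470)]

# Index on cNum: sorted (key, pointer) entries; the pointer is the record's position.
index = sorted((cnum, pos) for pos, (_, cnum) in enumerate(data_file))
keys = [k for k, _ in index]

def range_lookup(lo, hi):
    """Return records with lo <= cNum <= hi, following index pointers in key order."""
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return [data_file[pos] for _, pos in index[start:end]]

print(keys)                    # [102, 110, 225, 276, 345, 383, 454, 470]
print(range_lookup(345, 345))  # [(10, 345)]
print(range_lookup(200, 400))  # [(50, 225), (40, 276), (10, 345), (60, 383)]
```

Note how the matching records come back in cNum order but sit at scattered positions in the data file; this is the situation the unclustered case describes.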
Index Organization
• Common indexes:
• Hash tables
• B+ trees

• Specialized indexes
• R-trees
• Inverted index
• …
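B+ Tree Example
Search for K = 30 (not all nodes pictured):
• Root (80): 30 < 80, so follow the left pointer to node (20, 60)
• Node (20, 60): 30 in [20, 60), so follow the pointer for that interval
• Next level (keys pictured: 10 15 18 | 20 30 40 50 | 60 65 | 80 85 90): 30 in [30, 40), so follow the pointer for that interval
• Leaf level (keys pictured: 10 12 15 20 28 30 40 60 63 80 84 89): to the data!
• The root's right child (100, 120, 140) is not on the search path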
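A minimal B+ tree search sketch. The tree is a simplified three-level version built from some of the pictured keys; the exact node layout, the leaf holding 100/120/140, and the `record k` payloads are assumptions. Each inner node with keys k1 < … < km has m+1 children, and `bisect_right` picks the child whose interval contains the search key, matching rules such as 30 in [20, 60).

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys          # separator keys (inner node) or record keys (leaf)
        self.children = children  # inner node: len(keys) + 1 children
        self.records = records    # leaf: one payload per key

def leaf(keys):
    return Node(keys, records=[f"record {k}" for k in keys])

def search(root, key):
    """Descend from the root to a leaf; return (payload or None, keys seen on the path)."""
    node, path = root, []
    while node.children is not None:
        path.append(node.keys)
        # child i covers [keys[i-1], keys[i]); bisect_right returns that i
        node = node.children[bisect_right(node.keys, key)]
    path.append(node.keys)
    if key in node.keys:
        return node.records[node.keys.index(key)], path
    return None, path

# Simplified tree loosely based on the pictured keys (hypothetical layout).
root = Node([80], [
    Node([20, 60], [leaf([10, 12, 15]), leaf([20, 28, 30, 40]), leaf([60, 63])]),
    Node([100], [leaf([80, 84, 89]), leaf([100, 120, 140])]),
])

print(search(root, 30))  # ('record 30', [[80], [20, 60], [20, 28, 30, 40]])
```

Only one node per level is visited, so the cost of a lookup is the height of the tree, which stays small because each node has many children.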
Clustered vs. Unclustered Index
• Both figures show the same index file (root 30; entries 22 25 28 29 32 34 37 38)
• Clustered: the data file is stored in index-key order: 19 22 27 28 30 33 35 37
• Unclustered: the data file is in a different order: 19 33 27 22 37 28 35 30
Clustered vs. Unclustered Index
• Recall that for a disk with block access, sequential IO is much faster than random IO
• For exact search, there is no difference between clustered and unclustered
• For a range search over R values, the difference is between 1 random IO + R sequential IOs (clustered) and R random IOs (unclustered)

SELECT *
FROM R
WHERE R.K > ? AND R.K < ?

[Figure: cost of the range query above vs. percentage of tuples retrieved (0 to 100), comparing an unclustered index, a clustered index, and a sequential scan]
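To see where the random IOs come from, the sketch below counts block reads for a range search, using the two data files pictured in the clustered/unclustered figure with an assumed 2 tuples per block and no buffer pool (a block is fetched whenever the next matching tuple lives in a different block). The helper name and block size are assumptions.

```python
def block_reads(data_file, lo, hi, per_block=2):
    """Blocks fetched for lo < K < hi when matches are visited in key order via an index."""
    position = {k: i for i, k in enumerate(data_file)}  # assumes distinct keys
    reads, current = [], None
    for k in sorted(k for k in data_file if lo < k < hi):
        block = position[k] // per_block
        if block != current:  # no buffer pool: fetch whenever the block changes
            reads.append(block)
            current = block
    return reads

clustered = [19, 22, 27, 28, 30, 33, 35, 37]    # data file in index order
unclustered = [19, 33, 27, 22, 37, 28, 35, 30]  # same keys, different order

print(block_reads(clustered, 25, 34))    # [1, 2]: 2 reads, consecutive blocks
print(block_reads(unclustered, 25, 34))  # [1, 2, 3, 0]: 4 reads, jumping around
```

With the clustered file, the 4 matching tuples come from 2 adjacent blocks; with the unclustered file, every match costs its own, non-sequential block read.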
Summary
• Index = a file that enables direct access to records in another data file
• B+ tree / Hash table
• Clustered/unclustered
• Data resides on disk
• Organized in blocks
• Sequential IO is more efficient than random IO
• Randomly reading 1-2% of the data ≈ sequentially scanning the entire file
Creating Indexes in SQL
• Offering (oID, dept, cNum, term, instructor)

CREATE INDEX IDX1 ON Offering(dept)

Which queries could be affected by IDX1?
(A) SELECT oID FROM Offering WHERE dept = 'CMPT'
(B) SELECT oID FROM Offering WHERE cNum = '354'
(C) SELECT oID FROM Offering

• Offering (oID, dept, cNum, term, instructor)

CREATE INDEX IDX2 ON Offering(dept, cNum)

Which queries could be affected by IDX2?
(A) SELECT oID FROM Offering WHERE dept = 'CMPT'
(B) SELECT oID FROM Offering WHERE cNum = '354'
(C) SELECT oID FROM Offering

• How many indexes could we create?
• Which indexes should we create?
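One way to check which queries an index can serve is to ask a real query planner. The sketch below uses SQLite via Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. The column types and query QC are assumptions (QC is a hypothetical query on both columns), and the exact wording of SQLite's plan text varies between versions, so the checks only look for the index name.

```python
import sqlite3

def plan(conn, sql):
    """Join the 'detail' column (last column) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
# oID is a plain column, not INTEGER PRIMARY KEY, so it is not stored in the index
# entries; an index on dept and/or cNum alone therefore cannot answer SELECT oID.
conn.execute("CREATE TABLE Offering (oID INTEGER, dept TEXT, cNum TEXT, term TEXT, instructor TEXT)")

QA = "SELECT oID FROM Offering WHERE dept = 'CMPT'"
QB = "SELECT oID FROM Offering WHERE cNum = '354'"
QC = "SELECT oID FROM Offering WHERE dept = 'CMPT' AND cNum = '354'"  # hypothetical

conn.execute("CREATE INDEX IDX1 ON Offering(dept)")
with_idx1 = {"A": plan(conn, QA), "B": plan(conn, QB)}

conn.execute("DROP INDEX IDX1")
conn.execute("CREATE INDEX IDX2 ON Offering(dept, cNum)")
with_idx2 = {"A": plan(conn, QA), "B": plan(conn, QB), "C": plan(conn, QC)}

for label, plans in (("IDX1", with_idx1), ("IDX2", with_idx2)):
    for query, detail in plans.items():
        print(label, query, detail)
```

In this setup, (A) is answered with a search on IDX1 or IDX2, and QC can use both columns of IDX2, while (B), which constrains only cNum (not the leading column of IDX2), is typically answered by a full scan of the table.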
Which Indexes?
• The index selection problem
• Given a table and a “workload” (e.g., the SFU CourSys application with lots of SQL queries), decide which indexes to create (and which ones NOT to create!)

• Who does index selection:
• The database administrator (DBA)
• Semi-automatically, using a database administration tool
Index Selection: Which Search Key
• Make some attribute K a search key if the WHERE
clause contains:
• An exact match on K
• A range predicate on K
• A join on K

The Index Selection Problem 1
• Your workload is:
• 100000 queries: SELECT sID FROM Student WHERE name = ?
• 100000 queries: SELECT sID FROM Student WHERE gender = ?

Which one is better?
A. Index on name
B. Index on gender
The Index Selection Problem 2
• Your workload is:
• 100000 queries: SELECT sID FROM Student WHERE name LIKE ?
• 100000 queries: SELECT sID FROM Student WHERE age = ?

Which one is better?
A. Index on name
B. Index on age
The Index Selection Problem 3
• Your workload is:
• 100000 queries: SELECT sID FROM Student WHERE name = ?
• 100 queries: SELECT sID FROM Student WHERE age = ?

Which one(s) are useful?
A. Index on name
B. Index on age
C. Index on name, age
D. Index on age, name
The Index Selection Problem 4
• Your workload is:
• 100000 queries: SELECT sID FROM Student WHERE fname = ?
• 100000 queries: SELECT sID FROM Student WHERE fname = ? AND age > ?

Which one is better?
A. Index on (fname, age)
B. Index on (age, fname)
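Column order in a composite index matters. A sketch with SQLite (via Python's sqlite3), assuming a Student table with columns sID, fname, age; the index name `idx` and the bound values are hypothetical. With (fname, age), both queries can search on fname; with (age, fname), the leading column is age, so `WHERE fname = ?` alone generally cannot use the index as a search key.

```python
import sqlite3

Q1 = "SELECT sID FROM Student WHERE fname = ?"
Q2 = "SELECT sID FROM Student WHERE fname = ? AND age > ?"

def plan(conn, sql, params):
    """The last column of each EXPLAIN QUERY PLAN row is the human-readable detail."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(str(r[-1]) for r in rows)

def plans_with_index(columns):
    conn = sqlite3.connect(":memory:")
    # sID is a plain column (not the rowid), so the index does not cover SELECT sID.
    conn.execute("CREATE TABLE Student (sID INTEGER, fname TEXT, age INTEGER)")
    conn.execute(f"CREATE INDEX idx ON Student({columns})")
    return plan(conn, Q1, ("Ann",)), plan(conn, Q2, ("Ann", 20))

fname_age = plans_with_index("fname, age")
age_fname = plans_with_index("age, fname")
print("(fname, age):", fname_age)
print("(age, fname):", age_fname)
```

The exact plan text depends on the SQLite version, and without statistics from ANALYZE SQLite does not attempt to skip over the leading age column, so the (age, fname) index is not used for the first query here.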
The Index Selection Problem 5
• Your workload:
• 100000 queries: SELECT sID FROM Student WHERE name = ?
• 100 queries: SELECT sID FROM Student WHERE age = ?
• 100000 queries: INSERT INTO Student VALUES (?, …, ?)

Which one(s) are useful?
A. Index on name
B. Index on age
C. Index on name, age
D. Index on age, name
Basic Index Selection Guidelines
• Consider queries in workload in order of importance

• Consider relations accessed by query

• No point indexing other relations

• Look at WHERE clause for possible search key

• Try to choose indexes that speed up multiple queries

Summary
• Query Processing
• SQL Parser
• Logical Optimization
• Physical Optimization
• Query Execution

• Indexing
• Data Storage
• Index motivation
• Index Selection

Acknowledgements
• Some lecture slides were copied from or inspired by the following course materials:
• “W4111: Introduction to databases” by Eugene Wu at Columbia University
• “CSE344: Introduction to Data Management” by Dan Suciu at University of Washington
• “CMPT354: Database System I” by John Edgar at Simon Fraser University
• “CS186: Introduction to Database Systems” by Joe Hellerstein at UC Berkeley
• “CS145: Introduction to Databases” by Peter Bailis at Stanford
• “CS 348: Introduction to Database Management” by Grant Weddell at University of Waterloo
