Lec6 QP Indexing
Database System I
Lecture 6. Basics of Query Processing and Indexing
Outline
• Query Processing
• What happens when an SQL query is issued?
• Indexing
• How to speed up query performance?
Query Processing Steps
SQL query → SQL Parser → Query optimization (Logical Optimization → Physical Optimization) → Query Execution → Disk
Example
• Offering (oID, dept, cNum, term, instructor)
• Took (sID, oID, grade)
SELECT sID
FROM Offering O, Took T
WHERE O.oID = T.oID
AND O.dept = 'CMPT'
AND O.cNum = '354'
SQL Parser
• From the input SQL text to a logical plan
SELECT sID
FROM Offering O, Took T
WHERE O.oID = T.oID
AND O.dept = 'CMPT'
AND O.cNum = '354'
• Logical plan as a relational algebra tree (root first):
  π_sID
    σ_{dept = 'CMPT' ∧ cNum = 354}
      ⨝
        Offering    Took
• As an expression: π_sID (σ_{dept = 'CMPT' ∧ cNum = 354} (Offering ⨝ Took))
• Physical plan: each operator is annotated with an implementation
  • ⨝ (Hash Join)
  • σ_{dept = 'CMPT' ∧ cNum = 354} (Scan & write to T)
  • Offering (File Scan)
  • Took (File Scan)
• Execution: “Volcano Iterator Model”; Machine Code (e.g., C++)
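The iterator-style execution named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows in `OFFERING` and `TOOK` are made up, the operator functions (`file_scan`, `select`, `hash_join`, `project`) are hypothetical names, and the plan applies the selection after the join, as in the unoptimized logical plan.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

# Toy in-memory "files"; the rows are made up for illustration.
OFFERING: List[Row] = [
    {"oID": 1, "dept": "CMPT", "cNum": 354},
    {"oID": 2, "dept": "CMPT", "cNum": 454},
    {"oID": 3, "dept": "MATH", "cNum": 354},
]
TOOK: List[Row] = [
    {"sID": 10, "oID": 1, "grade": 90},
    {"sID": 20, "oID": 2, "grade": 85},
    {"sID": 30, "oID": 1, "grade": 70},
]


def file_scan(table: List[Row]) -> Iterator[Row]:
    """Leaf operator: hand out one tuple at a time."""
    for row in table:
        yield row


def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Selection: pass through only tuples that satisfy the predicate."""
    return (row for row in child if pred(row))


def hash_join(left: Iterator[Row], right: Iterator[Row], key: str) -> Iterator[Row]:
    """Build a hash table on the left input, then probe it with the right input."""
    buckets: Dict[object, List[Row]] = {}
    for l_row in left:
        buckets.setdefault(l_row[key], []).append(l_row)
    for r_row in right:
        for l_row in buckets.get(r_row[key], []):
            yield {**l_row, **r_row}


def project(child: Iterator[Row], cols: List[str]) -> Iterator[Row]:
    """Projection: keep only the requested columns."""
    for row in child:
        yield {c: row[c] for c in cols}


def run_query() -> List[object]:
    # Operators are composed bottom-up; tuples are pulled lazily from the root.
    joined = hash_join(file_scan(OFFERING), file_scan(TOOK), "oID")
    filtered = select(joined, lambda r: r["dept"] == "CMPT" and r["cNum"] == 354)
    return [row["sID"] for row in project(filtered, ["sID"])]


print(run_query())  # [10, 30]
```

Each operator only asks its child for the next tuple, so no intermediate result is materialized except the hash table built inside the join.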
Summary
• Logical plans:
• Created by the parser from the input SQL text
• Expressed as a relational algebra tree
• Each SQL query has many possible logical plans
• Physical plans:
• Goal is to choose an efficient implementation for each
operator in the RA
• Each logical plan has many possible physical plans
• Query Optimization:
• Find the optimal logical plan
• Find the optimal physical plan
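To make "many possible logical plans" concrete, the sketch below evaluates the example query with two equivalent plans: filter after the join, and filter pushed below the join. The tuples are made up and the function names are hypothetical; the point is only that both plans return the same answer while producing different numbers of intermediate join tuples.

```python
# (oID, dept, cNum) and (sID, oID); made-up rows for illustration.
OFFERING = [(1, "CMPT", 354), (2, "CMPT", 454), (3, "MATH", 354), (4, "CMPT", 354)]
TOOK = [(10, 1), (20, 2), (30, 4), (40, 3), (50, 1)]


def wanted(offering):
    """The selection predicate: dept = 'CMPT' AND cNum = 354."""
    return offering[1] == "CMPT" and offering[2] == 354


def plan_filter_after_join():
    """Join everything first, then apply the selection."""
    joined = [(o, t) for o in OFFERING for t in TOOK if o[0] == t[1]]
    result = sorted(t[0] for o, t in joined if wanted(o))
    return result, len(joined)


def plan_filter_before_join():
    """Apply the selection to Offering first, then join the survivors."""
    selected = [o for o in OFFERING if wanted(o)]
    joined = [(o, t) for o in selected for t in TOOK if o[0] == t[1]]
    result = sorted(t[0] for o, t in joined)
    return result, len(joined)


print(plan_filter_after_join())   # ([10, 30, 50], 5)
print(plan_filter_before_join())  # ([10, 30, 50], 3)
```

The second number in each result is the size of the intermediate join output, a rough stand-in for the work an optimizer tries to reduce.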
Query Performance
• My database application is too slow… why?
• One of the queries is very slow… why?
Data Storage
• On disk, a file is split into blocks
• Each block contains a set of tuples

         sID  dept  cNum  Term     instructor
Block 1  10   CMPT  354   SP 2018  Jiannan
         20   CMPT  454   FA 2018  Martin
Block 2  30   …     …     …        …
         40   …
Block 3  50
         60
Block 4  70
         80
Data File Types
• Heap file
• Unsorted
• Sequential file
• Sorted according to some attribute(s) called key
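A small sketch of why the file type matters for lookups, assuming the file is modelled as a Python list of key values (an assumption for illustration, not how a DBMS lays out blocks). An unsorted heap file forces a linear scan, while a sequential file sorted on the key allows binary search.

```python
from bisect import bisect_left

heap_file = [50, 20, 80, 10, 70, 30, 60, 40]  # unsorted keys (made up)
sequential_file = sorted(heap_file)            # same keys, sorted on the key


def heap_lookup(keys, k):
    """Linear scan over an unsorted file; returns (found, tuples examined)."""
    for examined, key in enumerate(keys, start=1):
        if key == k:
            return True, examined
    return False, len(keys)


def sequential_lookup(keys, k):
    """Binary search over a sorted file; returns whether k is present."""
    i = bisect_left(keys, k)
    return i < len(keys) and keys[i] == k


print(heap_lookup(heap_file, 40))              # (True, 8)
print(sequential_lookup(sequential_file, 40))  # True
```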
Different Keys
• Primary key
• uniquely identifies a tuple
• Index key
• how the index is organized
Example 1: Index on sID
Index entry → Data File tuple (sID dept cNum Term instructor)
10 → 10 CMPT 354 SP 2018 Jiannan
20 → 20 CMPT 454 FA 2018 Martin
30 → 30 … … … …
40 → 40 …
50 → 50
60 → 60
70 → 70
80 → 80
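Example 2: Index on cNum
Data File (in sID order; sID dept cNum Term instructor)
10 CMPT 354 SP 2018 Jiannan
20 CMPT 454 FA 2018 Martin
30 … 110 … …
40 … 276
50 … 225
60 … 383
70 … 102
80 … 470
Index (in cNum order; cNum → sID of the tuple it points to)
102 → 70
110 → 30
225 → 50
276 → 40
354 → 10
383 → 60
454 → 20
470 → 80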
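The index on cNum can be sketched as a sorted list of (cNum, position) pairs kept next to a data file that stays in sID order. This is a minimal illustration, not a real index structure: tuples are reduced to (sID, cNum), the cNum values follow the index entries shown in the example, and `lookup` is a hypothetical helper name.

```python
from bisect import bisect_left

# Data file in sID order, reduced to (sID, cNum) for brevity.
data_file = [(10, 354), (20, 454), (30, 110), (40, 276),
             (50, 225), (60, 383), (70, 102), (80, 470)]

# Index entries: (cNum, position of the tuple in the data file), sorted by cNum.
index = sorted((cnum, pos) for pos, (_, cnum) in enumerate(data_file))
index_keys = [cnum for cnum, _ in index]


def lookup(cnum):
    """Binary-search the index, then follow the pointer into the data file."""
    i = bisect_left(index_keys, cnum)
    if i < len(index) and index[i][0] == cnum:
        return data_file[index[i][1]]
    return None


print(index_keys)   # [102, 110, 225, 276, 354, 383, 454, 470]
print(lookup(354))  # (10, 354)
```

The data file is never reordered; only the index is sorted on cNum, which is what distinguishes the index key from the order of the file.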
Index Organization
• Common indexes:
• Hash tables
• B+ trees
• Specialized indexes
• R-trees
• Inverted index
•…
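One practical difference between the two common organizations is the kind of predicate each can serve. The sketch below uses a Python `dict` as a stand-in for a hash table and a sorted key list as a stand-in for the ordered leaf level of a B+ tree; both stand-ins and the record payloads are assumptions for illustration only.

```python
from bisect import bisect_left

records = {10: "A", 20: "B", 30: "C", 40: "D", 50: "E"}  # key -> payload (made up)

hash_index = dict(records)     # supports exact match only
sorted_keys = sorted(records)  # ordered keys, like B+ tree leaves


def exact_match(k):
    """Hash-style lookup: one probe, no ordering information."""
    return hash_index.get(k)


def range_query(lo, hi):
    """Ordered lookup: find the start position, then read keys in [lo, hi)."""
    start = bisect_left(sorted_keys, lo)
    end = bisect_left(sorted_keys, hi)
    return sorted_keys[start:end]


print(exact_match(30))      # C
print(range_query(20, 45))  # [20, 30, 40]
```

A hash table gives no help for the range query because neighbouring keys hash to unrelated buckets, whereas the ordered structure answers both kinds of predicate.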
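B+ Tree Example
• Lookup: K = 30?
• At the root (key 80): 30 < 80
• At the next level: 30 in [30,40)
• The index entry 30 points to the record with key 30
• Keys shown in the figure: 10 15 18 20 30 40 50 60 65 80 85 90
[Figure: Clustered vs. Unclustered]
• Index file entries (both cases): 22 25 28 29 32 34 37 38
• Data file, clustered: 19 22 27 28 30 33 35 37
• Data file, unclustered: 19 33 27 22 37 28 35 30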
Clustered vs. Unclustered Index
• Recall that for a disk with block access, sequential IO is
much faster than random IO
[Figure: Cost vs. Percentage tuples retrieved (0 to 100) for the range query WHERE R.K > ? AND R.K < ?; curves: Unclustered Index, Sequential Scan, Clustered Index]
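The gap between the clustered and unclustered curves can be illustrated by counting how many distinct blocks a range query touches. The sketch assumes 100 tuples with keys 0 to 99, 10 tuples per block, and a fixed scattering permutation for the unclustered file; all three are assumptions chosen to keep the example deterministic. Counting distinct blocks also understates the unclustered cost, since it ignores that those blocks are read in random order.

```python
TUPLES_PER_BLOCK = 10  # assumption for this sketch
N = 100                # keys 0..99

# position_of[k] = slot of the tuple with key k in the data file.
clustered_pos = {k: k for k in range(N)}               # file sorted by key
unclustered_pos = {k: (k * 37) % N for k in range(N)}  # keys scattered over the file


def blocks_touched(position_of, lo, hi):
    """Number of distinct blocks holding tuples with lo <= key < hi."""
    return len({position_of[k] // TUPLES_PER_BLOCK for k in range(lo, hi)})


# Retrieve 10% of the tuples (keys 0..9).
print(blocks_touched(clustered_pos, 0, 10))    # 1
print(blocks_touched(unclustered_pos, 0, 10))  # 9
```

With the unclustered layout, fetching 10% of the tuples already touches 9 of the 10 blocks, which is why a sequential scan becomes competitive at low selectivity.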
Summary
• Index = a file that enables direct access to records
in another data file
• B+ tree / Hash table
• Clustered/unclustered
Creating Indexes in SQL
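A minimal sketch of creating an index, run against an in-memory SQLite database through Python's `sqlite3` module. The choice of SQLite, the index name `idx_offering_dept_cnum`, and the two inserted rows are assumptions for illustration; the `CREATE INDEX name ON table (columns)` form is widely supported, but other systems add their own options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Offering ("
    "oID INTEGER PRIMARY KEY, dept TEXT, cNum INTEGER, term TEXT, instructor TEXT)"
)
conn.executemany(
    "INSERT INTO Offering VALUES (?, ?, ?, ?, ?)",
    [(1, "CMPT", 354, "SP 2018", "Jiannan"),
     (2, "CMPT", 454, "FA 2018", "Martin")],
)

# Hypothetical index name; the search key is the (dept, cNum) pair.
conn.execute("CREATE INDEX idx_offering_dept_cnum ON Offering (dept, cNum)")

query = "SELECT oID FROM Offering WHERE dept = 'CMPT' AND cNum = 354"
rows = conn.execute(query).fetchall()

# Ask SQLite how it plans to run the query; the last column is the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)

print(rows)
print(plan_text)
```

The plan description should mention the index name when SQLite chooses to use it; the exact wording of that description varies between SQLite versions.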
Which Indexes?
• The index selection problem
• Given a table, and a “workload” (SFU CourSys
application with lots of SQL queries), decide which
indexes to create (and which ones NOT to create!)
Index Selection: Which Search Key
• Make some attribute K a search key if the WHERE
clause contains:
• An exact match on K
• A range predicate on K
• A join on K
The Index Selection Problem 1
• Your workload is
Summary
• Query Processing
• SQL Parser
• Logical Optimization
• Physical Optimization
• Query Execution
• Indexing
• Data Storage
• Index motivation
• Index Selection
Acknowledgements
• Some lecture slides were copied from or inspired by the
following course materials
• “W4111: Introduction to databases” by Eugene Wu at
Columbia University
• “CSE344: Introduction to Data Management” by Dan Suciu at
University of Washington
• “CMPT354: Database System I” by John Edgar at Simon Fraser
University
• “CS186: Introduction to Database Systems” by Joe Hellerstein
at UC Berkeley
• “CS145: Introduction to Databases” by Peter Bailis at Stanford
• “CS 348: Introduction to Database Management” by Grant
Weddell at University of Waterloo