Lec 38 Query Optimization1
Jan-Apr, 2018
Failure Classification
Storage Structure
Recovery and Atomicity
Log-Based Recovery
Database System Concepts - 6th Edition 38.2 ©Silberschatz, Korth and Sudarshan
Module Objectives
SWAYAM: NPTEL-NOC MOOCs Instructor: Prof. P P Das, IIT Kharagpur. Jan-Apr, 2018
• Overview of Query Processing
• Measures of Query Cost
• Selection Operation
• Sorting
• Join Operation
• Other Operations
OVERVIEW OF QUERY PROCESSING
Basic Steps in Query Processing
Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost
Cost is estimated using statistical information from the database catalog
e.g. number of tuples in each relation, size of tuples, etc.
In this module we study
How to measure query costs
Algorithms for evaluating relational algebra operations
How to combine algorithms for individual operations in order to evaluate a complete expression
In the next module
We study how to optimize queries, that is, how to find an evaluation plan with lowest estimated cost
MEASURES OF QUERY COST
Measures of Query Cost
For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures
$t_T$: time to transfer one block
$t_S$: time for one seek
Cost for $b$ block transfers plus $S$ seeks
$b * t_T + S * t_S$
We ignore CPU costs for simplicity
Real systems do take CPU cost into account
We do not include the cost of writing output to disk in our cost formulae
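As a rough illustration of the cost measure above, the following Python sketch evaluates b * tT + S * tS. The function name `query_cost` and the default timings (0.1 ms per block transfer, 4 ms per seek) are assumptions chosen for illustration; they are not values given in the slides.

```python
def query_cost(block_transfers, seeks, t_transfer=0.0001, t_seek=0.004):
    """Estimated I/O time in seconds: b * tT + S * tS.

    t_transfer (tT) and t_seek (tS) are assumed, illustrative timings.
    CPU cost and the cost of writing the output are ignored, as in the slides.
    """
    return block_transfers * t_transfer + seeks * t_seek


if __name__ == "__main__":
    # 100 block transfers and 1 seek under the assumed timings
    print(query_cost(100, 1))
```

With a seek assumed to be far slower than a transfer, a plan that saves seeks can win even if it transfers a few more blocks.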
SELECTION OPERATION
Selection Operation
File scan
Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition
Cost estimate = $b_r$ block transfers + 1 seek
$b_r$ denotes the number of blocks containing records from relation $r$
If the selection is on a key attribute, the scan can stop on finding the record
average cost = $(b_r / 2)$ block transfers + 1 seek
Linear search can be applied regardless of
selection condition, or
ordering of records in the file, or
availability of indices
Note: binary search generally does not make sense since data is not stored consecutively
except when there is an index available,
and binary search requires more seeks than index search
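The linear search described above can be sketched as follows. This is a minimal in-memory model, assuming a file is a list of blocks and each block is a list of records; the function name `linear_search` and the sample data are hypothetical.

```python
def linear_search(blocks, predicate, on_key=False):
    """Scan blocks in file order; return (matching_records, blocks_read).

    blocks: list of blocks, each a list of records (an assumed in-memory model).
    on_key: True when the selection is an equality test on a key attribute,
            so at most one record can match and the scan may stop early.
    """
    matches = []
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # one block transfer per block scanned
        for record in block:
            if predicate(record):
                matches.append(record)
                if on_key:
                    # A key value is unique, so stop on finding the record
                    return matches, blocks_read
    return matches, blocks_read


if __name__ == "__main__":
    # Hypothetical relation of (id, name) records, two records per block
    blocks = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e"), (6, "f")]]
    print(linear_search(blocks, lambda rec: rec[0] == 3, on_key=True))
```

Without `on_key`, every block is read, matching the estimate of b_r block transfers; with it, the number of blocks read depends on where the record sits, which averages to b_r / 2.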
Selections Using Indices
Disjunction: $\sigma_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n}(r)$
A10 (disjunctive selection by union of identifiers)
Applicable if all conditions have available indices
Otherwise use linear scan
Use corresponding index for each condition, and take union of all the obtained sets of record pointers
Then fetch records from file
Negation: $\sigma_{\neg\theta}(r)$
Use linear scan on file
If very few records satisfy $\neg\theta$, and an index is applicable to $\theta$
Find satisfying records using index and fetch from file
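A10 can be sketched in Python as below. The data layout is an assumption made for illustration: records live in a dict keyed by record identifier, and each index is a dict from attribute value to a set of record identifiers. Only equality conditions are handled, and the function name `disjunctive_selection` is hypothetical.

```python
def disjunctive_selection(records, indices, conditions):
    """Evaluate (attr1 = v1) OR (attr2 = v2) OR ... over a relation.

    records:    dict record_id -> record (a dict of attribute values)
    indices:    dict attribute -> {value: set of record_ids}  (assumed layout)
    conditions: list of (attribute, value) equality conditions
    """
    if any(attr not in indices for attr, _ in conditions):
        # Some condition has no index: fall back to a linear scan
        return [rec for rec in records.values()
                if any(rec[attr] == value for attr, value in conditions)]

    # Use the index for each condition and take the union of record pointers
    record_ids = set()
    for attr, value in conditions:
        record_ids |= indices[attr].get(value, set())

    # Then fetch the records from the file
    return [records[rid] for rid in sorted(record_ids)]


if __name__ == "__main__":
    records = {
        1: {"dept": "CS", "year": 1},
        2: {"dept": "EE", "year": 2},
        3: {"dept": "CS", "year": 2},
        4: {"dept": "ME", "year": 3},
    }
    indices = {
        "dept": {"CS": {1, 3}, "EE": {2}, "ME": {4}},
        "year": {1: {1}, 2: {2, 3}, 3: {4}},
    }
    print(disjunctive_selection(records, indices, [("dept", "CS"), ("year", 2)]))
```

Taking the union on identifiers before fetching means a record that satisfies several conditions is fetched only once.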
SORTING
Sorting
We may build an index on the relation, and then use the index to read the relation in sorted order
May lead to one disk block access for each tuple
For relations that fit in memory, techniques like quicksort can be used
For relations that do not fit in memory, external sort-merge is a good choice
[Figure: Example of external sorting using sort-merge]
External Sort-Merge
2. Merge the runs (N-way merge). We assume (for now) that N < M
   1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page
   2. repeat
      1. Select the first record (in sort order) among all buffer pages
      2. Write the record to the output buffer. If the output buffer is full write it to disk.
      3. Delete the record from its input buffer page
         If the buffer page becomes empty then read the next block (if any) of the run into the buffer
   3. until all input buffer pages are empty
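The merge step above can be sketched in Python as follows. This is a simplified in-memory model, not the textbook's code: each run is a sorted Python list standing in for a sorted run on disk, `block_size` (records per block) is an assumed parameter, and a heap from the standard `heapq` module is one convenient way to pick the first record among all buffer pages.

```python
import heapq


def merge_runs(runs, block_size):
    """N-way merge of sorted runs; returns the output as a list of blocks."""
    def read_block(run_no, block_no):
        # Stand-in for reading one block of a run from disk
        start = block_no * block_size
        return runs[run_no][start:start + block_size]

    # Read the first block of each run into its buffer page
    buffers = [read_block(i, 0) for i in range(len(runs))]
    next_block = [1] * len(runs)
    positions = [0] * len(runs)

    # The heap holds the current first record of every non-empty buffer page
    heap = [(buf[0], i) for i, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)

    output_disk, output_buffer = [], []
    while heap:  # until all input buffer pages are empty
        record, i = heapq.heappop(heap)       # first record in sort order
        output_buffer.append(record)
        if len(output_buffer) == block_size:  # output buffer full: write it out
            output_disk.append(output_buffer)
            output_buffer = []
        positions[i] += 1                     # delete the record from its page
        if positions[i] == len(buffers[i]):   # page empty: read the run's next block
            buffers[i] = read_block(i, next_block[i])
            next_block[i] += 1
            positions[i] = 0
        if buffers[i]:
            heapq.heappush(heap, (buffers[i][positions[i]], i))

    if output_buffer:
        output_disk.append(output_buffer)     # flush the final partial block
    return output_disk


if __name__ == "__main__":
    print(merge_runs([[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]], block_size=2))
```

Only one block per run plus one output block is held at a time, which is why the slide requires the number of runs N to be smaller than the number of memory blocks M.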
JOIN OPERATION
Join Operation
In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost of the nested-loop join is
$n_r * b_s + b_r$ block transfers, plus
$n_r + b_r$ seeks
where $n_r$ is the number of records in the outer relation $r$, and $b_r$ and $b_s$ are the numbers of blocks of $r$ and of the inner relation $s$
If the smaller relation fits entirely in memory, use that as the inner relation.
Reduces cost to $b_r + b_s$ block transfers and 2 seeks
Example of join of student and takes:
Number of records: student 5,000; takes 10,000
Number of blocks: student 100; takes 400
Assuming worst case memory availability, the cost estimate is
with student as the outer relation:
5000 ∗ 400 + 100 = 2,000,100 block transfers,
5000 + 100 = 5100 seeks
with takes as the outer relation:
10000 ∗ 100 + 400 = 1,000,400 block transfers and 10,400 seeks
If the smaller relation (student) fits entirely in memory, the cost estimate will be 500 block transfers
Block nested-loops algorithm (next slide) is preferable
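The arithmetic in the student/takes example can be checked with a short Python sketch. The function name `nested_loop_join_cost` and its parameter names are hypothetical; the formulas are the ones stated above.

```python
def nested_loop_join_cost(n_outer, b_outer, b_inner, inner_fits_in_memory=False):
    """Return (block_transfers, seeks) for a nested-loop join.

    n_outer: number of records in the outer relation
    b_outer: number of blocks of the outer relation
    b_inner: number of blocks of the inner relation
    """
    if inner_fits_in_memory:
        # Each relation is read exactly once
        return b_outer + b_inner, 2
    # Worst case: the inner relation is rescanned for every outer record
    return n_outer * b_inner + b_outer, n_outer + b_outer


if __name__ == "__main__":
    # student: 5,000 records in 100 blocks; takes: 10,000 records in 400 blocks
    print(nested_loop_join_cost(5000, 100, 400))    # student as outer
    print(nested_loop_join_cost(10000, 400, 100))   # takes as outer
    print(nested_loop_join_cost(10000, 400, 100, inner_fits_in_memory=True))
```

Comparing the first two calls shows the trade-off in the example: taking takes as the outer relation halves the block transfers but doubles the seeks.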
Block Nested-Loop Join
OTHER OPERATIONS
Other Operations
Duplicate elimination
Projection
Aggregation
Set Operations
Outer Join
Module Summary
Understood the overall flow for Query Processing and defined the Measures of Query Cost
Studied the algorithms for processing Selection Operations, Sorting, Join Operations and a few Other Operations