Query tuning

Viet-Trung Tran
SoICT

9/27/21 Database Tuning


What is query tuning

• Rewrite the query so that it runs faster
• This is the first thing to try when a query is slow
• Other tuning approaches related to queries:
• Adding indexes
• Changing the schema (3NF, 4NF, etc.)
• Modifying the transaction length

1. Overview

• What is query processing?
• Phases of query processing
• Parser
• Optimizer
1.1. What is query processing

• The entire set of activities involved in retrieving data from the database:
• Translation of the SQL query into low-level instructions (usually relational algebra)
• Query optimization: estimating the cost of evaluating the query in different ways in order to save resources
• Query execution to extract the data from the database
1.2. Phases of query processing

[Figure: SQL → Parser → query plan → Optimizer → optimized execution plan → Code generator → code for executing the query]


1.3. Parser

• Scans the query, splits it into individual tokens, and checks the query for correctness
• Does it contain the right keywords?
• Does it conform to the syntax?
• Does it reference valid tables and attributes?
• Output: query plan
• E.g.
• Input: SELECT balance FROM account WHERE balance < 2500
• Output: relational algebra expression
• But it is not unique: $\sigma_{balance < 2500}(\Pi_{balance}(account))$ and $\Pi_{balance}(\sigma_{balance < 2500}(account))$ both express this query
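The three parser checks can be observed with a small sketch using SQLite from Python's standard library. The `account` schema (in particular the `account_number` column) and the helper `check` are assumptions made for illustration; other database systems report these errors differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; only the "balance" attribute appears in the slides.
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

def check(sql):
    """Return 'ok' if SQLite accepts the statement, else its error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)

results = {
    "valid": check("SELECT balance FROM account WHERE balance < 2500"),
    "bad keyword": check("SELEC balance FROM account WHERE balance < 2500"),
    "unknown table": check("SELECT balance FROM accounts WHERE balance < 2500"),
    "unknown attribute": check("SELECT amount FROM account WHERE balance < 2500"),
}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Only the first statement passes; the others are rejected before any data is read.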
1.4. Optimizer

• Input: RA expression
• Output: query execution plan
• Query execution plan = query plan + the algorithms used to execute its RA operations
• Aims to choose the cheapest execution plan among the possible ones
• Step 1: Equivalence transformation
• Step 2: Annotation of the RA expression with execution algorithms
• Step 3: Cost estimation for the different query execution plans
2. Understanding optimizer

• Chooses the cheapest execution plan among the possible ones
• Step 1: Equivalence transformation
• Step 2: Annotation of the RA expression with execution algorithms
• Step 3: Cost estimation for the different query execution plans
2.1. Step 1: Equivalence transformation

• RA expressions are equivalent if they generate the same set of tuples on every database instance
• Equivalence rules:
• Transform one relational algebra expression into an equivalent one
• Similar to numeric algebra: a + b = b + a, a(b + c) = ab + ac, etc.
• Why produce equivalent expressions?
• Equivalent algebraic expressions give the same result
• But their execution times usually differ significantly
2.1. Step 1: Equivalence transformation

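• Equivalence transformation rules
• (1) Conjunctive selection operations can be deconstructed into a sequence of individual selections (cascade of $\sigma$)
• $\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
• (2) Selection operations are commutative
• $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
• (3) Only the final operation in a sequence of projection operations is needed (cascade of $\Pi$)
• $\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$
• (4) Selections can be combined with Cartesian products and theta joins
• $\sigma_{\theta_1}(E_1 \times E_2) = E_1 \bowtie_{\theta_1} E_2$
• $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$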
2.1. Step 1: Equivalence transformation

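• Equivalence transformation rules
• (5) Theta join operations are commutative
• $E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
• (6) Natural join operations are associative
• $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
• Theta joins are associative in the following manner, where $\theta_2$ involves attributes from $E_2$ and $E_3$ only
• $(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$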
2.1. Step 1: Equivalence transformation

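• Equivalence transformation rules
• (7) Selection distributes over joins in the following ways
• If predicate $\theta_1$ involves attributes of $E_1$ only
• $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = \sigma_{\theta_1}(E_1) \bowtie_{\theta_2} E_2$
• If predicate $\theta_1$ involves only attributes of $E_1$ and $\theta_2$ involves only attributes of $E_2$ (a consequence of rules 7 and 1)
• $\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta_3} E_2) = \sigma_{\theta_1}(E_1) \bowtie_{\theta_3} \sigma_{\theta_2}(E_2)$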
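Rule (7) can be checked on a toy instance. The sketch below uses hypothetical relations and helper names (`select`, `join`); it evaluates both sides of the first equation and counts how many tuple pairs the join has to compare in each case.

```python
# Hypothetical toy relations: E1 = employees, E2 = departments.
employees = [
    {"ssnum": 1, "dept": "IT", "salary": 50000},
    {"ssnum": 2, "dept": "IT", "salary": 30000},
    {"ssnum": 3, "dept": "HR", "salary": 45000},
    {"ssnum": 4, "dept": "HR", "salary": 20000},
]
departments = [
    {"dept": "IT", "location": "building A"},
    {"dept": "HR", "location": "building B"},
]

def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def join(pred, left, right):
    """Theta join; also reports how many tuple pairs were compared."""
    out = [{**l, **r} for l in left for r in right if pred(l, r)]
    return out, len(left) * len(right)

theta1 = lambda t: t["salary"] > 40000        # uses attributes of E1 only
theta2 = lambda l, r: l["dept"] == r["dept"]  # join predicate

# Left-hand side: join first, select afterwards.
joined, pairs_late = join(theta2, employees, departments)
late = select(theta1, joined)

# Right-hand side: push the selection below the join.
early, pairs_early = join(theta2, select(theta1, employees), departments)

print(late == early, pairs_late, pairs_early)
```

Both sides return the same tuples, but the pushed-down variant compares half as many pairs on this instance, which is why equivalent expressions can differ in cost.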
2.1. Step 1: Equivalence transformation

• Equivalence transformation rules
• (8) Projection distributes over join as follows
• $\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1}(E_1) \bowtie_{\theta} \Pi_{L_2}(E_2)$
• if $\theta$ involves attributes in $L_1 \cup L_2$ only and $L_i$ contains attributes of $E_i$
• (9) The set operations union and intersection are commutative
• $E_1 \cup E_2 = E_2 \cup E_1$
• $E_1 \cap E_2 = E_2 \cap E_1$
• (10) Union and intersection are associative
• $(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
• $(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
2.1. Step 1: Equivalence transformation

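• Equivalence transformation rules
• (11) The selection operation distributes over union, intersection, and set difference
• $\sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2)$
• $\sigma_{\theta}(E_1 \cap E_2) = \sigma_{\theta}(E_1) \cap \sigma_{\theta}(E_2)$
• $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
• (12) The projection operation distributes over union
• $\Pi_{L}(E_1 \cup E_2) = \Pi_{L}(E_1) \cup \Pi_{L}(E_2)$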
2.2. Step 2: Execution algorithms of RA operations

• An algebra expression is not a query execution plan.
• Additional decisions are required:
• Which indexes to use, for example, for joins and selects?
• Which algorithms to use, for example, sort-merge vs. hash join?
• Materialize intermediate results or pipeline them?
2.2. Step 2: Execution algorithms of RA operations

• Basic Operators
• One-pass operators:
• Scan
• Select
• Project
• Multi-pass operators:
• Join
• Various implementations
• Handling of larger-than-memory sources
• Aggregation, union, etc.
2.2. Step 2: Execution algorithms of RA operations

• 1-pass operators: scanning a table
• Sequential scan: read through the blocks of the table
• Index scan: retrieve tuples in index order
2.2. Step 2: Execution algorithms of RA operations

• Nested-loop JOIN

for each tuple tr in r {
    for each tuple ts in s {
        if (tr and ts satisfy the join condition) {
            add tuple tr x ts to the result set
        }
    }
}

• No index needed
• Works with any type of join condition
• Expensive: O(n·m) tuple comparisons for n tuples in r and m tuples in s, i.e. O(n²) when the relations are of similar size
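The pseudocode translates almost line by line into Python. This is a minimal in-memory sketch; the function name, the tuple layout, and the sample data are assumptions, and a real system works on disk blocks rather than Python lists.

```python
def nested_loop_join(r, s, condition):
    """Join r and s by testing every pair of tuples against the condition."""
    result = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop over s
            if condition(tr, ts):     # any join condition is allowed
                result.append(tr + ts)
    return result

# Hypothetical data: (ssnum, dept) and (dept, location).
employees = [(1, "IT"), (2, "HR"), (3, "IT")]
techdepts = [("IT", "building A")]

pairs = nested_loop_join(employees, techdepts, lambda tr, ts: tr[1] == ts[0])
print(pairs)
```

Because the condition is an arbitrary function, the same code also handles inequality joins such as `tr[0] < ts[0]`.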
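2.2. Step 2: Execution algorithms of RA operations

• Single-loop JOIN (index-based)

for each tuple tr in r {
    look up the tuples ts in s that match tr through the index on s
    for each match ts {
        add tuple tr x ts to the result set
    }
}

• Index needed on the join attribute of s
• Cheaper: O(n log m) with a tree index on s (n tuples in r, m tuples in s)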
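A sketch of the index-based join, assuming a sorted list searched with `bisect` as a stand-in for a tree index (each lookup costs O(log m)). The helper names `build_index` and `index_join` and the sample data are hypothetical.

```python
from bisect import bisect_left

def build_index(s, key):
    """Sorted list of (key, tuple) pairs, standing in for a tree index on s."""
    return sorted(((key(ts), ts) for ts in s), key=lambda entry: entry[0])

def index_join(r, index, key):
    """For each tuple of r, find its matches in s through the index."""
    keys = [k for k, _ in index]
    result = []
    for tr in r:
        k = key(tr)
        pos = bisect_left(keys, k)    # binary search: O(log m)
        while pos < len(index) and keys[pos] == k:
            result.append(tr + index[pos][1])
            pos += 1
    return result

# Hypothetical data: (ssnum, dept) and (dept, location).
employees = [(1, "IT"), (2, "HR"), (3, "IT")]
techdepts = [("IT", "building A"), ("R&D", "building B")]

index = build_index(techdepts, key=lambda ts: ts[0])
matches = index_join(employees, index, key=lambda tr: tr[1])
print(matches)
```

Unlike the nested-loop version, this sketch only supports equality conditions on the indexed attribute.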
2.2. Step 2: Execution algorithms of RA operations

• Sort-merge JOIN
• Requires both inputs to be physically sorted on the join attributes: merge and join the sorted files, reading them sequentially one block at a time
• Maintain two file pointers
• While the tuple at R < the tuple at S, advance R (and vice versa)
• While tuples match, output all possible pairings
• Very efficient for presorted data; otherwise a sort may be required first (adds cost and delay)
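The two-pointer merge can be sketched as follows. The function sorts its inputs itself, which corresponds to the "may require a sort" case; with presorted inputs that step would be skipped. Names and sample data are assumptions for illustration.

```python
def sort_merge_join(r, s, r_key, s_key):
    """Equi-join of r and s by merging the two inputs in key order."""
    r = sorted(r, key=r_key)   # skip if the input is already sorted
    s = sorted(s, key=s_key)
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = r_key(r[i]), s_key(s[j])
        if kr < ks:
            i += 1             # advance R
        elif kr > ks:
            j += 1             # advance S
        else:
            # Find the run of tuples in s that share this key.
            j_end = j
            while j_end < len(s) and s_key(s[j_end]) == kr:
                j_end += 1
            # Output all pairings of matching tuples.
            while i < len(r) and r_key(r[i]) == kr:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result

# Hypothetical data: (ssnum, name) and (ssnum, course).
employees = [(3, "c"), (1, "a"), (2, "b")]
students = [(2, "math"), (3, "physics"), (3, "chemistry"), (5, "art")]

joined = sort_merge_join(employees, students,
                         r_key=lambda t: t[0], s_key=lambda t: t[0])
print(joined)
```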
2.2. Step 2: Execution algorithms of RA operations

• Partition-hash JOIN
• Hash both relations on the join attributes
• Join the tuples in corresponding buckets
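A minimal in-memory sketch of the two steps: both relations are partitioned with the same hash function, and then only buckets with the same number are joined. The bucket count, function names, and data are assumptions; a real system would write partitions to disk when they do not fit in memory.

```python
from collections import defaultdict

def partition(rel, key, n_buckets):
    """Distribute the tuples of rel into buckets by hashing the join key."""
    buckets = defaultdict(list)
    for t in rel:
        buckets[hash(key(t)) % n_buckets].append(t)
    return buckets

def partition_hash_join(r, s, r_key, s_key, n_buckets=4):
    r_parts = partition(r, r_key, n_buckets)
    s_parts = partition(s, s_key, n_buckets)
    result = []
    for b, r_bucket in r_parts.items():
        # Build an in-memory hash table on the matching bucket of s.
        table = defaultdict(list)
        for ts in s_parts.get(b, []):
            table[s_key(ts)].append(ts)
        # Probe it with the tuples of the r bucket.
        for tr in r_bucket:
            for ts in table.get(r_key(tr), []):
                result.append(tr + ts)
    return result

# Hypothetical data: (ssnum, name) and (ssnum, course).
employees = [(1, "a"), (2, "b"), (3, "c")]
students = [(2, "math"), (3, "physics"), (4, "art")]

joined = partition_hash_join(employees, students,
                             r_key=lambda t: t[0], s_key=lambda t: t[0])
print(sorted(joined))
```

Tuples with equal join keys always land in buckets with the same number, so no match is lost by joining bucket by bucket.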
2.2. Step 2: Execution algorithms of RA operations

• Execution strategy: materialization vs. pipelining
• The execution strategy defines how to walk the query execution plan
• Materialization
• Pipelining

[Figure: query execution plan tree. Root: Join (PressRel.Symbol = EastCoast.CoSymbol). Its inputs are a Join (PressRel.Symbol = Clients.Symbol) over Scan PressRel and a Select (Client = "Atkins") on Scan Clients, and a Project (CoSymbol) on Scan EastCoast.]
2.2. Step 2: Execution algorithms of RA operations

• Materialization
• Performs the innermost (leaf-level) operations of the query execution plan first
• The intermediate result of each operation is materialized into a temporary relation and becomes the input of subsequent operations
• The cost of materialization is the sum of the costs of the individual operations plus the cost of writing the intermediate results to disk
• Lots of temporary files, lots of I/O
2.2. Step 2: Execution algorithms of RA operations

• Pipelining
• Operations form a queue, and results are passed from one operation to the next as they are computed
• Pipelining restructures the individual operation algorithms so that they take streams of tuples as both input and output
• Limitation
• Algorithms that require sorting can only be pipelined if the input is already sorted,
• since sorting, by nature, cannot be performed until all tuples to be sorted are known
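The contrast between the two strategies can be sketched with Python lists (materialization) and generators (pipelining). The operator functions and the sample `clients` data are hypothetical; the lists stand in for temporary relations written to disk.

```python
def scan(table):
    """Produce the tuples of a table one at a time."""
    for t in table:
        yield t

def select(pred, source):
    """Pass on only the tuples that satisfy pred."""
    for t in source:
        if pred(t):
            yield t

def project(cols, source):
    """Keep only the listed column positions."""
    for t in source:
        yield tuple(t[c] for c in cols)

# Hypothetical Clients(Client, Symbol) relation.
clients = [("Atkins", "sym1"), ("Baker", "sym2"), ("Atkins", "sym3")]

# Materialization: each operator finishes and stores its full result.
temp1 = list(scan(clients))
temp2 = [t for t in temp1 if t[0] == "Atkins"]
materialized = [(t[1],) for t in temp2]

# Pipelining: operators are chained; a tuple flows through all of them
# before the next tuple is read.
pipeline = project([1], select(lambda t: t[0] == "Atkins", scan(clients)))
first = next(pipeline)   # available before the scan has finished
rest = list(pipeline)

print(materialized, first, rest)
```

A sort operator would not fit this pattern: it would have to consume its whole input before yielding the first tuple.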
2.3. Step 3: Cost estimation

• Each relational algebra expression can result in many query execution plans
• Some query execution plans may be better than others
• Goal: find the fastest one
• The cost is only an estimate, made under certain assumptions
• A huge number of query plans may exist
2.3. Step 3: Cost estimation

• Cost estimation factors
• Catalog information: the database maintains statistics about relations
• Ex.
• Number of tuples per relation
• Number of blocks on disk per relation
• Number of distinct values per attribute
• Histogram of values per attribute
• Problems
• Cost can only be estimated
• Updating statistics is expensive, so they are often out of date
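As an illustration of how such statistics feed an estimate, the sketch below applies two textbook formulas that assume uniformly distributed values: an equality selection returns about n / V tuples, and an equi-join returns about n_r · n_s / max(V_r, V_s) tuples, where V is the number of distinct values of the attribute. The function names and all numbers are made up for the example.

```python
def estimate_equality_selection(n_tuples, n_distinct):
    """Estimated size of sigma_{A = v}(r), assuming uniformly distributed values."""
    return n_tuples / n_distinct

def estimate_equijoin(n_r, n_s, distinct_r, distinct_s):
    """Estimated size of an equi-join of r and s on attribute A."""
    return n_r * n_s / max(distinct_r, distinct_s)

# Hypothetical catalog statistics.
employee_rows, employee_depts = 10_000, 50
techdept_rows, techdept_depts = 10, 10

selection_size = estimate_equality_selection(employee_rows, employee_depts)
join_size = estimate_equijoin(employee_rows, techdept_rows,
                              employee_depts, techdept_depts)
print(selection_size, join_size)
```

If the real data is skewed or the statistics are stale, these estimates can be far off, which is the problem the slide points out.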
2.3. Step 3: Cost estimation

• Choosing the cheapest query plan
• Problem:
• Estimating the cost of all possible plans is too expensive
• Solutions:
• Pruning: stop evaluating a plan early
• Heuristics: do not evaluate all plans
• Real databases use a combination of both:
• Apply heuristics to choose promising query plans
• Choose the cheapest plan among the promising plans using pruning
• Examples of heuristics:
• Perform selections as early as possible
• Perform projections early
• Avoid Cartesian products
2.3. Step 3: Cost estimation

• Heuristic rules
• Break apart conjunctive selections into a sequence of simple selections
• Move σ as far down the query tree as possible
• Replace σ-× pairs by ⋈
• Break apart Π and move it as far down the tree as possible
• Perform the joins with the smallest expected result first
Remark

• Query processing is the entire set of activities involved in retrieving data from the database
• Parser
• Optimizer
• Code generator
• Query optimizer
• Step 1: Equivalence transformation
• Step 2: Annotation of the RA expression with execution algorithms
• Step 3: Cost estimation for the different query execution plans
Why query tuning? Why is the query optimizer not enough?

• Optimizers are not perfect:
• Transformations produce only a subset of all possible query plans
• Only a subset of the possible annotations might be considered
• The cost of query plans can only be estimated
• Query tuning: make life easier for your query optimizer!

Figure out problematic queries

• Which queries should be rewritten?
• Rewrite queries that run “too slow”
• How to find these queries?
• The query issues far too many disk accesses, for example, a point query scans an entire table
• The query plan shows that relevant indexes are not used
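Most systems can print the plan they chose. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from Python as one example; the schema, the index name `idx_employee_name`, and the helper `plan` are assumptions, and other systems use different commands and output formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)

def plan(sql):
    """Return the 'detail' column of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

point_query = "SELECT ssnum FROM Employee WHERE name = 'Smith'"

before = plan(point_query)   # no index on name yet: full table scan
conn.execute("CREATE INDEX idx_employee_name ON Employee(name)")
after = plan(point_query)    # the index on name is now used

print(before)
print(after)
```

A plan line starting with SCAN for a point query is the warning sign described above; after the index is created, the plan names the index instead.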

Overview of query tuning

• avoid DISTINCTs
• subqueries often inefficient
• temporary tables might help
• use clustering indexes for joins
• HAVING vs. WHERE
• use views with care
• system peculiarities: OR and order in FROM clause

Testbed scenario

• Employee(ssnum, name, manager, dept, salary, numfriends)
• clustering index on ssnum
• non-clustering index on name
• non-clustering index on dept
• keys: ssnum, name
• Student(ssnum, name, course, grade)
• clustering index on ssnum
• non-clustering index on name
• keys: ssnum, name
• Techdept(dept, manager, location)
• clustering index on dept
• key: dept
• a manager may manage many departments
• a location may contain many departments

DISTINCT

• How can DISTINCT hurt?
• DISTINCT forces a sort or other overhead.
• If it is not necessary, it should be avoided.
• Query: Find employees who work in the information systems department.
• SELECT DISTINCT ssnum
FROM Employee
WHERE dept = 'information systems'
• DISTINCT is not necessary:
• ssnum is a key of Employee, so it is also a key of any subset of Employee.
• Note: Since an index is defined on ssnum, there is likely to be no overhead in this particular example.
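A quick check of the claim on a toy instance, using SQLite from Python. The sample rows are hypothetical, and ORDER BY is added only to make the two results directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Ann", "information systems"),
     (2, "Bob", "sales"),
     (3, "Cy", "information systems")],
)

with_distinct = conn.execute(
    "SELECT DISTINCT ssnum FROM Employee "
    "WHERE dept = 'information systems' ORDER BY ssnum"
).fetchall()

without_distinct = conn.execute(
    "SELECT ssnum FROM Employee "
    "WHERE dept = 'information systems' ORDER BY ssnum"
).fetchall()

print(with_distinct == without_distinct)
```

Because ssnum is a key, no two result rows can be equal, so DISTINCT has nothing to remove.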

Non-Correlated Subqueries
• Many systems handle subqueries inefficiently.
• Non-correlated: attributes of the outer query are not used in the inner query.
• Query:
• SELECT ssnum
FROM Employee
WHERE dept IN (SELECT dept FROM Techdept)
• May lead to inefficient evaluation:
• Check for each employee whether their department is in Techdept
• The index on Employee.dept is not used!
• Equivalent query (dept is a key of Techdept, so the join produces no duplicates):
• SELECT ssnum
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept
• Efficient evaluation:
• Look up the employees for each dept in Techdept
• Use the index on Employee.dept
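The equivalence of the two formulations can be checked on a small instance with SQLite. The rows and the index name are hypothetical, and whether the join is actually faster depends on the system, as the slide says.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE INDEX idx_employee_dept ON Employee(dept);
    CREATE TABLE Techdept (dept TEXT PRIMARY KEY, manager TEXT, location TEXT);
    INSERT INTO Employee VALUES (1, 'Ann', 'IT'), (2, 'Bob', 'sales'), (3, 'Cy', 'R&D');
    INSERT INTO Techdept VALUES ('IT', 'Dee', 'A'), ('R&D', 'Eve', 'B');
""")

with_subquery = sorted(conn.execute(
    "SELECT ssnum FROM Employee "
    "WHERE dept IN (SELECT dept FROM Techdept)"
).fetchall())

with_join = sorted(conn.execute(
    "SELECT ssnum FROM Employee, Techdept "
    "WHERE Employee.dept = Techdept.dept"
).fetchall())

print(with_subquery == with_join)
```

If dept were not a key of Techdept, the join could return an employee more than once and the two queries would no longer be equivalent.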

Temporary tables

• Temporary tables can hurt in the following ways:
• They force operations to be performed in a suboptimal order (the optimizer often does a very good job!)
• In some systems, creating a temporary table causes a catalog update, a possible concurrency control bottleneck
• The system may miss an opportunity to use an index
• Temporary tables are good:
• To rewrite complicated correlated subqueries
• To avoid ORDER BYs and scans in specific cases (see example)

Ex. Unnecessary temp table

• Query: Find all IT department employees who earn more than 40000.
• SELECT * INTO Temp
FROM Employee
WHERE salary > 40000
SELECT ssnum
FROM Temp
WHERE Temp.dept = 'IT'
• This SQL is inefficient:
• The index on dept cannot be used
• Overhead of creating the Temp table (materialization vs. pipelining)
• Efficient SQL:
• SELECT ssnum
FROM Employee
WHERE Employee.dept = 'IT'
AND salary > 40000
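The lost index opportunity can be made visible with SQLite's query plans. This is a sketch under several assumptions: SQLite has no `SELECT ... INTO`, so `CREATE TEMP TABLE ... AS` plays that role; the temporary table is named `TempEmp` here; and the sample rows and index name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    CREATE INDEX idx_employee_dept ON Employee(dept);
    INSERT INTO Employee VALUES
        (1, 'Ann', 'IT', 50000), (2, 'Bob', 'IT', 30000), (3, 'Cy', 'sales', 60000);
    -- Stand-in for SELECT * INTO Temp (assumption: SQLite syntax differs).
    CREATE TEMP TABLE TempEmp AS SELECT * FROM Employee WHERE salary > 40000;
""")

def plan(sql):
    """Return the 'detail' column of SQLite's query plan for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

via_temp_sql = "SELECT ssnum FROM TempEmp WHERE TempEmp.dept = 'IT'"
direct_sql = ("SELECT ssnum FROM Employee "
              "WHERE Employee.dept = 'IT' AND salary > 40000")

via_temp = conn.execute(via_temp_sql).fetchall()
direct = conn.execute(direct_sql).fetchall()
temp_plan = plan(via_temp_sql)
direct_plan = plan(direct_sql)

print(via_temp == direct)
print(temp_plan)
print(direct_plan)
```

Both variants return the same employees, but the temporary table carries no index, so the second step has to scan it, while the single query can use the index on dept.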

Joins: Use clustering indexes and numeric values

• Query: Find all students who are also employees.
• Inefficient SQL:
• SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.name = Student.name
• Efficient SQL:
• SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.ssnum = Student.ssnum
• Benefits:
• A join on two clustering indexes allows a merge join (fast!).
• Numerical equality is evaluated faster than string equality.

Don’t use HAVING where WHERE is enough

• Query: Find the average salary of the IT department.
• Inefficient SQL:
• SELECT AVG(salary) AS avgsalary, dept
FROM Employee
GROUP BY dept
HAVING dept = 'IT'
• Problem: May first compute the average for the employees of all departments.
• Efficient SQL: Compute the average only for the relevant employees.
• SELECT AVG(salary) AS avgsalary, dept
FROM Employee
WHERE dept = 'IT'
GROUP BY dept
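Both formulations return the same row, which can be verified with SQLite on hypothetical data. The sketch only checks equivalence; whether a system really aggregates all departments first for the HAVING variant depends on its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO Employee VALUES
        (1, 'Ann', 'IT', 50000), (2, 'Bob', 'IT', 30000), (3, 'Cy', 'sales', 60000);
""")

with_having = conn.execute(
    "SELECT AVG(salary) AS avgsalary, dept FROM Employee "
    "GROUP BY dept HAVING dept = 'IT'"
).fetchall()

with_where = conn.execute(
    "SELECT AVG(salary) AS avgsalary, dept FROM Employee "
    "WHERE dept = 'IT' GROUP BY dept"
).fetchall()

print(with_having == with_where)
```

HAVING remains necessary for conditions on aggregates, such as `HAVING AVG(salary) > 40000`, which cannot be moved into WHERE.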

Use views with care

• Views: macros for queries
• Queries look simpler
• But they are never faster and sometimes slower
• Creating a view:
• CREATE VIEW Techlocation
AS SELECT ssnum, Techdept.dept, location
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept
• Using the view:
• SELECT location
FROM Techlocation
WHERE ssnum = 452354786
• System expands view and executes:
• SELECT location
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept
AND ssnum = 452354786

• Query: Get the department name for the employee with social security number 452354786 (who works in a technical department).
• Example of inefficient SQL:
• SELECT dept
FROM Techlocation
WHERE ssnum = 452354786
• This SQL expands to:
• SELECT Techdept.dept
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept
AND ssnum = 452354786
• But there is more efficient SQL (no join!) that does the same thing:
• SELECT dept
FROM Employee
WHERE ssnum = 452354786
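The sketch below runs both queries in SQLite on hypothetical rows. It also shows why the slide adds "who works in a technical department": for any other employee the view returns nothing, so the rewrite is only valid under that assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE TABLE Techdept (dept TEXT PRIMARY KEY, manager TEXT, location TEXT);
    INSERT INTO Employee VALUES (452354786, 'Ann', 'IT'), (2, 'Bob', 'sales');
    INSERT INTO Techdept VALUES ('IT', 'Dee', 'A');
    CREATE VIEW Techlocation AS
        SELECT ssnum, Techdept.dept AS dept, location
        FROM Employee, Techdept
        WHERE Employee.dept = Techdept.dept;
""")

via_view = conn.execute(
    "SELECT dept FROM Techlocation WHERE ssnum = 452354786").fetchall()
direct = conn.execute(
    "SELECT dept FROM Employee WHERE ssnum = 452354786").fetchall()

# An employee outside the technical departments (hypothetical ssnum 2).
via_view_other = conn.execute(
    "SELECT dept FROM Techlocation WHERE ssnum = 2").fetchall()
direct_other = conn.execute(
    "SELECT dept FROM Employee WHERE ssnum = 2").fetchall()

print(via_view, direct, via_view_other, direct_other)
```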

System peculiarity: Indexes and OR

• Some systems never use indexes when conditions are OR-connected.
• Query: Find employees with the name Smith or who are in the acquisitions department.
• SELECT Employee.ssnum
FROM Employee
WHERE Employee.name = 'Smith'
OR Employee.dept = 'acquisitions'
• Fix: use UNION instead of OR
• SELECT Employee.ssnum
FROM Employee
WHERE Employee.name = 'Smith'
UNION
SELECT Employee.ssnum
FROM Employee
WHERE Employee.dept = 'acquisitions'
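That the rewrite preserves the result can be checked on a small hypothetical instance with SQLite. The employee named Smith also works in acquisitions here, so the example exercises the duplicate elimination that UNION performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT UNIQUE, dept TEXT);
    INSERT INTO Employee VALUES
        (1, 'Smith', 'acquisitions'), (2, 'Jones', 'acquisitions'), (3, 'Lee', 'IT');
""")

with_or = sorted(conn.execute(
    "SELECT Employee.ssnum FROM Employee "
    "WHERE Employee.name = 'Smith' OR Employee.dept = 'acquisitions'"
).fetchall())

with_union = sorted(conn.execute(
    "SELECT Employee.ssnum FROM Employee WHERE Employee.name = 'Smith' "
    "UNION "
    "SELECT Employee.ssnum FROM Employee WHERE Employee.dept = 'acquisitions'"
).fetchall())

print(with_or == with_union)
```

UNION ALL would return employee 1 twice; plain UNION removes the duplicate, at the price of the duplicate-elimination overhead discussed for DISTINCT.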
