3 - Query Tuning
Viet-Trung Tran
SoICT
[Figure: query processing pipeline. SQL → Parser → Query plan → Optimizer → Optimized execution plan → Code Generator]
• Scans the query, splits it into individual tokens, and checks that the
query is correct
• Does it contain the right keywords?
• Does it conform to the syntax?
• Does it refer to valid tables and attributes?
• Output: Query plan
• E.g.
• Input: SELECT balance FROM account WHERE balance < 2500
• Output: Relational algebra expression
• But it is not unique: several equivalent expressions exist (e.g. select then project, or project then select)
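To illustrate why the output is not unique, here is a minimal sketch that evaluates two equivalent relational algebra expressions for the example query. The rows, the `account_number` attribute, and the helper names `select`, `project`, and `cheap` are assumptions made for illustration only.

```python
# Toy "account" relation; the rows are made-up illustration data.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 3500},
]

def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Relational projection: keep only the listed attributes (duplicates kept, as in SQL)."""
    return [{a: row[a] for a in attributes} for row in rows]

def cheap(row):
    """The WHERE condition of the example query."""
    return row["balance"] < 2500

# Expression 1: select first, then project.
plan_a = project(select(account, cheap), ["balance"])
# Expression 2: project first, then select.
plan_b = select(project(account, ["balance"]), cheap)

print(plan_a == plan_b)
```

Both expressions return the same tuples, so the optimizer is free to choose between them.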
1.4. Optimizer
• Input: RA expression
• Basic Operators
• One-pass operators:
• Scan
• Select
• Project
• Multi-pass operators:
• Join
• Various implementations
• Handling of larger-than-memory sources
• Aggregation, union, etc.
2.2. Step 2: Execution algorithms of RA operations
• Nested-loop JOIN
• No index needed
• Any join condition types
• Expensive: O(n²)
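A minimal in-memory sketch of the nested-loop join. The relations and the name `nested_loop_join` are hypothetical, and a real DBMS works on disk blocks rather than Python lists.

```python
def nested_loop_join(r, s, condition):
    """Join two lists of tuples by testing every (r, s) pair against the condition."""
    result = []
    for r_tuple in r:          # outer loop: one pass over R
        for s_tuple in s:      # inner loop: a full pass over S for each R tuple
            if condition(r_tuple, s_tuple):
                result.append(r_tuple + s_tuple)
    return result

# Hypothetical data: (name, dept_id) and (dept_id, dept_name).
employees = [("Ann", 1), ("Bob", 2), ("Cid", 3)]
departments = [(1, "Sales"), (2, "IT")]

# Equality join condition.
equi = nested_loop_join(employees, departments, lambda e, d: e[1] == d[0])
# Any other condition works too, e.g. an inequality.
theta = nested_loop_join(employees, departments, lambda e, d: e[1] > d[0])
print(equi)
print(theta)
```

The condition is an arbitrary function, which is why this algorithm supports any join condition, and every pair is tested, which is where the quadratic cost comes from.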
• Sort-merge JOIN
• Requires data physically sorted by join attributes: Merge and join
sorted files, reading sequentially a block at a time
• Maintain two file pointers
• While tuple at R < tuple at S, advance R (and vice versa)
• While tuples match, output all possible pairings
• Very efficient for presorted data. Otherwise, may require a sort (adds
cost + delay)
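The two-pointer merge described above can be sketched as follows. This is an in-memory illustration with hypothetical data. The name `sort_merge_join` and the key functions are assumptions, and the explicit `sorted` calls stand in for the extra sort needed when the input is not presorted.

```python
def sort_merge_join(r, s, r_key, s_key):
    """Equi-join two lists of tuples by sorting both and merging them."""
    r = sorted(r, key=r_key)   # unnecessary if the data is already sorted
    s = sorted(s, key=s_key)
    result = []
    i = j = 0                  # the two "file pointers"
    while i < len(r) and j < len(s):
        if r_key(r[i]) < s_key(s[j]):
            i += 1             # tuple at R is smaller: advance R
        elif r_key(r[i]) > s_key(s[j]):
            j += 1             # tuple at S is smaller: advance S
        else:
            key = r_key(r[i])
            # Find the run of tuples with the same key on each side.
            i_end = i
            while i_end < len(r) and r_key(r[i_end]) == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s_key(s[j_end]) == key:
                j_end += 1
            # Output all possible pairings of the two runs.
            for r_tuple in r[i:i_end]:
                for s_tuple in s[j:j_end]:
                    result.append(r_tuple + s_tuple)
            i, j = i_end, j_end
    return result

r = [(3, "c"), (1, "a"), (2, "b"), (2, "B")]
s = [(2, "x"), (4, "w"), (2, "y"), (1, "z")]
joined = sort_merge_join(r, s, lambda t: t[0], lambda t: t[0])
print(joined)
```

After sorting, each input is read once from start to end, which matches the sequential block-at-a-time access pattern.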
• Partition-hash JOIN
• Hash two relations on join attributes
• Join buckets accordingly
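A minimal sketch of the two phases, assuming in-memory lists instead of disk partitions. The function names, the bucket count, and the data are hypothetical. Unlike the nested-loop join, this approach only handles equality conditions.

```python
from collections import defaultdict

def partition(rows, key, n_buckets):
    """Phase 1: hash each tuple on its join attribute into one of n_buckets buckets."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(key(row)) % n_buckets].append(row)
    return buckets

def partition_hash_join(r, s, r_key, s_key, n_buckets=4):
    """Equi-join: matching tuples always land in buckets with the same number."""
    r_parts = partition(r, r_key, n_buckets)
    s_parts = partition(s, s_key, n_buckets)
    result = []
    # Phase 2: join each pair of corresponding buckets.
    for bucket_no, r_bucket in r_parts.items():
        # Build an in-memory hash table on the R bucket.
        table = defaultdict(list)
        for r_tuple in r_bucket:
            table[r_key(r_tuple)].append(r_tuple)
        # Probe it with the tuples of the matching S bucket.
        for s_tuple in s_parts.get(bucket_no, []):
            for r_tuple in table.get(s_key(s_tuple), []):
                result.append(r_tuple + s_tuple)
    return result

r = [(1, "a"), (2, "b"), (5, "e"), (2, "B")]
s = [(2, "x"), (5, "y"), (7, "z")]
joined = partition_hash_join(r, s, lambda t: t[0], lambda t: t[0])
print(sorted(joined))
```

Only tuples in corresponding buckets are compared, so each bucket pair can be joined independently and can fit in memory even when the full relations do not.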
[Figure: query plan tree with the operators Join (PressRel.Symbol = EastCoast.CoSymbol), Join (PressRel.Symbol = Clients.Symbol), Project (CoSymbol), and Select (Client = “Atkins”)]
• Materialization
• Performs the innermost (leaf-level) operations of the query
execution plan first
• The intermediate result of each operation is materialized into
temporary relation and becomes input for subsequent operations.
• The cost of materialization is the sum of the individual operations plus
the cost of writing the intermediate results to disk
• Drawback: lots of temporary files, lots of I/O
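A rough sketch of materialized evaluation, assuming an in-memory dictionary in place of temporary files on disk. The relation names follow the plan tree above. The column layouts, the rows, and the helper `materialize` are made up for illustration.

```python
def materialize(name, rows, temp_relations):
    """Store a complete intermediate result, standing in for a temporary relation on disk."""
    temp_relations[name] = list(rows)
    return temp_relations[name]

# Hypothetical contents: Clients(Client, Symbol), PressRel(Symbol, Headline).
clients = [("Atkins", "IBM"), ("Atkins", "AAPL"), ("Baker", "IBM")]
press_rel = [("IBM", "earnings"), ("MSFT", "merger"), ("AAPL", "launch")]

temps = {}
# Leaf-level operation first: Select Client = "Atkins".
t1 = materialize("t1", (c for c in clients if c[0] == "Atkins"), temps)
# The next operation reads the finished temporary relation t1 as its input.
t2 = materialize(
    "t2",
    (p + c for p in press_rel for c in t1 if p[0] == c[1]),
    temps,
)

# Every intermediate tuple was written out before the next operator started.
tuples_written = sum(len(rows) for rows in temps.values())
print(t2, tuples_written)
```

Each operator runs to completion before the next one starts, so the number of tuples written grows with every intermediate result.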
• Pipelining
• Operations form a queue, and results are passed from one operation
to another as they are calculated
• Pipelining restructures the individual operation algorithms so that they
take streams of tuples as both input and output.
• Limitation
• Algorithms that require sorted input can be pipelined only if the input is
already sorted
• A sort cannot emit its first tuple until all tuples to be sorted are known
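Python generators give a compact sketch of pipelined operators: each one consumes a stream of tuples and yields a stream of tuples. The operator names, the `trace` list, and the data are assumptions for illustration. The final part shows why a sort blocks the pipeline.

```python
trace = []  # records which tuples have been read from the base relation

def scan(rows):
    """Leaf operator: stream the tuples of a stored relation one at a time."""
    for row in rows:
        trace.append(f"scan {row[0]}")
        yield row

def select(rows, predicate):
    """Pass on each tuple that satisfies the predicate as soon as it arrives."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, indexes):
    """Keep only the requested attribute positions of each tuple."""
    for row in rows:
        yield tuple(row[i] for i in indexes)

def sort_rows(rows, key):
    """Blocking operator: must consume its whole input before emitting anything."""
    yield from sorted(rows, key=key)

account = [("A-101", 500), ("A-215", 700), ("A-305", 3500)]

# Pipelined plan: no intermediate result is stored.
pipeline = project(select(scan(account), lambda r: r[1] < 2500), [1])
first = next(pipeline)
scans_before_first_result = list(trace)

# With a sort in the plan, the first result needs the full input.
trace.clear()
sorted_pipeline = sort_rows(scan(account), key=lambda r: r[1])
first_sorted = next(sorted_pipeline)
scans_before_first_sorted = list(trace)
print(scans_before_first_result, scans_before_first_sorted)
```

The first pipeline produces a result after reading a single tuple, while the sorted one reads the whole relation first.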
2.3. Step 3: Cost estimation
• Heuristic rules
• Break apart conjunctive selections into a sequence of simple selections
• Move 𝜎 as far down the query tree as possible
• Replace 𝜎-× pairs (a selection directly above a Cartesian product) by ⋈
• Break apart Π and move it as far down the tree as possible
• Perform the joins with the smallest expected result first
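The effect of the first heuristics can be seen on a toy example. The sketch below compares a plan that applies the selection on top of a Cartesian product with a plan that pushes the selection down and evaluates the 𝜎-× pair as a join. The relation names follow the earlier plan tree. The column layouts and the rows are made up.

```python
from itertools import product

# Hypothetical contents: Clients(Client, Symbol), PressRel(Symbol, Headline).
clients = [("Atkins", "IBM"), ("Atkins", "AAPL"), ("Baker", "IBM")]
press_rel = [("IBM", "earnings"), ("MSFT", "merger"), ("AAPL", "launch")]

# Plan A: Cartesian product first, whole conjunctive selection on top.
cross = list(product(press_rel, clients))
plan_a = [p + c for p, c in cross if p[0] == c[1] and c[0] == "Atkins"]

# Plan B: split the conjunction, push Client = "Atkins" down to Clients,
# and evaluate the remaining condition as a join instead of sigma over x.
atkins = [c for c in clients if c[0] == "Atkins"]
plan_b = [p + c for p in press_rel for c in atkins if p[0] == c[1]]

print(len(cross), len(atkins), plan_a == plan_b)
```

Both plans return the same tuples, but plan A builds a nine-tuple intermediate result while plan B only keeps the two selected client tuples before joining.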
Remark
• avoid DISTINCTs
• subqueries are often inefficient
• temporary tables might help
• use clustering indexes for joins
• HAVING vs. WHERE: put conditions that do not involve aggregates in WHERE
• use views with care
• system peculiarities: handling of OR and the order of tables in the FROM clause
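As an illustration of the HAVING vs. WHERE remark, the sketch below runs two equivalent queries with Python's built-in `sqlite3` module. The table layout and the rows are made up. Whether a given system rewrites the HAVING form automatically depends on its optimizer, so no timing claim is made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("Hanoi", 500), ("Hanoi", 700), ("Hue", 3500), ("Hue", 100)],
)

# The condition does not involve an aggregate, yet it sits in HAVING:
# conceptually, every branch is grouped and aggregated before filtering.
with_having = conn.execute(
    "SELECT branch, AVG(balance) FROM account "
    "GROUP BY branch HAVING branch = 'Hanoi'"
).fetchall()

# Same condition in WHERE: rows are filtered before grouping.
with_where = conn.execute(
    "SELECT branch, AVG(balance) FROM account "
    "WHERE branch = 'Hanoi' GROUP BY branch"
).fetchall()

print(with_having, with_where)
conn.close()
```

The results are identical, but the WHERE form lets the system discard rows of other branches before any grouping work is done.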