
Query Processing and Optimization

The document discusses query execution plans and query optimization techniques. It covers:
- Query execution plans represent queries as trees using relational algebra and specify access methods.
- Query optimization uses heuristics, like pushing selections before joins, and cost-based methods that choose the lowest-cost plan.
- Cost estimates consider factors like access, memory, storage, and computation costs. Selectivity estimates help determine the best join order.

Uploaded by Fazal Mahar
Copyright: © Attribution Non-Commercial (BY-NC)

Lecture 26

Query Processing and Optimization

Query Execution Plans

A query execution plan is a combination of:

- A relational algebra representation of the query (i.e., a tree)
  - The tree imposes an order on the operations
- Information about the access method to be used for each relation
  - How to access every relation, e.g., PRIMARY INDEX, CLUSTERING INDEX, SECONDARY INDEX
  - Why not index all fields? Every index takes storage and must be updated whenever the file is modified.
- The methods for computing the relational operators stored in the tree
  - There are many algorithms for joins, sorting, selection, etc.

Query Execution Plans

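Searching algorithms
- Linear search: about #recs/2 comparisons on average; binary search: about log2(#recs) comparisons, but the records must be sorted on the search key
- Use an index or not?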
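The comparison counts above can be checked with a small sketch. It assumes the file is an in-memory Python list and counts key comparisons; a DBMS would count disk block accesses instead, but the roughly #recs/2 versus log2(#recs) shape is the same. The 1,024-record file is hypothetical.

```python
def linear_search(records, key):
    """Scan records front to back; return (position or None, comparisons)."""
    comparisons = 0
    for i, rec in enumerate(records):
        comparisons += 1
        if rec == key:
            return i, comparisons
    return None, comparisons


def binary_search(sorted_records, key):
    """Halve the search range on each probe; records must be sorted on key."""
    lo, hi = 0, len(sorted_records) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1  # one probe of the middle record
        if sorted_records[mid] == key:
            return mid, comparisons
        if sorted_records[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, comparisons


n = 1024
records = list(range(n))  # hypothetical file: keys 0..1023, already sorted
avg_linear = sum(linear_search(records, k)[1] for k in records) / n
worst_binary = max(binary_search(records, k)[1] for k in records)
print(avg_linear, worst_binary)  # 512.5 11, i.e. about n/2 vs about log2(n)
```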

Joins
Nested loop join: R JOIN S

  for each tuple tr in R do
      for each tuple ts in S do
          test the pair (tr, ts) to see if they can be joined
          if so, add (tr, ts) to the result
      loop
  loop

E.g., DEPARTMENT JOIN EMPLOYEE
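A runnable version of the nested loop join, as a minimal sketch: it assumes relations are Python lists of dicts and that the join condition is passed in as a function. The DEPARTMENT and EMPLOYEE rows are hypothetical sample data.

```python
def nested_loop_join(r, s, condition):
    """Join every tuple of r with every tuple of s that satisfies condition.

    r and s are lists of dicts (one dict per tuple); condition is a
    function (tr, ts) -> bool. Each matching pair is merged into one dict.
    """
    result = []
    for tr in r:                    # outer loop over R
        for ts in s:                # inner loop over all of S, once per tr
            if condition(tr, ts):   # test the pair (tr, ts)
                result.append({**tr, **ts})
    return result


department = [
    {"DNUMBER": 1, "DNAME": "Headquarters"},
    {"DNUMBER": 5, "DNAME": "Research"},
]
employee = [
    {"SSN": "123", "LNAME": "Smith", "DNO": 5},
    {"SSN": "888", "LNAME": "Borg", "DNO": 1},
    {"SSN": "453", "LNAME": "English", "DNO": 5},
]

joined = nested_loop_join(department, employee,
                          lambda d, e: d["DNUMBER"] == e["DNO"])
for row in joined:
    print(row["DNAME"], row["LNAME"])
```

This prints Headquarters Borg, Research Smith, Research English. The inner loop runs |R| × |S| times whatever the number of matches, which is why the choice of join algorithm (and of the outer relation) matters.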

Query Execution Plans

SELECT salary FROM Employee WHERE salary < 25000

Plan (read from the bottom up):
  π salary
    σ salary < 25000   (use the secondary index on salary; use binary search)
      Employee
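The plan above can be sketched with Python's `bisect` module standing in for the secondary index. This assumes the index is an in-memory sorted list of (salary, record position) pairs; a real DBMS would typically use an on-disk B+-tree. The EMPLOYEE rows and the function name `salaries_below` are hypothetical.

```python
import bisect

# Hypothetical EMPLOYEE file, stored in arbitrary (unsorted) order.
employee = [
    {"LNAME": "Smith", "SALARY": 30000},
    {"LNAME": "Wong", "SALARY": 18000},
    {"LNAME": "Zelaya", "SALARY": 25000},
    {"LNAME": "Wallace", "SALARY": 43000},
    {"LNAME": "English", "SALARY": 22000},
]

# Secondary index on SALARY: (key, record position) pairs sorted by key.
salary_index = sorted((rec["SALARY"], pos) for pos, rec in enumerate(employee))
index_keys = [key for key, _ in salary_index]


def salaries_below(limit):
    """π salary (σ salary < limit (EMPLOYEE)) evaluated through the index."""
    # Binary search for the first index entry with key >= limit.
    end = bisect.bisect_left(index_keys, limit)
    # Every entry before `end` qualifies; follow its pointer to the record.
    return [employee[pos]["SALARY"] for _, pos in salary_index[:end]]


print(salaries_below(25000))  # [18000, 22000]
```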

Query Optimization

Query optimization can be achieved through two techniques:

Using heuristic rules
- Reorder the operations in the internal representation of a query (tree or graph) to improve performance.
- A heuristic rule works well in MOST cases, but it is NOT GUARANTEED to work in ALL possible cases.
- E.g., selections before joins → better efficiency

Using cost estimations
- Find the costs of the different execution strategies and choose the one with the lowest cost.
- This is computationally intensive.

Most DBMSs combine both.

Issues with Heuristics

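Which is better?

(1) Join with a selection afterwards:
$\sigma_{R.A=?}(R \bowtie_{R.B=S.B} S)$

(2) Selection with a join afterwards:
$(\sigma_{R.A=?}(R)) \bowtie_{R.B=S.B} S$

What if S is very small compared to R, and there is an index on B in R but no index on A?
- (2) Applying the selection first requires a linear scan of all of R.
- (1) The join can use the index on R.B to fetch only the R tuples that match the few S tuples, and then reject the results that don't satisfy the selection.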
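The trade-off can be made concrete by counting how many R tuples each plan reads. A minimal sketch, assuming in-memory relations, a Python dict standing in for the index on R.B, and hypothetical data in which R has 10,000 tuples and S only two; the cost of probing the index itself is not counted.

```python
from collections import defaultdict

# Hypothetical data: R is large, S is tiny.
R = [{"A": i % 7, "B": i} for i in range(10_000)]
S = [{"B": 42}, {"B": 4_200}]

# Index on R.B (value -> list of R tuples); there is no index on R.A.
index_on_b = defaultdict(list)
for t in R:
    index_on_b[t["B"]].append(t)


def plan_join_then_select(a_value):
    """Plan (1): probe the index on R.B for each S tuple, then filter on R.A."""
    touched, result = 0, []
    for s in S:
        for r in index_on_b.get(s["B"], []):
            touched += 1
            if r["A"] == a_value:
                result.append((r["A"], r["B"]))
    return result, touched


def plan_select_then_join(a_value):
    """Plan (2): linear scan of all of R for R.A = a_value, then join with S."""
    touched, selected = 0, []
    for r in R:
        touched += 1
        if r["A"] == a_value:
            selected.append(r)
    s_keys = {s["B"] for s in S}
    result = [(r["A"], r["B"]) for r in selected if r["B"] in s_keys]
    return result, touched


print(plan_join_then_select(0))   # ([(0, 42), (0, 4200)], 2)
print(plan_select_then_join(0))   # ([(0, 42), (0, 4200)], 10000)
```

Both plans return the same tuples, but plan (1) reads 2 tuples of R while plan (2) reads all 10,000, so the "selections first" heuristic loses in this case.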

Query Representation

Query tree
- A tree data structure that corresponds to a relational algebra expression
- The input relations of the query are the leaf nodes of the tree; the relational algebra operations are the internal nodes

An execution of the query tree consists of executing an internal node operation whenever its operands are available, and then replacing that internal node by the relation that results from executing the operation.

There are many trees for the same query. Each tree imposes a strict order among its operations, so query optimization must find the best order.
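The execution rule above can be sketched as a recursive evaluator: a node runs once its children have produced their relations, and its own result then stands in for it in the parent. The `Leaf` and `Op` classes, the `execute` function, and the example tree π LNAME(σ SALARY > 30000(EMPLOYEE)) are hypothetical; projection here does not eliminate duplicates.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Relation = List[dict]


@dataclass
class Leaf:
    """Leaf node: an input relation of the query."""
    name: str
    rows: Relation


@dataclass
class Op:
    """Internal node: a relational operation applied to its operands."""
    label: str
    fn: Callable[..., Relation]
    children: list = field(default_factory=list)


def execute(node, trace):
    """Evaluate bottom-up; `trace` records the order in which nodes run."""
    if isinstance(node, Leaf):
        return node.rows
    operands = [execute(child, trace) for child in node.children]  # operands first
    result = node.fn(*operands)  # the result replaces this node in its parent
    trace.append(node.label)
    return result


employee = Leaf("EMPLOYEE", [
    {"LNAME": "Smith", "SALARY": 30000},
    {"LNAME": "Wong", "SALARY": 40000},
    {"LNAME": "Wallace", "SALARY": 43000},
])
tree = Op("project LNAME",
          lambda rows: [{"LNAME": r["LNAME"]} for r in rows],
          [Op("select SALARY > 30000",
              lambda rows: [r for r in rows if r["SALARY"] > 30000],
              [employee])])

trace = []
result = execute(tree, trace)
print(result)  # [{'LNAME': 'Wong'}, {'LNAME': 'Wallace'}]
print(trace)   # ['select SALARY > 30000', 'project LNAME']
```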

Sample Query

Example: For every project located in 'Stafford', retrieve the project number, the controlling department number, and the department manager's last name, address, and birthdate.

SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN AND P.PLOCATION = 'Stafford';

Relational algebra:
$\pi_{PNUMBER,\,DNUM,\,LNAME,\,ADDRESS,\,BDATE}(((\sigma_{PLOCATION='Stafford'}(PROJECT)) \bowtie_{DNUM=DNUMBER} (DEPARTMENT)) \bowtie_{MGRSSN=SSN} (EMPLOYEE))$

[Figure: query tree for the expression above. Leaf nodes are the input relations; internal nodes are executed when their inputs are ready and are replaced by their results. A second panel shows a different representation of the same algebra expression, which is assumed to be the initial form.]
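As a check that the algebra expression and the SQL query say the same thing, the sketch below evaluates the expression innermost-first with list comprehensions and compares the result with SQLite's answer to the SQL. All rows are hypothetical, and only the attributes the query needs are modeled.

```python
import sqlite3

# Hypothetical rows for the three COMPANY relations used by the query.
project = [
    {"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]
department = [{"DNUMBER": 4, "MGRSSN": "987"}, {"DNUMBER": 1, "MGRSSN": "888"}]
employee = [
    {"SSN": "987", "LNAME": "Wallace", "ADDRESS": "291 Berry", "BDATE": "1941-06-20"},
    {"SSN": "888", "LNAME": "Borg", "ADDRESS": "450 Stone", "BDATE": "1937-11-10"},
]

# The algebra expression, evaluated from the innermost operation outward.
stafford = [p for p in project if p["PLOCATION"] == "Stafford"]      # σ
p_d = [{**p, **d} for p in stafford for d in department
       if p["DNUM"] == d["DNUMBER"]]                                 # ⋈ DNUM=DNUMBER
p_d_e = [{**pd, **e} for pd in p_d for e in employee
         if pd["MGRSSN"] == e["SSN"]]                                # ⋈ MGRSSN=SSN
algebra = sorted({(r["PNUMBER"], r["DNUM"], r["LNAME"], r["ADDRESS"], r["BDATE"])
                  for r in p_d_e})                                   # π (a set: no duplicates)

# The same query in SQL, run on an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROJECT (PNUMBER INTEGER, PLOCATION TEXT, DNUM INTEGER)")
conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER, MGRSSN TEXT)")
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT, LNAME TEXT, ADDRESS TEXT, BDATE TEXT)")
conn.executemany("INSERT INTO PROJECT VALUES (:PNUMBER, :PLOCATION, :DNUM)", project)
conn.executemany("INSERT INTO DEPARTMENT VALUES (:DNUMBER, :MGRSSN)", department)
conn.executemany("INSERT INTO EMPLOYEE VALUES (:SSN, :LNAME, :ADDRESS, :BDATE)", employee)
sql = """SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
         FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
         WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN
           AND P.PLOCATION = 'Stafford'"""
sql_result = sorted(set(conn.execute(sql).fetchall()))
print(algebra == sql_result)  # True
```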

Heuristics in Query Optimization

The main heuristic is to first apply the operations that reduce the size of intermediate results

E.g., apply SELECT and PROJECT operations before applying the JOIN or other binary operations.

General heuristic optimization algorithm:
1- Push selections down
2- Apply more restrictive selections first (selectivity is estimated by the DBMS)
3- Combine cross products and selections to become joins
4- Push projections down
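Step 2 depends on the DBMS's selectivity estimates. A minimal sketch, assuming the common textbook estimates under a uniform-distribution assumption (1/d for an equality on an attribute with d distinct values, and the covered fraction of the [low, high] range for a comparison); the statistics, the `ColumnStats` class, and the BYEAR attribute are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Catalog statistics assumed to be kept for one attribute."""
    distinct: int
    low: float
    high: float


def selectivity(pred, stats):
    """Estimated fraction of tuples satisfying pred = (attr, op, value)."""
    attr, op, value = pred
    s = stats[attr]
    if op == "=":
        return 1 / s.distinct
    if op == ">":
        return max(0.0, min(1.0, (s.high - value) / (s.high - s.low)))
    if op == "<":
        return max(0.0, min(1.0, (value - s.low) / (s.high - s.low)))
    raise ValueError(f"unsupported operator {op}")


def order_selections(preds, stats):
    """Order a cascade of selections: most restrictive (lowest estimate) first."""
    return sorted(preds, key=lambda p: selectivity(p, stats))


stats = {
    "SALARY": ColumnStats(distinct=500, low=10000, high=90000),
    "DNO": ColumnStats(distinct=20, low=1, high=20),
    "BYEAR": ColumnStats(distinct=60, low=1940, high=2000),
}
preds = [("SALARY", ">", 30000), ("DNO", "=", 5), ("BYEAR", "<", 1950)]
print(order_selections(preds, stats))
# [('DNO', '=', 5), ('BYEAR', '<', 1950), ('SALARY', '>', 30000)]
```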

Steps in converting a query tree during heuristic optimization

Select the names of employees working on the Aquarius project and born after 1957:

SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'Aquarius' AND BDATE > '1957-12-31' AND SSN = ESSN AND PNUMBER = PNO;

Usually, we start with cross products, followed by selections, followed by projections.

Steps in converting a query tree during heuristic optimization


[Figure (a): Initial query tree for the SQL query, as produced by the parser]

1- Push selections down
2- Apply more restrictive selections first, e.g., equalities before range queries
3- Combine cross products and selections to become joins
4- Push projections down

[Figure (b): Moving SELECT operations down the query tree]
[Figure (c): Applying the more restrictive SELECT operation first]
[Figure (d): Replacing CARTESIAN PRODUCT and SELECT with JOIN operations]
[Figure (e): Moving PROJECT operations down the query tree]
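The effect of these steps can be measured on toy data. The sketch below evaluates the Aquarius query twice, once as the initial tree (cross products, then selections) and once in roughly the shape of tree (d) (selections pushed down, the most restrictive one first, joins instead of cross products), and records the size of each intermediate result. All rows are hypothetical, and projection push-down (step 4) is not modeled.

```python
from itertools import product

# Hypothetical COMPANY data: 20 employees, 5 projects, one assignment each.
# BDATE values are ISO strings, so string comparison orders them by date.
employee = [{"SSN": i, "LNAME": f"E{i}", "BDATE": f"{1950 + i}-06-01"}
            for i in range(20)]
project = [{"PNUMBER": k, "PNAME": "Aquarius" if k == 3 else f"P{k}"}
           for k in range(1, 6)]
works_on = [{"ESSN": i, "PNO": i % 5 + 1} for i in range(20)]


def initial_tree():
    """Tree (a): cross products first, then all selections, then projection."""
    sizes = []
    ew = [{**e, **w} for e, w in product(employee, works_on)]  # EMPLOYEE x WORKS_ON
    sizes.append(len(ew))
    ewp = [{**x, **p} for x, p in product(ew, project)]        # ... x PROJECT
    sizes.append(len(ewp))
    selected = [r for r in ewp
                if r["PNAME"] == "Aquarius" and r["BDATE"] > "1957-12-31"
                and r["SSN"] == r["ESSN"] and r["PNUMBER"] == r["PNO"]]
    sizes.append(len(selected))
    return sorted({r["LNAME"] for r in selected}), sizes


def optimized_tree():
    """Roughly tree (d): selections pushed down, most restrictive first, joins."""
    sizes = []
    aquarius = [p for p in project if p["PNAME"] == "Aquarius"]  # most restrictive
    sizes.append(len(aquarius))
    p_w = [{**p, **w} for p in aquarius for w in works_on
           if p["PNUMBER"] == w["PNO"]]                          # join, not x
    sizes.append(len(p_w))
    born_after = [e for e in employee if e["BDATE"] > "1957-12-31"]
    sizes.append(len(born_after))
    p_w_e = [{**x, **e} for x in p_w for e in born_after
             if x["ESSN"] == e["SSN"]]                           # join, not x
    sizes.append(len(p_w_e))
    return sorted({r["LNAME"] for r in p_w_e}), sizes


print(initial_tree())    # (['E12', 'E17'], [400, 2000, 2])
print(optimized_tree())  # (['E12', 'E17'], [1, 4, 12, 2])
```

Both trees give the same answer, but the largest intermediate result drops from 2,000 tuples to 12.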

Transformation Rules

Transformation rules transform one relational algebra expression into AN EQUIVALENT ONE.

General transformation rules:
- Used by the query optimizer to optimize the query tree
- Any rule, if applied, ensures that the resulting tree is equivalent, so the resulting execution plan is equivalent

(2) Commutativity of σ: The σ operation is commutative:
$\sigma_{c_1}(\sigma_{c_2}(R)) = \sigma_{c_2}(\sigma_{c_1}(R))$
Use: apply more selective selections first.

(6.a) Commuting σ with ⋈ (or ×): If all the attributes in the selection condition c involve only the attributes of one of the relations being joined, say R, the two operations can be commuted as follows:
$\sigma_c(R \bowtie S) = (\sigma_c(R)) \bowtie S$
Use: do selections first.

The rest of the rules are in the book, pp. 574-576.
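Rules (2) and (6.a) can be spot-checked on random data. The sketch below is an illustration, not a proof: it assumes dict-based relations and hypothetical attributes A, B, B2, and C (S uses B2 so that joined tuples keep distinct attribute names), and it compares results as sets.

```python
import random


def select(cond, rel):
    """σ_cond(rel) over a list of dict tuples."""
    return [t for t in rel if cond(t)]


def join(r, s, on):
    """r ⋈_on s, written as a nested loop."""
    return [{**tr, **ts} for tr in r for ts in s if on(tr, ts)]


def canon(rel):
    """Order-independent form, so two results can be compared as sets."""
    return sorted(tuple(sorted(t.items())) for t in rel)


rng = random.Random(0)
R = [{"A": rng.randint(0, 9), "B": rng.randint(0, 4)} for _ in range(50)]
S = [{"B2": rng.randint(0, 4), "C": rng.randint(0, 9)} for _ in range(30)]

c1 = lambda t: t["A"] > 3                # mentions only attributes of R
c2 = lambda t: t["B"] != 2               # mentions only attributes of R
on = lambda tr, ts: tr["B"] == ts["B2"]  # join condition R.B = S.B2

# Rule (2): the order of two cascaded selections does not matter.
rule2 = canon(select(c1, select(c2, R))) == canon(select(c2, select(c1, R)))
# Rule (6.a): a selection on R's attributes can be pushed below the join.
rule6a = canon(select(c1, join(R, S, on))) == canon(join(select(c1, R), S, on))
print(rule2, rule6a)  # True True
```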

Heuristics in Query Optimization

Outline of a Heuristic Algebraic Optimization Algorithm

1- Break up any select operations with conjunctive conditions into a cascade of select operations.
2- Move each select operation as far down the query tree as is permitted by the attributes involved in the selection condition.
3- Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation.
4- Combine a cross product operation with a subsequent select operation in the tree into a join operation.
5- Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed.

(Important: given a query, optimize it using this algorithm.)
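Steps 1 and 2 can be sketched for the Aquarius query used earlier in the lecture. The `Conjunct` class and `split_and_push` function are hypothetical, and the sketch assumes attribute names are unique across relations, so each single-relation conjunct has exactly one possible home.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjunct:
    text: str          # the condition as written, for display
    attrs: frozenset   # attributes the condition mentions


def split_and_push(conjuncts, schemas):
    """Steps 1-2 (sketch): treat each conjunct of the WHERE clause as its own
    select, and move it down to the relation whose schema contains all of its
    attributes. Conjuncts spanning two relations cannot go that low; they stay
    above the cross product and become join conditions in step 4."""
    pushed = {name: [] for name in schemas}
    join_conditions = []
    for c in conjuncts:
        owners = [name for name, attrs in schemas.items() if c.attrs <= attrs]
        if owners:
            pushed[owners[0]].append(c.text)
        else:
            join_conditions.append(c.text)
    return pushed, join_conditions


schemas = {
    "EMPLOYEE": {"LNAME", "SSN", "BDATE"},
    "WORKS_ON": {"ESSN", "PNO"},
    "PROJECT": {"PNAME", "PNUMBER"},
}
where = [
    Conjunct("PNAME = 'Aquarius'", frozenset({"PNAME"})),
    Conjunct("BDATE > '1957-12-31'", frozenset({"BDATE"})),
    Conjunct("SSN = ESSN", frozenset({"SSN", "ESSN"})),
    Conjunct("PNUMBER = PNO", frozenset({"PNUMBER", "PNO"})),
]
pushed, joins = split_and_push(where, schemas)
print(pushed)  # BDATE condition on EMPLOYEE, PNAME condition on PROJECT
print(joins)   # ['SSN = ESSN', 'PNUMBER = PNO']
```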


Procedure:
SQL → for every query block, build the initial query tree (start with cross products, then selections, then projections) → apply the algorithm

Selectivity and Cost Estimates in Query Optimization

A query optimizer does not depend completely on heuristics, because heuristics are not always optimal.

Cost-based query optimization
- Estimate and compare the costs of executing a query using different execution strategies, and choose the one with the lowest cost estimate.

Issues
- The cost function
- The number of execution strategies to be considered: this number must be limited
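A minimal sketch of cost-based join ordering: enumerate every left-deep join order, estimate each intermediate result as |R| × |S| × (join selectivity), and keep the order with the smallest total. The cardinalities and selectivities are hypothetical numbers for the Aquarius query, and summing intermediate result sizes is only one simple choice of cost function.

```python
from itertools import permutations

# Hypothetical catalog statistics, after selections have been pushed down:
# tuples per relation, and a join selectivity for each pair with a join predicate.
card = {"EMPLOYEE": 1000, "WORKS_ON": 5000, "PROJECT": 1}  # PROJECT already reduced by σ
sel = {
    frozenset({"EMPLOYEE", "WORKS_ON"}): 1 / 1000,  # SSN = ESSN
    frozenset({"WORKS_ON", "PROJECT"}): 1 / 50,     # PNUMBER = PNO
}


def plan_cost(order):
    """Cost of a left-deep join order = sum of estimated join result sizes."""
    joined = [order[0]]
    size = card[order[0]]
    cost = 0.0
    for rel in order[1:]:
        size *= card[rel]
        for prev in joined:
            # No predicate between the two relations means a cross product (selectivity 1).
            size *= sel.get(frozenset({prev, rel}), 1.0)
        cost += size
        joined.append(rel)
    return cost


best = min(permutations(card), key=plan_cost)
print(best, plan_cost(best))
```

With these numbers the cheapest orders start by joining WORKS_ON with the already-reduced PROJECT, while orders that begin by joining EMPLOYEE and WORKS_ON build a 5,000-tuple intermediate result. Enumerating all n! orders quickly becomes expensive, which is why the number of strategies considered must be limited.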

Cost-based optimization is much better suited to compiled queries, where optimization is done once at compile time and the query is executed many times.

E.g., PreparedStatement vs. Statement (JDBC)

Selectivity and Cost Estimates in Query Optimization

Cost Components for Query Execution

- Access cost to secondary storage: searching, reading, writing, updating, etc.
- Storage cost: storing any intermediate files that are generated by an execution strategy for the query
- Computation cost: the cost of performing in-memory operations on the data buffers during the execution plan (searching, sorting, joining, arithmetic)
- Memory usage cost: the number of memory buffers needed for the query
- Communication cost: shipping the results from the database site to the user's site

Exercise Heuristic Optimization

Procedure:
Query in SQL → for every query block, build the initial query tree (start with cross products, then selections, then projections) → apply the algorithm

Find the Lname and SSN of all employees in the Design department who work on project 5 and earn more than the highest-paid employee working on the Project X project.
