29-Query Optimization-04-10-2024

Uploaded by

Hemesh R

Query Processing

Heuristic Query Optimization


• σ marks >60 (π marks (student_marks))

• π marks ( σ marks >60 (student_marks))
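The two expressions above return the same tuples, because the selection condition only uses an attribute that the projection keeps. A minimal Python sketch of that equivalence follows; the sample rows and the helper names `select` and `project` are assumptions made for illustration, not part of the slides.

```python
# Hypothetical sample data; the slides give no rows for student_marks.
student_marks = [
    {"name": "Asha", "marks": 72},
    {"name": "Ben", "marks": 55},
    {"name": "Chitra", "marks": 91},
    {"name": "Dev", "marks": 60},
]

def select(rows, predicate):
    """sigma: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

# sigma marks>60 (pi marks (student_marks))
plan_1 = select(project(student_marks, ["marks"]), lambda r: r["marks"] > 60)
# pi marks (sigma marks>60 (student_marks))
plan_2 = project(select(student_marks, lambda r: r["marks"] > 60), ["marks"])

print(plan_1 == plan_2)
```

Both orders give the same answer; they differ only in how many rows and columns the intermediate result carries.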


Processing a Query
• Tasks in processing a high-level query
1. Scanner scans the query and identifies the language tokens
2. Parser checks syntax of the query
3. The query is validated by checking that all attribute names and
relation names are valid
4. An intermediate internal representation for the query is created
(query tree or query graph)
5. Query execution strategy is developed
6. Query optimizer produces an execution plan
7. Code generator generates the object code
8. Runtime database processor executes the code

• Query processing and query optimization


Processing a Query
• Input:
– A query written in SQL is given as input to the
query processor.
• Parsing
– In this step, the parser of the query processor
module checks the syntax of the query, the user’s
privileges to execute the query, the table names
and attribute names, etc. The correct table names,
attribute names and the privilege of the users can
be taken from the system catalog (data
dictionary).
• Translation
– A valid query is converted from the high-level
language SQL into a lower-level Relational Algebra
expression.
– For example, our SQL query can be converted into
a Relational Algebra equivalent as follows:
– SELECT Ename FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND
DOP > 10;
– πEname(σDOP>10 ∧ Employee.Eno=Proj_Assigned.Eno (Employee ×
Proj_Assigned))
• Optimizer
– The optimizer uses statistical data stored in the
data dictionary: the size of each table, the length of
its records, the indexes created on it, etc. The
optimizer also examines the conditions in the query
and the attributes those conditions reference.
• Execution Plan
– A query can be expressed in many ways. At this
stage, the query processor uses the information
collected in the previous steps to find different
relational algebra expressions that are equivalent
to, and return the same result as, the one we have
written already.
– For our example, the Relational Algebra query can
also be written as the one given below:
– πEname(Employee ⋈Eno (σDOP>10 (Proj_Assigned)))
– So far, we have two execution plans. The only
condition is that both plans must give the same
result.
• Evaluation
– At this stage, we choose one of the execution plans
we have developed. The chosen plan accesses data
from the database to produce the final result.
– In our example, the second plan is likely the better
one. In the first plan, we join the two relations (a
costly operation) and then apply the condition
(conditions act as filters) to the joined relation. This
consumes more time as well as space.
– In the second plan, we filter one of the tables
(Proj_Assigned) and join the result with the
Employee table. This join may need to compare
fewer records. Hence, the second plan is the better
one given the information known here, though this
is not always the case.
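The difference between the two plans can be sketched in Python by counting how many tuple pairs each plan has to examine. The rows below are made up for illustration (only Eno, Ename and DOP come from the example query), and `plan_1` and `plan_2` are hypothetical helper names.

```python
from itertools import product

# Hypothetical rows; only the attribute names come from the slide's query.
employee = [{"Eno": i, "Ename": f"emp{i}"} for i in range(1, 6)]
proj_assigned = [
    {"Eno": 1, "DOP": 12}, {"Eno": 2, "DOP": 4}, {"Eno": 3, "DOP": 15},
    {"Eno": 4, "DOP": 8}, {"Eno": 5, "DOP": 2},
]

def plan_1():
    """Cartesian product first, then the combined selection."""
    pairs = list(product(employee, proj_assigned))
    names = sorted({e["Ename"] for e, p in pairs
                    if p["DOP"] > 10 and e["Eno"] == p["Eno"]})
    return names, len(pairs)

def plan_2():
    """Filter Proj_Assigned first, then join the survivors with Employee."""
    filtered = [p for p in proj_assigned if p["DOP"] > 10]
    pairs = list(product(employee, filtered))
    names = sorted({e["Ename"] for e, p in pairs if e["Eno"] == p["Eno"]})
    return names, len(pairs)

print(plan_1())  # same names, more pairs examined
print(plan_2())  # same names, fewer pairs examined
```

With this data both plans return the same employee names, but the first plan examines 25 pairs and the second only 10. A real optimizer would estimate such sizes from catalog statistics instead of running both plans.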
Heuristic Optimization of Query Trees

Heuristic: a rule that leads to the least cost in most cases

Query tree (relational algebra expression)

Leaf node: relations
Internal node: relational algebra operations
Execution of query trees: post-order traversal of the tree
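A query tree and its post-order execution can be sketched as follows. The `Node` class, the `evaluate` function and the sample rows are assumptions made for this illustration; they are not taken from any particular DBMS.

```python
class Node:
    """A query tree node: leaves hold relations, internal nodes hold operations."""
    def __init__(self, label, op=None, children=(), rows=None):
        self.label = label
        self.op = op                  # function applied to the children's results
        self.children = list(children)
        self.rows = rows              # only set on leaf nodes

def evaluate(node, order):
    """Post-order traversal: evaluate all children first, then the node itself."""
    if not node.children:
        order.append(node.label)
        return node.rows
    inputs = [evaluate(child, order) for child in node.children]
    order.append(node.label)
    return node.op(*inputs)

# Hypothetical leaf relation.
leaf = Node("student_marks", rows=[{"name": "Asha", "marks": 72},
                                   {"name": "Ben", "marks": 55}])
sel = Node("select marks>60",
           op=lambda rows: [r for r in rows if r["marks"] > 60],
           children=[leaf])
root = Node("project marks",
            op=lambda rows: [{"marks": r["marks"]} for r in rows],
            children=[sel])

order = []
result = evaluate(root, order)
print(order)
print(result)
```

The recorded order shows that the leaf is visited first and the root last, which is why an operation is only executed once all of its operands are available.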
Heuristic Optimization of Query Trees-
Example
• Query
"Find the last names of employees born after
1957 who work on a project named ‘Aquarius’."

• SQL
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME='Aquarius' AND PNUMBER=PNO
AND ESSN=SSN AND BDATE > '1957-12-31';
Steps in converting a query tree during heuristic optimization.

a) Initial (canonical) query tree for SQL query Q.

b) Moving SELECT operations down the query tree.

c) Applying the more restrictive SELECT operation first.

d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.

e) Moving PROJECT operations down the query tree.


SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME='Aquarius' AND PNUMBER=PNO
AND ESSN=SSN AND BDATE > '1957-12-31'

Canonical
query tree

a) Executing this tree directly first creates a very large file
containing the CARTESIAN PRODUCT of the entire
EMPLOYEE, WORKS_ON, and PROJECT files. That is
why the initial query tree is never executed, but is
transformed into another equivalent tree that is efficient to
execute. This particular query needs only one record from
the PROJECT relation— for the ‘Aquarius’ project—and
only the EMPLOYEE records for those whose date of birth
is after ‘1957-12-31’.
Moving SELECT operations
down the query tree

(b) shows an improved query tree that first applies the
SELECT operations to reduce the number of tuples that
appear in the CARTESIAN PRODUCT.
(c) Applying the more
restrictive SELECT operation first

SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME='Aquarius' AND
PNUMBER=PNO AND
ESSN=SSN AND
BDATE > '1957-12-31'

c) A further improvement is achieved by switching the
positions of the EMPLOYEE and PROJECT relations in the
tree, as shown in Figure (c). This uses the information that
Pnumber is a key attribute of the PROJECT relation, and
hence the SELECT operation on the PROJECT relation will
retrieve a single record only.
Replacing CARTESIAN PRODUCT and SELECT with JOIN

d) We can further improve the query tree by replacing any
CARTESIAN PRODUCT operation that is followed by a join
condition with a JOIN operation, as shown in Figure(d).
Moving PROJECT operations down

Transformation should keep equivalence

e) Another improvement is to keep only the attributes
needed by subsequent operations in the intermediate
relations, by including PROJECT (π) operations as early as
possible in the query tree, as shown in Figure (e). This
reduces the attributes (columns) of the intermediate
relations, whereas the SELECT operations reduce the
number of tuples (records).
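The effect of steps (a) to (e) can be sketched in Python by comparing the canonical tree with the optimized one on a tiny data set. The rows are made up for illustration (attribute names follow the slides), and `canonical` and `optimized` are hypothetical helper names.

```python
from itertools import product

# Hypothetical rows; only the attribute names come from the slides.
EMPLOYEE = [
    {"SSN": 1, "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": 2, "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": 3, "LNAME": "Zelaya", "BDATE": "1968-07-19"},
]
WORKS_ON = [
    {"ESSN": 1, "PNO": 10}, {"ESSN": 2, "PNO": 10},
    {"ESSN": 3, "PNO": 20}, {"ESSN": 1, "PNO": 20},
]
PROJECT = [
    {"PNUMBER": 10, "PNAME": "Aquarius"},
    {"PNUMBER": 20, "PNAME": "ProductX"},
]

def canonical():
    """Tree (a): full Cartesian product, one big selection, then project LNAME."""
    prod = list(product(EMPLOYEE, WORKS_ON, PROJECT))
    rows = [(e, w, p) for e, w, p in prod
            if p["PNAME"] == "Aquarius" and p["PNUMBER"] == w["PNO"]
            and w["ESSN"] == e["SSN"] and e["BDATE"] > "1957-12-31"]
    return sorted({e["LNAME"] for e, _, _ in rows}), len(prod)

def optimized():
    """Tree (e): select early, start from PROJECT, keep only needed attributes."""
    p_nums = [p["PNUMBER"] for p in PROJECT if p["PNAME"] == "Aquarius"]
    essns = [w["ESSN"] for w in WORKS_ON if w["PNO"] in p_nums]
    young = [(e["SSN"], e["LNAME"]) for e in EMPLOYEE
             if e["BDATE"] > "1957-12-31"]
    # Only LNAME is needed, so a membership test stands in for the final join.
    names = sorted({lname for ssn, lname in young if ssn in essns})
    largest = max(len(p_nums), len(essns), len(young))
    return names, largest

print(canonical())  # result plus size of the Cartesian product
print(optimized())  # same result plus size of the largest intermediate list
```

On this data the canonical tree builds 24 combined tuples, while the largest intermediate result of the optimized tree holds 2 tuples; both return the same last names.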
SQL Query with an Uncorrelated
Subquery
Find the movies with stars born in 1960
MovieStar(name, address, gender, birthdate)
StarsIn(title, year, starName)

SELECT title
FROM StarsIn
WHERE starName IN (
SELECT name
FROM MovieStar
WHERE birthdate LIKE '%1960'
);
Parse Tree
<Query>
  <SFW>
    SELECT <SelList>
      <Attribute>: title
    FROM <FromList>
      <RelName>: StarsIn
    WHERE <Condition>
      <Tuple>
        <Attribute>: starName
      IN
      <Query>
        ( <Query> )
          <SFW>
            SELECT <SelList>
              <Attribute>: name
            FROM <FromList>
              <RelName>: MovieStar
            WHERE <Condition>
              <Attribute>: birthDate
              LIKE
              <Pattern>: '%1960'


Generating Relational Algebra
π title
  σ (two-argument selection)
    StarsIn
    <condition>
      <tuple>
        <attribute>: starName
      IN
      π name
        σ birthdate LIKE '%1960'
          MovieStar
Applying the Rewrite Rule

title title
starName=name

StarsIn <condition>

<tuple> IN name StarsIn δ

<attribute> birthdate LIKE ‘%1960’ name


birthdate LIKE ‘%1960’
starName MovieStar

MovieStar
Improving the Logical Query Plan
Before:
π title
  σ starName=name
    ×
      StarsIn
      δ
        π name
          σ birthdate LIKE '%1960'
            MovieStar

After:
π title
  ⋈ starName=name
    StarsIn
    π name
      σ birthdate LIKE '%1960'
        MovieStar
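The rewrite of the IN condition into a selection over a product can be sketched in Python. The rows are made up for illustration (the schemas follow the slides), `endswith("1960")` stands in for LIKE '%1960', and the two function names are hypothetical.

```python
# Hypothetical rows; the schemas follow the slides.
MovieStar = [
    {"name": "Star A", "birthdate": "03/14/1960"},
    {"name": "Star B", "birthdate": "07/02/1955"},
    {"name": "Star C", "birthdate": "11/30/1960"},
]
StarsIn = [
    {"title": "Movie 1", "year": 1990, "starName": "Star A"},
    {"title": "Movie 2", "year": 1992, "starName": "Star B"},
    {"title": "Movie 3", "year": 1995, "starName": "Star C"},
    {"title": "Movie 3", "year": 1995, "starName": "Star A"},
]

def with_in_subquery():
    """Two-argument selection: test each StarsIn tuple against the subquery result."""
    names = [m["name"] for m in MovieStar if m["birthdate"].endswith("1960")]
    return [s["title"] for s in StarsIn if s["starName"] in names]

def with_product():
    """Rewritten plan: select starName=name over StarsIn x delta(pi name(...))."""
    # The set plays the role of delta (duplicate elimination) on the subquery.
    names = {m["name"] for m in MovieStar if m["birthdate"].endswith("1960")}
    return [s["title"] for s in StarsIn for n in names if s["starName"] == n]

print(with_in_subquery())
print(with_product())
```

Because the subquery is uncorrelated, its result is computed once and reused for every StarsIn tuple. The duplicate elimination keeps a StarsIn tuple from matching the same name more than once in the product.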
SQL Queries and Relational Algebra
(1)
• Example
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX(Salary)
FROM EMPLOYEE
WHERE Dno = 5 )
• Inner block and outer block

ICS 424 - 01 (072) Query Processing and Optimization 32


Translating SQL Queries into Relational Algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
                 FROM EMPLOYEE
                 WHERE DNO = 5);

Outer block:
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C

Inner block:
SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5

Outer block: πLNAME, FNAME (σSALARY>C (EMPLOYEE))
Inner block: ℱMAX SALARY (σDNO=5 (EMPLOYEE))
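The decomposition into blocks can be sketched in Python: the inner block is evaluated once to obtain the constant C, which the outer block then uses as an ordinary selection constant. The rows and the function names `inner_block` and `outer_block` are assumptions made for illustration.

```python
# Hypothetical rows; attribute names follow the slide.
EMPLOYEE = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Wallace", "FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
    {"LNAME": "Borg", "FNAME": "James", "SALARY": 55000, "DNO": 1},
]

def inner_block(rows):
    """MAX SALARY over the employees with DNO = 5; yields the constant C."""
    return max(r["SALARY"] for r in rows if r["DNO"] == 5)

def outer_block(rows, c):
    """Project LNAME, FNAME from the employees with SALARY > C."""
    return [(r["LNAME"], r["FNAME"]) for r in rows if r["SALARY"] > c]

C = inner_block(EMPLOYEE)          # evaluated once
result = outer_block(EMPLOYEE, C)  # then used like a literal
print(C, result)
```

Each block can be optimized separately, since the outer block only sees the single value C.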



SQL Queries and Relational Algebra
(2)

• Uncorrelated nested queries vs. correlated nested queries


• Example
Retrieve the name of each employee who works on all the projects controlled
by department number 5.

SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE ( (SELECT PNO
         FROM WORKS_ON
         WHERE SSN=ESSN)
        CONTAINS
        (SELECT PNUMBER
         FROM PROJECT
         WHERE DNUM=5) )
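The query above mixes both kinds of nesting, which can be sketched in Python. The second subquery is uncorrelated and is computed once; the first refers to SSN of the outer EMPLOYEE tuple and must be recomputed per employee. CONTAINS is modelled here as a superset test on Python sets; the rows and the name `works_on_all` are assumptions made for illustration.

```python
# Hypothetical rows; attribute names follow the slide.
EMPLOYEE = [
    {"SSN": 1, "FNAME": "John", "LNAME": "Smith"},
    {"SSN": 2, "FNAME": "Franklin", "LNAME": "Wong"},
]
WORKS_ON = [
    {"ESSN": 1, "PNO": 1}, {"ESSN": 1, "PNO": 2},
    {"ESSN": 2, "PNO": 1}, {"ESSN": 2, "PNO": 10},
]
PROJECT = [
    {"PNUMBER": 1, "DNUM": 5},
    {"PNUMBER": 2, "DNUM": 5},
    {"PNUMBER": 10, "DNUM": 4},
]

# Uncorrelated subquery: independent of the outer tuple, evaluated once.
dept5_projects = {p["PNUMBER"] for p in PROJECT if p["DNUM"] == 5}

def works_on_all(emp):
    """Correlated subquery: depends on the SSN of the current outer tuple."""
    mine = {w["PNO"] for w in WORKS_ON if w["ESSN"] == emp["SSN"]}
    return mine >= dept5_projects   # CONTAINS as a superset test

result = [(e["FNAME"], e["LNAME"]) for e in EMPLOYEE if works_on_all(e)]
print(result)
```

Only the employee whose project set includes every project of department 5 is returned.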
SQL Queries and Relational Algebra
(3)
• Example
For every project located in ‘Stafford’, retrieve the project number,
the controlling department number and the department
manager’s last name, address and birthdate.
• SQL query:
SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION='Stafford';

• Relational algebra:
πPNUMBER, DNUM, LNAME, ADDRESS, BDATE (((σPLOCATION='Stafford'(PROJECT))
⋈DNUM=DNUMBER (DEPARTMENT)) ⋈MGRSSN=SSN (EMPLOYEE))



SQL Queries and Relational Algebra
(4)
