
Chapter 2 Query Processing

The document describes query processing, which involves three main steps: 1) parsing and translating the query into an internal representation such as a query tree; 2) optimization, which chooses the most efficient execution plan by estimating the cost of different plans; and 3) evaluation, which executes the chosen plan and returns the results to the user. Query optimization is the key step: it determines the execution strategy that minimizes resources such as time and space. The optimizer considers different access methods, join algorithms, and database statistics to select the lowest-cost plan.

Uploaded by Musariri Talent
Copyright © All Rights Reserved

OVERVIEW OF QUERY PROCESSING

Query processing is defined as follows:


i) A three-step process that transforms a high-level query (in relational calculus/SQL)
into an equivalent and more efficient lower-level query (in relational algebra).
ii) The entire activity of translating a query into low-level instructions, optimizing it to
save resources, estimating its cost, evaluating it, and extracting the data from the
database. The goal is to find an efficient execution plan for a given SQL query, one that
considerably reduces the cost, especially time.
iii) The activities involved in parsing, validating, optimizing and executing a query.
The basic steps in query processing are shown in the diagram below:

I. Parsing and translation


• The query is translated into an internal form, which is then translated into relational
algebra.
• The parser checks the syntax and verifies that the relations referenced in the query exist.
II. Optimization
This is the activity of choosing an efficient execution strategy for processing a query.
Among all equivalent evaluation plans, the one with the lowest cost is chosen. Cost is
estimated using statistical information from the database catalog, e.g. the number of
tuples in each relation and the size of the tuples.
III. Evaluation
• The query-execution engine takes a query-evaluation plan, executes that plan,
and returns the answers to the query.

QUERY PROCESSING EXAMPLE


Relations: EMP(ENO, ENAME, TITLE), ASG(ENO,PNO,RESP,DUR)
Query: Find the names of employees who have been assigned to a project for a duration (DUR) greater than 37.
– High level query
SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO AND DUR > 37
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the query, the
user’s privileges to execute the query, the table names and attribute names etc. The correct
table names, attribute names and the privilege of the users can be taken from the system
catalog (data dictionary).
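The name check in Step 1 can be pictured as a lookup against the catalog. The sketch below is a minimal illustration under assumptions: `CATALOG` and `check_names` are hypothetical names, the catalog is a toy in-memory dictionary rather than a real data dictionary, and the privilege check is omitted.

```python
# Hypothetical toy catalog: relation name -> set of attribute names.
CATALOG = {
    "EMP": {"ENO", "ENAME", "TITLE"},
    "ASG": {"ENO", "PNO", "RESP", "DUR"},
}


def check_names(relations, attributes):
    """Return error messages for relations or attributes missing from CATALOG.

    `attributes` holds (relation, attribute) pairs, as they would look once
    qualified names such as EMP.ENO have been resolved.
    """
    errors = []
    for rel in relations:
        if rel not in CATALOG:
            errors.append(f"unknown relation {rel}")
    for rel, attr in attributes:
        if rel in CATALOG and attr not in CATALOG[rel]:
            errors.append(f"unknown attribute {rel}.{attr}")
    return errors


# Names referenced by the example query: all are defined, so no errors.
print(check_names(["EMP", "ASG"],
                  [("EMP", "ENAME"), ("EMP", "ENO"), ("ASG", "ENO"), ("ASG", "DUR")]))
# prints []
```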
Step 2: Translation
If the query is valid, it is converted from the high-level language SQL into low-level
instructions in relational algebra.
– Two possible transformations of the query are:

∗ Expression 1: $\Pi_{ENAME}(\sigma_{DUR>37 \land EMP.ENO=ASG.ENO}(EMP \times ASG))$

∗ Expression 2: $\Pi_{ENAME}(EMP \bowtie_{ENO} (\sigma_{DUR>37}(ASG)))$
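To see that the two expressions are equivalent, both can be evaluated on a small sample. This is a sketch under assumptions: the tuples are invented for illustration, and Python sets stand in for the duplicate-free projection of relational algebra.

```python
# Hypothetical sample instances of EMP(ENO, ENAME, TITLE) and ASG(ENO, PNO, RESP, DUR).
EMP = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Analyst"),
       ("E3", "A. Lee", "Mech. Eng.")]
ASG = [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 48),
       ("E3", "P2", "Manager", 40), ("E2", "P2", "Engineer", 6)]


def expression_1(emp, asg):
    # Cartesian product first, then the combined selection, then projection.
    product = [(e, a) for e in emp for a in asg]
    selected = [(e, a) for e, a in product if a[3] > 37 and e[0] == a[0]]
    return {e[1] for e, _ in selected}


def expression_2(emp, asg):
    # Selection on ASG first, then the join on ENO, then projection.
    long_asg = [a for a in asg if a[3] > 37]
    joined = [(e, a) for e in emp for a in long_asg if e[0] == a[0]]
    return {e[1] for e, _ in joined}


print(expression_1(EMP, ASG) == expression_2(EMP, ASG))  # prints True
```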

Step 3: Optimizer
The optimizer uses the statistical data stored as part of the data dictionary. These
statistics include the size of each table, the length of its records, the indexes created on
it, etc. The optimizer also examines the conditions in the query and the attributes those
conditions refer to.
Step 4: Execution Plan

A query can be expressed in many ways. At this stage, the query processor module uses the
information collected in Step 3 to find different relational algebra expressions that are
equivalent to the one we have written and return the same result. So far we have two
execution plans; the only condition is that both plans must give the same result.
Step 5: Evaluation
Although the execution plans constructed using the statistical data all return the same
result, they differ in the time or the space required to execute the query. Hence one plan
must be chosen, namely the one with the lowest cost.
In our case, Expression 2 avoids the large and expensive intermediate Cartesian product,
and is therefore typically better.
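The advantage of Expression 2 can be made concrete by counting intermediate tuples. The relation sizes below are hypothetical, chosen only to show the arithmetic, and the sketch assumes each ASG.ENO matches exactly one EMP tuple.

```python
# Hypothetical statistics, chosen only to illustrate the arithmetic.
EMP_TUPLES = 400
ASG_TUPLES = 1_000
ASG_LONG_TUPLES = 200   # assumed number of ASG tuples with DUR > 37


def expression_1_intermediate():
    # EMP x ASG pairs every EMP tuple with every ASG tuple.
    return EMP_TUPLES * ASG_TUPLES


def expression_2_intermediate():
    # sigma_{DUR>37}(ASG) shrinks ASG before the join. Assuming each ASG.ENO
    # matches exactly one EMP tuple, the join yields one tuple per selected
    # ASG tuple.
    selected = ASG_LONG_TUPLES
    joined = ASG_LONG_TUPLES
    return selected + joined


print(expression_1_intermediate())  # 400000
print(expression_2_intermediate())  # 400
```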
Output:
The final result is shown to the user.
PHASES IN QUERY PROCESSING

a) QUERY DECOMPOSITION

It aims to transform a high level query into a relational algebra query and to check that the
query is syntactically and semantically correct. The stages involved here include:

a) Analysis- where the query is analyzed; this stage also verifies that the relations and
attributes specified in the query are defined in the system catalog and that any
operations applied to database objects are appropriate for the object type, e.g.:

You have the following information:

Table: Staff

Attribute            Format    Length   Primary key   Null
EmployeeNo           Int       4        Yes           No
EmployeeName         Varchar   25       No            Yes
Date_of_birth        Date      -        No            Yes
Position             Char      10       No            No
National_id_number   Varchar   11       No            No

SQL Query

SELECT staffNo FROM Staff WHERE position>10

This query would be rejected on two grounds:

i) The attribute staffNo is not defined in the table Staff.

ii) In the WHERE clause, the comparison '>10' is incompatible with the data
type of Position, which is a character type (Char).
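Both rejection grounds can be reproduced with a small checker over the Staff definition above. This is a sketch, not a real analyser: `STAFF_COLUMNS`, `NUMERIC_TYPES` and `analyse` are hypothetical names, only the pattern `SELECT a FROM Staff WHERE b > literal` is handled, and names are compared in lower case on the assumption that identifiers are case-insensitive.

```python
# Staff table above: attribute -> declared type. Names are stored in lower
# case, assuming SQL identifiers are case-insensitive.
STAFF_COLUMNS = {
    "employeeno": "Int",
    "employeename": "Varchar",
    "date_of_birth": "Date",
    "position": "Char",
    "national_id_number": "Varchar",
}
NUMERIC_TYPES = {"Int"}


def analyse(select_attr, where_attr, literal):
    """Reasons why SELECT select_attr FROM Staff WHERE where_attr > literal is rejected."""
    reasons = []
    for attr in (select_attr, where_attr):
        if attr.lower() not in STAFF_COLUMNS:
            reasons.append(f"{attr} is not defined in Staff")
    col_type = STAFF_COLUMNS.get(where_attr.lower())
    # Comparing a numeric literal with a non-numeric column is a type error.
    if col_type is not None and isinstance(literal, int) and col_type not in NUMERIC_TYPES:
        reasons.append(f"'>{literal}' is incompatible with {where_attr} of type {col_type}")
    return reasons


for reason in analyse("staffNo", "position", 10):
    print(reason)
```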
However, let us consider the following query:

SELECT EmployeeName FROM Staff WHERE (Position = 'manager' OR Position = 'clerk') AND EmployeeNo > 1283;

This query will be processed since it is correct. On completion of this stage, the high
level query is transformed into some internal representation that is more suitable for
processing. The internal representation chosen is a QUERY TREE which is
constructed as follows:

• A leaf node is created for each base relation in the query.
• A non-leaf node is created for each intermediate relation produced by a
relational algebra operation.
• The root of the tree represents the result of the query.
• The sequence of operations is directed from the leaves to the root.
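The construction rules above can be sketched as a small tree structure. The `Node` class and `evaluation_order` function are hypothetical, and the operation labels are plain strings; a post-order walk yields the leaves-to-root order in which the operations are applied.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # base relation or operation
    children: list = field(default_factory=list)  # operand sub-trees

    def is_leaf(self):
        return not self.children


def evaluation_order(node):
    """Post-order walk: operands come before the operation that consumes them."""
    order = []
    for child in node.children:
        order.extend(evaluation_order(child))
    order.append(node.label)
    return order


# Leaf: base relation. Inner node: intermediate relation. Root: query result.
staff = Node("Staff")
selection = Node("sigma[(Position='manager' OR Position='clerk') AND EmployeeNo>1283]", [staff])
root = Node("pi[EmployeeName]", [selection])

print(evaluation_order(root))
```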

b) Normalization- this stage converts the query into a normalised form that can be more
easily manipulated. There are two normal forms here:

i) Conjunctive normal form- this is a sequence of conjuncts that are connected
with the ∧ (AND) operator. Each conjunct contains one or more terms
connected by the ∨ (OR) operator, e.g.

(Position = 'manager' ∨ Position = 'clerk') ∧ EmployeeNo > 1283

N.B. A conjunctive selection contains only those tuples that satisfy all
conjuncts.

ii) Disjunctive normal form- a sequence of disjuncts that are connected with the ∨
(OR) operator. Each disjunct contains one or more terms connected by the ∧
(AND) operator, e.g.

(Position = 'manager' ∧ EmployeeNo > 1283) ∨ (Position = 'clerk' ∧ EmployeeNo > 1283)

N.B. A disjunctive selection contains those tuples formed by the union of all
tuples that satisfy the disjuncts.
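The two normal forms of the example predicate are logically equivalent, which a truth table over the three atomic terms confirms. In this sketch, `a`, `b` and `c` are illustrative stand-ins for the truth values of Position = 'manager', Position = 'clerk' and EmployeeNo > 1283.

```python
from itertools import product


def cnf(a, b, c):
    # (Position = 'manager' OR Position = 'clerk') AND EmployeeNo > 1283
    return (a or b) and c


def dnf(a, b, c):
    # (Position = 'manager' AND EmployeeNo > 1283) OR (Position = 'clerk' AND EmployeeNo > 1283)
    return (a and c) or (b and c)


# Check every combination of truth values for the three terms.
equivalent = all(cnf(*v) == dnf(*v) for v in product([False, True], repeat=3))
print(equivalent)  # True
```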

c) Semantic Analysis- the objective is to reject normalised queries that are incorrectly
formulated or contradictory. A query is incorrectly formulated if some of its components
do not contribute to the generation of the result, which may happen if some join
specifications are missing. A query is contradictory if its predicate cannot be satisfied
by any tuple, e.g. an employee cannot be both a manager and a clerk (Position =
'manager' ∧ Position = 'clerk'). To ascertain the correctness of queries, one can construct
a Relation Connection Graph or a Normalised Attribute Connection Graph.
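A very simple form of contradiction detection is sketched below. It is not a connection-graph algorithm: the hypothetical `is_contradictory` function only handles ANDed `attribute = constant` terms and ignores range predicates such as EmployeeNo > 1283.

```python
def is_contradictory(equalities):
    """True if ANDed attribute = constant terms demand two values for one attribute.

    Only this simple pattern is detected; range predicates are ignored.
    """
    required = {}
    for attr, value in equalities:
        if attr in required and required[attr] != value:
            return True          # e.g. Position = 'manager' AND Position = 'clerk'
        required[attr] = value
    return False


print(is_contradictory([("Position", "manager"), ("Position", "clerk")]))     # True
print(is_contradictory([("Position", "manager"), ("EmployeeName", "Ann")]))  # False
```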

d) Simplification- the objective is to detect redundant qualifications, eliminate common
sub-expressions and transform the query to a semantically equivalent but more easily
and efficiently computed form.
e) Query Restructuring- the query is restructured to provide a more efficient
implementation.

b) QUERY OPTIMIZATION

Purpose of Query Optimization

Query optimization attempts to generate the best execution plan for a SQL statement. The
best execution plan is defined as the plan with the lowest cost among all considered candidate
plans. The cost computation accounts for factors of query execution such as I/O, CPU, and
communication.

The best method of execution depends on myriad conditions including how the query is
written, the size of the data set, the layout of the data, and which access structures exist. The
optimizer determines the best plan for a SQL statement by examining multiple access
methods, such as full table scan or index scans, and different join methods such as nested
loops and hash joins.
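The two join methods named above produce the same result by different means. The sketch below uses simplified in-memory versions with invented data and a reduced EMP(ENO, ENAME) schema; real implementations work on disk pages and handle memory limits, which is ignored here.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, outer_key, inner_key):
    # Compare every outer tuple with every inner tuple.
    return [o + i for o in outer for i in inner if o[outer_key] == i[inner_key]]


def hash_join(build, probe, build_key, probe_key):
    # Build phase: hash the (usually smaller) input on its join key.
    table = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    # Probe phase: look up each tuple of the other input in the hash table.
    return [b + p for p in probe for b in table.get(p[probe_key], [])]


EMP = [("E1", "J. Doe"), ("E2", "M. Smith")]
ASG = [("E1", "P1", 12), ("E2", "P1", 48), ("E2", "P2", 6)]

nl = nested_loop_join(EMP, ASG, 0, 0)
hj = hash_join(EMP, ASG, 0, 0)
print(sorted(nl) == sorted(hj))  # True
```

The nested loop does work proportional to the product of the input sizes, while the hash join reads each input roughly once, which is why the optimizer weighs both.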

Because the database has many internal statistics and tools at its disposal, the optimizer is
usually in a better position than the user to determine the best method of statement execution.
For this reason, all SQL statements use the optimizer.

Consider a user who queries records for employees who are managers. If the database
statistics indicate that 80% of employees are managers, then the optimizer may decide that a
full table scan is most efficient. However, if statistics indicate that few employees are
managers, then reading an index followed by a table access by rowid may be more efficient
than a full table scan.
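The manager example can be turned into a back-of-the-envelope cost comparison. All figures and the cost model itself are hypothetical: a full scan is charged one read per block, and an index access is charged the index height plus one block read per matching row, as if no two matching rows shared a block.

```python
TABLE_ROWS = 100_000      # assumed number of employee rows
ROWS_PER_BLOCK = 100      # assumed rows stored per disk block
INDEX_HEIGHT = 3          # assumed block reads to descend the index


def full_scan_cost():
    # Read every block of the table once.
    return TABLE_ROWS // ROWS_PER_BLOCK


def index_access_cost(matching_rows):
    # Descend the index, then (pessimistically) one block read per rowid.
    return INDEX_HEIGHT + matching_rows


def choose(matching_rows):
    return ("index" if index_access_cost(matching_rows) < full_scan_cost()
            else "full table scan")


print(choose(80_000))   # 80% managers: full table scan
print(choose(200))      # 0.2% managers: index
```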

Optimizer Components

The optimizer contains three main components, which are shown below:

Figure 1: Optimizer Components

1. Query transformer

The optimizer determines whether it is helpful to change the form of the query so that
the optimizer can generate a better execution plan.

For some statements, the query transformer determines whether it is advantageous to
rewrite the original SQL statement into a semantically equivalent SQL statement with
a lower cost. When a viable alternative exists, the database calculates the cost of the
alternatives separately and chooses the lowest-cost alternative.

The optimizer employs several query transformation techniques, which include OR
Expansion, View Merging, Predicate Pushing, Star Transformation, etc.
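The idea behind OR Expansion, rewriting a disjunctive query into separate query blocks whose results are combined, can be checked by hand with Python's built-in `sqlite3` module. This is only an illustration of semantic equivalence on invented rows, not how any optimizer performs the transformation internally. Because the two disjuncts here cannot both hold for the same row, UNION ALL adds no duplicates; a general rewrite must exclude rows already returned by an earlier branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (EmployeeNo INTEGER PRIMARY KEY, "
             "EmployeeName VARCHAR(25), Position CHAR(10))")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)", [
    (1001, "Ann", "manager"), (1300, "Bob", "clerk"),
    (1400, "Cy", "manager"), (1500, "Di", "driver"),
])

original = """SELECT EmployeeNo FROM Staff
              WHERE Position = 'manager' OR Position = 'clerk'"""

# OR-expanded form: one query block per disjunct, results concatenated.
expanded = """SELECT EmployeeNo FROM Staff WHERE Position = 'manager'
              UNION ALL
              SELECT EmployeeNo FROM Staff WHERE Position = 'clerk'"""

rows_original = sorted(conn.execute(original).fetchall())
rows_expanded = sorted(conn.execute(expanded).fetchall())
print(rows_original == rows_expanded)  # True
```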

2. Estimator

The optimizer estimates the cost of each plan based on statistics in the data dictionary. The
estimator is the component of the optimizer that determines the overall cost of a given
execution plan.

The estimator uses three different measures to determine cost:

• Selectivity

The fraction of rows in the row set that the query selects, with 0 meaning no rows
and 1 meaning all rows. Selectivity is tied to a query predicate, such as WHERE
last_name LIKE 'A%', or a combination of predicates. A predicate becomes more
selective as the selectivity value approaches 0 and less selective (or more unselective)
as the value approaches 1.

• Cardinality

The cardinality is the number of rows returned by each operation in an execution plan. This
input, which is crucial to obtaining an optimal plan, is common to all cost functions. The
estimator can derive cardinality from the table statistics collected by DBMS_STATS, or
derive it after accounting for effects from predicates (filter, join, and so on), DISTINCT or
GROUP BY operations, and so on. The Rows column in an execution plan shows the
estimated cardinality.

• Cost

This measure represents units of work or resource used. The query optimizer uses disk I/O,
CPU usage, and memory usage as units of work.
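Selectivity and cardinality are related by simple arithmetic: estimated rows = rows in the input × selectivity. The sketch below uses an invented sample of last names, approximates LIKE 'A%' with a case-sensitive `str.startswith("A")`, and applies the common textbook estimate of 1 / (number of distinct values) for an equality predicate when no histogram is available; a real optimizer's formulas may differ.

```python
from fractions import Fraction

# Hypothetical sample of a last_name column.
last_names = ["Abel", "Adams", "Baker", "Clark", "Allen",
              "Davis", "Evans", "Ames", "Fox", "Green"]


def selectivity(rows, predicate):
    # Fraction of rows satisfying the predicate: 0 = none, 1 = all.
    return Fraction(sum(1 for r in rows if predicate(r)), len(rows))


def equality_selectivity(distinct_values):
    # Textbook estimate for col = constant: values assumed uniformly distributed.
    return Fraction(1, distinct_values)


def estimated_cardinality(table_rows, sel):
    # Estimated number of rows returned by the operation.
    return round(table_rows * sel)


sel_like = selectivity(last_names, lambda n: n.startswith("A"))  # LIKE 'A%'
print(sel_like)                                                   # 2/5
print(estimated_cardinality(50_000, sel_like))                    # 20000
print(estimated_cardinality(50_000, equality_selectivity(250)))   # 200
```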

3. Plan Generator

The optimizer compares the costs of plans and chooses the lowest-cost plan, known as
the execution plan, to pass to the row source generator.

The plan generator explores various plans for a query block by trying out different access
paths, join methods, and join orders. Many plans are possible because of the various
combinations that the database can use to produce the same result. The optimizer picks the
plan with the lowest cost.
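A plan generator's search over join orders can be sketched by exhaustive enumeration under a toy cost model. Everything here is hypothetical: relations R, S and T, their sizes and join selectivities are invented, only left-deep orders are considered, and cost is simply the sum of intermediate result sizes. Real plan generators also vary access paths and join methods and prune the search space instead of enumerating it fully.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical statistics. R and T share no join predicate (selectivity 1),
# so joining them directly is a Cartesian product.
SIZES = {"R": 1000, "S": 100, "T": 10}
JOIN_SEL = {
    frozenset({"R", "S"}): Fraction(1, 100),
    frozenset({"S", "T"}): Fraction(1, 10),
    frozenset({"R", "T"}): Fraction(1),
}


def plan_cost(order):
    """Toy cost of a left-deep join order: sum of intermediate result sizes."""
    joined = [order[0]]
    rows = SIZES[order[0]]
    cost = 0
    for rel in order[1:]:
        sel = Fraction(1)
        for prev in joined:              # predicates linking rel to the left input
            sel *= JOIN_SEL[frozenset({prev, rel})]
        rows = rows * SIZES[rel] * sel
        cost += rows
        joined.append(rel)
    return cost


best = min(permutations(SIZES), key=plan_cost)
for order in permutations(SIZES):
    print(order, plan_cost(order))
print("chosen:", best)
```

Orders that join R and T first pay for a 10,000-tuple Cartesian product, mirroring why Expression 1 in the earlier example is the worse plan.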

c) CODE GENERATION

Code is generated to execute the selected plan.

d) RUNTIME QUERY EXECUTION

The query is executed at run time and the final result is displayed.
