QUERY DECOMPOSITION

DISTRIBUTED DATABASE MANAGEMENT SYSTEM


Group Members:

• Ayesha
• Minahil
• Areeba
• Eshwa
• Mahnoor
• Zainab

Query Decomposition

• Query decomposition is the first phase of query processing. Its aims
are to transform a high-level query into a relational algebra query and
to check whether that query is syntactically and semantically correct.
• Thus, the query decomposition phase starts with a high-level query
and transforms it into a query graph of low-level operations
(algebraic expressions) that satisfies the query.

• Mapping of calculus query (SQL) to algebra operations
(select, project, join, rename).
• Both input and output queries refer to global relations,
without knowledge of the distribution of data.
• The output query is semantically correct and good in
the sense that redundant work is avoided.

STEPS OF QUERY DECOMPOSITION

Query decomposition consists of 4 steps:


• Normalization: Transform query to a normalized form.
• Analysis: Detect and reject incorrect queries; possible only for a subset of
relational calculus.
• Elimination of redundancy: Eliminate redundant predicates.
• Rewriting: Transform the query to RA (Relational Algebra) and optimize the
DDB (Distributed Database) query.

Normalization:

• Normalization transforms the query into a normalized form to
facilitate further processing.
• It consists mainly of two steps:
1. Lexical and syntactic analysis
2. Put into normal form

1. Lexical and Syntactic Analysis

Lexical Analysis:
• This is the first part of query normalization, where the query is analyzed at the token
level.
• The SQL query is broken down into basic components (tokens) such as keywords
(SELECT, FROM, WHERE), identifiers (table names, column names), operators (=, AND,
OR), and literals (values like 'Sales', 50000).
• The lexical analyzer checks that these tokens conform to the lexical rules of the
language, ensuring that each component of the query is correctly identified and classified.
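
The tokenizing step described above can be sketched as follows. This is a minimal illustration for a tiny SQL subset, not the lexer of any particular DBMS: the `tokenize` function, its regular expressions, and the column names `DEPT` and `SALARY` are assumptions made for the example, while the token categories and the literals 'Sales' and 50000 come from the list above.

```python
import re

# Token categories follow the slide; the patterns are simplified assumptions.
KEYWORDS = {"SELECT", "FROM", "WHERE"}
LOGICAL_OPERATORS = {"AND", "OR", "NOT"}

TOKEN_PATTERN = re.compile(r"""
    (?P<LITERAL>'[^']*'|\d+)            # quoted strings or integers
  | (?P<WORD>[A-Za-z_][A-Za-z_0-9]*)    # keywords, AND/OR/NOT, identifiers
  | (?P<OPERATOR><=|>=|<>|=|<|>)        # comparison operators
  | (?P<PUNCT>[.,()])                   # punctuation
  | (?P<SPACE>\s+)                      # whitespace (skipped)
  | (?P<ERROR>.)                        # anything else is a lexical error
""", re.VERBOSE)


def tokenize(query):
    """Return a list of (category, text) pairs for a small SQL subset."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(query):
        kind, text = match.lastgroup, match.group()
        if kind == "SPACE":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {text!r}")
        if kind == "WORD":
            # Classify words by looking them up in the reserved sets.
            upper = text.upper()
            if upper in KEYWORDS:
                kind = "KEYWORD"
            elif upper in LOGICAL_OPERATORS:
                kind = "OPERATOR"
            else:
                kind = "IDENTIFIER"
        tokens.append((kind, text))
    return tokens


# DEPT and SALARY are hypothetical column names used only for this example.
tokens = tokenize("SELECT ENAME FROM EMP WHERE DEPT = 'Sales' AND SALARY > 50000")
for kind, text in tokens:
    print(kind, text)
```

A character that fits no category, such as `#`, raises a `ValueError`, which corresponds to rejecting the query at the lexical level.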

Syntactic Analysis:
• After the lexical analysis, the syntactic analysis verifies the structure of the query.
It checks whether the sequence of tokens forms a valid SQL statement according
to the grammar rules of the query language.
• For example, it ensures that the SELECT clause is followed by valid column names,
the FROM clause specifies valid table names, and conditions in the WHERE clause
are properly structured.
• Any syntax errors, such as missing keywords or mismatched parentheses, are
detected at this stage.

2. Put into Normal Form (CNF or DNF)

• After the query passes the lexical and syntactic analysis, the
next step is to transform the query into a normal form.

Two common normal forms in query decomposition are:


1. Conjunctive Normal Form (CNF)
2. Disjunctive Normal Form (DNF).

• Conjunctive Normal Form (CNF):

CNF is a way of writing logical expressions as multiple conditions
connected by "AND", where the parts within each condition are
connected by "OR".
• Disjunctive Normal Form (DNF):

DNF is a way of writing logical expressions as multiple conditions
connected by "OR", where the parts within each condition are
connected by "AND".

I - NORMALIZATION EXAMPLE
- Consider the following query: Find the names of employees who have been working on project P1 for
12 or 24 months.
• The query in SQL:

SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO AND
ASG.PNO = 'P1' AND
(DUR = 12 OR DUR = 24)

• The qualification in conjunctive normal form:

EMP.ENO = ASG.ENO and ASG.PNO = 'P1' and (DUR = 12 or DUR = 24)

• The qualification in disjunctive normal form:

(EMP.ENO = ASG.ENO and ASG.PNO = 'P1' and DUR = 12) or
(EMP.ENO = ASG.ENO and ASG.PNO = 'P1' and DUR = 24)
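
The step from the conjunctive to the disjunctive form is a distribution of AND over OR. A small sketch of that step, assuming the qualification is stored as a list of OR-clauses; the helper names `cnf_to_dnf` and `show_dnf` are hypothetical and the predicates are kept as plain strings:

```python
from itertools import product

# CNF: a list of clauses. Predicates inside a clause are joined by OR,
# and the clauses themselves are joined by AND.
cnf = [
    ["EMP.ENO = ASG.ENO"],
    ["ASG.PNO = 'P1'"],
    ["DUR = 12", "DUR = 24"],
]


def cnf_to_dnf(clauses):
    """Distribute AND over OR by picking one predicate from every clause."""
    return [list(choice) for choice in product(*clauses)]


def show_dnf(terms):
    """Render a DNF as text: conjunctions in parentheses, joined by 'or'."""
    return " or ".join("(" + " and ".join(term) + ")" for term in terms)


dnf = cnf_to_dnf(cnf)
print(show_dnf(dnf))
```

The number of conjunctions in the result is the product of the clause sizes, so the disjunctive form repeats the join and selection predicates in every term.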

ANALYSIS

• It aims to identify and reject type-incorrect or semantically
incorrect queries.
Type incorrect:
• Checks whether the attributes and relation names of a query
are defined in the global schema.
• Checks whether the operations on attributes conflict with the
types of the attributes, e.g., a > comparison applied to an
attribute of type string.
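
A rough sketch of both type checks, under stated assumptions: the `GLOBAL_SCHEMA` dictionary, its attribute types, and the `check_predicate` helper are invented for illustration, and the type conflict modelled here is the comparison of a string attribute with a numeric value.

```python
# Hypothetical global schema: relation -> {attribute: type}.
# The attribute types are assumptions made for this sketch.
GLOBAL_SCHEMA = {
    "EMP": {"ENO": "string", "ENAME": "string", "TITLE": "string"},
    "ASG": {"ENO": "string", "PNO": "string", "RESP": "string", "DUR": "number"},
}


def check_predicate(relation, attribute, operator, value):
    """Return a list of type errors for `relation.attribute operator value`."""
    if relation not in GLOBAL_SCHEMA:
        return [f"unknown relation {relation}"]
    attributes = GLOBAL_SCHEMA[relation]
    if attribute not in attributes:
        return [f"unknown attribute {relation}.{attribute}"]
    value_type = "string" if isinstance(value, str) else "number"
    if attributes[attribute] != value_type:
        return [
            f"{relation}.{attribute} has type {attributes[attribute]}; "
            f"cannot apply {operator} to a {value_type}"
        ]
    return []


print(check_predicate("EMP", "ENAME", ">", 200))   # type conflict
print(check_predicate("EMP", "SALARY", "=", 1))    # attribute not in schema
print(check_predicate("ASG", "DUR", ">=", 36))     # no errors
```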

ANALYSIS

Semantically incorrect
• Checks whether the components contribute in any way to the
generation of the results.
• Only a subset of relational calculus queries can be tested for
correctness, i.e., those that do not contain disjunction and
negation.
• Typical data structures used to detect semantically
incorrect queries are:
1. Connection graph (query graph)
2. Join graph
ANALYSIS EXAMPLE
• Example: Consider a query:

SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"
• Query/connection graph
- Nodes represent operand or result relations
- An edge represents a join if both connected nodes represent
operand relations; otherwise it is a projection

• Join graph
- A subgraph of the query graph that considers only the joins
• If the query graph is connected, the query is considered
semantically correct.
ANALYSIS EXAMPLE
• Example: Consider the following query and its query graph:

SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"

• Since the graph is not connected, the query is semantically
incorrect.
• 3 possible solutions:

1. Reject the query.
2. Assume an implicit Cartesian product between ASG and PROJ.
3. Infer from the schema the missing join predicate ASG.PNO =
PROJ.PNO.
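
The connectivity test on the query graph can be sketched with a plain graph traversal. The edge lists below are built by hand for the two example queries, assuming ENAME belongs to EMP and RESP to ASG; the `is_connected` helper is hypothetical.

```python
def is_connected(nodes, edges):
    """Depth-first search: True if every node is reachable from the first one."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = set(), [nodes[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return len(seen) == len(nodes)


nodes = ["RESULT", "EMP", "ASG", "PROJ"]
# Projection edges (ENAME from EMP, RESP from ASG) plus the join EMP.ENO = ASG.ENO.
edges_missing_join = [("RESULT", "EMP"), ("RESULT", "ASG"), ("EMP", "ASG")]
# The same graph once the join ASG.PNO = PROJ.PNO is present.
edges_with_join = edges_missing_join + [("ASG", "PROJ")]

print(is_connected(nodes, edges_missing_join))  # False: PROJ is isolated
print(is_connected(nodes, edges_with_join))     # True
```

The selection predicates (PNAME, DUR, TITLE) each involve a single relation, so they add no edges between nodes and do not affect connectivity.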

3. ELIMINATION OF REDUNDANCY

• Elimination of redundancy: Simplify the query by eliminating redundancies,
e.g., redundant predicates.
• Redundancies are often due to semantic integrity constraints expressed in
the query language, e.g., queries on views are expanded into queries on
relations that satisfy certain integrity and security constraints.

ELIMINATION OF REDUNDANCY
• Example: Consider the following query:
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
OR (NOT (EMP.TITLE = "Programmer")
AND (EMP.TITLE = "Elect. Eng."
OR EMP.TITLE = "Programmer")
AND NOT (EMP.TITLE = "Elect. Eng."))
• Let p1 be ENAME = "J. Doe", p2 be TITLE = "Programmer" and p3 be TITLE =
"Elect. Eng."
• Then the qualification can be written as p1 ∨ (¬p2 ∧ (p2 ∨ p3) ∧ ¬p3) and then be
transformed into p1.
• Simplified query:
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
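
The simplification can be checked by brute force over all truth assignments of the three predicates. A small sketch, with the function names `original` and `simplified` chosen for this example:

```python
from itertools import product


def original(p1, p2, p3):
    # p1 OR (NOT p2 AND (p3 OR p2) AND NOT p3), as in the WHERE clause
    return p1 or ((not p2) and (p3 or p2) and (not p3))


def simplified(p1, p2, p3):
    return p1


# Compare the two qualifications on all 8 combinations of truth values.
equivalent = all(
    original(*values) == simplified(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)  # True
```

The parenthesised part is always false: with p2 false it reduces to p3 ∧ ¬p3, and with p2 true the factor ¬p2 is false, so only p1 remains.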
4. REWRITING
• It converts the relational calculus query to a relational algebra query and finds an
efficient expression.
• Transform the query to RA and optimize the query.

REWRITING EXAMPLE
• Example: Find the names of employees other than J. Doe who worked on the
CAD/CAM project for either 1 or 2 years.
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ "J. Doe"
AND PNAME = "CAD/CAM"
AND (DUR = 12 OR DUR = 24)
• A query tree represents the RA expression
- Relations are the leaves (FROM clause)
- Result attributes are the root (SELECT clause)
- Intermediate nodes give the results of the operations, applied
from the leaves to the root
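
One way to picture the query tree is as nested function calls evaluated from the leaves to the root. The sketch below is only an illustration: the rows are invented toy data, the project number P3 is made up, and `select`, `project`, and `join` are minimal stand-ins for the relational algebra operators.

```python
# Toy rows invented for this sketch; they are not data from the slides.
EMP = [{"ENO": "E1", "ENAME": "J. Doe"}, {"ENO": "E2", "ENAME": "M. Smith"}]
ASG = [{"ENO": "E1", "PNO": "P3", "DUR": 12}, {"ENO": "E2", "PNO": "P3", "DUR": 24}]
PROJ = [{"PNO": "P3", "PNAME": "CAD/CAM"}]


def select(rows, predicate):
    """Keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """Keep only the listed attributes of every row."""
    return [{name: row[name] for name in attributes} for row in rows]


def join(left, right, attribute):
    """Join two relations on a single shared attribute."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[attribute] == right_row[attribute]
    ]


# Leaves are the relations, each call is an intermediate node,
# and the outermost projection is the root.
result = project(
    select(
        join(join(EMP, ASG, "ENO"), PROJ, "PNO"),
        lambda row: row["ENAME"] != "J. Doe"
        and row["PNAME"] == "CAD/CAM"
        and row["DUR"] in (12, 24),
    ),
    ["ENAME"],
)
print(result)  # [{'ENAME': 'M. Smith'}]
```

This tree joins everything before filtering, which is the direct translation of the SQL rather than an efficient plan.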

REWRITING EXAMPLE
• By applying transformation rules, many different trees/expressions may be found that
are equivalent to the original tree/expression, but might be more efficient.
• In the following we assume relations R(A1, ..., An), S(B1, ..., Bn), and T, which is
union-compatible with R.
• Commutativity of binary operations
- R × S = S × R
- R ⋈ S = S ⋈ R
- R ∪ S = S ∪ R

• Associativity of binary operations
- (R × S) × T = R × (S × T)
- (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

• Idempotence of unary operations
- Π_A'(Π_A''(R)) = Π_A'(R), where A' ⊆ A''
- σ_p1(A1)(σ_p2(A2)(R)) = σ_p1(A1)∧p2(A2)(R)
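
Some of these rules can be spot-checked on small relations. The sketch below models a relation as a set of tuples, each tuple a frozenset of (attribute, value) pairs, so attribute order plays no role; the helpers and the sample relations R, S, and T are assumptions for illustration. A check on one example is not a proof of the rule.

```python
def relation(rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t in r:
        for u in s:
            a, b = dict(t), dict(u)
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.add(frozenset({**a, **b}.items()))
    return out


def select(r, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in r if predicate(dict(t))}


# Sample relations invented for the check.
R = relation([{"A": 1, "B": 1}, {"A": 2, "B": 2}])
S = relation([{"B": 1, "C": "x"}, {"B": 2, "C": "y"}])
T = relation([{"C": "x", "D": 10}])


def p1(row):
    return row["A"] > 1


def p2(row):
    return row["B"] < 3


print(join(R, S) == join(S, R))                    # commutativity of join
print(join(join(R, S), T) == join(R, join(S, T)))  # associativity of join
# Cascaded selections equal one selection on the conjunction.
print(select(select(R, p2), p1) == select(R, lambda row: p1(row) and p2(row)))
```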

Thank You