Chapter 6: Query Decomposition and Data Localization
DDB 2008/09
J. Gamper
Page 1
Query Decomposition
Query decomposition: Mapping of a calculus query (SQL) to algebra operations (select, project, join, rename)
Both input and output queries refer to global relations, without knowledge of the distribution of data.
The query in SQL: SELECT ENAME FROM EMP, ASG WHERE EMP.ENO = ASG.ENO AND ASG.PNO = "P1" AND (DUR = 12 OR DUR = 24)
The qualification in conjunctive normal form:
$\text{EMP.ENO} = \text{ASG.ENO} \land \text{ASG.PNO} = \text{"P1"} \land (\text{DUR} = 12 \lor \text{DUR} = 24)$
Analysis: Identify and reject type-incorrect or semantically incorrect queries
Type incorrect
Checks whether the attributes and relation names of a query are defined in the global schema
Checks whether the operations on attributes conflict with the types of the attributes, e.g., a comparison > operation with an attribute of type string
Semantically incorrect
Checks whether the components contribute in any way to the generation of the result
Only a subset of relational calculus queries can be tested for correctness, i.e., those that do not contain disjunction and negation
Typical data structures used to detect semantically incorrect queries are:
Connection graph (query graph)
Join graph
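The two type checks above can be sketched as a lookup in a schema catalog. The schema dictionary, its attribute types, and the function name `check_comparison` are hypothetical; only relation and attribute names from this chapter's examples are used. A real system may well permit ordered comparisons on strings; the rule rejecting them is here only to mirror the text's example.

```python
# Hypothetical global schema: relation -> {attribute: Python type}.
# The attribute types are assumptions made for this sketch.
GLOBAL_SCHEMA = {
    "EMP": {"ENO": str, "ENAME": str, "TITLE": str},
    "ASG": {"ENO": str, "PNO": str, "RESP": str, "DUR": int},
    "PROJ": {"PNO": str, "PNAME": str},
}

ORDERING_OPS = {"<", "<=", ">", ">="}


def check_comparison(relation, attribute, op, constant):
    """Return the type errors of the atomic predicate `relation.attribute op constant`."""
    attributes = GLOBAL_SCHEMA.get(relation)
    if attributes is None:
        return [f"unknown relation {relation}"]
    if attribute not in attributes:
        return [f"{relation} has no attribute {attribute}"]

    errors = []
    attr_type = attributes[attribute]
    if not isinstance(constant, attr_type):
        errors.append(f"{relation}.{attribute} is {attr_type.__name__}, "
                      f"compared with {type(constant).__name__}")
    # Following the example in the text: ordering comparisons on strings are rejected.
    if op in ORDERING_OPS and attr_type is str:
        errors.append(f"operator {op} applied to string attribute {relation}.{attribute}")
    return errors


print(check_comparison("ASG", "DUR", "=", 12))          # []
print(check_comparison("EMP", "SALARY", "=", 1000))     # ['EMP has no attribute SALARY']
print(check_comparison("EMP", "ENAME", ">", "J. Doe"))  # ['operator > applied to string attribute EMP.ENAME']
```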
Example query: SELECT ENAME, RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND PNAME = "CAD/CAM" AND DUR >= 36 AND TITLE = "Programmer"
Query/connection graph
Nodes represent operand or result relations
An edge represents a join if both connected nodes represent operand relations; otherwise it represents a projection
Join graph
A subgraph of the query graph that considers only the joins
Example query: SELECT ENAME, RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND PNAME = "CAD/CAM" AND DUR >= 36 AND TITLE = "Programmer"
Since the join graph is not connected, the query is semantically incorrect. Three possible solutions:
Reject the query
Assume an implicit Cartesian product between ASG and PROJ
Infer the missing join predicate ASG.PNO = PROJ.PNO from the schema
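The connectivity test on the join graph can be sketched as a graph traversal. Each join predicate is reduced to the pair of relations it connects; this encoding and the function name `is_connected` are assumptions of the sketch. Selection predicates are ignored because they add no edges to the join graph.

```python
from collections import defaultdict


def is_connected(relations, join_edges):
    """Return True if the join graph over `relations` is connected.

    join_edges: pairs (R, S), one per join predicate such as EMP.ENO = ASG.ENO.
    """
    adjacency = defaultdict(set)
    for left, right in join_edges:
        adjacency[left].add(right)
        adjacency[right].add(left)

    relations = list(relations)
    if not relations:
        return True
    # Depth-first search starting from the first relation in the FROM clause.
    seen = {relations[0]}
    stack = [relations[0]]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return set(relations) <= seen


FROM = ["EMP", "ASG", "PROJ"]
print(is_connected(FROM, [("EMP", "ASG"), ("ASG", "PROJ")]))  # True: first query
print(is_connected(FROM, [("EMP", "ASG")]))                   # False: PROJ is isolated
```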
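Transformation rules are used, e.g.,
$p \land p \Leftrightarrow p$
$p \lor p \Leftrightarrow p$
$p \land \text{true} \Leftrightarrow p$
$p \lor \text{false} \Leftrightarrow p$
$p \land \text{false} \Leftrightarrow \text{false}$
$p \lor \text{true} \Leftrightarrow \text{true}$
$p \land \lnot p \Leftrightarrow \text{false}$
$p \lor \lnot p \Leftrightarrow \text{true}$
$p_1 \land (p_1 \lor p_2) \Leftrightarrow p_1$
$p_1 \lor (p_1 \land p_2) \Leftrightarrow p_1$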
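Because each rule is an identity of propositional logic, it can be checked mechanically by enumerating all truth assignments. The sketch below does this in Python; it assumes two-valued logic and therefore does not model SQL's three-valued logic with NULLs, under which, e.g., $p \lor \lnot p$ is not always true.

```python
from itertools import product

# Each rule: (text, left-hand side, right-hand side). Both sides are functions
# of the truth values of p1 and p2; rules that mention only p use p1 for it.
RULES = [
    ("p AND p <=> p", lambda p1, p2: p1 and p1, lambda p1, p2: p1),
    ("p OR p <=> p", lambda p1, p2: p1 or p1, lambda p1, p2: p1),
    ("p AND true <=> p", lambda p1, p2: p1 and True, lambda p1, p2: p1),
    ("p OR false <=> p", lambda p1, p2: p1 or False, lambda p1, p2: p1),
    ("p AND false <=> false", lambda p1, p2: p1 and False, lambda p1, p2: False),
    ("p OR true <=> true", lambda p1, p2: p1 or True, lambda p1, p2: True),
    ("p AND NOT p <=> false", lambda p1, p2: p1 and not p1, lambda p1, p2: False),
    ("p OR NOT p <=> true", lambda p1, p2: p1 or not p1, lambda p1, p2: True),
    ("p1 AND (p1 OR p2) <=> p1", lambda p1, p2: p1 and (p1 or p2), lambda p1, p2: p1),
    ("p1 OR (p1 AND p2) <=> p1", lambda p1, p2: p1 or (p1 and p2), lambda p1, p2: p1),
]


def holds(lhs, rhs):
    """True if both sides agree under every assignment of truth values to p1, p2."""
    return all(bool(lhs(a, b)) == bool(rhs(a, b))
               for a, b in product([False, True], repeat=2))


for text, lhs, rhs in RULES:
    print(f"{text}: {holds(lhs, rhs)}")  # prints True for every rule
```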
Then the qualification can be written as $p_1 \lor (\lnot p_2 \land (p_2 \lor p_3) \land \lnot p_3)$ and then be transformed into $p_1$. Simplified query:
SELECT ENAME FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND ENAME = "J. Doe" AND PNAME = "CAD/CAM" AND (DUR = 12 OR DUR = 24)
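The transformation of the qualification into $p_1$ can be spelled out step by step with the transformation rules, plus distributivity of $\land$ over $\lor$ and commutativity of $\land$, which are not in the list of rules:

```latex
\begin{aligned}
p_1 \lor (\lnot p_2 \land (p_2 \lor p_3) \land \lnot p_3)
  &\Leftrightarrow p_1 \lor \big((\lnot p_2 \land p_2 \land \lnot p_3) \lor (\lnot p_2 \land p_3 \land \lnot p_3)\big)
     && \text{distributivity} \\
  &\Leftrightarrow p_1 \lor (\text{false} \lor \text{false})
     && p \land \lnot p \Leftrightarrow \text{false},\; p \land \text{false} \Leftrightarrow \text{false} \\
  &\Leftrightarrow p_1 \lor \text{false}
     && p \lor p \Leftrightarrow p \\
  &\Leftrightarrow p_1
     && p \lor \text{false} \Leftrightarrow p
\end{aligned}
```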
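In the following we assume relations $R(A_1, \ldots, A_n)$, $S(B_1, \ldots, B_n)$, and $T$, which is union-compatible with $R$.
Commutativity of binary operations
$R \times S = S \times R$
$R \bowtie S = S \bowtie R$
$R \cup S = S \cup R$
Associativity of binary operations
$(R \times S) \times T = R \times (S \times T)$
$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
Idempotence of unary operations
$\Pi_{A'}(\Pi_{A''}(R)) = \Pi_{A'}(R)$, for $A' \subseteq A''$
$\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) = \sigma_{p_1(A_1) \land p_2(A_2)}(R)$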
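Commuting selection with binary operations
$\sigma_{p(A_1)}(R \bowtie_{p(A_2, B_2)} S) = \sigma_{p(A_1)}(R) \bowtie_{p(A_2, B_2)} S$
$\sigma_{p(A)}(R \cup T) = \sigma_{p(A)}(R) \cup \sigma_{p(A)}(T)$ ($A$ belongs to $R$ and $T$)
Commuting projection with binary operations (assume $C = A' \cup B'$, $A' \subseteq A$, $B' \subseteq B$)
$\Pi_C(R \times S) = \Pi_{A'}(R) \times \Pi_{B'}(S)$
$\Pi_C(R \bowtie_{p(A', B')} S) = \Pi_{A'}(R) \bowtie_{p(A', B')} \Pi_{B'}(S)$
$\Pi_C(R \cup S) = \Pi_C(R) \cup \Pi_C(S)$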
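Several of these equivalences can be spot-checked on small relations. The sketch below represents a relation as a list of Python dicts under set semantics; the sample tuples, predicates, and helper names are hypothetical, and agreement on one instance illustrates a rule rather than proving it.

```python
def canon(rel):
    """Order-independent form of a relation under set semantics."""
    return {tuple(sorted(t.items())) for t in rel}


def select(pred, rel):
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    # Duplicate rows are removed by canon(), matching set semantics.
    return [{a: t[a] for a in attrs} for t in rel]


def natural_join(r, s):
    """Join tuples that agree on all attributes the two relations share."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]


# Hypothetical sample data.
EMP = [{"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Programmer"},
       {"ENO": "E2", "ENAME": "A. Smith", "TITLE": "Analyst"}]
ASG = [{"ENO": "E1", "PNO": "P1", "DUR": 12},
       {"ENO": "E2", "PNO": "P1", "DUR": 24},
       {"ENO": "E2", "PNO": "P2", "DUR": 6}]


def long_duration(t):   # plays the role of p1(A1) on ASG
    return t["DUR"] >= 12


def on_p1(t):           # plays the role of p2(A2) on ASG
    return t["PNO"] == "P1"


def programmer(t):      # selection predicate on an EMP attribute only
    return t["TITLE"] == "Programmer"


checks = {
    "join commutativity":
        canon(natural_join(EMP, ASG)) == canon(natural_join(ASG, EMP)),
    "projection idempotence":
        canon(project(["ENO"], project(["ENO", "PNO"], ASG))) == canon(project(["ENO"], ASG)),
    "selection cascade":
        canon(select(long_duration, select(on_p1, ASG)))
        == canon(select(lambda t: long_duration(t) and on_p1(t), ASG)),
    "selection pushed below join":
        canon(select(programmer, natural_join(EMP, ASG)))
        == canon(natural_join(select(programmer, EMP), ASG)),
    # C = {ENO, ENAME, PNO}, A' = {ENO, ENAME}, B' = {ENO, PNO}
    "projection pushed below join":
        canon(project(["ENO", "ENAME", "PNO"], natural_join(EMP, ASG)))
        == canon(natural_join(project(["ENO", "ENAME"], EMP), project(["ENO", "PNO"], ASG))),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Note that the projection rule only holds because the join attribute ENO is kept in both $A'$ and $B'$.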
Data Localization
Data localization
Input: Algebraic query on the global conceptual schema
Purpose:
Apply data distribution information to the algebra operations and determine which fragments are involved
Substitute the global query with queries on fragments
Optimize the global query
Example:
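Assume EMP is horizontally fragmented into EMP1, EMP2, EMP3, and ASG into ASG1, ASG2, as follows:
$\text{ASG}_1 = \sigma_{\text{ENO} \le \text{E3}}(\text{ASG})$
$\text{ASG}_2 = \sigma_{\text{ENO} > \text{E3}}(\text{ASG})$
Simple approach: Replace in all queries
EMP by $(\text{EMP}_1 \cup \text{EMP}_2 \cup \text{EMP}_3)$
ASG by $(\text{ASG}_1 \cup \text{ASG}_2)$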
The result is also called the generic query.
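The simple approach amounts to a substitution on the algebra tree. In the sketch below a query tree is a nested tuple such as `("join", "ENO", left, right)`; this encoding, the catalog name `FRAGMENTS`, and the helper names are assumptions for illustration, with the fragment names taken from the example.

```python
# Hypothetical fragmentation catalog: global relation -> names of its fragments.
FRAGMENTS = {
    "EMP": ["EMP1", "EMP2", "EMP3"],
    "ASG": ["ASG1", "ASG2"],
}


def localize(expr):
    """Replace every global relation leaf ("rel", name) by the union of its fragments."""
    if expr[0] == "rel":
        fragments = FRAGMENTS.get(expr[1], [expr[1]])  # unfragmented: keep as is
        leaves = tuple(("rel", f) for f in fragments)
        return leaves[0] if len(leaves) == 1 else ("union",) + leaves
    # Inner node: keep the operator and non-tuple parameters, localize sub-trees.
    return (expr[0],) + tuple(localize(e) if isinstance(e, tuple) else e
                              for e in expr[1:])


def to_text(expr):
    """Render a (localized) tree in algebra-like notation."""
    op = expr[0]
    if op == "rel":
        return expr[1]
    if op == "union":
        return "(" + " ∪ ".join(to_text(e) for e in expr[1:]) + ")"
    if op == "join":
        return f"{to_text(expr[2])} ⋈_{expr[1]} {to_text(expr[3])}"
    raise ValueError(f"unsupported operator {op}")


query = ("join", "ENO", ("rel", "EMP"), ("rel", "ASG"))
print(to_text(localize(query)))
# (EMP1 ∪ EMP2 ∪ EMP3) ⋈_ENO (ASG1 ∪ ASG2)
```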
Example (contd.): Parallelism in the evaluation is often possible
Depending on the horizontal fragmentation, the fragments can be joined in parallel, followed by the union of the intermediate results.
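The parallel evaluation can be imitated on one machine by running the partial joins concurrently and then taking the union of their results. This is only a sketch: the data and helper names are hypothetical, Python threads merely stand in for the sites of a distributed system (CPython's global interpreter lock limits CPU parallelism), and the fragments are assumed to be disjoint so that concatenating the partial results already yields the union.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def join_on(attr, r, s):
    """Equi-join of two fragments on a single attribute."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]


def parallel_join(r_fragments, s_fragments, attr, max_workers=4):
    """Evaluate all fragment-pair joins concurrently, then union the results."""
    pairs = list(product(r_fragments, s_fragments))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = pool.map(lambda pair: join_on(attr, pair[0], pair[1]), pairs)
        # Disjoint fragments give disjoint partial results, so concatenation is the union.
        return [t for part in partial_results for t in part]


# Hypothetical horizontal fragments of EMP and ASG.
EMP_FRAGMENTS = [[{"ENO": "E1", "ENAME": "J. Doe"}],
                 [{"ENO": "E5", "ENAME": "A. Smith"}]]
ASG_FRAGMENTS = [[{"ENO": "E1", "PNO": "P1"}],
                 [{"ENO": "E5", "PNO": "P2"}, {"ENO": "E5", "PNO": "P3"}]]

result = parallel_join(EMP_FRAGMENTS, ASG_FRAGMENTS, "ENO")
print(sorted((t["ENO"], t["PNO"]) for t in result))
# [('E1', 'P1'), ('E5', 'P2'), ('E5', 'P3')]
```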
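Example (contd.): Unnecessary work can be eliminated, e.g., $\text{EMP}_3 \bowtie \text{ASG}_1$ gives an empty result, because no ENO value satisfies both fragment qualifications:
$\text{EMP}_3 = \sigma_{\text{ENO} > \text{E6}}(\text{EMP})$
$\text{ASG}_1 = \sigma_{\text{ENO} \le \text{E3}}(\text{ASG})$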
More advanced reduction techniques can generate simpler and more optimized queries.
Rule 1: Distribute joins over unions: $(R_1 \cup R_2) \bowtie S = (R_1 \bowtie S) \cup (R_2 \bowtie S)$
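Applied to a generic query, Rule 1 turns a join of unions into a union of fragment joins. The sketch below performs this rewrite on query trees encoded as nested tuples (`("rel", name)`, `("union", ...)`, `("join", left, right)`); the encoding and the function names are assumptions made for illustration.

```python
def distribute_join(expr):
    """Apply (R1 ∪ R2) ⋈ S = (R1 ⋈ S) ∪ (R2 ⋈ S) to both operands of a join.

    A join of two unions becomes the union of all pairwise fragment joins.
    """
    if expr[0] != "join":
        return expr
    left = distribute_join(expr[1])
    right = distribute_join(expr[2])
    lefts = left[1:] if left[0] == "union" else (left,)
    rights = right[1:] if right[0] == "union" else (right,)
    joins = tuple(("join", a, b) for a in lefts for b in rights)
    return joins[0] if len(joins) == 1 else ("union",) + joins


def rel(name):
    return ("rel", name)


generic = ("join",
           ("union", rel("EMP1"), rel("EMP2"), rel("EMP3")),
           ("union", rel("ASG1"), rel("ASG2")))
reduced = distribute_join(generic)
print(len(reduced) - 1)  # 6 partial joins before any of them is pruned
```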
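Rule 2: Useless joins of fragments, $R_i = \sigma_{p_i}(R)$ and $R_j = \sigma_{p_j}(R)$, can be determined when the qualifications of the joined fragments contradict each other, i.e., $R_i \bowtie R_j = \emptyset$ if $\forall x \in R_i, \forall y \in R_j : \lnot(p_i(x) \land p_j(y))$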
[Figure: join graph of $R_i$ with the fragments $R_1$, $R_2$, $R_3$; edges labeled $p_i, p_2$ (to $R_2$) and $p_i, p_3$ (to $R_3$)]
Generic query
The query reduced by distributing joins over unions and applying rule 2 can be implemented as a union of three partial joins that can be done in parallel.
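Rule 2 can be applied mechanically when the fragment qualifications are ranges on the join attribute. In the sketch below each qualification is an interval on the employee number n (for ENO = "En"); ASG1, ASG2 and EMP3 follow the example, whereas the split of EMP1 and EMP2 at E3 and E6 is an assumption, as are the helper names. Comparing numbers rather than strings avoids lexicographic surprises such as "E10" < "E6". With these ranges, three of the six fragment pairs survive.

```python
import math
from itertools import product

# Fragment qualifications as half-open intervals (low, high], meaning low < n <= high.
# ASG1, ASG2 and EMP3 follow the text; the EMP1/EMP2 boundaries are assumed.
EMP_FRAGMENTS = {"EMP1": (-math.inf, 3), "EMP2": (3, 6), "EMP3": (6, math.inf)}
ASG_FRAGMENTS = {"ASG1": (-math.inf, 3), "ASG2": (3, math.inf)}


def overlaps(a, b):
    """True if some integer n satisfies both interval qualifications."""
    return max(a[0], b[0]) < min(a[1], b[1])


def useful_joins(r_frags, s_frags):
    """Keep only the fragment pairs whose qualifications do not contradict (Rule 2)."""
    return [(r, s)
            for (r, pr), (s, ps) in product(r_frags.items(), s_frags.items())
            if overlaps(pr, ps)]


print(useful_joins(EMP_FRAGMENTS, ASG_FRAGMENTS))
# [('EMP1', 'ASG1'), ('EMP2', 'ASG2'), ('EMP3', 'ASG2')]
```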
If the fragmentation is not on the join attribute (unlike in the previous example), derived horizontal fragmentation can be applied to make efficient join processing possible.
Example: Assume the following query and fragmentation of the EMP relation:
Query: SELECT * FROM EMP, ASG WHERE EMP.ENO = ASG.ENO
Fragmentation (not on the join attribute):
$\text{EMP}_1 = \sigma_{\text{TITLE} = \text{"Programmer"}}(\text{EMP})$
$\text{EMP}_2 =$
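Derived horizontal fragmentation defines each ASG fragment as the semijoin of ASG with one EMP fragment, so that matching tuples end up in corresponding fragments. The sketch below assumes that EMP2 contains the non-programmers (the text leaves its definition open) and uses hypothetical data and helper names; it checks that joining only the matching fragment pairs reproduces the full join.

```python
def semijoin(r, s, attr):
    """R ⋉ S on attr: the tuples of R that have a join partner in S."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


def join_on(attr, r, s):
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]


def canon(rel):
    return sorted(tuple(sorted(t.items())) for t in rel)


# Hypothetical sample data; ENO is the key of EMP.
EMP = [{"ENO": "E1", "TITLE": "Programmer"},
       {"ENO": "E2", "TITLE": "Analyst"},
       {"ENO": "E3", "TITLE": "Programmer"}]
ASG = [{"ENO": "E1", "PNO": "P1"}, {"ENO": "E2", "PNO": "P1"},
       {"ENO": "E3", "PNO": "P2"}, {"ENO": "E2", "PNO": "P3"}]

# Primary fragmentation of EMP (EMP2 as the complement is an assumption).
EMP1 = [t for t in EMP if t["TITLE"] == "Programmer"]
EMP2 = [t for t in EMP if t["TITLE"] != "Programmer"]

# Derived fragmentation of ASG: ASGi = ASG ⋉ EMPi.
ASG1 = semijoin(ASG, EMP1, "ENO")
ASG2 = semijoin(ASG, EMP2, "ENO")

# Only the matching pairs need to be joined; the cross pairs are empty.
reduced = join_on("ENO", EMP1, ASG1) + join_on("ENO", EMP2, ASG2)
print(canon(reduced) == canon(join_on("ENO", EMP, ASG)))  # True
print(join_on("ENO", EMP1, ASG2))                          # []
```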
Rule 3: $\Pi_{D,K}(R_i)$ is useless if none of the projection attributes in $D$ belong to $A_i$, the attribute set of the vertical fragment $R_i$, where $K$ is the key attribute. Note that the result is not empty, but it is useless, as it contains only the key attribute.
Generic query
Reduced query
By commuting the projection with the join (i.e., projecting on ENO, ENAME), we can see that the projection on EMP2 is useless because ENAME is not in EMP2.
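Rule 3 reduces to a set-intersection test on fragment attributes. The sketch below assumes that EMP is split vertically into EMP1 with attributes {ENO, ENAME} and EMP2 with attributes {ENO, TITLE}, consistent with the remark that ENAME is not in EMP2; the exact fragments and the helper name are assumptions.

```python
KEY = "ENO"
# Hypothetical vertical fragments of EMP; each one repeats the key ENO.
EMP_FRAGMENTS = {
    "EMP1": {"ENO", "ENAME"},
    "EMP2": {"ENO", "TITLE"},
}


def useful_fragments(projection, fragments, key=KEY):
    """Fragments needed to answer a projection on `projection` (Rule 3).

    A fragment is useless if none of the projected non-key attributes D is
    among its attributes: it could only contribute the key.
    """
    wanted = set(projection) - {key}
    if not wanted:
        # Only the key is requested: any single fragment can supply it.
        return list(fragments)[:1]
    return [name for name, attrs in fragments.items() if wanted & attrs]


print(useful_fragments({"ENO", "ENAME"}, EMP_FRAGMENTS))  # ['EMP1']
```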
Conclusion
Query decomposition and data localization map a calculus query into algebra operations and apply data distribution information to the algebra operations.
Data localization reduces horizontal fragmentation with joins and selections, and vertical fragmentation with joins, and aims to find empty relations.