SF8 - Unit 2 DDB
In centralized query processing, the objective is to minimize local processing (CPU and I/O) costs.
Distributed systems also account for
communication costs.
Early Assumptions: Early distributed query
optimization focused on minimizing
communication costs, assuming communication
dominated local processing costs due to slow
networks.
Modern Networks: With faster communication networks, local processing costs matter as much as communication costs, so modern cost models consider a weighted combination of both.
Query decomposition rewrites a calculus (SQL) query into relational algebra in four steps:
Normalization: Rewriting the query qualification into a normalized (conjunctive or disjunctive) form.
Analysis: Checking the query for correctness.
Elimination of redundancy: Removing redundant predicates.
Rewriting: Translating the simplified query into relational algebra.
The first three steps focus on making sure the query is correct and as simple as possible; the last step expresses it in algebraic form.
Edges:
An edge between two nodes (not the result node) represents
a join operation.
An edge that connects a node to the result represents a
projection operation.
A node that isn’t the result can be labeled with a
select operation or a self-join.
Join Graph:
An important subgraph of the query graph is the join graph, which contains only the join edges; it is especially useful during the
optimization phase.
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
ASG(ENO, PNO, RESP, DUR)
PAY(TITLE, SAL)
“Find the names and responsibilities of programmers who have been working on the CAD/CAM project for more than three years.”
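One way to picture the query graph for this query is the small Python structure below. It is only an illustrative sketch: the select-predicate constants (the TITLE, DUR and PNAME values) are assumptions for this example, not taken from the notes. Nodes other than the result carry select predicates, edges between relation nodes carry join predicates, and the result node records the projected attributes.

# Hypothetical encoding of the query graph for the example query.
# The predicate constants are illustrative assumptions.
query_graph = {
    "result": ["ENAME", "RESP"],                     # attributes projected into the result
    "select": {                                      # select predicates on relation nodes
        "EMP":  ["TITLE = 'Programmer'"],
        "ASG":  ["DUR >= 36"],
        "PROJ": ["PNAME = 'CAD/CAM'"],
    },
    "joins": [                                       # join edges between relation nodes
        ("EMP", "ASG",  "EMP.ENO = ASG.ENO"),
        ("ASG", "PROJ", "ASG.PNO = PROJ.PNO"),
    ],
}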
Redundant predicates in the query qualification can be eliminated by applying the following rules:
1. p ∧ p ⇔ p
2. p ∨ p ⇔ p
3. p ∧ true ⇔ p
4. p ∨ false ⇔ p
5. p ∧ false ⇔ false
6. p ∨ true ⇔ true
7. p ∧ ¬p ⇔ false
8. p ∨ ¬p ⇔ true
9. p1 ∧ (p1 ∨ p2) ⇔ p1
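As a worked illustration (the predicate below is chosen for this sketch and is not taken from the notes), consider the qualification (¬p1 ∧ (p1 ∨ p2) ∧ ¬p2) ∨ p3, where p1, p2 and p3 stand for simple predicates such as TITLE = 'Programmer'. Distributing ¬p1 over (p1 ∨ p2) gives (p1 ∧ ¬p1) ∨ (¬p1 ∧ p2), which rules 7 and 4 reduce to ¬p1 ∧ p2. Conjoining ¬p2 then gives ¬p1 ∧ (p2 ∧ ¬p2), which rules 7 and 5 reduce to false. Finally, false ∨ p3 reduces to p3 by rule 4, so the whole qualification is equivalent to p3 and the redundant predicates can be dropped.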
The root node represents the final result (from the SQL SELECT
clause).
The conditions (from the SQL WHERE clause) are turned into selection and join operations.
For vertically fragmented data, the localization program reconstructs each relation by joining its fragments.
After decomposition and localization, the query is represented as an operator tree; applying join commutativity and associativity yields many equivalent operator trees.
For N relations, the number of possible join trees grows very quickly (on the order of N!), so optimizers restrict the search space using
heuristics:
Perform selection and projection early.
Avoid unnecessary Cartesian products (e.g., operator tree (c) in the earlier example would not be considered).
Types of Join Trees:
Linear trees: At least one operand in each operation is a
base relation. This reduces the search space to O(2^N).
Bushy trees: More general, where both operands can be
intermediate results. These are useful in distributed
systems for parallel execution.
Parallelism in Bushy Trees:
In a distributed environment, bushy trees allow independent operations to run in parallel at different sites, improving performance (e.g., the independent subtrees of join tree (b) can be evaluated in parallel).
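To make the size of the search space concrete, the sketch below (illustrative Python, not part of the notes) enumerates join-tree shapes for a set of relations: all bushy trees, with mirror-image splits counted once, and all left-deep (linear) join orders.

from itertools import combinations

def bushy_trees(rels):
    """Yield every bushy join tree over the given relations as nested
    tuples; mirror-image splits (A, B) and (B, A) are generated once."""
    rels = frozenset(rels)
    if len(rels) == 1:
        yield next(iter(rels))
        return
    seen = set()
    for size in range(1, len(rels)):
        for left in map(frozenset, combinations(sorted(rels), size)):
            right = rels - left
            if (left, right) in seen or (right, left) in seen:
                continue
            seen.add((left, right))
            for lt in bushy_trees(left):
                for rt in bushy_trees(right):
                    yield (lt, rt)

def left_deep_trees(rels):
    """Yield only linear (left-deep) trees, where the right operand of
    every join is a base relation; there are N! such orderings."""
    if len(rels) == 1:
        yield rels[0]
        return
    for i, r in enumerate(rels):
        for t in left_deep_trees(rels[:i] + rels[i + 1:]):
            yield (t, r)

if __name__ == "__main__":
    names = ["EMP", "ASG", "PROJ", "PAY"]
    print(sum(1 for _ in bushy_trees(names)))      # 15 bushy shapes for 4 relations
    print(sum(1 for _ in left_deep_trees(names)))  # 24 (= 4!) left-deep orders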
SEARCH STRATEGY
Dynamic Programming in Query Optimization:
The most common search strategy is dynamic programming,
which is deterministic.
It builds query plans step-by-step, starting from base relations
and adding one relation at a time until the complete plan is
formed.
Dynamic programming creates all possible plans and selects
the best one.
Pruning is used to discard partial plans that are unlikely to be
optimal, reducing optimization costs.
Greedy Algorithm (Another Deterministic Strategy):
Unlike dynamic programming, the greedy algorithm builds
only one plan at a time, following a depth-first approach.
Pros and Cons of Dynamic Programming:
It guarantees finding the best plan, but it is only practical when the query involves few relations (typically no more than 5 or 6).
For more complex queries, it becomes too expensive in terms of
time and memory.
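As a hedged sketch of how such a dynamic-programming optimizer can work, the Python below keeps, for every subset of relations, only the cheapest join tree found so far (a simple form of the pruning mentioned above) and returns the best complete plan. The cardinalities and selectivities are invented for illustration, and the cost measure is simply the number of intermediate tuples produced.

from itertools import combinations

# Toy statistics: assumed cardinalities and join selectivities.
CARD = {"EMP": 4000, "ASG": 10000, "PROJ": 1000, "PAY": 40}
SEL = {frozenset(("EMP", "ASG")): 0.0003,
       frozenset(("PROJ", "ASG")): 0.001,
       frozenset(("EMP", "PAY")): 0.025}

def join_card(left_rels, right_rels, left_card, right_card):
    """Estimate the result size of joining two sub-plans, using the
    selectivity of any join predicate linking them (1.0 = Cartesian)."""
    sel = 1.0
    for a in left_rels:
        for b in right_rels:
            sel *= SEL.get(frozenset((a, b)), 1.0)
    return left_card * right_card * sel

def dp_best_plan(relations):
    """Dynamic programming over subsets: best[S] holds the cheapest
    (cost, cardinality, tree) found for joining the subset S."""
    best = {frozenset([r]): (0.0, CARD[r], r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    lcost, lcard, ltree = best[left]
                    rcost, rcard, rtree = best[right]
                    card = join_card(left, right, lcard, rcard)
                    cost = lcost + rcost + card   # cost = tuples produced
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, card, (ltree, rtree))
    return best[frozenset(relations)]

if __name__ == "__main__":
    cost, card, tree = dp_best_plan(["EMP", "ASG", "PROJ", "PAY"])
    print(tree, cost, card)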
Randomized Strategies for Complex Queries:
Randomized strategies are used for complex queries with
more relations.
These strategies reduce the complexity of optimization, but they do not guarantee the best plan because they explore the search space randomly.
Performance of Randomized Strategies:
Randomized strategies tend to perform better than
deterministic strategies when the query involves many
relations.
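A minimal sketch of one well-known randomized strategy, iterative improvement, is shown below. The cardinalities, the single selectivity value and the left-deep cost measure are all illustrative assumptions: starting from a random join order, the algorithm repeatedly tries a random swap and keeps it only if the estimated cost drops.

import random

# Assumed cardinalities and a fixed selectivity for every join.
CARD = {"EMP": 4000, "ASG": 10000, "PROJ": 1000, "PAY": 40}
SELECTIVITY = 0.001

def cost(order):
    """Sum of intermediate-result sizes for a left-deep join order."""
    size, total = CARD[order[0]], 0.0
    for rel in order[1:]:
        size = size * CARD[rel] * SELECTIVITY
        total += size
    return total

def iterative_improvement(relations, rounds=100, seed=0):
    """Randomized strategy: start from a random order and keep applying
    random swaps, accepting a swap only if it lowers the cost."""
    rng = random.Random(seed)
    order = relations[:]
    rng.shuffle(order)
    best = cost(order)
    for _ in range(rounds):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        c = cost(order)
        if c < best:
            best = c
        else:
            order[i], order[j] = order[j], order[i]   # undo the swap
    return order, best

print(iterative_improvement(["EMP", "ASG", "PROJ", "PAY"]))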
DISTRIBUTED COST MODEL
An optimizer’s cost model helps predict how
long it will take to run a query. It includes:
Cost functions: These predict the execution time of the query.
Database statistics: Base data (e.g., relation cardinalities) used to estimate the sizes of intermediate results.
1. COST FUNCTIONS
Cost of Distributed Execution Strategy:
The cost can be measured in two ways:
Total time: The sum of all time components involved in the
query.
Response time: The time from starting the query to
completing it.
Formula for Total Time:
Total time is calculated using this formula:
Total time = TCPU * #instructions + TI/O * #I/Os + TMSG * #messages + TTR * #bytes transferred
TCPU: Time for one CPU instruction.
TI/O: Time for one disk input/output (I/O).
TMSG: Time to send or receive a message.
TTR: Time to transfer a unit of data between sites.
Cost in Networks:
In wide-area networks (like the Internet), communication cost is usually the dominant factor.
Response time, unlike total time, can be reduced by running tasks in parallel.
EXAMPLE OF TOTAL TIME VS.
RESPONSE TIME:
When transferring data from two sites to a third
site:
Total time adds up the time for both data transfers.
Response time considers the longer of the two
transfers, as they can happen in parallel.
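A small numeric illustration of this difference (the timing constants and data sizes are assumed, not from the notes): if site 1 sends x units of data to site 3 and site 2 sends y units, total time charges both transfers in full, while response time charges only the longer one because they proceed in parallel.

# Illustrative values (assumed): per-message setup time, per-unit transfer
# time, and the amounts of data shipped from sites 1 and 2 to site 3.
T_MSG, T_TR = 1.0, 0.05       # TMSG (s per message), TTR (s per data unit)
x, y = 100, 300               # data units sent by site 1 and site 2

total_time = 2 * T_MSG + T_TR * (x + y)   # both transfers counted in full
response_time = max(T_MSG + T_TR * x,     # transfers run in parallel,
                    T_MSG + T_TR * y)     # so only the longer one matters

print(total_time, response_time)          # 22.0 vs 16.0 with these numbers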
Trade-Off Between Total Time and Response
Time:
Minimizing response time requires more parallel execution, which tends to increase total time because more local processing and transfers are performed.
Join operations combine data from two tables, and the result of a join can itself be large, so join ordering is a central concern of optimization techniques.
Centralized optimization is simpler because it doesn't have to account for communication costs.
Output:
output: Final result after execution.
Initialization: output ← ∅ (empty).
Base Case (Single Relation):
If n = 1, execute MRQ directly:
output ← run(MRQ).
Return the result as the final output.
Recursive Case:
If the MRQ has more than one relation:
The MRQ is decomposed into m one-relation queries
(ORQs) and a smaller multirelation query (MRQ').
Each ORQ is executed using the run function, and the
results are merged into output.
A relation R is chosen from the remaining MRQ
(MRQ') for tuple substitution.
For each tuple t in R:
The values of t are substituted into MRQ' to
create a new MRQ (MRQ'').
The algorithm is recursively called on MRQ'' (a sketch of this recursion follows below).
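A minimal, runnable sketch of the tuple-substitution recursion described above is given below. The tiny in-memory relations and the join predicate are assumptions made for this illustration, and detachment of one-relation queries is omitted for brevity.

# Toy data: each relation is a list of dicts (tuples). Assumed values.
DB = {
    "EMP": [{"ENO": "E1", "ENAME": "J. Doe"}, {"ENO": "E2", "ENAME": "M. Smith"}],
    "ASG": [{"ENO": "E1", "PNO": "P1"}, {"ENO": "E2", "PNO": "P2"}],
}

def run(relation, bindings, predicate):
    """Execute a one-relation query: scan the relation and keep the tuples
    that satisfy the predicate together with the already substituted values."""
    return [{**bindings, relation: t} for t in DB[relation]
            if predicate({**bindings, relation: t})]

def tuple_substitution(relations, predicate, bindings=None):
    """If only one relation is left, run the query directly; otherwise pick a
    relation, substitute each of its tuples, and recurse on the rest."""
    bindings = bindings or {}
    if len(relations) == 1:
        return run(relations[0], bindings, predicate)
    chosen, rest = relations[0], relations[1:]      # CHOOSE_RELATION, simplified
    output = []
    for t in DB[chosen]:                            # substitute each tuple of chosen
        output.extend(tuple_substitution(rest, predicate, {**bindings, chosen: t}))
    return output

# Example: EMP joined with ASG on ENO; the predicate sees one tuple per relation.
predicate = lambda b: b["EMP"]["ENO"] == b["ASG"]["ENO"]
print(tuple_substitution(["ASG", "EMP"], predicate))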
Execute Joins
Determine the possible orderings of the joins.
Determine the cost of each ordering.
Choose the join ordering with the minimal
cost.
Example:
“Names of employees working on the CAD/CAM
project”
PROJ * ASG * EMP
EMP has an index on ENO
ASG has an index on PNO
PROJ has an index on PNO and an index on PNAME
DISTRIBUTED QUERY OPTIMIZATION-
INGRES DYNAMIC APPROACH
Input:
The algorithm takes a multi-relation query
(MRQ) as input.
MRQ consists of multiple relations (tables) that must be joined during execution.
Step 1: Execute Each Monorelation Query
(ORQ)
For each detachable query ORQi in MRQ:
ORQi is a monorelation query (a query involving only a
single relation).
Run each ORQi individually to get initial results.
Step 2: Replace MRQ by the n irreducible queries obtained after detachment.
Step 3: While there are irreducible queries left to process:
Step 3.1: Select the Next Query
Choose the next irreducible query MRQ' that involves the smallest
fragments (smallest pieces of data).
Step 3.2: Determine Transfer and Processing Strategy
Determine how to transfer fragments (data pieces) and where to
process the selected query (MRQ').
Use a function SELECT_STRATEGY to decide which fragments to move
and where to process them.
Step 3.3: Move Selected Fragments to the Chosen Sites
For each fragment F and site S in the fragment-site list:
Move fragment F to the site S (a specific server or database location).
Step 3.4: Execute the Query MRQ'
Execute the selected irreducible query MRQ' with the transferred
fragments.
Step 3.5: Update n
Decrease n by 1 to reflect that one more irreducible query has been
processed.
Step 4: Return the Final Output
The result of the last executed query (MRQ') is returned as the final output.
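A self-contained toy rendition of this loop is sketched below. The site contents, the single irreducible query, and the "move the smaller relation to the site of the larger one" strategy are assumptions made only for this illustration; they stand in for the fragment-site list and SELECT_STRATEGY of the algorithm.

sites = {
    "S1": {"EMP": [{"ENO": "E1", "ENAME": "J. Doe"},
                   {"ENO": "E2", "ENAME": "M. Smith"}]},
    "S2": {"ASG": [{"ENO": "E1", "PNO": "P1"},
                   {"ENO": "E2", "PNO": "P2"},
                   {"ENO": "E2", "PNO": "P3"}]},
}

def site_of(rel):
    """Find the site currently storing (a fragment of) rel."""
    return next(s for s, data in sites.items() if rel in data)

def select_strategy(left, right):
    """Step 3.2 (toy version): process at the site of the larger relation
    and ship the smaller one there."""
    s_left, s_right = site_of(left), site_of(right)
    if len(sites[s_left][left]) >= len(sites[s_right][right]):
        return [(right, s_left)], s_left       # (fragment, destination), processing site
    return [(left, s_right)], s_right

# Irreducible queries: here a single equijoin EMP.ENO = ASG.ENO.
irreducible = [("EMP", "ASG", "ENO")]
output = []
n = len(irreducible)
while n > 0:                                   # Step 3
    left, right, attr = irreducible.pop(0)     # 3.1: next irreducible query
    moves, proc_site = select_strategy(left, right)   # 3.2
    for rel, dest in moves:                    # 3.3: move the chosen fragments
        src = site_of(rel)
        sites[dest][rel] = sites[src].pop(rel)
    output = [{**l, **r}                       # 3.4: execute the join locally
              for l in sites[proc_site][left]
              for r in sites[proc_site][right] if l[attr] == r[attr]]
    n -= 1                                     # 3.5
print(output)                                  # Step 4: result of the last query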