Module 2
Query Decomposition
The first layer decomposes the calculus query into an algebraic query on global
relations. The information needed for this transformation is found in the global
conceptual schema describing the global relations. However, the information about
data distribution is not used here but in the next layer. Thus the techniques used by
this layer are those of a centralized DBMS.
Query decomposition can be viewed as four successive steps. First, the calculus
query is rewritten in a normalized form that is suitable for subsequent manipulation.
Normalization of a query generally involves the manipulation of the query quantifiers
and of the query qualification by applying logical operator priority.
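As an illustration of the normalization step, the sketch below rewrites a query qualification into conjunctive normal form, which is one common target form. The tuple-based expression encoding and the names push_not, distribute, and to_cnf are assumptions made for this example and do not come from any particular DBMS.

```python
# Predicates are strings; compound expressions are tuples:
# ("and", a, b), ("or", a, b), ("not", a). This encoding is hypothetical.

def push_not(expr):
    """Move negations inward (De Morgan's laws, double negation)."""
    if isinstance(expr, str):
        return expr
    op = expr[0]
    if op == "not":
        inner = expr[1]
        if isinstance(inner, str):
            return expr
        if inner[0] == "not":
            return push_not(inner[1])
        dual = "or" if inner[0] == "and" else "and"
        return (dual, push_not(("not", inner[1])), push_not(("not", inner[2])))
    return (op, push_not(expr[1]), push_not(expr[2]))


def distribute(expr):
    """Distribute OR over AND to obtain a conjunction of disjunctions."""
    if isinstance(expr, str) or expr[0] == "not":
        return expr
    op, left, right = expr[0], distribute(expr[1]), distribute(expr[2])
    if op == "or":
        if not isinstance(left, str) and left[0] == "and":
            return ("and",
                    distribute(("or", left[1], right)),
                    distribute(("or", left[2], right)))
        if not isinstance(right, str) and right[0] == "and":
            return ("and",
                    distribute(("or", left, right[1])),
                    distribute(("or", left, right[2])))
    return (op, left, right)


def to_cnf(expr):
    """Normalize a qualification: negations first, then distribution."""
    return distribute(push_not(expr))
```

For example, to_cnf(("or", ("and", "p1", "p2"), "p3")) yields the conjunction of (p1 OR p3) and (p2 OR p3).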
Second, the normalized query is analyzed semantically so that incorrect queries are
detected and rejected as early as possible. Techniques to detect incorrect queries
exist only for a subset of relational calculus. Typically, they use some sort of graph
that captures the semantics of the query.
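One graph-based check of this kind can be sketched as follows, assuming a query graph whose nodes are the relations of the query plus the result, and whose edges are joins and the projection onto the result. Under that assumption, a graph that is not connected signals a query that is probably missing a join predicate. The function name and the EMP/ASG/PROJ relations are hypothetical and used only for illustration.

```python
from collections import defaultdict


def is_connected(nodes, edges):
    """Return True if every node is reachable from the first one."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = list(nodes)
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:  # depth-first traversal of the query graph
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return set(nodes) <= seen


# Hypothetical query over EMP, ASG and PROJ producing RESULT.
nodes = ["EMP", "ASG", "PROJ", "RESULT"]
good_edges = [("EMP", "ASG"), ("ASG", "PROJ"), ("EMP", "RESULT")]
bad_edges = [("EMP", "ASG"), ("EMP", "RESULT")]  # join with PROJ is missing
```

With these inputs, is_connected(nodes, good_edges) holds, while the second edge list leaves PROJ isolated and the query would be rejected.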
Third, the correct query (still expressed in relational calculus) is simplified. One way
to simplify a query is to eliminate redundant predicates. Note that redundant predicates
are likely to arise when a query is the result of system transformations applied to the
user query. Such transformations are used for performing semantic data control
(views, protection, and semantic integrity control).
Fourth, the calculus query is restructured as an algebraic query.
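The elimination of redundant predicates in the simplification step can be sketched with a few well-known Boolean rules (idempotency, neutral and absorbing elements, contradiction). This is a minimal illustration and not a complete simplifier. The tuple encoding and the names simplify and negate are assumptions of this example.

```python
TRUE, FALSE = "true", "false"


def negate(expr):
    """Negate an expression, cancelling a double negation."""
    if isinstance(expr, tuple) and expr[0] == "not":
        return expr[1]
    return ("not", expr)


def simplify(expr):
    """Apply simple rules bottom-up: p AND p = p, p OR false = p,
    p AND NOT p = false, p OR NOT p = true, and so on."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "not":
        inner = simplify(expr[1])
        if inner == TRUE:
            return FALSE
        if inner == FALSE:
            return TRUE
        return negate(inner)
    op, left, right = expr[0], simplify(expr[1]), simplify(expr[2])
    # For AND, false absorbs and true is neutral; for OR it is the reverse.
    absorbing, neutral = (FALSE, TRUE) if op == "and" else (TRUE, FALSE)
    if absorbing in (left, right):
        return absorbing
    if left == neutral:
        return right
    if right == neutral:
        return left
    if left == right:
        return left
    if left == negate(right):
        return absorbing
    return (op, left, right)
```

For instance, the qualification (p1 AND NOT p1) OR p2 reduces to p2, because the first disjunct is a contradiction.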
Data Localization
The input to the second layer is an algebraic query on global relations. The main role
of the second layer is to localize the query’s data using data distribution information
in the fragment schema. We saw that relations are fragmented and stored in disjoint
subsets, called fragments, each being stored at a different site. This layer determines
which fragments are involved in the query and transforms the distributed query into a
query on fragments. Fragmentation is defined by fragmentation predicates that can
be expressed through relational operators. A global relation can be reconstructed by
applying the fragmentation rules and deriving a program of relational algebra
operators, called a localization program, whose operands are the fragments. Generating a
query on fragments is done in two steps. First, the query is mapped into a fragment
query by substituting each relation by its reconstruction program (also called
materialization program). Second, the fragment query is simplified and restructured
to produce another “good” query. Simplification and restructuring may be done
according to the same rules used in the decomposition layer. As in the
decomposition layer, the final fragment query is generally far from optimal because
information regarding fragments is not utilized.
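The two localization steps can be sketched for the simple case of horizontal fragmentation. The example assumes a hypothetical relation EMP split into three fragments by ranges of an integer key ENO. The fragment names, sites, and ranges are invented for illustration. Step one replaces EMP by the union of its fragments, and step two drops the fragments whose fragmentation predicate contradicts the selection of the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    low: int   # inclusive lower bound of ENO in this fragment
    high: int  # inclusive upper bound of ENO in this fragment


# Hypothetical fragment schema for the global relation EMP.
EMP_FRAGMENTS = [
    Fragment("EMP1", "site1", 1, 3),
    Fragment("EMP2", "site2", 4, 6),
    Fragment("EMP3", "site3", 7, 9),
]


def reconstruction_program(fragments):
    """Step 1: a horizontally fragmented relation is the union of its fragments."""
    return ("union", [f.name for f in fragments])


def prune(fragments, low, high):
    """Step 2: keep only fragments whose ENO range overlaps the
    selection low <= ENO <= high; the others would yield empty results."""
    return [f for f in fragments if f.low <= high and low <= f.high]


# Selection ENO = 5 on EMP only needs fragment EMP2.
relevant = prune(EMP_FRAGMENTS, 5, 5)
fragment_query = reconstruction_program(relevant)
```

Here the fragment query reduces to a union over the single fragment EMP2, so only one site has to be contacted.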
Query Optimization
Query optimization consists of finding the “best” ordering of operators in the query,
including communication operators, that minimizes a cost function. The cost function,
often defined in terms of time units, refers to computing resources such as disk
space, disk I/Os, buffer space, CPU cost, communication cost, and so on. Generally,
it is a weighted combination of I/O, CPU, and communication costs. Nevertheless, a
typical simplification made by the early distributed DBMSs, as we mentioned before,
was to consider communication cost as the most significant factor. This used to be
valid for wide area networks, where the limited bandwidth made communication
much more costly than local processing. This assumption no longer holds, and
communication cost can be lower than I/O cost. To select the ordering of operators, it
is necessary to predict execution costs of alternative candidate orderings.
Determining execution costs before query execution (i.e., static optimization) is
based on fragment statistics and the formulas for estimating the cardinalities of
results of relational operators. Thus, the optimization decisions depend on the
allocation of fragments and on the available fragment statistics, which are recorded in
the allocation schema.
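The interplay between cardinality estimation and the cost function can be sketched as follows. The example assumes three hypothetical relations with invented cardinalities and join selectivity factors. It estimates the size of each intermediate join result as selectivity times the product of the input cardinalities, and it uses a deliberately simplified cost function: a weighted sum of tuples read (I/O) and intermediate tuples shipped (communication). The weights, statistics, and function names are illustrative assumptions, not values from a real system.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical statistics: cardinalities and join selectivity factors.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SELECTIVITY = {
    frozenset({"EMP", "ASG"}): Fraction(1, 400),
    frozenset({"ASG", "PROJ"}): Fraction(1, 100),
}


def estimate_join(card_left, rels_left, right):
    """Estimated cardinality of joining an intermediate result with `right`.
    Pairs without a join predicate get selectivity 1 (Cartesian product)."""
    factor = Fraction(1)
    for rel in rels_left:
        factor *= SELECTIVITY.get(frozenset({rel, right}), Fraction(1))
    return card_left * CARD[right] * factor


def plan_cost(order, w_io=1, w_comm=1):
    """Weighted sum of tuples read and intermediate tuples shipped."""
    io_tuples = sum(CARD[rel] for rel in order)
    shipped_tuples = 0
    card, joined = CARD[order[0]], [order[0]]
    for rel in order[1:]:
        card = estimate_join(card, joined, rel)
        joined.append(rel)
        shipped_tuples += card
    return w_io * io_tuples + w_comm * shipped_tuples


def best_order(relations, **weights):
    """Exhaustively pick the join order with the lowest estimated cost."""
    return min(permutations(relations),
               key=lambda order: plan_cost(order, **weights))
```

Under these assumed statistics, an order that joins EMP with PROJ first produces a Cartesian product of 40,000 tuples, so the search avoids it. Exhaustive enumeration is only practical for a handful of relations and is used here purely to keep the sketch short.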
2. Two-phase locking protocol
3. Timestamp ordering protocol
Recovery in Distributed databases