Module 2

Uploaded by

maruffpathan


Distributed Transaction Management

What is a Distributed Transaction?

A distributed transaction is a set of operations on data that is performed across two or more data repositories (especially databases). It is typically coordinated across separate nodes connected by a network, but may also span multiple databases on a single server.

There are two possible outcomes: (1) all operations complete successfully, or (2) none of the operations take effect because of a failure somewhere in the system. In the latter case, any work completed before the failure is reversed so that no net work is done. This all-or-nothing behavior complies with the "ACID" (atomicity, consistency, isolation, durability) principles that databases use to ensure data integrity. ACID is most commonly associated with transactions on a single database server, but distributed transactions extend that guarantee across multiple databases.

The “two-phase commit” (2PC) protocol is the standard way to commit a distributed transaction atomically across all participating sites. “XA transactions” are transactions that use the XA protocol, which is one implementation of two-phase commit.

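To make the two phases concrete, here is a minimal in-memory sketch of a 2PC coordinator. In phase 1 the coordinator asks every participant to prepare and vote; in phase 2 it commits everywhere only if all votes are YES and aborts everywhere otherwise. The `Participant` class, its `prepare`/`commit`/`abort` methods and the `can_commit` flag are hypothetical stand-ins for database sites; the sketch deliberately ignores write-ahead logging, timeouts and coordinator failure, all of which a real implementation (including XA) must handle.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical stand-in for one database site taking part in the transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> Vote:
        # Phase 1: a real site would force its log to stable storage before voting YES.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"  # a site that votes NO may abort unilaterally
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator logic: returns True only if every site committed."""
    # Phase 1 (voting): ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit everywhere on a unanimous YES, otherwise abort everywhere.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


sites = [Participant("S1"), Participant("S2", can_commit=False)]
print(two_phase_commit(sites))   # False
print([s.state for s in sites])  # ['aborted', 'aborted']
```

Site S1 voted YES but is still rolled back because S2 voted NO, which is the all-or-nothing outcome described above.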
Distributed Query Processing

Layers of Query Processing

The problem of query processing can itself be decomposed into several subproblems, corresponding to various layers. A generic layering scheme for query processing is shown below, where each layer solves a well-defined subproblem. To simplify the discussion, let us assume a static and semicentralized query processor that does not exploit replicated fragments. The input is a query on global data expressed in relational calculus. This query is posed on global (distributed) relations, meaning that data distribution is hidden. Four main layers are involved in distributed query processing. The first three layers map the input query into an optimized distributed query execution plan. They perform the functions of query decomposition, data localization, and global query optimization. Query decomposition and data localization correspond to query rewriting. The first three layers are performed by a central control site and use schema information stored in the global directory. The fourth layer performs distributed query execution: it executes the plan and returns the answer to the query. This layer is carried out by the local sites and the control site.

[Figure: Generic Layering Scheme for Distributed Query Processing]

Query Decomposition

The first layer decomposes the calculus query into an algebraic query on global relations. The information needed for this transformation is found in the global conceptual schema describing the global relations. Information about data distribution is not used here; it is used in the next layer. Thus the techniques used by this layer are those of a centralized DBMS.

Query decomposition can be viewed as four successive steps. First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation. Normalization of a query generally involves manipulating the query quantifiers and the query qualification (its predicate) by applying logical operator precedence.
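Normalization commonly targets conjunctive normal form (an AND of ORs) or disjunctive normal form (an OR of ANDs). The sketch below converts a quantifier-free qualification to conjunctive normal form by pushing negations inward (De Morgan's laws) and then distributing OR over AND. It assumes a hypothetical tuple encoding, `("and", a, b)`, `("or", a, b)`, `("not", a)`, with atomic predicates as strings; quantifier handling is omitted.

```python
from typing import Union

Expr = Union[str, tuple]  # atom (str) or ("and"|"or", left, right) or ("not", operand)


def push_not(e: Expr) -> Expr:
    """Move negations down to the atoms using De Morgan's laws."""
    if isinstance(e, str):
        return e
    if e[0] == "not":
        inner = e[1]
        if isinstance(inner, str):
            return e                      # negated atom: already in place
        if inner[0] == "not":
            return push_not(inner[1])     # double negation: not not p == p
        flipped = "or" if inner[0] == "and" else "and"
        return (flipped, push_not(("not", inner[1])), push_not(("not", inner[2])))
    return (e[0], push_not(e[1]), push_not(e[2]))


def distribute(e: Expr) -> Expr:
    """Distribute OR over AND: p or (q and r) == (p or q) and (p or r)."""
    if isinstance(e, str) or e[0] == "not":
        return e
    left, right = distribute(e[1]), distribute(e[2])
    if e[0] == "and":
        return ("and", left, right)
    if not isinstance(left, str) and left[0] == "and":
        return ("and", distribute(("or", left[1], right)), distribute(("or", left[2], right)))
    if not isinstance(right, str) and right[0] == "and":
        return ("and", distribute(("or", left, right[1])), distribute(("or", left, right[2])))
    return ("or", left, right)


def to_cnf(e: Expr) -> Expr:
    return distribute(push_not(e))


q = ("or", ("and", "PNO='P1'", "DUR=12"), "DUR=24")
print(to_cnf(q))
# ('and', ('or', "PNO='P1'", 'DUR=24'), ('or', 'DUR=12', 'DUR=24'))
```

The atom strings are illustrative only; any hashable predicate representation would do.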

Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible. Techniques to detect incorrect queries exist only for a subset of relational calculus. Typically, they use a graph (such as a query graph) that captures the semantics of the query.
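One graph used for this purpose is the query (join) graph, simplified here to one node per relation and one edge per join predicate. For the conjunctive queries to which the technique applies, a disconnected graph is treated as a sign of an incorrect query, since part of it would be an unintended Cartesian product. A minimal sketch of that check, with hypothetical relation names:

```python
from collections import defaultdict, deque


def is_connected(relations: set[str], joins: list[tuple[str, str]]) -> bool:
    """True if every relation is reachable from every other through join edges."""
    if not relations:
        return True
    adjacency = defaultdict(set)
    for a, b in joins:  # each join predicate R.x = S.y contributes an edge R - S
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(relations))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary relation
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return relations <= seen


rels = {"EMP", "ASG", "PROJ"}
print(is_connected(rels, [("EMP", "ASG"), ("ASG", "PROJ")]))  # True
print(is_connected(rels, [("EMP", "ASG")]))                   # False: PROJ is never joined
```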

Third, the correct query (still expressed in relational calculus) is simplified. One way to simplify a query is to eliminate redundant predicates. Note that redundant predicates are likely to arise when a query is the result of system transformations applied to the user query. Such transformations are used for performing semantic data control (views, protection, and semantic integrity control).
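Redundant predicates are usually removed with the idempotency rules of Boolean logic, such as p ∧ p ⇔ p, p ∨ ¬p ⇔ true, p ∧ ¬p ⇔ false and p1 ∧ (p1 ∨ p2) ⇔ p1. The sketch below applies a few of these rules to a qualification already in conjunctive normal form. As an assumption, the qualification is a list of clauses, each clause a frozenset of `(atom, is_positive)` literals; the function returns `None` when the rules prove the qualification always false, so the query result is empty without reading any data.

```python
from typing import FrozenSet, List, Optional, Tuple

Lit = Tuple[str, bool]   # (atomic predicate, True if not negated)
Clause = FrozenSet[Lit]  # a disjunction of literals


def simplify_cnf(clauses: List[Clause]) -> Optional[List[Clause]]:
    """Drop redundant clauses; return None if the qualification is unsatisfiable."""
    # Idempotency p ∧ p ⇔ p (and p ∨ p ⇔ p inside a clause): sets drop duplicates.
    unique = set(clauses)
    # An empty clause is false, so the whole conjunction is false.
    if frozenset() in unique:
        return None
    # p ∨ ¬p ⇔ true: a clause containing both literals is always true; drop it.
    unique = {c for c in unique if not any((atom, not pos) in c for atom, pos in c)}
    # p ∧ ¬p ⇔ false: contradictory unit clauses make the result empty.
    units = {next(iter(c)) for c in unique if len(c) == 1}
    if any((atom, not pos) in units for atom, pos in units):
        return None
    # Absorption p1 ∧ (p1 ∨ p2) ⇔ p1: drop clauses that strictly contain another clause.
    kept = [c for c in unique if not any(other < c for other in unique)]
    return sorted(kept, key=sorted)


p1 = ("PNO='P1'", True)
p2 = ("DUR>12", True)
not_p2 = ("DUR>12", False)
qual = [frozenset({p1}), frozenset({p1}), frozenset({p1, p2}), frozenset({p2, not_p2})]
print(simplify_cnf(qual))                                    # [frozenset({("PNO='P1'", True)})]
print(simplify_cnf([frozenset({p2}), frozenset({not_p2})]))  # None
```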

Fourth, the calculus query is restructured as an algebraic query. Several algebraic queries can be derived from the same calculus query, and some of them are “better” than others. The quality of an algebraic query is defined in terms of expected performance. The traditional way to move toward a “better” algebraic specification is to start with an initial algebraic query and transform it until a “good” one is found. The initial algebraic query is derived directly from the calculus query by translating the predicates and the target statement into relational operators as they appear in the query. This directly translated algebraic query is then restructured through transformation rules. The algebraic query generated by this layer is good in the sense that the worst executions are typically avoided. For instance, a relation will be accessed only once, even if there are several select predicates. However, this query is generally far from providing an optimal execution, since information about data distribution and fragment allocation is not used at this layer.
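A classic restructuring rule moves each selection below the joins, as close as possible to the relation it filters, so that joins process fewer tuples. The sketch below encodes an algebra tree with hypothetical `Rel`, `Select` and `Join` node classes and assumes each selection predicate refers to a single, named relation. A real rewriter would derive that relation from the predicate's attributes and apply many more rules (selection cascading, projection pushdown, join commutativity).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rel:
    name: str


@dataclass(frozen=True)
class Select:
    predicate: str
    relation: str  # assumption: the one relation whose attributes the predicate uses
    child: Node


@dataclass(frozen=True)
class Join:
    left: Node
    right: Node
    predicate: str


Node = Union[Rel, Select, Join]


def base_relations(n: Node) -> set[str]:
    """Names of the base relations appearing under node n."""
    if isinstance(n, Rel):
        return {n.name}
    if isinstance(n, Select):
        return base_relations(n.child)
    return base_relations(n.left) | base_relations(n.right)


def push_selections(n: Node) -> Node:
    """Rewrite σp(R ⋈ S) as σp(R) ⋈ S (or R ⋈ σp(S)) when p uses only one side."""
    if isinstance(n, Rel):
        return n
    if isinstance(n, Join):
        return Join(push_selections(n.left), push_selections(n.right), n.predicate)
    child = push_selections(n.child)
    if isinstance(child, Join):
        if n.relation in base_relations(child.left):
            pushed = push_selections(Select(n.predicate, n.relation, child.left))
            return Join(pushed, child.right, child.predicate)
        if n.relation in base_relations(child.right):
            pushed = push_selections(Select(n.predicate, n.relation, child.right))
            return Join(child.left, pushed, child.predicate)
    return Select(n.predicate, n.relation, child)


# Initial tree translated directly from the calculus query: selection above the join.
query = Select("PNO='P1'", "ASG", Join(Rel("EMP"), Rel("ASG"), "EMP.ENO = ASG.ENO"))
print(push_selections(query))  # the selection now sits directly on top of Rel('ASG')
```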

Data Localization

The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query’s data using data distribution information in the fragment schema. We saw that relations are fragmented and stored in disjoint subsets, called fragments, each stored at a different site. This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments. Fragmentation is defined by fragmentation predicates that can be expressed through relational operators. A global relation can be reconstructed by applying the fragmentation rules and then deriving a program of relational algebra operators, called a localization program, which acts on fragments. Generating a query on fragments is done in two steps. First, the query is mapped into a fragment query by substituting each relation by its reconstruction program (also called materialization program). Second, the fragment query is simplified and restructured to produce another “good” query. Simplification and restructuring may be done according to the same rules used in the decomposition layer. As in the decomposition layer, the final fragment query is generally far from optimal because quantitative information about the fragments, such as their cardinalities, is not yet exploited.
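For horizontal fragmentation, the localization program of a relation is the union of its fragments, and the simplification step can discard every fragment whose fragmentation predicate contradicts the query's selection. The sketch below assumes a hypothetical relation EMP split into three fragments by half-open ranges of a numeric key (`UNION` stands for the union operator). Vertically fragmented relations are reconstructed with joins rather than unions and need different reduction rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    low: int   # fragmentation predicate: low <= key < high
    high: int


def localization_program(relation: str, fragments: list[Fragment]) -> str:
    """Materialization program of a horizontally fragmented relation: union of fragments."""
    return f"{relation} = " + " UNION ".join(f.name for f in fragments)


def reduce_for_selection(fragments: list[Fragment], sel_low: int, sel_high: int) -> list[Fragment]:
    """Keep only fragments whose range can overlap the selection sel_low <= key < sel_high."""
    return [f for f in fragments if f.low < sel_high and sel_low < f.high]


emp = [
    Fragment("EMP1", "S1", 0, 100),
    Fragment("EMP2", "S2", 100, 200),
    Fragment("EMP3", "S3", 200, 300),
]
print(localization_program("EMP", emp))     # EMP = EMP1 UNION EMP2 UNION EMP3
# Query: select EMP tuples with key = 150, i.e. the range 150 <= key < 151.
needed = reduce_for_selection(emp, 150, 151)
print(localization_program("EMP", needed))  # EMP = EMP2
```

Only site S2 has to take part in this query; the other two fragments are eliminated before any cost-based optimization starts.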

Global Query Optimization

The input to the third layer is an algebraic query on fragments. The goal of query
optimization is to find an execution strategy for the query which is close to optimal.
Remember that finding the optimal solution is computationally intractable. An
execution strategy for a distributed query can be described with relational algebra
operators and communication primitives (send/receive operators) for transferring
data between sites. The previous layers have already optimized the query, for
example, by eliminating redundant expressions. However, this optimization is
independent of fragment characteristics such as fragment allocation and
cardinalities. In addition, communication operators are not yet specified. By
permuting the ordering of operators within one query on fragments, many equivalent
queries may be found.
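To give a feel for how quickly the number of equivalent orderings grows, the sketch below counts join orders for n relations under two common conventions: left-deep orders, of which there are n!, and all binary join trees with the left and right inputs distinguished, of which there are (2n-2)!/(n-1)!. The relation names are hypothetical, and the counts ignore the placement of communication operators, which enlarges the space further.

```python
from itertools import permutations
from math import factorial


def left_deep_orders(relations: list[str]) -> list[tuple[str, ...]]:
    """Every left-deep join order ((R1 ⋈ R2) ⋈ R3) ⋈ ..., one per permutation."""
    return list(permutations(relations))


def all_join_trees(n: int) -> int:
    """Number of binary join trees over n relations, left/right inputs distinguished."""
    return factorial(2 * n - 2) // factorial(n - 1)


print(left_deep_orders(["EMP", "ASG", "PROJ"])[:2])
# [('EMP', 'ASG', 'PROJ'), ('EMP', 'PROJ', 'ASG')]
for n in range(2, 7):
    print(n, factorial(n), all_join_trees(n))
# 2 2 2
# 3 6 12
# 4 24 120
# 5 120 1680
# 6 720 30240
```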

Query optimization consists of finding the “best” ordering of operators in the query, including communication operators, that minimizes a cost function. The cost function, often defined in terms of time units, refers to computing resources such as disk space, disk I/Os, buffer space, CPU cost, communication cost, and so on. Generally, it is a weighted combination of I/O, CPU, and communication costs. Nevertheless, a typical simplification made by the early distributed DBMSs, as we mentioned before, was to consider communication cost as the most significant factor. This used to be valid for wide area networks, where the limited bandwidth made communication much more costly than local processing. This is no longer true today: on fast networks, communication cost can be lower than I/O cost. To select the ordering of operators, it is necessary to predict the execution costs of alternative candidate orderings. Determining execution costs before query execution (i.e., static optimization) is based on fragment statistics and on formulas for estimating the cardinalities of the results of relational operators. Thus the optimization decisions depend on the allocation of fragments and on the available statistics on fragments, which are recorded in the allocation schema.
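One common textbook formulation of this weighted cost is total time = T_CPU × #instructions + T_I/O × #I/Os + T_MSG × #messages + T_TR × #bytes transferred. The sketch below evaluates it for two hypothetical plans under made-up WAN and LAN unit costs, to show how the preferred plan flips once communication becomes cheaper than I/O. All numbers are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCosts:
    cpu_inst: float  # seconds per CPU instruction
    io: float        # seconds per disk I/O
    msg: float       # fixed seconds to initiate and receive one message
    byte: float      # seconds to transmit one byte


@dataclass(frozen=True)
class PlanProfile:
    insts: int
    ios: int
    msgs: int
    bytes_sent: int


def total_time(p: PlanProfile, c: UnitCosts) -> float:
    """Weighted sum of CPU, I/O and communication costs."""
    return (c.cpu_inst * p.insts + c.io * p.ios
            + c.msg * p.msgs + c.byte * p.bytes_sent)


plans = {
    # ship a whole 10 MB fragment, little local work
    "ship_fragment": PlanProfile(insts=1_000_000, ios=100, msgs=10, bytes_sent=10_000_000),
    # reduce data locally first: more I/O, far fewer bytes on the wire
    "reduce_first": PlanProfile(insts=2_000_000, ios=1_000, msgs=20, bytes_sent=100_000),
}
wan = UnitCosts(cpu_inst=1e-9, io=5e-3, msg=0.1, byte=1e-6)   # slow wide-area link
lan = UnitCosts(cpu_inst=1e-9, io=5e-3, msg=1e-4, byte=1e-9)  # fast local network

for label, costs in (("WAN", wan), ("LAN", lan)):
    best = min(plans, key=lambda name: total_time(plans[name], costs))
    print(label, best)
# WAN reduce_first
# LAN ship_fragment
```

With these weights the WAN totals are about 11.5 s versus 7.1 s, while on the LAN they are about 0.51 s versus 5.0 s, so a communication-only cost model would pick the wrong plan on the fast network.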

An important aspect of query optimization is join ordering, since permutations of the joins within the query may lead to improvements of orders of magnitude. One basic technique for optimizing a sequence of distributed join operators is the semijoin operator. The main value of the semijoin in a distributed system is to reduce the size of the join operands and hence the communication cost. However, techniques that consider local processing costs as well as communication costs may not use semijoins, because semijoins might increase local processing costs. The output of the query optimization layer is an optimized algebraic query on fragments with communication operators included. It is typically represented and saved (for future executions) as a distributed query execution plan.
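The semijoin R ⋉ S on attribute A keeps the tuples of R that have a matching A-value in S. In a semijoin program, only the projection of S on A is shipped to R's site, R is reduced there, and only the reduced R is shipped back for the final join. The sketch below simulates this with in-memory lists of dictionaries and counts transferred items (values or tuples) as a rough, hypothetical proxy for communication cost; the relation contents and attribute names are made up.

```python
def project(s: list[dict], attr: str) -> set:
    """π_attr(S): the distinct join-attribute values, the only thing shipped from S's site."""
    return {t[attr] for t in s}


def semijoin(r: list[dict], keys: set, attr: str) -> list[dict]:
    """R ⋉ S, computed at R's site from the shipped projection of S."""
    return [t for t in r if t[attr] in keys]


def join(r: list[dict], s: list[dict], attr: str) -> list[dict]:
    """Equi-join R ⋈ S on attr using a hash index built on S."""
    index: dict = {}
    for t in s:
        index.setdefault(t[attr], []).append(t)
    return [{**tr, **ts} for tr in r for ts in index.get(tr[attr], [])]


# Hypothetical data: R (1000 tuples) at site 1, S (10 tuples) at site 2.
R = [{"eno": i, "ename": f"E{i}"} for i in range(1000)]
S = [{"eno": i * 100, "pno": f"P{i}"} for i in range(10)]

# Strategy 1: ship all of R to site 2 and join there.
naive_cost = len(R)
naive_result = join(R, S, "eno")

# Strategy 2 (semijoin program): ship π_eno(S) to site 1, reduce R, ship R ⋉ S back.
keys = project(S, "eno")
reduced_R = semijoin(R, keys, "eno")
semijoin_cost = len(keys) + len(reduced_R)
semijoin_result = join(reduced_R, S, "eno")

print(naive_cost, semijoin_cost)        # 1000 20
print(naive_result == semijoin_result)  # True
```

The saving here comes from S being small and selective. If most tuples of R had a match, the semijoin would add an extra transfer and extra local work without shrinking R much, which is why cost-based optimizers do not apply it unconditionally.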

Distributed Query Execution

The last layer is performed by all the sites having fragments involved in the query.
Each subquery executing at one site, called a local query, is then optimized using the
local schema of the site and executed. At this time, the algorithms to perform the
relational operators may be chosen. Local optimization uses the algorithms of
centralized systems.

The goal of distributed query processing may be summarized as follows: given a calculus query on a distributed database, find a corresponding execution strategy that minimizes a system cost function that includes I/O, CPU, and communication costs. An execution strategy is specified in terms of relational algebra operators and communication primitives (send/receive) applied to the local databases (i.e., the relation fragments). Therefore, the complexity of the relational operators, which affects the performance of query execution, is of major importance in the design of a query processor.

Distributed Concurrency Control

2. Two-phase locking protocol
3. Timestamp ordering protocol
Recovery in Distributed Databases
