ADMT chp3

The presentation covers topics about distributed databases including introduction, architecture, design issues, fragmentation, allocation, and query processing. It is meant for educational purposes for students at St. Francis Institute of Technology and modifications or distribution of the content is prohibited. Key topics include definitions of distributed databases, advantages like transparency and increased reliability, data fragmentation methods, and additional functions of distributed database systems.

Uploaded by

Wahid Ahmed
Copyright
© © All Rights Reserved

The material in this presentation belongs to St. Francis Institute of Technology and is solely for educational purposes.

Distribution and modifications of the content is prohibited.

Distributed Databases
Slides by: Ms. Shree Jaswal

St. Francis Institute of Technology, Department of Information Technology



2 Topics to be covered
• Introduction: Distributed Data Processing, What is a Distributed Database System?, Design Issues
• Distributed DBMS Architecture
• Distributed Database Design: Top-Down Design Process, Distribution Design Issues, Fragmentation, Allocation
• Topics beyond syllabus:
• Overview of Query Processing: Query Processing Problem, Objectives of Query Processing, Layers of Query Processing
• Concurrency Control in Distributed Database Systems
• Recovery Algorithms in Distributed Database Systems
• Self-learning topics: Query Optimization in Distributed Databases

3 Which chapter? Which Book?
• Chapter 25: Distributed Databases, Elmasri and Navathe, “Fundamentals of Database Systems”, 6th Edition, Pearson Education.
• Chapter 19: Distributed Databases, Korth, Silberschatz, Sudarshan, “Database System Concepts”, 6th Edition, McGraw-Hill.
• Chapter 3: Distributed Database Design, Tamer Özsu & Patrick Valduriez, “Principles of Distributed Database Systems”, 3rd Edition.
• Chapter 6: Tamer Özsu & Patrick Valduriez, “Principles of Distributed Database Systems”, 3rd Edition.


4 Distributed Database Concepts
A transaction can be executed by multiple networked computers in a unified manner.
A distributed database (DDB) processes a unit of execution (a transaction) in a distributed manner.
A distributed database (DDB) can be defined as follows.


5 Distributed Database Concepts
A distributed database (DDB) is a collection of multiple logically related databases distributed over a computer network. A distributed database management system is a software system that manages a distributed database while making the distribution transparent to the user.

6 Distributed Database System
• Advantages
Management of distributed data with different levels of transparency:
The physical placement of data (files, relations, etc.) is not known to the user (distribution transparency).


7 Distributed Database System
Advantages (transparency, contd.)
The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication as shown below.


8 Distributed Database System Advantages
Distribution and network transparency:
Users do not have to worry about the operational details of the network.
Location transparency means that a command can be issued from any location without affecting how it works.
Naming transparency allows access to any named object (files, relations, etc.) from any location.

9 Distributed Database System Advantages
Replication transparency:
Copies of a data item can be stored at multiple sites without the user being aware of it.
This is done to minimize access time to the required data.
Fragmentation transparency:
A relation can be fragmented horizontally (a subset of the tuples of the relation) or vertically (a subset of the columns of the relation) without the user being aware of it.

10 Distributed Database System Advantages
Increased reliability and availability:
Reliability refers to system uptime, that is, the system is running efficiently most of the time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
A distributed database system has multiple nodes (computers); if one fails, the others are still available to do the job.

11 Distributed Database System Advantages
Improved performance:
A distributed DBMS fragments the database to keep data closer to where it is needed most (data localization).
This reduces data management (access and modification) time significantly.
Easier expansion (scalability):
New nodes (computers) can be added at any time without changing the entire configuration.

12 Additional functions of DDB
Keeping track of data distribution
Distributed transaction management
Replicated data management
Distributed database recovery
Security
Distributed directory (catalog) management


13 Data Fragmentation, Replication and Allocation
Data Fragmentation
Split a relation into logically related and correct parts. A relation can be fragmented in two ways:
Horizontal fragmentation
Vertical fragmentation


15 Data Fragmentation
Horizontal fragmentation
A horizontal fragment is a subset of the tuples of a relation: those tuples that satisfy a selection condition.
Consider the Employee relation with the selection condition (DNO = 5). All tuples that satisfy this condition form a subset that is a horizontal fragment of the Employee relation.
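The selection above can be sketched in a few lines of Python. This is only an illustration: the sample rows and the helper name `horizontal_fragment` are assumptions, and only the attribute `Dno` and the condition `Dno = 5` come from the slide.

```python
# Hypothetical sample rows; only the attribute Dno is taken from the slides.
EMPLOYEE = [
    {"Ssn": "111", "Name": "A", "Dno": 5},
    {"Ssn": "222", "Name": "B", "Dno": 4},
    {"Ssn": "333", "Name": "C", "Dno": 5},
]


def horizontal_fragment(relation, condition):
    """Return the tuples of `relation` that satisfy `condition` (a selection)."""
    return [row for row in relation if condition(row)]


# Horizontal fragment of EMPLOYEE for the condition Dno = 5.
emp_d5 = horizontal_fragment(EMPLOYEE, lambda row: row["Dno"] == 5)
print([row["Ssn"] for row in emp_d5])  # ['111', '333']
```

Every row keeps all of its attributes; only the set of tuples shrinks.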

16 Data Fragmentation
A selection condition may be composed of several conditions connected by AND or OR.
Derived horizontal fragmentation: the partitioning of a primary relation is applied to other secondary relations that are related to it through foreign keys.

17 Data Fragmentation
Vertical fragmentation
It is a subset of a relation which is
created by a subset of columns.
Thus a vertical fragment of a
relation will contain values of
selected columns.
There is no selection condition
used in vertical fragmentation.


18 Data Fragmentation
Consider the Employee relation. A vertical fragment of it can be created by keeping the values of Name, Bdate, Sex, and Address.
Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
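A minimal Python sketch of such a projection follows. The sample row, the `Salary` column, and the helper name `vertical_fragment` are assumptions made for illustration; the attribute list Name, Bdate, Sex, Address is the one from the slide, and `Ssn` is assumed to be the primary key.

```python
# Hypothetical sample row; Ssn is assumed to be the primary key of EMPLOYEE.
EMPLOYEE = [
    {"Ssn": "111", "Name": "A", "Bdate": "1970-01-01", "Sex": "F",
     "Address": "X", "Salary": 100},
]


def vertical_fragment(relation, attributes, primary_key):
    """Project `relation` onto `attributes`, always keeping the primary key."""
    columns = list(primary_key) + [a for a in attributes if a not in primary_key]
    return [{a: row[a] for a in columns} for row in relation]


personal = vertical_fragment(EMPLOYEE, ["Name", "Bdate", "Sex", "Address"], ["Ssn"])
print(list(personal[0]))  # ['Ssn', 'Name', 'Bdate', 'Sex', 'Address']
```

Keeping `Ssn` in every fragment is what lets the fragments be joined back together later.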


19 Data Fragmentation
Representation: Horizontal fragmentation
Each horizontal fragment of a relation R can be specified by a $\sigma_{C_i}(R)$ operation in the relational algebra.
Complete horizontal fragmentation: a set of horizontal fragments whose conditions $C_1, C_2, \ldots, C_n$ include all the tuples in R; that is, every tuple in R satisfies ($C_1$ OR $C_2$ OR … OR $C_n$).


20
Data Fragmentation
Disjoint complete horizontal
fragmentation: No tuple in R satisfies (Ci
AND Cj) where i ≠ j.
To reconstruct R from horizontal
fragments a UNION is applied.
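The completeness and disjointness conditions, and the reconstruction by UNION, can be checked mechanically. The sketch below assumes a tiny made-up relation R and three hypothetical conditions on `Dno`; the function names are illustrative and not from the slides.

```python
def is_complete(relation, conditions):
    """Every tuple satisfies at least one condition (C1 OR C2 OR ... OR Cn)."""
    return all(any(c(row) for c in conditions) for row in relation)


def is_disjoint(relation, conditions):
    """No tuple satisfies two different conditions (Ci AND Cj, i != j)."""
    return all(sum(1 for c in conditions if c(row)) <= 1 for row in relation)


def reconstruct(fragments):
    """UNION of the fragments, dropping duplicate tuples."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


# Hypothetical relation and conditions, one condition per department.
R = [{"Ssn": "111", "Dno": 5}, {"Ssn": "222", "Dno": 4}, {"Ssn": "333", "Dno": 1}]
conditions = [
    lambda row: row["Dno"] == 5,
    lambda row: row["Dno"] == 4,
    lambda row: row["Dno"] == 1,
]
fragments = [[row for row in R if c(row)] for c in conditions]

print(is_complete(R, conditions), is_disjoint(R, conditions))  # True True
```

With a complete fragmentation, `reconstruct(fragments)` returns every tuple of R, possibly in a different order.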


21 Data Fragmentation
Representation: Vertical fragmentation
A vertical fragment of a relation R can be specified by a $\pi_{L_i}(R)$ operation in the relational algebra.
Complete vertical fragmentation: a set of vertical fragments whose projection lists $L_1, L_2, \ldots, L_n$ include all the attributes in R but share only the primary key of R. In this case the projection lists satisfy the following two conditions:

22 Data Fragmentation
1. $L_1 \cup L_2 \cup \ldots \cup L_n = \mathrm{ATTRS}(R)$
2. $L_i \cap L_j = \mathrm{PK}(R)$ for any $i \neq j$, where ATTRS(R) is the set of attributes of R and PK(R) is the primary key of R.
• To reconstruct R from complete vertical fragments, a FULL OUTER JOIN is applied.
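As a rough illustration, the two conditions and the join-based reconstruction can be written as follows. The relation, the projection lists L1 and L2, and the function names are assumptions chosen for the example; the join here merges rows on the primary key and simply omits missing attributes instead of filling them with NULL.

```python
from functools import reduce


def is_complete_vertical(projection_lists, attrs, pk):
    """Condition 1: the lists cover ATTRS(R). Condition 2: any two share only PK(R)."""
    lists = [set(l) for l in projection_lists]
    covers_all = set().union(*lists) == set(attrs)
    share_only_pk = all(
        lists[i] & lists[j] == set(pk)
        for i in range(len(lists))
        for j in range(i + 1, len(lists))
    )
    return covers_all and share_only_pk


def join_on_pk(left, right, pk):
    """Outer join of two fragments on the primary key (absent attributes are omitted)."""
    merged = {}
    for row in left + right:
        key = tuple(row[a] for a in pk)
        merged.setdefault(key, {}).update(row)
    return list(merged.values())


# Hypothetical relation with primary key Ssn and two projection lists.
ATTRS = ["Ssn", "Name", "Salary", "Dno"]
PK = ["Ssn"]
L1 = ["Ssn", "Name"]
L2 = ["Ssn", "Salary", "Dno"]
R = [
    {"Ssn": "111", "Name": "A", "Salary": 100, "Dno": 5},
    {"Ssn": "222", "Name": "B", "Salary": 200, "Dno": 4},
]
frag1 = [{a: row[a] for a in L1} for row in R]
frag2 = [{a: row[a] for a in L2} for row in R]

rebuilt = reduce(lambda left, right: join_on_pk(left, right, PK), [frag1, frag2])
print(is_complete_vertical([L1, L2], ATTRS, PK), rebuilt == R)  # True True
```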

23 Data Fragmentation
Representation: Mixed (hybrid) fragmentation
A combination of vertical fragmentation and horizontal fragmentation.
This is achieved by SELECT-PROJECT operations, represented by $\pi_{L_i}(\sigma_{C_i}(R))$.


24
Data Fragmentation
If C = True (Select all tuples) and
L ≠ ATTRS(R), we get a vertical
fragment, if C ≠ True and L =
ATTRS(R), we get a horizontal
fragment and if C ≠ True and L ≠
ATTRS(R), we get a mixed
fragment.
If C = True and L = ATTRS(R), then
R can be considered a fragment.
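The four cases above amount to a small decision table. The sketch below encodes it with two booleans; the function name and the returned labels are illustrative choices, not terminology fixed by the slides.

```python
def classify_fragment(selects_all_tuples, keeps_all_attributes):
    """Classify a fragment from C (is it True?) and L (is it ATTRS(R)?)."""
    if selects_all_tuples and keeps_all_attributes:
        return "whole relation"   # C = True and L = ATTRS(R)
    if selects_all_tuples:
        return "vertical"         # C = True and L != ATTRS(R)
    if keeps_all_attributes:
        return "horizontal"       # C != True and L = ATTRS(R)
    return "mixed"                # C != True and L != ATTRS(R)


print(classify_fragment(True, False))   # vertical
print(classify_fragment(False, True))   # horizontal
print(classify_fragment(False, False))  # mixed
```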

25 Data Fragmentation
Fragmentation schema
A definition of a set of fragments (horizontal, vertical, or mixed) that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations.

26 Data Fragmentation
Allocation schema
It describes the allocation of fragments to the sites of the distributed database. The data can be fully replicated, partially replicated, or partitioned.


27 Data Replication and allocation
In replication, copies of the data are stored at more than one site.
In full replication the entire database is replicated at all sites; in partial replication some selected part is replicated at some of the sites.
Data replication is described by a replication schema.

28 Data Replication and allocation
Data Distribution (Data Allocation)
This is relevant only in the case of partial replication or partitioning.
The selected portion of the database is distributed to the database sites.

29 Example
• Suppose that the company has three computer sites, one for each current department. Sites 2 and 3 are for departments 5 and 4, respectively.
• At each of these sites, we expect frequent access to the EMPLOYEE and PROJECT information for the employees who work in that department and the projects controlled by that department.
• Further, we assume that these sites mainly access the Name, Ssn, Salary, and Super_ssn attributes of EMPLOYEE.
• Site 1 is used by company headquarters and accesses all employee and project information regularly, in addition to keeping track of DEPENDENT information for insurance purposes.


32 Example contd…
• To determine the fragments to be replicated at sites 2 and 3, first we can horizontally fragment DEPARTMENT by its key Dnumber.
• Then we apply derived fragmentation to the EMPLOYEE, PROJECT, and DEPT_LOCATIONS relations based on their foreign keys for department number, called Dno, Dnum, and Dnumber, respectively.
• We can vertically fragment the resulting EMPLOYEE fragments to include only the attributes {Name, Ssn, Salary, Super_ssn, Dno}.


33 Allocation of fragments to sites. (a) Relation fragments at site 2 corresponding to department 5.


34 Allocation of fragments to sites. (b) Relation fragments at site 3 corresponding to department 4.


35 Example contd…
The WORKS_ON tuple <333445555, 10, 10.0> relates an employee who works for department 5 with a project controlled by department 4.
In this case, we could fragment WORKS_ON based on the department in which the employee works (which is expressed by the condition C) and then fragment further based on the department that controls the projects that employee is working on.

36 Complete and disjoint fragments of the WORKS_ON relation. (a) Fragments of WORKS_ON for employees working in department 5 (C = [ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=5)]).


37 Complete and disjoint fragments of the WORKS_ON relation. (b) Fragments of WORKS_ON for employees working in department 4 (C = [ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=4)]).


38 Complete and disjoint fragments of the WORKS_ON relation. (c) Fragments of WORKS_ON for employees working in department 1 (C = [ESSN IN (SELECT SSN FROM EMPLOYEE WHERE DNO=1)]).


39 Example contd…
The union of fragments G1, G2, and G3 gives all WORKS_ON tuples for employees who work for department 5.
Similarly, the union of fragments G4, G5, and G6 gives all WORKS_ON tuples for employees who work for department 4.
On the other hand, the union of fragments G1, G4, and G7 gives all WORKS_ON tuples for projects controlled by department 5.

40 Example contd…
• Hence, we place the union of fragments G1, G2, G3, G4, and G7 at site 2 and the union of fragments G4, G5, G6, G2, and G8 at site 3.
• Notice that fragments G2 and G4 are replicated at both sites. This allocation strategy permits the join between the local EMPLOYEE or PROJECT fragments at site 2 or site 3 and the local WORKS_ON fragment to be performed completely locally.
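The allocation can be written down as a mapping from sites to fragment names, which makes the replicated fragments easy to compute. The dictionary layout and variable names below are assumptions for illustration; the fragment names and their placement are the ones stated in this example.

```python
# Allocation of WORKS_ON fragments to sites, as stated in the example.
allocation = {
    "site 2": {"G1", "G2", "G3", "G4", "G7"},
    "site 3": {"G4", "G5", "G6", "G2", "G8"},
}

# Fragments stored at both sites are the replicated ones.
replicated = allocation["site 2"] & allocation["site 3"]
print(sorted(replicated))  # ['G2', 'G4']

# Site 2 holds every fragment needed for department 5: the tuples of its
# employees (G1, G2, G3) and of the projects it controls (G1, G4, G7).
employees_d5 = {"G1", "G2", "G3"}
projects_d5 = {"G1", "G4", "G7"}
print((employees_d5 | projects_d5) <= allocation["site 2"])  # True
```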


41 Correctness Rules of Fragmentation
• Completeness
Decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri.
• Reconstruction
If relation R is decomposed into fragments R1, R2, ..., Rn, then there should exist some relational operator ∇ such that
$R = \nabla_{1 \le i \le n} R_i$
The operator ∇ will be different for different forms of fragmentation.
• Disjointness
If relation R is decomposed into fragments R1, R2, ..., Rn, and data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j).


42 Types of Distributed Database Systems
Homogeneous
All sites of the database system have an identical setup, i.e., the same database system software.
The underlying operating system may be different.
For example, all sites run Oracle, or all run DB2, or all run Sybase or some other database system.
The underlying operating systems can be a mixture of Linux, Windows, Unix, etc.

43 Types of Distributed Database Systems
[Figure: a homogeneous system. Sites 1 to 5 are connected by a communications network; every site runs Oracle, on Unix, Windows, or Linux.]

44 Types of Distributed Database Systems
Heterogeneous
Federated: each site may run a different database system, but data access is managed through a single conceptual schema.
This implies that the degree of local autonomy is minimal. Each site must adhere to a centralized access policy.
There may be a global schema.


45 Types of Distributed Database Systems
Multidatabase: there is no single conceptual global schema. For data access, a schema is constructed dynamically as needed by the application software.


46 Types of Distributed Database Systems
[Figure: a heterogeneous system. Sites 1 to 5 are connected by a communications network and run different kinds of DBMS (relational, object-oriented, hierarchical, network) on Unix, Windows, or Linux.]


47 Types of Distributed Database Systems
• Federated Database Management Systems Issues
Differences in data models:
Relational, object-oriented, hierarchical, network, etc.
Differences in constraints:
Each site may have its own data accessing and processing constraints.
Differences in query language:
Sites may use different versions of SQL: some may use SQL-89, some may use SQL-92, and so on.

48 Distributed Query Processing
A distributed database query is processed in stages as follows:
• Query Mapping: The input query on distributed data is specified formally using a query language. It is then translated into an algebraic query on global relations.
• Localization: This stage maps the distributed query on the global schema to separate queries on individual fragments using data distribution and replication information.
• Global Query Optimization: Optimization consists of selecting, from a list of candidate strategies, the one that is closest to optimal.
• Local Query Optimization: This stage is common to all sites in the DDB. The techniques are similar to those used in centralized systems.
The first three stages discussed above are performed at a central control site, while the last stage is performed locally.

49 Layers of Query Processing


50 Query Processing in Distributed Databases
Issues
Cost of transferring data (files and results) over the network.
Intermediate data are transferred to other sites for further processing, and the final result files may have to be transferred to the site where the query result is needed.
This cost is usually high, so some optimization is necessary.

51 Examples to illustrate volume of data transferred.


Query Processing in Distributed Databases
Example relations: Employee at site 1 and Department at site 2
Employee at site 1: 10,000 rows, row size = 100 bytes, so table size = $10^6$ bytes.
Department at site 2: 100 rows, row size = 35 bytes, so table size = 3,500 bytes.
Q: For each employee, retrieve the employee name and the name of the department where the employee works.
Q: $\pi_{Fname, Lname, Dname}(Employee \bowtie_{Dno = Dnumber} Department)$

53 Query Processing in Distributed Databases
• Result
The result of this query will have 10,000 tuples, assuming that every employee is related to a department.
Suppose each result tuple is 40 bytes long. The query is submitted at site 3 and the result is sent to this site.
Problem: the Employee and Department relations are not present at site 3.


54 Query Processing in Distributed Databases
Strategies:
1. Transfer Employee and Department to site 3. Total transfer size = 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 10,000 = 400,000 bytes. Total transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.

55 Query Processing in Distributed Databases
3. Transfer the Department relation to site 1, execute the join at site 1, and send the result to site 3.
• Total bytes transferred = 400,000 + 3,500 = 403,500 bytes.
• Optimization criterion: minimizing data transfer.
• Preferred approach: strategy 3.
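The comparison of the three strategies for Q is plain arithmetic and can be reproduced with a short Python sketch. The variable names are illustrative; the row counts, row sizes, and the 40-byte result tuple are the figures given in the example.

```python
# Sizes from the example: Employee at site 1, Department at site 2.
EMPLOYEE_BYTES = 10_000 * 100    # 1,000,000 bytes
DEPARTMENT_BYTES = 100 * 35      # 3,500 bytes
RESULT_BYTES = 10_000 * 40       # 400,000 bytes for query Q

# Bytes shipped over the network when the result is needed at site 3.
strategies = {
    1: EMPLOYEE_BYTES + DEPARTMENT_BYTES,   # ship both relations to site 3
    2: EMPLOYEE_BYTES + RESULT_BYTES,       # ship Employee to site 2, result to site 3
    3: DEPARTMENT_BYTES + RESULT_BYTES,     # ship Department to site 1, result to site 3
}

best = min(strategies, key=strategies.get)
print(strategies)  # {1: 1003500, 2: 1400000, 3: 403500}
print(best)        # 3
```

The sketch counts only transferred bytes, which is the optimization criterion used here; it ignores local processing cost.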

56 Query Processing in Distributed Databases
• Consider the query Q’: For each department, retrieve the department name and the name of the department manager.
• Relational algebra expression: $\pi_{Fname, Lname, Dname}(Employee \bowtie_{Mgrssn = SSN} Department)$

57 Query Processing in Distributed Databases
• The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are:
1. Transfer Employee and Department to the result site and perform the join at site 3.
• Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4,000 bytes.
• Total transfer size = 4,000 + 1,000,000 = 1,004,000 bytes.

58 Query Processing in Distributed Databases
3. Transfer the Department relation to site 1, execute the join at site 1, and send the result to site 3.
• Total transfer size = 4,000 + 3,500 = 7,500 bytes.
• Preferred strategy: strategy 3.


59 Query Processing in Distributed Databases
• Now suppose the result site is 2. Possible strategies:
1. Transfer the Employee relation to site 2, execute the query, and present the result to the user at site 2.
• Total transfer size = 1,000,000 bytes for both queries Q and Q’.
2. Transfer the Department relation to site 1, execute the join at site 1, and send the result back to site 2.
• Total transfer size for Q = 400,000 + 3,500 = 403,500 bytes and for Q’ = 4,000 + 3,500 = 7,500 bytes.


60 Query Processing in Distributed Databases


 Semijoin:
 Objective is to reduce the number of tuples in a relation
before transferring it to another site.
 The idea is to send the joining column of one relation R
to the site where the other relation S is located; this
column is then joined with S.
 Hence, only the joining column of R is transferred in one
direction, and a subset of S with no extraneous tuples
or attributes is transferred in the other direction.
 The semijoin operation was devised to formalize this
strategy. A semijoin operation $R \ltimes_{A=B} S$, where A and B
are domain-compatible attributes of R and S,
respectively, produces the same result as the relational
algebra expression $\pi_{R}(R \bowtie_{A=B} S)$.
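
A minimal Python sketch of the semijoin definition above, assuming relations are represented as lists of dictionaries. The function name `semijoin` and the toy data are hypothetical and only serve to illustrate the operator.

```python
def semijoin(r, s, a, b):
    """Return the tuples of r that join with at least one tuple of s on r[a] == s[b]."""
    s_values = {row[b] for row in s}  # the joining column of S
    return [row for row in r if row[a] in s_values]


# Hypothetical toy data, not taken from the slides.
employee = [
    {"SSN": 1, "Lname": "Smith"},
    {"SSN": 2, "Lname": "Wong"},
    {"SSN": 3, "Lname": "Zelaya"},
]
department = [
    {"Dname": "Research", "Mgrssn": 2},
    {"Dname": "Admin", "Mgrssn": 3},
]

# Employees that manage some department: only attributes of Employee are kept.
managers = semijoin(employee, department, "SSN", "Mgrssn")
print(managers)
```

In a distributed setting only the set `s_values` would need to travel to the site of `r`, which is the source of the savings.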

61 Query Processing in Distributed Databases


 Example execution of Q or Q’:
1. Project the join attributes of Department at site 2, and
transfer them to site 1. For Q, 4 * 100 = 400 bytes are
transferred and for Q’, 9 * 100 = 900 bytes are
transferred.
2. Join the transferred file with the Employee relation at
site 1, and transfer the required attributes from the
resulting file to site 2. For Q, 34 * 10,000 = 340,000
bytes are transferred and for Q’, 39 * 100 = 3900
bytes are transferred.
3. Execute the query by joining the transferred file with
Department and present the result to the user at site
2.
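
The byte counts in steps 1 and 2 can be added up and compared with the earlier strategy of shipping Department to site 1. The sketch below only redoes that arithmetic; the dictionary names are illustrative assumptions.

```python
# Bytes transferred by the semijoin strategy (step 1 + step 2 above).
semijoin_bytes = {
    "Q": 4 * 100 + 34 * 10_000,
    "Q'": 9 * 100 + 39 * 100,
}

# Bytes transferred when Department is shipped to site 1 and the
# join result is sent back to site 2 (figures from the earlier strategy list).
ship_department_bytes = {
    "Q": 400_000 + 3_500,
    "Q'": 4_000 + 3_500,
}

for query in semijoin_bytes:
    saved = ship_department_bytes[query] - semijoin_bytes[query]
    print(query, semijoin_bytes[query], "bytes with semijoin, saving", saved)
```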

62 Parallel versus Distributed Architectures


There are three main types of multiprocessor system architectures:
 Shared memory (tightly coupled) architecture: Multiple processors
share secondary (disk) storage and also share primary memory.
 Shared disk (loosely coupled) architecture: Multiple processors
share secondary (disk) storage, but each has its own primary
memory.
 Shared nothing architecture: Every processor
has its own primary and secondary (disk) memory, no common
memory exists, and the processors communicate over a
high-speed interconnection network (bus or switch).
 Database management systems developed using the above
types of architectures are termed parallel database management
systems.

64 Parallel versus Distributed Architectures

 Although the shared nothing architecture resembles a
distributed database computing environment, major
differences exist in the mode of operation.
 In shared nothing multiprocessor systems, there is
symmetry and homogeneity of nodes; this is not true of
the distributed database environment where
heterogeneity of hardware and operating system at
each node is very common.


65
Distributed DBMS architectures

3 types:
Client server
Collaborating server
Middleware


66
Client-Server Database
Architecture
It consists of clients running client software, a set of
servers which provide all database functionalities and
a reliable communication infrastructure.
[Figure: Server 1, Server 2, ..., Server n and Client 1, Client 2, Client 3, ..., Client n]

67
Client-Server Database
Architecture
Clients contact servers to request services, but
servers do not initiate contact with clients.
The server software is responsible for local data
management at a site, much like centralized
DBMS software.
The client software is responsible for most of the
distribution function.
The communication software manages
communication among clients and servers.


68
Client-Server Database
Architecture
 The processing of an SQL query goes as follows:
Client parses a user query and decomposes it
into a number of independent sub-queries. Each
subquery is sent to appropriate site for execution.
Each server processes its query and sends the
result to the client.
The client combines the results of subqueries and
produces the final result.


69
Collaborating server systems
In a client-server architecture, a query that
spans multiple servers would require the client
to break it into appropriate
subqueries to be executed at different
sites and then piece together the answers
to the subqueries.
The client process would therefore become
quite complex and would begin to overlap
with the server in terms of capabilities.
To eliminate the problem of
distinguishing between client and server, an
alternative is the collaborating server system.

70
Collaborating server systems
In this system we have a collection
of database servers each capable
of running transactions against local
data which co-operatively execute
transactions spanning multiple
servers.
When a server receives a query that
requires access to data at other
servers, it generates appropriate sub
queries to be executed by other
servers and puts the results together
to compute answers to the original
query.

72
Middleware systems
This architecture is designed to allow a single query
to span multiple servers, without requiring all
database servers to be capable of managing such
multisite execution strategies
The idea is that we need just 1 database server
capable of managing queries & transactions
spanning multiple servers; the remaining servers
need to handle local queries & transactions
This special server acts as a layer of software, often
called middleware
The middleware layer is capable of executing joins
& other relational operations on data obtained
from other servers but typically does not itself
maintain any data.

74 Query Optimization Example


 Consider the following schema
EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, RESP, DUR)

and the following simple user query: “Find the names of employees who are
managing a project”

75 Selecting Alternatives
SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"

Operators: $\pi$ = Project, $\sigma$ = Select, $\bowtie$ = Join

Strategy 1
$\pi_{ENAME}(\sigma_{RESP="Manager" \wedge EMP.ENO=ASG.ENO}(EMP \times ASG))$
Strategy 2
$\pi_{ENAME}(EMP \bowtie_{EMP.ENO=ASG.ENO} (\sigma_{RESP="Manager"}(ASG)))$
Strategy 2 avoids the Cartesian product, so it is “better”.
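
A small Python sketch can show that the two strategies return the same answer while doing different amounts of work. The tuples below are hypothetical sample data for EMP(ENO, ENAME, TITLE) and ASG(ENO, PNO, RESP, DUR); the function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy instances of EMP(ENO, ENAME, TITLE) and ASG(ENO, PNO, RESP, DUR).
EMP = [("E1", "J. Doe", "Engineer"), ("E2", "M. Smith", "Analyst")]
ASG = [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24)]


def strategy1():
    """Select over the Cartesian product EMP x ASG, then project ENAME."""
    return {e[1] for e, a in product(EMP, ASG)
            if a[2] == "Manager" and e[0] == a[0]}


def strategy2():
    """Select the managers in ASG first, then join with EMP on ENO, then project ENAME."""
    manager_enos = {a[0] for a in ASG if a[2] == "Manager"}
    # Because only ENAME is projected, a membership test is enough for the join.
    return {e[1] for e in EMP if e[0] in manager_enos}


print(strategy1(), strategy2())
```

Strategy 1 inspects every pair of tuples, while strategy 2 scans each relation once, which is the reason it is preferred.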

76 What is the Problem?


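Fragment placement:
Site 1: $ASG1 = \sigma_{ENO \le "E3"}(ASG)$
Site 2: $ASG2 = \sigma_{ENO > "E3"}(ASG)$
Site 3: $EMP1 = \sigma_{ENO \le "E3"}(EMP)$
Site 4: $EMP2 = \sigma_{ENO > "E3"}(EMP)$
Site 5: Result

Distributed execution:
Site 1: $ASG1' = \sigma_{RESP="Manager"}(ASG1)$, sent to site 3
Site 2: $ASG2' = \sigma_{RESP="Manager"}(ASG2)$, sent to site 4
Site 3: $EMP1' = EMP1 \bowtie_{EMP.ENO=ASG.ENO} ASG1'$, sent to site 5
Site 4: $EMP2' = EMP2 \bowtie_{EMP.ENO=ASG.ENO} ASG2'$, sent to site 5
Site 5: $result = EMP1' \cup EMP2'$

Execution at the result site:
Sites 1 to 4 send ASG1, ASG2, EMP1 and EMP2 to site 5
Site 5: $result2 = (EMP1 \cup EMP2) \bowtie_{EMP.ENO=ASG.ENO} \sigma_{RESP="Manager"}(ASG1 \cup ASG2)$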


77 Cost of Alternatives
 Assume:
 size(EMP) = 400, size(ASG) = 1000
 tuple access cost = 1 unit; tuple transfer cost = 10 units
 there are 20 managers in relation ASG and data is uniformly distributed among sites.
 Strategy 1
produce ASG': (10+10) * tuple access cost = 20
transfer ASG' to the sites of EMP: (10+10) * tuple transfer cost = 200
produce EMP': (10+10) * tuple access cost * 2 = 40
transfer EMP' to result site: (10+10) * tuple transfer cost = 200
Total cost = 460
 Strategy 2
transfer EMP to site 5: 400 * tuple transfer cost = 4,000
transfer ASG to site 5: 1000 * tuple transfer cost = 10,000
produce ASG': 1000 * tuple access cost = 1,000
join EMP and ASG': 400 * 20 * tuple access cost = 8,000
Total cost = 23,000
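
The two totals can be checked with a few lines of Python. The constants restate the assumptions listed on this slide; the variable names are illustrative.

```python
TUPLE_ACCESS = 1     # cost units per tuple accessed
TUPLE_TRANSFER = 10  # cost units per tuple transferred
EMP_SIZE, ASG_SIZE, MANAGERS = 400, 1000, 20
per_site = MANAGERS // 2  # 10 manager tuples at each of the two ASG sites

strategy1 = (
    (per_site + per_site) * TUPLE_ACCESS        # produce ASG'
    + (per_site + per_site) * TUPLE_TRANSFER    # transfer ASG' to the sites of EMP
    + (per_site + per_site) * TUPLE_ACCESS * 2  # produce EMP'
    + (per_site + per_site) * TUPLE_TRANSFER    # transfer EMP' to the result site
)

strategy2 = (
    EMP_SIZE * TUPLE_TRANSFER                   # transfer EMP to site 5
    + ASG_SIZE * TUPLE_TRANSFER                 # transfer ASG to site 5
    + ASG_SIZE * TUPLE_ACCESS                   # produce ASG'
    + EMP_SIZE * MANAGERS * TUPLE_ACCESS        # join EMP and ASG'
)

print(strategy1, strategy2)
```

Under these assumptions the first strategy is cheaper by a factor of about 50, mostly because far fewer tuples cross the network.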

78
Concurrency and Recovery
Concurrency
Interleaved processing:
Concurrent execution of processes
is interleaved in a single CPU
Parallel processing:
Processes are concurrently
executed in multiple CPUs.
Recovery
Recovery from transaction failures
usually means that the database is
restored to the most recent consistent
state just before the time of failure.


79 Concurrency Control and Recovery

 Distributed databases encounter a number of
concurrency control and recovery problems which
are not present in centralized databases. Some of
them are listed below.
Dealing with multiple copies of data items
Failure of individual sites
Communication link failure
Distributed commit
Distributed deadlock


80 Concurrency Control and Recovery


 Dealing with multiple copies of data items:
The concurrency control must maintain
global consistency. Likewise the recovery
mechanism must recover all copies and
maintain consistency after recovery.
 Failure of individual sites:
Database availability must not be affected
due to the failure of one or two sites and
the recovery scheme must recover them
before they are available for use.

81 Concurrency Control and Recovery

Communication link failure:


This failure may create network partition
which would affect database
availability even though all database
sites may be running.
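Distributed commit:
A transaction may be split into fragments that
are executed at a number of sites. A commit
protocol is needed so that either all of these
sites commit the transaction or none of them do.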

82 Concurrency Control and Recovery

Distributed deadlock:
Since transactions are processed
at multiple sites, two or more sites
may get involved in deadlock. This
must be resolved in a distributed
manner.


83 Concurrency Control and Recovery

Distributed concurrency control based on a
distinguished copy of a data item
1. Primary site technique: A single site is
designated as a primary site which serves as
a coordinator for transaction management.


84 Concurrency Control and Recovery

[Figure: Sites 1 to 5 connected by a communications network, with one site designated as the primary site]

85 Concurrency Control and Recovery

Transaction management:
Concurrency control and commit are
managed by the primary site.
The primary site manages the locking and
unlocking of data items.
Advantages:
Data items are locked only at one site but
they can be accessed at any site.

86 Concurrency Control and Recovery

 Disadvantages:
All transaction management activities go to
primary site which is likely to overload the site.
If the primary site fails, the entire system is
inaccessible.
To aid recovery a backup site is designated
which behaves as a shadow of primary site. In
case of primary site failure, backup site can act
as primary site.


87 Concurrency Control and Recovery


2. Primary site approach with backup site:
When the primary site fails, suspend all active
transactions, designate the backup site as the new
primary site and identify a new backup site. The new
primary site receives all transaction management
information needed to resume processing.


88 Concurrency Control and Recovery

3. Primary Copy Technique:


This method attempts to distribute the
load of lock coordination among
various sites by having the
distinguished copies of different data
items stored at different sites.
To lock a data item just the primary
copy of the data item is locked.

89 Concurrency Control and Recovery

Advantages:
Since primary copies are distributed at
various sites, a single site is not
overloaded with locking and unlocking
requests.
Disadvantages:
Identification of a primary copy is
complex. A distributed directory must
be maintained, possibly at all sites.
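
A minimal Python sketch of the primary copy technique, assuming exclusive locks only and a directory that maps each data item to the site holding its primary copy. The class `PrimaryCopyLocks`, the item names and the site names are hypothetical.

```python
class PrimaryCopyLocks:
    """Toy lock table: each data item is locked only at the site of its primary copy."""

    def __init__(self, directory):
        self.directory = directory  # data item -> site holding the primary copy
        # One lock table per site: site -> {item: transaction holding the lock}
        self.locks = {site: {} for site in set(directory.values())}

    def lock(self, txn, item):
        site = self.directory[item]  # directory lookup finds the primary copy
        holder = self.locks[site].get(item)
        if holder is not None and holder != txn:
            return False  # already held by another transaction
        self.locks[site][item] = txn
        return True

    def unlock(self, txn, item):
        site = self.directory[item]
        if self.locks[site].get(item) == txn:
            del self.locks[site][item]


locks = PrimaryCopyLocks({"X": "site1", "Y": "site2"})
print(locks.lock("T1", "X"), locks.lock("T2", "X"), locks.lock("T2", "Y"))
```

Requests for X and Y go to different sites, so no single site handles all lock traffic; the price is the directory that every site needs to consult.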

90 Concurrency Control and Recovery


Recovery from a coordinator failure
In both approaches a coordinator site or copy
may become unavailable. This will require the
selection of a new coordinator.
Primary site approach with no backup site:
Aborts and restarts all active transactions at all
sites. Elects a new coordinator and initiates
transaction processing.
Primary and backup sites fail or no backup site:
Use election process to select a new coordinator
site.

91 Concurrency Control and Recovery


4. Concurrency control based on voting:
There is no primary copy or coordinator.
Send lock request to sites that have data item.
If majority of sites grant lock then the requesting
transaction gets the data item.
Locking information (grant or denied) is sent to all these
sites.
To avoid unacceptably long wait, a time-out period is
defined. If the requesting transaction does not get any
vote information then the transaction is aborted.
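
The voting rule can be sketched as a single decision function. This is a simplified illustration: the function name and return values are assumptions, and treating "some votes but no majority" as a denied request is a choice made here, not something stated in the slides.

```python
def request_lock_by_voting(votes, num_copies):
    """Decide a lock request from the votes received before the time-out.

    votes: list of booleans (True = grant) from sites holding a copy;
    sites that did not answer in time are simply missing from the list.
    num_copies: total number of sites that hold a copy of the data item.
    """
    if not votes:
        return "abort"    # no vote information arrived before the time-out
    if sum(votes) > num_copies // 2:
        return "granted"  # a majority of the copies granted the lock
    return "denied"       # assumption: votes arrived but no majority was reached


print(request_lock_by_voting([True, True, False], 3))
```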

92 Recovery- two-phase commit protocol


 In some cases, a single transaction, called a multidatabase
transaction, may require access to multiple databases.
 To maintain the atomicity of a multidatabase transaction, it is
necessary to have a two-level recovery mechanism.
 A global recovery manager, or coordinator, is needed to
maintain information needed for recovery, in addition to the
local recovery managers and the information they maintain
(log, tables).
 The coordinator usually follows a protocol called the two-
phase commit protocol


93 Recovery- two-phase commit protocol


 Phase 1: All participating databases signal the coordinator that the
part of the multidatabase transaction involving each has concluded.
 The coordinator then sends a message prepare for commit to each
participant to get ready for committing the transaction.
 Each participating database receiving that message will force-write
all log records and needed information for local recovery to disk and
then send a ready to commit or OK signal to the coordinator.
 If the force-writing to disk fails or the local transaction cannot commit
for some reason, the participating database sends a cannot commit
or not OK signal to the coordinator.
 If the coordinator does not receive a reply from the database within a
certain time out interval, it assumes a not OK response.


94 Recovery- two-phase commit protocol


 Phase 2: If all participating databases reply OK, and the coordinator’s vote
is also OK, the transaction is successful, and the coordinator sends a
commit signal for the transaction to the participating databases.
 Because all the local effects of the transaction and information needed for
local recovery have been recorded in the logs of the participating
databases, recovery from failure is now possible.
 Each participating database completes transaction commit by writing a
[commit] entry for the transaction in the log and permanently updating the
database if needed.
 On the other hand, if one or more of the participating databases or the
coordinator have a not OK response, the transaction has failed, and the
coordinator sends a message to roll back or UNDO the local effect of the
transaction to each participating database. This is done by undoing the
transaction operations, using the log.
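
The coordinator's side of the two phases can be sketched in Python as below. Participants are modelled as hypothetical callables that return `True` (OK), `False` (not OK) or `None` (no reply within the time-out); logging and force-writing to disk are omitted, so this is only an outline of the decision rule.

```python
def two_phase_commit(participants, coordinator_vote=True):
    """Return the global decision and the recorded votes.

    participants: dict mapping a name to a callable that answers the
    "prepare for commit" message with True, False or None (time-out).
    """
    # Phase 1: collect votes; a missing reply is treated as not OK.
    votes = {name: prepare() is True for name, prepare in participants.items()}
    # Phase 2: commit only if every participant and the coordinator voted OK.
    all_ok = coordinator_vote and all(votes.values())
    decision = "commit" if all_ok else "rollback"
    return decision, votes


decision, votes = two_phase_commit({"db1": lambda: True, "db2": lambda: None})
print(decision, votes)
```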

95 Recovery- two-phase commit protocol


 The net effect of the two-phase commit protocol is that
either all participating databases commit the effect of
the transaction or none of them do.
 In case any of the participants—or the coordinator—
fails, it is always possible to recover to a state where
either the transaction is committed or it is rolled back.
 A failure during or before Phase 1 usually requires the
transaction to be rolled back, whereas a failure during
Phase 2 means that a successful transaction can
recover and commit.


97 Recovery- two-phase commit protocol


Disadvantages:
 The biggest drawback of 2PC is that it is a blocking
protocol. Failure of the coordinator blocks all
participating sites, causing them to wait until the
coordinator recovers
 When both the coordinator and a participant that has
committed crash together, a participant has no way to
ensure that all participants got the commit message in
the second phase.
 The result of the transaction becomes uncertain or
nondeterministic.


98 Recovery- three-phase commit protocol
 These problems are solved by the three-phase commit (3PC)
protocol, which essentially divides the second commit phase
into two subphases called prepare-to-commit and commit.
 The prepare-to-commit phase is used to communicate the
result of the vote phase to all participants. If all participants
vote yes, then the coordinator instructs them to move into the
prepare-to-commit state.
 If the coordinator crashes during this subphase, another
participant can see the transaction through to completion.


99 Recovery- three-phase commit protocol
 It can simply ask a crashed participant if it received a
prepare-to-commit message. If it did not, then it safely
assumes to abort. Thus the state of the protocol can be
recovered irrespective of which participant crashes.
 Also, by limiting the time required for a transaction to
commit or abort to a maximum time-out period, the
protocol ensures that a transaction attempting to
commit via 3PC releases locks on time-out.


100 Recovery- three-phase commit protocol
 The main idea is to limit the wait time for participants
who have committed and are waiting for a global
commit or abort from the coordinator.
 When a participant receives a precommit message, it
knows that the rest of the participants have voted to
commit.
 If a precommit message has not been received, then
the participant will abort and release all locks.
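
The time-out rule for a participant can be sketched as a small function. The abort branch follows the slide; the commit branch (committing on time-out once a precommit has been received) is an assumption taken from common descriptions of 3PC, and the function name is hypothetical.

```python
def on_timeout(received_precommit):
    """Participant's action when the coordinator stays silent past the time-out."""
    if received_precommit:
        # Assumption: a precommit means every participant voted to commit,
        # so the participant goes ahead and commits.
        return "commit"
    # No precommit was seen: abort and release all locks.
    return "abort"


print(on_timeout(True), on_timeout(False))
```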


102 Distribution Design

 Top-down

mostly in designing systems from scratch

mostly in homogeneous systems

 Bottom-up

when the databases already exist at a number of sites


104 Distribution Design Issues


 Distributed Database Design: Design can be partitioned (or non-replicated) and
replicated. Replicated designs can be either fully replicated or partially
replicated. The two fundamental design issues are fragmentation and distribution
 Distributed Directory Management: A directory (containing descriptions and
locations about data items) may be global to the entire DDBS or local to each
site; it can be centralized at one site or distributed over several sites; there can be
a single copy or multiple copies.
 Distributed Query Processing: The problem is how to decide on a strategy for
executing each query over the network in the most cost-effective way, however
cost is defined. The factors to be considered are the distribution of data,
communication costs, and lack of sufficient locally-available information.
 Distributed Concurrency Control: One not only has to worry about the integrity of
a single database, but also about the consistency of multiple copies of the
database. Two fundamental primitives that can be used are locking, which is
based on the mutual exclusion of accesses to data items, and timestamping,
where the transaction executions are ordered based on timestamps.

105 Distribution Design Issues


 Distributed Deadlock Management: The deadlock problem in DDBSs is
similar in nature to that encountered in operating systems. The well-
known alternatives of prevention, avoidance, and detection/recovery
also apply to DDBSs.
 Reliability of Distributed DBMS: When the computer system or network
recovers from a failure, the DDBS should be able to recover and
bring the databases at the failed sites up to date. Recovery algorithms
are needed for this.
 Replication: If the distributed database is (partially or fully) replicated, it
is necessary to implement protocols that ensure the consistency of the
replicas. These protocols can be eager in that they force the updates
to be applied to all the replicas before the transaction completes, or
they may be lazy so that the transaction updates one copy (called the
master) from which updates are propagated to the others after the
transaction completes.

106 Twelve rules for distributed DB


(1) Local autonomy
 The sites in a distributed system should be autonomous. In this
context, autonomy means that:
 local data is locally owned and managed;
 local operations remain purely local;
 all operations at a given site are controlled by that site.
(2) No reliance on a central site
 There should be no central servers for services such as transaction
management, deadlock detection, query optimization, and
management of the global system catalog.
(3) Continuous operation
 A site failure should have no effect on the rest of the system.

107 Twelve rules for distributed DB contd..


(4) Location independence
 Location independence is equivalent to location transparency.
(5) Fragmentation independence
 The user should be able to access the data, no matter how it is fragmented.
(6) Replication independence
 The user should be unaware that data has been replicated.
(7) Distributed query processing
 The system should be capable of processing queries that reference data at
more than one site.
(8) Distributed transaction processing
 The system should support the transaction as the unit of recovery. The system
should ensure that both global and local transactions conform to the ACID
rules for transactions

108 Twelve rules for distributed DB contd..


(9) Hardware independence
 It should be possible to run the DDBMS on a variety of hardware
platforms.
(10) Operating system independence
 As a corollary to the previous rule, it should be possible to run the
DDBMS on a variety of operating systems.
(11) Network independence
 Again, it should be possible to run the DDBMS on a variety of
disparate communication networks.
(12) Database independence
 The system must support database products from any vendor. In
other words, the system should support heterogeneity.

109 Questions from MU papers


 Why fragmentation is required in distributed databases? Explain
vertical fragmentation with example. Comment on completeness,
reconstruction and disjointness aspect of it. (Dec
2018)……10M…Ans: Chp 25, pg. 895
 Explain 2 phase commit protocol with proper flow diagram. (Dec
2018)……10M…Ans: Chp 23, pg. 825
 Explain Shared Memory and Shared Nothing Architecture for
Parallel DBs. (Dec 2019)……5M…Ans: Chp 25, pg. 887
 Explain primary horizontal, derived horizontal and vertical
fragmentation with example. Comment on completeness,
reconstruction and disjointness properties. (Dec 2019)……10M…Ans:
Chp 25, pg. 895, pg. 899

Note: Chapter number and page numbers are from the book, Elmasri and Navathe,
“Fundamentals of Database Systems”, 6th Edition, PEARSON Education.

110 Questions from MU papers


• List the twelve rules for distributed DBs. (Dec 2019) … 5M … Out of syllabus (Thomas M. Connolly, Carolyn E. Begg, “Database Systems, A Practical Approach to Design, Implementation, and Management”, Fourth Edition, Addison-Wesley, 2012.)
• List the Distribution Design Issues and explain any one in detail. (May 2019) … 5M … Ans: Chp 1, pg. 16 (Tamer Özsu & Patrick Valduriez, “Principles of Distributed Database Systems”, 3rd edition)
• Explain the generic layering scheme for distributed query processing. (May 2019) … 10M … Ans: Chp 25, pg. 907 (Elmasri and Navathe, “Fundamentals of Database Systems”, 6th Edition), Chp 6, pg. 215 (Tamer Özsu & Patrick Valduriez, “Principles of Distributed Database Systems”, 3rd edition)
• List the various fragmentation strategies in distributed databases and explain any one in detail. (May 2019) … 10M … Ans: Chp 25, pg. 895, pg. 899

Note: Chapter number and page numbers are from the book, Elmasri and Navathe,
“Fundamentals of Database Systems”, 6th Edition, PEARSON Education.


111 Questions from MU papers


• Explain replication in detail. (Dec 2020) … 5M … Ans: Chp 25, pg. 895
• Explain distributed database architecture in detail. (Dec 2020) … 5M … Ans: Chp 25, pg. 895
• Which one of the following is not a disadvantage of fragmentation? (Dec 2020)
  ◦ When data from different fragments are required, the access speeds may be very high.
  ◦ In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
  ◦ Lack of back-up copies of data in different sites may render the database ineffective in case of failure of a site.
  ◦ The end user is able to access any available copy of the data, and an end user's request is processed by any processor at the data location.


112 Questions from MU papers


• What are the advantages of replication of data in a distributed database? (Dec 2020)
  ◦ Availability, Parallelism, Increased data transfer
  ◦ Availability, Parallelism, Reduced data transfer
  ◦ Availability, Increased parallelism and Cost of updates
  ◦ Availability, Increased data transfer, Cost of updates
