Chapter 4 Distributed Database Systems
Chapter 4 - Objectives
Basic Concepts in Distributed Database System.
Advantages and disadvantages of distributed
databases.
Functions and architecture for a DDBMS.
Distributed Database Design issues.
Levels of DDBMS Transparency.
Rules for DDBMSs.
Distributed DBMS
Software system that permits the management of the
distributed database and makes the distribution
transparent to users.
Distributed DBMSs should help resolve the islands of
information problem in organizations.
3 04/13/2024
Distributed Databases & Distributed Computing
Distributed DBMS Architecture
Data distribution and replication among
distributed databases – an example
Distributed Processing
A centralized database that can be accessed over a
computer network. (This is not a distributed database.)
Parallel DBMS
A DBMS running across multiple processors and
disks designed to execute operations in parallel,
whenever possible, to improve performance.
Based on premise that single processor systems can
no longer meet requirements for cost-effective
scalability, reliability, and performance.
Parallel DBMSs link multiple, smaller machines to
achieve same throughput as single, larger machine,
with greater scalability and reliability.
Parallel DBMS
Parallel technology is typically used for
very large databases possibly of the order
of terabytes (10^12 bytes), or systems that
have to process thousands of transactions
per second.
Also note that most DBMS vendors have
a parallel DBMS version of their
products.
A parallel DBMS is also not a distributed database
system.
Parallel DBMS
Main architectures for parallel DBMSs are:
Shared memory:
This architecture provides high-speed data access for a limited
number of processors, but it is not scalable beyond about 64
processors, at which point the interconnection network becomes
a bottleneck.
Shared disk:
An architecture optimized for applications that are inherently
centralized and require high availability and performance.
Shared nothing:
Often known as massively parallel processing (MPP), this is a
multiple-processor architecture in which each processor is part
of a complete system, with its own memory and disk storage.
Multidatabase System (MDBS)
MDBS: a distributed DBMS in which each site maintains
complete autonomy.
Simply put, an MDBS is a DBMS that resides
transparently on top of existing database and file systems
and presents a single database to its users.
An MDBS attempts to logically integrate a number of
independent DDBMSs while allowing the local DBMSs to
maintain complete control of their operations.
If there is no provision for the local sites to function as
standalone DBMSs, then the system has no local
autonomy.
For a centralized database, there is complete autonomy
but a total lack of distribution and heterogeneity.
Classification of DDBMS
There are unfederated MDBSs (where there are no local users)
and federated MDBSs (where there are local users).
A federated system is a cross between a distributed
DBMS and a centralized DBMS; it is a distributed
system for global users and a centralized system for
local users.
In general, classification of DDBMSs is based on three
important factors:
Level of data distribution
Degree of local site autonomy
Extent of site Heterogeneity
Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex
Types of DDBMS (based on site heterogeneity)
Homogeneous DDBMS
Heterogeneous DDBMS
Homogeneous DDBMS
All sites use same DBMS product.
Much easier to design and manage.
Approach provides incremental growth and allows
increased performance.
Usually the result of a new system being designed.
Heterogeneous DDBMS
Sites may run different DBMS products, with possibly
different underlying data models.
Occurs when sites have implemented their own
databases and integration is considered later.
Translations required to allow for:
Different hardware.
Different DBMS products.
Different hardware and different DBMS products.
Typical solution is to use gateways.
Gateways: convert the language and model of each
different DBMS into the language and model of the
relational system.
Overview of Networking
Network - Interconnected collection of autonomous
computers, capable of exchanging information.
Local Area Network (LAN) intended for connecting
computers at same site.
Wide Area Network (WAN) used when computers
or LANs need to be connected over long distances.
WANs are relatively slow and less reliable than LANs.
DDBMS using LAN provides much faster response
time than one using WAN.
LANs can be extended over large geographic areas
using Virtual Private Networks (VPNs).
Overview of Networking- Summary of WAN
and LAN
Components of a DDBMS
Distributed Database Design Issues
Three key issues need to be considered:
Fragmentation,
Allocation,
Replication.
Distributed Database Design
Fragmentation
Relation may be divided into a number of sub-relations,
which are then distributed.
Allocation
Each fragment is stored at a site with “optimal”
distribution.
Replication
Copy of fragment may be maintained at several sites.
Why Fragment?
Usage
Applications work with views rather than entire
relations.
Efficiency
Data is stored close to where it is most frequently used.
Data that is not needed by local applications is not
stored.
Data Allocation
Four alternative strategies regarding placement of
data:
Centralized (Distributed Processing),
Partitioned (or Fragmented),
Complete Replication,
Selective Replication.
Data Allocation
Centralized: Consists of single database and DBMS
stored at one site with users distributed across the
network. (Not really a distributed database.)
Partitioned: Database partitioned into disjoint
fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining
complete copy of database at each site.
Selective Replication: Combination of partitioning,
replication, and centralization, based on the nature of
the data.
This is the most commonly used strategy because of
its flexibility.
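The four strategies can be compared by looking at where each fragment ends up. The following is a minimal sketch, not a real allocation algorithm: the site names, fragment names, round-robin placement, and the `hot` parameter are all assumptions made for illustration.

```python
# Hypothetical sites and fragments, used only for illustration.
SITES = ["London", "Glasgow", "Aberdeen"]
FRAGMENTS = ["F1", "F2", "F3"]


def centralized(fragments, sites):
    """Every fragment is stored at a single site (the first one here)."""
    home = sites[0]
    return {f: {home} for f in fragments}


def partitioned(fragments, sites):
    """Each fragment is stored at exactly one site (round-robin here)."""
    return {f: {sites[i % len(sites)]} for i, f in enumerate(fragments)}


def complete_replication(fragments, sites):
    """Every site holds a copy of every fragment."""
    return {f: set(sites) for f in fragments}


def selective_replication(fragments, sites, hot):
    """Replicate only the fragments listed in `hot`; partition the rest."""
    placement = partitioned(fragments, sites)
    for f in hot:
        placement[f] = set(sites)
    return placement


def copies(placement):
    """Total number of stored fragment copies across all sites."""
    return sum(len(where) for where in placement.values())


for name, placement in [
    ("centralized", centralized(FRAGMENTS, SITES)),
    ("partitioned", partitioned(FRAGMENTS, SITES)),
    ("complete replication", complete_replication(FRAGMENTS, SITES)),
    ("selective replication", selective_replication(FRAGMENTS, SITES, hot=["F1"])),
]:
    print(name, copies(placement), placement)
```

The copy count gives a rough feel for storage and update cost: more copies improve local reads and availability but make updates more expensive.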
Comparison of Strategies for Data Distribution
Correctness of Fragmentation
Three correctness rules:
Completeness,
Reconstruction,
Disjointness.
Correctness of Fragmentation
Completeness
If relation R is decomposed into fragments R1, R2, ... Rn,
each data item that can be found in R must appear in at
least one fragment.
Reconstruction
It must be possible to define a relational operation
that will reconstruct R from the fragments.
Horizontal fragments are reconstructed with the Union
operation; vertical fragments with the Natural Join.
Correctness of Fragmentation
Disjointness
If data item di appears in fragment Ri, then it should
not appear in any other fragment.
Exception: vertical fragmentation, where primary
key attributes must be repeated to allow
reconstruction.
For horizontal fragmentation, data item is a tuple.
For vertical fragmentation, data item is an
attribute.
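The three rules can be checked mechanically for a horizontal fragmentation, where each data item is a tuple. This is a minimal sketch under that assumption; the rows and function names are hypothetical.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of R appears in at least one fragment."""
    return set(relation) <= set().union(*fragments)


def is_reconstructible(relation, fragments):
    """Reconstruction: the union of the fragments gives back exactly R."""
    return set().union(*fragments) == set(relation)


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        if seen & fragment:
            return False
        seen |= fragment
    return True


# Hypothetical rows: (staffNo, branchNo).
staff = {("SL21", "B005"), ("SG37", "B003"), ("SG14", "B003"), ("SA9", "B007")}

good = [
    {row for row in staff if row[1] == "B003"},
    {row for row in staff if row[1] == "B005"},
    {row for row in staff if row[1] == "B007"},
]
# A faulty fragmentation: one tuple is lost and another is stored twice.
bad = [
    {("SG37", "B003")},
    {("SG37", "B003"), ("SL21", "B005")},
    {("SA9", "B007")},
]

print(is_complete(staff, good), is_reconstructible(staff, good), is_disjoint(good))
print(is_complete(staff, bad), is_reconstructible(staff, bad), is_disjoint(bad))
```

For vertical fragmentation the same idea applies with attributes as data items, with the primary key allowed to repeat.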
Fragmentation Options
Fragmenting a relation should be done with
caution. The following are the fragmentation
possibilities for relations in a database:
Horizontal
Vertical
Mixed
Derived
No Fragmentation
Horizontal and Vertical Fragmentation
Mixed Fragmentation
Horizontal Fragmentation
Consists of a subset of the tuples of a relation.
Defined using Selection operation of relational
algebra:
σ_p(R)
For example:
P1 = σ_{type='House'}(PropertyForRent)
P2 = σ_{type='Flat'}(PropertyForRent)
Reconstruction expression?
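The two fragments and their recombination can be sketched as follows. The rows are hypothetical sample data, and the sketch assumes every property is either a house or a flat, so that the fragmentation is complete.

```python
# Hypothetical sample rows: (propertyNo, type).
property_for_rent = {
    ("PA14", "House"),
    ("PL94", "Flat"),
    ("PG4", "Flat"),
    ("PG21", "House"),
}


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


p1 = select(property_for_rent, lambda row: row[1] == "House")
p2 = select(property_for_rent, lambda row: row[1] == "Flat")

# Horizontal fragments are recombined with the Union operation.
reconstructed = p1 | p2
print(reconstructed == property_for_rent)
```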
Vertical Fragmentation
Consists of a subset of attributes of a relation.
Defined using Projection operation of relational algebra:
Π_{a1, ..., an}(R)
For example:
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
Determined by establishing affinity of one attribute to
another.
Reconstruction expression?
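A small sketch of the two vertical fragments and their recombination by natural join on the repeated key staffNo. The staff rows are hypothetical, and `project` and `natural_join` are illustrative helpers rather than any particular DBMS's operators.

```python
# Hypothetical Staff rows.
staff = [
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1945-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1960-11-10", "salary": 12000, "branchNo": "B003"},
]


def project(relation, attrs):
    """Relational projection with duplicate elimination."""
    result = []
    for row in relation:
        piece = {a: row[a] for a in attrs}
        if piece not in result:
            result.append(piece)
    return result


def natural_join(left, right):
    """Combine rows that agree on every attribute the two sides share."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

# Vertical fragments are recombined with the Natural Join on staffNo.
rebuilt = natural_join(s1, s2)
by_key = lambda row: row["staffNo"]
print(sorted(rebuilt, key=by_key) == sorted(staff, key=by_key))
```

The join only works because staffNo is kept in both fragments, which is the exception to disjointness that vertical fragmentation requires.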
Mixed Fragmentation
Consists of a horizontal fragment that is vertically
fragmented, or a vertical fragment that is horizontally
fragmented.
Defined using Selection and Projection operations of
relational algebra:
σ_p(Π_{a1, ..., an}(R)) or
Π_{a1, ..., an}(σ_p(R))
Example - Mixed Fragmentation
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
S21 = σ_{branchNo='B003'}(S2)
S22 = σ_{branchNo='B005'}(S2)
S23 = σ_{branchNo='B007'}(S2)
Reconstruction expression?
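One way to check a candidate reconstruction for this example is to undo the steps in reverse order: first union the horizontal pieces S21, S22, S23 back into S2, then join S1 with S2 on staffNo. The sketch below assumes hypothetical staff rows that all belong to branches B003, B005, or B007; the helper names are illustrative.

```python
# Hypothetical Staff rows; every row belongs to one of the three branches.
staff = [
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1945-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1960-11-10", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SA9", "fName": "Mary", "lName": "Howe", "position": "Assistant",
     "sex": "F", "DOB": "1970-02-19", "salary": 9000, "branchNo": "B007"},
]


def project(rows, attrs):
    """Projection; staffNo is kept, so no duplicate rows can arise."""
    return [{a: row[a] for a in attrs} for row in rows]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join_on(left, right, key):
    """Join two fragments on a shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Vertical split first, then a horizontal split of S2 by branch.
s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])
s21 = select(s2, lambda r: r["branchNo"] == "B003")
s22 = select(s2, lambda r: r["branchNo"] == "B005")
s23 = select(s2, lambda r: r["branchNo"] == "B007")

# Reverse the steps: union the disjoint horizontal pieces, then join with S1.
s2_back = s21 + s22 + s23
staff_back = join_on(s1, s2_back, "staffNo")

by_key = lambda row: row["staffNo"]
print(sorted(staff_back, key=by_key) == sorted(staff, key=by_key))
```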
Derived Horizontal Fragmentation
A horizontal fragment of a child relation that is based
on the horizontal fragmentation of a parent
relation (the primary key table).
Some applications may involve a join of two or more
relations.
If the relations are stored at different locations, there
may be a significant overhead in processing the join.
Derived Horizontal Fragmentation
In such cases, it may be more appropriate to ensure
that the relations, or fragments of relations, are at
the same location
Ensures that fragments that are frequently joined
together are at the same site.
Defined using Semijoin operation of relational
algebra:
Ri = R ⋉_F Si, 1 ≤ i ≤ w
where w is the number of horizontal fragments defined
on S (the parent table) and F is the join attribute.
Example - Derived Horizontal Fragmentation
If we have staff fragments below,
S3 = σ_{branchNo='B003'}(Staff)
S4 = σ_{branchNo='B005'}(Staff)
S5 = σ_{branchNo='B007'}(Staff)
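Given parent fragments like these, a child relation can be fragmented by semijoin so that each child row sits with its parent row. The sketch below assumes a hypothetical child relation PropertyForRent that references Staff through staffNo; the rows and the choice of child relation are illustrative assumptions.

```python
# Hypothetical parent rows (Staff) and child rows (PropertyForRent).
staff = [
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SA9", "branchNo": "B007"},
]
property_for_rent = [
    {"propertyNo": "PL94", "staffNo": "SL21"},
    {"propertyNo": "PG4", "staffNo": "SG37"},
    {"propertyNo": "PG36", "staffNo": "SG37"},
    {"propertyNo": "PA14", "staffNo": "SA9"},
]


def semijoin(child, parent, attr):
    """Keep the child rows whose join attribute matches some parent row."""
    keys = {row[attr] for row in parent}
    return [row for row in child if row[attr] in keys]


# Parent fragments, one per branch.
s3 = [row for row in staff if row["branchNo"] == "B003"]
s4 = [row for row in staff if row["branchNo"] == "B005"]
s5 = [row for row in staff if row["branchNo"] == "B007"]

# Derived child fragments: Ri = R semijoin Si on staffNo.
p3 = semijoin(property_for_rent, s3, "staffNo")
p4 = semijoin(property_for_rent, s4, "staffNo")
p5 = semijoin(property_for_rent, s5, "staffNo")

print([row["propertyNo"] for row in p3])
print([row["propertyNo"] for row in p4])
print([row["propertyNo"] for row in p5])
```

Storing each derived fragment at the same site as its parent fragment lets the Staff and PropertyForRent join run locally.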
Derived Horizontal Fragmentation
If a child relation contains more than one foreign
key, need to select one of the parent tables.
Choice can be based on fragmentation used most
frequently or fragmentation with better join
characteristics.
No fragmentation
A final strategy is not to fragment a relation.
For example, the Branch relation contains only
a small number of tuples and is not updated
very frequently.
Rather than trying to horizontally fragment the
relation on branch number, for example, it
would be more sensible to leave the relation
whole and simply replicate the Branch relation
at each site.
Distributed Database Design Methodology
1. Use normal methodology to produce a design for the
global relations.
2. Examine topology of system to determine where
databases will be located.
3. Analyse most important transactions and identify
appropriateness of horizontal/vertical fragmentation.
4. Decide which relations are not to be fragmented.
5. Examine relations on the one side of relationships (parent
relations) and determine a suitable fragmentation
schema. Relations on the many side (child relations) may
be suitable for derived horizontal fragmentation.
Distribution Transparency
Fragmentation Transparency
Location Transparency
Replication Transparency
Transaction Transparency
Concurrency Transparency
Failure Transparency
Performance Transparency
DBMS Transparency
Distribution Transparency
Distribution transparency allows the user to perceive the database
as a single, logical entity.
If a DDBMS exhibits distribution transparency, the user does not
need to know the operational details of the network
or the placement of the data in the distributed system.
Fragmentation transparency: the user does not need to
know that the data is fragmented.
This is the highest level of distribution transparency.
Local mapping transparency: the user needs to specify both
fragment names and the location
of data items, including replication sites if any.
This is the lowest level of distribution transparency.
Distribution Transparency
Location transparency is the middle level of
distribution transparency.
With location transparency, the user must know
how the data has been fragmented but does
not have to know the location of the data.
With replication transparency, the user is unaware
of the replication of fragments, that is, the existence of
copies of the same data item at different
sites.
Replication transparency is closely related to location transparency.
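The levels of distribution transparency differ in how much the user must name when reading data. The following is a minimal sketch with a hypothetical catalog, fragment names, and site names; a real DDBMS resolves these through its global system catalog and query processor.

```python
# Hypothetical catalog: which fragments make up a relation, and where each is stored.
RELATION_FRAGMENTS = {"Staff": ["S21", "S22"]}
FRAGMENT_SITES = {"S21": ["site3"], "S22": ["site5", "site7"]}  # S22 is replicated

# Hypothetical stored data, keyed by (fragment, site).
STORE = {
    ("S21", "site3"): [{"staffNo": "SG37", "branchNo": "B003"}],
    ("S22", "site5"): [{"staffNo": "SL21", "branchNo": "B005"}],
    ("S22", "site7"): [{"staffNo": "SL21", "branchNo": "B005"}],
}


def read_local(fragment, site):
    """Local mapping transparency: the user names the fragment and the site."""
    return STORE[(fragment, site)]


def read_fragment(fragment):
    """Location and replication transparency: the user names only the fragment.

    The system picks a copy (simply the first listed site in this sketch).
    """
    site = FRAGMENT_SITES[fragment][0]
    return read_local(fragment, site)


def read_relation(relation):
    """Fragmentation transparency: the user names only the global relation.

    The fragments here are horizontal, so they are recombined by union.
    """
    rows = []
    for fragment in RELATION_FRAGMENTS[relation]:
        rows.extend(read_fragment(fragment))
    return rows


print(read_local("S22", "site7"))   # lowest level
print(read_fragment("S22"))         # middle level
print(read_relation("Staff"))       # highest level
```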