Distributed Database Concepts
Data Fragmentation
Split a relation into logically related and correct parts.
A relation can be fragmented in two ways:
Horizontal Fragmentation
Vertical Fragmentation
Horizontal fragmentation
It is a horizontal subset of a relation that contains the tuples
satisfying a selection condition, for example a condition on the
Employee relation such as (Dno = 5). All tuples that satisfy this
condition form a subset that is a horizontal fragment of the
Employee relation.
A selection condition may be composed of several conditions
connected by AND or OR.
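A horizontal fragment can be sketched in Python as a filter over rows. This is a minimal illustration only: the sample rows, the helper name horizontal_fragment, and the condition Dno = 5 are assumptions made for the sketch.

```python
# Hypothetical sample rows of Employee; only a few attributes are shown.
employee = [
    {"SSN": "111", "Lname": "Smith", "Dno": 5},
    {"SSN": "222", "Lname": "Wong", "Dno": 5},
    {"SSN": "333", "Lname": "Zelaya", "Dno": 4},
]

def horizontal_fragment(relation, condition):
    """Return the tuples of `relation` that satisfy `condition`."""
    return [row for row in relation if condition(row)]

# Assumed selection condition: Dno = 5.
emp_d5 = horizontal_fragment(employee, lambda r: r["Dno"] == 5)

# A compound condition built from several simple conditions.
emp_d5_smith = horizontal_fragment(
    employee, lambda r: r["Dno"] == 5 and r["Lname"] == "Smith"
)
```

Every row of a fragment keeps all columns of the parent relation; only the set of rows shrinks.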
Vertical fragmentation
It is a subset of a relation that is created from a subset of its columns.
Each fragment must include the primary key attribute of the parent
relation Employee. In this way all vertical fragments of a relation
are connected.
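The role of the primary key can be sketched as follows. The sample rows, the choice of SSN as the key, and the helper names vertical_fragment and join_on_key are assumptions for illustration. Because both fragments keep the key, they can be joined back together.

```python
# Hypothetical sample rows of Employee; SSN is assumed to be the primary key.
employee = [
    {"SSN": "111", "Lname": "Smith", "Bdate": "1965-01-09", "Salary": 30000},
    {"SSN": "222", "Lname": "Wong", "Bdate": "1955-12-08", "Salary": 40000},
]

def vertical_fragment(relation, attributes, key="SSN"):
    """Project `relation` onto `attributes`, always keeping the primary key."""
    cols = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in cols} for row in relation]

def join_on_key(left, right, key="SSN"):
    """Recombine two vertical fragments by matching primary key values."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]

personal = vertical_fragment(employee, ["Lname", "Bdate"])
pay = vertical_fragment(employee, ["Salary"])
rebuilt = join_on_key(personal, pay)
```

Without the shared key there would be no reliable way to tell which Salary value belongs to which name.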
Representation
Horizontal fragmentation
Vertical fragmentation
Mixed (Hybrid) fragmentation
A combination of vertical fragmentation and horizontal
fragmentation.
This is achieved by SELECT-PROJECT operations, represented by
π_Li(σ_Ci(R)).
If C = True (select all tuples) and L ≠ ATTRS(R), we get a
vertical fragment; if C ≠ True and L = ATTRS(R), we get a
horizontal fragment; and if C ≠ True and L ≠ ATTRS(R), we get a
mixed fragment.
If C = True and L = ATTRS(R), then R itself can be considered a
fragment.
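The case analysis on C and L can be sketched in Python. This is an illustrative sketch, not a definitive implementation: the function names and sample rows are hypothetical, and condition=None is used here to stand for C = True. The horizontal case (C ≠ True, L = ATTRS(R)) follows from the definition of horizontal fragmentation.

```python
def fragment(relation, attrs, condition=None):
    """Apply SELECT then PROJECT: keep rows passing `condition`, then keep `attrs`.

    condition=None plays the role of C = True (select all tuples).
    """
    rows = relation if condition is None else [r for r in relation if condition(r)]
    return [{a: r[a] for a in attrs} for r in rows]

def fragment_kind(all_attrs, attrs, condition=None):
    """Classify a fragment definition from its attribute list L and condition C."""
    selects_all = condition is None            # C = True
    keeps_all = set(attrs) == set(all_attrs)   # L = ATTRS(R)
    if selects_all and keeps_all:
        return "whole relation"
    if selects_all:
        return "vertical"
    if keeps_all:
        return "horizontal"
    return "mixed"

# Hypothetical sample relation.
ATTRS_R = ["SSN", "Lname", "Dno"]
employee = [
    {"SSN": "111", "Lname": "Smith", "Dno": 5},
    {"SSN": "222", "Lname": "Zelaya", "Dno": 4},
]

def in_dept_5(row):
    return row["Dno"] == 5

mixed = fragment(employee, ["SSN", "Lname"], in_dept_5)
```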
Fragmentation schema
A definition of a set of fragments (horizontal, vertical, or
both) that includes all attributes and tuples in the
database and satisfies the condition that the whole database can
be reconstructed from the fragments by applying some sequence
of OUTER UNION (or OUTER JOIN) and UNION operations.
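The reconstruction condition can be checked for the simplest case, a set of horizontal fragments recombined by UNION. The sketch below is a hedged illustration: the rows, the three fragment conditions, and the helper name union are assumptions, and it does not cover vertical fragments, which need a join instead.

```python
# Hypothetical sample relation, with SSN assumed to be the primary key.
employee = [
    {"SSN": "111", "Dno": 5},
    {"SSN": "222", "Dno": 4},
    {"SSN": "333", "Dno": 1},
]

# Hypothetical fragmentation schema: three conditions that together cover every row.
conditions = [
    lambda r: r["Dno"] == 5,
    lambda r: r["Dno"] == 4,
    lambda r: r["Dno"] not in (4, 5),
]
fragments = [[r for r in employee if c(r)] for c in conditions]

def union(fragments, key="SSN"):
    """UNION of horizontal fragments, dropping duplicates by primary key."""
    seen = {}
    for frag in fragments:
        for row in frag:
            seen[row[key]] = row
    return sorted(seen.values(), key=lambda r: r[key])

rebuilt = union(fragments)
is_complete = rebuilt == sorted(employee, key=lambda r: r["SSN"])
```

If one of the conditions were dropped, some rows would belong to no fragment and is_complete would be False.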
Allocation schema
It describes the distribution of fragments to the sites of a
distributed database. The data can be fully replicated, partially
replicated, or partitioned.
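One way to picture an allocation schema is as a mapping from each fragment to the set of sites that store a copy. The sketch below makes that assumption; the fragment names, site numbers, and the function classify_allocation are hypothetical.

```python
def classify_allocation(allocation, sites):
    """Classify an allocation schema.

    `allocation` maps each fragment name to the set of sites storing a copy.
    """
    copies = [set(s) for s in allocation.values()]
    if all(c == set(sites) for c in copies):
        return "fully replicated"      # every fragment is at every site
    if all(len(c) == 1 for c in copies):
        return "partitioned"           # exactly one copy of each fragment
    return "partially replicated"      # some fragments have several copies

sites = {1, 2, 3}
full = {"EMP_D5": {1, 2, 3}, "EMP_D4": {1, 2, 3}}
partitioned = {"EMP_D5": {1}, "EMP_D4": {2}}
partial = {"EMP_D5": {1, 2}, "EMP_D4": {3}}

kinds = [classify_allocation(a, sites) for a in (full, partitioned, partial)]
```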
Data Replication
The database is replicated at multiple sites.
In full replication the entire database is replicated at every site; in
partial replication selected parts are replicated at some of the sites.
Data replication is achieved through a replication schema.
Data Distribution (Data Allocation)
This is relevant only in the case of partial replication or partition.
The selected portion of the database is distributed to the database
sites.
[Figure: sites connected by a network; recoverable labels: Object Oriented DBMS, Relational DBMS, Site 2, Site 3, Linux]
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe
Types of Distributed Database Systems
Federated Database Management Systems
Issues
Differences in data models:
Relational, object-oriented, hierarchical, network, etc.
Differences in constraints:
Each site may have its own data access and processing
constraints.
Differences in query language:
Some sites may use SQL, some may use SQL-89, some may
use SQL-92, and so on.
Issues
Cost of transferring data (files and results) over the network.
This cost is usually high so some optimization is necessary.
Example relations: Employee at site 1 and Department at site 2.
Employee at site 1: 10,000 rows. Row size = 100 bytes. Table
size = 10^6 bytes.
Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
Result
The result of this query will have 10,000 tuples,
assuming that every employee is related to a
department.
Suppose each result tuple is 40 bytes long. The query
is submitted at site 3 and the result is sent to this site.
Problem: Employee and Department relations are not
present at site 3.
Strategies:
1. Transfer Employee and Department to site 3.
Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join at
site 1, and send the result to site 3.
Total bytes transferred = 400,000 + 3,500 = 403,500 bytes.
Optimization criteria: minimizing data transfer.
Preferred approach: strategy 3.
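The comparison of the three strategies can be reproduced with a short calculation. The sizes are taken from the example above; the variable names are chosen for this sketch only.

```python
# Sizes in bytes, taken from the example.
EMPLOYEE_BYTES = 10_000 * 100    # 10,000 rows of 100 bytes each
DEPARTMENT_BYTES = 3_500         # Department size used in the totals above
RESULT_BYTES = 10_000 * 40       # 10,000 result tuples of 40 bytes each

# Bytes sent over the network under each strategy.
strategies = {
    1: EMPLOYEE_BYTES + DEPARTMENT_BYTES,   # ship both relations to site 3
    2: EMPLOYEE_BYTES + RESULT_BYTES,       # ship Employee to site 2, result to site 3
    3: DEPARTMENT_BYTES + RESULT_BYTES,     # ship Department to site 1, result to site 3
}

# Optimization criterion: minimize data transfer.
best = min(strategies, key=strategies.get)
```

Strategy 3 wins because the small Department relation is the only base relation that has to move.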
Primary site
[Figure: sites labeled Site 1, Site 2, Site 3, and Site 5]