Unit3 - Distributed Databases

A distributed database is a database that is spread across multiple sites and managed by a distributed database management system (DDBMS) to appear as a single database to users. It offers features such as data synchronization, access mechanisms, and support for large volumes of data, while ensuring data integrity and confidentiality. The document also discusses types of distributed databases, their architectures, data fragmentation methods, access primitives, and integrity constraints.

Uploaded by

garas47896
Copyright
© All Rights Reserved

Unit 3

Distributed Databases
Introduction -
• A distributed database is a database that is not limited to one system; it is spread
over different sites, i.e., on multiple computers or over a network of computers.
• A distributed database system is located on various sites that don’t share physical
components.
• A distributed DBMS manages the distributed database in a manner so that it appears as one
single database to users.
• A distributed database management system (DDBMS) is a centralized software system that
manages a distributed database in a manner as if it were all stored in a single location.
• Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.

• The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
Features of Distributed Database -
• It is used to create, retrieve, update and delete distributed databases.
• It synchronizes the database periodically and provides access mechanisms by virtue of
which the distribution becomes transparent to the users.
• It ensures that the data modified at any site is universally updated.
• It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the databases.
Why Use a Distributed Database -
Distributed Nature of Organizational Units − Most organizations in the current times are
subdivided into multiple units that are physically distributed over the globe. Each unit requires
its own set of local data. Thus, the overall database of the organization becomes distributed.
Need for Sharing of Data − The multiple organizational units often need to communicate with
each other and share their data and resources.
Database Recovery − Replication of data automatically helps in data recovery if the database at
any site is damaged. Users can access data from other sites while the damaged site is being
reconstructed.
Support for Multiple Application Software − Most organizations use a variety of application
software, each with its specific database support. A DDBMS provides uniform functionality for
using the same data among different platforms.
Types of Distributed Database -
Homogeneous Database -
A homogeneous database stores data uniformly across all locations. All sites utilize the same
operating system, database management system, and data structures. They are therefore
simple to handle.
Heterogeneous Database -
In a heterogeneous distributed database, different sites may use different software and
schemas, which may cause issues with queries and transactions. Moreover, one site might not
even be aware of the existence of the other sites. Different machines may run different
operating systems and database applications. They could even employ different database
data models.
Distributed Database Vs Centralized Database -

Centralized database: It is a database that is stored, located, and maintained at a single
location only.
Distributed database: It is a database that consists of multiple databases which are connected
with each other and are spread across different physical locations.

Centralized database: The data access time in the case of multiple users is more in a
centralized database.
Distributed database: The data access time in the case of multiple users is less in a
distributed database.

Centralized database: This database provides a uniform and complete view to the user.
Distributed database: Since it is spread across different locations, it is difficult to provide
a uniform view to the user.

Centralized database: The management, modification, and backup of this database are easier as
the entire data is present at the same location.
Distributed database: The management, modification, and backup of this database are very
difficult as it is spread across different physical locations.

Centralized database: The users cannot access the database if a database failure occurs.
Distributed database: If one database fails, users still have access to the other databases.

Centralized database: A centralized database is less efficient, as data finding becomes quite
complex because all data and information are stored at one particular place.
Distributed database: A distributed database is more efficient, as the data is split across
several places, which makes data finding simpler and less time-consuming.

Centralized database: The response speed is higher in comparison to a distributed database.
Distributed database: The response speed is lower in comparison to a centralized database.

Centralized database: Examples are a desktop or server CPU and a mainframe computer.
Distributed database: Examples are Apache Ignite, Apache Cassandra, Apache HBase,
Amazon SimpleDB, Clusterpoint, and FoundationDB.
Transparencies in Distributed DBMS -
• Transparency refers to hiding the complexities of the system’s implementation details from
users and applications.
• It provides a seamless and consistent user experience regardless of the system’s architecture,
distribution, or configuration.
• It ensures that users and applications interact with distributed resources in a uniform and
predictable manner.
• Transparency is very important in distributed systems because of:
• Simplicity and Abstraction
• Consistency
• Ease of maintenance
• Scalability, Reliability and Integrity of the database
Types of Transparencies in Distributed DBMS -
Location Transparency – It refers to the ability to access distributed resources without
knowing their physical or network locations. It hides the details of where resources are
located, providing a uniform interface for accessing them.
Example – DNS (Domain Name Systems) and VMs (Virtual Machines)

Access transparency – It ensures that users and applications can access distributed resources
uniformly, regardless of the distribution of those resources across the network.

Example – RPCs (Remote Procedure Calls) and Message Queues.

Concurrency transparency - It hides the complexities of concurrent access to shared resources
in distributed systems from the application developer. It ensures that concurrent operations do
not interfere with each other.
Example – Locking Mechanisms and Transaction Management
Replication transparency - It ensures that clients interact with a set of replicated resources as
if they were a single resource. It hides the presence of replicas and manages consistency
among them.
Example – Content Delivery Network and Database Replication Techniques
Failure transparency – It ensures that the occurrence of failures in a distributed system does
not disrupt service availability or correctness. It involves mechanisms for fault detection,
recovery, and resilience.
Example – Load Balancers and Automatic Failovers
Performance transparency – It ensures consistent performance levels across distributed nodes
despite variations in workload, network conditions, or hardware capabilities.
Example – Load Balancing and caching
Security transparency – It ensures that security mechanisms and protocols are integrated into
a distributed system seamlessly, protecting data and resources from unauthorized access or
breaches.
Example – Encryption Techniques and Access Controls
Management transparency – It simplifies the monitoring, control, and administration of
distributed systems by providing unified visibility and control over distributed resources.
Example – Cloud Management Platforms and configuration tools
Reference Architecture of Distributed DBMS -
Client−Server Architecture -
• A common method for distributing database functionality is the client−server architecture.
• Clients communicate with a central server, which controls the distributed database system.
• The server is in charge of maintaining data storage, controlling access, and organizing
transactions.
• This architecture has several clients and may have several connected servers.
• A client sends a query, and the first server that becomes available handles it.
• This architecture is simple to implement because of the centralised server system.
Peer−to−Peer Architecture -
• Each node in the distributed database system may function as both a client and a server in a
peer−to−peer architecture.
• Each node is linked to the others and works together with them to process and store data.
Each node is in charge of managing its own data and organizing node−to−node interactions.
• Because the loss of a single node does not cause the system to collapse, peer−to−peer
systems provide decentralized control and high fault tolerance.
• This design is ideal for distributed systems with nodes that can function independently and
with equal capabilities.
Federated Architecture -
• Multiple independent databases of various types are combined into a single meta−database
using a federated database design.
• It offers a uniform interface for navigating and exploring distributed data.
• In the federated design, each site maintains a separate, independent database, while the
virtual database manager internally distributes requests.
• When working with several data sources or legacy systems that can't be easily updated,
federated architectures are helpful.
Shared−Nothing Architecture -

• Data is divided up and spread among several nodes in a shared−nothing architecture, with
each node in charge of a particular portion of the data.
• Resources are not shared across nodes, and each node runs independently.
• Because additional nodes can be added as needed without affecting the current nodes, this
design offers great scalability and fault tolerance.
• Large−scale distributed systems, such as data warehouses or big data analytics platforms,
frequently employ shared−nothing designs.
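
The shared−nothing idea can be illustrated with a minimal Python sketch in which each node owns private storage and a hash of the key decides which node holds a record. The `Node` class, the helper names, and the modulo-hash placement rule are assumptions made for illustration only. Simple modulo placement also moves many keys when a node is added, which is why real systems often prefer schemes such as consistent hashing.

```python
import hashlib


class Node:
    """One shared-nothing node: it owns private storage that no other node touches."""

    def __init__(self, name):
        self.name = name
        self.storage = {}


def owner(key, nodes):
    """Pick the node responsible for a key using a stable hash (illustrative rule)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def put(key, value, nodes):
    """Store the value only on the node that owns the key."""
    owner(key, nodes).storage[key] = value


def get(key, nodes):
    """Read the value from the node that owns the key."""
    return owner(key, nodes).storage.get(key)


nodes = [Node("node-0"), Node("node-1"), Node("node-2")]
for emp_id in range(10):
    put(emp_id, {"emp_id": emp_id}, nodes)
```

Every record lives on exactly one node, so a read or write for a given key involves only that node.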
Data Fragmentation in Distributed Databases -

• The process of dividing the database into smaller multiple parts or sub−tables is called
fragmentation.

• The smaller parts or sub−tables are called fragments and are stored at different locations.

• Data fragmentation should be done in a way that the reconstruction of the original parent
database from the fragments is possible.

• The original table can be reconstructed using UNION operations (for horizontal fragments) or
JOIN operations (for vertical fragments).

• Users need not be concerned that the data is fragmented; this is called fragmentation
independence or fragmentation transparency.
Types of Data Fragmentation -
Database fragmentation is of three types.
Horizontal Fragmentation -
• It refers to the process of dividing a table horizontally by assigning each row (or a group of
rows) of a relation to one or more fragments.
• These fragments can then be assigned to different sites in the distributed system.
• Some of the rows or tuples of the table are placed in one system and the rest are placed in
other systems.
• The rows that belong to the horizontal fragments are specified by a condition on one or
more attributes of the relation.
• In relational algebra, horizontal fragmentation of a table T can be represented as σ_p(T),
where p is the selection condition.
Example - consider an EMPLOYEE table (T) that includes a Dep attribute.

This EMPLOYEE table can be divided into fragments such as:

EMP1 = σ_{Dep = 1}(EMPLOYEE)
EMP2 = σ_{Dep = 2}(EMPLOYEE)

These two fragments are T1 (the rows with Dep = 1) and T2 (the rows with Dep = 2).
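
The two selections above, and the UNION that restores the table, can be sketched in Python as follows. The sample rows and the `emp_id` and `name` columns are hypothetical, since the original EMPLOYEE table is not available; only the `dep` attribute comes from the example.

```python
# Hypothetical EMPLOYEE rows used only for illustration.
EMPLOYEE = [
    {"emp_id": 1, "name": "A", "dep": 1},
    {"emp_id": 2, "name": "B", "dep": 2},
    {"emp_id": 3, "name": "C", "dep": 1},
    {"emp_id": 4, "name": "D", "dep": 2},
]


def select(relation, predicate):
    """Relational selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def union(*fragments):
    """UNION of fragments, dropping duplicate rows."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


emp1 = select(EMPLOYEE, lambda r: r["dep"] == 1)  # fragment for Dep = 1
emp2 = select(EMPLOYEE, lambda r: r["dep"] == 2)  # fragment for Dep = 2
rebuilt = union(emp1, emp2)                       # reconstruction by UNION
```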


Vertical Fragmentation -
• Vertical fragmentation refers to the process of decomposing a table vertically by attributes
or columns.
• In this fragmentation, some of the attributes are stored in one system and the rest are
stored in other systems.
• The fragmentation should be done in such a manner that the table can be rebuilt from the
fragments with a natural JOIN; to make this possible, a special attribute called Tuple-id is
included in the schema of every fragment.

• The projection is as follows:

π_{a1, a2, …, an}(T)
where π is the relational algebra projection operator,
a1, …, an are attributes of T, and
T is the table (relation).
For example, the EMPLOYEE table is split into a first fragment, T1, and a second fragment, T2,
each holding a subset of the columns. To get back to the original T, we join the two fragments:
π_EMPLOYEE(T1 ⋈ T2)
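
A minimal Python sketch of vertical fragmentation and its reconstruction by natural join is shown here. The sample rows and column names are hypothetical; `tuple_id` plays the role of the Tuple-id attribute that both fragments must carry.

```python
# Hypothetical EMPLOYEE rows; tuple_id is the shared Tuple-id attribute.
EMPLOYEE = [
    {"tuple_id": 1, "name": "A", "dep": 1, "salary": 100},
    {"tuple_id": 2, "name": "B", "dep": 2, "salary": 200},
]


def project(relation, attributes):
    """Relational projection: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attributes} for row in relation]


def natural_join(left, right):
    """Natural join: combine rows that agree on every attribute name they share."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result


t1 = project(EMPLOYEE, ["tuple_id", "name"])           # first vertical fragment
t2 = project(EMPLOYEE, ["tuple_id", "dep", "salary"])  # second vertical fragment
rebuilt = natural_join(t1, t2)                         # T1 joined with T2
```

Without `tuple_id` in both fragments, the two projections would share no attribute and the join could not match rows back together.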
Mixed or Hybrid Fragmentation -
• The combination of vertical fragmentation of a table followed by further horizontal
fragmentation of some fragments (or the reverse order) is called mixed or hybrid fragmentation.
• To define this type of fragmentation, we use the SELECT and PROJECT operations of
relational algebra.
• For some applications, horizontal or vertical fragmentation alone isn’t enough to distribute
the data; in those cases, mixed fragmentation is needed.
• Mixed fragmentation can be done in two different ways:
1. The first method is to first create a set or group of horizontal fragments and then create
vertical fragments from one or more of the horizontal fragments.
2. The second method is to first create a set or group of vertical fragments and then create
horizontal fragments from one or more of the vertical fragments.
• The original relation can be obtained by a combination of JOIN and UNION operations.
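
The second method (vertical fragments first, then horizontal fragments of one of them) can be sketched in Python as follows. The rows, column names, and the split on `dep` are hypothetical choices made only to keep the example small.

```python
# Hypothetical EMPLOYEE rows used only for illustration.
EMPLOYEE = [
    {"tuple_id": 1, "name": "A", "dep": 1},
    {"tuple_id": 2, "name": "B", "dep": 2},
    {"tuple_id": 3, "name": "C", "dep": 1},
]


def project(relation, attributes):
    """PROJECT: keep only the listed attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


def select(relation, predicate):
    """SELECT: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def join_on(left, right, key):
    """JOIN two fragments on a shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Step 1: vertical fragments, both carrying tuple_id.
names = project(EMPLOYEE, ["tuple_id", "name"])
deps = project(EMPLOYEE, ["tuple_id", "dep"])

# Step 2: horizontal fragments of one vertical fragment.
deps_1 = select(deps, lambda r: r["dep"] == 1)
deps_other = select(deps, lambda r: r["dep"] != 1)

# Reconstruction: UNION the horizontal fragments, then JOIN the vertical ones.
deps_rebuilt = deps_1 + deps_other  # the fragments are disjoint, so this acts as UNION
rebuilt = join_on(names, deps_rebuilt, "tuple_id")
```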
Distributed Database Access Primitives -
The fundamental building blocks that distributed systems employ to coordinate and
communicate among network nodes are known as distributed primitives.
These fundamental activities, or protocols, offer dependable and effective methods for
exchanging information, coordinating actions, and managing errors in a distributed setting.
Major Access primitives are -
• Message Passing
• Locking
• Leader Election
• Atomic Transaction
• Consensus
• Replication
Message Passing -
• A distributed system’s nodes can communicate with one another by using a protocol called
message passing or forwarding.
• It permits communication between nodes that might be dispersed geographically, use
different operating systems or programming languages, and have different processing power.
• Message passing, for instance, can be used in a microservices architecture to facilitate
communication between several services that each carry out particular tasks.
• When Service B receives a message from Service A, it may process it and reply to Service A.
• This enables services to function independently of one another and provides for flexible
connectivity between them.
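
The Service A and Service B exchange described above can be sketched with two in-process queues standing in for a network channel. This is only an illustration: the function names and message fields are hypothetical, and a real system would use sockets or a message broker rather than `queue.Queue`.

```python
import queue
import threading

requests = queue.Queue()  # messages travelling from Service A to Service B
replies = queue.Queue()   # messages travelling from Service B back to Service A


def service_b():
    """Receive one message, process it, and send a reply."""
    message = requests.get()
    replies.put({"in_reply_to": message["id"], "body": message["body"].upper()})


def service_a(body):
    """Send a message and wait until the reply arrives."""
    requests.put({"id": 1, "body": body})
    return replies.get(timeout=5)


worker = threading.Thread(target=service_b)
worker.start()
reply = service_a("hello")
worker.join()
```

Neither service calls the other directly; each one only reads from and writes to a queue, which is what lets them run independently.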
Locking -
• Locking is a strategy for synchronizing access to resources in order to avoid conflicts caused
by many nodes trying to access the same resource concurrently.
• It is frequently used to prevent race conditions and keep shared data consistent.
• For instance, locking can be used in a distributed database to prevent multiple nodes from
writing to the same database record at the same time.
• Only one node can acquire the lock and execute the write operation; the other nodes must
wait for the lock to be released.
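
The following Python sketch shows the same idea on a single machine, with threads standing in for nodes and `threading.Lock` standing in for a distributed lock manager. The record and the writer function are hypothetical; a real distributed database would obtain the lock through a lock service rather than shared memory.

```python
import threading

record = {"balance": 0}        # the shared database record
record_lock = threading.Lock()  # stands in for a distributed lock


def write(amount, times):
    """Apply repeated updates, holding the lock for each read-modify-write."""
    for _ in range(times):
        with record_lock:  # other writers wait here until the lock is released
            record["balance"] += amount


writers = [threading.Thread(target=write, args=(1, 10_000)) for _ in range(4)]
for t in writers:
    t.start()
for t in writers:
    t.join()
```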
Leader Election -
• A distributed system’s leader node is chosen using the leader election protocol to control
coordination and decision-making.
• It’s frequently used in fault-tolerant systems to make sure that only one node is in charge of
managing operations and making decisions.
• For instance, a leader election protocol can be used in a distributed system with numerous
nodes to guarantee that one node is designated as the primary node in charge of coordinating
operations.
• Another node can be chosen as the new leader to take over the coordinating and decision-
making duties if the primary node fails.
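
One simple election rule is "the live node with the highest id wins", which is the decision rule behind the bully algorithm. The sketch below shows only that rule, under the assumption that every node already knows which nodes are alive; the message exchange and failure detection that a real protocol needs are left out.

```python
def elect_leader(nodes):
    """Return the highest node id that is still alive, or None if none is.

    `nodes` maps node id -> True (alive) or False (failed).
    """
    alive = [node_id for node_id, is_alive in nodes.items() if is_alive]
    return max(alive) if alive else None


nodes = {1: True, 2: True, 3: True}
first_leader = elect_leader(nodes)   # node 3 is chosen as primary

nodes[first_leader] = False          # the primary node fails
second_leader = elect_leader(nodes)  # node 2 takes over
```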
Atomic Transaction -
• Atomic transactions are a method for ensuring that several activities are carried out as a
single, indivisible unit, thereby ensuring consistency and dependability.
• Atomic transactions in a distributed system guarantee that a set of operations will either
succeed completely or fail completely.
• An atomic transaction, for instance, can be used in a banking application to guarantee that a
money transfer between two accounts either completes in full or does not happen at all.
• To maintain consistency and dependability, the entire transaction is rolled back if any
portion of it fails.
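
The banking example can be sketched with Python's built-in `sqlite3` module. This runs on a single local database, so it illustrates atomicity and rollback only; committing atomically across several sites needs an additional protocol such as two-phase commit. The `account` table and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the credit was rolled back too


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
```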
Consensus -
• Consensus is a procedure that allows a group of nodes in a distributed system to reach
agreement even when there are failures.
• Consensus methods guarantee that the agreed-upon value is trustworthy and consistent
and that all nodes concur on it.
• For instance, a consensus mechanism like proof of work or proof of stake is used in a
blockchain network to make sure that all nodes concur on the network’s state, the
sequencing of transactions, and the generation of new blocks.
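
Proof of work and proof of stake are too involved for a short example, so the sketch below shows a much simpler building block instead: accepting a value only when a strict majority of all nodes propose it. This is a single voting round under the assumption that votes are already collected, not a full consensus protocol; the function name and vote format are hypothetical.

```python
from collections import Counter


def majority_value(votes):
    """Return the value backed by a strict majority of all nodes, else None.

    `votes` maps node id -> proposed value; None marks a node that failed to vote.
    """
    total = len(votes)
    counts = Counter(v for v in votes.values() if v is not None)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count > total // 2 else None


agreed = majority_value({"n1": "block-7", "n2": "block-7", "n3": None})
split = majority_value({"n1": "block-7", "n2": "block-8", "n3": None})
```

With three nodes, one failure still leaves a majority of two, so `agreed` holds a value while `split` does not.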
Replication -
• Replication is a technique used to duplicate data or services across several nodes in order
to provide fault tolerance and scalability in a distributed system.
• Through replication, it is made possible for another node to take over processing duties in
the event of a failed node without affecting the system’s overall performance.
• Replication, for instance, can be used in a web application to guarantee that many instances
of the program are active at once, offering high availability and scalability.
• Without affecting the general user experience, processing can continue if one instance fails.
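
A minimal in-memory sketch of this behaviour follows: every write is copied to all live replicas, and a read is served by the first replica that is still alive. The `ReplicatedStore` class is hypothetical, and it ignores the hard parts of real replication such as bringing a recovered replica back up to date.

```python
class ReplicatedStore:
    """Keeps identical copies of the data on several replicas (illustrative only)."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.alive = [True] * replica_count

    def write(self, key, value):
        """Copy the value to every live replica."""
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                replica[key] = value

    def read(self, key):
        """Serve the read from the first live replica."""
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                return replica.get(key)
        raise RuntimeError("no live replica")


store = ReplicatedStore(3)
store.write("x", 42)
store.alive[0] = False                # the first replica fails
value_after_failure = store.read("x")  # another replica answers
```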
Integrity Constraints in Distributed Databases -
• Integrity constraints are the set of predefined rules that are used to maintain the quality of
information.
• Integrity constraints ensure that the data insertion, data updating, data deleting and other
processes have to be performed in such a way that the data integrity is not affected.
• They act as guidelines ensuring that data in the database remain accurate and consistent.
So, integrity constraints are used to protect databases.
Types of Integrity Constraints -
• Domain Constraints
• Not-Null Constraints
• Entity integrity Constraints
• Key Constraints
• Primary Key Constraints
• Referential integrity constraints
Domain Constraints -
These define the valid set of values for an attribute. The data types of a domain include
string, char, time, integer, date, currency, etc. The value of an attribute must belong to its
corresponding domain.
Not-Null Constraints -
It specifies that, within a tuple, the attributes over which a not-null constraint is specified
must not contain any null value.
Example:
Suppose the not-null constraint is specified on the "Semester" attribute of a relation/table.
If the 4th tuple has a null value in the "Semester" attribute, that entry violates the integrity
constraint. To keep the database instance legal, the database management system must not
allow the entry.
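
Domain and not-null constraints can be tried out with Python's built-in `sqlite3` module. In this sketch the `student` table, its columns other than Semester, and the allowed range of 1 to 8 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        semester INTEGER NOT NULL CHECK (semester BETWEEN 1 AND 8)
    )
    """
)


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO student (id, name, semester) VALUES (?, ?, ?)", row
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((1, "A", 3)),     # valid row
    try_insert((2, "B", None)),  # violates the not-null constraint on semester
    try_insert((3, "C", 12)),    # violates the domain (CHECK) constraint
]
```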
Entity Integrity Constraints -
Entity integrity constraints state that a primary key can never contain a null value, because
the primary key is used to identify individual rows in a relation uniquely; if the primary key
contained a null value, we could not identify those rows. A table can contain null values in
any field other than the primary key.
Example:
A row whose primary key is NULL is not allowed.
Key Constraints -
Keys are attribute sets that are used to identify an entity within its entity set uniquely.
An entity set can have multiple keys, but one of them is chosen as the primary key. A primary
key is always unique and does not contain any null value in the table.

Primary Key Constraints -


It states that the primary key attributes are required to be unique and not null. That is, the
primary key attributes of a relation must not have null values, and the primary key attributes
of two tuples must never be the same. This constraint is specified on the database schema for
the primary key attributes to ensure that no two tuples are the same.
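
The uniqueness and not-null requirements on a primary key can be checked with `sqlite3`. The `employee` table here is hypothetical. Note that SQLite, unlike the SQL standard, accepts NULL in a non-INTEGER primary key column unless `NOT NULL` is written explicitly, so the sketch spells it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is explicit because SQLite would otherwise allow NULL in this key column.
conn.execute("CREATE TABLE employee (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT)")


def try_insert(emp_id, name):
    """Return True if the row is accepted, False if the key constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, name))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("E1", "A"),  # accepted
    try_insert("E1", "B"),  # rejected: duplicate primary key
    try_insert(None, "C"),  # rejected: null primary key
]
```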
Referential integrity constraints -
It can be specified between two tables. Under a referential integrity constraint, if a foreign
key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in
Table 1 must be null or be present in Table 2.
Example:
An entry with Block_No 22 is not allowed because that value is not present in the 2nd
table.
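
The Block_No example can be reproduced with `sqlite3`. The table names, the `resident` columns, and the block numbers 20 and 21 are assumptions, since the original tables are not available. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.execute("CREATE TABLE block (block_no INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE resident ("
    "id INTEGER PRIMARY KEY, "
    "block_no INTEGER REFERENCES block(block_no))"
)
conn.executemany("INSERT INTO block VALUES (?)", [(20,), (21,)])
conn.commit()


def try_insert(resident_id, block_no):
    """Return True if the row is accepted, False if the foreign key rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO resident VALUES (?, ?)", (resident_id, block_no))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, 20),    # accepted: block 20 exists in the referenced table
    try_insert(2, None),  # accepted: a null foreign key is allowed
    try_insert(3, 22),    # rejected: block 22 is not present in the referenced table
]
```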
