Midterm Elective Database Notes

The document discusses distributed database management systems. It defines distributed databases and describes their key features and advantages. It also covers database models, operations, and design principles including data replication and fragmentation.

Chapter 1 - DISTRIBUTED DATABASE MANAGEMENT SYSTEMS

 A database is an ordered collection of related data that is built for a specific purpose.
 A database may be organized as a collection of multiple tables, where a table represents a real world element or entity.
 A database management system is a collection of programs that enables creation and maintenance of a database.
 DBMS is available as a software package that facilitates definition, construction, manipulation and sharing of data in a database.

DATABASE SCHEMA

 A database schema is a description of the database which is specified during database design and subject to infrequent alterations.
 It defines the organization of the data, the relationships among them, and the constraints associated with them.
 Databases are often represented through the three-schema architecture or ANSI/SPARC architecture.
 The three levels are:
- Internal Level having Internal Schema
- Conceptual Level having Conceptual Schema
- External or View Level having External Schemas or Views

TYPES OF DBMS

1. Hierarchical DBMS
- In hierarchical DBMS, the relationships among data in the database are established so that one data element exists as a subordinate of another.
- The data elements have parent-child relationships and are modelled using the "tree" data structure. These are very fast and simple.
2. Network DBMS
- Network DBMS is one where the relationships among data in the database are of type many-to-many in the form of a network.
- The structure is generally complicated due to the existence of numerous many-to-many relationships.
- Network DBMS is modelled using the "graph" data structure.
3. Relational DBMS
- In relational databases, the database is represented in the form of relations.
- In the relation or table, a row is called a tuple and denotes a single record.
- A column is called a field or an attribute and denotes a characteristic property of the entity.
4. Object-Oriented DBMS
- Object-oriented DBMS is derived from the model of the object-oriented programming paradigm.
- They are helpful in representing both consistent data as stored in databases, as well as transient data, as found in executing programs.
- They use small, reusable elements called objects.
- Each object contains a data part and a set of operations which works upon the data.

DISTRIBUTED DBMS

 In these systems, data is intentionally distributed among multiple nodes so that all computing resources of the organization can be optimally used.
 A distributed database is a collection of multiple interconnected databases, which are spread physically across various locations that communicate via a computer network.

DISTRIBUTED DATABASE FEATURES

 Databases in the collection are logically interrelated with each other. Often they represent a single logical database.
 Data is physically stored across multiple sites. Data in each site can be managed by a DBMS independent of the other sites.
 The processors in the sites are connected via a network. They do not have any multiprocessor configuration.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not synonymous with a transaction processing system.

DISTRIBUTED MANAGEMENT SYSTEM

 A distributed database management system (DDBMS) is a centralized software system that manages a distributed database as if it were all stored in a single location.

DISTRIBUTED MANAGEMENT SYSTEM FEATURES

 It is used to create, retrieve, update and delete distributed databases.
 It synchronizes the database periodically and provides access mechanisms by virtue of which the distribution becomes transparent to the users.
 It ensures that data modified at any site is universally updated.
 It is used in application areas where large volumes of data are processed and accessed by numerous users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.

FACTORS ENCOURAGING DDBMS

 Distributed Nature of Organizational Units
 Need for Sharing of Data
 Support for Both OLTP and OLAP
 Database Recovery
 Support for Multiple Application Software

ADVANTAGES OF DISTRIBUTED DATABASES

 Modular Development
 More Reliable
 Better Response
 Lower Communication Cost

DISADVANTAGES OF DISTRIBUTED DATABASES

 Need for complex and expensive software
 Processing overhead
 Data integrity
 Overheads for improper data distribution

DATABASE MODELS

 Every DB captures data in two interdependent respects; it captures both the data structure and the data content.
 The term "data content" refers to the values actually stored in the DB.
 The structure of the data refers to the data model (DM), also called the DB Schema or simply Schema.
 The DM captures many details about the data being stored, but it does not include the actual data content.
DATABASE OPERATIONS

 Every DB must at least support the ability to "create" new data content and the ability to retrieve existing data content.
 CRUD operations (create, retrieve, update, delete)

DESIGN PRINCIPLES

 Can either be:
- Replication
- Fragmentation

DATA REPLICATION

 the process of storing separate copies of the database at two or more sites
 popular fault tolerance technique of distributed databases

ADVANTAGES OF DATA REPLICATION

 Reliability
 Reduction in Network Load
 Quicker Response
 Simpler Transactions

DISADVANTAGES OF DATA REPLICATION

 Increased Storage Requirements
 Increased Cost and Complexity of Data Updating
 Undesirable Application – Database coupling

COMMON REPLICATION TECHNIQUES

 Snapshot replication
 Near-real-time replication
 Pull replication

FRAGMENTATION

 the task of dividing a table into a set of smaller tables
 subsets of the table are called fragments
 should be done in a way so that the original table can be reconstructed from the fragments
 this requirement is called "reconstructiveness"

TYPES OF FRAGMENTATION

 Horizontal Fragmentation
- primary horizontal fragmentation
- derived horizontal fragmentation
 Vertical Fragmentation
 Hybrid Fragmentation

ADVANTAGES OF FRAGMENTATION

 Since data is stored close to the site of usage, efficiency of the database system is increased.
 Local query optimization techniques are sufficient for most queries since data is locally available.
 Since irrelevant data is not available at the sites, security and privacy of the database system can be maintained.

DISADVANTAGES OF FRAGMENTATION

 When data from different fragments are required, the access speeds may be very low.
 In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
 Lack of back-up copies of data in different sites may render the database ineffective in case of failure of a site.

VERTICAL FRAGMENTATION

 the fields or columns of a table are grouped into fragments
 to maintain reconstructiveness, each fragment should contain the primary key field(s) of the table
 can be used to enforce privacy of data

VERTICAL FRAGMENTATION EXAMPLE

 fees details are maintained in the accounts section, so fragmentation is done this way:

CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees FROM STUDENT;
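To satisfy reconstructiveness, the remaining columns would go into a second fragment that repeats the key, and the original STUDENT table could then be rebuilt with a join. A minimal sketch of that idea, assuming a hypothetical STD_INFO fragment (the notes define only STD_FEES):

-- Hypothetical second vertical fragment; every fragment repeats the key Regd_No
CREATE TABLE STD_INFO AS
SELECT Regd_No, Name, Course FROM STUDENT;

-- Reconstruct the original STUDENT table by joining the fragments on the shared key
SELECT i.Regd_No, i.Name, i.Course, f.Fees
FROM STD_INFO i, STD_FEES f
WHERE i.Regd_No = f.Regd_No;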
HORIZONTAL FRAGMENTATION

 groups the tuples of a table in accordance to the values of one or more fields
 should also conform to the rule of reconstructiveness
 each horizontal fragment must have all columns of the original base table

HORIZONTAL FRAGMENTATION EXAMPLE

 if the details of all students of the Computer Science course need to be maintained at the School of Computer Science, then the designer will horizontally fragment the database as follows:

CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = "Computer Science";
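Because each horizontal fragment keeps all columns, the original STUDENT table can be rebuilt by taking the union of the fragments. A minimal sketch, assuming a hypothetical OTHER_STD fragment holding the remaining rows (the notes define only COMP_STD):

-- Hypothetical complementary fragment for all other students
CREATE TABLE OTHER_STD AS
SELECT * FROM STUDENT
WHERE COURSE <> "Computer Science";

-- Reconstruct the original STUDENT table
SELECT * FROM COMP_STD
UNION
SELECT * FROM OTHER_STD;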
HYBRID FRAGMENTATION

 a combination of horizontal and vertical fragmentation techniques is used
 most flexible fragmentation technique since it generates fragments with minimal extraneous information
 reconstruction of the original table is often an expensive task
 Two ways to do it:
- At first, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal fragments.
- At first, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical fragments.

DATABASE ENVIRONMENT ARCHITECTURAL CONCEPTS

 Services
- regardless of the deployment details, we can create logical collections of related functionality called services
- Services are merely logical collections, which means that they do not necessarily have corresponding structure duplicated in the actual implementation or deployment details
 any piece of software that uses a service is called a service consumer, while any piece of software implementing the service is called a service provider

COMPONENTS AND SUBSYSTEMS

 A component is simply a deployable bundle that provides a reasonably cohesive set of functionality.
 a subsystem is a collection of one or more components that work together toward a common goal.
Chapter 2 - UNIFIED SCHEMA

 similar to the GCM, except that there can be more than one unified schema
 Schema integration is a process that uses a collection of existing conceptual model elements, which have previously been exported from one or more LCMs, to generate a semantically integrated model (a single, unified schema)

DISTRIBUTED TABLES CAN BE OF ANY OF THE FOLLOWING FORMS:

 Nonreplicated, nonfragmented (nonpartitioned)
 Fully replicated (all tables)
 Fragmented (also known as partitioned)
 Partially replicated (some tables or some fragments)
 Mixed (any combination of the above)

DATA DISTRIBUTION

 The goal of any data distribution is to provide for increased availability, reliability, and improved query access time.
 On the other hand, as opposed to query access time, distributed data generally takes more time for modification (update, delete, and insert).

DESIGN ALTERNATIVES

 Localized Design
- This design alternative keeps all data logically belonging to a given DBMS at one site (usually the site where the controlling DBMS runs). This design alternative is sometimes called "not distributed."
 Distributed Data Design
- A database is said to be distributed if any of its tables are stored at different sites; one or more of its tables are replicated and their copies are stored at different sites; one or more of its tables are fragmented and the fragments are stored at different sites; and so on. In general, a database is distributed if not all of its data is localized at a single site.

DISTRIBUTED DATA DESIGN

 Nonreplicated, nonfragmented
 Fully Replicated
 Fragmented or Partitioned
- Vertical fragmentation
- Horizontal fragmentation
- Hybrid fragmentation
 Partially Replicated
 Mixed Distribution

NONREPLICATED, NONFRAGMENTED

 This design alternative allows a designer to place different tables of a given database at different sites.
 The idea is that data should be placed close to (or at) the site where it is needed the most.
 One benefit of such data placement is the reduction of the communication component of the processing cost.

FULLY REPLICATED

 This design alternative stores one copy of each database table at every site.
 Since every local system has a complete copy of the entire database, all queries can be handled locally.
 This design alternative therefore provides for the best possible query performance.
 On the other hand, since all copies need to be in sync—show the same values—the update performance is impacted negatively.

FRAGMENTED OR PARTITIONED

 The fragmentation design approach breaks a table up into two or more pieces called fragments or partitions and allows storage of these pieces in different sites.
 This distribution alternative is based on the belief that not all the data within a table is required at a given site.
 In addition, fragmentation provides for increased parallelism, access, disaster recovery, and security/privacy.
 In this design alternative, there is only one copy of each fragment in the system (nonreplicated fragments).
 Three alternatives to fragmentation:
- Vertical Fragmentation
- Horizontal Fragmentation
- Hybrid Fragmentation

PARTIALLY REPLICATED

 In this distribution alternative, the designer will make copies of some of the tables (or fragments) in the database and store these copies at different sites.
 This is based on the belief that the frequency of accessing database tables is not uniform.

MIXED DISTRIBUTION

 In this design alternative, we fragment the database as desired, either horizontally or vertically, and then partially replicate some of the fragments.

FRAGMENTATION

 Fragmentation requires a table to be divided into a set of smaller tables called fragments.
 Fragmentation can be horizontal, vertical, or hybrid (a mix of horizontal and vertical).
 designers need to decide on the degree of granularity for each fragment
 QUESTION: how many of the table columns and/or rows should be in a fragment?
 At one end, we can have all the rows and all the columns of the table in one fragment.
 At the other end, we can put each data item (a single column value for a single row) in a separate fragment.

VERTICAL FRAGMENTATION

 Vertical fragmentation (VF) will group the columns of a table into fragments.
 VF must be done in such a way that the original table can be reconstructed from the fragments.
 This fragmentation requirement is called "reconstructiveness."
 Each VF fragment must contain the primary key column(s) of the table.
 VF can be used to enforce security and/or privacy of data.
- Create table EMP_SAL as Select EmpID, Sal From EMP;
- Create table EMP_NON_SAL as Select EmpID, Name, Loc, DOB, Dept From EMP;
 After fragmentation, the EMP table will not be stored physically anywhere.
 But, to provide for fragmentation transparency—not requiring the users to know that the EMP table is fragmented—we have to be able to reconstruct the EMP table from its VF fragments.
 Reconstruction of fragments 1 and 2 using JOIN:
- Select EMP_SAL.EmpID, Sal, Name, Loc, DOB, Dept
From EMP_SAL, EMP_NON_SAL
Where EMP_SAL.EmpID = EMP_NON_SAL.EmpID;
HORIZONTAL FRAGMENTATION

 Horizontal fragmentation (HF) can be applied to a base table or to a fragment of a table.
 a fragment of a table is itself a table
 HF will group the rows of a table based on the values of one or more columns.
 To create a horizontal fragment from a table, a select statement is used.
 Example
- the following statement selects the rows from R satisfying condition C:
Select * from R where C;
 Primary Horizontal Fragmentation
 Derived Horizontal Fragmentation

PRIMARY HORIZONTAL FRAGMENTATION

 Primary horizontal fragmentation (PHF) partitions a table horizontally based on the values of one or more columns of the table.
 Example
- Suppose we have three branch offices, with each employee working at only one office. For ease of use, we decide that information for a given employee should be stored in the DBMS server at the branch office where that employee works.

Create table MPLS_EMPS as Select *
From EMP
Where Loc = 'Minneapolis';
Create table LA_EMPS as Select *
From EMP
Where Loc = 'LA';
Create table NY_EMPS as Select *
From EMP
Where Loc = 'New York';

 After fragmentation, the EMP table will not be physically stored anywhere. To provide for horizontal fragmentation transparency, we have to be able to reconstruct the EMP table from its HF fragments.
 This will give the users the illusion that the EMP table is stored intact.
- (Select * from MPLS_EMPS Union
Select * from LA_EMPS) Union
Select * from NY_EMPS;
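Since PHF fragments hold disjoint rows, the same reconstruction can also be written with UNION ALL, which skips the duplicate-elimination pass that plain UNION performs. This is an optimization note, not something the notes prescribe; a minimal sketch:

-- Disjoint fragments: UNION ALL reconstructs EMP without duplicate elimination
Select * from MPLS_EMPS
Union All
Select * from LA_EMPS
Union All
Select * from NY_EMPS;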
DERIVED HORIZONTAL FRAGMENTATION

 a designer may decide to fragment a table according to the way that another table is fragmented
 DHF is usually used for two tables that are naturally (and frequently) joined.
 Therefore, storing corresponding fragments from the two tables at the same site will speed up the join across the two tables.
 As a result, an implied requirement of this fragmentation design is the presence of a join column across the two tables.
 Example
- Table "DEPT(Dno, Dname, Budget, Loc)," where Dno is the primary key of the table. Let's assume that DEPT is fragmented based on the department's city. Applying PHF to the DEPT table generates three horizontal fragments, one for each of the cities in the database.
- Consider the table "PROJ." We can partition the PROJ table based on the values of the Dno column in the DEPT table's fragments with the following SQL statements.

Create table PROJ1 as
Select Pno, Pname, Budget, PROJ.Dno From PROJ, MPLS_DEPTS
Where PROJ.Dno = MPLS_DEPTS.Dno;
Create table PROJ2 as
Select Pno, Pname, Budget, PROJ.Dno From PROJ, NY_DEPTS
Where PROJ.Dno = NY_DEPTS.Dno;
Create table PROJ3 as
Select Pno, Pname, Budget, PROJ.Dno From PROJ, LA_DEPTS
Where PROJ.Dno = LA_DEPTS.Dno;

HYBRID FRAGMENTATION

 Hybrid fragmentation (HyF) uses a combination of horizontal and vertical fragmentation to generate the fragments we need.
 This fragmentation approach provides for the most flexibility for the designers, but at the same time it is the most expensive approach with respect to reconstruction of the original table.
 Two approaches of HyF:
1. generate a set of horizontal fragments and then vertically fragment one or more of these horizontal fragments
2. generate a set of vertical fragments and then horizontally fragment one or more of these vertical fragments
 Example
- Let's assume that employee salary information needs to be maintained in a separate fragment from the nonsalary information, as discussed above. A vertical fragmentation plan will generate the EMP_SAL and EMP_NON_SAL vertical fragments.
- The nonsalary information needs to be fragmented into horizontal fragments, where each fragment contains only the rows that match the city where the employees work. We can achieve this by applying horizontal fragmentation to the EMP_NON_SAL fragment of the EMP table.

Create table NON_SAL_MPLS_EMPS as Select *
From EMP_NON_SAL
Where Loc = 'Minneapolis';
Create table NON_SAL_LA_EMPS as Select *
From EMP_NON_SAL
Where Loc = 'LA';
Create table NON_SAL_NY_EMPS as Select *
From EMP_NON_SAL
Where Loc = 'New York';

 A simpler and more direct way of implementing the hybrid fragmentation:

Create table NON_SAL_MPLS_EMPS as Select
EmpID, Name, Loc, DOB, Dept From EMP
Where Loc = 'Minneapolis';
Create table NON_SAL_LA_EMPS as Select EmpID,
Name, Loc, DOB, Dept From EMP
Where Loc = 'LA';
Create table NON_SAL_NY_EMPS as Select EmpID,
Name, Loc, DOB, Dept From EMP
Where Loc = 'New York';
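Reconstructing EMP from these hybrid fragments then takes both operations: a union of the horizontal pieces followed by a join back to the salary fragment. A minimal sketch of that reconstruction, following the fragment names above:

-- Rebuild the nonsalary side with unions, then join the salary fragment on the key
Select e.EmpID, s.Sal, e.Name, e.Loc, e.DOB, e.Dept
From (Select * from NON_SAL_MPLS_EMPS
      Union Select * from NON_SAL_LA_EMPS
      Union Select * from NON_SAL_NY_EMPS) e, EMP_SAL s
Where e.EmpID = s.EmpID;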
VERTICAL FRAGMENTATION GENERATION GUIDELINES

 Grouping
- an approach that starts by creating as many vertical fragments as possible and then incrementally reducing the number of fragments by merging the fragments together
- create one fragment per nonkey column, placing the nonkey column and the primary key of the table into each vertical fragment; this creates as many vertical fragments as the number of nonkey columns in the table
 The grouping approach uses joins across the primary key to group some of these fragments together; continue this process until the desired design is achieved.
 Splitting is essentially the opposite of grouping.
 a table is fragmented by placing each nonkey column in one (and only one) fragment, focusing on identifying a set of required columns for each vertical fragment
 there is no overlap of nonprimary key columns in the vertical fragments that are created using splitting

SPLITTING IN DISTRIBUTED SYSTEMS

 Consider applications "AP1", "AP2", "AP3", and "AP4". These applications work on the table "T" defined as "T(C, C1, C2, C3, C4)," where C is the primary key column of the table.

USAGE MATRIX

 For a single site system, these applications are local and will have the usage matrix (the usage matrix figure is not reproduced in these notes).
 the usage matrix is a two dimensional matrix that indicates whether or not an attribute (column) is used by an application
 The usage matrix only indicates if a column is used by an application. However, the matrix does not show how many times an application accesses the table columns during a given time period.
 Neither the usage matrix nor the access frequencies have any indication of distribution.
 In a distributed system, however, an application can have different frequencies at different sites.
 Example: in a four-site system, AP2 might run four times at S2 and three times at S3. It is also possible that AP2 might run seven times at site S2 and zero times everywhere else. In both cases, the frequency would still be shown as seven in the usage matrix.
 Also, each time the application runs at a site, it might make more than one access to the table (and its columns).
 suppose we had another application, AP5, defined as follows:
Begin AP5
Select C1 from T where C4 = 100;
Select C4 from T;
End AP5
 In this case, AP5 makes two references to T each time it runs. As a result, the actual access frequency for AP5 is calculated as "ACC(Pi) * REF(Pi)," where ACC(Pi) is the number of times the application runs and REF(Pi) is the number of accesses Pi makes to T every time it runs. To simplify the discussion, we assume "REF(Pi) = 1" for all processes, which results in "ACC(Pi) * REF(Pi) = ACC(Pi)."
 If we include the access frequencies of the applications at each site for our original example (without AP5), this makes the usage matrix a three-dimensional matrix, where in the third axis we maintain the frequency of each application for each site.
 By adding access frequencies for each application across all sites, we can get the affinity or closeness that each column has with the other columns referenced by the same application.

BOND ENERGY ALGORITHM

 This algorithm takes the affinity matrix as an input parameter and generates a new matrix called the clustered affinity matrix as its output.
 The clustered affinity matrix is a matrix that reorders the columns and rows of the affinity matrix so that the columns with the greatest affinity for each other are "grouped together" in the same cluster—which is then used as the basis for splitting our table into vertical fragments.

VERTICAL FRAGMENT CORRECTNESS RULE

 Since the original table is not physically stored in a DDBE, the original table must be reconstructible from its vertical fragments using a combination of some SQL statements (joins in this case).
 The following requirements must be satisfied when fragmenting a table vertically:
- Completeness
- Reconstructiveness
- Shared primary key

HORIZONTAL FRAGMENTATION GENERATION GUIDELINES

 Applying horizontal fragmentation to a table creates a set of fragments that contain disjoint rows of the table (horizontal disjointness).
 Horizontal fragmentation is useful because it can group together the rows of a table that satisfy the predicates of frequently run queries.
 all rows of a given fragment should be in the result set of a query that runs frequently
 To fragment a table horizontally, we use one or more predicates (conditions). There are different types of predicates to choose from when fragmenting the table.
 In general, a simple predicate, P, follows the format "Column_Name comparative-operator Value."
 The comparative operator is one of the operators in the set {=, <, >, >=, <=, <>}.
 We can also use a minterm predicate, M, which is defined as a conjunctive normal form of simple predicates.
 The set of all simple predicates used by all applications that query a given table is shown as "Pr = {p1, p2, ..., pn}."
 The set of all minterm predicates used by all applications that query the table is shown as "M = {m1, m2, ..., mk}."
 Applying the minterm predicates M to the table generates k minterm horizontal fragments denoted by the set "F = {F1, F2, ..., Fk}." For this fragmentation design, all rows in Fi, for "i = 1..k," satisfy mi.
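As a small worked illustration of minterm fragmentation (the predicates are hypothetical, echoing the EMP examples above): with Pr = {p1: Loc = 'LA', p2: Sal > 50000}, each minterm combines every predicate or its negation, giving up to four fragments:

-- m1: Loc = 'LA' AND Sal > 50000
Create table EMP_M1 as Select * from EMP Where Loc = 'LA' and Sal > 50000;
-- m2: Loc = 'LA' AND NOT (Sal > 50000)
Create table EMP_M2 as Select * from EMP Where Loc = 'LA' and Sal <= 50000;
-- m3: NOT (Loc = 'LA') AND Sal > 50000
Create table EMP_M3 as Select * from EMP Where Loc <> 'LA' and Sal > 50000;
-- m4: NOT (Loc = 'LA') AND NOT (Sal > 50000)
Create table EMP_M4 as Select * from EMP Where Loc <> 'LA' and Sal <= 50000;

Every EMP row satisfies exactly one minterm, so the four fragments are disjoint and their union reconstructs EMP.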
MINIMALITY AND COMPLETENESS OF HORIZONTAL FRAGMENTATION

 It should be obvious that the more fragments that exist in a system, the more time the system has to spend in reconstructing the table.
 As a result, it is important to have a minimal set of horizontal fragments.
 Rules:
- Rule 1. The rows of a table (or a fragment) should be partitioned into at least two horizontal fragments if the rows are accessed differently by at least one application.
- When the successive application of Rule 1 is no longer required, the designer has generated a minimal and complete set of horizontal fragments. Completeness is, therefore, defined as follows.
- Rule 2. A set of simple predicates, Pr, for a table is complete if and only if, for any two rows within any minterm fragment defined on Pr, the rows have the same probability of being accessed by any application.
 Example
- Suppose application "AP1" queries the table "EMP", looking for those employees who work in Los Angeles (LA). The set "Pr = {p1: Loc = "LA"}" shows all the required simple predicates used by AP1. Therefore, the set "M = {m1: Loc = "LA", m2: Loc <> "LA"}" is a minimal and complete set of minterm predicates for AP1.
- M fragments EMP into the following two fragments:

Fragment F1: Create table LA_EMPS as Select * from EMP
Where Loc = "LA";
Fragment F2: Create table NON_LA_EMPS as Select * from EMP
Where Loc <> "LA";
HORIZONTAL FRAGMENTATION CORRECTNESS RULE

 The original table must be reconstructible from its fragments using a combination of SQL statements.
 Whenever we use fragmentation (vertical, horizontal, or hybrid), the original table is not physically stored.
 Vertically fragmented tables are reconstructed using join operations.
 horizontally fragmented tables are reconstructed using union operations
 tables fragmented using hybrid fragmentation are reconstructed using a combination of union and join operations.
 Any fragmentation must satisfy the following rules as defined by Özsu [Özsu99]:
- Rule 1. Completeness. Decomposition of R into R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri.
- Rule 2. Reconstruction. If R is decomposed into R1, R2, ..., Rn, then there should exist some relational operator ∇ such that "R = ∇1≤i≤n Ri" (union for horizontal fragments, join for vertical fragments).
- Rule 3. Disjointness. If R is decomposed into R1, R2, ..., Rn, and di is a tuple in Rj, then di should not be in any other fragment Rk, where k ≠ j.
REPLICATION

 the designer may decide to copy some of the fragments or tables to provide better accessibility and reliability
 the more copies of a table/fragment one creates, the easier it is to query that table/fragment.
 On the other hand, the more copies that exist, the more complicated (and time consuming) it is to update all the copies.
 As a rule of thumb, if a table/fragment is queried more frequently than it is modified, then replication is advisable.
 Once we store more than one copy of a table/fragment in the distributed database system, we increase the probability of having a copy locally available to query.
 Having more than one copy of a fragment in the system increases the resiliency of the system as well.
DISTRIBUTION TRANSPARENCY

 Distribution transparency is one of the sought-after features of a distributed DBE.
 It is this transparency that makes the system easy to use by hiding the details of distribution from the users.
 Three aspects of Distribution Transparency:
- Location Transparency
- Fragmentation Transparency
- Replication Transparency
 Location Transparency
- The fact that a table (or a fragment of a table) is stored at a remote site in a distributed system should be hidden from the user.
- When a table or fragment is stored remotely, the user should not need to know which site it is located at, or even be aware that it is not located locally.
- Enables the user to query any table (or any fragment) as if it were stored locally.
 Fragmentation Transparency
- The fact that a table is fragmented should be hidden from the user.
- This provides for fragmentation transparency, which enables the user to query any table as if it were intact and physically stored.
- This is somewhat analogous to the way that users of a SQL view are often unaware that they are not using an actual table (many views are actually defined as several union and join operations working across several different tables).
 Replication Transparency
- The fact that there might be more than one copy of a table stored in the system should be hidden from the user.
- This provides for replication transparency, which enables the user to query any table as if there were only one copy of it.

IMPACT OF DISTRIBUTION ON USER QUERIES

 Developers of a distributed DBMS try to provide for location, fragmentation, and replication transparencies to their users.
 This is an attempt to make the system easier to use.
 DDBMS must store distribution information in its global data dictionary and use this information in processing the users' requests.
 In such a system, although the users query the tables as if they were stored locally, in reality their queries must be processed by one or more database servers across the network.

DATABASE CONTROL

 Authentication
- Authentication in a DBE guarantees that only legitimate users have access to data resources in a DBE. At the highest level of authentication, access to the client computer (the client is the front-end to the database server) or the database server is controlled.
 Access Rights
- In relational database systems, what a database user can do inside the database is controlled by the access rights that are given to that user.
- A user's access rights specify the privileges that the user has.
- In any large database environment, there are many users, many databases, many tables with many columns, and many other database objects.
- To reduce the overhead associated with management of rights for a large system, the concept of a role is used (see the sketch below).
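As a small illustration of roles in standard SQL (the role, table, and user names here are hypothetical, not from the notes): privileges are granted to the role once, and the role is then granted to each user.

-- Define a role and attach privileges to it once
Create role payroll_clerk;
Grant Select, Update on EMP_SAL to payroll_clerk;

-- Assign the role to individual users instead of repeating the grants
Grant payroll_clerk to alice;
Grant payroll_clerk to bob;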
 Semantic Integrity Control
- A DBMS must have the ability to specify and enforce correctness assertions in terms of a set of semantic integrity rules.
- The semantic integrity service (Semi-S) is used to define and enforce the semantic integrity rules for the system.
- When one or more semantic integrity rules are violated, Semi-S can report, reject, or try to correct the query or transaction that is performing an illegal operation.
 Semantic Integrity Constraints (a combined SQL sketch follows below)
- Data type constraints
- Relation constraints
- Referential constraints
- Explicit constraints
 Data Type Constraints
- semantic integrity rules that specify the data type for columns of relational tables: the range of values and the type of operations that we can apply to the column to which the data type is attached
 Relation Constraints
- are the methods used to define a relation (or a table)
 Referential Constraints
- Referential integrity (RI) constraints restrict the values stored in the database so that they reflect the way the objects of the real world are related to each other.
 Explicit Constraints
- are not inherited from the ERM like the other three constraints. We have to either code these constraints into the application programs that use the database or code them into the database using the concept of stored procedures and triggers.
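A minimal sketch of how these constraint kinds surface in SQL (the tables and the business rule are hypothetical; the explicit rule is shown as a CHECK for brevity, although the notes place such rules in application code or triggers):

Create table DEPT2 (
  Dno    integer primary key,                 -- relation constraint: the key defines the table
  Budget decimal(12,2) not null,              -- data type constraint: type and allowed values
  Loc    varchar(20)
);
Create table EMP2 (
  EmpID  integer primary key,
  Sal    decimal(10,2) check (Sal >= 0),      -- explicit business rule, coded here as a CHECK
  Dno    integer references DEPT2(Dno)        -- referential constraint: Dno must exist in DEPT2
);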
 In distributed systems, semantic integrity assertions are basically the same as in centralized systems (i.e., relation, data type, referential, and explicit constraints).
 semantic integrity control issues become more complicated due to distribution
 The following outlines the additional challenges in distributed semantic integrity control:
- Creation of multisite semantic integrity rules
- Enforcement of multisite semantic integrity rules
- Maintaining mutual consistency of local semantic integrity rules across copies
- Maintaining consistency of local semantic integrity rules and global semantic integrity rules
 Approaches:
- Compile Time Validation
- Runtime Validation
- Postexecution Time Validation
1. Compile Time Validation
- In this approach to validation, transactions are allowed to run only after all the semantic integrity (SI) constraints have been validated.
- For this approach to work, SI data items need to be locked so that during validation and afterward, during transaction execution, they are not changed.
- It is easy to see that compile time validation is simple to implement and does not incur any cost for abort operations, since we only run transactions once they are validated—transactions that violate the SI rules do not run at all.
- to implement this approach, all the constraint data items need to be locked for the duration of validation and transaction execution
2. Runtime Validation
- transactions are validated during execution
- When there is a need for validation, all the data items involved are locked and the transaction is validated.
- If semantic integrity rules are violated, the transaction is rolled back.
- the duration for which the constraint data items need to be locked is shorter
3. Postexecution Time Validation
- validation cost is dominated by the communication cost between the sites involved in a transaction
- The cost of communication is directly associated with the number of required messages for coordination
 Semantic Integrity Enforcement
1. Cost in Distributed Systems
- In calculating the cost of semantic integrity enforcement in distributed systems, we assume the cost of the CPU is negligible as compared to the communication cost.
- We also assume that communication cost is directly related to the number of messages required for each approach.
- This assumption ignores the amount of data transferred by each message and simply focuses on the number of messages.
Chapter 3 - Query Optimization
Distributed Database Systems

 Describe processing of queries in a distributed database management system
 Understand how to extend the concept of query processing in a centralized system to incorporate distribution

Relational Algebra

 notations:
- R and S are two relations.
- The number of tuples in a relation is called the cardinality of that relation.
- R has attributes a1, a2, ..., an and has cardinality of K.
- S has attributes b1, b2, ..., bm and has cardinality of L.
- r is a tuple in R and is shown as r[a1, a2, ..., an].
- s is a tuple in S and is shown as s[b1, b2, ..., bm].

Relational Algebra: Subset Commands

 Relational algebra (RA) supports unary and binary types of operations.
 Unary operations take one relation (table) as an input and produce another as the output.
 Binary operations take two relations as input and produce one relation as the output.
 Regardless of the type of operation, the output is always a relation.
 divided into basic operators and derived operators
o Basic operators need to be supported by the language compiler since they cannot be created from any other operations.
o Derived operators, on the other hand, are optional since they can be expressed in terms of the basic operators.
 Notation Focus:
o SL represents the relational algebra SELECT operator.
o PJ represents the relational algebra PROJECT operator.
o JN represents the relational algebra JOIN operator.
o NJN represents the relational algebra natural JOIN operator.
o UN represents the relational algebra UNION operator.
o SD represents the relational algebra SET DIFFERENCE operator.
o CP represents the relational algebra CROSS PRODUCT operator.
o SI represents the relational algebra SET INTERSECT operator.
o DV represents the relational algebra DIVIDE operator.

Symbols

 σ - Sigma - Select Operator
 π - Pi - Project Operator
 ⋈ - Bowtie - Join Operator (the Cross Product is written ×)

Basic Operators

 Select Operator
o returns all tuples of the relation whose attribute(s) satisfy the given predicates (conditions)
o If no condition is specified, the select operator returns all tuples of the relation
o Example:
SLbal=1200 (Account)
returns all accounts that have a balance of $1200
 Project Operator
o returns the values of all attributes specified in the project operation for all tuples of the relation passed as a parameter
o all rows qualify, but only those attributes specified are returned
o Example:
PJCname,Ccity (Customer)
returns the customer name and the city where the customer lives for each and every customer of the bank
 We can combine the select and project operators to form complex RA expressions that not only apply a given set of predicates to the tuples of a relation but also trim the attributes to a desired set.
 Example:
PJCID,Cname (SLCcity='Edina' (Customer))
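Since these RA operators map directly onto SQL, it may help to see the equivalents; a minimal sketch, assuming Account and Customer tables with the columns used above:

-- SLbal=1200 (Account): the select operator as a WHERE clause
Select * from Account where bal = 1200;

-- PJCname,Ccity (Customer): the project operator as a column list
Select Cname, Ccity from Customer;

-- PJCID,Cname (SLCcity='Edina' (Customer)): select and project combined
Select CID, Cname from Customer where Ccity = 'Edina';

Strictly speaking, RA projection eliminates duplicate tuples, which SQL would express with SELECT DISTINCT.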
 Union Operator
o a binary operation in RA that combines the tuples from two relations into one relation
o Any tuple in the union is in the first relation, the second relation, or both relations.
o Compatibility requirements:
- the two relations have to be of the same degree—the two relations have to have the same number of attributes
- corresponding attributes of the two relations have to be from compatible domains
 Example
o Suppose we need to get the customer ID and name for all of the customers who live in either of two cities:
PJCID,Cname (SLCcity = 'Edina' (Customer))
UN
PJCID,Cname (SLCcity = 'Eden Prairie' (Customer))
 Set Difference
o a binary operation in RA that subtracts the tuples in one relation from the tuples of another relation
 Example
o Assume we need to print the customer ID for all customers who have an account at the Main branch but do not have a loan there:
PJCID (SLBcity = 'Main' (Account))
SD
PJCID (SLBcity = 'Main' (Loan))
 Cartesian Product
o also known as cross product
o a binary operation that concatenates each and every tuple from the first relation with each and every tuple from the second relation

Derived Operators

 can be expressed in terms of the basic operators
 not required by the language, but supported for ease of programming
 these operators are SI, JN (NJN), and DV
 Set Intersect Operator
o a binary operator that returns the tuples in the intersection of two relations
o if the two relations do not intersect, the operator returns an empty relation
o Example:
PJCname (SLBcity = 'Main' (Account))
SI
PJCname (SLBcity = 'Main' (Loan))
 Join Operator
o special case of the CP operator
o before the tuples are concatenated, they are checked against some condition(s)
o a binary operation that returns a relation by combining tuples from two input relations based on some specified conditions
 Divide Operator
o a binary operator that takes two relations as input and produces one relation as the output
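These set operators also map directly to SQL (UNION, EXCEPT, INTERSECT; some dialects spell EXCEPT as MINUS). A minimal sketch of the banking examples above, assuming the same Customer, Account, and Loan tables:

-- Union: customers living in either of the two cities
Select CID, Cname from Customer where Ccity = 'Edina'
Union
Select CID, Cname from Customer where Ccity = 'Eden Prairie';

-- Set difference: an account at the Main branch but no loan there
Select CID from Account where Bcity = 'Main'
Except
Select CID from Loan where Bcity = 'Main';

-- Set intersect: both an account and a loan at the Main branch
Select Cname from Account where Bcity = 'Main'
Intersect
Select Cname from Loan where Bcity = 'Main';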
Centralized Systems Query Optimization

 Query Processing is the activity performed in extracting data from the database. Query processing takes various steps to fetch the data from the database. The steps involved are:
o Parsing and translation
o Optimization
o Evaluation
 There are two join conditions and one select condition (known as a filter) in this statement (the original SQL query is not reproduced in these notes).
 The relational algebra (RA) expression that the parser might generate is shown below:
PJCname (SLBcity = 'Edina' (Customer CP (Account CP Branch)))
 Evaluation: Query Evaluation Plan
o after translating the user query, the system executes a query evaluation plan
 Query Optimization
o The DBMS does not execute this expression as is.
o The expression must go through a series of transformations and optimization before it is ready to run.
o The query optimizer is the component responsible for doing that.
o There are three steps that make up query optimization:
- cost estimation
- plan generation
- query plan code generation
o Cost Estimation
- A unary operator is distributive with respect to some binary operations; the rewrite below uses the inverse of the distributive property:
(Uop(R)) Bop (Uop(S)) ≡ Uop(R Bop S)
- Example:
(SLsal>50000 (PJCname,sal (Customer))) UN (SLsal>50000 (PJEname,sal (Employee))) ≡
SLsal>50000 ((PJCname,sal (Customer)) UN (PJEname,sal (Employee)))
o Plan Generation
- A query plan (or simply, a plan, as it is known by almost all DBMSs) is an extended query tree that includes access paths for all operations in the tree.
- Access paths provide detailed information on how each operation in the tree is to be performed.
- In addition to the access paths specified for each individual RA operator, the plan also specifies how the intermediate relations should be passed from one operator to the next—materializing temporary tables and/or pipelining can be used.
o Exhaustive Search Optimization
- all possible query plans are initially generated and then the best plan is selected
- Though these techniques provide the best solution, they have an exponential time and space complexity owing to the large solution space.
- For example, the dynamic programming technique.

Chapter 4 - Query Tree - (Just review the query tree on your own.)

Chapter 5 - DATABASE CONSISTENCY

 Each data item in the database has an associated correctness assertion.
 A database is said to be consistent if and only if the correctness criteria for all the data items of the database are satisfied.

TRANSACTION

 A transaction is a collection of operations performed against the data items of the database.
 A transaction is a unit of consistent and reliable computation.
 A transaction takes a database, performs an action on it, and generates a new version of the database, causing a state transition.
 similar to what a query does, except that if the database was consistent before the execution of the transaction, we can now guarantee that it will be consistent at the end of its execution regardless of the fact that
- (1) the transaction may have been executed concurrently with others, and
- (2) failures may have occurred during its execution
 a transaction is considered to be made up of a sequence of read and write operations on the database, together with computation steps

TRANSACTION PROPERTIES

ACID Properties

1. Atomicity
- The atomicity property of a transaction indicates that either all of the operations of a transaction are carried out or none of them are carried out.
- This property is also known as the "all-or-nothing" property.
2. Consistency
- The consistency property of a transaction requires a transaction to be written correctly.
- It is the programmer's responsibility that transactions are implemented correctly, so that the program carries out the intention of the transaction correctly.
3. Isolation
- The isolation property of a transaction requires that the transaction be run without interference from other transactions.
- Isolation guarantees that this transaction's changes to the database are not seen by any other transactions until after this transaction has committed.
4. Durability
- The durability property of a transaction requires the values that the transaction commits to the database to be persistent.
- This requirement simply states that database changes made by committed transactions are permanent, even when failures happen in the system.

TYPES OF CHANGES IN THE DATABASE AFTER A FAILURE

 all the changes made by transactions that committed prior to the failure
 all the changes made by transactions that did not complete prior to the failure

NOTE: Since changes made by incomplete transactions do not satisfy the atomicity requirement, these incomplete transactions need to be undone after a failure. This will guarantee that the database is in a consistent state after a failure.
programming technique
SCHEDULES AND CONFLICTS  Serializability in DBMS decides if an interleaved parallel schedule is
serializable or not.
i. Schedule
- In a system with a number of simultaneous transactions, a schedule is the total order of execution of operations.
- Given a schedule S comprising of n transactions, say T1, T2, T3, ..., Tn; for any transaction Ti, the operations in Ti must execute as laid down in the schedule S.

TYPES OF SCHEDULE

1. SERIAL SCHEDULES
 In a serial schedule, at any point of time, only one transaction is active, i.e. there is no overlapping of transactions.
2. PARALLEL SCHEDULES
 In parallel schedules, more than one transaction is active simultaneously, i.e. the transactions contain operations that overlap in time.

ii. Conflict in Schedule
- In a schedule comprising of multiple transactions, a conflict occurs when two active transactions perform non-compatible operations.
- Two operations are said to be in conflict when all of the following three conditions exist simultaneously:
 The two operations are parts of different transactions.
 Both the operations access the same data item.
 At least one of the operations is a write_item() operation, i.e. it tries to modify the data item.

SERIALIZABILITY

 A serializable schedule of 'n' transactions is a parallel schedule which is equivalent to a serial schedule comprising of the same 'n' transactions.
 A serializable schedule retains the correctness of a serial schedule while allowing the better CPU utilization of a parallel schedule.

EQUIVALENCE

 Transactions that run concurrently can cause conflicts that lead to anomalies.
 These anomalies can destroy the consistency of the database.
 Therefore, the scheduler must control the conflicting operations of concurrent transactions.
 The scheduler has to make sure that the parallel schedule preserves the consistency of the database.
 The scheduler achieves this by making sure that the allowed parallel schedule is equivalent to a serial schedule for the same set of transactions.
 Two schedules are said to be equivalent if they both produce the same state for the database and every transaction reads the same value(s) and writes the same value(s).
 Example (the schedule figures are not reproduced in these notes; a reconstructed sketch follows at the end of this section):
- Schedule1 is a serial schedule consisting of Transaction1 and Transaction2 wherein the operations on data item A (A1 and A2) are performed first and later the operations on data item B (B1 and B2) are carried out serially.
- Schedule2 is a non-serial schedule consisting of Transaction1 and Transaction2 wherein the operations are interleaved.

EQUIVALENCE OF SCHEDULES

1. RESULT EQUIVALENCE
 Two schedules producing identical results are said to be result equivalent.
 Often, this kind of equivalence is given the least significance, since it focuses only on the output, which in some cases may vary for the same set of inputs or may be the same for different sets of inputs.
2. VIEW EQUIVALENCE
 Two schedules that perform similar actions in a similar manner are said to be view equivalent.
 Two schedules (one being a serial schedule and the other non-serial) are said to be view serializable if they satisfy the rules for being view equivalent to one another.
3. CONFLICT EQUIVALENCE
 Two schedules are said to be conflict equivalent if both contain the same set of transactions and have the same order of conflicting pairs of operations.
 When a conflicting pair of operations such as Read-Write, Write-Read, or Write-Write acts on the same data item within different transactions at the same time, the schedule holding such transactions is said to be a conflict schedule.
- Schedule2 (a non-serial schedule) is considered to be conflict serializable when its conflict operations are in the same order as those of Schedule1 (a serial schedule).

SERIALIZABILITY OF SCHEDULES

 A schedule is said to be serializable if it is equivalent to a serial schedule.
 The check for serial schedule equivalence is known as serializability.
 A DBE guarantees the consistency of the database by enforcing serializability.
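The schedule figures referenced above were lost from these notes; the following is a plausible reconstruction (a sketch only, the operations in the original figures may differ), showing the same read/write steps of two transactions run serially and then interleaved:

Time  Schedule1 (serial)         Schedule2 (non-serial, interleaved)
t1    T1: read(A); write(A)      T1: read(A); write(A)
t2    T1: read(B); write(B)      T2: read(A); write(A)
t3    T2: read(A); write(A)      T1: read(B); write(B)
t4    T2: read(B); write(B)      T2: read(B); write(B)

Schedule2 interleaves the two transactions but keeps every conflicting pair (the operations on A and on B) in the order T1 before T2, so it is equivalent to Schedule1 and hence conflict serializable.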
SERIALIZABILITY IN DISTRIBUTED SYSTEMS

 In a distributed system, transactions run on one or more sites.
 the global schedule consists of a collection of local schedules—each site involved has a local schedule that may or may not be serializable
 if there is a local schedule that is not serializable, then the global schedule that contains it is not serializable
 The question we need to address now is whether or not the global schedule is serializable when all local schedules are.
 The answer depends on whether or not the database is replicated.
 if the database is not replicated, there is no mutual consistency requirement
 as long as local schedules are serializable, the global schedule is serializable as well

SERIALIZABILITY IN REPLICATED DATABASES

 In a replicated database, schedule S is globally serializable if and only if all local schedules are serializable and the order of commitment for two conflicting transactions is the same at every site where the two transactions run.

CONTROLLING CONCURRENCY

 Concurrency controlling techniques ensure that multiple transactions are executed simultaneously while maintaining the ACID properties of the transactions and serializability in the schedules.

LOCKING-BASED CONCURRENCY CONTROL PROTOCOLS

 Locking-based concurrency control protocols use the concept of locking data items.
 A lock is a variable associated with a data item that determines whether read/write operations can be performed on that data item.
 Generally, a lock compatibility matrix is used which states whether a data item can be locked by two transactions at the same time.

ONE-PHASE LOCKING (1PL)

 One-phase locking (1PL) is a method of locking that requires each transaction to lock an item before it uses it and release the lock as soon as it has finished using it.
 Obviously, this method of locking provides for the highest concurrency level but does not always enforce serializability.

TWO-PHASE LOCKING (2PL)

 The transaction comprises two phases.
 In the first phase, a transaction only acquires all the locks it needs and does not release any lock. This is called the expanding or the growing phase.
 In the second phase, the transaction releases the locks and cannot request any new locks. This is called the shrinking phase.
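A minimal sketch of how 2PL shows up in SQL practice, assuming a DBMS that supports SELECT ... FOR UPDATE and holds locks until commit (so the commit point ends the shrinking phase); the table and rows are hypothetical:

Begin transaction;
-- growing phase: acquire locks as items are touched
Select * from Account where ANo = 1 for update;   -- lock row 1
Select * from Account where ANo = 2 for update;   -- lock row 2
Update Account set bal = bal - 100 where ANo = 1;
Update Account set bal = bal + 100 where ANo = 2;
Commit;  -- shrinking phase: all locks released together at commit (strict 2PL)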
TIMESTAMP CONCURRENCY CONTROL ALGORITHMS

 Timestamp-based concurrency control algorithms use a transaction's timestamp to coordinate concurrent access to a data item to ensure serializability.
 A timestamp is a unique identifier given by the DBMS to a transaction that represents the transaction's start time.
 These algorithms ensure that transactions commit in the order dictated by their timestamps.
 An older transaction should commit before a younger transaction, since the older transaction enters the system before the younger one.
 Timestamp-based concurrency control techniques generate serializable schedules such that the equivalent serial schedule is arranged in order of the age of the participating transactions.
 Some of the timestamp based concurrency control algorithms are:
- Basic timestamp ordering algorithm
- Conservative timestamp ordering algorithm
- Multiversion algorithm based upon timestamp ordering

BASIC TO CONCURRENCY CONTROL ALGORITHM

 There are three rules that enforce serializability based on the age of a transaction.
 In what follows we outline these rules for two conflicting transactions.
 It should be noted that in a concurrent transaction processing system, there are many transactions running at the same time.
 Timestamp based ordering follows three rules to enforce serializability:
1. Access Rule − When two transactions try to access the same data item simultaneously, for conflicting operations, priority is given to the older transaction. This causes the younger transaction to wait for the older transaction to commit first.
2. Late Transaction Rule − If a younger transaction has written a data item, then an older transaction is not allowed to read or write that data item. This rule prevents the older transaction from committing after the younger transaction has already committed.
3. Younger Transaction Rule − A younger transaction can read or write a data item that has already been written by an older transaction.

Conservative TO Algorithm

 A variation of the basic TO concurrency control algorithm that can eliminate some of the unnecessary restarts is known as conservative TO ordering.
 In the conservative TO algorithm, the concurrency control does not act on a request from a much younger transaction until the system has processed requests from a large enough number of the older transactions.
 Therefore, any potential conflict of a younger transaction with older transactions will be detected before the system commits a much younger transaction.

Multiversion Concurrency Control Algorithm

 The multiversion (MV) algorithms for concurrency control are mostly used in engineering databases, where the system needs to maintain the history of how a data item has changed over time.
 instead of saving only one value for data item X, the system maintains multiple values for X known as versions
 Every time a transaction is allowed to write to data item X, it creates a new version of X. If X has been written N times, then there are N versions of X in the database.

Optimistic Concurrency Control Algorithm

 In optimistic concurrency control, a transaction's life cycle is divided into three phases:
- execution phase (EP)
- validation phase (VP)
- commit phase (CP)

Execution Phase (EP)

 In this phase, the transaction performs its actions and buffers the new values for data items in memory.

Validation Phase (VP)

 In this phase, the transaction validates itself to ensure that committing its changes to the database does not destroy the consistency of the database (i.e., that it generates a serializable schedule).

Commit Phase (CP)

 In this phase, the transaction writes its changes from memory to the database on disk.
Rules Used to Enforce Serializability  A conflict graph is a nondirected graph that has a set of vertical,
horizontal, and diagonal edges.
 Rule 1. For two transactions Ti and Tj, where Ti is reading what Tj is  A vertical edge connects two nodes within a class and indicates a
writing, Ti’s EP phase cannot overlap with Tj’s CP phase. Tj has to conflict between two transactions within the class.
start its CP after Ti has finished its EP.
 A horizontal edge connects two nodes across two classes and
 Rule 2. For two transactions Ti and Tj, where Ti is writing what Tj is indicates a write–write conflict across different classes.
reading, Ti’s CP phase cannot overlap with Tj’s EP phase. Tj has to
 A diagonal edge connects two nodes within two different classes and
start its EC after Ti has finished its CP. indicates a write– read or a read–write conflict across two classes.
 Rule 3. For two transactions Ti and Tj, where Ti is writing what Tj is
writing, Ti’s CP phase cannot overlap with Tj’s CP phase. Tj has to
start its CP after Ti has finished its CP.

Note: Unlike the previous algorithms, this algorithm uses a timestamp that
is assigned to a transaction when the transaction is ready to validate.
Delaying the assignment of the timestamp until validation time reduces the
number of unnecessary rejections.

Concurrency Control in Distributed Systems

 The control may reside at a site different from the site where a transaction enters the system.
 Control may be centralized, residing at only one site, or distributed—where multiple sites cooperate to control the execution of transactions.
 The check for serializability for these global schedules is the same as the check for serializability in a centralized system.
 We have to identify all conflicts in the schedule and make sure that the total commitment order graph is acyclic.

In order for a set of transactions in a distributed DBE to be serializable, the following two requirements must be satisfied:

 All local schedules must be serializable.
 If two transactions conflict at more than one site, their partial commitment order (PCO) requirements at all sites where they meet must be compatible for all their conflicts.
2PL in Distributed Systems

 The basic principle of distributed two-phase locking is the same as the basic two-phase locking protocol.
 However, in a distributed system there are sites designated as lock managers. A lock manager controls lock acquisition requests from transaction monitors.
 In order to enforce coordination between the lock managers in various sites, at least one site is given the authority to see all transactions and detect lock conflicts.
 Depending upon the number of sites that can detect lock conflicts, distributed two-phase locking approaches can be of three types:
o Centralized two-phase locking − In this approach, one site is designated as the central lock manager. All the sites in the environment know the location of the central lock manager and obtain locks from it during transactions.
o Primary copy two-phase locking − In this approach, a number of sites are designated as lock control centers. Each of these sites has the responsibility of managing a defined set of locks. All the sites know which lock control center is responsible for managing the lock of which data table/fragment item.
o Distributed two-phase locking − In this approach, there are a number of lock managers, where each lock manager controls locks of data items stored at its local site. The location of the lock manager is based upon data distribution and replication.
Distributed Timestamp Concurrency Control

 In a distributed system, we cannot use any local physical clock readings or any site’s logical clock readings as our global timestamps, since they are not globally unique.
 An indication of the site ID (which is globally unique) needs to be included in the timestamp of a transaction.
 The issue with using a site’s physical clocks is called “drifting”—when two or more clocks show numbers that are different from each other.
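A common construction matching the bullets above is to pair a local logical counter with the site ID and compare the pairs lexicographically: the counter orders transactions within a site, and the site ID breaks ties across sites. A minimal Python sketch (names are illustrative):

# Sketch: globally unique timestamps built as (counter, site_id) pairs,
# as the notes suggest. Tuple comparison breaks ties between sites even
# when the local counters collide.

import itertools

class TimestampGenerator:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count(1)  # local logical clock

    def next_timestamp(self):
        return (next(self.counter), self.site_id)

site_a, site_b = TimestampGenerator(1), TimestampGenerator(2)
t1 = site_a.next_timestamp()   # (1, 1)
t2 = site_b.next_timestamp()   # (1, 2) -- same counter, different site
print(t1 < t2)                 # True: tuple comparison orders them globally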
Conflict Graph

 A conflict graph is a nondirected graph that has a set of vertical, horizontal, and diagonal edges.
 A vertical edge connects two nodes within a class and indicates a conflict between two transactions within the class.
 A horizontal edge connects two nodes across two classes and indicates a write–write conflict across different classes.
 A diagonal edge connects two nodes within two different classes and indicates a write–read or a read–write conflict across two classes.
 Analysis of the conflict graph determines if two transactions within the same class or across two different classes can be run parallel to each other.

Distributed Optimistic Concurrency Control

 The distributed optimistic concurrency control algorithm extends the optimistic concurrency control algorithm.
 For this extension, two rules are applied (a sketch of the global ordering test follows this list):
o Rule 1: a transaction must be validated locally at all sites when it executes. If a transaction is found to be invalid at any site, it is aborted. Local validation guarantees that the transaction maintains serializability at the sites where it has been executed. After a transaction passes the local validation test, it is globally validated.
o Rule 2: after a transaction passes the local validation test, it should be globally validated. Global validation ensures that if two conflicting transactions run together at more than one site, they should commit in the same relative order at all the sites they run together. This may require a transaction to wait for the other conflicting transaction after validation, before commit. This requirement makes the algorithm less optimistic, since a transaction may not be able to commit as soon as it is validated at a site.
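The "same relative order at all sites" condition in Rule 2 can be checked mechanically. The sketch below assumes each site reports its local commit order as a list of transaction IDs, an illustrative input format, and verifies that two conflicting transactions appear in the same order everywhere they both ran.

# Sketch of the global validation test in Rule 2: two conflicting
# transactions must commit in the same relative order at every site
# where they both ran. Per-site commit orders are assumed inputs.

def same_relative_order(site_orders, t1, t2):
    relative = None
    for order in site_orders:
        if t1 in order and t2 in order:
            before = order.index(t1) < order.index(t2)
            if relative is None:
                relative = before
            elif relative != before:
                return False  # the orders disagree across sites: invalid
    return True

# T1 before T2 at one site, but T2 before T1 at another -> fails validation.
print(same_relative_order([["T1", "T2"], ["T2", "T1"]], "T1", "T2"))  # False
print(same_relative_order([["T1", "T2"], ["T1", "T2"]], "T1", "T2"))  # True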
Deadlock Handling

 A deadlock is a state of a database system having two or more transactions, when each transaction is waiting for a data item that is being locked by some other transaction.
 A deadlock can be indicated by a cycle in the wait-for-graph.
 This is a directed graph in which the vertices denote transactions and the edges denote waits for data items.
o For example, transaction T1 is waiting for data item X which is locked by T3. T3 is waiting for Y which is locked by T2, and T2 is waiting for Z which is locked by T1. Hence, a waiting cycle is formed, and none of the transactions can proceed with execution.

Deadlock Handling in Centralized Systems

 There are three classical approaches for deadlock handling, namely −
o Deadlock prevention.
o Deadlock avoidance.
o Deadlock detection and removal.
 All three approaches can be incorporated in both a centralized and a distributed database system.

Deadlock Prevention

 The deadlock prevention approach does not allow any transaction to acquire locks that will lead to deadlocks.
 The convention is that when more than one transaction requests a lock on the same data item, only one of them is granted the lock.
 One of the most popular deadlock prevention methods is pre-acquisition of all the locks (a minimal sketch follows this list).
 A transaction acquires all the locks before starting to execute and retains the locks for the entire duration of the transaction.
 If another transaction needs any of the already acquired locks, it has to wait until all the locks it needs are available.
 Using this approach, the system is prevented from being deadlocked since none of the waiting transactions holds any lock.
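The key property of pre-acquisition is that it is all-or-nothing: a transaction either receives every lock it declared or holds nothing while it waits. A minimal Python sketch follows; the dict-based lock table and function names are assumptions for illustration.

# Sketch of deadlock prevention by pre-acquisition: a transaction gets
# either all of its declared locks or none, so no waiting transaction
# ever holds a lock.

lock_table = {}  # data item -> transaction id holding it

def try_acquire_all(txn_id, items):
    if any(item in lock_table for item in items):
        return False          # some lock is unavailable: hold nothing
    for item in items:
        lock_table[item] = txn_id
    return True               # all locks granted at once

def release_all(txn_id):
    for item in [i for i, t in lock_table.items() if t == txn_id]:
        del lock_table[item]

print(try_acquire_all("T1", ["X", "Y"]))  # True: T1 holds X and Y
print(try_acquire_all("T2", ["Y", "Z"]))  # False: Y is taken, T2 holds nothing
release_all("T1")
print(try_acquire_all("T2", ["Y", "Z"]))  # True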
Deadlock Avoidance

 The deadlock avoidance approach handles deadlocks before they occur.
 It analyzes the transactions and the locks to determine whether or not waiting leads to a deadlock.
 Transactions start executing and request data items that they need to lock.
 The lock manager checks whether the lock is available.
 If it is available, the lock manager allocates the data item and the transaction acquires the lock.
 If the item is locked by some other transaction in an incompatible mode, the lock manager runs an algorithm to test whether keeping the transaction in the waiting state will cause a deadlock or not.
 There are two algorithms for this purpose (a decision sketch follows this list):
o wait-die
o wound-wait
 Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2.
o Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later restarted.
o Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is allowed to wait.
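Both schemes reduce to a single comparison of start timestamps, where a smaller timestamp means an older transaction. A minimal Python sketch of the two decisions (the returned labels are illustrative):

# Sketch: the wait-die and wound-wait decisions when requester T1 hits
# a lock held by T2. A smaller timestamp means an older transaction.

def wait_die(ts_t1, ts_t2):
    # The older requester waits; a younger requester dies (aborts, restarts).
    return "T1 waits" if ts_t1 < ts_t2 else "T1 aborted"

def wound_wait(ts_t1, ts_t2):
    # The older requester wounds (aborts) the holder; a younger one waits.
    return "T2 aborted" if ts_t1 < ts_t2 else "T1 waits"

print(wait_die(ts_t1=10, ts_t2=20))    # T1 waits   (T1 is older)
print(wait_die(ts_t1=30, ts_t2=20))    # T1 aborted (T1 is younger)
print(wound_wait(ts_t1=10, ts_t2=20))  # T2 aborted (T1 is older)
print(wound_wait(ts_t1=30, ts_t2=20))  # T1 waits   (T1 is younger)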
Deadlock Detection and Removal

 The deadlock detection and removal approach runs a deadlock detection algorithm periodically and removes a deadlock in case there is one.
 It does not check for deadlock when a transaction places a request for a lock. When a transaction requests a lock, the lock manager checks whether it is available.
 If it is available, the transaction is allowed to lock the data item; otherwise the transaction is allowed to wait.
 To detect deadlocks, the lock manager periodically checks if the wait-for-graph has cycles.
 If the system is deadlocked, the lock manager chooses a victim transaction from each cycle.
 The victim is aborted and rolled back, and then restarted later.
 Some of the methods used for victim selection are:
o Choose the youngest transaction.
o Choose the transaction with the fewest data items.
o Choose the transaction that has performed the least number of updates.
o Choose the transaction having the least restart overhead.
o Choose the transaction which is common to two or more cycles.
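Periodic detection amounts to searching the wait-for-graph for a cycle and then applying one of the victim-selection rules above. The Python sketch below stores the graph as an adjacency dict, finds one cycle by depth-first search, and picks the youngest transaction in it; this is a simplified illustration, not production-grade detection.

# Sketch: cycle detection over a wait-for-graph (T -> transactions T
# waits for), then choosing the youngest transaction in the cycle as
# the victim. All names are illustrative.

def find_cycle(wfg):
    def dfs(node, path, visited):
        visited.add(node)
        path.append(node)
        for nxt in wfg.get(node, []):
            if nxt in path:
                return path[path.index(nxt):]      # cycle found
            if nxt not in visited:
                cycle = dfs(nxt, path, visited)
                if cycle:
                    return cycle
        path.pop()
        return None

    visited = set()
    for node in wfg:
        if node not in visited:
            cycle = dfs(node, [], visited)
            if cycle:
                return cycle
    return None

# T1 waits for T3, T3 waits for T2, T2 waits for T1 (the earlier example).
wfg = {"T1": ["T3"], "T3": ["T2"], "T2": ["T1"]}
start_ts = {"T1": 5, "T2": 9, "T3": 7}           # larger timestamp = younger

cycle = find_cycle(wfg)
victim = max(cycle, key=lambda t: start_ts[t])   # choose the youngest
print(cycle, "-> abort", victim)                 # ['T1', 'T3', 'T2'] -> abort T2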
Deadlock Handling in Distributed Systems

 Transaction processing in a distributed database system is also distributed, i.e., the same transaction may be processing at more than one site.
 The two main deadlock handling concerns in a distributed database system that are not present in a centralized system are transaction location and transaction control.
 Once these concerns are addressed, deadlocks are handled through any of deadlock prevention, deadlock avoidance or deadlock detection and removal.

Transaction Location

 Transactions in a distributed database system are processed in multiple sites and use data items in multiple sites.
 The amount of data processing is not uniformly distributed among these sites. The time period of processing also varies. Thus, the same transaction may be active at some sites and inactive at others.
 When two conflicting transactions are located in a site, it may happen that one of them is in an inactive state.
 This concern is called the transaction location issue.
 This concern may be addressed by the Daisy Chain model. In this model, a transaction carries certain details when it moves from one site to another.
 Some of the details are the list of tables required, the list of sites required, the list of visited tables and sites, the list of tables and sites that are yet to be visited, and the list of acquired locks with types.
 After a transaction terminates by either commit or abort, the information should be sent to all the concerned sites.

Transaction Control

 Transaction control is concerned with designating and controlling the sites required for processing a transaction in a distributed database system.
 There are many options regarding the choice of where to process the transaction and how to designate the center of control, like:
o One server may be selected as the center of control.
o The center of control may travel from one server to another.
o The responsibility of controlling may be shared by a number of servers.

Distributed Deadlock Prevention

 A transaction should acquire all the locks before starting to execute. This prevents deadlocks.
 The site where the transaction enters is designated as the controlling site.
 The controlling site sends messages to the sites where the data items are located to lock the items. Then it waits for confirmation.
 When all the sites have confirmed that they have locked the data items, the transaction starts. If any site or communication link fails, the transaction has to wait until they have been repaired.
 This approach has some drawbacks −
o Pre-acquisition of locks requires a long time for communication delays. This increases the time required for a transaction.
o In case of site or link failure, a transaction has to wait for a long time for the sites to recover.
o Meanwhile, in the running sites, the items are locked. This may prevent other transactions from executing.
o If the controlling site fails, it cannot communicate with the other sites. These sites continue to keep the locked data items in their locked state, thus resulting in blocking.

Distributed Deadlock Avoidance

 In distributed systems, transaction location and transaction control issues need to be addressed. Due to the distributed nature of the transaction, the following conflicts may occur:
o Conflict between two transactions in the same site.
o Conflict between two transactions in different sites.
 In case of conflict, one of the transactions may be aborted or allowed to wait as per the distributed wait-die or distributed wound-wait algorithms.
 Let us assume that there are two transactions, T1 and T2. T1 arrives at Site P and tries to lock a data item which is already locked by T2 at that site. Hence, there is a conflict at Site P. The algorithms are as follows (a sketch of the distributed wait-die decision appears after this list):
o Distributed Wait-Die
 If T1 is older than T2, T1 is allowed to wait. T1 can resume execution after Site P receives a message that T2 has either committed or aborted successfully at all sites.
 If T1 is younger than T2, T1 is aborted. The concurrency control at Site P sends a message to all sites where T1 has visited to abort T1. The controlling site notifies the user when T1 has been successfully aborted at all the sites.
o Distributed Wound-Wait
 If T1 is older than T2, T2 needs to be aborted. If T2 is active at Site P, Site P aborts and rolls back T2 and then broadcasts this message to other relevant sites. If T2 has left Site P but is active at Site Q, Site Q aborts and rolls back T2 and sends this message to all sites.
 If T1 is younger than T2, T1 is allowed to wait. T1 can resume execution after Site P receives a message that T2 has completed processing.
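Combining the globally unique (counter, site ID) timestamps described earlier with the distributed wait-die rule gives a decision that is local to the site of conflict, followed by an abort broadcast to every site the younger transaction has visited. In the sketch below, message passing is simulated with a plain list; all names are illustrative.

# Sketch of distributed wait-die at the site of conflict: the comparison
# is local, but aborting the younger requester is broadcast to every
# site it has visited. Messages are simulated with a list.

def resolve_conflict(t1, t2, outbox):
    # Timestamps are (counter, site_id) pairs, so comparison is global.
    if t1["ts"] < t2["ts"]:
        return "wait"                       # T1 is older: it may wait
    for site in t1["visited_sites"]:        # T1 is younger: abort everywhere
        outbox.append((site, "abort", t1["id"]))
    return "aborted"

outbox = []
t1 = {"id": "T1", "ts": (7, 2), "visited_sites": ["P", "Q"]}
t2 = {"id": "T2", "ts": (3, 1), "visited_sites": ["P"]}
print(resolve_conflict(t1, t2, outbox))  # aborted: T1 is younger than T2
print(outbox)                            # [('P', 'abort', 'T1'), ('Q', 'abort', 'T1')]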
Distributed Deadlock Detection

 In this approach, deadlocks are allowed to occur and are removed if detected.
 The system does not perform any checks when a transaction places a lock request. For implementation, global wait-for-graphs are created. The existence of a cycle in the global wait-for-graph indicates deadlocks.
 However, it is difficult to spot deadlocks since transactions wait for resources across the network.
 In a distributed system, there can be more than one deadlock detector.
 A deadlock detector can find deadlocks for the sites under its control.
 There are three alternatives for deadlock detection in a distributed system, namely:
o Centralized Deadlock Detector − One site is designated as the central deadlock detector.
o Hierarchical Deadlock Detector − A number of deadlock detectors are arranged in a hierarchy.
o Distributed Deadlock Detector − All the sites participate in detecting deadlocks and removing them.
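A centralized detector can build the global wait-for-graph by taking the union of the edges reported by each site and then running an ordinary cycle check (such as the find_cycle sketch shown earlier). The example below also shows why local checks alone are insufficient: neither site sees a cycle, but the union contains one. The reporting format is an assumption of the sketch.

# Sketch of a centralized deadlock detector: each site reports its local
# wait-for edges, and the detector unions them into a global graph.

def merge_local_graphs(local_wfgs):
    global_wfg = {}
    for wfg in local_wfgs:                  # union of all sites' edges
        for txn, waits_for in wfg.items():
            global_wfg.setdefault(txn, set()).update(waits_for)
    return global_wfg

# Neither site sees a cycle locally, but the union contains T1 -> T2 -> T1.
site_p = {"T1": {"T2"}}                     # at P, T1 waits for T2
site_q = {"T2": {"T1"}}                     # at Q, T2 waits for T1
print(merge_local_graphs([site_p, site_q])) # {'T1': {'T2'}, 'T2': {'T1'}}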