DBMSPPTModule-4
Introduction
Schema Refinement:
• Schema refinement means improving a schema
by applying a systematic technique. The main
technique of schema refinement is decomposition.
• Normalization, or schema refinement, is a
technique for organizing the data in the database.
• It is a systematic approach of decomposing tables
to eliminate data redundancy.
Cont.
Data Redundancy:
Data redundancy occurs when the same data is held in more
than one place within a database.
Problems caused by redundancy:
Storing the same information redundantly, that is, in
more than one place within a database can lead to several
problems:
• Redundant Storage: Some information is stored
repeatedly.
• Update Anomalies: If one copy of such repeated data is
updated, an inconsistency is created unless all copies
are similarly updated.
Cont.
• Insertion Anomalies: It may not be possible
to store certain information unless some
other information is stored.
• Deletion Anomalies: It may not be possible
to delete certain information without losing
some other information.
SID  SName    Rating  Daily_Wage  Worked_Hours
1    Lucky    8       10          32
2    Sumanth  8       10          41
3    Harish   7       5           30
4    Krishna  7       5           25
5    Hemanth  8       10          35
Table: Instance of an Hourly_Emp Relation
Decomposition
• To avoid redundancy and the problems it
causes, we use a refinement technique
called Decomposition.
• It is the process of decomposing a larger
relation into smaller relations.
• Each of the smaller relations contains a subset of
the attributes of the original relation.
• If a relation is not decomposed properly,
it may lead to problems like loss
of information.
Cont.
• We can decompose Hourly_Emp relation into two
relations.
1. Hourly_Emp1 (SID, SName, Rating, Worked_hours)
2. Wage (Rating, Daily_Wage)
• Here we can easily record the daily wage for any rating
simply by adding a tuple to Wage, even if no
employee with that rating appears in the current
instance of Hourly_Emp1.
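The decomposition can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the variable names are made up for the sketch, and the extra wage tuple (rating 6, daily wage 4) is a hypothetical value used only to show that a wage can be recorded without a matching employee.

```python
# Hourly_Emp instance: (SID, SName, Rating, Daily_Wage, Worked_Hours)
hourly_emp = [
    (1, "Lucky", 8, 10, 32),
    (2, "Sumanth", 8, 10, 41),
    (3, "Harish", 7, 5, 30),
    (4, "Krishna", 7, 5, 25),
    (5, "Hemanth", 8, 10, 35),
]

# Project onto the two smaller schemas; sets drop duplicate tuples.
hourly_emp1 = {(sid, name, rating, hours)
               for sid, name, rating, _, hours in hourly_emp}
wage = {(rating, daily) for _, _, rating, daily, _ in hourly_emp}

# Natural join on Rating rebuilds the original tuples.
rejoined = {
    (sid, name, rating, daily, hours)
    for sid, name, rating, hours in hourly_emp1
    for w_rating, daily in wage
    if rating == w_rating
}
print(rejoined == set(hourly_emp))  # True

# A wage for a rating that no employee currently has can now be stored.
wage.add((6, 4))  # hypothetical rating and wage, for illustration only
```

Each (Rating, Daily_Wage) pair is now stored once in the wage set instead of once per employee.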
Cont.
Problems related to decomposition:
• Decomposition must be applied with care, because
decomposing a relational schema can create more
problems than it solves.
• Two important questions must be asked
repeatedly:
1. Do we need to decompose a relation?
2. What problems (if any) does a given decomposition
cause?
Functional Dependencies
Cont.
Reasoning about functional dependencies:
Types of functional dependencies:
1. Trivial functional dependency: If X → Y is a functional
dependency where Y is a subset of X, it is called a
trivial functional dependency.
2. Non-trivial functional dependency: If X → Y and Y is not
a subset of X, then it is called a non-trivial functional
dependency.
3. Completely non-trivial functional dependency: If X → Y
and X ∩ Y = ∅ (the empty set), then it is called a completely
non-trivial functional dependency.
Cont.
Armstrong Axioms/ Inference rules:
Armstrong's axioms are a set of rules for
reasoning about functional dependencies. They can be
used to infer all the functional dependencies that hold
on a relational database.
Inference rules:
Primary axioms:
1. Reflexivity Rule:
– If X is a set of attributes, then X → X holds.
– If X is a set of attributes and Y is a subset of X, then
X → Y holds.
Cont.
2. Augmentation Rule:
If X → Y holds and Z is a set of attributes, then XZ → YZ
holds.
3. Transitivity Rule:
If X → Y holds and Y → Z holds, then X → Z holds.
Secondary or Derived axioms:
1. Union Rule:
If X → Y holds and X → Z holds, then X → YZ holds.
2. Decomposition Rule:
If X → YZ holds, then X → Y and X → Z hold.
3. Pseudo Transitivity Rule:
If X → Y holds and YZ → W holds, then XZ → W holds.
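The derived rules need no new assumptions; each follows from the three primary axioms. As an illustration, the Union Rule can be derived in three steps (a sketch, writing concatenation such as XY for the union of the attribute sets):

```latex
\begin{align*}
X \to Y &\;\Rightarrow\; X \to XY
  && \text{augmentation with } X \text{ (note } XX = X\text{)} \\
X \to Z &\;\Rightarrow\; XY \to YZ
  && \text{augmentation with } Y \\
X \to XY,\; XY \to YZ &\;\Rightarrow\; X \to YZ
  && \text{transitivity}
\end{align*}
```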
Attribute Closure
• Attribute closure of an attribute set can be defined as
set of attributes which can be functionally determined
from it.
• By using attribute closure we can find the Super keys
and Candidate keys in a relation.
• The attribute closure of an attribute set X is denoted by
X+.
Note:
To find the attribute closure of an attribute set:
1. Add the elements of the attribute set to the result set.
2. Recursively add to the result set the elements that can be
functionally determined from the elements of the result set.
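The two steps can be sketched in Python as below. This is one possible implementation, not the only one: it uses a loop that repeats until nothing new is added instead of explicit recursion, and it assumes every attribute is a single character so that a string such as "CG" stands for the set {C, G}. The function name `closure` and the FD list are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs: iterable of attribute names, e.g. "AG"
    fds:   list of (lhs, rhs) pairs, each an iterable of attribute names
    """
    result = set(attrs)           # step 1: start with the attribute set itself
    changed = True
    while changed:                # step 2: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # An FD applies when its whole left-hand side is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs over R(A, B, C, G, H, I): A->B, A->C, CG->H, CG->I, B->H
fds = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", fds))))  # ABCGHI
```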
Cont.
Example 1: Consider the following list of FDs
and compute the closure for AG.
R (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
Sol:
ITERATION  CLOSURE  USING
1          AG
2          AGB      A → B
3          AGBC     A → C
4          ABCGH    B → H
5          ABCGHI   CG → I
Cont.
Example 2: Consider the following FDs and
calculate A+ with
R (A, B, C, D)
F = {A → B, B → C, AB → D}
Example 3: Compute the closures for the relational
schema
R (A, B, C, D, E)
F = {A → BC, CD → E, B → D, E → A} and
list the keys of R.
Cont.
Example 4: Let R (A, B, C, D, E) be a relational schema with functional
dependencies
F = {AB → C, DE → B, CD → E}.
Find the closure for
i. B
ii. E
iii. ACE and
iv. BD
Example 5: Compute the closures for the set of FDs
R (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H} and list the keys of R.
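Listing the keys amounts to finding the minimal attribute sets whose closure is the whole relation. A brute-force sketch in Python is given below. It assumes single-character attribute names, the helper names `closure` and `candidate_keys` are illustrative, and trying every subset is only practical for small relations like these exercises.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: keep applying FDs until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(relation)
    keys = []
    # Try smaller sets first, so every key found is minimal.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # A superset of a known key is a super key, not a candidate key.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# R(A, B, C, G, H, I) with A->B, A->C, CG->H, CG->I, B->H
fds = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
keys = candidate_keys("ABCGHI", fds)
print(["".join(sorted(k)) for k in keys])  # ['AG']
```

Any set that contains a candidate key is a super key, which is why supersets of already found keys are skipped.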
Normalization
• Normalization is the process of organizing the data
in the database.
• Normalization is used to minimize the redundancy
from a relation or set of relations.
• It is also used to eliminate the undesirable
characteristics like Insertion, Update and Deletion
Anomalies.
• Normalization divides a larger table into
smaller tables and links them using relationships.
• Normal forms are used to reduce redundancy in
database tables.
Cont.
Types of Normal Forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. Boyce-Codd Normal Form
5. Fourth Normal Form
6. Fifth Normal Form
Cont.
First Normal Form (1NF):
A relation in 1NF satisfies
the following conditions:
i. It contains only atomic values.
ii. There are no repeating groups.
Cont.
Example: Consider the following relation
SID  SName    SAddress   SMobile
101  Lucky    Tirupati   8812121212, 9900012222
102  Sumanth  Bangalore  8912312390
103  Hemanth  Hyderabad  7778881212
104  Gowtham  Tirupati   8881877752
105  Krishna  Bangalore  9990000123, 8123450987
This relation is not in 1NF because it has non-atomic
values.
Cont.
We rearrange the relation as below to
convert it to First Normal Form.
SID  SName    SAddress   SMobile
101  Lucky    Tirupati   8812121212
101  Lucky    Tirupati   9900012222
102  Sumanth  Bangalore  8912312390
103  Hemanth  Hyderabad  7778881212
104  Gowtham  Tirupati   8881877752
105  Krishna  Bangalore  9990000123
105  Krishna  Bangalore  8123450987
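The conversion to 1NF can be sketched in Python. In this illustration the non-atomic SMobile value is modelled as a list, which is an assumption about how the multi-valued column might be represented, and the variable names are made up for the sketch.

```python
# Un-normalized rows: SMobile holds a list, so the value is not atomic.
students = [
    (101, "Lucky", "Tirupati", ["8812121212", "9900012222"]),
    (102, "Sumanth", "Bangalore", ["8912312390"]),
    (103, "Hemanth", "Hyderabad", ["7778881212"]),
    (104, "Gowtham", "Tirupati", ["8881877752"]),
    (105, "Krishna", "Bangalore", ["9990000123", "8123450987"]),
]

# 1NF: emit one row per (student, mobile number) pair.
students_1nf = [
    (sid, name, address, mobile)
    for sid, name, address, mobiles in students
    for mobile in mobiles
]

for row in students_1nf:
    print(row)
```

The five original rows become seven, each holding exactly one mobile number.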
Cont.
Second Normal Form (2NF):
A relation in 2NF satisfies
the following conditions:
i. The relation must be in 1NF.
ii. All non-key attributes are fully functionally
dependent on the primary key.
[Figure: Student and Project tables; columns shown: Project_ID, Project_Name]
Cont.
Third Normal Form (3NF):
A relation in 3NF satisfies
the following conditions:
i. The relation must be in 2NF.
ii. Any transitive functional dependency of a non-prime
attribute on a super key should be
removed.
Cont.
Example: Consider the following relation
[Figure: relation with Zip and City columns; ZipCode table]
Cont.
Boyce-Codd Normal Form (BCNF):
• BCNF is the advanced version of 3NF. It is
stricter than 3NF.
• A relation is in BCNF if, for every non-trivial functional
dependency X → Y, X is a super key of the
table.
• For BCNF, the table should be in 3NF, and for
every non-trivial FD, the LHS is a super key.
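The BCNF condition can be checked mechanically with attribute closure: the left-hand side of an FD is a super key exactly when its closure contains every attribute. The sketch below assumes single-character attribute names. The relation R(A, B, C) and its FDs are hypothetical, and `bcnf_violations` is an illustrative name.

```python
def closure(attrs, fds):
    """Attribute closure: keep applying FDs until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    all_attrs = set(relation)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, never a violation
        if closure(lhs, fds) != all_attrs:
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation R(A, B, C) with A -> B and B -> C
print(bcnf_violations("ABC", [("A", "B"), ("B", "C")]))  # [('B', 'C')]
```

Here A is a super key, but B is not, so B → C violates BCNF.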
Cont.
Example: Consider the following relation
[Figure: student relation with columns Std_ID, Std_Dept, Std_College, Std_Section, Std_Deptno]
Student_Section (Std_Dept, Std_Section, Std_Deptno)
Student_Dept (Std_ID, Std_Dept)
Properties of Decomposition
• Decomposition is the process of breaking a relation down
into parts.
• It replaces a relation with a collection of smaller
relations.
• It breaks the table into multiple tables in a database.
• It should always be lossless, so that
the information in the original relation can be
accurately reconstructed from the decomposed
relations.
• If there is no proper decomposition of the relation,
then it may lead to problems like loss of information.
Cont.
Following are the properties of Decomposition:
1. Lossless Decomposition
2. Dependency Preservation
Lossless Decomposition:
• If no information is lost from the relation
that is decomposed, then the decomposition is
lossless.
• A decomposition is said to be lossless
if the natural join of all the decomposed relations gives the
original relation.
Cont.
Example: <Employee_Department> Table
Eid   Ename    Age  City       Salary  Deptid  DeptName
E001  Lucky    21   Tirupati   20000   D001    Finance
E002  Sumanth  32   Tirupati   30000   D002    Production
E003  Hemanth  25   Hyderabad  5000    D003    Sales
E004  Gowtham  24   Hyderabad  4000    D004    Marketing
E005  Krishna  25   Bangalore  25000   D005    HR
Decompose the above relation into two relations
to check whether a decomposition is lossless or lossy.
Cont.
Relation 1: <Employee> Table
Eid   Ename    Age  City       Salary
E001  Lucky    21   Tirupati   20000
E002  Sumanth  32   Tirupati   30000
E003  Hemanth  25   Hyderabad  5000
E004  Gowtham  24   Hyderabad  4000
E005  Krishna  25   Bangalore  25000
Relation 2: <Department> Table
Deptid  Eid   DeptName
D001    E001  Finance
D002    E002  Production
D003    E003  Sales
D004    E004  Marketing
D005    E005  HR
Cont.
• The above decomposition is a Lossless Join
Decomposition: the two relations share the
common field 'Eid', which is a key of Employee, so the
join neither loses nor adds tuples.
• Now apply natural join on the decomposed relations.
Employee ⋈ Department:
Eid   Ename    Age  City       Salary  Deptid  DeptName
E001  Lucky    21   Tirupati   20000   D001    Finance
E002  Sumanth  32   Tirupati   30000   D002    Production
E003  Hemanth  25   Hyderabad  5000    D003    Sales
E004  Gowtham  24   Hyderabad  4000    D004    Marketing
E005  Krishna  25   Bangalore  25000   D005    HR
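The lossless check can be reproduced in a few lines of Python: project the original relation onto the two schemas, join the projections on Eid, and compare the result with the original. This is a minimal sketch with illustrative variable names, using Python sets of tuples to stand in for relations.

```python
# Original rows: (Eid, Ename, Age, City, Salary, Deptid, DeptName)
employee_department = [
    ("E001", "Lucky", 21, "Tirupati", 20000, "D001", "Finance"),
    ("E002", "Sumanth", 32, "Tirupati", 30000, "D002", "Production"),
    ("E003", "Hemanth", 25, "Hyderabad", 5000, "D003", "Sales"),
    ("E004", "Gowtham", 24, "Hyderabad", 4000, "D004", "Marketing"),
    ("E005", "Krishna", 25, "Bangalore", 25000, "D005", "HR"),
]

# Decompose by projection.
employee = {(eid, name, age, city, sal)
            for eid, name, age, city, sal, _, _ in employee_department}
department = {(did, eid, dname)
              for eid, _, _, _, _, did, dname in employee_department}

# Natural join on the common attribute Eid.
joined = {
    (eid, name, age, city, sal, did, dname)
    for eid, name, age, city, sal in employee
    for did, d_eid, dname in department
    if eid == d_eid
}

# Lossless: the join gives back exactly the original tuples.
print(joined == set(employee_department))  # True
```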
Cont.
Dependency Preservation:
• Dependency is an important constraint on
the database.
• Every dependency must be satisfied by at
least one decomposed table.
• If a relation R is decomposed into relation R1
and R2, then the dependencies of R either
must be a part of R1 or R2 or must be
derivable from the combination of functional
dependencies of R1 and R2.
Cont.
Example:
• Suppose there is a relation R (A, B, C, D) with
functional dependency set {A → BC}.
• The relation R is decomposed into R1(A, B, C)
and R2(A, D), which is dependency
preserving because the FD A → BC is a part of
relation R1(A, B, C).
Cont.
Multivalued Dependency:
• A multivalued dependency occurs when two
attributes in a table are independent of
each other but both depend on a third
attribute.
• A multivalued dependency involves at
least two attributes that depend on
a third attribute, so it always
requires at least three attributes.
Cont.
Example:
Suppose there is a bike manufacturing company that
produces two colors (white and black) of each model every
year.
BIKE_MODEL  MANUF_YEAR  COLOR
M2011       2008        White
M2001       2008        Black
M3001       2013        White
M3001       2013        Black
M4006       2017        White
M4006       2017        Black
Cont.
Example: <Student> Table
The given STUDENT table is in 3NF, but COURSE and HOBBY
are two independent entities. Hence, there is no relationship between
COURSE and HOBBY.
Cont.
So to bring the above table into 4NF, we can
decompose it into two tables:
Relation 1: <Std_Course> Table
STU_ID  COURSE
21      ORACLE
21      PYTHON
34      C
74      JAVA
59      C++
Relation 2: <Std_Hobby> Table
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
Cont.
Fifth Normal Form (5NF):
• A relation in 5NF satisfies the
following conditions:
i. The relation must be in 4NF.
ii. The relation should not contain any join dependency,
and joining should be lossless.
• 5NF is satisfied when all the tables are broken
into as many tables as possible in order to avoid
redundancy.
• 5NF is also known as Project-join normal form
(PJ/NF).
Transactions in DBMS
• A transaction is a set of operations used to
perform a logical unit of work.
• A transaction usually means that the data in the
database has changed.
• One of the major uses of a DBMS is to protect the
user’s data from system failures.
• It does so by ensuring that all the data is restored
to a consistent state when the computer is
restarted after a crash.
Cont.
• A transaction is any one execution of a user program
in a DBMS.
• Executing the same program multiple times will generate
multiple transactions.
Example:
A transaction performed to withdraw cash from an
ATM.
Set of Operations:
The following example lists the operations of such a
transaction.
Cont.
Example: ATM transaction steps
• Transaction Start.
• Insert your ATM card.
• Select language for your transaction.
• Select Savings Account option.
• Enter the amount you want to withdraw.
• Enter your secret pin.
• Wait for some time for processing.
• Collect your Cash.
• Transaction Completed.
Cont.
Example:
Transfer of 50 from Account A to Account B. Initially
A=500, B=800. This data is brought to RAM from Hard
Disk.
R(A) -- 500 // Accessed from RAM.
A = A-50 // Deducting 50 from A.
W(A)--450 // Updated in RAM.
R(B) -- 800 // Accessed from RAM.
B=B+50 // 50 is added to B's Account.
W(B) --850 // Updated in RAM.
commit // The data in RAM is taken back to Hard Disk.
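The same read, modify, write, commit sequence can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the table name `account`, the helper functions `read` and `write`, and the use of an in-memory SQLite database are choices made for the illustration, not part of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 800)])
conn.commit()


def read(name):
    """R(X): fetch the current balance of account X."""
    row = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def write(name, balance):
    """W(X): store a new balance for account X (not yet committed)."""
    conn.execute("UPDATE account SET balance = ? WHERE name = ?", (balance, name))


a = read("A")      # R(A) -- 500
a = a - 50         # deduct 50 from A
write("A", a)      # W(A) -- 450
b = read("B")      # R(B) -- 800
b = b + 50         # add 50 to B
write("B", b)      # W(B) -- 850
conn.commit()      # make both updates permanent together

print(read("A"), read("B"))  # 450 850
```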
Cont.
Properties of Transaction:
• The transaction properties can be termed as ACID
properties.
• ACID Properties are used for maintaining the
integrity of database during transaction processing.
• ACID in DBMS stands for
– Atomicity
– Consistency
– Isolation
– Durability.
Cont.
Atomicity:
• It states that either all operations of the transaction take
place, or the transaction is aborted.
• There is no midway, i.e., the transaction cannot occur
partially. Each transaction is treated as one unit and
either runs to completion or is not executed at all.
Atomicity involves the following two operations:
• Abort: If a transaction aborts, then none of the changes
it made are visible.
• Commit: If a transaction commits, then all the changes
it made are visible.
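Abort and commit can be observed with Python's built-in sqlite3 module. The sketch below is an illustration under assumptions: the opening balances (A = 600, B = 300), the table name `account`, the `transfer` helper, and the simulated crash are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])
conn.commit()


def transfer(amount, fail_midway=False):
    """Move amount from A to B; undo everything if any step fails."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
        )
        if fail_midway:
            raise RuntimeError("simulated crash after debiting A")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
        )
        conn.commit()      # Commit: all changes become visible
    except RuntimeError:
        conn.rollback()    # Abort: none of the changes remain


def balances():
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


transfer(100, fail_midway=True)
print(balances())  # {'A': 600, 'B': 300}
transfer(100)
print(balances())  # {'A': 500, 'B': 400}
```

The failed transfer leaves no trace of the debit on A, which is exactly the "no midway" behaviour described above.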
Cont.
Example:
Let's assume that the following transaction T consists of
T1 and T2. Account A holds Rs 600 and account B holds Rs 300.
Transfer Rs 100 from account A to account B.
T1              T2
Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)
Cont.
Active state:
• The active state is the first state of every
transaction. In this state, the transaction is being
executed.
• For example, insertion, deletion, or updating of a
record is done here, but the records are not yet
saved to the database.
Partially committed:
In the partially committed state, a transaction
has executed its final operation, but the data is still not
saved to the database.
Cont.
Committed:
• A transaction is said to be in a committed state
if it executes all its operations successfully.
• In this state, all the effects are now
permanently saved on the database system.
Failed state:
If any of the checks made by the database
recovery system fails, then the transaction is said
to be in the failed state.
Cont.
Aborted:
• If any of the checks fail and the transaction has
reached a failed state, then the database recovery
system makes sure that the database is in its
previous consistent state.
• To do so, it aborts or rolls back the transaction
to bring the database into a consistent state.
• If the transaction fails in the middle of its
execution, all the operations it has already
executed are rolled back, returning the database to the
consistent state it had before the transaction started.
Cont.
• After aborting the transaction, the database
recovery module will select one of the two
operations:
1. Re-start the transaction
2. Kill the transaction
Concurrent Execution
Schedule:
• A schedule is a series of operations from one or more
transactions.
• A schedule can be of two types:
1. Serial Schedule
2. Concurrent Schedule
Serial Schedule:
• When one transaction completely executes before
starting another transaction, the schedule is called
serial schedule.
• A serial schedule is always consistent.
Cont.
Example:
If a schedule S has debit transaction T1 and
credit transaction T2, possible serial schedules are T1
followed by T2 (T1->T2) or T2 followed by T1 (T2->T1).
Concurrent Schedule:
• When operations of a transaction are interleaved
with operations of other transactions of a schedule,
the schedule is called Concurrent schedule.
• But concurrency can lead to inconsistency in the
database.
Cont.
Advantages of concurrent execution of
transactions:
1. Decreased waiting time or turnaround time.
2. Improved response time.
3. Increased throughput or resource utilization.
Problems with Concurrent Execution:
1. Lost update problem (W-W conflict).
2. Dirty read problem (W-R conflict).
3. Unrepeatable read (R-W conflict).
Cont.
Lost update problem (W-W conflict):
• The problem occurs when two different database
transactions perform the read/write operations on
the same database items in an interleaved manner
(i.e., concurrent execution) that makes the values
of the items incorrect hence making the database
inconsistent.
• If two transactions T1 and T2 read
the same data item and then both update it, then
the second write overwrites the first.
Cont.
Example: Let the initial value of A be 100.
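One possible interleaving that loses an update, starting from A = 100, can be sketched in Python as follows. The amounts 50 and 20 and the exact order of steps are assumptions chosen for illustration. Each transaction keeps a local copy of A between its Read and its Write.

```python
db = {"A": 100}          # shared data item with initial value 100

t1_local = db["A"]       # T1: Read(A)  -> 100
t2_local = db["A"]       # T2: Read(A)  -> 100 (T1 has not written yet)
t1_local -= 50           # T1: A = A - 50 (hypothetical amount)
t2_local += 20           # T2: A = A + 20 (hypothetical amount)
db["A"] = t1_local       # T1: Write(A) -> 50
db["A"] = t2_local       # T2: Write(A) -> 120, overwriting T1's update

print(db["A"])           # 120
```

Either serial order (T1 then T2, or T2 then T1) would leave A at 70, so the update made by T1 has been lost.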
Cont.
Dirty read problem (W-R conflict):
This type of problem occurs when one
transaction T1 updates a data item of the
database and then fails for
some reason, but its uncommitted update has already
been read by some other transaction.
Cont.
Example: Let the initial value of A be 100.
Time  Transaction T1  Transaction T2
t1    Read(A)
t2    A=A+20
t3    Write(A)
t4                    Read(A)
t5                    A=A+30
t6                    Write(A)
t7    Write(B)
Cont.
Unrepeatable read (R-W Conflict):
• It is also known as the inconsistent retrieval
problem.
• It occurs when a transaction T1 reads the value of a data
item twice and the data item is changed by
another transaction T2 in between the two
read operations.
• Hence T1 sees two different values for its
two read operations on the same data item.
Cont.
Example: Let the initial value of A be 100.
Time  Transaction T1  Transaction T2
t1    Read(A)
t2                    Read(A)
t3                    A=A+30
t4                    Write(A)
t5    Read(A)
Serializability
Serializability:
• When multiple transactions are running
concurrently then there is a possibility that
the database may be left in an inconsistent
state.
• Serializability is a concept that helps to
identify which non-serial schedules are
correct and will maintain the consistency of
the database.
Cont.
Serializable Schedules:
If a given non-serial schedule of ‘n’
transactions is equivalent to some serial
schedule of ‘n’ transactions, then it is called as
a serializable schedule.
Types of Serializability:
1. Conflict Serializability
2. View Serializability
Cont.
Conflict Serializability:
If a given non-serial schedule can be converted
into a serial schedule by swapping its non-conflicting
operations, then it is called a conflict serializable
schedule.
Conflicting Operations:
Two operations are said to be conflicting if all the
following conditions are satisfied:
• They belong to different transactions
• They operate on the same data item
• At least one of them is a write operation
Cont.
Example:
• (R1(A), W2(A)) is a conflicting pair because
the operations belong to two different transactions, act on the same
data item A, and one of them is a write operation.
• Similarly, the pairs (W1(A), W2(A)) and (W1(A), R2(A))
are also conflicting.
• On the other hand, the pair (R1(A), W2(B)) is
non-conflicting because the operations act on different data
items.
• Similarly, the pair (W1(A), W2(B)) is non-conflicting.
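The three conditions translate directly into a small predicate. The sketch below is illustrative: the `Op` record and the `conflicts` function are names invented for the example, with R1(A) written as `Op("R", 1, "A")`.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str   # "R" for read, "W" for write
    txn: int    # transaction number
    item: str   # data item


def conflicts(a: Op, b: Op) -> bool:
    """True when the two operations satisfy all three conflict conditions."""
    return (
        a.txn != b.txn                  # they belong to different transactions
        and a.item == b.item            # they operate on the same data item
        and "W" in (a.kind, b.kind)     # at least one of them is a write
    )


print(conflicts(Op("R", 1, "A"), Op("W", 2, "A")))  # True
print(conflicts(Op("W", 1, "A"), Op("W", 2, "A")))  # True
print(conflicts(Op("R", 1, "A"), Op("W", 2, "B")))  # False
print(conflicts(Op("R", 1, "A"), Op("R", 2, "A")))  # False
```

The last call shows that two reads of the same item never conflict.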
Cont.
Example:
• Swapping is possible only if S1 and S2 are logically
equal.
Since, S1 is conflict
Cont.
View Serializability:
A schedule is called view serializable if it is
view equivalent to a serial schedule (no overlapping
transactions).
View Equivalent:
Two schedules S1 and S2 are said to be view
equivalent if they satisfy the following conditions:
1. Initial Read
2. Updated Read
3. Final Write
Cont.
Initial Read:
The initial read of both schedules must be the same.
Suppose there are two schedules S1 and S2. If in schedule S1 a
transaction T1 reads the initial value of data item A, then in S2
transaction T1 should also read the initial value of A.
Cont.
Updated Read:
In schedule S1, if Ti is reading A which is updated by Tj
then in S2 also, Ti should read A which is updated by Tj.
Cont.
Final Write:
The final write must be the same in both
schedules. If in schedule S1 transaction T1 performs the last write on A,
then in S2 the final write on A should also be done by T1.
Recoverability
Recoverable Schedule:
• Schedules in which transactions commit
only after all transactions whose changes
they read commit are called recoverable
schedules.
• In other words, if some transaction Tj
reads a value updated or written by some
other transaction Ti, then the commit of Tj
must occur after the commit of Ti.
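The rule can be turned into a small checker. The sketch below rests on assumptions: a schedule is written as a list of (transaction, action, item) triples with actions "R", "W" and "C" for commit, the schedule contains no aborts, and both sample schedules are hypothetical. The function name `is_recoverable` is illustrative.

```python
def is_recoverable(schedule):
    """schedule: list of (txn, action, item); action is "R", "W" or "C".

    Returns True when every transaction commits only after all the
    transactions it read from have committed. Aborts are not modelled.
    """
    last_writer = {}     # item -> transaction that wrote it most recently
    reads_from = {}      # txn -> set of transactions whose writes it read
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "C":
            # Every transaction this one read from must already be committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True


# Hypothetical schedules in which T2 reads A after T1 has written it.
ok = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
bad = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(is_recoverable(ok), is_recoverable(bad))  # True False
```

In the second schedule T2 commits before T1, so if T1 later failed, the value T2 had read and committed could no longer be undone.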
Cont.
Example:
Consider the following schedule involving two
transactions T1 and T2.
T1        T2
R(A)
W(A)
          W(A)
          R(A)
commit
          commit
Cont.
Recoverable with Cascading Rollback:
• The table in the next slide shows a schedule with two
transactions: T1 reads and writes A, and that value is
read and written by T2. But later on, T1 fails, so we
have to roll back T1.
• Since T2 has read the value written by T1, it should also
be rolled back. As it has not committed, we can roll back
T2 as well. So the schedule is recoverable with cascading rollback.
• “If Tj is reading a value updated by Ti and the commit of Tj is
delayed till the commit of Ti, the schedule is called
recoverable with cascading rollback”.
Cont.
Example:
Consider the following schedule involving two
transactions T1 and T2.
Cont.
Cascadeless Recoverable Schedule:
• The table in the next slide shows a schedule with
two transactions: T1 reads and writes A and
commits, and only then is that value read by T2.
• If T1 fails before it commits, no other transaction
has read its value, so there is no need to roll back
any other transaction. So this is a cascadeless
recoverable schedule.
• “If Tj reads a value updated by Ti only after Ti is
committed, the schedule will be cascadeless
recoverable”.
Cont.
Example:
Consider the following schedule involving two
transactions T1 and T2.
Cont.
Transaction Isolation Levels :
Based on the concurrent execution
issues the SQL standard defines four isolation
levels:
1. Read Uncommitted
2. Read Committed
3. Repeatable Read
4. Serializable
Testing for Serializability
• The serializability of a schedule is tested
using a serialization graph.
• Given a schedule S, we can construct its
directed graph, called the precedence graph.
• A graph G is a pair G = (V, E), where V is a set
of vertices and E is a set of edges.
• The set of vertices consists of all the
transactions participating in the schedule.
Cont.
• Set of edges consists of all the edges Ti to Tj
for which one of the following three
conditions hold:
i. Ti executes W(A) before Tj executes R(A)
ii. Ti executes R(A) before Tj executes W(A)
iii. Ti executes W(A) before Tj executes W(A)
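The construction can be sketched in Python. A schedule is conflict serializable exactly when its precedence graph has no cycle, so the sketch builds the edges from the three conditions and then searches for a cycle. The schedule format (a list of (transaction, action, item) triples), the function names, and the sample schedule are assumptions made for illustration.

```python
def precedence_graph(schedule):
    """schedule: list of (txn, action, item) with action "R" or "W".

    Returns the set of edges (Ti, Tj) where an operation of Ti comes
    before a conflicting operation of Tj.
    """
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # W-R, R-W and W-W on the same item, from different transactions.
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True              # back edge: a cycle exists
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)


# Hypothetical schedule: R1(A) W2(A) W1(A)
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
edges = precedence_graph(s)
print(sorted(edges))      # [('T1', 'T2'), ('T2', 'T1')]
print(has_cycle(edges))   # True
```

The graph of this schedule contains the cycle T1 → T2 → T1, so the schedule is not conflict serializable.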