Module - 4 Notes - 06-11-2021
Schema Refinement: The database schema of a database system is its structure described in a formal
language supported by the DBMS.
Schema: It is a blueprint of how the database is constructed. Integrity constraints (ICs) imposed on a
database can be used to refine the conceptual schema that is produced by translating an ER model
design into a collection of relations.
An important class of constraints, functional dependencies, and other kinds of ICs such as
multivalued dependencies and join dependencies provide useful information for this. Schema refinement
is intended to address the problems caused by redundancy, using an approach based on decomposition. Although
decomposition can eliminate redundancy, it can lead to problems of its own.
Decomposition:
Decomposition in DBMS removes redundancy, anomalies and inconsistencies from a database by
dividing the table into multiple tables.
Problems caused by redundancy: Redundancy means storing the same information
more than once in a database. Duplicate copies can help recover lost information after a disk
failure, but they are not needed in normal operation, and redundancy can lead to several
problems.
1. Redundant storage: The same information is stored repeatedly.
2. Update anomalies: If one copy of such repeated data is updated, an inconsistency is
created unless all the copies are updated in the same way.
3. Insertion anomalies: It may not be possible to insert certain information unless some
other information is stored along with it.
4. Deletion anomalies: It may not be possible to delete certain information without
losing some other useful information along with it.
Ex: Consider a relation obtained by translating a variant of the Hourly_employees entity set:
Hourly_employees(ssn,name,lot,rating,hourly_wages,hourly_worked)
Here, we omit the attribute type information, because we focus on the grouping of attributes into
relations. We denote each attribute by a single letter and refer to a relation schema by a
string of letters.
S N L R W H
ssn name lot rating hourly_wages hourly_worked
1 Alice 48 8 10 40
2 Bob 22 8 10 30
3 Smith 35 5 7 30
4 Ram 35 5 7 32
5 Rahul 35 8 10 40
In this table, whenever the same value appears in the rating column of two tuples, the IC tells us
that the same value must appear in the hourly_wages column as well. This IC is an example of a functional
dependency (rating determines hourly_wages), and it leads to redundancy in the relation Hourly_employees.
Redundant storage: The rating value 8 corresponds to the hourly_wages value 10, and this association
is repeated 3 times.
Insertion anomalies: We cannot insert a tuple for an employee unless we know the hourly
wages for the employee’s rating value.
Deletion anomalies: If we delete all tuples with a given rating value, we lose the association
between that rating value and its hourly_wages value.
Update anomalies: The hourly_wages in the first tuple could be updated without making a
similar change in the other tuples with rating 8, which creates an inconsistency unless the remaining copies are updated too.
Ideally, a schema should not permit redundancy; a design that allows it must accept these
drawbacks.
Null values:
Null values can address some of these problems in some cases, but they do not provide
a complete solution.
Example: Consider the above relation. Null values cannot help eliminate redundant storage or
update anomalies.
1. Insertion anomalies: We can insert an employee tuple with a null value in the hourly_wages field.
However, we still cannot record the hourly_wages for a rating unless there is an employee with that
rating, because ssn is the primary key and cannot be null.
2. Deletion anomalies: Consider storing a tuple with null values in all fields except rating
and hourly_wages. This solution doesn’t work because it requires the ssn value to be null,
and a primary key cannot be null.
Null values do not provide a general solution to the problems of redundancy, but they help in
some cases.
Decomposition:
Redundancy arises when a relation schema forces an association between attributes. Functional
dependencies can be used to identify such situations and suggest refinements to the schema.
Refinement means redefining the schema. Decomposition is the technique used in a DBMS to
eliminate the problems caused by redundancy.
A decomposition of a relation schema R consists of two or more relation schemas that each
contain a subset of the attributes of R and together include all the attributes of R.
Example: decomposing the hourly_employees relation into two sub relations,
Relation 1:
Hourly_employees2( ssn, name, lot, rating, hourly_worked)
Relation 2:
Wages( rating, hourly_wages)
Note: Now we can easily record the hourly_wages for any rating by adding a tuple to Wages.
Changing the wage for a rating updates one Wages tuple instead of several employee tuples, which also eliminates the inconsistency.
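A minimal sketch of this decomposition in Python, assuming relations are modelled as lists of dicts. The helper names project and natural_join are illustrative and not part of the notes. It splits the instance shown earlier into the two relations and joins them back on rating.

```python
# Hourly_employees instance from the table in these notes.
COLS = ("ssn", "name", "lot", "rating", "hourly_wages", "hourly_worked")
ROWS = [
    (1, "Alice", 48, 8, 10, 40),
    (2, "Bob", 22, 8, 10, 30),
    (3, "Smith", 35, 5, 7, 30),
    (4, "Ram", 35, 5, 7, 32),
    (5, "Rahul", 35, 8, 10, 40),
]
hourly_employees = [dict(zip(COLS, r)) for r in ROWS]


def project(relation, attrs):
    """Project onto attrs and drop duplicate tuples (set semantics)."""
    seen = {tuple(t[a] for a in attrs) for t in relation}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(r1, r2):
    """Join two relations on every attribute name they share."""
    out = []
    for t1 in r1:
        for t2 in r2:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})
    return out


hourly_employees2 = project(
    hourly_employees, ("ssn", "name", "lot", "rating", "hourly_worked")
)
wages = project(hourly_employees, ("rating", "hourly_wages"))

# The rating -> hourly_wages association is now stored once per rating.
print(wages)

# Joining the two parts back on rating reproduces the original instance.
rejoined = natural_join(hourly_employees2, wages)
same = sorted(tuple(t[c] for c in COLS) for t in rejoined) == sorted(ROWS)
print(same)
```

Wages holds one tuple per rating, so a wage change touches a single row, and the join on rating gives back exactly the five original tuples.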
Problems related to decomposition:
Decomposing a relation schema can create problems of its own. Two important questions must be
asked repeatedly.
1. Do we need to decompose a relation?
2. What problems does a given decomposition cause?
If we decide that a relation schema must be decomposed further, we must choose a particular
decomposition (i.e., a particular collection of smaller relations to replace the given relation).
2) What problems does a given decomposition cause?
Properties of decomposition:
a) Lossy decomposition
b) Lossless-join property
c) Dependency-preservation property.
After Decomposition
Supervisor id Department id
S1 D1
S2 D2
S3 D3
Project id Department id
P1 D1
P2 D2
P1 D3
Lossy Decomposition:
If R is a proper subset of (R1 JOIN R2), the decomposition is lossy: the join contains extra (spurious) tuples that are not in R.
Main Table:
SSN NAME ADDRESS
1 JOE CHICAGO
2 BOB UK
3 BOB US
R2 = (NAME, ADDRESS)
NAME ADDRESS
JOE CHICAGO
BOB UK
BOB US
After decomposing the relation and performing the join operation, the information lost in this
example is the address for persons 2 and 3. In the original relation R, person 2 lives in the UK and
person 3 in the US. After joining the tables R1 and R2 on NAME (R1 is not shown above; it must be
(SSN, NAME), the remaining attribute plus the join attribute), person 2 lives in either the UK or the US, and so does person 3.
This is how extra tuples can result in a lossy
decomposition. No records were lost, but we lost the information about which records were
in the original table.
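The spurious tuples can be reproduced with a short Python sketch. Only R2 is shown in these notes, so R1 = (SSN, NAME) is an assumption here, and relations are modelled as sets of tuples.

```python
# Main table R(SSN, NAME, ADDRESS) from the example above.
R = {(1, "JOE", "CHICAGO"), (2, "BOB", "UK"), (3, "BOB", "US")}

# Assumption: R1 = (SSN, NAME). R2 = (NAME, ADDRESS) as given in the notes.
R1 = {(ssn, name) for ssn, name, _ in R}
R2 = {(name, address) for _, name, address in R}

# Natural join of R1 and R2 on NAME.
joined = {
    (ssn, n1, address)
    for ssn, n1 in R1
    for n2, address in R2
    if n1 == n2
}

# Tuples produced by the join that were never in R.
spurious = joined - R
print(sorted(spurious))

# R is a proper subset of the join, so the decomposition is lossy.
print(R < joined)
```

The join returns five tuples instead of three. The two extra ones pair SSN 2 with US and SSN 3 with UK, which is exactly the ambiguity described above.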
Dependency Preservation:
It allows us to check integrity constraints efficiently, without joining the decomposed relations.
Let R be a relation schema that is decomposed into 2 schemas with attributes X and Y, and let F
be a set of FDs over R.
A decomposition D = {R1, R2, R3, ..., Rn} of R is dependency preserving with respect to a set F
of functional dependencies if (F1 U F2 U ... U Fn)+ = F+, where Fi is the set of FDs in F+ that involve only attributes of Ri.
Consider a relation R with a set of FDs F.
If R is decomposed into R1 with FDs F1 and R2 with FDs F2, there are 2 cases:
1. (F1 U F2)+ = F+: the decomposition is dependency preserving.
2. (F1 U F2)+ is a proper subset of F+: the decomposition is not dependency preserving.
Example:
Problem: Let a relation R(A,B,C,D) and functional dependency { AB—> C , C—>D, D—>A}.
Relation R is decomposed into R1 (A ,B,C), R2(C,D) check whether decomposition is
dependency preserving or not.
Solution:
R1(A,B,C) and R2(C,D)
To find F1, compute the closure under F of every combination of the attributes of R1, i.e.,
find the closures of A, B, C, AB, BC and AC, and keep only the attributes that belong to R1.
Closure(A) = {A} // trivial
Closure(B) = {B} // trivial
Closure(C) = {C, D, A} // D is not in R1, so it is dropped
= {C, A}
C->A // C is removed from the right side because it is trivial
Closure(AB) = {A, B, C, D} // D is dropped
= {A, B, C}
AB->C // A and B are removed from the right side because they are trivial
Closure(BC) = {B, C, D, A} // D is dropped
= {B, C, A}
BC->A // B and C are removed from the right side because they are trivial
Closure(AC) = {A, C, D} // D is dropped, and only trivial attributes remain
= {A, C}
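F1 = {C->A, AB->C, BC->A}
Similarly, F2 = {C->D}
AB->C and C->D appear in F1 U F2, but D->A does not follow from F1 U F2: the closure of D
under F1 U F2 is just {D}. Hence the decomposition is not dependency preserving.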
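The same check can be sketched in Python for the problem above. The function names closure, project_fds and fd are illustrative. The projection enumerates every subset of a schema, so this brute-force version is only suitable for small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(fds, schema):
    """For every nonempty X inside schema, keep X -> (X+ within schema) - X."""
    projected = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            rhs = (closure(x, fds) & schema) - x
            if rhs:
                projected.append((x, frozenset(rhs)))
    return projected


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


F = [fd("AB", "C"), fd("C", "D"), fd("D", "A")]
R1, R2 = frozenset("ABC"), frozenset("CD")
union = project_fds(F, R1) + project_fds(F, R2)

# X -> Y is preserved if Y lies inside the closure of X under F1 U F2.
lost = [(lhs, rhs) for lhs, rhs in F if not rhs <= closure(lhs, union)]
for lhs, rhs in lost:
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), "is not preserved")
```

For this input the only dependency reported as lost is D -> A, so the decomposition into R1(A, B, C) and R2(C, D) is not dependency preserving.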
Functional Dependencies:
An FD is a kind of integrity constraint that generalises the concept of a key. Let R be a relation
schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R satisfies
the FD X->Y if the following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X then t1.Y = t2.Y
The notation t1.X refers to the projection of tuple t1 onto the attributes in X. [From TRC: t.a
refers to attribute ‘a’ of tuple t.]
An FD X->Y essentially says that if two tuples agree on the values in attributes X, they must
also agree on the values in attributes Y.
Eg: The FD AB->C is illustrated by the instance below, which satisfies the dependency.
The first 2 tuples show that an FD is not the same as a key constraint: although the FD is not violated, AB is
clearly not a key for the relation.
A B C D
A1 B1 C1 D1
A1 B1 C1 D2
A1 B2 C2 D1
A2 B1 C3 D1
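The definition can be tested directly on the instance above. This is a small Python sketch; the function name satisfies is illustrative.

```python
def satisfies(rows, columns, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        t = dict(zip(columns, row))
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLUMNS = ("A", "B", "C", "D")
INSTANCE = [
    ("A1", "B1", "C1", "D1"),
    ("A1", "B1", "C1", "D2"),
    ("A1", "B2", "C2", "D1"),
    ("A2", "B1", "C3", "D1"),
]

print(satisfies(INSTANCE, COLUMNS, "AB", "C"))     # True: AB -> C holds
print(satisfies(INSTANCE, COLUMNS, "AB", "ABCD"))  # False: AB is not a key
```

The second call fails on the first two tuples, which agree on A and B but differ on D. Note that a check like this can only show that an instance satisfies or violates an FD; it cannot prove that the FD holds on every legal instance.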
The FD ssn—>lot also holds on Workers.
An FD f is implied by a given set F of FDs if f holds on every relation instance that satisfies
all dependencies in F; that is, f holds whenever all FDs in F hold.
Note: It is not sufficient for f to hold on some instance that satisfies all dependencies in F;
rather, f must hold on every instance that satisfies all dependencies in F.
Fully functional dependency:
In a relation, there exists a full functional dependency between two attribute sets X and Y when X is
functionally dependent on Y and is not functionally dependent on any proper subset of Y.
For example: ABC -> D (here D is fully functionally dependent on ABC).
Fully functionally dependent means that the value of D is determined by the combination of attributes A, B, C,
but not by any proper subset of them.
For example: Consider a relation Student(ID, sname, proof ID, grade)
Example:
Contracts relation:
Contracts( contractId, supplierid, projectid, deptid, partid,qty, values)
Using domain variables
ContractID – C
Supplierid – S
Projectid – J
Deptid- D
Partid – P
Qty- Q
Values- V
The schema Contracts is denoted CSJDPQV.
The following ICs are known to hold:
The contractId C is a key: C->CSJDPQV
A project purchases a given part using a single contract: JP->C
A department purchases at most one part from a supplier: SD->P
Additional FDs hold in the closure of this set of FDs:
From JP->C and C->CSJDPQV
Using transitivity
Infer: JP->CSJDPQV
From SD->P
Using augmentation (adding J to both sides)
Infer: SDJ->JP
From SDJ->JP and JP->CSJDPQV
Using transitivity
Infer: SDJ->CSJDPQV
Several additional FDs in the closure can be inferred using augmentation and decomposition.
Example: C->CSJDPQV
Using decomposition:
Infer C->C, C->S, C->J, C->D, C->P, C->Q, C->V.
Finally, the reflexivity rule gives a number of trivial FDs.
Attribute closure:
If we want to check whether a given functional dependency X -> Y is in the closure of a set F of
FDs, we can do so efficiently without computing F+. Instead, we first compute the attribute closure
X+ with respect to F and then check whether Y is a subset of X+.
Closure = X;
Repeat until there is no change: {
If there is an FD U->V in F such that U is a subset of Closure, then set Closure = Closure U V
}
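The loop above translates almost line for line into Python. This is a sketch with illustrative function names, applied to the FDs of the Contracts example (C->CSJDPQV, JP->C, SD->P).

```python
def attribute_closure(x, fds):
    """Return X+ for attribute set x under fds, a list of (lhs, rhs) strings."""
    closure = set(x)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is a subset of X+."""
    return set(rhs) <= attribute_closure(lhs, fds)


# FDs of the Contracts example, schema CSJDPQV.
CONTRACTS = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

print(sorted(attribute_closure("SDJ", CONTRACTS)))
print(implies(CONTRACTS, "SDJ", "CSJDPQV"))  # True, as inferred by hand
print(implies(CONTRACTS, "SD", "C"))         # False: (SD)+ is only S, D, P
```

Each pass either adds at least one attribute or stops, so the loop runs at most as many times as there are attributes in the schema.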
Example 1:
Given a Relational schema R(A, B, C, G, H, I) and the set of FDs are:
{A->B, A->C, CG->H, CG->I, B->H}. Deduce members of F+ for the given relation.
A → H: since A → B and B → H hold, we apply the transitivity rule.
Example 3:
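R(A,B,C,D,E,F,G,H)
FD = {CH->G, A->BC, B->CFH, E->A, F->EG}
D does not appear on the right side of any FD, so every candidate key (CK) must contain D.
(D)+ = D
(DA)+ = DABCFHEG (CK)
(DB)+ = DBCFHEGA (CK)
(DC)+ = DC (CH->G does not apply because H is missing)
(DE)+ = DEABCFHG (CK)
(DF)+ = DFEGABCH (CK)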
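The candidate keys of Example 3 can also be found by brute force. This Python sketch tries attribute sets in order of increasing size and keeps those whose closure is the whole schema; the name candidate_keys is illustrative, and the search is exponential, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


SCHEMA = "ABCDEFGH"
FDS = [("CH", "G"), ("A", "BC"), ("B", "CFH"), ("E", "A"), ("F", "EG")]

keys = candidate_keys(SCHEMA, FDS)
print(["".join(sorted(k)) for k in keys])  # ['AD', 'BD', 'DE', 'DF']
```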
Minimal Cover for a set of FDs/ Canonical Form:
1. Every dependency is as small as possible; that is, each attribute on the left side of an FD
is necessary, and the right side is a single attribute.
2. Every dependency in it is required for its closure to be equal to F+.
To identify the canonical cover for a given set, follow these steps:
a. Splitting rule: rewrite every FD so that its right side has a single
attribute.
b. Remove extraneous attributes from the left sides.
c. Remove redundant FDs (using attribute closure).
A->B: without it, A+ = {A, C, D}, which lacks B, so don’t discard A->B
C->D: without it, C+ = {C}, which lacks D, so don’t discard C->D
A->C: without it, A+ = {A, B}, which lacks C, so don’t discard A->C
Irreducible set F’ = {A->B, C->D, A->C}
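Steps a to c can be sketched in Python as follows. The input set {A->BC, B->C, AB->C} is a hypothetical example chosen for illustration, not the set used in the notes, and the function names are illustrative. A minimal cover is not unique in general; this sketch returns one of them.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # a. Splitting rule: a single attribute on every right side.
    cover = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]
    # b. Drop a left-side attribute when the rest of the left side
    #    still determines the right side.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # step b can create duplicates
    # c. Drop an FD when the remaining FDs still imply it.
    for dep in list(cover):
        rest = [d for d in cover if d != dep]
        if dep[1] <= closure(dep[0], rest):
            cover = rest
    return cover


# Hypothetical input set, for illustration only.
F = [("A", "BC"), ("B", "C"), ("AB", "C")]
result = minimal_cover(F)
print(["".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in result])
```

For this input the result is A->B and B->C: A is extraneous in AB->C, and A->C is redundant because it follows from the other two by transitivity.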
NORMALIZATION:
It is the process of analyzing given relation schemas based on their FDs and primary keys to
achieve the desired properties of minimal redundancy and minimal insertion, deletion
and update anomalies.
Need for Normalization:
To minimize data redundancy, i.e., no unnecessary duplication of data.
To make the DB structure flexible, i.e., it should be possible to add new data values and rows
without reorganizing the DB structure.
Data should be consistent throughout the DB, i.e., it should not suffer from the following anomalies.
a) Insertion Anomaly
b) Update Anomaly
c) Delete Anomaly
d) Redundant Storage
NORMAL FORMS: A normal form is a state of a relation that results from decomposing it toward a
good design that avoids redundancy and inconsistency. A relation is said to be in a particular
normal form if it satisfies a certain set of conditions.
There are different Forms. They are:
a) 1st NORMAL FORM (1NF)
b) 2nd NORMAL FORM (2NF)
c) 3rd NORMAL FORM (3NF)
d) BCNF – BOYCE-CODD NORMAL FORM
e) 4th NORMAL FORM (4NF)
f) 5th NORMAL FORM (5NF)
As per 1NF, every attribute value must be atomic (single-valued) and every row unique. To satisfy
the 1NF criteria, the data has to be represented in the following way.
Students:
Roll Number Name Courses
101 X DBMS
101 X CN
101 X SE
102 Y CO
102 Y OS
103 Z CD
103 Z CN
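A short Python sketch of the conversion to 1NF. The unnormalized input below, with a list of courses per student, is an assumed reconstruction for illustration, since the notes show only the 1NF result.

```python
# Assumed unnormalized form: Courses holds several values per row.
students_unnormalized = [
    (101, "X", ["DBMS", "CN", "SE"]),
    (102, "Y", ["CO", "OS"]),
    (103, "Z", ["CD", "CN"]),
]

# One row per (student, course), so every field holds one atomic value.
students_1nf = [
    (roll, name, course)
    for roll, name, courses in students_unnormalized
    for course in courses
]

for row in students_1nf:
    print(*row)
```

The output has the same seven rows as the Students table above. Roll Number alone no longer identifies a row; the pair (Roll Number, Courses) does.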
Example:
R(A,B,C,D,E,F)
F ={ A-> BCDEF
BC->ADEF
B->F
D->E}
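Step 1: First find the prime attributes and non-prime attributes.
To find them, compute the attribute closure of the left side of each FD (and of any attribute
that does not appear on a right side). The left sides whose closure contains every attribute are the
candidate keys. The closures below show that A and BC are candidate keys, so A, B, C are prime and D, E, F are non-prime.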
(A)+={ABCDEF}
(BC)+ ={BCADEF}
(B)+={BF}
(D)+={DE}
Step 2: Check whether each non-prime attribute is partially or fully functionally dependent on the candidate keys.
Case (i): If any non-prime attribute is partially dependent on a candidate key, the relation R is not in 2NF.
Now check whether the non-prime attributes {D, E, F} are fully functionally dependent:
D is fully FD on A and BC.
E is fully FD on A and BC. It is also determined by D, but D is not part of a candidate key, so this is not a partial dependency.
F is FD on A and BC, but F is also determined by B alone:
i.e., BC->F
B->F
Since B is a proper subset of the candidate key BC, F is partially dependent on BC, which violates the
2NF criteria.
To bring the relation into 2NF, B->F is moved into its own relation, decomposing R into two relations:
R1( A B C D E) R2( B F)
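The 2NF test for this example can be sketched in Python. The sketch finds the candidate keys by brute force and then looks for non-prime attributes determined by a proper subset of a key. All function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


def partial_dependencies(schema, fds):
    """(part of a key, non-prime attributes it determines) pairs."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):  # proper, nonempty subsets only
            for part in combinations(sorted(key), size):
                dependent = closure(part, fds) - set(part) - prime
                if dependent:
                    found.append(("".join(part), "".join(sorted(dependent))))
    return found


SCHEMA = "ABCDEF"
FDS = [("A", "BCDEF"), ("BC", "ADEF"), ("B", "F"), ("D", "E")]
print(partial_dependencies(SCHEMA, FDS))  # [('B', 'F')]
```

An empty result would mean the relation is in 2NF. Here the single pair B, F matches the partial dependency found by hand.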
3rd NORMAL FORM:
In 3NF, the relation R must be in 2NF and no non-prime attribute may be transitively dependent
on a candidate key. In other words, a non-prime attribute must not be determined by
another non-prime attribute.
FD = { A->BCDE
BC->ADE
D->E}
Here D->E is a dependency in which both D and E are non-prime attributes (NPA), so E depends on the candidate keys transitively through D.
According to 3NF, no non-prime attribute should be determined by another non-prime attribute. To remove the violation, decompose
R1(A, B, C, D, E) into two tables:
R2(A, B, C, D) R3( D, E)
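A Python sketch of the 3NF test for the relation over A, B, C, D, E with the FDs listed above. It checks only the listed FDs: a dependency X->Y is a violation when X is not a super key and Y contains a non-prime attribute. The function names are illustrative, and the FDs given for R2 and R3 are the ones that involve only their attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


def third_nf_violations(schema, fds):
    """Listed FDs X -> Y where X is no super key and Y has non-prime attributes."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(schema)
        non_prime_targets = set(rhs) - set(lhs) - prime
        if not is_superkey and non_prime_targets:
            bad.append((lhs, "".join(sorted(non_prime_targets))))
    return bad


R1_FDS = [("A", "BCDE"), ("BC", "ADE"), ("D", "E")]
print(third_nf_violations("ABCDE", R1_FDS))                        # [('D', 'E')]
print(third_nf_violations("ABCD", [("A", "BCD"), ("BC", "AD")]))   # []
print(third_nf_violations("DE", [("D", "E")]))                     # []
```

The first call reports D->E as the violation, and both relations produced by the decomposition pass the check.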
A relation is in BCNF if it is in 3NF and, for every non-trivial dependency A->B, A is a super key.
3NF tolerates a dependency A->B whose left side A is not a super key as long as B is a prime attribute; BCNF does not.
Example:
R(A, B, C, D)
A->BCD
BC-> AD
D->B
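Check each dependency:
i) A->BCD: A is a super key.
ii) BC->AD: BC is a super key.
iii) D->B: D is not a super key, since (D)+ = {D, B}.
So D->B violates BCNF and R is not in BCNF. Decomposing R on the violating dependency D->B gives
R1(A, D, C) R2(D, B)
In R2 the only non-trivial dependency is D->B, and D is the key of R2. In R1 the dependencies that
remain are A->CD and CD->A, and both A and CD are keys of R1. Both relations are therefore in BCNF.
Note that CD is also a candidate key of R, since (CD)+ = {A, B, C, D}. Every attribute of R is
therefore prime, which is why R satisfies 3NF even though it is not in BCNF.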
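The BCNF condition is simple to test in Python: every listed non-trivial FD must have a super key on its left side. This is a sketch for the example above with illustrative function names. The FDs passed for R1(A, D, C) and R2(D, B) are worked out by hand here and should be read as assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial listed FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(schema)
    ]


FDS = [("A", "BCD"), ("BC", "AD"), ("D", "B")]
print(bcnf_violations("ABCD", FDS))                        # [('D', 'B')]

# Relations after decomposing on D -> B (FDs assumed, derived by hand).
print(bcnf_violations("ACD", [("A", "CD"), ("CD", "A")]))  # []
print(bcnf_violations("DB", [("D", "B")]))                 # []
```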
4th NORMAL FORM:
A relation in 4NF must satisfy 3NF and BCNF, and it must not contain any non-trivial multivalued dependency.
4NF is a level of database normalization where there are no non-trivial multivalued dependencies
other than those whose left side is a candidate key.
Multivalued Dependency:
A relation is said to have a multivalued dependency when the 3 conditions
below are satisfied.
Example:
R(A,B,C)
Example:
Instructor and Text Book are independent of each other. So this relation has a multivalued
dependency.
Course Instructor
DBMS X
DBMS Y
DBMS Z
OS W
OS V
5th NORMAL FORM:
5NF is based on the join dependency. A join dependency, denoted by JD(R1, R2, R3, ..., Rn) and specified on a
relation schema R, is a constraint on the states r of R: every legal state r of R should have a non-additive join
decomposition into R1, R2, ..., Rn.
Example:
Consider a Relation R
R1 R2
R3
Company Product
C1 PENDRIVE
C1 CD
C2 SPEAKER
C1 SPEAKER
5th NF Criteria :
1) R must be in 4NF.
2) If no join dependency exists, R is in 5NF; if a JD exists, check
whether it is a trivial JD or not.
3) R cannot be decomposed any further without loss.
FUNCTIONAL DEPENDENCY:
A functional dependency (FD) is a constraint that describes how one attribute determines
another attribute in a Database Management System (DBMS). A functional dependency
has the form X -> Y (read X functionally determines Y), where X and Y are sets of attributes
in a relation R.
X -> Y holds if and only if, for any instance r of R and
for any tuples t1 and t2 of r, t1(X) = t2(X) implies t1(Y) = t2(Y).
Functional dependencies identify relationships between attributes within a relation. This
makes it possible to decompose the relation to remove redundancy.
Example:
R ( A, B, C)
AB -> B is a trivial functional dependency, because B is a subset of AB.
A -> A is also a trivial functional dependency.
Example:
R ( A, B, C)
AB-> C then it is a NON-Trivial Functional Dependency, i.e, C is not a subset of AB.