Dbms Unit III
SCHEMA REFINEMENT
Schema refinement means improving a database schema by applying a systematic technique; the
principal technique is decomposition. Normalisation, or schema refinement, is a systematic
approach to organizing the data in a database by decomposing tables to eliminate data
redundancy and undesirable characteristics such as insertion, update and deletion anomalies.
Redundancy refers to the repetition of the same data, that is, duplicate copies of the same
data stored in different locations.
Anomalies: Anomalies are the problems that arise in poorly planned, unnormalised databases
where all the data is stored in one table, which is sometimes called a flat file database.
Problems Caused by Redundancy: Let us consider a schema of this type.
Storing the same information redundantly, that is, in more than one place within a database,
can lead to several problems:
Update Anomaly: If the fee changes from 5000 to 7000, the FEE column must be updated in every
row that stores it; otherwise the data becomes inconsistent.
Insertion and deletion anomalies also exist only because of redundancy; without it they do
not occur.
Insertion Anomaly: A new course C4 is introduced, but no student has taken C4 yet. To insert
the course, we are forced to insert dummy data for the other columns.
Deletion Anomaly: Deleting student S3 also deletes the course information stored in that
student's row. Deleting some data forces the deletion of other useful data.
To avoid redundancy and the problems it causes, we use a refinement technique called
decomposition.
Decomposition: The process of breaking a larger relation into smaller relations. Each of the
smaller relations contains a subset of the attributes of the original relation.
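Armstrong's Axioms: Armstrong's axioms are a set of inference rules for reasoning about
functional dependencies; they can be used to infer all the functional dependencies that hold
on a relational database. The primary inference rules are:
• Reflexivity: if Y ⊆ X, then X → Y.
• Augmentation: if X → Y, then XZ → YZ.
• Transitivity: if X → Y and Y → Z, then X → Z.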
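1. X+ = X // closure of X under the FD set F starts as X itself
2. repeat until X+ does not change
     for each FD Y → Z in F do
       if Y ⊆ X+ then
         X+ = X+ ∪ Z // add Z to X+
       end if
     end for
3. Return X+ // return closure of X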
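The closure procedure above can be sketched in Python as follows. This is only a minimal illustration: the function name `closure` and the representation of each FD as a (left side, right side) pair of attribute strings are assumptions made for this example, and the sample FD set is the one from the partial dependency example later in this unit.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names (an assumed encoding for this sketch).
    """
    result = set(attrs)              # step 1: X+ = X
    changed = True
    while changed:                   # step 2: repeat until X+ stops growing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)   # add Z to X+
                changed = True
    return result                    # step 3: return closure of X


fds = [("AB", "C"), ("C", "D"), ("B", "E")]
print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("C", fds)))   # ['C', 'D']
```

The loop stops as soon as a full pass over the FDs adds nothing new, which is the "until X+ does not change" condition.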
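1. Non-trivial Functional Dependency: A → B is a non-trivial functional dependency if B is not
a subset of A.
In a student table with attributes roll_no, name and age, roll_no → name is a non-trivial
functional dependency, since the dependent name is not a subset of the determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a
subset of {roll_no, name}.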
2. Trivial Functional Dependency: A → B is a trivial functional dependency if B is a subset of
A. In a trivial functional dependency, the dependent is always a subset of the determinant, i.e. if
X → Y and Y is a subset of X, then X → Y is called a trivial functional dependency.
The following dependencies are also trivial: A → A, B → B, AB → AB. For example,
roll_no name age
42 smith 17
43 john 18
44 peter 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of the determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of a trivial functional dependency.
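The subset test that separates trivial from non-trivial dependencies is short enough to write down directly. The sketch below assumes attribute sets are given as Python collections of attribute names; `is_trivial` is a hypothetical helper name, not part of any library.

```python
def is_trivial(lhs, rhs):
    """An FD lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


print(is_trivial({"roll_no", "name"}, {"name"}))  # True
print(is_trivial({"roll_no"}, {"roll_no"}))       # True
print(is_trivial({"roll_no"}, {"name"}))          # False
print(is_trivial({"roll_no", "name"}, {"age"}))   # False
```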
3. Fully Functional Dependency:
If X and Y are attribute sets of a relation, Y is fully functionally dependent on X if Y is
functionally dependent on X but not on any proper subset of X.
Example: In the dependency ABC → D, attribute D is fully functionally dependent on ABC if it
is not dependent on any proper subset of ABC. That means that subsets of ABC such as AB, BC, A,
B, etc. cannot determine D. Example:
supplier_id item_id price
1 1 540
2 1 545
1 2 200
2 2 201
1 1 540
2 2 201
3 1 542
From the table, we can see that neither supplier_id nor item_id alone determines the price,
but supplier_id and item_id together do. So price is fully functionally dependent on
{supplier_id, item_id}. This gives the full functional dependency
{supplier_id, item_id} → price.
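The reasoning on the supplier table can be checked mechanically. The sketch below tests whether an FD holds in a given set of rows and whether it is a full dependency there; `holds` and `fully_dependent` are hypothetical helper names. Note the assumption: a sample of rows can only show that a dependency is consistent with the data seen so far, it cannot prove that the dependency holds for the schema in general.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True


# The rows of the supplier/item/price table above.
data = [(1, 1, 540), (2, 1, 545), (1, 2, 200), (2, 2, 201),
        (1, 1, 540), (2, 2, 201), (3, 1, 542)]
rows = [dict(zip(("supplier_id", "item_id", "price"), r)) for r in data]

print(holds(rows, ("supplier_id",), ("price",)))                      # False
print(holds(rows, ("item_id",), ("price",)))                          # False
print(fully_dependent(rows, ("supplier_id", "item_id"), ("price",)))  # True
```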
4. Partial Functional Dependency: A functional dependency X → Y is a partial dependency if Y is
functionally dependent on X and Y can also be determined by some proper subset of X. In the
context of normalization, a dependency X → Y is called partial if X is a proper part of a
candidate key and Y is a non-key attribute (or a set of non-key attributes).
Let R(A,B,C) and AB = candidate key.
A → C is a partial dependency (C depends on a part of the candidate key).
AB → C is a full dependency (C depends on the entire candidate key).
Example
R(ABCDE)
F: {AB → C, C → D, B → E}. Find the partial dependencies.
Solution
AB+ = {A,B,C,D,E}
C+ = {C,D}
B+ = {B,E}
=> AB is the only candidate key.
=> key attributes = A,B and
=> non-key attributes = C,D,E.
Key attributes are also called prime attributes and non-key attributes are also called non-prime
attributes.
AB → C is not a partial dependency; it is a full dependency.
B → E is a partial dependency, since B is a part of the candidate key and E is a non-key attribute.
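The solution steps above amount to scanning the listed FDs for one whose left side is a proper part of a candidate key and whose right side contains a non-prime attribute. A minimal sketch of that scan follows; `partial_dependencies` is a hypothetical helper, attributes are assumed to be single letters, and only the FDs as written are inspected (dependencies that could be inferred from them are not).

```python
def partial_dependencies(fds, candidate_keys):
    """List the given FDs whose left side is a proper subset of a candidate
    key and whose right side includes at least one non-prime attribute."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        part_of_key = any(set(lhs) < set(key) for key in candidate_keys)
        non_prime = sorted(set(rhs) - prime)
        if part_of_key and non_prime:
            found.append((lhs, "".join(non_prime)))
    return found


fds = [("AB", "C"), ("C", "D"), ("B", "E")]
print(partial_dependencies(fds, ["AB"]))  # [('B', 'E')]
```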
Differences between Full Functional Dependency (FFD) and Partial Functional Dependency (PFD):
1. FFD: A functional dependency X → Y is a full functional dependency if Y is functionally
   dependent on X and Y is not functionally dependent on any proper subset of X.
   PFD: A functional dependency X → Y is a partial dependency if Y is functionally dependent
   on X and Y can also be determined by some proper subset of X.
2. FFD: The non-prime attribute is functionally dependent on the whole candidate key.
   PFD: The non-prime attribute is functionally dependent on part of a candidate key.
3. FFD: If we remove any attribute of X, the dependency no longer holds.
   PFD: If we remove some attribute of X, the dependency still holds.
4. FFD: Full functional dependency equates to the normalization standard of Second Normal Form.
   PFD: Partial functional dependency does not equate to the normalization standard of Second
   Normal Form. Rather, 2NF eliminates partial dependencies.
5. FFD: An attribute A is fully functionally dependent on an attribute set B if it is
   functionally dependent on B and not on any proper subset of B.
   PFD: An attribute A is partially functionally dependent on an attribute set B if it is
   functionally dependent on some proper subset of B.
6. FFD: Full functional dependency enhances the quality of the data in our database.
   PFD: Partial dependency does not enhance the data quality. It must be eliminated in order
   to normalize to the second normal form.
5. Transitive Functional Dependency:
In a transitive functional dependency, the dependent is indirectly dependent on the determinant,
i.e. if A → B and B → C, then by the axiom of transitivity, A → C. This is a transitive
functional dependency.
Prime and non-prime attributes: Attributes that are part of any candidate key of a relation are
called prime attributes; the others are non-prime attributes.
Candidate Key: A candidate key is a minimal set of attributes of a relation that can be used to
identify a tuple uniquely.
Consider the student table student(sno, sname, sphone, age): we can take sno as a candidate key.
A table can have more than one candidate key.
Types of candidate keys:
1. Simple (having only one attribute)
2. Composite (having multiple attributes)
Super Key: A super key is a set of attributes of a relation that can be used to identify a tuple
uniquely.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but the converse is not true.
Consider the student table student(sno, sname, sphone, age): sno and (sno, sname) are both
super keys.
Finding candidate keys using attribute closure:
Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of functional
dependencies
{{E, F} → {G}, {F} → {I, J}, {E, H} → {K, L}, {K} → {M}, {L} → {N}} on R.
What is the key for R?
Answer: Finding the attribute closure of each of the given options, we get:
{E,F}+ = {E,F,G,I,J}
{E,F,H}+ = {E,F,G,H,I,J,K,L,M,N}
{E,F,H,K,L}+ = {E,F,G,H,I,J,K,L,M,N}
{E}+ = {E}
{E,F,H}+ and {E,F,H,K,L}+ both give the set of all attributes, but EFH is minimal, so it is
the candidate key.
Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional
dependencies hold:
{A → B, BC → D, E → C, D → A}. What are the candidate keys of R?
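Both key-finding problems above can be explored with a brute-force search: try attribute sets in increasing size and keep those whose closure is the whole relation and that contain no smaller key. The sketch below is one such search under stated assumptions: `closure` and `candidate_keys` are hypothetical helper names and attributes are single letters. It is exponential in the number of attributes, which is fine for textbook-sized schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under the (lhs, rhs) pairs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure covers `attributes`."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]


fds1 = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]
print(candidate_keys("EFGHIJKLMN", fds1))  # ['EFH']

fds2 = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
print(candidate_keys("ABCDEH", fds2))      # ['AEH', 'BEH', 'DEH']
```

In the second problem E and H appear on no right-hand side, so every key must contain both; {E,H}+ = {C,E,H}, and adding any one of A, B or D completes the closure.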
Normal Forms:
Normalization is the process of organizing the data in a database so as to minimize redundancy
in a relation or set of relations. Redundancy in a relation may cause insertion, deletion, and
update anomalies, so minimizing it helps to avoid them. Normalization divides a larger table
into smaller tables and links them using relationships.
Types of Normal Forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain-Key Normal Form (DKNF)
Sixth Normal Form (6NF)
First Normal Form (1NF):
A relation is in 1NF if every attribute holds only atomic (single) values; an attribute of a
table cannot hold multiple values. In other words, a relation is in first normal form if it does
not contain any composite or multi-valued attribute. First normal form disallows multi-valued
attributes, composite attributes, and their combinations.
Example: First Normal Form.
roll_no name subject
101 Anil AWS, Python
103 smith Java
102 peter ML, AI
Out of the 3 students in our table, 2 have opted for more than one subject, and we have stored
the subject names in a single column. But as per the first normal form each column must
contain atomic values. Here is our updated table, which now satisfies the first normal form.
roll_no Name subject
101 Anil AWS
101 Anil Python
103 smith Java
102 peter ML
102 peter AI
By doing so, a few values get repeated, but the values in the subject column are now atomic for
each record/row. Using the first normal form, data redundancy increases, as the same data is
repeated in some columns across multiple rows, but each row as a whole is unique.
Example:
Domain         Courses
Programming    Java, C
Web designing  PHP, HTML
The above table holds multiple values in a single column, which can be reduced to atomic
values by applying the first normal form as follows:
Domain         Courses
Programming    Java
Programming    C
Web designing  PHP
Web designing  HTML
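The conversion used in both 1NF examples (one output row per value of the multi-valued column) can be sketched as follows. The function name `to_first_normal_form`, the dict-per-row representation and the comma separator are assumptions made for this illustration.

```python
def to_first_normal_form(rows, column, separator=","):
    """Split a multi-valued `column` so that every row holds one atomic value."""
    normalized = []
    for row in rows:
        for value in row[column].split(separator):
            new_row = dict(row)             # copy the other columns unchanged
            new_row[column] = value.strip()
            normalized.append(new_row)
    return normalized


students = [
    {"roll_no": 101, "name": "Anil", "subject": "AWS, Python"},
    {"roll_no": 103, "name": "smith", "subject": "Java"},
    {"roll_no": 102, "name": "peter", "subject": "ML, AI"},
]
for row in to_first_normal_form(students, "subject"):
    print(row["roll_no"], row["name"], row["subject"])
```

Running it on the student table yields the five rows of the 1NF table shown earlier, with roll_no and name repeated for each subject.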
Second Normal Form (2NF):
A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent
on the primary key. Equivalently, a relation is in 2NF if it has no partial dependency, i.e., no
non-prime attribute (an attribute that is not part of any candidate key) is dependent on a
proper subset of any candidate key of the table.
Fully Functional Dependency (FFD): If X and Y are attribute sets of a relation, Y is fully
functionally dependent on X if Y is functionally dependent on X but not on any proper subset
of X.
Partial Dependency (PFD): If a proper subset of a candidate key determines a non-prime
attribute, it is called a partial dependency.
Example: Consider the following functional dependencies in relation R(A, B, C, D):
AB → C [A and B together determine C]
BC → D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no
proper subset of AB determines any non-prime attribute.
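The 2NF condition can be tested by taking every non-empty proper subset of each candidate key and checking that its closure contains no non-prime attribute. The sketch below does exactly that; `closure` and `is_2nf` are hypothetical helper names, attributes are single letters, the candidate keys are supplied by the caller, and the relation is assumed to be in 1NF already.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under the (lhs, rhs) pairs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_2nf(attributes, fds, candidate_keys):
    """True if no proper subset of a candidate key determines a non-prime attribute."""
    prime = set().union(*candidate_keys)
    non_prime = set(attributes) - prime
    for key in candidate_keys:
        for size in range(1, len(key)):
            for part in combinations(key, size):
                if closure(part, fds) & non_prime:
                    return False  # partial dependency found
    return True


print(is_2nf("ABCD", [("AB", "C"), ("BC", "D")], ["AB"]))               # True
print(is_2nf("ABCDE", [("AB", "C"), ("C", "D"), ("B", "E")], ["AB"]))   # False
```

The second call uses the FD set from the partial dependency example earlier in the unit, where B → E makes E depend on part of the key AB.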
Example:
Student id  Student name  Project id  Project name
101         smith         P1          x
102         john          P2          y
Here (student id, project id) are the key attributes and (student name, project name) are the
non-prime attributes. It is decomposed as
Student id  Student name  Project id
101         smith         P1
102         john          P2
Project id  Project name
P1          x
P2          y
Third Normal Form (3NF):
A relation is in third normal form if it is already in second normal form and no non-prime
attribute is transitively dependent on a key.
3NF is used to reduce data duplication and to achieve data integrity. Transitive dependencies
are a cause of update anomalies, which is why 3NF removes them.
Let R be a relation schema, X be a subset of the attributes of R, and A be an attribute of R. R is
in third normal form if for every FD X → A that holds over R, one of the following statements
is true:
• A ∈ X; that is, it is a trivial FD, or
• X is a super key, or
• A is part of some key for R.
OR
A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Transitive Functional Dependency (TFD): If A → B and B → C are two FDs, then A → C is called
a transitive dependency.
Transitive Dependencies
Suppose that a dependency X → A causes a violation of 3NF. There are two cases:
1.X is a proper subset of some key K. Such a dependency is sometimes called a partial
dependency.
2.X is not a proper subset of any key. Such a dependency is sometimes called a transitive
dependency because it means we have a chain of dependencies K → X → A. The problem is that
we cannot associate an X value with a K value unless we also associate an A value with an X
value.
Example: Consider relation R(A, B, C, D, E) with the functional dependencies
A → BC
CD → E
B → D
E → A
The candidate keys of the above relation are {A, E, CD, BC}. All attributes on the right-hand
sides of the functional dependencies are prime, so the relation is in 3NF.
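The two 3NF conditions (left side is a super key, or every right-side attribute is prime) can be checked FD by FD. The following is a minimal sketch under stated assumptions: `closure` and `third_nf_violations` are hypothetical helper names, the candidate keys are supplied by the caller, and only the listed FDs are inspected. Attribute sets are strings of single letters in the first call and tuples of names in the second.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the (lhs, rhs) pairs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """Return the listed FDs that satisfy neither 3NF condition."""
    all_attrs = set(attributes)
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)          # ignore the trivial part
        if not extra:
            continue
        is_super_key = closure(lhs, fds) == all_attrs
        if not is_super_key and not extra <= prime:
            violations.append((lhs, rhs))
    return violations


# R(A, B, C, D, E) from the example above: no violations, so R is in 3NF.
fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(third_nf_violations("ABCDE", fds, ["A", "E", "CD", "BC"]))  # []

# A relation with CID -> Ctitle, Iname, Iloc and Iname -> Iloc, key CID.
course_fds = [(("CID",), ("Ctitle", "Iname", "Iloc")), (("Iname",), ("Iloc",))]
print(third_nf_violations(("CID", "Ctitle", "Iname", "Iloc"),
                          course_fds, [("CID",)]))
# [(('Iname',), ('Iloc',))]
```

The second call anticipates the Course relation discussed at the end of this unit, where Iname → Iloc is the dependency that breaks 3NF.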
******************************************************************************
Example: 1NF, 2NF, 3NF
To avoid repeating groups, develop the following relation.
SID  Sname  Caddr     Major  CID   Ctitle  Iname  Iloc      Grade
111  Anil   208 west  IS     IS11  DBMS    SB     203 east  A
To remove the partial dependencies, decompose the Grade Report relation into three smaller
relations: the Student relation, the Course relation and the Registration relation.
Student Relation
SID  Sname  Caddr     Major
111  Anil   208 west  IS
222  Adhir  104 East  IT
Course Relation
Registration relation
SID CID Grade
111 IS11 A
111 IS22 B
222 IS11 C
222 IT22 B
222 IT33 A
Registration relation:3NF
SID CID Grade
111 IS11 A
111 IS22 B
222 IS11 C
222 IT22 B
222 IT33 A
Course Relation
The Course relation is in 2NF but not in 3NF, because it contains the transitive dependency
Iname → Iloc:
CID → Ctitle, Iname, Iloc
Iname → Iloc
that is,
CID → Ctitle
CID → Iname
CID → Iloc
Here CID determines Iname and Iname determines Iloc, which is the transitive dependency.
The Course relation is therefore decomposed into two relations:
R1: CID → Ctitle, Iname
R2: Iname → Iloc (make Iname the primary key)
Course Relation (R1)
CID   Ctitle  Iname
IS22  SAD     AR
IS11  DBMS    SB
IT22  COA     SB
IT33  C       HK
R2 (Iname is the primary key, so each instructor appears once)
Iname  Iloc
SB     203 East
AR     204 West
HK     209 South
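The decomposition of the Course relation is a pair of projections with duplicate rows removed. The sketch below illustrates this and then rejoins the two projections on Iname to confirm that no information is lost. It rests on one assumption: the four Course rows are reassembled here from the decomposed tables above, since the undecomposed Course table is not shown in these notes. `project` is a hypothetical helper name.

```python
COLUMNS = ("CID", "Ctitle", "Iname", "Iloc")

# Course rows reassembled from R1 and R2 (an assumption, see the lead-in).
course = [
    ("IS22", "SAD", "AR", "204 West"),
    ("IS11", "DBMS", "SB", "203 East"),
    ("IT22", "COA", "SB", "203 East"),
    ("IT33", "C", "HK", "209 South"),
]


def project(rows, columns, wanted):
    """Project `rows` onto the `wanted` columns, dropping duplicate rows."""
    positions = [columns.index(c) for c in wanted]
    result = []
    for row in rows:
        picked = tuple(row[i] for i in positions)
        if picked not in result:
            result.append(picked)
    return result


r1 = project(course, COLUMNS, ("CID", "Ctitle", "Iname"))
r2 = project(course, COLUMNS, ("Iname", "Iloc"))
print(r2)  # [('AR', '204 West'), ('SB', '203 East'), ('HK', '209 South')]

# Rejoin on Iname; Iname is the key of R2, so a dict lookup is enough.
iloc_of = dict(r2)
rejoined = [(cid, title, iname, iloc_of[iname]) for cid, title, iname in r1]
print(sorted(rejoined) == sorted(course))  # True
```

Because Iname determines Iloc, instructor SB's location is now stored once in R2 instead of once per course.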