0% found this document useful (0 votes)
13 views

Module 4 - Normalization

Uploaded by

H K
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views

Module 4 - Normalization

Uploaded by

H K
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 141

An Introduction to

Database Management Systems

Lecture by:
Prathima M. G ,B.E., M.E ,
Assisstant professor
Dept. of Computer Science & Engineering
Bangalore Institute of Technology
Bangalore – 560 004
[email protected]
Relational Database Design

Schema Refinement
RELATIONAL DATABASE DESIGN

NORMALIZATION

BOTTOM - UP APPROACH
(THROUGH SYNTHESIS)
Phases of database
design:
 Step-01: Requirement Gathering
 Step-02: Conceptual database design

( ER_modeling)
 Step-03: Logical database design
(ER- Relational Mapping)
 Step-04: Schema refinement
(Normalization)
 Step-05: Implementation
(using SQL)
Overview
 Informal guidelines for database design
 The concept of Functional Dependencies (FDs)
 Trivial and Nontrivial Dependencies
 Closure of set of Functional Dependencies
 Minimal Set of FD
 Finding the Candidate key
 Normal Forms (1NF, 2NF,3 NF, and BCNF)
 Examples on Normalization
Relational Database Design
Schema refinement :
 The process of evaluating relational schemas
for design quality or
 Measuring the appropriateness/ Goodness of
relational schema other than the intuition of
designer
Approaches to database design
 Analysis:
 Top- Down approach
 Identify ENTITIES and associate ATTRIBUTES.

 Synthesis:
 Bottom-Up approach
 Consider Individual attributes of a TABLE and
associate appropriately with a TABLE.
Normalization
 The formal process that can be followed to achieve
a good database design
 Also used to check that an existing design is of
good quality
 The different stages of normalization are known as
“normal forms”
 To understand this, we need to understand the
concept of functional dependency
Informal Guidelines for Good
Database Design
Four Informal Measures
 SEMANTICS of the relation attributes must be
maintained.

 Reducing the REDUNDANT VALUES in tuples

 Reducing NULL VALUES in tuples

 Disallowing
the possibility of GENERATING
SPURIOUS TUPLES.
Drawbacks of an unnormalized relation
 Consider a WASE DB.
 WASE needs to keep track of details regarding
 its STUDENT (like USN, Name, DOB, Gen, Addr,..)
 the COURSES/SUBJECTS offered like (Cno, Cname,
Sem)
 and also keep track of details regarding Student
being enrolled for many Courses and a Course
having many students enrolled for it along with the
Marks_Range obtained by each student in each
course he/she is enrolled
 and then award Grade based on Marks_range.
ER Diagram for WASE DB

Sname DOB Addr

USN
M_Range
WASE Student_Course Details

Grade
Cno Cname Sem
ER_Relational Mapping
WASE Student_Course Details Primary Key: USN,CNO

USN Sname DOB Addr CNO Cname Sem M-range Grade

S1 Ram 1-jan-92 blr cs11 DBMS 4 70-79 B


S1 Ram 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 Ram 1-jan-92 blr cs13 DS 3 70-79 B
S4 John 1-feb-92 chi cs13 DS 3 80-89 A
S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C
S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A
S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B
Informal Guidelines-01
 Semanticsof the attributes of relation must be
maintained:
Whenever we group attributes to form a Relation
 Design a relation schema so that it is easy to explain its
meaning.
 Do not combine attributes from multiple entity types
into a single relation.
 If we combine attributes from multiple entities and
relationships, semantic ambiguity will result and the
relation cannot be easily explained.
STUDENT_COURSE
USN Sname DOB Addr CNO Cname Sem M-range Grade

Although there is nothing wrong logically in the


STUDENT_COURSE relation, the mixing of
attributes from distinct real-world entities will not
characterize either of the Entities.
EMP_DEPT
Emp_No Ename DOB Sal DOJ Addr DeptNo Dname Dlocs
Informal Guidelines-02
 Reducing not Eliminating Redundancy in tuples of a
relation:
 One prime objective of any database design is to minimize storage
space used by base relations
 Another serious problem is of UPDATE ANOMALIES
 Insertion Anomalies
 Deletion Anomalies
 Modification Anomalies
Insertion Anomalies
INSERT INTO STUDENT_COURSE (USN,SNAME, DOB, Addr) VALUES
(‘S6’, ’JULIE’, ‘1-jan-94’, ‘CHI’);
WASE Student_Course Details Primary Key: USN,CNO
USN Sname DOB Addr CNO Cname Sem M-range Grade
S1 Ram 1-jan-92 blr cs11 DBMS 4 70-79 B
S1 Ram 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 Ram 1-jan-92 blr cs13 DS 3 70-79 B

S4 John 1-feb-92 chi cs13 DS 3 80-89 A

S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C

S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A

S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B

S6 Julie 1-jan-94 chi ????

???? CS14 CN 3
Insertion Anomalies
Identifying Attribute: USN,CNO
USN Sname DOB Addr CNO Cname Sem M-range Grade

Insertion Anomalies:
Experienced when we attempt to store a value for one field but
cannot do so because the value of another field is unknown.
Eg: To add a STUDENT to the database, we MUST specify the
course to which he has enrolled.
To add a COURSE to the database, we MUST specify the
student who has enrolled for the course..
Deletion Anomalies
 Experienced when the value of an attribute or field of a
relation is unexpectedly removed when value for another
an attribute/field is deleted.
Assume that a particular Student is no more. we need to
delete the student details.
 E.g., If we delete a Student S6 from the Table, then the
corresponding Cno, Cname, Sem, .. values of that row is
also deleted.
 This results in the loss of information. Here Course CS14
is removed from the database.
Deletion Anomalies
DELETE FROM STUDENT_COURSE WHERE USN=‘S6’;
WASE Student_Course Details Primary Key: USN,CNO
USN Sname DOB Addr CNO Cname Sem M-range Grade

S1 Ram 1-jan-92 blr cs11 DBMS 4 70-79 B


S1 Ram 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 Ram 1-jan-92 blr cs13 DS 3 70-79 B
S4 John 1-feb-92 chi cs13 DS 3 80-89 A
S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C
S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A
S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B
S6 Julie 1-jan-94 chi CS14 CN 3 80-89 A
S6 Julie 1-jan-94 chi CS11 DBMS 4 80-89 A
Modification Anomalies
 In STUDENT_COURSE relation, if we want o change the
value of Sname of a STUDENT from Ram to
Ramkumar, then we must update all the tuples of that
attribute wherever that attribute Sname exists.

 If we fail to update in some tuples, then the same USN


will show different values for Sname, making the database
inconsistent.
Modification anamoly
WASE Student_Course Details Primary Key: USN,CNO
USN Sname DOB Addr CNO Cname Sem M-range Grade

S1 RamKumar 1-jan-92 blr cs11 DBMS 4 70-79 B


S1 RamKumar 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 RamKumar 1-jan-92 blr cs13 DS 3 70-79 B
S4 John 1-feb-92 chi cs13 DS 3 80-89 A
S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C
S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A
S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B

S1 Ram 1-jan-92 blr cs14 PT 4 70-79 B


Guideline-2: A well structured
table
Well-structured table - contains minimal redundancy and
allows users to insert, modify, and delete the rows without
any errors or inconsistencies or any possible anomalies like
• Insertion Anomalies
• Deletion Anomalies
• Modification Anomalies
Reducing null values in tuples
Empno Ename DOB Sal Comm Designation DNO
E1 A 1-jan-80 10000 10000 salesman D1
E2 B 1-jan-80 100000 Manager D1
E3 C 1-jan-80 100000 Developer D1
. . . . . D2
E90 z 1-jan-81 20000 12000 salesman D2
. . . . 12000 salesman D3
E100 zzz 1-jan-79 100000 manager D3

If there are only three salesman of all 100 employees in the


Company, then having a column/attribute called Commission,
which will have 97 null values will waste space at the storage level.

Another problem with null value column is HOW to account for


them when AGGREGATE operations like COUNT, SUM,AVERAGE are
applied.
Null Values
Interpretation of Null values
 The attribute does not apply to this tuple
 The attribute value for his tuple is unknown
 The value is known but absent or not
recorded.
Guideline-03:
 As far as possible , avoid placing attributes in a base
relation whose values may frequently be null. If nulls
are unavoidable , make sure that they apply in
exceptional cases only and do not apply to majority
of tuples in the relation.
Generation of Spurious Tuples
 Consider 2 relations
Project Department
PNO Pname loc DNo DeptNO Dname loc
P1 PMS Blore D1 D1 Sales Blore
P2 BMS Blore D2 D2 R&D Mysore
P3 UMS Mglore D3 D3 Mktng Blore
P4 LMS Mysore D5 D4 Dev Mglore
P5 HMS Mglore D4 D5 Testing Mglore

P6 ERP Hubli D1

Project EQUI_JOIN(LOC) Department


Spurious Tuples
SELECT * FROM PROJECT P, DEPARTMENT D WHERE
P.LOC=D.LOC;

Project_Department
Pno Pname DNo Deptno Dname loc
Spurious Tuples

P1 PMS D1 D1 Sales Blore


P1 PMS D1 D3 Mktng Blore
P2 BMS D2 D1 Sales Blore
P2 BMS D2 D3 Mktng Blore
P3 UMS D3 D4 Dev Mglore
P3 UMS D3 D5 Testing Mglore
P5 HMS D4 D4 Dev Mglore
P5 HMS D4 D5 Testing Mglore
P4 LMS D5 D2 R&D Mysore
GuideLine-04
 Design Relation schemas so that they can be
joined with equity conditions on attributes
that are either primary key or Foreign key
fields in a way that guarantees that no
SPURIOUS tuples are generated.
Normalisation
Database design may have some amount of
 Inconsistences
 Uncertainties
 Redundancies
Normalization is the Refinement process so
as to eliminate these drawbacks.
 It is Defined as a step-by-step process of
decomposing a complex relation into a simple
and stable data structure so as to eliminate
these drawbacks.
Normalization is based on the concept of Functional
Dependency.
Functional Dependency (FD)
 A FD is defined as the constraint that exists
between the attribute sets of a relation.
 A FD, denoted by X  Y, between two sets of
attributes X and Y that are subsets of relation
R specifies a constraint on the possible tuples.
 The constraint is that, for any two tuples or
records t1 and t2 of R that have t1[X]=t2[X],
then they must also have t1[Y]=t2[Y].
 This means that the values of Y component of
a tuple in R depend on or is determined by the
values of X component
NOTE:
 If a constraint on relation R states that there
cannot be more than one tuple with a given X
value in R – then it is a candidate key of R.
 The FD is a property of the SEMANTICS of the
ATTRIBUTES.
 The database designer will use their understanding
of the semantics of attributes of R to specify FD
that hold on all relation states.
 Consider a relation R with (X, Y) attributes.
 Attribute Y is functionally dependent on attribute X,
iff. each value of X determines EXACTLY ONE value
of Y.
Functional dependency
 X  Y : We say here “x determines y” or “y is
functionally dependent on x”
 The left-hand side of the FD is some times called as
the Determinant and the right-hand side is called
Dependent.
 XY does not imply that YX
 If the value of an attribute “Marks” is known then the
value of an attribute “Grade” is determined since
Marks Grade
Types of functional dependencies:
 – Full dependency
 – Partial dependency
 – Transitive dependency
Full dependencies
 An attribute B of a relation R is fully functionally
dependent on attribute A of R if it is functionally
dependent on A & not functionally dependent on
any proper subset of A.
 Report( USN,CNO,Sname,DOB,Addr,Cname,
Sem,Marks_Range, Grade)
 USN,CNO Marks_Range
This implies that for a given pair of USN,CNO values
occurring in the relation Report there is exactly one value
of Marks. ie Marks are dependent on USN,CNO as a
composite pair, but not on either individually
Partial dependencies
An attribute B of a relation R is partially dependent on
attribute A of R if it is functionally dependent on any
proper subset of A.

 Report(USN,CNO,Sname,DOB,Addr,Cname,
Sem,Marks_Range, Grade)
 CNOCname, Sem
 USN Sname,DOB,Addr

The attributes Sname,DOB,Addr are said to be partially


dependent on the key (USN,CNO) since they are
dependent only on USN and not on CNO
Transitive dependencies
An attribute B of a relation R is transitively dependent on
attribute A of R if it is functionally dependent on an
attribute C Which in turn is functionally dependent on A
or any proper subset of A.

 Report(USN,CNO,Sname,DOB,Addr,Cname, Sem,Marks_Range,
Grade)
 USN,CNO Marks_Range
 Marks_Range Grade
AB and BC => AC
The attribute Room# is said to be transitively
dependent on the key C# since it is dependent on
LName which in turn is dependent on C#.
Inference Rules(IR) or
Armstrong's Axioms
 IR-1: Reflexivity: If X ‫ ﬤ‬Y AND XX, XY
 IR-2: Augmentation: If X  Y, then XZ  YZ.
 IR-3: Transitivity: If X  Y and Y  Z,

then X  Z.
 IR-4: Decomposition: If X  YZ, then X  Y

and X  Z.
 IR-5: Union: If X  Y and X  Z,

then X  YZ.
 IR-6: Pseudo transitivity: XY, WYZ then
WXZ
IR-1: Reflexivity: If X ‫ ﬤ‬Y AND XX, XY
y
 Proof: Let r is some relation state of R and
there exists 2 tuples t1 and t2
S.T. t1[x]=t2[x] then we must have
t1[y]=t2[y]
Because X ‫ ﬤ‬Y hence x->y must hold in R.
Ex:X={ssn,ename}
Y={ename}
{ssn.ename}->ename
IR-2: Augmentation: If X  Y, then XZ  YZ.

 Let x->y in a relation instance r of R then


 1)t1[x]=t2[x]
 2)t1[y]=t2[y]
 3)t1[xz]=t2[xz]
 4)From 1 and 3 t1[z]=t2[z]
 5)from 2 and 4 t1[yz]=t2[yz]

Ex:{ssn}->{ename}
{ssn,add}->{ename,add}
{ssn,phn,add}->{ename,phn,add}
Therefore xz->yz
IR-3: Transitivity: If X  Y and Y  Z,
then X  Z.

 Let r is an relation instance of R with 2


tuples t1 and t2
 S.t.1)t1[x]=t2[x]
 2)t1[y]=t2[y]
 3)t1[z]=t2[z]
 From 1 and 3
 x->z
 Ex:SSN->DNO
 DNO->DNAME
 SSN->DNAME
Closure of a set of FD (F + )
 Given some FDs, new FDs can often be
inferred.
 The set of all FDs that are implied by a given
set F of FDs is called the closure of F and is
denoted by F +.
 Example:
Let F = {AB  C, C  B} be a set of FDs
satisfied by R (A, B, C). Then,
F + = {A  A, AB  A, B  B, AB  AC,
AB  BC, AB  ABC, etc.}
Example
Supposing we are given a relation R {A, B, C, D, E, F}
with a set of FDs as shown below:
F={A  BC,B  E,D  EF}. Show that the FD
AD  F holds for R and is a member of the closure.

(1) A  BC {Given}
(2) A  C {Decomposition of (1)}
(3) AD  CD {Augmentation of (2) by adding D}
(4) D  EF {Given}
(5) CD  EF {Augmentation}
(5) AD  EF {Transitivity of (3) and (4)}
(6) AD  F {Decomposition of (5)}
Attribute closure, X+
 To compute F + , start with FDs in F; repeatedly apply IR-1 to
IR-3 until no new FD can be derived
 Armstrong's Axioms do not produce any incorrect FDs that are
added to F +. However, finding F + is too expensive; the
complexity grows exponentially
 The solution is to find the attribute closure of X, denoted as X
+
Algorithm to find X+
Algorithm Attribute_Closure()
{
X + = X;
Repeat {
for each FD XY in F do X + = X +  Y
for each FD Y  Z in F do
if Y  X + then X+ =X +  Z
// i. e. if Y is in X +, the add Z to X +
until no change;
// until no more attributes are added to X +
}
}
Example: Let us consider
SSN ename pnumber pname plocation hours

 F={ssn->ename,
pnumber->{pname,plocation},
{ssn,pnumber}->hours}
Closure sets w.r.t F
{ssn}+={ssn,ename}
{pnumber}+={pnumber,pname,plocation}
{ssn,pnumber}
+={ssn,ename,pnumber,pname,plocation,hours}
Example
Consider R (A, B, C) and a set of FDs
F = {AB  C, C  B}
Using the Algorithm, we calculate the following
closure sets with respect to F:
A+ = {A},
B+ = {B},
C+ = {C, B} because of FD-2
{AB}+ = {ABC} because of FD-1 add attribute C
{AC}+ = {ACB} because of AC  AB (IR-2) add
attribute B
{BC}+ = {BC} nothing can be added
{ABC}+ = {ABC} nothing can be added
Minimal Cover (F I )
 A set of FDs F is minimal and can be
represented as a set of FDs G if it satisfies the
following conditions:
a) Every FD in G has a single attribute on its right-
hand side, i.e. X  A, where A is a single attribute.
b) No FD can be removed from G and still have a set of
FDs that is equivalent to F.
c) We can not replace any FD: X  A in F with a
dependency Y  A, where Y  A and still have a
set of dependencies that is equivalent to F.
Algorithm to find the Minimal cover
Algorithm MinimalCover(F)
{
Step-1: G = F.
Step-2: Transform G into a set of FD's with right hand side
containing only one attribute (Canonical cover).
Step-3: Eliminate a redundant attribute from left-side.
For each dependency A1, A2, ..., Ak  B in current set of G,
and each attribute Ai in its left-side,
if G - {A1, A2, ..., Ak  B }  {A1, A2, ..., Ai -1, Ai+1,..., Ak  B}
is equivalent to G.
then delete Ai from the left side of A1, A2, ..., Ak  B.
Step-4: Eliminate a redundant dependency
For each dependency X  Y in the current set of dependencies G
if G - {X  Y} is equivalent to G then delete X  Y from G.
}
Given F = {B->A,D->A,AB->D} ,Find the
minimal cover of E.
STEP1:G= {B->A,D->A,AB->D}

STEP 2:All above dependencies are in


canonical form,so we have completed
step2 of alg and proceed to step 3 we
need to determine if AB->D has any
redundant attribute on the LHS,i,e can it
be replaced by A->D OR B->D ?
 2)since B->A by augmenting B on both sides
(IR2) we have BB->AB or B->AB(i)
However,AB->D as given (ii)
3)Hence by the transitive rule(IR3) we get
(i) and (ii) B->D,hence AB->D may be replaced
by B->D
4)We have a set equivalent to original
E say E I:{B->A,D->A,B->D}
No
further reduction is possible in step2 .Since
all FD’s have single attribute on the LHS.
5)In
step3,we look for a redundant FD in E I.
By using the transitive
rule on B->D and D->A
 We derive B->A. Hence B->A is redundant
in E I and can be eliminated .

Hence the minimal cover


of E is
{B->D,D->A}
Example
 Given F = {AB  C, A  D, BD  C, D  BG, AE  F}
Make all FDs with single attributes in the right-side (only
FD-4 need to be split).
Step-1: G = F
Step 2: G = {AB  C, A  D, BD  C, D  B, D  G,AE  F}
Step 3: The only FDs to be considered are
AB  C, BD  C.
For AB  C, we want to find whether A  C or B  C holds or
not.
Because A  D, D  B implies A  B (transitivity).
Because A  B implies A  AB.
Because A  AB, AB  C implies A  C (transitivity).
Thus, G - {AB  C}  {A  C}  G.
Similarly, BD  C can be replaced by D  C, because D  B
implies D  BD, and
D  BD, BD  C implies D  C.

However, AE  F cannot be replaced by any other FD.


The result of Step-3 is given below:
G = {A  C, A  D, D  C, D  B, D  G,
AE  F}
Step-4: A  C can be removed from G, because A  D,
D  C implies A  C.

The minimal cover of original F is:


G = {A  D, D  C, D  B, D  G, AE  F}
Find the Non-redundant cover of FI
R(A,B,C,D,E,H)
F={ABC, BE,ABHBD,DAEH,DHBC}

Find the canonical cover.


FC ={AB, AC , BE , ABHB , ABHD,
DA, DE, DH, DHB, DHC }
Potential FD’s for removing Redundant Fd’s are
AB, AC , BE
ABHB DHC DE,
DHB,
Find the Non-redundant cover of FI
Consider the FD’s
AB
ABHB
……LHS is redundant /Augmentation of AB
DHB
…..LHS is composite, hence remove
Consider the FD’s
 AC
 DHC

…..LHS is composite, hence remove


Consider the FD’s
 BE ……B is dependent and also a determinant, but D is a
determinant of more attributes, hence remove
 DE
FI={AB , AC, DA, DE, DH, ABHD}
Candidate Keys
1: Draw the dependency graph of F. Each vertex corresponds to an
attribute.
Edges can be defined as follows:
A  B becomes A B
A  BC becomes A B
C
AB  C becomes A
B C
2: Identify Vertices Vni that have No_Incoming edges.
3: Identify Vertices Voi that have Only_Incoming edges.
4: A candidate key is a set of attributes that
 contains all attributes in Vni.
 contains no attributes in Voi.
 has no subset that is already a candidate key.
Example - 1
 Consider R (A, B, C, G, H, I), and
F = {A  BC, CG  B HI, B  H}.
H
A C
I
G

Vni = {A, G} – ‘no incoming’


Voi = {H, I} – ‘only incoming’
(AG)+ = {A, B, C, G, H, I},
AG is the only candidate key
F= {A  D, AC, D  C, D  B, D  G, AE  H}

A D B

E C G
H

Vni = {B,C,G,H}
Voi = {A, E}
Candidate keys = AE
Example - 2
Consider R (A, B, C, D, E, H), and
F = {A  B, AB  E, BH  C, C  D, D  A}

A B C D

E H
Vni = {H}
Voi = {E}
Candidate keys = AH, BH, CH, and DH
 BS={ABCD} So to find ck,w.r.t BS
 AH+={AH}
 ={AHB}
 ={AHBE}
 ={AHBEC}
 ={AHBECD}

CK1={AH}

BH+={BH} {BHC}{BHCD}{BHCDA}
{BHCDAE}
CK2={BH}
={CHD}{CHDA}{CHDAB}
{CHADBE}=CK3={CH}
 DH+={DH}
 {DHA}
 {DHAB}{DHABE}{DHABEC}
 CK4=DH
 F={A->C,C->D,D->B,E->F}
 F={CH->G,A->BC,B->CFH,E->A,F->EG}
First normal form: 1NF
A relation schema is in 1NF if all of its attributes are:
 single-valued
 restricted to assuming atomic values,

1NF implies:
 Composite attributes are represented only by their
component attributes
 Attributes cannot have multiple values

In relational database design it is not practically


possible to have a table which is not in 1NF.
B_Code B_Name B_Addr BR_No BR_Name BR_Addr AccNo

AType Bal L_ID LType LAmt C_ID Cname


Prime Vs Non-Prime Attributes
An attribute of a relation R that belongs to any
key of R is said to be a prime attribute and
that which doesn’t is a non-prime attribute

–E.g Report (S#,C#,Title,Lname,Room#,Marks)


 S# is a prime attribute
 C# is a prime attribute
 Title is a non-prime attribute
Second normal form: 2NF
A relation schema R is in 2NF if it is in 1NF and every
non-prime attribute is fully functionally dependent on
every key of R

Consider the relational schema:


 Empdetails( E#, Project#, Role, Number_Of_shares,
Share_worth)
In this,
E#, Project#  Role
E#  Number_Of_shares

Number_of_shares depends only on E# irrespective of


the project currently working on. i.e. partial
dependency
A typical snapshot may look like…
After decomposing…
 Empdetails( E#, Project#, Role,
Loan_amount, Loan_type, Interest_rate)
Becomes
 Emp_Project (E#, P#, Role)
 Emp_Stock (E#, Num_of_Shares,
Share_worth)
This avoids the anomalies that were present in
the original relation Still redundancy remains
due to the transitive dependency
 E#->Num_of_shares
 Num_of_shares->Share_worth
Third normal form:3 NF
A relation schema R is in 3NF if it is in 2NF and
every non-prime attribute is non-transitively
dependent on every key of R

Applying this, the relation


 Emp_Stock
(E#,Num_of_Shares,Share_worth)
Will be decomposed into
 Emp_Stock( E#, Num_of_Shares)
 StockWorth (Num_of_Shares,Share_worth)
Boyce-Codd normal form:BCNF
A relation R is in BCNF if, for every non-trivial
functional dependency AB in it, it is true
that A is a superkey of R In other words,
every determinant is a candidate key
 BCNF is a stronger form of 3NF
 3NF states that every non-prime attribute
must be non-transitively dependent on every
key
BCNF states that every attribute (prime or
non-prime) must be non-transitively
dependent on every key
3NF versus BCNF
 A table is in 3NF if it is in 2NF and there are no transitive
dependencies.
 What about a case in which a non-key attribute is the
determinant of a key attribute?
 This condition does not violate 3NF, but does violate BCNF.
Because, BCNF expects that every determinant in the table be a
candidate key.

FD1:AB  CD
A B C D FD2: C  B

 How to decompose this table and make it BCNF?


 R1 = {A, C,D} and R2 = {C,B}
BCNF
 From the FD: F = {FD1:AB  CD, FD2:C  B} we
know that B is Dependent from FD2.
 Vni={A}
 Voi={ D}
A D
 Candidate Key is

A+ ={A}
AB+ ={A,B,D,C} B C
AC+ ={A,C,B,D}
 Logically R1{A,C,D} is a better choice over R1{A,B,D}
as the join operation is will not generate SPURIOUS
tuples.
Example - 2
 Consider R (City, Street, Zipcode) or R (C, S, Z) and F = {CS 
Z, Z  C}.
The candidate keys for R are CS and ZS (using dependency
graph).
 The relation R is in 3NF (since each attribute is prime) but not
in BCNF, because in Z  C, Z is not a superkey and also it is
not a trivial FD. In R, we cannot store the city to which a
zipcode belongs unless we know a street address with the
zipcode. This introduces insertion anomaly.
 To convert this into BCNF, decompose R into:
R1 = {Z, C} and R2 = {S, Z}
 If we have R2={C,S} as the other table, then
 We can’t have a Foreign key reference to link both the tables.
 Determinants of R2 (i.e.,C,S)does not determine any other attribute
BCNF
 Also, from the FD: F = {FD1:CS  Z, FD2:Z  C}
we know that C is Dependent from FD2.
 Vni={S}
 Voi={ }
 Candidate Key is S
Z
S ={S}
+

SZ+ ={S,Z,C}
C
SC+ ={C,S,Z}
 Logically R1{S,Z} is a better choice over R1{C,S} as
the join operation is will not generate SPURIOUS
tuples.
Example - 3
 Consider the relation GradeList (S, N, C, G}
FD-1: {Name, Course} GPA NC  G
FD-2: {StudentNo, Course}  GPASC  G
FD-3: Name  StudentNo NS
FD-4: StudentNo  Name SN
 Candidate keys are:
{Name, Course} N G
{StudentNo, Course}

C S
 The relation is in 3NF.
 But redundancy of data.
The association between Name and the corresponding
StudentNo is repeated.
- insertion anomaly.
 There exists deletion anomaly too.
(if a student fails in all subjects, looses the student
information!).
 The relation Gradelist is not in BCNF, because
of FD-3 and FD-4 which are nontrivial and
their determinants (left-hand side) are not
super keys of GradeList.
BCNF Checking
 For each FD X  Y in R calculate X+.
 If X+ includes all the attributes of R, then it is in BCNF,
otherwise it is not.
 Eg: Assume R (C, S, Z) and F = {CS  Z, Z  C} that
is not in BCNF.
Attribute Closure:
(CS)+ = (C, S, Z) and
Z+ = (Z, C). The second FD {Z  C} does not include
all attributes and hence it is not in BCNF. So,
decompose R based on 2FD as R2(Z,C) .
Emp_Details relation
E_ssn E_Name E-Dob E-Sal Dno D_name D_loc

P_no P_name E_#of _shares Shares_worth


FD Diagrams
SSN Name BDate Addr DNumber DName DMGRSSN

FD1 FD2

FD1: SSN  {Name, BDate, Addr, DNumber}

FD2: DNumber  {DName, DMgrSSN}


Eg: Consider the relation:
Courses (Dept#, Course#, Lecturer#, Num_Students)
Assumptions:
 Each Department offers may courses
 Course# is unique within a Department only*
 Each Lecturer belongs to one Dept only
 Each Lecturer may handle several courses within the
dept.
 A particular course offered by a department may be
handled by a single lecturer.
* The same course id may refer to a different course
offered by a different department
The functional dependencies
 {Dept#,Course#} Lecturer#
 {Dept#,Course#}  Num-of_students
 {Lecturer#,Course#} Num-of_students
 Lecturer#  Course#

The candidate keys are:


 {Dept#,Course#}
 {Lecturer#,Course#}
A sample table
Observations
 In the table, the only non-prime attribute is Num-
of_students.
 It depends on every key of the table non-transitively
 So, it is in 3NF
 But, the fact that Lecturer L1 belongs to department
D1 is repeated –redundancy
 Lecturer#->Dept#. In this, the attribute Dept# is
only partially dependent on the key

The solution
 Course_Offering (Lecturer#, Course#, Num-of-
Students)
 Lecturer (Lecturer#, Dept#)
Case Study
The HR dept of an organization is planning for a big recruitment drive.
• They wish to organize the data required for the process, in a
database. The data that needs to be captured is as follows:
Functional dependencies
 Enroll#, -> Name,
 Enroll ->Address,
 Enroll ->DOB,
 Enroll -> Gender,
 Enroll -> Phone,
 Enroll -> interviewer
 Interviewer -> Int_Name (transitive dependency)
 Interviewer -> Extension (transitive dependency)
Qualifications
 {Enroll#, qualification, year_of_passing }  awarded_by
 {Enroll#, qualification, year_of_passing }  class
Assumptions:
 A person may acquire the same qualification several times from
the same university (e.g M.A in english, M.A in history)
 Only one degree can be obtained in an year
Functional dependencies
Employment
 Enroll#, Employername,date_joined designation
 Enroll#, Employername, date_joined 

reason_for_Leaving,
 Enroll#, Employername, date_joined  date_left
 Enroll#, Employername, date_joined  last_slary
 Employername  address (partial dependency)
 Employername  telephone (partial dependency)
1NF
 Applicant( Enroll#, Name, Address, DOB,
Gender, Phone, interviewer,Int_Name,
Extension)
 Qualifications( Enroll#, qualification,
year_of_passing, awarded by ,class)
 Employment( Enroll#, Employername,
date_joined, address, telephone,designation,
reason_for_Leaving, date_left, last_slary)
2NF
 Applicant( Enroll#, Name, Address, DOB, Gender,
Phone, interviewer,Int_Name, Extension)
 Qualifications( Enroll#, qualification,
year_of_passing, awarded_by ,class)
 Employment( Enroll#, Employername, date_joined,
designation,reason_for_Leaving,date_left, last_slary )
 Employer( Employername, Address, Phone)

Removal of partial dependencies


3NF
Removal of transitive dependencies

 Applicant( Enroll#, Name, Address, DOB,


Gender, Phone, interviewer)
 Panel( interviewer, Name, Extn)
Example of ?
SupplierId City ProdId
S10 Bangalore P1
S10 Bangalore P2
S20 Chennai P1
S30 Bangalore P1
S30 Bangalore P4
S40 Belgaum P2

SupplierId  City
{SupplierId, ProdId}  City
End of Chapter
Functional Dependencies
Consider the following Relation

REPORT_CARD (REG_NO, COURSE_NO,


StudentName, CourseName, Marks,
Grade)
Description of attributes
 Marks  Scored in Course COURSE# by Student
STUDENT#
 Grade obtained by Student STUDENT# in Course
COURSE#
Identify Functional Dependencies
 Marks is awarded for each Student w.r.t each Course/
subject the student is enrolled for . So we observe the
following Functional dependency
Reg_No, COURSE_No  Marks
 We can get to know the Name of course, if we know
the Course_No. Hence, we observe the following
Functional dependency.
Course_No  CourseName
 We can get to know the Name of course, if we know
the Course_No. Hence, we observe the following
Functional dependency
Reg_No  StudentName
Identify Functional Dependencies
 We can get to know the GRADE of course, if
we know the MARKS obtained in the course.
Hence, we observe the following Functional
dependency.
Marks  Grade
Full dependencies
X ,Y ,Z are attributes of R.
X,Y Functionally determines Z
Note: Subset of X ,Y should not functionally determine Z

Student#

Marks

Course#

X,Y Functionally determines Z


Partial dependencies
X and W are attributes.
Attribute W is partially dependent on the Identifying attribute X,Y . It is
dependent on a sub-set of Identifying attribute X,Y.

We have both the functional dependency valid in our example

Student# Course# CourseName


Course# CourseName
Student# Student Name

So we can say that CourseName is partially dependent on Student#


Course#
Transitive dependencies
X Y and Z are three attributes.
X -> Y
Y-> Z
=> X -> Z
Need for Normalization

Lets observe the data of Online Retail Application Table in a flat file

CustomerDetails ItemDetails PurchaseDetails

1001 John 1500012351 STN001 Pen


Microsoft Office 10 A 5 50
Excel 97-2003 Worksheet

1002 Tom 1200354611 BAK003 Bread 10 A 1 10

1003 Maria 2134724532 GRO001 Potato 20 B 1 20

Each row of the table Represents the information of a customer


who has purchased an item.
Need for Normalization

In this Scenario
 Can we Insert the record of an item which has not been
purchased by any customer?
The table is not to maintain
the record of items but it is to
keep the record of purchase
of item by customers

Can we delete the record of item which has been purchased


by only one customer?
There will be
information Loss
for that item
Need for Normalization

 How many rows we need to update if there is a change in


description of item?
Depends upon the no
of times the item has
been purchased
 How many times we need to store the description of an item
if the same item is purchased many times?

Depends upon the no


of times the item has
been purchased

So we observe the following in the Un Normalized table:


Insert , Delete, Update Anomaly and Data Duplication
Additional Properties
1)Lossless Join or additive Property

2)Dependency Preservation Property


First Normal Form: 1NF
 A relation schema is in 1NF :
if and only if all the attributes of the relation R are

atomic in nature.
 Atomic: the smallest level to which data may be
broken down and remain meaningful.
1NF implies:
 Composite attributes are represented only by their
component attributes
 Attributes cannot have multiple values

In relational database design it is not practically


possible to have a table which is not in 1NF.
Online Retail Application Tables –
1NF Normalized
Observation on Un Normalized Retail Application Table
CustomerDetails ItemDetails PurchaseDetails
1001 John 1500012351 STN001 Pen 10 A 5 50
1002 Tom 1200354611 BAK003 Bread 10 A 1 10
1003 Maria 2134724532 GRO001 Potato 20 B 1 20

Above observation violates 1NF definition


To bring it to 1NF we need to make the columns atomic
QtyPurch
CustomerId CustomerName Accountno ItemId ItemName UnitPrice Class ased NetAmt

1001 John 1500012351 STN001 Pen 10 A 5 50

1002 Tom 1200354611 BAK003 Bread 10 A 1 10

1003 Maria 2134724532 GRO001 Potato 20 B 1 20


Second Normal Form: 2NF
 A Relation is said to be in Second Normal Form if and only if :
 It is in the First normal form, and
 There is FULL DEPENDENCY retained or No partial dependency exists between
non-key attributes and key attributes.
 An attribute of a relation R that belongs to the candidate key of R is
said to be a key attribute and that which doesn’t is a non-key
attribute.
 To make a table 2NF compliant, we have to remove all the partial
dependencies
Second Normal Form : Example
Functional Dependencies of Retail Application Table

RetailApplicationTable( CustomerId, ItemId, CustomerName,


AccountNo,ItemName, UnitPrice, Class,QtyPurchased,
NetAmount)

(i) CustomerId  CustName, AccountNo

(ii) ItemId  ItemName, UnitPrice, Class

(iii) CustId,ItemId QtyPurchased, NetPrice

(iv) UnitPriceClass
Second Normal Form : (Cont..)
Key and Non Key Attributes of Retail Application Table

{CustomerId, ItemId} is Candidate key

Key Attributes: CustomerId,ItemId

Non Key Attribures: CustomerName,


AccountNo,
ItemName,
UnitPrice,
Class,
QtyPurchased,
NetAmount
Second Normal Form : (Cont..)

Fully Functionally dependent on Key Attribute


CustomerId, ItemId QtyPurchased, NetPrice

Partial Dependency with respect to Key Attribute

CustomerId CustomerName,AccountNo

ItemId ItemName,UnitPrice,Class
Second Normal Form : (Cont..)
After removing the Partial dependencies on Key Attributes we get
the below tables which aree in 2NF:

Customer
CustomerId CustomerName Accountno
1001 John 1500012351
1002 Tom 1200354611
1003 Maria 2134724532

Item
ItemId ItemName UnitPrice Class
STN001 Pen 10 A
BAK003 Bread 10 A
GRO001 Potato 20 B
Second Normal Form : (Cont..)
ItemPurchase
CustomerId ItemId QtyPurchased NetAmt
1001 STN001 5 50
1002 BAK003 1 10
1003 GRO001 1 20
Third Normal Form: 3 NF
A relation R is said to be in the Third Normal Form (3NF) if and only if
 It is in 2NF and
 No transitive dependency exists between non-key attributes and key
attributes through another non key attribute.

A B C

It should
be key It should be
Attribute It should be non key
non key attribute
attribute

To make a table 3NF compliant, we have to remove all such Transitive


Dependencies
Third Normal Form : Example
Let us consider Item table obtained after bringing Retail Application
table in to 2NF:

ItemId UnitPrice

UnitPrice Class

ItemId UnitPrice Class

The above dependencies violates 3NF definition


Third Normal Form : (Cont..)
After removing the transitive dependencies from the Item table we
get the following two table

Item
ItemId ItemName UnitPrice
STN001 Pen 10
BAK003 Bread 10
GRO001 Potato 20

ItemClass
UnitPrice Class
10 A
20 B
ename ssn bdate add dno dname dmgrss
n
lots
pid Country Lot# area price Tax_rate
name

pid Countr Lot# area price Tax_rat


y name e
pid countryname Lot# area

area price
Boyce-codd Normal Form: BCNF
 A Relation schema R is in BCNF if
whenever a nontrivial FD X->A
 i)AсX(trivial)

ii)X is a super key of R.


 I,e every determinant in a FD should be a
key
A B C D
3NF versus BCNF
 A table is in 3NF if it is in 2NF and there are no transitive
dependencies.
 What about a case in which a non-key attribute is the
determinant of a key attribute?
 This condition does not violate 3NF, but does violate BCNF.
Because, BCNF expects that every determinant in the table be a
candidate key.

FD1:AB  CD
A B C D FD2: C  B

 How to decompose this table and make it BCNF?


 R1 = {A, C,D} and R2 = {C,B}
BCNF
 From the FD: F = {FD1:AB  CD, FD2:C  B} we
know that B is Dependent from FD2.
 Vni={A}
 Voi={ D}
A D
 Candidate Key is

A+ ={A}
AB+ ={A,B,D,C} B C
AC+ ={A,C,B,D}
 Logically R1{A,C,D} is a better choice over R1{A,B,D}
as the join operation is will not generate SPURIOUS
tuples.
Example - 2
 Consider R (City, Street, Zipcode) or R (C, S, Z) and F = {CS 
Z, Z  C}.
The candidate keys for R are CS and ZS (using dependency
graph).
 The relation R is in 3NF (since each attribute is prime) but not
in BCNF, because in Z  C, Z is not a superkey and also it is
not a trivial FD. In R, we cannot store the city to which a
zipcode belongs unless we know a street address with the
zipcode. This introduces insertion anomaly.
 To convert this into BCNF, decompose R into:
R1 = {Z, C} and R2 = {S, Z}
 If we have R2={C,S} as the other table, then
 We can’t have a Foreign key reference to link both the tables.
 Determinants of R2 (i.e.,C,S)does not determine any other attribute
BCNF
 Also, from the FD: F = {FD1:CS  Z, FD2:Z  C}
we know that C is Dependent from FD2.
 Vni={S}
 Voi={ }
 Candidate Key is S
Z
S ={S}
+

SZ+ ={S,Z,C}
C
SC+ ={C,S,Z}
 Logically R1{S,Z} is a better choice over R1{C,S} as
the join operation is will not generate SPURIOUS
tuples.
Example - 3
 Consider the relation GradeList (S, N, C, G}
FD-1: {Name, Course} GPA NC  G
FD-2: {StudentNo, Course}  GPASC  G
FD-3: Name  StudentNo NS
FD-4: StudentNo  Name SN
 Candidate keys are:
{Name, Course} N G
{StudentNo, Course}

C S
 The relation is in 3NF.
 But redundancy of data.
The association between Name and the corresponding
StudentNo is repeated.
- insertion anomaly.
 There exists deletion anomaly too.
(if a student fails in all subjects, looses the student
information!).
 The relation Gradelist is not in BCNF, because
of FD-3 and FD-4 which are nontrivial and
their determinants (left-hand side) are not
super keys of GradeList.
BCNF Checking
 For each FD X  Y in R calculate X+.
 If X+ includes all the attributes of R, then it is in BCNF,
otherwise it is not.
 Eg: Assume R (C, S, Z) and F = {CS  Z, Z  C} that
is not in BCNF.
Attribute Closure:
(CS)+ = (C, S, Z) and
Z+ = (Z, C). The second FD {Z  C} does not include
all attributes and hence it is not in BCNF. So,
decompose R based on 2FD as R2(Z,C) .
Emp_Details relation
E_ssn E_Name E-Dob E-Sal Dno D_name D_loc

P_no P_name E_#of _shares Shares_worth


FD Diagrams
SSN Name BDate Addr DNumber DName DMGRSSN

FD1 FD2

FD1: SSN  {Name, BDate, Addr, DNumber}

FD2: DNumber  {DName, DMgrSSN}


 R={booktitle,authorname,booktype,listpric
e,affilitation,publication}
F={book-title->booktype,publication,
author name->affiliation
book type->list price}
 Find the key and normalize.
 Find vni={Btitle,Authorname}
 Voi={publication,listprice,affilitation}
 {Btitle}+={Btitle,Btype,pub,listprice}#R
 {Authorname}+={Aname,Affilitation}#R
 {Btitle,Authorname}
+={btitle,btype,pub,listprice,authname,Af
 {Btitle,Aname} is the key

Btitle anam btype list aff pub


e
btitle btype list pub
btitle btype pub

btype listprice

Aname Aff
FD1:AB->CD
FD2:C->B
 A+={ABCD} //OK
 C+={CB} //NOT OK
 So decompose,
 R1(A,C,D)->remove B I.e.RHS of FD2
from FD1 .
 R2(C,B)
2.3 Equivalence of Sets of FDs
 Two sets of FDs F and G are equivalent if:
 Every FD in F can be inferred from G, and
 Every FD in G can be inferred from F
 Hence, F and G are equivalent if F+ =G+
 Definition (Covers):
 F covers G if every FD in G can be inferred from F
 (i.e., if G+ subset-of F+)
 F and G are equivalent if F covers G and G covers
F
 There is an algorithm for checking equivalence of
sets of FDs
Slide 10- 131
Example
 X={A->B,B->C}
 Y={A->B,B->C,A->C}

Check if X covers y y x
 Y covers x x y
 then X=Y

First take Y
Y=A->B Y=A->C A+=ABC
A+=ABC
Y=B->C
B+=BC
 X={AB->CD,B->C,C->D}
 Y={AB->C,AB->D,C->D}

X COVERS Y Y COVERS X
AB->C AB+=ABCD
AB+=ABCD C+=CD
B+=BCD
C+=CD
 F={A->B,A->C}
 G={A->B,B->C}
 F COVERS G G COVERS F
 A->B A->B
 A+=ABC A+=ABC
 B->C
 B+=B
Not equivalent
 F={A->B,B->C,C->A}
 G={C->B,B->A,A->C}

F COVERS G G COVERS F
A->B C->B
A+=ACB C+=ABC
B->C B->A
B+=ABC B+=ABC
C->A A->C
C+=ABC A+=ABC
F=G,THEN THEY ARE EQUIVALENT
R(A,B,C,D,E)
 F={A->B,AB->C,D->AC,D->E}
 G={A->BC,D->AE}
 G COVERS F F COVERS G
 A->BC A->B
 A+=ABC A+=ABC
 D->AE AB+=ABC
 D+=DACEB D+=DAEBC
 F=G
 THEN THEY ARE EQUIVALENT
 R(ABCDEF)
 FD={AB->C,C->DE,E->F,F->A}
 CHECK THE HIGHEST NORMAL FORM?
Merits of Normalization
 Normalization is based on a mathematical

foundation.

 Removes the redundancy to a greater

extent.

 Removes the anomalies present in INSERTs,

UPDATEs and DELETEs.


Demerits of Normalization
 Data retrieval or SELECT operation

performance will be severely affected.

 Normalization might not always represent

real world scenarios.


Summary of Normal Forms
Input Operation Output
Un-normalized Create separate rows or columns
for every combination of multi Table in 1 NF
Table valued columns

Table in 1 NF Eliminate Partial dependencies Tables in 2NF

Tables in 2 NF
Eliminate partial dependency Tables in 3 NF
Summary
 While converting ERD into relational schema, each strong entity becomes a table.

 Each weak entity becomes a table.

 For each M:N relationship new table is created.

 Normalization is a refinement process. It helps in removing anomalies present in


INSERTs/UPDATEs/DELETEs.

 There are three normal forms that were defined being commonly used.

 1NF makes sure that all the attributes are atomic in nature.

 2NF removes the partial dependency.

You might also like