Lecture 06
Schema Refinement and Normalization
Goals of Normalization
Decide whether a particular relation R is in
good form.
In the case that a relation R is not in good
form, decompose it into a set of relations {R1,
R2, ..., Rn} such that
Each relation is in good form
The decomposition is a lossless-join decomposition
Decomposition
Suppose that relation R contains attributes
A1 ... An. A decomposition of R consists of
replacing R by two or more relations such
that:
Each new relation scheme contains a subset of the
attributes of R (and no attributes that do not
appear in R), and
Every attribute of R appears as an attribute of one
of the new relations.
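The two conditions above can be checked mechanically. The following is a minimal sketch; the function name `is_valid_decomposition` and the `salary` attribute are hypothetical and used only for illustration. The attribute names are borrowed from the Employee example in these notes. It checks only the definition of a decomposition, not whether the decomposition is lossless.

```python
def is_valid_decomposition(r_attrs, parts):
    """Return True if `parts` satisfies both conditions for decomposing R."""
    r = set(r_attrs)
    schemas = [set(p) for p in parts]
    # Condition 1: every new scheme uses only attributes that appear in R.
    only_r_attrs = all(s <= r for s in schemas)
    # Condition 2: every attribute of R appears in at least one new scheme.
    covers_r = set().union(*schemas) == r
    # A decomposition replaces R by two or more relations.
    return len(schemas) >= 2 and only_r_attrs and covers_r


R = ["emp_no", "name", "dept_no", "dept_name"]
good = [["emp_no", "name", "dept_no"], ["dept_no", "dept_name"]]
missing_name = [["emp_no", "dept_no"], ["dept_no", "dept_name"]]
extra_attr = [["emp_no", "name", "dept_no", "salary"], ["dept_no", "dept_name"]]

print(is_valid_decomposition(R, good))          # True
print(is_valid_decomposition(R, missing_name))  # False
print(is_valid_decomposition(R, extra_attr))    # False
```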
Example of Redundancy
Decomposition Eliminates Redundancy
To get back the original relation: Natural Join
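As a rough illustration of getting the original relation back with a natural join, the sketch below projects a relation onto two smaller schemes and joins them again. The helper names `project` and `natural_join` are hypothetical, and the sample rows are taken from the Employee example in these notes (without the skills column). Relations are modelled as lists of dictionaries, which is an assumption made for brevity.

```python
def project(rows, attrs):
    """Projection: keep only `attrs` and drop duplicate rows."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]


employee = [
    {"emp_no": 1, "name": "Kevin Jacobs", "dept_no": 201, "dept_name": "R&D"},
    {"emp_no": 2, "name": "Barbara Jones", "dept_no": 224, "dept_name": "IT"},
    {"emp_no": 3, "name": "Jake Rivera", "dept_no": 201, "dept_name": "R&D"},
]

emp = project(employee, ["emp_no", "name", "dept_no"])
dept = project(employee, ["dept_no", "dept_name"])  # "R&D" is now stored once
rejoined = natural_join(emp, dept)

print(len(dept))             # 2
print(rejoined == employee)  # True
```

The join recovers exactly the original rows here because dept_no, the shared attribute, determines dept_name in this data.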
Unnecessary Decomposition
Bad Decomposition
Normalization aims to create relations where every dependency is on
the key, the whole key, and nothing but the key.
First Normal Form 1NF
Is the above relation in 1NF?
First Normal Form 1NF
NewStu
Stuid lastName major credits status socSecNo
S1001 Smith History 90 Senior 100429500
S1003 Jones Math 95 Senior 010124567
S1006 Lee Math, CSC 15 Freshman 088520876
S1060 Jones CSC 25 Freshman 064624738
NewStu(StuId, lastName, major, credits, status, socSecNo)
For each multi-valued attribute, create a new table, in which you
place the key of the original table and the multi-valued attribute.
Keep the original table, with its key.
Another Method
Flatten the original table by making the multi-valued attribute part
of the key.
First Normal Form 1NF
Employee
Unnormalized
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C, Perl, Java
2 Barbara Jones 224 IT Linux, Mac
3 Jake Rivera 201 R&D DB2, Oracle, Java
Employee
Normalized to 1NF
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java
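Both ways of reaching 1NF can be sketched in a few lines. The code below is an illustration only, using the Employee data from the table above; the function names `flatten` and `split_out` are hypothetical, and storing the skills as a Python list is an assumption about how the unnormalized data might be held.

```python
unnormalized = [
    {"emp_no": 1, "name": "Kevin Jacobs", "dept_no": 201, "dept_name": "R&D",
     "skills": ["C", "Perl", "Java"]},
    {"emp_no": 2, "name": "Barbara Jones", "dept_no": 224, "dept_name": "IT",
     "skills": ["Linux", "Mac"]},
    {"emp_no": 3, "name": "Jake Rivera", "dept_no": 201, "dept_name": "R&D",
     "skills": ["DB2", "Oracle", "Java"]},
]


def flatten(rows, multi_attr):
    """Flattening: one output row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            flat.append({**row, multi_attr: value})
    return flat


def split_out(rows, key, multi_attr):
    """New-table method: keep the original table without the multi-valued
    attribute, and put (key, value) pairs in a separate table."""
    base = [{a: v for a, v in row.items() if a != multi_attr} for row in rows]
    child = [{key: row[key], multi_attr: value}
             for row in rows for value in row[multi_attr]]
    return base, child


flat = flatten(unnormalized, "skills")
base, emp_skills = split_out(unnormalized, "emp_no", "skills")

print(len(flat))                   # 8
print(flat[0]["skills"])           # C
print(len(base), len(emp_skills))  # 3 8
```

In the flattened table the key becomes (emp_no, skills); in the split version the name and department values are stored once per employee instead of once per skill.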
Second Normal Form 2NF
CLASS (Course#, StudID, StuLastName, FacID, Sched, Room, Grade)
In a relation R, attribute B of R is fully functionally dependent on
an attribute or set of attributes A of R if B is functionally
dependent on A but not functionally dependent on any proper subset
of A.
StudID → StuLastName (Partial FD)
Grade is fully functionally dependent on Course#, StudID
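The difference between a full and a partial dependency can be explored on sample data. The sketch below is only an illustration: the function names are hypothetical, and the third row's last name is an assumption filled in from StudID → StuLastName. Note that a functional dependency is a rule about every legal instance of a relation, so a check on sample rows can refute a dependency but never prove it.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on `lhs` always agree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False  # a proper subset suffices: partial dependency
    return True


# Sample CLASS rows (only the attributes needed here).
class_rows = [
    {"Course#": "ART103A", "StudID": "S1006", "StuLastName": "Lee", "Grade": "B"},
    {"Course#": "CSC201A", "StudID": "S1003", "StuLastName": "Jones", "Grade": "A"},
    {"Course#": "CSC201A", "StudID": "S1006", "StuLastName": "Lee", "Grade": "C"},
]

key = ["Course#", "StudID"]
print(is_fully_dependent(class_rows, key, ["Grade"]))        # True
print(is_fully_dependent(class_rows, key, ["StuLastName"]))  # False
```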
Second Normal Form 2NF
A relation is in 2NF if and only if it is in 1NF and all the nonkey
attributes are fully functionally dependent on the key.
Second Normal Form 2NF
i. Identify each partial FD
ii. Remove the attributes that depend on each of the determinants so
identified
iii. Place these determinants in separate relations along with their
dependent attributes
iv. In original relation keep the composite key and any attributes
that are fully functionally dependent on all of it
v. Even if the composite key has no dependent attributes, keep that
relation to connect logically the others
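Steps i to v can be sketched as a small function. This is a minimal illustration, not a general normalization tool: the name `to_2nf` is hypothetical, the FDs are assumed to be already grouped by determinant, and the input uses the CLASS relation from these notes with key (Course#, StudID).

```python
def to_2nf(attrs, key, fds):
    """Return a list of (key, attributes) pairs, one per resulting relation.

    `fds` is a list of (determinant, dependents) pairs, assumed to be
    grouped so that each determinant appears only once.
    """
    key = frozenset(key)
    relations = []
    moved = set()
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if lhs < key:                        # step i: determinant is a proper subset of the key
            dependents = set(rhs) - key
            moved |= dependents              # step ii: remove them from the original
            relations.append((set(lhs), set(lhs) | dependents))  # step iii
    # Steps iv and v: the original keeps its composite key plus whatever is
    # left, even if nothing but the key remains.
    relations.append((set(key), set(attrs) - moved))
    return relations


attrs = ["Course#", "StudID", "StuLastName", "FacID", "Sched", "Room", "Grade"]
fds = [
    (["Course#"], ["FacID", "Sched", "Room"]),
    (["StudID"], ["StuLastName"]),
]

for rel_key, rel_attrs in to_2nf(attrs, ["Course#", "StudID"], fds):
    print(sorted(rel_key), sorted(rel_attrs))
```

For the CLASS relation this yields three relations: one keyed on Course#, one keyed on StudID, and the original reduced to Course#, StudID, and Grade.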
Second Normal Form 2NF
CLASS (Course#, StudID, StuLastName, FacID, Sched, Room, Grade)
FDs grouped by determinant:
Course#, StudID → StuLastName, FacID, Sched, Room, Grade
Course# → FacID, Sched, Room
StudID → StuLastName
Create tables grouped by determinants:
First Normal Form Relation
ART103A S1006 Lee F101 MWF9 H221 B
CSC201A S1003 Jones F105 TUTHF10 M110 A
CSC201A S1006 C
HST205A S1001
Second Normal Form Relations
Third Normal Form 3NF
Transitive Dependency
If A, B, and C are attributes of relation R, such that A → B, and
B → C, then C is transitively dependent on A
Example:
NewStudent (stuId, lastName, major, credits, status)
FD: credits → status
By transitivity:
stuId → credits and credits → status implies stuId → status
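Transitive dependencies can be found by computing an attribute closure, the set of all attributes determined by a starting set under the given FDs. The sketch below uses the standard closure algorithm; the function name `closure` is a choice made here, and only the two FDs named in the example above are used.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we also get its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (["stuId"], ["credits"]),
    (["credits"], ["status"]),
]

print(sorted(closure(["stuId"], fds)))    # ['credits', 'status', 'stuId']
print(sorted(closure(["credits"], fds)))  # ['credits', 'status']
```

status appears in the closure of stuId only because of the intermediate step through credits, which is what makes the dependency transitive.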
Third Normal Form 3NF
A relation is in 3NF if it is in 2NF and no nonkey attribute is
transitively dependent on the key
Example
NewStudent (stuId, lastName, major, credits, status)
With FD: credits → status
Is NOT in 3NF
Third Normal Form 3NF
NewStudent
Stuid lastName Major Credits Status
S1001 Smith History 90 Senior
S1003 Jones Math 95 Senior
S1006 Lee CSC 15 Freshman
S1060 Jones CSC 25 Freshman
Transitive Dependency
NewStu2
Stuid lastName Major Credits
S1001 Smith History 90
S1003 Jones Math 95
S1006 Lee CSC 15
S1060 Jones CSC 25
Stats
Credits Status
15 Freshman
25 Freshman
63 Junior
Removed Transitive Dependency
Stages of Normalisation
Unnormalised (UNF)
  Remove repeating groups
First normal form (1NF)
  Remove partial dependencies
Second normal form (2NF)
  Remove transitive dependencies
Third normal form (3NF)
  Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
  Remove multivalued dependencies
Fourth normal form (4NF)
  Remove remaining anomalies
Fifth normal form (5NF)
Unnormalised Normal Form (UNF)
Definition: A relation is unnormalised when it has not had any
normalisation rules applied to it, and it suffers from various anomalies.
This tends to occur only where the relation has been designed
using a bottom-up approach, i.e., by capturing attributes into a
Universal Relation from a screen layout, manual report, manual
document, etc.
Recap up to 3NF
Example
Boyce-Codd Normal Form
A relation is in BCNF if and only if every determinant is a
candidate key
BCNF and 3NF
BCNF: Example
FDs:
office → dept
facName, dept → office, rank, dateHired
facName, office → dept, rank, dateHired
Candidate keys: {facName, dept} and {facName, office}, one of which
is chosen as the primary key
NewFac is not in BCNF because office is a determinant but not a
candidate key
Fac1
office dept
A101 Art
Fac2
facName office rank dateHired
Adams A101 Professor 1975
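The BCNF condition can be tested with attribute closures: an FD causes a violation when its determinant does not determine every attribute of the relation. The following is a sketch under stated assumptions. The function names are hypothetical, the NewFac attribute list is inferred from the FDs above, and the check tests whether each determinant is a superkey, which is the usual way to test the "every determinant is a candidate key" condition for the FDs as listed.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    all_attrs = set(all_attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)            # ignore trivial FDs
            and closure(lhs, fds) != all_attrs]    # determinant must reach everything


new_fac = ["facName", "dept", "office", "rank", "dateHired"]
new_fac_fds = [
    (("office",), ("dept",)),
    (("facName", "dept"), ("office", "rank", "dateHired")),
    (("facName", "office"), ("dept", "rank", "dateHired")),
]
print(bcnf_violations(new_fac, new_fac_fds))  # [(('office',), ('dept',))]

# After decomposition, each relation is checked with the FDs that apply to it.
fac1_fds = [(("office",), ("dept",))]
fac2_fds = [(("facName", "office"), ("rank", "dateHired"))]
print(bcnf_violations(["office", "dept"], fac1_fds))                          # []
print(bcnf_violations(["facName", "office", "rank", "dateHired"], fac2_fds))  # []
```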
Normalization: Example 1
projName projMgr empId hours empName budget startDate salary empMgr empDept rating
Normalization: Example 2
1. Each project has a unique name.
2. Although project names are unique, names of employees and
managers are not.
3. Each project has one manager, whose name is stored in projMgr.
4. Many employees can be assigned to work on each project, and an
employee can be assigned to more than one project. The attribute
hours tells the number of hours per week a particular employee is
assigned to work on a particular project.
5. budget stores the amount budgeted for a project, and startDate
gives the starting date for a project.
6. salary gives the annual salary of an employee.
7. empMgr gives the name of the employee's manager, who might not
be the same as the project manager.
8. empDept gives the employee's department. Department names are
unique. The employee's manager is the manager of the employee's
department.
9. rating gives the employee's rating for a particular project. The
project manager assigns the rating at the end of the employee's
work on the project.
Normalization: Example 2
Functional dependencies
projName → projMgr, budget, startDate
empId → empName, salary, empMgr, empDept
projName, empId → hours, rating
empDept → empMgr
empMgr does not functionally determine empDept since people's
names are not unique (different managers may have the same
name and manage different departments, or a manager may
manage more than one department)
projMgr does not determine projName
Primary Key
projName, empId, since every attribute depends on that
combination
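The choice of primary key can be double-checked with attribute closures: a candidate key must determine every attribute, and no attribute can be dropped from it without losing that property. This is a minimal sketch; the function names `closure` and `is_candidate_key` are hypothetical, while the attributes and FDs are the ones listed above.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(key, all_attrs, fds):
    """True if `key` determines all attributes and is minimal."""
    key, all_attrs = set(key), set(all_attrs)
    if closure(key, fds) != all_attrs:
        return False
    # Minimality: removing any single attribute must break the key.
    return all(closure(key - {a}, fds) != all_attrs for a in key)


attrs = ["projName", "projMgr", "empId", "hours", "empName", "budget",
         "startDate", "salary", "empMgr", "empDept", "rating"]
fds = [
    (["projName"], ["projMgr", "budget", "startDate"]),
    (["empId"], ["empName", "salary", "empMgr", "empDept"]),
    (["projName", "empId"], ["hours", "rating"]),
    (["empDept"], ["empMgr"]),
]

print(is_candidate_key(["projName", "empId"], attrs, fds))           # True
print(is_candidate_key(["empId"], attrs, fds))                       # False
print(is_candidate_key(["projName", "empId", "hours"], attrs, fds))  # False
```

The last call returns False because the set is a superkey but not minimal: hours can be removed and the remaining attributes still determine everything.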
Normalization: Example 2
Transform to:
Proj (projName, projMgr, budget, startDate)
Emp (empId, empName, salary, empMgr, empDept)
Work1 (projName, empId, hours, rating)
Normalization: Example 2
Second Normal Form
Proj
projName projMgr budget startDate
Jupiter Smith 100000 01/15/04
Maxima Lee 200000 03/01/04
Work1
projName empId hours rating
Jupiter E101 25 9
Jupiter E105 40
Jupiter E110 10 8
Maxima E101 15
Maxima E110 30
Maxima E120 15
Emp
empId empName salary empMgr empDept
E101 Jones 60000 Levine 10
E105 Adams 55000 Jones 12
E110 Rivera 43000 Levine 10
E120 Tanaka 45000 Jones 15
Normalization: Example 2