Normalization of Database Tables
CHAPTER 4
Chapter Objectives
Understand concepts of normalization
Learn how to normalize tables
Understand normalization and database design issues
Database Tables and Normalization
Normalization is a process for assigning attributes to entities.
It reduces data redundancies.
An un-normalized relation (table) stores redundant data, which
can cause insertion, deletion, and modification anomalies.
In simple words: Normalization means keeping a single copy
of data in your database.
Normalization theory provides a step-by-step method to
remove redundant data and undesirable table structures.
Normal Forms
Tables are normalized by applying rules to create a series of
normal forms:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce/Codd normal form (BCNF)
Fourth normal form (4NF)
Projection Join normal form (PJNF, aka 5NF)
A table or relation in a higher level normal form always
conforms to the lower level normal forms.
Normal Forms
Diagram of nested sets of relations, from outermost to innermost:
1NF Relations, 2NF Relations, 3NF Relations, BCNF Relations, 4NF Relations, PJ/NF (5NF) Relations
While higher level normal forms are available, normalization up to BCNF is often found to
be adequate for business data.
First Normal Form
A relation is in 1NF if all underlying domains contain atomic
values only, i.e., the intersection of each row and column
contains one and only one value.
The relation must not contain repeating groups.
PNo  PName  ENo  EName      Jcode  ChgHr  Hrs
1    Alpha  101  John Doe   NE     $65    20
            105  Jane Vo    SA     $80    15
            110  Bob Lund   CP     $60    40
2    Beta   101  John Doe   NE     $65    20
            108  Jeb Lee    NE     $65    15
            106  Sara Lee   SA     $80    20
3    Omega  102  Beth Reed  PM     $125   20
            105  Jane Vo    SA     $80    10
Is the above relation in 1NF?
First Normal Form
The previous relation can be converted into first
normal form by adding PNo and PName to each row.
PNo  PName  ENo  EName      Jcode  ChgHr  Hrs
1    Alpha  101  John Doe   NE     $65    20
1    Alpha  105  Jane Vo    SA     $80    15
1    Alpha  110  Bob Lund   CP     $60    40
2    Beta   101  John Doe   NE     $65    20
2    Beta   108  Jeb Lee    NE     $65    15
2    Beta   106  Sara Lee   SA     $80    20
3    Omega  102  Beth Reed  PM     $125   20
3    Omega  105  Jane Vo    SA     $80    10
What is the primary key in this relation?
Do you see redundant data in this table?
What anomalies could be caused?
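The conversion to 1NF and the questions about keys and redundancy can be explored with a short Python sketch. The tuple layout, the names UNNORMALIZED and to_1nf, and the use of None for a blank cell are assumptions made for illustration. Charge rates are stored as plain numbers.

```python
from collections import Counter

# Rows as on the unnormalized slide: PNo and PName appear once per project;
# None marks a blank cell. Project numbers are taken from the 1NF table.
UNNORMALIZED = [
    (1, "Alpha", 101, "John Doe", "NE", 65, 20),
    (None, None, 105, "Jane Vo", "SA", 80, 15),
    (None, None, 110, "Bob Lund", "CP", 60, 40),
    (2, "Beta", 101, "John Doe", "NE", 65, 20),
    (None, None, 108, "Jeb Lee", "NE", 65, 15),
    (None, None, 106, "Sara Lee", "SA", 80, 20),
    (3, "Omega", 102, "Beth Reed", "PM", 125, 20),
    (None, None, 105, "Jane Vo", "SA", 80, 10),
]


def to_1nf(rows):
    """Fill PNo/PName down so that every row carries its own values."""
    filled, pno, pname = [], None, None
    for row in rows:
        if row[0] is not None:
            pno, pname = row[0], row[1]
        filled.append((pno, pname) + row[2:])
    return filled


project_1nf = to_1nf(UNNORMALIZED)

# Candidate key check: the pair (PNo, ENo) should never repeat.
keys = [(row[0], row[2]) for row in project_1nf]
key_is_unique = len(keys) == len(set(keys))

# Redundancy check: employee facts (ENo, EName, Jcode, ChgHr) stored more than once.
copies = Counter(row[2:6] for row in project_1nf)
repeated = {emp: n for emp, n in copies.items() if n > 1}
print(key_is_unique, repeated)
```

Every employee who works on more than one project shows up in repeated, which is where modification anomalies can start: changing a name or a charge rate means touching several rows.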
Functional Dependency Revisited
If A and B are attributes (or groups of attributes) of a relation R,
B is functionally dependent on A (denoted A → B) if each value
of A in R is associated with exactly one value of B in R.
A is called a determinant.
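The definition can be tested against sample data. The sketch below uses the rows of the 1NF Project table; the function name holds and the dict-based row layout are assumptions for illustration. A sample can only refute a dependency, since an FD is a statement about every legal instance of the relation, not just the rows at hand.

```python
COLUMNS = ("PNo", "PName", "ENo", "EName", "JCode", "ChgHr", "Hrs")
DATA = [
    (1, "Alpha", 101, "John Doe", "NE", 65, 20),
    (1, "Alpha", 105, "Jane Vo", "SA", 80, 15),
    (1, "Alpha", 110, "Bob Lund", "CP", 60, 40),
    (2, "Beta", 101, "John Doe", "NE", 65, 20),
    (2, "Beta", 108, "Jeb Lee", "NE", 65, 15),
    (2, "Beta", 106, "Sara Lee", "SA", 80, 20),
    (3, "Omega", 102, "Beth Reed", "PM", 125, 20),
    (3, "Omega", 105, "Jane Vo", "SA", 80, 10),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(a, b) != b:
            return False
    return True


print(holds(rows, ["ENo"], ["EName"]))       # each ENo maps to one EName
print(holds(rows, ["PNo"], ["ENo"]))         # a project has several employees
print(holds(rows, ["PNo", "ENo"], ["Hrs"]))  # composite determinant
```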
Consider the relation
Student (ID, Name, Soc Sec Nbr, Major, Deptmt)
Assume a department offers several majors, e.g., the INSY
department offers the INSY, MASI, and POMA majors.
How many determinants can you identify in Student?
Functional Dependency Revisited
A dependency diagram for Student (attributes shown: ID, Name, Soc_Sec_Nbr, Major, Dept)
Functional Dependency Revisited
Full functional dependency
Attribute B is fully functionally dependent on attribute A if it is
functionally dependent on A and not functionally dependent on
any proper subset of A.
This becomes an issue only with composite keys.
Transitive dependency
If A, B, and C are attributes of a relation such that A → B and B → C,
then C is transitively dependent on A via B (provided that A is
not functionally dependent on B or C).
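Both definitions can be checked mechanically once a set of functional dependencies is written down. The sketch below uses abstract attributes A, B, C, D. The helper names closure and fully_dependent, and the two FD sets, are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fully_dependent(target, determinant, fds):
    """True if target follows from determinant but from no proper subset of it."""
    if target not in closure(determinant, fds):
        return False
    # Closure is monotone, so testing the largest proper subsets is enough.
    return all(target not in closure(determinant - {x}, fds) for x in determinant)


# Transitive dependency: A -> B and B -> C, so C depends on A via B.
chain = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_closure = closure({"A"}, chain)
b_determines_a = "A" in closure({"B"}, chain)  # the proviso: B must not determine A

# Composite determinant (A, B): C needs only A, while D needs both attributes.
composite = [({"A"}, {"C"}), ({"A", "B"}, {"D"})]
print(sorted(a_closure), b_determines_a)
print(fully_dependent("C", {"A", "B"}, composite))
print(fully_dependent("D", {"A", "B"}, composite))
```

Here C is only partially dependent on (A, B), which is the situation 2NF rules out, and the chain A → B → C is the situation 3NF rules out.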
Second Normal Form
Dependency diagram for Project (attributes shown: PNo, PName, ENo, EName, JCode, ChgHr, Hrs)
Second Normal Form
A relation is in 2NF if:
It is in 1NF and
every nonkey attribute is fully dependent on the primary
key, i.e., no partial dependency.
A nonkey attribute is one that is not a primary key or part of a
primary key.
We create new relations that are in 2NF through projection
of the original relation.
Project(PNo, PName)
Employee(ENo, EName, Jcode, ChgHr)
Charge(PNo, ENo, Hrs)
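The three relations above can be derived by looking for attributes that depend on only part of the key. The sketch below assumes the functional dependencies listed in FDS, which are read off the decomposition itself and are not stated as a list in the slides. The helper names are also assumptions.

```python
from itertools import combinations

# Assumed dependencies for the 1NF Project relation with key (PNo, ENo).
FDS = [
    ({"PNo"}, {"PName"}),
    ({"ENo"}, {"EName", "JCode"}),
    ({"JCode"}, {"ChgHr"}),
    ({"PNo", "ENo"}, {"Hrs"}),
]
KEY = {"PNo", "ENo"}
ATTRS = {"PNo", "PName", "ENo", "EName", "JCode", "ChgHr", "Hrs"}


def closure(attrs, fds):
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper, non-empty subset of the key to the nonkey attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - key
            if determined:
                found[subset] = determined
    return found


partials = partial_dependencies(KEY, FDS)
# Whatever no part of the key determines stays with the full key.
moved = set().union(*partials.values())
stays_with_key = ATTRS - KEY - moved
print(partials)
print(stays_with_key)
```

Each entry in partials becomes its own relation (Project and Employee), and the full key together with stays_with_key gives Charge.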
2NF
Dependency diagrams for the 2NF relations (attribute groups shown: PNo, PName; ENo, EName, JCode, ChgHr; PNo, ENo, Hrs)
Second Normal Form
Tables in 2NF
Project
PNo  PName
1    Alpha
2    Beta
3    Omega
Employee
ENo  EName      JCode  ChgHr
101  John Doe   NE     $65
102  Beth Reed  PM     $125
105  Jane Vo    SA     $80
106  Sara Lee   SA     $80
108  Jeb Lee    NE     $65
110  Bob Lund   CP     $60
Charge
PNo  ENo  Hrs
1    101  20
1    105  15
1    110  40
2    101  20
2    108  15
2    106  20
3    102  20
3    105  10
Second Normal Form
Note that the original relation can be recreated through a natural
join of the new relations.
Thus, no information is lost in the process of creating 2NF
relations from a 1NF relation. This is called nonloss
decomposition.
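Nonloss decomposition can be verified on the sample data: project the 1NF relation onto the three attribute sets, join the projections back, and compare with the original. The helpers project, natural_join, and as_set are assumptions written for this sketch and do not come from any library.

```python
COLUMNS = ("PNo", "PName", "ENo", "EName", "JCode", "ChgHr", "Hrs")
DATA = [
    (1, "Alpha", 101, "John Doe", "NE", 65, 20),
    (1, "Alpha", 105, "Jane Vo", "SA", 80, 15),
    (1, "Alpha", 110, "Bob Lund", "CP", 60, 40),
    (2, "Beta", 101, "John Doe", "NE", 65, 20),
    (2, "Beta", 108, "Jeb Lee", "NE", 65, 15),
    (2, "Beta", 106, "Sara Lee", "SA", 80, 20),
    (3, "Omega", 102, "Beth Reed", "PM", 125, 20),
    (3, "Omega", 105, "Jane Vo", "SA", 80, 10),
]
project_1nf = [dict(zip(COLUMNS, r)) for r in DATA]


def project(rows, cols):
    """Relational projection: keep cols and drop duplicate rows."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


def natural_join(left, right):
    """Join two relations on every column name they share."""
    if not left or not right:
        return []
    shared = [c for c in left[0] if c in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in shared)]


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


project_rel = project(project_1nf, ("PNo", "PName"))
employee_rel = project(project_1nf, ("ENo", "EName", "JCode", "ChgHr"))
charge_rel = project(project_1nf, ("PNo", "ENo", "Hrs"))

rejoined = natural_join(natural_join(charge_rel, project_rel), employee_rel)
lossless = as_set(rejoined) == as_set(project_1nf)
print(len(project_rel), len(employee_rel), len(charge_rel), lossless)
```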
If a relation that is in 1NF has a noncomposite primary key
(i.e., the primary key consists of a single attribute), what can
you say about its status with regard to 2NF?
Do you see any redundant data in the tables that are in 2NF?
What anomalies could be caused by such redundancy?
Third Normal Form
A relation is in 3NF if:
It is in 2NF and
every nonkey attribute is nontransitively dependent on the
primary key (i.e., no transitive dependency).
Relation Employee has a transitive dependency:
ENo → JCode → ChgHr
Employee can be replaced by two relations that are in 3NF:
Employee(ENo, EName, JCode)
Job(JCode, ChgHr)
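The effect of removing the transitive dependency can be sketched as follows. The dict-based layout, the helper rate_for, and the new NE rate of 70 are assumptions for illustration only. The starting data is the 2NF Employee table.

```python
employee_2nf = [
    {"ENo": 101, "EName": "John Doe", "JCode": "NE", "ChgHr": 65},
    {"ENo": 102, "EName": "Beth Reed", "JCode": "PM", "ChgHr": 125},
    {"ENo": 105, "EName": "Jane Vo", "JCode": "SA", "ChgHr": 80},
    {"ENo": 106, "EName": "Sara Lee", "JCode": "SA", "ChgHr": 80},
    {"ENo": 108, "EName": "Jeb Lee", "JCode": "NE", "ChgHr": 65},
    {"ENo": 110, "EName": "Bob Lund", "JCode": "CP", "ChgHr": 60},
]

# Employee(ENo, EName, JCode): drop the transitively dependent attribute.
employee = [{k: r[k] for k in ("ENo", "EName", "JCode")} for r in employee_2nf]

# Job(JCode, ChgHr): one entry per job code.
job = {}
for r in employee_2nf:
    # The split is only valid if JCode -> ChgHr holds in the data.
    if job.setdefault(r["JCode"], r["ChgHr"]) != r["ChgHr"]:
        raise ValueError("JCode does not determine ChgHr")
job_before = dict(job)


def rate_for(eno):
    """Look up an employee's charge rate through the job code."""
    jcode = next(e["JCode"] for e in employee if e["ENo"] == eno)
    return job[jcode]


# In the 2NF table a rate change for NE touches every NE employee row.
rows_touched_2nf = sum(1 for r in employee_2nf if r["JCode"] == "NE")

# In 3NF the same change is a single update (70 is a made-up new rate).
job["NE"] = 70
print(rows_touched_2nf, rate_for(101), rate_for(108), rate_for(105))
```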
3NF
Dependency diagrams for the 3NF relations (attribute groups shown: PNo, PName; JCode, ChgHr; PNo, ENo, Hrs; ENo, EName, JCode)
Third Normal Form
Tables in 3NF
Project
PNo  PName
1    Alpha
2    Beta
3    Omega
Employee
ENo  EName      JCode
101  John Doe   NE
102  Beth Reed  PM
105  Jane Vo    SA
106  Sara Lee   SA
108  Jeb Lee    NE
110  Bob Lund   CP
Job
JCode  ChgHr
CP     $60
NE     $65
PM     $125
SA     $80
Charge
PNo  ENo  Hrs
1    101  20
1    105  15
1    110  40
2    101  20
2    108  15
2    106  20
3    102  20
3    105  10
Boyce-Codd Normal Form
A relation is in BCNF if
every determinant is a candidate key.
A determinant is an attribute (or combination of attributes) on which some
other attribute is fully functionally dependent.
BCNF is a special case of 3NF.
The potential to violate BCNF may occur in a relation that:
contains two (or more) composite candidate keys, and
in which these keys overlap and share at least one attribute.
Thus, if a table contains only one candidate key or only noncomposite keys, then 3NF and BCNF are equivalent.
Figure 4.7: 3NF Table Not in BCNF
Figure 4.8: Decomposition of Table Structure to Meet BCNF
Boyce-Codd Normal Form
Consider the following example:
The members of a recruiting team interview candidates on a one-to-one basis. Each member is assigned a particular room on a
given date. Each candidate is interviewed only once on a
specific date. He/she may return for follow-up interviews on
later dates.
Interview (CID, IDate, ITime, StaffID, RmNo)
CID  IDate    ITime  StaffID  RmNo
C01  8-22-99  10:00  S01      B107
C02  8-22-99  11:00  S01      B107
C03  8-22-99  10:00  S05      B108
C01  8-29-99  3:00   S06      B108
Boyce-Codd Normal Form
This relation has the following functional dependencies:
CID, IDate → ITime, StaffID, RmNo
StaffID, IDate, ITime → CID, RmNo
RmNo, IDate, ITime → StaffID, CID
StaffID, IDate → RmNo
This relation does not have any partial or transitive
dependencies on the primary key (CID, IDate).
It is not in BCNF because (StaffID, IDate) is a determinant but
not a candidate key.
The new relations in BCNF are:
Interview (CID, IDate, ITime, StaffID)
Room(StaffID, IDate, RmNo)
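The BCNF test can be run mechanically: a relation violates BCNF when some set of its attributes determines something new without determining the whole relation. The sketch below encodes the four dependencies of the Interview example. The helper names closure and bcnf_violations are assumptions, and the brute-force search over attribute subsets is only practical for small relations like this one.

```python
from itertools import combinations

FDS = [
    ({"CID", "IDate"}, {"ITime", "StaffID", "RmNo"}),
    ({"StaffID", "IDate", "ITime"}, {"CID", "RmNo"}),
    ({"RmNo", "IDate", "ITime"}, {"StaffID", "CID"}),
    ({"StaffID", "IDate"}, {"RmNo"}),
]


def closure(attrs, fds):
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Determinants inside relation that are not superkeys of it."""
    relation = frozenset(relation)
    bad = []
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = frozenset(combo)
            determined = closure(x, fds) & relation
            # Nontrivial (adds an attribute) but does not reach every attribute.
            if determined != x and determined != relation:
                bad.append(x)
    return bad


original = {"CID", "IDate", "ITime", "StaffID", "RmNo"}
interview = {"CID", "IDate", "ITime", "StaffID"}
room = {"StaffID", "IDate", "RmNo"}

print(bcnf_violations(original, FDS))
print(bcnf_violations(interview, FDS), bcnf_violations(room, FDS))
```

Only (StaffID, IDate) is reported for the original relation, and neither of the two new relations has a violating determinant.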
Dependency Diagram
Figs 1-3: dependency diagrams over the attributes CID, IDate, ITime, StaffID, RmNo
Fourth Normal Form
A table is in 4NF if
it is in 3NF and
has no multiple sets of multivalued dependencies.
Consider the following example:
Each course is taught by many teachers and requires many texts.
CTXU (Unnormalized)
Course   Teacher  Text
Physics  Green    Basic Mechanics
         Brown    Intro to Optics
Math     White    Modern Algebra
                  Intro to Calculus
CTXN (Normalized)
Course   Teacher  Text
Physics  Green    Basic Mechanics
Physics  Green    Intro to Optics
Physics  Brown    Basic Mechanics
Physics  Brown    Intro to Optics
Math     White    Modern Algebra
Math     White    Intro to Calculus
Fourth Normal Form
CTXN is in BCNF, because it is all key and there are no other
functional dependencies.
It, however, has redundant data that could cause update
anomalies.
This table shows two multivalued dependencies:
Each course has a defined set of teachers: Course →→ Teacher
Each course has a defined set of textbooks: Course →→ Text
MVDs can exist only when the relation has at least three
attributes.
An FD is a special case of MVD when the set of dependent
values has a single value.
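A multivalued dependency can be checked on sample data: for each course, every teacher must appear with every text. The sketch below tests this on the CTXN rows. The function name mvd_holds and the use of column indexes are assumptions for illustration.

```python
from itertools import product

CTXN = [
    ("Physics", "Green", "Basic Mechanics"),
    ("Physics", "Green", "Intro to Optics"),
    ("Physics", "Brown", "Basic Mechanics"),
    ("Physics", "Brown", "Intro to Optics"),
    ("Math", "White", "Modern Algebra"),
    ("Math", "White", "Intro to Calculus"),
]


def mvd_holds(rows, x, y, z):
    """Check X ->> Y, where x, y, z are column indexes and z is the rest."""
    groups = {}
    for row in rows:
        key = tuple(row[i] for i in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        yv = tuple(row[i] for i in y)
        zv = tuple(row[i] for i in z)
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    # Within each X group, the (Y, Z) pairs must be the full cross product.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


course_teacher = mvd_holds(CTXN, [0], [1], [2])  # Course ->> Teacher
course_text = mvd_holds(CTXN, [0], [2], [1])     # Course ->> Text

# Dropping one Physics row breaks the pattern: Green no longer has both texts.
broken = CTXN[1:]
print(course_teacher, course_text, mvd_holds(broken, [0], [1], [2]))
```

The last check hints at the update anomaly: a single insert or delete in CTXN can leave the table inconsistent with the rule that teachers and texts are independent.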
Fourth Normal Form
Tables in 4NF
CT
Course   Teacher
Physics  Green
Physics  Brown
Math     White
CX
Course   Text
Physics  Basic Mechanics
Physics  Intro to Optics
Math     Modern Algebra
Math     Intro to Calculus
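As a quick check, joining CT and CX on Course should give back the six rows of CTXN, so the split loses no information. The sketch below also counts how many rows a new Physics text would need in each design. The variable names are assumptions.

```python
CT = [("Physics", "Green"), ("Physics", "Brown"), ("Math", "White")]
CX = [
    ("Physics", "Basic Mechanics"),
    ("Physics", "Intro to Optics"),
    ("Math", "Modern Algebra"),
    ("Math", "Intro to Calculus"),
]
CTXN = [
    ("Physics", "Green", "Basic Mechanics"),
    ("Physics", "Green", "Intro to Optics"),
    ("Physics", "Brown", "Basic Mechanics"),
    ("Physics", "Brown", "Intro to Optics"),
    ("Math", "White", "Modern Algebra"),
    ("Math", "White", "Intro to Calculus"),
]

# Natural join of CT and CX on the shared Course column.
rejoined = [(course, teacher, text)
            for course, teacher in CT
            for course2, text in CX
            if course == course2]
same_rows = sorted(rejoined) == sorted(CTXN)

# Adding one Physics text: CTXN needs a row per Physics teacher, CX needs one row.
rows_needed_ctxn = sum(1 for course, _ in CT if course == "Physics")
rows_needed_cx = 1
print(same_rows, rows_needed_ctxn, rows_needed_cx)
```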
Conversion to 4NF
Figure 4.14: Multivalued Dependencies
Figure 4.15: Set of Tables in 4NF
Normalization and Database Design
Normalization should be part of the design process
E-R Diagram provides macro view
Normalization provides micro view of entities
Focuses on characteristics of specific entities
May yield additional entities
Difficult to separate normalization from E-R diagramming
Business rules must be determined
Normalization purity is difficult to sustain due to conflict
in:
Design efficiency
Information requirements
Processing
Denormalization
Normalized (decomposed) tables require additional processing,
thus reducing system speed.
Sometimes full normalization is deliberately not carried out, because of
processing speed requirements and practical aspects of the
situation.
A good example is: storing Zip code and City as attributes in a
Customer relation violates 3NF because City is transitively
dependent on Cust ID via Zip Code.
Why should we not create a separate relation
ZIP (ZipCode, City)?