Ch04 Normalization
Normalization
Data Normalization
Formal process of decomposing relations with anomalies to produce smaller, well-structured, and stable relations
Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
Well-Structured Relations
A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
Goal is to avoid (minimize) anomalies
Insertion Anomaly: adding new rows forces the user to create duplicate data
Deletion Anomaly: deleting a row may cause loss of other data representing completely different facts
Modification Anomaly: changing data in a row forces changes to other rows because of duplication
Anomalies in this Table
Insertion: cannot enter a new employee without having the employee take a class
Deletion: if we remove employee 140, we lose information about the existence of a Tax Acc class
Modification: giving a salary increase to employee 100 forces us to update multiple records
First Normal Form (1NF)
Only atomic attributes (simple, single-value)
A primary key has been identified
Every relation is in 1NF by definition
1NF example: Student
StudentId  StuName  CourseId  CourseName  Grade
100        Mike     112       C++         A
100        Mike     111       Java        B
101        Susan    222       Database    A
140        Lorenzo  224       Graphics    B
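As a rough illustration of what "only atomic attributes" means, the sketch below flattens a nested record into 1NF rows. The nested input format and the function name `to_1nf` are assumptions made for this example; only the resulting values come from the Student table.

```python
# Hypothetical unnormalized input: each student carries a repeating group of courses.
unnormalized = [
    {"StudentId": 100, "StuName": "Mike",
     "Courses": [(112, "C++", "A"), (111, "Java", "B")]},
    {"StudentId": 101, "StuName": "Susan",
     "Courses": [(222, "Database", "A")]},
]

def to_1nf(records):
    """Emit one row per (student, course) pair so every attribute holds a single value."""
    rows = []
    for rec in records:
        for course_id, course_name, grade in rec["Courses"]:
            rows.append((rec["StudentId"], rec["StuName"],
                         course_id, course_name, grade))
    return rows

student_1nf = to_1nf(unnormalized)
for row in student_1nf:
    print(row)
```

Note that the student name is now repeated once per course, which is the redundancy the later normal forms address.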
Functional Dependencies
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
A → B reads "Attribute B is functionally dependent on A"
A → B means that if two rows have the same value of A, they necessarily have the same value of B
FDs are determined by semantics: you cannot say that an FD exists just by looking at data, but you can say that it does not exist by looking at data.
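The last point can be sketched in code: sample data can refute a functional dependency but never prove it. The helper name `fd_violated` is an assumption for this example; the rows are those of the Student table in the 1NF example.

```python
STUDENT_COLUMNS = ("StudentId", "StuName", "CourseId", "CourseName", "Grade")
STUDENT_ROWS = [
    (100, "Mike", 112, "C++", "A"),
    (100, "Mike", 111, "Java", "B"),
    (101, "Susan", 222, "Database", "A"),
    (140, "Lorenzo", 224, "Graphics", "B"),
]

def fd_violated(columns, rows, lhs, rhs):
    """Return True if two rows agree on the lhs attributes but differ on the rhs attributes."""
    lhs_idx = [columns.index(c) for c in lhs]
    rhs_idx = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        value = tuple(row[i] for i in rhs_idx)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return True
    return False

# Student 100 has grades A and B, so StudentId -> Grade is refuted by the data.
print(fd_violated(STUDENT_COLUMNS, STUDENT_ROWS, ["StudentId"], ["Grade"]))
# No counterexample for StudentId -> StuName, but that alone does not prove the FD.
print(fd_violated(STUDENT_COLUMNS, STUDENT_ROWS, ["StudentId"], ["StuName"]))
```

A result of `False` only means the sample contains no counterexample; whether the dependency really holds is a question about the meaning of the data.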
Quick Check
Id → Name?
Age → Gender?
Name → Id?
Name, Age → Id?
Functional Dependencies and Keys
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key
An attribute, or a combination of attributes, that uniquely identifies a row in a relation
A combination must be non-redundant: no attribute can be dropped without losing uniqueness
Each non-key field is functionally dependent on every candidate key
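One common way to connect functional dependencies to keys is the attribute closure: a set of attributes is a candidate key if its closure covers the whole relation and no proper subset does. The sketch below assumes three FDs for the Student relation (StudentId → StuName, CourseId → CourseName, and StudentId, CourseId → Grade); these are illustrative assumptions, as are the function names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so combo is redundant
            if closure(combo, fds) == set(all_attrs):
                keys.append(combo)
    return keys

# Assumed FDs for the Student relation (not stated explicitly in the slides).
ATTRS = {"StudentId", "StuName", "CourseId", "CourseName", "Grade"}
FDS = [
    (("StudentId",), ("StuName",)),
    (("CourseId",), ("CourseName",)),
    (("StudentId", "CourseId"), ("Grade",)),
]
print(candidate_keys(ATTRS, FDS))
```

Under these assumed FDs the only candidate key is the pair of StudentId and CourseId. The brute-force search is exponential in the number of attributes, which is fine for classroom-sized relations.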
EmpID → ____________________
EmpID, CourseTitle → ____________________
Practice Exercise #7, page #193
Second Normal Form (2NF)
Functional Dependencies in Student
2NF: Normalizing
How do we remove the partial dependencies? By breaking the relation into more tables, so that each non-key attribute depends on the whole key of its table.
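A minimal sketch of this decomposition, using the Student rows from the 1NF example. The three target relations (`student`, `course`, `enrollment`) and the helper `project` are assumptions for illustration, based on the assumed partial dependencies StudentId → StuName and CourseId → CourseName.

```python
COLUMNS = ("StudentId", "StuName", "CourseId", "CourseName", "Grade")
ROWS = [
    (100, "Mike", 112, "C++", "A"),
    (100, "Mike", 111, "Java", "B"),
    (101, "Susan", 222, "Database", "A"),
    (140, "Lorenzo", 224, "Graphics", "B"),
]

def project(columns, rows, keep):
    """Relational projection: keep the named columns and drop duplicate rows."""
    idx = [columns.index(c) for c in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in idx)
        if projected not in result:
            result.append(projected)
    return result

# Each partial dependency moves into its own table, keyed by its determinant.
student = project(COLUMNS, ROWS, ["StudentId", "StuName"])
course = project(COLUMNS, ROWS, ["CourseId", "CourseName"])
enrollment = project(COLUMNS, ROWS, ["StudentId", "CourseId", "Grade"])

# Joining the three tables back together should reproduce the original rows.
names = dict(student)
titles = dict(course)
rejoined = [(s, names[s], c, titles[c], g) for s, c, g in enrollment]
print(sorted(rejoined) == sorted(ROWS))
```

After the split, Mike's name is stored once in `student` instead of once per course, and the rejoin check suggests that no information was lost for this sample.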
You Try
Third Normal Form
A relation is in 3NF if it is in 2NF and has no transitive dependencies
A transitive dependency exists when a non-key attribute depends on another non-key attribute
Note: This is called transitive because the primary key is a determinant for another attribute, which in turn is a determinant for a third attribute
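The definition above can be turned into a simple check: flag every FD in which both the determinant and the dependent attribute lie outside the primary key. The relation COURSE(CourseId, Classroom, Capacity) and its FDs are hypothetical, chosen only to illustrate the idea, and the check assumes a single candidate key.

```python
def transitive_dependencies(primary_key, fds):
    """Return FDs where a non-key attribute set determines other non-key attributes."""
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        if not (set(lhs) & key) and not (set(rhs) & key):
            found.append((lhs, rhs))
    return found

# Hypothetical relation COURSE(CourseId, Classroom, Capacity) with assumed FDs.
PRIMARY_KEY = ("CourseId",)
FDS = [
    (("CourseId",), ("Classroom",)),
    (("Classroom",), ("Capacity",)),  # non-key determines non-key
]
problems = transitive_dependencies(PRIMARY_KEY, FDS)
print(problems)
```

Each flagged FD suggests a new table keyed by its determinant, here a table of Classroom and Capacity, with Classroom left behind in the original relation as a foreign key.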
3NF Example
3NF Normalization
Classroom Capacity
You Try
Practice Exercise #15, page #196
Insertion anomaly?
Deletion anomaly?
Modification anomaly?
Figure 4-22 Steps in Normalization
Table with multivalued attributes
  Remove multivalued attributes
First normal form (1NF)
  Remove partial dependencies
Second normal form (2NF)
  Remove transitive dependencies
Third normal form (3NF)
  Remove remaining anomalies resulting from multiple candidate keys
Boyce-Codd normal form (BCNF)
Fourth normal form (4NF)
Fifth normal form (5NF)
Further Normalization
Boyce-Codd Normal Form (BCNF)
Slight difference from 3NF
To be in 3NF but not in BCNF, a relation needs two composite candidate keys, with one attribute of one key depending on one attribute of the other
Not very common
If a table contains only one candidate key, 3NF and BCNF are equivalent.
Fourth Normal Form (4NF)
To violate it, a relation needs multivalued dependencies, a generalization of functional dependencies
BCNF Example
Assume that
For each subject, each student is taught by one Instructor
Each Instructor teaches only one subject
Each subject is taught by several Instructors
Course, Student → Instructor
Instructor → Course
Course  Instructor    Student
CS 121  Dr. A. James  Bill Payne
CS 121  Dr. A. James  Tony Perez
CS 121  Dr. A. James  James Atkinson
CS 121  Dr. A. James  Linda Lee
BCNF
Boyce-Codd normal form (BCNF)
A relation is in BCNF if and only if every determinant is a candidate key.
The difference between 3NF and BCNF is that, for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF requires A to be a candidate key.
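The BCNF condition can be sketched with an attribute closure: for each FD, check whether its determinant determines every attribute of the relation. The FDs below are the two from the BCNF example (Course, Student → Instructor and Instructor → Course); the function names are assumptions, and the check tests for a superkey, which is the usual practical form of "every determinant is a candidate key".

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the FDs whose determinant does not determine the whole relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != set(all_attrs)]

ATTRS = {"Course", "Student", "Instructor"}
FDS = [
    (("Course", "Student"), ("Instructor",)),
    (("Instructor",), ("Course",)),
]
violations = bcnf_violations(ATTRS, FDS)
print(violations)
```

Only Instructor → Course is reported: Instructor alone does not determine Student, so it is a determinant that is not a key. The relation is still in 3NF, because Course is part of the candidate key (Course, Student).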
Figure 4-22 Steps in Normalization
Table with multivalued attributes
  Remove multivalued attributes
First normal form (1NF)
  Remove partial dependencies
Second normal form (2NF)
  Remove transitive dependencies
Third normal form (3NF)
  Remove remaining anomalies resulting from multiple candidate keys
Boyce-Codd normal form (BCNF)
  Remove multivalued dependencies
Fourth normal form (4NF)
Fifth normal form (5NF)
4NF
A multivalued dependency exists when
There are at least 3 attributes A, B, C in a relation, and
For each value of A there is a well-defined set of values for B and a well-defined set of values for C,
But the set of values for B is independent of the set of values for C
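The independence condition can be tested on sample data: for each value of A, the (B, C) pairs present must be exactly the cross product of the B values and the C values seen with that A. The function name `mvd_holds` and both sample relations are hypothetical, and the sketch assumes a relation with exactly three attributes.

```python
from itertools import product

def mvd_holds(rows):
    """rows are (a, b, c) triples; check whether B and C vary independently for each A."""
    groups = {}
    for a, b, c in rows:
        groups.setdefault(a, set()).add((b, c))
    for pairs in groups.values():
        b_values = {b for b, _ in pairs}
        c_values = {c for _, c in pairs}
        if pairs != set(product(b_values, c_values)):
            return False
    return True

# Hypothetical data: every instructor of the course is paired with every text.
independent = [
    ("CS 101", "Instructor X", "Text 1"),
    ("CS 101", "Instructor X", "Text 2"),
    ("CS 101", "Instructor Y", "Text 1"),
    ("CS 101", "Instructor Y", "Text 2"),
]
# Hypothetical data: each instructor is tied to one specific text.
tied = [
    ("CS 101", "Instructor X", "Text 1"),
    ("CS 101", "Instructor Y", "Text 2"),
]
print(mvd_holds(independent), mvd_holds(tied))
```

As with functional dependencies, a sample can show that a multivalued dependency fails, but a passing check does not prove that it holds in general.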
4NF Example
Assume that
Each subject is taught by many Instructors
The same books are used in many subjects
Each Instructor uses a different book
Course, Instructor → Text
Course, Text → Instructor
Course  Instructor    Text
CS 121  Dr. A. James  Int to Com Science
CS 121  Dr. P. Hold   Comp Scien Int
Textbook Example
The Normalization Example in the Text Book
Figure 4-24 INVOICE (Pine Valley Furniture Company)
Figure 4-26 INVOICE relation (1NF)
Table with no multivalued attributes and unique rows
Figure 4-27 Functional dependency diagram for INVOICE
2NF
Figure 4-29 Transitive dependencies were removed (3NF)
Getting it into Third Normal Form
Two relations remain
You Try
After learning one of the most important database concepts and theories...
What's next?
User view-1, User view-2, User view-3, ..., User view-N
Logical Model (ERD or E/ERD)
Transformation: (Six) Relations (more relations produced)
NORMALIZATION (up to 3NF) (more tables created)
IMPLEMENTATION