1
06 Normalization
CT12F3 Database Modelling And Implementation
2
Learning Outcomes
Students can explain normalization steps
3
Topics
Intro to Normalization
Data Anomalies
Functional Dependency & Keys
Normalization Steps
4
Intro to Normalization
5
Background
• Transforming ER diagrams into relations typically results in well-structured
relations. However, following those steps does not guarantee that all anomalies or
redundancies are removed.
• Normalization is a formal process, used in logical data modelling, for deciding
which attributes should be grouped together in a relation so that anomalies are
removed and smaller, well-structured relations (from an organization-wide
view) are produced.
6
Normalization Goals
• Minimize data redundancy, thereby avoiding anomalies and conserving storage
space.
• Simplify the enforcement of referential integrity constraints.
A referential integrity constraint is a rule that maintains consistency among the
rows of two relations. The rule states that if there is a foreign key in one relation,
either each foreign key value must match a primary key value in another relation or
the foreign key value must be null.
• Make it easier to maintain data (insert, update, and delete).
• Provide a better design that is an improved representation of the real world and a
stronger basis for future growth
7
Well-Structured Relations
• A well-structured relation contains
minimal redundancy and allows
users to insert, modify, and delete
the rows in a table without errors or
inconsistencies.
• EMPLOYEE1 is such a relation.
Each row of the table contains data
describing one employee, and any
modification to an employee’s data
(such as a change in salary) is
confined to one row of the table.
Figure 1. EMPLOYEE1 relation
8
Well-Structured Relations (2)
Figure 2. EMPLOYEE2 relation
• In contrast, EMPLOYEE2 is not a well-structured relation.
• If you examine the sample data in the table, you will notice considerable redundancy.
• For example, values for EmpID, Name, DeptName, and Salary appear in two separate rows for
employees 100, 110, and 150.
• Consequently, if the salary for employee 100 changes, we must record this fact in two rows.
9
Data Anomalies
10
Data Anomalies
• Redundancies in a table may result in errors or inconsistencies (called anomalies)
when a user attempts to change data in the table
• Three types of anomalies
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
11
Insertion Anomaly
Figure 2. EMPLOYEE2 relation
• Suppose that we need to add a new employee to EMPLOYEE2.
• The primary key for this relation is the combination of EmpID and CourseTitle. Therefore, to insert
a new row, the user must supply values for both EmpID and CourseTitle (because primary key
values cannot be null or nonexistent).
• This is an anomaly because the user should be able to enter employee data without supplying course
data.
12
Deletion Anomaly
Figure 2. EMPLOYEE2 relation
• Suppose that the data for employee number 140 are deleted from the table.
• This will result in losing the information that this employee completed a course (Tax Acc) on
12/8/2015.
• In fact, it results in losing the information that this course had an offering that completed on that
date.
13
Modification Anomaly
Figure 2. EMPLOYEE2 relation
• Suppose that employee number 100 gets a salary increase.
• We must record the increase in each of the rows for that employee (two occurrences in the Figure);
otherwise, the data will be inconsistent.
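The modification anomaly can be sketched in a few lines of Python. The rows below are hypothetical values in the shape of EMPLOYEE2 (they are not copied from Figure 2), and the helper name inconsistent_employees is an assumption made for illustration.

```python
# Hypothetical rows shaped like EMPLOYEE2; names, salaries, and most dates
# are illustrative and are not taken from Figure 2.
employee2 = [
    {"EmpID": 100, "Name": "Employee A", "DeptName": "Marketing", "Salary": 48000,
     "CourseTitle": "SPSS", "DateCompleted": "6/19/2015"},
    {"EmpID": 100, "Name": "Employee A", "DeptName": "Marketing", "Salary": 48000,
     "CourseTitle": "Surveys", "DateCompleted": "10/7/2015"},
    {"EmpID": 140, "Name": "Employee B", "DeptName": "Accounting", "Salary": 52000,
     "CourseTitle": "Tax Acc", "DateCompleted": "12/8/2015"},
]


def inconsistent_employees(rows):
    """Return the EmpIDs that are stored with more than one salary value."""
    salaries = {}
    for row in rows:
        salaries.setdefault(row["EmpID"], set()).add(row["Salary"])
    return sorted(emp for emp, values in salaries.items() if len(values) > 1)


# A careless update changes only the first row found for employee 100.
for row in employee2:
    if row["EmpID"] == 100:
        row["Salary"] = 50000
        break

print(inconsistent_employees(employee2))  # prints [100]
```

Because the salary is stored once per completed course, the single-row update leaves employee 100 with two different salaries, which is exactly the inconsistency described above.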
14
Data Anomalies
• These anomalies indicate that EMPLOYEE2 is not a well-structured relation.
• The problem with this relation is that it contains data about two entities:
EMPLOYEE and COURSE. We will use normalization theory to divide
EMPLOYEE2 into two relations.
• One of the resulting relations is EMPLOYEE1. The other we will call
EMP COURSE.
• The primary key of EMP COURSE is the combination of EmpID and CourseTitle, and
we underline these attribute names to highlight this fact.
• Examine the figure to verify that EMP COURSE is free of the types of anomalies
described previously and is therefore well structured.
15
Example of Normalization Result
EMP COURSE
16
Functional Dependency & Keys
17
Background
• A functional dependency is a constraint between two attributes in
which the value of one attribute is determined by the value of another
attribute.
• The normal forms that we will discuss are based on the analysis of
functional dependencies.
18
Definition
• For any relation R, attribute B is functionally dependent on attribute A if, for every
valid instance of A, that value of A uniquely determines the value of B (Dutka and
Hanson, 1989).
• The functional dependency of B on A is represented by an arrow, as follows:
A → B.
• A functional dependency is not a mathematical dependency: B cannot be computed
from A.
• Rather, if you know the value of A, there can be only one value for B. An attribute
may be functionally dependent on a combination of two (or more) attributes rather
than on a single attribute.
19
Example
Assuming Staff_No → Position, it can be stated
that Position is functionally dependent on Staff_No.
That is:
The value of Position can be determined with certainty if
the value of Staff_No is known.
Because one Position can be occupied by more than one
staff member, Position → Staff_No is incorrect. In other
words, Staff_No is not functionally dependent on
Position. That is: the value of Staff_No cannot be determined if
the value of Position is known.
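As a minimal sketch, the check "one value of A allows only one value of B" can be run against sample rows. The function name fd_holds and the staff rows are hypothetical. Note that sample data can only refute a functional dependency; whether it truly holds is a business rule, not something a few rows can prove.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical staff rows: two staff members share the position "Lecturer".
staff = [
    {"Staff_No": "S01", "Position": "Lecturer"},
    {"Staff_No": "S02", "Position": "Lecturer"},
    {"Staff_No": "S03", "Position": "Dean"},
]

print(fd_holds(staff, ["Staff_No"], ["Position"]))  # prints True
print(fd_holds(staff, ["Position"], ["Staff_No"]))  # prints False
```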
20
Example (2)
Typical examples of functional dependencies are the following:
• SSN → Name, Address, Birthdate: A person’s name, address, and birth date are
functionally dependent on that person’s Social Security number (in other words,
there can be only one Name, one Address, and one Birthdate for each SSN).
• VIN → Make, Model, Color: The make, model, and the original color of a vehicle
are functionally dependent on the vehicle identification number (as above, there can
be only one value of Make, Model, and Color associated with each VIN).
• ISBN → Title, FirstAuthorName, Publisher: The title of a book, the name of the
first author, and the publisher are functionally dependent on the book’s international
standard book number (ISBN).
21
Example (3)
• Consider the relation EMP COURSE (EmpID,
CourseTitle, DateCompleted) shown in the figure.
• We represent the functional dependency in this relation
as follows:
EmpID, CourseTitle → DateCompleted
• The comma between EmpID and CourseTitle stands for
the logical AND operator, because DateCompleted is
functionally dependent on EmpID and CourseTitle in
combination.
• The functional dependency in this statement implies that
the date when a course is completed is determined by
the identity of the employee and the title of the course.
22
Determinant
• The attribute on the left side of the arrow in a functional dependency is
called a determinant.
• SSN, VIN, and ISBN are determinants in the preceding three examples.
• In the EMP COURSE relation, the combination of EmpID and
CourseTitle is a determinant.
23
Candidate Keys
• A candidate key is an attribute, or combination of attributes, that
uniquely identifies a row in a relation.
• A candidate key must satisfy the following properties (Dutka and
Hanson, 1989):
• Unique identification: For every row, the value of the key must uniquely identify
that row. This property implies that each nonkey attribute is functionally
dependent on that key.
• Nonredundancy: No attribute in the key can be deleted without destroying the
property of unique identification.
24
Example (4)
• The EMPLOYEE1 relation has the following schema:
EMPLOYEE1(EmpID, Name, DeptName, Salary).
• EmpID is the only determinant in this relation.
• All of the other attributes are functionally dependent on EmpID.
• Therefore, EmpID is a candidate key and (because there are no other candidate
keys) also is the primary key.
25
Example (5)
• For the relation EMPLOYEE2, EmpID does not uniquely identify a row in the relation. For example, there are two rows in the
table for EmpID number 100.
• There are two types of functional dependencies in this relation:
1. EmpID → Name, DeptName, Salary
2. EmpID, CourseTitle → DateCompleted
• The functional dependencies indicate that the combination of EmpID and CourseTitle is the only candidate key (and therefore
the primary key) for EMPLOYEE2.
• In other words, the primary key of EMPLOYEE2 is a composite key.
26
Representing Functional Dependencies
27
Candidate Key & Determinant
• A candidate key is always a determinant, whereas a determinant may or may not
be a candidate key.
• For example, in EMPLOYEE2, EmpID is a determinant but not a candidate key.
• A candidate key is a determinant that uniquely identifies the remaining (nonkey)
attributes in a relation.
• A determinant may be a candidate key (such as EmpID in EMPLOYEE1), part of a
composite candidate key (such as EmpID in EMPLOYEE2), or a nonkey attribute.
28
Example (6)
SID         StdName             CourseCode  CourseName                  Address        Lecturer  Lounge
1103156000  Thomas Alfa Edison  CSH2C3      Pemodelan Basis Data        Jl. Sukabirus  VRE       F202
1103156000  Thomas Alfa Edison  CSG2H3      Pemrograman Berbasis Objek  Jl. Sukabirus  MTD       F211
1103156002  Galileo Galilei     CSH2C3      Pemodelan Basis Data        Jl. Buah Batu  NDN       F211
Based on the table above, the following FDs can be defined:
SID → StdName
SID → Address
CourseCode → CourseName
Lecturer → Lounge
SID, CourseCode → Lecturer
29
How do we make sure that a determinant
is a candidate key?
30
Requirements
• A set of attributes whose attribute closure is the set of all attributes of
the relation is called a super key of the relation.
• A minimal set of attributes whose attribute closure is the set of all
attributes of the relation is called a candidate key of the relation.
31
Closure of Functional Dependencies
• Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• If A → B and B → C, then we can infer that A → C
• etc.
• The set of all functional dependencies logically implied by F is the
closure of F.
• We denote the closure of F by F+.
32
Closure of Functional Dependencies
• We can compute F+, the closure of F, by repeatedly applying Armstrong’s Axioms:
• Reflexivity rule : if β ⊆ α, then α → β
• Augmentation rule : if α → β, then γα → γβ
• Transitivity rule : if α → β and β → γ, then α → γ
• These rules are
• Sound -- generate only functional dependencies that actually hold, and
• Complete -- generate all functional dependencies that hold.
33
Example of F+
R = (A, B, C, G, H, I)
F = {A → B
A → C
CG → H
CG → I
B → H}
Some members of F+
A → H
by transitivity from A → B and B → H
AG → I
by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
CG → HI
by augmenting CG → I to infer CG → CGI, and augmenting CG → H to infer CGI → HI, and
then transitivity
34
Closure of Functional Dependencies (cont.)
Additional rules:
• Union rule:
If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule:
If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule:
If α → β holds and γβ → δ holds, then αγ → δ holds.
The above rules can be inferred from Armstrong’s axioms.
35
Example of F+ (2)
R = (A, B, C, G, H, I)
F = {A → B
A → C
CG → H
CG → I
B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Is AG a candidate key?
Is AG a super key?
Does AG → R? == Is R ⊆ (AG)+
Is any subset of AG a superkey?
Does A → R? == Is R ⊆ (A)+
Does G → R? == Is R ⊆ (G)+
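The attribute-closure computation above can be sketched as a short fixed-point loop. This is an illustrative implementation, assuming single-letter attribute names so that a string such as "CG" stands for the set {C, G}; the function name closure is an assumption.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when lhs is already covered and rhs adds something
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("AG", F)))  # prints ['A', 'B', 'C', 'G', 'H', 'I']
print(closure("AG", F) == R)     # prints True: AG is a super key
print(closure("A", F) == R)      # prints False
print(closure("G", F) == R)      # prints False
```

Since AG is a super key and neither A nor G alone is one, AG is a candidate key of R.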
36
Continuing from Example (6)
SID StdName CourseCode CourseName Address Lecturer Lounge
Based on the rules explained, the following FDs can be derived:
• SID, CourseCode → SID (Rule: Subset/Reflexivity)
• SID, CourseCode → CourseCode (Rule: Subset/Reflexivity)
• CourseCode, Lecturer → CourseName, Lecturer (Rule: Augmentation, adding the Lecturer attribute to
both sides of CourseCode → CourseName)
• SID, CourseCode → Lounge (Rule: Transitivity)
from SID, CourseCode → Lecturer and Lecturer → Lounge
• SID → StdName, Address (Rule: Union)
from SID → StdName and SID → Address
• and so on
37
Normalization Steps
38
Intro
• Now that we have examined functional dependencies and keys, we are ready to
describe and illustrate the steps of normalization.
• If an EER data model has been transformed into a comprehensive set of relations
for the database, then each of these relations needs to be normalized.
• In other cases in which the logical data model is being derived from user interfaces,
such as screens, forms, and reports, you will want to create relations for each user
interface and normalize those relations.
39
Steps in Normalization
40
Steps in Normalization (cont.)
1. First normal form: Any multivalued attributes (also called repeating groups) have been removed, so
there is a single value (possibly null) at the intersection of each row and column of the table.
2. Second normal form: Any partial functional dependencies have been removed (i.e., nonkey attributes
are identified by the whole primary key).
3. Third normal form: Any transitive dependencies have been removed (i.e., nonkey attributes are
identified by only the primary key).
4. Boyce-Codd normal form: Any remaining anomalies that result from functional dependencies have
been removed (because there was more than one possible primary key for the same nonkeys).
5. Fourth normal form: Any multivalued dependencies have been removed.
6. Fifth normal form: Any remaining anomalies have been removed.
41
Example
• For a simple illustration, we use a customer invoice from Pine Valley Furniture Company
42
Represent the View in Tabular Form
43
Invoice data (Pine Valley Furniture Company)
*Plus data for second order (Order ID 1007)
44
Convert to First Normal Form
45
1NF Constraints
A relation is in first normal form (1NF) if the following two constraints both apply:
1. There are no repeating groups in the relation (thus, there is a single fact at the
intersection of each row and column of the table).
2. A primary key has been defined, which uniquely identifies each row in the
relation.
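Removing a repeating group can be illustrated with a small Python sketch. The nested orders below are hypothetical (only the Order IDs 1006 and 1007 echo the invoice example), and the field name Lines for the repeating group is an assumption.

```python
# Hypothetical invoice data: each order carries a repeating group of lines.
orders = [
    {"OrderID": 1006, "OrderDate": "10/24/2015", "CustomerID": 2,
     "Lines": [
         {"ProductID": 7, "OrderedQuantity": 2},
         {"ProductID": 5, "OrderedQuantity": 2},
     ]},
    {"OrderID": 1007, "OrderDate": "10/25/2015", "CustomerID": 6,
     "Lines": [{"ProductID": 11, "OrderedQuantity": 4}]},
]


def to_1nf(nested, group="Lines"):
    """Emit one flat row per element of the repeating group."""
    flat = []
    for order in nested:
        header = {k: v for k, v in order.items() if k != group}
        for line in order[group]:
            flat.append({**header, **line})  # repeat the header on every row
    return flat


invoice = to_1nf(orders)
keys = [(row["OrderID"], row["ProductID"]) for row in invoice]

print(len(invoice))                 # prints 3
print(len(set(keys)) == len(keys))  # prints True: (OrderID, ProductID) is unique
```

Each flat row now holds a single value per column, and the pair (OrderID, ProductID) identifies a row, which satisfies both 1NF constraints for this sample.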
46
Remove Repeating Group
Repeating Group
47
Relation with no repeating group
48
Determine FD & Select the Candidate Key
There are four determinants in INVOICE, and their functional dependencies are the following:
• OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress
• CustomerID → CustomerName, CustomerAddress
• ProductID → ProductDescription, ProductFinish, ProductStandardPrice
• OrderID, ProductID → OrderedQuantity
From the four determinants we can infer that:
• OrderID, ProductID → OrderDate, CustomerID, CustomerName, CustomerAddress,
ProductDescription, ProductFinish, ProductStandardPrice, OrderedQuantity
As you can see, the only candidate key for INVOICE is the composite key consisting of the attributes
OrderID and ProductID (because there is only one row in the table for any combination of values for
these attributes).
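The claim that (OrderID, ProductID) is the only candidate key can be checked mechanically from the four functional dependencies. The sketch below is a brute-force search suitable only for small relations; the function names closure and candidate_keys are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Enumerate the minimal attribute sets whose closure covers attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys


FDS = [
    ({"OrderID"}, {"OrderDate", "CustomerID", "CustomerName", "CustomerAddress"}),
    ({"CustomerID"}, {"CustomerName", "CustomerAddress"}),
    ({"ProductID"}, {"ProductDescription", "ProductFinish", "ProductStandardPrice"}),
    ({"OrderID", "ProductID"}, {"OrderedQuantity"}),
]
INVOICE = set().union(*(lhs | rhs for lhs, rhs in FDS))

print(candidate_keys(INVOICE, FDS))
```

The search visits subsets in order of increasing size, so any set that already contains a found key is skipped as non-minimal.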
49
Relation in 1NF with CK
Candidate Key
50
Anomalies still occur in 1NF
Insertion anomaly, deletion anomaly, update anomaly
51
Convert to Second Normal Form
52
2NF Constraints
A relation is in second normal form (2NF) if it:
1. Is in first normal form
2. Contains no partial functional dependencies.
A partial functional dependency exists when a nonkey attribute is functionally
dependent on part (but not all) of the primary key.
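As a rough sketch, partial dependencies can be picked out of a list of functional dependencies by comparing each determinant with the primary key. The function name partial_dependencies is an assumption, and the check only inspects the dependencies as listed; it does not derive implied ones.

```python
def partial_dependencies(key, fds):
    """Return (determinant, nonkey attributes) pairs where the determinant
    is a proper subset of the primary key."""
    found = []
    for lhs, rhs in fds:
        nonkey = rhs - key
        if lhs < key and nonkey:  # '<' on sets means proper subset
            found.append((lhs, nonkey))
    return found


KEY = {"OrderID", "ProductID"}
FDS = [
    ({"OrderID"}, {"OrderDate", "CustomerID", "CustomerName", "CustomerAddress"}),
    ({"CustomerID"}, {"CustomerName", "CustomerAddress"}),
    ({"ProductID"}, {"ProductDescription", "ProductFinish", "ProductStandardPrice"}),
    ({"OrderID", "ProductID"}, {"OrderedQuantity"}),
]

partials = partial_dependencies(KEY, FDS)
print([sorted(lhs) for lhs, _ in partials])  # prints [['OrderID'], ['ProductID']]
```

CustomerID is not part of the key, so its dependency is not partial; it shows up later as a transitive dependency.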
53
Functional Dependency Diagram for Invoice
As you can see, the following partial dependencies exist in the
diagram:
OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress
ProductID → ProductDescription, ProductFinish, ProductStandardPrice
54
• OrderID, ProductID → OrderID, ProductID, OrderDate, CustomerID, CustomerName,
CustomerAddress, ProductDescription, ProductFinish, ProductStandardPrice, OrderedQuantity
• OrderID, ProductID → OrderID (partial)
• OrderID → OrderID (full)
• OrderID, ProductID → ProductID (partial)
• ProductID → ProductID (full)
• OrderID, ProductID → OrderDate (partial)
• OrderID → OrderDate (full)
• OrderID, ProductID → CustomerID, CustomerName, CustomerAddress (partial)
• OrderID → CustomerID, CustomerName, CustomerAddress (full)
• OrderID, ProductID → ProductDescription, ProductFinish, ProductStandardPrice (partial)
• ProductID → ProductDescription, ProductFinish, ProductStandardPrice (full)
• OrderID, ProductID → OrderedQuantity (full)
55
2NF
• OrderID → OrderID, OrderDate, CustomerID, CustomerName,
CustomerAddress
• ProductID → ProductID, ProductDescription, ProductFinish,
ProductStandardPrice
• OrderID, ProductID → OrderedQuantity
Table 1 (OrderID, OrderDate, CustomerID, CustomerName, CustomerAddress)
Table 2 (ProductID, ProductDescription, ProductFinish, ProductStandardPrice)
Table 3 (OrderID, ProductID, OrderedQuantity)
56
3NF
• OrderID → OrderID, OrderDate, CustomerID, CustomerName,
CustomerAddress (transitive)
• CustomerID → CustomerName, CustomerAddress
• OrderID → OrderID, OrderDate, CustomerID
Table 1.1 (OrderID, OrderDate, CustomerID)
Table 1.2 (CustomerID, CustomerName, CustomerAddress)
Table 2 (ProductID, ProductDescription, ProductFinish, ProductStandardPrice)
Table 3 (OrderID, ProductID, OrderedQuantity)
57
Removing partial dependencies
To convert a relation with partial dependencies to second normal form, the following
steps are required:
1. Create a new relation for each primary key attribute (or combination of
attributes) that is a determinant in a partial dependency. That attribute is the primary
key in the new relation.
2. Move the nonkey attributes that are only dependent on this primary key attribute
(or attributes) from the old relation to the new relation.
58
Relation in 2NF (no partial dependencies)
59
Convert to Third Normal Form
60
Constraints
A relation is in third normal form (3NF) if:
1. It is in second normal form
2. No transitive dependencies exist.
A transitive dependency in a relation is a functional dependency between the
primary key and one or more nonkey attributes that are dependent on the primary
key via another nonkey attribute.
61
There are two transitive dependencies in the CUSTOMER ORDER relation
shown in the diagram:
1. OrderID → CustomerID → CustomerName
2. OrderID → CustomerID → CustomerAddress
62
Removing transitive dependencies
You can easily remove transitive dependencies from a relation by means of a three-step procedure:
1. For each nonkey attribute (or set of attributes) that is a determinant in a relation, create a new
relation. That attribute (or set of attributes) becomes the primary key of the new relation.
2. Move all of the attributes that are functionally dependent only on the primary key of the new relation
from the old to the new relation.
3. Leave the attribute that serves as a primary key in the new relation in the old relation to serve as a
foreign key that allows you to associate the two relations.
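The three steps can be sketched as two projections of the CUSTOMER ORDER rows. The sample rows and the helper name project are hypothetical; the attribute names follow the invoice example.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples and keeping order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


# Hypothetical CUSTOMER ORDER rows: customer 2 placed two orders.
customer_order = [
    {"OrderID": 1006, "OrderDate": "10/24/2015", "CustomerID": 2,
     "CustomerName": "Customer X", "CustomerAddress": "Address X"},
    {"OrderID": 1007, "OrderDate": "10/25/2015", "CustomerID": 6,
     "CustomerName": "Customer Y", "CustomerAddress": "Address Y"},
    {"OrderID": 1008, "OrderDate": "10/26/2015", "CustomerID": 2,
     "CustomerName": "Customer X", "CustomerAddress": "Address X"},
]

# Steps 1 and 2: the nonkey determinant CustomerID becomes the key of a new
# relation that takes the attributes depending only on it.
customer = project(customer_order, ["CustomerID", "CustomerName", "CustomerAddress"])
# Step 3: CustomerID stays in the old relation as a foreign key.
order = project(customer_order, ["OrderID", "OrderDate", "CustomerID"])

print(len(order), len(customer))  # prints 3 2
```

After the split, each customer's name and address are stored once, regardless of how many orders that customer places.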
63
Removing transitive dependencies
64
Relation in 3NF (no transitive dependencies)
65
Relational schema for invoice data
Customer
PK CustomerID
CustomerName
CustomerAddress

Order
PK OrderID
OrderDate
FK CustomerID

Product
PK ProductID
ProductDescription
ProductFinish
ProductStandardPrice

OrderLine
PK FK OrderID
PK FK ProductID
OrderedQuantity
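One possible way to implement this schema is sketched below with Python's built-in sqlite3 module. The column types are assumptions (the schema above does not specify them), and the table name Order is quoted because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Customer (
    CustomerID      INTEGER PRIMARY KEY,
    CustomerName    TEXT,
    CustomerAddress TEXT
);
CREATE TABLE "Order" (
    OrderID    INTEGER PRIMARY KEY,
    OrderDate  TEXT,
    CustomerID INTEGER REFERENCES Customer(CustomerID)
);
CREATE TABLE Product (
    ProductID            INTEGER PRIMARY KEY,
    ProductDescription   TEXT,
    ProductFinish        TEXT,
    ProductStandardPrice REAL
);
CREATE TABLE OrderLine (
    OrderID         INTEGER REFERENCES "Order"(OrderID),
    ProductID       INTEGER REFERENCES Product(ProductID),
    OrderedQuantity INTEGER,
    PRIMARY KEY (OrderID, ProductID)
);
""")

# An order that refers to a customer that does not exist should be rejected.
try:
    conn.execute('INSERT INTO "Order" VALUES (?, ?, ?)', (1006, "10/24/2015", 99))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # prints True
```

The rejected insert illustrates the referential integrity constraint: a foreign key value must match a primary key value in the referenced relation or be null.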
66
Advanced Normalization:
Boyce-Codd Normal Form
67
Definition & Constraint
• A relation is in Boyce-Codd normal form (BCNF) if and only if every
determinant in the relation is a candidate key
68
Invoice Relation
There are four determinants in INVOICE, from the following functional dependencies:
• OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress
• CustomerID → CustomerName, CustomerAddress
• ProductID → ProductDescription, ProductFinish, ProductStandardPrice
• OrderID, ProductID → OrderedQuantity
3NF relation:
Based on the 3NF result, the relations derived from INVOICE are already in BCNF, because in each of them every determinant is a candidate key.
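This claim can be verified with a brute-force BCNF test, sketched below. For every attribute subset X of a relation, the closure of X restricted to the relation must be either X itself (X determines nothing new) or the whole relation (X is a super key). The relation names and the functions closure and is_bcnf are assumptions; the exhaustive search is practical only for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(attrs, fds):
    """True if every subset of attrs determines only itself or all of attrs."""
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            reach = closure(x, fds) & attrs
            if reach != x and reach != attrs:
                return False  # x is a determinant but not a super key
    return True


FDS = [
    ({"OrderID"}, {"OrderDate", "CustomerID", "CustomerName", "CustomerAddress"}),
    ({"CustomerID"}, {"CustomerName", "CustomerAddress"}),
    ({"ProductID"}, {"ProductDescription", "ProductFinish", "ProductStandardPrice"}),
    ({"OrderID", "ProductID"}, {"OrderedQuantity"}),
]
RELATIONS = {
    "CUSTOMER": {"CustomerID", "CustomerName", "CustomerAddress"},
    "ORDER": {"OrderID", "OrderDate", "CustomerID"},
    "PRODUCT": {"ProductID", "ProductDescription", "ProductFinish",
                "ProductStandardPrice"},
    "ORDER LINE": {"OrderID", "ProductID", "OrderedQuantity"},
}
INVOICE = set().union(*RELATIONS.values())

results = {name: is_bcnf(attrs, FDS) for name, attrs in RELATIONS.items()}
print(results)
print(is_bcnf(INVOICE, FDS))  # prints False: the unnormalized relation fails
```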
69
Relation in 3NF but not in BCNF
Relation with sample data
Functional dependencies in
STUDENT ADVISOR
70
2-step process
1. The relation is modified so that the determinant in the
relation that is not a candidate key becomes a component of
the primary key of the revised relation.
71
2-step process
2. Decompose the relation to eliminate the partial functional
dependency
72
Note for Decomposition
• The decomposition is a lossless decomposition
Let R be a relation schema and let R1 and R2 form a decomposition of R. That is,
R = R1 ∪ R2.
We say that the decomposition is a lossless decomposition if there is no loss of
information by replacing R with the two relation schemas R1 and R2.
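A common way to test a two-way decomposition, not stated on the slide but standard in the textbook literature, is to check whether the shared attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below applies that test to the CUSTOMER ORDER relation from the invoice example; the function names are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes must determine
    every attribute of r1 or every attribute of r2."""
    reach = closure(r1 & r2, fds)
    return r1 <= reach or r2 <= reach


FDS = [
    ({"OrderID"}, {"OrderDate", "CustomerID", "CustomerName", "CustomerAddress"}),
    ({"CustomerID"}, {"CustomerName", "CustomerAddress"}),
]

# Split on CustomerID, as in the 3NF step: the shared attribute is a key of R2.
good = is_lossless({"OrderID", "OrderDate", "CustomerID"},
                   {"CustomerID", "CustomerName", "CustomerAddress"}, FDS)

# Split on CustomerName instead: a name determines nothing, so the join may
# produce spurious rows.
bad = is_lossless({"OrderID", "OrderDate", "CustomerName"},
                  {"CustomerName", "CustomerID", "CustomerAddress"}, FDS)

print(good, bad)  # prints True False
```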
73
Example of Lossy Decomposition
FAILURE DECOMPOSITION
74
Reference
Hoffer, Jeffrey A., et al., "Modern Database Management", Twelfth Edition, Pearson,
2016. Chapter 4.
75
Questions