4.4 Normalization
CS341
Purpose of Normalization
Normalization is a technique for producing a set of
suitable relations that support the data requirements of
an enterprise
Characteristics of a suitable set of relations include:
Using the minimal number of attributes that is
necessary to support the data requirements of the enterprise;
Attributes with a close logical relationship are found in the
same relation;
Minimal Redundancy: Each attribute is represented only once
with the exception of attributes that form a part of foreign keys.
Purpose of Normalization (cont.)
Normalization decomposes relations to smaller relations
Normalization involves creating tables and establishing
relationships between those tables
The benefits of using a database that has a suitable set
of relations are that:
The data will be easier for the user to access and maintain;
Update Anomalies will be avoided;
The database will need minimal storage space on the
computer
Why Normalize Relations?
Example 1
We have to repeat the loan amount once for each customer
Why Normalize Relations? (cont.)
Example 2
We store the amount of each loan exactly once
Less Redundancy
Normalization Process
Normalization is performed as a series of tests on a
relation to determine whether it satisfies the
requirements of a given Normal Form
It is based on the Functional Dependencies among the
attributes of a relation.
A relation is normalized to prevent the possible
occurrence of Update Anomalies
Normal Forms
The 4 most commonly used normal forms are
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
What does Normalization do?
Normalization decomposes relations into smaller,
non-redundant relations for efficient storage & processing
Two important properties of Decomposition.
Lossless-join property enables us to find any instance of the
original relation from corresponding instances in the smaller
relations.
Dependency preservation property enables us to enforce a
constraint on the original relation by enforcing some
constraint on each of the smaller relations.
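The lossless-join property can be tried out on a concrete instance: project a relation onto the smaller schemas, natural-join the projections, and compare the result with the original. The sketch below is illustrative only; the Employee rows and the helper names `project`, `natural_join` and `same_relation` are assumptions, not part of the slides.

```python
# Hypothetical relation instance: one dict per tuple.
employee = [
    {"EmpID": 1, "EName": "Abebe", "DepID": 10, "DepName": "Sales"},
    {"EmpID": 2, "EName": "Sara", "DepID": 10, "DepName": "Sales"},
    {"EmpID": 3, "EName": "Abebe", "DepID": 20, "DepName": "Admin"},
]

def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    seen = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen

def natural_join(left, right):
    """Combine tuples that agree on every attribute the two relations share."""
    out = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out

def same_relation(x, y):
    """Compare two relations as sets of tuples (row order is irrelevant)."""
    key = lambda rows: sorted(sorted(r.items()) for r in rows)
    return key(x) == key(y)

# Decomposition on DepID (DepID determines DepName): the join gives back the original.
good = natural_join(project(employee, ["EmpID", "EName", "DepID"]),
                    project(employee, ["DepID", "DepName"]))

# Decomposition on EName (not a determinant): the join produces spurious tuples.
bad = natural_join(project(employee, ["EmpID", "EName"]),
                   project(employee, ["EName", "DepID", "DepName"]))

print(same_relation(good, employee))  # True
print(same_relation(bad, employee))   # False
print(len(bad))                       # 5 tuples instead of 3
```

A check on one instance can only expose a lossy decomposition; it does not prove that a decomposition is lossless for every possible instance.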
Data Redundancy
Major aim of relational database design is the grouping
of attributes into relations to
Minimize data redundancy and
Reduce file storage space required
Main benefits of minimizing redundancy include:
Updates are achieved with a minimal number of operations
thus reducing the opportunities for data inconsistencies.
Reduction in the file storage space required by the base
relations
thus minimizing costs.
Update Anomalies
Update Anomalies are data inconsistencies caused by
an update operation (insert, update, or delete)
Relations that contain redundant information may
potentially suffer from update anomalies
Normalization is the process of decomposing relations
with anomalies to produce well-structured relations
Types of Update Anomalies are:
Insertion: it is impossible to insert some data without other data
Deletion: dependent data is lost
Modification: inconsistency arises unless all occurrences of
the data are updated
Update Anomalies
Insertion Anomaly:
We cannot insert a department without inserting a member of staff that works in
that department.
Example: Insert a new department named Administration
Update Anomaly:
Updates may need to be made at several points
Example: Changing the name of the department that employee “101” works in
without simultaneously changing the name of the department that “102” works in.
Update Anomalies (cont.)
Deletion Anomaly:
Deletion of one record may result in the loss of other information
Example: By removing, employee 100, we have removed all information
pertaining to the sales department
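The three anomalies can be reproduced on a small in-memory table. This is a minimal sketch under stated assumptions: the rows, names, and department codes are hypothetical (the slide's figure is not reproduced here), and the table is just a Python list of dicts.

```python
# Hypothetical single-table design: department data is repeated per employee.
staff = [
    {"EmpID": 100, "Name": "Hana", "DeptID": "D1", "DeptName": "Sales"},
    {"EmpID": 101, "Name": "Dawit", "DeptID": "D2", "DeptName": "Marketing"},
    {"EmpID": 102, "Name": "Marta", "DeptID": "D2", "DeptName": "Marketing"},
]

def dept_names(rows):
    """All (DeptID, DeptName) pairs currently recorded in the table."""
    return {(r["DeptID"], r["DeptName"]) for r in rows}

# Modification anomaly: renaming D2 in only one of its rows leaves two names.
after_update = [dict(r) for r in staff]
after_update[1]["DeptName"] = "Advertising"
names_for_d2 = {name for d, name in dept_names(after_update) if d == "D2"}

# Deletion anomaly: removing employee 100 also removes every trace of Sales.
after_delete = [r for r in staff if r["EmpID"] != 100]
sales_still_known = any(r["DeptName"] == "Sales" for r in after_delete)

# Insertion anomaly: a department without staff has no EmpID, so if EmpID is
# the primary key (which cannot be null) this row cannot be stored.
new_dept_row = {"EmpID": None, "Name": None,
                "DeptID": "D3", "DeptName": "Administration"}
insert_allowed = new_dept_row["EmpID"] is not None

print(len(names_for_d2))   # 2: department D2 now has two different names
print(sales_still_known)   # False
print(insert_allowed)      # False
```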
Functional Dependencies
A functional dependency (FD) describes a relationship between
attributes.
FDs provide a formal mechanism to express constraints
between attributes
If A and B are attributes of relation R, then B is
functionally dependent on A
if each value of A is associated with exactly one
value of B in R
A → B means that the value of B is determined by A
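The definition translates directly into a test on a relation instance: A → B is violated as soon as two rows agree on A but differ on B. The following is a minimal sketch; the function name `fd_holds` and the sample rows are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical sample data
students = [
    {"StudentID": "S1", "GPA": 3.5},
    {"StudentID": "S2", "GPA": 3.5},
    {"StudentID": "S3", "GPA": 2.9},
]

print(fd_holds(students, ["StudentID"], ["GPA"]))  # True
print(fd_holds(students, ["GPA"], ["StudentID"]))  # False: 3.5 maps to S1 and S2
```

A sample can only refute a dependency. A result of True means the FD is consistent with these rows, not that it holds for all possible values.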
Notation - FD
The notation of functional dependency is
A → B.
The meaning of this notation is:
1. “A” determines “B”
2. “B” is functionally dependent on “A”
“A” is called the DETERMINANT
“B” is called the OBJECT of the determinant
Example
StudentID → GPA
The value of the GPA can be determined if
we know the StudentID
Child → Mother? OR
Mother → Child?
Example
Functional Dependencies
The Determinant of a functional dependency refers to
the attribute or group of attributes on the left-hand
side of the arrow.
Functional Dependencies
Consider the values shown in the staffNo and sName
attributes of the Staff relation.
Based on these values, the following functional dependencies appear to hold:
staffNo → sName
sName → staffNo
However, the only functional dependency that remains
true for all possible values for the staffNo and sName
attributes of the Staff relation is:
staffNo → sName
FD - Example
Partial Dependency
This is the situation that exists if it is possible to use a
subset of the attributes of the composite determinant to
identify its object
Example
staffNo, sName → branchNo
Each value of (staffNo, sName) is associated with a
single value of branchNo.
branchNo is also functionally dependent on a subset of
(staffNo, sName)
namely staffNo
This is a Partial Dependency
i.e. branchNo is partially dependent on staffNo, sName
FD - Examples
Identify the functional dependencies
A B C D
1 4 10 100
2 6 20 50
3 6 20 200
1 4 10 200
2 6 20 200
3 6 20 300
1 4 10 null
2 6 20 50
3 6 20 50
Transitive Dependencies
Transitive dependency describes a condition where A, B,
and C are attributes of a relation such that
if A → B and B → C, then
C is transitively dependent on A via B
provided that A is not functionally dependent on B or C
Example: Consider functional dependencies in the
StaffBranch relation
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
Transitive dependency
bAddress is transitively dependent on staffNo via branchNo
(staffNo → branchNo and branchNo → bAddress).
FD in Normalization
Functional dependencies in Normalization:
There is a one-to-one relationship between the attribute(s) on
the left-hand side (determinant) and those on the right-hand side
of a functional dependency.
The functional dependency holds for all time.
The determinant has the minimal number of attributes
necessary to maintain the dependency with the attribute(s) on
the right-hand side.
The Process of Normalization
Formal technique for analyzing a relation based on its
primary key and the functional dependencies between
the attributes of that relation.
Often executed as a series of steps.
Each step corresponds to a specific normal form, which
has known properties.
As normalization proceeds, the relations become
progressively more restricted (stronger) in format and
less vulnerable to update anomalies
The Process of Normalization
Identifying FDs
Identifying FDs between a set of attributes is relatively simple if
the meaning of each attribute and
the relationships between the attributes are well understood.
This information should be clear from the users’
requirements specification.
Identifying FDs - Example
FDs for the StaffBranch relation:
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary
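One common way to reason about a set of FDs like this is to compute the attribute closure: the set of all attributes that a given set of attributes determines. The slides do not name this technique, so the sketch below is an illustration only; the function name `closure` and the representation of FDs as pairs of Python sets are assumptions.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# The StaffBranch FDs listed on the slide
fds = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"bAddress"}, {"branchNo"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]
all_attrs = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}

print(closure({"staffNo"}, fds) == all_attrs)  # True: staffNo determines every attribute
print(sorted(closure({"branchNo"}, fds)))      # ['bAddress', 'branchNo']
```

Because the closure of staffNo covers every attribute, staffNo can serve as a key of StaffBranch, while branchNo alone cannot.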
Repeating Group
A repeating group is an attribute (or set of attributes) that
can have more than one value for a primary key value.
Repeating groups are not allowed in well-formed
relations
all attributes have to be atomic
i.e., there can only be one value per cell in a table
First Normal Form (1NF)
A table that contains one or more repeating groups is
said to be in Unnormalized Form (UNF)
1NF - A relation in which the intersection of each row
and column contains one and only one value
A relation should only include ATOMIC values
Converting from UNF to 1NF
Select attribute(s) to act as the key for the table
Identify the repeating group(s) in the unnormalized table which
repeats for the key attribute(s)
Remove the repeating group i.e. ‘flatten’ the table
Create a separate row for each repeated (multivalued) attribute
Repeat the data for the empty columns in each of the rows
1NF - Example
Contacts relation is not in the 1NF
Why?
Convert the Contacts relation to the 1NF
1NF - Example
Contacts relation in 1NF
Eliminate repeating groups
Identify each set of related data with a primary key
All attributes are single valued & non-repeating
1NF - Example
STAFF relation is not in the 1NF
Why?
Convert the STAFF relation to the 1NF
1NF - Example
Department relation is not in the 1NF
Why?
Convert the Department relation to the 1NF
Department (not in 1NF)
DepID  DepName   Location
1      Admin     A, C
2      Finance   A, B, C
3      Sales     C
4      Research  A, B
Department (1NF)
DepID  DepName   Location
1      Admin     A
1      Admin     C
2      Finance   A
2      Finance   B
2      Finance   C
3      Sales     C
4      Research  A
4      Research  B
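The flattening step for the Department relation can be sketched in a few lines. The data comes from the slide; representing the repeating Location values as a Python list and the helper name `to_1nf` are assumptions made for illustration.

```python
# Unnormalized Department table: Location holds several values per row.
unf = [
    {"DepID": 1, "DepName": "Admin", "Location": ["A", "C"]},
    {"DepID": 2, "DepName": "Finance", "Location": ["A", "B", "C"]},
    {"DepID": 3, "DepName": "Sales", "Location": ["C"]},
    {"DepID": 4, "DepName": "Research", "Location": ["A", "B"]},
]

def to_1nf(rows, repeating):
    """Create one row per value of the repeating attribute, copying the rest."""
    flat = []
    for r in rows:
        for value in r[repeating]:
            flat.append({**r, repeating: value})
    return flat

department_1nf = to_1nf(unf, "Location")
print(len(department_1nf))  # 8
print(department_1nf[0])    # {'DepID': 1, 'DepName': 'Admin', 'Location': 'A'}
```

After flattening, DepID alone no longer identifies a row; the combination (DepID, Location) does.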
Second Normal Form (2NF)
The 2NF is based on the concept of full functional
dependency.
Full functional dependency indicates that if
A and B are attributes of a relation, then B is fully dependent on
A if B is functionally dependent on A but not on any proper
subset of A
A relation is said to be in the 2NF if
the relation is in 1NF and
every non-primary-key attribute is fully functionally dependent
on the primary key
2NF => No PARTIAL Dependency on the PK
There should not be a data item that is functionally
dependent on only a part of the compound key!
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove
them by placing them in a new relation along with a copy of their
determinant
36
1NF to 2NF - Example
The relation EMP_PROJ is NOT in the 2NF
EmpID EName ProjID ProjName TotalTime
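The 1NF-to-2NF steps can be sketched for EMP_PROJ as follows. The slide does not list the FDs, so the ones used here (EmpID → EName, ProjID → ProjName, and EmpID, ProjID → TotalTime) and the primary key (EmpID, ProjID) are assumptions, as are the function names.

```python
def partial_dependencies(primary_key, fds):
    """FDs whose determinant is a proper subset of the primary key and
    whose right-hand side contains non-key attributes."""
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        if set(lhs) < key and non_key:
            found.append((set(lhs), non_key))
    return found

def to_2nf(attributes, primary_key, fds):
    """Move each partially dependent group into its own relation,
    together with a copy of its determinant."""
    remaining = list(attributes)
    relations = []
    for lhs, rhs in partial_dependencies(primary_key, fds):
        relations.append(sorted(lhs) + sorted(rhs))
        remaining = [a for a in remaining if a not in rhs]
    relations.append(remaining)
    return relations

attributes = ["EmpID", "EName", "ProjID", "ProjName", "TotalTime"]
primary_key = ["EmpID", "ProjID"]
fds = [  # assumed dependencies, not stated on the slide
    (["EmpID"], ["EName"]),
    (["ProjID"], ["ProjName"]),
    (["EmpID", "ProjID"], ["TotalTime"]),
]

print(to_2nf(attributes, primary_key, fds))
# [['EmpID', 'EName'], ['ProjID', 'ProjName'], ['EmpID', 'ProjID', 'TotalTime']]
```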
Third Normal Form (3NF)
3NF is based on the concept of Transitive Dependency.
Transitive Dependency is a condition where
A, B and C are attributes of a relation such that if
A → B and B → C,
then C is transitively dependent on A through B,
provided that A is not functionally dependent on B or C
A relation is in the 3NF if
it is in 2NF and
no non-primary-key attribute is transitively dependent on the Primary Key
2NF to 3NF
Identify the primary key in the 2NF relation.
Identify functional dependencies in the relation.
If transitive dependencies exist on the primary key
remove them
by placing them in a new relation along with a copy of their
determinant.
2NF to 3NF - Example
The Employee relation is not in the 3NF
EmpID EName DepID DepName
2NF to 3NF - Example
Solution => Decompose:
If transitive dependencies exist on the primary key, then
remove them by placing them in a new relation along with a copy of their
determinant
EmpID → DepName is a Transitive Dependency
DepID DepName
2NF to 3NF - Example
Relation
SALES (CUSTOMERID, CUSTOMERNAME, SALESPERSON, REGION)
Functional Dependencies
CUSTOMERID → CUSTOMERNAME
CUSTOMERID → SALESPERSON
CUSTOMERID → REGION
SALESPERSON → REGION
CUSTOMERID → REGION
is a Transitive FD
Solution?
SALES1 (CUSTOMERID, CUSTOMERNAME, SALESPERSON)
SALES2 (SALESPERSON, REGION)
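The SALES decomposition can be reproduced mechanically. The sketch below is a simplified illustration, not a general 3NF algorithm: it assumes the relation is already in 2NF, that the FDs are given exactly as listed on the slide, and that only one level of transitivity is present. The function names are assumptions.

```python
def transitive_dependencies(primary_key, fds):
    """FDs whose determinant is a non-key attribute set that the key determines."""
    key = set(primary_key)
    determined_by_key = set()
    for lhs, rhs in fds:
        if set(lhs) == key:
            determined_by_key |= set(rhs)
    return [
        (lhs, rhs) for lhs, rhs in fds
        if set(lhs) != key and set(lhs) <= determined_by_key
    ]

def to_3nf(attributes, primary_key, fds):
    """Move each transitively dependent group into its own relation,
    together with a copy of its determinant."""
    remaining = list(attributes)
    relations = []
    for lhs, rhs in transitive_dependencies(primary_key, fds):
        relations.append(list(lhs) + list(rhs))
        remaining = [a for a in remaining if a not in rhs]
    relations.insert(0, remaining)
    return relations

attributes = ["CUSTOMERID", "CUSTOMERNAME", "SALESPERSON", "REGION"]
fds = [
    (["CUSTOMERID"], ["CUSTOMERNAME"]),
    (["CUSTOMERID"], ["SALESPERSON"]),
    (["CUSTOMERID"], ["REGION"]),
    (["SALESPERSON"], ["REGION"]),
]

print(to_3nf(attributes, ["CUSTOMERID"], fds))
# [['CUSTOMERID', 'CUSTOMERNAME', 'SALESPERSON'], ['SALESPERSON', 'REGION']]
```

The two resulting schemas correspond to SALES1 and SALES2.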
Boyce–Codd Normal Form
Reading Assignment
Disadvantages of Normalization
The disadvantage of normalization is that it produces
many tables
A query might require retrieval of data from multiple normalized
tables
This can result in complicated table joins
Decomposition of tables can lead to a performance
problem
All the joins required to merge data will slow down the
process
Denormalization
Denormalization is used to improve performance in cases
where over-normalized structures are causing overhead to the
query processor
Denormalization should be done only after the relations
have been normalized
Definition of Terms
Key is an attribute or group of attributes, which is used
to identify a row in a relation
Key can be broadly classified into (1) Superkey (2)
Candidate key, and (3) Primary key
Superkey: a set of attributes whose combined values are unique for each
record
Candidate key: a minimal superkey
A minimal set of attributes whose values uniquely identify tuples
in the corresponding relation.
Primary Key: is a designated candidate key
The primary key should not be null
Normal Form of a Relation: refers to the highest normal
form that it satisfies
THE NORMALISATION OATH
A useful mnemonic for remembering the rationale for
normalization is the distortion of the legal oath:
1. No Repeating,
2. The Data-Items Depend Upon The Key,
3. The Whole Key,
4. And Nothing But The Key,
5. So Help Me Codd.
Line 1 indicates that there should be no repeating groups of data in a
table.
Line 2 states that all data-items in a table must depend solely upon the
key.
Line 3 indicates that there should be no part-key dependencies in a
table.
Line 4 reminds us that there should be no inter-data dependencies in a
table. The only dependency should be between the key and other
data-items in a table.
Line 5 simply reminds us that the techniques were originally developed
by Codd.
Exercise
Convert the following relations to the 3NF
1. R1 = (A, B, C, D)
A → B, C, D
C → D
2. R2 = (A, B, C, D, E)
A, B → C, D, E
B → D
3. R3 = (A, B, C, D, E)
A → B, C, D, E
B → D, E
Exercise
A company obtains parts from a number of suppliers. Each supplier
is located in one city. A city can have more than one supplier located
there and each city has a status code associated with it. Each
supplier may provide many parts. Identify which normal form the
following relation is in and normalize it to third normal form.
R (sID, cCode, city, pID, quantity)
Boyce–Codd Normal Form
BCNF is based on functional dependencies that take into
account all candidate keys in a relation
A relation is in BCNF if and only if every determinant is
a candidate key.
If X → A, then X is a candidate key
R (A, B, C)
A, B → C
C → B [not allowed in BCNF unless C is an alternate key]
BCNF
Violation of BCNF is quite rare.
A violation of BCNF may occur in a relation that:
contains two (or more) composite candidate keys, and
whose candidate keys overlap,
i.e. have at least one attribute in common.
Example: R (PATIENT, HOSPITAL, DOCTOR)
FDs
{PATIENT, HOSPITAL} → DOCTOR
DOCTOR → HOSPITAL
Converting to BCNF
R1 ( PATIENT, DOCTOR )
R2 (DOCTOR , HOSPITAL )
BCNF and 3NF
The difference between 3NF and BCNF is that, for a
functional dependency A → B,
3NF allows this dependency in a relation if B is a primary-key
attribute and A is not a candidate key.
BCNF insists that for this dependency to remain in a relation, A
must be a candidate key
Every relation in BCNF is also in 3NF.
However, a relation in 3NF is not necessarily in BCNF.
BCNF - Example
FDs
StudID, Course → Instructor
Instructor → Course (an instructor teaches only one course)
StudID  Course  Instructor
101     Java    P. Java
101     C++     P. Cpp
102     Java    P. Java2
103     C#      P. Hash
104     Java    P. Java
The relation is not in the BCNF because
Instructor is not a KEY and Instructor → Course
To change to BCNF
R1 (StudID, Instructor)
R2 (Instructor, Course)
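The BCNF test "every determinant is a candidate key" can be checked with attribute closures: a determinant qualifies only if its closure covers every attribute of the relation. The sketch below is a minimal illustration; the helper names `closure` and `bcnf_violations` and the set-based FD representation are assumptions, and the check tests for a superkey rather than for minimality.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """FDs whose determinant does not determine every attribute of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != set(attributes)]

attributes = {"StudID", "Course", "Instructor"}
fds = [
    ({"StudID", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

violations = bcnf_violations(attributes, fds)
print(len(violations))  # 1: only Instructor -> Course violates BCNF

# After decomposition, R2 (Instructor, Course) keeps Instructor -> Course,
# and Instructor is a key of R2, so no violation remains there.
r2_violations = bcnf_violations({"Instructor", "Course"},
                                [({"Instructor"}, {"Course"})])
print(r2_violations)  # []
```

Note that after this decomposition the dependency StudID, Course → Instructor can no longer be checked inside a single relation, which is the dependency-preservation trade-off of moving from 3NF to BCNF.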
Exercise - Normalize to the 3NF
1. Student (StudID, StudName, Gender, HSCode,
HSName, HSCity, GPA, Priority )
FDs
StudID → StudName, Gender
GPA → Priority
HSCode → HSName, HSCity
2. R (A, B, C, D, E, F, G, H, I, J)
FDs
A, B → C
A → D, E
B → F
D → I, J
F → G, H