Data Normalisation Overview by Nick Rossiter
12/08/21 1
Learning Objectives
1. To consider the process of Normalisation
ACTIVITY 1
NORMALISATION PRINCIPLES
Normalisation
• Definition : a systematic method that takes
pre-existing relations and produces a
canonical set of relations.
• It can be used both for :
• designing canonical relations,
• checking existing relations to ensure they are
canonical.
Normal forms
Relational Design Tool
• In practice, we will always use normalisation for
designing relations.
– This is in order to develop our skills in normalisation.
• 1NF stands for First Normal Form, 2NF for
Second Normal Form, and so on.
• The constraints of a particular normal form are
those of the previous normal form
– plus the additional constraint(s) peculiar to this
particular normal form.
The Normalisation Procedure
• The normalisation procedure starts with a set of
relations, any of which may be un-normalised, i.e. in 0NF.
– DO FOR xNF = 1NF, ..., 5NF
   • DO FOR each relation that exists
      – IF the relation already conforms to xNF
         » THEN it is in xNF, so do nothing
         » ELSE create 2 or more replacement relations from it that do conform to xNF
      – END-IF
   • END-LOOP
– END-LOOP
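The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the deck's own method: relations are reduced to names, and the hypothetical SPLITS table simply records the decompositions this deck arrives at for its purchase-order example (the name PURCHASE_ORDER for the un-normalised relation is also an assumption). The loop stops at 3NF because the deck only works through forms up to 3NF.

```python
from typing import Callable, Dict, List, Tuple

# A normal form as (name, conforms, decompose); relations are just names here.
NormalForm = Tuple[str, Callable[[str], bool], Callable[[str], List[str]]]


def normalise(relations: List[str], normal_forms: List[NormalForm]) -> List[str]:
    for _name, conforms, decompose in normal_forms:   # DO FOR xNF = 1NF, 2NF, ...
        replacements: List[str] = []
        for relation in relations:                     # DO FOR each relation
            if conforms(relation):
                replacements.append(relation)          # already in xNF: do nothing
            else:
                replacements.extend(decompose(relation))  # 2+ replacement relations
        relations = replacements
    return relations


# Hypothetical lookup table: the decompositions reached in this deck's example.
SPLITS: Dict[str, Dict[str, List[str]]] = {
    "1NF": {"PURCHASE_ORDER": ["P_ORDER_1", "P_ITEM_1"]},
    "2NF": {"P_ITEM_1": ["P_ITEM_2", "PART_2"]},
    "3NF": {"P_ORDER_1": ["P_ORDER_3", "SUPPLIER_3"]},
}


def make_form(nf: str) -> NormalForm:
    table = SPLITS[nf]
    # A relation conforms if the table lists no decomposition for it.
    return nf, (lambda r: r not in table), (lambda r: table[r])


forms = [make_form(nf) for nf in ("1NF", "2NF", "3NF")]
result = normalise(["PURCHASE_ORDER"], forms)
print(result)  # ['P_ORDER_3', 'SUPPLIER_3', 'P_ITEM_2', 'PART_2']
```

Note that P_ORDER_1 passes the 2NF step unchanged and is only split at 3NF, while P_ITEM_2 and PART_2 pass the 3NF step unchanged.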
What is a Normal Form?
• Each Normal Form has two parts
– A definition that specifies exactly what constraints
apply to a relation in that normal form.
• This is used to check whether any given relation is
already in that normal form or not.
– A method to be used to replace the relation with
2 or more relations that will be in that normal form.
• The method assumes that the relation-to-be-replaced
is in the previous normal form.
Normalising : Possibilities
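• If a relation is already in 1NF, there is nothing to do; otherwise, replace it with 2 or more relations that are in 1NF.
• If a relation is already in 2NF, there is nothing to do; otherwise, replace it with 2 or more relations that are in 2NF.
• And so on.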
Consequences of Normalisation
• If new, replacement relations are created, then
they must be projections of the original.
– New-Relation = Original-relation Project[ attributes ]
• Normalisation always creates new relations
such that
– Original-relation = New-Rel-1 Join[ attributes ] New-Rel-2
• This ensures that no information is ever lost.
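As an illustration of both equations, here is a minimal Python sketch of Project and a natural Join over relations held as lists of dicts (an assumed representation; the function names project and natural_join are our own). It uses P_ITEM_1 tuples for orders L6 and L7 that appear later in this deck, and checks that joining the two projections gives back exactly the original tuples.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def project(relation: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Project[ attrs ]: keep only attrs, dropping duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result


def natural_join(r1: List[Row], r2: List[Row]) -> List[Row]:
    """Join tuples that agree on every attribute the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2 if all(a[c] == b[c] for c in common)]


def as_set(relation: List[Row]) -> set:
    """Compare relations as sets of tuples, ignoring attribute and row order."""
    return {frozenset(row.items()) for row in relation}


# P_ITEM_1 tuples for orders L6 and L7, as they appear in this deck.
p_item_1 = [
    {"Ord": "L6", "Part": "L4", "Pname": "Nail", "Qty": 100, "Price": 3},
    {"Ord": "L6", "Part": "P3", "Pname": "Pump", "Qty": 5, "Price": 150},
    {"Ord": "L6", "Part": "Q7", "Pname": "Motor", "Qty": 5, "Price": 250},
    {"Ord": "L7", "Part": "Q7", "Pname": "Motor", "Qty": 2, "Price": 100},
]

new_rel_1 = project(p_item_1, ["Ord", "Part", "Qty", "Price"])
new_rel_2 = project(p_item_1, ["Part", "Pname"])
print(len(new_rel_2))  # 3: Q7 Motor is stored once
print(as_set(natural_join(new_rel_1, new_rel_2)) == as_set(p_item_1))  # True
```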
ACTIVITY 3
FIRST NORMAL FORM (1NF)
Definition of 1NF
• A relation is in 1NF
if and only if every attribute value it can
ever contain is an atomic value
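A check of one relation instance against this definition can be sketched as below. The assumption, which is not part of the deck, is that Python lists, tuples, sets and dicts stand for non-atomic values and everything else (including strings) is atomic. Sample data can only show that a relation is not in 1NF; the definition is about every value the relation can ever contain, which is a property of its design.

```python
from typing import Dict, List

Row = Dict[str, object]

# Assumption: these container types stand for non-atomic (repeating) values.
NON_ATOMIC = (list, tuple, set, frozenset, dict)


def is_atomic(value: object) -> bool:
    return not isinstance(value, NON_ATOMIC)


def in_1nf(relation: List[Row]) -> bool:
    """True if every attribute value in every tuple of this instance is atomic."""
    return all(is_atomic(v) for row in relation for v in row.values())


# Order L6 with its parts held as lists, i.e. an un-normalised tuple.
unnormalised = [{"Ord": "L6", "Sno": 315, "Part": ["L4", "P3", "Q7"],
                 "Pname": ["Nail", "Pump", "Motor"]}]
# One tuple per part: every value is atomic.
flat = [{"Ord": "L6", "Part": "L4", "Pname": "Nail", "Qty": 100, "Price": 3}]

print(in_1nf(unnormalised))  # False
print(in_1nf(flat))          # True
```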
Example : Purchase Order Relation
The following relation holds data about purchase orders
placed on suppliers for parts
Ord Sno Sname Saddr Date Part Pname Qty Price Tot
The Non-Atomic Attributes
• We can’t just throw away this data because it
is a nuisance to store!
• The values in all these attributes (Part, Pname, Qty
and Price) repeat together.
– If a part is removed from an order, its values must
be removed from all 4 attributes.
– If another part is placed on an order, there must
be a value for that part in all 4 attributes.
Repeating Together
• Thus a set of values that repeat together should become a
tuple in a new relation.
• Now the attributes in these tuples contain only atomic data !
• Thus we form another new replacement relation to hold the
tuples of data that repeat together.
• There is no intrinsic reason why all the non-atomic attributes
in an un-normalised relation should always repeat together.
[Table: the repeating Part, Pname, Qty and Price values of orders L5 and L6, each set held as a separate tuple; e.g. for order L6: L4 Nail 100 3, P3 Pump 5 150, Q7 Motor 5 250.]
However, these tuples no longer record which order each part was placed on.
We can solve this problem by adding the (purchase) order
number attribute, Ord, to this relation.
In general, we must add the attribute(s) which formed a candidate
key in the original relation to this relation as a foreign key. This
retains the relationship information.
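The following sketch splits an un-normalised order into an order relation and an item relation, adding Ord to each item as a foreign key. It assumes, hypothetically, that the repeating group is held as a list of (Part, Pname, Qty, Price) tuples under an attribute called Items; the order values are those of order L6 in this deck. It is an illustration of the idea, not a general-purpose tool.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# The attributes that repeat together in an un-normalised purchase order.
REPEATING = ("Part", "Pname", "Qty", "Price")


def split_repeating_group(orders: List[Row], key: str = "Ord") -> Tuple[List[Row], List[Row]]:
    """Split un-normalised orders into an order relation and an item relation.

    Assumption (hypothetical layout): each order holds its repeating group
    under "Items" as a list of (Part, Pname, Qty, Price) tuples.
    """
    order_rel: List[Row] = []
    item_rel: List[Row] = []
    for order in orders:
        # Atomic attributes stay in the order relation.
        order_rel.append({a: v for a, v in order.items() if a != "Items"})
        for values in order["Items"]:
            item = dict(zip(REPEATING, values))
            item[key] = order[key]  # foreign key: keeps the link to the order
            item_rel.append(item)
    return order_rel, item_rel


order_l6 = {
    "Ord": "L6", "Sno": 315, "Sname": "Bloggs", "Saddr": "D’ham",
    "Date": "8 June", "Tot": 400,
    "Items": [("L4", "Nail", 100, 3), ("P3", "Pump", 5, 150), ("Q7", "Motor", 5, 250)],
}
p_order_1, p_item_1 = split_repeating_group([order_l6])
print(len(p_item_1))  # 3
print(p_item_1[0])    # {'Part': 'L4', 'Pname': 'Nail', 'Qty': 100, 'Price': 3, 'Ord': 'L6'}
```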
Candidate Keys for Relations
• The candidate key of the order relation, P_ORDER_1 (Ord, Sno, Sname, Saddr, Date, Tot), is Ord.
• For the new relation holding the repeating values, P_ITEM_1 (Ord, Part, Pname, Qty, Price), extend the candidate key to (Ord, Part), including the foreign key Ord.
[Tables: sample tuples of both relations, e.g. L5 127 Smith N’cle 7 May in P_ORDER_1.]
ACTIVITY 5
SECOND NORMAL FORM (2NF)
Definition of 2NF
A relation is in 2NF
if and only if
it is in 1NF
and
every non-key attribute is fully functionally
dependent on the candidate key.
The second condition is the extra constraint applied by 2NF.
Fully Functionally Dependent
• Question : What does fully functionally
dependent mean?
• We will first consider the principle of
functional dependency, and then see
– what full functional dependency means,
– how it is applied to achieve 2NF.
Example of Functional Dependency
Assume some kind of loan account where payments of a certain
amount have to be made on a regular basis to pay off the loan.
This means :
[Diagram: the functional dependency in this loan example, stated as an "if and only if" condition.]
Relationship X:Y in FD is many:1
• For any given set of values X, there is just one
corresponding set of values Y.
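This many:1 property gives a direct test, sketched here in Python: X → Y fails in a relation instance if two tuples agree on X but disagree on Y. The customer data below is invented for illustration (it echoes the customer attributes in the examples that follow), and a check on sample data can only refute an FD, never prove that it always holds.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(relation: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """True if, in this instance, each set of X values has just one set of Y values."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in relation:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault stores the first Y seen for this X and returns the stored Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # same X, different Y: the relationship is not many:1
    return True


# Invented sample data for illustration only.
customers = [
    {"Customer Name": "Smith", "Customer Address": "Leeds", "Customer Telephone No.": "111"},
    {"Customer Name": "Smith", "Customer Address": "York", "Customer Telephone No.": "222"},
    {"Customer Name": "Jones", "Customer Address": "Leeds", "Customer Telephone No.": "111"},
]

x = ["Customer Name", "Customer Address"]
print(fd_holds(customers, x, ["Customer Telephone No."]))                  # True
print(fd_holds(customers, ["Customer Name"], ["Customer Telephone No."]))  # False
```

Two different X values (Smith, Leeds) and (Jones, Leeds) share the Y value "111": that is allowed, because the relationship from X to Y is many:1.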
Further FD Examples
Example 1:
[Diagram: a functional dependency determining Supplier Name.]
Example 2:
Customer Name, Customer Address → Customer Telephone No.
(A set of two attributes determining a set containing one attribute.)
Full Functional Dependency & 2NF
• The definition of 2NF requires not merely
functional dependency, but full functional
dependency.
Definition of FULL Functional Dependency:
Y is fully functionally dependent on X
if and only if
Y is functionally
dependent on all the attributes of X
and not just a subset of them.
Condition for 2NF
Thus, to be in 2NF means that all the non-key attributes
are fully FD on the candidate key.
Examples: Purchase Order Relations
P_ORDER_1: FD Diagram
Ord → Sno, Sname, Saddr, Date, Tot
P_ITEM_1: FD Diagram
The functional dependencies of the non-key attributes in P_ITEM_1 on its candidate key (Ord, Part) can be drawn as:
(Ord, Part) → Qty, Price
Part → Pname
As they are not all fully FD on (Ord, Part), the relation is not in 2NF.
[Table: sample P_ITEM_1 tuples for orders L5 and L6.]
Reason for non-2NF
• Attributes Price and Qty depend on the full key.
– They depend not only on what kind of part they refer
to, but also on the order itself
• the quantity of a part type ordered will vary with &
depend on the order, as will the price since it depends on
the quantity.
• However Pname depends solely on the type of
part.
– A particular kind of part will have the same name on
every order on which it appears.
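Combining an FD test with the definition of full functional dependency gives a sketch of the 2NF check for P_ITEM_1. This is a minimal sketch, not the deck's method; it uses the P_ITEM_1 tuples for orders L6 and L7 from this deck. Because it only examines sample data, it can only report what holds in this instance, so it agrees with the reasoning above rather than proving it.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(relation: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """X -> Y holds in this instance if no two tuples agree on X but differ on Y."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in relation:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


def fully_dependent(relation: List[Row], key: Sequence[str], attr: str) -> bool:
    """attr is FD on the whole key and on no proper subset of it."""
    if not fd_holds(relation, key, [attr]):
        return False
    for size in range(len(key)):  # every proper subset, including the empty set
        for subset in combinations(key, size):
            if fd_holds(relation, subset, [attr]):
                return False  # part of the key is enough: only a partial FD
    return True


p_item_1 = [
    {"Ord": "L6", "Part": "L4", "Pname": "Nail", "Qty": 100, "Price": 3},
    {"Ord": "L6", "Part": "P3", "Pname": "Pump", "Qty": 5, "Price": 150},
    {"Ord": "L6", "Part": "Q7", "Pname": "Motor", "Qty": 5, "Price": 250},
    {"Ord": "L7", "Part": "Q7", "Pname": "Motor", "Qty": 2, "Price": 100},
]
key = ["Ord", "Part"]
report = {a: fully_dependent(p_item_1, key, a) for a in ("Pname", "Qty", "Price")}
print(report)                # {'Pname': False, 'Qty': True, 'Price': True}
print(all(report.values()))  # False: P_ITEM_1 is not in 2NF
```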
Three Problems of a Non-2NF Relation
• Redundant data may be stored.
• Update anomalies
– there can be problems in inserting, deleting and
amending some of the data.
• Semantic problems.
– relation does not reflect the real-world meaning of the
data, leading to problems in its use.
Redundant Data
P_ITEM_1 (extract)
Ord Part Pname Qty Price
L6 Q7 Motor 5 250
L7 Q7 Motor 2 100
Example: Pname is unnecessarily repeated. Motor is repeated in orders L6 and L7.
One order is sufficient to give us the name, so the other is redundant (either one).
Update Anomalies
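P_ITEM_1
• Insertions: a new part type, e.g. F5 Flange, cannot be recorded until it is placed on an order, because Ord (part of the candidate key), Qty and Price would have no values.
• Deletions: deleting the only tuple in which a part type appears also deletes that part type's name.
• Amendments: changing the name of a part type means amending every tuple in which it appears; missing one leaves inconsistent names.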
Semantic Problems
P_ITEM_1 (extract)
Ord Part Pname Qty Price
L6 Q7 Motor 5 250
L7 Q7 Engine 2 100
Q7 now has two different names.
Yet in real life, a part type can only ever have one name.
Putting P_ITEM_1 into 2NF (1)
(Ord, Part) → Qty, Price
Part → Pname
The problem is caused by ‘Pname’ being FD on just Part, not the whole of the candidate key.
The solution is to separate out each determinant and its dependents.
Create 2 replacement relations based on these FDs: (Ord, Part, Qty, Price) and (Part, Pname).
Satisfaction of 2NF
• A relation created with a determinant as its candidate
key, and with non-key attributes that are fully
functionally dependent on that candidate key, must
be in 2NF by definition.
Putting P_ITEM_1 into 2NF (2)
P_ITEM_2
Ord Part Qty Price
(Ord, Part) → Qty, Price
[Table: P_ITEM_2 tuples for orders L5 and L6, e.g. L6 L4 100 3 and L6 Q7 5 250.]
Putting P_ITEM_1 into 2NF (3)
PART_2
Part Pname
N8 Nut
B6 Bolt
L4 Nail
P3 Pump
Q7 Motor
Benefits of 2NF
• No information has been lost.
– A natural join of P_ITEM_2 and PART_2 on
attribute Part will re-create the original relation
P_ITEM_1.
• Problems Solved:
– Redundant data removed – each Pname stored once
– Update anomalies – no side effects in operations
– Semantic problems – each part type has just one
name
THIRD NORMAL FORM (3NF)
Definition of 3NF
A relation is in 3NF
if and only if
it is in 2NF
and
every non-key attribute is non-transitively
fully FD on the candidate key.
The second condition is the extra constraint applied by 3NF.
Note that 3NF is more stringent than 2NF: it requires not only that the non-key attributes be fully functionally dependent on the candidate key, but also that these dependencies be “non-transitive”.
Transitivity
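Assume there are three sets of attributes,
‘A’, ‘B’ and ‘C’.
If B is FD on A, and C is FD on B, then C is also FD on A; this FD of C on A is transitive, because it goes via B.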
Example of Transitive FD
• Suppose pilots always fly the same aircraft
– then if we know the pilot, we know the aircraft; so pilot
functionally determines aircraft.
• If we know the aircraft, then we know the airline that
owns it
– so aircraft functionally determines airline.
• Putting these two dependencies together
– then pilot functionally determines airline.
• But the functional dependency of airline on pilot is
transitive, because it goes via aircraft.
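One standard way to see this derivation mechanically, not used in the deck itself, is to compute the attribute closure: start from a set of attributes and keep adding whatever the known FDs determine. The sketch below assumes only the two stated FDs; the helper names fd and closure are our own.

```python
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependents)


def fd(lhs: str, rhs: str) -> FD:
    return frozenset([lhs]), frozenset([rhs])


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Every attribute functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs  # lhs is already determined, so rhs is too
                changed = True
    return result


fds = [fd("pilot", "aircraft"), fd("aircraft", "airline")]
print(sorted(closure({"pilot"}, fds)))      # ['aircraft', 'airline', 'pilot']
# Without the pilot -> aircraft step, pilot no longer reaches airline.
print(sorted(closure({"pilot"}, fds[1:])))  # ['pilot']
```

The second call shows that pilot determines airline only via aircraft, which is what makes that dependency transitive.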
Non-Transitive Full FD & 3NF
So, to be in 3NF means that every non-key attribute (NK1, NK2, NK3, ...) is non-transitively fully FD on the candidate key.
Example: P_ITEM_2
P_ITEM_2
(Ord, Part) → Qty, Price
Neither ‘Price’ nor ‘Qty’ is FD on the candidate key via the other; both are non-transitively FD on the key.
Thus P_ITEM_2 is already in 3NF.
Example : PART_2
PART_2
Part Pname
N8 Nut
B6 Bolt
L4 Nail
P3 Pump
Q7 Motor
Part → Pname
‘Pname’ is non-transitively FD on the candidate key.
Thus PART_2 is already in 3NF.
Putting P_ORDER_1 into 3NF (1)
P_ORDER_1 (e.g. L6 315 Bloggs D’ham 8 June 400)
Ord → Sno, Sname, Saddr, Date, Tot
However, not all of these FDs are non-transitive FDs (= NTFDs).
Taking account now of transitivity, the FD diagram can be re-drawn as:
Ord → Sno, Date, Tot
Sno → Sname, Saddr
Solution: separate out each determinant and its NTFD dependents, and create 2 replacement relations based on them.
Putting P_ORDER_1 into 3NF (2)
P_ORDER_3
Ord → Sno, Date, Tot
Putting P_ORDER_1 into 3NF (3)
SUPPLIER_3
Sno → Sname, Saddr
Benefits
• No information has been lost.
– A natural join of P_ORDER_3 and SUPPLIER_3 on
attribute Sno will re-create the original relation
P_ORDER_1.
• Problems Solved:
– Redundant data removed – each Sname stored once
– Update anomalies – no side effects in operations
– Semantic problems – each supplier has just one
name
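The 3NF decomposition of P_ORDER_1 can be sketched with the same Project and natural Join idea, again over relations held as lists of dicts (an assumed representation; project, natural_join and as_set are our own helper names). The first tuple is order L6 from this deck; the second order, L9, is hypothetical and invented only to show one supplier appearing on two orders.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def project(relation: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Keep only attrs, dropping duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result


def natural_join(r1: List[Row], r2: List[Row]) -> List[Row]:
    """Join tuples that agree on every attribute the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2 if all(a[c] == b[c] for c in common)]


def as_set(relation: List[Row]) -> set:
    return {frozenset(row.items()) for row in relation}


p_order_1 = [
    {"Ord": "L6", "Sno": 315, "Sname": "Bloggs", "Saddr": "D’ham", "Date": "8 June", "Tot": 400},
    # Hypothetical second order from supplier 315, for illustration only.
    {"Ord": "L9", "Sno": 315, "Sname": "Bloggs", "Saddr": "D’ham", "Date": "9 June", "Tot": 120},
]

p_order_3 = project(p_order_1, ["Ord", "Sno", "Date", "Tot"])  # Ord -> Sno, Date, Tot
supplier_3 = project(p_order_1, ["Sno", "Sname", "Saddr"])     # Sno -> Sname, Saddr

print(len(supplier_3))  # 1: Bloggs is stored once
print(as_set(natural_join(p_order_3, supplier_3)) == as_set(p_order_1))  # True
```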