Data Normalisation Overview by Nick Rossiter

The document discusses data normalization and normal forms. It defines normalization as a process that takes existing relations and produces canonical sets of relations. Normalization uses normal forms, which are organized in a sequence from 1NF to higher normal forms. A relation is in a normal form if it meets the constraints for that form. The document then discusses the definitions and constraints of 1NF, 2NF and 3NF, and how to normalize a relation by decomposing it into relations in a higher normal form.

Uploaded by

Raushan Baranwal
Copyright
© Attribution Non-Commercial (BY-NC)

Database Modelling

Lecture 7: Data Normalisation


Nick Rossiter

12/08/21 1
Learning Objectives
1. To consider the process of Normalisation

2. To consider the definition and application of 1NF

3. To consider the definition and application of 2NF

4. To consider the definition and application of 3NF

ACTIVITY 1
NORMALISATION PRINCIPLES

Normalisation
• Definition : a systematic method that takes
pre-existing relations and produces a
canonical set of relations.
 
• It can be used both for :
• designing canonical relations,
• checking existing relations to ensure they are
canonical.

Normal forms

Normalisation uses the concept of Normal Forms. They are
organised in a sequence, each successive normal form being
higher than the one before.

1NF → 2NF → 3NF

A normal form is higher because it applies more stringent
constraints to a relation than a lower normal form.

A relation is said to be in a certain “normal form” if it
conforms to the constraints of that normal form.

Relational Design Tool
• In practice, we will always use normalisation for
designing relations.
– This is in order to develop our skills in normalisation.
• 1NF stands for First Normal Form, 2NF for
Second Normal Form, and so on.
• The constraints of a particular normal form are
those of the previous normal form
– plus the additional constraint(s) peculiar to this
particular normal form.
The Normalisation Procedure
• The normalisation procedure starts with a set of
relations, each of which, it is presumed, may be
un-normalised or in 0NF.

    DO FOR xNF = 1NF, ..., 5NF
        DO FOR each relation that exists
            IF relation already conforms to xNF
                THEN it is in xNF, so do nothing
                ELSE create 2 or more replacement relations
                     from it that do conform to xNF
        END-LOOP
    END-LOOP

• 5NF is the highest normal form this procedure reaches.
• In practice, 3NF is the highest normal form usually
reached.
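
The loop above can be sketched in Python. This is only an illustrative driver, not part of the lecture: it assumes each normal form is supplied as a pair of hypothetical functions, one that checks conformance and one that produces the replacement relations.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")  # whatever type is used to represent a relation

# Assumption: each normal form is modelled as a (check, decompose) pair.
#   check(relation)     -> True if the relation already conforms
#   decompose(relation) -> replacement relations that do conform
NormalForm = Tuple[Callable[[R], bool], Callable[[R], Iterable[R]]]


def normalise(relations: Iterable[R], normal_forms: List[NormalForm]) -> List[R]:
    """Apply each normal form in turn (1NF first) to every relation."""
    current = list(relations)
    for conforms, decompose in normal_forms:          # DO FOR xNF = 1NF, ...
        replaced = []
        for relation in current:                      # DO FOR each relation
            if conforms(relation):
                replaced.append(relation)             # already in xNF: do nothing
            else:
                replaced.extend(decompose(relation))  # 2 or more replacements
        current = replaced
    return current
```

The real work lies in the check and decompose functions for each normal form; the driver only fixes the order in which they are applied.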

What is a Normal Form?
• Each Normal Form has two parts
– A definition that specifies exactly what constraints
apply to a relation in that normal form.
• This is used to check whether any given relation is
already in that normal form or not.
– A method to be used to replace the relation with
2 or more that will be in that normal form.
• The method assumes that the relation-to-be-replaced
is in the previous normal form.

Normalising : Possibilities

[Diagram: three regions labelled "the set of all un-normalised
relations", "the set of all relations in 1NF" and "the set of all
relations in 2NF". A given relation is tested against each in turn.
If it is already in 1NF, there is nothing to do. If it is already
in 2NF, there is nothing to do. And so on.]

Consequences of Normalisation
• If new, replacement relations are created, then
they must be projections of the original.
– New-Relation ← Original-relation Project[ attributes ]
• Normalisation always creates new relations
such that
– Original-relation = New-Rel-1 Join[ attributes ] New-Rel-2
• This ensures that no information is ever lost.
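
A minimal Python sketch of these two operations, assuming a relation is held as a list of dictionaries. The relation with attributes A, B and C is hypothetical and chosen so that B determines C; the join recreates the original here because the split follows that dependency.

```python
def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        projected = tuple((a, row[a]) for a in attributes)
        if projected not in seen:
            seen.add(projected)
            result.append(dict(projected))
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            common = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result


def same_relation(r1, r2):
    """Compare two relations as sets of tuples (order is irrelevant)."""
    def as_set(rel):
        return {frozenset(row.items()) for row in rel}
    return as_set(r1) == as_set(r2)


# Hypothetical relation in which B determines C.
original = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 10},
    {"A": 3, "B": "y", "C": 20},
]
new_rel_1 = project(original, ["A", "B"])
new_rel_2 = project(original, ["B", "C"])
rejoined = natural_join(new_rel_1, new_rel_2)
print(same_relation(rejoined, original))
```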
ACTIVITY 3
FIRST NORMAL FORM (1NF)

Definition of 1NF
• A relation is in 1NF
if and only if every attribute value it can
ever contain is an atomic value

• Question : What is an atomic value ?


• Answer : A value that cannot meaningfully
be broken down into two or more constituent
parts.

Example : Purchase Order Relation
The following relation holds data about purchase orders
placed on suppliers for parts.

Ord  Sno  Sname   Saddr  Date    Part  Pname  Qty  Price  Tot
L5   127  Smith   N’cle  7 May   N8    Nut     70      4   12
                                 B6    Bolt    60      5
                                 L4    Nail   100      3
L6   315  Bloggs  D’ham  8 June  P3    Pump     5    150  400
                                 Q7    Motor    5    250

Ord is the only candidate key of the relation.
Each tuple holds all the required data for one purchase order.

Ord    Order number that uniquely identifies every purchase order.
Sno    Supplier number that uniquely identifies any supplier.
Sname  The name of a supplier.
Saddr  The address of a supplier.
Date   The date on which the order was placed.
Part   Part number that uniquely identifies every kind of part used by the company.
Pname  The name of a particular kind of part.
Qty    The quantity of a particular kind of part ordered on a purchase order.
Price  The price of that quantity of that particular kind of part.
Tot    The total price to be paid for the whole order.

Not in 1NF
• Attributes Ord, Sno, Sname, Saddr, Date and Tot
currently contain only atomic values, and in fact
can only ever contain atomic values.

• Attributes Part, Pname, Qty and Price currently
contain non-atomic values, and in fact may often
contain non-atomic values.

• Therefore the relation is not in 1NF.

Putting Purchase Order into 1NF
• Separate out the atomic and non-atomic
attributes
• Put all the atomic attributes in a new
replacement relation, which then by definition
is in 1NF.

Ord Sno Sname Saddr Date Tot

L5 127 Smith N’cle 7 May 12

L6 315 Bloggs D’ham 8 June 400

The Non-Atomic Attributes
• We can’t just throw away this data because it
is a nuisance to store!
• The values in all these attributes repeat
together.
– If a part is removed from an order, its values must
be removed from all 4 attributes.
– If another part is placed on an order, there must
be a value for that part in all 4 attributes.

Repeating Together
• Thus a set of values that repeat together should become a
tuple in a new relation.
• Now the attributes in these tuples contain only atomic data !
• Thus we form another new replacement relation to hold the
tuples of data that repeat together.
• There is no intrinsic reason why all the non-atomic attributes
in an un-normalised relation should always repeat together.

Part  Pname  Qty  Price
N8    Nut     70      4
B6    Bolt    60      5
L4    Nail   100      3
P3    Pump     5    150
Q7    Motor    5    250

Foreign Keys
The problem with this relation is that the part data is no longer
associated with its order data.
We no longer know which part type was ordered on which
purchase order.

We can solve this problem by adding the (purchase) order
number attribute to this relation.

Ord  Part  Pname  Qty  Price
L5   N8    Nut     70      4
L5   B6    Bolt    60      5
L5   L4    Nail   100      3
L6   P3    Pump     5    150
L6   Q7    Motor    5    250

In general, we must add the attribute(s) which formed a candidate
key in the original relation, to this relation as a foreign key. This
retains the relationship information.

Candidate Keys for Relations
The candidate key is Ord:

Ord  Sno  Sname   Saddr  Date    Tot
L5   127  Smith   N’cle  7 May    12
L6   315  Bloggs  D’ham  8 June  400

Extend the candidate key to (Ord, Part),
including the foreign key Ord:

Ord  Part  Pname  Qty  Price
L5   N8    Nut     70      4
L5   B6    Bolt    60      5
L5   L4    Nail   100      3
L6   P3    Pump     5    150
L6   Q7    Motor    5    250
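
The 1NF step for the purchase order example can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescribed method: the un-normalised relation is held as a list of dictionaries, and `items` is a hypothetical name for the group of values that repeat together.

```python
# Un-normalised purchase orders; "items" is a hypothetical name for the
# repeating group (Part, Pname, Qty, Price).
purchase_orders = [
    {"Ord": "L5", "Sno": 127, "Sname": "Smith", "Saddr": "N'cle",
     "Date": "7 May", "Tot": 12,
     "items": [("N8", "Nut", 70, 4), ("B6", "Bolt", 60, 5), ("L4", "Nail", 100, 3)]},
    {"Ord": "L6", "Sno": 315, "Sname": "Bloggs", "Saddr": "D'ham",
     "Date": "8 June", "Tot": 400,
     "items": [("P3", "Pump", 5, 150), ("Q7", "Motor", 5, 250)]},
]


def to_1nf(orders):
    """Split each order into its atomic attributes and one tuple per item."""
    order_rel, item_rel = [], []
    for order in orders:
        # Atomic attributes go into the first replacement relation.
        order_rel.append({k: v for k, v in order.items() if k != "items"})
        # Each set of values that repeat together becomes one tuple,
        # carrying Ord as a foreign key back to its order.
        for part, pname, qty, price in order["items"]:
            item_rel.append({"Ord": order["Ord"], "Part": part,
                             "Pname": pname, "Qty": qty, "Price": price})
    return order_rel, item_rel


p_order_1, p_item_1 = to_1nf(purchase_orders)
```

Every value in `p_order_1` and `p_item_1` is atomic, and the pair (Ord, Part) identifies each tuple of `p_item_1`.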
ACTIVITY 5
SECOND NORMAL FORM (2NF)

Definition of 2NF

A relation is in 2NF
if and only if
it is in 1NF
and
every non-key attribute is fully functionally
dependent on the candidate key.
(This last condition is the extra constraint applied by 2NF.)

Note that 2NF is more stringent than 1NF, in that it
requires the relation to conform to the additional “full
functional dependency” constraint.

Fully Functionally Dependent
• Question : What does fully functionally
dependent mean?
 
• We will first consider the principle of
functional dependency, and then see
– what full functional dependency means,
– the application to achieve 2NF.

Example of Functional
Dependency
Assume some kind of loan account where payments of a certain
amount have to be made on a regular basis to pay off the loan.

Account Number → Payment Due

This type of arrow indicates a functional dependency.

This means :

• A given account number determines what payment is due.

• In principle, given an account number, one can find out what
regular payment is due. (This may not always be easy or feasible in practice.)
Terminology
• The Account Number is said to functionally
determine the Payment Due.

• The Payment Due is said to be functionally
dependent on the Account Number.

• Both are equally good means of expression, and convenience
and emphasis usually determine which of the two is preferred
in any particular situation.
Definition of Functional
Dependency (FD)

A set of attributes Y in a relation is functionally dependent on
a set of attributes X in the same relation

if and only if

a given set of attribute values in X
determines a specific set of attribute values in Y
for every instant of time.
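
As a rough illustration, the Python sketch below tests whether a dependency X → Y is contradicted by the tuples currently in a relation. The loan data and the attribute names `Account` and `Payment` are made up for the example. Because a functional dependency must hold for every instant of time, such a check on one snapshot can disprove a dependency but can never prove it.

```python
def holds_in(relation, x, y):
    """Return True if no two tuples agree on X but differ on Y."""
    seen = {}
    for row in relation:
        x_vals = tuple(row[a] for a in x)
        y_vals = tuple(row[a] for a in y)
        # Remember the first Y seen for each X; any later mismatch
        # contradicts the dependency.
        if seen.setdefault(x_vals, y_vals) != y_vals:
            return False
    return True


# Hypothetical loan accounts: two accounts happen to share one payment.
loans = [
    {"Account": "A1", "Payment": 50},
    {"Account": "A2", "Payment": 50},
    {"Account": "A3", "Payment": 80},
]

print(holds_in(loans, ["Account"], ["Payment"]))  # Account -> Payment
print(holds_in(loans, ["Payment"], ["Account"]))  # Payment -> Account
```

The first check passes and the second fails, because one payment amount corresponds to two account numbers.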

Relationship X:Y in FD is many:1
• For any given set of values X, there is just one
corresponding set of values Y.

• It is possible that there may be many sets of
values X for which there is just one set of values Y.

• A functional dependency is a permanent
association between attributes.

Further FD Examples
Example 1:

Supplier Number → Supplier Name
Supplier Number → Supplier Address
Supplier Number → Supplier Telephone No.

A set containing one attribute determining
a set of three attributes.

Example 2:

(Customer Name, Customer Address) → Customer Telephone No.

A set of two attributes determining
a set containing one attribute.

Full Functional Dependency & 2NF
• The definition of 2NF requires not merely
functional dependency, but full functional
dependency.
Definition of FULL Functional Dependency:

A set of attributes Y is fully functionally dependent on
a set of attributes X

if and only if

Y is functionally dependent on all the attributes of X
and not just a subset of them.
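
The definition can be sketched in Python by testing every non-empty proper subset of X. The sample tuples are the purchase order items used in this lecture, including the extra order L7 for part Q7. As before, a check against sample data can only refute a dependency, so this is an illustration and not a proof.

```python
from itertools import combinations


def holds_in(relation, x, y):
    """Return True if no two tuples agree on X but differ on Y."""
    seen = {}
    for row in relation:
        x_vals = tuple(row[a] for a in x)
        y_vals = tuple(row[a] for a in y)
        if seen.setdefault(x_vals, y_vals) != y_vals:
            return False
    return True


def fully_dependent(relation, x, y):
    """Y depends on X, and on no non-empty proper subset of X."""
    if not holds_in(relation, x, y):
        return False
    for size in range(1, len(x)):
        for subset in combinations(x, size):
            if holds_in(relation, list(subset), y):
                return False  # a subset of X already determines Y
    return True


columns = ("Ord", "Part", "Pname", "Qty", "Price")
p_item_1 = [dict(zip(columns, row)) for row in [
    ("L5", "N8", "Nut", 70, 4),
    ("L5", "B6", "Bolt", 60, 5),
    ("L5", "L4", "Nail", 100, 3),
    ("L6", "P3", "Pump", 5, 150),
    ("L6", "Q7", "Motor", 5, 250),
    ("L7", "Q7", "Motor", 2, 100),
]]

key = ["Ord", "Part"]
print(fully_dependent(p_item_1, key, ["Qty"]))
print(fully_dependent(p_item_1, key, ["Pname"]))
```

On this data Qty needs both Ord and Part, whereas Pname is already determined by Part alone.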

Condition for 2NF
Thus, to be in 2NF means that:

all attributes not in the candidate key
are fully FD on
all those attributes that are in the candidate key.

Examples: Purchase Order
Relations
P_ORDER_1: FD Diagram

P_ORDER_1

Ord  Sno  Sname   Saddr  Date    Tot
L5   127  Smith   N’cle  7 May    12
L6   315  Bloggs  D’ham  8 June  400

The functional dependencies of the
non-key attributes in P_ORDER_1
on its candidate key can be
represented by the following FD
diagram :-

Ord → Sno
Ord → Sname
Ord → Saddr
Ord → Date
Ord → Tot

As they are all fully FD on Ord, the
relation is already in 2NF.

P_ITEM_1: FD Diagram
P_ITEM_1

Ord  Part  Pname  Qty  Price
L5   N8    Nut     70      4
L5   B6    Bolt    60      5
L5   L4    Nail   100      3
L6   P3    Pump     5    150
L6   Q7    Motor    5    250

The functional dependencies of the non-key
attributes in P_ITEM_1 on its candidate key can
be represented by the following FD diagram :-

(Ord, Part) → Qty
(Ord, Part) → Price
Part → Pname

As they are not all fully FD on (Ord, Part), the
relation is not in 2NF.

Reason for non-2NF
• Attributes Price and Qty depend on the full key.
– They depend not only on what kind of part they refer
to, but also on the order itself.
• The quantity of a part type ordered will vary with, and
depend on, the order, as will the price since it depends on
the quantity.
• However Pname depends solely on the type of
part.
– A particular kind of part will have the same name on
every order on which it appears.
Three Problems of a Non-2NF
Relation
• Redundant data may be stored.

• Update anomalies
– there can be problems in inserting, deleting and
amending some of the data.

• Semantic problems.
– relation does not reflect the real-world meaning of the
data, leading to problems in its use.
Redundant Data
P_ITEM_1

Ord  Part  Pname  Qty  Price
L5   N8    Nut     70      4
L5   B6    Bolt    60      5
L5   L4    Nail   100      3
L6   P3    Pump     5    150
L6   Q7    Motor    5    250
L7   Q7    Motor    2    100

Every time a part type appears on an
order (say Q7), its name (Motor) also
appears.

N.B. the part number (say Q7) is enough
to identify the part type.

Example: Pname is unnecessarily
repeated. Motor is repeated in orders L6 & L7.
One order is sufficient to give us the
name, so the other is redundant.
(Either one).

Update Anomalies
P_ITEM_1

Ord  Part  Pname   Qty  Price
L5   N8    Nut      70      4
L5   B6    Bolt     60      5
L5   L4    Nail    100      3
L6   P3    Pump      5    150
L6   Q7    Motor     5    250
L7   Q7    Engine    2    100
??   F5    Flange    ?    ???

Deletions:
Can’t delete (P3, Pump);
it appears on order L6.
Yet when order L6 is
deleted, then (P3, Pump)
data is lost.

Amendments:
Can’t change Q7’s name from Motor
to Engine without changing all
occurrences to retain consistency.

Additions:
Can’t add Flange; there is no
order data to complete the tuple.

Example: Part type details (Part and
Pname) cannot always be updated.

Semantic Problems
P_ITEM_1

Ord  Part  Pname   Qty  Price
L5   N8    Nut      70      4
L5   B6    Bolt     60      5
L5   L4    Nail    100      3
L6   P3    Pump      5    150
L6   Q7    Motor     5    250
L7   Q7    Engine    2    100

Example:
If multiple copies of a part type’s ‘Pname’
are inconsistently amended, the same
part type could end up with two or more
different names.

Yet in real-life, a part type can only ever
have one name.

Q7 now has two different names.

Putting P_ITEM_1 into 2NF (1)
The problem is caused by ‘Pname’ being
FD on just Part, not the whole of the
candidate key.

(Ord, Part) → Qty
(Ord, Part) → Price
Part → Pname

The solution is to separate out each
determinant and its dependents.
Create 2 replacement relations based on
these FDs.

(Ord, Part) → Qty, Price

Part → Pname

Satisfaction of 2NF
• A relation created with a determinant as its candidate
key, and with non-key attributes that are fully
functionally dependent on that candidate key, must
be in 2NF by definition.

• Note that a determining attribute - Part in the above
example - can appear in more than one complete
determinant.
– This is perfectly acceptable. It just depends what
attributes form determinants.

Putting P_ITEM_1 into 2NF (2)

(Ord, Part) → Qty
(Ord, Part) → Price

The corresponding relation is :-

P_ITEM_2

Ord  Part  Qty  Price
L5   N8     70      4
L5   B6     60      5
L5   L4    100      3
L6   P3      5    150
L6   Q7      5    250

Putting P_ITEM_1 into 2NF (3)

Part → Pname

The corresponding relation is :-

PART_2

Part  Pname
N8    Nut
B6    Bolt
L4    Nail
P3    Pump
Q7    Motor

Benefits of 2NF
• No information has been lost.
– A natural join of P_ITEM_2 and PART_2 on
attribute Part will re-create the original relation
P_ITEM_1.
• Problems Solved:
– Redundant data removed – each Pname is stored only once
– Update anomalies – no side effects in operations
– Semantic problems – each part type has just one
name
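
These benefits can be checked on the sample data with a short Python sketch. It assumes each relation is held as a set of tuples and includes the extra order L7 for part Q7 from the redundancy example. The set comprehensions are just one convenient way to write the projections and the join.

```python
# P_ITEM_1 as (Ord, Part, Pname, Qty, Price) tuples.
p_item_1 = {
    ("L5", "N8", "Nut", 70, 4),
    ("L5", "B6", "Bolt", 60, 5),
    ("L5", "L4", "Nail", 100, 3),
    ("L6", "P3", "Pump", 5, 150),
    ("L6", "Q7", "Motor", 5, 250),
    ("L7", "Q7", "Motor", 2, 100),
}

# Projections: building sets drops duplicate tuples automatically.
p_item_2 = {(o, part, qty, price) for o, part, _, qty, price in p_item_1}
part_2 = {(part, pname) for _, part, pname, _, _ in p_item_1}

# Natural join on Part re-creates the original relation.
rejoined = {
    (o, part, pname, qty, price)
    for o, part, qty, price in p_item_2
    for part2, pname in part_2
    if part == part2
}
print(rejoined == p_item_1)

# Renaming Q7 now touches exactly one tuple, so the part type
# cannot end up with two different names.
part_2_renamed = {(p, "Engine" if p == "Q7" else n) for p, n in part_2}
names_for_q7 = {n for p, n in part_2_renamed if p == "Q7"}
print(names_for_q7)
```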
THIRD NORMAL FORM (3NF)

Definition of 3NF

A relation is in 3NF
if and only if
it is in 2NF
and
every non-key attribute is non-transitively
fully FD on the candidate key.
(This last condition is the extra constraint applied by 3NF.)

Question : what does non-transitively mean ?

Note that 3NF is more stringent than 2NF, as it requires that the relation
not only have full functional dependencies on the candidate key, but that
these dependencies must now additionally be “non-transitive”.
Transitivity
Assume there are three sets of attributes,
‘A’, ‘B’ and ‘C’.

Let A → B
and B → C

These FDs are non-transitive, i.e. direct,
because they do not go via any
other sets of attributes.

Then A → C

If A determines B, and B determines C, then
logically A determines C,
but transitively via B.

This FD is transitive, because it is via
another set of attributes, in this case ‘B’.

Example of Transitive FD
• Suppose pilots always fly the same aircraft
– then if we know the pilot, we know the aircraft; so pilot
functionally determines aircraft.
• If we know the aircraft, then we know the airline that
owns it
– so aircraft functionally determines airline.
• Putting these two dependencies together
– then pilot functionally determines airline.
• But the functional dependency of airline on pilot is
transitive, because it goes via aircraft.
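
The way such dependencies chain together can be sketched in Python with the standard attribute-closure calculation, which repeatedly applies the known dependencies until nothing new can be derived. The encoding of each dependency as a pair of sets is an assumption made for this example.

```python
def closure(attributes, fds):
    """Return every attribute determined, directly or transitively,
    by the given attributes under the listed dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already known, so are its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Each dependency is a (determinant, dependents) pair.
fds = [
    ({"Pilot"}, {"Aircraft"}),
    ({"Aircraft"}, {"Airline"}),
]

print(closure({"Pilot"}, fds))
```

Airline appears in the closure of Pilot only because Aircraft was added first, which is exactly what makes that dependency transitive.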

Non-Transitive Full FD & 3NF
So, to be in 3NF means that

all attributes not in the candidate key
are non-transitively - i.e. directly - fully FD on
all those attributes that are in the candidate key,

and not FD on the candidate key
via some other non-key attribute.
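
One simple way to spot a violation, sketched in Python below, is to look for a listed dependency whose determinant consists only of non-key attributes. This is a simplification: it assumes a single candidate key and inspects only the dependencies that are written down. The dependencies shown are those of the purchase order relations used in this lecture.

```python
def transitive_dependents(key, fds):
    """Map each non-key attribute to the non-key determinant it hangs off."""
    key = set(key)
    found = {}
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        # A determinant with no key attributes means its dependents
        # reach the key only via that determinant.
        if lhs and not (lhs & key):
            for attr in rhs - key - lhs:
                found[attr] = lhs
    return found


p_order_1_fds = [
    ({"Ord"}, {"Sno", "Date", "Tot"}),
    ({"Sno"}, {"Sname", "Saddr"}),
]
p_item_2_fds = [
    ({"Ord", "Part"}, {"Qty", "Price"}),
]

print(transitive_dependents({"Ord"}, p_order_1_fds))
print(transitive_dependents({"Ord", "Part"}, p_item_2_fds))
```

An empty result means no chain of dependencies was found among the listed FDs.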
Reviewing the Definition of 3NF
1. R1( Key, NK1, NK2, NK3 )

Key → NK1 → NK2 → NK3

R1’s FD diagram shows a
“chain of dependencies”.
It is not in 3NF.

2. R2( Key, NK1, NK2, NK3 )

Key → NK1
Key → NK2
Key → NK3

R2’s FD diagram shows no
“chain of dependencies”. It is
in 3NF.

Example: P_ITEM_2
P_ITEM_2

Ord  Part  Qty  Price
L5   N8     70      4
L5   B6     60      5
L5   L4    100      3
L6   P3      5    150
L6   Q7      5    250

As we have already seen, its FD diagram is :-

(Ord, Part) → Qty
(Ord, Part) → Price

Neither ‘Price’ nor ‘Qty’ is FD on the
candidate key via the other; both are
non-transitively FD on the key.

Thus P_ITEM_2 is already in 3NF.

Example : PART_2
PART_2

Part  Pname
N8    Nut
B6    Bolt
L4    Nail
P3    Pump
Q7    Motor

As we have already seen, its FD diagram is :-

Part → Pname

‘Pname’ is non-transitively FD on the candidate key.
Thus PART_2 is already in 3NF.

If a 2NF relation only has one non-key attribute, then it must
already be in 3NF, as there is no other non-key attribute via
which a transitive dependency can occur.

Example : P_ORDER_1
P_ORDER_1

Ord  Sno  Sname   Saddr  Date    Tot
L5   127  Smith   N’cle  7 May    12
L6   315  Bloggs  D’ham  8 June  400

As we have already seen, its FD
diagram is :-

Ord → Sno
Ord → Sname
Ord → Saddr
Ord → Date
Ord → Tot

However, not all of these FDs are non-transitive FDs (= NTFDs).

Taking account now of
transitivity, the FD diagram
can be re-drawn as:-

Ord → Sno
Ord → Date
Ord → Tot
Sno → Sname
Sno → Saddr

Hence P_ORDER_1 is not in 3NF.


Putting P_ORDER_1 into 3NF (1)
The problem is caused by ‘Sname’
and ‘Saddr’ being only transitively
FD on the candidate key.

Ord → Sno
Ord → Date
Ord → Tot
Sno → Sname
Sno → Saddr

Solution: separate out each determinant and its NTFD dependents, & create 2
replacement relations based on them.

Ord → Sno, Date, Tot

Sno → Sname, Saddr

Putting P_ORDER_1 into 3NF (2)

Ord → Sno
Ord → Date
Ord → Tot

The corresponding relation is:-

P_ORDER_3

Ord  Sno  Date    Tot
L5   127  7 May    12
L6   315  8 June  400

Putting P_ORDER_1 into 3NF (3)

Sno → Sname
Sno → Saddr

The corresponding relation is:-

SUPPLIER_3

Sno  Sname   Saddr
127  Smith   N’cle
315  Bloggs  D’ham

Benefits
• No information has been lost.
– A natural join of P_ORDER_3 and SUPPLIER_3 on
attribute Sno will re-create the original relation
P_ORDER_1.
• Problems Solved:
– Redundant data removed – each Sname is stored only once
– Update anomalies – no side effects in operations
– Semantic problems – each supplier has just one
name