
2 Normalization Part I

The document discusses database normalization theory and different normal forms. It explains how to decompose relations into higher normal forms through lossless decomposition while preserving functional dependencies. Normalization is important for reducing data anomalies and issues with updates, inserts and deletes.

Normalization

● Normalization theory is a formalism of simple ideas with a practical application in logical database schema design.
● Normalization theory allows us to recognize relations with undesirable properties; it tells us what is "wrong" and how to "correct" it.
● Normalization makes a relation better in the sense that the update problems addressed by a given normal form no longer occur once the relation is in that form.
Normal Forms
● Normalization theory is built around normal forms: each normal form is defined by a set of criteria that a relation must satisfy.
● Normal forms exist in a hierarchy:
1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF ⊃ 4NF ⊃ PJ/NF (5NF)
● A normal form implies all lower normal forms: every relation in 2NF is also in 1NF, and so on.
● Codd defined 1NF, 2NF and 3NF in 1972, based on primary keys.
● 3NF had inadequacies, so it was revised in 1974 by Boyce and Codd (BCNF).
● Fagin defined 4NF in 1977 and 5NF in 1979.
● 6NF, 7NF, ...? Dependency theory suggests there may be higher normal forms, but they are not practicable in a database environment.
Normalizing a relation
● Normalizing a relation means converting it into one or more relations in a higher normal form.
● The decomposition operation is used to convert the relation into an equivalent set of higher normal form relations.
● Decomposition of a relation is essentially the project operation of relational algebra.
● Join is the reverse of the decomposition operation.
● Decomposition must be lossless, that is, reversible.
Decomposition
S#   Status  City
s3   30      Paris
s5   30      Athens

We can decompose the bigger relation (S#, Status, City) into two smaller relations SST (S#, Status) and SC (S#, City):

SST              SC
S#   Status      S#   City
s3   30          s3   Paris
s5   30          s5   Athens

Joining SST and SC on S# gives back the original relation, so this is a lossless decomposition:

S#   Status  City
s3   30      Paris
s5   30      Athens
Decomposition
S#   Status  City
s3   30      Paris
s5   30      Athens

Alternatively, decompose into SST (S#, Status) and STC (Status, City):

SST              STC
S#   Status      Status  City
s3   30          30      Paris
s5   30          30      Athens

Joining SST and STC on Status gives:

S#   Status  City
s3   30      Paris
s3   30      Athens
s5   30      Paris
s5   30      Athens

The join contains two tuples that were not in the original relation, (s3, 30, Athens) and (s5, 30, Paris), so the original cannot be recovered: this is a lossy decomposition.
Lossless Decomposition
S#   Status  City
s3   30      Paris
s5   30      Athens

Decomposition 1 (lossless)

SST              SC
S#   Status      S#   City
s3   30          s3   Paris
s5   30          s5   Athens

Decomposition 2 (lossy)

SST              STC
S#   Status      Status  City
s3   30          30      Paris
s5   30          30      Athens
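
The two decompositions can be replayed in a few lines of Python. This is only a sketch: relations are modelled as a tuple of attribute names plus a set of rows, and the helper names `project` and `natural_join` are illustrative choices, not something defined in the slides.

```python
def project(attrs, rows, keep):
    """Projection: keep only the columns in `keep`, dropping duplicate rows."""
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}


def natural_join(left, right):
    """Natural join of two (attrs, rows) relations on their common attributes."""
    l_attrs, l_rows = left
    r_attrs, r_rows = right
    common = [a for a in l_attrs if a in r_attrs]
    extra = [a for a in r_attrs if a not in l_attrs]
    out = set()
    for l in l_rows:
        for r in r_rows:
            if all(l[l_attrs.index(a)] == r[r_attrs.index(a)] for a in common):
                out.add(l + tuple(r[r_attrs.index(a)] for a in extra))
    return l_attrs + tuple(extra), out


attrs = ("S#", "Status", "City")
s = {("s3", 30, "Paris"), ("s5", 30, "Athens")}

sst = project(attrs, s, ("S#", "Status"))
sc = project(attrs, s, ("S#", "City"))
stc = project(attrs, s, ("Status", "City"))

_, join1 = natural_join(sst, sc)    # decomposition 1: join on S#
_, join2 = natural_join(sst, stc)   # decomposition 2: join on Status

print(join1 == s)          # True: the original relation is recovered
print(sorted(join2 - s))   # the extra tuples produced by the lossy join
```

Note that "lossy" here means the join produces additional tuples, so the information about which supplier is in which city is lost.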
Lossless Decomposition
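● Which decompositions are lossless?
● Heath's Theorem: Let R(A,B,C) be a relation where A, B, C are sets of attributes of R. If R satisfies A → B, then R is equal to the join of its projections on {A,B} and {A,C}.
● In the supplier example, S# → Status holds, so the decomposition into SST and SC is lossless.
● Status → S# does not hold, so the theorem gives no such guarantee for SST and STC.
● All functional dependencies should also be preserved by the decomposition.
● This is not possible for every lossless decomposition.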
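
As a rough illustration of the condition in Heath's theorem, the sketch below tests whether a functional dependency is violated in a given relation instance. The function name `fd_holds` is an assumption made for this example. Keep in mind that an FD is a constraint on every legal instance, so a check on sample rows can refute a dependency but cannot prove it.

```python
def fd_holds(attrs, rows, lhs, rhs):
    """True if lhs -> rhs is not violated: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        rec = dict(zip(attrs, row))
        key = tuple(rec[a] for a in lhs)
        val = tuple(rec[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, val) != val:
            return False
    return True


attrs = ("S#", "Status", "City")
s = [("s3", 30, "Paris"), ("s5", 30, "Athens")]

print(fd_holds(attrs, s, ["S#"], ["Status"]))     # True
print(fd_holds(attrs, s, ["Status"], ["S#"]))     # False
print(fd_holds(attrs, s, ["Status"], ["City"]))   # False
```

With S# as the shared attribute, S# → Status is satisfied, which is the case covered by the theorem. With Status as the shared attribute, neither Status → S# nor Status → City is satisfied in the sample rows.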
First Normal Form (1NF)
● A relation is in 1NF iff every tuple contains exactly
one value for each attribute.
● Every relation (in the relational model) is in 1NF.
● To illustrate 2NF, let us combine S and SP to get:
FIRST(S#,STATUS,CITY,P#,QTY)
● We also introduce a new constraint: STATUS is functionally dependent on CITY, e.g.
● London suppliers always have status 20
● Paris suppliers always have status 30
● The primary key is (S#,P#), and the functional dependency diagram is ...
S#   Status  City    P#   QTY
S1   20      London  P1   300
S1   20      London  P2   200
S1   20      London  P3   400
S2   30      Paris   P6   100
FD Diagram for FIRST
Update Problems with FIRST
● INSERT: We cannot record the fact that a supplier exists until that supplier actually makes a shipment, because P# is part of the primary key.
● DELETE: Deleting a supplier's last shipment tuple also deletes the supplier information; for example, deleting (S2, P6) loses the fact that S2 is located in Paris.
● UPDATE: The CITY value is repeated for each shipment, so updating a supplier's CITY is unnecessarily expensive.
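
The DELETE and UPDATE problems can be seen directly on the sample rows. The following is a minimal sketch that stores FIRST as a plain list of tuples; the variable names are illustrative only.

```python
# FIRST(S#, STATUS, CITY, P#, QTY) with the sample rows from the slide.
first = [
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400),
    ("S2", 30, "Paris", "P6", 100),
]

# DELETE: removing S2's only shipment also removes the only record of S2's city.
after_delete = [row for row in first if (row[0], row[3]) != ("S2", "P6")]
cities_known = {row[0]: row[2] for row in after_delete}
print("S2" in cities_known)   # False

# UPDATE: changing S1's city would have to touch every S1 shipment row.
rows_to_update = [row for row in first if row[0] == "S1"]
print(len(rows_to_update))    # 3
```

The INSERT problem does not show up in a plain list, since nothing here enforces the primary key; in a real table the missing P# value would be rejected.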
Cause of the update problems
● All these update problems are due to FDs whose determinant is only part of the primary key (S#,P#):
● S# → Status
● S# → City
● One solution is to replace FIRST by:
● SECOND(S#,STATUS,CITY) and SP(S#,P#,QTY)
● This yields the following FD diagram:

This is appealing for the following reasons:
● INSERT: we can enter the fact that S5 is in Athens without S5 actually having to make a shipment
● DELETE: we can delete shipment tuples without losing location information
● UPDATE: each supplier's location appears only once, so updating it is more efficient
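
A small sketch of the same idea in Python, assuming the sample rows of FIRST shown earlier and taking status 30 for S5 from the earlier decomposition example. Sets of tuples stand in for the relations SECOND and SP.

```python
first = [
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400),
    ("S2", 30, "Paris", "P6", 100),
]

# Project FIRST onto SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY).
second = {(s, status, city) for (s, status, city, _p, _qty) in first}
sp = {(s, p, qty) for (s, _status, _city, p, qty) in first}

# Reversible: the natural join on S# rebuilds FIRST exactly.
rejoined = {
    (s, status, city, p, qty)
    for (s, status, city) in second
    for (s2, p, qty) in sp
    if s == s2
}
print(rejoined == set(first))           # True

# INSERT: S5 can be recorded in Athens before it ships anything.
second.add(("S5", 30, "Athens"))

# DELETE: removing S2's only shipment no longer loses S2's location.
sp.discard(("S2", "P6", 100))
print(("S2", 30, "Paris") in second)    # True
```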
Definition of 2NF

● A relation R is in 2NF iff it is in 1NF and every nonkey attribute is fully dependent on the primary key.
● A relation that is in 1NF but not in 2NF can always be reduced to a collection of 2NF relations by replacing it with an equivalent collection of projections.
● Equivalence here means that the 1NF relation can be obtained from the 2NF relations by taking their natural join, so the decomposition is non-loss and reversible.
● Example: R(A,B,C,D) with PK = {A,B} and A → D can be decomposed into the 2NF relations
R1(A,D) with A → D
R2(A,B,C)
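
One way to test the 2NF condition mechanically is to compute attribute closures for every proper subset of the primary key and look for nonkey attributes that appear. The sketch below does this under the assumption that the primary key is the only candidate key; the function names `closure` and `partial_dependencies` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of the key to the nonkey attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - set(key)
            if dependents:
                found[subset] = dependents
    return found


# R(A,B,C,D) with PK = {A,B} and A -> D
fds_r = [(frozenset("AB"), frozenset("CD")), (frozenset("A"), frozenset("D"))]
print(partial_dependencies({"A", "B"}, fds_r))    # {('A',): {'D'}}

# R2(A,B,C) with PK = {A,B}: no partial dependency remains
fds_r2 = [(frozenset("AB"), frozenset("C"))]
print(partial_dependencies({"A", "B"}, fds_r2))   # {}
```

An empty result means every nonkey attribute is fully dependent on the key, which is the 2NF condition stated above.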
Problems with SECOND relation

● The relation SECOND still has some problems, however: it lacks mutual independence among its nonkey attributes.
Problems

● The dependency of STATUS on S# is fully functional, but it is transitive via CITY (S# → City and City → Status), and this can lead to update anomalies as follows:
● INSERT: we cannot record that the status of Rome is 50 until we have some supplier in Rome
● DELETE: if we delete the only SECOND tuple for a city, we lose the status of that city
● UPDATE: the STATUS of a CITY appears many times, once for each supplier in that city
Cause of the update problems
● All these update problems are due to the FD
● City → Status
Decompose SECOND

● So we replace SECOND by:
● SC(S#,CITY) and CS(CITY,STATUS)
● This gives the following FD diagram:

● In SECOND, STATUS did not describe the entity identified by
the primary key (S#).
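
The effect of splitting SECOND into SC and CS can be sketched as follows. The sample rows are an assumption assembled from the slides' earlier examples (S1 in London with status 20; S2 and S3 in Paris with status 30), chosen so that City → Status is satisfied.

```python
# SECOND(S#, STATUS, CITY): sample rows, assumed for illustration.
second = {("S1", 20, "London"), ("S2", 30, "Paris"), ("S3", 30, "Paris")}

# Project onto SC(S#, CITY) and CS(CITY, STATUS).
sc = {(s, city) for (s, _status, city) in second}
cs = {(city, status) for (_s, status, city) in second}

# Lossless: joining SC and CS on CITY rebuilds SECOND.
rejoined = {(s, status, city) for (s, city) in sc for (c, status) in cs if c == city}
print(rejoined == second)                            # True

# INSERT: the status of Rome can be recorded with no supplier in Rome.
cs.add(("Rome", 50))

# DELETE: removing the only London supplier keeps London's status.
sc.discard(("S1", "London"))
print(("London", 20) in cs)                          # True

# UPDATE: the status of Paris is stored once, not once per supplier.
print(sum(1 for city, _ in cs if city == "Paris"))   # 1
```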
