Normalization
NOTE: Unless each step is carried out properly the next step will be
flawed, i.e., unless a data set is in first normal form it will never be in
a valid second or third normal form.
Unnormalised (0NF) -> Remove repeating groups -> First normal form (1NF) -> Remove partial dependencies -> Second normal form (2NF)
R3 = (Pilot Number, Name, {Date Flown, Aircraft Number, Aircraft Type, Hours Flown})
Or
R4 = (Pilot Number, Name, {Date Flown, {Aircraft Number, Aircraft Type, Hours Flown}})
A single-attribute dataset should be excluded, thus R4 is not a good choice.
Issues:
• A single attribute/column dataset should not appear in any repeating
group (i.e., prefer R3 over R4 in the last slide).
• Calculated fields should be excluded from any repeating group:
– They cause data redundancy if included.
– The data can be calculated/generated at the time the final report is
displayed/printed.
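The point about calculated fields can be sketched in Python. This is only an illustration: the slides do not name a specific calculated field, so the total hours per pilot, the sample rows, and the function name `total_hours_by_pilot` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical stored rows: (pilot_number, date_flown, aircraft_number, hours_flown).
# No total is stored anywhere; all values are made up.
flights = [
    (101, "2024-01-05", "AC1", 2.0),
    (101, "2024-01-05", "AC2", 1.5),
    (102, "2024-01-06", "AC1", 3.0),
]


def total_hours_by_pilot(rows):
    """Derive each pilot's total hours at report time instead of storing it."""
    totals = defaultdict(float)
    for pilot_number, _date, _aircraft, hours in rows:
        totals[pilot_number] += hours
    return dict(totals)


report = total_hours_by_pilot(flights)
```

Because the total is derived from the stored rows each time the report is produced, it cannot drift out of step with them.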
There are no remaining repeating groups, thus the data set is now in
1NF.
R1 = (Pilot Number, Name, {Date Flown, {Aircraft Number, Aircraft Type, Hours Flown}})
Becomes
R11 = (Pilot Number, Name)
R12 = (Pilot Number, Date Flown, {Aircraft Number, Aircraft Type, Hours Flown})
Then
R12 = (Pilot Number, Date Flown, {Aircraft Number, Aircraft Type, Hours Flown})
Becomes
R121 = (Pilot Number, Date Flown)
R122 = (Pilot Number, Date Flown, Aircraft Number, Aircraft Type, Hours Flown)
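The two splits above can be sketched in Python as a flattening of nested repeating groups. This is a minimal illustration under stated assumptions: the sample pilot record, the dictionary field names, and the function name `split_repeating_groups` are hypothetical and do not come from the slides.

```python
# Hypothetical un-normalised record (R1): one pilot with a repeating group of
# dates, each holding a repeating group of flights. All values are made up.
r1 = [
    {
        "pilot_number": 101,
        "name": "A. Pilot",
        "dates": [
            {
                "date_flown": "2024-01-05",
                "flights": [
                    {"aircraft_number": "AC1", "aircraft_type": "Cessna", "hours_flown": 2.0},
                    {"aircraft_number": "AC2", "aircraft_type": "Piper", "hours_flown": 1.5},
                ],
            },
            {
                "date_flown": "2024-01-06",
                "flights": [
                    {"aircraft_number": "AC1", "aircraft_type": "Cessna", "hours_flown": 3.0},
                ],
            },
        ],
    }
]


def split_repeating_groups(records):
    """Split R1 into the flat datasets R11, R121 and R122 (lists of tuples)."""
    r11, r121, r122 = [], [], []
    for pilot in records:
        r11.append((pilot["pilot_number"], pilot["name"]))
        for day in pilot["dates"]:
            # The parent key (Pilot Number) is carried down into the child dataset.
            r121.append((pilot["pilot_number"], day["date_flown"]))
            for flight in day["flights"]:
                # Both parent keys are carried down into the innermost dataset.
                r122.append((
                    pilot["pilot_number"],
                    day["date_flown"],
                    flight["aircraft_number"],
                    flight["aircraft_type"],
                    flight["hours_flown"],
                ))
    return r11, r121, r122


r11, r121, r122 = split_repeating_groups(r1)
```

Each tuple in `r122` repeats Pilot Number and Date Flown from its parent groups, which is what removes the nesting.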
In Summary:
0NF (Un-normalised):
R1 = (Pilot Number, Name, {Date Flown, {Aircraft Number, Aircraft Type, Hours Flown}})
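1NF (First Normal Form):
R11 = (Pilot Number, Name)
R121 = (Pilot Number, Date Flown)
R122 = (Pilot Number, Date Flown, Aircraft Number, Aircraft Type, Hours Flown)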
However, this solution raises a common problem: one of the data sets (e.g., R121) IS
PART OF another data set (e.g., R122). This indicates a flaw in the normalisation process.
The problem can be solved by excluding the single-attribute dataset in R1 and redoing
the normalisation.
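The "is part of" problem can be checked mechanically. The sketch below compares the attribute lists of R11, R121 and R122 as derived earlier and reports any dataset whose attributes all appear in another; the function name `contained_datasets` is a hypothetical helper introduced for this example.

```python
# Attribute lists of the datasets produced by the splits of R1.
datasets = {
    "R11": ("Pilot Number", "Name"),
    "R121": ("Pilot Number", "Date Flown"),
    "R122": ("Pilot Number", "Date Flown", "Aircraft Number", "Aircraft Type", "Hours Flown"),
}


def contained_datasets(relations):
    """Return (small, big) pairs where every attribute of small also appears in big."""
    pairs = []
    for small, small_attrs in relations.items():
        for big, big_attrs in relations.items():
            # A strict subset means `small` adds no attribute of its own.
            if small != big and set(small_attrs) < set(big_attrs):
                pairs.append((small, big))
    return pairs


redundant = contained_datasets(datasets)
```

Here the only pair reported is R121 inside R122, which matches the flaw described above.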
Compiled by E.Maina SCT 307 Slides 33
Relational Symbol Notation
• We use relational symbol notation, e.g., R1, R2, …, to
represent data sets (or relations), rather than relation
names, during the normalization process.
• When a relation (or dataset), say R1, is split into two or
more datasets, the new datasets are named R11, R12, …
– For each split, add one more level of subscript to the right of
R1 to indicate that they all originated from R1.
– Similarly, if R1223 is split, the resultant datasets are named
R12231, R12232, R12233, …
– This makes it easy to trace back the dataset splitting when we
find anything wrong at a later stage of the normalization process.
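The naming rule can be sketched in Python as follows. Two assumptions are made that the slides do not state: a split produces at most nine parts, so each level adds exactly one digit, and the original dataset has a two-character name such as R1. The function names `child_names` and `lineage` are hypothetical.

```python
def child_names(parent, count):
    """Name the datasets produced by splitting `parent` into `count` parts."""
    # Assumption: at most 9 parts per split, so each level is a single digit.
    if not 1 <= count <= 9:
        raise ValueError("this sketch supports 1 to 9 parts per split")
    return [f"{parent}{i}" for i in range(1, count + 1)]


def lineage(name):
    """Trace a dataset name back to the original, e.g. R122 -> R12 -> R1."""
    chain = [name]
    # Assumption: the root is a two-character name such as "R1".
    while len(name) > 2:
        name = name[:-1]  # drop the last subscript digit to get the parent
        chain.append(name)
    return chain
```

For example, `child_names("R1223", 3)` gives the three names listed above, and `lineage("R122")` walks back through R12 to R1.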
Trying a new example from scratch, create an un-normalised data set for:
1NF:
• Normalisation steps:
1. Gather the un-normalised data set (covered in week 01)
2. Remove the repeating groups and identify keys (1NF)
3. Remove all partial functional dependencies (2NF)
4. Remove all transitive dependencies (3NF)
5. Name the resultant data sets
Note: These steps MUST be performed in the order given to
ensure that the correct result is reached.
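Step 3 depends on spotting attributes that are determined by only part of a composite key. The Python sketch below tests this against sample rows. Everything in it is an assumption made for illustration: the rows, the short column names, the choice of (pilot, date, aircraft) as the key, and the helper names `determines` and `partial_dependencies`. Sample data can only rule a dependency out; a dependency that holds in the sample still has to be confirmed against the business rules.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always go with equal rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_subset, attribute) pairs where a minimal proper subset of
    the key determines a non-key attribute in the sample rows."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                # Skip supersets of a subset already reported for this attribute.
                if any(a == attr and set(s) <= set(subset) for s, a in found):
                    continue
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found


# Hypothetical flat rows; all values are made up.
rows = [
    {"pilot": 101, "date": "2024-01-05", "aircraft": "AC1", "type": "Cessna", "hours": 2.0},
    {"pilot": 101, "date": "2024-01-05", "aircraft": "AC2", "type": "Piper", "hours": 1.5},
    {"pilot": 102, "date": "2024-01-05", "aircraft": "AC1", "type": "Cessna", "hours": 3.0},
    {"pilot": 101, "date": "2024-01-06", "aircraft": "AC1", "type": "Cessna", "hours": 4.0},
]

result = partial_dependencies(rows, ("pilot", "date", "aircraft"), ("type", "hours"))
```

In this sample, aircraft type is determined by the aircraft number alone, so it is a candidate for moving into its own dataset, while hours flown needs the whole key.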