Normalization
The biggest problem to be solved in database design is data redundancy.
Why is data redundancy a problem? Because it causes:
Insert Anomaly
Update Anomaly
Delete Anomaly
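The three anomalies can be sketched in Python. This is a minimal illustration, assuming a small hypothetical table in which the course name is repeated for every semester; the rows and the course IT104 "Security" are invented for the example.

```python
# Hypothetical redundant table: the course name is repeated in every row.
rows = [
    {"CourseID": "IT101", "CourseName": "Database", "SemesterID": "201301"},
    {"CourseID": "IT101", "CourseName": "Database", "SemesterID": "201302"},
    {"CourseID": "IT103", "CourseName": "Networking", "SemesterID": "201401"},
]

# Update anomaly: renaming the course in only one row leaves two
# different names stored for the same CourseID.
rows[0]["CourseName"] = "Databases"
names_for_it101 = {r["CourseName"] for r in rows if r["CourseID"] == "IT101"}

# Delete anomaly: deleting the only 201401 row also deletes the fact
# that IT103 is called "Networking".
remaining = [r for r in rows if r["SemesterID"] != "201401"]
known_courses = {r["CourseID"] for r in remaining}

# Insert anomaly: a new course that is not yet taught in any semester
# can only be stored with a placeholder (None) for SemesterID.
new_course = {"CourseID": "IT104", "CourseName": "Security", "SemesterID": None}
```

Each of these problems disappears once the course name is stored only once, in a table of its own.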
Normalization (Cont.)
Normalization is the process of removing redundant data
from your tables to improve storage efficiency, data
integrity, and scalability.
Normalization generally involves splitting existing tables
into multiple ones, which must be re-joined or linked
each time a query is issued.
Why normalization?
The relation derived from the user view or data store will most
likely be unnormalized.
The problem usually arises when an existing system stores its data in
unstructured files, e.g. MS Excel spreadsheets.
Steps of Normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form (1NF)
The official qualifications for 1NF are:
1. Each attribute name must be unique.
2. Each attribute value must be single (atomic).
3. Each row must be unique.
4. There are no repeating groups.
Additional:
Choose a primary key.
Reminder:
A primary key is unique, not null, and unchanging. A primary
key can be either a single attribute or a combination of attributes.
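The "unique" and "not null" parts of the reminder can be checked mechanically. The sketch below assumes a hypothetical helper named is_candidate_key and invented sample rows; the "unchanging" part cannot be verified from a single snapshot of the data.

```python
def is_candidate_key(rows, key):
    """Return True if the attributes in `key` are never null and
    identify every row uniquely."""
    seen = set()
    for row in rows:
        value = tuple(row[attr] for attr in key)
        if any(v is None for v in value) or value in seen:
            return False
        seen.add(value)
    return True


# Invented sample rows for illustration only.
students = [
    {"StudentID": 1, "Group": "A"},
    {"StudentID": 2, "Group": "A"},
    {"StudentID": 3, "Group": "B"},
]

student_id_ok = is_candidate_key(students, ["StudentID"])  # unique, not null
group_ok = is_candidate_key(students, ["Group"])           # "A" repeats
```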
First Normal Form (1NF) (Cont.)
Example of a table not in 1NF:
Group    Topic          Student      Score
Group A  Intro MongoDB  Sok San      18 marks
                        Sao Ry       17 marks
Group B  Intro MySQL    Chan Tina    19 marks
                        Tith Sophea  16 marks
First Normal Form (1NF) (Cont.)
After eliminating the repeating groups and the multi-valued cells:
Group  Topic          Family Name  Given Name  Score
A      Intro MongoDB  Sok          San         18
A      Intro MongoDB  Sao          Ry          17
B      Intro MySQL    Chan         Tina        19
B      Intro MySQL    Tith         Sophea      16
Now it is in 1NF.
However, it might still violate 2NF and so on.
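The conversion above can be sketched in Python. This is only an illustration: the nested-list representation of the unnormalized table and the function name to_1nf are assumptions, not part of the slides.

```python
# The unnormalized table: each group row holds a repeating group of
# students and scores (representation assumed for this sketch).
raw = [
    ("Group A", "Intro MongoDB", ["Sok San", "Sao Ry"], ["18 marks", "17 marks"]),
    ("Group B", "Intro MySQL", ["Chan Tina", "Tith Sophea"], ["19 marks", "16 marks"]),
]


def to_1nf(raw_rows):
    """Flatten repeating groups into one row per student with atomic values."""
    flat = []
    for group, topic, students, scores in raw_rows:
        for student, score in zip(students, scores):
            family, given = student.split(" ", 1)   # split the name into two attributes
            flat.append({
                "Group": group.split()[-1],          # "Group A" -> "A"
                "Topic": topic,
                "Family Name": family,
                "Given Name": given,
                "Score": int(score.split()[0]),      # "18 marks" -> 18
            })
    return flat


rows_1nf = to_1nf(raw)
```

The group and topic are now repeated in every row, which is exactly the redundancy the later normal forms address.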
[Figures: examples of repeating groups; example data organization before and after First Normal Form; problems in 1NF]
Functional Dependencies
We say an attribute B has a functional dependency on another
attribute A if, for any two records that have the same value for A,
the values for B in these two records must also be the same.
We write this as:
A -> B
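Functional Dependencies (cont.)
EmpNum -> EmpEmail, EmpFname, EmpLname
You might see FDs depicted in three different ways. Besides the arrow
notation above, one of them is a diagram with an arrow from EmpNum to
each dependent attribute:
EmpNum --+--> EmpEmail
         +--> EmpFname
         +--> EmpLname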
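Whether a functional dependency holds in a given set of records can be tested directly from the definition. The sketch below assumes a hypothetical helper named holds and invented employee records; note that data can only disprove an FD, since an FD is a rule about all possible records, not just the ones currently stored.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in `rows`: any two records that
    agree on the lhs attributes also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        a = tuple(row[x] for x in lhs)
        b = tuple(row[x] for x in rhs)
        # setdefault stores b the first time a is seen, and returns the
        # previously stored value otherwise.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Invented sample records for illustration only.
employees = [
    {"EmpNum": 1, "EmpEmail": "ann@example.com", "EmpFname": "Ann", "EmpLname": "Lee"},
    {"EmpNum": 2, "EmpEmail": "bob@example.com", "EmpFname": "Bob", "EmpLname": "Lee"},
    {"EmpNum": 1, "EmpEmail": "ann@example.com", "EmpFname": "Ann", "EmpLname": "Lee"},
]

empnum_fd = holds(employees, ["EmpNum"], ["EmpEmail", "EmpFname", "EmpLname"])
lname_fd = holds(employees, ["EmpLname"], ["EmpFname"])  # "Lee" maps to Ann and Bob
```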
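Determinant
In a functional dependency A -> B, the attribute (or set of attributes)
A on the left-hand side is called the determinant. In
EmpNum -> EmpEmail, EmpFname, EmpLname, the determinant is EmpNum.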
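Second Normal Form (2NF)
The qualifications for 2NF are:
1. A table is already in 1NF.
2. Every non-key attribute depends on the whole primary key, not on only a part of it.
[Figures: 1NF to 2NF example; problems resolved in 2NF; problems remaining in 2NF]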
Example of a table not in 2NF:
Primary key: {CourseID, SemesterID}
Course Name depends only on CourseID, which is just a part of the
primary key, not on the whole primary key {CourseID, SemesterID}.
This is called a partial dependency.
Solution:
Move Course Name, together with a copy of CourseID, into a new table.
CourseID stays in the original table as a foreign key.
After the split:

CourseID  Course Name
IT101     Database
IT101     Database
IT102     Web Prog
IT102     Web Prog
IT103     Networking

CourseID  SemesterID  Num Student
IT101     201301      25
IT101     201302      25
IT102     201301      30
IT102     201302      35
IT103     201401      20

The repeated rows in the first table are then removed, leaving one row per course.
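The 2NF split, and the join needed to put the data back together at query time, can be sketched with Python's built-in sqlite3 module. The table names course and offering and the snake_case column names are assumptions made for this sketch; the rows are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        course_id   TEXT PRIMARY KEY,
        course_name TEXT NOT NULL
    );
    CREATE TABLE offering (
        course_id   TEXT NOT NULL REFERENCES course(course_id),
        semester_id TEXT NOT NULL,
        num_student INTEGER NOT NULL,
        PRIMARY KEY (course_id, semester_id)
    );
""")

# Each course name is now stored exactly once.
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("IT101", "Database"), ("IT102", "Web Prog"), ("IT103", "Networking"),
])
conn.executemany("INSERT INTO offering VALUES (?, ?, ?)", [
    ("IT101", "201301", 25), ("IT101", "201302", 25),
    ("IT102", "201301", 30), ("IT102", "201302", 35),
    ("IT103", "201401", 20),
])

# Re-join the two tables to recover the original, wider view.
joined = conn.execute("""
    SELECT o.course_id, c.course_name, o.semester_id, o.num_student
    FROM offering AS o
    JOIN course AS c ON c.course_id = o.course_id
    ORDER BY o.course_id, o.semester_id
""").fetchall()
```

Renaming a course is now a single-row update in course, so the update anomaly of the original table cannot occur.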
Third Normal Form (3NF)
[Figures: 2NF to 3NF example; problems resolved in 3NF]
Example of a table not in 3NF:
[Figure: the example table with its primary key marked]
Teacher Tel is a non-key attribute, and Teacher Name is also a
non-key attribute, but Teacher Tel depends on Teacher Name.
This is called a transitive dependency.
Solution:
Move Teacher Name and Teacher Tel together into a new table.
Done?
Oh no, it is still not in 1NF yet: the new table contains repeated rows.
Remove the repeating rows.

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Chan Veasna   012 412 333
Pou Sambath   077 545 221

After removing the repeating rows:

StudyID  Course Name  T.ID
1        Database     T1
2        Database     T2
3        Web Prog     T3
4        Web Prog     T3
5        Networking   T4

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Pou Sambath   077 545 221
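The 3NF split can be sketched in Python. The starting table below is an assumption: it pairs each StudyID with a teacher by row order, since the original table was shown only as a figure. The sketch also assumes that the T.ID values are assigned in order of first appearance and stored in the teacher table.

```python
# Assumed starting table: (StudyID, Course Name, Teacher Name, Teacher Tel).
study = [
    (1, "Database", "Sok Piseth", "012 123 456"),
    (2, "Database", "Sao Kanha", "0977 322 111"),
    (3, "Web Prog", "Chan Veasna", "012 412 333"),
    (4, "Web Prog", "Chan Veasna", "012 412 333"),
    (5, "Networking", "Pou Sambath", "077 545 221"),
]

# Give each distinct teacher one identifier; a dict keeps only one
# entry per teacher, which removes the repeating rows.
teacher_ids = {}
for _, _, name, tel in study:
    teacher_ids.setdefault((name, tel), f"T{len(teacher_ids) + 1}")

# New teacher table: one row per teacher.
teacher_table = [(tid, name, tel) for (name, tel), tid in teacher_ids.items()]

# Original table with the teacher columns replaced by the identifier.
study_table = [
    (study_id, course, teacher_ids[(name, tel)])
    for study_id, course, name, tel in study
]
```

The telephone number of Chan Veasna is now stored once, even though the teacher appears in two study rows.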
Boyce-Codd Normal Form (BCNF) (Cont.)
Example of a table not in BCNF:
Student  Course      Teacher
Sok      DB          John
Sao      DB          William
Chan     E-Commerce  Todd
Sok      E-Commerce  Todd
Chan     DB          William
Key: {Student, Course}
Functional dependencies:
{Student, Course} -> Teacher
Teacher -> Course
Problem: Teacher is not a superkey but determines Course.
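Solution: Split off a new table that contains Teacher and Course from
the original table, which keeps (Student, Course). Finally, connect the
new and old tables through a third table that contains Course.

Student  Course
Sok      DB
Sao      DB
Chan     E-Commerce
Sok      E-Commerce
Chan     DB

Course
DB
E-Commerce

Course      Teacher
DB          John
DB          William
E-Commerce  Todd

Note: joining these tables on Course can pair a student with a teacher
who does not teach that student (for example Sok with William). A split
into (Student, Teacher) and (Teacher, Course) avoids this.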
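The BCNF violation and the effect of a decomposition can be checked in Python. This sketch uses the rows of the example and compares two splits: the commonly used textbook split into (Student, Teacher) and (Teacher, Course), which is an alternative to the tables shown above, and a split on Course. All variable names are assumptions made for the sketch.

```python
rows = {
    ("Sok", "DB", "John"),
    ("Sao", "DB", "William"),
    ("Chan", "E-Commerce", "Todd"),
    ("Sok", "E-Commerce", "Todd"),
    ("Chan", "DB", "William"),
}

# Teacher -> Course holds if every teacher appears with exactly one course.
teacher_course = {(t, c) for _, c, t in rows}
teacher_determines_course = len({t for t, _ in teacher_course}) == len(teacher_course)

# Teacher is a superkey only if no teacher value repeats across rows.
teacher_is_superkey = len({t for _, _, t in rows}) == len(rows)

# Split on the violating dependency: (Student, Teacher) and (Teacher, Course).
student_teacher = {(s, t) for s, _, t in rows}
rejoined = {
    (s, c, t)
    for s, t in student_teacher
    for t2, c in teacher_course
    if t == t2
}

# Split on Course instead: (Student, Course) and (Course, Teacher).
student_course = {(s, c) for s, c, _ in rows}
course_teacher = {(c, t) for _, c, t in rows}
joined_on_course = {
    (s, c, t)
    for s, c in student_course
    for c2, t in course_teacher
    if c == c2
}
spurious = joined_on_course - rows  # rows that were never in the original
```

Joining on Teacher reproduces the original rows exactly, while joining on Course produces extra rows such as ("Sok", "DB", "William"), because Course does not determine Teacher.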
Fourth Normal Form (4NF)
The official qualifications for 4NF are:
1. A table is already in BCNF.
2. A table contains no non-trivial multi-valued dependencies (MVDs),
other than those whose determinant is a superkey.
Fourth Normal Form (4NF) (Cont.)
Example of a table not in 4NF:
Student  Major  Hobby
Sok      IT     Football
Sok      IT     Volleyball
Sao      IT     Football
Sao      Med    Football
Chan     IT     NULL
Puth     NULL   Football
Tith     NULL   NULL
Solution: Split the table so that each new table contains one MVD.
Finally, connect each of them to a third table that contains Student.

Student  Major
Sok      IT
Sao      IT
Sao      Med
Chan     IT
Puth     NULL
Tith     NULL

Student
Sok
Sao
Chan
Puth
Tith

Student  Hobby
Sok      Football
Sok      Volleyball
Sao      Football
Chan     NULL
Puth     Football
Tith     NULL
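That this split loses no information can be checked in Python. The sketch below uses the rows of the example, represents NULL as None, and assumes that a student's majors and hobbies are independent of each other, which is what the two MVDs express.

```python
original = {
    ("Sok", "IT", "Football"),
    ("Sok", "IT", "Volleyball"),
    ("Sao", "IT", "Football"),
    ("Sao", "Med", "Football"),
    ("Chan", "IT", None),
    ("Puth", None, "Football"),
    ("Tith", None, None),
}

# One table per multi-valued dependency, plus the table of students.
student_major = {(s, m) for s, m, _ in original}
student_hobby = {(s, h) for s, _, h in original}
students = {s for s, _, _ in original}

# Joining the two tables on Student pairs every major of a student
# with every hobby of that student.
rejoined = {
    (s, m, h)
    for s, m in student_major
    for s2, h in student_hobby
    if s == s2
}
```

Because rejoined equals original, the combined table stored nothing beyond what the two smaller tables already contain.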
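Fifth Normal Form (5NF)
The official qualifications for 5NF are:
1. A table is already in 4NF.
2. A table contains no join dependency that is not implied by its keys.
This matters when the attributes of the multi-valued dependencies are
related to each other.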
Fifth Normal Form (5NF) (Cont.)
Example of a table not in 5NF:
Seller  Company           Product
Sok     MIAF Trading      Zenya
Sao     Coca-Cola Corp    Coke
Sao     Coca-Cola Corp    Fanta
Sao     Coca-Cola Corp    Sprite
Chan    Angkor Brewery    Angkor Beer
Chan    Cambodia Brewery  Cambodia Beer
Seller
Sok
Sao
Chan

Company
MIAF Trading
Coca-Cola Corp
Angkor Brewery
Cambodia Brewery

Product
Zenya
Coke
Fanta
Sprite
Angkor Beer
Cambodia Beer

Seller  Company
Sok     MIAF Trading
Sao     Coca-Cola Corp
Chan    Angkor Brewery
Chan    Cambodia Brewery

Seller  Product
Sok     Zenya
Sao     Coke
Sao     Fanta
Sao     Sprite
Chan    Angkor Beer
Chan    Cambodia Beer

Company           Product
MIAF Trading      Zenya
Coca-Cola Corp    Coke
Coca-Cola Corp    Fanta
Coca-Cola Corp    Sprite
Angkor Brewery    Angkor Beer
Cambodia Brewery  Cambodia Beer

Each of Seller, Company, and Product has a one-to-many (1:M)
relationship to the two linking tables that contain it.
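The reason three linking tables are needed, rather than two, can be checked in Python. The sketch below uses the rows of the example; the variable names are assumptions made for the sketch.

```python
original = {
    ("Sok", "MIAF Trading", "Zenya"),
    ("Sao", "Coca-Cola Corp", "Coke"),
    ("Sao", "Coca-Cola Corp", "Fanta"),
    ("Sao", "Coca-Cola Corp", "Sprite"),
    ("Chan", "Angkor Brewery", "Angkor Beer"),
    ("Chan", "Cambodia Brewery", "Cambodia Beer"),
}

# The three pairwise linking tables.
seller_company = {(s, c) for s, c, _ in original}
seller_product = {(s, p) for s, _, p in original}
company_product = {(c, p) for _, c, p in original}

# Joining only two of them on Seller produces rows that never existed.
two_way = {
    (s, c, p)
    for s, c in seller_company
    for s2, p in seller_product
    if s == s2
}
spurious = two_way - original

# Also requiring the (Company, Product) pair to exist removes them.
three_way = {row for row in two_way if (row[1], row[2]) in company_product}
```

For this data the three-way join gives back exactly the original table, so the decomposition into three tables is lossless even though no pair of them is.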