Normalization
The biggest problem to solve in a database is data redundancy.
Why is data redundancy a problem? Because it causes:
Insert Anomaly
Update Anomaly
Delete Anomaly
An anomaly is a deviation from the expected pattern. In a
Database Management System (DBMS), an anomaly is an
inconsistency that arises in a relational table when insert,
update, or delete operations are performed on it.
Steps of Normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form (1NF)
The requirements for 1NF are:
1. Each attribute name must be unique.
2. Each attribute value must be atomic (a single value, not a list or set).
3. Each row must be unique.
4. There are no repeating groups.
Additional:
Choose a primary key.
Reminder:
A primary key is unique, never null, and does not change. A primary
key can be either a single attribute or a combination of attributes.
First Normal Form (1NF) (Cont.)
Example of a table not in 1NF:
Group    Topic          Student      Score
Group A  Intro MongoDB  Sok San      18 marks
                        Sao Ry       17 marks
Group B  Intro MySQL    Chan Tina    19 marks
                        Tith Sophea  16 marks
First Normal Form (1NF) (Cont.)
After eliminating the repeating groups and non-atomic values:
Group  Topic          Family Name  Given Name  Score
A      Intro MongoDB  Sok          San         18
A      Intro MongoDB  Sao          Ry          17
B      Intro MySQL    Chan         Tina        19
B      Intro MySQL    Tith         Sophea      16
Now it is in 1NF.
However, it might still violate 2NF and so on.
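The flattening step above can be sketched in Python. This is only an illustration: the nested `unnormalized` structure and the helper name `to_1nf` are assumptions made for the example, as is the rule that each student name is written as "Family Given".

```python
# Hypothetical in-memory version of the table that is not in 1NF:
# each group row carries a repeating group of (student, score) pairs.
unnormalized = [
    {"group": "Group A", "topic": "Intro MongoDB",
     "students": [("Sok San", "18 marks"), ("Sao Ry", "17 marks")]},
    {"group": "Group B", "topic": "Intro MySQL",
     "students": [("Chan Tina", "19 marks"), ("Tith Sophea", "16 marks")]},
]


def to_1nf(rows):
    """Flatten repeating groups so every row holds only atomic values."""
    flat = []
    for row in rows:
        group = row["group"].split()[-1]        # "Group A" -> "A"
        for name, score in row["students"]:
            family, given = name.split(" ", 1)  # assumes "Family Given" order
            marks = int(score.split()[0])       # "18 marks" -> 18
            flat.append((group, row["topic"], family, given, marks))
    return flat


for record in to_1nf(unnormalized):
    print(record)
```

Each output tuple corresponds to one row of the 1NF table, and no two tuples are identical.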
Functional Dependencies
An attribute B is functionally dependent on another attribute A if
any two records that have the same value for A must also have the
same value for B. We write this as:
A -> B
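The definition can be tested against sample data with a few lines of Python. This is a minimal sketch: the helper name `fd_holds` and the employee records are made up for illustration, and a check like this can only show that a dependency is violated by the given rows, not prove that it holds for every possible table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the rows never give two different rhs values
    for the same lhs value (that is, lhs -> rhs holds in these rows)."""
    seen = {}
    for row in rows:
        a = tuple(row[col] for col in lhs)
        b = tuple(row[col] for col in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Made-up sample records, for illustration only.
employees = [
    {"EmpNum": 1, "EmpFname": "Ann", "EmpLname": "Lee"},
    {"EmpNum": 2, "EmpFname": "Bob", "EmpLname": "Lee"},
    {"EmpNum": 3, "EmpFname": "Cara", "EmpLname": "Kim"},
]

print(fd_holds(employees, ["EmpNum"], ["EmpFname", "EmpLname"]))
print(fd_holds(employees, ["EmpLname"], ["EmpNum"]))
```

The first check passes because every EmpNum appears once. The second fails because two records share the last name "Lee" but have different EmpNum values.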
Functional Dependencies (cont.)
EmpNum -> EmpEmail, EmpFname, EmpLname
[Figure: three different ways you might see FDs depicted, each showing EmpNum determining EmpEmail, EmpFname, and EmpLname]
[Figure: a functional dependency A -> B, in which A is the determinant]
Second Normal Form (2NF)
Example of a table not in 2NF:
Primary key: {CourseID, SemesterID}
Course Name depends only on CourseID, which is just a part of the primary key,
not on the whole primary key {CourseID, SemesterID}. This is called a partial
dependency.
Solution:
Move Course Name, together with a copy of CourseID, into a new table. CourseID also stays in the original table so the two tables remain linked.
CourseID  Course Name    CourseID  SemesterID  Num Student
IT101     Database       IT101     201301      25
IT101     Database       IT101     201302      25
IT102     Web Prog       IT102     201301      30
IT102     Web Prog       IT102     201302      35
IT103     Networking     IT103     201401      20
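The split can be sketched in Python using the rows shown above. The variable names `course` and `enrollment` are assumptions chosen for this example. The sketch also rebuilds the original rows from the two new tables to show that no information was lost.

```python
# Rows of the table that is not in 2NF.
# Columns: CourseID, Course Name, SemesterID, Num Student.
rows = [
    ("IT101", "Database", "201301", 25),
    ("IT101", "Database", "201302", 25),
    ("IT102", "Web Prog", "201301", 30),
    ("IT102", "Web Prog", "201302", 35),
    ("IT103", "Networking", "201401", 20),
]

# Course Name depends only on CourseID, so it moves to its own table.
# A dict keeps one name per CourseID, which also drops the repeated rows.
course = {}
for course_id, name, _semester, _num in rows:
    course[course_id] = name

# The remaining attributes depend on the whole key (CourseID, SemesterID).
enrollment = [(cid, sem, num) for cid, _name, sem, num in rows]

# Joining the two tables on CourseID gives back the original rows.
rebuilt = [(cid, course[cid], sem, num) for cid, sem, num in enrollment]
print(rebuilt == rows)
```

After the split, a course name is stored once per course, so renaming a course touches a single entry.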
Ship -> capacity
Ship, date -> cargo
Cargo, capacity -> value
1. Suppose R(A, B, C, D, E) is a relational schema with the set of
functional dependencies FDs: A -> B, B -> E, C -> D.
Determine whether the relation R is in 2NF. If it is not, decompose it
into 2NF.
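An exercise like this can be checked mechanically. The sketch below computes attribute closures, finds the candidate keys by brute force, and lists partial dependencies. The function names are assumptions made for this example, attributes are assumed to be single letters, and the brute-force key search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


def partial_dependencies(schema, fds):
    """List (part_of_key, non_prime_attributes) pairs that violate 2NF."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    violations = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                determined = closure(part, fds) - prime
                if determined:
                    violations.append((set(part), determined))
    return violations


schema = "ABCDE"
fds = [("A", "B"), ("B", "E"), ("C", "D")]
print(candidate_keys(schema, fds))
print(partial_dependencies(schema, fds))
```

A relation is in 2NF exactly when `partial_dependencies` returns an empty list, that is, when no proper subset of a candidate key determines a non-prime attribute.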
Third Normal Form (3NF)
Example of a table not in 3NF:
Teacher Tel is a non-key attribute, and
Teacher Name is also a non-key attribute,
but Teacher Tel depends on Teacher Name.
This is called a transitive dependency.
Solution:
Move Teacher Name and Teacher Tel together
into a new table.
Done?
Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Chan Veasna   012 412 333
Pou Sambath   077 545 221
Oh no, it is still not in 1NF yet. Remove the repeated row.
Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Pou Sambath   077 545 221
Note about primary key:
- In theory, you can choose Teacher Name to be the primary key.
- But in practice, you should add Teacher ID as the primary key.
ID  Teacher Name  Teacher Tel
T1  Sok Piseth    012 123 456
T2  Sao Kanha     0977 322 111
T3  Chan Veasna   012 412 333
T4  Pou Sambath   077 545 221
The original table then refers to each teacher by T.ID:
StudyID  Course Name  T.ID
1        Database     T1
2        Database     T2
3        Web Prog     T3
4        Web Prog     T3
5        Networking   T4
Given a relation R(X, Y, Z, W, P) and the functional dependency set FD =
{X → Y, Y → P, Z → W}, determine whether R is in 3NF.
If it is not, convert it into 3NF.
Solution: First find the candidate key. X and Z never appear on the right-hand side of any FD, so every key must contain both, and the closure of XZ is {X, Y, Z, W, P}.
XZ is therefore the only candidate key, and the prime attributes are X and Z.
An FD A → B satisfies 3NF if A is a super key or B is a prime attribute.
1. FD X → Y does not satisfy the definition of 3NF: X is not a super key and Y
is not a prime attribute.
2. FD Y → P does not satisfy the definition of 3NF: Y is not a super key and P
is not a prime attribute.
3. FD Z → W does not satisfy the definition of 3NF: Z is not a super key and W is
not a prime attribute.
Convert the table R(X, Y, Z, W, P) into 3NF:
Since none of the FDs in FD = {X → Y, Y → P, Z → W} satisfies 3NF,
decompose R into one table per FD:
R1(X, Y) {using FD X → Y}
R2(Y, P) {using FD Y → P}
R3(Z, W) {using FD Z → W}
Then create one table for the candidate key XZ:
R4(X, Z) {using candidate key XZ}
All the decomposed tables R1, R2, R3, and R4 are in 2NF (there is
no partial dependency) as well as in 3NF.
Example 01:
Question: Consider a relation R(A, B, C, D) with FD = {AB → C, C → D}. Is the
relation in 3NF?
Solution:
• Candidate key (CK) = {AB}
• Prime attributes = {A, B}
• Non-prime attributes = {C, D}
First FD (AB → C): the L.H.S. is the candidate key, so this FD
satisfies 3NF.
Second FD (C → D): the L.H.S. is neither a candidate key nor a
super key, and the R.H.S. is a non-prime attribute. Both conditions fail, so this
FD violates 3NF.
Conclusion: Because not every FD of the relation satisfies 3NF, the relation is not in 3NF.
Example 02:
Question: Consider a relation R(A, B, C, D) with FD = {AB → CD, D → A}. Is the
relation in 3NF?
Solution:
• First, find the candidate keys (CK) = {AB, DB}
• Second, find the prime attributes = {A, B, D}
• Third, find the non-prime attributes = {C}
Now check for 3NF:
• First FD (AB → CD): the L.H.S. is a candidate key, so this
FD satisfies 3NF.
• Second FD (D → A): the L.H.S. is not a candidate key or
super key, but the R.H.S. is a prime attribute, so this FD satisfies 3NF.
Result: Because all FDs of the relation satisfy the conditions of 3NF, the relation is in
3NF.
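The 3NF test used in these examples (for each FD, the left side is a super key or the right side is prime) can be sketched as a small Python check. The function names are assumptions made for this example, attributes are assumed to be single letters, and the brute-force key search is only meant for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


def violates_3nf(schema, fds):
    """Return the FDs whose L.H.S. is not a super key and whose R.H.S.
    contains at least one non-prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_super_key = closure(lhs, fds) == set(schema)
        non_prime_rhs = set(rhs) - set(lhs) - prime
        if not is_super_key and non_prime_rhs:
            bad.append((lhs, rhs))
    return bad


example_01 = violates_3nf("ABCD", [("AB", "C"), ("C", "D")])
example_02 = violates_3nf("ABCD", [("AB", "CD"), ("D", "A")])
print(example_01)
print(example_02)
```

An empty list means the relation is in 3NF. Example 01 reports C → D as the violating dependency, and Example 02 reports none.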
Boyce-Codd Normal Form (BCNF), also called 3.5NF
The requirements for BCNF are:
1. The table is already in 3NF.
2. All determinants must be super keys.
Any determinant that is not a super key is moved, together with the
attributes it determines, into another table.
That is:
for every functional dependency x -> y,
x must be a super key.
Boyce-Codd Normal Form (BCNF) (Cont.)
Example of a table not in BCNF:
Student  Course      Teacher
Sok      DB          John
Sao      DB          William
Chan     E-Commerce  Todd
Sok      E-Commerce  Todd
Chan     DB          William
Key: {Student, Course}
Functional dependencies:
{Student, Course} -> Teacher
Teacher -> Course
Problem: Teacher is not a super key but determines Course.
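The BCNF rule (every determinant must be a super key) can be sketched as a closure check in Python. The function names `closure` and `bcnf_violations` are assumptions made for this example, and each functional dependency is written as a pair of attribute tuples.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)              # skip trivial FDs
        and closure(lhs, fds) != set(schema)     # determinant is not a super key
    ]


schema = {"Student", "Course", "Teacher"}
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]
print(bcnf_violations(schema, fds))
```

The closure of {Teacher} is only {Teacher, Course}, so Teacher -> Course is reported as the dependency that breaks BCNF.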
Solution: Split off a new table that contains Teacher and Course from the
original table (Student, Course). Finally, connect the new and old tables
to a third table that contains Course.
Original table (Student, Course):
Student  Course
Sok      DB
Sao      DB
Chan     E-Commerce
Sok      E-Commerce
Chan     DB
Third table (Course):
Course
DB
E-Commerce
New table (Course, Teacher):
Course      Teacher
DB          John
DB          William
E-Commerce  Todd
Which normal form?
S(A, B, C, D)
A -> B
B -> C
C -> D
D -> A
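Example:
R:
A  B  C  D
1  a  p  x
2  b  q  y
R is split into R1(A, B) and R2(C, D). They share no attribute, so they can only be recombined with the Cartesian product:
R1 × R2:
A  B  C  D
1  a  p  x
1  a  q  y
2  b  p  x
2  b  q  y
The product contains two rows that were not in R, so the decomposition is lossy. A lossless decomposition needs at least one common attribute:
Attribute(R1) ∩ Attribute(R2) ≠ ∅
A common attribute alone is not enough. Example:
R:
A  B  C
1  a  p
2  b  q
3  a  r
R1(A, B) and R2(B, C):
A  B     B  C
1  a     a  p
2  b     b  q
3  a     a  r
Joining R1 and R2 on B:
A  B  C
1  a  p
1  a  r
2  b  q
3  a  p
3  a  r
The rows (1, a, r) and (3, a, p) were not in R. The join is lossy because the common attribute B is not a key of R1 or of R2.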
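The effect can be reproduced with a short Python sketch that projects a table onto two column sets and joins the projections back together. The helper names `project`, `natural_join`, and `is_lossless` are assumptions made for this example. The check looks at one set of rows only; whether a decomposition is lossless for every valid table depends on the functional dependencies.

```python
def project(rows, columns):
    """Project rows (a list of dicts) onto columns, removing duplicates."""
    seen = []
    for row in rows:
        item = {c: row[c] for c in columns}
        if item not in seen:
            seen.append(item)
    return seen


def natural_join(left, right):
    """Join two lists of dicts on every column name they share.
    With no shared column this becomes the Cartesian product."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined


def is_lossless(rows, cols1, cols2):
    """True if joining the two projections gives back exactly rows."""
    rejoined = natural_join(project(rows, cols1), project(rows, cols2))

    def as_set(table):
        return {tuple(sorted(row.items())) for row in table}

    return as_set(rejoined) == as_set(rows)


r = [
    {"A": 1, "B": "a", "C": "p"},
    {"A": 2, "B": "b", "C": "q"},
    {"A": 3, "B": "a", "C": "r"},
]

print(is_lossless(r, ["A", "B"], ["B", "C"]))  # common attribute B is not a key
print(is_lossless(r, ["A", "B"], ["A", "C"]))  # common attribute A is a key
```

Splitting on B produces spurious rows, while splitting on A, which is unique in every row, gives back the original table.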
Advantages of Lossless Decomposition
1. Reduced Data Redundancy: Lossless decomposition helps in reducing the
data redundancy that exists in the original relation. This helps in improving the
efficiency of the database system by reducing storage requirements and
improving query performance.
2. Maintenance and Updates: Lossless decomposition makes it easier to
maintain and update the database since it allows for more granular control over
the data.
3. Improved Data Integrity: Decomposing a relation into smaller relations can
help to improve data integrity by ensuring that each relation contains only data
that is relevant to that relation. This can help to reduce data inconsistencies and
errors.
4. Improved Flexibility: Lossless decomposition can improve the flexibility of
the database system by allowing for easier modification of the schema.
Disadvantages of Lossless Decomposition
• Increased Complexity: Lossless decomposition can increase the
complexity of the database system, making it harder to understand and
manage.
• Increased Processing Overhead: The process of decomposing a relation
into smaller relations can result in increased processing overhead. This can
lead to slower query performance and reduced efficiency.
• Join Operations: Lossless decomposition may require additional join
operations to retrieve data from the decomposed relations. This can also
result in slower query performance.
• Costly: Decomposing relations can be costly, especially if the database is
large and complex. This can require additional resources, such as
hardware and personnel.