Normalization

Asst. Professor- Ms. Attiuttama Mishra

Normalization
The biggest problem to be solved in database design is
data redundancy.
Why is data redundancy a problem? Because it causes:
Insert Anomaly
Update Anomaly
Delete Anomaly

Teacher      Subject      Teacher Degree   Tel
Sok San      Database     Master's         012666777
Van Sokhen   Database     Bachelor's       017678678
Sok San      E-Commerce   Master's         012666777

Sok San's degree and phone number are stored twice, once for each subject he teaches.
An anomaly is an inconsistency in the data caused by a departure from normal form. In a Database Management System (DBMS), an anomaly is an inconsistency that occurs in a relational table when operations (insert, update, delete) are performed on it.

Insertion anomaly: a record cannot be added without also entering another record that is not logically related to it.

Deletion anomaly: deleting an item also removes other data that should have been kept.

Update anomaly: when a piece of data changes, it must be updated in every place where it is repeated; missing one place leaves the table inconsistent.
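The update and deletion anomalies can be reproduced on the Teacher table above. This is a minimal Python sketch, assuming the table is kept in memory as a list of dicts (no real DBMS involved); the new phone number `012999888` and the helper `phones_of` are made up for illustration.

```python
# The Teacher table from the slide, one dict per row (in-memory stand-in).
rows = [
    {"Teacher": "Sok San", "Subject": "Database", "Degree": "Master's", "Tel": "012666777"},
    {"Teacher": "Van Sokhen", "Subject": "Database", "Degree": "Bachelor's", "Tel": "017678678"},
    {"Teacher": "Sok San", "Subject": "E-Commerce", "Degree": "Master's", "Tel": "012666777"},
]


def phones_of(table, teacher):
    """Return the distinct phone numbers stored for one teacher."""
    return {r["Tel"] for r in table if r["Teacher"] == teacher}


# Update anomaly: the phone number is changed in only one of Sok San's two rows.
rows[0]["Tel"] = "012999888"  # hypothetical new number
print(len(phones_of(rows, "Sok San")))  # 2 -> the table now contradicts itself

# Deletion anomaly: removing Van Sokhen's only row also loses his degree and phone.
rows = [r for r in rows if r["Teacher"] != "Van Sokhen"]
print(phones_of(rows, "Van Sokhen"))  # set()
```

Storing each teacher's degree and phone once, in a separate table, removes both problems.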
Normalization (Cont.)
Normalization is the process of removing redundant data
from your tables to improve storage efficiency, data
integrity, and scalability.
Normalization generally involves splitting existing tables
into multiple ones, which must be re-joined or linked
each time a query is issued.
Why normalization?
The relation derived from a user view or data store will most
likely be unnormalized.
The problem usually happens when an existing system uses
unstructured files, e.g. MS Excel spreadsheets.
Steps of Normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

In practice, 1NF, 2NF, 3NF and BCNF are enough for most databases.
First Normal Form (1NF)
The official qualifications for 1NF are:
1. Each attribute name must be unique.
2. Each attribute value must be single (atomic).
3. Each row must be unique.
4. There are no repeating groups.
Additional:
Choose a primary key.

Reminder:
A primary key is unique, not null, and should not change. A primary
key can be either a single attribute or a combination of attributes.
First Normal Form (1NF) (Cont.)
Example of a table not in 1NF:
Group     Topic           Student       Score
Group A   Intro MongoDB   Sok San       18 marks
                          Sao Ry        17 marks
Group B   Intro MySQL     Chan Tina     19 marks
                          Tith Sophea   16 marks

It violates 1NF because:
Attribute values are not single: Student holds both family and given name, and Score mixes a number with the unit "marks".
Repeating groups exist: one Group/Topic entry holds several students.
First Normal Form (1NF) (Cont.)
After eliminating these problems:
Group   Topic           Family Name   Given Name   Score
A       Intro MongoDB   Sok           San          18
A       Intro MongoDB   Sao           Ry           17
B       Intro MySQL     Chan          Tina         19
B       Intro MySQL     Tith          Sophea       16

Now it is in 1NF.
However, it might still violate 2NF and so on.
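The 1NF conversion above can be sketched in Python. The nested input format and the helper name `to_1nf` are assumptions for illustration; splitting each name at the first space assumes the family name comes first, as in the converted table.

```python
# Unnormalized form: each group carries a repeating group of (student, score) pairs.
unnormalized = [
    {"Group": "Group A", "Topic": "Intro MongoDB",
     "Students": [("Sok San", "18 marks"), ("Sao Ry", "17 marks")]},
    {"Group": "Group B", "Topic": "Intro MySQL",
     "Students": [("Chan Tina", "19 marks"), ("Tith Sophea", "16 marks")]},
]


def to_1nf(records):
    """Flatten repeating groups into one row per student with atomic values."""
    rows = []
    for rec in records:
        group = rec["Group"].split()[-1]             # "Group A" -> "A"
        for full_name, score in rec["Students"]:
            family, given = full_name.split(" ", 1)  # assumes "Family Given" order
            marks = int(score.split()[0])            # "18 marks" -> 18
            rows.append((group, rec["Topic"], family, given, marks))
    return rows


table = to_1nf(unnormalized)
for row in table:
    print(row)
# Rule 3 of 1NF: every row must be unique.
print(len(set(table)) == len(table))  # True
```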
Functional Dependencies
We say an attribute B has a functional dependency on another
attribute A if, for any two records that have the same value for A,
the values for B in these two records must also be the same. We illustrate this as:

A -> B (read as: A determines B, or B depends on A)

employee name   project        email address
Sok San         POS Mart Sys   [email protected]
Sao Ry          Univ Mgt Sys   [email protected]
Sok San         Web Redesign   [email protected]
Chan Sokna      POS Mart Sys   [email protected]
Sao Ry          DB Design      [email protected]

employee name -> email address
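A functional dependency can be tested against sample rows with a short Python sketch. Data can only refute an FD: if two rows agree on A but differ on B, then A -> B does not hold, while rows that happen to agree do not prove the FD for all future data. The e-mail addresses below are hypothetical placeholders, and `fd_holds` is our own helper name.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs; True otherwise."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical e-mail addresses used only for illustration.
rows = [
    {"name": "Sok San", "project": "POS Mart Sys", "email": "soksan@example.com"},
    {"name": "Sao Ry", "project": "Univ Mgt Sys", "email": "saory@example.com"},
    {"name": "Sok San", "project": "Web Redesign", "email": "soksan@example.com"},
    {"name": "Chan Sokna", "project": "POS Mart Sys", "email": "chansokna@example.com"},
    {"name": "Sao Ry", "project": "DB Design", "email": "saory@example.com"},
]

print(fd_holds(rows, ["name"], ["email"]))    # True
print(fd_holds(rows, ["name"], ["project"]))  # False: Sok San has two projects
```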
Functional Dependencies (cont.)
EmpNum   EmpEmail            EmpFname   EmpLname
123      [email protected]   John       Doe
456      [email protected]   Peter      Smith
555      [email protected]   Alan       Lee
633      [email protected]   Peter      Doe
787      [email protected]   Alan       Lee

If EmpNum is the PK, then the FD
EmpNum -> EmpEmail, EmpFname, EmpLname
must hold.
Functional Dependencies (cont.)
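EmpNum -> EmpEmail, EmpFname, EmpLname
There are 3 different ways you might see FDs depicted: as the text above, as a diagram with an arrow from EmpNum to each of EmpEmail, EmpFname and EmpLname, or as arrows drawn over the column headers EmpNum, EmpEmail, EmpFname, EmpLname.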
Determinant
In the functional dependency

EmpNum -> EmpEmail

the attribute on the left-hand side is known as the
determinant.
• EmpNum is a determinant of EmpEmail
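Following determinants transitively gives the attribute closure X+: every attribute that X determines, directly or through other FDs. If X+ contains all attributes of the table, X is a superkey. The slides do not define the closure; the fixpoint loop below is the standard textbook algorithm, sketched under the assumption that each FD is a (left side, right side) pair of attribute-name tuples.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, everything it determines is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


emp_fds = [(("EmpNum",), ("EmpEmail", "EmpFname", "EmpLname"))]
print(sorted(closure({"EmpNum"}, emp_fds)))
# ['EmpEmail', 'EmpFname', 'EmpLname', 'EmpNum']  -> EmpNum is a superkey
print(closure({"EmpEmail"}, emp_fds))  # {'EmpEmail'}
```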
Second Normal Form (2NF)

The official qualifications for 2NF are:
1. The table is already in 1NF.
2. All nonkey attributes are fully dependent on the primary
key.
All partial dependencies are removed and placed in another
table.
Example of a table not in 2NF:

CourseID   SemesterID   Num Student   Course Name
IT101      201301       25            Database
IT101      201302       25            Database
IT102      201301       30            Web Prog
IT102      201302       35            Web Prog
IT103      201401       20            Networking

Primary key: {CourseID, SemesterID}

Course Name depends only on CourseID, which is part of the primary key,
not on the whole primary key {CourseID, SemesterID}. This is called a partial
dependency.
Solution:
Move CourseID and Course Name together into a new table; CourseID stays in the original table to link the two.
CourseID   Course Name
IT101      Database
IT101      Database
IT102      Web Prog
IT102      Web Prog
IT103      Networking

CourseID   SemesterID   Num Student
IT101      201301       25
IT101      201302       25
IT102      201301       30
IT102      201302       35
IT103      201401       20

Done? Oh no, the new table is still not in 1NF: it contains repeated rows.
Remove the repeated rows too.
Finally, connect the two tables through the CourseID relationship.

CourseID   Course Name
IT101      Database
IT102      Web Prog
IT103      Networking
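Splitting a table is a projection onto a subset of its columns. Because a relation is a set of rows, a projection drops duplicates, which is exactly the manual "remove the repeated rows" step above. A minimal sketch with the course table; `project` is our own helper, and the field name `NumStudent` is our spelling of the "Num Student" column.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in result:  # a relation is a set, so repeated rows are dropped
            result.append(t)
    return result


enrolment = [
    {"CourseID": "IT101", "SemesterID": "201301", "NumStudent": 25, "CourseName": "Database"},
    {"CourseID": "IT101", "SemesterID": "201302", "NumStudent": 25, "CourseName": "Database"},
    {"CourseID": "IT102", "SemesterID": "201301", "NumStudent": 30, "CourseName": "Web Prog"},
    {"CourseID": "IT102", "SemesterID": "201302", "NumStudent": 35, "CourseName": "Web Prog"},
    {"CourseID": "IT103", "SemesterID": "201401", "NumStudent": 20, "CourseName": "Networking"},
]

courses = project(enrolment, ["CourseID", "CourseName"])
offerings = project(enrolment, ["CourseID", "SemesterID", "NumStudent"])
print(courses)
# [('IT101', 'Database'), ('IT102', 'Web Prog'), ('IT103', 'Networking')]
print(len(offerings))  # 5
```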
Find Out?

Shipping (ship, capacity, date, cargo, value)

ship -> capacity
ship, date -> cargo
cargo, capacity -> value
1. Suppose R(A, B, C, D, E) is a relational schema with the set of
functional dependencies FDs: A -> B, B -> E, C -> D.
Find out whether the relation R is in 2NF. If not, decompose it
into 2NF.

2. Suppose R(A, B, C, D, E, F, G, H, I, J) is a relational schema with the set
of functional dependencies:

Find out whether the relation R is in 2NF. If not, decompose it
into 2NF.
AKTU QUESTIONS
• List all prime and non-prime attributes in relation R(A, B, C, D, E) with FD set
F = {AB → C, B → E, C → D}.
• Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of
functional dependencies F = { {A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H},
{D} → {I, J} }. What is the key for R? Decompose R into 2NF and then 3NF
relations.
• A set of FDs for the relation R{A, B, C, D, E, F} is AB → C, C → A, BC → D,
ACD → B, BE → C, EC → FA, CF → BD, D → E. Find a minimum cover for this
set of FDs.
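Each exercise above starts by finding the candidate keys. The sketch below is a brute-force approach (fine for a handful of attributes, exponential in general); the helper names and the FD format, (left side, right side) pairs of attribute names, are assumptions. It lists prime and non-prime attributes and flags partial dependencies, i.e. a proper part of a candidate key determining a non-prime attribute.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs; fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


def partial_dependencies(attrs, fds):
    """Pairs (part of a key, non-prime attributes it determines): 2NF violations."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(key, size):
                extra = closure(part, fds) - set(part) - prime
                if extra:
                    found.append((part, sorted(extra)))
    return found


# AKTU question 1: R(A, B, C, D, E) with AB -> C, B -> E, C -> D.
aktu1 = [("AB", "C"), ("B", "E"), ("C", "D")]
keys = candidate_keys("ABCDE", aktu1)
prime = set().union(*keys)
print(keys)                          # [('A', 'B')]
print(sorted(prime))                 # ['A', 'B']
print(sorted(set("ABCDE") - prime))  # ['C', 'D', 'E']

# "Find Out?" question 1: R(A, B, C, D, E) with A -> B, B -> E, C -> D.
print(partial_dependencies("ABCDE", [("A", "B"), ("B", "E"), ("C", "D")]))
# [(('A',), ['B', 'E']), (('C',), ['D'])]  -> not in 2NF

# Shipping: multi-letter attribute names, so FDs use tuples.
shipping = [(("ship",), ("capacity",)),
            (("ship", "date"), ("cargo",)),
            (("cargo", "capacity"), ("value",))]
attrs = ("ship", "capacity", "date", "cargo", "value")
print(candidate_keys(attrs, shipping))        # [('date', 'ship')]
print(partial_dependencies(attrs, shipping))  # [(('ship',), ['capacity'])]
```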
Third Normal Form (3NF)
The official qualifications for 3NF are:
1. The table is already in 2NF.
2. Nonprimary key attributes do not depend on other
nonprimary key attributes
(i.e. there are no transitive dependencies).
All transitive dependencies are removed and placed in
another table.
or
For every FD X -> Y:
X must be a superkey or Y must be a prime attribute (part
of some candidate key).
Example of a table not in 3NF:

StudyID   Course Name   Teacher Name   Teacher Tel
1         Database      Sok Piseth     012 123 456
2         Database      Sao Kanha      0977 322 111
3         Web Prog      Chan Veasna    012 412 333
4         Web Prog      Chan Veasna    012 412 333
5         Networking    Pou Sambath    077 545 221

Primary key: StudyID
Teacher Tel is a nonkey attribute, and
Teacher Name is also a nonkey attribute.
But Teacher Tel depends on Teacher Name, so StudyID -> Teacher Name -> Teacher Tel is a transitive dependency.

Solution:
Move Teacher Name and Teacher Tel together
into a new table.
Done?
Teacher Name   Teacher Tel
Sok Piseth     012 123 456
Sao Kanha      0977 322 111
Chan Veasna    012 412 333
Chan Veasna    012 412 333
Pou Sambath    077 545 221

Oh no, it is still not in 1NF: Chan Veasna's row is repeated. Remove the repeated row.

Teacher Name   Teacher Tel
Sok Piseth     012 123 456
Sao Kanha      0977 322 111
Chan Veasna    012 412 333
Pou Sambath    077 545 221

Note about primary key:
- In theory, you can choose Teacher Name to be a primary key.
- But in practice, you should add Teacher ID as the primary key.

ID   Teacher Name   Teacher Tel
T1   Sok Piseth     012 123 456
T2   Sao Kanha      0977 322 111
T3   Chan Veasna    012 412 333
T4   Pou Sambath    077 545 221

StudyID   Course Name   T.ID
1         Database      T1
2         Database      T2
3         Web Prog      T3
4         Web Prog      T3
5         Networking    T4
Given a relation R(X, Y, Z, W, P) and the functional dependency set FD =
{X → Y, Y → P, Z → W}, determine whether R is in 3NF.
If not, convert it into 3NF.

Solution: Let us construct an arrow diagram on R using the FDs to calculate the candidate key.

XZ is the candidate key (X and Z never appear on the right-hand side of an FD, and together they determine Y, P and W).
1. FD X → Y does not satisfy the definition of 3NF: X is not a superkey and Y
is not a prime attribute.
2. FD Y → P does not satisfy the definition of 3NF: Y is not a superkey and P
is not a prime attribute.
3. FD Z → W does not satisfy the definition of 3NF: Z is not a superkey and W is not a
prime attribute.
Convert the table R(X, Y, Z, W, P) into 3NF:
Since none of the FDs {X → Y, Y → P, Z → W} satisfies 3NF, let
us convert R into 3NF:
R1(X, Y) {using FD X → Y}
R2(Y, P) {using FD Y → P}
R3(Z, W) {using FD Z → W}
And create one table for the candidate key XZ:
R4(X, Z) {using candidate key XZ}
All the decomposed tables R1, R2, R3 and R4 are in 2NF (as there is
no partial dependency) as well as in 3NF.
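The conversion above follows the usual 3NF synthesis recipe: one table per FD, plus one table for the candidate key when no other table already contains it. This is a simplified sketch, assuming the FDs are already a minimal cover and the candidate key is known; it does not merge FDs with the same left side, and `synthesize_3nf` is a hypothetical helper name.

```python
def synthesize_3nf(fds, key):
    """One schema per FD, plus one for the candidate key if no schema contains it.

    Simplified: assumes fds is already a minimal cover and does not merge FDs
    with the same left side or drop schemas contained in others.
    """
    schemas = [set(lhs) | set(rhs) for lhs, rhs in fds]
    if not any(set(key) <= s for s in schemas):
        schemas.append(set(key))
    return schemas


for schema in synthesize_3nf([("X", "Y"), ("Y", "P"), ("Z", "W")], "XZ"):
    print("".join(sorted(schema)))
# XY
# PY
# WZ
# XZ
```

If some FD's table already contains the key (for example AB -> C with key AB), no extra key table is added.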
Example 01:
Question: Consider a relation R = (ABCD) with FD = {AB → C, C → D}. Check whether the
relation is in 3NF.
Solution:
• CK = {AB}
• Prime attributes = {A, B}
• Non-prime attributes = {C, D}
According to the first FD (AB → C), the L.H.S. of the FD contains the candidate key, so it is
valid for 3NF.
According to the second FD (C → D), the L.H.S. of the FD is not a candidate key or
superkey, and the R.H.S. is a non-prime attribute. As both conditions are false, this
FD is not valid for 3NF.
Conclusion: Since not all FDs of the relation are valid, the relation is not in 3NF.
Example 02:
Question: Consider a relation R = (ABCD) with FD = {AB → CD, D → A}. Check whether the
relation is in 3NF.
Solution:
• First, find the candidate keys (C.K.) = {AB, DB}
• Second, find the prime attributes = {A, B, D}
• Third, find the non-prime attributes = {C}
Now check for 3NF:
• For the first FD in the relation, AB → CD: the L.H.S. contains a candidate key, so this
FD is suitable for 3NF.
• For the second FD in the relation, D → A: the L.H.S. does not contain a candidate key or
superkey, but the R.H.S. is a prime attribute, so this FD is suitable for 3NF.
Result: As all FDs of the relation fulfill the conditions of 3NF, this relation is in
3NF.
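The 3NF test used in the worked examples (for each FD X -> Y, X is a superkey or Y is prime) can be automated. This is a brute-force Python sketch; like the examples, it checks only the FDs as listed, and the helper names are our own.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs; fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


def violations_3nf(attrs, fds):
    """FDs X -> Y where X is not a superkey and Y - X has a non-prime attribute."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attrs)
        if not is_superkey and not ((set(rhs) - set(lhs)) <= prime):
            bad.append((lhs, rhs))
    return bad


print(violations_3nf("XYZWP", [("X", "Y"), ("Y", "P"), ("Z", "W")]))
# [('X', 'Y'), ('Y', 'P'), ('Z', 'W')]
print(violations_3nf("ABCD", [("AB", "C"), ("C", "D")]))   # [('C', 'D')]
print(violations_3nf("ABCD", [("AB", "CD"), ("D", "A")]))  # []
```

An empty list means every listed FD passes, so the relation is in 3NF.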
Boyce-Codd Normal Form (BCNF) – 3.5NF
The official qualifications for BCNF are:
1. The table is already in 3NF.
2. All determinants must be superkeys.
All determinants that are not superkeys are removed and placed in
another table.
That means:
for every non-trivial FD X -> Y,
X must be a superkey.
Boyce-Codd Normal Form (BCNF) (Cont.)
Example of a table not in BCNF:
Student   Course       Teacher
Sok       DB           John
Sao       DB           William
Chan      E-Commerce   Todd
Sok       E-Commerce   Todd
Chan      DB           William
Key: {Student, Course}
Functional dependencies:
{Student, Course} -> Teacher
Teacher -> Course
Problem: Teacher is not a superkey but determines Course.
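Solution: Decompose the table. Move Teacher and Course into a new table whose key is Teacher, and keep Student together with Teacher in the original table. Joining the two tables on Teacher gives back the original rows.

Student   Teacher
Sok       John
Sao       William
Chan      Todd
Sok       Todd
Chan      William

Teacher   Course
John      DB
William   DB
Todd      E-Commerce

Both tables are in BCNF. Note that the FD {Student, Course} -> Teacher can no longer be checked inside a single table: a BCNF decomposition does not always preserve every dependency.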
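Whether a decomposition of the Student/Course/Teacher table loses information can be checked by projecting and re-joining. The sketch below (plain Python, rows as dicts, helper names our own) compares joining on Teacher, which is a key of (Teacher, Course), with joining on Course, which is a key of neither projection.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r1, r2):
    """Combine every pair of rows that agree on all shared columns."""
    common = set(r1[0]) & set(r2[0])  # assumes both tables are non-empty
    out = []
    for a in r1:
        for b in r2:
            if all(a[c] == b[c] for c in common):
                row = {**a, **b}
                if row not in out:
                    out.append(row)
    return out


enrolled = [
    {"Student": "Sok", "Course": "DB", "Teacher": "John"},
    {"Student": "Sao", "Course": "DB", "Teacher": "William"},
    {"Student": "Chan", "Course": "E-Commerce", "Teacher": "Todd"},
    {"Student": "Sok", "Course": "E-Commerce", "Teacher": "Todd"},
    {"Student": "Chan", "Course": "DB", "Teacher": "William"},
]

by_teacher = natural_join(project(enrolled, ["Student", "Teacher"]),
                          project(enrolled, ["Teacher", "Course"]))
by_course = natural_join(project(enrolled, ["Student", "Course"]),
                         project(enrolled, ["Course", "Teacher"]))
print(len(enrolled), len(by_teacher), len(by_course))  # 5 5 8
```

Joining on Teacher returns exactly the five original rows; joining on Course adds three spurious rows (for example Sok, DB, William).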
Which normal form?

S (A,B,C,D)
A-> B
B->C
C->D
D->A
Example:

Consider the following relational schemes for a library database:

Book (Title, Author, Catalog_no, Publisher, Year, Price)
Collection (Title, Author, Catalog_no)
with the following functional dependencies:

I. Title, Author -> Catalog_no
II. Catalog_no -> Title, Author, Publisher, Year
III. Publisher, Title, Year -> Price
Assume {Author, Title} is the key for both schemes. Find the
highest normal form of this relation scheme.

Define partial dependency. Find the normal form of the
following relation:
R(A, B, C, D)
AB -> CD
C -> A
D -> B
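The questions above can be checked with a small brute-force sketch that reports the highest of 1NF, 2NF, 3NF and BCNF. It assumes the relation is already in 1NF and checks only the listed FDs; the function names are hypothetical, and the candidate keys are computed rather than assumed.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs; fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(combo)
    return keys


def highest_normal_form(attrs, fds):
    """Return "1NF", "2NF", "3NF" or "BCNF" for a relation already in 1NF."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    nonprime = set(attrs) - prime

    def is_superkey(x):
        return closure(x, fds) >= set(attrs)

    # 2NF: no proper part of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(key, size):
                if (closure(part, fds) - set(part)) & nonprime:
                    return "1NF"
    # 3NF: for each FD X -> Y, X is a superkey or every attribute of Y - X is prime.
    if any(not is_superkey(lhs) and not ((set(rhs) - set(lhs)) <= prime)
           for lhs, rhs in fds):
        return "2NF"
    # BCNF: the left side of every non-trivial FD is a superkey.
    if any(not is_superkey(lhs) and not (set(rhs) <= set(lhs)) for lhs, rhs in fds):
        return "3NF"
    return "BCNF"


# Which normal form? S(A, B, C, D) with A -> B, B -> C, C -> D, D -> A
print(highest_normal_form("ABCD", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]))  # BCNF

# R(A, B, C, D) with AB -> CD, C -> A, D -> B
print(highest_normal_form("ABCD", [("AB", "CD"), ("C", "A"), ("D", "B")]))  # 3NF

# Library database: the Book scheme with FDs I-III
book = ("Title", "Author", "Catalog_no", "Publisher", "Year", "Price")
book_fds = [
    (("Title", "Author"), ("Catalog_no",)),
    (("Catalog_no",), ("Title", "Author", "Publisher", "Year")),
    (("Publisher", "Title", "Year"), ("Price",)),
]
print(highest_normal_form(book, book_fds))  # 2NF
```

For Book, FD III is the culprit: {Publisher, Title, Year} is not a superkey and Price is not prime.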
Lossless Decomposition

The original relation and the relation reconstructed by joining the decomposed
relations must contain exactly the same tuples. If the join produces extra
(spurious) tuples, the decomposition is a lossy join decomposition.

Lossless join decomposition is a decomposition of a relation R into relations
R1 and R2 such that a natural join of R1 and R2 returns exactly
the original relation R. This is effective in removing redundancy from
databases while preserving the original data.

Attribute(R1) ∪ Attribute(R2) = Attribute(R)

If R is decomposed into R1 and R2, joining R1 and R2 must give back R.
If we do not get R back after joining, the decomposition is lossy.
The following conditions must hold to check for a lossless join
decomposition using the functional dependency set:

1. Attribute(R1) ∪ Attribute(R2) = Attribute(R)

2. The intersection of the attributes of R1 and R2 must not be empty,
i.e., there must be at least one common attribute in both tables on which
the tables are joined:

Attribute(R1) ∩ Attribute(R2) ≠ Φ

3. The common attributes must be a superkey of R1 or of R2:
Attribute(R1) ∩ Attribute(R2) -> Attribute(R1), or Attribute(R1) ∩ Attribute(R2) -> Attribute(R2)
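A two-way split can be tested from the FDs alone, without sample data: the union must cover R, and the common attributes must determine all of R1 or all of R2. This is a minimal sketch for two-way decompositions only, using the FDs of Example 01 (AB -> C, C -> D) as test input; `is_lossless` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """All attributes determined by attrs; fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """FD test for splitting R into R1 and R2 (two-way decompositions only)."""
    if set(r1) | set(r2) != set(r):
        return False  # condition 1: every attribute of R must survive
    common = set(r1) & set(r2)
    if not common:
        return False  # condition 2: there must be something to join on
    determined = closure(common, fds)
    # The shared attributes must be a superkey of R1 or of R2.
    return set(r1) <= determined or set(r2) <= determined


ex01 = [("AB", "C"), ("C", "D")]
print(is_lossless("ABCD", "ABC", "CD", ex01))  # True: C -> D, so C is a key of R2
print(is_lossless("ABCD", "ABD", "CD", ex01))  # False: D determines only itself
print(is_lossless("ABCD", "AB", "CD", ex01))   # False: no common attribute
```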


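Example where condition 2 fails:

R:
A   B   C   D
1   a   p   x
2   b   q   y

R1:          R2:
A   B        C   D
1   a        p   x
2   b        q   y

R1 and R2 share no attribute, so joining them gives the Cartesian product R1 X R2:
A   B   C   D
1   a   p   x
1   a   q   y
2   b   p   x
2   b   q   y
The join has 4 tuples instead of 2, so the decomposition is lossy.

Attribute(R1) ∩ Attribute(R2) ≠ Φ, but the decomposition can still be lossy:

R:
A   B   C
1   a   p
2   b   q
3   a   r

R1:      R2:
A   B    B   C
1   a    a   p
2   b    b   q
3   a    a   r

R1 ⋈ R2:
A   B   C
1   a   p
1   a   r
2   b   q
3   a   p
3   a   r
R1 and R2 share B, but B is not a key of either table (the value a appears twice), so the join adds the spurious tuples (1, a, r) and (3, a, p).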
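Both examples above can be reproduced by projecting R and joining the pieces back. This is a minimal sketch with rows as Python dicts; `project` and `natural_join` are our own helpers, and a join over tables with no common column degenerates into a Cartesian product.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r1, r2):
    """Combine every pair of rows that agree on all shared columns."""
    common = set(r1[0]) & set(r2[0])  # assumes both tables are non-empty
    out = []
    for a in r1:
        for b in r2:
            if all(a[c] == b[c] for c in common):  # no shared columns: always True
                row = {**a, **b}
                if row not in out:
                    out.append(row)
    return out


# Example 1: R1(A, B) and R2(C, D) have no common attribute.
r = [dict(zip("ABCD", t)) for t in [(1, "a", "p", "x"), (2, "b", "q", "y")]]
j = natural_join(project(r, "AB"), project(r, "CD"))
print(len(r), len(j))  # 2 4

# Example 2: R1(A, B) and R2(B, C) share B, but B is not a key of either.
s = [dict(zip("ABC", t)) for t in [(1, "a", "p"), (2, "b", "q"), (3, "a", "r")]]
k = natural_join(project(s, "AB"), project(s, "BC"))
print(len(s), len(k))  # 3 5
```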
Advantages of Lossless Decomposition
1. Reduced Data Redundancy: Lossless decomposition helps in reducing the
data redundancy that exists in the original relation. This helps in improving the
efficiency of the database system by reducing storage requirements and
improving query performance.
2. Maintenance and Updates: Lossless decomposition makes it easier to
maintain and update the database since it allows for more granular control over
the data.
3. Improved Data Integrity: Decomposing a relation into smaller relations can
help to improve data integrity by ensuring that each relation contains only data
that is relevant to that relation. This can help to reduce data inconsistencies and
errors.
4. Improved Flexibility: Lossless decomposition can improve the flexibility of
the database system by allowing for easier modification of the schema.
Disadvantages of Lossless Decomposition
• Increased Complexity: Lossless decomposition can increase the
complexity of the database system, making it harder to understand and
manage.
• Increased Processing Overhead: The process of decomposing a relation
into smaller relations can result in increased processing overhead. This can
lead to slower query performance and reduced efficiency.
• Join Operations: Lossless decomposition may require additional join
operations to retrieve data from the decomposed relations. This can also
result in slower query performance.
• Costly: Decomposing relations can be costly, especially if the database is
large and complex. This can require additional resources, such as
hardware and personnel.
