Unit3 Normalization

Normalization is a process of organizing data in a database to reduce redundancy and eliminate anomalies such as insertion, update, and deletion issues. It involves dividing large tables into smaller ones and defining relationships between them, progressing through various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF) to ensure data integrity and efficiency. Each normal form addresses specific types of dependencies and anomalies, ultimately leading to a more structured and reliable database.

Uploaded by vpabdulazeez8
Copyright © All Rights Reserved

Normalization

Normalization is a systematic approach to decomposing tables to eliminate data redundancy and undesirable characteristics such as insertion, update and deletion anomalies. Normalization usually involves dividing large tables into smaller tables and defining relationships between them.

Suppose a manufacturing company stores employee details in a table named employee that has four attributes: empid for the employee's id, empname for the employee's name, empstreet for the employee's address and empdept for the department in which the employee works. At some point in time the table looks like this:

employee

empid  empname  empstreet   empdept
100    A        Kochi       D01
101    B        Calicut     D03
101    B        Calicut     D02
102    C        Kannur      D05
103    D        Trivandrum  D04
103    D        Trivandrum  D06

Update anomaly: In the above table there are two rows for employee B because he belongs to two departments of the company. If we want to update B's address, we have to update it in both rows, or the data will become inconsistent. If the new address is written to one row but not to the other, then according to the database B has two different addresses, which is not correct.

Insert anomaly: Suppose a new employee joins the company who is under training and not yet assigned to any department. If the empdept field does not allow nulls, we cannot insert this employee into the table.

Delete anomaly: Suppose the company closes department D05. Deleting the rows whose empdept is D05 would also delete all information about employee C, since he is assigned only to this department.

To overcome these anomalies we need to normalize the data.
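The update and delete anomalies can be reproduced with a short sketch that loads the employee table into an in-memory SQLite database through Python's standard sqlite3 module. The new address "Kozhikode" is a hypothetical value chosen only for illustration.

```python
import sqlite3

# In-memory database holding the un-normalized employee table from above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid INTEGER, empname TEXT, empstreet TEXT, empdept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (100, "A", "Kochi", "D01"),
        (101, "B", "Calicut", "D03"),
        (101, "B", "Calicut", "D02"),
        (102, "C", "Kannur", "D05"),
        (103, "D", "Trivandrum", "D04"),
        (103, "D", "Trivandrum", "D06"),
    ],
)

# Update anomaly: change B's address in only one of B's two rows.
conn.execute(
    "UPDATE employee SET empstreet = 'Kozhikode' WHERE empid = 101 AND empdept = 'D03'"
)
addresses = conn.execute(
    "SELECT DISTINCT empstreet FROM employee WHERE empid = 101 ORDER BY empstreet"
).fetchall()
print(addresses)  # B now appears to have two different addresses

# Delete anomaly: closing department D05 removes every trace of employee C.
conn.execute("DELETE FROM employee WHERE empdept = 'D05'")
remaining = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE empid = 102"
).fetchone()[0]
print(remaining)  # no rows left for employee C
```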

Normalization video: https://www.youtube.com/watch?v=ZJ4Mc75SajY&t=222s

Un-Normalized Form (UNF)


If a table contains non-atomic values in any of its rows, it is said to be in UNF. An atomic value is a value that cannot be divided further. A non-atomic value, as the name suggests, can be further decomposed and simplified.

The table below contains non-atomic values: the value in the COLOR column of the first row can be divided into "red" and "blue". Hence it is not in 1NF; this table is in UNF (Un-Normalized Form).

Table_Product

P_ID  COLOR         PRICE
100   red,blue      30
200   green         18
300   white         25
400   green,orange  40
500   red           21

First Normal Form (1NF)


A table is in first normal form if it satisfies the following conditions:

1) It contains only atomic values.

2) There are no repeating columns. (Repeating columns means that a table contains two or more columns that hold the same kind of data, such as COLOR1 and COLOR2.)

3) Each record is unique, i.e., there are no duplicated rows in the table.

For example, consider Table_Product:

P_ID  COLOR         PRICE
100   red,blue      30
200   green         18
300   white         25
400   green,orange  40
500   red           21

To bring this table to first normal form, we split it into two tables:

Table_Product_Price

P_ID  PRICE
100   30
200   18
300   25
400   40
500   21

Table_Product_Color

P_ID  COLOR
100   red
100   blue
200   green
300   white
400   green
400   orange
500   red
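A minimal sketch of this UNF to 1NF split in Python, assuming the COLOR values are stored as comma-separated strings exactly as in Table_Product. The function name to_1nf is hypothetical.

```python
# Un-normalized rows: the COLOR column holds comma-separated (non-atomic) values.
unf_rows = [
    (100, "red,blue", 30),
    (200, "green", 18),
    (300, "white", 25),
    (400, "green,orange", 40),
    (500, "red", 21),
]


def to_1nf(rows):
    """Split UNF rows into (P_ID, PRICE) and (P_ID, COLOR) tables with atomic values."""
    product_price = []  # one row per product
    product_color = []  # one row per product/colour pair
    for p_id, colors, price in rows:
        product_price.append((p_id, price))
        for color in colors.split(","):
            product_color.append((p_id, color.strip()))
    return product_price, product_color


price_table, color_table = to_1nf(unf_rows)
print(price_table)
print(color_table)
```

Each colour now sits in its own row, so queries such as "which products come in green?" no longer need to parse strings.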

1NF video: https://www.youtube.com/watch?v=MeOONwKUWLI&t=229s

Second Normal Form (2NF)


A table is in second normal form if it satisfies the following conditions:

1) It is in first normal form.

2) All non-key attributes are fully functionally dependent on the primary key, i.e., there are no partial dependencies on the primary key.

TABLE_Purchase_details

Cust_ID  Store_ID  Purchase_location
1        1         Delhi
1        3         Chennai
2        1         Delhi
3        2         Kolkata
4        3         Chennai

This table has a composite primary key [Cust_ID, Store_ID]. The non-key attribute [Purchase_location] depends only on [Store_ID], which is only part of the primary key, i.e., it is a partial dependency. Second Normal Form does not allow any column to be partially dependent on the primary key.

Therefore, this table does not satisfy second normal form.

To bring this table to second normal form, we break the table into two tables, and now we have the
following:

Table_purchase

Cust_ID  Store_ID
1        1
1        3
2        1
3        2
4        3

Table_store

Store_ID  Purchase_location
1         Delhi
2         Kolkata
3         Chennai

In Table_store, the column [Purchase_location] is fully dependent on the primary key of that table, which is [Store_ID].
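Functional dependencies are rules about the data, so sample rows can only refute a dependency, never prove it. With that caveat, the sketch below checks that the rows of TABLE_Purchase_details do not contradict Store_ID → Purchase_location, shows that Cust_ID alone does not determine the location, and then builds the two 2NF tables by projection. The helper fd_holds is a hypothetical name.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


purchase_details = [
    {"Cust_ID": 1, "Store_ID": 1, "Purchase_location": "Delhi"},
    {"Cust_ID": 1, "Store_ID": 3, "Purchase_location": "Chennai"},
    {"Cust_ID": 2, "Store_ID": 1, "Purchase_location": "Delhi"},
    {"Cust_ID": 3, "Store_ID": 2, "Purchase_location": "Kolkata"},
    {"Cust_ID": 4, "Store_ID": 3, "Purchase_location": "Chennai"},
]

# Partial dependency: one part of the key (Store_ID) determines Purchase_location.
print(fd_holds(purchase_details, ["Store_ID"], ["Purchase_location"]))  # True
# The other part of the key does not: customer 1 bought in two cities.
print(fd_holds(purchase_details, ["Cust_ID"], ["Purchase_location"]))  # False

# 2NF decomposition: project onto (Cust_ID, Store_ID) and (Store_ID, Purchase_location).
table_purchase = sorted({(r["Cust_ID"], r["Store_ID"]) for r in purchase_details})
table_store = sorted({(r["Store_ID"], r["Purchase_location"]) for r in purchase_details})
print(table_purchase)
print(table_store)
```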

Second Normal Form video: https://www.youtube.com/watch?v=f5tHmuZn_Tg

Third Normal Form (3NF)


A table is in third normal form if it satisfies the following conditions:
1) It is in second normal form.
2) There is no transitive functional dependency.

What is a transitive functional dependency?


A B [B depends on A]
And
B C [C depends on B]
Then we may derive
A C[C depends on A

Such derived dependencies always hold; this is the transitivity rule. For example, if we have
Roll → Marks
and
Marks → Grade
then we may safely derive
Roll → Grade.
This third dependency was not originally specified, but we have derived it.

When a non-key attribute depends on the key only through another non-key attribute in this way, the derived dependency is called a transitive dependency. It is easiest to spot when the derived dependency makes little sense on its own. For example, we are given
Roll → City
and
City → STDCode
We can derive Roll → STDCode, but this is a transitive dependency: obviously the STD code of a city does not depend on the roll number issued by a school or college; it depends on Roll only through City. In such a case the relation should be broken into two, each containing one of these two dependencies:
Roll → City
and
City → STDCode

For example, in the following table, Street, City and State are all determined by Zipcode.

CustID  Name  DOB        Street  City    State   Zipcode
A100    A     12-3-1990  gudal   des     lova    78321
A101    B     23-5-1991  dal     nevada  rino    89251
A102    C     12-9-1991  port    gindal  hazard  57152
A103    D     25-4-1992  kallar  tulsa   origon  73624

CustID determines Zipcode, and Zipcode determines Street, City and State, so the address columns depend on CustID only through Zipcode. This is a transitive dependency. To bring this table to third normal form, we break it into two tables:

CustID  Name  DOB        Zipcode
A100    A     12-3-1990  78321
A101    B     23-5-1991  89251
A102    C     12-9-1991  57152
A103    D     25-4-1992  73624

Zipcode  Street  City    State
78321    gudal   des     lova
89251    dal     nevada  rino
57152    port    gindal  hazard
73624    kallar  tulsa   origon
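One way to convince yourself that this 3NF split loses nothing is to join the two tables back on Zipcode. The plain-Python sketch below, with data copied from the customer table, builds both tables, refuses to build the zip table if one Zipcode would map to two different addresses (the split relies on Zipcode → Street, City, State), and checks that the rejoined rows match the originals.

```python
customers = [
    ("A100", "A", "12-3-1990", "gudal", "des", "lova", "78321"),
    ("A101", "B", "23-5-1991", "dal", "nevada", "rino", "89251"),
    ("A102", "C", "12-9-1991", "port", "gindal", "hazard", "57152"),
    ("A103", "D", "25-4-1992", "kallar", "tulsa", "origon", "73624"),
]

# Table 1: CustID, Name, DOB, Zipcode
customer_table = [
    (cid, name, dob, zipcode) for cid, name, dob, _, _, _, zipcode in customers
]

# Table 2: Zipcode -> (Street, City, State); each zip code's address stored once.
zip_table = {}
for _cid, _name, _dob, street, city, state, zipcode in customers:
    address = (street, city, state)
    if zip_table.setdefault(zipcode, address) != address:
        raise ValueError(f"Zipcode {zipcode} maps to two different addresses")

# Natural join on Zipcode rebuilds the original rows.
rejoined = [
    (cid, name, dob, *zip_table[zipcode], zipcode)
    for cid, name, dob, zipcode in customer_table
]
print(rejoined == customers)  # True
```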

The advantages of removing transitive dependencies are mainly twofold. First, the amount of data duplication is reduced, so the database becomes smaller. Second, data integrity improves: each zip code's address is stored only once, so it cannot become inconsistent across rows.

Third Normal Form video: https://www.youtube.com/watch?v=KdxSv8kwvM0&t=9s

Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is in 3NF and the left-hand side of every non-trivial functional dependency is a candidate key (strictly, a superkey). In practice, most relations in 3NF are also in BCNF. Boyce-Codd Normal Form is a stricter version of Third Normal Form.

A relation in 3NF may fail to be in BCNF when the following conditions are all true:

1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys overlap, i.e., they have some attributes in common.

Professor Code  Department        Head of Dept.  Percent Time
P1              Physics           Gigin          50
P1              Computer Science  Pandit         50
P2              Chemistry         Rao            25
P2              Physics           Gigin          75
P3              Computer Science  Pandit         100

Consider, as an example, the above relation. It is assumed that:

1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
The relation diagram for the above relation is given as the following:

The given relation is in 3NF. It is not in BCNF, however, because Department → Head of Dept. holds while Department alone is not a candidate key. As a result, the names of departments and their heads are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of Department of Chemistry.

The relation is normalized by creating a new relation for Department and Head of Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown below.

Professor Code  Department        Percent Time
P1              Physics           50
P1              Computer Science  50
P2              Chemistry         25
P2              Physics           75
P3              Computer Science  100

Department        Head of Dept.
Physics           Gigin
Computer Science  Pandit
Chemistry         Rao
See the dependency diagrams for these new relations.
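Assuming the dependencies implied by the stated assumptions, namely (Professor Code, Department) → Percent Time and Department → Head of Dept., the BCNF condition can be checked mechanically: a non-trivial dependency violates BCNF when the closure of its left-hand side does not cover every attribute of the relation. In the sketch below, closure and bcnf_violations are hypothetical helper names, and only the listed dependencies are tested, so the dependencies that hold on each decomposed relation are passed in explicitly.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey of attrs."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attrs)
    ]


PCODE, DEPT, HEAD, PCT = "Professor Code", "Department", "Head of Dept.", "Percent Time"

# Assumed dependencies: assumption 3 gives one head per department.
fds = [
    (frozenset({PCODE, DEPT}), frozenset({PCT})),
    (frozenset({DEPT}), frozenset({HEAD})),
]

# Original relation: Department -> Head of Dept. is reported as a violation.
print(bcnf_violations({PCODE, DEPT, HEAD, PCT}, fds))
# Decomposed relations, each checked with the FD that holds on it.
print(bcnf_violations({PCODE, DEPT, PCT}, fds[:1]))  # []
print(bcnf_violations({DEPT, HEAD}, fds[1:]))  # []
```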

BCNF Video: https://www.youtube.com/watch?v=Ac_D3jTWd6g

Fourth Normal Form (4NF)

A table is said to be in 4NF if it is in BCNF and contains no non-trivial multi-valued dependency.

A table may have a multi-valued dependency if the following conditions are true:
1) For a dependency A →→ B, a single value of A is associated with more than one value of B.
2) The table has at least 3 columns.
3) For a table with columns A, B and C, if there is a multi-valued dependency between A and B, then B and C are independent of each other.

Reg_id  Subject  Hobby
101     C        Reading
101     JAVA     Singing
102     C#       Cricket
102     PHP      football

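In the table above, the student with Reg_id 101 has opted for two subjects, C and JAVA, and has two hobbies, Reading and Singing. Because subjects and hobbies are independent of each other, every subject of a student must be stored with every hobby of that student, so the rows for student 101 become: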

Reg_id  Subject  Hobby
101     C        Reading
101     JAVA     Singing
101     C        Singing
101     JAVA     Reading

In the table above, there is no relationship between the columns Subject and Hobby; they are independent of each other. So there is a multi-valued dependency (Reg_id →→ Subject, and likewise Reg_id →→ Hobby), which leads to unnecessary repetition of data. To eliminate this dependency, divide the table into two as below:

Reg_id  Subject
101     C
101     JAVA
102     C#
102     PHP

Reg_id  Hobby
101     Reading
101     Singing
102     Cricket
102     football

Now this relation satisfies the fourth normal form.
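A sketch of how the multi-valued dependency Reg_id →→ Subject could be tested on rows of (Reg_id, Subject, Hobby): it holds when, for every Reg_id, the stored (Subject, Hobby) pairs are exactly every subject combined with every hobby. The function name mvd_holds is hypothetical, and it is written for this three-column layout only.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows):
    """Rows are (Reg_id, Subject, Hobby). True if Reg_id ->> Subject holds,
    i.e. for every Reg_id each subject is paired with each hobby."""
    groups = defaultdict(set)
    for reg_id, subject, hobby in rows:
        groups[reg_id].add((subject, hobby))
    for pairs in groups.values():
        subjects = {s for s, _ in pairs}
        hobbies = {h for _, h in pairs}
        if pairs != set(product(subjects, hobbies)):
            return False
    return True


student_101 = [
    (101, "C", "Reading"),
    (101, "JAVA", "Singing"),
    (101, "C", "Singing"),
    (101, "JAVA", "Reading"),
]
print(mvd_holds(student_101))  # True: all four combinations are present

# 4NF decomposition: store the two independent facts separately.
subject_table = sorted({(r, s) for r, s, _ in student_101})
hobby_table = sorted({(r, h) for r, _, h in student_101})
print(subject_table)
print(hobby_table)
```

With only the first two rows, the check returns False, which is why the un-normalized table is forced to store all four combinations.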


4NF Video: https://www.youtube.com/watch?v=CX3DF03VyrE

Fifth Normal Form (5NF)

A table is said to be in 5NF if it is in 4NF and contains no join dependency other than those implied by its candidate keys. The process of converting a table into 5NF is as follows:
1) Identify the join dependency.
2) Remove it by breaking the table into smaller tables (projections) that join back to exactly the original rows, removing the remaining data redundancy.

Company  Product  Suppliers
Godrej   Soap     Vinu
Godrej   Shampoo  Anu
Godrej   Shampoo  Vinu
H.Lever  Soap     Vinu
H.Lever  Shampoo  Anu
H.Lever  Soap     Manu
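The 5NF decomposition of the above table is as follows. (Such a split is valid only if joining the three tables back reproduces exactly the original rows.)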

Company  Product
Godrej   Soap
Godrej   Shampoo
H.Lever  Soap
H.Lever  Shampoo

Product  Suppliers
Soap     Vinu
Shampoo  Anu
Shampoo  Vinu
Soap     Manu

Company  Suppliers
Godrej   Vinu
Godrej   Anu
H.Lever  Vinu
H.Lever  Anu
H.Lever  Manu
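A join dependency can be tested directly: project the table onto (Company, Product), (Product, Suppliers) and (Company, Suppliers), join the three projections, and compare the result with the original rows. The Python sketch below does this with sets. Run on the rows as printed, it finds one extra row, (H.Lever, Shampoo, Vinu), that is not in the original table, so this three-way split is lossless only if that combination is also valid (for example, under an assumed business rule that a supplier who supplies a product, and who also supplies a company making that product, supplies that product to that company).

```python
original = {
    ("Godrej", "Soap", "Vinu"),
    ("Godrej", "Shampoo", "Anu"),
    ("Godrej", "Shampoo", "Vinu"),
    ("H.Lever", "Soap", "Vinu"),
    ("H.Lever", "Shampoo", "Anu"),
    ("H.Lever", "Soap", "Manu"),
}

# The three projections shown in the 5NF tables.
company_product = {(c, p) for c, p, _ in original}
product_supplier = {(p, s) for _, p, s in original}
company_supplier = {(c, s) for c, _, s in original}

# Natural join of the three projections.
rejoined = {
    (c, p, s)
    for c, p in company_product
    for p2, s in product_supplier
    if p == p2 and (c, s) in company_supplier
}

# A lossless decomposition would give rejoined == original.
spurious = rejoined - original
print(sorted(spurious))  # [('H.Lever', 'Shampoo', 'Vinu')]
print(rejoined == original)  # False
```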

5NF video: https://www.youtube.com/watch?v=yamHNlCg0Xs


For more study materials
Visit www.bcamcas.in
