NORMALIZATION (Autosaved)

Normalization is the process of organizing data in a database to reduce redundancy and ensure data is stored logically. It involves removing anomalies like insertion, updation, and deletion anomalies that occur due to poor planning and redundant data. There are several normal forms like 1NF, 2NF, 3NF and BCNF that organize data to reduce anomalies. Functional dependencies define relationships where one set of attributes determines another. Partial dependencies violate 2NF by having non-key attributes depend on only part of a candidate key.

Uploaded by

ishasidana786

NORMALIZATION

• Normalization is the process of organizing the data and the attributes of a database.
• It is performed to reduce data redundancy in a database and to ensure that data is stored
logically.
• Data redundancy in a DBMS means having the same data stored in multiple places.
• It is necessary to remove data redundancy because it causes anomalies in a database, which
make it very hard for a database administrator to maintain.
Why Do We Need Normalization?

As discussed above, normalization is used to reduce data redundancy. A
database anomaly is a flaw in the database that occurs because of poor
planning and redundancy. Normalization provides a method to remove the
following anomalies from the database and bring it to a more consistent
state:
• Insertion anomalies: These occur when we are not able to insert data into a
database because some attributes are missing at the time of insertion.
• Update (updation) anomalies: These occur when the same data item is stored
in several rows that are not linked to each other, so changing one copy
leaves the other copies out of date.
• Deletion anomalies: These occur when deleting one piece of data also deletes
other, still necessary information from the database.
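The update and deletion anomalies can be reproduced on a tiny in-memory table. The sketch below is only an illustration: the rows, the column names and the helper names_for are hypothetical and stand in for a denormalized employee/project table.

```python
# Hypothetical denormalized rows: the employee name is repeated for every project.
rows = [
    {"employee_code": 101, "employee_name": "John", "project_id": "P03"},
    {"employee_code": 101, "employee_name": "John", "project_id": "P01"},
    {"employee_code": 102, "employee_name": "Ryan", "project_id": "P04"},
]

def names_for(code, table):
    """Return the set of distinct names stored for one employee code."""
    return {r["employee_name"] for r in table if r["employee_code"] == code}

# Update anomaly: only one of the two copies of the name is changed,
# so the table now stores two different names for employee 101.
updated = [dict(r) for r in rows]
updated[0]["employee_name"] = "Jon"
inconsistent_names = names_for(101, updated)

# Deletion anomaly: removing project P04 also removes the only row
# that recorded employee 102.
after_delete = [r for r in rows if r["project_id"] != "P04"]
ryan_still_known = any(r["employee_code"] == 102 for r in after_delete)
```

Both problems come from storing the same fact (the employee name) in more than one row.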
Normal Forms

• Six normal forms are commonly listed for relational databases (1NF, 2NF,
3NF, BCNF, 4NF and 5NF). The four covered here are:
• 1NF: A relation is in 1NF if all its attributes have an atomic value.
• 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are
fully functionally dependent on every candidate key.
• 3NF: A relation is in 3NF if it is in 2NF and no non-key attribute is
transitively dependent on a candidate key.
• BCNF: A relation is in BCNF if it is in 3NF and, for every non-trivial
functional dependency, the LHS is a super key.
Functional dependency
• A functional dependency is a relationship between two sets of attributes
of a relational table in which one set of attributes determines the value
of the other set. It is denoted by X -> Y, where X is called the
determinant and Y is called the dependent.
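A functional dependency X -> Y can be tested against sample rows by checking that no two rows agree on X but differ on Y. The following is a minimal sketch; fd_holds and the sample rows are hypothetical names and data. Sample rows can only disprove a dependency: whether it really holds is a property of the schema, not of the rows that happen to be present.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by the given rows (list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first time we see x we remember its y; later rows must agree.
        if seen.setdefault(x, y) != y:
            return False
    return True

rows = [
    {"employee_code": 101, "employee_name": "John", "project_id": "P03"},
    {"employee_code": 101, "employee_name": "John", "project_id": "P01"},
    {"employee_code": 102, "employee_name": "Ryan", "project_id": "P04"},
]

code_determines_name = fd_holds(rows, ["employee_code"], ["employee_name"])
code_determines_project = fd_holds(rows, ["employee_code"], ["project_id"])
```

Here employee_code -> employee_name is consistent with the rows, while employee_code -> project_id is refuted by the two rows for employee 101.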
First Normal Form (1NF)

• A relation is in 1NF if every attribute is a single-valued attribute, i.e.,
it does not contain any multi-valued or composite attribute and every
attribute is atomic. A composite or multi-valued attribute violates 1NF.
To solve this, we can create a new row for each of the values of the
multi-valued attribute to convert the table into 1NF.
Example
Let’s take an example of a relational table <EmployeeDetail> that contains the details of the employees of the company:

<EmployeeDetail>

Employee Code    Employee Name    Employee Phone Number
101              John             98765623, 998234123
101              John             89023467
102              Ryan             76213908
103              Stephanie        98132452

Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.
To convert this table into 1NF, we make new rows with each Employee Phone Number as a new row as shown below:

<EmployeeDetail>

Employee Code    Employee Name    Employee Phone Number
101              John             998234123
101              John             98765623
101              John             89023467
102              Ryan             76213908
103              Stephanie        98132452
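The conversion to 1NF shown above amounts to emitting one row per value of the multi-valued attribute. A minimal sketch, assuming the phone numbers are held as a Python list per row (to_1nf and the short column names are hypothetical):

```python
def to_1nf(rows, multi_valued):
    """Return new rows in which the multi-valued column holds a single value per row."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)          # copy the other columns unchanged
            new_row[multi_valued] = value
            flat.append(new_row)
    return flat

# Employee data from the example; "phone" is the multi-valued attribute.
employee_detail = [
    {"code": 101, "name": "John", "phone": ["98765623", "998234123"]},
    {"code": 101, "name": "John", "phone": ["89023467"]},
    {"code": 102, "name": "Ryan", "phone": ["76213908"]},
    {"code": 103, "name": "Stephanie", "phone": ["98132452"]},
]

flat_rows = to_1nf(employee_detail, "phone")
```

The four input rows become five rows, each with exactly one phone number.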
Second Normal Form (2NF)
• The normalization of 1NF relations to 2NF involves the elimination of
partial dependencies. A partial dependency in DBMS exists when a
non-prime attribute, i.e., an attribute that is not part of any candidate
key, depends on only a part of a candidate key instead of the whole key.
• For a relational table to be in second normal form, it must satisfy the
following rules:
  • The table must be in first normal form.
  • It must not contain any partial dependency, i.e., all non-prime attributes
  are fully functionally dependent on the candidate key.
• If a partial dependency exists, we can divide the table to remove the
partially dependent attributes and move them to another table where
they fit.
Example
• Let us take the following <EmployeeProjectDetail> table as an example
to understand what a partial dependency is and how to normalize
the table to the second normal form:

<EmployeeProjectDetail>

Employee Code    Project ID    Employee Name    Project Name
101              P03           John             Project103
101              P01           John             Project101
102              P04           Ryan             Project104
103              P02           Stephanie        Project102
• In the above table, the prime attributes are Employee Code and
Project ID, which together form the candidate key. We have partial
dependencies in this table because Employee Name can be determined by
Employee Code alone and Project Name can be determined by Project ID
alone. Thus, the above relational table violates the rule of 2NF.

• Prime attributes: attributes that are part of some candidate key, i.e.,
they help to uniquely identify any record of the table.
• Non-prime attributes: attributes that are not part of any candidate
key. Their values may repeat in any number of rows.
Partial dependencies
• To understand partial dependency, let us first go over some basic
terminology with the help of an example.
• Consider a relation (table) with four attributes P, Q, R, S and the following
dependencies:
P,Q→S
Q→R
• Using P and Q, we can derive S, and using only Q, we can derive R. Hence, if
we use P and Q together, we can derive all the attributes of the table, i.e.,
P, Q, R, S (P→P and Q→Q hold trivially).
• We can write this as (PQ)+={P,Q,R,S}; in simple words, the closure
of PQ gives us all the attributes of the relation. Minimal sets like PQ that
are capable of deriving all the attributes of a relation (table) are called
candidate keys. There can be more than one candidate key in a table.
• If an attribute is part of any candidate key of the relation, it is called
a prime attribute; otherwise, it is a non-prime attribute. In the
example above, P and Q are prime attributes,
and R and S are non-prime attributes.
• We now know the basic definitions required to understand the concept of
partial dependency. In the above example, S is dependent on all the
prime attributes, i.e., P and Q. If either P or Q is missing, we
cannot derive S. The case of R is different.
• Even if P, a prime attribute, is missing, we can still derive R using only Q.
Hence, instead of depending on the whole candidate key, R depends
on Q, which is only part of a candidate key. This is the concept of
partial dependency.
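The closure (PQ)+ and the search for candidate keys described above can be sketched in a few lines. This is a brute-force illustration that tries every attribute subset, so it only suits small relations; the function names closure and candidate_keys are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

# The relation from the text: P,Q -> S and Q -> R.
attributes = {"P", "Q", "R", "S"}
fds = [({"P", "Q"}, {"S"}), ({"Q"}, {"R"})]

keys = candidate_keys(attributes, fds)
prime = set().union(*keys)
non_prime = attributes - prime
```

For this relation the only candidate key is {P, Q}, so P and Q are prime and R and S are non-prime.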
Partial Functional Dependency

• A functional dependency X→Y, where X and Y are attribute sets of a
relation, is a partial dependency if some attribute A∈X can be removed
and the dependency still holds.
• As an example, consider a college. A student is enrolled in a course,
and every student in the college has a unique Roll number.
Course       Roll No.    Name
BTech EE     2015EE42    Saloni
BTech CS     2014CS12    Anshuman
BSc Maths    2017MM16    Saloni
BTech CS     2014CS10    Abhimanyu
MTech EE     2018EE40    Suchandra
MTech CS     2020CS37    Satbir
• Suppose you are a student at this college. If a professor asks you to
give a notebook to the student with Roll No. 2020CS37, you can
identify the student from the roll number alone: s/he is from the 2020
batch, studies Computer Science (CS) and has number 37.
• Hence, you can successfully give him/her the notebook. You don't even
need to be told the Course that s/he is pursuing because the unique
Roll No. already determines it.
• In other words, if someone provides you with just the roll number, you
can quickly tell the student's name. So in the dependency
{Course, Roll No.} → Name, Course can be removed and the dependency
still holds: Name is only partially dependent on {Course, Roll No.}.
Fully functional dependencies
• A functional dependency X→Y, where X and Y are attribute sets of a
relation, is a full dependency if all the attributes present in X are
required to maintain the dependency.
• As an example, consider a school. A student studies in a class, and
within each class every student has a unique Roll number.

Class    Roll No.    Name
5        42          Saloni
8        12          Anshuman
11       37          Saloni
8        10          Abhimanyu
10       40          Suchandra
3        37          Satbir
• Suppose you are a student at this school. If a teacher asks you to
give a notebook to the student with Roll No. 37, you will be confused,
because two classes have a student with that number. Once the teacher
tells you the class, you can quickly identify the student and
successfully give him/her the notebook.
• In other words, if someone provides you with a class and the roll
number, you can quickly tell the student's name. A class or a roll
number alone is insufficient to identify the student's name.
The Name attribute is fully dependent on {Class, Roll No.}.
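The difference between a partial and a full dependency can be tested directly from the definition: drop one attribute from the left side and see whether the right side is still derivable. The sketch below assumes the dependencies suggested by the college and school examples (a roll number alone determines the name in the college, while class and roll number are both needed in the school); the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_partial(lhs, rhs, fds):
    """X -> Y is partial if X without some single attribute still determines Y."""
    lhs, rhs = set(lhs), set(rhs)
    return any(rhs <= closure(lhs - {attr}, fds) for attr in lhs)

# College: the roll number is unique across the whole college (assumed dependency).
college_fds = [({"Roll No."}, {"Name"})]
# School: the roll number is unique only within a class (assumed dependency).
school_fds = [({"Class", "Roll No."}, {"Name"})]

college_partial = is_partial({"Course", "Roll No."}, {"Name"}, college_fds)
school_partial = is_partial({"Class", "Roll No."}, {"Name"}, school_fds)
```

In the college case Course can be dropped, so the dependency is partial; in the school case neither attribute can be dropped, so it is a full dependency.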
To remove partial dependencies from the <EmployeeProjectDetail> table and
normalize it into second normal form, we can decompose it into the
following three tables:

<EmployeeDetail>

Employee Code    Employee Name
101              John
102              Ryan
103              Stephanie

<EmployeeProject>

Employee Code    Project ID
101              P03
101              P01
102              P04
103              P02

<ProjectDetail>

Project ID    Project Name
P03           Project103
P01           Project101
P04           Project104
P02           Project102
• Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by
decomposing it into the <EmployeeDetail>, <ProjectDetail> and
<EmployeeProject> tables. These tables satisfy both rules of 2NF:
they are in 1NF, and every non-prime attribute is fully dependent
on the primary key.
• Relations in 2NF are less redundant than relations in 1NF.
However, the decomposed relations may still suffer from one or more
anomalies due to transitive dependencies. We will remove the
transitive dependencies in the Third Normal Form.
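The 2NF decomposition above is a set of projections: each new table keeps a subset of the columns and drops duplicate rows. The following sketch uses the rows of the <EmployeeProjectDetail> example; the helper project is a hypothetical name. It also joins the three tables back together to check that no information was lost.

```python
def project(rows, columns):
    """Project rows onto the given columns and drop duplicate rows, keeping order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

rows = [
    {"Employee Code": 101, "Project ID": "P03", "Employee Name": "John", "Project Name": "Project103"},
    {"Employee Code": 101, "Project ID": "P01", "Employee Name": "John", "Project Name": "Project101"},
    {"Employee Code": 102, "Project ID": "P04", "Employee Name": "Ryan", "Project Name": "Project104"},
    {"Employee Code": 103, "Project ID": "P02", "Employee Name": "Stephanie", "Project Name": "Project102"},
]

employee_detail = project(rows, ["Employee Code", "Employee Name"])
project_detail = project(rows, ["Project ID", "Project Name"])
employee_project = project(rows, ["Employee Code", "Project ID"])

# Joining the three tables on their shared columns rebuilds the original rows.
rejoined = [
    {**ep, "Employee Name": e["Employee Name"], "Project Name": p["Project Name"]}
    for ep in employee_project
    for e in employee_detail if e["Employee Code"] == ep["Employee Code"]
    for p in project_detail if p["Project ID"] == ep["Project ID"]
]
```

After projection the name John is stored once instead of twice, which is the redundancy that 2NF removes.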
Third Normal Form (3NF)

• The normalization of 2NF relations to 3NF involves the elimination of transitive
dependencies in DBMS.
• A functional dependency X -> Z is said to be transitive if the following three
conditions hold:
  • X -> Y
  • Y does not determine X
  • Y -> Z
• For a relational table to be in third normal form, it must satisfy the following rules:
  • The table must be in the second normal form.
  • No non-prime attribute is transitively dependent on the primary key.
  • For each non-trivial functional dependency X -> Z, at least one of the following
  conditions holds:
    • X is a super key of the table.
    • Z is a prime attribute of the table.
• If a transitive dependency exists, we can divide the table to remove the transitively
dependent attributes and place them in a new table along with a copy of the
determinant.
Example
• Let us take the following <EmployeeDetail> table as an example to
understand what a transitive dependency is and how to normalize the
table to the third normal form:

<EmployeeDetail>

Employee Code    Employee Name    Employee Zipcode    Employee City
101              John             110033              Model Town
104              John             110044              Badarpur
102              Ryan             110028              Naraina
103              Stephanie        110064              Hari Nagar

• The above table is not in 3NF because Employee Code ->
Employee City is a transitive dependency:
  • Employee Code -> Employee Zipcode
  • Employee Zipcode -> Employee City
• Also, Employee Zipcode is not a super key and Employee City is not a
prime attribute.
• To remove the transitive dependency from this table and normalize it into
the third normal form, we can decompose the <EmployeeDetail>
table into the following two tables:

<EmployeeDetail>

Employee Code    Employee Name    Employee Zipcode
101              John             110033
104              John             110044
102              Ryan             110028
103              Stephanie        110064

<EmployeeLocation>

Employee Zipcode    Employee City
110033              Model Town
110044              Badarpur
110028              Naraina
110064              Hari Nagar
• Thus, we’ve converted the <EmployeeDetail> table into 3NF by
decomposing it into the <EmployeeDetail> and <EmployeeLocation>
tables, which are both in 2NF and have no transitive dependency.
• 2NF and 3NF impose conditions on how attributes depend on
candidate keys and remove the redundancy caused by violating them.
However, some dependencies that cause redundancy may still remain
in the database. These redundancies are removed by a stricter normal
form known as BCNF.
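The 3NF condition stated earlier (for every non-trivial dependency X -> Z, X is a super key or Z is a prime attribute) can be checked mechanically from a list of dependencies. The sketch below is a brute-force illustration for small relations. The helper names are hypothetical, and the dependency Employee Code -> Employee Name is an assumption added so that Employee Code is the key; the other two dependencies are the ones from the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

def third_nf_violations(attributes, fds):
    """Return dependencies X -> Z where X is not a super key and Z is not all prime."""
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # ignore the trivial part of the dependency
        is_superkey = closure(lhs, fds) == set(attributes)
        if extra and not is_superkey and not extra <= prime:
            violations.append((lhs, rhs))
    return violations

attributes = {"Employee Code", "Employee Name", "Employee Zipcode", "Employee City"}
fds = [
    ({"Employee Code"}, {"Employee Name", "Employee Zipcode"}),  # Name part is assumed
    ({"Employee Zipcode"}, {"Employee City"}),
]
before = third_nf_violations(attributes, fds)

# After the decomposition each table keeps only its own dependency.
detail_after = third_nf_violations(
    {"Employee Code", "Employee Name", "Employee Zipcode"}, fds[:1])
location_after = third_nf_violations(
    {"Employee Zipcode", "Employee City"}, fds[1:])
```

Only Employee Zipcode -> Employee City is reported for the original table, and neither decomposed table reports a violation.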
Boyce-Codd Normal Form (BCNF)

• Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF, as it
imposes an additional constraint compared to 3NF.
• For a relational table to be in Boyce-Codd normal form, it must satisfy
the following rules:
  • The table must be in the third normal form.
  • For every non-trivial functional dependency X -> Y, X is a superkey
  of the table. Unlike 3NF, BCNF makes no exception for the case where
  Y is a prime attribute.
• A superkey is a set of one or more attributes that can uniquely
identify a row in a database table.
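The BCNF rule above needs only an attribute closure to check: a set X is a superkey exactly when its closure contains every attribute of the table. A minimal sketch, assuming a table with the attributes Employee Code, Project ID and Project Leader and the two dependencies listed in the code (the function names are hypothetical):

```python
def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial dependencies whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) == set(attributes)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations

attributes = {"Employee Code", "Project ID", "Project Leader"}
fds = [
    ({"Employee Code", "Project ID"}, {"Project Leader"}),
    ({"Project Leader"}, {"Project ID"}),
]
violations = bcnf_violations(attributes, fds)
```

The first dependency passes because {Employee Code, Project ID} is a superkey; the second is reported because Project Leader alone does not determine Employee Code.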
Example
<EmployeeProjectLead>

Employee Code    Project ID    Project Leader
101              P03           Grey
101              P01           Christian
102              P04           Hudson
103              P02           Petro

The above table satisfies all the normal forms up to 3NF, but it violates the rules of BCNF. A candidate key of the
above table is {Employee Code, Project ID}.
For the non-trivial functional dependency,
Project Leader -> Project ID,
Project ID is a prime attribute, which is why 3NF accepts it, but Project Leader is not a super key. This is not allowed in BCNF.
To remove the violation, we decompose the table around the determinant Project Leader, which is kept in both tables so
that joining them gives back the original rows:

<EmployeeProject>

Employee Code    Project Leader
101              Grey
101              Christian
102              Hudson
103              Petro

<ProjectLead>

Project Leader    Project ID
Grey              P03
Christian         P01
Hudson            P04
Petro             P02

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into <EmployeeProject> and
<ProjectLead> tables.
