DATABASE NORMALIZATION

Database normalization is a process that organizes a relational database to reduce redundancy and enhance data integrity, initially proposed by Edgar F. Codd. It involves applying specific rules to achieve different normal forms, which help prevent data anomalies such as insert, update, and delete anomalies. The process improves database efficiency, accuracy, and query performance while ensuring that data is stored correctly and consistently.

Uploaded by

addokenneth58

Database normalization organizes a relational database according to specific rules called normal forms
to decrease redundancy and enhance data integrity. It was first proposed by British computer scientist
Edgar F. Codd as part of his relational model.

It’s something a person does manually, as opposed to a system or a tool doing it. It’s commonly done
by database developers and database administrators.

Normalization involves organizing database columns (attributes) and tables (relations) so that their
dependencies are enforced through database integrity constraints. This is achieved by applying formal
rules through either synthesis (creating a new database design) or decomposition (enhancing an existing one).

IMPORTANCE OF NORMALIZATION
• Make the database more efficient.
• Reduce the storage space that a database takes up.
• Prevent updates being made to some data but not others (called an “update anomaly”).
• Ensure the data is accurate.
• Help queries on the database run quickly.
• Prevent the same data from being stored in more than one place (redundancy).
• Prevent records from having to be inserted with missing data (called an “insert anomaly”).
• Prevent data not being deleted when it is supposed to be, or data being lost when it is not
supposed to be (called a “delete anomaly”).

Normalization in a DBMS (Database Management System) is done to achieve these points. Without
normalization, a database can be slow, incorrect, and messy.

Data Anomalies
An anomaly is an issue in the data that is not meant to be there. This can
happen if a database is not normalized.

Let’s take a look at the different kinds of data anomalies that can occur and that can be
prevented with a normalised database.

Our Example
We’ll be using a student database as an example in this article, which records student, class, and
teacher information.

Let’s say our student database looks like this:

Student ID | Student Name  | Fees Paid | Course Name      | Class 1     | Class 2        | Class 3
1          | John Smith    | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Griffin | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2   |                |
4          | Matt Long     | 850       | Dentistry        |             |                |

This table keeps track of a few pieces of information:

• The student names
• The fees a student has paid
• The classes a student is taking, if any

This is not a normalized table, and there are a few issues with this.

Insert Anomaly
An insert anomaly happens when we try to insert a record into this table without knowing
all the data we need to know.

For example, suppose we wanted to add a new student but did not know their course name.

The new record would look like this:

Student ID | Student Name  | Fees Paid | Course Name      | Class 1     | Class 2        | Class 3
1          | John Smith    | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Griffin | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2   |                |
4          | Matt Long     | 850       | Dentistry        |             |                |
5          | Jared Oldham  | 0         | ?                |             |                |

We would be adding incomplete data to our table, which can cause issues when trying
to analyse this data.
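
As a rough illustration of the insert anomaly, the sketch below loads the example table into an in-memory SQLite database using Python's standard `sqlite3` module. The table name `student` and the snake_case column names are assumptions made for this sketch, not part of the original example, and the full course and class names are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL,
        fees_paid    INTEGER NOT NULL,
        course_name  TEXT,
        class_1      TEXT,
        class_2      TEXT,
        class_3      TEXT
    )
""")
rows = [
    (1, "John Smith", 200, "Economics", "Economics 1", "Biology 1", None),
    (2, "Maria Griffin", 500, "Computer Science", "Biology 1", "Business Intro", "Programming 2"),
    (3, "Susan Johnson", 400, "Medicine", "Biology 2", None, None),
    (4, "Matt Long", 850, "Dentistry", None, None, None),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The new student's course is unknown, so course_name is left as NULL.
conn.execute(
    "INSERT INTO student (student_id, student_name, fees_paid) VALUES (?, ?, ?)",
    (5, "Jared Oldham", 0),
)

# A per-course report now contains a group that has no course name at all.
per_course = conn.execute(
    "SELECT course_name, COUNT(*) FROM student GROUP BY course_name"
).fetchall()
missing = [row for row in per_course if row[0] is None]
print(missing)
```

The row is accepted, but any analysis by course now has to decide what to do with a student who belongs to no course.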

Update Anomaly
An update anomaly happens when we want to update data, and we update some of the
data but not all of it.

For example, let’s say the class Biology 1 was changed to “Intro to Biology”. We would
have to search every column that could hold this class (Class 1, Class 2, and Class 3) and
rename each occurrence that was found.

Student ID | Student Name  | Fees Paid | Course Name      | Class 1          | Class 2          | Class 3
1          | John Smith    | 200       | Economics        | Economics 1      | Intro to Biology |
2          | Maria Griffin | 500       | Computer Science | Intro to Biology | Business Intro   | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2        |                  |
4          | Matt Long     | 850       | Dentistry        |                  |                  |

There’s a risk that we miss a value, which would leave the data inconsistent.

Ideally, we would only update the value once, in one location.
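
The missed-value risk can be sketched with the same example data in an in-memory SQLite database (Python's standard `sqlite3` module). The table and column names are assumptions made for this sketch. The update below deliberately touches only one of the three class columns, which is the kind of mistake the text describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL,
        fees_paid    INTEGER NOT NULL,
        course_name  TEXT,
        class_1      TEXT,
        class_2      TEXT,
        class_3      TEXT
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "John Smith", 200, "Economics", "Economics 1", "Biology 1", None),
        (2, "Maria Griffin", 500, "Computer Science", "Biology 1", "Business Intro", "Programming 2"),
        (3, "Susan Johnson", 400, "Medicine", "Biology 2", None, None),
        (4, "Matt Long", 850, "Dentistry", None, None, None),
    ],
)

# Incomplete rename: only class_1 is checked, so Maria's row is fixed
# but John's is not, because his Biology 1 is stored in class_2.
conn.execute(
    "UPDATE student SET class_1 = 'Intro to Biology' WHERE class_1 = 'Biology 1'"
)

renamed = conn.execute(
    "SELECT student_name FROM student WHERE class_1 = 'Intro to Biology'"
).fetchall()
stale = conn.execute(
    "SELECT student_name FROM student "
    "WHERE 'Biology 1' IN (class_1, class_2, class_3)"
).fetchall()
print(renamed, stale)
```

After the update the same class exists under two names, which is the inconsistency a single storage location would avoid.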

Delete Anomaly
A delete anomaly occurs when we want to delete data from the table, but we end up
deleting more than what we intended.
For example, let’s say Susan Johnson quits and her record needs to be deleted from the
system. We could delete her row:

Student ID | Student Name  | Fees Paid | Course Name      | Class 1     | Class 2        | Class 3
1          | John Smith    | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Griffin | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2   |                |
4          | Matt Long     | 850       | Dentistry        |             |                |

But, if we delete this row, we lose the record of the Biology 2 class, because it’s not
stored anywhere else. The same can be said for the Medicine course.

We should be able to delete one type of data or one record without having impacts on
other records we don’t want to delete.
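
A minimal sketch of this delete anomaly, again using an in-memory SQLite database with assumed table and column names: deleting one student's row also removes the only record of a class and a course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL,
        fees_paid    INTEGER NOT NULL,
        course_name  TEXT,
        class_1      TEXT,
        class_2      TEXT,
        class_3      TEXT
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "John Smith", 200, "Economics", "Economics 1", "Biology 1", None),
        (2, "Maria Griffin", 500, "Computer Science", "Biology 1", "Business Intro", "Programming 2"),
        (3, "Susan Johnson", 400, "Medicine", "Biology 2", None, None),
        (4, "Matt Long", 850, "Dentistry", None, None, None),
    ],
)

# Remove the student who has left.
conn.execute("DELETE FROM student WHERE student_name = 'Susan Johnson'")

# Collect every class and course that is still recorded anywhere in the table.
remaining_classes = set()
for row in conn.execute("SELECT class_1, class_2, class_3 FROM student"):
    remaining_classes.update(name for name in row if name is not None)
remaining_courses = {
    row[0] for row in conn.execute("SELECT course_name FROM student")
}

print("Biology 2" in remaining_classes, "Medicine" in remaining_courses)
```

Both checks come back false: the class and the course existed only as values inside the deleted student's row.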

What Are The Normal Forms?


The process of normalization involves applying rules to a set of data. Each of these rules
transforms the data to a certain structure, called a normal form. There are three
main normal forms that you should consider (there are more normal forms in
total, listed below, but the first three are the most common).

Once the first rule is applied, the data is in “first normal form”. Then, the second
rule is applied and the data is in “second normal form”. The third rule is then applied
and the data is in “third normal form”.

Fourth and fifth normal forms are then achieved by applying their own specific rules.

The normal forms (from least normalized to most normalized) are:

UNF: Unnormalized form
1NF: First normal form
2NF: Second normal form
3NF: Third normal form
EKNF: Elementary key normal form
BCNF: Boyce–Codd normal form
4NF: Fourth normal form
ETNF: Essential tuple normal form
5NF: Fifth normal form
DKNF: Domain-key normal form
6NF: Sixth normal form

First Normal Form


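First normal form is the way that your data is represented after it has the first rule of
normalization applied to it. Normalization in DBMS starts with the first rule being applied;
you need to apply the first rule before applying any other rules. In the usual definition,
the first rule requires that every row can be uniquely identified and that each column holds
a single value, with no repeating groups such as Class 1, Class 2, and Class 3.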
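
One common reading of the first rule is that repeating groups of columns are replaced by one row per value. The sketch below applies that reading to the class columns of the example table. The function name `to_enrolments` and the tuple layout are assumptions made for illustration.

```python
# Wide rows from the example: (student_id, class_1, class_2, class_3).
wide_rows = [
    (1, "Economics 1", "Biology 1", None),
    (2, "Biology 1", "Business Intro", "Programming 2"),
    (3, "Biology 2", None, None),
    (4, None, None, None),
]


def to_enrolments(rows):
    """Turn the repeating class columns into one (student_id, class_name) row per class."""
    enrolments = []
    for student_id, *classes in rows:
        for class_name in classes:
            if class_name is not None:  # empty class slots produce no row
                enrolments.append((student_id, class_name))
    return enrolments


enrolments = to_enrolments(wide_rows)
for row in enrolments:
    print(row)
```

Each class now appears in a single column, so a rename only has to search one place, and a student with no classes simply has no rows here.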

Second Normal Form


The rule of second normal form on a database can be described as:

1. Fulfil the requirements of first normal form
2. Each non-key attribute must be functionally dependent on the whole primary key, not just
part of it

What does this even mean?

It means that the first normal form rules have been applied. It also means that each
field that is not part of the primary key is determined by that primary key, so it is
specific to that record. This is what “functional dependency” means. If the primary key is
made up of several columns, a field that depends on only some of those columns breaks the rule.
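
A functional dependency can be checked against sample rows with a small helper, sketched below. The helper name `determines` and the enrolment rows (keyed by student and class) are assumptions for illustration. A check like this can only show that the sample data does not contradict a dependency. Whether the dependency really holds is a design decision.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal values in lhs always imply equal values in rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical enrolment table whose primary key is (student_id, class_name).
rows = [
    {"student_id": 1, "class_name": "Economics 1", "student_name": "John Smith"},
    {"student_id": 1, "class_name": "Biology 1", "student_name": "John Smith"},
    {"student_id": 2, "class_name": "Biology 1", "student_name": "Maria Griffin"},
    {"student_id": 2, "class_name": "Business Intro", "student_name": "Maria Griffin"},
    {"student_id": 3, "class_name": "Biology 2", "student_name": "Susan Johnson"},
]

by_full_key = determines(rows, ["student_id", "class_name"], ["student_name"])
by_student_only = determines(rows, ["student_id"], ["student_name"])
by_class_only = determines(rows, ["class_name"], ["student_name"])
print(by_full_key, by_student_only, by_class_only)
```

Here the student name is already determined by `student_id` alone, which is only part of the key. That partial dependency is what second normal form removes, typically by moving the name into a table keyed by `student_id`.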

Third Normal Form


Third normal form is the final stage of the most common normalization process. The rule
for this is:

• Fulfils the requirements of second normal form
• Has no transitive functional dependency

What does this even mean? What is a transitive functional dependency?

A transitive functional dependency exists when a non-key attribute depends on another
non-key attribute, which in turn depends on the primary key. Removing it means that every
attribute that is not the primary key must depend on the primary key and the primary key only.
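
To make the idea concrete, the sketch below adds a hypothetical `faculty` column to the student data. The column and its values are invented for illustration and do not come from the example table. The faculty depends on the course, and the course depends on the student, so `student_id -> course_name -> faculty` is a transitive chain that can be split into two tables.

```python
# Hypothetical table: faculty is determined by course_name, not directly by the student.
rows = [
    {"student_id": 1, "student_name": "John Smith", "course_name": "Economics", "faculty": "Social Sciences"},
    {"student_id": 2, "student_name": "Maria Griffin", "course_name": "Computer Science", "faculty": "Engineering"},
    {"student_id": 3, "student_name": "Susan Johnson", "course_name": "Medicine", "faculty": "Health"},
    {"student_id": 4, "student_name": "Matt Long", "course_name": "Dentistry", "faculty": "Health"},
]

# Table 1: attributes that depend on the primary key (student_id) only.
students = [
    {col: row[col] for col in ("student_id", "student_name", "course_name")}
    for row in rows
]

# Table 2: the transitively dependent attribute, keyed by what it really depends on.
courses = {}
for row in rows:
    existing = courses.setdefault(row["course_name"], row["faculty"])
    if existing != row["faculty"]:
        raise ValueError("faculty is not determined by course_name in this data")

# Joining the two tables on course_name gives back the original rows.
rejoined = [dict(s, faculty=courses[s["course_name"]]) for s in students]
print(rejoined == rows)
```

After the split, a course's faculty is stored once in `courses`, so changing it is a single update regardless of how many students take the course.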

Fourth Normal Form and Beyond

Fourth normal form is the next step after third normal form.

What does it mean?

It needs to satisfy two conditions:

• Meet the criteria of third normal form.
• There are no non-trivial multivalued dependencies other than on a candidate key.

So, what does this mean?

A multivalued dependency is probably better explained with an example, which I’ll show
you shortly. It means that the table holds two or more independent sets of values for the
same key, so every combination of them has to be stored. These attributes can be moved to
separate tables.
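
As a small sketch of a multivalued dependency, suppose a student's classes and a student's clubs are unrelated to each other. The clubs are invented for illustration and are not part of the example table. A single table would then need every class and club combination, while two separate tables store each fact once.

```python
from itertools import product

student = "Maria Griffin"
classes = ["Biology 1", "Business Intro", "Programming 2"]
clubs = ["Chess", "Rowing"]  # hypothetical attribute, independent of classes

# One combined table must hold every (class, club) pairing for the student.
combined = sorted((student, c, k) for c, k in product(classes, clubs))

# Decomposition: one table per independent multivalued fact.
student_class = sorted({(s, c) for s, c, _ in combined})
student_club = sorted({(s, k) for s, _, k in combined})

# Joining the two tables on the student gives back the combined table.
rejoined = sorted(
    (s, c, k)
    for s, c in student_class
    for s2, k in student_club
    if s == s2
)

print(len(combined), len(student_class), len(student_club))
print(rejoined == combined)
```

The combined table needs six rows for three classes and two clubs, and adding one more club would add three rows. After the split it adds one.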
