DATABASE NORMALIZATION
Database normalization organizes a relational database according to specific rules called normal forms
to decrease redundancy and enhance data integrity. It was first proposed by British computer scientist
Edgar F. Codd as part of his relational model.
Normalization is something a person does manually, rather than something a system or tool does for
you. It’s commonly done by database developers and database administrators.
Normalization involves organizing database columns (attributes) and tables (relations) to enforce
dependencies through database integrity constraints. This is achieved by applying formal rules through
either synthesis (creating a new database design) or decomposition (enhancing an existing one).
IMPORTANCE OF NORMALIZATION
Normalization helps to:
Make the database more efficient.
Reduce the storage space that a database takes up.
Prevent updates being made to some data but not others (called an “update anomaly”).
Ensure the data is accurate.
Ensure the queries on a database run as fast as possible.
Prevent the same data from being stored in more than one place.
Prevent records from being added with incomplete data (called an “insert anomaly”).
Prevent data from not being deleted when it is supposed to be, or from being lost when it is not supposed to be (called a “delete anomaly”).
Data Anomalies
An anomaly is an issue in the data that is not meant to be there. This can
happen if a database is not normalised.
Let’s take a look at the different kinds of data anomalies that can occur and that can be
prevented with a normalised database.
Our Example
We’ll be using a student database as an example in this article, which records
student, class, and teacher information.
Student ID  Student Name   Fees Paid  Course Name  Class 1  Class 2    Class 3
1           John Smith     200        Econs        Econs 1  Biology 1
Insert Anomaly
An insert anomaly happens when we try to insert a record into this table without
knowing all the data we need to know.
For example, suppose we wanted to add a new student but did not know their course name.
Student ID  Student Name   Fees Paid  Course Name  Class 1    Class 2         Class 3
2           Maria Griffin  500        Com Science  Biology 1  Business Intro  Programming 2
3           Susan Johnson  400        Medicine     Biology 2
We would be adding incomplete data to our table, which can cause issues when trying
to analyse this data.
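To see the insert anomaly in practice, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name student, the snake_case column names, and the placeholder student with ID 4 are assumptions made for illustration; they are not part of the example above.

```python
import sqlite3

# In-memory database; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT,
        fees_paid    INTEGER,
        course_name  TEXT,
        class_1      TEXT,
        class_2      TEXT,
        class_3      TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "John Smith", 200, "Econs", "Econs 1", "Biology 1", None),
        (2, "Maria Griffin", 500, "Com Science", "Biology 1", "Business Intro", "Programming 2"),
        (3, "Susan Johnson", 400, "Medicine", "Biology 2", None, None),
    ],
)

# Hypothetical new student whose course is not yet known.
# The insert succeeds, but the row is incomplete.
conn.execute(
    "INSERT INTO student (student_id, student_name, fees_paid) VALUES (?, ?, ?)",
    (4, "New Student", 0),
)

# Find the rows that any analysis of course data would have to work around.
incomplete = conn.execute(
    "SELECT student_id, student_name FROM student WHERE course_name IS NULL"
).fetchall()
print(incomplete)  # [(4, 'New Student')]
```

Nothing stops the incomplete row from being stored, which is exactly why it can cause trouble later.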
Update Anomaly
An update anomaly happens when we want to update data, and we update some of the
data but not other data.
For example, let’s say the class Biology 1 was changed to “Intro to Biology”. We would
have to search every column that could hold a class name (Class 1, Class 2, and Class 3)
and rename each occurrence that was found.
Student ID  Student Name   Fees Paid  Course Name  Class 1           Class 2           Class 3
1           John Smith     200        Econs        Econs 1           Intro to Biology
2           Maria Griffin  500        Com Science  Intro to Biology  Business Intro    Programming 2
3           Susan Johnson  400        Medicine     Biology 2
There’s a risk that we miss a value, which would leave the same class stored under two different names.
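The rename can be sketched in code to show how easily a value gets missed. This is only an illustration using Python's sqlite3 module; the table name, the column names class_1 to class_3, and the helper rows_with_old_name are assumptions, and the fee and course columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, student_name TEXT, "
    "class_1 TEXT, class_2 TEXT, class_3 TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John Smith", "Econs 1", "Biology 1", None),
        (2, "Maria Griffin", "Biology 1", "Business Intro", "Programming 2"),
        (3, "Susan Johnson", "Biology 2", None, None),
    ],
)


def rows_with_old_name(connection):
    """Return IDs of students that still reference the old class name."""
    cursor = connection.execute(
        "SELECT student_id FROM student "
        "WHERE 'Biology 1' IN (class_1, class_2, class_3) "
        "ORDER BY student_id"
    )
    return [row[0] for row in cursor]


# An incomplete rename that only touches the first class column.
conn.execute(
    "UPDATE student SET class_1 = 'Intro to Biology' WHERE class_1 = 'Biology 1'"
)
missed = rows_with_old_name(conn)

# The complete rename has to visit every column that can hold a class name.
for column in ("class_1", "class_2", "class_3"):
    conn.execute(
        f"UPDATE student SET {column} = 'Intro to Biology' "
        f"WHERE {column} = 'Biology 1'"
    )
remaining = rows_with_old_name(conn)

print(missed)     # [1]
print(remaining)  # []
```

After the partial update, John Smith's row still says Biology 1 while Maria Griffin's says Intro to Biology, so the same class now has two names.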
Delete Anomaly
A delete anomaly occurs when we want to delete data from the table, but we end up
deleting more than what we intended.
For example, let’s say Susan Johnson quits and her record needs to be deleted from the
system. We could delete her row:
Student ID  Student Name   Fees Paid  Course Name  Class 1      Class 2    Class 3
1           John Smith     200        Economics    Economics 1  Biology 1
3           Susan Johnson  400        Medicine     Biology 2
But, if we delete this row, we lose the record of the Biology 2 class, because it’s not
stored anywhere else. The same can be said for the Medicine course.
We should be able to delete one type of data or one record without having impacts on
other records we don’t want to delete.
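Here is a rough sketch of the difference, again using Python's sqlite3 module. The first part deletes Susan Johnson from a flat table and loses Biology 2. The second part uses one possible decomposition, in which classes live in their own table, so the class survives the delete. The table names student_flat, student, class, and enrolment are assumptions for illustration, not a design taken from this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Flat design: class names exist only inside student rows.
conn.execute(
    "CREATE TABLE student_flat ("
    "student_id INTEGER PRIMARY KEY, student_name TEXT, "
    "class_1 TEXT, class_2 TEXT)"
)
conn.executemany(
    "INSERT INTO student_flat VALUES (?, ?, ?, ?)",
    [
        (1, "John Smith", "Economics 1", "Biology 1"),
        (3, "Susan Johnson", "Biology 2", None),
    ],
)
conn.execute("DELETE FROM student_flat WHERE student_id = 3")
flat_classes = {
    name
    for row in conn.execute("SELECT class_1, class_2 FROM student_flat")
    for name in row
    if name is not None
}

# Decomposed design (an assumption): classes are stored independently
# of the students enrolled in them.
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE class (class_name TEXT PRIMARY KEY);
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        class_name TEXT REFERENCES class(class_name),
        PRIMARY KEY (student_id, class_name)
    );
    INSERT INTO student VALUES (1, 'John Smith'), (3, 'Susan Johnson');
    INSERT INTO class VALUES ('Economics 1'), ('Biology 1'), ('Biology 2');
    INSERT INTO enrolment VALUES
        (1, 'Economics 1'), (1, 'Biology 1'), (3, 'Biology 2');
    """
)
conn.execute("DELETE FROM enrolment WHERE student_id = 3")
conn.execute("DELETE FROM student WHERE student_id = 3")
kept_classes = {row[0] for row in conn.execute("SELECT class_name FROM class")}

print(sorted(flat_classes))  # ['Biology 1', 'Economics 1']
print(sorted(kept_classes))  # ['Biology 1', 'Biology 2', 'Economics 1']
```

In the flat table the only record of Biology 2 disappears with the student, while in the decomposed tables only the student and her enrolment are removed.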
Normalization works by applying a series of rules, one after another. When the first
rule is applied, the data is in “first normal form”. Then, the second rule is applied
and the data is in “second normal form”. The third rule is then applied and the data
is in “third normal form”.
Fourth and fifth normal forms are then achieved by applying their own specific rules.
Second normal form means that the first normal form rules have been applied. It also
means that each field that is not the primary key is determined by that primary key,
so it is specific to that record. This is what “functional dependency” means.
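A functional dependency can be tested against sample data with a short sketch like the one below. The function name functionally_determines and the dictionary-based rows are assumptions for illustration. Keep in mind that a check like this can only show that a dependency is violated by the rows at hand; whether it holds in general depends on what the data means.

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if equal `determinant` values always come with equal
    `dependent` values in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # Remember the first value seen for this key; any later mismatch
        # means the key does not determine the dependent columns.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"student_id": 1, "student_name": "John Smith", "fees_paid": 200},
    {"student_id": 2, "student_name": "Maria Griffin", "fees_paid": 500},
    {"student_id": 3, "student_name": "Susan Johnson", "fees_paid": 400},
]

# The student ID determines the name and the fees paid.
print(functionally_determines(students, ["student_id"], ["student_name", "fees_paid"]))  # True

# Hypothetical conflicting row: the same ID with a different name.
conflicting = students + [
    {"student_id": 1, "student_name": "Jon Smith", "fees_paid": 200}
]
print(functionally_determines(conflicting, ["student_id"], ["student_name"]))  # False
```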
Fourth normal form is the next step after third normal form. It deals with multivalued dependencies.
A multivalued dependency is probably better explained with an example, which I’ll show
you shortly. It means that there are other attributes in the table that are not dependent
on the primary key, and can be moved to another table.
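As a rough sketch of the idea, the check below uses the standard test for a multivalued dependency in a three-column table: split the table into two smaller tables that share the key, join them back together, and see whether the original rows come back unchanged. The hobby column and its values are hypothetical and are not part of the student example; the function names are assumptions too.

```python
def project(rows, columns):
    """Return the distinct values of `columns` as a set of tuples."""
    return {tuple(row[col] for col in columns) for row in rows}


def multivalued_dependency_holds(rows, key, left, right):
    """Check key ->> left in a table with the columns (key, left, right).

    The dependency holds exactly when splitting the table into
    (key, left) and (key, right) and joining the two pieces back on
    the key reproduces the original rows.
    """
    original = project(rows, [key, left, right])
    rejoined = {
        (k1, a, b)
        for (k1, a) in project(rows, [key, left])
        for (k2, b) in project(rows, [key, right])
        if k1 == k2
    }
    return rejoined == original


# Hypothetical data: a student's classes and hobbies are independent facts,
# so every class appears with every hobby.
rows = [
    {"student_id": 1, "class": "Econs 1", "hobby": "Chess"},
    {"student_id": 1, "class": "Econs 1", "hobby": "Tennis"},
    {"student_id": 1, "class": "Biology 1", "hobby": "Chess"},
    {"student_id": 1, "class": "Biology 1", "hobby": "Tennis"},
]

print(multivalued_dependency_holds(rows, "student_id", "class", "hobby"))      # True
# With one combination missing, the split would invent a row, so it fails.
print(multivalued_dependency_holds(rows[:3], "student_id", "class", "hobby"))  # False
```

When the check passes, the class and hobby columns can be moved into two separate tables without losing any information.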