Normalization-2
Normalization-2
com/omarelgabrys-blog/database-normalization-part-7-ef7225150c7f
Database — Normalization
The process of organizing your database through a set of defined steps.
This is a process where you take your database design, and you apply a set of formal criteria of
rules called “Normal Forms”. And we step through them first normal form, second normal form,
and third normal form.
You usually will end up creating a few new tables as part of the process. But, the end result of
normalization is to make your database easier to edit, easier to maintain, and preform operations,
remove any duplicates, & more reliable to work with.
So, for example, if you have a customer, and this customer can have two emails; the main email
and the alternative one. Now, you can just put two columns in your table, each column represent
on of these emails (repeating group ☹).
Alternatively, you can have two rows for this customer, one with the main email, and the other
row with the alternative email, while keeping the other columns’ values as they are (repeating
group ☹).
But, “What if you want to extend this, and add more emails?”. We don’t want to have a table that
has repeating group of data.
So, what are going to do is to extract the customer emails in a new table. This new table will have
a foreign key points to the primary key of the original table, and another column for the email
values. Both the foreign key and the email value are together form the primary key of the new
table.
1
First Normal Form
Now, we can say that our tables are in the first normal form, let’s make it in the second normal
form.
An example could be, let’s say you have a table that stores books information. A book could have
a name, author, and date released.
According to the requirements, we can have book that’s written by more than one author, but we
can’t have more than one book with the same book name and the same author. So, both name and
author form a composite primary key.
2
A book table
Now, the date column relies only on the name of the book, it doesn’t care about who wrote this
book, it cares about the name of the book. And if you noticed we’ve written the date “2011” more
than one time with “Database Systems” book. This is not a good practice when you have
duplicates.
So, what we are going to do is to create a separate table, with one-to-many relationship with the
original table, it has the book name as a primary key, and all the dependent attributes (like date).
Now, the book name in the book table is a foreign key that points to the primary key of the new
created table, and every non-primary column in the book table (if exist) depends on the whole
composite primary key; name and author, and not one of them.
3
Second Normal Form
Now, we can say that our tables are in the second normal form, let’s make it in the third one.
An example for that could be with the course table, where every course has an name as a primary
key, room number and capacity.
4
Third Normal Form
Now, it’s obviously noted that the capacity depends on the room, it has nothing to do with the
course name. This means we have a non-key attribute that depends on another non-key attribute.
Why this sounds like a problem? Well, “What happens if somebody reached into this table, and
they changed that room for a specific course, but they didn’t change the capacity?”.
So, what we are going to do is the same, we will create a new table with one-to-many relationship
with the original table, it has the room number as a primary key, and all the dependent attributes
like the capacity.
Now, the room number in the course table is a foreign key that points to the primary key of the
new created table.
Derived Attribute
Another example of third normal form is when you have a (non-primary) derived attribute that
depends on other non-primary attributes.
5
We would remove that derived column form this table, as we can figure it out when we need to.
One of the main reasons for this is to prevent any conflicts, any inaccurate data to be inserted
mistakenly.
Many database systems offer you the option of defining a computed or calculated column. It’s not
actually stored in the database, it is a convenient read-only fiction. It’s value is automatically
calculated based on the other columns in the table, and you may find that useful from time to
time.
Denormalization
Sometimes you will hit the situation where you will break the normalization rules for performance
improvements, because normalization is often involves splitting data into multiple tables.
If you are following along, you remember at the first normal form, when we explained the
example of emails. “What happens if we added two email columns to the customer table?.”
Technically, this can be described as breaking first normal form. It’s a repeating group. But in
practice, you may find it more convenient to just allow an email1 and email2 columns, because
you’re sure there won’t be a flexible number of email addresses.
This will improve your performance, and save you from creating more tables.