Database: Chapter 9
Fundamentals of Database Systems
(Database I)
Lecturer: Barakzai
Normalization
What is normalization in a database?
• Database normalization is a technique for organizing
the data in a database.
• Normalization is a systematic approach to
decomposing tables in order to eliminate data
redundancy (repetition) and undesirable characteristics
such as insertion, update, and deletion anomalies.
• It is a multi-step process that puts data into tabular
form and removes duplicated data from the relation tables.
• Normalization is mainly used for two purposes:
• Eliminating redundant (repeated) data.
• Ensuring data dependencies make sense, i.e. data is logically
stored.
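To make the redundancy problem concrete, here is a minimal sketch of an update anomaly. The column names (dept, dept_phone) and all values are hypothetical and are not taken from the lecture's tables.

```python
# Hypothetical unnormalized table: the department phone number is repeated
# in every row that belongs to the same department.
rows = [
    {"student_id": 1, "name": "Student A", "dept": "BCS", "dept_phone": "555-0100"},
    {"student_id": 2, "name": "Student B", "dept": "BCS", "dept_phone": "555-0100"},
    {"student_id": 3, "name": "Student C", "dept": "BBA", "dept_phone": "555-0200"},
]

# Update anomaly: the phone number is changed in one row only,
# so the table now holds two different numbers for the same department.
rows[0]["dept_phone"] = "555-0199"
bcs_phones = {r["dept_phone"] for r in rows if r["dept"] == "BCS"}
update_anomaly = len(bcs_phones) > 1

# Normalized alternative: each department fact is stored exactly once
# and student rows only reference the department.
departments = {"BCS": "555-0100", "BBA": "555-0200"}
students = [(1, "Student A", "BCS"), (2, "Student B", "BCS"), (3, "Student C", "BBA")]
departments["BCS"] = "555-0199"  # a single update is seen by every student row
```

After the split, there is only one place where the phone number can be changed, so the two copies can no longer disagree.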
Why is normalization important in a database?
2 Ahmad Bcs
3 Sara Bcs
Types of normalization
• Normalization is carried out in stages called normal forms.
• The first three normal forms are the basic ones; the higher forms
refine them further.
• 1st normal form
• 2nd normal form
• 3rd normal form
• BCNF (Boyce-Codd normal form)
• 4th normal form
• 5th normal form
What is First Normal Form (1NF)?
• Rules for First Normal Form
The first normal form expects you to follow a few simple rules
while designing your database, and they are:
• Rule 1: Single-Valued Attributes
Each column of your table should be single valued, which means
it should not contain multiple values. We will explain this with the
help of an example later; let's see the other rules for now.
• Rule 2: Attribute Domain Should Not Change
This is more of a "common sense" rule. In each column, the
values stored must be of the same kind or type.
For example: if you have a column dob to store the dates of birth of a set of
people, then you must not store the names of some of them in that column
alongside the dates of birth of others. It should
hold only a date of birth for every record/row.
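The domain rule can be sketched as a simple check over one column. The helper name is_valid_dob_column and the sample values are assumptions made for illustration only.

```python
from datetime import date

def is_valid_dob_column(values):
    """Return True only if every value in the column is a date."""
    return all(isinstance(v, date) for v in values)

# Hypothetical column contents.
good_column = [date(2001, 5, 17), date(1999, 11, 2)]
bad_column = [date(2001, 5, 17), "Ahmad"]  # a name mixed into the dob column

print(is_valid_dob_column(good_column))  # True
print(is_valid_dob_column(bad_column))   # False
```

In a real database this rule is normally enforced by declaring a column type or constraint rather than by application code.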
Example:
• Although all the rules are self-explanatory, let's take an
example where we create a table to store student data: each
student's roll number, their name, and the names of the subjects
they have opted for.
• Here is our table, with some sample data added to it.
• Our table already satisfies 3 of the 4 rules: all our
column names are unique, we have stored data in the order we
wanted, and we have not intermixed different types of data in
any column.
• But of the 3 students in our table, 2 have opted for
more than 1 subject, and we have stored their subject names in a
single column. As per the First Normal Form, each column must
contain atomic values.
How to solve this Problem?
• It's very simple, because all we have to do is break the values into
atomic values.
• Here is our updated table and it now satisfies the First Normal
Form.
• By doing so, a few values are repeated, but the values in
the subject column are now atomic for each record/row.
• With the First Normal Form, data redundancy increases, as the
same values in some columns now appear in multiple rows, but each row
as a whole will be unique.
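The step of breaking a multi-valued column into atomic values can be sketched as follows. The sample rows are hypothetical stand-ins for the student table, and the comma-separated subject format is an assumption.

```python
def to_first_normal_form(rows):
    """Split a comma-separated subject column into one row per subject."""
    atomic_rows = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic_rows.append((roll_no, name, subject.strip()))
    return atomic_rows

# Hypothetical data: two of the three students have more than one subject.
unnormalized = [
    (101, "Student A", "OS, CN"),
    (103, "Student B", "Java"),
    (102, "Student C", "C, C++"),
]

normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

The roll number and name are now repeated for students with several subjects, yet every (roll_no, name, subject) row is distinct.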
What is Second Normal Form?
• For a table to be in the Second Normal Form, it must satisfy two
conditions:
1.The table should be in the First Normal Form.
2.There should be no Partial Dependency.
• What is Dependency?
• Let's take an example of a Student table with columns student_id, name,
reg_no (registration number), branch, and address (the student's home
address).
• In this table, student_id is the primary key and will be unique for every
row, hence we can use student_id to fetch any row of data from this
table.
• Even when two students have the same name, if we know the
student_id we can easily fetch the correct record.
• Hence we can say that a primary key for a table is the column, or group of
columns (composite key), that uniquely identifies each record in the
table.
• I can ask for the branch name of the student with student_id 10, and I can
get it. Similarly, if I ask for the name of the student with student_id 10 or
11, I will get it. So all I need is student_id, and every other column depends
on it, or can be fetched using it.
• For a simple table like Student, a single column like student_id can
uniquely identify all the records in the table.
• But this is not true all the time. So now let's extend our example to see
how more than 1 column together can act as a primary key.
• Let's create another table for Subject, which will have subject_id and
subject_name fields and subject_id will be the primary key.
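The two tables described so far can be sketched with Python's built-in sqlite3 module. The column names follow the text; the column types and all row values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        reg_no     TEXT,
        branch     TEXT,
        address    TEXT
    );
    CREATE TABLE subject (
        subject_id   INTEGER PRIMARY KEY,
        subject_name TEXT
    );
""")

# Hypothetical rows: two students share a name but have different ids.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (10, "Student A", "REG-010", "CSE", "Address 1"),
        (11, "Student A", "REG-011", "IT", "Address 2"),
    ],
)
conn.executemany("INSERT INTO subject VALUES (?, ?)", [(1, "Java"), (2, "C++")])

def branch_of(student_id):
    """Fetch the branch of one student using only the primary key."""
    row = conn.execute(
        "SELECT branch FROM student WHERE student_id = ?", (student_id,)
    ).fetchone()
    return row[0] if row else None

print(branch_of(10))  # CSE
print(branch_of(11))  # IT
```

Even though both students have the same name, the student_id alone is enough to reach the right record.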
• Now we have a Student table with student information and
another table Subject for storing subject information.
• Let's create another table, Score, to store the marks obtained by
students in the respective subjects. We will also save the name
of the teacher who teaches that subject along with the marks.
• In the Score table we save the student_id to know which student
the marks belong to, and the subject_id to know which subject the marks are
for.
• If I ask you for the marks of the student with student_id 10, can you
get them from this table? No, because you don't know for which subject.
And if I give you only a subject_id, you would not know for which student.
Hence we need student_id + subject_id to uniquely identify any row.
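A composite primary key for the Score table could be declared as in the sketch below, again using sqlite3. The marks and subject ids are hypothetical sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE score (
        student_id INTEGER,
        subject_id INTEGER,
        marks      INTEGER,
        teacher    TEXT,
        PRIMARY KEY (student_id, subject_id)
    )
""")
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?, ?)",
    [
        (10, 1, 70, "Java Teacher"),
        (10, 2, 75, "C++ Teacher"),
        (11, 1, 80, "Java Teacher"),
    ],
)

# student_id alone is ambiguous: it matches one row per subject.
rows_for_student = conn.execute(
    "SELECT marks FROM score WHERE student_id = 10"
).fetchall()

# student_id together with subject_id identifies exactly one row.
one_row = conn.execute(
    "SELECT marks FROM score WHERE student_id = 10 AND subject_id = 1"
).fetchall()

# The composite key also rejects a second row for the same pair.
try:
    conn.execute("INSERT INTO score VALUES (10, 1, 99, 'Java Teacher')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```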
But where is Partial Dependency?
• Now if you look at the Score table, we have a column named teacher
which depends only on the subject: for Java it's Java Teacher, for
C++ it's C++ Teacher, and so on.
• As we just discussed, the primary key for this table is a
composite of two columns, student_id and subject_id, but the
teacher's name depends only on the subject, hence on subject_id, and has
nothing to do with student_id. This is a partial dependency.
• The simplest solution is to remove the teacher column from the Score table
and add it to the Subject table. Hence, the Subject table will become:
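As a rough sketch of this move, the teacher column can be lifted out of Score and stored once per subject. The subject ids and marks are hypothetical; the subject and teacher names follow the example in the text.

```python
# Score table before the change: (student_id, subject_id, marks, teacher)
score = [
    (10, 1, 70, "Java Teacher"),
    (10, 2, 75, "C++ Teacher"),
    (11, 1, 80, "Java Teacher"),
]
subject_names = {1: "Java", 2: "C++"}

# teacher depends only on subject_id, so it is stored once per subject.
subject = {}
for _student_id, subject_id, _marks, teacher in score:
    subject[subject_id] = (subject_names[subject_id], teacher)

# Score keeps only the attribute that depends on the whole composite key.
score_2nf = [(student_id, subject_id, marks)
             for student_id, subject_id, marks, _teacher in score]

print(subject)    # {1: ('Java', 'Java Teacher'), 2: ('C++', 'C++ Teacher')}
print(score_2nf)  # [(10, 1, 70), (10, 2, 75), (11, 1, 80)]
```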
Quick Recap
• For a table to be in the Second Normal form, it should be in the First
Normal form and it should not have Partial Dependency.
• Partial dependency exists when, for a composite primary key, an
attribute in the table depends on only a part of the primary key and not
on the complete primary key.
• To remove partial dependency, we can divide the table, remove the
attribute which is causing the partial dependency, and move it to another
table where it fits well.
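One way to look for partial dependencies in sample data is sketched below. The helper names determines and partial_dependencies are hypothetical. Note the limitation: a check over sample rows can only show that a dependency is not contradicted by the data, while the real dependencies come from the rules of the domain being modelled.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def partial_dependencies(rows, key, non_key_columns):
    """List (key_part, column) pairs where a proper subset of the key determines column."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for col in non_key_columns:
                if determines(rows, part, col):
                    found.append((part, col))
    return found

# Hypothetical Score rows with the composite key (student_id, subject_id).
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

result = partial_dependencies(score, ["student_id", "subject_id"], ["marks", "teacher"])
print(result)  # [(('subject_id',), 'teacher')]
```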
Third Normal Form (3NF)
• Third Normal Form is an upgrade to Second Normal Form. When a
table is in the Second Normal Form and has no transitive
dependency, then it is in the Third Normal Form.
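A transitive dependency can be sketched with a small example that is not part of the lecture: assume a student table where student_id determines zip_code and zip_code determines city. All names and values here are hypothetical.

```python
# Hypothetical table: (student_id, name, zip_code, city).
# city depends on the key only through zip_code (a transitive dependency).
student = [
    (1, "Student A", "1001", "City X"),
    (2, "Student B", "1001", "City X"),
    (3, "Student C", "2002", "City Y"),
]

# Move the zip_code -> city fact into its own table.
zip_city = {zip_code: city for _sid, _name, zip_code, city in student}

# The student table keeps only attributes that depend directly on the key.
student_3nf = [(sid, name, zip_code) for sid, name, zip_code, _city in student]

# Joining the two tables again reproduces the original rows.
rebuilt = [(sid, name, zip_code, zip_city[zip_code])
           for sid, name, zip_code in student_3nf]
```

The city for each zip code is now stored once instead of once per student.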
Boyce-Codd Normal Form (BCNF)
• One student can enrol in multiple subjects. For example, the student with
student_id 101 has opted for the subjects Java and C++.
• For each subject, a professor is assigned to the student.
• And there can be multiple professors teaching one subject, as we have
for Java.
What do you think should be the primary key?
Well, in the table above, student_id and subject together form the primary key,
because using student_id and subject we can find all the columns of the
table.
One more important point to note here is that one professor teaches only one
subject, but one subject may have two different professors.
Hence, there is a dependency between subject and professor, where
subject depends on the professor name.
This table satisfies the 1st Normal Form because all the values are atomic,
column names are unique, and all the values stored in a particular column
are of the same domain.
This table also satisfies the 2nd Normal Form, as there is no partial
dependency.
And there is no transitive dependency, hence the table also satisfies the
3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.
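Why is this table not in BCNF?
In the table above, student_id and subject form the primary key, which means
the subject column is a prime attribute.
But professor determines subject, and professor on its own is not a key of the
table. BCNF requires the determinant of every non-trivial dependency to be a
super key, so this dependency violates it.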
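The BCNF condition can be checked mechanically with an attribute closure, as in the sketch below. The helper names closure and bcnf_violations are hypothetical, and the code assumes every listed dependency is non-trivial. The two dependencies are the ones described for this table.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by the given attributes."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attributes, fds):
    """Return the dependencies whose left-hand side is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != set(all_attributes)]

attributes = {"student_id", "subject", "professor"}
fds = [
    (("student_id", "subject"), ("professor",)),  # the primary key determines professor
    (("professor",), ("subject",)),               # one professor teaches only one subject
]

violations = bcnf_violations(attributes, fds)
print(violations)  # [(('professor',), ('subject',))]
```

The closure of professor contains only professor and subject, so professor is not a super key and the second dependency is reported.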
Fourth Normal Form (4NF)
Below we have a college enrolment table with columns s_id, course and
hobby.
As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
You must be wondering what problem this can lead to, right?
Well, the two records for the student with s_id 1 will give rise to two more
records, as shown below, because the student has two hobbies,
and both hobbies have to be recorded alongside each of the courses.
And, in the table above, there is no relationship between the columns
course and hobby. They are independent of each other.
So there is a multi-valued dependency, which leads to unnecessary
repetition of data and other anomalies as well.
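The repetition caused by two independent multi-valued attributes can be sketched as a cross product. The helper name enrolment_rows is hypothetical; the courses and hobbies for s_id 1 are the ones from the example, and the third hobby is an added assumption.

```python
from itertools import product

def enrolment_rows(s_id, courses, hobbies):
    """One row per (course, hobby) combination for a single student."""
    return [(s_id, course, hobby) for course, hobby in product(courses, hobbies)]

rows = enrolment_rows(1, ["Science", "Maths"], ["Cricket", "Hockey"])
for row in rows:
    print(row)  # 2 courses x 2 hobbies = 4 rows

# A hypothetical third hobby adds one extra row for every course.
more_rows = enrolment_rows(1, ["Science", "Maths"], ["Cricket", "Hockey", "Chess"])
print(len(more_rows))  # 6
```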
How to satisfy 4th Normal Form?
• To make the above relation satisfy the 4th Normal Form, we can
decompose the table into 2 tables.
Now this relation satisfies the fourth normal form.
A table can also have functional dependencies along with multi-valued dependencies.
In that case, the functionally dependent columns are moved into one separate table
and the multi-valued dependent columns are moved into separate tables of their own.
If you design your database carefully, you can easily avoid these issues.
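The decomposition into two tables can be sketched as follows, together with a check that joining them on s_id gives back the original rows. The table names course_table and hobby_table are hypothetical; the data is for the student with s_id 1 from the example.

```python
# Enrolment rows for the student with s_id 1: (s_id, course, hobby)
enrolment = [
    (1, "Science", "Cricket"),
    (1, "Science", "Hockey"),
    (1, "Maths", "Cricket"),
    (1, "Maths", "Hockey"),
]

# Each independent multi-valued fact gets its own table.
course_table = sorted({(s_id, course) for s_id, course, _hobby in enrolment})
hobby_table = sorted({(s_id, hobby) for s_id, _course, hobby in enrolment})

# Joining the two tables on s_id reproduces the original rows,
# so no information is lost by the split.
rejoined = {
    (s_id, course, hobby)
    for s_id, course in course_table
    for other_id, hobby in hobby_table
    if s_id == other_id
}

print(course_table)  # [(1, 'Maths'), (1, 'Science')]
print(hobby_table)   # [(1, 'Cricket'), (1, 'Hockey')]
```

Four combined rows shrink to two rows in each table, and a new hobby now needs only one new row.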
• 5NF (Fifth Normal Form) Rules
• A table is in 5th Normal Form only if it is in 4NF and it cannot be
decomposed into any number of smaller tables without loss of data.
• 6NF (Sixth Normal Form) Proposed
• 6th Normal Form is not yet standardized; however, it has been discussed
for some time. Hopefully, we will have a clear and standardized
definition in the near future…
• That’s all for SQL normalization!
Thank you