Data Base_Database_Databse Chapter 9

The document provides an overview of database normalization, explaining its importance in organizing data to eliminate redundancy and prevent anomalies during data operations. It details the various forms of normalization, including First, Second, and Third Normal Forms, as well as Boyce-Codd Normal Form and Fourth Normal Form, highlighting the rules and conditions for each. The document also illustrates common issues related to data redundancy and the solutions provided by normalization techniques.

Uploaded by

shabir.ahmad1317
Copyright
© © All Rights Reserved
Faculty of Computer Science

Fundamentals of Database Systems
(Database I)

Lecturer: Barakzai

Normalization
What is normalization in a database?
• Database normalization is a technique for organizing
the data in a database.
• Normalization is a systematic approach of
decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics
such as insertion, update and deletion anomalies.
• It is a multi-step process that puts data into tabular
form and removes duplicated data from the relation tables.
• Normalization is used mainly for two purposes:
• Eliminating redundant (repeated) data.
• Ensuring data dependencies make sense, i.e. data is logically
stored.
Why is normalization important in a database?

• Normalization is a technique for organizing data in a
database. It is important that a database is normalized to
minimize redundancy (duplicate data) and to ensure only
related data is stored in each table. It also prevents
issues stemming from database modifications such as
insertions, deletions, and updates.
• We will start by looking at the problems that arise when a table
or database is not normalized, and at how normalization solves
them.
• Normalization organizes the data into
multiple related tables to minimize data redundancy.
• What is data redundancy?
• Data redundancy is the repetition of the same data in
multiple places.
• Why do we want to reduce data redundancy?
• Because repeated data increases the
size of the database.
It also leads to other issues:
• Insertion problems
• Deletion problems
• Update problems
Roll number   Name    Branch   Head of department   Office telephone
1             Ali     BCS      Mr.Sabit             12345
2             Ahmad   BCS      Mr.Sabit             12345
3             Sara    BCS      Mr.Sabit             12345
4             Omar    BCS      Mr.Sabit             12345

• In this table the branch, head of department and
office telephone number are the same for every entry, so
whenever a new entry is added we must repeat the same
values again. This is called data redundancy.
• Data redundancy leads to:
• Insertion anomaly
• Deletion anomaly
• Update anomaly
• Insertion anomaly: to create a new entry in the table we have to
type the same branch data again, and doing this for 500 more
students wastes space.
(Having to insert redundant data for every new student row
is a data insertion problem, or anomaly.)
• Deletion anomaly: the branch information is stored only inside
the student rows, so if we delete all the students from the table
we lose the entire branch information as well.
(Loss of a related dataset when some other dataset is deleted is
called a deletion anomaly.)
• Update anomaly: if the head of department leaves the university
and a new head arrives, we have to update every student
row. If we miss any of these rows, the data becomes
inconsistent.
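The update anomaly can be illustrated with a minimal Python sketch. The rows mirror the sample student table; the replacement name "Mr.Karim" and the helper heads_of are hypothetical and introduced only for illustration.

```python
# Unnormalized student rows: the branch details are repeated in every row.
students = [
    {"roll": 1, "name": "Ali",   "branch": "BCS", "hod": "Mr.Sabit", "phone": "12345"},
    {"roll": 2, "name": "Ahmad", "branch": "BCS", "hod": "Mr.Sabit", "phone": "12345"},
    {"roll": 3, "name": "Sara",  "branch": "BCS", "hod": "Mr.Sabit", "phone": "12345"},
    {"roll": 4, "name": "Omar",  "branch": "BCS", "hod": "Mr.Sabit", "phone": "12345"},
]


def heads_of(rows, branch):
    """Return the distinct head-of-department values stored for a branch."""
    return {r["hod"] for r in rows if r["branch"] == branch}


# A new head arrives, but the update misses the last row (hypothetical name).
for row in students[:3]:
    row["hod"] = "Mr.Karim"

# More than one head stored for the same branch means the data is inconsistent.
inconsistent = len(heads_of(students, "BCS")) > 1
print(inconsistent)  # True
```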
So how does normalization solve these problems?
Normalization splits the existing table into two different
tables.
Student table
Roll number   Name    Branch
1             Ali     BCS
2             Ahmad   BCS
3             Sara    BCS

Branch table
Branch   Head of department   Office telephone
BCS      Mr.Sabit             12345
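As a rough sketch of why the split helps, the two tables can be created in an in-memory SQLite database using Python's standard sqlite3 module. The column names hod and phone and the new head's name "Mr.Karim" are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (
        branch TEXT PRIMARY KEY,
        hod    TEXT,
        phone  TEXT
    );
    CREATE TABLE student (
        roll   INTEGER PRIMARY KEY,
        name   TEXT,
        branch TEXT REFERENCES branch(branch)
    );
    INSERT INTO branch VALUES ('BCS', 'Mr.Sabit', '12345');
    INSERT INTO student VALUES (1, 'Ali', 'BCS'), (2, 'Ahmad', 'BCS'), (3, 'Sara', 'BCS');
""")

# Changing the head of department now touches a single row (hypothetical name).
updated = conn.execute(
    "UPDATE branch SET hod = 'Mr.Karim' WHERE branch = 'BCS'"
).rowcount

# Deleting every student no longer deletes the branch information.
conn.execute("DELETE FROM student")
branches_left = conn.execute("SELECT branch, hod, phone FROM branch").fetchall()
```

After the split, the branch details are stored once, so the update and deletion anomalies described earlier do not arise for them.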
Types of normalization
• Normalization is carried out in stages called normal forms.
• There are three basic normal forms, followed by further,
stricter forms.
• 1st normal form
• 2nd normal form
• 3rd normal form
• BCNF (Boyce-Codd normal form)
• 4th normal form
• 5th normal form
What is First Normal Form (1NF)?
• Rules for First Normal Form
The first normal form expects you to follow a few simple rules
while designing your database:
• Rule 1: Single-valued attributes
Each column of your table should be single valued, which means
it should not contain multiple values. An example follows after
the remaining rules.
• Rule 2: Attribute domain should not change
This is more of a "common sense" rule. In each column the
values stored must be of the same kind or type.
For example: if you have a column dob to store the dates of birth of a set of
people, you must not store the names of some of them in
that column alongside the dates of birth of others. It should
hold only a date of birth for every record/row.

• Rule 3: Unique names for attributes/columns
Each column in a table should have a unique name.
This avoids confusion when retrieving data or performing any
other operation on the stored data.
If two or more columns have the same name, the DBMS cannot
tell them apart.
• Rule 4: Order doesn't matter
The order in which you store the data in your table
doesn't matter.

Example:
• Although the rules are self-explanatory, let's take an
example where we create a table to store student data: each
student's roll no., name and the names of the subjects
they have opted for.
• Here is our table, with some sample data added to it.
• Our table already satisfies 3 of the 4 rules: all our
column names are unique, we have stored the data in the order we
wanted, and we have not mixed different types of data in
a column.
• But of the 3 students in our table, 2 have opted for
more than 1 subject, and we have stored the subject names in a
single column. As per the 1st Normal Form, each column must
contain atomic values.
How to solve this problem?
• All we have to do is break the values into
atomic values.
• Here is our updated table, and it now satisfies the First Normal
Form.
• A few values are now repeated, but the values in
the subject column are atomic for each record/row.

• With the First Normal Form, data redundancy increases, as the same
data appears in some columns across multiple rows, but each row as a whole
is unique.
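The step from a multi-valued subject column to atomic values can be sketched in Python. The sample rows are hypothetical (the slide's own data is not reproduced here); they only follow the description of 3 students, 2 of whom opted for more than 1 subject.

```python
# Hypothetical sample data: the subject column holds several values per row.
unnormalized = [
    (1, "Ali", "OS, CN"),
    (2, "Ahmad", "Java"),
    (3, "Sara", "C, C++"),
]


def to_first_normal_form(rows):
    """Emit one row per (roll_no, name, subject) so every value is atomic."""
    atomic = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic.append((roll_no, name, subject.strip()))
    return atomic


rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

The roll number and name are now repeated for students with several subjects, which is the increase in redundancy noted above, while each row as a whole stays unique.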
What is Second Normal Form?
• For a table to be in the Second Normal Form, it must satisfy two
conditions:
1. The table should be in the First Normal Form.
2. There should be no Partial Dependency.

• What is Dependency?
• Let's take an example of a Student table with columns student_id, name,
reg_no (registration number), branch and address (student's home
address).
• In this table, student_id is the primary key and is unique for every
row, so we can use student_id to fetch any row of data from this
table.
• Even when two students have the same name, if we know the
student_id we can easily fetch the correct record.
• Hence a Primary Key for a table is the column or group of
columns (composite key) that uniquely identifies each record in the
table.

• I can ask for the branch name of the student with student_id 10 and get
it. Similarly, if I ask for the name of the student with student_id 10 or 11, I will
get it. So all I need is student_id; every other column depends on it
and can be fetched using it.

• This is Dependency, and we also call it Functional Dependency.
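One way to make functional dependency concrete is to check it against sample rows. The sketch below is only an illustration: the helper holds and the sample values are hypothetical, and a check on data can show that a dependency is violated but cannot prove that it holds for every possible row.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False  # the same lhs value maps to two different rhs values
    return True


# Hypothetical Student rows: two students share a name but not a student_id.
students = [
    {"student_id": 10, "name": "Ali", "reg_no": "R-07", "branch": "BCS", "address": "Kabul"},
    {"student_id": 11, "name": "Ali", "reg_no": "R-08", "branch": "IT", "address": "Herat"},
]

id_determines_rest = holds(students, ("student_id",), ("name", "branch", "address"))
name_determines_branch = holds(students, ("name",), ("branch",))
```

Here id_determines_rest is True, while name_determines_branch is False because the same name appears with two different branches.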


What is Partial Dependency?
• Now that we know what dependency is, we are in a better position to
understand partial dependency.

• For a simple table like Student, a single column like student_id can
uniquely identify all the records in the table.

• But this is not always true. Let's extend our example to see how
more than 1 column together can act as a primary key.

• Let's create another table, Subject, with subject_id and
subject_name fields, where subject_id is the primary key.
• Now we have a Student table with student information and
another table, Subject, for storing subject information.
• Let's create another table, Score, to store the marks obtained by
students in the respective subjects. We will also save the name
of the teacher who teaches that subject along with the marks.
• In the Score table we save the student_id to know which student's
marks these are, and the subject_id to know which subject the marks are
for.

• Together, student_id + subject_id form a Candidate Key
for this table, which can be the Primary Key.

• How can this combination be a primary key?

• If I ask you for the marks of the student with student_id 10, can you
get them from this table? No, because you don't know for which subject.
And if I give you only a subject_id, you would not know for which student.
Hence we need student_id + subject_id to uniquely identify any row.
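The argument for the composite key can be checked on sample data with a short Python sketch. The helper is_unique_key and the marks shown are hypothetical values chosen for illustration.

```python
def is_unique_key(rows, columns):
    """Return True if the given columns take a distinct value combination in every row."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical Score rows.
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70},
    {"student_id": 10, "subject_id": 2, "marks": 75},
    {"student_id": 11, "subject_id": 1, "marks": 80},
]

by_student = is_unique_key(score, ("student_id",))                 # False
by_subject = is_unique_key(score, ("subject_id",))                 # False
by_both = is_unique_key(score, ("student_id", "subject_id"))       # True
```

Neither column identifies a row on its own, but the pair does.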
But where is Partial Dependency?
• In the Score table we have a column named teacher,
which depends only on the subject: for Java it's Java Teacher, for
C++ it's C++ Teacher, and so on.

• As we just discussed, the primary key for this table is a
composition of two columns, student_id and subject_id, but the
teacher's name depends only on the subject, hence on subject_id, and has
nothing to do with student_id.

• This is Partial Dependency, where an attribute in a table depends on only
a part of the primary key and not on the whole key.
How to remove Partial Dependency?
• There can be many different solutions, but our objective is to
remove the teacher's name from the Score table.

• The simplest solution is to remove the teacher column from the Score table and
add it to the Subject table, which then holds subject_id, subject_name and teacher.
Quick Recap
• For a table to be in the Second Normal Form, it should be in the First
Normal Form and it should not have Partial Dependency.
• Partial Dependency exists when, for a composite primary key, any
attribute in the table depends only on a part of the primary key and not
on the complete primary key.
• To remove Partial Dependency, we can divide the table, remove the
attribute that is causing the partial dependency, and move it to another
table where it fits in well.
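Moving the teacher column from Score to Subject can be sketched as follows. The rows are hypothetical; only the column names and the teacher labels "Java Teacher" and "C++ Teacher" come from the text.

```python
# Score before 2NF: teacher depends on subject_id alone (partial dependency).
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]
subject = {
    1: {"subject_name": "Java"},
    2: {"subject_name": "C++"},
}

# Store each teacher once, keyed by subject_id.
for row in score:
    subject[row["subject_id"]]["teacher"] = row["teacher"]

# Drop the teacher column from Score.
score_2nf = [{k: v for k, v in row.items() if k != "teacher"} for row in score]
```

Every remaining non-key column of score_2nf (here only marks) depends on the whole key student_id + subject_id.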
Third Normal Form (3NF)
• Third Normal Form is an upgrade to Second Normal Form. When a
table is in the Second Normal Form and has no transitive
dependency, it is in the Third Normal Form.

So let's use the same example, where we have 3 tables: Student,
Subject and Score.
Requirements for Third Normal Form
• For a table to be in the Third Normal Form:
1. It should be in the Second Normal Form.
2. It should not have a Transitive Dependency.

What is Transitive Dependency?

With exam_name and total_marks added, our Score table now
stores more data. The primary key for our Score table is a
composite key, which means it's made up of two attributes or
columns → student_id + subject_id.
Our new column exam_name depends on both student and subject.
For example, a mechanical engineering student will have a Workshop
exam but a computer science student won't.
And some subjects have Practical exams while others
don't. So we can say that exam_name depends on both
student_id and subject_id.

What about our second new column, total_marks? Does it
depend on our Score table's primary key?
The column total_marks depends on exam_name, because the total score
changes with the exam type. For example, practicals carry fewer
marks while theory exams carry more.

But exam_name is just another column in the Score table. It is not the
primary key or even a part of the primary key, and total_marks
depends on it.

This is Transitive Dependency: a non-prime attribute depends
on other non-prime attributes rather than on the prime
attributes or primary key.
How to remove Transitive Dependency?
• Again the solution is simple. Take the columns exam_name and
total_marks out of the Score table, put them in an Exam table, and use the
exam_id wherever required.
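The move to an Exam table can be sketched in Python. The sample rows, the exam names "Theory" and "Practical" with their totals, and the way exam_id values are assigned are all assumptions made for this example.

```python
# Score before 3NF: total_marks depends on exam_name, a non-key column.
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "exam_name": "Theory", "total_marks": 100},
    {"student_id": 10, "subject_id": 2, "marks": 25, "exam_name": "Practical", "total_marks": 50},
    {"student_id": 11, "subject_id": 1, "marks": 80, "exam_name": "Theory", "total_marks": 100},
]

exam = {}      # exam_id -> {"exam_name": ..., "total_marks": ...}
exam_ids = {}  # exam_name -> exam_id
score_3nf = []

for row in score:
    name = row["exam_name"]
    if name not in exam_ids:
        # First time this exam type is seen: give it the next exam_id.
        exam_ids[name] = len(exam_ids) + 1
        exam[exam_ids[name]] = {"exam_name": name, "total_marks": row["total_marks"]}
    score_3nf.append({
        "student_id": row["student_id"],
        "subject_id": row["subject_id"],
        "marks": row["marks"],
        "exam_id": exam_ids[name],
    })
```

Each exam type and its total marks are now stored once in exam, and score_3nf refers to them through exam_id.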
Advantage of removing Transitive Dependency
The advantage of removing transitive dependency is:
• Amount of data duplication is reduced.
• Data integrity is achieved.
Boyce-Codd Normal Form (BCNF)
Rules for BCNF
• For a table to satisfy the Boyce-Codd Normal Form, it should
satisfy the following two conditions:
1. It should be in the Third Normal Form.
2. For any dependency A → B, A should be a super key.
• The second point sounds a bit tricky, right? In simple words, it
means that for a dependency A → B, A cannot be a non-prime
attribute if B is a prime attribute.
Below we have a college enrolment table with
columns student_id, subject and professor.
• As you can see, we have also added some sample data to the table.

• In the table above:

• One student can enrol for multiple subjects. For example, the student with
student_id 101 has opted for the subjects Java and C++.
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject, as we have
for Java.
What do you think should be the Primary Key?

In the table above, student_id and subject together form the primary key,
because using student_id and subject we can find all the columns of the
table.
One more important point to note is that one professor teaches only one
subject, but one subject may have two different professors.
Hence there is a dependency between subject and professor, where
subject depends on the professor name.
This table satisfies the 1st Normal Form because all the values are atomic,
column names are unique and all the values stored in a particular column
are of the same domain.

This table also satisfies the 2nd Normal Form, as there is no Partial
Dependency.

And there is no Transitive Dependency, hence the table also satisfies the
3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.
Why is this table not in BCNF?
In the table above, student_id and subject form the primary key, which means
the subject column is a prime attribute.

But there is one more dependency, professor → subject.

And while subject is a prime attribute, professor is a non-prime attribute,
which is not allowed by BCNF.
How to satisfy BCNF?
• To make this relation (table) satisfy BCNF, we decompose the
table into two tables, a student table and a professor table.

• Below we have the structure for both the tables.
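The super key condition can be checked mechanically with an attribute closure. The sketch below is a minimal illustration, not the slides' own method: the helpers closure and bcnf_violations are hypothetical names, and the two dependencies are the ones stated in the text.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the non-trivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not set(relation) <= closure(lhs, fds)
    ]


enrolment = ("student_id", "subject", "professor")
fds = [
    (("student_id", "subject"), ("professor",)),
    (("professor",), ("subject",)),
]

violations = bcnf_violations(enrolment, fds)

# A relation holding only professor and subject has professor as its key.
professor_table_ok = bcnf_violations(
    ("professor", "subject"), [(("professor",), ("subject",))]
) == []
```

Only professor → subject is reported for the enrolment table, because professor alone does not determine student_id and so is not a super key.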


Rules for 4th Normal Form
• For a table to satisfy the Fourth Normal Form, it should satisfy the
following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. The table should not have any Multi-valued Dependency.

What is Multi-valued Dependency?

A table is said to have a multi-valued dependency if the following conditions
are true:
For a dependency A → B, if multiple values of B exist for a single value of A,
then the table may have a multi-valued dependency.
Also, a table should have at least 3 columns for it to have a multi-valued
dependency.
And, for a relation R(A,B,C), if there is a multi-valued dependency
between A and B, then B and C should be independent of each other.
If all these conditions are true for a relation (table), it is said to have a
multi-valued dependency.

Below we have a college enrolment table with columns s_id, course and
hobby.
As you can see in the table above, the student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
You must be wondering what problem this can lead to.
The two records for the student with s_id 1 give rise to two more
records, as shown below, because the student has two hobbies and
each hobby must be recorded along with both courses.
And, in the table above, there is no relationship between the columns
course and hobby. They are independent of each other.
So there is a multi-valued dependency, which leads to unnecessary
repetition of data and to other anomalies as well.
How to satisfy 4th Normal Form?
• To make the above relation satisfy the 4th Normal Form, we can
decompose the table into 2 tables.
Now this relation satisfies the Fourth Normal Form.
A table can also have a functional dependency along with a multi-valued dependency.
In that case, the functionally dependent columns are moved to a separate table
and the multi-valued dependent columns are moved to separate tables.
If you design your database carefully, you can easily avoid these issues.
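The decomposition for the Fourth Normal Form can be sketched with Python sets, using the courses and hobbies given for the student with s_id 1. The table names course and hobby are assumptions, and the join back is included only to suggest that no information is lost.

```python
# Enrolment rows (s_id, course, hobby): every course is paired with every hobby.
enrolment = {
    (1, "Science", "Cricket"),
    (1, "Maths", "Hockey"),
    (1, "Science", "Hockey"),
    (1, "Maths", "Cricket"),
}

# Decompose into two tables, one per independent multi-valued fact.
course = {(s_id, c) for s_id, c, _ in enrolment}
hobby = {(s_id, h) for s_id, _, h in enrolment}

# Joining the two tables on s_id gives back the original rows.
rejoined = {
    (s_id, c, h)
    for s_id, c in course
    for other_id, h in hobby
    if s_id == other_id
}
```

The four original rows shrink to two rows in each table, and adding a third hobby would need one new row instead of one per course.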
• 5NF (Fifth Normal Form) Rules
• A table is in 5th Normal Form only if it is in 4NF and it cannot be
decomposed into any number of smaller tables without loss of data.
• 6NF (Sixth Normal Form) Proposed
• 6th Normal Form is not yet standardized; however, it has been discussed
for some time. Hopefully, we will have a clear and standardized
definition in the near future…
• That's all for SQL Normalization!
Thank you
