Normalization

Normalization is the process of transforming a database relation into a set of relations that minimize redundancy and dependency to reduce data anomalies. It involves removing repeating groups and partial and transitive dependencies through a series of normal forms up to Boyce-Codd normal form. The document discusses the goals, theory, and anomalies of normalization, and provides examples of transforming relations into first, second, third, and Boyce-Codd normal forms through removing redundancy and dependencies.

What Is Normalization?

Normalization is the branch of relational theory that provides design insights. It is the process of
determining how much redundancy exists in a table. The goals of normalization are to:

• Be able to characterize the level of redundancy in a relational schema
• Provide mechanisms for transforming schemas in order to remove redundancy

Normalization theory draws heavily on the theory of functional dependencies. Normalization
theory defines six normal forms (NF). Each normal form involves a set of dependency properties
that a schema must satisfy, and each normal form gives guarantees about the presence and/or
absence of update anomalies. This means that higher normal forms have less redundancy and, as
a result, fewer update problems.

If a database design is flawed, it may contain anomalies, which make the data hard to keep
consistent and the database hard to manage. The main kinds of anomalies are:

• Update anomalies: If copies of the same data item are scattered over several places and
are not linked to each other properly, an update may change some copies while leaving
others with the old value. Such an update leaves the database in an inconsistent state.
• Deletion anomalies: Deleting a record can leave parts of its data behind because copies
are also stored somewhere else. Conversely, deleting the only row that records a fact can
unintentionally remove other information stored in the same row.
• Insert anomalies: A fact cannot be inserted because the record it would belong to does
not exist yet. For example, a new course cannot be stored until at least one student
enrolls in it.

Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
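To make the update anomaly concrete, here is a minimal Python sketch. The rows and their values are invented for illustration and are not taken from the School database; only the column names follow the example used later in this document.

```python
# Hypothetical denormalized rows: the instructor's location is repeated
# in every row that mentions that instructor.
rows = [
    {"StudentNo": 1, "CourseNo": "C1", "InstructorNo": 10, "InstructorLocation": "Room 101"},
    {"StudentNo": 2, "CourseNo": "C1", "InstructorNo": 10, "InstructorLocation": "Room 101"},
]

# Update anomaly: only one copy of the location is changed.
rows[0]["InstructorLocation"] = "Room 202"


def locations_for(instructor_no, table):
    """Return the set of distinct locations recorded for one instructor."""
    return {r["InstructorLocation"] for r in table if r["InstructorNo"] == instructor_no}


# More than one location for the same instructor means the data is inconsistent.
inconsistent = len(locations_for(10, rows)) > 1
print(inconsistent)  # True
```

If the location were stored once, in a separate instructor table, the same update would touch a single row and no inconsistency could arise.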

Normal Forms
Every table in a database can be classified by the highest normal form it satisfies. Ideally, the
only redundancy we want is the repetition of primary key (PK) values as foreign keys (FK) in
related tables; everything else should be derivable from other tables. There are six normal forms,
but we will only look at the first four, which are:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)

BCNF is rarely needed as a separate step, because most relations that are in 3NF are also in BCNF.


First Normal Form (1NF)
In the first normal form, only single values are permitted at the intersection of each row and
column; hence, there are no repeating groups.

To normalize a relation that contains a repeating group, remove the repeating group and form
two new relations.

The PK of the new relation is a combination of the PK of the original relation plus an attribute
from the newly created relation for unique identification.

Process for 1NF

We will use the Student_Grade_Report table below, from a School database, as our example to
explain the process for 1NF.

Student_Grade_Report (StudentNo, StudentName, Major, CourseNo, CourseName, InstructorNo,
InstructorName, InstructorLocation, Grade)

• In the Student_Grade_Report table, the repeating group is the course information, because a
student can take many courses.
• Remove the repeating group. In this case, it's the course information for each student.
• Identify the PK for your new table.
• The PK must uniquely identify each row; here it is the combination of StudentNo and CourseNo.
• Moving the course-related attributes, together with StudentNo, into a new table gives the
student course table (StudentCourse).
• The Student table (Student) is now in first normal form with the repeating group removed.
• The two new tables are shown below.

Student (StudentNo, StudentName, Major)


StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation,
Grade)

First normal form also requires that all the attributes in a relation have atomic domains. The
values in an atomic domain are indivisible units.
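The 1NF split described above can be sketched in Python as two projections of the original rows. The sample data is hypothetical (the document gives only the column names), and the helper `project` is an illustrative name, not part of any library.

```python
# Hypothetical sample rows for Student_Grade_Report (all values are invented).
report = [
    {"StudentNo": 1, "StudentName": "Ann", "Major": "Math",
     "CourseNo": "C1", "CourseName": "Databases", "InstructorNo": 10,
     "InstructorName": "Lee", "InstructorLocation": "Room 101", "Grade": "A"},
    {"StudentNo": 1, "StudentName": "Ann", "Major": "Math",
     "CourseNo": "C2", "CourseName": "Networks", "InstructorNo": 11,
     "InstructorName": "Kim", "InstructorLocation": "Room 102", "Grade": "B"},
    {"StudentNo": 2, "StudentName": "Bob", "Major": "Physics",
     "CourseNo": "C1", "CourseName": "Databases", "InstructorNo": 10,
     "InstructorName": "Lee", "InstructorLocation": "Room 101", "Grade": "B"},
]

STUDENT_COLS = ("StudentNo", "StudentName", "Major")
STUDENT_COURSE_COLS = ("StudentNo", "CourseNo", "CourseName", "InstructorNo",
                       "InstructorName", "InstructorLocation", "Grade")


def project(rows, cols):
    """Project rows onto cols and drop duplicate tuples, keeping first-seen order."""
    seen = {}
    for r in rows:
        seen.setdefault(tuple(r[c] for c in cols), None)
    return list(seen)


student = project(report, STUDENT_COLS)                # one row per student
student_course = project(report, STUDENT_COURSE_COLS)  # one row per (student, course)

print(len(student), len(student_course))  # 2 3
```

Student information is now stored once per student, while StudentCourse keeps one row per (StudentNo, CourseNo) pair.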
Second Normal Form (2NF)
For the second normal form, the relation must first be in 1NF. A relation in 1NF is automatically
in 2NF if its PK comprises a single attribute, because a partial dependency requires a composite key.

If the relation has a composite PK, then each non-key attribute must be fully dependent on the
entire PK and not on a subset of the PK (i.e., there must be no partial dependency or
augmentation).

Process for 2NF

To move to 2NF, a table must first be in 1NF.

• The Student table is already in 2NF because it has a single-column PK.
• When examining the StudentCourse table, we see that not all the attributes are fully dependent
on the PK; specifically, the course and instructor information depends on CourseNo alone. The
only attribute that is fully dependent on the whole PK is Grade.
• Identify the new table that contains the course information.
• Identify the PK for the new table.
• The three new tables are shown below.

Student (StudentNo, StudentName, Major)


CourseGrade (StudentNo, CourseNo, Grade)

CourseInstructor (CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation)


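This decomposition removes the partial dependencies: the course and instructor attributes no
longer sit in a table whose PK includes StudentNo.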
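A partial dependency can be spotted by testing functional dependencies against sample data, as in the Python sketch below. The rows are hypothetical and `fd_holds` is an illustrative helper. Note that data can only refute a dependency; whether it really holds is a matter of business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical StudentCourse rows (instructor columns omitted for brevity).
student_course = [
    {"StudentNo": 1, "CourseNo": "C1", "CourseName": "Databases", "Grade": "A"},
    {"StudentNo": 1, "CourseNo": "C2", "CourseName": "Networks", "Grade": "B"},
    {"StudentNo": 2, "CourseNo": "C1", "CourseName": "Databases", "Grade": "B"},
]

# CourseName is determined by CourseNo alone: a partial dependency on the PK.
partial = fd_holds(student_course, ["CourseNo"], ["CourseName"])
# Grade is not determined by CourseNo alone ...
grade_by_course = fd_holds(student_course, ["CourseNo"], ["Grade"])
# ... but it is determined by the whole PK: a full dependency.
grade_by_pk = fd_holds(student_course, ["StudentNo", "CourseNo"], ["Grade"])

print(partial, grade_by_course, grade_by_pk)  # True False True
```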

Third Normal Form (3NF)


To be in third normal form, the relation must be in second normal form. Also all transitive
dependencies must be removed; a non-key attribute may not be functionally dependent on
another non-key attribute.

Process for 3NF

• Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have
a transitive relationship.
• Create new table(s) with removed dependency.
• Check new table(s) as well as table(s) modified to make sure that each table has a determinant
and that no table contains inappropriate dependencies.
• See the four new tables below.

Student (StudentNo, StudentName, Major)


CourseGrade (StudentNo, CourseNo, Grade)

Course (CourseNo, CourseName, InstructorNo)

Instructor (InstructorNo, InstructorName, InstructorLocation)
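As a rough illustration, the four tables can be declared with Python's built-in sqlite3 module. The column types, the foreign-key clauses, and the sample values are assumptions; the document specifies only table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Instructor (
    InstructorNo       INTEGER PRIMARY KEY,
    InstructorName     TEXT,
    InstructorLocation TEXT
);
CREATE TABLE Course (
    CourseNo     TEXT PRIMARY KEY,
    CourseName   TEXT,
    InstructorNo INTEGER REFERENCES Instructor(InstructorNo)
);
CREATE TABLE Student (
    StudentNo   INTEGER PRIMARY KEY,
    StudentName TEXT,
    Major       TEXT
);
CREATE TABLE CourseGrade (
    StudentNo INTEGER REFERENCES Student(StudentNo),
    CourseNo  TEXT REFERENCES Course(CourseNo),
    Grade     TEXT,
    PRIMARY KEY (StudentNo, CourseNo)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Instructor VALUES (10, 'Lee', 'Room 101')")
conn.execute("INSERT INTO Course VALUES ('C1', 'Databases', 10)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(1, "Ann", "Math"), (2, "Bob", "Physics")])
conn.executemany("INSERT INTO CourseGrade VALUES (?, ?, ?)",
                 [(1, "C1", "A"), (2, "C1", "B")])

# The location is stored once, so a single-row update reaches every enrollment.
conn.execute("UPDATE Instructor SET InstructorLocation = 'Room 202' WHERE InstructorNo = 10")
locations = conn.execute("""
    SELECT DISTINCT i.InstructorLocation
    FROM CourseGrade g
    JOIN Course c ON c.CourseNo = g.CourseNo
    JOIN Instructor i ON i.InstructorNo = c.InstructorNo
""").fetchall()
print(locations)  # [('Room 202',)]
```

Because each fact lives in exactly one row, the update anomaly described at the start of this document cannot occur here.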


For a relation to be in third normal form, it must be in second normal form and the following
must hold:

• No non-prime attribute is transitively dependent on the primary key.
• For any non-trivial functional dependency X → A, either:
  ◦ X is a superkey, or
  ◦ A is a prime attribute (an attribute that belongs to some candidate key).
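The condition "X is a superkey or A is a prime attribute" can be checked mechanically with an attribute-closure computation. The Python sketch below is one way to do it; the functional dependencies are assumptions inferred from the decomposition in this example, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(schema, fds, candidate_keys):
    """List (X, A) pairs where X -> A has X not a superkey and A not prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(schema)
        for a in sorted(rhs - lhs):  # rhs - lhs skips the trivial part
            if not is_superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


# Assumed dependencies for the 2NF table CourseInstructor.
course_fd = (frozenset({"CourseNo"}), frozenset({"CourseName", "InstructorNo"}))
instructor_fd = (frozenset({"InstructorNo"}),
                 frozenset({"InstructorName", "InstructorLocation"}))

course_instructor = {"CourseNo", "CourseName", "InstructorNo",
                     "InstructorName", "InstructorLocation"}
before = violations_3nf(course_instructor, [course_fd, instructor_fd], [{"CourseNo"}])

# After the split, the Course table keeps only the first dependency.
course = {"CourseNo", "CourseName", "InstructorNo"}
after = violations_3nf(course, [course_fd], [{"CourseNo"}])

print(before)  # [(('InstructorNo',), 'InstructorLocation'), (('InstructorNo',), 'InstructorName')]
print(after)   # []
```

The two violations found in CourseInstructor are exactly the transitive dependencies through InstructorNo that the 3NF step moves into the Instructor table.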

At this stage, the tables in third normal form should be free of the anomalies described earlier.
Let's look at the dependency diagram (Figure 12.1) for this example. The first step is to remove
repeating groups, as discussed above.

Student (StudentNo, StudentName, Major)

StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName,
InstructorLocation, Grade)

To recap the normalization process for the School database, review the dependencies shown in
Figure 12.1.

Figure 12.1 Dependency diagram, by A. Watt.

The abbreviations used in Figure 12.1 are as follows:

• PD: partial dependency
• TD: transitive dependency
• FD: full dependency (Note: FD typically stands for functional dependency. Using FD as an
abbreviation for full dependency is only used in Figure 12.1.)

Boyce-Codd Normal Form (BCNF)
When a table has more than one candidate key, anomalies may result even though the relation is
in 3NF. Boyce-Codd normal form is a special case of 3NF. A relation is in BCNF if, and only if,
every determinant is a candidate key.

BCNF is a stricter version of third normal form. It requires that:

• For any non-trivial functional dependency X → A, X must be a super-key.

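For example, consider a relation Student_Detail in which Stu_ID is the super-key and a relation
ZipCodes in which Zip is the super-key. The functional dependencies are:

Stu_ID → Stu_Name, Zip

and

Zip → City

In each dependency the determinant is a super-key of its relation, so both relations satisfy BCNF.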

BCNF Example 1

Consider the following table (St_Maj_Adv).

Student_id   Major     Advisor
111          Physics   Smith
111          Music     Chan
320          Math      Dobbs
671          Physics   White
803          Physics   Smith

The semantic rules (business rules applied to the database) for this table are:

1. Each Student may major in several subjects.
2. For each Major, a given Student has only one Advisor.
3. Each Major has several Advisors.
4. Each Advisor advises only one Major.
5. Each Advisor advises several Students in one Major.

The functional dependencies for this table are listed below. The determinant of the first is a
candidate key; the determinant of the second is not.

1. Student_id, Major → Advisor
2. Advisor → Major

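Anomalies for this table include:

1. Delete: deleting a student's row can also delete the only record of an advisor's major
(removing student 320 loses the fact that Dobbs advises Math).
2. Insert: a new advisor cannot be recorded until at least one student is assigned to that advisor.
3. Update: an advisor's major is repeated in several rows (Smith appears twice), so changing it
in only some rows leaves inconsistent data.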

Note: No single attribute is a candidate key.

The PK can be either (Student_id, Major) or (Student_id, Advisor).
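The candidate keys and the BCNF violation for St_Maj_Adv can be derived from the two functional dependencies listed above. The Python sketch below uses a brute-force key search that is only practical for small schemas; the function names are illustrative.

```python
from itertools import combinations

SCHEMA = frozenset({"Student_id", "Major", "Advisor"})
FDS = [
    (frozenset({"Student_id", "Major"}), frozenset({"Advisor"})),
    (frozenset({"Advisor"}), frozenset({"Major"})),
]


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):  # smallest sets first, so keys stay minimal
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) == set(schema) and not any(k <= s for k in keys):
                keys.append(frozenset(s))
    return keys


def bcnf_violations(schema, fds):
    """Non-trivial dependencies whose determinant is not a superkey."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(schema)]


keys = sorted(sorted(k) for k in candidate_keys(SCHEMA, FDS))
violations = bcnf_violations(SCHEMA, FDS)
print(keys)        # [['Advisor', 'Student_id'], ['Major', 'Student_id']]
print(violations)  # [(['Advisor'], ['Major'])]
```

Every attribute belongs to some candidate key, so the relation satisfies 3NF, but Advisor → Major has a determinant that is not a superkey, which is why the table is not in BCNF.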

To reduce the St_Maj_Adv relation to BCNF, you create two new tables:

1. St_Adv (Student_id, Advisor)
2. Adv_Maj (Advisor, Major)

St_Adv table

Student_id   Advisor
111          Smith
111          Chan
320          Dobbs
671          White
803          Smith

Adv_Maj table

Advisor   Major
Smith     Physics
Chan      Music
Dobbs     Math
White     Physics
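A quick way to check that this decomposition loses no information is to rejoin the two tables on Advisor and compare the result with the original rows. The Python sketch below uses the data from the tables above and models each table as a set of tuples.

```python
# Original St_Maj_Adv rows: (Student_id, Major, Advisor).
st_maj_adv = {
    (111, "Physics", "Smith"),
    (111, "Music", "Chan"),
    (320, "Math", "Dobbs"),
    (671, "Physics", "White"),
    (803, "Physics", "Smith"),
}

# The two BCNF tables are projections of the original relation.
st_adv = {(student, advisor) for student, major, advisor in st_maj_adv}
adv_maj = {(advisor, major) for student, major, advisor in st_maj_adv}

# Natural join on Advisor rebuilds (Student_id, Major, Advisor) rows.
rejoined = {
    (student, major, advisor)
    for student, advisor in st_adv
    for adv, major in adv_maj
    if advisor == adv
}

print(rejoined == st_maj_adv)  # True
print(len(adv_maj))            # 4
```

Each advisor's major is now stored once (Smith appears in a single Adv_Maj row), so the update anomaly disappears. One trade-off of this particular split is that the rule "for each Major, a given Student has only one Advisor" is no longer enforced by either table's key alone.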
