
A draft note about Database Normalization

Database normalization is a process that organizes a database into tables and columns to minimize data redundancy and avoid modification anomalies. It involves three main normal forms, each progressively reducing duplication and ensuring that tables serve a single purpose. Understanding these forms helps in maintaining data integrity and simplifying database management.

Uploaded by Spyros Veronikis
Copyright © All Rights Reserved

Introduction

Database normalization is a process used to organize a database into tables and
columns. The idea is that a table should be about a specific topic and that only
those columns which support that topic are included. For example, a spreadsheet
containing information about salespeople and customers serves several purposes:
• Identify salespeople in your organization
• List all customers your company calls upon to sell products
• Identify which salespeople call on specific customers.
By limiting a table to one purpose, you reduce the amount of duplicate data
contained within your database, which helps eliminate some issues stemming from
database modifications. To assist in achieving these objectives, some rules for
database table organization have been developed. The stages of organization are
called normal forms; most databases adhere to the first three of them.
As tables satisfy each successive normal form, they become less prone to
database modification anomalies and more focused toward a sole purpose or topic.
Before we move on, be sure you understand the definition of a database table.

Reasons for Normalization


There are three main reasons to normalize a database. The first is to minimize
duplicate data, the second is to minimize or avoid data modification issues, and the
third is to simplify queries. As we go through the various stages of normalization, we’ll
discuss how each form addresses these issues, but to start, let’s look at some data
which hasn’t been normalized and discuss some potential pitfalls. Once these are
understood, I think you’ll better appreciate the reason to normalize the data.
Consider the following table:

Note: The primary key columns are underlined

The first thing to notice is that this table serves many purposes, including:
1. Identifying the organization’s salespeople
2. Listing the sales offices and phone numbers
3. Associating a salesperson with a sales office
4. Showing each salesperson’s customers
As a DBA, I see this as a red flag. In general, I like to see tables that have one
purpose. Having the table serve many purposes introduces several challenges,
namely data duplication, data update issues, and increased effort to query data.

Data Duplication and Modification Anomalies


Notice that for each SalesPerson we have listed both the SalesOffice and
OfficeNumber. This information is duplicated for each SalesPerson. Duplicated
information presents two problems:

1. It increases storage and decreases performance.

2. It becomes more difficult to maintain data changes.


For example:

• Consider if we move the Chicago office to Evanston, IL. To properly reflect this in our table,
we need to update the entries for all the SalesPersons currently in Chicago. Our table is a
small example, but you can see that if it were larger, this could potentially involve hundreds
of updates.

• Also consider what would happen if John Hunt quits. If we remove his entry, then we lose
the information for New York.


These situations are modification anomalies. There are three modification
anomalies that can occur:

Insert Anomaly
There are facts we cannot record until we know information for the entire row. In our
example, we cannot record a new sales office until we also know the salesperson.
Why? Because in order to create the record, we need to provide a primary key. In our
case this is the EmployeeID.
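A minimal sqlite3 sketch of the insert anomaly, assuming a hypothetical SalesStaff table with EmployeeID as its primary key: trying to record an office (a hypothetical Atlanta office) before any salesperson works there is rejected. The column is declared INT rather than INTEGER because SQLite treats an INTEGER PRIMARY KEY as an alias for the rowid and would fill in a missing value automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE SalesStaff (
           EmployeeID   INT NOT NULL PRIMARY KEY,  -- INT, not INTEGER: no automatic rowid value
           SalesPerson  TEXT,
           SalesOffice  TEXT,
           OfficeNumber TEXT
       )"""
)


def try_add_office(office, number):
    """Try to record a sales office that has no salesperson yet (hypothetical helper)."""
    try:
        conn.execute(
            "INSERT INTO SalesStaff (SalesOffice, OfficeNumber) VALUES (?, ?)",
            (office, number),
        )
        return True
    except sqlite3.IntegrityError:
        # EmployeeID is omitted, so it would be NULL, which the primary key forbids.
        return False


added = try_add_office("Atlanta", "404-555-0100")  # hypothetical office and number
print(added)  # False
```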

Update Anomaly
The same information is recorded in multiple rows. For instance, if the office number
changes, then there are multiple updates that need to be made. If these updates are
not successfully completed across all rows, then an inconsistency occurs.
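Such an inconsistency can be detected after the fact. This sketch, again using a hypothetical SalesStaff table with made-up rows and phone numbers, simulates an update that reaches only one of the Chicago rows and then lists every office that now carries more than one number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesStaff (EmployeeID INT NOT NULL PRIMARY KEY, "
    "SalesPerson TEXT, SalesOffice TEXT, OfficeNumber TEXT)"
)
conn.executemany(
    "INSERT INTO SalesStaff VALUES (?, ?, ?, ?)",
    [  # hypothetical rows repeating the same office number
        (1001, "Person A", "Chicago", "312-555-0100"),
        (1002, "Person B", "Chicago", "312-555-0100"),
        (1003, "Person C", "Chicago", "312-555-0100"),
    ],
)

# Simulate an incomplete update: the new number reaches only one of the three rows.
conn.execute(
    "UPDATE SalesStaff SET OfficeNumber = '312-555-0199' WHERE EmployeeID = 1001"
)

# Any office that now has more than one distinct number is inconsistent.
inconsistent = conn.execute(
    """SELECT SalesOffice, COUNT(DISTINCT OfficeNumber)
       FROM SalesStaff
       GROUP BY SalesOffice
       HAVING COUNT(DISTINCT OfficeNumber) > 1"""
).fetchall()
print(inconsistent)  # [('Chicago', 2)]
```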

Deletion Anomaly
Deletion of a row can cause more than one set of facts to be removed. For instance,
if John Hunt retires, then deleting that row causes us to lose information about the
New York office.
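For contrast, here is a sketch of one possible normalized layout that keeps offices and salespeople in separate tables. The OfficeID surrogate key, the table names, and all rows are assumptions for illustration, not a schema given in this note. Each of the three anomalies becomes harmless: an office can exist without staff, moving Chicago is a single-row update, and deleting John Hunt leaves New York in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE SalesOffice (OfficeID INT NOT NULL PRIMARY KEY, "
    "SalesOffice TEXT, OfficeNumber TEXT)"
)
conn.execute(
    "CREATE TABLE SalesPerson (EmployeeID INT NOT NULL PRIMARY KEY, SalesPerson TEXT, "
    "OfficeID INT REFERENCES SalesOffice (OfficeID))"
)
conn.executemany(
    "INSERT INTO SalesOffice VALUES (?, ?, ?)",
    [(1, "Chicago", "312-555-0100"), (2, "New York", "212-555-0100")],  # hypothetical
)
conn.executemany(
    "INSERT INTO SalesPerson VALUES (?, ?, ?)",
    [(1001, "Person A", 1), (1002, "Person B", 1), (1003, "Person C", 1), (1004, "John Hunt", 2)],
)

# Insert: an office can be recorded before anyone works there.
conn.execute("INSERT INTO SalesOffice VALUES (3, 'Atlanta', '404-555-0100')")

# Update: moving the Chicago office changes exactly one row.
moved = conn.execute(
    "UPDATE SalesOffice SET SalesOffice = 'Evanston, IL' WHERE SalesOffice = 'Chicago'"
).rowcount

# Delete: removing John Hunt does not remove the New York office.
conn.execute("DELETE FROM SalesPerson WHERE SalesPerson = 'John Hunt'")
new_york = conn.execute(
    "SELECT COUNT(*) FROM SalesOffice WHERE SalesOffice = 'New York'"
).fetchone()[0]

print(moved, new_york)  # 1 1
```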

Definition of Normalization
There are three common forms of normalization: 1st, 2nd, and 3rd normal form. There
are several additional forms, such as BCNF, but I consider those advanced and not
necessary to learn in the beginning. The forms are progressive, meaning that to
qualify for 3rd normal form a table must first satisfy the rules for 2nd normal form, and
2nd normal form must adhere to those for 1st normal form. Before we discuss the
various forms and rules in detail, let’s summarize them:
• First Normal Form – The information is stored in a relational table, each column
contains atomic values, and there are no repeating groups of columns.
• Second Normal Form – The table is in first normal form and all the non-key columns
depend on the table’s entire primary key.
• Third Normal Form – The table is in second normal form and none of its non-key columns
is transitively dependent on the primary key.

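Terms

Functional Dependency: X -> Y means each value of X determines exactly one value of Y, e.g. st_id -> st_name

Transitive Dependency: if A -> B and B -> C, then A -> C (C depends on A through B)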
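Both terms can be made concrete in a few lines of Python. The helper names fd_holds and closure are hypothetical. fd_holds tests whether a dependency holds in one particular set of rows (which can disprove a dependency, but never prove that it holds for all future data), and closure computes the standard attribute closure, which applies transitivity repeatedly.

```python
def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in this instance (rows are dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # the same lhs value maps to two different rhs values
    return True


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical rows for st_id -> st_name
students = [
    {"st_id": 1, "st_name": "Ann"},
    {"st_id": 2, "st_name": "Bob"},
    {"st_id": 1, "st_name": "Ann"},
]
print(fd_holds(students, ["st_id"], ["st_name"]))  # True

# Transitivity: from A -> B and B -> C, the closure of {A} contains C.
fds = [(("A",), ("B",)), (("B",), ("C",))]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']
```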

Database normalization is the process of organizing the attributes and tables of a
relational database to minimize data redundancy.

Normalization
If a database design is not well structured, it may contain anomalies, which are like a bad dream for
the database itself. Managing a database with anomalies is extremely difficult.

Normalization is a method to remove these anomalies and bring the database to a consistent state.

First Normal Form:


This is defined in the definition of relations (tables) itself. This rule defines that all the attributes
in a relation must have atomic domains. Values in atomic domain are indivisible units.

We rearrange the relation (table) as below to convert it to First Normal Form.

Each attribute must contain only a single value from its predefined domain.
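As an illustration of the atomicity rule, the sketch below splits a multi-valued column into one row per value. The Stu_ID/Stu_Name/Course data and the to_first_normal_form helper are hypothetical; after the split, the key of the resulting relation would be (Stu_ID, Course).

```python
# Hypothetical non-1NF rows: Course holds several values in a single cell.
unnormalized = [
    {"Stu_ID": 1, "Stu_Name": "Ann", "Course": "Math, Physics"},
    {"Stu_ID": 2, "Stu_Name": "Bob", "Course": "Chemistry"},
]


def to_first_normal_form(rows, column, sep=","):
    """Split a multi-valued column so each output row holds one atomic value."""
    out = []
    for row in rows:
        for value in row[column].split(sep):
            flat = dict(row)          # copy the other attributes unchanged
            flat[column] = value.strip()
            out.append(flat)
    return out


first_nf = to_first_normal_form(unnormalized, "Course")
for row in first_nf:
    print(row)
```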

Second Normal Form:

Before we learn about second normal form, we need to understand the following:

• Prime attribute: an attribute that is part of a candidate key (the prime key) is a prime attribute.


• Non-prime attribute: an attribute that is not part of any candidate key is said to be a
non-prime attribute.
Second normal form says that every non-prime attribute should be fully functionally dependent
on the prime key. That is, if X → A holds, where X is the key and A is non-prime, then there should
not be any proper subset Y of X for which Y → A also holds.
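The full-dependency rule can be checked mechanically: for each proper subset of the key, compute its attribute closure and see whether it already reaches a non-prime attribute. This sketch uses the two dependencies described for Student_Project in the next paragraph (Stu_ID → Stu_Name and Proj_ID → Proj_Name); the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """Return (subset, attribute) pairs where a proper subset of key determines a non-prime attribute."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            for attr in sorted(non_prime):
                if attr in determined:
                    found.append((subset, attr))
    return found


fds = [(("Stu_ID",), ("Stu_Name",)), (("Proj_ID",), ("Proj_Name",))]
partials = partial_dependencies({"Stu_ID", "Proj_ID"}, {"Stu_Name", "Proj_Name"}, fds)
print(partials)  # [(('Proj_ID',), 'Proj_Name'), (('Stu_ID',), 'Stu_Name')]
```

An empty result would mean every non-prime attribute depends on the whole key, which is the 2NF condition.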

In the Student_Project relation shown here, the prime key attributes are Stu_ID and Proj_ID.
According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent
upon both and not on either prime key attribute individually. But we find that Stu_Name can
be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is
called partial dependency, which is not allowed in Second Normal Form.
We broke the relation in two, as depicted in the above picture, so that no partial
dependency remains.
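A quick way to confirm that splitting Student_Project loses nothing is to project the rows and join them back. This sketch uses hypothetical rows and splits into three projections, (Stu_ID, Stu_Name), (Proj_ID, Proj_Name) and (Stu_ID, Proj_ID), keeping the student-project pairs in a relation of their own; that is one standard 2NF decomposition and may differ from the two-relation split in the original figure.

```python
# Hypothetical Student_Project rows; the original figure's data is not reproduced.
rows = [
    {"Stu_ID": 1, "Proj_ID": "P1", "Stu_Name": "Ann", "Proj_Name": "Compiler"},
    {"Stu_ID": 1, "Proj_ID": "P2", "Stu_Name": "Ann", "Proj_Name": "Database"},
    {"Stu_ID": 2, "Proj_ID": "P1", "Stu_Name": "Bob", "Proj_Name": "Compiler"},
]


def project(relation, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in relation}


student = project(rows, ["Stu_ID", "Stu_Name"])        # Stu_ID -> Stu_Name
project_info = project(rows, ["Proj_ID", "Proj_Name"])  # Proj_ID -> Proj_Name
assignment = project(rows, ["Stu_ID", "Proj_ID"])       # which student works on which project

# Rejoin on Stu_ID and Proj_ID; dict() is safe because each ID maps to exactly one name.
stu_name = dict(student)
proj_name = dict(project_info)
rejoined = {(s, p, stu_name[s], proj_name[p]) for s, p in assignment}
original = project(rows, ["Stu_ID", "Proj_ID", "Stu_Name", "Proj_Name"])

print(rejoined == original)             # True: nothing was lost in this instance
print(len(student), len(project_info))  # 2 2: each name is now stored once
```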

Third Normal Form:

For a relation to be in Third Normal Form, it must be in Second Normal Form and the following
must be satisfied:

• No non-prime attribute is transitively dependent on the prime key.


• For any non-trivial functional dependency X → A, either
  • X is a superkey, or
  • A is a prime attribute.
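The superkey/prime-attribute test can be written as a small Python check. This sketch assumes, as the next paragraph states, that Stu_ID is the only candidate key of Student_detail (so it determines every attribute) and that Zip → City holds; the function names are hypothetical. Checking only the listed dependencies is the usual textbook shortcut for testing a relation for 3NF against a given set of dependencies.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(attributes, fds, prime):
    """Return (X, A) pairs that break 3NF: non-trivial, X not a superkey, A not prime."""
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for a in rhs:
            if a in lhs:
                continue  # trivial part of the dependency
            if not is_superkey and a not in prime:
                violations.append((lhs, a))
    return violations


attributes = {"Stu_ID", "Stu_Name", "Zip", "City"}
fds = [(("Stu_ID",), ("Stu_Name", "Zip")), (("Zip",), ("City",))]
violations = violates_3nf(attributes, fds, prime={"Stu_ID"})
print(violations)  # [(('Zip',), 'City')]
```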

In the Student_detail relation depicted above, Stu_ID is the key and the only prime
attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Zip is not a
superkey, and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists a transitive dependency.

We broke the relation into the two relations depicted above to bring it into 3NF.
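A sqlite3 sketch of the 3NF split: Student_detail becomes a student table that keeps Zip and a zip-code table that stores City once per Zip. The table names Student_Detail and ZipCodes, the zip codes, the students, and the city names are all hypothetical. Correcting a city is then a one-row update, and a join on Zip recovers the original view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ZipCodes (Zip TEXT NOT NULL PRIMARY KEY, City TEXT);
    CREATE TABLE Student_Detail (
        Stu_ID   INT NOT NULL PRIMARY KEY,
        Stu_Name TEXT,
        Zip      TEXT REFERENCES ZipCodes (Zip)
    );
    INSERT INTO ZipCodes VALUES ('11111', 'City A'), ('22222', 'City B');
    INSERT INTO Student_Detail VALUES (1, 'Ann', '11111'), (2, 'Bob', '11111'), (3, 'Cid', '22222');
    """
)

# City is stored once per Zip, so correcting it touches a single row.
changed = conn.execute("UPDATE ZipCodes SET City = 'City A2' WHERE Zip = '11111'").rowcount

# Joining on Zip rebuilds the original Student_detail view.
rows = conn.execute(
    """SELECT s.Stu_ID, s.Stu_Name, s.Zip, z.City
       FROM Student_Detail AS s JOIN ZipCodes AS z ON s.Zip = z.Zip
       ORDER BY s.Stu_ID"""
).fetchall()
print(changed)  # 1
print(rows)
```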
