Normalization Document
Igwe
Data normalization is a process in which data attributes within a data model are organized to
increase the cohesion of entity types. In other words, the goal of data normalization is to reduce,
and even eliminate, data redundancy. This is an important consideration for application developers
because it is incredibly difficult to store objects in a relational database that maintains the same
information in several places. Table 1 summarizes the three most common forms of
normalization (first normal form (1NF), second normal form (2NF), and third normal
form (3NF)), describing how to put entity types into a series of increasing levels of
normalization. Higher levels of data normalization are beyond the scope of this article. With
respect to terminology, a data schema is considered to be at the level of normalization of its
least normalized entity type. For example, if all of your entity types are at second normal form
(2NF) or higher, then we say that your data schema is at 2NF.
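The terminology rule above can be sketched in a few lines of Python. This is only an illustration: the entity type names and their levels are hypothetical values chosen for the example, not taken from any figure in this document.

```python
# Hypothetical entity types mapped to their normal-form level (1, 2, or 3).
# These values are assumptions made purely for illustration.
entity_levels = {
    "Order": 3,
    "OrderItem": 2,
    "ContactInformation": 3,
}


def schema_normal_form(levels):
    """Return the schema's level: that of its least normalized entity type."""
    return min(levels.values())


print(f"Schema is at {schema_normal_form(entity_levels)}NF")
```

Because one entity type is only at 2NF, the schema as a whole is reported at 2NF, even though the other entity types are at 3NF.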
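Table 1: Data Normalization Rules.
Level                       Rule
First normal form (1NF)     An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF)    An entity type is in 2NF when it is in 1NF and all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF)     An entity type is in 3NF when it is in 2NF and none of its non-key attributes depends on another non-key attribute.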
An important thing to notice is the application of primary and foreign keys in the new solution.
Order1NF has kept OrderID, the original key of Order0NF, as its primary key. To maintain
the relationship back to Order1NF, the OrderItem1NF table includes the OrderID column within
its schema, which is why that column has the stereotype of FK. When a new table is introduced
into a schema as the result of first normalization efforts, in this case OrderItem1NF, it is common
to use the primary key of the original table (Order0NF) as part of the primary key of the new table.
Because OrderID is not unique for order items (you can have several order items on an order), the
column ItemSequence was added to form a composite primary key for the OrderItem1NF table.
A different approach to keys was taken with the ContactInformation1NF table. The column
ContactID, a surrogate key that has no business meaning, was made the primary key.
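The key choices described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the schema from the figures: only OrderID, ItemSequence, and ContactID come from the text, while the ItemName and ContactName columns and all sample values are assumptions added so the example can run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Order1NF (
    OrderID INTEGER PRIMARY KEY            -- original key kept from Order0NF
);
CREATE TABLE OrderItem1NF (
    OrderID      INTEGER NOT NULL REFERENCES Order1NF(OrderID),  -- FK
    ItemSequence INTEGER NOT NULL,
    ItemName     TEXT,                     -- assumed column, for illustration
    PRIMARY KEY (OrderID, ItemSequence)    -- composite primary key
);
CREATE TABLE ContactInformation1NF (
    ContactID   INTEGER PRIMARY KEY,       -- surrogate key, no business meaning
    ContactName TEXT                       -- assumed column, for illustration
);
""")

conn.execute("INSERT INTO Order1NF (OrderID) VALUES (1)")
conn.executemany(
    "INSERT INTO OrderItem1NF VALUES (?, ?, ?)",
    [(1, 1, "Widget"), (1, 2, "Gadget")],  # same OrderID, different ItemSequence
)


def try_insert(row):
    """Attempt to insert an order item; report whether the keys allowed it."""
    try:
        conn.execute("INSERT INTO OrderItem1NF VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert((1, 2, "Duplicate"))  # repeats (OrderID, ItemSequence)
orphan_ok = try_insert((99, 1, "Orphan"))       # OrderID 99 has no parent order

# The surrogate key is assigned by the database, not derived from business data.
contact_id = conn.execute(
    "INSERT INTO ContactInformation1NF (ContactName) VALUES (?)", ("Sample",)
).lastrowid

print(duplicate_ok, orphan_ok, contact_id)
```

Both rejected inserts show what the keys buy you: the composite primary key prevents two items from sharing a sequence number on one order, and the foreign key prevents an order item from referring to an order that does not exist.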
the price of the item. The value of the SubtotalBeforeTax column within the Order2NF table is
the sum of the TotalPriceExtended values of its order items.
Figure 3. An Order in 2NF (UML Notation).
4. Beyond 3NF
The data schema of Figure 4 can still be improved upon, at least from the point of view of data
redundancy, by removing attributes that can be calculated/derived from other ones. In this case
we could remove the SubtotalBeforeTax column within the Order3NF table and the
TotalPriceExtended column of OrderItem3NF, as you see in Figure 5.
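To illustrate why these two columns are redundant, the sketch below computes both values on demand instead of storing them. It rests on assumptions: the Item3NF table name, the NumberOrdered, ItemNumber, and Price columns, and the sample data are hypothetical, and the extended price is assumed to be the number ordered multiplied by the item price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item3NF (                -- assumed table holding item prices
    ItemNumber INTEGER PRIMARY KEY,
    Price      REAL NOT NULL
);
CREATE TABLE Order3NF (
    OrderID INTEGER PRIMARY KEY       -- no SubtotalBeforeTax column stored
);
CREATE TABLE OrderItem3NF (           -- no TotalPriceExtended column stored
    OrderID       INTEGER NOT NULL REFERENCES Order3NF(OrderID),
    ItemSequence  INTEGER NOT NULL,
    ItemNumber    INTEGER NOT NULL REFERENCES Item3NF(ItemNumber),
    NumberOrdered INTEGER NOT NULL,
    PRIMARY KEY (OrderID, ItemSequence)
);
INSERT INTO Item3NF VALUES (10, 2.50), (20, 4.00);
INSERT INTO Order3NF VALUES (1);
INSERT INTO OrderItem3NF VALUES (1, 1, 10, 4), (1, 2, 20, 3);
""")

# Derive the extended price of each order item from quantity and price.
rows = conn.execute(
    """
    SELECT oi.ItemSequence, oi.NumberOrdered * i.Price AS TotalPriceExtended
    FROM OrderItem3NF AS oi
    JOIN Item3NF AS i ON i.ItemNumber = oi.ItemNumber
    WHERE oi.OrderID = ?
    ORDER BY oi.ItemSequence
    """,
    (1,),
).fetchall()

# Derive the order subtotal from the per-item extended prices.
subtotal = sum(total for _, total in rows)
print(rows, subtotal)
```

Since both values can always be recomputed from the quantities and prices, storing them would only create a second copy that could drift out of sync with its inputs.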
6. Denormalization
From a purist point of view you want to normalize your data structures as much as possible, but
from a practical point of view you will find that you need to "back out" of some of your
normalizations for performance reasons. This is called "denormalization". For example, with
the data schema of Figure 1 all the data for a single order is stored in one row (assuming orders
of up to nine order items), making it very easy to access. With the data schema of Figure 1 you
could quickly determine the total amount of an order by reading the single row from the
Order0NF table. To do so with the data schema of Figure 5 you would need to read data from a
row in the Order table, data from all the rows from the OrderItem table for that order and data
from the corresponding rows in the Item table for each order item. For this query, the data
schema of Figure 1 very likely provides better performance.
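The difference between the two access paths can be sketched as follows. This is an illustration under stated assumptions, not a benchmark: the column names other than OrderID and SubtotalBeforeTax, the two item slots shown in Order0NF (the text allows up to nine), and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
-- Denormalized: one wide row per order (only two item slots shown here).
CREATE TABLE Order0NF (
    OrderID INTEGER PRIMARY KEY,
    Item1Price REAL, Item1Quantity INTEGER,   -- assumed column names
    Item2Price REAL, Item2Quantity INTEGER,   -- assumed column names
    SubtotalBeforeTax REAL
);
INSERT INTO Order0NF VALUES (1, 2.50, 4, 4.00, 3, 22.00);

-- Normalized: the same order spread over three tables.
CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY);
CREATE TABLE Item (ItemNumber INTEGER PRIMARY KEY, Price REAL NOT NULL);
CREATE TABLE OrderItem (
    OrderID       INTEGER NOT NULL REFERENCES "Order"(OrderID),
    ItemSequence  INTEGER NOT NULL,
    ItemNumber    INTEGER NOT NULL REFERENCES Item(ItemNumber),
    NumberOrdered INTEGER NOT NULL,
    PRIMARY KEY (OrderID, ItemSequence)
);
INSERT INTO "Order" VALUES (1);
INSERT INTO Item VALUES (10, 2.50), (20, 4.00);
INSERT INTO OrderItem VALUES (1, 1, 10, 4), (1, 2, 20, 3);
''')


def subtotal_denormalized(order_id):
    """Read the stored total from a single row of Order0NF."""
    row = conn.execute(
        "SELECT SubtotalBeforeTax FROM Order0NF WHERE OrderID = ?", (order_id,)
    ).fetchone()
    return row[0]


def subtotal_normalized(order_id):
    """Join Order, OrderItem, and Item, then compute the total on the fly."""
    row = conn.execute(
        '''
        SELECT SUM(oi.NumberOrdered * i.Price)
        FROM "Order" AS o
        JOIN OrderItem AS oi ON oi.OrderID = o.OrderID
        JOIN Item AS i ON i.ItemNumber = oi.ItemNumber
        WHERE o.OrderID = ?
        ''',
        (order_id,),
    ).fetchone()
    return row[0]


print(subtotal_denormalized(1), subtotal_normalized(1))
```

Both functions return the same amount, but the normalized version touches one Order row, every OrderItem row for the order, and one Item row per order item, which is the extra work that motivates denormalizing for this particular query.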