Normalization
Normalization
Intro
Good database design must be matched with good table structures E-R diagrams help use with overall design, but a good conceptual design does not necessarily lead to a good table structures Tables are the basic building blocks of the database We wish to avoid data anomalies/redundancies by controlling the table structure logically The process of identifying and eliminating data anomalies and redundancies is called normalization
Normalization
Step-by-step process for eliminating data redundancies and anomalies Enables us to recognize bad table structures Enables us to create good table structures Stages are called normal forms each better than previous (less anomalies/redundancy)
First normal form (1NF) Second normal form (2NF) Third normal form (3NF)
Normalization
There are fourth and fifth normal forms that are seldom used Highest level not always the most desirable In general, the higher the level the slower the database response because of underlying pointer movement Most professionally designed databases reach third normal form
Desired Report
Proj No Proj Name 1 Hurricane Emp No 101 102 104 Emp Name John News David Senior Anne Ramoras Job Class Elec Eng Comm Tech Comm Tech Chg/Hr Hrs Billed 65 13 60 16 60 19 Sub Tot 65 55 15 17 Sub Tot 18 14 Sub Tot Total Tot Chg 845 960 1,140 2,945 975 935 1,910 1,080 840 1,920 6,775
2 Coast
3 Satellite
60 60
Table
P_No P_Name 1 Hurricane E_No 101 102 104 101 103 104 102 E_Name John News David Senior Anne Ramoras John News June Arbough Anne Ramoras David Senior Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Elec Eng 65 Biol Eng 55 Comm Tech 60 Comm Tech 60 Hrs 13 16 19 15 17 18 14
2 Coast 3 Satellite
Group 2
Chg_Hr1 Hrs1 E_No2 E_Name2 65 13 102 David Senior 65 15 103 June Arbough 60 18 102 David Senior
Etc.
Job_Class2 Chg_Hr2 Hrs2 Comm Tech 60 16 Biol Eng 55 17 Comm Tech 60 14
Problems
P_No intended to be primary key but contains null values Data redundancies
Invites data inconsistencies (Elect Eng & EE)
Anomalies
Update anomaly modify Job_Class for E_No 101 requires many alterations Insert anomaly to add a project row we need an employee Deletion anomaly delete E_No 101, we delete other vital data too
Problems
Date redundancy
If add new employee to project 2 must type:
2 C oas t 104 A nne Ram oras Com m Tec h 60 19
Wastes data entry time Wastes storage space Leads to data inconsistency
Huricane or Hurricane or Hurracaine
Conversion to 1NF
Table above has repeating groups Each P_No has a group of entries
P_No P_Name 1 Hurricane E_No 101 102 104 E_Name John News David Senior Anne Ramoras Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Hrs 13 16 19
1NF
Eliminate repeating groups By adding entries in primary key column (at least)
P_No 1 1 1 2 2 3 3 P_Name Hurricane Hurricane Hurricane Coast Coast Satellite Satellite E_No 101 102 104 101 103 104 102 E_Name John News David Senior Anne Ramoras John News June Arbough Anne Ramoras David Senior Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Elec Eng 65 Biol Eng 55 Comm Tech 60 Comm Tech 60 Hrs 13 16 19 15 17 18 14
Problems
Primary key P_No does not uniquely identify all attributes in row Must create composite key made up of P_No & E_No
Dependency Diagram
Helps us to discover relationships between entity attributes Upper arrows implies dependency on P_No & E_No Lower arrows implies dependency on only one attribute
P_No
P_Name
E_No E_Name
Job_Class
Chg_Hr
Hrs
Dependencies
Upper arrows If you know P_No & E_No you can determine the other row values Lower arrows Partial dependencies based on only part of key P_Name only dependent on P_No E_Name, Job_Class, Chg_Hr only dependent on E_No Dependency diagram may be written: P_No, E_No P_Name, E_Name, Job_Class, Chg_Hr, Hrs P_No P_Name E_No E_Name, Job_Class, Chg_Hr
New Table
Composite primary key P_No & E_No
Charges Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 P_Name Hurricane Hurricane Hurricane Coast Coast Satellite Satellite E_Name John News David Senior Anne Ramoras John News June Arbough Anne Ramoras David Senior Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Elec Eng 65 Biol Eng 55 Comm Tech 60 Comm Tech 60 Hrs 13 16 19 15 17 18 14
1NF Definition
1. All the key attributes are defined Any attribute that is part of the primary key 1. There are no repeating groups in the table Each cell can contain one and only one value, rather than set 1. All attributes are dependent on the primary key
Problems
Contains partial dependencies
Dependencies base on only part of the primary key
This makes table subject to data redundancies and hence to data anomalies Redundancy caused by fact that every row entry requires duplicate data
E.g., suppose E_No 105 is entered 20 times, must also enter E_Name, Job_Class, Chg_Hr
Conversion to 2NF
1. Starting with 1NF write each of the key components on separate lines, then write the original key on the last line P_No E_No P_No E_No Each will become key in a new table Original table split into three tables
Conversion to 2NF
2. Write the dependent attributes after each of the new keys using the dependency diagram P_No P_Name E_No E_Name, Job_Class, Chg_Hr P_No E_No Hrs
Assign Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 Hrs 13 16 19 15 17 18 14
2NF Definition
1. Table is in 1NF and 2. It includes no partial dependencies (no attribute is dependent on only a portion of the primary key) Note: Since partial dependencies can exist only if there is a composite key, a table with a single attribute as primary key is automatically in 2NF if it is in 1NF
Conversion to 3NF
Break off the pieces that are identified by the transitive dependency arrows (lower arrows) in the dependency diagram Store them in a separate table
P_No P_Name E_No E_Name, Job_Class P_No E_No Hrs Job_Class Chg_Hr
Note: Job_Class must be retained in Employee table to establish a link to the newly created Job table
New Tables
Project Table P_No P_Name 1 Hurricane 2 Coast 3 Satellite Assign Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 Hrs 13 16 19 15 17 18 14
Employee Table
E_No 101 102 103 104 E_Name John News David Senior June Arbough Anne Ramoras Job_Class Elec Eng Comm Tech Biol Eng Comm Tech
Job Table
Chg_Hr 55 60 65
3NF Definition
1. Table is in 2NF and 2. It contains no transitive dependencies
Problem
Although the four tables are in 3NF, we have a potential problem The Job_Class is entered for each new employee in the Employee table For example, too easy to enter Electrical Engr, or EE, or El Eng
Problem
Employee Table
E_Name John News David Senior June Arbough Anne Ramoras John Smith Alice White Bob Jones
Job_Class Elec Eng Comm Tech Biol Eng Comm Tech Comm Tech Biol Eng Elec Eng
New Attribute
Create a Job_Code attribute to serve as primary key in the Job table and as a foreign key in the Employee table
Changed Tables
Project Table P_No P_Name 1 Hurricane 2 Coast 3 Satellite Assign Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 Hrs 13 16 19 15 17 18 14
Employee Table
E_No 101 102 103 104 E_Name John News David Senior June Arbough Anne Ramoras Job_Code 502 501 500 501
Job Table
Chg_Hr 55 60 65
3NF Version
Vast improvement over original design No data anomalies
In the Job table each job code has single job class and charge per hour entry
No opportunities to use different values describing same object
Similarly for Employee & Project tables only one entry for each attribute Also, Assign table has only what is needed
Summary
1NF Eliminate repeating groups 2NF Eliminate partial dependencies 3NF Eliminate transitive dependencies Tables are the critical building blocks for the database It is important to design their structure well Normalization is a formal process for doing this that reduces data redundancy and anomalies
End
References: New Perspectives on Microsoft Access 2000, Introductory, by Adamski, Hommel, and Finnegan, Course Technology, 1999. Access Database Design and Programming, Third Edition, by Roman, OReilly, 2002. Database Systems: Design, Implementation, and Management, by Rob & Coronel, Boyd & Fraser, 1995.