Normalization: ITM 692 Sanjay Goel

The document discusses normalization, which is the process of structuring a database to minimize redundancy and dependency. It involves: 1) Eliminating repeating groups by creating separate tables for related attributes and giving each a primary key. 2) Eliminating redundant data by removing attributes that depend on only part of a key. 3) Isolating independent and semantically related relationships to avoid anomalies. Proper normalization results in scalar values, absence of redundancy, minimal nulls, and minimal information loss. The document outlines various normalization forms up to fifth normal form.


Normalization

ITM 692
Sanjay Goel

05/30/21 Sanjay Goel, School of Business, University at Albany

Definition
• Normalization is the process of winnowing out redundant data within a database.
– The results of a well-executed normalization process are the same as those of a well-planned E-R model.
• It involves restructuring the tables so that they successively meet higher normal forms.
• A properly normalized database should have the following characteristics:
– Scalar values in each field.
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information.

(Note: Winnow (Webster): to get rid of or eliminate inferior material.)


Process
• Eliminate Repeating Groups
– Make a separate table for each set of related attributes and give each
table a primary key.
• Eliminate Redundant Data
– If an attribute depends on only part of a composite (multi-attribute) key, move it to a separate table.
• Eliminate Columns Not Dependent on the Key
– If attributes do not contribute to a description of the key, move them to a separate table.

• Isolate Independent multiple relationships
– No table may contain two or more 1:n or n:m relationships that are not
directly related.
• Isolate Semantically Related Multiple Relationships
– There may be practical constraints on information that justify separating
logically related many-to-many relationships.

Levels
• Levels of normalization are based on the amount of redundancy in the database.
• Relational theory defines a number of structural conditions, called normal forms, that ensure certain data anomalies do not occur in a database.
• The levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
[Figure: arrows beside the list labelled Number of Tables, Redundancy, and Complexity]

Most databases should be in 3NF or BCNF in order to avoid database anomalies.


Normal form  Requirement
1NF          Keys; no repeating groups or multi-valued attributes
2NF          No partial dependencies
3NF          No transitive dependencies
BCNF         Determinants are candidate keys
4NF          No multivalued dependencies
5NF          Remove m-way relationships

[Figure: nested regions, from outermost to innermost: 1NF, 2NF, 3NF/BCNF, 4NF, 5NF, DKNF]

Each higher level is a subset of the lower level.
First Normal Form (1NF)
A table is considered to be in 1NF if all of its fields contain only scalar values (as opposed to lists of values).

Example (Not 1NF)

ISBN           Title         AuName   AuPhone        PubName      PubPhone      Price
0-321-32132-1  Balloon       Sleepy,  321-321-1111,  Small House  714-000-0000  $34.00
                             Snoopy,  232-234-1234,
                             Grumpy   665-235-6532
0-55-123456-9  Main Street   Jones,   123-333-3333,  Small House  714-000-0000  $22.95
                             Smith    654-223-3455
0-123-45678-0  Ulysses       Joyce    666-666-6666   Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Roman    444-444-4444   Big House    123-456-7890  $25.00

The AuName and AuPhone columns are not scalar.

1NF: Decomposition
1. Place all items appearing in the repeating group in a new table.
2. Designate a primary key for each new table produced.
3. Create a relationship between the two tables:
• For a 1:N relationship, copy the primary key from the 1 side to the N side.
• For an M:N relationship, create a new table containing the primary keys of both tables.

Example (1NF)

ISBN           Title         PubName      PubPhone      Price
0-321-32132-1  Balloon       Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Big House    123-456-7890  $25.00

ISBN           AuName  AuPhone
0-321-32132-1  Sleepy  321-321-1111
0-321-32132-1  Snoopy  232-234-1234
0-321-32132-1  Grumpy  665-235-6532
0-55-123456-9  Jones   123-333-3333
0-55-123456-9  Smith   654-223-3455
0-123-45678-0  Joyce   666-666-6666
1-22-233700-0  Roman   444-444-4444
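
The 1NF decomposition steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows are held as in-memory dictionaries, the comma-separated author fields stand in for the repeating group, and the names `raw_books` and `to_1nf` are hypothetical, not part of the original slides.

```python
# Hypothetical in-memory rows mirroring the "Not 1NF" example; the names
# raw_books and to_1nf are illustrative assumptions.
raw_books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": "Sleepy, Snoopy, Grumpy",
     "AuPhone": "321-321-1111, 232-234-1234, 665-235-6532",
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$34.00"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses",
     "AuName": "Joyce", "AuPhone": "666-666-6666",
     "PubName": "Alpha Press", "PubPhone": "999-999-9999", "Price": "$34.00"},
]


def to_1nf(rows):
    """Split the repeating author group into its own table keyed by ISBN."""
    books, authors = [], []
    for row in rows:
        names = [n.strip() for n in row["AuName"].split(",")]
        phones = [p.strip() for p in row["AuPhone"].split(",")]
        # The book table keeps only the scalar, per-book columns.
        books.append({k: v for k, v in row.items()
                      if k not in ("AuName", "AuPhone")})
        # One author row per (ISBN, author); ISBN acts as the foreign key.
        for name, phone in zip(names, phones):
            authors.append({"ISBN": row["ISBN"], "AuName": name, "AuPhone": phone})
    return books, authors


books, authors = to_1nf(raw_books)
```

Every field in `books` and `authors` now holds a single value, and each author row carries the ISBN of the book it belongs to.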

Functional Dependencies
1. If one set of attributes in a table determines another set of
attributes in the table, then the second set of attributes is
said to be functionally dependent on the first set of
attributes.

Example 1
Table Scheme: {ISBN, Title, Price}
Functional Dependencies: {ISBN} → {Title}
                         {ISBN} → {Price}

ISBN           Title         Price
0-321-32132-1  Balloon       $34.00
0-55-123456-9  Main Street   $22.95
0-123-45678-0  Ulysses       $34.00
1-22-233700-0  Visual Basic  $25.00
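
One way to make the definition concrete is to test a candidate dependency against sample rows. The sketch below assumes rows are Python dictionaries, and the helper name `fd_holds` is hypothetical. Sample data can only refute a dependency; whether it holds in general is a statement about every valid state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street", "Price": "$22.95"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses", "Price": "$34.00"},
    {"ISBN": "1-22-233700-0", "Title": "Visual Basic", "Price": "$25.00"},
]

isbn_to_title = fd_holds(books, ["ISBN"], ["Title"])
price_to_title = fd_holds(books, ["Price"], ["Title"])
```

Here `{ISBN} → {Title}` is consistent with the rows, while `{Price} → {Title}` is refuted because two different titles share the price $34.00.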

Example 2
Table Scheme: {PubID, PubName, PubPhone}
Functional Dependencies: {PubID} → {PubPhone}
                         {PubID} → {PubName}
                         {PubName, PubPhone} → {PubID}

PubID  PubName      PubPhone
1      Big House    999-999-9999
2      Small House  123-456-7890
3      Alpha Press  111-111-1111

Example 3
Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies: {AuID} → {AuPhone}
                         {AuID} → {AuName}
                         {AuName, AuPhone} → {AuID}

AuID  AuName  AuPhone
1     Sleepy  321-321-1111
2     Snoopy  232-234-1234
3     Grumpy  665-235-6532
4     Jones   123-333-3333
5     Smith   654-223-3455
6     Joyce   666-666-6666
7     Roman   444-444-4444

Dependency Diagram
• The primary key components are bold, underlined, and shaded in a
different color.
• The arrows above entities indicate all desirable dependencies, i.e.,
dependencies that are based on PK.
• The arrows below the dependency diagram indicate less desirable
dependencies -- partial dependencies and transitive dependencies

Example:

Functional Dependencies: Example
Database to track reviews of papers submitted to an academic
conference. Prospective authors submit papers for review and
possible acceptance in the published conference proceedings.
Details of the entities:
– Author information includes a unique author number, a name, a mailing
address, and a unique (optional) email address.
– Paper information includes the primary author, the paper number, the
title, the abstract, and review status (pending, accepted, rejected)
– Reviewer information includes the reviewer number, the name, the
mailing address, and a unique (optional) email address
– A completed review includes the reviewer number, the date, the paper
number, comments to the authors, comments to the program chairperson,
and ratings (overall, originality, correctness, style, clarity)

Functional Dependencies
– AuthNo → AuthName, AuthEmail, AuthAddress
– AuthEmail → AuthNo
– PaperNo → Primary-AuthNo, Title, Abstract, Status
– RevNo → RevName, RevEmail, RevAddress
– RevEmail → RevNo
– RevNo, PaperNo → AuthComm, Prog-Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5

Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements:
– The table is in first normal form.
– All non-key attributes in the table are functionally dependent on the entire primary key.
Note: Remember that we are dealing with non-key attributes.

Example 1 (Not 2NF)

Scheme: {StudentId, CourseId, StudentName, CourseTitle, Grade}
1. Key: {StudentId, CourseId}
2. {StudentId} → {StudentName}
3. {CourseId} → {CourseTitle}
4. {StudentId, CourseId} → {Grade}
5. StudentName depends on a subset of the key, i.e., StudentId.
6. CourseTitle depends on a subset of the key, i.e., CourseId.

Example 2 (Not 2NF)
Scheme: {City, Street, HouseNumber, HouseColor, CityPopulation}
1. Key: {City, Street, HouseNumber}
2. {City, Street, HouseNumber} → {HouseColor}
3. {City} → {CityPopulation}
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on City, which is a proper subset of the key.

Example 3 (Not 2NF)

Scheme: {studio, movie, budget, studio_city}
1. Key: {studio, movie}
2. {studio, movie} → {budget}
3. {studio} → {studio_city}
4. studio_city is not part of a key.
5. studio_city functionally depends on studio, which is a proper subset of the key.

2NF: Decomposition
1. If a data item is fully functionally dependent on only a part of the primary key, move that data item and that part of the primary key to a new table.
2. If other data items are functionally dependent on the same part of the key, place them in the new table as well.
3. Make the partial primary key copied from the original table the primary key of the new table.
(As with repeating groups in 1NF, the dependent items are placed in a new table.)

Example 1 (Convert to 2NF)

Old Scheme: {StudentId, CourseId, StudentName, CourseTitle, Grade}
New Scheme: {StudentId, StudentName}
New Scheme: {CourseId, CourseTitle}
New Scheme: {StudentId, CourseId, Grade}
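
The three new schemes in Example 1 are projections of the old table. A minimal Python sketch of that idea follows; the sample rows and the helper name `project` are invented for illustration and do not come from the slides.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Hypothetical sample rows for the old scheme; the values are invented.
enrollment = [
    {"StudentId": 1, "CourseId": "DB101", "StudentName": "Ann",
     "CourseTitle": "Databases", "Grade": "A"},
    {"StudentId": 1, "CourseId": "OS201", "StudentName": "Ann",
     "CourseTitle": "Operating Systems", "Grade": "B"},
    {"StudentId": 2, "CourseId": "DB101", "StudentName": "Raj",
     "CourseTitle": "Databases", "Grade": "B"},
]

students = project(enrollment, ["StudentId", "StudentName"])
courses = project(enrollment, ["CourseId", "CourseTitle"])
grades = project(enrollment, ["StudentId", "CourseId", "Grade"])
```

After the split, each student name and each course title is stored once, while `grades` keeps one row per (StudentId, CourseId) pair.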

Example 2 (Convert to 2NF)
Old Scheme: {StudioID, Movie, Budget, StudioCity}
New Scheme: {Movie, StudioID, Budget}
New Scheme: {StudioID, StudioCity}

Example 3 (Convert to 2NF)

Old Scheme: {City, Street, HouseNumber, HouseColor, CityPopulation}
New Scheme: {City, Street, HouseNumber, HouseColor}
New Scheme: {City, CityPopulation}

Third Normal Form (3NF)
• This form dictates that all non-key attributes of a table must be functionally dependent on a candidate key, with no interdependencies among non-key attributes; i.e., there should be no transitive dependencies.

• For a table to be in 3NF, there are two requirements:
– The table is in second normal form.
– No non-key attribute is transitively dependent on the primary key.

Example (Not in 3NF)

Scheme: {Title, PubID, BookType, Price}
1. Key: {Title, PubID}
2. {Title, PubID} → {BookType}
3. {BookType} → {Price}
4. Both Price and BookType depend on the entire key, hence 2NF.
5. Transitively, {Title, PubID} → {Price}, hence not in 3NF.

Title           PubID  BookType   Price
Moby Dick       1      Adventure  34.95
Giant           2      Adventure  34.95
Moby Dick       2      Adventure  34.95
Iliad           1      War        44.95
Romeo & Juliet  1      Love       59.90

Example 2 (Not in 3NF)
Scheme: {StudioID, StudioCity, CityTemp}
1. Primary Key: {StudioID}
2. {StudioID} → {StudioCity}
3. {StudioCity} → {CityTemp}
4. {StudioID} → {CityTemp}
5. Both StudioCity and CityTemp depend on the entire key, hence 2NF.
6. CityTemp transitively depends on StudioID, hence violates 3NF.

Example 3 (Not in 3NF)

Scheme: {BuildingID, Contractor, Fee}
1. Primary Key: {BuildingID}
2. {BuildingID} → {Contractor}
3. {Contractor} → {Fee}
4. {BuildingID} → {Fee}
5. Fee transitively depends on the BuildingID.
6. Both Contractor and Fee depend on the entire key, hence 2NF.

BuildingID  Contractor  Fee
100         Randolph    1200
150         Ingersoll   1100
200         Randolph    1200
250         Pitkin      1100
300         Randolph    1200

3NF: Decomposition
1. Move all items involved in transitive dependencies to a new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the
original entity.

Example 1 (Convert to 3NF)

Old Scheme: {Title, PubID, BookType, Price}
New Scheme: {BookType, Price}
New Scheme: {Title, PubID, BookType}

Example 2 (Convert to 3NF)
Old Scheme: {StudioID, StudioCity, CityTemp}
New Scheme: {StudioID, StudioCity}
New Scheme: {StudioCity, CityTemp}

Example 3 (Convert to 3NF)

Old Scheme: {BuildingID, Contractor, Fee}
New Scheme: {BuildingID, Contractor}
New Scheme: {Contractor, Fee}

BuildingID  Contractor
100         Randolph
150         Ingersoll
200         Randolph
250         Pitkin
300         Randolph

Contractor  Fee
Randolph    1200
Ingersoll   1100
Pitkin      1100
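
To illustrate why removing the transitive dependency helps, the following Python sketch models the two 3NF tables of Example 3 as dictionaries and changes one contractor's fee. The dictionary representation, the helper `fee_for`, and the new fee value 1300 are assumptions made for this illustration only.

```python
# Rows from Example 3: (BuildingID, Contractor, Fee).
buildings_flat = [
    (100, "Randolph", 1200),
    (150, "Ingersoll", 1100),
    (200, "Randolph", 1200),
    (250, "Pitkin", 1100),
    (300, "Randolph", 1200),
]

# 3NF decomposition: {BuildingID, Contractor} and {Contractor, Fee}.
building_contractor = {b: c for b, c, _ in buildings_flat}
contractor_fee = {c: f for _, c, f in buildings_flat}


def fee_for(building_id):
    """Look up the fee by following BuildingID -> Contractor -> Fee."""
    return contractor_fee[building_contractor[building_id]]


# Changing Randolph's fee (hypothetical new value) is a single update here,
# whereas the flat table would need several rows changed consistently.
contractor_fee["Randolph"] = 1300
rows_to_touch_in_flat = sum(1 for _, c, _ in buildings_flat if c == "Randolph")
```

In the decomposed form the fee is stored once per contractor, so it cannot become inconsistent across buildings; in the flat table, `rows_to_touch_in_flat` rows would all have to be edited together.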
Boyce-Codd Normal Form (BCNF)
• BCNF does not allow dependencies between attributes that belong to candidate keys.
• BCNF refines third normal form by dropping the restriction to non-key attributes: every determinant must be a candidate key.
• Third normal form and BCNF can differ only when all of the following conditions hold:
– The table has two or more candidate keys.
– At least two of the candidate keys are composed of more than one attribute.
– The keys are not disjoint, i.e., the composite candidate keys share some attributes.
Example 1 - Address (Not in BCNF)
Scheme: {City, Street, ZipCode}
1. Key1: {City, Street}
2. Key2: {ZipCode, Street}
3. No non-key attribute, hence 3NF.
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key violates BCNF.
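
The BCNF condition (every determinant is a candidate key, or more generally a superkey) can be checked mechanically with attribute closures. The sketch below is one possible formulation, not the slides' own procedure; the function names `closure` and `bcnf_violations` and the tuple encoding of dependencies are assumptions.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, its dependents are implied.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(scheme, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    scheme = set(scheme)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != scheme]


address = {"City", "Street", "ZipCode"}
address_fds = [
    (("City", "Street"), ("ZipCode",)),
    (("ZipCode",), ("City",)),
]
violations = bcnf_violations(address, address_fds)
```

For the Address example, the closure of {ZipCode} is only {ZipCode, City}, so `{ZipCode} → {City}` is reported as the violating dependency, while `{City, Street} → {ZipCode}` is not, because {City, Street} determines the whole scheme.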

Example 2 - Movie (Not in BCNF)
Scheme: {MovieTitle, StudioID, MovieID, ActorName, Role, Payment}
1. Key1: {MovieTitle, StudioID, ActorName}
2. Key2: {MovieID, ActorName}
3. Both Role and Payment functionally depend on both candidate keys, thus 3NF.
4. {MovieID} → {MovieTitle}
5. The dependency between MovieID and MovieTitle violates BCNF.

Example 3 - Consulting (Not in BCNF)

Scheme: {Client, Problem, Consultant}
(Only one consultant works on a specific client problem)
1. Key1: {Client, Problem}
2. Key2: {Client, Consultant}
3. No non-key attribute, hence 3NF.
4. {Client, Problem} → {Consultant}
5. {Client, Consultant} → {Problem}
6. Dependency between attributes belonging to keys violates BCNF.

BCNF: Decomposition
1. Place the two candidate primary keys in separate entities
2. Place each of the remaining data items in one of the resulting
entities according to its dependency on the primary key.
Example 1 (Convert to BCNF)
Old Scheme: {City, Street, ZipCode}
New Scheme1: {ZipCode, Street}
New Scheme2: {City, Street}
• Loss of relation {ZipCode} → {City}
Alternate New Scheme1: {ZipCode, Street}
Alternate New Scheme2: {ZipCode, City}

Decomposition: Loss of Information
1. If decomposition does not cause any loss of information it is
called a lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost it
is called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a
collection of smaller schemas that are in BCNF form. However
the dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into third normal form while also preserving the dependencies.
• 3NF may be better than BCNF in some cases
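
Whether a particular decomposition loses information on given data can be checked by projecting the table onto the two schemes and joining the projections back together. The sketch below does this for the Address schemes from the BCNF example. The sample rows (city names, streets, and zip codes) are hypothetical values chosen only to satisfy the stated dependencies, and the helper names are assumptions.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs as a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two projected relations on their shared attributes."""
    shared = [a for a in left_attrs if a in right_attrs]
    out_attrs = list(left_attrs) + [a for a in right_attrs if a not in shared]
    joined = set()
    for l in left:
        lrow = dict(zip(left_attrs, l))
        for r in right:
            rrow = dict(zip(right_attrs, r))
            if all(lrow[a] == rrow[a] for a in shared):
                merged = {**lrow, **rrow}
                joined.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections reproduces exactly the original rows."""
    out_attrs, joined = natural_join(project(rows, attrs1), attrs1,
                                     project(rows, attrs2), attrs2)
    original = {tuple(row[a] for a in out_attrs) for row in rows}
    return joined == original


# Hypothetical rows satisfying {City, Street} -> {ZipCode} and {ZipCode} -> {City}.
addresses = [
    {"City": "Albany", "Street": "Main St", "ZipCode": "12203"},
    {"City": "Albany", "Street": "State St", "ZipCode": "12207"},
    {"City": "Troy", "Street": "Main St", "ZipCode": "12180"},
]

first_split_ok = is_lossless(addresses, ["ZipCode", "Street"], ["City", "Street"])
alternate_split_ok = is_lossless(addresses, ["ZipCode", "Street"], ["ZipCode", "City"])
```

On this data, joining {ZipCode, Street} with {City, Street} produces spurious rows because two cities share a street name, while the alternate split on ZipCode rejoins to exactly the original three rows.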

Use your own judgment when decomposing schemas.
BCNF: Decomposition
Example 2 (Convert to BCNF)
Old Scheme: {MovieTitle, StudioID, MovieID, ActorName, Role, Payment}
New Scheme: {MovieID, ActorName, Role, Payment}
New Scheme: {MovieTitle, StudioID, ActorName}
• Loss of relation {MovieID} → {MovieTitle}
Alternate New Scheme: {MovieID, ActorName, Role, Payment}
Alternate New Scheme: {MovieID, MovieTitle}
• We got the {MovieID} → {MovieTitle} relationship back

Example 3 (Convert to BCNF)

Old Scheme: {Client, Problem, Consultant}
New Scheme: {Client, Consultant}
New Scheme: {Client, Problem}
• Loss of relation {Consultant, Problem}
Alternate New Scheme: {Client, Consultant}
Alternate New Scheme: {Consultant, Problem}
Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-one relationships between columns.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one multi-valued attribute.
Example (Not in 4NF)
Scheme: {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are part of the only candidate key, hence BCNF.
2. Many movies can have the same genre.
3. Many cities can have the same movie.
4. Violates 4NF.

Movie             ScreeningCity  Genre
Hard Code         Los Angeles    Comedy
Hard Code         New York       Comedy
Bill Durham       Santa Cruz     Drama
Bill Durham       Durham         Drama
The Code Warrior  New York       Horror

Example 2 (Not in 4NF)
Scheme: {Manager, Child, Employee}
1. Primary Key: {Manager, Child, Employee}
2. Each manager can have more than one child.
3. Each manager can supervise more than one employee.
4. 4NF violated.

Manager  Child  Employee
Jim      Beth   Alice
Mary     Bob    Jane
Mary     NULL   Adam

Example 3 (Not in 4NF)

Scheme: {Employee, Skill, ForeignLanguage}
1. Primary Key: {Employee, Skill, ForeignLanguage}
2. Each employee can speak multiple languages.
3. Each employee can have multiple skills.
4. Thus violates 4NF.

Employee  Skill      Language
1234      Cooking    French
1234      Cooking    German
1453      Carpentry  Spanish
1453      Cooking    Spanish
2345      Cooking    Spanish

4NF: Decomposition
1. Move the two multi-valued relations to separate tables.
2. Identify a primary key for each of the new entities.

Example 1 (Convert to 4NF)

Old Scheme: {MovieName, ScreeningCity, Genre}
New Scheme: {MovieName, ScreeningCity}
New Scheme: {MovieName, Genre}

Movie             Genre
Hard Code         Comedy
Bill Durham       Drama
The Code Warrior  Horror

Movie             ScreeningCity
Hard Code         Los Angeles
Hard Code         New York
Bill Durham       Santa Cruz
Bill Durham       Durham
The Code Warrior  New York

Example 2 (Convert to 4NF)
Old Scheme: {Manager, Child, Employee}
New Scheme: {Manager, Child}
New Scheme: {Manager, Employee}

Manager  Child
Jim      Beth
Mary     Bob

Manager  Employee
Jim      Alice
Mary     Jane
Mary     Adam

Example 3 (Convert to 4NF)

Old Scheme: {Employee, Skill, ForeignLanguage}
New Scheme: {Employee, Skill}
New Scheme: {Employee, ForeignLanguage}

Employee  Skill
1234      Cooking
1453      Carpentry
1453      Cooking
2345      Cooking

Employee  Language
1234      French
1234      German
1453      Spanish
2345      Spanish
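
As a quick check of Example 3, the following Python sketch splits the original rows into the two 4NF tables and joins them back on Employee. The tuple representation and the helper name `rejoin` are assumptions made for illustration.

```python
from collections import defaultdict

# Rows from Example 3: (Employee, Skill, ForeignLanguage).
rows = [
    (1234, "Cooking", "French"),
    (1234, "Cooking", "German"),
    (1453, "Carpentry", "Spanish"),
    (1453, "Cooking", "Spanish"),
    (2345, "Cooking", "Spanish"),
]

# 4NF decomposition: {Employee, Skill} and {Employee, ForeignLanguage}.
employee_skill = sorted({(e, s) for e, s, _ in rows})
employee_language = sorted({(e, l) for e, _, l in rows})


def rejoin(skills, languages):
    """Join the two tables on Employee, pairing every skill with every language."""
    langs_by_emp = defaultdict(list)
    for e, l in languages:
        langs_by_emp[e].append(l)
    return sorted((e, s, l) for e, s in skills for l in langs_by_emp[e])


rebuilt = rejoin(employee_skill, employee_language)
```

Because an employee's skills and languages are independent of each other, the join reproduces the original five rows, while each skill and each language is now recorded once per employee.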

Fifth Normal Form (5NF)
• Fifth normal form applies to M-way relationships.
• In 5NF, all tables are broken into as many tables as possible in order to avoid redundancy.
• Once a relation is in fifth normal form, it cannot be broken into smaller relations without changing the facts or the meaning.

Domain Key Normal Form (DKNF)
• A relation is in DKNF if all constraints and dependencies on the relation can be enforced by enforcing the domain constraints and key constraints on the relation.
– A domain is the set of permissible values for an attribute.
• By enforcing key and domain restrictions, the database is assured of being
freed from modification (insertion & deletion) anomalies.
• Designed to specify the “ultimate normal form” which uses all possible
types of dependencies and constraints.
– DKNF is the normalization level that most designers aim to achieve.
– The practical utility of DKNF is limited, because it is difficult to specify
general integrity constraints.
• It has been shown that a relation in DKNF is in 5NF and that DKNF is not
always achievable.

• Example (Relations with complex constraints)
– CAR = {MAKE, VIN#}, MANUFACTURE = {VIN#, COUNTRY}, where COUNTRY is the country where the car was manufactured.
– A complex constraint: for a Toyota or Lexus made in Japan, the first character of the VIN# is a “J”; for a Honda or Acura made in Japan, the second character of the VIN# is a “J”.
• Example (Normalization)
– R = {BRANCH, ACCTNUM, BALANCE}
– Constraint: An ACCTNUM beginning with 9 is a special account, which requires a minimum balance of $2,500.
– R is not in DKNF.
– Replace R by the decomposition D = {R1, R2}, where R1 = {BRANCH, ACCTNUM, BALANCE} with the constraint that an ACCTNUM does not begin with 9, and R2 = {BRANCH, ACCTNUM, BALANCE} with the constraints that an ACCTNUM begins with 9 and the BALANCE is greater than or equal to 2500.
– D is in DKNF.
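
The account example can be read as two tables whose rules are plain domain checks on individual columns. A rough Python sketch of that reading follows. It assumes ACCTNUM is stored as a string, and the function names and the `route` helper are hypothetical.

```python
MIN_SPECIAL_BALANCE = 2500


def belongs_in_r1(acctnum):
    """R1 domain constraint: ACCTNUM does not begin with 9."""
    return not acctnum.startswith("9")


def valid_for_r2(acctnum, balance):
    """R2 domain constraints: ACCTNUM begins with 9 and BALANCE >= 2500."""
    return acctnum.startswith("9") and balance >= MIN_SPECIAL_BALANCE


def route(branch, acctnum, balance):
    """Return 'R1' or 'R2' for a valid row; raise ValueError otherwise."""
    if belongs_in_r1(acctnum):
        return "R1"
    if valid_for_r2(acctnum, balance):
        return "R2"
    raise ValueError(
        f"special account {acctnum} needs a balance of at least {MIN_SPECIAL_BALANCE}"
    )
```

For instance, `route("Main", "1001", 50)` places an ordinary account in R1, and `route("Main", "9001", 2500)` places a special account in R2. A special account with a balance below 2500 fits neither table and is rejected.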
Summary
• Different Stages of Normalization
– 1NF Keys; No repeating groups
– 2NF No partial dependencies
– 3NF No transitive dependencies
– BCNF Determinants are candidate keys
– 4NF No multivalued dependencies
– 5NF Remove m-way relationships
– DKNF Use domain constraints to enforce dependencies
