Normalization (2)

DBMS

Uploaded by

Arthy J
Copyright
© © All Rights Reserved

Database Normalization

Outline

• Features of Good Relational Design


• Functional Dependencies
• Decomposition Using Functional Dependencies
• Normal Forms
• Functional Dependency Theory
• Algorithms for Decomposition using Functional Dependencies
• Decomposition Using Multivalued Dependencies
• More Normal Forms
• Atomic Domains and First Normal Form
• Database-Design Process
• Modeling Temporal Data
ER versus Normalization

• Features of ER

– Entity sets and Relationship sets


– Mapping to tables

• Features of Normalization

– Sets of all attributes used in the database


– Distribution of the attributes to various tables
Normalization Goal

• Given a set of attributes
R = {A1, A2, A3, …, An}
• Partition the attributes among M relations
– With no repetition of information
• Seems to be mission impossible.
Features of Good Relational Designs

• Consider the new relation in_dep that combines the instructor and
department tables

• There is repetition of information in the relation in_dep


• Need to use null values (if we add a new department with no
instructors)
• This relation is NOT in good form
Good Form
• A relation is said to be in “good” form if:
– There is no repetition of information
– There is no need to use null values

• The definition is a bit vague but good enough for now.


• Goal:
– Devise a scheme to make sure that all tables are in
good form
Decomposition
• The only way to avoid the repetition-of-information problem
in the in_dep schema is to decompose it into two schemas –
instructor and department schemas.
• Not all decompositions are good. Suppose we have the
schema:

employee(ID, name, street, city, salary)

and we decompose it into
employee1 (ID, name)
employee2 (name, street, city, salary)
The problem arises when two employees have the
same name.
• In that case we lose information when we decompose the
relation employee: we cannot reconstruct the original
employee relation, and so this is a lossy
decomposition.
A Lossy Decomposition
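The loss can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the two employee rows are hypothetical, and `project` and `natural_join` are helper names introduced only for this illustration.

```python
# Hypothetical employee rows: two different people share the name "Kim".
ATTRS = ("ID", "name", "street", "city", "salary")
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

def project(rows, attrs, keep):
    """Project a set of tuples onto the attributes listed in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two sets of tuples on their shared attributes."""
    shared = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in shared):
                out.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return out

E1 = ("ID", "name")
E2 = ("name", "street", "city", "salary")
employee1 = project(employee, ATTRS, E1)
employee2 = project(employee, ATTRS, E2)
rejoined = natural_join(employee1, E1, employee2, E2)

print(len(employee), len(rejoined))  # 2 4
print(employee < rejoined)           # True: the join adds spurious rows
```

The join returns more rows than the original relation, yet it carries less information: we can no longer tell which Kim lives on which street.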
Lossless Decomposition

• Let R be a relation schema and let R1 and R2 form a
decomposition of R. That is, R = R1 ∪ R2
• We say that the decomposition is a lossless
decomposition if there is no loss of information by
replacing R with the two relation schemas R1 and R2
• Let r be any valid instance of R
• A decomposition is a lossless decomposition if
r = Π_R1(r) ⋈ Π_R2(r)
• And conversely a decomposition is lossy if
r ⊂ Π_R1(r) ⋈ Π_R2(r)
Lossless Decomposition Example

• Decomposition of R = (A, B, C) into
R1 = (A, B) and R2 = (B, C)
• Example of a database instance
Keys and Functional Dependencies

• K is a superkey for relation schema R if and only if K → R

• K is a candidate key for R if and only if
– K → R, and
– For no α ⊂ K, α → R
• Functional dependencies allow us to express constraints that cannot be expressed
using superkeys. Consider the schema:
in_dep (ID, name, salary, dept_name, building, budget).
We expect these functional dependencies to hold:
dept_name → building
ID → building
but would not expect the following to hold:
dept_name → salary
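The superkey and candidate-key conditions can be tested mechanically with the standard attribute-closure computation. The sketch below is an illustration only: the set `FDS` is an assumed set of dependencies for in_dep (the slides name only dept_name → building and ID → building), and the function names are hypothetical.

```python
from itertools import combinations

R = {"ID", "name", "salary", "dept_name", "building", "budget"}
# Assumed functional dependencies for in_dep, as (left side, right side).
FDS = [
    ({"ID"}, {"name", "salary", "dept_name"}),
    ({"dept_name"}, {"building", "budget"}),
]

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(k, schema, fds):
    """K is a superkey if and only if K -> R."""
    return closure(k, fds) >= schema

def candidate_keys(schema, fds):
    """Superkeys with no proper subset that is also a superkey."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            if is_superkey(k, schema, fds) and not any(key < k for key in keys):
                keys.append(k)
    return keys

print(candidate_keys(R, FDS))                   # [{'ID'}]
print("building" in closure({"ID"}, FDS))       # True: ID -> building
print("salary" in closure({"dept_name"}, FDS))  # False
```

Under these assumed dependencies, ID → building follows by transitivity through dept_name, while dept_name → salary does not follow, which matches the expectations stated above.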
Definition
• Normalization is the process that allows you to winnow out
redundant data within your database.
• It involves restructuring the tables so that they successively
meet higher forms of normalization.
• A properly normalized database should have the
following characteristics:
– Scalar values in each field
– Absence of redundancy
– Minimal use of null values
– Minimal loss of information
Levels of Normalization
• Levels of normalization are based on the amount
of redundancy in the database.
• The various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
(Diagram labels: Redundancy, Number of Tables, Complexity)

Most databases should be 3NF or BCNF in order to avoid
the database anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF

Each higher level is a subset of the lower level
First Normal Form
(1NF)
A table is considered to be in 1NF if all the fields contain
only scalar values (as opposed to lists of values).
Example (Not 1NF)
ISBN           Title         AuName   AuPhone        PubName      PubPhone      Price
0-321-32132-1  Balloon       Sleepy,  321-321-1111,  Small House  714-000-0000  $34.00
                             Snoopy,  232-234-1234,
                             Grumpy   665-235-6532
0-55-123456-9  Main Street   Jones,   123-333-3333,  Small House  714-000-0000  $22.95
                             Smith    654-223-3455
0-123-45678-0  Ulysses       Joyce    666-666-6666   Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Roman    444-444-4444   Big House    123-456-7890  $25.00

The AuName and AuPhone columns are not scalar
1NF - Decomposition
1. Place all items that appear in the repeating group
in a new table
2. Designate a primary key for each new table
produced.
3. Duplicate in the new table the primary key of the
table from which the repeating group was
extracted or vice versa.
Example (1NF)
ISBN           Title         PubName      PubPhone      Price
0-321-32132-1  Balloon       Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Big House    123-456-7890  $25.00

ISBN           AuName  AuPhone
0-321-32132-1  Sleepy  321-321-1111
0-321-32132-1  Snoopy  232-234-1234
0-321-32132-1  Grumpy  665-235-6532
0-55-123456-9  Jones   123-333-3333
0-55-123456-9  Smith   654-223-3455
0-123-45678-0  Joyce   666-666-6666
1-22-233700-0  Roman   444-444-4444
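The three decomposition steps can be sketched in Python. This is an illustration under assumptions, not the slides' own procedure: rows are modelled as dictionaries, the repeating group is stored as parallel lists, and `to_1nf` is a hypothetical helper name. It also assumes that the n-th author name pairs with the n-th phone number.

```python
# Two rows from the "Not 1NF" example: AuName and AuPhone hold lists.
books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": ["Sleepy", "Snoopy", "Grumpy"],
     "AuPhone": ["321-321-1111", "232-234-1234", "665-235-6532"],
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street",
     "AuName": ["Jones", "Smith"],
     "AuPhone": ["123-333-3333", "654-223-3455"],
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$22.95"},
]

def to_1nf(rows, key, repeating):
    """Move the repeating-group columns into a new table that also
    carries the key of the table they were extracted from."""
    base, extracted = [], []
    for row in rows:
        # Step 1: everything outside the repeating group stays put.
        base.append({c: v for c, v in row.items() if c not in repeating})
        # Steps 2 and 3: one scalar row per list position, plus the key.
        for values in zip(*(row[c] for c in repeating)):
            new_row = {key: row[key]}
            new_row.update(zip(repeating, values))
            extracted.append(new_row)
    return base, extracted

book_table, author_table = to_1nf(books, "ISBN", ["AuName", "AuPhone"])
print(len(book_table), len(author_table))  # 2 5
```

Every field in both resulting tables is scalar, and ISBN links each author row back to its book.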
Functional
Dependencies
1. If one set of attributes in a table determines
another set of attributes in the table, then the
second set of attributes is said to be
functionally dependent on the first set of
attributes.

Example 1
Table Scheme: {ISBN, Title, Price}
Functional Dependencies: {ISBN} → {Title}
                         {ISBN} → {Price}

ISBN           Title         Price
0-321-32132-1  Balloon       $34.00
0-55-123456-9  Main Street   $22.95
0-123-45678-0  Ulysses       $34.00
1-22-233700-0  Visual Basic  $25.00
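A functional dependency can be checked against a concrete table: it fails as soon as two rows agree on the left side but differ on the right side. The sketch below uses the rows of Example 1; `fd_holds` is a hypothetical helper name. Note that passing this check on one instance does not prove the dependency in general, it only fails to refute it.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first right side seen for each left side.
        if seen.setdefault(left, right) != right:
            return False
    return True

books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street", "Price": "$22.95"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses", "Price": "$34.00"},
    {"ISBN": "1-22-233700-0", "Title": "Visual Basic", "Price": "$25.00"},
]

print(fd_holds(books, ["ISBN"], ["Title"]))   # True
print(fd_holds(books, ["ISBN"], ["Price"]))   # True
print(fd_holds(books, ["Price"], ["Title"]))  # False: two titles cost $34.00
```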
Functional
Dependencies
Example 2
Table Scheme: {PubID, PubName, PubPhone}
Functional Dependencies: {PubID} → {PubPhone}
                         {PubID} → {PubName}
                         {PubName, PubPhone} → {PubID}

PubID  PubName      PubPhone
1      Big House    999-999-9999
2      Small House  123-456-7890
3      Alpha Press  111-111-1111

Example 3
Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies: {AuID} → {AuPhone}
                         {AuID} → {AuName}
                         {AuName, AuPhone} → {AuID}

AuID  AuName  AuPhone
1     Sleepy  321-321-1111
2     Snoopy  232-234-1234
3     Grumpy  665-235-6532
4     Jones   123-333-3333
5     Smith   654-223-3455
6     Joyce   666-666-6666
7     Roman   444-444-4444
FD – Example
Consider a database that tracks reviews of papers submitted to an
academic conference. Prospective authors submit
papers for review and possible acceptance in the
published conference proceedings. Details of the entities:
– Author information includes a unique author number, a
name, a mailing address, and a unique (optional) email
address.
– Paper information includes the primary author, the paper
number, the title, the abstract, and the review status
(pending, accepted, rejected).
– Reviewer information includes the reviewer number, the
name, the mailing address, and a unique (optional) email
address.
– A completed review includes the reviewer number, the
date, the paper number, comments to the authors,
comments to the program chairperson, and ratings
(overall, originality, correctness, style, clarity).
FD – Example
Functional Dependencies
– AuthNo → AuthName, AuthEmail, AuthAddress
– AuthEmail → AuthNo
– PaperNo → Primary-AuthNo, Title, Abstract,
Status
– RevNo → RevName, RevEmail, RevAddress
– RevEmail → RevNo
– RevNo, PaperNo → AuthComm, Prog-Comm,
Date, Rating1, Rating2, Rating3, Rating4,
Rating5
Second Normal Form
(2NF)
For a table to be in 2NF, there are two requirements:
– The table is in first normal form
– All non-key attributes in the table must be functionally
dependent on the entire primary key
Note: Remember that we are dealing with non-key attributes

Example 1 (Not 2NF)
Scheme: {Title, PubID, AuID, Price, AuAddress}
1. Key: {Title, PubID, AuID}
2. {Title, PubID, AuID} → {Price}
3. {AuID} → {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuID, which is a
proper subset of the key
Second Normal Form
(2NF)
Example 2 (Not 2NF)
Scheme: {City, Street, HouseNumber, HouseColor,
CityPopulation}
1. Key: {City, Street, HouseNumber}
2. {City, Street, HouseNumber} → {HouseColor}
3. {City} → {CityPopulation}
4. CityPopulation does not belong to any key
5. CityPopulation is functionally dependent on City, which is a
proper subset of the key

Example 3 (Not 2NF)
Scheme: {Studio, Movie, Budget, StudioCity}
1. Key: {Studio, Movie}
2. {Studio, Movie} → {Budget}
3. {Studio} → {StudioCity}
4. StudioCity is not a part of a key
5. StudioCity functionally depends on Studio, which is a proper
subset of the key
2NF - Decomposition
1. If a data item is fully functionally dependent on only a
part of the primary key, move that data item and that
part of the primary key to a new table.
2. If other data items are functionally dependent on the
same part of the key, place them in the new table also
3. Make the partial primary key copied from the original
table the primary key for the new table. Place all items
that appear in the repeating group in a new table
Example 1 (Convert to 2NF)
Old Scheme: {Title, PubID, AuID, Price, AuAddress}
New Scheme: {Title, PubID, AuID, Price}
New Scheme: {AuID, AuAddress}
2NF - Decomposition
Example 2 (Convert to 2NF)
Old Scheme: {Studio, Movie, Budget, StudioCity}
New Scheme: {Movie, Studio, Budget}
New Scheme: {Studio, StudioCity}

Example 3 (Convert to 2NF)
Old Scheme: {City, Street, HouseNumber, HouseColor,
CityPopulation}
New Scheme: {City, Street, HouseNumber, HouseColor}
New Scheme: {City, CityPopulation}
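The 2NF conversions above follow one pattern, which can be sketched as code. This is a simplified illustration, not a full normalization algorithm: it inspects only the dependencies that are listed explicitly (not those implied by them), and the function names are hypothetical.

```python
def partial_dependencies(key, fds):
    """Listed FDs whose left side is a proper subset of the key and
    whose right side contains at least one non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        if frozenset(lhs) < key and non_key:
            found.append((frozenset(lhs), non_key))
    return found

def decompose_2nf(schema, key, fds):
    """Move partially dependent attributes, together with their part of
    the key, into new schemes; the rest stays in the original scheme."""
    remaining = set(schema)
    grouped = {}
    for lhs, attrs in partial_dependencies(key, fds):
        # Attributes that depend on the same part of the key share a table.
        grouped.setdefault(lhs, set()).update(attrs)
        remaining -= attrs
    return [remaining] + [set(lhs) | attrs for lhs, attrs in grouped.items()]

schema = {"Title", "PubID", "AuID", "Price", "AuAddress"}
key = {"Title", "PubID", "AuID"}
fds = [
    ({"Title", "PubID", "AuID"}, {"Price"}),
    ({"AuID"}, {"AuAddress"}),
]
schemes = decompose_2nf(schema, key, fds)
print(len(schemes))  # 2
```

For Example 1 this yields {Title, PubID, AuID, Price} and {AuID, AuAddress}, the same two schemes listed above.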
Third Normal Form
(3NF)
This form dictates that all non-key attributes of a table must be
functionally dependent on a candidate key, i.e. there can be
no interdependencies among non-key attributes.

For a table to be in 3NF, there are two requirements:
– The table should be in second normal form
– No non-key attribute is transitively dependent on the primary key

Example (Not in 3NF)
Scheme: {Title, PubID, PageCount, Price}
1. Key: {Title, PubID}
2. {Title, PubID} → {PageCount}
3. {PageCount} → {Price}
4. Both Price and PageCount depend on the entire key, hence 2NF
5. Transitively, {Title, PubID} → {Price}, hence not in 3NF
Third Normal Form
(3NF)
Example 2 (Not in 3NF)
Scheme: {Studio, StudioCity, CityTemp}
1. Primary Key: {Studio}
2. {Studio} → {StudioCity}
3. {StudioCity} → {CityTemp}
4. {Studio} → {CityTemp}
5. Both StudioCity and CityTemp depend on the entire key, hence
2NF
6. CityTemp transitively depends on Studio, hence violates 3NF
Example 3 (Not in 3NF)
Scheme: {BuildingID, Contractor, Fee}
1. Primary Key: {BuildingID}
2. {BuildingID} → {Contractor}
3. {Contractor} → {Fee}
4. {BuildingID} → {Fee}
5. Fee transitively depends on the BuildingID
6. Both Contractor and Fee depend on the entire key, hence 2NF

BuildingID  Contractor  Fee
100         Randolph    1200
150         Ingersoll   1100
200         Randolph    1200
250         Pitkin      1100
300         Randolph    1200
3NF - Decomposition
1. Move all items involved in transitive dependencies to a
new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign
key on the original entity.
Example 1 (Convert to 3NF)
Old Scheme: {Title, PubID, PageCount, Price}
New Scheme: {PubID, PageCount, Price}
New Scheme: {Title, PubID, PageCount}
3NF - Decomposition
Example 2 (Convert to 3NF)
Old Scheme: {Studio, StudioCity, CityTemp}
New Scheme: {Studio, StudioCity}
New Scheme: {StudioCity, CityTemp}

Example 3 (Convert to 3NF)
Old Scheme: {BuildingID, Contractor, Fee}
New Scheme: {BuildingID, Contractor}
New Scheme: {Contractor, Fee}

BuildingID  Contractor
100         Randolph
150         Ingersoll
200         Randolph
250         Pitkin
300         Randolph

Contractor  Fee
Randolph    1200
Ingersoll   1100
Pitkin      1100
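The building example can be replayed in Python to show what the 3NF split buys. This is a small sketch using the rows from the slide; the variable names are illustrative only.

```python
# Rows of the original scheme {BuildingID, Contractor, Fee}.
buildings = [
    (100, "Randolph", 1200),
    (150, "Ingersoll", 1100),
    (200, "Randolph", 1200),
    (250, "Pitkin", 1100),
    (300, "Randolph", 1200),
]

# Project onto {BuildingID, Contractor} and {Contractor, Fee}.
building_contractor = {(b, c) for b, c, _ in buildings}
contractor_fee = {(c, f) for _, c, f in buildings}

# Because Contractor -> Fee, each contractor appears once in contractor_fee,
# so a plain dictionary lookup performs the join on Contractor.
fee_of = dict(contractor_fee)
rejoined = sorted((b, c, fee_of[c]) for b, c in building_contractor)

print(len(contractor_fee))            # 3
print(rejoined == sorted(buildings))  # True
```

Randolph's fee is now stored once instead of three times, and joining the two tables still reproduces every original row, so nothing is lost.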
Boyce-Codd Normal Form

(BCNF)
BCNF does not allow dependencies between attributes that belong to
candidate keys.
• BCNF is a refinement of the third normal form that drops the
restriction to non-key attributes: the left-hand side of every
nontrivial functional dependency must be a superkey.
• Third normal form and BCNF can differ only if all of the following
conditions are true:
– The table has two or more candidate keys
– At least two of the candidate keys are composed of more than one attribute
– The keys are not disjoint, i.e. the composite candidate keys share some attributes

Example 1 - Address (Not in BCNF)
Scheme: {City, Street, ZipCode}
1. Key1: {City, Street}
2. Key2: {ZipCode, Street}
3. No non-key attribute, hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key violates BCNF
Boyce Codd Normal Form
(BCNF)
Example 2 - Movie (Not in BCNF)
Scheme: {MovieTitle, MovieID, PersonName, Role, Payment}
1. Key1: {MovieTitle, PersonName}
2. Key2: {MovieID, PersonName}
3. Both Role and Payment functionally depend on both candidate
keys, thus 3NF
4. {MovieID} → {MovieTitle}
5. Dependency between MovieID and MovieTitle violates BCNF

Example 3 - Consulting (Not in BCNF)
Scheme: {Client, Problem, Consultant}
1. Key1: {Client, Problem}
2. Key2: {Client, Consultant}
3. No non-key attribute, hence 3NF
4. {Client, Problem} → {Consultant}
5. {Client, Consultant} → {Problem}
6. Dependency between attributes belonging to keys violates
BCNF
BCNF - Decomposition
1. Place the two candidate primary keys in
separate entities
2. Place each of the remaining data items in one
of the resulting entities according to its
dependency on the primary key.
Example 1 (Convert to BCNF)
Old Scheme: {City, Street, ZipCode}
New Scheme1: {ZipCode, Street}
New Scheme2: {City, Street}
• Loss of the dependency {ZipCode} → {City}
Alternate New Scheme1: {ZipCode, Street}
Alternate New Scheme2: {ZipCode, City}
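The two candidate decompositions of the address scheme can be compared with a short sketch. It is an illustration, not the slides' method: it uses the standard attribute-closure computation, a BCNF check on the listed dependencies only, and the usual binary lossless-join test (the shared attributes must determine all of R1 or all of R2). All function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Listed FDs that are nontrivial and whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    common = closure(r1 & r2, fds)
    return common >= r1 or common >= r2

schema = {"City", "Street", "ZipCode"}
fds = [
    ({"City", "Street"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]

violations = bcnf_violations(schema, fds)
print(len(violations))                                              # 1
print(is_lossless({"ZipCode", "Street"}, {"City", "Street"}, fds))  # False
print(is_lossless({"ZipCode", "Street"}, {"ZipCode", "City"}, fds)) # True
```

The only violating dependency is {ZipCode} → {City}. Under this test the alternate pair of schemes is lossless because the shared attribute ZipCode determines City, whereas Street alone determines neither side of the first pair.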
Decomposition – Loss of
Information
1. If decomposition does not cause any loss of information
it is called a lossless decomposition.
2. If a decomposition does not cause any dependencies to
be lost it is called a dependency-preserving
decomposition.
3. Any table scheme can be decomposed in a lossless way
into a collection of smaller schemas that are in BCNF
form. However the dependency preservation is not
guaranteed.
4. Any table can be decomposed in a lossless way into 3rd
normal form that also preserves the dependencies.
• 3NF may be better than BCNF in some cases

Use your own judgment when decomposing schemas
BCNF - Decomposition
Example 2 (Convert to BCNF)
Old Scheme: {MovieTitle, MovieID, PersonName, Role, Payment}
New Scheme: {MovieID, PersonName, Role, Payment}
New Scheme: {MovieTitle, PersonName}
• Loss of the dependency {MovieID} → {MovieTitle}
New Scheme: {MovieID, PersonName, Role, Payment}
New Scheme: {MovieID, MovieTitle}
• We got the {MovieID} → {MovieTitle} dependency back
Example 3 (Convert to BCNF)
Old Scheme: {Client, Problem, Consultant}
New Scheme: {Client, Consultant}
New Scheme: {Client, Problem}
Fourth Normal Form

(4NF)
Fourth normal form eliminates independent many-to-one
relationships between columns.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one
multi-valued attribute.

Example (Not in 4NF)
Scheme: {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are a part of the only candidate key, hence
BCNF
2. Many movies can have the same genre
3. Many cities can have the same movie
4. Violates 4NF

Movie             ScreeningCity  Genre
Hard Code         Los Angeles    Comedy
Hard Code         New York       Comedy
Bill Durham       Santa Cruz     Drama
Bill Durham       Durham         Drama
The Code Warrier  New York       Horror
Fourth Normal Form
(4NF)
Example 2 (Not in 4NF)
Scheme: {Manager, Child, Employee}
1. Primary Key: {Manager, Child, Employee}
2. Each manager can have more than one child
3. Each manager can supervise more than one employee
4. 4NF violated

Manager  Child  Employee
Jim      Beth   Alice
Mary     Bob    Jane
Mary     NULL   Adam

Example 3 (Not in 4NF)
Scheme: {Employee, Skill, ForeignLanguage}
1. Primary Key: {Employee, Skill, ForeignLanguage}
2. Each employee can speak multiple languages
3. Each employee can have multiple skills
4. Thus violates 4NF

Employee  Skill      Language
1234      Cooking    French
1234      Cooking    German
1453      Carpentry  Spanish
1453      Cooking    Spanish
2345      Cooking    Spanish
4NF - Decomposition
1. Move the two multi-valued relations to separate tables
2. Identify a primary key for each of the new entities.

Example 1 (Convert to 4NF)
Old Scheme: {MovieName, ScreeningCity, Genre}
New Scheme: {MovieName, ScreeningCity}
New Scheme: {MovieName, Genre}

Movie             Genre
Hard Code         Comedy
Bill Durham       Drama
The Code Warrier  Horror

Movie             ScreeningCity
Hard Code         Los Angeles
Hard Code         New York
Bill Durham       Santa Cruz
Bill Durham       Durham
The Code Warrier  New York
4NF - Decomposition
Example 2 (Convert to 4NF)
Old Scheme: {Manager, Child, Employee}
New Scheme: {Manager, Child}
New Scheme: {Manager, Employee}

Manager  Child
Jim      Beth
Mary     Bob

Manager  Employee
Jim      Alice
Mary     Jane
Mary     Adam

Example 3 (Convert to 4NF)
Old Scheme: {Employee, Skill, ForeignLanguage}
New Scheme: {Employee, Skill}
New Scheme: {Employee, ForeignLanguage}

Employee  Skill
1234      Cooking
1453      Carpentry
1453      Cooking
2345      Cooking

Employee  Language
1234      French
1234      German
1453      Spanish
2345      Spanish
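The independence that 4NF targets is a multivalued dependency, the notion behind "Decomposition Using Multivalued Dependencies" in the outline. The sketch below checks one on the rows of Example 3. It is an illustration under assumptions: `mvd_holds` is a hypothetical helper, and it tests a single table instance rather than the schema's constraints.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X ->> Y on rows (dicts), where z lists the remaining attributes.
    For each X value, the (Y, Z) pairs present must be the full cross
    product of the Y values and the Z values seen with that X value."""
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        groups.setdefault(xv, set()).add((yv, zv))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

rows = [
    {"Employee": 1234, "Skill": "Cooking", "Language": "French"},
    {"Employee": 1234, "Skill": "Cooking", "Language": "German"},
    {"Employee": 1453, "Skill": "Carpentry", "Language": "Spanish"},
    {"Employee": 1453, "Skill": "Cooking", "Language": "Spanish"},
    {"Employee": 2345, "Skill": "Cooking", "Language": "Spanish"},
]

print(mvd_holds(rows, ["Employee"], ["Skill"], ["Language"]))  # True
print(mvd_holds(rows, ["Skill"], ["Employee"], ["Language"]))  # False
```

Since skills and languages vary independently for each employee in this instance, splitting the table into {Employee, Skill} and {Employee, ForeignLanguage} loses nothing.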


Fifth Normal Form

(5NF)
Fifth normal form is satisfied when all tables
are broken into as many tables as possible in
order to avoid redundancy. Once it is in fifth
normal form it cannot be broken into smaller
relations without changing the facts or the
meaning.
Domain Key Normal Form
(DKNF)
• The relation is in DKNF when there can be no
insertion or deletion anomalies in the
database.
