0% found this document useful (0 votes)
4 views

Normalization (2)

Database normalization is a process aimed at eliminating redundant data by restructuring tables to achieve higher normalization forms, characterized by scalar values, minimal redundancy, and reduced null values. It addresses update anomalies, such as insertion, deletion, and modification issues that arise from redundant data. The document outlines various levels of normalization, including 1NF, 2NF, and 3NF, and emphasizes the importance of functional dependencies in maintaining data integrity.

Uploaded by

rithike
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

Normalization (2)

Database normalization is a process aimed at eliminating redundant data by restructuring tables to achieve higher normalization forms, characterized by scalar values, minimal redundancy, and reduced null values. It addresses update anomalies, such as insertion, deletion, and modification issues that arise from redundant data. The document outlines various levels of normalization, including 1NF, 2NF, and 3NF, and emphasizes the importance of functional dependencies in maintaining data integrity.

Uploaded by

rithike
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

Database Normalization

Definition
• This is the process which allows you to winnow out
redundant data within your database.
• This involves restructuring the tables to successively
meeting higher forms of Normalization.
• A properly normalized database should have the
following characteristics
– Scalar values in each fields
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information.
Update Anomalies
Relations that have redundant data may
have problems called update anomalies,
which are classified as ,
Insertion anomalies
Deletion anomalies
Modification anomalies
Example of Update Anomalies

To insert a new staff with branchNo B007 into the


StaffBranch relation;
To delete a tuple that represents the last member of
staff located at a branch B007;
To change the address of branch B003.
StaffBranch
staffNo sName position salary branchNo bAddress
SL21 John White Manager 30000 B005 22 Deer Rd, London
SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow
SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow
SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen
SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow
SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London

Figure 1 StaffBranch relation


Example of Update Anomalies (2)

Staf
f
staffNo sName position salary branceNo
SL21 John White Manager 30000 B005
SG37 Ann Beech Assistant 12000 B003
SG14 David Ford Supervisor 18000 B003
SA9 Mary Howe Assistant 9000 B007
SG5 Susan Brand Manager 24000 B003
SL41 Julie Lee Assistant 9000 B005

Branch
branceNo bAddress
B005 22 Deer Rd, London
B007 16 Argyll St, Aberdeen
B003 163 Main St,Glasgow

Figure 2 Staff and Branch relations


Design Alternative :Large Schemas
Loan Borrow
Loan_id amount er
Customer_id Loan_id
Loan_id
23-652 L-100
Primary key in
L-100 one and foreign 15-202 L-100
key in another 23-521 L-100

Bor_loan More
cust_id loan_id amount redundancy.
23-652 L-100 10000 Amount
15-202 L-100 10000 repeated for
23-521 L-100 10000 all loan. If
amount
modified
should be
Design Alternative :Large Schemas
Loan_id amount Loan_id Branch_name

L-100 10000 L-100 springfield

Loan_id amount Branch_name


L-100 10000 springfield
Design Alternative: smaller schemas
• All the decomposing of schemas are not helpful.
Employee_id ename Telephone_no Start_date

Lossy Decomposition
123-45-6789 Kim 882-0000 1984-April-29
987-65-4321 Kim 869-9999 1981-01-16

Employee_id ename ename telephone_no Start_date


123-45-6789 Kim Kim 882-0000 1984-April-29

987-65-4321 Kim Kim 869-9999 1981-01-16

Employee_id ename Telephone_no Start_date


123-45-6789 Kim 882-0000 1984-April-29
123-45-6789 Kim 869-9999 1981-01-16
987-65-4321 Kim 882-0000 1984-April-29
987-65-4321 Kim 869-9999 1981-01-16
Levels of Normalization
• Levels of normalization based on the amount of
redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)

Number of Tables
Redundancy
– Third Normal Form (3NF)

Complexity
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)

Most databases should be 3NF or BCNF in order to avoid


the database anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF

Each higher level is a subset of the lower level


First Normal Form (1NF)
A table is considered to be in 1NF if all the fields contain
only scalar values (as opposed to list of values).
Example (Not 1NF)

ISBN Title AuName AuPhone PubName PubPhone Price

0-321-32132-1 Balloon Sleepy, 321-321-1111, Small House 714-000-0000 $34.00


Snoopy, 232-234-1234,
Grumpy 665-235-6532

0-55-123456-9 Main Street Jones, 123-333-3333, Small House 714-000-0000 $22.95


Smith 654-223-3455
0-123-45678-0 Ulysses Joyce 666-666-6666 Alpha Press 999-999-9999 $34.00

1-22-233700-0 Visual Roman 444-444-4444 Big House 123-456-7890 $25.00


Basic

Author and AuPhone columns are not scalar


1NF - Decomposition
1. Place all items that appear in the repeating group in a
new table
2. Designate a primary key for each new table produced.
3. Duplicate in the new table the primary key of the table
from which the repeating group was extracted or vice
versa.
Example (1NF) ISBN AuName AuPhone

0-321-32132-1 Sleepy 321-321-1111

ISBN Title PubName PubPhone Price 0-321-32132-1 Snoopy 232-234-1234

0-321-32132-1 Balloon Small House 714-000-0000 $34.00 0-321-32132-1 Grumpy 665-235-6532

0-55-123456-9 Main Street Small House 714-000-0000 $22.95 0-55-123456-9 Jones 123-333-3333

0-123-45678-0 Ulysses Alpha Press 999-999-9999 $34.00 0-55-123456-9 Smith 654-223-3455

1-22-233700-0 Visual Big House 123-456-7890 $25.00 0-123-45678-0 Joyce 666-666-6666


Basic
1-22-233700-0 Roman 444-444-4444
Functional Dependencies
Functional dependency describes the relationship
between attributes in a relation.
For example, if A and B are attributes of relation R, and B
is functionally dependent on A ( denoted A  B), if each
value of A is associated with exactly one value of B. ( A
and B may each consist of one or more attributes.)

B is functionally
A B
dependent on A

Determinant Refers to the attribute or group of attributes on


the left-hand side of the arrow of a functional
dependency
Functional dependencies
• Consider a relational schema R, and let α c
R and β c R.
• The functional dependency α  β holds on
schema r( R), for all pairs of tuples t1 and
t2 in r such that t1[α] = t2[α] , is also the
case t1[β]=t2[β].
Functional Dependencies
1. If one set of attributes in a table determines another
set of attributes in the table, then the second set of
attributes is said to be functionally dependent on the
first set of attributes.

Example 1

ISBN Title Price Table Scheme: {ISBN, Title, Price}


0-321-32132-1 Balloon $34.00 Functional Dependencies: {ISBN}  {Title}
0-55-123456-9 Main Street $22.95 {ISBN}  {Price}
0-123-45678-0 Ulysses $34.00

1-22-233700-0 Visual $25.00


Basic
Functional Dependencies
Example 2
PubID PubName PubPhone Table Scheme: {PubID, PubName, PubPhone}
1 Big House 999-999-9999 Functional Dependencies: {PubId}  {PubPhone}
2 Small House 123-456-7890
{PubId}  {PubName}
{PubName, PubPhone}  {PubID}
3 Alpha Press 111-111-1111

Example 3
AuID AuName AuPhone
1 Sleepy 321-321-1111
Table Scheme: {AuID, AuName, AuPhone}
2 Snoopy 232-234-1234 Functional Dependencies: {AuId}  {AuPhone}
3 Grumpy 665-235-6532 {AuId}  {AuName}
4 Jones 123-333-3333
{AuName, AuPhone}  {AuID}
5 Smith 654-223-3455

6 Joyce 666-666-6666

7 Roman 444-444-4444
FD – Example
Database to track reviews of papers submitted to an academic
conference. Prospective authors submit papers for review and
possible acceptance in the published conference proceedings.
Details of the entities
– Author information includes a unique author number, a name, a
mailing address, and a unique (optional) email address.
– Paper information includes the primary author, the paper number,
the title, the abstract, and review status (pending,
accepted,rejected)
– Reviewer information includes the reviewer number, the name, the
mailing address, and a unique (optional) email address
– A completed review includes the reviewer number, the date, the
paper number, comments to the authors, comments to the
program chairperson, and ratings (overall, originality, correctness,
style, clarity)
FD – Example
Functional Dependencies
– AuthNo  AuthName, AuthEmail, AuthAddress
– AuthEmail  AuthNo
– PaperNo  Primary-AuthNo, Title, Abstract, Status
– RevNo  RevName, RevEmail, RevAddress
– RevEmail  RevNo
– RevNo, PaperNo  AuthComm, Prog-Comm, Date,
Rating1, Rating2, Rating3, Rating4, Rating5
Functional dependency
• Types of functional dependencies:
– Full Functional dependency
– Partial Functional dependency
– Transitive dependency
Full dependencies
Partial dependencies
Transitive dependencies
Trivial Functional Dependencies
• Functional dependencies are said to be trivial
because they satisfied by all relation.
• A functional dependency of the form
α  β is trivial if β C α.
Functional Dependencies (2)

Trival functional dependency means that the right-hand


side is a subset ( not necessarily a proper subset) of the left-
hand side.
For example:
staffNo, sName  sName
staffNo, sName  staffNo
They do not provide any additional information about
possible integrity constraints on the values held by
these attributes.

We are normally more interested in nontrivial


dependencies because they represent integrity
constraints for the relation.
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements
– The database is in first normal form
– All nonkey attributes in the table must be functionally dependent on the
entire primary key
Note: Remember that we are dealing with non-key attributes

Example 1 (Not 2NF)


Scheme  {Title, PubId, AuId, Price, AuAddress}
1. Key  {Title, PubId, AuId}
2. {Title, PubId, AuID}  {Price}
3. {AuID}  {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuId which is a subset of a key
Second Normal Form (2NF)
Example 2 (Not 2NF)
Scheme  {City, Street, HouseNumber, HouseColor, CityPopulation}
1. key  {City, Street, HouseNumber}
2. {City, Street, HouseNumber}  {HouseColor}
3. {City}  {CityPopulation}
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on the City which is a proper subset of
the key

Example 3 (Not 2NF)


Scheme  {studio, movie, budget, studio_city}
1. Key  {studio, movie}
2. {studio, movie}  {budget}
3. {studio}  {studio_city}
4. studio_city is not a part of a key
5. studio_city functionally depends on studio which is a proper subset of the key
2NF - Decomposition
1. If a data item is fully functionally dependent on only a part of the
primary key, move that data item and that part of the primary
key to a new table.
2. If other data items are functionally dependent on the same part
of the key, place them in the new table also
3. Make the partial primary key copied from the original table the
primary key for the new table. Place all items that appear in the
repeating group in a new table
Example 1 (Convert to 2NF)
Old Scheme  {Title, PubId, AuId, Price, AuAddress}
New Scheme  {Title, PubId, AuId, Price}
New Scheme  {AuId, AuAddress}
2NF - Decomposition
Example 2 (Convert to 2NF)
Old Scheme  {Studio, Movie, Budget, StudioCity}
New Scheme  {Movie, Studio, Budget}
New Scheme  {Studio, City}

Example 3 (Convert to 2NF)


Old Scheme  {City, Street, HouseNumber, HouseColor, CityPopulation}
New Scheme  {City, Street, HouseNumber, HouseColor}
New Scheme  {City, CityPopulation}
Third Normal Form (3NF)
This form dictates that all non-key attributes of a table must be functionally
dependent on a candidate key i.e. there can be no interdependencies
among non-key attributes.

For a table to be in 3NF, there are two requirements


– The table should be second normal form
– No attribute is transitively dependent on the primary key

Example (Not in 3NF)


Scheme  {Title, PubID, PageCount, Price }
1. Key  {Title, PubId}
2. {Title, PubId}  {PageCount}
3. {PageCount}  {Price}
4. Both Price and PageCount depend on a key hence 2NF
5. Transitively {Title, PubID}  {Price} hence not in 3NF
Third Normal Form (3NF)
Example 2 (Not in 3NF)
Scheme  {Studio, StudioCity, CityTemp}
1. Primary Key  {Studio}
2. {Studio}  {StudioCity}
3. {StudioCity}  {CityTemp}
4. {Studio}  {CityTemp}
5. Both StudioCity and CityTemp depend on the entire key hence 2NF
6. CityTemp transitively depends on Studio hence violates 3NF
BuildingID Contractor Fee

Example 3 (Not in 3NF) 100 Randolph 1200


Scheme  {BuildingID, Contractor, Fee} 150 Ingersoll 1100
1. Primary Key  {BuildingID} 200 Randolph 1200
2. {BuildingID}  {Contractor} 250 Pitkin 1100
3. {Contractor}  {Fee} 300 Randolph 1200
4. {BuildingID}  {Fee}
5. Fee transitively depends on the BuildingID
6. Both Contractor and Fee depend on the entire key hence 2NF
3NF - Decomposition
1. Move all items involved in transitive dependencies to a new
entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the
original entity.

Example 1 (Convert to 3NF)


Old Scheme  {Title, PubID, PageCount, Price }
New Scheme  {PubID, PageCount, Price}
New Scheme  {Title, PubID, PageCount}
3NF - Decomposition
Example 2 (Convert to 3NF)
Old Scheme  {Studio, StudioCity, CityTemp}
New Scheme  {Studio, StudioCity}
New Scheme  {StudioCity, CityTemp}

Example 3 (Convert to 3NF) BuildingID Contractor Contractor Fee

Old Scheme  {BuildingID, Contractor, Fee} 100 Randolph Randolph 1200


150 Ingersoll Ingersoll 1100
New Scheme  {BuildingID, Contractor}
200 Randolph Pitkin 1100
New Scheme  {Contractor, Fee} 250 Pitkin
300 Randolph
Boyce-Codd Normal Form (BCNF)
• BCNF does not allow dependencies between attributes that belong to candidate keys.
• BCNF is a refinement of the third normal form in which it drops the restriction of a non-
key attribute from the 3rd normal form.
• Third normal form and BCNF are not same if the following conditions are true:
– The table has two or more candidate keys
– At least two of the candidate keys are composed of more than one attribute
– The keys are not disjoint i.e. The composite candidate keys share some attributes

Example 1 - Address (Not in BCNF)


Scheme  {City, Street, ZipCode }
1. Key1  {City, Street }
2. Key2  {ZipCode, Street}
3. No non-key attribute hence 3NF
4. {City, Street}  {ZipCode}
5. {ZipCode}  {City}
6. Dependency between attributes belonging to a key
Boyce Codd Normal Form (BCNF)
Example 2 - Movie (Not in BCNF)
Scheme  {MovieTitle, MovieID, PersonName, Role, Payment }
1. Key1  {MovieTitle, PersonName}
2. Key2  {MovieID, PersonName}
3. Both role and payment functionally depend on both candidate keys thus 3NF
4. {MovieID}  {MovieTitle}
5. Dependency between MovieID & MovieTitle Violates BCNF

Example 3 - Consulting (Not in BCNF)


Scheme  {Client, Problem, Consultant}
1. Key1  {Client, Problem}
2. Key2  {Client, Consultant}
3. No non-key attribute hence 3NF
4. {Client, Problem}  {Consultant}
5. {Client, Consultant}  {Problem}
6. Dependency between attributess belonging to keys violates BCNF
BCNF - Decomposition
1. Place the two candidate primary keys in separate
entities
2. Place each of the remaining data items in one of the
resulting entities according to its dependency on the
primary key.
Example 1 (Convert to BCNF)
Old Scheme  {City, Street, ZipCode }
New Scheme1  {ZipCode, Street}
New Scheme2  {City, Street}
• Loss of relation {ZipCode}  {City}
Alternate New Scheme1  {ZipCode, Street }
Alternate New Scheme2  {ZipCode, City}
Decomposition – Loss of
Information
1. If decomposition does not cause any loss of information it is
called a lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost
it is called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a
collection of smaller schemas that are in BCNF form. However
the dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into 3rd normal
form that also preserves the dependencies.
• 3NF may be better than BCNF in some cases

Use your own judgment when decomposing schemas


BCNF - Decomposition
Example 2 (Convert to BCNF)
Old Scheme  {MovieTitle, MovieID, PersonName, Role, Payment }
New Scheme  {MovieID, PersonName, Role, Payment}
New Scheme  {MovieTitle, PersonName}
• Loss of relation {MovieID}  {MovieTitle}
New Scheme  {MovieID, PersonName, Role, Payment}
New Scheme  {MovieID, MovieTitle}

• We got the {MovieID}  {MovieTitle} relationship back


Example 3 (Convert to BCNF)
Old Scheme  {Client, Problem, Consultant}
New Scheme  {Client, Consultant}
New Scheme  {Client, Problem}
Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-one
relationships between columns.
• To be in Fourth Normal Form,
– a relation must first be in Boyce-Codd Normal Form.
– a given relation may not contain more than one multi-valued attribute.

Example (Not in 4NF)


Scheme  {MovieName, ScreeningCity, Genre)
Primary Key: {MovieName, ScreeningCity, Genre)
1. All columns are a part of the only candidate key, hence BCNF
2. Many Movies can have the same Genre
3. Many Cities can have the same movie Movie ScreeningCity Genre

4. Violates 4NF Hard Code Los Angles Comedy

Hard Code New York Comedy

Bill Durham Santa Cruz Drama

Bill Durham Durham Drama

The Code Warrier New York Horror


Fourth Normal Form (4NF)
Example 2 (Not in 4NF) Manager Child Employee

Scheme  {Manager, Child, Employee} Jim Beth Alice

1. Primary Key  {Manager, Child, Employee} Mary Bob Jane

2. Each manager can have more than one child Mary NULL Adam

3. Each manager can supervise more than one employee


4. 4NF Violated

Example 3 (Not in 4NF)


Scheme  {Employee, Skill, ForeignLanguage}
1. Primary Key  {Employee, Skill, Language }
2. Each employee can speak multiple languages
3. Each employee can have multiple skills Employee Skill Language
4. Thus violates 4NF 1234 Cooking French

1234 Cooking German


1453 Carpentry Spanish

1453 Cooking Spanish


2345 Cooking Spanish
4NF - Decomposition
1. Move the two multi-valued relations to separate tables
2. Identify a primary key for each of the new entity.

Example 1 (Convert to 3NF)


Old Scheme  {MovieName, ScreeningCity, Genre}
New Scheme  {MovieName, ScreeningCity}
New Scheme  {MovieName, Genre}

Movie Genre Movie ScreeningCity


Hard Code Comedy Hard Code Los Angles

Bill Durham Drama Hard Code New York

The Code Warrier Horror Bill Durham Santa Cruz

Bill Durham Durham

The Code Warrier New York


4NF - Decomposition
Example 2 (Convert to 4NF) Manager Child Manager Employee
Old Scheme  {Manager, Child, Employee} Jim Beth Jim Alice

New Scheme  {Manager, Child} Mary Bob Mary Jane


Mary Adam
New Scheme  {Manager, Employee}

Example 3 (Convert to 4NF)


Old Scheme  {Employee, Skill, ForeignLanguage}
New Scheme  {Employee, Skill}
New Scheme  {Employee, ForeignLanguage}

Employee Skill Employee Language


1234 Cooking 1234 French

1453 Carpentry 1234 German

1453 Cooking 1453 Spanish

2345 Cooking 2345 Spanish


Fifth Normal Form (5NF)
• Fifth normal form is satisfied when all tables are broken
into as many tables as possible in order to avoid
redundancy. Once it is in fifth normal form it cannot be
broken into smaller relations without changing the facts or
the meaning.
Domain Key Normal Form (DKNF)
• The relation is in DKNF when there can be no insertion or
deletion anomalies in the database.

You might also like