0% found this document useful (0 votes)
6 views

DB Normalization Part1

The document discusses database design and normalization, emphasizing the importance of reducing data redundancy to prevent anomalies such as insert, update, and delete anomalies. It explains functional dependencies, candidate keys, and the process of normalization through various normal forms. The document also introduces Armstrong's Axioms for reasoning about functional dependencies and outlines the steps involved in normalization.

Uploaded by

ddeepak49127
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views

DB Normalization Part1

The document discusses database design and normalization, emphasizing the importance of reducing data redundancy to prevent anomalies such as insert, update, and delete anomalies. It explains functional dependencies, candidate keys, and the process of normalization through various normal forms. The document also introduces Armstrong's Axioms for reasoning about functional dependencies and outlines the steps involved in normalization.

Uploaded by

ddeepak49127
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 71

Database Design: Normalization

Agenda

1. Database Design

2. Normal forms & functional dependencies

3. Finding functional dependencies

4. Closures, superkeys & keys


Design Theory
 The biggest problem needed to be solved in database is data redundancy.
 Why data redundancy is the problem? Because it causes:
 Insert Anomaly
 Update Anomaly
 Delete Anomaly

 Design theory is about how to represent your data to avoid anomalies.

 Achieved by Data Normalization, a process of analyzing a relation to ensure


that it is well formed.

 Normalization involves decomposing relations with anomalies to produce


smaller well structured relations.

 If a relation is normalized (or well formed), rows can be inserted, deleted and
modified without creating anomalies.
Data Anomalies & Constraints
Constraints Prevent (some)
Anomalies in the Data

A poorly designed database causes


anomalies:

Student Course Room


Mary CSC261 101
Joe CSC261 101
Sam CSC261 101
.. .. ..

If every course is in only one room,


contains redundant information!
Constraints Prevent (some)
Anomalies in the Data

A poorly designed database causes


anomalies:

Student Course Room


Mary CSC261 101
Joe CSC261 703
Sam CSC261 101
.. .. ..

If we update the room number for one tuple, we


get inconsistent data = an update anomaly
Constraints Prevent (some)
Anomalies in the Data

A poorly designed database causes


anomalies:

Student Course Room


.. .. ..

If everyone drops the class, we lose what


room the class is in! = a delete anomaly
Constraints Prevent (some)
Anomalies in the Data

A poorly designed database causes


anomalies:

Student Course Room


Similarly, we
Mary CSC261 B01 can’t reserve a
Joe CSC261 B01 room without
Sam CSC261 B01 students = an
… CSC461 703 insert anomaly
.. .. ..
Constraints Prevent (some)
Anomalies in the Data

Student Course Is this form better?

Mary CSC261 Course Room


• Redundancy?
Joe CSC261 CSC261 101 • Update anomaly?
Sam CSC261 CSC257 601 • Delete anomaly?
• Insert anomaly?
.. ..

Today: develop theory to understand why this design


may be better and how to find this decomposition…
Database Anomalies ACTIVITY Relation
Example 2 StudentID Activity Fee
100 Skiing 200
Anomalies are problems caused by bad database design. 100 Golf 65
Example: 175 Squash 50
175 Swimming 50
ACTIVITY(StudentID, Activity, Fee)
200 Swimming 50
An insertion anomaly occurs when a row cannot be added to a 200 Golf 65

relation, because not all data are available (or one has to invent “dummy” data)
 Example: we want to store that scuba diving costs $175, but have no place to put
this information until a student takes up scuba-diving (unless we create a fake
student)
A deletion anomaly occurs when data is deleted from a relation, and other critical data
are unintentionally lost
 Example: if we delete the record with StudentID = 100, we forget that skiing costs
$200
An update anomaly occurs when one must make many changes to reflect the
modification of a single datum
 Example: if the cost of swimming changes, then all entries with swimming
Activity must be changed too
Cause of Anomalies

Anomalies are primarily caused by:


1. Data redundancy: replication of the same field in multiple tables, other than foreign
keys
2. Functional dependencies including:
 Partial dependency
 Transitive dependency
 Multi-value dependency
Functional Dependencies
Functional Dependencies for Dummies

• A relationship between attributes where one attribute (or


group of attributes) determines the value of another
attribute (or group of attributes) in the same table.

• Example:
SSN uniquely identify any Person

(SSN)  (First Name, Last Name)


Candidate Keys/Primary Keys and Functional Dependencies

By definition:
• A candidate key of a relation functionally determines all
other non-key attributes in the row.

Implies:
• A primary key of a relation functionally determines all other
non-key attributes in the row.

EmployeeID  (EmployeeName, EmpPhone)


Functional Dependency

Def: Let A, B be sets of attributes, we write A  B or say


A functionally determines B if, for any tuples t1 and t2:
t1[A] = t2[A] implies t1[B] = t2[B] and we
call A  B a functional dependency

A  B means that
“whenever two tuples agree on A then they agree on B.”

A It is a determinant set.
B It is a dependent attribute.
A functionally determines B.
{A → B}
B is a functionally dependent on A.
A Picture of FDs

Defn (again):
A1 … Am B1 … Bn Given attribute sets A = {A1,…,Am}
and B = {B1,…Bn} in R,
A Picture of FDs

Defn (again):
A1 … Am B1 … Bn Given attribute sets A={A1,…,Am}
and B = {B1,…Bn} in R,

The functional dependency A B


on R holds if for any ti, tj in R:
A Picture of FDs

Defn (again):
Given attribute sets A={A1,…,Am}
and B = {B1,…Bn} in R,
A1 … Am B1 … Bn

The functional dependency A B


ti
on R holds if for any ti,tj in R:

tj
ti[A1] = tj[A1] AND ti[A2]=tj[A2] AND …
AND ti[Am] = tj[Am]

If t1, t2 agree here..


A Picture of FDs

Defn (again):
Given attribute sets A={A1,…,Am}
and B = {B1,…Bn} in R,
A1 … Am B1 … Bn

The functional dependency A B


ti
on R holds if for any ti,tj in R:
tj
if ti[A1] = tj[A1] AND ti[A2]=tj[A2] AND
… AND ti[Am] = tj[Am]

If t1,t2 agree here.. …they also agree here! then ti[B1] = tj[B1] AND ti[B2]=tj[B2]
AND … AND ti[Bn] = tj[Bn]
FDs for Relational Schema Design

High-level idea: why do we care about FDs?

1. Start with some relational schema (e.g., design by ER diagram)

2. Find out its functional dependencies (FDs)

3. Use these to design a better schema


• One which minimizes the possibility of anomalies
Functional Dependencies as Constraints

A functional dependency is a form of


constraint Student Course Room
Mary CS145 B01
• Holds on some instances not others. Joe CS145 B01
Sam CS145 B01
• Part of the schema, helps define a valid
instance. .. .. ..

Note: The FD
{Course} -> {Room} holds on
Recall: an instance of a schema is a multiset of this instance
tuples conforming to that schema, i.e. a table
Functional Dependencies as Constraints

Note that:
• You can check if an FD is Student Course Room
violated by examining a single Mary CS145 B01
instance;
Joe CS145 B01
Sam CS145 B01
• However, you cannot prove
that an FD is part of the .. .. ..
schema by examining a single
instance. However, cannot prove that the
• This would require checking FD {Course} -> {Room} is part of
every valid instance the schema
More Examples

An FD is a constraint which holds, or does not hold on


an instance:

EmpID Name Phone Position


E0045 Smith 1234 Clerk
E3542 Mike 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer
More Examples

EmpID Name Phone Position


E0045 Smith 1234 Clerk
E3542 Mike 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer

{Position}  {Phone}
More Examples

EmpID Name Phone Position


E0045 Smith 1234  Clerk
E3542 Mike 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234  Lawyer

but not {Phone}  {Position}


ACTIVITY
Find at least three FDs which
A B C D E hold on this instance:
1 2 4 3 6
3 2 5 1 8
1 4 4 5 7 {A }  {C }
{A,B }  {C }
1 2 4 3 6 {E }  {D }
3 2 5 1 8
Armstrong inference rules

 Armstrong's Axioms is a set of rules.


 It provides a simple technique for reasoning about functional
dependencies.
 It was developed by William W. Armstrong in 1974.
 It is used to infer all the functional dependencies on a relational
database.
Armstrong inference rules

Axioms:
Reflexivity: if YX, then X→Y
Augmenta on: if X→Y, then WX→WY
Transi vity: if X→Y and Y→Z, then X→Z
Derived Rules:
Union: if X→Y and X→Z, the X→YZ
Decomposi on: if X→YZ, then X→Y and X→Z
Pseudo transi vity: if X→Y and WY→Z, then XW→Z
Armstrong inference rules

Axioms are both


Sound:
when applied to a set of functional dependencies they only
produce dependency tables that belong to the transitive closure
of that set

Complete:
can produce all dependency tables that belong to the transitive
closure of the set
Armstrong inference rules

Three last rules can be derived from the first three (the axioms)
Let us look at the union rule:
if X→Y and X→Z, the X→YZ
Using the first three axioms, we have:
if X→Y, then XX→XY same as X→XY (2nd)
if X→Z, then YX→YZ same as XY→YZ (2nd)
if X→XY and XY→YZ, then X→YZ (3rd)
Example:
Consider relation E = (P, Q, R, S, T, U) having set of Functional Dependencies (FD).
P→Q P→R
QR → S Q→T
QR → U PR → U

Calculate some members of axioms are as follows:


1. P → T
2. PR → S Axioms:
3. QR → SU Reflexivity: if YX, then X→Y
4. PR → SU Augmenta on: if X→Y, then WX→WY
Transi vity: if X→Y and Y→Z, then X→Z
Derived Rules:
Union: if X→Y and X→Z, the X→YZ
Decomposi on: if X→YZ, then X→Y and X→Z
Pseudo transi vity: if X→Y and WY→Z, then XW→Z
Axioms:
Reflexivity: if YX, then X→Y
Augmenta on: if X→Y, then WX→WY
Transi vity: if X→Y and Y→Z, then X→Z
Solution: Derived Rules:
Union: if X→Y and X→Z, the X→YZ
1. P → T Decomposi on: if X→YZ, then X→Y and X→Z
Pseudo transi vity: if X→Y and WY→Z, then XW→Z
In the FD set, P → Q and Q → T
So, Using Transi ve Rule: If {A → B} and {B → C}, then {A → C}
∴ If P → Q and Q → T, then P → T.

2. PR → S
In the above FD set, P → Q
As, QR → S
So, Using Pseudo Transi vity Rule: If{A → B} and {BC → D}, then {AC → D}
∴ If P → Q and QR → S, then PR → S.

3. QR → SU
In above FD set, QR → S and QR → U
So, Using Union Rule: If{A → B} and {A → C}, then {A → BC}
∴ If QR → S and QR → U, then QR → SU.

4. PR → SU
So, Using Pseudo Transi vity Rule: If{A → B} and {BC → D}, then {AC → D}
∴ If PR → S and PR → U, then PR → SU.
Trivial Functional Dependency

If A holds B {A → B}, where B is a subset of A, then it is called a Trivial


Trivial
Functional Dependency. Trivial always holds Functional Dependency.

If A holds B {A → B}, where B is not a subset A, then it is called as a Non-


Non-Trivial
Trivial Functional Dependency.
Normalization
https://www.youtube.com/watch?v=UrYLYV7WSHM
https://www.youtube.com/watch?v=l5DCnCzDb8g
Normalization
 Normalization is the process of removing redundant data from your
tables to improve storage efficiency, data integrity, and scalability.

 Normalization generally involves splitting existing tables into multiple


ones, which must be re-joined or linked each time a query is issued.

 Why normalization?
 The relation derived from the user view or data store will most
likely be unnormalized.

 The problem usually happens when an existing system uses


unstructured file, e.g. in MS Excel.
Unnormalized Form (table)
Example
Normalization Example

• (Student ID)  (Student Name, DormName, DormCost)

• However, if
– (DormName)  (DormCost)

Then, DormCost should be put into its own relation, resulting


in:
(Student ID)  (Student Name, DormName)
(DormName)  ( DormCost)
Normalization Example

• (AttorneyID, ClientID)  (ClientName, MeetingDate,


Duration)

• However, if
– ClientID  ClientName

• Then: ClientName should be in its own relation:

• (AttorneyID, ClientID)  (MeetingDate, Duration)


• (ClientID)  (ClientName)
Steps of Normalization
 First Normal Form (1NF)
 Second Normal Form (2NF)
 Third Normal Form (3NF)
 Boyce-Codd Normal Form (BCNF)
 Fourth Normal Form (4NF)
 Fifth Normal Form (5NF)
 Domain Key Normal Form (DKNF)

In practice, 1NF, 2NF, 3NF, and BCNF are enough for


database.
Normal Forms

• 1st Normal Form (1NF) = All tables are flat

• 2nd Normal Form (2NF)


DB designs based on
functional dependencies,
• 3 rd Normal Form (3NF) intended to prevent data
anomalies
• Boyce-Codd Normal Form (BCNF)

• 4th and 5th Normal Forms = see text books


Normalization Steps

Table with Remove


First Normal
multivalued multivalued
Form (1NF)
attributes
attributes

First Normal Remove Second


Partial Normal Form
Form (1NF) Dependencies (2NF)

Second Remove
Third Normal
Normal Transitive Form (3NF)
Dependencies
Form (2NF)
First Normal Form (1NF)

The official qualifications for 1NF are:


1. Each attribute name must be unique.
2. Each attribute value must be single.
3. Each row must be unique.
4. There is no repeating groups.
Additional:
Choose a primary key.

Reminder:
A primary key is unique, not null, unchanged. A primary
key can be either an attribute or combined attributes.
1st Normal Form (1NF)

Student Courses
Student Courses
Mary CS145
Mary {CS145, CS229}
Mary CS229
Joe {CS145, CS106}
Joe CS145
… …
Joe CS106

Violates 1NF. In 1st NF

1NF Constraint: Types must be atomic!


First Normal Form (1NF) (Cont.)
Example of a table not in 1NF :
Group Topic Student Score
Group A Intro MongoDB Sok San 18 marks
Sao Ry 17 marks
Group B Intro MySQL Chan Tina 19 marks
Tith Sophea 16 marks

It violates the 1NF because:


 Attribute values are not single.
 Repeating groups exists.
First Normal Form (1NF) (Cont.)
 After eliminating:
Group Topic Family Name Given Name Score
A Intro MongoDB Sok San 18
A Intro MongoDB Sao Ry 17
B Intro MySQL Chan Tina 19
B Intro MySQL Tith Sophea 16

 Now it is in 1NF.
However, it might still violate 2NF and so on.
Functional Dependencies
We say an attribute, B, has a functional dependency on another
attribute, A, if for any two records, which have the same value
for A, then the values for B in these two records must be the
same. We illustrate this as:

AB (read as: A determines B or B depends on A)

Employee_name Project Email_address


Joe San POS Mart Sys [email protected]
Sao Ry Univ Mgt Sys [email protected]
Joe San Web Redesign [email protected]
Chan Sokna POS Mart Sys [email protected]
Sao Ry DB Design [email protected]

Employee_name  Email_address
Functional Dependencies (cont.)
EmpNum EmpEmail EmpFname EmpLname
123 [email protected] John Doe
456 [email protected] Peter Smith
555 [email protected] Alan Lee
633 [email protected] Peter Doe
787 [email protected] Alan Lee

If EmpNum is the PK then the FDs:

EmpNum  EmpEmail, EmpFname, EmpLname

must exist.
Functional Dependencies (cont.)
EmpNum  EmpEmail, EmpFname, EmpLname
3 different ways
you might see
FDs depicted
EmpEmail
EmpNum EmpFname

EmpLname

EmpNum EmpEmail EmpFname EmpLname


Determinant

Functional Dependency

EmpNum  EmpEmail

Attribute on the left hand side is known as the


determinant
• EmpNum is a determinant of EmpEmail
Second Normal Form (2NF)

The official qualifications for 2NF are:

1. A table is already in 1NF.


2. All non-key attributes are fully dependent on the
primary key.

All partial dependencies are removed to place in another


table.
Partial Dependencies
• Partial dependency is a functional dependency whose determinant is
part of the primary key (but not all of it)
• Example:

ACTIVITY(StudentID, Activity, Fee)

StudentID Activity Fee


StudentID
100 Skiing 200
Fee
100 Golf 65
175 Squash 50
Activity 175 Swimming 50
200 Swimming 50
200 Golf 65
Example of a table not in 2NF:

CourseID SemesterID Num Student Course Name


IT101 201301 25 Database
IT101 201302 25 Database
IT102 201301 30 Web Prog
IT102 201302 35 Web Prog
IT103 201401 20 Networking

Primary Key

The Course Name depends on only CourseID, a part of the primary key not the
whole primary {CourseID, SemesterID}. It’s called partial dependency.
Solution:
Remove CourseID and Course Name together to create a new table.
CourseID Course Name C ourseI D SemesterID Num Student
IT101 Database IT101 201301 25
IT101 Database IT101 201302 25
IT102 Web Prog IT102 201301 30
IT102 Web Prog IT102 201302 35
IT103 Networking IT103 201401 20

Done?
Oh no, it is still not in
1NF yet.
Remove the repeating CourseID Course Name
groups too. IT101 D atabase
Finally, connect the IT102 Web Prog
relationship.
IT103 Networking
Third Normal Form (3NF)

The official qualifications for 3NF are:


1. A table is already in 2NF.
2. Nonprimary key attributes do not depend on other
nonprimary key attributes
(i.e. no transitive dependencies)
All transitive dependencies are removed to place
in another table.
Transitive Dependencies
Transitive dependency is a functional dependency whose determinant is
not the primary key, part of the primary key, or a candidate key.
Transitive functionality is a functional dependency in which a non-key
attribute is determined by another non-key attribute.

Example:
ACTIVITY(StudentID, Activity, Fee)

StudentID Activity Fee


100 Skiing 200
100 Golf 65
175 Squash 50
175 Swimming 50
200 Swimming 50
200 Golf 65
Example of a Table not in 3NF:

StudyID CourseName TeacherName TeacherTel


1 Database Sok Piseth 012 123 456
2 Database Sao Kanha 0977 322 111
3 Web Prog Chan Veasna 012 412 333
4 Web Prog Chan Veasna 012 412 333
5 Networking Pou Sambath 077 545 221

Primary Key
The TeacherTel is a nonkey attribute, and the
TeacherName is also a nonkey attribute. But
TeacherTel depends on TeacherName.
It is called transitive dependency.

Solution:
Remove Teacher Name and TeacherTel together to create a
new table.
Done?
Teacher Name Teacher Tel
Oh no, it is still not in 1NF yet.
Sok Piseth 012 123 456 Remove Repeating row.
Sao Kanha 0977 322 111
StudyI D C ourse N am e T.I D
Chan Veasna 012 412 333
1 Database T1
Chan Veasna 012 412 333
2 Database T2
Pou Sambath 077 545 221
3 Web Prog T3
Teacher Name Teacher Tel 4 Web Prog T3
Sok Piseth 012 123 456 5 Networking T4
Sao Kanha 0977 322 111
Chan Veasna 012 412 333
Pou Sambath 077 545 221

Note about primary key: ID Teacher Name Teacher Tel


-In theory, you can choose T1 Sok Piseth 012 123 456
TeacherName to be a primary key. T2 Sao Kanha 0977 322 111
-But in practice, you should add
T3 Chan Veasna 012 412 333
TeacherID as the primary key.
T4 Pou Sambath 077 545 221
Boyce Codd Normal Form (BCNF) – 3.5NF
The official qualifications for BCNF are:
1. A table is already in 3NF.
2. All determinants must be superkeys.
All determinants that are not superkeys are removed to
place in another table.

K is a superkey for relation R if K functionally determines


all of R.

K is a (candidate)key for R if K is a superkey, but no


proper subset of K is a superkey.
Boyce Codd Normal Form (BCNF) (Cont.)
 Example of a table not in BCNF:
Student Course Teacher
Sok DB John
Sao DB William
Chan E-Commerce Todd
Sok E-Commerce Todd
Chan DB William

 Key: {Student, Course}


 Functional Dependency:
 {Student, Course}Teacher
 Teacher  Course
 Problem: Teacher is not a superkey but determines Course.
Student Course Solution: Decouple a table
contains Teacher and Course
Sok DB
from original table (Student,
Sao DB Course). Finally, connect the
C han E-Commerce new and old table to third
table contains Course.
Sok E-Commerce
C han DB Course
DB
E-Commerce
Course Teacher
DB John
DB W illiam
E-Commerce Todd
Forth Normal Form (4NF)
The official qualifications for 4NF are:
1. A table is already in BCNF.
2. A table contains no multi-valued dependencies.

 Multi-valued dependency: MVDs occur when two or more independent multi


valued facts about the same attribute occur within the same table.
A ->-> B (B multi-valued depends on A)
Example: MVD

Customer(name, addr, phones, email_drinksLiked)


A drinker’s phones are independent of the drinks they like.
name->->phones and name ->->drinksLiked.

Thus, each of a drinker’s phones appears with each of the drinks they like in
all combinations.
Tuples Implied by name->->phones

If we have tuples:

name addr phones drinksLiked


sue a p1 d1
sue a p2 d2
sue a p2 d1
sue a p1 d2

Then these tuples must also be in the relation.


Picture of MVD X ->->Y

X Y others

equal

exchange
Forth Normal Form (4NF) (Cont.)
 Example of a table not in 4NF:
Student Major Hobby
Sok IT Football
Sok IT Volleyball
Sao IT Football
Sao Med Football
Chan IT NULL
Puth NULL Football
Tith NULL NULL

 Key: {Student, Major, Hobby}


 MVD: Student -->> Major, Hobby
Solution: Decouple to each Student Major
table contains MVD. Finally, Sok IT
connect each to a third table
contains Student. Sao IT
Sao Med
Student Chan IT
Sok Puth NULL
Sao Tith NULL
Chan
Puth Student Hobby
Tith Sok Football
Sok Volleyball
Sao Football
Chan NULL
Puth Football
Tith NULL
Fifth Normal Form (5NF)
The official qualifications for 5NF are:
1. A table is already in 4NF.
2. The attributes of multi-valued dependencies are related.
Fifth Normal Form (5NF) (Cont.)
 Example of a table not in 5NF:
Seller Company Product
Sok MIAF Trading Zenya
Sao Coca-Cola Corp Coke
Sao Coca-Cola Corp Fanta
Sao Coca-Cola Corp Sprite
Chan Angkor Brewery Angkor Beer
Chan Cambodia Brewery Cambodia Beer

 Key: {Seller, Company, Product}


 MVD: Seller -->> Company, Product
 Product is related to Company.
C ompany
Seller C ompany
Seller 1 M Sok M 1 MIAF Trading
MIAF Trading
Sok Coca-Cola Corp
Sao Coca-Cola Corp
Sao Angkor Brewery
C han Angkor Brewery
C han Cambodia Brewery
C han Cambodia Brewery
1 1

M
M
1 C ompany Product
Seller Product
MIAF Trading Z enya
Sok Z enya
M Coca-Cola Corp C oke
Sao C oke
Product Coca-Cola Corp Fanta
Sao Fanta
Z enya Coca-Cola Corp Sprite
Sao Sprite
C oke Angkor Brewery Angkor Beer
C han Angkor Beer
Fanta C ambodia C ambodia
C han C ambodia Brewery Beer
Sprite
Beer M
Angkor Beer 1
Normalization Practice

Please see the normalization example slides


Acknowledgement

Some of these slides are taken from cs145 course offered by


Stanford University.

You might also like