DB Normalization Part1
Agenda
1. Database Design
If a relation is normalized (or well formed), rows can be inserted, deleted and
modified without creating anomalies.
Data Anomalies & Constraints
Constraints Prevent (some) Anomalies in the Data
An insertion anomaly occurs when certain data cannot be inserted into a
relation, because not all data are available (or one has to invent “dummy” data)
Example: we want to store that scuba diving costs $175, but have no place to put
this information until a student takes up scuba-diving (unless we create a fake
student)
A deletion anomaly occurs when data is deleted from a relation, and other critical data
are unintentionally lost
Example: if we delete the record with StudentID = 100, we forget that skiing costs
$200
An update anomaly occurs when one must make many changes to reflect the
modification of a single datum
Example: if the cost of swimming changes, then all entries with swimming
Activity must be changed too
Cause of Anomalies
• Example:
SSN uniquely identifies any Person
By definition:
• A candidate key of a relation functionally determines all
other non-key attributes in the row.
Implies:
• A primary key of a relation functionally determines all other
non-key attributes in the row.
A → B means that
“whenever two tuples agree on A then they agree on B.”
A is the determinant set.
B is the dependent attribute.
A functionally determines B: {A → B}
B is functionally dependent on A.
A Picture of FDs
Defn (again):
Given attribute sets A = {A1,…,Am} and B = {B1,…,Bn} in R,
the FD A → B holds on R if, for any two tuples ti, tj in R:
if ti[A1] = tj[A1] AND ti[A2] = tj[A2] AND … AND ti[Am] = tj[Am]
(ti and tj agree on A1 … Am),
then ti[B1] = tj[B1] AND ti[B2] = tj[B2] AND … AND ti[Bn] = tj[Bn]
(they also agree on B1 … Bn).
FDs for Relational Schema Design
Recall: an instance of a schema is a multiset of tuples conforming to that
schema, i.e. a table.
Note: The FD {Course} → {Room} holds on this instance.
Functional Dependencies as Constraints
Note that:
• You can check if an FD is violated by examining a single instance;
• However, you cannot prove that an FD is part of the schema by examining a
single instance.
• This would require checking every valid instance.

Student  Course  Room
Mary     CS145   B01
Joe      CS145   B01
Sam      CS145   B01
..       ..      ..

However, you cannot prove from this instance that the FD {Course} → {Room} is
part of the schema.
More Examples
{Position} → {Phone}
Axioms:
Reflexivity: if Y ⊆ X, then X→Y
Augmentation: if X→Y, then WX→WY
Transitivity: if X→Y and Y→Z, then X→Z
Derived Rules:
Union: if X→Y and X→Z, then X→YZ
Decomposition: if X→YZ, then X→Y and X→Z
Pseudo-transitivity: if X→Y and WY→Z, then XW→Z
Armstrong inference rules
Complete:
they can produce every dependency that belongs to the closure of the
given set of FDs
Armstrong inference rules
The last three rules can be derived from the first three (the axioms).
Let us look at the union rule:
if X→Y and X→Z, then X→YZ
Using the first three axioms, we have:
if X→Y, then XX→XY, which is the same as X→XY (augmentation with X)
if X→Z, then YX→YZ, which is the same as XY→YZ (augmentation with Y)
if X→XY and XY→YZ, then X→YZ (transitivity)
Example:
Consider relation E = (P, Q, R, S, T, U) with the following set of functional
dependencies (FDs):
P → Q     P → R
QR → S    Q → T
QR → U    PR → U
2. PR → S
In the above FD set, P → Q and QR → S.
So, using the pseudo-transitivity rule: if {A → B} and {BC → D}, then {AC → D}
∴ If P → Q and QR → S, then PR → S.
3. QR → SU
In the above FD set, QR → S and QR → U.
So, using the union rule: if {A → B} and {A → C}, then {A → BC}
∴ If QR → S and QR → U, then QR → SU.
4. PR → SU
From step 2, PR → S, and in the above FD set, PR → U.
So, using the union rule: if {A → B} and {A → C}, then {A → BC}
∴ If PR → S and PR → U, then PR → SU.
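The derived dependencies in this example can be double-checked mechanically. The sketch below uses attribute closure, a standard technique that the slides themselves do not present; the function names `closure` and `implies` are assumptions made for illustration. An FD X → Y follows from the given set exactly when Y is contained in the closure of X.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds.

    attrs: iterable of single-letter attribute names, e.g. "PR".
    fds: list of (lhs, rhs) pairs, each side a string of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The FD set of relation E from the example above.
FDS = [("P", "Q"), ("P", "R"), ("QR", "S"), ("Q", "T"), ("QR", "U"), ("PR", "U")]


def implies(lhs, rhs, fds=FDS):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


print(implies("PR", "S"))   # step 2
print(implies("QR", "SU"))  # step 3
print(implies("PR", "SU"))  # step 4
print(implies("QR", "P"))   # not derivable: nothing determines P
```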
Trivial Functional Dependency
Why normalization?
The relation derived from the user view or data store will most
likely be unnormalized.
• However, if
– (DormName) → (DormCost)
• However, if
– ClientID → ClientName
Second Normal Form (2NF) → Remove Transitive Dependencies → Third Normal Form (3NF)
First Normal Form (1NF)
Reminder:
A primary key is unique, not null, and unchanging. A primary
key can be either a single attribute or a combination of attributes.
1st Normal Form (1NF)
Student  Courses
Mary     {CS145, CS229}
Joe      {CS145, CS106}
…        …

Student  Courses
Mary     CS145
Mary     CS229
Joe      CS145
Joe      CS106
Now it is in 1NF.
However, it might still violate 2NF and so on.
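The step from the set-valued table to the 1NF table can be sketched as follows. This is a minimal illustration, assuming each row is a (student, set of courses) pair; the function name `to_1nf` is hypothetical.

```python
# Non-1NF table: the Courses column holds a set of values per student.
nested = [
    ("Mary", {"CS145", "CS229"}),
    ("Joe", {"CS145", "CS106"}),
]


def to_1nf(rows):
    """Emit one (student, course) row per course so every value is atomic."""
    return [(student, course) for student, courses in rows for course in sorted(courses)]


flat = to_1nf(nested)
for row in flat:
    print(row)
```

The courses are sorted only to make the output order deterministic; the order of rows in a relation carries no meaning.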
Functional Dependencies
We say an attribute B has a functional dependency on another
attribute A if any two records that have the same value for A
must also have the same value for B. We illustrate this as:
Employee_name → Email_address
Functional Dependencies (cont.)
EmpNum EmpEmail EmpFname EmpLname
123 [email protected] John Doe
456 [email protected] Peter Smith
555 [email protected] Alan Lee
633 [email protected] Peter Doe
787 [email protected] Alan Lee
must exist.
Functional Dependencies (cont.)
3 different ways you might see FDs depicted:
EmpNum → EmpEmail, EmpFname, EmpLname

          EmpEmail
EmpNum →  EmpFname
          EmpLname
Functional Dependency
EmpNum → EmpEmail
Primary Key
The Course Name depends only on CourseID, which is a part of the primary key, not
on the whole primary key {CourseID, SemesterID}. This is called a partial dependency.
Solution:
Remove CourseID and Course Name together to create a new table.
CourseID  Course Name
IT101     Database
IT101     Database
IT102     Web Prog
IT102     Web Prog
IT103     Networking

CourseID  SemesterID  Num Student
IT101     201301      25
IT101     201302      25
IT102     201301      30
IT102     201302      35
IT103     201401      20

Done?
Oh no, it is still not in 1NF yet.
Remove the repeating groups too.
Finally, connect the relationship.

CourseID  Course Name
IT101     Database
IT102     Web Prog
IT103     Networking
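The 2NF decomposition above can be sketched in Python as two projections. This is an illustration under assumptions: the combined starting table is reconstructed from the two tables shown, and the helper name `project` and the attribute spelling `NumStudent` are hypothetical.

```python
# Assumed original table with primary key {CourseID, SemesterID}.
rows = [
    {"CourseID": "IT101", "SemesterID": "201301", "NumStudent": 25, "CourseName": "Database"},
    {"CourseID": "IT101", "SemesterID": "201302", "NumStudent": 25, "CourseName": "Database"},
    {"CourseID": "IT102", "SemesterID": "201301", "NumStudent": 30, "CourseName": "Web Prog"},
    {"CourseID": "IT102", "SemesterID": "201302", "NumStudent": 35, "CourseName": "Web Prog"},
    {"CourseID": "IT103", "SemesterID": "201401", "NumStudent": 20, "CourseName": "Networking"},
]


def project(rows, attrs):
    """Project onto attrs and drop duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(tuple(r[a] for a in attrs) for r in rows))


# CourseName depends only on CourseID, so it moves to its own table.
courses = project(rows, ["CourseID", "CourseName"])
enrollment = project(rows, ["CourseID", "SemesterID", "NumStudent"])

# Joining the two tables on CourseID gives back the original rows (lossless).
name_of = dict(courses)
rejoined = [(c, s, n, name_of[c]) for c, s, n in enrollment]
original = [(r["CourseID"], r["SemesterID"], r["NumStudent"], r["CourseName"]) for r in rows]

print(courses)
print(rejoined == original)
```

Dropping duplicates inside `project` corresponds to the "remove the repeating groups" step, and the shared CourseID column is what connects the two tables.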
Third Normal Form (3NF)
Example:
ACTIVITY(StudentID, Activity, Fee)
Primary Key
TeacherTel is a nonkey attribute, and TeacherName is also a
nonkey attribute. But TeacherTel depends on TeacherName.
This is called a transitive dependency.
Solution:
Remove Teacher Name and TeacherTel together to create a
new table.
Done?

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Chan Veasna   012 412 333
Pou Sambath   077 545 221

Oh no, it is still not in 1NF yet. Remove the repeating row.

StudyID  Course Name  T.ID
1        Database     T1
2        Database     T2
3        Web Prog     T3
4        Web Prog     T3
5        Networking   T4

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Pou Sambath   077 545 221
Thus, each of a drinker’s phones appears with each of the drinks they like in
all combinations.
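The "all combinations" property can be tested on a single instance. The sketch below is illustrative only: the function name `mvd_holds`, the attribute names, and the sample rows are all hypothetical, since the slides do not show the drinkers table itself. It checks that, for each value of X, every Y value is paired with every value of the remaining attributes.

```python
from itertools import product


def mvd_holds(rows, attrs, x, y):
    """Return True if the multivalued dependency x ->-> y holds on this instance."""
    z = [a for a in attrs if a not in x and a not in y]  # the "other" attributes
    groups = {}
    for r in rows:
        key = tuple(r[a] for a in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        yv = tuple(r[a] for a in y)
        zv = tuple(r[a] for a in z)
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    # Within each x-group the (y, z) pairs must be the full cross product.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


ATTRS = ["name", "phone", "drink"]
# Hypothetical data: one drinker with two phones and two liked drinks.
rows = [
    {"name": "Sue", "phone": "555-0101", "drink": "beer"},
    {"name": "Sue", "phone": "555-0101", "drink": "wine"},
    {"name": "Sue", "phone": "555-0102", "drink": "beer"},
    {"name": "Sue", "phone": "555-0102", "drink": "wine"},
]

print(mvd_holds(rows, ATTRS, ["name"], ["phone"]))       # all four combinations present
print(mvd_holds(rows[:-1], ATTRS, ["name"], ["phone"]))  # one combination missing
```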
Tuples Implied by name->->phones
If we have tuples:
(Figure: two tuples with columns X, Y, others; the tuples are equal on X, and
their Y values are exchanged.)
Fourth Normal Form (4NF) (Cont.)
Example of a table not in 4NF:
Student Major Hobby
Sok IT Football
Sok IT Volleyball
Sao IT Football
Sao Med Football
Chan IT NULL
Puth NULL Football
Tith NULL NULL
(Diagram: relationships among Seller, Product and Company, with cardinality
labels M and 1.)

Seller  Product
Sok     Zenya
Sao     Coke
Sao     Fanta
Sao     Sprite
Chan    Angkor Beer
Chan    Cambodia Beer

Company           Product
MIAF Trading      Zenya
Coca-Cola Corp    Coke
Coca-Cola Corp    Fanta
Coca-Cola Corp    Sprite
Angkor Brewery    Angkor Beer
Cambodia Brewery  Cambodia Beer

Product
Zenya
Coke
Fanta
Sprite
Angkor Beer
Normalization Practice