Functional Dependencies and
Normalization for Relational
Databases
Normalization
Database normalization, or data normalization, is a technique
for organizing the contents of tables in transactional databases
and data warehouses.
It includes creating tables and establishing relationships
between those tables according to rules designed both to
protect the data and to make the database more flexible, by
eliminating two factors:
(i) redundancy
(ii) inconsistent dependency
Normalization
Normalization is part of successful database design; without
normalization, database systems may be inaccurate, slow, and
inefficient, and they might not produce the data we expect.
“The process of reducing data redundancy in a relational
database is called normalization.”
The main advantage of normalization is that it frees the
database from certain insertion, deletion, and modification
anomalies.
Insertion Anomalies
Insertion anomalies can be differentiated into two types,
illustrated by the following examples based on the
EMP_DEPT relation:
Insertion Anomalies
To insert a new employee tuple into EMP_DEPT, we must include
either the attribute values for the department that the employee
works for, or nulls (if the employee does not work for a department
as yet).
For example, to insert a new tuple for an employee who works in
department number 5, we must enter the attribute values of
department 5 correctly so that they are consistent with values for
department 5 in other tuples in EMP_DEPT.
In the design of Figure 10.2, we do not have to worry about this
consistency problem because we enter only the department number
in the employee tuple; all other attribute values of department 5 are
recorded only once in the database, as a single tuple in the
DEPARTMENT relation.
Insertion Anomalies
It is difficult to insert a new department that has no employees as yet
in the EMP_DEPT relation. The only way to do this is to place null
values in the attributes for employee.
This causes a problem because SSN is the primary key of
EMP_DEPT, and each tuple is supposed to represent an employee
entity, not a department entity.
Moreover, when the first employee is assigned to that department,
we do not need this tuple with null values any more.
This problem does not occur in the design of Figure 10.2, because a
department is entered in the DEPARTMENT relation whether or not
any employees work for it, and whenever an employee is assigned to
that department, a corresponding tuple is inserted in EMPLOYEE.
Insertion Anomalies
Insert anomaly: we cannot insert a new department unless an
employee is assigned to it.
Conversely, we cannot insert an employee unless he/she is
assigned to a department.
For example: (1) To insert a new employee tuple into
EMP_DEPT, we must include either the attribute values for
the department that the employee works for, or nulls (if the
employee does not work for a department as yet).
Insertion Anomalies
(2) It is difficult to insert a new department that has no
employees as yet in the EMP_DEPT relation. The only way
to do this is to place null values in the attributes for
employee. This causes a problem because SSN is the
primary key of EMP_DEPT.
These problems do not occur in Example 2.
Example
Deletion Anomalies
When a project is deleted, the deletion also removes the tuples of
all employees who work on that project, and their information is lost.
Deletion Anomalies
If we delete from EMP_DEPT an employee tuple that
happens to represent the last employee working for a
particular department, the information concerning that
department is lost from the database.
Modification Anomalies
In EMP_DEPT, if we change the value of one of the attributes of
a particular department (say, the manager of department 5), we
must update the tuples of all employees who work in that
department; otherwise, the database will become inconsistent. If
we fail to update some tuples, the same department will be shown
to have two different values for manager in different employee
tuples, which would be wrong.
Functional Dependencies
Functional dependencies (FDs) are used to specify formal
measures of the "goodness" of relational designs
FDs and keys are used to define normal forms for
relations
FDs are constraints that are derived from the meaning
and interrelationships of the data attributes
A set of attributes X uniquely determines a set of
attributes Y if the value of X determines a unique value
for Y
Functional Dependencies
X -> Y holds if whenever two tuples have the same value
for X, they must have the same value for Y
For any two tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y]
X -> Y in R specifies a constraint on all relation instances
r(R)
Written as X -> Y; Read as “X uniquely determines Y”
FDs are derived from the real-world constraints on the
attributes
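To make this definition concrete, the following Python sketch (not part of the original slides; the relation instance and its values are made up, with attribute names in the style of the EMP_DEPT example) checks whether X -> Y holds in a given relation instance by comparing every pair of tuples:

def fd_holds(rows, X, Y):
    """Return True if X -> Y holds in this relation instance:
    whenever two tuples agree on X, they also agree on Y."""
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in X) and any(t1[a] != t2[a] for a in Y):
                return False
    return True

# A tiny EMP_DEPT-style instance (illustrative values only)
emp_dept = [
    {"SSN": "111", "ENAME": "Smith",  "DNUMBER": 5, "DMGRSSN": "333"},
    {"SSN": "222", "ENAME": "Wong",   "DNUMBER": 5, "DMGRSSN": "333"},
    {"SSN": "333", "ENAME": "Zelaya", "DNUMBER": 4, "DMGRSSN": "987"},
]

print(fd_holds(emp_dept, ["SSN"], ["ENAME"]))        # True: SSN -> ENAME
print(fd_holds(emp_dept, ["DNUMBER"], ["DMGRSSN"]))  # True in this instance

Keep in mind that an FD holding in one particular instance does not prove it holds as a constraint on the schema; the constraint must hold in every legal instance r(R).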
Examples of FD constraints
social security number determines employee name
SSN -> ENAME
project number determines project name and location
PNUMBER -> {PNAME, PLOCATION}
an employee's SSN and a project number together determine the hours
per week that the employee works on the project
{SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R
The constraint must hold on every relation instance
r(R)
If K is a key of R, then K functionally determines all
attributes in R (since we never have two distinct tuples
with t1[K]=t2[K])
Example
Review of Functional Dependencies
FDs are constraints on the set of legal relations
They require that the value for one set of attributes
uniquely determines the value for another set of attributes
We write X -> Y to mean that the value(s) for the
attribute(s) of X uniquely determine the value(s) for the
attribute(s) of Y
We expect some FDs to hold in the real world. For
example, we expect SSN to uniquely determine FNAME.
However, we do not expect FNAME to uniquely
determine SSN.
Computing F+, the Closure of a
Set of FDs F
When we design a relational DB, we start out by
considering the set of FDs we expect to hold.
It is important to consider all FDs, so we need to be sure
we’ve listed them all
The closure of a set of FDs F is the set of all FDs that are
logically implied by F
We denote the closure of F by F+
We compute F+ by applying Armstrong’s Axioms
Inference Rules for FDs
The following six inference rules (IR1 to IR6) are well
known as inference rules for functional dependencies.
Armstrong's Axioms:
IR1. (Reflexivity) If Y ⊆ X, then X -> Y
Ex: ABC -> BC
IR2. (Augmentation) If X -> Y, then XZ -> YZ
Ex: If we know C -> D, then we know ABC -> ABD
IR3. (Transitivity) If X -> Y and Y -> Z, then X -> Z
Ex: If we know AB -> CD and CD -> EF, then we
know AB -> EF
Inference Rules for FDs
Armstrong's axioms are sound because they do not
generate any incorrect functional dependencies.
They are complete because, for a given set F of functional
dependencies, they allow us to generate all of F+.
Inference Rules for FDs
More useful rules can be deduced from Armstrong's Axioms:
IR4. (Union) If X -> Y and X -> Z, then X -> YZ
Ex: If AB -> CD and AB -> EF, we can infer AB -> CDEF
IR5. (Decomposition) If X -> YZ, then X -> Y and X -> Z
Ex: If AB -> CDEF, we can infer AB -> CD and AB -> EF,
and AB -> C, AB -> D, AB -> E, and AB -> F
IR6. (Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
Ex: If AB -> EF and DEF -> G, we can infer ABD -> G
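As a quick check that these extra rules really follow from Armstrong's axioms, the union rule can be derived as follows (a standard derivation, written in the same arrow notation):
1. X -> Y (given)
2. X -> XY (augment 1 with X)
3. X -> Z (given)
4. XY -> YZ (augment 3 with Y)
5. X -> YZ (transitivity on 2 and 4)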
Example
R = (A, B, C, G, H, I)
F = { A -> B
A -> C
CG -> H
CG -> I
B -> H }
Example
Some members of F+:
A -> H
by transitivity from A -> B and B -> H
AG -> I
by augmenting A -> C with G, to get AG -> CG,
and then transitivity with CG -> I
CG -> HI
by augmenting CG -> I to infer CG -> CGI,
and augmenting CG -> H to infer CGI -> HI,
and then transitivity
Procedure for Computing F+
To compute the closure of a set of functional dependencies F:
F+ = F
repeat
  for each functional dependency f in F+
    apply the reflexivity and augmentation rules on f
    add the resulting functional dependencies to F+
  for each pair of functional dependencies f1 and f2 in F+
    if f1 and f2 can be combined using transitivity
      then add the resulting functional dependency to F+
until F+ does not change any further
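A direct, deliberately naive Python transcription of this procedure is sketched below. It is not from the slides: representing each FD as a pair of frozensets is an assumption, the sketch only generates dependencies reachable from the FDs already in F (as the procedure above does), and it is exponential in the number of attributes, so it is only usable on very small schemas. In practice one computes attribute closures instead, as described on the next slides.

from itertools import combinations

def nonempty_subsets(attrs):
    """All non-empty subsets of a set of attribute names."""
    attrs = sorted(attrs)
    return [frozenset(c) for r in range(1, len(attrs) + 1)
            for c in combinations(attrs, r)]

def compute_f_plus(R, F):
    """Naive fixpoint computation of F+ following the procedure above.
    R is a set of attribute names; F is a set of (lhs, rhs) frozenset pairs."""
    f_plus = set(F)
    while True:
        new = set(f_plus)
        for X, Y in f_plus:
            for Z in nonempty_subsets(X):      # reflexivity: X -> Z for every Z subset of X
                new.add((X, Z))
            for W in nonempty_subsets(R):      # augmentation: XW -> YW
                new.add((X | W, Y | W))
        pairs = list(new)
        for X, Y in pairs:                     # transitivity: X -> Y and Y -> Z give X -> Z
            for Y2, Z in pairs:
                if Y == Y2:
                    new.add((X, Z))
        if new == f_plus:                      # stop when F+ no longer changes
            return f_plus
        f_plus = new

F = {(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))}
print((frozenset("A"), frozenset("C")) in compute_f_plus("ABC", F))   # True, by transitivity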
The Closure of Attribute Sets, X+
F+ can grow quite large, as we keep applying rules to find
more FDs
Sometimes we want to find all of F+, and other times we
just want to find part of it
We are often interested in finding the part that tells us
whether or not some subset of attributes X is a superkey
for R
If you can uniquely determine all attributes in R by some
subset of attributes X, then X is a superkey for R
The closure of X under F, denoted X+, is the subset of
attributes that are uniquely determined by X under F
Algorithm: Determining X+, the Closure of X
under F
X+ = X // initialization
repeat
  oldX+ = X+
  for each FD Y -> Z in F do // check each FD
    if Y is a subset of X+ then // if the left-hand side is contained in X+
      X+ = X+ ∪ Z // add the right-hand side to X+
until X+ == oldX+ // loop as long as X+ changes
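The following Python sketch is a direct transcription of this algorithm (not from the slides; representing attribute sets as strings of single-letter attribute names is only for compactness):

def attribute_closure(X, F):
    """Compute X+ under F, following the algorithm above. X is an iterable of
    attribute names; F is an iterable of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:                              # loop as long as X+ changes
        changed = False
        for lhs, rhs in F:                      # check each FD Y -> Z in F
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)             # add the right-hand side to X+
                changed = True
    return closure

# The running example: R = (A, B, C, G, H, I)
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(attribute_closure("AG", F)))   # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(attribute_closure("A", F)))    # ['A', 'B', 'C', 'H']

The two printed closures are used on the next slides to decide whether AG is a superkey and a candidate key.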
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = { A -> B
A -> C
CG -> H
CG -> I
B -> H }
Example of Attribute Set Closure
(AG)+
1. result = AG
2. result = ABCG (A -> C and A -> B)
3. result = ABCGH (CG -> H and CG ⊆ AGBC)
4. result = ABCGHI (CG -> I and CG ⊆ AGBCH)
Is AG a candidate key?
1. Is AG a superkey?
   Does AG -> R hold? That is, is (AG)+ ⊇ R?
2. Is any proper subset of AG a superkey?
   Does A -> R hold? That is, is (A)+ ⊇ R?
   Does G -> R hold? That is, is (G)+ ⊇ R?
Here (AG)+ = ABCGHI = R, while (A)+ = ABCH and (G)+ = G, so AG is a
superkey and none of its proper subsets is; hence AG is a candidate key.
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for a superkey:
To test whether X is a superkey, we compute X+ and check whether X+
contains all attributes of R.
Testing functional dependencies:
To check whether a functional dependency X -> Y holds (or, in other
words, is in F+), just check whether Y ⊆ X+.
That is, we compute X+ by using attribute closure, and then check whether
it contains Y.
Computing the closure of F:
For each subset X of R, we find the closure X+, and for each S ⊆ X+, we
output a functional dependency X -> S.
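Each of these uses is a one-line wrapper around attribute closure. The sketch below is illustrative rather than definitive: is_superkey and implies are hypothetical helper names, and the attribute_closure function from the earlier sketch is repeated so the snippet runs on its own.

def attribute_closure(X, F):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_superkey(X, R, F):
    """X is a superkey of R iff X+ contains every attribute of R."""
    return set(R) <= attribute_closure(X, F)

def implies(F, lhs, rhs):
    """The FD lhs -> rhs is in F+ iff rhs is a subset of lhs+."""
    return set(rhs) <= attribute_closure(lhs, F)

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", "ABCGHI", F))   # True
print(implies(F, "A", "H"))             # True: A -> B and B -> H
print(implies(F, "CG", "A"))            # False: nothing determines A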
Example of Computing X+
Given R = (A, B, C, G, H, I) and
F = {A B, A C, CG H, CG I, B H},
compute (AG)+
We want to see if AG is a superkey for R. So we
use the algorithm to compute (AG)+. If (AG)+
contains all attributes of R when the algorithm
stops, then AG is a superkey for R.
Equivalence of Sets of FDs
Two sets of FDs F and G are equivalent if:
- every FD in F can be inferred from G, and
- every FD in G can be inferred from F
Hence, F and G are equivalent if F+ = G+
Definition: F covers G if every FD in G can be inferred from F
(i.e., if G+ ⊆ F+)
F and G are equivalent if F covers G and G covers F
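Using attribute closure, the covers test is mechanical. The lines below are a sketch, not from the slides: covers and equivalent are hypothetical helper names, and the snippet assumes the attribute_closure function from the previous sketch.

def covers(F, G):
    """F covers G if every FD in G is implied by F (i.e., G+ is a subset of F+)."""
    return all(set(rhs) <= attribute_closure(lhs, F) for lhs, rhs in G)

def equivalent(F, G):
    return covers(F, G) and covers(G, F)

print(equivalent([("A", "BC")], [("A", "B"), ("A", "C")]))   # True: the two sets are equivalent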
Minimal Covers of FDs (1)
A set of FDs is minimal if it satisfies the following
conditions:
(1) Every dependency in F has a single attribute for its RHS.
(2) We cannot replace any dependency X -> A in F with a
dependency Y -> A, where Y ⊂ X, and still have a set of
dependencies that is equivalent to F.
(3) We cannot remove any dependency from F and still have a
set of dependencies that is equivalent to F.
Minimal Covers of FDs (1)
We can think of a minimal set of dependencies as being a
set of dependencies in a standard or canonical form and with
no redundancies.
Condition 1 just represents every dependency in a canonical
form with a single attribute on the right-hand side.
Conditions 2 and 3 ensure that there are no redundancies in
the dependencies, either as redundant attributes on the
left-hand side of a dependency (condition 2) or as a
dependency that can be inferred from the remaining FDs in
F (condition 3).
Minimal Covers of FDs (2)
Every set of FDs has an equivalent minimal cover
There can be several equivalent minimal covers
There is an algorithm for computing a minimal cover
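The slides do not spell this algorithm out, so the following is a hedged Python sketch of one common version (not from the slides); its three steps correspond to conditions 1 to 3 above, and the attribute_closure helper is repeated so the snippet runs on its own. Depending on the order in which attributes and dependencies are examined, a different but equivalent minimal cover may be produced.

def attribute_closure(X, F):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def minimal_cover(F):
    """Compute one minimal cover of F (given as (lhs, rhs) attribute-set pairs)."""
    # Step 1: give every dependency a single attribute on its right-hand side.
    G = [(frozenset(lhs), a) for lhs, rhs in F for a in rhs]
    # Step 2: remove extraneous attributes from left-hand sides.
    changed = True
    while changed:
        changed = False
        for i, (lhs, a) in enumerate(G):
            for b in sorted(lhs):
                smaller = lhs - {b}
                if smaller and a in attribute_closure(smaller, G):
                    G[i] = (smaller, a)        # b was extraneous in this left-hand side
                    changed = True
                    break
    # Step 3: remove dependencies that can be inferred from the remaining ones.
    i = 0
    while i < len(G):
        lhs, a = G[i]
        rest = G[:i] + G[i + 1:]
        if a in attribute_closure(lhs, rest):  # redundant dependency
            G = rest
        else:
            i += 1
    return G

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]   # an illustrative input
print(minimal_cover(F))   # [(frozenset({'A'}), 'B'), (frozenset({'B'}), 'C')], i.e. {A -> B, B -> C}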
CANDIDATE KEYS
A candidate key of a relation schema R is a subset X of the
attributes of R with the following two properties:
(1) Every attribute of R is functionally dependent on X, i.e., X+ = all
attributes of R (also denoted as X+ = R).
(2) No proper subset of X has property (1), i.e., X is minimal with
respect to property (1).
A sub-key of R: a subset of a candidate key;
a super-key of R: a set of attributes containing a candidate key.
For example, in a schema where AC is a candidate key, both A and
C are sub-keys, and ABC is a super-key. In that example the only
other candidate keys are AB and AD; note that since nothing
determines A, A is in every candidate key.
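As a brute-force illustration (not from the slides; it reuses the running example R = (A, B, C, G, H, I) rather than the AC/AB/AD example above, and repeats the attribute_closure helper so the snippet runs on its own), candidate keys can be enumerated by checking attribute sets in order of increasing size:

from itertools import combinations

def attribute_closure(X, F):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(R, F):
    """Brute-force enumeration: X is a candidate key if X+ = R and no
    already-found (smaller) key is contained in X."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            X = set(combo)
            if any(set(k) <= X for k in keys):
                continue                          # not minimal: a smaller key lies inside X
            if set(R) <= attribute_closure(X, F):
                keys.append(combo)
    return keys

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(candidate_keys("ABCGHI", F))   # [('A', 'G')]: AG is the only candidate key

This agrees with the earlier (AG)+ computation, and with the observation that an attribute nothing determines (here A and G) must appear in every candidate key.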