Functional Dependencies and Normalization For Relational Databases

This document discusses database normalization and functional dependencies. It contains the following key points:
1. Normalization is a technique used to organize database tables to reduce data redundancy and inconsistencies. It involves creating tables and relationships according to specific rules.
2. Functional dependencies specify relationships between attributes where the values of one attribute determine values of another. They are used to define normalization rules and measure how well a database design minimizes redundancy.
3. Anomalies like insertion, deletion, and modification anomalies can occur if dependencies are not accounted for properly in the database design. Normalization addresses these anomalies through decomposing tables and eliminating redundant attributes.

Uploaded by Shakul Sharma

Functional Dependencies and Normalization for Relational Databases

Normalization
• Database normalization, or data normalization, is a technique
for organizing the contents of tables in transactional databases
and data warehouses.
• It includes creating tables and establishing relationships
between those tables according to rules designed both to
protect the data and to make the database more flexible by
eliminating two factors:
  (i) redundancy
  (ii) inconsistent dependency
• Normalization is part of successful database design. Without
it, a database system may be inaccurate, slow, and inefficient,
and may not produce the data we expect.
• "The process of reducing data redundancy in a relational
database is called normalization."
• The main advantage of normalization is that it frees the
database from certain insertion, deletion, and modification
anomalies.
Insertion Anomalies
• Insertion anomalies can be differentiated into two types,
illustrated by the following examples based on the
EMP_DEPT relation.
• (1) To insert a new employee tuple into EMP_DEPT, we must include
either the attribute values for the department that the employee
works for, or nulls (if the employee does not yet work for a
department).
• For example, to insert a new tuple for an employee who works in
department number 5, we must enter the attribute values of
department 5 correctly so that they are consistent with the values for
department 5 in other tuples in EMP_DEPT.
• In the design of Figure 10.2, we do not have to worry about this
consistency problem because we enter only the department number
in the employee tuple; all other attribute values of department 5 are
recorded only once in the database, as a single tuple in the
DEPARTMENT relation.
• (2) It is difficult to insert a new department that has no employees
yet in the EMP_DEPT relation. The only way to do this is to place null
values in the attributes for employee.
• This causes a problem because SSN is the primary key of
EMP_DEPT, and each tuple is supposed to represent an employee
entity, not a department entity.
• Moreover, when the first employee is assigned to that department,
we no longer need this tuple with null values.
• This problem does not occur in the design of Figure 10.2, because a
department is entered in the DEPARTMENT relation whether or not
any employees work for it, and whenever an employee is assigned to
that department, a corresponding tuple is inserted in EMPLOYEE.
• In short: in EMP_DEPT we cannot insert a new department unless an
employee is assigned to it, and conversely we cannot insert an
employee without supplying department values (or nulls).
Deletion Anomalies
• In a relation that stores employee and project data together,
deleting a project also deletes the data of the employees who
work only on that project.
• If we delete from EMP_DEPT an employee tuple that
happens to represent the last employee working for a
particular department, the information concerning that
department is lost from the database.
Modification Anomalies
• In EMP_DEPT, if we change the value of one of the attributes of
a particular department (say, the manager of department 5), we
must update the tuples of all employees who work in that
department; otherwise, the database becomes inconsistent.
• If we fail to update some tuples, the same department will be shown
to have two different values for manager in different employee
tuples, which would be wrong.
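
The modification anomaly can be illustrated with a minimal Python sketch. The rows below (SSNs, names, department numbers, managers) are made up for illustration and are not taken from the figures; `managers_of` is a hypothetical helper.

```python
# Hypothetical EMP_DEPT rows: department data is repeated in every employee tuple.
emp_dept = [
    {"ssn": "111", "ename": "Ann", "dnumber": 5, "dmgr": "Smith"},
    {"ssn": "222", "ename": "Bob", "dnumber": 5, "dmgr": "Smith"},
    {"ssn": "333", "ename": "Cid", "dnumber": 4, "dmgr": "Wong"},
]

def managers_of(rows, dnumber):
    """Return the set of distinct manager values recorded for one department."""
    return {row["dmgr"] for row in rows if row["dnumber"] == dnumber}

# Incomplete update: the manager of department 5 is changed in one tuple only.
emp_dept[0]["dmgr"] = "Jones"
inconsistent = managers_of(emp_dept, 5)  # department 5 now shows two managers

# Decomposed design: the manager is stored once, so a single update suffices.
employee = [
    {"ssn": r["ssn"], "ename": r["ename"], "dnumber": r["dnumber"]}
    for r in emp_dept
]
department = {5: {"dmgr": "Smith"}, 4: {"dmgr": "Wong"}}
department[5]["dmgr"] = "Jones"
```

In the single-table version, a partial update leaves department 5 with two manager values; in the decomposed version there is only one place to update.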
Functional Dependencies
• Functional dependencies (FDs) are used to specify formal
measures of the "goodness" of relational designs.
• FDs and keys are used to define normal forms for
relations.
• FDs are constraints that are derived from the meaning
and interrelationships of the data attributes.
• A set of attributes X uniquely determines a set of
attributes Y if the value of X determines a unique value
for Y.
• X -> Y holds if, whenever two tuples have the same value
for X, they must have the same value for Y.
• For any two tuples t1 and t2 in any relation instance r(R): if
t1[X] = t2[X], then t1[Y] = t2[Y].
• X -> Y in R specifies a constraint on all relation instances
r(R).
• Written as X -> Y; read as "X uniquely determines Y".
• FDs are derived from the real-world constraints on the
attributes.
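
The tuple-based definition above can be checked mechanically on a single relation instance. The sketch below is only an illustration: `satisfies_fd` is a hypothetical helper and the `works_on` rows are made-up data. Note that one instance can refute an FD but cannot establish it, since an FD must hold on every instance.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies lhs -> rhs.

    rows: list of dicts (the tuples); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True

# Hypothetical instance: hours per week that an employee works on a project.
works_on = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 20},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 10},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 20},
]

key_fd_ok = satisfies_fd(works_on, ["SSN", "PNUMBER"], ["HOURS"])
ssn_fd_ok = satisfies_fd(works_on, ["SSN"], ["HOURS"])
```

Here `{SSN, PNUMBER} -> HOURS` is not violated, while `SSN -> HOURS` is refuted by the two tuples for SSN 111.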
Examples of FD constraints
• Social security number determines employee name:
SSN -> ENAME
• Project number determines project name and location:
PNUMBER -> {PNAME, PLOCATION}
• Employee SSN and project number determine the hours
per week that the employee works on the project:
{SSN, PNUMBER} -> HOURS
• An FD is a property of the attributes in the schema R.
• The constraint must hold on every relation instance
r(R).
• If K is a key of R, then K functionally determines all
attributes in R (since we never have two distinct tuples
with t1[K] = t2[K]).
Review of Functional Dependencies
• FDs are constraints on the set of legal relations.
• They require that the value for one set of attributes
uniquely determines the value for another set of attributes.
• We write X -> Y to mean that the value(s) for the
attribute(s) of X uniquely determine the value(s) for the
attribute(s) of Y.
• We expect some FDs to hold in the real world. For
example, we expect SSN to uniquely determine FNAME.
However, we do not expect FNAME to uniquely
determine SSN.
Computing F+, the Closure of a Set of FDs F
• When we design a relational database, we start out by
considering the set of FDs we expect to hold.
• It is important to consider all FDs, including those implied
by the ones we list, so we need to be sure we have accounted
for them all.
• The closure of a set of FDs F is the set of all FDs that are
logically implied by F.
• We denote the closure of F by F+.
• We compute F+ by applying Armstrong's axioms.
Inference Rules for FDs
• Six inference rules (IR1 to IR6) are well known for
functional dependencies.
• The first three, IR1 to IR3, are Armstrong's axioms:
IR1. (Reflexivity) If Y ⊆ X, then X -> Y
Ex: ABC -> BC
IR2. (Augmentation) If X -> Y, then XZ -> YZ
Ex: If we know C -> D, then we know ABC -> ABD
IR3. (Transitivity) If X -> Y and Y -> Z, then X -> Z
Ex: If we know AB -> CD and CD -> EF, then we
know AB -> EF
• Armstrong's axioms are sound: they do not
generate any incorrect functional dependencies.
• They are complete: for a given set F of functional
dependencies, they allow us to generate all of F+.
More useful rules can be deduced from Armstrong's axioms:
(Union) If X -> Y and X -> Z, then X -> YZ
Ex: If AB -> CD and AB -> EF, we can infer AB -> CDEF
(Decomposition) If X -> YZ, then X -> Y and X -> Z
Ex: If AB -> CDEF, we can infer AB -> CD and AB -> EF,
and also AB -> C, AB -> D, AB -> E, and AB -> F
(Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
Ex: If AB -> EF and DEF -> G, we can infer ABD -> G
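
As a sketch of how such a rule follows from Armstrong's axioms alone, here is one standard derivation of the union rule, written in LaTeX. The step numbering is for illustration only.

```latex
\begin{align*}
1.\;& X \to Y   && \text{given} \\
2.\;& X \to XY  && \text{augment (1) with } X \text{, using } XX = X \\
3.\;& X \to Z   && \text{given} \\
4.\;& XY \to YZ && \text{augment (3) with } Y \\
5.\;& X \to YZ  && \text{transitivity on (2) and (4)}
\end{align*}
```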
Example
• R = (A, B, C, G, H, I)
F = { A -> B
      A -> C
      CG -> H
      CG -> I
      B -> H }
• Some members of F+:
A -> H
  by transitivity from A -> B and B -> H
AG -> I
  by augmenting A -> C with G, to get AG -> CG,
  and then transitivity with CG -> I
CG -> HI
  by augmenting CG -> I with CG to infer CG -> CGI,
  augmenting CG -> H with I to infer CGI -> HI,
  and then transitivity
Procedure for Computing F+
• To compute the closure of a set of functional dependencies F:
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
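
The procedure above can be sketched in Python for very small schemas. This is a brute-force illustration, not an efficient method: the number of FDs grows exponentially with the number of attributes. The names `subsets` and `fd_closure` are hypothetical, attribute sets are written as strings of single-letter attribute names, and the three-attribute schema in the usage lines is an assumed toy example.

```python
from itertools import combinations

def subsets(attrs):
    """All subsets of `attrs` as frozensets (including the empty set)."""
    items = sorted(attrs)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]

def fd_closure(attrs, fds):
    """Compute F+ over `attrs` by repeatedly applying Armstrong's axioms."""
    all_subsets = subsets(attrs)
    # Reflexivity: X -> Y for every Y that is a subset of X.
    f_plus = {(x, y) for x in all_subsets for y in all_subsets if y <= x}
    f_plus |= {(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds}
    while True:
        new = set()
        # Augmentation: from X -> Y infer XZ -> YZ for every Z.
        for x, y in f_plus:
            for z in all_subsets:
                new.add((x | z, y | z))
        # Transitivity: from X -> Y and Y -> Z infer X -> Z.
        by_lhs = {}
        for x, y in f_plus:
            by_lhs.setdefault(x, set()).add(y)
        for x, y in f_plus:
            for z in by_lhs.get(y, ()):
                new.add((x, z))
        if new <= f_plus:          # F+ did not change: fixpoint reached
            return f_plus
        f_plus |= new

# Toy schema (an assumption for illustration): R = (A, B, C), F = {A -> B, B -> C}.
toy_closure = fd_closure("ABC", [("A", "B"), ("B", "C")])
a_determines_c = (frozenset("A"), frozenset("C")) in toy_closure
```

Each FD is stored as a pair of frozensets so that FDs can be kept in a set and compared cheaply.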
The Closure of Attribute Sets, X+
• F+ can grow quite large as we keep applying rules to find
more FDs.
• Sometimes we want to find all of F+, and other times we
just want to find part of it.
• We are often interested in finding the part that tells us
whether or not some set of attributes X is a superkey
for R.
• If some set of attributes X uniquely determines all
attributes in R, then X is a superkey for R.
• The closure of X under F, denoted X+, is the set of
attributes that are uniquely determined by X under F.
Algorithm: Determining X+, the Closure of X under F

X+ = X                                // initialization
repeat
    oldX+ = X+
    for each FD Y -> Z in F do        // check each FD
        if Y is a subset of X+ then   // if the left-hand side is in X+
            X+ = X+ U Z               // add the right-hand side to X+
until X+ = oldX+                      // loop as long as X+ changes
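
A direct Python transcription of this algorithm might look as follows. It is a minimal sketch: the function name `attribute_closure` is hypothetical, and attribute sets are written as strings of single-letter attribute names. The FD set `F` is the one from the running example over R = (A, B, C, G, H, I).

```python
def attribute_closure(attrs, fds):
    """Return X+, the closure of the attribute set `attrs` under `fds`.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of attributes.
    """
    result = set(attrs)                  # X+ = X
    while True:
        old = set(result)                # oldX+ = X+
        for lhs, rhs in fds:             # for each FD Y -> Z in F
            if set(lhs) <= result:       # if Y is a subset of X+
                result |= set(rhs)       # X+ = X+ U Z
        if result == old:                # until X+ = oldX+
            return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_closure = attribute_closure("AG", F)
```

With this `F`, `ag_closure` contains all six attributes of R.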
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = { A -> B
        A -> C
        CG -> H
        CG -> I
        B -> H }
• (AG)+
1. result = AG
2. result = ABCG (A -> C and A -> B)
3. result = ABCGH (CG -> H and CG ⊆ AGBC)
4. result = ABCGHI (CG -> I and CG ⊆ AGBCH)
• Is AG a candidate key?
1. Is AG a superkey?
   Does AG -> R? That is, is (AG)+ ⊇ R?
2. Is any proper subset of AG a superkey?
   Does A -> R? That is, is (A)+ ⊇ R?
   Does G -> R? That is, is (G)+ ⊇ R?
• Since (AG)+ = R, while (A)+ = ABCH and (G)+ = G, AG is a
candidate key.
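
The candidate-key test can be sketched in Python by combining the superkey check with a check of all proper subsets. The function names are hypothetical, and attribute sets are written as strings of single-letter attribute names.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    while True:
        old = set(result)
        for lhs, rhs in fds:
            if set(lhs) <= result:
                result |= set(rhs)
        if result == old:
            return result

def is_superkey(attrs, relation, fds):
    """X is a superkey of R when X+ contains every attribute of R."""
    return attribute_closure(attrs, fds) >= set(relation)

def is_candidate_key(attrs, relation, fds):
    """A candidate key is a superkey with no proper subset that is a superkey."""
    attrs = sorted(set(attrs))
    if not is_superkey(attrs, relation, fds):
        return False
    return not any(
        is_superkey(sub, relation, fds)
        for n in range(len(attrs))            # sizes 0 .. len(attrs) - 1
        for sub in combinations(attrs, n)
    )

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_is_candidate_key = is_candidate_key("AG", R, F)
abg_is_candidate_key = is_candidate_key("ABG", R, F)
```

ABG is a superkey but not a candidate key, because its proper subset AG is already a superkey.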
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
  To test if α is a superkey, we compute α+ and check if α+ contains
  all attributes of R.
• Testing functional dependencies:
  To check if a functional dependency α -> β holds (or, in other
  words, is in F+), just check if β ⊆ α+.
  That is, we compute α+ by using attribute closure, and then check if
  it contains β.
• Computing the closure of F:
  For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we
  output a functional dependency γ -> S.
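
The last two uses can be sketched in Python as follows. The names `fd_holds` and `fd_closure_via_attributes` are hypothetical, attribute sets are written as strings of single-letter attribute names, and the three-attribute schema used for the full closure is an assumed toy example chosen to keep the output small.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    while True:
        old = set(result)
        for lhs, rhs in fds:
            if set(lhs) <= result:
                result |= set(rhs)
        if result == old:
            return result

def fd_holds(lhs, rhs, fds):
    """alpha -> beta is in F+ exactly when beta is a subset of alpha+."""
    return set(rhs) <= attribute_closure(lhs, fds)

def fd_closure_via_attributes(relation, fds):
    """Yield every FD gamma -> S in F+ as a pair of frozensets."""
    attrs = sorted(relation)
    for n in range(len(attrs) + 1):
        for gamma in combinations(attrs, n):
            closed = sorted(attribute_closure(gamma, fds))
            for m in range(len(closed) + 1):
                for s in combinations(closed, m):
                    yield frozenset(gamma), frozenset(s)

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
cg_hi = fd_holds("CG", "HI", F)   # a member of F+ derived earlier by hand
b_a = fd_holds("B", "A", F)       # not implied by F

# Toy schema (an assumption): R = (A, B, C), F = {A -> B, B -> C}.
toy_closure = set(fd_closure_via_attributes("ABC", [("A", "B"), ("B", "C")]))
```

Testing a single FD this way needs only one attribute closure, which is far cheaper than materializing all of F+.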
Example of Computing X+
Given R = (A, B, C, G, H, I) and
F = {A -> B, A -> C, CG -> H, CG -> I, B -> H},
compute (AG)+.
We want to see if AG is a superkey for R, so we
use the algorithm to compute (AG)+. If (AG)+
contains all attributes of R when the algorithm
stops, then AG is a superkey for R.
Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
- every FD in F can be inferred from G, and
- every FD in G can be inferred from F
• Hence, F and G are equivalent if F+ = G+.
• Definition: F covers G if every FD in G can be inferred from F
(i.e., if G+ ⊆ F+).
• F and G are equivalent if F covers G and G covers F.
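
Covering and equivalence can be tested without computing F+ or G+ in full, by checking each FD of one set against attribute closures under the other. The sketch below uses hypothetical function names and made-up FD sets; attribute sets are written as strings of single-letter attribute names.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    while True:
        old = set(result)
        for lhs, rhs in fds:
            if set(lhs) <= result:
                result |= set(rhs)
        if result == old:
            return result

def covers(f, g):
    """F covers G if every FD in G can be inferred from F."""
    return all(set(rhs) <= attribute_closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """F and G are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)

# Made-up FD sets for illustration.
F1 = [("A", "B"), ("B", "C")]
G1 = [("A", "B"), ("B", "C"), ("A", "C")]   # adds an FD already implied by F1
H1 = [("A", "B")]

f1_g1_equivalent = equivalent(F1, G1)
f1_covers_h1 = covers(F1, H1)
h1_covers_f1 = covers(H1, F1)
```

F1 and G1 are equivalent because A -> C follows by transitivity, while H1 is covered by F1 but does not cover it.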
Minimal Covers of FDs
• A set of FDs F is minimal if it satisfies the following
conditions:
(1) Every dependency in F has a single attribute for its RHS.
(2) We cannot replace any dependency X -> A in F with a
dependency Y -> A, where Y is a proper subset of X, and still
have a set of dependencies that is equivalent to F.
(3) We cannot remove any dependency from F and still have a
set of dependencies that is equivalent to F.
• We can think of a minimal set of dependencies as being a
set of dependencies in a standard or canonical form with
no redundancies.
• Condition 1 just represents every dependency in a canonical
form with a single attribute on the right-hand side.
• Conditions 2 and 3 ensure that there are no redundancies in
the dependencies, either from redundant attributes on
the left-hand side of a dependency (condition 2) or from
a dependency that can be inferred from the
remaining FDs in F (condition 3).
• Every set of FDs has an equivalent minimal cover.
• There can be several equivalent minimal covers.
• There is an algorithm for computing a minimal cover.
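
The slides do not spell the algorithm out. A common textbook formulation applies the three conditions in order, and the Python sketch below follows that outline. The function names and the first example FD set are assumptions for illustration; attribute sets are written as strings of single-letter attribute names. The result depends on the order in which attributes and FDs are examined, which is one reason several minimal covers can exist.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    while True:
        old = set(result)
        for lhs, rhs in fds:
            if set(lhs) <= result:
                result |= set(rhs)
        if result == old:
            return result

def minimal_cover(fds):
    """Return one minimal cover of `fds` as (lhs, rhs) pairs of frozensets."""
    # Condition 1: a single attribute on every right-hand side.
    cover = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]
    cover = list(dict.fromkeys(cover))           # drop exact duplicates
    # Condition 2: remove redundant attributes from left-hand sides.
    for i in range(len(cover)):
        for attr in sorted(cover[i][0]):
            lhs, rhs = cover[i]
            reduced = lhs - {attr}
            if rhs <= attribute_closure(reduced, cover):
                cover[i] = (reduced, rhs)
    # Condition 3: remove FDs that the remaining FDs already imply.
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= attribute_closure(lhs, rest):
            cover = rest
        else:
            i += 1
    return cover

# Illustrative FD set (not from the slides): {B -> A, D -> A, AB -> D}.
small_cover = minimal_cover([("B", "A"), ("D", "A"), ("AB", "D")])

# The running example F is already minimal, so nothing is removed.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
running_cover = minimal_cover(F)
```

For the illustrative set, AB -> D shrinks to B -> D, after which B -> A becomes redundant.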
Candidate Keys
• A candidate key of a relation schema R is a subset X of the
attributes of R with the following two properties:
(1) Every attribute is functionally dependent on X, i.e., X+ = all
attributes of R (also denoted as X+ = R).
(2) No proper subset of X has property (1), i.e., X is minimal with
respect to property (1).
• A sub-key of R is a subset of a candidate key.
• A super-key is a set of attributes containing a candidate key.
• For example, in a relation where AC is a candidate key, both A
and C are sub-keys, and ABC is a super-key. In that example, the
only other candidate keys are AB and AD. Note that since nothing
determines A, A is in every candidate key.
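
The two properties suggest a simple way to enumerate candidate keys: try attribute sets in order of increasing size and keep each superkey that contains no smaller key. The sketch below does this in Python. The relation and FD set for the example are not given in the slides, so R = (A, B, C, D) with F = {B -> C, C -> D, D -> B} is an assumed schema chosen only because it yields the candidate keys AB, AC, and AD mentioned above. The function names are hypothetical.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    while True:
        old = set(result)
        for lhs, rhs in fds:
            if set(lhs) <= result:
                result |= set(rhs)
        if result == old:
            return result

def candidate_keys(relation, fds):
    """Return all candidate keys of `relation`, smallest first."""
    attrs = sorted(relation)
    keys = []
    for n in range(len(attrs) + 1):
        for combo in combinations(attrs, n):
            x = frozenset(combo)
            if any(k <= x for k in keys):
                continue                     # contains a smaller key: not minimal
            if attribute_closure(x, fds) >= set(attrs):
                keys.append(x)               # superkey with no smaller key inside
    return keys

# Assumed schema, chosen to reproduce the keys named in the example.
keys = candidate_keys("ABCD", [("B", "C"), ("C", "D"), ("D", "B")])
```

In this assumed schema no FD has A on its right-hand side, so A appears in every key, matching the note in the example. The enumeration is exponential in the number of attributes and is meant for small schemas only.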
