Module 3 (Functional Dependency) - 5th Semester - Computer Science and Engineering
MODULE – 3
RELATIONAL DATABASE DESIGN
B is functionally dependent on A (A → B), since for each value of A there is associated
one and only one value of B.
It means that the values of the Y component of a tuple in r depend upon, or are determined by, the
values of the X component. Alternatively, the values of the X component of a tuple uniquely
(or functionally) determine the values of the Y component.
Thus, X functionally determines Y in a relation schema R if and only if, whenever two
tuples of r(R) agree on their X-value, they must necessarily agree on their Y-value.
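The definition above can be tested directly on a relation instance. The following is a minimal Python sketch, not part of the original notes: the function name fd_holds and the sample instance r are hypothetical, chosen only for illustration. It reports whether any two tuples agree on X but disagree on Y.

```python
def fd_holds(rows, x_attrs, y_attrs):
    """Return True if X -> Y holds in the given relation instance.

    rows is a list of dicts (one dict per tuple); x_attrs and y_attrs
    are lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x_val = tuple(row[a] for a in x_attrs)
        y_val = tuple(row[a] for a in y_attrs)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True


# Hypothetical instance r(R) over attributes A, B, C (illustration only).
r = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

print(fd_holds(r, ["A"], ["B"]))  # True: equal A-values always carry equal B-values
print(fd_holds(r, ["A"], ["C"]))  # False: A = 1 appears with C = 10 and C = 20
```

Note that such a check can only refute an FD on the data at hand. An FD is a constraint on the schema, so it must hold for every legal instance, not just one sample.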
Partial Dependency:
A functional dependency that holds in a relation is partial when removing one of
the determining attributes gives a functional dependency that holds in the
relation.
E.g. if {A,B} → {C} but also {A} → {C} then {C} is partially functionally dependent
on {A,B}.
Example:
Let us assume a relation R with attributes A, B, C, and D. Also, assume that the
set of functional dependencies F that holds on R is as follows:
F = {A → B, D → C}.
From the set of functional dependencies F, we can derive the primary key. For R, the key can be
(A,D), a composite primary key. That means AD → BC, i.e. AD can uniquely identify
B and C. But in this case A and D are not both required to identify B or C uniquely. To
identify B, attribute A is enough. Likewise, to identify C, attribute D is enough.
The functional dependencies AD → B and AD → C are therefore called partial functional
dependencies.
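The example can be checked mechanically. The sketch below is an illustration rather than a method from the notes: the helper names closure and partial_dependencies are assumptions, and attributes are represented as single characters. It lists every proper subset of the key AD that already determines a non-key attribute.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Return (subset, attribute) pairs where a proper subset of key determines a non-key attribute."""
    key = frozenset(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) - key):
                found.append(("".join(subset), attr))
    return found


# F = {A -> B, D -> C} on R(A, B, C, D), with composite key AD (from the example above).
F = [(frozenset("A"), frozenset("B")), (frozenset("D"), frozenset("C"))]

print(partial_dependencies("AD", F))
# [('A', 'B'), ('D', 'C')]
```

The output says that B depends on A alone and C depends on D alone, which is exactly why AD → B and AD → C are partial.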
Functional Dependencies:
Data dependencies are constraints imposed on the data in a database.
They are part of the scheme definition.
FDs allow us to formally define keys.
A conjecture (it has yet to be proven) is that a set of functional dependencies
and one join dependency are enough to express the dependency structure
of a relational database scheme.
Motivation:
Functional dependencies help in accomplishing the following two goals:
(a) controlling redundancy and
(b) enhancing data reliability.
Problematic Issue:
Representing the set of all FDs for a relation R.
Solution:
Find a basic set of FDs.
Use axioms for inferring.
Represent the set of all FDs as the set of FDs that can be inferred from the basic
set of FDs.
Axioms:
An inference axiom is a rule that states if a relation satisfies certain FDs then it must
satisfy certain other FDs.
FD manipulations:
Soundness -- no incorrect FD's are generated
Completeness -- all FD's can be generated
Let R(U) be a relation scheme over the set of attributes U. We will use the letters X, Y,
Z and W to represent any subsets of U and, for short, write the union of two sets of attributes
X and Y as XY instead of the usual X ∪ Y.
F1. Reflexivity: X → X
F2. Augmentation: If (Z ⊆ W; X → Y) then XW → YZ
F3. Additivity: If (X → Y) and (X → Z) then X → YZ
F4. Projectivity: If (X → YZ) then X → Y
F5. Transitivity: If (X → Y) and (Y → Z) then (X → Z)
F6. Pseudotransitivity: If (X → Y) and (YZ → W) then XZ → W
i. {W → Y, X → Z} ⊨ {WX → Y}
Given
W → Y --------- (a)
X → Z --------- (b)
Augmenting (a) by X:
WX → XY ------- (c)
Decomposing (c) by projectivity:
WX → Y. Hence, it is proved.
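ii. {X → Z, Y → Z} ⊨ {X → Y}?
Given X → Z: for any two tuples t1 and t2 of relation R such that
t1[X] = t2[X], we have t1[Z] = t2[Z].
Also given Y → Z: for any two tuples t1 and t2 of relation R such
that t1[Y] = t2[Y], we have t1[Z] = t2[Z].
Neither condition forces t1[Y] = t2[Y] when t1[X] = t2[X]. For example, the tuples
(X, Y, Z) = (1, 1, 1) and (1, 2, 1) satisfy both given FDs but violate X → Y.
Hence X → Y does not follow from the given FDs; the inference is disproved.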
FDs: A → A, A → B, A → C, B → B, B → C, C → C, D → D, AB → A, AB → B, AB → C, AC
→ A, AC → B, AC → C, AD → A, AD → B, AD → C, AD → D, BC → B, BC → C, BD → B,
BD → C, BD → D, CD → C, CD → D, ABC → A, ABC → B, ABC → C, ABD → A, ABD → B,
ABD → C, ABD → D, BCD → B, BCD → C, BCD → D, ABCD → A, ABCD → B, ABCD → C,
ABCD → D.
Algorithm to compute α+, the closure of the attribute set α under F:
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ;
    end
Example:
R = (A, B, C, G, H, I) and F = {A → B, A → C, CG → H, CG → I, B → H}
• (AG)+
1. result = AG
2. result = ABCG (A → B and A → C, with A ⊆ AG)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
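The closure loop and the (AG)+ trace above can be reproduced with a short Python sketch. The function name attribute_closure and the encoding of attribute sets as strings of single-letter names are assumptions made for this illustration.

```python
def attribute_closure(alpha, fds):
    """Compute alpha+ under fds, following the while-loop above.

    alpha is an iterable of attribute names; fds is a list of
    (beta, gamma) pairs, each an iterable of attribute names.
    """
    result = set(alpha)
    changed = True
    while changed:  # "while (changes to result)"
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result


# R = (A, B, C, G, H, I), F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_plus = attribute_closure("AG", F)
print("".join(sorted(ag_plus)))  # ABCGHI
```

Because (AG)+ contains every attribute of R, AG is a superkey of R.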
Note: A proper subset Y is a subset of X such that Y != X (i.e., X has at least one
element not in Y ).
Example.
Consider a table R(A, B, C, D), and that F = {A → B, B → C}.
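Example (finding a candidate key).
Given the following FDs over the attributes A, B, C, D, E and F:
A → C, A determines C
C → D, C determines D
D → B, D determines B
E → F, E determines F
Now, the easiest way is to find which attributes are not determined by any FD, since
these must be part of every key. In this example,
A and E are not determined. Then, find the closure (AE)+.
(AE)+ = AE
= ACE
= ACDE
= ACDBE
= ACDBEF
The closure of AE contains all the attributes. Thus, AE is a candidate key. Since A and E
must appear in every key, AE is the only candidate key here. The same procedure can be
used to find the candidate keys in other problems.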
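The procedure can be sketched in Python as a brute-force search, which is adequate for classroom-sized schemas. The function names closure and candidate_keys are hypothetical. The search starts from the attributes that appear on no right-hand side and adds other attributes only when needed.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    determined = set().union(*(set(rhs) for _, rhs in fds))
    core = attributes - determined  # on no right-hand side, so in every key
    rest = sorted(attributes - core)
    keys = []
    for size in range(len(rest) + 1):  # smallest candidates first
        for extra in combinations(rest, size):
            candidate = core | set(extra)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# FDs from the example: A -> C, C -> D, D -> B, E -> F
F = [("A", "C"), ("C", "D"), ("D", "B"), ("E", "F")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDEF", F)])  # ['AE']
```

The search is exponential in the number of determined attributes, so this is a teaching aid rather than a tool for large schemas.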
Normalization
While designing a database out of an entity–relationship model, the main problem
existing in that “raw” database is redundancy. Redundancy is storing the same data
item in more than one place. Redundancy creates several problems like the following:
• Extra storage space: storing the same data in many places takes a large amount
of disk space.
• Entering the same data more than once during data insertion.
• Deleting data from more than one place during deletion.
• Modifying data in more than one place.
• Anomalies may occur in the database if insertion, deletion, modification, etc.
are not done properly. This creates inconsistency and unreliability in the
database.
To solve this problem, the “raw” database needs to be normalized. This is a step-by-step
process of removing different kinds of redundancy and anomaly. At
each step a specific rule is followed to remove a specific kind of impurity in order to
give the database a slim and clean look.
Normalization of Database
Database Normalization is a technique of organizing the data in the database.
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like Insertion, Update and Deletion
Anomalies. It is a multi-step process that puts data into tabular form by removing
duplicated data from the relation tables.
Normalization is used mainly for two purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.
Un-Normalized Form (UNF)
If a table contains non-atomic values at each row, it is said to be in UNF. An atomic
value is something that can not be further decomposed. A non-atomic value, as the
name suggests, can be further decomposed and simplified. Consider the following
table:
Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
                  Feb    1200
                  Mar    850
E02     BB        Jan    2200   B02      UTI
                  Feb    2500
E03     CC        Jan    1700   B01      SBI
                  Feb    1800
                  Mar    1850
                  Apr    1725
In the sample table above, there are multiple occurrences of rows under each key
Emp-Id. Although considered to be the primary key, Emp-Id cannot give us the
unique identification facility for any single row. Further, each primary key points to a
variable length record (3 for E01, 2 for E02 and 4 for E03).
Problems without Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad
dream for any database administrator. Managing a database with anomalies is next
to impossible.
Update anomalies − If data items are scattered and are not linked to each other
properly, then it could lead to strange situations. For example, when we try to update
one data item having its copies scattered over several places, a few instances get
updated properly while a few others are left with old values. Such instances leave the
database in an inconsistent state.
Deletion anomalies − We try to delete a record, but parts of it are left undeleted
because, unknown to us, the data is also saved somewhere else.
Insert anomalies − We try to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a
consistent state.
Normalization Rule
Normalization rules are divided into the following normal forms:
First Normal Form
Second Normal Form
Third Normal Form
BCNF
Fourth Normal Form
Fifth Normal Form (PJNF)
First Normal Form (1NF)
A relation is said to be in 1NF if it contains no non-atomic values and each row can
provide a unique combination of values. The above table in UNF can be processed to
create the following table in 1NF.
Emp-Id Emp-Name Month Sales Bank-Id Bank-Name
E01 AA Jan 1000 B01 SBI
E01 AA Feb 1200 B01 SBI
E01 AA Mar 850 B01 SBI
E02 BB Jan 2200 B02 UTI
E02 BB Feb 2500 B02 UTI
E03 CC Jan 1700 B01 SBI
E03 CC Feb 1800 B01 SBI
E03 CC Mar 1850 B01 SBI
E03 CC Apr 1725 B01 SBI
As you can see now, each row contains a unique combination of values. Unlike in UNF,
this relation contains only atomic values, i.e. the values cannot be further
decomposed, so the relation is now in 1NF.
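The step from UNF to 1NF amounts to flattening the repeating (Month, Sales) group. The sketch below uses the data of the tables above. The nested-list representation of the un-normalized rows and the function name to_1nf are assumptions made for illustration.

```python
# Un-normalized data from the table above: each employee carries a
# repeating group of (Month, Sales) pairs.
unf = [
    {"Emp-Id": "E01", "Emp-Name": "AA", "Bank-Id": "B01", "Bank-Name": "SBI",
     "Sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"Emp-Id": "E02", "Emp-Name": "BB", "Bank-Id": "B02", "Bank-Name": "UTI",
     "Sales": [("Jan", 2200), ("Feb", 2500)]},
    {"Emp-Id": "E03", "Emp-Name": "CC", "Bank-Id": "B01", "Bank-Name": "SBI",
     "Sales": [("Jan", 1700), ("Feb", 1800), ("Mar", 1850), ("Apr", 1725)]},
]


def to_1nf(records):
    """Emit one flat row per (employee, month), repeating the employee columns."""
    rows = []
    for rec in records:
        for month, sales in rec["Sales"]:
            rows.append((rec["Emp-Id"], rec["Emp-Name"], month, sales,
                         rec["Bank-Id"], rec["Bank-Name"]))
    return rows


rows_1nf = to_1nf(unf)
for row in rows_1nf:
    print(*row)

# (Emp-Id, Month) identifies each row uniquely in the flattened relation.
keys = [(r[0], r[2]) for r in rows_1nf]
print(len(rows_1nf), len(set(keys)))  # 9 9
```

The flattened relation has nine rows, and the pair (Emp-Id, Month) is distinct in each of them.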
Un-Normalized Form table:
Company    Symbol  Headquarters        Date        Close Price
Microsoft  MSFT    Redmond, WA         09/07/2013  23.96
                                       09/08/2013  23.93
                                       09/09/2013  24.01
Oracle     ORCL    Redwood Shores, CA  09/07/2013  24.27
                                       09/08/2013  24.14
                                       09/09/2013  24.33
Second Normal Form (2NF)
A relation is said to be in 2NF if it is already in 1NF and each and every attribute
fully depends on the primary key of the relation. Speaking inversely, if a table has
some attributes which are not dependent on the primary key of that table, then it is
not in 2NF.
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month,
Sales and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends
on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in
2NF. If this portion can be moved into another related relation, the table would come to
2NF.
Emp-Id  Emp-Name  Month  Sales  Bank-Id
E01     AA        JAN    1000   B01
E01     AA        FEB    1200   B01
E01     AA        MAR    850    B01
E02     BB        JAN    2200   B02
E02     BB        FEB    2500   B02
E03     CC        JAN    1700   B01
E03     CC        FEB    1800   B01
E03     CC        MAR    1850   B01
E03     CC        APR    1725   B01

Bank-Id  Bank-Name
B01      SBI
B02      UTI

After moving this portion into another relation, we store a smaller amount of data in
two relations without any loss of information. There is also a significant
reduction in redundancy.
The following example relation is not in 2NF:
STOCKS (Company, Symbol, Headquarters, Date, Close_Price)
To start the normalization process, list the functional dependencies (FD):
FD1: Symbol, Date → Company, Headquarters, Close Price
FD2: Symbol → Company, Headquarters
Consider that Symbol, Date → Close Price. So we might use Symbol, Date as our key.
However we also see that: Symbol → Headquarters
This violates the rule for 2NF in that a part of our key determines a non-key
attribute.
Another name for this is a partial key dependency. Symbol is only a “part” of
the key and it determines a non-key attribute.
Also, consider the insertion and deletion anomalies.
One Solution:
Split this up into two new relations:
COMPANY (Company, Symbol, Headquarters)
STOCK_PRICES (Symbol, Date, Close_Price)
• At this point we have two new relations in our relational model. The original “STOCKS”
relation we started with is removed from the model.
• Sample data and functional dependencies for the two new relations:

STOCKS (original relation):
Company    Symbol  Headquarters        Date        Close Price
Microsoft  MSFT    Redmond, WA         09/07/2013  23.96
Microsoft  MSFT    Redmond, WA         09/08/2013  23.93
Microsoft  MSFT    Redmond, WA         09/09/2013  24.01
Oracle     ORCL    Redwood Shores, CA  09/07/2013  24.27
Oracle     ORCL    Redwood Shores, CA  09/08/2013  24.14
Oracle     ORCL    Redwood Shores, CA  09/09/2013  24.33
FD1: Symbol, Date → Company, Headquarters, Close Price

• COMPANY Relation:
Company    Symbol  Headquarters
Microsoft  MSFT    Redmond, WA
Oracle     ORCL    Redwood Shores, CA
FD2: Symbol → Company, Headquarters

• STOCK_PRICES Relation:
Symbol  Date        Close Price
MSFT    09/07/2013  23.96
MSFT    09/08/2013  23.93
MSFT    09/09/2013  24.01
ORCL    09/07/2013  24.27
ORCL    09/08/2013  24.14
ORCL    09/09/2013  24.33
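The split of STOCKS into COMPANY and STOCK_PRICES can be verified to lose no information. The following Python sketch uses the sample data above; representing the relations as sets of tuples is an assumption of this illustration. It projects STOCKS onto the two new schemas and joins them back on Symbol.

```python
# Sample STOCKS rows: (Company, Symbol, Headquarters, Date, Close_Price)
stocks = [
    ("Microsoft", "MSFT", "Redmond, WA", "09/07/2013", 23.96),
    ("Microsoft", "MSFT", "Redmond, WA", "09/08/2013", 23.93),
    ("Microsoft", "MSFT", "Redmond, WA", "09/09/2013", 24.01),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/07/2013", 24.27),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/08/2013", 24.14),
    ("Oracle", "ORCL", "Redwood Shores, CA", "09/09/2013", 24.33),
]

# Projections onto the two new schemas; sets drop the duplicate tuples.
company = {(c, s, hq) for c, s, hq, _, _ in stocks}
stock_prices = {(s, d, p) for _, s, _, d, p in stocks}

# Natural join of COMPANY and STOCK_PRICES on Symbol.
rejoined = {
    (c, s, hq, d, p)
    for c, s, hq in company
    for s2, d, p in stock_prices
    if s == s2
}

print(len(company), len(stock_prices))  # 2 6
print(rejoined == set(stocks))          # True
```

Each company's name and headquarters is now stored once instead of once per trading day, and the join recovers exactly the original rows.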
Third Normal Form (3NF)
A relation is in third normal form (3NF) if it is in second normal form and it contains
no transitive dependencies.
Consider relation R containing attributes A, B and C. R(A, B, C)
If A → B and B → C then A → C
Transitive Dependency: Three attributes with the above dependencies.
Example: At CUNY:
Course_Code → Course_Number, Section
Course_Number, Section → Classroom, Professor
Consider one of the new relations we created in the STOCKS example for 2nd normal
form:
The functional dependencies we can see are:
FD1: Symbol → Company
FD2: Company → Headquarters
so therefore: Symbol → Headquarters
This is a transitive dependency.

Company    Symbol  Headquarters
Microsoft  MSFT    Redmond, WA
Oracle     ORCL    Redwood Shores, CA

What happens if we remove Oracle?
We lose information about 2 different facts.
The solution again is to split this relation up into two new relations:
STOCK_SYMBOLS(Company, Symbol)
COMPANY_HEADQUARTERS(Company, Headquarters)
This gives us the following sample data and FD for the new relations
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if every determinant is a candidate key.
Recall that not all determinants are keys.
Those determinants that are keys we initially call candidate keys.
Eventually, we select a single candidate key to be the key for the relation.
Consider the following example:
Funds consist of one or more Investment Types.
Funds are managed by one or more Managers
Investment Types can have one or more Managers.
Managers only manage one type of investment.
In this case, the combination FundID and InvestmentType form a candidate
key because we can use FundID,InvestmentType to uniquely identify a tuple in
the relation.
Similarly, the combination FundID and Manager also form a candidate
key because we can use FundID, Manager to uniquely identify a tuple.
Manager by itself is not a candidate key because we cannot use Manager alone
to uniquely identify a tuple in the relation.
Is this relation FUNDS(FundID, InvestmentType, Manager) in 1NF, 2NF or 3NF ?
Given we pick FundID, InvestmentType as the Primary Key:
1NF for sure.
2NF because all of the non-key attributes (Manager) are dependent on all of
the key.
3NF because there are no transitive dependencies.
However, consider what happens if we delete the tuple with FundID 22. We lose
the fact that Brown manages the InvestmentType “Growth Stocks.”
Therefore, while the FUNDS relation is in 1NF, 2NF and 3NF, it is not in BCNF because not
all determinants (Manager in FD3) are candidate keys.
The following are steps to normalize a relation into BCNF:
List all of the determinants.
See if each determinant can act as a key (candidate keys).
For any determinant that is not a candidate key, create a new relation from
the functional dependency. Retain the determinant in the original relation.
For our example: FUNDS (FundID, InvestmentType, Manager)
The determinants are:
FundID, InvestmentType
FundID, Manager
Manager
Which determinants can act as keys?
FundID, InvestmentType YES
FundID, Manager YES
Manager NO
Create a new relation from the functional dependency:
MANAGERS(Manager, InvestmentType),
FUND_MANAGERS(FundID, Manager)
In this last step, we have retained the determinant “Manager” in the original relation,
now named FUND_MANAGERS. Each of the new relations should be checked to ensure they meet the
definitions of 1NF, 2NF, 3NF and BCNF.
FUND_MANAGERS:
FundID  Manager
99      Smith
99      Jones
33      Green
22      Brown
11      Smith

MANAGERS:
InvestmentType   Manager
Common Stock     Smith
Municipal Bonds  Jones
Common Stock     Green
Growth Stocks    Brown
Common Stock     Smith
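The "list the determinants and test each one" steps can be sketched in Python. Two points are assumptions of this sketch rather than statements from the notes. First, the FD list is reconstructed from the determinants named above: (FundID, InvestmentType) → Manager, (FundID, Manager) → InvestmentType and Manager → InvestmentType. Second, the test used is the common "determinant is a superkey" formulation of BCNF. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute-name lists)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    attributes = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)            # skip trivial FDs
            and closure(lhs, fds) != attributes]   # determinant is not a superkey


FUNDS = ["FundID", "InvestmentType", "Manager"]
fds = [
    (["FundID", "InvestmentType"], ["Manager"]),
    (["FundID", "Manager"], ["InvestmentType"]),
    (["Manager"], ["InvestmentType"]),
]

print(bcnf_violations(FUNDS, fds))
# [(['Manager'], ['InvestmentType'])]

# After the split, each new relation is checked against the FDs that apply to it.
print(bcnf_violations(["Manager", "InvestmentType"],
                      [(["Manager"], ["InvestmentType"])]))  # []
print(bcnf_violations(["FundID", "Manager"], []))             # []
```

Only Manager → InvestmentType is reported for FUNDS, matching the YES/YES/NO table, and neither MANAGERS nor FUND_MANAGERS reports a violation.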
Fourth Normal Form (4NF)
A relation is in fourth normal form if it is in BCNF and it contains no multivalued
dependencies.
Multivalued Dependency: A type of functional dependency where the determinant
can determine more than one value.
More formally, there are 3 criteria:
1. There must be at least 3 attributes in the relation. Call them A, B, and C, for
example.
2. Given A, one can determine multiple values of B, and given A, one can determine
multiple values of C.
3. B and C are independent of one another.
Book example:
Student has one or more majors.
Student participates in one or more activities.
A few characteristics:
No regular functional dependencies
All three attributes taken together form the key.
The latter two attributes are independent of one another.
Insertion anomaly: Cannot add a stock fund without adding a bond fund (NULL
Value). Must always maintain the combinations to preserve the meaning.
Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
PortfolioID →→ Stock Fund
PortfolioID →→ Bond Fund
Resolution: Split into two tables with the common key:
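A minimal Python sketch of this resolution follows. The portfolio IDs and fund names are invented purely for illustration and do not come from the notes. The relation is split on the common key PortfolioID, and the natural join of the two parts is checked against the original.

```python
# Hypothetical rows: (PortfolioID, StockFund, BondFund). All IDs and fund
# names are invented for illustration only.
portfolio = {
    (999, "Stock Fund X", "Bond Fund P"),
    (999, "Stock Fund X", "Bond Fund Q"),
    (999, "Stock Fund Y", "Bond Fund P"),
    (999, "Stock Fund Y", "Bond Fund Q"),
    (888, "Stock Fund Z", "Bond Fund P"),
}

# One table per multivalued dependency, both keeping the common key.
portfolio_stocks = {(pid, s) for pid, s, _ in portfolio}
portfolio_bonds = {(pid, b) for pid, _, b in portfolio}

# Natural join on PortfolioID.
rejoined = {
    (pid, s, b)
    for pid, s in portfolio_stocks
    for pid2, b in portfolio_bonds
    if pid == pid2
}

print(len(portfolio), len(portfolio_stocks), len(portfolio_bonds))  # 5 3 3
print(rejoined == portfolio)  # True
```

In the split form a stock fund can be added to a portfolio with a single row, without having to pair it with every bond fund.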
Fifth Normal Form (5NF)
Also called “Project-Join” Normal Form (PJNF).
There are certain conditions under which after decomposing a relation, it cannot
be reassembled back into its original form.
Algorithm (canonical cover)
i/p F
o/p Fc
repeat
1. Use the union (additivity) rule to replace
a1 → b1 & a1 → b2
by a1 → b1b2
2. Find an FD a → b which has an extraneous attribute ‘A’ and remove A from
a → b
3. Remove redundant FDs
until F does not change
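The loop above can be sketched in Python as follows. This is one possible implementation under stated assumptions, not the notes' own code: the names closure and canonical_cover are hypothetical, attributes are single characters, and the sample F is a standard textbook-style input chosen for illustration. An FD whose right-hand side becomes empty is dropped, which covers step 3.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """fds: pairs of attribute strings. Returns a dict {lhs: rhs} of frozensets."""
    cover = {}
    for lhs, rhs in fds:  # step 1: union rule, merge FDs with equal left sides
        key = frozenset(lhs)
        cover[key] = cover.get(key, frozenset()) | frozenset(rhs)

    def step():
        """Apply one simplification; return True if something changed."""
        items = list(cover.items())
        for lhs, rhs in items:
            # Left side: a is extraneous if (lhs - a)+ under F already contains rhs.
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, items):
                    del cover[lhs]
                    new_lhs = lhs - {a}
                    cover[new_lhs] = cover.get(new_lhs, frozenset()) | rhs
                    return True
            # Right side: a is extraneous if lhs still determines a once a
            # is dropped from this FD.
            for a in sorted(rhs):
                reduced = [(l, r) for l, r in items if l != lhs] + [(lhs, rhs - {a})]
                if a in closure(lhs, reduced):
                    if rhs - {a}:
                        cover[lhs] = rhs - {a}
                    else:
                        del cover[lhs]  # step 3: the whole FD was redundant
                    return True
        return False

    while step():  # repeat until F does not change
        pass
    return cover


# Illustrative input: F = {A -> BC, B -> C, A -> B, AB -> C}
Fc = canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
for lhs, rhs in sorted(Fc.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```

A canonical cover is not unique in general; which one is produced depends on the order in which extraneous attributes are examined.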
a) F covers G
b) G covers F
c) F & G are equal
d) None
Answer is d) None