Unit 3
Relational Database Design: Features of Good Relational Designs - Atomic Domains and First Normal
Form - Second Normal Form - Decomposition Using Functional Dependencies - Functional-Dependency
Theory - Algorithms for Decomposition - Decomposition Using Multivalued Dependencies - More Normal
Forms - Database-Design Process - Modeling Temporal Data
Two Marks
1. What is normalization?
The main goal of normalization is to reduce redundant data. Normalization is based on functional
dependencies.
First normal form is also called a flat file. It has no composite attributes, and every attribute is
single-valued and describes one property.
Functional dependencies play a key role in differentiating good database designs from bad database
designs. A functional dependency is a type of constraint that generalizes the notion of a key.
4. What is decomposition?
The process of decomposing a relation schema that has many attributes into several schemas with
fewer attributes is called decomposition.
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional
dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is a trivial functional dependency (that is, β ⊆ α), or α is a superkey for schema R.
A relation is said to be in second normal form if it is in first normal form and every non-key attribute
is fully functionally dependent on the key.
A schema that is not in BCNF is called a non-BCNF schema. Fourth normal form is defined using
multivalued dependencies, which are a new form of constraint. It is more restrictive than BCNF.
Multivalued dependencies are referred to as tuple-generating dependencies. They do not rule out the
existence of certain tuples; instead, they require that other tuples of a certain form be present in the
relation.
3NF BCNF
11 MARKS
1. Explain in detail about relational database design. (11 Marks)
A domain is atomic if elements of the domain are considered to be indivisible units. We say that
a relation schema R is in first normal form (1NF) if the domains of all attributes of R are atomic. A set
of names is an example of a nonatomic value. For example, if the schema of a relation employee
included an attribute children whose domain elements are sets of names, the schema would not be in
first normal form. Composite attributes, such as an attribute address with component attributes street
and city, also have nonatomic domains. Integers are assumed to be atomic, so the set of integers is an
atomic domain; the set of all sets of integers is a nonatomic domain. The distinction is that we do not
normally consider integers to have subparts, but we consider sets of integers to have subparts, namely,
the integers making up the set.
The domain of all integers would be nonatomic if we considered each integer to be an ordered list of
digits.
Some types of nonatomic values can be useful, although they should be used with care. For example,
composite-valued attributes are often useful, and set-valued attributes are also useful in many cases,
which is why both are supported in the E-R model.
Among the undesirable properties that a bad design may have are:
Repetition of information
Inability to represent certain information
Suppose the information concerning loans is kept in one single relation, lending, which is defined over
the relation schema
Lending-schema = (branch-name, branch-city, assets, customer-name,
loan-number, amount)
A tuple t in the lending relation has the following intuitive meaning:
t[assets] is the asset figure for the branch named t[branch-name].
Suppose that we wish to add a new loan to our database. Say that the loan is made by the Perryridge
branch to Adams in the amount of $1500. Let the loan-number be L-31. In our design, we need a tuple
with values on all the attributes of Lending-schema. Thus, we must repeat the asset and city data for the
Perryridge branch, and must add the tuple
(Perryridge, Horseneck, 1700000, Adams, L-31, 1500)
Another problem with the Lending-schema design is that we cannot represent directly the information
concerning a branch (branch-name, branch-city, assets) unless there exists at least one loan at the
branch. This is because tuples in the lending relation require values for loan-number, amount, and
customer-name. One solution to this problem is to introduce null values, as we did to handle updates
through views.
Basic Concepts
Functional dependencies are constraints on the set of legal relations. They allow us to express facts
about the enterprise that we are modeling with our database.
Let R be a relation schema. A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs t1
and t2 of tuples in r such that t1 ≠ t2, we have t1[K] ≠ t2[K]. That is, no two tuples in any legal relation
r(R) may have the same value on attribute set K. The notion of functional dependency generalizes the
notion of superkey. Consider a relation schema R, and let α ⊆ R and β ⊆ R. The functional dependency
α → β
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] =
t2[α], it is also the case that t1[β] = t2[β].
1. To test relations to see whether they are legal under a given set of functional dependencies. If a
relation r is legal under a set F of functional dependencies, we say that r satisfies F.
2. To specify constraints on the set of legal relations. We shall thus concern ourselves with only
those relations that satisfy a given set of functional dependencies. If we wish to constrain
ourselves to relations on schema R that satisfy a set F of functional dependencies, we say that F
holds on R.
On Customer-schema:
customer-name → customer-city
customer-name → customer-street
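The first use of functional dependencies, testing whether a relation instance is legal, can be sketched directly from the definition. The following is a minimal sketch, not a definitive implementation; the function name satisfies and the sample rows are hypothetical and were invented only for illustration.

```python
from itertools import combinations


def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        differ_on_rhs = any(t1[b] != t2[b] for b in rhs)
        if agree_on_lhs and differ_on_rhs:
            return False
    return True


# Hypothetical sample rows on Customer-schema (illustration only).
customer = [
    {"customer-name": "Adams", "customer-street": "Spring", "customer-city": "Pittsfield"},
    {"customer-name": "Hayes", "customer-street": "Main", "customer-city": "Harrison"},
    {"customer-name": "Jones", "customer-street": "Main", "customer-city": "Harrison"},
]

print(satisfies(customer, ["customer-name"], ["customer-city"]))    # True
print(satisfies(customer, ["customer-street"], ["customer-name"]))  # False
```

Note that a single instance can only refute a dependency; satisfying it in one instance does not show that it holds on the schema.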
More formally, given a relational schema R, a functional dependency f on R is logically implied by a set
of functional dependencies F on R if every relation instance r(R) that satisfies F also satisfies f.
Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional dependencies
A → B
A → C
CG → H
CG → I
B → H
The functional dependency
A→ H
is logically implied. That is, we can show that, whenever our given set of functional dependencies holds
on a relation, A→H must also hold on the relation. Suppose that t1 and t2 are tuples such that
t1[A] = t2[A]
Since we are given that A→B, it follows from the definition of functional dependency that
t1[B] = t2[B]
Then, since we are given that B → H, it follows from the definition of functional dependency that
t1[H] = t2[H]
Therefore, we have shown that, whenever t1 and t2 are tuples such that t1[A] = t2[A], it must be that
t1[H] = t2[H]. But that is exactly the definition of A→ H.
We can use the following three rules to find logically implied functional dependencies. By applying
these rules repeatedly, we can find all of F+, given F. This collection of rules is called Armstrong’s
axioms in honor of the person who first proposed it.
• Reflexivity rule. If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule. If α → β holds and γ is a set of attributes, then γα → γβ holds.
• Transitivity rule. If α → β holds and β → γ holds, then α → γ holds.
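Although Armstrong’s axioms are complete, it is tiresome to use them directly for the computation of
F+. To simplify matters further, we list additional rules. It is possible to use Armstrong’s axioms to
prove that these rules are correct.
• Union rule. If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule. If α → β holds and γβ → δ holds, then αγ → δ holds.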
To test whether a set α is a superkey, we must devise an algorithm for computing the set of
attributes functionally determined by α. One way of doing this is to compute F+, take all functional
dependencies with α as the left-hand side, and take the union of the right-hand sides of all such
dependencies. However, doing so can be expensive, since F+ can be large.
Let α be a set of attributes. We call the set of all attributes functionally determined by α under a set F of
functional dependencies the closure of α under F; we denote it by α+. The input is a set F of functional
dependencies and the set α of attributes. The output is stored in the variable result. As an example,
consider computing (AG)+ with the functional dependencies defined above, starting with result = AG.
The first time that we execute the while loop to test each functional dependency, we find that
• A → B causes us to include B in result. To see this fact, we observe that A → B is in F and A ⊆ result
(which is AG), so result := result ∪ B.
• A → C causes result to become ABCG.
• CG → H causes result to become ABCGH.
• CG → I causes result to become ABCGHI.
It turns out that, in the worst case, this algorithm may take an amount of time quadratic in the size of F.
There is a faster (although slightly more complex) algorithm that runs in time linear in the size of F.
result := α;
while (changes to result) do
    for each functional dependency β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ;
    end
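The attribute-closure pseudocode above translates almost line for line into Python. This is a minimal sketch under the assumption that each functional dependency is stored as a pair of Python sets (left side, right side); the function name closure is hypothetical. It reproduces the (AG)+ computation on R = (A, B, C, G, H, I).

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The functional dependencies given earlier for R = (A, B, C, G, H, I).
F = [
    ({"A"}, {"B"}),
    ({"A"}, {"C"}),
    ({"C", "G"}, {"H"}),
    ({"C", "G"}, {"I"}),
    ({"B"}, {"H"}),
]

print(sorted(closure({"A", "G"}, F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(closure({"A"}, F)))       # ['A', 'B', 'C', 'H']
```

Since (AG)+ contains every attribute of R, AG is a superkey of R, whereas A alone is not.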
Canonical Cover
Suppose that we have a set of functional dependencies F on a relation schema. Whenever a user
performs an update on the relation, the database system must ensure that the update does not violate
any functional dependencies, that is, all the functional dependencies in F are satisfied in the new
database state. The system must roll back the update if it violates any functional dependencies in the set
F.
An attribute of a functional dependency is said to be extraneous if we can remove it without
changing the closure of the set of functional dependencies. The formal definition of extraneous
attributes is as follows. Consider a set F of functional dependencies and the functional dependency α
→β in F.
• Attribute A is extraneous in α if A ∈ α, and F logically implies (F − {α → β}) ∪ {(α − A) → β}.
• Attribute A is extraneous in β if A ∈ β, and the set of functional dependencies
(F − {α → β}) ∪ {α → (β − A)} logically implies F.
A canonical cover Fc for F is a set of dependencies such that F logically implies all dependencies
in Fc, and Fc logically implies all dependencies in F. Furthermore, Fc must have the following properties:
• No functional dependency in Fc contains an extraneous attribute.
• Each left side of a functional dependency in Fc is unique. That is, there are no two dependencies α1 →
β1 and α2 → β2 in Fc such that α1 = α2.
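One way to arrive at a set with these properties is to merge dependencies that share a left side and then remove extraneous attributes one at a time, using attribute closure for each test. The sketch below assumes dependencies are stored as (left side, right side) set pairs; the function names and the example set F on R = (A, B, C) are illustrative assumptions, not taken from these notes. A canonical cover is not necessarily unique, so a different removal order may give a different but equivalent result.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def remove_one_extraneous(cover):
    """Return a copy of cover with one extraneous attribute removed, or None."""
    for i, (lhs, rhs) in enumerate(cover):
        others = cover[:i] + cover[i + 1:]
        for a in sorted(lhs):
            # A is extraneous in the left side if (lhs - A) -> rhs follows from F.
            if rhs <= closure(lhs - {a}, cover):
                return others + [(lhs - {a}, rhs)]
        for a in sorted(rhs):
            # A is extraneous in the right side if lhs -> A still follows
            # after A has been dropped from this dependency.
            if a in closure(lhs, others + [(lhs, rhs - {a})]):
                return others + [(lhs, rhs - {a})]
    return None


def canonical_cover(fds):
    cover = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in cover:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        cover = [(l, r) for l, r in merged.items() if r]
        reduced = remove_one_extraneous(cover)
        if reduced is None:
            return cover
        cover = reduced


# Illustrative (hypothetical) dependency set on R = (A, B, C).
F = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A"}, {"B"}), ({"A", "B"}, {"C"})]
for lhs, rhs in canonical_cover(F):
    print(sorted(lhs), "->", sorted(rhs))
```

For this input the sketch ends with the two dependencies A → B and B → C.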
Consider an alternative design in which we decompose Lending-schema into the following two
schemas:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Using the lending relation of Figure 7.1, we construct our new relations branch-customer
(Branch-customer-schema) and customer-loan (Customer-loan-schema):
branch-customer = Π branch-name, branch-city, assets, customer-name (lending)
customer-loan = Π customer-name, loan-number, amount (lending)
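The two projections, and what happens when they are joined back together, can be tried out on a tiny instance. This is a minimal sketch: the helper names project and natural_join are hypothetical, the first row is the Perryridge tuple used earlier, and the second row is invented purely for illustration so that one customer has loans at two branches.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts keyed by attribute."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(left, right):
    """Join two lists of dict rows on all attributes they have in common."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined


lending = [
    {"branch-name": "Perryridge", "branch-city": "Horseneck", "assets": 1700000,
     "customer-name": "Adams", "loan-number": "L-31", "amount": 1500},
    # Hypothetical second row (illustration only).
    {"branch-name": "Downtown", "branch-city": "Brooklyn", "assets": 9000000,
     "customer-name": "Adams", "loan-number": "L-17", "amount": 1000},
]

branch_customer = project(
    lending, ["branch-name", "branch-city", "assets", "customer-name"])
customer_loan = project(lending, ["customer-name", "loan-number", "amount"])

joined = natural_join(branch_customer, customer_loan)
spurious = [t for t in joined if t not in lending]
print(len(lending), len(joined), len(spurious))  # 2 4 2
```

With these rows the join returns more tuples than lending contained, so the original relation cannot be recovered from the two projections.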
Lossless-Join Decomposition
Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a
decomposition of R. This decomposition is a lossless-join decomposition of R if at least one of the
following functional dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless-join
decomposition.
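This condition can be checked with attribute closure instead of computing F+: the dependency R1 ∩ R2 → R1 is in F+ exactly when R1 is contained in the closure of R1 ∩ R2. The sketch below assumes dependencies are stored as (left side, right side) set pairs; the function names are hypothetical. It uses the two dependencies these notes require on Lending-schema.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def passes_lossless_test(r1, r2, fds):
    """True if R1 ∩ R2 functionally determines all of R1 or all of R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]

branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
print(passes_lossless_test(branch, loan_info, F))  # True

branch_customer = {"branch-name", "branch-city", "assets", "customer-name"}
customer_loan = {"customer-name", "loan-number", "amount"}
print(passes_lossless_test(branch_customer, customer_loan, F))  # False
```

In the second case the shared attribute customer-name determines nothing else, so neither dependency of the test is in F+.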
To decide whether joins must be computed to check an update, we need to determine what
functional dependencies can be tested by checking each relation individually. Let F be a set of functional
dependencies on a schema R, and let R1, R2, . . . , Rn be a decomposition of R. The restriction of F to Ri is
the set Fi of all functional dependencies in F+ that include only attributes of Ri. Since all functional
dependencies in a restriction involve attributes of only one relation schema, it is possible to test such a
dependency for satisfaction by checking only one relation.
We consider each member of the set F of functional dependencies that we require to hold on Lending-
schema, and show that each one can be tested in at least one relation in the decomposition.
• We can test the functional dependency: branch-name → branch-city assets using Branch-schema =
(branch-name, branch-city, assets)
• We can test the functional dependency: loan-number → amount branch-name using Loan-schema =
(branch-name, loan-number, amount).
compute F+;
for each schema Ri in D do
begin
    Fi := the restriction of F+ to Ri;
end
F′ := ∅
for each restriction Fi do
begin
    F′ := F′ ∪ Fi
end
compute F′+;
if (F′+ = F+) then return (true)
else return (false);
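Computing F+ as in the pseudocode above can be expensive. The sketch below swaps in a different, closure-based test: for each dependency α → β in F, it grows a set starting from α using only what can be derived inside one schema at a time, and then checks whether β was reached. This is a sketch under the assumption that dependencies are (left side, right side) set pairs; the function names and the second example on R = (A, B, C) are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(schemas, fds):
    """Check every dependency in fds against the decomposition in schemas."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                # Attributes of Ri determined by the part of result inside Ri.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


F = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]
D = [
    {"branch-name", "branch-city", "assets"},
    {"branch-name", "loan-number", "amount"},
    {"customer-name", "loan-number"},
]
print(preserves_dependencies(D, F))  # True

# Hypothetical example: B -> C cannot be checked within {A, B} or {A, C}.
G = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves_dependencies([{"A", "B"}, {"A", "C"}], G))  # False
```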
In the decomposition, the relation on schema Borrower-schema contains the loan-number,
customer-name relationship, and no other schema does. Therefore, we have one tuple for each customer
for a loan in only the relation on Borrower-schema. In the other relations involving loan-number (those on
schemas Loan-schema and Borrower-schema), only one tuple per loan needs to appear.
One of the more desirable normal forms that we can obtain is Boyce–Codd normal form (BCNF). A
relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional
dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is a trivial functional dependency (that is, β ⊆ α).
• α is a superkey for schema R.
A database design is in BCNF if each member of the set of relation schemas that constitutes the design is
in BCNF. Consider the following relation schemas and their respective functional dependencies:
• Customer-schema = (customer-name, customer-street, customer-city)
  customer-name → customer-street customer-city
• Branch-schema = (branch-name, assets, branch-city)
  branch-name → assets branch-city
• Loan-info-schema = (branch-name, customer-name, loan-number, amount)
  loan-number → amount branch-name
It is now possible to avoid redundancy in the case where there are several customers associated with a
loan. There is exactly one tuple for each loan in the relation on Loan-schema, and one tuple for each
customer of each loan in the relation on Borrower-schema. Thus, we do not have to repeat the branch
name and the amount once for each customer associated with a loan. Often testing of a relation to see if
it satisfies BCNF can be simplified:
Decomposition Algorithm
The decomposition that the algorithm generates is not only in BCNF, but is also a lossless-join
decomposition. To see why our algorithm generates only lossless-join decompositions, we note that,
when we replace a schema Ri with (Ri − β) and (α, β), the dependency α → β holds, and (Ri − β) ∩ (α, β) = α.
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that holds
        on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
        result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;
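The decomposition loop can be sketched in Python for small schemas. Instead of computing F+, this sketch searches the attribute subsets of each schema by brute force and uses attribute closure to find a violating dependency, which is exponential in the number of attributes and is meant only as an illustration. The function names are hypothetical, and dependencies are assumed to be (left side, right side) set pairs. It is run on Lending-schema with the two dependencies these notes require on it.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) for a BCNF violation on schema ri, or None."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            alpha = set(combo)
            closed = closure(alpha, fds)
            beta = (closed & ri) - alpha  # nontrivial part, disjoint from alpha
            if beta and not ri <= closed:  # alpha is not a superkey of ri
                return alpha, beta
    return None


def bcnf_decompose(r, fds):
    result = [frozenset(r)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [frozenset(ri - beta), frozenset(alpha | beta)]
                done = False
                break  # rescan the updated list of schemas
    return result


F = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]
lending = {"branch-name", "branch-city", "assets",
           "customer-name", "loan-number", "amount"}
for schema in bcnf_decompose(lending, F):
    print(sorted(schema))
```

On this input the sketch produces three schemas: (branch-name, branch-city, assets), (loan-number, amount, branch-name), and (customer-name, loan-number).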
Consider the schema Banker-schema = (branch-name, customer-name, banker-name),
which indicates that a customer has a “personal banker” in a particular branch. The set F of functional
dependencies that we require to hold on the Banker-schema is
banker-name → branch-name
branch-name customer-name → banker-name
Clearly, Banker-schema is not in BCNF since banker-name is not a superkey. If we apply the algorithm
we obtain the following BCNF decomposition:
Banker-branch-schema = (banker-name, branch-name)
Customer-banker-schema = (customer-name, banker-name)
The decomposed schemas preserve only banker-name → branch-name (and trivial dependencies), but
the closure of {banker-name → branch-name} does not include customer-name branch-name → banker-
name. The violation of this dependency cannot be detected unless a join is computed.
Thus, the example shows that we cannot always satisfy all three design goals:
1. Lossless join
2. BCNF
3. Dependency preservation
We have two alternatives if we wish to check if an update violates any functional dependencies:
• Pay the extra cost of computing joins to test for violations.
• Use an alternative decomposition, third normal form (3NF), which we present below, which makes
testing of updates cheaper. Unlike BCNF, 3NF decompositions may contain some redundancy in the
decomposed schema.
Definition
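BCNF requires that all nontrivial dependencies be of the form α → β, where α is a superkey. 3NF relaxes
this constraint slightly by allowing nontrivial functional dependencies whose left side is not a superkey.
A relation schema R is in 3NF with respect to a set F of functional dependencies if, for all functional
dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is a trivial functional dependency.
• α is a superkey for R.
• Each attribute A in β − α is contained in a candidate key for R.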
The first two alternatives are the same as the two alternatives in the definition of BCNF. The third
alternative of the 3NF definition seems rather unintuitive, and it is not obvious why it is useful. It
represents, in some sense, a minimal relaxation of the BCNF conditions that helps ensure that every
schema has a dependency-preserving decomposition into 3NF. Its purpose will become more clear
later, when we study decomposition into 3NF.
The only nontrivial functional dependencies of the form α → banker-name include {customer-name,
branch-name} as part of α. Since {customer-name, branch-name} is a candidate key, these dependencies
do not violate the definition of 3NF.
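The claim about Banker-schema can be checked mechanically. The sketch below finds candidate keys by brute force and then tests each dependency in F; it assumes dependencies are (left side, right side) set pairs, that checking the dependencies in F (rather than all of F+) is sufficient for testing the schema itself, and all function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(r, fds):
    """Minimal attribute sets whose closure covers r (brute force, small r only)."""
    keys = []
    attrs = sorted(r)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            k = set(combo)
            if r <= closure(k, fds) and not any(key <= k for key in keys):
                keys.append(k)
    return keys


def is_bcnf(r, fds):
    return all(rhs <= lhs or r <= closure(lhs, fds) for lhs, rhs in fds)


def is_3nf(r, fds):
    prime = set().union(*candidate_keys(r, fds))  # attributes in some candidate key
    for lhs, rhs in fds:
        if rhs <= lhs or r <= closure(lhs, fds):
            continue  # trivial dependency, or left side is a superkey
        if not (rhs - lhs) <= prime:
            return False
    return True


banker = {"branch-name", "customer-name", "banker-name"}
F = [
    ({"banker-name"}, {"branch-name"}),
    ({"branch-name", "customer-name"}, {"banker-name"}),
]
print([sorted(k) for k in candidate_keys(banker, F)])
print(is_bcnf(banker, F))  # False
print(is_3nf(banker, F))   # True
```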
Decomposition Algorithm
The set of dependencies Fc used in the algorithm is a canonical cover for F. Note that the
algorithm considers the set of schemas Rj, j = 1, 2, . . . , i; initially i = 0, and in this case the set is empty.
Functional dependencies rule out certain tuples from being in a relation. If A → B, then we cannot have
two tuples with the same A value but different B values. Multivalued dependencies, on the other hand,
do not rule out the existence of certain tuples. Instead, they require that other tuples of a certain form
be present in the relation. For this reason, functional dependencies sometimes are referred to as
equality generating dependencies, and multivalued dependencies are referred to as tuple
generating dependencies.
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α →→ β
holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there
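exist tuples t3 and t4 in r such that
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]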
From the definition of multivalued dependency, we can derive the following rule:
• If α → β, then α →→ β.
In other words, every functional dependency is also a multivalued dependency.
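Because a multivalued dependency requires certain tuples to be present, it can be tested on an instance by constructing the required tuple for every ordered pair of tuples that agree on α and looking it up. The sketch below is a minimal illustration; the function name satisfies_mvd and the sample rows are hypothetical and were invented for this example. Iterating over ordered pairs covers both t3 and t4, since t4 is simply t3 with the roles of t1 and t2 swapped.

```python
def satisfies_mvd(rows, attrs, alpha, beta):
    """Check alpha ->> beta on rows (dicts) over the attribute list attrs."""
    present = {tuple(t[a] for a in attrs) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # Required tuple: beta values from t1, everything else from t2.
                t3 = tuple(t1[a] if a in beta else t2[a] for a in attrs)
                if t3 not in present:
                    return False
    return True


attrs = ["customer-name", "customer-street", "loan-number"]

# Hypothetical rows: one customer with two streets and two loans.
partial = [
    {"customer-name": "Smith", "customer-street": "North", "loan-number": "L-23"},
    {"customer-name": "Smith", "customer-street": "Main", "loan-number": "L-27"},
]
complete = partial + [
    {"customer-name": "Smith", "customer-street": "North", "loan-number": "L-27"},
    {"customer-name": "Smith", "customer-street": "Main", "loan-number": "L-23"},
]

print(satisfies_mvd(partial, attrs, {"customer-name"}, {"customer-street"}))   # False
print(satisfies_mvd(complete, attrs, {"customer-name"}, {"customer-street"}))  # True
```

The first instance fails because the tuple (Smith, North, L-27) is required but absent; adding the two missing combinations makes the dependency hold.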
result := {R};
done := false;
compute D+; Given schema Ri, let Di denote the restriction of D+ to Ri
while (not done) do
    if (there is a schema Ri in result that is not in 4NF w.r.t. Di)
    then begin
        let α →→ β be a nontrivial multivalued dependency that holds
        on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
        result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;
Let R be a relation schema, and let R1,R2, . . . , Rn be a decomposition of R. To check if each relation
schema Ri in the decomposition is in 4NF, we need to find what multivalued dependencies hold on each
Ri. Recall that, for a set F of functional dependencies, the restriction Fi of F to Ri is all functional
dependencies in F+ that include only attributes of Ri. Now consider a set D of both functional and
multivalued dependencies. The restriction of D to Ri is the set Di consisting of
1. All functional dependencies in D+ that include only attributes of Ri
2. All multivalued dependencies of the form
α →→ β ∩ Ri
where α ⊆ Ri and α →→ β is in D+.
Let R be a relation schema, and let D be a set of functional and multivalued dependencies on R. Let R1
and R2 form a decomposition of R. This decomposition is a lossless-join decomposition of R if and only
if at least one of the following multivalued dependencies is in D+:
R1 ∩ R2 →→ R1
R1 ∩ R2 →→ R2