09 - Functional Dependencies Normalization
Basics of Functional Dependencies and Normalization for Relational Databases
Figure 14.1 A simplified COMPANY relational database schema.
Figure 14.3 Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.
◼ GUIDELINE 2:
◼ Design a schema that does not suffer from the
insertion, deletion and update anomalies.
◼ If there are any anomalies present, then note them
so that applications can be made to take them into
account.
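To make the update anomaly concrete, here is a minimal sketch in Python. The rows and the helper `inconsistent_departments` are hypothetical, made up for illustration; they only mimic the shape of EMP_DEPT, where department facts are repeated in every employee tuple.

```python
# Illustrative EMP_DEPT rows: (Ename, Ssn, Dnumber, Dname, Dmgr_ssn).
# The values are invented for this sketch, not taken from the slides.
emp_dept = [
    ("Smith", "123456789", 5, "Research", "333445555"),
    ("Wong", "333445555", 5, "Research", "333445555"),
    ("Zelaya", "999887777", 4, "Administration", "987654321"),
]


def inconsistent_departments(rows):
    """Return the Dnumbers that appear with more than one (Dname, Dmgr_ssn) pair."""
    facts_by_dept = {}
    for _ename, _ssn, dnumber, dname, mgr_ssn in rows:
        facts_by_dept.setdefault(dnumber, set()).add((dname, mgr_ssn))
    return sorted(d for d, facts in facts_by_dept.items() if len(facts) > 1)


# Update anomaly: the manager of department 5 is changed in only one tuple.
updated = list(emp_dept)
updated[0] = ("Smith", "123456789", 5, "Research", "888665555")

print(inconsistent_departments(emp_dept))
print(inconsistent_departments(updated))
```

Because the manager is stored once per employee rather than once per department, a partial update leaves the relation contradicting itself. Storing department facts in their own relation removes the redundancy.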
◼ GUIDELINE 3:
◼ Relations should be designed such that their
tuples will have as few NULL values as possible
◼ Attributes that are frequently NULL can be placed in
separate relations (together with the primary key)
◼ Reasons for nulls:
◼ Attribute not applicable or invalid
◼ Attribute value unknown (may exist)
◼ Value known to exist, but unavailable
◼ GUIDELINE 4:
◼ The relations should be designed to satisfy the
lossless join condition.
◼ No spurious tuples should be generated by doing
a natural-join of any relations.
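The following sketch shows how spurious tuples can arise. It is an illustration under assumptions: the relation contents and the helpers `project`, `natural_join`, and `as_set` are hypothetical and written only for this example.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Order-insensitive view of a relation, for comparisons."""
    return {frozenset(row.items()) for row in rows}


# Illustrative data: two employees on two different projects in one city.
emp_proj = [
    {"Ssn": "1", "Pnumber": 10, "Plocation": "Houston"},
    {"Ssn": "2", "Pnumber": 20, "Plocation": "Houston"},
]

# Lossy decomposition: the shared attribute Plocation is a key of neither part.
bad_join = natural_join(
    project(emp_proj, ["Ssn", "Plocation"]),
    project(emp_proj, ["Pnumber", "Plocation"]),
)

# Lossless decomposition: the shared attribute Pnumber determines Plocation.
good_join = natural_join(
    project(emp_proj, ["Ssn", "Pnumber"]),
    project(emp_proj, ["Pnumber", "Plocation"]),
)

spurious = as_set(bad_join) - as_set(emp_proj)
print(len(spurious), "spurious tuples in the lossy join")
```

In the lossy case, each employee is paired with every project in the same city, so the join returns tuples that were never in the original relation.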
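◼ Note that, of the two properties of a decomposition:
◼ Property (a), the lossless (nonadditive) join, is extremely
important and cannot be sacrificed.
◼ Property (b), preservation of the functional dependencies, is
less stringent and may be sacrificed. (See Chapter 15).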
◼ Normalization:
◼ The process of decomposing unsatisfactory "bad"
relations by breaking up their attributes into
smaller relations
◼ Normal form:
◼ Condition using keys and FDs of a relation to
certify whether a relation schema is in a particular
normal form
◼ First normal form (1NF) disallows:
◼ composite attributes
◼ multivalued attributes
◼ nested relations; attributes whose values for an
individual tuple are non-atomic
◼ Considered to be part of the definition of a
relation
◼ Most RDBMSs allow only those relations to be
defined that are in First Normal Form
Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a
nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation
showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into
relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
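The decomposition in part (c) can be sketched in Python as follows. The sample values, the attribute names inside PROJS, and the helper `to_1nf` are assumptions made for illustration; only the idea of propagating the primary key into the new relation comes from the caption.

```python
# Illustrative nested relation: PROJS holds a relation inside each tuple.
nested_emp_proj = [
    {
        "Ssn": "123456789",
        "Ename": "Smith",
        "PROJS": [
            {"Pnumber": 1, "Hours": 32.5},
            {"Pnumber": 2, "Hours": 7.5},
        ],
    },
    {
        "Ssn": "666884444",
        "Ename": "Narayan",
        "PROJS": [{"Pnumber": 3, "Hours": 40.0}],
    },
]


def to_1nf(rows, key, nested_attr):
    """Split a nested relation into two flat ones.

    The parent relation keeps the atomic attributes; the child relation gets
    one tuple per nested row, with the parent's primary key copied in.
    """
    parent = []
    child = []
    for row in rows:
        parent.append({k: v for k, v in row.items() if k != nested_attr})
        for inner in row[nested_attr]:
            child.append({key: row[key], **inner})
    return parent, child


emp_proj1, emp_proj2 = to_1nf(nested_emp_proj, "Ssn", "PROJS")
for row in emp_proj2:
    print(row)
```

The key of the child relation here is the combination of the propagated key and the partial key of the nested relation, that is, (Ssn, Pnumber).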
◼ Definition:
◼ Superkey of relation schema R - a set of attributes
S of R that contains a key of R
◼ A relation schema R is in third normal form (3NF)
if whenever a FD X → A holds in R, then either:
◼ (a) X is a superkey of R, or
◼ (b) A is a prime attribute of R
◼ The LOTS1 relation violates 3NF because Area → Price
holds, Area is not a superkey of LOTS1, and Price is
not a prime attribute. (See Figure 14.12.)
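The 3NF test above can be sketched with attribute closures. This is a minimal sketch, not a definitive implementation: the functions `closure`, `candidate_keys`, and `violations_3nf` are hypothetical helpers, and the LOTS1 schema and FDs other than Area → Price are assumptions modeled on the usual form of this example. The check inspects only the listed FDs and finds candidate keys by brute force, which is practical only for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets whose closure is the schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys


def violations_3nf(schema, fds):
    """Return (X, A) pairs where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == schema
        for attr in sorted(rhs - lhs):  # skip trivial dependencies
            if not is_superkey and attr not in prime:
                bad.append((lhs, attr))
    return bad


# Assumed LOTS1 schema and FDs; Lot_no stands in for the lot number attribute.
lots1 = {"Property_id", "County_name", "Lot_no", "Area", "Price"}
lots1_fds = [
    ({"Property_id"}, {"County_name", "Lot_no", "Area", "Price"}),
    ({"County_name", "Lot_no"}, {"Property_id", "Area", "Price"}),
    ({"Area"}, {"Price"}),
]

print(violations_3nf(lots1, lots1_fds))
```

Under these assumptions the only reported violation is Area → Price. Moving Area and Price into their own relation, and keeping the remaining attributes together, leaves no violating FD in either part.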
Figure 14.13 Boyce-Codd normal form. (a) BCNF normalization of LOTS1A with the functional dependency FD2 being lost in the decomposition. (b) A schematic relation with FDs; it is in 3NF, but not in BCNF due to the FD C → B.
Figure 14.14 A relation TEACH that is in 3NF but not BCNF.
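BCNF keeps only condition (a) of the 3NF definition: every nontrivial FD X → A must have a superkey X. The sketch below applies that test to TEACH. The attribute names and the two FDs are assumptions (the commonly cited ones for this example, not stated on the slide), and `closure` and `bcnf_violations` are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial listed FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]


# Assumed schema and FDs for TEACH.
teach = {"Student", "Course", "Instructor"}
teach_fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

print(bcnf_violations(teach, teach_fds))
```

Under these assumptions, Instructor → Course is reported because Instructor alone is not a superkey. The relation still satisfies 3NF because Course is part of the candidate key {Student, Course} and is therefore prime.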
Figure 14.15 Fourth and fifth normal forms. (a) The EMP relation with two MVDs: Ename ↠ Pname and Ename ↠ Dname. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS. (c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, R3.
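Parts (a) and (b) can be sketched in Python. For a relation over (Ename, Pname, Dname), the two MVDs hold exactly when the relation equals the natural join of its projections on (Ename, Pname) and (Ename, Dname). The tuples and the helper `rejoin` below are illustrative assumptions, not data from the figure.

```python
def rejoin(rows):
    """Project EMP onto (Ename, Pname) and (Ename, Dname), then join on Ename."""
    emp_projects = {(e, p) for e, p, _d in rows}
    emp_dependents = {(e, d) for e, _p, d in rows}
    return {
        (e, p, d)
        for e, p in emp_projects
        for e2, d in emp_dependents
        if e == e2
    }


# Illustrative EMP tuples: every project is paired with every dependent.
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

# The MVDs hold, so the 4NF decomposition loses nothing.
print(rejoin(emp) == emp)

# Dropping one combination breaks the MVDs: the join brings the tuple back.
broken = emp - {("Smith", "Y", "John")}
print(rejoin(broken) == broken)
```

The first comparison is true and the second is false, which shows why EMP must store all project and dependent combinations as long as the two independent facts share one relation.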