Relational Normalization: Contents Relational Database Design: Rationale

The document discusses relational database normalization. It defines several normal forms including first, second, third, and Boyce-Codd normal forms. The goals of normalization include reducing data redundancy, maintaining data integrity, and simplifying query processing and updates. An example shows how denormalizing a database by combining attributes from different entities into one relation can introduce anomalies like insertion, deletion, and modification anomalies.

Uploaded by

dear_skm
Copyright
© Attribution Non-Commercial (BY-NC)
Relational Normalization: Contents

• Motivation
• Functional Dependencies
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce-Codd Normal Form
• Decomposition Algorithms
• Inclusion Dependencies and Inclusion Normal Form
• Multivalued Dependencies and Fourth Normal Form
• Join Dependencies and Fifth Normal Form
• Critique of Relational Normalization

Relational Database Design: Rationale

• Logical level
– clarity of semantics of data
– convenience of query formulation
– information content
– applies both to base relations and views (virtual relations)
• Physical level
– storage optimization
– efficiency of access (query evaluation)
– simplicity of updates
– applies only to stored base relations

• The grouping of attributes in physical relation schemas has a significant effect on storage space

Relational Database Design

• Choose and apply criteria for grouping attributes in relation schemas
• Necessary to characterize some groupings as better than others
• A good design can be obtained by
– intuition, craftsmanship of designer
– design in a more expressive data model (such as E-R) and translate to relational
– through normalization theory
∗ relational normal forms
∗ important stage in the development of relational theory
∗ normal forms can be defined for other data models (e.g., E-R)
Example of Relational Schema

Employee
SSN FName MInit LName BDate Address Sex Salary SuperSSN DNo

Department
DNumber DName DMgr MgrStartDate

DeptLocation
DNumber DLocation

Project
PNumber PName PLocation DNumber

WorksOn
PNumber PName Hours

Dependent
SSN DependentName Sex BDate Relationship

Logical Level: Semantics of Relation Attributes

• Meaning (semantics) is associated with attributes = interpretation of values in relation tuples
• The clearer the semantics of a relation, the better the design of the schema
• A plausible set of rules:
– each tuple should represent one entity or one relationship instance
– attributes of different entities and relationships should not be mixed in the same relation
– only foreign keys should be used to refer to other entities (most probably with referential integrity)

• In terms of entities and relationships
– DeptLocations represents a multivalued attribute of Department
– WorksOn represents an M-N relationship between Employee and Project
– Dependent merges a relationship and a less important entity
• There are many other ways to organize the same data
• No claim that the database represents the “real world”

A not so Good Design and an Improved Normalization

EmpDept
EName SSN BDate Address DNumber DName DMgr

⇓

Emp
EName SSN BDate Address DNumber

Dept
DNumber DName DMgr

• Clear semantics but poor design: attributes from distinct real-world entities (Employee,
Department) are mixed in EmpDept

• Maybe acceptable for views, but causes problems when done with base relations

• Redundant information causes update anomalies

A Similar Example

EmpProj
SSN PNumber Hours EName PName PLocation DNumber

⇓

Proj
PNumber PName PLocation DNumber

Emp
SSN EName

WorksOn
SSN PNumber Hours

Insertion Anomalies

EmpDept
EName SSN BDate Address DNumber DName DMgr

• Insert a new employee into EmpDept
– compatibility of department data to be checked with other tuples of EmpDept
– insert nulls if no department data is input
• Insert a new department without employee into EmpDept
– tuple with nulls for employee data (but SSN is key!)
– remove nulls (complex insertion) when the first employee of the department is inserted

Deletion and Modification Anomalies

EmpDept
EName SSN BDate Address DNumber DName DMgr

• Delete from EmpDept the last employee of a department
→ lose information or complex update
• Change manager of a department
→ affects several tuples to avoid inconsistency
Spurious Tuples

• Bad design may result in erroneous results for joins

EmpProj
SSN PNumber Hours EName PName PLocation

⇓

EmpLocs
EName PLocation

EmpProj1
SSN PNumber Hours PName PLocation

• EmpProj ≠ join of its projections EmpLocs and EmpProj1
– common attribute (PLocation) is not a key or a foreign key
– joining yields more tuples than in EmpProj (“spurious tuples”)
• Lossless-join property: guarantees meaningful results for joins

Spurious Tuples in EmpProj1 ⋈ EmpLocs

EmpProj1 ⋈ EmpLocs (∗ marks a spurious tuple)
    SSN   P#  Hours  PName  PLocation  EName
    1234  1   32.5   ProdX  Bellaire   Smith
  ∗ 1234  1   32.5   ProdX  Bellaire   English
    1234  2   7.5    ProdY  Sugarland  Smith
  ∗ 1234  2   7.5    ProdY  Sugarland  English
    6668  3   40.0   ProdZ  Houston    Narayan
  ∗ 4534  1   20.0   ProdX  Bellaire   Smith
    4534  1   20.0   ProdX  Bellaire   English
  ∗ 4534  2   20.0   ProdY  Sugarland  Smith
    4534  2   20.0   ProdY  Sugarland  English
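The spurious tuples can be reproduced mechanically. The following is a minimal Python sketch, not part of the original slides: it projects the EmpProj instance onto the two schemas and joins the projections back. The helper names `project` and `natural_join` and the lowercase attribute names (`pnum` for P#) are assumptions made for this illustration.

```python
# EmpProj instance from the slides; P# is written as "pnum" here (illustrative naming).
ATTRS = ("ssn", "pnum", "hours", "ename", "pname", "plocation")
EMP_PROJ = {
    (1234, 1, 32.5, "Smith", "ProdX", "Bellaire"),
    (1234, 2, 7.5, "Smith", "ProdY", "Sugarland"),
    (6668, 3, 40.0, "Narayan", "ProdZ", "Houston"),
    (4534, 1, 20.0, "English", "ProdX", "Bellaire"),
    (4534, 2, 20.0, "English", "ProdY", "Sugarland"),
}


def project(rows, attrs, keep):
    """Projection: keep only the listed attributes (duplicates collapse)."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in positions) for row in rows}


def natural_join(rows1, attrs1, rows2, attrs2):
    """Natural join on the attributes the two schemas have in common."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    joined = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[attrs1.index(a)] == r2[attrs2.index(a)] for a in common):
                joined.add(r1 + tuple(r2[attrs2.index(a)] for a in extra))
    return joined, tuple(attrs1) + tuple(extra)


LOCS_ATTRS = ("ename", "plocation")
PROJ1_ATTRS = ("ssn", "pnum", "hours", "pname", "plocation")

emp_locs = project(EMP_PROJ, ATTRS, LOCS_ATTRS)
emp_proj1 = project(EMP_PROJ, ATTRS, PROJ1_ATTRS)

joined, joined_attrs = natural_join(emp_proj1, PROJ1_ATTRS, emp_locs, LOCS_ATTRS)
rejoined = project(joined, joined_attrs, ATTRS)  # same column order as EmpProj
spurious = rejoined - EMP_PROJ                   # tuples that were never in EmpProj
```

Every original tuple survives the round trip, but the join on PLocation alone also pairs each project row with every employee name seen at that location, which is where the extra rows come from.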

EmpProj
SSN   P#  Hours  EName    PName  PLocation
1234  1   32.5   Smith    ProdX  Bellaire
1234  2   7.5    Smith    ProdY  Sugarland
6668  3   40.0   Narayan  ProdZ  Houston
4534  1   20.0   English  ProdX  Bellaire
4534  2   20.0   English  ProdY  Sugarland

⇓

EmpProj1
SSN   P#  Hours  PName  PLocation
1234  1   32.5   ProdX  Bellaire
1234  2   7.5    ProdY  Sugarland
6668  3   40.0   ProdZ  Houston
4534  1   20.0   ProdX  Bellaire
4534  2   20.0   ProdY  Sugarland

EmpLocs
EName    PLocation
Smith    Bellaire
Smith    Sugarland
Narayan  Houston
English  Bellaire
English  Sugarland

Functional Dependencies (FDs)

• Definition 1:
FD X → Y holds in relation R(A1, ..., An), with X, Y ⊆ {A1, ..., An},
if for every pair of tuples t1, t2 such that t1[X] = t2[X] then t1[Y] = t2[Y]
• Definition 2:
X → Y if there cannot exist different tuples t1, t2 such that t1[X] = t2[X] and t1[Y] ≠ t2[Y]
• Functional dependencies are constraints, i.e., belong to the schema
→ cannot be deduced from extension
→ can only be verified or invalidated on extension (one counterexample is enough to invalidate)
• Definition 1:
– all tuples that agree on X also agree on Y
– values for X functionally determine values for Y
– definition applies with t1 = t2
• Special case: if X is a key or superkey then X → Y for all Y ⊆ {A1, ..., An}

FDs: Example

EmpDept
EName SSN BDate Address D# DName DMgrSSN

SSN → EName
SSN → {EName,BDate}
SSN → D#
D# → DMgrSSN

EmpProj
SSN P# Hours EName PName PLocation

{SSN,P#} → Hours
P# → {PName,PLocation}

FDs: Example, cont.

EmpDept
EName    SSN   BDate     Address  D#  DName     DMgrSSN
Smith    1234  21/07/39  ...      1   Research  1234
Narayan  6668  18/01/43  ...      1   Research  1234
English  4534  8/05/53   ...      2   Account   4534
Wong     9788  30/11/49  ...      3   Admin     9788
Zelaya   6677  23/08/60  ...      3   Admin     9788

DMgrSSN → D# ?

EmpProj
SSN   P#  Hours  EName    PName  PLocation
1234  1   32.5   Smith    ProdX  Bellaire
1234  2   7.5    Smith    ProdY  Sugarland
6668  3   40.0   Narayan  ProdZ  Houston
4534  1   20.0   English  ProdX  Bellaire
4534  2   20.0   English  ProdY  Sugarland

PLocation → PName does not hold
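Definition 1 translates directly into a check against one relation instance. The sketch below is an illustration added here, assuming rows are stored as Python dictionaries; the function name `fd_holds` is hypothetical, and the BDate and Address columns are left out because they play no role in the FDs being tested.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs (Definition 1)."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # The first row with this X-value fixes the Y-value every later row must match.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# EmpDept instance from the slides (BDate and Address omitted).
EMP_DEPT = [
    {"EName": "Smith", "SSN": 1234, "D#": 1, "DName": "Research", "DMgrSSN": 1234},
    {"EName": "Narayan", "SSN": 6668, "D#": 1, "DName": "Research", "DMgrSSN": 1234},
    {"EName": "English", "SSN": 4534, "D#": 2, "DName": "Account", "DMgrSSN": 4534},
    {"EName": "Wong", "SSN": 9788, "D#": 3, "DName": "Admin", "DMgrSSN": 9788},
    {"EName": "Zelaya", "SSN": 6677, "D#": 3, "DName": "Admin", "DMgrSSN": 9788},
]

holds_ssn_ename = fd_holds(EMP_DEPT, ["SSN"], ["EName"])
holds_mgr_dept = fd_holds(EMP_DEPT, ["DMgrSSN"], ["D#"])
holds_dept_ssn = fd_holds(EMP_DEPT, ["D#"], ["SSN"])
```

On this instance DMgrSSN → D# is not contradicted, but that only shows the extension happens to satisfy it. Whether it is a constraint of the schema is a design decision, which is why an instance can invalidate an FD (as with D# → SSN here) but never establish one.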

Inference of New FDs

• Only most obvious (or most important) dependencies are explicitly specified
• Many other dependencies can be deduced from them
• Closure F+ of a set F of dependencies:
– set of all dependencies in F plus those implied by F
– dependency X → Y is implied by F if
X → Y is valid in all relation instances for which F is valid
– X → Y ∈ F+ iff F ⊨ X → Y
Armstrong’s Inference Rules

A1 (Reflexivity) If Y ⊆ X, then X → Y
A2 (Augmentation) If X → Y, then XZ → YZ (and XZ → Y)
A3 (Transitivity) If X → Y and Y → Z, then X → Z

A1, A2, A3 form a sound and complete set of inference rules

Some useful additional inference rules
A4 (Decomposition) If X → YZ, then X → Y and X → Z
A5 (Union) If X → Y and X → Z, then X → YZ
A6 (Pseudotransitivity) If X → Y and WY → Z, then WX → Z

Inference rule: systematic way of constructing functional dependencies in F+

Example: Deriving FDs

EmpDept
EName SSN BDate Address D# DName DMgrSSN

Given:
SSN → EName
SSN → BDate
SSN → Address
SSN → D#
D# → DName
D# → DMgrSSN
...

⇒ Derived:
SSN → DName
SSN → DMgrSSN
SSN → {EName,DName}
...

Example: Deriving FDs, cont.

EmpProj
SSN P# Hours EName PName PLocation

P# → P# and P# → PLocation ⇒ P# → {P#,PLocation}

{SSN,P#} → Hours
{SSN,P#} → EName
{SSN,P#} → PName
{SSN,P#} → PLocation
{SSN,P#} → {Hours,EName}
...
(15 in all)

{SSN,P#,Hours} → ... (7 possibilities)
{SSN,P#,EName} → ... (7 possibilities)
{SSN,P#,Hours,EName} → ... (3 possibilities)
...
Proof of the Inference Rules

• Viewed as a formal deductive system
– a set of given FDs plays the role of axioms
– Armstrong’s rules: sound and complete inference rules to derive new FDs from the axioms without invoking the definition of what an FD is
• Structure of formal deductive system is clear and simple
• Armstrong’s rules can be proven with the definition of FD
• Derived rules (A4, A5, A6, and others) can be proven from Armstrong’s basic rules

Closure of Attribute Sets

• Closure X+ of a set of attributes X under a set F of FDs:
– set of attributes A such that X → A can be deduced from F by Armstrong’s rules
– makes it possible to tell at a glance whether FD X → Y follows from F (it does iff Y ⊆ X+)
• Algorithm for determining the closure of X under F

    X+ := X;
    repeat
        oldX+ := X+;
        for each FD Y → Z in F do
            if X+ ⊇ Y then X+ := X+ ∪ Z;
    until (X+ = oldX+)

• Algorithm starts setting X+ to all attributes in X: by A1 all these attributes are functionally dependent on X
• Using A3 and A4 we add attributes to X+, using each FD in F
• We keep going through all dependencies in F (repeat loop) until no more attributes are added to X+ during a complete cycle (for loop) through the dependencies in F

Closure of Attribute Sets: Example

EmpProj
SSN P# Hours EName PName PLocation

{SSN}+ = {SSN,EName}
{P#}+ = {P#,PName,PLocation}
{SSN,P#}+ = {SSN,P#,EName,PName,PLocation,Hours}
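The repeat/for loop for computing X+ carries over almost line by line to Python. The following is a sketch under assumptions of this illustration: an FD is represented as a pair of Python sets `(lhs, rhs)`, and the helper `implies` is a hypothetical name for the membership test "does X → Y follow from F".

```python
def closure(attrs, fds):
    """Closure X+ of the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)              # X+ := X (reflexivity)
    changed = True
    while changed:                   # repeat ... until X+ stops growing
        changed = False
        for lhs, rhs in fds:         # for each FD Y -> Z in F
            if lhs <= result and not rhs <= result:
                result |= rhs        # X+ := X+ ∪ Z
                changed = True
    return result


def implies(fds, lhs, rhs):
    """X -> Y follows from F exactly when Y is contained in X+."""
    return set(rhs) <= closure(lhs, fds)


# FDs of EmpProj as given in the FD example.
EMP_PROJ_FDS = [
    ({"SSN", "P#"}, {"Hours"}),
    ({"SSN"}, {"EName"}),
    ({"P#"}, {"PName", "PLocation"}),
]

ssn_closure = closure({"SSN"}, EMP_PROJ_FDS)
pnum_closure = closure({"P#"}, EMP_PROJ_FDS)
key_closure = closure({"SSN", "P#"}, EMP_PROJ_FDS)
```

The three closures match the ones listed in the example. Since {SSN,P#}+ covers every attribute of EmpProj, {SSN,P#} is a superkey.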
Equivalence of Sets of FDs

• Two sets of FDs F and G are equivalent if
– every FD in F can be inferred from G, and
– every FD in G can be inferred from F
• Hence F and G are equivalent if F+ = G+
• F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)
• F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs

Minimal Sets of FDs

• A set F of FDs is minimal if
(1) every FD in F has a single attribute for its RHS
(2) no subset of F is equivalent to F
(3) no dependency X → A in F can be replaced by Y → A with Y ⊂ X and still yield a set of dependencies equivalent to F
• Minimal cover of a set F of FDs: minimal set of dependencies Fmin equivalent to F
• Every set of FDs has a minimal cover
• There can be several minimal covers for a set of FDs

Finding Minimal Covers

Algorithm for finding a minimal cover G of F
(1) Set G := F
(2) Replace each FD X → {A1, ..., An} in G by n FDs X → A1, ..., X → An
(3) For each FD X → A in G
        for each attribute B that is an element of X
            if (G − {X → A}) ∪ {(X − {B}) → A} is equivalent to G
            then replace X → A with (X − {B}) → A in G
(4) For each remaining FD X → A in G
        if (G − {X → A}) is equivalent to G
        then remove X → A from G

Several Minimal Covers: Example

Original set of FDs
AB → C     D → EG
C → A      BE → C
BC → D     CG → BD
ACD → B    CE → AG

After splitting right-hand sides
AB → C     D → E      CG → B
C → A      D → G      CG → D
BC → D     BE → C     CE → A
ACD → B    CE → G

• Two minimal covers

First minimal cover
AB → C     D → G
C → A      BE → C
BC → D     CG → D
CD → B     CE → G
D → E

Second minimal cover
AB → C     D → G
C → A      BE → C
BC → D     CG → B
D → E      CE → G
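The four steps of the minimal-cover algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' own code: FDs are pairs of frozensets, the helper `fd` is a hypothetical shorthand that treats each character of a string as one attribute, and the equivalence tests of steps (3) and (4) are done with the usual closure shortcut (A ∈ (X − {B})+ under G, and A ∈ X+ under G without X → A).

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Shorthand for single-letter attributes: fd("AB", "C") stands for AB -> C."""
    return frozenset(lhs), frozenset(rhs)


def minimal_cover(fds):
    # Steps (1)-(2): copy F and split every right-hand side into single attributes.
    g = list(dict.fromkeys(
        (lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)))
    # Step (3): drop an attribute B from X when A is still derivable from X - {B}.
    for i in range(len(g)):
        for b in sorted(g[i][0]):
            lhs, rhs = g[i]
            if rhs <= closure(lhs - {b}, g):
                g[i] = (lhs - {b}, rhs)
    g = list(dict.fromkeys(g))  # step (3) can create duplicates
    # Step (4): drop X -> A when the remaining FDs already imply it.
    for dep in list(g):
        rest = [d for d in g if d != dep]
        if dep[1] <= closure(dep[0], rest):
            g = rest
    return g


# The original set of FDs from the "Several Minimal Covers" example.
F = [fd("AB", "C"), fd("C", "A"), fd("BC", "D"), fd("ACD", "B"),
     fd("D", "EG"), fd("BE", "C"), fd("CG", "BD"), fd("CE", "AG")]

cover = minimal_cover(F)
```

Which minimal cover comes out depends on the order in which FDs and attributes are visited. With the order used in this sketch, the result is the second cover of the example (CG → B is kept and CG → D is removed).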
First minimal cover
• CE → A implied by C → A
• CG → B implied by CG → D, C → A, ACD → B
• ACD → B replaced by CD → B since C → A

Second minimal cover
• CE → A implied by C → A
• CG → D implied by CG → B, BC → D
• ACD → B implied by C → A, D → G, CG → B

Normalization

• Process of decomposing unsatisfactory relation schemas into smaller relations without those undesirable aspects (e.g., update anomalies)

EmpDept
EName SSN Bdate Address DNo DName DMgr

⇓ Normalization

Dept
DNo DName DMgr

Emp
EName SSN Bdate Address DNo

• Normal form = condition (integrity constraint) to certify that a relation schema is in a particular form

Normal Forms

• Several types of normal forms based on different integrity constraints
– Functional Dependencies
  1st, 2nd, 3rd, Boyce-Codd, Improved 3rd Normal Forms
– Functional and Inclusion Dependencies
  Inclusion Normal Form
– Functional and Multivalued Dependencies
  4th Normal Form
– Functional, Multivalued, and Join Dependencies
  5th Normal Form

How Far to Normalize

• Relations are not always normalized to the highest possible form, e.g., for performance reasons
• Higher normal forms
⇒ smaller relations
⇒ more joins in queries
⇒ space/time or query/update tradeoff
First Normal Form (1NF)

• Relations as defined in the relational model
• Forbids composite attributes, multivalued attributes, nested relations (relations within relations)
• Relational-model limitation to 1NF:
– historical reasons (simplify file management)
– in retrospect, a mistake

Multivalued attributes

Department
DName           DNumber  DMgr       {DLocations}
Research        5        333445555  {Bellaire,Sugarland,Houston}
Administration  4        987654321  {Stafford}
Headquarters    1        888665555  {Houston}

⇓ 1NF Normalization

Department
DName           DNumber  DLocations  DMgr
Research        5        Bellaire    333445555
Research        5        Sugarland   333445555
Research        5        Houston     333445555
Administration  4        Stafford    987654321
Headquarters    1        Houston     888665555

• Redundancy: for each value of DLocation, other attributes have to be repeated
• Normalized relation is less adequate with respect to real-world intuition

Nested Relations

EmpProj
                                  Projs
SSN        EName                  PNumber  Hours
123456789  Smith, John B.         1        32.5
                                  2        7.5
666884444  Narayan, Ramesh K.     3        40.0
999888777  Zelaya, Alicia J.      1        20.0
                                  2        20.0
453453453  English, Joyce A.      30       30.0
                                  10       10.0

⇓ 1NF Normalization (unnest operation)

EmpProj1
SSN EName

EmpProj2
SSN PNumber Hours

• Nested relations: value of an attribute of a relation can be a relation
• SSN: primary key of EmpProj
• PNumber: primary key of each nested Projs relation
• Unnest operation transforms the relation into 1NF
• The primary key has to be propagated into the embedded relation
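The unnest operation can be illustrated with a few lines of Python. This is a sketch under assumptions made here: the nested relation is modelled as a list of dictionaries whose `Projs` entry holds the embedded relation, the function name `unnest` is hypothetical, and only the first two employees of the example are used.

```python
# Two rows of the nested EmpProj relation; "Projs" holds the embedded relation.
NESTED_EMP_PROJ = [
    {"SSN": "123456789", "EName": "Smith, John B.",
     "Projs": [(1, 32.5), (2, 7.5)]},
    {"SSN": "666884444", "EName": "Narayan, Ramesh K.",
     "Projs": [(3, 40.0)]},
]


def unnest(rows):
    """Split a nested relation into two flat (1NF) relations."""
    # EmpProj1(SSN, EName): the atomic attributes of the outer relation.
    emp_proj1 = [(r["SSN"], r["EName"]) for r in rows]
    # EmpProj2(SSN, PNumber, Hours): the outer key SSN is propagated
    # into every tuple of the embedded Projs relation.
    emp_proj2 = [(r["SSN"], pnumber, hours)
                 for r in rows
                 for pnumber, hours in r["Projs"]]
    return emp_proj1, emp_proj2


emp_proj1, emp_proj2 = unnest(NESTED_EMP_PROJ)
```

In EmpProj2 the key becomes the pair (SSN, PNumber): PNumber alone identified a tuple only inside one nested Projs relation.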
Second Normal Form (2NF)

Emp_Proj
SSN PNumber Hours EName PName PLocation
fd1: {SSN,PNumber} → Hours
fd2: SSN → EName
fd3: PNumber → {PName,PLocation}

• Prime attribute: attribute which is a member of a key
• Full functional dependency: an FD X → Z where removal of any attribute from X invalidates the dependency
– fd1 is a full FD, neither SSN → Hours nor PNumber → Hours hold
– SSN,PNumber → EName is not a full FD (i.e., is a partial dependency) since SSN → EName (fd2) also holds
• Definition: a relation schema R is in 2NF if every nonprime attribute is fully functionally dependent on every key
• A relation where all keys are single attributes is automatically in 2NF

Second Normal Form: Example

Emp_Proj
SSN PNumber Hours EName PName PLocation
fd1, fd2, fd3

⇓ 2NF Normalization

Emp_Proj1
SSN PNumber Hours
fd1

Emp_Proj2
SSN EName
fd2

Emp_Proj3
PNumber PName PLocation
fd3

Third Normal Form (3NF)

Emp_Dept
EName SSN BDate Address DNumber DName DMgr

• X → Z is a transitive functional dependency if ∃ Y such that X → Y and Y → Z (and Z is not a subset of a key)
– SSN → DMgr is a transitive FD
  (SSN → DNumber and DNumber → DMgr hold)
– SSN → EName is a non-transitive FD
  (there is no set of attributes X where SSN → X and X → EName)

Third Normal Form: Definitions

3 definitions for a relation schema R to be in 3NF:

(1) R is in 2NF and no nonprime attribute is transitively dependent on a key
(2) for all X → A ∈ F+
    • either X is a superkey,
    • or A is a prime attribute
(3) every nonprime attribute is
    • fully functionally dependent on every key
    • nontransitively dependent on every key
Third Normal Form: Example

Emp_Dept
EName SSN BDate Address DNumber DName DMgr

⇓ 3NF Normalization

Emp_Dept1
EName SSN BDate Address DNumber

Emp_Dept2
DNumber DName DMgr

Normalization: Example

Lots
PropertyId# CountyName Lot# Area Price TaxRate
fd1: PropertyId# → {CountyName,Lot#,Area,Price,TaxRate}
fd2: {CountyName,Lot#} → {PropertyId#,Area,Price,TaxRate}
fd3: CountyName → TaxRate
fd4: Area → Price

Lots1
PropertyId# CountyName Lot# Area Price
fd1: PropertyId# → {CountyName,Lot#,Area,Price}
fd2: {CountyName,Lot#} → {PropertyId#,Area,Price}
fd4: Area → Price

Lots2
CountyName TaxRate
fd3: CountyName → TaxRate

Normalization: Example, cont.

Lots1A
PropertyId# CountyName Lot# Area
fd1: PropertyId# → {CountyName,Lot#,Area}
fd2: {CountyName,Lot#} → {PropertyId#,Area}

Lots1B
Area Price
fd4: Area → Price

Lots                            1NF
  ⇓
Lots1, Lots2                    2NF
  ⇓
Lots1A, Lots1B, Lots2           3NF and BCNF

Boyce-Codd Normal Form (BCNF)

• A relation schema R is in BCNF if, for all X → A ∈ F+, X is a superkey of R (and A ∉ X)
• BCNF is an improved 3NF: all dependencies result from keys
• A relation with 2 attributes is automatically in BCNF
• Most 3NF relations are also in BCNF
• Intuition of 3NF/BCNF: FDs concern the key, the whole key, and nothing but the key
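The BCNF condition can be tested with attribute closures: a nontrivial FD X → A violates BCNF when X+ does not cover the whole schema. The sketch below is an added illustration, assuming FDs are pairs of Python sets and that the FDs of Lots1 and Lots1A are fd1, fd2 and fd4 as read from the arrows of the Lots diagram; `bcnf_violations` is a hypothetical name. For a schema together with its own set F, inspecting the FDs of F is enough, so the sketch does not enumerate F+.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial FDs of fds whose left-hand side is not a superkey of attrs."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                       # skip trivial FDs
            and not closure(lhs, fds) >= attrs]     # lhs is not a superkey


LOTS1 = {"PropertyId#", "CountyName", "Lot#", "Area", "Price"}
LOTS1_FDS = [
    ({"PropertyId#"}, {"CountyName", "Lot#", "Area", "Price"}),   # fd1
    ({"CountyName", "Lot#"}, {"PropertyId#", "Area", "Price"}),   # fd2
    ({"Area"}, {"Price"}),                                        # fd4
]

LOTS1A = {"PropertyId#", "CountyName", "Lot#", "Area"}
LOTS1A_FDS = [
    ({"PropertyId#"}, {"CountyName", "Lot#", "Area"}),            # fd1
    ({"CountyName", "Lot#"}, {"PropertyId#", "Area"}),            # fd2
]

lots1_violations = bcnf_violations(LOTS1, LOTS1_FDS)
lots1a_violations = bcnf_violations(LOTS1A, LOTS1A_FDS)
```

Only fd4 is reported for Lots1, since Area is not a superkey. Once Area → Price is moved to Lots1B, the remaining FDs of Lots1A all have a key on the left.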
A relation in 3NF but not in BCNF

PatVisit
Patient  Hospital       Doctor
Smith    Alachua        Atkinson
Lee      Shands         Smith
Marks    Alachua        Atkinson
Marks    Shands         Shaw
Rao      North Florida  Nefzger

FDs: {Patient,Hospital} → Doctor, Doctor → Hospital

⇓ BCNF Normalization

PatDoctor
Patient  Doctor
Smith    Atkinson
Lee      Smith
Marks    Atkinson
Marks    Shaw
Rao      Nefzger

DoctHosp
Doctor    Hospital
Atkinson  Alachua
Smith     Shands
Shaw      Shands
Nefzger   North Florida

• PatVisit has 2 keys: (Patient,Hospital) and (Patient,Doctor)
• To reach BCNF without losing information an inter-relation constraint is needed, namely (Patient,Hospital) → Doctor

Relational Decomposition

• Normalization decomposes relation schemas with undesirable aspects into smaller relations
• Consider a relation schema R(A1, ..., An) and a set of dependencies F
• Goal: produce a decomposition D of R into m relation schemas D = {R1, ..., Rm} where each Ri contains a subset of {A1, ..., An}, and
– every attribute Ai in R appears in at least one Ri
– each relation Ri is at least in BCNF or in 3NF
• Extreme decomposition approach: start with a universal relation schema containing all the DB attributes and a set of FDs
• ⇒ Universal relation assumption is needed: every attribute is unique, i.e., attributes with the same name in different relations have the same meaning

Relational Decomposition, cont.

• Requiring each individual relation to be in a given normal form does not alone guarantee a good design
• BCNF (or 3NF) measure “goodness” for individual relations based on their keys and functional dependencies
• A set of relations must possess additional properties to ensure a good design
– dependency preservation
– lossless (nonadditive) join
• In the traditional approach to relational database design, dependency preservation is required because of the weak support for integrity constraints by DBMSs
Dependency Preservation

• Consider a relation schema R(A1, ..., An), a set of dependencies F, and a decomposition D of R into m relation schemas D = {R1, ..., Rm}
• Dependency preservation: each FD X → Y of F should appear explicitly or be inferrable in one relation schema Ri
• Otherwise, to preserve information, inter-relation FDs are needed (i.e., dependencies that hold on a join of several relations of the decomposition)

Dependency Preservation: Example

Lots1A
PropertyId# CountyName Lot# Area
fd1: PropertyId# → {CountyName,Lot#,Area}
fd2: {CountyName,Lot#} → {PropertyId#,Area}
fd3: Area → CountyName

⇓ BCNF Normalization with loss of a key

Lots1AX
PropertyId# Area Lot#
fd1: PropertyId# → {Area,Lot#}

Lots1AY
Area CountyName
fd3: Area → CountyName

• An inter-relation constraint must express that (Lot#,CountyName) is the key of the join of Lots1AX and Lots1AY

Dependency Preservation: Formalization

• A decomposition D must preserve the dependencies: the collection of all dependencies that hold on individual relations Ri must be equivalent to F
• Formally
– Projection ΠF(Ri) of F on Ri: set of FDs X → Y in F+ such that (X ∪ Y) ⊆ Ri (their left- and right-hand side attributes are in Ri)
– A decomposition D = {R1, ..., Rm} is dependency-preserving if
  (ΠF(R1) ∪ ... ∪ ΠF(Rm))+ = F+
• Dependency preservation enables checking that FDs in F hold by checking them on each relation Ri individually

Dependency Preserving Decomposition Algorithm

Find a dependency preserving decomposition D = {R1, ..., Rm} of a relation R w.r.t. a set of FDs F such that each Ri is in 3NF

(1) Find a minimal cover G of F
(2) For each left-hand side X of an FD X → A in G
    create relation Ri in D with attributes {X ∪ A1 ∪ ... ∪ Ak} where
    X → A1, ..., X → Ak are the only FDs in G with X as left-hand side
    (X is the key of Ri)
(3) If attributes of R are not placed in any Rj, create another relation (all-key, without FD) in D for those attributes

• There is no similar algorithm to reach BCNF
• There are several minimal covers in general ⇒ nondeterminism
• Not guaranteed to be lossless (nonadditive)
Lossless (Nonadditive) Join Property

• Ensures that no spurious tuples appear when relations in the decomposition are joined
• Decomposition D = {R1, ..., Rm} of R has the lossless-join property w.r.t. a set F of FDs if, for every relation instance r(R) whose tuples satisfy all the FDs in F:
  ΠR1(r) ⋈ ... ⋈ ΠRm(r) = r
  (This is the general form of join dependency, see later)
• Ensures that whenever a relation instance r(R) satisfies F, no spurious tuples are generated by joining the decomposed relations r(Ri)
• Necessary to generate meaningful results for queries involving joins
• There exists an algorithm for testing whether a decomposition D satisfies the lossless-join property with respect to a set F of FDs

Properties of the Nonadditive Join (1)

Decomposition D = {R1, R2} of R has the nonadditive-join property w.r.t. F iff either (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) is in F+

EmpProj
SSN PNumber Hours EName PName PLocation

⇓

Emp
SSN PNumber Hours EName

Proj
PNumber PName PLocation

Properties of the Nonadditive Join (2)

If D = {R1, ..., Rm} of R has the nonadditive-join property w.r.t. F, and Di = {Q1, ..., Qk} of Ri has the nonadditive-join property w.r.t. ΠRi(F), then
  D = {R1, ..., Ri−1, Q1, ..., Qk, Ri+1, ..., Rm}
has the nonadditive-join property w.r.t. F

Emp
SSN PNumber Hours EName

⇓

Emp1
SSN PNumber Hours

Emp2
SSN EName

Lossless Join Decomposition Algorithm

Decompose R into a lossless join decomposition D = {R1, ..., Rm} w.r.t. F such that each Ri is in BCNF

(1) Set D := {R}
(2) While there is a relation schema Q in D that is not in BCNF
    begin
        find an FD X → Y in Q that violates BCNF;
        replace Q in D by two relations (Q − Y) and (X ∪ Y)
    end

• Does not necessarily preserve the FDs
• The algorithm is very simple: normalizing up to BCNF = break relations according to FDs
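The BCNF decomposition loop can be sketched in Python as follows. This is an illustration under assumptions made here, not the slides' implementation: FDs are pairs of sets, a violating FD inside a sub-schema Q is found by brute force over subsets X of Q (taking Y = (X+ ∩ Q) − X), which is exponential and only suitable for small examples, and the names `find_violation` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, Y) with X -> Y holding in schema and X not a superkey of it, else None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            y = (closure(x, fds) & schema) - x   # what X determines inside the schema
            if y and x | y != schema:            # nontrivial, and X is not a superkey
                return x, y
    return None


def bcnf_decompose(schema, fds):
    todo, done = [frozenset(schema)], []         # step (1): D := {R}
    while todo:                                  # step (2)
        q = todo.pop()
        violation = find_violation(q, fds)
        if violation is None:
            done.append(q)                       # Q is already in BCNF
        else:
            x, y = violation
            todo += [q - y, x | y]               # replace Q by (Q - Y) and (X ∪ Y)
    return done


EMP_PROJ = {"SSN", "P#", "Hours", "EName", "PName", "PLocation"}
EMP_PROJ_FDS = [
    ({"SSN", "P#"}, {"Hours"}),
    ({"SSN"}, {"EName"}),
    ({"P#"}, {"PName", "PLocation"}),
]

result = bcnf_decompose(EMP_PROJ, EMP_PROJ_FDS)
```

Each split satisfies the binary nonadditive-join test, because the shared attributes X functionally determine Y = (X ∪ Y) − (Q − Y). For EmpProj the sketch ends with the same three schemas as the 2NF example.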
Combined Decomposition Algorithm

Produce a lossless join and dependency-preserving decomposition into 3NF

(1) Find a minimal cover G of F
(2) For each X in an FD X → A in G
    create a relation in D with attributes {X ∪ A1 ∪ ... ∪ Ak} where
    X → A1, ..., X → Ak are the only FDs in G with X as left-hand side
    (X is the key of this relation)
(3) If none of the relations in D contains a key of R, create one more relation schema in D that contains attributes that form a key of R

• Step 3 of previous algorithm is not needed because the key will include any unplaced attributes (i.e., attributes not participating in any FD)

Decomposition Example

Stock
Model# Serial# Price Color Name Year

Dependencies
{M,S} → {P,C}    {M} → {N}
{S} → {Y}        {N,Y} → {P}

Minimal Cover
{M,S} → {C}      {M} → {N}
{S} → {Y}        {N,Y} → {P}

Stock1
Model# Serial# Color

Stock2
Model# Name

Stock3
Serial# Year

Stock4
Name Year Price
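Steps (2) and (3) of the combined algorithm can be sketched in Python. The sketch assumes that a minimal cover G is already available (here the one given for Stock) and represents FDs as pairs of sets. The names `find_key` and `synthesize_3nf` are hypothetical; `find_key` obtains one key by starting from all attributes and dropping every attribute that is not needed to determine the rest.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(attrs, fds):
    """One key of the schema: shrink the full attribute set one attribute at a time."""
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, fds) >= attrs:
            key.discard(a)
    return frozenset(key)


def synthesize_3nf(attrs, cover):
    # Step (2): one relation per distinct left-hand side X, holding X and all
    # attributes A1..Ak that X determines in the cover.
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(frozenset(lhs), set()).update(rhs)
    relations = [lhs | rhs for lhs, rhs in groups.items()]
    # Step (3): a relation contains a key of R exactly when it is a superkey of R.
    if not any(closure(rel, cover) >= attrs for rel in relations):
        relations.append(find_key(attrs, cover))
    return relations


# Stock example, attributes abbreviated as on the slide:
# M = Model#, S = Serial#, P = Price, C = Color, N = Name, Y = Year.
STOCK = {"M", "S", "P", "C", "N", "Y"}
STOCK_COVER = [
    ({"M", "S"}, {"C"}),
    ({"M"}, {"N"}),
    ({"S"}, {"Y"}),
    ({"N", "Y"}, {"P"}),
]

stock_relations = synthesize_3nf(STOCK, STOCK_COVER)

# A schema where step (3) is needed: C takes part in no FD.
small_relations = synthesize_3nf({"A", "B", "C"}, [({"A"}, {"B"})])
```

For Stock, the relation built from {M,S} → {C} already contains the key {M,S}, so no extra relation is added and the result matches Stock1 to Stock4. In the second call the key {A,C} has to be added as its own relation.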

Algorithm for Finding a Key

Finding a key K for a relation schema R based on a set F of FDs

(1) Set K := R
(2) For each attribute A in K
    compute (K − A)+ with respect to F
    if (K − A)+ contains all attributes in R then set K := K − {A};

• Determines only one key out of the possible candidate keys for R
• Key returned depends on the order in which attributes are removed

Multivalued Dependencies

• Multivalued dependencies are a consequence of requiring 1NF

Course   Teacher                Text
Physics  {Green, Brown, Black}  {Mechanics, Thermodynamics}
Math     {White}                {Algebra, Geometry}

• Semantics: every teacher who teaches a course uses all the texts for that course (independence of Teacher and Text)
• For two or more multivalued independent attributes, every value of one of the attributes must be repeated with every value of the other attribute to keep the relation consistent
Multivalued Dependencies (cont.)

Course   Teacher  Text
Physics  Green    Mechanics
Physics  Green    Thermodynamics
Physics  Brown    Mechanics
Physics  Brown    Thermodynamics
Physics  Black    Mechanics
Physics  Black    Thermodynamics
Math     White    Algebra
Math     White    Geometry

• This relation is in BCNF since it is all-key
• A set of attributes X multidetermines a set of attributes Y if the value of X determines a set of values for Y (independently of any other attributes)
• A multivalued dependency (MVD) is written as X →→ Y
• In the example, Course →→ Teacher and Course →→ Text

Intuition of MVDs

• X →→ Y is sometimes paraphrased as “X multidetermines Y” or “a given X-value determines a set of Y-values”
• But this is not precise enough: it is dangerously close to the paraphrase of a many-to-many relationship or a multivalued attribute
• Z = R − (X ∪ Y) is also involved
• If X →→ Y holds, then X →→ Z also holds
• A better, more intuitive notation is X →→ Y | Z
• X →→ Y | Z implies that a value of X determines a set of values of Y independently from the values of Z
• In the example, Course →→ Teacher | Text: each course is associated with a set of teachers and with a set of texts, and these sets are independent of each other

Multivalued Dependencies: Formal Definition

• Consider a relation schema R, X and Y are subsets of attributes in R, Z = R − (X ∪ Y)
• X →→ Y holds in R if, whenever tuples t1 and t2 exist in an instance r(R) with t1[X] = t2[X], then tuples t3 and t4 also exist in r(R) such that
− t1[X] = t2[X] = t3[X] = t4[X]
− t3[Y] = t1[Y] and t4[Y] = t2[Y]
− t3[Z] = t2[Z] and t4[Z] = t1[Z]

      X   Y   Z
t1    a   b1  c1
t2    a   b2  c2
t3    a   b1  c2
t4    a   b2  c1

• An MVD X →→ Y is trivial if either Y ⊆ X or (X ∪ Y) = R
– always holds according to the MVD definition
• FDs are special cases of MVDs: if X → Y holds, then X →→ Y also holds
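The formal definition can be checked on an instance by brute force: for every pair t1, t2 that agrees on X, the tuple t3 that takes its X- and Y-values from t1 and its Z-values from t2 must be present (t4 is covered when the same pair is visited in the opposite order). The sketch below is an added illustration; the function name `mvd_holds` and the tuple-based representation are assumptions.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one instance; rows are tuples laid out as attrs."""
    position = {a: i for i, a in enumerate(attrs)}
    rowset = set(rows)

    def pick(row, names):
        return tuple(row[position[a]] for a in names)

    for t1 in rowset:
        for t2 in rowset:
            if pick(t1, x) != pick(t2, x):
                continue
            # t3 agrees with t1 on X and Y, and with t2 on Z = R - (X ∪ Y).
            t3 = tuple(t1[position[a]] if (a in x or a in y) else t2[position[a]]
                       for a in attrs)
            if t3 not in rowset:
                return False
    return True


ATTRS = ("Course", "Teacher", "Text")
CTX = {
    ("Physics", "Green", "Mechanics"),
    ("Physics", "Green", "Thermodynamics"),
    ("Physics", "Brown", "Mechanics"),
    ("Physics", "Brown", "Thermodynamics"),
    ("Physics", "Black", "Mechanics"),
    ("Physics", "Black", "Thermodynamics"),
    ("Math", "White", "Algebra"),
    ("Math", "White", "Geometry"),
}

full_ok = mvd_holds(CTX, ATTRS, ["Course"], ["Teacher"])

# Dropping one tuple breaks the independence of teachers and texts.
incomplete = CTX - {("Physics", "Black", "Thermodynamics")}
incomplete_ok = mvd_holds(incomplete, ATTRS, ["Course"], ["Teacher"])
```

As with FDs, a passing check only shows that this particular instance does not contradict the MVD.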
Inference Rules for FDs and MVDs

A sound and complete set of inference rules for FDs and MVDs, used to compute the closure F+ of a set F of functional and multivalued dependencies

I1 (Reflexivity for FDs) If Y ⊆ X, then X → Y
I2 (Augmentation for FDs) If X → Y, then XZ → YZ
I3 (Transitivity for FDs) If X → Y and Y → Z, then X → Z
I4 (Complementation for MVDs) If X →→ Y, then X →→ Z where Z = R − (X ∪ Y)
I5 (Augmentation for MVDs) If X →→ Y and Z ⊆ W, then WX →→ YZ
I6 (Transitivity for MVDs) If X →→ Y and Y →→ Z, then X →→ (Z − Y)
I7 (Replication) If X → Y, then X →→ Y
I8 (Coalescence for MVDs) If X →→ Y and W → Z, for Z ⊆ Y, W ∩ Y = ∅ and W ∩ Z = ∅, then X → Z

Motivation for 4NF

• A relational schema with non-trivial MVDs is not a good design
• Update anomalies: for a new teacher of Physics, we must insert two tuples

Course   Teacher  Text
Physics  Green    Mechanics
Physics  Green    Thermodynamics
Physics  Brown    Mechanics
Physics  Brown    Thermodynamics
Physics  Black    Mechanics
Physics  Black    Thermodynamics
Math     White    Algebra
Math     White    Geometry

• This relation represents two independent 1:N relationships

Course   Teacher        Course   Text
Physics  Green          Physics  Mechanics
Physics  Brown          Physics  Thermodynamics
Physics  Black          Math     Algebra
Math     White          Math     Geometry
Definition of 4NF

• A relation schema R is in 4NF w.r.t. a set of FDs and MVDs F if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey of R
• Since every FD is an MVD, 4NF implies BCNF
• In other words, a relation is in 4NF if it is in BCNF and if every nontrivial MVD is also an FD
• If all dependencies in F are FDs, the definition of 4NF reduces to that of BCNF
• Although many relations in BCNF but not in 4NF are all-key (they have no FD), this is not necessarily so

4NF: Example

Course Teacher Text

• Dependencies
– Course →→ Teacher | Text
– (Teacher,Text) → Course
  (a teacher does not use the same text in more than one course)
• {Teacher,Text} is a key
• Relation is in BCNF but not in 4NF
• Breaking it to produce 4NF relations does not preserve the dependency

Decomposition in 4NF

• Given an MVD X →→ Y that holds in a schema R, the decomposition into R1 = (X ∪ Y) and R2 = (R − Y) has the nonadditive-join property
• The converse also holds
• Thus, a decomposition D = {R1, R2} of R has the nonadditive-join property with respect to F if and only if either:
– (R1 ∩ R2) →→ (R1 − R2) holds in F+, or
– (R1 ∩ R2) →→ (R2 − R1) holds in F+
• Actually, if one of these holds, so does the other

Lossless Join Decomposition in 4NF

Algorithm for lossless join decomposition of R into 4NF relations w.r.t. a set of FDs and MVDs

(1) Set D := {R}
(2) While there is a relation schema Q in D that is not in 4NF do
    begin
        choose one Q in D that is not in 4NF;
        find a nontrivial MVD X →→ Y in Q that violates 4NF;
        replace Q in D by two relations (Q − Y) and (X ∪ Y)
    end;

• Does not necessarily preserve FDs
– in the example above the FD (Teacher,Text) → Course is lost
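One step of the 4NF decomposition can be tried out on the Course/Teacher/Text instance. The following sketch is an added illustration with hypothetical variable names: it splits the relation along Course →→ Teacher into (Course, Teacher) and (Course, Text) and joins the two projections back on Course.

```python
# Course/Teacher/Text instance used throughout the MVD slides.
CTX = {
    ("Physics", "Green", "Mechanics"),
    ("Physics", "Green", "Thermodynamics"),
    ("Physics", "Brown", "Mechanics"),
    ("Physics", "Brown", "Thermodynamics"),
    ("Physics", "Black", "Mechanics"),
    ("Physics", "Black", "Thermodynamics"),
    ("Math", "White", "Algebra"),
    ("Math", "White", "Geometry"),
}


def decompose(rows):
    """Split along Course ->> Teacher: X ∪ Y = (Course, Teacher), R − Y = (Course, Text)."""
    course_teacher = {(course, teacher) for course, teacher, _ in rows}
    course_text = {(course, text) for course, _, text in rows}
    return course_teacher, course_text


def rejoin(course_teacher, course_text):
    """Natural join of the two projections on Course."""
    return {(course, teacher, text)
            for course, teacher in course_teacher
            for other_course, text in course_text
            if course == other_course}


course_teacher, course_text = decompose(CTX)
rejoined = rejoin(course_teacher, course_text)

# If the MVD does not hold in an instance, the same split is no longer lossless.
broken = CTX - {("Physics", "Black", "Thermodynamics")}
rejoined_broken = rejoin(*decompose(broken))
```

The eight tuples shrink to four plus four and the join restores exactly the original relation. On the instance where one tuple is missing, the join puts that tuple back, so the result differs from the relation that was decomposed.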
Embedded Multivalued Dependencies

• FDs are preserved when adding or suppressing an attribute (provided it is not involved in the
  FD)
• On the contrary, some MVDs are expected to hold after projection but are not explicit as MVD
  before the projection

Regist
Course  Stud   Preq   Year
CS402   Jones  CS311  1988
CS402   Smith  CS401  1989

FD: {Stud,Preq} → Year
• Course →→ Stud does not hold: (CS402,Jones,CS401,1989) ∉ Regist

Embedded Multivalued Dependencies (cont.)

Regist1              Regist2
Stud Preq Year       Course Stud Preq

• Course →→ Stud (and Course →→ Preq) hold in Regist2: every student enrolled in a course
  is required to have taken each prerequisite for the course
• Corresponding constraint in the original relation Regist is an embedded multivalued dependency
• It is written Course →→ Stud|Preq, meaning that the dependency holds in the projection
  π Course,Stud,Preq (Regist)
• Regist2 is not in 4NF, and it should be decomposed into relations (Course,Stud) and (Course,Preq)

A Sufficient Condition for Testing 4NF

• Given a relation schema R(U), a subset C of U is a cut if every key of R has a nonempty
  intersection with C and a nonempty intersection with U − C

EmpProj
Emp     Proj  Loc
Smith   P1    FL
Smith   P2    CA
Smith   P3    AZ
Walton  P1    CA
Walton  P2    AZ

• Assume that EmpProj has the keys {Emp,Proj} (an employee works for a project
  in only one location) and {Emp,Loc} (an employee works in one location for only
  one project)
• In EmpProj there is only one cut: {Proj,Loc}
• Suppose that another key is added: {Proj,Loc} (given a project and a location,
  only one employee is attached to them)
• Now, relation EmpProj has no cut

Theorem: If a relation schema is in BCNF and has no cut, then it is in 4NF

Corollary: If a relation schema is in BCNF and has a simple (non composite) key,
then it is in 4NF
(a relation with a simple key has no cut)
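
Because the cut condition only involves the attribute set and the keys, it can be tested by brute force over subsets. The sketch below enumerates the cuts of a schema; the function name is hypothetical. Since the definition is symmetric in C and U − C, each cut is reported as the unordered pair {C, U − C}, which is assumed to be how the slide counts "only one cut" for EmpProj.

```python
from itertools import combinations

def cuts(attributes, keys):
    """Return the cuts of a schema as unordered pairs {C, U - C}.

    A subset C is a cut if every key meets both C and U - C.
    """
    u = frozenset(attributes)
    found = set()
    for size in range(1, len(u)):
        for combo in combinations(sorted(u), size):
            c = frozenset(combo)
            # Keys are subsets of U, so (key - c) is the part of the key in U - C.
            if all(key & c and key - c for key in keys):
                found.add(frozenset({c, u - c}))
    return found

U = {"Emp", "Proj", "Loc"}
two_keys = [frozenset({"Emp", "Proj"}), frozenset({"Emp", "Loc"})]
three_keys = two_keys + [frozenset({"Proj", "Loc"})]

print(cuts(U, two_keys))    # one pair: {Emp} versus {Proj, Loc}
print(cuts(U, three_keys))  # no cut once {Proj, Loc} is also a key
```

The enumeration is exponential in the number of attributes, which is acceptable for schemas of the size found in hand-designed examples. Together with a separate BCNF test, an empty result gives the sufficient condition of the theorem.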

Join Dependencies

• There are relations where a nonadditive-join decomposition can only be realized with more than
  two relation schemas

Supply
Supplier  Part  Proj
Smith     Bolt  ProjX
Smith     Nut   ProjY
Adamsky   Bolt  ProjY
Smith     Bolt  ProjY

• If a supplier s supplies part p, a project j uses part p, and the supplier s supplies at least one
  part to project j, then supplier s also supplies part p to project j
• Relation is “all key”, involves no nontrivial FDs or MVDs ⇒ is in 4NF

Join Dependencies, cont.

Projections of Supply:

Sup Part            Part Proj        Sup Proj
Supplier  Part      Part  Proj       Supplier  Proj
Smith     Bolt      Bolt  ProjX      Smith     ProjX
Smith     Nut       Nut   ProjY      Smith     ProjY
Adamsky   Bolt      Bolt  ProjY      Adamsky   ProjY

Join over Part (of the first two projections):

Supplier  Part  Proj
Smith     Bolt  ProjX
Smith     Bolt  ProjY
Smith     Nut   ProjY
Adamsky   Bolt  ProjX    (spurious)
Adamsky   Bolt  ProjY

Join over Supplier and Proj (with the third projection): original relation Supply

• The join dependency does not hold if the last tuple is removed from Supply

Join Dependencies

• The constraint in the schema is equivalent to saying
    if ⟨s1, p1, j2⟩, ⟨s2, p1, j1⟩, ⟨s1, p2, j1⟩ appear in Supply
    then ⟨s1, p1, j1⟩ also appears in Supply
• Supply satisfies the join dependency
  JD({Supplier,Part},{Part,Proj},{Supplier,Proj}), i.e.
    Supply = Supply[Supplier,Part] ⋈ Supply[Part,Proj] ⋈ Supply[Supplier,Proj]
• A join dependency JD(R1, R2, ..., Rn) on a relation R specifies that every instance of R has
  a nonadditive-join decomposition into R1, R2, ..., Rn
• A MVD is a special case of a JD where n = 2
• A JD(R1, R2, ..., Rn) on R is a trivial JD if some Ri = R

Join Dependencies and Update Anomalies

• Join dependencies induce update anomalies

Supply1                     Supply2
Supplier  Part  Proj        Supplier  Part  Proj
Smith     Bolt  ProjX       Smith     Bolt  ProjX
Smith     Nut   ProjY       Smith     Nut   ProjY
                            Walton    Bolt  ProjY
                            Smith     Bolt  ProjY

• In Supply1, inserting (Walton,Bolt,ProjY) implies insertion of (Smith,Bolt,ProjY) (yet converse
  is not true)
• Deleting (Walton,Bolt,ProjY) from Supply2 has no side effects
• If (Smith,Bolt,ProjY) is deleted from Supply2, then either the first or the third tuple must be
  deleted
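
The join dependency and the anomalies above can be reproduced on the small instances from the slides. The sketch below checks JD({Supplier,Part},{Part,Proj},{Supplier,Proj}) on an instance by joining its three binary projections and comparing the result with the instance itself. The function names are hypothetical, and the check is hard-wired to this three-attribute schema.

```python
def project(rows, positions):
    """Project a set of tuples on the given column positions."""
    return {tuple(row[i] for i in positions) for row in rows}

def join_of_projections(rows):
    """Join the three binary projections of a (Supplier, Part, Proj) relation."""
    sp = project(rows, (0, 1))
    pj = project(rows, (1, 2))
    sj = project(rows, (0, 2))
    return {(s, p, j)
            for (s, p) in sp
            for (p2, j) in pj
            if p == p2 and (s, j) in sj}

def jd_holds(rows):
    """True if the instance equals the join of its three projections."""
    return join_of_projections(rows) == set(rows)

supply = {("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
          ("Adamsky", "Bolt", "ProjY"), ("Smith", "Bolt", "ProjY")}
supply1 = {("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY")}
walton = ("Walton", "Bolt", "ProjY")
supply2 = supply1 | {walton, ("Smith", "Bolt", "ProjY")}

print(jd_holds(supply))                                # Supply as given
print(jd_holds(supply - {("Smith", "Bolt", "ProjY")})) # last tuple removed
print(join_of_projections(supply1 | {walton}) == supply2)
print(jd_holds(supply2 - {walton}))
```

The third line shows the insertion anomaly: adding the Walton tuple to Supply1 forces (Smith,Bolt,ProjY) as well, and the join of the projections is exactly Supply2. The fourth line corresponds to deleting the Walton tuple from Supply2, which leaves an instance that still satisfies the dependency.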

Fifth Normal Form (5NF)

• A relation schema R is in 5NF w.r.t. a set F of FDs, MVDs, and JDs if for every nontrivial
  JD(R1, R2, ..., Rn), each Ri is a superkey

Supplier Part Proj
⇓ 5NF normalization
Supplier Part      Supplier Proj      Part Proj

• 5NF is also called PJNF (project-join normal form)
• Since a MVD is a special case of a JD, every relation in 5NF is also in 4NF
• Every relation can be losslessly decomposed into 5NF relations
• If a relation is in 3NF and all its keys are simple, then it is in 5NF
• Discovering JDs in practice for large databases is difficult
• Many obvious join dependencies are based on keys
• Further decomposition ultimately leads to irreducible relations

Critique of Relational Normalization

(1) Normal forms are easier to understand and appreciate from a richer point of view on data
    modeling, namely, ER or OO
(2) Practical relevance of normal forms has been overemphasized

1) Normal forms are easier to understand and appreciate from a richer point
of view on data modeling, namely, ER or OO

• Particularly striking for the “higher” normal forms (4NF, 5NF)

• MVDs are not stable when relation schemas are modified (leading to embedded
dependencies) ⇒ complexity of MVDs is largely a relational problem

• A “complex” normal form like 4NF is better analyzed in ER terms than just with the
multi-valued dependency ⇒ there are various interpretations for an MVD

– multi-valued attribute
– grouping of independent facts within one relation
– integrity constraint in a genuine relationship

• ER-based design methodologies start with entities and relationships observed in the
real world, and their systematic translation into relations produces 3NF (or higher)
most of the time

2) Practical relevance of normal forms has been overemphasized

• Goal of normalization: produce simple relations representing in a natural way a portion
of the “real world”

• But why start with a complex description in the first place to have to normalize it
afterwards?

• Reason should be traced back to pre-relational technology: space was at a premium
and attributes were grouped in large physical records to save space

• Another reason for favoring large relation schemas was to minimize the number of
joins in access programs

• Multi-level architectures now permit different schemas at the logical and physical levels

• Normalization was a nice relatively easy piece of relational theory (hard to resist for
researchers!)

• Normal forms are properties of relations in isolation; if and when (inter-relation)
constraints are seriously taken care of by DBMSs, normalization as a criterion for the
quality of a database schema will lose some of its emphasis

• Definition of the relational model was a “revolution” against “bad” practices that were
prevailing in database management before the relational model

• Revolution went too far: 1NF is too restrictive for modeling complex data, 1NF was
a simple radical idea

• Decomposition algorithms suppose that all functional dependencies have been specified

• Overlooking FDs may produce undesirable designs

• Not easy to solve in practice: it is better to allow grouping attributes on less formal
grounds during conceptual modeling

• Algorithms that require a minimal cover depend on which minimal cover is picked
(nondeterminism)

Relational Database Design

• A complex process

• Made simpler by starting with a more expressive schema in a suitable model (ER,
OO)

• Principles of translating an ER schema into a relational schema are simple ...

• But faithfully translating an ER schema into a relational schema is an immense task,
if done without loss of information

• In practice, even with sophisticated CASE tools, some information will be lost in the
translation for the relational schema to remain manageable

• Traditional relational design neglects essential dependencies (inclusion and join
dependencies)

• Normalization theory concerns both ER and relational schemas

Thesis

• Relational DB design has over-emphasized importance of normalization theory

Denormalization for Efficiency

• Normalization considers only consistency; performance also matters
• When the normalized database entails too many costly joins, then decide, at the physical
level, to store the joins rather than the projections
• Still, it is more appropriate to start performance tuning with a well-designed database
• Rationale for design in stages: first analysis, then design, then implementation and performance
tuning
