Relational Normalization: Contents
Relational Database Design: Rationale
Example of Relational Schema
Employee
SSN FName MInit LName BDate Address Sex Salary SuperSSN DNo
Department
DNumber DName DMgr MgrStartDate
DeptLocation
DNumber DLocation
Project
PNumber PName PLocation DNumber
WorksOn
SSN PNumber Hours
Dependent
SSN DependentName Sex BDate Relationship
Logical Level: Semantics of Relation Attributes
• Meaning (semantics) is associated with attributes: it is the interpretation of the values in relation tuples
• The clearer the semantics of a relation, the better the design of the schema
• A plausible set of rules:
– each tuple should represent one entity or one relationship instance
– attributes of different entities and relationships should not be mixed in the same relation
– only foreign keys should be used to refer to other entities (most probably with referential integrity)
Emp
EName SSN BDate Address DNumber
Dept
DNumber DName DMgr
• Clear semantics but poor design: attributes from distinct real-world entities (Employee,
Department) are mixed in EmpDept
• Maybe acceptable for views, but causes problems when done with base relations
Emp
SSN EName
WorksOn
SSN PNumber Hours
Insertion Anomalies
EmpDept
EName SSN BDate Address DNumber DName DMgr
Spurious Tuples
• Bad design may result in erroneous results for joins
EmpProj
SSN PNumber Hours EName PName PLocation
⇓
EmpLocs
EName PLocation
EmpProj1
SSN PNumber Hours PName PLocation
• EmpProj ≠ join of its projections EmpLocs and EmpProj1
– the common attribute (PLocation) is neither a key nor a foreign key
– joining yields more tuples than in EmpProj (“spurious tuples”)
Spurious Tuples in EmpProj1 ⋈ EmpLocs (∗ marks a spurious tuple)
SSN P# Hours PName PLocation EName
1234 1 32.5 ProdX Bellaire Smith
∗ 1234 1 32.5 ProdX Bellaire English
1234 2 7.5 ProdY Sugarland Smith
∗ 1234 2 7.5 ProdY Sugarland English
6668 3 40.0 ProdZ Houston Narayan
∗ 4534 1 20.0 ProdX Bellaire Smith
4534 1 20.0 ProdX Bellaire English
∗ 4534 2 20.0 ProdY Sugarland Smith
4534 2 20.0 ProdY Sugarland English
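The spurious tuples can be reproduced mechanically. The Python sketch below is illustrative only: it assumes a relation is modelled as a pair (attribute list, set of tuples), and `project` and `natural_join` are hypothetical helper names, not a library API. It builds the EmpProj instance of this example, projects it onto EmpProj1 and EmpLocs, and joins the two projections back on their only common attribute, PLocation.

```python
# Relations are modelled as (attribute list, set of row tuples).
# `project` and `natural_join` are illustrative helpers, not a library API.

def project(rel, attrs):
    """Projection: keep only `attrs`, in that order, dropping duplicate rows."""
    cols, rows = rel
    idx = [cols.index(a) for a in attrs]
    return (list(attrs), {tuple(row[i] for i in idx) for row in rows})

def natural_join(left, right):
    """Natural join on every attribute the two relations have in common."""
    lcols, lrows = left
    rcols, rrows = right
    common = [a for a in lcols if a in rcols]
    extra = [a for a in rcols if a not in lcols]
    out = set()
    for l in lrows:
        for r in rrows:
            if all(l[lcols.index(a)] == r[rcols.index(a)] for a in common):
                out.add(l + tuple(r[rcols.index(a)] for a in extra))
    return (lcols + extra, out)

EMP_PROJ = (["SSN", "P#", "Hours", "EName", "PName", "PLocation"], {
    (1234, 1, 32.5, "Smith", "ProdX", "Bellaire"),
    (1234, 2, 7.5, "Smith", "ProdY", "Sugarland"),
    (6668, 3, 40.0, "Narayan", "ProdZ", "Houston"),
    (4534, 1, 20.0, "English", "ProdX", "Bellaire"),
    (4534, 2, 20.0, "English", "ProdY", "Sugarland"),
})

emp_proj1 = project(EMP_PROJ, ["SSN", "P#", "Hours", "PName", "PLocation"])
emp_locs = project(EMP_PROJ, ["EName", "PLocation"])
joined = natural_join(emp_proj1, emp_locs)  # joins on PLocation only

# Reorder the original rows into the join's column order, then compare.
original = project(EMP_PROJ, joined[0])[1]
spurious = joined[1] - original
print(len(joined[1]), "tuples in the join,", len(spurious), "spurious")
# prints: 9 tuples in the join, 4 spurious
```

Every extra row pairs an SSN with an EName that does not belong to it, because the join can only match rows on PLocation.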
EmpProj
SSN P# Hours EName PName PLocation
1234 1 32.5 Smith ProdX Bellaire
1234 2 7.5 Smith ProdY Sugarland
6668 3 40.0 Narayan ProdZ Houston
4534 1 20.0 English ProdX Bellaire
4534 2 20.0 English ProdY Sugarland
⇓
EmpProj1
SSN P# Hours PName PLocation
1234 1 32.5 ProdX Bellaire
1234 2 7.5 ProdY Sugarland
6668 3 40.0 ProdZ Houston
4534 1 20.0 ProdX Bellaire
4534 2 20.0 ProdY Sugarland
EmpLocs
EName PLocation
Smith Bellaire
Smith Sugarland
Narayan Houston
English Bellaire
English Sugarland
Functional Dependencies (FDs)
• Definition 1:
FD X → Y holds in relation R(A1, ..., An), with X, Y ⊆ {A1, ..., An}, if for every pair of tuples t1, t2 such that t1[X] = t2[X], then t1[Y] = t2[Y]
• Definition 2:
X → Y if there cannot exist different tuples t1, t2 such that t1[X] = t2[X] and t1[Y] ≠ t2[Y]
• Functional dependencies are constraints, i.e., they belong to the schema
→ they cannot be deduced from an extension
→ they can only be verified or invalidated on an extension (one counterexample is enough to invalidate)
• Definition 1:
– all tuples that agree on X also agree on Y
– values for X functionally determine values for Y
– the definition also applies with t1 = t2
• Special case: if X is a key or superkey, then X → Y for all Y ⊆ {A1, ..., An}
FDs: Example
EmpDept
EName SSN BDate Address D# DName DMgrSSN
SSN → EName
SSN → {EName,BDate}
SSN → D#
D# → DMgrSSN
EmpProj
SSN P# Hours EName PName PLocation
{SSN,P#} → Hours
P# → {PName,PLocation}
FDs: Example, cont.
EmpDept
EName SSN BDate Address D# DName DMgrSSN
Smith 1234 21/07/39 ... 1 Research 1234
Narayan 6668 18/01/43 ... 1 Research 1234
English 4534 8/05/53 ... 2 Account 4534
Wong 9788 30/11/49 ... 3 Admin 9788
Zelaya 6677 23/08/60 ... 3 Admin 9788
DMgrSSN → D# ?
EmpProj
SSN P# Hours EName PName PLocation
1234 1 32.5 Smith ProdX Bellaire
1234 2 7.5 Smith ProdY Sugarland
6668 3 40.0 Narayan ProdZ Houston
4534 1 20.0 English ProdX Bellaire
4534 2 20.0 English ProdY Sugarland
PLocation → PName does not hold
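Definition 1 can be tested directly on an extension. In the Python sketch below, `fd_holds` is a hypothetical helper that searches for a counterexample pair; it can refute an FD, but a `True` result only means the extension does not violate it. The EmpDept rows leave out BDate and Address, whose values are elided in the example.

```python
def fd_holds(cols, rows, lhs, rhs):
    """Definition 1 on one extension: no two rows agree on lhs but differ on rhs."""
    li = [cols.index(a) for a in lhs]
    ri = [cols.index(a) for a in rhs]
    seen = {}  # lhs value -> first rhs value met for it
    for row in rows:
        x = tuple(row[i] for i in li)
        y = tuple(row[i] for i in ri)
        if seen.setdefault(x, y) != y:
            return False  # one counterexample is enough to invalidate
    return True

EMP_PROJ_COLS = ["SSN", "P#", "Hours", "EName", "PName", "PLocation"]
EMP_PROJ_ROWS = [
    (1234, 1, 32.5, "Smith", "ProdX", "Bellaire"),
    (1234, 2, 7.5, "Smith", "ProdY", "Sugarland"),
    (6668, 3, 40.0, "Narayan", "ProdZ", "Houston"),
    (4534, 1, 20.0, "English", "ProdX", "Bellaire"),
    (4534, 2, 20.0, "English", "ProdY", "Sugarland"),
]
# EmpDept without BDate and Address (elided in the example)
EMP_DEPT_COLS = ["EName", "SSN", "D#", "DName", "DMgrSSN"]
EMP_DEPT_ROWS = [
    ("Smith", 1234, 1, "Research", 1234),
    ("Narayan", 6668, 1, "Research", 1234),
    ("English", 4534, 2, "Account", 4534),
    ("Wong", 9788, 3, "Admin", 9788),
    ("Zelaya", 6677, 3, "Admin", 9788),
]

print(fd_holds(EMP_PROJ_COLS, EMP_PROJ_ROWS, ["SSN", "P#"], ["Hours"]))  # True
print(fd_holds(EMP_PROJ_COLS, EMP_PROJ_ROWS, ["SSN"], ["Hours"]))        # False
print(fd_holds(EMP_PROJ_COLS, EMP_PROJ_ROWS, ["PLocation"], ["PName"]))  # True
print(fd_holds(EMP_DEPT_COLS, EMP_DEPT_ROWS, ["DMgrSSN"], ["D#"]))       # True
```

The last two calls return True although the slides do not assert either dependency as a constraint: the extension merely happens to satisfy them, which is why FDs cannot be deduced from an extension.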
• Only the most obvious (or most important) dependencies are explicitly specified
• Many other dependencies can be deduced from them
• Closure F+ of a set F of dependencies:
– set of all dependencies in F plus those implied by F
– dependency X → Y is implied by F if X → Y is valid in all relation instances for which F is valid
– X → Y ∈ F+ iff F ⊨ X → Y
Armstrong’s Inference Rules
A1 (Reflexivity) If Y ⊆ X, then X → Y
A2 (Augmentation) If X → Y, then XZ → YZ (and XZ → Y)
A3 (Transitivity) If X → Y and Y → Z, then X → Z
A1, A2, A3 form a sound and complete set of inference rules
Some useful additional inference rules:
A4 (Decomposition) If X → YZ, then X → Y and X → Z
A5 (Union) If X → Y and X → Z, then X → YZ
A6 (Pseudotransitivity) If X → Y and WY → Z, then WX → Z
Example: Deriving FDs
EmpDept
EName SSN BDate Address D# DName DMgrSSN
SSN → EName, SSN → BDate, SSN → Address, SSN → D#
D# → DName, D# → DMgrSSN
⇒ SSN → DName, SSN → DMgrSSN, SSN → {EName,DName}, ...
EmpProj
SSN P# Hours EName PName PLocation
P# → P#
P# → PLocation
⇒ P# → {P#,PLocation}
Closure of Attribute Sets
X+ := X;
repeat
    oldX+ := X+;
    for each FD Y → Z in F do
        if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+)
• The algorithm starts by setting X+ to all attributes in X: by A1, all these attributes are functionally dependent on X
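A direct Python transcription of the attribute-closure algorithm is sketched below, assuming FDs are given as (left-hand side, right-hand side) pairs of Python sets; `closure` and `implies` are illustrative names. It relies on the standard fact that F ⊨ X → Y exactly when Y ⊆ X+, and uses the EmpDept FDs of the “Deriving FDs” example (the four SSN dependencies combined by the union rule A5).

```python
def closure(x, fds):
    """Attribute closure X+ of x under fds, a list of (lhs set, rhs set) pairs."""
    result = set(x)                # X+ := X (reflexivity, A1)
    while True:
        old = set(result)          # oldX+ := X+
        for lhs, rhs in fds:
            if lhs <= result:      # if X+ ⊇ Y
                result |= rhs      # then X+ := X+ ∪ Z
        if result == old:          # until X+ = oldX+
            return result

def implies(fds, lhs, rhs):
    """F ⊨ X → Y exactly when Y ⊆ X+."""
    return set(rhs) <= closure(lhs, fds)

# FDs of the EmpDept "Deriving FDs" example
EMP_DEPT_FDS = [
    ({"SSN"}, {"EName", "BDate", "Address", "D#"}),
    ({"D#"}, {"DName", "DMgrSSN"}),
]
print(sorted(closure({"SSN"}, EMP_DEPT_FDS)))
# ['Address', 'BDate', 'D#', 'DMgrSSN', 'DName', 'EName', 'SSN']
print(implies(EMP_DEPT_FDS, {"SSN"}, {"EName", "DName"}))  # True
print(implies(EMP_DEPT_FDS, {"D#"}, {"SSN"}))              # False
```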
Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if
– every FD in F can be inferred from G, and
– every FD in G can be inferred from F
• Hence F and G are equivalent iff F+ = G+
• F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)
• F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs
Finding Minimal Covers
Algorithm for finding a minimal cover G of F
(1) Set G := F
(2) Replace each FD X → {A1, ..., An} in G by n FDs X → A1, ..., X → An
(3) For each FD X → A in G
        for each attribute B that is an element of X
            if (G − {X → A}) ∪ {(X − {B}) → A} is equivalent to G
            then replace X → A with (X − {B}) → A in G
(4) For each remaining FD X → A in G
        if (G − {X → A}) is equivalent to G
        then remove X → A from G
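The equivalence test and steps (1) to (4) can be sketched in Python as follows. This is an illustrative transcription under stated assumptions (FDs as pairs of sets, equivalence decided through attribute closures, FDs and attributes examined in sorted order), not a reference implementation; a different examination order can yield a different minimal cover. The Stock dependencies from the decomposition example serve as input.

```python
def closure(x, fds):
    """Attribute closure X+ under fds (list of (lhs, rhs) set pairs)."""
    result, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, g):
    """F covers G: every X -> Y in G satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

def minimal_cover(f):
    # (1)-(2): copy F, splitting right-hand sides into single attributes
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in f for a in sorted(rhs)]
    g = list(dict.fromkeys(g))  # drop exact duplicates, keep order
    # (3): drop extraneous left-hand-side attributes
    for i in range(len(g)):
        for b in sorted(g[i][0]):
            lhs, rhs = g[i]
            if len(lhs) > 1:
                candidate = g[:i] + [(lhs - {b}, rhs)] + g[i + 1:]
                if equivalent(candidate, g):
                    g = candidate
    g = list(dict.fromkeys(g))  # step (3) can create duplicates
    # (4): drop redundant FDs
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if equivalent(rest, g):
            g = rest
    return g

# Stock example (M = Model#, S = Serial#, P = Price, C = Color, N = Name, Y = Year)
STOCK_FDS = [({"M", "S"}, {"P", "C"}), ({"M"}, {"N"}),
             ({"S"}, {"Y"}), ({"N", "Y"}, {"P"})]
for lhs, rhs in minimal_cover(STOCK_FDS):
    print(sorted(lhs), "->", sorted(rhs))
```

Here {M,S} → P is dropped in step (4), since it follows from M → N, S → Y and {N,Y} → P.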
First minimal cover
• CE → A implied by C → A
• CG → B implied by CG → D, C → A, ACD → B
EmpDept
EName SSN Bdate Address DNo DName DMgr
⇓
Dept
DNo DName DMgr
Emp
EName SSN Bdate Address DNo
Normalization
• Relations are not always normalized to the highest possible form, e.g., for performance reasons
• Higher normal forms
⇒ smaller relations
⇒ more joins in queries
⇒ space/time or query/update tradeoff
First Normal Form (1NF)
• Relations as defined in the relational model
• Forbids composite attributes, multivalued attributes, nested relations (relations within relations)
• Relational-model limitation to 1NF:
– historical reasons (simplify file management)
– in retrospect, a mistake
Multivalued attributes
Department
DName DNumber DMgr {DLocations}
Research 5 333445555 {Bellaire,Sugarland,Houston}
Administration 4 987654321 {Stafford}
Headquarters 1 888665555 {Houston}
⇓ 1NF Normalization
Department
DName DNumber DLocations DMgr
Research 5 Bellaire 333445555
Research 5 Sugarland 333445555
Research 5 Houston 333445555
Administration 4 Stafford 987654321
Headquarters 1 Houston 888665555
• Redundancy: for each value of DLocations, the other attributes have to be repeated
Nested Relations
EmpProj
SSN EName {Projs(PNumber, Hours)}
123456789 Smith, John B. {(1, 32.5), (2, 7.5)}
666884444 Narayan, Ramesh K. {(3, 40.0)}
999888777 Zelaya, Alicia J. {(1, 20.0), (2, 20.0)}
453453453 English, Joyce A. {(30, 30.0), (10, 10.0)}
⇓ 1NF Normalization (unnest operation)
EmpProj1
SSN EName
EmpProj2
SSN PNumber Hours
• Nested relations: the value of an attribute of a relation can be a relation
• SSN: primary key of EmpProj
• PNumber: primary key of each nested Projs relation
• The unnest operation transforms the relation into 1NF
• The primary key has to be propagated into the embedded relation
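The unnest operation can be sketched in a few lines of Python. The representation is an assumption made for illustration: rows are dicts, a multivalued attribute is a list of values, and a nested relation is a list of dicts; `unnest` is a hypothetical helper. Only the first two employees of the nested EmpProj example are used, to keep the data short.

```python
def unnest(rows, attr):
    """Flatten the set- or relation-valued attribute `attr` into 1NF rows.

    Each element of row[attr] yields one output row; the other attributes
    (including the key) are copied into it, i.e. propagated.
    """
    flat = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != attr}
        for inner in row[attr]:
            new = dict(outer)
            if isinstance(inner, dict):   # nested relation: merge its attributes
                new.update(inner)
            else:                         # multivalued attribute: one value per row
                new[attr] = inner
            flat.append(new)
    return flat

DEPARTMENT = [
    {"DName": "Research", "DNumber": 5, "DMgr": "333445555",
     "DLocations": ["Bellaire", "Sugarland", "Houston"]},
    {"DName": "Administration", "DNumber": 4, "DMgr": "987654321",
     "DLocations": ["Stafford"]},
    {"DName": "Headquarters", "DNumber": 1, "DMgr": "888665555",
     "DLocations": ["Houston"]},
]
# First two employees of the nested EmpProj example
EMP_PROJ = [
    {"SSN": "123456789", "EName": "Smith, John B.",
     "Projs": [{"PNumber": 1, "Hours": 32.5}, {"PNumber": 2, "Hours": 7.5}]},
    {"SSN": "666884444", "EName": "Narayan, Ramesh K.",
     "Projs": [{"PNumber": 3, "Hours": 40.0}]},
]

department_1nf = unnest(DEPARTMENT, "DLocations")
emp_proj1 = [{"SSN": r["SSN"], "EName": r["EName"]} for r in EMP_PROJ]
emp_proj2 = [{k: r[k] for k in ("SSN", "PNumber", "Hours")}
             for r in unnest(EMP_PROJ, "Projs")]
for row in emp_proj2:
    print(row)
```

Projecting the unnested rows onto (SSN, PNumber, Hours) gives EmpProj2, with SSN propagated as part of its key.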
Second Normal Form (2NF)
EmpProj
SSN PNumber Hours EName PName PLocation
fd1: {SSN, PNumber} → Hours
fd2: SSN → EName
fd3: PNumber → {PName, PLocation}
• Prime attribute: attribute that is a member of a key
• Full functional dependency: an FD X → Z where the removal of any attribute from X invalidates the dependency
– fd1 is a full FD: neither SSN → Hours nor PNumber → Hours holds
– {SSN, PNumber} → EName is not a full FD (i.e., it is a partial dependency), since SSN → EName (fd2) also holds
• Definition: a relation schema R is in 2NF if every nonprime attribute is fully functionally dependent on every key
• A relation where all keys are single attributes is automatically in 2NF
Third Normal Form (3NF)
EmpDept
EName SSN BDate Address DNumber DName DMgr
• X → Z is a transitive functional dependency if ∃ Y such that X → Y and Y → Z (and Z is not a subset of a key)
– SSN → DMgr is a transitive FD (SSN → DNumber and DNumber → DMgr hold)
– SSN → EName is a non-transitive FD (there is no set of attributes X where SSN → X and X → EName)
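Partial dependencies can be found by computing the closure of every proper subset of a key. The Python sketch below assumes a single key (so the prime attributes are exactly the key attributes) and reads fd3 as PNumber → {PName, PLocation}, as in the earlier FD example; `partial_dependencies` is an illustrative name.

```python
from itertools import combinations

def closure(x, fds):
    """Attribute closure X+ under fds (list of (lhs, rhs) set pairs)."""
    result, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(attrs, key, fds):
    """(subset of key, nonprime attribute) pairs that violate 2NF.

    Assumes `key` is the only key, so the nonprime attributes are attrs - key.
    """
    nonprime = set(attrs) - set(key)
    found = []
    for size in range(1, len(key)):          # proper, nonempty subsets only
        for part in combinations(sorted(key), size):
            for a in sorted(nonprime & closure(part, fds)):
                found.append((part, a))
    return found

EMP_PROJ = ["SSN", "PNumber", "Hours", "EName", "PName", "PLocation"]
EMP_PROJ_FDS = [
    ({"SSN", "PNumber"}, {"Hours"}),          # fd1
    ({"SSN"}, {"EName"}),                     # fd2
    ({"PNumber"}, {"PName", "PLocation"}),    # fd3
]
for part, a in partial_dependencies(EMP_PROJ, ["SSN", "PNumber"], EMP_PROJ_FDS):
    print(part, "->", a)
```

Hours never shows up, since fd1 is a full FD; projecting EmpProj onto (SSN, PNumber, Hours) leaves no partial dependency.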
Third Normal Form: Example
EmpDept
EName SSN BDate Address DNumber DName DMgr
⇓ 3NF Normalization
EmpDept1
EName SSN BDate Address DNumber
EmpDept2
DNumber DName DMgr
Normalization: Example
Lots
PropertyId# CountyName Lot# Area Price TaxRate
fd1: PropertyId# → {CountyName, Lot#, Area, Price, TaxRate}
fd2: {CountyName, Lot#} → {PropertyId#, Area, Price, TaxRate}
fd3: CountyName → TaxRate
fd4: Area → Price
⇓
Lots1
PropertyId# CountyName Lot# Area Price
(fd1, fd2, fd4)
Lots2
CountyName TaxRate
(fd3)
Normalization: Example, cont.
Lots1A
PropertyId# CountyName Lot# Area
(fd1, fd2)
Lots1B
Area Price
(fd4)
Lots: 1NF
⇓
Lots1, Lots2: 2NF
⇓
Lots1A, Lots1B, Lots2: 3NF and BCNF
Boyce-Codd Normal Form (BCNF)
• A relation schema R is in BCNF if, for every nontrivial FD X → A ∈ F+ (i.e., A ∉ X), X is a superkey of R
• BCNF is an improved 3NF: all dependencies result from keys
• A relation with 2 attributes is automatically in BCNF
• Most 3NF relations are also in BCNF
• Intuition of 3NF/BCNF: FDs concern the key, the whole key, and nothing but the key
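The BCNF condition can be checked mechanically on small schemas. The Python sketch below (illustrative names) tries every proper subset X of a schema, computes X+ under the full set of Lots FDs, and reports X when X+ adds attributes of the schema without covering all of it. Enumerating subsets is exponential, which is acceptable only for examples of this size.

```python
from itertools import combinations

def closure(x, fds):
    """Attribute closure X+ under fds (list of (lhs, rhs) set pairs)."""
    result, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Subsets X of `schema` that determine more of it without being a superkey.

    Uses X+ under the full F restricted to the schema, so FDs of the
    projection of F+ are covered. Enumerates all subsets: exponential.
    """
    schema = set(schema)
    found = []
    for size in range(1, len(schema)):
        for x in combinations(sorted(schema), size):
            xplus = closure(x, fds) & schema
            if xplus != schema and xplus != set(x):
                found.append((x, xplus - set(x)))
    return found

LOTS_FDS = [
    ({"PropertyId#"}, {"CountyName", "Lot#", "Area", "Price", "TaxRate"}),  # fd1
    ({"CountyName", "Lot#"}, {"PropertyId#", "Area", "Price", "TaxRate"}),  # fd2
    ({"CountyName"}, {"TaxRate"}),                                          # fd3
    ({"Area"}, {"Price"}),                                                  # fd4
]
SCHEMAS = {
    "Lots": {"PropertyId#", "CountyName", "Lot#", "Area", "Price", "TaxRate"},
    "Lots1": {"PropertyId#", "CountyName", "Lot#", "Area", "Price"},
    "Lots2": {"CountyName", "TaxRate"},
    "Lots1A": {"PropertyId#", "CountyName", "Lot#", "Area"},
    "Lots1B": {"Area", "Price"},
}
for name, schema in SCHEMAS.items():
    print(name, "in BCNF:", not bcnf_violations(schema, LOTS_FDS))
```

It reports Lots and Lots1 as violating BCNF (through CountyName → TaxRate and Area → Price) and Lots1A, Lots1B and Lots2 as satisfying it, in line with the summary of the example.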
A relation in 3NF but not in BCNF
PatVisit
Patient Hospital Doctor
FDs: {Patient, Hospital} → Doctor; Doctor → Hospital
Smith Alachua Atkinson
Lee Shands Smith
Marks Alachua Atkinson
Marks Shands Shaw
Rao North Florida Nefzger
⇓ BCNF Normalization
PatDoctor
Patient Doctor
Smith Atkinson
Lee Smith
Marks Atkinson
Marks Shaw
Rao Nefzger
DoctHosp
Doctor Hospital
Atkinson Alachua
Smith Shands
Shaw Shands
Nefzger North Florida
Relational Decomposition
• Normalization decomposes relation schemas with undesirable aspects into smaller relations
• Consider a relation schema R(A1, ..., An) and a set of dependencies F
• Goal: produce a decomposition D of R into m relation schemas D = {R1, ..., Rm} where each Ri contains a subset of {A1, ..., An}, and
– every attribute Ai in R appears in at least one Ri
– each relation Ri is in BCNF (or at least in 3NF)
• Extreme decomposition approach: start with a universal relation schema containing all the DB attributes and a set of FDs
• ⇒ The universal relation assumption is needed: every attribute is unique, i.e., attributes with the same name in different relations have the same meaning
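The PatVisit split can be obtained with the usual textbook BCNF decomposition: repeatedly pick a violating X and replace R by X+ and R − (X+ − X). The Python sketch below illustrates that technique, which is not necessarily the procedure used in these slides, and it assumes the FDs {Patient, Hospital} → Doctor and Doctor → Hospital; function names are illustrative.

```python
from itertools import combinations

def closure(x, fds):
    """Attribute closure X+ under fds (list of (lhs, rhs) set pairs)."""
    result, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(schema, fds):
    """Return (X, X+ ∩ schema) for some X that violates BCNF, else None."""
    for size in range(1, len(schema)):
        for x in combinations(sorted(schema), size):
            xplus = closure(x, fds) & schema
            if xplus != schema and xplus != set(x):
                return set(x), xplus
    return None

def bcnf_decompose(schema, fds):
    """Split on a violating X into X+ and R - (X+ - X) until none remains."""
    done, todo = [], [set(schema)]
    while todo:
        r = todo.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, xplus = violation
            todo += [xplus, r - (xplus - x)]  # both parts are strictly smaller
    return done

PAT_VISIT_FDS = [({"Patient", "Hospital"}, {"Doctor"}), ({"Doctor"}, {"Hospital"})]
for r in bcnf_decompose({"Patient", "Hospital", "Doctor"}, PAT_VISIT_FDS):
    print(sorted(r))
```

No resulting schema contains all of Patient, Hospital and Doctor, so {Patient, Hospital} → Doctor can no longer be checked on a single relation; this is the dependency-preservation problem raised in the following slides.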
• Requiring each individual relation to be in a given normal form does not alone guarantee a good design
• BCNF (or 3NF) measures “goodness” for individual relations based on their keys and functional dependencies
• A set of relations must possess additional properties to ensure a good design
– dependency preservation
– lossless (nonadditive) join
• In the traditional approach to relational database design, dependency preservation is required because of the weak support for integrity constraints by DBMSs
Dependency Preservation
• Consider a relation schema R(A1, ..., An), a set of dependencies F, and a decomposition D of R into m relation schemas D = {R1, ..., Rm}
• Dependency preservation: each FD X → Y of F should appear explicitly or be inferrable in one relation schema Ri
• Otherwise, to preserve information, inter-relation FDs are needed (i.e., dependencies that hold on a join of several relations of the decomposition)
• Dependency preservation enables checking that FDs in F hold by checking them on each relation Ri individually
Dependency Preservation: Formalization
• A decomposition D must preserve the dependencies: the collection of all dependencies that hold on individual relations Ri must be equivalent to F
• Formally
– Projection π_Ri(F) of F on Ri: set of FDs X → Y in F+ such that (X ∪ Y) ⊆ Ri (their left- and right-hand side attributes are in Ri)
– A decomposition D = {R1, ..., Rm} is dependency-preserving if (π_R1(F) ∪ ... ∪ π_Rm(F))+ = F+
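Computing F+ to apply the formal definition can be expensive. A standard alternative, sketched below in Python with illustrative names, checks each FD X → Y of F separately: starting from Z = X, it repeatedly adds (Z ∩ Ri)+ ∩ Ri for every Ri, and X → Y is preserved iff Y ⊆ Z at the end. The PatVisit FDs are the same assumption as in the BCNF example; the EmpDept FDs follow the “Deriving FDs” example.

```python
def closure(x, fds):
    """Attribute closure X+ under fds (list of (lhs, rhs) set pairs)."""
    result, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """Is X -> Y implied by the union of the projections π_Ri(F)?

    Grows Z from X using only closures taken inside each Ri, so F+ and
    the projections are never materialized.
    """
    lhs, rhs = fd
    z, changed = set(lhs), True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z

def is_dependency_preserving(decomposition, fds):
    # (∪ π_Ri(F))+ ⊆ F+ always holds, so checking each FD of F is enough
    return all(preserves(fd, decomposition, fds) for fd in fds)

PAT_VISIT_FDS = [({"Patient", "Hospital"}, {"Doctor"}), ({"Doctor"}, {"Hospital"})]
BCNF_SPLIT = [{"Patient", "Doctor"}, {"Doctor", "Hospital"}]
print(is_dependency_preserving(BCNF_SPLIT, PAT_VISIT_FDS))  # False

EMP_DEPT_FDS = [({"SSN"}, {"EName", "BDate", "Address", "DNo"}),
                ({"DNo"}, {"DName", "DMgr"})]
EMP_DEPT_SPLIT = [{"EName", "SSN", "BDate", "Address", "DNo"},
                  {"DNo", "DName", "DMgr"}]
print(is_dependency_preserving(EMP_DEPT_SPLIT, EMP_DEPT_FDS))  # True
```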
Lossless (Nonadditive) Join Property
• Ensures that no spurious tuples appear when relations in the decomposition are joined
• Decomposition D = {R1, ..., Rm} of R has the lossless-join property w.r.t. a set F of FDs if, for every relation instance r(R) whose tuples satisfy all the FDs in F:
π_R1(r) ⋈ ... ⋈ π_Rm(r) = r
(This is the general form of join dependency, see later)
• Ensures that whenever a relation instance r(R) satisfies F, no spurious tuples are generated by joining the decomposed relations r(Ri)
• Necessary to generate meaningful results for queries involving joins
• There exists an algorithm for testing whether a decomposition D satisfies the lossless-join property with respect to a set F of FDs
Properties of the Nonadditive Join (2)
If D = {R1, ..., Rm} of R has the nonadditive-join property w.r.t. F, and Di = {Q1, ..., Qk} of Ri has the nonadditive-join property w.r.t. π_Ri(F), then
D = {R1, ..., Ri−1, Q1, ..., Qk, Ri+1, ..., Rm}
has the nonadditive-join property w.r.t. F
Emp
SSN PNumber Hours EName
⇓
Emp1
SSN PNumber Hours
Emp2
SSN EName
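One such algorithm is the chase (tableau) test, sketched below in Python for FDs only; names are illustrative. Each Ri gets a row with a distinguished symbol in the columns of Ri; FDs are applied to equate symbols until nothing changes, and the decomposition is lossless iff some row ends up with only distinguished symbols. For Emp, the FDs {SSN, PNumber} → Hours and SSN → EName are assumed (fd1 and fd2 of the 2NF example).

```python
def is_lossless(attrs, decomposition, fds):
    """Chase (tableau) test for the lossless-join property w.r.t. FDs."""
    attrs = sorted(attrs)
    # One row per Ri: distinguished symbol ("a", A) where Ri contains A,
    # otherwise a symbol ("b", i, A) unique to that row.
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    for a in rhs:
                        if r1[a] != r2[a]:
                            # equate the two symbols, preferring the distinguished one
                            keep = r1[a] if r1[a][0] == "a" else r2[a]
                            drop = r2[a] if keep == r1[a] else r1[a]
                            for r in rows:
                                if r[a] == drop:
                                    r[a] = keep
                            changed = True
    # lossless iff some row now holds only distinguished symbols
    return any(all(r[a][0] == "a" for a in attrs) for r in rows)

EMP_PROJ = {"SSN", "P#", "Hours", "EName", "PName", "PLocation"}
EMP_PROJ_FDS = [({"SSN", "P#"}, {"Hours"}), ({"SSN"}, {"EName"}),
                ({"P#"}, {"PName", "PLocation"})]
EMP_LOCS_SPLIT = [{"EName", "PLocation"}, {"SSN", "P#", "Hours", "PName", "PLocation"}]
print(is_lossless(EMP_PROJ, EMP_LOCS_SPLIT, EMP_PROJ_FDS))  # False

EMP = {"SSN", "PNumber", "Hours", "EName"}
EMP_FDS = [({"SSN", "PNumber"}, {"Hours"}), ({"SSN"}, {"EName"})]
print(is_lossless(EMP, [{"SSN", "PNumber", "Hours"}, {"SSN", "EName"}], EMP_FDS))  # True
```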
Combined Decomposition Algorithm
Produce a lossless-join and dependency-preserving decomposition into 3NF
(1) Find a minimal cover G of F
(2) For each X in an FD X → A in G
    create a relation in D with attributes {X ∪ A1 ∪ ... ∪ Ak} where X → A1, ..., X → Ak are the only FDs in G with X as left-hand side (X is the key of this relation)
(3) If none of the relations in D contains a key of R, create one more relation schema in D that contains attributes that form a key of R
• Step 3 of the previous algorithm is not needed, because the key will include any unplaced attributes (i.e., attributes not participating in any FD)
Decomposition Example
Stock
Model# Serial# Price Color Name Year
(M = Model#, S = Serial#, P = Price, C = Color, N = Name, Y = Year)
Dependencies: {M,S} → {P,C}, {M} → {N}, {S} → {Y}, {N,Y} → {P}
Minimal Cover: {M,S} → {C}, {M} → {N}, {S} → {Y}, {N,Y} → {P}
Stock1
Model# Serial# Color
Stock2
Model# Name
Stock3
Serial# Year
Stock4
Name Year Price
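Steps (2) and (3) can be sketched in Python as below, taking the minimal cover of the Stock example as input. The key search in `find_key` is an assumption: it shrinks R one attribute at a time (in sorted order here) while the remainder is still a superkey, which is the kind of procedure the remarks on key finding in these slides describe. All names are illustrative.

```python
def closure(x, fds):
    """Attribute closure X+ under fds (list of (lhs, rhs) set pairs)."""
    result, changed = set(x), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_key(attrs, fds):
    """Shrink R to one key, dropping attributes (sorted order) while X+ = R.

    Only one of the candidate keys is found; which one depends on the order.
    """
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, fds) >= set(attrs):
            key.remove(a)
    return key

def synthesize_3nf(attrs, min_cover):
    """Steps (2)-(3) of the combined algorithm; step (1) is assumed done."""
    groups = {}
    for lhs, rhs in min_cover:                      # step (2): group by lhs X
        groups.setdefault(frozenset(lhs), set()).update(rhs)
    schemas = [set(lhs) | rhs for lhs, rhs in groups.items()]
    if not any(closure(s, min_cover) >= set(attrs) for s in schemas):
        schemas.append(find_key(attrs, min_cover))  # step (3)
    return schemas

STOCK = set("MSPCNY")
STOCK_COVER = [({"M", "S"}, {"C"}), ({"M"}, {"N"}),
               ({"S"}, {"Y"}), ({"N", "Y"}, {"P"})]
print(sorted(find_key(STOCK, STOCK_COVER)))  # ['M', 'S']
for s in synthesize_3nf(STOCK, STOCK_COVER):
    print(sorted(s))
```

For Stock, {M,S,C} already contains the key {M,S}, so step (3) adds nothing. A schema R(A, B, C) with the single FD A → B shows step (3) at work: {A, B} is not a superkey, so the key {A, C} is added, and the unplaced attribute C ends up in it.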
• Determines only one key out of the possible candidate keys for R
• The key returned depends on the order in which attributes are removed
Multivalued Dependencies
• Semantics: every teacher who teaches a course uses all the texts for that course (independence of Teacher and Text)
• For two or more independent multivalued attributes, every value of one of the attributes must be repeated with every value of the other attribute to keep the relation consistent
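The consistency requirement above can be tested on an extension. The Python sketch below assumes the usual tuple-based reading of X →→ Y: whenever t1 and t2 agree on X, the tuple taking X and Y from t1 and the remaining attributes from t2 must also be present. It uses the Course/Teacher/Text instance of the “Motivation for 4NF” slide; `mvd_holds` is an illustrative name.

```python
def mvd_holds(cols, rows, x, y):
    """X ->> Y on one extension: for t1, t2 agreeing on X, the tuple with
    t1's X and Y values and t2's remaining values must also be present."""
    rows = set(rows)
    pos = {a: i for i, a in enumerate(cols)}
    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                t3 = tuple(t1[pos[a]] if a in x or a in y else t2[pos[a]]
                           for a in cols)
                if t3 not in rows:
                    return False
    return True

COLS = ["Course", "Teacher", "Text"]
CTX = {
    ("Physics", "Green", "Mechanics"), ("Physics", "Green", "Thermodynamics"),
    ("Physics", "Brown", "Mechanics"), ("Physics", "Brown", "Thermodynamics"),
    ("Physics", "Black", "Mechanics"), ("Physics", "Black", "Thermodynamics"),
    ("Math", "White", "Algebra"), ("Math", "White", "Geometry"),
}
print(mvd_holds(COLS, CTX, ["Course"], ["Teacher"]))      # True
missing = CTX - {("Physics", "Black", "Thermodynamics")}
print(mvd_holds(COLS, missing, ["Course"], ["Teacher"]))  # False
```

Dropping a single tuple breaks the MVD, which is why a new Physics teacher forces one insertion per Physics text.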
• In the example, Course →→ Teacher | Text: each course is associated with a set of teachers and with a set of texts, and these sets are independent of each other
• In the example, Course →→ Teacher and Course →→ Text
– always holds according to the MVD definition
• FDs are special cases of MVDs: if X → Y holds, then X →→ Y also holds
Intuition of MVDs
A Sound and Complete Set of Inference Rules for FDs and MVDs
To compute the closure of a set F of functional and multivalued dependencies (F+)
I1 (Reflexivity for FDs) If Y ⊆ X, then X → Y
I2 (Augmentation for FDs) If X → Y, then XZ → YZ
I3 (Transitivity for FDs) If X → Y and Y → Z, then X → Z
Inference Rules for FDs and MVDs
I1 Y ⊆ X ⇒ X → Y
I2 X → Y ⇒ XZ → YZ
I3 X → Y and Y → Z ⇒ X → Z
I4 X →→ Y ⇒ X →→ Z where Z = R − (X ∪ Y)
I5 X →→ Y and Z ⊆ W ⇒ WX →→ YZ
I6 X →→ Y and Y →→ Z ⇒ X →→ (Z − Y)
I7 X → Y ⇒ X →→ Y
I8 X →→ Y and W → Z (for Z ⊆ Y, W ∩ Y = ∅ and W ∩ Z = ∅) ⇒ X → Z
Motivation for 4NF
• A relational schema with nontrivial MVDs is not a good design
• Update anomalies: for a new teacher of Physics, we must insert two tuples
Course Teacher Text
Physics Green Mechanics
Physics Green Thermodynamics
Physics Brown Mechanics
Physics Brown Thermodynamics
Physics Black Mechanics
Physics Black Thermodynamics
Math White Algebra
Math White Geometry
• This relation represents two independent 1:N relationships
Course Teacher
Physics Green
Physics Brown
Physics Black
Math White
Course Text
Physics Mechanics
Physics Thermodynamics
Math Algebra
Math Geometry
Definition of 4NF
• A relation schema R is in 4NF w.r.t. a set F of FDs and MVDs if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey of R
• Since every FD is an MVD, 4NF implies BCNF
• In other words, a relation is in 4NF if it is in BCNF and if every nontrivial MVD is also an FD
• If all dependencies in F are FDs, the definition of 4NF reduces to that of BCNF
• Although many relations in BCNF but not in 4NF are all-key (they have no FD), this is not necessarily so
Decomposition in 4NF
• Given an MVD X →→ Y that holds in a schema R, the decomposition into R1 = (X ∪ Y) and R2 = (R − Y) has the nonadditive-join property
• The converse also holds
• Thus, a decomposition D = {R1, R2} of R has the nonadditive-join property with respect to F if and only if either:
– (R1 ∩ R2) →→ (R1 − R2) holds in F+, or
– (R1 ∩ R2) →→ (R2 − R1) holds in F+
• Actually, if one of these holds, so does the other
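The decomposition R1 = (X ∪ Y), R2 = (R − Y) can be sketched in Python as follows, with relations modelled as (attribute list, set of tuples) and hypothetical helper names. Splitting the Course/Teacher/Text instance on Course →→ Teacher yields the two 1:N relations of the 4NF motivation example, and joining them gives back the original; on an instance where the MVD is violated, the rejoin adds a spurious tuple.

```python
def project(cols, rows, attrs):
    """Projection onto attrs (in that order), duplicates removed."""
    idx = [cols.index(a) for a in attrs]
    return list(attrs), {tuple(r[i] for i in idx) for r in rows}

def natural_join(left, right):
    """Natural join on the attributes both relations share."""
    (lc, lr), (rc, rr) = left, right
    common = [a for a in lc if a in rc]
    extra = [a for a in rc if a not in lc]
    return lc + extra, {
        l + tuple(r[rc.index(a)] for a in extra)
        for l in lr for r in rr
        if all(l[lc.index(a)] == r[rc.index(a)] for a in common)
    }

def split_on_mvd(cols, rows, x, y):
    """Decompose R on X ->> Y into R1 = X ∪ Y and R2 = R - Y."""
    r1 = [a for a in cols if a in x or a in y]
    r2 = [a for a in cols if a not in y]
    return project(cols, rows, r1), project(cols, rows, r2)

COLS = ["Course", "Teacher", "Text"]
CTX = {
    ("Physics", "Green", "Mechanics"), ("Physics", "Green", "Thermodynamics"),
    ("Physics", "Brown", "Mechanics"), ("Physics", "Brown", "Thermodynamics"),
    ("Physics", "Black", "Mechanics"), ("Physics", "Black", "Thermodynamics"),
    ("Math", "White", "Algebra"), ("Math", "White", "Geometry"),
}
course_teacher, course_text = split_on_mvd(COLS, CTX, ["Course"], ["Teacher"])
rejoined = natural_join(course_teacher, course_text)
print(sorted(course_teacher[1]))
print(rejoined[1] == CTX)  # True
```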
A Sufficient Condition for Testing 4NF
• Given a relation schema R(U), a subset C of U is a cut if every key of R has a nonempty intersection with C and a nonempty intersection with U − C
EmpProj
Emp Proj Loc
Smith P1 FL
Smith P2 CA
Smith P3 AZ
Walton P1 CA
Walton P2 AZ
• Assume that EmpProj has the keys {Emp,Proj} (an employee works for a project in only one location) and {Emp,Loc} (an employee works in one location for only one project)
• In EmpProj there is only one cut, up to complementation: {Proj,Loc} (equivalently, its complement {Emp})
• Suppose that another key is added: {Proj,Loc} (given a project and a location, only one employee is attached to them)
• Now, relation EmpProj has no cut
Corollary: If a relation schema is in BCNF and has a simple (noncomposite) key, then it is in 4NF (a relation with a simple key has no cut)
Embedded Multivalued Dependencies
• FDs are preserved when adding or suppressing an attribute (provided it is not involved in the FD)
• On the contrary, some MVDs are expected to hold after projection but are not explicit as MVDs before the projection
Regist
Course Stud Preq Year
CS402 Jones CS311 1988
CS402 Smith CS401 1989
FD: {Stud,Preq} → Year
• Course →→ Stud does not hold: (CS402, Jones, CS401, 1989) ∉ Regist
Regist1
Stud Preq Year
Regist2
Course Stud Preq
• Course →→ Stud (and Course →→ Preq) hold in Regist2: every student enrolled in a course is required to have taken each prerequisite for the course
• The corresponding constraint in the original relation Regist is an embedded multivalued dependency
• It is written Course →→ Stud|Preq, meaning that the dependency holds in the projection π_{Course,Stud,Preq}(Regist)
• Regist2 is not in 4NF, and it should be decomposed into relations (Course, Stud) and (Course, Preq)
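Returning to the cut definition of the sufficient condition for 4NF, cuts can be enumerated by brute force over subsets of U. The Python sketch below (illustrative name `cuts`) relies on one observation that follows from the definition: C is a cut iff U − C is, so each complementary pair is reported once, with the larger side tried first.

```python
from itertools import combinations

def cuts(attrs, keys):
    """Cuts C of R(U): every key meets both C and U - C.

    The definition is symmetric in C and U - C, so each complementary
    pair is reported once (the larger side is tried first).
    """
    attrs = set(attrs)
    found = []
    for size in range(len(attrs) - 1, 0, -1):
        for c in map(set, combinations(sorted(attrs), size)):
            if attrs - c in found:
                continue  # complement already reported
            if all(k & c and k & (attrs - c) for k in keys):
                found.append(c)
    return found

U = {"Emp", "Proj", "Loc"}
print(cuts(U, [{"Emp", "Proj"}, {"Emp", "Loc"}]))                    # the {Proj, Loc} cut
print(cuts(U, [{"Emp", "Proj"}, {"Emp", "Loc"}, {"Proj", "Loc"}]))  # []
```

With a single-attribute key, no subset can meet that key on both sides, which is the reason behind the corollary on simple keys.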
Join Dependencies
• There are relations where a nonadditive-join decomposition can only be realized with more than two relation schemas
Supply
Supplier Part Proj
Smith Bolt ProjX
Smith Nut ProjY
Adamsky Bolt ProjY
Smith Bolt ProjY
• If a supplier s supplies part p, a project j uses part p, and the supplier s supplies at least one part to project j, then supplier s also supplies part p to project j
• The relation is “all key” and involves no nontrivial FDs or MVDs ⇒ it is in 4NF
Join Dependencies
• The constraint in the schema is equivalent to saying
if ⟨s1, p1, j2⟩, ⟨s2, p1, j1⟩, ⟨s1, p2, j1⟩ appear in Supply
then ⟨s1, p1, j1⟩ also appears in Supply
• Supply satisfies the join dependency JD({Supplier,Part}, {Part,Proj}, {Supplier,Proj}), i.e.
Supply = Supply[Supplier,Part] ⋈ Supply[Part,Proj] ⋈ Supply[Supplier,Proj]
• A join dependency JD(R1, R2, ..., Rn) on a relation R specifies that every instance of R has a nonadditive-join decomposition into R1, R2, ..., Rn
• An MVD is a special case of a JD where n = 2
• A JD(R1, R2, ..., Rn) on R is a trivial JD if some Ri = R
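Whether an extension satisfies a JD can be checked by projecting and rejoining. The Python sketch below, with hypothetical helper names and relations modelled as (attribute list, set of tuples), confirms that the Supply instance equals the join of its three binary projections, while no join of just two of them gives it back.

```python
from functools import reduce

def project(cols, rows, attrs):
    """Projection onto attrs (in that order), duplicates removed."""
    idx = [cols.index(a) for a in attrs]
    return list(attrs), {tuple(r[i] for i in idx) for r in rows}

def natural_join(left, right):
    """Natural join on the attributes both relations share."""
    (lc, lr), (rc, rr) = left, right
    common = [a for a in lc if a in rc]
    extra = [a for a in rc if a not in lc]
    return lc + extra, {
        l + tuple(r[rc.index(a)] for a in extra)
        for l in lr for r in rr
        if all(l[lc.index(a)] == r[rc.index(a)] for a in common)
    }

def jd_holds(cols, rows, components):
    """JD(R1, ..., Rn) on this extension: joining all projections gives back rows."""
    parts = [project(cols, rows, list(c)) for c in components]
    jcols, jrows = reduce(natural_join, parts)
    # put the joined rows back into the original column order before comparing
    return {tuple(r[jcols.index(a)] for a in cols) for r in jrows} == set(rows)

COLS = ["Supplier", "Part", "Proj"]
SUPPLY = {("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
          ("Adamsky", "Bolt", "ProjY"), ("Smith", "Bolt", "ProjY")}
SP, PJ, SJ = ["Supplier", "Part"], ["Part", "Proj"], ["Supplier", "Proj"]
print(jd_holds(COLS, SUPPLY, [SP, PJ, SJ]))  # True
print(jd_holds(COLS, SUPPLY, [SP, PJ]))      # False
```

The two-way join of SP and PJ, for instance, produces the spurious tuple (Adamsky, Bolt, ProjX), which the third projection filters out.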
Fifth Normal Form (5NF)
• A relation schema R is in 5NF w.r.t. a set F of FDs, MVDs, and JDs if for every nontrivial JD(R1, R2, ..., Rn), each Ri is a superkey
Supplier Part Proj
⇓ 5NF normalization
Supplier Part
Supplier Proj
Part Proj
• 5NF is also called PJNF (project-join normal form)
• Since an MVD is a special case of a JD, every relation in 5NF is also in 4NF
• Every relation can be losslessly decomposed into 5NF relations
• If a relation is in 3NF and all its keys are simple, then it is in 5NF
• Discovering JDs in practice for large databases is difficult
• Many obvious join dependencies are based on keys
• Further decomposition ultimately leads to irreducible relations
Critique of Relational Normalization
(1) Normal forms are easier to understand and appreciate from a richer point of view on data modeling, namely, ER or OO
(2) Practical relevance of normal forms has been overemphasized
(1) Normal forms are easier to understand and appreciate from a richer point of view on data modeling, namely, ER or OO
• Particularly striking for the “higher” normal forms (4NF, 5NF)
• MVDs are not stable when relation schemas are modified (leading to embedded dependencies) ⇒ the complexity of MVDs is largely a relational problem
• A “complex” normal form like 4NF is better analyzed in ER terms than just with the multivalued dependency ⇒ there are various interpretations for an MVD
– multivalued attribute
– grouping of independent facts within one relation
– integrity constraint in a genuine relationship
• ER-based design methodologies start with entities and relationships observed in the real world, and their systematic translation into relations produces 3NF (or higher) most of the time
• But why start with a complex description in the first place, only to have to normalize it afterwards?
• Another reason for favoring large relation schemas was to minimize the number of joins in access programs
• Multi-level architectures now permit different schemas at the logical and physical levels
• Normalization was a nice, relatively easy piece of relational theory (hard to resist for researchers!)
• Normal forms are properties of relations in isolation; if and when (inter-relation) constraints are seriously taken care of by DBMSs, normalization as a criterion for the quality of a database schema will lose some of its emphasis
• The definition of the relational model was a “revolution” against “bad” practices that were prevailing in database management before the relational model
• The revolution went too far: 1NF is too restrictive for modeling complex data; 1NF was a simple radical idea
• Decomposition algorithms suppose that all functional dependencies have been specified
• Overlooking FDs may produce undesirable designs
• Not easy to solve in practice: it is better to allow grouping attributes on less formal grounds during conceptual modeling
• Algorithms that require a minimal cover depend on which minimal cover is picked (nondeterminism)
Relational Database Design
• A complex process
• Made simpler by starting with a more expressive schema in a suitable model (ER, OO)
• Principles of translating an ER schema into a relational schema are simple ...
• But faithfully translating an ER schema into a relational schema is an immense task, if done without loss of information
• In practice, even with sophisticated CASE tools, some information will be lost in the translation for the relational schema to remain manageable
• Traditional relational design neglects essential dependencies (inclusion and join dependencies)
• Normalization theory concerns both ER and relational schemas
Thesis
• Relational DB design has overemphasized the importance of normalization theory