Logical Database Design: Unit 7
7.3 First, Second, and Third Normal Forms (1NF, 2NF, 3NF)
Logical Database Design
Logical database design vs. physical database design
Approaches to logical database design:
• Normalization
• Semantic modeling, e.g., the E-R model
Problem of Normalization
• Given some body of data to be represented in a database, how do we decide on a suitable logical structure for it?
• What relations should exist?
• What attributes should they have?
Normalization
Two alternative designs for the suppliers-and-parts data:
S(S#, SNAME, STATUS, CITY)     P(P#, ...)     SP(S#, P#, QTY)
  S1  ...  ...  London
or
S'(S#, SNAME, STATUS)          P(P#, ...)     SP'(S#, CITY, P#, QTY)
  S1  Smith  ...                                S1  London  P1  300
                                                S1  London  P2  200
Fig. 7.1: Normal Forms (nested classes of relations)
1NF relations ⊇ 2NF relations ⊇ 3NF relations ⊇ BCNF relations ⊇ 4NF relations ⊇ 5NF relations
Wei-Pang Yang, Information Management, NDHU
7.2 Functional Dependency
Functional Dependency (FD)
Fully Functional Dependency (FFD)
Functional Dependency
• Def: Given a relation R, R.Y is functionally dependent on R.X iff
each X-value has associated with it precisely one Y-value (at any
time).
• Note: X and Y may be composite attributes.
Notation: R.X → R.Y (R.X functionally determines R.Y)
<e.g.> FD diagram for S: S# → SNAME, S# → STATUS, S# → CITY
<e.g.3> SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400
FD diagram: (S#, P#) → QTY
If X is a candidate key of R, then every attribute Y of R is functionally dependent on X (i.e., X → Y).
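A minimal sketch of the FD definition as a check on one relation instance, assuming rows are held as Python dicts; the helper name `fd_holds` is hypothetical. An instance can refute an FD, but whether X → Y holds "at any time" is a property of what the data means, not of one snapshot.

```python
from __future__ import annotations


def fd_holds(rows: list[dict], x: tuple[str, ...], y: tuple[str, ...]) -> bool:
    """True if no two rows agree on the X attributes but differ on Y."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # Keep the first Y-value seen for this X-value; any other one refutes the FD.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# The SP instance from <e.g.3>
SP = [
    {"S#": s, "P#": p, "QTY": q}
    for s, p, q in [
        ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
        ("S1", "P4", 200), ("S1", "P5", 100), ("S1", "P6", 100),
        ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
        ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
    ]
]

print(fd_holds(SP, ("S#", "P#"), ("QTY",)))  # True: (S#, P#) is the key
print(fd_holds(SP, ("S#",), ("QTY",)))       # False: S1 has several QTY values
```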
Fully Functional Dependency (FFD)
• Def: Y is fully functionally dependent on X iff
(1) Y is FD on X, and
(2) Y is not FD on any proper subset of X.
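<e.g.> SP'(S#, CITY, P#, QTY)
S#  CITY    P#  QTY
S1  London  P1  300
S1  London  P2  200
…   …       …   …
• (S#, P#) → QTY is an FFD: neither S# alone nor P# alone determines QTY.
• S# → CITY is an FD.
• (S#, P#) → CITY is an FD but not an FFD, because the proper subset S# already determines CITY.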
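The FFD definition can be checked on an instance in the same way. This sketch assumes SP' holds the same shipments and supplier cities as the FIRST relation on the 1NF slide (the SP' slide shows only its first two rows); `fd_holds` and `ffd_holds` are hypothetical helper names.

```python
from __future__ import annotations

from itertools import combinations


def fd_holds(rows: list[dict], x: tuple[str, ...], y: tuple[str, ...]) -> bool:
    """True if no two rows agree on X but differ on Y (this instance only)."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


def ffd_holds(rows: list[dict], x: tuple[str, ...], y: tuple[str, ...]) -> bool:
    """(1) Y is FD on X, and (2) Y is not FD on any proper subset of X."""
    if not fd_holds(rows, x, y):
        return False
    # Proper subsets of X have sizes 0 .. len(X) - 1 (the empty set included).
    return not any(
        fd_holds(rows, subset, y)
        for k in range(len(x))
        for subset in combinations(x, k)
    )


# Assumed SP' contents: FIRST without STATUS
CITY = {"S1": "London", "S2": "Paris", "S3": "Paris", "S4": "London"}
SHIPMENTS = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S1", "P4", 200), ("S1", "P5", 100), ("S1", "P6", 100),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
    ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
]
SP_PRIME = [{"S#": s, "CITY": CITY[s], "P#": p, "QTY": q} for s, p, q in SHIPMENTS]

print(ffd_holds(SP_PRIME, ("S#", "P#"), ("QTY",)))   # True
print(fd_holds(SP_PRIME, ("S#", "P#"), ("CITY",)))   # True
print(ffd_holds(SP_PRIME, ("S#", "P#"), ("CITY",)))  # False: S# -> CITY already
```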
Normal Forms: 1NF
Def: A relation is in 1NF iff all underlying simple domains contain atomic
values only.
Un-normalized relation:
S#  STATUS  CITY    (P#, QTY)
S1  20      London  {(P1, 300), (P2, 200), ..., (P6, 100)}
S2  10      Paris   {(P1, 300), (P2, 400)}
S3  10      Paris   {(P2, 200)}
S4  20      London  {(P2, 200), (P4, 300), (P5, 400)}
Normalized (1NF): FIRST
S#  STATUS  CITY    P#  QTY
S1  20      London  P1  300
S1  20      London  P2  200
S1  20      London  P3  400
S1  20      London  P4  200
S1  20      London  P5  100
S1  20      London  P6  100
S2  10      Paris   P1  300
S2  10      Paris   P2  400
S3  10      Paris   P2  200
S4  20      London  P2  200
S4  20      London  P4  300
S4  20      London  P5  400
Key: (S#, P#)
Suppose:
1. CITY is the location of the supplier's main office.
2. STATUS is determined by CITY (CITY → STATUS).
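Normalizing into FIRST amounts to flattening the repeating (P#, QTY) group so that every value is atomic. A sketch in Python, assuming the "..." in S1's un-normalized row stands for the P3 to P5 shipments listed in FIRST; `to_first` is a hypothetical name.

```python
# Un-normalized relation: each supplier carries a set of (P#, QTY) pairs.
UNNORMALIZED = [
    ("S1", 20, "London", [("P1", 300), ("P2", 200), ("P3", 400),
                          ("P4", 200), ("P5", 100), ("P6", 100)]),
    ("S2", 10, "Paris", [("P1", 300), ("P2", 400)]),
    ("S3", 10, "Paris", [("P2", 200)]),
    ("S4", 20, "London", [("P2", 200), ("P4", 300), ("P5", 400)]),
]


def to_first(unnormalized):
    """Emit one atomic (S#, STATUS, CITY, P#, QTY) tuple per repeating-group entry."""
    return [
        (s, status, city, p, qty)
        for s, status, city, shipments in unnormalized
        for p, qty in shipments
    ]


FIRST = to_first(UNNORMALIZED)
print(len(FIRST))  # 12
print(FIRST[0])    # ('S1', 20, 'London', 'P1', 300)
```

Each (S#, P#) pair occurs once in the result, which is why (S#, P#) serves as the key.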
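Normal Forms: 2NF
Def: A relation is in 2NF iff it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
FIRST is not in 2NF: STATUS and CITY depend on S# alone, not fully on the key (S#, P#). Decompose it into
SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY).
The collection of 2NF relations may contain “more” information than the equivalent 1NF relation.
<e.g.> A supplier with no shipments, such as (S5, 30, Athens), can be recorded in SECOND but not in FIRST, where it would need a P# value for the key.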
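The 2NF step can be sketched as two projections of FIRST plus a natural join over S# that recovers FIRST exactly. The dict/set layout and the name `join_over_s` are assumptions for illustration; the data is the FIRST table above.

```python
# FIRST built from the slide's data: supplier facts plus shipments.
SUPPLIERS = {"S1": (20, "London"), "S2": (10, "Paris"),
             "S3": (10, "Paris"), "S4": (20, "London")}
SHIPMENTS = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S1", "P4", 200), ("S1", "P5", 100), ("S1", "P6", 100),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
    ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
]
FIRST = {(s, *SUPPLIERS[s], p, q) for s, p, q in SHIPMENTS}

# Projections: SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY); sets drop duplicates.
SECOND = {(s, status, city) for s, status, city, _, _ in FIRST}
SP = {(s, p, q) for s, _, _, p, q in FIRST}


def join_over_s(second, sp):
    """Natural join of SECOND and SP on S#."""
    return {(s, status, city, p, q)
            for s, status, city in second
            for s2, p, q in sp if s == s2}


assert join_over_s(SECOND, SP) == FIRST  # no information lost

# A supplier with no shipments fits in SECOND; FIRST would need a P# for its key.
SECOND.add(("S5", 30, "Athens"))
print(len(SECOND))  # 5
```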
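Normal Forms: 3NF
Def: A relation is in 3NF iff it is in 2NF and every non-key attribute is nontransitively dependent on the primary key.
FDs in SECOND:
1. S# → STATUS
2. S# → CITY
3. CITY → STATUS
FDs 2 and 3 cause a transitive dependency: STATUS depends on S# via CITY (FD 1 follows from FDs 2 and 3), so SECOND is not in 3NF.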
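The transitive dependency can be seen mechanically with the attribute-closure computation: starting from {S#} and applying only FDs 2 and 3 already reaches STATUS. A sketch with FDs written as (determinant, dependent) frozenset pairs; `closure` and `fd` are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, add its dependents.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


FD_1 = fd({"S#"}, {"STATUS"})
FD_2 = fd({"S#"}, {"CITY"})
FD_3 = fd({"CITY"}, {"STATUS"})

# Using only FD 2 and FD 3, S# still reaches STATUS (transitively, via CITY).
print(sorted(closure({"S#"}, [FD_2, FD_3])))  # ['CITY', 'S#', 'STATUS']
```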
① Decomposition A: SC(S#, CITY), CS(CITY, STATUS)
② Decomposition B: SC(S#, CITY), SS(S#, STATUS)
③ Decomposition C: SECOND(S#, STATUS, CITY) → { SS(S#, STATUS), CS(CITY, STATUS) }
(SS records S# → STATUS; CS records CITY → STATUS.)
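Whether each decomposition is nonloss can be tested on an instance by projecting and rejoining. A sketch, assuming relations are (heading, set-of-tuples) pairs; the helper names are hypothetical. Supplier S6 (status 10, Rome) is an invented row, consistent with CITY → STATUS: on the slide's four suppliers alone, C happens to rejoin correctly because each STATUS value there occurs with only one city. The check covers lossless join only, not which FDs each decomposition can still enforce.

```python
def project(rel, attrs):
    """Projection onto attrs (duplicates vanish because the body is a set)."""
    heading, body = rel
    idx = [heading.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in body}


def natural_join(r1, r2):
    """Join on all attributes the two headings share."""
    (h1, b1), (h2, b2) = r1, r2
    common = [a for a in h1 if a in h2]
    extra = [a for a in h2 if a not in h1]
    body = {
        t1 + tuple(t2[h2.index(a)] for a in extra)
        for t1 in b1
        for t2 in b2
        if all(t1[h1.index(a)] == t2[h2.index(a)] for a in common)
    }
    return h1 + tuple(extra), body


def lossless(rel, parts):
    """Rejoin the projections and compare with the original (attribute order ignored)."""
    joined = project(rel, parts[0])
    for attrs in parts[1:]:
        joined = natural_join(joined, project(rel, attrs))
    heading = rel[0]
    return set(joined[0]) == set(heading) and project(joined, heading)[1] == rel[1]


SECOND = (("S#", "STATUS", "CITY"), {
    ("S1", 20, "London"), ("S2", 10, "Paris"), ("S3", 10, "Paris"),
    ("S4", 20, "London"),
    ("S6", 10, "Rome"),  # hypothetical row, not from the slides
})

DECOMPOSITIONS = {
    "A": [("S#", "CITY"), ("CITY", "STATUS")],
    "B": [("S#", "CITY"), ("S#", "STATUS")],
    "C": [("S#", "STATUS"), ("CITY", "STATUS")],
}
for name, parts in DECOMPOSITIONS.items():
    print(name, lossless(SECOND, parts))  # A True, B True, C False
```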
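Normal Forms: BCNF
Def: A relation is in BCNF iff every determinant is a candidate key.
<e.g.1> S(S#, SNAME, STATUS, CITY)
FDs: S# → SNAME, SNAME → S#, S# → STATUS, S# → CITY
Assume:
(1) CITY and STATUS are independent (CITY → STATUS does not hold).
(2) SNAME is a candidate key.
The determinants S# and SNAME are both candidate keys, so S is in BCNF (and also in 3NF).
<e.g.2> SSP(S#, SNAME, P#, QTY): 3NF but not BCNF
FDs: (S#, P#) → QTY, (SNAME, P#) → QTY, S# → SNAME, SNAME → S#
S# and SNAME are determinants but are not candidate keys.
Example: 3NF but not BCNF
– S: student, J: subject, T: teacher
SJT(S, J, T)
S      J        T
Smith  Math.    Prof. White
Smith  Physics  Prof. Green
Jones  Math.    Prof. White
Jones  Physics  Prof. Brown
Each teacher teaches only one subject (T → J), and for each subject each student has one teacher ((S, J) → T). T is a determinant but not a candidate key, so SJT is not in BCNF.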
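A BCNF test can be sketched with attribute closure: a relation violates BCNF if some nontrivial FD has a determinant whose closure is not the whole heading, i.e., the determinant is not a (super)key. The FD sets below are assumptions transcribed from the examples: SSP(S#, SNAME, P#, QTY) is read from e.g.2's diagram, and for SJT the usual reading (S, J) → T and T → J is assumed. Helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


def bcnf_violations(heading, fds):
    """Nontrivial FDs whose determinant does not determine the whole heading."""
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(heading)
    ]


S_FDS = [fd({"S#"}, {"SNAME"}), fd({"SNAME"}, {"S#"}),
         fd({"S#"}, {"STATUS"}), fd({"S#"}, {"CITY"})]
SSP_FDS = [fd({"S#", "P#"}, {"QTY"}), fd({"SNAME", "P#"}, {"QTY"}),
           fd({"S#"}, {"SNAME"}), fd({"SNAME"}, {"S#"})]
SJT_FDS = [fd({"S", "J"}, {"T"}), fd({"T"}, {"J"})]  # assumed semantics

print(bcnf_violations({"S#", "SNAME", "STATUS", "CITY"}, S_FDS))  # []
print(len(bcnf_violations({"S#", "SNAME", "P#", "QTY"}, SSP_FDS)))  # 2
print(bcnf_violations({"S", "J", "T"}, SJT_FDS))  # [({'T'}, {'J'})]
```

Checking only the listed FDs is enough for a BCNF test of a single relation with respect to those FDs; a violation's determinant is exactly the "determinant that is not a candidate key" of the definition.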
Un-Normalized Relation
CTX
COURSE   TEACHER                       TEXT
Physics  {Prof. Green, Prof. Brown}    {Basic Mechanics, Principles of Optics}
Math     {Prof. Green}                 {Basic Mechanics, Vector Analysis, Trigonometry}
Meaning of a record: the specified course can be taught by any of the specified teachers and uses all of the specified texts as references.
Assume:
- For a given course, there may be any number of teachers and any number of texts.
- Teachers and texts are independent of each other.
- A given teacher or a given text can be associated with any number of courses.
• Property: if (c, t1, x1) and (c, t2, x2) both appear, then (c, t1, x2) and (c, t2, x1) both appear also!
e.g., c = Physics, t1 = Prof. Green, t2 = Prof. Brown, x1 = Basic Mechanics, x2 = Principles of Optics
• Reason: there is no FD, but there is an MVD!
Normalized CTX:
COURSE   TEACHER      TEXT
Physics  Prof. Green  Basic Mechanics
Physics  Prof. Green  Principles of Optics
Physics  Prof. Brown  Basic Mechanics
Physics  Prof. Brown  Principles of Optics
Math     Prof. Green  Basic Mechanics
Math     Prof. Green  Vector Analysis
Math     Prof. Green  Trigonometry
Intuitively, CTX should be decomposed into CT and CX:
CT:                        CX:
COURSE   TEACHER           COURSE   TEXT
Physics  Prof. Green       Physics  Basic Mechanics
Physics  Prof. Brown       Physics  Principles of Optics
Math     Prof. Green       Math     Basic Mechanics
                           Math     Vector Analysis
                           Math     Trigonometry
COURSE → TEACHER is not an FD (Physics has two teachers), and COURSE → TEXT is not an FD either.
• The decomposition cannot be made on the basis of FDs.
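The CTX case can be checked directly: the projections CT and CX rejoin over COURSE to exactly the normalized CTX, even though COURSE → TEACHER fails as an FD. A sketch using Python sets of tuples, with variable names mirroring the slide.

```python
CTX = {
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Math", "Prof. Green", "Basic Mechanics"),
    ("Math", "Prof. Green", "Vector Analysis"),
    ("Math", "Prof. Green", "Trigonometry"),
}
CT = {(c, t) for c, t, _ in CTX}  # projection on (COURSE, TEACHER)
CX = {(c, x) for c, _, x in CTX}  # projection on (COURSE, TEXT)

# Natural join of CT and CX over COURSE
rejoined = {(c, t, x) for c, t in CT for c2, x in CX if c == c2}
print(rejoined == CTX)  # True: the decomposition is nonloss

# COURSE -> TEACHER is not an FD: one course maps to several teachers.
teachers = {}
for c, t in CT:
    teachers.setdefault(c, set()).add(t)
print(sorted(teachers["Physics"]))  # ['Prof. Brown', 'Prof. Green']
```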
MVD (Multi-Valued Dependencies)
Def: Given R(A, B, C), the multivalued dependency (MVD) R.A →→ R.B holds in R iff the set of B-values matching a given (A-value, C-value) pair in R depends only on the A-value and is independent of the C-value.
<e.g.> COURSE →→ TEACHER, COURSE →→ TEXT
With A = COURSE, B = TEACHER, C = TEXT: the TEACHER set matching (Physics, Basic Mechanics) is {Prof. Green, Prof. Brown}, and it is the same for every text of Physics.
Thm: Given R(A, B, C), the MVD R.A →→ R.B holds iff the MVD R.A →→ R.C also holds.
• Notation: R.A →→ R.B | R.C
<e.g.> COURSE →→ TEACHER | TEXT
<Note> 1. FD is a special case of MVD, i.e., all FDs are also MVDs.
2. MVDs (which are not also FDs) can exist only if the relation R has at least 3 attributes.
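The MVD definition has an equivalent tuple-swapping form: A →→ B holds iff, for any two tuples t1, t2 that agree on A, the tuple taking its A and B values from t1 and its C values from t2 is also in R. The sketch below checks that form on CTX; the function name and the positional-index representation are assumptions.

```python
def mvd_holds(rel, heading, a, b):
    """Check A ->> B on an instance; C is every attribute not in A or B."""
    a_idx = [heading.index(x) for x in a]
    ab = set(a_idx) | {heading.index(x) for x in b}
    for t1 in rel:
        for t2 in rel:
            if all(t1[i] == t2[i] for i in a_idx):
                # A and B values from t1, C values from t2
                t3 = tuple(t1[i] if i in ab else t2[i] for i in range(len(heading)))
                if t3 not in rel:
                    return False
    return True


HEADING = ("COURSE", "TEACHER", "TEXT")
CTX = {
    ("Physics", "Prof. Green", "Basic Mechanics"),
    ("Physics", "Prof. Green", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Math", "Prof. Green", "Basic Mechanics"),
    ("Math", "Prof. Green", "Vector Analysis"),
    ("Math", "Prof. Green", "Trigonometry"),
}

print(mvd_holds(CTX, HEADING, ["COURSE"], ["TEACHER"]))  # True
print(mvd_holds(CTX, HEADING, ["COURSE"], ["TEXT"]))     # True (the theorem's partner MVD)

# Dropping one tuple breaks (c, t1, x1), (c, t2, x2) => (c, t1, x2), (c, t2, x1)
broken = CTX - {("Physics", "Prof. Brown", "Principles of Optics")}
print(mvd_holds(broken, HEADING, ["COURSE"], ["TEACHER"]))  # False
```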
A Surprise
There exist relations that cannot be nonloss-decomposed into two projections, but can be nonloss-decomposed into three or more.
Def: n-decomposable (for some n > 2):
the relation can be nonloss-decomposed into n projections, but not into m projections for any m < n.
<e.g.> SPJ(S#, P#, J#); S: supplier, P: part, J: project.
• Suppose that in the real world:
if (a) Smith supplies monkey wrenches, and
(b) monkey wrenches are used in the Manhattan project, and
(c) Smith supplies the Manhattan project,
then
(d) Smith supplies monkey wrenches to the Manhattan project.
i.e.
If (s1, p1, j2), (s2, p1, j1), (s1, p2, j1) appear in SPJ,
then (s1, p1, j1) also appears in SPJ.
– SPJ has no nontrivial MVD, so it is in 4NF.
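Join of the projections SP(S#, P#) and PJ(P#, J#) over P#:
S#  P#  J#
S1  P1  J2
S1  P1  J1
S1  P2  J1
S2  P1  J2   ← spurious; eliminated by the further join with JS(J#, S#) over (J#, S#)
S2  P1  J1
The result of the three-way join is the ORIGINAL SPJ.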
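The 3-decomposability of SPJ can be verified by brute force: every join of two projections adds a spurious tuple, while joining all three gives back SPJ. A sketch using the four tuples of the ORIGINAL SPJ; SP, PJ and JS name the three binary projections.

```python
SPJ = {("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"), ("S1", "P1", "J1")}

SP = {(s, p) for s, p, _ in SPJ}
PJ = {(p, j) for _, p, j in SPJ}
JS = {(j, s) for s, _, j in SPJ}

# Each pairwise join yields (S#, P#, J#) tuples.
two_way = {
    "SP*PJ": {(s, p, j) for s, p in SP for p2, j in PJ if p == p2},
    "PJ*JS": {(s, p, j) for p, j in PJ for j2, s in JS if j == j2},
    "SP*JS": {(s, p, j) for s, p in SP for j, s2 in JS if s == s2},
}
for name, joined in two_way.items():
    print(name, sorted(joined - SPJ))  # the spurious tuples
# SP*PJ [('S2', 'P1', 'J2')]
# PJ*JS [('S2', 'P2', 'J1')]
# SP*JS [('S1', 'P2', 'J2')]

# Joining the third projection over (J#, S#) removes the spurious tuple.
three_way = {(s, p, j) for s, p, j in two_way["SP*PJ"] if (j, s) in JS}
print(three_way == SPJ)  # True
```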
Note:
1. Discovering all the JDs is a nontrivial operation.
2. The intuitive meaning of a JD may not be obvious.
3. A relation in 4NF but not in 5NF is a pathological case and is likely to be rare in practice.