Normalization

Database Group, Georgia Tech


© Leo Mark 1
What it’s all about

• Given a relation, R, and a set of functional dependencies, F, on R.
• Assume that R is not in a desirable form for enforcing F.
• Decompose relation R into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk, such that R1,..., Rk are in a more desirable form, 3NF or BCNF.
• While decomposing R, make sure to preserve the dependencies, and make sure not to lose information.

Contents

• The Good and the Bad
• Bad database design
– redundancy of fact
– fact clutter
– information loss
– dependency loss
• Good database design
• How to compute with meaning
– functional dependencies - FDs
– Armstrong’s inference rules
– the meaning of a set of FDs
– minimal cover of a set of FDs
• Normal Forms - overview
• 1NF, 2NF, 3NF, BCNF
The Good

The Four Commandments:
• Thou Shalt Commit No Redundancy of Fact
• Thou Shalt Clutter No Facts
• Thou Shalt Preserve Information
• Thou Shalt Preserve Functional Dependencies

Primitive Domains
FLT-SCHEDULE
flt#   weekday   airline  dtime  from  atime  to
DL242  MO WE FR  DELTA    10:40  ATL   12:30  BOS
SK912  SA SU     SAS      12:00  CPH   15:30  JFK
AA242  MO FR     AA       08:00  CHI   10:10  ATL

Attributes must be defined over domains with atomic values

FLT-SCHEDULE
flt#   weekday  airline  dtime  from  atime  to
DL242  MO       DELTA    10:40  ATL   12:30  BOS
DL242  WE       DELTA    10:40  ATL   12:30  BOS
DL242  FR       DELTA    10:40  ATL   12:30  BOS
SK912  SA       SAS      12:00  CPH   15:30  JFK
SK912  SU       SAS      12:00  CPH   15:30  JFK
AA242  MO       AA       08:00  CHI   10:10  ATL
AA242  FR       AA       08:00  CHI   10:10  ATL
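The step from the first table to the second can be sketched in a few lines of Python. This is only an illustration: the list `schedule` and the helper `to_atomic` are hypothetical names, and the weekday column is assumed to be stored as a list of day codes.

```python
# Hypothetical in-memory copy of the first FLT-SCHEDULE table;
# the weekday column holds a list, so it is not atomic.
schedule = [
    ("DL242", ["MO", "WE", "FR"], "DELTA", "10:40", "ATL", "12:30", "BOS"),
    ("SK912", ["SA", "SU"], "SAS", "12:00", "CPH", "15:30", "JFK"),
    ("AA242", ["MO", "FR"], "AA", "08:00", "CHI", "10:10", "ATL"),
]


def to_atomic(rows):
    """Emit one row per (flight, weekday) so that every value is atomic."""
    return [
        (flt, day, airline, dtime, origin, atime, dest)
        for flt, days, airline, dtime, origin, atime, dest in rows
        for day in days
    ]


flat = to_atomic(schedule)
for row in flat:
    print(row)
```

The three original rows expand into the seven rows of the second table, one per flight and weekday.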

Bad Database Design
- redundancy of fact
FLIGHTS
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo-33297
DL242  10/24/00  Delta     t-up-73356
DL242  10/25/00  Delta     o-ge-98722
AA121  10/24/00  American  p-rw-84663
AA121  10/25/00  American  q-yg-98237
AA411  10/22/00  American  h-fe-65748

• redundancy: the airline name is repeated for every row of the same flight
• inconsistency: when the airline name for a flight changes, it must be changed in many places

Bad Database Design
- fact clutter

FLIGHTS
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo-33297
DL242  10/24/00  Delta     t-up-73356
DL242  10/25/00  Delta     o-ge-98722
AA121  10/24/00  American  p-rw-84663
AA121  10/25/00  American  q-yg-98237
AA411  10/22/00  American  h-fe-65748

• insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned?
• deletion anomalies: cancelling AA411 on 10/22/00 makes us lose the fact that it is flown by American.
• update anomalies: if DL242 is now flown by Sabena, we must change the airline in every DL242 row.
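The deletion and update anomalies can be reproduced on the FLIGHTS rows above. A minimal sketch, assuming the table is held as a list of tuples; `airline_of` is a hypothetical helper, not part of the slides.

```python
flights = [
    ("DL242", "10/23/00", "Delta", "k-yo-33297"),
    ("DL242", "10/24/00", "Delta", "t-up-73356"),
    ("DL242", "10/25/00", "Delta", "o-ge-98722"),
    ("AA121", "10/24/00", "American", "p-rw-84663"),
    ("AA121", "10/25/00", "American", "q-yg-98237"),
    ("AA411", "10/22/00", "American", "h-fe-65748"),
]


def airline_of(rows, flt):
    """Return the set of airline names recorded for a flight number."""
    return {airline for f, _date, airline, _plane in rows if f == flt}


# Deletion anomaly: cancelling AA411 on 10/22/00 removes its only row,
# and with it the fact that AA411 is flown by American.
after_cancel = [r for r in flights if (r[0], r[1]) != ("AA411", "10/22/00")]

# Update anomaly: renaming the airline of DL242 touches every DL242 row.
rows_to_update = [r for r in flights if r[0] == "DL242"]

print(airline_of(flights, "AA411"), airline_of(after_cancel, "AA411"))
print(len(rows_to_update))
```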
Bad Database Design
- information loss
FLIGHTS
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo-33297
DL242  10/24/00  Delta     t-up-73356
DL242  10/25/00  Delta     o-ge-98722
AA121  10/24/00  American  p-rw-84663
AA121  10/25/00  American  q-yg-98237
AA411  10/22/00  American  h-fe-65748

FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

DATE-AIRLINE-PLANE
date      airline   plane#
10/23/00  Delta     k-yo-33297
10/24/00  Delta     t-up-73356
10/25/00  Delta     o-ge-98722
10/24/00  American  p-rw-84663
10/25/00  American  q-yg-98237
10/22/00  American  h-fe-65748

Bad Database Design
- information loss
FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

DATE-AIRLINE-PLANE
date      airline   plane#
10/23/00  Delta     k-yo-33297
10/24/00  Delta     t-up-73356
10/25/00  Delta     o-ge-98722
10/24/00  American  p-rw-84663
10/25/00  American  q-yg-98237
10/22/00  American  h-fe-65748

FLIGHTS (rejoined on airline)
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo-33297
DL242  10/24/00  Delta     t-up-73356
DL242  10/25/00  Delta     o-ge-98722
AA121  10/24/00  American  p-rw-84663
AA121  10/25/00  American  q-yg-98237
AA121  10/22/00  American  h-fe-65748
AA411  10/24/00  American  p-rw-84663
AA411  10/25/00  American  q-yg-98237
AA411  10/22/00  American  h-fe-65748

• information loss: we polluted the database with false facts; we can’t find the true facts.
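The spurious rows can be reproduced by projecting FLIGHTS onto the two schemas and joining the projections on their only shared attribute, airline. A minimal sketch with Python sets; the variable names are illustrative.

```python
flights = {
    ("DL242", "10/23/00", "Delta", "k-yo-33297"),
    ("DL242", "10/24/00", "Delta", "t-up-73356"),
    ("DL242", "10/25/00", "Delta", "o-ge-98722"),
    ("AA121", "10/24/00", "American", "p-rw-84663"),
    ("AA121", "10/25/00", "American", "q-yg-98237"),
    ("AA411", "10/22/00", "American", "h-fe-65748"),
}

# Project onto (flt#, airline) and (date, airline, plane#).
flights_airline = {(f, a) for f, _d, a, _p in flights}
date_airline_plane = {(d, a, p) for _f, d, a, p in flights}

# Natural join on the shared attribute, airline.
joined = {
    (f, d, a, p)
    for f, a in flights_airline
    for d, a2, p in date_airline_plane
    if a == a2
}

# Rows the join produces that were never in FLIGHTS.
spurious = joined - flights
print(len(joined), sorted(spurious))
```

The join returns nine rows instead of six. The three extra rows are the false facts, and nothing in the two projections tells them apart from the true ones.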
Bad Database Design
- dependency loss

FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

DATE-AIRLINE-PLANE
date      airline   plane#
10/23/00  Delta     k-yo-33297
10/24/00  Delta     t-up-73356
10/25/00  Delta     o-ge-98722
10/24/00  American  p-rw-84663
10/25/00  American  q-yg-98237
10/22/00  American  h-fe-65748

• dependency loss: we lost the fact that (flt#, date) → plane#

Good Database Design
FLIGHTS-DATE-PLANE
flt#   date      plane#
DL242  10/23/00  k-yo-33297
DL242  10/24/00  t-up-73356
DL242  10/25/00  o-ge-98722
AA121  10/24/00  p-rw-84663
AA121  10/25/00  q-yg-98237
AA411  10/22/00  h-fe-65748

FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

• no redundancy of FACT (!)
• no inconsistency
• no insertion, deletion or update anomalies
• no information loss
• no dependency loss
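The "no information loss" claim can be checked on this instance: joining the two relations on flt# should give back exactly the six original FLIGHTS rows. A minimal sketch; the variable names are illustrative.

```python
flights = {
    ("DL242", "10/23/00", "Delta", "k-yo-33297"),
    ("DL242", "10/24/00", "Delta", "t-up-73356"),
    ("DL242", "10/25/00", "Delta", "o-ge-98722"),
    ("AA121", "10/24/00", "American", "p-rw-84663"),
    ("AA121", "10/25/00", "American", "q-yg-98237"),
    ("AA411", "10/22/00", "American", "h-fe-65748"),
}

# The two relations of the good design, obtained by projection.
flights_date_plane = {(f, d, p) for f, d, _a, p in flights}
flights_airline = {(f, a) for f, _d, a, _p in flights}

# Natural join on the shared attribute, flt#.
rejoined = {
    (f, d, a, p)
    for f, d, p in flights_date_plane
    for f2, a in flights_airline
    if f == f2
}

print(rejoined == flights)
```

Each airline name is now stored once per flight (three rows in FLIGHTS-AIRLINE), and the join reproduces FLIGHTS without spurious rows.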

Functional Dependencies
and Keys
Let X and Y be sets of attributes in R
• Y is functionally dependent on X in R iff for each x ∈ R.X there is precisely one y ∈ R.Y
• Y is fully functionally dependent on X in R if Y is functionally dependent on X and Y is not functionally dependent on any proper subset of X
• We use keys to enforce functional dependencies in relations: X → Y
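The definition can be tested against a concrete relation instance: X → Y holds in the instance if no two rows agree on X and differ on Y. The sketch below uses a hypothetical helper `holds` on the FLIGHTS rows from the earlier slides. Note that an instance can only refute an FD; whether the FD holds in general is a statement about every legal instance.

```python
COLUMNS = ("flt#", "date", "airline", "plane#")
flights = [
    dict(zip(COLUMNS, row))
    for row in [
        ("DL242", "10/23/00", "Delta", "k-yo-33297"),
        ("DL242", "10/24/00", "Delta", "t-up-73356"),
        ("DL242", "10/25/00", "Delta", "o-ge-98722"),
        ("AA121", "10/24/00", "American", "p-rw-84663"),
        ("AA121", "10/25/00", "American", "q-yg-98237"),
        ("AA411", "10/22/00", "American", "h-fe-65748"),
    ]
]


def holds(rows, lhs, rhs):
    """True if each value of lhs is paired with exactly one value of rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, else returns the old y.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(holds(flights, ["flt#"], ["airline"]))          # flt# -> airline
print(holds(flights, ["flt#"], ["plane#"]))           # flt# -> plane#
print(holds(flights, ["flt#", "date"], ["plane#"]))   # (flt#, date) -> plane#
```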

Functional Dependencies
and Keys
FLIGHTS
flt#  date  airline  plane#

The FLIGHTS relation will not allow the FDs to be enforced by keys:
• with flt# as the key: plane# is not determined by flt# alone
• with (flt#, date) as the key: airline is determined by flt# alone, so it is not fully dependent on (flt#, date)

Functional Dependencies
and Keys

Consider the meaning

[Figure: a real-world customer (name, address) mapped to the database attributes cust#, name, address, followed by dependency diagrams over cust#, name, address; the last two are labeled "separate" and "combined".]

Functional Dependencies in the ER-Diagram

[Figure: ER diagram of the airline database, with Airport, Flt Schedule, Flt Instance, Airplane, Customer, and Reservation and their attributes.]

AIRPORT ↔ Airportcode
FLT-SCHEDULE ↔ Flt#
FLT-INSTANCE ↔ (Flt#, Date)
AIRPLANE ↔ Plane#
CUSTOMER ↔ Cust#
RESERVATION ↔ (Cust#, Flt#, Date)
RESERVATION ↔ Ticket#

Airportcode → Name, City, State
Flt# → Airline, Dtime, Atime, Miles, Price, (from) Airportcode, (to) Airportcode
(Flt#, Date) → Flt#, Date, Plane#
(Cust#, Flt#, Date) → Cust#, Flt#, Date, Ticket#, Seat#, CheckInStatus
Ticket# → Cust#, Flt#, Date
Cust# → CustomerName, CustomerAddress, Phone#

AIRPORT
airportcode  name  city  state

FLT-SCHEDULE
flt#  airline  dtime  from-airportcode  atime  to-airportcode  miles  price

FLT-WEEKDAY
flt#  weekday

FLT-INSTANCE
flt#  date  plane#  #avail-seats

AIRPLANE
plane#  plane-type  total-#seats

CUSTOMER
cust#  first  middle  last  phone#  street  city  state  zip

RESERVATION
flt#  date  cust#  seat#  check-in-status  ticket#

How to Compute Meaning
- Armstrong’s inference rules

Rules of the computation:
– Reflexivity: if Y ⊆ X, then X→Y
– Augmentation: if X→Y, then WX→WY
– Transitivity: if X→Y and Y→Z, then X→Z

Derived rules:
– Union: if X→Y and X→Z, then X→YZ
– Decomposition: if X→YZ, then X→Y and X→Z
– Pseudotransitivity: if X→Y and WY→Z, then XW→Z

Armstrong’s Axioms:
– sound: every FD derived from F with the rules holds whenever F holds
– complete: every FD that follows from F can be derived with the rules
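As an illustration of how a derived rule follows from the three basic rules, here is one possible derivation of the Union rule. It uses only augmentation and transitivity, together with the fact that attribute sets absorb duplicates (XX = X).

```latex
\begin{align*}
&1.\ X \to Y   && \text{given} \\
&2.\ X \to Z   && \text{given} \\
&3.\ X \to XY  && \text{augment 1 with } X \text{ (since } XX = X\text{)} \\
&4.\ XY \to YZ && \text{augment 2 with } Y \\
&5.\ X \to YZ  && \text{transitivity on 3 and 4}
\end{align*}
```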

How to Compute Meaning
- the meaning of a set of FDs, F+

umbrella: a collapsible shade consisting of fabric stretched over hinged ribs radiating from a central pole

• Given the ribs of an umbrella, the FDs, what does the whole umbrella, F+, look like?
• Determine each set of attributes, X, that appears on a left-hand side of an FD. Determine the set, X+, the closure of X under F.
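X+ can be computed by repeatedly applying the FDs until nothing new is added. The sketch below is the usual fixed-point loop; the function name `closure` and the encoding of an FD as a pair of Python sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return X+: every attribute determined by attrs under the FDs in fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already covered, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of the FLIGHTS example.
fds = [
    ({"flt#"}, {"airline"}),
    ({"flt#", "date"}, {"plane#"}),
]

print(closure({"flt#"}, fds))
print(closure({"flt#", "date"}, fds))
```

Here {flt#}+ contains only flt# and airline, while {flt#, date}+ contains all four attributes, which is why (flt#, date) is a key of FLIGHTS and flt# is not.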

How to Compute Meaning
when do sets of FDs mean the same?

• F covers E if every FD in E is also in F+
• F and E are equivalent (F ≡ E) if F covers E and E covers F.
• We can determine whether F covers E by calculating X+ with respect to F for each FD X→Y in E, and then checking whether this X+ includes the attributes in Y. If this is the case for every FD in E, then F covers E.
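The cover test translates directly into code: for each X→Y in E, compute X+ under F and check that it contains Y. A minimal sketch; `closure`, `covers`, and `equivalent` are illustrative names, and the attribute names A, B, C are made up for the example.

```python
def closure(attrs, fds):
    """Return X+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """True if every FD X -> Y in e satisfies Y ⊆ X+ computed under f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(f, e):
    """True if f covers e and e covers f."""
    return covers(f, e) and covers(e, f)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
E = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]  # adds the implied A -> C
G = [({"A"}, {"B"})]                                  # drops B -> C

print(equivalent(F, E), covers(F, G), covers(G, F))
```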
How to Compute Meaning
- minimal cover of a set of FDs

• Is there a minimal set of ribs that will hold the umbrella open?

F is minimal if:
• every dependency in F has a single attribute as right-hand side
• we can’t replace any dependency X→A in F with a dependency Y→A where Y ⊂ X and still have a set of dependencies equivalent to F
• we can’t remove any dependency from F and still have a set of dependencies equivalent to F
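The three conditions suggest a three-step procedure: split right-hand sides, shrink left-hand sides, then drop redundant dependencies. The sketch below is one common way to do this, not necessarily the procedure the slides have in mind; `minimal_cover` is a hypothetical name, and a set of FDs can have more than one minimal cover, so the result depends on processing order.

```python
def closure(attrs, fds):
    """Return X+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: shrink each left-hand side while the set stays equivalent.
    for i in range(len(g)):
        lhs, rhs = g[i]
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, rhs)
    # Step 3: drop dependencies that the remaining ones already imply.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        lhs, rhs = g[i]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g


# Made-up example: A -> BC, B -> C, AB -> C.
f = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"})]
cover = minimal_cover(f)
print(cover)
```

For this input the sketch keeps only A → B and B → C; A → C and AB → C are both implied by those two.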
How to guarantee
lossless joins

R1 ⋈ R2 = R

• Decompose relation, R, with functional dependencies, F, into relations, R1 and R2, with attributes, A1 and A2, and associated functional dependencies, F1 and F2.
• The decomposition is lossless iff:
  – A1∩A2 → A1\A2 is in F+, or
  – A1∩A2 → A2\A1 is in F+
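The condition can be checked with an attribute closure: A1∩A2 → A1\A2 is in F+ exactly when A1\A2 ⊆ (A1∩A2)+. A minimal sketch for the two FLIGHTS decompositions from the earlier slides; `closure` and `lossless` are illustrative names.

```python
def closure(attrs, fds):
    """Return X+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless(a1, a2, fds):
    """Binary lossless-join test: the shared attributes must determine
    everything else in at least one of the two relations."""
    a1, a2 = set(a1), set(a2)
    shared_closure = closure(a1 & a2, fds)
    return (a1 - a2) <= shared_closure or (a2 - a1) <= shared_closure


fds = [({"flt#"}, {"airline"}), ({"flt#", "date"}, {"plane#"})]

# Good design: FLIGHTS-DATE-PLANE and FLIGHTS-AIRLINE share flt#.
good = lossless({"flt#", "date", "plane#"}, {"flt#", "airline"}, fds)
# Bad design: FLIGHTS-AIRLINE and DATE-AIRLINE-PLANE share only airline.
bad = lossless({"flt#", "airline"}, {"date", "airline", "plane#"}, fds)

print(good, bad)
```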

How to guarantee
preservation of FDs

F+ = (F1 ∪ ... ∪ Fk)+

• Decompose relation, R, with functional dependencies, F, into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk.
• The decomposition is dependency preserving iff:
  – F+ = (F1 ∪ ... ∪ Fk)+
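One common way to test this without materializing F+ is to check, for each X→Y in F, whether Y can be reached from X using only closures restricted to the individual schemas. The sketch below assumes each Fi is the projection of F+ onto Ri; `preserves` is a hypothetical name.

```python
def closure(attrs, fds):
    """Return X+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(schemas, fds):
    """True if every FD in fds follows from the FDs projected onto the schemas."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for r in schemas:
                # What the part of z inside r determines, restricted to r.
                gained = closure(z & set(r), fds) & set(r)
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


fds = [({"flt#"}, {"airline"}), ({"flt#", "date"}, {"plane#"})]

good = preserves([{"flt#", "date", "plane#"}, {"flt#", "airline"}], fds)
bad = preserves([{"flt#", "airline"}, {"date", "airline", "plane#"}], fds)

print(good, bad)
```

The second decomposition fails because (flt#, date) → plane# cannot be checked inside either relation, which matches the dependency loss shown on the earlier slide.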

Overview of NFs
NF2
1NF
2NF
3NF
BCNF

Normal Forms
- definitions
• NF2: non-first normal form
• 1NF: R is in 1NF iff all domain values are atomic
• 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
• 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
• BCNF: R is in BCNF iff every determinant is a candidate key

• Determinant: an attribute (or set of attributes) on which some other attribute is fully functionally dependent.
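The BCNF definition is often tested in an equivalent form: for every nontrivial FD X→Y given for R, X must be a superkey, that is, X+ must contain all attributes of R. The sketch below applies that form to the FLIGHTS relations from the earlier slides; `is_bcnf` is a hypothetical name, and the test is meant for a relation together with the FDs stated for it.

```python
def closure(attrs, fds):
    """Return X+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attrs, fds):
    """True if every nontrivial FD has a superkey as its left-hand side."""
    return all(
        set(rhs) <= set(lhs) or closure(lhs, fds) >= set(attrs)
        for lhs, rhs in fds
    )


# FLIGHTS: the determinant flt# is not a candidate key.
flights_ok = is_bcnf(
    {"flt#", "date", "airline", "plane#"},
    [({"flt#"}, {"airline"}), ({"flt#", "date"}, {"plane#"})],
)
# The two relations of the good design.
airline_ok = is_bcnf({"flt#", "airline"}, [({"flt#"}, {"airline"})])
plane_ok = is_bcnf({"flt#", "date", "plane#"}, [({"flt#", "date"}, {"plane#"})])

print(flights_ok, airline_ok, plane_ok)
```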
Example of Normalization

FLT-INSTANCE
flt#  date  plane#  airline  from  to  miles

(flt#, date) → plane#
flt# → airline, from, to
(from, to) → miles

Example of Normalization
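1NF:
(flt#, date, plane#, airline, from, to, miles) with (flt#, date) → plane#, flt# → airline, from, to, and (from, to) → miles

2NF:
(flt#, date, plane#) with (flt#, date) → plane#
(flt#, airline, from, to, miles) with flt# → airline, from, to and (from, to) → miles

3NF & BCNF:
(flt#, airline, from, to) with flt# → airline, from, to
(flt#, date, plane#) with (flt#, date) → plane#
(from, to, miles) with (from, to) → miles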
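The move from 1NF to 2NF removes nonkey attributes that depend on only part of the key. The sketch below finds such partial dependencies for FLT-INSTANCE. It assumes that (flt#, date) is the only candidate key and that the FDs are (flt#, date) → plane#, flt# → airline, from, to, and (from, to) → miles; `partial_dependencies` is a hypothetical name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return X+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attrs, fds):
    """Map each nonkey attribute to a proper subset of the key that determines it."""
    found = {}
    nonkey = set(attrs) - set(key)
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for a in sorted(nonkey & closure(part, fds)):
                found.setdefault(a, set(part))
    return found


attrs = {"flt#", "date", "plane#", "airline", "from", "to", "miles"}
fds = [
    ({"flt#", "date"}, {"plane#"}),
    ({"flt#"}, {"airline", "from", "to"}),
    ({"from", "to"}, {"miles"}),
]

partial = partial_dependencies({"flt#", "date"}, attrs, fds)
print(partial)
```

airline, from, to, and miles depend on flt# alone, so they move to their own relation in 2NF. plane# stays with (flt#, date). The remaining transitive dependency of miles on flt# through (from, to) is what the 3NF step removes.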
3NF that is not BCNF
R
A  B  C

{A,B} → C
C → B

Candidate keys: {A,B} and {A,C}
Determinants: {A,B} and {C}

A decomposition:
R1      R2
C  B    A  C

Lossless, but not dependency preserving!
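The claims on this slide can be checked mechanically. The sketch below assumes the FDs are {A,B} → C and C → B, as implied by the listed determinants. It finds the candidate keys by brute force and tests 3NF, BCNF, the lossless-join condition, and dependency preservation. All function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return X+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of attrs (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, fds) == set(attrs) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_bcnf(attrs, fds):
    return all(set(r) <= set(l) or closure(l, fds) >= set(attrs) for l, r in fds)


def is_3nf(attrs, fds):
    """Each FD has a superkey on the left or only prime attributes on the right."""
    prime = set().union(*candidate_keys(attrs, fds))
    return all(
        closure(l, fds) >= set(attrs) or (set(r) - set(l)) <= prime for l, r in fds
    )


def lossless(a1, a2, fds):
    shared = closure(set(a1) & set(a2), fds)
    return (set(a1) - set(a2)) <= shared or (set(a2) - set(a1)) <= shared


def preserves(schemas, fds):
    for lhs, rhs in fds:
        z, changed = set(lhs), True
        while changed:
            changed = False
            for r in schemas:
                gained = closure(z & set(r), fds) & set(r)
                if not gained <= z:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


R = {"A", "B", "C"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
R1, R2 = {"C", "B"}, {"A", "C"}

keys = candidate_keys(R, fds)
print(keys, is_3nf(R, fds), is_bcnf(R, fds))
print(lossless(R1, R2, fds), preserves([R1, R2], fds))
```

R is in 3NF because every attribute is prime, but C → B violates BCNF. The split into R1 and R2 is lossless because C determines B, yet {A,B} → C can no longer be checked inside a single relation.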
Major Results in
Normalization Theory
Theorem:
• There is an algorithm for testing a decomposition for lossless join with respect to a set of FDs
Theorem:
• There is an algorithm for testing a decomposition for dependency preservation
Theorem:
• There is an algorithm for lossless join decomposition into BCNF
Theorem:
• There is an algorithm for dependency preserving decomposition into 3NF
