Normalization

Normalization is a set of rules for structuring tables and databases, presented as a sequence of normal forms. It includes concepts such as functional dependencies, transitive and partial dependencies, and various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF) to eliminate anomalies in database design. The document explains these concepts with examples and emphasizes the importance of maintaining data integrity and reducing redundancy.

Uploaded by

amanrangari2003

NORMALIZATION

Objective
• Normalization presents a set of rules that tables and databases must follow to be well structured.
• Historically presented as a sequence of normal forms.
Functional Dependencies
We say an attribute B has a functional dependency on another attribute A if any two records that have the same value for A must also have the same value for B. We write this as:
A → B

Example: Suppose we keep track of employee email addresses, and we only track one email address for each employee. Suppose each employee is identified by their unique employee number. We say there is a functional dependency of email address on employee number:

employee number → email address
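
As a rough illustration of the definition above, the sketch below tests whether a set of records violates a functional dependency. The function name `holds_fd` and the sample rows are hypothetical, and the email values are placeholders. Note that data can only show that an FD is not violated; whether it holds as a rule is a design decision.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows (placeholder email addresses)
employees = [
    {"EmpNum": 123, "EmpEmail": "a@example.com", "EmpLname": "Doe"},
    {"EmpNum": 456, "EmpEmail": "b@example.com", "EmpLname": "Smith"},
    {"EmpNum": 633, "EmpEmail": "c@example.com", "EmpLname": "Doe"},
]

print(holds_fd(employees, ["EmpNum"], ["EmpEmail"]))  # True
print(holds_fd(employees, ["EmpLname"], ["EmpNum"]))  # False: two employees named Doe
```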

91.2914 3
Functional Dependencies
EmpNum EmpEmail EmpFname EmpLname
123 [email protected] John Doe
456 [email protected] Peter Smith
555 [email protected] Alan Lee
633 [email protected] Peter Doe
787 [email protected] Alan Lee

If EmpNum is the primary key (PK), then the following FDs must hold:

EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname

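Functional Dependencies
Three different ways you might see FDs depicted:

EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname

[Diagram: EmpNum with one arrow each to EmpEmail, EmpFname, and EmpLname]

[Diagram: the table heading EmpNum, EmpEmail, EmpFname, EmpLname with arrows drawn from EmpNum to each of the other columns]
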
Determinant
Functional dependency:

EmpNum → EmpEmail

The attribute on the LHS is known as the determinant.

• EmpNum is a determinant of EmpEmail

Transitive dependency
Consider attributes A, B, and C, where A → B and B → C.
Functional dependencies are transitive, which means that we also have the functional dependency A → C.
We say that C is transitively dependent on A through B.

Transitive dependency
Relation: EmpNum, EmpEmail, DeptNum, DeptName

EmpNum → DeptNum
DeptNum → DeptName

DeptName is transitively dependent on EmpNum via DeptNum:

EmpNum → DeptName

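Transitivity can be applied mechanically. The sketch below computes the closure of a set of attributes, that is, everything they determine directly or transitively. The function name `closure` and the representation of FDs as pairs of sets are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"EmpNum"}, {"EmpEmail"}),
    ({"EmpNum"}, {"DeptNum"}),
    ({"DeptNum"}, {"DeptName"}),
]

print(sorted(closure({"EmpNum"}, fds)))
# ['DeptName', 'DeptNum', 'EmpEmail', 'EmpNum']
print(sorted(closure({"DeptNum"}, fds)))
# ['DeptName', 'DeptNum']
```

DeptName appears in the closure of EmpNum even though no listed FD connects them directly, which is the transitive dependency EmpNum → DeptName.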
Partial dependency
A partial dependency exists when an attribute B is functionally dependent on an attribute A, and A is a component of a multipart candidate key.

InvNum LineNum Qty InvDate

Candidate key: {InvNum, LineNum}. InvDate is partially dependent on {InvNum, LineNum}, as InvNum is a determinant of InvDate and InvNum is part of a candidate key.

First Normal Form
We say a relation is in 1NF if all values stored in the relation are single-valued and atomic.

1NF places restrictions on the structure of relations. Values must be simple.

First Normal Form
• A table is in first normal form iff
  • the domain of each attribute contains only atomic values, and
  • the value of each attribute contains only a single value from that domain.

In layman's terms, it means every column of your table should only contain single values.
Example
• For a library

Patron ID   Borrowed books
C45         B33, B44, B55
C12         B56
1NF Solution
Patron ID   Borrowed book
C45         B33
C45         B44
C45         B55
C12         B56
Example
• For an airline

Flight   Weekdays
UA59     Mo We Fr
UA73     Mo Tu We Th Fr
1NF Solution
Flight   Weekday
UA59     Mo
UA59     We
UA59     Fr
UA73     Mo
UA73     Tu
…        …
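The conversion above can be sketched in a few lines. The helper name `to_1nf` is hypothetical, and the sketch assumes the multi-valued column is a space-separated string, as in the airline table.

```python
def to_1nf(rows, column, new_name):
    """Split a space-separated multi-valued column into one row per value."""
    result = []
    for row in rows:
        for value in row[column].split():
            # Copy every other column unchanged, then add the single value
            new_row = {k: v for k, v in row.items() if k != column}
            new_row[new_name] = value
            result.append(new_row)
    return result


flights = [
    {"Flight": "UA59", "Weekdays": "Mo We Fr"},
    {"Flight": "UA73", "Weekdays": "Mo Tu We Th Fr"},
]

for row in to_1nf(flights, "Weekdays", "Weekday"):
    print(row["Flight"], row["Weekday"])
```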
Implication for the ER model
• Watch for entities that can have multiple values for the same attribute
  • Phone numbers, …
• What about course schedules?
  • MW 5:30-7:00pm
  • Can treat them as atomic time slots
Functional dependency
Let X and Y be sets of attributes in a table T
• Y is functionally dependent on X in T iff for each set x ∈ T.X there is precisely one corresponding set y ∈ T.Y
• Y is fully functionally dependent on X in T if Y is functionally dependent on X and Y is not functionally dependent on any proper subset of X
Example
• Book table

BookNo   Title       Author        Year
B1       Moby Dick   H. Melville   1851
B2       Lincoln     G. Vidal      1984

Author attribute is:
• functionally dependent on the pair {BookNo, Title}
• fully functionally dependent on BookNo
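The difference between the two statements above can be checked from a list of FDs. This sketch assumes a single FD, BookNo → {Title, Author, Year}, which the slide implies but does not list. The names `closure` and `is_fully_dependent` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_fully_dependent(x, y, fds):
    """True if x determines y and no proper subset of x does."""
    x, y = set(x), set(y)
    if not y <= closure(x, fds):
        return False
    for size in range(len(x)):  # sizes 0 .. len(x) - 1: proper subsets only
        for subset in combinations(sorted(x), size):
            if y <= closure(subset, fds):
                return False
    return True


# Assumed FD for the Book table
fds = [({"BookNo"}, {"Title", "Author", "Year"})]

print(is_fully_dependent({"BookNo", "Title"}, {"Author"}, fds))  # False
print(is_fully_dependent({"BookNo"}, {"Author"}, fds))           # True
```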
Why it matters
• Table BorrowedBooks

BookNo   Patron      Address             Due
B1       J. Fisher   101 Main Street     3/2/15
B2       L. Perez    202 Market Street   2/28/15

Address attribute is
• functionally dependent on the pair {BookNo, Patron}
• fully functionally dependent on Patron
Problems
• Cannot insert new patrons in the system until they have borrowed books
  • Insertion anomaly
• Must update all rows involving a given patron if he or she moves
  • Update anomaly
• Will lose information about patrons that have returned all the books they have borrowed
  • Deletion anomaly
Second Normal Form
• A table is in 2NF iff
  • it is in 1NF and
  • no non-prime attribute is dependent on any proper subset of any candidate key of the table
• A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table
• A candidate key is a minimal superkey
Example
• Library allows patrons to request books that are currently out

BookNo   Patron      PhoneNo
B3       J. Fisher   555-1234
B2       J. Fisher   555-1234
B2       M. Amer     555-4321
Example
• Candidate key is {BookNo, Patron}
• We have
  • Patron → PhoneNo
• Table is not 2NF
• Potential for
  • Insertion anomalies
  • Update anomalies
  • Deletion anomalies
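The 2NF test can be written out directly from its definition. The sketch below lists every non-prime attribute that depends on a proper subset of a candidate key. The function names are hypothetical, and the candidate keys are supplied by hand rather than derived.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attributes, candidate_keys, fds):
    """Return (key_part, attribute) pairs that violate 2NF."""
    prime = set().union(*candidate_keys)
    non_prime = set(attributes) - prime
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):  # proper, non-empty subsets of the key
            for subset in combinations(sorted(key), size):
                for attr in sorted(closure(subset, fds) & non_prime):
                    found.append((subset, attr))
    return found


attributes = {"BookNo", "Patron", "PhoneNo"}
candidate_keys = [{"BookNo", "Patron"}]
fds = [({"Patron"}, {"PhoneNo"})]

print(partial_dependencies(attributes, candidate_keys, fds))
# [(('Patron',), 'PhoneNo')]
```

An empty result would mean that no partial dependency was found, so a table already in 1NF would also be in 2NF with respect to the FDs given.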
2NF Solution
• Put telephone number in separate Patron table

BookNo   Patron
B3       J. Fisher
B2       J. Fisher
B2       M. Amer

Patron      PhoneNo
J. Fisher   555-1234
M. Amer     555-4321
Third Normal Form
• A table is in 3NF iff
  • it is in 2NF and
  • all its attributes are determined only by its candidate keys and not by any non-prime attributes
Example
• Table BorrowedBooks

BookNo   Patron      Address             Due
B1       J. Fisher   101 Main Street     3/2/15
B2       L. Perez    202 Market Street   2/28/15

• Candidate key is BookNo
• Patron → Address
3NF Solution
• Put address in separate Patron table

BookNo   Patron      Due
B1       J. Fisher   3/2/15
B2       L. Perez    2/28/15

Patron      Address
J. Fisher   101 Main Street
L. Perez    202 Market Street
Another example
• Tournament winners

Tournament             Year   Winner           DOB
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 Sept. 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975

• Candidate key is {Tournament, Year}
• Winner → DOB
Boyce-Codd Normal Form
• Stricter form of 3NF
• A table T is in BCNF iff
  • for every one of its non-trivial dependencies X → Y, X is a superkey for T
• Most tables that are in 3NF also are in BCNF
Example
Manager   Project   Branch
Alice     Alpha     Austin
Alice     Delta     Austin
Carol     Alpha     Houston
Dean      Delta     Houston

• We can assume
  • Manager → Branch
  • {Project, Branch} → Manager
Example
Manager   Project   Branch
Alice     Alpha     Austin
Bob       Delta     Houston
Carol     Alpha     Houston
Alice     Delta     Austin

• Not in BCNF because Manager → Branch and Manager is not a superkey
• Will decomposition work?
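With the two dependencies assumed above, the BCNF condition can be tested by computing the closure of each determinant. This is a minimal sketch that only examines the FDs it is given (it does not derive implied ones), and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the listed non-trivial FDs whose left side is not a superkey."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = attributes <= closure(lhs, fds)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


attributes = {"Manager", "Project", "Branch"}
fds = [
    ({"Manager"}, {"Branch"}),
    ({"Project", "Branch"}, {"Manager"}),
]

print(bcnf_violations(attributes, fds))
# [({'Manager'}, {'Branch'})]
```

{Project, Branch} determines every attribute, so it is a superkey and causes no violation. Manager determines only Branch, which is the violation named on the slide.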
A decomposition (I)
Manager   Project
Alice     Alpha
Bob       Delta
Carol     Alpha
Alice     Delta

Manager   Branch
Alice     Austin
Bob       Houston
Carol     Houston

• Two-table solution does not preserve the dependency {Project, Branch} → Manager
A decomposition (II)
Manager   Project
Alice     Alpha
Bob       Delta
Carol     Alpha
Alice     Delta
Dean      Delta

Manager   Branch
Alice     Austin
Bob       Houston
Carol     Houston
Dean      Houston

• Cannot have two or more managers managing the same project at the same branch
• The two tables accept Dean anyway: Bob and Dean now both manage Delta in Houston
Multivalued dependencies
• Assume the column headings in a table are divided into three disjoint groupings X, Y, and Z
• For a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively
Multivalued dependencies
• A multivalued dependency X => Y occurs if
  • for any x_c actually occurring in the table and the list of all the x_c y z combinations that occur in the table, we will find that x_c is associated with the same y entries regardless of z.
• A trivial multivalued dependency X => Y is one where either
  • Y is a subset of X, or
  • Z is empty (X ∪ Y has all column headings)
Fourth Normal Form
• A table is in 4NF iff
  • for every one of its non-trivial multivalued dependencies X => Y, X is either:
    • a candidate key or
    • a superset of a candidate key
Example from Wikipedia
Restaurant      Pizza         DeliveryArea
Pizza Milano    Thin crust    SW Houston
Pizza Milano    Thick crust   SW Houston
Pizza Firenze   Thin crust    NW Houston
Pizza Firenze   Thick crust   NW Houston
Pizza Milano    Thin crust    NW Houston
Pizza Milano    Thick crust   NW Houston
Discussion
• The table has no non-key attributes
  • Key is {Restaurant, Pizza, DeliveryArea}
• Two non-trivial multivalued dependencies
  • Restaurant => Pizza
  • Restaurant => DeliveryArea
  since each restaurant delivers the same pizzas to all its delivery areas
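The claim above can be checked against the table's rows. In this sketch (the function name `holds_mvd` is hypothetical), X => Y holds in the data if, for each x value, every y value seen with it is paired with every z value seen with it.

```python
def holds_mvd(rows, x_cols, y_cols, z_cols):
    """Check X => Y on rows, where Z is the remaining group of columns."""
    groups = {}
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        z = tuple(row[c] for c in z_cols)
        ys, zs, pairs = groups.setdefault(x, (set(), set(), set()))
        ys.add(y)
        zs.add(z)
        pairs.add((y, z))
    # pairs is always a subset of ys x zs, so equal sizes means equal sets
    return all(len(pairs) == len(ys) * len(zs) for ys, zs, pairs in groups.values())


columns = ["Restaurant", "Pizza", "DeliveryArea"]
table = [dict(zip(columns, values)) for values in [
    ("Pizza Milano", "Thin crust", "SW Houston"),
    ("Pizza Milano", "Thick crust", "SW Houston"),
    ("Pizza Firenze", "Thin crust", "NW Houston"),
    ("Pizza Firenze", "Thick crust", "NW Houston"),
    ("Pizza Milano", "Thin crust", "NW Houston"),
    ("Pizza Milano", "Thick crust", "NW Houston"),
]]

print(holds_mvd(table, ["Restaurant"], ["Pizza"], ["DeliveryArea"]))       # True
print(holds_mvd(table[:-1], ["Restaurant"], ["Pizza"], ["DeliveryArea"]))  # False
```

Dropping the last row breaks the dependency because Pizza Milano would then deliver thick crust to SW Houston but not to NW Houston.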
4NF Solution
• Two separate tables

Restaurant      DeliveryArea
Pizza Milano    SW Houston
Pizza Firenze   NW Houston
Pizza Milano    NW Houston

Restaurant      Pizza
Pizza Milano    Thin crust
Pizza Milano    Thick crust
Pizza Firenze   Thin crust
Pizza Firenze   Thick crust
Join dependency
• A table T is subject to a join dependency if it can always be recreated by joining multiple tables each having a subset of the attributes of T
• The join dependency is said to be trivial if one of the tables in the join has all the attributes of the table T
• Notation: *{A, B, …} on T
Fifth normal form
• A table T is said to be 5NF iff
  • every non-trivial join dependency in it is implied by its candidate keys
• A join dependency *{A, B, …, Z} on T is implied by the candidate key(s) of T if and only if each of A, B, …, Z is a superkey for T
An example
Store          Brand     Product
Circuit City   Apple     Tablets
Circuit City   Apple     Phones
Circuit City   Toshiba   Laptops
CompUSA        Apple     Laptops

• Note that Circuit City sells Apple tablets and phones but only Toshiba laptops
A very bad decomposition
Store          Product
Circuit City   Tablets
Circuit City   Phones
Circuit City   Laptops
CompUSA        Laptops

Brand     Product
Apple     Tablets
Apple     Phones
Apple     Laptops
Toshiba   Laptops

• Let's see what happens when we do a natural join
The result of the join
Store          Brand     Product
Circuit City   Apple     Tablets
Circuit City   Apple     Phones
Circuit City   Apple     Laptops
Circuit City   Toshiba   Laptops
CompUSA        Apple     Laptops
CompUSA        Toshiba   Laptops

• Introduces two spurious tuples
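
The join result can be reproduced with a small natural-join helper. This is an illustrative sketch: `natural_join` is a hypothetical name, and rows are modelled as dictionaries keyed by column name.

```python
def natural_join(left, right):
    """Natural join of two lists of dicts on their shared column names."""
    result = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[c] == r[c] for c in shared):
                result.append({**l, **r})
    return result


original = {
    ("Circuit City", "Apple", "Tablets"),
    ("Circuit City", "Apple", "Phones"),
    ("Circuit City", "Toshiba", "Laptops"),
    ("CompUSA", "Apple", "Laptops"),
}

# Project the original table onto (Store, Product) and (Brand, Product)
store_product = [{"Store": s, "Product": p}
                 for s, p in sorted({(s, p) for s, b, p in original})]
brand_product = [{"Brand": b, "Product": p}
                 for b, p in sorted({(b, p) for s, b, p in original})]

joined = {(row["Store"], row["Brand"], row["Product"])
          for row in natural_join(store_product, brand_product)}

print(len(joined))                 # 6
print(sorted(joined - original))
# [('Circuit City', 'Apple', 'Laptops'), ('CompUSA', 'Toshiba', 'Laptops')]
```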


A different table
Store          Brand     Product
Circuit City   Apple     Tablets
Circuit City   Apple     Phones
Circuit City   Apple     Laptops
Circuit City   Toshiba   Laptops
CompUSA        Apple     Laptops

• Assume now that any store carrying a given brand and selling a product that is made by that brand will always carry that product
The same decomposition
Store          Product
Circuit City   Tablets
Circuit City   Phones
Circuit City   Laptops
CompUSA        Laptops

Brand     Product
Apple     Tablets
Apple     Phones
Apple     Laptops
Toshiba   Laptops

• Let's see what happens when we do a natural join
The result of the join
Store          Brand     Product
Circuit City   Apple     Tablets
Circuit City   Apple     Phones
Circuit City   Apple     Laptops
Circuit City   Toshiba   Laptops
CompUSA        Apple     Laptops
CompUSA        Toshiba   Laptops

• Still one spurious tuple


The right decomposition
Store          Product
Circuit City   Tablets
Circuit City   Phones
Circuit City   Laptops
CompUSA        Laptops

Brand     Product
Apple     Tablets
Apple     Phones
Apple     Laptops
Toshiba   Laptops

Store          Brand
Circuit City   Apple
Circuit City   Toshiba
CompUSA        Apple
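A quick check of the three-table decomposition, written as a sketch with Python sets of tuples (the variable names are chosen for this illustration). Joining only the first two projections leaves one spurious tuple; adding the Store/Brand projection removes it.

```python
table = {
    ("Circuit City", "Apple", "Tablets"),
    ("Circuit City", "Apple", "Phones"),
    ("Circuit City", "Apple", "Laptops"),
    ("Circuit City", "Toshiba", "Laptops"),
    ("CompUSA", "Apple", "Laptops"),
}

# The three projections of the table
store_product = {(s, p) for s, b, p in table}
brand_product = {(b, p) for s, b, p in table}
store_brand = {(s, b) for s, b, p in table}

# Natural join of the first two projections on Product
two_way = {(s, b, p)
           for s, p in store_product
           for b, p2 in brand_product
           if p == p2}

# Joining with the third projection keeps only known (Store, Brand) pairs
three_way = {(s, b, p) for s, b, p in two_way if (s, b) in store_brand}

print(sorted(two_way - table))  # [('CompUSA', 'Toshiba', 'Laptops')]
print(three_way == table)       # True
```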
Conclusion
• The first "big" table was in 5NF
• The second table was decomposable
Lossless Decomposition
General Concept
• If R(A, B, C) satisfies A → B
  • We can project it on A,B and A,C without losing information
  • Lossless decomposition
• R = π_AB(R) ⋈ π_AC(R)
  • π_AB(R) is the projection of R on AB
  • ⋈ is the natural join operator
Example
Course   Instructor   Text
4330     Paris        none
4330     Cheng        none
3330     Hillford     Patterson & Hennessy

• Observe that Course → Text
A lossless decomposition
π_{Course, Text}(R)
Course   Text
4330     none
3330     Patterson & Hennessy

π_{Course, Instructor}(R)
Course   Instructor
4330     Paris
4330     Cheng
3330     Hillford
A different case
Course   Instructor   Text
4330     Paris        Silberschatz & Peterson
4330     Cheng        none
3330     Hillford     Patterson & Hennessy

• Now Course ↛ Text (Course no longer determines Text)
• R cannot be decomposed
A lossy decomposition
π_{Course, Text}(R)
Course   Text
4330     none
4330     Silberschatz & Peterson
3330     Patterson & Hennessy

π_{Course, Instructor}(R)
Course   Instructor
4330     Paris
4330     Cheng
3330     Hillford
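Both cases can be verified by projecting and re-joining. In the sketch below (the function name is hypothetical), the decomposition is lossless exactly when the join of the two projections gives back the original rows.

```python
def join_of_projections(rows):
    """Join the (Course, Text) and (Course, Instructor) projections of rows.

    rows is a set of (course, instructor, text) tuples.
    """
    course_text = {(c, t) for c, i, t in rows}
    course_instructor = {(c, i) for c, i, t in rows}
    return {(c, i, t)
            for c, t in course_text
            for c2, i in course_instructor
            if c == c2}


first_case = {
    ("4330", "Paris", "none"),
    ("4330", "Cheng", "none"),
    ("3330", "Hillford", "Patterson & Hennessy"),
}
second_case = {
    ("4330", "Paris", "Silberschatz & Peterson"),
    ("4330", "Cheng", "none"),
    ("3330", "Hillford", "Patterson & Hennessy"),
}

print(join_of_projections(first_case) == first_case)  # True: lossless
print(sorted(join_of_projections(second_case) - second_case))
# [('4330', 'Cheng', 'Silberschatz & Peterson'), ('4330', 'Paris', 'none')]
```

In the second case the join invents two rows, so the original assignment of texts to instructors can no longer be recovered.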
An Example
Normalisation Example
• We have a table representing orders in an online store
• Each row represents an item on a particular order
• Primary key is {Order, Product}
• Columns
  • Order
  • Product
  • Quantity
  • UnitPrice
  • Customer
  • Address
Functional Dependencies
• Each order is for a single customer:
  • Order → Customer
• Each customer has a single address
  • Customer → Address
• Each product has a single price
  • Product → UnitPrice
• As Order → Customer and Customer → Address
  • Order → Address
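The stated primary key can be cross-checked against these FDs by searching for minimal attribute sets whose closure covers every column. This sketch adds one assumption not listed on the slide: {Order, Product} → Quantity, which follows from {Order, Product} being the primary key. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets that determine every attribute."""
    attributes = set(attributes)
    keys = []
    # Try small sets first so that any superset of a found key can be skipped
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


attributes = {"Order", "Product", "Quantity", "UnitPrice", "Customer", "Address"}
fds = [
    ({"Order"}, {"Customer"}),
    ({"Customer"}, {"Address"}),
    ({"Product"}, {"UnitPrice"}),
    ({"Order", "Product"}, {"Quantity"}),  # assumed from the primary key
]

print([sorted(key) for key in candidate_keys(attributes, fds)])
# [['Order', 'Product']]
```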
2NF Solution (I)
• First decomposition
  • First table: Order, Product, Quantity, UnitPrice
  • Second table: Order, Customer, Address
2NF Solution (II)
• Second decomposition
  • First table: Order, Product, Quantity
  • Second table: Order, Customer, Address
  • Third table: Product, UnitPrice
3NF
• In second table (Order, Customer, Address)
  • Customer → Address
• Split second table into
  • Order, Customer
  • Customer, Address
Normalisation to 2NF
• Second normal form means no partial dependencies on candidate keys
  • {Order} → {Customer, Address}
  • {Product} → {UnitPrice}
• To remove the first FD we project over
  {Order, Customer, Address} (R1)
  and
  {Order, Product, Quantity, UnitPrice} (R2)
Normalisation to 2NF
• R1 is now in 2NF, but there is still a partial FD in R2
  {Product} → {UnitPrice}
• To remove this we project over
  {Product, UnitPrice} (R3)
  and
  {Order, Product, Quantity} (R4)
Normalisation to 3NF
• R has now been split into 3 relations: R1, R3, and R4
  • R3 and R4 are in 3NF
  • R1 has a transitive FD on its key
• To remove {Order} → {Customer} → {Address} we project R1 over
  • {Order, Customer}
  • {Customer, Address}
Normalisation
• 1NF:
  • {Order, Product, Customer, Address, Quantity, UnitPrice}
• 2NF:
  • {Order, Customer, Address}, {Product, UnitPrice}, and {Order, Product, Quantity}
• 3NF:
  • {Product, UnitPrice}, {Order, Product, Quantity}, {Order, Customer}, and {Customer, Address}
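The two decompositions can be compared with a closure-based test. For each relation, the sketch reports any attribute set that determines something else in the relation without determining all of it. Strictly this is the BCNF condition, which is stricter than 3NF; for this example the two give the same answer. The FD {Order, Product} → Quantity is an assumption taken from the primary key, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations(relation, fds):
    """Return (X, extra) pairs where X determines extra but is not a superkey."""
    relation = set(relation)
    found = []
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            determined = closure(x, fds) & relation
            # Non-trivial (x < determined) and not a superkey (determined < relation)
            if x < determined < relation:
                found.append((sorted(x), sorted(determined - x)))
    return found


fds = [
    ({"Order"}, {"Customer"}),
    ({"Customer"}, {"Address"}),
    ({"Product"}, {"UnitPrice"}),
    ({"Order", "Product"}, {"Quantity"}),  # assumed from the primary key
]

second_nf = [{"Order", "Customer", "Address"}, {"Product", "UnitPrice"},
             {"Order", "Product", "Quantity"}]
third_nf = [{"Product", "UnitPrice"}, {"Order", "Product", "Quantity"},
            {"Order", "Customer"}, {"Customer", "Address"}]

print([violations(r, fds) for r in second_nf])
# [[(['Customer'], ['Address'])], [], []]
print([violations(r, fds) for r in third_nf])
# [[], [], [], []]
```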
