Normalization

Normalization techniques

Uploaded by

shubhangi.ladde
Normalization

By KBS
Purpose of Normalization

Data Redundancy and Update Anomalies,

Functional Dependencies: Basic concepts,

closure of set of functional dependencies,

closure of attribute set, canonical cover

Decomposition: lossless join decomposition and dependency preservation.

The Process of Normalization: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF.

Database Modeling Life Cycle Step-By-Step
• Write down the problem definition / requirements in detail.
• Identify the nouns; they are the candidates for being entities.
• Identify the characteristic properties of these nouns; they are the candidates for being the attributes of the entities.
• Decide the various relationships between these entities.
• Try to model one table for each entity.
• Try to model one field for each attribute.
• Carry out normalization.
• Establish relationships with proper cardinality.
• Apply business logic in the form of various integrity constraints.
• Your database model is ready.

Relational Database Design
The goals of relational database design are:
• To generate a set of relation schemas that allows us to store information without unnecessary redundancy.
• To allow us to retrieve information easily.
This is accomplished by designing schemas that are in an appropriate normal form and by using a well-designed E-R diagram.

Features of Good Relational Design
• The goodness (or badness) of the resulting set of schemas depends on how good the E-R diagram is.
• The goodness of the resulting set of schemas depends on the design of the data model.
• A good design should also address the following:
  - Minimum redundancy
  - Decomposition
  - Functional dependency
  - Normal form

Anomalies
Anomalies are problems that arise in the data due to a flaw in the database design.

They are classified as:

Insertion anomalies:
• occur when new data is inserted into a flawed schema
Deletion anomalies:
• occur when data is deleted from a flawed schema
Update anomalies:
• occur when existing data in a flawed schema is changed

Example of Update Anomalies
StaffBranch

staffNo  sName        position    salary  branchNo  bAddress
SL21     John White   Manager     30000   B005      22 Deer Rd, London
SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
SA9      Mary Howe    Assistant   9000    B007      16 Argyll St, Aberdeen
SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
SL41     Julie Lee    Assistant   9000    B005      22 Deer Rd, London

Consider three operations on this relation:
• Insert a new member of staff with branchNo B007 into the StaffBranch relation: the address of B007 must be entered again and must match the existing rows.
• Delete a tuple that represents a member of staff located at branch B005: once the last such tuple is deleted, the details of branch B005 are lost as well.
• Update the address of branch B003: every tuple for B003 must be changed.

Example of Update Anomalies
Splitting StaffBranch into the two relations Staff and Branch stores each branch address only once.

Staff

staffNo  sName        position    salary  branchNo
SL21     John White   Manager     30000   B005
SG37     Ann Beech    Assistant   12000   B003
SG14     David Ford   Supervisor  18000   B003
SA9      Mary Howe    Assistant   9000    B007
SG5      Susan Brand  Manager     24000   B003
SL41     Julie Lee    Assistant   9000    B005

branchNo in Staff is a foreign key.

Branch

branchNo  bAddress
B005      22 Deer Rd, London
B007      16 Argyll St, Aberdeen
B003      163 Main St, Glasgow

branchNo in Branch is the primary key.

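The difference between the two designs can be sketched in a few lines of Python. This is only an illustration: the list and dictionary layouts, the helper names `rows_to_touch` and `address_of`, and the new address are assumptions made for the example, not part of the slides.

```python
# Hypothetical in-memory copy of part of the single-table design.
staff_branch = [
    {"staffNo": "SL21", "branchNo": "B005", "bAddress": "22 Deer Rd, London"},
    {"staffNo": "SG37", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SG14", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SG5", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
]


def rows_to_touch(rows, branch_no):
    """Count the rows that must change to update one branch address."""
    return sum(1 for r in rows if r["branchNo"] == branch_no)


# Decomposed design: Staff keeps only the foreign key, Branch keeps the address.
staff = [{"staffNo": r["staffNo"], "branchNo": r["branchNo"]} for r in staff_branch]
branch = {r["branchNo"]: r["bAddress"] for r in staff_branch}


def address_of(staff_no):
    """Look up a staff member's branch address through the foreign key."""
    branch_no = next(s["branchNo"] for s in staff if s["staffNo"] == staff_no)
    return branch[branch_no]


print(rows_to_touch(staff_branch, "B003"))  # rows affected in the flat design
branch["B003"] = "1 New St, Glasgow"        # hypothetical new address, one write
print(address_of("SG14"))
```

In the flat design every B003 row has to be rewritten consistently; in the decomposed design the single write to `branch` is seen by every member of staff at that branch.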
Example: Assume the following relation

Student-courses (Sid:pk, Sname, Phone, Courses-taken)

where attribute Sid is the primary key, Sname is the student name, Phone is the student's phone number, and Courses-taken is a table that contains the course-id, course-description, credit hours and grade for each course taken by the student.

A more precise definition of table Course-taken is:

Course-taken (Course-id:pk, Course-description, Credit-hours, Grade)

Student-courses

Sid  Sname    Phone     Courses-taken
100  John     487 2454  St-100-courses-taken
200  Smith    671 8120  St-200-courses-taken
300  Russell  871 2356  St-300-courses-taken

St-100-Course-taken

Course-id  Course-description     Credit-hours  Grade
IS380      Database Concepts      3             A
IS416      Unix Operating System  3             B

St-200-Course-taken

Course-id  Course-description     Credit-hours  Grade
IS380      Database Concepts      3             B
IS416      Unix Operating System  3             B
IS420      Data Net Work          3             C

St-300-Course-taken

Course-id  Course-description     Credit-hours  Grade
IS417      System Analysis        3             A

Definition of the three types of anomalies:
Insertion anomaly means that some data cannot be inserted in the database.
For example, we cannot add a new course to the database of the above example unless we insert a student who has taken that course.

Update anomaly means we have data redundancy in the database, and to make any modification we have to change all copies of the redundant data or else the database will contain incorrect data.
For example, in our database the course description "Database Concepts" for IS380 appears in both the St-100-Course-taken and St-200-Course-taken tables. To change its description to "New Database Concepts" we have to change it in all places.

Deletion anomaly means deleting some data causes other information to be lost.
For example, if student Russell is deleted, the St-300-Course-taken table goes with him, and we also lose the information that we had a course called IS417 with description System Analysis.

Thus the Student-courses table suffers from all three anomalies.

Deletion Anomaly
Occurs when the removal of a record results in a loss of important information about an entity.

Example:
All the information about a customer is contained in an order file. If the order is canceled, all the customer information could be lost when the order record is deleted.

Solution:
Create two tables: one table contains order information and the other table contains customer information.

Update Anomaly
Occurs when a change of a single attribute in one record requires changes in multiple records.

Example:
A staff person changes their telephone number, and every potential customer that person ever worked with has to have the corrected number inserted.

Solution:
Put the employee's telephone number in one location, as an attribute in the employee table.

Insertion Anomaly
Occurs when there does not appear to be any reasonable place to assign attribute values to records in the database.

Example:
Adding new attributes or entire records when they are not needed.

Solution:
Create a new table with a primary key that contains the relevant or functionally dependent attributes.

A Relational State
A relational state is the instance in which the database currently is.

Name    Course  Ph-No   Major    Prof.   Grade
Rohit   CS203   696453  Comp.    Rajat   A
Nath    CS303   427739  Chemi.   Bharat  B
Rohit   CS328   696453  Comp.    Rajat   B
Martin  CS303   388518  Physics  Bharat  A
Martin  CS503   388518  Physics  Shyam   In Pr
Rohit   CS492   696453  Comp.    Cross   In Pr
Baxter  CS379   831803  English  Rod     C

PROBLEMS

Redundancy
Update Anomalies
Insertion Anomalies
Deletion Anomalies

STUDENTS Relation: Redundancy of Data

Name    Course  Ph-No   Major    Prof.   Grade
Rohit   CS203   696453  Comp.    Rajat   A
Nath    CS303   427739  Chemi.   Bharat  B
Rohit   CS328   696453  Comp.    Rajat   B
Martin  CS303   388518  Physics  Bharat  A
Martin  CS503   388518  Physics  Shyam   In Pr
Rohit   CS492   696453  Comp.    Cross   In Pr
Baxter  CS379   831803  English  Rod     C

STUDENTS Relation: Update Anomalies
Rohit's phone number has been changed from 696453 to 696455 in two rows but not in the third, so the relation is now inconsistent.

Name    Course  Ph-No   Major    Prof.   Grade
Rohit   CS203   696455  Comp.    Rajat   A
Nath    CS303   427739  Chemi.   Bharat  B
Rohit   CS328   696455  Comp.    Rajat   B
Martin  CS303   388518  Physics  Bharat  A
Martin  CS503   388518  Physics  Shyam   In Pr
Rohit   CS492   696453  Comp.    Cross   In Pr
Baxter  CS379   831803  English  Rod     C

STUDENTS Relation: Insertion Anomalies
The last row is inserted with several attribute values missing.

Name    Course  Ph-No   Major    Prof.   Grade
Rohit   CS203   696453  Comp.    Rajat   A
Nath    CS303   427739  Chemi.   Bharat  B
Rohit   CS328   696453  Comp.    Rajat   B
Martin  CS303   388518  Physics  Bharat  A
Martin  CS503   388518  Physics  Shyam   In Pr
Rohit   CS492   696453  Comp.    Cross   In Pr
Baxter  CS379   831803  English  Rod     C
Rohit   CS500                    Vijay

STUDENTS Relation: Deletion Anomalies

Name    Course  Ph-No   Major    Prof.   Grade
Rohit   CS203   696453  Comp.    Rajat   A
Nath    CS303   427739  Chemi.   Bharat  B
Rohit   CS328   696453  Comp.    Rajat   B
Martin  CS303   388518  Physics  Bharat  A
Martin  CS503   388518  Physics  Shyam   In Pr
Rohit   CS492   696453  Comp.    Cross   In Pr
Baxter  CS379   831803  English  Rod     C

Redundancy and Data Anomalies
Redundant data is where we have stored the same 'information' more than once, i.e., the redundant data could be removed without loss of information.
Example: We have the following relation that contains staff and department details:

staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking

Such 'redundancy' could lead to the following 'anomalies':

Insert Anomaly: We can't insert a dept without inserting a member of staff that works in that department.
Update Anomaly: We could change the name of the dept that SA51 works in without simultaneously changing the dept that DS40 works in.
Deletion Anomaly: By removing employee SL10 we have removed all information pertaining to the Sales dept.

Repeating Groups
A repeating group is an attribute (or set of attributes) that can have more than one value for a primary key value.

Example: We have the following relation that contains staff and department details and a list of telephone contact numbers for each member of staff.

staffNo  job       dept  dname       city       contact number
SL10     Salesman  10    Sales       Stratford  018111777, 018111888, 079311122
SA51     Manager   20    Accounts    Barking    017111777
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking    079311555

Repeating groups are not allowed in a relational design, since all attributes have to be 'atomic', i.e., there can only be one value per cell in a table!

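One common way to remove a repeating group is to move it into its own relation with one atomic value per row. The sketch below assumes the staff rows are held as Python dictionaries; the `contacts` field and the `flatten_contacts` helper are names made up for this illustration.

```python
# Staff rows with a non-atomic "contacts" value, as in the table above.
staff = [
    {"staffNo": "SL10", "contacts": ["018111777", "018111888", "079311122"]},
    {"staffNo": "SA51", "contacts": ["017111777"]},
    {"staffNo": "DS40", "contacts": []},
    {"staffNo": "OS45", "contacts": ["079311555"]},
]


def flatten_contacts(rows):
    """Return (staffNo, contactNumber) pairs, one atomic value per row."""
    return [(r["staffNo"], number) for r in rows for number in r["contacts"]]


staff_contact = flatten_contacts(staff)
for pair in staff_contact:
    print(pair)
```

Each pair in `staff_contact` holds a single contact number, and a member of staff with no number (DS40 here) simply has no row in the new relation.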
Normalization: The Solution
Normalization is a process we can use to remove design flaws from a database.

Normalization is used to avoid or eliminate the three types of anomalies (insertion, deletion and update anomalies) which a database may suffer from.

It provides sets of rules describing what we should and should not do in our table structures.

The process consists of breaking tables into smaller tables that form a better design.

In this process:
• We take the database design through the different normal forms in order.
• Each form subsumes the one below it.
• At each stage, we add more rules that the schema must satisfy.

Normalization
In the design of a relational database management system (RDBMS), the process of organizing data to minimize redundancy is called normalization.

The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations.

Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.

The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

There are two goals of the normalization process:

• Eliminating redundant data (for example, storing the same data in more than one table), and
• Ensuring data dependencies make sense (only storing related data in a table).

Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.

The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes.

Purpose of Normalization
To avoid redundancy by storing each 'fact' within the database only once.

To put data into a form that conforms to relational principles (e.g., single-valued attributes, each relation represents one entity), with no repeating groups.

To put the data into a form that is more able to accurately accommodate change.

To avoid certain updating 'anomalies'.

To facilitate the enforcement of data constraints.

Normalization Benefits
Facilitates data integration.

Reduces data redundancy.

Provides a robust architecture for retrieving and maintaining data.

Complements data modeling.

Reduces the chances of data anomalies occurring.

Functional Dependencies
Normalization, which is built on the concept of functional dependency, was introduced by E. F. Codd in the early 1970s, when he defined the first three normal forms (first, second and third normal forms).
Functional dependencies play a key role in differentiating good database design from bad database design.
A functional dependency describes the relationship between attributes in a relation.
For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
A → B is read "B is functionally dependent on A".
Determinant: the attribute or group of attributes on the left-hand side of the arrow of a functional dependency.

Definition
• A functional dependency is defined as a constraint between two sets of attributes in a relation from a database.

Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written X → Y), if and only if each X value is associated with at most one Y value.

In other words, X is the determinant set and Y is the dependent attribute.

Example
Employee

SSN          Name         JobType     DeptName
557-78-6587  Lance Smith  Accountant  Salary
214-45-2398  Lance Smith  Engineer    Product

Note:
Name is functionally dependent on SSN because an employee's name can be uniquely determined from their SSN.
SSN → Name

Name does not determine SSN, because more than one employee can have the same name.
Name ↛ SSN

Example: Functional Dependencies

staffNo  job       dept  dname
SL10     Salesman  10    Sales
SA51     Manager   20    Accounts
DS40     Clerk     20    Accounts
OS45     Clerk     30    Operations

staffNo → job
staffNo → dept
staffNo → dname
dept → dname

FDs are constraints on legal relations.

There is no algorithmic method of identifying dependencies.

We have to use our common sense and judgement to specify dependencies.

Example

SSN  LASTNAME  FIRSTNAME
111  SMITH     BOB
222  JONES     DAVID
333  SMITH     JOE
111  SMITH     BOB
444  JONES     SUE
555  WHITE     DAVID

SSN → LASTNAME holds in this instance.

LASTNAME ↛ FIRSTNAME: the dependency does not hold, because SMITH appears with both BOB and JOE.

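Whether a given instance violates a functional dependency can be checked mechanically. The following is a minimal sketch, assuming rows are Python dictionaries; `fd_holds` is a name chosen for this illustration. Note that an instance can only refute a dependency: a dependency that happens to hold in one instance is still a design decision about all legal instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


people = [
    {"SSN": 111, "LASTNAME": "SMITH", "FIRSTNAME": "BOB"},
    {"SSN": 222, "LASTNAME": "JONES", "FIRSTNAME": "DAVID"},
    {"SSN": 333, "LASTNAME": "SMITH", "FIRSTNAME": "JOE"},
    {"SSN": 111, "LASTNAME": "SMITH", "FIRSTNAME": "BOB"},
    {"SSN": 444, "LASTNAME": "JONES", "FIRSTNAME": "SUE"},
    {"SSN": 555, "LASTNAME": "WHITE", "FIRSTNAME": "DAVID"},
]

print(fd_holds(people, ["SSN"], ["LASTNAME"]))
print(fd_holds(people, ["LASTNAME"], ["FIRSTNAME"]))
```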
Dependency Diagram
[Figure: dependency diagram over the attributes Name, Course, Phone, Major and Prof., with the key attributes marked.]

Functional Dependencies
A trivial functional dependency is one whose right-hand side is a subset (not necessarily a proper subset) of the left-hand side.

For example, in the StaffBranch relation shown earlier:

staffNo, sName → sName
staffNo, sName → staffNo

Trivial dependencies do not provide any additional information about possible integrity constraints on the values held by these attributes.

We are normally more interested in nontrivial dependencies because they represent integrity constraints for the relation. (A dependency X → Y is nontrivial when Y is not a subset of X.)

Functional Dependencies
Main characteristics of functional dependencies used in normalization:

• have a one-to-one relationship between the attribute(s) on the left- and right-hand side of a dependency;

• hold for all time;

• are nontrivial.

Functional dependency is a property of the meaning or semantics of the attributes in a relation.
When a functional dependency is present, the dependency is specified as a constraint between the attributes.

Closure of Functional Dependencies

Let a relation R have some functional dependencies F specified.

The closure of F (usually written as F+) is the set of all functional dependencies that may be logically derived from F.
Often F is the set of the most obvious and important functional dependencies.
F+, the closure, is the set of all the functional dependencies, including F and those that can be deduced from F.
A set of inference rules is needed to compute F+ from F.

Armstrong's axioms
Developed by Armstrong in 1974, these inference rules allow every functional dependency implied by F to be derived. Rules 1 to 3 are Armstrong's axioms proper; rules 4 to 6 are additional rules that can be derived from them.

1. Reflexivity Rule
If X is a set of attributes and Y is a subset of X, then X → Y holds.

2. Augmentation Rule
If X → Y holds and W is a set of attributes, then WX → WY holds.

3. Transitivity Rule
If X → Y and Y → Z hold, then X → Z holds.

4. Union Rule
If X → Y and X → Z hold, then X → YZ holds.

5. Decomposition Rule
If X → YZ holds, then so do X → Y and X → Z.

6. Pseudotransitivity Rule
If X → Y and WY → Z hold, then so does WX → Z.

Procedure for Computing F+
To compute the closure of a set of functional dependencies F:

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further

Example
R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
Some members of F+:
A → H
    by transitivity from A → B and B → H
AG → I
    by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
CG → HI
    by augmenting CG → I with CG to infer CG → CGI, augmenting CG → H with I to infer CGI → HI, and then transitivity

Example 2
Student

SNo   SName       CNo  CName   Addr           Instr.    Office
5425  Susan Ross  102  Calc I  San Jose, CA   P. Smith  B42 Room 112
7845  Dave Turco  541  Bio 10  San Diego, CA  L. Talip  B24 Room 210

F:
SNo → SName
SNo → Addr
CNo → CName
CNo → Instr
Instr → Office

Based on the rules provided, the following dependencies in F+ can be derived:

(SNo, CNo) → SNo (Rule 1, reflexivity: subset)
(SNo, CNo) → CNo (Rule 1)
(SNo, CNo) → (SName, CName) (Rule 2, augmentation)
CNo → Office (Rule 3, transitivity)
SNo → (SName, Addr) (Union Rule)

Minimal Sets of Functional Dependencies

A set of functional dependencies X is minimal if it satisfies the following conditions:
• Every dependency in X has a single attribute on its right-hand side.

• We cannot replace any dependency A → B in X with a dependency C → B, where C is a proper subset of A, and still have a set of dependencies that is equivalent to X.

• We cannot remove any dependency from X and still have a set of dependencies that is equivalent to X.

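Example of a Minimal Set of Functional Dependencies

A set of functional dependencies for the StaffBranch relation, written with a single attribute on each right-hand side as the first condition requires:

staffNo → sName
staffNo → position
staffNo → salary
staffNo → branchNo
staffNo → bAddress
branchNo → bAddress
branchNo, position → salary
bAddress, position → salary

Note that, as listed, the third condition is not strictly met: staffNo → bAddress follows from staffNo → branchNo and branchNo → bAddress by transitivity, and staffNo → salary follows from staffNo → branchNo, staffNo → position and branchNo, position → salary. A strictly minimal set would omit these two dependencies.
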
Closure of Attribute Sets
Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F.

Algorithm to compute α+, the closure of α under F:

result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:

Testing for superkey and candidate key:
To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.

Testing functional dependencies:
To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
That is, we compute α+ by using attribute closure, and then check if it contains β.
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }

(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)

(AG)+ = ABCGHI, so AG is a superkey.
AG is also a candidate key, because no proper subset of AG is a superkey: A+ = ABCH and G+ = G, and neither contains all attributes of R.
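The closure algorithm and the key tests above can be sketched in Python. This is an illustrative sketch, not a reference implementation: it assumes single-letter attribute names so that a string such as "CG" can stand for the set {C, G}, and the function names are made up for the example.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already covered, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of schema."""
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey from which no attribute can be dropped."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("AG", F)))
print(is_candidate_key("AG", R, F))
print(set("H") <= closure("A", F))  # tests whether A -> H is in F+
```

Dropping one attribute at a time is enough for the candidate-key test, because any subset of a non-superkey is also a non-superkey.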
What is Decomposition?
Decomposition: the process of breaking something down into parts or elements.

Decomposition in a database means breaking tables down into multiple tables.

From a database perspective, it means going to a higher normal form.

Decomposition
It is important that decompositions are "good".

Two characteristics of good decompositions:
1) Lossless
2) Preserve dependencies

What is lossless?
Lossless means functioning without a loss.

In other words, retain everything.

It is important for databases to have this feature.

Pitfalls in Relational DB Design
A bad design may have several properties, including:

• Repetition of information.

• Inability to represent certain information.

• Loss of information.

Representation of Information
Suppose we have a schema, bank_schema,

bank_schema = (bname, bcity, assets, cname, loan#, amount)

and suppose we have an instance of the relation.

[Figure: Sample bank_schema relation.]

We are now repeating the assets and branch city information for every loan.
Repetition of information wastes space.
Repetition of information complicates updating.

Let's analyze this problem:

We know that a branch is located in exactly one city.

We also know that a branch may make many loans.

The functional dependency bname → bcity holds on bank_schema.

The functional dependency bname → loan# does not.

These two facts are best represented in separate relations.

Definition of Decomposition
Let R be a relation schema.
A set of relation schemas {R1, R2, …, Rn} is a decomposition of R if
• R = R1 ∪ R2 ∪ … ∪ Rn
• each Ri is a subset of R (for i = 1, 2, …, n)

Example of Decomposition
For relation R(x, y, z) there can be 2 subsets:
R1(x, z) and R2(y, z)

If we take the union of the attributes of R1 and R2, we get R:

R = R1 ∪ R2

Goal of Decomposition
Eliminate redundancy by decomposing a relation
into several relations in a higher normal form.
It is important to check that a decomposition
does not lead to bad design

Problem with Decomposition
Given instances of the decomposed
relations, we may not be able to reconstruct
the corresponding instance of the original
relation – information loss

Example : Problem with Decomposition
R

Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon

R1

Model Name  Category
a11         Canon
s20         Nikon
a70         Canon

R2

Price  Category
100    Canon
200    Nikon
150    Canon

Example : Problem with Decomposition
R1 ⋈ R2 (natural join on Category)

Model Name  Price  Category
a11         100    Canon
a11         150    Canon
s20         200    Nikon
a70         100    Canon
a70         150    Canon

R (original)

Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon

Lossy decomposition
In the previous example, additional tuples are obtained along with the original tuples.

Although there are more tuples, this leads to less information.

Due to the loss of information, the decomposition in the previous example is called a lossy decomposition or lossy-join decomposition.

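The spurious tuples in the Model/Price/Category example can be reproduced with a small sketch. The `project` and `natural_join` helpers below are written for this illustration only (rows are plain dictionaries), and the column is named "Model" rather than "Model Name" for brevity.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(left, right):
    """Combine rows that agree on every attribute name they share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]


R = [
    {"Model": "a11", "Price": 100, "Category": "Canon"},
    {"Model": "s20", "Price": 200, "Category": "Nikon"},
    {"Model": "a70", "Price": 150, "Category": "Canon"},
]

R1 = project(R, ["Model", "Category"])
R2 = project(R, ["Price", "Category"])
joined = natural_join(R1, R2)

# Tuples in the join that were never in R are the spurious ones.
spurious = [row for row in joined if row not in R]
print(len(joined), spurious)
```

Every original tuple is still present in the join, but it can no longer be told apart from the spurious ones, which is the sense in which information is lost.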
Lossy decomposition (another example)

T

Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose

Functional dependencies:

Employee → Branch, Project → Branch

Lossy decomposition
Decomposition of the previous relation T into T1 and T2:

T1

Employee  Branch
Brown     L.A.
Green     San Jose
Hoskins   San Jose

T2

Project  Branch
Mars     L.A.
Jupiter  San Jose
Saturn   San Jose
Venus    San Jose

Lossy decomposition
Original relation

Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose

After natural join of T1 and T2

Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
Green     Saturn   San Jose
Hoskins   Jupiter  San Jose

After the natural join, we get two extra tuples. Thus, there is loss of information.

Lossless Decomposition
A decomposition {R1, R2,…, Rn} of a
relation R is called a lossless decomposition
for R if the natural join of R1, R2,…, Rn
produces exactly the relation R.

Lossless Decomposition
A decomposition is lossless if we can recover the original relation:

R(A, B, C)
Decompose: R1(A, B) and R2(A, C)
Recover: R'(A, B, C)

R'(A, B, C) should be the same as R(A, B, C); we must ensure R' = R.

Lossless Decomposition
Sometimes the same set of data is reproduced:

Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Name    Price        Name    Category
Word    100          Word    WP
Oracle  1000         Oracle  DB
Access  100          Access  DB

(Word, 100) + (Word, WP) → (Word, 100, WP)
(Oracle, 1000) + (Oracle, DB) → (Oracle, 1000, DB)
(Access, 100) + (Access, DB) → (Access, 100, DB)

Lossy Decomposition
Sometimes it's not. What's wrong?

Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Category  Name         Category  Price
WP        Word         WP        100
DB        Oracle       DB        1000
DB        Access       DB        100

(Word, WP) + (100, WP) = (Word, 100, WP)
(Oracle, DB) + (1000, DB) = (Oracle, 1000, DB)
(Oracle, DB) + (100, DB) = (Oracle, 100, DB)
(Access, DB) + (1000, DB) = (Access, 1000, DB)
(Access, DB) + (100, DB) = (Access, 100, DB)

Lossless Decomposition Property
R : relation
F : set of functional dependencies on R
X, Y : decomposition of R
The decomposition is lossless if F+ contains either:
X ∩ Y → X, that is, the attributes common to both X and Y functionally determine ALL the attributes in X
OR
X ∩ Y → Y, that is, the attributes common to both X and Y functionally determine ALL the attributes in Y

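This property translates directly into an attribute-closure check. The sketch below is one possible rendering in Python; the function names are hypothetical, and functional dependencies are written as (left-hand set, right-hand set) pairs of attribute names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(x, y, fds):
    """True if X ∩ Y determines all of X or all of Y under fds."""
    common_closure = closure(set(x) & set(y), fds)
    return set(x) <= common_closure or set(y) <= common_closure


# Bank example: the common attribute branch-name determines Branch-schema.
bank_fds = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]
branch_schema = {"branch-name", "branch-city", "assets"}
loan_info_schema = {"branch-name", "customer-name", "loan-number", "amount"}
print(is_lossless_binary(branch_schema, loan_info_schema, bank_fds))

# Employee/Project/Branch example: Branch alone determines neither side.
t_fds = [({"Employee"}, {"Branch"}), ({"Project"}, {"Branch"})]
print(is_lossless_binary({"Employee", "Branch"}, {"Project", "Branch"}, t_fds))
```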
Example : Lossless Decomposition
Given:
bank_schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Required FDs:
branch-name → branch-city, assets
loan-number → amount, branch-name
Decompose bank_schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)

Example : Lossless Decomposition
Show that the decomposition is a lossless decomposition.

• Since branch-name → branch-city, assets, the augmentation rule for FDs implies that
  branch-name → branch-name, branch-city, assets

• Branch-schema ∩ Loan-info-schema = {branch-name}, and by the dependency above branch-name determines every attribute of Branch-schema.

Thus, this decomposition is a lossless decomposition.

Algorithm to test for lossless and lossy decomposition
Input: A universal relation R, a decomposition D = {R1, R2, …, Rm} of R, and a set F of functional dependencies.

1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.
2. Set S(i, j) = bij for all matrix entries.
   /* each bij is a symbol associated with indices (i, j) */
3. For each row i representing relation schema Ri
       For each column j representing attribute Aj
           if (relation Ri includes attribute Aj) then S(i, j) = aj
   /* each aj is a symbol associated with index j */
4. Repeat until a complete loop execution results in no changes to S
       for each functional dependency X → Y in F
           for all rows in S which have the same symbols in the columns corresponding to attributes in X
               make the symbols in each column that corresponds to an attribute in Y the same in all these rows, as follows:
               if any of the rows has an "a" symbol for the column, set the other rows to that same "a" symbol in the column;
               if no "a" symbol exists for the attribute in any of the rows, choose one of the "b" symbols that appear in one of the rows for the attribute and set the other rows to that same "b" symbol in the column.
5. If a row is made up entirely of "a" symbols, then the decomposition has the lossless join property; otherwise it does not.

In terms of distinguished variables (the "a" symbols):
• If one row is full of distinguished variables, the decomposition is lossless.
• If no row is full, add distinguished variables by applying the FDs.
• To add a distinguished variable with an FD we need:
  1) 2 or more rows that agree on the LHS columns
  2) 1 or more of those rows with a distinguished variable in an RHS column
  3) 1 or more of those rows with a non-distinguished variable in that RHS column

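The matrix test above can be sketched in Python as follows. This is an illustrative sketch under a few assumptions: symbols are represented as tuples (("a", j) or ("b", i, j)), the function name `is_lossless` is made up, and when two symbols are equated the replacement is applied throughout the column (the usual chase formulation) rather than only in the matching rows.

```python
def is_lossless(schema, decomposition, fds):
    """Matrix test for the lossless-join property of a decomposition."""
    col = {attr: j for j, attr in enumerate(schema)}
    # Steps 1-3: ("a", j) where Ri contains attribute j, otherwise ("b", i, j).
    S = [[("a", j) if attr in ri else ("b", i, j)
          for j, attr in enumerate(schema)]
         for i, ri in enumerate(decomposition)]
    changed = True
    while changed:  # Step 4: repeat until a full pass changes nothing.
        changed = False
        for lhs, rhs in fds:
            # Group the rows that carry the same symbols in the LHS columns.
            groups = {}
            for row in S:
                key = tuple(row[col[a]] for a in lhs)
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for attr in rhs:
                    j = col[attr]
                    symbols = {row[j] for row in rows}
                    if len(symbols) < 2:
                        continue
                    keep = min(symbols)  # an "a" symbol sorts before any "b"
                    for row in S:
                        if row[j] in symbols:
                            row[j] = keep
                    changed = True
    # Step 5: lossless if some row consists only of "a" symbols.
    return any(all(sym[0] == "a" for sym in row) for row in S)


schema = ["SSN", "EName", "PNum", "PName", "PLoc", "Hrs"]
fds = [(["SSN"], ["EName"]),
       (["PNum"], ["PName", "PLoc"]),
       (["SSN", "PNum"], ["Hrs"])]

three_way = [{"SSN", "EName"}, {"PNum", "PName", "PLoc"}, {"SSN", "PNum", "Hrs"}]
two_way = [{"EName", "PLoc"}, {"SSN", "PNum", "PName", "PLoc", "Hrs"}]

print(is_lossless(schema, three_way, fds))
print(is_lossless(schema, two_way, fds))
```

The loop terminates because every change reduces the number of distinct symbols in the matrix.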
Example of a Lossless Decomposition

R = {SSN, EName, PNum, PName, PLoc, Hrs}

F = {f1: SSN → EName,
     f2: PNum → {PName, PLoc},
     f3: {SSN, PNum} → Hrs}

Decompose R into R1, R2, and R3 where

R1 = {SSN, EName}
R2 = {PNum, PName, PLoc}
R3 = {SSN, PNum, Hrs}

Determine whether this decomposition of R is lossless or not.

Create the matrix S

    SSN  EName  PNum  PName  PLoc  Hrs
R1  a1   a2     b13   b14    b15   b16
R2  b21  b22    a3    a4     a5    b26
R3  a1   b32    a3    b34    b35   a6

Apply f1: SSN → EName
The values in the SSN column are the same for rows R1 and R3. In the EName column, the values are a2 and b32.
Since an "a" symbol appears, change the R3, EName entry in S to be a2. The resulting matrix S is

    SSN  EName  PNum  PName  PLoc  Hrs
R1  a1   a2     b13   b14    b15   b16
R2  b21  b22    a3    a4     a5    b26
R3  a1   a2     a3    b34    b35   a6

Apply f2: PNum → {PName, PLoc}
The values in the PNum column are the same for rows R2 and R3. In the PName column, the values are a4 and b34.
Since an "a" symbol appears, change the R3, PName entry in S to be a4. In the PLoc column, the values are a5 and b35. Change the R3, PLoc entry to be a5.
The resulting matrix S is

    SSN  EName  PNum  PName  PLoc  Hrs
R1  a1   a2     b13   b14    b15   b16
R2  b21  b22    a3    a4     a5    b26
R3  a1   a2     a3    a4     a5    a6

Since row 3 of S consists of just "a" symbols, the decomposition is lossless.
Note: f3 was not applied. If it is, S does not change, since every set of rows with the same values in the columns SSN and PNum contains only one row.

Example of a Lossless Decomposition
R = {A1, A2, A3, A4, A5}
F = {f1: A1 → {A3, A5},
     f2: A5 → {A1, A4},
     f3: {A3, A4} → A2}
Decompose R into R1, R2, and R3 where
R1 = {A1, A2, A3, A5}
R2 = {A1, A3, A4}
R3 = {A4, A5}
Determine whether the decomposition is lossless or not.

Create the matrix S

    A1   A2   A3   A4   A5
R1  a1   a2   a3   b14  a5
R2  a1   b22  a3   a4   b25
R3  b31  b32  b33  a4   a5

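Apply f1: A1 → {A3, A5}
The values in the A1 column are the same for rows R1 and R2.
In the A3 column, the values are a3 and a3, so nothing changes there.
In the A5 column, the values are a5 and b25. Since an "a" symbol appears, change the R2, A5 entry to be a5. The resulting matrix S is

    A1   A2   A3   A4   A5
R1  a1   a2   a3   b14  a5
R2  a1   b22  a3   a4   a5
R3  b31  b32  b33  a4   a5

Apply f2: A5 → {A1, A4}
All three rows now have a5 in the A5 column. In the A1 column the values are a1, a1 and b31, so the R3, A1 entry becomes a1. In the A4 column the values are b14, a4 and a4, so the R1, A4 entry becomes a4. The resulting matrix S is

    A1   A2   A3   A4   A5
R1  a1   a2   a3   a4   a5
R2  a1   b22  a3   a4   a5
R3  a1   b32  b33  a4   a5

Row R1 now consists of just "a" symbols, so the decomposition is lossless.
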
Example of a Lossy Decomposition
R = {SSN, EName, PNum, PName, PLoc, Hrs}
F = {f1: SSN → EName, f2: PNum → {PName, PLoc},
f3: {SSN, PNum} → Hrs}
Decompose R into R1 and R2 where
R1 = {EName, PLoc}
R2 = {SSN, PNum, PName, PLoc, Hrs}
Create the matrix S
      SSN   EName  PNum  PName  PLoc  Hrs
R1    b11   a2     b13   b14    a5    b16
R2    a1    b22    a3    a4     a5    a6

Apply f1: SSN → EName. The two rows differ in SSN, so there is no change in S.
Apply f2: PNum → {PName, PLoc}. The two rows differ in PNum, so there is no change in S.
Apply f3: {SSN, PNum} → Hrs. No change in S.
No row of S consists of just “a” symbols, so this decomposition is lossy.
Conclusions
Decomposition is the act of breaking tables down in order to achieve a higher normal
form.

Decompositions should always be lossless.

A lossless decomposition ensures that the information in the original relation can be
accurately reconstructed from the decomposed relations.

Remember that for a decomposition to be considered “GOOD” it must also preserve
functional dependencies.

Extraneous Attributes
Consider a set F of functional dependencies and a functional dependency α → β in F.

“Extraneous”: are there any attributes in α or β that can be safely removed
without changing the constraints implied by F?
Example: Given F = {A → C, AB → CD}
C is extraneous in AB → CD, since AB → C can still be inferred (from A → C) even after
deleting C from the right side
Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency α → β
in F.
To test if attribute A ∈ α is extraneous in α
1. compute ({α} – A)+ using the dependencies in F
2. check that ({α} – A)+ contains β; if it does, A is extraneous in α
Example: Given F = {A → C, AB → C}
B is extraneous in AB → C
Apply the above rule:
the functional dependency α → β is AB → C, i.e. α = AB and β = C, and the attribute
under test (A in the rule) is B.
1. Compute ({α} – A)+ using the dependencies in F:
(AB – B)+ = (A)+ = AC.
2. Check that ({α} – A)+ contains β:
({α} – A)+ is AC, and β = C is in AC.
That is why B is extraneous.
To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in
F’ = (F – {α → β}) ∪ {α → (β – A)},
2. check that α+ contains A; if it does, A is extraneous in β
Example: Given F = {A → C, AB → CD}
C is extraneous in AB → CD
Apply the above rule:
the functional dependency α → β is AB → CD, i.e. α = AB and β = CD, and the
attribute under test (A in the rule) is C.
1. Compute α+ using only the dependencies in
F’ = (F – {α → β}) ∪ {α → (β – A)}
F’ = (F – {AB → CD}) ∪ {AB → (CD – C)}
F’ = {A → C, AB → D}
Computing α+ under F’ gives
α+ = (AB)+ = ABCD.
2. Check that α+ contains A:
ABCD contains C, which is why C is extraneous.
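Both tests reduce to an attribute-closure computation, so they can be sketched in Python as below. The function names (`closure`, `extraneous_in_lhs`, `extraneous_in_rhs`) are illustrative choices made here, and each dependency is represented as a (left side, right side) pair of sets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def extraneous_in_lhs(attr, lhs, rhs, fds):
    """attr (in lhs) is extraneous if (lhs - attr)+ under F contains rhs."""
    return rhs <= closure(lhs - {attr}, fds)


def extraneous_in_rhs(attr, lhs, rhs, fds):
    """attr (in rhs) is extraneous if lhs+ under F' contains attr, where
    F' = (F - {lhs -> rhs}) U {lhs -> (rhs - attr)}."""
    f_prime = [fd for fd in fds if fd != (lhs, rhs)] + [(lhs, rhs - {attr})]
    return attr in closure(lhs, f_prime)


# F = {A -> C, AB -> C}: is B extraneous on the left of AB -> C?
F1 = [(set("A"), set("C")), (set("AB"), set("C"))]
b_is_extraneous = extraneous_in_lhs("B", set("AB"), set("C"), F1)

# F = {A -> C, AB -> CD}: are C and D extraneous on the right of AB -> CD?
F2 = [(set("A"), set("C")), (set("AB"), set("CD"))]
c_is_extraneous = extraneous_in_rhs("C", set("AB"), set("CD"), F2)
d_is_extraneous = extraneous_in_rhs("D", set("AB"), set("CD"), F2)
```

D is included as a contrast: once D is dropped from AB → CD, nothing else in F’ produces it, so it is not extraneous.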
Canonical Cover
Sets of functional dependencies may have redundant dependencies that can be
inferred from the others
For example: A → C is redundant in: {A → B, B → C, A → C}
Parts of a functional dependency may be redundant
• E.g., on the right side: {A → B, B → C, A → CD} can be simplified to
  {A → B, B → C, A → D}
• E.g., on the left side: {A → B, B → C, AC → D} can be simplified to
  {A → B, B → C, A → D}
A canonical cover of F is a “minimal” set of functional dependencies equivalent to
F, having no redundant dependencies or redundant parts of dependencies
Canonical Cover
A canonical cover for F is a set of dependencies Fc such that
• F logically implies all dependencies in Fc, and
• Fc logically implies all dependencies in F, and
• No functional dependency in Fc contains an extraneous attribute, and
• Each left side of a functional dependency in Fc is unique.

Algorithm to compute a canonical cover for F:

repeat
    Use the union rule to replace any dependencies in F of the form
        α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an
        extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change
Note: The union rule may become applicable after some extraneous attributes have been
deleted, so it has to be re-applied
Computing a Canonical Cover
R = (A, B, C)
F = {A → BC,
     B → C,
     A → B,
     AB → C}
Combine A → BC and A → B into A → BC
Set is now {A → BC, B → C, AB → C}
A is extraneous in AB → C
Check if the result of deleting A from AB → C is implied by the other dependencies
• Yes: in fact, B → C is already present!
Set is now {A → BC, B → C}
C is extraneous in A → BC
Check if A → C is logically implied by A → B and the other dependencies
• Yes: using transitivity on A → B and B → C.
• Can use attribute closure of A in more complex cases
The canonical cover is: A → B
                        B → C
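The repeat/until procedure can be sketched in Python as follows and checked against the worked example above. The helper names are illustrative assumptions, and the sketch assumes no dependency has an empty left side. It may delete extraneous attributes in a different order than the slides do, but for this F it arrives at the same cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def remove_one_extraneous(fc):
    """Delete one extraneous attribute from fc in place; True if one was found."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):
            # Left side: a is extraneous if (lhs - a)+ under F contains rhs.
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                fc[i] = (lhs - {a}, rhs)
                return True
        for a in sorted(rhs):
            # Right side: a is extraneous if lhs+ under F' contains a.
            f_prime = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, f_prime):
                fc[i] = (lhs, rhs - {a})
                return True
    return False


def canonical_cover(fds):
    fc = [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds]
    while True:
        # Union rule: merge all dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        # Drop dependencies whose right side has become empty.
        fc = [(lhs, rhs) for lhs, rhs in merged.items() if rhs]
        if not remove_one_extraneous(fc):
            return set(fc)


# F = {A -> BC, B -> C, A -> B, AB -> C}
cover = canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
```

Re-running the union rule at the top of every pass reflects the note that it may become applicable again after a deletion.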
The Process of Normalization
• Normalization is often executed as a series of steps.
• Each step corresponds to a specific normal form that has
  known properties.
• As normalization proceeds, the relations become progressively
  more restricted in format, and also less vulnerable to update
  anomalies.
• For the relational data model, it is important to recognize that
  it is only first normal form (1NF) that is critical in creating
  relations.
• All the subsequent normal forms are optional.
Types of normal forms in a database environment

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

Stages of Normalisation
Un-normalised form (UNF)
    ↓ Remove repeating groups
First normal form (1NF)
    ↓ Remove partial dependencies
Second normal form (2NF)
    ↓ Remove transitive dependencies
Third normal form (3NF)
    ↓ Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
    ↓ Remove multivalued dependencies
Fourth normal form (4NF)
    ↓ Remove remaining anomalies
Fifth normal form (5NF)
Un-normalised Form (UNF)
Definition:
A relation is un-normalised when it has not had any normalization
rules applied to it, and it suffers from various anomalies.

ORDER
Customer No: 001964              Order Number: 00012345
Name: Mark Campbell              Order Date: 14-Feb-2002
Address: 1 The House
         Leytonstone
         E11 9ZZ

Product Number   Product Description   Unit Price   Order Quantity   Line Total
T5060            Hook                  5.00         5                25.00
PT42             Bolt                  2.50         10               20.50
QZE48            Spanner               20.00        1                20.00

Order Total: 65.50

ORDER (order-no, order-date, cust-no, cust-name, cust-add,
(prod-no, prod-desc, unit-price, ord-qty, line-total)*, order-total)
First Normal Form (1NF)
Definition: A relation is in 1NF if, and only if, all its
underlying attributes contain atomic values only.
To reach 1NF, remove repeating groups into a new relation.
First Normal Form is a relation in which the intersection of
each row and column contains one and only one value.

A table (relation) is in 1NF if
1. There are no duplicated rows in the table.
2. Each cell is single-valued (i.e., there are no repeating groups
or arrays).
3. Entries in a column (attribute, field) are of the same kind.

Example:
A table for the entity Book

Title                      Author         ISBN            Subject    Publisher     Pages
Database System Concepts   Sudarshan      0-07-295886-3   Database   McGraw-Hill   1142
Database System Concepts   Silberschatz   0-07-295886-3   Database   McGraw-Hill   1142
The Ultimate Guide         Das            0-07-240500-7   Unix       McGraw-Hill   445
The Ultimate Guide         Korth          0-07-240500-7   Unix       McGraw-Hill   445

By applying the first normal form,
we construct separate tables for the redundant data,
with extra tables to define the relationships between them.

Author_ID   Last Name      First Name
1           Sudarshan      Mark
2           Silberschatz   Abraham
3           Das            Sumitabha
4           Korth          Henry

* Here we have the table for author.

Subject_ID   Subject
1            Database
2            Unix

* Here we have the table for subject.

ISBN            Title                      Pages   Publisher
0-07-295886-3   Database System Concepts   1142    McGraw-Hill
0-07-240500-7   The Ultimate Guide         445     McGraw-Hill

* Here we have the table for book.

Since the tables have been separated in order to avoid redundancy,
we also need to create new tables to connect them so that
the relationships between the tables remain unchanged.

ISBN Author_ID
0-07-295886-3 1

0-07-240500-7 3

0-07-295886-3 2

0-07-240500-7 4

* Here we have the relationship between the book and the author.

ISBN Subject_ID
0-07-295886-3 1

0-07-240500-7 2

* Here we have the relationship between the book and the subject .
A table is in first normal form (1NF) if there are no repeating
groups.
A repeating group is a set of logically related fields or values
that occur multiple times in one record.
EMPLOYEES_PROJECTS_TIME
EmployeeID   Name               Project                           Time
EN1-26       Sean O'Brien       30-452-T3, 30-457-T3, 32-244-T3   0.25, 0.40, 0.30
EN1-33       Amy Guya           30-452-T3, 30-382-TC, 32-244-T3   0.05, 0.35, 0.60
EN1-35       Steven Baranco     30-452-T3, 31-238-TC              0.15, 0.80
EN1-36       Elizabeth Roslyn   35-152-TC                         0.90
EN1-38       Carol Schaaf       36-272-TC                         0.75
EN1-40       Alexandra Wing     31-238-TC, 31-241-TC              0.20, 0.70
The table above does not comply with first normal form.
Look for fields that contain too much data and for repeating groups of fields.
The example above is also related to another design issue,
namely, that each field should hold the smallest meaningful value and
that there should not be multiple values in a single field.

In the table above, there would be no way to sort by last name, nor to know
which allocation of time belonged to which project.
A table with repeating groups of fields.
EmpID Last Name First Name Project1 Time1 Project2 Time2 Project3 Time3
EN1-26 O'Brien Sean 30-452-T3 0.25 30-457-T3 0.40 32-244-T3 0.30
EN1-33 Guya Amy 30-452-T3 0.05 30-382-TC 0.35 32-244-T3 0.60

EN1-35 Baranco Steven 30-452-T3 0.15 31-238-TC 0.80

EN1-36 Roslyn Elizabeth 35-152-TC 0.90

EN1-38 Schaaf Carol 36-272-TC 0.75

EN1-40 Wing Alexandra 31-238-TC 0.20 31-241-TC 0.70

If an employee were assigned to a fourth project, you would
have to add two new fields to the table.
Designing to meet first normal form
Now we will take the table you saw above and redesign it so
it will comply with first normal form.
Look at the repeating groups of data.
Identify tables and fields that will hold this data without the
repeating groups.
EMPLOYEES Primary Key

EmployeeID Last Name First Name


EN1-26 O'Brien Sean
EN1-33 Guya Amy
EN1-35 Baranco Steven
EN1-36 Roslyn Elizabeth
EN1-38 Schaaf Carol
EN1-40 Wing Alexandra
PROJECTS_EMPLOYEES_TIME Foreign key

ProjectNum EmployeeID Time


30-382-TC EN1-33 0.35
30-452-T3 EN1-26 0.25
30-452-T3 EN1-33 0.05
30-452-T3 EN1-35 0.15
31-238-TC EN1-35 0.80
30-457-T3 EN1-26 0.40
31-238-TC EN1-40 0.20
31-241-TC EN1-40 0.70
32-244-T3 EN1-33 0.60
35-152-TC EN1-36 0.90
36-272-TC EN1-38 0.75
32-244-T3 EN1-26 0.30
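The redesign above amounts to splitting each multi-valued cell into one row per value. A minimal Python sketch of that step follows, using three rows of the original table as sample input. The function name `to_first_normal_form` is an assumption made here, as is the idea that every name has the form "First Last".

```python
# Rows of the table with multi-valued Project and Time fields.
UNNORMALISED = [
    ("EN1-26", "Sean O'Brien", "30-452-T3, 30-457-T3, 32-244-T3", "0.25, 0.40, 0.30"),
    ("EN1-35", "Steven Baranco", "30-452-T3, 31-238-TC", "0.15, 0.80"),
    ("EN1-36", "Elizabeth Roslyn", "35-152-TC", "0.90"),
]


def to_first_normal_form(rows):
    """Split the repeating groups into EMPLOYEES and PROJECTS_EMPLOYEES_TIME rows."""
    employees, assignments = [], []
    for emp_id, name, projects, times in rows:
        # Smallest meaningful values: store first and last name separately.
        first, last = name.split(" ", 1)
        employees.append((emp_id, last, first))
        # One row per (project, employee) pair, each with its own time.
        for project, time in zip(projects.split(","), times.split(",")):
            assignments.append((project.strip(), emp_id, float(time)))
    return employees, assignments


employees, assignments = to_first_normal_form(UNNORMALISED)
```

Pairing the two lists by position is what the original layout made impossible to query: in the new table every time value sits in the same row as its project.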

First Normal Form

Requirements
• All rows shall be uniquely identifiable
• All the values shall be atomic
• No columns of repetitive nature
• No redundant data

Second Normal Form (2NF)
Def: A table is in 2NF if it is in 1NF and if all non-key
attributes are dependent on the whole of the key.
It has no partial dependency.
Moving a table to Second Normal Form (2NF) removes partial dependencies.
Partial dependency:
A partial dependency occurs when the value in a non-key
attribute of a table is dependent on the value of some part of
the table’s primary key (but not all of it).
Partial Dependency
CUSTOMER

Cust_ID Name Order_ID


101 AT&T 1234
101 AT&T 156
125 Cisco 1250
Customer_book(cust_no, ISBN, Title, Author_name, Author_country, Qty, Unit_price)
The dependency diagram of the above relation:
{cust_no, ISBN} → Qty
ISBN → Title, Author_name, Author_country, Unit_price
The relation is decomposed into two relations:
Sales(cust_no, ISBN, Qty)
Book_sale(ISBN, Title, Author_name, Author_country, Unit_price)
Employee_project
A table with a multi-field primary key and repeating data in non-key fields
*EmployeeID LastName FirstName *ProjectNumber ProjectTitle
EN1-26 O'Brien Sean 30-452-T3 STAR manual
EN1-26 O'Brien Sean 30-457-T3 ISO procedures
EN1-26 O'Brien Sean 31-124-T3 Employee handbook
EN1-33 Guya Amy 30-452-T3 STAR manual
EN1-33 Guya Amy 30-482-TC Web Site
EN1-33 Guya Amy 31-241-TC New catalog
EN1-35 Baranco Steven 30-452-T3 STAR manual
EN1-35 Baranco Steven 31-238-TC STAR prototype
EN1-36 Roslyn Elizabeth 35-152-TC STAR pricing
EN1-38 Schaaf Carol 36-272-TC Order system
EN1-40 Wing Alexandra 31-238-TC STAR prototype
EN1-40 Wing Alexandra 31-241-TC New catalog
• A multi-field primary key is necessary because neither the
EmployeeID nor the ProjectNumber field contains unique values.
• Non-key fields relate to only part of the primary key.
They are not functionally dependent on the entire primary key.
• The solution to this lies in breaking the table into smaller
tables that do meet second normal form.
EMPLOYEES
*EmployeeID Last Name First Name
EN1-26 O'Brien Sean
EN1-33 Guya Amy
EN1-35 Baranco Steven
EN1-36 Roslyn Elizabeth
EN1-38 Schaaf Carol
EN1-40 Wing Alexandra

EMPLOYEES_PROJECTS
*EmployeeID *ProjectNum
EN1-26 30-452-T3
EN1-26 30-457-T3
EN1-26 31-124-T3
EN1-33 30-452-T3
EN1-33 30-482-TC
EN1-33 31-241-TC
EN1-35 30-452-T3
EN1-35 31-238-TC
EN1-36 35-152-TC
EN1-38 36-272-TC
EN1-40 31-238-TC
EN1-40 31-241-TC
PROJECTS
*ProjectNum ProjectTitle
30-452-T3 STAR manual
30-457-T3 ISO procedures
30-482-TC Web site
31-124-T3 Employee handbook
31-238-TC STAR prototype
31-241-TC New catalog
35-152-TC STAR pricing
36-272-TC Order system
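Partial dependencies can be found mechanically once the functional dependencies are written down: a non-key attribute is partially dependent if it lies in the closure of a proper subset of the key. The sketch below assumes a single candidate key and assumes the two dependencies shown, which are read off the Employee_project table; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each partially dependent non-key attribute to the part of the key
    that determines it (assumes `key` is the only candidate key)."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.setdefault(attr, set(subset))
    return found


ATTRS = ["EmployeeID", "LastName", "FirstName", "ProjectNumber", "ProjectTitle"]
KEY = {"EmployeeID", "ProjectNumber"}
# Assumed dependencies for the Employee_project table.
FDS = [
    ({"EmployeeID"}, {"LastName", "FirstName"}),
    ({"ProjectNumber"}, {"ProjectTitle"}),
]
violations = partial_dependencies(KEY, ATTRS, FDS)

# After the split, EMPLOYEES has a single-field key, so nothing can be partial.
employees_ok = partial_dependencies({"EmployeeID"}, ATTRS[:3], FDS[:1])
```

Each entry of `violations` names the table the attribute should move to: the one keyed by that part of the original key.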

Second Normal Form
[Figure: a table in first normal form is split into two tables that are in 2NF]
Requirements for 2NF
• Shall be in First Normal Form [1NF]
• All non-key attributes shall be fully functionally dependent
on the primary key

Third Normal Form (3NF)
A table is in third normal form (3NF) if it is in second normal form (2NF) and there are no
transitive dependencies.

A transitive dependency is a type of functional dependency
in which the value in a non-key field is determined by the
value in another non-key field and that field is not a
candidate key.

There may be dependencies among non-key fields.

There is no algorithmic method of identifying dependencies from the data alone.
We have to use our common sense and judgment to specify
dependencies.
Sales(cust_no, ISBN, Qty)
Book_sale(ISBN, Title, Author_name, Author_country, Unit_price)
In the Sales relation there is only one non-key attribute, so there is no question of a
dependency between non-key fields.
Thus there is no transitive dependency.
Hence Sales is in 3NF.
In Book_sale, Author_country depends on Author_name.
There is a transitive dependency in the Book_sale relation.
Dependency diagram for Book_sale:
ISBN → Title, Author_name, Unit_price
Author_name → Author_country
Transitive dependency: ISBN → Author_name → Author_country
Transitive
Dependency

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name


111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg

Book_sale(ISBN, Title, Author_name, Author_country, Unit_price)

The existence of a transitive dependency results in insert,
update, and delete anomalies.

To overcome these anomalies, decompose the Book_sale
relation:

Book(ISBN, Title, Unit_price, Author_name)

Author(Author_name, Author_country)

Project_manager
A table with a single field primary key and repeating values in non-key fields.
*ProjectNum ProjectTitle ProjectMgr Phone
30-452-T3 STAR manual Garrison 2756
30-457-T3 ISO procedures Jacanda 2954
30-482-TC Web site Friedman 2846
31-124-T3 Employee handbook Jones 3102
31-238-TC STAR prototype Garrison 2756
31-241-TC New catalog Jones 3102
35-152-TC STAR pricing Vance 3022
36-272-TC Order system Jacanda 2954

The phone number is repeated each time a manager name is repeated.
• This is because the phone number is only a second cousin to the project number.
It is dependent on the manager, which is dependent on the project number
(a transitive dependency).
• The ProjectMgr field is not a candidate key because the same person manages
more than one project.
• Again, the solution is to move the field with repeating data to a separate table.

Complying with third normal form

PROJECTS (primary key: ProjectNum; foreign key: ProjectMgr)
*ProjectNum ProjectTitle ProjectMgr
30-452-T3 STAR manual Garrison
30-457-T3 ISO procedures Jacanda
30-482-TC Web site Friedman
31-124-T3 Employee handbook Jones
31-238-TC STAR prototype Garrison
31-241-TC New catalog Jones
35-152-TC STAR pricing Vance
36-272-TC Order system Jacanda

MANAGERS (primary key: ProjectMgr)
*ProjectMgr Phone
Friedman 2846
Garrison 2756
Jacanda 2954
Jones 3102
Vance 3022
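The split into PROJECTS and MANAGERS can be reproduced with two projections of the Project_manager data, as in the Python sketch below. The helper names `fd_holds` and `project` are illustrative. Note that `fd_holds` only reports whether the sample rows contradict a dependency, which is weaker than knowing that the dependency holds as a business rule.

```python
COLUMNS = ("ProjectNum", "ProjectTitle", "ProjectMgr", "Phone")
PROJECT_MANAGER = [
    ("30-452-T3", "STAR manual", "Garrison", "2756"),
    ("30-457-T3", "ISO procedures", "Jacanda", "2954"),
    ("30-482-TC", "Web site", "Friedman", "2846"),
    ("31-124-T3", "Employee handbook", "Jones", "3102"),
    ("31-238-TC", "STAR prototype", "Garrison", "2756"),
    ("31-241-TC", "New catalog", "Jones", "3102"),
    ("35-152-TC", "STAR pricing", "Vance", "3022"),
    ("36-272-TC", "Order system", "Jacanda", "2954"),
]


def fd_holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        record = dict(zip(COLUMNS, row))
        left = tuple(record[c] for c in lhs)
        right = tuple(record[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def project(rows, keep):
    """Projection onto the columns in keep, with duplicate rows removed."""
    idx = [COLUMNS.index(c) for c in keep]
    return sorted({tuple(row[i] for i in idx) for row in rows})


# Phone depends on the manager, and the manager is not a key of the table.
mgr_determines_phone = fd_holds(PROJECT_MANAGER, ["ProjectMgr"], ["Phone"])
mgr_is_key = fd_holds(PROJECT_MANAGER, ["ProjectMgr"], ["ProjectNum"])

projects = project(PROJECT_MANAGER, ("ProjectNum", "ProjectTitle", "ProjectMgr"))
managers = project(PROJECT_MANAGER, ("ProjectMgr", "Phone"))

# Joining the two tables on ProjectMgr gives back the original rows.
rejoined = {p + (phone,) for p in projects for mgr, phone in managers if p[2] == mgr}
```

Each phone number now appears once in `managers`, so changing it is a single-row update.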

Third Normal Form

Requirements
• Shall be in Second Normal Form [2NF]
• No non-key attribute shall be transitively dependent on the
primary key

Boyce-Codd Normal Form (BCNF)
Definition:
A relation is in Boyce-Codd Normal Form (BCNF) if every
determinant is a candidate key.
A determinant is an attribute, or set of attributes, on which some other
attribute is fully functionally dependent.
A candidate key is a minimal super key: it identifies rows uniquely, and no
attribute can be removed from it without losing that property.
E.g., Roll_no and {student_name, student_address} are
candidate keys.
Roll_no uniquely identifies each record, and the
combination of student_name and student_address also
identifies each record uniquely.
But {Roll_no, student_name} is not a candidate key because
Roll_no alone is already a candidate key, so the set is not minimal.

Examples:
Consider a database table that stores employee information.

Emp_info(employee_id, first_name, last_name, title)

In this table, the field employee_id determines first_name
and last_name.

Similarly, the tuple (first_name, last_name) determines
employee_id.

Boyce-Codd normal form (BCNF)

A relation is in BCNF if, and only if, every determinant is a
candidate key.

The difference between 3NF and BCNF

For a functional dependency A → B, 3NF allows this dependency
in a relation if B is a primary-key attribute and A is not a
candidate key,
whereas BCNF insists that for this dependency to remain in a
relation, A must be a candidate key.

Example of BCNF(1)
ClientInterview

ClientNo interviewDate interviewTime staffNo roomNo


CR76 13-May-02 10.30 SG5 G101
CR56 13-May-02 12.00 SG5 G101
CR74 13-May-02 12.00 SG37 G102
CR56 1-Jul-02 10.30 SG5 G102

fd1: clientNo, interviewDate → interviewTime, staffNo, roomNo (Primary key)
fd2: staffNo, interviewDate, interviewTime → clientNo (Candidate key)
fd3: roomNo, interviewDate, interviewTime → clientNo, staffNo (Candidate key)
fd4: staffNo, interviewDate → roomNo (not a candidate key)

The determinant of fd4 is not a candidate key, so the relation is not in BCNF.
As a consequence, the ClientInterview relation may suffer from
update anomalies.
For example, two tuples have to be updated if the roomNo needs to
be changed for staffNo SG5 on 13-May-02.
Example of BCNF(1)
To transform the ClientInterview relation to BCNF, we must
remove the violating functional dependency by creating two
new relations called Interview and StaffRoom.
Interview (clientNo, interviewDate, interviewTime, staffNo)

StaffRoom (staffNo, interviewDate, roomNo)


Interview
ClientNo interviewDate interviewTime staffNo
CR76 13-May-02 10.30 SG5
CR56 13-May-02 12.00 SG5
CR74 13-May-02 12.00 SG37
CR56 1-Jul-02 10.30 SG5

StaffRoom
staffNo interviewDate roomNo
SG5 13-May-02 G101
SG37 13-May-02 G102
SG5 1-Jul-02 G102
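The BCNF check for ClientInterview can be sketched in Python with an attribute-closure test: a dependency violates BCNF when it is non-trivial and the closure of its determinant does not cover the whole relation. The function names are illustrative, and the dependency list for the decomposed StaffRoom relation is written out by hand here as an assumption.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies whose determinant is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]


CLIENT_INTERVIEW = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
fd1 = ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"})
fd2 = ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"})
fd3 = ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"})
fd4 = ({"staffNo", "interviewDate"}, {"roomNo"})

before = bcnf_violations(CLIENT_INTERVIEW, [fd1, fd2, fd3, fd4])

# In StaffRoom(staffNo, interviewDate, roomNo), fd4's determinant is the key.
after = bcnf_violations({"staffNo", "interviewDate", "roomNo"}, [fd4])
```

Only fd4 is reported for the original relation, which is the dependency the decomposition moves into its own table.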

Example of BCNF(2)
Consider the relational table SUPPLIERS

SUPPLIERS (supplier_no, supplier_name, city, zip)

We assume that each supplier has a unique supplier_name, so
that supplier_no and supplier_name are both candidate keys.

Functional Dependencies:

supplier_no → city
supplier_no → zip
supplier_no → supplier_name
supplier_name → city
supplier_name → zip
supplier_name → supplier_no
The relation is in BCNF since both determinants (supplier_no and
supplier_name) are unique (i.e., are candidate keys).
Note that even relations in BCNF can have anomalies.
Anomalies:
INSERT: We cannot record the city for a supplier_no without also
knowing the supplier_name.

DELETE: If we delete the row for a given supplier_name, we lose the
information that the supplier_no is associated with a given city.

Decomposition:

SUPPLIER_INFO (supplier_no, city, zip)

SUPPLIER_NAME (supplier_no, supplier_name)

Fourth Normal Form (4NF)
Def:
A table is in 4NF if it is in BCNF and if it has no multi-valued
dependencies other than trivial ones and those determined by a key.

Multi-valued dependency (MVD)

Represents a dependency between attributes (for example, A,
B and C) in a relation,
such that for each value of A there is a set of values for B and
a set of values for C.
However, the sets of values for B and C are independent of
each other.

Consider this example of a database of teaching courses, the
books recommended for the course, and the lecturers who will
be teaching the course:

Course_book_lecturer
Course Book Lecturer
AHA Silberschatz John D
AHA Nederpelt William M
AHA Silberschatz William M
AHA Nederpelt John D
AHA Silberschatz Christian G
AHA Nederpelt Christian G
OSO Silberschatz John D
OSO Silberschatz William M

The multi-valued dependencies are:
Course_book_lecturer.Course ↠ Book
Course_book_lecturer.Course ↠ Lecturer
Example:- Pizza Delivery Permutations
Restaurant Pizza Variety Delivery Area
A1 Pizza Thick Crust Springfield
A1 Pizza Thick Crust Shelbyville
A1 Pizza Thick Crust Capital City
A1 Pizza Stuffed Crust Springfield
A1 Pizza Stuffed Crust Shelbyville
A1 Pizza Stuffed Crust Capital City
Elite Pizza Thin Crust Capital City
Elite Pizza Stuffed Crust Capital City
Vincenzo's Pizza Thick Crust Springfield
Vincenzo's Pizza Thick Crust Shelbyville
Vincenzo's Pizza Thin Crust Springfield
Vincenzo's Pizza Thin Crust Shelbyville
The dependencies are:
{Restaurant} ↠ {Pizza Variety}
{Restaurant} ↠ {Delivery Area}
To eliminate the possibility of these anomalies, we must place
the facts about varieties offered into a different table from the
facts about delivery areas.
Two tables that are both in 4NF:

Varieties By Restaurant
Restaurant         Pizza Variety
A1 Pizza           Thick Crust
A1 Pizza           Stuffed Crust
Elite Pizza        Thin Crust
Elite Pizza        Stuffed Crust
Vincenzo's Pizza   Thick Crust
Vincenzo's Pizza   Thin Crust

Delivery Areas By Restaurant
Restaurant         Delivery Area
A1 Pizza           Springfield
A1 Pizza           Shelbyville
A1 Pizza           Capital City
Elite Pizza        Capital City
Vincenzo's Pizza   Springfield
Vincenzo's Pizza   Shelbyville
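A way to see why this split is safe: the multi-valued dependencies hold exactly when the original table equals the join of its two projections on Restaurant. The Python sketch below checks this on the twelve rows of the Pizza Delivery Permutations table; the function name `split_is_lossless` is an illustrative choice.

```python
PERMUTATIONS = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}


def split_is_lossless(rows):
    """True if rows equal the join of their (restaurant, variety) and
    (restaurant, area) projections, i.e. varieties and areas are independent."""
    varieties = {(r, v) for r, v, _ in rows}
    areas = {(r, a) for r, _, a in rows}
    rejoined = {(r, v, a) for r, v in varieties for r2, a in areas if r == r2}
    return rejoined == rows


varieties_by_restaurant = {(r, v) for r, v, _ in PERMUTATIONS}
areas_by_restaurant = {(r, a) for r, _, a in PERMUTATIONS}
independent = split_is_lossless(PERMUTATIONS)

# Dropping a single row breaks the independence that the table relies on.
damaged = PERMUTATIONS - {("A1 Pizza", "Thick Crust", "Springfield")}
still_independent = split_is_lossless(damaged)
```

The `damaged` case illustrates the update anomaly: in the single table, one missing row silently changes what the data says about the restaurant, whereas the two 4NF tables cannot get into that state.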

Fifth Normal Form (5NF)
Fifth normal form (5NF), also known as project-join
normal form (PJ/NF), is a level of database normalization
designed to reduce redundancy in relational databases.

Def: A table is in 5NF if it is in 4NF and if every join dependency in
the table is a consequence of (is implied by) the candidate keys of the table.

In other words, a relation in 5NF has no join dependency other than
those implied by its candidate keys.

There are some relations which cannot be losslessly decomposed into two
projections,
but which can be losslessly decomposed into three projections. Removing such
join dependencies is the step that leads to 5NF.
Suppose there is a relation Dealer_parts_customer
Dealer Parts customer
D1 P1 C1
D1 P1 C2
D1 P2 C1
D2 P1 C1

We decompose it into three projections:
Dealer_parts
parts_customer
Customer_Dealer
Dealer_parts
Dealer Parts
D1 P1
D1 P2
D2 P1

parts_customer
Parts customer
P1 C1
P1 C2
P2 C1

Customer_Dealer
customer Dealer
C1 D1
C2 D1
C1 D2
If we join all three tables, we obtain the original relation.
If we join only two of them, we do not obtain the original relation (spurious rows appear).
This is called a join dependency.
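The two claims about joining can be checked directly on the four rows of Dealer_parts_customer. The following Python sketch builds the three projections and joins them; the variable names are illustrative.

```python
ROWS = {
    ("D1", "P1", "C1"),
    ("D1", "P1", "C2"),
    ("D1", "P2", "C1"),
    ("D2", "P1", "C1"),
}

# The three binary projections.
dealer_parts = {(d, p) for d, p, c in ROWS}
parts_customer = {(p, c) for d, p, c in ROWS}
customer_dealer = {(c, d) for d, p, c in ROWS}

# Join two projections on Parts: a spurious row appears.
join_of_two = {
    (d, p, c) for d, p in dealer_parts for p2, c in parts_customer if p == p2
}
spurious = join_of_two - ROWS

# Add the third projection as a filter: the spurious row disappears.
join_of_three = {(d, p, c) for d, p, c in join_of_two if (c, d) in customer_dealer}
```

Here `spurious` contains the single row (D2, P1, C2), which the third projection rules out because customer C2 never deals with D2.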
End of Unit-I
