
Normalization

The document discusses database normalization and provides examples. It begins by introducing database design problems and how normalization addresses issues like redundancy and data integrity. It then covers the various normal forms including 1NF, 2NF, 3NF, and BCNF. Examples of unnormalized and normalized relations are provided to illustrate the concepts. The goal of normalization is to break relations into smaller subsets and eliminate anomalies like insertion, deletion, and update issues.

Uploaded by

Regina Sabs
Copyright
© Attribution Non-Commercial (BY-NC)

4/29/12

Normalization

Database design problem

How to decide on a suitable logical structure?
How to decide the relations needed?
How to decide on the attributes in each relation?
How to avoid redundancy?
How to represent all the information?
How to avoid loss of information?

The Suppliers and Parts Database: Relational View


S

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       London
S4   Clark   20       Paris
S5   Adams   30       Athens

SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

P

P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London
P5   Cam     Blue    12       Paris
P6   Cog     Red     19       London

STATUS given in the table SP: unnecessary repetitions

SP

S#   P#   QTY   STATUS
S1   P1   300   20
S1   P2   200   20
S1   P3   400   20
S1   P4   100   20

Normalization
A logical design method which minimizes data redundancy and reduces design flaws.

Consists of applying various normal forms to the database design.

A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints.

The normal forms break down large tables into smaller subsets.

Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data.

Normalization is a multi-step process beginning with an unnormalized relation.

Normal Forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)

Unnormalized Relations

First step in normalization is to convert the data into a two-dimensional table.

In unnormalized relations data can repeat within a column.

First Normal Form (1NF)


A relation R is in First Normal Form (1NF) if and only if all underlying domains contain atomic values only.

Each attribute must be atomic:
No repeating columns within a row.
No multi-valued columns.

1NF simplifies attributes:
Queries become easier.

A column or set of columns is called a candidate key when its values can uniquely identify the row in the relation.
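As a rough illustration of the atomic-value rule, the sketch below flattens a repeating group into one row per value. The sample rows and the `to_1nf` helper are hypothetical and are not taken from the slides.

```python
# Hypothetical unnormalized rows: each supplier carries a list of parts.
unnormalized = [
    {"S#": "S1", "PARTS": ["P1", "P2", "P3"]},
    {"S#": "S2", "PARTS": ["P1", "P2"]},
]

def to_1nf(rows, key, multi, atomic):
    """Emit one row per value of the multi-valued column `multi`."""
    return [{key: row[key], atomic: value} for row in rows for value in row[multi]]

# Every resulting row holds a single supplier number and a single part number.
flat = to_1nf(unnormalized, "S#", "PARTS", "P#")
```

After the call, `flat` holds five rows, each with one atomic S# value and one atomic P# value.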

Relations S and SP joined as FIRST

FIRST

S#   STATUS   CITY     P#   QTY
S1   20       London   P1   300
S1   20       London   P2   200
S1   20       London   P3   400
S1   20       London   P4   200
S1   20       London   P5   100
S1   20       London   P6   100

Sample tabulation for an example.

[Dependency diagram over S#, P#, QTY, STATUS and CITY]

Primary key is (S#, P#).

1NF Storage Anomalies


Insertion: The fact that a supplier is located in a particular city cannot be entered unless the supplier supplies at least one part. Eg. S5 is located in Athens.

Update: The city value appearing many times can cause redundancy problems while updating.

Deletion: On deleting the only tuple for a particular supplier, we delete the shipment data together with the location of the supplier. Eg. If the tuple with primary key (S3, P2) is deleted, we lose the info that S3 is located in Paris.
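The deletion anomaly can be reproduced on a small fragment of FIRST. This is only a sketch: the three rows are taken from the tables in these slides, and `supplier_cities` is a hypothetical helper.

```python
# A fragment of FIRST as (S#, STATUS, CITY, P#, QTY); S3 has a single shipment.
FIRST = [
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S3", 10, "Paris", "P2", 200),
]

def supplier_cities(rows):
    """Map each supplier number to the city recorded for it."""
    return {s: city for s, _status, city, _p, _qty in rows}

before = supplier_cities(FIRST)

# Delete the shipment whose primary key is (S3, P2).
remaining = [row for row in FIRST if (row[0], row[3]) != ("S3", "P2")]
after = supplier_cities(remaining)
```

`before` still knows that S3 is in Paris. `after` has no entry for S3 at all, so the location is lost along with the shipment.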

Solution to 1NF

Replace relation FIRST by two relations SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY).

SECOND

S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   10       Paris
S4   20       London
S5   30       Athens

SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100

[Dependency diagrams: SP over S#, P# and QTY; SECOND over S#, STATUS and CITY]
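The replacement of FIRST by SECOND and SP is a pair of projections. A minimal sketch, assuming relations are stored as lists of tuples and using a hypothetical `project` helper:

```python
HEADER = ("S#", "STATUS", "CITY", "P#", "QTY")

# The S1 rows of FIRST as shown in the slides.
FIRST = [
    ("S1", 20, "London", part, qty)
    for part, qty in [("P1", 300), ("P2", 200), ("P3", 400),
                      ("P4", 200), ("P5", 100), ("P6", 100)]
]

def project(rows, header, attrs):
    """Keep only the columns in `attrs`; the set drops duplicate tuples."""
    idx = [header.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

SECOND = project(FIRST, HEADER, ("S#", "STATUS", "CITY"))
SP = project(FIRST, HEADER, ("S#", "P#", "QTY"))
```

The six repeated (S1, 20, London) entries collapse into a single SECOND tuple, while SP keeps one tuple per shipment.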

Functional dependence (FD)

Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R.

J   K   L
X   1   0
X   1   6
Y   4   1
Y   4   9
Z   3   5

J → K holds; J → L does not.

FD in supplier-parts database
S# → SNAME
S# → STATUS
S# → CITY

Read as "SNAME is functionally dependent on S#" or "attribute S# functionally determines attribute SNAME".

or

S# → (SNAME, STATUS, CITY)

Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if, whenever two tuples of R agree on their X-value, they also agree on their Y-value.
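The "agree on X implies agree on Y" definition translates directly into a check over the rows of a relation. The sketch below uses the small J, K, L example relation from the functional dependence slide; `fd_holds` is a hypothetical helper name. It only tests the rows given, so it can refute an FD but cannot prove that one holds for every possible state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

rows = [
    {"J": "X", "K": 1, "L": 0},
    {"J": "X", "K": 1, "L": 6},
    {"J": "Y", "K": 4, "L": 1},
    {"J": "Y", "K": 4, "L": 9},
    {"J": "Z", "K": 3, "L": 5},
]

j_determines_k = fd_holds(rows, ["J"], ["K"])  # each J-value has one K-value
j_determines_l = fd_holds(rows, ["J"], ["L"])  # J = X appears with L = 0 and L = 6
```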

Functional dependency diagram

S

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       London
S4   Clark   20       Paris
S5   Adams   30       Athens

[Diagram: S# → SNAME, S# → STATUS, S# → CITY]

Full functional dependence


Attribute Y is fully functionally dependent on attribute X if it is functionally dependent on X and not functionally dependent on any proper subset of X.

In other words, there does not exist a proper subset X' of the attributes constituting X such that Y is functionally dependent on X'.

Eg. In relation S, (S#, STATUS) → CITY holds, but CITY is not fully functionally dependent on (S#, STATUS), because S# → CITY also holds.
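Full functional dependence can be tested by checking every non-empty proper subset of the determinant. A minimal sketch, assuming the S rows shown earlier and a four-row fragment of SP; both helper names are hypothetical, and the check only speaks for the sample rows.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True

S = [
    {"S#": "S1", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "STATUS": 30, "CITY": "London"},
    {"S#": "S4", "STATUS": 20, "CITY": "Paris"},
    {"S#": "S5", "STATUS": 30, "CITY": "Athens"},
]
SP = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400},
]

city_on_pair = fd_holds(S, ("S#", "STATUS"), ("CITY",))         # plain FD holds
city_full = fully_dependent(S, ("S#", "STATUS"), ("CITY",))     # fails: S# alone suffices
qty_full = fully_dependent(SP, ("S#", "P#"), ("QTY",))          # needs both S# and P#
```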

Second Normal Form (2NF)


A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.

Each non-key attribute must be functionally dependent on the whole primary key, not on part of it.
Functional dependence: the property of one or more attributes that uniquely determines the value of other attributes.
Any non-dependent attributes are moved into a smaller (subset) table.

2NF improves data integrity.
Prevents update, insert, and delete anomalies.

Functional Dependence in SECOND and SP

SECOND

S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   10       Paris
S4   20       London
S5   30       Athens

[Dependency diagram over S#, STATUS and CITY]

SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300

[Dependency diagram over S#, P# and QTY]

1NF Storage Anomalies Removed


Insertion: Can now enter new suppliers with their location.

Deletion: Can delete shipment info without deleting info about the location of a supplier.

Update: Easy to update locations without repetition.

2NF Storage Anomalies


SECOND

S#   STATUS   CITY
S1   20       London
S2   10       Paris
S3   10       Paris
S4   20       London
S5   30       Athens

Insertion: Cannot enter the fact that a particular city has a particular status value, unless we have a supplier in the city.

Deletion: Deleting supplier info can cause the loss of info concerning the status value of a city.

Update: The status value appearing many times can cause inconsistency while updating the database.

Transitive Dependence

[Dependency diagram for SECOND over S#, STATUS and CITY]

STATUS is functionally dependent on S#. It is also transitively dependent on S# via CITY: each S# value determines a CITY value, which in turn determines the STATUS value.

Solution to 2NF

Replace the original relation SECOND by two projections SC(S#, CITY) and CS(CITY, STATUS).

FDs in relations SC and CS: S# → CITY in SC, and CITY → STATUS in CS.

SC

S#   CITY
S1   London
S2   Paris

CS

CITY     STATUS
Athens   30
London   20
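That the two projections lose no information can be checked by joining them back over CITY. A minimal sketch, assuming the full SECOND table from the earlier slides; the Rome entry is a hypothetical value added only to show that CS can record a city with no supplier.

```python
# SECOND as (S#, STATUS, CITY).
SECOND = {
    ("S1", 20, "London"),
    ("S2", 10, "Paris"),
    ("S3", 10, "Paris"),
    ("S4", 20, "London"),
    ("S5", 30, "Athens"),
}

# Projections SC(S#, CITY) and CS(CITY, STATUS).
SC = {(s, city) for s, _status, city in SECOND}
CS = {(city, status) for _s, status, city in SECOND}

# Natural join of SC and CS over CITY.
rejoined = {(s, status, city)
            for s, city in SC
            for city2, status in CS
            if city == city2}

# CS can now hold a status for a city that has no supplier yet.
CS_with_rome = CS | {("Rome", 50)}
```

`rejoined` equals `SECOND`, and CS stores each city's status once (3 tuples) instead of once per supplier.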

Third Normal Form (3NF)


A relation is said to be in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

A relation is said to be in Third Normal Form if there is no transitive functional dependency between non-key attributes.

Any transitive dependencies are moved into a smaller (subset) table.

3NF further improves data integrity.
Prevents update, insert, and delete anomalies.

2NF Storage Anomalies Removed

Insertion: We can now enter status values for cities, even if there is no supplier.

Deletion: Deleting info about a supplier does not delete info about the status of a city.

Update: No redundancy and hence no inconsistency.

Boyce-Codd Normal Form


A relation R is said to be in BCNF if and only if every determinant is a candidate key.

Most 3NF relations are also BCNF relations.

A 3NF relation can fail to be in BCNF only when all of the following hold:
The candidate keys in the relation are composite keys (they are not single attributes).
There is more than one candidate key in the relation.
The keys are not disjoint, that is, some attributes in the keys are common, ie, some attributes overlap.

EXAMPLES
Relations FIRST and SECOND are in neither 3NF nor BCNF.

Relations SP, SC and CS, which were in 3NF, are also in BCNF.

Relation FIRST contains 3 determinants: S#, CITY and (S#, P#). Of these, only (S#, P#) is a candidate key. So, FIRST is not in BCNF.

SECOND is not in BCNF, because the determinant CITY is not a candidate key.
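The "every determinant is a candidate key" test can be sketched with attribute closures: a determinant is acceptable when its closure covers every attribute of the relation. The function names below are hypothetical. The FDs are the ones these slides attribute to FIRST, and they are assumed to be non-trivial and left-irreducible, so a superkey test on each determinant stands in for the candidate-key test.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Determinants whose closure does not cover the whole relation."""
    return [lhs for lhs, _rhs in fds if closure(lhs, fds) != set(all_attrs)]

FIRST_ATTRS = ("S#", "STATUS", "CITY", "P#", "QTY")
FIRST_FDS = [
    (("S#",), ("CITY",)),
    (("CITY",), ("STATUS",)),
    (("S#", "P#"), ("QTY",)),
]

first_bad = bcnf_violations(FIRST_ATTRS, FIRST_FDS)
sc_bad = bcnf_violations(("S#", "CITY"), [(("S#",), ("CITY",))])
```

`first_bad` lists S# and CITY, matching the slide's conclusion that only (S#, P#) is a candidate key, while `sc_bad` is empty because SC is in BCNF.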

Eg. With 2 disjoint candidate keys


Relation S is in BCNF.

S

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       London
S4   Clark   20       Paris
S5   Adams   30       Athens

[Dependency diagram over S#, SNAME, STATUS and CITY]

Assumptions: STATUS and CITY are independent. Supplier numbers and supplier names are both unique, so S# and SNAME are the two candidate keys.

Eg. With overlapping candidate keys


2 candidate keys overlap if they involve 2 or more attributes each and have an attribute in common.

Eg. Relation SSP(S#, SNAME, P#, QTY).

Keys are (S#, P#) and (SNAME, P#).

SSP is not in BCNF, since the determinants S# and SNAME are not keys.

But SSP is in 3NF.

Updates can cause inconsistency.

Eg. Relation SJT with attributes S, J and T, ie, student, subject and teacher

For each subject, each student of that subject is taught by only 1 teacher.
Each teacher teaches only 1 subject.
Each subject is taught by several teachers.
Overlapping candidate keys: SJ and ST.

SJT

S       J         T
Smith   Maths     Prof. White
Smith   Physics   Prof. Green
Jones   Maths     Prof.

Eg. of SJT (contd.)

Keys are SJ and ST.

SJT is in 3NF, not BCNF.

T is a determinant, but not a candidate key.

Update anomalies can occur during deletions.

Solution is to replace SJT with 2 BCNF projections ST(S, T) and TJ(T, J).

Typical eg. of BCNF

Relation EXAM with attributes S (Student), J (Subject) and P (Position).

An EXAM tuple means that the specified student was examined in the specified subject and achieved the specified position.

Rule is that there are no ties in the position.

Candidate keys are SJ and JP.

EXAM is in BCNF, since the keys are the only determinants.

Functional dependencies: (S, J) → P and (J, P) → S.

Difference b/w 3NF and BCNF


Relations in BCNF are in 3NF, but not vice versa.

3NF can cause problems when candidate keys overlap. BCNF eliminates these problem cases with 3NF.

Also, when there is more than one candidate key in a relation, 3NF may show anomalies which BCNF does not show.

BCNF is simpler than 3NF, since it involves no reference to the primary key or to transitive dependence.

Fourth Normal Form


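Unnormalized COURSE table

COURSE    TEACHER                                 TEXT
Physics   Prof. Green, Prof. Brown, Prof. Black   Basic Mechanics, Principles of Optics
Maths     Prof. White                             Modern Algebra, Projective Geometry

Normalized table CTX

COURSE    TEACHER       TEXT
Physics   Prof. Green   Basic Mechanics
Physics   Prof. Green   Principles of Optics
Physics   Prof. Brown   Basic Mechanics
Physics   Prof. Brown   Principles of Optics
Physics   Prof. Black   Basic Mechanics
Physics   Prof. Black   Principles of Optics
Maths     Prof. White   Modern Algebra
Maths     Prof. White   Projective Geometry

A tuple means the indicated course can be taught by the indicated teacher, using all the indicated texts.

CTX is in BCNF, ie, there is no FD, but it contains a lot of redundancy. Why no FD?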

Meaning of CTX
A tuple <c,t,x> appears in CTX if and only if course c can be taught by teacher t, and uses text x as reference.

For a given course, all possible combinations of teacher & text appear.

Ie, CTX satisfies the constraint: if tuples <c,t1,x1> & <c,t2,x2> appear, then tuples <c,t1,x2> & <c,t2,x1> also appear.

Contains a lot of redundancy: to add the info that Physics uses a new text, 3 tuples need to be created, one for each of the 3 teachers.

So this is a problematic BCNF relation.

Solution is to replace CTX with CT & CX

CT

COURSE    TEACHER
Physics   Prof. Green
Physics   Prof. Brown
Physics   Prof. Black
Maths     Prof. White

CX

COURSE    TEXT
Physics   Basic Mechanics
Physics   Principles of Optics
Maths     Modern Algebra
Maths     Projective Geometry

This decomposition is made on the basis of a new dependency named multi-valued dependency (MVD), which is a generalization of FD.
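A short sketch of why the CT and CX pair removes the redundancy: joining the two projections over COURSE rebuilds CTX, and a new text costs one CX tuple rather than one CTX tuple per teacher. The text "New Text" and the helper `join_ct_cx` are hypothetical.

```python
CT = {
    ("Physics", "Prof. Green"),
    ("Physics", "Prof. Brown"),
    ("Physics", "Prof. Black"),
    ("Maths", "Prof. White"),
}
CX = {
    ("Physics", "Basic Mechanics"),
    ("Physics", "Principles of Optics"),
    ("Maths", "Modern Algebra"),
    ("Maths", "Projective Geometry"),
}

def join_ct_cx(ct, cx):
    """Natural join of CT and CX over COURSE."""
    return {(c, t, x) for c, t in ct for c2, x in cx if c == c2}

CTX = join_ct_cx(CT, CX)

# Adding one text for Physics is a single new tuple in CX ...
CX_new = CX | {("Physics", "New Text")}
# ... but it corresponds to three new tuples in CTX, one per Physics teacher.
CTX_new = join_ct_cx(CT, CX_new)
extra = CTX_new - CTX
```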

Multi-valued dependency
Given a relation R with attributes A, B and C, the multi-valued dependence A →→ B holds in R if and only if the set of B-values matching a given pair (A-value, C-value) in R depends only on the A-value and is independent of the C-value. As usual A, B and C may be composite.

MVDs can exist only if the relation has at least 3 attributes.

In a relation R(A,B,C), the MVD A →→ B holds if and only if A →→ C also holds. Ie, MVDs always go together in pairs.

Ie, A →→ B | C.

For eg. COURSE →→ TEACHER | TEXT.
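For a three-attribute relation, A →→ B | C holds exactly when, for each A-value, the (B, C) pairs present are the full cross product of that A-value's B-values and C-values. A minimal sketch of this check on CTX; `mvd_holds` is a hypothetical helper, and the rows are built from the course, teacher and text values in the slides.

```python
from itertools import product

def mvd_holds(rows):
    """Check A ->> B | C for rows given as (a, b, c) tuples."""
    groups = {}
    for a, b, c in rows:
        bs, cs, pairs = groups.setdefault(a, (set(), set(), set()))
        bs.add(b)
        cs.add(c)
        pairs.add((b, c))
    # Each A-value must appear with every combination of its B- and C-values.
    return all(pairs == set(product(bs, cs)) for bs, cs, pairs in groups.values())

teachers = {"Physics": ["Prof. Green", "Prof. Brown", "Prof. Black"],
            "Maths": ["Prof. White"]}
texts = {"Physics": ["Basic Mechanics", "Principles of Optics"],
         "Maths": ["Modern Algebra", "Projective Geometry"]}

CTX = {(c, t, x) for c in teachers for t in teachers[c] for x in texts[c]}
holds_in_ctx = mvd_holds(CTX)

# Dropping one combination breaks COURSE ->> TEACHER | TEXT.
broken = CTX - {("Physics", "Prof. Black", "Basic Mechanics")}
holds_in_broken = mvd_holds(broken)
```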

Problem with CTX


CTX

COURSE    TEACHER       TEXT
Physics   Prof. Green   Basic Mechanics
Physics   Prof. Green   Principles of Optics
Physics   Prof. Brown   Basic Mechanics
Physics   Prof. Brown   Principles of Optics
Physics   Prof. Black   Basic Mechanics
Physics   Prof. Black   Principles of Optics
Maths     Prof. White   Modern Algebra
Maths     Prof. White   Projective Geometry

MVDs in CTX: COURSE →→ TEACHER and COURSE →→ TEXT.

COURSE →→ TEACHER means a course may not have a single corresponding teacher, but it will have a well-defined set of corresponding teachers.

CTX contains an MVD that is not an FD. CT and CX don't have any such MVD.

Relation R with attributes A, B and C can be non-loss decomposed into 2 projections R1(A,B) and R2(A,C) if and only if the MVD A →→ B | C holds in R.

Fourth Normal Form


A relation R is in 4th Normal Form (4NF) if and only if, whenever there exists an MVD in R, say A →→ B, then all attributes of R are also functionally dependent on A (ie, A → X for all attributes X of R).

In other words, the only dependencies (FD or MVD) in R are of the form K → X, ie, an FD from a candidate key K to some other attribute X.

Any relation is in Fourth Normal Form if it is in BCNF and any multivalued dependencies are trivial.

Eliminate non-trivial multivalued dependencies by projecting into simpler tables.

Fifth Normal Form


Some relations cannot be non-loss decomposed into 2 projections but can be non-loss decomposed into 3 projections.

SPJ

S#   P#   J#
S1   P1   J2
S1   P2   J1
S2   P1   J1
S1   P1   J1

SPJ is all key: no FD, no MVD, in 4NF.

Projections:

SP            PJ            JS
S#   P#       P#   J#       J#   S#
S1   P1       P1   J2       J2   S1
S1   P2       P2   J1       J1   S1
S2   P1       P1   J1       J1   S2

Join of SP and PJ over P#:

S#   P#   J#
S1   P1   J2
S1   P1   J1
S1   P2   J1
S2   P1   J2   (spurious)
S2   P1   J1

Join of this result with JS over (J#, S#) gives the original SPJ.

Join-dependence

The statement "SPJ is equal to the join of its 3 projections SP, PJ and JS" is equivalent to the statement: if the pair <s1,p1> appears in SP and the pair <p1,j1> appears in PJ and the pair <j1,s1> appears in JS, then the triple <s1,p1,j1> appears in SPJ.

Since <s1,p1> appears in SP if and only if s1 and p1 appear together in SPJ, and similarly for <p1,j1> and <j1,s1>, we can rewrite the above statement as a constraint on SPJ: if <s1,p1,j2>, <s2,p1,j1> and <s1,p2,j1> appear in SPJ, then <s1,p1,j1> also appears in SPJ.

This constraint is a join dependency (JD), since it holds if and only if the relation is the join of its projections. We can say SPJ satisfies the JD *(SP, PJ, JS).

Or, relation R satisfies the JD *(X, Y, ..., Z) if and only if it is the join of its projections on X, Y, ..., Z, where X, Y, ..., Z are subsets of the set of attributes of R.
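The join dependency can be checked on the four-tuple SPJ relation from the slides. In the sketch below, joining any two projections produces a spurious tuple, and joining in the third projection removes it again. The variable names are illustrative only.

```python
SPJ = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections of SPJ.
SP = {(s, p) for s, p, _j in SPJ}
PJ = {(p, j) for _s, p, j in SPJ}
JS = {(j, s) for s, _p, j in SPJ}

# Join SP and PJ over P#: this is larger than SPJ.
sp_pj = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}
spurious = sp_pj - SPJ

# Join the result with JS over (J#, S#): the spurious tuple disappears.
three_way = {(s, p, j) for s, p, j in sp_pj if (j, s) in JS}
```

`spurious` contains only (S2, P1, J2), and `three_way` equals the original `SPJ`, which is what the JD *(SP, PJ, JS) asserts.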

R(A,B,C) can be non-loss decomposed into R1(A,B) and R2(A,C) if and only if A →→ B | C holds in R.

This is equivalent to the statement that R(A,B,C) satisfies the JD *(AB, AC) if and only if it satisfies the MVD A →→ B | C.

So, we can say MVD is a special case of JD. Or, JDs are generalizations of MVDs.

JDs are the most general form of dependency possible, as far as decomposition by projection and recombination by join are concerned.

Problem with SPJ was that, although in 4NF, it satisfies a JD, *(SP, PJ, JS), that is not an MVD.

Fifth Normal Form

A relation is in 5NF, also called projection-join normal form (PJ/NF), if and only if every join dependency in the relation is implied by the candidate keys of the relation.

Means that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.

Advantages of normalization

Reduces data redundancies

Eliminates data anomalies

Produces controlled redundancies to link tables

Normalization tips
Normalization is performed to reduce or eliminate insertion, deletion or update anomalies.

However, a completely normalized database may not be the most efficient or effective implementation.

Denormalization is sometimes used to improve efficiency.

Normalization splits database information across multiple tables.

To retrieve complete information from a normalized database, the JOIN operation must be used.

JOIN tends to be expensive in terms of processing.
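To make the JOIN point concrete, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The table and column names (`SC`, `CS`, `sno`, `city`, `status`) are assumptions chosen for the example; `sno` stands in for S#, which would need quoting as an SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SC (sno TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE CS (city TEXT PRIMARY KEY, status INTEGER);
    INSERT INTO SC VALUES ('S1', 'London'), ('S2', 'Paris'), ('S3', 'Paris');
    INSERT INTO CS VALUES ('London', 20), ('Paris', 10), ('Athens', 30);
""")

# Supplier, city and status live in two tables, so a JOIN is needed
# to see all three together.
rows = conn.execute("""
    SELECT SC.sno, SC.city, CS.status
    FROM SC JOIN CS ON SC.city = CS.city
    ORDER BY SC.sno
""").fetchall()
conn.close()
```

Each status value is stored once per city in CS, and the query pays for that with a join at read time.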
