Normalization
4/29/12
How do we decide on a suitable logical structure?
How do we decide which relations are needed?
How do we decide on the attributes in each relation?
How do we avoid redundancy?
How do we represent all the information?
How do we avoid loss of information?
SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400

P
P#  WEIGHT
P1  12
P2  17
P3  17
P4  14
P5  12
P6  19
STATUS given in the table SP: unnecessary repetitions
SP
S#  P#  QTY  STATUS
S1  P1  300  20
S1  P2  200  20
S1  P3  400  20
S1  P4  100  20
Normalization
A logical design method which minimizes data redundancy and reduces design flaws.
The normal forms break down large tables into smaller subsets.
Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data.
Normalization is a multi-step process beginning with an unnormalized relation.
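Normal Forms
First (1NF)
Second (2NF)
Third (3NF)
Boyce-Codd (BCNF)
Fourth (4NF)
Each form is nested inside the previous one: a relation in a higher normal form also satisfies all the lower ones.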
Unnormalized Relations
The first step in normalization is to convert the data into a two-dimensional table.
In unnormalized relations, data can repeat within a column.
Relations S and SP joined as FIRST (sample tabulation)
FIRST
S#  STATUS  CITY    P#  QTY
S1  20      London  P1  300
S1  20      London  P2  200
S1  20      London  P3  400
S1  20      London  P4  200
S1  20      London  P5  100
S1  20      London  P6  100
Primary key: (S#, P#)
Insertion: The fact that a supplier is located in a particular city cannot be entered unless the supplier supplies at least one part. E.g., we cannot record that S5 is located in Athens.
Update: The city value of a supplier appears many times, which causes redundancy problems while updating: changing a supplier's city means changing every one of its tuples.
Deletion: Deleting the only tuple for a particular supplier deletes not only the shipment data but also the location of the supplier. E.g., if the tuple with primary key (S3, P2) is deleted, we lose the information that S3 is located in Paris.
Solution to the 1NF anomalies
Replace relation FIRST by two relations SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY).
FDs: S# → STATUS and S# → CITY in SECOND; (S#, P#) → QTY in SP.
SECOND
S#  STATUS  CITY
S1  20      London
S2  10      Paris
S3  10      Paris
...
SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
...
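Given a relation R, attribute Y of R is functionally dependent on attribute X of R (written X → Y) if and only if each X-value in R has associated with it precisely one Y-value in R.
E.g., in the relation below, J → K holds, but J → L does not (the J-value X is associated with both L = 0 and L = 6).
J  K  L
X  1  0
X  1  6
Y  4  1
Y  4  9
Z  3  5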
FDs in the supplier-parts database
S# → SNAME
S# → STATUS
S# → CITY
or, equivalently,
S# → (SNAME, STATUS, CITY)
Equivalently: given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if, whenever two tuples of R agree on their X-value, they also agree on their Y-value.
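The tuple-agreement form of the definition translates almost directly into code. The following is a minimal sketch, assuming a relation is held as a list of Python dicts (one dict per tuple); `fd_holds` is a hypothetical helper, not part of any library. It can only test the current contents of a relation: an FD that holds in one instance is not thereby proven to be a constraint on every legal instance.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if lhs -> rhs holds in this instance: whenever two tuples
    agree on the lhs attributes, they also agree on the rhs attributes."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y-value seen for each X-value; any later
        # tuple with the same X-value must carry that same Y-value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# The J, K, L example relation.
R = [
    {"J": "X", "K": 1, "L": 0},
    {"J": "X", "K": 1, "L": 6},
    {"J": "Y", "K": 4, "L": 1},
    {"J": "Y", "K": 4, "L": 9},
    {"J": "Z", "K": 3, "L": 5},
]

print(fd_holds(R, ["J"], ["K"]))  # True
print(fd_holds(R, ["J"], ["L"]))  # False: J = "X" pairs with L = 0 and L = 6
```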
S
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens
S# → SNAME, S# → STATUS, S# → CITY
Attribute Y is fully functionally dependent on attribute X if it is functionally dependent on X and not functionally dependent on any proper subset of X. In other words, there does not exist a proper subset X' of the attributes constituting X such that Y is functionally dependent on X'.
E.g., in relation S, (S#, STATUS) → CITY. But CITY is not fully functionally dependent on (S#, STATUS), because CITY is also functionally dependent on S# alone.
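Full dependence adds a minimality test on the left-hand side: the FD must hold, and no proper subset of the determinant may also determine the dependent attribute. A minimal sketch, assuming list-of-dicts relations and hypothetical helper names (`fd_holds`, `is_full_fd`); the rows are three tuples of S (S1, S2, S5) with the values from the S table. The empty set counts as a proper subset, so an attribute that is constant across the whole relation would also be reported as not fully dependent.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Instance-level check of lhs -> rhs."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """lhs -> rhs is full if it holds and no proper subset of lhs
    (including the empty set) also determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1 are the proper subsets
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]

print(fd_holds(S, ["S#", "STATUS"], ["CITY"]))    # True
print(is_full_fd(S, ["S#", "STATUS"], ["CITY"]))  # False: S# -> CITY on its own
print(is_full_fd(S, ["S#"], ["CITY"]))            # True
```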
SP
S#  P#  QTY
S1  P1  300
...
Insertion: We can now enter new suppliers with their location, even if they supply no parts.
Deletion: We can delete shipment information without deleting information about the location of a supplier.
Update: Locations are easy to update, without repetition.
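SECOND
S#  STATUS  CITY
S1  20      London
S2  10      Paris
S3  10      Paris
S4  20      London
S5  30      Athens
Insertion: We cannot enter the fact that a particular city has a particular status value unless we have a supplier in that city.
Deletion: Deleting supplier information can cause the loss of information concerning the status value of a city. E.g., deleting S5 loses the fact that Athens has status 30.
Update: The status value of a city appears many times (once per supplier in that city), which can cause inconsistency while updating the database.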
Transitive Dependence
SECOND: S# → CITY, CITY → STATUS
STATUS is functionally dependent on S#. It is also transitively dependent on S# via CITY: each S# value determines a CITY value, which in turn determines the STATUS value.
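Transitive dependencies can also be derived from the FDs alone, without looking at any data, by computing the closure of a set of attributes. This uses the standard attribute-closure algorithm, which the slides do not state explicitly. A minimal sketch, assuming the two FDs of SECOND (S# → CITY and CITY → STATUS); `fd` and `closure` are hypothetical helper names.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD lhs -> rhs from two attribute collections."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: keep applying any FD whose left-hand side is
    already contained in the result until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


# The two FDs of SECOND: S# -> CITY and CITY -> STATUS.
SECOND_FDS = [fd({"S#"}, {"CITY"}), fd({"CITY"}, {"STATUS"})]

print(sorted(closure({"S#"}, SECOND_FDS)))    # ['CITY', 'S#', 'STATUS']
print(sorted(closure({"CITY"}, SECOND_FDS)))  # ['CITY', 'STATUS']
```

STATUS lands in the closure of {S#} even though no FD states S# → STATUS directly; it is reached only through CITY, which is exactly the transitive dependence. S# is not in the closure of {CITY}, so the chain does not run backwards.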
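Solution to the 2NF anomalies
Replace the original relation SECOND by two projections SC(S#, CITY) and CS(CITY, STATUS).
SC (S# → CITY)
S#  CITY
S1  London
S2  Paris
S3  Paris
S4  London
S5  Athens
CS (CITY → STATUS)
CITY    STATUS
Athens  30
London  20
Paris   10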
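One way to check that replacing SECOND by SC and CS loses no information is to project and then join the projections back together. A minimal sketch, assuming list-of-dicts relations and hypothetical helpers `project`, `natural_join` and `as_set`; the SECOND tuples are sample data in which S3 and S4 are assumed to be in Paris and London respectively, consistent with CITY → STATUS (Paris 10, London 20).

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Nested-loop natural join on the attributes the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]


def as_set(rows: List[Row]) -> set:
    """Compare relations as sets of tuples, ignoring attribute order."""
    return {tuple(sorted(row.items())) for row in rows}


SECOND = [
    {"S#": "S1", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S4", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "STATUS": 30, "CITY": "Athens"},
]

SC = project(SECOND, ["S#", "CITY"])
CS = project(SECOND, ["CITY", "STATUS"])

print(len(SC), len(CS))                                # 5 3
print(as_set(natural_join(SC, CS)) == as_set(SECOND))  # True
```

The status of each city is now stored once in CS (three tuples instead of five), which is what removes the update anomaly.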
3NF
Transitive dependencies are moved into a smaller (subset) table.
This further improves data integrity.
Insertion: We can now enter status values for cities, even if there is no supplier in that city.
Deletion: Deleting information about a supplier does not delete information about the status of a city.
Update: There is no redundancy, and hence no inconsistency.
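Boyce-Codd Normal Form (BCNF): a relation is in BCNF if and only if every determinant is a candidate key.
Most relations in 3NF are also in BCNF. A 3NF relation can fail to be in BCNF only when:
There are two or more candidate keys,
The keys are composite, and
The keys are not disjoint, that is, some attributes are common to more than one key (the keys overlap).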
EXAMPLES
Relations FIRST and SECOND, which are not in 3NF, are not in BCNF either.
Relations SP, SC and CS, which are in 3NF, are in BCNF also.
Relation FIRST contains three determinants: S#, CITY and (S#, P#). Of these, only (S#, P#) is a candidate key, so FIRST is not in BCNF.
SECOND is not in BCNF, because its determinant CITY is not a candidate key.
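The BCNF test ("every determinant is a candidate key") can be checked mechanically from the FDs: a determinant qualifies only if its attribute closure covers the whole relation, i.e., it is a superkey. A minimal sketch, assuming the FDs of FIRST and SECOND described in this section (S# → STATUS, S# → CITY, CITY → STATUS, and (S#, P#) → QTY); `fd`, `closure` and `bcnf_violations` are hypothetical helper names.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: repeatedly apply FDs whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(attrs: Iterable[str], fds: List[FD]) -> List[List[str]]:
    """Determinants of nontrivial FDs that do not determine every attribute."""
    everything = frozenset(attrs)
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, never a violation
        if not everything <= closure(lhs, fds):
            bad.append(sorted(lhs))
    return bad


FIRST_FDS = [
    fd({"S#"}, {"STATUS", "CITY"}),
    fd({"CITY"}, {"STATUS"}),
    fd({"S#", "P#"}, {"QTY"}),
]
SECOND_FDS = [fd({"S#"}, {"STATUS", "CITY"}), fd({"CITY"}, {"STATUS"})]
SC_FDS = [fd({"S#"}, {"CITY"})]

print(bcnf_violations({"S#", "STATUS", "CITY", "P#", "QTY"}, FIRST_FDS))  # [['S#'], ['CITY']]
print(bcnf_violations({"S#", "STATUS", "CITY"}, SECOND_FDS))              # [['CITY']]
print(bcnf_violations({"S#", "CITY"}, SC_FDS))                            # []
```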
S is in BCNF, since its determinant S# is a candidate key.
S
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens
S# → SNAME, S# → STATUS, S# → CITY
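Two candidate keys overlap if they involve two or more attributes each and have an attribute in common.
E.g., relation SSP(S#, SNAME, P#, QTY), where supplier names are assumed to be unique. The candidate keys of SSP are (S#, P#) and (SNAME, P#).
SSP is in 3NF, but it is not in BCNF, since the determinants S# and SNAME are not candidate keys (S# → SNAME and SNAME → S#).
Updates can cause inconsistency: a supplier's name is repeated in every SSP tuple for that supplier, so changing it requires updating all of them.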
E.g., relation SJT with attributes S, J and T, i.e., student, subject and teacher:
For each subject, each student of that subject is taught by only one teacher.
Each teacher teaches only one subject.
Each subject is taught by several teachers.
FDs: (S, J) → T and T → J.
Overlapping candidate keys: SJ and ST.
SJT
S      J        T
Smith  Maths    Prof. White
Smith  Physics  Prof. Green
Jones  Maths    Prof. ...
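Update anomalies: SJT still suffers from them. E.g., deleting the tuple <Smith, Physics, Prof. Green> also loses the fact that Prof. Green teaches Physics.
Solution: replace SJT by the two projections ST(S, T) and TJ(T, J), both of which are in BCNF.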
Another example: relation EXAM with attributes S (student), J (subject) and P (position). An EXAM tuple means that the specified student was examined in the specified subject and achieved the specified position.
Rule: there are no ties in the position, i.e., no two students obtained the same position in the same subject.
Candidate keys: SJ and JP.
Functional dependencies: (S, J) → P and (J, P) → S.
EXAM is in BCNF, since the candidate keys are the only determinants.
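SJT and EXAM both have overlapping composite keys; the difference lies only in their FDs, and it shows up clearly when candidate keys and non-key determinants are computed mechanically. A minimal sketch, assuming the FDs stated for each relation ((S, J) → T and T → J for SJT; (S, J) → P and (J, P) → S for EXAM); `candidate_keys` and `non_key_determinants` are hypothetical helpers, and the brute-force search over attribute subsets is only practical for small relations like these.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: repeatedly apply FDs whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs: Iterable[str], fds: List[FD]) -> List[List[str]]:
    """Minimal attribute sets whose closure is the whole relation. Subsets
    are tried smallest first, so supersets of an earlier key are skipped."""
    everything = frozenset(attrs)
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so not minimal
            if everything <= closure(cand, fds):
                keys.append(cand)
    return [sorted(k) for k in keys]


def non_key_determinants(attrs: Iterable[str], fds: List[FD]) -> List[List[str]]:
    """Determinants of nontrivial FDs that are not superkeys (BCNF violations)."""
    everything = frozenset(attrs)
    return [sorted(lhs) for lhs, rhs in fds
            if not rhs <= lhs and not everything <= closure(lhs, fds)]


SJT_FDS = [fd({"S", "J"}, {"T"}), fd({"T"}, {"J"})]
EXAM_FDS = [fd({"S", "J"}, {"P"}), fd({"J", "P"}, {"S"})]

print(candidate_keys({"S", "J", "T"}, SJT_FDS))         # [['J', 'S'], ['S', 'T']]
print(non_key_determinants({"S", "J", "T"}, SJT_FDS))   # [['T']]
print(candidate_keys({"S", "J", "P"}, EXAM_FDS))        # [['J', 'P'], ['J', 'S']]
print(non_key_determinants({"S", "J", "P"}, EXAM_FDS))  # []
```

In SJT the determinant T is not a key, so SJT is not in BCNF; in EXAM both determinants are keys, so EXAM is in BCNF despite its overlapping keys.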
Every relation in BCNF is also in 3NF, but not vice versa.
3NF can cause problems when candidate keys overlap; BCNF eliminates these problem cases.
Also, when there is more than one candidate key in a relation, 3NF may show anomalies that BCNF does not.
BCNF is simpler than 3NF, since its definition makes no reference to primary keys or transitive dependence.
Meaning of CTX: the indicated course can be taught by the indicated teacher, using all of the indicated texts.
CTX is in BCNF, i.e., it has no nontrivial FD (its only key is the combination of all three attributes), but it contains a lot of redundancy.
Why no FD? A course has any number of teachers and any number of texts, and a teacher or a text may be associated with any number of courses, so no attribute determines another.
A tuple <c,t,x> appears in CTX if and only if course c can be taught by teacher t and uses text x as a reference.
For a given course, all possible combinations of teacher and text appear.
I.e., CTX satisfies the constraint: if tuples <c,t1,x1> and <c,t2,x2> both appear, then tuples <c,t1,x2> and <c,t2,x1> also appear.
This causes a lot of redundancy: to add the information that Physics uses a new text, three tuples must be created, one for each of the three teachers.
So CTX is a problem BCNF relation.
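The "all combinations" constraint can be tested course by course: within each group of tuples with the same COURSE, the (TEACHER, TEXT) pairs must be exactly the cross product of that course's teachers and texts. A minimal sketch, assuming list-of-dicts relations; `mvd_holds` is a hypothetical helper, the sample uses the three Physics teachers (Green, Brown, Black), Prof. White for Maths, and two texts per course, and "New Physics Text" is a made-up title used only to show the redundancy.

```python
from itertools import product
from typing import Dict, List, Sequence

Row = Dict[str, str]


def mvd_holds(rows: List[Row], a: Sequence[str], b: Sequence[str]) -> bool:
    """Instance-level check of A ->> B | C, where C is every attribute not
    in A or B: within each A-group, every B-value appears with every C-value."""
    if not rows:
        return True
    c = [x for x in rows[0] if x not in a and x not in b]
    groups: Dict[tuple, set] = {}
    for row in rows:
        key = tuple(row[x] for x in a)
        pair = (tuple(row[x] for x in b), tuple(row[x] for x in c))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        b_values = {p[0] for p in pairs}
        c_values = {p[1] for p in pairs}
        if pairs != set(product(b_values, c_values)):
            return False
    return True


SAMPLE = [
    ("Physics", ["Prof. Green", "Prof. Brown", "Prof. Black"],
     ["Basic Mechanics", "Principles of Optics"]),
    ("Maths", ["Prof. White"], ["Modern Algebra", "Projective Geometry"]),
]
CTX = [
    {"COURSE": course, "TEACHER": t, "TEXT": x}
    for course, teachers, texts in SAMPLE
    for t in teachers
    for x in texts
]
print(len(CTX), mvd_holds(CTX, ["COURSE"], ["TEACHER"]))  # 8 True

# Adding a new Physics text for one teacher only breaks the constraint...
NEW = "New Physics Text"  # hypothetical title
one = CTX + [{"COURSE": "Physics", "TEACHER": "Prof. Green", "TEXT": NEW}]
print(mvd_holds(one, ["COURSE"], ["TEACHER"]))  # False

# ...so the fact must be repeated once per Physics teacher (three tuples).
three = CTX + [{"COURSE": "Physics", "TEACHER": t, "TEXT": NEW}
               for t in SAMPLE[0][1]]
print(mvd_holds(three, ["COURSE"], ["TEACHER"]))  # True
```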
CTX is decomposed into two projections, CT and CX:
CT
COURSE   TEACHER
Physics  Prof. Green
Physics  Prof. Brown
Physics  Prof. Black
Maths    Prof. White
CX
COURSE   TEXT
Physics  Basic Mechanics
Physics  Principles of Optics
Maths    Modern Algebra
Maths    Projective Geometry
This decomposition is made on the basis of a new kind of dependency called multi-valued dependency (MVD), which is a generalization of FD.
Multi-valued dependency
Given a relation R with attributes A, B and C, the multi-valued dependency A →→ B holds in R if and only if the set of B-values matching a given (A-value, C-value) pair in R depends only on the A-value and is independent of the C-value. As usual, A, B and C may be composite.
Nontrivial MVDs can exist only if the relation has at least three attributes.
In a relation R(A, B, C), the MVD A →→ B holds if and only if the MVD A →→ C also holds.
I.e., MVDs always go together in pairs, written A →→ B | C.
For example, COURSE →→ TEACHER | TEXT.
CTX
COURSE   TEACHER      TEXT
Physics  Prof. Green  Basic Mechanics
Physics  Prof. Green  Principles of Optics
Physics  Prof. Brown  Basic Mechanics
Physics  Prof. Brown  Principles of Optics
Physics  Prof. Black  Basic Mechanics
Physics  Prof. Black  Principles of Optics
Maths    Prof. White  Modern Algebra
Maths    Prof. White  Projective Geometry
COURSE →→ TEACHER and COURSE →→ TEXT.
COURSE →→ TEACHER means that a course may not have a single corresponding teacher, but it will have a well-defined set of corresponding teachers.
CTX contains an MVD that is not an FD; CT and CX do not have any such MVD.
A relation R with attributes A, B and C can be non-loss decomposed into two projections R1(A, B) and R2(A, C) if and only if the MVD A →→ B | C holds in R.
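The last statement (Fagin's theorem) can be checked on the sample data: project CTX onto CT and CX, join them back over COURSE, and compare with the original. A minimal sketch, assuming list-of-dicts relations and hypothetical helpers `project`, `natural_join`, `as_set` and `non_loss`; the extra tuple with the made-up title "New Physics Text" deliberately violates COURSE →→ TEACHER | TEXT.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Nested-loop natural join on the attributes the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]


def as_set(rows: List[Row]) -> set:
    return {tuple(sorted(row.items())) for row in rows}


def non_loss(rows: List[Row], attrs1: Sequence[str], attrs2: Sequence[str]) -> bool:
    """True if rows equals the natural join of its projections on attrs1 and attrs2."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return as_set(joined) == as_set(rows)


SAMPLE = [
    ("Physics", ["Prof. Green", "Prof. Brown", "Prof. Black"],
     ["Basic Mechanics", "Principles of Optics"]),
    ("Maths", ["Prof. White"], ["Modern Algebra", "Projective Geometry"]),
]
CTX = [{"COURSE": c, "TEACHER": t, "TEXT": x}
       for c, teachers, texts in SAMPLE for t in teachers for x in texts]
CT, CX = ["COURSE", "TEACHER"], ["COURSE", "TEXT"]

print(non_loss(CTX, CT, CX))  # True

# Violate the MVD: one hypothetical text for one Physics teacher only.
broken = CTX + [{"COURSE": "Physics", "TEACHER": "Prof. Green", "TEXT": "New Physics Text"}]
print(non_loss(broken, CT, CX))  # False: the join adds Brown and Black tuples for it
```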
A relation R is in Fourth Normal Form (4NF) if and only if, whenever there exists a nontrivial MVD in R, say A →→ B, all attributes of R are also functionally dependent on A (i.e., A → X for all attributes X of R).
In other words, the only nontrivial dependencies (FD or MVD) in R are of the form K → X, i.e., an FD from a candidate key K to some other attribute X.
Any relation that is in BCNF and whose multivalued dependencies are all trivial is in 4NF.
Eliminate nontrivial multivalued dependencies by projecting into simpler tables.
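For a single given MVD, the 4NF condition reduces to three checks: the MVD is nontrivial, it holds, and its left-hand side is not a superkey. A minimal sketch, assuming list-of-dicts relations and hypothetical helpers `mvd_holds`, `is_superkey` and `violates_4nf`; superkeys are tested on the instance only (no two distinct tuples share the A-value), which is weaker than a declared key constraint.

```python
from itertools import product
from typing import Dict, List, Sequence

Row = Dict[str, str]


def mvd_holds(rows: List[Row], a: Sequence[str], b: Sequence[str]) -> bool:
    """Instance-level check of A ->> B | C (C = all remaining attributes)."""
    if not rows:
        return True
    c = [x for x in rows[0] if x not in a and x not in b]
    groups: Dict[tuple, set] = {}
    for row in rows:
        key = tuple(row[x] for x in a)
        groups.setdefault(key, set()).add(
            (tuple(row[x] for x in b), tuple(row[x] for x in c)))
    return all(pairs == set(product({p[0] for p in pairs}, {p[1] for p in pairs}))
               for pairs in groups.values())


def is_superkey(rows: List[Row], a: Sequence[str]) -> bool:
    """In this instance, no two distinct tuples agree on A."""
    distinct = {tuple(sorted(r.items())) for r in rows}
    return len({tuple(r[x] for x in a) for r in rows}) == len(distinct)


def violates_4nf(rows: List[Row], a: Sequence[str], b: Sequence[str]) -> bool:
    """A ->> B breaks 4NF if it is nontrivial, holds, and A is not a superkey."""
    attrs = set(rows[0])
    nontrivial = not set(b) <= set(a) and set(a) | set(b) != attrs
    return nontrivial and mvd_holds(rows, a, b) and not is_superkey(rows, a)


SAMPLE = [
    ("Physics", ["Prof. Green", "Prof. Brown", "Prof. Black"],
     ["Basic Mechanics", "Principles of Optics"]),
    ("Maths", ["Prof. White"], ["Modern Algebra", "Projective Geometry"]),
]
CTX = [{"COURSE": c, "TEACHER": t, "TEXT": x}
       for c, teachers, texts in SAMPLE for t in teachers for x in texts]
CT = [{"COURSE": c, "TEACHER": t} for c, teachers, _ in SAMPLE for t in teachers]

print(violates_4nf(CTX, ["COURSE"], ["TEACHER"]))  # True
print(violates_4nf(CT, ["COURSE"], ["TEACHER"]))   # False: trivial in a two-attribute relation
```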
Some relations cannot be non-loss decomposed into two projections but can be non-loss decomposed into three projections.
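E.g., SPJ
S#  P#  J#
S1  P1  J2
S1  P2  J1
S2  P1  J1
S1  P1  J1
Projections SP, PJ and JS:
SP        PJ        JS
S#  P#    P#  J#    J#  S#
S1  P1    P1  J2    J2  S1
S1  P2    P2  J1    J1  S1
S2  P1    P1  J1    J1  S2
Join of SP and PJ over P#:
S#  P#  J#
S1  P1  J2
S1  P1  J1
S1  P2  J1
S2  P1  J2   (spurious)
S2  P1  J1
Joining this result with JS over (J#, S#) removes the spurious tuple <S2, P1, J2> and gives back the original SPJ.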
Join-dependence
The statement "SPJ is equal to the join of its three projections SP, PJ and JS" is equivalent to the statement: if the pair <s1,p1> appears in SP and the pair <p1,j1> appears in PJ and the pair <j1,s1> appears in JS, then the triple <s1,p1,j1> appears in SPJ.
Since <s1,p1> appears in SP if and only if s1 and p1 appear together in SPJ, and similarly for <p1,j1> and <j1,s1>, we can rewrite the above statement as a constraint on SPJ: if <s1,p1,j2>, <s2,p1,j1> and <s1,p2,j1> appear in SPJ, then <s1,p1,j1> also appears in SPJ.
This constraint is called a join dependency (JD), since it holds if and only if the relation is the join of certain of its projections. We say that SPJ satisfies the JD *(SP, PJ, JS).
In general, relation R satisfies the JD *(X, Y, ..., Z) if and only if it is the join of its projections on X, Y, ..., Z, where X, Y, ..., Z are subsets of the set of attributes of R.
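The SPJ example can be replayed in code: every join of two projections produces a spurious tuple, while the join of all three projections gives back SPJ exactly, which is what satisfying the JD *(SP, PJ, JS) means. A minimal sketch, assuming list-of-dicts relations and hypothetical helpers `project`, `natural_join` and `as_set`.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Nested-loop natural join on the attributes the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]


def as_set(rows: List[Row]) -> set:
    return {tuple(sorted(row.items())) for row in rows}


SPJ = [
    {"S#": s, "P#": p, "J#": j}
    for s, p, j in [("S1", "P1", "J2"), ("S1", "P2", "J1"),
                    ("S2", "P1", "J1"), ("S1", "P1", "J1")]
]
SP = project(SPJ, ["S#", "P#"])
PJ = project(SPJ, ["P#", "J#"])
JS = project(SPJ, ["J#", "S#"])

# Each two-way join has 5 tuples, one of them spurious.
for name, r, s in [("SP*PJ", SP, PJ), ("PJ*JS", PJ, JS), ("JS*SP", JS, SP)]:
    joined = natural_join(r, s)
    print(name, len(joined), as_set(joined) == as_set(SPJ))  # prints: <name> 5 False

three_way = natural_join(natural_join(SP, PJ), JS)
print(as_set(three_way) == as_set(SPJ))  # True
```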
R(A, B, C) can be non-loss decomposed into R1(A, B) and R2(A, C) if and only if A →→ B | C holds in R.
Or, equivalently, R(A, B, C) satisfies the JD *(AB, AC) if and only if it satisfies the MVD A →→ B | C.
So, we can say that an MVD is a special case of a JD.
Fifth Normal Form
A relation is in 5NF, also called projection-join normal form (PJ/NF), if and only if every join dependency in the relation is implied by the candidate keys of the relation.
This means that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.
Advantages of normalization
Eliminates
Produces
tables
Normalization tips
Normalization is performed to reduce or eliminate insertion, deletion and update anomalies.
However, a completely normalized database may not be the most efficient or effective implementation.
Denormalization is sometimes used to improve efficiency.
Normalization splits database information across multiple tables.
To retrieve complete information from a normalized database, the JOIN operation must be used.
JOIN tends to be expensive in terms of processing.