Unit 4 - PDF

ER Model

Uploaded by

lakshmi.bhavanib

Schema Refinement and Normal Forms
Chapter 19

Why Is This Important?

• Many ways to model a given scenario in a database
• How do we find the best one?
• We will discuss objective criteria for evaluating database design quality
  - Formally define desired properties
  - Algorithms for determining if a database has these properties
  - Algorithms for fixing problems

The Evils of Redundancy

• Redundancy is at the root of several problems associated with relational schemas:
  - Redundant storage
  - Insert, delete, update anomalies
• Integrity constraints can be used to identify schemas with such problems and to suggest refinements.
• Main refinement technique: decomposition
  - Replacing ABCD with, say, AB and BCD, or ACD and ABD.
• Decomposition should be used judiciously:
  - Is there reason to decompose a relation?
  - What problems (if any) does the decomposition cause?

Functional Dependencies (FDs)

• A functional dependency X → Y holds over relation R if, for every allowable instance r of R:
  - ∀ t1 ∈ r, t2 ∈ r: πX(t1) = πX(t2) implies πY(t1) = πY(t2)
  - I.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)
• An FD is a statement about all allowable relations.
  - Must be identified based on semantics of application.
  - Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R.
• K is a candidate key for R means that K → R
  - However, K → R does not require K to be minimal.
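The point that an instance can refute an FD but never prove it can be made concrete with a small check. The following is a minimal sketch, assuming tuples are stored as Python dicts; the function name `violates_fd` and the three-row instance `r1` are hypothetical and chosen only for illustration.

```python
def violates_fd(rows, lhs, rhs):
    """Return a pair of rows that agree on lhs but differ on rhs, or None.

    None only means that this particular instance does not violate
    lhs -> rhs; it does not show that the FD holds over the relation.
    """
    seen = {}  # lhs values -> (rhs values, first row seen with them)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key][0] != val:
            return seen[key][1], row
        seen.setdefault(key, (val, row))
    return None


# Hypothetical instance over attributes A, B, C.
r1 = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 7, "B": 2, "C": 8},
]

print(violates_fd(r1, ["A"], ["B"]))  # None: r1 does not violate A -> B
print(violates_fd(r1, ["B"], ["C"]))  # a witness pair: r1 violates B -> C
```

A `None` result for A → B says nothing about other allowable instances, which is why FDs have to come from the semantics of the application.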

Example: Constraints on Entity Set

• Consider a relation obtained from Hourly_Emps:
  - Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
• Notation: We will denote this relation schema by listing the attributes: SNLRWH
  - This is really the set of attributes {S,N,L,R,W,H}.
  - Sometimes, we will refer to all attributes of a relation by using the relation name. (e.g., Hourly_Emps for SNLRWH)
• Some FDs on Hourly_Emps:
  - ssn is the key: S → SNLRWH
  - rating determines hrly_wages: R → W

Example (Contd.)

• Problems in single "wide" table due to R → W:
  - Update anomaly: Can we change W in just the first tuple of SNLRWH?
  - Insertion anomaly: What if we want to insert an employee and don't know the hourly wage for his rating?
  - Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5.
• Are the two smaller tables better?

Hourly_Emps (SNLRWH):
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

Wages:
R  W
8  10
5  7

Hourly_Emps2:
S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40

Reasoning About FDs

• Given some FDs, we can infer additional FDs:
  - ssn → did, did → lot implies ssn → lot
• An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
  - F+ = closure of F; this is the set of all FDs that are implied by F.
• Armstrong's Axioms (X, Y, Z are sets of attributes):
  - Reflexivity: If X ⊆ Y, then Y → X.
  - Augmentation: If X → Y, then XZ → YZ for any Z.
  - Transitivity: If X → Y and Y → Z, then X → Z.
• These are sound (generate only FDs in F+) and complete (generate all FDs in F+) inference rules for FDs.

Reasoning About FDs (Contd.)

• Additional rules (that follow from Armstrong's Axioms):
  - Union: If X → Y and X → Z, then X → YZ
  - Decomposition: If X → YZ, then X → Y and X → Z
• Example: Contracts(cid, sid, jid, did, pid, qty, value) and:
  - C is the key: C → CSJDPQV
  - Project purchases each part using single contract: JP → C
  - Dept purchases at most one part from a supplier: SD → P
• JP → C, C → CSJDPQV imply JP → CSJDPQV
• SD → P implies SDJ → JP
• SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV
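The slides state that Union and Decomposition follow from Armstrong's Axioms without showing the steps. One possible derivation, written out as a sketch (the layout is ours, the rules are the three axioms above):

```latex
\begin{align*}
&\textbf{Union: } \text{given } X \to Y \text{ and } X \to Z \\
&\quad X \to XY   && \text{augment } X \to Y \text{ with } X \\
&\quad XY \to YZ  && \text{augment } X \to Z \text{ with } Y \\
&\quad X \to YZ   && \text{transitivity} \\[4pt]
&\textbf{Decomposition: } \text{given } X \to YZ \\
&\quad YZ \to Y,\ YZ \to Z && \text{reflexivity, since } Y \subseteq YZ \text{ and } Z \subseteq YZ \\
&\quad X \to Y,\ X \to Z   && \text{transitivity with } X \to YZ
\end{align*}
```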

Reasoning About FDs (Contd.)

• Computing the closure of a set of FDs can be expensive.
  - Size of closure is exponential in the number of attributes
• Typically, we just want to check if a given FD X → Y is in the closure of a set of FDs F. An efficient algorithm:
  - Compute attribute closure of X (denoted X+) w.r.t. F:
    * Set of all attributes A such that X → A is in F+
    * There is a linear time algorithm to compute this.
  - Check if Y is in X+
• Does F = {A → B, B → C, CD → E} imply A → E?
  - I.e., is A → E in the closure F+? Equivalently, is E in A+?

So, What Do We Do Now With FDs?

• Essential for identifying problems in a database design
• Provide a way for "fixing" the problem
• Key concept: normal forms
  - A relation that is in a certain normal form has certain desirable properties
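The attribute-closure test from the "Reasoning About FDs (Contd.)" slide can be sketched in a few lines. This is a simple fixpoint loop, not the linear-time algorithm the slide mentions; the function names and the encoding of an FD as a pair of strings of single-letter attributes are assumptions made for this example.

```python
def attribute_closure(attrs, fds):
    """Return X+ for the attribute set `attrs` under the FDs `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("CD", "E") for CD -> E.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD once its whole left side has been derived.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of `fds`."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", F)))  # ['A', 'B', 'C']
print(implies(F, "A", "E"))               # False
print(implies(F, "AD", "E"))              # True
```

For the question on the slide: A+ = {A, B, C}, which does not contain E, so F does not imply A → E. Adding D to the left side is enough, because CD → E can then fire.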

Normal Forms

• Returning to the issue of schema refinement, the first question to ask is whether any refinement is needed.
• If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems are avoided or minimized.
  - Helps deciding whether decomposing the relation will help.
• Role of FDs in detecting redundancy:
  - Consider a relation R with three attributes, ABC.
    * No FDs hold: There is no redundancy here.
    * Given A → B: Several tuples could have the same A value, and if so, they all have the same B value.

Boyce-Codd Normal Form (BCNF)

• Reln R with FDs F is in BCNF if, for all X → A in F+
  - A ∈ X (called a trivial FD), or
  - X is a superkey for R.
• In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints.
• R is free of any redundancy caused by FDs alone.
  - No field of any tuple can be inferred (using only FDs) from the values in the other fields in the relation instance
  - For X → A, consider two tuples with the same X value.
  - They should have the same A value. Redundancy?
  - No. Since R is in BCNF, X is a superkey and hence the "two" tuples must be identical.

X  Y   A
x  y1  a
x  y2  ?
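The BCNF definition above translates directly into a check built on attribute closure. The following is a minimal sketch; `closure` and `bcnf_violations` are hypothetical names, and FDs are encoded as pairs of strings of single-letter attributes. It tests only the given FDs rather than all of F+, which is a standard shortcut that suffices for the original schema R but, as far as we know, not for relations produced by a decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (simple fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given FDs that are non-trivial and whose left side
    is not a superkey of `schema`."""
    schema = set(schema)
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= schema
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


# Hourly_Emps from the earlier slides: S is the key, R -> W.
print(bcnf_violations("SNLRWH", [("S", "SNLRWH"), ("R", "W")]))  # [('R', 'W')]

# The three-attribute relation ABC with the single FD A -> B.
print(bcnf_violations("ABC", [("A", "B")]))                      # [('A', 'B')]
```

An empty result means no given FD violates BCNF for that schema.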

Problems Prevented By BCNF

• If BCNF is violated by (non-trivial) FD X → A, one of the following holds:
  - X is a proper subset of some key K.
    * We store (X, A) pairs redundantly.
    * E.g., Reserves(S, B, D, C) with SBD as only key and FD S → C
    * Credit card number of a sailor stored for each reservation
  - X is not a proper subset of any key.
    * Redundant storage of (X, A) pairs as above
    * And there is a chain of FDs K → X → A, which means that we cannot associate an X value with a K value unless we also associate an A value with an X value.
    * E.g., Hourly_Emps(S, N, L, R, W, H) with S as only key and FD R → W
    * Have chain S → R → W, hence cannot record the fact that employee S has rating R without knowing the hourly wage for that rating

Third Normal Form (3NF)

• Reln R with FDs F is in 3NF if, for all X → A in F+
  - A ∈ X (called a trivial FD), or
  - X is a superkey for R, or
  - A is part of some key for R.
• Minimality of a key is crucial in third condition above.
• If R is in BCNF, is it automatically in 3NF? What about the other direction?
• If R is in 3NF, some redundancy is possible.
  - 3NF is a compromise, used when BCNF is not achievable (e.g., no "good" decomposition, or performance considerations).
  - Lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible. (covered soon)

What Does 3NF Achieve?

• Prevents same problems as BCNF, except for FDs where A is part of some key
  - Consider FD X → A where X is not a superkey, but A is part of some key
  - E.g., Reserves(S, B, D, C) with key SBD and FDs S → C and C → S is in 3NF
    * Notice: same example as before, but adding C → S made it 3NF
    * Why? Since C → S and SBD is a key, CBD is also a key. Hence for S → C, C is part of a key
    * Redundancy problem: for each reservation of sailor S, same (S, C) pair is stored.
  - BCNF did not suffer from this redundancy problem.
• So, why do we need 3NF? Let's look at decompositions first.

Footnote About Other Normal Forms

• 1NF: every field contains only atomic values, i.e., no lists or sets
• 2NF: 1NF, and all attributes that are not part of any candidate key are functionally dependent on the whole of every candidate key
• 3NF implies 2NF
• 4NF: prevents redundancy from multi-valued dependencies (see book)
• 5NF: addresses redundancy based on join dependencies, which generalize multi-valued dependencies (see book)
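The Reserves example can be checked mechanically. Below is a rough sketch of a 3NF test that finds candidate keys by brute force (exponential in the number of attributes, so only suitable for small schemas). The names `closure`, `candidate_keys` and `violations_3nf` are hypothetical, FDs are pairs of strings of single-letter attributes, and only the given FDs are tested.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (simple fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the schema."""
    attrs = sorted(set(schema))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= set(attrs):
                keys.append(cand)
    return keys


def violations_3nf(schema, fds):
    """Return (lhs, attribute) pairs from the given FDs that break 3NF."""
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some key
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # left side is a superkey
        for a in sorted(set(rhs) - set(lhs)):  # skip the trivial part
            if a not in prime:
                bad.append((lhs, a))
    return bad


# Reserves(S, B, D, C) with key SBD and S -> C only: violates 3NF.
print(violations_3nf("SBDC", [("SBD", "C"), ("S", "C")]))              # [('S', 'C')]
# Adding C -> S makes CBD a key as well, so C becomes part of a key.
print(violations_3nf("SBDC", [("SBD", "C"), ("S", "C"), ("C", "S")]))  # []
```

The second call returns no violations even though S → C still causes the same (S, C) pair to be stored repeatedly, which is exactly the redundancy that 3NF tolerates and BCNF does not.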

Decomposition of a Relation Schema

• Suppose relation R contains attributes A1,..., An. A decomposition of R replaces R by two or more relations such that:
  - Each new relation schema contains a subset of the attributes of R (and no attributes that do not appear in R), and
  - Every attribute of R appears as an attribute of at least one of the new relations.
• Intuition: decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R.

Example Decomposition

• Decompositions should be used only when needed.
  - Let SNLRWH have FDs S → SNLRWH and R → W
  - Second FD causes violation of 3NF
    * W values repeatedly associated with R values.
  - Easiest fix: create a relation RW to store these associations and remove W from the main schema:
    * I.e., we decompose SNLRWH into SNLRH and RW
• Each SNLRWH tuple will now be projected into two tuples, SNLRH and RW, each stored in the corresponding relation
• Are there any potential problems with this approach?

Problems with Decompositions

• Three potential problems to consider:
  - Some queries become more expensive.
    * E.g., how much did employee Joe earn? (salary = W*H)
  - Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation.
    * Fortunately, not the case in the SNLRWH example.
  - Checking some dependencies may require joining the instances of the decomposed relations.
    * Fortunately, not the case in the SNLRWH example.
• Tradeoff: Must consider these issues vs. redundancy.

Reconstructing A Relation

Original table:
A  B  C
1  2  3
4  5  6
7  2  8

Decomposition:
A  B      B  C
1  2      2  3
4  5      5  6
7  2      2  8

Joined back together:
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8
7  2  3

What went wrong?
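The "Reconstructing A Relation" example is easy to replay. The sketch below implements projection and natural join over sets of tuples and reports the tuples that appear after the join but were not in the original table. The helper names `project` and `natural_join` are hypothetical; the data is the ABC instance from the slide.

```python
def project(attrs, rows, keep):
    """Project `rows` (tuples over `attrs`) onto the attributes in `keep`."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(attrs1, rows1, attrs2, rows2):
    """Natural join of two relations given as (attribute tuple, row set)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                out.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return attrs1 + tuple(extra), out


abc = ("A", "B", "C")
r = {(1, 2, 3), (4, 5, 6), (7, 2, 8)}

ab = project(abc, r, ("A", "B"))
bc = project(abc, r, ("B", "C"))
_, joined = natural_join(("A", "B"), ab, ("B", "C"), bc)

print(sorted(joined - r))  # [(1, 2, 8), (7, 2, 3)]
print(r <= joined)         # True
```

The two extra tuples arise because B = 2 occurs with two different A values and two different C values, and the join pairs them in every combination. No original tuple is lost; the information lost is which combinations were real.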

Lossless Join Decompositions

• Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
  - πX(r) ⋈ πY(r) = r
• It is always true that r ⊆ πX(r) ⋈ πY(r)
  - In general, the other direction does not hold.
  - If it does, the decomposition is lossless-join.
• Definition extended to decomposition into three or more relations in a straightforward way.
• It is essential that all decompositions used to deal with redundancy be lossless. Why?

More on Lossless Join

• The decomposition of R into X and Y is lossless-join w.r.t. F if and only if the closure of F contains:
  - X ∩ Y → X, or
  - X ∩ Y → Y
• Special case:
  - For FD U → V, the decomposition of R into UV and R − V is lossless-join.

ABC:
A  B  C
1  2  3
4  5  6
7  2  8

AB:       BC:
A  B      B  C
1  2      2  3
4  5      5  6
7  2      2  8

AB ⋈ BC:
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8
7  2  3
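The test on the "More on Lossless Join" slide needs only one attribute closure. A minimal sketch, assuming FDs are encoded as pairs of strings of single-letter attributes (the names `closure` and `is_lossless` are hypothetical):

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (simple fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(x, y, fds):
    """Binary decomposition test: is (X ∩ Y) -> X or (X ∩ Y) -> Y in F+?"""
    common_closure = closure(set(x) & set(y), fds)
    return set(x) <= common_closure or set(y) <= common_closure


# ABC into AB and BC with no FDs: lossy, as the tables show.
print(is_lossless("AB", "BC", []))                                # False
# With B -> C the common attribute B determines all of BC.
print(is_lossless("AB", "BC", [("B", "C")]))                      # True
# SNLRWH into SNLRH and RW: R -> W makes the split lossless.
print(is_lossless("SNLRH", "RW", [("S", "SNLRWH"), ("R", "W")]))  # True
```

The instance on the slide has B = 2 paired with both C = 3 and C = 8, so it does not satisfy B → C. That is consistent with the join producing extra tuples there.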

Dependency-Preserving Decomposition

• Consider CSJDPQV, C is key, JP → C and SD → P.
  - BCNF decomposition: CSJDQV and SDP
  - Problem: Checking JP → C now requires a join.
• Dependency-preserving decomposition (intuition):
  - Can enforce all FDs by examining a single relation instance on each insertion or modification of a tuple (do not need to join multiple relation instances)
• Formal definition requires notion of a projection of a set of FDs F over R:
  - If R is decomposed into X and Y, the projection of F onto X (denoted FX) is the set of all FDs U → V in F+ (closure of F) such that U and V both are in X.

Dependency Preserving Decompositions (Contd.)

• Decomposition of R into X and Y is dependency-preserving if (FX ∪ FY)+ = F+
  - I.e., if we consider only dependencies in the closure F+ that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F+.
• Important to consider F+, not F, in this definition:
  - ABC, A → B, B → C, C → A, decomposed into AB and BC.
  - Is this dependency preserving? Is C → A preserved?
• Dependency preserving does not imply lossless join:
  - ABC, A → B, decomposed into AB and BC.
• And vice-versa. (Example?)
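Computing the projections FX and FY explicitly can be expensive, but preservation of a single FD can be tested without materializing them. The sketch below uses a well-known textbook procedure (not taken from these slides): start from the left side and repeatedly add whatever each component relation can derive from the attributes it already sees. The names `closure`, `preserved` and `is_dependency_preserving` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (simple fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(lhs, rhs, parts, fds):
    """Is lhs -> rhs implied by the projections of `fds` onto `parts`?"""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            part = set(part)
            # What this relation alone can derive from what it sees of result.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(parts, fds):
    """It is enough to test each given FD; the rest of F+ then follows."""
    return all(preserved(lhs, rhs, parts, fds) for lhs, rhs in fds)


cycle = [("A", "B"), ("B", "C"), ("C", "A")]
print(preserved("C", "A", ["AB", "BC"], cycle))          # True
print(is_dependency_preserving(["AB", "BC"], cycle))     # True

contracts = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
print(preserved("JP", "C", ["CSJDQV", "SDP"], contracts))        # False
print(is_dependency_preserving(["CSJDQV", "SDP"], contracts))    # False
```

In the ABC example C → A survives because F+ contains C → B (checkable in BC) and B → A (checkable in AB), which is why the definition has to use F+ rather than F.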

Decomposition into BCNF

• Consider relation R with FDs F. If X → Y violates BCNF, decompose R into R − Y and XY.
  - Repeated application of this idea will give us a collection of relations that are in BCNF
    * Lossless join decomposition and guaranteed to terminate.
  - E.g., CSJDPQV, key C, JP → C, SD → P, J → S
  - To deal with SD → P, decompose into SDP and CSJDQV.
  - To deal with J → S, decompose CSJDQV into JS and CJDQV.
• In general, several dependencies may cause violation of BCNF. The order in which we "deal with" them could lead to very different sets of relations.

BCNF and Dependency Preservation

• In general, there may not be a dependency-preserving decomposition into BCNF.
  - E.g., CSZ with CS → Z and Z → C
  - Not in BCNF, but cannot decompose while preserving CS → Z.
• Similarly, decomposition of CSJDPQV into SDP, JS and CJDQV is not dependency preserving (w.r.t. the FDs JP → C, SD → P and J → S). Why?
  - Note: adding relation JPC gives us a dependency-preserving decomposition into BCNF.
    * Problem: redundancy across relations. Each relation by itself is in BCNF (i.e., no redundancy within relation), but JPC's tuples can be obtained by joining CSJDQV and SDP.
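The repeated "split on a violating FD" idea from the "Decomposition into BCNF" slide can be sketched as follows. To find violations inside a sub-relation, this version brute-forces every attribute subset and takes its closure restricted to the relation, which is exponential and meant only for slide-sized examples. All function names are hypothetical, and the order in which violations are found (alphabetical, smallest left side first) is an arbitrary choice that affects the result.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (simple fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) such that X -> Y violates BCNF inside `rel`, or None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & rel
            # X determines something new, but not the whole relation.
            if determined != x and determined != rel:
                return x, determined - x
    return None


def bcnf_decompose(schema, fds):
    """Split relations on violating FDs until none remain."""
    todo, done = [frozenset(schema)], []
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            todo.append(frozenset(rel - y))  # R - Y
            todo.append(frozenset(x | y))    # XY
    return done


hourly = bcnf_decompose("SNLRWH", [("S", "SNLRWH"), ("R", "W")])
print(sorted("".join(sorted(r)) for r in hourly))  # ['HLNRS', 'RW']

contracts_fds = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P"), ("J", "S")]
contracts = bcnf_decompose("CSJDPQV", contracts_fds)
print(sorted("".join(sorted(r)) for r in contracts))
```

For Hourly_Emps this reproduces the split into SNLRH and RW. For Contracts the sketch happens to meet J → S first, so its output need not match the relations listed on the slide, which illustrates the remark that the order of dealing with violations matters.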

Decomposition into 3NF

• Algorithm for lossless-join decomposition into BCNF can be used to obtain a lossless-join decomposition into 3NF (typically, can stop earlier).
• To ensure dependency preservation, one idea:
  - If X → Y is not preserved, add relation XY.
  - Problem is that XY may violate 3NF.
  - What can we do then?
• Refinement: Instead of the given set of FDs F, work with the minimal cover for F.

Minimal Cover for a Set of FDs

• Minimal cover G for a set of FDs F:
  - Closure of F = closure of G.
  - Right hand side of each FD in G is a single attribute.
  - If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.
• Intuitively, every FD in G is needed, and "as small as possible" in order to get the same closure as F.
• E.g., A → B, ABCD → E, EF → GH, ACDF → EG has the following minimal cover:
  - A → B, ACD → E, EF → G and EF → H

Finding The Minimal Cover

• F = {A → B, ABCD → E, EF → GH, ACDF → EG}
• Decomposition to have single attribute on right side
  - A → B, ABCD → E, EF → G, EF → H, ACDF → E, ACDF → G
• Check if any attribute on left side can be deleted without changing closure
  - A → B, ABCD → E, EF → G, EF → H, ACDF → E, ACDF → G
  - B can be deleted from ABCD → E and F from ACDF → E, since A → B and ACD → E make them unnecessary
• Delete FDs that are implied by others
  - A → B, ACD → E, EF → G, EF → H, ACD → E, ACDF → G
    * The second copy of ACD → E is a duplicate
    * ACDF → G follows from ACD → E, EF → G
  - Result: A → B, ACD → E, EF → G, EF → H

Dependency-Preserving Decomposition into 3NF

• Using minimal cover F of given FD set, we can now achieve a lossless-join, dependency-preserving decomposition into 3NF.
  1. Lossless-join decomposition until all smaller relations are in 3NF
  2. For each FD X → A in F that is not preserved, add relation XA
• Result is lossless-join (X is superkey of XA) and dependency-preserving (obviously), but is it still in 3NF?
  - All relations after step 1 are in 3NF, but what about XA?
  - X → A is not a problem for 3NF because X is a superkey of XA
  - What if another FD on XA is a problem for 3NF?
    * Any FD on XA can only contain attributes from X ∪ {A}
    * If right-hand side of FD in FXA contains A, left must be X (otherwise X → A would not have been in minimal cover)
    * If right-hand side does not contain A, it must be a subset of X, i.e., is a subset of a key
    * Why is X a key? It is a superkey, but is it minimal?
    * Yes: if X' ⊂ X was a key, then X → A would not have been in the minimal cover and X' → A would have been there
• Why not use the same algorithm for lossless-join, dependency-preserving decomposition into BCNF?
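The three steps of "Finding The Minimal Cover" can be sketched as below. The function names are hypothetical, input FDs are pairs of strings of single-letter attributes, and the output uses (frozenset of left-side attributes, single right-side attribute) pairs. A set of FDs can have more than one minimal cover; this sketch returns the one produced by processing FDs in the order given.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (simple fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: a single attribute on every right side.
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in g:
                g.append(fd)

    # Step 2: delete left-side attributes that are not needed.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and a in closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, a)

    # Step 3: delete duplicates, then FDs implied by the remaining ones.
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [("A", "B"), ("ABCD", "E"), ("EF", "GH"), ("ACDF", "EG")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
# A -> B
# ACD -> E
# EF -> G
# EF -> H
```

The printed cover matches the one given on the "Minimal Cover for a Set of FDs" slide.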

Update on DB Design Process

• Create ER diagram
• Translate ER diagram into set of relations
• Check relations for redundancy problems (not in 3NF, BCNF)
• Perform decomposition to fix problems
• Update ER diagram

Refining Entity Sets

• Consider Hourly_Emps(ssn, name, lot, rating, hourly_wages, hours_worked)
  - FDs: S → SNLRWH and R → W
• Assume designer created entity set Hourly_Emps as above
  - Redundancy problem with R → W
  - Could not discover it in ER diagram (only shows primary key constraints)
• To fix redundancy problem, create new entity set Wage_Table(rating, hourly_wages)
  - Add relationship to connect Hourly_Emps2(S, N, L, H) and Wage_Table(R, W)
• Similar for refining of relationship sets (see book)

Identifying Entity Attributes

[ER diagram, before: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]

• 1st diagram translated
  - Workers(S,N,L,D,S)
  - Departments(D,M,B)
  - Lots associated with workers.
• Suppose all workers in a dept are assigned the same lot: D → L
• Redundancy!
• Fixed by:
  - Workers2(S,N,D,S)
  - Dept_Lots(D,L)
  - Departments(D,M,B)
• Can fine-tune this:
  - Workers2(S,N,D,S)
  - Departments(D,M,B,L)

[ER diagram, after: Employees (ssn, name), Works_In (since), Departments (did, dname, budget, lot)]

Summary of Schema Refinement

• If a relation is in BCNF, it is free of redundancies that can be detected using FDs. Thus, trying to ensure that all relations are in BCNF is a good heuristic.
• If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations.
  - Must consider whether all FDs are preserved. If a lossless-join, dependency preserving decomposition into BCNF is not possible (or unsuitable, given typical queries), consider decomposition into 3NF.
  - Decompositions should be carried out and/or re-examined while keeping performance requirements in mind.
