This document discusses schema refinement in database design. It begins by explaining that the initial conceptual database design must be refined to account for integrity constraints, performance criteria, and typical workloads. Redundancy in data storage can lead to problems like inconsistent updates and anomalies in insertion and deletion. Decomposition of relations and functional dependencies are introduced as techniques to address redundancy and other issues. The document provides examples and definitions related to functional dependencies, normal forms, and attribute closure in determining an optimal database schema.
Download as DOC, PDF, TXT or read online on Scribd
100%(2)100% found this document useful (2 votes)
1K views
Dbms Unit-5 Notes
This document discusses schema refinement in database design. It begins by explaining that the initial conceptual database design must be refined to account for integrity constraints, performance criteria, and typical workloads. Redundancy in data storage can lead to problems like inconsistent updates and anomalies in insertion and deletion. Decomposition of relations and functional dependencies are introduced as techniques to address redundancy and other issues. The document provides examples and definitions related to functional dependencies, normal forms, and attribute closure in determining an optimal database schema.
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 27
A Helpful Hand
DATABASE MANAGEMENT SYSTEMS
UNIT V SCHEMA REFINEMENT
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CSEROCKZ DATABASE MANAGEMENT SYSTEMS UNIT-V INTRODUCTION TO SCHEMA REFINEMENT
Conceptual database design gives us a set of relation schemas and integrity constraints (ICs) that can be regarded as a good starting point for the final database design. This initial design must be refined by taking the ICs into account and also by considering performance criteria and typical workloads. A major aim of relational database design is to minimie data redundancy. The problems associated with data redundancy are illustrated as follows! Problems caused by redudacy "toring the same information in more than one place within a database is called redundancy and can lead to several problems! Reduda! S!ora"e# "ome information is stored repeatedly. U$da!e Aomal%es# If one copy of such repeated data is updated# an inconsistency is created unless all copies are similarly updated. Iser!%o Aomal%es# It may not be possible to store certain information unless some other# unrelated# information is stored as well. Dele!%o Aomal%es# It may not be possible to delete certain information without losing some other# unrelated# information as well. E&# Consider a relation# $ourly%&mps(ssn, name, lot, rating, hourly_wages, hours_worked) The key for $ourly%&mps is ssn. In addition# suppose that the hourly_wages attribute is determined by the rating attribute. That is# for a given rating value# there is only one permissible hourly_wages value. This IC is an e'ample of a functional dependency. It leads to possible redundancy in the relation $ourly%&mps# as shown below! ssn name lo t ratin g hourly%wages hours%worked '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V ()* Adithya +, , -. +. (*) /evesh 00 , -. 1. (*+ Ayush "oni 1( ( * 1. (2* 3ajasekha r 1( ( * 10 (c- "unil 1( , -. +. If the same value appears in the rating column of two tuples# the IC tells us that the same value must appear in the hourly_wages column as well. This redundancy has the following problems! Redundant Storage: The rating value , corresponds to the hourly%wage -.# and this association is repeated three times. Update Anomalies: The hourly_wage in the first tuple could be updated without making a similar change in the second tuple. Insertion Anomalies: 4e cannot insert a tuple for an employee unless we know the hourly_wage for the employee5s rating value. Deletion Anomalies: If we delete all tuples with a given rating value (e.g.# we delete the tuples for Ayush "oni and 3ajasekhar) we lose the association between that rating value and its hourly_wage value. Null )alues 6ull values cannot provide a complete solution# but they can provide some help. Consider the e'ample $ourly%&mps relation. $ere null values cannot help to eliminate redundant storage# update or deletion anomalies. It appears that they can address insertion anomalies. 7or instance# we can insert an employee tuple with null values in the hourly wage field. $owever# null values cannot address all insertion anomalies. Thus# null values do not provide a general solution to the problems of redundancy# even though they can help in some cases. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Decom$os%!%os A decom$os%!%o o* rela!%o sc+ema R consists of replacing the relation schema by two or more relation schemas that each contain a subset of the attributes of R and together include all attributes in R. E&# 4e can decompose $ourly%&mps into two relations! $ourly%&mps0(ssn, name, lot, rating, hours_worked) 4ages(rating, hourly_wages) Problems Rela!ed !o Decom$os%!%o Two important 8uestions must be asked during decomposition process! -. /o we need to decompose a relation9 0. 4hat problems does a given decomposition cause9 To answer a first 8uestion# several normal forms have been proposed for relations. If a relation schema is in one of these normal forms# we know that certain kinds of problems cannot arise. 4ith respect second 8uestion# two properties of decomposition are to be considered! The lossless!oin property enables us to recover any instance of the relation of the decomposed relation from corresponding instances of the smaller relations. The dependencypreser"ation property enables us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations. That is# we need not perform joins of the smaller relations to check whether a constraint on the original relation is violated. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Fuc!%oal De$edec%es A *uc!%oal de$edecy (7/) is a kind of IC that generalies the concept of a key. :et 3 be a relation schema and let ; and < be nonempty sets of attributes in 3. 4e say that an instance r of 3 satisfies the 7/ ;=< - if the following holds for every pair of tuples t- and t0 in r! If t-.; > t0.;# then t-.< > t0.<. An 7/ ;=< says that if two tuples agree on the values in attributes ;# they must also agree on the values in attributes <. E&# The 7/ A?=C is satisfied by the following instance! A ? C / a- b- c- d- a- b- c- d0 a- b0 c0 d- a0 b- c1 d- $ere# if we add a tuple @a-# b-# c0# d-A to the instance shown in figure# the resulting instance would violate the 7/. Reaso%" abou! FDs 1 X Y is read as X functionally determines Y, or si!"# as X determines Y. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Biven a set of 7/s over a relation schema 3# typically several additional 7/s hold over 3 whenever all of the given 7/s hold. E&# Consider 4orkers(ssn, name, lot, did, since) 4ith given 7/s ssn # did and did# lot. Then# in any legal instance of 4orkers# if two tuples have the same ssn value# they must have the same did value# and because they have the same did value# they must also have the same lot value. Therefore# the 7/ ssn# lot also holds on 4orkers. Closure o* a Se! o* FDs The set of all 7/s implied by a given set $ of 7/s is called the closure o* F, denoted as $%. The closure $% can be calculated by using the following Arms!ro"-s A&%oms rules. :et ;# <# and C be the sets of attributes over a relation schema 3! Re*le&%.%!y# If ; is a super set of <# then ;=<. Au"me!a!%o# If ;=<# then ;C=<C for any C. Tras%!%.%!y# If ;=< and <=C# then ;=C. U%o# If ;=< and ;=C# then ;=<C. Decom$os%!%o# If ;=<C# then ;=< and ;=C. E&/# Consider a relation schema A?C with 7/s A=? and ?=C. 7rom transitivity# we get A=C. 7rom augmentation# we get AC=?C# A?=AC# A?=C?. E&0# '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Contracts(contractid, supplierid, pro!ectid, deptid, partid, &ty, "alue) 4e denote the schema for Contracts as 'S(D)*+. The following are the given 7/s! i, '# 'S(D)*+. ii, ()#'. iii, SD#). "everal additional 7/s hold in the closure of the set of given 7/s! 7rom ()#' and '#'S(D)*+, and transitivity# we infer ()#'S(D)*+. 7rom SD#) and augmentation# we infer SD(# (). 7rom SD(#(), ()#'S(D)*+, and transitivity# we infer SD(#'S(D)*+. Note: In a !r%.%al FD, the right side contains only attributes that also appear on the left side. Dsing refle-i"ity, we can generate all trivial dependencies# which are of the form! ;=<# where < is a subset of ;# ; is a subset of A?C# and < is a subset of A?C. 7rom augmentation# we get the nontrivial dependencies. A!!r%bu!e Closure '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V If we want to check whether a given dependency# say# ;=<# is in the closure $%, we can do so efficiently without computing $%. 4e first compute the a!!r%bu!e closure 12 with respect to 7# which is the set of attributes A such that ;=A can be inferred using Armstrong A'ioms. The algorithm for computing the attribute closure of a set ; of attributes is shown below! closure . /0 repeat until there is no change! E if there is an 7/ U#+ in $ such that U 1 closure, then set closure . closure D F G NORMA3IZATION '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V 2ormali3ation is a techni8ue for producing a set of relations with desirable properties# given the data re8uirements of an enterprise. The process of normaliation was first developed by &.7.Codd. 2ormali3ation is often performed as a series of tests on a relation to determine whether it satisfies or violates the re8uirements of a given normal form. Ad.a!a"es o* Normal%4a!%o# :ess storage space 3educes data redundancy in a database It eliminates serious manipulation anomalies. 7le'ible structure Normal Forms Biven a relation schema# we need to decide whether it is a good design or we need to decompose it into smaller relations. To make such a decision# several ormal *orms have been proposed. The normal forms based on 7/s are first normal form 452$,, second normal form462$,, third normal form472$,, and 8oyce'odd normal form 48'2$,. F%rs! Normal Form 5/NF6# '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V A relation in which the intersection of each row and column contains one and only one value. E&am$le# &mp6um &mpHhone &mp/egrees --- .+.I 01,+.--0 000 .+.I 012,*)(+ E ?A# ?"c# Hh/ G 111 .+.I 01+()*,2 E ?"c# J"c G Identification of multi"alued attri9utes: The above relation in o! in -67 as &mp/egrees is a multiIvalued field! employee 111 has two degrees! 8Sc and :Sc employee 000 has three degrees! 8A, 8Sc, )hD ;ransformation into 52$: In order to make this relation to be in -67# we remove the repeating group (&mp/egrees details) by placing the repeating data along with a copy of the original key attribute ( &mp6um) in a separate relation. The format of the resulting -67 relations are as follows! &mployee( &mp6um# &mpHhone) &mployee/egree(&mp6um# &mp/egrees) &mp6um &mpHhone '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V --- .+.I01,+.--0 000 .+.I012,*)(+ 111 .+.I01+()*,2 Secod Normal Form 50NF6# Par!%al De$edecy# A partial dependency e'ists when an attribute ? is functionally dependent on an attribute A# and A is a component of a multipart candidate key. 0NF# A relation is in 067 if it is in -67# and every nonIkey attribute is fully dependent on each candidate key. (That is# we don5t have any partial functional dependency.) A relation in 067 will not have any partial dependencies. E&am$le# Consider this I.3%e table (in -67)! Inv6um :ine6um Hrod6um Kty Inv/ate &mp6um &mp/egrees 000 ?A 000 ?"c 000 Hh/ 111 ?"c 111 J"c '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V The above relation has the following 7/s! EInv6um# :ine6umG=Hrod6um# Kty EInv6um# Hrod6umG=:ine6um# Kty Inv6um=Inv/ate Identification of partial dependencies: Inv:ine is o! 0NF since there is a partial dependency of Inv/ate on Inv6um. ;ransformation into 62$: 4e can impro"e the database by decomposing the above relation into relations! Inv6um :ine6um Hrod6um Kty Inv6um Inv/ate T+%rd Normal Form 57NF6# Tras%!%.e de$edecy# A condition where A# ?# and C are attributes of a relation such that if A =? and ?=C# then C is transitively dependent on A via ?. 7NF# A relation that is in first and second normal form# and in which no nonIkey attribute is transitively dependent on the candidate key. In other words# A relation in 167 will not have any transitive dependencies of nonIkey attribute on a candidate key through another nonI key attribute. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V 7ormally it can be given as! :et 3 be relation schema# 7 be the set of 7/s given to hold over 3# then for every 7/ ;=A in 7# one of the following is true! -) ; is a candidate key or 0) A is part of the key. E&am$le# Consider an Em$loyee relation! &mp6um &mp6ame /ept6um /ept6ame This relation has the following 7/s! &mp6um=&mp6ame# /ept6um# /ept6ame /ept6um=/ept6ame Identifying transiti"e dependencies: &mp6ame# /ept6um# and /ept6ame are nonIkey attributes. /ept6um determines /ept6ame# a nonIkey attribute# and /ept6um is not a candidate key. ;ransformation into 72$: Is the relation in 1679 L no Is the relation in 0679 L yes 4e correct the situation by decomposing the original relation into two 167 relations! '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V 8oyce9Codd Normal Form 58CNF6# 6amed after its inventors! ?oyce and Codd I "tronger than 167 De!erm%a!# 3efers to the attribute or group of attributes on the left hand side of the arrow of a functional dependency. &'! Consider an 7/# &mp6um=&mp&mail. $ere# &mp6um is a determinant of &mp&mail. 8CNF# A relation is in ?C67# if and only if# every determinant is a candidate key. 7ig! &.7.Codd &mp6um &mp6ame /ept6um /ept6um /ept6ame '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V 7ormally it can be given as! :et 3 be a relation schema# 7 be the set of 7/s given over 3# then for every 7/ ; =A in 7# ; should be a candidate key. E&am$le# '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V ;ransformation into 8'2$: Note: any relation that is in ?C67# is in 167 any relation in 167 is in 067 any relation in 067 is in -67 There is a se8uence to normal forms! -67 is considered the weakest 067 is stronger than -67 167 is stronger than 067# and ?C67 is considered the strongest '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Mul!%9)alued De$edec%es 5M)D6 The possible e'istence of multiIvalued dependencies in a relation is due to first normal form (-67)# which disallows an attribute in a tuple from having a set of values. 7or e'ample# if we have two multiIvalued attributes in a relation# we have to repeat each value of one of the attributes with every value of the other attribute# to ensure that tuples of a relation are consistent. This type of constraint is referred to as a multiIvalued dependency and results in data redundancy. Consider the <mployee relation which is not in -67 shown in figure! &mp6um &mpHhone &mp/egrees --- E .+.I000000# .+.I111111G E ?A# ?"cG The result of -67 on above relation is shown below! &mp6um &mpHhone &mp/egrees --- .+.I000000 ?A --- .+.I111111 ?"c --- .+.I000000 ?"c --- .+.I111111 ?A This relation records the <mp)hone and <mpDegrees details of an employee ---. $owever# the <mpDegrees of an employee are independent of <mp)hone. This constraint results in data redundancy and is referred to as multi"alued dependency. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Mul!%9)alued De$edecy 5M)D6# 3epresents a dependency between attributes (for e'ample# A# ?# and C) in a relation# such that for each value of A there is a set of values for ? and a set of values for C. $owever# the set of values for ? and C are independent of each other. 4e represent an JF/ between attributes A# ?# and C in a relation using the following notation! A == ? A == C For e&am$le, we specify the JF/ in the above <mployee relation as follows! <mp2um ## <mp)hone <mp2um ## <mpDegrees 7ormally# an JF/ can be defined as! :et 3 be a relation schema# and ; and < be disjoint subsets of 3# and C > 3M;<. If a relation 3 satisfies ; ==< 0 # the following must be true for every legal instance of r of R! if for any two tuples t-# t0 and t-(;) > t0(;)# then there e'ist t1 in r such that t1(;) > t-(;)# t1(<) > t-(<)# t1(C) > t0(C). ?y symmetry# there e'ist t+ in r such that# t+(;) > t-(;)# t+(<) > t0(<)# t+(C) > t-(C). 2 $%%Y &a' (e read as X multi-determines Y. ; < C '- y- - t- '- y0 0 t0 '- y- 0 t1 '- y0 - t+ '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V
The JF/ ;== < says that the relationship between ; and < is independent of the relationship between ; and 3M <. Arms!ro" A&%oms rules rela!e !o M)Ds# M)D Com$leme!a!%o# If ;==<# then ;==3M;<. M)D Au"me!a!%o# If ;==< and 4 be super set of C# then 4;==<C. M)D Tras%!%.%!y# If ;==< and <==C# then ;==(CM<). Tr%.%al M)D# An JF/ A==? in relation 3 is defined as being tri"ial if# a) ? is a subset of A or b) A D ? > 3. An JF/ is said to be nontri"ial if neither a) nor b) is satisfied. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Four!+ Normal Form 5:NF6# A relation that is in ?oyceICodd normal form and contains no nonItrivial JF/s. 7ormally# :et 3 be a relation schema# ; and < be nonempty subsets of the attributes of 3. 3 is said to be in :NF, if# for every JF/ ;==< that holds over 3# one of the following is true! < is a subset of ; or ;< > 3# or ; is a super key. E&am$le# Consider the above <mployee relation. Identifying nontri"ial :+D: The <mployee is not in +67 because of the presence of nonItrivial JF/. ;ransformation into =2$: 4e decompose the <mployee relation into <mp5 and <mp6 relations as shown below! <mp5 <mp6 &mp6um &mpHhone --- .+.I 000000 --- .+.I 111111 ?oth new relations are in +67 because the <mp5 relation contains the trivial JF/ <mp2um##<mp)hone, and the <mp6 relation contains the trivial JF/ <mp2um##<mpDegrees. &mp6um &mp/egrees --- ?A --- ?"c '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V Pro$er!%es o* Decom$os%!%o 3ossless9;o% Decom$os%!%o# :et 3 be a relation schema and let 7 be a set of 7/s over 3. A decomposition of 3 into two schemas with attribute sets ; and < is said to be a lossless9<o% decom$os%!%o =%!+ res$ec! !o F if# for every instance r of 3 that satisfies the dependencies in 7# N ' (r) N y (r) > r. In other words# we can recover the original relation from the decomposed relations. 7rom the definition it is easy to see that r is always a subset of natural join of decomposed relations. If we take projections of a relation and recombine them using natural join# we typically obtain some tuples that were not in the original relation. E&am$le# ?y replacing the instance r shown in figure with the instances N "H (r) and N H/ (r)# we lose some information.
Instance r > S) (r) N H/ (r) " H / s- p- d- s0 p0 d0 s1 p- d1 " H s- p- s0 p0 s1 p- H / p- d- p0 d0 p- d1 '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V N "H (r) N H/ (r) 7ig! Instances illustrating :ossy /ecompositions T+eorem# :et 3 be a relation and 7 be a set of 7/s that hold over 3. The decomposition of 3 into relations with attribute sets 3- and 30 is lossless if and only if 7 O contains either the 7/ 3- P 30 = 3- (or 3-M30) or the 7/ 3- P 30 = 30 (or 30M3-). Consider the $ourly%&mps relation. It has attributes S2?R@A, and the 7/ 3=4 causes a violation of 167. 4e dealt this violation by decomposing the relation into S2?RA and R@. "ince 3 is common to both decomposed relations and 3=4 holds# this decomposition is losslessIjoin. De$edecy9Preser.%" Decom$os%!%o# Consider the Contracts relation with attributes 'S(D)*+. The given 7/s are C=C"Q/HKF# QH=C# and "/=H. ?ecause "/ is not a key# the dependency "/=H causes a violation of ?C67. 4e can decompose Contracts into relations with schemas C"Q/KF and "/H to address this violation. The decomposition is losslessIjoin. ?ut# there is one problem. If we want to enforce an integrity constraint QH=C# it re8uires an e'pensive join of the two relations. 4e say that this decomposition is not dependencyIpreserving. :et 3 be a relation schema that is decomposed into two schemas with attributes sets ; and <# and let 7 be a set of 7/s over 3. The $ro<ec!%o o* F o 1 is the set " H / s- p- d- s0 p0 d0 s1 p- d1 s- p- d1 s1 p- d- '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V of 7/s in the closure 7 O that involve only attributes in ;. 4e denote the projection of 7 on attributes ; as 7 ; . 6ote that a dependency D=F in 7 O is in 7 ; only if all the attributes in D and F are in ;. The decomposition of relation schema 3 with 7/s 7 into schemas with attribute sets ; and < is de$edecy9$reser.%" if (7 ; D 7 < ) O > 7 O . E&am$le# Consider the relation 3 with attributes A?C is decomposed into relations with attributes A? and ?C. The set of 7/s over 3 includes A=?# ?=C# and C=A. The closure of 7 contains all dependencies in 7 plus A=C# ?=A# and C=?. Conse8uently 7 A? contains A=? and ?=A# and 7 ?C contains ?=C and C=?. Therefore# 7 A? D 7 ?C contains A=?# ?=C# ?=A and C=?. The closure of 7 A? and 7 ?C now includes C=A (which follows from C=? and ?=A). Thus the decomposition preserves the dependency C=A. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V E1ERCISES 0) i) The following relation is in 167# but not in ?C679 &'plain why and take an appropriate action9 '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V ii) If we reverse the direction between staff%id and class%code# then determine whether it is violating 067# 167# ?C679 If yes# perform the necessary action. 1) 6ormalie the following relation! +) Biven a relation with schema EI/# 6ame# Address# Hostcode# CardType# Card6umberG# the candidate key EI/G# and the following functional dependencies! R EI/G E6ame# Address# Hostcode# CardType# Card6umberG R EAddressG EHostcodeG R ECard6umberG ECardTypeG (i) &'plain why this relation is in second normal form# but not in third normal form. '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V (ii) "how how this relation can be converted to third normal form. <ou should show what functional dependencies are being removed# e'plain. SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS A:: T$& ?&"T EMAI3# cseroc>4?@A"ma%l(com '''( C S E R O C K Z ( C O M Page 27 DATABASE MANAGEMENT SYSTEMS UNIT-V
www.cserockz.com Kee$ 'a!c+%" *or Re"ular U$da!esB(CC '''( C S E R O C K Z ( C O M Page 27