My Normalization Chapter

Uploaded by

deepanshu rawat

Chapter 10

Functional Dependencies and Normalization for Relational Databases
Prepared by : Ms. SHWETA WADHERA
Dept of Computer Science
DDUC
Chapter Outline

- 1 Informal Design Guidelines for Relational Databases
  - 1.1 Semantics of the Relation Attributes
  - 1.2 Redundant Information in Tuples and Update Anomalies
  - 1.3 Null Values in Tuples
  - 1.4 Spurious Tuples
- 2 Functional Dependencies (FDs)
  - 2.1 Definition of FD
- 3 Normal Forms Based on Primary Keys
  - 3.1 Normalization of Relations
  - 3.2 Practical Use of Normal Forms
  - 3.3 Definitions of Keys and Attributes Participating in Keys
  - 3.4 First Normal Form
  - 3.5 Second Normal Form
  - 3.6 Third Normal Form
- 4 BCNF (Boyce-Codd Normal Form)
Informal Design Guidelines for Relational Databases (1)

- What is relational database design?
  - The grouping of attributes to form "good" relation schemas
- Two levels of relation schemas
  - The logical "user view" level
  - The storage "base relation" level
- Design is concerned mainly with base relations
- What are the criteria for "good" base relations?
Informal Design Guidelines for Relational Databases (2)

- We first discuss informal guidelines for good relational design
- Then we discuss formal concepts of functional dependencies and normal forms
  - 1NF (First Normal Form)
  - 2NF (Second Normal Form)
  - 3NF (Third Normal Form)
  - BCNF (Boyce-Codd Normal Form)
- Additional types of dependencies, further normal forms, and relational design algorithms by synthesis are discussed in Chapter 11
Problem 1: 1.1 Semantics of the Relation Attributes

- GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
  - Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation
  - Only foreign keys should be used to refer to other entities
  - Entity and relationship attributes should be kept apart as much as possible
- Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.
Figure 10.1 A simplified COMPANY relational database schema
Problem 2: 1.2 Redundant Information in Tuples and Update Anomalies

- Grouping attributes into relations has a significant effect on storage space (as in EMP_DEPT).
- Compare the space used by the two base relations EMPLOYEE and DEPARTMENT (Pg 343) with the space used by EMP_DEPT.
- In EMP_DEPT, the attribute values pertaining to a particular department (Dnum, Dname, Dmgr_ssn) are repeated for every employee who works for that department.
- Information is stored redundantly, wasting storage.
- So, mixing attributes of multiple entities may cause problems.
- Another serious problem with using the relations of Figure 10.4 as base relations is the PROBLEM OF UPDATE ANOMALIES.

UPDATE ANOMALIES

- Update anomalies can be classified into:
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies
EXAMPLE OF AN UPDATE ANOMALY (1)

- Consider the relation:
  EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
- Update Anomaly: Changing the name of project number P1 from "Billing" to "Customer-Accounting" forces this update to be made in the tuples of all 100 employees working on project P1.
EXAMPLE OF AN INSERTION ANOMALY (1)

Insertion anomalies can be differentiated into two types, illustrated by the following examples based on the EMP_DEPT relation:

CASE 1:
- To insert a NEW EMP TUPLE into EMP_DEPT, we have to include the attribute values for the DEPT in which the employee works, or NULLs (if the EMP is currently not allocated to any DEPT).
- If the EMP is working in a DEPT, then the DEPT data (Dnum, Dname, Dmgr_ssn) must be entered CONSISTENTLY with the values already stored for that DEPT in other tuples.
- On the contrary, if EMPLOYEE and DEPT are separate tables, then we do not have to worry about consistency or NULL values.
EXAMPLE OF AN INSERTION ANOMALY (2)

CASE 2:
- It is difficult to insert a NEW DEPT that has no EMPLOYEE as yet. The only way to do so is to place NULL values in the employee attributes, including Emp_no.
- This causes a PROBLEM, as Emp_no is the PRIMARY KEY (which cannot be NULL), and each tuple is supposed to represent an EMPLOYEE ENTITY.
- Also, when the first EMP is added to this NEW DEPT, we no longer need the tuple with NULL values.
- On the contrary, if EMPLOYEE and DEPT are separate tables, then we do not have to worry about any of this, as DEPT is a separate table in that scenario.
EXAMPLE OF A DELETION ANOMALY

- The problem of deletion anomalies is related to the SECOND INSERTION ANOMALY SITUATION.
- If we delete from EMP_DEPT an employee tuple that happens to represent the last EMP working for a particular department, THEN the information pertaining to that DEPT is lost from the DATABASE.
- On the contrary, if EMP and DEPT are separate relations, then there is no such problem.
EXAMPLE OF A MODIFICATION ANOMALY

- In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of DEPT number 5), then we must UPDATE the tuples of all employees who work in DEPT number 5.
- Otherwise, it can lead to INCONSISTENCY IN THE DATABASE, i.e. the same DEPT will show TWO DIFFERENT MANAGERS.
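The modification anomaly can be reproduced in a few lines. What follows is a minimal sketch in Python: the rows, the manager numbers (E55, E99) and the helper `managers_of` are all made up for illustration and do not come from the figures.

```python
# EMP_DEPT as one "fat" relation: department data is repeated per employee.
# All values below are invented sample data for illustration only.
emp_dept = [
    {"Eno": "E1", "Ename": "Mahesh", "Dnum": 5, "Dname": "Research", "Dmgr_eno": "E55"},
    {"Eno": "E2", "Ename": "Ramesh", "Dnum": 5, "Dname": "Research", "Dmgr_eno": "E55"},
    {"Eno": "E5", "Ename": "Priya", "Dnum": 5, "Dname": "Research", "Dmgr_eno": "E55"},
]


def managers_of(rows, dnum):
    """Return the set of managers recorded for one department."""
    return {r["Dmgr_eno"] for r in rows if r["Dnum"] == dnum}


# A careless update changes the manager in only one of the three tuples.
emp_dept[0]["Dmgr_eno"] = "E99"
inconsistent = managers_of(emp_dept, 5)  # DEPT 5 now shows two managers

# With a separate DEPARTMENT relation the manager is stored exactly once,
# so a single update cannot leave the data inconsistent.
department = {5: {"Dname": "Research", "Dmgr_eno": "E55"}}
department[5]["Dmgr_eno"] = "E99"
```

After the partial update, `inconsistent` holds both manager numbers, which is exactly the "two different managers" situation described above.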
Figure 10.3 Two relation schemas suffering from update anomalies

Figure 10.4 Example states for EMP_DEPT and EMP_PROJ


Guideline 2: To Avoid Redundant Information in Tuples and Update Anomalies

- GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies. If any are present, note them so that applications can be made to take them into account.
Problem 3: 1.3 Null Values in Tuples

- In some schema designs we may group many attributes together into a "FAT" RELATION.
- If many of the attributes do NOT apply to ALL tuples in the relation, we end up with MANY NULLs in the tuples.
- This leads to waste of space.
- Another problem with NULLs is how to account for them when aggregate operations like COUNT or SUM are applied: whether NULLs are counted or skipped depends on how the aggregate is written, so the results can be misleading.
- NULL can have multiple interpretations:
  - The attribute does NOT apply to this tuple.
  - The attribute value for this tuple is unknown.
  - The value is known, but absent; that is, it has not been recorded yet.
- Hence, as far as possible, AVOID placing attributes in a base relation whose values may frequently be NULL.
GUIDELINE 3: To Avoid Null Values in Tuples

- GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible
- Attributes that are NULL frequently could be placed in separate relations (with the primary key)
- Reasons for nulls:
  - attribute not applicable or invalid
  - attribute value unknown (may exist)
  - value known to exist, but unavailable
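The effect of NULLs on aggregates can be seen with a small experiment. The sketch below uses Python's standard `sqlite3` module; the table `emp` and its three rows are invented for illustration, and other database systems may differ in details such as result types.

```python
import sqlite3

# In-memory database with one NULL salary (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (eno TEXT PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("E1", 100), ("E2", None), ("E3", 300)],
)

# COUNT(*) counts rows; COUNT(salary), SUM and AVG skip the NULL salary.
count_all, count_salary, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM emp"
).fetchone()
conn.close()
```

Here the row count is 3 but only 2 salaries are counted, and the average is taken over 2 values rather than 3, which is one reason attributes that are frequently NULL are better kept out of a base relation.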
2.1 Functional Dependencies

- Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
- FDs and keys are used to define normal forms for relations
- FDs are constraints that are derived from the meaning and interrelationships of the data attributes
- A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

Functional Dependencies (2)

- X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y
- For any two tuples t1 and t2 in any relation instance r(R): If t1[X] = t2[X], then t1[Y] = t2[Y]
- X -> Y in R specifies a constraint on all relation instances r(R)
- Written as X -> Y; can be displayed graphically on a relation schema as in the figures (denoted by an arrow)
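The tuple-based definition translates directly into a check on one relation instance. The following is a minimal sketch in Python; the function name `fd_holds` and the sample rows are assumptions made for illustration.

```python
def fd_holds(rows, X, Y):
    """Check X -> Y on one relation instance (a list of dicts).

    Returns False as soon as two tuples agree on X but differ on Y.
    """
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Remember the first Y value seen for each X value and compare.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Invented sample state of an EMP_PROJ-like relation.
emp_proj = [
    {"Eno": "E1", "Pno": "P1", "Ename": "Mahesh", "Hours": 30},
    {"Eno": "E1", "Pno": "P2", "Ename": "Mahesh", "Hours": 7},
    {"Eno": "E2", "Pno": "P1", "Ename": "Ramesh", "Hours": 10},
]

eno_ename = fd_holds(emp_proj, ["Eno"], ["Ename"])
eno_hours = fd_holds(emp_proj, ["Eno"], ["Hours"])
key_hours = fd_holds(emp_proj, ["Eno", "Pno"], ["Hours"])
```

Note that a single instance can only refute an FD. Since an FD is a constraint on all relation instances, passing this check on sample data does not prove that the FD holds in the schema.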

Examples of FD constraints

- FDs are derived from the real-world constraints on the attributes.
- Social security number determines employee name
  - SSN -> ENAME
- Employee number (ENo) and project number determine the hours per week that the employee works on the project
  - {ENo, PNUMBER} -> HOURS
- Project number determines project name and location
  - PNUMBER -> {PNAME, PLOCATION}
Examples of FD constraints (2)

- An FD is a property of the attributes in the schema R
- The constraint must hold on every relation instance r(R)
- If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])
NORMALIZATION OF RELATIONS

- 3.1 Normalization of Relations
- 3.2 Practical Use of Normal Forms
- 3.3 Definitions of Keys and Attributes Participating in Keys
- 3.4 First Normal Form
- 3.5 Second Normal Form
- 3.6 Third Normal Form

3.1 Normalization of Relations

- Normalization: The process of decomposing unsatisfactory "bad" relations into smaller relations by reorganizing their attributes
- Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
Normalization of Relations (2)

- 2NF, 3NF, BCNF: based on keys and FDs of a relation schema
- 4NF: based on keys and multi-valued dependencies (MVDs)
- 5NF: based on keys and join dependencies (JDs)
- Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)
3.2 Practical Use of Normal Forms

- Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
- The practical utility of these normal forms becomes questionable only when the constraints on which they are based are hard to understand or to detect
- The database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF)
- Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form
Definitions of Keys and Attributes Participating in Keys (2)

- If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
- A prime attribute must be a member of some candidate key.
- A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
3.4 First Normal Form

- Disallows composite attributes, multivalued attributes, and nested relations, i.e. attributes whose values for an individual tuple are non-atomic
- It states that the domain of an attribute must include only ATOMIC (simple, indivisible) values and that the value of any attribute in a tuple must be a SINGLE VALUE.
- It was defined to disallow multivalued attributes, composite attributes and their combinations.
- So, the only values permitted by 1NF are single atomic (or indivisible) values.

- Consider the following example:

Dname       Dnumber  Dmgrno  Dlocations
Research    D5       E55     {Bangalore, Delhi, Hyderabad}
Admin       D2       E12     {Bangalore}
HR          D1       E15     {Delhi, Hyderabad}
Automation  D3       E30     {Delhi}
IT          D4       E35     {Delhi}

This relation is not in 1NF, because Dlocations is not an atomic attribute: as the table above shows, it holds a set of values in each tuple.

3 main techniques to achieve First Normal Form for such a relation:

Solution 1:
Remove the attribute Dlocations that violates 1NF and place it in a separate relation Dept_locations, along with the primary key Dnumber.

Department (before):     Dname  Dnumber  Dmgreno  Dlocations

Department (after):      Dname  Dnumber  Dmgreno
Dept_locations (after):  Dnumber  Dlocation

Solution 2:
Expand the key, so that there will be a separate tuple in the original Department relation for each location of a department.

In this case the primary key would be {Dnumber, Dlocation}.

Dname     Dnumber  Dmgr_eno  Dlocation
Research  D5       E55       Bangalore
Research  D5       E55       Delhi
Research  D5       E55       Hyderabad
Admin     D2       E12       Bangalore
HR        D1       E15       Delhi
HR        D1       E15       Hyderabad

The disadvantage of this solution is that it introduces redundancy in the relation.
Solution 3: Introducing new attributes in the same relation

- If a maximum number of values is known for the attribute, for example, if it is already known that at most 3 locations can exist for one particular department, then we can replace the attribute Dlocations by 3 atomic attributes: Dloc1, Dloc2, Dloc3.

Dname  Dnumber  Dmgr_eno  Dloc1  Dloc2  Dloc3

- Disadvantage of this solution: we introduce NULL values if most departments have fewer than three locations.

Among the 3 solutions, the 1st solution is considered the best, because it does not suffer from redundancy and is completely general.
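Solution 1 can be sketched as a small transformation. This is only an illustration in Python: the helper `split_multivalued` is a hypothetical name, and the rows are the first three departments from the example table above.

```python
def split_multivalued(rows, key, attr, new_attr):
    """Move a multivalued attribute into its own relation (Solution 1).

    Returns (base, side): base drops the multivalued attribute, side has
    one tuple per (key value, single attribute value) pair.
    """
    base = [{a: v for a, v in r.items() if a != attr} for r in rows]
    side = [{key: r[key], new_attr: v} for r in rows for v in r[attr]]
    return base, side


# Department with the non-atomic Dlocations attribute (not in 1NF).
department = [
    {"Dname": "Research", "Dnumber": "D5", "Dmgr_eno": "E55",
     "Dlocations": ["Bangalore", "Delhi", "Hyderabad"]},
    {"Dname": "Admin", "Dnumber": "D2", "Dmgr_eno": "E12",
     "Dlocations": ["Bangalore"]},
    {"Dname": "HR", "Dnumber": "D1", "Dmgr_eno": "E15",
     "Dlocations": ["Delhi", "Hyderabad"]},
]

dept, dept_locations = split_multivalued(
    department, "Dnumber", "Dlocations", "Dlocation"
)
```

Each department appears once in `dept`, and `dept_locations` holds one atomic (Dnumber, Dlocation) pair per location, so neither the redundancy of Solution 2 nor the NULLs of Solution 3 appear.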
1NF also does NOT allow composite attributes, such as an attribute whose value is itself a relation.

Eno  Ename   Projects
             Pno  Hrs
E1   Mahesh  P1   30
             P2   07
E2   Ramesh  P3   10
             P10  05
             P15  20
E5   Priya   P10  20
             P2   10
             P15  05
             P1   05

We see a nested relation within each tuple, which is not allowed in 1NF.
- Each tuple here represents an employee entity, and a relation Projects(Pno, Hrs) is nested within each tuple.
- So, to normalize it into 1NF, we move the nested relation attributes into a new relation and propagate the primary key into it.
- So, we remove the Projects attribute (which was creating the problem).
- The Projects attribute has been broken down into Pno and Hours.

Relation 1:  Eno  Ename
Relation 2:  Eno  Pno  Hours

- Now, it is in 1NF.
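The unnesting step above can be sketched as follows. This is a hedged illustration in Python: the helper name `unnest` is hypothetical, and the data is the first two employees from the nested table.

```python
def unnest(rows, key, nested):
    """Split a nested relation attribute into its own relation.

    The primary key of the outer relation is propagated into every
    tuple of the new relation.
    """
    outer = [{a: v for a, v in r.items() if a != nested} for r in rows]
    inner = [{key: r[key], **sub} for r in rows for sub in r[nested]]
    return outer, inner


# Employee tuples with a nested Projects(Pno, Hours) relation.
emp_nested = [
    {"Eno": "E1", "Ename": "Mahesh",
     "Projects": [{"Pno": "P1", "Hours": 30}, {"Pno": "P2", "Hours": 7}]},
    {"Eno": "E2", "Ename": "Ramesh",
     "Projects": [{"Pno": "P3", "Hours": 10}, {"Pno": "P10", "Hours": 5},
                  {"Pno": "P15", "Hours": 20}]},
]

emp, emp_proj = unnest(emp_nested, "Eno", "Projects")
```

The result matches the two schemas shown above: `emp` has (Eno, Ename) and `emp_proj` has (Eno, Pno, Hours), with one tuple per employee-project pair.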
Figure 10.8 Normalization into 1NF (called Figure 10.8 in Edition 4)

Figure 10.9 Normalization of nested relations into 1NF (called Figure 10.9 in Edition 4)


3.5 Second Normal Form (1)

- Uses the concepts of FDs and primary key
- Definitions:
  - Prime attribute: an attribute that is a member of the primary key K
  - Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
- Examples:
  - {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
  - {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds
Second Normal Form

- 2NF Definition: A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key of R.
- R can be decomposed into 2NF relations via the process of 2NF normalization.
- Example:

Emp_proj:  Eno  Pno  Hours  Ename  Pname  Ploc

- In the relation Emp_proj, {Eno, Pno} -> Hours
  - Hours is determined by Eno and Pno together.
  - If either of the two is removed, Hours cannot be determined.
  - Eno -> Hours does NOT hold
  - Pno -> Hours does NOT hold
- So, we say {Eno, Pno} -> Hours (both are needed to determine Hours).
- Hence, FULL FUNCTIONAL DEPENDENCY.
- Next is Ename
  - {Eno, Pno} -> Ename, i.e. Ename can be determined by {Eno, Pno}.
  - But if Pno is removed from this primary key, Eno -> Ename still holds.
  - So, we see that Eno alone can also determine Ename.
  - So, we say Ename is PARTIALLY DEPENDENT on the PRIMARY KEY {Eno, Pno}.
  - Note: An attribute that has a partial dependency should be removed from this relation.

- Next is Pname, i.e. project name
  - {Eno, Pno} -> Pname, i.e. Pname can be determined by {Eno, Pno}.
  - But if Eno is removed from this primary key, Pno -> Pname still holds.
  - So, we see that Pno alone can also determine Pname.
  - So, we say Pname is PARTIALLY DEPENDENT on the PRIMARY KEY {Eno, Pno}.
  - Note: An attribute that has a partial dependency should be removed from this relation.

- Next is Ploc, i.e. project location
  - {Eno, Pno} -> Ploc, i.e. Ploc can be determined by {Eno, Pno}.
  - But if Eno is removed from this primary key, Pno -> Ploc still holds.
  - So, we see that Pno alone can also determine Ploc.
  - So, we say Ploc is PARTIALLY DEPENDENT on the PRIMARY KEY {Eno, Pno}.
  - Note: An attribute that has a partial dependency should be removed from this relation.
- To sum it up:
  - {Eno, Pno} -> Hours      [Full Functional Dependency]
  - {Eno, Pno} -> Ename      [Partial Functional Dependency]
  - {Eno, Pno} -> Pname      [Partial Functional Dependency]
  - {Eno, Pno} -> Plocation  [Partial Functional Dependency]
- So the next step is to remove those attributes that have a partial functional dependency on the primary key.

Emp_proj:  Eno  Pno  Hours  Ename  Pname  Plocation

2NF Normalization

EP1:  Eno  Pno  Hours
EP2:  Eno  Ename
EP3:  Pno  Pname  Plocation


Is each of the decomposed relations above in 2NF?

In each one, every non-prime attribute is fully functionally dependent on the primary key.

EP1  {Eno, Pno} -> Hours   Full functional dependency, so in 2NF
EP2  Eno -> Ename          Full functional dependency, so in 2NF
EP3  Pno -> Pname          Full functional dependency, so in 2NF
EP3  Pno -> Plocation      Full functional dependency, so in 2NF
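The search for partial dependencies can be mechanised with attribute closure. The following Python sketch is an illustration under stated assumptions: the function names `closure` and `partial_dependencies` are hypothetical, FDs are written as (left side, right side) pairs of sets, and the FD list is the one worked out for Emp_proj above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """Map each non-prime attribute to the proper key subsets that determine it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            for a in nonprime:
                if a in determined:
                    found.setdefault(a, []).append(set(subset))
    return found


fds = [
    ({"Eno", "Pno"}, {"Hours"}),
    ({"Eno"}, {"Ename"}),
    ({"Pno"}, {"Pname", "Ploc"}),
]
key = {"Eno", "Pno"}
nonprime = {"Hours", "Ename", "Pname", "Ploc"}

partial = partial_dependencies(key, nonprime, fds)
```

For Emp_proj this reports Ename (via Eno) and Pname and Ploc (via Pno), while Hours is absent because it needs the whole key. Running the same check on EP1, EP2 and EP3 with their own keys reports nothing.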


Third Normal Form (3NF)

- 3NF is based on the concept of transitive dependency.
- Definition: According to Codd's original definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute of R is transitively dependent on the primary key.
- That is to say, the relation should NOT have a non-key attribute that is functionally determined by another non-key attribute.
- In short, there should not be any transitive dependency of any non-key attribute on the primary key.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we
consider this a problem only if Y is not a candidate
key. When Y is a candidate key, there is no problem
with the transitive dependency .
E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp# is a
candidate key.
What do we mean by Transitive Dependency?

- A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.

Consider the following relation:

Emp_dept:  Eno  Ename  DOB  Address  Dnum  Dname  Dmgr_eno

Evaluating all functional dependencies, with Eno as the primary key (P.K):

Eno -> Ename    (non-prime attribute directly functionally dependent on the P.K)
Eno -> DOB      (non-prime attribute directly functionally dependent on the P.K)
Eno -> Address  (non-prime attribute directly functionally dependent on the P.K)
Eno -> Dnum     (non-prime attribute directly functionally dependent on the P.K)
- Now left are Dname and Dmgr_eno.
- Dname is not directly dependent on Eno (P.K).
- Dname is functionally dependent on Dnum: as Dnum changes, Dname also changes.
  - Dnum -> Dname (non-prime attribute -> non-prime attribute)
- So, a non-prime attribute is functionally dependent on another non-prime attribute.
- And Dnum is functionally dependent on Eno, i.e. the primary key.
  - Eno -> Dnum -> Dname
- So, Dname is transitively functionally dependent on Eno.
  - Eno -> Dname
- Hence the relation is NOT in 3NF, because there is a transitive dependency.
- Further we see that
  - Dnum -> Dmgr_eno (non-prime attribute -> non-prime attribute)
  - and Eno -> Dnum
- So again:
  - Eno -> Dnum -> Dmgr_eno
- Again a transitive dependency exists.
- So, not in 3NF.

- So, to get the relation into 3NF, we need to decompose it such that we remove those non-prime attributes that are functionally dependent on other non-prime attributes.
- So, we move Dname and Dmgr_eno into a separate relation, together with Dnum.

Diagrammatic Representation for 3NF

Decompose to get relations in 3NF:

Relation 1:  Eno  Ename  DOB  Address  Dnum
Relation 2:  Dnum  Dname  Dmgr_eno

- Now, in both relations each of the non-prime attributes is functionally dependent on the primary key of that relation.
- No non-prime attribute is functionally dependent on another non-prime attribute.
- No transitive dependency.
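The transitive dependencies found above can also be detected from the FD list. The sketch below is written in Python under a simplifying assumption that matches the primary-key-based definition used here: the relation has a single candidate key, namely its primary key. The names `closure` and `transitive_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(all_attrs, key, fds):
    """Return {non-prime attribute: determinant Z} for FDs Z -> A where Z is
    neither a superkey nor a subset of the key (assumes one candidate key)."""
    violations = {}
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= all_attrs
        if is_superkey or lhs <= key:
            continue  # key-based or partial dependency, not a 3NF issue here
        for a in rhs - key - lhs:
            violations[a] = lhs
    return violations


attrs = {"Eno", "Ename", "DOB", "Address", "Dnum", "Dname", "Dmgr_eno"}
fds = [
    ({"Eno"}, {"Ename", "DOB", "Address", "Dnum"}),
    ({"Dnum"}, {"Dname", "Dmgr_eno"}),
]
violations = transitive_violations(attrs, {"Eno"}, fds)

# After decomposition, Relation 2 has Dnum as its own primary key.
after = transitive_violations(
    {"Dnum", "Dname", "Dmgr_eno"}, {"Dnum"}, [({"Dnum"}, {"Dname", "Dmgr_eno"})]
)
```

For Emp_dept the check flags Dname and Dmgr_eno, both reached through Dnum. For the decomposed relation it flags nothing.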
Figure 10.10 Normalizing into 2NF and 3NF

Figure 10.11 Normalization into 2NF and 3NF (called Figure 10.11 in Edition 4)


Examples

- The relation given is:

S_id  S_name  D_O_B  Street  City  State  Zip

- We notice the following FDs, with S_id as the primary key:
  - S_id -> S_name
  - S_id -> D_O_B
  - S_id -> Street
  - S_id -> City
  - S_id -> State
  - S_id -> Zip
- So all non-prime attributes are FULLY functionally dependent on the primary key.
- It is in 2NF.
- Now, to check for 3NF:
- That is, we need to check that there is NO transitive dependency.
- So, NO NON-PRIME attribute may be transitively dependent on the primary key.
- This possibility arises when one non-prime attribute is dependent on another non-prime attribute, which in turn is dependent on the primary key.
- We also notice another set of FDs:
  - S_id -> Zip
  - Zip -> Street
  - Zip -> City
  - Zip -> State
- Zip is a NON-PRIME attribute.
- Street, City and State are also NON-PRIME attributes.
- So non-prime attributes are dependent on another NON-PRIME ATTRIBUTE (Zip), which in turn is dependent on the primary key (S_id).

TRANSITIVE DEPENDENCY IS THERE.

- Street, City and State violate 3NF, so remove them from the table and put them in another table together with Zip.

Original:  S_id  S_name  D_O_B  Street  City  State  Zip

Table 1:   S_id  S_name  D_O_B  Zip
Table 2:   Zip  Street  City  State

- In Table 1, all NON-PRIME attributes are directly dependent on the primary key (S_id). So, no transitive dependency.
- In Table 2, all NON-PRIME attributes are directly dependent on the primary key (Zip). So, no transitive dependency.
- Hence, both tables are in 3NF (no transitive dependency).


- Example

Staff_no  Appt_dt  Appt_time  Dentist_Name  Patient_no  Patient_name  Surgery_no

The primary key is (Staff_No, Appt_dt, Appt_time). The diagram marks four FDs on this relation: FD1, FD2, FD3 and FD4.

- FD1 already satisfies 2NF:

Staff_no  Appt_dt  Appt_time  Patient_No  Patient_Name

  - This means that
    - (Staff_no, Appt_dt, Appt_time) -> Patient_No
    - (Staff_no, Appt_dt, Appt_time) -> Patient_Name
  - Full functional dependency is there.
- In FD2 we see:
  - Staff_no -> Dentist_Name
  - This shows that Dentist_Name has a PARTIAL FUNCTIONAL dependency on the primary key (Staff_No, Appt_dt, Appt_time).
  - Dentist_Name violates 2NF, so we remove it from the table.
- In FD4 we see:
  - (Staff_no, Appt_dt) -> Surgery_No
  - This shows that Surgery_No has a PARTIAL FUNCTIONAL dependency on the primary key (Staff_No, Appt_dt, Appt_time).
  - Surgery_No violates 2NF, hence remove it from the table.
- So, left in the table are Patient_Name and Patient_No, along with the primary key.

R1 (FD1, FD3), now in 2NF:

Staff_no  Appt_dt  Appt_time  Patient_No  Patient_Name

- So, Patient_No and Patient_Name have full functional dependency on the primary key.

- Surgery_No is shifted to another table (because it violated 2NF).

R2 (FD4), now in 2NF:

Staff_No  Appt_dt  Surgery_No

- Dentist_Name is shifted to another table.

R3 (FD2), now in 2NF:

Staff_No  Dentist_Name

We have now obtained the above 3 tables (R1, R2, R3), which are in 2NF.
- In table R1:
  - We see that a non-prime attribute (Patient_Name) is FUNCTIONALLY dependent on another non-prime attribute, Patient_No.
  - And Patient_No is in turn functionally dependent on the primary key.
  - So, Patient_Name is TRANSITIVELY dependent on the primary key, through Patient_No.
  - So, PATIENT_NAME VIOLATES 3NF.
  - So put Patient_Name along with Patient_No in a separate table.

R1:   Staff_No  Appt_dt  Appt_time  Patient_no  Patient_Name

R1A:  Staff_no  Appt_dt  Appt_time  Patient_no
R1B:  Patient_no  Patient_Name

Now in 3NF.

- We also observe that R2 and R3 are in 2NF as well as 3NF.
- So the final tables which we get in 3NF are R1A, R1B, R2 and R3.
- All these are in 3NF.
4 BCNF (Boyce-Codd Normal Form)

- A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R
- Each normal form is strictly stronger than the previous one
  - Every 2NF relation is in 1NF
  - Every 3NF relation is in 2NF
  - Every BCNF relation is in 3NF
- There exist relations that are in 3NF but not in BCNF
- The goal is to have each relation in BCNF (or 3NF)
Boyce-Codd normal form
A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition (1)

- Two FDs exist in the relation TEACH:
  - fd1: {student, course} -> instructor
  - fd2: instructor -> course
- {student, course} is a candidate key for this relation, and the dependencies shown follow the pattern in Figure 10.12 (b). So this relation is in 3NF but not in BCNF.
- A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations. (See Algorithm 11.3)
Achieving the BCNF by Decomposition

- Three possible decompositions for relation TEACH:
  1. {student, instructor} and {student, course}
  2. {course, instructor} and {course, student}
  3. {instructor, course} and {instructor, student}
- All three decompositions will lose fd1. We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition.
- Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
- A test to determine whether a binary decomposition (decomposition into two relations) is nonadditive (lossless) is discussed in section 11.1.4 under Property LJ1. Verify that the third decomposition above meets the property.
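The verification asked for above can be sketched in code. This sketch assumes that the test referred to is the usual binary lossless-join condition: a decomposition of R into R1 and R2 is nonadditive if the common attributes R1 ∩ R2 functionally determine either R1 − R2 or R2 − R1. The function names `closure` and `is_lossless_binary` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary nonadditive-join test: the common attributes must determine
    all of (r1 - r2) or all of (r2 - r1)."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


# FDs of TEACH: fd1 and fd2 from the slide above.
fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

d1 = is_lossless_binary({"student", "instructor"}, {"student", "course"}, fds)
d2 = is_lossless_binary({"course", "instructor"}, {"course", "student"}, fds)
d3 = is_lossless_binary({"instructor", "course"}, {"instructor", "student"}, fds)
```

Only the third decomposition passes: its common attribute is instructor, and instructor -> course holds by fd2. In the first two, the common attribute (student or course) determines nothing beyond itself.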
3.3 Definitions of Keys and Attributes Participating in Keys (1)

- A superkey of a relation schema R = {A1, A2, ...., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
- A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
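When the FDs of a schema are known, both definitions can be tested through attribute closure. The Python sketch below assumes that the listed FDs capture all the constraints on legal relation states; the names `closure`, `is_superkey` and `is_key` are hypothetical, and the FDs are the Emp_proj ones from the 2NF example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """S is a superkey if it functionally determines every attribute of R."""
    return closure(attrs, fds) >= all_attrs


def is_key(attrs, all_attrs, fds):
    """A key is a superkey from which no attribute can be removed."""
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return all(not is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


R = {"Eno", "Pno", "Hours", "Ename", "Pname", "Ploc"}
fds = [
    ({"Eno", "Pno"}, {"Hours"}),
    ({"Eno"}, {"Ename"}),
    ({"Pno"}, {"Pname", "Ploc"}),
]

big_is_superkey = is_superkey({"Eno", "Pno", "Hours"}, R, fds)
big_is_key = is_key({"Eno", "Pno", "Hours"}, R, fds)
pair_is_key = is_key({"Eno", "Pno"}, R, fds)
```

{Eno, Pno, Hours} is a superkey but not a key, because Hours can be removed. {Eno, Pno} is a key, because dropping either attribute loses the superkey property.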

Problem 4: 1.4 Generation of Spurious Tuples

- Suppose we used EMP_PROJ1 and EMP_LOCS as the base relations instead of EMP_PROJ (given on slide 17).
- This produces a bad schema design, because we cannot recover the information that was originally in EMP_PROJ from EMP_PROJ1 and EMP_LOCS.
- In fact, a NATURAL JOIN on these two relations produces many more tuples than the original set.
- These additional tuples, which were not in EMP_PROJ, are called SPURIOUS TUPLES.
- They represent SPURIOUS INFORMATION that is NOT VALID.

Emp_Proj1:  Emp_ssn  Pnum  Hours  Pname  Plocation
Emp_Locs:   Ename  Plocation

Generation of Spurious Tuples (contd)

- So we can say that decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable, because joining them back with NATURAL JOIN does not give CORRECT information.
- Reason: in this case Plocation is the attribute used for joining (the common attribute), and Plocation is neither a PRIMARY KEY nor a FOREIGN KEY.
- So, we can state another GUIDELINE.
1.4 Generation of Spurious Tuples

- Bad designs for a relational database may result in erroneous results for certain JOIN operations
- The "lossless join" property is used to guarantee meaningful results for join operations
- GUIDELINE 4: To avoid generation of spurious tuples
  - The relations should be designed such that no spurious tuples are generated by doing a natural join of any relations.
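Spurious tuples are easy to reproduce on a tiny example. The sketch below is in Python; the helpers `project` and `natural_join` are hypothetical names, and the two EMP_PROJ rows (including the project name "Payroll") are invented sample data chosen so that both projects share the same Plocation.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate elimination."""
    result = []
    for t in rows:
        p = {a: t[a] for a in attrs}
        if p not in result:
            result.append(p)
    return result


def natural_join(r, s):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]


# Invented state of EMP_PROJ: two employees on two projects in one city.
emp_proj = [
    {"Emp_ssn": "E1", "Ename": "Mahesh", "Pnum": "P1", "Hours": 30,
     "Pname": "Billing", "Plocation": "Delhi"},
    {"Emp_ssn": "E2", "Ename": "Ramesh", "Pnum": "P3", "Hours": 10,
     "Pname": "Payroll", "Plocation": "Delhi"},
]

emp_proj1 = project(emp_proj, ["Emp_ssn", "Pnum", "Hours", "Pname", "Plocation"])
emp_locs = project(emp_proj, ["Ename", "Plocation"])

joined = natural_join(emp_proj1, emp_locs)  # joins on Plocation only
spurious = [t for t in joined if t not in emp_proj]
```

The join returns four tuples from two original ones. The two extra tuples pair each employee number with the other employee's name, which is the spurious information that Guideline 4 is meant to prevent.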
