Database Lect5 FD

The document discusses database schemas and functional dependencies. It defines functional dependencies and provides examples to illustrate the concepts. The document also covers topics like normalization, keys, and decomposing relations to eliminate anomalies.

Uploaded by

mennah samy
Copyright
© © All Rights Reserved

Database

Functional Dependencies
Designing Good Schemas
• We know how to create schemas, but ...
  - how do we create good schemas?
  - what does "good" mean?
• Schema quality measurements:
  - semantics of the attributes
  - minimal redundancy
  - minimal frequency of null values
Functional Dependencies
• A column Y of relational table R is functionally dependent upon column X of R if and only if each value of X in R is associated with exactly one value of Y at any given time.
Functional Dependencies
Definition: A1, ..., Am → B1, ..., Bn holds in R if:
∀ t, t’ ∈ R, (t.A1 = t’.A1 ∧ ... ∧ t.Am = t’.Am ⇒ t.B1 = t’.B1 ∧ ... ∧ t.Bn = t’.Bn)
[Figure: relation R with columns A1 ... Am, B1 ... Bn and two tuples t, t’: if t, t’ agree on A1 ... Am, then t, t’ agree on B1 ... Bn]
Examples
EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer
• EmpID → Name, Phone, Position
• Position → Phone
• but not Phone → Position (phone 1234 appears with both Clerk and Lawyer)
Example Data
name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud
• The repeated addr value is there because name → addr
• The repeated favBeer value is there because name → favBeer
• The repeated manf value is there because beersLiked → manf
Example
Drinkers(name, addr, beersLiked, manf, favoriteBeer)
name     addr        beersLiked  manf    favoriteBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete's  WickedAle
Spock    Enterprise  Bud         A.B.    Bud
• Reasonable FD's to assert:
  1. name → addr
  2. name → favoriteBeer
  3. beersLiked → manf
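As a rough illustration of the definition, the sketch below checks whether an FD is violated by a given set of tuples. The function name fd_holds and the list-of-dicts representation are assumptions made for this example, not part of the lecture.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no pair of tuples in rows violates lhs -> rhs.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each combination of lhs values to the rhs values seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# The Drinkers instance from the slide.
drinkers = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud",
     "manf": "A.B.", "favoriteBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle",
     "manf": "Pete's", "favoriteBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud",
     "manf": "A.B.", "favoriteBeer": "Bud"},
]

print(fd_holds(drinkers, ["name"], ["addr"]))        # True
print(fd_holds(drinkers, ["beersLiked"], ["manf"]))  # True
print(fd_holds(drinkers, ["name"], ["beersLiked"]))  # False
```

Note that a single instance can only refute an FD. Asserting that an FD holds for the schema is a design decision about all legal instances.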
Functional dependencies
• "Y is functionally dependent upon X" means the same as "values of X identify values of Y"
• If X → Y then XZ → YZ (augmentation)
• If X → Y and Y → Z then X → Z (transitivity)
• X → Y means that Y depends on X, or that X identifies Y
Examples
• S# → Ename
• {S#, P#} → Hours
• If for each value of S# there is exactly one corresponding value for Sname, State, and City, then:
  S# → Sname, State, City
Example
• {S#, P#} → Qty
  [Diagram: S# and P# together determine QTY]
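Redundancy Example
• Where’s the redundancy?
[Figure not recovered: example table containing redundant data]
Example FDs
[Figure not recovered: the same table with its FDs labelled as proper FDs, transitive FDs, and partial key FDs]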
Example
• R = (A, B, C, G, H, I)
  F = { A → B
        A → C
        CG → H
        CG → I
        B → H }
• some members of F+
  - A → H
    by transitivity from A → B and B → H
  - AG → I
    by augmenting A → C with G, to get AG → CG, and then transitivity with CG → I
  - CG → HI
    by augmenting CG → I to infer CG → CGI, and augmenting CG → H to infer CGI → HI, and then transitivity
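Membership in F+ can also be checked mechanically with the standard attribute-closure computation: X → Y is in F+ exactly when Y is contained in the closure of X. The sketch below is one minimal way to write it. The function names and the use of one-letter strings as attribute sets are choices made for this example.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    attrs: iterable of attribute names; fds: list of (lhs, rhs) pairs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is a member of F+."""
    return set(rhs) <= closure(lhs, fds)


# F from the slide; each string is read as a set of one-letter attributes.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # True
print(implies(F, "AG", "I"))   # True
print(implies(F, "CG", "HI"))  # True
print(implies(F, "B", "A"))    # False
```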
Formal definition of a key
• A key is a set of attributes A1, ..., An s.t. for any other attribute B, A1, ..., An → B
• A minimal key is a set of attributes which is a key and for which no proper subset is a key
• Note: book calls them superkey and key
Where Do Keys Come From?
1. We could simply assert a key K. Then the only FD’s are K → A for all attributes A, and K turns out to be the only key obtainable from the FD’s.
2. We could assert FD’s and deduce the keys by systematic exploration.
   • E/R gives us FD’s from entity-set keys and many-one relationships.
Examples of Keys
• Product(name, price, category, color)
  name, category → price
  category → color
  Keys are: {name, category} and all supersets
• Enrollment(student, address, course, room, time)
  student → address
  room, time → course
  student, course → room, time
  Keys are: [in class]
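"Systematic exploration" can be sketched as a brute-force search: try attribute sets from smallest to largest and keep those whose closure is the whole relation and that contain no smaller key. The helper names below are assumptions for this example, and the search is exponential, so it only suits small classroom schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal keys (the book's 'keys') of the relation."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # a superset of a known key is not minimal
            if closure(cand, fds) == set(attributes):
                keys.append(cand)
    return keys


product_attrs = {"name", "price", "category", "color"}
product_fds = [({"name", "category"}, {"price"}), ({"category"}, {"color"})]

print(candidate_keys(product_attrs, product_fds))
```

For Product this finds the single minimal key {name, category}, matching the slide. The same function can be pointed at the Enrollment FDs to check an answer worked out by hand.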


Example 2
Student(Lastname, Firstname, StudentID, Major)
• Keys are {Lastname, Firstname} (2 attributes) and {StudentID}
• Any superset of a key is a superkey
• Note: there are alternate keys
Finding the Keys of a Relation
Given a relation constructed from an E/R diagram, what is its key?
Rules:
1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.
[E/R diagram: entity set Person with attributes address, name, ssn]
Person(address, name, ssn)
Finding the Keys
Rules:
2. If the relation comes from a many-many relationship, the key of the relation is the set of all attribute keys in the relations corresponding to the entity sets.
[E/R diagram: Product (name, price) connected by relationship buys (date) to Person (name, ssn)]
buys(name, ssn, date)
Finding the Keys
Except: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key.
[E/R diagram: relationship Purchase connecting Product (name), Store (sname), CreditCard (card-no), and Person (ssn)]
Purchase(name, sname, ssn, card-no)
Expressing Dependencies
Say: “the CreditCard determines the Person”
[E/R diagram: relationship Purchase connecting Product (name), Store (sname), CreditCard (card-no), and Person (ssn); annotated "Incomplete (what does it say?)"]
Purchase(name, sname, ssn, card-no)
card-no → ssn
• Enrollment(student, major, course, room, time)
  student → major
  major, course → room
  course → time
What else can we infer?
Relational Schema Design
(or Logical Design)
Main idea:
• Start with some relational schema
• Find out its FD’s
• Important also to look at inferred FD’s.
• Use them to design a better relational schema
Relational Schema Design
Recall set attributes (persons with several phones):

Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield

SSN → Name, City, but not SSN → PhoneNumber

Anomalies:
• Redundancy = repeated data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Fred drops all phone numbers: what is his city?
Relation Decomposition
Break the relation into two:

Name  SSN          City
Fred  123-45-6789  Seattle
Joe   987-65-4321  Westfield

SSN          PhoneNumber
123-45-6789  206-555-1234
123-45-6789  206-555-6543
987-65-4321  908-555-2121
987-65-4321  908-555-1234
Relational Schema Design
• Conceptual Model:
  [E/R diagram: Product (name, price) connected by relationship buys to Person (name, ssn)]
• Relational Model: plus FD’s
• Normalization: eliminates anomalies
Decompositions in General
R(A1, ..., An)
Create two relations R1(B1, ..., Bm) and R2(C1, ..., Cp)
such that: {B1, ..., Bm} ∪ {C1, ..., Cp} = {A1, ..., An}
and:
R1 = projection of R on B1, ..., Bm
R2 = projection of R on C1, ..., Cp
Incorrect Decomposition
• Sometimes it is incorrect:

Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
DoubleClick  29.99  Camera

Decompose on: Name, Category and Price, Category
Incorrect Decomposition

Name         Category        Price  Category
Gizmo        Gadget          19.99  Gadget
OneClick     Camera          24.99  Camera
DoubleClick  Camera          29.99  Camera

When we put it back:

Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
OneClick     29.99  Camera
DoubleClick  24.99  Camera
DoubleClick  29.99  Camera

Cannot recover information
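The loss of information can be reproduced in a few lines: project the Product table onto the two attribute sets, join the projections back, and compare. The helpers project and natural_join below are small illustrative functions written for this example, not a database API.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, removing duplicates."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r1, r2):
    """Natural join of two lists of dict rows on their shared attributes."""
    out = []
    for a in r1:
        for b in r2:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out


products = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "DoubleClick", "Price": 29.99, "Category": "Camera"},
]

r1 = project(products, ["Name", "Category"])
r2 = project(products, ["Price", "Category"])
rejoined = natural_join(r1, r2)
spurious = [t for t in rejoined if t not in products]

print(len(products), len(rejoined))  # 3 5
print(spurious)  # the two tuples that were never in the original table
```

The join returns every original tuple plus two spurious ones, so the original prices of OneClick and DoubleClick can no longer be told apart.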
Normal Forms
• Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies)
• The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)
Normalization
0NF → (remove multi-valued attributes) → 1NF → (remove partial dependencies) → 2NF → (remove transitive dependencies) → 3NF
3NF → (remove remaining FD anomalies) → BCNF → (remove multivalued dependencies) → 4NF → (remove remaining anomalies) → 5NF
Goals of Normalization
• Let R be a relation scheme with a set F of functional dependencies.
• Decide whether a relation scheme R is in “good” form.
• In the case that a relation scheme R is not in “good” form, decompose it into a set of relation schemes {R1, R2, ..., Rn} such that
  - each relation scheme is in good form
  - the decomposition is a lossless-join decomposition
  - preferably, the decomposition should be dependency preserving.
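For a split into exactly two relations there is a standard textbook test for the lossless-join property: the decomposition is lossless if the shared attributes functionally determine all of R1 or all of R2. The sketch below applies that test. The function names are assumptions for this example, and the FD Name → Price, Category for Product is assumed here for illustration, since the slides do not state FDs for that table.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Person/phone split from the lecture, with SSN -> Name, City.
person_fds = [({"SSN"}, {"Name", "City"})]
print(is_lossless_binary({"Name", "SSN", "City"},
                         {"SSN", "PhoneNumber"}, person_fds))  # True

# Product split on Category; the FD below is an assumption for illustration.
product_fds = [({"Name"}, {"Price", "Category"})]
print(is_lossless_binary({"Name", "Category"},
                         {"Price", "Category"}, product_fds))  # False
```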
1NF
• First normal form requires:
  - no multi-valued attributes
  - no composite attributes
  - no nested relations
• To remove them, we create a new table or a new field (e.g., telephone, visiting)
1NF Normalization
• Proper translation from ER multi-value attributes will achieve 1NF.
• Still not a good solution, since we have redundancy in Dnumber and Dmgr_ssn. (This will be handled by 2NF.)
2NF
• A relation is in second normal form if it is in 1NF and no non-key attribute depends on only part of a multi-attribute primary key.
• Example schema (dependency arrows not recovered):
  S#  P#  Hours  Cname  pname  Loc
2NF Normalization

Move the partial key and dependent attributes to a new relation.
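A small sketch of this step follows. It assumes the primary key is {S#, P#} and that the FDs are S# → Cname, P# → pname, Loc, and {S#, P#} → Hours. These FDs are a guess at what the missing diagram showed, and the function name is hypothetical.

```python
def split_partial_dependencies(attributes, key, fds):
    """2NF step: move each partial-key FD into its own relation.

    Returns a list of relations (attribute lists); the first is what remains
    of the original relation.
    """
    key = set(key)
    new_relations = []
    moved = set()
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if lhs < key:  # left side is a proper subset of the key
            new_relations.append(sorted(lhs) + sorted(rhs - lhs))
            moved |= rhs - key
    remaining = [a for a in attributes if a not in moved]
    return [remaining] + new_relations


attributes = ["S#", "P#", "Hours", "Cname", "pname", "Loc"]
# Assumed FDs (not stated explicitly on the slide).
fds = [
    ({"S#", "P#"}, {"Hours"}),
    ({"S#"}, {"Cname"}),
    ({"P#"}, {"pname", "Loc"}),
]

for relation in split_partial_dependencies(attributes, {"S#", "P#"}, fds):
    print(relation)
# ['S#', 'P#', 'Hours']
# ['S#', 'Cname']
# ['P#', 'Loc', 'pname']
```

The partial key stays in the original relation as well, so the new relations can be joined back on it.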


Transitive Dependencies
• X → Y is a transitive dependency (TD) if there exists Z ⊈ any key such that X → Z → Y
• TDs can cause redundancy if there are multiple values of X that determine the same value of Z
  - the value of Y for that value of Z is stored multiple times
• 3NF normalization: move (Z, Y) to a new relation in which Z is the primary key
3NF
• A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key
3NF Normalization
• Create a new relation to hold the attributes in the transitive FD.
• The LHS of the transitive FD becomes the PK of the new relation.
Transitive Dependency Example
Schema: DEPT, COURSE, SECTION, ROOM, INSTR, I_OFFICE
I_OFFICE (instructor's office) is determined by the non-PK attribute INSTR

DEPT  COURSE  SECTION  ROOM    INSTR    I_OFFICE
COMP  51      1        WPC122  DOHERTY  CSB109
COMP  51      2        WPC219  CLIBURN  CSB107
COMP  163     1        WPC122  DOHERTY  CSB109
COMP  53      1        WPC130  BOWRING  CSB108
COMP  53      2        WPC130  CARMAN   CSB104
3NF Decomposition: Foreign Keys
Original:
  DEPT  COURSE  SECTION  ROOM  INSTR  I_OFFICE
Decomposition:
  DEPT  COURSE  SECTION  ROOM  INSTR
  INSTR  I_OFFICE
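The decomposition can be tried out on the sample rows. The sketch below, which represents each row as a plain tuple (a choice made for this example), splits off INSTR → I_OFFICE and confirms that joining on INSTR gives back the original rows.

```python
# Rows: (DEPT, COURSE, SECTION, ROOM, INSTR, I_OFFICE)
sections = [
    ("COMP", 51, 1, "WPC122", "DOHERTY", "CSB109"),
    ("COMP", 51, 2, "WPC219", "CLIBURN", "CSB107"),
    ("COMP", 163, 1, "WPC122", "DOHERTY", "CSB109"),
    ("COMP", 53, 1, "WPC130", "BOWRING", "CSB108"),
    ("COMP", 53, 2, "WPC130", "CARMAN", "CSB104"),
]

# Projection without I_OFFICE; INSTR acts as a foreign key here.
section_instr = {row[:5] for row in sections}

# New relation for the transitive FD, with INSTR as primary key.
instr_office = {(row[4], row[5]) for row in sections}

# Join the two relations back on INSTR.
rejoined = {
    s + (office,)
    for s in section_instr
    for (instr, office) in instr_office
    if s[4] == instr
}

print(len(instr_office))          # 4
print(rejoined == set(sections))  # True
```

DOHERTY's office is now stored once instead of once per section taught.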
3NF Example
• Relation dept_advisor:
  - dept_advisor (s_ID, i_ID, dept_name)
    F = {s_ID, dept_name → i_ID, i_ID → dept_name}
  - Two candidate keys: {s_ID, dept_name} and {i_ID, s_ID}
  - R is in 3NF
    s_ID, dept_name → i_ID
      {s_ID, dept_name} is a superkey
    i_ID → dept_name
      dept_name is contained in a candidate key
Redundancy in 3NF
• There is some redundancy in this schema
• Example of problems due to redundancy in 3NF
  - R = (J, K, L)
    F = {JK → L, L → K}

    J     L   K
    j1    l1  k1
    j2    l1  k1
    j3    l1  k1
    null  l2  k2

• repetition of information (e.g., the relationship l1, k1)
  - (i_ID, dept_name)
• need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J)
  - (i_ID, dept_name) if there is no separate relation mapping instructors to departments
3NF Decomposition: An Example
• Relation schema:
  cust_banker_branch = (customer_id, employee_id, branch_name, type)
• The functional dependencies for this relation schema are:
  1. customer_id, employee_id → branch_name, type
  2. employee_id → branch_name
  3. customer_id, branch_name → employee_id
• We first compute a canonical cover
  - branch_name is extraneous in the r.h.s. of the 1st dependency
  - No other attribute is extraneous, so we get FC =
    customer_id, employee_id → type
    employee_id → branch_name
    customer_id, branch_name → employee_id
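The claim about branch_name can be checked with the usual test for a right-side attribute: drop the attribute from the dependency, then see whether the left side still determines it through the remaining FDs. This is a sketch of that single test, with function names chosen for this example. It is not a full canonical-cover algorithm.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def rhs_extraneous(attr, fd_index, fds):
    """True if attr can be dropped from the right side of fds[fd_index]."""
    lhs, rhs = fds[fd_index]
    reduced = list(fds)
    reduced[fd_index] = (lhs, set(rhs) - {attr})
    # attr is extraneous if lhs still determines it without this occurrence.
    return attr in closure(lhs, reduced)


F = [
    ({"customer_id", "employee_id"}, {"branch_name", "type"}),
    ({"employee_id"}, {"branch_name"}),
    ({"customer_id", "branch_name"}, {"employee_id"}),
]

print(rhs_extraneous("branch_name", 0, F))  # True
print(rhs_extraneous("type", 0, F))         # False
```

branch_name is extraneous because employee_id → branch_name already supplies it, while type cannot be derived any other way.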
Normalization
Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key → everything.”
• Formally, R is in BCNF if for every nontrivial FD for R, say X → A, X is a superkey.
  - “Nontrivial” = right-side attribute not in left side.
Why?
1. Guarantees no redundancy due to FD’s.
2. Guarantees no update anomalies = one occurrence of a fact is updated, not all.
3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted.
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form
  α → β
where α ⊆ R and β ⊆ R, at least one of the following holds:
• α → β is trivial (i.e., β ⊆ α)
• α is a superkey for R
Example schema not in BCNF:
  instr_dept (ID, name, salary, dept_name, building, budget)
because dept_name → building, budget holds on instr_dept, but dept_name is not a superkey
Third Normal Form
• A relation schema R is in third normal form (3NF) if for all
  α → β in F+
  at least one of the following holds:
  - α → β is trivial (i.e., β ⊆ α)
  - α is a superkey for R
  - Each attribute A in β – α is contained in a candidate key for R.
    (NOTE: each attribute may be in a different candidate key)
• If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).
• Third condition is a minimal relaxation of BCNF to ensure dependency preservation (will see why later).
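The two definitions differ only in the third condition, which a short checker makes visible. The sketch below tests the FDs given in F rather than all of F+, which is sufficient when testing the whole relation R. All function names are assumptions for this example, and the key search is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal keys, found by trying attribute sets smallest first."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and \
                    closure(cand, fds) == set(attributes):
                keys.append(cand)
    return keys


def is_bcnf(attributes, fds):
    """Every nontrivial FD must have a superkey on its left side."""
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) >= set(attributes)
               for lhs, rhs in fds)


def is_3nf(attributes, fds):
    """Like BCNF, but right-side attributes inside some key are tolerated."""
    prime = set().union(*candidate_keys(attributes, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attributes):
            continue  # left side is a superkey
        if any(a not in prime for a in set(rhs) - set(lhs)):
            return False
    return True


R = {"s_ID", "i_ID", "dept_name"}
F = [({"s_ID", "dept_name"}, {"i_ID"}), ({"i_ID"}, {"dept_name"})]

print(is_bcnf(R, F))  # False
print(is_3nf(R, F))   # True
```

dept_advisor fails BCNF because i_ID is not a superkey, but passes 3NF because dept_name belongs to a candidate key.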
Boyce-Codd Normal Form
• Sample data for Course Section table

Department   Prefix  Num  SecNum  CourseName                 Instructor
Mathematics  Math    101  1       Algebra I                  Al Jeebra
Mathematics  Math    101  2       Algebra I                  Al Jeebra
Mathematics  Math    201  1       Calculus I                 Kal Kuelus
Philosophy   Phil    201  1       Greek Thought              Arie Stottle
Philosophy   Phil    202  1       Euro Thought               Mike Angelo
Marketing    Mktg    410  1       Marketing Strategy         Marc Ekking
Marketing    SpMkg   401  1       Advanced Sports Marketing  Hulk Hogan

• Because Prefix → Department, we know that (Prefix, Num, SecNum) could also be a primary key for this table.
Example
Students(name, addr, phones, CarsLiked)
• A student’s phones are independent of the cars they like.
• Thus, each of a student’s phones appears with each of the cars they like in all combinations.
• This repetition is unlike redundancy due to FD’s, of which name → addr is the only one.
Example
• Students(name, addr, CarsLiked, manf, favCar)
• FD’s: name → addr favCar, CarsLiked → manf
• Only key is {name, CarsLiked}.
• In each FD, the left side is not a superkey.
• Any one of these FD’s shows Students is not in BCNF
Boyce-Codd Normal Form
• We say a relation R is in BCNF if whenever X → A is a nontrivial FD that holds in R, X is a superkey.
  - Remember: nontrivial means A is not a member of set X.
  - Remember, a superkey is any superset of a key (not necessarily a proper superset).
Example
• Students(name, addr, CarsLiked, manf, favCar)
• F = name → addr, name → favCar, CarsLiked → manf
• Pick BCNF violation name → addr.
• Close the left side: {name}+ = {name, addr, favCar}.
• Decomposed relations:
  1. Students1(name, addr, favCar)
  2. Students2(name, CarsLiked, manf)
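One decomposition step like this can be written down directly: close the left side of the violating FD, keep the closure as one relation, and keep the left side plus everything outside the closure as the other. The sketch below performs a single step only, and the function name bcnf_split is an assumption for this example.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_split(attributes, violating_lhs, fds):
    """Split a relation on a BCNF violation X -> A into (X+, X ∪ (R − X+))."""
    x_plus = closure(violating_lhs, fds) & set(attributes)
    r1 = [a for a in attributes if a in x_plus]
    r2 = [a for a in attributes if a in violating_lhs or a not in x_plus]
    return r1, r2


students = ["name", "addr", "CarsLiked", "manf", "favCar"]
F = [({"name"}, {"addr"}), ({"name"}, {"favCar"}), ({"CarsLiked"}, {"manf"})]

students1, students2 = bcnf_split(students, {"name"}, F)
print(students1)  # ['name', 'addr', 'favCar']
print(students2)  # ['name', 'CarsLiked', 'manf']
```

The step would normally be repeated on each result: Students2 still contains CarsLiked → manf, whose left side is not a superkey of Students2.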

3NF and BCNF
• 3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
• X → A violates 3NF if and only if X is not a superkey and A is not prime (that is, A is not a member of any key).
Exercises
• The following relation schema is not in third normal form (3NF):
  SHIPMENT(SID, FROM_CITY, TO_CITY, DISTANCE, WEIGHT)
• Is this an example of a transitive dependency or a partial key dependency?
• Give an equivalent schema that is in 3NF.
Exercises
• This relation has been proposed to track Pacific alumni:
  Alumni(SID, LastName, FirstName, Degree, YearAwarded, Phone)
• Pacific allows students to receive multiple degrees, possibly in different years.
• Identify all FDs.
• Give a new schema that is in third normal form.
Exercises
• Consider the following relation schema:
  Movie(title, genre, length, actor, sag_id, studio, studio_addr)
  - Every movie has a unique title.
  - A movie may have multiple actors.
  - Each actor has a unique sag_id.
  - An actor may appear in multiple movies.
  - A movie has exactly one studio, but a studio may produce more than one movie.
  - Each studio has exactly one address.
• Identify all functional dependencies.
• Normalize the schema to 3NF.
Index
• An index is used to speed up the retrieval of records in response to certain search conditions
• Any field of the file can be used to create an index
Index
• Multiple indexes on different fields can be constructed on the same file.
• An index is specified on the ordered key field of the file (single-level index) or built as a B+ tree (multilevel indexes)
Primary index
• Each index entry has 2 fields:
1. Primary key of the data file
2. Pointer to a disk block (address)
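A toy version of such an index is sketched below. It assumes the data file is sorted on the primary key and that the index holds one entry per block, keyed by the first record in that block. That layout, the sample records, and the function name lookup are assumptions for illustration and are not stated on the slide.

```python
from bisect import bisect_right

# Hypothetical data file: records (key, value) sorted by primary key,
# stored in blocks of up to three records.
blocks = [
    [(2, "Ann"), (5, "Bob"), (8, "Cat")],
    [(11, "Dan"), (14, "Eve"), (17, "Fay")],
    [(20, "Gus"), (23, "Hal")],
]

# Index entries: (primary key of the block's first record, block number).
index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
index_keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the index, then scan the single block it points to."""
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    block_no = index[pos][1]
    for k, value in blocks[block_no]:
        if k == key:
            return value
    return None


print(lookup(14))  # Eve
print(lookup(3))   # None
```

Only one block is read per lookup, instead of scanning the whole file.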


Index problem
• The main problem with a primary index is insertion and deletion of records
• To insert a record in its correct position, other records must be shifted to make space for the new one.
Clustering index
• It is based on a non-key field of the file whose value can be repeated across records, so the records are clustered into groups
• Record insertion and deletion still cause a problem
Clustering index
• The primary index requires a distinct value for each record
• In a clustering index, there is one entry for each distinct value
Secondary index
• It is based on some non-ordering field of the data file.
• There can be many secondary indexes for the same file
Example
• Create a database for managing class enrollments in a single semester. You should keep track of all students (their names, IDs, and addresses) and professors (name, ID, department). Do not record the address of professors but keep track of their ages. Maintain records of courses as well: what classroom is assigned to a course, what the current enrollment is, and which department offers it. At most one professor teaches each course. Each student evaluates the professor teaching the course. Note that all course offerings in the semester are unique, i.e. course names and numbers do not overlap. A course can have ≥ 0 prerequisites, excluding itself. A student enrolled in a course must have enrolled in all its prerequisites. Each student receives a grade in each course. The departments are also unique, and can have at most one chairperson (or dept. head). A chairperson is not allowed to head two or more departments.