
COMP 421: Files and Databases

Lecture 17: Schema Refinement and Normal Forms

Exercise
• Consider a relation R with 5 attributes ABCDE. You are given the
following dependencies:
• 𝐴→𝐵
• 𝐵𝐶 → 𝐸
• 𝐸𝐷 → 𝐴

1. List all keys for R (trying all combinations A, B, AB, AC, … is difficult)


2. Is ACD a key?
• Find the closure of ACD and check whether all 5 attributes are in it
3. Is ADE a key?
4. Is BCD a key?
5. Is CDE a key?

2. Is ACD a key?
• Find the closure of ACD and check whether all 5 attributes are in it
• Start: closure = {A, C, D}
• Using 𝐴 → 𝐵, the LHS A is in the current closure, so add the RHS B:
closure = {A, B, C, D}
• Using 𝐵𝐶 → 𝐸, the LHS BC is in the current closure, so add the RHS E:
closure = {A, B, C, D, E}
• All 5 attributes are in the closure, so ACD is a superkey. None of its proper subsets reaches all 5 attributes (AC+ = {A, B, C, E}, AD+ = {A, B, D}, CD+ = {C, D}), so ACD is also a key.
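The closure steps above follow the usual attribute-closure loop: keep applying any FD whose whole left-hand side is already in the closure until nothing changes. A minimal Python sketch, assuming FDs are encoded as (lhs, rhs) string pairs with one letter per attribute (an encoding chosen here for illustration, not taken from the lecture):

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`.

    attrs: iterable of attribute names, e.g. "ACD" (one letter per attribute).
    fds:   list of (lhs, rhs) pairs, e.g. ("BC", "E") stands for BC -> E.
    """
    result = set(attrs)
    changed = True
    while changed:  # keep scanning until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # The FD fires when its whole LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [("A", "B"), ("BC", "E"), ("ED", "A")]  # A -> B, BC -> E, ED -> A
ALL = set("ABCDE")

for x in ["ACD", "ADE", "BCD", "CDE"]:
    c = closure(x, FDS)
    verdict = "superkey" if c == ALL else "not a superkey"
    print(x, "->", "".join(sorted(c)), verdict)
```

Running it reports ABCDE for ACD, BCD and CDE, and only ABDE for ADE, matching the hand computations in this exercise.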

3. Is ADE a key?
• Find the closure of ADE and check whether all 5 attributes are in it
• Start: closure = {A, D, E}
• Using 𝐴 → 𝐵, the LHS A is in the current closure, so add the RHS B:
closure = {A, B, D, E}
• Using 𝐸𝐷 → 𝐴, the LHS ED is in the current closure, but A is already in it:
closure = {A, B, D, E}
• 𝐵𝐶 → 𝐸 never applies because C is missing, and no FD has C on its right-hand side. C is never added, so ADE is not a key.

4. Is BCD a key?
• Find the closure of BCD and check whether all 5 attributes are in it
• Start: closure = {B, C, D}
• Using 𝐵𝐶 → 𝐸, the LHS BC is in the current closure, so add the RHS E:
closure = {B, C, D, E}
• Using 𝐸𝐷 → 𝐴, the LHS ED is in the current closure, so add the RHS A:
closure = {A, B, C, D, E}
• All 5 attributes are in the closure, so BCD is a superkey. None of its proper subsets reaches all 5 attributes (BC+ = {B, C, E}, BD+ = {B, D}, CD+ = {C, D}), so BCD is also a key.

5. Is CDE a key?
• Find the closure of CDE and check whether all 5 attributes are in it
• Start: closure = {C, D, E}
• Using 𝐸𝐷 → 𝐴, the LHS ED is in the current closure, so add the RHS A:
closure = {A, C, D, E}
• Using 𝐴 → 𝐵, the LHS A is in the current closure, so add the RHS B:
closure = {A, B, C, D, E}
• All 5 attributes are in the closure, so CDE is a superkey. None of its proper subsets reaches all 5 attributes (CD+ = {C, D}, CE+ = {C, E}, DE+ = {A, B, D, E}), so CDE is also a key.
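Question 1 (list all keys) can be brute-forced by combining the closure test with a minimality check. A small Python sketch under the same hypothetical (lhs, rhs) encoding: it tries attribute sets in order of increasing size and keeps a set only if its closure is all of ABCDE and it does not contain a smaller key already found.

```python
from itertools import combinations


def closure(attrs, fds):
    # Apply X -> Y whenever all of X is already in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force over attribute sets, smallest first.

    A set is a candidate key if its closure is every attribute (superkey)
    and it contains no smaller key found earlier (minimality).
    """
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) == set(attributes):
                keys.append(s)
    return ["".join(sorted(k)) for k in keys]


FDS = [("A", "B"), ("BC", "E"), ("ED", "A")]
print(candidate_keys("ABCDE", FDS))  # ['ACD', 'BCD', 'CDE']
```

C and D appear on no right-hand side, so every key must contain both; adding any one of A, B or E to CD gives the three keys. The search is exponential in the number of attributes, which is acceptable for a five-attribute exercise.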

Normal Forms
• To eliminate redundancy, several normal forms have been proposed as guidance
• If a relational schema is in one of these normal forms, then certain kinds of problems cannot arise
• The normal forms based on FDs are:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
• Higher normal forms are based on other kinds of dependencies:
• Fourth normal form (4NF), based on multivalued dependencies
• Fifth normal form (5NF), based on join dependencies

First Normal Form (1NF)
• For a relation to be in First Normal Form (1NF), it must satisfy the following rules:
1. Each column (attribute) in the relation must contain only atomic values
2. The order of rows and columns does not matter in 1NF
3. Every row in the table must be unique

Violates rule #1 (the rating "8 to 10" is not an atomic value):
sid  sname   rating
1    Lubber  7
2    Dustin  8 to 10

Violates rule #3 (the two rows are identical):
sid  sname   rating
1    Lubber  7
1    Lubber  7
Second Normal Form (2NF)
• For a relation to be in Second Normal Form (2NF), it must satisfy the following rules:
1. The relation must already satisfy all the conditions of 1NF
2. Non-key attributes must be fully dependent on the entire candidate key, not just a part of it

Primary key: (sid, bid, day); the primary key is the only candidate key
sid  bid  day         cost
1    100  2024-10-12  200
2    101  2024-10-22  250
3    100  2024-10-23  200

Violates rule #2: 𝑏𝑖𝑑 → 𝑐𝑜𝑠𝑡
The non-key attribute cost depends on bid, which is only a part of the candidate key. This is called a partial dependency.
Partial Dependency
• A partial dependency is a dependency on an attribute set that is a proper subset of some key
• cost depends on bid, and bid is a proper subset of the primary key
• 2NF eliminates partial dependencies

[Diagram: KEY (sid, bid, day) contains Attribute X = bid, which determines Attribute A = cost]
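A partial dependency can be detected mechanically: for each candidate key, take every proper, non-empty subset and check whether its closure contains a non-key attribute. A Python sketch, assuming the Reserves example is encoded with the two FDs sid, bid, day → cost and bid → cost (the encoding and the function names are hypothetical):

```python
from itertools import combinations


def closure(attrs, fds):
    # Apply X -> Y whenever all of X is already in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    # Smallest attribute sets whose closure is the whole relation.
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


def partial_dependencies(attributes, fds):
    """(subset, attribute) pairs where a non-key attribute is determined
    by a proper, non-empty subset of some candidate key."""
    keys = candidate_keys(attributes, fds)
    key_attrs = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):  # proper subsets only
            for subset in combinations(sorted(key), size):
                for a in sorted(closure(subset, fds) - key_attrs):
                    found.append((subset, a))
    return found


# Hypothetical encoding of the Reserves example above.
R = ["sid", "bid", "day", "cost"]
FDS = [(["sid", "bid", "day"], ["cost"]), (["bid"], ["cost"])]
print(partial_dependencies(R, FDS))
# [(('bid',), 'cost'), (('bid', 'day'), 'cost'), (('bid', 'sid'), 'cost')]
```

Every proper subset that contains bid is reported; the smallest one, bid → cost, is the partial dependency named on the slide.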

Third Normal Form (3NF)
• A relation R is in Third Normal Form (3NF) if, for every FD X → A in F, at least one of the following statements is true:
• A ∈ X (trivial dependency), or
• X is a superkey, or
• A is part of some key for R

Second condition: A can depend on X if X is a key (primary key, candidate key, or superkey).
Third condition: A can depend on X if A is part of some key.
Primary and candidate keys are minimal superkeys.
If an attribute is part of a key, it is called a key attribute.
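The three 3NF conditions translate almost directly into code. A Python sketch with hypothetical names, using the same (lhs, rhs) encoding; it tests the FDs as listed, one right-hand-side attribute at a time (for the relation as a whole this is enough, because any violating FD in F+ can be traced back to a violating FD in F):

```python
from itertools import combinations


def closure(attrs, fds):
    # Apply X -> Y whenever all of X is already in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    # Smallest attribute sets whose closure is the whole relation.
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


def is_3nf(attributes, fds):
    """Return (in_3nf, violations) for the FDs as listed."""
    key_attrs = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        for a in rhs:
            trivial = a in lhs                               # A is in X
            superkey = closure(lhs, fds) == set(attributes)  # X is a superkey
            key_attribute = a in key_attrs                   # A is part of some key
            if not (trivial or superkey or key_attribute):
                violations.append((lhs, a))
    return not violations, violations


# Exercise relation: every attribute is in one of the keys ACD, BCD, CDE.
print(is_3nf("ABCDE", [("A", "B"), ("BC", "E"), ("ED", "A")]))  # (True, [])

# Sailors example: sid -> rating and rating -> wage.
sailors_fds = [(["sid"], ["rating"]), (["rating"], ["wage"])]
print(is_3nf(["sid", "rating", "wage"], sailors_fds))  # (False, [(['rating'], 'wage')])
```

In the exercise relation ABCDE every attribute belongs to some key, so each FD passes through the third condition; in the Sailors example wage is not a key attribute and rating is not a superkey.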
Third Normal Form (3NF)
• For a relation to be in Third Normal Form (3NF), it must satisfy the following rules:
1. The relation must already satisfy all the conditions of 2NF
2. A non-key attribute must be directly dependent on the entire candidate key, and only on the candidate key
3. A non-key attribute cannot depend on another non-key attribute

Primary key: sid; the primary key is the only candidate key
sid  rating  wage
1    10      20
2    7       15
3    10      20

Violates rule #2: 𝑟𝑎𝑡𝑖𝑛𝑔 → 𝑤𝑎𝑔𝑒
The non-key attribute wage depends on rating, which is not a candidate key.
Transitive Dependency
• A transitive dependency is a dependency on an attribute set that is not a subset of any key
• wage depends on rating, and rating is not a subset of any candidate key
• 3NF eliminates transitive dependencies

[Figure: two diagrams relating KEY, Attribute X, and Attribute A]
Third Normal Form (3NF)
• For a relation to be in Third Normal Form (3NF), it must satisfy the following rules:
1. The relation must already satisfy all the conditions of 2NF
2. A non-key attribute must be directly dependent on the entire candidate key, and only on the candidate key

Primary key: (sid, bid, day); the primary key is the only candidate key
sid  bid  day         credit_card
1    100  2024-10-12  4400-1234-5678-9012
2    101  2024-10-22  4400-1234-5678-2012
3    102  2024-10-23  4400-1234-5678-0012

Violates rule #1: 𝑠𝑖𝑑 → 𝑐𝑟𝑒𝑑𝑖𝑡_𝑐𝑎𝑟𝑑
The non-key attribute credit_card depends on sid, which is only a part of the candidate key. This partial dependency means the relation is not even in 2NF.
Third Normal Form (3NF)
• For a relation to be in Third Normal Form (3NF), it must satisfy the following rules:
1. The relation must already satisfy all the conditions of 2NF
2. A non-key attribute must be directly dependent on the entire candidate key, and only on the candidate key

sid  bid  day         credit_card
1    100  2024-10-12  4400-1234-5678-9012
2    101  2024-10-22  4400-1234-5678-2012
3    102  2024-10-23  4400-1234-5678-0012

What if credit_card uniquely identifies a sailor (credit_card → sid)? Then (credit_card, bid, day) is also a candidate key.
𝑠𝑖𝑑 → 𝑐𝑟𝑒𝑑𝑖𝑡_𝑐𝑎𝑟𝑑 no longer violates rule #2, since credit_card is now part of a candidate key (a key attribute).
Boyce-Codd Normal Form (BCNF)
• A relation R is in Boyce-Codd Normal Form (BCNF) if, for every FD X → A in F, at least one of the following statements is true:
• A ∈ X (trivial dependency), or
• X is a superkey

A can depend on X only if X is a key (primary key, candidate key, or superkey).
Unlike 3NF, "A is part of some key for R" is not enough.
Primary and candidate keys are minimal superkeys.
If an attribute is part of a key, it is called a key attribute.
Boyce-Codd Normal Form (BCNF)
• For a relation to be in Boyce-Codd Normal Form (BCNF), it must satisfy the following rules:
1. The relation must already satisfy all the conditions of 3NF
2. Every attribute must be directly dependent on the whole candidate key, and only on the candidate key

Candidate keys: (sid, bid, day) and (credit_card, bid, day)
sid  bid  day         credit_card
1    100  2024-10-12  4400-1234-5678-9012
2    101  2024-10-22  4400-1234-5678-2012
3    102  2024-10-23  4400-1234-5678-0012

𝑠𝑖𝑑 → 𝑐𝑟𝑒𝑑𝑖𝑡_𝑐𝑎𝑟𝑑 violates rule #2 even though credit_card is part of a candidate key, because sid is not a superkey.
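The same closure machinery shows why this relation is in 3NF but not in BCNF. A Python sketch, assuming the FDs sid, bid, day → credit_card, sid → credit_card and credit_card → sid (the last one encodes "credit_card uniquely identifies a sailor"); the function names are hypothetical:

```python
from itertools import combinations


def closure(attrs, fds):
    # Apply X -> Y whenever all of X is already in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(x, attributes, fds):
    return closure(x, fds) == set(attributes)


def key_attributes(attributes, fds):
    """Union of all candidate keys (brute force, smallest sets first)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, attributes, fds):
                keys.append(s)
    return set().union(*keys)


def is_3nf(attributes, fds):
    prime = key_attributes(attributes, fds)
    return all(a in lhs or is_superkey(lhs, attributes, fds) or a in prime
               for lhs, rhs in fds for a in rhs)


def is_bcnf(attributes, fds):
    """BCNF drops the 'A is part of some key' condition that 3NF allows."""
    violations = [(lhs, a) for lhs, rhs in fds for a in rhs
                  if a not in lhs and not is_superkey(lhs, attributes, fds)]
    return not violations, violations


R = ["sid", "bid", "day", "credit_card"]
FDS = [
    (["sid", "bid", "day"], ["credit_card"]),
    (["sid"], ["credit_card"]),
    (["credit_card"], ["sid"]),  # assumed: credit_card identifies a sailor
]
print(is_3nf(R, FDS))   # True
print(is_bcnf(R, FDS))  # (False, [(['sid'], 'credit_card'), (['credit_card'], 'sid')])
```

Both sid and credit_card are key attributes, so the 3NF test passes; BCNF only looks at whether the left-hand side is a superkey, and neither sid nor credit_card is one.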

Decompositions
• Replace the relation with a collection of smaller relations

Original relation:
ssn          name       lot  rating  hourly_wages  hours_worked
123-22-3666  Attishoo   48   8       10            40
231-31-5368  Smiley     22   8       10            30
131-24-3650  Smethurst  35   5       7             30
434-26-3751  Guldu      35   5       7             32
612-67-4134  Madayan    35   8       10            40

Decomposed into two relations:
ssn          name       lot  rating  hours_worked
123-22-3666  Attishoo   48   8       40
231-31-5368  Smiley     22   8       30
131-24-3650  Smethurst  35   5       30
434-26-3751  Guldu      35   5       32
612-67-4134  Madayan    35   8       40

rating  hourly_wages
8       10
5       7
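The decomposition above is two projections of the original table, and a natural join on rating rebuilds it. A Python sketch using sets of tuples as a stand-in for relations (the column constants and the `project` helper are illustrative, not from the lecture):

```python
# Hourly_Emps rows: (ssn, name, lot, rating, hourly_wages, hours_worked)
hourly_emps = {
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
}
SSN, NAME, LOT, RATING, WAGES, HOURS = range(6)  # column positions


def project(rows, cols):
    """Relational projection: keep the given columns and drop duplicates."""
    return {tuple(row[i] for i in cols) for row in rows}


emps = project(hourly_emps, [SSN, NAME, LOT, RATING, HOURS])
wages = project(hourly_emps, [RATING, WAGES])
print(sorted(wages))  # [(5, 7), (8, 10)]

# Natural join on rating, written as a nested loop, rebuilds the original rows.
rejoined = {
    (ssn, name, lot, r, w, hours)
    for (ssn, name, lot, r, hours) in emps
    for (r2, w) in wages
    if r == r2
}
print(rejoined == hourly_emps)  # True
```

The hourly wage is now stored once per rating (two rows) instead of once per employee (five rows), which is the redundancy this decomposition removes.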
Problems with Decompositions
• There are three potential problems to consider:
1. Some queries become more expensive
2. Given instances of the decomposed relations, we may not be able to reconstruct the original relation
3. Checking some dependencies may require joining the instances of the decomposed relations

Instance R:
S   P   D
s1  p1  d1
s2  p2  d2
s3  p1  d3

Instance R1:
S   P
s1  p1
s2  p2
s3  p1

Instance R2:
P   D
p1  d1
p2  d2
p1  d3

R1 ⨝ R2:
S   P   D
s1  p1  d1
s2  p2  d2
s3  p1  d3
s1  p1  d3
s3  p1  d1

Joining R1 and R2 does not give back R (it adds the tuples (s1, p1, d3) and (s3, p1, d1)), so this is a lossy decomposition.
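Both instances can be checked directly by projecting R and joining the pieces back on their common attribute. A Python sketch with relations as sets of (S, P, D) tuples; it reproduces the lossy SP/PD case above and, for comparison, the SP/SD split used in the lossless-join example:

```python
R = {("s1", "p1", "d1"), ("s2", "p2", "d2"), ("s3", "p1", "d3")}  # (S, P, D)

# Lossy: project on SP and PD, then join on the common attribute P.
sp = {(s, p) for s, p, d in R}
pd = {(p, d) for s, p, d in R}
lossy = {(s, p, d) for s, p in sp for p2, d in pd if p == p2}
print(sorted(lossy - R))  # [('s1', 'p1', 'd3'), ('s3', 'p1', 'd1')]

# Lossless: project on SP and SD, then join on the common attribute S.
sd = {(s, d) for s, p, d in R}
lossless = {(s, p, d) for s, p in sp for s2, d in sd if s == s2}
print(lossless == R)  # True
```

A lossy join never loses tuples of R; it adds spurious ones, so the original instance can no longer be told apart from the join result.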
Lossless JOIN
Instance R:
S   P   D
s1  p1  d1
s2  p2  d2
s3  p1  d3

Instance R1:
S   P
s1  p1
s2  p2
s3  p1

Instance R2:
S   D
s1  d1
s2  d2
s3  d3

R1 ⨝ R2:
S   P   D
s1  p1  d1
s2  p2  d2
s3  p1  d3

Joining R1 and R2 gives back exactly R, so this is a lossless decomposition.
S is the only attribute common to both decomposed relations, and it is a key for R2.
Lossless JOIN
• The attributes common to R1 and R2 must contain a key for either R1 or R2 (here S → D and D → P)

Instance R:
S   P   D
s1  p1  d1
s2  p2  d2
s3  p1  d3

Lossy: R1 = SP, R2 = PD
Instance R1:
S   P
s1  p1
s2  p2
s3  p1
Instance R2:
P   D
p1  d1
p2  d2
p1  d3
The common attribute of SP and PD is P, and P is not a key for either R1 or R2.

Lossless: R1 = SP, R2 = SD
Instance R1:
S   P
s1  p1
s2  p2
s3  p1
Instance R2:
S   D
s1  d1
s2  d2
s3  d3
The common attribute of SP and SD is S, and S is a key for both R1 and R2.
Lossless Decomposition
• The attributes common to R1 and R2 must contain a key for either R1 or R2
• R: snlrwh is decomposed into R1: snlrh and R2: rw
• The common attribute of R1 and R2 is r
• r is a key for R2, so the decomposition is lossless
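The rule on these slides is a test for splitting R into two relations R1 and R2: the common attributes must functionally determine all of R1 or all of R2. A Python sketch, assuming the FDs S → D and D → P from the Lossless JOIN slide and, for snlrwh, the assumed FDs s → snlrwh and r → w (the function names are hypothetical):

```python
def closure(attrs, fds):
    # Apply X -> Y whenever all of X is already in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: the common attributes must determine
    every attribute of R1 or every attribute of R2 (contain a key of one)."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c


FDS = [("S", "D"), ("D", "P")]
print(is_lossless_binary("SP", "PD", FDS))  # False
print(is_lossless_binary("SP", "SD", FDS))  # True

# snlrwh example; assumed FDs: s is a key of R, and r -> w
FDS2 = [("s", "snlrwh"), ("r", "w")]
print(is_lossless_binary("snlrh", "rw", FDS2))  # True
```

This test only covers decompositions into two relations; a decomposition into more relations is usually handled by applying it one split at a time.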

Dependency Preserving Decomposition
• Schema for Contracts: CSTDPQV
• The following ICs hold on this schema:
• The contract id C is a key: 𝐶 → 𝐶𝑆𝑇𝐷𝑃𝑄𝑉
• A task purchases a given part using a single contract: 𝑃𝑇 → 𝐶
• A department purchases at most one part from a supplier: 𝑆𝐷 → 𝑃

How to decompose?
1. Consider the FD 𝑃𝑇 → 𝐶
• Create one relation with attributes PTC
• Create another relation without C: STDPQV?

• How do we make sure that the other FD 𝑆𝐷 → 𝑃 is preserved?

Dependency Preserving Decomposition
• Schema for Contracts: CSTDPQV
• The following ICs hold on this schema:
• The contract id C is a key: 𝐶 → 𝐶𝑆𝑇𝐷𝑃𝑄𝑉
• A task purchases a given part using a single contract: 𝑃𝑇 → 𝐶
• A department purchases at most one part from a supplier: 𝑆𝐷 → 𝑃

How to decompose?
1. Consider the FD 𝑆𝐷 → 𝑃
• Create one relation with attributes SDP
• Create another relation without P: CSTDQV?

• How do we make sure that the other FD 𝑃𝑇 → 𝐶 is preserved?
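Whether an FD survives a decomposition can also be tested without computing F+, using a standard iterative test: start from the FD's left-hand side and repeatedly add, for each decomposed relation Ri, the closure of (current set ∩ Ri) intersected with Ri. This is a swapped-in textbook technique, not the projection method used on the following slides. A Python sketch comparing the two Contracts decompositions (function names are hypothetical):

```python
def closure(attrs, fds):
    # Apply X -> Y whenever all of X is already in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """True if X -> Y can be enforced using only the decomposed relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in map(set, decomposition):
            # What the attributes gathered so far determine, visible inside Ri.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


F = [("C", "CSTDPQV"), ("PT", "C"), ("SD", "P")]
for decomposition in (["PTC", "STDPQV"], ["SDP", "CSTDQV"]):
    status = {f"{x} -> {y}": preserves((x, y), decomposition, F) for x, y in F}
    print(decomposition, status)
```

Under these FDs the PTC/STDPQV split keeps all three FDs, while the SDP/CSTDQV split loses 𝑃𝑇 → 𝐶, because P and T never appear together in one relation.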

Dependency Preserving Decomposition
• Consider a schema with 3 attributes: 𝐴𝐵𝐶
• The following FDs hold on this schema:
• 𝐴 → 𝐵, 𝐵 → 𝐶, 𝐶 → 𝐴
• Decompose considering 𝐴 → 𝐵 and 𝐵 → 𝐶
• Create one relation with AB: AB (preserves 𝐴 → 𝐵)
• Create one relation with BC: BC (preserves 𝐵 → 𝐶)
• To check whether the decomposition is dependency preserving:
• Compute the closure of all FDs in the original relation R, $F^{+}$
• Compute the projections of $F^{+}$ on the decomposed relations AB and BC, $F_{AB}^{+}$ and $F_{BC}^{+}$
• If the closure of $F_{AB}^{+} \cup F_{BC}^{+}$ equals $F^{+}$, then the decomposition is dependency preserving
• Each projection contains only the dependencies that can be checked in that relation alone: $F_{AB}^{+}$ is checked in AB without considering BC, and vice versa
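The projection-based check described above can be sketched in Python for small schemas. This sketch assumes single-letter attributes and the (lhs, rhs) encoding used earlier; `project_fds` is a hypothetical helper that enumerates every subset of a relation, so it is exponential and only meant for examples like ABC:

```python
from itertools import combinations


def closure(attrs, fds):
    # Apply X -> Y whenever all of X is already in the result.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(sub, fds):
    """Non-trivial FDs X -> A of F+ with X and A inside `sub`."""
    projected = []
    for size in range(1, len(sub) + 1):
        for x in combinations(sorted(sub), size):
            for a in sorted((closure(x, fds) & set(sub)) - set(x)):
                projected.append(("".join(x), a))
    return projected


F = [("A", "B"), ("B", "C"), ("C", "A")]
F_AB = project_fds("AB", F)
F_BC = project_fds("BC", F)
print(F_AB)  # [('A', 'B'), ('B', 'A')]
print(F_BC)  # [('B', 'C'), ('C', 'B')]

# Every FD of F follows from the union, so the closure of the union equals F+.
union = F_AB + F_BC
print(all(set(rhs) <= closure(lhs, union) for lhs, rhs in F))  # True
```

Checking that each FD of F is implied by the union is enough: the union is already contained in F+, so implying F means the two closures are equal.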

Dependency Preserving Decomposition
• Consider a schema with 3 attributes: 𝐴𝐵𝐶
• The following FDs hold on this schema:
• 𝐴 → 𝐵, 𝐵 → 𝐶, 𝐶 → 𝐴
• Closure of all FDs in the original relation, $F^{+}$:
• Closure of A = {A, B, C}: 𝐴 → 𝐴, 𝐴 → 𝐵, 𝐴 → 𝐶
• Closure of B = {A, B, C}: 𝐵 → 𝐴, 𝐵 → 𝐵, 𝐵 → 𝐶
• Closure of C = {A, B, C}: 𝐶 → 𝐴, 𝐶 → 𝐵, 𝐶 → 𝐶

• Projection of $F^{+}$ on AB, $F_{AB}^{+}$: 𝐴 → 𝐴, 𝐴 → 𝐵, 𝐵 → 𝐴, 𝐵 → 𝐵
• Projection of $F^{+}$ on BC, $F_{BC}^{+}$: 𝐵 → 𝐵, 𝐵 → 𝐶, 𝐶 → 𝐵, 𝐶 → 𝐶
Dependency Preserving Decomposition
• Consider a schema with 3 attributes: 𝐴𝐵𝐶
• The following FDs hold on this schema:
• 𝐴 → 𝐵, 𝐵 → 𝐶, 𝐶 → 𝐴
• Closure of all FDs in the original relation, $F^{+}$ (trivial FDs such as 𝐴 → 𝐴 omitted):
• Closure of A = {A, B, C}: 𝐴 → 𝐵, 𝐴 → 𝐶
• Closure of B = {A, B, C}: 𝐵 → 𝐴, 𝐵 → 𝐶
• Closure of C = {A, B, C}: 𝐶 → 𝐴, 𝐶 → 𝐵

• Projection of $F^{+}$ on AB, $F_{AB}^{+}$: 𝐴 → 𝐵, 𝐵 → 𝐴
• Projection of $F^{+}$ on BC, $F_{BC}^{+}$: 𝐵 → 𝐶, 𝐶 → 𝐵
Dependency Preserving Decomposition
• Consider a schema with 3 attributes: 𝐴𝐵𝐶
• The following FDs hold on this schema:
• 𝐴 → 𝐵, 𝐵 → 𝐶, 𝐶 → 𝐴
• Closure of all FDs in the original relation, $F^{+}$:
• Closure of A = {A, B, C}: 𝐴 → 𝐵, 𝐴 → 𝐶
• Closure of B = {A, B, C}: 𝐵 → 𝐴, 𝐵 → 𝐶
• Closure of C = {A, B, C}: 𝐶 → 𝐴, 𝐶 → 𝐵

• Projection of $F^{+}$ on AB, $F_{AB}^{+}$: 𝐴 → 𝐵, 𝐵 → 𝐴
• Projection of $F^{+}$ on BC, $F_{BC}^{+}$: 𝐵 → 𝐶, 𝐶 → 𝐵

• 𝐴 → 𝐶 and 𝐶 → 𝐴 appear in neither projection. Can we infer them from $F_{AB}^{+}$ and $F_{BC}^{+}$?
• Using transitivity, from 𝐴 → 𝐵 and 𝐵 → 𝐶, we get 𝐴 → 𝐶
• Using transitivity, from 𝐶 → 𝐵 and 𝐵 → 𝐴, we get 𝐶 → 𝐴
• Every FD in $F^{+}$ can be inferred, so the decomposition into AB and BC is dependency preserving
Summary
• Finding a Candidate Key: For all possible attribute sets X, find the closure of X. If the closure contains all attributes of the relation, X is a superkey; if, in addition, no proper subset of X is a superkey, X is a candidate key.

• Checking if a Relation R is in 2NF: Ensure that every non-key attribute depends on the entire key, eliminating any partial dependencies.

• Checking if a Relation R is in 3NF: Ensure that every non-key attribute depends on the entire key, and only on the key (a key attribute may depend on a part of a key). This eliminates both partial and transitive dependencies.

• Checking if a Relation R is in BCNF: Ensure that every attribute depends on the entire key, and only on the key.

• Checking if a Decomposition is Lossless: For a decomposition into two relations, the common attribute(s) must form a superkey of at least one of the relations to ensure a lossless join.

• Checking if a Decomposition is Dependency Preserving: The closure of the union of the projected FDs of the decomposed relations must equal the closure of all functional dependencies (FDs) of the original relation, $F^{+}$.
