ch 5

DSA notes

Uploaded by

Rutuja Jadhav
Copyright
© © All Rights Reserved

Unit No-5

Relational-Database Design

Basic Concept
• NORMALIZATION
• is a database design technique that reduces data redundancy and eliminates undesirable characteristics such as insertion, update, and deletion anomalies.
• Normalization rules divide larger tables into smaller tables and link them using relationships.
• The purpose of normalization is to eliminate redundant (repetitive) data and ensure data is stored logically.
• Normalization is the process of converting a poorly designed database into a well-designed one.
• Database Normal Forms
• 1NF (First Normal Form)
• 2NF (Second Normal Form)
• 3NF (Third Normal Form)
• BCNF (Boyce-Codd Normal Form)
• 4NF (Fourth Normal Form)
• 5NF (Fifth Normal Form)
• Pitfalls in Relational-Database Design:
• Example: Suppose a bank database stores all data about branches, customers, and loans in a single table:

Branch-name  Branch-city  Assets  Customer-name  Loan-no  Loan-amount
D            B            90      J              L-17     1000
R            P            21      S              L-23     2000
P            H            17      H              L-15     1500
D            B            90      JA             L-14     1500
.            .            .       .              .        .
P            H            17      G              L-25     2500

• The above table is not normalized, because of the anomalies present in it.
Anomalies
• Insert Anomaly: Details of a new branch cannot be added on their own. They can be recorded only once a customer takes a loan from that branch.
• Update Anomaly: To update the assets of branch D, the same value must be changed in both rows for branch D. If the assets are updated in one row but not in the other, the database shows two different asset values for the same branch, which is inconsistent.
• Delete Anomaly: Suppose loan L-23 at branch R is fully repaid. Deleting the row for L-23 also deletes the information about branch R and customer S, since S is the only customer of that branch.
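
The update anomaly can be reproduced in a few lines. This is a minimal Python sketch, assuming the single table is held as a list of dictionaries. The helper name `inconsistent_branches` and the new assets value 95 are hypothetical and are not part of the notes.

```python
# Rows of the single-table bank design (values taken from the example table).
rows = [
    {"branch": "D", "city": "B", "assets": 90, "customer": "J", "loan_no": "L-17", "amount": 1000},
    {"branch": "R", "city": "P", "assets": 21, "customer": "S", "loan_no": "L-23", "amount": 2000},
    {"branch": "P", "city": "H", "assets": 17, "customer": "H", "loan_no": "L-15", "amount": 1500},
    {"branch": "D", "city": "B", "assets": 90, "customer": "JA", "loan_no": "L-14", "amount": 1500},
]


def inconsistent_branches(rows):
    """Return the branches that appear with more than one assets value."""
    assets_seen = {}
    for row in rows:
        assets_seen.setdefault(row["branch"], set()).add(row["assets"])
    return sorted(b for b, values in assets_seen.items() if len(values) > 1)


before = inconsistent_branches(rows)

# Update anomaly: the assets of branch D are changed in only one of its two rows.
rows[0]["assets"] = 95  # hypothetical new value
after = inconsistent_branches(rows)

print(before)  # []
print(after)   # ['D']
```

Because the assets value is stored once per loan rather than once per branch, nothing in the table itself prevents the two rows for branch D from disagreeing.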

Terminologies:
• Before starting on 1NF, 2NF, 3NF, …, let us understand the following terminologies:
• Functional Dependencies
• Armstrong Axioms (Rules of Inference)
• Decomposition
• Identification of Candidate Key
• Partial Functional Dependencies
• Full Functional Dependencies
• Functional Dependencies:
• Denoted by the symbol →
• A → B
• Meaning: attribute A determines attribute B, or
• attribute B is dependent on attribute A
• Armstrong's Axioms:
• If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional dependencies logically implied by F.
• Armstrong's Axioms are a set of inference rules that, when applied repeatedly, generate the closure of a set of functional dependencies.

• Reflexive rule − If alpha is a set of attributes and beta is a subset of alpha, then alpha → beta holds.
• Augmentation rule − If a → b holds and y is an attribute set, then ay → by also holds.
• Transitivity rule − If a → b and b → y hold, then a → y holds.
• The following additional rules can be derived from the three axioms above:
• Union rule − If a → b and a → y hold, then a → by holds.
• Decomposition rule − If a → by holds, then a → b and a → y hold.
• Pseudotransitivity rule − If a → b and yb → d hold, then ay → d holds.
• Example:
• Suppose R = {A, B, C, G, H, I}
• The given FDs are as follows:
A → B, A → C, CG → H, CG → I, B → H
• Find some members of the closure set F+:

1. A → H using the Transitivity Rule (from A → B and B → H)
2. CG → HI using the Union Rule (from CG → H and CG → I)
3. AG → I using the Pseudotransitivity Rule (from A → C and CG → I)
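
The three derived dependencies can be checked mechanically by computing attribute closures. This is a minimal Python sketch, assuming each dependency is written as a pair of strings with one letter per attribute. The function names `closure` and `implies` are illustrative choices, not notation from the notes.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. rhs lies inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# F from the example: A -> B, A -> C, CG -> H, CG -> I, B -> H
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # True
print(implies(F, "CG", "HI"))  # True
print(implies(F, "AG", "I"))   # True
print(implies(F, "B", "A"))    # False
```

Testing membership through the closure of the left-hand side avoids enumerating all of F+, which can be very large.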

Decomposition:
• Decomposition divides the original relation into smaller relations.
• A decomposition should be lossless, not lossy.
• A decomposition of R into R1 and R2 is lossless if at least one of the following holds:
(R1 ⋂ R2) → R1 or (R1 ⋂ R2) → R2
Example:
Let R = {A,B,C} and FD = {A → B}.
The decomposed relations are R1 = {A,B} and R2 = {A,C}. Is this decomposition lossless or lossy?
Solution: R1 ⋂ R2 = {A}.
Since A → B is given, B depends on A, so A determines {A,B}, which is R1.
Thus (R1 ⋂ R2) → R1 holds.
Hence this is a lossless decomposition.
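
The test used in the solution can be sketched in Python as follows. It assumes the same string encoding of attribute sets (one letter per attribute) and covers only a decomposition into two relations. The name `is_lossless` is hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes must determine R1 or R2."""
    common = set(r1) & set(r2)
    common_closure = closure(common, fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Worked example: R = {A,B,C}, FD = {A -> B}, R1 = {A,B}, R2 = {A,C}
F = [("A", "B")]
print(is_lossless("AB", "AC", F))  # True
```

Here the common attribute set is {A}, its closure is {A,B}, and that closure contains all of R1, which is the condition checked by hand above.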
Decomposition: (Exercise)
1. Let R = {A,B,C} and FD = {A → B}.
The decomposed relations are R1 = {A,B} and R2 = {B,C}. Is this decomposition lossless or lossy?
2. Let R = {A,B,C,D,E} and FD = {A → BC, CD → E, B → D, E → A}.
The decomposed relations are R1 = {A,B,C} and R2 = {A,D,E}. Is this decomposition lossless or lossy?
3. Let R = {A,B,C,D,E,F} and FD = {AB → C, B → D, C → E, D → F}.
The decomposed relations are
R1 = {A,B,C}
R2 = {A,B,E}
R3 = {B,D}
R4 = {B,F}
Is this decomposition lossless or lossy?

Determine Candidate Key
Example: Suppose R = {A,B,C,D,E,F} with FD = {A → B, AE → F, CE → D, BC → D, CD → A}. Determine the candidate key for R.
1. F+(A): start with {A}
   A → B gives {A,B}
A determines only 2 of the 6 attributes (itself and B), hence A cannot be a candidate key.
2. F+(AE): AE → F gives {A,E,F}
   A → B gives {A,B,E,F}
Therefore AE determines at most 4 attributes and hence cannot be a candidate key.

3. F+(CD): CD → A gives {A,C,D}
   A → B gives {A,B,C,D}
Therefore CD determines at most 4 attributes and hence cannot be a candidate key.
Determine Candidate Key (cont.):
4. F+(CE): CE → D gives {C,D,E}
   CD → A gives {A,C,D,E}
   A → B gives {A,B,C,D,E}
   AE → F gives {A,B,C,D,E,F}
CE determines all attributes of R, and neither C nor E alone does, hence CE is a candidate key of R.
5. F+(BC): BC → D gives {B,C,D}
   CD → A gives {A,B,C,D}
Therefore BC determines at most 4 attributes and hence cannot be a candidate key.
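
The hand computation can be cross-checked with a brute-force search over attribute subsets. This is a minimal Python sketch, assuming one-letter attribute names. The function `candidate_keys` is a hypothetical helper, and the search is exponential in the number of attributes, so it suits only small classroom examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    # Try smaller subsets first, so every key found is minimal.
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == set(relation):
                keys.append(attrs)
    return keys


R = "ABCDEF"
F = [("A", "B"), ("AE", "F"), ("CE", "D"), ("BC", "D"), ("CD", "A")]

keys = candidate_keys(R, F)
print(["".join(sorted(k)) for k in keys])  # ['CE']
```

A quick sanity check on the result: C and E never appear on the right-hand side of any dependency, so every key must contain both of them.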

Determine Candidate Key (Exercise)
1. Suppose R = {A,B,C,D,E,F} with FD = {A → BC, E → A, CD → E, B → D}.
Determine the candidate key for R.
• Full Functional Dependency:
A non-key attribute is fully functionally dependent on the key when it depends on the whole key and not on any proper part of it.
• Partial Functional Dependency:
A non-key attribute is partially functionally dependent on the key when it depends on only a part of the key.
Example: Suppose R = {A,B,C,D} and FD = {AB → C, B → D}, and AB is a candidate key. Then:
AB → C: AB is the key and C is a non-key attribute that depends on the whole of AB. Hence AB → C is a full functional dependency.
B → D: B is a part of the key AB and D is a non-key attribute that depends on B alone. Hence B → D is a partial functional dependency.
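
The distinction can be sketched in Python as a simple classification of the given dependencies against the key. This is a simplified illustration, assuming a single candidate key and dependencies whose left side is either the whole key or a part of it. The function name `classify` is hypothetical.

```python
def classify(key, fds):
    """Split dependencies on non-key attributes into full and partial ones."""
    key = set(key)
    full, partial = [], []
    for lhs, rhs in fds:
        if set(rhs) <= key:
            continue  # right side lies inside the key, so no non-key attribute is involved
        if set(lhs) == key:
            full.append((lhs, rhs))      # depends on the whole key
        elif set(lhs) < key:
            partial.append((lhs, rhs))   # depends on a proper part of the key
    return full, partial


# Example from the notes: key AB, FD = {AB -> C, B -> D}
full, partial = classify("AB", [("AB", "C"), ("B", "D")])
print(full)     # [('AB', 'C')]
print(partial)  # [('B', 'D')]
```

Any dependency that lands in `partial` is one that 2NF requires to be moved into its own table.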
NORMALIZATION
• 1NF (First Normal Form)
• R is in first normal form if and only if all attributes of R are atomic.

• 2NF (Second Normal Form)
• R is said to be in second normal form if and only if:
• It is in 1NF.
• No non-key attribute is partially dependent on the key.
• To achieve this, group the partially dependent attributes into a separate table, and identify the key and name of the newly created table.

• 3NF (Third Normal Form)
• R is said to be in third normal form if and only if:
• It is in 2NF.
• No non-key attribute is transitively dependent on the key.
• To achieve this, group the transitively dependent attributes into a separate table, and identify the key and name of the newly created table.
• NORMALIZATION (Example of Bad Database Design): College DB

S#   S-name  DOB         C#  C-Name            Pre-requisite    Duration  DOE         Marks  Grade
101  Davis   11/04/1986  M4  Applied Maths     Basic Maths      7         11/11/2004  82     A
102  Daniel  11/06/1987  M4  Applied Maths     Basic Maths      7         11/11/2004  62     C
101  Davis   11/04/1986  H6  American History  -                4         22/11/2004  79     B
103  Sandra  10/02/1988  C3  Bio Chemistry     Basic Chemistry  11        16/11/2004  65     B
104  Evelyn  22/02/1986  B3  Botany            -                8         26/11/2004  77     B
102  Daniel  11/06/1987  P3  Nuclear Physics   Basic Physics    13        12/11/2004  68     B
105  Susan   31/8/1985   P3  Nuclear Physics   Basic Physics    13        12/11/2004  89     A
103  Sandra  10/02/1988  B4  Zoology           -                5         27/11/2004  54     D
105  Susan   31/8/1985   H6  American History  -                4         22/11/2004  87     A
104  Evelyn  22/02/1986  M4  Applied Maths     Basic Maths      7         11/11/2004  65     B

The above table exhibits all the anomalies (insert, update, and delete), hence it is a bad database design.
Therefore we will apply the normalization process, i.e. 1NF, 2NF, and 3NF.
• NORMALIZATION (Applying 1NF to the bad database design)
• 1NF: R is in 1NF if and only if all attributes of R are atomic.
• All attributes of the College table above are atomic, hence the table is in 1NF.
• NORMALIZATION (Applying 2NF to the output of 1NF)
• 2NF (Second Normal Form)
• R is said to be in second normal form if and only if:
• It is in 1NF.
• No non-key attribute is partially dependent on the key.
• To achieve this, group the partially dependent attributes into a separate table, and identify the key and name of the newly created table.
• Solution: Here (S#, C#) is the primary key of the College table.
• Full F.D. (on the whole key): S#,C# → Marks and S#,C# → Grade
• Partial F.D. (on S# only): S# → S-Name and S# → DOB
• Partial F.D. (on C# only): C# → C-Name, C# → Pre-requisite, C# → Duration, and C# → DOE
• Output of 2NF:

College (T1)
S#   C#  Marks  Grade
101  M4  82     A
102  M4  62     C
101  H6  79     B
103  C3  65     B
104  B3  77     B
102  P3  68     B
105  P3  89     A
103  B4  54     D
105  H6  87     A
104  M4  65     B

Student (T2)
S#   S-name  DOB
101  Davis   11/04/1986
102  Daniel  11/06/1987
103  Sandra  10/02/1988
104  Evelyn  22/02/1986
105  Susan   31/8/1985

Course (T3)
C#  C-Name            Pre-requisite    Duration  DOE
M4  Applied Maths     Basic Maths      7         11/11/2004
H6  American History  -                4         22/11/2004
C3  Bio Chemistry     Basic Chemistry  11        16/11/2004
B3  Botany            -                8         26/11/2004
P3  Nuclear Physics   Basic Physics    13        12/11/2004
B4  Zoology           -                5         27/11/2004
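
The 2NF split amounts to projecting the original table onto three column groups and dropping duplicate rows. This is a minimal Python sketch using only the first four rows of the College table. The names `COLUMNS`, `ROWS`, and `project` are hypothetical, and an empty string stands in for a missing pre-requisite.

```python
COLUMNS = ["S#", "S-name", "DOB", "C#", "C-Name", "Pre-requisite",
           "Duration", "DOE", "Marks", "Grade"]

# First four rows of the College table; "" marks a missing pre-requisite.
ROWS = [
    ("101", "Davis", "11/04/1986", "M4", "Applied Maths", "Basic Maths", 7, "11/11/2004", 82, "A"),
    ("102", "Daniel", "11/06/1987", "M4", "Applied Maths", "Basic Maths", 7, "11/11/2004", 62, "C"),
    ("101", "Davis", "11/04/1986", "H6", "American History", "", 4, "22/11/2004", 79, "B"),
    ("103", "Sandra", "10/02/1988", "C3", "Bio Chemistry", "Basic Chemistry", 11, "16/11/2004", 65, "B"),
]


def project(rows, wanted):
    """Keep only the `wanted` columns and drop duplicate rows, preserving order."""
    idx = [COLUMNS.index(c) for c in wanted]
    out = []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in out:
            out.append(t)
    return out


college = project(ROWS, ["S#", "C#", "Marks", "Grade"])                        # T1
student = project(ROWS, ["S#", "S-name", "DOB"])                                # T2
course = project(ROWS, ["C#", "C-Name", "Pre-requisite", "Duration", "DOE"])    # T3

print(len(college), len(student), len(course))  # 4 3 3
```

Student 101 and course M4 each appear twice in the four input rows but only once in their own tables, which is the redundancy that 2NF removes.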

• 3NF (Third Normal Form)
• R is said to be in third normal form if and only if:
• It is in 2NF.
• No non-key attribute is transitively dependent on the key.
• To achieve this, group the transitively dependent attributes into a separate table, and identify the key and name of the newly created table.
• Solution: In College (T1), S#,C# → Marks and Marks → Grade.
• Transitive dependency: Grade depends on the key (S#, C#) only through Marks, so Grade is moved from College (T1) into a new table, Grades (T4).

College (T1)
S#   C#  Marks
101  M4  82
102  M4  62
101  H6  79
103  C3  65
104  B3  77
102  P3  68
105  P3  89
103  B4  54
105  H6  87
104  M4  65

Grades (T4)
Upper-bound  Lower-bound  Grade
100          95           A+
94           85           A
84           70           B
69           65           B-
64           55           C
54           45           D
44           40           E
• Final solution after 3NF with no anomalies:

College (T1)
S#   C#  Marks
101  M4  82
102  M4  62
101  H6  79
103  C3  65
104  B3  77
102  P3  68
105  P3  89
103  B4  54
105  H6  87
104  M4  65

Student (T2)
S#   S-name  DOB
101  Davis   11/04/1986
102  Daniel  11/06/1987
103  Sandra  10/02/1988
104  Evelyn  22/02/1986
105  Susan   31/8/1985

Course (T3)
C#  C-Name            Pre-requisite    Duration  DOE
M4  Applied Maths     Basic Maths      7         11/11/2004
H6  American History  -                4         22/11/2004
C3  Bio Chemistry     Basic Chemistry  11        16/11/2004
B3  Botany            -                8         26/11/2004
P3  Nuclear Physics   Basic Physics    13        12/11/2004
B4  Zoology           -                5         27/11/2004

Grades (T4)
Upper-bound  Lower-bound  Grade
100          95           A+
94           85           A
84           70           B
69           65           B-
64           55           C
54           45           D
44           40           E
NORMALIZATION (BCNF)
• Boyce-Codd Normal Form (BCNF)
• A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.

• Important Point:
• Every BCNF relation is in 3NF, but not every 3NF relation is in BCNF.

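The Decomposition of a Table Structure to Meet BCNF Requirements

Original structure (in 3NF but not in BCNF, since C is a determinant but not a candidate key):
◆ A + B → C, D
◆ C → B
Change the PK to A + C:
◆ A + C → B, D
◆ C → B (now a partial dependency, since C is part of the PK)
Decompose into two tables:
◆ A + C → D
◆ C → B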
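
The BCNF condition, that every determinant must be a candidate key, can be checked with attribute closures. This is a minimal Python sketch, assuming one-letter attribute names and the dependencies listed for the structure above. The function `bcnf_violations` is a hypothetical helper that tests only the dependencies it is given.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial dependencies whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(relation):
            violations.append((lhs, rhs))
    return violations


# Original structure: A + B -> C, D and C -> B
print(bcnf_violations("ABCD", [("AB", "CD"), ("C", "B")]))  # [('C', 'B')]

# The two tables after decomposition
print(bcnf_violations("ACD", [("AC", "D")]))  # []
print(bcnf_violations("CB", [("C", "B")]))    # []
```

The only violation in the original structure is C → B, and it disappears once C and B are placed in their own table with C as the key.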
The Boyce-Codd Normal Form (BCNF)

◆ STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE
◆ CLASS_CODE → STAFF_ID
◆ Points:
◆ Each CLASS_CODE identifies a class uniquely.
◆ A student can enroll in many classes.
◆ A staff member can teach many classes, but each class is taught by only one staff member.

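Decomposition into BCNF
Following the general pattern shown earlier (with A = STU_ID, B = STAFF_ID, C = CLASS_CODE, D = ENROLL_GRADE), the table is split into two:
◆ STU_ID + CLASS_CODE → ENROLL_GRADE
◆ CLASS_CODE → STAFF_ID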

• 4NF (Fourth Normal Form)
• R is said to be in fourth normal form if and only if:
• It is in 3NF (more strictly, in BCNF).
• No multivalued dependency exists between the key and the non-key attributes.
• To achieve this, group the multivalued attributes into a separate table, and identify the key and name of the newly created table.

Student (one of the tables in 3NF)
S#   S-Name  Add.  DOB
101  Davis   A1    11/04/1986
101  Davis   A2    11/04/1986
101  Davis   A3    11/04/1986
102  Smith   B1    12/05/1987
102  Smith   B1    12/05/1987

Student (T1)
S#   S-Name  DOB
101  Davis   11/04/1986
102  Smith   12/05/1987

Stu-Address (T11)
S#   Add.
101  A1
101  A2
101  A3
102  B1
102  B1
Dr. Amol Pande
