EDM - E1 - Data Architecture and Modeling - Normalization v1.1

Data Architecture and Modeling

Uploaded by mukhopadhyay00

EDM - E1 - Data Architecture and Modeling – Normalization

June 26, 2024 TCS Confidential


EDM – E1 – Data Architecture and Modeling – Training Lecture Series

• EDM – E1 – Data Architecture and Modeling - Data Architecture

• EDM – E1 – Data Architecture and Modeling - Data Modeling Overview

• EDM – E1 – Data Architecture and Modeling - Normalization

• EDM – E1 – Data Architecture and Modeling – Dimensional Modeling



Normalization - Agenda
• Definitions
– Relation
– Key
– Functional Dependency

• Normalization
– Normal Forms
– 1st NF
– 2nd NF
– 3rd NF
– Denormalization

• Normalization Examples
– Example 1
– Example 2

• Synthesis of Relations

• References





Relation - Definition
• Relational DBMS products store data in the form of relations

• A relation is a two-dimensional table that has the following characteristics:
– No two rows are identical
– Rows contain data about an entity
– Columns contain data about attributes of the entity
– Cells of the table hold a single value
– The order of the columns is unimportant
– The order of the rows is unimportant
– All entries in a column are of the same kind/domain
– Each column has a unique name

• Although not all tables are relations, every relation is a table

• The terms table and relation are normally used interchangeably:
– Table/row/column = file/record/field = relation/tuple/attribute



Example: Table is a Relation

Prof

Prof_Name Prof_Dept Prof_Email

Dilip Vengsarkar Batting [email protected]

Farukh Engineer Fielding [email protected]

Hrithik Roshan Bowling [email protected]

Juhi Chawla Fielding [email protected]

K Srikanth Batting [email protected]

Katrina Kaif Bowling [email protected]

Preity Zinta Bowling [email protected]

S Venkataraghavan Bowling [email protected]

Shahrukh Khan Batting [email protected]

Vijay Mallya Fielding [email protected]



Example: Table is NOT a Relation

Prof

Prof_Name Prof_Dept Prof_Email

Dilip Vengsarkar Batting [email protected]

Farukh Engineer Fielding [email protected]

Hrithik Roshan Bowling [email protected]

Juhi Chawla Fielding [email protected]

K Srikanth Batting [email protected]

Home: [email protected]

Katrina Kaif Bowling [email protected]

Preity Zinta Bowling [email protected]

S Venkataraghavan Bowling [email protected]

Shahrukh Khan Batting [email protected], [email protected]

Vijay Mallya Fielding [email protected]
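The problem on this slide (cells holding more than one email) and the no-duplicate-rows rule can be checked mechanically. A minimal Python sketch, assuming a table is given as column names plus row tuples and that a Python list in a cell stands for a multi-valued cell; relation_violations is a hypothetical helper and the email addresses are hypothetical placeholders:

```python
def relation_violations(columns, rows):
    """Report why a table fails some of the relation rules.

    Checks only: unique column names, no duplicate rows, and single-valued
    cells (a Python list in a cell is treated as a multi-valued cell).
    """
    problems = []
    if len(set(columns)) != len(columns):
        problems.append("column names are not unique")
    seen = set()
    for i, row in enumerate(rows, start=1):
        for col, cell in zip(columns, row):
            if isinstance(cell, list):
                problems.append(f"row {i}: {col} holds {len(cell)} values")
        # Lists are unhashable, so turn them into tuples before comparing rows.
        hashable = tuple(tuple(c) if isinstance(c, list) else c for c in row)
        if hashable in seen:
            problems.append(f"row {i}: duplicate row")
        seen.add(hashable)
    return problems


columns = ["Prof_Name", "Prof_Dept", "Prof_Email"]
rows = [  # hypothetical placeholder emails
    ("K Srikanth", "Batting", ["ks.work@example.com", "ks.home@example.com"]),
    ("Katrina Kaif", "Bowling", "kk@example.com"),
    ("Shahrukh Khan", "Batting", ["sk1@example.com", "sk2@example.com"]),
]
print(relation_violations(columns, rows))
# ['row 1: Prof_Email holds 2 values', 'row 3: Prof_Email holds 2 values']
```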



Types of Keys
• A key is one or more columns of a relation that are used to identify rows

• A unique key identifies a single row; a non-unique key identifies several rows

• A composite key is a key that contains two or more attributes

• A relation may have several unique, minimal keys, called candidate keys (CKs); one of them is designated as the primary key (PK)

• Candidate keys follow the same definition as the primary key, shown in the next slide

• A PK is a CK, but not every CK is a PK. CKs that are not the PK are called alternate keys (AK).



Primary Key

• A primary key (PK) of a relation is a designated key, represented by one or more of its attributes, that:
– Is a unique key, and
– Does not accept NULL values, and
– Is minimal (no attribute can be removed without losing uniqueness)

• The primary key is generally used to:
– Represent the table in relationships
– Organize table storage
– Generate indexes

• Sometimes, relations are denoted by the name of the relation followed by its attributes in parentheses, with the primary key underlined.

E.g. in Prof (Prof_Name, Prof_Dept, Prof_Email), the primary key is Prof_Name
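Sample data can only refute a key, never prove one: a key must hold for every possible row, which is a business rule. With that caveat, a hedged Python sketch of the three PK rules (unique, no NULLs, minimal) tested against sample rows; NULL is modelled as None, and is_unique / passes_pk_rules are hypothetical helper names:

```python
from itertools import combinations


def is_unique(rows, columns, key_cols):
    """True if key_cols hold no NULL (None) and no repeated value combination."""
    idx = [columns.index(c) for c in key_cols]
    values = [tuple(row[i] for i in idx) for row in rows]
    if any(v is None for combo in values for v in combo):
        return False
    return len(set(values)) == len(values)


def passes_pk_rules(rows, columns, key_cols):
    """Unique, NULL-free and minimal (no proper subset also unique) on this sample."""
    if not is_unique(rows, columns, key_cols):
        return False
    return not any(
        is_unique(rows, columns, list(subset))
        for size in range(1, len(key_cols))
        for subset in combinations(key_cols, size)
    )


columns = ["Prof_Name", "Prof_Dept"]
rows = [
    ("Dilip Vengsarkar", "Batting"),
    ("Farukh Engineer", "Fielding"),
    ("K Srikanth", "Batting"),
    ("Juhi Chawla", "Fielding"),
]
print(passes_pk_rules(rows, columns, ["Prof_Name"]))               # True
print(passes_pk_rules(rows, columns, ["Prof_Dept"]))               # False: repeated values
print(passes_pk_rules(rows, columns, ["Prof_Name", "Prof_Dept"]))  # False: not minimal
```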



Foreign Key

• A foreign key (FK) of a relation is represented by one or more of its attributes, whose values are either taken from the corresponding attribute(s) of the PK of another relation (or of the same relation), or are NULL

• A relation can have zero or more FKs.

• In a physical table, the domains of the FK columns must match those of the corresponding PK columns, position by position (i.e. the order in which the columns are matched matters).

• Foreign keys maintain referential integrity between the referencing table (containing the FK) and the referenced table (containing the referenced PK).
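A sketch of the referential integrity check described above, assuming the referenced PK values are available as a Python set and NULL is modelled as None; orphan_fk_values is a hypothetical helper, and the two Fielding102 rows are invented to show an allowed NULL and a violation:

```python
def orphan_fk_values(child_rows, fk_index, parent_keys):
    """Return FK values that are neither NULL (None) nor an existing parent PK value."""
    return sorted({
        row[fk_index]
        for row in child_rows
        if row[fk_index] is not None and row[fk_index] not in parent_keys
    })


prof_keys = {"Juhi Chawla", "Vijay Mallya"}  # PK values of the referenced Prof table
course_prof = [                              # referencing rows; Prof_Name is the FK
    ("Fielding101", "A", "Spring", "Juhi Chawla"),
    ("Fielding101", "B", "Spring", "Vijay Mallya"),
    ("Fielding102", "A", "Fall", None),            # NULL FK: allowed by the definition
    ("Fielding102", "B", "Fall", "Unknown Prof"),  # hypothetical orphan row
]
print(orphan_fk_values(course_prof, 3, prof_keys))  # ['Unknown Prof']
```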



Surrogate Key

• A surrogate key (SK) of a relation is a unique, system-supplied, numeric value that is appended to the relation to serve as its primary key.

• SK values have no meaning to users and are normally hidden from them on forms, queries, and reports.

• SKs are used to replace long composite PKs (i.e. keys that take many bytes per row). Because the compact SK, rather than the long composite key, is then carried as the FK in referencing tables, disk space is reduced. The SK is defined in the referenced table that holds the original PK, which then becomes an AK.

• SKs are also used to improve join performance, since joins then compare a single compact column instead of several wide ones.
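A minimal sketch of how a system could hand out surrogate keys, assuming the natural (composite) keys arrive as tuples; assign_surrogates is hypothetical, and a real DBMS would normally do this with an identity column or a sequence:

```python
from itertools import count


def assign_surrogates(natural_keys):
    """Map each distinct natural (possibly composite) key to a system-supplied integer."""
    next_id = count(1)
    mapping = {}
    for key in natural_keys:
        if key not in mapping:
            mapping[key] = next(next_id)
    return mapping


class_keys = [  # composite natural key: (Class_Name, Section, Term)
    ("Bowling101", "A", "Fall"),
    ("Bowling101", "B", "Fall"),
    ("Bowling101", "A", "Fall"),  # repeat: reuses the existing surrogate
]
sk = assign_surrogates(class_keys)
print(sk)  # {('Bowling101', 'A', 'Fall'): 1, ('Bowling101', 'B', 'Fall'): 2}

# A referencing row now carries one small integer as its FK instead of three columns.
grade_row = (sk[("Bowling101", "B", "Fall")], "101009", "B")
print(grade_row)  # (2, '101009', 'B')
```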



Functional Dependencies
• A functional dependency (FD) occurs when the value of one (set of) attribute(s) {LHS} determines the value of a second (set of) attribute(s) {RHS} in a relation.

• Stated alternatively, in a relation R, column B is said to be functionally dependent upon column A of R if and only if each value of A in R is associated with precisely one value of B at any given time. A and B may be composite.

e.g. In the FD A → B, A determines B, or B is functionally dependent on A. A is the LHS, and B is the RHS.

• The attribute on the left side of the functional dependency is called the determinant, e.g.
– ProfName → ProfDept, ProfEmail
• Given the name of a professor, I can tell you with certainty his/her dept and email. In other words, ProfDept and ProfEmail are functionally dependent on/determined by ProfName
– StudentID → DormName, Fee
– CustomerNumber, ItemNumber, Quantity → Price

• While a primary key is always a determinant, a determinant is not necessarily a primary key
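Whether an FD holds (is implied by the known FDs) can be decided with the attribute closure: the set of all attributes determined by a given set. X → Y is implied exactly when Y is contained in the closure of X. A minimal Python sketch, assuming FDs are given as (LHS, RHS) pairs of attribute-name sets; closure is our own helper name:

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"ProfName"}, {"ProfDept", "ProfEmail"})]
print(sorted(closure({"ProfName"}, fds)))  # ['ProfDept', 'ProfEmail', 'ProfName']
print(sorted(closure({"ProfDept"}, fds)))  # ['ProfDept']: a dept does not determine a professor
```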



Functional Dependencies

• A partial dependency occurs when part of a composite key (a proper subset of its attributes) determines the value of one or more non-key attributes

• A transitive dependency occurs when the value of one (set of) attribute(s) determines the value of a second (set of) attribute(s), which in turn determines the value of a third (set of) attribute(s), e.g. if A → B and B → C, then A → C

As an example, relation R (A, B, C, D, E, F, G) has the following FDs:

• FD1: A, B, C → D, E, F, G :: Definition of PK. Non-key attributes (D, E, F, G) are dependent on the whole key (A, B, C)

• FD2: C → D :: Partial dependency. Non-key attribute D is dependent on part (C) of the key (A, B, C)

• FD3: D → E :: C → D and D → E imply C → E, thus constituting a transitive dependency. D → E constitutes a non-key to non-key dependency.

• FD4: B → G :: Partial dependency. Non-key attribute G is dependent on part (B) of the key (A, B, C)
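The classification on this slide depends only on how each determinant relates to the PK. A Python sketch, assuming a single PK and one-letter attribute names (classify is a hypothetical helper):

```python
def classify(lhs, key):
    """Classify an FD by how its determinant (LHS) relates to the primary key."""
    lhs, key = set(lhs), set(key)
    if key <= lhs:
        return "whole key"
    if lhs < key:
        return "partial dependency"
    return "non-key determinant (transitive)"


# Relation R (A, B, C, D, E, F, G) with PK (A, B, C); one letter per attribute.
fds = {"FD1": ("ABC", "DEFG"), "FD2": ("C", "D"), "FD3": ("D", "E"), "FD4": ("B", "G")}
for name, (lhs, rhs) in fds.items():
    print(name, classify(lhs, "ABC"))
# FD1 whole key
# FD2 partial dependency
# FD3 non-key determinant (transitive)
# FD4 partial dependency
```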





Normalization – Ref. 1
• Database Normalization is a technique for designing relational database tables to minimize
duplication of information and, in so doing, to safeguard the database against certain types of
logical or structural problems, namely data anomalies.

• Normalization is a technique for producing a set of relations with desirable properties, given the
data requirements of an enterprise.

• A table that is sufficiently normalized is less vulnerable to problems of this kind, because its
structure reflects the basic assumptions for when multiple instances of the same information
should be represented by a single instance only.

• Normalization is aimed at eliminating modification anomalies, such as:
– Deletion anomaly: deleting a row loses information about two or more entities
– Insertion anomaly: a fact about one entity cannot be inserted until a fact about another entity is added
– Update anomaly: a fact stored in several rows must be changed in every one of those rows, or the data becomes inconsistent

• Anomalies can be removed by splitting the relation into two or more relations, each with a different, single theme. However, breaking up a relation may create referential integrity constraints.

• Normalization works through classes of relations called normal forms



Normalization
• The process of normalization is a formal method that identifies relations
based on their primary or candidate keys and the functional dependencies
among their attributes.

• Normalization is often executed as a series of steps. Each step corresponds to a specific normal form that has known properties.

• As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies.

• For the relational data model, it is important to recognize that it is only first
normal form (1NF) that is critical in creating relations. All the subsequent
normal forms are optional.



Normal Forms - Definition
• Any table of data is in 1NF if it meets the definition of a relation

• A relation is in 2NF iff it is in 1NF and none of its non-key attributes is dependent on only part of the PK (no partial dependencies). Each non-key attribute is dependent either on all of the PK or on no key at all, i.e. on other non-key attributes (a case that 3NF removes)

• A relation is in 3NF iff it is in 2NF and has no transitive dependencies. In other words, for a relation in 3NF, every non-key attribute must be non-transitively dependent on the PK

• A relation is in BCNF iff every determinant is a CK

• A relation is in 4NF iff it is in BCNF and has no (non-trivial) multivalued dependencies

• A relation is in 5NF iff it is in 4NF and every join dependency in it is implied by the CKs

• A relation is in DK/NF iff it is in 5NF and every constraint on the relation is a logical consequence of the definition of keys and domains
– First published in 1981 by Fagin
– DK/NF has no modification anomalies; so no higher normal form is needed
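BCNF is the easiest of these definitions to test mechanically. The Python sketch below flags each listed FD whose determinant does not determine every attribute of the relation (i.e. is not a superkey; for FDs with minimal left-hand sides this is the same as not being a CK). It assumes the listed FDs are complete, uses our own helper names, and borrows PropertyOwner with fd3 and fd4 from the ClientRental example in this deck:

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey of the relation."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]


attributes = {"propertyNo", "pAddress", "rent", "ownerNo", "oName"}
fds = [
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),  # fd3
    ({"ownerNo"}, {"oName"}),                                     # fd4
]
print(bcnf_violations(attributes, fds))  # [(['ownerNo'], ['oName'])]
```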



First Normal Form (1NF)
• Unnormalized form (UNF): a table that contains one or more repeating groups. Here the repeating group is (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

ClientNo | cName | propertyNo | pAddress | rentStart | rentFinish | rent | ownerNo | oName
CR76 | John Kay | PG4 | 6 Lawrence St, Glasgow | 1-Jul-00 | 31-Aug-01 | 350 | CO40 | Tina Murphy
 | | PG16 | 5 Novar Dr, Glasgow | 1-Sep-02 | 1-Sep-02 | 450 | CO93 | Tony Shaw
CR56 | Aline Stewart | PG4 | 6 Lawrence St, Glasgow | 1-Sep-99 | 10-Jun-00 | 350 | CO40 | Tina Murphy
 | | PG36 | 2 Manor Rd, Glasgow | 10-Oct-00 | 1-Dec-01 | 370 | CO93 | Tony Shaw
 | | PG16 | 5 Novar Dr, Glasgow | 1-Nov-02 | 1-Aug-03 | 450 | CO93 | Tony Shaw



Definition of 1NF

• First Normal Form (1NF) is a relation in which the intersection of each row and column contains one and only one value.

There are two approaches to removing repeating groups from unnormalized tables:

1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.

2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.



1NF – First Approach
• With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.

ClientNo | propertyNo | cName | pAddress | rentStart | rentFinish | rent | ownerNo | oName
CR76 | PG4 | John Kay | 6 Lawrence St, Glasgow | 1-Jul-00 | 31-Aug-01 | 350 | CO40 | Tina Murphy
CR76 | PG16 | John Kay | 5 Novar Dr, Glasgow | 1-Sep-02 | 1-Sep-02 | 450 | CO93 | Tony Shaw
CR56 | PG4 | Aline Stewart | 6 Lawrence St, Glasgow | 1-Sep-99 | 10-Jun-00 | 350 | CO40 | Tina Murphy
CR56 | PG36 | Aline Stewart | 2 Manor Rd, Glasgow | 10-Oct-00 | 1-Dec-01 | 370 | CO93 | Tony Shaw
CR56 | PG16 | Aline Stewart | 5 Novar Dr, Glasgow | 1-Nov-02 | 1-Aug-03 | 450 | CO93 | Tony Shaw



1NF – Second Approach
• With the second approach, we remove the repeating group (property rented details) by placing the repeating data along with a copy of the original key attribute (clientNo) in a separate relation.

ClientNo | cName
CR76 | John Kay
CR56 | Aline Stewart

ClientNo | propertyNo | pAddress | rentStart | rentFinish | rent | ownerNo | oName
CR76 | PG4 | 6 Lawrence St, Glasgow | 1-Jul-00 | 31-Aug-01 | 350 | CO40 | Tina Murphy
CR76 | PG16 | 5 Novar Dr, Glasgow | 1-Sep-02 | 1-Sep-02 | 450 | CO93 | Tony Shaw
CR56 | PG4 | 6 Lawrence St, Glasgow | 1-Sep-99 | 10-Jun-00 | 350 | CO40 | Tina Murphy
CR56 | PG36 | 2 Manor Rd, Glasgow | 10-Oct-00 | 1-Dec-01 | 370 | CO93 | Tony Shaw
CR56 | PG16 | 5 Novar Dr, Glasgow | 1-Nov-02 | 1-Aug-03 | 450 | CO93 | Tony Shaw
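Both approaches can be sketched in a few lines of Python. To keep it short, the repeating group is cut down to (propertyNo, rentStart), and the dictionary layout used for the unnormalized table is an assumption made for illustration:

```python
# Unnormalized: each client key maps to its repeating group of (propertyNo, rentStart).
unf = {
    ("CR76", "John Kay"): [("PG4", "1-Jul-00"), ("PG16", "1-Sep-02")],
    ("CR56", "Aline Stewart"): [
        ("PG4", "1-Sep-99"), ("PG36", "10-Oct-00"), ("PG16", "1-Nov-02"),
    ],
}

# First approach: copy the client data into every row of its group (one flat table).
flat = [
    (client_no, name, property_no, rent_start)
    for (client_no, name), group in unf.items()
    for property_no, rent_start in group
]

# Second approach: keep client data once and move the group, plus a copy of
# clientNo, to a separate relation.
client = [(client_no, name) for client_no, name in unf]
rental = [
    (client_no, property_no, rent_start)
    for (client_no, _name), group in unf.items()
    for property_no, rent_start in group
]

print(len(flat), len(client), len(rental))  # 5 2 5
print(flat[1])                              # ('CR76', 'John Kay', 'PG16', '1-Sep-02')
```

The first approach repeats cName in every row; the second stores it once, which is why the later normal forms build on the second shape.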



Second Normal Form (2NF)
• Second normal form (2NF) is a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.

• The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.



2NF ClientRental relation

The ClientRental relation has the following functional dependencies:

fd1 clientNo, propertyNo → rentStart, rentFinish (Primary key)

fd2 clientNo → cName (Partial dependency)

fd3 propertyNo → pAddress, rent, ownerNo, oName (Partial dependency)

fd4 ownerNo → oName (Transitive dependency)

fd5 clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)

fd6 propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)
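fd5 and fd6 state that (clientNo, rentStart) and (propertyNo, rentStart) are candidate keys alongside the PK. A brute-force Python sketch can confirm this from the listed FDs by searching for minimal attribute sets whose closure covers the whole relation. The helper names are our own, and the search is exponential, which is only acceptable for a handful of attributes:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force: smallest attribute sets whose closure covers the relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return set(lhs.split()), set(rhs.split())


attributes = set("clientNo propertyNo cName pAddress rentStart rentFinish rent ownerNo oName".split())
fds = [
    fd("clientNo propertyNo", "rentStart rentFinish"),                             # fd1
    fd("clientNo", "cName"),                                                        # fd2
    fd("propertyNo", "pAddress rent ownerNo oName"),                                # fd3
    fd("ownerNo", "oName"),                                                         # fd4
    fd("clientNo rentStart", "propertyNo pAddress rentFinish rent ownerNo oName"),  # fd5
    fd("propertyNo rentStart", "clientNo cName rentFinish"),                        # fd6
]
print([sorted(key) for key in candidate_keys(attributes, fds)])
# [['clientNo', 'propertyNo'], ['clientNo', 'rentStart'], ['propertyNo', 'rentStart']]
```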



2NF ClientRental relation
• After removing the partial dependencies, three new relations are created, called Client, Rental, and PropertyOwner:
• Client (clientNo, cName)
• Rental (clientNo, propertyNo, rentStart, rentFinish)
• PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

PropertyOwner
propertyNo | pAddress | rent | ownerNo | oName
PG4 | 6 Lawrence St, Glasgow | 350 | CO40 | Tina Murphy
PG16 | 5 Novar Dr, Glasgow | 450 | CO93 | Tony Shaw
PG36 | 2 Manor Rd, Glasgow | 370 | CO93 | Tony Shaw

Rental
ClientNo | propertyNo | rentStart | rentFinish
CR76 | PG4 | 1-Jul-00 | 31-Aug-01
CR76 | PG16 | 1-Sep-02 | 1-Sep-02
CR56 | PG4 | 1-Sep-99 | 10-Jun-00
CR56 | PG36 | 10-Oct-00 | 1-Dec-01
CR56 | PG16 | 1-Nov-02 | 1-Aug-03

Client
ClientNo | cName
CR76 | John Kay
CR56 | Aline Stewart
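A quick sanity check that this split loses nothing is to project the 1NF rows onto Client, Rental and PropertyOwner and join them back. The Python sketch below assumes the 1NF rows from the first approach, held as tuples; it joins through dictionaries keyed on clientNo and propertyNo, which only works because those are the PKs of Client and PropertyOwner:

```python
# 1NF ClientRental rows (first approach), column order:
# (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
rows = {
    ("CR76", "PG4", "John Kay", "6 Lawrence St, Glasgow", "1-Jul-00", "31-Aug-01", 350, "CO40", "Tina Murphy"),
    ("CR76", "PG16", "John Kay", "5 Novar Dr, Glasgow", "1-Sep-02", "1-Sep-02", 450, "CO93", "Tony Shaw"),
    ("CR56", "PG4", "Aline Stewart", "6 Lawrence St, Glasgow", "1-Sep-99", "10-Jun-00", 350, "CO40", "Tina Murphy"),
    ("CR56", "PG36", "Aline Stewart", "2 Manor Rd, Glasgow", "10-Oct-00", "1-Dec-01", 370, "CO93", "Tony Shaw"),
    ("CR56", "PG16", "Aline Stewart", "5 Novar Dr, Glasgow", "1-Nov-02", "1-Aug-03", 450, "CO93", "Tony Shaw"),
}

# Projections; using sets drops duplicate rows, as a relation should.
client = {(r[0], r[2]) for r in rows}                            # Client(clientNo, cName)
rental = {(r[0], r[1], r[4], r[5]) for r in rows}                # Rental(clientNo, propertyNo, rentStart, rentFinish)
property_owner = {(r[1], r[3], r[6], r[7], r[8]) for r in rows}  # PropertyOwner(propertyNo, pAddress, rent, ownerNo, oName)

# Join back through each relation's PK.
client_by_no = dict(client)
property_by_no = {p[0]: p[1:] for p in property_owner}
rejoined = {
    (c, p, client_by_no[c], property_by_no[p][0], start, finish, *property_by_no[p][1:])
    for c, p, start, finish in rental
}

print(len(client), len(rental), len(property_owner))  # 2 5 3
print(rejoined == rows)                                # True: nothing lost, nothing spurious
```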



Third Normal Form (3NF)
• Third normal form (3NF) is a relation that is in second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
• The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
• The functional dependencies for the Client, Rental and PropertyOwner relations are as follows:

Client
fd2 clientNo → cName (Primary key)

Rental
fd1 clientNo, propertyNo → rentStart, rentFinish (Primary key)
fd5 clientNo, rentStart → propertyNo, rentFinish (Candidate key)
fd6 propertyNo, rentStart → clientNo, rentFinish (Candidate key)

PropertyOwner
fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary key)
fd4 ownerNo → oName (Transitive dependency)



3NF Modeling Process
• The process of transforming a table into 3NF is:

– Identify any determinants, other than the primary key, and the columns
they determine.

– Create and name a new table for each determinant and the unique
columns it determines.

– Move the determined columns from the original table to the new table.
The determinant becomes the primary key of the new table.

– Delete the columns you just moved from the original table, except for the determinant, which will serve as a foreign key.

– The original table may be renamed to maintain semantic meaning.
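The steps above can be sketched in Python for the PropertyOwner relation and its determinant ownerNo. The function names split_on_determinant and project are hypothetical, and rows are assumed to be dictionaries keyed by column name:

```python
def split_on_determinant(columns, determinant, dependents):
    """Apply the 3NF steps above to a column list.

    Returns (new_table, remaining): the new table holds the determinant (its PK)
    plus the columns it determines; the original keeps the determinant as an FK.
    """
    new_table = list(determinant) + [c for c in dependents if c not in determinant]
    remaining = [c for c in columns if c not in dependents]
    return new_table, remaining


def project(rows, cols):
    """Project dict rows onto cols, dropping duplicates (a relation has no duplicate rows)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[c] for c in cols)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(cols, values)))
    return result


property_owner = [
    {"propertyNo": "PG4", "pAddress": "6 Lawrence St, Glasgow", "rent": 350, "ownerNo": "CO40", "oName": "Tina Murphy"},
    {"propertyNo": "PG16", "pAddress": "5 Novar Dr, Glasgow", "rent": 450, "ownerNo": "CO93", "oName": "Tony Shaw"},
    {"propertyNo": "PG36", "pAddress": "2 Manor Rd, Glasgow", "rent": 370, "ownerNo": "CO93", "oName": "Tony Shaw"},
]
owner_cols, remaining_cols = split_on_determinant(list(property_owner[0]), ["ownerNo"], ["oName"])
print(owner_cols)      # ['ownerNo', 'oName']
print(remaining_cols)  # ['propertyNo', 'pAddress', 'rent', 'ownerNo']
print(project(property_owner, owner_cols))
# [{'ownerNo': 'CO40', 'oName': 'Tina Murphy'}, {'ownerNo': 'CO93', 'oName': 'Tony Shaw'}]
```

Tony Shaw now appears once in the new table instead of once per property he owns.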



3NF ClientRental relation

• The resulting 3NF relations have the forms:

Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)



3NF ClientRental relation
Client
ClientNo | cName
CR76 | John Kay
CR56 | Aline Stewart

Rental
ClientNo | propertyNo | rentStart | rentFinish
CR76 | PG4 | 1-Jul-00 | 31-Aug-01
CR76 | PG16 | 1-Sep-02 | 1-Sep-02
CR56 | PG4 | 1-Sep-99 | 10-Jun-00
CR56 | PG36 | 10-Oct-00 | 1-Dec-01
CR56 | PG16 | 1-Nov-02 | 1-Aug-03

PropertyOwner
propertyNo | pAddress | rent | ownerNo
PG4 | 6 Lawrence St, Glasgow | 350 | CO40
PG16 | 5 Novar Dr, Glasgow | 450 | CO93
PG36 | 2 Manor Rd, Glasgow | 370 | CO93

Owner
ownerNo | oName
CO40 | Tina Murphy
CO93 | Tony Shaw



3NF Modeling
• The advantage of having relational tables in 3NF is that it eliminates redundant data, which in turn saves space and reduces modification anomalies.

• When compared to a star schema, a 3NF schema typically has a larger number of tables due to this normalization process.

• The tables which describe the dimensions in a snowflake schema are normalized, typically to third normal form.

• 3NF schemas are typically chosen for large data warehouses, especially
environments with significant data-loading requirements that are used to
feed data marts and execute long-running queries.

• Queries on 3NF schemas are often very complex and involve a large
number of tables. The performance of joins between large tables is thus a
primary consideration when using 3NF schemas.



Benefits of Normalization – Ref. 3
• Aids in the discovery process
– Posing questions to business experts might uncover new requirements

• Ensures precise capture of business logic
– Domain experts distill the knowledge of business rules and logic by defining proper functional dependencies between attributes

• Minimizes redundancy
– One fact in one place – single theme
– Defined once – used consistently by all stakeholders

• Minimizes requirements for use of NULL values
– NULL values may cause problems in access and use of data
– A NULL may indicate that some additional business fact has been overlooked

• Prevents loss of information or introduction of anomalies

• Aids in model management and integration
– Enterprise awareness
– Better integration during M&A



Denormalized Designs
• When a normalized design is unnatural, awkward, or results in unacceptable performance, a denormalized design is preferred

• Example
– Normalized relations
• CUSTOMER (CustNumber, CustName, Zip)
• CODES (Zip, City, State)
– Denormalized relation
• CUSTOMER (CustNumber, CustName, City, State, Zip)

• Denormalization is frequently observed in dimensional modeling in data warehouse environments, primarily for performance reasons.
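The denormalized CUSTOMER can be produced from the normalized pair by joining on Zip once, ahead of time, so that later readers need no join. A minimal Python sketch; the customer names, zip codes and cities are made-up sample data:

```python
# Hypothetical sample data for the normalized design.
codes = {  # CODES (Zip, City, State), keyed by Zip
    "10001": ("New York", "NY"),
    "60601": ("Chicago", "IL"),
}
customers = [  # CUSTOMER (CustNumber, CustName, Zip)
    (1, "Customer One", "10001"),
    (2, "Customer Two", "10001"),
    (3, "Customer Three", "60601"),
]

# Denormalize: CUSTOMER (CustNumber, CustName, City, State, Zip)
denormalized = [
    (cust_no, name, *codes[zip_code], zip_code)
    for cust_no, name, zip_code in customers
]
for row in denormalized:
    print(row)
# (1, 'Customer One', 'New York', 'NY', '10001')
# (2, 'Customer Two', 'New York', 'NY', '10001')
# (3, 'Customer Three', 'Chicago', 'IL', '60601')
```

The output also shows the cost: City and State are now repeated for every customer in the same Zip, so renaming a city means updating several rows, which is the update anomaly that normalization avoids.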





Example 1: Problem Statement

• Given relation R (A, B, C, D, E, F) and functional dependencies

FD1: A, B, C → D, E, F
FD2: C → D
FD3: B → E
FD4: E → F

NOTE: There is only 1 CK viz. (A, B, C). Hence, it is the PK.


• Break this relation into 3NF relations, eliminating insertion and deletion
anomalies.



Example 1: 2NF violation

• Given relation R (A, B, C, D, E, F) and functional dependencies

FD1: A, B, C → D, E, F
FD2: C → D
FD3: B → E
FD4: E → F

• 2NF Violation:
FD2 and FD3 violate 2NF for R, since they are partial dependencies, i.e. their determinant is part of the key.
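The 2NF test can be automated: for every proper part of the key, compute which non-key attributes it determines via the attribute closure. A Python sketch with one-letter attribute names (the function names are our own); besides FD2 and FD3 it also reports B → F, which follows from B → E and E → F:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs; fds are (lhs, rhs) strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper part of the key to the non-key attributes it alone determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & non_key
            # Drop what a smaller piece of this part already determines.
            for smaller in range(1, size):
                for piece in combinations(part, smaller):
                    determined -= closure(piece, fds)
            if determined:
                found["".join(part)] = "".join(sorted(determined))
    return found


fds = [("ABC", "DEF"), ("C", "D"), ("B", "E"), ("E", "F")]  # FD1..FD4
print(partial_dependencies("ABC", "ABCDEF", fds))  # {'B': 'EF', 'C': 'D'}
```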



Example 1: 2NF resolution
• 2NF Resolution:
Take each of the violating FDs and create a new relation, with the determinant attribute(s) as the new PK. Leave a reference to the determinant in the original relation, and remove from the original relation every non-key attribute that depends on that determinant, whether directly or transitively. Since B → E and E → F, F is also partially dependent on B (B → F), so F leaves the original relation together with E.

• So after resolution, the new relations are

1) S (C, D)
FD2: C → D

2) T (B, E, F)
FD3: B → E
FD4: E → F

3) R1 (A, B, C)
No non-key attributes; (A, B, C) is the whole key


Example 1: 3NF violation

Consider the previous 2NF relations:

1) S (C, D)
FD2: C → D

2) T (B, E, F)
FD3: B → E
FD4: E → F

3) R1 (A, B, C)

3NF violation:
Only FD4 violates 3NF, and only for T, since its non-key attribute (F) is determined by another non-key attribute (E), i.e. F is dependent on NO KEY. Note that S and R1 are already in 3NF with respect to their FDs.


Example 1: 3NF resolution
• 3NF Resolution:
Take each of the violating FDs and create a new relation, with the determinant attribute(s) as the new PK. Leave a reference to the determinant in the original relation, and remove the attributes it determines from the original relation.

• So after resolution, the new relations are

1) S (C, D)
FD2: C → D

2) T (B, E)
FD3: B → E

3) Q (E, F)
FD4: E → F

4) R1 (A, B, C)


Example 1: Final Answer: All relations satisfy 3NF
• Original relation in 1NF, R (A, B, C, D, E, F) and functional dependencies

FD1: A, B, C → D, E, F
FD2: C → D
FD3: B → E
FD4: E → F

• After 3NF resolution, the new relations are

1) S (C, D)
FD2: C → D

2) T (B, E)
FD3: B → E

3) Q (E, F)
FD4: E → F

4) R1 (A, B, C)


Example 1: Alternate approach
• The partial and transitive dependencies could alternatively be resolved in a single pass, by giving the transitive dependency (FD4) its own relation at the same time as the partial dependencies. Hence,
1) Salt (C, D)
FD2: C → D
2) Talt (B, E)
FD3: B → E
3) Qalt (E, F)
FD4: E → F
4) R1alt (A, B, C)

• All four relations already satisfy 3NF and match the final answer shown in the previous slide.

• Note that E must not be kept in the key relation (e.g. as R1 (A, B, C, E)) as a reference for FD4: since B → E, E would again be partially dependent on the key (A, B, C).
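A quick way to confirm that FD1 is not lost when R is split is to check that A, B, C still determines D, E and F using only FD2, FD3 and FD4. A Python sketch using the attribute closure, with one-letter attribute names (closure is our own helper):

```python
def closure(attrs, fds):
    """Attributes determined by attrs; fds are (lhs, rhs) strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


kept = [("C", "D"), ("B", "E"), ("E", "F")]  # FD2, FD3, FD4
derived = closure("ABC", kept)
print("".join(sorted(derived)))  # ABCDEF
print(set("DEF") <= derived)     # True: A, B, C -> D, E, F still follows
```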



Example 2 – Problem Statement
GIVEN
a) The set of 9 attributes used to track student grades in a college, a relation with sample data, and its constraints are shown in the next 2 slides

Stu_Gr (A, B, C, D, E, F, G, H, I)
b) Assume that there are normalization anomalies present in this relation
Stu_Gr containing the 9 attributes

PROBLEM

Design a set of tables that will be in 3NF for this real world scenario



Example 2 – Student Grades - Sample Data in 1NF
Stu_Gr

Class_Name Section Term Grade Student_Number Student_Name Prof_Name Prof_Dept Prof_Email

A B C D E F G H I

1 Bowling101 A Fall A 101007 Sachin Tendulkar Mukesh Ambani Bowling [email protected]

2 Bowling101 A Fall C 101008 Harbhajan Singh Mukesh Ambani Bowling [email protected]

3 Bowling101 B Fall B 101009 Andrew Symonds Preity Zinta Bowling [email protected]

4 Bowling101 B Fall E 101002 Rahul Dravid Preity Zinta Bowling [email protected]

5 Batting101 A Fall B 101007 Sachin Tendulkar Shahrukh Khan Batting [email protected]

6 Batting101 A Fall A 101008 Harbhajan Singh Shahrukh Khan Batting [email protected]

7 Batting101 A Fall A 101009 Andrew Symonds Shahrukh Khan Batting [email protected]

8 Batting101 A Fall A 101002 Rahul Dravid Shahrukh Khan Batting [email protected]

9 Bowling101 A Spring D 101003 Ajit Agarkar Katrina Kaif Bowling [email protected]

10 Bowling101 A Spring B 101005 Virat Kohli Katrina Kaif Bowling [email protected]

11 Fielding101 A Spring A 101011 Yuvraj Singh Juhi Chawla Fielding [email protected]

12 Fielding101 A Spring A 101007 Sachin Tendulkar Juhi Chawla Fielding [email protected]

13 Fielding101 B Spring B 101001 Irfan Pathan Vijay Mallya Fielding [email protected]

14 Fielding101 A Summer B 101007 Sachin Tendulkar Juhi Chawla Fielding [email protected]

15 Bowling101 A Summer B 101004 Ishant Sharma Katrina Kaif Bowling [email protected]

16 Bowling101 B Summer B 101002 Rahul Dravid Preity Zinta Bowling [email protected]

17 Wkeeping102 A Summer A 101006 Mahendra Singh Dhoni Farukh Engineer Fielding [email protected]

18 Wkeeping102 A Summer A 101010 Dinesh Karthik Farukh Engineer Fielding [email protected]

19 Spinning102 A Summer B 101007 Sachin Tendulkar S Venkataraghavan Bowling [email protected]

20 Spinning102 A Summer B 101008 Harbhajan Singh S Venkataraghavan Bowling [email protected]

21 SlowBatting102 A Summer C 101004 Ishant Sharma Dilip Vengsarkar Batting [email protected]

22 SlowBatting102 A Summer A 101002 Rahul Dravid Dilip Vengsarkar Batting [email protected]

23 FastBatting102 A Summer A 101006 Mahendra Singh Dhoni K Srikanth Batting [email protected]

24 FastBatting102 A Summer A 101011 Yuvraj Singh K Srikanth Batting [email protected]

… … … … … … … … … …

Example 2 - Constraints
CONSTRAINTS
Stu_Gr

1 A student can take many classes in a term

2 A professor can teach many classes in a term

3 Any class, in a given term, can only be taught by 1 professor

4 All of the data is restricted to 1 college year

5 Professor name is unique in the college

6 A professor must have exactly 1 email, but it could be shared with a spouse who also teaches at the same college

7 A professor cannot teach a class outside of their department

8 A professor belongs to 1 and only 1 department

9 "E" is a failing grade, whereas "A", "B", "C", "D" are passing grades

10 A student can enroll for the same class in the same term in 2 different sections, although this is not likely

11 Student Number is unique in the college

12 Assume Stu_Gr is a Relation



So, what are some of the anomalies you see?

• What happens if Hrithik Roshan joins the faculty as a professor in Fall, but doesn’t
start teaching till Spring? Can he be entered into the Stu_Gr table?

• Assuming there were only 24 records in the Stu_Gr table (as shown), and we found that Irfan Pathan’s info (row 13) no longer needs to be tracked – does that mean Vijay Mallya has left the college?

• Assuming there were only 24 records in the Stu_Gr table (as shown), and we found that Irfan Pathan’s info (row 13) no longer needs to be tracked – does that mean that the class “Fielding101” Section B was never offered in Spring? And that too by Vijay Mallya?



Example 2 – Step 1: Identify CKs

• Based on constraints, and looking at the sample data, there is only 1 CK in the Stu_Gr
relation

CK1: (Class_Name, Section, Term, Student_Number)

Hence, the PK is
PK : A, B, C, E

Stu_Gr (A, B, C, D, E, F, G, H, I)



Example 2 – Step 2: Identify FDs

Stu_Gr (A, B, C, D, E, F, G, H, I)
1T1

FD | Functional Dependency | Based on | Non-Key attribute(s) | Determinant | Satisfies 1NF | Satisfies 2NF
FD1 | A, B, C, E ---> D, F, G, H, I | Definition of Primary Key | D, F, G, H, I | Whole Key | Y | Y
FD2 | A, B, C ---> G | Constraint 3 | G | Part of the Key | Y | N
FD3 | G ---> H, I | Constraints 5, 6, 7, 8 | H, I | No Key | Y | Y
FD4 | E ---> F | Constraint 11 | F | Part of the Key | Y | N
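Step 1's claim of a single CK can be cross-checked against these FDs with a brute-force candidate-key search. This assumes the four FDs in the table are the complete set (other business rules could add FDs) and uses the slide's letter codes; the helper names are our own:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs; fds are (lhs, rhs) strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Smallest attribute sets whose closure covers every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # superset of a known key: not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


# A=Class_Name, B=Section, C=Term, D=Grade, E=Student_Number, F=Student_Name,
# G=Prof_Name, H=Prof_Dept, I=Prof_Email
fds = [("ABCE", "DFGHI"), ("ABC", "G"), ("G", "HI"), ("E", "F")]  # FD1..FD4
print(["".join(sorted(key)) for key in candidate_keys("ABCDEFGHI", fds)])  # ['ABCE']
```

Because A, B, C and E never appear on the right-hand side of any FD, every key must contain all four, which is why (A, B, C, E) is the only CK.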



Example 2 – Step 3: 2NF Resolution
Course_Prof (A, B, C, G, H, I)
2T1

FD | Functional Dependency | Based on | Non-Key attribute(s) | Determinant | Satisfies 1NF | Satisfies 2NF | Satisfies 3NF
FD2 | A, B, C ---> G | Constraint 3 | G | Whole Key | Y | Y | Y
FD3 | G ---> H, I | Constraints 5, 6, 7, 8 | H, I | No Key | Y | Y | N

Student (E, F)
2T2

FD | Functional Dependency | Based on | Non-Key attribute(s) | Determinant | Satisfies 1NF | Satisfies 2NF | Satisfies 3NF
FD4 | E ---> F | Constraint 11 | F | Whole Key | Y | Y | Y

Student_Course_Grade (A, B, C, D, E)
2T3_1T1

FD | Functional Dependency | Based on | Non-Key attribute(s) | Determinant | Satisfies 1NF | Satisfies 2NF | Satisfies 3NF
FD5_FD1 | A, B, C, E ---> D | Definition of Primary Key | D | Whole Key | Y | Y | Y



Example 2 – Step 4: 3NF Resolution

Prof (G, H, I)
3T1

FD | Functional Dependency | Based on | Non-Key attribute(s) | Determinant | Satisfies 1NF | Satisfies 2NF | Satisfies 3NF
FD3 | G ---> H, I | Constraints 5, 6, 7, 8 | H, I | Whole Key | Y | Y | Y

Course_Prof (A, B, C, G)
3T2_2T1

FD | Functional Dependency | Based on | Non-Key attribute(s) | Determinant | Satisfies 1NF | Satisfies 2NF | Satisfies 3NF
FD6_FD2 | A, B, C ---> G | Constraint 3 | G | Whole Key | Y | Y | Y



Example 2 – Final Answer in 3NF
Original Table in 1NF

Stu_Gr (A, B, C, D, E, F, G, H, I)
1T1

FD | Functional Dependency | Based on | Non-Key attribute(s) | Determinant
FD1 | A, B, C, E ---> D, F, G, H, I | Definition of Primary Key | D, F, G, H, I | Whole Key
FD2 | A, B, C ---> G | Constraint 3 | G | Part of the Key
FD3 | G ---> H, I | Constraints 5, 6, 7, 8 | H, I | No Key
FD4 | E ---> F | Constraint 11 | F | Part of the Key

The following 4 tables satisfy 3NF w.r.t. their FDs

# | Table | Table Name | FD Name | FD
1 | 2T2 | Student (E, F) | FD4 | E ---> F
2 | 2T3_1T1 | Student_Course_Grade (A, B, C, D, E) | FD5_FD1 | A, B, C, E ---> D
3 | 3T1 | Prof (G, H, I) | FD3 | G ---> H, I
4 | 3T2_2T1 | Course_Prof (A, B, C, G) | FD6_FD2 | A, B, C ---> G
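One possible way to make the final design concrete is to declare the four tables in SQLite through Python's standard sqlite3 module and replay two of the anomaly questions from earlier. This is a sketch, not the deck's own DDL: the column types, the NOT NULL choices, the FK declarations (inferred from the shared columns) and the email addresses are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on
conn.executescript("""
CREATE TABLE Student (                        -- 2T2
    Student_Number TEXT NOT NULL PRIMARY KEY,  -- E
    Student_Name   TEXT NOT NULL               -- F
);
CREATE TABLE Prof (                           -- 3T1
    Prof_Name  TEXT NOT NULL PRIMARY KEY,      -- G
    Prof_Dept  TEXT NOT NULL,                  -- H
    Prof_Email TEXT NOT NULL                   -- I
);
CREATE TABLE Course_Prof (                    -- 3T2_2T1
    Class_Name TEXT NOT NULL,                  -- A
    Section    TEXT NOT NULL,                  -- B
    Term       TEXT NOT NULL,                  -- C
    Prof_Name  TEXT NOT NULL REFERENCES Prof (Prof_Name),
    PRIMARY KEY (Class_Name, Section, Term)
);
CREATE TABLE Student_Course_Grade (           -- 2T3_1T1
    Class_Name     TEXT NOT NULL,
    Section        TEXT NOT NULL,
    Term           TEXT NOT NULL,
    Grade          TEXT NOT NULL,              -- D
    Student_Number TEXT NOT NULL REFERENCES Student (Student_Number),
    PRIMARY KEY (Class_Name, Section, Term, Student_Number),
    FOREIGN KEY (Class_Name, Section, Term)
        REFERENCES Course_Prof (Class_Name, Section, Term)
);
""")

conn.execute("INSERT INTO Student VALUES (?, ?)", ("101001", "Irfan Pathan"))
conn.execute("INSERT INTO Prof VALUES (?, ?, ?)", ("Vijay Mallya", "Fielding", "vm@example.com"))
conn.execute("INSERT INTO Course_Prof VALUES (?, ?, ?, ?)", ("Fielding101", "B", "Spring", "Vijay Mallya"))
conn.execute("INSERT INTO Student_Course_Grade VALUES (?, ?, ?, ?, ?)",
             ("Fielding101", "B", "Spring", "B", "101001"))

# Insertion anomaly gone: a professor can be recorded before teaching any class.
conn.execute("INSERT INTO Prof VALUES (?, ?, ?)", ("Hrithik Roshan", "Bowling", "hr@example.com"))

# Deletion anomaly gone: dropping Irfan Pathan's grade keeps the class offering.
conn.execute("DELETE FROM Student_Course_Grade WHERE Student_Number = ?", ("101001",))
offering = conn.execute(
    "SELECT Prof_Name FROM Course_Prof WHERE Class_Name = ? AND Section = ? AND Term = ?",
    ("Fielding101", "B", "Spring"),
).fetchone()
print(offering)  # ('Vijay Mallya',)

# The FKs reject a grade for a class section that was never offered.
try:
    conn.execute("INSERT INTO Student_Course_Grade VALUES (?, ?, ?, ?, ?)",
                 ("NoSuchClass", "A", "Fall", "A", "101001"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

SQLite only enforces FOREIGN KEY clauses when PRAGMA foreign_keys = ON is issued on the connection, which is why the script sets it before anything else.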



Example 2 – Sample Data 3NF

Student

Student_Number Student_Name

E F

1 101001 Irfan Pathan

2 101002 Rahul Dravid

3 101003 Ajit Agarkar

4 101004 Ishant Sharma

5 101005 Virat Kohli

6 101006 Mahendra Singh Dhoni

7 101007 Sachin Tendulkar

8 101008 Harbhajan Singh

9 101009 Andrew Symonds

10 101010 Dinesh Karthik

11 101011 Yuvraj Singh

… … …



Example 2 – Sample Data 3NF

Prof

Prof_Name Prof_Dept Prof_Email

G H I

1 Dilip Vengsarkar Batting [email protected]

2 Farukh Engineer Fielding [email protected]

3 Mukesh Ambani Bowling [email protected]

4 Juhi Chawla Fielding [email protected]

5 K Srikanth Batting [email protected]

6 Katrina Kaif Bowling [email protected]

7 Preity Zinta Bowling [email protected]

8 S Venkataraghavan Bowling [email protected]

9 Shahrukh Khan Batting [email protected]

10 Vijay Mallya Fielding [email protected]

… … … …



Example 2 – Sample Data 3NF

Course_Prof

Class_Name Section Term Prof_Name

A B C G

1 Batting101 A Fall Shahrukh Khan

2 Bowling101 A Fall Mukesh Ambani

3 Bowling101 B Fall Preity Zinta

4 Bowling101 A Spring Katrina Kaif

5 Bowling101 A Summer Katrina Kaif

6 Bowling101 B Summer Preity Zinta

7 FastBatting102 A Summer K Srikanth

8 Fielding101 A Spring Juhi Chawla

9 Fielding101 B Spring Vijay Mallya

10 Fielding101 A Summer Juhi Chawla

11 SlowBatting102 A Summer Dilip Vengsarkar

12 Spinning102 A Summer S Venkataraghavan

13 Wkeeping102 A Summer Farukh Engineer

… … … … …



Example 2 – Sample Data 3NF
Student_Course_Grade

Class_Name Section Term Grade Student_Number

A B C D E

1 Bowling101 A Fall A 101007

2 Bowling101 A Fall C 101008

3 Bowling101 B Fall B 101009

4 Bowling101 B Fall E 101002

5 Batting101 A Fall B 101007

6 Batting101 A Fall A 101008

7 Batting101 A Fall A 101009

8 Batting101 A Fall A 101002

9 Bowling101 A Spring D 101003

10 Bowling101 A Spring B 101005

11 Fielding101 A Spring A 101011

12 Fielding101 A Spring A 101007

13 Fielding101 B Spring B 101001

14 Fielding101 A Summer B 101007

15 Bowling101 A Summer B 101004

16 Bowling101 B Summer B 101002

17 Wkeeping102 A Summer A 101006

18 Wkeeping102 A Summer A 101010

19 Spinning102 A Summer B 101007

20 Spinning102 A Summer B 101008

21 SlowBatting102 A Summer C 101004

22 SlowBatting102 A Summer A 101002

23 FastBatting102 A Summer A 101006

24 FastBatting102 A Summer A 101011

… … … … … …



Example 2 - Food for thought
• For Example 2, draw the IDEF1X ER diagram for the 3NF table schema shown in slide “Example 2 – Final Answer in 3NF”

• What would happen to the schema if more than 1 professor could teach a given class section in a term?

• Does the schema shown in slide “Example 2 – Final Answer in 3NF” satisfy BCNF?

• How many FKs are there in Example 2 3NF schema?

• Have all the identified anomalies been eliminated by the 3NF design?





The Synthesis of Relations

• Given a set of attributes with certain functional dependencies, what relations should we form?

• Example: A and B are two attributes
– If A → B and B → A
• A and B have a one-to-one attribute relationship
– If A → B, but not B → A
• A and B have a many-to-one attribute relationship
– If neither A → B nor B → A
• A and B have a many-to-many attribute relationship
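The three cases above can be decided mechanically once the FDs are known. A small Python sketch (function names are our own) that tests A → B and B → A with the attribute closure, illustrated with Example 2's professor attributes; it assumes the listed FDs are complete:

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def attribute_relationship(a, b, fds):
    """Classify the attribute relationship between a and b from the FDs."""
    a_to_b = b in closure({a}, fds)
    b_to_a = a in closure({b}, fds)
    if a_to_b and b_to_a:
        return "one-to-one"
    if a_to_b:
        return f"many-to-one ({a} to {b})"
    if b_to_a:
        return f"many-to-one ({b} to {a})"
    return "many-to-many"


# FD3 of Example 2: Prof_Name -> Prof_Dept, Prof_Email (an email may be shared by a spouse).
fds = [({"Prof_Name"}, {"Prof_Dept", "Prof_Email"})]
print(attribute_relationship("Prof_Name", "Prof_Email", fds))       # many-to-one (Prof_Name to Prof_Email)
print(attribute_relationship("Student_Number", "Class_Name", fds))  # many-to-many
```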



One-to-One Attribute Relationships

• Attributes that have a one-to-one relationship must occur together in at least one relation

• Call the relation R and the attributes A and B:
– Either A or B must be the key of R
– An attribute can be added to R if it is functionally determined by A or B
– An attribute that is not functionally determined by A or B cannot be added to R
– A and B must occur together in R, but should not occur together in other relations
– Either A or B should be consistently used to represent the pair in relations other than R



Many-to-One Attribute Relationships

• Attributes that have a many-to-one relationship can exist in a relation together

• Assume C determines D in relation S (C, D):
– C must be the key of S
– An attribute can be added to S if it is determined by C
– An attribute that is not determined by C cannot be added to S



Many-to-Many Attribute Relationships

• Attributes that have a many-to-many relationship can exist in a relation together

• Assume attributes E and F reside together in relation T:
– The key of T must be (E, F)
– An attribute can be added to T if it is determined by the combination (E, F)
– An attribute may not be added to T if it is not determined by the combination (E, F)
– If adding a new attribute, G, expands the key to (E, F, G), then the theme of the relation has been changed
• Either G does not belong in T or the name of T must be changed to reflect the new theme



References

1. Wikipedia – www.wikipedia.org, for definitions on normalization.

2. David M. Kroenke, Database Concepts, 2nd edition, Pearson Prentice Hall, ISBN 0-13-145141-3

3. Michael Reingruber, William W. Gregory, The Data Modeling Handbook – A Best-Practice Approach to Building Quality Data Models, John Wiley & Sons, Inc., ISBN 0-471-05290-6


