CS 623 – Database Management Systems
Session 6 Agenda
Review of Chapter 3: Individual Assignments
Review of Chapter 4: Individual and Team Assignments
Oracle Installation Questions
Review Midterm requirements
Chapter 6: Normalization and Denormalization
Note: Midterm Exam - 11/2/21
Midterm Exam – November 2, 2021
The Midterm Exam will be held on Tuesday, November 2.
Midterm Exam duration: 90 minutes.
Once you begin the Midterm Exam, you must complete it in one sitting.
The Midterm Exam will consist of 50 multiple choice questions and
will cover Chapter 1 - Chapter 7 (approximately 7 questions per
chapter).
A study guide has been uploaded to the Content section of Pace
Classes.
Midterm Exam – November 2, 2021 (Cont’d)
The Midterm Exam will be conducted during the Zoom call.
Students will use LockDown Browser to take the exam.
LockDown Browser allows the instructor to do live proctoring via
Zoom.
Therefore, all students must turn on their webcam (with microphone
muted) during the exam, which will allow the professor to view the
student and surrounding area.
If any students have questions during the exam, use the chat facility
on Zoom to communicate with the professor.
If any students need to leave the computer during the exam, use the
chat facility on Zoom to request permission from the instructor.
Midterm Exam – November 2, 2021 (Cont’d)
Students are NOT allowed to use any books, materials, or technology,
to seek help from other people, or to collaborate with other students.
Academic Honesty is highly valued at Pace. Students are expected
to know and comply with Pace provisions on academic honesty and to
understand the consequences of academic dishonesty, which can include
failing the class or being expelled from the University.
Chapter 6: Normalization and Denormalization
Results of a Poorly Designed Database
A poorly designed database may lead to redundant data and
anomalies.
Redundant data is data that is stored more than once without need
(repeating groups of data).
An anomaly is any occurrence that weakens the integrity of the
data because of irregular or inconsistent storage (insert, update, and
delete irregularities that produce inconsistent data).
Anomalies
An anomaly is an inconsistent, incomplete, or contradictory state of the
database.
If anomalies are present:
we would be unable to represent some information
we might lose information when certain updates were performed
we would run the risk of having data become inconsistent over time
Types of Anomalies
Insertion anomaly – user is unable to insert a new record when it
should be possible to do so
Update anomaly – a record is updated, but other occurrences of the
same item are not updated
Deletion anomaly – when a record is deleted, other information that
is tied to it is also deleted
Anomalies Example - Combined Student-Enroll Table
Insertion anomaly: It is not possible to add a new class, such as MTH101A, even
if its faculty, schedule, and room are known, unless a student is registered
for it, because stuId is part of the primary key.
Update anomaly: If the schedule of ART103A is updated in the first record but
not in the second and third, the data becomes inconsistent.
Deletion anomaly: If the record of student S1001 is deleted, the information
about the HST205A class is also lost.
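The three anomalies above can be reproduced with a small sketch. It uses Python's standard sqlite3 module rather than Oracle, and the rows, faculty names, schedules, and rooms are hypothetical values chosen only for illustration; the slide's actual table is not reproduced here.

```python
import sqlite3

# Hypothetical rows for a combined Student-Enroll table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StudentEnroll (
        stuId       TEXT NOT NULL,
        classNumber TEXT NOT NULL,
        facName     TEXT,
        schedule    TEXT,
        room        TEXT,
        PRIMARY KEY (stuId, classNumber)
    )
""")
conn.executemany(
    "INSERT INTO StudentEnroll VALUES (?, ?, ?, ?, ?)",
    [
        ("S1001", "ART103A", "Adams", "MWF9", "H221"),
        ("S1002", "ART103A", "Adams", "MWF9", "H221"),
        ("S1001", "HST205A", "Tanaka", "MWF11", "H221"),
    ],
)

# Insertion anomaly: a class with no enrolled student has no stuId,
# and stuId is part of the primary key, so the insert is rejected.
try:
    conn.execute(
        "INSERT INTO StudentEnroll "
        "VALUES (NULL, 'MTH101A', 'Smith', 'MTuTh9', 'H225')"
    )
    insertion_rejected = False
except sqlite3.IntegrityError:
    insertion_rejected = True

# Update anomaly: changing the schedule in only one ART103A row
# leaves two different schedules stored for the same class.
conn.execute(
    "UPDATE StudentEnroll SET schedule = 'TuTh10' "
    "WHERE stuId = 'S1001' AND classNumber = 'ART103A'"
)
art_schedules = {
    row[0]
    for row in conn.execute(
        "SELECT schedule FROM StudentEnroll WHERE classNumber = 'ART103A'"
    )
}

# Deletion anomaly: removing student S1001 also removes the only row
# that recorded anything about HST205A.
conn.execute("DELETE FROM StudentEnroll WHERE stuId = 'S1001'")
hst_rows = conn.execute(
    "SELECT COUNT(*) FROM StudentEnroll WHERE classNumber = 'HST205A'"
).fetchone()[0]

print(insertion_rejected, sorted(art_schedules), hst_rows)
```

In this sketch the insert is rejected, ART103A ends up with two schedules, and no HST205A row survives the delete.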
Normalization / Objectives
Normalization is the process of efficiently organizing data in a
database, which reduces the amount of space a database
consumes and ensures that data is logically stored.
The main objectives of the normalization process:
Eliminating redundant data (for example, the same data stored in more
than one table)
Ensuring data dependencies make sense (only related data is stored
in a table)
Keeping the design free from insert, delete, and update anomalies
Keeping the model flexible (allowing the model to be extended when
needed to account for new attributes, entity sets, and relationships)
The Goals of Normalization
When normalizing a database you should achieve four goals:
Arranging data into logical groups such that each group describes a
small part of the whole
Minimizing the amount of duplicated data stored in a database
Building a database in which you can access and manipulate the
data quickly and efficiently without compromising the integrity of the
data storage
Organizing the data such that, when you modify it, you make the
changes in only one place
Normal Forms
We use the normalization process to design efficient and functional
databases. By normalizing, we store data where it logically and uniquely
belongs. The normalization process involves a few steps, and each step
is called a normal form.
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain-Key Normal Form (DKNF)
Each form is contained within the previous form: a table in a given normal
form also satisfies all the earlier forms, and each form has stricter
rules than the previous form
Types of Dependencies involved with Normalization
All of the following can cause problems in relational design:
Functional dependencies
Multi-valued dependencies
Join dependencies
Functional Dependencies
A functional dependency is a relationship between two attributes,
typically between the primary key and a non-key attribute within
a table. For a relational table R, attribute B is functionally
dependent on attribute A (usually the primary key) if each value of A
is associated with exactly one value of B.
Written: A→B
Read: A functionally determines B
or: B is functionally dependent on A
Example: Student table: StuId and StuLastName
StuId is the primary key
StuId uniquely determines the StuLastName attribute because if we know
the student ID, we can tell the last name associated with it.
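A minimal sketch of how A→B can be tested against sample rows follows. The helper name holds_fd and the student rows are hypothetical. Note that sample data can only show that a dependency fails; whether it holds in general is a design decision about the table, not something data can prove.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it,
        # so a later row with a different value exposes a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical student rows (not taken from the slides).
students = [
    {"stuId": "S1001", "stuLastName": "Smith"},
    {"stuId": "S1002", "stuLastName": "Chin"},
    {"stuId": "S1003", "stuLastName": "Smith"},
]

id_determines_name = holds_fd(students, ("stuId",), ("stuLastName",))
name_determines_id = holds_fd(students, ("stuLastName",), ("stuId",))
print(id_determines_name, name_determines_id)
```

Here stuId determines stuLastName, but the reverse fails because two students share the last name Smith.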
Multi-Valued Dependencies
A multi-valued dependency occurs when two attributes in a table are
independent of each other but both depend on a third attribute.
Suppose a table has attributes A, B, and C, where B and C are
independent multi-valued facts about A.
Written: A →→ B and A →→ C
Read: A multi-determines attribute B
And: A multi-determines attribute C
Example: Student table: StuLastName, Major, Sport
Multi-Valued Dependencies (Cont’d)
Attributes Major and Sport are independent of each other but
dependent on StuLastName
Problem: Table must list all combinations of values of Major and
Sport for each StuLastName to avoid implying relationships that do
not exist
A table with a multivalued dependency violates the normalization
standard of Fourth Normal Form (4NF) because it creates
unnecessary redundancies and can contribute to inconsistent data.
To bring this up to 4NF, it is necessary to break the multi-valued
dependency information into two tables:
StudentMajors table: StuLastName, Major
StudentSports table: StuLastName, Sport
Join Dependencies
A join dependency is a generalization of a multi-valued dependency.
A relation is said to have a join dependency if it can be recreated by
joining multiple sub-relations, where each of these sub-relations has a
subset of the attributes of the original relation.
In other words, the table can be recreated by joining multiple smaller tables.
First Normal Form (1NF)
A table is in First Normal Form (1NF) if and only if the following
conditions are satisfied:
Each attribute contains only one value (single-valued)
All attribute values are atomic meaning they cannot be broken down
any further (there are no repeating groups).
There are no duplicated rows in the table
Example of a table that does not satisfy 1NF - NewStu2 Table
NewStu2 (New Student table):
• Assume stuId is the primary key
• Assume students can have more than one major
• The major attribute is not single-valued for each tuple. For a given
stuId, there may be more than one value for major
Ideal Method to First Normal Form (1NF)
Best solution: For each multi-valued attribute, create a new table, in
which you place the key of the original table and the multi-valued
attribute. Keep the original table, with its primary key
Example: NewStu2 (stuId, lastName, credits, status, socSecNo)
Majors (stuId, major)
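The split described above can be sketched in a few lines of Python. The sample rows and names are hypothetical, and only some of the NewStu2 attributes are included to keep the example short.

```python
# Hypothetical unnormalized rows: "major" holds a list of values,
# so the attribute is not single-valued.
new_stu2 = [
    {"stuId": "S1001", "lastName": "Smith", "major": ["History", "Art"]},
    {"stuId": "S1002", "lastName": "Chin", "major": ["Math"]},
]

# Original table: keeps its primary key and the single-valued attributes.
students = [
    {name: value for name, value in row.items() if name != "major"}
    for row in new_stu2
]

# New table: the key of the original table plus one major per row.
majors = [
    (row["stuId"], major)
    for row in new_stu2
    for major in row["major"]
]

print(students)
print(majors)
```

Each row of the new Majors table now holds exactly one atomic value per attribute, and (stuId, major) together identify a row.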
Second Method to First Normal Form (1NF)
If the number of repeats is limited, make additional columns for multiple
values. Ex: major1 and major2
Drawback: Must know the maximum number of repeats and queries
become more complex.
Third Method to First Normal Form (1NF)
Flatten the original table by making the multi-valued attributes part of
the primary key:
Ex: Student (stuId, major, lastName, credits, status, socSecNo)
Second Normal Form (2NF)
A table is in Second Normal Form (2NF) if and only if the following
conditions are satisfied:
Table is in First Normal Form (1NF)
And all non-primary key attributes are fully functionally dependent on
the primary key (no partial dependency)
Note: If primary key has only one attribute and the table is 1NF, then
the table is automatically 2NF
Converting to Second Normal Form (2NF)
Identify each partial functional dependency
Remove the attributes that are partially dependent on each of the
determinants so identified
Place each determinant in a separate table along with its
dependent attributes
In the original table, keep the composite key and any attributes that are
fully functionally dependent on all of it
Even if the composite key has no dependent attributes, keep that
relation to connect the others logically
Second Normal Form (2NF) Example
StuId classNumber Cost
S1001 ART103A 1000
S1002 HST205A 1500
S1001 MTH101B 2000
S1004 MTH103C 1000
S1004 ART103A 1000
S1002 CSC201A 2000
Note: Many classes have the same cost (ART103A, MTH103C = 1000), and the
cost of a class is repeated for each student enrolled in it (ART103A
appears twice).
Second Normal Form (2NF) Example (Cont’d)
2NF tries to reduce the redundant data stored on disk.
For example, if there are 100 students taking ART103A, we do not need to
store its cost as 1000 in all 100 records; instead, we can store it once
in the second table, as the cost for ART103A is 1000.
Table 1
StuId  classNumber
S1001  ART103A
S1002  HST205A
S1001  MTH101B
S1004  MTH103C
S1004  ART103A
S1002  CSC201A
Table 2
classNumber  Cost
ART103A      1000
HST205A      1500
MTH101B      2000
MTH103C      1000
CSC201A      2000
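The decomposition above can be checked with a short Python sketch that uses the rows from the example. The variable names are illustrative. It splits the original table into the two tables and confirms that joining them on classNumber gives back the original rows.

```python
# Rows of the original table: (StuId, classNumber, Cost).
enrollment = [
    ("S1001", "ART103A", 1000),
    ("S1002", "HST205A", 1500),
    ("S1001", "MTH101B", 2000),
    ("S1004", "MTH103C", 1000),
    ("S1004", "ART103A", 1000),
    ("S1002", "CSC201A", 2000),
]

# Table 1: the composite key (StuId, classNumber) on its own.
table1 = [(stu, cls) for stu, cls, _ in enrollment]

# Table 2: classNumber -> Cost, stored once per class.
table2 = {}
for _, cls, cost in enrollment:
    # A second, different cost for the same class would mean that Cost
    # does not depend on classNumber alone.
    if table2.setdefault(cls, cost) != cost:
        raise ValueError(f"conflicting costs for {cls}")

# Joining the two tables on classNumber gives back the original rows.
rejoined = [(stu, cls, table2[cls]) for stu, cls in table1]

print(rejoined == enrollment)
print(len(enrollment), len(table2))
```

Six cost values in the original table become five in Table 2, because the cost of ART103A is now stored only once.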
Third Normal Form (3NF)
A table is in Third Normal Form (3NF) if and only if the following
conditions are satisfied:
Table is in Second Normal Form (2NF)
And no non-primary key attribute is transitively dependent on the
primary key
Equivalently, a table is in 3NF if, for every non-trivial functional
dependency X→Y, at least one of the following conditions holds:
X is a superkey
Y is a prime attribute (each element of Y is part of some candidate
key)
Making a Relation Third Normal Form (3NF)
Example: Student (StuId, StuName, StuCity, StuState, StuZip)
StuCity and StuState are dependent on StuZip
StuZip is dependent on StuId
The non-prime attributes (StuCity, StuState) are transitively dependent
on the key (StuId) through StuZip, which violates 3NF.
To fix this, we:
Remove the dependent attributes from the table
Create a new table with the dependent attributes and their determinant
Keep the determinant in the original table:
NewStudent (StuId, StuName, StuZip)
zipCode (StuZip, StuCity, StuState)
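A minimal sketch of the two resulting tables follows, using Python's sqlite3 module instead of Oracle. The column types and the sample rows are assumptions made for illustration. City and state are stored once per ZIP code and recovered with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zipCode (
        StuZip   TEXT PRIMARY KEY,
        StuCity  TEXT NOT NULL,
        StuState TEXT NOT NULL
    );
    CREATE TABLE NewStudent (
        StuId   TEXT PRIMARY KEY,
        StuName TEXT NOT NULL,
        StuZip  TEXT REFERENCES zipCode (StuZip)
    );
    -- Hypothetical sample rows.
    INSERT INTO zipCode VALUES ('10038', 'New York', 'NY');
    INSERT INTO NewStudent VALUES ('S1001', 'Smith', '10038');
    INSERT INTO NewStudent VALUES ('S1002', 'Chin', '10038');
""")

# Rebuild the original Student view by joining on the determinant StuZip.
rows = conn.execute("""
    SELECT s.StuId, s.StuName, z.StuCity, z.StuState, s.StuZip
    FROM NewStudent AS s
    JOIN zipCode AS z ON z.StuZip = s.StuZip
    ORDER BY s.StuId
""").fetchall()

zip_row_count = conn.execute("SELECT COUNT(*) FROM zipCode").fetchone()[0]
print(rows)
print(zip_row_count)
```

Both students share one zipCode row, so a change to the city name for that ZIP code would be made in only one place.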
Boyce-Codd Normal Form (BCNF)
A table is in Boyce-Codd Normal Form (BCNF) if and only if the
following conditions are satisfied:
Table is in Third Normal Form (3NF)
And for any non-trivial functional dependency A→B, A is a
superkey
Therefore, to check for BCNF, we simply identify all the determinants
and verify that they are superkeys. If they are not, we break up the
relational table by projection until we have a set of relational tables
all in BCNF.
Boyce-Codd Normal Form (BCNF) - Example
stuId classNumber facName
S1001 ART103A Adams
S1001 HST205A Tanaka
S1002 ART103A Byrne
S1003 MTH101B Smith
S1004 ART103A Adams
- Primary key: (stuId, classNumber)
- One student can enroll in many classes: S1001 (ART103A, HST205A)
- For each classNumber, one faculty member is assigned to the student
- A classNumber can be taught by many faculty: ART103A (Adams, Byrne)
- Each faculty member teaches only one class, so there is a dependency
between classNumber and facName in which classNumber is dependent on
facName (facName→classNumber)
Boyce-Codd Normal Form (BCNF) – Example (Cont’d)
This table satisfies First Normal Form (1NF) because all the
values are atomic, column names are unique, and all the values
stored in a particular column are of the same domain.
This table also satisfies Second Normal Form (2NF), as there is
no partial dependency.
This table also satisfies Third Normal Form (3NF), as there is
no transitive dependency.
But this table is not in Boyce-Codd Normal Form: classNumber is
dependent on facName, and although classNumber is part of the
primary key, the determinant facName is not a superkey, which is not
allowed in BCNF.
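The BCNF check (every determinant must be a superkey) can be sketched with an attribute-closure routine. The function names closure and bcnf_violations are hypothetical. The two dependencies are the ones assumed in this example: (stuId, classNumber) determines facName, and facName determines classNumber.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we gain its dependents.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]


attrs = {"stuId", "classNumber", "facName"}
fds = [
    (frozenset({"stuId", "classNumber"}), frozenset({"facName"})),
    (frozenset({"facName"}), frozenset({"classNumber"})),
]

violations = bcnf_violations(attrs, fds)
print(violations)
```

Only facName→classNumber is reported: the closure of facName is {facName, classNumber}, which does not cover stuId, so facName is not a superkey.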
Boyce-Codd Normal Form (BCNF) – Example (Cont’d)
How to fix this: we remove the dependent attributes to a new relational
table, with the determinant as the primary key
Student
stuId  FacID
S1001  F101
S1001  F102
Faculty
FacID  FacName  ClassNum
F101   Adams    ART103A
F102   Tanaka   HST205A
Fourth Normal Form (4NF)
A table is in Fourth Normal Form (4NF) if and only if the following
conditions are satisfied:
Table is in Boyce-Codd Normal Form (BCNF)
And the table does not have any multi-valued dependency
A table with a multivalued dependency violates the normalization
standard of Fourth Normal Form (4NF) because it creates
unnecessary redundancies and can contribute to inconsistent data.
To bring this up to 4NF, it is necessary to break this information into
two tables.
Fourth Normal Form (4NF) Example
StuId classNumber Hobby
S1001 ART103A Cricket
S1001 MTH101B Hockey
S1001 is taking two courses and has two hobbies.
The problem is that there is no relationship between classNumber and
Hobby; they are independent of each other.
Therefore, there is a multi-valued dependency, which leads to unnecessary
repetition of data and to other anomalies as well. To avoid implying a
relationship that does not exist, every combination must be stored:
StuId classNumber Hobby
S1001 ART103A Cricket
S1001 ART103A Hockey
S1001 MTH101B Cricket
S1001 MTH101B Hockey
Fourth Normal Form (4NF) Example (Cont’d)
To fix this, we can decompose the table into 2 tables.
StuId classNumber
S1001 ART103A
S1001 MTH101B
StuId Hobby
S1001 Cricket
S1001 Hockey
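As a quick check, the sketch below joins the two decomposed tables on StuId and shows that the result is exactly the four-row combined table from the previous slide. The variable names are illustrative.

```python
from itertools import product

# The two tables produced by the decomposition.
classes = [("S1001", "ART103A"), ("S1001", "MTH101B")]
hobbies = [("S1001", "Cricket"), ("S1001", "Hockey")]

# Joining on StuId pairs every class with every hobby of the same student.
combined = [
    (stu_c, cls, hobby)
    for (stu_c, cls), (stu_h, hobby) in product(classes, hobbies)
    if stu_c == stu_h
]

for row in combined:
    print(row)
print(len(classes) + len(hobbies), "rows stored;", len(combined), "rows derived")
```

The decomposed tables store four rows in total here, but with m classes and n hobbies per student they store m + n rows while the combined table needs m × n.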
Fifth Normal Form (5NF)
Fifth Normal Form (5NF) is also known as Project-Join Normal Form
(PJ/NF).
A table is in Fifth Normal Form (5NF) if and only if the following conditions
are satisfied:
Table is in Fourth Normal Form (4NF)
And it cannot be decomposed any further without loss (it has no join dependency)
A lossless (non-loss) join decomposition is a decomposition of a table into
smaller tables such that a natural join of the smaller tables yields back the
original table. This is central to removing redundancy safely from databases
while preserving the original data.
Note: 5NF is satisfied when all the tables are broken into as many tables
as possible in order to avoid redundancy.
Fifth Normal Form (5NF) - Example
FacID Subject Semester
F101 History 1
F101 Math 1
F102 Math 2
F101 teaches History and Math for Semester 1 but does not teach
Math for Semester 2
The primary key is the combination of all three columns.
Assume we want to add Semester 3 but do not know who will be
teaching what subject. We cannot leave the attributes null since they
are part of the primary key
Fifth Normal Form (5NF) – Example (Cont’d)
To fix this, we need to decompose the table into the following three tables:
Table 1
FacID  Subject
F101   History
F101   Math
F102   Math
Table 2
FacID  Semester
F101   1
F101   1
F102   2
Table 3
Semester  Subject
1         History
1         Math
2         Math
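The sketch below projects the original three-column table onto each pair of attributes and joins the three projections back together. It checks only this sample data, and the variable names are illustrative. Because Python sets are used, the repeated (F101, 1) row collapses to a single row.

```python
# Original table: (FacID, Subject, Semester).
original = {
    ("F101", "History", 1),
    ("F101", "Math", 1),
    ("F102", "Math", 2),
}

# Project onto each pair of attributes; sets drop duplicate rows.
fac_subject = {(fac, subj) for fac, subj, _ in original}
fac_semester = {(fac, sem) for fac, _, sem in original}
semester_subject = {(sem, subj) for _, subj, sem in original}

# Natural join of all three projections.
rejoined = {
    (fac, subj, sem)
    for fac, subj in fac_subject
    for fac2, sem in fac_semester
    if fac == fac2 and (sem, subj) in semester_subject
}

print(rejoined == original)
```

For this data the join of the three projections yields exactly the original rows, so the decomposition is lossless for this instance.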
Domain-Key Normal Form (DKNF)
A table is in Domain-Key Normal Form (DKNF) when every constraint
is a logical consequence of domain constraints or key constraints.
Domain constraints specify the possible values of the attribute.
Ex: Colors are only black and white.
Ex: GPA of Student is between 0 and 4.
Key constraints specify keys of some table.
The basic idea behind the DKNF is to specify the normal form that
takes into account all the possible dependencies and constraints.
In other words, DKNF requires that the database contains no
constraints other than domain constraints and key constraints.
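A small sketch of the two kinds of constraint, using Python's sqlite3 module rather than Oracle. The table StudentGpa and the helper try_insert are hypothetical. The CHECK clause expresses the GPA domain constraint from the example above, and the PRIMARY KEY expresses a key constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the key constraint is the PRIMARY KEY on stuId,
# the domain constraint is the CHECK on gpa.
conn.execute("""
    CREATE TABLE StudentGpa (
        stuId TEXT PRIMARY KEY,
        gpa   REAL NOT NULL CHECK (gpa BETWEEN 0 AND 4)
    )
""")
conn.execute("INSERT INTO StudentGpa VALUES ('S1001', 3.7)")


def try_insert(stu_id, gpa):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO StudentGpa VALUES (?, ?)", (stu_id, gpa))
        return True
    except sqlite3.IntegrityError:
        return False


domain_violation_accepted = try_insert("S1002", 4.5)  # gpa outside 0..4
key_violation_accepted = try_insert("S1001", 3.0)     # duplicate stuId
valid_row_accepted = try_insert("S1003", 2.9)

print(domain_violation_accepted, key_violation_accepted, valid_row_accepted)
```

Only the third insert is accepted; the first breaks the domain constraint and the second breaks the key constraint.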
Denormalization
Denormalization is the reverse process of normalization
When to stop the normalization process
When applications require too many joins
When you cannot get a non-loss decomposition that preserves
dependencies
Denormalization means deliberately choosing a lower normal form
Problems with a Normalized Database
Best used when the source data is relatively simple
Stored data does not resemble the original documents from which it
is taken, but instead is shredded into separate tables
Usually store only the most current information, not historical data
Useful for OLTP, online transaction processing
Optimized for write operations; read operations may be slow, if joins
of the tables are required
Non-Normalized Databases
OLAP systems
Used for planning and decision-making
Require historical data, not just current data
Updates are rare; optimized for reading
Data stored in denormalized form
Object-based systems
Needed for advanced applications
Objects are not normalized
Non-Normalized Databases (Cont’d)
Big data systems: XML, Google’s Bigtable, HBase, Cassandra,
and Hadoop
Capture data in the format of its source
Store data in a denormalized, usually duplicated, form
Allow multiple versions of items to be stored
Provide efficient and scalable read access to data
Facilitate data transmission
Chapter 6 Questions?
Chapter 6: Team Assignment Review
Normalization
Chapter 6 – Team Project: Normalizing the Relational Model for the
Team Project and Creating a Normalized Oracle Database
Read the sample project steps for this chapter and apply the same
techniques to the team project that you are developing. For the team
project, do the following:
Step 6.1 - Begin with the list of the tables that the entities and
relationships from the E-R diagram mapped to naturally, from
the sample project section at the end of chapter 4.
For each table on the list, identify functional dependencies and
normalize the relation to Boyce-Codd Normal Form (BCNF). Then
decide whether the resulting tables should be implemented in that
form. If not, explain why.
Normalization (Cont’d)
Step 6.2 - Update the data dictionary and list of assumptions as
needed.
Step 6.3 - For each table, write the table name and write out the
names, data types, and sizes of all the data items.
Identify any constraints, using the conventions of the DBMS you will use for
implementation.
Step 6.4 - Write and execute SQL statements to create all the tables
needed to implement the design
Step 6.5 - Write and execute SQL statements to create indexes for
foreign keys and any other columns that will be used most often for
queries. (primary key, foreign key, check constraints)
Note: Step 6.4 and Step 6.5 can be combined.
Normalization (Cont’d)
Step 6.6 - Write and execute SQL statements to insert at least five
records in each table, preserving all constraints.
Put in enough data to demonstrate how the database will function.
Step 6.7 - Write and execute SQL statements that will process five
non-routine requests for information from the database just
created.
Note: Make sure to write 5 different SQL statements. Each statement
should use a WHERE clause or join tables.
Do not simply write select * from <table_name>;
Normalization (Cont’d)
Step 6.8 - Write and execute SQL statements to create at least one
trigger.
Step 6.9 - Write and execute SQL statements to demonstrate that
the trigger is working as expected.
To demonstrate that the trigger is working as expected, provide a
screenshot of the data before and after the trigger is executed.
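The before-and-after idea in Steps 6.8 and 6.9 can be sketched as follows. This uses Python's sqlite3 module and SQLite trigger syntax, not Oracle; an Oracle trigger would be written in PL/SQL and would look different. The tables Class and Enroll and the trigger enroll_count are hypothetical and are not part of the team project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, not part of the team project.
    CREATE TABLE Class (
        classNumber TEXT PRIMARY KEY,
        enrolled    INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE Enroll (
        stuId       TEXT NOT NULL,
        classNumber TEXT NOT NULL REFERENCES Class (classNumber),
        PRIMARY KEY (stuId, classNumber)
    );
    -- SQLite trigger syntax: keep the enrollment count up to date.
    CREATE TRIGGER enroll_count AFTER INSERT ON Enroll
    BEGIN
        UPDATE Class SET enrolled = enrolled + 1
        WHERE classNumber = NEW.classNumber;
    END;
    INSERT INTO Class (classNumber) VALUES ('ART103A');
""")


def enrolled(class_number):
    """Return the current enrollment count stored for a class."""
    return conn.execute(
        "SELECT enrolled FROM Class WHERE classNumber = ?", (class_number,)
    ).fetchone()[0]


before = enrolled("ART103A")
conn.execute("INSERT INTO Enroll VALUES ('S1001', 'ART103A')")
after = enrolled("ART103A")

print(before, after)
```

Querying the affected table before and after the triggering statement, as done here with before and after, is the same evidence the screenshots are meant to capture.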