
Database Management System (Chapter 8)

Chapter 8 discusses schema refinement and normal forms in database management, emphasizing the importance of eliminating redundancy and addressing anomalies such as insertion, deletion, and update issues. It explains the concepts of tables, instances, and schemas, including their types, and introduces decomposition as a method to improve database design while preserving functional dependencies. Additionally, the chapter covers multi-valued dependencies and Armstrong's Axioms for inferring functional dependencies.


Chapter 8
Schema Refinement and Normal Forms

INTRODUCTION

View
Views and tables are two types of database objects. Put simply, a view is a stored, named SELECT query. A view can be created as shown below.
CREATE OR REPLACE VIEW view_name AS
    select_statement;
Tables are made up of columns and rows. A column is a set of values that all belong to the same data type. A row is a sequence of values, which can be of different data types. Columns are identified by their names, and each row is uniquely identified by the table's primary key. Tables are created using the CREATE TABLE DDL statement:
CREATE TABLE table_name (
    column_name1 datatype(length),
    column_name2 datatype(length),
    ….
);
Copyright © 2014. Alpha Science International. All rights reserved.
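To make the two statements concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table `employee`, its columns and the view `high_sales` are hypothetical names chosen for illustration. SQLite does not accept CREATE OR REPLACE VIEW, so the sketch drops any existing view first, which has the same effect.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table: named, typed columns plus a primary key that identifies each row.
cur.execute("""
    CREATE TABLE employee (
        emp_id   VARCHAR(3) PRIMARY KEY,
        emp_name VARCHAR(20),
        sales    INTEGER
    )
""")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E01", "AA", 1000), ("E02", "BB", 2200), ("E03", "CC", 1700)],
)

# A view: a named SELECT query. SQLite has no CREATE OR REPLACE VIEW,
# so drop the old definition (if any) and create it again.
cur.execute("DROP VIEW IF EXISTS high_sales")
cur.execute(
    "CREATE VIEW high_sales AS "
    "SELECT emp_id, sales FROM employee WHERE sales > 1500"
)

rows = cur.execute("SELECT * FROM high_sales ORDER BY emp_id").fetchall()
print(rows)  # [('E02', 2200), ('E03', 1700)]
```

The view stores no rows of its own. Each query against it runs the saved SELECT against the current contents of `employee`.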

Schema
A database schema describes the structure and organization of the data in a database system. It is defined in a formal language supported by the Database Management System, and it describes how the database is constructed from its tables. Formally, a schema is the set of formulas (integrity constraints) imposed on the tables. The schema also describes all tables, column names and types, indexes, and so on. There are three types of schema: the conceptual schema, the logical schema and the physical schema. The conceptual schema describes how concepts and relationships are mapped. The logical schema defines how entities, attributes and relations are mapped. The physical schema is a specific implementation of the logical schema.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
http://ebookcentral.proquest.com/lib/georgetown/detail.action?docID=5218421.
Created from georgetown on 2023-02-21 16:29:01.

Table
A table is a set of data organized into rows and columns. A database contains one or more tables that hold the data in the database. Each table in a database has a unique name that identifies it. Each column in a table also has a unique name and an associated data type. In addition, a column can have special attributes, such as whether it is a primary key or whether it is indexed. The rows in a table hold the actual data. In relational databases, a relation is represented using a table. However, a relation and a table are not the same: a table can contain duplicate rows, whereas a relation cannot. There are two types of tables: object tables and relational tables. Object tables hold objects of a defined type, whereas relational tables hold user data in a relational database.

Instance
An instance is a collection of processes running on top of the operating system, together with the related memory, that interacts with the data storage. The instance is the interface between the user and the database. It provides processes that can communicate with the client and access the database. These are background processes, and on their own they are not enough to maintain the ACID (Atomicity, Consistency, Isolation and Durability) properties of the database. An instance therefore also uses other components, such as memory caches and buffers. More specifically, an instance is composed of three parts: the SGA (System Global Area), the PGA (Program Global Area) and the background processes. The SGA is a temporary shared memory structure whose lifespan runs from instance startup to shutdown.

Database
The Oracle database refers to the actual storage of the Oracle RDBMS. It is made up of three main components: control files, redo log files and data files. Optionally, there may also be password files in the database. The control files keep track of all the data files and redo log files. They also help keep database integrity intact by tracking the System Change Number (SCN), timestamps and other critical information, such as backup/recovery information. Data files hold the actual data. At least two data files are created when the database is created. These files are physically visible to the DBA (Database Administrator), and file operations such as renaming, resizing, adding, moving or dropping can be carried out on them. Redo log files (also known as online redo logs) record the changes made to the database in chronological order. This information is needed if all or some of the modifications to the database have to be redone. For an instance to manipulate the data of the database, it must first open the database. An instance can open only one database. However, a database can be opened by multiple instances.


Need for Schema Refinement

Schema refinement is needed because of the problem of redundancy.

Demerits of Redundancy
• Redundancy is at the root of several problems associated with relational schemas:
  ◦ redundant storage
  ◦ insertion, deletion and update anomalies
• Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
• Redundancy is a primary issue in data storage.
• Decomposition should be used judiciously.

Anomalies in Database

Database anomalies are unmatched, inconsistent or missing information caused by limitations or flaws in a given database. Databases are designed to collect data and to sort or present it in specific ways to the end user. Entering, updating or deleting information can cause problems if the database design is limited or has 'bugs'.
Modification (update) anomalies are data inconsistencies that result from data redundancy or partial updates. Problems of this kind that result from data redundancy in a database table are known as update anomalies. If a modification is not carried out on all the relevant rows, the database becomes inconsistent. More generally, any insertion, deletion or modification that leaves the database in an inconsistent state is said to cause an update anomaly.
Insertion anomalies arise when information is inserted into the database for the first time. To insert a row, we must enter details that are consistent with the values in the other rows. Missing or incorrectly formatted entries are two of the more common insertion errors. Many developers expect this and build in error codes that report exactly what went wrong.
Deletion anomalies arise when data is deleted. Either an attempted deletion is stopped by an error, or data silently disappears. If we delete the row that holds the last piece of data about some fact, the details about that fact are also lost from the database. Deletion anomalies are the least likely to be caught or to stop you from proceeding. Because many deletion errors go unnoticed for long periods, they can be the most costly in terms of recovery.
Database anomalies are a fact of life, and every database faces them in one form or another. Backups, offsite storage and data consistency checks become important when you consider what could be lost.
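The following small sketch shows an update anomaly. It uses rows in the style of the Employee table later in this chapter (Table 8.3), where the bank name is repeated in every row. The helper `violates_fd` and the new bank name "State Bank" are hypothetical and used only for illustration. The helper reports whether any two rows agree on the left-hand attributes but disagree on the right-hand ones.

```python
# Rows in the style of Table 8.3: the bank name is stored redundantly.
rows = [
    {"emp_id": "E01", "month": "Jan", "bank_id": "B01", "bank_name": "SBI"},
    {"emp_id": "E01", "month": "Feb", "bank_id": "B01", "bank_name": "SBI"},
    {"emp_id": "E03", "month": "Jan", "bank_id": "B01", "bank_name": "SBI"},
]

def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return True
    return False

print(violates_fd(rows, ["bank_id"], ["bank_name"]))  # False: consistent

# Partial update: the bank is renamed, but only one of its rows is changed.
rows[0]["bank_name"] = "State Bank"
print(violates_fd(rows, ["bank_id"], ["bank_name"]))  # True: update anomaly
```

Because the same fact (B01's name) lives in three rows, a modification that touches only one of them leaves the table contradicting itself.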


Examples Motivating Schema Refinement


ER design can generate some schemas with redundancy problems, because it is a complex, subjective process, and certain constraints are not expressible in terms of ER diagrams. Such problems can arise from:
• Constraints on an entity set
• Constraints on a relationship set
• Identifying attributes of entities
• Identifying entity sets

Decomposition
Decomposition means replacing a relation with a collection of smaller relations.
Let R be a relation schema. A set of relation schemas {R1, R2, …, Rn} is a decomposition of R if
• R = R1 ∪ R2 ∪ … ∪ Rn, and
• each Ri is a subset of R (for i = 1, 2, …, n).
Decompositions should always be lossless. A lossless decomposition ensures that the information in the original relation can be accurately reconstructed from the information represented in the decomposed relations.
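The two conditions can be checked mechanically. The sketch below assumes that a schema is represented as a Python set of attribute names. `is_decomposition` is a hypothetical helper name, and the attribute names come from the Employee example used later in this chapter.

```python
def is_decomposition(r, parts):
    """True if each part is a subset of r and the parts together cover r."""
    r = set(r)
    covered = set()
    for part in parts:
        if not set(part) <= r:       # each Ri must be a subset of R
            return False
        covered |= set(part)
    return covered == r              # R = R1 U R2 U ... U Rn

employee = {"Emp-Id", "Emp-Name", "Month", "Sales", "Bank-Id", "Bank-Name"}
employee_1 = {"Emp-Id", "Emp-Name", "Month", "Sales", "Bank-Id"}
employee_2 = {"Bank-Id", "Bank-Name"}

print(is_decomposition(employee, [employee_1, employee_2]))               # True
print(is_decomposition(employee, [employee_1 - {"Sales"}, employee_2]))   # False
```

These conditions only say that the attributes are covered. Whether the data can be rebuilt (losslessness) is a separate property.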

Purpose of Decomposition
• Eliminate redundancy by decomposing a relation into several relations in a higher normal form.
• It is important to check that a decomposition does not lead to bad design.

Problems Related to Decomposition


• Although decomposition can eliminate redundancy, it causes problems of its own.
• Queries over the original relation may require us to join the decomposed relations. If such queries are common, the performance penalty of decomposing the relation may not be acceptable.
• In this case we may choose to live with some of the problems of redundancy and not decompose the relation.
• It is important to be aware of the potential problems caused by such residual redundancy in the design and to take steps to avoid them (e.g. by adding some checks to application code).
• A good database designer should have a firm grasp of normal forms and of the problems they do (or do not) alleviate, the technique of decomposition, and the potential problems with decompositions.

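Lossy Decomposition
Suppose a relation R is decomposed into two new relations R1 and R2. If joining R1 and R2 cannot recover the original relation R, the decomposition is known as a lossy decomposition. The join still contains every tuple of R, but it also contains extra (spurious) tuples. Because the original tuples can no longer be told apart from the spurious ones, information is lost.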
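A lossy decomposition can be observed on a small instance. The sketch below projects a hypothetical relation R(emp_id, dept, project) onto R1(emp_id, dept) and R2(dept, project), then joins the two projections back together. The data, the attribute names and the helpers `project` and `natural_join` are illustrative assumptions. Because the two employees share a department, the join pairs each employee with both projects and produces spurious tuples.

```python
def project(rows, attrs):
    """Projection with set semantics: duplicate tuples collapse into one."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on their shared attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            shared = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in shared):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Hypothetical instance of R(emp_id, dept, project).
rows = [
    {"emp_id": "E01", "dept": "Sales", "project": "P1"},
    {"emp_id": "E02", "dept": "Sales", "project": "P2"},
]
original = {frozenset(row.items()) for row in rows}

joined = natural_join(project(rows, ["emp_id", "dept"]),
                      project(rows, ["dept", "project"]))

print(len(original), len(joined))   # 2 4
print(original < joined)            # True: the join added spurious tuples
```

The join never drops a tuple of R. It can only add tuples, and those extra tuples make the original relation impossible to identify.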


Lossless Decomposition
If a relation can be decomposed and then recomposed, by joining the decomposed relations, without loss of information, the decomposition is called a lossless decomposition.
The decomposition of R into X and Y is lossless-join with respect to a set of FDs F if, for every instance r of R that satisfies F:
$\pi_X(r) \bowtie \pi_Y(r) = r$
It is essential that all decompositions used to deal with redundancy be lossless.
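Checking the definition directly would require every possible instance r. For a decomposition into two relations, a standard test works from F alone. The decomposition of R into R1 and R2 is lossless if (R1 ∩ R2)+ contains all of R1 or all of R2, that is, if the shared attributes form a superkey of one side. The sketch below assumes that FDs are given as (lhs, rhs) pairs of Python sets. `closure` and `is_lossless_binary` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Lossless iff the shared attributes functionally determine R1 or R2."""
    shared_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= shared_plus or set(r2) <= shared_plus

# Employee split of Tables 8.4 and 8.5, with the FD Bank-Id -> Bank-Name.
f_emp = [({"Bank-Id"}, {"Bank-Name"})]
emp_1 = {"Emp-Id", "Emp-Name", "Month", "Sales", "Bank-Id"}
emp_2 = {"Bank-Id", "Bank-Name"}
print(is_lossless_binary(emp_1, emp_2, f_emp))   # True

# Hypothetical schema R(emp_id, dept, project) with emp_id -> dept only.
f_hyp = [({"emp_id"}, {"dept"})]
print(is_lossless_binary({"emp_id", "dept"}, {"dept", "project"}, f_hyp))  # False
```

In the Employee case the shared attribute Bank-Id determines all of Employee_2, so the join cannot create spurious rows.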

Dependency Preserving Decomposition


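A decomposition is dependency preserving when it preserves the associated set of functional dependencies. Suppose R is decomposed into R1, R2, …, Rn, and Fi denotes the set of dependencies in F+ that involve only attributes of Ri. The decomposition is dependency preserving if
$(F_1 \cup F_2 \cup \dots \cup F_n)^+ = F^+$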

Algorithm to check for Dependency Preservation


begin
for each X → Y in F, with R decomposed into (R1, R2, …, Rn)
{
    let Z = X;
    while there are changes in Z
    {
        for i = 1 to n
            Z = Z ∪ ((Z ∩ Ri)+ ∩ Ri);   // closure taken w.r.t. F
    }
    if Y ⊆ Z, the current FD is preserved;
    else the decomposition is not dependency preserving; stop;
}
this is a dependency preserving decomposition;
end

Explanation of the Algorithm

1. Choose a functional dependency in set F, say X → Y.
2. Set Z to the left-hand side of the functional dependency, so that Z = X. Start with R1 in the decomposed set {R1, R2, …, Rn}.
3. Intersect Z with R1: Z ∩ R1.
4. Find the closure of the result from step 3, (Z ∩ R1)+, using the original set F.
5. Intersect the result from step 4, (Z ∩ R1)+, with R1 again.
6. Update Z by adding any new attributes in the result from step 5.
7. Repeat steps 3-6 for R2, R3, …, Rn.
8. If Z changed between step 3 and the end of step 7, repeat steps 3-7.
9. Check whether Y is a subset of the current Z. If it is not, the decomposition violates dependency preservation, and you can stop.
10. If Y is a subset of the current Z, repeat steps 1-9 until you have checked ALL functional dependencies in set F.
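The algorithm above can be sketched in Python as follows. This illustrative version assumes that FDs are (lhs, rhs) pairs of Python sets and that each Ri is a set of attribute names. The helper names are hypothetical. The example borrows the Roll, City and STDCode attributes from the 3NF discussion later in this chapter.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(fds, parts):
    """Return True if every X -> Y in fds survives the decomposition `parts`."""
    for lhs, rhs in fds:
        z = set(lhs)                      # let Z = X
        changed = True
        while changed:                    # while there are changes in Z
            changed = False
            for ri in parts:              # for i = 1 to n
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained           # Z = Z U ((Z n Ri)+ n Ri)
                    changed = True
        if not rhs <= z:                  # Y must be a subset of Z
            return False
    return True

F = [({"Roll"}, {"City"}), ({"City"}, {"STDCode"})]
print(preserves_dependencies(F, [{"Roll", "City"}, {"City", "STDCode"}]))  # True
print(preserves_dependencies(F, [{"Roll", "City"}, {"Roll", "STDCode"}]))  # False
```

In the second decomposition, no single relation contains both City and STDCode. City → STDCode therefore cannot be enforced locally and is reported as lost.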

Fig. 8.1 Relation between decomposition and normal form

Multi-valued Dependencies
A multivalued dependency (MVD) on R, X →→ Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation. In other words, for each value of X, the values of Y are independent of the values of R − X − Y.
E.g. Student(name, addr, phones, course)
A student's phones are independent of the courses they take, so
name →→ phones and name →→ course.
Thus, each of a student's phones appears with each of the courses they take, in all combinations.
This repetition is unlike FD redundancy: name → addr is the only FD.
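The swap condition in the definition can be tested directly on an instance. The sketch below assumes that rows are Python tuples whose columns are named by `attrs`. The student data is hypothetical, and `satisfies_mvd` is an illustrative helper name.

```python
from itertools import product

def satisfies_mvd(rows, attrs, x, y):
    """Test X ->> Y: for every pair of rows that agree on X, the row taking
    X and Y from the first and all other columns from the second must exist."""
    present = set(rows)
    for t1, t2 in product(rows, repeat=2):
        if all(t1[attrs.index(a)] == t2[attrs.index(a)] for a in x):
            swapped = tuple(t1[i] if a in x or a in y else t2[i]
                            for i, a in enumerate(attrs))
            if swapped not in present:
                return False
    return True

attrs = ("name", "addr", "phone", "course")
rows = [
    ("Ann", "12 Elm St", "555-1111", "DB"),
    ("Ann", "12 Elm St", "555-1111", "OS"),
    ("Ann", "12 Elm St", "555-2222", "DB"),
    ("Ann", "12 Elm St", "555-2222", "OS"),
]

print(satisfies_mvd(rows, attrs, {"name"}, {"phone"}))      # True
print(satisfies_mvd(rows[:3], attrs, {"name"}, {"phone"}))  # False
```

Dropping the last row breaks the MVD: the phone 555-2222 no longer appears with the course OS, so the combination is incomplete.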

Functional Dependencies
A functional dependency is the starting point for the process of normalization. A functional dependency exists when the value of one attribute (or set of attributes) uniquely determines the value of another. If knowing 'X' lets you uniquely identify 'Y', then there is a functional dependency, written X → Y. Together with keys, functional dependencies are used to define the normal forms of relations.
Examples
Bear Number determines Student Name:
BearNum → StuName
Department Number and Job Rank determine Security Clearance:
(DeptNum, JRank) → SecClear
Social Security Number determines Employee Name and Salary:
SSN → (EmpName, Salary)
Additionally, the above can be read as:
SSN → EmpName and SSN → Salary

Armstrong’s Axioms
William W. Armstrong established a set of rules that can be used to infer the functional dependencies in a relational database:
• Reflexivity rule: If A is a set of attributes and B is a set of attributes completely contained in A, then A implies B.
• Augmentation rule: If A implies B and C is a set of attributes, then AC implies BC.
• Transitivity rule: If A implies B and B implies C, then A implies C.
These can be simplified if we also use:
• Union rule: If A implies B and A implies C, then A implies BC.
• Decomposition rule: If A implies BC, then A implies B and A implies C.
• Pseudotransitivity rule: If A implies B and CB implies D, then AC implies D.
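To see how the extra rules follow from the three axioms, here is one standard derivation of the union rule. It uses only augmentation and transitivity (a worked sketch; other derivations are possible):

```latex
\begin{aligned}
&1.\ A \to B   && \text{given} \\
&2.\ A \to C   && \text{given} \\
&3.\ A \to AB  && \text{augment (1) with } A \text{, since } AA = A \\
&4.\ AB \to BC && \text{augment (2) with } B \\
&5.\ A \to BC  && \text{transitivity on (3) and (4)}
\end{aligned}
```

The decomposition and pseudotransitivity rules can be derived in the same way, by combining reflexivity, augmentation and transitivity.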

Example Functional Dependencies


Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)
FDs in R include:
• {stuId} → {lastName}, but not the reverse
• {stuId} → {lastName, major, credits, status, socSecNo, stuId}
• {socSecNo} → {stuId, lastName, major, credits, status, socSecNo}
• {credits} → {status}, but not {status} → {credits}

ZipCode → AddressCity
• 16652 is Huntingdon's ZIP
ArtistName → BirthYear
• Picasso was born in 1881
Autobrand → Manufacturer, EngineType
• Pontiac is built by General Motors with a gasoline engine
Author, Title → PublDate
• Shakespeare's Hamlet was published in 1600


Trivial Functional Dependency


The FD X → Y is trivial if Y is a subset of X.
Examples: if A and B are attributes of R, then
• {A} → {A}
• {A, B} → {A}
• {A, B} → {B}
• {A, B} → {A, B}
are all trivial FDs. They hold in every relation and do not contribute to normalization.

FD Axioms
Understanding: functional dependencies are recognized by analysing the real world. No automated procedure or algorithm discovers them, so finding or recognizing them is the database designer's task.
FD manipulations with Armstrong's axioms are:
• Sound: no incorrect FDs are generated.
• Complete: all FDs that hold can be generated.
Table 8.1 Armstrong's Axioms Table

Reflexivity
    Axiom:   if a is a set of attributes and b ⊆ a, then a → b
    Example: SSN, Name → SSN
Augmentation
    Axiom:   if a → b holds and c is a set of attributes, then ca → cb
    Example: SSN → Name, then SSN, Phone → Name, Phone
Transitivity
    Axiom:   if a → b holds and b → c holds, then a → c holds
    Example: SSN → Zip and Zip → City, then SSN → City
Union or Additivity*
    Axiom:   if a → b and a → c hold, then a → bc holds
    Example: SSN → Name and SSN → Zip, then SSN → Name, Zip
Decomposition or Projectivity*
    Axiom:   if a → bc holds, then a → b and a → c hold
    Example: SSN → Name, Zip, then SSN → Name and SSN → Zip
Pseudotransitivity*
    Axiom:   if a → b and cb → d hold, then ac → d holds
    Example: Address → Project and Project, Date → Amount, then Address, Date → Amount

(NOTE) ab → c does NOT imply a → c and b → c.

*Derived rules, which follow from Armstrong's three basic axioms (reflexivity, augmentation and transitivity)

Closure
The closure is used to find all the attributes that a set of attributes a functionally determines in a relation R.
a+ denotes the set of attributes that are functionally determined by a.
If a is a superkey of R, then a+ is the whole set of attributes of R. This is our goal: any attribute of the relation that is not part of the closure of the key indicates a problem with the design.


Algorithm for Closure


result := a;                  // start with the attribute set a
WHILE (more changes to result) DO
    FOREACH (FD b → c in F) DO
        IF b ⊆ result
        THEN result := result ∪ c;
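The closure pseudocode translates almost line for line into Python. The sketch below assumes that FDs are (lhs, rhs) pairs of sets. `attribute_closure` and `is_superkey` are hypothetical names. The FDs are the NewStudent dependencies listed earlier in this chapter, with the trivial attributes dropped from the right-hand sides.

```python
def attribute_closure(attrs, fds):
    """Compute attrs+ for fds given as a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)                  # result := a
    changed = True
    while changed:                       # WHILE (more changes to result)
        changed = False
        for lhs, rhs in fds:             # FOREACH FD b -> c in F
            if lhs <= result and not rhs <= result:
                result |= rhs            # result := result U c
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """a is a superkey of R exactly when a+ contains every attribute of R."""
    return attribute_closure(attrs, fds) >= set(schema)

R = {"stuId", "lastName", "major", "credits", "status", "socSecNo"}
F = [
    ({"stuId"}, {"lastName", "major", "credits", "status", "socSecNo"}),
    ({"socSecNo"}, {"stuId", "lastName", "major", "credits", "status"}),
    ({"credits"}, {"status"}),
]

print(sorted(attribute_closure({"credits"}, F)))  # ['credits', 'status']
print(is_superkey({"socSecNo"}, R, F))             # True
print(is_superkey({"credits", "major"}, R, F))     # False
```

The loop stops as soon as one full pass over F adds nothing new, so it always terminates.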

Normal Forms
When a database is designed from an entity–relationship model, the main problem in the "raw" database is redundancy. Redundancy means storing the same data item in more than one place. Redundancy creates several problems, such as the following:
1. Extra storage space: storing the same data in many places takes a large amount of disk space.
2. The same data has to be entered more than once during insertion.
3. Data has to be deleted from more than one place during deletion.
4. Data has to be modified in more than one place during an update.
5. Anomalies may occur in the database if insertion, deletion, modification, etc. are not done properly. These create inconsistency and unreliability in the database.
To solve this problem, the "raw" database needs to be normalized. Normalization is a step-by-step process that removes a different kind of redundancy and anomaly at each step. At each step, a specific rule is followed to remove a specific kind of impurity, giving the database a slim and clean design.

Un-Normalized Form (UNF)


If a table contains non-atomic values in its rows, it is said to be in UNF. An atomic value is one that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:
Table 8.2 Employee Table

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
                  Feb    1200
                  Mar    850
E02     BB        Jan    2200   B02      UTI
                  Feb    2500
E03     CC        Jan    1700   B01      SBI
                  Feb    1800
                  Mar    1850
                  Apr    1725


In the sample table above, there are multiple rows under each value of the key Emp-Id. Although Emp-Id is considered to be the primary key, it cannot uniquely identify any single row. Further, each primary key value points to a variable-length record (3 rows for E01, 2 for E02 and 4 for E03).

First Normal Form (1NF)


A relation is said to be in 1NF if it contains no non-atomic values and each row provides a unique combination of values. The UNF table above can be converted into the following 1NF table.
Table 8.3 Employee Table after 1NF

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
E01     AA        Feb    1200   B01      SBI
E01     AA        Mar    850    B01      SBI
E02     BB        Jan    2200   B02      UTI
E02     BB        Feb    2500   B02      UTI
E03     CC        Jan    1700   B01      SBI
E03     CC        Feb    1800   B01      SBI
E03     CC        Mar    1850   B01      SBI
E03     CC        Apr    1725   B01      SBI
As you can see, each row now contains a unique combination of values. Unlike in UNF, this relation contains only atomic values, i.e., no value can be further decomposed, so the relation is now in 1NF.
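Converting Table 8.2 into Table 8.3 amounts to flattening the repeating group. The sketch below assumes that the UNF table is held as Python dictionaries, with the repeating (Month, Sales) pairs stored in a list. The field name `sales` and the helper `to_1nf` are illustrative.

```python
# Table 8.2 in UNF: each employee carries a repeating group of (Month, Sales).
unf = [
    {"Emp-Id": "E01", "Emp-Name": "AA", "Bank-Id": "B01", "Bank-Name": "SBI",
     "sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"Emp-Id": "E02", "Emp-Name": "BB", "Bank-Id": "B02", "Bank-Name": "UTI",
     "sales": [("Jan", 2200), ("Feb", 2500)]},
    {"Emp-Id": "E03", "Emp-Name": "CC", "Bank-Id": "B01", "Bank-Name": "SBI",
     "sales": [("Jan", 1700), ("Feb", 1800), ("Mar", 1850), ("Apr", 1725)]},
]

def to_1nf(records):
    """Repeat the single-valued columns once per element of the repeating group."""
    rows = []
    for rec in records:
        for month, amount in rec["sales"]:
            rows.append((rec["Emp-Id"], rec["Emp-Name"], month, amount,
                         rec["Bank-Id"], rec["Bank-Name"]))
    return rows

table_8_3 = to_1nf(unf)
print(len(table_8_3))   # 9
print(table_8_3[0])     # ('E01', 'AA', 'Jan', 1000, 'B01', 'SBI')

# In this data, the pair (Emp-Id, Month) identifies every flattened row.
assert len({(row[0], row[2]) for row in table_8_3}) == len(table_8_3)
```

Every value in the flattened rows is atomic, at the cost of repeating Emp-Name, Bank-Id and Bank-Name on each row.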

Second Normal Form (2NF)


A relation is said to be in 2NF if it is already in 1NF and every attribute fully depends on the primary key of the relation. Conversely, if a table has some attributes that do not depend on its primary key, then it is not in 2NF. To explain: Emp-Id is the primary key of the above relation, and Emp-Name, Month, Sales and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion is moved into a separate, related relation, the table comes to 2NF.

Table 8.4 Employee_1 Table

Emp-Id  Emp-Name  Month  Sales  Bank-Id
E01     AA        Jan    1000   B01
E01     AA        Feb    1200   B01
E01     AA        Mar    850    B01
E02     BB        Jan    2200   B02
E02     BB        Feb    2500   B02
E03     CC        Jan    1700   B01
E03     CC        Feb    1800   B01
E03     CC        Mar    1850   B01
E03     CC        Apr    1725   B01


Table 8.5 Employee_2 Table

Bank-Id Bank-Name
B01 SBI
B02 UTI
After moving this portion into another relation, we store less data in the two relations without any loss of information. Redundancy is also significantly reduced.
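The claim that no information is lost can be checked on the data itself. In the sketch below, the rows of Table 8.3 are Python tuples. The sketch projects them onto Employee_1 and Employee_2, then joins the two back on Bank-Id. The variable names are illustrative.

```python
table_8_3 = [
    ("E01", "AA", "Jan", 1000, "B01", "SBI"),
    ("E01", "AA", "Feb", 1200, "B01", "SBI"),
    ("E01", "AA", "Mar", 850, "B01", "SBI"),
    ("E02", "BB", "Jan", 2200, "B02", "UTI"),
    ("E02", "BB", "Feb", 2500, "B02", "UTI"),
    ("E03", "CC", "Jan", 1700, "B01", "SBI"),
    ("E03", "CC", "Feb", 1800, "B01", "SBI"),
    ("E03", "CC", "Mar", 1850, "B01", "SBI"),
    ("E03", "CC", "Apr", 1725, "B01", "SBI"),
]

employee_1 = {row[:5] for row in table_8_3}           # Table 8.4: drop Bank-Name
employee_2 = {(row[4], row[5]) for row in table_8_3}  # Table 8.5: Bank-Id, Bank-Name

# Turning Employee_2 into a dict is safe only because each Bank-Id has one name.
bank_name = dict(employee_2)
rejoined = {emp + (bank_name[emp[4]],) for emp in employee_1}

print(len(employee_1), len(employee_2))   # 9 2
print(rejoined == set(table_8_3))         # True
```

Bank-Name is now stored twice (once per bank) instead of nine times, and the join on Bank-Id reproduces every original row.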

Third Normal Form (3NF)


A relation is said to be in 3NF if it is already in 2NF and there is no transitive dependency in that relation. Conversely, if a table contains a transitive dependency, then it is not in 3NF, and the table must be split to bring it into 3NF.
What is a transitive dependency? Within a relation, if we see
A → B [B depends on A]
and
B → C [C depends on B]
then we may derive
A → C [C depends on A]
Such derived dependencies hold in most situations. For example, if we have
Roll → Marks
and
Marks → Grade
then we may safely derive
Roll → Grade.
This third dependency was not originally specified, but we have derived it.
The derived dependency is called a transitive dependency when it does not reflect a direct relationship and holds only through the intermediate attribute.
For example, suppose we are given
Roll → City
and
City → STDCode
If we derive Roll → STDCode, it is a transitive dependency. The STDCode of a city obviously cannot depend directly on the roll number issued by a school or college; it depends on the roll number only through City. In such a case the relation should be broken into two, each containing one of these two dependencies:
Roll → City
and
City → STDCode
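One way to test 3NF mechanically is to look for an FD X → A where A is not in X, X is not a superkey, and A is not part of any candidate key (a prime attribute). Any such FD violates 3NF. The sketch below makes several assumptions. The helpers are hypothetical names. Candidate keys are found by brute force, which is practical only for small schemas. Only the FDs passed in are checked, so for a decomposed relation you pass the FDs that apply to it.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys

def third_nf_violations(schema, fds):
    """FDs X -> A with A not in X, X not a superkey and A not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey: always allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                violations.append((sorted(lhs), a))
    return violations

F = [({"Roll"}, {"City"}), ({"City"}, {"STDCode"})]
print(third_nf_violations({"Roll", "City", "STDCode"}, F))
# [(['City'], 'STDCode')]

# After the split, each relation carries one of the two dependencies.
print(third_nf_violations({"Roll", "City"}, [F[0]]))      # []
print(third_nf_violations({"City", "STDCode"}, [F[1]]))   # []
```

The only candidate key of the unsplit relation is {Roll}, so City → STDCode has a non-key left side and a non-prime right side. This is exactly the transitive dependency described above.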


Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is already in 3NF and the left-hand side of every non-trivial dependency is a candidate key (more precisely, a superkey). A relation in 3NF is almost always in BCNF. A 3NF relation may fail to be in BCNF only when all of the following conditions are true:
1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys overlap, that is, they have some attributes in common.
Table 8.6 Department Table

Professor Code  Department  Head of Dept.  Per cent Time
P1              Physics     Ghosh          50
P1              Maths       Krishnan       50
P2              Chemistry   Rao            25
P2              Physics     Ghosh          75
P3              Maths       Krishnan       100
Consider the above relation as an example. It is assumed that:
1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
The dependency diagram for the above relation is given in Fig. 8.2.
Fig. 8.2 Dependency diagrams 1

The given relation is in 3NF. Observe, however, that the department names and heads of department are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of the Chemistry department.
The relation is normalized by creating a new relation for Department and Head of Dept., and deleting Head of Dept. from the given relation. The normalized relations are shown in Tables 8.7 and 8.8.


Table 8.7 Department_2 Table

Professor Code  Department  Per cent Time
P1              Physics     50
P1              Maths       50
P2              Chemistry   25
P2              Physics     75
P3              Maths       100

Table 8.8 Department_3 Table

Department  Head of Dept.
Physics     Ghosh
Maths       Krishnan
Chemistry   Rao

See the dependency diagrams for these new relations.

Fig. 8.3 Dependency diagrams 2
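The BCNF argument can be checked with the same closure machinery used for 3NF. The sketch below encodes the Department table with these FDs: (Professor, Department) → PercentTime, Department → Head, and Head → Department. The last FD, that each head leads only one department, is an assumption of this sketch rather than one of the stated assumptions. The helper names are hypothetical, and candidate keys are found by brute force.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side is not a superkey."""
    return [(sorted(lhs), sorted(rhs - lhs))
            for lhs, rhs in fds
            if rhs - lhs and not closure(lhs, fds) >= schema]

schema = {"Professor", "Department", "Head", "PercentTime"}
fds = [
    ({"Professor", "Department"}, {"PercentTime"}),
    ({"Department"}, {"Head"}),
    ({"Head"}, {"Department"}),   # assumption: each head leads one department
]

print([sorted(k) for k in candidate_keys(schema, fds)])
# [['Department', 'Professor'], ['Head', 'Professor']]
print(bcnf_violations(schema, fds))
# [(['Department'], ['Head']), (['Head'], ['Department'])]
```

Under these assumptions the two candidate keys are composite and overlap on Professor, which matches the three conditions listed for BCNF. Both violating FDs have a prime attribute on the right-hand side, which is why the relation still satisfies 3NF.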

Fourth Normal Form (4NF)


When attributes in a relation have multi-valued dependencies, further normalization to 4NF and 5NF is required. Let us first find out what a multi-valued dependency is.
A multi-valued dependency is a kind of dependency in which each attribute within a relation depends upon the others, yet none of them is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The assumptions are as follows:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.


A multi-valued dependency exists here because all the attributes depend upon one another, and yet none of them is a primary key with unique values.
Table 8.9 Vendor_Item Table

Vendor Code  Item Code  Project No
V1           I1         P1
V1           I2         P1
V1           I1         P3
V1           I2         P3
V2           I2         P1
V2           I3         P1
V3           I1         P2
V3           I1         P3

The given relation has a number of problems. For example:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the given relation is in 3NF and also in BCNF, yet it still has the problems mentioned above. The problems are reduced by expressing this relation as two relations in Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multi-valued dependency, or one independent multi-valued dependency together with a functional dependency.
The table can be expressed as the two 4NF relations given below. In these 4NF relations, the fact that vendors are capable of supplying certain items and the fact that they are assigned to supply certain projects are specified independently.
Table 8.10 Vendor_Item_1 Table

Vendor Code  Item Code
V1           I1
V1           I2
V2           I2
V2           I3
V3           I1

Table 8.11 Vendor_Project Table

Vendor Code  Project No
V1           P1
V1           P3
V2           P1
V3           P2
V3           P3
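The 4NF split can be checked against Table 8.9 directly. The sketch below projects the rows of Table 8.9 onto (Vendor Code, Item Code) and (Vendor Code, Project No), then joins the projections back on Vendor Code. The variable names are illustrative.

```python
# Rows of Table 8.9 as (Vendor Code, Item Code, Project No).
vendor_item_project = {
    ("V1", "I1", "P1"), ("V1", "I2", "P1"), ("V1", "I1", "P3"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P2"), ("V3", "I1", "P3"),
}

# Project onto the two 4NF schemas (set semantics removes duplicates).
vendor_item = {(v, i) for v, i, _ in vendor_item_project}
vendor_project = {(v, p) for v, _, p in vendor_item_project}

# Natural join on Vendor Code.
rejoined = {(v, i, p)
            for v, i in vendor_item
            for v2, p in vendor_project
            if v == v2}

print(len(vendor_item), len(vendor_project))   # 5 5
print(rejoined == vendor_item_project)         # True
```

The join returns every original row and no spurious ones because Vendor →→ Item Code holds in this data. For each vendor, every item it supplies appears with every project it serves.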


Fifth Normal Form (5NF)


These relations still have a problem. While defining 4NF, we mentioned that all the attributes depend upon each other. When we created the two 4NF tables, we preserved the dependency between Vendor Code and Item Code in the first table, and between Vendor Code and Project No in the second table. However, we lost the relationship between Item Code and Project No. If there were a primary key, this loss of dependency would not have occurred. To revive this relationship, we must add a new table like the following. Note that during the entire process of normalization, this is the only step where a new table is created by joining two attributes, rather than splitting them into separate tables.
Table 8.12 Project_Item Table

Project No   Item Code
P1           I1
P1           I2
P1           I3
P2           I1
P3           I1
P3           I2
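The need for a third table can be illustrated with a small hypothetical instance (not the data of Tables 8.10 to 8.12) in which a vendor supplies an item to a project only under a three-way constraint. In the Python sketch below, joining the (Vendor Code, Item Code) and (Vendor Code, Project No) projections produces a spurious row, while also joining the (Project No, Item Code) projection restores exactly the original rows; this is the kind of join dependency that 5NF is concerned with. One can check that this instance has no non-trivial multi-valued dependency, so it is already in 4NF.

```python
def project(rows, positions):
    """Projection onto the given column positions (duplicates removed)."""
    return {tuple(row[p] for p in positions) for row in rows}


# Hypothetical (vendor, item, project) instance, not taken from Tables 8.10-8.12.
ROWS = {("V1", "I1", "P2"), ("V1", "I2", "P1"),
        ("V2", "I1", "P1"), ("V1", "I1", "P1")}

vendor_item = project(ROWS, (0, 1))
vendor_project = project(ROWS, (0, 2))
project_item = project(ROWS, (2, 1))

# Two-way natural join on the vendor code.
two_way = {(v, i, p) for (v, i) in vendor_item
           for (v2, p) in vendor_project if v == v2}
# Three-way join: additionally require the (project, item) pair to exist.
three_way = {(v, i, p) for (v, i, p) in two_way if (p, i) in project_item}

print(two_way - ROWS)     # {('V1', 'I2', 'P2')}: a spurious row
print(three_way == ROWS)  # True
```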

Fig. 8.4 Steps of normal forms

Table 8.13 Summary of the Normal Forms

First Normal Form (1NF)
An entity is in First Normal Form (1NF) when all tables are two-dimensional with no repeating groups. A row is in first normal form (1NF) if all underlying domains contain atomic values only. 1NF eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship. Make a separate table for each set of related attributes and uniquely identify each record with a primary key.
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).


Second Normal Form (2NF)
An entity is in Second Normal Form (2NF) when it meets the requirement of being in First Normal Form (1NF) and additionally:
• If its primary key is a single column, it is automatically in 2NF; a composite primary key is allowed, provided the condition below holds.
• All the non-key columns are functionally dependent on the entire primary key.
• A row is in second normal form if, and only if, it is in first normal form and every non-key attribute is fully dependent on the key.
• 2NF eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key. An example is resolving many:many relationships using an intersecting entity.

Third Normal Form (3NF)
An entity is in Third Normal Form (3NF) when it meets the requirement of being in Second Normal Form (2NF) and additionally:
• Functional dependencies on non-key fields are eliminated by putting them in a separate table. At this level, all non-key fields depend only on the primary key.
• A row is in third normal form if and only if it is in second normal form and attributes that do not contribute to a description of the primary key are moved into a separate table. An example is creating look-up tables.

Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers to BCNF as 3NF. A row is in Boyce-Codd normal form if, and only if, every determinant is a candidate key. Most entities in 3NF are already in BCNF.
BCNF covers very specific situations where 3NF misses inter-dependencies between attributes that belong to candidate keys. A 3NF relation can fail to be in BCNF only if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) the keys have attributes in common.

Fourth Normal Form (4NF)
An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Boyce-Codd Normal Form (BCNF) and additionally:
• Has no multiple sets of multi-valued dependencies. In other words, 4NF states that an entity cannot contain more than one one-to-many relationship if the one-to-many attributes are independent of each other.
• Many:many relationships are resolved independently.

Fifth Normal Form (5NF)
An entity is in Fifth Normal Form (5NF) if, and only if, it is in 4NF and every join dependency for the entity is a consequence of its candidate keys.
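The 3NF/BCNF distinction in the table can be checked mechanically with attribute closures. The Python sketch below (all function names hypothetical) computes the closure of an attribute set, finds candidate keys by brute force (suitable only for small schemas, since it tries every subset), and lists the functional dependencies that violate BCNF or 3NF. The sample schema is a hypothetical Student/Course/Instructor relation that meets conditions (a) to (c) above: it is in 3NF but not in BCNF.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]


def third_nf_violations(schema, fds):
    """BCNF violations that 3NF also forbids: a dependent attribute is non-prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, rhs) for lhs, rhs in bcnf_violations(schema, fds)
            if not (rhs - lhs) <= prime]


# Hypothetical schema: a student takes a course from one instructor,
# and each instructor teaches exactly one course.
SCHEMA = {"Student", "Course", "Instructor"}
FDS = [({"Student", "Course"}, {"Instructor"}),
       ({"Instructor"}, {"Course"})]

print(candidate_keys(SCHEMA, FDS))       # two overlapping composite keys
print(bcnf_violations(SCHEMA, FDS))      # [({'Instructor'}, {'Course'})]
print(third_nf_violations(SCHEMA, FDS))  # []
```

Instructor → Course is allowed by 3NF because Course is a prime attribute (it belongs to the key {Student, Course}), but BCNF rejects it because Instructor is not a superkey.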

Normalization vs Denormalization

Relational databases are made up of relations (related tables), and tables are made up of columns. If the tables are too large (i.e. too many columns in one table), database anomalies can occur. If the tables are too small (i.e. the database is made up of many small tables), querying becomes inefficient because answering a query requires joining many tables. Normalization and denormalization are two processes used to optimize the performance of the database. Normalization minimizes the redundancy present in data tables; denormalization (the reverse of normalization) adds redundant data or groups data.

Normalization
Normalization is a process carried out to minimize the redundancy present in the data of relational databases. It mainly divides large tables into smaller tables with fewer redundancies, following a series of rules called normal forms. These smaller tables are related to each other through well-defined relationships. In a well-normalized database, any alteration or modification of data


requires modifying only a single table. First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF) were introduced by Edgar F. Codd. Boyce-Codd Normal Form (BCNF) was introduced in 1974 by Codd and Raymond F. Boyce. Higher normal forms (4NF, 5NF and 6NF) have also been defined, but they are rarely used in practice.
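The claim that a change in a well-normalized database touches only one table can be illustrated with Python's built-in `sqlite3` module. In this sketch the table names, the `description` column, and the item descriptions are hypothetical; the vendor/item pairs are those of Table 8.10. Renaming item I1 changes one row in the normalized design but one row per vendor that supplies I1 in the denormalized design; if any of those rows were missed, the copies would disagree (an update anomaly).

```python
import sqlite3


def rows_touched_by_rename():
    """Rename item I1 in a normalized and a denormalized schema; count changed rows."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    # Normalized: each description is stored once, in the item table.
    cur.execute("CREATE TABLE item (item_code TEXT PRIMARY KEY, description TEXT)")
    cur.execute("CREATE TABLE vendor_item (vendor_code TEXT, item_code TEXT"
                " REFERENCES item(item_code), PRIMARY KEY (vendor_code, item_code))")
    # Denormalized: the description is repeated on every vendor/item row.
    cur.execute("CREATE TABLE vendor_item_wide (vendor_code TEXT, item_code TEXT,"
                " description TEXT, PRIMARY KEY (vendor_code, item_code))")

    items = [("I1", "Bolt"), ("I2", "Nut"), ("I3", "Washer")]  # hypothetical descriptions
    pairs = [("V1", "I1"), ("V1", "I2"), ("V2", "I2"), ("V2", "I3"), ("V3", "I1")]
    description = dict(items)
    cur.executemany("INSERT INTO item VALUES (?, ?)", items)
    cur.executemany("INSERT INTO vendor_item VALUES (?, ?)", pairs)
    cur.executemany("INSERT INTO vendor_item_wide VALUES (?, ?, ?)",
                    [(v, i, description[i]) for v, i in pairs])

    # The same logical change in both designs; rowcount reports rows modified.
    cur.execute("UPDATE item SET description = 'Hex bolt' WHERE item_code = 'I1'")
    normalized = cur.rowcount
    cur.execute("UPDATE vendor_item_wide SET description = 'Hex bolt'"
                " WHERE item_code = 'I1'")
    denormalized = cur.rowcount
    con.close()
    return normalized, denormalized


print(rows_touched_by_rename())  # (1, 2)
```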
A table that complies with 1NF actually represents a relation (i.e. it contains no repeating records) and contains no relation-valued attributes (i.e. all attributes have atomic values). A table is in 2NF if it is in 1NF and every attribute that is not part of any candidate key (a non-prime attribute) depends on the whole of every candidate key, not on just part of one. According to Codd's definition, a table is in 3NF if, and only if, it is in 2NF and every non-prime attribute depends directly (not transitively, through another non-prime attribute) on every candidate key of the table. BCNF (also known as 3.5NF) captures some of the anomalies that 3NF does not address.

Denormalization
Denormalization is the reverse of normalization. It works by adding redundant data or grouping data to optimize performance. Although adding redundant data sounds counter-productive, denormalization is sometimes an important way to overcome shortcomings in relational database software that can impose heavy performance penalties on normalized databases, even ones tuned for performance. This is because joining several relations (the result of normalization) to answer a query can be slow, depending on the physical implementation of the database system.
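The trade-off can be sketched with Python's `sqlite3` module. Here the normalized data is the Vendor/Item and Vendor/Project decomposition from the 4NF example (including the pair (V3, P3) implied by row V3 I1 P3), and `CREATE TABLE ... AS SELECT` stores the join result once as a redundant table; the table names are hypothetical. Reads then need no join, but recording one new fact (vendor V2 assigned to project P2) requires one insert in the normalized schema and one extra row per item that V2 supplies in the redundant copy.

```python
import sqlite3


def denormalize_demo():
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    # Normalized (4NF) tables.
    cur.executescript("""
        CREATE TABLE vendor_item (vendor_code TEXT, item_code TEXT,
                                  PRIMARY KEY (vendor_code, item_code));
        CREATE TABLE vendor_project (vendor_code TEXT, project_no TEXT,
                                     PRIMARY KEY (vendor_code, project_no));
        INSERT INTO vendor_item VALUES
            ('V1', 'I1'), ('V1', 'I2'), ('V2', 'I2'), ('V2', 'I3'), ('V3', 'I1');
        INSERT INTO vendor_project VALUES
            ('V1', 'P1'), ('V1', 'P3'), ('V2', 'P1'), ('V3', 'P2'), ('V3', 'P3');
    """)
    # Denormalize: store the result of the join once as a redundant table.
    cur.execute("""
        CREATE TABLE vendor_item_project AS
        SELECT vi.vendor_code, vi.item_code, vp.project_no
        FROM vendor_item AS vi
        JOIN vendor_project AS vp ON vi.vendor_code = vp.vendor_code
    """)
    # Read path: no join is needed any more.
    projects_using_i1 = [row[0] for row in cur.execute(
        "SELECT DISTINCT project_no FROM vendor_item_project"
        " WHERE item_code = 'I1' ORDER BY project_no")]
    # Write path: one new fact in the normalized schema ...
    cur.execute("INSERT INTO vendor_project VALUES ('V2', 'P2')")
    # ... must also be copied into the redundant table, once per item V2 supplies.
    before = cur.execute("SELECT COUNT(*) FROM vendor_item_project").fetchone()[0]
    cur.execute("INSERT INTO vendor_item_project"
                " SELECT vendor_code, item_code, 'P2' FROM vendor_item"
                " WHERE vendor_code = 'V2'")
    after = cur.execute("SELECT COUNT(*) FROM vendor_item_project").fetchone()[0]
    con.close()
    return projects_using_i1, before, after - before


print(denormalize_demo())  # (['P1', 'P2', 'P3'], 8, 2)
```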

Difference between Normalization and Denormalization


• Normalization and denormalization are two completely opposite processes.
• Normalization is the process of dividing larger tables into smaller ones to reduce redundant data, while denormalization is the process of adding redundant data to optimize performance.
• Normalization is carried out to prevent database anomalies.
• Denormalization is usually carried out to improve the read performance of the database, but because the redundant copies must be kept consistent through additional constraints, writes (i.e. insert, update and delete operations) can become slower. Therefore, a denormalized database can offer worse write performance than a normalized database.
• It is often recommended that you “normalize until it hurts, denormalize until it works”.
