Database Management System ---- (Chapter 8)
8
Schema Refinement and
Normal Forms
INTRODUCTION
View
Views and tables are both database object types. Put simply, a view is a stored, named
select query. A view can be created as shown below.
Create or replace view view_name
As
Select_statement;
Tables are made up of columns and rows. A column is a set of values that all belong to the same data
type. A row is a sequence of values, which can be of different data types. Columns are identified
by their column names, and each row is uniquely identified by the table's primary key. Tables are created
using the “create table” DDL statement.
Create table table_name (
Column_name1 datatype (length),
...
);
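As a minimal sketch of how a table and a view relate, the following Python snippet uses the standard-library sqlite3 module with an in-memory database. The table name employee, its columns, the sample rows and the view name high_paid are all hypothetical. SQLite does not accept the "create or replace view" form, so the sketch drops the view first and then creates it.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# A table stores the rows themselves (hypothetical schema).
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   VARCHAR(40),
        salary INTEGER
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", 50000), (2, "Ravi", 72000), (3, "Meena", 91000)],
)

# A view stores only the named select query, not the rows.
conn.execute("DROP VIEW IF EXISTS high_paid")
conn.execute(
    "CREATE VIEW high_paid AS "
    "SELECT name, salary FROM employee WHERE salary > 60000"
)

# Querying the view re-runs the stored select against the table.
view_rows = conn.execute("SELECT name FROM high_paid ORDER BY name").fetchall()
print(view_rows)
```

Because the view holds no data of its own, rows inserted into employee later would show up in high_paid without any change to the view.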
Copyright © 2014. Alpha Science International. All rights reserved.
Schema
The database schema of a database system describes the structure and the organization of the data. A formal
language supported by the Database Management System is used to define the database schema.
The schema describes how the database will be constructed using its tables. Formally, a schema is defined as
the set of formulas that impose integrity constraints on the tables. Furthermore, the database schema
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
http://ebookcentral.proquest.com/lib/georgetown/detail.action?docID=5218421.
Created from georgetown on 2023-02-21 16:29:01.
will describe all tables, column names and types, indexes, etc. There are three types of schema called
the conceptual schema, logical schema and physical schema. Conceptual schema describes how
concepts and relationships are mapped. Logical schema defines how entities, attributes and relations
are mapped. Physical schema is a specific implementation of the aforementioned logical schema.
Table
A table is a set of data that is organized into rows and columns. A database contains one or more
tables that actually hold the data in the database. Each table in a database has a unique name that is
used to identify it. Each column in a table also has a name, unique within that table, and an associated data type.
In addition, there can be special attributes associated with a column, such as whether it is a primary
key or whether it is used as an index. The rows in a table hold the actual data. In relational
databases, a relation is represented using a table. But a relation and a table are not the same, since
a table can have rows that are duplicates (and a relation cannot contain duplicate rows). There are
two types of tables: object tables and relational tables. Object tables hold objects of a defined type,
whereas relational tables hold user data in a relational database.
Instance
An instance is a collection of processes running on top of the operating system, together with the related
memory, that interacts with the data storage. The instance is the interface between the user and the database.
The instance provides the processes that communicate with the client and access the database.
These are background processes, and on their own they are not enough to maintain the ACID
(Atomicity, Consistency, Isolation, and Durability) properties of the database. So an instance also
uses a few other components, such as memory caches and buffers. More specifically, an Oracle instance is
composed of three parts: the SGA (System Global Area), the PGA (Program Global Area) and the
background processes. The SGA is a temporary shared memory structure whose life span runs from
instance startup to instance shutdown.
Database
The Oracle database refers to the actual storage of the Oracle RDBMS. It is made up of three main
components: control files, redo files and data files. Optionally, there can also be password files
in the database. The control files keep track of all the data files and redo files. They also help keep the
database integrity intact by keeping track of the System Change Number (SCN), timestamps and
other critical information such as backup/recovery information. Data files keep the actual data. At
the time of database creation, at least two data files are created. These files are physically seen by
the DBA (Database Administrator). File operations such as renaming, resizing, adding, moving or
dropping can be carried out on data files. Redo log files (also known as online redo logs) keep a
chronological record of the changes made to the database. This information
is needed in case the user needs to redo all or some of the modifications to the database. In order
for an instance to manipulate the data of the database, it must open the database first. An instance can open
only one database. However, a database can be opened by multiple instances.
Demerits of Redundancy
• Redundancy is a primary issue in data storage and is at the root of several problems associated
with relational schemas:
  o Redundant storage
  o Insertion, deletion and update anomalies
• Integrity constraints, in particular functional dependencies, can be used to identify schemas
with such problems and to suggest refinements.
• Decomposition should be used judiciously.
Anomalies in Database
Database anomalies are unmatched or missing pieces of information caused by limitations or flaws
within a given database. Databases are designed to collect data and sort or present it in specific ways
to the end user. Entering or deleting information, be it an update or a new record, can cause issues
if the database is limited or has ‘bugs’.
Modification anomalies, or update anomalies, are data inconsistencies that result from data
redundancy or a partial update. If a modification is not carried out on all the relevant rows, the database
will become inconsistent. More generally, any database insertion, deletion or modification that leaves the database
in an inconsistent state is said to have caused an update anomaly.
Insertion anomalies are issues that come about when you are inserting information into the
database for the first time. To insert the information into the table, we must enter the correct details
so that they are consistent with the values for the other rows. In a redundant table it may also be
impossible to record one fact without supplying another, unrelated one. Missing or incorrectly formatted entries
are two of the more common insertion errors. Most developers acknowledge that this will happen
and build in error codes that tell you exactly what went wrong.
Deletion anomalies are issues with data being deleted, either when an attempted delete is
stopped by an error or when data is silently lost. If we delete the last row in the
table that refers to a piece of data, the details about that piece are also lost from the database.
These anomalies are the least likely to be caught or to stop you from proceeding. Because many deletion errors
go unnoticed for extended periods of time, they can be the most costly in terms of recovery.
Database anomalies are a fact; we will all face them in one form or another. The importance
of backups, offsite storage and data consistency checks comes into full focus when you consider
what could be lost.
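The update and deletion anomalies described above can be reproduced on a small scale. The sketch below uses Python's standard sqlite3 module; the table emp_bank, its columns and its rows are hypothetical and exist only to show how a redundantly stored bank name becomes inconsistent or disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical redundant table: the bank name is repeated for every employee.
conn.execute("CREATE TABLE emp_bank (emp_id TEXT, bank_id TEXT, bank_name TEXT)")
conn.executemany(
    "INSERT INTO emp_bank VALUES (?, ?, ?)",
    [("E01", "B01", "SBI"), ("E02", "B01", "SBI"), ("E03", "B02", "UTI")],
)

# Partial update: the name is changed in only one of the two B01 rows.
conn.execute("UPDATE emp_bank SET bank_name = 'State Bank' WHERE emp_id = 'E01'")

# Update anomaly: the same bank_id now maps to two different names.
inconsistent = conn.execute(
    "SELECT bank_id FROM emp_bank "
    "GROUP BY bank_id HAVING COUNT(DISTINCT bank_name) > 1"
).fetchall()

# Deletion anomaly: removing the only employee of B02 also loses the bank itself.
conn.execute("DELETE FROM emp_bank WHERE emp_id = 'E03'")
remaining_b02 = conn.execute(
    "SELECT COUNT(*) FROM emp_bank WHERE bank_id = 'B02'"
).fetchone()[0]

print(inconsistent, remaining_b02)
```

Storing each bank once in its own table, keyed by bank_id, would remove both problems in this sketch.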
Decomposition
Decomposition means replacing a relation with a collection of smaller relations.
Let r be a relation schema. A set of relation schemas {r1, r2, …, rn} is a decomposition of r if
• r = r1 ∪ r2 ∪ … ∪ rn
• Each ri is a subset of r (for i = 1, 2, …, n)
Decompositions should always be lossless. A lossless decomposition ensures that the information
in the original relation can be accurately reconstructed from the information represented in the
decomposed relations.
Purpose of Decomposition
• Eliminate redundancy by decomposing a relation into several relations in a higher normal form.
• It is important to check that a decomposition does not lead to a bad design of
the relation.
• It is important to be aware of the potential problems caused by any residual redundancy in the
design and to take steps to avoid them (e.g. by adding some checks to application code).
• A good DB designer should have a firm grasp of normal forms and what problems they do (or
do not) alleviate, the technique of decomposition, and potential problems with decompositions.
Lossy Decomposition
Suppose a relation R is decomposed into two new relations R1 and R2. If, on recomposing (joining) R1
and R2, the original relation R cannot be obtained because information has been lost, then the decomposition
is known as a lossy decomposition.
Lossless Decomposition
When a relation is decomposed and recomposition can be done without loss of information, then
the decomposition is called a lossless decomposition.
A decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r
that satisfies F: π_X(r) ⋈ π_Y(r) = r.
It is essential that all decompositions used to deal with redundancy be lossless.
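The condition π_X(r) ⋈ π_Y(r) = r can be tried out on a concrete instance. The following is a minimal Python sketch, not a general test: it checks a single instance r, whereas the definition quantifies over every instance that satisfies F. The relation (prof, dept, head), its rows and the helper names are hypothetical; the instance satisfies dept → head.

```python
def make_relation(attrs, rows):
    """Represent each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(relation, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

def is_lossless_on(relation, x, y):
    """True if joining the two projections gives back exactly this instance."""
    return natural_join(project(relation, x), project(relation, y)) == relation

# Hypothetical instance in which dept -> head holds.
r = make_relation(
    ("prof", "dept", "head"),
    [("P1", "Physics", "Ghosh"), ("P2", "Chemistry", "Rao"), ("P3", "Maths", "Ghosh")],
)

good = is_lossless_on(r, {"prof", "dept"}, {"dept", "head"})  # shared attribute: dept
bad = is_lossless_on(r, {"prof", "head"}, {"dept", "head"})   # shared attribute: head
print(good, bad)
```

In the second decomposition the shared attribute head does not determine either side, so the join produces spurious tuples (for example P1 paired with Maths) and the original instance is not recovered.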
Multi-valued Dependencies
A multivalued dependency (MVD) on R, X →→ Y, says that if two tuples of R agree on all the
attributes of X, then their components in Y may be swapped, and the result will be two tuples that are
also in the relation, i.e., for each value of X, the values of Y are independent of the values of R − X − Y.
E.g. Student(name, addr, phones, course)
A student’s phones are independent of the courses they like.
name →→ phones and name →→ course.
Thus, each of a student’s phones appears with each of the courses they like in all combinations.
This repetition is unlike FD redundancy.
name → addr is the only FD.
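The swapping condition in the definition of an MVD can be checked directly on an instance. Below is a minimal Python sketch under the assumption that a relation is a list of tuples with a fixed attribute order; the function name satisfies_mvd and the sample Student rows are hypothetical.

```python
from itertools import product

def satisfies_mvd(rows, attrs, x, y):
    """Check X ->-> Y on one instance: for every pair of tuples that agree on X,
    the tuple taking its Y values from the first and all other values from the
    second must also be present."""
    idx = {a: i for i, a in enumerate(attrs)}
    table = set(rows)
    for t1, t2 in product(table, repeat=2):
        if all(t1[idx[a]] == t2[idx[a]] for a in x):
            swapped = tuple(t1[idx[a]] if a in y else t2[idx[a]] for a in attrs)
            if swapped not in table:
                return False
    return True

attrs = ("name", "addr", "phones", "course")

# Every phone appears with every course, as the MVD requires.
full = [
    ("Asha", "Delhi", "111", "DBMS"),
    ("Asha", "Delhi", "111", "OS"),
    ("Asha", "Delhi", "222", "DBMS"),
    ("Asha", "Delhi", "222", "OS"),
]
# Dropping one combination breaks the independence of phones and courses.
partial = full[:3]

holds = satisfies_mvd(full, attrs, {"name"}, {"phones"})
broken = satisfies_mvd(partial, attrs, {"name"}, {"phones"})
print(holds, broken)
```

The four rows for a single student illustrate the repetition mentioned above: two phones and two courses force 2 × 2 tuples.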
Functional Dependencies
Functional dependency is the starting point for the process of normalization. A functional dependency
exists when a relationship between two attributes allows you to uniquely determine the corresponding
attribute’s value. If ‘X’ is known, and as a result you are able to uniquely identify ‘Y’, there is a
functional dependency X → Y. Functional dependencies, combined with keys, are used to define the normal forms of relations.
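Whether a given instance violates a functional dependency can be checked mechanically: no two rows may agree on X and differ on Y. The sketch below is an illustration in Python; the function name satisfies_fd, the Course column and the sample rows are hypothetical, while BearNum and StuName are the attributes used in the examples of this section. Passing this check on one instance does not prove that the dependency holds in general.

```python
def satisfies_fd(rows, attrs, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    idx = {a: i for i, a in enumerate(attrs)}
    seen = {}
    for row in rows:
        key = tuple(row[idx[a]] for a in lhs)
        val = tuple(row[idx[a]] for a in rhs)
        # The first row with this key fixes the expected right-hand side.
        if seen.setdefault(key, val) != val:
            return False
    return True

attrs = ("BearNum", "StuName", "Course")
rows = [
    ("B100", "Asha", "DBMS"),
    ("B100", "Asha", "OS"),
    ("B200", "Ravi", "DBMS"),
]

num_to_name = satisfies_fd(rows, attrs, ["BearNum"], ["StuName"])
name_to_course = satisfies_fd(rows, attrs, ["StuName"], ["Course"])
print(num_to_name, name_to_course)
```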
Examples
Bear Number determines Student Name:
BearNum → StuName
Department Number and Job Rank determine Security Clearance:
Armstrong’s Axioms
William W. Armstrong established a set of rules which can be used to infer the functional dependencies
in a relational database.
• Reflexivity rule: If A is a set of attributes, and B is a set of attributes that are completely
contained in A, then A implies B.
• Augmentation rule: If A implies B, and C is a set of attributes,
then AC implies BC.
• Transitivity rule: If A implies B and B implies C, then A implies C.
Working with these rules can be simplified if we also use:
• Union rule: If A implies B and A implies C, then A implies BC.
• Decomposition rule: If A implies BC, then A implies B and A implies C.
• Pseudotransitivity rule: If A implies B and CB implies D, then AC implies D.
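The additional rules are conveniences rather than new axioms: each follows from reflexivity, augmentation and transitivity. As a short worked derivation, writing A → B for "A implies B", the union rule can be obtained as follows.

```latex
\begin{align*}
A \to B \;&\Rightarrow\; A \to AB
  && \text{(augment both sides with } A\text{; } AA = A\text{)} \\
A \to C \;&\Rightarrow\; AB \to BC
  && \text{(augment both sides with } B\text{)} \\
A \to AB,\; AB \to BC \;&\Rightarrow\; A \to BC
  && \text{(transitivity)}
\end{align*}
```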
FD Axioms
Understanding: Functional dependencies are recognized by analysis of the real world; there is no
automation or algorithm for this. Finding or recognizing them is the database designer’s task.
FD manipulations:
• Soundness -- no incorrect FDs are generated
• Completeness -- all FDs can be generated
Table 8.1 Armstrong’s Axioms Table
Closure
The closure of a set of attributes a in a relation R captures all FDs that have a on the left-hand side.
a+ denotes the set of attributes that are functionally determined by a.
If the attribute(s) a is/are a superkey of R, then a+ should be the whole
relation R. This is our goal: when a is meant to be a key, any attribute of the relation that is not
part of the closure indicates a problem with the design.
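The closure a+ can be computed by repeatedly applying the given FDs until no new attribute is added. The following Python sketch shows this standard fixed-point computation together with the superkey test described above; the relation R(A, B, C, D), its FDs and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, relation_attrs, fds):
    """attrs is a superkey when its closure is the whole relation."""
    return closure(attrs, fds) == set(relation_attrs)

# Hypothetical relation R(A, B, C, D) with A -> B and B -> C.
R = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_plus = closure({"A"}, fds)
print(sorted(a_plus))
print(is_superkey({"A"}, R, fds), is_superkey({"A", "D"}, R, fds))
```

Here D is missing from the closure of A, so A alone cannot serve as a key of R; adding D to it gives a superkey.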
Normal Forms
While designing a database out of an entity–relationship model, the main problem existing in that
“raw” database is redundancy. Redundancy is storing the same data item in more than one place.
Redundancy creates several problems like the following:
1. Extra storage space: storing the same data in many places takes a large amount of disk space.
2. Entering the same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification etc. are not done properly.
This creates inconsistency and unreliability in the database.
To solve this problem, the “raw” database needs to be normalized. This is a step-by-step process
of removing different kinds of redundancy and anomaly at each step. At each step a specific rule
is followed to remove a specific kind of impurity in order to give the database a slim and clean look.
In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us the unique identification facility
for any single row. Further, each primary key points to a variable length record (3 for E01, 2 for
E02 and 4 for E03).
Bank-Id Bank-Name
B01 SBI
B02 UTI
After moving this portion into another relation, we store less data in the two relations
without any loss of information. There is also a significant reduction in redundancy.
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that
Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept.
and deleting Head of Dept. from the given relation. The normalized relations are shown
below.
P1 Physics 50
P1 Maths 50
P2 Chemistry 25
P2 Physics 75
P3 Maths 100
When attributes in a relation have a multi-valued dependency, further normalization to 4NF and 5NF
is required. Let us first find out what a multi-valued dependency is.
A multi-valued dependency is a kind of dependency in which every attribute
within a relation depends upon the others, yet none of them is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many
projects in an organization. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.
A multi-valued dependency exists here because all the attributes depend upon each other and yet
none of them is a primary key having a unique value.
Table 8.9 Vendor_Item Table
V1 I2
V2 I2
V2 I3
V3 I1
Name and description of each normal form

First Normal Form (1NF)
An entity is in First Normal Form (1NF) when all tables are two-dimensional with no repeating groups.
A relation is in 1NF if all underlying domains contain atomic values only. 1NF eliminates repeating
groups by putting each into a separate table and connecting them with a one-to-many relationship.
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column or set of
columns (the primary key).

Second Normal Form (2NF)
An entity is in Second Normal Form (2NF) when it meets the requirement of being in First Normal Form (1NF)
and additionally:
• All the non-key columns are functionally dependent on the entire primary key. In other words, a relation is
in 2NF if, and only if, it is in 1NF and every non-key attribute is fully dependent on the key.
• If the primary key is not composite (it cannot be subdivided into separate logical entities), no partial
dependency on it is possible, so a 1NF table satisfies this condition automatically.
• 2NF eliminates functional dependencies on a partial key by putting the fields in a separate table from
those that are dependent on the whole key. An example is resolving many:many relationships using an
intersecting entity.

Third Normal Form (3NF)
An entity is in Third Normal Form (3NF) when it meets the requirement of being in Second Normal Form (2NF)
and additionally:
• Functional dependencies on non-key fields are eliminated by putting them in a separate table. At this level,
all non-key fields are dependent only on the primary key.
• A relation is in 3NF if, and only if, it is in 2NF and attributes that do not contribute
to a description of the primary key are moved into a separate table. An example is creating look-up tables.

Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers to BCNF as
3NF. A relation is in BCNF if, and only if, every determinant is a candidate key. Most entities in
3NF are already in BCNF.
BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate
key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation may fail to be in BCNF
when (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are
common attributes between the keys.

Fourth Normal Form (4NF)
An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF)
and additionally:
• Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have
more than a single one-to-many relationship within an entity if the one-to-many attributes are independent
of each other.
• Many:many relationships are resolved independently.

Fifth Normal Form (5NF)
An entity is in Fifth Normal Form (5NF) if, and only if, it is in 4NF and every join dependency for the entity is a
consequence of its candidate keys.
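The BCNF condition, that every determinant is a candidate key, can be tested with attribute closures: a non-trivial FD X → Y violates BCNF when the closure of X is not the whole relation. The Python sketch below is an illustration under stated assumptions. The relation (prof, dept, head, percent) and its FDs are hypothetical, loosely modelled on the professor and department example in this chapter, and the FDs are assumed to mention only attributes of the relation being tested.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation_attrs, fds):
    """List the non-trivial FDs whose left side is not a superkey."""
    all_attrs = set(relation_attrs)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        trivial = rhs <= lhs
        if not trivial and closure(lhs, fds) != all_attrs:
            violations.append((lhs, rhs))
    return violations

# Hypothetical relation: each dept has one head and each head leads one dept.
R = {"prof", "dept", "head", "percent"}
fds = [
    ({"prof", "dept"}, {"percent"}),
    ({"dept"}, {"head"}),
    ({"head"}, {"dept"}),
]
before = bcnf_violations(R, fds)

# After decomposition into (dept, head) and (prof, dept, percent).
after_dept = bcnf_violations({"dept", "head"}, fds[1:])
after_prof = bcnf_violations({"prof", "dept", "percent"}, fds[:1])
print(len(before), after_dept, after_prof)
```

In this sketch dept and head determine each other but neither determines prof or percent, so both FDs between them violate BCNF until they are moved into their own relation.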
Normalization vs Denormalization
Relational databases are made up of relations (related tables). Tables are made up of columns. If
the tables are too large (i.e. too many columns in one table), then database anomalies can occur. If
the tables are too small (i.e. the database is made up of many smaller tables), querying becomes
inefficient. Normalization and denormalization are two processes that are used to optimize the
performance of the database. Normalization minimizes the redundancies that are present in data
tables. Denormalization (the reverse of normalization) adds redundant data or groups data.
Normalization
Normalization is a process that is carried out to minimize the redundancies that are present in data
in relational databases. This process mainly divides large tables into smaller tables with fewer
redundancies; the rules these tables satisfy are called “normal forms”. The smaller tables are related to each other through
well-defined relationships. In a well-normalized database, any alteration or modification of data
requires modifying only a single table. First Normal Form (1NF), Second Normal Form (2NF),
and Third Normal Form (3NF) were introduced by Edgar F. Codd. Boyce-Codd Normal Form
(BCNF) was introduced in 1974 by Codd and Raymond F. Boyce. Higher normal forms (4NF, 5NF
and 6NF) have been defined, but they are rarely used.
A table that complies with 1NF actually represents a relation (i.e. it does not contain
any repeating records) and does not contain any attributes that are relation-valued (i.e.
all the attributes have atomic values). For a table to comply with 2NF, it must comply
with 1NF, and any attribute that is not a part of any candidate key (i.e. any non-prime attribute) must
fully depend on each of the candidate keys in the table. According to Codd’s definition, a table is
said to be in 3NF if, and only if, that table is in second normal form (2NF) and every attribute in
the table that does not belong to a candidate key depends directly (non-transitively) on every candidate key of that
table. BCNF (also known as 3.5NF) captures some of the anomalies that are not addressed by 3NF.
Denormalization
Denormalization is the reverse of the normalization process. Denormalization works by
adding redundant data or grouping data to optimize performance. Even though adding redundant
data sounds counter-productive, denormalization is sometimes an important way to overcome
shortcomings in relational database software that may incur heavy performance
penalties with normalized databases (even ones tuned for higher performance). This is because joining
several relations (which result from normalizing) to answer a query can sometimes be
slow, depending on the actual physical implementation of the database system.
operations) can become slower. Therefore, a denormalized database can offer worse write
performance than a normalized database.
• It is often recommended that you should “normalize until it hurts, denormalize until it works”.
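The trade-off can be illustrated with a small sketch in Python's standard sqlite3 module. The tables dept, prof and prof_denorm and their rows are hypothetical. The count of rows touched by an update is used here only as a rough proxy for write cost; it is not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: each head of department is stored once.
    CREATE TABLE dept (dept_name TEXT PRIMARY KEY, head TEXT);
    CREATE TABLE prof (prof_id TEXT, dept_name TEXT REFERENCES dept(dept_name));
    -- Denormalized design: the head is copied into every professor row.
    CREATE TABLE prof_denorm (prof_id TEXT, dept_name TEXT, head TEXT);

    INSERT INTO dept VALUES ('Physics', 'Ghosh'), ('Chemistry', 'Rao');
    INSERT INTO prof VALUES ('P1', 'Physics'), ('P2', 'Physics'), ('P3', 'Chemistry');
    INSERT INTO prof_denorm
        SELECT p.prof_id, p.dept_name, d.head
        FROM prof p JOIN dept d ON p.dept_name = d.dept_name;
""")

# Reads: the normalized design needs a join, the denormalized one does not.
joined = conn.execute(
    "SELECT p.prof_id, d.head FROM prof p "
    "JOIN dept d ON p.dept_name = d.dept_name ORDER BY p.prof_id"
).fetchall()
flat = conn.execute(
    "SELECT prof_id, head FROM prof_denorm ORDER BY prof_id"
).fetchall()

# Writes: changing one head touches one row versus one row per professor.
norm_writes = conn.execute(
    "UPDATE dept SET head = 'Sen' WHERE dept_name = 'Physics'"
).rowcount
denorm_writes = conn.execute(
    "UPDATE prof_denorm SET head = 'Sen' WHERE dept_name = 'Physics'"
).rowcount

print(joined == flat, norm_writes, denorm_writes)
```

Both designs answer the read query identically, but the denormalized table avoids the join at the cost of more rows to keep consistent on every change.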