DBMS Chap 07 Normalization

The document defines database normalization and describes its objectives to minimize redundancy and dependency. It discusses various types of database anomalies like redundancy, update anomalies, deletion anomalies, and insertion anomalies that can occur due to data redundancy. The document also explains different types of dependencies like functional dependency, transitive dependency, multivalued dependency, and join dependency. Finally, it provides an overview of database normal forms and describes how normalization is used to reduce vulnerabilities to logical inconsistencies and anomalies.

Uploaded by

Vinay Gupta
Copyright
© © All Rights Reserved

Normalization

Definition
• Database normalization is the process of organizing
the fields and tables of a relational database to
minimize redundancy and dependency.
• Normalization usually involves dividing large tables
into smaller (and less redundant) tables and defining
relationships between them.
• The objective is to isolate data so that additions,
deletions, and modifications of a field can be made in
just one table and then propagated through the rest of
the database via the defined relationships.
What are anomalies?
Database anomalies are the problems in relations that
occur due to redundancy in the relations.
These anomalies affect the process of inserting,
deleting and modifying data in the relations. Some
important data may be lost if a Relation is updated that
contains database anomalies.
It is important to remove these anomalies in order to
perform different processing on the relations without
any problem.
Types of Anomalies.
• Redundancy
– Repeat info unnecessarily in several tuples

• Update anomalies:
– Change info in one tuple but not in another.

• Deletion anomalies:
– Delete some values & lose other values too.

• Insert anomalies:
– Inserting row means having to insert other, separate info.
Data Redundancy
Data redundancy is simply the duplication of data:
the same data is stored in more than one location
in a database. Difficulty in extracting data from a
file because of this duplication is also attributed
to redundancy.
Update Anomalies
An update anomaly occurs when we have a
lot of redundancy in our data. Due to
redundancy, data updating becomes
cumbersome. If we have to update one
attribute value, which is occurring a number
of times, we have to search for every
occurrence of that value and then change it.
Stu_No  Stu_Name  Address     Course_ID
1001    Amit      Jalandhar   Cap302
1002    Vikash    Chandigarh  Cap301
1003    Sumit     ludhiana    Cap303
1004    Rahul     Jammu       Cap301
1005    Vijay     Chandigarh  Cap303

Course_ID  Course_Name           Instructor
Cap301     Data_base             Mr.Sartaj Singh
Cap302     Operating_System      Mrs. Jasleen
Cap303     Financial_Management  Mrs. Manpreet Kaur

Stu_No Stu_Name Address Course_ID Course_Name Instructor

1001 Amit Jalandhar Cap302 Operating_System Mrs. Jasleen

1002 Vikash Chandigarh Cap301 Data_base Mr.Sartaj Singh

1003 Sumit ludhiana Cap303 Financial_Management Mrs. Manpreet Kaur

1004 Rahul Jammu Cap301 Data_base Mr.Sartaj Singh

1005 Vijay Chandigarh Cap303 Financial_Management Mrs. Manpreet Kaur


Stu_No Stu_Name Address Course_ID Course_Name Instructor

1001 Amit Jalandhar Cap302 Operating_System Mrs. Jasleen

1002 Vikash Chandigarh Cap301 Data_base Mr.Sartaj Singh

1003 Sumit ludhiana Cap303 Financial_Management Mrs. Manpreet Kaur

1004 Rahul Jammu Cap301 Data_base Mr. Navdeep Kumar

1005 Vijay Chandigarh Cap303 Financial_Management Mrs. Manpreet Kaur

An update anomaly exists when one or more instances of duplicated
data are updated, but not all; that is, information is changed in one
tuple but not in another.
In STU_DETAIL, changing the name of the instructor of Course_ID Cap301
requires updating every tuple for that course. If for some reason not
all of those tuples are updated, the database gives two instructor
names for course Cap301, as in the second table above, where only the
tuple for student 1004 was changed.
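The incomplete update described above can be sketched in Python. This is only an illustration: the tuple layout and the helper name instructors_of are assumptions, and the rows are copied from the STU_DETAIL table shown earlier.

```python
# Rows: (Stu_No, Stu_Name, Address, Course_ID, Course_Name, Instructor)
stu_detail = [
    (1001, "Amit", "Jalandhar", "Cap302", "Operating_System", "Mrs. Jasleen"),
    (1002, "Vikash", "Chandigarh", "Cap301", "Data_base", "Mr.Sartaj Singh"),
    (1003, "Sumit", "ludhiana", "Cap303", "Financial_Management", "Mrs. Manpreet Kaur"),
    (1004, "Rahul", "Jammu", "Cap301", "Data_base", "Mr.Sartaj Singh"),
    (1005, "Vijay", "Chandigarh", "Cap303", "Financial_Management", "Mrs. Manpreet Kaur"),
]


def instructors_of(rows, course_id):
    """Return the set of instructor names recorded for one course."""
    return {row[5] for row in rows if row[3] == course_id}


# Incomplete update: only the row of student 1004 gets the new instructor.
partially_updated = [
    row[:5] + ("Mr. Navdeep Kumar",) if row[0] == 1004 else row
    for row in stu_detail
]

consistent = instructors_of(stu_detail, "Cap301")
inconsistent = instructors_of(partially_updated, "Cap301")
print(consistent)
print(inconsistent)
```

Before the update the course has one recorded instructor; after the partial update it has two, which is exactly the inconsistency the slide describes.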
Insert Anomalies
An insertion anomaly occurs when we are unable to
insert a tuple into a table. Such a situation can arise
when the value of primary key is not known. As per the
entity integrity rule, the primary key cannot have null
value. Therefore, the value/s corresponding to primary
key attribute/s of the tuple must be assigned before
inserting the tuple. If these values are unknown, the
tuple cannot be inserted into the table.
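As a rough sketch of this rule, the Python below models STU_DETAIL as a dictionary keyed by Stu_No and rejects a row whose key is unknown. The function insert_row and the course "Cap304" with its name and instructor are hypothetical, introduced only for illustration.

```python
stu_detail = {}  # primary key Stu_No -> remaining attributes


def insert_row(table, stu_no, stu_name, address, course_id, course_name, instructor):
    """Insert a row, enforcing entity integrity: the primary key may not be null."""
    if stu_no is None:
        raise ValueError("primary key Stu_No cannot be null")
    if stu_no in table:
        raise ValueError(f"duplicate primary key {stu_no}")
    table[stu_no] = (stu_name, address, course_id, course_name, instructor)


insert_row(stu_detail, 1001, "Amit", "Jalandhar", "Cap302", "Operating_System", "Mrs. Jasleen")

# A new course with no enrolled student has no Stu_No to store it under.
# "Cap304", "Networking" and "Mr. Example" are made-up values.
try:
    insert_row(stu_detail, None, None, None, "Cap304", "Networking", "Mr. Example")
    rejected = False
except ValueError:
    rejected = True
```

The course cannot be recorded until some student enrols in it, because the combined table has no place for course data without a student key.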
Delete Anomalies
In case of a deletion anomaly, the deletion
of a tuple causes problems in the
database. This can happen when we delete
a tuple, which contains an important piece
of information, and the tuple being the last
one in the table containing the
information. With the deletion of the tuple
the important piece of information also
gets removed from the database.
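A small Python sketch of this effect, using the STU_DETAIL rows shown earlier; the helper names delete_student and known_courses are assumptions made for the example.

```python
# Rows: (Stu_No, Stu_Name, Address, Course_ID, Course_Name, Instructor)
stu_detail = [
    (1001, "Amit", "Jalandhar", "Cap302", "Operating_System", "Mrs. Jasleen"),
    (1002, "Vikash", "Chandigarh", "Cap301", "Data_base", "Mr.Sartaj Singh"),
    (1003, "Sumit", "ludhiana", "Cap303", "Financial_Management", "Mrs. Manpreet Kaur"),
    (1004, "Rahul", "Jammu", "Cap301", "Data_base", "Mr.Sartaj Singh"),
    (1005, "Vijay", "Chandigarh", "Cap303", "Financial_Management", "Mrs. Manpreet Kaur"),
]


def delete_student(rows, stu_no):
    """Return the table without the tuple of the given student."""
    return [row for row in rows if row[0] != stu_no]


def known_courses(rows):
    """Return every Course_ID that still appears somewhere in the table."""
    return {row[3] for row in rows}


# Student 1001 is the only student enrolled in Cap302.
after_delete = delete_student(stu_detail, 1001)
lost_courses = known_courses(stu_detail) - known_courses(after_delete)
print(lost_courses)
```

Deleting the one student of Cap302 also removes the only record of that course's name and instructor.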
Dependency

A dependency refers to a relationship among attributes.
These attributes may belong to the same relation or to
different relations. Dependencies can be of various types,
viz. functional dependencies, transitive dependencies,
multivalued dependencies, join dependencies, etc. We
shall briefly examine some of these dependencies.
Types of Dependency
• Functional Dependency
– Fully Functional Dependency
– Partial Functional Dependency
• Transitive Dependency
• Multivalued Dependency
• Join Dependency
Functional Dependency

Functional Dependency (F.D) – A functional dependency
represents a semantic association between attributes. If the value
of an attribute A determines the value of another attribute B,
we say B is functionally dependent on A. This is denoted by
A => B and read as “A determines B”; A is called the
determinant.
Course_ID Course_Name Instructor
Cap301 Data_base Mr.Sartaj Singh
Cap302 Operating_System Mrs. Jasleen
Cap303 Financial_Management Mrs. Manpreet Kaur

Course_Name and Instructor are functionally dependent
on Course_ID.
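One way to test such a dependency against sample data is sketched below in Python. The function holds_fd is an assumption introduced here, and the rows are the STU_DETAIL rows shown earlier. Note that a check against one set of rows can refute a dependency but cannot prove that it holds for all possible data.

```python
COLUMNS = ("Stu_No", "Stu_Name", "Address", "Course_ID", "Course_Name", "Instructor")
STU_DETAIL = [
    dict(zip(COLUMNS, values))
    for values in [
        (1001, "Amit", "Jalandhar", "Cap302", "Operating_System", "Mrs. Jasleen"),
        (1002, "Vikash", "Chandigarh", "Cap301", "Data_base", "Mr.Sartaj Singh"),
        (1003, "Sumit", "ludhiana", "Cap303", "Financial_Management", "Mrs. Manpreet Kaur"),
        (1004, "Rahul", "Jammu", "Cap301", "Data_base", "Mr.Sartaj Singh"),
        (1005, "Vijay", "Chandigarh", "Cap303", "Financial_Management", "Mrs. Manpreet Kaur"),
    ]
]


def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


course_fd = holds_fd(STU_DETAIL, ["Course_ID"], ["Course_Name", "Instructor"])
address_fd = holds_fd(STU_DETAIL, ["Address"], ["Stu_Name"])
```

Here Course_ID => Course_Name, Instructor is consistent with the rows, while Address => Stu_Name is refuted because two different students live in Chandigarh.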
Fully Functional Dependency
Full functional dependency: if A and B are
attributes (columns) of a table, B is fully
functionally dependent on A if B is
functionally dependent on A, but not on any
proper subset of A.

E.g. Staff_ID ----> Domain


Partial Functional Dependency
Partial functional dependency: if A and B are attributes
of a table, B is partially dependent on A if some attribute
can be removed from A and the dependency still holds.

For example, consider the following functional dependency
that exists in the STUDENT table:

Reg_no, Name -------> Section_No

Section_No is functionally dependent on a proper subset of
A = (Reg_no, Name), namely Reg_no, so the dependency is partial.
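The test for a partial dependency can be sketched in Python by trying every proper subset of the left-hand side. The helper names and the three STUDENT rows below are hypothetical, since the slides give no sample data for this table.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def determining_subsets(rows, lhs, rhs):
    """Proper, non-empty subsets of lhs that still determine rhs in rows."""
    return [
        list(subset)
        for size in range(1, len(lhs))
        for subset in combinations(lhs, size)
        if holds_fd(rows, list(subset), rhs)
    ]


# Hypothetical STUDENT rows, invented for illustration only.
student = [
    {"Reg_no": 1, "Name": "Amit", "Section_No": "A"},
    {"Reg_no": 2, "Name": "Amit", "Section_No": "B"},
    {"Reg_no": 3, "Name": "Sumit", "Section_No": "A"},
]

subsets = determining_subsets(student, ["Reg_no", "Name"], ["Section_No"])
is_partial = bool(subsets)
```

If determining_subsets returns an empty list, the dependency is full with respect to the sample rows; otherwise it is partial, as it is here because Reg_no alone determines Section_No.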
Transitive Dependency
Transitive Dependency –
A transitive dependency is an indirect dependency that passes
through an intermediate attribute. For example, if we have
attributes or groups of attributes A, B and C such that A
determines B and B determines C, i.e.
A => B
B => C
then we have a transitive dependency, represented by
A => B => C.

C is transitively dependent on A through B.
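The same chain can be followed mechanically by computing an attribute closure, that is, everything a set of attributes determines under a given list of dependencies. The Python below is a minimal sketch; the function name closure and the representation of each dependency as a pair of sets are assumptions.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies in fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs => rhs whenever lhs is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

from_a = closure({"A"}, fds)
from_b = closure({"B"}, fds)
from_c = closure({"C"}, fds)
```

C appears in the closure of A only because B is reached first, which is what "transitively dependent on A through B" means.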


Multi-valued Dependency
A multi-valued dependency refers to m:n (many-to-many)
relationships. We say a multi-valued dependency exists between
two data items when one value of the first data item gives a
collection of values of the second data item, i.e. it
multi-determines the second data item.

A multivalued dependency is a full constraint between two sets of
attributes in a relation.

In contrast to a functional dependency, a multivalued
dependency requires that certain tuples be present in a
relation. Therefore, a multivalued dependency is a special case
of a tuple-generating dependency. The multivalued dependency
plays a role in 4NF database normalization.
Join Dependency

Join Dependency: if we decompose a relation into smaller relations
and the join of the smaller relations always gives back exactly the
tuples of the parent relation, we say the relation has a join dependency.
A join dependency is a constraint on the set of legal relations over a
database scheme. A table T is subject to a join dependency if T can
always be recreated by joining multiple tables each having a subset
of the attributes of T. If one of the tables in the join has all the
attributes of the table T, the join dependency is called trivial.
The join dependency plays an important role in the fifth normal
form, also known as project-join normal form, because it can be
proven that if you decompose a scheme R into tables R1 to Rn, the
decomposition will be a lossless-join decomposition if you restrict
the legal relations on R to a join dependency on R called
*(R1, R2, ..., Rn).
Normal Forms
• The normal forms (abbrev. NF) of relational database theory provide criteria for
determining a table's degree of vulnerability to logical inconsistencies and
anomalies. The higher the normal form applicable to a table, the less vulnerable
it is to inconsistencies and anomalies. Each table has a "highest normal form"
(HNF): by definition, a table always meets the requirements of its HNF and of all
normal forms lower than its HNF; also by definition, a table fails to meet the
requirements of any normal form higher than its HNF.
• The normal forms are applicable to individual tables; to say that an entire
database is in normal form n is to say that all of its tables are in normal form n.
• Newcomers to database design sometimes suppose that normalization proceeds
in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF,
and so on. This is not an accurate description of how normalization typically
works. A sensibly designed table is likely to be in 3NF on the first attempt;
furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF.
Achieving the "higher" normal forms (above 3NF) does not usually require an
extra expenditure of effort on the part of the designer, because 3NF tables usually
need no modification to meet the requirements of these higher normal forms.
1NF
• First normal form (1NF): a relation that has a primary
key and in which there are no repeating groups.
• Converting to 1NF means eliminating duplicate columns
from the same table, creating a separate table for each
group of related data, and identifying each row
with a unique column or set of columns.
• The only requirement for a table to be in 1NF
is that it contain only atomic values (the intersection of
each row and column should contain one and only one
value). This is sometimes referred to as "eliminate
repeating groups".
Example
• Suppose a designer wishes to record the names and telephone numbers of customers. He
defines a customer table which looks like this:

Customer
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
789 Maria Fernandez 555-808-9633

The designer then becomes aware of a requirement to record multiple telephone
numbers for some customers. He reasons that the simplest way of doing this is to allow
the "Telephone Number" field in any given record to contain more than one value:

Customer
Customer ID  First Name  Surname    Telephone Number
123          Robert      Ingram     555-861-2025
456          Jane        Wright     555-403-1659
                                    555-776-4100
789          Maria       Fernandez  555-808-9633
Assuming, however, that the Telephone Number column is defined on
some telephone-number-like domain, the representation above is not
in 1NF.
1NF prevents a single field from containing more than one value from
its column's domain.
Repeating groups across columns
The designer might attempt to get around this restriction by defining
multiple Telephone Number columns:

Customer
Customer ID  First Name  Surname    Tel. No. 1    Tel. No. 2    Tel. No. 3
123          Robert      Ingram     555-861-2025
456          Jane        Wright     555-403-1659  555-776-4100  555-403-1659
789          Maria       Fernandez  555-808-9633
This representation, however, makes use of nullable columns, and therefore does
not conform to the definition of 1NF. Tel. No. 1, Tel. No. 2 and Tel. No. 3 share
exactly the same domain and exactly the same meaning; the splitting of
Telephone Number into three headings is artificial and causes logical problems.
These problems include:

• Difficulty in querying the table. Answering such questions as "Which customers
have telephone number X?" and "Which pairs of customers share a telephone
number?" is awkward.
• Inability to enforce uniqueness of Customer-to-Telephone Number links
through the RDBMS. Customer 789 might mistakenly be given a Tel. No. 2
value that is exactly the same as her Tel. No. 1 value.
• Restriction of the number of telephone numbers per customer to three. If a
customer with four telephone numbers comes along, we are constrained to
record only three and leave the fourth unrecorded. This means that the
database design is imposing constraints on the business process, rather than
(as should ideally be the case) vice-versa.
A design that complies with 1NF
A design that is unambiguously in 1NF makes use of two tables: a Customer Name table and a
Customer Telephone Number table.

Customer Name
Customer ID  First Name  Surname
123          Robert      Ingram
456          Jane        Wright
789          Maria       Fernandez

Customer Telephone Number
Customer ID  Telephone Number
123          555-861-2025
456          555-403-1659
456          555-776-4100
789          555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each
Customer-to-Telephone Number link appears on its own record. With Customer ID as key
fields, a "parent-child" or one-to-many (1:M) relationship exists between the two tables,
since a customer record (in the "parent" table, Customer Name) can have many
telephone number records (in the "child" table, Customer Telephone Number), but each
telephone number usually has one, and only one customer. In the case where several
customers could share the same telephone number, an additional column is needed in
the Customer Telephone Number table to represent a unique key.
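The move from the multi-valued Telephone Number field to the two-table design can be sketched in Python as follows. The variable and function names are assumptions; the data is the Customer example above.

```python
# Non-1NF representation: one field holds a list of telephone numbers.
customers_raw = [
    (123, "Robert", "Ingram", ["555-861-2025"]),
    (456, "Jane", "Wright", ["555-403-1659", "555-776-4100"]),
    (789, "Maria", "Fernandez", ["555-808-9633"]),
]

# 1NF design: one table of names, one table with a row per customer-to-number link.
customer_name = [(cid, first, last) for cid, first, last, _ in customers_raw]
customer_phone = [
    (cid, phone)
    for cid, _, _, phones in customers_raw
    for phone in phones
]


def customers_with(phone):
    """Answer "Which customers have telephone number X?" with a plain filter."""
    return [cid for cid, number in customer_phone if number == phone]


owners = customers_with("555-776-4100")
```

With one link per row, the query that was awkward in the three-column design becomes a single filter over customer_phone, and a fourth number for a customer is just one more row.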
2NF
• Second normal form (2NF): a relation in first
normal form in which every non-key attribute is
fully functionally dependent on the primary key.
• Converting to 2NF means meeting all the requirements
of 1NF, then removing subsets of data that apply to
multiple rows of a table and placing them in separate tables.
• A table is said to be in 2NF if it is in 1NF and
there are no partial dependencies, i.e. every
non-primary-key attribute of the table is fully
functionally dependent on the primary key.
Example: Consider a table describing employees' skills:

Employees' Skills

Employee Skill Current Work Location


Jones Typing 114 Main Street
Jones Shorthand 114 Main Street
Jones Whittling 114 Main Street
Bravo Light Cleaning 73 Industrial Way
Ellis Alchemy 73 Industrial Way
Ellis Flying 73 Industrial Way
Harrison Light Cleaning 73 Industrial Way
• Neither {Employee} nor {Skill} is a candidate key for the table. This
is because a given Employee might need to appear more than
once (he might have multiple Skills), and a given Skill might need
to appear more than once (it might be possessed by multiple
Employees). Only the composite key {Employee, Skill} qualifies as a
candidate key for the table.
• The remaining attribute, Current Work Location, is dependent on
only part of the candidate key, namely Employee. Therefore the
table is not in 2NF.
• Note the redundancy in the way Current Work Locations are
represented: we are told three times that Jones works at 114 Main
Street, and twice that Ellis works at 73 Industrial Way.
• This redundancy makes the table vulnerable to update anomalies:
it is, for example, possible to update Jones' work location on his
"Typing" and "Shorthand" records and not update his "Whittling"
record. The resulting data would imply contradictory answers to
the question "What is Jones' current work location?"
A 2NF alternative to this design would represent the same information in two
tables: an "Employees" table with candidate key {Employee}, and an
"Employees' Skills" table with candidate key {Employee, Skill}:
Employees
Employee  Current Work Location
Jones     114 Main Street
Bravo     73 Industrial Way
Ellis     73 Industrial Way
Harrison  73 Industrial Way

Employees' Skills
Employee  Skill
Jones     Typing
Jones     Shorthand
Jones     Whittling
Bravo     Light Cleaning
Ellis     Alchemy
Ellis     Flying
Harrison  Light Cleaning

Neither of these tables can suffer from update anomalies. Not all 2NF tables are free
from update anomalies, however.
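The 2NF decomposition above can be reproduced with two projections, and joining them back shows that no information is lost. This is a sketch in Python; the variable names are assumptions and the rows come from the Employees' Skills table.

```python
# Original table: (Employee, Skill, Current Work Location)
employees_skills = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]

# "Employees": one row per employee, keyed by Employee.
employees = sorted({(emp, loc) for emp, _, loc in employees_skills})

# "Employees' Skills": keyed by {Employee, Skill}, no location column.
skills = [(emp, skill) for emp, skill, _ in employees_skills]

# Joining the two tables on Employee rebuilds the original rows.
location_of = dict(employees)
rejoined = [(emp, skill, location_of[emp]) for emp, skill in skills]
```

Each work location is now stored once per employee, so it can no longer be updated on one skill row and missed on another.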
3NF
• Third normal form (3NF): a relation that is in
second normal form and has no transitive
dependencies.
• Converting to 3NF means meeting the requirements of
both forms above and removing the duplicate data that
transitive dependencies cause.
• A table is in 3NF if it is in 1NF and 2NF and no
non-primary-key attribute is transitively
dependent on the primary key.
• Third normal form requires that every non-prime attribute of a table depend directly
on the primary key; in other words, no non-prime attribute may be determined by
another non-prime attribute. Any such transitive functional dependency should be
removed from the table, and the table must also be in second normal form. For
example, consider a table with the following fields.
• Student_Detail Table:
Student_id Student_name DOB Street city State Zip

• In this table Student_id is the primary key, but street, city and state depend upon Zip.
The dependency between Zip and the other fields is called a transitive dependency.
Hence, to apply 3NF, we need to move street, city and state to a new table, with Zip
as the primary key.
• New Student_Detail Table :
Student_id Student_name DOB Zip

• Address Table :

Zip Street city state


Now it should be obvious that TeacherName is dependent on TeacherID - so this is
not in 3NF. To fix this, we do much the same as we did in 2NF - take TeacherName
out of this table, and put it in its own, which has TeacherID as the key.

No redundancy!!
One important thing to remember is that if something is not in 1NF, it is not in 2NF or 3NF
either. So each additional Normal Form requires everything that the lower ones had, plus
some extra conditions, which must all be fulfilled.
BCNF
• Boyce-Codd Normal Form (BCNF): a table is in BCNF if
and only if every determinant is a candidate key.
• BCNF is a stronger form of 3NF.
The difference between 3NF and BCNF is that for a
functional dependency A ---> B, 3NF allows this
dependency in a table if attribute B is a primary-key
attribute and attribute A is not a candidate key,
whereas BCNF insists that for this dependency to remain
in a table, attribute A must be a candidate key.

• Note: Determinant: an attribute or a group of attributes on which
some other attribute is fully functionally dependent.
• Boyce-Codd Normal Form is a stricter version of third normal form. It
deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which does
not have multiple overlapping candidate keys is guaranteed to be in BCNF. For a table R
to be in BCNF, the following conditions must be satisfied:
• R must be in third normal form
• and, for each non-trivial functional dependency (X -> Y), X should be a super key.
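The super key condition can be checked with an attribute closure: X is a super key when the attributes it determines are all the attributes of the table. The Python below is a sketch under that reading; the function names are assumptions, and the dependency Employee -> Current Work Location is borrowed from the 2NF example because the slides give no BCNF sample table.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


fds = [({"Employee"}, {"Current Work Location"})]

# The original three-column table versus the decomposed "Employees" table.
skills_table = {"Employee", "Skill", "Current Work Location"}
employees_table = {"Employee", "Current Work Location"}

violations_before = bcnf_violations(skills_table, fds)
violations_after = bcnf_violations(employees_table, fds)
```

In the three-column table Employee does not determine Skill, so it is not a super key and the dependency is reported; in the decomposed Employees table the same dependency has a key on its left side and nothing is reported.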
4NF

• Fourth Normal Form (4NF): 4NF is a stronger normal
form than BCNF, as it prevents tables from containing
nontrivial multi-valued dependencies (MVDs) and
hence data redundancy.

• The normalization of BCNF tables to 4NF involves the
removal of MVDs from the table by placing the
attribute(s) in a new table along with a copy of the
determinant(s).
Fourth Normal Form (4NF) - MVD
• An MVD is a dependency between attributes (for
example, A, B, and C) in a relation, such that for each
value of A there is a set of values for B and a set of
values for C. However, the sets of values for B and C
are independent of each other.
• An MVD between attributes A, B, and C in a
relation is written using the following notation:
A →→ B
A →→ C

Deepak Gour, Faculty – DBMS, School of Engineering, SPSU
Fourth Normal Form (4NF)
An MVD can be further defined as being trivial or
nontrivial.
An MVD A →→ B in relation R is defined as
being trivial if
(a) B is a subset of A or
(b) A ∪ B = R.
An MVD is defined as being nontrivial if
neither (a) nor (b) is satisfied.
A trivial MVD does not specify a constraint on a
relation, while a nontrivial MVD does specify a
constraint.
Pizza Delivery Permutations

Restaurant Pizza Variety Delivery Area


A1 Pizza Thick Crust Springfield
A1 Pizza Thick Crust Shelbyville
A1 Pizza Thick Crust Capital City
A1 Pizza Stuffed Crust Springfield
A1 Pizza Stuffed Crust Shelbyville
A1 Pizza Stuffed Crust Capital City
Elite Pizza Thin Crust Capital City
Elite Pizza Stuffed Crust Capital City
Vincenzo's Pizza Thick Crust Springfield
Vincenzo's Pizza Thick Crust Shelbyville
Vincenzo's Pizza Thin Crust Springfield
Vincenzo's Pizza Thin Crust Shelbyville
• Each row indicates that a given restaurant can deliver a given variety of pizza to a
given area.
• The table has no non-key attributes because its only key is {Restaurant, Pizza Variety,
Delivery Area}. Therefore it meets all normal forms up to BCNF. If we assume,
however, that pizza varieties offered by a restaurant are not affected by delivery
area, then it does not meet 4NF. The problem is that the table features two non-
trivial multivalued dependencies on the {Restaurant} attribute (which is not a super
key). The dependencies are:

• {Restaurant} →→ {Pizza Variety}
• {Restaurant} →→ {Delivery Area}

• These non-trivial multivalued dependencies on a non-super key reflect the fact that
the varieties of pizza a restaurant offers are independent from the areas to which
the restaurant delivers. This state of affairs leads to redundancy in the table:
for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1
Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows,
one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us
from doing this incorrectly: we might add Cheese Crust rows for all but one of A1
Pizza's delivery areas, thereby failing to respect the multivalued dependency
{Restaurant} →→ {Pizza Variety}.
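Whether the multivalued dependency is respected can be tested by checking that, for every restaurant, the recorded (variety, area) pairs are exactly all combinations of its varieties and its areas. The Python sketch below does this for the Pizza Delivery Permutations table; the function name restaurant_mvd_holds is an assumption.

```python
from itertools import product

# Rows: (Restaurant, Pizza Variety, Delivery Area)
PIZZA = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}


def restaurant_mvd_holds(rows):
    """Check that varieties and areas are independent for every restaurant."""
    pairs_by_restaurant = {}
    for restaurant, variety, area in rows:
        pairs_by_restaurant.setdefault(restaurant, set()).add((variety, area))
    for pairs in pairs_by_restaurant.values():
        varieties = {variety for variety, _ in pairs}
        areas = {area for _, area in pairs}
        if pairs != set(product(varieties, areas)):
            return False
    return True


# Cheese Crust is added for all but one of A1 Pizza's delivery areas.
incomplete = PIZZA | {
    ("A1 Pizza", "Cheese Crust", "Springfield"),
    ("A1 Pizza", "Cheese Crust", "Shelbyville"),
}

holds_before = restaurant_mvd_holds(PIZZA)
holds_after = restaurant_mvd_holds(incomplete)
```

The original twelve rows pass the check, while the table with the missing Capital City row fails it, which is the incorrect update the text warns about.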
• To eliminate the possibility of these anomalies, we must place the facts about varieties
offered into a different table from the facts about delivery areas, yielding two tables that are
both in 4NF.

Varieties By Restaurant
Restaurant        Pizza Variety
A1 Pizza          Thick Crust
A1 Pizza          Stuffed Crust
Elite Pizza       Thin Crust
Elite Pizza       Stuffed Crust
Vincenzo's Pizza  Thick Crust
Vincenzo's Pizza  Thin Crust

Delivery Areas By Restaurant
Restaurant        Delivery Area
A1 Pizza          Springfield
A1 Pizza          Shelbyville
A1 Pizza          Capital City
Elite Pizza       Capital City
Vincenzo's Pizza  Springfield
Vincenzo's Pizza  Shelbyville
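As a quick check that this decomposition loses nothing, the Python sketch below projects the original twelve rows onto the two 4NF tables and joins them back on Restaurant. The variable names are assumptions.

```python
# Rows: (Restaurant, Pizza Variety, Delivery Area)
pizza = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}

# Projections onto the two 4NF tables.
varieties_by_restaurant = {(r, variety) for r, variety, _ in pizza}
areas_by_restaurant = {(r, area) for r, _, area in pizza}

# Natural join of the two tables on Restaurant.
rejoined = {
    (r1, variety, area)
    for r1, variety in varieties_by_restaurant
    for r2, area in areas_by_restaurant
    if r1 == r2
}
```

The join returns exactly the original rows, so the twelve-row table can be replaced by two six-row tables, and a new variety for A1 Pizza becomes a single new row.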
5NF
• Fifth Normal Form (5NF): 5NF is also called Project-Join
Normal Form (PJNF). A 5NF table has no join dependency
other than those implied by its candidate keys.
• A table is said to be in 5NF if and only if every non-trivial
join dependency in it is implied by the candidate keys.
• A join dependency *{A, B, …, Z} on R is implied by the
candidate key(s) of R if and only if each of A, B, …, Z is a
superkey for R.
• Super key: a set of one or more attributes whose values
uniquely identify each row of a table. A super key from
which no attribute can be removed without losing this
property is called a minimal super key, or candidate key;
the primary key is one of the candidate keys.
Travelling Salesman Product Availability By Brand

Travelling Salesman Brand Product Type


Jack Schneider Acme Vacuum Cleaner
Jack Schneider Acme Breadbox
Willy Loman Robusto Pruning Shears
Willy Loman Robusto Vacuum Cleaner
Willy Loman Robusto Breadbox
Willy Loman Robusto Umbrella Stand
Louis Ferguson Robusto Vacuum Cleaner
Louis Ferguson Robusto Telescope
Louis Ferguson Acme Vacuum Cleaner
Louis Ferguson Acme Lava Lamp
Louis Ferguson Nimbus Tie Rack
• The table's predicate is: Products of the type designated by Product
Type, made by the brand designated by Brand, are available from
the travelling salesman designated by Travelling Salesman.
• In the absence of any rules restricting the valid possible
combinations of Travelling Salesman, Brand, and Product Type, the
three-attribute table above is necessary in order to model the
situation correctly.
• Suppose, however, that the following rule applies:
A Travelling Salesman has certain Brands and certain
Product Types in his list.
If Brand B is in his list, and Product Type P is in his list,
then (assuming Brand B makes Product Type P), the
Travelling Salesman must offer products of Product
Type P made by Brand B.
In that case, it is possible to split the table into three:

Product Types By Travelling Salesman
Travelling Salesman  Product Type
Jack Schneider       Vacuum Cleaner
Jack Schneider       Breadbox
Willy Loman          Pruning Shears
Willy Loman          Vacuum Cleaner
Willy Loman          Breadbox
Willy Loman          Umbrella Stand
Louis Ferguson       Telescope
Louis Ferguson       Vacuum Cleaner
Louis Ferguson       Lava Lamp
Louis Ferguson       Tie Rack

Brands By Travelling Salesman
Travelling Salesman  Brand
Jack Schneider       Acme
Willy Loman          Robusto
Louis Ferguson       Robusto
Louis Ferguson       Acme
Louis Ferguson       Nimbus

Product Types By Brand
Brand    Product Type
Acme     Vacuum Cleaner
Acme     Breadbox
Acme     Lava Lamp
Robusto  Pruning Shears
Robusto  Vacuum Cleaner
Robusto  Breadbox
Robusto  Umbrella Stand
Robusto  Telescope
Nimbus   Tie Rack
Note how this setup helps to remove redundancy.
Suppose that Jack Schneider starts selling Robusto's
products. In the previous setup we would have to add
two new entries since Jack Schneider is able to sell two
Product Types covered by Robusto: Breadboxes and
Vacuum Cleaners. With the new setup we need only
add a single entry (in Brands By Travelling Salesman).
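The three-table design can be checked with a three-way join, sketched below in Python. The function name join3 and the variable names are assumptions; the rows are those of the tables above.

```python
# Original table: (Travelling Salesman, Brand, Product Type)
original = {
    ("Jack Schneider", "Acme", "Vacuum Cleaner"),
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Willy Loman", "Robusto", "Pruning Shears"),
    ("Willy Loman", "Robusto", "Vacuum Cleaner"),
    ("Willy Loman", "Robusto", "Breadbox"),
    ("Willy Loman", "Robusto", "Umbrella Stand"),
    ("Louis Ferguson", "Robusto", "Vacuum Cleaner"),
    ("Louis Ferguson", "Robusto", "Telescope"),
    ("Louis Ferguson", "Acme", "Vacuum Cleaner"),
    ("Louis Ferguson", "Acme", "Lava Lamp"),
    ("Louis Ferguson", "Nimbus", "Tie Rack"),
}

# The three projections used in the 5NF design.
salesman_brand = {(s, b) for s, b, _ in original}
salesman_product = {(s, p) for s, _, p in original}
brand_product = {(b, p) for _, b, p in original}


def join3(sb, sp, bp):
    """Join the three tables: keep (s, b, p) only if all three pairs are present."""
    return {
        (s, b, p)
        for s, b in sb
        for s2, p in sp
        if s == s2 and (b, p) in bp
    }


rebuilt = join3(salesman_brand, salesman_product, brand_product)

# Jack Schneider starts selling Robusto's products: one new row in one table.
with_robusto = join3(
    salesman_brand | {("Jack Schneider", "Robusto")},
    salesman_product,
    brand_product,
)
new_rows = with_robusto - original
```

The join of the three projections gives back the eleven original rows, and the single added Brands By Travelling Salesman entry yields the two Robusto rows (Vacuum Cleaner and Breadbox) that would otherwise have to be entered by hand.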
