DBMS Chap 07 Normalization
Definition
• Database normalization is the process of organizing
the fields and tables of a relational database to
minimize redundancy and dependency.
• Normalization usually involves dividing large tables
into smaller (and less redundant) tables and defining
relationships between them.
• The objective is to isolate data so that additions,
deletions, and modifications of a field can be made in
just one table and then propagated through the rest of
the database via the defined relationships.
What are anomalies?
Database anomalies are problems that arise in relations
because of redundancy.
They affect the insertion, deletion and modification of
data in the relations. Important data may be lost when a
relation that contains anomalies is updated.
These anomalies should be removed so that the relations
can be processed without such problems.
Types of Anomalies.
• Redundancy
– The same information is repeated unnecessarily in several tuples.
• Update anomalies:
– Information is changed in one tuple but not in another.
• Deletion anomalies:
– Deleting some values causes other, unrelated values to be lost too.
• Insert anomalies:
– Inserting a row requires inserting other, unrelated information as well.
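The anomaly types listed above can be reproduced on a tiny in-memory table. The following is a minimal sketch, assuming a hypothetical flat relation that stores student, course and instructor together; the names and values are illustrative only.

```python
# Hypothetical flat relation: student, course and instructor in one table.
enrolments = [
    {"student": "Amit", "course_id": "C1", "instructor": "Singh"},
    {"student": "Rahul", "course_id": "C1", "instructor": "Singh"},
    {"student": "Sumit", "course_id": "C2", "instructor": "Kaur"},
]


def instructors(rows):
    """Map each course_id to the set of instructors recorded for it."""
    found = {}
    for row in rows:
        found.setdefault(row["course_id"], set()).add(row["instructor"])
    return found


# Update anomaly: the instructor is changed in one tuple but not the other,
# so course C1 now has two conflicting instructors on record.
updated = [dict(row) for row in enrolments]
updated[0]["instructor"] = "Sharma"
inconsistent = {c for c, names in instructors(updated).items() if len(names) > 1}

# Deletion anomaly: removing the only student on C2 also removes
# everything the table knew about course C2.
after_delete = [row for row in enrolments if row["student"] != "Sumit"]
lost_courses = set(instructors(enrolments)) - set(instructors(after_delete))

# Insertion anomaly: a course with no students can only be recorded
# by inventing a placeholder student value.
placeholder_row = {"student": None, "course_id": "C3", "instructor": "Gill"}
```

Here `inconsistent` ends up containing `C1` and `lost_courses` contains `C2`, which is the kind of damage normalization is meant to prevent.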
Data Redundancy
Data redundancy is the duplication of data: the same
data is stored in more than one location in the
database. Problems in extracting data from a file
that are caused by this duplication are also
attributed to redundancy.
Update Anomalies
An update anomaly occurs when we have a
lot of redundancy in our data. Due to
redundancy, data updating becomes
cumbersome. If we have to update one
attribute value, which is occurring a number
of times, we have to search for every
occurrence of that value and then change it.
Stu_No  Stu_Name  Address     Course_ID
1001    Amit      Jalandhar   Cap302
1002    Vikash    Chandigarh  Cap301
1003    Sumit     Ludhiana    Cap303
1004    Rahul     Jammu       Cap301
1005    Vijay     Chandigarh  Cap303

Course_ID  Course_Name           Instructor
Cap301     Data_base             Mr. Sartaj Singh
Cap302     Operating_System      Mrs. Jasleen
Cap303     Financial_Management  Mrs. Manpreet Kaur
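Reading the data above as a student table and a course table linked by Course_ID, the sketch below loads both into an in-memory SQLite database and changes one instructor. The table names, column names and the replacement instructor name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (
    course_id   TEXT PRIMARY KEY,
    course_name TEXT,
    instructor  TEXT
);
CREATE TABLE student (
    stu_no    INTEGER PRIMARY KEY,
    stu_name  TEXT,
    address   TEXT,
    course_id TEXT REFERENCES course(course_id)
);
""")
conn.executemany("INSERT INTO course VALUES (?, ?, ?)", [
    ("Cap301", "Data_base", "Mr. Sartaj Singh"),
    ("Cap302", "Operating_System", "Mrs. Jasleen"),
    ("Cap303", "Financial_Management", "Mrs. Manpreet Kaur"),
])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (1001, "Amit", "Jalandhar", "Cap302"),
    (1002, "Vikash", "Chandigarh", "Cap301"),
    (1003, "Sumit", "Ludhiana", "Cap303"),
    (1004, "Rahul", "Jammu", "Cap301"),
    (1005, "Vijay", "Chandigarh", "Cap303"),
])

# The instructor is stored once per course, so one row is updated
# no matter how many students are enrolled on that course.
cur = conn.execute(
    "UPDATE course SET instructor = ? WHERE course_id = ?",
    ("Mr. New Instructor", "Cap301"),  # hypothetical replacement name
)
rows_changed = cur.rowcount

# Every student on Cap301 sees the new value through the join.
seen = conn.execute("""
    SELECT s.stu_name, c.instructor
    FROM student AS s
    JOIN course AS c ON c.course_id = s.course_id
    WHERE c.course_id = 'Cap301'
    ORDER BY s.stu_no
""").fetchall()
```

Had the instructor been repeated in every student row, the same change would need one update per enrolled student, and missing any of them would leave the data inconsistent.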
Customer
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
789 Maria Fernandez 555-808-9633
Customer
Customer ID  First Name  Surname    Telephone Number
123          Robert      Ingram     555-861-2025
456          Jane        Wright     555-403-1659
                                    555-776-4100
789          Maria       Fernandez  555-808-9633
Assuming, however, that the Telephone Number column is defined on
some telephone-number-like domain, the representation above is not
in 1NF.
1NF prevents a single field from containing more than one value from
its column's domain.
Repeating groups across columns
The designer might attempt to get around this restriction by defining
multiple Telephone Number columns:
Customer
Customer ID  First Name  Surname    Tel. No. 1    Tel. No. 2    Tel. No. 3
123          Robert      Ingram     555-861-2025
456          Jane        Wright     555-403-1659  555-776-4100  555-403-1659
789          Maria       Fernandez  555-808-9633
This representation, however, makes use of nullable columns, and therefore does
not conform to the definition of 1NF. Tel. No. 1, Tel. No. 2 and Tel. No. 3 share
exactly the same domain and exactly the same meaning; the splitting of
Telephone Number into three headings is artificial and causes logical problems.
These problems include:
Repeating groups of telephone numbers do not occur in this design. Instead, each
Customer-to-Telephone Number link appears on its own record. With Customer ID as the key
field, a "parent-child" or one-to-many (1:M) relationship exists between the two tables,
since a customer record (in the "parent" table, Customer Name) can have many
telephone number records (in the "child" table, Customer Telephone Number), but each
telephone number usually has one, and only one, customer. In the case where several
customers could share the same telephone number, an additional column is needed in
the Customer Telephone Number table to represent a unique key.
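The parent-child design described above can be sketched in SQLite as follows, using the customer data from the earlier tables. The snake_case table and column names, and the choice of a composite primary key on the child table, are assumptions for illustration rather than part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_name (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    surname     TEXT NOT NULL
);
-- One row per customer-to-telephone link: no repeating groups, no NULLs.
CREATE TABLE customer_telephone (
    customer_id INTEGER NOT NULL REFERENCES customer_name(customer_id),
    telephone   TEXT NOT NULL,
    PRIMARY KEY (customer_id, telephone)
);
""")
conn.executemany("INSERT INTO customer_name VALUES (?, ?, ?)", [
    (123, "Robert", "Ingram"),
    (456, "Jane", "Wright"),
    (789, "Maria", "Fernandez"),
])
conn.executemany("INSERT INTO customer_telephone VALUES (?, ?)", [
    (123, "555-861-2025"),
    (456, "555-403-1659"),
    (456, "555-776-4100"),
    (789, "555-808-9633"),
])

# A customer may have any number of telephone rows (the 1:M relationship).
counts = conn.execute("""
    SELECT n.first_name, COUNT(t.telephone)
    FROM customer_name AS n
    LEFT JOIN customer_telephone AS t ON t.customer_id = n.customer_id
    GROUP BY n.customer_id
    ORDER BY n.customer_id
""").fetchall()
```

With this layout a fourth number for Jane is simply another row in `customer_telephone`; no new column and no schema change are needed.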
2NF
• Second normal form (2NF): a relation in first
normal form in which every non-key attribute is
fully functionally dependent on the primary key.
• It meets all the requirements of 1NF and removes
subsets of data that apply to multiple rows of a
table, placing them in separate tables.
• A table is in 2NF if it is in 1NF and there are
no partial dependencies, i.e. every non-key
attribute of the table is fully functionally
dependent on the primary key.
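The definition above can be tested against sample data. The sketch below checks whether a functional dependency holds in a set of rows and then looks for partial dependencies on a composite key. The employee, skill and location rows are hypothetical, and a check on sample rows can only refute a dependency, not prove that it holds for every possible state of the table.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, primary_key, non_key):
    """List (subset, attribute) pairs where a proper subset of the
    primary key already determines a non-key attribute."""
    found = []
    for size in range(1, len(primary_key)):
        for subset in combinations(primary_key, size):
            for attr in non_key:
                if holds(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


# Hypothetical rows; the primary key is (employee, skill).
rows = [
    {"employee": "Jones", "skill": "Typing", "location": "Denver"},
    {"employee": "Jones", "skill": "Shorthand", "location": "Denver"},
    {"employee": "Brown", "skill": "Typing", "location": "Austin"},
    {"employee": "Brown", "skill": "Filing", "location": "Austin"},
]

violations = partial_dependencies(rows, ("employee", "skill"), ("location",))
```

In these rows `location` depends on `employee` alone, so `violations` reports that pair and the table is not in 2NF; moving `location` into a table keyed by `employee` removes the partial dependency.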
Example: Consider a table describing employees' skills:
Employees' Skills
Neither of these tables can suffer from update anomalies. Not all 2NF tables are free
from update anomalies, however.
3NF
• Third normal form (3NF): a relation that is in
second normal form and has no transitive
dependencies.
• It meets the requirements of both forms above and
removes duplicate data.
• A table that is in 1NF and 2NF and in which no
non-key attribute is transitively
dependent on the primary key.
• Third normal form requires that every non-prime attribute of a table depend directly
on the primary key; in other words, no non-prime attribute may be determined by
another non-prime attribute. Any such transitive functional dependency must be
removed from the table, and the table must also be in second normal form.
For example, consider a table with the following fields.
• Student_Detail Table
Student_id Student_name DOB Street city State Zip
• In this table Student_id is the primary key, but street, city and state depend on Zip. A
dependency that runs from the primary key through Zip to these fields is called a
transitive dependency. Hence, to apply 3NF, we need to move street, city and state
to a new table, with Zip as its primary key.
• New Student_Detail Table :
Student_id Student_name DOB Zip
• Address Table :
Zip Street city State
No redundancy!!
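The 3NF split described above can be sketched with plain Python dictionaries. The student rows are hypothetical, and the sketch follows the slide's assumption that street, city and state are all determined by Zip.

```python
# Hypothetical Student_Detail rows; two students share one zip code.
student_detail = [
    {"student_id": 1, "student_name": "Asha", "dob": "2001-04-12",
     "street": "MG Road", "city": "Pune", "state": "MH", "zip": "411001"},
    {"student_id": 2, "student_name": "Ravi", "dob": "2000-11-30",
     "street": "MG Road", "city": "Pune", "state": "MH", "zip": "411001"},
    {"student_id": 3, "student_name": "Neha", "dob": "2002-01-05",
     "street": "Mall Road", "city": "Shimla", "state": "HP", "zip": "171001"},
]


def decompose(rows):
    """Split rows into a student table and an address table keyed by zip."""
    students = [
        {k: row[k] for k in ("student_id", "student_name", "dob", "zip")}
        for row in rows
    ]
    addresses = {}
    for row in rows:
        # Each zip is stored once, however many students live there.
        addresses[row["zip"]] = {k: row[k] for k in ("street", "city", "state")}
    return students, addresses


def rejoin(students, addresses):
    """Rebuild the original wide rows by looking up each student's zip."""
    return [{**student, **addresses[student["zip"]]} for student in students]


students, addresses = decompose(student_detail)
restored = rejoin(students, addresses)
```

Rejoining the two tables gives back the original rows, so the split loses no information, while the address for the shared zip code is now stored only once.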
One important thing to remember is that if something is not in 1NF, it is not in 2NF or 3NF
either. So each additional Normal Form requires everything that the lower ones had, plus
some extra conditions, which must all be fulfilled.
BCNF
• Boyce-Codd Normal Form (BCNF): a table is in BCNF if
and only if every determinant is a candidate key.
• BCNF is a stronger form of 3NF.
The difference between 3NF and BCNF is that, for a
functional dependency A ---> B, 3NF allows the
dependency to remain in a table when B is a prime
attribute (part of a candidate key) even though A is
not a candidate key, whereas BCNF insists that A
must be a candidate key.
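One common way to test the BCNF condition is to compute attribute closures and flag every non-trivial dependency whose left side is not a superkey. The sketch below does this for a hypothetical schema (student, course, instructor); the schema and its dependencies are assumptions chosen to show a table that is in 3NF but not in BCNF.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs of frozensets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(schema) <= closure(lhs, fds)
    ]


# Hypothetical schema and dependencies.
schema = {"student", "course", "instructor"}
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

violations = bcnf_violations(schema, fds)
```

Here `instructor ---> course` is reported: `course` is a prime attribute, so 3NF tolerates the dependency, but `instructor` is not a candidate key, so BCNF does not.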
4NF
• These non-trivial multivalued dependencies on a non-superkey reflect the fact that
the varieties of pizza a restaurant offers are independent of the areas to which
the restaurant delivers. This state of affairs leads to redundancy in the table:
for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1
Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows,
one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us
from doing this incorrectly: we might add Cheese Crust rows for all but one of A1
Pizza's delivery areas, thereby failing to respect the multivalued dependency
{Restaurant} ↠ {Pizza Variety}.
• To eliminate the possibility of these anomalies, we must place the facts about varieties
offered into a different table from the facts about delivery areas, yielding two tables that are
both in 4NF.
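The effect of the split can be sketched by storing varieties and delivery areas separately and rebuilding the single-table form on demand. A1 Pizza, Stuffed Crust and Cheese Crust come from the text above; the second variety and the three area names are placeholders assumed for illustration.

```python
from itertools import product

# Two independent facts about each restaurant, stored separately.
varieties = {"A1 Pizza": {"Thick Crust", "Stuffed Crust"}}
areas = {"A1 Pizza": {"Area 1", "Area 2", "Area 3"}}  # placeholder areas


def flat_table(varieties, areas):
    """Rebuild the single-table form: every variety paired with every area."""
    return {
        (restaurant, variety, area)
        for restaurant in varieties
        for variety, area in product(varieties[restaurant],
                                     areas.get(restaurant, ()))
    }


before = flat_table(varieties, areas)
# In the flat form, each variety is repeated once per delivery area.
stuffed_rows = sum(1 for _, variety, _ in before if variety == "Stuffed Crust")

# In the split form, a new variety is one fact stored once.
varieties["A1 Pizza"].add("Cheese Crust")
after = flat_table(varieties, areas)
rows_added_in_flat_form = len(after) - len(before)
```

One insertion in the split form corresponds to three new rows in the flat form, one per delivery area, so there is no way to add Cheese Crust for only some of the areas by mistake.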