Normalization Data Anomalies
Anomalies are problems that can occur in poorly planned, un-normalized databases.
Or
An anomaly is an inconsistency between one part of the data and another part.
Insertion Anomaly - The nature of a database may be such that it is not possible to
add a required piece of data unless another piece of unavailable data is also added.
E.g. A library database that cannot store the details of a new member until that
member has taken out a book.
Deletion Anomaly - A record of data can legitimately be deleted from a database,
and the deletion can result in the loss of the only instance of other, required
data. E.g. Deleting a book loan from a library member can remove all details of the
particular book from the database, such as the author and the book title.
Modification/Update Anomaly - Incorrect data may have to be changed, which
could involve many records having to be changed, leading to the possibility of some
changes being made incorrectly.
Example:
Suppose a manufacturing company stores the employee details in a table
named employee that has four attributes.
Update anomaly: The table has two rows for employee Rick, because he
belongs to two departments of the company. If we want to update Rick's
address, we have to update it in both rows, or the data will become
inconsistent. If the correct address is written to one department's row
but not the other, then according to the database Rick has two different
addresses, which is incorrect.
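The update anomaly can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The column names (emp_id, emp_name, emp_address, emp_dept) and all sample values are assumptions made for illustration; they are not taken from the original table.

```python
import sqlite3

# Hypothetical un-normalized employee table; column names and values are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(emp_id INTEGER, emp_name TEXT, emp_address TEXT, emp_dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (101, "Rick", "Delhi", "D002"),
    ],
)

# Update the address in only one of Rick's two rows.
conn.execute(
    "UPDATE employee SET emp_address = 'Agra' "
    "WHERE emp_id = 101 AND emp_dept = 'D001'"
)

# Rick now has two different addresses: the data is inconsistent.
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT emp_address FROM employee WHERE emp_id = 101"
    )
)
print(addresses)  # ['Agra', 'Delhi']
```

Storing the address once, in a table keyed by employee, would leave only one row to update.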
Functional dependency:
A → B
Here the value of attribute B is determined solely by attribute A.
For example, in the employee table consider two attributes, Social
Security number (SSN) and name.
It can be said that name is dependent upon SSN (SSN → name) because
an employee's name can be uniquely determined from an SSN.
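Whether a set of rows is consistent with a functional dependency can be checked mechanically. The sketch below is illustrative only: the helper name holds_fd and the sample rows are hypothetical, and a sample of data can only refute a dependency, never prove that it holds in general.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if every lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key; any later
        # row with the same key must agree with it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows: two different employees share the name "Rick".
employees = [
    {"ssn": "111-11-1111", "name": "Rick"},
    {"ssn": "222-22-2222", "name": "Maggie"},
    {"ssn": "333-33-3333", "name": "Rick"},
]

print(holds_fd(employees, ["ssn"], ["name"]))  # True: SSN determines name
print(holds_fd(employees, ["name"], ["ssn"]))  # False: name does not determine SSN
```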
Transitive dependency:
A transitive dependency requires three or more attributes that have a
functional dependency between them.
That is, A → C is a transitive dependency when it is true only because
both A → B and B → C are true.
For example, consider the AUTHORS table:
Book → Author: Here, the Book attribute determines the Author attribute.
Author → Author_Nationality: Likewise, the Author attribute
determines the Author_Nationality.
Book → Author_Nationality: If we know the book name, we can
determine the nationality via the Author column.
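The chain of dependencies can be sketched by splitting the data along the two direct dependencies and following them in turn. The rows below use placeholder values (they are not from the original AUTHORS table), and the function name nationality_of is hypothetical.

```python
# Hypothetical rows of an AUTHORS table: (book, author, author_nationality).
authors = [
    ("Book A", "Author X", "Country 1"),
    ("Book B", "Author X", "Country 1"),
    ("Book C", "Author Y", "Country 2"),
]

# Split the table along the two direct dependencies.
book_author = {book: author for book, author, _ in authors}       # Book -> Author
author_nationality = {author: nat for _, author, nat in authors}  # Author -> Author_Nationality


def nationality_of(book):
    """Follow Book -> Author -> Author_Nationality."""
    return author_nationality[book_author[book]]


print(nationality_of("Book B"))  # Country 1
print(len(author_nationality))   # 2: each nationality is stored once per author
```

After the split, an author's nationality appears once per author rather than once per book.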
Multivalued Dependencies:
Multivalued dependencies occur when the presence of one or more rows in
a table implies the presence of one or more other rows in that same table.
The problem here is that both Ravi and Beth play several sports, so
a new row has to be added for every additional sport.
This table has introduced a multivalued dependency because
Student_Name →→ Major
Student_Name →→ Sport
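For a three-attribute table, a multivalued dependency from the student name means that each student's rows contain every combination of that student's majors and sports. A minimal sketch of this check follows; the helper name holds_mvd and the sample rows (including the majors and sports listed) are assumptions for illustration, not the original table.

```python
from itertools import product


def holds_mvd(rows, x, y, z):
    """Check the multivalued dependency x ->-> y in a relation whose
    attributes are exactly x, y and z."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Every y value must appear with every z value for the same x.
        if pairs != set(product(ys, zs)):
            return False
    return True


# Hypothetical rows: Ravi has two majors and two sports, Beth one major and two sports.
students = [
    {"student": "Ravi", "major": "Art", "sport": "Soccer"},
    {"student": "Ravi", "major": "Art", "sport": "Tennis"},
    {"student": "Ravi", "major": "Physics", "sport": "Soccer"},
    {"student": "Ravi", "major": "Physics", "sport": "Tennis"},
    {"student": "Beth", "major": "Chemistry", "sport": "Tennis"},
    {"student": "Beth", "major": "Chemistry", "sport": "Volleyball"},
]

print(holds_mvd(students, "student", "major", "sport"))       # True
print(holds_mvd(students[:-3] + students[-2:], "student", "major", "sport"))  # False: one Ravi row missing
```

The second call drops one of Ravi's four rows, which shows why every additional sport forces a new row per major.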
What is Join Dependency?
If a table can be recreated by joining multiple tables, each of which
has a subset of the attributes of the table, then the table has a Join
Dependency.
It is a generalization of Multivalued Dependency.
Example:
<Employee>
EmpName   EmpSkills         EmpJob (Assigned Work)
Tom       Networking        EJ001
Harry     Web Development   EJ002
Katie     Programming       EJ002
The above table can be decomposed into the following three tables;
therefore it is not in 5NF:
<EmployeeSkills>
EmpName   EmpSkills
Tom       Networking
Harry     Web Development
Katie     Programming

<EmployeeJob>
EmpName   EmpJob
Tom       EJ001
Harry     EJ002
Katie     EJ002

<JobSkills>
EmpSkills         EmpJob
Networking        EJ001
Web Development   EJ002
Programming       EJ002
The <Employee> relation has a join dependency on the three relations
above, so it is not in 5NF. That is, joining the three relations gives
back exactly our original relation <Employee>.
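The claim that the three smaller relations join back to <Employee> can be checked directly. The sketch below uses the rows from the tables above and represents each relation as a Python set of tuples; the variable names are chosen for illustration.

```python
# Original relation: (EmpName, EmpSkills, EmpJob).
employee = {
    ("Tom", "Networking", "EJ001"),
    ("Harry", "Web Development", "EJ002"),
    ("Katie", "Programming", "EJ002"),
}

# The three projections used in the decomposition.
employee_skills = {(name, skill) for name, skill, _ in employee}
employee_job = {(name, job) for name, _, job in employee}
job_skills = {(skill, job) for _, skill, job in employee}

# Natural join of the three projections: match on EmpName, then keep only
# (EmpSkills, EmpJob) pairs that also appear in job_skills.
rejoined = {
    (name, skill, job)
    for name, skill in employee_skills
    for name2, job in employee_job
    if name == name2 and (skill, job) in job_skills
}

print(rejoined == employee)  # True: the join recreates <Employee>
```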
Normalization in DBMS:
Normalization is a database design technique that begins by examining
the relationships (called functional dependencies) between attributes.
Normalization uses a series of tests (described as normal forms) to help
identify the optimal grouping of attributes.
Definition:
Normalization is the process of organizing the data in the database.
Or
Normalization is the step-by-step decomposition of large tables into
smaller ones in order to eliminate data redundancy.
Types of Normalization:
The database normalization process is divided into the following normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form (1NF):
A table is said to be in 1NF if each column holds a single value per
row and there are no repeating groups.
Example:
A sample Employee table, showing an employee who works in multiple
departments:
Employee   Age   Department
Melvin     32    Marketing, Sales
In the above table, the Department column holds multiple values.
In 1NF, each department value gets its own row:
Employee   Age   Department
Melvin     32    Marketing
Melvin     32    Sales
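The conversion to 1NF amounts to splitting the multi-valued Department entry into one row per value. A minimal sketch, using the Melvin row from the example (the variable names are illustrative):

```python
# Un-normalized row: the Department column holds a comma-separated list.
unnormalized = [("Melvin", 32, "Marketing, Sales")]

# 1NF: one row per (employee, department) pair.
first_nf = [
    (name, age, dept.strip())
    for name, age, departments in unnormalized
    for dept in departments.split(",")
]

print(first_nf)  # [('Melvin', 32, 'Marketing'), ('Melvin', 32, 'Sales')]
```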
1 Monitor Apple
2 Monitor Samsung
3 Scanner HP
productID product
1 Monitor
2 Scanner
3 Head phone
Brand table:
brandID brand
1 Apple
2 Samsung
3 HP
4 JBL
Products Brand table:
1 1 1
2 1 2
3 2 3
4 3 4
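Joining the three tables shows how product and brand pairs are recovered from the linking table. This sketch uses Python's built-in sqlite3 module and the rows listed above. The column names of the linking table (pbID, productID, brandID) and its column order are assumptions, since its header is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (productID INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE brand (brandID INTEGER PRIMARY KEY, brand TEXT);
-- Column names of the linking table are assumed.
CREATE TABLE product_brand (
    pbID INTEGER PRIMARY KEY,
    productID INTEGER REFERENCES product(productID),
    brandID INTEGER REFERENCES brand(brandID)
);
INSERT INTO product VALUES (1, 'Monitor'), (2, 'Scanner'), (3, 'Head phone');
INSERT INTO brand VALUES (1, 'Apple'), (2, 'Samsung'), (3, 'HP'), (4, 'JBL');
INSERT INTO product_brand VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 4);
""")

# Resolve each linking row to its product name and brand name.
rows = conn.execute("""
    SELECT p.product, b.brand
    FROM product_brand pb
    JOIN product p ON p.productID = pb.productID
    JOIN brand b ON b.brandID = pb.brandID
    ORDER BY pb.pbID
""").fetchall()

for product_name, brand_name in rows:
    print(product_name, brand_name)
```

Each product name and brand name is stored once; only the small integer pairs repeat.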
Subject Table
Score Table
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the
exam name and total marks, so let's add 2 more columns to the Score
table.
1 Workshop 200
2 Mains 70
3 Practicals 30
Example: