Chapter 4 Normalization
Faculty of computing and software engineering G2-SW
Database Management Systems Chapter 7
Now consider the following diagram, which shows example populated relation states of the
above schema.
The meaning of the EMPLOYEE relation schema is quite simple: Each tuple represents an
employee, with values for the employee‟s name (Ename), Social Security number (Ssn), birth
date (Bdate), and address (Address), and the department number that the employee works for
(Dnumber). The Dnumber attribute is a foreign key that represents an implicit relationship
between EMPLOYEE and DEPARTMENT.
The semantics of the DEPARTMENT and PROJECT schemas are also straightforward: Each
DEPARTMENT tuple represents a department entity, and each PROJECT tuple represents a
project entity. The attribute Dmgr_ssn of DEPARTMENT relates a department to the employee
who is its manager, while Dnum of PROJECT relates a project to its controlling department;
both are foreign key attributes. The ease with which the meaning of a relation's attributes can be
explained is an informal measure of how well the relation is designed.
Now consider the EMP_DEPT base relation shown in the diagram below, which is the result of
applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT in the above
diagram.
Insertion Anomalies
An “insertion anomaly” is a failure to place information about a new database entry. For example,
to insert a new tuple for an employee who works in department 5 into the EMP_DEPT relation
above, we must also enter the attribute values for that department, and they must agree with the
values already stored for department 5 in other tuples. It is also difficult to insert a new
department that has no employees yet into the EMP_DEPT relation. The only way to do this is to
place NULL values in the employee attributes. This causes a problem because Ssn is the primary
key of EMP_DEPT, and each tuple is supposed to represent an employee entity, not a department
entity.
This problem does not occur in the design of the earlier figure (which shows EMPLOYEE and
DEPARTMENT as two different relations), because a department is entered in the
DEPARTMENT relation whether or not any employees work for it, and a new employee is
added to the EMPLOYEE relation only.
Deletion Anomalies
A “deletion anomaly” is a failure to remove information about an existing database entry.
Additionally, deleting one piece of data may result in the loss of other information. For example,
if an employee tuple (that happens to be the last employee working for a particular department) is
deleted from EMP_DEPT, the information concerning that department is lost from the database.
Modification Anomalies
A “modification anomaly” is a failure to modify or update information about an existing database
entry. For example, in EMP_DEPT of the above figure, if we change the value of one of the
attributes of a particular department (say, the manager of department 5), we must update the tuples
of all employees who work in that department; otherwise, the database will become inconsistent.
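The three anomalies can be reproduced on a small in-memory copy of EMP_DEPT. The sketch below is only illustrative: the rows, the Ssn values and the helper name managers_of are assumptions, not data taken from the figure.

```python
# Hypothetical EMP_DEPT rows: each row repeats the department data of its employee.
emp_dept = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5, "Dmgr_ssn": "333"},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5, "Dmgr_ssn": "333"},
    {"Ssn": "444", "Ename": "Zelaya", "Dnumber": 4, "Dmgr_ssn": "987"},
]


def managers_of(rows, dnumber):
    """Return the set of manager Ssn values stored for one department."""
    return {row["Dmgr_ssn"] for row in rows if row["Dnumber"] == dnumber}


# Insertion anomaly: a department without employees needs NULL (None) in Ssn,
# which is the primary key of EMP_DEPT.
new_department = {"Ssn": None, "Ename": None, "Dnumber": 6, "Dmgr_ssn": "555"}
violates_primary_key = new_department["Ssn"] is None

# Deletion anomaly: removing the last employee of department 4 also removes
# everything the relation knew about department 4.
after_delete = [row for row in emp_dept if row["Ssn"] != "444"]
department_4_lost = not managers_of(after_delete, 4)

# Modification anomaly: updating the manager in only one of the department-5
# rows leaves two different managers recorded for the same department.
partial_update = [dict(row) for row in emp_dept]
partial_update[0]["Dmgr_ssn"] = "888"
conflicting_managers = managers_of(partial_update, 5)

print(violates_primary_key, department_4_lost, sorted(conflicting_managers))
```

With EMPLOYEE and DEPARTMENT kept as separate relations, each department's manager would be stored once, so none of these three situations could arise.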
If we attempt to JOIN the above relations without a join condition (that is, take their Cartesian
product), the following relation results.
In the above relation, you can observe that there are some meaningless tuples (called spurious
tuples). For example, consider the second tuple in the above relation. It shows that the employee
with E_id = 101, who earns a salary of 100, belongs to the CS & IT department and to the
Electrical department as well. This is clearly spurious information, since one employee cannot
belong to two departments. So this tuple is a spurious tuple and is marked by asterisks (*).
To obtain the correct data, we have to apply a condition on the JOIN operation. For example, if
the condition is
EMPLOYEE.E_id = DEPARTMENT.Dep_id
we retrieve only tuples 1, 5 and 9, which are the required ones.
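The difference between the Cartesian product and the conditioned join can be sketched in a few lines of Python. Only E_id = 101, the salary 100 and the department names CS & IT and Electrical come from the text; every other value below is an assumption made for illustration.

```python
from itertools import product

# Hypothetical rows: (E_id, Salary) and (Dep_id, Dep_name).
employee = [(101, 100), (102, 200), (103, 300)]
department = [(101, "CS & IT"), (102, "Electrical"), (103, "Civil")]

# Cartesian product: every employee is paired with every department.
cartesian = [e + d for e, d in product(employee, department)]

# Join with the condition EMPLOYEE.E_id = DEPARTMENT.Dep_id.
joined = [row for row in cartesian if row[0] == row[2]]

# 1-based positions of the surviving tuples inside the Cartesian product.
kept_positions = [
    i for i, row in enumerate(cartesian, start=1) if row[0] == row[2]
]

print(len(cartesian))    # 9 tuples, 6 of them spurious
print(kept_positions)    # [1, 5, 9]
```

In this sketch the second tuple of the product pairs employee 101 with Electrical, which is exactly the kind of spurious tuple that the join condition filters out.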
Functional Dependency
In general, a functional dependency is a relationship among attributes.
Data Dependency
The logical associations between data items that point the database designer in the direction of a
good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain
values of data item B always appear with certain values of data item A. If the data item A is the
determinant data item and B the dependent data item, then the direction of the association is from
A to B and not vice versa.
The essence of this idea is that if A exists, then B must exist and have a certain value; we then
say that "B is functionally dependent on A." It is also possible to say that "A functionally
determines B," or that "B is a function of A," or that "A functionally governs B," or
"If A, then B."
The notation is: A -> B, which is read as: B is functionally dependent on A.
Since the type of Wine served depends on the type of Dinner, we say Wine is functionally
dependent on Dinner, i.e., Dinner -> Wine
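As a rough illustration, a dependency A -> B can be tested against a given set of rows: it holds when no two rows agree on A but differ on B. The function name fd_holds and the sample meals are assumptions for this sketch. Note that such a test can only refute a dependency on sample data; whether the dependency really holds is decided by the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data for the Dinner -> Wine example.
meals = [
    {"Dinner": "Meat", "Wine": "Red"},
    {"Dinner": "Fish", "Wine": "White"},
    {"Dinner": "Cheese", "Wine": "Red"},
    {"Dinner": "Meat", "Wine": "Red"},
]

print(fd_holds(meals, ["Dinner"], ["Wine"]))   # True
print(fd_holds(meals, ["Wine"], ["Dinner"]))   # False: Red goes with Meat and Cheese
```

The second call shows that the direction matters: Dinner determines Wine in this sample, but Wine does not determine Dinner.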
Full Dependency
If an attribute which is not a member of the primary key is dependent on the whole key and not
on some part of the primary key, then that attribute is fully functionally dependent on the
primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} -> C holds and neither B -> C nor A -> C holds, then C is fully functionally
dependent on {A, B}.
Eg: {Ssn, P_number} -> Hours
is a full dependency because Hours depends on the attributes Ssn and P_number together, not on
either one of them separately.
Let us see an example:
<ProjectCost>
ProjectID  ProjectCost
001        1000
001        5000
<EmployeeProject>
EmpID  ProjectID  Days
E099   001        320
E056   002        190
In the above relations, Days is the number of days spent on the project, and
EmpID, ProjectID, ProjectCost -> Days
This is not a full dependency, because the subset {EmpID, ProjectID} alone already determines
Days.
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the
primary key, then that attribute is partially functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} -> C holds and B -> C or A -> C also holds,
then C is partially functionally dependent on {A, B}.
Eg: {Ssn, Pnumber} -> Ename
is a partial dependency, because Ssn -> Ename
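The two definitions can be checked mechanically from a list of declared dependencies by computing attribute closures. The following is a minimal sketch; the function names closure and dependency_kind are assumptions, and the two dependencies are the ones used in the examples above (written with Pnumber throughout).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(key, attr, fds):
    """Classify key -> attr as 'full', 'partial' or 'none'."""
    if attr not in closure(key, fds):
        return "none"
    # A proper, non-empty subset of the key that already determines attr
    # makes the dependency partial.
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                return "partial"
    return "full"


fds = [
    (frozenset({"Ssn", "Pnumber"}), frozenset({"Hours"})),
    (frozenset({"Ssn"}), frozenset({"Ename"})),
]
key = {"Ssn", "Pnumber"}

print(dependency_kind(key, "Hours", fds))   # full
print(dependency_kind(key, "Ename", fds))   # partial
```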
Example:
<StudentProject>
StudentID  ProjectNo  StudentName  ProjectName
S01        199        Katie        Geo Location
S02        120        Ollie        Cluster Exploration
In the above table, we have partial dependencies; let us see how.
The prime key attributes are StudentID and ProjectNo, where
StudentID = Unique ID of the student
StudentName = Name of the student
ProjectNo = Unique ID of the project
ProjectName = Name of the project
A non-prime attribute is partially dependent when it is functionally dependent on only part of a
candidate key. Here the non-prime attributes are StudentName and ProjectName.
StudentName can be determined by StudentID alone, so StudentName is partially dependent
on the key.
ProjectName can be determined by ProjectNo alone, so ProjectName is partially dependent
on the key.
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C."
A multivalued dependency is written with a double arrow (->->). Eg:
P ->-> Q
P ->-> R
In the above case, Multivalued Dependency exists only if Q and R are independent
attributes.
A table with multivalued dependency violates the 4NF.
Example
Let us see an example:
<Student>
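In the above table, we can see that the students Amit and Akash have an interest in more than
one activity.
This is a multivalued dependency because the CourseDiscipline of a student is
independent of the student's Activities, while both depend on the student.
Therefore, the multivalued dependencies are StudentName ->-> CourseDiscipline and
StudentName ->-> Activities. Splitting the table on them gives: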
<StudentCourse>
StudentName  CourseDiscipline
Amit         Mathematics
Amit         Mathematics
Yuvraj       Computers
Akash        Literature
Akash        Literature
Akash        Literature
<StudentActivities>
StudentName  Activities
Amit         Singing
Amit         Dancing
Yuvraj       Cricket
Akash        Dancing
Akash        Cricket
Akash        Singing
This breaks the multivalued dependency: courses and activities are now recorded in separate
tables, each depending only on the student.
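To see that nothing is lost by the split, the two tables can be joined back on StudentName. The sketch below copies the rows of StudentCourse and StudentActivities listed above into Python sets (so repeated rows collapse into one); the variable names are assumptions.

```python
# Rows of the StudentCourse and StudentActivities tables; a relation is a
# set of tuples, so the repeated rows appear only once here.
student_course = {
    ("Amit", "Mathematics"),
    ("Yuvraj", "Computers"),
    ("Akash", "Literature"),
}
student_activities = {
    ("Amit", "Singing"), ("Amit", "Dancing"),
    ("Yuvraj", "Cricket"),
    ("Akash", "Dancing"), ("Akash", "Cricket"), ("Akash", "Singing"),
}

# Natural join on StudentName: every course of a student is combined with
# every activity of the same student.
student = {
    (name, course, activity)
    for name, course in student_course
    for other, activity in student_activities
    if name == other
}

for row in sorted(student):
    print(row)
```

The join yields one row per (student, course, activity) combination, six rows in total, which is the redundancy that a single combined table would have to store.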
NORMALIZATION
A relational database is merely a collection of data, organized in a particular manner. Database
normalization is a series of steps followed to obtain a database design that allows for consistent
storage and efficient access of data in a relational database. The concept of normalization was
introduced by E.F. Codd (known as the father of the relational data model) as the basis for
database design. He defined the first, second and third normal forms according to the
constraints that each normal form satisfies.
Normalization is the process of identifying the logical associations between data items and
designing a database that represents these associations without any type of anomaly.
Normalization may reduce system performance, since data has to be cross-referenced from many
tables. Thus de-normalization is sometimes used to improve performance, at the cost of reduced
consistency guarantees.
Database Normalization is a technique of organizing the data in the database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy (repetition) and
undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization is used mainly for two purposes:
1. Eliminating redundant (useless) data.
2. Ensuring data dependencies make sense, i.e. data is logically stored.
If a table is not properly normalized and has data redundancy, it will not only take up extra
storage space but will also make it difficult to handle and update the database without facing
data loss. Insertion, update and deletion anomalies are very frequent if a database is not
normalized.
Steps of Normalization
We have various levels or steps in normalization called Normal Forms. The level of complexity,
the strength of the rule and the degree of decomposition increase as we move from a lower normal
form to a higher one. A table in a relational database is said to be in a certain normal form if it
satisfies certain constraints.
Normalization towards a logical design consists of the following steps:
Unnormalized Form (UNF): Identify all data elements
First Normal Form (1NF): Find the key with which you can find all data i.e. remove any
repeating group
Second Normal Form (2NF): Remove part-key dependencies (partial dependency). Make all
data dependent on the whole key.
Third Normal Form (3NF): Remove non-key dependencies (transitive dependencies). Make all
data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to the third
normal form (there is no transitive dependency).
First Normal Form (1NF)
A relation is said to be in first normal form (1NF) if and only if all underlying domains contain
atomic values only, i.e. the domain of an attribute must include only atomic (simple,
indivisible) values, and the value of any attribute in a tuple must be a single value from the
domain of that attribute.
1NF does not allow:
composite attributes
multivalued attributes
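A conversion into 1NF can be sketched as follows. The relation, its attribute names and its values are hypothetical: Name stands for a composite attribute and Skills for a multivalued one.

```python
# Hypothetical unnormalized rows: Name is composite, Skills is multivalued.
unnormalized = [
    {"EmpID": 1, "Name": ("Abebe", "Kebede"), "Skills": ["SQL", "Java"]},
    {"EmpID": 2, "Name": ("Sara", "Ali"), "Skills": ["Python"]},
]


def to_1nf(rows):
    """Return rows in which every attribute holds a single atomic value."""
    flat = []
    for row in rows:
        first, last = row["Name"]        # split the composite attribute
        for skill in row["Skills"]:      # one row per value of the multivalued attribute
            flat.append({
                "EmpID": row["EmpID"],
                "FirstName": first,
                "LastName": last,
                "Skill": skill,
            })
    return flat


for row in to_1nf(unnormalized):
    print(row)
```

After the conversion EmpID alone no longer identifies a row; in this sketch the key becomes {EmpID, Skill}, which is why the names are now repeated and further normalization is needed.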
The following diagram depicts the steps of normalization into 1NF.
Business rule: Whenever an employee participates in a project, he/she will be entitled to an
incentive.
This schema is in 1NF since we don't have any repeating groups or multivalued attributes.
To convert it into 2NF, we need to remove all partial dependencies of non-key
attributes on part of the primary key.
As we can see, some non-key attributes are partially dependent on some part of the primary key.
This can be seen by analyzing the first two functional dependencies (FD1 and FD2). Thus,
each functional dependency, together with its dependent attributes, should be moved to a new
relation (as shown below) in which the determinant is the primary key.
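The decomposition into 2NF amounts to projecting the relation onto the attributes of each functional dependency. The sketch below assumes a relation EMP_PROJ with FD1: EmpID -> EmpName, FD2: ProjNo -> ProjName and FD3: {EmpID, ProjNo} -> Incentive; these attribute names and all values are hypothetical stand-ins for the schema in the figure.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed, order preserved."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result


# Hypothetical 1NF relation with primary key {EmpID, ProjNo}.
emp_proj = [
    {"EmpID": "E1", "ProjNo": "P1", "EmpName": "Abebe", "ProjName": "Payroll", "Incentive": 500},
    {"EmpID": "E1", "ProjNo": "P2", "EmpName": "Abebe", "ProjName": "Billing", "Incentive": 300},
    {"EmpID": "E2", "ProjNo": "P1", "EmpName": "Sara", "ProjName": "Payroll", "Incentive": 400},
]

# One relation per determinant; the determinant becomes the primary key.
employee = project(emp_proj, ["EmpID", "EmpName"])                 # FD1
project_table = project(emp_proj, ["ProjNo", "ProjName"])          # FD2
emp_proj_2nf = project(emp_proj, ["EmpID", "ProjNo", "Incentive"])  # FD3

print(employee)
print(project_table)
print(emp_proj_2nf)
```

Each employee name and each project name is now stored once, while the incentive stays with the whole key it depends on.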
This schema is in 2NF since the primary key is a single attribute and there are no repeating
groups (multivalued attributes).
Let's take StudID, Year and Dormitory and see the dependencies.
Year cannot determine StudID, and Dormitory cannot determine StudID. Then transitively
To convert it into 3NF, we need to remove all transitive dependencies of non-key attributes on
another non-key attribute. The non-primary-key attributes that depend on each other will be
moved to another table and linked with the main table using a Candidate Key - Foreign Key
relationship as shown below.
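A minimal sketch of this step for the StudID, Year and Dormitory example follows. It assumes the chain StudID -> Year and Year -> Dormitory; this direction is an assumption, since the text above does not spell the dependencies out, and the rows are made up.

```python
# Hypothetical 2NF relation with key StudID, assuming StudID -> Year and
# Year -> Dormitory (so Dormitory depends on StudID only transitively).
student = [
    {"StudID": "S1", "Year": 1, "Dormitory": "401"},
    {"StudID": "S2", "Year": 1, "Dormitory": "401"},
    {"StudID": "S3", "Year": 2, "Dormitory": "402"},
]

# 3NF decomposition: move the dependency between non-key attributes into
# its own table; Year stays in the main table as a foreign key.
stud_year = sorted({(r["StudID"], r["Year"]) for r in student})
year_dorm = sorted({(r["Year"], r["Dormitory"]) for r in student})

# Joining the two tables on Year gives back the original rows.
rejoined = sorted(
    (stud, year, dorm)
    for stud, year in stud_year
    for other_year, dorm in year_dorm
    if year == other_year
)
original = sorted((r["StudID"], r["Year"], r["Dormitory"]) for r in student)

print(year_dorm)
print(rejoined == original)
```

The dormitory of each year is stored once in the second table, so changing it no longer requires updating every student row of that year.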
Generally, even though there are four additional levels of normalization, a table is said to
be normalized if it reaches 3NF. A database with all tables in 3NF is said to be a normalized
database.
Tips for remembering the rationale for normalization up to 3NF could be the following:
1. No Repetition or Redundancy: no repeating fields in the table.
2. The Fields depend upon the Key: the table should solely depend on the key.
3. The Whole Key: no partial key dependency.
4. And nothing but the Key: no inter data dependency.
The correct solution, to bring the model into 4th normal form, is to ensure that all M:M
relationships are resolved independently if they are indeed independent, as shown below.
Def: A table is in DKNF if every constraint on the table is a logical consequence of the
definition of keys and domains.
The underlying ideas in normalization are simple enough. Through normalization we want to design
for our relational database a set of tables that:
(1) Contain all the data necessary for the purposes that the database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.
The following figure shows the graphical illustration of different phases of normalization.
Pitfalls of Normalization