Chapter 4 Normalization

Chapter 7 discusses functional dependencies and normalization in relational databases, emphasizing the importance of clear semantics, reducing redundancy, and avoiding anomalies. It outlines various types of dependencies, including full, partial, and transitive dependencies, and introduces normalization steps to eliminate data redundancy and ensure logical data storage. The chapter concludes with a description of the normalization process and its significance in database design.

Uploaded by

Tesfalegn Yakob
Copyright
© All Rights Reserved

Database Management Systems Chapter 7

Functional Dependencies and Normalization for Relational Databases


In the relational model, each relation schema consists of a number of attributes, and the
relational database schema consists of a number of relation schemas. Conceptual data models
such as the ER model lead the designer to identify entity types and relationship types and their
respective attributes, which results in a natural and logical grouping of the attributes into
relations.
There are two levels at which we can discuss the goodness of relation schemas. The first is the
logical (or conceptual) level: how users interpret the relation schemas and the meaning of their
attributes. Having good relation schemas at this level enables users to understand clearly the
meaning of the data in the relations, and hence to formulate their queries correctly.
The second is the implementation (or physical storage) level: how the tuples in a base relation,
which are physically stored as files, are stored and updated.

Informal Design Guidelines for Relation Schemas


There are four informal measures that may be used to determine the quality of a relation schema
design:
• Making sure that the semantics of the attributes are clear in the schema.
• Reducing the redundant information in tuples.
• Reducing the NULL values in tuples.
• Disallowing the possibility of generating spurious tuples.
These measures are not always independent of one another.

Clear Semantics to Attributes in Relations


The semantics of a relation refers to the interpretation of attribute values in a tuple. Whenever
the attributes are grouped to form a relation schema, it is assumed that attributes belonging to
one relation have certain real-world meaning and a proper interpretation associated with them.
Consider a simplified version of the COMPANY relational database schema, as shown in the figure
below.

Faculty of computing and software engineering G2-SW

Now consider the following diagram, which shows an example of populated relation states for the
above schema.

The meaning of the EMPLOYEE relation schema is quite simple: each tuple represents an
employee, with values for the employee's name (Ename), Social Security number (Ssn), birth
date (Bdate), and address (Address), and the department number that the employee works for
(Dnumber). The Dnumber attribute is a foreign key that represents an implicit relationship
between EMPLOYEE and DEPARTMENT.
The semantics of the DEPARTMENT and PROJECT schemas are also straightforward: each
DEPARTMENT tuple represents a department entity, and each PROJECT tuple represents a
project entity. The attribute Dmgr_ssn of DEPARTMENT relates a department to the employee
who is its manager, while Dnum of PROJECT relates a project to its controlling department;
both are foreign key attributes. The ease with which the meaning of a relation's attributes can be
explained is an informal measure of how well the relation is designed.


Reducing Redundant Information in Tuples


Consider the two base relations EMPLOYEE and DEPARTMENT as shown in the figure below.

Now consider the EMP_DEPT base relation shown in the diagram below, which is the result of
applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT from the above
diagram.

In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname,
Dmgr_ssn) are repeated for every employee who works for that department. In contrast, each
department's information appears only once in the DEPARTMENT relation in the first figure
(which shows the EMPLOYEE and DEPARTMENT relations separately). Only the department
number (Dnumber) is repeated in the EMPLOYEE relation, as a foreign key, for each employee
who works in that department.
Another problem with using the EMP_DEPT base relation in the figure above is the problem of
update anomalies. These can be classified into:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies

Insertion Anomalies
An “insertion anomaly” is a failure to place information about a new database entry. For
example, to insert a new tuple for an employee who works in department 5 into the EMP_DEPT
relation, we must also enter the attribute values for that department, consistently with the
other tuples for department 5. It is also difficult to insert a new department that has no
employees as yet into the EMP_DEPT relation. The only way to do this is to place NULL values in
the attributes for the employee. This causes a problem because Ssn is the primary key of
EMP_DEPT and so cannot be NULL, and each tuple is supposed to represent an employee entity,
not a department entity.
This problem does not occur in the design that keeps EMPLOYEE and DEPARTMENT as two
different relations, because a department is entered in the DEPARTMENT relation whether or not
any employees work for it, and a new employee is added to the EMPLOYEE relation only.


Deletion Anomalies
A “deletion anomaly” is a failure to remove information about an existing database entry.
Additionally, deleting one piece of data may result in the loss of other information. For example,
if an employee tuple (that happens to be for the last employee working in a particular
department) is deleted from EMP_DEPT, the information concerning that department is lost from
the database.

Modification Anomalies
A “modification anomaly” is a failure to modify/update information about an existing database
entry. For example, in EMP_DEPT of the above figure, if we change the value of one of the
attributes of a particular department (say, the manager of department 5), we must update the
tuples of all employees who work in that department; otherwise, the database will become
inconsistent.
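
The modification and deletion anomalies can be reproduced on a tiny in-memory copy of EMP_DEPT. The following is a minimal sketch: the row values and the helper names `managers_of` and `departments` are assumptions made for illustration and are not taken from the figure.

```python
# Hypothetical EMP_DEPT rows; the values are illustrative, not copied from the figure.
emp_dept = [
    {"Ename": "Smith", "Ssn": "111", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ename": "Wong", "Ssn": "333", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ename": "Borg", "Ssn": "888", "Dnumber": 1, "Dname": "Headquarters", "Dmgr_ssn": "888"},
]


def managers_of(rows, dnumber):
    """Return the set of manager Ssn values recorded for one department."""
    return {r["Dmgr_ssn"] for r in rows if r["Dnumber"] == dnumber}


def departments(rows):
    """Return the set of department numbers that appear in the relation."""
    return {r["Dnumber"] for r in rows}


# Modification anomaly: updating only one of the department-5 tuples
# leaves two different managers on record for the same department.
partially_updated = [dict(r) for r in emp_dept]
partially_updated[0]["Dmgr_ssn"] = "111"
inconsistent = managers_of(partially_updated, 5)

# Deletion anomaly: removing the only employee of department 1
# also removes every trace of that department.
after_delete = [r for r in emp_dept if r["Ssn"] != "888"]
remaining = departments(after_delete)

print(sorted(inconsistent))  # two manager values for department 5
print(sorted(remaining))     # department 1 no longer appears
```

With EMPLOYEE and DEPARTMENT kept as separate relations, the manager would be stored in exactly one DEPARTMENT tuple, so neither situation could arise.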

NULL values in tuples


In some schema designs, if many of the attributes do not apply to all tuples in the relation, we
end up with many NULLs in those tuples. This can waste space at the storage level and may also
lead to problems with understanding the meaning of the attributes. Another problem with
NULLs is how to account for them in aggregate operations such as COUNT or SUM.
As far as possible, avoid placing attributes in a base relation whose values may frequently be
NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and do
not apply to a majority of tuples in the relation.
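
The effect of NULLs on aggregates can be seen with Python's built-in `sqlite3` module. This is a small sketch under assumed data: the EMPLOYEE table with columns `E_id` and `Salary` and its three rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical EMPLOYEE table; one salary is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (E_id INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [(101, 100), (102, None), (103, 300)],
)

# COUNT(*) counts rows, while COUNT(Salary), SUM and AVG skip the NULL salary.
count_all, count_salary, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary) FROM EMPLOYEE"
).fetchone()

print(count_all, count_salary, total, average)
```

Here the row count and the salary count disagree, and the average is taken over two employees rather than three, which is easy to misread when many tuples carry NULLs.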

Generation of spurious tuples


Consider two base relations, EMPLOYEE and DEPARTMENT, as given below.

If we take the Cartesian product of the above relations (a JOIN with no join condition), the
following relation results.

In the above relation, you can observe that there are some meaningless tuples (called spurious
tuples). For example, consider the second tuple in the above relation. It shows that the employee
with E_id = 101, who earns a salary of 100, belongs to the CS & IT department and also to the
Electrical department. This is clearly spurious information, since one employee cannot belong to
two departments. So this tuple is a spurious tuple and is marked by asterisks (*).
To obtain the correct data, we have to apply a condition to the JOIN operation. For example,
with the condition
EMPLOYEE.E_id = DEPARTMENT.Dep_id
we retrieve only tuples 1, 5 and 9, which are the required ones.
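
The difference between the unrestricted product and the conditioned join can be sketched in a few lines of Python. The rows below are assumptions chosen to match the description (three employees, three departments, and Dep_id values that coincide with E_id values); the figure's actual data is not reproduced here.

```python
from itertools import product

# Hypothetical rows: (E_id, Salary) and (Dep_id, Dep_name).
employee = [(101, 100), (102, 200), (103, 300)]
department = [(101, "CS & IT"), (102, "Electrical"), (103, "Mechanical")]

# Cartesian product: every employee tuple paired with every department tuple.
cartesian = [e + d for e, d in product(employee, department)]

# Join condition EMPLOYEE.E_id = DEPARTMENT.Dep_id keeps only matching pairs.
joined = [row for row in cartesian if row[0] == row[2]]

# 1-based positions of the surviving tuples within the Cartesian product.
positions = [i for i, row in enumerate(cartesian, start=1) if row[0] == row[2]]

print(len(cartesian), positions)
```

Of the nine tuples in the product, six are spurious; the condition keeps only the tuples at positions 1, 5 and 9.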

Functional Dependency
In general, a functional dependency is a relationship among attributes.
Data Dependency
The logical associations between data items that point the database designer in the direction of a
good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if a given
value of data item A always appears with the same value of data item B. If the data item A is the
determinant data item and B the dependent data item, then the direction of the association is from
A to B and not vice versa.
The essence of this idea is that if A has a given value, then B must have one specific
corresponding value; we then say that "B is functionally dependent on A." It is also possible to
say that "A functionally determines B," "B is a function of A," "A functionally governs B," or
"If A, then B."
The notation is A -> B, which is read as: B is functionally dependent on A.

Since the type of Wine served depends on the type of Dinner, we say Wine is functionally
dependent on Dinner, i.e., Dinner -> Wine.
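
Whether a dependency such as Dinner -> Wine is violated by a given set of rows can be checked mechanically. The sketch below uses a hypothetical helper `fd_holds` and made-up menu rows; note that passing the check on sample data only shows the dependency is not contradicted, since a functional dependency is a statement about all valid states of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows; the dishes and wines are illustrative only.
menu = [
    {"Dinner": "Steak", "Wine": "Red"},
    {"Dinner": "Fish", "Wine": "White"},
    {"Dinner": "Lamb", "Wine": "Red"},
    {"Dinner": "Steak", "Wine": "Red"},
]

dinner_determines_wine = fd_holds(menu, ["Dinner"], ["Wine"])
wine_determines_dinner = fd_holds(menu, ["Wine"], ["Dinner"])
print(dinner_determines_wine, wine_determines_dinner)
```

The second check fails because red wine appears with two different dinners, which illustrates that the association runs from A to B and not vice versa.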

Full Dependency
If an attribute that is not a member of the primary key depends on the whole key and not on
any proper part of the primary key, then that attribute is fully functionally dependent on the
primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} -> C holds but neither A -> C nor B -> C holds, then C is fully functionally dependent
on {A, B}.
E.g.: {Ssn, P_number} -> Hours
is a full dependency because Hours depends on both attributes Ssn and P_number together, not
on either of them separately.
Let us see an example:
<ProjectCost>
ProjectID   ProjectCost
001         1000
001         5000
<EmployeeProject>
EmpID   ProjectID   Days
E099    001         320
E056    002         190
In the above relations, Days is the number of days an employee spent on a project, and the
following dependency holds:
{EmpID, ProjectID, ProjectCost} -> Days

However, it is not a full functional dependency, because the proper subset {EmpID, ProjectID}
is already enough to determine the Days spent on the project by the employee.
This gives our full functional dependency:
{EmpID, ProjectID} -> Days
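
The test for a full dependency (no proper subset of the left-hand side already determines the right-hand side) can be sketched as follows. The helper names `fd_holds` and `is_full_fd` are hypothetical, and the last two rows are assumed extra data: with only the two rows shown above, EmpID alone would appear to determine Days.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


# Combined rows; the last two are hypothetical additions so that each
# employee and each project appears more than once.
employee_project = [
    {"EmpID": "E099", "ProjectID": "001", "ProjectCost": 1000, "Days": 320},
    {"EmpID": "E056", "ProjectID": "002", "ProjectCost": 5000, "Days": 190},
    {"EmpID": "E099", "ProjectID": "002", "ProjectCost": 5000, "Days": 45},
    {"EmpID": "E056", "ProjectID": "001", "ProjectCost": 1000, "Days": 60},
]

with_cost = is_full_fd(employee_project, ["EmpID", "ProjectID", "ProjectCost"], ["Days"])
without_cost = is_full_fd(employee_project, ["EmpID", "ProjectID"], ["Days"])
print(with_cost, without_cost)
```

The three-attribute dependency is rejected because {EmpID, ProjectID} alone suffices, while the two-attribute dependency is accepted as full on this sample.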

Partial Dependency
If an attribute that is not a member of the primary key depends on only some part of the
primary key, then that attribute is partially functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} -> C holds and also B -> C or A -> C holds,
then C is partially functionally dependent on {A, B}.
E.g.: {Ssn, Pnumber} -> Ename
is a partial dependency, because Ssn -> Ename.
Example:
<StudentProject>
StudentID   ProjectNo   StudentName   ProjectName
S01         199         Katie         Geo Location
S02         120         Ollie         Cluster Exploration
In the above table, we have partial dependencies; let us see how.
The prime (key) attributes are StudentID and ProjectNo, where
StudentID = Unique ID of the student
StudentName = Name of the student
ProjectNo = Unique ID of the project
ProjectName = Name of the project

A non-prime attribute is partially dependent when it is functionally dependent on only part of a
candidate key. Here the non-prime attributes are StudentName and ProjectName.


StudentName can be determined by StudentID alone, so StudentName is partially dependent on
the key.
ProjectName can be determined by ProjectNo alone, so ProjectName is partially dependent on
the key.
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C."
E.g.:

In the above case:

If Dinner -> Wine and
also Wine -> Service, then
Dinner -> Service
i.e., in general, if (A -> B) and (B -> C), then A -> C.
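
Repeatedly applying this rule gives the set of all attributes that a starting set determines, usually called its attribute closure. The following is a minimal sketch; the function name `closure` and the representation of dependencies as pairs of tuples are choices made for this illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs.

    `fds` is a list of (lhs, rhs) pairs, each a tuple of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [(("Dinner",), ("Wine",)), (("Wine",), ("Service",))]

from_dinner = closure({"Dinner"}, fds)
from_wine = closure({"Wine"}, fds)
print(sorted(from_dinner), sorted(from_wine))
```

Service appears in the closure of {Dinner} even though no dependency lists it directly, which is exactly the transitive dependency Dinner -> Service.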

What is Multi-valued dependency?


A multivalued dependency occurs when the existence of one or more rows in a table implies
the existence of one or more other rows in the same table.
If a table has attributes P, Q and R, then Q and R are multivalued facts of P.
It is represented by a double arrow:

->->

For our example:

P ->-> Q
P ->-> R

In the above case, a multivalued dependency exists only if Q and R are independent
of each other.
A table with such a multivalued dependency violates 4NF.

Example
Let us see an example:
<Student>
StudentName   CourseDiscipline   Activities
Amit          Mathematics        Singing
Amit          Mathematics        Dancing
Yuvraj        Computers          Cricket
Akash         Literature         Dancing
Akash         Literature         Cricket
Akash         Literature         Singing

In the above table, we can see that students Amit and Akash have an interest in more than
one activity.
This is a multivalued dependency because the CourseDiscipline of a student is
independent of the student's Activities, while both depend on the student.
Therefore, the multivalued dependencies are:

StudentName ->-> CourseDiscipline
StudentName ->-> Activities

The above relation violates Fourth Normal Form.


To correct it, divide the table into two separate tables, which breaks the multivalued
dependency. Each table keeps only distinct rows:

<StudentCourse>
StudentName   CourseDiscipline
Amit          Mathematics
Yuvraj        Computers
Akash         Literature

<StudentActivities>
StudentName   Activities
Amit          Singing
Amit          Dancing
Yuvraj        Cricket
Akash         Dancing
Akash         Cricket
Akash         Singing

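This breaks the multivalued dependency. In StudentCourse we now have the functional
dependency

StudentName -> CourseDiscipline

while StudentActivities simply lists every (StudentName, Activities) pair, since a student may
have several activities.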
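
A quick way to see that this decomposition loses nothing is to project the original Student rows onto the two new tables and join them back on StudentName. The sketch below models each relation as a Python set of tuples, so duplicate rows disappear automatically, as they do in a relation.

```python
# The Student relation from the example, as a set of tuples
# (StudentName, CourseDiscipline, Activities).
student = {
    ("Amit", "Mathematics", "Singing"),
    ("Amit", "Mathematics", "Dancing"),
    ("Yuvraj", "Computers", "Cricket"),
    ("Akash", "Literature", "Dancing"),
    ("Akash", "Literature", "Cricket"),
    ("Akash", "Literature", "Singing"),
}

# Projections: sets drop repeated tuples.
student_course = {(name, course) for name, course, _ in student}
student_activities = {(name, activity) for name, _, activity in student}

# Natural join of the two projections on StudentName.
rejoined = {
    (name, course, activity)
    for name, course in student_course
    for other_name, activity in student_activities
    if name == other_name
}

print(len(student_course), len(student_activities), rejoined == student)
```

The join reproduces exactly the six original tuples, with no spurious ones, so the two smaller tables carry the same information as the original.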

NORMALIZATION
A relational database is merely a collection of data, organized in a particular manner. Database
normalization is a series of steps followed to obtain a database design that allows for consistent
storage and efficient access of data in a relational database. The concept of normalization was
introduced by E.F. Codd (known as the father of the relational data model) as the basis for
database design. He defined first, second and third normal forms according to the
constraints that each normal form satisfies.
Normalization is the process of identifying the logical associations between data items and
designing a database that represents such associations without any type of anomaly.
Normalization may reduce system performance, since data must be cross-referenced from many
tables. Thus de-normalization is sometimes used to improve performance, at the cost of reduced
consistency guarantees.
Database normalization is a technique for organizing the data in the database. It is a
systematic approach of decomposing tables to eliminate data redundancy (repetition) and
undesirable characteristics such as insertion, update and deletion anomalies.
Normalization is used mainly for two purposes:
• Eliminating redundant (repeated) data.
• Ensuring data dependencies make sense, i.e., data is logically stored.
Problems without Normalization


If a table is not properly normalized and has data redundancy, it will not only take up extra
storage space but will also make it difficult to handle and update the database without risking
data loss. Insertion, update and deletion anomalies are very frequent if a database is not
normalized.
Steps of Normalization
Normalization proceeds through various levels or steps called normal forms. The level of
complexity, the strength of the rule and the degree of decomposition increase as we move from a
lower normal form to a higher one. A table in a relational database is said to be in a certain
normal form if it satisfies certain constraints.
Normalization towards a logical design consists of the following steps:
Unnormalized Form (UNF): Identify all data elements.
First Normal Form (1NF): Find the key with which you can find all data, i.e., remove any
repeating groups.
Second Normal Form (2NF): Remove part-key dependencies (partial dependencies). Make all
data dependent on the whole key.
Third Normal Form (3NF): Remove non-key dependencies (transitive dependencies). Make all
data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to the third
normal form (there is no transitive dependency).
First Normal Form (1NF)
A relation is said to be in first normal form (1NF) if and only if all underlying domains contain
atomic values only, i.e., the domain of an attribute must include only atomic (simple,
indivisible) values, and the value of any attribute in a tuple must be a single value from the
domain of that attribute.
1NF does not allow:
• composite attributes
• multivalued attributes
The following diagram depicts the steps of normalization into 1NF.
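
One common way to remove a multivalued attribute is to move it into its own relation, keyed by the original key. The sketch below assumes a hypothetical employee record with a multivalued `Skills` attribute; the attribute names, the values and the helper `to_1nf` are illustrative and do not come from the diagram.

```python
# Hypothetical unnormalized rows: "Skills" holds several values per employee.
unnormalized = [
    {"EmpID": "E099", "Name": "Abebe", "Skills": ["SQL", "Python"]},
    {"EmpID": "E056", "Name": "Sara", "Skills": ["Java"]},
]


def to_1nf(rows, key, multivalued):
    """Split one multivalued attribute out into its own relation.

    Returns (base, detail): base keeps the single-valued attributes,
    detail holds one row per (key, value) pair of the multivalued attribute.
    """
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    detail = [
        {key: row[key], multivalued: value}
        for row in rows
        for value in row[multivalued]
    ]
    return base, detail


employee, employee_skill = to_1nf(unnormalized, "EmpID", "Skills")
print(employee)
print(employee_skill)
```

Every attribute value in both resulting relations is now a single atomic value, and EmpID links each skill back to its employee.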


Second Normal form (2NF)


A relation in 2NF has no partial dependency of a non-key attribute on part of the primary key.
Removing such dependencies results in a set of relations in Second Normal Form.
Definition: A table (relation) is in 2NF if
• It is in 1NF, and
• All non-key attributes are dependent on the entire primary key, i.e., there is no partial
dependency.
That means a relation R is said to be in 2NF if it is in 1NF and every non-key attribute is
fully functionally dependent on the primary key of R.

Example for 2NF:


Consider the relation schema given below.
EMP_PROJ

Business rule: Whenever an employee participates in a project, he/she will be entitled to an
incentive.
This schema is in 1NF since we don't have any repeating groups or multivalued attributes.
To convert it into 2NF, we need to remove all partial dependencies of non-key
attributes on part of the primary key.

In addition to this, we have the following dependencies:


As we can see, some non-key attributes are partially dependent on some part of the primary key.
This can be seen by analyzing the first two functional dependencies (FD1 and FD2). Thus,
each functional dependency, with its dependent attributes, should be moved to a new relation
(as shown below) in which the determinant is the primary key.
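
The same move can be sketched in code. Because the attributes of EMP_PROJ are not listed in the text, the sketch uses the StudentProject table from the partial-dependency example as a stand-in; the third row and the helper `project` are assumptions added for illustration.

```python
def project(rows, attributes):
    """Relational projection: keep the given attributes and drop duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


student_project = [
    {"StudentID": "S01", "ProjectNo": "199", "StudentName": "Katie", "ProjectName": "Geo Location"},
    {"StudentID": "S02", "ProjectNo": "120", "StudentName": "Ollie", "ProjectName": "Cluster Exploration"},
    # Hypothetical extra row: S01 also joins project 120, so both names repeat.
    {"StudentID": "S01", "ProjectNo": "120", "StudentName": "Katie", "ProjectName": "Cluster Exploration"},
]

# Each partial dependency moves to its own relation, keyed by its determinant.
student = project(student_project, ["StudentID", "StudentName"])
project_info = project(student_project, ["ProjectNo", "ProjectName"])
# The original key stays behind to record who works on what.
assignment = project(student_project, ["StudentID", "ProjectNo"])

print(len(student), len(project_info), len(assignment))
```

After the split, each student name and each project name is stored once, and the remaining relation holds only the whole key.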

Third Normal Form (3NF)

Eliminate columns dependent on another non-primary-key attribute: if attributes do not
contribute to a description of the key, remove them to a separate table. This level avoids
update and delete anomalies.
Definition: A table (relation) is in 3NF if:
• It is in 2NF, and
• There are no transitive dependencies between the primary key and non-primary-key
attributes.
Example for (3NF)
Assumption: Students of same batch (same year) live in the same dormitory

This schema is in 2NF since the primary key is a single attribute and there are no repeating
groups (multivalued attributes).
Let's take StudID, Year and Dormitory and see the dependencies: StudID -> Year and
Year -> Dormitory.

Year cannot determine StudID, and Dormitory cannot determine StudID. Then, transitively,
StudID -> Dormitory.

To convert it into 3NF, we need to remove all transitive dependencies of non-key attributes on
another non-key attribute. The non-primary-key attributes that depend on each other will be
moved to another table and linked with the main table using a candidate key / foreign key
relationship, as shown below.
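
A minimal sketch of such a decomposition, using Python's built-in `sqlite3` module, is given here. The table names STUDENT and DORM, the column layout and the sample rows are assumptions for illustration; only StudID, Year and Dormitory come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Year -> Dormitory moves to its own table; STUDENT keeps Year as a foreign key.
conn.executescript(
    """
    CREATE TABLE DORM (
        Year      INTEGER PRIMARY KEY,
        Dormitory TEXT NOT NULL
    );
    CREATE TABLE STUDENT (
        StudID TEXT PRIMARY KEY,
        Year   INTEGER NOT NULL REFERENCES DORM (Year)
    );
    """
)
conn.executemany("INSERT INTO DORM VALUES (?, ?)", [(1, "Block A"), (2, "Block B")])
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [("S1", 1), ("S2", 1), ("S3", 2)])

# Moving the whole first-year batch is now a change to a single row.
cursor = conn.execute("UPDATE DORM SET Dormitory = ? WHERE Year = ?", ("Block C", 1))
rows_changed = cursor.rowcount

dorms = conn.execute(
    "SELECT s.StudID, d.Dormitory FROM STUDENT s "
    "JOIN DORM d ON s.Year = d.Year ORDER BY s.StudID"
).fetchall()
print(rows_changed, dorms)
```

Because the dormitory is stored once per year, the update touches one row and every student of that batch sees the new value, which avoids the modification anomaly of the single-table design.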

Generally, even though there are four further levels of normalization, a table is said to
be normalized if it reaches 3NF. A database with all tables in 3NF is said to be a normalized
database.
Tips for remembering the rationale for normalization up to 3NF could be the following:
1. No repeating groups or redundancy: no repeating fields in the table.
2. The fields depend upon the key: every field should depend solely on the key.
3. The whole key: no partial key dependency.
4. And nothing but the key: no dependency between non-key fields.

Other Levels of Normalization

1. Boyce-Codd Normal Form (BCNF)

Def: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.

2. Fourth Normal Form (4NF)

Isolate Independent Multiple Relationships - No table may contain two or more 1:N or N:M
relationships that are not directly related.

The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:N
relationships are resolved independently if they are indeed independent, as shown below.

Def: A table is in 4NF if it is in BCNF and if it has no non-trivial multi-valued dependency.

3. Fifth Normal Form (5NF)

Isolate Semantically Related Multiple Relationships - There may be practical constraints on
information that justify separating logically related many-to-many relationships.

A model limited to only simple (elemental) facts, as expressed in ORM.


Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in
4NF and if every join dependency in the table is a consequence of the candidate
keys of the table.

4. Domain-Key Normal Form (DKNF)


A model free from all modification anomalies.

Def: A table is in DKNF if every constraint on the table is a logical consequence of the
definition of keys and domains.

The underlying ideas in normalization are simple enough. Through normalization we want to design
for our relational database a set of tables that:

(1) Contain all the data necessary for the purposes that the database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.

The following figure shows the graphical illustration of different phases of normalization.


Pitfalls of Normalization

• Requires data to see the problems
• May reduce performance of the system
• Is time consuming
• Is difficult to design and apply
• Is prone to human error
