Functional Dependency and Normalisation

The document discusses database issues caused by data redundancy such as update, insertion, and deletion anomalies. It defines functional dependencies and provides examples. Rules for functional dependencies are described, including Armstrong's axioms like reflexivity, augmentation, and transitivity. The concepts of normalization, dependency diagrams, and decomposition are introduced to help address redundancy issues.

Uploaded by

Eric Awat

HIMS22

DATABASE ISSUES AND REMEDIES

Problems Caused by Redundancy

Storing the same information redundantly, that is, in more than one place within
a database, can lead to several problems:

• Redundant storage: Some information is stored repeatedly.


• Update anomalies: If one copy of such repeated data is updated, an
inconsistency is created unless all copies are similarly updated.
• Insertion anomalies: It may not be possible to store some information
unless some other information is stored as well.
• Deletion anomalies: It may not be possible to delete some information
without losing some other information as well.

FUNCTIONAL DEPENDENCIES
A functional dependency (FD) is a relationship
between two attributes, typically between the PK and
other non-key attributes within a table. For any
relation R, attribute Y is functionally dependent on
attribute X (usually the PK), if for every valid instance
of X, that value of X uniquely determines the value of
Y. This relationship is indicated by the representation
below:

X ———–> Y
The left side of the above FD diagram is called the
determinant, and the right side is the dependent.
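
As a minimal sketch of this definition, the Python function below tests whether an FD holds in a given set of rows: it fails as soon as one determinant value is paired with two different dependent values. The function name fd_holds and the sample rows are illustrative assumptions, not part of the original material.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, str]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs holds in rows (X uniquely determines Y)."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)  # determinant value
        y = tuple(row[a] for a in rhs)  # dependent value
        # The first time x is seen, remember y; afterwards y must match.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows, for illustration only.
people = [
    {"SIN": "100", "Name": "Ann", "City": "Hull"},
    {"SIN": "200", "Name": "Bob", "City": "Hull"},
    {"SIN": "300", "Name": "Ann", "City": "Oslo"},
]

print(fd_holds(people, ["SIN"], ["Name", "City"]))  # True
print(fd_holds(people, ["Name"], ["City"]))         # False: "Ann" has two cities
```

A check like this can only refute an FD from sample data. Whether an FD really holds is a statement about every valid instance of the table, so it comes from the meaning of the data, not from one sample.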
Here are a few examples.
In the first example, below, SIN determines Name, Address and Birthdate. Given SIN,
we can determine any of the other attributes within the table.

SIN ———-> Name, Address, Birthdate

For the second example, SIN and Course determine the date completed
(Date Completed). This must also work for a composite PK.

SIN, Course ———> Date Completed

The third example indicates that ISBN determines Title.

ISBN ———–> Title

Rules of Functional Dependencies


Consider the following table of data r(R) of the relation schema
R(ABCDE) shown in Table 1.

As you look at this table, ask yourself: What kind of dependencies can we observe among
the attributes in Table R? Since the values of A are unique (a1, a2, a3, etc.), it follows from
the FD definition that:
A → B, A → C, A → D, A → E

• It also follows that A →BC (or any other subset of ABCDE).


• This can be summarized as A →BCDE.
• From our understanding of primary keys, A is a primary key.

Table 1. Functional dependency example.

Since the values of E are always the same (all e1), it follows that:
A → E, B → E, C → E, D → E

However, we cannot summarize the above with ABCD → E. That dependency does
follow from each of the four, but it is weaker: ABCD → E on its own does not
tell us that A → E, B → E, or AB → E. Other observations:

1. Combinations of BC are unique, therefore BC →ADE.


2. Combinations of BD are unique, therefore BD → ACE.
3. If C values match, so do D values.
i Therefore, C → D
ii However, D values don’t determine C values
iii So C determines D, but D does not determine C.

Looking at actual data can help clarify which attributes are dependent and
which are determinants.
Inference Rules
Armstrong’s axioms are a set of inference rules used to infer all the
functional dependencies on a relational database. They were developed by
William W. Armstrong. The following describes what will be used, in
terms of notation, to explain these axioms.
Let R(U) be a relation scheme over the set of attributes
U. We will use the letters X, Y, Z to represent any
subset of U and, for short, XY to denote the union of two sets of
attributes, instead of the usual X ∪ Y.

Axiom of reflexivity

This axiom says, if Y is a subset of X, then X determines Y (see Figure 1.1).
For example, PartNo —> NT123 where X (PartNo) is composed of more
than one piece of information; i.e., Y (NT) and partID (123).

Figure 1.1. Equation for axiom of reflexivity.

Axiom of augmentation

The axiom of augmentation says if X determines Y, then XZ determines YZ
for any Z (see Figure 1.2).

Figure 1.2. Equation for axiom of augmentation.

Augmentation helps explain partial dependencies: an attribute that depends
on only part of a composite PK also depends, by augmentation, on the whole
PK. Good design requires that every non-key attribute be fully dependent on
the PK. In the example shown below, StudentName, Address, City, Prov, and
PC (postal code) are only dependent on the StudentNo, not on the StudentNo
and Course.
StudentNo, Course —> StudentName, Address, City,
Prov, PC, Grade, DateCompleted
This situation is not desirable because every non-key attribute has to be
fully dependent on the PK. In this situation, student information depends
on only part of the PK (StudentNo).
To fix this problem, we need to break the original table
down into two as follows:
• Table 1: StudentNo, Course, Grade, DateCompleted
• Table 2: StudentNo, StudentName, Address, City, Prov, PC
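
The split above can be sketched in Python as two projections of the original rows, followed by a natural join to confirm that no information was lost. The helper names (project, natural_join, as_set), the shortened attribute list, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Keep only attrs from each row and drop duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Join two row lists on the attributes they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows: List[Row]) -> set:
    """Order-independent view of a row list, used for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


# Hypothetical rows with a shortened attribute list.
original = [
    {"StudentNo": "S1", "Course": "DB", "StudentName": "Ann", "City": "Hull", "Grade": "A"},
    {"StudentNo": "S1", "Course": "OS", "StudentName": "Ann", "City": "Hull", "Grade": "B"},
    {"StudentNo": "S2", "Course": "DB", "StudentName": "Bob", "City": "Oslo", "Grade": "C"},
]

table1 = project(original, ["StudentNo", "Course", "Grade"])
table2 = project(original, ["StudentNo", "StudentName", "City"])

print(len(table2))  # 2: each student is now stored once
print(as_set(natural_join(table1, table2)) == as_set(original))  # True
```

The join reproduces the original rows because the shared attribute, StudentNo, is the key of the second table.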

Axiom of transitivity

The axiom of transitivity says if X determines Y, and Y determines Z,
then X must also determine Z (see Figure 1.3).

Figure 1.3. Equation for axiom of transitivity.

The table below has information not directly related to the student; for
instance, ProgramID and ProgramName should have a table of their own.
ProgramName is not dependent on StudentNo; it’s dependent on
ProgramID.
StudentNo —> StudentName, Address, City, Prov,
PC, ProgramID, ProgramName
This situation is not desirable because a non-key attribute (ProgramName)
depends on another non-key attribute (ProgramID).

To fix this problem, we need to break this table into two: one to hold
information about the student and the other to hold information about the
program.

• Table 1: StudentNo —> StudentName, Address, City, Prov, PC, ProgramID
• Table 2: ProgramID —> ProgramName

However, we still need to leave an FK (ProgramID) in the student table so
that we can identify which program the student is enrolled in.

Union

This rule suggests that if two tables are separate, and the PK is the same,
you may want to consider putting them together. It states that if X
determines Y and X determines Z then X must also determine Y and Z
(see Figure 1.4).

Figure 1.4. Equation for the Union rule.

For example, if:


• SIN —> EmpName
• SIN —> SpouseName

You may want to join these two tables into one as follows:
SIN –> EmpName, SpouseName
Some database administrators (DBA) might choose to keep these tables
separated for a couple of reasons. One, each table describes a different
entity so the entities should be kept apart. Two, if SpouseName is to be
left NULL most of the time, there is no need to include it in the same table
as EmpName.

Decomposition

Decomposition is the reverse of the Union rule. If you have a table that
appears to contain two entities that are determined by the same PK,
consider breaking them up into two tables. This rule states that if X
determines Y and Z, then X determines Y and X determines Z separately
(see Figure 1.5).

Figure 1.5. Equation for decomposition rule.
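
The inference rules can be applied mechanically. A common way to do so is to compute the closure of a set of attributes, that is, everything those attributes determine under a given list of FDs. The sketch below is one such implementation, not the only one. The names fd, closure and implies are hypothetical, and the FDs are taken from the student and program example above.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names."""
    return (frozenset(lhs.split(",")), frozenset(rhs.split(",")))


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute determined by attrs under fds."""
    result = set(attrs)  # reflexivity: attrs determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already covered, add its dependents.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: List[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [
    fd("StudentNo", "StudentName,ProgramID"),
    fd("ProgramID", "ProgramName"),
]

print(sorted(closure({"StudentNo"}, fds)))
# ['ProgramID', 'ProgramName', 'StudentName', 'StudentNo']
print(implies(fds, {"StudentNo"}, {"ProgramName"}))  # True, by transitivity
print(implies(fds, {"ProgramID"}, {"StudentNo"}))    # False
```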


Dependency Diagram
A dependency diagram, shown in Figure 1.6, illustrates the various
dependencies that might exist in a non-normalized table. A non-normalized
table is one that has data redundancy in it.

Figure 1.6. Dependency diagram.

The following dependencies are identified in this table:

• ProjectNo and EmpNo, combined, are the PK.
• Partial dependencies:
  ProjectNo —> ProjName
  EmpNo —> EmpName, DeptNo
• Full dependency on the PK:
  ProjectNo, EmpNo —> HrsWork
• Transitive dependency:
  DeptNo —> DeptName
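
As a rough sketch, the kind of each dependency can be read off by comparing its determinant with the PK. The classify function below is a hypothetical helper and assumes the table has a single candidate key. A dependency on the whole PK is reported as full, one on part of the PK as partial, and one whose determinant contains no key attribute as transitive.

```python
from typing import Dict, FrozenSet


def classify(determinant: FrozenSet[str], key: FrozenSet[str]) -> str:
    """Classify a dependency by comparing its determinant with the PK."""
    if determinant == key:
        return "full"        # depends on the whole PK
    if determinant < key:
        return "partial"     # depends on a proper subset of the PK
    if determinant.isdisjoint(key):
        return "transitive"  # determinant is made of non-key attributes
    return "other"


key = frozenset({"ProjectNo", "EmpNo"})

# Label -> determinant, taken from the dependency diagram discussed above.
dependencies: Dict[str, FrozenSet[str]] = {
    "ProjectNo -> ProjName": frozenset({"ProjectNo"}),
    "EmpNo -> EmpName, DeptNo": frozenset({"EmpNo"}),
    "ProjectNo, EmpNo -> HrsWork": frozenset({"ProjectNo", "EmpNo"}),
    "DeptNo -> DeptName": frozenset({"DeptNo"}),
}

kinds = {label: classify(det, key) for label, det in dependencies.items()}
for label, kind in kinds.items():
    print(f"{kind:10} {label}")
```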
Normalization
Normalization should be part of the database design process. However, it is
difficult to separate the normalization process from the ER modelling
process so the two techniques should be used concurrently.
Use an entity relationship diagram (ERD) to provide the big picture, or
macro view, of an organization’s data requirements and operations. This is
created through an iterative process that involves identifying relevant
entities, their attributes and their relationships.
The normalization procedure focuses on characteristics of specific entities
and represents the micro view of entities within the ERD.

What Is Normalization?
Normalization is the branch of relational theory that provides design insights. It is the
process of determining how much redundancy exists in a table. The goals of normalization are
to:

• Be able to characterize the level of redundancy in a relational schema
• Provide mechanisms for transforming schemas in order to remove redundancy

Normalization theory draws heavily on the theory of functional
dependencies. Normalization theory defines six normal forms (NF). Each
normal form involves a set of dependency properties that a schema must
satisfy and each normal form gives guarantees about the presence and/or
absence of update anomalies. This means that higher normal forms have
less redundancy, and as a result, fewer update problems.

Normal Forms
All the tables in any database can be in one of the
normal forms we will discuss next. Ideally we only
want minimal redundancy for PK to FK. Everything
else should be derived from other tables. There are
six normal forms, but we will only look at the first
four, which are:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)

BCNF is rarely used.

The need for normalization

In the following example:

In that example, the structure of the data set does not conform to the
requirements of a table, nor does it handle data very well.
Consider the following deficiencies:
1. The project number (PROJ_NUM) is apparently intended to be a primary key,
or at least a part of a PK, but it contains nulls.
2. The table entries invite data inconsistencies. For example, the JOB_CLASS
value "Elect. Engineer" might be entered as "Elect. Eng."
3. The table displays data redundancies. Those data redundancies yield the
following anomalies:
• Update anomalies. Modifying the JOB_CLASS for employee number 105 requires
(potentially) many alterations, one for each EMP_NUM=105.
• Insertion anomalies. Just to complete a row definition, a new employee must be
assigned to a project. If the employee is not assigned, a phantom project must be
created to complete the employee data entry.
• Deletion anomalies. Suppose that only one employee is associated with
a given project. If that employee leaves the company and the employee
data are deleted, the project information will also be deleted. To prevent
the loss of the project information, a fictitious employee must be created
just to save the project information.

The Normalization Process:

We will learn how to use normalization to produce a set of normalized tables
to store the data that will be used to generate the required information. The
objective of normalization is to ensure that each table conforms to the
concept of well-formed relations, that is, tables that have the following
characteristics:

▪ Each table represents a single subject. For example, a course table will
contain only data that directly pertains to courses. Similarly, a student
table will contain only student data.
▪ No data item will be unnecessarily stored in more than one table (in
short, tables have minimum controlled redundancy). The reason for this
requirement is to ensure that the data are updated in only one place.
▪ All nonprime attributes in a table are dependent on the primary key. The
reason for this requirement is to ensure that the data are uniquely
identifiable by a primary key value.
▪ Each table is void of insertion, update, or deletion anomalies. This is to
ensure the integrity and consistency of the data.

Conversion to First Normal Form (1NF)

Step 1: Eliminate the Repeating Groups

Start by presenting the data in tabular format, where each cell has a single
value and there are no repeating groups. A repeating group derives its
name from the fact that a group of multiple entries of the same type can
exist for any single key attribute occurrence. To eliminate the repeating
groups, eliminate the nulls by making sure that each repeating group
attribute contains an appropriate data value.

Step 2: Identify the Primary Key

Even casual observers will note that PROJ_NUM is not an adequate
primary key because the project number does not uniquely identify all of
the remaining entity (row) attributes. To maintain a proper primary key
that will uniquely identify any attribute value, the new key must be
composed of a combination of PROJ_NUM and EMP_NUM.

Step 3: Identify All Dependencies

The identification of the PK in Step 2 means that you have already
identified the following dependency:

▪ PROJ_NUM, EMP_NUM —> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS

Conversion to Second Normal Form (2NF)

Converting to 2NF is done only when the 1NF has a composite primary key.
If the 1NF has a single-attribute primary key, then the table is automatically
in 2NF. The 1NF-to-2NF conversion is simple, starting with:

Step 1: Write Each Key Component on a Separate Line

Write each key component on a separate line; then write the original (composite)
key on the last line.

▪ PROJ_NUM

▪ EMP_NUM

▪ PROJ_NUM EMP_NUM
Each component will become the key in a new table. In other words, the
original table is now divided into three tables (PROJECT, EMPLOYEE, and
ASSIGNMENT).

Step 2: Assign Corresponding Dependent Attributes

Use the dependency diagram to determine those attributes that are
dependent on other attributes.

▪ PROJECT (PROJ_NUM, PROJ_NAME)
▪ EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
▪ ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

A table is in second normal form (2NF) when:

▪ It is in 1NF, and
▪ It includes no partial dependencies; that is, no attribute is dependent on
only a portion of the primary key.

Note that it is still possible for a table in 2NF to exhibit transitive
dependency; that is, one or more attributes may be functionally dependent
on non-key attributes.
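
The two conversion steps can be sketched in Python. Each partial dependency becomes its own table, keyed by the part of the PK it depends on, and whatever is left stays with the composite key. The function to_2nf is a hypothetical helper written for this example, and the hours attribute is called HOURS here, as in the 1NF dependency (the ASSIGNMENT table renames it ASSIGN_HOURS).

```python
from typing import Dict, FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def to_2nf(
    key: FrozenSet[str], attrs: FrozenSet[str], fds: List[FD]
) -> Dict[FrozenSet[str], FrozenSet[str]]:
    """Map each new table's key to the full set of attributes it holds."""
    tables: Dict[FrozenSet[str], FrozenSet[str]] = {}
    placed = set()
    for lhs, rhs in fds:
        if lhs < key:  # partial dependency: determinant is part of the PK
            tables[lhs] = tables.get(lhs, lhs) | rhs
            placed |= rhs
    # Attributes not moved out stay in the table with the composite key.
    tables[key] = key | (attrs - placed)
    return tables


key = frozenset({"PROJ_NUM", "EMP_NUM"})
attrs = frozenset(
    {"PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"}
)
fds = [
    (frozenset({"PROJ_NUM"}), frozenset({"PROJ_NAME"})),
    (frozenset({"EMP_NUM"}), frozenset({"EMP_NAME", "JOB_CLASS", "CHG_HOUR"})),
    (key, frozenset({"HOURS"})),
]

tables = to_2nf(key, attrs, fds)
for table_key, table_attrs in tables.items():
    print(sorted(table_key), "->", sorted(table_attrs))
```

The three resulting attribute sets correspond to PROJECT, EMPLOYEE and ASSIGNMENT. The transitive dependency of CHG_HOUR on JOB_CLASS is untouched at this stage.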
Conversion to Third Normal Form (3NF)

Step 1: Identify Each New Determinant

For every transitive dependency, write its determinant as the PK for a
new table.

▪ JOB_CLASS

Step 2: Identify the Dependent Attributes

Identify the attributes that are dependent on each determinant identified in
Step 1 and identify the dependency.

▪ JOB_CLASS —> CHG_HOUR

Name the table to reflect its contents and function. In this case, JOB
seems appropriate.

Step 3: Remove the Dependent Attributes from Transitive Dependencies

Eliminate all dependent attributes in the transitive relationship(s) from
each of the tables that have such a transitive relationship.

▪ EMP_NUM —> EMP_NAME, JOB_CLASS

Note that JOB_CLASS remains in the EMPLOYEE table to serve as the FK.
After the 3NF conversion has been completed, your database contains
four tables:

▪ PROJECT (PROJ_NUM, PROJ_NAME)
▪ EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
▪ JOB (JOB_CLASS, CHG_HOUR)
▪ ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

A table is in 3NF when:

• It is in 2NF
• It contains no transitive dependencies
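
To make the 3NF result concrete, the sketch below builds PROJECT, JOB, EMPLOYEE and ASSIGNMENT tables with Python's built-in sqlite3 module and shows that a change to a charge rate now touches a single row. The column types, names such as "Alpha" and "Alice", and all numeric values are invented for illustration. Only the table and attribute names, employee number 105, and the job class "Elect. Engineer" come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE project (proj_num INTEGER PRIMARY KEY, proj_name TEXT NOT NULL);
    CREATE TABLE job (job_class TEXT PRIMARY KEY, chg_hour REAL NOT NULL);
    CREATE TABLE employee (
        emp_num   INTEGER PRIMARY KEY,
        emp_name  TEXT NOT NULL,
        job_class TEXT NOT NULL REFERENCES job(job_class)
    );
    CREATE TABLE assignment (
        proj_num     INTEGER NOT NULL REFERENCES project(proj_num),
        emp_num      INTEGER NOT NULL REFERENCES employee(emp_num),
        assign_hours REAL NOT NULL,
        PRIMARY KEY (proj_num, emp_num)
    );
    """
)

# Hypothetical sample data.
conn.executemany("INSERT INTO project VALUES (?, ?)", [(1, "Alpha"), (2, "Beta")])
conn.executemany(
    "INSERT INTO job VALUES (?, ?)", [("Elect. Engineer", 80.0), ("Programmer", 35.0)]
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(105, "Alice", "Elect. Engineer"), (106, "Bob", "Programmer")],
)
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?)",
    [(1, 105, 10.0), (2, 105, 5.0), (1, 106, 8.0)],
)

# The charge rate is stored once, so one row changes no matter how many
# assignments employee 105 has.
updated = conn.execute(
    "UPDATE job SET chg_hour = 90.0 WHERE job_class = 'Elect. Engineer'"
).rowcount

# Rebuild the original wide report by joining the four tables.
rows = conn.execute(
    """
    SELECT p.proj_name, e.emp_name, j.chg_hour, a.assign_hours
    FROM assignment AS a
    JOIN project  AS p ON p.proj_num  = a.proj_num
    JOIN employee AS e ON e.emp_num   = a.emp_num
    JOIN job      AS j ON j.job_class = e.job_class
    ORDER BY p.proj_name, e.emp_name
    """
).fetchall()

print(updated)  # 1
for row in rows:
    print(row)
```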

Dependency-Preserving Decomposition into 3NF
