Functional Dependency and Normalisation
Storing the same information redundantly, that is, in more than one place within
a database, can lead to several problems:
FUNCTIONAL DEPENDENCIES
A functional dependency (FD) is a relationship
between two attributes, typically between the PK and
other non-key attributes within a table. For any
relation R, attribute Y is functionally dependent on
attribute X (usually the PK), if for every valid instance
of X, that value of X uniquely determines the value of
Y. This relationship is indicated by the representation
below:
X → Y
The left side of the above FD diagram is called the
determinant, and the right side is the dependent.
HIMS22
Here are a few examples.
In the first example, below, SIN determines Name, Address and Birthdate. Given SIN,
we can determine any of the other attributes within the table.
As you look at this table, ask yourself: What kind of dependencies can we observe among
the attributes in Table R? Since the values of A are unique (a1, a2, a3, etc.), it follows from
the FD definition that:
A → B, A → C, A → D, A → E
Since the values of E are always the same (all e1), it follows that, for example:
A → E, B → E, C → E, D → E
However, we cannot summarize the above with ABCD → E. That single FD says less
than the individual ones: a composite determinant such as AB → E does not, in
general, imply A → E or B → E. Other observations:
Looking at actual data can help clarify which attributes are dependent and
which are determinants.
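As a rough illustration of checking dependencies against actual data, the sketch below tests whether an FD holds in a given set of rows. The function name fd_holds and the sample rows are hypothetical (they are not the Table R from these notes); they only mimic its pattern of a unique A and a constant E.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs holds in this particular set of rows."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance: A is unique, E is constant, B repeats.
R = [
    {"A": "a1", "B": "b1", "C": "c1", "E": "e1"},
    {"A": "a2", "B": "b1", "C": "c2", "E": "e1"},
    {"A": "a3", "B": "b2", "C": "c2", "E": "e1"},
]

print(fd_holds(R, ["A"], ["B"]))  # A is unique, so A -> B holds
print(fd_holds(R, ["B"], ["E"]))  # E is constant, so B -> E holds
print(fd_holds(R, ["B"], ["C"]))  # b1 appears with c1 and c2, so B -> C fails
```

Note that a check like this can only refute an FD. A dependency that happens to hold in one sample of data is not guaranteed to hold for every valid instance of the relation.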
Inference Rules
Armstrong’s axioms are a set of inference rules used to infer all the
functional dependencies on a relational database. They were developed by
William W. Armstrong. The following describes what will be used, in
terms of notation, to explain these axioms.
Let R(U) be a relation scheme over the set of attributes
U. We will use the letters X, Y, Z to represent any
subset of U and, for short, write XY for the union of two
sets of attributes, instead of the usual X ∪ Y.
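Axiom of reflexivity
If Y is a subset of X, then X → Y.
Axiom of augmentation
If X → Y, then XZ → YZ for any Z.
Axiom of transitivity
If X → Y and Y → Z, then X → Z.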
The table below has information not directly related to the student; for
instance, ProgramID and ProgramName should have a table of their own.
ProgramName is not directly dependent on StudentNo; it is dependent on
ProgramID.
StudentNo → StudentName, Address, City, Prov,
PC, ProgramID, ProgramName
This situation is not desirable because a non-key attribute (ProgramName)
depends on another non-key attribute (ProgramID).
To fix this problem, we need to break this table into two: one to hold
information about the student and the other to hold information about the
program.
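The transitive chain in this example can be made concrete by computing an attribute closure, which repeatedly applies the known FDs until nothing new can be added. This is a minimal sketch, not a full inference engine; the closure function and the FD encoding are illustrative choices, and only a few of the student attributes are included.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we gain its dependents.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds: List[FD] = [
    (frozenset({"StudentNo"}), frozenset({"StudentName", "ProgramID"})),
    (frozenset({"ProgramID"}), frozenset({"ProgramName"})),
]

# StudentNo reaches ProgramName only through ProgramID (transitivity).
print(sorted(closure({"StudentNo"}, fds)))
# ProgramID on its own determines ProgramName but nothing about the student.
print(sorted(closure({"ProgramID"}, fds)))
```

The second closure is what justifies the split: ProgramID and ProgramName form a self-contained unit that can live in its own table.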
Union
This rule suggests that if two tables are separate, and the PK is the same,
you may want to consider putting them together. It states that if X
determines Y and X determines Z, then X must also determine Y and Z
(see Figure 11.4).
For example, given SIN → EmpName and SIN → SpouseName, you may want to
join these two tables into one as follows:
SIN → EmpName, SpouseName
Some database administrators (DBAs) might choose to keep these tables
separated for a couple of reasons. One, each table describes a different
entity so the entities should be kept apart. Two, if SpouseName is to be
left NULL most of the time, there is no need to include it in the same table
as EmpName.
Decomposition
Decomposition is the reverse of the Union rule. If you have a table that
appears to contain two entities that are determined by the same PK,
consider breaking them up into two tables. This rule states that if X
determines Y and Z, then X determines Y and X determines Z separately
(see Figure 11.5).
Partial Dependencies:
ProjectNo → ProjName
EmpNo → EmpName, DeptNo
Full Dependency (on the whole composite key):
ProjectNo, EmpNo → HrsWork
Transitive Dependency:
DeptNo → DeptName
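The dependencies listed above can be sorted into categories mechanically once the primary key is known. The sketch below assumes the composite key is (ProjectNo, EmpNo) and uses a simplified rule: a determinant equal to the key is a full dependency, a proper subset of the key is a partial dependency, and a determinant made only of non-key attributes is treated as transitive. The classify function is a hypothetical helper, not a standard routine.

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent)


def classify(fd: FD, key: FrozenSet[str]) -> str:
    """Label an FD relative to the primary key (simplified rule set)."""
    lhs, _ = fd
    if lhs == key:
        return "full"
    if lhs < key:  # proper subset of the composite key
        return "partial"
    if not (lhs & key):  # determinant contains no key attribute
        return "transitive"
    return "other"


key = frozenset({"ProjectNo", "EmpNo"})
fds: List[FD] = [
    (frozenset({"ProjectNo"}), frozenset({"ProjName"})),
    (frozenset({"EmpNo"}), frozenset({"EmpName", "DeptNo"})),
    (frozenset({"ProjectNo", "EmpNo"}), frozenset({"HrsWork"})),
    (frozenset({"DeptNo"}), frozenset({"DeptName"})),
]

for lhs, rhs in fds:
    print(sorted(lhs), "->", sorted(rhs), ":", classify((lhs, rhs), key))
```

Under these assumptions, HrsWork is the only attribute that depends on the whole key, which is why it stays with the composite key when the table is later split.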
Normalization
Normalization should be part of the database design process. However, it is
difficult to separate the normalization process from the ER modelling
process so the two techniques should be used concurrently.
Use an entity relationship diagram (ERD) to provide
the big picture, or macro view, of an organization’s
data requirements and operations. This is created
through an iterative process that involves
identifying relevant entities, their attributes and
their relationships.
The normalization procedure focuses on the characteristics of specific entities and
represents the micro view of entities within the ERD.
What Is Normalization?
Normalization is the branch of relational theory that provides design insights. It is the
process of determining how much redundancy exists in a table. The goals of normalization are
to:
Normal Forms
All the tables in any database can be in one of the
normal forms we will discuss next. Ideally, the only
redundancy we want is the minimum needed to link tables:
primary key (PK) values repeated as foreign keys (FK).
Everything else should be derived from other tables. There are
six normal forms, but we will only look at the first
four, which are:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form
In the following example, the structure of the data set does not conform to the
requirements of a relational table, nor does it handle data very well.
Consider the following deficiencies:
1. The project number (PROJ_NUM) is apparently intended to be the primary key, or at least
a part of a PK, but it contains nulls.
2. The table entries invite data inconsistencies. For example, the JOB_CLASS value "Elect.
Engineer" might be entered as "Elect. Eng."
3. The table displays data redundancies. Those data redundancies yield the following
anomalies:
• Update anomalies. Modifying the JOB_CLASS for employee number 105 requires
(potentially) many alterations, one for each EMP_NUM=105.
• Insertion anomalies. Just to complete a row definition, a new employee must be
assigned to a project. If the employee is not assigned, a phantom project must be
created to complete the employee data entry.
• Deletion anomalies. Suppose that only one employee is associated with
a given project. If that employee leaves the company and the employee
data are deleted, the project information will also be deleted. To prevent
the loss of the project information, a fictitious employee must be created
just to save the project information.
The Normalization Process:
The objective of normalization is to produce tables with the following
characteristics:
▪ Each table represents a single subject. For example, a course table will
contain only data that directly pertains to courses. Similarly, a student
table will contain only student data.
▪ No data item will be unnecessarily stored in more than one table (in
short, tables have minimum controlled redundancy). The reason for this
requirement is to ensure that the data are updated in only one place.
▪ All nonprime attributes in a table are dependent on the primary key. The
reason for this requirement is to ensure that the data are uniquely
identifiable by a primary key value.
▪ Each table is void of insertion, update, or deletion anomalies. This is to
ensure the integrity and consistency of the data.
Conversion to First Normal Form (1NF)
Step 1: Eliminate the Repeating Groups
Start by presenting the data in tabular format, where each cell has a single
value and there are no repeating groups. A repeating group derives its
name from the fact that a group of multiple entries of the same type can
exist for any single key attribute occurrence. To eliminate the repeating
groups, eliminate the nulls by making sure that each repeating group
attribute contains an appropriate data value.
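One way to picture this step is as a fill-down operation: whenever a repeating-group attribute is null, it takes the value from the nearest row above it. The sketch below assumes that the nulls in the source data mean "same as the previous row", which is how repeating groups are usually laid out in report-style tables. The function name and the sample values are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_repeating_groups(rows: List[Row], group_attrs: List[str]) -> List[Row]:
    """Replace nulls in group_attrs with the last non-null value seen above."""
    filled: List[Row] = []
    last: Dict[str, Optional[str]] = {}
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for attr in group_attrs:
            if new_row.get(attr) is None:
                new_row[attr] = last.get(attr)
            else:
                last[attr] = new_row[attr]
        filled.append(new_row)
    return filled


# Hypothetical rows: PROJ_NUM is only written on the first row of each group.
raw: List[Row] = [
    {"PROJ_NUM": "15", "EMP_NUM": "103"},
    {"PROJ_NUM": None, "EMP_NUM": "101"},
    {"PROJ_NUM": "18", "EMP_NUM": "114"},
    {"PROJ_NUM": None, "EMP_NUM": "118"},
]

for row in fill_repeating_groups(raw, ["PROJ_NUM"]):
    print(row)
```

After the fill, every row carries its own PROJ_NUM, so each cell holds a single value and no key attribute is null.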
Step 2: Identify the primary key:
Even casual observers will note that PROJ_NUM is not an adequate
primary key because the project number does not uniquely identify all of
the remaining entity (row) attributes. To maintain a proper primary key
that will uniquely identify any attribute value, the new key must be
composed of a combination of PROJ_NUM and EMP_NUM.
Step 3: Identify All Dependencies:
The identification of the PK in Step 2 means that you have already
identified the following dependency:
Converting to 2NF is done only when the 1NF has a composite primary key.
If the 1NF has a single-attribute primary key, then the table is automatically
in 2NF. The 1NF-to-2NF conversion is simple. Starting with the 1NF table,
write each key component on a separate line; then write the original (composite)
key on the last line.
▪ PROJ_NUM
▪ EMP_NUM
▪ PROJ_NUM EMP_NUM
Each component will become the key in a new
table. In other words, the original table is now
divided into three tables:
A table is in second normal form (2NF) when:
▪ It is in 1NF, and
▪ It includes no partial dependencies; that is, no attribute is dependent on
only a portion of the primary key.
Note that it is still possible for a table in 2NF to exhibit transitive
dependency; that is, one or more attributes may be functionally dependent on
non-key attributes.
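The 1NF-to-2NF split can be sketched as grouping each dependent attribute under the key component that determines it. In the example below, PROJ_NUM, EMP_NUM and JOB_CLASS come from these notes, while PROJ_NAME, EMP_NAME and HOURS are assumed attribute names added only for illustration. The to_2nf helper is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent)


def to_2nf(key: FrozenSet[str], fds: List[FD]) -> Dict[FrozenSet[str], List[str]]:
    """Map each key component, and the full key, to the attributes it determines."""
    # One table per key component, plus one for the original composite key.
    tables: Dict[FrozenSet[str], List[str]] = {frozenset({k}): [] for k in sorted(key)}
    tables[key] = []
    for lhs, rhs in fds:
        # FDs whose determinant is not a key component are skipped here;
        # they are transitive dependencies and are dealt with in 3NF.
        if lhs in tables:
            tables[lhs].extend(sorted(rhs))
    return tables


key = frozenset({"PROJ_NUM", "EMP_NUM"})
fds: List[FD] = [
    (frozenset({"PROJ_NUM"}), frozenset({"PROJ_NAME"})),              # partial
    (frozenset({"EMP_NUM"}), frozenset({"EMP_NAME", "JOB_CLASS"})),   # partial
    (key, frozenset({"HOURS"})),                                      # full
]

tables = to_2nf(key, fds)
for table_key, attrs in tables.items():
    print(sorted(table_key), "->", attrs)
```

Each resulting table has a key that fully determines its other attributes, so no partial dependency remains.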
Conversion to Third Normal Form (3NF):
▪ JOB_CLASS
Step 2: Identify the Dependent Attributes
A table is in third normal form (3NF) when:
• It is in 2NF
• It contains no transitive dependencies
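Removing a transitive dependency can be sketched as moving the determinant's dependents into a new table, while the determinant stays behind as a foreign key. The example assumes an employee table in which JOB_CLASS determines a charge rate. CHG_HOUR and EMP_NAME are assumed attribute names not given in these notes, and Table and split_transitive are hypothetical helpers.

```python
from typing import List, NamedTuple, Tuple


class Table(NamedTuple):
    key: List[str]
    attrs: List[str]


def split_transitive(
    table: Table, determinant: str, dependents: List[str]
) -> Tuple[Table, Table]:
    """Move dependents of a non-key determinant into their own table."""
    remaining = [a for a in table.attrs if a not in dependents]
    if determinant not in remaining:
        raise ValueError("determinant must be a non-key attribute of the table")
    # The determinant stays in the original table as a foreign key
    # and becomes the primary key of the new table.
    return Table(table.key, remaining), Table([determinant], list(dependents))


employee = Table(key=["EMP_NUM"], attrs=["EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
employee_3nf, job = split_transitive(employee, "JOB_CLASS", ["CHG_HOUR"])

print(employee_3nf)
print(job)
```

After the split, every non-key attribute in each table depends only on that table's key, which is the condition the two bullets above describe.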