
Chapter 5-Functional Dependency and Normalization

Chapter 5 of 'Fundamentals of Database Systems' discusses functional dependency and normalization in database design, emphasizing the importance of analyzing attribute groupings to improve design quality. It introduces concepts such as functional dependencies, normalization processes, and various normal forms (1NF, 2NF, 3NF) to minimize redundancy and maintain data integrity. The chapter also highlights the significance of keys and attributes in establishing relationships within relational schemas.

Uploaded by

Mintesnot Yigezu
Copyright
© All Rights Reserved

Fundamentals of Database Systems (InSy2041)
Chapter 5: Functional Dependency and Normalization

Compiled by Amsalu Dinote (MSc.)


01/22/2025 Fundamentals of Database Systems 1
Introduction
• So far, we have assumed that attributes are grouped
to form a relation schema by using the common
sense of the database designer or by mapping a
database schema design from a conceptual data model
such as the ER or Enhanced-ER (EER) data model.
• These models make the designer identify entity types
and relationship types and their respective attributes,
which leads to a natural and logical grouping of the
attributes into relations when the mapping procedures
are followed.
• However, we still need some formal way of analyzing
why one grouping of attributes into a relation schema
may be better than another.
Introduction(cont..)
• While discussing database design in earlier chapters, we did not
develop any measure of appropriateness or
goodness to measure the quality of the design, other
than the intuition of the designer.
• In this chapter we discuss some of the theory that
has been developed with the goal of evaluating
relational schemas for design quality—that is, to
measure formally why one set of groupings of
attributes into relation schemas is better than
another.
• Relational database design ultimately produces a set
of relations. The implicit goals of the design activity
are information preservation and minimum
redundancy.
Introduction(cont..)
• Information is very hard to quantify—hence we consider
information preservation in terms of maintaining all concepts,
including attribute types, entity types, and relationship types as
well as generalization/specialization relationships, which are
described using a model such as the EER model.
• Thus, the relational design must preserve all of these concepts,
which are originally captured in the conceptual design after the
conceptual to logical design mapping.
• Minimizing redundancy implies minimizing redundant
storage of the same information and reducing the need for
multiple updates to maintain consistency across multiple
copies of the same information in response to real-world
events that require making an update.
Functional Dependency
• A functional dependency is a constraint between two sets
of attributes from the database.
• Suppose that our relational database schema has n attributes
A1, A2,...,An; let us think of the whole database as being
described by a single universal relation schema R = {A1,
A2,..., An}.
• We do not imply that we will actually store the database as a
single universal table; we use this concept only in developing
the formal theory of data dependencies.
• A functional dependency, denoted by X → Y, between two
sets of attributes X and Y that are subsets of R specifies a
constraint on the possible tuples that can form a relation state
r of R. The constraint is that, for any two tuples t1 and t2 in r
that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
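The definition above can be checked mechanically against a single relation state. The following is a minimal Python sketch; the function name fd_holds and the sample staff rows are hypothetical and invented for illustration only. Note that a relation state can only refute a functional dependency: if the check passes, the FD holds in that state, but whether it holds for the schema depends on the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples with t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# Hypothetical relation state, invented for illustration only.
staff = [
    {"StaffNo": "S1", "Name": "Abebe", "BranchNo": "B1"},
    {"StaffNo": "S2", "Name": "Sara", "BranchNo": "B1"},
    {"StaffNo": "S3", "Name": "Abebe", "BranchNo": "B2"},
]

print(fd_holds(staff, ["StaffNo"], ["Name"]))      # True
print(fd_holds(staff, ["Name"], ["BranchNo"]))     # False: "Abebe" appears with B1 and B2
print(fd_holds(staff, ["BranchNo"], ["StaffNo"]))  # False: B1 appears with S1 and S2
```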
Functional Dependency(cont..)
• This means that the values of the Y component of a
tuple in r depend on, or are determined by, the
values of the X component; alternatively, the
values of the X component of a tuple uniquely (or
functionally) determine the values of the Y
component.
• We also say that there is a functional dependency
from X to Y, or that Y is functionally dependent on
X. The abbreviation for functional dependency is
FD or f.d. The set of attributes X is called the left-
hand side of the FD, and Y is called the right-
hand side.
Functional Dependency(cont..)
• In simple terms, a functional dependency describes
the relationship between attributes (fields) in a
relation (table). Suppose we have two fields A and B
in a table called T. If each value of A in table T is
associated with exactly one value of B in the same
table, we say that B is functionally dependent on A,
or pictorially:
A → B
• A is called the determinant of the functional
dependency A → B. Each of the fields A and B can be
a single field or a group of fields.
• Consider the following staff_branch table as an
example.
Functional Dependency(cont..)



Normalization
• The normalization process, as first proposed by Codd (1972), takes a
relation schema through a series of tests to certify whether it satisfies a
certain normal form.
• The process, which proceeds in a top-down fashion by evaluating each
relation against the criteria for normal forms and decomposing relations
as necessary, can thus be considered as relational design by analysis.
• Initially, Codd proposed three normal forms, which he called first (1NF),
second (2NF), and third (3NF) normal form.
• A stronger definition of 3NF, called Boyce-Codd normal form (BCNF),
was proposed later by Boyce and Codd.
• All these normal forms are based on a single analytical tool: the
functional dependencies among the attributes of a relation.
• Normalization of data can be considered a process of analyzing the
given relation schemas based on their FDs and primary keys to achieve
the desirable properties of (1) minimizing redundancy and (2)
minimizing the insertion, deletion, and update anomalies.
Normalization(cont..)
• It can be considered as a “filtering” or “purification” process to make
the design have successively better quality. Unsatisfactory relation
schemas that do not meet certain conditions—the normal form tests—
are decomposed into smaller relation schemas that meet the tests and
hence possess the desirable properties. Thus, the normalization
procedure provides database designers with the following:
  - A formal framework for analyzing relation schemas based on their keys
    and on the functional dependencies among their attributes
  - A series of normal form tests that can be carried out on individual
    relation schemas so that the relational database can be normalized to any
    desired degree
• The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has
been normalized.
• Database design as practiced in industry today pays particular attention
to normalization only up to 3NF, BCNF, or at most 4NF.
Definitions of Keys and Attributes
Participating in Keys
• A superkey of a relation schema R = {A1, A2,..., An} is a set
of attributes S ⊆ R with the property that no two tuples t1
and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that
removal of any attribute from K will cause K not to be a
superkey any more.
• The difference between a key and a superkey is that a key
has to be minimal; that is, if we have a key K = {A1, A2,...,
Ak} of R, then K – {Ai} is not a key of R for any Ai, 1 ≤ i ≤ k.
• In the EMPLOYEE relation from the earlier ER example, {Ssn} is a
key, whereas {Ssn}, {Ssn, Ename}, {Ssn, Ename, Bdate}, and any set of
attributes that includes Ssn are all superkeys.
Definitions of Keys and Attributes
Participating in Keys(cont..)
• If a relation schema has more than one key, each is called a
candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called
secondary keys (alternate keys).
• In a practical relational database, each relation schema must
have a primary key. If no candidate key is known for a
relation, the entire relation can be treated as a default
superkey.
• An attribute of relation schema R is called a prime
attribute of R if it is a member of some candidate key of R.
An attribute is called nonprime if it is not a prime attribute
—that is, if it is not a member of any candidate key.
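The definitions of superkey, candidate key, and prime attribute can be made concrete with a short Python sketch. It uses attribute closure, a standard technique that these slides do not cover, to decide whether a set of attributes determines the whole schema. The function names and the dependency Ssn → {Ename, Bdate} are assumptions made for illustration; the dependency is chosen to be consistent with {Ssn} being the key of EMPLOYEE.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) == set(schema)


def candidate_keys(schema, fds):
    """Enumerate minimal superkeys, smallest first (exponential; small schemas only)."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            # A superkey that contains no smaller key is minimal.
            if is_superkey(candidate, schema, fds) and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys


# Assumed schema and dependency, for illustration only.
schema = {"Ssn", "Ename", "Bdate"}
fds = [({"Ssn"}, {"Ename", "Bdate"})]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)   # attributes that belong to some candidate key
nonprime = schema - prime

print(keys)                                         # [{'Ssn'}]
print(is_superkey({"Ssn", "Ename"}, schema, fds))   # True: a superkey, but not minimal
print(sorted(nonprime))                             # ['Bdate', 'Ename']
```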
First Normal Form(1NF)
• It states that the domain of an attribute must
include only atomic (simple, indivisible) values
and that the value of any attribute in a tuple
must be a single value from the domain of that
attribute.
• Hence, 1NF disallows having a set of values, a
tuple of values, or a combination of both as an
attribute value for a single tuple.
• Consider the DEPARTMENT relation on the
next slide. It is not in 1NF because
Dlocations is not an atomic attribute.
First Normal Form(cont..)
• Department relation not in 1NF



First Normal Form(cont..)
• There are three main techniques to achieve first normal form for
such a relation:
1. Remove the attribute Dlocations that violates 1NF and place it in a
separate relation DEPT_LOCATIONS along with the primary key
Dnumber of DEPARTMENT. The primary key of this relation is
the combination {Dnumber, Dlocation}
2. Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT. In
this case, the primary key becomes the combination {Dnumber,
Dlocation}. (This introduces redundancy.)
3. If a maximum number of values is known for the attribute—for
example, if it is known that at most three locations can exist for a
department—replace the Dlocations attribute by three atomic
attributes: Dlocation1, Dlocation2, and Dlocation3. (This introduces
NULL values.)
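The first two techniques can be sketched in Python as follows. The sample DEPARTMENT rows and the function names split_multivalued and expand_key are hypothetical and serve only to illustrate the idea; the multivalued Dlocations attribute is modelled as a list.

```python
# Hypothetical DEPARTMENT state that violates 1NF: Dlocations holds a set of values.
department = [
    {"Dname": "Research", "Dnumber": 5, "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "Dlocations": ["Stafford"]},
    {"Dname": "Headquarters", "Dnumber": 1, "Dlocations": ["Houston"]},
]


def split_multivalued(rows, key, multi_attr, new_attr):
    """Technique 1: move the multivalued attribute into its own relation."""
    base = [{k: v for k, v in row.items() if k != multi_attr} for row in rows]
    extra = [{key: row[key], new_attr: value} for row in rows for value in row[multi_attr]]
    return base, extra


def expand_key(rows, multi_attr, new_attr):
    """Technique 2: one tuple per value, so the other attributes are repeated."""
    return [
        {**{k: v for k, v in row.items() if k != multi_attr}, new_attr: value}
        for row in rows
        for value in row[multi_attr]
    ]


dept, dept_locations = split_multivalued(department, "Dnumber", "Dlocations", "Dlocation")
flat = expand_key(department, "Dlocations", "Dlocation")

print(len(dept), len(dept_locations))  # 3 5
print(len(flat))                       # 5
print(flat[0])  # {'Dname': 'Research', 'Dnumber': 5, 'Dlocation': 'Bellaire'}
```

In the second result, Dname is repeated once per location of a department, which is the redundancy the second technique introduces.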
First Normal Form(cont..)
• Department relation in 1NF with redundancy



Second Normal Form(2NF)
• 2NF is based on the concept of full functional dependency.
• A functional dependency X → Y is a full functional dependency if
removal of any attribute A from X means that the dependency does
not hold any more; that is, for any attribute A ∈ X, (X – {A}) does not
functionally determine Y.
• A functional dependency X → Y is a partial dependency if some
attribute A ∈ X can be removed from X and the dependency still
holds; that is, for some A ∈ X, (X – {A}) → Y.
• For example, in the EMP_PRO table, {Ssn, Pnumber} → Hours is a full
dependency (neither Ssn → Hours nor Pnumber → Hours
holds). However, the dependency {Ssn, Pnumber} → Ename is partial
because Ssn → Ename holds.
• In other words, a relation schema R is in 2NF if it is in 1NF and
every nonprime attribute A in R is fully functionally dependent on
the primary key of R.
Second Normal Form(cont..)
• The test for 2NF involves testing for functional dependencies whose
left-hand side attributes are part of the primary key. If the primary
key contains a single attribute, the test need not be applied at all.
• Let us consider another example, the Patients table below. This
table is in 1NF, and its primary key is
the composite key (PatientId, RelativeId).



Second Normal Form(cont..)
• So, to determine if it satisfies 2NF, you have to
find out if all other fields in it depend fully on
both PatientId and RelativeId; that is, you need to
decide whether the following conditions are true:
(PatientId, RelativeId) → Relationship; and
(PatientId, RelativeId) → Patient_tel.
• However, only the following dependencies actually hold in the
Patients table:
(PatientId, RelativeId) → Relationship; and
(PatientId) → Patient_tel
• Therefore, table Patients is not in 2NF.
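The 2NF test on the Patients table can be sketched in Python. The sketch uses attribute closure (a standard technique not covered in these slides) and assumes that the primary key is the only candidate key, so every attribute outside it is nonprime. The function names are hypothetical; the two dependencies are the ones stated above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """List (part_of_key, attribute) pairs where a nonprime attribute
    depends on a proper subset of the primary key."""
    nonprime = set(schema) - set(key)  # assumes a single candidate key
    violations = []
    key_attrs = sorted(key)
    for size in range(1, len(key_attrs)):  # proper, non-empty subsets only
        for subset in combinations(key_attrs, size):
            determined = closure(subset, fds) & nonprime
            for attr in sorted(determined):
                violations.append((set(subset), attr))
    return violations


schema = {"PatientId", "RelativeId", "Relationship", "Patient_tel"}
key = {"PatientId", "RelativeId"}
fds = [
    ({"PatientId", "RelativeId"}, {"Relationship"}),
    ({"PatientId"}, {"Patient_tel"}),
]

print(partial_dependencies(schema, key, fds))  # [({'PatientId'}, 'Patient_tel')]
```

A non-empty result means the table is not in 2NF; here Patient_tel depends on PatientId alone.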
Second Normal Form(cont..)
• In order to normalize table Patients to 2NF we can break it
into two normalized tables. The Patient_tel field really
doesn’t belong to Patients table because the patients’
telephone numbers have nothing to do with patients’ relatives
and should be associated with patients only.
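The decomposition can be sketched in Python as two projections of the original table. The sample rows, the telephone numbers, and the variable names patient_relatives and patient_phones are hypothetical; the sketch also re-joins the two projections on PatientId to show that no information is lost.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical Patients state, invented for illustration only.
patients = [
    {"PatientId": 1, "RelativeId": 10, "Relationship": "Mother", "Patient_tel": "0911-000001"},
    {"PatientId": 1, "RelativeId": 11, "Relationship": "Brother", "Patient_tel": "0911-000001"},
    {"PatientId": 2, "RelativeId": 12, "Relationship": "Father", "Patient_tel": "0911-000002"},
]

# Each resulting table has only full dependencies on its own key.
patient_relatives = project(patients, ["PatientId", "RelativeId", "Relationship"])
patient_phones = project(patients, ["PatientId", "Patient_tel"])

# Joining the two tables on PatientId reconstructs the original rows.
rejoined = [
    {**relative, **phone}
    for relative in patient_relatives
    for phone in patient_phones
    if relative["PatientId"] == phone["PatientId"]
]

print(len(patient_relatives), len(patient_phones))  # 3 2
print(rejoined == patients)                         # True
```

Each patient's telephone number is now stored once, however many relatives the patient has.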



Third Normal Form(3NF)
• Third normal form (3NF) is based on the concept of transitive
dependency.
• A functional dependency X→Y in a relation schema R is a
transitive dependency if there exists a set of attributes Z in R
that is neither a candidate key nor a subset of any key of R, and
both X→Z and Z→Y hold.
• The dependency Ssn →Dmgr_ssn is transitive through Dnumber
in EMP_DEPT, because both the dependencies Ssn → Dnumber
and Dnumber → Dmgr_ssn hold and Dnumber is neither a key
itself nor a subset of the key of EMP_DEPT. Intuitively, we can
see that the dependency of Dmgr_ssn on Dnumber is undesirable
in EMP_DEPT since Dnumber is not a key of EMP_DEPT.
• According to Codd’s original definition, a relation schema R is
in 3NF if it satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
Third Normal Form(cont..)
• Let us consider the following table.

• The primary key of this table is EmpId. Assuming that Empname holds scalar
values, this table is in 1NF and also 2NF.
• Moreover, the fields Empname and Department are both directly associated
with EmpId, the primary key. The last field, Dept_tel, however, contains
the telephone number of departments and therefore is determined by the
department, which is not part of the primary key. In short, the following holds
true in this table: EmpId → Department and Department → Dept_tel
• These dependencies can be put together to show the fact that the following
transitive dependency holds true. EmpId → Department → Dept_tel
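The search for such a transitive dependency can be sketched in Python. The sketch follows the definition given earlier under the assumption that the primary key is the only candidate key, and it uses attribute closure, a standard technique not covered in these slides. The function names are hypothetical, and the dependency EmpId → Empname is inferred from the statement that Empname is directly associated with EmpId.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_dependencies(schema, key, fds):
    """List (Z, attribute) pairs where a nonprime attribute depends on the
    key only through Z, and Z is neither a superkey nor part of the key."""
    schema, key = set(schema), set(key)
    found = []
    for lhs, rhs in fds:
        z = set(lhs)
        if closure(z, fds) == schema:
            continue  # Z is a superkey, so the dependency is acceptable
        if z <= key:
            continue  # Z is part of the key: a partial dependency (a 2NF issue)
        for attr in sorted(set(rhs) - z - key):
            found.append((z, attr))
    return found


schema = {"EmpId", "Empname", "Department", "Dept_tel"}
key = {"EmpId"}
fds = [
    ({"EmpId"}, {"Empname", "Department"}),
    ({"Department"}, {"Dept_tel"}),
]

print(transitive_dependencies(schema, key, fds))  # [({'Department'}, 'Dept_tel')]
```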
Third Normal Form(cont..)
• The normalization of 2NF tables to 3NF involves the
removal of transitive dependencies. We remove the
transitively dependent field(s) from the table by placing the
field(s) in a new table along with a copy of the determinant(s).
Therefore, the above table can be decomposed into two 3NF
tables shown below.



Transitive Dependency Anomalies
• Transitive dependencies, if not eliminated, can result in insertion,
update, or deletion anomalies.
• Insertion Anomalies: Suppose a new department has just been created, but the
company hasn’t hired anyone for this new department. An error would occur if
you attempted to add data into the table because there is no EmpId associated with
the new department. Since EmpId is the primary key, you can’t add a new record
into the table with EmpId being null. However, in the normalized design the new
department data can be inserted into the Department table with no problem.
• Deletion Anomalies: These can occur in the Employee1 table if, for example, Zahara
Hagos leaves the company and her record is deleted from the Employee1 table.
Because Zahara Hagos is the only member of the Administration department,
all information associated with the Administration department will be wiped out
even though the department itself has not been eliminated.
• Update Anomalies: These can occur in the unnormalized employee table
(Employee1) if the Finance department changes its current telephone number to
a new one. You would have to update three records in the table even
though only one piece of information about a department has changed. However, in the
normalized table, Department, you need to make the change only once, in the Dept_tel field
of the record that holds Finance as department.
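The three anomalies can be reproduced with a small Python sketch. The Employee1 rows below are hypothetical: only Zahara Hagos, the Administration and Finance departments, and the count of three Finance records come from the text, while the other names, the telephone numbers, and the Research department are invented for illustration.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical unnormalized table (names and numbers are placeholders).
employee1 = [
    {"EmpId": 1, "Empname": "Zahara Hagos", "Department": "Administration", "Dept_tel": "011-100"},
    {"EmpId": 2, "Empname": "Employee B", "Department": "Finance", "Dept_tel": "011-200"},
    {"EmpId": 3, "Empname": "Employee C", "Department": "Finance", "Dept_tel": "011-200"},
    {"EmpId": 4, "Empname": "Employee D", "Department": "Finance", "Dept_tel": "011-200"},
]

# 3NF decomposition: the determinant Department is copied into both tables.
employee = project(employee1, ["EmpId", "Empname", "Department"])
department = project(employee1, ["Department", "Dept_tel"])

# Update anomaly: rows that must change when Finance gets a new number.
rows_before = sum(1 for r in employee1 if r["Department"] == "Finance")
rows_after = sum(1 for r in department if r["Department"] == "Finance")
print(rows_before, rows_after)  # 3 1

# Deletion anomaly: removing Zahara Hagos from Employee1 loses Administration.
remaining = [r for r in employee1 if r["Empname"] != "Zahara Hagos"]
print(any(r["Department"] == "Administration" for r in remaining))   # False
print(any(r["Department"] == "Administration" for r in department))  # True

# Insertion anomaly: a department with no employees fits in Department,
# whereas Employee1 would need a null EmpId for it.
department.append({"Department": "Research", "Dept_tel": "011-300"})
print(len(department))  # 3
```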
Boyce-Codd Normal Form (BCNF)
• The Boyce-Codd Normal Form is an extension to the 3NF for the
special case where:
  - There are at least two candidate keys in the table,
  - All the candidate keys are composite keys, and
  - There are overlapping field(s) in the candidate keys (there is at least
    one field in common).
• When a table satisfies all these conditions, 3NF cannot eliminate
all forms of transitive dependency. For a table to be in BCNF, it
must be in 3NF, and all fields in all its candidate keys must be
functionally independent. More precisely, for every nontrivial functional
dependency X → Y that holds in the table, X must be a superkey.
Violation of BCNF is quite rare; it
may only happen under the above conditions.
• The following table(a) below shows a case in which the BCNF is
not fulfilled. Suppose that the following conditions apply to the
table(a). For each subject, each student of that subject is taught by
only one teacher, and each teacher teaches only one subject (but
each subject is taught by several teachers).
Boyce-Codd Normal Form (BCNF)

(a) (b) (c)


• The candidate keys for table (a) are (Student, Subject) and (Student, Teacher).
Both candidate keys are composite, and the Student field is the
field they share. Because Teacher → Subject holds and Teacher is not a superkey, the table is not in BCNF.
• Moreover, table (a) suffers from certain update anomalies. For example, if we wish
to delete the information that Zahara is studying Physics, we cannot do so without
at the same time losing the information that Dr. Jemal teaches Physics.
• Normalizing this table to BCNF would require the decomposition of the above table(a)
into two tables(b) and (c).
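The BCNF test for this example can be sketched in Python. The sketch checks the standard BCNF condition (the left-hand side of every nontrivial dependency must be a superkey) using attribute closure, which these slides do not cover. The function names are hypothetical, the two dependencies are read off the conditions stated for table (a), and the split into (Student, Teacher) and (Teacher, Subject) is an assumed decomposition because tables (b) and (c) are not reproduced here.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(schema):
            violations.append((set(lhs), set(rhs)))
    return violations


schema = {"Student", "Subject", "Teacher"}
fds = [
    ({"Student", "Subject"}, {"Teacher"}),  # one teacher per student per subject
    ({"Teacher"}, {"Subject"}),             # each teacher teaches only one subject
]

print(bcnf_violations(schema, fds))  # [({'Teacher'}, {'Subject'})]

# Assumed decomposition: each table is checked with the FDs that apply to it.
print(bcnf_violations({"Teacher", "Subject"}, [({"Teacher"}, {"Subject"})]))  # []
print(bcnf_violations({"Student", "Teacher"}, []))                            # []
```

In this assumed decomposition the dependency {Student, Subject} → Teacher can no longer be checked within a single table, which is a known cost of moving this kind of relation to BCNF.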



Thank you!!
