
(IT) 06 Normalisasi

The document discusses normalization and related concepts. It introduces normalization, data anomalies, functional dependencies, and keys. It provides examples of normalization results and discusses how normalization can improve data quality by reducing redundancy and anomalies.

Uploaded by

Giovanni Nadika

1

06 Normalization
CT12F3 Database Modelling And Implementation
2

Learning Outcomes
Students can explain normalization steps
3

Topics
Intro to Normalization
Data Anomalies
Functional Dependency & Keys
Normalization Steps
4

Intro to Normalization
5
Background

• Transforming ER diagrams into relations typically results in well-structured
relations. However, following those steps does not guarantee that all anomalies and
redundancies are removed.
• Normalization is a formal process, used in logical data modelling, for deciding
which attributes should be grouped together in a relation so that anomalies are
removed and smaller, well-structured relations (from an organization-wide
view) are produced.
6
Normalization Goals

• Minimize data redundancy, thereby avoiding anomalies and conserving storage space.
• Simplify the enforcement of referential integrity constraints.
A referential integrity constraint is a rule that maintains consistency among the
rows of two relations. The rule states that if there is a foreign key in one relation,
either each foreign key value must match a primary key value in another relation or
the foreign key value must be null.
• Make it easier to maintain data (insert, update, and delete).
• Provide a better design that is an improved representation of the real world and a
stronger basis for future growth
7
Well-Structured Relations
• A well-structured relation contains
minimal redundancy and allows
users to insert, modify, and delete
the rows in a table without errors or
inconsistencies.

• EMPLOYEE1 is such a relation. Each row of the table contains data
describing one employee, and any modification to an employee’s data
(such as a change in salary) is confined to one row of the table.

Figure 1. EMPLOYEE1 relation
8
Well-Structured Relations (2)

Figure 2. EMPLOYEE2 relation

• In contrast, EMPLOYEE2 is not a well-structured relation.


• If you examine the sample data in the table, you will notice considerable redundancy.
• For example, values for EmpID, Name, DeptName, and Salary appear in two separate rows for
employees 100, 110, and 150.
• Consequently, if the salary for employee 100 changes, we must record this fact in two rows.
9

Data Anomalies
10
Data Anomalies

• Redundancies in a table may result in errors or inconsistencies (called anomalies)
when a user attempts to change data in the table.
• Three types of anomalies
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
11
Insertion Anomaly

Figure 2. EMPLOYEE2 relation

• Suppose that we need to add a new employee to EMPLOYEE2.


• The primary key for this relation is the combination of EmpID and CourseTitle. Therefore, to insert
a new row, the user must supply values for both EmpID and CourseTitle (because primary key
values cannot be null or nonexistent).
• This is an anomaly because the user should be able to enter employee data without supplying course
data.
12
Deletion Anomaly

Figure 2. EMPLOYEE2 relation

• Suppose that the data for employee number 140 are deleted from the table.
• This will result in losing the information that this employee completed a course (Tax Acc) on
12/8/2015.
• In fact, it results in losing the information that this course had an offering that completed on that
date.
13
Modification Anomaly

Figure 2. EMPLOYEE2 relation

• Suppose that employee number 100 gets a salary increase.


• We must record the increase in each of the rows for that employee (two occurrences in the Figure);
otherwise, the data will be inconsistent.
14
Data Anomalies

• These anomalies indicate that EMPLOYEE2 is not a well-structured relation.


• The problem with this relation is that it contains data about two entities:
EMPLOYEE and COURSE. We will use normalization theory to divide
EMPLOYEE2 into two relations.
• One of the resulting relations is EMPLOYEE1. The other we will call EMP COURSE.
• The primary key of EMP COURSE is the combination of EmpID and CourseTitle, and
we underline these attribute names to highlight this fact.
• Examine the figure to verify that EMP COURSE is free of the types of anomalies
described previously and is therefore well structured.
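
The split can be sketched in a few lines of Python. This is only an illustration: the rows are hypothetical sample data (the figure is not reproduced here), and the names `employee2`, `employee1`, and `emp_course` are assumptions made for the example.

```python
# Hypothetical EMPLOYEE2 rows:
# (EmpID, Name, DeptName, Salary, CourseTitle, DateCompleted)
employee2 = [
    (100, "Employee A", "Marketing", 48000, "SPSS", "6/19/2015"),
    (100, "Employee A", "Marketing", 48000, "Surveys", "10/7/2015"),
    (140, "Employee B", "Accounting", 52000, "Tax Acc", "12/8/2015"),
]

# Project onto the employee attributes; the set drops the duplicated rows.
employee1 = sorted({(e, n, d, s) for (e, n, d, s, _, _) in employee2})

# Project onto the course-completion attributes; key is (EmpID, CourseTitle).
emp_course = sorted({(e, c, dt) for (e, _, _, _, c, dt) in employee2})

print(len(employee2), len(employee1), len(emp_course))
```

After the split, the salary of employee 100 is stored in exactly one row of `employee1`, so a salary change touches one row only.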
15
Example of Normalization Result

EMP COURSE
16

Functional Dependency & Keys


17
Background

• A functional dependency is a constraint between two attributes in
which the value of one attribute is determined by the value of another
attribute.
• The normal forms that we will discuss are based on the analysis of
functional dependencies.
18
Definition

• For any relation R, attribute B is functionally dependent on attribute A if, for every
valid instance of A, that value of A uniquely determines the value of B (Dutka and
Hanson, 1989).
• The functional dependency of B on A is represented by an arrow, as follows: A →
B.
• A functional dependency is not a mathematical dependency: B cannot be computed
from A.
• Rather, if you know the value of A, there can be only one value for B. An attribute
may be functionally dependent on a combination of two (or more) attributes rather
than on a single attribute.
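
The definition can also be written formally. The formulation below is the standard one; the symbols r(R) (a valid instance of R) and t1, t2 (any two rows of that instance) are notation introduced here for illustration and do not appear on the slides.

```latex
A \rightarrow B \iff \forall\, t_1, t_2 \in r(R):\; t_1[A] = t_2[A] \implies t_1[B] = t_2[B]
```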
19
Example

Assuming Staff_No → Position, Position is functionally dependent on Staff_No.
That is, the value of Position can be determined with certainty if
the value of Staff_No is known.

Because one Position can be held by more than one staff member,
Position → Staff_No does not hold. In other words, Staff_No is not
functionally dependent on Position: the value of Staff_No cannot be
determined from the value of Position alone.
20
Example (2)
Typical examples of functional dependencies are the following:
• SSN → Name, Address, Birthdate: A person’s name, address, and birth date are
functionally dependent on that person’s Social Security number (in other words,
there can be only one Name, one Address, and one Birthdate for each SSN).
• VIN → Make, Model, Color: The make, model, and the original color of a vehicle
are functionally dependent on the vehicle identification number (as above, there can
be only one value of Make, Model, and Color associated with each VIN).
• ISBN → Title, FirstAuthorName, Publisher: The title of a book, the name of the
first author, and the publisher are functionally dependent on the book’s international
standard book number (ISBN).
21
Example (3)

• Consider the relation EMP COURSE (EmpID, CourseTitle, DateCompleted) shown in
the figure.
• We represent the functional dependency in this relation
as follows:
EmpID, CourseTitle → DateCompleted
• The comma between EmpID and CourseTitle stands for
the logical AND operator, because DateCompleted is
functionally dependent on EmpID and CourseTitle in
combination.
• The functional dependency in this statement implies that
the date when a course is completed is determined by
the identity of the employee and the title of the course.
22
Determinant

• The attribute on the left side of the arrow in a functional dependency is
called a determinant.
• SSN, VIN, and ISBN are determinants in the preceding three examples.
• In the EMP COURSE relation, the combination of EmpID and
CourseTitle is a determinant.
23
Candidate Keys

• A candidate key is an attribute, or combination of attributes, that
uniquely identifies a row in a relation.
• A candidate key must satisfy the following properties (Dutka and
Hanson, 1989):
• Unique identification: For every row, the value of the key must uniquely identify
that row. This property implies that each nonkey attribute is functionally
dependent on that key.
• Nonredundancy: No attribute in the key can be deleted without destroying the
property of unique identification.
24
Example (4)

• The EMPLOYEE1 relation has the following schema:


EMPLOYEE1(EmpID, Name, DeptName, Salary).
• EmpID is the only determinant in this relation.
• All of the other attributes are functionally dependent on EmpID.
• Therefore, EmpID is a candidate key and (because there are no other candidate
keys) also is the primary key.
25
Example (5)

• For the relation EMPLOYEE2, EmpID does not uniquely identify a row in the relation. For example, there are two rows in the
table for EmpID number 100.
• There are two types of functional dependencies in this relation:
1. EmpID → Name, DeptName, Salary
2. EmpID, CourseTitle → DateCompleted

• The functional dependencies indicate that the combination of EmpID and CourseTitle is the only candidate key (and therefore
the primary key) for EMPLOYEE2.
• In other words, the primary key of EMPLOYEE2 is a composite key.
26
Representing Functional Dependencies
27
Candidate Key & Determinant

• A candidate key is always a determinant, whereas a determinant may or may not
be a candidate key.
• For example, in EMPLOYEE2, EmpID is a determinant but not a candidate key.
• A candidate key is a determinant that uniquely identifies the remaining (nonkey)
attributes in a relation.
• A determinant may be a candidate key (such as EmpID in EMPLOYEE1), part of a
composite candidate key (such as EmpID in EMPLOYEE2), or a nonkey attribute.
28
Example (6)
SID         StdName             CourseCode  CourseName                  Address        Lecturer  Lounge
1103156000  Thomas Alfa Edison  CSH2C3      Pemodelan Basis Data        Jl. Sukabirus  VRE       F202
1103156000  Thomas Alfa Edison  CSG2H3      Pemrograman Berbasis Objek  Jl. Sukabirus  MTD       F211
1103156002  Galileo Galilei     CSH2C3      Pemodelan Basis Data        Jl. Buah Batu  NDN       F211

Based on the table above, the following FDs can be defined:

SID → StdName

SID → Address

CourseCode → CourseName

Lecturer → Lounge

SID, CourseCode → Lecturer
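
A small Python sketch can test these dependencies against the three sample rows. The helper `fd_holds` and the constants `COLUMNS` and `ROWS` are illustrative names, not part of the slides. Note the limitation: sample data can only refute a dependency. Whether a dependency really holds is a rule about the business, not about three rows.

```python
COLUMNS = ("SID", "StdName", "CourseCode", "CourseName",
           "Address", "Lecturer", "Lounge")
ROWS = [
    ("1103156000", "Thomas Alfa Edison", "CSH2C3", "Pemodelan Basis Data",
     "Jl. Sukabirus", "VRE", "F202"),
    ("1103156000", "Thomas Alfa Edison", "CSG2H3", "Pemrograman Berbasis Objek",
     "Jl. Sukabirus", "MTD", "F211"),
    ("1103156002", "Galileo Galilei", "CSH2C3", "Pemodelan Basis Data",
     "Jl. Buah Batu", "NDN", "F211"),
]

def fd_holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Return True if no two rows agree on lhs but differ on rhs."""
    idx = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        key = tuple(row[idx[a]] for a in lhs)
        val = tuple(row[idx[a]] for a in rhs)
        # setdefault stores val the first time key is seen,
        # and returns the stored value on later rows.
        if seen.setdefault(key, val) != val:
            return False
    return True

print(fd_holds(["SID"], ["StdName"]))                 # True
print(fd_holds(["SID", "CourseCode"], ["Lecturer"]))  # True
print(fd_holds(["Lounge"], ["Lecturer"]))             # False: F211 has MTD and NDN
print(fd_holds(["CourseCode"], ["Lecturer"]))         # False: CSH2C3 has VRE and NDN
```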


29

How can we verify that a determinant
is a candidate key?
30
Requirements

• A set of attributes whose attribute closure is the set of all attributes of
a relation is called a super key of that relation.
• A minimal set of attributes whose attribute closure is the set of all
attributes of a relation is called a candidate key of that relation.
31
Closure of Functional Dependencies

• Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• If A → B and B → C, then we can infer that A → C
• etc.
• The set of all functional dependencies logically implied by F is the
closure of F.
• We denote the closure of F by F+.
32
Closure of Functional Dependencies

• We can compute F+, the closure of F, by repeatedly applying Armstrong’s Axioms:
• Reflexivity rule: if β ⊆ α, then α → β
• Augmentation rule: if α → β, then γα → γβ
• Transitivity rule: if α → β and β → γ, then α → γ
• These rules are
• Sound -- generate only functional dependencies that actually hold, and
• Complete -- generate all functional dependencies that hold.
33
Example of F+

R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
Some members of F+:
• A → H
by transitivity from A → B and B → H
• AG → I
by augmenting A → C with G, to get AG → CG,
and then transitivity with CG → I
• CG → HI
by augmenting CG → I to infer CG → CGI, augmenting CG → H to infer CGI → HI, and
then transitivity
34
Closure of Functional Dependencies (cont.)

Additional rules:
• Union rule:
If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule:
If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule:
If α → β holds and γβ → δ holds, then αγ → δ holds.
The above rules can be inferred from Armstrong’s axioms.
35
Example of F+ (2)

R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
• (AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
• Is AG a candidate key?
1. Is AG a super key?
   Does AG → R? == Is R ⊆ (AG)+
2. Is any subset of AG a superkey?
   Does A → R? == Is R ⊆ (A)+
   Does G → R? == Is R ⊆ (G)+
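
The attribute-closure computation above can be sketched in Python as follows. The function name `closure` and the representation of each dependency as a pair of sets are assumptions made for this example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already inside the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set("ABCGHI")
F = [(set("A"), set("B")), (set("A"), set("C")),
     (set("CG"), set("H")), (set("CG"), set("I")),
     (set("B"), set("H"))]

print(sorted(closure("AG", F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(closure("AG", F) == R)     # True: AG is a super key
print(closure("A", F) == R)      # False
print(closure("G", F) == R)      # False
```

Because (AG)+ covers R while neither (A)+ nor (G)+ does, AG is a candidate key of R under F.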
36
Continuing from Example (6)
SID StdName CourseCode CourseName Address Lecturer Lounge

Based on the rules explained, the following FDs can be derived:

• SID, CourseCode → SID (Rule: Subset/Reflexivity)
• SID, CourseCode → CourseCode (Rule: Subset/Reflexivity)
• CourseCode, Lecturer → CourseName, Lecturer (Rule: Augmentation, adding the Lecturer attribute to
both sides)
• SID, CourseCode → Lounge (Rule: Transitivity)
from SID, CourseCode → Lecturer and Lecturer → Lounge
• SID → StdName, Address (Rule: Union)
from SID → StdName and SID → Address
• and so on
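
Given the five dependencies from Example (6), a brute-force search can list the candidate keys. This is a sketch for small schemas only (it tries every attribute subset); `closure` and `candidate_keys` are illustrative names.

```python
from itertools import combinations

ATTRS = {"SID", "StdName", "CourseCode", "CourseName",
         "Address", "Lecturer", "Lounge"}
FDS = [
    ({"SID"}, {"StdName"}),
    ({"SID"}, {"Address"}),
    ({"CourseCode"}, {"CourseName"}),
    ({"Lecturer"}, {"Lounge"}),
    ({"SID", "CourseCode"}, {"Lecturer"}),
]

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    while True:
        added = {a for lhs, rhs in fds if lhs <= result for a in rhs} - result
        if not added:
            return result
        result |= added

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if any(key <= subset for key in keys):
                continue
            if closure(subset, fds) == attrs:
                keys.append(subset)
    return keys

print(candidate_keys(ATTRS, FDS))
```

Under these dependencies the only candidate key is the combination of SID and CourseCode.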
37

Normalization Steps
38
Intro

• Now that we have examined functional dependencies and keys, we are ready to
describe and illustrate the steps of normalization.
• If an EER data model has been transformed into a comprehensive set of relations
for the database, then each of these relations needs to be normalized.
• In other cases in which the logical data model is being derived from user interfaces,
such as screens, forms, and reports, you will want to create relations for each user
interface and normalize those relations.
39
Steps in Normalization
40
Steps in Normalization (cont.)
1. First normal form: Any multivalued attributes (also called repeating groups) have been removed, so
there is a single value (possibly null) at the intersection of each row and column of the table.
2. Second normal form: Any partial functional dependencies have been removed (i.e., nonkey attributes
are identified by the whole primary key).
3. Third normal form: Any transitive dependencies have been removed (i.e., nonkey attributes are
identified by only the primary key).
4. Boyce-Codd normal form: Any remaining anomalies that result from functional dependencies have
been removed (because there was more than one possible primary key for the same nonkeys).
5. Fourth normal form: Any multivalued dependencies have been removed.
6. Fifth normal form: Any remaining anomalies have been removed.
41
Example
• For a simple illustration, we use a customer invoice from Pine Valley Furniture Company
42

Represent the View in Tabular Form


43
Invoice data (Pine Valley Furniture Company)

*Plus data for second order (Order ID 1007)


44

Convert to First Normal Form


45
1NF Constraints

A relation is in first normal form (1NF) if the following two constraints both apply:

1. There are no repeating groups in the relation (thus, there is a single fact at the
intersection of each row and column of the table).

2. A primary key has been defined, which uniquely identifies each row in the
relation.
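
Removing a repeating group can be sketched in Python as below. The order values are hypothetical (the invoice figure is not reproduced here), and `orders`, `to_1nf`, and `invoice` are names chosen for the example.

```python
# Hypothetical invoice data: each order carries a repeating group of lines.
orders = [
    {"OrderID": 1006, "OrderDate": "2015-10-24", "CustomerID": 2,
     "lines": [{"ProductID": 7, "OrderedQuantity": 2},
               {"ProductID": 5, "OrderedQuantity": 2}]},
    {"OrderID": 1007, "OrderDate": "2015-10-25", "CustomerID": 6,
     "lines": [{"ProductID": 11, "OrderedQuantity": 4}]},
]

def to_1nf(orders):
    """Emit one flat row per order line, repeating the order-level values."""
    rows = []
    for order in orders:
        header = {k: v for k, v in order.items() if k != "lines"}
        for line in order["lines"]:
            rows.append({**header, **line})
    return rows

invoice = to_1nf(orders)

# Constraint 2: (OrderID, ProductID) must identify each row uniquely.
keys = [(row["OrderID"], row["ProductID"]) for row in invoice]
print(len(invoice), len(set(keys)) == len(keys))
```

Each resulting row holds a single value per column, at the cost of repeating the order-level values on every line.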
46
Remove Repeating Group

Repeating Group
47
Relation with no repeating group
48
Determine FD & Select the Candidate Key
There are four determinants in INVOICE, and their functional dependencies are the following:
• OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress
• CustomerID → CustomerName, CustomerAddress
• ProductID → ProductDescription, ProductFinish, ProductStandardPrice
• OrderID, ProductID → OrderedQuantity
From the four determinants we can infer that:
• OrderID, ProductID → OrderDate, CustomerID, CustomerName, CustomerAddress,
ProductDescription, ProductFinish, ProductStandardPrice, OrderedQuantity
As you can see, the only candidate key for INVOICE is the composite key consisting of the attributes
OrderID and ProductID (because there is only one row in the table for any combination of values for
these attributes).
49
Relation in 1NF with CK

Candidate Key
50

Anomalies still occur in 1NF

INSERTION DELETION UPDATE


ANOMALY ANOMALY ANOMALY
51

Convert to Second Normal Form


52
2NF Constraints

A relation is in second normal form (2NF) if:

1. It is in first normal form.

2. It contains no partial functional dependencies.

A partial functional dependency exists when a nonkey attribute is functionally
dependent on part (but not all) of the primary key.
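
The definition translates into a short check, sketched here in Python with the INVOICE dependencies listed earlier. The function `partial_dependencies` is an illustrative name, and it inspects only the dependencies it is given, not everything that could be derived from them.

```python
KEY = {"OrderID", "ProductID"}
FDS = [
    ({"OrderID"}, {"OrderDate", "CustomerID", "CustomerName", "CustomerAddress"}),
    ({"CustomerID"}, {"CustomerName", "CustomerAddress"}),
    ({"ProductID"}, {"ProductDescription", "ProductFinish", "ProductStandardPrice"}),
    ({"OrderID", "ProductID"}, {"OrderedQuantity"}),
]

def partial_dependencies(key, fds):
    """FDs whose determinant is a proper subset of the key and whose
    right-hand side contains at least one nonkey attribute."""
    return [(lhs, rhs - key) for lhs, rhs in fds if lhs < key and rhs - key]

for lhs, rhs in partial_dependencies(KEY, FDS):
    print(sorted(lhs), "->", sorted(rhs))
```

With the composite key (OrderID, ProductID), the check reports the dependencies on OrderID alone and on ProductID alone. CustomerID is not part of the key, so its dependency is not partial.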
53
Functional Dependency Diagram for Invoice

As you can see, the following partial dependencies exist in the diagram:
• OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress
• ProductID → ProductDescription, ProductFinish, ProductStandardPrice
54

• OrderID, ProductID → OrderID, ProductID, OrderDate, CustomerID, CustomerName,
CustomerAddress, ProductDescription, ProductFinish, ProductStandardPrice, OrderedQuantity
• OrderID, ProductID → OrderID (partial)
• OrderID → OrderID (full)
• OrderID, ProductID → ProductID (partial)
• ProductID → ProductID (full)
• OrderID, ProductID → OrderDate (partial)
• OrderID → OrderDate (full)
• OrderID, ProductID → CustomerID, CustomerName, CustomerAddress (partial)
• OrderID → CustomerID, CustomerName, CustomerAddress (full)
• OrderID, ProductID → ProductDescription, ProductFinish, ProductStandardPrice (partial)
• ProductID → ProductDescription, ProductFinish, ProductStandardPrice (full)
• OrderID, ProductID → OrderedQuantity (full)
55
2NF
• OrderID → OrderID, OrderDate, CustomerID, CustomerName,
CustomerAddress
• ProductID → ProductID, ProductDescription, ProductFinish,
ProductStandardPrice
• OrderID, ProductID → OrderedQuantity

Table 1 (OrderID, OrderDate, CustomerID, CustomerName, CustomerAddress)

Table 2 (ProductID, ProductDescription, ProductFinish, ProductStandardPrice)

Table 3 (OrderID, ProductID, OrderedQuantity)


56
3NF
• OrderID → OrderID, OrderDate, CustomerID, CustomerName,
CustomerAddress (transitive)
• CustomerID → CustomerName, CustomerAddress
• OrderID → OrderID, OrderDate, CustomerID

Table 1.1 (OrderID, OrderDate, CustomerID)

Table 1.2 (CustomerID, CustomerName, CustomerAddress)

Table 2 (ProductID, ProductDescription, ProductFinish, ProductStandardPrice)

Table 3 (OrderID, ProductID, OrderedQuantity)


57
Removing partial dependencies

To convert a relation with partial dependencies to second normal form, the following
steps are required:

1. Create a new relation for each primary key attribute (or combination of
attributes) that is a determinant in a partial dependency. That attribute is the primary
key in the new relation.

2. Move the nonkey attributes that are only dependent on this primary key attribute
(or attributes) from the old relation to the new relation.
58
Relation in 2NF (no partial dependencies)
59

Convert to Third Normal Form


60
Constraints

A relation is in third normal form (3NF) if:

1. It is in second normal form.

2. It contains no transitive dependencies.

A transitive dependency in a relation is a functional dependency between the
primary key and one or more nonkey attributes that are dependent on the primary
key via another nonkey attribute.
61

There are two transitive dependencies in the CUSTOMER ORDER relation
shown in the diagram:

1. OrderID → CustomerID → CustomerName

2. OrderID → CustomerID → CustomerAddress
62
Removing transitive dependencies
You can easily remove transitive dependencies from a relation by means of a three-step procedure:

1. For each nonkey attribute (or set of attributes) that is a determinant in a relation, create a new
relation. That attribute (or set of attributes) becomes the primary key of the new relation.

2. Move all of the attributes that are functionally dependent only on the primary key of the new relation
from the old to the new relation.

3. Leave the attribute that serves as a primary key in the new relation in the old relation to serve as a
foreign key that allows you to associate the two relations.
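
The three steps can be sketched at the schema level in Python. This is a simplified illustration: `remove_transitive` is a hypothetical helper, relations are plain sets of attribute names, and step 2 is approximated by moving every attribute on the right-hand side of the nonkey determinant.

```python
def remove_transitive(relation, key, fds):
    """Split off one new relation per nonkey determinant (steps 1 to 3)."""
    remaining = set(relation)
    new_relations = []
    for lhs, rhs in fds:
        # Step 1: only determinants made up entirely of nonkey attributes.
        if lhs <= remaining and not (lhs & key):
            # Step 2: move the attributes that depend on this determinant.
            moved = (rhs & remaining) - key - lhs
            new_relations.append((lhs, lhs | moved))  # lhs is the new primary key
            # Step 3: lhs itself stays behind as a foreign key.
            remaining -= moved
    return (key, remaining), new_relations

CUSTOMER_ORDER = {"OrderID", "OrderDate", "CustomerID",
                  "CustomerName", "CustomerAddress"}
FDS = [
    ({"OrderID"}, {"OrderDate", "CustomerID", "CustomerName", "CustomerAddress"}),
    ({"CustomerID"}, {"CustomerName", "CustomerAddress"}),
]

order, customers = remove_transitive(CUSTOMER_ORDER, {"OrderID"}, FDS)
print(sorted(order[1]))
print([(sorted(pk), sorted(attrs)) for pk, attrs in customers])
```

For CUSTOMER ORDER this yields one relation keyed by OrderID that keeps CustomerID as a foreign key, and one relation keyed by CustomerID holding the customer's name and address.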
63
Removing transitive dependencies

into
64
Relation in 3NF (no transitive dependencies)
65
Relational schema for invoice data
Customer (PK CustomerID, CustomerName, CustomerAddress)

Order (PK OrderID, OrderDate, FK CustomerID)

Product (PK ProductID, ProductDescription, ProductFinish, ProductStandardPrice)

OrderLine (PK FK OrderID, PK FK ProductID, OrderedQuantity)
66

Advanced Normalization:
Boyce-Codd Normal Form
67
Definition & Constraint

• A relation is in Boyce-Codd normal form (BCNF) if and only if every
determinant in the relation is a candidate key.
68
Invoice Relation
There are four determinants in INVOICE from functional dependencies

• OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress

• CustomerID → CustomerName, CustomerAddress

• ProductID → ProductDescription, ProductFinish, ProductStandardPrice

• OrderID, ProductID → OrderedQuantity

3NF relation:

Based on the 3NF result, the INVOICE relations are already in BCNF, because every determinant is a candidate key.
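
This claim can be checked with a short Python sketch. The helper names `closure` and `is_bcnf` and the dictionary layout are assumptions made for the example. The test used is that each determinant's attribute closure covers the whole relation, i.e. the determinant is a super key.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of sets)."""
    result = set(attrs)
    while True:
        added = {a for lhs, rhs in fds if lhs <= result for a in rhs} - result
        if not added:
            return result
        result |= added

def is_bcnf(attrs, fds):
    """True if every nontrivial determinant in fds determines all of attrs."""
    return all(closure(lhs, fds) >= attrs
               for lhs, rhs in fds if not rhs <= lhs)

# The four relations produced by the 3NF step, each with its dependencies.
RELATIONS_3NF = {
    "CUSTOMER": ({"CustomerID", "CustomerName", "CustomerAddress"},
                 [({"CustomerID"}, {"CustomerName", "CustomerAddress"})]),
    "ORDER": ({"OrderID", "OrderDate", "CustomerID"},
              [({"OrderID"}, {"OrderDate", "CustomerID"})]),
    "PRODUCT": ({"ProductID", "ProductDescription", "ProductFinish",
                 "ProductStandardPrice"},
                [({"ProductID"}, {"ProductDescription", "ProductFinish",
                                  "ProductStandardPrice"})]),
    "ORDER LINE": ({"OrderID", "ProductID", "OrderedQuantity"},
                   [({"OrderID", "ProductID"}, {"OrderedQuantity"})]),
}

# The original single INVOICE relation, for comparison.
INVOICE_ATTRS = set().union(*(attrs for attrs, _ in RELATIONS_3NF.values()))
INVOICE_FDS = [fd for _, fds in RELATIONS_3NF.values() for fd in fds]

for name, (attrs, fds) in RELATIONS_3NF.items():
    print(name, is_bcnf(attrs, fds))
print("INVOICE", is_bcnf(INVOICE_ATTRS, INVOICE_FDS))
```

Each of the four relations passes, while the undecomposed INVOICE relation fails because OrderID, CustomerID, and ProductID are determinants that do not identify a whole row.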
69
Relation in 3NF but not in BCNF
Relation with sample data

Functional dependencies in
STUDENT ADVISOR
70
2-step process

1. The relation is modified so that the determinant in the
relation that is not a candidate key becomes a component of
the primary key of the revised relation.
71
2-step process

2. Decompose the relation to eliminate the partial functional
dependency.
72
Note for Decomposition

• The decomposition is a lossless decomposition.

Let R be a relation schema, and let R1 and R2 form a decomposition of R; that is, R =
R1 ∪ R2.
We say that the decomposition is a lossless decomposition if there is no loss of
information by replacing R with the two relation schemas R1 and R2.
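
One way to see the definition at work is to project a table onto R1 and R2 and join the projections back together. The Python sketch below does this on hypothetical rows; `project`, `natural_join`, and `is_lossless_on` are illustrative names. A check on a single instance can expose a lossy decomposition, but it cannot prove losslessness for every valid instance.

```python
def project(rows, attrs):
    """Projection onto attrs; each row becomes a frozenset of (attr, value)."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two projections on their shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical rows: two different customers that share the same name.
rows = [
    {"OrderID": 1006, "CustomerID": 2, "CustomerName": "Acme"},
    {"OrderID": 1007, "CustomerID": 6, "CustomerName": "Acme"},
]

print(is_lossless_on(rows, {"OrderID", "CustomerID"},
                     {"CustomerID", "CustomerName"}))   # True
print(is_lossless_on(rows, {"OrderID", "CustomerName"},
                     {"CustomerName", "CustomerID"}))   # False
```

Splitting on CustomerID reproduces the original rows. Splitting on CustomerName produces spurious rows in the join, so the information about which customer placed which order is lost.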
73
Example of Lossy Decomposition

FAILURE DECOMPOSITION
74

Reference

Hoffer, Jeffrey A., et al., "Modern Database Management", Twelfth Edition, Pearson,
2016. Chapter 4
75
Questions
