4.4 Normalization

Normalization is a technique for organizing data in a database to minimize redundancy and dependency. It involves decomposing relations with anomalies into smaller, non-redundant relations. The goals of normalization are to minimize duplication of information, ensure data dependencies make sense, and simplify the management of data. It involves testing relations against normal forms like 1NF, 2NF, 3NF and BCNF to remove anomalies like insertion, deletion and modification anomalies.

Uploaded by

Hayredin Mussa

Normalization

CS341
Purpose of Normalization
 Normalization is a technique for producing a set of
suitable relations that support the data requirements of
an enterprise
 Characteristics of a suitable set of relations include:
 Using the minimal number of attributes that is
necessary to support the data requirements of the enterprise;
 Attributes with a close logical relationship are found in the
same relation;
 Minimal Redundancy: Each attribute is represented only once
with the exception of attributes that form a part of foreign keys.

Purpose of Normalization (cont.)
 Normalization decomposes relations to smaller relations
 Normalization involves creating tables and establishing
relationships between those tables
 The benefits of using a database that has a suitable set
of relations are that:
 The database will be easier for users to access and maintain;
 Update anomalies will be avoided;
 The database will need minimal storage space on the
computer.

Why Normalize Relations?
 Example 1
 We have to repeat the loan amount once for each customer

Why Normalize Relations? (cont.)
 Example 2
 We store the amount of each loan exactly once
 Less redundancy

Normalization Process
 Normalization is performed as a series of tests on a
relation to determine whether it satisfies the
requirements of a given Normal Form
 It is based on the Functional Dependencies among the
attributes of a relation.
 A relation is normalized to prevent the possible
occurrence of Update Anomalies

Normal Forms
 The four most commonly used normal forms are:
 First Normal Form (1NF)
 Second Normal Form (2NF)
 Third Normal Form (3NF)
 Boyce-Codd Normal Form (BCNF)

What does Normalization do?
 Normalization decomposes relations into smaller,
non-redundant relations for efficient storage and processing
 Decomposition has two important properties:
 Lossless-join property enables us to find any instance of the
original relation from corresponding instances in the smaller
relations.
 Dependency preservation property enables us to enforce a
constraint on the original relation by enforcing some
constraint on each of the smaller relations.
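
The lossless-join property can be illustrated with a small sketch. The relation, its values, and the helper name `natural_join` are assumptions made for illustration and do not come from the slides. Splitting on an attribute that determines neither side produces spurious tuples when the parts are rejoined, while splitting on the key gives back exactly the original rows.

```python
# Hypothetical relation (emp, dept, project); emp is assumed to be the key.
r = {("E1", "Sales", "P1"), ("E2", "Sales", "P2")}

def natural_join(left, right):
    """Join two sets of pairs where left's 2nd field equals right's 1st field."""
    return {(a, b, c) for a, b in left for b2, c in right if b == b2}

# Lossy split: (emp, dept) and (dept, project), rejoined on dept.
emp_dept = {(e, d) for e, d, _ in r}
dept_proj = {(d, p) for _, d, p in r}
lossy = natural_join(emp_dept, dept_proj)
spurious = lossy - r  # tuples that were never in r

# Lossless split: (emp, dept) and (emp, project), rejoined on the key emp.
emp_proj = {(e, p) for e, _, p in r}
lossless = {(e, d, p) for e, d in emp_dept for e2, p in emp_proj if e == e2}
```

In this sketch `spurious` is non-empty for the first split and `lossless` equals `r` for the second.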

Data Redundancy
 Major aim of relational database design is the grouping
of attributes into relations to
 Minimize data redundancy and
 Reduce file storage space required
 Main benefits of minimizing redundancy include:
 Updates are achieved with a minimal number of operations
 thus reducing the opportunities for data inconsistencies.
 Reduction in the file storage space required by the base
relations
 thus minimizing costs.

Update Anomalies
 Update anomalies are data inconsistencies caused by an
update operation (insert, modify, or delete)
 Relations that contain redundant information may
potentially suffer from update anomalies
 Normalization is the process of decomposing relations
with anomalies to produce well-structured relations
 Types of update anomalies are:
 Insertion: it is impossible to insert some data without also supplying other data
 Deletion: deleting a row causes the loss of dependent data
 Modification: inconsistency arises unless all occurrences of
the data are updated

Update Anomalies
 Insertion Anomaly:
 We cannot insert a department without inserting a member of staff who works in
that department.
 Example: Insert a new department named Administration
 Modification Anomaly:
 Updates may need to be made at several points
 Example: Changing the name of the department that employee “101” works in
without simultaneously changing it in the row for employee “102”.

Update Anomalies (cont.)
 Deletion Anomaly:
 Deletion of one record may result in the loss of other information
 Example: By removing employee 100, we remove all information
pertaining to the sales department
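
A minimal sketch of the three anomalies on a single flat table follows. The rows, names, and department values are hypothetical (the slide's figure is not reproduced here); only the employee numbers 100, 101 and 102 and the Sales and Administration departments echo the examples above.

```python
# Hypothetical flat relation mixing staff facts and department facts.
staff_dept = [
    {"emp_id": 100, "name": "Staff A", "dept": "Sales"},
    {"emp_id": 101, "name": "Staff B", "dept": "Marketing"},
    {"emp_id": 102, "name": "Staff C", "dept": "Marketing"},
]

def departments(rows):
    """Return the set of department names the table currently knows about."""
    return {row["dept"] for row in rows}

# Insertion anomaly: a department with no staff needs a row whose key is null.
new_dept_row = {"emp_id": None, "name": None, "dept": "Administration"}

# Deletion anomaly: removing employee 100 removes the only record of Sales.
after_delete = [row for row in staff_dept if row["emp_id"] != 100]

# Modification anomaly: renaming the department in the row for 101 only
# leaves the row for 102 with the old name.
after_update = [dict(row) for row in staff_dept]
for row in after_update:
    if row["emp_id"] == 101:
        row["dept"] = "Brand Marketing"
```

After the delete, `departments(after_delete)` no longer contains Sales; after the partial update, the same department appears under two names.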

Functional Dependencies
 A functional dependency (FD) describes a relationship between
attributes.
 FDs provide a formal mechanism to express constraints
between attributes
 If A and B are attributes of relation R, then B is
functionally dependent on A if each value of A is
associated with exactly one value of B in R
 A → B means that the value of B is determined by A

Notation - FD
 The notation of functional dependency is
A → B.
 The meaning of this notation is:
1. “A” determines “B”
2. “B” is functionally dependent on “A”
 “A” is called the DETERMINANT
 “B” is called the OBJECT of the determinant
 Example
 StudentID → GPA
 The value of the GPA can be determined if
we know the studentID
 Example
 Child → Mother? or Mother → Child?

Functional Dependencies
 The Determinant of a functional dependency refers to
the attribute or group of attributes on the left-hand
side of the arrow.

Functional Dependencies
 Consider the values shown in the staffNo and sName
attributes of the Staff relation
 The following functional dependencies appear to hold:
staffNo → sName
sName → staffNo
 However, the only functional dependency that remains
true for all possible values of the staffNo and sName
attributes of the Staff relation is:
staffNo → sName

FD - Example

 Primary key: Customer_No + Property_No

 What are the Functional Dependencies?

(Customer_No + Property_No) → (RentStart, RentFinish)
(Customer_No + Property_No) → Cname
(Customer_No) → Cname
(Customer_No + Property_No) → (Paddress, Rent, Owner_No, Oname)
(Property_No) → (Paddress, Rent, Owner_No, Oname)
Full FD – Characteristics
 Determinants should have the minimal number of
attributes necessary to maintain the functional
dependency with the attribute(s) on the right-hand side.
 This requirement is called full functional dependency (FFD)
 We can also say that the determinant is MINIMAL
 B is fully functionally dependent on A, if:
 A and B are attributes of a relation
 B is functionally dependent on A, and
 B is not functionally dependent on any proper subset of A

Partial Dependency
 This is the situation that exists if it is possible to use a
subset of the attributes of the composite determinant to
identify its object
 Example
staffNo, sName → branchNo
 Each value of (staffNo, sName) is associated with a
single value of branchNo.
 branchNo is also functionally dependent on a subset of
(staffNo, sName)
 namely staffNo
 This is a Partial Dependency
 i.e. branchNo is partially dependent on staffNo, sName
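
One way to test for a partial dependency mechanically is to ask whether any proper subset of the determinant already determines the target attribute. The sketch below assumes FDs are given as (left-hand set, right-hand set) pairs; the function names `closure` and `partial_determinants` are illustrative and not taken from the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_determinants(determinant, target, fds):
    """Proper subsets of `determinant` that already determine `target`."""
    ordered = sorted(determinant)
    found = []
    for size in range(1, len(ordered)):
        for subset in combinations(ordered, size):
            if target in closure(subset, fds):
                found.append(set(subset))
    return found

# Only the FD used in the example above is assumed to hold.
fds = [({"staffNo"}, {"branchNo"})]
partial = partial_determinants({"staffNo", "sName"}, "branchNo", fds)
full = partial_determinants({"staffNo"}, "branchNo", fds)
```

A non-empty result means the dependency is partial; an empty result means the target is fully functionally dependent on the determinant.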
FD - Examples
 Identify the functional dependencies

A B C D
1 4 10 100
2 6 20 50
3 6 20 200
1 4 10 200
2 6 20 200
3 6 20 300
1 4 10 null
2 6 20 50
3 6 20 50
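
As a rough aid for this exercise, the sketch below checks which single-attribute dependencies are consistent with the nine rows above. The helper name `holds` is an assumption, and `null` is represented as `None`. A dependency that survives this check only appears to hold in the sample; whether it is a true FD depends on the meaning of the attributes.

```python
from itertools import permutations

COLUMNS = ("A", "B", "C", "D")
ROWS = [
    (1, 4, 10, 100), (2, 6, 20, 50), (3, 6, 20, 200),
    (1, 4, 10, 200), (2, 6, 20, 200), (3, 6, 20, 300),
    (1, 4, 10, None), (2, 6, 20, 50), (3, 6, 20, 50),
]

def holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """True if every value of column lhs maps to a single value of column rhs."""
    li, ri = columns.index(lhs), columns.index(rhs)
    seen = {}
    for row in rows:
        # Remember the first rhs value seen for this lhs value, then compare.
        if seen.setdefault(row[li], row[ri]) != row[ri]:
            return False
    return True

# Every ordered pair of distinct columns that is consistent with the sample.
single_fds = [(x, y) for x, y in permutations(COLUMNS, 2) if holds(x, y)]
```

For this sample the surviving pairs are A → B, A → C, B → C and C → B; nothing determines D, and D determines nothing.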

Transitive Dependencies
 Transitive dependency describes a condition where A, B,
and C are attributes of a relation such that
 if A → B and B → C, then
 C is transitively dependent on A via B
 provided that A is not functionally dependent on B or C
 Example: Consider functional dependencies in the
StaffBranch relation
 staffNo → sName, position, salary, branchNo, bAddress
 branchNo → bAddress
 Transitive dependency
 bAddress is transitively dependent on staffNo via branchNo
(staffNo → branchNo and branchNo → bAddress).

FD in Normalization
 Functional dependencies used in normalization have these characteristics:
 Each value of the determinant (the attribute(s) on the left-hand
side) is associated with exactly one value of the attribute(s) on
the right-hand side of the functional dependency.
 The functional dependency holds for all time.
 The determinant has the minimal number of attributes
necessary to maintain the dependency with the attribute(s) on
the right-hand side.

The Process of Normalization
 Formal technique for analyzing a relation based on its
primary key and the functional dependencies between
the attributes of that relation.
 Often executed as a series of steps.
 Each step corresponds to a specific normal form, which
has known properties.
 As normalization proceeds, the relations become
 progressively more restricted (stronger) in format and
 less vulnerable to update anomalies

Identifying FDs
 Identifying the FDs between a set of attributes is relatively simple if
 the meaning of each attribute and
 the relationships between the attributes are well understood.
 This information should be clear from the users’
requirements specification.

Identifying FDs - Example
 FDs for the StaffBranch relation:
 staffNo → sName, position, salary, branchNo, bAddress
 branchNo → bAddress
 bAddress → branchNo
 branchNo, position → salary
 bAddress, position → salary
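
Given these FDs, the candidate keys of StaffBranch can be found by computing attribute closures. The following is a brute-force sketch suitable only for small relations; the function names are illustrative, and the attribute set is assumed to be exactly the six attributes named in the FDs above.

```python
from itertools import combinations

ATTRS = {"staffNo", "sName", "position", "salary", "branchNo", "bAddress"}
FDS = [
    ({"staffNo"}, {"sName", "position", "salary", "branchNo", "bAddress"}),
    ({"branchNo"}, {"bAddress"}),
    ({"bAddress"}, {"branchNo"}),
    ({"branchNo", "position"}, {"salary"}),
    ({"bAddress", "position"}, {"salary"}),
]

def closure(attrs, fds):
    """All attributes determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys

keys = candidate_keys(ATTRS, FDS)
```

Under these assumptions the only candidate key is staffNo, which matches its use as the primary key.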

Repeating Group
 A repeating group is an attribute (or set of attributes) that
can have more than one value for a primary key value.
 Repeating groups are not allowed in well-formed
relations
 all attributes have to be atomic
 i.e., there can only be one value per cell in a table

First Normal Form (1NF)
 A relation that contains one or more repeating groups is
said to be in Unnormalized Form (UNF)
 1NF - A relation in which the intersection of each row
and column contains one and only one value
 A relation in 1NF should include only ATOMIC values
 Converting from UNF to 1NF
 Select attribute(s) to act as the key for the table
 Identify the repeating group(s) in the unnormalized table which
repeats for the key attribute(s)
 Remove the repeating group i.e. ‘flatten’ the table
 Create a separate row for each repeated (multivalued) attribute
 Repeat the data for the empty columns in each of the rows

1NF - Example
 Contacts relation is not in the 1NF
 Why?
 Convert the Contacts relation to the 1NF

1NF - Example
 Contacts relation in 1NF
 Eliminate repeating groups
 Identify each set of related data with a primary key
 All attributes are single valued & non-repeating

1NF - Example
 STAFF relation is not in the 1NF
 Why?
 Convert the STAFF relation to the 1NF

1NF - Example
 Department relation is not in the 1NF
 Why?
 Convert the Department relation to the 1NF
Unnormalized Department relation (Location is a repeating group):
DepID DepName Location
1 Admin A, C
2 Finance A, B, C
3 Sales C
4 Research A, B

Department relation in 1NF:
DepID DepName Location
1 Admin A
1 Admin C
2 Finance A
2 Finance B
2 Finance C
3 Sales C
4 Research A
4 Research B
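
The flattening step for the Department relation can be sketched as follows. Representing the repeating group as a Python list and the function name `to_1nf` are assumptions made for illustration.

```python
# Department rows with Location held as a repeating group (a list of values).
unnormalized = [
    (1, "Admin", ["A", "C"]),
    (2, "Finance", ["A", "B", "C"]),
    (3, "Sales", ["C"]),
    (4, "Research", ["A", "B"]),
]

def to_1nf(rows):
    """Emit one row per location, repeating DepID and DepName in each row."""
    return [(dep_id, name, loc) for dep_id, name, locs in rows for loc in locs]

department_1nf = to_1nf(unnormalized)

# In the flattened table DepID alone no longer identifies a row;
# the pair (DepID, Location) does.
key_values = {(dep_id, loc) for dep_id, _, loc in department_1nf}
```

The result has eight rows, one per (DepID, Location) pair.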
Second Normal Form (2NF)
 The 2NF is based on the concept of full functional
dependency.
 Full functional dependency indicates that if
 A and B are attributes of a relation, then B is fully dependent on
A if B is functionally dependent on A but not on any proper
subset of A
 A relation is said to be in the 2NF if
 The relation is in 1NF and
 Every non-primary-key attribute is fully functionally dependent
on the primary key
 2NF => No PARTIAL Dependency on the PK
 There should not be a data item that is functionally
dependent on only a part of the compound key!

1NF to 2NF
 Identify the primary key for the 1NF relation.
 Identify the functional dependencies in the relation.
 If partial dependencies exist on the primary key, remove
them
 Place them in a new relation along with a copy of their
determinant

1NF to 2NF - Example
 The relation EMP_PROJ is NOT in the 2NF
EmpID EName ProjID ProjName TotalTime

 The Primary Key is EmpID and ProjID


 Each attribute in the relation EMP_PROJ should be
dependent on the whole of the PK
 The FDs are
 EmpID → EName
 ProjID → ProjName
 EmpID + ProjID → TotalTime
 EName and ProjName are partially dependent on the
composite key EmpID and ProjID
1NF to 2NF - Example
 EMPLOYEE PROJECT relation
EmpID EName ProjID ProjName TotalTime

 The relation EMPLOYEE PROJECT can be transformed
to second normal form
 The FFDs are:
 EmpID → EName
 EmpID, ProjID → TotalTime
 ProjID → ProjName
 Solution => Decomposition
 EMPLOYEE (EmpID, EName)
 HOURS ASSIGNED(EmpID, ProjID, TotalTime)
 PROJECT (ProjID, ProjName)
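
A small sketch of this decomposition, using made-up sample rows (the slides give only the column names), shows that projecting onto the three relations and joining them back recovers the original rows.

```python
# (EmpID, EName, ProjID, ProjName, TotalTime); the values are hypothetical.
emp_proj = [
    (1, "Emp One", "P1", "Payroll", 10),
    (1, "Emp One", "P2", "Website", 5),
    (2, "Emp Two", "P1", "Payroll", 7),
]

# Project onto the three 2NF relations; sets drop the duplicated facts.
employee = {(e, n) for e, n, _, _, _ in emp_proj}
project = {(p, pn) for _, _, p, pn, _ in emp_proj}
hours_assigned = {(e, p, t) for e, _, p, _, t in emp_proj}

# Rejoin on EmpID and ProjID to confirm that nothing was lost or invented.
rejoined = {
    (e, n, p, pn, t)
    for e, p, t in hours_assigned
    for e2, n in employee if e2 == e
    for p2, pn in project if p2 == p
}
```

Each employee name and each project name is now stored once, and `rejoined` equals the original set of rows.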

Third Normal Form (3NF)
 3NF is based on the concept of Transitive
Dependency.
 Transitive Dependency is a condition where
 A, B and C are attributes of a relation such that if
A → B and B → C,
 then C is transitively dependent on A through B,
 provided that A is not functionally dependent on
B or C
 A relation is in the 3NF if
 it is in 2NF and
 no non-primary-key attribute is transitively
dependent on the primary key
2NF to 3NF
 Identify the primary key in the 2NF relation.
 Identify functional dependencies in the relation.
 If transitive dependencies exist on the primary key
remove them
 by placing them in a new relation along with a copy of their
determinant.

2NF to 3NF - Example
 The Employee relation is not in the 3NF
EmpID EName DepID DepName

 The Primary key is EmpID


 The Functional Dependencies are:
 EmpID → Ename
 EmpID → DepID
 EmpID → DepName
 DepID → DepName
 EmpID → DepName is a Transitive Dependency

2NF to 3NF - Example
 Solution => Decompose:
 If transitive dependencies exist on the primary key, then
 remove them by placing them in a new relation along with a copy of their
determinant
 EmpID → DepName is a Transitive Dependency

EmpID EName DepID

DepID DepName
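
Spotting the transitive dependency can be sketched in a few lines, assuming a single-attribute primary key and FDs stored as a mapping from each determinant to the attributes it determines. The function name and this representation are illustrative choices, not part of the slides.

```python
def transitive_dependencies(primary_key, fds):
    """Return (key, middle, target) triples where key -> middle -> target."""
    found = []
    for middle in sorted(fds.get(primary_key, set())):
        reach = fds.get(middle, set())
        if primary_key in reach:
            continue  # middle determines the key, so it is not transitive
        for target in sorted(reach):
            found.append((primary_key, middle, target))
    return found

employee_fds = {
    "EmpID": {"EName", "DepID", "DepName"},
    "DepID": {"DepName"},
}
transitive = transitive_dependencies("EmpID", employee_fds)

# Each transitive target moves to a new relation keyed by its determinant.
moved = {target for _, _, target in transitive}
remaining = ({"EmpID"} | employee_fds["EmpID"]) - moved
```

Here `transitive` reports EmpID → DepID → DepName, and `remaining` holds the attributes that stay with EmpID, matching the two relations shown above.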

2NF to 3NF - Example
 Relation
SALES (CUSTOMERID, CUSTOMERNAME, SALESPERSON, REGION)

 Functional Dependencies
CUSTOMERID → CUSTOMERNAME
CUSTOMERID → SALESPERSON
CUSTOMERID → REGION

SALESPERSON → REGION

 CUSTOMERID → REGION is a Transitive FD
 Solution?
SALES1 (CUSTOMERID, CUSTOMERNAME, SALESPERSON)
SALES2 (SALESPERSON, REGION)
Boyce–Codd Normal Form

Reading Assignment

Disadvantages of Normalization
 The disadvantage of normalization is that it produces
many tables
 A query might require retrieval of data from multiple normalized
tables
 This can result in complicated table joins
 Decomposition of tables can lead to a performance
problem
 All the joins required to merge data will slow down the
process
 Denormalization
 Denormalization is used to improve performance in cases
where over-normalized structures are causing overhead to the
query processor
 Denormalization should always be done after the relations
are normalized
Definition of Terms
 Key is an attribute or group of attributes, which is used
to identify a row in a relation
 Key can be broadly classified into (1) Superkey (2)
Candidate key, and (3) Primary key
 Superkey: a set of attributes whose combined values are unique for each
record
 Candidate key: a minimal superkey
 A minimal set of attributes whose values uniquely identify tuples
in the corresponding relation.
 Primary Key: a designated candidate key
 The primary key should not be null
 Normal Form of a Relation: refers to the highest normal
form that it satisfies
THE NORMALISATION OATH
 A useful mnemonic for remembering the rationale for
normalization is the distortion of the legal oath:
1. No Repeating,
2. The Data-Items Depend Upon The Key,
3. The Whole Key,
4. And Nothing But The Key,
5. So Help Me Codd.
 Line 1 indicates that there should be no repeating groups of data in a
table.
 Line 2 states that all data-items in a table must depend solely upon the
key.
 Line 3 indicates that there should be no part-key dependencies in a
table.
 Line 4 reminds us that there should be no inter-data dependencies in a
table. The only dependency should be between the key and other
data-items in a table.
 Line 5 simply reminds us that the techniques were originally developed
by Codd.
Exercise
 Convert the following relations to the 3NF
1. R1 = ( A , B , C , D )
 A → B, C, D
 C → D

2. R2 = ( A , B , C , D , E)
 A, B → C, D, E
 B → D

3. R3 = ( A , B , C , D , E)
 A → B, C, D, E
 B → D, E
Exercise
 A company obtains parts from a number of suppliers. Each supplier
is located in one city. A city can have more than one supplier located
there and each city has a status code associated with it. Each
supplier may provide many parts. Identify the normal form of
the following relation and normalize it to third normal form.
R (sID, cCode, city, pID, quantity)

 sID - Supplier identification number


 cCode - code assigned to city
 city - City where supplier is located
 pID - Part number of part supplied
 quantity - Qty of parts supplied to date

 Composite primary key is (sID, pID)


Exercise - solution
R (sID, cCode, city, pID, quantity)

 Primary Key is (sID, pID)
 FDs
 sID → city
 city → cCode
 sID, pID → quantity
 R is in 1NF only: city and cCode depend on sID, which is a part of the key
 2NF
 R1 (sID, cCode, city)
 R2 (sID, pID, quantity)
 3NF
 R11 (sID, city)
 R12 (city, cCode)
 R2 (sID, pID, quantity)

Boyce–Codd Normal Form
 BCNF is based on functional dependencies that take into
account all candidate keys in a relation
 A relation is in BCNF if and only if every determinant is
a candidate key.
 If X → A, then X is a candidate key
 Example: R (A, B, C)
 A, B → C
 C → B [not allowed in BCNF unless C is an alternate key]

BCNF
 Violation of BCNF is quite rare.
 A violation of BCNF may occur in a relation that:
 contains two (or more) composite candidate keys;
 has candidate keys that overlap
 i.e. have at least one attribute in common.
 Example: R (PATIENT, HOSPITAL, DOCTOR)
 FDs
 {PATIENT, HOSPITAL} → DOCTOR
 DOCTOR → HOSPITAL
 Converting to BCNF
 R1 (PATIENT, DOCTOR)
 R2 (DOCTOR, HOSPITAL)
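
The BCNF test for this example can be sketched with attribute closures. The check below asks whether each determinant is a superkey, which coincides with "is a candidate key" when determinants are minimal, as they are here. The function names are illustrative.

```python
ATTRS = {"PATIENT", "HOSPITAL", "DOCTOR"}
FDS = [
    ({"PATIENT", "HOSPITAL"}, {"DOCTOR"}),
    ({"DOCTOR"}, {"HOSPITAL"}),
]

def closure(attrs, fds):
    """All attributes determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """FDs whose determinant does not determine every attribute of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attrs]

violations = bcnf_violations(ATTRS, FDS)

# After the split, DOCTOR -> HOSPITAL lives in R2, where DOCTOR is the key.
r2_violations = bcnf_violations({"DOCTOR", "HOSPITAL"}, [({"DOCTOR"}, {"HOSPITAL"})])
```

In the original relation only DOCTOR → HOSPITAL is flagged; in R2 the same dependency is no longer a violation.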

BCNF and 3NF
 The difference between 3NF and BCNF is that, for a
functional dependency A → B,
 3NF allows this dependency in a relation if B is a primary-key
attribute and A is not a candidate key.
 BCNF insists that for this dependency to remain in a relation, A
must be a candidate key
 Every relation in BCNF is also in 3NF.
 However, a relation in 3NF is not necessarily in BCNF.

BCNF - Example
StudID Course Instructor
101 Java P. Java
101 C++ P. Cpp
102 Java P. Java2
103 C# P. Hash
104 Java P. Java

 FDs
 StudID, Course → Instructor
 Instructor → Course (an instructor teaches only one course)
 The relation is not in the BCNF because
 Instructor is not a KEY and
 Instructor → Course
 To change to BCNF
 R1 (StudID, Instructor)
 R2 (Instructor, Course)

Exercise - Normalize to the 3NF
1. Student (StudID, StudName, Gender, HSCode,
HSName, HSCity, GPA, Priority )
 FDs
 StudID → StudName, Gender
 GPA → Priority
 HSCode → HSName, HSCity

2. R(A,B,C,D,E,F,G,H,I,J )
 FDs
 A, B → C
 A → D, E
 B → F
 D → I, J
 F → G, H