
normalization

Normalization is the process of converting complex data structures into simpler, stable forms to avoid data duplication. It involves several steps, primarily focusing on First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF), which ensure that data is organized efficiently and without redundancy. Functional dependencies play a crucial role in normalization, helping to identify relationships between attributes and ensuring that non-key attributes depend only on primary keys.

Uploaded by

Loki Legends
Copyright
© © All Rights Reserved

Normalization

Re-edited by: Oum Saokosal


Master of Engineering in Information Systems,
Jeonju University, South Korea
012-252-752

[email protected]
Normalization

◆ Normalization: the process of converting complex data structures into simple, stable data structures.

◆ The main idea is to avoid duplicating large amounts of data.

◆ Why normalization?
• The relation derived from the user view or data store will most likely be unnormalized.
• The problem usually arises when an existing system uses unstructured files, e.g. in MS Excel.

The Three Steps of Normalization

The standard normalization process has more than three steps:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
• Domain/Key Normal Form (DKNF)

However, the first three steps (1NF, 2NF, 3NF) are sufficient for most practical database designs.

I. First Normal Form (1NF)

The official qualifications for 1NF are:

1. Each attribute must have a unique name.
2. Each attribute must have a single value.
3. Rows cannot be duplicated.
4. There are no repeating groups.

Additional:
1. Choose a primary key. The primary key can be a single attribute or a combination of attributes.

Name   DOB         Course    Payment
Sok    11/5/1990   IT        450 Dollars
Sao    4/4/1989    Mgt       400 Dollars
Chan   7/7/1991    IT Mgt    IT: 450 Dollars Mgt: 400 Dollars
Sok    11/5/1990   Mgt       400 Dollars
Sao    4/4/1989    Tour      1) 200 Dollars 2) 200 Dollars


1. Each attribute has a unique name -> Good
2. Payment mixes data types (currency and string) -> Bad
3. No rows are duplicated -> Good
4. Course and Payment have repeating groups -> Bad

Name   DOB         Course   Payment ($)
Sok    11/5/1990   IT       450
Sao    4/4/1989    Mgt      400
Chan   7/7/1991    IT       450
Chan   7/7/1991    Mgt      400
Sok    11/5/1990   Mgt      400
Sao    4/4/1989    Tour     200
Sao    4/4/1989    Tour     200
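
The step from the first table to the flat one can be sketched in Python. This is a minimal sketch, not the author's method: the nested list/dict layout and the names `unnormalized` and `to_1nf` are assumptions made for illustration, and the payments are stored as plain numbers so the column has a single data type.

```python
# Rows transcribed from the unnormalized table; each student row carries a
# repeating group of (course, payment) pairs. The layout is an assumption.
unnormalized = [
    {"Name": "Sok", "DOB": "11/5/1990", "Courses": [("IT", 450)]},
    {"Name": "Sao", "DOB": "4/4/1989", "Courses": [("Mgt", 400)]},
    {"Name": "Chan", "DOB": "7/7/1991", "Courses": [("IT", 450), ("Mgt", 400)]},
    {"Name": "Sok", "DOB": "11/5/1990", "Courses": [("Mgt", 400)]},
    {"Name": "Sao", "DOB": "4/4/1989", "Courses": [("Tour", 200), ("Tour", 200)]},
]


def to_1nf(rows):
    """Emit one flat row per (student, course) pair so each attribute holds one value."""
    flat = []
    for row in rows:
        for course, payment in row["Courses"]:
            flat.append((row["Name"], row["DOB"], course, payment))
    return flat


flat_rows = to_1nf(unnormalized)
for flat_row in flat_rows:
    print(flat_row)
```

The five nested rows become seven flat rows, and the last two are identical, which is why a primary key still has to be chosen.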

All correct? Not yet. Choose a primary key.

Name? No. Name has duplicated values.
Or DOB, or Course, or Payment? No. Each one has duplicated values.
Name and DOB? No. They still have duplicated values.
Name and DOB and Course? No. Still duplicated.
Combine all attributes? Still no. The last two rows are duplicated. So what else can we do?

Of course, there is a way. Add a new attribute to be the primary key. Let's call it ID.

ID   Name   DOB         Course   Payment
1    Sok    11/5/1990   IT       450
2    Sao    4/4/1989    Mgt      400
3    Chan   7/7/1991    IT       450
4    Chan   7/7/1991    Mgt      400
5    Sok    11/5/1990   Mgt      300
6    Sao    4/4/1989    Tour     200
7    Sao    4/4/1989    Tour     200


Now it is completely in 1NF.

Next, check whether it is in 2NF.
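
The search for a primary key can also be done mechanically. The sketch below is only an illustration under stated assumptions: the helper names `is_unique` and `unique_column_sets` are hypothetical, and the rows are the seven flat rows from the table without ID. It tries every combination of attributes, finds that none is unique, and then adds a surrogate ID.

```python
from itertools import combinations

ATTRS = ("Name", "DOB", "Course", "Payment")
rows = [
    ("Sok", "11/5/1990", "IT", 450),
    ("Sao", "4/4/1989", "Mgt", 400),
    ("Chan", "7/7/1991", "IT", 450),
    ("Chan", "7/7/1991", "Mgt", 400),
    ("Sok", "11/5/1990", "Mgt", 400),
    ("Sao", "4/4/1989", "Tour", 200),
    ("Sao", "4/4/1989", "Tour", 200),
]


def is_unique(rows, columns):
    """True if no two rows share the same values in the given column positions."""
    projected = [tuple(row[i] for i in columns) for row in rows]
    return len(set(projected)) == len(projected)


def unique_column_sets(rows, attrs):
    """Return every combination of attributes whose values identify each row."""
    found = []
    for size in range(1, len(attrs) + 1):
        for cols in combinations(range(len(attrs)), size):
            if is_unique(rows, cols):
                found.append(tuple(attrs[i] for i in cols))
    return found


usable_keys = unique_column_sets(rows, ATTRS)
print(usable_keys)  # empty: the last two rows are identical in every attribute

# Add a surrogate ID column (1, 2, 3, ...) in front of each row.
with_id = [(i, *row) for i, row in enumerate(rows, start=1)]
print(is_unique(with_id, (0,)))  # the ID column alone now identifies each row
```

A uniqueness check on sample rows can only rule a key out; whether a column set is really a key is a rule about the data, not about one snapshot of it.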


II. Second Normal Form (2NF)

The official qualifications for 2NF are:

1. The table is already in 1NF.
2. All nonkey attributes are fully dependent on the primary key.

All partial dependencies are removed and placed in another table.

Assume you have the table below, whose primary key is (CourseID + Semester):

CourseID   Semester   Num Student   Course Name
IT101      2013-1     25            Database
IT101      2013-2     25            Database
IT102      2013-1     30            Web Prog
IT102      2013-2     35            Web Prog
IT103      2014-1     20            Networking

Primary key: (CourseID + Semester)

Course Name depends only on CourseID, which is a part of the primary key, not on the whole primary key (CourseID + Semester). This is called a partial dependency.

Solution:
Remove CourseID and Course Name together to create a new table.

CourseID   Course Name
IT101      Database
IT101      Database
IT102      Web Prog
IT102      Web Prog
IT103      Networking

CourseID   Semester   Num Student
IT101      2013-1     25
IT101      2013-2     25
IT102      2013-1     30
IT102      2013-2     35
IT103      2014-1     20

Done? Not yet. The new CourseID / Course Name table is still not in 1NF, because it contains duplicated rows. You have to remove the repeating rows too.

CourseID   Course Name
IT101      Database
IT102      Web Prog
IT103      Networking
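
The 2NF split can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper `fd_holds` and the constant names are hypothetical, and a check on sample rows only shows that the data does not contradict a dependency, not that the dependency is a rule of the domain.

```python
rows = [
    ("IT101", "2013-1", 25, "Database"),
    ("IT101", "2013-2", 25, "Database"),
    ("IT102", "2013-1", 30, "Web Prog"),
    ("IT102", "2013-2", 35, "Web Prog"),
    ("IT103", "2014-1", 20, "Networking"),
]
COURSE_ID, SEMESTER, NUM_STUDENT, COURSE_NAME = range(4)


def fd_holds(rows, lhs, rhs):
    """True if equal values in the lhs columns always imply equal values in the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


# Course Name is determined by CourseID alone: a partial dependency on the
# composite key (CourseID, Semester).
name_is_partial = fd_holds(rows, [COURSE_ID], [COURSE_NAME])
# Num Student is not determined by CourseID alone (IT102 has 30 and 35).
num_is_partial = fd_holds(rows, [COURSE_ID], [NUM_STUDENT])

# Move the partially dependent attribute to its own table; the set removes duplicates.
course = sorted({(r[COURSE_ID], r[COURSE_NAME]) for r in rows})
offering = [(r[COURSE_ID], r[SEMESTER], r[NUM_STUDENT]) for r in rows]

print(name_is_partial, num_is_partial)
print(course)
print(offering)
```

Building `course` from a set removes the duplicated rows in the same step, so the result matches the final three-row table.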
III. Third Normal Form (3NF)

The official qualifications for 3NF are:

1. The table is already in 2NF.
2. Nonprimary key attributes do not depend on other nonprimary key attributes (i.e. no transitive dependencies).

All transitive dependencies are removed and placed in another table.

Assume you have the table below, whose primary key is StudyID:

StudyID   Course Name   Teacher Name   Teacher Tel
1         Database      Sok Piseth     012 123 456
2         Database      Sao Kanha      0977 322 111
3         Web Prog      Chan Veasna    012 412 333
4         Web Prog      Chan Veasna    012 412 333
5         Networking    Pou Sambath    077 545 221

Primary key: StudyID

Teacher Tel is a nonkey attribute, and Teacher Name is also a nonkey attribute. But Teacher Tel depends on Teacher Name. This is called a transitive dependency.

Solution:
Remove Teacher Name and Teacher Tel together to create a new table.

Teacher Name   Teacher Tel
Sok Piseth     012 123 456
Sao Kanha      0977 322 111
Chan Veasna    012 412 333
Chan Veasna    012 412 333
Pou Sambath    077 545 221

Done? Not yet. It is still not in 1NF, because the Chan Veasna row is duplicated. Remove the duplicate and link the two tables through a teacher ID (T.ID):

StudyID   Course Name   T.ID
1         Database      T1
2         Database      T2
3         Web Prog      T3
4         Web Prog      T3
5         Networking    T4

Teacher Name   Teacher Tel
Sok Piseth     012 123 456
Sao Kanha      0977 322 111
Chan Veasna    012 412 333
Pou Sambath    077 545 221


Note about primary key:
In theory, you can choose Teacher Name to be the primary key. But in practice, you should add Teacher ID as the primary key.

T.ID   Teacher Name   Teacher Tel
T1     Sok Piseth     012 123 456
T2     Sao Kanha      0977 322 111
T3     Chan Veasna    012 412 333
T4     Pou Sambath    077 545 221
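
The 3NF split, including the added teacher ID, can be sketched like this. It is only an illustration: the function name `split_teachers` and the rule of numbering teachers in first-seen order are assumptions, chosen because they reproduce the T1 to T4 labels used above.

```python
study = [
    (1, "Database", "Sok Piseth", "012 123 456"),
    (2, "Database", "Sao Kanha", "0977 322 111"),
    (3, "Web Prog", "Chan Veasna", "012 412 333"),
    (4, "Web Prog", "Chan Veasna", "012 412 333"),
    (5, "Networking", "Pou Sambath", "077 545 221"),
]


def split_teachers(rows):
    """Move (Teacher Name, Teacher Tel) into its own table keyed by a surrogate T.ID."""
    teacher_ids = {}  # (name, tel) -> "T1", "T2", ... in first-seen order
    study_rows = []
    for study_id, course, name, tel in rows:
        teacher = (name, tel)
        if teacher not in teacher_ids:
            teacher_ids[teacher] = f"T{len(teacher_ids) + 1}"
        study_rows.append((study_id, course, teacher_ids[teacher]))
    teacher_rows = [(tid, name, tel) for (name, tel), tid in teacher_ids.items()]
    return study_rows, teacher_rows


study_rows, teacher_rows = split_teachers(study)
print(study_rows)
print(teacher_rows)
```

Each teacher's telephone number is now stored once, so changing it means updating a single row in the teacher table.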
What about this table?

ID   Name   DOB         Course   Payment
1    Sok    11/5/1990   IT       450
2    Sao    4/4/1989    Mgt      400
3    Chan   7/7/1991    IT       450
4    Chan   7/7/1991    Mgt      400
5    Sok    11/5/1990   Mgt      300
6    Sao    4/4/1989    Tour     200
7    Sao    4/4/1989    Tour     200


In the case of the above table, there is nothing to do for 2NF, because the primary key is a single attribute, not a combination of attributes, so no partial dependency is possible. Therefore, you can skip 2NF and move to 3NF.

In 3NF, you must remove transitive dependencies.

Name and DOB do not depend on ID, so remove them into a separate table.

Course and Payment do not depend on ID either, so remove them too.

Student

ID   Name   DOB
S1   Sok    11/5/1990
S2   Chan   7/7/1991
S3   Sao    4/4/1989

Course

CourseID   Course
C1         IT
C2         Mgt
C3         Tour

Payment

PID   SID   Course   Payment
1     S1    C1       $450
2     S3    C2       $400
3     S2    C1       $450
4     S2    C2       $400
5     S1    C2       $300
6     S3    C3       $200
7     S3    C3       $200

(Diagram: Student 1 to M Payment, and Course 1 to M Payment.)

For the Payment table, it is not done yet. It is a relationship between Student and Course.

(Diagram: Student M to N Course, through Payment, which has the attributes PaymentID and Payment.)
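
As a rough illustration, the three tables can be created in SQLite from Python's standard library. This is a sketch under stated assumptions, not a definitive schema: the column types, the `SID` and `CourseID` foreign key names, and the payment rows (derived from the seven-row table with ID) are choices made here for the example. The final query shows the JOINs needed to rebuild the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Student (SID TEXT PRIMARY KEY, Name TEXT NOT NULL, DOB TEXT NOT NULL);
CREATE TABLE Course  (CourseID TEXT PRIMARY KEY, Course TEXT NOT NULL);
CREATE TABLE Payment (
    PID      INTEGER PRIMARY KEY,
    SID      TEXT NOT NULL REFERENCES Student(SID),
    CourseID TEXT NOT NULL REFERENCES Course(CourseID),
    Payment  INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    ("S1", "Sok", "11/5/1990"), ("S2", "Chan", "7/7/1991"), ("S3", "Sao", "4/4/1989"),
])
conn.executemany("INSERT INTO Course VALUES (?, ?)", [
    ("C1", "IT"), ("C2", "Mgt"), ("C3", "Tour"),
])
conn.executemany("INSERT INTO Payment VALUES (?, ?, ?, ?)", [
    (1, "S1", "C1", 450), (2, "S3", "C2", 400), (3, "S2", "C1", 450),
    (4, "S2", "C2", 400), (5, "S1", "C2", 300), (6, "S3", "C3", 200),
    (7, "S3", "C3", 200),
])
conn.commit()

# Two JOINs are needed to rebuild the original single-table view.
rebuilt = conn.execute("""
    SELECT p.PID, s.Name, s.DOB, c.Course, p.Payment
    FROM Payment AS p
    JOIN Student AS s ON s.SID = p.SID
    JOIN Course  AS c ON c.CourseID = p.CourseID
    ORDER BY p.PID
""").fetchall()
for rebuilt_row in rebuilt:
    print(rebuilt_row)
```

The extra JOINs are the cost of the split: each student and each course is stored once, but reading the full picture touches three tables.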

Stop at 3NF

◆ The most commonly used normal forms:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
◆ The highest level of normalization is not always desirable:
• More JOINs are required.
• Data retrieval performance suffers (higher response time).
◆ For most business database design purposes, 3NF is as high as we need to go in the normalization process.

Normalization in Real-World

◆ When you create a new table in a database tool, e.g. MS Access, SQL Server, MySQL, or Oracle, you won't need all the steps.
◆ These tools already help you satisfy 1NF.
◆ 2NF only matters when the primary key is a combination of attributes, e.g. StudentName + DOB. But such a key is impractical.
◆ Mostly, you only need 3NF, because it removes all transitive dependencies.

Functional Dependency

A Bit More About Theory


Functional Dependencies

◆ An important concept associated with normalization is functional dependency, which describes the relationship between attributes.

Functional Dependencies
Functional dependency can be divided into two types:
• Full functional dependency / partial dependency (PD): used to transform 1NF → 2NF
• Transitive dependency (TD): used to transform 2NF → 3NF

Functional Dependencies

Multivalued Attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the primary key or a part of it, i.e. they are not functionally dependent on it.

STUDENT

Stud_ID   Name      Course_ID   Units
101       Lennon    MSI 250     3.00    (1st row)
                    MSI 415     3.00    (2nd row)
125       Johnson   MSI 331     3.00

Relational Schema

STUDENT(Stud_ID, Name, (Course_ID, Units))

Functional Dependencies

Partial Dependency: when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key (the primary key must be a composite key).

CUSTOMER

Cust_ID   Name    Order_ID
101       AT&T    1234
101       AT&T    156
125       Cisco   1250

Partial dependency: Cust_ID → Name

Functional Dependencies
Transitive Dependency: when a non-key attribute determines another non-key attribute.

EMPLOYEE

Emp_ID   F_Name   L_Name   Dept_ID   Dept_Name
111      Mary     Jones    1         Acct
122      Sarah    Smith    2         Mktg

Transitive dependency: Dept_ID → Dept_Name

Functional Dependencies
Consider a relation R with attributes A and B, where attribute B is functionally dependent on attribute A. Let us say that A is the PK of R.

R(A, B)    A → B    (B is functionally dependent on A)

Another way to describe the relationship between attributes A and B is to say that "A functionally determines B".
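
Stated a little more formally, A → B means that any two rows that agree on A must also agree on B. The notation below is an assumption added for illustration and does not come from the slides: t[A] stands for the value of attribute A in row t.

```latex
A \to B \ \text{holds in}\ R
\iff
\forall\, t_1, t_2 \in R:\quad t_1[A] = t_2[A] \implies t_1[B] = t_2[B]
```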


Functional Dependencies
When a functional dependency exists, the attribute or group of attributes on the left-hand side of the arrow is called the determinant.

Determinant:
Refers to the attribute, or group of attributes, on the left-hand side of the arrow of a functional dependency.

A → B    (A functionally determines B; A is the determinant)

Functional Dependencies

staff

staffNO   sName   position     salary   branchNO
S21       Johan   Manager      3000     B005
S37       Ana     Assistant    1200     B003
S14       Daud    Supervisor   1800     B003
S9        Mary    Assistant    900      B007
S5        Siti    Manager      2400     B003
S41       Jani    Assistant    900      B005

branch

branchNO   bAddress
B005       123, Kepong
B007       456, Nilai
B003       789, PTP

Functional Dependencies
Consider the attributes staffNO and position of the staff relation.

For a specific staffNO (S21), we can determine the position of that member of staff as Manager. staffNO functionally determines position.

staffNO → position    (position is functionally dependent on staffNO)

Staff number (S21) → Position (Manager)

Functional Dependencies

However, the next figure illustrates that the opposite is not true: position does not functionally determine staffNO. A member of staff holds one position; however, there may be several members of staff with the same position.

position ↛ staffNO    (staffNO is not functionally dependent on position)

Position (Manager) → Staff number (S21), Staff number (S5)
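
Both directions can be checked against the staff rows with a few lines of Python. This is a minimal sketch: `fd_holds` is a hypothetical helper written for this example, and a check on sample rows can only show that the data does or does not contradict a dependency.

```python
staff = [
    ("S21", "Johan", "Manager", 3000, "B005"),
    ("S37", "Ana", "Assistant", 1200, "B003"),
    ("S14", "Daud", "Supervisor", 1800, "B003"),
    ("S9", "Mary", "Assistant", 900, "B007"),
    ("S5", "Siti", "Manager", 2400, "B003"),
    ("S41", "Jani", "Assistant", 900, "B005"),
]
STAFF_NO, S_NAME, POSITION, SALARY, BRANCH_NO = range(5)


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


staff_to_position = fd_holds(staff, [STAFF_NO], [POSITION])
position_to_staff = fd_holds(staff, [POSITION], [STAFF_NO])

print(staff_to_position)  # True: each staffNO appears with exactly one position
print(position_to_staff)  # False: Manager appears with both S21 and S5
```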

Functional Dependencies

Partial Dependencies:
Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.

staff(staffNO, sName, position, salary, branchNO)

staffNO, sName → branchNO

True!!! → each value of (staffNO, sName) is associated with a single value of branchNO.
→ However, branchNO is also functionally dependent on staffNO alone, a proper subset of (staffNO, sName), so this is only a partial dependency.
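
The difference between a full and a partial dependency can be tested by trying every proper subset of the left-hand side. The sketch below is illustrative only: `fd_holds` and `is_full_dependency` are hypothetical names, and the staff rows are transcribed from the staff table.

```python
from itertools import combinations

staff = [
    ("S21", "Johan", "Manager", 3000, "B005"),
    ("S37", "Ana", "Assistant", 1200, "B003"),
    ("S14", "Daud", "Supervisor", 1800, "B003"),
    ("S9", "Mary", "Assistant", 900, "B007"),
    ("S5", "Siti", "Manager", 2400, "B003"),
    ("S41", "Jani", "Assistant", 900, "B005"),
]
STAFF_NO, S_NAME, POSITION, SALARY, BRANCH_NO = range(5)


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_full_dependency(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


composite_holds = fd_holds(staff, [STAFF_NO, S_NAME], [BRANCH_NO])
composite_is_full = is_full_dependency(staff, [STAFF_NO, S_NAME], [BRANCH_NO])
single_is_full = is_full_dependency(staff, [STAFF_NO], [BRANCH_NO])

print(composite_holds)    # True: (staffNO, sName) -> branchNO holds
print(composite_is_full)  # False: staffNO alone already determines branchNO
print(single_is_full)     # True: staffNO -> branchNO has no smaller determinant
```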

Functional Dependencies
Transitive Dependencies:

staff(staffNO, sName, position, salary, *branchNO)
branch(branchNO, bAddress)

staffNO → sName, position, salary, branchNO, bAddress
branchNO → bAddress

True for transitive dependency!!! → bAddress depends on staffNO transitively, via branchNO.
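
A transitive chain can be detected and split as sketched below. This is an illustration under stated assumptions: the joined rows are built by hand from the staff and branch tables, and `fd_holds` and `is_transitive` are hypothetical helpers, not part of the original material.

```python
# staffNO, branchNO and bAddress, joined from the staff and branch tables.
staff_branch = [
    ("S21", "B005", "123, Kepong"),
    ("S37", "B003", "789, PTP"),
    ("S14", "B003", "789, PTP"),
    ("S9", "B007", "456, Nilai"),
    ("S5", "B003", "789, PTP"),
    ("S41", "B005", "123, Kepong"),
]
STAFF_NO, BRANCH_NO, B_ADDRESS = range(3)


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_transitive(rows, a, b, c):
    """True if a -> b and b -> c hold while b does not determine a."""
    return (
        fd_holds(rows, [a], [b])
        and fd_holds(rows, [b], [c])
        and not fd_holds(rows, [b], [a])
    )


chain_found = is_transitive(staff_branch, STAFF_NO, BRANCH_NO, B_ADDRESS)

# Remove the transitive dependency: bAddress moves to a table keyed by branchNO.
staff_part = [(r[STAFF_NO], r[BRANCH_NO]) for r in staff_branch]
branch_part = sorted({(r[BRANCH_NO], r[B_ADDRESS]) for r in staff_branch})

print(chain_found)
print(branch_part)
```

After the split, each branch address is stored once, which matches the separate staff and branch relations shown above.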


Normalization Process

◆ A formal technique for analyzing relations based on their primary key (or candidate keys) and functional dependencies.
◆ The technique is executed as a series of steps (stages). Each step corresponds to a specific normal form that has specific characteristics.

(Figure: data redundancies across 1NF, 2NF and 3NF.)

◆ As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to anomalies.

Normalization Process

UNF (1 table)
- Relation/table format
- Has repeating groups
- PK not defined

Step: 1) remove the repeating groups, 2) define the PK.

1NF (1 or 2 tables)
- Composite PK consisting of several attributes
- No repeating groups
- PK defined
- Test for partial dependency, e.g. (a, b → x, y)

If a partial dependency exists, remove it.

2NF (2 or 3 tables)
- No repeating groups
- PK defined
- No partial dependency
- Test for transitive dependency, e.g. (c → d)

If a transitive dependency exists, remove it.

3NF (3 or 4 tables)
- No repeating groups
- PK defined
- No partial dependency
- No transitive dependency
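
The two tests in this process can be written as a small classifier. This is a simplified sketch, not a complete normal-form checker: the function name `normal_form` is hypothetical, and the code assumes the relation is already in 1NF, has a single candidate key, and that the listed functional dependencies are the only ones that matter.

```python
def normal_form(attributes, key, fds):
    """Classify a relation as '1NF', '2NF' or '3NF' using the two dependency tests.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    key = set(key)
    non_key = set(attributes) - key
    # Partial dependency: a proper, non-empty part of the key determines a non-key attribute.
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if lhs and lhs < key and rhs & non_key:
            return "1NF"
    # Transitive dependency: non-key attributes determine another non-key attribute.
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if lhs and lhs <= non_key and (rhs - lhs) & non_key:
            return "2NF"
    return "3NF"


course_table = normal_form(
    {"CourseID", "Semester", "NumStudent", "CourseName"},
    {"CourseID", "Semester"},
    [({"CourseID", "Semester"}, {"NumStudent"}), ({"CourseID"}, {"CourseName"})],
)
study_table = normal_form(
    {"StudyID", "CourseName", "TeacherName", "TeacherTel"},
    {"StudyID"},
    [({"StudyID"}, {"CourseName", "TeacherName"}), ({"TeacherName"}, {"TeacherTel"})],
)
teacher_table = normal_form(
    {"TID", "TeacherName", "TeacherTel"},
    {"TID"},
    [({"TID"}, {"TeacherName", "TeacherTel"})],
)

print(course_table)   # 1NF: Course Name depends on part of the key
print(study_table)    # 2NF: Teacher Tel depends on Teacher Name
print(teacher_table)  # 3NF
```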

End of Chapter
