Week 5 Lecture

Uploaded by

Yathartha Stha
Copyright
© © All Rights Reserved

Switch off your Mobile Phones

or
Change Profile to Silent Mode
Normalisation
Topics
• Introduction
• Why Normalisation
• Functional and Transitive Dependency
• Normalisation
• Unnormalized Form (UNF)
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Anomalies (Insert, Delete, Update)
• Advantages and Disadvantages
Introduction
• Normalisation is the process of organising data in a database. It involves creating entities/relations and establishing relationships between tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
• The normalisation process gets data into a simple form that truly reflects the separate entity types, their attributes and the relationships between them, avoiding unnecessary duplication of data.
• It starts from pre-documented sets of attributes and groups and regroups them, without causing data inconsistencies, in such a way that anomalies are avoided.
Anomalies
• Anomalies are undesirable side effects that can occur if relations are not in a proper normal form.
• Anomalies fall into three categories:
• Insertion Anomaly:
• Adding new rows forces the user to create duplicate data, or a new row cannot be inserted into the relation because some or all of the Primary Key value is not known.
For example, if we try to insert a record into STUDENT_COURSE with STUD_NO = 7, the insert will not be allowed.
Anomalies
• Update Anomaly:
• Changing data in a row forces changes to other rows because of duplication.
• It occurs when there is unnecessary redundancy in the data and we are forced to update several rows.
For example, if we try to update the course name "computer networks" in table STUDENT_COURSE, we have to update every row that holds that course name.
• Deletion Anomaly:
• Deleting rows may cause loss of data that would be needed for other future rows.
For example, if we try to delete the record with STUD_NO = 1 from STUDENT, the delete will not be allowed.
Anomalies
Idea: The table/relation Student_Info holds all the data about each student.
Result: The entire branch data of a course must be repeated for every student of the branch.
Data Redundancy: Some data is stored multiple times unnecessarily in a database table.

Disadvantages:

1. Insertion, Deletion and Update Anomalies

2. Data Inconsistencies
Anomalies
Insertion Anomaly: If we try to insert a new course with course_id C03, Multimedia, the insert is not allowed if no students are registered on that course.
Update Anomaly: If we must update the credit point value of Computing from 120 to 160, we must change the value in every row for Computing.
Deletion Anomaly: If we delete the record of student S04, we also remove the entire record of course C02, Networking.
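The three anomalies above can be sketched on a small un-normalised table. This is only an illustration under stated assumptions: the Student_Info layout, the student names and the credit values for Networking and Multimedia are invented for the example; only the course ids, course names and the 120 to 160 change echo the text.

```python
# Hypothetical un-normalised rows: course data is repeated for every student.
student_info = [
    {"stud_id": "S01", "name": "Asha", "course_id": "C01", "course": "Computing", "credits": 120},
    {"stud_id": "S02", "name": "Ben", "course_id": "C01", "course": "Computing", "credits": 120},
    {"stud_id": "S04", "name": "Dev", "course_id": "C02", "course": "Networking", "credits": 120},
]

def courses(rows):
    """Return the set of course ids the table currently knows about."""
    return {r["course_id"] for r in rows}

# Update anomaly: one fact (Computing = 160 credits) needs several row changes.
rows_changed = 0
for row in student_info:
    if row["course"] == "Computing":
        row["credits"] = 160
        rows_changed += 1

# Deletion anomaly: removing student S04 also erases all knowledge of course C02.
after_delete = [r for r in student_info if r["stud_id"] != "S04"]
lost_courses = courses(student_info) - courses(after_delete)

# Insertion anomaly: course C03 has no student yet, so the key (stud_id) is unknown.
def can_insert(row):
    """A row is only insertable when its primary key value is known."""
    return row.get("stud_id") is not None

new_course = {"stud_id": None, "name": None, "course_id": "C03",
              "course": "Multimedia", "credits": 120}
insert_ok = can_insert(new_course)

print(rows_changed, lost_courses, insert_ok)
```

Each symptom comes from the same cause: course facts are stored inside student rows instead of in a relation of their own.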


Why Normalisation
• Database Design must be efficient (performance-wise).
• Amount of data should be reduced if possible.
• Design should be free of Update, Insertion and Deletion
Anomalies.
• Design must comply with rules regarding Relational Databases.
• Design must show relevant Relationship between Entities.
• Design should permit simple retrieval, simplify data maintenance
and reduce need to restructure data.
Functional Dependencies (FD)
• Normalisation is based on analysis of Functional
Dependence.
• Functional Dependency is particular Relationship between
two Attributes.
• Attribute B is Functionally Dependent upon Attribute A (or
a collection of attributes) if a value of A determines a single
value of B at any one time.
• Value of one Attribute (collection of attributes) determines
value of another Attribute
Functional Dependencies (FD)
• Notation for this Functional Dependency is: A 🡪 B
• Notation is read “A determines B” or “B is Functionally Dependent on A”.
• A is called the Determinant and B is called the Dependent (the object of the determinant).
• Composite Determinant is made up of more than one attribute: X, Y 🡪 Z
Functional Dependencies (FD)
• Two terms are relevant to Composite Determinants. Example: X, Y 🡪 Z
• Full Functional Dependency (FFD)
• If it is necessary to use all attributes of the Composite Determinant to identify its object uniquely, we have a Full Functional Dependency.
• Partial Functional Dependency (PFD)
• A Partial Functional Dependency exists if only a subset of the attributes of a Composite Determinant is needed to identify the object uniquely.
FD - An Example
Student-id  Activity  Cost  Proficiency (Level)
100         Squash    200   A
100         Swimming  100   B
150         Swimming  100   B
175         Scuba     300   L
175         Aerobics  200   I
200         Squash    200   A
200         Swimming  100   B
■ FFD = student-id, activity 🡪 proficiency
■ PFD = activity 🡪 cost
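The partial dependency in this table can be checked mechanically. The sketch below uses the seven rows of the example; the helper names `determines` and `partial_determinants` are hypothetical, chosen only for this illustration.

```python
# Rows copied from the example table: (student-id, activity, cost, proficiency).
rows = [
    (100, "Squash", 200, "A"),
    (100, "Swimming", 100, "B"),
    (150, "Swimming", 100, "B"),
    (175, "Scuba", 300, "L"),
    (175, "Aerobics", 200, "I"),
    (200, "Squash", 200, "A"),
    (200, "Swimming", 100, "B"),
]
COLS = {"student_id": 0, "activity": 1, "cost": 2, "proficiency": 3}

def determines(lhs, rhs):
    """True if each combination of `lhs` values maps to a single `rhs` value."""
    mapping = {}
    for row in rows:
        key = tuple(row[COLS[c]] for c in lhs)
        value = row[COLS[rhs]]
        if mapping.setdefault(key, value) != value:
            return False
    return True

def partial_determinants(composite, rhs):
    """Attributes of the composite determinant that determine `rhs` on their own."""
    return [c for c in composite if determines([c], rhs)]

key = ["student_id", "activity"]
print(determines(key, "cost"))            # the whole key determines cost
print(partial_determinants(key, "cost"))  # activity alone is already enough
```

A check over sample rows can only refute a dependency, never prove one. Whether proficiency really needs the whole key is a business rule that the analyst confirms, not something seven rows can establish.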
Transitive Dependency (TD)
• Transitive Dependency exists when there is an intermediate dependency. Assume three attributes A, B and C. Further assume that the following functional dependencies exist
• A 🡪 B, B 🡪 C
• Then it can be stated that the following transitive dependency also holds
• A 🡪 B 🡪 C
Student-id  Accommodation  Fee
100         Perkin         1100
150         Gatehouse      1200
200         Gatehouse      1200
250         Perkin         1100
300         Ingleside      1500
■ FD = student-id 🡪 accommodation
■ FD = accommodation 🡪 fee
■ TD = student-id 🡪 accommodation 🡪 fee
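The transitive path can be followed in code by holding each functional dependency as its own mapping. This is a minimal sketch over the five rows above; `as_mapping` and `fee_for` are hypothetical helper names.

```python
# Rows from the accommodation table: (student-id, accommodation, fee).
rows = [
    (100, "Perkin", 1100),
    (150, "Gatehouse", 1200),
    (200, "Gatehouse", 1200),
    (250, "Perkin", 1100),
    (300, "Ingleside", 1500),
]

def as_mapping(pairs):
    """Build a determinant -> dependent mapping, failing if the FD is violated."""
    mapping = {}
    for left, right in pairs:
        if mapping.setdefault(left, right) != right:
            raise ValueError(f"{left!r} maps to more than one value")
    return mapping

# One mapping per functional dependency.
student_to_accommodation = as_mapping((s, a) for s, a, _ in rows)
accommodation_to_fee = as_mapping((a, f) for _, a, f in rows)

def fee_for(student_id):
    """Follow student-id -> accommodation -> fee, the transitive path."""
    return accommodation_to_fee[student_to_accommodation[student_id]]

print(fee_for(200))
```

The fee is stored once per accommodation (three entries) rather than once per student (five rows), which is the redundancy that removing a transitive dependency eliminates.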
Steps in Normalisation
• Step 1: Un-Normalised Form (UNF)
• All attributes with repeating groups included.
• Step 2: First Normal Form (1NF)
• Any repeating groups have been separated to a new relation/table, so that there is a single value at the intersection of each row and column of the relation/table.
• Step 3: Second Normal Form (2NF)
• Any Partial Functional Dependencies have been separated to a new relation.
• Step 4: Third Normal Form (3NF)
• Any Transitive Dependencies have been separated to a new relation.
• Note:
• If a Relation meets the criteria for 3NF, it also meets the criteria for 2NF and 1NF. Most design problems can be avoided if Relations are in 3NF.
Un-normalised Form (UNF)
• A relation is considered unnormalised if no normalisation rules have been applied to it, resulting in various anomalies.
• Sources of un-normalised data are:
• Computer Screen Layouts
• Reports/Bills
• Computer Programs
• User Manuals
An Order Form – Example
Un-normalised Form (UNF)
Order-no  Order-date  Cust-no  Cust-name     Cust-address                     Prod-no  Prod-desc  Unit-price  Ord-qty  Line-total  Order-total
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  T5060    Lock       5.00        5        25.00       70.00
                                                                              PT42     Alarm      20.00       1        20.00       70.00
                                                                              QZE248   Key        2.50        10       25.00       70.00


Un-normalised Form (UNF)
Data Redundancy
Order-no  Order-date  Cust-no  Cust-name     Cust-address                     Prod-no  Prod-desc  Unit-price  Ord-qty  Line-total  Order-total
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  T5060    Lock       5.00        5        25.00       70.00
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  PT42     Alarm      20.00       1        20.00       70.00
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  QZE248   Key        2.50        10       25.00       70.00
Un-normalised Form (UNF)
Order-no  Order-date  Cust-no  Cust-name     Cust-address                     Prod-no              Prod-desc         Unit-price   Ord-qty   Line-total  Order-total
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  T5060, PT42, QZE248  Lock, Alarm, Key  5, 20, 2.50  5, 1, 10  25, 20, 25  70.00
Repeating group identification in UNF
• Repeating Group: an attribute or a group of attributes in an unnormalised table that can have multiple values for a single instance/occurrence of the nominated key attribute (primary key) of the table.
Un-normalised Form (UNF)
Repeating Group: Prod-no, Prod-desc, Unit-price, Ord-qty, Line-total
Order-no  Order-date  Cust-no  Cust-name     Cust-address                     Prod-no              Prod-desc         Unit-price   Ord-qty   Line-total  Order-total
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  T5060, PT42, QZE248  Lock, Alarm, Key  5, 20, 2.50  5, 1, 10  25, 20, 25  70.00
Un-normalised Form (UNF)
Repeating Group: Prod-no, Prod-desc, Unit-price, Ord-qty, Line-total
Order-no  Order-date  Cust-no  Cust-name     Cust-address                     Prod-no  Prod-desc  Unit-price  Ord-qty  Line-total  Order-total
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  T5060    Lock       5.00        5        25.00       70.00
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  PT42     Alarm      20.00       1        20.00       70.00
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  QZE248   Key        2.50        10       25.00       70.00
Applying UNF
• Write down all attributes from the un-normalised form above inside parentheses and name this entity.
• Choose a suitable unique identifier (PK) for this entity.
• Show the Repeating Group within { }.

ORDER (order-no, order-date, cust-no, cust-name, cust-address, {prod-no, prod-desc, unit-price, ord-qty, line-total}, order-total)
First Normal Form (1NF)
1. The repeating group should be removed to a separate Relation (entity), and that relation (entity) should be named.
2. Assign a key attribute to the new relation from the repeating group of attributes that has been separated out.
3. Also, carry forward the key attribute from the UNF to the new relation.
4. The 1NF restriction is built into the Relational Model.
5. Advantages of 1NF are simplicity and uniform access.
Applying 1NF
UNF
ORDER (order-no, order-date, cust-no, cust-name, cust-address, {prod-no, prod-desc, unit-price, ord-qty, line-total}, order-total)

ORDER-1 (order-no, order-date, cust-no, cust-name, cust-address, order-total)

ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

Relationship: ORDER contains ORDER-LINE
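The UNF to 1NF step above can be sketched as a small transformation. The dictionary layout and the function name `to_1nf` are assumptions made for this illustration; the values come from the order form example.

```python
# The un-normalised order: one header plus a repeating group of lines.
unf_order = {
    "order_no": "0057435", "order_date": "11-JAN-06", "cust_no": 1489,
    "cust_name": "Arthur Smith",
    "cust_address": "1 Lime Avenue, Anytown ZZ52 5QA",
    "order_total": 70.00,
    "lines": [  # the repeating group shown within { }
        {"prod_no": "T5060", "prod_desc": "Lock", "unit_price": 5.00,
         "ord_qty": 5, "line_total": 25.00},
        {"prod_no": "PT42", "prod_desc": "Alarm", "unit_price": 20.00,
         "ord_qty": 1, "line_total": 20.00},
        {"prod_no": "QZE248", "prod_desc": "Key", "unit_price": 2.50,
         "ord_qty": 10, "line_total": 25.00},
    ],
}

def to_1nf(order):
    """Split one UNF order into an ORDER-1 row and its ORDER-LINE-1 rows."""
    order_1 = {k: v for k, v in order.items() if k != "lines"}
    # Carry the UNF key (order_no) forward into every separated line.
    order_line_1 = [{"order_no": order["order_no"], **line}
                    for line in order["lines"]]
    return order_1, order_line_1

order_1, order_line_1 = to_1nf(unf_order)
print(len(order_line_1), order_line_1[0]["order_no"])
```

After the split every cell holds a single value, and the pair (order_no, prod_no) identifies each ORDER-LINE-1 row.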
Second Normal Form (2NF)
• Attributes that are wholly dependent on only part of Composite identifier should be removed to a separate Relation
• Prohibits situation where each row represents single-valued facts about more than one object.
• Partial FDs on an identifier should be avoided because they result in data redundancy.
NOTE:
• Relation is in 2NF if
• It is in 1NF, and
• All non-key attributes are Fully Functionally Dependent on Primary Key and not on only a part (portion) of Primary Key.
Second Normal Form (2NF)
• Steps to transform into 2NF
1. Identify all Functional Dependencies (FFD + PFD) in 1NF.
2. Attributes that are wholly dependent on only part of the Composite Primary Key (PFD) should be removed to a separate Relation/entity.
3. Make each Determinant the Primary Key of its new Relation.
Identifying PFDs
• List the full Composite Determinant (Primary Key) and each part of the Composite Determinant (Primary Key)
• How are non-key attributes dependent on determinants?
• order-no, prod-no 🡪 ord-qty, line-total
• prod-no 🡪 prod-desc, unit-price (partial dependency: move to a separate relation)
• order-no 🡪 (no non-key attribute depends on order-no alone)
Resulting entities: ORDER, ORDER-LINE, PRODUCT

Applying 2NF
ORDER-1 (order-no, order-date, cust-no, cust-name, cust-address, order-total)
ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)

ORDER-2 (order-no, order-date, cust-no, cust-name, cust-address, order-total)

ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total)

PRODUCT-2 (prod-no, prod-desc, unit-price)
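The 1NF to 2NF split shown above can be sketched as a projection of ORDER-LINE-1 into two relations. The tuple layout and the function name `to_2nf` are assumptions for this illustration.

```python
# ORDER-LINE-1 rows: (order_no, prod_no, prod_desc, unit_price, ord_qty, line_total).
order_line_1 = [
    ("0057435", "T5060", "Lock", 5.00, 5, 25.00),
    ("0057435", "PT42", "Alarm", 20.00, 1, 20.00),
    ("0057435", "QZE248", "Key", 2.50, 10, 25.00),
]

def to_2nf(rows):
    """Move the attributes that depend only on prod_no into PRODUCT-2."""
    order_line_2 = []
    product_2 = {}
    for order_no, prod_no, prod_desc, unit_price, ord_qty, line_total in rows:
        order_line_2.append((order_no, prod_no, ord_qty, line_total))
        # prod_no is the determinant, so it becomes the key of PRODUCT-2.
        product_2[prod_no] = (prod_desc, unit_price)
    return order_line_2, product_2

order_line_2, product_2 = to_2nf(order_line_1)

# Joining the two relations on prod_no rebuilds the original rows.
rebuilt = [(o, p, *product_2[p], q, t) for o, p, q, t in order_line_2]
print(rebuilt == order_line_1)
```

Because the join rebuilds the 1NF rows exactly, nothing is lost by the decomposition; a product's description and price are simply stored once per product instead of once per order line.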


Second Normal Form (2NF)
• Relations with only a single-attribute key are already in 2NF
• A 🡪 b, c, d
• Relations with a Composite key may not be in 2NF, as there may be some PFDs
• X, Y 🡪 l, m, n

• Relation that is in First Normal Form will be in Second Normal Form if any one of following conditions apply:
• Primary Key consists of only one attribute (such as the attribute ORDER-NO in ORDER).
• No non-key attributes exist in Relation.
• Every non-key attribute is Functionally Dependent on full set of Primary Key attributes
Third Normal Form (3NF)
• Relation is in 3NF if
• It is in 2NF, and
• No Transitive Dependencies.
• Transitive Dependencies are when
• A 🡪 B 🡪 C
• Thus it can be split into
• A 🡪 B and
• B 🡪 C.
• Attributes that are wholly dependent upon another attribute should be removed to separate Relation.
• Like 2NF, but now we consider FDs on non-key attributes only and do not worry about Key.
• Transitive Dependencies should be avoided because they result in data redundancy.
Third Normal Form (3NF)
• Steps to transform into 3NF
1. Create one relation for each Determinant in Transitive Dependency.
2. Make Determinants Primary Keys in their respective relations.
3. Include as non-key attributes those attributes that depend on the Determinant.
• order-no 🡪 cust-no 🡪 cust-name, cust-address (transitive dependency)
• order-no 🡪 order-date, order-total, cust-no
• cust-no 🡪 cust-name, cust-address (separate relation)
Applying 3NF
ORDER-2 (order-no, order-date, cust-no, cust-name, cust-address, order-total)
ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total)
PRODUCT-2 (prod-no, prod-desc, unit-price)

ORDER-3 (order-no, order-date, cust-no*, order-total)


CUSTOMER-3 (cust-no, cust-name, cust-address)
ORDER-LINE-3 (order-no, prod-no, ord-qty, line-total)
PRODUCT-3 (prod-no, prod-desc, unit-price)
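One way to try out the 3NF relations is to create them in an in-memory SQLite database. This is a sketch under stated assumptions: the table and column names are adapted from the relations above, while the column types and the use of Python's standard `sqlite3` module are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer_3 (
    cust_no INTEGER PRIMARY KEY, cust_name TEXT, cust_address TEXT);
CREATE TABLE order_3 (
    order_no TEXT PRIMARY KEY, order_date TEXT,
    cust_no INTEGER REFERENCES customer_3(cust_no), order_total REAL);
CREATE TABLE product_3 (
    prod_no TEXT PRIMARY KEY, prod_desc TEXT, unit_price REAL);
CREATE TABLE order_line_3 (
    order_no TEXT REFERENCES order_3(order_no),
    prod_no TEXT REFERENCES product_3(prod_no),
    ord_qty INTEGER, line_total REAL,
    PRIMARY KEY (order_no, prod_no));
""")

# Parents are inserted before the rows that reference them.
conn.execute("INSERT INTO customer_3 VALUES (?, ?, ?)",
             (1489, "Arthur Smith", "1 Lime Avenue, Anytown ZZ52 5QA"))
conn.execute("INSERT INTO order_3 VALUES (?, ?, ?, ?)",
             ("0057435", "11-JAN-06", 1489, 70.00))
conn.executemany("INSERT INTO product_3 VALUES (?, ?, ?)",
                 [("T5060", "Lock", 5.00), ("PT42", "Alarm", 20.00),
                  ("QZE248", "Key", 2.50)])
conn.executemany("INSERT INTO order_line_3 VALUES (?, ?, ?, ?)",
                 [("0057435", "T5060", 5, 25.00), ("0057435", "PT42", 1, 20.00),
                  ("0057435", "QZE248", 10, 25.00)])

# Joining the four relations rebuilds the lines of the original order form.
form_rows = conn.execute("""
    SELECT o.order_no, c.cust_name, p.prod_desc, l.ord_qty, l.line_total
    FROM order_line_3 AS l
    JOIN order_3 AS o ON o.order_no = l.order_no
    JOIN customer_3 AS c ON c.cust_no = o.cust_no
    JOIN product_3 AS p ON p.prod_no = l.prod_no
    ORDER BY p.prod_desc
""").fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM customer_3").fetchone()[0]
print(form_rows, customer_count)
```

The customer's name and address are stored once, however many orders or order lines refer to cust_no 1489.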
Third Normal Form (3NF)
• Entities which are all-key OR have only single non-key attribute are already in 3NF
• A, B 🡪
• A🡪b
• Entities with more than one non-key attribute may not be in 3NF, as there may be some TDs
• X, Y 🡪 l, m, n
• X 🡪 l, m
ER Model Representation
• Final list of 3NF entities can be represented by following ER model:
Eliminating Redundancy
Project Allocation Example
Project Code  Project Type  Description            Person Number  Name   Grade  Salary Scale  Date-join Project  Alloc-time
IC5001        New Dev       Develop Claims System  2146           Jones  A1     4             1/11/20xx          24
                                                   3145           Smith  A2     4             2/10/20xx          24
                                                   6126           Black  B1     9             7/11/20xx          18
                                                   1214           Brown  A2     4             3/10/20xx          12
                                                   8191           Green  A1     4             12/11/20xx         18
PAY22         Maint         Maintain Payments      6142           Jacks  A2     4             9/11/20xx          6
                                                   3169           White  B2     10            4/11/20xx          12
                                                   6145           Dean   B3     10            8/10/20xx          6
Project Allocation Example
Project Code  Project Type  Description            Person Number     Name                Grade       Salary Scale  Date-join Project  Alloc-time
IC5001        New Dev       Develop Claims System  2146, 2147, 2134  Jones, Adam, Anish  A1, B1, A1  4, 10, 4      1/11/20xx          24
PAY22         Maint         Maintain Payments      6142              Jacks               A2          4             9/11/20xx          6
                                                   3169              White               B2          10            4/11/20xx          12
                                                   6145              Dean                B3          10            8/10/20xx          6
Un-Normalised Form (UNF)
• Write down all attributes from the form above and name this entity.
• Choose a suitable unique identifier for this entity
• Show Repeating Group within { }.
Project (Project-code, Project-type, Description, {Person-number, Name, Grade, Salary-scale, Date-join-project, Alloc-time})
Normalisation from UNF to 1NF
• Remove Repeating Group to form new Relation, name it.
• Carry forward Unique Identifier to this Relation.
• Choose unique identifier for this new Relation.
Un-Normalised Form
Project (Project-code, Project-type, Description, {Person-number, Name, Grade, Salary-scale, Date-join-project, Alloc-time})
First NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Name, Grade, Salary-scale, Date-join-project, Alloc-time)
Normalisation from 1NF to 2NF
• Find and Separate Partial Functional Dependency
• Project-code, Person-number 🡪 Date-join-project, Alloc-time
• Person-number 🡪 Name, Grade, Salary-scale
• Project-code 🡪 (no non-key attribute of Project-Allocation)
First NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Name, Grade, Salary-scale, Date-join-project, Alloc-time)
Second NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Date-join-project, Alloc-time)
Person (Person-number, Name, Grade, Salary-scale)
Normalisation from 2NF to 3NF
• Find and Separate non-key Dependency (Transitive)
• Person-number 🡪 Grade 🡪 Salary-scale
• Person-number 🡪 Name
• Person-number 🡪 Grade
• Grade 🡪 Salary-scale
Second NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Date-join-project, Alloc-time)
Person (Person-number, Name, Grade, Salary-scale)
Third NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Date-join-project, Alloc-time)
Person (Person-number, Name, Grade)
Grade-Sal (Grade, Salary-scale)
Normalisation from UNF to 3NF
UNF
Project (Project-code, Project-type, Description, {Person-number, Name, Grade, Salary-scale, Date-join-project, Alloc-time})
1 NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Name, Grade, Salary-scale, Date-join-project, Alloc-time)
2 NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Date-join-project, Alloc-time)
Person (Person-number, Name, Grade, Salary-scale)
3 NF
Project (Project-code, Project-type, Description)
Project-Allocation (Project-code, Person-number, Date-join-project, Alloc-time)
Person (Person-number, Name, Grade)
Grade-Sal (Grade, Salary-scale)
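A quick way to gain confidence in the 3NF result is to decompose the sample rows and join them back. The sketch below uses the eight allocation rows from the first Project Allocation table; the variable and function names are hypothetical.

```python
# 1NF Project-Allocation rows taken from the example table:
# (project_code, person_number, name, grade, salary_scale, date_join, alloc_time)
allocation_1nf = [
    ("IC5001", 2146, "Jones", "A1", 4, "1/11/20xx", 24),
    ("IC5001", 3145, "Smith", "A2", 4, "2/10/20xx", 24),
    ("IC5001", 6126, "Black", "B1", 9, "7/11/20xx", 18),
    ("IC5001", 1214, "Brown", "A2", 4, "3/10/20xx", 12),
    ("IC5001", 8191, "Green", "A1", 4, "12/11/20xx", 18),
    ("PAY22", 6142, "Jacks", "A2", 4, "9/11/20xx", 6),
    ("PAY22", 3169, "White", "B2", 10, "4/11/20xx", 12),
    ("PAY22", 6145, "Dean", "B3", 10, "8/10/20xx", 6),
]

# 3NF relations, each keyed by its determinant.
project_allocation = {(pc, pn): (dj, at)
                      for pc, pn, _, _, _, dj, at in allocation_1nf}
person = {pn: (name, grade) for _, pn, name, grade, _, _, _ in allocation_1nf}
grade_sal = {grade: scale for _, _, _, grade, scale, _, _ in allocation_1nf}

def rejoin():
    """Rebuild the 1NF rows by following the keys through the 3NF relations."""
    rows = []
    for (pc, pn), (dj, at) in project_allocation.items():
        name, grade = person[pn]
        rows.append((pc, pn, name, grade, grade_sal[grade], dj, at))
    return rows

print(sorted(rejoin()) == sorted(allocation_1nf))
print(len(grade_sal))
```

The salary scale for each grade now lives in one place, so a change to a grade's scale touches one Grade-Sal entry rather than every person on that grade.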
Project Allocation - ER Model
• Final list of 3NF Relations can be represented by the following ER model:
PROJECT scheduled PROJECT ALLOCATION
PERSON works-on PROJECT ALLOCATION
PERSON belongs-in GRADE-SAL
Avoiding Anomalies using 2NF
MODULE-RESULT
Student-id  Module-id  Module-title               Module-level  Grade
S001        CSC100     Introduction to Computing  Certificate   P
S002        CSC100     Introduction to Computing  Certificate   D
S001        CSC200     Web Development            Intermediate  P
S003        CSC200     Web Development            Intermediate  F
S001        ACC200     Accounting Part 1          Certificate   P
S004        ACC201     Accounting Part 2          Advanced      D
S005        HIS200     History                    Advanced      P
• Assume that student cannot take module more than once.
• What normal form is MODULE-RESULT in currently ?
Avoiding Anomalies in 2NF
• INSERTION anomaly can occur when attempting to enter details of a new course, DB100 Databases.
• DELETION anomaly can occur when student S004 decides to drop the ACC201 module.
• UPDATE anomaly can occur if necessary to change module title of ‘Web Development’ to ‘Website Development’.
• Use of 2NF rule allows us to avoid all above anomalies.
Avoiding Anomalies using 2NF
MODULE-RESULT
Student-id  Module-id  Grade
S001        CSC100     P
S002        CSC100     D
S001        CSC200     P
S003        CSC200     F
S001        ACC200     P
S004        ACC201     D
S005        HIS200     P
MODULE
Module-id  Module-title               Module-level
CSC100     Introduction to Computing  Certificate
CSC200     Web Development            Intermediate
ACC200     Accounting Part 1          Certificate
ACC201     Accounting Part 2          Advanced
HIS200     History                    Advanced
Avoiding Anomalies using 3NF
LECTURER
Lecturer-id  Lecturer-name  Department  Salary  Location
W01          Emma Greg      Computing   35,000  Moorgate
S01          Wendy Holder   Computing   42,000  Moorgate
D01          Amy King       Accounting  27,000  Staples
J02          Bob Jones      Computing   29,000  Moorgate
N01          Bob Whales     Accounting  36,500  Staples
J01          Jack Nelson    History     41,500  Lewiston

• Assume that all lecturers within department are located at same place.
• What normal form is LECTURER in currently ?
Avoiding Anomalies in 3NF
• INSERTION anomaly can occur when attempting to create a new English department.
• DELETION anomaly can occur when Professor Nelson retires.
• UPDATE anomaly can occur if necessary to change the location of a department e.g. Computing moves from Moorgate to Billingsgate.
• Use of 3NF rule allows us to avoid all above anomalies.
Avoiding Anomalies using 3NF
LECTURER
Lecturer-id  Lecturer-name  Department  Salary
W01          Emma Greg      Computing   35,000
S01          Wendy Holder   Computing   42,000
D01          Amy King       Accounting  27,000
J02          Bob Jones      Computing   29,000
N01          Bob Whales     Accounting  36,500
J01          Jack Nelson    History     41,500
LECTURER-LOCATION
Department  Location
Computing   Moorgate
Accounting  Staples
History     Lewiston
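With the two 3NF relations in place, the three anomalies listed earlier no longer arise. The sketch below models each relation as a dictionary keyed by its determinant; the helper `location_of` and the location given to the English department are assumptions made for this illustration.

```python
# 3NF relations from the example, each keyed by its determinant.
lecturer = {
    "W01": ("Emma Greg", "Computing", 35000),
    "S01": ("Wendy Holder", "Computing", 42000),
    "D01": ("Amy King", "Accounting", 27000),
    "J02": ("Bob Jones", "Computing", 29000),
    "N01": ("Bob Whales", "Accounting", 36500),
    "J01": ("Jack Nelson", "History", 41500),
}
lecturer_location = {
    "Computing": "Moorgate",
    "Accounting": "Staples",
    "History": "Lewiston",
}

def location_of(lecturer_id):
    """Look up a lecturer's location through their department."""
    _, department, _ = lecturer[lecturer_id]
    return lecturer_location[department]

# Update: Computing moves; one entry changes, not three lecturer rows.
lecturer_location["Computing"] = "Billingsgate"

# Insert: a department can exist before it has any lecturers.
lecturer_location["English"] = "Moorgate"  # assumed location, not from the slides

# Delete: removing the only History lecturer keeps the department's location.
del lecturer["J01"]

print(location_of("W01"), lecturer_location["History"])
```

Every Computing lecturer sees the new location immediately because the location is stored only once, against the department.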
Example – UNF and 1NF
• UNF
• Grade_report(StudNo, StudName, {Major, Advisor, {CourseNo, Ctitle, InstrucName, InstructLocn, Grade}})
• 1NF - Remove Repeating Groups
• Student(StudNo, StudName)
• StudMajor(StudNo, Major, Advisor)
• StudCourse(StudNo, Major, CourseNo, Ctitle, InstrucName, InstructLocn, Grade)
Example – 2NF
• 2NF - Remove Partial Dependencies
• Student(StudNo, StudName)
• StudMajor(StudNo, Major, Advisor)
• StudCourse(StudNo, Major, CourseNo, Grade)
• Course(CourseNo, Ctitle, InstrucName, InstructLocn)
Example – 3NF
• 3NF Remove Transitive Dependencies
• Student(StudNo, StudName)
• StudMajor(StudNo, Major, Advisor)
• StudCourse(StudNo, Major, CourseNo, Grade)
• Course(CourseNo, Ctitle, InstrucName)
• Instructor(InstrucName, InstructLocn)
Normalisation Advantages
• Data interdependencies are identified
• Data can be grouped into related sets
• Data is easier to maintain
• Anomalies and the resulting redundancy are eliminated.
Normalisation Disadvantages
• Facilitates update but not retrieval
• Requires real understanding of business rules
• Not a foolproof technique
• Normalised entities can be unnatural
• Full normalisation not always possible
Summary of Dependencies
• Types of dependency between attributes

Determinant attribute  Dependent attribute  Dependency
Key                    Non-key              Functional
Part of key            Non-key              Partial Functional
Non-key                Non-key              Transitive

• Our aim is for the key attribute to determine all other non-key attributes.
• Therefore, only the first type of dependency is desirable.
• Normalisation ensures that entities are decomposed so that only Functional Dependency on the whole key remains.
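The summary table can be read as a small decision rule. The sketch below encodes it; the function name `classify_dependency` and the wording of the fallback results are assumptions for this illustration, and the attribute names come from the order example earlier in the lecture.

```python
def classify_dependency(determinant, dependent, key):
    """Classify determinant -> dependent using the three rows of the summary table."""
    determinant, key = set(determinant), set(key)
    if not determinant:
        raise ValueError("determinant must not be empty")
    if dependent in key:
        return "not covered by the table (dependent is a key attribute)"
    if determinant == key:
        return "Functional"
    if determinant < key:  # a proper subset of the key
        return "Partial Functional"
    if not determinant & key:  # no key attribute in the determinant
        return "Transitive"
    return "not covered by the table (mixed determinant)"

line_key = ["order-no", "prod-no"]
print(classify_dependency(["order-no", "prod-no"], "ord-qty", line_key))
print(classify_dependency(["prod-no"], "unit-price", line_key))
print(classify_dependency(["cust-no"], "cust-name", ["order-no"]))
```

Only the first result is the desirable kind; the second is removed by 2NF and the third by 3NF.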
ER Design and Normalisation
• When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalisation.
• However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity
• E.g. Emp entity with attributes dept-num and dept-add, and an FD dept-num 🡪 dept-add
• Good ER Design would have made department an entity
ER Design and Normalisation
ER MODELING                 vs  NORMALISATION
Top-down approach               Bottom-up approach
Analysis of entities            Analysis of attributes
Intuitive technique             Formal technique
Produces simple Structures      Produces complex Structures
• Suggested Method: Do conceptual design using ER Modeling; then, when converting to a logical model, use normalisation as a validation technique to ensure that all entities are in 3NF.
Any Questions?
