Week 5 Lecture
Normalisation
Topics
• Introduction
• Why Normalisation
• Functional and Transitive Dependency
• Normalisation
• Unnormalized Form (UNF)
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Anomalies (Insert, Delete, Update)
• Advantages and Disadvantages
Introduction
• Normalisation is the process of organising data in a database. It involves creating entities/relations and establishing relationships between tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
• The normalisation process puts data into a simple form that truly reflects the separate entity types, their attributes and the relationships between them, so that unnecessary duplication of data is avoided.
• It starts from pre-documented sets of attributes and groups and regroups them, without causing data inconsistencies, in such a way that anomalies are avoided.
Anomalies
• Anomalies are undesirable side effects that can occur if relations are not in a proper normal form.
• Anomalies fall into three categories:
• Insertion Anomaly:
• Adding new rows forces the user to create duplicate data, or a new row cannot be inserted into the relation because some or all of the Primary Key value is not known.
For example, if we try to insert a record into STUDENT_COURSE with STUD_NO = 7, the insert is not allowed.
Anomalies
• Update Anomaly:
• Changing data in one row forces changes to other rows because of duplication.
• It occurs when the data contains unnecessary redundancy and we are forced to update several rows.
For example, if we update the course name "Computer Networks" in table STUDENT_COURSE, we have to update every row that holds that course name.
• Deletion Anomaly:
• Deleting rows may cause loss of data that would be needed for other future rows.
For example, deleting the record with STUD_NO = 1 from STUDENT is not allowed.
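The STUDENT / STUDENT_COURSE examples above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The slide's table contents are not reproduced in the text, so the column names other than STUD_NO and all sample rows are assumptions made only for illustration, and the rejections shown here come from a foreign key that the sketch declares itself.

```python
import sqlite3

# Hypothetical schema and rows: only the table names and STUD_NO come from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, STUD_NAME TEXT)")
conn.execute(
    "CREATE TABLE STUDENT_COURSE ("
    " STUD_NO INTEGER REFERENCES STUDENT(STUD_NO),"
    " COURSE_NO TEXT,"
    " COURSE_NAME TEXT,"
    " PRIMARY KEY (STUD_NO, COURSE_NO))"
)
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, "Student A"), (2, "Student B")])
conn.executemany(
    "INSERT INTO STUDENT_COURSE VALUES (?, ?, ?)",
    [(1, "C1", "DBMS"), (1, "C2", "Computer Networks"), (2, "C2", "Computer Networks")],
)


def try_sql(sql):
    """Run one statement; return True if it succeeded, False if it was rejected."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Insert for a student that does not exist in STUDENT: rejected.
insert_ok = try_sql("INSERT INTO STUDENT_COURSE VALUES (7, 'C1', 'DBMS')")

# Delete a student that STUDENT_COURSE still refers to: rejected.
delete_ok = try_sql("DELETE FROM STUDENT WHERE STUD_NO = 1")

# Renaming a course touches every row that repeats the course name.
rows_updated = conn.execute(
    "UPDATE STUDENT_COURSE SET COURSE_NAME = 'Networks' "
    "WHERE COURSE_NAME = 'Computer Networks'"
).rowcount

print(insert_ok, delete_ok, rows_updated)
```

With these sample rows the rename has to change two rows, which is the update anomaly in miniature.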
Anomalies
Idea: The table/relation Student_Info holds all the data about a student.
Result: The entire branch data of a course must be repeated for every student of the branch.
Data Redundancy: occurs when some data is stored multiple times unnecessarily in a database table.
Disadvantages:
2. Data Inconsistencies
Anomalies
Insertion Anomaly: If we try to insert a new course with course_id C03, Multimedia, the insert is not allowed if no students are registered on that course.
Update Anomaly: If we must update the credit point value of Computing
from 120 to 160, we must change the value in every row of Computing.
Deletion Anomaly: If we try to delete the record of a student S04, it will also
Un-normalised Form (UNF)
Order data with a Repeating Group (Prod-no to Line-total):
Order-no  Order-date  Cust-no  Cust-name     Cust-address                     Prod-no              Prod-desc         Unit-price   Ord-qty   Line-total  Order-total
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  T5060, PT42, QZE248  Lock, Alarm, Key  5, 20, 2.50  5, 1, 10  25, 20, 25  70.00
The same order with one row per product:
Order-no  Order-date  Cust-no  Cust-name     Cust-address                     Prod-no  Prod-desc  Unit-price  Ord-qty  Line-total  Order-total
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  T5060    Lock       5.00        5        25.00       70.00
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  PT42     Alarm      20.00       1        20.00       70.00
0057435   11-JAN-06   1489     Arthur Smith  1 Lime Avenue, Anytown ZZ52 5QA  QZE248   Key        2.50        10       25.00       70.00
Applying UNF
• Write down all attributes from the un-normalised form above inside parentheses ( ) and name this entity.
• Choose a suitable unique identifier (PK) for this entity.
• Show the Repeating Group within { }.
[ER diagram: ORDER contains ORDER-LINE]
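Following the steps above, one reading of the UNF notation for the order is ORDER (order-no, order-date, cust-no, cust-name, cust-address, {prod-no, prod-desc, unit-price, ord-qty, line-total}, order-total). The sketch below holds the repeating group as a nested list and flattens it to one row per product. The Python names (`unf_order`, `lines`, `flatten`) are illustrative choices, not part of the lecture.

```python
# The order from the slides, with the repeating group held as a nested list.
unf_order = {
    "order_no": "0057435",
    "order_date": "11-JAN-06",
    "cust_no": "1489",
    "cust_name": "Arthur Smith",
    "cust_address": "1 Lime Avenue, Anytown ZZ52 5QA",
    "order_total": 70.00,
    "lines": [  # repeating group {prod-no, prod-desc, unit-price, ord-qty, line-total}
        {"prod_no": "T5060", "prod_desc": "Lock", "unit_price": 5.00, "ord_qty": 5, "line_total": 25.00},
        {"prod_no": "PT42", "prod_desc": "Alarm", "unit_price": 20.00, "ord_qty": 1, "line_total": 20.00},
        {"prod_no": "QZE248", "prod_desc": "Key", "unit_price": 2.50, "ord_qty": 10, "line_total": 25.00},
    ],
}


def flatten(order):
    """Return one flat row per entry of the repeating group."""
    header = {name: value for name, value in order.items() if name != "lines"}
    return [{**header, **line} for line in order["lines"]]


flat_rows = flatten(unf_order)
for row in flat_rows:
    print(row["order_no"], row["prod_no"], row["ord_qty"], row["line_total"])
```

Every flat row repeats the order and customer details, which is the redundancy that the later normal forms remove.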
Second Normal Form (2NF)
• Attributes that are wholly dependent on only part of a composite identifier should be removed to a separate relation.
• 2NF prohibits the situation where a row holds single-valued facts about more than one object.
• Partial FDs on an identifier should be avoided because they result in data redundancy.
NOTE:
• Relation is in 2NF if
• It is in 1NF, and
• All non-key attributes are Fully Functionally Dependent on Primary Key and not on only a part (portion) of Primary Key.
Second Normal Form (2NF)
• Steps to transform into 2NF
1. Identify all functional dependencies, both full (FFD) and partial (PFD), in the 1NF relations.
2. Attributes that are wholly dependent on only part of the composite Primary Key (PFD) should be removed to a separate relation/entity.
3. Make each determinant the Primary Key of its new relation.
Identifying PFDs
• List every combination: the whole composite determinant (Primary Key) and each part of the composite determinant.
• How are the non-key attributes dependent on these determinants?
• order-no, prod-no → ord-qty, line-total
• prod-no → prod-desc, unit-price (partial dependency: separate relation)
• order-no → (no non-key attributes)
[ER diagram: ORDER, ORDER-LINE, PRODUCT]
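The dependencies listed above can be checked against sample data with a small helper. This is a sketch only: `fd_holds` is a hypothetical name, the fourth row (order 0057436) is made up so that a product appears twice, and sample data can refute a dependency but never prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


COLUMNS = ["order_no", "prod_no", "prod_desc", "unit_price", "ord_qty", "line_total"]
DATA = [
    ("0057435", "T5060", "Lock", 5.00, 5, 25.00),
    ("0057435", "PT42", "Alarm", 20.00, 1, 20.00),
    ("0057435", "QZE248", "Key", 2.50, 10, 25.00),
    ("0057436", "T5060", "Lock", 5.00, 2, 10.00),  # hypothetical second order
]
order_lines = [dict(zip(COLUMNS, values)) for values in DATA]

# Whole key determines the quantities: full functional dependency.
full_fd = fd_holds(order_lines, ["order_no", "prod_no"], ["ord_qty", "line_total"])
# Part of the key determines the product details: partial dependency.
partial_fd = fd_holds(order_lines, ["prod_no"], ["prod_desc", "unit_price"])
# order-no on its own determines none of the non-key attributes.
order_no_alone = fd_holds(order_lines, ["order_no"], ["ord_qty"])

print(full_fd, partial_fd, order_no_alone)  # prints: True True False
```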
Applying 2NF
Relations in 1NF (before applying 2NF):
ORDER-1 (order-no, order-date, cust-no, cust-name, cust-address, order-total)
ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total)
• A relation that is in First Normal Form is also in Second Normal Form if any one of the following conditions applies:
• The Primary Key consists of only one attribute (such as the attribute ORDER-NO in ORDER).
• No non-key attributes exist in the relation.
• Every non-key attribute is functionally dependent on the full set of Primary Key attributes.
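As a sketch of step 2 of the 2NF transformation, ORDER-LINE-1 can be split by projection: the attributes that depend on prod-no alone go to a product relation, and the rest stay with the composite key. The helper name `project` and the fourth sample row are assumptions for illustration. The result names ORDER-LINE-2 and PRODUCT-2 follow the relations listed under Applying 3NF.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples, keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


COLUMNS = ["order_no", "prod_no", "prod_desc", "unit_price", "ord_qty", "line_total"]
DATA = [
    ("0057435", "T5060", "Lock", 5.00, 5, 25.00),
    ("0057435", "PT42", "Alarm", 20.00, 1, 20.00),
    ("0057435", "QZE248", "Key", 2.50, 10, 25.00),
    ("0057436", "T5060", "Lock", 5.00, 2, 10.00),  # hypothetical second order
]
order_line_1 = [dict(zip(COLUMNS, values)) for values in DATA]

# Attributes fully dependent on (order-no, prod-no) stay with the composite key.
order_line_2 = project(order_line_1, ["order_no", "prod_no", "ord_qty", "line_total"])
# Attributes dependent on prod-no alone move to their own relation.
product_2 = project(order_line_1, ["prod_no", "prod_desc", "unit_price"])

print(len(order_line_2), len(product_2))
```

The product "Lock" is stored twice in ORDER-LINE-1 but only once in PRODUCT-2, so a price change now touches a single row.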
Third Normal Form (3NF)
• Attributes that are wholly dependent upon another attribute should be removed to separate Relation.
• Like 2NF, but now we consider FDs on non-key attributes only and do not worry about Key.
• Transitive Dependencies should be avoided because they result in data redundancy.
• Relation is in 3NF if
• It is in 2NF, and
• No Transitive Dependencies.
• Transitive Dependencies are when
• A → B → C
• Thus it can be split into
• A → B and
• B → C.
Third Normal Form (3NF)
• Steps to transform into 3NF
1. Create one relation for each determinant in the transitive dependency.
2. Make the determinants Primary Keys in their respective relations.
3. Include as non-key attributes those attributes that depend on the determinant.
• order-no → cust-no → cust-name, cust-address (transitive dependency)
• order-no → order-date, order-total, cust-no
• cust-no → cust-name, cust-address (separate relation)
Applying 3NF
Relations in 2NF (before applying 3NF):
ORDER-2 (order-no, order-date, cust-no, cust-name, cust-address, order-total)
ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total)
PRODUCT-2 (prod-no, prod-desc, unit-price)
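Applying the three steps to ORDER-2 can be sketched as follows. The result names ORDER-3 and CUSTOMER-3 are assumptions that extend the numbering used above, and the second order row is made up so that the customer details repeat.

```python
# ORDER-2 rows: (order_no, order_date, cust_no, cust_name, cust_address, order_total)
order_2 = [
    ("0057435", "11-JAN-06", "1489", "Arthur Smith", "1 Lime Avenue, Anytown ZZ52 5QA", 70.00),
    ("0057436", "12-JAN-06", "1489", "Arthur Smith", "1 Lime Avenue, Anytown ZZ52 5QA", 10.00),  # hypothetical
]

# order-no -> order-date, cust-no, order-total
order_3 = [(o_no, o_date, c_no, total) for (o_no, o_date, c_no, _, _, total) in order_2]

# cust-no -> cust-name, cust-address; keying on cust-no keeps one entry per customer
customer_3 = {c_no: (name, address) for (_, _, c_no, name, address, _) in order_2}

# Joining on cust-no rebuilds ORDER-2, so the split loses no information.
rebuilt = [
    (o_no, o_date, c_no, *customer_3[c_no], total)
    for (o_no, o_date, c_no, total) in order_3
]

print(len(order_3), len(customer_3), rebuilt == order_2)
```

ORDER-LINE-2 and PRODUCT-2 have no transitive dependencies in the attributes shown, so they would carry over unchanged.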
[ER diagram: entities PROJECT, ALLOCATION, PERSON, GRADE-SAL; relationships scheduled, works-on, belongs-in]
Avoiding Anomalies using 2NF
MODULE-RESULT
Student-id  Module-id  Module-title               Module-level  Grade
S001        CSC100     Introduction to Computing  Certificate   P
S002        CSC100     Introduction to Computing  Certificate   D
S001        CSC200     Web Development            Intermediate  P
S003        CSC200     Web Development            Intermediate  F
S001        ACC200     Accounting Part 1          Certificate   P
S004        ACC201     Accounting Part 2          Advanced      D
S005        HIS200     History                    Advanced      P
• Assume that student cannot take module more than once.
• What normal form is MODULE-RESULT in currently ?
Avoiding Anomalies in 2NF
• INSERTION anomaly can occur when attempting to enter details of a new module, DB100 Databases, that no student has taken yet.
• DELETION anomaly can occur when student S004 decides to drop the ACC201 module, because the only record of that module's title and level is deleted too.
• UPDATE anomaly can occur if it is necessary to change the module title ‘Web Development’ to ‘Website Development’, because every CSC200 row must be changed.
• Use of the 2NF rule allows us to avoid all of the above anomalies.
Avoiding Anomalies using 2NF
MODULE-RESULT
Student-id  Module-id  Grade
S001        CSC100     P
S002        CSC100     D
S001        CSC200     P
S003        CSC200     F
S001        ACC200     P
S004        ACC201     D
S005        HIS200     P
MODULE
Module-id  Module-title               Module-level
CSC100     Introduction to Computing  Certificate
CSC200     Web Development            Intermediate
ACC200     Accounting Part 1          Certificate
ACC201     Accounting Part 2          Advanced
HIS200     History                    Advanced
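The three anomalies can be replayed against the 2NF tables with a short sketch using Python's built-in sqlite3 module. The SQL table and column names are lower-case renderings of the slide's names. The level of the new DB100 module is left empty because the slides do not give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE module (module_id TEXT PRIMARY KEY, module_title TEXT, module_level TEXT)"
)
conn.execute(
    "CREATE TABLE module_result ("
    " student_id TEXT,"
    " module_id TEXT REFERENCES module(module_id),"
    " grade TEXT,"
    " PRIMARY KEY (student_id, module_id))"
)
conn.executemany(
    "INSERT INTO module VALUES (?, ?, ?)",
    [
        ("CSC100", "Introduction to Computing", "Certificate"),
        ("CSC200", "Web Development", "Intermediate"),
        ("ACC200", "Accounting Part 1", "Certificate"),
        ("ACC201", "Accounting Part 2", "Advanced"),
        ("HIS200", "History", "Advanced"),
    ],
)
conn.executemany(
    "INSERT INTO module_result VALUES (?, ?, ?)",
    [
        ("S001", "CSC100", "P"), ("S002", "CSC100", "D"), ("S001", "CSC200", "P"),
        ("S003", "CSC200", "F"), ("S001", "ACC200", "P"), ("S004", "ACC201", "D"),
        ("S005", "HIS200", "P"),
    ],
)

# Insertion: a new module can be recorded before any student takes it.
conn.execute("INSERT INTO module (module_id, module_title) VALUES ('DB100', 'Databases')")
module_count = conn.execute("SELECT COUNT(*) FROM module").fetchone()[0]

# Deletion: S004 drops ACC201, yet the module's details survive.
conn.execute("DELETE FROM module_result WHERE student_id = 'S004' AND module_id = 'ACC201'")
acc201 = conn.execute(
    "SELECT module_title, module_level FROM module WHERE module_id = 'ACC201'"
).fetchone()

# Update: the title is stored once, so exactly one row changes.
rows_changed = conn.execute(
    "UPDATE module SET module_title = 'Website Development' WHERE module_id = 'CSC200'"
).rowcount

print(module_count, acc201, rows_changed)
```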
Avoiding Anomalies using 3NF
LECTURER
Lecturer-id  Lecturer-name  Department  Salary  Location
W01          Emma Greg      Computing   35,000  Moorgate
S01          Wendy Holder   Computing   42,000  Moorgate
D01          Amy King       Accounting  27,000  Staples
J02          Bob Jones      Computing   29,000  Moorgate
N01          Bob Whales     Accounting  36,500  Staples
J01          Jack Nelson    History     41,500  Lewiston
• Assume that all lecturers within department are located at same place.
• What normal form is LECTURER in currently ?
Avoiding Anomalies in 3NF
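• INSERTION anomaly can occur when attempting to create a new English department that has no lecturers yet.
• DELETION anomaly can occur when Professor Nelson retires, because the only record of the History department's location is deleted too.
• UPDATE anomaly can occur if it is necessary to change the location of a department, e.g. Computing moves from Moorgate to Billingsgate, because every Computing row must be changed.
• Use of the 3NF rule allows us to avoid all of the above anomalies.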
Avoiding Anomalies using 3NF
LECTURER
Lecturer-id  Lecturer-name  Department  Salary
W01          Emma Greg      Computing   35,000
S01          Wendy Holder   Computing   42,000
D01          Amy King       Accounting  27,000
J02          Bob Jones      Computing   29,000
N01          Bob Whales     Accounting  36,500
J01          Jack Nelson    History     41,500
LECTURER-LOCATION
Department  Location
Computing   Moorgate
Accounting  Staples
History     Lewiston
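The effect of the split can be sketched in plain Python, with the single-table LECTURER data held as tuples and the 3NF relations held as dictionaries keyed on their Primary Keys. The variable and function names are illustrative. The new English department is given no location because the slides do not state one.

```python
# Single-table form: (lecturer_id, lecturer_name, department, salary, location)
lecturer_flat = [
    ("W01", "Emma Greg", "Computing", 35000, "Moorgate"),
    ("S01", "Wendy Holder", "Computing", 42000, "Moorgate"),
    ("D01", "Amy King", "Accounting", 27000, "Staples"),
    ("J02", "Bob Jones", "Computing", 29000, "Moorgate"),
    ("N01", "Bob Whales", "Accounting", 36500, "Staples"),
    ("J01", "Jack Nelson", "History", 41500, "Lewiston"),
]

# Update anomaly in the single table: every Computing row would need changing.
rows_touched_before = sum(1 for row in lecturer_flat if row[2] == "Computing")

# 3NF split on the transitive dependency lecturer-id -> department -> location.
lecturer = {row[0]: row[1:4] for row in lecturer_flat}         # id -> (name, department, salary)
lecturer_location = {row[2]: row[4] for row in lecturer_flat}  # department -> location

lecturer_location["Computing"] = "Billingsgate"  # update: one entry changes
lecturer_location["English"] = None              # insert: department with no lecturers yet
del lecturer["J01"]                              # delete: History keeps its location


def location_of(lecturer_id):
    """Look up a lecturer's location through their department."""
    _, department, _ = lecturer[lecturer_id]
    return lecturer_location[department]


print(rows_touched_before, location_of("W01"), lecturer_location["History"])
```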
Example – UNF and 1NF
• UNF
• Grade_report(StudNo, StudName, {Major, Advisor, {CourseNo, Ctitle, InstrucName, InstructLocn, Grade}})
• 1NF - Remove Repeating Groups
• Student(StudNo, StudName)
• StudMajor(StudNo, Major, Advisor)
• StudCourse(StudNo, Major, CourseNo, Ctitle, InstrucName, InstructLocn, Grade)
Example – 2NF
• 2NF - Remove Partial Dependencies
• Student(StudNo, StudName)
• StudMajor(StudNo, Major, Advisor)
• StudCourse(StudNo, Major, CourseNo, Grade)
• Course(CourseNo, Ctitle, InstrucName, InstructLocn)
Example – 3NF
• 3NF Remove Transitive Dependencies
• Student(StudNo, StudName)
• StudMajor(StudNo, Major, Advisor)
• StudCourse(StudNo, Major, CourseNo, Grade)
• Course(CourseNo, Ctitle, InstrucName)
• Instructor(InstrucName, InstructLocn)
Normalisation Advantages
• Data interdependencies are identified
• Data can be grouped into related sets
• Data is easier to maintain
• Anomalies and the resulting redundancy are eliminated
Normalisation Disadvantages
• Facilitates update but not retrieval
• Requires real understanding of business rules
• Not a foolproof technique
• Normalised entities can be unnatural
• Full normalisation not always possible
Summary of Dependencies
• Types of dependency between attributes
Attributes               Dependency
Key → Non-Key            Functional
Part of Key → Non-Key    Partial Functional
Non-Key → Non-Key        Transitive
• Our aim is for the key attribute(s) to determine all other non-key attributes.
• Therefore, only the first type of dependency is desirable.
• Normalisation ensures that entities are decomposed so that only full Functional Dependency on the key remains.
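The three rows of the summary table can be expressed as a small classifier that compares a dependency's determinant with the Primary Key. This is a sketch: `classify` is a hypothetical helper, it assumes the dependent attributes are non-key, and a determinant that mixes key and non-key attributes falls outside the table and is reported as "other".

```python
def classify(determinant, key):
    """Classify a dependency determinant -> non-key attributes relative to the key."""
    determinant, key = set(determinant), set(key)
    if determinant == key:
        return "functional"
    if determinant < key:  # proper subset: only part of the key
        return "partial functional"
    if determinant.isdisjoint(key):  # no key attribute involved
        return "transitive"
    return "other"


order_line_key = ["order_no", "prod_no"]
order_key = ["order_no"]

print(classify(["order_no", "prod_no"], order_line_key))  # prints: functional
print(classify(["prod_no"], order_line_key))              # prints: partial functional
print(classify(["cust_no"], order_key))                   # prints: transitive
```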
ER Design and Normalisation
• When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not
need further normalization.
• However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity.
• E.g. an Emp entity with attributes dept-num and dept-add, and an FD dept-num → dept-add
• Good ER Design would have made department an entity
ER Design and Normalisation
ER MODELING                 vs  NORMALISATION
Top-down approach               Bottom-up approach
Analysis of entities            Analysis of attributes
Intuitive technique             Formal technique
Produces simple Structures      Produces complex Structures
• Suggested Method: Do the conceptual design using ER Modeling; then, when converting to a logical model, use normalisation as a validation technique to ensure that all entities are in 3NF.
Any Questions?