Lecture 7 Normalisation

Normalization is the process of organizing data to minimize duplication and create efficient relational databases. It aims to avoid anomalies during data operations such as updates, insertions, and deletions, and is structured through normal forms (1NF, 2NF, 3NF). Each normal form addresses specific dependencies and redundancies to ensure data integrity and flexibility.

Uploaded by

tinasherufudza18

Normalisation

Lecture 6
Normalisation
• It is the process of organising data to minimise
data duplication.
• It decomposes a large database table into
smaller, more manageable ones.
• It is aimed at creating efficient, reliable,
flexible and appropriate relational databases.
• It is an example of a top-down design technique.
The purpose of normalisation
• To put data into a format that conforms to
relational principles, e.g. single-valued columns,
with each relation representing one entity.
• To avoid redundancy by storing each ‘fact’ within the
database only once.
• To put the data into a form that is better able
to accommodate change.
• To avoid certain difficulties in updating (anomalies).
• To facilitate the enforcement of constraints on
data.
Anomalies
• There are three basic database operations on
tuples: add, delete and update. These are the
operations that can result in anomalies in a
database.
• So we have:
– Update anomalies
– Insertion/addition anomalies
– Deletion anomalies
Update anomalies
• Multiple copies of the same fact may lead to
update anomalies or inconsistencies when an
update is made and only some of the copies
are updated.
• Thus any change to an attribute value must be
applied to every tuple that stores the same
fact.
Insertion anomalies
• To insert a new record, the primary key field
cannot be left null, but for other fields null
values can be accepted. An insertion anomaly
arises when a fact cannot be stored because part
of the key it needs does not exist yet.
• E.g. a new employee can be assigned a new
employee number before being assigned to a
department, so the department attribute can
be left null.
Deletion anomalies
• This can result in the violation of
associations/links.
• It occurs when deleting a primary key value
from one table results in the deletion of all
associated records in other tables.
Example
Insert Anomaly
• We cannot insert a prospective course that
does not have any registered students, nor can
we insert the details of a student who is yet to
register for any course.
Update Anomaly
• If we want to update the course M4’s name
we need to do this operation three times.
• Similarly we may have to update student
1003’s name twice if it changes.
Delete Anomaly
• If we want to delete course M4, then in addition
to M4’s course details, other critical student
details will also be deleted.
• This kind of deletion is harmful to the business.
• Moreover, M4 appears thrice in the above table
and needs to be deleted thrice.
Duplicate Data
• Course M4’s data is stored thrice and student
1002’s data is stored twice.
• This redundancy will increase as the number
of course offerings increases.
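The anomalies above can be reproduced on a small flat table. This is only a sketch: the original example table is not reproduced here, so the rows, the names and the helper variables below are all invented for illustration.

```python
# Hypothetical flat table: one row per (student, course) enrolment.
# Every value here is invented; only the shape matters.
flat = [
    {"student": 1001, "student_name": "Ann", "course": "M4", "course_name": "Networks"},
    {"student": 1002, "student_name": "Ben", "course": "M1", "course_name": "Databases"},
    {"student": 1002, "student_name": "Ben", "course": "M4", "course_name": "Networks"},
    {"student": 1003, "student_name": "Cat", "course": "M4", "course_name": "Networks"},
]

# Update anomaly: renaming M4 means touching every row that repeats its name.
rows_to_update = sum(1 for row in flat if row["course"] == "M4")

# Delete anomaly: removing M4 also removes students who took nothing else.
remaining = [row for row in flat if row["course"] != "M4"]
students_before = {row["student"] for row in flat}
students_after = {row["student"] for row in remaining}
lost_students = students_before - students_after

print(rows_to_update)         # 3
print(sorted(lost_students))  # [1001, 1003]
```

With these sample rows, one logical change to M4 needs three physical updates, and deleting M4 silently drops two students from the database.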
Normal forms
• Normalisation is achieved through data
structures called normal forms
• The first three normal forms are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
Building Blocks
• Determinant: Attribute X is a determinant if it
uniquely determines the value of attribute Y in a
given relation or entity.
– To qualify as a determinant, the attribute need NOT be a
key attribute.
• A dependency between attributes is usually written
X → Y, which means attribute X determines attribute Y.
• Key attributes: In a given relation R, if the
attribute X uniquely determines all other attributes, then
X is a key attribute.
• Functional Dependency: In a given relation R, X
and Y are attributes. Y is functionally dependent on
X if each value of X determines exactly one value
of Y.
• Consider the following:
– REPORT (Student#, Course#, CourseName, IName,
Room#, Marks, Grade)
• Student# and Course# together (called a composite
attribute) determine EXACTLY ONE value of Marks.
• This can be symbolically represented as
Student#, Course# → Marks
• Full functional dependency: In the above
example, Marks is fully functionally dependent
on Student#, Course# and not on any proper subset of
Student#, Course#.
– This means Marks cannot be determined by either
Student# or Course# alone.
– It can be determined only by using Student# and
Course# together.
– Hence Marks is fully functionally dependent on
Student#, Course#.
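The two definitions above can be checked mechanically against sample rows. The sketch below is an illustration, not part of the lecture: the function names `determines` and `fully_determines` and the REPORT rows are assumptions. Note that sample data can only refute a dependency; whether it truly holds is a statement about the business rules.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if, in these rows, each value of lhs maps to exactly one value of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def fully_determines(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs also determines rhs."""
    if not determines(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if determines(rows, subset, rhs):
                return False
    return True

# Invented REPORT rows, restricted to the attributes needed here.
report = [
    {"Student#": 1, "Course#": "C1", "CourseName": "Databases", "Marks": 70},
    {"Student#": 1, "Course#": "C2", "CourseName": "Networks", "Marks": 55},
    {"Student#": 2, "Course#": "C1", "CourseName": "Databases", "Marks": 62},
    {"Student#": 2, "Course#": "C2", "CourseName": "Networks", "Marks": 55},
]

key = ("Student#", "Course#")
print(fully_determines(report, key, ("Marks",)))       # True
print(fully_determines(report, key, ("CourseName",)))  # False: Course# alone suffices
```

The second result is the partial dependency that 2NF later removes: CourseName depends on only part of the composite key.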
• Transitive dependency is an indirect functional
dependency, one in which X→Z only by virtue
of X→Y and Y→Z.
• For example:
• Room# depends on IName, and IName in turn depends on
Course#. Here Room# transitively depends on
Course#.
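One way to see a transitive dependency is to compute an attribute closure: start from a set of attributes and keep adding everything the known dependencies let you derive. The sketch below uses the standard closure procedure; the function name `closure` and the encoding of dependencies as pairs of tuples are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Return every attribute derivable from attrs using the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once lhs is fully known and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The two direct dependencies from the example above.
fds = [
    (("Course#",), ("IName",)),
    (("IName",), ("Room#",)),
]

print(sorted(closure({"Course#"}, fds)))  # ['Course#', 'IName', 'Room#']
print(sorted(closure({"IName"}, fds)))    # ['IName', 'Room#']
```

Room# appears in the closure of Course# even though no dependency lists it directly, which is exactly the indirect X → Z described above.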
The definition of 1st normal form
• A relation R is said to be in first normal
form (1NF) if and only if all the attributes
of the relation R are atomic in nature.
1NF
• To put a relation in 1NF:
– Create a separate table for each set of related
data.
– Eliminate repeating groups in individual tables.
– Identify each set of related data with a primary
key.
• 1NF for the example table:
Student(Student#, StudentName, DOB)

Course(Course#, CourseName,
Prerequisites, duration)

Results(Student#, Course#, DateOfexam,
Marks, Grade)
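Eliminating a repeating group can be sketched as follows. The unnormalised records, the `Courses` list field and the helper `to_1nf` are all hypothetical; the point is only that each output row holds one atomic value per attribute.

```python
# Hypothetical unnormalised records: "Courses" is a repeating group.
unnormalised = [
    {"Student#": 1001, "StudentName": "Ann", "Courses": ["M1", "M4"]},
    {"Student#": 1002, "StudentName": "Ben", "Courses": ["M4"]},
]

def to_1nf(records, group_field, atomic_field):
    """Emit one row per element of the repeating group, so every value is atomic."""
    rows = []
    for record in records:
        fixed = {k: v for k, v in record.items() if k != group_field}
        for item in record[group_field]:
            rows.append({**fixed, atomic_field: item})
    return rows

first_nf = to_1nf(unnormalised, "Courses", "Course#")
for row in first_nf:
    print(row)
```

The result still repeats StudentName on every row for the same student; that redundancy is what the later normal forms deal with.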
The definition of 2nd normal form
• A relation is said to be in Second Normal Form if
and only if:
– It is in the first normal form, and
– No partial dependency exists between non-key
attributes and key attributes.
• It is based on the concept of full functional
dependency, i.e. all its non-key attributes are
dependent on the whole key.
• Create separate tables for sets of values that
apply to multiple records.
• Relate these tables with a foreign key.
• We see here in the Student_Project relation
that the prime key attributes are Stu_ID and
Proj_ID. According to the rule, the non-key
attributes, i.e. Stu_Name and Proj_Name, must be
dependent upon both and not on either of the
prime key attributes individually. But we find that
Stu_Name can be identified by Stu_ID and
Proj_Name can be identified by Proj_ID
independently. This is called partial dependency,
which is not allowed in Second Normal Form.
• We break the relation in two, as
depicted in the above picture, to do
away with the partial dependency.
• To make the example table 2NF
compliant, we have to remove all the
partial dependencies.
– StudentName and DateOfBirth depend
only on Student#.
– CourseName, PreRequisite and
DurationInDays depend only on
Course#.
– DateOfExam depends only on Course#.
Student(Student#, StudentName, DOB).

Course(Course#, CourseName,
Prerequisites, duration)

Results(Student#, Course#, Marks, Grade)

ExamDate(Course#, DateOfexam)
• In STUDENT, the key is Student# and all other non-
key attributes, StudentName and DateOfBirth, are
fully functionally dependent on the key attribute.
• In COURSE, Course# is the key and all the non-key
attributes, CourseName, PreRequisite and DurationInDays,
are fully functionally dependent on the key attribute.
• In RESULT, Student# and Course# together are the key
attributes and all other non-key attributes, Marks
and Grade, are fully functionally dependent on the
key attributes.
• In the fourth table (EXAM DATE), Course# is the
key and the non-key attribute, DateOfExam, is fully
functionally dependent on the key attribute.
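The four 2NF relations can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column names `StudentNo` and `CourseNo` stand in for Student# and Course#, the column types are guesses, and all inserted data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Student (
    StudentNo   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL,
    DOB         TEXT
);
CREATE TABLE Course (
    CourseNo      TEXT PRIMARY KEY,
    CourseName    TEXT NOT NULL,
    Prerequisites TEXT,
    Duration      INTEGER
);
CREATE TABLE Results (
    StudentNo INTEGER NOT NULL REFERENCES Student(StudentNo),
    CourseNo  TEXT    NOT NULL REFERENCES Course(CourseNo),
    Marks     INTEGER,
    Grade     TEXT,
    PRIMARY KEY (StudentNo, CourseNo)
);
CREATE TABLE ExamDate (
    CourseNo   TEXT PRIMARY KEY REFERENCES Course(CourseNo),
    DateOfExam TEXT
);
""")

# A prospective course with no registered students can now be stored.
conn.execute("INSERT INTO Course VALUES ('M9', 'Prospective Course', NULL, NULL)")
conn.execute("INSERT INTO Course VALUES ('M4', 'Networks', NULL, 10)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1001, "Ann", "2000-01-01"), (1002, "Ben", "2000-02-02"), (1003, "Cat", "2000-03-03")],
)
conn.executemany(
    "INSERT INTO Results VALUES (?, ?, ?, ?)",
    [(1001, "M4", 82, "A"), (1002, "M4", 55, "B"), (1003, "M4", 31, "F")],
)

# Renaming M4 now touches one row, however many students took it.
rows_updated = conn.execute(
    "UPDATE Course SET CourseName = ? WHERE CourseNo = ?",
    ("Computer Networks", "M4"),
).rowcount
print(rows_updated)  # 1
```

Because each fact lives in one place, the insert and update anomalies from the earlier example no longer occur, and the foreign keys reject a result for a student or course that does not exist.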
The definition of 3rd normal form
• A relation R is said to be in 3NF if and only
if:
– It is in 2NF, and
– No transitive dependency (where a non-key
attribute is dependent on another non-key
attribute) exists between non-key attributes
and key attributes.
• Transitive dependency: if A → B and B → C,
then A → C.
3NF cont....
• Remove columns that are not dependent
upon the primary key.
• 3NF does not allow partial dependencies and
transitive dependencies
• We find that in the Student_detail relation
depicted above, Stu_ID is the key and the
only prime key attribute. We find that City
can be identified by Stu_ID as well as by Zip
itself. Zip is not a superkey and City is not a
prime attribute. Additionally, Stu_ID → Zip →
City, so there exists a transitive dependency.
• We break the relation into the two relations
depicted above to bring it into 3NF.
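The split can be sketched in plain Python by projecting Student_detail onto two attribute sets and joining them back on Zip. The sample rows, the Stu_Name column and the helper `project` are assumptions, since the picture is not reproduced here.

```python
# Invented Student_detail rows; City is repeated for every student in a Zip.
student_detail = [
    {"Stu_ID": 1, "Stu_Name": "Ann", "Zip": "Z1", "City": "Alpha City"},
    {"Stu_ID": 2, "Stu_Name": "Ben", "Zip": "Z1", "City": "Alpha City"},
    {"Stu_ID": 3, "Stu_Name": "Cat", "Zip": "Z2", "City": "Beta Town"},
]

def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows, preserving order."""
    result = []
    for row in rows:
        slim = {a: row[a] for a in attrs}
        if slim not in result:
            result.append(slim)
    return result

# Stu_ID -> Zip stays with the student; Zip -> City moves to its own relation.
student = project(student_detail, ["Stu_ID", "Stu_Name", "Zip"])
zip_codes = project(student_detail, ["Zip", "City"])

# Joining on Zip rebuilds the original rows, so no information was lost.
city_by_zip = {z["Zip"]: z["City"] for z in zip_codes}
rejoined = [{**s, "City": city_by_zip[s["Zip"]]} for s in student]

print(len(zip_codes))              # 2
print(rejoined == student_detail)  # True
```

Each Zip to City fact is now stored once, and the join shows the decomposition is lossless for this data.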
• BACK TO OUR EXAMPLE….
• In the RESULT table, Student# and Course# are
the key attributes.
• All other attributes, except Grade, are non-
partially, non-transitively dependent on the key
attributes.
• The Grade attribute is dependent on “Marks”,
and in turn “Marks” is dependent on
Student#, Course#.
• To bring the table into 3NF we need to remove
this transitive dependency.
• Results(Student#, Course#, Marks)
• Grading(Grade#, UpperBound, LowerBound,
Grade)
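With Grade moved out of Results, the grade for a mark is looked up from Grading when needed. The sketch below uses Python's `sqlite3` module; the grade boundaries, the sample marks and the column names `StudentNo`, `CourseNo` and `GradeNo` (standing in for Student#, Course# and Grade#) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (
    StudentNo INTEGER,
    CourseNo  TEXT,
    Marks     INTEGER,
    PRIMARY KEY (StudentNo, CourseNo)
);
CREATE TABLE Grading (
    GradeNo    INTEGER PRIMARY KEY,
    UpperBound INTEGER,
    LowerBound INTEGER,
    Grade      TEXT
);
-- Hypothetical grade bands; the lecture does not give the boundaries.
INSERT INTO Grading VALUES (1, 100, 70, 'A'), (2, 69, 50, 'B'), (3, 49, 0, 'F');
INSERT INTO Results VALUES (1001, 'M4', 82), (1002, 'M4', 55), (1003, 'M4', 31);
""")

# Derive each grade from the marks instead of storing it next to them.
graded = conn.execute("""
    SELECT r.StudentNo, r.CourseNo, r.Marks, g.Grade
    FROM Results AS r
    JOIN Grading AS g ON r.Marks BETWEEN g.LowerBound AND g.UpperBound
    ORDER BY r.StudentNo
""").fetchall()

for row in graded:
    print(row)
```

If the grade boundaries change, only the Grading rows are edited; no stored result can end up with a grade that contradicts its marks.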
