EEI3266_DS4

Unit 3 covers database analysis and design, focusing on the relational data model, data normalization, and its normal forms. It explains key concepts such as relations, attributes, domains, and the importance of normalization to avoid data anomalies like insertion, deletion, and update anomalies. The document outlines the process of normalization through various normal forms, emphasizing the need for reducing redundancy and ensuring data integrity.

Unit 3: Database Analysis and Design

Session 8: Relational Data Model

Session 9: Data Normalization and the normal forms


The relational data model describes the world as “a collection of inter-related
relations (or tables).”

Structure of a Relation
• Relational database - set of relations
• Relation - set of attributes
• Each relation is a table with a name.
• An attribute is a column heading. The attributes define the structure of a row
(record), and each record contains one value for each attribute.
• The heading is the schema of the relation
Students(StuNo, Name, Age, GPA)

Relation/Table name: Students
Attribute/Column names: StuNo, Name, Age, GPA

Students
StuNo   Name        Age   GPA
1002    Ruwani A.   24    3.2
1005    Harsha N.   23    3.9
1020    Maya M.     25    3.7

Row/Record/Tuple: one entry of the relation, holding one value for each attribute
Column/Attribute/Field: a group of related data values drawn from the same domain

Records and fields form the basis of all relational databases.


Domain
The domain of an attribute is the set of values that the attribute can take.
A domain is usually represented by a data type.
e.g. Student No: text, size 9
     Name: text, size 25
     Age: number
     Date of Birth: date

Degree
The number of attributes in a relation is known as the degree of that relation

Cardinality
The number of tuples in a relation is known as the cardinality of that relation
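
As a minimal sketch, the Students relation from the earlier example can be modelled in Python as a schema tuple plus a set of tuples, which makes degree and cardinality easy to compute. The variable names are illustrative assumptions, not part of the course material.

```python
# Schema (heading) of the Students relation from the earlier example.
schema = ("StuNo", "Name", "Age", "GPA")

# Body of the relation: a set of tuples, so duplicate rows cannot occur.
students = {
    (1002, "Ruwani A.", 24, 3.2),
    (1005, "Harsha N.", 23, 3.9),
    (1020, "Maya M.", 25, 3.7),
}

degree = len(schema)         # number of attributes
cardinality = len(students)  # number of tuples

print(degree, cardinality)   # prints "4 3"
```
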
Tables in a relational data model have the following major characteristics:
• A table name is distinct from all other tables in the database.
• There are no duplicate rows – distinct tuples.
• Values in each column are atomic (no repeating or multivalued attributes)
• Values of each column are from the same domain based on their data type:
• Number (numeric, integer, float,…)
• Character (string)
• Date
• Logical (true or false)
• Each attribute has a unique/distinct name.
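
Some of these characteristics can be checked mechanically. The sketch below is one possible approach: the helper names `is_atomic` and `check_table` are hypothetical, and treating Python containers as "non-atomic" is a simplifying assumption.

```python
def is_atomic(value):
    # Assumption: containers stand for repeating or multi-valued entries.
    return not isinstance(value, (list, tuple, set, dict))

def check_table(attributes, rows):
    """Return a list of violated characteristics (empty if none)."""
    problems = []
    if len(set(attributes)) != len(attributes):
        problems.append("duplicate attribute names")
    # repr() lets us compare rows even when they contain unhashable lists.
    if len(set(map(repr, rows))) != len(rows):
        problems.append("duplicate rows")
    if any(not is_atomic(v) for row in rows for v in row):
        problems.append("non-atomic values")
    return problems

good = [(1002, "Ruwani A.", 24, 3.2), (1005, "Harsha N.", 23, 3.9)]
bad = [(1, ["BSc", "MSc"]), (1, ["BSc", "MSc"])]  # made-up rows

good_problems = check_table(("StuNo", "Name", "Age", "GPA"), good)
bad_problems = check_table(("EmpNum", "EmpDegrees"), bad)
```

Here `good_problems` is empty, while `bad_problems` reports both duplicate rows and non-atomic values.
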
Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized.

The values of attributes College, Course and Rank are repeated.

Redundancy leads to problems which are known as data anomalies.


Data anomalies are problems that can occur in poorly planned databases
where all data and/or information are stored in one single file.

There are three types of data anomalies:


• insertion anomaly
• deletion anomaly
• update anomaly
Insertion Anomaly

This problem happens when a data record cannot be inserted without adding some
additional, unrelated data to the record, or without inserting inconsistent
information into the table.

e.g. If the details of a student whose course has not yet been decided need to be
inserted, the insertion will not be possible until the course is decided for
that student.
Deletion Anomaly

This anomaly happens when deleting a data record results in losing some
unrelated information that was stored as part of the deleted record.

It is not possible to delete some information without losing some other
information in the table as well.

e.g. If the details of the students in the table are deleted, then the details of
the college will also be deleted.
Update Anomaly

If an update is not applied in all the places where a value is stored, the
database will be left in an inconsistent state.

e.g. Suppose the rank of the college changes. The change will then have to be made
in every row that stores the rank, which is time-consuming and computationally costly.
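
The update and deletion anomalies can be reproduced with a small sketch. The table below is a hypothetical single-file design with College, Course and Rank repeated per student; all names and values are made up for illustration.

```python
# Hypothetical un-normalized table: one row per student, with college
# details repeated in every row.
rows = [
    {"StuNo": 1, "Name": "Asha",  "College": "ABC", "Course": "IT", "Rank": 5},
    {"StuNo": 2, "Name": "Bimal", "College": "ABC", "Course": "IT", "Rank": 5},
    {"StuNo": 3, "Name": "Chen",  "College": "XYZ", "Course": "CS", "Rank": 9},
]

def ranks_of(college, table):
    """Return the set of distinct ranks recorded for one college."""
    return {r["Rank"] for r in table if r["College"] == college}

# Update anomaly: the rank is changed in one row but not in the other copy,
# so the same college now has two different ranks.
rows[0]["Rank"] = 4
inconsistent = ranks_of("ABC", rows)

# Deletion anomaly: deleting the only XYZ student also loses college XYZ.
remaining = [r for r in rows if r["StuNo"] != 3]
colleges_left = {r["College"] for r in remaining}
```

After the partial update, `inconsistent` holds both 4 and 5, and after the deletion `colleges_left` no longer contains "XYZ".
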
A functional dependency (FD) in a relational database defines a relationship between
two sets of attributes, typically between the primary key and the non-key attributes
within a table.

An attribute B has a functional dependency on another attribute A if any two records
that have the same value for A must also have the same value for B.

That is, in every valid instance, each value of A uniquely determines the value of B.

The functional dependency between A and B is written as A→B.


Examples (Determinant → Dependent):
SIN → Name, Address, Birthdate
SIN, Course → DateCompleted
ISBN → Title
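
The definition can be tested against sample data with a short sketch. The function name `fd_holds` and the book rows are assumptions for illustration. Note that an FD is a rule about every valid instance, so a check like this can only refute an FD on given data, not prove it in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data for the ISBN -> Title example.
books = [
    {"ISBN": "111", "Title": "Databases", "Shelf": "A"},
    {"ISBN": "111", "Title": "Databases", "Shelf": "B"},
    {"ISBN": "222", "Title": "Networks",  "Shelf": "A"},
]
print(fd_holds(books, ("ISBN",), ("Title",)))   # True
print(fd_holds(books, ("Shelf",), ("Title",)))  # False
```
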
Suppose we keep track of employee email addresses, and we only track one email
address for each employee. Suppose each employee is identified by their unique
employee number.

There is a functional dependency of email address on employee number:

e.g. employee number → email address

EmpNum is the determinant of the dependent EmpEmail.

If EmpNum is the PK, then FDs from EmpNum to every other attribute must exist; a
dependency diagram depicts them as arrows from EmpNum.
Consider attributes A, B, and C, where A → B and B → C.
Functional dependencies are transitive, which means that we also have the
functional dependency A → C.
We say that C is transitively dependent on A through B.

EmpNum → DeptNum
DeptNum → DeptName
EmpNum → DeptName

DeptName is transitively dependent on EmpNum via DeptNum.
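
Transitivity can be applied mechanically by computing an attribute closure, i.e. every attribute reachable from a starting set by following FDs. The sketch below assumes the FD DeptNum → DeptName that the example implies; the function name `closure` is illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already known, rhs follows too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"EmpNum"}, {"DeptNum"}), ({"DeptNum"}, {"DeptName"})]
print(sorted(closure({"EmpNum"}, fds)))  # ['DeptName', 'DeptNum', 'EmpNum']
```

Because DeptName appears in the closure of EmpNum, the dependency EmpNum → DeptName follows from the two given FDs.
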
A partial dependency exists when an attribute B is functionally dependent on an
attribute A, and A is a component of a composite primary key.

Composite key: {InvNum, LineNum}

InvDate is partially dependent on {InvNum, LineNum}, as InvNum is a determinant of
InvDate and InvNum is part of the composite primary key.
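
A simplified sketch of how partial dependencies might be detected: it only inspects the FDs exactly as listed (it does not infer implied ones). The function name and the `Quantity` attribute are hypothetical additions for illustration.

```python
from itertools import combinations

def partial_dependencies(key, fds):
    """List (subset, attribute) pairs where a non-key attribute depends
    on a proper subset of the composite key.

    key is a set of attribute names; fds is a list of (lhs, rhs) set pairs.
    """
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            for lhs, rhs in fds:
                if lhs == set(subset):
                    found += [(subset, a) for a in sorted(rhs - key)]
    return found

key = {"InvNum", "LineNum"}
fds = [
    ({"InvNum"}, {"InvDate"}),              # from the example in the text
    ({"InvNum", "LineNum"}, {"Quantity"}),  # hypothetical full dependency
]
found = partial_dependencies(key, fds)
print(found)  # [(('InvNum',), 'InvDate')]
```
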
Normalization is the derivation of data as a set of non-redundant, consistent and
inter-dependent relations. It is a process of decomposing unsatisfactory relations
into smaller relations.
Reasons why Normalization is necessary

• To reduce data storage space
• To reduce inconsistency of data
• To reduce update cost
• To remove many-to-many relationships
• To improve flexibility of the system
• To avoid data anomalies (insert, delete and update anomalies)
Normal Forms

A normal form is a state of a relation that results from applying simple rules
regarding functional dependencies (or relationships between attributes) to that relation.

This unit covers four normal forms: first, second, third, and Boyce-Codd normal forms
(1NF, 2NF, 3NF, and BCNF).

There is a sequence to normal forms:

1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest.

Also,
any relation that is in BCNF is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
0NF: multi-valued attributes exist
1NF: any multi-valued attributes have been removed
2NF: any partial functional dependencies have been removed
3NF: any transitive dependencies have been removed
BCNF: any remaining anomalies that result from functional dependencies have
been removed. We consider a relation in BCNF to be fully normalized.

A design that has a lower normal form than another design has more redundancy.
Uncontrolled redundancy can lead to data integrity problems.

Higher normal forms have less redundancy, and as a result, fewer update problems
Apply a set of normalisation rules to all the attributes of the entity types
identified in the data requirement step.

Output of the Normalization Process

A list of normalized entity types in at least third normal form (3NF), such that
all non-key attributes of each entity type fully depend on the whole key and
nothing but the key.
A relation is in 1NF if all values stored in the relation are single-valued and
atomic.

1NF places restrictions on the structure of relations that all values must be
simple.

E.g. 1. The following is not in 1NF.

EmpDegrees is a multi-valued attribute:
• employee 679 has two degrees: BSc and MSc
• employee 333 has three degrees: BA, BSc, PhD
1. Use multiple tuples, one per value
2. Use multiple columns, one per value
3. Use separate tables

To obtain 1NF relations we must, without loss of information, replace the
above with two relations.
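
The "separate tables" option can be sketched as follows. The degrees match the EmpDegrees example; the employee names and the relation names `employee` and `employee_degree` are hypothetical.

```python
# Employee data with a multi-valued EmpDegrees attribute (not in 1NF).
employees_0nf = [
    {"EmpNum": 679, "EmpName": "Perera", "EmpDegrees": ["BSc", "MSc"]},
    {"EmpNum": 333, "EmpName": "Silva",  "EmpDegrees": ["BA", "BSc", "PhD"]},
]

# Relation 1: single-valued employee attributes, keyed by EmpNum.
employee = [(e["EmpNum"], e["EmpName"]) for e in employees_0nf]

# Relation 2: one tuple per (employee, degree) pair, keyed by both columns.
employee_degree = [
    (e["EmpNum"], degree)
    for e in employees_0nf
    for degree in e["EmpDegrees"]
]

print(len(employee), len(employee_degree))  # prints "2 5"
```

Every value in both relations is now atomic, and the original data can be recovered by joining the two relations on EmpNum.
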
Purchase_Order Relation in 0NF
Purchase_Order(PO-NO, PO-DATE, EMP-CODE, SUPP-NO, SUPP-NAME, PART-NO,
PART-DESC, PART-QTY)

Within a single purchase order there are several part numbers, part descriptions
and part quantities. Hence, parts ordered can be decomposed.
Purchase Order relations in 1NF
The Purchase Order relations in 1NF have the following anomalies:

1. INSERT PROBLEM
We cannot record which parts are available until an order is placed
(e.g. P4 is bush).

2. DELETE PROBLEM
We lose the information about part P7 if we cancel purchase order 115
(i.e. delete the PO-PART tuple for Part No P7).

3. UPDATE PROBLEM
To change the description of Part P3 we need to change every tuple in PO-PART
containing Part No P3.
A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on
the whole key (i.e. there are no partial functional dependencies).

• 2NF (and 3NF) both involve the concepts of key and non-key attributes.
• A key attribute is any attribute that is part of a key; any attribute that is not a
key attribute, is a non-key attribute.
• A relation in 2NF will not have any partial dependencies
• A non-key field cannot be a fact about a subset of a key
• 2NF is relevant when the key is composite, i.e. consists of several attributes
Consider the PO-Part Relation (Parts Ordered) in 1NF:
PO-PART(PO-NO, PART-NO, PART-DESC, PART-QTY)

Part Description depends only on Part No, which is part of the key of PO-PART.

If an entity has a composite key:

1) Check each attribute against the whole key
2) Remove the attribute and the partial key to a new relation
3) Optimise relations
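
The steps above can be sketched as two projections of PO-PART. The sample rows, the helper `project`, and the name `part` for the new relation are assumptions made for illustration.

```python
def project(rows, attrs):
    """Project rows (list of dicts) onto attrs, dropping duplicate tuples."""
    seen = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.append(t)
    return seen

# Hypothetical PO-PART rows in 1NF; the key is (PO-NO, PART-NO).
po_part_1nf = [
    {"PO-NO": 111, "PART-NO": "P1", "PART-DESC": "Nut",  "PART-QTY": 100},
    {"PO-NO": 111, "PART-NO": "P3", "PART-DESC": "Bolt", "PART-QTY": 50},
    {"PO-NO": 112, "PART-NO": "P3", "PART-DESC": "Bolt", "PART-QTY": 20},
]

# PART-DESC depends only on PART-NO, so it moves to its own relation
# together with the partial key PART-NO.
part = project(po_part_1nf, ("PART-NO", "PART-DESC"))

# What remains depends on the whole key (PO-NO, PART-NO).
po_part = project(po_part_1nf, ("PO-NO", "PART-NO", "PART-QTY"))

print(part)  # [('P1', 'Nut'), ('P3', 'Bolt')]
```

The description of P3 is now stored once, so changing it means changing a single tuple.
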
Parts Ordered Relations in 2NF
Purchase Order Relations in 2NF
The Purchase Order relations in 2NF have the following anomalies:

1. INSERT PROBLEM
We cannot record which suppliers are available until an order is placed
(e.g. 200 is hardware stores).

2. DELETE PROBLEM
We lose the information about supplier 100 if we cancel purchase order 116
(i.e. delete the PO tuple for Supplier No 100).

3. UPDATE PROBLEM
To change the name of Supplier 222 we need to change every tuple in PO
containing Supplier No 222.
A relation is in 3NF if it is in 2NF and each non-key attribute is dependent only on
the whole key, and not on any other non-key attribute, i.e. there are no transitive
dependencies.

• Deals with the relationship between non-key fields
• A non-key field cannot be a fact about another non-key field

Purchase Order Relation in 2NF

PO(PO-NO, PO-DATE, EMP-CODE, SUPP-NO, SUPP-NAME)

Supplier name is a non-key field that depends on another non-key field
(supplier no) in addition to depending on the key PO-No.
1) Check each non-key attribute for dependency on other non-key fields
2) Remove any attribute that depends on another non-key attribute from the relation
3) Create a new relation comprising that attribute and the non-key attribute it depends on
4) Determine the key of the new relation
5) Optimise
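
As an illustration of these steps, the sketch below uses Python's built-in sqlite3 module to split the 2NF PO relation into PO and SUPPLIER. Column names use underscores instead of hyphens for convenience in SQL, and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# PO in 2NF: supp_name depends on supp_no, which is a non-key attribute.
cur.execute("CREATE TABLE po_2nf (po_no INTEGER PRIMARY KEY, po_date TEXT, "
            "emp_code TEXT, supp_no INTEGER, supp_name TEXT)")
cur.executemany("INSERT INTO po_2nf VALUES (?, ?, ?, ?, ?)", [
    (111, "2024-01-05", "E1", 222, "Supplier A"),
    (112, "2024-01-06", "E2", 222, "Supplier A"),
    (116, "2024-01-09", "E1", 100, "Supplier B"),
])

# 3NF: move the transitively dependent attribute into SUPPLIER,
# whose key is supp_no.
cur.execute("CREATE TABLE supplier AS "
            "SELECT DISTINCT supp_no, supp_name FROM po_2nf")
cur.execute("CREATE TABLE po AS "
            "SELECT po_no, po_date, emp_code, supp_no FROM po_2nf")

# Renaming supplier 222 now touches exactly one row.
cur.execute("UPDATE supplier SET supp_name = 'Supplier C' WHERE supp_no = 222")
updated_rows = cur.rowcount
supplier_count = cur.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]

print(updated_rows, supplier_count)  # prints "1 2"
```

In the 2NF table the same rename would have had to change two rows, one per purchase order placed with supplier 222.
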
Purchase Order (PO) and SUPPLIER Relations in 3NF
A relation is in BCNF if it is in 3NF and, for every non-trivial dependency A → B, A is
a super key, i.e. A cannot be a non-prime attribute with B being a prime attribute.

• BCNF deals with candidate keys. A relation is in BCNF if every determinant is a
candidate key.
• BCNF is an extension of 3NF that adds further constraints.

A super key is an attribute (or set of attributes) that uniquely identifies every tuple,
and therefore determines all attributes, in a relation.

A key is a minimal super key: it has the additional property that removing any attribute
from it leaves a set that no longer satisfies the key condition. The primary key is the
key chosen to identify tuples in the relation.
Let's assume there is a company where employees work in more than one department

Employee

Functional Dependencies

1.EMP_ID → EMP_COUNTRY
2.EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

The Employee table is not in BCNF because the determinants EMP_ID and EMP_DEPT are
not super keys: EMP_ID determines only EMP_COUNTRY, and EMP_DEPT determines only
DEPT_TYPE and EMP_DEPT_NO.
To convert the given table into BCNF, we decompose it into three tables.
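
The BCNF condition (every determinant is a super key) can be checked with a short sketch based on attribute closure. It assumes the Employee table contains exactly the five attributes named in the two FDs; the function names are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]

# Assumed attribute set of the Employee table.
attributes = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

violations = bcnf_violations(attributes, fds)
is_super_key = closure({"EMP_ID", "EMP_DEPT"}, fds) == attributes

print(len(violations), is_super_key)  # prints "2 True"
```

Both FDs violate BCNF, while {EMP_ID, EMP_DEPT} together determine every attribute, which is consistent with splitting the table so that each determinant becomes the key of its own table.
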
Student_Enrollment Table

• One student can enroll for multiple subjects (e.g. student 101 takes both the Java
and C++ subjects).
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject, as is the case for Java.
• In the Student_Enrollment table, student_id and subject together form the
primary key, because using student_id and subject we can find all the
columns of the table.
• Also, one professor teaches only one subject, but one subject may have
two different professors.
• Hence, there is a dependency between subject and professor here,
where subject depends on the professor name.
• This table satisfies the 1st Normal Form because all the values are atomic,
column names are unique, and all the values stored in a particular
column are of the same domain.
• This table also satisfies the 2nd Normal Form as there is no partial
dependency.
• There is no transitive dependency, hence the table also satisfies the 3rd
Normal Form.
• But this table is not in Boyce-Codd Normal Form.
• In the Student_Enrollment table, student_id and subject form the primary key, which
means the subject column is a prime attribute.
• But there is one more dependency, professor → subject.
• And while subject is a prime attribute, professor is a non-prime attribute,
which is not allowed by BCNF.
• To make this relation (table) satisfy BCNF, it needs to be decomposed into two
tables, a student table and a professor table.

Student_Enrollment table decomposed into two tables:
Student table and Professor table (in Boyce-Codd Normal Form).
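
A minimal sketch of this decomposition follows. The column layout of the two tables (student_id with professor, and professor with subject) is an assumption, as are the professor names; only student 101 taking Java and C++, and Java having more than one professor, come from the text.

```python
# Hypothetical Student_Enrollment rows: (student_id, subject, professor).
enrollment = [
    (101, "Java", "P.Java"),
    (101, "C++",  "P.Cpp"),
    (102, "Java", "P.Java2"),
    (103, "C#",   "P.Chash"),
    (104, "Java", "P.Java"),
]

# Student table: which professor is assigned to each student.
student = sorted({(sid, prof) for sid, _subject, prof in enrollment})

# Professor table: the single subject each professor teaches
# (professor -> subject, so professor is the key here).
professor = sorted({(prof, subject) for _sid, subject, prof in enrollment})

# Joining the two tables on professor reproduces the original rows,
# so the decomposition loses no information for this data.
subject_of = dict(professor)
rejoined = sorted((sid, subject_of[prof], prof) for sid, prof in student)

print(rejoined == sorted(enrollment))  # True
```

In the professor table the only determinant, professor, is the key, so the dependency professor → subject no longer violates BCNF.
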
