EEI3266_DS4
Structure of a Relation
• Relational database - set of relations
• Relation - set of attributes
• Each relation is a table with a name.
• An attribute is a column heading. Each row (record) contains one value for
each attribute.
• The heading is the schema of the relation
Students(StuNo, Name, Age, GPA)
Relation/Table name: Students
Attribute/Column names: StuNo, Name, Age, GPA
StuNo  Name       Age  GPA
1002   Ruwani A.  24   3.2
1005   Harsha N.  23   3.9
1020   Maya M.    25   3.7
Row/Record/Tuple (represents characteristic of the relation)
Column/Attribute/Field (represents a group of related data values)
Degree
The number of attributes in a relation is known as the degree of that relation
Cardinality
The number of tuples in a relation is known as the cardinality of that relation
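As a small illustration of the two definitions, the Students relation above can be held as a heading plus a set of tuples. This is only a sketch; the variable names are chosen for the example.

```python
# The Students relation from the example: a heading plus a set of tuples.
heading = ("StuNo", "Name", "Age", "GPA")
tuples = {
    (1002, "Ruwani A.", 24, 3.2),
    (1005, "Harsha N.", 23, 3.9),
    (1020, "Maya M.", 25, 3.7),
}

degree = len(heading)       # number of attributes
cardinality = len(tuples)   # number of tuples

print(degree, cardinality)  # 4 3
```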
Tables in a relational data model have the following major characteristics:
• A table name is distinct from all other tables in the database.
• There are no duplicate rows – distinct tuples.
• Values in each column are atomic (no repeating or multivalued attributes).
• Values of each column are from the same domain based on their data type:
• Number (numeric, integer, float,…)
• Character (string)
• Date
• Logical (true or false)
• Each attribute has a unique/distinct name.
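The characteristics listed above can be checked mechanically for a small in-memory table. The sketch below is an assumption-laden illustration: `check_table` is a hypothetical helper, a Python type stands in for a domain, and the `bad` rows are invented to trigger two violations.

```python
def check_table(heading, rows):
    """Return a list of violated characteristics (empty if none are found)."""
    problems = []
    if len(set(heading)) != len(heading):
        problems.append("attribute names are not unique")
    seen = []
    for row in rows:
        if row in seen:                      # an identical tuple appeared before
            problems.append("duplicate row")
            break
        seen.append(row)
    for i, name in enumerate(heading):
        column = [row[i] for row in rows]
        if any(isinstance(v, (list, tuple, set, dict)) for v in column):
            problems.append(f"{name}: non-atomic value")
        elif len({type(v) for v in column}) > 1:
            problems.append(f"{name}: values from more than one type")
    return problems


heading = ("StuNo", "Name", "Age", "GPA")
students = [
    (1002, "Ruwani A.", 24, 3.2),
    (1005, "Harsha N.", 23, 3.9),
    (1020, "Maya M.", 25, 3.7),
]
# Hypothetical bad data: a repeated tuple and a multivalued Age.
bad = [students[0], students[0], (1005, "Harsha N.", [23, 24], 3.9)]

print(check_table(heading, students))
print(check_table(heading, bad))
```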
Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized.
Insertion Anomaly
This anomaly happens when a data record cannot be inserted without also adding
unrelated data to the record or inserting inconsistent information into the
table.
Deletion Anomaly
This anomaly happens when deleting a data record also loses unrelated
information that was stored as part of that record.
e.g. If the details of the students in the table are deleted, the details of the
college are deleted as well.
Update Anomaly
If an update is not applied in every place where the data is stored, the database
is left in an inconsistent state.
e.g. Suppose the rank of the college changes. The change then has to be made in
every row that stores it, which is time-consuming and computationally costly.
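The update and deletion anomalies can be reproduced on a tiny unnormalised table. The rows and column names below (`student`, `college`, `college_rank`) are hypothetical and only mirror the student/college example in spirit.

```python
# Hypothetical unnormalised rows: the college rank is repeated for every student.
rows = [
    {"student": "S1", "college": "C1", "college_rank": 5},
    {"student": "S2", "college": "C1", "college_rank": 5},
    {"student": "S3", "college": "C2", "college_rank": 9},
]


def ranks_of(college, table):
    """All distinct ranks stored for one college in the given table."""
    return {r["college_rank"] for r in table if r["college"] == college}


# Update anomaly: the rank is changed in one row but not in the other copy.
updated = [dict(r) for r in rows]
updated[0]["college_rank"] = 4
inconsistent = ranks_of("C1", updated)   # two different ranks for one college

# Deletion anomaly: removing the only student of C2 also removes C2's rank.
after_delete = [r for r in rows if r["student"] != "S3"]
lost = ranks_of("C2", after_delete)      # nothing left about C2

print(inconsistent, lost)
```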
A functional dependency (FD) in a relational database defines a relationship between
two sets of attributes, typically between the primary key and the non-key attributes of a table.
It is written A → B: for every valid instance of A, that value of A uniquely determines the value of B.
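One way to make the definition concrete is to test a candidate dependency against sample rows. In the sketch below `fd_holds` is a hypothetical helper and the fourth student is invented for the example. Note that sample data can only refute a dependency; whether it holds for every valid instance is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it later.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"StuNo": 1002, "Name": "Ruwani A.", "Age": 24, "GPA": 3.2},
    {"StuNo": 1005, "Name": "Harsha N.", "Age": 23, "GPA": 3.9},
    {"StuNo": 1020, "Name": "Maya M.", "Age": 25, "GPA": 3.7},
]
# Hypothetical extra student who shares an Age with 1002 but has another GPA.
extra = students + [{"StuNo": 1031, "Name": "Hypothetical S.", "Age": 24, "GPA": 3.5}]

print(fd_holds(extra, ["StuNo"], ["Name", "Age", "GPA"]))  # True
print(fd_holds(extra, ["Age"], ["GPA"]))                   # False
```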
Dependency Diagram
must exist.
Consider attributes A, B, and C, where
A → B and B → C.
Functional dependencies are transitive, which means that we also have the
functional dependency A → C.
We say that C is transitively dependent on A through B.
e.g. EmpNum → DeptNum and DeptNum → DeptName, so EmpNum → DeptName.
DeptName is transitively dependent on EmpNum via DeptNum.
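Transitive dependencies can be derived by computing an attribute closure, that is, everything a set of attributes determines. The sketch below assumes the dependency DeptNum → DeptName, which the "via DeptNum" wording implies; `closure` is a name chosen for the example.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"EmpNum"}, {"DeptNum"}),
    ({"DeptNum"}, {"DeptName"}),   # assumed from the "via DeptNum" wording
]

print(sorted(closure({"EmpNum"}, fds)))  # ['DeptName', 'DeptNum', 'EmpNum']
```

DeptName appears in the closure of EmpNum although no dependency lists it directly, which is the transitive dependency EmpNum → DeptName.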
A partial dependency exists when an attribute B is functionally dependent on an
attribute A, and A is only a component of a composite primary key, not the whole key.
A normal form is a state of a relation that results from applying simple rules regarding
functional dependencies (or relationships between attributes) to that relation.
Four normal forms are considered here: first, second, third, and Boyce-Codd normal form
(1NF, 2NF, 3NF, and BCNF).
A design that has a lower normal form than another design has more redundancy.
Uncontrolled redundancy can lead to data integrity problems.
Higher normal forms have less redundancy and, as a result, fewer update problems.
Apply a set of normalisation rules to all the attributes of the entity types
identified in the data requirement step.
1NF restricts the structure of relations: every attribute value must be simple
(atomic).
Within a single purchase order there are several part numbers, part descriptions
and part quantities. Hence, parts ordered can be decomposed.
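The decomposition of the repeating group can be sketched as follows. The nested order, its values and the helper `to_1nf` are hypothetical; only the attribute names PO-NO, PART-NO, PART-DESC and PART-QTY come from the PO-PART relation used in these notes.

```python
# Hypothetical un-normalised purchase order: the parts form a repeating group.
order = {
    "PO-NO": 900,
    "parts": [
        {"PART-NO": "PX1", "PART-DESC": "hypothetical bolt", "PART-QTY": 10},
        {"PART-NO": "PX2", "PART-DESC": "hypothetical nut", "PART-QTY": 4},
    ],
}


def to_1nf(order):
    """Split one nested order into a PO row and one PO-PART row per part."""
    po_row = {k: v for k, v in order.items() if k != "parts"}
    po_part_rows = [{"PO-NO": order["PO-NO"], **part} for part in order["parts"]]
    return po_row, po_part_rows


po_row, po_part_rows = to_1nf(order)
for row in po_part_rows:
    print(row)
```

Every value in the resulting rows is a single number or string, so each row satisfies the 1NF requirement.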
Purchase Order relations in 1NF
Purchase Order relations in 1NF have the following anomalies:
1. INSERT PROBLEM
We cannot know which parts are available until an order is
placed (e.g. P4 is bush).
2. DELETE PROBLEM
We lose the information about part P7 if we cancel
purchase order 115
(i.e. delete the PO-PART tuple for Part No P7).
3. UPDATE PROBLEM
To change the description of part P3 we need to
change every tuple in PO-PART containing Part
No P3.
A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on
the whole key. (There are no partial functional dependencies.)
• 2NF and 3NF both involve the concepts of key and non-key attributes.
• A key attribute is any attribute that is part of a key; any attribute that is not a
key attribute is a non-key attribute.
• A relation in 2NF will not have any partial dependencies.
• A non-key field cannot be a fact about a subset of a key.
• 2NF is relevant when the key is composite, i.e. consists of several attributes.
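The 2NF condition can be tested on a list of functional dependencies. The sketch below uses the PO-PART attributes from these notes. The key {PO-NO, PART-NO} and the two dependencies are assumptions made for illustration, and `partial_dependencies` is a hypothetical helper.

```python
def partial_dependencies(key, fds):
    """Dependencies whose determinant is a proper subset of the key
    and which determine at least one non-key attribute."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        if set(lhs) < key and non_key:   # '<' is the proper-subset test
            found.append((set(lhs), non_key))
    return found


key = {"PO-NO", "PART-NO"}                      # assumed composite key
fds = [
    ({"PART-NO"}, {"PART-DESC"}),               # assumed: a part has one description
    ({"PO-NO", "PART-NO"}, {"PART-QTY"}),       # assumed: quantity depends on the whole key
]

print(partial_dependencies(key, fds))
```

A non-empty result means the relation is not in 2NF; here PART-DESC is a fact about PART-NO alone.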
Consider the PO-PART relation (Parts Ordered) in 1NF:
PO-PART(PO-NO, PART-NO, PART-DESC, PART-QTY)
3) Optimise relations
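A minimal sketch of moving PO-PART to 2NF by projection, assuming PART-DESC depends on PART-NO alone. The rows are hypothetical, and `project` and the relation name `part` are names chosen for the example.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (first-seen order kept)."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Hypothetical PO-PART rows in 1NF: the description of PX1 is stored twice.
po_part = [
    {"PO-NO": 901, "PART-NO": "PX1", "PART-DESC": "hypothetical bolt", "PART-QTY": 10},
    {"PO-NO": 901, "PART-NO": "PX2", "PART-DESC": "hypothetical nut", "PART-QTY": 5},
    {"PO-NO": 902, "PART-NO": "PX1", "PART-DESC": "hypothetical bolt", "PART-QTY": 3},
]

part = project(po_part, ["PART-NO", "PART-DESC"])             # one row per part
po_part_2nf = project(po_part, ["PO-NO", "PART-NO", "PART-QTY"])

# Joining the two projections on PART-NO gives back the original rows.
rejoined = [
    {**o, "PART-DESC": p["PART-DESC"]}
    for o in po_part_2nf
    for p in part
    if o["PART-NO"] == p["PART-NO"]
]
print(len(part), len(po_part_2nf), rejoined == po_part)
```

Each part description is now stored once, so changing it is a single update, and a part can be recorded before any order mentions it.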
Parts Ordered Relations in 2NF
Purchase Order Relations in 2NF
Purchase Order Relations in 2NF have the following anomalies:
1. INSERT PROBLEM
We cannot know which suppliers are available until an order is placed (e.g. 200 is
hardware stores).
2. DELETE PROBLEM
We lose the information about supplier 100 if we cancel purchase order 116
(i.e. delete the PO tuple for Supplier No 100).
3. UPDATE PROBLEM
To change the name of Supplier 222 we
need to change every tuple in PO
containing Supplier No 222.
A relation is in 3NF if it is in 2NF and every non-key attribute depends only on
the whole key and not on any other non-key attribute, i.e. there are no transitive
dependencies.
3) Create a new relation comprising the attribute and the non-key attribute on which it depends
5) Optimise
Purchase Order (PO) and SUPPLIER Relations in 3NF
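The split into PO and SUPPLIER can be sketched at the schema level. The attribute names PO-DATE, SUPPLIER-NO and SUPPLIER-NAME, the key {PO-NO} and the dependencies are assumptions for illustration; `split_transitive` is a hypothetical helper that handles only the simple case of a non-key attribute determining other non-key attributes.

```python
def split_transitive(attrs, key, fds):
    """Move each non-key -> non-key dependency into its own relation."""
    key = set(key)
    remaining = set(attrs)
    new_relations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        # Determinant and dependents contain no key attribute at all.
        if not (lhs & key) and not (rhs & key):
            new_relations.append(lhs | rhs)
            remaining -= rhs          # the determinant stays behind as a foreign key
    return remaining, new_relations


po_attrs = {"PO-NO", "PO-DATE", "SUPPLIER-NO", "SUPPLIER-NAME"}   # assumed schema
fds = [
    ({"PO-NO"}, {"PO-DATE", "SUPPLIER-NO"}),
    ({"SUPPLIER-NO"}, {"SUPPLIER-NAME"}),    # the transitive part
]

po, new_relations = split_transitive(po_attrs, {"PO-NO"}, fds)
print(sorted(po), [sorted(r) for r in new_relations])
```

Under these assumptions PO keeps SUPPLIER-NO as a reference, and the supplier name is stored once per supplier in the new relation.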
A relation is in BCNF if it is in 3NF and, for every non-trivial dependency A → B, A is a super
key. In particular, A cannot be a non-prime attribute when B is a prime attribute.
A super key is an attribute (or set of attributes) that uniquely identifies each
tuple, and hence all attribute values, in a relation.
A key (primary key) is a minimal super key: a super key with the additional
property that removing any attribute from it leaves a set that no longer
satisfies the key condition.
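Both definitions can be expressed with an attribute closure. The sketch below uses the Students attributes and assumes the single dependency StuNo → {Name, Age, GPA}; the function names are chosen for the example.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_super_key(attrs, all_attrs, fds):
    """A super key determines every attribute of the relation."""
    return closure(attrs, fds) >= set(all_attrs)


def is_key(attrs, all_attrs, fds):
    """A key is a super key that stops being one when any attribute is removed."""
    attrs = set(attrs)
    if not is_super_key(attrs, all_attrs, fds):
        return False
    return not any(is_super_key(attrs - {a}, all_attrs, fds) for a in attrs)


all_attrs = {"StuNo", "Name", "Age", "GPA"}
fds = [({"StuNo"}, {"Name", "Age", "GPA"})]   # assumed dependency

print(is_super_key({"StuNo", "Name"}, all_attrs, fds))  # True
print(is_key({"StuNo", "Name"}, all_attrs, fds))        # False
print(is_key({"StuNo"}, all_attrs, fds))                # True
```

Removing one attribute at a time is enough for the minimality test, because any superset of a super key is itself a super key.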
Let's assume there is a company where employees work in more than one department.
Employee
Functional Dependencies
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
• One student can enroll for multiple subjects. (e.g. student 101 takes both Java
and C++ subjects).
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject like we have for Java.
• In the Student_Enrollment table, student_id and subject together form the
primary key, because using student_id and subject, we can find all the
columns of the table.
• Also, one professor teaches only one subject, but one subject may have
two different professors.
• This table also satisfies the 2nd Normal Form as there is no Partial
Dependency.
• There is no Transitive Dependency, hence the table also satisfies the 3rd
Normal Form.
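The BCNF condition (every determinant must be a super key) can be checked for both examples above. This is a sketch under assumptions: the Employee attribute set is taken to be exactly the attributes named in its two dependencies, the column name `professor` is assumed, and `bcnf_violations` is a hypothetical helper.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial dependencies whose determinant is not a super key."""
    all_attrs = set(all_attrs)
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        if not trivial and not closure(lhs, fds) >= all_attrs:
            violations.append((lhs, rhs))
    return violations


# Employee: attribute set assumed from the two listed dependencies.
employee_attrs = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
employee_fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

# Student_Enrollment: the key determines the professor, and one professor
# teaches only one subject.
enrollment_attrs = {"student_id", "subject", "professor"}
enrollment_fds = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]

print(bcnf_violations(employee_attrs, employee_fds))
print(bcnf_violations(enrollment_attrs, enrollment_fds))
```

For Student_Enrollment the only dependency reported is professor → subject: professor is not a super key while subject is part of the key, so under these assumptions the table satisfies 3NF but not BCNF.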