Week 6
Week 6 Lecture 1
Class BSCCS2001
Materials
Module # 26
Type Lecture
Week # 6
Normalization: a systematic approach to decomposing tables to eliminate data redundancy and undesirable characteristics such as:
Insertion Anomaly
Update Anomaly
Deletion Anomaly
Redundancy refers to the repetition of the same data, that is, duplicate copies of the same data stored in different locations
Ensuring the data dependencies make sense, that is, data is logically stored
Anomalies
Update Anomaly: Employee 519 is shown as having different addresses on different records
Remedy for the update anomaly: decompose into (ID, Address) and (ID, Skill)
Insertion Anomaly: Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded
Deletion Anomaly: All information about Dr. Giddens is lost if he temporarily ceases to be assigned to any courses
Normalization and Normal Forms
A normal form specifies a set of conditions that the relational schema must satisfy in terms of its constraints; normal forms offer varied levels of guarantee for the design
Informally, a relational DB relation is often described as "normalized" if it is in Third Normal Form (3NF)
Most 3NF relations are free from insertion, update and deletion anomalies
1NF: Possible Redundancy
Example: Supplier (SID, Status, City, PID, Qty)
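Supplier
SID | Status | City | PID | Qty
S1 | 30 | Delhi | P1 | 100
S1 | 30 | Delhi | P2 | 125
S1 | 30 | Delhi | P3 | 200
S1 | 30 | Delhi | P4 | 130
S2 | 10 | Karnal | P1 | 115
S2 | 10 | Karnal | P2 | 250
S3 | 40 | Rohtak | P1 | 245
S4 | 30 | Delhi | P4 | 300
S4 | 30 | Delhi | P5 | 315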
Drawbacks:
Deletion Anomaly: If we delete <S3, 40, Rohtak, P1, 245>, then we lose the information that S3 lives in Rohtak
Insertion Anomaly: We cannot insert a Supplier S5 located in Karnal, until S5 supplies at least one part
Update Anomaly: If Supplier S1 moves from Delhi to Kanpur, then it is difficult to update all the tuples having SID as S1 and City as Delhi
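2NF: Second Normal Form
Relation R is in 2NF if:
R is in 1NF and
R contains no partial dependency
(Motivation: if part of a key determines a non-prime attribute, that part repeats across tuples, so the corresponding value of the dependent attribute is duplicated as well)
Partial Dependency:
Let R be a relational schema and X, Y, A be attribute sets over R, where X: any candidate key, Y: a proper subset of the candidate key, and A: a non-prime attribute
If Y → A holds, then A depends on only part of the key X, and Y → A is a partial dependency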
Students
SID | Sname | Cname
S1 | A | C
S1 | A | C++
S2 | B | C++
S2 | B | DB
S3 | A | DB
Redundancy? Yes: Sname is repeated for every course the student takes
Anomaly? Yes
Functional Dependencies:
SID → Sname
Partial Dependencies:
SID → Sname (as SID is a Proper Subset of Candidate Key {SID, Cname})
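The partial-dependency test can be mechanized with attribute closures. Below is a minimal Python sketch, assuming FDs are given as (lhs, rhs) pairs of frozensets and that the candidate keys are supplied by hand; `closure` and `partial_dependencies` are hypothetical helper names, not part of the course material.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, fds, candidate_keys):
    """List (Y, A) where Y is a proper subset of a candidate key, A is a
    non-prime attribute of schema, and Y -> A holds (A is in Y+)."""
    prime = set().union(*candidate_keys)
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):  # proper, non-empty subsets only
            for y in combinations(sorted(key), size):
                y_set = set(y)
                dependents = (closure(y_set, fds) & schema) - y_set - prime
                for a in sorted(dependents):
                    found.append((frozenset(y_set), a))
    return found


# Students (SID, Sname, Cname) with SID -> Sname and key {SID, Cname}
schema = {"SID", "Sname", "Cname"}
fds = [(frozenset({"SID"}), frozenset({"Sname"}))]
keys = [frozenset({"SID", "Cname"})]
print(partial_dependencies(schema, fds, keys))
# [(frozenset({'SID'}), 'Sname')]
```

Called on Supplier (SID, Status, City, PID, Qty) with key {SID, PID} and the assumed FDs SID → Status, City and SID, PID → Qty, the same function would report SID → City and SID → Status.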
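Key: {SID, Cname}
Normalization (decompose on SID → Sname):
R1 (SID, Sname), {SID}: Primary Key
S1 | A
S2 | B
S3 | A
R2 (SID, Cname)
S1 | C
S1 | C++
S2 | C++
S2 | DB
S3 | DB
The decomposition is:
1. Lossless Join
2. 2NF
3. Dependency Preserving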
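One way to see the lossless-join property on this instance is to project Students onto R1 and R2, natural-join the projections back, and compare with the original. Below is a minimal Python sketch with relations as lists of dicts; `project`, `natural_join` and `as_set` are hypothetical helpers. An instance check only illustrates the property. In general, the decomposition is lossless because the common attribute SID is a key of R1.

```python
def project(rows, attrs):
    """Projection with duplicate elimination (set semantics)."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r, s):
    """Combine tuples of r and s that agree on all common attributes."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


students = [
    {"SID": "S1", "Sname": "A", "Cname": "C"},
    {"SID": "S1", "Sname": "A", "Cname": "C++"},
    {"SID": "S2", "Sname": "B", "Cname": "C++"},
    {"SID": "S2", "Sname": "B", "Cname": "DB"},
    {"SID": "S3", "Sname": "A", "Cname": "DB"},
]

r1 = project(students, ["SID", "Sname"])
r2 = project(students, ["SID", "Cname"])
print(as_set(natural_join(r1, r2)) == as_set(students))  # True

# Contrast: splitting on Sname (a key of neither part) is lossy
bad = natural_join(project(students, ["SID", "Sname"]),
                   project(students, ["Sname", "Cname"]))
print(len(bad), as_set(bad) == as_set(students))  # 8 False
```

The lossy split produces spurious tuples such as (S1, A, DB), because Sname A is shared by S1 and S3.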
2NF: Example
Supplier (SID, Status, City, PID, Qty), with the same instance as in the 1NF example above
Key: {SID, PID}
Partial Dependencies:
SID → Status
SID → City
Post Normalization: Sup_City (SID, Status, City) and a second relation (SID, PID, Qty)
Drawbacks of Sup_City:
Deletion Anomaly: If we delete a tuple in Sup_City, then we not only lose the information about a supplier, but also lose the status value of a particular city
Insertion Anomaly: We cannot insert a City and its status until some supplier is located in that city
Update Anomaly: If the status value for a city is changed, then we have to search every tuple for that city
3NF: Third Normal Form
Relation R is in 3NF if:
R is in 2NF and
R contains no transitive dependencies (or, every non-prime attribute of R is non-transitively dependent on every key of R)
[Carlo Zaniolo, 1982] Alternately, R is in 3NF iff for each of its functional dependencies X → A, at least one of the following conditions holds:
X is a superkey, or
Every element of A − X, the set difference between A and X, is a prime attribute (i.e., each attribute of A − X is contained in some candidate key)
[Simple Statement] A relational schema R is in 3NF if for every FD X → A associated with R, either:
X is a superkey of R, or
A is a prime attribute of R
Transitive Dependency
A transitive dependency can occur only in a relation that has 3 or more attributes
Let A, B and C designate 3 distinct attributes (or distinct collections of attributes) in the relation, and suppose all three of the following conditions hold:
1. A → B
2. It is not the case that B → A
3. B → C
Then the functional dependency A → C (which follows from 1 and 3 by the axiom of transitivity) is a transitive dependency
Example of transitive dependency
Book | Genre | Author | Author Nationality
Twenty Thousand Leagues Under the Sea | Science Fiction | Jules Verne | French
Journey to the Center of the Earth | Science Fiction | Jules Verne | French
The functional dependency {Book} → {Author Nationality} applies; that is, if we know the book, we know the author's nationality
Furthermore:
{Book} → {Author}
{Author} does not determine {Book}
{Author} → {Author Nationality}
Therefore {Book} → {Author Nationality} is a transitive dependency
Transitive dependency occurred because a non-key attribute (Author) was determining another non-key attribute (Author Nationality)
3NF: Example
3NF: Example #2
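Relation dept_advisor (s_ID, i_ID, dept_name)
F = {s_ID, dept_name → i_ID, i_ID → dept_name}
Two candidate keys: {s_ID, dept_name} and {i_ID, s_ID}
R is in 3NF:
s_ID, dept_name → i_ID: {s_ID, dept_name} is a superkey of R
i_ID → dept_name: dept_name is contained in a candidate key (it is a prime attribute)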
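Zaniolo's condition lends itself to a direct check. Below is a minimal Python sketch, assuming the FD list is the given F on R (only FDs in F are checked) and finding candidate keys by brute force, which is exponential and only suited to small schemas. All helper names are hypothetical. It confirms that dept_advisor is in 3NF and that R = (A, B, C) with A → B, B → C is not.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = frozenset(combo)
            if any(k <= c for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(c, fds) >= schema:
                keys.append(c)
    return keys


def is_3nf(schema, fds):
    """Zaniolo's condition: for each X -> A in fds, X is a superkey
    or every attribute of A - X is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey
        if not (rhs - lhs) <= prime:
            return False
    return True


schema = {"s_ID", "i_ID", "dept_name"}
fds = [
    (frozenset({"s_ID", "dept_name"}), frozenset({"i_ID"})),
    (frozenset({"i_ID"}), frozenset({"dept_name"})),
]
print(sorted(sorted(k) for k in candidate_keys(schema, fds)))
# [['dept_name', 's_ID'], ['i_ID', 's_ID']]
print(is_3nf(schema, fds))  # True

# R = (A, B, C) with A -> B, B -> C: B is not a superkey and C is not prime
abc = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(is_3nf(set("ABC"), abc))  # False
```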
3NF: Redundancy
There is some redundancy in this schema
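R = (J, L, K)
F = {JK → L, L → K}
Repetition of information (for example, the relationship l1, k1 is repeated for every value of J it appears with; in dept_advisor this is the pair (i_ID, dept_name))
Need to use null values (for example, to represent the relationship l2, k2 where there is no corresponding value for J)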
Week 6 Lecture 2
Class BSCCS2001
Materials
Module # 27
Type Lecture
Week # 6
Solution: Define a weaker normal form, called Third Normal Form (3NF), which allows some redundancy
But functional dependencies can be checked on individual relations without computing a join
For each dependency α → β in F, use attribute closure to check if α is a superkey
If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R
This test is rather more expensive, since it involves finding candidate keys
3NF Decomposition: Algorithm
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α → β in Fc do
    if none of the schemas Rj, 1 ≤ j ≤ i contains αβ
    then begin
        i := i + 1;
        Ri := αβ
    end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
    i := i + 1;
    Ri := any candidate key for R;
end
/* Optionally, remove redundant relations */
return (R1, R2, ..., Ri)
The above algorithm ensures that each relation schema Ri is in 3NF, and that the decomposition is:
Dependency Preserving
Lossless Join
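3NF Decomposition: Example
Relation schema: cust_banker_branch = (customer_id, employee_id, branch_name, type)
Functional dependencies:
customer_id, employee_id → branch_name, type
employee_id → branch_name
customer_id, branch_name → employee_id
Canonical cover Fc: customer_id, employee_id → type; employee_id → branch_name; customer_id, branch_name → employee_id
The for loop generates: (customer_id, employee_id, type), (employee_id, branch_name), (customer_id, branch_name, employee_id)
Observing that (customer_id, employee_id, type) contains a candidate key of the original schema, no further relation schema needs to be added
At the end of the for loop, detect and delete schemas, such as (employee_id, branch_name), which are subsets of other schemas
Resulting 3NF schemas: (customer_id, employee_id, type) and (customer_id, branch_name, employee_id)
Result will not depend on the order in which FDs are considered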
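The synthesis steps can be sketched in Python, assuming the canonical cover Fc is already computed (computing it is not shown) and finding candidate keys by brute force. `synthesize_3nf` and the other helpers are hypothetical names. The data is the cust_banker_branch example with canonical cover customer_id, employee_id → type; employee_id → branch_name; customer_id, branch_name → employee_id.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = frozenset(combo)
            if not any(k <= c for k in keys) and closure(c, fds) >= schema:
                keys.append(c)
    return keys


def synthesize_3nf(schema, fc):
    """3NF synthesis from a canonical cover fc (assumed already computed)."""
    result = []
    for lhs, rhs in fc:
        ri = frozenset(lhs | rhs)
        if not any(ri <= rj for rj in result):  # no Rj already contains alpha-beta
            result.append(ri)
    keys = candidate_keys(schema, fc)
    if not any(k <= rj for k in keys for rj in result):
        result.append(keys[0])  # add any candidate key of R
    # Optional clean-up: drop schemas that are proper subsets of another one
    return [ri for ri in result if not any(ri < rj for rj in result)]


fs = frozenset
schema = {"customer_id", "employee_id", "branch_name", "type"}
fc = [
    (fs({"customer_id", "employee_id"}), fs({"type"})),
    (fs({"employee_id"}), fs({"branch_name"})),
    (fs({"customer_id", "branch_name"}), fs({"employee_id"})),
]
for ri in synthesize_3nf(schema, fc):
    print(sorted(ri))
# ['customer_id', 'employee_id', 'type']
# ['branch_name', 'customer_id', 'employee_id']
```

With R = (A, B, C, D) and Fc = {A → B, C → D}, no generated schema contains the key AC, so the key step adds (A, C) as a third relation.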
Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set
F for violation of BCNF, rather than checking all dependencies in F +
If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either
However, the simplified test using only F is incorrect when testing a relation in a decomposition of R
Consider R = (A, B, C, D, E), with F = {A → B, BC → D}, decomposed into R1 = (A, B) and R2 = (A, C, D, E)
Neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF
In fact, dependency AC → D in F+ shows R2 is not in BCNF
To check if a relation Ri in a decomposition of R is in BCNF:
Either test Ri for BCNF w.r.t. the restriction of F to Ri (that is, all FDs in F+ that contain only attributes from Ri)
Or use the original set of dependencies F that hold on R, but with the following test:
For every set of attributes α ⊆ Ri, check that α+ (the attribute closure of α) either includes no attribute of Ri − α or includes all attributes of Ri
If the condition is violated by some α ⊆ Ri, then the dependency α → (α+ − α) ∩ Ri holds on Ri, and Ri violates BCNF
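The closure-based test translates directly into code. Below is a minimal Python sketch, assuming F = {A → B, BC → D} on R = (A, B, C, D, E) with the decomposition R1 = (A, B), R2 = (A, C, D, E). `bcnf_violation` is a hypothetical helper that returns the violating dependency α → (α+ − α) ∩ Ri. The sketch also shows why the naive test finds nothing to check on R2.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(ri, fds):
    """Test relation ri of a decomposition against the original F (fds).
    Returns (alpha, (alpha+ - alpha) & ri) for a violating alpha, else None."""
    for size in range(1, len(ri)):  # proper, non-empty subsets of ri
        for alpha in combinations(sorted(ri), size):
            a = set(alpha)
            plus = closure(a, fds)
            extra = (plus - a) & ri
            if extra and not plus >= ri:  # some, but not all, of ri - alpha
                return frozenset(a), frozenset(extra)
    return None


F = [(frozenset("A"), frozenset("B")), (frozenset("BC"), frozenset("D"))]
R1, R2 = set("AB"), set("ACDE")

# Naive test: FDs of F that mention only attributes of R2 (there are none)
naive = [(lhs, rhs) for lhs, rhs in F if (lhs | rhs) <= R2]
print(naive)                  # []
print(bcnf_violation(R1, F))  # None

alpha, beta = bcnf_violation(R2, F)
print("".join(sorted(alpha)), "->", "".join(sorted(beta)))  # AC -> D
```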
Dependency Preservation: Example
R (A, B, C, D)
F = {A → B, B → C, C → D, D → A}
Decomposition: R1 (A, B), R2 (B, C), R3 (C, D)
A → B is preserved on table R1
B → C is preserved on table R2
C → D is preserved on table R3
F′ = F1 ∪ F2 ∪ F3
Checking for: D → A in F′+
Algo 1 (on projections of F+): F1 = {A → B, B → A}, F2 = {B → C, C → B}, F3 = {C → D, D → C}; since D → C, C → B and B → A are all in F′, D → A is in F′+, so D → A is preserved
Algo 2: F1, F2, F3 are not the closure sets, but rather the sets of dependencies of F directly applicable on R1, R2, R3 respectively, that is, F1 = {A → B}, F2 = {B → C}, F3 = {C → D}
(D)+ w.r.t. F1 = D, (D)+ w.r.t. F2 = D, (D)+ w.r.t. F3 = D. So, D → A could not be shown to be preserved
Therefore, this polynomial-time algorithm may not work for all examples
To prove preservation, Algo 2 is sufficient but not necessary, whereas Algo 1 is both sufficient and necessary
NOTE: This difference in result can occur in any example where a functional dependency of one decomposed table uses, in its closure, another functional dependency that is not applicable on any of the decomposed tables because no single table contains all of its attributes
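Both checks can be run side by side. Below is a minimal Python sketch, assuming F = {A → B, B → C, C → D, D → A} and the decomposition R1 (A, B), R2 (B, C), R3 (C, D). `project_f_plus` plays the role of Algo 1 (it enumerates every subset of Ri, so it is exponential) and `project_f` the role of Algo 2. Both names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_f_plus(ri, fds):
    """Algo 1: FDs of F+ restricted to ri, generated as
    alpha -> (alpha+ & ri) - alpha for every non-empty alpha in ri."""
    out = []
    for size in range(1, len(ri) + 1):
        for alpha in combinations(sorted(ri), size):
            a = frozenset(alpha)
            rhs = (closure(a, fds) & ri) - a
            if rhs:
                out.append((a, frozenset(rhs)))
    return out


def project_f(ri, fds):
    """Algo 2: only the FDs of F whose attributes all lie in ri."""
    return [(lhs, rhs) for lhs, rhs in fds if (lhs | rhs) <= ri]


def preserved(fd, decomposition, fds, project):
    """Is fd in F'+, where F' is the union of project(Ri, fds) over all Ri?"""
    f_prime = [g for ri in decomposition for g in project(ri, fds)]
    lhs, rhs = fd
    return rhs <= closure(lhs, f_prime)


# A -> B, B -> C, C -> D, D -> A
F = [(frozenset(x), frozenset(y)) for x, y in ["AB", "BC", "CD", "DA"]]
decomposition = [frozenset("AB"), frozenset("BC"), frozenset("CD")]
d_to_a = (frozenset("D"), frozenset("A"))
print(preserved(d_to_a, decomposition, F, project_f_plus))  # True
print(preserved(d_to_a, decomposition, F, project_f))       # False
```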
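BCNF Decomposition: Algorithm
For all dependencies A → B in F+, check if A is a superkey (by using attribute closure)
If not, then:
Choose a dependency in F+ that breaks the BCNF rules, say A → B
Create R1 = AB
Create R2 = (R − (B − A))
Note that F1+ = all dependencies in F+ that include only attributes of R1
Similarly F2+
Repeat for R1 and R2 (using F1+ and F2+)
Formally:
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that holds on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
        result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;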
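BCNF Decomposition: Example #1
R = (A, B, C)
F = {A → B, B → C}
Key = {A}
R is not in BCNF (B → C holds, but B is not a superkey)
Decomposition:
R1 = (B, C)
R2 = (A, B)
BCNF Decomposition: Example #2
class (course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id)
Functional dependencies:
course_id → title, dept_name, credits
building, room_number → capacity
course_id, sec_id, semester, year → building, room_number, time_slot_id
A candidate key: {course_id, sec_id, semester, year}
BCNF Decomposition:
course_id → title, dept_name, credits holds, but course_id is not a superkey
We replace class by:
course (course_id, title, dept_name, credits)
class-1 (course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)
course is in BCNF
building, room_number → capacity holds on class-1, but {building, room_number} is not a superkey for class-1
We replace class-1 by:
classroom (building, room_number, capacity)
section (course_id, sec_id, semester, year, building, room_number, time_slot_id)
classroom and section are in BCNF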
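The formal loop can be sketched in Python. This assumes the violation is found by testing subsets α of Ri against the original FDs: α must not be a superkey of Ri, and β = (α+ ∩ Ri) − α must be non-empty, which keeps α ∩ β = ∅. `find_violation` and `bcnf_decompose` are hypothetical names. Which violating FD gets picked depends on enumeration order, so other valid BCNF decompositions are possible in general.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) violating BCNF on ri, or None if ri is in BCNF."""
    for size in range(1, len(ri)):
        for alpha in combinations(sorted(ri), size):
            a = frozenset(alpha)
            plus = closure(a, fds)
            if not plus >= ri:  # alpha is not a superkey of ri
                beta = frozenset((plus & ri) - a)
                if beta:  # nontrivial, and alpha and beta are disjoint
                    return a, beta
    return None


def bcnf_decompose(schema, fds):
    result = [frozenset(schema)]
    while True:
        for i, ri in enumerate(result):
            violation = find_violation(ri, fds)
            if violation is not None:
                alpha, beta = violation
                # result := (result - Ri) U (Ri - beta) U (alpha, beta)
                result = result[:i] + result[i + 1:] + [ri - beta, alpha | beta]
                break
        else:
            return result  # every schema is in BCNF


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
for r in bcnf_decompose(set("ABC"), F):
    print("".join(sorted(r)))
# AB
# BC
```

Fed the class schema and its three FDs, the same function should end with course, classroom and section.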
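BCNF and Dependency Preservation
R = (J, K, L)
F = {JK → L, L → K}
Two candidate keys = JK and JL
R is not in BCNF (L → K holds, but L is not a superkey)
Any decomposition of R will fail to preserve JK → L
This implies that testing for JK → L requires a join
It is always possible to decompose a relation into a set of relations that are in BCNF such that the decomposition is lossless, but it may not be possible to preserve dependencies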
Week 6 Lecture 3
Class BSCCS2001
Materials
Module # 28
Type Lecture
Week # 6
The specification document of the LIS has already been shared with you
The coding of various queries in SQL, based on these schemas, is left as an exercise
Books are regularly issued by members on loan and returned after a period
The library needs an LIS to manage the books, the members and the issue-return process
Every book has
title
author (in case of multiple authors, only the first author is mentioned)
publisher
year of publication
accession number (which is the unique number of the copy of the book in the library)
Members of the library belong to one of the following categories:
Undergraduate students
Post-graduate students
Research scholars
Faculty members
Every student has:
Name
Roll number
Department
Gender
Mobile number
Date of Birth
Degree
Undergrad
Grad
Doctoral
Every faculty member has:
Name
Employee ID
Department
Gender
Mobile number
Date of Joining
Every member has a max quota for the number of books he/she can issue for the maximum duration allowed to her/him
A book may be issued to a member if it is not already issued to someone else (trivial)
A book may not be issued to a member if another copy of the same book is already issued to the same member
No book will be issued to a member if, at the time of issue, one or more of the books issued to the member has already exceeded its duration of issue
No issue will be allowed if the member's quota is exceeded
It is assumed that the name of every author or member has two parts
First name
Last name
Add / Remove / Edit quota for a category of member, duration for a category of member
Check if the library has a book given its title (part of title should match)
Check if a copy of a book (given its ISBN) is available with the library for issue
and so on ...
Entity set:
books
Attributes:
title
author_name (composite)
publisher
year
ISBN_no
accession_no
Entity Set:
students
Attributes:
member_no - is unique
name (composite)
roll_no - is unique
department
gender
dob
degree
Entity Set:
faculty
Attributes:
member_no - is unique
name (composite)
id - is unique
department
gender
doj
undergraduate students
research scholars
faculty members
Entity Set:
members
Attributes:
member_no
Entity Set:
quota
Attributes:
member_type
max_books
max_duration
Entity Set:
staff
Attributes:
name (composite)
id - is unique
gender
mobile_no
doj
LIS Relationships
Relationship: book_issue
Involved entity sets: members (member_no) and books (accession_no)
Relationship Attribute
Type of relationship
students (member_no, student_fname, student_lname, roll_no, department, gender, mobile_no, dob, degree)
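books (title, author_fname, author_lname, publisher, year, ISBN_no, accession_no)
ISBN_no → title, author_fname, author_lname, publisher, year
accession_no → ISBN_no
Key: accession_no
Good to normalize (ISBN_no is not a superkey, so books is not in BCNF):
(title, author_fname, author_lname, publisher, year, ISBN_no)
ISBN_no → title, author_fname, author_lname, publisher, year
Key: ISBN_no
(ISBN_no, accession_no)
accession_no → ISBN_no
Key: accession_no
Both in BCNF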
quota (member_type, max_books, max_duration)
Key: member_type
In BCNF
members (member_no, member_type)
member_no → member_type
Key: member_no
In BCNF
member_no → roll_no
roll_no → member_no
In BCNF
Issues:
id → faculty_fname, faculty_lname, department, gender, mobile_no, doj
id → member_no
member_no → id
2 Keys: id | member_no
In BCNF
Issues:
Get the name of the member who has issued the book having accession number = 162715
Attributes:
member_no
student IS A members
faculty IS A members
Types of relationship
One-to-one
Get the name of the member who has issued the book having accession number = 162715
-- Case 1: the book is issued to a faculty member
SELECT f.faculty_fname AS First_Name, f.faculty_lname AS Last_Name
FROM book_issue AS b
JOIN members AS m ON b.member_no = m.member_no
JOIN faculty AS f ON m.id = f.id
WHERE b.accession_no = 162715 AND m.member_class = 'faculty'
UNION
-- Case 2: the book is issued to a student
SELECT s.student_fname AS First_Name, s.student_lname AS Last_Name
FROM book_issue AS b
JOIN members AS m ON b.member_no = m.member_no
JOIN students AS s ON m.roll_no = s.roll_no
WHERE b.accession_no = 162715 AND m.member_class = 'student';
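To try the query end to end, here is a minimal Python/sqlite3 sketch. It assumes cut-down, hypothetical table definitions containing only the columns the query touches (the full schemas are listed above), and the sample rows are invented for illustration. Each branch of the UNION joins book_issue to members and then to the matching member table. The accession number is passed as a parameter instead of being hard-coded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical tables: only the columns used by the query
CREATE TABLE members (member_no INTEGER PRIMARY KEY, member_class TEXT,
                      roll_no TEXT, id TEXT);
CREATE TABLE students (roll_no TEXT PRIMARY KEY, student_fname TEXT, student_lname TEXT);
CREATE TABLE faculty (id TEXT PRIMARY KEY, faculty_fname TEXT, faculty_lname TEXT);
CREATE TABLE book_issue (accession_no INTEGER, member_no INTEGER);
-- Invented sample data
INSERT INTO members VALUES (1, 'student', 'R01', NULL), (2, 'faculty', NULL, 'F01');
INSERT INTO students VALUES ('R01', 'Asha', 'Rao');
INSERT INTO faculty VALUES ('F01', 'Vikram', 'Sen');
INSERT INTO book_issue VALUES (162715, 2), (162716, 1);
""")

QUERY = """
SELECT f.faculty_fname AS First_Name, f.faculty_lname AS Last_Name
FROM book_issue AS b
JOIN members AS m ON b.member_no = m.member_no
JOIN faculty AS f ON m.id = f.id
WHERE b.accession_no = ? AND m.member_class = 'faculty'
UNION
SELECT s.student_fname, s.student_lname
FROM book_issue AS b
JOIN members AS m ON b.member_no = m.member_no
JOIN students AS s ON m.roll_no = s.roll_no
WHERE b.accession_no = ? AND m.member_class = 'student'
"""


def issuer_name(accession_no):
    """Name(s) of the member holding the given copy (one ? per UNION branch)."""
    return conn.execute(QUERY, (accession_no, accession_no)).fetchall()


print(issuer_name(162715))  # [('Vikram', 'Sen')]
print(issuer_name(162716))  # [('Asha', 'Rao')]
```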
member_type → member_class
Key: member_no
Keys: roll_no
Note:
member_type and member_class are set in members from degree at the time of creation of a new record
Keys: id
Note:
member_type and member_class are set in members at the time of creation of a new record
Week 6 Lecture 4
Class BSCCS2001
Materials
Module # 29
Type Lecture
Week # 6
There are no non-trivial FDs because all the attributes together form the candidate key, that is, MDP
Man ↠ Phones
Man ↠ Dog_Like
But, after converting the above relation to single-valued attributes, each of a man's phones appears with each of the dogs he likes, in all combinations
MVD
If two or more independent relations are kept in a single relation, then Multi-valued Dependency is possible
If we kept them in a single relation named Student_Course, then MVD will exist because of m:n Cardinality
If two or more MVDs exist in a relation, then while converting into SVAs, MVD exists
Suppose we record names of the children, and phone numbers for the instructors
Example data:
MVD: Definition
Let R be a relation schema and let α ⊆ R and β ⊆ R
The multi-valued dependency α ↠ β holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]
Example: A relation of university courses, the books recommended for the course, and the lecturers who will be teaching the course:
course ↠ book
course ↠ lecturer
Let R be a relation schema with a set of attributes that are partitioned into 3 non-empty subsets Y, Z, W
We say that Y ↠ Z (Y multidetermines Z) if and only if for all possible relations r(R): if <y1, z1, w1> ∈ r and <y1, z2, w2> ∈ r, then <y1, z1, w2> ∈ r and <y1, z2, w1> ∈ r
Note that since the behaviour of Z and W is identical, it follows that Y ↠ Z if and only if Y ↠ W
In our example:
ID ↠ child_name
ID ↠ phone_number
The above formal definition is supposed to formalize the notion that given a particular value of Y(ID) it has associated with
it a set of values of Z (child_name) and a set of values of W (phone_number) and these two sets are in some sense
independent of each other
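The tabular definition can be checked mechanically on a single relation instance. Below is a minimal Python sketch, assuming tuples are dicts over the same attributes. The instructor values are illustrative only, because the lecture's example data table was not preserved. `mvd_holds` is a hypothetical helper. An instance check can only refute an MVD. It cannot prove that the MVD holds on every legal relation.

```python
def mvd_holds(rows, y, z):
    """Check Y ->> Z on one relation instance (rows are dicts with equal keys).
    For every ordered pair t1, t2 agreeing on Y, the tuple taking Y and Z from
    t1 and the remaining attributes W from t2 must also be present. Looping
    over ordered pairs covers the symmetric tuple (t4) as well."""
    if not rows:
        return True
    w = set(rows[0]) - set(y) - set(z)
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in y):
                t3 = {a: t1[a] for a in list(y) + list(z)}
                t3.update({a: t2[a] for a in w})
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


# Illustrative instructor data: every child paired with every phone number
rows = [
    {"ID": "99999", "child_name": "David", "phone_number": "512-555-1234"},
    {"ID": "99999", "child_name": "David", "phone_number": "512-555-4321"},
    {"ID": "99999", "child_name": "William", "phone_number": "512-555-1234"},
    {"ID": "99999", "child_name": "William", "phone_number": "512-555-4321"},
]
print(mvd_holds(rows, ["ID"], ["child_name"]))      # True
print(mvd_holds(rows, ["ID"], ["phone_number"]))    # True
print(mvd_holds(rows[:3], ["ID"], ["child_name"]))  # False
```

Dropping the last row breaks the MVD: (William, 512-555-1234) and (David, 512-555-4321) together require (William, 512-555-4321).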
NOTE:
If Y → Z, then Y ↠ Z
MVD: Use
We use multi-valued dependencies in 2 ways:
1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies
2. To specify constraints on the set of legal relations; we shall thus concern ourselves only with the relations that satisfy a given set of functional and multivalued dependencies
If a relation r fails to satisfy a given multivalued dependency, we can construct a relation r′ that does satisfy the multivalued dependency by adding tuples to r
MVD: Theory
An MVD X ↠ Y in R is called a trivial MVD if:
Y is a subset of X (X ⊇ Y), or
X ∪ Y = R
Otherwise, it is a non-trivial MVD and we have to repeat values redundantly in the tuples
From the definition of multi-valued dependency we can derive the following rule:
If α → β , then α ↠ β
That is, every functional dependency is also a multi-valued dependency
The closure D + of D is the set of all functional and multi-valued dependencies logically implied by D
We can compute D + from D, using the formal definitions of functional dependencies and multi-valued
dependencies
We can manage with such reasoning for very simple multi-valued dependencies, which seem to be most common
in practice
For complex dependencies, it is better to reason about sets of dependencies using a system of inference rules
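Fourth Normal Form (4NF)
A relation schema R is in 4NF w.r.t. a set D of functional and multi-valued dependencies if for all multi-valued dependencies in D+ of the form α ↠ β, where α ⊆ R and β ⊆ R, at least one of the following holds:
α ↠ β is trivial (i.e., β ⊆ α or α ∪ β = R)
α is a superkey for schema R
If a relation is in 4NF, it is in BCNF
Restriction of multi-valued dependencies: the restriction of D to Ri is the set Di consisting of:
all functional dependencies in D+ that include only attributes of Ri
all multi-valued dependencies of the form α ↠ (β ∩ Ri), where α ⊆ Ri and α ↠ β is in D+
4NF Decomposition: Algorithm
For all dependencies A ↠ B in D+, check if A is a superkey (by using attribute closure)
If not, then:
Choose a dependency in D+ that breaks the 4NF rules, say A ↠ B
Create R1 = AB
Create R2 = (R − (B − A))
Note that D1+ = all dependencies in D+ that include only attributes of R1
Similarly D2+
Repeat for R1 and R2 (using D1+ and D2+)
Formally:
result := {R};
done := false;
compute D+;
Let Di denote the restriction of D+ to Ri
while (not done) do
    if (there is a schema Ri in result that is not in 4NF)
    then begin
        let α ↠ β be a nontrivial multi-valued dependency that holds on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
        result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in 4NF, and the decomposition is lossless-join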
Week 6 Lecture 5
Class BSCCS2001
Materials
Module # 30
Type Lecture
Week # 6
Goals for a relational DB design:
BCNF / 4NF
Lossless join
Dependency preservation
Interestingly, SQL does not provide a direct way of specifying functional dependencies other than superkeys
Can specify FDs using assertions, but they are expensive to test (and currently not supported by any of the widely used DBs)
Even if we had a dependency preserving decomposition, using SQL we would not be able to efficiently test a functional dependency whose left hand side is not a key
Essential Tuple Normal Form (ETNF)
Join dependencies generalize multi-valued dependencies and lead to project-join normal form (PJNF) (also called Fifth Normal Form)
A class of even more general constraints leads to a normal form called Domain-Key Normal Form
Problem with these generalized constraints: they are hard to reason with, and no sound and complete set of inference rules exists
R could have been generated when converting E-R diagram to a set of tables
R could have been a single relation containing all attributes that are of interest (universal relation)
R could have been the result of some ad hoc design of relations, which we then test/convert to normal form
However, in a real (imperfect) design there can be functional dependencies from non-key attributes of an entity to
other attributes of the entity
department_name → building
Functional dependencies from non-key attributes of a relationship set are possible, but rare, since most relationships are binary
Denormalization for performance: we may want to use a non-normalized schema for performance
For example, displaying prereqs along with course_id and title requires a join of course with prereq
Alternative #1: Use a denormalized relation containing attributes of course as well as prereq with all the above attributes:
Course (course_id, title, prereq, ...)
faster lookup
extra space and extra execution time for updates
extra coding work for programmers and possibility of error in extra code
Alternative #2: Use a materialized view defined as course ⋈ prereq
Benefits and drawbacks same as above, except no extra coding work for programmers and avoids possible errors
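Some aspects of DB design are not caught by normalization, for example:
Instead of earnings (company_id, year, amount), use earnings_2004, earnings_2005, earnings_2006, etc., all on the schema (company_id, earnings)
Above are in BCNF, but make querying across years difficult and need a new table each year
company_year (company_id, earnings_2004, earnings_2005, earnings_2006)
Also in BCNF, but also makes querying across years difficult and requires a new attribute each year
company_year is an example of a crosstab, where values for one attribute become column names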
book_title
book_catalogue, author_lname: A book_title may be associated with more than one author
book_catalogue
book_title ↠ edition
Temporal Databases
Some data may be inherently historical because they include time-dependent / time-varying data, such as:
Medical Records
Judicial Records
Share prices
Exchange rates
Interest rates
Company profits
etc.
The desire to model such data means that we need to store not only the respective value but also an associated date or a time period for which the value is valid
Temporal DB provides a uniform and systematic way of dealing with historical data
Temporal Data
Temporal data have an associated time interval during which the data is valid
In practice, DB engineers may add start and end time attributes to relations
Constraint: no 2 tuples (for the same entity) can have overlapping valid times; such constraints are hard to enforce efficiently
Foreign key references may be to current version of data, or to data at a point in time
For example: student transcript should refer to the course information at the time the course was taken
Discrete or dense
Bounded or unbounded
Linear or non-linear
Timestamp Model
Temporal Logic
TQuel [1987]
TSQL2 [1995]
SQL/Temporal [1996]
SQL/TP [1997]
Valid Time: Time period during which a fact is true in the real world, provided to the system
Transaction Time: Time period during which a fact is stored in the DB, based on transaction serialization order
and is the timestamp generated automatically by the system
Temporal Relation is one where each tuple has associated time; either valid time or transaction time or both
associated with it
Uni-Temporal Relations: Has one axis of time, either Valid Time or Transaction Time
Bi-Temporal Relations: Has both axes of time: Valid Time and Transaction Time
It includes Valid Start Time, Valid End Time, Transaction Start Time, Transaction End Time
In a non-temporal DB, John's address is entered as Chennai from 1992
When he registers his new address in 2016, the DB gets updated and the address field now shows his Mumbai
address
So, it will be difficult to find out exactly when he was living in Chennai and when he moved to Mumbai
John's father registers his birth on 6th April 1992, and a new DB entry is made:
Person (John, Chennai, 3-Apr-1992, ∞ )
Bi-Temporal Relation (John’s Data Using Both Valid And Transaction Time)
On January 10, 2016 John reports his new address in Mumbai:
Person(John, Mumbai, 21-June-2015, ∞ , 10-Jan-2016, ∞ )
The main advantage of bi-temporal relations is that they provide historical and rollback information
For example, you can get the result of a query on John's history, like: Where did John live in the year 2001?
The result for this query can be got with the valid time entry
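A valid-time lookup such as "Where did John live in the year 2001?" reduces to finding the tuple whose valid period contains the query date. Below is a small Python sketch that also checks the no-overlap constraint. It assumes valid periods are half-open [start, end) dates with `date.max` standing in for ∞. The Chennai end date is an assumption: the notes only give the Mumbai valid start of 21-June-2015. `city_at` and `has_overlap` are hypothetical helpers.

```python
from datetime import date

FOREVER = date.max  # stands in for the ∞ valid end time

# (name, city, valid_start, valid_end); each valid period is [start, end)
person = [
    ("John", "Chennai", date(1992, 4, 3), date(2015, 6, 21)),  # end assumed
    ("John", "Mumbai", date(2015, 6, 21), FOREVER),
]


def city_at(rows, name, when):
    """Valid-time point query: the city that was valid for name on date when."""
    for n, city, start, end in rows:
        if n == name and start <= when < end:
            return city
    return None


def has_overlap(rows):
    """True if two tuples for the same person have overlapping valid periods."""
    for i, (n1, _, s1, e1) in enumerate(rows):
        for n2, _, s2, e2 in rows[i + 1:]:
            if n1 == n2 and s1 < e2 and s2 < e1:
                return True
    return False


print(city_at(person, "John", date(2001, 6, 1)))  # Chennai
print(city_at(person, "John", date(2016, 1, 1)))  # Mumbai
print(has_overlap(person))                        # False
```

The overlap check compares every pair of tuples, which illustrates why the constraint is hard to enforce efficiently on large temporal relations.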
Disadvantages
More storage