Unit-Iii Part-2
1. Lossless Decomposition
A decomposition must be lossless: no information may be lost from the relation that is decomposed.
A lossless decomposition guarantees that joining the decomposed relations yields exactly the relation that
was decomposed.
Example:
Let E be a relational schema with instance e, decomposed into E1, E2, E3, . . ., En with instances
e1, e2, e3, . . ., en. If e1 ⋈ e2 ⋈ e3 ⋈ . . . ⋈ en = e, then the decomposition is called a 'Lossless Join Decomposition'.
In other words, if the natural join of all the decomposed relations gives back the original relation,
then the decomposition is said to be a lossless join decomposition.
Decompose the above relation into two relations to check whether decomposition is lossless or lossy.
Now the relation has been decomposed into two relations, Employee and Department.
Employee ⋈ Department
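The check described above can be sketched in Python. This is only an illustration: the Employee/Department rows, the attribute names (emp_id, emp_name, dept_id, dept_name) and the helper functions natural_join, project and same_relation are all assumptions made for this sketch, since the original table is not reproduced in these notes.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            # Rows match when they agree on every shared attribute.
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})
    return joined


def project(r, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    seen = set()
    out = []
    for t in r:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


def same_relation(r, s):
    """Compare two relations as sets of tuples, ignoring row order."""
    def canon(rel):
        return {tuple(sorted(t.items())) for t in rel}
    return canon(r) == canon(s)


# Hypothetical instance (not taken from the notes).
original = [
    {"emp_id": 1, "emp_name": "Asha", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "emp_name": "Ravi", "dept_id": 20, "dept_name": "HR"},
    {"emp_id": 3, "emp_name": "Meena", "dept_id": 10, "dept_name": "Sales"},
]

# Decomposition that shares dept_id: the join gives back the original rows.
employee = project(original, ["emp_id", "emp_name", "dept_id"])
department = project(original, ["dept_id", "dept_name"])
lossless = same_relation(natural_join(employee, department), original)

# Decomposition with no common attribute: the join is a cross product,
# so spurious rows appear and the decomposition is lossy.
names = project(original, ["emp_id", "emp_name"])
depts = project(original, ["dept_id", "dept_name"])
lossy = not same_relation(natural_join(names, depts), original)
```

In this sketch the first decomposition keeps dept_id in both relations, which is what lets the join pair each employee with the right department.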
Functional Dependencies
Functional dependency is a constraint between two sets of attributes in a relation from a database.
In other words, a functional dependency is a constraint that describes the relationship between
attributes, typically between the primary key and the other non-key attributes within a table.
For any relation R, attribute Y is functionally dependent on attribute X if, for every valid instance
of X, that value of X uniquely determines the value of Y.
It is denoted as X → Y, where the attribute set on the left side of the arrow, X, is
called the determinant, and Y is called the dependent.
Example:
roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
45       xyz   IT         A3
46       mno   EC         B2
47       jkl   ME         B2
From the above table we can conclude some valid functional dependencies:
roll_no → {name, dept_name, dept_building}: Here, roll_no can determine the values of the fields name,
dept_name and dept_building, hence this is a valid functional dependency.
roll_no → dept_name: Since roll_no can determine the whole set {name, dept_name, dept_building},
it can also determine its subset dept_name.
dept_name → dept_building: dept_name identifies the dept_building uniquely, since every
department is located in exactly one building (two departments may still share a building).
More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
name → dept_name Students with the same name can have different dept_name, hence this is not a
valid functional dependency.
dept_building → dept_name There can be multiple departments in the same building. For example, in
the above table the departments ME and EC are in the same building B2, hence
dept_building → dept_name is an invalid functional dependency.
More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building
→ roll_no, etc.
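The valid and invalid dependencies listed above can be checked against the table with a short Python sketch. The function name fd_holds is an assumption made for this illustration. Note that such a check can only show that a dependency is violated by the given rows; a dependency that holds on one instance is not thereby guaranteed for every instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val for a new key, else returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# The student table from the example above.
students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]

valid_1 = fd_holds(students, ["roll_no"], ["name", "dept_name", "dept_building"])
valid_2 = fd_holds(students, ["dept_name"], ["dept_building"])
invalid_1 = fd_holds(students, ["name"], ["dept_name"])           # xyz is in CO and IT
invalid_2 = fd_holds(students, ["dept_building"], ["dept_name"])  # B2 hosts EC and ME
```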
1. Trivial Functional Dependency
Example:
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset
of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial functional
dependency.
2. Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is not a subset of the determinant, i.e. if X
→ Y and Y is not a subset of X, then X → Y is called a non-trivial functional dependency.
Example:
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset
of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency,
since age is not a subset of {roll_no, name}
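The distinction between trivial and non-trivial dependencies is a plain subset test, which the following minimal Python sketch illustrates. The function name is_trivial is an assumption made for this illustration.

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)


trivial_example = is_trivial({"roll_no", "name"}, {"name"})    # dependent inside determinant
non_trivial_example = is_trivial({"roll_no"}, {"name"})        # dependent outside determinant
non_trivial_age = is_trivial({"roll_no", "name"}, {"age"})
```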
In Multivalued functional dependency, entities of the dependent set are not dependent on each
other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then it is called
a multivalued functional dependency.
For example,
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
45       abc   19
Here, roll_no → {name, age} is a multivalued functional dependency, since the
dependents name and age are not dependent on each other (i.e. neither name → age nor age → name
holds).
For example
42  abc  CO  4
43  pqr  EC  2
44  xyz  IT  1
45  abc  EC  2
1. Data Normalization
Data normalization is the process of organizing data in a database in order to minimize redundancy and
increase data integrity. Functional dependencies play an important part in data normalization. With the
help of functional dependencies we are able to identify the primary key and candidate keys of a table, which in
turn helps in normalization.
2. Query Optimization
With the help of functional dependencies we are able to decide the connectivity between the tables and
the attributes that need to be projected to retrieve the required data from the tables. This helps in
query optimization and improves performance.
3. Consistency of Data
Functional dependencies help ensure the consistency of the data by removing redundancies or
inconsistencies that may exist in the data. A functional dependency ensures that a change made to one
attribute does not cause an inconsistency in another set of attributes, and thus it maintains the consistency of the
data in the database.
Functional dependencies help keep the data in the database accurate, complete and up to date. This
improves the overall quality of the data and eliminates errors and inaccuracies that might
occur during data analysis and decision making. Functional dependencies therefore help to improve the
quality of the data in the database.
Normal Forms
Normalization is the process of minimizing redundancy in a relation or set of relations.
Redundancy in a relation may cause insertion, deletion, and update anomalies, so minimizing it
helps to avoid these anomalies. Normal forms are used to eliminate or reduce redundancy
in database tables.
There are several levels of normalization, each with its own set of guidelines, known as normal
forms.
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
A relation is in first normal form if it does not contain any composite or multi-valued attribute, that is,
if every attribute in the relation is a single-valued attribute. A relation that contains a composite or
multi-valued attribute violates first normal form.
To be in second normal form, a relation must be in first normal form and must not contain any
partial dependency. A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute
(an attribute which is not part of any candidate key) is dependent on a proper subset of any candidate
key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called
a partial dependency.
Example 1 – Consider table-3 given below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
Here, COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate
key {STUD_NO, COURSE_NO};
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a
proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper
subset of the candidate key, which is a partial dependency and so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1
STUD_NO  COURSE_NO
1        C1
2        C2
1        C4
4        C3
4        C1
2        C5
Table 2
COURSE_NO  COURSE_FEE
C1         1000
C2         1500
C3         1000
C4         2000
C5         2000
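The 2NF test above can be sketched in Python: compute the closure of each proper subset of the candidate key and see whether it reaches a non-prime attribute. The helper names closure and partial_dependencies are assumptions made for this illustration, and the only dependency supplied is COURSE_NO -> COURSE_FEE from the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_key, non_prime, fds):
    """List (subset, attribute) pairs where a proper subset of the
    candidate key determines a non-prime attribute."""
    found = []
    key = sorted(candidate_key)
    for size in range(1, len(key)):          # proper, non-empty subsets only
        for subset in combinations(key, size):
            for attr in sorted(closure(subset, fds) & set(non_prime)):
                found.append((subset, attr))
    return found


fds = [(("COURSE_NO",), ("COURSE_FEE",))]

# Original relation: key {STUD_NO, COURSE_NO}, non-prime attribute COURSE_FEE.
violations = partial_dependencies({"STUD_NO", "COURSE_NO"}, {"COURSE_FEE"}, fds)

# After the split: Table 1 has no non-prime attribute, Table 2 has a
# single-attribute key, so neither can contain a partial dependency.
table1_violations = partial_dependencies({"STUD_NO", "COURSE_NO"}, set(), fds)
table2_violations = partial_dependencies({"COURSE_NO"}, {"COURSE_FEE"}, fds)
```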
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies. If a
transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by
placing the attribute(s) in a new relation along with a copy of the determinant.
Consider the examples given below.
Example-1:
In relation STUDENT given in Table 4,
FD set:
{STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY,
STUD_NO -> STUD_AGE}
Candidate Key:
{STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are
true.
So STUD_COUNTRY is transitively dependent on STUD_NO.
It violates the third normal form. To convert it to third normal form, we will decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Boyce-Codd Normal Form (BCNF) is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF.
Example: Suppose there is a company wherein employees work in more than one department. They
store the data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
The table is not in BCNF because neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_id emp_dept
1001 stores
Example 2
Let us consider the student database, in which data of the student are mentioned.
Stu_ID  Stu_Branch  Stu_Course  Branch_Number  Stu_Course_No
To bring this table into BCNF, we have to decompose it into further tables. Here is the full procedure
through which we transform this table into BCNF.
Let us first divide this main table into two tables Stu_Branch and Stu_Course Table.
Stu_Branch Table
Stu_ID Stu_Branch
Mobile Communication  B_003  402
Stu_ID  Stu_Course_No
101 201
101 202
102 401
102 402
A multivalued dependency arises when two or more independent relations are kept in a single relation. In other
words, a multivalued dependency occurs when the presence of one or more rows in a table implies the presence of one or more
other rows in that same table. Put another way, two attributes (or columns) in a table are independent of
one another, but both depend on a third attribute.
A multivalued dependency always requires at least three attributes because it consists of at least two
attributes that are dependent on a third.
For a dependency A -> B, if multiple values of B exist for a single value of A, then the table may have a
multi-valued dependency. For a multivalued dependency A ->> B, the table should have at least 3 attributes
(A, B and C), and B and C should be independent of each other.
Properties
A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any Multi-valued Dependency.
For a dependency A → B, if multiple values of B exist for a single value of A, then the relation has a
multi-valued dependency.
Example
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence,
there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
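The decomposition above amounts to two projections of the STUDENT table, which the following Python sketch reproduces. Relations are modelled as sets of tuples, and the variable names are assumptions made for this illustration. The sketch also joins the two projections back on STU_ID: because COURSE and HOBBY are treated as independent, the join pairs every course of a student with every hobby of that student.

```python
# STUDENT rows as (STU_ID, COURSE, HOBBY) tuples, taken from the example.
student = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}

# Projections onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join of the two projections on STU_ID.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

# Every original row is recovered; the additional rows are the remaining
# course/hobby combinations for student 21.
extra_rows = rejoined - student
```

In this sketch extra_rows contains (21, "Computer", "Singing") and (21, "Math", "Dancing"), the combinations that the independence of COURSE and HOBBY implies for student 21.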
Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
o 5NF is satisfied when the tables are broken into as many tables as possible in order to avoid
redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
In the above table, John takes both the Computer and Math classes in Semester 1, but he doesn't take the Math class
in Semester 2. In this case, the combination of all three fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking
that subject, so we would leave Lecturer and Subject as NULL. But all three columns together act as the primary
key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
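To see what the three relations join back to, the following Python sketch joins P1, P2 and P3 as listed above. Relations are modelled as sets of tuples and the variable names are assumptions made for this illustration. The original SUBJECT/LECTURER/SEMESTER rows are not reproduced in these notes, so the sketch only shows the result of the join, not a comparison with the source table.

```python
# P1: (SEMESTER, SUBJECT)
p1 = {
    ("Semester 1", "Computer"),
    ("Semester 1", "Math"),
    ("Semester 1", "Chemistry"),
    ("Semester 2", "Math"),
}
# P2: (SUBJECT, LECTURER)
p2 = {
    ("Computer", "Anshika"),
    ("Computer", "John"),
    ("Math", "John"),
    ("Math", "Akash"),
    ("Chemistry", "Praveen"),
}
# P3: (SEMESTER, LECTURER)
p3 = {
    ("Semester 1", "Anshika"),
    ("Semester 1", "John"),
    ("Semester 2", "Akash"),
    ("Semester 1", "Praveen"),
}

# Join P1 and P2 on SUBJECT: this alone still contains combinations
# that P3 rules out.
p1_p2 = {
    (sem, subj, lec)
    for sem, subj in p1
    for subj2, lec in p2
    if subj == subj2
}

# Join with P3 on (SEMESTER, LECTURER) to drop those combinations.
rejoined = {row for row in p1_p2 if (row[0], row[2]) in p3}
```

Joining only two of the three relations leaves rows such as ("Semester 2", "Math", "John") in the result; the third relation is what removes them, consistent with the statement that John does not take Math in Semester 2.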
Step-01: Add the attributes contained in the attribute set for which closure is being calculated to the result
set.
Step-02: Recursively add the attributes to the result set which can be functionally determined from the
attributes already contained in the result set.
Example:
A → BC
BC → DE
D → F
CF → G
Now, let us find the closure of some attributes and attribute sets
Closure of attribute A
A+ = { A }
= { A , B , C } ( Using A → BC )
= { A , B , C , D , E } ( Using BC → DE )
= { A , B , C , D , E , F } ( Using D → F )
= { A , B , C , D , E , F , G } ( Using CF → G )
Thus, A+ = { A , B , C , D , E , F , G }
Closure of attribute D
D+ = { D }
= { D , F } ( Using D → F )
We cannot determine any other attribute using attributes D and F contained in the result set.
Thus, D + = { D , F }
{ B , C }+ = { B , C }
= { B , C , D , E } ( Using BC → DE )
= { B , C , D , E , F } ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus, { B , C }+ = { B , C , D , E , F , G }
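The two steps and the worked closures above can be sketched in Python. This is a minimal illustration: the function name closure and the representation of each dependency as a pair of attribute strings are assumptions, and the loop is written iteratively rather than recursively.

```python
def closure(attrs, fds):
    """Closure of an attribute set under functional dependencies.

    attrs is an iterable of single-letter attributes; fds is a list of
    (lhs, rhs) pairs such as ("BC", "DE") for BC -> DE.
    """
    result = set(attrs)                      # Step-01: start with the set itself
    changed = True
    while changed:                           # Step-02: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]

closure_a = closure("A", fds)     # all of A..G
closure_d = closure("D", fds)     # only D and F
closure_bc = closure("BC", fds)   # everything except A
```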
Candidate Key-
● If the closure of an attribute set contains all the attributes of the relation, and there exists no proper
subset of that attribute set whose closure also contains all the attributes, then that attribute set is called
a candidate key of that relation.
We can determine the candidate keys of a given relation using the following steps
Step-01:
● Essential attributes are those attributes which are not present on RHS of any functional dependency.
● Essential attributes are always a part of every candidate key.
Example
A → B
C → D
D → E
Here, the attributes which are not present on RHS of any functional dependency are A, C and F.
Step-02:
Case-01:
If all essential attributes together can determine all remaining non-essential attributes, then-
● The set of essential attributes is the only candidate key.
Case-02:
If all essential attributes together cannot determine all remaining non-essential attributes, then-
● The set of essential attributes and some non-essential attributes will be the candidate key(s).
● To find the candidate keys, we check different combinations of essential and non-essential attributes.
C → F
E → A
EC → D
A → B
Which of the following is a key for R?
1. CD
2. EC
3. AE
4. AC
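Step-01:
● The attributes which are not present on the RHS of any functional dependency are C and E, so the
essential attributes are C and E.
Step-02:
● We will check if the essential attributes together can determine all remaining non-essential attributes.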
{ CE }+ = { C , E }
= { C , E , F } ( Using C → F )
= { A , C , E , F } ( Using E → A )
= { A , C , D , E , F } ( Using EC → D )
= { A , B , C , D , E , F } ( Using A → B )
We conclude that CE can determine all the attributes of the given relation. So, CE is the only possible
candidate key of the relation, and option 2 (EC) is the key for R.
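The procedure above can be sketched in Python as follows. The function names closure and candidate_keys are assumptions made for this illustration, and the attribute list "ABCDEF" is inferred from the closure computed in the example rather than stated in the problem. Essential attributes are always included, and non-essential attributes are added in increasing numbers until the closure covers the relation.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of an attribute set under (lhs, rhs) dependency pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all candidate keys as a list of sets."""
    attributes = set(attributes)
    rhs_attrs = set()
    for _, rhs in fds:
        rhs_attrs |= set(rhs)
    essential = attributes - rhs_attrs        # Step-01: never on any RHS
    optional = sorted(attributes - essential)
    keys = []
    # Step-02: size 0 covers Case-01, larger sizes cover Case-02.
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            candidate = essential | set(extra)
            if any(key <= candidate for key in keys):
                continue                      # contains a smaller key, not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# The problem above: C -> F, E -> A, EC -> D, A -> B (attributes assumed A..F).
keys_problem = candidate_keys("ABCDEF", [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")])

# The Step-01 example: A -> B, C -> D, D -> E (attributes assumed A..F).
keys_example = candidate_keys("ABCDEF", [("A", "B"), ("C", "D"), ("D", "E")])
```

Trying smaller combinations first is what keeps the result minimal: any larger set that contains an already found key is skipped.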
Problem:
Let a relation R (A, B, C, D ) and functional dependency {AB –> C, C –> D, D –> A}. Relation
R is decomposed into R1( A, B, C) and R2(C, D). Check whether decomposition is dependency
preserving or not.
Solution:
To find F1, consider all combinations of the attributes of R1(A, B, C):
closure(A) = {A} // Trivial
closure(B) = {B} // Trivial
closure(C) = {C, A, D}, but D can't be kept in the closure as D is not present in R1
= {C, A}
C --> A // Removing C from the right side as it is a trivial attribute
closure(AB) = {A, B, C, D}
= {A, B, C} // D is not present in R1
AB --> C // Removing AB from the right side as these are trivial attributes
closure(BC) = {B, C, D, A}
= {A, B, C} // D is not present in R1
BC --> A // Removing BC from the right side as these are trivial attributes
closure(AC) = {A, C, D}
= {A, C} // D is not present in R1
NULL SET // Removing AC from the right side as these are trivial attributes
F1 = {C --> A, AB --> C, BC --> A}
To find F2, consider all combinations of the attributes of R2(C, D):
closure(C) = {C, D, A}, but A can't be kept in the closure as A is not present in R2
= {C, D}
C --> D // Removing C from the right side as it is a trivial attribute
closure(D) = {D, A}
= {D} // Trivial, as A is not present in R2
closure(CD) = {C, D, A}, but A can't be kept in the closure as A is not present in R2
= {C, D}
NULL SET // Removing CD from the right side as these are trivial attributes
F2 = {C --> D}
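Whether each original dependency survives the decomposition can also be checked mechanically. The Python sketch below uses a standard closure-based test instead of listing F1 and F2 explicitly: starting from the left side of a dependency, it repeatedly adds whatever can be derived inside one of the decomposed relations. The function names closure and is_preserved are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Closure of an attribute set under (lhs, rhs) dependency pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """True if fd follows from the dependencies projected onto the
    relations of the decomposition."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            schema = set(schema)
            # Attributes derivable from the part of result inside this
            # relation, restricted to the attributes of that relation.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


fds = [("AB", "C"), ("C", "D"), ("D", "A")]
decomposition = ["ABC", "CD"]

status = {
    lhs + "->" + rhs: is_preserved((lhs, rhs), decomposition, fds)
    for lhs, rhs in fds
}
dependency_preserving = all(status.values())
```

Under these assumptions the sketch reports AB -> C and C -> D as preserved but not D -> A, since D and A never appear together in R1 or R2 and D -> A cannot be derived from F1 and F2; the decomposition would therefore not be dependency preserving.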