DBMS Unit 3 Notes - R21
Database Design and the E-R Model: Overview of the Design Process, The
Entity Relationship Model, Constraints, Removing Redundant Attributes in
Entity Sets, Entity Relationship Diagrams, Reduction to Relational
Schemas, Entity-Relationship Design Issues. Relational Database Design:
Features of Good Relational Designs, Atomic Domains and First Normal
Form, Decomposition Using Functional Dependencies, Functional-
Dependency Theory, Algorithms for Decomposition, Decomposition Using
Multivalued Dependencies, More Normal Forms.
Conceptual Design Phase : The designer chooses a data model and, by applying the concepts of the
chosen data model, translates these requirements into a conceptual schema of the database. The
entity-relationship model is typically used to represent the conceptual design.
The conceptual schema specifies the entities that are represented in the database, the attributes of
the entities, the relationships among the entities, and constraints on the entities and relationships.
Typically, the conceptual-design phase results in the creation of an entity-relationship diagram
that provides a graphic representation of the schema.
A fully developed conceptual schema also indicates the functional requirements of the enterprise.
In a specification of functional requirements, users describe the kinds of operations (or
transactions) that will be performed on the data. The operations can include updating,
UNIT – I / 1
DBMS NOTES (R21) PBR VITS – CSE / AI / IOT
searching, retrieving and deleting data. At this stage of conceptual design, the designer can
review the schema to ensure it meets functional requirements.
Logical Design Phase : The database designer maps the high-level conceptual schema onto the
implementation data model of the database system that will be used. The implementation data
model is typically the relational data model.
Physical Design Phase : Here, the designer takes the resulting system-specific database schema
and specifies the physical features of the database, such as the form of file organization and the
choice of index structures.
All these entities have some attributes or properties that give them their identity.
Strong Entity
A strong entity has a primary key, and its existence does not depend on any other entity. Weak
entities depend on strong entities. A strong entity is represented by a single rectangle.
Weak Entity
A weak entity does not have a primary key of its own and depends on its parent (identifying)
entity for its existence. A weak entity is represented by a double rectangle.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
Simple attribute − Simple attributes are atomic values, which cannot be divided further.
For example, a student's phone number is an atomic value of 10 digits.
Composite attribute − Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first name and last name.
Derived attribute − Derived attributes are the attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, age can be derived from date of birth.
Multi-value attribute − Multi-value attributes may contain more than one value. For
example, a person can have more than one phone number, email, address, etc.
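As an illustration of a derived attribute, here is a minimal Python sketch. It assumes a hypothetical student record that stores only date_of_birth; age is computed when it is needed instead of being stored.

```python
from datetime import date


def age_from_dob(dob: date, today: date) -> int:
    """Derive age (in completed years) from the stored date of birth."""
    # Subtract one year if this year's birthday has not been reached yet.
    birthday_passed = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if birthday_passed else 1)


# Hypothetical stored entity: age itself is not kept as a column.
student = {"name": "Asha", "date_of_birth": date(2003, 8, 15)}

print(age_from_dob(student["date_of_birth"], date(2024, 8, 14)))  # 20
print(age_from_dob(student["date_of_birth"], date(2024, 8, 15)))  # 21
```

Storing only the date of birth avoids keeping a value that would become stale every year.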
ER diagram symbols (Name − Symbol Description):
Entities
Attributes
Key attribute − An attribute that uniquely identifies a particular entity. The name of a key
attribute is underscored.
Relationships
Attributes of the STUDENT entity type.
RELATIONSHIP AND RELATIONSHIP SETS
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too
can have attributes. These attributes are called descriptive attributes. Descriptive attributes are
used to record information about the relationship, rather than about any one of the participating
entities.
Degree of Relationship
The degree of a relationship is the number of entity types that participate in the relationship. The
three most common degrees of relationship in ER models are binary, unary and ternary.
A binary relationship, or an association between two entity types, is the most common
form of a relationship expressed by an E-R diagram.
For Example:
A unary (recursive) relationship is one in which both participants in the relationship belong to
the same entity type.
The cardinality of a relationship (mapping cardinality, not to be confused with its degree) is the
number of occurrences in one entity that are associated (or linked) with occurrences in another.
Using Sets, it can be represented as:
2. Many to one – When entities in one entity set can take part only once in the relationship
set and entities in other entity set can take part more than once in the relationship
set, cardinality is many to one. Let us assume that a student can take only one course but one
course can be taken by many students. So the cardinality will be n to 1. It means that for one
course there can be n students but for one student, there will be only one course.
In this case, each student is taking only 1 course but 1 course has been taken by many
students.
3. Many to many – When entities in all entity sets can take part more than once in the
relationship, the cardinality is many to many. Let us assume that a student can take more than one
course and one course can be taken by many students. So the relationship will be many to
many.
In this example, student S1 is enrolled in C1 and C3, and course C3 is taken by S1, S3 and
S4. So it is a many-to-many relationship.
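The mapping cardinality can be read off a set of relationship pairs by counting how often each entity appears. The Python sketch below is illustrative only: the function name mapping_cardinality is hypothetical, and because it looks at one set of data it reports what the pairs show, not the constraint the designer intends for all future data.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs.

    Returns "1:1", "1:n", "n:1" or "n:m" for the pairs actually present.
    """
    left_counts = Counter(left for left, _ in pairs)     # appearances per left entity
    right_counts = Counter(right for _, right in pairs)  # appearances per right entity
    left_many = any(c > 1 for c in left_counts.values())
    right_many = any(c > 1 for c in right_counts.values())
    if left_many and right_many:
        return "n:m"
    if left_many:
        return "1:n"  # one left entity is linked to several right entities
    if right_many:
        return "n:1"  # several left entities share one right entity
    return "1:1"


# Student-course pairs mentioned in the many-to-many example above.
enrolled = [("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")]
print(mapping_cardinality(enrolled))  # n:m

# Each student takes one course, but course C1 has two students.
print(mapping_cardinality([("S1", "C1"), ("S2", "C1"), ("S3", "C2")]))  # n:1
```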
Participation Constraint:
Participation Constraint is applied on the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of student will be total. Total
participation is shown by double line in ER diagram.
2. Partial Participation – An entity in the entity set may or may NOT participate in the
relationship. If some courses are not taken by any student, the participation of course
will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
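Total and partial participation can be checked in the same spirit: participation is total when every entity of the set appears in at least one relationship. A minimal sketch, using a hypothetical helper named participation and hypothetical student and course data:

```python
def participation(entity_set, pairs, position):
    """Return "total" if every entity appears at the given position of some pair."""
    participants = {pair[position] for pair in pairs}
    return "total" if set(entity_set) <= participants else "partial"


students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3"}
enrolled_in = [("S1", "C1"), ("S2", "C1"), ("S3", "C3")]

print(participation(students, enrolled_in, 0))  # total: every student is enrolled
print(participation(courses, enrolled_in, 1))   # partial: nobody takes C2
```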
If the participation of an entity set in a relationship set is total, then a thick line connects the
two (some texts draw it as the double line described above). The presence of an arrow indicates a
key constraint.
A weak entity type is represented by a double rectangle. The participation of a weak entity type
in its identifying relationship is always total, i.e. a weak entity type always has a ‘total
participation constraint’. The relationship between a weak entity type and its identifying strong
entity type is called the identifying relationship, and it is represented by a double diamond.
Class Hierarchies
A class hierarchy can be viewed in one of two ways:
Specialization
Specialization is the process of identifying subsets of an entity set whose members share
distinguishing characteristics. It breaks an entity into multiple entities, from a higher level
(superclass) to a lower level (subclass).
The class Vehicle can be specialized into Car, Truck and Motorcycle (Top Down Approach).
Hence, Vehicle is the superclass and Car, Truck and Motorcycle are subclasses. All three of these
inherit attributes from Vehicle. Moreover, the three share those attributes among themselves
while each contains some other attributes that make it different.
Generalization
Generalization is the process of combining entities that share common attributes or properties
into a more general entity. The entity that is created contains the common features.
Generalization is a bottom-up process.
The classes Car, Truck and Motorcycle can be generalized into Vehicle (Bottom Up Approach).
Car, Truck and Motorcycle are subclasses, while Vehicle is the superclass.
Basically, Vehicle contains the common attributes that were shared among Car, Truck and
Motorcycle.
Aggregation
Keys
o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a table. They are also used to
establish and identify relationships between tables.
o Keys also help to identify relationships uniquely, and thus distinguish relationships from each other.
o The primary key of an entity set allows us to distinguish among the various entities of the set.
o The structure of the primary key for the relationship set depends on the mapping cardinality of the
relationship set.
Types of Keys
1. Superkey: An attribute (or combination of attributes) that uniquely identifies each row in
a table. Every candidate key is a superkey, but a superkey may also contain extra attributes.
2. Candidate key : An attribute (or set of attributes) that uniquely identifies a row. Let K be
a set of attributes of relation R. Then K is a candidate key for R if it possesses the
following properties:
1. Uniqueness – No legal value of R ever contains two distinct tuples with the same
value of K.
2. Irreducibility – No proper subset of K has the uniqueness property.
3. Primary key : The candidate key that is selected as the principal unique identifier. It
uniquely identifies each record in the table and cannot contain null entries.
4. Composite Key: A key that consists of two or more attributes that together uniquely identify
an entity occurrence is called a composite key.
5. Foreign Key: A foreign key is generally a primary key from one table that appears as a field
in another where the first table has a relationship to the second.
In other words, if we had a table A with a primary key X that linked to a table B where X was a
field in B, then X would be a foreign key in B.
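The table A / table B example can be tried directly in SQLite through Python's built-in sqlite3 module. This is a sketch: the column names id and info are hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default

conn.execute("CREATE TABLE A (X INTEGER PRIMARY KEY, info TEXT)")
conn.execute(
    "CREATE TABLE B (id INTEGER PRIMARY KEY, X INTEGER REFERENCES A(X))"  # X is a foreign key in B
)

conn.execute("INSERT INTO A VALUES (1, 'first row of A')")
conn.execute("INSERT INTO B VALUES (10, 1)")  # accepted: X = 1 exists in A

try:
    conn.execute("INSERT INTO B VALUES (11, 99)")  # no row of A has X = 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```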
• These relationship sets may result in a situation where attributes in the various entity sets
are redundant and need to be removed.
• Consider the entity sets instructor and department
– Instructor includes the attributes ID, name, dept_name, and salary, with ID as
primary key
– Department includes the attributes dept_name, building, and budget, with
dept_name as primary key
• Attribute dept_name appears in both entity sets. Since it is the primary key of the entity set
department, it is redundant in the entity set instructor and needs to be removed.
• When the design is reduced to relational schemas, dept_name is added back to the instructor
relation, but only if each instructor is associated with at most one department.
• But if an instructor can have more than one associated department, the relationship between the
entities is recorded in a separate relation inst_dept.
A good entity-relationship design does not contain redundant attributes. For example, the
following entity sets are listed with their attributes and primary keys:
classroom: with attributes (building, room number, capacity); primary key (building, room number).
course: with attributes (course id, title, credits); primary key course id.
instructor: with attributes (ID, name, salary); primary key ID.
Representation of Strong Entity Sets with Simple Attributes
Given in the figure: entity set Scientist and entity set Invention.
Reduction to relational schemas
Strong entity sets – An entity set that has a primary key to uniquely identify each entity is a
strong entity set.
Strong entity sets can be converted into relational schema by having the entity set name as the
relation schema name and the attributes of that entity set as the attributes of relation schema.
Then we have,
Scientist (SID, SName, RArea, Country)
Invention (IID, IName, Year)
1. After converting strong entity sets into relation schema
Scientist (SID, SName, RArea, Country)
Invention (IID, IName, Year)
Composite attributes – If an attribute can be further divided into two or more component
attributes, that attribute is called composite attribute.
While converting into relation schemas, component attributes can be part of the strong entity
sets’ relation schema. No need to retain the composite attribute.
In our case, SName is replaced by its component attributes FName and LName as follows:
Scientist (SID, FName, LName, RArea, Country)
2. After converting composite attributes into relation schema
Scientist (SID, FName, LName, RArea, Country)
Invention (IID, IName, Year)
Multi-valued attributes – Attributes that may have multiple values are referred to as multi-valued
attributes.
In our ER diagram, RArea is a multi-valued attribute. That means, a scientist may have one or
more areas as their research areas.
To reduce a multi-valued attribute into a relation schema, we have to create a separate table for
each multi-valued attribute. Also, we need to include the primary key of strong entity set (parent
entity set where the multi-valued attribute belongs) as a foreign key attribute to establish link.
In our case, the strong entity set Scientist will be further divided as follows:
Scientist (SID, FName, LName, Country)
Scientist_Area (SID, RArea)
3. After converting multi-valued attributes into relation schema
Scientist (SID, FName, LName, Country)
Scientist_Area (SID, RArea)
Invention (IID, IName, Year)
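The three final relation schemas can be written as SQL tables. The sketch below uses Python's sqlite3 module; it assumes SID and IID are the primary keys of Scientist and Invention, and the column types and sample rows are illustrative choices, not part of the original design. The composite primary key (SID, RArea) lets one scientist have several research areas, one row each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Scientist (
        SID     INTEGER PRIMARY KEY,
        FName   TEXT,
        LName   TEXT,
        Country TEXT
    );
    -- Multi-valued attribute RArea: one row per (scientist, research area).
    CREATE TABLE Scientist_Area (
        SID   INTEGER REFERENCES Scientist(SID),
        RArea TEXT,
        PRIMARY KEY (SID, RArea)
    );
    CREATE TABLE Invention (
        IID   INTEGER PRIMARY KEY,
        IName TEXT,
        Year  INTEGER
    );
""")

# Hypothetical sample data: one scientist with two research areas.
conn.execute("INSERT INTO Scientist VALUES (1, 'Asha', 'Rao', 'India')")
conn.executemany("INSERT INTO Scientist_Area VALUES (?, ?)",
                 [(1, "Physics"), (1, "Chemistry")])

areas = [row[0] for row in conn.execute(
    "SELECT RArea FROM Scientist_Area WHERE SID = 1 ORDER BY RArea")]
print(areas)  # ['Chemistry', 'Physics']
```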
Relationship set – The association between two or more entity sets is termed a relationship set.
A relationship may or may not be converted into a separate table, depending on the type of the
relationship. Only a many-to-many relationship must be created as a separate table; the other
types can usually be represented by a foreign key in one of the existing relations.
Here, we are given a many-to-many relationship. That means,
The following are the ER Model design issues:
Consider the entity set employee with attributes employee-name and cell-phone-number. It can be
argued that a cell phone is an entity in its own right, with attributes cell-phone-number and
location (the department where the cell phone is located). If we take this point of view, the
employee entity set has to be redefined as follows:
The main difference between these two definitions of an employee is that, in the first, each
employee has exactly one cell phone number associated with him or her. In the second, however,
an employee may have several cell phone numbers (including zero) associated with him or her.
It is not always clear whether an object is best expressed by an entity set or by a relationship set.
Consider a bank loan modelled as an entity. An alternative is to model a loan as a relationship
between clients and branches, with loan-number and amount as descriptive attributes. Each loan is
then represented by a relationship between a client and a branch.
If every loan is held by exactly one client and each client is associated with exactly one branch,
the design in which a loan is represented as a relationship may be satisfactory. However, with
this design we cannot conveniently represent a situation in which several clients hold a loan
jointly. We would have to define a separate relationship for each holder of the joint loan, and
then replicate the values of the descriptive attributes loan-number and amount in each such
relationship. Each such relationship must have the same values for loan-number and amount.
It is always possible to replace a non-binary (n-ary, for n > 2) relationship set by a number of
distinct binary relationship sets. For simplicity, consider the abstract ternary (n = 3)
relationship set R, relating entity sets A, B and C. We replace the relationship set R by an entity
set E, and create three relationship sets:
RA, relating E and A
RB, relating E and B
RC, relating E and C
If the relationship set R had any attributes, these are assigned to entity set E; otherwise, a special
identifying attribute is created for E (since each entity set must have at least one attribute to
distinguish the members of the set). For each relationship (a_i, b_i, c_i) in the relationship set R,
we create a new entity e_i in the entity set E. Then, in each of the three new relationship
sets, we insert a relationship as follows:
(e_i, a_i) in RA
(e_i, b_i) in RB
(e_i, c_i) in RC
We can generalize this process in a straightforward manner to n-ary relationship sets. Thus,
conceptually, we can restrict the E-R model to include only binary relationship sets.
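The replacement of R by E, RA, RB and RC can be written as a small transformation. A Python sketch, assuming R has no attributes of its own, so each new entity gets a generated identifier (the names e1, e2, ... and the functions binarize and reconstruct are hypothetical). Joining the three binary sets on the E entity gives back the original triples, which shows that no information is lost.

```python
def binarize(R):
    """Replace a ternary relationship set R, given as (a, b, c) triples, by an
    entity set E and binary relationship sets RA, RB and RC."""
    E, RA, RB, RC = [], [], [], []
    for i, (a, b, c) in enumerate(R, start=1):
        e = f"e{i}"  # generated identifying attribute for the new entity e_i
        E.append(e)
        RA.append((e, a))  # (e_i, a_i) in RA
        RB.append((e, b))  # (e_i, b_i) in RB
        RC.append((e, c))  # (e_i, c_i) in RC
    return E, RA, RB, RC


def reconstruct(RA, RB, RC):
    """Join RA, RB and RC on the E entity to recover the original triples."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return [(a_of[e], b_of[e], c_of[e]) for e in a_of]


R = [("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a2", "b1", "c2")]
E, RA, RB, RC = binarize(R)
print(E)                             # ['e1', 'e2', 'e3']
print(reconstruct(RA, RB, RC) == R)  # True
```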
The choice between using aggregation and using a ternary relationship is mainly determined by
the existence of a relationship that relates a relationship set to an entity set (or to a second
relationship set). The choice may also be guided by certain integrity constraints that we want to
express.
Consider the constraint that each sponsorship (of a task by a department) be monitored by at
most one employee. We cannot express this constraint in terms of the Sponsors2
relationship set. On the other hand, we can express the constraint explicitly by drawing an arrow
from the aggregated relationship Sponsors to the relationship Monitors. Thus, the presence of such
a constraint serves as another reason for using aggregation rather than a ternary
relationship set.
access are carried out via controlled transactions. Relational database design satisfies the ACID
(atomicity, consistency, isolation and durability) properties required from a database design.
Relational database design mandates the use of a database server in applications for dealing with
data management problems.
The four stages of an RDD are as follows:
Relations and attributes: The various tables and attributes related to each table are
identified. The tables represent entities, and the attributes represent the properties of the
respective entities.
Primary keys: The attribute or set of attributes that helps in uniquely identifying a record
is identified and assigned as the primary key.
Relationships: The relationships between the various tables are established with the help
of foreign keys. Foreign keys are attributes occurring in a table that are primary keys of
another table. The types of relationships that can exist between the relations (tables) are:
o One to one
o One to many
o Many to many
An entity-relationship diagram can be used to depict the entities, their attributes and the
relationship between the entities in a diagrammatic way.
By applying a set of rules, a table is normalized into the above normal forms in a linearly
progressive fashion. Redundancy in the design is reduced with each higher degree of
normalization.
3.7 Features of good relational design
Good Database Design is what everyone wants to achieve to avoid the consequences of dealing
with a bad design. The following are the objectives of a good database design −
Faultless Information
The database should follow the standards and conventions and provide meaningful information
useful to the organization.
Data Integrity
Integrity helps guarantee that the values are valid and correct. Data integrity constraints are
applied to tables, relationships, etc.
Modify
The database should be developed according to conventions and standards, so that it
can be easily modified whenever the need arises.
In other words, a dependency FD: X → Y means that the values of Y are determined by
the values of X. Two tuples sharing the same values of X will necessarily have the same values of
Y.
Let R be a relation schema and let X and Y be nonempty sets of attributes in R. We say
that an instance r of R satisfies the FD X → Y if the following holds for every pair of tuples t1
and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y.
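The definition translates directly into a check on a relation instance. A minimal Python sketch, where satisfies_fd is a hypothetical helper and each tuple is a dict from attribute name to value; the sample rows are the Department table used later in these notes. Note that one instance can only show that an FD is violated; an FD is a constraint on all legal instances of R.

```python
def satisfies_fd(rows, X, Y):
    """Return True if the instance `rows` satisfies the FD X -> Y."""
    seen = {}  # X-value -> the Y-value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples agreeing on X must also agree on Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


department = [
    {"DeptId": "001", "DeptName": "Finance"},
    {"DeptId": "002", "DeptName": "Marketing"},
    {"DeptId": "003", "DeptName": "HR"},
]
print(satisfies_fd(department, ["DeptId"], ["DeptName"]))  # True

# The same A value with two different B values violates A -> B.
print(satisfies_fd([{"A": 1, "B": "x"}, {"A": 1, "B": "y"}], ["A"], ["B"]))  # False
```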
The following table illustrates A → B,
since each value of A is associated with one and only one value of B.
The following illustrates the meaning of the FD AB → C by showing an instance that satisfies
this dependency.
Ex : <Department> table with two attributes − DeptId and DeptName.
The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute. This
is because if you want to know the department name, then at first you need to have the DeptId.
DeptId DeptName
001 Finance
002 Marketing
003 HR
Therefore, the functional dependency between DeptId and DeptName can be stated as:
DeptName is functionally dependent on DeptId, i.e. DeptId -> DeptName.
Example
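Consider the <Department> table with two attributes to understand the concept of trivial and
non-trivial dependency.
Example
DeptId -> DeptName
The above is a non-trivial functional dependency since DeptName is not a subset of DeptId. By
contrast, {DeptId, DeptName} -> DeptId is trivial, since DeptId is a subset of {DeptId, DeptName}.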
A. Primary Rules
Rule 1 Reflexivity
If A is a set of attributes and B is a subset of A, then A holds B. { A → B }
For example, { Employee_Id, Name } → Name is valid.
Rule 2 Augmentation
If A holds B and C is a set of attributes, then AC holds BC. {AC → BC}
It means that adding the same attributes to both sides of a dependency does not change the basic
dependency.
For example, if X → Y holds true, then ZX → ZY also holds true.
For example, if { Employee_Id, Name } → { Name } holds true, then
{ Employee_Id, Name, Age } → { Name, Age } also holds true.
Rule 3 Transitivity
If A holds B and B holds C, then A holds C.
If {A → B} and {B → C}, then {A → C}
A holds B {A → B} means that A functionally determines B.
For example, if { Employee_Id } → { Name } holds true and
{ Name } → { Department } holds true,
then { Employee_Id } → { Department } also holds true.
B. Secondary Rules
Rule 1 Union
If A holds B and A holds C, then A holds BC.
If{A → B} and {A → C}, then {A → BC}
Rule 2 Decomposition
If A holds BC, then A holds B and A holds C.
If {A → BC}, then {A → B} and {A → C}
Rule 3 Pseudo Transitivity
If A holds B and BC holds D, then AC holds D.
If{A → B} and {BC → D}, then {AC → D}
Example:
Consider relation E = (P, Q, R, S, T, U) having set of Functional Dependencies (FD).
P → Q P → R
QR → S Q → T
QR → U PR → U
Solution:
1. P → T
In the above FD set, P → Q and Q → T
So, Using Transitive Rule: If {A → B} and {B → C}, then {A → C}
∴ If P → Q and Q → T, then P → T.
P→T
2. PR → S
In the above FD set, P → Q
As, QR → S
So, Using Pseudo Transitivity Rule: If{A → B} and {BC → D}, then {AC → D}
∴ If P → Q and QR → S, then PR → S.
PR → S
3. QR → SU
In above FD set, QR → S and QR → U
So, Using Union Rule: If{A → B} and {A → C}, then {A → BC}
∴ If QR → S and QR → U, then QR → SU.
QR → SU
4. PR → SU
In the above FD set, PR → U, and from step 2, PR → S.
So, Using Union Rule: If{A → B} and {A → C}, then {A → BC}
∴ If PR → S and PR → U, then PR → SU.
PR → SU
Step-01:
Add the attributes contained in the attribute set for which closure is being calculated to the result
set.
Step-02:
Recursively add the attributes to the result set which can be functionally determined from the
attributes already contained in the result set.
Example-
Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-
A → BC
BC → DE
D→F
CF → G
Now, let us find the closure of some attributes and attribute sets-
Closure of attribute A-
A+ = {A}
= { A , B , C } ( Using A → BC )
= { A , B , C , D , E } ( Using BC → DE )
= { A , B , C , D , E , F } ( Using D → F )
= { A , B , C , D , E , F , G } ( Using CF → G )
Thus, A+ = { A , B , C , D , E , F , G }
Closure of attribute D-
D+ = { D }
= { D , F } ( Using D → F )
We cannot determine any other attribute using attributes D and F contained in the result set.
Thus, D+ = { D , F }
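The two steps above amount to a fixed-point loop. A minimal Python sketch, assuming single-letter attribute names so that a string such as "BC" can stand for the set {B, C}; the function name closure is hypothetical.

```python
def closure(attrs, fds):
    """Return attrs+ for a set of FDs given as (lhs, rhs) string pairs."""
    result = set(attrs)                # Step-01: start with the attributes themselves
    changed = True
    while changed:                     # Step-02: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already in the result, add the RHS.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]
print(sorted(closure("A", fds)))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("D", fds)))  # ['D', 'F']
```

The same function reproduces the closures worked out by hand in the problem that follows.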
Problem-
Consider the given functional dependencies-
AB → CD
AF → D
DE → F
C→G
F→E
G→A
Solution-
Let us check each option one by one-
Option-(A):
{ CF }+ = { C , F }
= { C , F , G } ( Using C → G )
= { C , E , F , G } ( Using F → E )
= { A , C , E , F , G } ( Using G → A )
= { A , C , D , E , F , G } ( Using AF → D )
Since, our obtained result set is same as the given result set, so, it means it is correctly
given.
Option-(B):
{ BG }+ = { B , G }
= { A , B , G } ( Using G → A )
= { A , B , C , D , G } ( Using AB → CD )
Since, our obtained result set is same as the given result set, so, it means it is correctly
given.
Option-(C):
{ AF }+ = { A , F }
= { A , D , F } ( Using AF → D )
= { A , D , E , F } ( Using F → E )
Since, our obtained result set is different from the given result set, so,it means it is not
correctly given.
Option-(D):
{ AB }+ = { A , B }
= { A , B , C , D } ( Using AB → CD )
= { A , B , C , D , G } ( Using C → G )
Problem
Given R(E-ID, E-NAME, E-CITY, E-STATE)
FDs = { E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE }
The attribute closure of E-ID can be calculated as:
1. Add E-ID to the set {E-ID}
2. Add the attributes that can be derived from any attribute already in the set.
In this case, E-NAME, E-CITY and E-STATE can be derived from E-ID. So these
are also a part of the closure.
3. As there is no other attribute remaining in the relation to be derived from E-ID, the
result is:
(E-ID)+ = {E-ID, E-NAME, E-CITY, E-STATE }
Similarly,
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY, E-STATE}
If the closure of an attribute set contains all the attributes of the relation, then that attribute
set is called a super key of that relation.
Thus, we can say - “The closure of a super key is the entire relation schema.”
Example-
In the above example, the closure of attribute A is the entire relation schema.
Thus, attribute A is a super key for that relation.
Candidate Key-
If an attribute set is a super key and no proper subset of it is a super key (that is, no proper
subset has a closure containing all the attributes of the relation), then that attribute set is called
a candidate key of that relation.
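Combining the closure computation with this definition gives a brute-force way to list candidate keys: try attribute sets from smallest to largest and keep each super key that contains no key already found. This Python sketch (candidate_keys and closure are hypothetical helpers) is exponential in the number of attributes, so it only suits small textbook relations; attributes are passed as lists so that names like E-ID work.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of attribute lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Return the candidate keys of `relation` as a list of sets."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            s = set(combo)
            if any(key <= s for key in keys):
                continue  # contains a smaller key, so it is not irreducible
            if closure(s, fds) >= set(relation):
                keys.append(s)  # a super key with no proper subset that is a key
    return keys


R = ["E-ID", "E-NAME", "E-CITY", "E-STATE"]
F = [(["E-ID"], ["E-NAME"]), (["E-ID"], ["E-CITY"]),
     (["E-ID"], ["E-STATE"]), (["E-CITY"], ["E-STATE"])]
print(candidate_keys(R, F))  # [{'E-ID'}]
```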
Example-
3.10 Decomposition
Definition: The process of breaking up or dividing a single relation into two or more sub
relations is called decomposition of a relation.
Properties of Decomposition
The following two properties must be followed when decomposing a given relation
1. Lossless decomposition
Lossless decomposition ensures
• No information is lost from the original relation during decomposition.
• When the sub relations are joined back, the same relation is obtained that was
decomposed. Every decomposition must always be lossless.
Consider a relation R that is decomposed into sub relations R1 , R2 , …. , Rn.
• This decomposition is called lossless join decomposition when the join of the sub
relations results in the same relation R that was decomposed.
• For lossless join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R, where ⋈ is a natural join operator.
Example-
Suppose this relation is decomposed into two sub relations R1( A , B ) and R2( B , C ).
Now, let us check whether this decomposition is lossless or not. For lossless decomposition,
we must have- R1 ⋈ R2 = R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-
This relation is same as the original relation R. Thus, we conclude that the above decomposition
is lossless join decomposition.
NOTE : Lossless join decomposition is also known as non-additive join decomposition.
• This is because the resultant relation after joining the sub relations is same as the
decomposed relation.
• No extraneous tuples appear after joining of the sub-relations.
2. Lossy Join Decomposition-
• Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
• This decomposition is called lossy join decomposition when the join of the sub relations
does not result in the same relation R that was decomposed.
• The natural join of the sub relations is always found to have some extraneous tuples.
• For lossy join decomposition, we always have
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R, where ⋈ is a natural join operator.
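Whether a decomposition is lossless for a given instance can be tested by projecting, joining and comparing. A Python sketch, with each tuple stored as a frozenset of (attribute, value) pairs; the relation instance R and the helper names are hypothetical. An instance test shows the behaviour of one instance only; the FD-based conditions given below are what guarantee losslessness for every legal instance.

```python
def as_relation(rows):
    """Turn a list of dicts into a set of tuples (relations have no duplicates)."""
    return {frozenset(row.items()) for row in rows}


def project(relation, attrs):
    """Projection onto attrs; duplicates disappear because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on every common attribute."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


def is_lossless(relation, attrs1, attrs2):
    """True if joining the two projections gives back exactly the relation."""
    joined = natural_join(project(relation, attrs1), project(relation, attrs2))
    return joined == relation  # for a lossy split, joined is a proper superset


R = as_relation([
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 5, "C": 3},
    {"A": 3, "B": 3, "C": 3},
])
print(is_lossless(R, {"A", "B"}, {"B", "C"}))  # True
print(is_lossless(R, {"A", "C"}, {"B", "C"}))  # False

extra = natural_join(project(R, {"A", "C"}), project(R, {"B", "C"})) - R
print(len(extra))  # 2 extraneous tuples
```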
Example-
Consider the above relation R( A , B , C ) -
Suppose this relation is decomposed into two sub relations R1( A , C ) and R2( B , C ) –
This relation is not same as the original relation R and contains some extraneous tuples. Clearly,
R1 ⋈ R2 ⊃ R. Thus, we conclude that the above decomposition is lossy join decomposition.
NOTE-
• Lossy join decomposition is also known as careless decomposition.
• This is because extraneous tuples get introduced in the natural join of the sub-relations.
• Extraneous tuples make the identification of the original tuples difficult.
Determining Whether Decomposition Is Lossless Or Lossy-
Suppose a relation R is decomposed into two sub relations R1 and R2. Then,
• If all the following conditions satisfy, then the decomposition is lossless.
• If any of these conditions fail, then the decomposition is lossy.
Condition-01:
Union of both the sub relations must contain all the attributes that are present in the original
relation R. Thus,
• R1 ∪ R2 = R
Condition-02:
Intersection of both the sub relations must not be null. In other words, there must be some
common attribute which is present in both the sub relations. Thus,
• R1 ∩ R2 ≠ ∅
Condition-03:
Intersection of both the sub relations must be a super key of either R1 or R2 or both. Thus,
• R1 ∩ R2 = Super key of R1 or R2
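The three conditions can be coded directly on top of the attribute closure. A minimal Python sketch, again assuming single-letter attribute names; lossless_binary is a hypothetical name, and the FDs used are those of the problem that follows. Because condition-03 uses the closure under the given FDs, the answer holds for every instance that satisfies those FDs.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r, r1, r2, fds):
    """Apply condition-01 to condition-03 to the decomposition of r into r1 and r2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:          # Condition-01: no attribute is lost
        return False
    common = r1 & r2
    if not common:            # Condition-02: a common attribute exists
        return False
    common_plus = closure(common, fds)
    return r1 <= common_plus or r2 <= common_plus  # Condition-03: super key of R1 or R2


F = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]
print(lossless_binary("ABCD", "ABC", "BD", F))  # True  (R into R' and R3)
print(lossless_binary("ABC", "AB", "BC", F))    # True  (R' into R1 and R2)
print(lossless_binary("ABC", "AC", "BC", []))   # False (no FD makes C a key)
```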
Problem
Consider a relation schema R ( A , B , C , D ) with the following functional dependencies-
A→B
B→C
C→D
D→B
Determine whether the decomposition of R into R1 ( A , B ) , R2 ( B , C ) and R3 ( B , D ) is
lossless or lossy.
Solution-
Suppose the original relation R is first decomposed into R’ ( A , B , C ) and R3 ( B , D ), and R’
is then decomposed into R1 ( A , B ) and R2 ( B , C ), as shown-
Condition-01:
According to condition-01, union of both the sub relations must contain all the attributes of
relation R.
So, we have-
R‘ ( A , B , C ) ∪ R3 ( B , D )
=R(A,B,C,D)
Clearly, union of the sub relations contain all the attributes of relation R.
Thus, condition-01 satisfies.
Condition-02:
According to condition-02, intersection of both the sub relations must not be null.
So, we have- R‘ ( A , B , C ) ∩ R3 ( B , D )
=B
Clearly, intersection of the sub relations is not null. Thus, condition-02 satisfies.
Condition-03:
According to condition-03, intersection of both the sub relations must be the super key of one
of the two sub relations or both. So, we have-
R‘ ( A , B , C ) ∩ R3 ( B , D )
=B
Now, the closure of attribute B is- B+ = { B , C , D }
Now, we see-
Attribute ‘B’ can not determine attribute ‘A’ of sub relation R’.
Thus, it is not a super key of the sub relation R’.
Attribute ‘B’ can determine all the attributes of sub relation R3.
Thus, it is a super key of the sub relation R3.
Clearly, intersection of the sub relations is a super key of one of the sub relations.
So, condition-03 satisfies. Thus, we conclude that the decomposition is lossless.
Now consider the decomposition of R’ into R1 ( A , B ) and R2 ( B , C ).
Condition-01: union of both the sub relations must contain all the attributes of relation R’.
So, we have- R1 ( A , B ) ∪ R2 ( B , C )
= R’ ( A , B , C )
Clearly, union of the sub relations contain all the attributes of relation R’.
Thus, condition-01 satisfies.
Condition-03: intersection of both the sub relations must be the super key of one of the two sub
relations or both.
So, we have- R1 ( A , B ) ∩ R2 ( B , C )
=B
Now, the closure of attribute B is
B+ = { B , C , D }
Now, we see-
Attribute 'B' cannot determine attribute 'A' of sub relation R1.
Thus, it is not a super key of the sub relation R1.
Attribute 'B' can determine all the attributes of sub relation R2.
Thus, it is a super key of the sub relation R2.
Clearly, the intersection of the sub relations is a super key of one of the sub relations.
So, condition-03 is satisfied.
Thus, we conclude that the decomposition of R' into R1 and R2 is lossless.
Conclusion: Since both steps are lossless, the overall decomposition of relation R into sub relations R1, R2 and R3 is lossless.
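The solution above applies the binary test twice. A different, standard technique, the chase (tableau) test, checks a decomposition into any number of sub relations in one pass: build one row per sub relation, repeatedly equate symbols using the FDs, and report lossless if some row ends up fully distinguished. The following is a minimal Python sketch, assuming single-letter attribute names and that every attribute used in an FD belongs to R; chase_is_lossless is a hypothetical helper name.

```python
from itertools import combinations


def chase_is_lossless(attrs, parts, fds):
    """Chase (tableau) test: True if splitting attrs into parts is lossless."""
    # Row i holds the distinguished symbol "a" in column x when x is in
    # parts[i]; otherwise it holds a unique symbol ("b", i, x).
    rows = [
        {x: "a" if x in part else ("b", i, x) for x in attrs}
        for i, part in enumerate(parts)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1, r2 in combinations(rows, 2):
                # Only rows that agree on every LHS column are affected.
                if not all(r1[x] == r2[x] for x in lhs):
                    continue
                for y in rhs:
                    if r1[y] == r2[y]:
                        continue
                    # Equate the two symbols, preferring "a" if either has it.
                    new = "a" if "a" in (r1[y], r2[y]) else r1[y]
                    old = {r1[y], r2[y]}
                    # Rename every occurrence of the old symbols in column y.
                    for row in rows:
                        if row[y] in old:
                            row[y] = new
                    changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(row[x] == "a" for x in attrs) for row in rows)


fds = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]
print(chase_is_lossless("ABCD", ["AB", "BC", "BD"], fds))  # True
print(chase_is_lossless("ABCD", ["AB", "CD"], fds))        # False
```

For R1 ( A , B ), R2 ( B , C ) and R3 ( B , D ), all three rows share "a" in column B, so B → C and then C → D turn the R1 row into a, a, a, a, which agrees with the conclusion above.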
Canonical Cover
In any relational model, there exists a set of functional dependencies. When observed closely, these
functional dependencies may contain redundant attributes or even redundant dependencies. A simplified
set of functional dependencies that is equivalent to the original set (it implies exactly the same
dependencies) but has these redundancies removed is known as the "Canonical Cover of Functional Dependencies".
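EX : Consider a relation R(A,B,C,D) with the following functional dependencies:
FD1 : B → A
FD2 : AD → C
FD3 : C → ABD
First, split every FD so that its RHS has a single attribute (decomposition rule). FD3 becomes three FDs:
FD1 : B → A
FD2 : AD → C
FD3 : C → A
FD4 : C → B
FD5 : C → D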
Next, check for extraneous attributes by calculating the closure of the LHS attributes of FDs having two or more attributes on the LHS.
Here, only one FD has two or more attributes on the LHS, i.e. AD → C.
{A}+ = {A}
{D}+ = {D}
In this case, attribute "A" can only determine "A" and "D" can only determine "D", so neither A nor D alone determines C.
Hence, no extraneous attributes are present, and the FD AD → C remains unchanged.
Next, remove redundant FDs. The FDs are now:
FD1 : B → A
FD2 : C → A
FD3 : C → B
FD4 : AD → C
FD5 : C → D
FD1, FD2 and FD3 form a transitive chain: C → B (FD3) and B → A (FD1) already imply C → A (FD2). By
Armstrong's rule of transitivity (if X → Y and Y → Z, then X → Z), FD2 is redundant and is removed.
Therefore, the following FDs are left:
FD1 : B → A
FD2 : C → B
FD3 : AD → C
FD4 : C → D
Finally, FD2 and FD4 have the same LHS, so they can be combined using the union rule into C → BD. Hence,
the canonical cover of the relation R(A,B,C,D) is:
B → A
AD → C
C → BD
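The procedure worked through above (split the RHS, remove extraneous LHS attributes, remove redundant FDs, then merge FDs with the same LHS) can be sketched in Python as follows. This is a minimal illustration, assuming single-letter attribute names and FDs given as (lhs, rhs) strings; closure and canonical_cover are hypothetical helper names. Extraneous RHS attributes are handled implicitly: after splitting, each one shows up as a redundant single-attribute FD.

```python
def closure(attrs, fds):
    """Closure of attrs under FDs given as (frozenset, frozenset) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """fds: list of (lhs, rhs) strings such as ("AD", "C")."""
    # Split every RHS into single attributes (decomposition rule).
    work = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), frozenset(a))
            if fd not in work:
                work.append(fd)

    # Drop an LHS attribute x if the rest of the LHS still determines the RHS.
    for i, (lhs, rhs) in enumerate(work):
        for x in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {x}, work):
                lhs = lhs - {x}
        work[i] = (lhs, rhs)
    work = list(dict.fromkeys(work))  # remove duplicates, keep order

    # Drop an FD if the remaining FDs still imply it.
    i = 0
    while i < len(work):
        lhs, rhs = work[i]
        rest = work[:i] + work[i + 1:]
        if rhs <= closure(lhs, rest):
            work = rest
        else:
            i += 1

    # Merge FDs that share an LHS (union rule).
    merged = {}
    for lhs, rhs in work:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return ["".join(sorted(l)) + " -> " + "".join(sorted(r))
            for l, r in merged.items()]


print(canonical_cover([("B", "A"), ("AD", "C"), ("C", "ABD")]))
# ['B -> A', 'AD -> C', 'C -> BD']
```

The order matters: extraneous LHS attributes are removed before redundant FDs, which is the usual textbook order for reaching a minimal set.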
NORMAL FORMS
Normalization is the process of decomposing relations that have anomalies into smaller,
well-structured relations.
The main reason for normalizing relations is to remove these anomalies. Failure to
eliminate anomalies leads to data redundancy and can cause data integrity and other problems as
the database grows. Normalization consists of a series of guidelines that help you create a good
database structure. Data modification anomalies can be categorized into three types:
o Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted into
a relation because some other data is missing.
o Deletion Anomaly: A deletion anomaly occurs when deleting some data results in the
unintended loss of other important data.
o Update Anomaly: An update anomaly occurs when updating a single data value
requires multiple rows of data to be updated.
ADVANTAGES OF NORMALIZATION
o Enforces the concept of relational integrity.
DISADVANTAGES OF NORMALIZATION
You cannot start building the database before you know what the user needs.
Normalizing relations to higher normal forms, i.e. 4NF and 5NF, can degrade performance,
because queries need more joins to recombine the data.
Normalizing relations of higher degree is a very time-consuming and difficult process.
Careless decomposition may lead to a bad database design, which may cause serious
problems.
Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that normal form.
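Normal Form   Description
1NF    A relation is in 1NF if every attribute contains only atomic values.
2NF    A relation is in 2NF if it is in 1NF and all non-key attributes are fully
       functionally dependent on the primary key.
3NF    A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively
       dependent on a key.
BCNF   A relation is in BCNF if it is in 3NF and, for every non-trivial functional
       dependency X → Y, X is a super key.
4NF    A relation is in 4NF if it is in Boyce-Codd normal form (BCNF) and has no non-trivial
       multi-valued dependency.
5NF    A relation is in 5NF if it is in 4NF and does not contain any join dependency that is
       not implied by its candidate keys; joining should be lossless.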
First Normal Form (1NF)
A relation is in 1NF if every attribute contains only atomic values.
It states that an attribute of a table cannot hold multiple values; it must hold only a single
value.
First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
EMPLOYEE table (the phone attribute of employee 14 holds two values, so the table is not in 1NF):
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF is shown below:
14 John 7272826385 UP
14 John 9064738238 UP
12 Sam 8589830302 Punjab
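Converting a multi-valued column into atomic values, as in the EMPLOYEE example, amounts to emitting one row per value. The following is a minimal Python sketch, assuming the phone numbers arrive as one comma-separated string; the tuple layout (id, name, phones, state) and the function name to_1nf are hypothetical.

```python
# Hypothetical input row mirroring the EMPLOYEE example above: the phone
# column holds two comma-separated numbers, so it is not atomic.
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
]


def to_1nf(rows):
    """Return one row per phone number so that every value is atomic."""
    flat = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            # strip() removes the space left after each comma.
            flat.append((emp_id, name, phone.strip(), state))
    return flat


for row in to_1nf(employee):
    print(row)
```

The other columns are simply repeated in each new row, which is exactly the redundancy that the higher normal forms below go on to remove.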
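Second Normal Form (2NF)
A table is in 2NF if it is in 1NF and no non-prime attribute is dependent on a proper subset of any
candidate key of the table.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Note: In 2NF, every non-key attribute is fully functionally dependent on the key.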
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They
create a table with the columns teacher_id, subject and teacher_age. Since a teacher can teach more
than one subject, the table can have multiple rows for the same teacher.
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because
the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the
candidate key {teacher_id, subject}. This violates the rule for 2NF, which says "no non-prime attribute is
dependent on a proper subset of any candidate key of the table".
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
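The 2NF rule (no non-prime attribute may depend on a proper subset of a candidate key) can be checked by computing the closure of every proper subset of a candidate key. The following is a minimal Python sketch that checks one candidate key at a time; FDs are (set, set) pairs of column names, closure and partial_dependencies are hypothetical helpers, and the only FD used is the one stated in the example, teacher_id → teacher_age.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of column names under FDs given as (set, set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(candidate_key, non_prime, fds):
    """List (subset, attribute) pairs where a non-prime attribute depends
    on a proper subset of the given candidate key (a 2NF violation)."""
    found = []
    key = sorted(candidate_key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            determined = closure(subset, fds)
            for attr in sorted(non_prime):
                if attr in determined:
                    found.append((subset, attr))
    return found


fds = [({"teacher_id"}, {"teacher_age"})]
print(partial_dependencies({"teacher_id", "subject"}, {"teacher_age"}, fds))
# [(('teacher_id',), 'teacher_age')]
```

In teacher_details the key is teacher_id alone, so it has no proper non-empty subset, and teacher_subject has no non-prime attribute; both therefore report no partial dependency.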
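Third Normal Form (3NF)
A table is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on any super
key of the table.
An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each
non-trivial functional dependency X → Y, at least one of the following conditions holds:
o X is a super key of the table.
o Y is a prime attribute of the table.
An attribute that is a part of one of the candidate keys is known as a prime attribute.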
Example: Suppose a company wants to store the complete address of each employee, so it creates
a table named employee_details with the columns emp_id, emp_name, emp_zip, emp_state, emp_city and emp_district.
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}, and so on.
Candidate key: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any
candidate key.
Here, emp_state, emp_city and emp_district are dependent on emp_zip, and emp_zip is dependent on
emp_id. That makes the non-prime attributes (emp_state, emp_city and emp_district) transitively
dependent on the super key (emp_id), which violates the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to remove the
transitive dependency:
employee table: emp_id, emp_name, emp_zip
employee_zip table: emp_zip, emp_state, emp_city, emp_district
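The 3NF condition above (for each non-trivial X → Y, X is a super key or Y is prime) can be tested per FD. The following is a minimal Python sketch; the caller supplies the prime attributes (candidate keys are not computed here), the function checks only the FDs it is given, and the helper names closure and violates_3nf are hypothetical. The FD emp_id → {emp_name, emp_zip} is an assumption that follows from emp_id being the only candidate key.

```python
def closure(attrs, fds):
    """Closure of a set of column names under FDs given as (set, set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_3nf(attributes, fds, prime):
    """Return (lhs, attribute) pairs from fds that break the 3NF rule."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for attr in sorted(rhs - lhs):  # skip the trivial part of X -> Y
            # 3NF allows X -> A only if X is a super key or A is prime.
            if not is_superkey and attr not in prime:
                bad.append((tuple(sorted(lhs)), attr))
    return bad


employee_details = {"emp_id", "emp_name", "emp_zip",
                    "emp_state", "emp_city", "emp_district"}
fds = [
    ({"emp_id"}, {"emp_name", "emp_zip"}),
    ({"emp_zip"}, {"emp_state", "emp_city", "emp_district"}),
]
print(violates_3nf(employee_details, fds, prime={"emp_id"}))
```

Only emp_zip → {emp_state, emp_city, emp_district} is reported. In the two decomposed tables each left-hand side becomes a key, so the same check returns an empty list for both.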
Boyce-Codd Normal Form (BCNF)
BCNF is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than
3NF. A table complies with BCNF if it is in 3NF and, for every non-trivial functional dependency X → Y, X
is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
emp_id | emp_nationality | emp_dept | dept_type | dept_no_of_emp
1001 | Austrian | Production and planning | D001 | 200
1001 | Austrian | stores | D001 | 250
1002 | American | design and technical support | D134 | 100
1002 | American | Purchasing department | D134 | 600
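The table is not in BCNF because, in the functional dependencies emp_id → emp_nationality and
emp_dept → {dept_type, dept_no_of_emp}, neither emp_id nor emp_dept alone is a key; the only
candidate key is {emp_id, emp_dept}.
To make the table comply with BCNF, we can break it into three tables like this: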
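emp_nationality table:
emp_id | emp_nationality
1001 | Austrian
1002 | American
emp_dept table:
emp_dept | dept_type | dept_no_of_emp
Production and planning | D001 | 200
stores | D001 | 250
design and technical support | D134 | 100
Purchasing department | D134 | 600
emp_dept_mapping table:
emp_id | emp_dept
1001 | Production and planning
1001 | stores
1002 | design and technical support
1002 | Purchasing department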
Functional dependencies:
emp_id → emp_nationality
emp_dept → {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because, in both functional dependencies, the left-hand side is a key of its table.
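The BCNF rule (the LHS of every non-trivial FD must be a super key) can be checked directly with attribute closure. The following is a minimal Python sketch; it checks only the FDs it is given, so for each decomposed table you pass the FDs that hold on that table. The helper names closure and violates_bcnf are hypothetical.

```python
def closure(attrs, fds):
    """Closure of a set of column names under FDs given as (set, set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_bcnf(attributes, fds):
    """Return non-trivial FDs (lhs, rhs) whose LHS is not a super key."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= set(attributes):
            bad.append((tuple(sorted(lhs)), tuple(sorted(rhs))))
    return bad


fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]
original = {"emp_id", "emp_nationality", "emp_dept",
            "dept_type", "dept_no_of_emp"}
print(violates_bcnf(original, fds))  # both FDs are reported
print(violates_bcnf({"emp_id", "emp_nationality"}, fds[:1]))
print(violates_bcnf({"emp_dept", "dept_type", "dept_no_of_emp"}, fds[1:]))
print(violates_bcnf({"emp_id", "emp_dept"}, []))
```

The last three calls correspond to the emp_nationality, emp_dept and emp_dept_mapping tables and each returns an empty list.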