DBMS Notes Unit 2
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and
relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity relationship model is widely used in database design.
Attributes are the properties that define the entity type.
For example, Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that define
entity type Student.
In an ER diagram, an entity set is represented by a rectangle.
Key attribute - An attribute (or combination of attributes) whose value is unique for every entity
instance is called a key attribute. In ER notation, a key attribute is drawn as an oval with the
attribute name underlined.
Degree of a Relationship
Degree of a relationship is the number of entity types involved. The n-ary relationship is the
general form for degree n. Special cases are unary, binary, and ternary, where the degree is 1, 2, and 3,
respectively.
Example for unary relationship: An employee is a manager of another employee.
Example for binary relationship: An employee works for a department.
Example for ternary relationship: A customer purchases an item from a shopkeeper.
Cardinality of a Relationship
In a database, the mapping cardinality (or cardinality ratio) denotes the number of entities to
which another entity can be linked through a relationship set. Mapping cardinality is most useful in
describing binary relationship sets, although it can also contribute to the description of relationship
sets involving more than two entity sets.
One-to-Many
In this type of cardinality mapping, an entity in A is associated with any number of entities in B,
while an entity in B can be associated with at most one entity in A.
Entity Relationship Diagram – An Entity–relationship model (ER model) describes the structure of a
database with the help of a diagram, which is known as Entity Relationship Diagram. An ER model is a
design or blueprint of a database that can later be implemented as a database.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The
relationship between Student and College is many-to-one: a college can have many students, but a
student cannot study in multiple colleges at the same time. The Student entity has the attributes Stu_Id,
Stu_Name and Stu_Addr, and the College entity has the attributes Col_ID and Col_Name.
Example 1:
Construct an E-R diagram for a car-insurance company whose customers own one or more cars
each. Each car has associated with it zero to any number of recorded accidents. Each insurance policy
covers one or more cars, and has one or more premium payments associated with it. Each payment is for
a particular period of time, and has an associated due date, and the date when the payment was received.
Example 2:
A university registrar’s office maintains data about the following entities: (i) courses, including
number, title, credits, syllabus, and prerequisites; (ii) course offerings, including course number, year,
semester, section number, instructor(s), timings, and classroom; (iii) students, including student-id,
name, and program; and (iv)instructors, including identification number, name, department, and title.
Further, the enrollment of students in courses and grades awarded to students in each course they are
enrolled for must be appropriately modeled. Construct an E-R diagram for the registrar’s office.
Document all assumptions that you make about the mapping constraints.
The ER Model has the power of expressing database entities in a conceptual hierarchical manner.
As the hierarchy goes up, it generalizes the view of entities, and as we go deep in the hierarchy, it gives
us the detail of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named Mira can be generalized
along with all the students. The entity shall be a student, and further, the student is a person. The reverse
is called specialization where a person is a student, and that student is Mira.
Generalization:
As mentioned above, generalization is the process of combining entities into a generalized entity
that contains the properties common to all of them. In generalization, a number of
entities are brought together into one generalized entity based on their similar characteristics. For
example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
Similarly, in a school database, persons can be specialized as teachers, students, or staff, based
on the role they play in the school as entities.
Inheritance:
All of the above features of the ER model can be used to create classes of objects in object-oriented
programming. The details of entities are generally hidden from the user; this process is known as
abstraction. Inheritance is an important feature of generalization and specialization. It allows
lower-level entities to inherit the attributes of higher-level entities.
For example, the attributes of a Person class such as name, age, and gender can be inherited by
lower-level entities such as Student or Teacher.
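The inheritance described above can be sketched with Python dataclasses. This is only an illustration: the Person attributes come from the text, roll_no is borrowed from the Student attributes listed earlier, and the subject field on Teacher and Mira's sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Higher-level entity: attributes shared by every specialization.
    name: str
    age: int
    gender: str


@dataclass
class Student(Person):
    # Lower-level entity: inherits name, age and gender, and adds its own attribute.
    roll_no: int


@dataclass
class Teacher(Person):
    # 'subject' is a hypothetical attribute, used for illustration only.
    subject: str


# Sample values are made up; only the name Mira comes from the notes.
mira = Student(name="Mira", age=20, gender="F", roll_no=1)
print(isinstance(mira, Person))  # a Student is also a Person
```

Reading the class hierarchy upward (Student to Person) corresponds to generalization; reading it downward corresponds to specialization.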
Many-to-One Relationship
When more than one element of an entity is related to a single element of another entity, then it
is called a many-to-one relationship. For example, students have to opt for a single course, but a course
can have many students.
Many-to-Many Relationship
When more than one element of an entity is associated with more than one element of another
entity, this is called a many-to-many relationship. For example, you can assign an employee to many
projects and a project can have many employees.
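A many-to-many relationship is usually implemented with a third (junction) table holding one row per pairing. The sketch below uses Python's built-in sqlite3 module; the table names, column names and sample rows are all hypothetical and chosen only to mirror the employee/project example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Junction table: one row per (employee, project) assignment.
CREATE TABLE works_on (
    emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'E1'), (2, 'E2');
INSERT INTO project  VALUES (10, 'P1'), (20, 'P2');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")


def projects_of(emp_id):
    """Titles of all projects assigned to one employee."""
    rows = conn.execute(
        "SELECT p.title FROM works_on w JOIN project p ON p.proj_id = w.proj_id "
        "WHERE w.emp_id = ? ORDER BY p.title",
        (emp_id,),
    )
    return [title for (title,) in rows]


def employees_on(proj_id):
    """Names of all employees assigned to one project."""
    rows = conn.execute(
        "SELECT e.name FROM works_on w JOIN employee e ON e.emp_id = w.emp_id "
        "WHERE w.proj_id = ? ORDER BY e.name",
        (proj_id,),
    )
    return [name for (name,) in rows]


print(projects_of(1))    # employee 1 works on two projects
print(employees_on(10))  # project 10 has two employees
```

A many-to-one relationship needs no junction table: a foreign key column on the "many" side is enough.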
Example - Employee
SSN          Name         JobType     DeptName
557-78-6587  Lance Smith  Accountant  Salary
214-45-2398  Lance Smith  Engineer    Product
Name is functionally dependent on SSN because an employee's name can be uniquely determined
from their SSN. Name does not determine SSN, because more than one employee can have the same
name.
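Whether a functional dependency holds in a given set of rows can be checked mechanically: no two rows may agree on the left-hand side and differ on the right-hand side. The following is a minimal sketch using the two Employee rows above; the function name holds is an assumption made for the example. A check like this can only refute a dependency on sample data, it cannot prove that the dependency holds for every possible instance.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this key fixes the expected value; later rows must match it.
        if seen.setdefault(key, value) != value:
            return False
    return True


employee = [
    {"SSN": "557-78-6587", "Name": "Lance Smith", "JobType": "Accountant", "DeptName": "Salary"},
    {"SSN": "214-45-2398", "Name": "Lance Smith", "JobType": "Engineer", "DeptName": "Product"},
]

print(holds(employee, ["SSN"], ["Name"]))  # True: each SSN has exactly one name
print(holds(employee, ["Name"], ["SSN"]))  # False: one name maps to two SSNs
```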
Keys
Whereas a key is a set of attributes that uniquely identifies an entire tuple, a functional dependency
allows us to express constraints that uniquely identify the values of certain attributes.
A candidate key is always a determinant, but a determinant need not be a key.
Axioms
Before we can determine the closure of the relation, Student, we need a set of rules.
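i. Reflexivity Rule: if Y is a subset of X, then X -> Y.
ii. Augmentation Rule: if X -> Y, then XZ -> YZ for any set of attributes Z.
iii. Transitivity Rule: if X -> Y and Y -> Z, then X -> Z.
iv. Union Rule: if X -> Y and X -> Z, then X -> YZ.
v. Decomposition Rule: if X -> YZ, then X -> Y and X -> Z.
vi. Pseudo transitivity Rule: if X -> Y and WY -> Z, then WX -> Z.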
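In practice the rules are rarely applied one at a time; the closure of a set of attributes is usually computed with a simple fixed-point loop that these rules justify. The sketch below is one common way to write it. The notes do not list the dependencies of the Student relation, so the dependencies used here (Roll_No -> Name, DOB and DOB -> Age) are assumptions made purely for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical dependencies over the Student attributes named earlier.
fds = [
    ({"Roll_No"}, {"Name", "DOB"}),
    ({"DOB"}, {"Age"}),
]

print(sorted(closure({"Roll_No"}, fds)))  # ['Age', 'DOB', 'Name', 'Roll_No']
print(sorted(closure({"Name"}, fds)))     # ['Name']
```

If the closure of a set of attributes contains every attribute of the relation, that set is a super key.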
Types of Decomposition
Lossless Decomposition
If no information is lost when a relation is decomposed, the decomposition is
lossless.
A lossless decomposition guarantees that joining the decomposed relations yields exactly the
relation that was decomposed.
A decomposition is said to be lossless if the natural join of all the decomposed relations gives
the original relation.
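The definition can be tested directly on sample data: project the relation onto each part of the decomposition, natural-join the projections, and compare with the original. The sketch below does this for one instance only (it does not prove losslessness for every possible instance). The sample rows are borrowed from the Student example used later in these notes, and the helper names are assumptions.

```python
def relation(attrs, tuples):
    """Build a relation as a set of rows; each row is a frozenset of (attribute, value)."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rel, attrs):
    """Projection onto attrs; duplicate rows collapse because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(r, s):
    """Natural join: combine rows that agree on every shared attribute."""
    joined = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                joined.add(frozenset({**dx, **dy}.items()))
    return joined


def is_lossless(rel, attrs1, attrs2):
    """True if joining the two projections gives back exactly rel (for this instance)."""
    return natural_join(project(rel, attrs1), project(rel, attrs2)) == rel


student = relation(
    ("Student", "Age", "Subject"),
    [("Adam", 15, "Biology"), ("Adam", 15, "Maths"), ("Alex", 14, "Maths"), ("Stuart", 17, "Maths")],
)

print(is_lossless(student, {"Student", "Age"}, {"Student", "Subject"}))  # True
print(is_lossless(student, {"Student", "Subject"}, {"Subject", "Age"}))  # False
```

The second split is lossy because joining on Subject pairs every Maths student with every age recorded for Maths, which creates rows that were never in the original relation.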
In First Normal Form, no row may have a column in which more than one value is stored,
such as a comma-separated list. Instead, such data must be separated into multiple rows.
Student Table following 1NF will be:
Student  Age  Subject
Adam     15   Biology
Adam     15   Maths
Alex     14   Maths
Stuart   17   Maths
Using the First Normal Form, data redundancy increases, as the same data is repeated in several
columns across multiple rows, but each row as a whole will be unique.
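The conversion to First Normal Form can be sketched as follows. The unnormalized input, with Adam's subjects stored as one comma-separated string, is an assumption reconstructed from the student data in these notes; the function name to_1nf is likewise hypothetical.

```python
def to_1nf(rows, multi_valued):
    """Split a comma-separated column so that every row holds a single value."""
    flat = []
    for row in rows:
        for value in row[multi_valued].split(","):
            # Copy the row, replacing the list-like cell with one atomic value.
            flat.append({**row, multi_valued: value.strip()})
    return flat


# Assumed unnormalized form: one row per student, subjects packed into one cell.
unnormalized = [
    {"Student": "Adam", "Age": 15, "Subject": "Biology, Maths"},
    {"Student": "Alex", "Age": 14, "Subject": "Maths"},
    {"Student": "Stuart", "Age": 17, "Subject": "Maths"},
]

flat = to_1nf(unnormalized, "Subject")
for row in flat:
    print(row["Student"], row["Age"], row["Subject"])
```

Adam now appears in two rows, one per subject, which is exactly the redundancy that the later normal forms remove.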
In the example of First Normal Form there are two rows for Adam, to include the multiple subjects that
he has opted for. While this is searchable and follows First Normal Form, it is an inefficient use of space.
Also, in the above table in First Normal Form, the candidate key is {Student, Subject}, but the Age of a
student depends only on the Student column; this partial dependency violates Second Normal Form. To
achieve Second Normal Form, split the subjects out into an independent table and match the rows
up using the student names as foreign keys.
New Student Table following 2NF will be:
Student  Age
Adam     15
Alex     14
Stuart   17
In the Student table the candidate key is the Student column, because the only other column, Age,
depends on it.
New Subject Table introduced for 2NF will be:
Student  Subject
Adam     Biology
Adam     Maths
Alex     Maths
Stuart   Maths
In the Subject table the candidate key is {Student, Subject}. Now both of the above tables
qualify for Second Normal Form and avoid the update anomalies caused by partial dependencies. There
are, however, a few more complex cases in which a table in Second Normal Form still suffers update
anomalies; Third Normal Form handles those scenarios.
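The two tables can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the lower-case table and column names are chosen for the example, and the age change at the end is a made-up update used only to show that the value now lives in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE student (
    student TEXT PRIMARY KEY,
    age     INTEGER NOT NULL
);
CREATE TABLE subject (
    student TEXT NOT NULL REFERENCES student(student),
    subject TEXT NOT NULL,
    PRIMARY KEY (student, subject)
);
INSERT INTO student VALUES ('Adam', 15), ('Alex', 14), ('Stuart', 17);
INSERT INTO subject VALUES ('Adam', 'Biology'), ('Adam', 'Maths'),
                           ('Alex', 'Maths'), ('Stuart', 'Maths');
""")

JOIN_SQL = """
SELECT s.student, s.age, j.subject
FROM student s JOIN subject j ON j.student = s.student
ORDER BY s.student, j.subject
"""

# Joining the two tables reproduces the original 1NF rows.
rejoined = conn.execute(JOIN_SQL).fetchall()

# Hypothetical update: Adam's age is stored once, so one row changes.
conn.execute("UPDATE student SET age = 16 WHERE student = 'Adam'")
after_update = conn.execute(JOIN_SQL).fetchall()

print(rejoined)
print(after_update)
```

In the single 1NF table the same update would have to touch every row for Adam, and missing one of them would leave the data inconsistent.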
Third Normal Form requires that every non-prime attribute of a table depend directly on the primary
key; in other words, no non-prime attribute may be determined by another
non-prime attribute. Any such transitive functional dependency must be removed from the table, and
the table must also be in Second Normal Form. For example, consider a table with the following fields.
Student_Detail Table:
In this table Student_id is the primary key, but street, city and state depend on Zip. The
dependency of these fields on Student_id through Zip is called a transitive dependency. Hence, to apply
3NF, we need to move street, city and state to a new table with Zip as its primary key.
Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form. This form deals
with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple
overlapping candidate keys is guaranteed to be in BCNF. For a table to be in BCNF, the following
conditions must be satisfied:
R must be in Third Normal Form and, for each non-trivial functional dependency X -> Y, X should be a
super key.
Consider the following relation R(A, B, C, D) and the following dependencies:
A -> BCD
BC -> AD
D -> B
Keys are A and BC.
Hence, in the functional dependency A -> BCD, A is a super key.
In the second dependency, BC -> AD, BC is also a key, but in D -> B, D is not a key.
Hence we can break our relation R into two relations R1 and R2.
R(A, B, C, D)
R1(A, D, C) R2(D, B)
The table is broken into two tables, one with A, D and C and the other with D and B.
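The BCNF condition can be checked mechanically: for every non-trivial dependency, compute the closure of its left-hand side and see whether it covers the whole relation. The sketch below applies this to R(A, B, C, D) with the three dependencies of the example; the function names are assumptions, and single-letter attribute names are used so that a string such as "BC" can stand for the set {B, C}.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose left-hand side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations


# Dependencies of the example: A -> BCD, BC -> AD, D -> B.
fds = [("A", "BCD"), ("BC", "AD"), ("D", "B")]

print(sorted(closure("A", fds)))     # ['A', 'B', 'C', 'D']  -> A is a super key
print(sorted(closure("BC", fds)))    # ['A', 'B', 'C', 'D']  -> BC is a super key
print(sorted(closure("D", fds)))     # ['B', 'D']            -> D is not a super key
print(bcnf_violations("ABCD", fds))  # [('D', 'B')]
```

The only violating dependency is D -> B, which is why the decomposition separates D and B into their own relation R2(D, B).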
Multivalued dependency
A multivalued dependency occurs when two attributes in a table are independent of each other but
both depend on a third attribute. Because it involves at least two attributes that depend on a third
attribute, a multivalued dependency always requires at least three attributes.
In a relation, the functional dependency A -> B relates a value of A to a single value of B, while
the multivalued dependency A ->-> B relates a value of A to a set of values of B. Multivalued
dependencies are a consequence of 1NF, which prohibits an attribute in a tuple from having a set of
values.
Example: Suppose there is a bike manufacturer which produces two colors (white and
black) of each model every year.
BIKE_MODEL  MANUF_YEAR  COLOR
M2011       2008        White
M2001       2008        Black
M3001       2013        White
M3001       2013        Black
M4006       2017        White
M4006       2017        Black
Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent
of each other. In this case, these two columns are said to be multivalued dependent on
BIKE_MODEL. The representation of these dependencies is shown below:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
i. For a relation to have a multivalued dependency, it must have at least three attributes.
Multivalued dependencies always occur in pairs: if A ->-> B holds in a relation
R(A, B, C), then A ->-> C also holds.
ii. The attributes giving rise to the multivalued facts must be independent of each other.
iii. Functional dependency is a special case of multivalued dependency. If we restrict the set
determined by a multivalued dependency to a single value, then the multivalued dependency
reduces to a functional dependency.
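A multivalued dependency X ->-> Y can be tested on sample rows: within each group of rows sharing the same X value, the (Y, rest) combinations must be the full cross product of the Y values and the remaining values. The sketch below applies this to the bike rows exactly as listed above; the function name and the extra row used as a counterexample are assumptions.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check the multivalued dependency x ->-> y on a list of dict rows."""
    attrs = list(rows[0].keys())
    z = [a for a in attrs if a not in x and a not in y]  # the remaining attributes
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # Independence means every Y value appears with every remaining value.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


bikes = [
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2008, "COLOR": "White"},
    {"BIKE_MODEL": "M2001", "MANUF_YEAR": 2008, "COLOR": "Black"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "White"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "Black"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "White"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "Black"},
]

# Hypothetical extra row: M3001 in 2014 exists in White only, so colour and year
# are no longer independent for that model.
broken = bikes + [{"BIKE_MODEL": "M3001", "MANUF_YEAR": 2014, "COLOR": "White"}]

print(mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"]))   # True
print(mvd_holds(broken, ["BIKE_MODEL"], ["COLOR"]))  # False
```

Because multivalued dependencies come in pairs, checking BIKE_MODEL ->-> MANUF_YEAR on the same rows gives the same answers.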
If we observe the data in the table above, it satisfies 3NF. But LECTURER and BOOKS are two
independent entities here. There is no relationship between Lecturer and Books. In the above example,
either Alex or Bosco can teach Maths. For the Maths subject, a student can refer to either 'Maths Book1' or
'Maths Book2'.
That is,
SUBJECT ->-> LECTURER
SUBJECT ->-> BOOKS
This is a multivalued dependency on SUBJECT. If we select both the lecturer and the books
recommended for a subject from this table, the result shows (lecturer, books) combinations, which
wrongly implies that a particular lecturer recommends a particular book.
SELECT c.LECTURER, c.BOOKS FROM COURSE c WHERE SUBJECT = 'Maths';
Once the table is split into (SUBJECT, LECTURER) and (SUBJECT, BOOKS), we fire two independent
queries to get the lecturer names and the books recommended for a subject. This removes the
multivalued dependency and the confusion around the data. Thus the tables are in 4NF.
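The two independent queries can be sketched with Python's built-in sqlite3 module. The table names subject_lecturer and subject_book are hypothetical; the rows are the Maths lecturers and books mentioned in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject_lecturer (
    subject  TEXT NOT NULL,
    lecturer TEXT NOT NULL,
    PRIMARY KEY (subject, lecturer)
);
CREATE TABLE subject_book (
    subject TEXT NOT NULL,
    book    TEXT NOT NULL,
    PRIMARY KEY (subject, book)
);
INSERT INTO subject_lecturer VALUES ('Maths', 'Alex'), ('Maths', 'Bosco');
INSERT INTO subject_book VALUES ('Maths', 'Maths Book1'), ('Maths', 'Maths Book2');
""")

# Query 1: who can teach the subject.
lecturers = [
    name for (name,) in conn.execute(
        "SELECT lecturer FROM subject_lecturer WHERE subject = ? ORDER BY lecturer",
        ("Maths",),
    )
]

# Query 2: which books are recommended for the subject.
books = [
    title for (title,) in conn.execute(
        "SELECT book FROM subject_book WHERE subject = ? ORDER BY book",
        ("Maths",),
    )
]

print(lecturers)  # ['Alex', 'Bosco']
print(books)      # ['Maths Book1', 'Maths Book2']
```

Neither result pairs a lecturer with a book, so nothing in the output suggests that a given lecturer recommends a given book.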
Join Dependency
Join dependency is a further generalization of multivalued dependencies.
If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency
(JD) exists, where R1(A, B, C) and R2(C, D) are the decompositions of a given relation
R(A, B, C, D).
Alternatively, R1 and R2 are a lossless decomposition of R.
A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join
decomposition of R.
*((A, B, C), (C, D)) is a JD of R if the join of the two projections over their common attribute C is
equal to the relation R.
Here, *(R1, R2, R3, ...) is used to indicate that the relations R1, R2, R3 and so on are a JD of R.
A table may be decomposed further to eliminate redundancy and anomalies only if, when we rejoin
the decomposed tables by means of candidate keys, none of the original data is lost
and no new records arise. In simple words, joining two or more decomposed
tables should neither lose records nor create new records.
Consider an example of different Subjects taught by different lecturers and the lecturers taking
classes for different semesters.
COURSE
SUBJECT  LECTURER  CLASS
In the above table, Rose takes both Mathematics and Physics classes for Semester 1, but she does not
take the Physics class for Semester 2. In this case, the combination of all three fields is required to
identify valid data. Imagine we want to add a new class, Semester3, but do not know which subject it
covers or who will be taking that subject. We would simply be inserting a new entry with Class as
Semester3 and leaving Lecturer and Subject as NULL. As discussed above, it is not good to have such
entries. Moreover, since all three columns together act as the primary key, we cannot leave the other
two columns blank.
Hence we have to decompose the table in such a way that it satisfies all the rules up to 4NF and,
when we join the parts by using keys, it yields the correct records. Here, we can represent each
lecturer's subject area and their classes in a better way. We can divide the above table into three:
(SUBJECT, LECTURER), (LECTURER, CLASS), (SUBJECT, CLASS)
5NF
SUBJECT    LECTURER
Maths      Alex
Maths      Boscoo
Physics    Rose
Chemistry  Adam

CLASS       LECTURER
SEMESTER 1  Alex
SEMESTER 1  Boscoo
SEMESTER 1  Rose
SEMESTER 1  Adam

CLASS       SUBJECT
SEMESTER 1  Maths
SEMESTER 1  Maths
SEMESTER 1  Physics
SEMESTER 1  Chemistry
Now, each combination is stored in one of three different tables. If we need to identify who is
teaching which subject to which semester, we need to join the three tables on their keys to get the result.
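The three-way join can be sketched with Python's built-in sqlite3 module. The table names are hypothetical, and the rows are the Semester 1 rows shown in the three tables above (the repeated SEMESTER 1 / Maths pair is stored once, since a table with a primary key cannot hold duplicates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject_lecturer (subject TEXT, lecturer TEXT, PRIMARY KEY (subject, lecturer));
CREATE TABLE class_lecturer   (class   TEXT, lecturer TEXT, PRIMARY KEY (class, lecturer));
CREATE TABLE class_subject    (class   TEXT, subject  TEXT, PRIMARY KEY (class, subject));

INSERT INTO subject_lecturer VALUES
    ('Maths', 'Alex'), ('Maths', 'Boscoo'), ('Physics', 'Rose'), ('Chemistry', 'Adam');
INSERT INTO class_lecturer VALUES
    ('SEMESTER 1', 'Alex'), ('SEMESTER 1', 'Boscoo'), ('SEMESTER 1', 'Rose'), ('SEMESTER 1', 'Adam');
INSERT INTO class_subject VALUES
    ('SEMESTER 1', 'Maths'), ('SEMESTER 1', 'Physics'), ('SEMESTER 1', 'Chemistry');
""")

# A (subject, lecturer, class) row is valid only if all three pairings exist.
course = conn.execute("""
SELECT sl.subject, sl.lecturer, cl.class
FROM subject_lecturer sl
JOIN class_lecturer cl ON cl.lecturer = sl.lecturer
JOIN class_subject  cs ON cs.class = cl.class AND cs.subject = sl.subject
ORDER BY sl.subject, sl.lecturer
""").fetchall()

for row in course:
    print(row)
```

Joining only two of the three tables can produce combinations that were never taught; it is the third table that filters them out, which is why the join dependency involves all three projections.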