UNIT 02
DATABASE DESIGNS
Database design is the set of procedures and tasks carried out to plan and
implement a database. Keep the following critical points in mind to achieve a
good database design:
1. Data consistency and integrity must be maintained.
2. Redundancy should be kept low.
3. Searching should be made faster through indexes.
4. Security measures should be taken by enforcing various integrity
constraints.
5. Data should be broken down and stored in the most atomic form possible.
An Entity Relationship Diagram is a diagram that represents relationships among
entities in a database. It is commonly known as an ER Diagram. An ER Diagram
in DBMS plays a crucial role in designing the database.
Entity-Relationship Model
An Entity Relationship Diagram (ER Diagram) shows pictorially the relationships
between the entities to be stored in a database. Fundamentally, the ER Diagram
is a structural design of the database. It acts as a framework, drawn with
specialized symbols, for defining the relationships between the database
entities. An ER diagram is built from three principal components: entities,
attributes, and relationships.
The following diagram shows two entities, Student and Course, and their
relationship. The relationship between student and course is many-to-many, as a
course can be chosen by several students, and a student can choose more than
one course. The Student entity has the attributes Stu_Id, Stu_Name, and
Stu_Age. The Course entity has the attributes Cou_ID and Cou_Name.
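The Student and Course entities and their many-to-many relationship could be
turned into tables roughly as follows. This is only a sketch using Python's
built-in sqlite3 module; the junction table name enrollment, the lowercase
column names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER);
CREATE TABLE course  (cou_id INTEGER PRIMARY KEY, cou_name TEXT);
-- Junction table (hypothetical name): one row per (student, course) pair.
CREATE TABLE enrollment (
    stu_id INTEGER REFERENCES student(stu_id),
    cou_id INTEGER REFERENCES course(cou_id),
    PRIMARY KEY (stu_id, cou_id)
);
""")

# Made-up sample data: one student takes two courses, one course has two students.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [(1, "Asha", 19), (2, "Ravi", 20)])
conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, "DBMS"), (11, "Networks")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

courses_of_student_1 = [row[0] for row in conn.execute(
    "SELECT c.cou_name FROM enrollment e JOIN course c ON c.cou_id = e.cou_id "
    "WHERE e.stu_id = 1 ORDER BY c.cou_name")]
students_in_course_10 = [row[0] for row in conn.execute(
    "SELECT s.stu_name FROM enrollment e JOIN student s ON s.stu_id = e.stu_id "
    "WHERE e.cou_id = 10 ORDER BY s.stu_name")]

print(courses_of_student_1)   # ['DBMS', 'Networks']
print(students_in_course_10)  # ['Asha', 'Ravi']
```

The separate junction table is what lets both sides of the relationship hold
many values without repeating student or course details.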
Why Use ER Diagrams in DBMS?
• ER Diagram helps you conceptualize the database and lets you know
which fields need to be embedded for a particular entity
• ER Diagram gives a better understanding of the information to be stored
in a database
• It reduces complexity and allows database designers to build databases
quickly
• It helps to describe elements using Entity-Relationship models
• It allows users to get a preview of the logical structure of the database
Symbols Used in ER Diagrams
• Rectangles: This Entity Relationship Diagram symbol represents entity
types
• Ellipses: This symbol represents attributes
• Diamonds: This symbol represents relationship types
• Lines: These link attributes to entity types and entity types to
relationship types
• Underline: Primary key attributes are underlined
• Double Ellipses: These represent multi-valued attributes
Components of ER Diagram
An ER Diagram is based on three basic concepts, each with its own variants:
• Entities
  o Weak Entity
• Attributes
  o Key Attribute
  o Composite Attribute
  o Multivalued Attribute
  o Derived Attribute
• Relationships
  o One-to-One Relationships
  o One-to-Many Relationships
  o Many-to-One Relationships
  o Many-to-Many Relationships
Entities
An entity can be a living or non-living thing.
An entity is shown as a rectangle in an ER diagram.
For example, in a student study course, both the student and the course are
entities.
Weak Entity
An entity that relies on another entity for its identification is called a weak
entity.
A weak entity is shown as a double rectangle in an ER diagram.
In the example below, school is a strong entity because it has a primary key
attribute, the school number. Unlike school, the classroom is a weak entity
because it does not have a primary key of its own; the room number acts only as
a discriminator.
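One common way to store such a weak entity is to combine the owner's key with
the discriminator. The sketch below uses Python's sqlite3 module and assumes
the table and column names school, classroom, school_number, and room_number;
it is an illustration, not the only possible mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE school (school_number INTEGER PRIMARY KEY);
CREATE TABLE classroom (
    school_number INTEGER NOT NULL
        REFERENCES school(school_number) ON DELETE CASCADE,
    room_number   INTEGER NOT NULL,          -- discriminator, unique only within a school
    PRIMARY KEY (school_number, room_number)
);
""")

conn.executemany("INSERT INTO school VALUES (?)", [(1,), (2,)])
# The same room number may appear in different schools.
conn.executemany("INSERT INTO classroom VALUES (?, ?)", [(1, 101), (2, 101)])

try:
    conn.execute("INSERT INTO classroom VALUES (1, 101)")  # repeats a room within school 1
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the owner removes its dependent classrooms as well.
conn.execute("DELETE FROM school WHERE school_number = 1")
remaining = conn.execute("SELECT school_number, room_number FROM classroom").fetchall()

print(duplicate_rejected)  # True
print(remaining)           # [(2, 101)]
```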
Attribute
An attribute describes a property of an entity.
An attribute is shown as an oval in an ER diagram.
Key Attribute
A key attribute uniquely identifies an entity within an entity set.
The name of a key attribute is underlined in an ER diagram.
For example, for a student entity, the roll number can uniquely identify a
student from a set of students.
Composite Attribute
An attribute that is composed of several other attributes is known as a composite
attribute.
A composite attribute is shown as an oval that is connected to the ovals of its
component attributes.
Multivalued Attribute
Attributes that can hold more than one value are called multivalued
attributes.
A double oval is used to represent a multivalued attribute.
Derived Attribute
An attribute that can be derived from other attributes of the entity is known as a
derived attribute.
In the ER diagram, the dashed oval represents the derived attribute.
Relationship
A relationship is shown as a diamond in an ER diagram.
It depicts the association between two entities.
In the example below, both the student and the course are entities, and study is
the relationship between them.
One-to-One Relationship
When a single element of an entity is associated with a single element of another
entity, it is called a one-to-one relationship.
For example, a student has only one identification card, and an identification
card is issued to only one student.
One-to-Many Relationship
When a single element of an entity is associated with more than one element of
another entity, it is called a one-to-many relationship.
For example, a customer can place many orders, but an order cannot be placed by
many customers.
Many-to-One Relationship
When more than one element of an entity is related to a single element of another
entity, then it is called a many-to-one relationship.
For example, each student opts for a single course, but a course can have many
students.
Many-to-Many Relationship
When more than one element of an entity is associated with more than one
element of another entity, this is called a many-to-many relationship.
For example, you can assign an employee to many projects and a project can have
many employees.
How to Draw an ER Diagram?
Below are some important points for drawing an ER diagram:
• First, identify all the entities. Draw each entity as a rectangle and
label it properly.
• Identify relationships between entities and connect them using a
diamond in the middle, illustrating the relationship. Do not connect
relationships with each other.
• Connect attributes to their entities and label them properly.
• Remove any redundant entities or relationships.
• Make sure your ER Diagram supports all the data provided to design the
database.
• Effectively use colors to highlight key areas in your diagrams.
Cardinality
The number of times an entity of an entity set participates in a relationship set
is known as cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in
the relationship, the cardinality is one-to-one. Let us assume that a male can
marry one female and a female can marry one male. So the relationship will be
one-to-one.
The total number of tables that can be used in this is 2.
Using Sets, it can be represented as:
2. One-to-Many: In a one-to-many mapping, each entity in one entity set can be
related to more than one entity in the other entity set. Let us assume that one
surgeon department can accommodate many doctors. So the cardinality will be
1 to M. It means one department has many doctors.
The total number of tables that can be used in this is 2, with the "many" side
holding a foreign key to the "one" side; a third table is needed only if the
relationship is stored in a table of its own.
Using sets, one-to-many cardinality can be represented as:
3. Many-to-One: When entities in one entity set can take part only once in the
relationship set and entities in other entity sets can take part more than once in
the relationship set, cardinality is many to one. Let us assume that a student can
take only one course but one course can be taken by many students. So the
cardinality will be n to 1. It means that for one course there can be n students
but for one student, there will be only one course.
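This can be implemented with 2 tables, with the "many" side holding a foreign
key to the "one" side, or with 3 tables if the relationship is stored in a
table of its own.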
Using Sets, it can be represented as:
In this case, each student is taking only 1 course but 1 course has been taken by
many students.
4. Many-to-Many: When entities in all entity sets can take part more than once
in the relationship, the cardinality is many-to-many. Let us assume that a
student can take more than one course and one course can be taken by many
students. So the relationship will be many-to-many.
The total number of tables that can be used in this is 3.
Using Sets, it can be represented as:
In this example, student S1 is enrolled in C1 and C3, and course C3 has
students S1, S3, and S4 enrolled. So it is a many-to-many relationship.
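The four cardinality types can be told apart by counting how often each entity
appears in the relationship set. The function below is a minimal sketch with a
hypothetical name, cardinality; it classifies the pairs it is given, so it
describes one relationship instance rather than the rule the designer intended.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a relationship given as a set of (left, right) pairs."""
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # Some left entity is related to several right entities.
    left_many = any(n > 1 for n in left_counts.values())
    # Some right entity is related to several left entities.
    right_many = any(n > 1 for n in right_counts.values())
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"

# S1 is enrolled in C1 and C3; C3 has S1, S3, and S4 enrolled.
enrolled = {("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")}
print(cardinality(enrolled))                                   # many-to-many
print(cardinality({("Surgery", "D1"), ("Surgery", "D2")}))     # one-to-many
print(cardinality({("S1", "C1"), ("S2", "C1")}))               # many-to-one
print(cardinality({("M1", "F1"), ("M2", "F2")}))               # one-to-one
```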
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship
set.
1. Total Participation – Each entity in the entity set must participate in the
relationship. If each student must enroll in a course, the participation of students
will be total. Total participation is shown by a double line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT
participate in the relationship. If some courses have no students enrolled in
them, the participation of courses will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set
having total participation and Course Entity set having partial participation.
Using sets, it can be represented as follows:
[Figure: Set representation of Total Participation and Partial Participation]
Every student in the Student Entity set participates in a relationship but there
exists a course C4 that is not taking part in the relationship.
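Whether participation is total or partial can be checked by comparing an entity
set with the entities that actually appear in the relationship set. This is a
small sketch; the function name participation and the particular pairs are
assumptions, chosen so that course C4 takes no part in the relationship.

```python
def participation(entity_set, participants):
    """Return 'total' if every entity takes part in the relationship, else 'partial'."""
    return "total" if set(entity_set) <= set(participants) else "partial"

students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3", "C4"}
# Hypothetical 'Enrolled in' relationship set: C4 has no students.
enrolled_in = {("S1", "C1"), ("S2", "C2"), ("S3", "C3")}

student_side = participation(students, {s for s, _ in enrolled_in})
course_side = participation(courses, {c for _, c in enrolled_in})

print(student_side)  # total
print(course_side)   # partial
```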
Removing redundant attributes
What is Data redundancy in the database management system?
In DBMS, when the same data is stored in different tables, it causes data
redundancy.
Sometimes redundancy is introduced on purpose, for backup and recovery of data
or for faster access to data. However, redundant data costs extra money,
demands higher storage capacity, and requires extra effort to keep all the
copies up to date.
Unintentional duplication of data can prevent the database from working
properly, or make it harder for the end user to access data. Redundant data
also occupies space unnecessarily by storing identical copies, which leads to
space constraints, one of the major problems.
Let us understand redundancy in DBMS properly with the help of an example.
Student_id  Name    Course   Session  Fee     Department
101         Devi    B. Tech  2022     90,000  CS
102         Sona    B. Tech  2022     90,000  CS
103         Varun   B. Tech  2022     90,000  CS
104         Satish  B. Tech  2022     90,000  CS
105         Amisha  B. Tech  2022     90,000  CS
In the above example, there is a "Student" table that contains data such as
"Student_id", "Name", "Course", "Session", "Fee", and "Department". As you
can see, some data is repeated in the table, which causes redundancy.
Problems that are caused due to redundancy in the database
Redundancy in DBMS gives rise to anomalies. In a database management system,
anomalies are problems that occur while inserting, deleting, or updating data
in the database.
We will understand these anomalies with the help of the following student table:
student_id  student_name  student_age  dept_id  dept_name               dept_head
1           Shiva         19           104      Information Technology  Jaspreet Kaur
2           Khushi        18           102      Electronics             Avni Singh
3           Harsh         19           104      Information Technology  Jaspreet Kaur
1. Insertion Anomaly:
Insertion anomaly arises when you are trying to insert some data into the database,
but you are not able to insert it.
Example: If you want to add the details of a new student to the above table,
you must also know the details of the student's department; otherwise, you will
not be able to add the row, because student details and department details are
stored together.
2. Deletion Anomaly:
Deletion anomaly arises when you delete some data from the database, but some
unrelated data is also deleted; that is, there will be a loss of data due to deletion
anomaly.
Example: If we delete the student with student_id 2, we also lose the details
of department 102 (Electronics, headed by Avni Singh), because no other row in
the above table stores them.
3. Updating Anomaly:
An update anomaly arises when you update some data in the database, but the
data is partially updated, which causes data inconsistency.
Example: If we want to update the details of dept_head from Jaspreet Kaur to
Ankit Goyal for Dept_id 104, then we have to update it everywhere else;
otherwise, the data will get partially updated, which causes data inconsistency.
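The update anomaly can be reproduced on the student table above. The sketch
below uses Python's sqlite3 module: it first updates only one of the two rows
for department 104, then shows that a separate department table (a hypothetical
decomposition) needs only a single update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT, "
    "student_age INTEGER, dept_id INTEGER, dept_name TEXT, dept_head TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Shiva", 19, 104, "Information Technology", "Jaspreet Kaur"),
    (2, "Khushi", 18, 102, "Electronics", "Avni Singh"),
    (3, "Harsh", 19, 104, "Information Technology", "Jaspreet Kaur"),
])

# Partial update: only one of the rows for department 104 is changed.
conn.execute("UPDATE student SET dept_head = 'Ankit Goyal' WHERE student_id = 1")
heads_before = {row[0] for row in conn.execute(
    "SELECT dept_head FROM student WHERE dept_id = 104")}

# Decomposition: each department's facts are stored exactly once.
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT, dept_head TEXT)")
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
    (104, "Information Technology", "Jaspreet Kaur"),
    (102, "Electronics", "Avni Singh"),
])
conn.execute("UPDATE department SET dept_head = 'Ankit Goyal' WHERE dept_id = 104")
heads_after = {row[0] for row in conn.execute(
    "SELECT d.dept_head FROM student s JOIN department d ON d.dept_id = s.dept_id "
    "WHERE s.dept_id = 104")}

print(sorted(heads_before))  # ['Ankit Goyal', 'Jaspreet Kaur']  (inconsistent)
print(sorted(heads_after))   # ['Ankit Goyal']
```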
Advantages of data redundancy in DBMS
o Provides Data Security: Data redundancy can enhance data security as it
is difficult for cyber attackers to attack data that are in different locations.
o Provides Data Reliability: Redundant copies improve reliability because
organizations can cross-check them to confirm whether data is correct.
o Create Data Backup: Data redundancy helps in backing up the data.
Extended ER features
Extended ER (also called Enhanced ER) is a high-level data model that
incorporates extensions to the original ER model. It is used to represent the
requirements and complexities of complex databases.
The extended Entity Relationship (ER) model adds three features, as given below −
• Aggregation
• Specialization
• Generalization
Specialization
The process of designating subgroupings within an entity set is called
specialization. It is a top-down process. If the instances of an entity set can
be differentiated according to the value of a given attribute, then sub-classes
(sub-entity sets) can be formed based on that attribute.
Example
Specialization of a person allows us to distinguish a person according to whether
they are employees or customers. Specialization of account creates two entity
sets: savings account and current account.
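One possible way to store the person example is a superclass table plus one
table per subclass that shares its key. The sketch below uses Python's sqlite3
module; the table names, the columns salary and credit_limit, and the sample
rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Superclass: attributes common to every person.
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
-- Subclasses: each row "is a" person, so it reuses the person's key.
CREATE TABLE employee (person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
                       salary INTEGER);
CREATE TABLE customer (person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
                       credit_limit INTEGER);
INSERT INTO person VALUES (1, 'P1'), (2, 'P2');
INSERT INTO employee VALUES (1, 50000);
INSERT INTO customer VALUES (2, 10000);
""")

employees = conn.execute(
    "SELECT p.name, e.salary FROM person p "
    "JOIN employee e ON e.person_id = p.person_id").fetchall()
customers = conn.execute(
    "SELECT p.name, c.credit_limit FROM person p "
    "JOIN customer c ON c.person_id = p.person_id").fetchall()

print(employees)  # [('P1', 50000)]
print(customers)  # [('P2', 10000)]
```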
In an E-R diagram, specialization is represented by a triangle component labeled
ISA. The ISA relationship is also referred to as a superclass-subclass
relationship, as shown below −
Generalization
It is the reverse process of specialization. It is a bottom-up approach.
It converts subclasses to superclasses. This process combines a number of entity
sets that share the same features into higher-level entity sets.
If sub-class information is given for an entity set, the ISA relationship type
is used to represent the connection between the subclasses and the superclass,
as shown below −
Example
Aggregation
It is an abstraction in which relationship sets are treated as higher level entity sets
and can participate in relationships. Aggregation allows us to indicate that a
relationship set participates in another relationship set.
Aggregation is used to simplify the details of a given database by changing
ternary relationships into binary relationships. A ternary relationship is a
relationship among three entities.
Aggregation is shown in the image below −
Normalization in DBMS
Normalization in DBMS is a technique using which you can organize the data in
the database tables so that:
• There is less repetition of data,
• A large set of data is structured into a bunch of smaller tables,
• and the tables have a proper relationship between them.
DBMS Normalization is a systematic approach to decomposing (breaking down)
tables to eliminate data redundancy (repetition) and undesirable characteristics
such as insertion, update, and deletion anomalies.
It is a multi-step process that puts data into tabular form, removes duplicate
data, and sets up the relationships between tables.
Types of DBMS Normal forms
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
6. Fifth Normal Form
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule
defines that all the attributes in a relation must have atomic domains. The values
in an atomic domain are indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal Form.
Each attribute must contain only a single value from its pre-defined domain.
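The conversion to First Normal Form can be sketched as splitting every
multi-valued cell into one row per value. The function name
to_first_normal_form, the column names, and the sample rows below are
hypothetical.

```python
def to_first_normal_form(rows, column, sep=","):
    """Return rows in which `column` holds exactly one atomic value per row."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            atomic_row = dict(row)            # copy the other columns unchanged
            atomic_row[column] = value.strip()
            result.append(atomic_row)
    return result

# Hypothetical unnormalized relation: 'content' holds several values per cell.
unnormalized = [
    {"course": "Programming", "content": "Java, C++"},
    {"course": "Web", "content": "HTML, PHP, ASP"},
]

normalized = to_first_normal_form(unnormalized, "content")
for row in normalized:
    print(row["course"], row["content"])
# Programming Java
# Programming C++
# Web HTML
# Web PHP
# Web ASP
```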
Second Normal Form
Before we learn about the second normal form, we need to understand the
following −
• Prime attribute − An attribute that is part of a candidate key is known as
a prime attribute.
• Non-prime attribute − An attribute that is not part of any candidate key
is said to be a non-prime attribute.
In second normal form, every non-prime attribute must be fully functionally
dependent on the candidate key. That is, if X → A holds for a candidate key X
and a non-prime attribute A, then there should not be any proper subset Y of X
for which Y → A also holds.
In the Student_Project relation, the prime key attributes are Stu_ID and
Proj_ID. According to the rule, the non-key attributes, i.e. Stu_Name and
Proj_Name, must depend on both together and not on either prime key attribute
individually. But Stu_Name can be identified by Stu_ID alone, and Proj_Name can
be identified by Proj_ID alone. This is called partial dependency, which is not
allowed in Second Normal Form.
To remove it, we break the relation in two so that no partial dependency
remains.
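A partial dependency can be spotted by testing whether a functional dependency
holds on sample rows. In the sketch below, the helper names fd_holds and
project and the sample data are assumptions, and the three resulting relations
are one possible decomposition. A dependency that holds in sample data is only
evidence, not proof, that it holds for the relation in general.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same lhs values, different rhs values
    return True

def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

# Hypothetical sample data for Student_Project with key (Stu_ID, Proj_ID).
student_project = [
    {"Stu_ID": 1, "Proj_ID": 10, "Stu_Name": "A", "Proj_Name": "X"},
    {"Stu_ID": 1, "Proj_ID": 11, "Stu_Name": "A", "Proj_Name": "Y"},
    {"Stu_ID": 2, "Proj_ID": 10, "Stu_Name": "B", "Proj_Name": "X"},
]

# Each non-key attribute depends on only a part of the key: partial dependencies.
print(fd_holds(student_project, ["Stu_ID"], ["Stu_Name"]))    # True
print(fd_holds(student_project, ["Proj_ID"], ["Proj_Name"]))  # True
print(fd_holds(student_project, ["Stu_ID"], ["Proj_Name"]))   # False

# One possible decomposition without partial dependencies.
students = project(student_project, ["Stu_ID", "Stu_Name"])
projects = project(student_project, ["Proj_ID", "Proj_Name"])
assignments = project(student_project, ["Stu_ID", "Proj_ID"])
print(students)     # [(1, 'A'), (2, 'B')]
print(projects)     # [(10, 'X'), (11, 'Y')]
print(assignments)  # [(1, 10), (1, 11), (2, 10)]
```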
Third Normal Form
A table is said to be in the Third Normal Form when,
1. It satisfies the First Normal Form and the Second Normal form.
2. And, it doesn't have Transitive Dependency.
What is Transitive Dependency?
In a table, some column acts as the primary key and the other columns depend on
it. But what if a column that is not part of the primary key depends on another
column that is also not part of the primary key? Then we have a transitive
dependency in our table.
In the Student_detail relation, Stu_ID is the key and the only prime attribute.
City can be identified by Stu_ID as well as by Zip. Zip is not a superkey, and
City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two
relations as follows −
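The split into Student_Detail (Stu_ID, Stu_Name, Zip) and ZipCodes (Zip, City)
can be checked on sample rows: joining the two relations on Zip should give
back exactly the original rows. The data values below are made up for
illustration.

```python
# Hypothetical rows of the original relation (Stu_ID, Stu_Name, Zip, City).
original = {
    (1, "A", "Z1", "City1"),
    (2, "B", "Z1", "City1"),
    (3, "C", "Z2", "City2"),
}

# Decompose by projection; sets remove the repeated (Zip, City) facts.
student_detail = {(stu_id, name, zip_code) for stu_id, name, zip_code, _ in original}
zip_codes = {(zip_code, city) for _, _, zip_code, city in original}

# Natural join on Zip to rebuild the original relation.
rejoined = {
    (stu_id, name, zip_code, city)
    for stu_id, name, zip_code in student_detail
    for other_zip, city in zip_codes
    if zip_code == other_zip
}

print(sorted(zip_codes))     # [('Z1', 'City1'), ('Z2', 'City2')]
print(rejoined == original)  # True
```

Each city is now stored once per zip code instead of once per student, and no
information is lost by the split.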
Boyce-Codd Normal Form
• Boyce-Codd Normal Form (BCNF) is a stricter version of the Third Normal
Form.
• This form deals with a certain type of anomaly that is not handled by 3NF.
• A 3NF table that does not have multiple overlapping candidate keys is
guaranteed to be in BCNF.
• For a table R to be in BCNF, the following conditions must be satisfied:
o R must be in the 3rd Normal Form
o and, for each non-trivial functional dependency ( X → Y ), X should be a
Super Key.
Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key
in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
confirm that both relations are in BCNF.
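The BCNF condition can be tested with the attribute-closure algorithm: X is a
super key if the closure of X contains every attribute of the relation. The
function names closure and is_bcnf are hypothetical, and the check looks only
at the dependencies passed in, so each relation is given the dependencies that
apply to it.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(relation, fds):
    """True if the left side of every non-trivial dependency in fds is a super key."""
    relation = set(relation)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= relation:
            return False  # lhs does not determine the whole relation
    return True

unsplit = is_bcnf(
    {"Stu_ID", "Stu_Name", "Zip", "City"},
    [(["Stu_ID"], ["Stu_Name", "Zip"]), (["Zip"], ["City"])])
student_detail_ok = is_bcnf(
    {"Stu_ID", "Stu_Name", "Zip"}, [(["Stu_ID"], ["Stu_Name", "Zip"])])
zip_codes_ok = is_bcnf({"Zip", "City"}, [(["Zip"], ["City"])])

print(unsplit)            # False: Zip -> City, but Zip is not a super key
print(student_detail_ok)  # True
print(zip_codes_ok)       # True
```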
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
1. It is in the Boyce-Codd Normal Form.
2. And, it doesn't have any non-trivial Multi-Valued Dependency, other than
those whose determinant is a super key.
Fifth Normal Form (5NF)
• The fifth normal form is also called the PJNF - Project-Join Normal
Form
• It is the most advanced level of Database Normalization.
• A table in Fifth Normal Form has no join dependency other than those
implied by its candidate keys, which reduces data redundancy.
• It also helps in fixing Update anomalies in DBMS design.