0% found this document useful (0 votes)
7 views

Unit 1 Data Modeling

This document provides an overview of data modeling, focusing on the Entity-Relationship (ER) model, its components, and the importance of keys in relational databases. It explains how ER diagrams are used to visually represent database structures, including entities, attributes, and relationships, and discusses the extended ER model that incorporates additional complexities. Additionally, it outlines the various types of keys, including primary, candidate, super, and foreign keys, which are essential for uniquely identifying records and establishing relationships between tables.

Uploaded by

mohitdeo63
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

Unit 1 Data Modeling

This document provides an overview of data modeling, focusing on the Entity-Relationship (ER) model, its components, and the importance of keys in relational databases. It explains how ER diagrams are used to visually represent database structures, including entities, attributes, and relationships, and discusses the extended ER model that incorporates additional complexities. Additionally, it outlines the various types of keys, including primary, candidate, super, and foreign keys, which are essential for uniquely identifying records and establishing relationships between tables.

Uploaded by

mohitdeo63
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 51

Unit 1 Data Modeling

Data Modeling
• Entity Relationship (ER) Model
• Keys
• Extended ER Model
• Relational Model
• Codd's Rules
Data modeling
• A data model is a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints.
• In simple terms, Data models define how the logical structure of a database
is modeled that is Data models define how data is connected to each other
and how they are processed and stored inside the system.
• Data Models are fundamental entities to introduce abstraction in a DBMS.
• Categories of Data Models
• High-level or conceptual data models provide concepts that are close to the
way many users perceive data.
• whereas low-level or physical data models provide concepts that describe
the details of how data is stored in the computer.
Entity Relationship (ER) Model
• ER model stands for an Entity-Relationship model.
• An Entity–relationship model (ER model) describes the structure of a
database with the help of a diagram, which is known as Entity Relationship
Diagram (ER Diagram).
• An ER model is a design or blueprint of a database that can later be
implemented as a database.
• It is a high-level data model. This model is used to define the data elements
and relationship for a specified system.
• It helps developers to design the conceptual design or
you can say the logical design of the system from a data
perspective.
• In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.
Entity Relationship (ER) Model
• For example, Suppose we design a school database. In
this database, the student will be an entity with attributes
like address, name, id, age, etc. The address can be
another entity with attributes like city, street name,
pin code, etc and there will be a relationship between
them.
Why use ER Diagrams?
• Helps you to define terms related to entity relationship modeling
• Provide a preview of how all your tables should connect, what fields are
going to be on each table
• Helps to describe entities, attributes, relationships
• ER diagrams are translatable into relational tables which allows you to build
databases quickly
• ER diagrams can be used by database designers as a blueprint for
implementing data in specific software applications
• The database designer gains a better understanding of the information to be
contained in the database with the help of ERP diagram
• ERD Diagram allows you to communicate with the logical structure of the
database to users
ER Diagram Symbols

• Rectangles: This Entity Relationship Diagram symbol represents entity types


• Double Rectangles: Weak Entity Sets
• Ellipses : Symbol represent attributes
• Double Ellipses: Represent multi-valued attributes
• Dashed Ellipses: Derived Attributes
• Diamonds: This symbol represents relationship types
• Lines: It links attributes to entity types and entity types with other relationship
types
• Primary key: attributes are underlined
Entity Relationship (ER) Model
• ER diagrams are created
based on three basic
concepts: entities,
attributes, and
relationships.
Component of ER Diagram
• Entity:
• An entity may be any object, class,
person or place. In the ER diagram, an
entity can be represented as rectangles.
• For example: In the following ER
diagram we have two entities Student
and College and these two entities have
many to one relationship as many
students study in a single college.
• Strong Entity :
Component of ER Diagram
• Strong entities are those entity
• Weak Entity: types that have a key attribute.
• An entity that depends on another • The primary key helps in
entity called a weak entity. identifying each entity uniquely.
• The weak entity doesn't contain This can not accept null values.
any key attribute of its own. • It is represented by a rectangle.
• The weak entity is represented by a
double rectangle.
• For example – a bank account
cannot be uniquely identified
without knowing the bank to which
the account belongs, so bank
account is a weak entity.
Component of ER Diagram
• Attribute
• An attribute describes the property of an entity. An attribute is
represented as Oval/Ellipses in an ER diagram.
• For example, id, age, contact number, name, etc. can be attributes of a
student.
• There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
Component of ER Diagram
• Attribute
• 1. Key attribute:
• A key attribute can uniquely identify an
entity from an entity set.
• It represents a primary key.
• For example, student roll number can
uniquely identify a student from a set of
students. Key attribute is represented by oval
same as other attributes however the text of
key attribute is underlined.
Component of ER Diagram
• Attribute
• 2. Composite attribute:
• An attribute that is a combination of
other attributes is known as composite
attribute.
• An attribute that composed of many
other attributes is known as a composite
attribute.
• For example, In student entity, the
student address is a composite attribute
as an address is composed of other
attributes such as pin code, state,
country.
Component of ER Diagram
• Attribute
• 3. Multivalued attribute:
• An attribute that can hold multiple
values is known as multivalued
attribute/An attribute can have more
than one value.
• It is represented with double ovals in
an ER Diagram.
• For example – A student can have
more than one phone number so the
phone number attribute is
multivalued.
Component of ER Diagram
• Attribute
• 4. Derived attribute:
• A derived attribute is one whose value
is dynamic and derived from another
attribute.
• It is represented by dashed oval in an
ER Diagram.
• For example – Person age is a derived
attribute as it changes over time and
can be derived from another attribute
(Date of birth).
Component of ER Diagram
• Attribute
• A relationship is used to describe the
relation between entities. • 1. One to One Relationship
• A relationship is represented by • When a single instance of an
diamond shape in ER diagram. entity is associated with a single
instance of another entity then it
• There are four types of is called one to one relationship.
relationships/Cardinality
• For example, a person has only
1. One to One one passport and a passport is
2. One to Many given to one person.
3. Many to One
4. Many to Many
Component of ER Diagram
• Attribute
• 2. One to Many Relationship • 3. Many-to-one relationship
• When only one instance of the entity • When more than one instance of
on the left, and more than one instance the entity on the left, and only
of an entity on the right associates with one instance of an entity on the
the relationship then this is known as a right associates with the
one-to-many relationship. relationship then it is known as a
• For example, Scientist can invent many many-to-one relationship.
inventions, but the invention is done by • For example – many students
the only specific scientist. can study in a single college but
a student cannot study in many
colleges at the same time.
Component of ER Diagram
• 4. Many to Many Relationship
• When more than one instances of an entity is associated with more
than one instances of another entity then it is called many to many
relationship.
• For example, Employee can assign by many projects and project can
have many employees.
How to Create an Entity Relationship Diagram (ERD)

• In a university, a Student enrolls


in Courses. A student must be
assigned to at least one or more
Courses. Each course is taught by
a single Professor. To maintain
instruction quality, a Professor
can deliver only one course.
How to Create an Entity Relationship Diagram (ERD)

• Step 1) Entity Identification


• In a university, a Student
enrolls in Courses. A student • We have three entities
must be assigned to at least • Student
one or more Courses. Each • Course
course is taught by a single
Professor. To maintain • Professor
instruction quality, a
Professor can deliver only
one course.
How to Create an Entity Relationship Diagram (ERD)

• Step 2) Relationship Identification


• In a university, a Student
enrolls in Courses. A student • We have the following two relationships
must be assigned to at least • The student is assigned a course
one or more Courses. Each • Professor delivers a course
course is taught by a single
Professor. To maintain
instruction quality, a
Professor can deliver only
one course.
How to Create an Entity Relationship Diagram (ERD)

• Step 3) Cardinality Identification


• In a university, a Student
enrolls in Courses. A student • For them problem statement we know that,
must be assigned to at least • A student can be assigned multiple courses
one or more Courses. Each • A Professor can deliver only one course
course is taught by a single
Professor. To maintain
instruction quality, a
Professor can deliver only
one course.
How to Create an Entity Relationship Diagram (ERD)
Entity Primary Key Attribute
Student Student_ID StudentName
ProfessorNam
Professor Employee_ID
e
• Step 4) Identify Attributes Course Course_ID CourseName
• You need to study the files, forms, reports, data currently maintained by the organization to identify
attributes. You can also conduct interviews with various stakeholders to identify entities. Initially, it’s
important to identify the attributes without mapping them to a particular entity.
• Once, you have a list of Attributes, you need to map them to the identified entities. Once the mapping is
done, identify the primary Keys. If a unique key is not readily available, create one.
How to Create an Entity Relationship Diagram (ERD)
Entity Primary Key Attribute
Student Student_ID StudentName
ProfessorNam
Professor Employee_ID
e
• Step 5) Create the ERD Diagram Course Course_ID CourseName
Sample ER Diagram for Student Database
ER to Table
Conversion
Extended ER Model

• EER is a high-level data model that incorporates the


extensions to the original ER model. Enhanced ERD are
high level models that represent the requirements and
complexities of complex database.
• In addition to ER model concepts EE-R includes −
• Subclasses and Super classes.
• Specialization and Generalization.
• Category or union type.
• Aggregation.
Extended ER Model

• Subclasses and Super class


• Super class is an entity that can be divided into further subtype.
• For example − consider Shape super class.
• Super class shape has sub groups: Triangle, Square and Circle.
• Sub classes are the group of entities with some unique attributes.
• Sub class inherits the properties and attributes from super class.
Extended ER Model
• Generalization
• Generalization is like a bottom-up approach in which
two or more entities of lower level combine to form a
higher level entity if they have some attributes in
common.
• Generalization is more like subclass and superclass
system, but the only difference is the approach.
Generalization uses the bottom-up approach.
• For example, Faculty and Student entities can be
generalized and create a higher level entity Person.
Extended ER Model
• Specialization
• Specialization is a top-down approach, and it is opposite to
Generalization. In specialization, one higher level entity can be
broken down into two lower level entities.
• Specialization is used to identify the subset of an entity set
that shares some distinguishing characteristics.
• Normally, the superclass is defined first, the subclass and its
related attributes are defined next, and relationship set are then
added.
• For example: In an Employee management system, EMPLOYEE
entity can be specialized as TESTER or DEVELOPER based on
what role they play in the company.
Extended ER Model
• Aggregation
• Aggregation is a process in which a single entity alone is not able to
make sense in a relationship so the relationship of two entities acts as
one entity.
• Aggregation can be defined as a procedure for combining multiple
entities into a single one. In database management, it is a design system
performed to model relationships between a group of entities and
another relationship. Its main motive is treating these relationships as a
single one.
• In real world, we know that a manager not only manages the employee working
under them but he has to manage the project as well. In such scenario if entity
“Manager” makes a “manages” relationship with either “Employee” or “Project”
entity alone then it will not make any sense because he has to manage both. In
these cases the relationship of two entities acts as one entity. In our example, the
relationship “Works-On” between “Employee” & “Project” acts as one entity that
has a relationship “Manages” with the entity “Manager”.
Extended ER Model
• Category or Union
• Relationship of one super or sub class with more than
one super class.

• Owner is the subset of two super class: Vehicle and House.


ER Model: Keys
• Keys are very important part of Relational database model. They are used to
establish and identify relationships between tables and also to uniquely identify any
record or row of data inside a table.
• A Key can be a single attribute or a group of attributes, where the combination may
act as a key.
• Why are the Keys Required?
• It is used to uniquely identify any record or row of data from the table.
• A table in a database represents a collection of records. Tables generally extends to
thousands of records stored in them, unsorted and unorganized.
• Now to fetch any particular record from such dataset, you will have to apply some
conditions, but what if there is duplicate data present and every time you try to
fetch some data by applying certain condition, you get the wrong data. How many
trials before you get the right data?
• To avoid all this, Keys are defined to easily identify any row of data in a table.
ER Model : Types of Keys

• Primary Key
• It is the first key used to identify one and only one
instance of an entity uniquely. An entity can contain
multiple keys.
• In the EMPLOYEE table, ID can be the primary key
since it is unique for each employee. In the EMPLOYEE
table, we can even select License_Number and
Passport_Number as primary keys since they are also
unique.
• For each entity, the primary key selection is based on
requirements and developers.
ER Model : Types of Keys

• 2. Candidate key
• A candidate key is an attribute or set of attributes that can
uniquely identify a tuple.
• Except for the primary key, the remaining attributes are
considered a candidate key. The candidate keys are as
strong as the primary key.
• For example: In the EMPLOYEE table, id is best suited for
the primary key. The rest of the attributes, like SSN,
Passport_Number, License_Number, etc., are considered a
candidate key.
• Primary key column value can not be null, Candidate key
column can have null value.
ER Model : Types of Keys

• 3. Super Key
• Super key is an attribute set that can uniquely identify
a tuple. A super key is a superset of a candidate key.
• The super key would be EMPLOYEE-ID (EMPLOYEE_ID,
EMPLOYEE-NAME), etc.
• It is the superset of all such attributes that can
uniquely identify the table. It is the subset or the part
of the Super key.
ER Model : Types of Keys

• 4. Foreign key
• Foreign keys are the column of the table used to point
to the primary key of another table.
• Every employee works in a specific department in a
company, and employee and department are two
different entities. So we can't store the department's
information in the employee table. That's why we link
these two tables through the primary key of one table.
• We add the primary key of the DEPARTMENT table,
Department_Id, as a new attribute in the EMPLOYEE
table.
• In the EMPLOYEE table, Department_Id is the foreign
key, and both the tables are related.
ER Model : Types of Keys

• 5. Alternate key
• The candidate key which are not selected as
primary key are known as secondary keys or
alternative keys.
• One key is chosen as the primary key from candidate
keys, and the remaining candidate key, if it exists, is
termed the alternate key. In other words, the total
number of the alternate keys is the total number of
candidate keys minus the primary key.
• For example, employee relation has two attributes,
Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the
primary key, so the other candidate key, PAN_No,
acts as the Alternate key.
ER Model : Types of Keys

• 6. Composite key
• Whenever a primary key consists of more than
one attribute, it is known as a composite key.
This key is also known as Concatenated Key.
• But the attributes which together form
the Composite key are not a key independently
or individually.
• For example, Score table which stores the marks
scored by a student in a particular subject.
• In this table student_id and subject_id together
will form the primary key, hence it is a composite
key.
ER Model : Types of Keys

• 7. Artificial key
• The key created using arbitrarily assigned data
are known as artificial keys. These keys are
created when a primary key is large and complex
and has no relationship with many other
relations.
• The data values of the artificial keys are usually
numbered in a serial order.
• For example, the primary key, which is composed
of Emp_ID, Emp_role, and Proj_ID, is large in
employee relations. So it would be better to add
a new virtual attribute to identify each tuple in
the relation uniquely.
Relational Model
• Relational Model was proposed by E.F. Codd to model data in the
form of relations or tables.
• After designing the conceptual model of Database using ER diagram,
we need to convert the conceptual model in the relational model
which can be implemented using any RDBMS languages like Oracle
SQL, MySQL etc.
• Relational Model represents how data is stored in Relational
Databases. A relational database stores data in the form of relations
(tables).
ROLL ADDRES AG
NAME PHONE
_NO S E

Relational Model 1 RAM DELHI

GURGAO
9455123451 18

2 RAMESH 9652431543 18
N

• Consider a relation STUDENT with attributes 3 SUJIT ROHTAK 9156253131 20

• ROLL_NO, NAME, ADDRESS, PHONE and AGE shown4 in Table. SURESH DELHI 18

• IMPORTANT TERMINOLOGIES
• Attribute: Attributes are the properties that define a relation.
e.g.; ROLL_NO, NAME
• Relation Schema: A relation schema represents name of the relation with its
attributes. e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is
relation schema for STUDENT. If a schema has more than 1 relation, it is called
Relational Schema.
• Tuple: Each row in the relation is known as tuple. The above relation contains
945512345
4 tuples, one of which is shown as: 1 RAM DELHI
1
18
ROLL ADDRES AG
NAME PHONE
_NO S E

Relational Model 1 RAM DELHI 9455123451 18

• Relation Instance: The set of tuples of a relation at a GURGAO


particular instance of time is called as relation instance. Table 2 RAMESH
N
9652431543 18

shows the relation instance of STUDENT at a particular time. It 3 SUJIT ROHTAK 9156253131 20
can change whenever there is insertion, deletion or updation
in the database. 4 SURESH DELHI 18

• Degree: The number of attributes in the relation is known as


degree of the relation. The STUDENT relation defined above ROLL_N
has degree 5. O

• Cardinality: The number of tuples in a relation is known as 1


cardinality. The STUDENT relation defined above has
cardinality 4. 2

• Column: Column represents the set of values for a particular 3

attribute. The column ROLL_NO is extracted from relation 4


STUDENT.
• NULL Values: The value which is not known or unavailable is
called NULL value. It is represented by blank space. e.g.;
PHONE of STUDENT having ROLL_NO 4 is NULL.
ROLL ADDRES AG
NAME PHONE
_NO S E

Relational Model 1 RAM DELHI 9455123451 18


• Constraints in Relational Model
GURGAO
2 RAMESH 9652431543 18
• While designing Relational Model, we define some conditions N

which must hold for data present in database are called 3 SUJIT ROHTAK 9156253131 20
Constraints. These constraints are checked before performing
any operation (insertion, deletion and updation) in database. 4 SURESH DELHI 18

If there is a violation in any of constrains, operation will fail.

• Domain Constraints: These are attribute level constraints. An attribute can only take
values which lie inside the domain range. e.g,; If a constrains AGE>0 is applied on
STUDENT relation, inserting negative value of AGE will result in failure.
• Key Integrity: Every relation in the database should have at least one set of attributes
which defines a tuple uniquely. Those set of attributes is called key. e.g.; ROLL_NO in
STUDENT is a key. No two students can have same roll number. So a key has two
properties:
• It should be unique for all tuples.
• It can’t have NULL values.
• Constraints in Relational Model
Relational Model
• Referential Integrity: When one attribute of a relation can only take values
from other attribute of same relation or any other relation, it is called
referential integrity. Let us suppose we have 2 relations
STUDENT
ROLL_NO NAME ADDRESS PHONE AGE
BRANCH_CO BRANCH
DE
BRANCH_CO
BRANCH_NAME
DE
1 RAM DELHI 9455123451 18 CS
CS COMPUTER SCIENCE
RAMES
2 GURGAON 9652431543 18 CS
H
IT INFORMATION TECHNOLOGY

3 SUJIT ROHTAK 9156253131 20 ECE


ELECTRONICS AND COMMUNICATION
ECE
ENGINEERING
SURES
4 DELHI 18 IT
H
CV CIVIL ENGINEERING

• BRANCH_CODE of STUDENT can only take the values which are present in BRANCH_CODE of
BRANCH which is called referential integrity constraint.
• The relation which is referencing to other relation is called REFERENCING RELATION (STUDENT
in this case) and the relation to which other relations refer is called REFERENCED RELATION
(BRANCH in this case).
Relational • ANOMALIES
• An anomaly is an irregularity, or something which deviates from the expected or normal state.
Model When designing databases, we identify three types of anomalies: Insert, Update and Delete.
• Insertion Anomaly in Referencing Relation:
• We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not present in
REFERENCING referenced attribute value. e.g.; Insertion of a student with BRANCH_CODE ‘ME’ in STUDENT
relation will result in error because ‘ME’ is not present in BRANCH_CODE of BRANCH.
RELATION
(STUDENT in • Deletion/ Updation Anomaly in Referenced Relation:
this case) • We can’t delete or update a row from REFERENCED RELATION if value of REFERENCED
ATTRIBUTE is used in value of REFERENCING ATTRIBUTE. e.g; if we try to delete tuple from
BRANCH having BRANCH_CODE ‘CS’, it will result in error because ‘CS’ is referenced by
REFERENCED BRANCH_CODE of STUDENT, but if we try to delete the row from BRANCH with BRANCH_CODE
CV, it will be deleted as the value is not been used by referencing relation. It can be handled by
RELATION following method:
(BRANCH in • ON DELETE CASCADE: It will delete the tuples from REFERENCING RELATION if value used by
REFERENCING ATTRIBUTE is deleted from REFERENCED RELATION. e.g;, if we delete a row from
this case) BRANCH with BRANCH_CODE ‘CS’, the rows in STUDENT relation with BRANCH_CODE CS
(ROLL_NO 1 and 2 in this case) will be deleted.
• ON UPDATE CASCADE: It will update the REFERENCING ATTRIBUTE in REFERENCING RELATION if
attribute value used by REFERENCING ATTRIBUTE is updated in REFERENCED RELATION. e.g;, if
we update a row from BRANCH with BRANCH_CODE ‘CS’ to ‘CSE’, the rows in STUDENT relation
with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be updated with BRANCH_CODE
‘CSE’.
Codd's Rules
• E.F Codd was a Computer Scientist who
invented the Relational model for Database
management. Based on relational model,
the Relational database was created.
• Dr Edgar F. Codd, after his extensive research
on the Relational Model of database systems,
came up with twelve rules of his own, which
according to him, a database must obey in
order to be regarded as a true relational
database.
• These rules can be applied on any database
system that manages stored data using only
its relational capabilities.
• This is a foundation rule, which acts as a base
for all the other rules.
Codd's Rules
• Rule 0: The Foundation Rule
• The database must be in relational form. So that the system can handle the database
through its relational capabilities.
• Rule 1: Information Rule
• A database contains various information, and this information must be stored in each
cell of a table in the form of rows and columns.
• Rule 2: Guaranteed Access Rule
• Every single or precise data (atomic value) may be accessed logically from a relational
database using the combination of primary key value, table name, and column
name.
• Rule 3: Systematic Treatment of Null Values
• This rule defines the systematic treatment of Null values in database records. The null
value has various meanings in the database, like missing the data, no value in a cell,
inappropriate information, unknown data and the primary key should not be null.
Codd's Rules
• Rule 4: Active/Dynamic Online Catalog based on the relational model
• It represents the entire logical structure of the descriptive database that must be stored online and
is known as a database dictionary. It authorizes users to access the database and implement a similar
query language to access the database.
• Rule 5: Comprehensive Data Sub Language Rule
• The relational database supports various languages, and if we want to access the database, the
language must be the explicit, linear or well-defined syntax, character strings and supports the
comprehensive: data definition, view definition, data manipulation, integrity constraints, and limit
transaction management operations. If the database allows access to the data without any language,
it is considered a violation of the database.
• Rule 6: View Updating Rule
• All views table can be theoretically updated and must be practically updated by the database
systems.
• Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
• A database system should follow high-level relational operations such as insert, update, and delete
in each level or a single row. It also supports union, intersection and minus operation in the
database system.
Codd's Rules
• Rule 8: Physical Data Independence Rule
• All stored data in a database or an application must be physically independent to access the
database. Each data should not depend on other data or an application. If data is updated
or the physical structure of the database is changed, it will not show any effect on
external applications that are accessing the data from the database.
• Rule 9: Logical Data Independence Rule
• It is similar to physical data independence. It means, if any changes occurred to the logical
level (table structures), it should not affect the user's view (application). For example,
suppose a table either split into two tables, or two table joins to create a single table,
these changes should not be impacted on the user view application.
• Rule 10: Integrity Independence Rule
• The database should be able to enforce its own integrity rather than using other programs.
Key and Check constraints, trigger etc, should be stored in Data Dictionary. All entered
values should not be changed or rely on any external factor or application to maintain
integrity. This also make RDBMS independent of front-end.
Codd's Rules
• Rule 11: Distribution Independence Rule
• The distribution independence rule represents a database that must work
properly, even if it is stored in different locations and used by different end-
users. Suppose a user accesses the database through an application; in that case,
they should not be aware that another user uses particular data, and the data
they always get is only located on one site. This lays the foundation
of distributed database.
• Rule 12: Non Subversion Rule
• The non-subversion rule defines RDBMS as a SQL language to store and
manipulate the data in the database. If a system has a low-level or separate
language other than SQL to access the database system, it should not subvert
or bypass integrity to transform data.

You might also like