Unit 1 Data Modeling
Unit 1 Data Modeling
Data Modeling
• Entity Relationship (ER) Model
• Keys
• Extended ER Model
• Relational Model
• Codd's Rules
Data modeling
• A data model is a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints.
• In simple terms, Data models define how the logical structure of a database
is modeled that is Data models define how data is connected to each other
and how they are processed and stored inside the system.
• Data Models are fundamental entities to introduce abstraction in a DBMS.
• Categories of Data Models
• High-level or conceptual data models provide concepts that are close to the
way many users perceive data.
• whereas low-level or physical data models provide concepts that describe
the details of how data is stored in the computer.
Entity Relationship (ER) Model
• ER model stands for an Entity-Relationship model.
• An Entity–relationship model (ER model) describes the structure of a
database with the help of a diagram, which is known as Entity Relationship
Diagram (ER Diagram).
• An ER model is a design or blueprint of a database that can later be
implemented as a database.
• It is a high-level data model. This model is used to define the data elements
and relationship for a specified system.
• It helps developers to design the conceptual design or
you can say the logical design of the system from a data
perspective.
• In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.
Entity Relationship (ER) Model
• For example, Suppose we design a school database. In
this database, the student will be an entity with attributes
like address, name, id, age, etc. The address can be
another entity with attributes like city, street name,
pin code, etc and there will be a relationship between
them.
Why use ER Diagrams?
• Helps you to define terms related to entity relationship modeling
• Provide a preview of how all your tables should connect, what fields are
going to be on each table
• Helps to describe entities, attributes, relationships
• ER diagrams are translatable into relational tables which allows you to build
databases quickly
• ER diagrams can be used by database designers as a blueprint for
implementing data in specific software applications
• The database designer gains a better understanding of the information to be
contained in the database with the help of ERP diagram
• ERD Diagram allows you to communicate with the logical structure of the
database to users
ER Diagram Symbols
• Primary Key
• It is the first key used to identify one and only one
instance of an entity uniquely. An entity can contain
multiple keys.
• In the EMPLOYEE table, ID can be the primary key
since it is unique for each employee. In the EMPLOYEE
table, we can even select License_Number and
Passport_Number as primary keys since they are also
unique.
• For each entity, the primary key selection is based on
requirements and developers.
ER Model : Types of Keys
• 2. Candidate key
• A candidate key is an attribute or set of attributes that can
uniquely identify a tuple.
• Except for the primary key, the remaining attributes are
considered a candidate key. The candidate keys are as
strong as the primary key.
• For example: In the EMPLOYEE table, id is best suited for
the primary key. The rest of the attributes, like SSN,
Passport_Number, License_Number, etc., are considered a
candidate key.
• Primary key column value can not be null, Candidate key
column can have null value.
ER Model : Types of Keys
• 3. Super Key
• Super key is an attribute set that can uniquely identify
a tuple. A super key is a superset of a candidate key.
• The super key would be EMPLOYEE-ID (EMPLOYEE_ID,
EMPLOYEE-NAME), etc.
• It is the superset of all such attributes that can
uniquely identify the table. It is the subset or the part
of the Super key.
ER Model : Types of Keys
• 4. Foreign key
• Foreign keys are the column of the table used to point
to the primary key of another table.
• Every employee works in a specific department in a
company, and employee and department are two
different entities. So we can't store the department's
information in the employee table. That's why we link
these two tables through the primary key of one table.
• We add the primary key of the DEPARTMENT table,
Department_Id, as a new attribute in the EMPLOYEE
table.
• In the EMPLOYEE table, Department_Id is the foreign
key, and both the tables are related.
ER Model : Types of Keys
• 5. Alternate key
• The candidate key which are not selected as
primary key are known as secondary keys or
alternative keys.
• One key is chosen as the primary key from candidate
keys, and the remaining candidate key, if it exists, is
termed the alternate key. In other words, the total
number of the alternate keys is the total number of
candidate keys minus the primary key.
• For example, employee relation has two attributes,
Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the
primary key, so the other candidate key, PAN_No,
acts as the Alternate key.
ER Model : Types of Keys
• 6. Composite key
• Whenever a primary key consists of more than
one attribute, it is known as a composite key.
This key is also known as Concatenated Key.
• But the attributes which together form
the Composite key are not a key independently
or individually.
• For example, Score table which stores the marks
scored by a student in a particular subject.
• In this table student_id and subject_id together
will form the primary key, hence it is a composite
key.
ER Model : Types of Keys
• 7. Artificial key
• The key created using arbitrarily assigned data
are known as artificial keys. These keys are
created when a primary key is large and complex
and has no relationship with many other
relations.
• The data values of the artificial keys are usually
numbered in a serial order.
• For example, the primary key, which is composed
of Emp_ID, Emp_role, and Proj_ID, is large in
employee relations. So it would be better to add
a new virtual attribute to identify each tuple in
the relation uniquely.
Relational Model
• Relational Model was proposed by E.F. Codd to model data in the
form of relations or tables.
• After designing the conceptual model of Database using ER diagram,
we need to convert the conceptual model in the relational model
which can be implemented using any RDBMS languages like Oracle
SQL, MySQL etc.
• Relational Model represents how data is stored in Relational
Databases. A relational database stores data in the form of relations
(tables).
ROLL ADDRES AG
NAME PHONE
_NO S E
GURGAO
9455123451 18
2 RAMESH 9652431543 18
N
• ROLL_NO, NAME, ADDRESS, PHONE and AGE shown4 in Table. SURESH DELHI 18
• IMPORTANT TERMINOLOGIES
• Attribute: Attributes are the properties that define a relation.
e.g.; ROLL_NO, NAME
• Relation Schema: A relation schema represents name of the relation with its
attributes. e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is
relation schema for STUDENT. If a schema has more than 1 relation, it is called
Relational Schema.
• Tuple: Each row in the relation is known as tuple. The above relation contains
945512345
4 tuples, one of which is shown as: 1 RAM DELHI
1
18
ROLL ADDRES AG
NAME PHONE
_NO S E
shows the relation instance of STUDENT at a particular time. It 3 SUJIT ROHTAK 9156253131 20
can change whenever there is insertion, deletion or updation
in the database. 4 SURESH DELHI 18
which must hold for data present in database are called 3 SUJIT ROHTAK 9156253131 20
Constraints. These constraints are checked before performing
any operation (insertion, deletion and updation) in database. 4 SURESH DELHI 18
• Domain Constraints: These are attribute level constraints. An attribute can only take
values which lie inside the domain range. e.g,; If a constrains AGE>0 is applied on
STUDENT relation, inserting negative value of AGE will result in failure.
• Key Integrity: Every relation in the database should have at least one set of attributes
which defines a tuple uniquely. Those set of attributes is called key. e.g.; ROLL_NO in
STUDENT is a key. No two students can have same roll number. So a key has two
properties:
• It should be unique for all tuples.
• It can’t have NULL values.
• Constraints in Relational Model
Relational Model
• Referential Integrity: When one attribute of a relation can only take values
from other attribute of same relation or any other relation, it is called
referential integrity. Let us suppose we have 2 relations
STUDENT
ROLL_NO NAME ADDRESS PHONE AGE
BRANCH_CO BRANCH
DE
BRANCH_CO
BRANCH_NAME
DE
1 RAM DELHI 9455123451 18 CS
CS COMPUTER SCIENCE
RAMES
2 GURGAON 9652431543 18 CS
H
IT INFORMATION TECHNOLOGY
• BRANCH_CODE of STUDENT can only take the values which are present in BRANCH_CODE of
BRANCH which is called referential integrity constraint.
• The relation which is referencing to other relation is called REFERENCING RELATION (STUDENT
in this case) and the relation to which other relations refer is called REFERENCED RELATION
(BRANCH in this case).
Relational • ANOMALIES
• An anomaly is an irregularity, or something which deviates from the expected or normal state.
Model When designing databases, we identify three types of anomalies: Insert, Update and Delete.
• Insertion Anomaly in Referencing Relation:
• We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not present in
REFERENCING referenced attribute value. e.g.; Insertion of a student with BRANCH_CODE ‘ME’ in STUDENT
relation will result in error because ‘ME’ is not present in BRANCH_CODE of BRANCH.
RELATION
(STUDENT in • Deletion/ Updation Anomaly in Referenced Relation:
this case) • We can’t delete or update a row from REFERENCED RELATION if value of REFERENCED
ATTRIBUTE is used in value of REFERENCING ATTRIBUTE. e.g; if we try to delete tuple from
BRANCH having BRANCH_CODE ‘CS’, it will result in error because ‘CS’ is referenced by
REFERENCED BRANCH_CODE of STUDENT, but if we try to delete the row from BRANCH with BRANCH_CODE
CV, it will be deleted as the value is not been used by referencing relation. It can be handled by
RELATION following method:
(BRANCH in • ON DELETE CASCADE: It will delete the tuples from REFERENCING RELATION if value used by
REFERENCING ATTRIBUTE is deleted from REFERENCED RELATION. e.g;, if we delete a row from
this case) BRANCH with BRANCH_CODE ‘CS’, the rows in STUDENT relation with BRANCH_CODE CS
(ROLL_NO 1 and 2 in this case) will be deleted.
• ON UPDATE CASCADE: It will update the REFERENCING ATTRIBUTE in REFERENCING RELATION if
attribute value used by REFERENCING ATTRIBUTE is updated in REFERENCED RELATION. e.g;, if
we update a row from BRANCH with BRANCH_CODE ‘CS’ to ‘CSE’, the rows in STUDENT relation
with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be updated with BRANCH_CODE
‘CSE’.
Codd's Rules
• E.F Codd was a Computer Scientist who
invented the Relational model for Database
management. Based on relational model,
the Relational database was created.
• Dr Edgar F. Codd, after his extensive research
on the Relational Model of database systems,
came up with twelve rules of his own, which
according to him, a database must obey in
order to be regarded as a true relational
database.
• These rules can be applied on any database
system that manages stored data using only
its relational capabilities.
• This is a foundation rule, which acts as a base
for all the other rules.
Codd's Rules
• Rule 0: The Foundation Rule
• The database must be in relational form. So that the system can handle the database
through its relational capabilities.
• Rule 1: Information Rule
• A database contains various information, and this information must be stored in each
cell of a table in the form of rows and columns.
• Rule 2: Guaranteed Access Rule
• Every single or precise data (atomic value) may be accessed logically from a relational
database using the combination of primary key value, table name, and column
name.
• Rule 3: Systematic Treatment of Null Values
• This rule defines the systematic treatment of Null values in database records. The null
value has various meanings in the database, like missing the data, no value in a cell,
inappropriate information, unknown data and the primary key should not be null.
Codd's Rules
• Rule 4: Active/Dynamic Online Catalog based on the relational model
• It represents the entire logical structure of the descriptive database that must be stored online and
is known as a database dictionary. It authorizes users to access the database and implement a similar
query language to access the database.
• Rule 5: Comprehensive Data Sub Language Rule
• The relational database supports various languages, and if we want to access the database, the
language must be the explicit, linear or well-defined syntax, character strings and supports the
comprehensive: data definition, view definition, data manipulation, integrity constraints, and limit
transaction management operations. If the database allows access to the data without any language,
it is considered a violation of the database.
• Rule 6: View Updating Rule
• All views table can be theoretically updated and must be practically updated by the database
systems.
• Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule
• A database system should follow high-level relational operations such as insert, update, and delete
in each level or a single row. It also supports union, intersection and minus operation in the
database system.
Codd's Rules
• Rule 8: Physical Data Independence Rule
• All stored data in a database or an application must be physically independent to access the
database. Each data should not depend on other data or an application. If data is updated
or the physical structure of the database is changed, it will not show any effect on
external applications that are accessing the data from the database.
• Rule 9: Logical Data Independence Rule
• It is similar to physical data independence. It means, if any changes occurred to the logical
level (table structures), it should not affect the user's view (application). For example,
suppose a table either split into two tables, or two table joins to create a single table,
these changes should not be impacted on the user view application.
• Rule 10: Integrity Independence Rule
• The database should be able to enforce its own integrity rather than using other programs.
Key and Check constraints, trigger etc, should be stored in Data Dictionary. All entered
values should not be changed or rely on any external factor or application to maintain
integrity. This also make RDBMS independent of front-end.
Codd's Rules
• Rule 11: Distribution Independence Rule
• The distribution independence rule represents a database that must work
properly, even if it is stored in different locations and used by different end-
users. Suppose a user accesses the database through an application; in that case,
they should not be aware that another user uses particular data, and the data
they always get is only located on one site. This lays the foundation
of distributed database.
• Rule 12: Non Subversion Rule
• The non-subversion rule defines RDBMS as a SQL language to store and
manipulate the data in the database. If a system has a low-level or separate
language other than SQL to access the database system, it should not subvert
or bypass integrity to transform data.