Unit 1
What is Data?
Data is a collection of facts and statistics, either stored or flowing over a
network; it is generally raw and unprocessed. For example, when you
visit a website, it might store your IP address; that is data. In return
it might add a cookie to your browser, marking that you visited
the website; that is data too. Your name is data, and so is your age.
Data becomes information when it is processed into
something meaningful. For example, if a website analyses the cookie data saved
in users' browsers and finds that men aged 20-25 visit
it most often, that finding is information derived from the data collected.
What is a Database?
A database is a collection of related data organised so that the data
can be easily accessed, managed and updated. A database can be software
based or hardware based, with one sole purpose: storing data.
In simple words, a database is a place where data is stored. The best
analogy is a library. A library contains a huge collection of books of different
genres; here the library is the database and the books are the data.
In the early days of computing, data was collected and stored on tapes,
which were sequential-access media: to reach one record, the tape had to
be read past everything stored before it. Tapes were slow and bulky, and
computer scientists soon realised that they needed a better solution to this
problem.
Larry Ellison, the co-founder of Oracle, was among the first few
who realised the need for a software-based Database Management
System.
What is DBMS?
DBMS stands for Database Management System. We can break it
down like this: DBMS = Database + Management System. A database is a
collection of data, and a management system is a set of programs to
store and retrieve that data. Based on this, we can define a
DBMS like this:
A DBMS is software that allows the creation, definition and
manipulation of databases, allowing users to store, process and analyse
data easily. A DBMS provides an interface or tool to perform
various operations such as creating a database, storing data in it, updating
data, creating tables in the database and a lot more.
A DBMS also provides protection and security for its databases, and it
maintains data consistency when multiple users access the data.
Here are some examples of popular DBMS used these days:
MySQL
Oracle
SQL Server
IBM DB2
PostgreSQL
Amazon SimpleDB (cloud based) etc.
What is the need of DBMS?
Database systems are developed mainly to handle large amounts of
data. When dealing with a huge amount of data, two things
require optimization: storage of data and retrieval of data.
Storage: According to the principles of database systems,
data is stored in such a way that it occupies much less space, because
redundant (duplicate) data is removed before storage.
Let's take a simple example to understand this:
In a banking system, suppose a customer has two accounts:
a savings account and a salary account. Let's say the
bank stores savings account data in one place (these places are
called tables; we will learn about them later) and salary account data in
another place. If customer information such as
name and address is stored in both places, storage is
wasted (redundancy, or duplication of data). To
organize the data better, the information should be stored
in one place, and both accounts should be linked to that
information. This is what a DBMS lets us achieve.
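The banking example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (customer, account), the column names and the sample customer are assumptions made up for the sketch, not part of any real banking schema. The customer's name and address are stored once, and each account row only carries a link (customer_id) back to it.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customer details are stored exactly once (hypothetical schema).
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    -- Each account only stores a link to its customer.
    CREATE TABLE account (
        account_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        account_type TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Asha', '12 Lake Road');
    INSERT INTO account VALUES (101, 1, 'savings');
    INSERT INTO account VALUES (102, 1, 'salary');
""")

# Joining the two tables recovers the full picture for every account.
rows = conn.execute("""
    SELECT a.account_type, c.name, c.address
    FROM account AS a
    JOIN customer AS c ON c.customer_id = a.customer_id
    ORDER BY a.account_id
""").fetchall()

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rows)
print(customer_count)
```

Both accounts show the same name and address, yet the customer table holds a single row, so a change of address needs to be made in only one place.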
Fast Retrieval of data: Along with storing the data in an
optimized and systematic manner, it is also important that we
retrieve the data quickly when needed. Database systems ensure
that the data is retrieved as quickly as possible.
Advantages of DBMS
Segregation of application programs from the data.
Minimal data duplication (data redundancy).
Easy retrieval of data using a query language.
Reduced development time and maintenance needs.
With cloud data centers, we now have database management
systems capable of storing very large volumes of data.
Seamless integration with application programming languages,
which makes it much easier to add a database to almost any
application or website.
Disadvantages of DBMS
Complexity.
Except for open-source systems such as MySQL and PostgreSQL, licensed
DBMSs are generally costly.
They are large in size.
DBMS vs. File System
1. Hardware
2. Software
3. Data
4. Procedures
5. Database Access Language
Let's have a simple diagram to see how they all fit together to form a database
management system.
DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers,
database servers and other components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation
which are connected via the network.
o DBMS architecture depends upon how users are connected to the
database to get their request done.
Types of DBMS Architecture
1-Tier Architecture
In this architecture, the database is directly available to the user. It means the
user can work directly on the DBMS and use it.
Any changes made here are applied directly to the database itself. It doesn't
provide a handy tool for end users.
The 1-Tier architecture is used for local application development, where
programmers can communicate directly with the database for a quick
response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server model. In the
two-tier architecture, applications on the client end can directly communicate
with the database at the server side. For this interaction, APIs
such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query
processing and transaction management.
o To communicate with the DBMS, the client-side application establishes a
connection with the server side.
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and
the server. In this architecture, the client can't directly communicate with the
server.
o The application on the client end interacts with an application server,
which in turn communicates with the database system.
o The end user has no idea about the existence of the database beyond the
application server. The database also has no idea about any user
beyond the application.
o The 3-Tier architecture is used for large web applications.
Fig: 3-tier Architecture
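The layering can be imitated inside a single Python program. This is a toy sketch, not a real deployment: in practice the three tiers usually run on separate machines, and the function names (client_show_student, app_server_get_student) and the student table are hypothetical. The point is only that the client code never touches the database connection.

```python
import sqlite3

# Data tier: only the application tier below uses this connection.
_db = sqlite3.connect(":memory:")
_db.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
_db.execute("INSERT INTO student VALUES (1, 'Mira')")


def app_server_get_student(student_id):
    """Application tier: validates the request, then queries the database."""
    if not isinstance(student_id, int) or student_id <= 0:
        return {"error": "invalid student id"}
    row = _db.execute(
        "SELECT name FROM student WHERE student_id = ?", (student_id,)
    ).fetchone()
    return {"name": row[0]} if row else {"error": "not found"}


def client_show_student(student_id):
    """Presentation tier: talks only to the application tier."""
    reply = app_server_get_student(student_id)
    return reply.get("name", reply.get("error"))


print(client_show_student(1))   # Mira
print(client_show_student(2))   # not found
print(client_show_student(-1))  # invalid student id
```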
Logical level: This is the middle level of 3-level data abstraction architecture. It
describes what data is stored in database.
View level: Highest level of data abstraction. This level describes the user
interaction with database system.
Database Schema
A database schema is the skeleton structure that represents the logical view of the
entire database. It defines how the data is organized and how the relations among
the data are associated. It formulates all the constraints that are to be applied to
the data.
A database schema defines its entities and the relationships among them. It contains
descriptive detail of the database, which can be depicted by means of schema
diagrams. Database designers design the schema to help programmers
understand the database and make it useful.
Fig 1.5(a)
2. Physical data independence is the capacity to change the internal schema without
having to change the conceptual schema. Hence, the external schemas need not be
changed as well. Changes to the internal schema may be needed because some
physical files were reorganized—for example, by creating additional access
structures—to improve the performance of retrieval or update. If the same data as
before remains in the database, we should not have to change the conceptual
schema. For example, providing an access path to improve retrieval speed of section
records (Figure 1.2) by semester and year should not require a query such as list all
sections offered in fall 2008 to be changed, although the query would be executed
more efficiently by the DBMS by utilizing the new access path.
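A small experiment with Python's built-in sqlite3 module illustrates the idea. The section table, its columns and its rows are invented for this sketch (they are not the contents of Figure 1.2), and SQLite stands in for a generic DBMS. The query text stays exactly the same before and after an index (an access path) is created; only the plan the DBMS chooses may change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE section ("
    "section_id INTEGER PRIMARY KEY, course TEXT, semester TEXT, year INTEGER)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?, ?)",
    [
        (85, "MATH2410", "Fall", 2007),
        (92, "CS1310", "Spring", 2008),
        (112, "MATH2410", "Fall", 2008),
        (119, "CS1310", "Fall", 2008),
    ],
)

# "List all sections offered in fall 2008": this text is never edited.
QUERY = (
    "SELECT section_id FROM section "
    "WHERE semester = 'Fall' AND year = 2008 ORDER BY section_id"
)

before = conn.execute(QUERY).fetchall()
plan_before = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

# Change at the internal (physical) level: add an access path.
conn.execute("CREATE INDEX idx_section_term ON section (semester, year)")

after = conn.execute(QUERY).fetchall()
plan_after = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

print(before == after)  # True: same answer from the same query
print(plan_before)      # plan chosen without the index
print(plan_after)       # plan chosen once the index exists
```

The exact plan text depends on the SQLite version, so it is printed rather than assumed here.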
Generally, physical data independence exists in most databases and file
environments where physical details such as the exact location of data on disk, and
hardware details of storage encoding, placement, compression, splitting, merging of
records, and so on are hidden from the user. Applications remain unaware of these
details. On the other hand, logical data independence is harder to achieve because it
allows structural and constraint changes without affecting application programs—a
much stricter requirement.
Whenever we have a multiple-level DBMS, its catalog must be expanded to include
information on how to map requests and data among the various levels. The DBMS
uses additional software to accomplish these mappings by referring to the mapping
information in the catalog. Data independence occurs because when the schema is
changed at some level, the schema at the next higher level remains unchanged; only
the mapping between the two levels is changed. Hence, application programs
referring to the higher-level schema need not be changed.
The three-schema architecture can make it easier to achieve true data independence,
both physical and logical. However, the two levels of mappings create an overhead
during compilation or execution of a query or program, leading to inefficiencies in the
DBMS. Because of this, few DBMSs have implemented the full three schema
architecture.
The DBMS must provide appropriate languages and interfaces for each category of
users. In this section we discuss the types of languages and interfaces provided by a
DBMS and the user categories targeted by each interface.
1. DBMS Languages
Once the design of a database is completed and a DBMS is chosen to implement the
database, the first step is to specify conceptual and internal schemas for the database
and any mappings between the two. In many DBMSs where no strict separation of
levels is maintained, one language, called the data definition language (DDL), is
used by the DBA and by database designers to define both schemas. The DBMS will
have a DDL compiler whose function is to process DDL statements in order to
identify descriptions of the schema constructs and to store the schema description in
the DBMS catalog.
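As a rough illustration, SQLite (used here through Python's sqlite3 module as a stand-in for a generic DBMS) records every DDL statement in a catalog table called sqlite_master. The student table below is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: it defines structure and stores no user data.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# The schema description now sits in SQLite's catalog table.
catalog = conn.execute("SELECT type, name, sql FROM sqlite_master").fetchall()

# PRAGMA table_info lists the columns recorded for a table;
# the second field of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

print(catalog)
print(columns)  # ['student_id', 'name']
```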
Once the database schemas are compiled and the database is populated with data,
users must have some means to manipulate the database. Typical manipulations
include retrieval, insertion, deletion, and modification of the data. The DBMS
provides a set of operations or a language called the data manipulation
language (DML) for these purposes.
In current DBMSs, the preceding types of languages are usually not considered
distinct languages; rather, a comprehensive integrated language is used that
includes constructs for conceptual schema definition, view definition, and data
manipulation. Storage definition is typically kept separate, since it is used for
defining physical storage structures to fine-tune the performance of the database
system, which is usually done by the DBA staff. A typical example of a
comprehensive database language is the SQL relational database language, which
represents a combination of DDL, VDL (view definition language), and DML, as
well as statements for constraint specification, schema evolution, and other
features. The SDL (storage definition language) was a
component in early versions of SQL but has been removed from the language to
keep it at the conceptual and external levels only.
There are two main types of DMLs. A high-level or nonprocedural DML can be
used on its own to specify complex database operations concisely. Many DBMSs
allow high-level DML statements either to be entered interactively from a display
monitor or terminal or to be embedded in a general-purpose programming language.
In the latter case, DML statements must be identified within the program so that they
can be extracted by a precompiler and processed by the DBMS. A low-level
or procedural DML must be embedded in a general-purpose
programming language. This type of DML typically retrieves individual records or
objects from the database and processes each separately. Therefore, it needs to use
programming language constructs, such as looping, to retrieve and process each
record from a set of records. Low-level DMLs are also called record-at-a-time
DMLs because of this property. DL/1, a DML designed for the hierarchical
model, is a low-level DML that uses commands such as GET UNIQUE, GET NEXT,
or GET NEXT WITHIN PARENT to navigate from record to record within a hierarchy of
records in the database. High-level DMLs, such as SQL, can specify and retrieve
many records in a single DML statement; therefore, they are called set-at-a-time
or set-oriented DMLs. A query in a high-level DML often specifies which data
to retrieve rather than how to retrieve it; therefore, such languages are also
called declarative.
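The contrast can be imitated in Python with the built-in sqlite3 module. This is only an analogy: SQL itself is set-oriented, so the "record-at-a-time" function below merely mimics the navigational style by looping over rows in the host language. The employee table, its data and the function names are made up for the sketch.

```python
import sqlite3


def make_db():
    """Build a small in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "sales", 100), (2, "sales", 200), (3, "hr", 300)],
    )
    return conn


def raise_record_at_a_time(conn):
    """Procedural style: the host language loops and handles one record per step."""
    rows = conn.execute("SELECT emp_id, dept, salary FROM employee").fetchall()
    for emp_id, dept, salary in rows:
        if dept == "sales":
            conn.execute(
                "UPDATE employee SET salary = ? WHERE emp_id = ?",
                (salary + 10, emp_id),
            )


def raise_set_at_a_time(conn):
    """Declarative style: one statement says what to change, not how."""
    conn.execute("UPDATE employee SET salary = salary + 10 WHERE dept = 'sales'")


def salaries(conn):
    return conn.execute(
        "SELECT emp_id, salary FROM employee ORDER BY emp_id"
    ).fetchall()


db_a, db_b = make_db(), make_db()
raise_record_at_a_time(db_a)
raise_set_at_a_time(db_b)
print(salaries(db_a) == salaries(db_b))  # True
```

Both versions end with the same data; the set-oriented one leaves the choice of access strategy to the DBMS.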
Whenever DML commands, whether high level or low level, are embedded in a
general-purpose programming language, that language is called the host
language and the DML is called the data sublanguage. On the other hand, a
high-level DML used in a standalone interactive manner is called a query language. In
general, both retrieval and update commands of a high-level DML may be used
interactively and are hence considered part of the query language.
Casual end users typically use a high-level query language to specify their requests,
whereas programmers use the DML in its embedded form. For naive and parametric
users, there usually are user-friendly interfaces for interacting with the database;
these can also be used by casual users or others who do not want to learn the details
of a high-level query language.
Interfaces in DBMS
A database management system (DBMS) interface is a user interface that lets
users submit queries to a database without using the query language itself.
User-friendly interfaces provided by a DBMS may include the following:
1. Menu-Based Interfaces for Web Clients or Browsing –
These interfaces present the user with lists of options (called menus) that lead
the user through the formulation of a request. The basic advantage of menus is
that they remove the need to remember the specific commands and syntax
of a query language; instead, the query is composed step by step by
picking options from a menu displayed by the system.
Pull-down menus are a very popular technique in Web-based interfaces. They
are also often used in browsing interfaces, which allow a user to look through the
contents of a database in an exploratory and unstructured manner.
2. Forms-Based Interfaces –
A forms-based interface displays a form to each user. Users can fill out all of
the form entries to insert new data, or they can fill out only certain entries, in
which case the DBMS will retrieve matching data for the remaining
entries. Forms of this type are usually designed and programmed for
users who have no expertise in operating the system. Many DBMSs have forms
specification languages, which are special languages that help specify such
forms.
Example: SQL*Forms is a form-based language that specifies queries using a
form designed in conjunction with the relational database schema.
3. Graphical User Interface –
A GUI typically displays a schema to the user in diagrammatic form. The user
can then specify a query by manipulating the diagram. In many cases, GUIs
utilize both menus and forms. Most GUIs use a pointing device, such as a
mouse, to pick certain parts of the displayed schema diagram.
Most database systems contain privileged commands that can be used only by
the DBA’s staff. These include commands for creating accounts, setting system
parameters, granting account authorization, changing a schema, and reorganizing
the storage structures of a database.
SQL | DDL, DQL, DML, DCL and TCL Commands
Structured Query Language (SQL) is the database language
with which we can perform operations on an existing database; we can also
use it to create a database. SQL uses commands such as CREATE, DROP,
and INSERT to carry out the required tasks.
These SQL commands are mainly grouped into four categories:
1. DDL – Data Definition Language
2. DQL – Data Query Language
3. DML – Data Manipulation Language
4. DCL – Data Control Language
Many resources also describe a fifth category of SQL commands, TCL –
Transaction Control Language, so we will look at TCL in detail as well.
3. DML (Data Manipulation Language): The SQL commands that deal with the
manipulation of data present in the database belong to DML, or Data Manipulation
Language, and this includes most SQL statements.
Examples of DML:
INSERT – is used to insert data into a table.
UPDATE – is used to update existing data within a table.
DELETE – is used to delete records from a database table.
MERGE – is used to make changes in one table based on values matched
from another. It can be used to combine insert, update, and delete
operations into one statement.
CALL – calls a procedure created by the CREATE
PROCEDURE or DECLARE PROCEDURE statement.
EXPLAIN PLAN – displays the execution plan chosen by the Oracle
optimizer for SELECT, UPDATE, INSERT, and DELETE statements.
LOCK TABLE – is used to lock the table named sql_name in either
EXCLUSIVE or SHARE mode. In EXCLUSIVE mode, the data of the table cannot be
read or modified by another transaction. In SHARE mode, the data of the table can
be read by concurrent transactions, but modifications are still prohibited.
4. DCL (Data Control Language): DCL includes commands such as GRANT and
REVOKE, which mainly deal with the rights, permissions and other controls of the
database system.
Examples of DCL commands:
GRANT – gives users access privileges to the database.
REVOKE – withdraws users' access privileges given by using the GRANT
command.
5. TCL (Transaction Control Language): TCL commands deal with transactions
within the database.
Examples of TCL commands:
COMMIT – commits a transaction.
ROLLBACK – rolls back a transaction if an error occurs.
SAVEPOINT – sets a savepoint within a transaction.
SET TRANSACTION – specifies characteristics for the transaction.
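The first three TCL commands can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the account table and the amounts are invented, and SQLite has no SET TRANSACTION statement, so that command is not shown. Passing isolation_level=None stops the Python module from opening transactions on its own, so BEGIN, COMMIT and ROLLBACK are issued explicitly.

```python
import sqlite3

# isolation_level=None: we control transactions ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 500), (2, 100)")

# Transaction 1: transfer 50 from account 1 to account 2.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 50 WHERE account_id = 1")
conn.execute("SAVEPOINT after_debit")
# A mistaken credit, undone by rolling back to the savepoint only.
conn.execute("UPDATE account SET balance = balance + 999 WHERE account_id = 2")
conn.execute("ROLLBACK TO after_debit")
conn.execute("UPDATE account SET balance = balance + 50 WHERE account_id = 2")
conn.execute("COMMIT")

# Transaction 2: an accidental delete, undone completely.
conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")

balances = conn.execute(
    "SELECT account_id, balance FROM account ORDER BY account_id"
).fetchall()
print(balances)  # [(1, 450), (2, 150)]
```

The debit made before the savepoint survives, the mistaken credit does not, and the rolled-back delete leaves both rows in place.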
2. Storage Manager :
(a) Authorization and Integrity Manager
(b) Transaction Manager
(c) File Manager
(d) Buffer Manager
3. Data Structure :
(a) Data Files
(b) Data Dictionary
(c) Indices
(d) Statistical Data
• Buffer Manager : It is responsible for fetching data from disk storage into
main memory and deciding what data to cache in memory.
3. Data Structures:
Following data structures are required as a part of the physical system
implementation.
• Data Files: Store the database itself.
• Data Dictionary: Stores metadata (data about data) about the
structure of the database.
• Indices: Provide fast access to data items that hold particular values.
• Statistical Data: Stores statistical information about the data in the
database. The query processor uses this information to select efficient
ways to execute a query.
End User: – They are the real users of the database. They can be
developers, designers, administrators, or the actual users of the
database.
DDL: – Data Definition Language (DDL) consists of the statements used to
create the database, schemas, tables, mappings, etc. These are
the commands used to create objects such as tables and indexes in the
database for the first time. In other words, they create the structure of
the database.
DDL Compiler: – This part of the DBMS is responsible for
processing DDL commands. The compiler
breaks each command down into machine-understandable code. It is
also responsible for storing metadata such as table names,
the space used by each table, the number of columns in it, mapping
information, etc.
DML Compiler: – When users insert, delete, update or retrieve
records, they send requests in a form they
understand, for example by pressing buttons in an application. For the
database to act on a request, it must be broken down into object code.
This is done by the DML compiler. One can compare this to how a question
put to a person is broken down into signals that travel to the
brain.
Query Optimizer: – Users who send a request are not concerned with
how it will be executed on the database, and they are usually not aware of
the database or how it performs. But whatever the request, it
should be executed efficiently, whether it fetches, inserts, updates, or
deletes data. The query optimizer decides the best way to execute
the user request received from the DML compiler. It is similar to
selecting the best nerve to carry the signals to the brain.
Stored Data Manager: – This is also known as the Database Control
System. It is one of the main central components of the database. It is
responsible for various tasks:
It converts the requests received from the query optimizer into
machine-understandable form and makes the actual requests inside the database.
It is like fetching the exact part of the brain to answer.
It helps to maintain consistency and integrity by applying the
constraints. That means it does not allow updating or
deleting a record that still has child entries. Similarly, it does not allow
entering duplicate values where uniqueness is required.
It controls concurrent access. If multiple users access
the database at the same time, it makes sure all of them see correct
data. It guarantees that no data loss or data mismatch
happens between the transactions of multiple users.
It helps to back up the database and recover data whenever
required. Because the database is large, reverting changes after an
unexpected transaction failure is
not easy, so it maintains a backup of all data so that it can be
recovered.
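The constraint checks described above can be observed with Python's built-in sqlite3 module. This is a minimal sketch: SQLite stands in for a generic DBMS, and the department and employee tables and the try_sql helper are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT UNIQUE NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Chemistry');
    INSERT INTO employee VALUES (10, 1);
""")


def try_sql(sql):
    """Run a statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# The department still has a child row in employee, so the delete is refused.
delete_parent = try_sql("DELETE FROM department WHERE dept_id = 1")
# A second department with the same name violates the UNIQUE constraint.
duplicate_name = try_sql("INSERT INTO department VALUES (2, 'Chemistry')")

department_count = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]
print(delete_parent)
print(duplicate_name)
print(department_count)  # 1
```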
Data Files: – They hold the real data. They can be stored on
magnetic tapes, magnetic disks, or optical disks.
Compiled DML: – Some processed DML statements
(insert, update, delete) are stored here so that they can be re-used for
similar requests.
Data Dictionary: – It contains all the information about the database.
As the name suggests, it is the dictionary of all the data items. It
contains a description of all the tables, views, materialized
views, constraints, indexes, triggers, etc.
DBMS
Lecture 1: ER Model Basics
What is the ER Model?
The ENTITY-RELATIONSHIP (ER) MODEL is a high-level conceptual data
model. ER modeling helps you to analyze data requirements systematically
to produce a well-designed database. The Entity-Relationship model represents
real-world entities and the relationships between them. It is considered a best
practice to complete ER modeling before implementing your database.
History of ER models
ER diagrams are a visual tool for representing the ER model. The model
was proposed by Peter Chen in 1976 to create a uniform convention which
can be used for relational databases and networks. He aimed to use the ER
model as a conceptual modeling approach.
What are ER Diagrams?
An ENTITY-RELATIONSHIP DIAGRAM (ERD) displays the relationships between
entity sets stored in a database. In other words, ER diagrams
help you to explain the logical structure of databases. At first look, an ER
diagram looks very similar to a flowchart. However, an ER diagram includes
many specialized symbols, and their meanings make this model unique.
The purpose of an ER diagram is to represent the entity framework
infrastructure.
Entities
Attributes
Relationships
Example
WHAT IS ENTITY?
An entity is a real-world thing, either living or non-living, whether easily
recognizable or not. It is anything in the enterprise that is to be represented in
our database. It may be a physical thing or simply a fact about the enterprise
or an event that happens in the real world.
An entity can be a place, person, object, event or concept about which data
is stored in the database. An entity must have attributes and
a unique key. Every entity is made up of some 'attributes' which represent that
entity.
Examples of entities:
Notation of an Entity
Entity set:
Student
An entity set is a group of entities of a similar kind. It may contain entities whose
attributes share similar values. Entities are represented by their properties,
which are also called attributes. All attributes have their own separate values. For
example, a student entity may have name, age, and class as attributes.
Example of Entities:
Relationship
Relationship is nothing but an association among two or more entities. E.g.,
Tom works in the Chemistry department.
For example:
Weak Entities
A weak entity is a type of entity which doesn't have a key attribute of its own. It can
be identified uniquely only by also considering the primary key of another (owner)
entity. For that, weak entity sets need to have total participation in the
identifying relationship.
In above example, "Trans No" is a discriminator within a group of transactions
in an ATM.
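One common way to carry a weak entity into tables is a composite primary key that combines the owner's key with the discriminator. The sketch below uses Python's built-in sqlite3 module. The table and column names (atm, atm_transaction, trans_no), the sample rows and the accepted helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    -- Strong (owner) entity: has its own key.
    CREATE TABLE atm (atm_id INTEGER PRIMARY KEY, location TEXT);
    -- Weak entity: trans_no is only unique within one ATM.
    CREATE TABLE atm_transaction (
        atm_id   INTEGER NOT NULL REFERENCES atm(atm_id),
        trans_no INTEGER NOT NULL,
        amount   INTEGER,
        PRIMARY KEY (atm_id, trans_no)
    );
    INSERT INTO atm VALUES (1, 'Main Street'), (2, 'Airport');
    INSERT INTO atm_transaction VALUES (1, 1, 200);
""")


def accepted(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# The same transaction number may be reused at a different ATM.
same_number_other_atm = accepted("INSERT INTO atm_transaction VALUES (2, 1, 50)")
# It may not be repeated within the same ATM.
duplicate_within_atm = accepted("INSERT INTO atm_transaction VALUES (1, 1, 999)")
# A transaction cannot exist without its owner ATM (total participation).
orphan_transaction = accepted("INSERT INTO atm_transaction VALUES (99, 1, 10)")

print(same_number_other_atm, duplicate_within_atm, orphan_transaction)
# True False False
```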
Let's learn more about a weak entity by comparing it with a Strong Entity:
Attributes
It is a single-valued property of either an entity-type or a relationship-type.
For example, a lecture might have attributes: time, date, duration, place, etc.
2. Binary Relationship –
When there are two entity sets participating in a relationship, the relationship is
called a binary relationship. For example, Student is enrolled in Course.
3. n-ary Relationship –
When there are n entity sets participating in a relationship, the relationship is
called an n-ary relationship.
2. Cardinality
Cardinality defines the numerical attributes of the relationship between two
entities or entity sets: the number of entities in one entity set that can be
associated with entities of the other set via a relationship set.
By connectivity we mean how many instances of one entity are associated with
how many instances of the other entity in a relationship. Cardinality is used to
specify such connectivity. The connectivity of a relationship describes the
mapping of associated entity instances in the relationship. The values of
connectivity are "one" or "many". The cardinality of a relationship is the actual
number of related occurrences for each of the two entities.
One-to-One Relationships
One-to-Many Relationships
Many-to-One Relationships
Many-to-Many Relationships
1.One-to-one:
One entity from entity set X can be associated with at most one entity of entity
set Y and vice versa.
For example, one Student can have only one college ID at a time.
A one-to-one (1:1) relationship is when at most one instance of entity A is
associated with one instance of entity B. For example, take the relationship
between board members and offices, where each office is held by one member
and no member may hold more than one office.
2.One-to-many:
One entity from entity set X can be associated with multiple entities of entity set
Y, but an entity from entity set Y can be associated with at most one entity of
entity set X.
A one-to-many (1:N) relationship is when for one instance of entity A, there are
zero, one, or many instances of entity B but for one instance of entity B, there
is only one instance of entity A. An example of a 1:N relationships is
3. Many to One
More than one entity from entity set X can be associated with at most one entity
of entity set Y. However, an entity from entity set Y may or may not be
associated with more than one entity from entity set X.
4. Many to Many:
One entity from X can be associated with more than one entity from Y and vice
versa.
Here the cardinality of the relationship from employees to projects is three; from
projects to employees, the cardinality is two. Therefore, this relationship can be
classified as a many-to-many relationship.
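In tables, a many-to-many relationship is usually stored in a separate linking table. The sketch below uses Python's built-in sqlite3 module. The employee and project rows and the works_on table are made up for illustration and are not taken from the figure discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Linking table: one row per (employee, project) pair.
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO project VALUES (10, 'Payroll'), (20, 'Website'), (30, 'Audit');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (1, 30), (2, 10);
""")

# How many projects each employee is linked to.
projects_per_employee = dict(
    conn.execute("SELECT emp_id, COUNT(*) FROM works_on GROUP BY emp_id")
)
# How many employees each project is linked to.
employees_per_project = dict(
    conn.execute("SELECT proj_id, COUNT(*) FROM works_on GROUP BY proj_id")
)

print(projects_per_employee)  # {1: 3, 2: 1}
print(employees_per_project)  # {10: 2, 20: 1, 30: 1}
```

One employee is linked to several projects and one project to several employees, which is what makes the relationship many-to-many.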
3. Participation Constraint:
2. Partial Participation – An entity in the entity set may or may not participate in
the relationship. If some courses have no students enrolled, the
participation of Course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with the Student entity set
having total participation and the Course entity set having partial participation.
Every student in the Student entity set participates in the relationship, but there
exists a course C4 which does not take part in the relationship.
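Whether participation is total or partial can be checked against actual data by looking for entities that have no matching row in the relationship. The sketch below uses Python's built-in sqlite3 module. Only course C4 comes from the text; the other identifiers, the rows and the non_participants helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id TEXT PRIMARY KEY);
    CREATE TABLE course  (course_id  TEXT PRIMARY KEY);
    CREATE TABLE enrolled_in (
        student_id TEXT REFERENCES student(student_id),
        course_id  TEXT REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES ('S1'), ('S2'), ('S3');
    INSERT INTO course  VALUES ('C1'), ('C2'), ('C3'), ('C4');
    INSERT INTO enrolled_in VALUES
        ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C2'), ('S3', 'C3');
""")


def non_participants(entity_table, key):
    """List entities with no row in enrolled_in.

    The table and column names come from trusted code, not user input,
    so building the SQL text with an f-string is acceptable here.
    """
    sql = (
        f"SELECT e.{key} FROM {entity_table} AS e "
        f"LEFT JOIN enrolled_in AS r ON r.{key} = e.{key} "
        f"WHERE r.{key} IS NULL ORDER BY e.{key}"
    )
    return [row[0] for row in conn.execute(sql)]


print(non_participants("student", "student_id"))  # []      total participation
print(non_participants("course", "course_id"))    # ['C4']  partial participation
```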
Lecture 3: Model an ER Diagram
Student
Course
Professor
Once you have a list of attributes, you need to map them to the identified
entities. Ensure that each attribute is paired with exactly one entity. If you think
an attribute should belong to more than one entity, use a modifier to make it
unique.
Once the mapping is done, identify the primary keys. If a unique key is not
readily available, create one.
For the Course entity, attributes could be Duration, Credits, Assignments, etc. For
the sake of simplicity, we have considered just one attribute.
Summary
The ER model is a high-level conceptual data model
ER diagrams are a visual tool which is helpful to represent the ER
model
An entity relationship diagram displays the relationships of entity sets stored
in a database
ER diagrams help you to define terms related to entity relationship
modeling
The ER model is based on three basic concepts: Entities, Attributes &
Relationships
An entity can be a place, person, object, event or concept about which
data is stored in the database
A relationship is an association among two or more entities
A weak entity is a type of entity which doesn't have its own key attribute
An attribute is a single-valued property of either an entity-type or a
relationship-type
Cardinality defines the numerical attributes of the relationship
between two entities or entity sets
An ER diagram is a visual representation of data that describes how data is
related to each other
While drawing an ER diagram, you need to make sure all your entities and
relationships are properly labeled.
Lecture 4: Keys in Relational Model
Keys
Keys are a very important part of the relational database model. They are used to
establish and identify relationships between tables and also to uniquely identify any
record or row of data inside a table.
A key can be a single attribute or a group of attributes, where the combination acts
as a key.
Super Key
Super Key is defined as a set of attributes within a table that can uniquely identify each
record within a table. Super Key is a superset of Candidate key.
In the table defined above, super keys would include student_id, (student_id,
name), phone, etc.
Confused? The first one is pretty simple: student_id is unique for every row of data,
hence it can be used to identify each row uniquely.
Next comes (student_id, name). The names of two students can be the same, but
their student_id can't be the same, hence this combination can also be a key.
Similarly, the phone number for every student will be unique, hence phone can also
be a key.
So they are all super keys.
Candidate Key
A candidate key is a minimal set of fields which can uniquely identify each
record in a table. It is an attribute or a set of attributes that can act as a primary key
for a table to uniquely identify each record in that table. There can be more than one
candidate key.
In our example, student_id and phone are both candidate keys for the table Student.
A candidate key can never be NULL or empty, and its value should be unique.
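The difference between a super key and a candidate key can be made concrete with a few lines of Python. The rows below are made-up sample data for a Student table, and the function names are hypothetical. Note that uniqueness in a sample only suggests a key; whether an attribute set really is a key is a design decision about all possible data.

```python
from itertools import combinations

# Made-up sample rows for a Student table.
STUDENTS = [
    {"student_id": 10, "name": "Adam", "phone": "9876723452", "age": 17},
    {"student_id": 11, "name": "Alex", "phone": "9991165674", "age": 19},
    {"student_id": 12, "name": "Adam", "phone": "7898756543", "age": 18},
    {"student_id": 13, "name": "Alex", "phone": "8987867898", "age": 19},
]


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: try small attribute sets first and
    skip any set that contains an already-found key."""
    attrs = sorted(rows[0])
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in found)
            if not contains_smaller_key and is_super_key(rows, combo):
                found.append(combo)
    return found


print(is_super_key(STUDENTS, ("student_id", "name")))  # True: a super key
print(is_super_key(STUDENTS, ("name",)))               # False: names repeat
print(candidate_keys(STUDENTS))  # [('phone',), ('student_id',)]
```

(student_id, name) is a super key but not a candidate key, because student_id alone already identifies each row.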
Primary Key
The primary key is the candidate key that is most appropriate to become the main key for a
table. It is a key that can uniquely identify each record in the table.
For the table Student, we can make the student_id column the primary key.
Composite Key
A key that consists of two or more attributes that together uniquely identify any record in
a table is called a composite key. The attributes that together form the composite key are
not keys independently or individually.
In the above picture we have a Score table which stores the marks scored by a student
in a particular subject.
In this table, student_id and subject_id together form the primary key, hence it is
a composite key.
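A minimal sketch of the Score table using Python's built-in sqlite3 module is given below. The column name marks and the sample values are assumptions made for illustration; student_id and subject_id come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE score (
        student_id INTEGER NOT NULL,
        subject_id INTEGER NOT NULL,
        marks      INTEGER,              -- assumed column name
        PRIMARY KEY (student_id, subject_id)
    )
""")

# Neither column is unique on its own.
conn.execute("INSERT INTO score VALUES (1, 101, 70)")
conn.execute("INSERT INTO score VALUES (1, 102, 85)")  # same student, other subject
conn.execute("INSERT INTO score VALUES (2, 101, 64)")  # same subject, other student


def try_insert(student_id, subject_id, marks):
    """Return True if the row is accepted, False if the key rejects it."""
    try:
        conn.execute(
            "INSERT INTO score VALUES (?, ?, ?)",
            (student_id, subject_id, marks),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# The pair (1, 101) already exists, so the composite key rejects it.
duplicate_accepted = try_insert(1, 101, 99)
print(duplicate_accepted)  # False
```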
Non-key Attributes
Non-key attributes are the attributes or fields of a table, other than candidate
key attributes/fields in a table.
Non-prime Attributes
Non-prime attributes are attributes other than the primary key attribute(s).
Extended ER Model concepts:
The ER Model has the power of expressing database entities in a conceptual hierarchical
manner. As the hierarchy goes up, it generalizes the view of entities, and as we go deep in the
hierarchy, it gives us the detail of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named Mira can be
generalized along with all the students. The entity shall be a student, and further, the student
is a person. The reverse is called specialization where a person is a student, and that student is
Mira.
The terms generalization and specialization are more common in object-oriented
technology, but they are used in databases with the same
meaning. Generalization occurs when we ignore the differences and acknowledge the
similarities between lower entities, child classes or relations (tables in a DBMS) to form a
higher entity.
Generalization
o Generalization is like a bottom-up approach in which two or more entities of lower level
combine to form a higher level entity if they have some attributes in common.
o In generalization, an entity of a higher level can also combine with the entities of the
lower level to form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only difference is the
approach. Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e.,
subclasses are combined to make a superclass.
o Generalization is the process of extracting common properties from a set of entities and
creating a generalized entity from it.
In generalization, a number of entities are brought together into one generalized entity
based on their similar characteristics, and the generalized entity holds the properties that
all of them share. For example, pigeon, house sparrow, crow and dove can all be generalized
as Birds.
Specialization
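Specialization is the reverse of generalization: a top-down approach in which a higher level
entity is divided into two or more lower level entities.
For example, the EMPLOYEE entity in an employee management system can be specialized into
DEVELOPER, TESTER etc. as shown in figure below. In this case, common attributes like
E_NAME, E_SALARY etc. become part of the higher entity (EMPLOYEE) and specialized attributes
like TES_TYPE become part of the specialized entity (TESTER).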
Let us consider another example, an entity set ‘Person’. A person has a name, date of birth,
gender, etc. These properties are common to all persons. But in a company,
persons can be identified as employee, employer, customer, or vendor, based on the role
they play in the company. Similarly, in a school database, persons can be specialized as
teacher, student, or staff, based on the role they play in the school.
Lower-level entities inherit the attributes of their higher-level entity. For example, the
attributes of a Person class such as name, age, and gender can be inherited
by lower-level entities such as Student or Teacher.
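Since these terms are borrowed from object-oriented technology, the Person example can be sketched with Python classes. Person, Student and Teacher and the attributes name, age and gender come from the text; the extra attributes roll_no and subject are hypothetical and only stand in for whatever is specific to each lower-level entity.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Higher-level (generalized) entity with the common attributes."""
    name: str
    age: int
    gender: str


@dataclass
class Student(Person):
    """Lower-level entity: inherits name, age and gender from Person."""
    roll_no: int  # hypothetical specialized attribute


@dataclass
class Teacher(Person):
    """Lower-level entity: inherits name, age and gender from Person."""
    subject: str  # hypothetical specialized attribute


mira = Student(name="Mira", age=20, gender="F", roll_no=7)

# Inherited attributes come first, followed by the specialized one.
student_attributes = [f.name for f in fields(Student)]
print(student_attributes)        # ['name', 'age', 'gender', 'roll_no']
print(isinstance(mira, Person))  # True: Mira is a student, and a student is a person
```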
Aggregation
In aggregation, the relation between two entities is treated as a single entity: a
relationship together with its corresponding entities is aggregated into a higher level entity.
For example, an employee working for a project may require some machinery. So a REQUIRE
relationship is needed between the relationship WORKS_FOR and the entity MACHINERY. Using
aggregation, the WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is aggregated
into a single entity, and the relationship REQUIRE is created between the aggregated entity and
MACHINERY.
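One possible way to carry this aggregation into tables is sketched below with Python's built-in sqlite3 module. The table names follow the text, while the column names and sample values are assumptions. The idea is that REQUIRE refers to a WORKS_FOR pair as a whole, so machinery can only be required by an employee and project combination that actually exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee  (emp_id     INTEGER PRIMARY KEY, e_name TEXT);
    CREATE TABLE project   (project_id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE machinery (machine_id INTEGER PRIMARY KEY, m_name TEXT);

    -- The relationship WORKS_FOR between EMPLOYEE and PROJECT.
    CREATE TABLE works_for (
        emp_id     INTEGER REFERENCES employee(emp_id),
        project_id INTEGER REFERENCES project(project_id),
        PRIMARY KEY (emp_id, project_id)
    );

    -- REQUIRE links the aggregated (employee, project) pair to MACHINERY.
    CREATE TABLE require (
        emp_id     INTEGER,
        project_id INTEGER,
        machine_id INTEGER REFERENCES machinery(machine_id),
        PRIMARY KEY (emp_id, project_id, machine_id),
        FOREIGN KEY (emp_id, project_id)
            REFERENCES works_for(emp_id, project_id)
    );

    INSERT INTO employee  VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project   VALUES (10, 'Payroll');
    INSERT INTO machinery VALUES (100, 'Server');
    INSERT INTO works_for VALUES (1, 10);
""")


def can_require(emp_id, project_id, machine_id):
    """Return True if the REQUIRE row is accepted by the constraints."""
    try:
        conn.execute(
            "INSERT INTO require VALUES (?, ?, ?)",
            (emp_id, project_id, machine_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid_link = can_require(1, 10, 100)    # employee 1 works for project 10
invalid_link = can_require(2, 10, 100)  # employee 2 does not work for project 10
print(valid_link, invalid_link)  # True False
```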
Another example: the relationship in which the Center entity offers the Course entity acts as
a single entity, which is in turn in a relationship with another entity, Visitor. In the real
world, a visitor to a coaching center will never enquire about the Course alone or about the
Center alone; instead, the enquiry will be about both.
A relationship represents a connection between two entity types that are conceptually at the
same level. Sometimes you may want to model a 'has-a,' 'is-a' or 'is-part-of' relationship, in
which one entity represents a larger entity (the 'whole') that consists of smaller entities (the
'parts'). This special kind of relationship is termed an aggregation. Aggregation does not
change the meaning of navigation and routing across the relationship between the whole and
its parts. An example of aggregation is the relationship in which the 'Teacher' entity follows
the 'Syllabus' entity, which then acts as a single entity in a further relationship. In simple
words, aggregation is a process where the relation between two entities is treated as a single
entity.
Reduction of ER diagram to Tables:
The database can be represented using ER notations, and these notations can be reduced to a
collection of tables.
In the database, every entity set or relationship set can be represented in tabular form.
Some points for converting the ER diagram to tables:
o An entity type becomes a table.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE form individual tables.
o Every single-valued attribute becomes a column of the table.
In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the columns of the STUDENT table.
Similarly, COURSE_NAME and COURSE_ID form the columns of the COURSE table, and so on.
o A key attribute of the entity type is represented by the primary key.
In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key
attributes of their entities.
o A composite attribute is represented by its components.
In the given ER diagram, student address is a composite attribute. It contains CITY, PIN, DOOR#,
STREET, and STATE. In the STUDENT table, each of these components becomes an individual column.
o A derived attribute is not stored in the table.
In the STUDENT table, Age is a derived attribute. It can be calculated at any point of time by
calculating the difference between the current date and Date of Birth.
Using these rules, we can convert the ER diagram to tables and columns and assign the
mapping between the tables. Table structure for the given ER diagram is as below:
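As a rough illustration of these points, the STUDENT entity could be reduced to a table as sketched below with Python's built-in sqlite3 module. The lowercase column names, the sample row and the helper age_on are assumptions made for this sketch. The key attribute becomes the primary key, the address components become separate columns, and Age is computed from the date of birth instead of being stored.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id    INTEGER PRIMARY KEY,  -- key attribute
        student_name  TEXT,                 -- single-valued attribute
        date_of_birth TEXT,                 -- stored; Age is derived from it
        door_no TEXT,                       -- components of the composite
        street  TEXT,                       -- attribute "address"
        city    TEXT,
        state   TEXT,
        pin     TEXT
    )
""")
conn.execute(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "Mira", "2004-03-15", "12", "Park Road", "Pune", "MH", "411001"),
)


def age_on(dob_iso, today):
    """Derive the age in whole years from an ISO date of birth."""
    dob = date.fromisoformat(dob_iso)
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


dob = conn.execute(
    "SELECT date_of_birth FROM student WHERE student_id = 1"
).fetchone()[0]
print(age_on(dob, date.today()))
```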
Relationship of higher degree and mapping constraints:
For Student(SID, Name), SID is the primary key. For Course(CID, C_name), CID is the primary
key.
Student
(SID  Name)
-----------
1     A
2     B
3     C
4     D
Course
(CID  C_name)
-------------
c1    Z
c2    Y
c3    X
Enroll
(SID  CID)
----------
1     c1
2     c1
3     c3
4     c2
Now consider the primary key for Enroll: SID, CID, or the two combined. CID cannot be the
primary key because, as the Enroll table shows, the same CID appears with multiple SIDs.
(SID, CID) identifies each row uniquely, but it is not minimal. So SID is the primary key for
the relation Enroll.
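[Figure: ER diagram connecting Student, Enroll and Course]
Since SID is the primary key of both Student and Enroll, the two tables can be merged into a
single table:
Student_Enroll
(SID  Name  CID)
----------------
1     A     c1
2     B     c1
3     C     c3
4     D     c2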
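The reasoning for this case can be checked in a few lines of Python. The data is the Student and Enroll data from this example; the variable names are only illustrative.

```python
# Data from the example: each student is enrolled in exactly one course.
student = {1: "A", 2: "B", 3: "C", 4: "D"}             # SID -> Name
enroll = [(1, "c1"), (2, "c1"), (3, "c3"), (4, "c2")]  # (SID, CID)

sids = [sid for sid, _ in enroll]
cids = [cid for _, cid in enroll]

sid_is_key = len(set(sids)) == len(sids)  # True: every SID appears once
cid_is_key = len(set(cids)) == len(cids)  # False: c1 appears twice

# Student and Enroll share the key SID, so Enroll folds into Student
# as one extra column without repeating any student row.
enrolled_in = dict(enroll)
student_enroll = [
    (sid, name, enrolled_in.get(sid)) for sid, name in student.items()
]

print(sid_is_key, cid_is_key)
for row in student_enroll:
    print(row)
```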
Now consider a second case, in which a student can enroll in more than one course:
Student
(SID  Name)
-----------
1     A
2     B
3     C
4     D
Course
(CID  C_name)
-------------
c1    Z
c2    Y
c3    X
Enroll
(SID  CID)
----------
1     c1
1     c2
2     c1
2     c2
3     c3
4     c2
Now the same question: what is the primary key of the Enroll relation? If we analyse the table
carefully, neither SID nor CID is unique on its own, so the primary key for the Enroll
table is (SID, CID).
In this case we can't merge the Enroll table with either Student or Course. If we try to
merge Enroll with either of them, it will create redundant data.
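The same kind of check, run on the data of this second case, shows both points: only the pair (SID, CID) is unique, and merging Enroll into Student repeats student names. The variable names below are only illustrative.

```python
# Data from the example: a student may enroll in several courses.
student = {1: "A", 2: "B", 3: "C", 4: "D"}  # SID -> Name
enroll = [(1, "c1"), (1, "c2"), (2, "c1"), (2, "c2"), (3, "c3"), (4, "c2")]


def is_unique(values):
    """True if no value occurs more than once."""
    values = list(values)
    return len(set(values)) == len(values)


sid_unique = is_unique(sid for sid, _ in enroll)  # False: SID 1 and 2 repeat
cid_unique = is_unique(cid for _, cid in enroll)  # False: c1 and c2 repeat
pair_unique = is_unique(enroll)                   # True: (SID, CID) is the key

# Merging Enroll into Student needs one row per enrollment,
# so the names of students 1 and 2 are stored twice.
merged = [(sid, student[sid], cid) for sid, cid in enroll]
redundant_rows = len(merged) - len(student)

print(sid_unique, cid_unique, pair_unique, redundant_rows)  # False False True 2
```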
Since E1 is in total participation, each entry in E1 is related to exactly one entry in E2, but
not all entries in E2 are related to an entry in E1.
The primary key of E1 should be used as the primary key of the reduced table, since if the
primary key of E2 were used, it might have null values for many of the entries in the reduced
table.
The primary key of R can be A1 or B1, but we still can't combine all three tables into one. If
we do, some entries in the combined table may be NULL. So merging all three tables
into one is not a good idea.
o A mapping constraint is a data constraint that expresses the number of entities to which
another entity can be related via a relationship set.
o It is most useful in describing binary relationship sets, although it can also help
describe relationship sets that involve more than two entity sets.
o For a binary relationship set R between entity sets E1 and E2, there are four possible
mapping cardinalities. These are as follows:
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity
in E2 is associated with at most one entity in E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an
entity in E2 is associated with at most one entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an
entity in E2 is associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities in E2, and
an entity in E2 is associated with any number of entities in E1.
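The four definitions can be turned into a small Python check. The function name mapping_cardinality is hypothetical, and the check only reports the tightest mapping consistent with the given (E1, E2) pairs; a mapping cardinality is a constraint on the design, so sample data can only suggest it.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a list of (e1, e2) pairs by the four mapping cardinalities."""
    e2_for_e1 = defaultdict(set)
    e1_for_e2 = defaultdict(set)
    for e1, e2 in pairs:
        e2_for_e1[e1].add(e2)
        e1_for_e2[e2].add(e1)

    # Does some E1 entity relate to more than one E2 entity, and vice versa?
    e1_to_many = any(len(group) > 1 for group in e2_for_e1.values())
    e2_to_many = any(len(group) > 1 for group in e1_for_e2.values())

    if not e1_to_many and not e2_to_many:
        return "one-to-one"
    if e1_to_many and not e2_to_many:
        return "one-to-many"
    if not e1_to_many and e2_to_many:
        return "many-to-one"
    return "many-to-many"


# The two Enroll tables from the earlier examples, with E1 = Student, E2 = Course.
first_enroll = [(1, "c1"), (2, "c1"), (3, "c3"), (4, "c2")]
second_enroll = [(1, "c1"), (1, "c2"), (2, "c1"), (2, "c2"), (3, "c3"), (4, "c2")]

print(mapping_cardinality(first_enroll))   # many-to-one
print(mapping_cardinality(second_enroll))  # many-to-many
```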