RDBMS
Introduction to DBMS– Data and Information - Database – Database Management System – Objectives
- Advantages – Components - Architecture. ER Model: Building blocks of ER Diagram – Relationship
Degree – Classification – ER diagram to Tables – ISA relationship – Constraints – Aggregation and
Composition – Advantages
Introduction to DBMS
A database is a collection of data, and a management system is a set of programs to store and
retrieve that data. Based on this, we can define a DBMS as a collection of inter-related
data and a set of programs to store and access that data in an easy and effective manner.
Storage:
According to the principles of database systems, data is stored in such a way that it
occupies much less space, because redundant (duplicate) data is removed before storage.
Purpose of Database Systems
The main purpose of database systems is to manage data. Consider a university that
keeps data about students, teachers, courses, books etc. To manage this data we need to store it
somewhere where we can add new data, delete unused data, update outdated data and retrieve
data. To perform these operations we need a database management system that allows us
to store the data in such a way that all these operations can be performed on it
efficiently.
Database Applications – DBMS
Applications where we use Database Management Systems are:
1. Telecom:
There is a database to keep track of the information regarding calls made, network
usage, customer details etc. Without database systems it is hard to maintain such a huge amount
of data that keeps updating every millisecond.
2. Industry:
Whether it is a manufacturing unit, warehouse or distribution centre, each one needs a
database to keep records of ins and outs. For example, a distribution centre should keep track
of the product units that are supplied to the centre as well as the products that are delivered out
of the distribution centre each day; this is where a DBMS comes into the picture.
3. Banking System:
For storing customer information, tracking day-to-day credit and debit transactions, generating
bank statements etc. All this work is done with the help of database management
systems.
4. Sales:
To store customer information, production information and invoice details.
5. Airlines:
To travel by air, we make early reservations; this reservation information, along
with the flight schedule, is stored in a database.
6. Education sector:
Database systems are frequently used in schools and colleges to store and retrieve the
data regarding student details, staff details, course details, exam details, payroll data, attendance
details, fees details etc. There is a very large amount of inter-related data that needs to be stored
and retrieved in an efficient manner.
7. Online shopping:
Online shopping websites such as Amazon, Flipkart etc. store product
information, customer addresses and preferences, and credit details, and provide the relevant list of
products based on a query. All this involves a database management system.
2. Data inconsistency:
Data redundancy leads to data inconsistency. Consider a student who is enrolled in two
courses, so that the student’s address is stored twice. If the student requests a change of
address and the address is changed in one place but not in all the records, the data
becomes inconsistent.
3. Data Isolation:
Because data are scattered in various files, and files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
5. Atomicity issues:
Atomicity of a transaction means “all or nothing”: either all the
operations in a transaction execute or none of them do.
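The “all or nothing” idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the account table, the transfer function and the rule that a balance may not go negative are all hypothetical and are not taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a balance is not allowed to become negative.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 500)  # the debit would overdraw account 1
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, balances)
```

The credit to account 2 succeeds first, but because the debit fails, the whole transaction is rolled back and neither balance changes.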
The architecture of a DBMS depends on the computer system on which it runs. For
example, in a client-server DBMS architecture, the database system on the server machine can serve
several requests made by client machines.
In a two-tier architecture, the database system is present on the server machine and the
DBMS application is present on the client machine; the two machines are connected to each
other through a reliable network, as shown in the above diagram.
Whenever the client machine makes a request to access the database present on the server using a
query language like SQL, the server performs the request on the database and returns the
result back to the client. Application connection interfaces such as JDBC and ODBC are used
for the interaction between server and client.
1. External level
It is also called the view level. It is called the “view” level because several users
can view their desired data from this level, which is internally fetched from the database with the
help of conceptual and internal level mapping.
The user doesn’t need to know database schema details such as data structures, table
definitions etc. The user is only concerned with the data that is returned to the view level
after it has been fetched from the database (present at the internal level).
The external level is the “top level” of the three-level DBMS architecture.
2. Conceptual level:
It is also called the logical level. The whole design of the database, such as the relationships
among data, the schema of the data etc., is described at this level.
Database constraints and security are also implemented at this level of the architecture.
This level is maintained by the DBA (database administrator).
3. Internal level:
This level is also known as the physical level. It describes how the data is actually
stored on the storage devices and is also responsible for allocating space to the data. This
is the lowest level of the architecture.
Abstraction is one of the main features of database systems. Hiding irrelevant details
from users and providing an abstract view of data helps in easy and efficient user-database
interaction. In the three-level DBMS architecture discussed earlier, the top
level is the “view level”. The view level provides the “view of data” to the users
and hides irrelevant details such as data relationships, database schema, constraints, security
etc. from the user.
1. Data abstraction
2. Instance and schema
Database systems are made up of complex data structures. To ease user interaction with the database, the developers
hide irrelevant internal details from users. This process of hiding irrelevant details from users is called data
abstraction. We have three levels of abstraction:
Physical level:
This is the lowest level of data abstraction. It describes how data is actually stored in
the database. The complex data structure details are found at this level.
Logical level:
This is the middle level of the three-level data abstraction architecture. It describes what data is
stored in the database.
View level:
This is the highest level of data abstraction. It describes the user interaction with the database
system.
Example:
Let’s say we are storing customer information in a customer table. At the physical
level these records can be described as blocks of storage (bytes, gigabytes, terabytes etc.) in
memory. These details are often hidden from the programmers.
At the logical level these records can be described as fields and attributes along with their
data types, and the relationships among them can be logically implemented. Programmers
generally work at this level because they are aware of such things about database systems.
At the view level, users just interact with the system through a GUI and enter details on
the screen; they are not aware of how the data is stored or what data is stored. Hiding such
details from the user is called data abstraction.
DBMS Schema
Definition of schema:
The design of a database is called its schema. There are three types of schema: physical schema,
logical schema and view schema.
For example:
In the following diagram, we have a schema that shows the relationship between three
tables: Course, Student and Section. The diagram only shows the design of the database; it
doesn’t show the data present in those tables. A schema is only a structural view (design) of a
database, as shown in the diagram below.
The design of a database at the physical level is called the physical schema; how the data is stored
in blocks of storage is described at this level.
The design of a database at the logical level is called the logical schema. Programmers and database
administrators work at this level. Here data can be described as certain types of data
records stored in data structures; however, internal details such as the implementation of the data
structures are hidden at this level (available at the physical level).
The design of a database at the view level is called the view schema. This generally describes end-user
interaction with database systems.
To learn more about these schemas, refer to the three-level data abstraction architecture.
DBMS Instance
Definition of instance:
The data stored in a database at a particular moment of time is called an instance of the database.
The database schema defines the variable declarations in the tables that belong to a particular database;
the values of these variables at a moment of time are called the instance of that database.
For example, let’s say we have a single table, student, in the database. Today the table has
100 records, so today the instance of the database has 100 records. Let’s say we add
another 100 records to this table by tomorrow, so the instance of the database tomorrow will
have 200 records in the table. In short, the data stored in the database at a particular moment is called
the instance, and it changes over time as we add or delete data.
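The difference between schema and instance in the student example can be sketched as follows. The column names and the helper functions schema_of and instance_size are hypothetical; the sketch relies on SQLite keeping each table's CREATE statement in its sqlite_master catalogue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

def schema_of(conn):
    # The stored CREATE statement describes the schema (the design).
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]

def instance_size(conn):
    # The rows present right now form the instance.
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# "Today": 100 records.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(i, f"student{i}") for i in range(1, 101)],
)
schema_today, size_today = schema_of(conn), instance_size(conn)

# "Tomorrow": another 100 records are added.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(i, f"student{i}") for i in range(101, 201)],
)
schema_tomorrow, size_tomorrow = schema_of(conn), instance_size(conn)

print(size_today, size_tomorrow, schema_today == schema_tomorrow)
```

The instance grows from 100 to 200 records while the schema stays exactly the same.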
DBMS languages
Database languages are used to read, update and store data in a database. There are
several such languages; one of them is SQL (Structured Query
Language).
All of these commands either define or update the database schema, which is why they
come under the Data Definition Language.
In practice, the data definition language, data manipulation language and data control
language are not separate languages; rather, they are parts of a single database language such
as SQL.
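As a rough illustration of how these sublanguages sit together in SQL, the sketch below runs a few data definition and data manipulation statements through Python's sqlite3 module. The course table and its columns are made up for the example, and which statements count as DDL or DML follows the usual textbook grouping rather than a list given in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define or change the schema.
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")

# DML: statements that store, update, read and delete data.
conn.execute("INSERT INTO course VALUES (1, 'RDBMS', 3)")
conn.execute("UPDATE course SET credits = 4 WHERE course_id = 1")
rows = conn.execute("SELECT course_id, title, credits FROM course").fetchall()
conn.execute("DELETE FROM course WHERE course_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]

# DCL statements such as GRANT and REVOKE belong to server-based systems;
# SQLite does not implement them, so none are run here.
print(rows, remaining)
```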
The storage manager is important because databases typically require a large amount of
storage space. Corporate databases range in size from hundreds of gigabytes to, for the largest
databases, terabytes of data. A gigabyte is approximately 1000 megabytes (actually 1024) (1
billion bytes), and a terabyte is 1 million megabytes (1 trillion bytes). Since the main memory of
computers cannot store this much information, the information is stored on disks. Data are
moved between disk storage and main memory as needed. Since the movement of data to and
from disk is slow relative to the speed of the central processing unit, it is imperative that the
database system structure the data so as to minimize the need to move data between disk and
main memory.
The query processor is important because it helps the database system to simplify and
facilitate access to data. The query processor allows database users to obtain good
performance while being able to work at the view level and not be burdened with understanding
the physical-level details of the implementation of the system. It is the job of the database system
to translate updates and queries written in a nonprocedural language, at the logical level, into an
efficient sequence of operations at the physical level.
Storage Manager
The storage manager is the component of a database system that provides the interface
between the low-level data stored in the database and the application programs and queries
submitted to the system. The storage manager is responsible for the interaction with the file
manager. The raw data are stored on the disk using the file system provided by the operating
system. The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in
the database.
The storage manager components include:
1. Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data
2. Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
3. File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
4. Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager is a
critical part of the database system, since it enables the database to handle data sizes that
are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
1. Data files, which store the database itself.
2. Data dictionary, which stores metadata about the structure of the database, in particular
the schema of the database.
3. Indices, which can provide fast access to data items. Like the index in this textbook, a
database index provides pointers to those data items that hold a particular value. For
example, we could use an index to find the instructor record with a particular ID, or all
instructor records with a particular name. Hashing is an alternative to indexing that is
faster in some but not all cases.
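The idea of an index as a set of pointers can be sketched in plain Python. The instructor records are made up, and a Python dictionary is itself a hash table, so this sketch is closer to the hashing alternative than to the ordered index structures a real DBMS typically uses.

```python
from collections import defaultdict

# Hypothetical instructor records standing in for rows in a data file.
instructors = [
    {"id": 101, "name": "Rao"},
    {"id": 102, "name": "Iyer"},
    {"id": 103, "name": "Rao"},
]

# Build an index on name: each value points to the positions of matching records.
name_index = defaultdict(list)
for position, record in enumerate(instructors):
    name_index[record["name"]].append(position)

def find_by_name(name):
    # Follow the pointers instead of scanning every record.
    return [instructors[pos] for pos in name_index.get(name, [])]

matches = [record["id"] for record in find_by_name("Rao")]
print(matches)
```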
DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans
that all give the same result. The DML compiler also performs query optimization; that is, it
picks the lowest cost evaluation plan from among the alternatives.
Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
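SQLite offers a way to look at the plan its own query planner picks, which gives a feel for what a DML compiler does. In the sketch below the instructor table, the index name idx_instructor_name and the helper plan_for are assumptions; the exact wording of the plan text differs between SQLite versions, so it is printed rather than quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

query = "SELECT id, dept FROM instructor WHERE name = 'Rao'"

def plan_for(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# With no index on name, the only available plan is to scan the table.
plan_without_index = plan_for(conn, query)

# Once an index exists, the planner has a cheaper alternative for the same query.
conn.execute("CREATE INDEX idx_instructor_name ON instructor (name)")
plan_with_index = plan_for(conn, query)

print(plan_without_index)
print(plan_with_index)
```

Both plans return the same rows; the second is chosen because it is estimated to cost less.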
Database design
Database design mainly involves the design of the database schema. The design of a
complete database application environment that meets the needs of the enterprise being modeled
requires attention to a broader set of issues. In this text, we focus initially on the writing of
database queries and the design of database schemas.
Design Process
A high-level data model provides the database designer with a conceptual framework in
which to specify the data requirements of the database users, and how the database will be
structured to fulfill these requirements. The initial phase of database design, then, is to
characterize fully the data needs of the prospective database users. The database designer needs
to interact extensively with domain experts and users to carry out this task. The outcome of this
phase is a specification of user requirements.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The
relationship between Student and College is many to one, as a college can have many students but a
student cannot study in multiple colleges at the same time. The Student entity has attributes such as Stu_Id,
Stu_Name & Stu_Addr, and the College entity has attributes such as Col_ID & Col_Name.
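One common way to turn this diagram into tables is to give each entity its own table and to place a foreign key on the “many” side. The sketch below assumes that mapping and uses SQLite through Python; the attribute names come from the description above, while the data types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE College (Col_ID INTEGER PRIMARY KEY, Col_Name TEXT)")
conn.execute("""
    CREATE TABLE Student (
        Stu_Id   INTEGER PRIMARY KEY,
        Stu_Name TEXT,
        Stu_Addr TEXT,
        Col_ID   INTEGER NOT NULL REFERENCES College (Col_ID)  -- one college per student
    )
""")

conn.execute("INSERT INTO College VALUES (1, 'City College')")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(10, "Asha", "Chennai", 1), (11, "Ravi", "Madurai", 1)],
)

# Many students can point at the same college.
students_in_college = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE Col_ID = 1"
).fetchone()[0]

# A student pointing at a college that does not exist is rejected.
try:
    conn.execute("INSERT INTO Student VALUES (12, 'Mala', 'Salem', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(students_in_college, orphan_rejected)
```

Because Col_ID is a single column in Student, each student row can name only one college, which matches the many to one relationship.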
Here are the geometric shapes and their meaning in an E-R Diagram. We will discuss these terms
in detail in the next section (Components of an ER Diagram) of this guide, so don’t worry too much about
these terms now; just go through them once.
Rectangle: Represents Entity sets.
Ellipses: Attributes
Diamonds: Relationship Set
Lines: They link attributes to Entity Sets and Entity sets to Relationship Set
Double Ellipses: Multi valued Attributes
Dashed Ellipses: Derived Attributes
Double Rectangles: Weak Entity Sets
Double Lines: Total participation of an entity in a relationship set
Components of an ER Diagram
For example:
In the following ER diagram we have two entities, Student and College, and these two entities
have a many to one relationship, as many students study in a single college. We will read more about
relationships later; for now, focus on entities.
[Diagram: Student (M) - Study - (1) College]
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its relationship
with another entity is called a weak entity. A weak entity is represented by a double rectangle. For example,
a bank account cannot be uniquely identified without knowing the bank to which the account belongs,
so bank account is a weak entity.
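In a table design, a weak entity is usually given a key that combines the owner's key with its own partial key. The sketch below assumes that convention for the bank account example; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank (bank_id INTEGER PRIMARY KEY, bank_name TEXT)")
conn.execute("""
    CREATE TABLE bank_account (
        bank_id    INTEGER NOT NULL REFERENCES bank (bank_id),
        account_no INTEGER NOT NULL,   -- unique only within one bank
        holder     TEXT,
        PRIMARY KEY (bank_id, account_no)
    )
""")
conn.executemany("INSERT INTO bank VALUES (?, ?)", [(1, "Bank A"), (2, "Bank B")])

# The same account number can exist in two different banks.
conn.executemany(
    "INSERT INTO bank_account VALUES (?, ?, ?)",
    [(1, 5001, "Asha"), (2, 5001, "Ravi")],
)
account_count = conn.execute("SELECT COUNT(*) FROM bank_account").fetchone()[0]

# Repeating the number inside the same bank violates the composite key.
try:
    conn.execute("INSERT INTO bank_account VALUES (1, 5001, 'Mala')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(account_count, duplicate_rejected)
```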
2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an ER
diagram. There are four types of attributes:
a) Key attribute
b) Composite attribute
c) Multivalued attribute
d) Derived attribute
a) Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, a student roll number can uniquely
identify a student from a set of students. A key attribute is represented by an oval, the same as other attributes, but the
text of a key attribute is underlined.
b) Composite attribute:
d) Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is
represented by a dashed oval in an ER diagram. For example, a person’s age is a derived attribute, as it
changes over time and can be derived from another attribute (date of birth).
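The age example can be sketched in Python by storing only the date of birth and computing the age on request. The Person class, its field names and the sample dates are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str
    date_of_birth: date  # stored attribute

    def age_on(self, today: date) -> int:
        # Derived attribute: computed from date_of_birth, never stored.
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

person = Person("Asha", date(2000, 6, 15))
age_before_birthday = person.age_on(date(2024, 6, 14))
age_on_birthday = person.age_on(date(2024, 6, 15))
print(age_before_birthday, age_on_birthday)
```

Since the age is never stored, it cannot fall out of date as time passes.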
E-R diagram with multivalued and derived attributes:
3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the relationship among entities.
1. One to One Relationship
When a single instance of an entity is associated with a single instance of another entity, it is
called a one to one relationship. For example, a person has only one passport and a passport is given to one
person.
[Diagram: Person (1) - has - (1) Passport]
2. One to Many Relationship
When a single instance of an entity is associated with more than one instance of another
entity, it is called a one to many relationship. For example, a customer can place
many orders but an order cannot be placed by many customers.
[Diagram: Customer (1) - Placed - (M) Order]
[Diagram: Student (M) - Study - (1) College]
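The passport and customer examples above can be sketched as tables to show how the two cardinalities differ in practice. This assumes one common mapping: a unique foreign key for one to one and a plain foreign key for one to many. The table and column names are hypothetical, and the order table is called orders because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- One to one: UNIQUE stops a person from holding two passports.
    CREATE TABLE passport (
        passport_no TEXT PRIMARY KEY,
        person_id   INTEGER NOT NULL UNIQUE REFERENCES person (person_id)
    );
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- One to many: many order rows may carry the same customer_id.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
    );
    INSERT INTO person VALUES (1, 'Asha');
    INSERT INTO passport VALUES ('P100', 1);
    INSERT INTO customer VALUES (1, 'Ravi');
    INSERT INTO orders VALUES (500, 1), (501, 1);
""")

orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1"
).fetchone()[0]

# A second passport for the same person breaks the one to one rule.
try:
    conn.execute("INSERT INTO passport VALUES ('P200', 1)")
    second_passport_rejected = False
except sqlite3.IntegrityError:
    second_passport_rejected = True

print(orders_for_customer, second_passport_rejected)
```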
These concepts are used to create E-R diagrams.
Subclasses and Super class
A super class is an entity that can be divided into further subtypes. For example, consider the Shape super
class.
The super class Shape has the sub groups Triangle, Square and Circle.
Sub classes are groups of entities with some unique attributes. A sub class
inherits the properties and attributes of its super class.
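The Shape example maps naturally onto class inheritance in a programming language. The sketch below is only an analogy for the ER concept, and the attributes colour, radius and side are assumptions, since the notes do not list any attributes for Shape.

```python
class Shape:
    """Super class: attributes shared by every shape."""

    def __init__(self, colour):
        self.colour = colour


class Circle(Shape):
    """Sub class: inherits colour and adds its own radius attribute."""

    def __init__(self, colour, radius):
        super().__init__(colour)
        self.radius = radius


class Square(Shape):
    """Sub class: inherits colour and adds its own side attribute."""

    def __init__(self, colour, side):
        super().__init__(colour)
        self.side = side


circle = Circle("red", 2.0)
is_shape = isinstance(circle, Shape)  # every Circle is also a Shape
inherited_colour = circle.colour      # attribute defined on the super class
print(is_shape, inherited_colour, circle.radius)
```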
Normalization of Database
Database Normalization is a technique of organizing the data in the
database. Normalization is a systematic approach of decomposing tables to eliminate
data redundancy (repetition) and undesirable characteristics like Insertion, Update
and Deletion Anomalies. It is a multi-step process that puts data into tabular form,
removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
1. Eliminating redundant (useless) data
2. Ensuring data dependencies make sense, i.e. data is logically stored.
Problems without Normalization
If a table is not properly normalized and has data redundancy, it will not
only take up extra memory space but will also make it difficult to handle and update
the database without facing data loss. Insertion, update and deletion anomalies
are very frequent if a database is not normalized. To understand these anomalies let us
take an example of a Student table.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, data of the student
cannot be inserted, or else we will have to set the branch information