BSC CsIt Complete RDBMS Notes
The collection of data, usually referred to as the database, contains information relevant to
an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
Database systems are designed to manage large bodies of information.
Management of data involves both defining structures for storage of information and
providing mechanisms for the manipulation of information.
• Banking: For customer information, accounts, loans, and banking transactions.
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner—terminals situated around the world accessed
the central database system through phone lines and other data networks.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds.
• Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and benefits, and
for generation of paychecks.
Disadvantages of File-Processing Systems
• Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different formats and
the programs may be written in several programming languages. Moreover, the same information
may be duplicated in several places (files). For example, the address and telephone number of a
particular customer may appear in a file that consists of savings-account records and in a file that
consists of checking-account records. This redundancy leads to higher storage and access cost. In
addition, it may lead to data inconsistency; that is, the various copies of the same data may no
longer agree. For example, a changed customer address may be reflected in savings-account
records but not elsewhere in the system.
• Difficulty in accessing data. Conventional file-processing environments do not allow needed
data to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems
are required for general use.
• Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. For example, the balance of a bank account may never fall below a
prescribed amount (say, $25). Developers enforce these constraints in the system by adding
appropriate code in the various application programs. However, when new constraints are added,
it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
• Atomicity problems. A computer system, like any other mechanical or electrical device, is
subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored
to the consistent state that existed prior to the failure. An operation such as a funds transfer must
happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional
file-processing system.
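As a sketch of what atomicity means in practice, the following uses SQLite from Python (purely an illustration; the account names and amounts are invented, not from the notes). A simulated failure between the debit and the credit is rolled back, so the data return to the consistent state that existed before:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount, crash=False):
    """Debit src and credit dst; on failure, roll back so neither happens."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_number = ?",
                     (amount, src))
        if crash:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_number = ?",
                     (amount, dst))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # restore the consistent state that existed prior to the failure

transfer(conn, "A", "B", 30, crash=True)   # fails mid-way: nothing changes
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50}
```

In a conventional file-processing system there is no such rollback facility: a crash after the debit but before the credit leaves the files half-updated.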
• Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. In such an
environment, interaction of concurrent updates may result in inconsistent data.
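The interaction of concurrent updates can be sketched without any database at all; this plain-Python fragment acts out the classic lost-update anomaly (the balance and withdrawal amounts are invented for illustration):

```python
# Two interleaved read-modify-write sequences on a shared balance,
# with no concurrency control at all.
balance = 100

read_by_a = balance          # user A reads the balance ...
read_by_b = balance          # ... and so does user B, before A writes back

balance = read_by_a - 50     # A withdraws 50 and writes back 50
balance = read_by_b - 10     # B withdraws 10 and writes back 90, wiping out A's update

print(balance)  # 90, although 40 would be correct: A's withdrawal is lost
```

A DBMS prevents this by supervising the interleaving, for example by locking the balance between each read and its corresponding write.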
• Security problems. Not every user of the database system should be able to access all the data.
For example, in a banking system, payroll personnel need to see only that part of the database
that has information about the various bank employees. They do not need access to information
about customer accounts. But, since application programs are added to the system in an ad hoc
manner, enforcing such security constraints is difficult.
Advantages of DBMS
1. Controlling Redundancy
Redundancy in storing the same data multiple times leads to several problems. It is necessary to
use controlled redundancy for improving the performance of queries.
2. Restricting Unauthorized Access
When multiple users share a large database, it is likely that most users will not be authorized to
access all information in the database. For example, financial data is often considered
confidential, and hence only authorized persons are allowed to access such data. A DBMS
should provide a security and authorization subsystem, which the DBA uses to create accounts
and to specify account restrictions.
3. Providing Persistent Storage for Program Objects
Databases can be used to provide persistent storage for program objects and data structures. This
is one of the main reasons for object-oriented database systems. Object-oriented database
systems typically offer data structure compatibility with one or more object-oriented
programming languages.
4. Providing Backup and Recovery
A DBMS must provide facilities for recovering from hardware or software failures. The backup
and recovery subsystem of the DBMS is responsible for recovery.
5. Providing Multiple User Interfaces
Because many types of users with varying levels of technical knowledge use a database, a
DBMS should provide a variety of user interfaces.
6. Representing Complex Relationships among Data
A database may include numerous varieties of data that are interrelated in many ways. A DBMS
must have the capability to represent a variety of complex relationships among the data as well
as to retrieve and update related data easily and efficiently.
7. Enforcing Integrity Constraints
Most database applications have certain integrity constraints that must hold for the data. A
DBMS should provide capabilities for defining and enforcing these constraints. The simplest
type of integrity constraint involves specifying a data type for each data item.
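A constraint such as the $25 minimum-balance rule mentioned earlier can be declared once in the schema instead of being re-coded in every application program. A minimal sketch using SQLite's CHECK clause (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The minimum-balance rule declared once, in the schema itself.
conn.execute("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        INTEGER NOT NULL CHECK (balance >= 25)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")     # accepted
try:
    conn.execute("INSERT INTO account VALUES ('A-102', 10)")  # violates the rule
except sqlite3.IntegrityError as exc:
    violation = str(exc)
print(violation)  # CHECK constraint failed: ...
```

Every program that inserts or updates accounts is now subject to the rule automatically; adding a new constraint means changing the schema, not every program.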
Disadvantages of DBMS
1. Cost
A DBMS requires a high initial investment in hardware, software and trained staff. A significant
investment, based upon the size and functionality of the organization, is required. The
organization also has to pay a recurring annual maintenance cost.
2. Complexity
A DBMS fulfills many requirements and solves many problems related to the database, but all
this functionality has made the DBMS extremely complex software. Developers, designers,
DBAs and end users of the database must have adequate skills if they want to use it properly. If
they do not understand this complex system, it may cause loss of data or database failure.
3. Technical Staff Requirement
Any organization has many employees who can also perform many other tasks outside their
domain, but it is not easy for them to work on a DBMS. A team of technical staff who understand
the DBMS is required, and the company has to pay them a handsome salary as well.
4. Database Failure
Since in a DBMS all the files are stored in a single database, the impact of a database failure is
greater. Any accidental failure of a component may cause loss of valuable data. This is a serious
concern for big firms.
5. Extra Cost of Hardware
A DBMS requires disk storage for the data, and sometimes you need to purchase extra space to
store your data. You may also need a dedicated machine for better performance of the database.
These machines and this storage space add extra hardware costs.
Prepared by: Lect. Arohi Patil Page 5
MIT CIDCO DBMS B.Sc. (CS/IT) III Semester
6. Size
As a DBMS is large software owing to its functionality, it requires lots of space and memory to
run its applications efficiently, and it grows bigger as data is fed into it.
7. Cost of Data Conversion
Data conversion may be required at any time, and the organization has to take this step. The cost
of data conversion can exceed the cost of the DBMS hardware and machines combined. Trained
staff are needed to convert the data to the new system. This is a key reason why many
organizations still work on their old DBMS: the high cost of data conversion.
8. Currency Maintenance
As new threats appear daily, the DBMS needs to be updated regularly so that it stays current.
9. Performance
Traditional file systems were very good for small organizations, as they gave splendid
performance. A DBMS, by contrast, can give poor performance for small-scale firms because of
its overhead.
Data Abstraction
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.
The need for efficiency has led designers to use complex data structures to represent data in the
database. Since many database-systems users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify users' interactions with
the system:
• Physical level. The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. Database
administrators, who must decide what information to keep in the database, use the logical level
of abstraction.
• View level. The highest level of abstraction describes only part of the entire database. Many
users of the database system do not need all this information; instead, they need to access only a
part of the database. The view level of abstraction exists to simplify their interaction with the
system. The system may provide many views for the same database.
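The usual SQL mechanism behind the view level is the view. A minimal sketch using SQLite from Python (the employee names and columns are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Asha", "Loans", 40000), ("Ravi", "Tellers", 30000)])
# A view describes only part of the database: names and departments
# are exposed, salaries stay hidden from users of the view.
conn.execute("CREATE VIEW staff_directory AS SELECT name, department FROM employee")
rows = conn.execute("SELECT * FROM staff_directory").fetchall()
print(rows)
```

The same base table can carry many such views, one per class of user.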
Database Languages
A database system provides a data definition language to specify the database schema and a
data manipulation language to express database queries and updates
The following statement in the SQL language defines the account table:
create table account
(account-number char(10),
balance integer);
Execution of the above DDL statement creates the account table. In addition, it updates a special
set of tables called the data dictionary or data directory.
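The effect can be observed in SQLite, whose built-in sqlite_master catalog plays the role of the data dictionary (the hyphen in account-number is replaced by an underscore here, since most SQL dialects do not allow '-' in plain identifiers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The DDL statement from the text, executed against SQLite.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")
# sqlite_master now records the new table's name and definition,
# just as a data dictionary records schema metadata.
name, ddl = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()
print(name)  # account
print(ddl)
```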
The storage structure and access methods used by the database system are specified by a set of
statements in a special type of DDL called a data storage and definition language. These
statements define the implementation details of the database schemas, which are usually hidden
from the users.
Data-Manipulation Language
There are two basic types of DML:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
The DML component of the SQL language is nonprocedural. A query is a statement requesting
the retrieval of information. The portion of a DML that involves information retrieval is called a
query language.
For example:
select customer.customer-name
from customer
1. DDL (Data definition language): The Data Definition Language (DDL) contains elementary
commands whose purpose is to specify the set of definitions of a database schema. The results
are stored in the data dictionary.
CREATE TABLE ...
ALTER TABLE ...
CREATE INDEX ...
CREATE VIEW ...
DROP TABLE ...
DROP VIEW ...
DROP INDEX ...
2. DML (Data manipulation language): DML contains commands that enable users to access
and manipulate the data stored in the database. Normally the commands are categorized into
procedural and non-procedural commands. The procedural DML commands require the user to
specify what data are needed and how to get them. The non-procedural DML commands require
the user to specify what data are needed without specifying how to get them.
Data manipulation is nothing but to retrieve, insert, delete and modify the data stored in a
database.
SELECT ...
SELECT ... ORDER BY ...
SELECT ... GROUP BY ...
INSERT INTO ...
DELETE FROM ...
UPDATE ...
3. DCL (Data control language): DCL contains commands that are used to provide security on
the data contained in the database tables. The data is protected from unauthorized access, and
permissions are granted to read, write and update the data in the tables.
GRANT ...
REVOKE ...
COMMIT ...
SAVEPOINT ...
ROLLBACK ...
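GRANT and REVOKE require a multi-user server such as MySQL or PostgreSQL, but the transaction-control trio can be sketched with SQLite from Python (the table name is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT before_risky")    # a named point inside the transaction
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO before_risky")  # undoes only the second insert
conn.commit()                             # COMMIT makes the surviving insert permanent
rows = conn.execute("SELECT x FROM t").fetchall()
print(rows)  # [(1,)]
```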
DBMS Facilities
Data Definition Facilities –
It allows a database designer to define the database using a Data Definition Language (DDL)
provided for the particular DBMS. The DDL allows the designer to specify the data types and
structures, and the constraints on the data to be stored in the database.
Example:
CREATE TABLE ...
ALTER TABLE ...
CREATE INDEX ...
CREATE VIEW ...
DROP TABLE ...
DROP VIEW ...
DROP INDEX ...
Data Manipulation Facilities –
It allows users to insert, update, delete and retrieve data from the database through a Data
Manipulation Language (DML). Having a central repository for all data and data
descriptions allows the DML to provide a general enquiry facility to this data, called a query
language. Using a query language, directly or indirectly, enables new lines of enquiry to be
constructed and satisfied quickly. A query language is at a sufficiently high level to allow
non-technical personnel to use it easily. The most common query language is the Structured
Query Language (SQL, pronounced 'S-Q-L').
Example:
SELECT ...
SELECT ... ORDER BY ...
SELECT ... GROUP BY ...
INSERT INTO ...
DELETE FROM ...
UPDATE ...
Help Facilities –
MS Access provides helpful wizards that allow 'novice' users to carry out a task, and even
'expert' users to do it more easily.
Reporting Facilities –
Creating professional-looking reports from SELECT statements; MS Access is again a good
example of this.
Multi-user Functionality –
Allowing more than one user to access the database simultaneously, including concurrency
controls such as locking the part of the database that is being updated.
Distributed Databases –
Allowing the database to be stored on, and accessed from, several machines at different sites.
CASE Tools –
Computer-aided software engineering (CASE) tools that support the design and development of
the database and its applications.
Database Users
• Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs to
transfer $50 from account A to account B invokes a program called transfer.
The typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated from the
database.
• Application programmers are computer professionals who write application programs. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program. There are also special types of
programming languages that combine imperative control structures (for example, for loops,
while loops and if-then-else statements) with statements of the data manipulation language.
These languages are sometimes called fourth-generation languages.
• Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands.
Online analytical processing (OLAP) tools simplify analysts' tasks by letting them view
summaries of data in different ways.
Another class of tools for analysts is data mining tools, which help them find certain kinds of
patterns in data.
• Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-
aided design systems, knowledgebase and expert systems, systems that store data with complex
data types (for example, graphics data and audio data), and environment-modeling systems.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
• Schema and physical-organization modification. The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator's routine maintenance activities
are:
• Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss
of data in case of disasters such as flooding.
• Ensuring that enough free disk space is available for normal operations, and upgrading disk
space as required.
• Monitoring jobs running on the database and ensuring that performance is not degraded by
very expensive tasks submitted by some users.
Storage Manager
A storage manager is a program module that provides the interface between the low level data
stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for storing, retrieving, and updating data in the database. The
storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory. The storage manager implements several data structures as part of the
physical system implementation:
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which provide fast access to data items that hold particular values.
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. A query can
usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization; that is, it picks the lowest-cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
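The optimizer's choice between alternative evaluation plans can be observed in SQLite with EXPLAIN QUERY PLAN; a sketch (the table and index names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports the evaluation plan the optimizer picked;
    # the last column of each row is the human-readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT balance FROM account WHERE account_number = 'A-101'"
first = plan(query)   # without an index: a full scan of the table
conn.execute("CREATE INDEX idx_acct ON account(account_number)")
second = plan(query)  # the optimizer now picks the lower-cost index lookup
print(first)
print(second)
```

The same query text yields two different plans; the DML compiler, not the user, decides which one runs.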
Database Architecture
In a two-tier architecture, the application is partitioned into a component that resides at the
client machine, which invokes database system functionality at the server machine through query
language statements. Application program interface standards like ODBC and JDBC are used for
interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end and does
not contain any direct database calls. Instead, the client end communicates with an application
server, usually through a forms interface. The application server in turn communicates with a
database system to access data. The business logic of the application, which says what actions to
carry out under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more appropriate for large
applications, and for applications that run on the World Wide Web.
Many Web applications use an architecture called the three-tier architecture, which adds an
intermediate layer between the client and the database server, as illustrated in Figure.
This intermediate layer or middle tier is sometimes called the application server and sometimes
the Web server, depending on the application. This server plays an intermediary role by storing
business rules (procedures or constraints) that are used to access data from the database server. It
can also improve database security by checking a client's credentials before forwarding a request
to the database server. Clients contain GUI interfaces and some additional application-specific
business rules. The intermediate server accepts requests from the client, processes the request
and sends database commands to the database server, and then acts as a conduit for passing
(partially) processed data from the database server to the clients, where it may be processed
further and filtered to be presented to users in GUI format. Thus, the user interface, application
rules, and data access act as the three tiers.
Advances in encryption and decryption technology make it safer to transfer sensitive data from
server to client in encrypted form, where it will be decrypted. The latter can be done by the
hardware or by advanced software. This technology gives higher levels of data security, but the
network security issues remain a major concern. Various technologies for data compression are
also helping in transferring large amounts of data from servers to clients over wired and wireless
networks.
Data Independence
The three-schema architecture can be used to further explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data
independence:
1. Logical data independence is the capacity to change the conceptual schema without having
to change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item). Changes to constraints can be applied to
the conceptual schema without affecting the external schemas or application programs.
2. Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some physical files had to be
reorganized-for example, by creating additional access structures-to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema.
Data independence occurs because when the schema is changed at some level, the schema at the
next higher level remains unchanged; only the mapping between the two levels is changed.
Hence, application programs referring to the higher-level schema need not be changed. The
three-schema architecture can make it easier to achieve true data independence, both physical
and logical.
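Logical data independence can be sketched with a view standing in for an external schema (SQLite; the customer data are invented for illustration): the conceptual schema is expanded with a new data item, and the external schema, and any program written against it, is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES ('Nina', 'Pune')")
# External schema: applications query this view, not the base table.
conn.execute("CREATE VIEW customer_view AS SELECT name, city FROM customer")

# Expand the conceptual schema by adding a data item ...
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")
# ... only the mapping between the two levels changes; the view's
# result is exactly what it was before.
rows = conn.execute("SELECT * FROM customer_view").fetchall()
print(rows)  # [('Nina', 'Pune')]
```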
Therefore, objects contain both executable code and data. There are other characteristics of
objects, such as whether methods or data can be accessed from outside the object; we do not
consider these here, to keep the definition simple and to apply it to what an object database is. One
other term worth mentioning is classes. Classes are used in object oriented programming to
define the data and methods the object will contain. The class is like a template to the object. The
class does not itself contain data or methods but defines the data and methods contained in the
object. The class is used to create (instantiate) the object. Classes may be used in object
databases to recreate parts of the object that may not actually be stored in the database. Methods
may not be stored in the database and may be recreated by using a class.
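The class-as-template idea can be sketched in a few lines of Python (the Account class is invented for illustration, not taken from any object database product):

```python
class Account:
    """A class: the template that defines the data and methods of its objects."""

    def __init__(self, number, balance):
        self.number = number     # data stored in the object
        self.balance = balance

    def deposit(self, amount):   # executable code carried alongside the data
        self.balance += amount

# Instantiation: the class is used to create the object.
acct = Account("A-101", 500)
acct.deposit(100)
print(acct.balance)  # 600
```

The class itself holds no balances; each instantiated object carries its own data, while the method definitions can be recreated from the class rather than stored with every object.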
Operational Databases
In its day to day operation, an organization generates a huge amount of data. Think of things
such as inventory management, purchases, transactions and financials. All this data is collected
in a database which is often known by several names such as operational/ production database,
subject-area database (SADB) or transaction databases.
An operational database is usually hugely important to organizations, as it includes the customer
database, personnel database and inventory database, i.e. the details of how much of a product
the company has as well as information on the customers who buy it. The data stored in
operational databases can be changed and manipulated depending on what the company
requires.
Database Warehouses
Organizations are required to keep all relevant data for several years. In the UK it can be as long
as 6 years. This data is also an important source of information for analyzing and comparing the
current year data with that of the past years which also makes it easier to determine key trends
taking place. All this data from previous years are stored in a database warehouse. Since the data
stored has gone through all kinds of screening, editing and integration it does not need any
further editing or alteration.
Distributed Databases
Many organizations have several office locations, manufacturing plants, regional offices, branch
offices and a head office at different geographic locations. Each of these work groups may have
their own database which together will form the main database of the company. This is known as
a distributed database.
End-User Databases
There is a variety of data available at the workstation of all the end users of any organization.
Each workstation is like a small database in itself which includes data in spreadsheets,
presentations, word files, note pads and downloaded files. All such small databases form a
different type of database called the end-user database.
Data Association
Association
Association is a relationship between two objects. In other words, association defines the
multiplicity between objects. You may be aware of one-to-one, one-to-many, many-to-one,
many-to-many all these words define an association between objects. Aggregation is a special
form of association. Composition is a special form of aggregation.
Composition
Composition is a stronger form of aggregation in which the part cannot exist independently of
the whole.
Relationship –
The logical association among entities is called a relationship. Relationships are mapped
between entities in various ways. Mapping cardinalities define the number of associations
between two entities.
Mapping cardinalities −
one to one
one to many
many to one
many to many
Data Models
A database model is a type of data model that determines the logical structure of a database and
fundamentally determines in which manner data can be stored, organized, and manipulated.
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.
A data model, a collection of concepts that can be used to describe the structure of a database,
provides the necessary means to achieve this abstraction. By the structure of a database, we
mean the data types, relationships, and constraints that should hold for the data. Most data
models also include a set of basic operations for specifying retrievals and updates on the
database.
A model is a representation of reality: 'real world' objects and events, and their associations. A
data model represents the organization itself.
It is a collection of conceptual tools for describing data, data relationships, data semantics
and consistency constraints.
Data models define how data is connected to each other and how they are processed and
stored inside the system.
It should provide the basic concepts and notations that will allow database designers and end
users unambiguously and accurately to communicate their understanding of the
organizational data.
1. High Level-conceptual data model: User level data model is the high level or
conceptual model. This provides concepts that are close to the way that many users
perceive data.
2. Low level-Physical data model : Physical data models describe how data is stored in
the computer, representing information such as record structures, record ordering, and
access paths. There are not as many physical data models as logical data models, the most
common one being the Unifying Model.
Low level data model is only for Computer specialists not for end-user.
3. Representational data model: It lies between the high-level and low-level data models,
and provides concepts that may be understood by end users but that are not too far
removed from the way data is organized within the computer.
Record based logical models are used in describing data at the logical and view levels. In
contrast to object based data models, they are used to specify the overall logical structure
of the database and to provide a higher-level description of the implementation. Record
based models are so named because the database is structured in fixed format records of
several types. Each record type defines a fixed number of fields, or attributes, and each
field is usually of a fixed length.
The three most widely accepted record based data models are:
• Hierarchical Model
• Network Model
• Relational Model
Object based data models use concepts such as entities, attributes, and relationships. An
entity is a distinct object (a person, place, concept, and event) in the organization that is
to be represented in the database. An attribute is a property that describes some aspect of
the object that we wish to record, and a relationship is an association between entities.
Some of the more common types of object based data model are:
• Entity-Relationship
• Object Oriented
• Object Relational
Relational Model
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
The relational database was invented by E. F. Codd at IBM in 1970. The relational model
represents data and relationships among data by a collection of tables, each of which has a
number of columns with unique names. Relational data model is used widely around the world
for data storage and processing. This model is simple and it has all the properties and capabilities
required to process data with storage efficiency.
For example, the following figure shows a relational database of customers and their accounts.
The customer Nina has two accounts, with balances of Rs. 50000 and Rs. 30000.
account-number  balance
101             50000
201             30000
402             150000
506             80000
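The figure above can be recreated in SQLite; the customer table's layout is an assumption (the notes show only the account figure), added so that the relationship between the two tables can be expressed by matching column values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, account_number TEXT)")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("Nina", "101"), ("Nina", "201")])
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("101", 50000), ("201", 30000), ("402", 150000), ("506", 80000)])
# The relationship between the tables lives in the matching columns,
# not in any pointer or link structure.
rows = conn.execute("""
    SELECT c.name, a.account_number, a.balance
    FROM customer c JOIN account a ON c.account_number = a.account_number
    WHERE c.name = 'Nina'
    ORDER BY a.account_number
""").fetchall()
print(rows)  # [('Nina', '101', 50000), ('Nina', '201', 30000)]
```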
Advantages
In a relational database, changes in the database structure do not affect data access, so the
relational database has structural independence.
Database design, maintenance, administration and usage are much easier than in the other
models.
It is simpler to navigate.
Disadvantages
Hierarchical Model
A hierarchical data model is a data model in which the data are organized into a tree-like
structure. The structure allows repeating information using parent/child relationships: each
parent can have many children, but each child has only one parent. All attributes of a specific
record are listed under an entity type.
In Hierarchical model data elements are linked as an inverted tree structure (root at the top with
branches formed below). Below the single root data element are subordinate elements each of
which in turn has its own subordinate elements and so on, the tree can grow to multiple levels.
Data elements have a parent-child relationship, as in a family tree.
For Example in an organization employees are categorized by their department and within a
department they are categorized by their job function such as managers, engineers, technicians
and support staff.
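The organization example above can be sketched as a tree in Python. The department and employee names are made up for illustration; the point is that every node has exactly one parent, which is the defining property of the hierarchical model.

```python
# Hierarchical model sketch: employees grouped by department, then by role.
# Each child appears under exactly one parent.
tree = {
    "Company": {
        "Engineering": {"Managers": ["Asha"], "Engineers": ["Ravi", "Meera"]},
        "Support":     {"Managers": ["John"], "Technicians": ["Lee"]},
    }
}

def find_parent(node, target, parent=None):
    """Return the single parent of `target` in the tree (None for the root)."""
    for key, child in node.items():
        if key == target:
            return parent
        if isinstance(child, dict):
            found = find_parent(child, target, key)
            if found is not None:
                return found
    return None

print(find_parent(tree, "Engineers"))    # the unique parent department
print(find_parent(tree, "Engineering"))  # its parent in turn
```

Because each node has one parent, a lookup like this always returns a single answer; representing an employee who belongs to two departments would not fit this structure, which is exactly the limitation the network model addresses next.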
Advantages
1. Records are represented using an ordered tree, which is a natural way of implementing
one-to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
3. Allows the use of virtual records. This results in a stable database, especially when
modifications of the database are made.
4. The hierarchical model was the first database model that offered data security provided
and enforced by the DBMS.
Disadvantages
1. Although the hierarchical database model is conceptually simple and easy to design, it is quite
complex to implement.
2. If you make any changes in the database structure of hierarchical database, then you need to
make the necessary changes in all the application programs that access the database. Thus
maintaining the database and the applications can become very difficult.
Network model
The data in the network model are represented by collection of records and relationships among
data are represented by links, which can be viewed as pointers.
This model is an extension of the hierarchical data model. In this model there also exists a
parent-child relationship, but a child data element can have more than one parent element, or no
parent at all. The main difference of the network model from the hierarchical model is its ability
to handle many-to-many (N:M) relationships; in other words, it allows a record to have more than
one parent.
Example of Network model is given below where there are relationships among courses offered
and students enrolled for each course in a college. Each student can be enrolled for several
courses and each course may have a number of students enrolled for it. The students enrolled for
English are Miya and Priyanka, and Miya has taken three courses: English, Math and Science. The
example also shows a child element that has no parent element, i.e., a student who has not taken
any course this semester; he might be a research student.
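The enrollment example above can be sketched in Python, with each enrollment record acting as a link between a course and a student, the way the network model uses pointers. The student names follow the notes; "Raj" is an assumed name for the unenrolled student, which the notes leave unnamed.

```python
# Network model sketch: many-to-many links between courses and students.
# Each (course, student) pair is one link record.
enrollments = [
    ("English", "Miya"), ("English", "Priyanka"),
    ("Math", "Miya"), ("Science", "Miya"),
]
students = {"Miya", "Priyanka", "Raj"}  # Raj has no parent course

def courses_of(student):
    return sorted(c for c, s in enrollments if s == student)

def students_in(course):
    return sorted(s for c, s in enrollments if c == course)

print(courses_of("Miya"))      # Miya has three parents (courses)
print(students_in("English"))  # English has two children (students)
print(courses_of("Raj"))       # a child with no parent at all
```

Unlike the hierarchical sketch, a record here can have several parents (Miya) or none (Raj), which is the network model's distinguishing capability.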
Advantages
1. Changes in data characteristics do not require changes to the application programs.
2. Data access is easier and more flexible than in the hierarchical model.
Disadvantages
1. The insertion, deletion and updating operations on any record require a large number of
pointer adjustments.
Entity
An entity can be a real-world object, either animate or inanimate, that can be easily identified.
For example, in a school database, students, teachers, classes and courses offered can be
considered as entities. Entities are represented by means of rectangles.
Relationship
Attributes
Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes. Attributes are
represented by means of ellipses. Every ellipse represents one attribute and is directly connected
to its entity (rectangle).
Example of ER model
Let us consider an ER model for a banking system consisting of customers and accounts. The
diagram below indicates that there are two entity sets, customer and account, with attributes
customer name, address, account no. and balance. The diagram also shows a relationship set
depositor between customer and account.
Advantages of ER model
The E-R model gives a graphical, diagrammatic representation of the various entities,
their attributes and the relationships between entities. It therefore helps in clearly
understanding the data structure and in minimizing redundancy and other problems.
Conversion of ER Diagram to any other data model like network model, hierarchical
model and the relational model is very easy.
Disadvantages
The E-R data model is suited mainly to high-level design.
A physical design derived from an E-R model may contain some amount of ambiguity or
inconsistency.
One set comprises models of persistent O-O Programming Languages such as C++ (e.g., in
OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE).
The following diagram represents an example of object oriented database structure. Here Class
vehicle is root of a class composition hierarchy including classes VehicleSpecs, Company and
Employee. Class Vehicle is also root of a class Hierarchy involving classes. Two Wheeler and
FourWheeler. Class Company is in turn, root of a class hierarchy with subclasses Domestic
Company and ForeignCompany. It is also root of a class composition hierarchy involving class
Employee.
For the above database structure a typical query may be: "President's and Company's names for all
companies that manufacture two-wheeler vehicles and are located in Pune, India".
Advantages
Data access is easy.
Disadvantages
There is no universally agreed data model for an OODBMS, and most models lack a
theoretical foundation.
The increased functionality provided by the OODBMS makes the system more complex
than that of traditional DBMSs.
Object-Relational Models
• Most recent trend; started with Informix Universal Server.
The object-relational model is designed to provide a relational database management system that
allows developers to integrate databases with their own data types and methods. It is essentially
a relational model that allows users to integrate object-oriented features into it. The
object-relational model combines the advantages of modern object-oriented programming languages
with relational database features such as multiple views of data and a high-level, non-procedural
query language. Some of the systems available in the market are IBM's DB2 Universal Server,
Oracle Corporation's Oracle 8, Microsoft Corporation's SQL Server 7, and so on.
Advantages
It allows users to define new data types that combine one or more of the currently
existing data types. Complex types give better flexibility in organizing the data in a
structure made up of columns and tables.
Users are able to extend the capability of the database server by defining new data
types as well as user-defined functions. This lets the user store and manage richer
data.
Disadvantages
Storage structures and access methods become quite complex.
Entity
An entity can be a real-world object, either animate or inanimate, that can be
easily identified. For example, in a school database, students, teachers,
classes and courses offered can be considered as entities. Entities are
represented by means of rectangles.
Entity set:
Same as an entity type, but defined at a particular point in time, such as students enrolled in a
class on the first day. Other examples: Customers who purchased last month, cars currently
registered in Florida. A related term is instance, in which the specific person or car would be an
instance of the entity set.
Entity categories:
Entities are categorized as strong, weak or associative. A strong entity can be defined solely by
its own attributes, while a weak entity cannot. An associative entity associates entities (or
elements) within an entity set.
An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of a strong entity set.
A discriminator of a weak entity set is the set of attributes that distinguishes
among all the entities of a weak entity set.
The primary key of a weak entity set is formed by the primary key of the strong
entity set on which the weak entity set is existence dependent, plus the weak
entity set's discriminator.
Example:
Relationship
A relationship is an association among several entities. For example, an
employee works at a department and a student enrolls in a course. Here, works
at and enrolls are called relationships. Relationships are represented by a
diamond-shaped box.
Attributes
Attribute categories:
Attributes are categorized as simple, composite or derived, as well as single-value or
multi-value.
Simple: The attribute value is atomic and can't be further divided, such as a phone
number.
Composite: The attribute value is composed of several parts, such as an address made up
of street, city and PIN code.
Derived: The attribute is calculated or otherwise derived from another attribute, such as age
from a birthdate.
Multi-value: More than one attribute value is denoted, such as multiple phone numbers for a
person.
Single-value: Just one attribute value. The types can be combined, for example: simple
single-value attributes or composite multi-value attributes.
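The attribute categories above can be sketched for a hypothetical Student entity; the attribute names and values below are made up for illustration. The key point is that a derived attribute such as age is never stored, only computed from a stored attribute.

```python
from datetime import date

# One Student entity with each attribute category represented:
student = {
    "name": "Asha",                                    # simple, single-value
    "address": {"street": "MG Road", "city": "Pune"},  # composite
    "birthdate": date(2000, 6, 15),                    # stored attribute
    "phones": ["98220-00000", "98220-11111"],          # multi-value
}

def derived_age(birthdate, today):
    """Age is a derived attribute: computed from birthdate, not stored."""
    years = today.year - birthdate.year
    # subtract one year if this year's birthday has not happened yet
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years

print(derived_age(student["birthdate"], date(2024, 6, 14)))  # 23
```

Storing age directly would create an update anomaly (it goes stale every year); deriving it keeps the database consistent.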
Entity keys:
Prepared by: Lect. Arohi Patil Page 37
MIT CIDCO DBMS B.Sc. (CS/IT) III Semester
Refers to an attribute that uniquely defines an entity in an entity set. Entity keys can be super,
candidate or primary.
Super key: A set of attributes (one or more) that together define an entity in an entity set.
Candidate key: A minimal super key, meaning it has the least possible number of attributes to
still be a super key. An entity set may have more than one candidate key.
Primary key: A candidate key chosen by the database designer to uniquely identify the entity
set.
Foreign key: Identifies the relationship between entities.
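The super key / candidate key distinction above can be checked mechanically. The sketch below uses a made-up three-row relation; it finds every attribute set that uniquely identifies the rows (super keys), then keeps only the minimal ones (candidate keys).

```python
from itertools import combinations

# A tiny sample relation; the rows and attribute names are illustrative.
rows = [
    {"id": 1, "email": "a@x.com", "city": "Pune"},
    {"id": 2, "email": "b@x.com", "city": "Pune"},
    {"id": 3, "email": "c@x.com", "city": "Nashik"},
]

def is_super_key(attrs):
    """A set of attributes is a super key if no two rows agree on all of them."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

attrs = ["id", "email", "city"]
supers = [set(c) for n in range(1, len(attrs) + 1)
          for c in combinations(attrs, n) if is_super_key(c)]
# Candidate keys: super keys with no proper subset that is also a super key.
candidates = [k for k in supers if not any(s < k for s in supers)]
print(sorted(sorted(k) for k in candidates))  # [['email'], ['id']]
```

Here both id and email are candidate keys; the designer would pick one (typically id) as the primary key, and city alone fails because two rows share 'Pune'.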
The ER Model can be diagrammatically represented as follows:
Example of ER model
Let us consider an ER model for a banking system consisting of customers and accounts. The
diagram below indicates that there are two entity sets, customer and account, with attributes
customer name, address, account no. and balance. The diagram also shows a relationship set
depositor between customer and account.
Relationship Types:
There are three types of relationship that exist between Entities.
a) Binary Relationship
b) Recursive Relationship
c) Ternary Relationship
a) Binary Relationship
Binary Relationship means relation between two Entities.
1. One to One: It reflects the business rule that one entity is associated with exactly one
instance of another entity. The above example describes that one student can enroll for only
one course and a course will also have only one student. This is not what you will usually
see in a relationship.
2. One to Many: It reflects the business rule that one entity is associated with many instances
of another entity. The example for this relation might sound a little weird, but it means that
one student can enroll in many courses, while each course has only one student.
The arrows in the diagram describe that one student can enroll for only one course.
3. Many to One: It reflects business rule that many entities can be associated with just one entity.
For example, Student enrolls for only one Course but a Course can have many Students.
4. Many to Many :
The above diagram represents that many students can enroll for more than one course.
b) Recursive Relationship
When an Entity is related with itself it is known as Recursive Relationship.
c) Ternary Relationship
Relationship of degree three is called Ternary relationship.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
1. Binary = degree 2
2. Ternary = degree 3
3. n-ary = degree n
One-to-one − One entity from entity set A can be associated with at most one entity of
entity set B and vice versa.
One-to-many − One entity from entity set A can be associated with more than one
entity of entity set B; however, an entity from entity set B can be associated with at
most one entity from entity set A.
Many-to-one − More than one entity from entity set A can be associated with at most
one entity of entity set B; however, an entity from entity set B can be associated with
more than one entity from entity set A.
Many-to-many − One entity from A can be associated with more than one entity from B
and vice versa.
Participation Constraints
Total participation − Every entity in the entity set is involved in at least one
relationship (shown by a double line in an ER diagram).
Partial participation − Not all entities in the entity set are involved in the
relationship.
ER Design Issues
ER design issues need to be discussed for better ER- design
• One-to-Many: Attributes of a 1:M relationship set can be repositioned only to the entity set on
the many side of the relationship.
• One-to-One: The relationship attributes can be associated with either one of the participating
entities.
• Many-to-Many: Here the relationship attributes cannot be repositioned to either entity set;
instead they are represented by a separate entity set created for the relationship set.
ER Design Methodologies: (To resolve design issues)
The guidelines that should be followed while designing an ER diagram are discussed below:
• Recognize entity sets
• Recognize relationship sets and participating entity sets
• Recognize attributes of entity sets and attributes of relationship sets
• Define binary relationship types and existence dependencies
• Define general cardinality, constraints, keys, and discriminators
• Design diagram.
The relational database was invented by E. F. Codd at IBM in 1970. The relational model
represents data and relationships among data by a collection of tables, each of which has a
number of columns with unique names. Relational data model is used widely around the world
for data storage and processing. This model is simple and it has all the properties and capabilities
required to process data with storage efficiency.
For example, the following figure shows a relational database of customers and their
accounts. The customer Nina has two accounts, with balances of Rs. 50000 and Rs. 30000.
Account No.  Balance
101          50000
201          30000
402          150000
506          80000
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and
their names.
Relation key − Each row has one or more attributes, known as relation key, which can identify
the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute
domain.
Key constraints
Domain constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple
uniquely. This minimal subset of attributes is called a key for that relation. If there is more
than one such minimal subset, each is called a candidate key.
a) Domain Constraints −
Attributes have specific values in real-world scenarios. For example, age can only be a positive
integer. Similar constraints are applied to the attributes of a relation: every attribute is
bound to have a specific range of values. For example, age cannot be less than zero and
telephone numbers cannot contain a digit outside 0-9.
b) Integrity Rules
Relational database integrity rules are very important to good database design. Many (but by no
means all) RDBMS enforce integrity rules automatically. Those rules are:
In a relation with a key attribute, no two tuples can have identical values for key
attributes.
A key attribute cannot have NULL values.
All primary key entries are unique, and no part of a primary key may be null. Each row then has a
unique identity, and foreign key values can properly reference primary key values. For example,
no invoice can have a duplicate number, nor can it be null; in short, all invoices are uniquely
identified by their invoice number.
Referential integrity constraints work on the concept of foreign keys. A foreign key is a key
attribute of a relation that can be referred to in another relation.
Referential integrity constraint states that if a relation refers to a key attribute of a different or
same relation, then that key element must exist.
A foreign key may have either a null entry, as long as it is not part of its table's primary key,
or an entry that matches a primary key value in the table to which it is related (every non-null
foreign key value must reference an existing primary key value). It is possible for an attribute
not to have a corresponding value, but it is impossible to have an invalid entry. For example, a
customer might not yet have an assigned sales representative (number), but it will be impossible
to have an invalid sales representative (number).
To avoid nulls, some designers use special codes, known as flags, to indicate the absence of
some value.
Other integrity rules that can be enforced in the relational model are the NOT NULL and
UNIQUE constraints.
The NOT NULL constraint can be placed on a column to ensure that every row in the table has a
value for that column. The UNIQUE constraint is a restriction placed on a column to ensure that
no duplicate values exist for that column.
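The sales-representative example above can be demonstrated with SQLite from Python. The table and column names are illustrative; note that SQLite only enforces referential integrity when the foreign_keys pragma is switched on.

```python
import sqlite3

# Referential integrity: every non-null foreign key value must reference
# an existing primary key value.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite requires this explicitly
con.execute("CREATE TABLE rep (rep_no INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE customer (
    cust_no INTEGER PRIMARY KEY,
    rep_no  INTEGER REFERENCES rep(rep_no))""")
con.execute("INSERT INTO rep VALUES (10, 'Asha')")
con.execute("INSERT INTO customer VALUES (1, 10)")    # valid reference
con.execute("INSERT INTO customer VALUES (2, NULL)")  # no rep assigned yet: allowed
try:
    con.execute("INSERT INTO customer VALUES (3, 99)")  # rep 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True: the invalid foreign key value was refused
```

The null foreign key is accepted (a customer without a representative), while the dangling value 99 is rejected, matching the rule stated above.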
A database schema corresponds to the variable declarations (along with associated type
definitions) in a program. Each variable has a particular value at a given instant. The values of
the variables in a program at a point in time correspond to an instance of a database schema.
The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level. A database may also have several
schemas at the view level, sometimes called sub schemas, that describe different views of the
database.
Schema:
Example:
Instance:
INTEGRITY CONSTRAINTS
Constraints are the rules enforced on data columns on table. These are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the database.
Constraints could be column level or table level. Column level constraints are applied only to
one column, whereas table level constraints are applied to the whole table.
Following are commonly used constraints available in SQL.
1. NOT NULL Constraint: Ensures that a column cannot have NULL value.
By default, a column can hold NULL values. If you do not want a column to have a NULL
value, then you need to define such a constraint on the column, specifying that NULL is
not allowed for that column.
A NULL is not the same as no data, rather, it represents unknown data.
Example:
For example, the following SQL creates a new table called CUSTOMERS and adds five
columns, three of which, ID, NAME and AGE, specify not to accept NULLs:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
2. DEFAULT Constraint: Provides a default value for a column when none is specified.
The DEFAULT constraint provides a default value to a column when the INSERT INTO
statement does not provide a specific value.
Example:
For example, the following SQL creates a new table called CUSTOMERS and adds five
columns. Here, the SALARY column is set to 5000.00 by default, so in case the INSERT INTO
statement does not provide a value for this column, then by default this column is set
to 5000.00:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2) DEFAULT 5000.00,
PRIMARY KEY (ID)
);
If the CUSTOMERS table has already been created, then to add a DEFAULT constraint to the
SALARY column, you would write a statement similar to the following:
ALTER TABLE CUSTOMERS MODIFY SALARY DECIMAL (18, 2) DEFAULT 5000.00;
3. UNIQUE Constraint: Ensures that all values in a column are different.
The UNIQUE Constraint prevents two records from having identical values in a
particular column. In the CUSTOMERS table, for example, you might want to prevent
two or more people from having identical age.
Example:
For example, the following SQL creates a new table called CUSTOMERS and adds five
columns. Here, the AGE column is set to UNIQUE, so that you cannot have two records
with the same age:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL UNIQUE,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
5. FOREIGN KEY Constraint: Uniquely identifies a row/record in another database table.
A foreign key is a key used to link two tables together. It is sometimes called a
referencing key.
A foreign key is a column or a combination of columns whose values match a primary key
in a different table.
The relationship between two tables matches the primary key in one of the tables with a
foreign key in the second table.
If a table has a primary key defined on any field(s), then you cannot have two records
having the same value in that field(s).
Example:
Consider the structure of the two tables as follows:
CUSTOMERS table:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
ORDERS table:
CREATE TABLE ORDERS(
ID INT NOT NULL,
DATE DATETIME,
CUSTOMER_ID INT REFERENCES CUSTOMERS(ID),
AMOUNT DOUBLE,
PRIMARY KEY (ID)
);
6. CHECK Constraint: The CHECK constraint ensures that all values in a column satisfy certain
conditions.
The CHECK Constraint enables a condition to check the value being entered into a record. If
the condition evaluates to false, the record violates the constraint and isn't entered into the
table.
Example:
For example, the following SQL creates a new table called CUSTOMERS and adds five
columns. Here, we add a CHECK on the AGE column, so that you cannot have any
CUSTOMER below 18 years:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL CHECK (AGE >= 18),
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
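The constraints of this section can be exercised together from Python with SQLite. The sketch uses the CUSTOMERS column names from the examples above; SQLite raises IntegrityError for each kind of violation.

```python
import sqlite3

# One table carrying NOT NULL, UNIQUE, CHECK and DEFAULT constraints.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE customers (
    id     INTEGER PRIMARY KEY,
    name   TEXT    NOT NULL,
    age    INTEGER NOT NULL UNIQUE CHECK (age >= 18),
    salary REAL    DEFAULT 5000.00)""")

def try_insert(sql):
    """Return 'ok' if the row is accepted, else the error class name."""
    try:
        con.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as e:
        return type(e).__name__

print(try_insert("INSERT INTO customers (id, name, age) VALUES (1, 'Ramesh', 32)"))  # accepted
print(try_insert("INSERT INTO customers (id, name, age) VALUES (2, NULL, 25)"))      # NOT NULL violated
print(try_insert("INSERT INTO customers (id, name, age) VALUES (3, 'Kiran', 32)"))   # UNIQUE violated
print(try_insert("INSERT INTO customers (id, name, age) VALUES (4, 'Amit', 15)"))    # CHECK violated
print(con.execute("SELECT salary FROM customers WHERE id = 1").fetchone()[0])        # DEFAULT applied
```

Only the first row is stored; the omitted SALARY takes the default value 5000.0.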
Trivial Dependency −
If a functional dependency (FD) X → Y holds where Y is a subset of X, then it is called a trivial
FD. Trivial FDs always hold; for example, {Stu_ID, Stu_Name} → Stu_ID is trivial.
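The definition above reduces to a one-line subset test, which can be sketched directly:

```python
# A functional dependency X → Y is trivial exactly when Y ⊆ X.
def is_trivial(X, Y):
    return set(Y) <= set(X)

print(is_trivial({"Stu_ID", "Stu_Name"}, {"Stu_ID"}))  # True: trivial, always holds
print(is_trivial({"Stu_ID"}, {"Stu_Name"}))            # False: non-trivial
```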
Normalization
Normalization is a process of organizing the data in database to avoid data redundancy,
insertion anomaly, update anomaly & deletion anomaly.
Database Normalization is a technique of organizing the data in the database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts
data into tabular form by removing duplicated data from the relation tables.
Normalization is mainly used for two purposes:
Eliminating redundant (useless) data.
Ensuring that data dependencies make sense, i.e., data are logically stored.
Anomalies:
Update anomalies − If data items are scattered and are not linked to each other properly, then it
could lead to strange situations. For example, when we try to update one data item having its
copies scattered over several places, a few instances get updated properly while a few others are
left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies − We tried to delete a record, but parts of it were left undeleted because,
unknown to the user, the same data was also saved somewhere else.
Insert anomalies − We tried to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
First Normal Form (1NF): Each attribute must contain only a single value from its pre-defined domain.
Second Normal Form (2NF): We see in the Student_Project relation that the prime key attributes
are Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name,
must be dependent upon both and not on any of the prime key attributes individually. But we find
that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
Third Normal Form (3NF): We find that in the above Student_detail relation, Stu_ID is the key and
the only prime key attribute. City can be identified by Stu_ID as well as by Zip itself. Neither
is Zip a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists a transitive dependency, which is not allowed in Third Normal Form.
To bring this relation into third normal form, we break the relation into two relations as follows
−
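The 3NF decomposition just described can be sketched with sample data (the rows below are made up). Student_detail(Stu_ID, Stu_Name, Zip, City) is split into Student(Stu_ID, Stu_Name, Zip) and ZipCity(Zip, City), and the original relation is recovered by a lossless join on Zip.

```python
# Original relation with the transitive dependency Stu_ID → Zip → City.
student_detail = [
    (1, "Miya", "411001", "Pune"),
    (2, "Raj",  "411001", "Pune"),
    (3, "Nina", "422001", "Nashik"),
]

# Decompose into two relations to remove the transitive dependency.
student  = [(sid, name, zip_) for sid, name, zip_, _ in student_detail]
zip_city = sorted({(zip_, city) for _, _, zip_, city in student_detail})

# City is now stored once per Zip instead of once per student, and the
# original relation is recovered by joining the two relations on Zip.
lookup = dict(zip_city)
rejoined = [(sid, name, zip_, lookup[zip_]) for sid, name, zip_ in student]
print(rejoined == student_detail)  # True: the decomposition is lossless
print(zip_city)                    # 'Pune' now appears only once
```

The redundancy ('Pune' stored for every student in that zip) disappears, and updating a city name now touches exactly one row.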
Relational Algebra
A query language is a language in which a user requests information from the database. It
can be categorized as either procedural or nonprocedural.
In a procedural language the user instructs the system to perform a sequence of operations on
the database to compute the desired result. In a nonprocedural language the user describes the
desired information without giving a specific procedure for obtaining that information.
The relational algebra is a procedural query language. It consists of a set of operations
that take one or two relations as input and produce a new relation as output.
It uses operators to perform queries.
An operator can be either unary or binary. Relational algebra is performed recursively
on a relation and intermediate results are also considered relations.
Fundamental Operations
SELECT
PROJECT
UNION
SET DIFFERENCE
CARTESIAN PRODUCT
RENAME
Select, project and rename are unary operations, as they operate on a single relation. Union, set
difference and Cartesian product are binary operations, as they operate on pairs of relations.
Other Operations
SET INTERSECTION
NATURAL JOIN
DIVISION
ASSIGNMENT
1. SELECT (σ):
It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ stands for the selection predicate and r stands for the relation. p is a propositional
logic formula which may use connectors like and, or and not, and relational operators such as
=, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from Books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from Books where subject is 'database' and price is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from Books where subject is 'database' and price is 450, or those books
published after 2010.
2. PROJECT (π):
It projects the listed column(s) from a relation, eliminating duplicate rows.
Notation − πA1, A2, ..., An(r), where A1, A2, ..., An are attribute names of relation r.
For example −
πsubject, author(Books)
Output − Selects and projects columns named subject and author from the relation Books.
πName, Hobby(Person)
πName, Address(Person)
3. UNION (∪):
It performs a binary union between two given relations.
Notation − r ∪ s = { t | t ∈ r or t ∈ s }
Where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold −
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
πauthor(Books) ∪ πauthor(Articles)
Output − Projects the names of the authors who have either written a book or an article or both.
4. SET DIFFERENCE (−):
The result of r − s is a relation with the tuples that are present in r but not in s.
πauthor(Books) − πauthor(Articles)
Output − Provides the names of authors who have written books but not articles.
5. CARTESIAN PRODUCT (×):
It combines information from two different relations into one.
Notation − r × s = { q t | q ∈ r and t ∈ s }
σauthor = 'tutorialspoint'(Books × Articles)
Output − Yields a relation which shows all the books and articles written by tutorialspoint.
6. RENAME (ρ):
The results of relational algebra are relations without a name; the rename operation allows us
to give the output relation a name.
Notation − ρx(E), where the result of expression E is saved with the name x.
7. Set Intersection:
• R ∩ S
• Includes all tuples that are in both R and S; R ∩ S = R − (R − S).
• R and S must be union compatible.
• The schema of the result is that of R.
8. Join:
Can be defined as cross-product followed by selection and projection.
We have several variants of join.
o – Condition joins
o – Equijoin
o – Natural join
9. Division: The division operation R ÷ S is suited to queries that include the phrase "for all";
it returns the tuples of R that are associated with every tuple of S.
10. Assignment:
Sometimes it is useful to be able to write a relational algebra expression in parts using a
temporary relation variable.
The assignment operation, denoted by ←, works like assignment in a programming language.
Example: temp ← πName, Address(Person)
No extra relation is added to the database, but the relation variable created can be used in
subsequent expressions. Assignment to a permanent relation would constitute a modification to
the database
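The fundamental operations above can be sketched in Python over relations represented as sets of tuples. The sample Books and Articles data are made up; σ becomes a filter over whole tuples, π keeps chosen columns and drops duplicates, and union and set difference fall out of Python's set operators.

```python
# Relations as sets of (subject, author) tuples; sets give duplicate
# elimination for free, as relational algebra requires.
Books = {("database", "Korth"), ("database", "Navathe"),
         ("networks", "Tanenbaum")}
Articles = {("database", "Korth"), ("graphics", "Foley")}

def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, *indexes):
    """π: keep the chosen columns, eliminating duplicate rows."""
    return {tuple(t[i] for i in indexes) for t in relation}

db_books   = select(Books, lambda t: t[0] == "database")  # σsubject="database"
authors    = project(Books, 1) | project(Articles, 1)     # union of authors
only_books = project(Books, 1) - project(Articles, 1)     # set difference
print(len(db_books), sorted(authors), sorted(only_books))
```

Each operation takes relations in and returns a relation, so the results compose, mirroring how relational algebra expressions nest.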
DATE − This data type is used to represent dates and times. The
standard format is DD-MON-YY, as in 21-JUN-04. DATE
stores time in the 24-hour format.
RAW/LONG RAW − The RAW/LONG RAW data types are used to store binary
data, such as a digitized picture or image. The RAW data type
can have a maximum length of 255 bytes.
SQL Commands:
1. Create Table:
STATE VARCHAR2(5),
BALDUE NUMBER(10,2)
);
Output : Table created.
2. Describe Table:
CITY VARCHAR2(30),
PINCODE NUMBER(15),
STATE VARCHAR2(5),
BALDUE NUMBER(10,2)
3. Insert into:
4. Select Command:
SELECT SALESMANNAME FROM SALESMAN_MASTER WHERE SALAMT=3000;
Output : SALESMANNAME
--------------------
AMAN
OMKAR
RAJ
ALTER TABLE <TableName>
ADD(<NewColumnName> <Datatype>(<Size>),
<NewColumnName> <Datatype>(<Size>)...);
ALTER TABLE <TableName>
MODIFY(<ColumnName> <NewDatatype>(<NewSize>),
<ColumnName> <NewDatatype>(<NewSize>)...);
Use : To alter a table and modify a column's name, data type or size.
Examples :
Number Functions: Number functions allow you to present a number in a manner that is
useful to the reader.
Date Functions: Dates are stored in the database as numbers that contain the calendar
information and time information. Date functions allow you to modify and compare date
data types.
Conversion Functions: These functions are used to change data from one data type to
another.
Aggregate or Multi-Row SQL Functions: These operate on a set of rows and return one
result, or one result per group.
String/Character/Text Functions:
1. LENGTH(S): Returns the number of characters in string S.
2. UPPER(S): Converts string S to upper case.
3. LOWER(S): Converts string S to lower case.
4. INITCAP(S): Capitalizes the first letter of each word in S.
5. CONCAT(S1,S2): Joins string S2 to the end of string S1.
6. SUBSTR(S1,B,N): Returns N characters of S1, starting at position B.
7. INSTR(S1,S2,ST,T): Returns the position of the T-th occurrence of S2 within S1,
searching from position ST.
8. LPAD(S1,S,C) / RPAD(S1,S,C): Pads S1 on the left/right with character C up to a total
length of S.
Number Functions:
1. ROUND(N,D): Rounds number N to D decimal places.
2. TRUNC(N,D): Truncates number N to D decimal places.
Date Functions:
1. Months_Between(st,ed): Returns the number of months between dates st and ed.
2. Next_day(d,day_of_week): Returns the date of the first named day of the week after date d.
3. Round(d,format): Rounds date d to the unit given by format (e.g. the nearest day, month or year).
4. Trunc(d,format): Truncates date d to the unit given by format.
Conversion Functions:
1. To_Char(date, format) / To_Char(num, format): Converts a date or number to a character string
according to format.
2. To_Date(text, format): Converts a character string to a date according to format.
3. To_Number(text, format): Converts a character string to a number.
Aggregate Functions:
1. Count( ): Returns the number of rows (or non-null values) in the group.
2. Sum( ): Returns the total of the values in the group.
3. Avg( ): Returns the average of the values in the group.
4. Max( ): Returns the maximum value in the group.
5. Min( ): Returns the minimum value in the group.
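The aggregate functions above can be tried from Python with SQLite, which supports the same COUNT, SUM, AVG, MAX and MIN functions. The table and figures below are made up for illustration.

```python
import sqlite3

# A throwaway sales table to run the aggregate functions against.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (rep TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("Aman", 3000), ("Omkar", 3000), ("Raj", 4500)])

# Each aggregate collapses the set of rows into a single value.
row = con.execute(
    "SELECT COUNT(*), SUM(amount), AVG(amount), MAX(amount), MIN(amount) "
    "FROM sales").fetchone()
print(row)  # (3, 10500, 3500.0, 4500, 3000)
```

Adding a GROUP BY rep clause would instead produce one aggregated result per group, as noted above.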