DBMS_Unit1

Uploaded by

anju143guna

1|U n it- I

DATABASES & DATABASE USERS


Introduction : Databases and database systems are essential components of everyday life in
modern society. With the increasing use of databases as a corporate resource and the growth of
information technology, organizations have come to realize the importance of speedy access to
reliable data for correct decision making. Data are known facts that can be recorded and that have
implicit meaning. For example, a dictionary is a store of a huge amount of information held in an
organized form.

DATABASE : A database is a collection of interrelated data with an implicit meaning. Databases
and database technology have a major impact on the growing use of computers. Databases play a
critical role in almost all major areas where computers are used, including business, e-commerce,
engineering, medicine, law, education and library science.

The database has the following implicit properties :

1. A database represents some aspect of the real world, sometimes called the mini world or the
universe of discourse (UOD). Changes to the mini world are reflected in the database.
2. A database is a logically coherent collection of data with some inherent meaning. A random
assortment of data can’t correctly be referred to as a database.
3. A database is designed, built and populated with data for a specific purpose. It has an
intended group of users and some preconceived applications in which these users are
interested.

DBMS (Database Management System) : A DBMS is a collection of programs that enables users
to create and maintain a database. It is a general purpose software system that facilitates the
process of defining, constructing, manipulating and sharing databases among various users and
applications. Defining a database involves specifying the data types, structures and constraints of
the data to be stored in the database. This definition is stored in the database catalog or
dictionary and is called meta data. Constructing the database is the process of storing the data on
some storage medium that is controlled by the DBMS. Manipulating a database includes functions
such as querying the database to retrieve specific data, updating the database and generating
reports from the data. Sharing a database allows multiple users and programs to access the
database simultaneously. Other important functions provided by the DBMS include protecting the
database and maintaining it over a long period of time. Protection includes system protection
against hardware or software malfunction and security protection against unauthorized access.
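
The functions of defining, constructing and manipulating a database can be tried out with a small
sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in
for the DBMS. The student table and its rows are made up for illustration and do not come from
these notes.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS in this sketch.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " course TEXT NOT NULL)"
)

# The definition itself is kept in the catalog (meta data).
# In SQLite the catalog is the built-in table sqlite_master.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Constructing: store the data under control of the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, course) VALUES (?, ?, ?)",
    [(1, "Asha", "BSc"), (2, "Ravi", "BCom"), (3, "Meena", "BSc")],
)

# Manipulating: update, query and report.
conn.execute("UPDATE student SET course = 'BCA' WHERE roll_no = 2")
bsc_students = conn.execute(
    "SELECT name FROM student WHERE course = 'BSc' ORDER BY roll_no"
).fetchall()
report = conn.execute(
    "SELECT course, COUNT(*) FROM student GROUP BY course ORDER BY course"
).fetchall()

print(catalog[0][0])   # name of the table recorded in the catalog
print(bsc_students)    # rows returned by the query
print(report)          # number of students per course
conn.close()
```

Sharing and protection are not shown here, because an in-memory database is private to one
connection.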

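[Figure: A simplified database system environment. Users/programmers submit application
programs/queries to the database system. The DBMS software consists of software to process
queries/programs and software to access stored data, which works on the stored database definition
(meta data) and the stored database.]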

Characteristics of Database Approach : The Traditional File System (TFS) existed before the
DBMS came into existence. Database systems were introduced because of the drawbacks of the TFS.
The main characteristics of the database approach versus the file processing approach are
described below.

1. Self describing nature of database system : A fundamental characteristic of the database
approach is that the database system contains not only the database itself but also a complete
definition or description of the database structure and constraints. This definition is stored in
the DBMS catalog, and the information in the catalog is called meta data. The catalog is used
by the DBMS and also by users who need information about the structure of the database. In the
TFS, however, the data definition is typically part of the application programs. So if a file was
used in more than one application, a separate copy had to be created for each application.
These multiple copies of the same data file lead to redundancy problems.

2. Insulation between programs and data, and data abstraction : In the TFS, the structure of data
files is embedded in the application programs, so any change to the structure of a file may
require changing all application programs that access that file. A DBMS doesn’t require such
changes because the structure of data files is stored in the DBMS catalog, separately from the
access programs. This property is referred to as program-data independence. In some cases users
can define operations as part of the database definition. An operation has two parts, an
interface and an implementation. User application programs can operate on the data by invoking
these operations through their names and arguments, regardless of how the operations are
implemented. This is termed program-operation independence. The characteristic that allows
program-data and program-operation independence is called data abstraction.

3. Support of multiple views of data : A database has many users, each of whom may require
a different perspective or view of the database. A view may be a subset of the database, or it
may contain virtual data that is derived from the database files but not explicitly stored.

4. Sharing of data and multiuser transaction processing : A multiuser DBMS allows
multiple users to access the database at the same time. This is essential when data for
multiple applications is to be integrated and maintained in a single database. The DBMS
must include concurrency control to ensure that when several users try to update the same data,
the result of the updates is correct. The concept of a transaction has become central to
many database applications. A transaction is an executing program or process that includes
one or more database accesses. Each transaction is supposed to execute without interference
from other transactions. The DBMS enforces several transaction properties.
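
The idea of multiple views (characteristic 3 above) can be sketched with two SQL views over one
stored table. This is only an illustration, assuming Python's sqlite3 module. The employee table,
its columns and the view names sales_staff and payroll are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY, name TEXT, dept TEXT, monthly_salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 3000.0),
     (2, "Ravi", "HR", 2500.0),
     (3, "Meena", "Sales", 3200.0)],
)

# View 1: a subset of the database (only Sales staff, only two columns).
conn.execute(
    "CREATE VIEW sales_staff AS "
    "SELECT id, name FROM employee WHERE dept = 'Sales'"
)

# View 2: virtual data. annual_salary is derived on demand, never stored.
conn.execute(
    "CREATE VIEW payroll AS "
    "SELECT name, monthly_salary * 12 AS annual_salary FROM employee"
)

sales_view = conn.execute("SELECT * FROM sales_staff ORDER BY id").fetchall()
payroll_view = conn.execute("SELECT * FROM payroll ORDER BY name").fetchall()
print(sales_view)
print(payroll_view)
conn.close()
```

Both views read the same stored rows, so an update to the employee table is visible through each
view the next time it is queried.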

Advantages of DBMS : The main advantages of using DBMS are


1. Controls Redundancy : Redundancy means storing the same data in multiple places, which
leads to several problems. Centralized control of data by the DBA avoids unnecessary
duplication of data, so it can effectively reduce the total amount of storage required. A DBMS
should have the capability to control this redundancy.

2. Restricts Unauthorized access : Confidential data must not be accessed by unauthorized
users. The DBMS should provide a security and authorization system that the DBA uses to create
user accounts. The DBMS can ensure that proper access procedures are followed when sensitive
data is accessed. Different levels of security can be implemented for various types of data
and operations.

3. Provides persistent storage for program objects : Databases can be used to provide
persistent storage for program objects and data structures. An object is said to be
persistent when it survives the termination of the program and can be retrieved later. So the
persistent storage of program objects and data structures is an important function of a database
system.

4. Provides storage structures for efficient query processing : Because the database is
typically stored on disk, the DBMS must provide specialized data structures to speed up the disk
search for desired records. Auxiliary files called indexes are used for this purpose. The DBMS
must provide capabilities for efficient query processing and updates. The query processing
and optimization module of the DBMS is responsible for choosing an efficient query execution
plan for each query based on the existing storage structures.

5. Provides Backup & Recovery : A DBMS must provide facilities for recovery from
hardware and software failures. For example, when the computer fails in the middle of a
transaction, the recovery subsystem is responsible for making sure that the database is restored
to the state it was in before the transaction started executing.

6. Provides Multiple User Interfaces : A DBMS should provide different user interfaces
because users with different levels of knowledge use the database. Examples are query
languages for casual users and programming language interfaces for application
programmers.

7. Represents complex relationships among data : A database may have varieties of data
that are interrelated in many ways. A DBMS must have the capability to represent a variety
of complex relationships among data, to define new relationships, and to retrieve and update
related data easily and efficiently.

8. Enforces Integrity Constraints : The data in a database must satisfy certain integrity
constraints to be valid. So the DBMS must provide capabilities for defining and enforcing
these constraints. Some constraints can be specified to the DBMS and automatically enforced.
Other constraints may have to be checked by update programs or at the time of data entry.
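
Two of the advantages above, automatically enforced integrity constraints and recovery to the
state before a failed transaction, can be seen together in a short sketch. It assumes Python's
sqlite3 module. The account table, the transfer function and the rule that a balance may not go
negative are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK clause is an integrity constraint the DBMS enforces by itself.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "Asha", 500), (2, "Ravi", 100)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        # 'with conn' commits if the block succeeds and rolls back otherwise.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The constraint was violated; the credit made above was rolled back.
        return False


accepted = transfer(conn, 1, 2, 200)    # both balances stay non-negative
rejected = transfer(conn, 2, 1, 1000)   # would make account 2 negative
balances = conn.execute(
    "SELECT acc_no, balance FROM account ORDER BY acc_no"
).fetchall()
print(accepted, rejected, balances)
conn.close()
```

In the rejected transfer the credit to account 1 had already been applied when the debit failed,
so the unchanged final balances show that the whole transaction was undone and not just the
failing statement.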

Disadvantages of DBMS : The DBMS has its own disadvantages. The main disadvantage
of a DBMS is its cost. The overhead costs of using a DBMS are due to the following reasons.

1. High initial investment in hardware, software and training.
2. The generality that the DBMS provides for defining and processing data.
3. Overhead for providing security, concurrency control, recovery and integrity functions.

In addition to above, the following are some more disadvantages.

1. Complexity
2. Size
3. Performance
4. Higher impact of failure.

Database Users : In large organizations, many people are involved in the design, use and
maintenance of a large database. Based on their functions, these users are grouped into the
following categories.

1. Database Administrators (DBA)
2. Database Designers
3. End Users
4. System Analysts and Application Programmers

• DBA : One of the main reasons for having a DBMS is to have control of both the data and the
programs accessing that data. The person having such control over the system is called the
DBA. The DBA is responsible for authorizing access to the database, coordinating and
monitoring its use, and acquiring software and hardware resources as needed.
• Database Designers : These are the users whose responsibility is to identify the data to
be stored in the database and to choose appropriate structures to represent and store this
data. It is the responsibility of database designers to communicate with all users in order to
understand their requirements and to create a design that meets those requirements. So
the final database design must be capable of supporting the requirements of all user groups.
• End Users : These are the people whose jobs require access to the database for querying,
updating and generating reports. There are several categories of end users.
➢ Casual End Users : These users access the database occasionally, but they may need
different information each time.
➢ Naïve or Parametric End Users : These are unsophisticated users who interact with the
system by invoking one of the application programs. Their main job function
revolves around constantly querying and updating the database.
➢ Sophisticated End Users : These include engineers and scientists who interact
with the system without writing programs.
➢ Standalone Users : These users maintain personal databases by using ready-made
program packages that provide easy to use menu based or graphics based
interfaces.

• System Analysts & Application Programmers : System analysts determine the
requirements of end users, especially naïve and parametric users, and develop
specifications that meet these requirements. Application programmers implement these
specifications as programs; then they test, debug, document and maintain them.

Database Administrator :

A person who has central control over both the data and the programs that
access the data is called the Database Administrator.

The functions of the DBA are :

1. Creation and modification of the conceptual schema definition.
2. Implementation of storage structures and access methods.
3. Modification of the schema and the physical organization.
4. Granting of authorization for data access.
5. Specification of integrity constraints.
6. Execution of immediate recovery procedures in case of failures.
7. Ensuring the physical security of the database.

Database language :

Data definition language (DDL) :

DDL is used to define database objects. The conceptual schema is specified by a set
of definitions expressed in this language. It also gives some details about how to implement
this schema on the physical devices used to store the data. This definition includes all the
entity sets, their associated attributes and their relationships. The result of DDL
statements is a set of tables that are stored in a special file called the data dictionary.

Data manipulation language (DML) :

A DML is a language that enables users to access or manipulate data stored in the
database. Data manipulation involves retrieval of data from the database, insertion of new
data into the database, and deletion or modification of existing data.

There are basically two types of DML:

Procedural: requires the user to specify what data is needed and how
to get it.
Non-procedural: requires the user to specify what data is needed
without specifying how to get it.
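
The difference between the two styles can be shown with the same request written twice. This is a
rough sketch and not a real procedural DML. The first function uses a plain Python loop to stand
in for record-at-a-time access, and the second hands a SQL statement to SQLite through Python's
sqlite3 module. The student data and both function names are hypothetical.

```python
import sqlite3

rows = [
    (1, "Asha", "BSc", 82),
    (2, "Ravi", "BCom", 64),
    (3, "Meena", "BSc", 91),
    (4, "Kiran", "BSc", 58),
]


# Procedural style: the caller states WHAT is needed and HOW to get it.
def procedural_toppers(records, course, min_marks):
    result = []
    for roll_no, name, rec_course, marks in records:  # visit every record
        if rec_course == course and marks >= min_marks:
            result.append(name)
    result.sort()  # the caller also decides how the ordering is produced
    return result


# Non-procedural style: the caller states only WHAT is needed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT, marks INTEGER)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)


def declarative_toppers(conn, course, min_marks):
    cur = conn.execute(
        "SELECT name FROM student "
        "WHERE course = ? AND marks >= ? ORDER BY name",
        (course, min_marks),
    )
    return [name for (name,) in cur]


by_loop = procedural_toppers(rows, "BSc", 80)
by_query = declarative_toppers(conn, "BSc", 80)
print(by_loop, by_query)
```

Both calls return the same names. In the second form the choice of access path, such as a scan or
an index, is left to the DBMS.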

Data control language (DCL):

This language enables users to grant and revoke authorization on
database objects.

Elements of DBMS:

DML pre-compiler:

It converts DML statements embedded in an application program into normal
procedure calls in the host language. The pre-compiler must interact with the query
processor in order to generate the appropriate code.

DDL compiler:

The DDL compiler converts the data definition statements into a set of tables.
These tables contain information about the database and are in a form that can be
used by other components of the DBMS.

File manager:

The file manager manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.

Database manager:

A database manager is a program module which provides the interface between the
low level data stored in the database and the application programs and queries submitted
to the system.

The responsibilities of the database manager are:

Interaction with file manager: The data is stored on the disk using the file system
provided by the operating system. The database manager translates the different
DML statements into low-level file system commands. So the database manager is
responsible for the actual storing, retrieving and updating of data in the database.
Integrity enforcement: The data values stored in the database must satisfy certain
constraints (e.g. the age of a person can’t be less than zero). These constraints are specified
by the DBA. The database manager checks the constraints and stores the data in
the database only if they are satisfied.
Security enforcement: The database manager enforces the security measures that protect the
database from unauthorized users.
Backup and recovery: The database manager detects failures that occur due to different
causes (such as disk failure, power failure, deadlock or software error) and restores the
database to its original state.
Concurrency control: When several users access the same database file
simultaneously, there may be possibilities of data inconsistency. It is the
responsibility of the database manager to control the problems that occur with concurrent
transactions.

Query processor:

The query processor interprets an online user’s query and converts it
into an efficient series of operations in a form capable of being sent to the
database manager for execution. The query processor uses the data dictionary to
find the details of the data files and, using this information, creates a query
plan (access plan) to execute the query.
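
One way to watch a query processor choose an access plan is SQLite's EXPLAIN QUERY PLAN statement,
called here through Python's sqlite3 module. This is only a sketch of the idea. The employee
table, the index name idx_employee_dept and the helper plan_for are hypothetical, and the exact
wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", "Sales"), (2, "Ravi", "HR"), (3, "Meena", "Sales")],
)


def plan_for(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query, as text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


query = "SELECT name FROM employee WHERE dept = ?"

# With no index on dept, the only available plan is to scan the table.
plan_before = plan_for(conn, query, ("Sales",))

# After an index is added, the same query text can get a different plan.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_after = plan_for(conn, query, ("Sales",))

print(plan_before)
print(plan_after)
conn.close()
```

The query text is identical in both calls. Only the storage structures recorded in the catalog
changed, and the plan changed with them.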

Data Dictionary:

The data dictionary is a table that contains information about database
objects. It contains information such as
1. external, conceptual and internal database descriptions
2. descriptions of entities and attributes, as well as the meaning of data elements
3. synonyms, authorization and security codes
4. database authorization
The data stored in the data dictionary is called meta data.

DBMS ARCHITECTURE :

[Figure: DBMS architecture, top to bottom. Users: naïve user, application programmers, online
user, DBA. Next: application programs, system calls, DDL compiler. Next: application program
object code, DML pre-compiler, query processor, DDL compiler. Inside the DBMS: database manager,
file manager. Storage: data file, data dictionary.]

DATABASE SYSTEM CONCEPTS & ARCHITECTURE


Data Model : It is a collection of concepts that can be used to describe the structure of a
database, the relationships among data, data constraints and semantics. A data model is a means
of abstraction that suppresses the inner details of data storage and organization and highlights
the features essential for understanding the data. Most data models also include a set of basic
operations for specifying retrievals and updates on the database.

These data models are classified into the following categories.

1. High level or conceptual data models
2. Low level or physical data models
3. Representational or implementation data models

1. High Level Models : These models use concepts such as entities, attributes and relationships.
An entity represents a real world object or concept, such as an employee or a project, that is
described in the database. An attribute represents some property of an entity, such as the
employee name or salary. A relationship is an association among two or more entities.
2. Low Level Models : These provide concepts that describe the details of how data is
stored in the computer. These concepts are generally meant for specialists, not for typical end
users. They describe how data is stored as files in the computer by representing information such
as record formats, record orderings and access paths.
3. Implementation Models : These are the models used most frequently in commercial
DBMSs. They include the hierarchical, network and relational models. They are sometimes
referred to as record based data models.

SCHEMA : In any data model, the description of the database is different from the database
itself. This description, i.e. the overall design of the database, is called the database schema.
A displayed schema is called a schema diagram. The diagram displays only the structure of each
record type, not the actual instances of records. So the schema remains the same and doesn’t
change from instance to instance. The student schema is given below :

Stu_schema : No Name Class Course



INSTANCES : A database usually changes over time as information is inserted or
deleted. The collection of information stored in the database at a particular moment is called an
instance of the database. It is also referred to as the database state or snapshot, or as the
current set of occurrences or instances in the database. In a given database state, each schema
construct has its own current set of instances.
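
The difference between a schema and its instances can be checked with a small sketch, assuming
Python's sqlite3 module. The student table follows the Stu_schema example, but the column names
roll_no and class_name and all the rows are adapted or made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY, name TEXT, class_name TEXT, course TEXT)"
)


def schema(conn):
    # PRAGMA table_info returns one row per column; position 1 is its name.
    return [col[1] for col in conn.execute("PRAGMA table_info(student)")]


def instance(conn):
    # The database state: the rows stored at this moment.
    return conn.execute(
        "SELECT roll_no, name, class_name, course FROM student ORDER BY roll_no"
    ).fetchall()


schema_t0, state_t0 = schema(conn), instance(conn)   # empty state

conn.execute("INSERT INTO student VALUES (1, 'Asha', 'II', 'BSc')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', 'I', 'BCom')")
schema_t1, state_t1 = schema(conn), instance(conn)   # after insertions

conn.execute("DELETE FROM student WHERE roll_no = 1")
schema_t2, state_t2 = schema(conn), instance(conn)   # after a deletion

print(schema_t0 == schema_t1 == schema_t2)  # the schema stays the same
print(state_t0, state_t1, state_t2)         # three different database states
conn.close()
```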

Three Schema Architecture :


An important purpose of a database system is to provide users with an abstract view of the data,
i.e. the system hides certain details of how the data is stored and maintained. The DBMS provides
3 levels of abstraction for the data, which is known as the 3 schema architecture of a database
system. The view at each level is described by a schema. A schema is an outline or plan that
describes the way in which entities at one level of abstraction can be mapped to the next level.
The overall design of the database is called the database schema. The main goal of the 3 schema
architecture is to separate the user applications from the physical database. The 3 schema
architecture can be depicted as follows :

[Figure: Three schema architecture, top to bottom. External level: several external views.
Conceptual level: conceptual schema. Internal level: internal schema. Stored databases.]

Because each level is defined by a schema, 3 schemas exist in a database. They are described
as follows :

• Internal Schema : The internal level has an internal schema, which describes the physical
storage structure of the database. It is also referred to as the physical schema and provides the
lowest level of abstraction. It represents how the data will be stored and describes
the data structures and access methods to be used by the database. It uses a physical data model
and describes the complete details of data storage and access paths for the database.
• Conceptual Schema : The conceptual level has a conceptual schema that describes the
structure of the whole database for a community of users. This schema hides the details of
physical storage structures and concentrates on describing entities, data types,
relationships, operations and constraints. A representational model is used to describe this
schema when a database system is implemented.
• External Schema : This is the highest level of abstraction, where only those portions of
the database that concern a user or application programmer are included. It is also referred to
as a subschema, and any number of subschemas can exist for a given conceptual schema. It
contains the definitions of logical records and relationships in the external view.
In general one physical schema, one conceptual schema and several subschemas exist for
a single database system.

Data Independence : Data independence is one of the main advantages of a DBMS. The ability to
modify the schema definition at one level without affecting the schema definition at the next
higher level is called data independence. There are two types of data independence.

• Logical Data Independence : It is the ability to change the conceptual schema without
having to change external schemas or application programs. The change is absorbed
by the mapping between the external and conceptual levels. We may change the conceptual
schema to expand the database, to change constraints or to reduce the database. So only the
mapping needs to be changed in a DBMS that supports logical data independence. It is
achieved by providing the external level or user view of the database.
• Physical Data Independence : It is the ability to change the internal schema without
having to change the conceptual schema. External schemas need not be changed either.
Changes to the internal schema may be needed because some physical files are
reorganized. The change is absorbed by the mapping between the conceptual and
internal levels. It is achieved by the presence of a mapping or transformation from the
conceptual level of the database to the internal level. So no changes are required in the
application programs to access data from the new physical organization.
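
Both kinds of data independence can be imitated in a small sketch, assuming Python's sqlite3
module. A SQL view plays the role of the external schema, and adding an index plays the role of a
physical reorganization. All table, view and index names here are hypothetical. The function
application_query stands for an application program and is never edited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema, version 1: a single table.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", "Sales"), (2, "Ravi", "HR")],
)

# External schema: the view that the application is written against.
conn.execute(
    "CREATE VIEW staff_view AS SELECT name, dept_name FROM employee"
)


def application_query(conn):
    # Application code: it refers only to the external view.
    return conn.execute(
        "SELECT name, dept_name FROM staff_view ORDER BY name"
    ).fetchall()


before = application_query(conn)

# Physical change: add an access path. No schema above it is touched.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
after_physical_change = application_query(conn)

# Conceptual change: split the table in two. Only the mapping between the
# external and conceptual levels (the view definition) is rewritten.
conn.execute("DROP VIEW staff_view")
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)", [(10, "Sales"), (20, "HR")]
)
conn.execute(
    "CREATE TABLE employee_v2 ("
    " id INTEGER PRIMARY KEY, name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute(
    "INSERT INTO employee_v2 "
    "SELECT e.id, e.name, d.dept_id FROM employee e "
    "JOIN department d ON d.dept_name = e.dept_name"
)
conn.execute("DROP TABLE employee")
conn.execute(
    "CREATE VIEW staff_view AS "
    "SELECT e.name AS name, d.dept_name AS dept_name "
    "FROM employee_v2 e JOIN department d ON d.dept_id = e.dept_id"
)
after_logical_change = application_query(conn)

print(before == after_physical_change == after_logical_change)
conn.close()
```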

Database Languages & Interfaces : A DBMS must provide appropriate languages and
interfaces for each category of users. Once the design of a database is completed and a DBMS
is chosen to implement it, the first step is to specify the conceptual and internal schemas. When
no strict separation is maintained between them, designers use the DDL to define both schemas.
When a clear separation is maintained, the designers use the DDL for the conceptual
schema and the SDL (Storage Definition Language) for the internal schema. The mapping between the
two schemas may be specified in either one of these languages. A VDL (View Definition
Language) is used to specify user views and their mappings to the conceptual schema. In relational
DBMSs, however, SQL is used in the role of VDL to define user or application views. Once the
database schemas are completed, users need to manipulate the database. These manipulations
include retrieval, insertion, deletion and modification of data. The DBMS provides a language
called the DML for these purposes.

DBMS Interfaces : A DBMS must provide user friendly interfaces, which include
• Menu based Interfaces : These interfaces present the user with lists of options that lead
the user through the formulation of a request.
• Forms based Interfaces : This interface displays a form to each user. Users can fill out
the form entries to insert new data.
• GUI (Graphical User Interface) : This interface displays a diagram to the user, who can
specify a query by manipulating the diagram. GUIs utilize both menus and forms. A pointing
device such as a mouse can be used to pick certain parts of the diagram.
• Natural Language Interface : This interface accepts requests written in English or
some other language. It has its own schema as well as a dictionary of important
words.
• Speech Input & Output : This interface uses speech for the input query and speech for the
answer to the query.

• Interface for Parametric users : This interface allows parametric users such as bank
tellers to work quickly. The goal of this interface is to minimize the number of keystrokes
required for each request.
• Interfaces for DBA : This interface is used by the DBA. It can be used for creating
accounts, setting parameters, granting account authorization, changing the schema and
reorganizing the storage structures of the database.

Centralized Architecture for DBMSs : In this architecture, users’ computers act only as
display terminals, and all DBMS functionality, application program execution and user
interface processing are carried out on one central machine.

Client/Server Architecture : This architecture divides the total processing between two kinds
of units called clients and servers. A client is typically a user machine that provides user
interface capabilities and local processing. A server is a system containing both hardware and
software that provides services to the client machines, such as file access, printing and database
access. The client/server architecture was developed to deal with computing environments in which
a large number of PCs, workstations, file servers, printers, web servers and other equipment are
connected via a network. There are two main types of basic DBMS architectures under the
client/server framework. They are given below.

Two Tier Architecture : This is the architecture in which the software components are distributed
over two systems called clients and servers. That’s why it is called two tier. The advantages of
this architecture are simplicity and compatibility with existing systems. In this architecture, the
user interface and application programs run on the client side, while the server handles query
and transaction processing along with data storage. The communication between
clients and server can be done through the ODBC API.

[Figure: Physical two tier client/server architecture. Site 1: diskless client. Site 2: client
with disk. Site 3: server. Site 4: server with client. The sites are connected by a communication
network.]

Three Tier and n-Tier Architecture : The introduction of the web changed the client/server
architecture into a 3 tier architecture, which many web applications now use. This architecture
adds an intermediate layer between the client and the server (database server). This intermediate
layer or middle tier is sometimes called the application server and sometimes the web server,
depending on the application. This server plays an intermediate role by storing business rules
(procedures or constraints) that are used to access data from the database server. Generally in
this architecture clients contain the GUI and some additional application specific business rules.
The intermediate server accepts requests from the client, processes them and sends database
commands to the database server. It then passes the processed data from the database server to
the clients. Thus the user interface, application rules and data access act as the 3 tiers.
Sometimes the business logic layer is divided into multiple layers based on the application, which
is called n-tier. This technology gives higher levels of data security, although network security
issues remain a major concern.
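
The split into three tiers can be sketched in a few lines. In this illustration all three tiers
run inside one Python process, whereas in a real system they would usually sit on separate
machines. The account table, the AccountService class, the DAILY_LIMIT rule and the render
function are all hypothetical. SQLite, used through Python's sqlite3 module, stands in for the
database server.

```python
import sqlite3


# Database server tier: stores the data and executes database commands.
def make_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " acc_no INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
    )
    conn.execute("INSERT INTO account VALUES (1, 'Asha', 500)")
    conn.commit()
    return conn


# Middle tier (application server): holds the business rules.
class AccountService:
    DAILY_LIMIT = 300  # hypothetical business rule

    def __init__(self, conn):
        self._conn = conn

    def withdraw(self, acc_no, amount):
        if amount <= 0 or amount > self.DAILY_LIMIT:
            return {"ok": False, "reason": "amount outside allowed range"}
        row = self._conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
        ).fetchone()
        if row is None:
            return {"ok": False, "reason": "unknown account"}
        if row[0] < amount:
            return {"ok": False, "reason": "insufficient funds"}
        with self._conn:  # commit the update as one transaction
            self._conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return {"ok": True, "balance": row[0] - amount}


# Client tier: presentation only, with no SQL and no business rules.
def render(response):
    if response["ok"]:
        return f"Withdrawal accepted. New balance: {response['balance']}"
    return f"Withdrawal refused: {response['reason']}"


service = AccountService(make_database())
first = render(service.withdraw(1, 200))
second = render(service.withdraw(1, 400))
third = render(service.withdraw(1, 300))
print(first)
print(second)
print(third)
```

The client never sees SQL and the database never sees the withdrawal limit, so each tier can be
changed without touching the other two.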

[Figure: Three tier client/server architecture. Client: GUI, web interface (presentation layer).
Application server or web server: application programs (business logic layer). Database server:
DBMS (database service layer).]

Classification of DBMS : DBMSs are classified according to the following criteria.
• Data Model : According to this criterion, DBMSs are classified as hierarchical, network,
relational and object oriented DBMSs. The relational DBMS (RDBMS) is used in many current
commercial systems.
• Number of Users : This criterion classifies DBMSs as single user or multiuser
systems. A single user system allows only one user at a time and is mostly used with PCs.
A multiuser system supports multiple users at a time.
• Number of Sites : This criterion classifies DBMSs as centralized or distributed.
A DBMS is centralized if the data is stored at a single computer site. A distributed
DBMS can have the database and the DBMS software distributed over many sites connected by a
network.
• Type of Access Paths : DBMSs can also be classified on the basis of the types of access paths.
One such DBMS is based on the inverted file structure.
• Generality : This criterion classifies DBMSs as general purpose or special
purpose. A special purpose DBMS is designed for a specific application such as airline
reservation and can’t be used for other applications.

BASIC FILE STRUCTURES AND INDEX STRUCTURES


Types of File Organizations : Generally data is organized on secondary storage in terms of
files, and each file is stored as a collection of records. Because huge amounts of data can’t be
held in main memory, the data has to be stored on secondary storage such as disk. For processing,
however, the data has to be brought into main memory. The unit of information transferred between
main memory and the disk is called a page. Buffer management is used for reading and writing
data between main memory and disk. The disk space manager is software that allocates space for
records on the disk. When the DBMS requires additional space, it calls the disk space manager to
allocate it. The DBMS also informs the disk space manager when it no longer needs the space.

The most widely used file organizations are

1. Heap (Unordered) file
2. Sequential (Ordered) file
3. Hash file.

• Heap File : It is also called an unordered file because it stores records in the order in
which they arrive. This is the simplest organization.
➢ Insert Record : Records are inserted in the same order as they arrive.
➢ Delete Record : The page containing the record to be deleted is accessed first and the
record is marked as deleted. After the deletion the entire page is written back to the
disk.
➢ Access Record : A linear search is performed on the file, starting from the first record,
until the desired record is found.
• Sequential File : It is also called an ordered file because it stores the records in
sequential order. The records are ordered on the value of one field called the ordering or key
field. The main advantage of this file is that binary search can be used because the file is sorted.
➢ Insert Record : This is a difficult task because we need to find the place to insert.
If space is available there, the record can be inserted directly. If space is not sufficient,
the following records must be moved to the next pages, which is a tiresome job.
➢ Delete Record : This is as difficult as insertion because the following records must
move back to fill the empty space left by deleted records.
➢ Access Record : This task is simple because we can use binary search, as the
file is sorted.
• Hash File : In these files, records are not stored in sequential order. A function
called a hash function is used to decide where each record is stored and to retrieve it later.
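
The three organizations can be compared with a small in-memory sketch. Python lists stand in for
files here, so pages and disk transfers are ignored. The records, the bucket count NUM_BUCKETS and
the modulo hash function are hypothetical choices made for illustration.

```python
from bisect import bisect_left

records = [(17, "Ravi"), (4, "Asha"), (42, "Meena"), (8, "Kiran"), (23, "Divya")]

# Heap file: records are kept in arrival order, so lookup is a linear search.
heap_file = list(records)


def heap_search(file, key):
    probes = 0
    for k, value in file:
        probes += 1
        if k == key:
            return value, probes
    return None, probes


# Sequential file: records are sorted on the key field, so binary search works.
sequential_file = sorted(records)
sequential_keys = [k for k, _ in sequential_file]


def sequential_search(file, keys, key):
    pos = bisect_left(keys, key)  # binary search for the key's position
    if pos < len(keys) and keys[pos] == key:
        return file[pos][1]
    return None


# Hash file: a hash function decides which bucket holds each record.
NUM_BUCKETS = 4


def hash_function(key):
    return key % NUM_BUCKETS


hash_file = [[] for _ in range(NUM_BUCKETS)]
for k, value in records:
    hash_file[hash_function(k)].append((k, value))


def hash_search(file, key):
    for k, value in file[hash_function(key)]:  # only one bucket is examined
        if k == key:
            return value
    return None


print(heap_search(heap_file, 23))
print(sequential_search(sequential_file, sequential_keys, 23))
print(hash_search(hash_file, 42))
```

In the heap file the record with key 23 arrived last, so the linear search has to examine every
record before finding it. The other two organizations go to it far more directly.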

INDEXES : Indexes enhance the performance of a DBMS. They enable us to go to the desired
record directly without scanning each record in the file. This is similar to the index pages of a
book, which let us find a desired keyword without a sequential scan through the complete book.
So an index can be defined as a data structure that allows faster retrieval of data. Each index
is based on a certain attribute or field of the file, called the search key. A file can have
several indexes, each based on a different search key.

Types of single level ordered indexes : There are several types of ordered indexes. They are

1. Primary Index
2. Clustering Index
3. Secondary Index.

 Primary Index : An index on a set of fields that includes the primary key is called a primary
index. A primary index is an ordered file whose records are of fixed length with two fields.
The first field is the key field, which is the primary key of the data file, and the second field
is a pointer to a disk block. There is only one index entry in the index file for each block of
the data file. We will refer to the two field values of index entry ‘i’ as <K(i),P(i)>. This
primary index can be further divided into two types called
 Dense Index : It has an index entry for every search key value in the data file. The
index record contains the search key value and a pointer to the first data record with
that search key value.
 Sparse Index : This index has entries for only some of the search key values. Each
index entry contains a search key value and a pointer to the first data record with
that search key value. To locate a record, we find the index entry with the largest
search key value that is less than or equal to the search key value for which we are
looking. We start at the record pointed to by that index entry and follow the pointers
in the file until we find the desired record.
 Clustering Index : If the file records are physically ordered on a non-key field which does
not have a distinct value for each record, that field is called the clustering field. The index
created for such a field is called a clustering index. This clustering index is different from a
primary index, which requires that the ordering field of the data file has a distinct value for
each record. The clustering index is also an ordered file with 2 fields. The first field is the
clustering field and the second field is a block pointer. There is one entry in the clustering
index for each distinct value of the clustering field, containing a pointer to the first block in
the data file that has a record with that clustering field value.
 Secondary Index : An index that is not a primary key index is called a secondary index. This
index provides secondary access, which means that a primary access path already exists for
that file. The secondary index may be on a field which is a candidate key and has a unique
value in every record, or on a non-key field with duplicate values. This index is also an
ordered file with 2 fields. The first field is the indexing field and the second field is either a
block pointer or a record pointer. A secondary index differs from a primary index because its
search key is not the field on which the file is ordered, i.e., a secondary index must contain
pointers to all records, because the records are ordered by the search key of the primary
index and not by the search key of the secondary index. So the records with the same search
key value could be anywhere in the file. Therefore this index needs more space and more
search time than a primary index. However, it improves the search time for queries that use
keys other than the search key of the primary index.
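
The lookup procedures described above can be sketched as follows. This is a minimal in-memory illustration under assumptions: the data file is a list named blocks, each record is a (key, dept) pair, and the sample values are made up. The primary index keeps one entry per block (so it is sparse), while the secondary index on the non-ordering field dept keeps one entry per record.

```python
from bisect import bisect_left, bisect_right

# Data file: blocks of records, physically ordered on the key field.
blocks = [
    [(1, "hr"), (4, "sales")],
    [(7, "it"), (9, "hr")],
    [(12, "sales"), (15, "it")],
]

# Primary index: one entry <K(i), P(i)> per block, where K(i) is the key of
# the first record in block i and P(i) is the block number i.
index_keys = [block[0][0] for block in blocks]        # [1, 7, 12]

def primary_lookup(key):
    # Largest index entry whose key is <= the search key.
    i = bisect_right(index_keys, key) - 1
    if i < 0:
        return None
    for k, dept in blocks[i]:                         # scan inside one block
        if k == key:
            return dept
    return None

# Secondary index on the non-ordering field "dept": one entry per record,
# each holding a record pointer (block number, slot number).
secondary = sorted(
    (dept, (b, s))
    for b, block in enumerate(blocks)
    for s, (_, dept) in enumerate(block)
)
secondary_keys = [dept for dept, _ in secondary]

def secondary_lookup(dept):
    lo = bisect_left(secondary_keys, dept)
    hi = bisect_right(secondary_keys, dept)
    return [pointer for _, pointer in secondary[lo:hi]]
```

The secondary index has six entries for six records, while the primary index needs only three, which illustrates why a secondary index takes more space.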

Index Data Structures : There are two methods that can be used to organize the index files.
They are

1) Hash based Indexing : In this, a hash function is used to find which bucket contains the
desired record. So the file records are grouped into buckets, each of which contains a
primary page along with other pages that are chained together.
2) Tree based Indexing : In this, the records are arranged in a tree like structure. The records
are arranged according to the search key values in a hierarchical manner.
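
Hash based indexing, with a primary page and chained overflow pages per bucket, can be sketched as follows. This is a simplified illustration under assumptions: PAGE_CAPACITY, NUM_BUCKETS, the Bucket class and the modulo hash function h are all chosen for the example and are not taken from any particular DBMS.

```python
PAGE_CAPACITY = 2      # entries per page (assumed, tiny for illustration)
NUM_BUCKETS = 3        # assumed number of buckets

class Bucket:
    """A primary page plus a chain of overflow pages."""

    def __init__(self):
        self.pages = [[]]                 # pages[0] is the primary page

    def insert(self, key, pointer):
        if len(self.pages[-1]) == PAGE_CAPACITY:
            self.pages.append([])         # chain a new overflow page
        self.pages[-1].append((key, pointer))

    def search(self, key):
        for page in self.pages:           # primary page first, then the chain
            for k, pointer in page:
                if k == key:
                    return pointer
        return None

buckets = [Bucket() for _ in range(NUM_BUCKETS)]

def h(key):
    return key % NUM_BUCKETS              # hash function: key -> bucket number

def index_insert(key, pointer):
    buckets[h(key)].insert(key, pointer)

def index_search(key):
    return buckets[h(key)].search(key)
```

When many keys hash to the same bucket the overflow chain grows, and a search in that bucket has to read more pages.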

Figure : Secondary index. The index entries are sorted on the indexing field (values 1 to 9) and
point to data records that are not stored in that order.

Figure : Primary index on the ordering key field Name of a data file with the fields Name, SSN,
DoB, Job and Salary.

Figure : Clustering index on the ordering non-key field DNo of a data file with the fields DNo,
Name, SSN, Job and Salary.


DATA MODELING USING ER MODEL

Introduction : Conceptual modeling is a very important phase in the design of a successful
database application. The database application refers to the database and the associated programs
that implement database queries and updates. The ER model is an advanced and dominant model
which is closest to the conceptual model of a database. So this model is frequently used for the
conceptual design of database applications. In this model, the overall logical structure of the
database can be expressed by a diagram known as an ER diagram. The ER model describes the
data as Entities, Relationships and Attributes.

Entities : The basic object that the ER model represents is an Entity. It is a thing in the real world
with an independent existence. It may be an object with physical existence like a person, car, house
or employee, or it may be an object with conceptual existence like a company, job or course.
Entities are represented by a rectangle containing the entity type name.

Ex : STUDENT EMPLOYEE

Entity Type & Entity Set : An entity type is defined as a collection of entities that have the same
attributes. The collection of all entities of a particular type in the database at any point in time is
called an Entity Set.

Attributes : Each entity has certain characteristics known as attributes. An attribute is a
characteristic property of an entity. In an ER diagram these attributes are represented by ovals or
ellipses enclosing the attribute name, and they are attached to their entity type by straight lines.
Attributes have domains. A domain is the set of all possible values for an attribute. Sometimes
attributes share a domain. For example, student and professor can share the domain of all possible
addresses.

Figure : Entity type STUDENT with the attributes SNO, NAME and DOB.

Classification of Attributes : Attributes can be classified as

1. Key Attributes
2. Simple & Composite Attributes
3. Single valued & Multi-valued Attributes
4. Stored & Derived Attributes

Key Attributes : A key attribute is an attribute whose values are distinct for each individual
entity in the entity set. So it is used as the primary key. This attribute is underlined inside the oval.

Ex : EMP(eno, ename, dob)

Figure : Entity type EMP with the attributes eno, ename and dob.

Simple Attribute (Atomic) : An attribute that cannot be further subdivided is called a simple
attribute.

Ex : Age

Composite Attributes : It is an attribute that can be further subdivided to yield additional
attributes. Ex : address can be subdivided into street, city, state and zip code etc.

Figure : Composite attribute ADDRESS with the component attributes STREET, CITY and STATE.

Single-Valued Attribute : This is an attribute which can have only a single value. For example, age
is a single valued attribute of a person. However, a single valued attribute need not be simple. For
example, a part's serial number such as SE080219326 is single valued but not simple, because it
can be subdivided into the region in which the part was produced, the part number etc.

Multi-valued Attribute : It is an attribute which can have many values. In an ER diagram this
attribute is represented by a double line oval. Examples are the color of a car and the qualification
of an employee.

Figure : Entity type EMP with the attributes ENO and NAME and the multi-valued attribute
QUALIFICATION.

Stored Attribute : It is an attribute whose value is physically stored in the database.

Derived Attribute : It is an attribute whose value is determined from the value of an existing
attribute, i.e., it need not be physically stored within the database. It is represented by a dotted
oval in an ER diagram. For example, age is a derived attribute because it can be computed from
the date of birth.

Figure : Entity type STUDENT with the stored attribute DOB and the derived attribute AGE.
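
The attribute classes described above can be illustrated with a small Python sketch. It is only an analogy for the concepts and not a database definition. The class names Address and Student, the field names and the choice to pass the current date to age are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:                 # composite attribute: made of simple parts
    street: str
    city: str
    state: str

@dataclass
class Student:
    sno: int                   # key attribute (distinct for every student)
    name: str                  # simple, single-valued, stored attribute
    dob: date                  # stored attribute
    address: Address           # composite attribute
    phones: list = field(default_factory=list)   # multi-valued attribute

    def age(self, today):
        # Derived attribute: computed from dob, never stored.
        years = today.year - self.dob.year
        if (today.month, today.day) < (self.dob.month, self.dob.day):
            years -= 1         # birthday not yet reached this year
        return years
```

Because age is computed from dob each time it is needed, it can never become inconsistent with the stored date of birth.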

Relationships : A relationship is an association between entities. It is represented by a
diamond symbol in an ER diagram. A relationship type R among n entity types E1, E2, E3, ..., En
defines a set of associations. This collection of associations is also called a Relationship set.

Degree of Relationship : It is the number of participating entity types in the relationship, i.e., it
indicates the number of entity types that are associated. Based on this degree, relationships can be
categorized as

1. Unary Relationships
2. Binary Relationships
3. Ternary Relationships
4. Higher order degree relationships.

 Unary Relationship : This is a relationship which exists when an association is maintained
within a single entity type.

Figure : Entity type course related to itself through the relationship requires.

In the above example, a course is a prerequisite for another course, i.e., a course requires a
course, so course has a relationship with itself. Such relationships are also called
recursive relationships.

 Binary Relationship : It is a relationship which exists when an association is maintained
between two entity types. In fact, in order to simplify the conceptual design, most higher
order relationships are decomposed into appropriate binary relationships whenever possible.

Figure : Entity types Professor and Class related through the relationship Teaches.


 Ternary Relationship : This is a relationship which exists when an association is maintained
among 3 entity types.

Figure : Ternary relationship Supply among the entity types Suppliers, Parts and Projects.

Constraints on Relationship Types : Relationship types usually have certain constraints
that limit the combinations of entities that may participate in the relationship set. There are two
main types of relationship constraints, which are

1. Cardinality ratio.
2. Participation constraints

Both of these are called Structural Constraints.

1. Cardinality Ratio : It specifies the maximum number of entity instances associated with
one occurrence of the related entity. For example, in the Teaches relationship the cardinality
ratio for professor : class is 1:N, which means that each professor can teach any number of
classes but each class is taught by only one professor.
2. Participation Constraint : It specifies whether the existence of an entity depends on its
being related to another entity through the relationship type, i.e., this constraint specifies
the minimum number of relationship instances in which each entity can participate. So
sometimes it is also called the minimum cardinality constraint.
There are two types of participation constraints. They are total and partial. Total
participation means that every entity must be related to an entity of the other type in the
relationship. Thus it is also called existence dependency. Partial participation means that
some entities are related to entities of the other type in the relationship, but not necessarily
all.
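
Both structural constraints can be checked on a sample relationship set, as in the following sketch. The data for the Teaches relationship and the function names is_one_to_many and participation are made up for illustration. A real DBMS enforces such constraints through keys and foreign keys rather than by scanning the data.

```python
# Relationship set TEACHES as (professor, class) pairs; sample data is made up.
teaches = [("P1", "C1"), ("P1", "C2"), ("P2", "C3")]
professors = {"P1", "P2", "P3"}
classes = {"C1", "C2", "C3"}

def is_one_to_many(pairs):
    """True if each entity on the N side (second element) is related to at
    most one entity on the 1 side (first element)."""
    owner = {}
    for one_side, many_side in pairs:
        if owner.setdefault(many_side, one_side) != one_side:
            return False
    return True

def participation(entities, pairs, position):
    """Return 'total' if every entity appears in the relationship set,
    otherwise 'partial'."""
    related = {pair[position] for pair in pairs}
    return "total" if entities <= related else "partial"
```

In this sample every class is taught by some professor, so the participation of class is total, while professor P3 teaches nothing, so the participation of professor is partial.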
Weak Entity : An entity type that does not have a key attribute of its own is called a weak entity.
In contrast, an entity type that has a key attribute is called a Regular or Strong Entity. Usually
the key of a weak entity is formed by combining its partial key with the key attribute of the entity
to which it is related. So a weak entity always has total participation, i.e., it is always existence
dependent. It is represented by a double line rectangle in an ER diagram.
Example:
Consider the entity type dependent, related to the employee entity type, which is used to keep
track of the dependents of each employee. The attributes of dependent are : name, birth date,
sex and relationship. Each employee entity is said to own the dependent entities that are related
to it. However, note that a ‘dependent’ entity does not exist on its own; it is dependent on the
employee entity. In other words, if an employee leaves the organization, all dependents related
to that employee are removed as well, because they cannot exist without the ‘employee’ entity.
Thus it is a weak entity.

Keys:
Super key:
A super key is a set of one or more attributes that, taken collectively, allow us to identify
uniquely an entity in the entity set.
For example : (customer-id), (cname, customer-id), (cname, telno)
Candidate key:
In a relation R, a candidate key for R is a subset of the set of attributes of R which has the
following properties:
Uniqueness: No two distinct tuples in R have the same values for the candidate key.
Irreducibility: No proper subset of the candidate key has the uniqueness property.
Eg: (cname, telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal
means of identifying entities within an entity set. The remaining candidate keys, if any, are
called alternate keys.
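
The uniqueness and irreducibility properties can be tested on sample data, as in the following sketch. The rows are invented for illustration, and the function names is_super_key and candidate_keys are assumptions. Note that a key is a property of the schema: data can only show that a set of attributes is not a key, so a check like this is a sanity test and not a proof.

```python
from itertools import combinations

# Sample CUSTOMER tuples; the values are invented for illustration.
attributes = ("customer_id", "cname", "telno")
rows = [
    (1, "Ravi", "555-01"),
    (2, "Ravi", "555-02"),
    (3, "Mina", "555-01"),
]

def is_super_key(attrs):
    """Uniqueness: no two rows agree on all the attributes in attrs."""
    cols = [attributes.index(a) for a in attrs]
    seen = {tuple(row[c] for c in cols) for row in rows}
    return len(seen) == len(rows)

def candidate_keys():
    """Super keys of which no proper subset is also a super key."""
    keys = []
    # Smaller sets are examined first, so any reducible super key already
    # contains a candidate key found earlier.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_super_key(attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys
```

On this sample (cname, customer_id) is a super key but not a candidate key, because customer_id alone is already unique.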

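ER-DIAGRAM:

The overall logical structure of a database can be expressed graphically using the ER model with
the help of an ER-diagram.
Symbols used in ER-diagram:

Entity : rectangle
Weak entity : double line rectangle
Relationship : diamond
Identifying relationship : double line diamond
Attribute : oval
Key attribute : oval with the attribute name underlined
Multi valued attribute : double line oval
Composite attribute : oval with its component attributes attached as ovals
Derived attribute : dotted oval
One-to-one, one-to-many, many-to-one and many-to-many : the labels 1 and 1, 1 and m, m and 1,
m and n written on the lines joining the relationship to the two entities
Total participation : double line joining the entity to the relationship
Partial participation : single line joining the entity to the relationship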


Enhanced Entity Relationship Model (EER Model)

This model includes concepts like Specialization and Generalization. The EER model is a complex
model which includes semantic concepts that give a more complete picture of the precise nature
of the data in a given system.

Specialization : It is the process of defining a set of subclasses of an entity type, which is called the
super class of the specialization. The set of subclasses is defined on the basis of some distinguishing
characteristic of the entities in the super class. For example, the entity type EMP can be specialized
based on job characteristics. The lower entities are called subclass entities. Subclass entities inherit
all the attributes of the super class entity. The relationship between super class and subclass entities
is called a Class/Subclass relationship or IS-A relationship. So the set of subclasses secretary,
engineering and technician is a specialization of the super class EMP based on job type. Another
specialization of EMP may yield the set of subclasses salaried and hourly, based on the method of
pay.

Generalization : It is the reverse process of specialization, i.e., it is the process of defining a
generalized entity type from the given entity types. In other words, it is the process by which the
subclass entities are generalized into a single super class entity type. For example, consider the
entity types CAR & TRUCK. They can be generalized into the entity type VEHICLE because they
have some common attributes. So CAR & TRUCK are the subclasses of the generalized super
class VEHICLE.

Specialization & Generalization Hierarchies & Lattices : A subclass itself may have
further subclasses specified on it, forming a hierarchy or a lattice of specializations. For
example, Engineering is a subclass of EMP and is a super class of Eng-Manager. A specialization
hierarchy has the constraint that every subclass has only one parent, which results in a tree
structure. However, in a specialization lattice a subclass can have more than one parent.
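
Class inheritance in a programming language gives a rough analogy for these ideas. The following Python sketch is only an illustration and not part of the EER model itself. The subclass Manager, used as the second parent of EngManager, is an assumption and does not appear in the text above.

```python
class Emp:
    """Super class: attributes common to all employees."""

    def __init__(self, eno, name):
        self.eno = eno
        self.name = name


class Secretary(Emp):
    """Subclass from the specialization of EMP on job type."""


class Technician(Emp):
    """Subclass from the specialization of EMP on job type."""


class Engineer(Emp):
    """Subclass of Emp (the Engineering subclass) and parent of EngManager."""


class Manager(Emp):
    """Assumed extra subclass, used only to give EngManager a second parent."""


class EngManager(Engineer, Manager):
    """Two parents, so the structure is a lattice and not a tree."""
```

An EngManager object inherits eno and name from Emp through both parents, just as a subclass entity inherits all the attributes of its super classes.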

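ER- Diagram For College Database

Figure : ER diagram for the college database.
Entity types : Student (rollno, name, address), Course (courseid, cname, duration),
Guardian (name, address, relationship), Department (dno, dname), Faculty (fid, name, address, sal).
Relationship types : opts, has, enrolled, Taught-by, Works-in and Head (with the attribute Date).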

Conversion of ER-diagram to relational database

Conversion of entity sets:
1. For each strong entity type E in the ER diagram, we create a relation R containing
all the simple attributes of E. The primary key of the relation R will be one of the
key attributes of E.

STUDENT(rollno (primary key), name, address)
FACULTY(id (primary key), name, address, salary)
COURSE(course-id (primary key), course_name, duration)
DEPARTMENT(dno (primary key), dname)

2. For each weak entity type W in the ER diagram, we create another relation R that contains
all the simple attributes of W. If E is the owner entity of W, then the key attribute of E is
also included in R. This key attribute is set as a foreign key attribute of R. The combination
of the primary key attribute of the owner entity type and the partial key of the weak entity
type forms the primary key of R.

GUARDIAN((rollno, name) (primary key), address, relationship)
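
The STUDENT and GUARDIAN relations can be tried out with Python's built-in sqlite3 module, as in the following sketch. The column types, the sample rows, the ON DELETE CASCADE clause and the helper guardian_count are assumptions added for illustration. The cascade is one way to express that a weak entity cannot exist without its owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (                     -- strong entity
    rollno  INTEGER PRIMARY KEY,
    name    TEXT,
    address TEXT
);
CREATE TABLE guardian (                    -- weak entity owned by student
    rollno       INTEGER REFERENCES student(rollno) ON DELETE CASCADE,
    name         TEXT,                     -- partial key
    address      TEXT,
    relationship TEXT,
    PRIMARY KEY (rollno, name)             -- owner key + partial key
);
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Pune')")
conn.execute("INSERT INTO guardian VALUES (1, 'Ravi', 'Pune', 'father')")

def guardian_count():
    return conn.execute("SELECT COUNT(*) FROM guardian").fetchone()[0]
```

Deleting the student row with rollno 1 also removes the guardian row, which mirrors the existence dependency of the weak entity.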

Conversion of relationship sets:

Binary Relationships:

 One-to-one relationship:
For each 1:1 relationship type R in the ER-diagram involving two entities E1
and E2, we choose one of the entities (say E1), preferably with total participation,
and add the primary key attribute of the other entity E2 as a foreign key attribute
in the table of E1. We also include all the simple attributes of relationship
type R in E1, if any. For example, the DEPARTMENT relation has been extended
to include HEAD_ID and the attribute DATE_FROM of the relationship.

DEPARTMENT(D_NO, D_NAME, HEAD_ID, DATE_FROM)
 One-to-many relationship:
For each 1:n relationship type R involving two entities E1 and E2, we identify
the entity type (say E1) at the n-side of the relationship type R and include the
primary key of the entity on the other side of the relationship (say E2) as a foreign
key attribute in the table of E1. We include all simple attributes (or simple
components of a composite attribute) of R, if any, in the table of E1.
For example:
Consider the works-in relationship between DEPARTMENT and FACULTY. For
this relationship we choose the entity at the N side, i.e., FACULTY, and add the
primary key attribute of the other entity DEPARTMENT, i.e., DNO, as a foreign
key attribute in FACULTY.
FACULTY (contains the WORKS_IN relationship)
FACULTY(ID, NAME, ADDRESS, BASIC_SAL, DNO)

 Many-to-many relationship:

For each m:n relationship type R, we create a new table (say S) to represent R.
We also include the primary key attributes of both the participating entity types
as foreign key attributes in S. Any simple attributes of the m:n relationship
type (or simple components of a composite attribute) are also included as
attributes of S. For example:
The m:n relationship taught-by between the entities COURSE and FACULTY
should be represented as a new table. The structure of the table will include the
primary key of COURSE and the primary key of FACULTY.
TAUGHT-BY(ID (primary key of FACULTY table), course-id (primary key of
COURSE table))
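
The three binary cases can be tried out together with Python's built-in sqlite3 module, as in the following sketch. The column types, the sample rows and the helper courses_of are assumptions made for illustration. HEAD_ID is declared here as a foreign key to FACULTY, which the text implies but does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    d_no      INTEGER PRIMARY KEY,
    d_name    TEXT,
    head_id   INTEGER REFERENCES faculty(id),     -- 1:1 Head relationship
    date_from TEXT                                -- attribute of Head
);
CREATE TABLE faculty (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    address   TEXT,
    basic_sal REAL,
    dno       INTEGER REFERENCES department(d_no) -- 1:N works-in, stored on the N side
);
CREATE TABLE course (
    course_id   INTEGER PRIMARY KEY,
    course_name TEXT,
    duration    TEXT
);
CREATE TABLE taught_by (                          -- M:N taught-by as its own table
    id        INTEGER REFERENCES faculty(id),
    course_id INTEGER REFERENCES course(course_id),
    PRIMARY KEY (id, course_id)
);
""")
conn.execute("INSERT INTO department VALUES (10, 'CSE', NULL, NULL)")
conn.executemany("INSERT INTO faculty VALUES (?, ?, ?, ?, ?)",
                 [(1, 'Rao', 'Pune', 50000.0, 10), (2, 'Iyer', 'Goa', 45000.0, 10)])
conn.execute("UPDATE department SET head_id = 1, date_from = '2020-01-01' WHERE d_no = 10")
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [(101, 'DBMS', '6 months'), (102, 'OS', '6 months')])
conn.executemany("INSERT INTO taught_by VALUES (?, ?)", [(1, 101), (1, 102), (2, 101)])

def courses_of(faculty_id):
    rows = conn.execute(
        "SELECT course_name FROM taught_by JOIN course USING (course_id) "
        "WHERE id = ? ORDER BY course_name", (faculty_id,))
    return [row[0] for row in rows]
```

The department row is inserted with an empty HEAD_ID first and updated afterwards, because DEPARTMENT and FACULTY reference each other.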

 N-ary relationship:

For each n-ary relationship type R where n>2, we create a new table S to
represent R. We include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types. We also include any simple
attributes of the n-ary relationship type (or simple components of a composite
attribute) as attributes of S. The primary key of S is usually a combination of all
the foreign keys that reference the relations representing the participating entity
types.

Figure : Ternary relationship Loan-sanction among the entity types Customer, Loan and Employee.

LOAN_SANCTION(customer_empno,loan_sanction,loan_date,loan_amount)

 Multi-valued attributes:

For each multivalued attribute ‘A’, we create a new relation R that includes an
attribute corresponding to A plus the primary key attributes k of the relation that
represents the entity type or relationship type that has A as an attribute. The
primary key of R is then the combination of A and k.

For example, if a STUDENT entity has rollno, name and phone number, where
phone number is a multivalued attribute, then we will create the table
PHONE(rollno, phoneno) whose primary key is the combination of both attributes.
In the STUDENT table we need not have the phone number; instead it can simply
be (rollno, name).
PHONE(rollno, phoneno)
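
The PHONE relation can be tried out with Python's built-in sqlite3 module, as in the following sketch. The column types, the sample rows and the helper phones_of are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    rollno INTEGER PRIMARY KEY,
    name   TEXT
);
CREATE TABLE phone (
    rollno  INTEGER REFERENCES student(rollno),   -- k: key of the owning entity
    phoneno TEXT,                                 -- A: the multivalued attribute
    PRIMARY KEY (rollno, phoneno)                 -- combination of k and A
);
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.executemany("INSERT INTO phone VALUES (?, ?)", [(1, '555-01'), (1, '555-02')])

def phones_of(rollno):
    rows = conn.execute(
        "SELECT phoneno FROM phone WHERE rollno = ? ORDER BY phoneno", (rollno,))
    return [row[0] for row in rows]
```

Each phone number becomes one row, so a student can have any number of phone numbers without changing the STUDENT table.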


Figure : Generalisation / specialisation (is-a) of the entity type Account (account_no, name, branch)
into the entity types Saving (interest) and Current (charges).

 Converting Generalisation / specialisation hierarchy to tables:

A simple rule for conversion is to decompose all the specialized entities into separate
tables in case they are disjoint. For example, for the figure we can create the following
tables:
Account(account_no, name, branch, balance)
Saving_account(account_no, interest)
Current_account(account_no, charges)
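
The three tables can be tried out with Python's built-in sqlite3 module, as in the following sketch. The column types, the sample rows and the helper saving_details are assumptions made for illustration. Each subclass table uses account_no both as its primary key and as a foreign key to the super class table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (                     -- super class: common attributes
    account_no INTEGER PRIMARY KEY,
    name       TEXT,
    branch     TEXT,
    balance    REAL
);
CREATE TABLE saving_account (              -- subclass: only its own attribute
    account_no INTEGER PRIMARY KEY REFERENCES account(account_no),
    interest   REAL
);
CREATE TABLE current_account (             -- subclass: only its own attribute
    account_no INTEGER PRIMARY KEY REFERENCES account(account_no),
    charges    REAL
);
""")
conn.executemany("INSERT INTO account VALUES (?, ?, ?, ?)",
                 [(1, 'Asha', 'Pune', 1000.0), (2, 'Ravi', 'Goa', 2500.0)])
conn.execute("INSERT INTO saving_account VALUES (1, 4.0)")
conn.execute("INSERT INTO current_account VALUES (2, 150.0)")

def saving_details(account_no):
    # The inherited attributes are obtained by joining with the super class table.
    return conn.execute(
        "SELECT name, branch, balance, interest "
        "FROM account JOIN saving_account USING (account_no) "
        "WHERE account_no = ?", (account_no,)).fetchone()
```

The inherited attributes name, branch and balance are stored only once in the Account table and are joined in when the details of a saving account are needed.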
