
DBMS Unit-1

unit 1

Uploaded by

himagana21
Copyright
© © All Rights Reserved

Unit-1

Database Systems (DBS)


Topics
• Introduction to Database : Data, information, database, metadata,
Database Management System (DBMS)
• Traditional File Processing Systems
• Advantages of File Processing Systems
• Disadvantages of File Processing Systems
• Purpose of Database System
• Database Characteristics
• Advantages of Database Approach
• Disadvantages of Database Approach
• Users of Database System
• Schemas and Instances
• Three Schema Architecture and Data Independence
• The Database System Environment
• File System Organization: Sequential - Pointer - Indexed – Direct
• Relational Algebra
Introduction to Database
Data: Data are known facts that can be recorded and stored on
computer media. Databases today are used to store objects such as
documents, photographic images, sound, and even video segments, in
addition to conventional textual and numeric data. The term data can
therefore be broadly defined as facts, text, graphics, images, sound,
and video segments that have meaning in the user's environment.

Information: The terms data and information are closely related and
are often used interchangeably. However, it is useful to distinguish
between them. We define information as data that has been processed
in such a way that it increases the knowledge of the person who uses it.
Data
Baker, Kenneth D. 324917628   Doyle, Joan E. 476193248   Finkle, Clive R. 548429344

Information
Name               ID         Major  GPA
Baker, Kenneth D.  324917628  MGT    2.9
Doyle, Joan E.     476193248  MKT    3.4
Finkle, Clive R.   548429344  ACCT   2.8
Metadata: Metadata are data that describe the properties or characteristics of
other data. Some of these properties include data definitions, data structures, and
rules or constraints. The metadata for a class roster is shown in the table below,
which lists, for each data item, its name, data type, length, minimum and maximum
allowable values, and a brief description.
Data item                        Value
Name      Type          Length   Min  Max  Description
Course    Alphanumeric  30                 Course ID and name
Section   Integer       1        1    9    Section number
Semester  Alphanumeric  10                 Semester and year
Name      Alphanumeric  30                 Student name
ID        Integer       9                  Student ID (SSN)
Major     Alphanumeric  4                  Student major
GPA       Decimal       3        0.0  4.0  Student grade point average
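
As a rough illustration of how metadata can be put to work, the sketch below transcribes the roster metadata into a Python dictionary and checks a record against it. The function name validate, the tuple layout, and the Course, Section, and Semester values in the sample record are assumptions made for this example; the length rule is applied only to alphanumeric and integer items, because the table does not say how the length of a decimal is counted.

```python
# Roster metadata transcribed from the table above.
# Each entry: (Python type, length, min, max); None means "not specified".
ROSTER_METADATA = {
    "Course":   (str, 30, None, None),
    "Section":  (int, 1, 1, 9),
    "Semester": (str, 10, None, None),
    "Name":     (str, 30, None, None),
    "ID":       (int, 9, None, None),
    "Major":    (str, 4, None, None),
    "GPA":      (float, 3, 0.0, 4.0),
}


def validate(record):
    """Return a list of violations; an empty list means the record is valid."""
    problems = []
    for item, (kind, length, low, high) in ROSTER_METADATA.items():
        if item not in record:
            problems.append(f"{item}: missing")
            continue
        value = record[item]
        if not isinstance(value, kind):
            problems.append(f"{item}: expected {kind.__name__}")
            continue
        # Assumption: length counts characters for text and digits for integers.
        if kind is str and len(value) > length:
            problems.append(f"{item}: longer than {length}")
        if kind is int and len(str(abs(value))) > length:
            problems.append(f"{item}: longer than {length}")
        if low is not None and not (low <= value <= high):
            problems.append(f"{item}: outside [{low}, {high}]")
    return problems


# Name, ID, Major and GPA come from the example above; the other three
# values are hypothetical, since the slides do not give them.
record = {
    "Course": "MGT 500", "Section": 2, "Semester": "Spring",
    "Name": "Baker, Kenneth D.", "ID": 324917628, "Major": "MGT", "GPA": 2.9,
}
print(validate(record))                  # no violations
print(validate(dict(record, GPA=4.7)))   # GPA is above the allowed maximum
```

The point of the sketch is that the checking code never mentions a particular data item; everything it knows about the roster comes from the metadata.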

Database: A database is an organized collection of logically related data. A database
may be of any size and complexity. For example, a salesperson may maintain a small
database of customer contacts on her laptop computer that consists of a few
megabytes of data. A large corporation may build a very large database consisting of
several terabytes of data (a terabyte is a trillion bytes) on a large mainframe
computer that is used for decision support applications.
Database Management System (DBMS): A database management system (DBMS) is a
collection of interrelated data and a set of programs that access and operate on that data.
Traditional File Processing Systems

[Figure: Traditional File Based Approach. Three application systems (Order Filling System, Invoicing System, Payroll System) each run their own programs (A, B, C) against their own files: Customer Master File (shown twice), Inventory Master File, Back Order File, Inventory Pricing File, and Employee Master File.]


Traditional file processing systems are studied for the following reasons:

• File processing systems are still widely used today, especially for backing
up database systems.
• Understanding the problems and limitations inherent in file processing
systems can help us avoid the same problems when designing database
systems.
Disadvantages of File Processing Systems

Program-data dependence
Duplication of data
Limited Data sharing
Lengthy development times
Excessive program maintenance
Purpose of Database Systems
Keeping organizational information in a file processing system has a number
of major disadvantages, which are given below:
• Data redundancy and inconsistency
• Difficulty in accessing data
• Data isolation
• Integrity problems
• Atomicity problems: A computer system, like any other device, is subject to
failure. In many applications, it is crucial that, if a failure occurs, the data be
restored to the consistent state that existed prior to the failure.
• Security problems: Not every user of the database system should be able to
access all the data. For example, in a university, payroll personnel need to
see only that part of the database that has financial information. They do
not need access to information about academic records. But, since
application programs are added to the file-processing system in an ad hoc
manner, enforcing such security constraints is difficult.
These difficulties, among others, prompted the development of database systems.
Characteristics of Database Systems

1) Self-describing nature of a database system
2) Insulation between programs and data, and data abstraction
3) Support of multiple views of the data
4) Sharing of data and multiuser transaction processing
1) Self-Describing Nature of a Database System
It is the fundamental characteristic of the database approach which states that the
database system contains not only the database itself but also a complete definition
or description of the database structure and constraints.
This definition is stored in the DBMS catalog, which contains information such as
the structure of each file, the type and storage format of each data item, and various
constraints on the data.
The information stored in the catalog is called meta-data, and it describes the
structure of the primary database.
The catalog is used by the DBMS software and also by database users who need
information about the database structure.
A general-purpose DBMS software package is not written for a specific database
application. Therefore, it must refer to the catalog to know the structure of the files
in a specific database, such as the type and format of data it will access.
The DBMS software must work equally well with any number of database
applications (for example, a university database, a banking database, or a company
database) as long as the database definition is stored in the catalog.
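
The self-describing property can be observed in a small DBMS such as SQLite, used here through Python's standard sqlite3 module. This is only a sketch: the student table and the helper name describe_table are assumptions for the example, and SQLite's catalog is just one possible realization of the idea.

```python
import sqlite3


def describe_table(conn, table):
    """Return (column name, declared type) pairs read from the DBMS catalog.

    The function knows nothing about any particular table in advance.
    The table name is interpolated directly, so pass only trusted names.
    """
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "name TEXT NOT NULL, id INTEGER PRIMARY KEY, major TEXT, gpa REAL)"
)

# The stored definition itself is data in the catalog table sqlite_master.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
).fetchone()[0]

columns = describe_table(conn, "student")
print(definition)
print(columns)
```

The same describe_table function would work unchanged for a banking or a company database, because the structure is looked up in the catalog rather than written into the program.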
2) Insulation between Programs and Data, and Data Abstraction
In traditional file processing, the structure of data files is embedded in the
application programs, so any changes to the structure of a file may require changing
all programs that access that file.
By contrast, DBMS access programs do not require such changes in most cases. The
structure of data files is stored in the DBMS catalog separately from the access
programs. We call this property program-data independence.
In some types of database systems, such as object-oriented and object-relational
systems, users can define operations on data as part of the database definitions.
An operation (also called a function or method) is specified in two parts. The
interface (or signature) of an operation includes the operation name and the data
types of its arguments (or parameters).
The implementation (or method) of the operation is specified separately and can
be changed without affecting the interface.
User application programs can operate on the data by invoking these operations
through their names and arguments, regardless of how the operations are
implemented. This may be termed program-operation independence.
The characteristic that allows program-data independence and program-operation
independence is called data abstraction.
A DBMS provides users with a conceptual representation of data that does not
include many of the details of how the data is stored or how the operations are
implemented.
Informally, a data model is a type of data abstraction that is used to provide this
conceptual representation. The data model uses logical concepts, such as objects,
their properties, and their interrelationships, that may be easier for most users to
understand than computer storage concepts.
Hence, the data model hides storage and implementation details that are not of
interest to most database users.
In the database approach, the detailed structure and organization of each file are
stored in the catalog.
Database users and application programs refer to the conceptual representation of
the files, and the DBMS extracts the details of file storage from the catalog when
these are needed by the DBMS file access modules.
Many data models can be used to provide this data abstraction to database users.
3) Support of Multiple Views of the Data
A database typically has many users, each of whom may require a different
perspective or view of the database.
A view may be a subset of the database or it may contain virtual data that is
derived from the database files but is not explicitly stored.
Some users may not need to be aware of whether the data they refer to is stored
or derived.
A multiuser DBMS whose users have a variety of distinct applications must provide
facilities for defining multiple views.
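
A minimal sketch of both kinds of view, using SQLite through Python's sqlite3 module. The view names directory and gpa_summary are assumptions for the example; the rows are the three students from the Data/Information example earlier in this unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, id INTEGER PRIMARY KEY, major TEXT, gpa REAL);
INSERT INTO student VALUES ('Baker, Kenneth D.', 324917628, 'MGT', 2.9);
INSERT INTO student VALUES ('Doyle, Joan E.', 476193248, 'MKT', 3.4);
INSERT INTO student VALUES ('Finkle, Clive R.', 548429344, 'ACCT', 2.8);

-- A view that is a subset of the database: it hides id and gpa.
CREATE VIEW directory AS SELECT name, major FROM student;

-- A view holding virtual data: derived on request, never stored.
CREATE VIEW gpa_summary AS
    SELECT COUNT(*) AS students, ROUND(AVG(gpa), 2) AS avg_gpa FROM student;
""")

directory = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
summary = conn.execute("SELECT * FROM gpa_summary").fetchone()
print(directory)
print(summary)
```

A user of gpa_summary queries it like any table and does not need to know that the average is recomputed from the student rows each time.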
4) Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS, as its name implies, must allow multiple users to access the
database at the same time.
This is essential if data for multiple applications is to be integrated and maintained
in a single database.
The DBMS must include concurrency control software to ensure that several users
trying to update the same data do so in a controlled manner so that the result of the
updates is correct.
For example, when several reservation agents try to assign a seat on an airline
flight, the DBMS should ensure that each seat can be accessed by only one agent at a
time for assignment to a passenger. These types of applications are generally called
online transaction processing (OLTP) applications.
A fundamental role of multiuser DBMS software is to ensure that concurrent
transactions operate correctly and efficiently. The concept of a transaction has
become central to many database applications.
A transaction is an executing program or process that includes one or more
database accesses, such as reading or updating of database records.
Each transaction is supposed to execute a logically correct database access if
executed in its entirety without interference from other transactions.
The DBMS must enforce several transaction properties.
The isolation property ensures that each transaction appears to execute in
isolation from other transactions, even though hundreds of transactions may be
executing concurrently.
The atomicity property ensures that either all the database operations in a
transaction are executed or none are.
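
The atomicity property can be sketched with SQLite through Python's sqlite3 module. The account table, its CHECK constraint, and the function name transfer are assumptions chosen for this example; the second update is made to fail on purpose so that the rollback of the first can be observed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        # "with conn" commits if the block succeeds and rolls back
        # every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


ok = transfer(conn, "A", "B", 30)        # both updates are applied
failed = transfer(conn, "A", "B", 500)   # the debit violates the CHECK
print(ok, failed, balances(conn))
```

In the failed transfer the credit to B had already been executed when the debit was rejected; the rollback undoes it, so either both operations take effect or neither does.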
Advantages of the database approach

1) Program-Data independence
2) Minimal data redundancy
3) Improved data consistency
4) Improved Data sharing
5) Increased productivity of application development
6) Enforcement of standards
7) Improved data quality
8) Improved data accessibility and responsiveness
9) Reduced program maintenance
Costs and risks of the database approach

1) New, specialized personnel
2) Installation and management cost and complexity
3) Conversion costs
4) Need for explicit backup and recovery
5) Organizational conflict
Users of Database System
Actors on the Scene:- In this section we identify the
people whose jobs involve the day-to-day use of a large
database; we call them the actors on the scene.
i) Database Administrators (DBA)
ii) Database Designers
iii) End Users
• Casual end users
• Naive or parametric end users
• Sophisticated end users
• Standalone users
iv) System Analysts and Application Programmers (Software Engineers)
i) Database Administrators (DBA)
In a database environment, the primary resource is the database itself, and the
secondary resource is the DBMS and related software.
Administering these resources is the responsibility of the database administrator
(DBA).
The DBA is responsible for authorizing access to the database, coordinating and
monitoring its use and acquiring software and hardware resources as needed.
The DBA is accountable for problems such as security breaches and poor system
response time.
In large organizations, the DBA is assisted by a staff that carries out these functions.

ii) Database Designers


Database designers are responsible for identifying the data to be stored in the
database and for choosing appropriate structures to represent and store this data.
These tasks are mostly undertaken before the database is actually implemented and
populated with data.
Database designers typically interact with each potential group of users and develop
views of the database that meet the data and processing requirements of these
groups.
Each view is then analyzed and integrated with the views of other user groups.
iii) End Users: End users are the people whose jobs require access to the
database for querying, updating, and generating reports.
• Casual end users - occasionally access the database, but they may need
different information each time.
• Naive or parametric end users - their main job function revolves
around constantly querying and updating the database, using standard
types of queries and updates, called canned transactions, that have been
carefully programmed and tested. Many of these tasks are now available
as mobile apps for use with mobile devices. The tasks that such users
perform are varied. A few examples are:
  • Bank customers and tellers check account balances and post
  withdrawals and deposits.
  • Reservation agents or customers for airlines, hotels, and car rental
  companies check availability for a given request and make
  reservations.
• Sophisticated end users - include engineers, scientists, business analysts,
and others who thoroughly familiarize themselves with the facilities of the
DBMS in order to implement their own applications to meet their complex
requirements.
• Standalone users - maintain personal databases by using ready-made program
packages that provide easy-to-use menu-based or graphics-based interfaces. An
example is the user of a financial software package that stores a variety of personal
financial data.

iv) System Analysts and Application Programmers (Software Engineers):

System analysts determine the requirements of end users, especially
naive and parametric end users, and develop specifications for standard
canned transactions that meet these requirements.
Application programmers implement these specifications as
programs; then they test, debug, document, and maintain these canned
transactions.
Such analysts and programmers, commonly referred to as
software developers or software engineers, should be familiar with the full
range of capabilities provided by the DBMS to accomplish their tasks.
Workers behind the Scene
I. DBMS system designers and implementers: design and implement the DBMS
modules and interfaces as a software package. A DBMS is a very complex software
system that consists of many components, or modules, including modules for
implementing the catalog, query language processing, interface processing,
accessing and buffering data, controlling concurrency, and handling data recovery
and security.
II. Tool developers: design and implement tools, that is, the software
packages that facilitate database modeling and design, database system design,
and improved performance.
III. Operators and maintenance personnel (system administration personnel): are
responsible for the actual running and maintenance of the hardware and software
environment for the database system.
When Not to Use a DBMS
The overhead costs of using a DBMS are due to the following:
• High initial investment in hardware, software, and training
• The generality that a DBMS provides for defining and processing
data
• Overhead for providing security, concurrency control, recovery,
and integrity functions
Therefore, it may be more desirable to develop customized database
applications under the following circumstances:
• Simple, well-defined database applications that are not expected
to change at all
• Stringent, real-time requirements for some application programs
that may not be met because of DBMS overhead
• Embedded systems with limited storage capacity, where a
general-purpose DBMS would not fit
• No multiple-user access to data
Schemas and Instances
Data abstraction - generally refers to the suppression of details of data organization and
storage, and the highlighting of the essential features for an improved understanding of data.
One of the main characteristics of the database approach is to support data abstraction so that
different users can perceive data at their preferred level of detail.
Data model - collection of concepts that can be used to describe the structure of a database
and provides the necessary means to achieve this abstraction. By structure of a database we
mean the data types, relationships, and constraints that apply to the data. Most data models
also include a set of basic operations for specifying retrievals and updates on the database.
Database schema - The description of a database is called the database schema, which is
specified during database design and is not expected to change frequently. Most data models
have certain conventions for displaying schemas as diagrams. A displayed schema is called a
schema diagram
Database state or snapshot - The actual data in a database may change quite frequently. The
data in the database at a particular moment in time is called a database state or snapshot. It is
also called the current set of occurrences or instances in the database.
A schema diagram displays the structure of each record type but not the actual instances of
records. We call each object in the schema, such as STUDENT or COURSE, a schema
construct.
Specifying a correct schema to the DBMS is extremely important, and the schema
must be designed with utmost care. The DBMS stores the descriptions of the schema
constructs and constraints (also called the meta-data) in the DBMS catalog so
that DBMS software can refer to the schema whenever it needs to. The schema is
sometimes called the intension, and a database state is called an extension of the
schema.
Three-Schema Architecture
Internal level - It has an internal schema, which describes the physical
storage structure of the database. The internal schema uses a physical data
model and describes the complete details of data storage and access paths
for the database.
Conceptual level - The conceptual level has a conceptual schema, which
describes the structure of the whole database for a community of users. The
conceptual schema hides the details of physical storage structures and
concentrates on describing entities, data types, relationships, user
operations, and constraints.
External or View level - The external or view level includes a number of
external schemas or user views. Each external schema describes the part of
the database that a particular user group is interested in and hides the rest
of the database from that user group.
Data Independence
The three-schema architecture can be used to further explain the
concept of data independence, which can be defined as the capacity to change the schema at
one level of a database system without having to change the schema at the next higher level.
1) Logical data independence – It is the capacity to change the conceptual schema
without having to change external schemas or application programs. We may change
the conceptual schema to expand the database (by adding a record type or data
item), to change constraints, or to reduce the database (by removing a record type or
data item). In the last case, external schemas that refer only to the remaining data
should not be affected.
2) Physical data independence – It is the capacity to change the internal schema
without having to change the conceptual schema. Hence, the external schemas need
not be changed either. Changes to the internal schema may be needed because
some physical files were reorganized (for example, by creating additional access
structures) to improve the performance of retrieval or update. If the same data as
before remains in the database, we should not have to change the conceptual
schema.
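
Both kinds of data independence can be tried out on a small scale with SQLite through Python's sqlite3 module. This is a sketch under assumptions: the employee table, the payroll view, the index name, and the three rows are hypothetical, and a single-file DBMS only approximates the three-schema architecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, salary INTEGER);   -- conceptual schema
INSERT INTO employee VALUES ('Ann', 25000), ('Bob', 40000), ('Cy', 28000);
CREATE VIEW payroll AS SELECT name, salary FROM employee;   -- external schema
""")

# The "application program": it only ever talks to the external schema.
QUERY = (
    "SELECT name FROM payroll "
    "WHERE salary BETWEEN 20000 AND 30000 ORDER BY name"
)
before = conn.execute(QUERY).fetchall()

# Physical data independence: add an access structure (an index).
# Only the internal level changes; the query text stays the same.
conn.execute("CREATE INDEX idx_employee_salary ON employee (salary)")
after_index = conn.execute(QUERY).fetchall()

# Logical data independence: expand the conceptual schema with a new
# data item. The external schema and the query are untouched.
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")
after_column = conn.execute(QUERY).fetchall()

print(before, after_index, after_column)
```

The query returns the same rows at all three points, which is the practical meaning of not having to change the schema at the next higher level.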
The Database System Environment
DBMS Component Modules:-
The database and the DBMS catalog are usually stored on disk. Access to the disk is
controlled primarily by the operating system (OS), which schedules disk read/write.
Many DBMSs have their own buffer management module to schedule disk
read/write, because management of buffer storage has a considerable effect on
performance.
Reducing disk read/write improves performance considerably.
A higher-level stored data manager module of the DBMS controls access to DBMS
information that is stored on disk, whether it is part of the database or the catalog.
The DBA staff works on defining the database and tuning it by making changes to its
definition using the DDL (data definition language) and other privileged commands.
Casual users and persons with occasional need for information from the database
interact using the interactive query interface.
The queries are parsed and validated for correctness of the query syntax, the names
of files and data elements, and so on by a query compiler that compiles them into an
internal form.
This internal query is subjected to query optimization. Among other things, the query
optimizer is concerned with the rearrangement and possible reordering of operations,
elimination of redundancies, and use of efficient search algorithms during execution.
Database System Utilities:-
Loading - A loading utility is used to load existing data files—such as text files or
sequential files—into the database.
Usually, the current (source) format of the data file and the desired (target)
database file structure are specified to the utility, which then automatically reformats
the data and stores it in the database.
Backup - A backup utility creates a backup copy of the database, usually by dumping the
entire database onto tape or other mass storage medium.
The backup copy can be used to restore the database in case of catastrophic
disk failure.
Database storage reorganization - This utility can be used to reorganize a set of
database files into different file organizations and create new access paths to improve
performance.
Performance monitoring- Such a utility monitors database usage and provides statistics
to the DBA. The DBA uses the statistics in making decisions such as whether or not to
reorganize files or whether to add or drop indexes to improve performance.
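
The loading utility described above can be sketched in a few lines of Python. The semicolon-separated source format, the function name load, and the target table definition are assumptions for this example; real DBMS loading utilities are separate tools with their own options.

```python
import csv
import io
import sqlite3

# Source file in its current format (held in memory here for simplicity).
SOURCE = io.StringIO(
    "name;id;major;gpa\n"
    "Baker, Kenneth D.;324917628;MGT;2.9\n"
    "Doyle, Joan E.;476193248;MKT;3.4\n"
    "Finkle, Clive R.;548429344;ACCT;2.8\n"
)


def load(conn, source):
    """Reformat each source line to the target structure and store it."""
    reader = csv.DictReader(source, delimiter=";")
    rows = [
        (r["name"], int(r["id"]), r["major"], float(r["gpa"]))
        for r in reader
    ]
    with conn:  # load all rows in one transaction
        conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, id INTEGER PRIMARY KEY, major TEXT, gpa REAL)"
)
loaded = load(conn, SOURCE)
print(loaded, "records loaded")
```

The source format (text fields separated by semicolons) and the target structure (typed columns) are both stated explicitly, and the conversion between them happens inside the utility.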
Tools, Application Environments, and Communications Facilities:-

CASE (Computer-Aided Software Engineering) tools are used in the design
phase of database systems.
Another tool that can be quite useful in large organizations is an expanded
data dictionary (or data repository). In addition to catalog information, the
data dictionary stores other information, such as design decisions, usage
standards, application program descriptions, and user information.
Such a system is also called an information repository.
This information can be accessed directly by users or the DBA when needed.
Application development environments, such as PowerBuilder (Sybase) or
JBuilder (Borland), have been quite popular.
These systems provide an environment for developing database applications
and include facilities that help in many facets of database systems, including
database design, GUI development, querying and updating, and application
program development.
The DBMS also needs to interface with communications software, whose
function is to allow users at locations remote from the database system site to
access the database through computer terminals, workstations, or personal
computers.
File System Organization
File organization determines how records are arranged in a file so that they are
available for processing. An efficient file organization should be chosen for each
base relation.
For example, if we want to retrieve employee records in alphabetical
order of name, sorting the file by employee name is a good file organization.
However, if we want to retrieve all employees whose salaries are in a certain
range, a file ordered by employee name would not be a good file organization.

Types of File Organization


1. Sequential access file organization
2. Direct access file organization (Hash File Organization)
3. Indexed sequential access file organization
Sequential access File System Organization

Storing records one after another in contiguous blocks of a file on tape
or disk is called sequential access file organization.

In sequential access file organization, all records are stored in
sequential order. The records are usually arranged in ascending or
descending order of a key field.

A search of a sequential file starts from the beginning of the file,
and new records can be added at the end of the file.

In a sequential file, it is not possible to add a record in the
middle of the file without rewriting the file.
Sequential File Organization can be implemented in two ways:
1. Pile File Method: In this method, records are stored in sequence, one after
another, in the order in which they are inserted into the table.
To update or delete a record, the blocks of the file are searched for it. When the
record is found, it is marked for deletion and, for an update, the new version of the
record is inserted.

Insertion of a new record:
Suppose the file holds the records R1, R3, and so on up to R9 and R8, in that sequence
(a record is simply a row of a table). If we want to insert a new record R2, it is
placed at the end of the file.
2. Sorted File Method: In this method, a new record is first inserted at the end of the
file, and the file is then sorted in ascending or descending order. Sorting is based on
the primary key or on any other key.
When a record is modified, the record is updated and the file is sorted again, so
that the updated record ends up in the right place.

Insertion of a new record:
Suppose there is a preexisting sorted sequence of records R1, R3, and so on up to R6
and R7. If a new record R2 has to be inserted, it is first placed at the end of the
file, and the sequence is then sorted.
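
The two insertion methods can be sketched with plain Python lists standing in for the file. This is only an illustration: records are reduced to their integer keys (1 for R1, and so on), the records elided by "and so on" in the examples are simply left out, and the function names are made up for the sketch.

```python
import bisect


def pile_insert(file, record):
    """Pile file: a new record always goes to the end of the file."""
    file.append(record)


def sorted_insert(file, record):
    """Sorted file: append the record, then re-sort the file on the key."""
    file.append(record)
    file.sort()


def sorted_contains(file, key):
    """Binary search; possible only because the file is kept in key order."""
    pos = bisect.bisect_left(file, key)
    return pos < len(file) and file[pos] == key


pile = [1, 3, 9, 8]         # R1, R3, R9, R8 in arrival order
pile_insert(pile, 2)        # R2 lands at the end

ordered = [1, 3, 6, 7]      # R1, R3, R6, R7 in key order
sorted_insert(ordered, 2)   # R2 is appended, then the file is re-sorted

print(pile)
print(ordered)
```

The sketch follows the append-then-sort description literally; the standard library's bisect.insort would place the record directly at its sorted position, which avoids re-sorting the whole list but still has to shift the records that follow it.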
Pros of sequential file organization
It is fast and efficient when a huge amount of data has to be processed.
Files can easily be stored on cheaper storage media such as magnetic tapes.
It is simple in design, so it does not require much effort to store
the data.
This method is used when most of the records have to be accessed,
as in calculating student grades or generating salary slips.
This method is used for report generation or statistical calculations.

Cons of sequential file organization

It wastes time, because we cannot jump directly to the record that is
required but have to move through the file sequentially.
It has high data redundancy.
Random searching is not possible.
The sorted file method takes more time and space for sorting the
records.
Direct access File System Organization
Direct access file organization is also known as random access, relative file
organization, or hash file organization.
In a direct access file, all records are stored on a direct access storage device
(DASD), such as a hard disk. The records are placed throughout the file in no
particular order.
The records do not need to be in sequence, because they are updated
directly and rewritten back to the same location.
This file organization is useful for immediate access to large amounts of
information. It is used in accessing large databases.
In this approach a hash function is computed for storing a record; its result
gives the address of the block that stores the record.
In principle, any mathematical function can be used as a hash function.
Hash file organization computes the hash function on some
fields of a record (the hash key). The output of the hash function determines the
disk block where the record will be stored.
When a record is requested using the hash key columns, an address is
generated, and the entire record is fetched using that address.
When a new record needs to be inserted, the hash key is used to generate
the address, and the record is then placed there directly.
The same procedure is followed for deleting and updating records.
With this method there is no need to search or sort the whole file. From the
point of view of key order, each record ends up at an effectively random
location in storage.
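
A minimal sketch of hash file organization, with a Python list of lists standing in for the disk blocks. The number of blocks, the modulo hash function, and the names block_address and HashFile are assumptions for the example; the keys are the three student IDs used earlier in this unit. Records whose keys hash to the same block are kept side by side in that block (chaining), which is one common way of handling collisions.

```python
NUM_BLOCKS = 8  # hypothetical number of disk blocks


def block_address(key):
    """Hash function: maps a key to a block number (simple modulo)."""
    return key % NUM_BLOCKS


class HashFile:
    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]

    def insert(self, key, record):
        block = self.blocks[block_address(key)]
        for i, (existing_key, _) in enumerate(block):
            if existing_key == key:       # same key: update in place
                block[i] = (key, record)
                return
        block.append((key, record))       # other key, same block: chain it

    def fetch(self, key):
        # Only one block is examined, however large the file is.
        for existing_key, record in self.blocks[block_address(key)]:
            if existing_key == key:
                return record
        return None

    def delete(self, key):
        address = block_address(key)
        self.blocks[address] = [
            (k, r) for k, r in self.blocks[address] if k != key
        ]


students = HashFile()
students.insert(324917628, "Baker, Kenneth D.")
students.insert(476193248, "Doyle, Joan E.")
students.insert(548429344, "Finkle, Clive R.")

print(block_address(324917628))   # block 4
print(block_address(476193248))   # block 0
print(block_address(548429344))   # block 0 as well: a collision
print(students.fetch(548429344))
```

Two of the three IDs hash to block 0, yet both records remain retrievable because the block holds a chain of records rather than a single slot.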
Advantages of direct access file organization
Direct access files are well suited to online transaction processing
(OLTP) systems such as online railway reservation, online banking, and ticket
booking systems.
In a direct access file, sorting of the records is not required.
It gives better control over record allocation.
Since the block address is given by the hash function, accessing any record
is very fast. Similarly, updating or deleting a record is also very
quick.
This method can handle multiple transactions, because each record is
independent of the others: since the storage location of one record does
not depend on any other record, multiple records can be accessed at the
same time.
Disadvantages of direct access file organization
A direct access file does not provide a backup facility by itself.
It uses storage space less efficiently than a sequential file.
This method may accidentally lose data if collisions are not handled. For example,
in a Student table, when the hash field is the STD_NAME column and there are two
students with the same name, 'Antony', the same address is generated for both. Unless
the collision is handled, the older record will be overwritten by the newer one, so
there will be data loss. Thus the hash columns need to be selected with
utmost care. A correct backup and recovery mechanism also has to be established.
Since the records are stored at effectively random addresses, they are scattered
across storage. Hence storage is not used efficiently.
This method is not suitable for searching a range of data. Because each record is
stored at an effectively random address, a range of key values does not correspond
to a range of addresses, and the search will be inefficient. For example, searching
for the employees with salaries from 20K to 30K will be inefficient.
Searching for records by an exact name or value is efficient, but a partial match
is not: finding the students whose names start with 'B' is inefficient, because the
search does not supply the exact name that the hash function needs.
If a search is made on a column that is not a hash column, the
search will not be efficient. This method is efficient only when the search is
done on the hash column; otherwise, it is not able to find the correct address
of the data.
If the hash function uses multiple columns (say, the name and phone number of a
person) to generate the address, then searching for a record by
phone number or name alone will not give correct results.
If the hash columns are frequently updated, the data block address
changes accordingly: each update generates a new address and the record has to
be moved, which is costly.
The hardware and software required for storage management are costlier
in this case. Complex programs need to be written to make this method
efficient.
Indexed Sequential Access Method (ISAM)
File System Organization

• Indexed sequential access file organization combines sequential file and direct
access file organization.
• In an indexed sequential access file, records are stored in primary key order
on a direct access device such as a magnetic disk.
• The data can be accessed either sequentially or randomly using the index. The
index is stored in a file and read into memory when the file is opened.
• In this method, the primary key of each record is stored together with an
address, which maps to the address of a data block in storage. This address field
works as the index of the file.
• In this method, reading and fetching a record is done using the index of the
file. The index entry contains the address of the data record, which can be
used to read and fetch the record quickly.
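
The idea of an index that supports both direct and sequential access can be sketched as follows. A Python list stands in for the data file, list positions stand in for block addresses, and the keys, names, and function names are hypothetical; a real ISAM file also needs overflow areas and periodic reorganization, which are not modeled here.

```python
import bisect

# Data file: records stored in primary key order.
# The list position plays the role of the block address.
data_file = [(101, "Ann"), (205, "Bob"), (310, "Cy"), (412, "Dee")]

# Index: sorted primary keys with the matching addresses,
# read into memory when the file is opened.
index_keys = [key for key, _ in data_file]
index_addresses = list(range(len(data_file)))


def fetch(key):
    """Direct access: binary search in the index, then one read by address."""
    pos = bisect.bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return data_file[index_addresses[pos]]
    return None


def scan(low, high):
    """Sequential access: locate the first key in range, then read onward."""
    start = bisect.bisect_left(index_keys, low)
    end = bisect.bisect_right(index_keys, high)
    return [data_file[index_addresses[i]] for i in range(start, end)]


print(fetch(310))
print(scan(200, 400))
```

Unlike the hash file, this organization answers range requests efficiently, because neighboring keys in the index lead to neighboring records in the file.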
Advantages of Indexed sequential access file organization
In an indexed sequential access file, both sequential and random
access are possible.
It accesses records very quickly if the index table is properly
organized.
Records can be inserted in the middle of the file.
It provides quick access for both sequential and direct processing.
It reduces the amount of sequential searching.

Disadvantages of Indexed sequential access file organization

An indexed sequential access file requires unique keys and periodic
reorganization.
It takes extra time to search the index before the data can be
accessed or retrieved.
It requires more storage space, because the index has to be stored as well.
It is expensive because it requires special software.
It is less efficient in the use of storage space than other file
organizations.

**********
