Module 1
Introduction to Database Concepts
Contents
Introduction To DBMS
Data v/s Information
Characteristics of databases
Main Characteristics of the Database Approach
File system v/s Database system
Instances and Schemas
Data Independence
DBMS system architecture
Users of Database system
Database Administrator
Let's Check Your Knowledge
What is a DBMS?
What is data?
What is information?
Why do we need a DBMS at all?
What are the applications of a DBMS?
Basic Definitions
Data:
Known facts that can be recorded and have an implicit meaning.
Generally, data comprises facts, observations, perceptions, numbers,
characters, symbols, images, etc.
Example: 25, Suresh, Mumbai
Information:
It is processed data which includes data that possess context,
relevance, and purpose. It also involves manipulation of raw data.
Information is a group of data that collectively carry a logical
meaning.
Example: Suresh is 25 years old and lives in Mumbai.
Database:
A collection of inter-related data.
Database Management System (DBMS):
A software package/ system to facilitate the creation and
maintenance of a computerized database.
Database System:
The DBMS software together with the data itself. Sometimes, the
applications are also included.
Data vs. Information
Data: Unorganized raw facts that need processing, without which they are seemingly random and useless to humans.
Information: Processed, organized data presented in a given context that is useful to humans.
Data: An individual unit of raw material that does not carry any specific meaning.
Information: A group of data that collectively carries a logical meaning.
Data: Does not depend on information.
Information: Depends on data.
Data: Measured in bits and bytes.
Information: Measured in meaningful units such as time and quantity.
Data: For example, a student's test score.
Information: For example, the average score of a class, derived from the given data.
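The test-score example can be sketched in a few lines of Python. The score values are hypothetical and chosen only for illustration: the list is the raw data, and the computed average is the information derived from it.

```python
# Raw data: individual test scores with no context attached.
# (Hypothetical values, not taken from the slides.)
scores = [72, 85, 90, 64, 79]

# Information: the same data processed into something meaningful.
average_score = sum(scores) / len(scores)

print(f"Class average: {average_score:.1f}")
```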
Database Management System
A DBMS contains information about a particular enterprise:
Collection of interrelated data
Set of programs to access the data
An environment that is both convenient and efficient to use
Database Applications:
Banking: transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Databases can be very large.
Typical DBMS Functionality
Levels of Abstraction (View of Data)
Physical level: The lowest level of abstraction describes
how the data are actually stored. The physical level
describes complex low-level data structures in detail.
Logical level: describes what data are stored in the
database, and what relationships exist among those data.
The logical level thus describes the entire database in terms
of a small number of relatively simple structures. For example, an
instructor record can be described by a type definition:
type instructor = record
    ID : string;
    name : string;
    dept_name : string;
    salary : integer;
end;
View level: The highest level of abstraction describes only
part of the entire database. Multiple views of the same data can exist.
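As a rough illustration of the logical and view levels, the sketch below uses Python's built-in sqlite3 module. The table mirrors the instructor record above. The view name `instructor_public` and the sample row are assumptions made for this example. The physical level (how the bytes are laid out) is left entirely to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data are stored and how they are structured.
conn.execute("""
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT,
        salary    INTEGER
    )
""")
# Hypothetical sample row.
conn.execute("INSERT INTO instructor VALUES ('10101', 'Asha', 'Physics', 65000)")

# View level: expose only part of the database (salary is hidden).
conn.execute("""
    CREATE VIEW instructor_public AS
    SELECT ID, name, dept_name FROM instructor
""")

view_rows = conn.execute("SELECT * FROM instructor_public").fetchall()
print(view_rows)
```

A user who is given only the view never sees the salary column, even though it exists at the logical level.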
Instances and Schemas
Schema:
The description of a database.
Includes descriptions of the database structure,
data types, and the constraints on the database.
Analogous to the type of a variable (int a).
Instance:
The actual data stored in a database at a
particular moment in time. This includes the
collection of all the data in the database.
Analogous to the value of a variable (a = 5).
Also called a database instance (or occurrence
or snapshot).
SCHEMA
STUDENT (RN, NAME, DOB, PH_NO)
TEACHER (TID, NAME, DEPT, SAL)
INSTANCE AT T=1
RN  NAME  DOB         PH_NO
1   A     24/09/7878  222332321
2   B     24/09/7878  332
3   C     24/09/7878  332
Example of a Database Schema
Instances
Physical schema: describes the database design
at the physical level.
Logical schema: describes the
database design at the logical level.
Subschema: A database may also
have several schemas at the view
level, sometimes called subschemas,
that describe different views of the
database.
The Three-Schema Architecture
Data Independence
Logical Data Independence:
The ability to change the conceptual/logical schema
without having to change the external schemas and their
associated application programs.
Physical Data Independence:
The ability to modify the physical/internal schema
without changing the logical schema.
For example, the internal schema may be changed when
certain file structures are reorganized or new indexes are
created to improve database performance.
When a schema at a lower level is changed, only the
mappings between this schema and higher-level schemas
need to be changed in a DBMS that fully supports data
independence.
In general, the interfaces between the various levels and
components should be well defined so that changes in
some parts do not seriously influence others.
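Physical data independence can be observed in a small experiment with Python's sqlite3 module. This is a minimal sketch: the table, its rows, and the index name `idx_student_rn` are assumptions for illustration. Adding an index changes how the data are stored and accessed, yet the same query against the same logical schema returns the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rn INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C")],
)

query = "SELECT name FROM student WHERE rn = 2"
before = conn.execute(query).fetchall()

# Physical-level change: add an index. The logical schema is untouched,
# so the application query does not need to be rewritten.
conn.execute("CREATE INDEX idx_student_rn ON student (rn)")
after = conn.execute(query).fetchall()

print(before == after)
```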
Database Schema vs. Database Instance
Database State/instance:
Refers to the content of a database at a moment
in time.
Initial Database State:
Refers to the database state when it is initially
loaded into the system.
Valid State:
A state that satisfies the structure and constraints
of the database.
Distinction
The database schema changes very infrequently.
The database state changes every time the
database is updated.
Schema is also called intension.
State is also called extension.
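The distinction can be made concrete with a short sqlite3 sketch. The helper names `schema()` and `state()` are hypothetical and exist only in this example. Inserting a row changes the state, while the schema stays exactly as it was.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rn INTEGER, name TEXT)")

def schema():
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

def state():
    # The current content of the table, i.e. the database state.
    return conn.execute("SELECT rn, name FROM student ORDER BY rn").fetchall()

schema_before, state_before = schema(), state()
conn.execute("INSERT INTO student VALUES (1, 'A')")
schema_after, state_after = schema(), state()

print(schema_before == schema_after)  # schema (intension) unchanged
print(state_before, state_after)      # state (extension) changed
```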
Example of a database state
Database System Architecture
Database Users and Administrators
Database Users
🞆 There are four different types of database-
system users, differentiated by the way they
expect to interact with the system.
Naïve Users
Application Programmers
Sophisticated Users
Specialized Users
🞆 Different types of user interfaces have been
designed for the different types of users.
Naive users
🞆 Naive users are unsophisticated users who
interact with the system by invoking one of the
application programs that have been written
previously.
🞆 For example, a clerk in the university who
needs to add a new instructor to a department
invokes an application program written for that task.
🞆 The typical user interface for naive users is a
forms interface, where the user can fill in
appropriate fields of the form.
🞆 Naive users may also simply read reports
generated from the database
🞆 Such a user connects to a Web application
program that runs at a Web server.
Application programmers
🞆 Application programmers are computer
professionals who write application
programs.
🞆 Application programmers can choose from
many tools to develop user interfaces.
🞆 Rapid application development (RAD) tools
are tools that enable an application
programmer to construct forms and reports
with minimal programming effort.
Sophisticated users
🞆 Sophisticated users interact with the
system without writing programs.
🞆 Instead, they form their requests either
using a database query language or by
using tools such as data analysis
software.
🞆 Analysts who submit queries to explore
data in the database fall in this category.
Specialized users
🞆 Specialized users are sophisticated users
who write specialized database applications
that do not fit into the traditional data-
processing framework.
🞆 Among these applications are computer-
aided design systems, knowledgebase and
expert systems, systems that store data with
complex data types (for example, graphics
data and audio data), and environment-
modeling systems.
Database Administrator
🞆 A person who has central
control over the system is called a
database administrator (DBA).
🞆 Functions of a DBA include:
Schema definition.
Storage structure and access-method
definition.
Schema and physical-organization
modification.
Granting of authorization for data access.
Routine maintenance.
Database Administrator (Cont.)
🞆 Routine maintenance includes:
Periodically backing up the database, either
onto tapes or onto remote servers, to prevent loss
of data in case of disasters such as flooding.
Ensuring that enough free disk space is
available for normal operations, and upgrading
disk space as required.
Monitoring jobs running on the database and
ensuring that performance is not degraded by
very expensive tasks submitted by some users.
Database Languages
DDL: Data Definition Language
DML: Data Manipulation Language
DCL: Data Control Language
TCL: Transaction Control Language
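The sketch below shows one statement from most of these categories, run through Python's sqlite3 module. The `account` table and its values are assumptions for illustration. SQLite has no GRANT or REVOKE statements, so DCL is mentioned only in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")

# DML: insert, update and query the data.
conn.execute("INSERT INTO account VALUES ('A', 1000)")
conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'A'")

# TCL: make the changes so far permanent.
conn.commit()

# TCL: an uncommitted change can be undone with a rollback.
conn.execute("UPDATE account SET balance = 0 WHERE name = 'A'")
conn.rollback()

# DCL (GRANT / REVOKE) is not supported by SQLite; in a server DBMS
# it would control which users may read or modify the account table.

balance = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'"
).fetchone()[0]
print(balance)
```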
Database Engine
Query processing
Storage manager
Transaction manager
Query Processing
The query processor components include:
DDL interpreter: interprets DDL statements and records the
definitions in the data dictionary.
DML compiler: translates DML statements in a query
language into an evaluation plan consisting of low-level
instructions that the query-evaluation engine understands.
A query can usually be translated into any of a number of alternative
evaluation plans that all give the same result. The DML compiler
also performs query optimization; that is, it picks the lowest-cost
evaluation plan from among the alternatives.
Query evaluation engine: executes the low-level instructions
generated by the DML compiler.
Query Processing
1.Parsing and translation
2.Optimization
3.Evaluation
Query Processing (Cont.)
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Cost difference between a good and a bad way
of evaluating a query can be enormous
Need to estimate the cost of operations
Depends critically on statistical information
about relations which the database must
maintain
Need to estimate statistics for intermediate
results to compute cost of complex
expressions
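To see why the choice of plan matters, here is a toy sketch in plain Python. It is not how a real optimizer works: a real DBMS estimates costs from stored statistics instead of running every plan. The relations, the two plan functions, and the comparison counter are all assumptions for illustration. Both plans answer "which building does each Physics instructor work in?" but do different amounts of work.

```python
# Hypothetical relations.
instructors = [("Asha", "Physics"), ("Ravi", "Music"),
               ("Meena", "Physics"), ("John", "History")]
departments = [("Physics", "Watson"), ("Music", "Packard"),
               ("History", "Painter")]

def join_then_filter():
    comparisons = 0
    joined = []
    for name, dept in instructors:            # nested-loop join first
        for d_name, building in departments:
            comparisons += 1
            if dept == d_name:
                joined.append((name, dept, building))
    result = []
    for name, dept, building in joined:       # then apply the selection
        comparisons += 1
        if dept == "Physics":
            result.append((name, building))
    return result, comparisons

def filter_then_join():
    comparisons = 0
    physics = []
    for name, dept in instructors:            # apply the selection first
        comparisons += 1
        if dept == "Physics":
            physics.append((name, dept))
    result = []
    for name, dept in physics:                # then join the smaller input
        for d_name, building in departments:
            comparisons += 1
            if dept == d_name:
                result.append((name, building))
    return result, comparisons

plans = {"join_then_filter": join_then_filter(),
         "filter_then_join": filter_then_join()}
best_plan = min(plans, key=lambda p: plans[p][1])
print(best_plan, {p: cost for p, (_, cost) in plans.items()})
```

Both plans are equivalent expressions of the same query. On larger relations the gap between them grows quickly, which is why cost estimation matters.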
Storage Management
Storage manager is a program module that provides
the interface between the low-level data stored in the
database and the application programs and queries
submitted to the system.
The storage manager is responsible for the following
tasks:
Interaction with the OS file manager
Efficient storing, retrieving and updating of data
The storage manager implements several data structures as part
of the physical system implementation
Data files
Data dictionary
Indexing and hashing
Storage Management (Cont.)
The storage manager components include:
Authorization and integrity manager: which tests for the satisfaction
of integrity constraints and checks the authority of users to access
data.
Transaction manager: which ensures that the database remains in a
consistent (correct) state despite system failures, and that concurrent
transaction executions proceed without conflicts.
File manager: which manages the allocation of space on disk storage
and the data structures used to represent information stored on disk.
Buffer manager: which is responsible for fetching data from disk
storage into main memory, and deciding what data to cache in main
memory. The buffer manager is a critical part of the database system,
since it enables the database to handle data sizes that are much
larger than the size of main memory.
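The idea behind a buffer manager can be sketched with a small cache in Python. The slides do not name a replacement policy; least-recently-used (LRU) eviction is assumed here purely for illustration. The class name, the simulated disk, and the access pattern are hypothetical. A real buffer manager would also write modified blocks back to disk, which this sketch omits.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool with LRU eviction (an assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # simulated disk: block id -> contents
        self.capacity = capacity    # number of blocks that fit in memory
        self.pool = OrderedDict()   # cached blocks, least recently used first
        self.disk_reads = 0

    def get_block(self, block_id):
        if block_id in self.pool:
            # Hit: mark the block as most recently used.
            self.pool.move_to_end(block_id)
            return self.pool[block_id]
        if len(self.pool) >= self.capacity:
            # Pool is full: evict the least recently used block.
            self.pool.popitem(last=False)
        # Miss: fetch the block from disk into main memory.
        self.disk_reads += 1
        self.pool[block_id] = self.disk[block_id]
        return self.pool[block_id]

disk = {i: f"block-{i}" for i in range(5)}   # 5 blocks on "disk"
bm = BufferManager(disk, capacity=2)         # only 2 fit in memory
for block_id in [0, 1, 0, 2, 1]:
    bm.get_block(block_id)

print(bm.disk_reads, list(bm.pool))
```

Five requests are served with a two-block pool, and the repeated request for block 0 is answered from memory without touching the disk.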
Transaction Management
What if the system fails?
What if more than one user is concurrently
updating the same data?
A transaction is a collection of operations that
performs a single logical function in a database
application
Transaction-management component ensures
that the database remains in a consistent
(correct) state despite system failures (e.g.,
power failures and operating system crashes) and
transaction failures.
Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
EXAMPLE OF TRANSACTION
Transfer of Rs. 100 from account A to account B.
READ(A)
A=A-100;
WRITE(A)
READ(B)
B=B+100;
WRITE(B)
     INITIALLY   AFTER TRANSACTION
A    1000        900
B    2000        2100
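The same transfer can be sketched with Python's sqlite3 module to show the all-or-nothing behaviour. The table layout, the `transfer` helper, and the failing transfer to a non-existent account Z are assumptions added for illustration. If any step fails, the rollback restores the state from before the transaction began.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction (hypothetical helper)."""
    try:
        debit = conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        credit = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        if debit.rowcount != 1 or credit.rowcount != 1:
            raise ValueError("unknown account")
        conn.commit()      # both updates become permanent together
        return True
    except Exception:
        conn.rollback()    # undo the partial work, including the debit
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer("A", "B", 100)
after_success = balances()

failed = transfer("A", "Z", 100)   # account Z does not exist
after_failure = balances()

print(ok, after_success)
print(failed, after_failure)
```

In the failing case the debit from A has already been executed when the error is detected, yet A's balance is unchanged afterwards because the whole transaction is rolled back.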
Database architecture
🞆 Database applications typically use one of two architectures:
Two-tier architecture
Three-tier architecture
Two-tier architecture
🞆 In a two-tier architecture, the application resides
at the client machine, where it invokes database
system functionality at the server machine
through query language statements.
🞆 Application program interface standards like
ODBC and JDBC are used for interaction between
the client and the server
Three-tier architecture
🞆 In a three-tier architecture, the client machine acts
as merely a front end and does not contain any direct
database calls.
🞆 Instead, the client end communicates with an
application server, usually through a forms interface.
🞆 The application server in turn communicates with a
database system to access data.
🞆 The business logic of the application, which says
what actions to carry out under what conditions, is
embedded in the application server, instead of being
distributed across multiple clients.
🞆 Three-tier applications are more appropriate for large
applications, and for applications that run on the
World Wide Web.
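The division of responsibilities in a three-tier design can be sketched in a single Python file. This is a simplification: in practice the three tiers run on separate machines and communicate over a network. The function `add_instructor` and its validation rule are hypothetical. The point is that the client code contains no SQL, and the business logic sits in the application tier.

```python
import sqlite3

# Data tier: the database system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT)")

# Application tier: business logic decides what actions to carry out
# under what conditions, then talks to the database.
def add_instructor(name, dept_name):
    if not name or not dept_name:
        return "rejected: name and department are required"
    db.execute("INSERT INTO instructor VALUES (?, ?)", (name, dept_name))
    db.commit()
    return "ok"

# Client tier: a front end that only calls the application tier
# and never issues database calls directly.
responses = [
    add_instructor("Asha", "Physics"),
    add_instructor("", "Music"),
]
stored = db.execute("SELECT name, dept_name FROM instructor").fetchall()

print(responses)
print(stored)
```

In a two-tier design the client itself would hold the connection and issue the INSERT statement, for example through ODBC or JDBC.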