Module-1 (2)
Systems
Course Faculty Details
Dr. Taranath N L
Associate Professor,
Dept. of CSE,
Alliance University Bengaluru
Course Syllabus
MODULE 1: INTRODUCTION TO Database and its Architectures
Books
Text Books :
1. Ramez Elmasri, Shamkant B. Navathe, "Fundamentals of Database Systems",
6th Edition, Pearson Publishers, 2013. ISBN-10: 0-136-08620-9,
ISBN-13: 978-0-136-08620-8.
Reference Books :
1. Ramakrishnan, "Database Management System", 3rd Edition,
McGraw-Hill, 2014. ISBN-10: 0072465638, ISBN-13: 978-0072465631.
Data Processing System
A data processing system may involve some
combination of:
Conversion – converting data to another format.
Validation – ensuring that supplied data is "clean,
correct and useful."
Sorting – "arranging items in some sequence and/or in
different sets."
Summarization – reducing detail data to its main
points.
Aggregation – combining multiple pieces of data.
Analysis – the "collection, organization, analysis,
interpretation and presentation of data."
Reporting – listing detail or summary data or computed
information.
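As a rough illustration of how several of these steps can chain together, here is a minimal Python sketch. The sample records, the `process` function and the choice of "average marks per course" as the summary are all assumptions made for this example, not part of the course material.

```python
from collections import defaultdict

# Hypothetical raw records: (student, course, marks as text)
raw = [("Asha", "DBMS", "82"), ("Ravi", "DBMS", "x7"),
       ("Meena", "OS", "74"), ("Asha", "OS", "91")]


def process(records):
    # Validation: keep only rows whose marks field is a whole number.
    valid = [r for r in records if r[2].isdigit()]
    # Conversion: turn the marks text into integers.
    converted = [(name, course, int(marks)) for name, course, marks in valid]
    # Sorting: arrange by course, then by descending marks.
    ordered = sorted(converted, key=lambda r: (r[1], -r[2]))
    # Aggregation and summarization: average marks per course.
    grouped = defaultdict(list)
    for _, course, marks in ordered:
        grouped[course].append(marks)
    summary = {course: sum(m) / len(m) for course, m in grouped.items()}
    return ordered, summary


ordered, summary = process(raw)

# Reporting: list the summary data.
for course, average in summary.items():
    print(f"{course}: average {average:.1f}")
```

The invalid record ("x7") is dropped at the validation step, so it never reaches the later stages.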
What is Information?
A collection of data which conveys some meaningful
idea is information. It may provide answers to questions
like who, which, when, why, what, and how.
OR
The raw input is data and it has no significance when it
exists in that form. When data is collated or organized
into something meaningful, it gains significance. This
meaningful organization is information.
OR
Observations and recordings are done to obtain data,
while analysis is done to obtain information
Traditional File Based Systems
Predecessor to the DBMS.
A file system is a hierarchical description of the
folders on a drive and information about the files
inside them.
It handles the movement, creation and deletion of
those folders and files.
A collection of application programs that perform
services for the end-users such as the production
of reports.
Each program defines and manages its own data.
File based systems were developed as better
alternatives to paper based filing systems.
File System v/s DBMS
FMS
Advantages:
• Simpler to use
• Less expensive
Disadvantages:
• Typically no multi-user access
• Limited to smaller databases
• Limited functionality
• Decentralization of data
• Redundancy and integrity issues
DBMS
Advantages:
• Greater flexibility
• Greater processing power
• Ensures data integrity
• Supports simultaneous access
Disadvantages:
• Difficult to learn
• Packaged separately from the OS
• Requires skilled administrators
• Expensive
What is a database?
A shared collection of logically related data, and a
description of this data, designed to meet the
information needs of an organization.
It is the collection of schemas, tables, queries,
reports, views and other objects.
In other words, a database is a collection of
information that is organized so that it can easily be
accessed, managed, and updated.
The data is typically organized to model aspects of
reality in a way that supports processes requiring
information, such as modelling the availability of
rooms in hotels in a way that supports finding a hotel
with vacancies.
Often abbreviated DB.
What is a DBMS?
A software system that enables users to define,
create, maintain, and control access to the
database.
DBMS contains information about a particular
enterprise
Collection of interrelated data
Set of programs to access the data
An environment that is both convenient and
efficient to use
Database Applications
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized
recommendations
Manufacturing: production, inventory, orders,
supply chain
Human resources: employee records, salaries,
tax deductions
Databases touch all aspects of our lives
Purpose / Benefits of Database Systems
In the early days, database applications were
built directly on top of file systems
Drawbacks of using file systems to store data:
Data redundancy and inconsistency
Multiple file formats, duplication of information in
different files
Difficulty in accessing data
Need to write a new program to carry out each new
task
Data isolation — multiple files and formats
Integrity problems
Integrity constraints (e.g. account balance > 0)
become “buried” in program code rather than being
stated explicitly
Purpose / Benefits of Database Systems
Drawbacks of using file systems (cont.)
Atomicity of updates
Failures may leave database in an inconsistent state
with partial updates carried out
Example: Transfer of funds from one account to
another should either complete or not happen at all
Concurrent access by multiple users
Concurrent access is needed for performance
Uncontrolled concurrent accesses can lead to
inconsistencies
Example: Two people reading a balance and
updating it at the same time
Security problems
Hard to provide user access to some, but not all,
data
Database systems offer solutions to all the above problems
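The integrity and atomicity points above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `account` table, the starting balances and the `transfer` helper are invented for the example. The constraint is stated explicitly in the schema rather than buried in program code, and a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint (balance > 0) is declared in the schema itself.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance > 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds so that both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


ok = transfer(conn, "A", "B", 50)       # succeeds
failed = transfer(conn, "A", "B", 500)  # would overdraw A, so it is rolled back
print(ok, failed, balances(conn))
```

In the second call the credit to B is executed first and then undone by the rollback, which is the "complete or not happen at all" behaviour described on the slide.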
Levels of Abstraction
Physical level: describes how a record
(e.g., instructor) is stored.
Logical level: describes data stored in
database, and the relationships among the
data.
type instructor = record
ID : string;
name : string;
dept_name : string;
salary : integer;
end;
View level: application programs hide
details of data types. Views can also hide
information (such as an employee’s salary).
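A view that hides the salary attribute might look like the following sketch, run here through Python's `sqlite3` module. The view name `instructor_public` and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
# Hypothetical sample row.
conn.execute(
    "INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)"
)

# View level: expose every attribute except salary.
conn.execute(
    "CREATE VIEW instructor_public AS "
    "SELECT ID, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

A user who is only given access to the view sees the instructor records but has no way to read the salary column through it.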
View of Data
An architecture for a database system
Instances and Schemas
Similar to types and variables in programming languages
Logical Schema – the overall logical structure of the
database
Example: The database consists of information about a
set of customers and accounts in a bank and the
relationship between them
Analogous to type information of a variable in a
program
Physical schema – the overall physical structure of the
database
Instance – the actual content of the database at a
particular point in time
Analogous to the value of a variable
Physical Data Independence – the ability to modify the
physical schema without changing the logical schema
Applications depend on the logical schema
In general, the interfaces between the various levels
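Physical data independence can be illustrated with a small sketch using Python's `sqlite3` module: an index is added (a change to the physical schema) and the same query, written against the logical schema, returns the same result. The table contents and the index name `idx_instructor_dept` are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("22222", "Einstein", "Physics", 95000),
        ("33456", "Gold", "Physics", 87000),
    ],
)

# The application's query depends only on the logical schema.
query = "SELECT name FROM instructor WHERE dept_name = 'Physics' ORDER BY name"

before = conn.execute(query).fetchall()
# Physical-level change: add an index on dept_name.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
after = conn.execute(query).fetchall()

print(before == after)
```

The query text is untouched; only the way the system may locate the rows has changed.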
Data Models
A collection of tools for describing
Data
Data relationships
Data semantics
Data constraints
Relational model
Entity-Relationship data model (mainly for
database design)
Object-based data models (Object-oriented and
Object-relational)
Semistructured data model (XML)
Other older models:
Network model
Hierarchical model
Relational Model
All the data is stored in various tables.
[Figure: example of tabular data in the relational model, with its columns and rows labelled]
A Sample Database
Database Languages
A database system provides a data-definition
language to specify the database schema and a
data-manipulation language to express database
queries and updates.
In practice, the data-definition and data-manipulation
languages are not two separate languages; instead
they simply form parts of a
single database language, such as the widely used
SQL language.
Data Definition Language (DDL)
Specification notation for defining the database
schema
Example:
    create table instructor (
        ID         char(5),
        name       varchar(20),
        dept_name  varchar(20),
        salary     numeric(8,2));
DDL compiler generates a set of table templates
stored in a data dictionary
Data dictionary contains metadata (i.e., data
about data)
Database schema
Integrity constraints
Primary key (ID uniquely identifies
instructors)
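The following sketch runs the `instructor` definition through Python's `sqlite3` module, with the primary key added, and then reads back the metadata the system recorded. It assumes SQLite as the engine: `sqlite_master` and `PRAGMA table_info` are SQLite's own catalog interfaces, and other systems expose their data dictionary differently. The sample row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema, including the primary-key integrity constraint.
conn.execute(
    """create table instructor (
           ID         char(5),
           name       varchar(20),
           dept_name  varchar(20),
           salary     numeric(8,2),
           primary key (ID))"""
)

# Metadata ("data about data") kept by the system.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(instructor)").fetchall()
column_names = [col[1] for col in columns]
key_columns = [col[1] for col in columns if col[5]]

# The primary key rejects a second instructor with the same ID.
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")
try:
    conn.execute("INSERT INTO instructor VALUES ('10101', 'Wu', 'Finance', 90000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(tables, column_names, key_columns, duplicate_rejected)
```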
Data Definition Language (DDL)
Domain Constraints. A domain of possible values must be
associated with every attribute (for example, integer types,
character types, date/time types). Declaring an attribute to
be of a particular domain acts as a constraint on the values
that it can take. Domain constraints are the most elementary
form of integrity constraint. They are tested easily by the
system whenever a new data item is entered into the
database.
Referential Integrity. There are cases where we wish to
ensure that a value that appears in one relation for a given
set of attributes also appears in a certain set of attributes in
another relation (referential integrity).
For example, the department listed for each course
must be one that actually exists. More precisely, the dept_name
value in a course record must appear in the dept_name
attribute of some record of the department relation. Database
modifications can cause violations of referential integrity.
When a referential-integrity constraint is violated, the normal
procedure is to reject the action that caused the violation.
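The course/department example can be tried out with Python's `sqlite3` module. This is a sketch under assumptions: the column lists are trimmed to the minimum, the sample rows are invented, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement

conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE course ("
    "course_id TEXT PRIMARY KEY, "
    "title TEXT, "
    "dept_name TEXT REFERENCES department (dept_name))"
)

conn.execute("INSERT INTO department VALUES ('Comp. Sci.')")
# Accepted: 'Comp. Sci.' appears in the department relation.
conn.execute("INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.')")

try:
    # Rejected: no department named 'History' exists.
    conn.execute("INSERT INTO course VALUES ('HIS-351', 'World History', 'History')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

course_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
print(rejected, course_count)
```

The violating insert is refused and the course relation keeps only the valid record.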
Data Definition Language (DDL)
Assertions. An assertion is any condition that the database
must always satisfy. Domain constraints and referential-
integrity constraints are special forms of assertions.
However, there are many constraints that we cannot express
by using only these special forms.
For example, “Every department must have at least five
courses offered every semester” must be expressed as an
assertion. When an assertion is created, the system tests it for
validity. If the assertion is valid, then any future modification to
the database is allowed only if it does not cause that assertion
to be violated.
Authorization. We may want to differentiate among the
users as far as the type of access they are permitted on
various data values in the database. These differentiations
are expressed in terms of authorization, the most common
being: read authorization, which allows reading, but not
modification, of data; insert authorization, which allows
insertion of new data, but not modification of existing data;
update authorization, which allows modification, but not deletion, of data.
Data Manipulation Language (DML)
Data-Manipulation Language
A data-manipulation language (DML) is a
language that enables users to access or manipulate
data as organized by the appropriate data model.
The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
There are basically two types:
• Procedural DMLs require a user to specify what
data are needed and how to get those data.
• Declarative DMLs (also referred to as
nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
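The four types of access, and the contrast between the two DML styles, can be sketched as follows with Python's `sqlite3` module. The sample rows and the salary threshold are assumptions. The "procedural" half is only an analogy written in Python: it spells out how to find the rows, whereas the SQL query states only what is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, "
    "dept_name TEXT, salary INTEGER)"
)

# Insertion of new information (hypothetical rows).
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("22222", "Einstein", "Physics", 95000),
        ("33456", "Gold", "Physics", 87000),
        ("12121", "Wu", "Finance", 90000),
    ],
)
# Modification of stored information.
conn.execute("UPDATE instructor SET salary = salary + 5000 WHERE dept_name = 'Physics'")
# Deletion of information.
conn.execute("DELETE FROM instructor WHERE ID = '33456'")

# Retrieval, declarative style: say what is needed, not how to get it.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM instructor WHERE salary > 70000 ORDER BY name")]

# Retrieval, procedural style (analogy): walk every row and filter by hand.
procedural = []
for _id, name, _dept, salary in conn.execute("SELECT * FROM instructor"):
    if salary > 70000:
        procedural.append(name)
procedural.sort()

print(declarative, procedural)
```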
Data Manipulation Language (DML)
Declarative DMLs are usually easier to learn
and use than are procedural DMLs. However, since
a user does not have to specify how to get the
data, the database system has to figure out
an efficient means of accessing data.
A query is a statement requesting the
retrieval of information. The portion of a DML that
involves information retrieval is called a query
language. Although technically incorrect, it is
common practice to use the terms query language
and data-manipulation language synonymously.
Data Dictionary
Data elements that are defined in all tables of all
databases. Specifically, the data dictionary stores
the names, data types, display formats, internal
storage formats, and validation rules. The data
dictionary tells where an element is used, by
whom it is used, and so on.
Tables defined in all databases. For example, the
data dictionary is likely to store the name of the
table creator, the date of creation, access
authorizations, the number of columns, and so on.
Indexes defined for each database table. For each
index, the DBMS stores at least the index name, the
attributes used, the location, specific index
characteristics, and the creation date.
Defined databases: who created each database, the
date of creation, where the database is located,
who the DBA is, and so on.
Data Dictionary Continued…..
End users and administrators of the
database.
Programs that access the database,
including screen formats, report formats,
application formats, SQL queries, and so on.
Access authorizations for all users of all
databases.
Relationships among data elements: which
elements are involved, whether the
relationships are mandatory or optional, the
connectivity and cardinality, and so on.
SQL
The most widely used commercial
language
SQL is NOT a Turing machine equivalent
language
To be able to compute complex functions
SQL is usually embedded in some higher-
level language
Application programs generally access
databases through one of
Language extensions to allow embedded
SQL
Application program interface (e.g.,
ODBC/JDBC) which allows SQL queries to be sent to a database
Database Design
Logical Design – Deciding on the database
schema. Database design requires that we
find a “good” collection of relation
schemas.
Business decision – What attributes should
we record in the database?
Computer Science decision – What relation
schemas should we have and how should
the attributes be distributed among the
various relation schemas?
Physical Design – Deciding on the physical
layout of the database
Design Approaches
Need to come up with a methodology to
ensure that each of the relations in the
database is “good”
Two ways of doing so:
Entity Relationship Model (Module-2)
Models an enterprise as a collection of entities
and relationships
Represented diagrammatically by an entity-
relationship diagram:
Normalization Theory (Module-4)
Formalize what designs are bad, and test for
them
Object-Relational Data Models
Relational model: flat, “atomic” values
Object Relational Data Models
Extend the relational data model by
including object orientation and constructs
to deal with added data types.
Allow attributes of tuples to have complex
types, including non-atomic values such as
nested relations.
Preserve relational foundations, in particular
the declarative access to data, while
extending modeling power.
Provide upward compatibility with existing
relational languages.
XML: Extensible Markup Language
Defined by the WWW Consortium (W3C)
Originally intended as a document markup
language not a database language
The ability to specify new tags, and to
create nested tag structures made XML a
great way to exchange data, not just
documents
XML has become the basis for all new
generation data interchange formats.
A wide variety of tools is available for
parsing, browsing and querying XML
documents/data
Database Engine
Storage manager
Query processing
Transaction manager
Storage Management
A storage manager is a program module
that provides the interface between the low-level
data stored in the database and the
application programs and queries submitted
to the system.
The storage manager is responsible for the
interaction with the file manager.
The raw data are stored on the disk using
the file system, which is usually provided by
a conventional operating system.
The storage manager translates the various
DML statements into low-level file-system
commands. Thus, the storage manager is
responsible for storing, retrieving, and
updating data in the database.
Storage Manager Components
Authorization and integrity manager, which
tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
Transaction manager, which ensures that the
database remains in a consistent (correct) state
despite system failures, and that concurrent
transaction executions proceed without conflicting.
File manager, which manages the allocation of
space on disk storage and the data structures used
to represent information stored on disk.
Buffer manager, which is responsible for fetching
data from disk storage into main memory, and
deciding what data to cache in main memory. The
buffer manager is a critical part of the database
system, since it enables the database to handle
data sizes that are much larger than the size of
main memory.
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Query Processing
Parser: During the parse call, the database performs the
following checks: syntax check, semantic check and shared
pool check, after converting the query into relational algebra.
Query Processing
Syntax check – verifies that the SQL statement is syntactically valid.
Example:
SELECT * FORM employee
Here this check reports an error because FROM is misspelled as FORM.
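The misspelled statement can be submitted to a real parser to watch it being rejected. The sketch below uses Python's `sqlite3` module, so the error wording is SQLite's and will differ in other systems. The `employee` table and the `check` helper are assumptions for the example. A misspelled table name is included for contrast, since it passes the syntax check and fails only when names are resolved (the semantic check).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT)")


def check(sql):
    """Submit a statement and report whether the system accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


valid = check("SELECT * FROM employee")
syntax_error = check("SELECT * FORM employee")    # FROM misspelled
semantic_error = check("SELECT * FROM employe")   # table name misspelled

print(valid)
print(syntax_error)
print(semantic_error)
```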
Database System Environment
Database Users and Administrators
Database Administrator
Database Users
Naive users are unsophisticated users who
interact with the system by invoking one of
the application programs that have been
written previously.
For example, a bank teller who needs to
transfer $50 from account A to account B
invokes a program called transfer.
This program asks the teller for the amount
of money to be transferred, the account
from which the money is to be transferred,
and the account to which the money is to
be transferred.
Database Users
Application programmers are computer
professionals who write application programs.
Application programmers can choose from many
tools to develop user interfaces.
Rapid application development (RAD) tools are
tools that enable an application programmer to
construct forms and reports without writing a
program.
There are also special types of programming
languages that combine imperative control
structures (for example, for loops, while loops and
if-then-else statements) with statements of the
data manipulation language.
These languages, sometimes called fourth-
generation languages, often include special
features to facilitate the generation of forms and
the display of data on the screen. Most major
commercial database systems include a fourth-generation language.
Database Users
Sophisticated users interact with the
system without writing programs.
Instead, they form their requests in a
database query language.
They submit each such query to a query
processor, whose function is to break down
DML statements into instructions that the
storage manager understands.
Analysts who submit queries to explore data
in the database fall in this category.
Database System Internals
Database Architectures
The architecture of a database system is
greatly influenced by the underlying
computer system on which the database is
running:
Centralized
Client-server
Parallel (multi-processor)
Distributed
Centralized Architecture
Parallel Architecture
Distributed Architecture
Database Architecture….
2 Tier Architecture
In a two-tier architecture, the application
resides at the client machine, where it
invokes database system functionality at
the server machine through query
language statements.
Application program interface standards
like ODBC and JDBC are used for
interaction between the client and the
server.
In contrast, in a three-tier architecture, the
client machine acts as merely a front end
and does not contain any direct database
calls.
Instead, the client end communicates with
an application server, usually through a
3 Tier Architecture
The application server in turn
communicates with a database system to
access data.
The business logic of the application, which
says what actions to carry out under what
conditions, is embedded in the application
server, instead of being distributed across
multiple clients.
Three-tier applications are more appropriate
for large applications, and for applications
that run on the World Wide Web.
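The split of responsibilities in a three-tier design can be sketched in a single Python file. Everything here is an assumption made for illustration: the `ApplicationServer` class, the `DAILY_LIMIT` business rule and the `account` table are invented, and in a real deployment the three tiers would run on separate machines and communicate over a network rather than through direct function calls.

```python
import sqlite3


def open_database():
    """Data tier: the database system that stores the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 100)")
    conn.commit()
    return conn


class ApplicationServer:
    """Middle tier: holds the business logic and issues all database calls."""

    DAILY_LIMIT = 50  # hypothetical business rule

    def __init__(self, conn):
        self.conn = conn

    def withdraw(self, account_id, amount):
        # Business logic: what actions to carry out under what conditions.
        if amount <= 0 or amount > self.DAILY_LIMIT:
            return "refused: amount outside allowed range"
        row = self.conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None or row[0] < amount:
            return "refused: insufficient funds"
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return "ok"


def client(server, account_id, amount):
    """Client tier: a front end that contains no direct database calls."""
    return server.withdraw(account_id, amount)


server = ApplicationServer(open_database())
first = client(server, "A", 30)
second = client(server, "A", 80)
remaining = server.conn.execute(
    "SELECT balance FROM account WHERE id = 'A'"
).fetchone()[0]
print(first, second, remaining)
```

Because the rule lives in one place on the application server, changing the limit does not require updating every client.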
Physical Data Independence | Logical Data Independence
Deals with storage of data | Deals with structure or changing the data definition
Data retrieval is easy | Data retrieval is difficult
Easy to achieve | Difficult to achieve
Application programs are not changed if a change is made at the physical level | If fields are added to or deleted from the database, changes need to be made in the application program
Deals with internal schema | Deals with conceptual schema
Example: Hashing algorithms, Storage | Example: Add/Delete/Modify a
Classification of DBMS
Hierarchical databases
Network databases
Relational databases
Object-oriented databases
Graph databases
ER model databases
Document databases
NoSQL databases
Hierarchical Database
Network Database
Object Oriented Database
Graph Database
ER Model Database
Document Database