Outline: What Is a Distributed DBMS; Problems; Current State-of-Affairs
File Systems
[Figure: each program carries its own data description and accesses its own file: program 1 (data description 1) → File 1; program 2 (data description 2) → File 2; program 3 (data description 3) → File 3.]
Database Management
[Figure: application programs 1, 2, and 3 (each with data semantics) access the database through a DBMS, which provides description, manipulation, and control.]
Motivation
[Figure: Database Technology (integration) and Computer Networks (distribution) combine into Distributed Database Systems (integration).]
integration ≠ centralization
CS742 – Distributed & Parallel DBMS M. Tamer Özsu Page 1.4
Distributed Computing
- A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
- What is being distributed?
  - Processing logic
  - Function
  - Data
  - Control
What is not a DDBS?
- A timesharing computer system
- A loosely or tightly coupled multiprocessor system
- A database system which resides at one of the nodes of a network of computers; this is a centralized database on a network node
Centralized DBMS on a Network
[Figure: Sites 1 to 5 connected by a communication network.]
Distributed DBMS Environment
[Figure: Sites 1 to 5 connected by a communication network.]
Implicit Assumptions
- Data stored at a number of sites → each site logically consists of a single processor.
- Processors at different sites are interconnected by a computer network → not a multiprocessor system
  - Parallel database systems
- Distributed database is a database, not a collection of files → data logically related as exhibited in the users’ access patterns
  - Relational data model
- D-DBMS is a full-fledged DBMS
  - Not remote file system, not a TP system
Data Delivery Alternatives
- Delivery modes
  - Pull-only
  - Push-only
  - Hybrid
- Frequency
  - Periodic
  - Conditional
  - Ad-hoc or irregular
- Communication Methods
  - Unicast
  - One-to-many
- Note: not all combinations make sense
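The three dimensions above span a small design space, which the following minimal sketch enumerates. The class and function names are hypothetical and chosen for illustration only. The slide does not say which combinations fail to make sense, so the sketch lists all of them and leaves any filtering to the reader.

```python
from enum import Enum
from itertools import product


class DeliveryMode(Enum):
    PULL_ONLY = "pull-only"
    PUSH_ONLY = "push-only"
    HYBRID = "hybrid"


class Frequency(Enum):
    PERIODIC = "periodic"
    CONDITIONAL = "conditional"
    AD_HOC = "ad-hoc or irregular"


class CommunicationMethod(Enum):
    UNICAST = "unicast"
    ONE_TO_MANY = "one-to-many"


def all_combinations():
    """Return every (mode, frequency, method) triple in the design space."""
    return list(product(DeliveryMode, Frequency, CommunicationMethod))


combos = all_combinations()
print(len(combos))  # 3 modes * 3 frequencies * 2 methods = 18
for mode, freq, method in combos[:3]:
    print(mode.value, "/", freq.value, "/", method.value)
```

A predicate that rejects the combinations that do not make sense could be applied to `all_combinations()`; it is omitted here because the source does not list them.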
Improved performance
Transparency
- Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.
- Fundamental issue is to provide data independence in the distributed environment
  - Network (distribution) transparency
Example
Transparent Access
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
[Figure: Distributed Database. Sites in Tokyo, Boston, Paris, Montreal, and New York are connected by a communication network; the sites store employees, projects, and assignments data for Boston, Paris, Montreal, and New York, including New York projects with budget > 200000.]
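The idea of transparent access can be sketched on a single machine with Python's built-in `sqlite3` module. This is only a stand-in, not a distributed system: the per-site tables (`EMP_BOSTON`, `EMP_PARIS`, and so on), the column layout, and the sample rows are all assumptions made for illustration. Views named `EMP` and `ASG` hide the fragments, so the query is written exactly as on the slide, with no mention of where the data lives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical fragments: one EMP and one ASG table per site.
# Schema and rows are invented for this sketch.
cur.executescript("""
CREATE TABLE EMP_BOSTON (ENO TEXT, ENAME TEXT, TITLE TEXT);
CREATE TABLE EMP_PARIS  (ENO TEXT, ENAME TEXT, TITLE TEXT);
CREATE TABLE ASG_BOSTON (ENO TEXT, PNO TEXT, DUR INTEGER);
CREATE TABLE ASG_PARIS  (ENO TEXT, PNO TEXT, DUR INTEGER);
CREATE TABLE PAY        (TITLE TEXT, SAL INTEGER);

INSERT INTO EMP_BOSTON VALUES ('E1', 'J. Doe', 'Elect. Eng.');
INSERT INTO EMP_PARIS  VALUES ('E2', 'M. Smith', 'Syst. Anal.');
INSERT INTO ASG_BOSTON VALUES ('E1', 'P1', 18);
INSERT INTO ASG_PARIS  VALUES ('E2', 'P2', 6);
INSERT INTO PAY VALUES ('Elect. Eng.', 40000);
INSERT INTO PAY VALUES ('Syst. Anal.', 34000);

-- The views present the fragments as single logical relations.
CREATE VIEW EMP AS
    SELECT * FROM EMP_BOSTON UNION ALL SELECT * FROM EMP_PARIS;
CREATE VIEW ASG AS
    SELECT * FROM ASG_BOSTON UNION ALL SELECT * FROM ASG_PARIS;
""")

# The user's query names only the logical relations EMP, ASG, and PAY.
QUERY = """
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE
"""

rows = cur.execute(QUERY).fetchall()
print(rows)
```

In a real distributed DBMS the mapping from logical relations to fragments and sites is handled by the system itself; the views here only mimic that separation.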
Distributed DBMS - Reality
[Figure: several sites, each running DBMS software and serving user queries and user applications, linked through a communication subsystem.]
Types of Transparency
- Data independence
- Network transparency (or distribution transparency)
  - Location transparency
  - Naming transparency
- Replication transparency
- Fragmentation transparency
Reliability Through
Transactions
Potentially Improved
Performance
Parallelism Requirements
- Have as much of the data required by each application at the site where the application executes
  - Full replication
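Full replication can be illustrated with a toy in-memory store in which every site holds a complete copy. This is a minimal sketch under stated assumptions: the `ReplicatedStore` class, its methods, and the site names are hypothetical, and the read-local/write-all policy is one simple way to keep copies identical, not something the slide prescribes.

```python
class ReplicatedStore:
    """Toy fully replicated key-value store: every site holds every item."""

    def __init__(self, sites):
        # One independent dictionary per site stands in for a local database.
        self.replicas = {site: {} for site in sites}

    def write(self, key, value):
        """Apply the update to every replica; return how many were touched."""
        for data in self.replicas.values():
            data[key] = value
        return len(self.replicas)

    def read(self, site, key):
        """Serve the read from the copy at the requesting site only."""
        return self.replicas[site][key]


store = ReplicatedStore(["Boston", "Paris", "Montreal"])
touched = store.write("E1", "J. Doe")
print(touched)                     # number of replicas updated
print(store.read("Paris", "E1"))   # answered from the Paris copy
```

Reads never leave the local site, which is what lets applications at different sites run in parallel; the price is that each write has to reach every copy.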
System Expansion
- Issue is database scaling
Distributed DBMS Issues
- Distributed Database Design
  - How to distribute the database
  - Replicated & non-replicated database distribution
  - A related problem in directory management
- Reliability
  - How to make the system resilient to failures
  - Atomicity and durability
Relationship Between Issues
[Figure: relationships among directory management, query processing, distribution design, reliability, concurrency control, and deadlock management.]
Related Issues
- Operating System Support
  - Operating system with proper support for database operations
  - Dichotomy between general purpose processing requirements and database processing requirements
- Open Systems and Interoperability
  - Distributed Multidatabase Systems
  - More probable scenario
  - Parallel issues
Architecture
- Defines the structure of the system
  - components identified
ANSI/SPARC Architecture
[Figure: ANSI/SPARC architecture; surviving labels: users, conceptual view, conceptual schema.]
Generic DBMS Architecture
DBMS Implementation
Alternatives
Dimensions of the Problem
- Distribution
  - Whether the components of the system are located on the same machine or not
- Heterogeneity
  - Various levels (hardware, communications, operating system)
  - DBMS heterogeneity is the important one
    - data model, query language, transaction management algorithms
- Autonomy
  - Not well understood and most troublesome
  - Various versions
    - Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
    - Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
    - Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.
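The three dimensions can be made concrete with a small record type that describes each component system. This is an illustrative sketch only: `ComponentDBMS`, its fields, and the helper functions are hypothetical names, and heterogeneity is reduced to a single data-model field even though the slide lists several levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentDBMS:
    """Hypothetical description of one component system."""
    name: str
    site: str                      # machine the component runs on
    data_model: str                # simplified stand-in for DBMS heterogeneity
    design_autonomy: bool = False
    communication_autonomy: bool = False
    execution_autonomy: bool = False


def is_distributed(components):
    """True when the components are not all on the same machine."""
    return len({c.site for c in components}) > 1


def is_heterogeneous(components):
    """True when the components do not all share one data model."""
    return len({c.data_model for c in components}) > 1


def autonomy_kinds(component):
    """List which of the three autonomy versions a component has."""
    kinds = []
    if component.design_autonomy:
        kinds.append("design")
    if component.communication_autonomy:
        kinds.append("communication")
    if component.execution_autonomy:
        kinds.append("execution")
    return kinds


system = [
    ComponentDBMS("db_a", "site1", "relational"),
    ComponentDBMS("db_b", "site2", "object", design_autonomy=True,
                  execution_autonomy=True),
]
print(is_distributed(system), is_heterogeneous(system))
print(autonomy_kinds(system[1]))
```

Treating each autonomy version as an independent flag mirrors the slide's point that autonomy comes in several versions rather than as a single yes/no property.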
Client/Server Architecture
Advantages of Client-Server
Architectures
Database Server
Distributed Database
Servers
Datalogical Distributed
DBMS Architecture
GCS
Peer-to-Peer Component
Architecture
[Figure: the USER PROCESSOR (a handler, a controller, the global query optimizer, and the global execution monitor; it returns system responses to the user) and the DATA PROCESSOR (the local query processor, the local recovery manager, and the runtime support processor).]
Datalogical Multi-DBMS
Architecture
MDBS Components & Execution
[Figure: a global user request enters the multi-DBMS layer, which issues global subrequests; local user requests are submitted separately by local users.]
Mediator/Wrapper Architecture