Lecture 4db

The document discusses strategies for distributed database design, including top-down and bottom-up approaches. For top-down design, requirements are analyzed first to define the global conceptual schema (GCS), then views are designed and the GCS is distributed through fragmentation and allocation. For bottom-up design, existing local conceptual schemas (LCSs) are integrated and mapped to a generated GCS. Key issues discussed are how and how much to fragment relations to improve performance, reliability, and other factors based on access patterns and other information.

Uploaded by

mohsin dish
Copyright © All Rights Reserved
DDBMS DESIGN
DDB Design
• Design problem of distributed systems: Making
decisions about the placement of data and programs
across the sites of a computer network as well as
possibly designing the network itself.
• In a DDBMS, the distribution of applications involves two things:
– Distribution of the DDBMS software
– Distribution of applications that run on the database
• Distribution of applications will not be considered in the
following; instead the distribution of data is studied.
The organization of distributed systems can be investigated along
three orthogonal dimensions:
• Level of sharing, with three possibilities:
– No sharing: each application and its data execute at one site, and there is no
communication with any other program or access to any data file at
other sites.
– Data sharing: all the programs are replicated at all the sites, but data files
are not. Accordingly, user requests are handled at the site where they
originate, and the necessary data files are moved around the
network.
– Data-plus-program sharing: both data and programs may be shared, meaning that a
program at a given site can request a service from another program
at a second site, which, in turn, may have to access a data file
located at a third site.
• Behavior of access patterns: static (the access patterns
of user requests do not change over time) or dynamic
(they change over time).
• Level of knowledge about access pattern
behavior: no information, partial information, or
complete information (designers know how
users will access the database).
• Distributed database design problems should
be considered within this general framework.
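To make the level-of-sharing dimension concrete, here is a minimal Python sketch (an illustration, not part of the lecture) of which network transfers a single user request causes under each of the three possibilities. The `Sharing` enum, the `transfers` function, and the site names are hypothetical, and the model deliberately ignores everything except where the request, the program, and the data file are located.

```python
from enum import Enum


class Sharing(Enum):
    NONE = "no sharing"
    DATA = "data sharing"
    DATA_PLUS_PROGRAM = "data-plus-program sharing"


def transfers(level, origin, program_site, data_site):
    """Return the (what, from_site, to_site) transfers caused by one request.

    Hypothetical model: only the locations of the request, the program
    serving it, and the data file it needs are taken into account.
    """
    if level is Sharing.NONE:
        # Each application and its data execute at one site: no communication.
        if not (origin == program_site == data_site):
            raise ValueError("no sharing: request, program and data must be at one site")
        return []
    if level is Sharing.DATA:
        # Programs are replicated everywhere, so the request is handled where it
        # originates (program_site is ignored); the data file is moved there.
        return [] if data_site == origin else [("data file", data_site, origin)]
    # Data-plus-program sharing: the origin asks a program at a second site for
    # a service, and that program may need a data file from a third site.
    moves = []
    if program_site != origin:
        moves.append(("service request", origin, program_site))
    if data_site != program_site:
        moves.append(("data", data_site, program_site))
    if program_site != origin:
        moves.append(("result", program_site, origin))
    return moves


print(transfers(Sharing.DATA, "S1", "S1", "S3"))
print(transfers(Sharing.DATA_PLUS_PROGRAM, "S1", "S2", "S3"))
```

Under data sharing only the data file travels; under data-plus-program sharing the request, the data, and the result each cross the network once when three different sites are involved.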
Distributed Database Design
• Two strategies:
• Top-down approach
– Designing systems from scratch
– Homogeneous systems
• Bottom-up approach
– The databases already exist at a number of sites
– The databases should be connected to solve
common tasks
Top-down design strategy
• Requirements analysis: identifies the data and processing
needs of all potential database users.
• It also specifies where the final system is expected to
stand with respect to the objectives of a
distributed DBMS.
• Objectives: performance, reliability and
availability, economics, and expandability
Top-down design strategy
• View design: defines the interfaces for end
users.
• Conceptual design: determines the entity types and the relationships
among these entities.
• The global conceptual schema (GCS) and access pattern
information collected as a result of view design are inputs
to the distribution design step.
• The objective of distribution design is to design the local conceptual schemas (LCSs)
by distributing the entities over the sites of the distributed
system.
• Rather than distributing relations, it is quite common to
divide them into subrelations, called fragments, which are
then distributed.
• The distribution design activity consists of two
steps to deal with the complexity of the problem:
fragmentation and allocation.
• Physical design: maps the local
conceptual schemas to the physical storage
devices available at the corresponding sites.
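The distribution design step (fragmentation, then allocation) can be sketched in a few lines of Python. This is a minimal sketch under assumed inputs: the `EMP` relation, its attributes, the `city`-based predicates, and the site names are all hypothetical, and only horizontal fragmentation without replication is shown. Each site's collection of fragments plays the role of its LCS.

```python
from collections import defaultdict

# Hypothetical global relation EMP from the GCS.
EMP = [
    {"eno": "E1", "name": "Ann", "city": "NY"},
    {"eno": "E2", "name": "Bob", "city": "LA"},
    {"eno": "E3", "name": "Cho", "city": "LA"},
]


def fragment(relation, predicates):
    """Horizontal fragmentation: one fragment per named selection predicate."""
    return {name: [t for t in relation if pred(t)] for name, pred in predicates.items()}


def allocate(fragments, placement):
    """Allocation: place each fragment at exactly one site (no replication).

    The fragments gathered at a site stand in for that site's LCS.
    """
    lcs = defaultdict(dict)
    for frag_name, site in placement.items():
        lcs[site][frag_name] = fragments[frag_name]
    return dict(lcs)


predicates = {
    "EMP_NY": lambda t: t["city"] == "NY",
    "EMP_LA": lambda t: t["city"] == "LA",
}
fragments = fragment(EMP, predicates)
lcs = allocate(fragments, {"EMP_NY": "site_ny", "EMP_LA": "site_la"})

for site, frags in lcs.items():
    for name, rows in frags.items():
        print(site, name, [t["eno"] for t in rows])
```

Keeping the two steps as separate functions mirrors the split described above: the same fragments can be tried against different placements without fragmenting again.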
Bottom-up Approach
In the bottom-up approach, the GCS is defined as an integration of parts of the LCSs. In this case,
bottom-up design involves both the generation of the GCS and the mapping of
individual LCSs to this GCS. To reduce heterogeneity between the databases,
translators are used to convert each schema into a canonical representation (one
whose concepts can be matched with the concepts available in all the databases).
Three-step process:
• Schema matching, to determine the syntactic and semantic correspondences
among the translated LCS elements, or between individual LCS elements and
the predefined GCS elements
• Integration of the common schema elements into a global conceptual
(mediated) schema, if one has not yet been defined
• Schema mapping, which determines how to map the elements of each LCS
to the other elements of the GCS
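The three bottom-up steps can be sketched as follows. This is a deliberately naive illustration: the "translator" only lowercases names and strips underscores, matching is purely syntactic (string similarity from Python's standard `difflib`), and the 0.7 threshold, the attribute names, and all helper functions are assumptions. Because no GCS is predefined here, `integrate` builds one by matching each translated LCS element against the names collected so far; `match` then pairs each LCS element with a GCS element, and `map_row` applies that mapping to a local tuple.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.7  # assumed similarity cut-off; would need tuning for real schemas


def similarity(a, b):
    """Purely syntactic similarity of two names, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def translate(schema):
    """Hypothetical translator: canonical name = lowercase, no underscores."""
    return {attr: attr.lower().replace("_", "") for attr in schema}


def integrate(translated_schemas):
    """Build GCS attribute names when no GCS is predefined.

    A translated LCS element is added only if it matches none of the
    names collected so far (matching among LCS elements).
    """
    gcs = []
    for schema in translated_schemas:
        for name in schema.values():
            if all(similarity(name, g) < THRESHOLD for g in gcs):
                gcs.append(name)
    return gcs


def match(translated, gcs):
    """Pair each local attribute with its most similar GCS attribute."""
    correspondences = {}
    for local, name in translated.items():
        best = max(gcs, key=lambda g: similarity(name, g))
        if similarity(name, best) >= THRESHOLD:
            correspondences[local] = best
    return correspondences


def map_row(row, correspondences):
    """Schema mapping: rewrite a local tuple using GCS attribute names."""
    return {correspondences[k]: v for k, v in row.items() if k in correspondences}


lcs1 = translate(["Emp_No", "Emp_Name", "Salary"])
lcs2 = translate(["EMPNO", "EmpName", "Salaries", "Dept"])
gcs = integrate([lcs1, lcs2])
mapped = map_row({"EMPNO": "E7", "EmpName": "Omar", "Salaries": 5000, "Dept": "R&D"},
                 match(lcs2, gcs))
print(gcs)
print(mapped)
```

With these inputs the GCS comes out as `['empno', 'empname', 'salary', 'dept']`, and the LCS2 tuple is rewritten with those names. Two attributes with the same meaning but unrelated spellings (synonyms) would not be found by this matcher; that is the semantic side of schema matching, which string similarity cannot cover.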
Distribution Design Issues
• Why fragment at all?
• How to fragment?
• How much to fragment?
• How to test correctness?
• How to allocate?
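For the correctness question, the usual answer in the distributed database literature is three rules: completeness (every data item of a relation appears in some fragment), reconstruction (the relation can be rebuilt from its fragments; for horizontal fragments, by union), and disjointness (a data item appears in only one fragment). The sketch below checks them for horizontal fragments only; the `EMP` tuples, the fragment choices, and the `check_horizontal_fragmentation` function are hypothetical.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Check the three correctness rules for horizontal fragments of `relation`."""
    r = set(relation)
    union = set().union(*fragments)
    return {
        # Completeness: every tuple of R is in at least one fragment.
        "complete": r <= union,
        # Reconstruction: the union of the fragments gives back exactly R.
        "reconstructible": union == r,
        # Disjointness: no tuple is in two different fragments.
        "disjoint": all(
            not (set(fragments[i]) & set(fragments[j]))
            for i in range(len(fragments))
            for j in range(i + 1, len(fragments))
        ),
    }


# Hypothetical EMP tuples: (eno, city).
EMP = [("E1", "NY"), ("E2", "LA"), ("E3", "LA"), ("E4", "SF")]

good = [[t for t in EMP if t[1] == "NY"], [t for t in EMP if t[1] != "NY"]]
bad = [
    [t for t in EMP if t[1] == "NY"],
    [t for t in EMP if t[1] == "LA"],
    [t for t in EMP if t[1] in ("NY", "LA")],  # overlaps both others, and E4 is lost
]

print(check_horizontal_fragmentation(EMP, good))
print(check_horizontal_fragmentation(EMP, bad))
```

The first fragmentation passes all three rules; the second fails all of them, since E4 belongs to no fragment and E1, E2, and E3 each appear twice.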
Fragmentation
What is a reasonable unit of distribution? Relation or fragment of relation?
• Relations as unit of distribution:
– If the relation is not replicated, we get a high volume of remote data
accesses.
– If the relation is replicated, we get unnecessary replications, which cause
problems in executing updates and waste disk space
– May be an acceptable solution if queries need all the data in the relation and the data
stays at the only site that uses it
• Fragments of a relation as the unit of distribution:
– Application views are usually subsets of relations
– Thus, the locality of application accesses is defined on subsets of relations
– Permits a number of transactions to execute concurrently, since they will
access different portions of a relation
– Permits parallel execution of a single query (intra-query concurrency)
– However, semantic data control (especially integrity enforcement) is more
difficult.
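The trade-off between relations and fragments as the unit of distribution can be shown with a toy cost count. In this hypothetical workload, each site reads and writes only the employees of its own city; the cost is the number of remote operations (a read with no local copy, or a write that must reach a copy at another site). The workload, site names, plans, and cost model are assumptions chosen for illustration only.

```python
# Hypothetical workload: (issuing site, city whose EMP tuples are touched, operation).
WORKLOAD = [
    ("NY", "NY", "read"), ("NY", "NY", "write"),
    ("LA", "LA", "read"), ("LA", "LA", "read"), ("LA", "LA", "write"),
    ("SF", "SF", "read"),
]
SITES = {"NY", "LA", "SF"}


def remote_ops(workload, copies_of):
    """Count remote operations under a placement.

    copies_of(city) returns the set of sites holding a copy of that data.
    A read is remote if there is no local copy; a write must reach every
    copy at another site (simplified update propagation).
    """
    cost = 0
    for site, city, op in workload:
        copies = copies_of(city)
        if op == "read":
            cost += 0 if site in copies else 1
        else:
            cost += len(copies - {site})
    return cost


plans = {
    "relation at NY only": lambda city: {"NY"},
    "relation replicated at all sites": lambda city: set(SITES),
    "fragment per city at its own site": lambda city: {city},
}

for name, plan in plans.items():
    print(f"{name}: {remote_ops(WORKLOAD, plan)}")
```

With these numbers, storing the whole relation at one site and fully replicating it both cost 4 remote operations (remote reads in the first case, update propagation in the second), while placing each fragment at the site that uses it costs none. A workload in which every site reads the whole relation would favor a different choice.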
Fragmentation aims to improve:
– Reliability
– Performance
– Balanced storage capacity and costs
– Communication costs
– Security
The following information is used to decide
fragmentation:
– Quantitative information: frequency of queries, the site
where each query is run, selectivity of the queries, etc.
– Qualitative information: types of data access
(read/write), etc.
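The quantitative inputs listed above can be derived from a query log. The sketch below assumes a hypothetical log of `(site, query)` pairs, hypothetical queries reduced to selection predicates, and a hypothetical `EMP` relation; it computes how often each query runs at each site and the selectivity of each query (the fraction of tuples its predicate selects). Qualitative information, such as whether an access is a read or a write, is not modeled here.

```python
from collections import Counter

# Hypothetical EMP relation.
EMP = [
    {"eno": "E1", "city": "NY", "sal": 40000},
    {"eno": "E2", "city": "LA", "sal": 55000},
    {"eno": "E3", "city": "LA", "sal": 30000},
    {"eno": "E4", "city": "SF", "sal": 70000},
]

# Hypothetical queries, each reduced to its selection predicate on EMP.
QUERIES = {
    "q_la_staff": lambda t: t["city"] == "LA",
    "q_high_paid": lambda t: t["sal"] > 60000,
}

# Hypothetical query log: (site where the query was run, query name).
LOG = [("LA", "q_la_staff")] * 5 + [("NY", "q_high_paid")] * 2 + [("SF", "q_high_paid")]


def access_frequency(log):
    """Number of times each query was run at each site."""
    return Counter(log)


def selectivity(relation, predicate):
    """Fraction of the relation's tuples selected by the predicate."""
    return sum(1 for t in relation if predicate(t)) / len(relation)


freq = access_frequency(LOG)
for (site, query), count in sorted(freq.items()):
    print(site, query, count, selectivity(EMP, QUERIES[query]))
```

Here `q_la_staff` runs 5 times at LA and selects half of EMP, the kind of evidence that could argue for a horizontal fragment on `city = "LA"` allocated to the LA site.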
