
Database Architectures
CS220

Slides adapted from Simon Miner, Gordon College
Start Recording
Attendance Quiz:
Locking Protocols
• Write your name and the date

• Working with a partner:


• Describe the difference between shared and exclusive locks
• Describe the two phases of the "two-phase locking protocol"
• Identify the default isolation level used by PostgreSQL:
Agenda

• Introduce Homework

• Parallelism

• Distributed Databases

• Introduction to NoSQL
Homework
Parallelism
We Need More Power!

• Parallelism brought on by the success of the client-server model
• Servers need to support more clients with
more demanding operations

• Alternative to acquiring bigger, faster, more expensive hardware
• Bottlenecks which can be parallelized
• CPU
• Disk
More Speed for More
Stuff!
• Speed-up – make individual transactions process
faster
• Multiple CPUs cooperate to complete a single
(expensive) transaction

• Scale-up – handle more work in the same amount of time
• Batch scale-up – increase the size of transactions (as
database grows)
• CPUs cooperate to complete (larger) transactions
• Transaction scale-up – increase the volume of
transactions
• Each CPU handles its own transaction, but more can be
processed at the same time
Shared Resources that
Enable Parallelism
• Shared memory – multiple CPUs sharing
common memory (while also having their
own cache/private local memory)
• Shared disk (cluster) – multiple CPUs share
a disk system
• Shared nothing – each CPU has its own
memory and disk
I/O Parallelism
• Reduce the time required to retrieve relations from disk by
partitioning the relations on multiple disks.
• Horizontal partitioning: tuples of a relation are divided among
many disks such that each tuple resides on one disk.

• Definitions
• Point query: query to look up a single tuple (e.g. age = 25)
• Range query: query to look up a range of values
(e.g. age > 25 and age < 60)
Partitioning Techniques:
Round Robin
• Partitioning techniques (number of disks = n):
• Round-robin: Send the i-th tuple inserted into the relation to disk
i mod n (see the sketch below)
• Good for sequential reads of entire table
• Even distribution of data over disks
• Range queries are expensive
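
A minimal Python sketch of round-robin placement (the function name and disk count are illustrative assumptions, not from the slides):

```python
# Round-robin partitioning: the i-th inserted tuple goes to disk i mod n.
def round_robin_disk(i: int, n: int) -> int:
    """Disk number for the i-th tuple inserted into the relation."""
    return i % n

# With 4 disks, consecutive inserts cycle evenly over disks 0, 1, 2, 3, 0, ...
assert [round_robin_disk(i, 4) for i in range(6)] == [0, 1, 2, 3, 0, 1]
```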
Partitioning Techniques:
Hash Partitioning
• Partitioning techniques (number of disks = n):
• Hash partitioning: Choose one or more partitioning
attribute(s) and apply a hash function to their values that
produces a disk number in the range 0…n – 1 (see the sketch below)
• Good for sequential or point queries based on partition attribute(s)
• Range queries are expensive
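
A hedged Python sketch of hash partitioning (the attribute value and disk count are illustrative; a stable hash is used instead of Python's per-run hash()):

```python
import hashlib

# Hash partitioning: hash the partitioning attribute into a disk number in 0..n-1.
def hash_disk(partition_value: str, n: int) -> int:
    digest = hashlib.sha1(partition_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

# A point query on the partition attribute touches exactly one disk:
print(hash_disk("customer_42", 8))
# A range query (e.g. age > 25) gives no such hint, so all n disks must be scanned.
```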
Partitioning Techniques:
Range Partitioning
• Partitioning techniques (number of disks = n):
• Range partitioning: Choose a partitioning attribute and
divide its values into ranges; tuples whose value falls in a given
range go in the corresponding partition (see the sketch below)
• Clusters data by partition value (e.g. by date range)
• Good for sequential access and point queries on partitioning
attribute
• Supports range queries on partitioning attribute
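
A small Python sketch of range partitioning (the split points on an assumed age attribute are for illustration only):

```python
import bisect

SPLIT_POINTS = [18, 30, 45, 65]   # 5 partitions: <18, 18-29, 30-44, 45-64, >=65

def range_partition(age: int) -> int:
    # bisect finds the partition whose range contains the value.
    return bisect.bisect_right(SPLIT_POINTS, age)

print(range_partition(25))        # -> partition 1
# A range query such as 25 <= age < 60 only needs partitions 1..3,
# which is why range partitioning supports range queries on this attribute.
```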
Potential Problems
• Skew: non-uniform distribution of database records
• Hash partitioning: bad hash function (not uniform or random)
• Range partitioning: lots of records going in the same partition
(e.g. web traffic/orders stored in a date-partitioned table spike
during the shopping season)
Distributed
Databases
One Database, Multiple
Locations
• Distributed database is stored on several
computers located at multiple physical sites
• Types of distributed database
• Homogeneous – all systems run the same brand of
DBMS software on the same OS and hardware
• Coordination is easier in this setup
• Heterogeneous – systems run different DBMSs on
potentially different OS and hardware
Advantages of Distributed
Systems
• Sharing of data generated at different sites

• Local control and autonomy at each site

• Reliability and availability


• If one site fails, there may be a performance reduction and
some data may become unavailable, but processing can
continue
• Contrast with a failure of a centralized system

• Potentially faster query response times


• For locally stored data – don’t need to go to a central store
• Multiple sites can potentially work on the same query in parallel

• Incremental system maintenance and upgrades


Disadvantages of
Distributed Systems
• Cost and time required to communicate between
sites
• Operations involving multiple sites are slower
because data must be transferred between them

• Increased complexity

• Difficult to debug
Fragmentation
• Splitting a table up between sites
• Also called sharding

• Horizontal fragmentation

• Vertical Fragmentation

• Fragmentation in both directions


Horizontal
Fragmentation
• Store different records (rows) at distinct sites
• Records most pertinent to each site (e.g. store, plant,
branch)

• Specified by relational algebra selection operation
• Entire table can be reconstructed by a union of
records at all sites
• Queries to local rows are inexpensive, but queries
involving remote records have high
communication cost
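
A toy Python illustration of horizontal fragmentation and its reconstruction by union (the relation, attribute, and site names are assumptions):

```python
accounts = [
    {"acct": 1, "branch": "Boston",  "balance": 500},
    {"acct": 2, "branch": "Chicago", "balance": 900},
    {"acct": 3, "branch": "Boston",  "balance": 250},
]

# Horizontal fragmentation = selection: each site stores the rows pertinent to it.
boston_fragment  = [t for t in accounts if t["branch"] == "Boston"]
chicago_fragment = [t for t in accounts if t["branch"] == "Chicago"]

# Reconstruction = union of the fragments held at all sites.
reconstructed = boston_fragment + chicago_fragment
assert sorted(t["acct"] for t in reconstructed) == [1, 2, 3]
```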
Vertical fragmentation
• Store different columns at distinct sites
• Give access only to data that is needed at site
• Restrict access to sensitive or unnecessary data at sites
• Selectively replicate portions of a table
• Replicate columns frequently used at remote sites for quicker
access

• Specified by projection operation

• Entire table can be reconstructed by a natural join on the fragments
• Requires (primary) key to be present in each fragment
• Or some system-generated row id (not used by end users)
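
A toy Python illustration of vertical fragmentation, with the primary key repeated in each fragment so a join can rebuild the table (column names are assumptions):

```python
employees = [
    {"emp_id": 1, "name": "Ana", "dept": "Sales", "salary": 70000},
    {"emp_id": 2, "name": "Bob", "dept": "IT",    "salary": 80000},
]

# Vertical fragmentation = projection; the key (emp_id) appears in every fragment.
public_fragment    = [{"emp_id": e["emp_id"], "name": e["name"], "dept": e["dept"]}
                      for e in employees]
sensitive_fragment = [{"emp_id": e["emp_id"], "salary": e["salary"]} for e in employees]

# Reconstruction = natural join of the fragments on the shared key.
salary_by_id  = {f["emp_id"]: f for f in sensitive_fragment}
reconstructed = [{**p, **salary_by_id[p["emp_id"]]} for p in public_fragment]
assert reconstructed[0]["salary"] == 70000
```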
Fragmentation
Example
Replication

• Storing the same data at different locations


• Improves performance – local access to replicated data is
more efficient than working with a remote copy
• Improves availability – if the local copy fails, the system
may still be able to use a remote copy

• Can be combined with fragmentation

• Issues from data redundancy


• Requires extra storage
• Updates to multiple copies of data
• Update strategy must ensure that an inconsistent replica is
not used to update other copies, but rather is itself restored
to a consistent state
Choosing whether to Fragment,
Replicate, and/or Centralize
• Use replication for small relations needed at
multiple sites
• Use fragmentation for large relations when data
is associated with particular sites
• Use centralization for large relations when data
is not associated with particular sites
• In this case, communication costs would be higher for
fragmentation, as queries would have to access
numerous remote sites instead of just the central site
Data Transparency
• Degree to which a user is unaware of how and where
data is stored in distributed system
• Types of data transparency
• Fragmentation transparency
• Replication transparency
• Location transparency

• Advantages
• Allows data to be moved without user needing to know
• Allows query planner to determine the most efficient way to
get data
• Allows access of replicated data from another site if local
copy is unavailable
Names of Data Items
• Criteria – Each data item in a distributed system should be
• Uniquely named
• Efficient to find
• Easy to relocate
• Each site should be able to create new items autonomously

• Approaches
• Centralized naming server
• Keeps item names unique, easy to find, easy to move (via lookup)
• Names cannot be created locally -- high communication cost to get new
names
• What happens if the naming server goes down?
• Incorporate site ID into names
• Meets criteria, but at the cost of location transparency
• Maintain a set of aliases at each site mapping local to actual names
• e.g. customer => site17.customer
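
A minimal sketch of the per-site alias approach (the alias entries are assumptions); queries use local names, and only the alias table knows where the data actually lives:

```python
# Per-site alias table mapping local names to site-qualified names.
ALIASES = {
    "customer": "site17.customer",
    "orders":   "site04.orders",
}

def resolve(name: str) -> str:
    """Return the actual, site-qualified name; unknown names are assumed local."""
    return ALIASES.get(name, name)

print(resolve("customer"))   # -> site17.customer
# Relocating a relation only means updating this table, preserving location transparency.
```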
Querying Distributed
Data
• Queries and transactions can be either
• Local – all data is stored at current site
• Global – it needs data from one or more remote
sites
• Transaction might originate locally and need data from
elsewhere
• Transaction might originate elsewhere, and need data
stored locally

• Planning strategies for global queries is difficult


• Minimize data transferred between sites
• Use statistical information to assist
Global Query
Strategies
• Execute data reducing operations before transferring data
between sites
• Produce results smaller than starting data
• Selection, projection, intersection, aggregation (count, sum, etc.)
• Sometimes natural and theta join, union

• Execute data expanding operations after transferring data between sites
• Produce results larger than starting data
• Cartesian join, natural and theta join (sometimes)

• Semijoin -- ⋉
• r1 ⋉ r2 = π_R1(r1 ⋈ r2)
• Transfer between sites only those tuples in r1 which match r2 in the
natural join
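
A Python sketch of the semijoin as a data-reducing step (toy relations; attribute names are assumptions): only the join-attribute values of r2 cross the network, and only the matching r1 tuples come back.

```python
def semijoin(r1, r2, join_attr):
    # r1 ⋉ r2: keep the tuples of r1 that have a join partner in r2.
    r2_keys = {t[join_attr] for t in r2}          # the small set shipped between sites
    return [t for t in r1 if t[join_attr] in r2_keys]

book_info = [{"call_no": c, "title": f"Book {c}"} for c in (1, 2, 3, 4)]
checkout  = [{"call_no": 2, "patron": "p7"}, {"call_no": 4, "patron": "p9"}]

print(semijoin(book_info, checkout, "call_no"))   # only books 2 and 4 are transferred
```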
Global Query Library
Example
• Given
• checkout relation stored locally
• (Large) book_info relation (call_no, title, etc.) stored centrally

• Find details (including book titles) of all local checkouts that have just gone overdue
• Strategies
• Copy entire book_info relation to the local site and do the join
there
• Not optimal – copying a very large relation for only a few matching
tuples
• Send local site only those book tuples relevant to the query
• Semijoin: book_info ⋉ checkout
• Data reducing operations at local and central sites
Modifying Distributed Data
can be Complicated
• Challenges related to updating data in a distributed system
• Ensure that updates to data stored at multiple sites get
committed or rolled back on each site
• Avoid one site committing an update and another aborting it
• Ensure that replicated data is consistently updated on all
replicas
• Updates to different replicas do not occur at the same time
• Avoid inconsistencies arising from data read from a replica that has
not been updated yet
• Partial failure – one or more sites down
• Due to hardware, software, or communication link failure
• What happens when this failure occurs in the middle of an update
operation?
• How to deal with corrupted or lost messages?
Two-Phase Commit
Protocol (2PC)
• Ensure that either all updates commit or none commit
• Here, “updates” = changes to data (inserts, updates, deletes, etc.)

• One site (usually the site originating the update) acts as the
coordinator
• Each site completes work on the transaction, becomes partially
committed, and notifies the coordinator
• Once the coordinator receives completion messages from all sites, it
can begin the commit protocol
• If coordinator receives a failure message from one or more sites, it
instructs all sites to abort the transaction
• If the coordinator does not receive any message from a site in a
reasonable amount of time, it instructs all sites to abort the
transaction
• Site or communication link might have failed during the transaction
2PC Phase 1: Obtaining a
Decision
• Coordinator writes a <prepare T> entry to its log and
forces all log entries to stable storage
• Coordinator sends a prepare-to-commit message to all
participating sites
• Ideally, each site writes a <ready T> entry to its log,
forces all log entries to stable storage, and sends a ready
message to the coordinator
• If a site needs to abort the transaction, it writes a <no T>
entry to its log, forces all entries to stable storage, and sends
an abort message to the coordinator
• Once a site sends a ready message to the coordinator, it
gives up its right to abort the transaction
• It must commit if/when the coordinator instructs it to
2PC Phase 2: Recording
the Decision
• Coordinator waits for each site to respond to the prepare-to-commit
message
• If any site responds negatively or fails to respond, coordinator writes an
<abort T> entry to its log and sends an abort message to all sites
• If all responses are positive, coordinator writes a <commit T> entry to its log
and sends a commit message to all sites
• At this point, the coordinator’s decision is final
• 2PC protocol will work to carry it out even if a site fails

• As each site receives the coordinator’s message, it either commits or aborts
the transaction, makes an appropriate log entry, and sends an acknowledge
message back to the coordinator
• Once the coordinator receives acknowledge messages from all sites, it writes
a <complete T> entry to its log
• If a site fails to send an acknowledge message, the coordinator may resend
its message to it
• Ultimately, the site is responsible for finding out and carrying out the coordinator’s decision
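
A deliberately simplified, single-process sketch of the coordinator’s side of 2PC (the participant interface, retries, timeouts, and forcing of log records are assumptions or omissions, not from the slides):

```python
def two_phase_commit(tid, participants, log):
    # Phase 1: obtain a decision.
    log.append(f"<prepare {tid}>")              # forced to stable storage in a real system
    votes = []
    for site in participants:
        try:
            votes.append(site.prepare(tid))     # site logs <ready T> or <no T> and replies
        except Exception:                       # failure or timeout counts as a "no"
            votes.append(False)

    # Phase 2: record the decision; from here on it is final.
    decision = "commit" if all(votes) else "abort"
    log.append(f"<{decision} {tid}>")
    for site in participants:
        getattr(site, decision)(tid)            # each site commits/aborts, logs, and acks
    log.append(f"<complete {tid}>")             # written once all acknowledgements arrive
    return decision
```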
2PC: If a Remote Site or
Communication Link Fails…
• …before sending its ready message, the transaction will fail
• When the site comes back up, it may send its ready message, but the
coordinator will ignore this
• Coordinator will send periodic abort messages to site so that it will
eventually acknowledge the failure and return to a consistent state
• Same scenario as above if ready message is lost in transit

• …after the coordinator receives the ready message


• The site must figure out what happened to the transaction once it
recovers (via a message from coordinator or asking some other site) and
take appropriate action

• …after the site receives the coordinator’s final decision


• The site will know what to do after it recovers (from commit or abort
entry in its log)
• Takes appropriate action and sends an acknowledgement message to the
coordinator
2PC: If the Coordinator
Fails…
• …before it sends a final decision
• Sites that already sent ready messages have to wait for coordinator to
recover before deciding what to do with the transaction
• Can lead to blocking – locked data items unavailable until coordinator
recovers
• Sites that have not sent a ready message can time out and abort the
transaction

• …after sending a final decision to at least one site, it will figure out
what to do after it recovers based on its log
• <start T> but no <prepare T> → abort transaction
• <prepare T> but no <commit T> → find out status of sites or abort
transaction
• <abort T> or <commit T>, but no <complete T> → restart sending of
commit/abort messages and waiting for acknowledgements

• Sites may be able to find out what to do from each other when the
coordinator is down
Updating Replicated
Data
• All replicas of a given data item must be
kept synchronized when updates occur
• How to do this
• Simultaneous updates of all replicas for each
transaction
• Ensures consistency across replicas
• Slows down update transactions and breaks
replication transparency
• What happens if a replica is unreachable during
an update?
Primary Copy
• Designate a primary copy of the data at some site
• Reads can happen on any replica, but updates happen on primary
copy first
• Primary copy’s site sends updates to replica sites
• Immediately after each update or periodically (if eventual consistency is
OK)
• Resending updates periodically to sites that are down

• Secondary copies might be a little out-of-date, so critical reads should go to the primary copy
• What happens when the site with the primary copy fails?
• Data becomes unavailable for update until the primary copy site is
recovered
• Or, a secondary copy can become a temporary primary copy
• Could lead to inconsistencies when trying to reactivate the real primary
copy
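
A minimal sketch of primary-copy replication (class and method names are assumptions): updates go to the primary first and are then propagated, while non-critical reads may use a local replica.

```python
class ReplicatedItem:
    def __init__(self, primary, replicas):
        self.primary = primary            # dict acting as the primary copy's store
        self.replicas = replicas          # {site_name: dict} secondary copies

    def read(self, site, key, critical=False):
        # Critical reads go to the primary; others may see a slightly stale local copy.
        store = self.primary if critical else self.replicas.get(site, self.primary)
        return store.get(key)

    def write(self, key, value):
        self.primary[key] = value         # update the primary copy first
        for replica in self.replicas.values():
            replica[key] = value          # propagate immediately (a real system may defer)
```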
Concurrency Control with
Distributed Systems
• How to ensure serializable transactions in a distributed
system?
• Locks – need to lock an item at multiple sites before
accessing it
• Centralized lock manager – all locks obtained from this
lock manager on one site
• Transaction needing to lock several replicas at once can
get all of its locks in a single message
• Single source for dealing with deadlock
• Local transactions involving locking incur communication
overhead
• Locking manager becomes a bottleneck and single point
of failure
Distributed Locking
• Each site manages the locks of items stored there
• Local transactions stay local, no single point of failure

• Disadvantages
• More message overhead – need to send lock request,
receive lock granted, and unlock message in addition
to the data involved
• Deadlock detection gets harder
• Further complications to updating replicated data
• How many replica locks are needed to do an update (all
of them? Most of them?)
• Primary copy method helps with this, as only primary
copy needs to be locked
Timestamps for Distributed
Concurrency Control
• Must ensure consistency and uniqueness of
timestamps across sites
• Combine locally generated timestamp and site id into a
transaction’s global timestamp

• Need to ensure that all sites’ clocks are always synchronized with one another
• If any site receives a request from a transaction
originating elsewhere…
• And that transaction’s timestamp is greater than the
current site’s timestamp clock
• Advance the local timestamp clock to one greater than
the transaction timestamp
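
A minimal sketch of globally unique timestamps (site ids and the class name are assumptions): the pair (local counter, site id) is unique and totally ordered, and a site advances its counter past any larger timestamp it sees.

```python
class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def new_timestamp(self):
        # Global timestamp = (local counter, site id); the site id breaks ties.
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, remote_ts):
        # A request arrives with a timestamp ahead of our clock:
        # advance the local counter to one greater than it.
        remote_counter, _ = remote_ts
        if remote_counter > self.counter:
            self.counter = remote_counter + 1

a, b = SiteClock(1), SiteClock(2)
b.observe(a.new_timestamp())    # site 2 catches up to site 1's clock
```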
Further Reading
• What is Distributed SQL? An Evolution of the Database
• CockroachDB (open source, developed by former
Google engineers)
• Google F1 (proprietary) and Spanner (proprietary,
offered by Google Cloud)
• Amazon Aurora (proprietary, offered by AWS)

• Distributed functionality for traditional RDBMSs:


• Features of standard PostgreSQL and
Citus for PostgreSQL
• Features of standard MySQL and MySQL Cluster
