DBMS Unit 3
Concurrency Control
■Lock-Based Protocols
■Timestamp-Based Protocols
■Validation-Based Protocols
■Multiple Granularity
■Multiversion Schemes
■Deadlock Handling
■Insert and Delete Operations
■Concurrency in Index Structures
■ Locking as above is not sufficient to guarantee serializability: if A and B
get updated in between the read of A and B, the displayed sum would be
wrong.
Automatic Acquisition of Locks
■ A transaction Ti issues the standard read/write instruction, without
explicit locking calls.
■ read(D) is processed as:
if Ti has a lock on D
  then
    read(D)
  else begin
    if necessary wait until no other transaction has a lock-X on D,
    grant Ti a lock-S on D;
    read(D)
  end
Automatic Acquisition of Locks (Cont.)
■ write(D) is processed as:
if Ti has a lock-X on D
  then
    write(D)
  else begin
    if necessary wait until no other trans. has any lock on D,
    if Ti has a lock-S on D
      then
        upgrade lock on D to lock-X
      else
        grant Ti a lock-X on D
    write(D)
  end;
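A minimal Python sketch of this automatic acquisition (illustrative only: the LockManager class, its lock table, and the storage dict are assumptions, not from the slides; real lock managers queue waiters, whereas the asserts here merely stand in for waiting):

class LockManager:
    def __init__(self):
        self.locks = {}  # data item -> {txn_id: 'S' or 'X'}

    def read(self, txn, item, storage):
        holders = self.locks.setdefault(item, {})
        # wait until no other transaction holds lock-X on item (waiting elided)
        assert all(mode != 'X' for t, mode in holders.items() if t != txn)
        holders.setdefault(txn, 'S')          # grant lock-S if none held
        return storage[item]                  # read(D)

    def write(self, txn, item, storage, value):
        holders = self.locks.setdefault(item, {})
        if holders.get(txn) != 'X':
            # wait until no other transaction holds any lock on item (elided)
            assert all(t == txn for t in holders)
            holders[txn] = 'X'                # upgrade S to X, or grant lock-X
        storage[item] = value                 # write(D)

storage = {'D': 0}
lm = LockManager()
lm.write('T1', 'D', storage, 42)
print(lm.read('T1', 'D', storage))            # 42; T1 already holds lock-X on D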
■ In order to assure such behavior, the protocol maintains for each data item Q
two timestamp values:
W-timestamp(Q) is the largest time-stamp of any transaction that
executed write(Q) successfully.
R-timestamp(Q) is the largest time-stamp of any transaction that
executed read(Q) successfully.
[Figure: example use of the protocol: a partial schedule of reads and writes
of X, Y, and Z by transactions T1 through T5, in which two transactions are
rolled back for violating the timestamp ordering.]
■ Correctness: the timestamp-ordering protocol guarantees serializability,
since every arc in the precedence graph goes from the transaction with the
smaller timestamp to the transaction with the larger timestamp, so no cycles
can arise.
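The read and write rules that use these timestamps can be sketched as follows (a minimal Python sketch of the standard timestamp-ordering checks; the rollback/restart machinery is elided):

class DataItem:
    def __init__(self):
        self.r_ts = 0   # R-timestamp(Q)
        self.w_ts = 0   # W-timestamp(Q)

def read(ti_ts, q):
    if ti_ts < q.w_ts:
        return 'rollback'             # Ti would read an already-overwritten value
    q.r_ts = max(q.r_ts, ti_ts)
    return 'read ok'

def write(ti_ts, q):
    if ti_ts < q.r_ts or ti_ts < q.w_ts:
        return 'rollback'             # a later transaction already read or wrote Q
    q.w_ts = ti_ts
    return 'write ok'

q = DataItem()
print(write(2, q), read(1, q))        # ('write ok', 'rollback'): T1 reads after T2 wrote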
■ The protocol as described so far can produce schedules that are not
recoverable or cascade-free. Solution:
A transaction is structured such that its writes are all performed at
the end of its processing
All writes of a transaction form an atomic action; no transaction may
execute while a transaction is being written
A transaction that aborts is restarted with a new timestamp
■ If for all Ti with TS (Ti) < TS (Tj) either one of the following
conditions holds:
finish(Ti) < start(Tj)
start(Tj) < finish(Ti) < validation(Tj) and the set of data items
written by Ti does not intersect with the set of data items read by
Tj
then validation succeeds and Tj can be committed. Otherwise, validation
fails and Tj is aborted.
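A sketch of this test in Python (the Txn record with start/validation/finish times and read/write sets is an assumed structure, not from the slides):

from dataclasses import dataclass, field

@dataclass
class Txn:
    start: int
    validation: int
    finish: int
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(tj, earlier):
    for ti in earlier:                     # all Ti with TS(Ti) < TS(Tj)
        if ti.finish < tj.start:
            continue                       # first condition holds
        if ti.finish < tj.validation and not (ti.write_set & tj.read_set):
            continue                       # second condition holds
        return False                       # validation fails: abort Tj
    return True                            # validation succeeds: Tj may commit

# e.g., Ti overlapped Tj but wrote only {'A'}, which Tj never read: Tj validates
ti = Txn(start=1, validation=3, finish=5, write_set={'A'})
tj = Txn(start=2, validation=6, finish=8, read_set={'B'})
print(validate(tj, [ti]))                  # True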
■In addition to S and X lock modes, there are three additional lock modes
with multiple granularity:
intention-shared (IS): indicates explicit locking at a lower level of the
tree but only with shared locks.
intention-exclusive (IX): indicates explicit locking at a lower level with
exclusive or shared locks
shared and intention-exclusive (SIX): the subtree rooted by that
node is locked explicitly in shared mode and explicit locking is being
done at a lower level with exclusive-mode locks.
■ Intention locks allow a higher level node to be locked in S or X mode
without having to check all descendant nodes.
The compatibility matrix for the five modes (✓ = compatible):
      IS   IX   S    SIX  X
IS    ✓    ✓    ✓    ✓    –
IX    ✓    ✓    –    –    –
S     ✓    –    ✓    –    –
SIX   ✓    –    –    –    –
X     –    –    –    –    –
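A request can be granted only if it is compatible with every mode currently held on the node; a minimal Python sketch of that check (the COMPAT table simply encodes the matrix above):

# COMPAT[m] = set of modes compatible with a held lock of mode m
COMPAT = {
    'IS':  {'IS', 'IX', 'S', 'SIX'},
    'IX':  {'IS', 'IX'},
    'S':   {'IS', 'S'},
    'SIX': {'IS'},
    'X':   set(),
}

def compatible(requested, held_modes):
    # grantable iff the requested mode is compatible with every held mode
    return all(requested in COMPAT[h] for h in held_modes)

print(compatible('IX', {'IS'}), compatible('S', {'IX'}))   # True False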
■ Consider the partial schedule below: neither transaction can proceed, and
a deadlock results.
T1: lock-X on X
    write(X)
T2: lock-X on Y
    write(Y)
T1: wait for lock-X on Y
T2: wait for lock-X on X
■ Timeout-Based Schemes:
a transaction waits for a lock only for a specified amount of time.
After that, the wait times out and the transaction is rolled back.
thus deadlocks are not possible
simple to implement; but starvation is possible. Also difficult to
determine a good value of the timeout interval.
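A sketch of the timeout idea using Python's standard threading lock (the 2-second timeout and the single item lock are arbitrary illustrations; a DBMS would apply this per lock request):

import threading

item_lock = threading.Lock()   # stands in for the lock on one data item

def run_transaction(timeout_s=2.0):
    # wait for the lock only for a bounded amount of time
    if item_lock.acquire(timeout=timeout_s):
        try:
            pass   # perform the reads/writes guarded by the lock
        finally:
            item_lock.release()
        return 'committed'
    return 'rolled back'       # wait timed out; roll back (and beware starvation)

print(run_transaction())       # 'committed' when the lock is free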
■ Insertions and deletions can cause the phantom phenomenon: a transaction
that scans a relation and a transaction that inserts a tuple into it may
conflict despite not accessing any tuple in common. One solution:
Associate a data item with the relation, to represent the information
about what tuples the relation contains.
Transactions scanning the relation acquire a shared lock on the data
item.
Transactions inserting or deleting a tuple acquire an exclusive lock on
the data item. (Note: locks on the data item do not conflict with locks on
individual tuples.)
■ The above protocol provides very low concurrency for
insertions/deletions.
■Index locking protocols provide higher concurrency while
preventing the phantom phenomenon, by requiring locks
on certain index buckets.
Index Locking Protocol
■Every relation must have at least one index. Access to a relation must
be made only through one of the indices on the relation.
■ A transaction Ti that performs a lookup must lock all the index
buckets that it accesses, in S-mode.
■ A transaction Ti may not insert a tuple ti into a relation r
without updating all indices to r.
■ Ti must perform a lookup on every index to find all index buckets
that could have possibly contained a pointer to tuple ti, had it
existed already, and obtain locks in X-mode on all these index
buckets. Ti must also obtain locks in X-mode on all index buckets
that it modifies.
■ Cursor stability:
For reads, each tuple is locked, read, and the lock is immediately
released
X-locks are held until end of transaction
Special case of degree-two consistency
– T1 may see some records inserted by T2, but may not see
others inserted by T2
Read committed: same as degree two consistency,
but most systems implement it as cursor-stability
Read uncommitted: allows even uncommitted data to be
read
• For n transactions T1, T2, ..., Tn, where each Ti has mi read
and write operations, the number of possible schedules is (! is
factorial function):
(m1 + m2 + … + mn)! / ( (m1)! * (m2)! * … * (mn)! )
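For example, n = 2 transactions with m1 = m2 = 2 operations each give
(2 + 2)! / (2! * 2!) = 24 / 4 = 6 possible schedules.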
• Schedule A: r1(X); w1(X); r2(X); w2(X); c2; r1(Y); w1(Y); c1 (or a1)
• Schedule B: r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y); c1 (or a1); ...
Characterizing Schedules based on Recoverability (cont.)
Recoverable schedules can be further refined:
• Cascadeless schedule: A schedule in which a transaction T2 cannot read
an item X until the transaction T1 that last wrote X has committed.
• The set of cascadeless schedules is a subset of the set of recoverable
schedules.
In Sa, the operations w2(X) and w3(X) are blind writes, since T2
and T3 do not read the value of X.
Distributed databases
Concepts
Distributed Database.
A logically interrelated collection of shared data (and a description of this
data), physically distributed over a computer network.
Distributed DBMS.
Software system that permits the management of the distributed database
and makes the distribution transparent to users.
Concepts
[Figure: DDBMS architecture: two sites connected by a computer network;
each site runs the DDBMS with a data communications (DC) component, a local
DBMS (LDBMS), a global data dictionary (GDD), and its own database (DB).]
Distributed Processing
A centralized database that can be accessed over a computer network: the
processing is distributed, but the data is not.
Parallel DBMS
A DBMS running across multiple processors and disks that is designed to
execute operations in parallel, whenever possible, in order to improve
performance.
[Figure: the three main parallel architectures: (a) shared memory,
(b) shared disk, (c) shared nothing.]
Advantages of DDBMSs
Organizational Structure
Shareability and Local Autonomy
Improved Availability
Improved Reliability
Improved Performance
Economics
Modular Growth
Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity Control More Difficult
Lack of Standards
Lack of Experience
Database Design More Complex
Types of DDBMS
Homogeneous DDBMS
All sites use the same DBMS product.
Heterogeneous DDBMS
Sites may run different DBMS products, which need not be based on the
same underlying data model.
Distributed Database Design
Fragmentation
A relation may be divided into a number of sub-relations (fragments),
which are then distributed.
Allocation
Each fragment is stored at the site with "optimal" distribution.
Replication
A copy of a fragment may be maintained at several sites.
Data Allocation
Centralized
Consists of a single database and DBMS stored at one site, with users
distributed across the network.
Partitioned
Database partitioned into disjoint fragments, each fragment assigned
to one site.
Complete Replication
Consists of maintaining a complete copy of the database at each site.
Selective Replication
Combination of partitioning, replication, and centralization.
Comparison of Strategies for Data Distribution
[Table: comparison of the data distribution strategies above.]
Why Fragment?
Usage
Applications work with views rather than entire
relations.
Efficiency
Data is stored close to where it is most frequently
used.
Data that is not needed by local applications is not
stored.
Why Fragment? (cont.)
Parallelism
With fragments as unit of distribution, transaction
can be divided into several subqueries that operate
on fragments.
Security
Data not required by local applications is not stored
and so not available to unauthorized users.
Disadvantages
Performance
Integrity.
Correctness of Fragmentation
Completeness
Reconstruction
Disjointness.
Completeness
If relation R is decomposed into fragments R1,
R2, ... Rn, each data item that can be found in R must
appear in at least one fragment.
Reconstruction
Must be possible to define a relational operation that will
reconstruct R from the fragments.
Reconstruction for horizontal fragmentation is the Union operation; for
vertical fragmentation it is the Join.
Disjointness
If data item di appears in fragment Ri, then it should not
appear in any other fragment.
Exception: vertical fragmentation, where primary key
attributes must be repeated to allow reconstruction.
For horizontal fragmentation, a data item is a tuple.
For vertical fragmentation, a data item is an attribute.
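These three properties can be checked mechanically for horizontal fragments. A minimal Python sketch (the relation, its fragments, and the city-based predicates are made-up examples):

def check_horizontal(r, fragments):
    union = set().union(*fragments)
    completeness = all(any(t in f for f in fragments) for t in r)
    reconstruction = (union == r)     # reconstruction operator is Union
    disjointness = sum(len(f) for f in fragments) == len(union)
    return completeness, reconstruction, disjointness

r = {(1, 'hyd'), (2, 'delhi'), (3, 'hyd')}
f1 = {t for t in r if t[1] == 'hyd'}          # fragment: city = 'hyd'
f2 = {t for t in r if t[1] != 'hyd'}          # fragment: city <> 'hyd'
print(check_horizontal(r, [f1, f2]))          # (True, True, True)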
Types of Fragmentation
Horizontal: each fragment is a subset of the tuples of the relation.
Vertical: each fragment is a subset of the attributes of the relation.
Mixed: a combination of horizontal and vertical fragmentation.
Derived: a horizontal fragment based on a selection defined on a parent
relation.
[Figures: examples of mixed and horizontal fragmentation.]
Transparencies in a DDBMS
Distribution Transparency
Fragmentation Transparency
Location Transparency
Replication Transparency
Local Mapping Transparency
Naming Transparency
Transaction Transparency
Concurrency Transparency
Failure Transparency
Performance Transparency
A distributed query processor must weigh I/O cost, CPU cost, and
communication cost.
DBMS Transparency
Date's 12 Rules for a DDBMS
0. Fundamental Principle
To the user, a distributed system should
look exactly like a non-distributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
Date's 12 Rules for a DDBMS (cont.)
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. DBMS Independence
Equivalence of Schedules
• Result equivalent: Two schedules are called result equivalent if they produce the
same final state of the database.
• Difficult to determine without analyzing the internal operations of the
transactions, which is not feasible in general.
• May also get result equivalence by chance for a particular input parameter even
though schedules are not equivalent in general (see Figure 21.6, next slide)
Equivalence of Schedules (cont.)
• Conflict equivalent: Two schedules are conflict equivalent if the relative order of
any two conflicting operations is the same in both schedules.
• Commonly used definition of schedule equivalence
• Two operations are conflicting if:
– They access the same data item X
– They are from two different transactions
– At least one is a write operation
• Read-Write conflict example: r1(X) and w2(X)
• Write-write conflict example: w1(Y) and w2(Y)
Selections Involving Comparisons
Can be implemented via linear or binary search, or by using indices in the
following ways:
A5 (primary index, comparison). (Relation is sorted on A)
For σA≥V(r) use index to find first tuple ≥ v and scan relation sequentially
from there
For σA≤V(r) just scan relation sequentially till first tuple > v; do not use
index
A6 (secondary index, comparison).
For σA≥V(r) use index to find first index entry ≥ v and scan index
sequentially from there, to find pointers to records.
For σA≤V(r) just scan leaf pages of index finding pointers to records, till first
entry > v
In either case, retrieve records that are pointed to
requires an I/O for each record
Linear file scan may be cheaper
Implementation of Complex Selections
Conjunction: σθ1∧θ2∧…∧θn(r)
A7 (conjunctive selection using one index).
Select a combination of θi and algorithms A1 through A7 that results in the
least cost for σθi(r).
Test other conditions on tuple after fetching it into memory buffer.
A8 (conjunctive selection using composite index).
Use appropriate composite (multiple-key) index if available.
A9 (conjunctive selection by intersection of identifiers).
Requires indices with record pointers.
Use corresponding index for each condition, and take intersection of all the
obtained sets of record pointers.
Then fetch records from file
If some conditions do not have appropriate indices, apply test in memory.
Algorithms for Complex Selections
Disjunction: σθ1∨θ2∨…∨θn(r)
A10 (disjunctive selection by union of identifiers).
Applicable if all conditions have available indices.
Otherwise use linear scan.
Use corresponding index for each condition, and take union of all the
obtained sets of record pointers.
Then fetch records from file
Negation: σ¬θ(r)
Use linear scan on file
If very few records satisfy ¬θ, and an index is applicable to ¬θ
Find satisfying records using index and fetch from file
Sorting
Cost analysis:
1 block per run leads to too many seeks during merge
Instead use bb buffer blocks per run, and read/write bb blocks at a time
Can merge ⌊M/bb⌋ – 1 runs in one pass
Total number of merge passes required: ⌈log⌊M/bb⌋–1(br/M)⌉
Block transfers for initial run creation as well as in each pass is 2br
for final pass, we don't count write cost
we ignore final write cost for all operations since the output of an
operation may be sent to the parent operation without being written
to disk
External Merge Sort (Cont.)
Cost of seeks
During run generation: one seek to read each run and one seek to write
each run
2 ⌈br / M⌉
During the merge phase
Need 2 ⌈br / bb⌉ seeks for each merge pass, except the final one
which does not require a write
Total number of seeks:
2 ⌈br / M⌉ + ⌈br / bb⌉ (2 ⌈log⌊M/bb⌋–1(br / M)⌉ – 1)
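The formulas above can be evaluated with a small Python helper (a sketch that assumes br > M, so at least one merge pass is needed; the sample numbers are made up):

import math

def external_sort_cost(br, M, bb):
    runs = math.ceil(br / M)                      # initial sorted runs
    fan_in = M // bb - 1                          # runs merged per pass
    passes = math.ceil(math.log(runs, fan_in))    # ceil(log_fan_in(br / M))
    transfers = br * (2 * passes + 1)             # final write not counted
    seeks = 2 * runs + math.ceil(br / bb) * (2 * passes - 1)
    return transfers, seeks

# illustration only: br, M, and bb values are made up
print(external_sort_cost(10000, 40, 4))           # (70000, 13000)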
Join Operation
Relation s is called the build input and r is called the probe input.
Hash-Join Algorithm (Cont.)
The value n and the hash function h are chosen such that each si should fit
in memory.
Typically n is chosen as ⌈bs/M⌉ * f where f is a "fudge factor",
typically around 1.2
The probe relation partitions ri need not fit in memory
Recursive partitioning required if number of partitions n is greater than
number of pages M of memory.
instead of partitioning n ways, use M – 1 partitions for s
Further partition the M – 1 partitions using a different hash function
Use same partitioning method on r
Rarely required: e.g., with block size of 4 KB, recursive partitioning
not needed for relations of < 1GB with memory size of 2MB, or
relations of < 36 GB with memory of 12 MB
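A toy Python sketch of the partition-then-probe idea (joins on the first attribute; single-level in-memory partitioning, whereas a real system writes partitions to disk in blocks):

def hash_join(r, s, n=4):
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:
        r_parts[hash(t[0]) % n].append(t)    # partition probe input r
    for t in s:
        s_parts[hash(t[0]) % n].append(t)    # partition build input s
    out = []
    for ri, si in zip(r_parts, s_parts):
        index = {}                           # in-memory hash index on si
        for t in si:
            index.setdefault(t[0], []).append(t)
        for t in ri:                         # probe with ri
            for m in index.get(t[0], []):
                out.append(t + m[1:])
    return out

print(hash_join([(1, 'a'), (2, 'b')], [(1, 'X'), (3, 'Y')]))   # [(1, 'a', 'X')]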
Handling of Overflows
Partitioning is said to be skewed if some partitions have significantly more tuples
than some others
Hash-table overflow occurs in partition si if si does not fit in memory. Reasons could
be
Many tuples in s with same value for join attributes
Bad hash function
Overflow resolution can be done in build phase
Partition si is further partitioned using different hash function.
Partition ri must be similarly partitioned.
Overflow avoidance performs partitioning carefully to avoid overflows during build
phase
E.g. partition build relation into many partitions, then combine them
Both approaches fail with large numbers of duplicates
Fallback option: use block nested loops join on overflowed partitions
Cost of Hash-Join
If recursive partitioning is not required: cost of hash join is
3(br + bs) + 4 * nh block transfers +
2( ⌈br / bb⌉ + ⌈bs / bb⌉ ) seeks
If recursive partitioning is required:
number of passes required for partitioning build relation s to less than
M blocks per partition is ⌈log⌊M/bb⌋–1(bs/M)⌉
best to choose the smaller relation as the build relation.
Total cost estimate is:
2(br + bs) ⌈log⌊M/bb⌋–1(bs/M)⌉ + br + bs block transfers +
2( ⌈br / bb⌉ + ⌈bs / bb⌉ ) ⌈log⌊M/bb⌋–1(bs/M)⌉ seeks
If the entire build input can be kept in main memory no partitioning is
required
Cost estimate goes down to br + bs.
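The no-recursive-partitioning formula can be checked with a short Python helper (nh is the number of partitions; the final line reproduces the instructor/teaches example that follows, reading bb = 3 off its seek term and ignoring the nh term as the example does):

import math

def hash_join_cost(br, bs, bb, nh):
    # block transfers and seeks when recursive partitioning is not required
    transfers = 3 * (br + bs) + 4 * nh
    seeks = 2 * (math.ceil(br / bb) + math.ceil(bs / bb))
    return transfers, seeks

print(hash_join_cost(100, 400, 3, 0))   # (1500, 336), matching the example below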
Example of Cost of Hash-Join
instructor ⋈ teaches
Assume that memory size is 20 blocks
binstructor = 100 and bteaches = 400.
instructor is to be used as build input. Partition it into five partitions, each
of size 20 blocks. This partitioning can be done in one pass.
Similarly, partition teaches into five partitions, each of size 80. This is also
done in one pass.
Therefore total cost, ignoring cost of writing partially filled blocks:
3(100 + 400) = 1500 block transfers +
2( ⌈100/3⌉ + ⌈400/3⌉ ) = 336 seeks
Hybrid Hash-Join
Useful when memory sizes are relatively large, and the build input is bigger
than memory.
Main feature of hybrid hash join:
Keep the first partition of the build relation in memory.
E.g. With memory size of 25 blocks, instructor can be partitioned into five
partitions, each of size 20 blocks.
Division of memory:
The first partition occupies 20 blocks of memory
1 block is used for input, and 1 block each for buffering the other 4
partitions.
teaches is similarly partitioned into five partitions each of size 80
the first is used right away for probing, instead of being written out
Cost of 3(80 + 320) + 20 + 80 = 1300 block transfers for
hybrid hash join, instead of 1500 with plain hash-join.
Hybrid hash-join is most useful if M >> √bs
Complex Joins
Join with a conjunctive condition:
r ⋈θ1∧θ2∧...∧θn s
Either use nested loops/block nested loops, or
Compute the result of one of the simpler joins r ⋈θi s
final result comprises those tuples in the intermediate result that
satisfy the remaining conditions
θ1 ∧ . . . ∧ θi–1 ∧ θi+1 ∧ . . . ∧ θn
Join with a disjunctive condition
r ⋈θ1∨θ2∨...∨θn s
Either use nested loops/block nested loops, or
Compute as the union of the records in individual joins r ⋈θi s:
(r ⋈θ1 s) ∪ (r ⋈θ2 s) ∪ . . . ∪ (r ⋈θn s)
Other Operations
Duplicate elimination can be implemented via hashing or sorting.
On sorting duplicates will come adjacent to each other, and all but one
set of duplicates can be deleted.
Optimization: duplicates can be deleted during run generation as well as
at intermediate merge steps in external sort-merge.
Hashing is similar – duplicates will come into the same bucket.
Projection:
perform projection on each tuple
followed by duplicate elimination.
Other Operations: Aggregation
Aggregation can be implemented in a manner similar to duplicate
elimination.
Sorting or hashing can be used to bring tuples in the same group
together, and then the aggregate functions can be applied on each group.
Optimization: combine tuples in the same group during run generation
and intermediate merges, by computing partial aggregate values
For count, min, max, sum: keep aggregate values on tuples found so
far in the group.
When combining partial aggregate for count, add up the
aggregates
For avg, keep sum and count, and divide sum by count at the end
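A sketch of the avg optimization in Python (group names and values are made up; each run produces partial (sum, count) pairs that are merged, with the division done only at the end):

def merge_partials(p1, p2):
    # combine per-group (sum, count) partial aggregates from two runs
    return {g: (p1.get(g, (0, 0))[0] + p2.get(g, (0, 0))[0],
                p1.get(g, (0, 0))[1] + p2.get(g, (0, 0))[1])
            for g in set(p1) | set(p2)}

run1 = {'dept_a': (300, 3)}                   # partial (sum, count) from one run
run2 = {'dept_a': (100, 1), 'dept_b': (50, 1)}
merged = merge_partials(run1, run2)
avgs = {g: s / c for g, (s, c) in merged.items()}
print(avgs)                                   # {'dept_a': 100.0, 'dept_b': 50.0}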
Other
Operations
Set operations (, and ): : Setcan either use variant of merge-join after
sorting, or variant of hash-join.
Operations
E.g., Set operations using hashing:
1. Partition both relations using the same hash function
2. Process each partition i as follows.
1. Using a different hashing function, build an in-memory hash index
on ri.
2. Process si as follows
r s:
1. Add tuples in si to the hash index if they are not already in
it.
2. At end of si add the tuples in the hash index to the result.
Other Operations: Set Operations (Cont.)
E.g., set operations using hashing:
1. as before, partition r and s,
2. as before, process each partition i as follows
1. build a hash index on ri
2. Process si as follows
r ∩ s:
1. output tuples in si to the result if they are already
there in the hash index
r – s:
1. for each tuple in si, if it is there in the hash index,
delete it from the index.
2. At end of si add remaining tuples in the hash index to
the result.
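A toy Python sketch of the per-partition processing (Python sets stand in for the in-memory hash index on ri):

def union_part(ri, si):
    index = set(ri)
    index.update(si)                          # add si tuples not already present
    return index

def intersect_part(ri, si):
    index = set(ri)
    return {t for t in si if t in index}      # output si tuples found in the index

def difference_part(ri, si):                  # r - s
    index = set(ri)
    for t in si:
        index.discard(t)                      # delete matching tuples from the index
    return index                              # remaining tuples are r - s

print(difference_part({1, 2, 3}, {2}))        # {1, 3}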
Other Operations: Outer Join
Outer join can be computed either as
A join followed by addition of null-padded non-participating tuples.
by modifying the join algorithms.
Modifying merge join to compute r ⟕ s
In r ⟕ s, non-participating tuples are those in r – ΠR(r ⋈ s)
Modify merge-join to compute r ⟕ s:
During merging, for every tuple tr from r that does not match any
tuple in s, output tr padded with nulls.
Right outer-join and full outer-join can be computed similarly.
Other Operations: Outer Join (Cont.)
Modifying hash join to compute r ⟕ s
If r is probe relation, output non-matching r tuples padded with nulls
If r is build relation, when probing keep track of which
r tuples matched s tuples. At end of si output
non-matched r tuples padded with nulls
Evaluation of Expressions