
Unit V

Concurrency Control Techniques


Disk Storage, Basic File Structures, and
Hashing
UNIT 5 Syllabus
Concurrency Control Techniques:
Two-Phase Locking Techniques for Concurrency
Control- Concurrency Control Based on
Timestamp Ordering.
Disk Storage, Basic File Structures, and Hashing:
Introduction, Secondary Storage Devices,
Buffering of Blocks, Placing File Records on Disk,
Operations on Files.
Transaction
• A transaction is a set of operations that are all
logically related
• A transaction is a single logical unit of work
formed by a set of operations
Operations in a transaction:

1. Read Operation
2. Write Operation
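The two operations above can be illustrated with a minimal sketch of a funds-transfer transaction. The data items, names, and amounts below are hypothetical, and a Python dictionary stands in for the database.

```python
# Hypothetical sketch: a transfer transaction expressed as
# read and write operations on data items A and B.
db = {"A": 500, "B": 200}

def transfer(amount):
    a = db["A"]          # read(A)
    a -= amount
    db["A"] = a          # write(A)
    b = db["B"]          # read(B)
    b += amount
    db["B"] = b          # write(B)

transfer(100)
print(db["A"], db["B"])  # 400 300
```

All four steps must succeed or fail together for the transfer to remain a single logical unit of work.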
Concurrency Control
Concurrency control in a Database Management
System is the procedure of managing simultaneous
operations so that they do not conflict with one another.
It ensures that database transactions are
performed concurrently and accurately to
produce correct results without violating the data
integrity of the respective database.
Potential Problems of Concurrency

• Here are some issues that concurrency control must guard against:
• Lost Updates occur when multiple transactions select the same row
and update it based on the value originally selected, so one update
overwrites the other
• Uncommitted dependency (dirty read) occurs when a second transaction
selects a row that has been updated, but not yet committed, by another
transaction
• Non-Repeatable Read occurs when a second transaction
accesses the same row several times and reads different data each time
• Incorrect Summary occurs when one transaction computes a summary
over the values of all the instances of a repeated data item while a second
transaction updates a few instances of that specific data item. In that
situation, the resulting summary does not reflect a correct result.
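The lost-update problem described above can be sketched with two interleaved transactions; the values and the interleaving below are hypothetical, chosen only to make the anomaly visible.

```python
# Hypothetical lost-update sketch: T1 and T2 both read X before
# either writes, so T1's update is silently overwritten by T2's.
X = 100

t1_local = X        # T1: read(X)  -> 100
t2_local = X        # T2: read(X)  -> 100
t1_local += 10      # T1 adds 10
t2_local -= 20      # T2 subtracts 20
X = t1_local        # T1: write(X) -> 110
X = t2_local        # T2: write(X) -> 80; T1's +10 is lost
print(X)            # 80, not the serial result 90
```

A serial execution (T1 fully before T2, or vice versa) would yield 90; the interleaving loses one update, which is exactly what concurrency control prevents.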
Uses Of Concurrency Control
• To apply isolation through mutual exclusion
between conflicting transactions
• To resolve read-write and write-write conflicts
• To preserve database consistency across
concurrent executions
• The system needs to control the interaction among
concurrent transactions. This control is achieved
using concurrency-control schemes.
• Concurrency control helps to ensure serializability
Concurrency Control Protocols
• Different concurrency control protocols offer
different trade-offs between the amount of
concurrency they allow and the amount of
overhead they impose. The following are the
concurrency control techniques in DBMS:
• Lock-Based Protocols
• Two Phase Locking Protocol
• Timestamp-Based Protocols
• Validation-Based Protocols
Two-Phase Locking Techniques for
Concurrency Control
Lock Management
• To provide concurrency control and prevent
uncontrolled data access, the database manager
places locks on data.
Lock:
• A lock is a variable associated with a data item that
describes the status of the item with respect to the
possible operations that can be applied to it. Generally,
there is one lock for each data item in the database.
Locks are used as a means of synchronizing access
by concurrent transactions to database items.
Lock Management
• A lock manager can be implemented as a separate process to
which transactions send lock and unlock requests
• The lock manager replies to a lock request by sending a lock
grant message (or a message asking the transaction to roll
back, in case of a deadlock)
• The requesting transaction waits until its request is answered
• The lock manager maintains a data-structure called a lock
table to record granted locks and pending requests
• The lock table is usually implemented as an in-memory hash
table indexed on the name of the data item being locked
Lock Table
• In the lock table figure, dark blue rectangles indicate granted
locks and light blue rectangles indicate waiting requests
• Lock table also records the type of lock
granted or requested
• New request is added to the end of the queue
of requests for the data item, and granted if it
is compatible with all earlier locks
• Unlock requests result in the request being
deleted, and later requests are checked to see
if they can now be granted
• If transaction aborts, all waiting or granted
requests of the transaction are deleted
– lock manager may keep a list of locks
held by each transaction, to implement
this efficiently
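The lock-table behavior described above can be sketched as an in-memory hash table of request queues. This is a simplified illustration, not a real lock manager: the transaction names, lock modes ("S"/"X"), and the compatibility table are assumptions, and a waiting request here is simply reported as not granted.

```python
from collections import defaultdict, deque

# Hypothetical lock-table sketch: a hash table keyed by data-item
# name; each entry is a queue of (transaction, mode) requests.
lock_table = defaultdict(deque)

# Shared (S) locks are mutually compatible; exclusive (X) conflicts
# with everything.
COMPAT = {("S", "S"): True, ("S", "X"): False,
          ("X", "S"): False, ("X", "X"): False}

def request_lock(txn, item, mode):
    queue = lock_table[item]
    # A new request is granted iff it is compatible with all
    # earlier requests in the queue (as described above).
    granted = all(COMPAT[(held_mode, mode)] for _, held_mode in queue)
    queue.append((txn, mode))
    return granted

def unlock(txn, item):
    # Delete the transaction's entry; later requests would then be
    # re-checked in a full implementation.
    lock_table[item] = deque(
        (t, m) for t, m in lock_table[item] if t != txn)

print(request_lock("T1", "A", "S"))  # True
print(request_lock("T2", "A", "S"))  # True  (shared locks coexist)
print(request_lock("T3", "A", "X"))  # False (must wait behind S locks)
```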
Lock Management
• Types of Locks
1.Binary Locks :
• A binary lock can have two states or values:
• locked and unlocked.
• A distinct lock is associated with each database item A. If the
value of the lock on A is 1, item A cannot be accessed by a
database operation that requests the item. If the value of the
lock on A is 0, the item can be accessed when requested.
• If LOCK (A) = 1, the transaction is forced to wait. If LOCK (A) =
0, it is set to 1 (the transaction locks the item) and the
transaction is allowed to access item A.
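The binary-lock rules above can be sketched directly; the dictionary and function names below are hypothetical, and "waiting" is modeled simply as the request returning False.

```python
# Hypothetical binary-lock sketch: LOCK(A) is 0 (unlocked) or 1 (locked).
LOCK = {"A": 0}

def lock_item(item):
    if LOCK[item] == 1:
        return False        # LOCK(A) = 1: the transaction must wait
    LOCK[item] = 1          # LOCK(A) = 0: set to 1 and proceed
    return True

def unlock_item(item):
    LOCK[item] = 0

print(lock_item("A"))   # True:  acquired
print(lock_item("A"))   # False: already locked, must wait
unlock_item("A")
print(lock_item("A"))   # True:  acquired again after unlock
```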
Lock Management
2. Shared/Exclusive Locks: In this type of protocol, a transaction cannot read or write a data
item until it acquires an appropriate lock on it. There are two types of locks:
1. Shared lock:
• It is also known as a read-only lock. Under a shared lock, the data item
can only be read by the transaction.
• It can be shared between transactions, because while a
transaction holds only a shared lock it cannot update the data
item.
2. Exclusive lock:
• Under an exclusive lock, the data item can be both read and
written by the transaction.
• The lock is exclusive, so multiple transactions cannot
modify the same data item simultaneously.
Lock Management
• 3. Simplistic Lock Protocol
• This type of lock-based protocol allows transactions to
obtain a lock on every object before beginning an operation.
Transactions may unlock the data item after finishing the
‘write’ operation.
• 4. Pre-claiming Locking
• The pre-claiming lock protocol evaluates a transaction's operations
and creates a list of the data items required before
initiating execution. Only when all
locks are granted does the transaction execute; after that, all
locks are released when all of its operations are over.
Two Phase Locking Protocol
• The Two-Phase Locking protocol, also known as the 2PL protocol, is a method of
concurrency control in DBMS that ensures serializability by applying
locks to transaction data, which blocks other transactions from accessing
the same data simultaneously. The Two-Phase Locking protocol helps to
eliminate concurrency problems in DBMS.
• This locking protocol divides the execution of a transaction into
three parts.
• In the first phase, when the transaction begins to execute, it requests
permission for the locks it needs.
• In the second part, the transaction acquires all the locks. When the
transaction releases its first lock, the third phase starts.
• In this third phase, the transaction cannot demand any new locks.
Instead, it only releases the acquired locks.
Two Phase Locking Protocol
Strict Two-Phase Locking Method

• Strict two-phase locking is similar to 2PL. The only difference is that
Strict-2PL never releases a lock immediately after using it. It holds all the locks until the commit
point and releases them all in one go when the transaction is over.
• Centralized 2PL
• In centralized 2PL, a single site is responsible for the lock management process. It has
only one lock manager for the entire DBMS.
• Primary copy 2PL
• In the primary copy 2PL mechanism, many lock managers are distributed across different
sites, and a particular lock manager is responsible for managing the locks for
a set of data items. When the primary copy has been updated, the change is
propagated to the slaves.
• Distributed 2PL
• In this kind of two-phase locking mechanism, lock managers are distributed across all
sites, and each is responsible for managing locks for the data at its site. If no data is
replicated, it is equivalent to primary copy 2PL. The communication costs of
distributed 2PL are considerably higher than those of primary copy 2PL.
Two Phase Locking Protocol

• The Two-Phase Locking protocol allows each
transaction to make lock and unlock requests
in two phases:
• Growing Phase: In this phase, a transaction may
obtain locks but may not release any locks.
• Shrinking Phase: In this phase, a transaction
may release locks but may not obtain any new locks.
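The growing/shrinking rule can be sketched as a small class that tracks which phase a transaction is in. This is an illustration of the 2PL rule only, under assumed names; it does not model blocking or a lock manager.

```python
# Hypothetical 2PL sketch: once a transaction releases any lock
# (entering the shrinking phase), new lock requests are refused.
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after unlock")
        self.held.add(item)     # growing phase: may obtain locks

    def unlock(self, item):
        self.shrinking = True   # first unlock starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")                # growing phase
t.unlock("A")              # shrinking phase begins
try:
    t.lock("C")            # not allowed under 2PL
except RuntimeError as e:
    print(e)               # 2PL violation: lock after unlock
```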
Concurrency Control with
Timestamping Methods
Timestamp Based Protocol
• The most commonly used concurrency protocol is the timestamp-based
protocol. This protocol uses either the system time or a logical counter as a
timestamp.
• Lock-based protocols manage the order between conflicting pairs of
transactions at the time of execution, whereas timestamp-based
protocols start working as soon as a transaction is created.
• A timestamp is a unique identifier created by the DBMS to identify a
transaction. Each transaction is issued a timestamp when it enters the
system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj
is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
Timestamp Based Protocol
• The timestamp ordering protocol ensures
serializability among transactions in their conflicting
read and write operations. It is the responsibility
of the protocol system to ensure that each conflicting pair of
operations is executed according to the timestamp
values of the transactions.
Timestamp Based Protocol
The timestamp of transaction Ti is denoted TS(Ti).
The read timestamp of data item X is denoted R-timestamp(X).
The write timestamp of data item X is denoted W-timestamp(X).

The timestamp ordering protocol works as follows.

If a transaction Ti issues a read(X) operation:

If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.

If TS(Ti) >= W-timestamp(X), the operation is executed, and
R-timestamp(X) is set to the maximum of R-timestamp(X) and TS(Ti).

If a transaction Ti issues a write(X) operation:

If TS(Ti) < R-timestamp(X), the operation is rejected and Ti is rolled back.

If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back. Otherwise, the operation is
executed and W-timestamp(X) is updated to TS(Ti).
Timestamp Based Protocol
• Read Operations
• For read operations, if TS(Ti) < W-TS(X), this violates the time-stamp order of Ti with
regard to the previous writer of X. Thus, Ti is aborted and restarted with a new time-
stamp.

• Otherwise, the read is valid, and Ti is allowed to read X. The DBMS then updates R-
TS(X) to be the max of R-TS(X) and TS(Ti). It also has to make a local copy of X to ensure
repeatable reads for Ti.

• Write Operations
• For write operations, if TS(Ti) < R-TS(X) or TS(Ti) < W-TS(X), Ti must be restarted.
Otherwise, the DBMS allows Ti to write X and updates W-TS(X). Again, it needs to make
a local copy of X to ensure repeatable reads for Ti.

• Thomas Write Rule
• An optimization for writes: if TS(Ti) < W-TS(X), the DBMS can instead ignore the write
and allow the transaction to continue rather than aborting and restarting it. This is called
the Thomas Write Rule.
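The read and write rules above, including the Thomas Write Rule, can be sketched as two small check functions. The timestamps and the single data item X below are hypothetical, and aborting is modeled simply as returning "abort".

```python
# Hypothetical timestamp-ordering sketch with the Thomas Write Rule.
# R_TS/W_TS hold the largest read/write timestamps seen for each item.
R_TS, W_TS = {"X": 0}, {"X": 0}

def read(ts, item):
    if ts < W_TS[item]:
        return "abort"                  # Ti read too late: roll back
    R_TS[item] = max(R_TS[item], ts)    # update the read timestamp
    return "read ok"

def write(ts, item):
    if ts < R_TS[item]:
        return "abort"                  # a younger txn already read X
    if ts < W_TS[item]:
        return "ignored"                # Thomas Write Rule: skip obsolete write
    W_TS[item] = ts
    return "write ok"

print(read(5, "X"))    # read ok   (R_TS becomes 5)
print(write(10, "X"))  # write ok  (W_TS becomes 10)
print(write(7, "X"))   # ignored   (7 < W_TS=10, but 7 >= R_TS=5)
print(read(3, "X"))    # abort     (3 < W_TS=10)
```

Without the Thomas Write Rule, the third call would roll the transaction back instead of continuing.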
Timestamp Based Protocol
Advantages
• The time-stamp ordering protocol ensures
conflict serializability. This is because
conflicting operations are processed in time-
stamp order.
• The timestamp-based protocol ensures
freedom from deadlock since no transaction
ever waits.
UNIT V
Disk Storage, Basic File Structures, and Hashing
Introduction, Secondary Storage Devices,
Buffering of Blocks, Placing File Records on Disk,
Operations on Files.
DISK STORAGE
 Storage hierarchy
1) Primary storage
 storage media that can be operated on directly
by the CPU
ex: RAM (main memory), cache memory
2) Secondary storage
ex: magnetic disks, optical disks, and tapes
 larger capacity, lower cost, slower access than
primary storage devices
Tertiary storage
3) Tertiary storage:
Tertiary memory is the third storage level, just below
secondary storage. It involves removable mass storage media
that can be mounted and unmounted automatically, without a
human operator. The main use of tertiary storage is to archive
data that no longer needs to be accessed regularly; it is mainly
used for storing large volumes of data and for backing up data.
Tertiary storage devices are very cost-effective, and they offer
large storage capacities with the help of robotic arms that
mount and unmount the removable tapes or disks.
Secondary storage devices
• Secondary storage, also known as auxiliary storage or external
memory, is a type of data storage that provides non-volatile, long-
term storage for computer systems. Unlike primary storage (e.g.,
RAM) which is directly accessible by the Central Processing Unit
(CPU) and is volatile, meaning it loses data when the computer is
switched off, secondary storage retains data even after the
system is powered down.
• Data is stored on external storage devices like disks and tapes,
and fetched into memory when needed for processing. The unit
of information read from or written to disk is called a page. The
size of a page is typically 4 or 8 KB. Each record in a file has a
unique id called a record id (rid). We use the term data entry to refer to
the records stored in an index file.
Secondary Storage Devices

• Hard Disk Drive (HDD): A traditional storage device that uses spinning magnetic disks to
store and access data.
• Solid-State Drive (SSD): A storage device that uses flash memory and has no moving
parts, resulting in faster data access and greater reliability than HDDs.
• USB Flash Drive: A small, portable storage device that also uses flash memory and is
typically connected to a computer via a USB port.
• Optical Discs (CD, DVD, Blu-ray): A storage medium that uses laser technology to read
and write data on plastic discs, often used for multimedia files or software distribution.
• Cloud Storage: A storage service that allows users to save and access data on remote
servers via the internet, enabling easy sharing and accessibility from multiple devices.
Secondary Storage Devices

• Hard disk drives are the most common secondary storage devices in
present computer systems. These are called magnetic disks because
they use the concept of magnetization to store information. Hard
disks consist of metal disks coated with magnetizable material. These
disks are placed vertically on a spindle. A read/write head moves in
between the disks and is used to magnetize or de-magnetize the spot
under it. A magnetized spot can be recognized as 0 (zero) or 1 (one).

• Hard disks are formatted in a well-defined order to store data
efficiently. A hard disk platter has many concentric circles on it, called
tracks. Every track is further divided into sectors. A sector on a hard
disk typically stores 512 bytes of data.
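The geometry above lets us compute a disk's raw capacity. The numbers below (platters, tracks, sectors per track) are an assumed example geometry, not the spec of any real drive; only the 512-byte sector size comes from the text.

```python
# Hypothetical capacity arithmetic for an assumed disk geometry:
# 4 platters with 2 recording surfaces each, 10,000 tracks per
# surface, 500 sectors per track, 512 bytes per sector.
surfaces = 4 * 2
tracks_per_surface = 10_000
sectors_per_track = 500
bytes_per_sector = 512

capacity = (surfaces * tracks_per_surface
            * sectors_per_track * bytes_per_sector)
print(capacity)               # 20480000000 bytes
print(capacity / 10**9, "GB") # 20.48 GB
```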
Secondary Storage Devices
• Disks: can retrieve a random page at fixed cost,
but reading several consecutive pages is much
cheaper than reading them in random order
• Tapes: can only read pages in sequence;
cheaper than disks, and used for archival storage
Placing File Records on
Disk
Hardware description of disk devices
File organization
 storage of DBs
 data stored on disk is organized as files of records
 each record is a collection of data values that can be
interpreted as facts about entities, their attributes, and their
relationships
 primary file organizations: determines how the records of a file
are physically placed on the disk, and hence how the records
can be accessed
1. heap: no particular order
2. sequential file: records are ordered
3. hashed file: uses a hash function applied to a particular field
4. B-tree: uses tree structures
File organization
 File organization: a method of arranging a file of
records on external storage
 A record id (rid) is sufficient to physically locate a record
 An index is a data structure that organizes data
records on a disk to optimize certain kinds of retrieval
operations
 The buffer manager brings pages from external storage
into the main memory buffer pool.
 The file and index layers make calls to the buffer
manager.
File
• A file is a collection of pages, each one
containing a collection of records
• A file organization should support the
following operations:
• – insert/delete/update a record
• – read the record specified by its rid
• – scan all the records, possibly restricting attention to the
records satisfying some given condition
Pages and records
• The usual size of a page is that of a disk block
• A page is physically constituted by a set of slots
• A slot is a memory space that may contain one
record (typically, all slots of a page contain records of one
relation) and has a number that identifies it within the
page
• Each record has an identifier (record id, or rid)
• rid = <page id, slot number>
File Organization
• File Organization defines how file records are
mapped onto disk blocks. We have four types
of File Organization to organize file records
File organization
Heap File Organization
• When a file is created using heap file organization, the operating system allocates a memory area to that file without any
further accounting details. File records can be placed anywhere in that memory area, and it is the responsibility of the software to
manage the records. A heap file does not support any ordering, sequencing, or indexing on its own.

Sequential File Organization
• Every file record contains a data field (attribute) that uniquely identifies the record. In sequential file organization, records are
placed in the file in some sequential order based on the unique key field or search key. Practically, it is not possible to store all
the records sequentially in physical form.

Hash File Organization
• Hash file organization applies a hash function to some fields of the records. The output of the hash function
determines the location of the disk block where the records are to be placed.

Clustered File Organization
• Clustered file organization is not considered good for large databases. In this mechanism, related records from one or more
relations are kept in the same disk block; that is, the ordering of records is not based on a primary key or search key.
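Hash file organization, in particular, can be sketched with a few lines: the hash of a key field picks the bucket (standing in for a disk block) where a record is placed. The bucket count, record fields, and helper names below are hypothetical.

```python
# Hypothetical hash-file-organization sketch: hashing the key field
# determines which block (bucket) a record is stored in.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    block = hash(record["id"]) % NUM_BUCKETS   # hash on the key field
    buckets[block].append(record)
    return block

def find(key):
    # Lookup touches only the one block the hash points at.
    block = hash(key) % NUM_BUCKETS
    return next((r for r in buckets[block] if r["id"] == key), None)

insert({"id": 101, "name": "Asha"})
insert({"id": 205, "name": "Ravi"})
print(find(205))   # {'id': 205, 'name': 'Ravi'}
```

The payoff is that an equality search on the key reads a single block instead of scanning the whole file.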
Operation on Files
• Typical file operations include:
– OPEN: Readies the file for access, and associates a pointer that will refer to a current file record at each
point in time.
– FIND: Searches for the first file record that satisfies a certain condition, and makes it the current file
record.
– FINDNEXT: Searches for the next file record (from the current record) that satisfies a certain condition,
and makes it the current file record.
– READ: Reads the current file record into a program variable.
– INSERT: Inserts a new record into the file and makes it the current file record.
– DELETE: Removes the current file record from the file, usually by marking the record to indicate that it is
no longer valid.
– MODIFY: Changes the values of some fields of the current file record.
– CLOSE: Terminates access to the file.
– REORGANIZE: Reorganizes the file records.
• For example, the records marked deleted are physically removed from the file or a new organization
of the file records is created.
– READ_ORDERED: Read the file blocks in order of a specific field of the file.
HASHING
What is Buffering of Blocks?
• Buffering refers to the temporary storage of data in a buffer, a small, fixed-size
area in memory, while the data is being moved from one place to another. When data is
transferred from one location to another, it is often necessary to store it
temporarily in a buffer to ensure that the transfer is smooth and efficient.

• There are two main types of buffering: input buffering and output buffering. Input
buffering refers to the temporary storage of data that is being received from an
external source, such as a file on a hard drive or data being transmitted over a
network. Output buffering refers to the temporary storage of data that is being sent
to an external destination, such as a printer or a file on a hard drive.

• One common application of buffering is in the transfer of blocks of data. When a
large amount of data is being transferred, it is often more efficient to transfer it in
smaller blocks rather than all at once, because transferring data in smaller
blocks allows the system to process the data more efficiently and reduces the risk
of errors or delays.
Buffering of Blocks
 Benefits of Buffering of Blocks

 Improved performance − Buffering allows data to be transferred more efficiently,
which can improve the overall performance of the system.

 Error detection and recovery − By transferring data in smaller blocks, it is easier to
detect and recover from errors that may occur during the transfer process.

 Reduced risk of data loss − Buffering can help to prevent data loss by temporarily
storing data in a buffer before it is written to a permanent storage location.

 Greater flexibility − Buffering allows data to be transferred asynchronously, which
means that the data can be transferred at a time that is convenient for the system,
rather than all at once.
Buffering of Blocks
Database Management
• In database management, buffering is used to
temporarily store data as it is being written to
or read from a database. For example, when
you update a record in a database, the
changes may be temporarily stored in a buffer
before they are written to the database. This
helps to ensure that the database is updated
efficiently and reduces the risk of data loss.
Buffering of Blocks
 How to Buffer Blocks?
 There are several ways to implement block buffering, and the approach that you choose will depend
on your specific requirements and the constraints of your system. Some common methods include −

 Fixed-size block buffering − In this approach, the buffer is divided into a fixed number of blocks, and
each block is given a fixed size. When data is written to the buffer, it is divided into blocks of the
specified size and written to the appropriate block in the buffer. This approach is simple to
implement, but it can be inefficient if the block size does not match the size of the data being
written.

 Dynamic block buffering − In this approach, the size of the blocks in the buffer is not fixed. Instead,
the buffer is divided into a series of linked blocks, and the size of each block is determined by the
amount of data that it contains. This approach is more flexible than fixed-size block buffering, but it
can be more complex to implement.

 Circular block buffering − In this approach, the buffer is treated as a circular buffer, with data being
written to the buffer and then overwriting the oldest data as the buffer becomes full. This approach
is simple to implement and can be efficient, but it can lead to data loss if the data is not processed
quickly enough.
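The circular block buffering approach above can be sketched as a fixed array of slots with a wrapping write index; the class and slot names below are hypothetical, and the data-loss risk shows up as the oldest block being overwritten.

```python
# Hypothetical circular block buffer: a fixed number of slots;
# when the buffer is full, writes overwrite the oldest block.
class CircularBuffer:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots
        self.next = 0                       # index of the oldest slot

    def write_block(self, block):
        evicted = self.slots[self.next]     # oldest data may be lost
        self.slots[self.next] = block
        self.next = (self.next + 1) % len(self.slots)
        return evicted

buf = CircularBuffer(3)
for b in ["b1", "b2", "b3", "b4"]:
    buf.write_block(b)
print(buf.slots)   # ['b4', 'b2', 'b3'] -- b1 was overwritten
```

If consumers do not drain blocks before the write index wraps around, data is lost, which is exactly the trade-off noted above.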
