Preface
Abbreviations
Introduction
o Overview of DBMS
o Overview of MMDB
Storage Engine
o Database Management
o Allocator
o Transaction
o Concurrency
o Logging
o Recovery
o Table
o Index
SQL Engine
o Query Representation
o Query Optimization
o Query Execution
Interface Drivers
o ODBC
o JDBC
Authors
1. Prabakaran Thirumalai
Related Books
Structured Query Language
Python Programming/Database Programming
Operating System Design
Embedded Control Systems Design/Operating systems -- other real-time operating systems
Chapter 1: Overview
Contents
1.1 Introduction
1.2 Database
1.3 Database Management Systems
1.3.1 Benefits of Database Approach
1.4 Database System Types
1.4.1 Hierarchical DBMS
1.4.2 Network DBMS
1.4.3 Relational DBMS
1.4.4 Main Memory DBMS
1.4.5 Column or Vertical DBMS
1.4.6 Stream Processing DBMS
1.4.7 Object Relational DBMS
1.4.8 Distributed DBMS
1.5 Database Related Roles and Responsibilities
1.6 Programming Interfaces
The Generalized Update Access Method (GUAM) was a hierarchical database system developed in the early 1960s by Rockwell. Rockwell
developed this software to manage the data usually associated with manufacturing operations. IBM introduced the Information
Management System (IMS), a hierarchical database management system, soon after that. The 1970s was the era of relational database
technology. Dr. Codd's paper on the relational model revolutionized the thinking on data systems. The industry quickly responded to
the superiority of the relational model by adapting their products to it. During the 1980s, database systems gained a lot of
ground and a large percentage of businesses made the transition from file-oriented data systems to database systems. Leading
products such as ORACLE, DB2, SQL Server, Informix and Sybase started ruling the database world with their flagship
relational database management systems (RDBMS).
The relational model matured in the 1990s and became the leading data model. Towards the end of the 90s, object-oriented databases gained
popularity. Though the transition has started, applications that were already developed on the relational model are reluctant to move to
the object-oriented model. It is mostly new applications that use the object model, as it allows the user to design the application
more naturally than the relational model.
Most of the leading database management systems support the object-oriented model. Many of them provide object-to-relational mapping to
achieve object model support. For example, DB2 is a relational, hierarchical (XML), and object-oriented database management system.
Commercial DBMS
Oracle
DB2
Microsoft SQL Server
Sybase
Informix
MySQL
Postgres
Firebird
CSQL
A database is a collection of related data, designed, built and populated with data for a specific purpose.
A database can be of any size and of varying complexity. For example, the list of names and addresses of employees may consist of
a few hundred to a few thousand records, depending on the organization's size. On the other hand, there are databases with a huge number of
records, for example the database maintained by an Income Tax Department to keep track of the returns filed by taxpayers. In India, let's say there
are around 1 billion taxpayers, and each taxpayer files approximately 500 characters of information per form; we would then get a
database of 10^9 * 500 bytes = 500 gigabytes (GB) of data. For keeping at least the past three years of returns, we need 1.5 terabytes (TB) of
space. This huge amount of information must be organized and managed so that users can search and update the data as needed. There
are also databases that are highly complex in nature because of the relationships that exist between the records. A railway reservation
database is a good example of a complex database.
The database approach includes the fundamental operations that can be applied to data. Every database management system provides
the following basic operations:
READ data contained in the database
INSERT data into the database
UPDATE individual parts of the data in the database
DELETE portions of the data in the database
Data is organized into a tree-like structure, which allows repeating information using parent/child one-to-many relationships. A parent
can have many children, but a child has only one parent. This model was widely used in the first mainframe database management
systems. The most common form of hierarchical model in use currently is the LDAP model. This model gained popularity again with
the recent XML databases.
An XML database is a data persistence software system that allows data to be imported, accessed and exported in the XML format.
Two major classes of XML database exist:
XML-enabled: These map all XML to a traditional database (such as a relational database), accepting XML as input and
rendering XML as output.
Native XML (NXD): The internal model of such databases depends on XML and uses XML documents as the fundamental
unit of storage.
1.4.2 Network DBMS
This model is an extension of the hierarchical data model in which each record can have multiple parent and multiple child records. In
effect, it supports many-to-many relationships. It provides a flexible way to represent objects and their relationships. But before it gained
popularity, a new model, the relational model, was proposed, and it replaced the network database model almost as soon as it appeared.
Dr. E. F. Codd, the father of the relational model, stipulated the rules and proposed this model. Data is represented as mathematical n-
ary relations, an n-ary relation being a subset of the Cartesian product of n domains. "Relation" is a mathematical term for "table", and
thus "relational" roughly means, "based on tables".
The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. In
accordance with this Principle, a relational database is a set of relations and the result of every query is presented as a relation.
The basic relational building block is the domain or data type. A tuple is an unordered set of attribute values. An attribute is an ordered
pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar
value or a more complex type. A relation is defined as a set of n-tuples. A table in a relational database, alternatively known as a
relation, is a two-dimensional structure used to hold related information. A database consists of one or more related tables. Don’t
confuse a relation with relationships. A relation is essentially a table, and a relationship is a way to correlate, join, or associate the two
tables.
A row in a table is a collection or instance of one thing, such as one employee or one line item on an invoice. A column contains all
the information of a single type, and the piece of data at the intersection of a row and a column, a field, is the smallest piece of
information that can be retrieved with the database’s query language.
The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by constraints,
declared as part of the logical schema and enforced by the DBMS for all applications. The relational model establishes the connections
between related data occurrences by means of logical links implemented through foreign keys.
The relational model defines operations such as select, project and join. Although these operations may not be explicit in a particular
query language, they provide the foundation on which a query language is built.
SQL, which stands for Structured Query Language, supports the database components in virtually every modern relational database
system. SQL has been refined and improved by the American National Standards Institute (ANSI) for more than 20 years.
ANSI standardized the query language used to access relational databases: SQL (Structured Query Language). All database vendors
developed SQL engines on top of their relational engines to interpret and execute these SQL statements. This led to the emergence of standard
interfaces in programming languages as well: ODBC for C and JDBC for Java became the de facto standards to access the
SQL engine.
This book is mainly focused on relational database management systems (RDBMS), as other database systems are built on top of the
RDBMS and have nothing additional to gain from the main memory nature of MMDBs.
Disk and memory capacities continue to grow much faster than latency and bandwidth improve. Today, multi-terabyte RAM
scans take minutes, while terabyte disk scans take many hours. We can now keep the whole database in memory, design our data
structures and algorithms intelligently, use multi-processors sharing a massive main memory, and use the precious disk
bandwidth intelligently. Database engines need to overhaul their algorithms to deal with the fact that main memories are huge (billions of
pages, trillions of bytes). Main memory database implementations have proved that they can execute queries ten to twenty times faster
than the traditional approach. The era of main memory databases has finally arrived. In this book we will discuss this type of
database management system.
Database Administrators take care of the database itself, the DBMS and related software. They are responsible for authorizing access to
the database, monitoring database usage, and acquiring software and hardware resources as needed.
Database Designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures
to represent and store this data. They interact with specialized users and develop 'views' of the database that meet their
application requirements.
System Analysts / Software Engineers thoroughly understand the functionality of the DBMS so as to implement
applications that meet their complex application requirements.
DBMS Kernel Developers are persons who design and implement the DBMS interfaces and modules as a software package.
DBMS Tool Developers include persons who develop tools to access and use DBMS software. Typical packages include
database design, performance monitoring, GUI tools, etc.
This book is mainly focused on persons who perform the 'DBMS Kernel Developer' role.
Embedded SQL – embedding SQL commands in a general purpose programming language. Database statements are
embedded into the programming language and are identified by a preprocessor through the prefix "EXEC SQL". The preprocessor
converts these statements into DBMS-generated code. Examples are ESQL for C/C++ and SQLJ for Java.
Native language drivers – a standard interface on top of SQL commands. These provide functions to connect to the database, execute
statements, retrieve resultant records, etc. ODBC and JDBC are examples.
Proprietary languages/drivers – PL/SQL, PHP drivers. MySQL provides a PHP driver to access the database.
Chapter 2: Introduction to Database Management Systems
Contents
2.1 Overview
2.2 Driver Interfaces
2.3 SQL Engine
2.4 Transaction Engine
2.5 Relational Engine
2.6 Storage Engine
2.7 SELECT Execution Sequence
2.8 INSERT Execution Sequence
Figure 1 shows the DBMS components, memory layout and disk files associated with a relational database management system. From the
early days of database system evolution, disk has been considered the backing store for the data to achieve durability. The architecture
above applies to disk resident database systems (DRDB). Nowadays there are two other approaches besides DRDBs: main memory databases and network databases.
Most of the components in the DRDB system architecture above are present in main memory and network databases as well.
2.2 Driver Interfaces
A user or application program shall initiate either schema modification or content modification. These application requests are broadly
classified by SQL as Data Definition Language (DDL), Data Manipulation Language (DML) and Data Control Language (DCL).
DDL deals with schema modifications; DML deals with content modifications; DCL deals with user access and privilege
modifications. If the application program is written in C/C++, it shall use ODBC drivers to connect to the DBMS, or if it is written in
Java, it shall use JDBC drivers to connect to DBMS. Some vendors provide language specific proprietary interfaces. For example
MySQL provides drivers for PHP, Python, etc.
These drivers are built on top of SQL. They provide methods to prepare statements, execute statements, fetch results, etc.
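As an illustration, the following is a minimal C++ sketch of the ODBC call sequence (allocate handles, connect, execute, fetch). The data source name "mydsn", the credentials and the Employee table are placeholders, and error checking is omitted for brevity.

// Hypothetical DSN, credentials and table; error handling trimmed for brevity.
#include <sql.h>
#include <sqlext.h>
#include <cstdio>

int main() {
    SQLHENV env;  SQLHDBC dbc;  SQLHSTMT stmt;

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    // Connect using a data source name configured for the target DBMS.
    SQLConnect(dbc, (SQLCHAR*)"mydsn", SQL_NTS,
               (SQLCHAR*)"user", SQL_NTS, (SQLCHAR*)"pass", SQL_NTS);

    // Execute a statement and fetch the result set row by row.
    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    SQLExecDirect(stmt, (SQLCHAR*)"SELECT empId, empName FROM Employee", SQL_NTS);

    SQLINTEGER id;  SQLCHAR name[64];
    SQLLEN idInd, nameInd;
    SQLBindCol(stmt, 1, SQL_C_SLONG, &id, 0, &idInd);
    SQLBindCol(stmt, 2, SQL_C_CHAR, name, sizeof(name), &nameInd);

    while (SQLFetch(stmt) == SQL_SUCCESS)
        std::printf("%d %s\n", (int)id, name);

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}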
Compiler – builds a data structure from the SQL statement and then does semantic checking on the query such as whether the table
exists, field exists, etc.
Optimizer – transforms the initial query plan (the data structure created by the compiler) into a sequence of operations, usually pipelined
together, to achieve fast execution. It refers to the metadata (dictionary) and statistical information stored about the data to decide which
sequence of operations is likely to be faster, and based on that it creates the optimal query plan. Both cost-based and rule-based optimizers are
used in DRDBs.
Execution Engine – executes each step in the query plan chosen by optimizer. It interacts with the relational engine to retrieve and
store records.
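Execution engines are often structured as a pipeline of operators that share a common iterator interface. The following is a hedged C++ sketch of such a pipeline, assuming an open/next/close style of operator; the class names and the in-memory table are illustrative and not taken from any particular product.

#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Each plan step implements the same iterator interface so steps can be pipelined.
struct Operator {
    virtual void open() = 0;
    virtual std::optional<Row> next() = 0;   // returns a row, or nothing when exhausted
    virtual void close() = 0;
    virtual ~Operator() = default;
};

// Leaf operator: scans an in-memory table.
struct TableScan : Operator {
    const std::vector<Row>& table;
    size_t pos = 0;
    explicit TableScan(const std::vector<Row>& t) : table(t) {}
    void open() override { pos = 0; }
    std::optional<Row> next() override {
        if (pos < table.size()) return table[pos++];
        return std::nullopt;
    }
    void close() override {}
};

// Filter operator: pulls rows from its child and passes on those matching the predicate.
struct Filter : Operator {
    Operator& child;
    std::function<bool(const Row&)> pred;
    Filter(Operator& c, std::function<bool(const Row&)> p) : child(c), pred(std::move(p)) {}
    void open() override { child.open(); }
    std::optional<Row> next() override {
        while (auto r = child.next())
            if (pred(*r)) return r;
        return std::nullopt;
    }
    void close() override { child.close(); }
};

int main() {
    std::vector<Row> emp = {{"1", "Anita"}, {"2", "Ravi"}, {"3", "Anita"}};
    TableScan scan(emp);
    Filter plan(scan, [](const Row& r) { return r[1] == "Anita"; });  // WHERE empName = 'Anita'
    plan.open();
    while (auto r = plan.next()) std::cout << (*r)[0] << " " << (*r)[1] << "\n";
    plan.close();
}

Here the Filter pulls rows from the TableScan one at a time, so no intermediate result needs to be materialized between the plan steps.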
Durability: once the transaction completes, the effect of the transaction on the database must never be lost.
All of the above properties are explained in detail in the Transaction chapter.
Concurrency Manager – responsible for concurrent, synchronized access to data. This is usually implemented using latches and
locks. Latches (or mutexes) are acquired and released for short-duration synchronization, and locks are used for long-duration synchronization.
Log Manager – responsible for the atomicity and durability properties of transactions. Undo logs make sure that a transaction rollback takes
the database back to the consistent state that existed when that transaction started. Redo logs make sure that all committed transactions can
be recovered in case of a crash.
Recovery Manager – responsible for recovering the database from the disk image and redo log files. Most databases use a
technique called 'shadow paging' to maintain a consistent image of memory on disk.
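As a rough illustration of the undo and redo information described above, here is a hedged C++ sketch of a log record and of how rollback and crash recovery might use it; the structure and field names are assumptions, not the layout of any real DBMS.

#include <cstdint>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Illustrative log record: the before-image supports undo (rollback),
// the after-image supports redo (recovering committed work after a crash).
struct LogRecord {
    uint64_t    txnId;
    uint64_t    recordId;      // which tuple was changed
    std::string beforeImage;   // value prior to the update (undo information)
    std::string afterImage;    // value after the update (redo information)
};

// Undo: scan the transaction's log records backwards and restore before-images.
void rollback(uint64_t txnId, const std::vector<LogRecord>& log) {
    for (auto it = log.rbegin(); it != log.rend(); ++it)
        if (it->txnId == txnId)
            std::cout << "restore record " << it->recordId
                      << " to '" << it->beforeImage << "'\n";
}

// Redo: after a crash, replay after-images of committed transactions in log order.
void redo(const std::vector<LogRecord>& log, const std::set<uint64_t>& committed) {
    for (const auto& r : log)
        if (committed.count(r.txnId))
            std::cout << "reapply record " << r.recordId
                      << " = '" << r.afterImage << "'\n";
}

int main() {
    std::vector<LogRecord> log = {{1, 42, "100", "90"}, {2, 43, "5", "6"}};
    rollback(2, log);          // undoes transaction 2 only
    redo(log, {1});            // reapplies only committed transaction 1
}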
Field – abstracts column-level information including type, length, etc.
Catalog – maintains metadata about the relational database objects such as tables, indexes, triggers, fields, etc.
Table – responsible for insert, update, delete, fetch, execute. It interacts with the allocator subsystem of storage engine, which in turn
talks to buffer manager to get the job done.
Index – responsible for insert, update, delete, and scan of index nodes for all index types. Popular index types are hash and tree. Hash
index is used for improving the point lookup (predicate with equality on primary key) and tree index is used for improving the range
query (predicate with greater or less than operator on key).
Expression Engine – represents the predicate (the WHERE clause of an SQL statement) of a data retrieval operation and is responsible for
evaluating the expressions, which may include arithmetic, comparison, and logical expressions.
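The following is a small, illustrative C++ sketch of such an expression tree, limited to integer fields, comparison nodes and AND/OR nodes; the class names are assumptions made for this example.

#include <iostream>
#include <map>
#include <memory>
#include <string>

using Record = std::map<std::string, int>;   // field name -> value; integer fields only, for brevity

// A predicate is a tree of expression nodes evaluated against one record at a time.
struct Expr {
    virtual bool eval(const Record& r) const = 0;
    virtual ~Expr() = default;
};

// Comparison node, e.g.  age > 30
struct Compare : Expr {
    std::string field; char op; int value;
    Compare(std::string f, char o, int v) : field(std::move(f)), op(o), value(v) {}
    bool eval(const Record& r) const override {
        int x = r.at(field);
        switch (op) {
            case '<': return x < value;
            case '>': return x > value;
            default : return x == value;
        }
    }
};

// Logical node combining two sub-expressions with AND ('&') or OR ('|').
struct Logical : Expr {
    char op;  std::unique_ptr<Expr> lhs, rhs;
    Logical(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    bool eval(const Record& r) const override {
        return op == '&' ? (lhs->eval(r) && rhs->eval(r))
                         : (lhs->eval(r) || rhs->eval(r));
    }
};

int main() {
    // WHERE age > 30 AND salary < 5000
    Logical where('&',
                  std::make_unique<Compare>("age", '>', 30),
                  std::make_unique<Compare>("salary", '<', 5000));
    Record rec = {{"age", 35}, {"salary", 4000}};
    std::cout << std::boolalpha << where.eval(rec) << "\n";   // prints true
}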
Buffer Manager – responsible for loading pages from disk into memory and for managing the buffer pool using a Least Recently Used
(LRU) algorithm (a small LRU sketch follows these component descriptions). It also has a special-purpose allocator for storing control
information, which is transient. The buffer pool is the memory space used by the buffer manager to cache the disk pages associated with
records, index information and metadata. Some database systems impose a space limit on the buffer pool at the individual level and some
at the global level.
File Manager – a database in a DRDB is essentially a physical file on disk. The file manager maps disk pages of the file to memory
pages and performs the actual disk I/O operations for the major faults generated by the buffer manager module.
Process Manager – responsible for registering and deregistering database application processes and threads and for accounting for all the
resources (transactions, locks, latches) acquired by them.
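To make the buffer manager's LRU policy concrete, here is a hedged C++ sketch of an LRU page cache keyed by page id; pinning, dirty-page write-back and the actual disk I/O are left out, and all names are illustrative.

#include <cstdint>
#include <iostream>
#include <list>
#include <unordered_map>

// Minimal LRU buffer pool: page ids map to cached frames; the least recently
// used page is evicted when the pool is full.
class BufferPool {
    size_t capacity;
    std::list<uint64_t> lru;                                   // front = most recently used
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> pos;
public:
    explicit BufferPool(size_t cap) : capacity(cap) {}

    // Returns true on a cache hit; on a miss the page is "read from disk" and cached.
    bool getPage(uint64_t pageId) {
        auto it = pos.find(pageId);
        if (it != pos.end()) {                 // hit: move the page to the MRU position
            lru.splice(lru.begin(), lru, it->second);
            return true;
        }
        if (lru.size() == capacity) {          // miss with a full pool: evict the LRU page
            uint64_t victim = lru.back();
            lru.pop_back();
            pos.erase(victim);
            std::cout << "evict page " << victim << "\n";
        }
        std::cout << "load page " << pageId << " from disk\n";
        lru.push_front(pageId);
        pos[pageId] = lru.begin();
        return false;
    }
};

int main() {
    BufferPool pool(2);
    pool.getPage(1); pool.getPage(2); pool.getPage(1); pool.getPage(3);  // evicts page 2
}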
Contents
3.1 Overview
3.2 Memory Segment
3.3 SQL Engine
3.4 Relational Engine
3.5 Transaction Engine
3.6 Storage Engine
As the basic underlying assumption changed in this type of database management system, it led to research on and redesign of each and
every component of the storage, relational and SQL engines of the traditional disk-based management system.
One disk block transfer takes approximately 5 milliseconds whereas a main memory access takes 100 nanoseconds. By keeping the
whole database in memory, these disk I/Os are converted into memory accesses, thereby improving the throughput many-fold. This
leads to an interesting question: "In DRDBs, if all the data is cached in the buffer pool, will they perform as well as MMDBs?"
Unfortunately the answer is no. This is because the data structures and access algorithms are designed for a disk-based system and
do not work well when the whole database is completely in memory. There are some common misconceptions about main memory
databases.
Yes. Multiple users and multiple threads shall access the database and are synchronized through latches and locks.
Data Representation
Data Access Algorithms – Query Processing
Recovery
Concurrency control
This book will discuss in more detail the differences between DRDB and MMDB implementations and how an MMDB is ten to twenty times
faster than a DRDB.
Figure 4 depicts a main memory database management system. It has nearly all the components that are present in a disk resident
database management system. The implementations of the components under the SQL Engine, Relational Engine, and Storage Engine differ
heavily from the DRDB components.
Control Segment containing lock table, process table, transaction table, undo logs and redo logs. These structures are transient and are
required for the operation of the database management system.
Catalog Segment containing meta data about tables, indexes, fields, etc.
User Segment containing records of all the tables and index records of those tables, if any.
T-Tree index structures are better than the B-Tree structures of a DRDB in terms of space and CPU cycles for an MMDB.
3.5 Transaction Engine
As MMDB transactions no longer wait for disk I/O, this can lead to a lot of contention issues and deadlocks, which will hamper
throughput. These need to be avoided and are given more importance in an MMDB than in a DRDB.
Storage engine usually takes care of data organization, access methods and transactions in Database Management Systems (DBMS).
This is the lowest subsystem in DBMS that interacts with operating system services. Generally, products interact through OS Layer
subsystem (wrapper to OS system calls) to ease the porting effort.
Data shall be stored in one of the following components of the computer system
Memory
Secondary Storage - Disk
Tertiary storage - Tape
These components have data capacities, cost per byte, speed ranging over many orders of magnitude. Devices with smallest capacity
offer the fastest access speed and have the highest cost per byte.
Current trend: tertiary storage has now been replaced by disks because of their huge capacity and low cost per byte. Future trend: disks are
being replaced by main memory because of its increase in capacity and decrease in cost per byte.
One disk block transfer takes approximately 5 milliseconds whereas a main memory access takes 100 nanoseconds. By keeping the
whole database in memory, these disk I/Os are converted into memory accesses, thereby improving the throughput many-fold. Memory,
being a volatile device, loses the data stored in it when the power goes off. But disk, being a non-volatile device, keeps the content
intact even for long periods when there is a power failure. This makes a main memory storage manager difficult to implement. It uses
logging and checkpointing mechanisms to cope with power failures and make the data stored in main memory persistent.
A database is a collection of objects that hold and manipulate data. Every database management system has a limit on the total
number of databases supported. Most databases support up to 10K databases per instance. Because of the virtual address space
limitation, an MMDB may not be able to support even 100 databases per instance. Let's assume that the database size is set to 100 MB and
the address space is 4 GB; the instance shall then support 4 GB / 100 MB = 40 databases at most.
A database is a collection of many relational objects such as tables, views, and constraints. A user owns a database; the owner shall
give special access to other users.
Another security issue, for databases that work across a network, shall be handled through data encryption. Encryption can also be used
to provide additional protection for sensitive portions of a database. The data is encoded using some coding algorithm. An
unauthorized user who accesses encoded data will have difficulty decoding it. Authorized users are given the decryption
algorithms with which they shall access the data.
3. The DBMS checks that the account number and password are valid; if they are, the user is permitted to use the DBMS and to access the database. Application programs can also be considered users and can be required to supply passwords.
4. It is straightforward to keep track of database users and their accounts and passwords by creating an encrypted table or file with the two fields account number and password. This table can easily be maintained by the DBMS. Whenever a new account is created, a new record is inserted into the table. When an account is canceled, the corresponding record must be deleted from the table.
5. The database system must also keep track of all operations on the database that are applied by a certain user throughout each login session, which consists of the sequence of database interactions that a user performs from the time of logging in to the time of logging off. When a user logs in, the DBMS can record the user's account number and associate it with the terminal from which the user logged in; all operations applied from that terminal are attributed to the user's account until the user logs off. It is particularly important to keep track of update operations that are applied to the database so that, if the database is tampered with, the DBA can find out which user did the tampering.
6. To keep a record of all updates applied to the database and of the particular user who applied each update, we can modify the system log, which includes an entry for each operation applied to the database that may be required for recovery from a transaction failure or system crash. We can expand the log entries so that they also include the account number of the user and the online terminal ID that applied each operation recorded in the log. If any tampering with the database is suspected, a database audit is performed, which consists of reviewing the log to examine all accesses and operations applied to the database during a certain time period. When an illegal or unauthorized operation is found, the DBA can determine the account number used to perform the operation. Database audits are particularly important for sensitive databases that are updated by many transactions and users, such as a banking database that is updated by many bank tellers. A database log that is used mainly for security purposes is sometimes called an audit trail.
A transaction is a basic unit of work that comprises many database operations. A transaction shall either be committed or rolled back;
that is, it decides whether to submit all the changes to the DBMS or to discard all the changes and take the database back to the
state it was in when the transaction started.
Contents
Transaction Stages
Transaction pseudo code for money transfer
Transaction Properties
Correctness Principle
Transaction Start
Database operations (INSERT, UPDATE, DELETE, SELECT)
Commit or Abort
Database operations include reads and modifications of data records. The effect of running a SELECT statement is a read, and the effect of
running INSERT, UPDATE and DELETE is a data record modification.
The commit operation informs the transaction manager of the successful end of the transaction. After the commit operation, the database should
be in a consistent state and all the updates made by that transaction should be made permanent.
The rollback or abort operation informs the transaction manager that there is an error in one of the operations involved in the
transaction and the database is in an inconsistent state. All the updates made by the transaction must be undone on the database to get it back
to the previous consistent state, that is, the state at which the transaction started.
Transaction pseudo code for money transfer
BEGIN TRANSACTION;
-- The debit/credit steps below are reconstructed for illustration; the account numbers and amount are placeholders.
UPDATE accounts SET balance = balance - 100 WHERE accno = 1;
IF error GOTO UNDO;
UPDATE accounts SET balance = balance + 100 WHERE accno = 2;
IF error GOTO UNDO;
COMMIT;
GOTO FINISH;
UNDO:
ROLLBACK;
FINISH:
RETURN;
Atomicity – each transaction is treated as all or nothing: it either commits or aborts. If a transaction commits, all its effects remain.
If it aborts, all its effects are undone.
Consistency – transactions preserve data consistency; a transaction always takes the database from one consistent state to another consistent
state. Intermediate states may be inconsistent, but the commit or rollback operation should again take the database to a new
consistent state or back to the old consistent state.
Isolation – transactions should be isolated from one another. Even though transactions run in parallel, updates made by one
transaction should not be visible to others, and vice versa, until the transaction commits.
Durability – transaction commit should ensure that its updates are present in the database, even if there is a subsequent system
crash.
Contents
8.1 Overview
8.1.1 Pessimistic Concurrency
8.1.2 Optimistic Concurrency
8.2 Concurrency problem
8.2.1 Lost Update Problem
8.2.2 Uncommitted dependency
8.2.3 Inconsistent Analysis
8.3 Locking
8.3.1 Serializable transactions
8.3.2 Two Phase Locking Protocol
8.3.2.1 Conservative 2 PL
8.3.2.2 Strict 2 PL
8.3.2.3 Rigorous 2 PL
8.3.3 Lock Starvation
8.3.4 Dead Lock
8.3.4.1 Dead Lock Prevention
8.3.4.2 Dead Lock Detection
8.3.4.3 Timeouts
8.4 Isolation Levels
8.5 Lock Granularity
8.5.1 Granularity Levels in DBMS
8.5.2 Intent Locks
8.5.3 Lock Escalation
8.6 Index and Predicate Locking
8.7 Timestamp based concurrency control (TODO: Rephrase whole section)
8.7.1 Timestamps
8.7.2 Basic Timestamp ordering
8.7.3 Strict Timestamp ordering
8.8 Multi Version concurrency control
8.9 Optimistic concurrency control
8.10 Architecture for lock manager
In general, database systems use two approaches to manage concurrent data access: pessimistic and optimistic. Conflicts cannot be
avoided in either model; they differ only in when the conflicts are dealt with.
Transaction-1 reads tuple1 at time t1
Transaction-2 reads tuple1 at time t2
Transaction-1 updates tuple1 at time t3
Transaction-2 updates tuple1 at time t4
Transaction-1's update to tuple1 at time t3 is lost, as Transaction-2 overwrites the update made by Transaction-1 on tuple1 without
checking whether it has changed.
If transaction-2 updates instead of reading tuple1 at t2, the situation is even worse: it will lose its update on tuple1 once
transaction-1 rolls back.
For example, suppose that transaction-1 is calculating the total number of reservations on all the theatres for particular day; meanwhile
transaction-2 is reserving 5 seats on that day, then results of transaction-1 will be off by 5 because transaction-1 reads the value of X
after 5 seats have been subtracted from it.
A lock is a variable associated with a data item that describes the status of the item. Generally there is one lock for each data
item (record) in the DBMS. Locks are used to provide synchronized access to the data items by concurrent transactions. There are two
types of locks supported by Unix-like operating systems:
pthread mutexes
semaphores
Pthread mutexes work well with multiple threads, and semaphores work well with multiple threads as well as multiple processes. Pthread
mutexes are called binary locks as they have two states (lockState): locked and unlocked. Semaphores can be used as binary locks as
well as counting locks. In our case, binary locks will be used to provide synchronized concurrent access to the data items. Two
operations, lockItem() and unlockItem(), are used with binary locking. A transaction requests access to data item X by first issuing a
lockItem(X) operation. If lockState(X) is 1, then the transaction is forced to wait until lockState(X) becomes 0. If it is zero, then
lockState(X) is set to 1 and the transaction is allowed to access data item X. When the transaction is through with using the data item,
it issues an unlockItem(X) operation, which sets lockState(X) to zero so that other transactions can access X.
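A minimal C++ sketch of this binary lockItem()/unlockItem() protocol is shown below. It uses std::mutex and std::condition_variable (the C++ counterparts of the pthread primitives) and blocks the waiter on a condition variable instead of polling periodically; the class name is illustrative.

#include <condition_variable>
#include <mutex>

// Binary lock for one data item, following the lockItem()/unlockItem()
// protocol described above: lockState is 1 (locked) or 0 (unlocked).
class ItemLock {
    std::mutex m;
    std::condition_variable cv;
    int lockState = 0;
public:
    void lockItem() {
        std::unique_lock<std::mutex> guard(m);
        cv.wait(guard, [this] { return lockState == 0; });  // wait until unlocked
        lockState = 1;                                      // now owned by this transaction
    }
    void unlockItem() {
        { std::lock_guard<std::mutex> guard(m); lockState = 0; }
        cv.notify_one();                                    // wake one waiting transaction
    }
};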
A read transaction will acquire lock on the data before it reads the data.
A write transaction will acquire lock on the data before it writes the data.
If lock request is denied, then the transaction goes to wait state and tries for the lock periodically until the lock is released by
the transaction that acquired it.
Concurrency can be slightly improved over the above model by letting readers not block other readers, as that will not lead to any
inconsistencies. This is achieved by introducing another type of lock, which is shared by all readers. These locks are called
Shared Locks. Another type of lock, the Exclusive Lock, is obtained by writers to block all other readers and writers from accessing the data.
The data access protocol for this locking method is
A read transaction will acquire Shared Lock on the data before it reads the data.
A write transaction will acquire Exclusive Lock on the data before it writes the data.
Lock request is denied for read operation if another transaction has exclusive lock on the data item.
Lock request is denied for a write operation if another transaction has a shared or exclusive lock on the data item.
If lock request is denied, then the transaction goes to wait state and tries for the lock periodically until the lock is released by
the transaction that acquired it.
In the above locking model, lock compatibility is as follows:

            Shared   Exclusive   No Lock
Shared      Yes      No          Yes
Exclusive   No       No          Yes
No Lock     Yes      Yes         Yes

TODO::Above lock compatibility matrix in image
Lost update problem – transaction-1 waits for an exclusive lock forever from time t3, as the shared lock is held by transaction-1
and transaction-2. Transaction-2 waits for an exclusive lock forever from time t4, as the shared lock is held by transaction-1 and
transaction-2. Our lost update problem is solved, but a new problem has occurred, called "deadlock". We will look into
its details later.
Uncommitted dependency – transaction-2 waits for the lock when it tries to read tuple1 at time t2, as it is exclusively locked by
transaction-1. It waits till transaction-1 either commits or rolls back. Locking avoids the uncommitted dependency issue.
Inconsistent analysis – transaction-1 waits till transaction-2 releases the exclusive lock on the data record X and then
computes the aggregation, giving correct results.
8.3.1 Serializable transactions
A given set of transactions is considered serializable if it produces the same result as though the transactions were executed
serially, one after the other.
Individual transactions are correct as they transform a correct state of the database to another correct state
Executing transaction one at a time in any serial order is also correct, as individual transactions are independent of each other.
An interleaved execution is correct, if it is equivalent to serial execution or if it is serializable.
The concept of serializability was first introduced by Eswaran and Gray, who proved the two-phase locking theorem, which is briefly stated
as:
If all transactions obey the "two-phase locking protocol", then all possible interleaved schedules are
serializable.
Two-phase locking may limit the amount of concurrency that can occur in a schedule. This is because a transaction may not be able to
release a lock on a data item as soon as it is through with it. This is the price for guaranteeing serializability of all schedules without having to
check the schedules themselves.
There are a number of variations of two-phase locking (2PL). The technique described above is known as basic two-phase locking.
In conservative two-phase locking, the growing phase takes place before the transaction starts and the shrinking phase starts as soon as the
transaction ends.
In the other variations, the growing phase starts as soon as the transaction starts and shrinking happens during either transaction commit or rollback.
Starvation can also occur when the deadlock algorithm selects the same transaction repeatedly for abort, thereby never allowing it to
finish. The algorithm should be modified to assign higher priorities to transactions that have been aborted multiple times in order to avoid this
problem.
8.3.4 Dead Lock
Deadlock occurs when each transaction in a set of two or more transactions waits for some resource that is locked by some other
transaction in the same set.
For example, transaction T1 acquires resource R1 and transaction T2 acquires resource R2. If T1 then waits for R2 and T2 waits
for R1, neither will ever get its lock; this situation is termed a deadlock.
The conservative two-phase locking protocol is a deadlock prevention protocol, in which all locks are acquired before the transaction
works on the data records.
Ordering of data record locking will also prevent deadlocks. A transaction that works on several data records should always obtain
the locks in a pre-determined order. This requires the programmer or the DBMS to be aware of the chosen order of the data record locks, which
is not practical to implement in database systems.
No Waiting Algorithm – if a transaction is unable to obtain a lock, it is immediately aborted and then restarted after a certain time
delay, without checking whether a deadlock will actually occur or not. This can cause transactions to abort and restart needlessly.
Cautious Waiting Algorithm – this is proposed to avoid the needless restarts of the no-waiting algorithm. If transaction T1 tries to lock a
data record R1 but is not able to do so because R1 is locked by some other transaction T2 with a conflicting lock, then: if T2 is not blocked
on some other locked data record, T1 is blocked and allowed to wait; otherwise T1 is aborted.
Wait-Die and Wound-Wait Algorithms – the other two techniques, wait-die and wound-wait, use transaction timestamps as the basis for
determining what to do in case of a potential deadlock. A transaction timestamp is a unique identifier assigned to each transaction. These
timestamps are generally taken from a running counter that is incremented for every transaction started. If transaction T1 starts before
transaction T2, then TS(T1) < TS(T2).
Suppose that transaction T1 tries to lock data record R1 but is not able to do so because R1 is locked by some other transaction T2 with
a conflicting lock. The rules followed by these schemes are as follows:
Wait-Die – if TS(T1) < TS(T2), then T1 is allowed to wait; otherwise abort T1 and restart it later with the same timestamp.
Wound-Wait – if TS(T1) < TS(T2), then abort T2 (T1 wounds T2) and restart it later with the same timestamp; otherwise T1 is allowed to wait.
In wait-die, an older transaction is allowed to wait on a younger transaction, whereas a younger transaction requesting a lock on record R1
held by an older transaction is aborted and restarted. The wound-wait approach does the opposite: a younger transaction is allowed to
wait on an older one, whereas an older transaction requesting a lock on record R1 held by a younger transaction preempts the younger
transaction by aborting it. Both schemes end up aborting the younger of the two transactions that may be involved in a deadlock. In
wait-die, transactions wait only on younger transactions. In wound-wait, transactions wait only on older transactions. So no cycle is
created in either scheme, avoiding deadlocks.
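The two decision rules can be captured in a few lines. The sketch below assumes that a smaller timestamp means an older transaction and that T1 is the requester and T2 the holder, as in the rules above; the function and type names are illustrative.

#include <cstdint>
#include <iostream>

enum class Action { Wait, AbortRequester, AbortHolder };

// T1 (timestamp tsReq) requests a lock held by T2 (timestamp tsHold) in a conflicting mode.

// Wait-die: an older requester waits; a younger requester dies (is aborted and
// restarted later with the same timestamp).
Action waitDie(uint64_t tsReq, uint64_t tsHold) {
    return tsReq < tsHold ? Action::Wait : Action::AbortRequester;
}

// Wound-wait: an older requester wounds (aborts) the younger holder; a younger
// requester waits.
Action woundWait(uint64_t tsReq, uint64_t tsHold) {
    return tsReq < tsHold ? Action::AbortHolder : Action::Wait;
}

int main() {
    std::cout << (waitDie(5, 9)   == Action::Wait ? "older requester waits\n" : "")
              << (woundWait(5, 9) == Action::AbortHolder ? "older requester wounds younger holder\n" : "");
}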
A simple way to detect a state of deadlock is for the system to construct and maintain a "wait-for" graph. (TODO: diagram and
explanation)
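The text leaves the diagram as a TODO; as a hedged illustration, the sketch below detects deadlock by searching the wait-for graph for a cycle using a depth-first search, where an edge T1 -> T2 means T1 waits for a lock held by T2. The graph representation is an assumption made for this example.

#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <vector>

using Graph = std::map<uint64_t, std::vector<uint64_t>>;   // txn -> txns it waits for

// Depth-first search for a cycle; a cycle in the wait-for graph means deadlock.
static bool dfs(uint64_t t, const Graph& g, std::set<uint64_t>& onPath, std::set<uint64_t>& done) {
    if (onPath.count(t)) return true;          // back edge: cycle found
    if (done.count(t)) return false;
    onPath.insert(t);
    auto it = g.find(t);
    if (it != g.end())
        for (uint64_t next : it->second)
            if (dfs(next, g, onPath, done)) return true;
    onPath.erase(t);
    done.insert(t);
    return false;
}

bool hasDeadlock(const Graph& g) {
    std::set<uint64_t> onPath, done;
    for (const auto& [t, waits] : g)
        if (dfs(t, g, onPath, done)) return true;
    return false;
}

int main() {
    Graph g = {{1, {2}}, {2, {1}}};            // T1 waits for T2 and T2 waits for T1
    std::cout << (hasDeadlock(g) ? "deadlock\n" : "no deadlock\n");
}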
If the system is in a state of deadlock, some of the transactions causing the deadlock must be aborted. Either the application or the DBMS should
select one of the transactions involved in the deadlock for rollback to get the system out of the deadlock situation. This selection algorithm
should avoid choosing transactions that have been running for a long time and transactions that have performed many updates. The best
transactions to abort are SELECT-only (read-only) transactions.
The larger the data item size, the lower the degree of concurrency. For example, if the data item size is a table, denoted Table1, a
transaction T1 that needs to lock a record X must lock the whole table Table1 that contains record X, because the lock is associated
with the whole data item, Table1. If another transaction T2 wants to lock a different record Y of Table1, it is forced to wait till T1
releases the lock on Table1. If the data item size is a single record, then transaction T2 would be able to proceed, because it would lock a
different data item.
The smaller the data item size, the more items there are in the database. Because every item is associated with a lock, the system will
have a larger number of active locks. More lock and unlock operations will be performed, causing higher overhead. In addition,
more storage space is required for storing these locks.
For large transactions, which access many records, coarse granularity should be used, and for small transactions, which access a small
number of records, fine granularity should be used.
Database
Table
Disk Block or Memory Page
Record
Record Field
Since the best granularity size depends on the given transaction, the DBMS should support multiple levels of granularity and allow the
transaction to pick any level it wants.
Scenario 1:
Transaction T1 wants to update all records in Table1, so it will request an exclusive lock on Table1. This is more beneficial for T1 than
acquiring 20 individual data record locks. Now suppose another transaction T2 wants to read record R5 from page P1; T2 would then
request a shared record-level lock on R5. The DBMS will now check the compatibility of the requested lock with the locks already held.
One way to verify this is to traverse the tree from leaf R5 to root DB1 and check for conflicting locks.
Scenario 2:
Transaction T1 wants to read record R5 from page P1, so T1 would request a shared record-level lock on R5. Now suppose
another transaction T2 wants to update all records in Table1, so it will request an exclusive lock on Table1. The DBMS will now check
the compatibility of the requested lock with the locks already held. For this it needs to check all locks at the page level and record level to
ensure that there are no conflicting locks.
For both of the above scenarios, traversal-based lock conflict detection is very inefficient and would defeat the purpose of having
multiple granularity locking.
New types of locks are introduced to make the multiple granularity locking efficient. The idea behind intention locks is for a
transaction to indicate, along the path from the root to the desired node, what type of lock it will require from one of the node’s
descendants.
Shared Intention Exclusive (SIX) – indicates that this node is locked in shared mode and that an exclusive lock will be requested on some descendant node.
Compatibility Table

Mode   IS    IX    S     SIX   X
IS     Yes   Yes   Yes   Yes   No
IX     Yes   Yes   No    No    No
S      Yes   No    Yes   No    No
SIX    Yes   No    No    No    No
X      No    No    No    No    No
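The compatibility table can be encoded directly as a small matrix and consulted before granting a lock, as in the following illustrative C++ sketch (the enum and array names are assumptions):

#include <iostream>

// Lock modes in the same order as the compatibility table above.
enum Mode { IS, IX, S, SIX, X };

// compatible[requested][held] — true means the requested lock can be granted
// while another transaction holds the lock in 'held' mode.
const bool compatible[5][5] = {
    //          IS     IX     S      SIX    X
    /* IS  */ { true,  true,  true,  true,  false },
    /* IX  */ { true,  true,  false, false, false },
    /* S   */ { true,  false, true,  false, false },
    /* SIX */ { true,  false, false, false, false },
    /* X   */ { false, false, false, false, false },
};

int main() {
    std::cout << std::boolalpha
              << compatible[IS][IX] << "\n"    // true:  IS is compatible with IX
              << compatible[S][IX]  << "\n";   // false: S conflicts with IX
}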
Locking protocol
TODO::Example illustrating above protocol and compatibility table: Refer 445 of Elmasri
A more general technique, called predicate locking would lock access to all records that satisfy a predicate or where condition.
Predicate locks have proved to be difficult to implement effectively.
1. ReadTS(X) – The read timestamp of data item X; this is the largest timestamp among all the timestamps of the transactions
that have successfully read the item X.
2. WriteTS(X) – The write timestamp of data item X; this is the largest timestamp among all the timestamps of the transactions
that have successfully modified the item X.
Whenever some transaction T tries to issue readItem(X) or writeItem(X), the algorithm should compare the timestamp of T with
ReadTS(X) and WriteTS(X) to ensure that the timestamp order of the transaction execution is not violated. If this order is violated,
then transaction T is aborted and resubmitted to the system as a new transaction with a new timestamp. If T is aborted, then any
transaction T1 that may have used a value written by T must also be aborted. Similarly any transaction T2 that may have used a value
written by T1 must also be aborted and so on. This effect is known as cascading rollback and is one of the biggest problems associated
with this scheme.
Transaction T issues a writeItem(X) operation: if readTS(X) > TS(T) or writeTS(X) > TS(T), then abort T; else execute
writeItem(X) of T and set writeTS(X) to TS(T).
Transaction T issues a readItem(X) operation: if writeTS(X) > TS(T), then abort T; else execute readItem(X) of T and set
readTS(X) to the larger of TS(T) and the current readTS(X).
Whenever the basic timestamp ordering algorithm detects two conflicting operations that occur in the incorrect order, it rejects the
later of the two operations by aborting the transaction that issued it. The schedules produced by this algorithm are guaranteed to be
conflict serializable, like those of the two-phase locking protocol.
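A hedged C++ sketch of these two checks is shown below; each data item carries its readTS and writeTS, and a false return value stands for aborting and restarting the transaction. The structure and function names are illustrative.

#include <algorithm>
#include <cstdint>
#include <iostream>

// Per-item timestamps maintained by the scheduler.
struct Item {
    uint64_t readTS  = 0;   // largest timestamp of a transaction that read the item
    uint64_t writeTS = 0;   // largest timestamp of a transaction that wrote the item
};

// Returns false when the operation violates timestamp order and the transaction must abort.
bool writeItem(Item& x, uint64_t ts) {
    if (x.readTS > ts || x.writeTS > ts) return false;   // a younger txn already read or wrote X
    x.writeTS = ts;                                      // perform the write and record its timestamp
    return true;
}

bool readItem(Item& x, uint64_t ts) {
    if (x.writeTS > ts) return false;                    // a younger txn already wrote X
    x.readTS = std::max(x.readTS, ts);
    return true;
}

int main() {
    Item x;
    std::cout << std::boolalpha
              << readItem(x, 10) << " "    // true
              << writeItem(x, 5) << "\n";  // false: an older write arrives after a younger read
}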
Read Phase – A transaction can read values of committed data items from the database. However updates are applied only to
local copies of the data items kept in transaction workspace.
Validation Phase - Checking is performed to ensure that serializability will not be violated if the transaction updates are
applied to the database.
Write Phase – Transaction updates are applied to the database if the validation phase says that it is serializable. Otherwise the
updates are discarded and the transaction is restarted.
This protocol works well when there is minimal interference between transactions on data items. If the interference is high, transactions
will be restarted often. The technique is called 'optimistic' because it assumes that little interference will occur and hence that there
is no need to do any checking during transaction execution.
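The text does not spell out the validation check itself. One common scheme, backward validation, compares the read set of the validating transaction with the write sets of transactions that committed after it started; the following C++ sketch illustrates that scheme under the stated assumption, with illustrative names.

#include <cstdint>
#include <iostream>
#include <set>
#include <vector>

// One committed transaction's write set plus its commit time.
struct Committed {
    uint64_t commitTime;
    std::set<uint64_t> writeSet;   // ids of data items it modified
};

// Backward validation (one common scheme, not prescribed by the text): the
// validating transaction passes if no transaction that committed after it
// started wrote an item that it has read.
bool validate(uint64_t startTime,
              const std::set<uint64_t>& readSet,
              const std::vector<Committed>& history) {
    for (const auto& c : history) {
        if (c.commitTime <= startTime) continue;         // committed before we started: no conflict
        for (uint64_t item : readSet)
            if (c.writeSet.count(item)) return false;    // overlap: restart the transaction
    }
    return true;                                         // safe to enter the write phase
}

int main() {
    std::vector<Committed> history = {{5, {42}}};
    std::cout << std::boolalpha
              << validate(10, {42}, history) << " "      // true: the conflicting write committed before we started
              << validate(3,  {42}, history) << "\n";    // false: item 42 was written after we started
}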