Understanding Database Transactions and Properties
Transaction
● A transaction is a set of logically related operations; it contains a group of tasks.
● A transaction is an action, or a series of actions, performed by a single user to
access or modify the contents of the database.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Following are the main operations of transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a buffer in main
memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take the example of a debit transaction on an account, which consists of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting the transaction is 4000.
● The first operation reads X's value from the database and stores it in a buffer.
● The second operation will decrease the value of X by 500. So the buffer will
contain 3500.
● The third operation will write the buffer's value to the database. So X's final value
will be 3500.
However, because of a hardware, software, or power failure, a transaction may fail
before finishing all the operations in the set.
For example: If the above debit transaction fails after executing operation 2, then X's
value will remain 4000 in the database, which is not acceptable to the bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
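The Read/Write/Commit/Rollback operations above can be sketched in Python. This is an illustrative in-memory model (a dict as the "database", another as the main-memory buffer), not a real DBMS API:

```python
# A minimal sketch of Read/Write with Commit and Rollback. The dicts and
# function names are illustrative, not a real DBMS interface.

database = {"X": 4000}   # durable store
buffer = {}              # main-memory working copy

def read(item):
    buffer[item] = database[item]        # R(X): database -> buffer

def write(item, value):
    buffer[item] = value                 # W(X): buffer only, not yet durable

def commit():
    database.update(buffer)              # save the work permanently
    buffer.clear()

def rollback():
    buffer.clear()                       # undo: discard the buffered work

# Failed debit: the transaction stops after computing the new value.
read("X")
write("X", buffer["X"] - 500)
rollback()                               # database still holds 4000

# Successful debit: R(X); X = X - 500; W(X); COMMIT
read("X")
write("X", buffer["X"] - 500)
commit()
print(database["X"])                     # 3500
```

Note that the failed debit leaves the database at 4000, which is exactly why Rollback exists: the buffered value is simply discarded.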
Transaction property
A transaction has four properties. These are used to maintain consistency in the database
before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
● It states that either all operations of the transaction take place or none do;
otherwise, the transaction is aborted.
● There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either run to completion or is not
executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts, then none of its changes are visible.
Commit: If a transaction commits, then all of its changes are visible.
Example: Assume a transaction T consisting of T1 and T2. Account A holds Rs 600 and
account B holds Rs 300, and T transfers Rs 100 from account A to account B.
T1                 T2
Read(A)            Read(B)
A = A - 100        B = B + 100
Write(A)           Write(B)
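The transfer above can be sketched as one atomic unit: either both the debit and the credit apply, or neither does. The snapshot-based rollback below is illustrative; a real DBMS would use its transaction log instead:

```python
# Hedged sketch of atomicity for the A -> B transfer. The snapshot stands
# in for the "before" values a real DBMS keeps in its transaction log.

accounts = {"A": 600, "B": 300}

def transfer(src, dst, amount, fail_midway=False):
    snapshot = dict(accounts)            # "before" values for rollback
    try:
        accounts[src] -= amount          # T1: Read(A); A = A - 100; Write(A)
        if fail_midway:
            raise RuntimeError("crash between debit and credit")
        accounts[dst] += amount          # T2: Read(B); B = B + 100; Write(B)
    except RuntimeError:
        accounts.clear()
        accounts.update(snapshot)        # abort: none of the changes survive

transfer("A", "B", 100)
print(accounts)                          # {'A': 500, 'B': 400}
transfer("A", "B", 100, fail_midway=True)
print(accounts)                          # unchanged: {'A': 500, 'B': 400}
```

The second call fails between the debit and the credit, and the rollback restores the snapshot, so there is no "midway" state in which A has been debited but B not credited.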
States of Transaction
In a database, a transaction can be in one of the following states:
Active state
● The active state is the first state of every transaction. In this state, the
transaction is being executed.
● For example: Insertion or deletion or updating a record is done here. But all the
records are still not saved to the database.
Partially committed
● In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
● In the total mark calculation example, a final display of the total marks step is
executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully.
In this state, all the effects are now permanently saved on the database system.
Failed state
● If any of the checks made by the database recovery system fails, then the
transaction is said to be in the failed state.
● In the example of total mark calculation, if the database is not able to fire a query
to fetch the marks, then the transaction will fail to execute.
Aborted
● If any of the checks fail and the transaction has reached a failed state then the
database recovery system will make sure that the database is in its previous
consistent state. If not then it will abort or roll back the transaction to bring the
database into a consistent state.
● If the transaction fails in the middle of execution, then all of its executed operations
are rolled back, returning the database to its state before the transaction started.
● After aborting the transaction, the database recovery module will select one of the
two operations:
1. Re-start the transaction
2. Kill the transaction
Schedule
A series of operations from one transaction to another is known as a schedule. It
is used to preserve the order of the operations within each individual transaction.
1. Serial Schedule
The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction. In the serial schedule, when the first transaction
completes its cycle, then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2, each with some operations. If
there is no interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
● In the given figure (a), Schedule A shows the serial schedule where T1 is
followed by T2.
● In the given figure (b), Schedule B shows the serial schedule where T2 is
followed by T1.
2. Non-serial Schedule
● If interleaving of operations is allowed, then there will be a non-serial
schedule.
● It contains many possible orders in which the system can execute the
individual operations of the transactions.
● In the given figures (c) and (d), Schedule C and Schedule D are non-serial
schedules; they have interleaved operations.
Non-serial schedules can be divided further into Serializable and Non-Serializable.
For example: Consider the below diagram where two transactions TX and TY, are performed on
the same account A where the balance of account A is $300.
● At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
● At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/written).
● Alternately, at time t3, transaction TY reads the value of account A that will be
$300 only because TX didn't update the value yet.
● At time t4, transaction TY adds $100 to account A that becomes $400 (only added
but not updated/write).
● At time t6, transaction TX writes the value of account A that will be updated as
$250 only, as TY didn't update the value yet.
● Similarly, at time t7, transaction TY writes the value of account A, so it writes the
value computed at time t4, which is $400. This means the value written by TX is lost,
i.e., the $250 update is lost.
Hence the data becomes incorrect, and the database is left in an inconsistent state.
Thus, in order to maintain consistency in the database and avoid such problems that take place
in concurrent execution, management is needed, and that is where the concept of Concurrency
Control comes into play.
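The lost-update schedule above can be replayed step by step: TX and TY each work on a private buffer, and TY's later write overwrites TX's update. The timings t1..t7 follow the bullet list (t5 is idle in the original example):

```python
# Replaying the interleaved schedule of TX and TY on account A ($300).
# Each transaction computes in its own buffer before writing back.

account_A = 300
buf_tx = account_A        # t1: TX reads A -> 300
buf_tx -= 50              # t2: TX deducts 50 -> 250 (buffer only)
buf_ty = account_A        # t3: TY reads A -> still 300 (TX hasn't written)
buf_ty += 100             # t4: TY adds 100 -> 400 (buffer only)
account_A = buf_tx        # t6: TX writes 250
account_A = buf_ty        # t7: TY writes 400; TX's $250 update is lost
print(account_A)          # 400, whereas either serial order would give 350
```

Under either serial order the result would be 300 - 50 + 100 = 350, so the interleaved result of 400 shows the inconsistency that concurrency control must prevent.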
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing
the concurrent execution of database operations and thus avoiding the inconsistencies in the
database. Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.
2. Exclusive lock:
● In the exclusive lock, the data item can be both read as well as written by the
transaction.
● This lock is exclusive: while it is held, no other transaction can modify the same
data item simultaneously.
The following shows how unlocking and locking work with 2-PL.
Transaction T1:
● Growing phase: from step 1-3
● Shrinking phase: from step 5-7
● Lock point: at 3
Transaction T2:
● Growing phase: from step 2-6
● Shrinking phase: from step 8-9
● Lock point: at 6
5. Strict Two-phase locking (Strict-2PL)
● The first phase of Strict-2PL is similar to 2PL. In the first phase, after
acquiring all the locks, the transaction continues to execute normally.
● The only difference between 2PL and strict 2PL is that Strict-2PL does not
release a lock after using it.
● Strict-2PL waits until the whole transaction commits, and then it releases all the
locks at once.
● Strict-2PL protocol does not have a shrinking phase of lock release.
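Strict-2PL can be sketched with a minimal lock manager: exclusive locks are acquired as the transaction goes (growing phase) and released only at commit, all at once, so there is no shrinking phase. The class and method names below are made up for illustration:

```python
# A minimal Strict-2PL sketch: exclusive locks only, released at commit.

class StrictTwoPL:
    def __init__(self):
        self.locks = {}                    # data item -> owning transaction

    def lock_x(self, txn, item):
        """Try to take an exclusive lock; False means the caller must wait."""
        if self.locks.get(item, txn) != txn:
            return False                   # held by another transaction
        self.locks[item] = txn             # growing phase: acquire
        return True

    def commit(self, txn):
        # No shrinking phase: every lock held by txn is released here,
        # at commit time, in a single step.
        self.locks = {k: v for k, v in self.locks.items() if v != txn}

mgr = StrictTwoPL()
print(mgr.lock_x("T1", "A"))               # True: T1 locks A
print(mgr.lock_x("T2", "A"))               # False: T2 must wait for T1
mgr.commit("T1")
print(mgr.lock_x("T2", "A"))               # True: lock was freed at commit
```

Because no lock is released before commit, no other transaction can ever read an uncommitted value, which is what makes Strict-2PL cascadeless.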
● The TS (timestamp) protocol ensures freedom from deadlock, which means no transaction ever waits.
● But the schedule may not be recoverable and may not even be cascade-free.
The optimistic approach is based on the assumption that the majority of database operations do not
conflict. The optimistic approach requires neither locking nor time stamping techniques. Instead, a
transaction is executed without restrictions until it is committed. Using an optimistic approach, each
transaction moves through two or three phases, referred to as read, validation, and write.
• During the read phase, the transaction reads the database, executes the needed computations, and makes
the updates to a private copy of the database values. All update operations of the transaction are recorded in
a temporary update file, which is not accessed by the remaining transactions.
• During the validation phase, the transaction is validated to ensure that the changes made will not affect
the integrity and consistency of the database. If the validation test is positive, the transaction goes to the
write phase. If the validation test is negative, the transaction is restarted and the changes are discarded.
• During the write phase, the changes are permanently applied to the database.
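The read, validation, and write phases can be sketched as follows. The validation rule used here (restart if any item read has changed version since the read phase) is one simple scheme among several; all names and the data layout are illustrative:

```python
# Sketch of optimistic concurrency control: read into a private copy,
# validate against version counters, then write permanently.

database = {"A": 100}
versions = {"A": 0}        # bumped on every committed write

def run_optimistic(reads, compute):
    # Read phase: work on a private copy, remembering the versions seen.
    private = {k: database[k] for k in reads}
    seen = {k: versions[k] for k in reads}
    updates = compute(private)     # computations on the private copy
    # Validation phase: negative if anything read has changed meanwhile.
    if any(versions[k] != seen[k] for k in seen):
        return False               # caller restarts; changes are discarded
    # Write phase: apply the private updates permanently.
    for k, v in updates.items():
        database[k] = v
        versions[k] += 1
    return True

print(run_optimistic(["A"], lambda p: {"A": p["A"] + 50}))  # True
print(database["A"])                                        # 150
```

No locks or timestamps on reads are needed; conflicts are only detected at validation time, which is why this approach works best when conflicts are rare.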
The ANSI SQL standard (1992) defines transaction management based on transaction isolation levels.
Transaction isolation levels refer to the degree to which transaction data is “protected or isolated” from other
concurrent transactions.
The types of read operations are:
• Dirty read: a transaction can read data that is not yet committed.
• Non repeatable read: a transaction reads a given row at time t1, and then it reads the same row at time t2,
yielding different results. The original row may have been updated or deleted.
• Phantom read: a transaction executes a query at time t1, and then it runs the same query at time t2, yielding
additional rows that satisfy the query.
Read Uncommitted will read uncommitted data from other transactions. At this isolation level, the database
does not place any locks on the data, which increases transaction performance but at the cost of data
consistency.
Read Committed forces transactions to read only committed data. This is the default mode of operation for
most databases (including Oracle and SQL Server). At this level, the database will use exclusive locks on
data, causing other transactions to wait until the original transaction commits.
The Repeatable Read isolation level ensures that queries return consistent results. This type of isolation level
uses shared locks to ensure other transactions do not update a row after the original query reads it. However,
new rows are read (phantom read) as these rows did not exist when the first query ran. The Serializable
isolation level is the most restrictive level defined by the ANSI SQL standard.
The isolation level of a transaction is defined in the transaction statement, for example using general ANSI
SQL syntax:
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
… SQL STATEMENTS….
COMMIT TRANSACTION;
Critical events can cause a database to stop working and compromise the integrity of the data. Examples of
critical events are:
• Hardware/software failures. A failure of this type could be a hard disk media failure, a bad capacitor on a
motherboard, or a failing memory bank. Other causes of errors under this category include application
program or operating system errors that cause data to be overwritten, deleted, or lost. Some database
administrators argue that this is one of the most common sources of database problems.
• Human-caused incidents. This type of event can be categorized as unintentional or intentional.
-- An unintentional failure is caused by a careless end user. Such errors include deleting the wrong rows
from a table, pressing the wrong key on the keyboard, or shutting down the main database server by
accident.
-- Intentional events are of a more severe nature and normally indicate that the company data are at
serious risk. Under this category are security threats caused by hackers trying to gain unauthorized access to
data resources and virus attacks caused by disgruntled employees trying to compromise the database
operation and damage the company.
• Natural disasters. This category includes fires, earthquakes, floods, and power failures.
Transaction Recovery
Before continuing, examine four important concepts that affect the recovery process:
• The write-ahead-log protocol ensures that transaction logs are always written before any database data are
actually updated. This protocol ensures that, in case of a failure, the database can later be recovered to a
consistent state using the data in the transaction log.
• Redundant transaction logs (several copies of the transaction log) ensure that a physical disk failure will
not impair the DBMS’s ability to recover data.
• Database buffers are temporary storage areas in primary memory used to speed up disk operations. To
improve processing time, the DBMS software reads the data from the physical disk and stores a copy of it on
a “buffer” in primary memory. When a transaction updates data, it actually updates the copy of the data in
the buffer because that process is much faster than accessing the physical disk every time. Later, all buffers
that contain updated data are written to a physical disk during a single operation, thereby saving significant
processing time.
• Database checkpoints are operations in which the DBMS writes all of its updated buffers in memory (also
known as dirty buffers) to disk. While this is happening, the DBMS does not execute any other requests.
A checkpoint operation is also registered in the transaction log.
When the recovery procedure uses a deferred-write technique (also called a deferred update), the transaction
operations do not immediately update the physical database. Instead, only the transaction log is updated; the
physical database is updated later, after the transaction reaches its commit point, using the "after" values
recorded in the transaction log.
When the recovery procedure uses a write-through technique (also called an immediate update), the database
is immediately updated by transaction operations during the transaction’s execution, even before the
transaction reaches its commit point. If the transaction aborts before it reaches its commit point, a
ROLLBACK or undo operation needs to be done to restore the database to a consistent state. In that case, the
ROLLBACK operation will use the transaction log “before” values. The recovery process follows these
steps:
1. Identify the last checkpoint in the transaction log. This is the last time transaction data were physically
saved to disk.
2. For a transaction that started and was committed before the last checkpoint, nothing needs to be done
because the data are already saved.
3. For a transaction that was committed after the last checkpoint, the DBMS redoes the transaction, using the
“after” values of the transaction log. Changes are applied in ascending order, from oldest to newest.
4. For any transaction that had a ROLLBACK operation after the last checkpoint or that was left active (with
neither a COMMIT nor a ROLLBACK) before the failure occurred, the DBMS uses the transaction log
records to ROLLBACK or undo the operations, using the “before” values in the transaction log. Changes are
applied in reverse order, from newest to oldest.
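The four recovery steps above can be sketched for a write-through DBMS: find the last checkpoint, redo committed work oldest-to-newest using "after" values, and undo uncommitted work newest-to-oldest using "before" values. The log record layout below is illustrative:

```python
# Sketch of checkpoint-based recovery over a tiny transaction log.
# Record layout (made up for illustration): (op, txn, item, before, after)

log = [
    ("CHECKPOINT",),
    ("UPDATE", "T1", "X", 100, 150),
    ("COMMIT", "T1"),
    ("UPDATE", "T2", "Y", 10, 90),     # T2 was still active at the crash
]
database = {"X": 150, "Y": 90}         # on-disk state at crash time

committed = {r[1] for r in log if r[0] == "COMMIT"}
last_ckpt = max(i for i, r in enumerate(log) if r[0] == "CHECKPOINT")
tail = log[last_ckpt + 1:]

for rec in tail:                                   # redo: oldest to newest
    if rec[0] == "UPDATE" and rec[1] in committed:
        database[rec[2]] = rec[4]                  # apply "after" value
for rec in reversed(tail):                         # undo: newest to oldest
    if rec[0] == "UPDATE" and rec[1] not in committed:
        database[rec[2]] = rec[3]                  # restore "before" value

print(database)                                    # {'X': 150, 'Y': 10}
```

T1 committed, so its update to X survives; T2 never committed, so its write-through change to Y is rolled back to the "before" value of 10.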
DATABASE PERFORMANCE TUNING & QUERY OPTIMIZATION
One of the main functions of a database system is to provide timely answers to end users. End
users interact with the DBMS through the use of queries to generate information, using the
following sequence:
1. The end-user (client-end) application generates a query.
2. The query is sent to the DBMS (server end).
3. The DBMS (server end) executes the query.
4. The DBMS sends the resulting data set to the end-user (client-end) application.
The goal of database performance is to execute queries as fast as possible. Therefore, database
performance must be closely monitored and regularly tuned.
Database performance tuning
Refers to a set of activities and procedures designed to reduce the response time of the database
system—that is, to ensure that an end-user query is processed by the DBMS in the minimum
amount of time.
The time required by a query to return a result set depends on many factors. Those factors tend to
be wide-ranging and to vary from environment to environment and from vendor to vendor. The
performance of a typical DBMS is constrained by three main factors:
● CPU processing power
● Available primary memory (RAM)
● Input/output (hard disk and network) throughput
Database performance-tuning activities can be divided into those taking place on the client side
and those taking place on the server side.
On the client side, the objective is to generate a SQL query that returns the correct answer in the
least amount of time, using the minimum amount of resources at the server end. The activities
required to achieve that goal are commonly referred to as SQL performance tuning.
On the server side, the DBMS environment must be properly configured to respond to clients’
requests in the fastest way possible, while making optimum use of existing resources. The
activities required to achieve that goal are commonly referred to as DBMS performance tuning.
DBMS Architecture
The architecture of a DBMS is represented by the processes and structures (in memory and in
permanent storage) used to manage a database. Such processes collaborate with one another to
perform specific functions.
Basic DBMS Architecture
The following are some of the processes of a typical DBMS
Listener- The listener process listens for clients’ requests and handles the processing of the
SQL requests to other DBMS processes. Once a request is received, the listener passes the
request to the appropriate user process.
User- The DBMS creates a user process to manage each client session. Therefore, when you
log on to the DBMS, you are assigned a user process. This process handles all requests you
submit to the server. There are many user processes—at least one per each logged-in client.
Scheduler- The scheduler process organizes the concurrent execution of SQL requests.
Lock manager- This process manages all locks placed on database objects, including disk pages.
Optimizer- The optimizer process analyzes SQL queries and finds the most efficient
way to access the data.
The data cache or buffer cache is a shared, reserved memory area that stores the most recently
accessed data blocks in RAM. The data cache is where the data read from the database data
files are stored after the data have been read or before the data are written to the database data
files. The data cache also caches system catalog data and the contents of the indexes.
The SQL cache, or procedure cache, is a shared, reserved memory area that stores the
most recently executed SQL statements or PL/SQL procedures, including triggers and
functions. The SQL cache does not store the end-user-written SQL. Rather, the SQL cache
stores a “processed” version of the SQL that is ready for execution by the DBMS.
Database Statistics
Refers to a number of measurements about database objects and their environment, such as the number of
processors used, processor speed, and temporary space available. Such statistics give a snapshot of database
characteristics.
The DBMS uses these statistics to make critical decisions about improving query processing
efficiency. Database statistics can be gathered manually by the DBA or automatically by the
DBMS. For example, many DBMS vendors support the ANALYZE command in SQL to
gather statistics.
Example:
ANALYZE TABLE STUDENT COMPUTE STATISTICS;
UPDATE STATISTICS STUDENT;
When you generate statistics for a table, all related indexes are also analyzed. However, you
could generate statistics for a single index by using the following command:
ANALYZE INDEX STU_NDX COMPUTE STATISTICS;
UPDATE STATISTICS STUDENT STU_NDX;
QUERY PROCESSING
The DBMS processes a query in three phases:
1. Parsing. The DBMS parses the SQL query and chooses the most
efficient access/execution plan.
2. Execution. The DBMS executes the SQL query using the chosen execution plan.
3. Fetching. The DBMS fetches the data and sends the result set back to the client.
The processing of SQL DDL statements (such as CREATE TABLE) is different from the
processing required by DML statements. The difference is that a DDL statement actually
updates the data dictionary tables or system catalog, while a DML statement (SELECT,
INSERT, UPDATE, and DELETE) mostly manipulates end-user data.
QUERY PROCESSING BOTTLENECKS
The execution of a query requires the DBMS to break down the query into a series of interdependent
I/O operations to be executed in a collaborative manner. The more complex a query is, the more
complex the operations are, and the more likely it is that there will be bottlenecks.
A query processing bottleneck is a delay introduced in the processing of an I/O operation that causes
the overall system to slow down. In the same way, the more components a system has, the more
interfacing among the components is required, and the more likely it is that there will be bottlenecks.
Within a DBMS, there are five components that typically cause bottlenecks:
a. CPU. The CPU processing power of the DBMS should match the system’s expected work load. A
high CPU utilization might indicate that the processor speed is too slow for the amount of work
performed. However, heavy CPU utilization can be caused by other factors, such as a defective
component, not enough RAM (the CPU spends too much time swapping memory blocks), a badly
written device driver, or a rogue process. A CPU bottleneck will affect not only the DBMS but all
processes running in the system.
b. RAM. The DBMS allocates memory for specific usage, such as data cache and SQL cache. RAM
must be shared among all running processes (operating system, DBMS, and all other running
processes). If there is not enough RAM available, moving data among components that are competing
for scarce RAM can create a bottleneck.
c. Hard disk. Another common cause of bottlenecks is hard disk speed and data transfer rates.
Current hard disk storage technology allows for greater storage capacity than in the past; however,
hard disk space is used for more than just storing end-user data. Current operating systems also use
the hard disk for virtual memory, which refers to copying areas of RAM to the hard disk as needed to
make room in RAM for more urgent tasks.
Therefore, the greater the hard disk storage space and the faster the data transfer rates, the less the
likelihood of bottlenecks.
d. Network. In a database environment, the database server and the clients are connected via a
network. All networks have a limited amount of bandwidth that is shared among all clients. When
many network nodes access the network at the same time, bottlenecks are likely.
e. Application code. Not all bottlenecks are caused by limited hardware resources. One of the most
common sources of bottlenecks is badly written application code. No amount of coding will make a
poorly designed database perform better. We should also add: you can throw unlimited resources at a
badly written application, and it will still perform as a badly written application!
Indexes and Query Optimization
Indexes are crucial in speeding up data access because they facilitate searching, sorting, using aggregate
functions, and even join operations. The improvement in data access speed occurs because an index is an
ordered set of values that contains the index key and pointers.
Most DBMSs implement indexes using one of the following data structures:
• Hash index. A hash index is based on an ordered list of hash values. A hash algorithm is used to create a
hash value from a key column. This value points to an entry in a hash table, which in turn points to the actual
location of the data row. This type of index is good for simple and fast lookup operations based on equality
conditions—for example, LNAME=“Scott” and FNAME=“Shannon”.
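A hash-index lookup can be sketched in a few lines: hashing the key column leads straight to the row's location, which is what makes equality searches fast. Python's dict stands in for the hash table, and the rows are made-up sample data:

```python
# Sketch of a hash index over the (LNAME, FNAME) key columns.

rows = [
    {"LNAME": "Scott", "FNAME": "Shannon", "BALANCE": 250},
    {"LNAME": "Ramos", "FNAME": "Anne",    "BALANCE": 400},
]

# Build the index: key column values -> row location (here, a list index).
hash_index = {(r["LNAME"], r["FNAME"]): i for i, r in enumerate(rows)}

# Equality lookup, e.g. LNAME='Scott' AND FNAME='Shannon':
loc = hash_index[("Scott", "Shannon")]
print(rows[loc]["BALANCE"])                        # 250
```

Note that a dict gives fast equality lookups but no ordering, which mirrors why hash indexes are poor for range queries (e.g. BALANCE > 100) where a B-tree index excels.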
• B-tree index. The B-tree index is an ordered data structure organized as an upside-down tree. The index
tree is stored separately from the data. The lower-level leaves of the B-tree index contain the pointers to the
actual data rows. B-tree indexes are “self-balanced,” which means that it takes approximately the same
amount of time to access any given row in the index. This is the default and most common type of index
used in databases. The B-tree index is used mainly in tables in which column values repeat a relatively small
number of times.
• Bitmap index. A bitmap index uses a bit array (0s and 1s) to represent the existence of a value or
condition. These indexes are used mostly in data warehouse applications in tables with a large number of
rows in which a small number of column values repeat many times. (See Figure 11.4.) Bitmap indexes tend
to use less space than B-tree indexes because they use bits instead of bytes to store their data.
Query Formulation
• Business operations became global; with this change, competition expanded from the shop on the next
corner to the web store in cyberspace.
• Customer demands and market needs favored an on-demand transaction style, mostly based on web-based
services.
• Rapid social and technological changes fueled by low-cost, smart mobile devices increased the demand for
complex and fast networks to interconnect them. As a consequence, corporations have increasingly adopted
advanced network technologies as the platform for their computerized solutions. See Chapter 14, Database
Connectivity and Web Technologies, for a discussion of cloud-based services.
• Data realms are converging in the digital world more frequently. As a result, applications must manage
multiple types of data, such as voice, video, music, and images. Such data tend to be geographically
distributed and remotely accessed from diverse locations via location-aware mobile devices.
• Businesses are looking for new ways to gain business intelligence through the analysis of vast stores of
structured and unstructured data.
DDBMS Advantages and Disadvantages
Distributed processing does not require a distributed database, but a distributed database requires distributed
processing. (Each database fragment is managed by its own local database process.)
• Distributed processing may be based on a single database located on a single computer. For the
management of distributed data to occur, copies or parts of the database processing functions must be
distributed to all data storage sites.
• Both distributed processing and distributed databases require a network of interconnected components.
A fully distributed database management system must perform all of the functions of a centralized DBMS,
as follows:
1. Receive the request of an application or end user.
2. Validate, analyze, and decompose the request. The request might include mathematical and logical
operations such as the following: Select all customers with a balance greater than $1,000. The request might
require data from only a single table, or it might require access to several tables.
3. Map the request’s logical-to-physical data components.
4. Decompose the request into several disk I/O operations.
5. Search for, locate, read, and validate the data.
6. Ensure database consistency, security, and integrity.
7. Validate the data for the conditions, if any, specified by the request.
8. Present the selected data in the required format.