
UNIT IV ADVANCED DATABASE CONCEPTS

Advanced Database Concepts: Transaction Management and Concurrency Control – Concurrency Control with Locking Methods – Concurrency Control with Optimistic Methods – ANSI Levels of Transaction Isolation – Database Recovery Management – Database Performance Tuning and Query Optimization – Query Processing – Indexes and Query Optimization – SQL Performance Tuning – Query Formulation – DBMS Performance Tuning – Distributed Database Management System.

Transaction
● A transaction is a set of logically related operations; it contains a group of tasks.
● A transaction is an action, or series of actions, performed by a single user or application program to access the contents of the database.

Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

Operations of Transaction:
Following are the main operations of a transaction:
Read(X): The read operation reads the value of X from the database and stores it in a buffer in main memory.
Write(X): The write operation writes the value back to the database from the buffer.
Consider a debit transaction on an account, which consists of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting the transaction is 4000.

● The first operation reads X's value from the database and stores it in a buffer.
● The second operation will decrease the value of X by 500. So the buffer will
contain 3500.
● The third operation will write the buffer's value to the database. So X's final value
will be 3500.
But because of a hardware, software, or power failure, the transaction may fail before finishing all the operations in the set.
For example: If the above debit transaction fails after executing operation 2, then X's value will remain 4000 in the database, which is not acceptable to the bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
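
To make these operations concrete, here is a minimal Python sketch of the debit example above (the database and buffer names are illustrative, not part of any real DBMS API):

database = {"X": 4000}   # values stored on disk
buffer = {}              # main-memory copies used by the transaction

def read(item):          # R(X): load the value into the buffer
    buffer[item] = database[item]

def commit():            # flush the buffered values to the database
    database.update(buffer)
    buffer.clear()

def rollback():          # failure: discard the buffer; the disk is untouched
    buffer.clear()

read("X")                # buffer["X"] == 4000
buffer["X"] -= 500       # buffer["X"] == 3500; database still holds 4000
commit()                 # database["X"] == 3500

If a failure occurred after the subtraction, calling rollback() instead of commit() would leave database["X"] at 4000, exactly as the bank requires.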
Transaction property
A transaction has four properties, which are used to maintain consistency in the database before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability

Atomicity
● It states that all operations of the transaction take place at once; if not, the transaction is aborted.
● There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and either runs to completion or is not executed at all.
Atomicity involves the following two operations:

Abort: If a transaction aborts, then none of the changes it made are visible.
Commit: If a transaction commits, then all of the changes it made are visible.
Example: Assume a transaction T consisting of T1 and T2. Account A holds Rs 600 and account B holds Rs 300; T transfers Rs 100 from account A to account B.

T1              T2
Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)

After completion of the transaction, A holds Rs 500 and B holds Rs 400.


If the transaction T fails after the completion of T1 but before the completion of T2, then the amount will be deducted from A but not added to B, leaving the database in an inconsistent state. To ensure the correctness of the database state, the transaction must be executed in its entirety.
Consistency
● The integrity constraints are maintained so that the database is consistent before
and after the transaction.
● The execution of a transaction will leave a database in either its prior stable state or
a new stable state.
● The consistent property of the database states that every transaction sees a
consistent database instance.
● The transaction is used to transform the database from one consistent state to
another consistent state.
For example: The total amount must be the same before and after the transaction.
1. Total before T occurs = 600+300=900
2. Total after T occurs= 500+400=900
Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then
inconsistency will occur.
Isolation
● It shows that the data which is used at the time of execution of a transaction cannot
be used by the second transaction until the first one is completed.
● In isolation, if the transaction T1 is being executed and using the data item X, then
that data item can't be accessed by any other transaction T2 until the transaction T1
ends.
● The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
● The durability property guarantees the permanence of the database's consistent state: once a transaction completes, the changes it made are permanent.
● These changes cannot be lost by the erroneous operation of a faulty transaction or by a system failure. When a transaction is completed, the database reaches a state known as the consistent state, and that state cannot be lost, even in the event of a system failure.
● The recovery subsystem of the DBMS is responsible for enforcing the durability property.

States of Transaction
In a database, a transaction can be in one of the following states:

Active state
● The active state is the first state of every transaction. In this state, the transaction is being executed.
● For example: insertion, deletion, or updating of a record is done here, but the changes are not yet saved to the database.

Partially committed
● In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
● For example, in a total-marks calculation, the final display of the total marks is executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully.
In this state, all the effects are now permanently saved on the database system.

Failed state
● If any of the checks made by the database recovery system fails, then the
transaction is said to be in the failed state.
● In the example of total mark calculation, if the database is not able to fire a query
to fetch the marks, then the transaction will fail to execute.

Aborted
● If any of the checks fail and the transaction has reached the failed state, then the database recovery system will make sure that the database is in its previous consistent state. If it is not, it will abort or roll back the transaction to bring the database into a consistent state.
● If the transaction fails in the middle of execution, then all of its executed operations are rolled back to restore the previous consistent state.
● After aborting the transaction, the database recovery module will select one of the
two operations:
1. Re-start the transaction
2. Kill the transaction
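
The states and transitions described above can be summarized in a small Python sketch (the state names follow the text; the transition table is an illustrative simplification):

TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),   # terminal state
    "aborted":             set(),   # terminal: restart or kill the transaction
}

def move(state, new_state):
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state

s = "active"
s = move(s, "partially committed")
s = move(s, "committed")            # a successful transaction ends here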

Schedule
A series of operations from one or more transactions, arranged in execution order, is known as a schedule. A schedule preserves the order of the operations within each individual transaction.

1. Serial Schedule
The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction. In the serial schedule, when the first transaction
completes its cycle, then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2. If there is no interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
● Schedule A is the serial schedule in which T1 is followed by T2.
● Schedule B is the serial schedule in which T2 is followed by T1.

2. Non-serial Schedule
● If interleaving of operations is allowed, then there will be a non-serial
schedule.
● It contains many possible orders in which the system can execute the
individual operations of the transactions.
● Schedule C and Schedule D are non-serial schedules; their operations are interleaved.
Non-serial schedules can be divided further into serializable and non-serializable schedules.
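
One way to see the difference is to write schedules as ordered lists of (transaction, operation) pairs, as in this illustrative Python sketch; a schedule is serial exactly when each transaction's operations form one contiguous run:

schedule_a = [("T1", "R(A)"), ("T1", "W(A)"),      # serial: all of T1, then T2
              ("T2", "R(A)"), ("T2", "W(A)")]
schedule_c = [("T1", "R(A)"), ("T2", "R(A)"),      # non-serial: interleaved
              ("T1", "W(A)"), ("T2", "W(A)")]

def is_serial(schedule):
    seen, last = set(), None
    for txn, _ in schedule:
        if txn != last and txn in seen:
            return False        # this transaction was interrupted earlier
        seen.add(txn)
        last = txn
    return True

print(is_serial(schedule_a))    # True
print(is_serial(schedule_c))    # False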

DBMS Concurrency Control


Concurrency Control is the management procedure that is required for controlling
concurrent execution of the operations that take place on a database. But before knowing
about concurrency control, we should know about concurrent execution.
Concurrent Execution in DBMS
● In a multi-user system, multiple users can access and use the same database at one time, which is known as concurrent execution of the database. It means that the same database is accessed simultaneously by different users on a multi-user system.
● While working with database transactions, multiple users may need to use the database to perform different operations, and in that case, concurrent execution of the database is performed.
● The simultaneous execution should be done in an interleaved manner, and no operation should affect the other executing operations, so that the consistency of the database is maintained. However, concurrent execution of transaction operations raises several challenging problems that need to be solved.

Problems with Concurrent Execution


In a database transaction, the two main operations are READ and WRITE. These operations must be managed carefully during the concurrent execution of transactions, because if interleaved operations are not controlled properly, the data may become inconsistent. The following problems occur with the concurrent execution of operations:

Problem 1: Lost Update Problems (W - W Conflict)


The problem occurs when two different database transactions perform read/write operations on the same database items in an interleaved manner (i.e., concurrent execution) in a way that makes the item values incorrect, leaving the database inconsistent.

For example: Consider the below diagram where two transactions TX and TY, are performed on
the same account A where the balance of account A is $300.

● At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
● At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/written).
● Alternately, at time t3, transaction TY reads the value of account A that will be
$300 only because TX didn't update the value yet.
● At time t4, transaction TY adds $100 to account A that becomes $400 (only added
but not updated/write).
● At time t6, transaction TX writes the value of account A, which is updated to $250, as TY has not yet written its value.
● Similarly, at time t7, transaction TY writes the value of account A, as computed at time t4, i.e., $400. This means the value written by TX is lost: the $250 update is overwritten.
Hence the data becomes incorrect, and the database is left inconsistent.
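
The interleaving above can be replayed step by step in plain Python (no locking; the variable names are illustrative):

A = 300                  # shared balance of account A

tx_local = A             # t1: TX reads A            -> 300
tx_local -= 50           # t2: TX deducts 50 locally -> 250
ty_local = A             # t3: TY reads A (still 300; TX has not written)
ty_local += 100          # t4: TY adds 100 locally   -> 400
A = tx_local             # t6: TX writes             -> A == 250
A = ty_local             # t7: TY writes             -> A == 400; TX's update is lost
print(A)                 # 400, although the correct serial result is 350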

Dirty Read Problems (W-R Conflict)


The dirty read problem occurs when one transaction updates an item of the database, the transaction then fails, and before the update is rolled back, the updated item is accessed by another transaction. This creates a Write-Read conflict between the two transactions.
For example: Consider two transactions TX and TY in the below diagram performing read/write
operations on account A where the available balance in account A is $300:

● At time t1, transaction TX reads the value of account A, i.e., $300.


● At time t2, transaction TX adds $50 to account A that becomes $350.
● At time t3, transaction TX writes the updated value in account A, i.e., $350.
● Then at time t4, transaction TY reads account A that will be read as $350.
● Then at time t5, transaction TX rolls back due to a server problem, and the value changes back to $300 (as initially).
● But transaction TY has already read the uncommitted value of $350; this is a dirty read, and the situation is therefore known as the Dirty Read Problem.

Unrepeatable Read Problem (R-W Conflict)


Also known as the Inconsistent Retrievals Problem, this occurs when a transaction reads two different values for the same database item.
For example: Consider two transactions, TX and TY, performing the read/write operations on
account A, having an available balance = $300. The diagram is shown below:
● At time t1, transaction TX reads the value from account A, i.e., $300.
● At time t2, transaction TY reads the value from account A, i.e., $300.
● At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
● At time t4, transaction TY writes the updated value, i.e., $400.
● After that, at time t5, transaction TX reads the available value of account A again, and it is read as $400.
● This means that within the same transaction, TX reads two different values of account A: $300 initially, and $400 after the update made by transaction TY. This is an unrepeatable read and is therefore known as the Unrepeatable Read Problem.

Thus, in order to maintain consistency in the database and avoid such problems that take place
in concurrent execution, management is needed, and that is where the concept of Concurrency
Control comes into play.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing
the concurrent execution of database operations and thus avoiding the inconsistencies in the
database. Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.

Concurrency Control Protocols


The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions.
Therefore, these protocols are categorized as:
● Lock Based Concurrency Control Protocol
● Timestamp Concurrency Control Protocol
● Multi version concurrency control
● Validation Based / Optimistic Concurrency Control Protocol
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of locks:
1. Shared lock:
● It is also known as a read-only lock. With a shared lock, the data item can only be read by the transaction.
● It can be shared between transactions, because a transaction that holds only a shared lock cannot update the data item.

2. Exclusive lock:
● With an exclusive lock, the data item can be both read and written by the transaction.
● The lock is exclusive: while one transaction holds it, no other transaction can read or modify the same data item.
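
A minimal lock-manager sketch in Python shows the compatibility rule implied above: shared locks from different transactions coexist, while an exclusive lock conflicts with everything (the data structures are illustrative):

locks = {}   # data item -> (mode, set of holding transactions)

def acquire(txn, item, mode):
    held = locks.get(item)
    if held is None:
        locks[item] = (mode, {txn})
        return True
    held_mode, holders = held
    if mode == "S" and held_mode == "S":   # S is compatible with S
        holders.add(txn)
        return True
    return False                           # any X involvement conflicts

print(acquire("T1", "A", "S"))   # True: first lock on A
print(acquire("T2", "A", "S"))   # True: shared locks can coexist
print(acquire("T3", "A", "X"))   # False: X conflicts with the held S locks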

There are four types of lock protocols available:


1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based protocols require every transaction to obtain a lock on the data before performing an insert, delete, or update on it. The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
● The pre-claiming lock protocol evaluates the transaction to list all the data items on which it needs locks.
● Before initiating execution, the transaction requests the DBMS for locks on all of those data items.
● If all the locks are granted, the protocol allows the transaction to begin. When the transaction completes, it releases all the locks.
● If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
3. Two-phase locking (2PL)
● The two-phase locking protocol divides the execution of the transaction into three parts.
● In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.
● In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction releases its first lock.
● In the third part, the transaction cannot demand any new locks; it only releases the acquired locks.

There are two phases of 2PL:
Growing phase: In the growing phase, new locks on data items may be acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
In the example below, if lock conversion is allowed, then the following can happen:
1. Upgrading of lock (from S(a) to X (a)) is allowed in the growing phase.
2. Downgrading of lock (from X(a) to S(a)) must be done in the shrinking phase.

The following way shows how unlocking and locking work with 2-PL.

Transaction T1:
● Growing phase: from step 1-3
● Shrinking phase: from step 5-7
● Lock point: at 3

Transaction T2:
● Growing phase: from step 2-6
● Shrinking phase: from step 8-9
● Lock point: at 6
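
The growing/shrinking discipline can be enforced mechanically, as in this illustrative Python sketch (the lock and unlock calls are hypothetical helpers, not a real DBMS API):

class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False      # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested after first unlock")
        self.held.add(item)         # growing phase

    def unlock(self, item):
        self.shrinking = True       # shrinking phase begins
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A"); t.lock("B")            # growing phase; lock point after B
t.unlock("A")                       # shrinking phase
try:
    t.lock("C")                     # not allowed under 2PL
except RuntimeError as e:
    print(e)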
4. Strict Two-phase locking (Strict-2PL)
● The first phase of Strict-2PL is similar to 2PL. In the first phase, after
acquiring all the locks, the transaction continues to execute normally.
● The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately after using it.
● Strict-2PL waits until the whole transaction commits, and then it releases all the locks at once.
● The Strict-2PL protocol does not have a shrinking phase of gradual lock release.

It does not suffer from cascading aborts, as 2PL does.


Timestamp Ordering Protocol
● The timestamp ordering protocol is used to order transactions based on their timestamps. The order of the transactions is simply the ascending order of their creation times.
● An older transaction has higher priority, so it executes first. To determine the timestamp of a transaction, this protocol uses system time or a logical counter.
● A lock-based protocol manages the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
● For example, suppose transaction T1 entered the system at time 007 and transaction T2 entered at time 009. T1 has the higher priority, so it executes first, as it entered the system first.
● The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on each data item.
Basic Timestamp ordering protocol works as follows:
1. Whenever a transaction Ti issues a Read(X) operation, check the following condition:
● If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
● If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is updated to max(R_TS(X), TS(Ti)).
2. Whenever a transaction Ti issues a Write(X) operation, check the following condition:
● If TS(Ti) < R_TS(X) or TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise, the operation is executed and W_TS(X) is set to TS(Ti).
where:
TS(Ti) denotes the timestamp of transaction Ti.
R_TS(X) denotes the read timestamp of data item X.
W_TS(X) denotes the write timestamp of data item X.
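
The two rules can be written directly as a Python sketch (R_TS and W_TS start at 0, and "reject" stands for rolling the transaction back):

R_TS, W_TS = {}, {}

def read_op(ts, x):
    if W_TS.get(x, 0) > ts:
        return "reject"                    # a younger transaction already wrote x
    R_TS[x] = max(R_TS.get(x, 0), ts)
    return "execute"

def write_op(ts, x):
    if R_TS.get(x, 0) > ts or W_TS.get(x, 0) > ts:
        return "reject"                    # a younger transaction read or wrote x
    W_TS[x] = ts
    return "execute"

print(write_op(7, "X"))   # T1 (TS = 007) writes X -> execute
print(read_op(9, "X"))    # T2 (TS = 009) reads X  -> execute
print(write_op(7, "X"))   # T1 writes X again      -> reject, since R_TS(X) = 9 > 7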
Advantages and Disadvantages of TO protocol:
● The TO protocol ensures conflict serializability, because every edge of the precedence graph points from an older transaction to a younger one, so the graph contains no cycles.
● The TO protocol ensures freedom from deadlock, since no transaction ever waits.
● However, a schedule may not be recoverable and may not even be cascade-free.

Concurrency Control with Optimistic Methods

The optimistic approach is based on the assumption that the majority of database operations do not
conflict. The optimistic approach requires neither locking nor time stamping techniques. Instead, a
transaction is executed without restrictions until it is committed. Using an optimistic approach, each
transaction moves through two or three phases, referred to as read, validation, and write.
• During the read phase, the transaction reads the database, executes the needed computations, and makes
the updates to a private copy of the database values. All update operations of the transaction are recorded in
a temporary update file, which is not accessed by the remaining transactions.
• During the validation phase, the transaction is validated to ensure that the changes made will not affect the integrity and consistency of the database. If the validation test is positive, the transaction goes to the write phase. If the validation test is negative, the transaction is restarted and the changes are discarded.
• During the write phase, the changes are permanently applied to the database.
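
The three phases can be sketched in Python as follows (the read-set/write-set overlap test is a simplified stand-in for real validation rules):

database = {"A": 100}

class OptimisticTxn:
    def __init__(self):
        self.private = dict(database)      # read phase uses a private copy
        self.read_set, self.write_set = set(), set()

    def read(self, item):
        self.read_set.add(item)
        return self.private[item]

    def write(self, item, value):
        self.write_set.add(item)
        self.private[item] = value         # temporary update, invisible to others

    def validate(self, committed_write_sets):
        # restart if any committed transaction wrote something we read
        return all(not (ws & self.read_set) for ws in committed_write_sets)

    def commit(self):
        for item in self.write_set:        # write phase: apply permanently
            database[item] = self.private[item]

t = OptimisticTxn()
t.write("A", t.read("A") + 50)
if t.validate(committed_write_sets=[]):    # validation phase passes
    t.commit()                             # database["A"] is now 150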

ANSI Levels of Transaction Isolation

The ANSI SQL standard (1992) defines transaction management based on transaction isolation levels.
Transaction isolation levels refer to the degree to which transaction data is “protected or isolated” from other
concurrent transactions.
The types of read operations are:
• Dirty read: a transaction can read data that is not yet committed.
• Non-repeatable read: a transaction reads a given row at time t1, and then it reads the same row at time t2, yielding different results. The original row may have been updated or deleted.
• Phantom read: a transaction executes a query at time t1, and then it runs the same query at time t2, yielding
additional rows that satisfy the query.

Read Uncommitted will read uncommitted data from other transactions. At this isolation level, the database
does not place any locks on the data, which increases transaction performance but at the cost of data
consistency.
Read Committed forces transactions to read only committed data. This is the default mode of operation for
most databases (including Oracle and SQL Server). At this level, the database will use exclusive locks on
data, causing other transactions to wait until the original transaction commits.
The Repeatable Read isolation level ensures that queries return consistent results. This type of isolation level uses shared locks to ensure that other transactions do not update a row after the original query reads it. However, new rows may still appear in a repeated query (phantom reads), because those rows did not exist when the first query ran. The Serializable isolation level is the most restrictive level defined by the ANSI SQL standard; it also prevents phantom reads.
The isolation level of a transaction is defined in the transaction statement, for example using general ANSI
SQL syntax:
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
… SQL STATEMENTS….
COMMIT TRANSACTION;

Database Recovery Management


Database recovery restores a database from a given state (usually inconsistent) to a previously consistent
state. Recovery techniques are based on the atomic transaction property: all portions of the transaction must
be treated as a single, logical unit of work in which all operations are applied and completed to produce a
consistent database.

Critical events can cause a database to stop working and compromise the integrity of the data. Examples of
critical events are:
• Hardware/software failures. A failure of this type could be a hard disk media failure, a bad capacitor on a
motherboard, or a failing memory bank. Other causes of errors under this category include application
program or operating system errors that cause data to be overwritten, deleted, or lost. Some database
administrators argue that this is one of the most common sources of database problems.
• Human-caused incidents. This type of event can be categorized as unintentional or intentional.
-- An unintentional failure is caused by a careless end user. Such errors include deleting the wrong rows
from a table, pressing the wrong key on the keyboard, or shutting down the main database server by
accident.
-- Intentional events are of a more severe nature and normally indicate that the company data are at serious risk. Under this category are security threats caused by hackers trying to gain unauthorized access to data resources, and virus attacks caused by disgruntled employees trying to compromise the database operation and damage the company.
• Natural disasters. This category includes fires, earthquakes, floods, and power failures.

Transaction Recovery
Before continuing, examine four important concepts that affect the recovery process:
• The write-ahead-log protocol ensures that transaction logs are always written before any database data are
actually updated. This protocol ensures that, in case of a failure, the database can later be recovered to a
consistent state using the data in the transaction log.
• Redundant transaction logs (several copies of the transaction log) ensure that a physical disk failure will
not impair the DBMS’s ability to recover data.
• Database buffers are temporary storage areas in primary memory used to speed up disk operations. To
improve processing time, the DBMS software reads the data from the physical disk and stores a copy of it on
a “buffer” in primary memory. When a transaction updates data, it actually updates the copy of the data in
the buffer because that process is much faster than accessing the physical disk every time. Later, all buffers
that contain updated data are written to a physical disk during a single operation, thereby saving significant
processing time.
• Database checkpoints are operations in which the DBMS writes all of its updated buffers in memory (also
known as dirty buffers) to disk. While this is happening, the DBMS does not execute any other requests.
A checkpoint operation is also registered in the transaction log.
When the recovery procedure uses a deferred-write technique (also called a deferred update), the transaction operations do not immediately update the physical database. Instead, only the transaction log is updated; the physical database is updated later, using only the data from committed transactions recorded in the transaction log.
When the recovery procedure uses a write-through technique (also called an immediate update), the database
is immediately updated by transaction operations during the transaction’s execution, even before the
transaction reaches its commit point. If the transaction aborts before it reaches its commit point, a
ROLLBACK or undo operation needs to be done to restore the database to a consistent state. In that case, the
ROLLBACK operation will use the transaction log “before” values. The recovery process follows these
steps:
1. Identify the last checkpoint in the transaction log. This is the last time transaction data were physically
saved to disk.
2. For a transaction that started and was committed before the last checkpoint, nothing needs to be done
because the data are already saved.
3. For a transaction that was committed after the last checkpoint, the DBMS redoes the transaction, using the
“after” values of the transaction log. Changes are applied in ascending order, from oldest to newest.
4. For any transaction that had a ROLLBACK operation after the last checkpoint or that was left active (with
neither a COMMIT nor a ROLLBACK) before the failure occurred, the DBMS uses the transaction log
records to ROLLBACK or undo the operations, using the “before” values in the transaction log. Changes are
applied in reverse order, from newest to oldest.
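
The four steps can be illustrated with a toy Python sketch of write-through recovery (the log format is invented for the example; real transaction logs are more elaborate):

log = [
    ("T1", "update", "A", 100, 150),   # (txn, op, item, before, after)
    ("T1", "commit"),
    ("checkpoint",),
    ("T2", "update", "B", 200, 250),
    ("T2", "commit"),
    ("T3", "update", "C", 300, 350),   # T3 never committed before the crash
]
database = {"A": 150, "B": 200, "C": 350}          # disk state at failure

cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")   # step 1
after_cp = log[cp + 1:]
committed = {rec[0] for rec in after_cp if rec[1:] == ("commit",)}

for rec in after_cp:                               # step 3: redo, oldest first
    if rec[1] == "update" and rec[0] in committed:
        database[rec[2]] = rec[4]                  # apply the "after" value
for rec in reversed(after_cp):                     # step 4: undo, newest first
    if rec[1] == "update" and rec[0] not in committed:
        database[rec[2]] = rec[3]                  # restore the "before" value

print(database)   # {'A': 150, 'B': 250, 'C': 300}

Note that T1 committed before the checkpoint, so (step 2) nothing is done for it; its value was already flushed to disk.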
DATABASE PERFORMANCE TUNING & QUERY OPTIMIZATION
One of the main functions of a database system is to provide timely answers to end users. End
users interact with the DBMS through the use of queries to generate information, using the
following sequence:
1. The end-user (client-end) application generates a query.
2. The query is sent to the DBMS (server end).
3. The DBMS (server end) executes the query.
4. The DBMS sends the resulting data set to the end-user (client-end) application.
The goal of database performance is to execute queries as fast as possible. Therefore, database
performance must be closely monitored and regularly tuned.
Database performance tuning
Refers to a set of activities and procedures designed to reduce the response time of the database
system—that is, to ensure that an end-user query is processed by the DBMS in the minimum
amount of time.
The time required by a query to return a result set depends on many factors. Those factors tend to
be wide-ranging and to vary from environment to environment and from vendor to vendor. The
performance of a typical DBMS is constrained by three main factors:
• CPU processing power
• Available primary memory (RAM)
• Input/output (hard disk and network) throughput
Database performance-tuning activities can be divided into those taking place on the client side
and those taking place on the server side.
On the client side, the objective is to generate a SQL query that returns the correct answer in the
least amount of time, using the minimum amount of resources at the server end. The activities
required to achieve that goal are commonly referred to as SQL performance tuning.
On the server side, the DBMS environment must be properly configured to respond to clients’
requests in the fastest way possible, while making optimum use of existing resources. The
activities required to achieve that goal are commonly referred to as DBMS performance tuning.
DBMS Architecture
The architecture of a DBMS is represented by the processes and structures (in memory and in
permanent storage) used to manage a database. Such processes collaborate with one another to
perform specific functions.
Basic DBMS Architecture
The following are some of the processes of a typical DBMS
Listener- The listener process listens for clients’ requests and handles the processing of the
SQL requests to other DBMS processes. Once a request is received, the listener passes the
request to the appropriate user process.
User- The DBMS creates a user process to manage each client session. Therefore, when you
log on to the DBMS, you are assigned a user process. This process handles all requests you
submit to the server. There are many user processes—at least one per each logged-in client.
Scheduler- The scheduler process organizes the concurrent execution of SQL requests.
Lock manager- This process manages all locks placed on database objects, including disk pages.
Optimizer- The optimizer process analyzes SQL queries and finds the most efficient way to access the data.

The data cache or buffer cache is a shared, reserved memory area that stores the most recently
accessed data blocks in RAM. The data cache is where the data read from the database data
files are stored after the data have been read or before the data are written to the database data
files. The data cache also caches system catalog data and the contents of the indexes.

The SQL cache, or procedure cache, is a shared, reserved memory area that stores the
most recently executed SQL statements or PL/SQL procedures, including triggers and
functions. The SQL cache does not store the end-user-written SQL. Rather, the SQL cache
stores a “processed” version of the SQL that is ready for execution by the DBMS.

Database Statistics
Database statistics refers to a number of measurements about database objects and environment resources, such as the number of processors used, processor speed, and temporary space available. Such statistics give a snapshot of database characteristics.
The DBMS uses these statistics to make critical decisions about improving query processing
efficiency. Database statistics can be gathered manually by the DBA or automatically by the
DBMS. For example, many DBMS vendors support the ANALYZE command in SQL to
gather statistics.
Example:
ANALYZE TABLE STUDENT COMPUTE STATISTICS;
UPDATE STATISTICS STUDENT;
When you generate statistics for a table, all related indexes are also analyzed. However, you
could generate statistics for a single index by using the following command:
ANALYZE INDEX STU_NDX COMPUTE STATISTICS;
UPDATE STATISTICS STUDENT STU_NDX;
QUERY PROCESSING
The DBMS processes a query in three phases:
1. Parsing. The DBMS parses the SQL query and chooses the most
efficient access/execution plan.
2. Execution. The DBMS executes the SQL query using the chosen execution plan.
3. Fetching. The DBMS fetches the data and sends the result set back to the client.
The processing of SQL DDL statements (such as CREATE TABLE) is different from the
processing required by DML statements. The difference is that a DDL statement actually
updates the data dictionary tables or system catalog, while a DML statement (SELECT,
INSERT, UPDATE, and DELETE) mostly manipulates end-user data.
QUERY PROCESSING BOTTLENECKS
The execution of a query requires the DBMS to break down the query into a series of interdependent
I/O operations to be executed in a collaborative manner. The more complex a query is, the more
complex the operations are, and the more likely it is that there will be bottlenecks.
A query processing bottleneck is a delay introduced in the processing of an I/O operation that causes
the overall system to slow down. In the same way, the more components a system has, the more
interfacing among the components is required, and the more likely it is that there will be bottlenecks.
Within a DBMS, there are five components that typically cause bottlenecks:
a. CPU. The CPU processing power of the DBMS should match the system’s expected work load. A
high CPU utilization might indicate that the processor speed is too slow for the amount of work
performed. However, heavy CPU utilization can be caused by other factors, such as a defective
component, not enough RAM (the CPU spends too much time swapping memory blocks), a badly
written device driver, or a rogue process. A CPU bottleneck will affect not only the DBMS but all
processes running in the system.
b. RAM. The DBMS allocates memory for specific usage, such as data cache and SQL cache. RAM
must be shared among all running processes (operating system, DBMS, and all other running
processes). If there is not enough RAM available, moving data among components that are competing
for scarce RAM can create a bottleneck.
c. Hard disk. Another common cause of bottlenecks is hard disk speed and data transfer rates.
Current hard disk storage technology allows for greater storage capacity than in the past; however,
hard disk space is used for more than just storing end-user data. Current operating systems also use
the hard disk for virtual memory, which refers to copying areas of RAM to the hard disk as needed to
make room in RAM for more urgent tasks.
Therefore, the greater the hard disk storage space and the faster the data transfer rates, the less the
likelihood of bottlenecks.
d. Network. In a database environment, the database server and the clients are connected via a
network. All networks have a limited amount of bandwidth that is shared among all clients. When
many network nodes access the network at the same time, bottlenecks are likely.
e. Application code. Not all bottlenecks are caused by limited hardware resources. One of the most
common sources of bottlenecks is badly written application code. No amount of coding will make a
poorly designed database perform better. We should also add: you can throw unlimited resources at a
badly written application, and it will still perform as a badly written application!
Indexes and Query Optimization
Indexes are crucial in speeding up data access because they facilitate searching, sorting, and using aggregate functions and even join operations. The improvement in data access speed occurs because an index is an ordered set of values that contains the index key and pointers.
Most DBMSs implement indexes using one of the following data structures:
• Hash index. A hash index is based on an ordered list of hash values. A hash algorithm is used to create a
hash value from a key column. This value points to an entry in a hash table, which in turn points to the actual
location of the data row. This type of index is good for simple and fast lookup operations based on equality
conditions—for example, LNAME=“Scott” and FNAME=“Shannon”.
• B-tree index. The B-tree index is an ordered data structure organized as an upside-down tree. The index
tree is stored separately from the data. The lower-level leaves of the B-tree index contain the pointers to the
actual data rows. B-tree indexes are “self-balanced,” which means that it takes approximately the same
amount of time to access any given row in the index. This is the default and most common type of index
used in databases. The B-tree index is used mainly in tables in which column values repeat a relatively small
number of times.
• Bitmap index. A bitmap index uses a bit array (0s and 1s) to represent the existence of a value or
condition. These indexes are used mostly in data warehouse applications in tables with a large number of
rows in which a small number of column values repeat many times. Bitmap indexes tend
to use less space than B-tree indexes because they use bits instead of bytes to store their data.
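
As a toy illustration of the hash-index idea, the following Python sketch maps a key column value to row positions so that an equality lookup avoids scanning every row (the table contents are invented):

rows = [
    {"lname": "Scott", "fname": "Shannon", "balance": 300},
    {"lname": "Ramos", "fname": "Anne",    "balance": 120},
]

hash_index = {}                     # key value -> list of row positions
for pos, row in enumerate(rows):    # Python dicts are hash tables
    hash_index.setdefault(row["lname"], []).append(pos)

for pos in hash_index.get("Scott", []):     # equality lookup via the index
    print(rows[pos])                # {'lname': 'Scott', 'fname': 'Shannon', ...}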
Query Formulation

To formulate a query, you would normally follow these steps:


1. Identify what columns and computations are required. The first step is needed to clearly determine what
data values you want to return. Do you want to return just the names and addresses, or do you also want to
include some computations? Remember that all columns in the SELECT statement should return single
values.
a. Do you need simple expressions? For example, do you need to multiply the price by the quantity on hand
to generate the total inventory cost? You might need some single attribute functions such as DATE(),
SYSDATE(), or ROUND().
b. Do you need aggregate functions? If you need to compute the total sales by product, you should use a
GROUP BY clause. In some cases, you might need to use a subquery.
c. Determine the granularity of the raw data required for your output. Sometimes, you might need to
summarize data that are not readily available in any table. In such cases, you might consider breaking the
query into multiple subqueries and storing those subqueries as views. Then you could create a top-level
query that joins those views and generates the final output.
2. Identify the source tables. Once you know what columns are required, you can determine the source tables
used in the query. Some attributes appear in more than one table. In those cases, try to use the least number
of tables in your query to minimize the number of join operations.
3. Determine how to join the tables. Once you know what tables you need in your query statement, you must
properly identify how to join the tables. In most cases, you will use some type of natural join, but in some
instances, you might need to use an outer join.
4. Determine what selection criteria are needed. Most queries involve some type of selection criteria. In this
case, you must determine what operands and operators are needed in your criteria. Ensure that the data type
and granularity of the data in the comparison criteria are correct.
a. Simple comparison. In most cases, you will be comparing single values—for example, P_PRICE > 10.
b. Single value to multiple values. If you are comparing a single value to multiple values, you might need to
use an IN comparison operator—for example, V_STATE IN ('FL', 'TN', 'GA').
c. Nested comparisons. In other cases, you might need to have some nested selection criteria involving
subqueries - for example, P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT).
d. Grouped data selection. On other occasions, the selection criteria might apply not to the raw data but to
the aggregate data. In those cases, you need to use the HAVING clause.
5. Determine the order in which to display the output. Finally, the required output might be ordered by one
or more columns. In those cases, you need to use the ORDER BY clause. Remember that the ORDER BY
clause is one of the most resource-intensive operations for the DBMS.

DBMS Performance Tuning


DBMS performance tuning at the server end focuses on setting the parameters used for:
• Data cache. The data cache size must be set large enough to permit as many data requests as possible to be
serviced from the cache. Each DBMS has settings that control the size of the data cache; some DBMSs
might require a restart. This cache is shared among all database users. The majority of primary memory
resources will be allocated to the data cache.
• SQL cache. The SQL cache stores the most recently executed SQL statements (after the SQL statements
have been parsed by the optimizer). Generally, if you have an application with multiple users accessing a
database, the same query will likely be submitted by many different users. In those cases, the DBMS will
parse the query only once and execute it many times, using the same access plan. In that way, the second and
subsequent SQL requests for the same query are served from the SQL cache, skipping the parsing phase.
• Sort cache. The sort cache is used as a temporary storage area for ORDER BY or GROUP BY operations,
as well as for index-creation functions.
• Optimizer mode. Most DBMSs operate in one of two optimization modes: cost-based or rule-based. Others
automatically determine the optimization mode based on whether database statistics are available. For
example, the DBA is responsible for generating the database statistics that are used by the cost-based
optimizer. If the statistics are not available, the DBMS uses a rule-based optimizer.
Use RAID (redundant array of independent disks) to provide both performance improvement and fault
tolerance, and a balance between them. Fault tolerance means that in case of failure, data can be
reconstructed and retrieved. RAID systems use multiple disks to create virtual disks (storage volumes)
formed by several individual disks.
Distributed Database Management Systems
The Evolution of Distributed Database Management Systems
A distributed database management system (DDBMS) governs the storage and processing of logically
related data over interconnected computer systems in which both data and processing are distributed among
several sites. To understand how and why the DDBMS is different from the DBMS, it is useful to briefly
examine the changes in the business environment that set the stage for the development of the DDBMS.

• Business operations became global; with this change, competition expanded from the shop on the next corner to the web store in cyberspace.
• Customer demands and market needs favored an on-demand transaction style, mostly based on web-based
services.
• Rapid social and technological changes fueled by low-cost, smart mobile devices increased the demand for
complex and fast networks to interconnect them. As a consequence, corporations have increasingly adopted
advanced network technologies as the platform for their computerized solutions. See Chapter 14, Database
Connectivity and Web Technologies, for a discussion of cloud-based services.
• Data realms are converging in the digital world more frequently. As a result, applications must manage
multiple types of data, such as voice, video, music, and images. Such data tend to be geographically
distributed and remotely accessed from diverse locations via location-aware mobile devices.
• Businesses are looking for new ways to gain business intelligence through the analysis of vast stores of
structured and unstructured data.
DDBMS Advantages and Disadvantages

Distributed Processing and Distributed Databases


In distributed processing, a database’s logical processing is shared among two or more physically
independent sites that are connected through a network. For example, the data input/output (I/O), data
selection, and data validation might be performed on one computer, and a report based on that data might be
created on another computer.
A distributed database, on the other hand, stores a logically related database over two or more physically
independent sites. The sites are connected via a computer network. In contrast, the distributed processing
system uses only a single-site database but shares the processing chores among several sites. In a distributed
database system, a database is composed of several parts known as database fragments. The database
fragments are located at different sites and can be replicated among various sites. Each database fragment is,
in turn, managed by its local database process.

Distributed processing does not require a distributed database, but a distributed database requires distributed
processing. (Each database fragment is managed by its own local database process.)
• Distributed processing may be based on a single database located on a single computer. For the
management of distributed data to occur, copies or parts of the database processing functions must be
distributed to all data storage sites.
• Both distributed processing and distributed databases require a network of interconnected components.

Characteristics of Distributed Database Management Systems


A DDBMS must provide at least the following functions:
• Application interface to interact with the end user, application programs, and other DBMSs within the distributed database
• Validation to analyze data requests for syntax correctness
• Transformation to decompose complex requests into atomic data request components
• Query optimization to find the best access strategy (which database fragments must be accessed by the query, and how must data updates, if any, be synchronized?)
• Mapping to determine the data location of local and remote fragments
• I/O interface to read or write data from or to permanent local storage
• Formatting to prepare the data for presentation to the end user or to an application program
• Security to provide data privacy at both local and remote databases
• Backup and recovery to ensure the availability and recoverability of the database in case of a failure
• DB administration features for the database administrator
• Concurrency control to manage simultaneous data access and to ensure data consistency across database
fragments in the DDBMS
• Transaction management to ensure that the data move from one consistent state to another; this activity
includes the synchronization of local and remote transactions as well as transactions across multiple
distributed segments

A fully distributed database management system must perform all of the functions of a centralized DBMS,
as follows:
1. Receive the request of an application or end user.
2. Validate, analyze, and decompose the request. The request might include mathematical and logical
operations such as the following: Select all customers with a balance greater than $1,000. The request might
require data from only a single table, or it might require access to several tables.
3. Map the request’s logical-to-physical data components.
4. Decompose the request into several disk I/O operations.
5. Search for, locate, read, and validate the data.
6. Ensure database consistency, security, and integrity.
7. Validate the data for the conditions, if any, specified by the request.
8. Present the selected data in the required format.
