Oracle Database 11g RAC Performance Tuning
Learning Objective
Graphic
In this example graph, the X-axis is Wait time and the Y-axis is CPU time. The
application is scalable when the CPU and Wait time are equal in proportion. When
the CPU time is more than the wait time, the application might need SQL tuning.
When the wait time is more than the CPU time, it needs instance or RAC tuning.
In such a case, no gain is achieved by adding CPUs or nodes.
Although the proportion of CPU time to wait time always tends to decrease as load on the
system increases, steep increases in wait time are a sign of contention and must be
addressed for good scalability.
Adding more CPUs to a node, or nodes to a cluster, provides very limited benefit
under contention. Conversely, a system where the proportion of CPU time does not
decrease significantly as load increases can scale better, and would most likely benefit
from adding CPUs or Real Application Clusters (RAC) instances if needed.
Note
Automatic Workload Repository or AWR reports display CPU time together with
wait time in the Top 5 Timed Events section, if the CPU time portion is among the
top five events.
Although there are specific tuning areas for RAC such as instance recovery and
interconnect traffic, you get most benefits by tuning your system like a single-instance
system. At least this must be your starting point. Obviously, if you have serialization
issues in a single-instance environment, these may be exacerbated with RAC.
You have basically the same tuning tools with RAC as with a single-instance system.
However, certain combinations of specific wait events and statistics are well-known RAC
tuning cases.
You see some of those specific combinations, as well as the RAC-specific information
that you can get from the Enterprise Manager performance pages and Statspack and
AWR reports. Finally, you see the RAC-specific information that you can get from the
Automatic Database Diagnostic Monitor or ADDM.
When an instance fails and the failure is detected by another instance, the second
instance performs these recovery steps.
Graphic
The recovery process is divided into two phases. The first phase contains steps 1
and 2 of the process, Remaster enqueue resources and Remaster cache
resources. In this phase, information from other caches is used and the LMS
recovers the GRD. The second phase contains steps 3, 4, and 5, which are Build
recovery set, Resource claim, and Roll forward recovery set. In this phase, SMON
recovers the database and merges the failed redo threads.
Remaster enqueue resources
During the first phase of recovery, Global Enqueue Services or GES remasters the
enqueues.
Remaster cache resources
The Global Cache Services or GCS remasters its resources. The GCS processes
remaster only those resources that lose their masters. During this time, all GCS resource
requests and write requests are temporarily suspended. However, transactions can
continue to modify data blocks as long as these transactions have already acquired the
necessary resources.
Build recovery set
After enqueues are reconfigured, one of the surviving instances can grab the Instance
Recovery enqueue. Therefore, at the same time as GCS resources are remastered,
SMON determines the set of blocks that need recovery. This set is called the recovery set.
Because, with Cache Fusion, an instance ships the contents of its blocks to the requesting
instance without writing the blocks to the disk, the on-disk version of the blocks may not
contain the changes that are made by either instance.
This implies that SMON needs to merge the content of all the online redo logs of each
failed instance to determine the recovery set. This is because one failed thread might
contain a hole in the redo that needs to be applied to a particular block. So redo threads of
failed instances cannot be applied serially. Also, redo threads of surviving instances are
not needed for recovery because SMON could use past or current images of their
corresponding buffer caches.
Resource claim
Buffer space for recovery is allocated and the resources that were identified in the previous
reading of the redo logs are claimed as recovery resources. This is done to prevent other
instances from accessing those resources.
Roll forward recovery set
All resources required for subsequent processing have been acquired and the Global
Resource Directory or GRD is now unfrozen. Any data blocks that are not in recovery can
now be accessed. Note that the system is already partially available. Then, assuming that
there are past images or current images of blocks to be recovered in other caches in the
cluster database, the most recent image is the starting point of recovery for these
particular blocks.
If neither the past image buffers nor the current buffer for a data block is in any of the
surviving instances' caches, then SMON performs a log merge of the failed instances.
SMON recovers and writes each block identified in step 3, releasing the recovery
resources immediately after block recovery so that more blocks become available as
recovery proceeds.
After all blocks have been recovered and the recovery resources have been released, the
system is again fully available.
In summary, the recovered database, or the recovered portions of the database, becomes
available earlier, before the completion of the entire recovery sequence. This makes
the system available sooner and makes recovery more scalable.
Note
Question
When an instance fails and the failure is detected by another instance, the second
instance performs a series of recovery steps. What is the first step performed by
the second instance?
Options:
1.
The GRD is unfrozen
2.
The GES remasters the enqueues
3.
Buffer space for recovery is allocated
4.
The GCS remasters its resources
Answer
Option 1: Incorrect. During the final step in instance recovery, all resources
required for subsequent processing have been acquired and the Global Resource
Directory (GRD) is unfrozen. Any data blocks that are not in recovery can then be
accessed.
Option 2: Correct. During the first step in instance recovery, the Global Enqueue
Services (GES) remasters the enqueues. Once this step has been performed, one
of the surviving instances can grab the Instance Recovery enqueue, GCS
resources are remastered, and SMON determines the set of blocks that need
recovery.
Option 3: Incorrect. During the fourth step performed during instance recovery,
buffer space for recovery is allocated and the resources that were identified in the
previous reading of the redo logs are claimed as recovery resources. This is done
to avoid other instances accessing those resources.
Option 4: Incorrect. During the second step in instance recovery, the Global
Cache Services (GCS) remasters its resources. The GCS processes remaster
only those resources that lose their masters. During this time, all GCS resource
requests and write requests are temporarily suspended. However, transactions
can continue to modify data blocks as long as these transactions have already
acquired the necessary resources.
Correct answer(s):
2. The GES remasters the enqueues
The example illustrates the degree of database availability during each step of Oracle
instance recovery.
Graphic
The example for Oracle instance recovery shows database availability ranging
from none, through partial, to full as elapsed time increases.
A
Real Application Clusters is running on multiple nodes.
B
Node failure is detected.
C
The enqueue part of the GRD is reconfigured; resource management is redistributed to the
surviving nodes. This operation occurs relatively quickly.
D
The cache part of the GRD is reconfigured and SMON reads the redo log of the failed
instance to identify the database blocks that it needs to recover.
E
SMON issues the GRD requests to obtain all the database blocks it needs for recovery.
After the requests are complete, all other blocks are accessible.
F
The Oracle server performs roll forward recovery. Redo logs of the failed threads are
applied to the database, and blocks are available right after their recovery is completed.
G
The Oracle server performs rollback recovery. Undo blocks are applied to the database for
all uncommitted transactions.
H
Instance recovery is complete and all data is accessible.
The dotted steps represent the ones identified in the RAC and Instance or Crash
Recovery process. The dashed line represents the blocks identified while remastering
cache resources in this process.
In a single-instance environment, the instance startup combined with the crash recovery
time is controlled by the setting of the FAST_START_MTTR_TARGET initialization
parameter. You can set its value if you want incremental checkpointing to be more
aggressive than autotune checkpointing. However, this is at the expense of a much
higher I/O overhead.
In a RAC environment, including the startup time of the instance in this calculation is
useless because one of the surviving instances is doing the recovery.
In a RAC environment, it is possible to monitor the estimated target (in seconds) for the
duration from the start of instance recovery to the time when the GCS is open for lock
requests for blocks that are not needed for recovery. This estimate is published in the
V$INSTANCE_RECOVERY view through the ESTD_CLUSTER_AVAILABLE_TIME column.
Basically, you can monitor the time your cluster is frozen during instance recovery
situations.
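That estimate can be queried directly. A minimal sketch, using columns documented for this view (ESTD_CLUSTER_AVAILABLE_TIME is populated only in a RAC environment):

```sql
-- Estimated time (in seconds) the cluster is frozen during instance
-- recovery, alongside the current MTTR target and estimate
SELECT estd_cluster_available_time,
       target_mttr,
       estimated_mttr
FROM   v$instance_recovery;
```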
In a RAC environment, the FAST_START_MTTR_TARGET initialization parameter is used
to bound the entire instance recovery time, assuming that it is instance recovery for a
single-instance death.
Note
If you really want to have small instance recovery time by setting
FAST_START_MTTR_TARGET, you can safely ignore the alert log messages about
raising its value.
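Setting the parameter is a one-line operation; the 60-second target below is purely illustrative:

```sql
-- Request an aggressive instance recovery target of 60 seconds;
-- expect higher checkpointing I/O overhead as the trade-off
ALTER SYSTEM SET fast_start_mttr_target = 60 SCOPE = BOTH;
```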
Here are some guidelines you can use to make sure that instance recovery in your RAC
environment is faster:
use parallel instance recovery
Use parallel instance recovery by setting RECOVERY_PARALLELISM.
increase PARALLEL_EXECUTION_MESSAGE_SIZE
Increase PARALLEL_EXECUTION_MESSAGE_SIZE from its default of 2,148 bytes to 4 KB
or 8 KB. This should provide better recovery slave performance.
set PARALLEL_MIN_SERVERS
Set PARALLEL_MIN_SERVERS to CPU_COUNT-1. This will prespawn recovery slaves at
startup time.
use asynchronous I/O, and
Using asynchronous I/O is one of the most crucial factors in recovery time. The first-pass
log read uses asynchronous I/O.
increase default buffer cache size
Instance recovery uses 50 percent of the default buffer cache for recovery buffers. If this is
not enough, some of the steps of instance recovery will be done in several passes. You
should be able to identify such situations by looking at your alert.log file. In that case,
you should increase the size of your default buffer cache.
Summary
When the wait time of a system is more than the CPU time, it needs instance or RAC
tuning. Specific wait events, system and enqueue statistics, Enterprise Manager
performance pages, Statspack and AWR reports, and ADDM reports help in RAC specific
tuning.
The recovery time in RAC and instance or crash recovery processes is divided into two
phases. The first phase includes remastering enqueue and cache resources. The second
phase includes building recovery set, resource claim, and roll forward of recovery set.
Database availability ranges from none to full during the Oracle instance recovery
process. Follow simple but important guidelines to ensure fast instance recovery in your
RAC environment.
the Global Cache Services statistics for current and cr blocks such as gc current blocks received
and gc cr blocks received and
the Global Cache Services wait events for gc current block 3-way, gc cr grant 2-way, and so on
The response time for Cache Fusion transfers is determined by the messaging time and
processing time imposed by the physical interconnect components, the IPC protocol, and
the GCS protocol.
It is not affected by disk input/output or I/O factors other than occasional log writes. The
Cache Fusion protocol does not require I/O to data files in order to guarantee cache
coherency, and RAC inherently does not cause any more I/O to disk than a nonclustered
instance.
In a RAC AWR report, the Global Cache and Enqueue Services - Workload
Characteristics table in the RAC Statistics section contains average times (latencies) for
some Global Cache Services and Global Enqueue Services operations.
Those latencies should be monitored over time, and significant increases in their values
should be investigated. The table presents some typical values, based on empirical
observations. The factors that may cause variations to those latencies are as follows:
Graphic
The Global Cache and Enqueue Services - Workload Characteristics table has
two columns. The first column includes 11 services and the second column
includes a value for each service. The 11 services in the first column are Avg
global enqueue get time (ms), Avg global cache cr block receive time (ms), Avg
global cache current block receive time (ms), Avg global cache cr block build time
(ms), Avg global cache cr block send time (ms), Global cache log flushes for cr
blocks served %, Avg global cache cr block flush time (ms), Avg global cache
current block pin time (ms), Avg global cache current block send time (ms), Global
cache log flushes for current blocks served %, and Avg global cache current block
flush time (ms).
utilization of the IPC protocol; user-mode IPC protocols are faster, but only Tru64's RDG is
recommended for use
scheduling delays, when the system is under high CPU utilization, and
log flushes for blocks with pending redo
Graphic
The table includes four columns: AWR Report Latency Name, Lower Bound,
Typical, and Upper Bound. The latency names are Average time to process cr
block request, Avg global cache cr block receive time (ms), Average time to
process current block request, and Avg global cache current block receive time
(ms). The values for Lower Bound, Typical, and Upper Bound are provided for
each latency.
Note
The time to process a consistent read (CR) block request in the cache corresponds
to (build time + flush time + send time), and the time to process a current block
request in the cache corresponds to (pin time + flush time + send time).
Analyzing what sessions are waiting for is an important method to determine where time
is spent. In RAC, the wait time is attributed to an event that reflects the exact outcome of
a request.
For example, when a session on an instance is looking for a block in the global cache, it
does not know whether it will receive the data cached by another instance or whether it
will receive a message to read from disk.
The wait events for the global cache convey precise information and wait for global cache
blocks or messages. They are mainly categorized by the following:
temporarily represented by a placeholder event that is active while waiting for a block, and
Graphic
There are seven views.
The V$SYSTEM_EVENT view displays total waits for an event. The
V$SESSION_WAIT_CLASS view displays waits for a wait event class by a session.
The V$SESSION_EVENT view displays waits for an event by a session. The
V$ACTIVE_SESSION_HISTORY view displays the activity of recent active sessions.
The V$SESSION_WAIT_HISTORY view displays the last 10 wait events for each
active session. The V$SESSION_WAIT view displays events for which active
sessions are waiting. The V$SQLSTATS view identifies SQL statements impacted
by interconnect latencies.
For most of the global cache wait events, the parameters include file number, block
number, the block class, and access mode dispositions, such as mode held and
requested.
The wait times for events presented and aggregated in these views are very useful when
debugging response time performance issues. Note that the time waited is cumulative,
and that the event with the highest score is not necessarily a problem.
However, if the available CPU power cannot be maximized, or response times for an
application are too high, the top wait events provide valuable performance diagnostics.
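As a sketch, the instance-wide global cache waits could be ranked from V$SYSTEM_EVENT like this:

```sql
-- Rank global cache wait events by cumulative time waited
-- (TIME_WAITED is reported in centiseconds)
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event LIKE 'gc %'
ORDER  BY time_waited DESC;
```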
Note
Use the CLUSTER_WAIT_TIME column in V$SQLSTATS to identify SQL
statements impacted by interconnect latencies, or run an ADDM report on the
corresponding AWR snapshot.
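A minimal query along those lines, ordering by the cluster wait column (reported in microseconds):

```sql
-- Top 10 SQL statements by time spent on cluster (interconnect) waits
SELECT *
FROM  (SELECT sql_id,
              cluster_wait_time,
              elapsed_time,
              substr(sql_text, 1, 60) AS sql_text_head
       FROM   v$sqlstats
       ORDER  BY cluster_wait_time DESC)
WHERE  ROWNUM <= 10;
```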
Question
Which wait event view contains a CLUSTER_WAIT_TIME column that can be used
to identify SQL statements impacted by interconnect latencies?
Options:
1.
V$SQLSTATS
2.
V$SESSION_WAIT
3.
V$SYSTEM_EVENT
4.
V$SESSION_WAIT_HISTORY
Answer
Option 1: Correct. Use the CLUSTER_WAIT_TIME column in V$SQLSTATS to
identify SQL statements impacted by interconnect latencies, or run an ADDM
report on the corresponding AWR snapshot.
Option 2: Incorrect. This wait event view shows events for which active sessions
are waiting.
Option 3: Incorrect. This wait event view shows total waits for an event.
Option 4: Incorrect. This wait event view shows last 10 wait events for each active
session.
Correct answer(s):
1. V$SQLSTATS
When SGA2 receives the request, its local LGWR process may need to flush some
recovery information to its local redo log files. For example, if the cached block is
frequently changed and the changes have not been logged yet, LMS would have to ask
LGWR to flush the log before it can ship the block. This may add a delay to the serving of
the block and may show up in the requesting node as a busy wait.
step 3
Then, SGA2 sends the requested block to SGA1. When the block arrives in SGA1, the wait
event is complete and is reflected as gc current block 2-way.
Using the notation R = time at requestor, W = wire time and transfer delay, and S = time at
server, the total time for a round-trip would be: R(send) + W(small msg) + S(process msg,
process block, send) + W(block) + R(receive block).
This is a modified scenario for a cluster with more than two nodes. It is very similar to the
2-way Block Request. However, the master for this block is on a node that is different
from that of the requestor, and where the block is cached. Thus, the request must be
forwarded.
There is an additional delay for one message and the processing at the master node:
R(send) + W(small msg) + S(process msg, send) + W(small msg) + S(process msg,
process block, send) + W(block) + R(receive block).
While a remote read is pending, any process on the requesting instance that is trying to
write or read the data cached in the buffer has to wait for a gc buffer busy. The buffer
remains globally busy until the block arrives.
In this scenario, a grant message is sent by the master because the requested block is
not cached in any instance. If the local instance is the resource master, the grant happens
immediately. If not, the grant is always 2-way, regardless of the number of instances in
the cluster.
The grant messages are small. For every block read from the disk, a grant has to be
received before the I/O is initiated, which adds the latency of the grant round-trip to the
disk latency: R(send) + W(small msg) + S(process msg, send) + W(small msg) + R(receive
grant).
The round-trip looks similar to a 2-way block round-trip, with the difference that the wire
time is determined by a small message, and the processing does not involve the buffer
cache.
An enqueue wait is not RAC specific, but involves a global lock operation when RAC is
enabled.
Most of the global requests for enqueues are synchronous, and foreground processes
wait for them. Therefore, contention on enqueues in RAC is more visible than in single-
instance environments. The following are the enqueue types for which most waits for
enqueues occur.
TX
Transaction enqueue; used for transaction demarcation and tracking.
TM
Table or partition enqueue; used to protect table definitions during DML operations.
HW
High-water mark enqueue; acquired to synchronize a new block operation.
SQ
Sequence enqueue; used to serialize incrementing of an Oracle sequence number.
TA
Enqueue used mainly for transaction recovery as part of instance recovery.
US
Undo segment enqueue; mainly used by the Automatic Undo Management (AUM) feature.
In the case of all the enqueue types, the waits are synchronous and may constitute
serious serialization points that can be exacerbated in a RAC environment.
Note
In Oracle Database 10g, the enqueue wait events specify the resource name and
a reason for the wait, for example, enq: TX - index block split. This makes
diagnostics of enqueue waits easier.
Using system statistics based on V$SYSSTAT enables characterization of the database
activity based on averages. It is the basis for many metrics and ratios used in various
tools and methods, such as AWR, Statspack, and Database Control.
In order to drill down to individual sessions or groups of sessions, V$SESSTAT is useful
when the important session identifiers to monitor are known. Its usefulness is enhanced if
an application fills in the MODULE and ACTION columns in V$SESSION.
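An application can populate those columns itself through the DBMS_APPLICATION_INFO package; the module and action names here are illustrative:

```sql
-- Tag this session's work so V$SESSION and V$SESSTAT drill-downs
-- can be filtered by business function
BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE(
    module_name => 'ORDER_ENTRY',    -- hypothetical module name
    action_name => 'INSERT_LINES');  -- hypothetical action name
END;
/
```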
V$SEGMENT_STATISTICS is useful for RAC because it also tracks the number of CR
and current blocks received by the object.
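For example, the segments receiving the most CR and current blocks could be listed like this:

```sql
-- Segments with the most global cache block transfers
SELECT owner, object_name, statistic_name, value
FROM   v$segment_statistics
WHERE  statistic_name IN ('gc cr blocks received',
                          'gc current blocks received')
ORDER  BY value DESC;
```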
The RAC-relevant statistics can be grouped into the following categories:
Global Cache Service statistics such as gc cr blocks received, gc cr block receive time, and so on
Global Enqueue Service statistics such as global enqueue gets and so on, and
Statistics for messages sent such as gcs messages sent and ges messages sent
V$ENQUEUE_STATISTICS can be queried to determine which enqueue has the highest
impact on database service times and eventually response times.
V$INSTANCE_CACHE_TRANSFER indicates how many current and CR blocks per block
class are received from each instance, including how many transfers incurred a delay.
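Both views can be queried directly; for example:

```sql
-- Enqueue types with the highest cumulative wait time
SELECT eq_type, eq_name, total_req#, total_wait#, cum_wait_time
FROM   v$enqueue_statistics
ORDER  BY cum_wait_time DESC;

-- Per-instance block transfers, including delayed
-- (busy or congested) ones
SELECT instance, class,
       cr_block, cr_busy, cr_congested,
       current_block, current_busy, current_congested
FROM   v$instance_cache_transfer;
```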
Summary
In a RAC AWR report you monitor typical latencies of some Global Cache Services and
Global Enqueue Services operations for significant increases in their values. Factors
causing variations include utilization of the IPC protocol, scheduling delays, and log
flushes. Other RAC latencies in AWR reports are mostly derived from
V$GES_STATISTICS and may be useful for debugging purposes, but do not require
frequent monitoring.
There are many global cache wait events, but for dynamic remastering, two events are of
the most practical importance: gc remaster and gc quiesce. To manage and monitor
database activities and sessions, you can use session and system statistics such as
V$SYSSTAT, V$SEGMENT_STATISTICS, V$ENQUEUE_STATISTICS, and
V$INSTANCE_CACHE_TRANSFER.
Avoid long full-table scans
Try to avoid long full-table scans to minimize GCS requests. When queries result in local
cache misses, an attempt is first made to find the data in another cache, based on the
assumption that the chance that another instance has cached the block is high.
Resize and tune the buffer cache
Hash partitioning may help to reduce buffer busy contention by making buffer access
distribution patterns sparser, enabling more buffers to be available for concurrent access.
Use ASSM
Automatic Segment Space Management can provide instance affinity to table blocks.
Increase sequence caches
Increasing sequence caches improves instance affinity to index keys deriving their values
from sequences. That technique may result in significant performance gains for
multi-instance, insert-intensive applications.
Reduce interinstance traffic with partitioning
Range or list partitioning may be very effective in conjunction with data-dependent routing,
if the workload can be directed to modify a particular range of values from a particular
instance.
Avoid unnecessary parsing
In RAC, library cache and row cache operations are globally coordinated. So excessive
parsing means additional interconnect traffic. Library cache locks are heavily used, in
particular by applications using PL/SQL or Advanced Queuing. Library cache locks are
acquired in exclusive mode whenever a package or procedure has to be recompiled.
Minimize locking usage
Because transaction locks are globally coordinated, they also deserve special attention in
RAC. For example, using tables instead of Oracle sequences to generate unique numbers
is not recommended because it may cause severe contention even for a single instance
system.
Remove unselective indexes
Indexes that are not selective do not improve query performance, but can degrade DML
performance. In RAC, unselective index blocks may be subject to interinstance contention,
increasing the frequency of cache transfers for indexes belonging to INSERT-intensive
tables.
Configure interconnect properly
Always verify that you use a private network for your interconnect, and that your private
network is configured properly. Ensure that a network link is operating in full duplex mode.
Ensure that your network interface and Ethernet switches support MTU size of 9 KB. Note
that a single GBE can scale up to ten thousand 8-KB blocks per second before saturation.
Question
2.
increasing sequence caches
3.
reducing interinstance traffic with partitioning
4.
avoiding long full-table scans
Answer
Option 1: Incorrect. Hash partitioning may help to reduce buffer busy contention
by making buffer access distribution patterns sparser, enabling more buffers to be
available for concurrent access.
Option 2: Correct. Increasing sequence caches improves instance affinity to index
keys deriving their values from sequences. That technique may result in significant
performance gains for multi-instance, insert-intensive applications.
Option 3: Incorrect. Range or list partitioning may be very effective in conjunction
with data-dependent routing, if the workload can be directed to modify a particular
range of values from a particular instance.
Option 4: Incorrect. Try to avoid long full-table scans to minimize GCS requests.
The overhead caused by the global CR requests in this scenario is due to the fact
that when queries result in local cache misses, an attempt is first made to find the
data in another cache, based on the assumption that the chance that another
instance has cached the block is high.
Correct answer(s):
2. increasing sequence caches
In application systems where the loading or batch processing of data is a dominant
business function, there may be performance issues affecting response times because of
the high volume of data inserted into indexes. Depending on the access frequency and
the number of processes concurrently inserting data, indexes can become hot spots and
contention can be exacerbated by these three factors.
First, ordered, monotonically increasing key values in the index (right-growing trees);
second, frequent leaf block splits; and third, low tree depth, which means all leaf block
accesses go through the root block. A leaf or branch block split can become an important
serialization point if the particular leaf block or branch of the tree is concurrently
accessed.
The tables sum up the most common symptoms associated with the splitting of index
blocks, listing wait events and statistics that are commonly elevated when index block
splits are prevalent. As a general recommendation, to alleviate the performance impact of
globally hot index blocks and leaf block splits, a more uniform, less skewed distribution of
the concurrency in the index tree should be the primary objective.
The primary objective can be achieved by the following actions:
Graphic
There are two tables. The first table contains only one column, named Wait events.
The wait events listed in this column are enq: TX - index contention, gc buffer busy,
gc current block busy, and gc current split.
The second table also contains only one column, named System statistics. The
system statistics listed in this column are Leaf node splits, Branch node splits,
Exchange deadlocks, gcs refuse xid, gcs ast xid, and Service ITL waits.
increasing the sequence cache, if the key value is derived from a sequence
Graphic
An index leaf block holds 500 rows. Instance 1 (RAC01) inserts values 1...50000,
while instance 2 (RAC02) inserts values 50001...100000.
For example, suppose that an index key value is generated by a CACHE NOORDER
sequence and each index leaf block can hold 500 rows. If the sequence cache is set to
50000, while instance 1 inserts values 1, 2, 3, and so on, instance 2 concurrently inserts
50001, 50002, and so on. After some block splits, each instance writes to a different part
of the index tree.
The ideal value for a sequence cache has a dual purpose. It avoids interinstance leaf
index block contention and minimizes possible gaps. One of the main variables to
consider is the insert rate: the higher it is, the higher the sequence cache must be.
However, creating a simulation to evaluate the gains for a specific configuration is
recommended.
Note
By default, the cache value is 20. Typically, 20 is too small for this example.
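The scenario above corresponds to a sequence definition along these lines; the sequence name and cache size are illustrative:

```sql
-- Large, unordered sequence cache so that each instance inserts
-- into a different part of the index tree
CREATE SEQUENCE order_id_seq   -- hypothetical name
  CACHE 50000
  NOORDER;

-- An existing sequence can be adjusted the same way
ALTER SEQUENCE order_id_seq CACHE 50000;
```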
Excessive undo block shipment and contention for undo buffers usually happens when
index blocks containing active transactions from multiple instances are read frequently.
When a SELECT statement needs to read a block with active transactions, it has to undo
the changes to create a CR version.
If the active transactions in the block belong to more than one instance, there is a need to
combine local and remote undo information for the consistent read. Depending on the
amount of index blocks changed by multiple instances and the duration of the
transactions, undo block shipment may become a bottleneck.
Graphic
In this example, there is an Index with Instance 1 and Instance 2. It also contains
RAC01, RAC02, SGA1, and SGA2. RAC01 sends changes to Instance 2 and
SGA1. There are constant reads between Instance 2 and SGA1 and between
SGA2 and RAC01. This results in Additional interconnect traffic. The Undo blocks
in SGA1 and SGA2 compare Instance 2 with RAC01 and RAC02 respectively to
check if the CR version exists.
Usually this happens in applications that read recently inserted data very frequently, but
commit infrequently. These are the techniques that can be used to reduce such
situations.
Shorter transactions reduce the likelihood that any given index block in the cache
contains uncommitted data, thereby reducing the need to access undo information for
consistent read.
Increasing sequence cache sizes can reduce interinstance concurrent access to index
leaf blocks. CR versions of index blocks modified by only one instance can be fabricated
without the need of remote undo information.
Note
In RAC, the problem is exacerbated by the fact that a subset of the undo
information has to be obtained from remote instances.
A certain combination of wait events and statistics presents itself in applications where
the insertion of data is a dominant business function and new blocks have to be allocated
frequently to a segment. If data is inserted at a high rate, new blocks may have to be
made available after unfruitful searches for free space.
This has to happen while holding the High-Water Mark or HWM enqueue. Therefore, the
most common symptoms for this scenario include a high percentage of wait time for enq:
HW contention and a high percentage of wait time for gc current grant events.
Graphic
RAC01 and RAC02 have heavy inserts during changes and reads with HWM.
HWM enqueue makes new blocks available for free space.
The former is a consequence of the serialization on the HWM enqueue, and the latter is
due to the fact that current access to the new data blocks that need formatting is
required for the new block operation.
In a RAC environment, the length of this space management operation is proportional to
the time it takes to acquire the HWM enqueue and the time it takes to acquire global locks
for all the new blocks that need formatting. This time is small under normal circumstances
because there is never any access conflict for the new blocks.
Therefore, this scenario may be observed in applications with business functions
requiring a lot of data loading, and the main recommendation to alleviate the symptoms is
to define uniform and large extent sizes for the locally managed and automatic space
managed segments that are subject to high-volume inserts.
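That recommendation might look like the following; the tablespace name, datafile location, and extent size are illustrative, and '+DATA' assumes an ASM disk group:

```sql
-- Locally managed tablespace with uniform, large extents and ASSM,
-- intended for segments subject to high-volume inserts
CREATE TABLESPACE load_data   -- hypothetical name
  DATAFILE '+DATA' SIZE 10G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 100M
  SEGMENT SPACE MANAGEMENT AUTO;
```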
In data warehouse and data mart environments, it is not uncommon to see a lot of
TRUNCATE operations.
These essentially happen on tables containing temporary data. In a RAC environment,
truncating tables concurrently from different instances does not scale well, especially
if you are also using direct read operations such as parallel queries.
A truncate operation requires a cross-instance call to flush dirty blocks of the table that
may be spread across instances. This constitutes a point of serialization. So, while the
first TRUNCATE command is processing, the second has to wait until the first one
completes.
Graphic
The example contains SGA1 and SGA2. SGA1 consists of Table1 and Table2.
SGA2 consists of Table1 with a Dirty block and Table2. Truncate Table1 is
processing with SGA1 and Truncate Table2 with SGA2. There is a Cross-instance
call between the two truncate commands. Checkpointing CKPT connects SGA1,
SGA2, and the Cross-instance call.
There are different types of cross-instance calls. However, all use the same serialization
mechanism.
For example, the cache flush for a partitioned table with many partitions may add latency
to a corresponding parallel query. This is because each cross-instance call is serialized at
the cluster level, and one cross-instance call is needed for each partition at the start of
the parallel query for direct read purposes.
Question
In a RAC environment, truncating tables concurrently from different instances does
not scale well, especially if you are also using direct read operations such as
parallel queries. What causes this issue?
Options:
1.
excessive undo block shipment and contention for undo buffers
2.
concurrent cross-instance calls are a point of serialization
3.
all leaf block access going through the root block leads to contention
4.
leaf block contention on indexes with sequence-generated key values
Answer
Option 1: Incorrect. Excessive undo block shipment and contention for undo
buffers usually happens when index blocks containing active transactions from
multiple instances are read frequently. In RAC, the problem is exacerbated by the
fact that a subset of the undo information has to be obtained from remote
instances.
Option 2: Correct. A truncate operation requires a cross-instance call to flush dirty
blocks of the table that may be spread across instances. This constitutes a point
of serialization. So, while the first TRUNCATE command is processing, the second
has to wait until the first one completes. There are different types of cross-instance
calls. However, all use the same serialization mechanism.
Option 3: Incorrect. Depending on the access frequency and the number of
processes concurrently inserting data, indexes can become hot spots and
contention can be exacerbated by low tree depth. That is all leaf block access
going through the root block.
Option 4: Incorrect. Indexes with key values generated by sequences tend to be
subject to leaf block contention when the insert rate is high. That is because the
index leaf block holding the highest key value is changed for every row inserted,
as the values are monotonically ascending. In RAC, this may lead to a high rate of
current and CR blocks transferred between nodes.
Correct answer(s):
2. concurrent cross-instance calls are a point of serialization
Summary
You should avoid long full-table scans, use ASSM, and increase sequence caches during
RAC tuning. Index block contention is worsened by monotonically increasing key values
in the index, frequent leaf block splits, and low tree depth.
You can use global index hash partitioning for a less skewed distribution of the
concurrency in the index tree. Increasing sequence cache, replacing surrogate with
natural keys, and using reverse key indexes will also facilitate this concurrency. By
increasing the sequence cache, you can limit the overhead caused by high rates of block
transfer in RAC.
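The two index techniques named above can be sketched as follows; they are alternatives (you would pick one), and the table and index names are hypothetical:

```sql
-- Reverse key index: byte-reversed keys scatter consecutive sequence
-- values across many leaf blocks (index range scans are lost).
CREATE INDEX app.orders_id_rev ON app.orders (order_id) REVERSE;

-- Global hash-partitioned index: concurrency is spread over several
-- index trees instead of a single hot right-most leaf block.
CREATE INDEX app.orders_id_hash ON app.orders (order_id)
  GLOBAL PARTITION BY HASH (order_id) PARTITIONS 8;
```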
To handle new data that is frequently read but rarely committed, you can use shorter
transactions and increase sequence cache size. Be sure to define uniform and large
extent sizes while dealing with High-Water Mark enqueues.
recognize how to use the Cluster Database Performance pages to monitor RAC database and
cluster performance
Question
Which two tasks can be performed directly from the Cluster Database Home
page?
Options:
1.
View the overall system status
2.
Monitor cluster cache coherency statistics
3.
Monitor summary charts for cache coherency metrics for the cluster
4.
View how much work the database is performing on behalf of the users or
applications
Answer
Option 1: Correct. From the Cluster Database Home page, you can view the
overall system status, such as the number of nodes in the cluster and their current
status so that you do not have to access each individual database instance for
details.
Option 2: Correct. From the Cluster Database Home page, you can monitor
cluster cache coherency statistics to help you identify processing trends and
optimize performance for your Oracle RAC environment. Cache coherency
statistics measure how well the data in caches on multiple instances is
synchronized.
Option 3: Incorrect. The Cluster Cache Coherency page contains summary charts
for cache coherency metrics for the cluster.
Option 4: Incorrect. The Database Throughput charts, on the Cluster Database
Performance page, summarize any resource contention that appears in the
Average Active Sessions chart, and also show how much work the database is
performing on behalf of the users or applications.
Correct answer(s):
1. View the overall system status
2. Monitor cluster cache coherency statistics
The Cluster Database Performance page provides a quick glimpse of the performance
statistics for a database. Enterprise Manager accumulates data from each instance over
specified periods of time (known as collection-based data). Enterprise Manager also
provides current data from each instance (known as real-time data).
Statistics are rolled up across all the instances in the cluster database. Using the links
next to the charts, you can get more specific information and perform any of the following
tasks:
identify the causes of performance issues
decide whether resources need to be added or redistributed
tune your SQL plan and schema for better optimization, and
resolve performance issues
Graphic
This page displays graphs that provide information on Cluster Host Load Average,
Global Cache Block Access Latency, and Average Active Sessions.
The Cluster Host Load Average chart on the Cluster Database Performance page shows
potential problems that are outside the database. The chart shows maximum, average,
and minimum load values for available nodes in the cluster for the previous hour.
If the load average is higher than the average of the total number of CPUs across all the
hosts in the cluster, then too many processes are waiting for CPU resources. SQL
statements that are not tuned often cause high CPU usage.
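The comparison described here can also be made at the operating system level on each node. A minimal sketch, assuming a Linux host where nproc and uptime are available:

```shell
#!/bin/sh
# Compare the 1-minute load average of this node with its CPU count.
# A sustained load average above the CPU count means processes are
# queuing for CPU, as the text describes.
CPUS=`nproc`
LOAD=`uptime | sed 's/.*load average: //' | awk -F, '{ print $1 }'`
echo "load average: $LOAD, cpus: $CPUS"
```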
Compare the load average values with the values displayed for CPU Used in the Average
Active Sessions chart. If the session's value is low and the load average value is high,
something else on the host other than your database is consuming the CPU.
You can click any of the load value labels for the Cluster Host Load Average chart to view
more detailed information about that load value.
For example if you click the label Average, the Hosts: Load Average page appears,
displaying charts that depict the average host load for up to four nodes in the cluster.
You can select whether the data is displayed in a summary chart (combining the data for
each node in one display) or in tile charts (where the data for each node is displayed in its
own chart). You can click Customize to change the number of tile charts displayed in
each row or the method of ordering the tile charts.
The Global Cache Block Access Latency chart shows the latency for each different type
of data block request: current and consistent-read (CR) blocks. That is, the elapsed
time it takes to locate and transfer consistent-read and current blocks between the buffer
caches.
You can click a metric for the Global Cache Block Access Latency chart to view more
detailed information about that type of cached block.
If the Global Cache Block Access Latency chart shows high latencies (high elapsed
times), the cause can be any of the following:
a high number of requests caused by SQL statements that are not tuned
a large number of processes in the queue waiting for the CPU, or scheduling delays, or
slow, busy, or faulty interconnects
In any case, tuning the SQL plans and the schema to improve the rate at which data
blocks are located in the local buffer cache, and minimizing I/O, is a successful
strategy for performance tuning.
If the latency for consistent-read and current block requests reaches 10 milliseconds, the
first step in resolving the problem should be to go to the Cluster Cache Coherency page
for more detailed information.
The Average Active Sessions chart in the Cluster Database Performance page shows
potential problems inside the database.
Categories called wait classes show how much of the database is using a resource,
such as CPU or disk I/O. Comparing CPU time to wait time helps determine how much of
the response time is consumed with useful work rather than waiting for resources that are
potentially held by other processes.
At the cluster database level, this chart shows the aggregate wait class statistics across
all the instances.
The steps to be followed for a more detailed analysis are as follows:
click the clipboard icon at the bottom of the chart to view the ADDM analysis for the database for
that time period
The clipboard icon is a check mark present in the Average Active Sessions graph.
click the wait class legends beside the Average Active Sessions chart to view instance-level
information stored in Active Sessions By Instance pages
There are 13 legends next to the Average Active Sessions chart: Other, Cluster, Queueing,
Network, Administrative, Configuration, Commit, Application, Concurrency, System I/O, User I/O,
Scheduler, and CPU Used.
use the Wait Class action list on the Active Sessions By Instance page to view the different wait
classes
use the Customize button to select the instances that are displayed, because Active Sessions
By Instance pages show the service times for up to four instances, and
view the data for the instances separately in tile charts or combined into a single summary
chart
The Throughput chart on the Performance page monitors the usage of various database
resources. By clicking the Throughput tab at the top of this chart, you can view the
Database Throughput chart.
Compare the peaks on the Average Active Sessions chart with those on the Database
Throughput charts. If internal contention is high and throughput is low, consider tuning the
database.
The Database Throughput charts summarize any resource contention that appears in the
Average Active Sessions chart, and also show how much work the database is
performing on behalf of the users or applications. The Per Second view shows the
number of transactions compared to the number of logons, and (not shown here) the
amount of physical reads compared to the redo size per second.
The Per Transaction view shows the amount of physical reads compared to the redo size
per transaction. Logons is the number of users that are logged on to the database.
You can also obtain information at the instance level by clicking one of the legends to the
right of the charts to access the Database Throughput by Instance page. This page
shows the breakdown of the aggregated Database Throughput chart for up to four
instances. You can select the instances that are displayed.
You can drill down further on the Database Throughput by Instance page to see the
sessions of an instance that is consuming the greatest resources. Click an instance name
legend just under the chart to go to the Top Sessions subpage of the Top Consumers
page for that instance.
The Throughput chart on the Performance page monitors the usage of various database
resources. By clicking the Instances tab at the top of this chart, you can view the Active
Sessions by Instance chart, which summarizes any resource contention that appears in
the Average Active Sessions chart. You can thus quickly determine how much of the
database work is being performed on each instance.
You can also obtain information at the instance level by clicking one of the legends to the
right of the chart to access the Top Sessions page, where you can view real-time data
showing the sessions that consume the greatest system resources. In the graph, the
orac2 instance after 8:20 PM consistently shows more active sessions than the orac1
instance.
click Cluster Cache Coherency in the Additional Monitoring Links section, and
click either of the legends of the Global Cache Block Access Latency chart
There are two legends next to the Global Cache Block Access Latency chart: Average Current
Block Receive Time and Average CR Block Receive Time.
The Cluster Cache Coherency page contains summary charts for cache coherency
metrics for the cluster.
Global Cache Block Access Latency shows the total elapsed time, or latency, for a block
request. Click one of the legends to the right of the chart to view the average time it takes
to receive data blocks for each block type (current or CR) by instance.
On the Average Current Block Receive Time By Instance page, you can click an instance
legend under the chart to go to the Block Transfer for Local Instance page, where you can
identify which block classes (such as undo blocks, data blocks, and so on) are subject to
intense global cache activity. This page displays the block classes that are being
transferred and the instances that are transferring most of the blocks.
Cache transfer indicates how many current and CR blocks for each block class were
received from remote instances, including how many transfers incurred a delay (busy) or
an unexpected longer delay (congested).
The Global Cache Block Transfer Rate section shows the total aggregated number of
blocks received by all instances in the cluster by way of an interconnect. Click one of the
legends to the right of the chart to go to the GC Current Blocks Received By Instance
page for that type of block.
Click an instance legend under the chart to go to the Segment Statistics by Instance page
where you can see which segments are causing cache contention.
The Global Cache Block Transfers and Physical Reads (vs. Logical Reads) section
shows the percentage of logical read operations that retrieved data from the buffer cache
of other instances by way of Direct Memory Access and from disk. It is essentially a
profile of how much work is performed in the local buffer cache rather than the portion of
remote references and physical reads (which both have higher latencies).
Click one of the legends to the right of the chart to go to the Global Cache Block
Transfers vs. Logical Reads by Instance or Physical Reads vs. Logical Reads by
Instance page. From there, you can click an instance legend under the chart to go to
the Segment Statistics by Instance page, where you can see which segments are
causing cache contention.
Question
Which summary chart on the Cluster Cache Coherency page shows the
percentage of logical read operations that retrieved data from the buffer cache of
other instances by way of Direct Memory Access and from disk?
Options:
1.
Global Cache Block Transfer Rate
2.
Global Cache Block Access Latency
3.
Global Cache Block Transfers and Physical Reads
Answer
Option 1: Incorrect. This summary chart shows the total aggregated number of
blocks received by all instances in the cluster by way of an interconnect.
Option 2: Incorrect. This summary chart shows the total elapsed time, or latency,
for a block request.
Option 3: Correct. This summary chart shows the percentage of logical read
operations that retrieved data from the buffer cache of other instances by way of
Direct Memory Access and from disk. It is essentially a profile of how much work
is performed in the local buffer cache rather than the portion of remote references
and physical reads. Both of which have higher latencies.
Correct answer(s):
3. Global Cache Block Transfers and Physical Reads
The Cluster Interconnects page is useful for monitoring the interconnect interfaces,
determining configuration issues, and identifying transfer rate-related issues, including
excess traffic. This page helps determine the load added by individual instances and
databases on the interconnect.
Sometimes you can immediately identify interconnect delays that are due to applications
outside Oracle.
Graphic
Interconnects is one of the five tabs of the Cluster page. Other tabs are Home,
Performance, Targets, and Topology.
You can use the Cluster Interconnects page to perform the following tasks:
view statistics for the interfaces, such as absolute transfer rates and errors
determine how much the instance is contributing to the transfer rate on the interface
The Private Interconnect Transfer Rate (MB/Sec) value shows a global view of the private
interconnect traffic, which is the estimated traffic on all the private networks in the cluster.
The traffic is calculated as the summary of the input rate of all private interfaces that are
known to the cluster.
On the Cluster Interconnects page, you can access the Hardware Details page, where
you obtain more information about all the network interfaces defined on each node of
your cluster.
Similarly, you can access the Transfer Rate metric page, which collects the internode
communication traffic of a cluster database instance. The critical and warning thresholds
of this metric are not set by default. You can set them according to the speed of your
cluster interconnects.
Note
You can query the V$CLUSTER_INTERCONNECTS view for information about the
private interconnect.
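For example, a quick check of which interconnect interfaces each instance is actually using (the GV$ variant shows all instances at once):

```sql
-- NAME, IP_ADDRESS, whether the interface is public, and where the
-- definition came from (for example, the OCR or the
-- CLUSTER_INTERCONNECTS initialization parameter).
SELECT inst_id, name, ip_address, is_public, source
  FROM GV$CLUSTER_INTERCONNECTS
 ORDER BY inst_id;
```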
Use the Database Locks page to determine if multiple instances are holding locks for the
same object. The page shows user locks, all database locks, or locks that are blocking
other users or applications. You can use this information to stop a session that is
unnecessarily locking an object.
To access the Database Locks page, you perform the following steps:
click Database Locks in the Additional Monitoring Links section at the bottom of the Performance
subpage
Graphic
the Global Enqueue Statistics section contains data extracted from V$GES_STATISTICS
the Global CURRENT Served Stats section contains data from V$CURRENT_BLOCK_SERVER,
and
the Global Cache Transfer Stats section contains data from V$INSTANCE_CACHE_TRANSFER
The Segment Statistics section also includes the GC Buffer Busy Waits, CR Blocks
Received, and CUR Blocks Received information for relevant segments.
Question
Which RAC-related statistic section in an AWR report shows round-trip times for
CR and current block transfers?
Options:
1.
Global Cache Load Profile
2.
Global Cache Efficiency Percentages
3.
Global Cache and Enqueue Services - Messaging Statistics
4.
Global Cache and Enqueue Services - Workload Characteristics
Answer
Option 1: Incorrect. This section essentially lists the number of blocks and
messages that are sent and received, as well as the number of fusion writes.
Option 2: Incorrect. This section indicates the percentage of buffer divided into
buffers received from the disk, local cache, and remote caches. Ideally, the
percentage of disk buffer access should be close to zero.
Option 3: Incorrect. The most important statistic in this section is the average
message sent queue time on ksxp, which gives a good indicator of how well
the IPC works. Average numbers should be less than 1 ms.
Option 4: Correct. This section gives you an overview of the more important
numbers first. Because the global enqueue convert statistics have been
consolidated with the global enqueue get statistics, the report prints only the
average global enqueue get time. The round-trip times for CR and current block
transfers follow, as well as the individual sender-side statistics for CR and current
blocks. The average log flush times are computed by dividing the total log flush
time by the number of actual log flushes. Also, the report prints the percentage of
blocks served that actually incurred a log flush.
Correct answer(s):
4. Global Cache and Enqueue Services - Workload Characteristics
Summary
Oracle Enterprise Manager Database Control and Grid Control are cluster-aware and
help to manage your cluster database. Using the Cluster Database Performance page,
you can identify the causes of performance issues, decide whether resources need to be
added or redistributed, tune SQL plans, and resolve performance issues. This page
provides various charts that provide information on database performance. For example,
the Average Active Sessions chart shows potential problems inside the database and the
Throughput chart displays the usage of various database resources.
The Cluster Cache Coherency page contains summary charts for cache coherency
metrics for the cluster. The Cluster Interconnects page monitors the interconnect
interfaces, determines configuration issues, and identifies transfer rate-related issues.
The Private Interconnect Transfer Rate value contains a global view of the private
interconnect traffic. The Database Locks page helps determine if multiple instances are
holding locks for the same object.
In RAC environments, each AWR snapshot captures data from all active instances within
the cluster for problem detection and self-tuning purposes. RAC-related statistics are
organized in different sections in the AWR report.
recognize how to discover RAC performance problems using ADDM and Enterprise Manager
Note
The Database ADDM report is generated on the AWR snapshot coordinator.
In Oracle Database 11g, you can create a period analysis mode for ADDM that analyzes
the throughput performance for an entire cluster. When the advisor runs in this mode, it is
called database ADDM. You can run the advisor for a single instance, which is equivalent
to Oracle Database 10g ADDM and is now called instance ADDM.
Database ADDM has access to AWR data generated by all instances, thereby making the
analysis of global resources more accurate. Both database and instance ADDM run on
continuous time periods that can contain instance startup and shutdown.
In the case of database ADDM, there may be several instances that are shut down or
started during the analysis period. You must, however, maintain the same database
version throughout the entire time period.
Database ADDM runs automatically after each snapshot is taken. The automatic instance
ADDM runs are the same as in Oracle Database 10g. You can also perform analysis on a
subset of instances in the cluster. This is called partial analysis ADDM.
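These three analysis modes map to procedures in the DBMS_ADDM package; the sketch below uses hypothetical task names and assumes AWR snapshots 100 and 101 exist:

```sql
VAR tname VARCHAR2(60)

-- Database ADDM: analyzes all instances for the snapshot range.
BEGIN
  :tname := 'db_addm_100_101';
  DBMS_ADDM.ANALYZE_DB(:tname, 100, 101);
END;
/

-- Partial analysis ADDM: restricts the analysis to instances 1 and 2.
BEGIN
  :tname := 'partial_addm_100_101';
  DBMS_ADDM.ANALYZE_PARTIAL(:tname, '1,2', 100, 101);
END;
/

-- Instance ADDM: a single instance, as in Oracle Database 10g.
BEGIN
  :tname := 'inst1_addm_100_101';
  DBMS_ADDM.ANALYZE_INST(:tname, 100, 101, 1);
END;
/

-- Print the report of the last task created.
SELECT DBMS_ADDM.GET_REPORT(:tname) FROM DUAL;
```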
I/O capacity finding (the I/O system is overused) is a global finding because it concerns a
global resource affecting multiple instances. A local finding concerns a local resource or
issue that affects a single instance. For example, a CPU-bound instance results in a local
finding about the CPU.
Although ADDM can be used during application development to test changes to either the
application, the database system, or the hosting machines, database ADDM is targeted at
DBAs.
Question
Which mode of Automatic Database Diagnostic Monitor or ADDM analyzes a RAC
database cluster and reports on issues that affect the entire cluster as well as on
those that affect individual instances?
Options:
1.
Instance ADDM
2.
Database ADDM
3.
Partial analysis ADDM
Answer
Option 1: Incorrect. You can run the advisor for a single instance, which is
equivalent to Oracle Database 10g ADDM and is now called instance ADDM.
Option 2: Correct. A special mode of ADDM analyzes a RAC database cluster
and reports on issues that affect the entire cluster as well as on those that affect
individual instances. This mode is called database ADDM (as opposed to instance
ADDM, which already existed with Oracle Database 10g). Database ADDM for
RAC is not simply a report of reports. Rather, it has independent analysis that is
appropriate for RAC. The Database ADDM report is generated on the AWR
snapshot coordinator.
Option 3: Incorrect. You can perform analysis on a subset of instances in the
cluster. This is called partial analysis ADDM.
Correct answer(s):
2. Database ADDM
Question
Which are true statements about the database ADDM mode?
Options:
1.
2.
Database ADDM runs automatically after each snapshot is taken
3.
Database ADDM has access to AWR data generated by all instances
4.
Answer
Option 1: Incorrect. In Oracle Database 11g, you can create a period analysis
mode for ADDM that analyzes the throughput performance for an entire cluster.
When the advisor runs in this mode, it is called database ADDM.
Option 2: Correct. Database ADDM runs automatically after each snapshot is
taken. The automatic instance ADDM runs are the same as in Oracle Database
10g.
Option 3: Correct. Database ADDM has access to AWR data generated by all
instances, thereby making the analysis of global resources more accurate. Both
database and instance ADDM run on continuous time periods that can contain
instance startup and shutdown. In the case of database ADDM, there may be
several instances that are shut down or started during the analysis period. You
must, however, maintain the same database version throughout the entire time
period.
Option 4: Incorrect. You can perform analysis on a subset of instances in the
cluster. This is called partial analysis ADDM.
Correct answer(s):
2. Database ADDM runs automatically after each snapshot is taken
3. Database ADDM has access to AWR data generated by all instances
ADDM diagnoses the following in RAC:
lost blocks
information about interconnect devices (warns about using PUBLIC interfaces), and
throughput of devices: how much of it is used by Oracle and for what purpose (GC, locks, and
PQ)
The data sources of RAC that ADDM diagnoses are as follows:
ASH
are defined in the legend based on its corresponding color in the chart. Each icon below
the chart represents a different ADDM task, which in turn corresponds to a pair of
individual Oracle Database snapshots saved in the Workload Repository.
In the ADDM Performance Analysis section, the ADDM findings are listed in descending
order, from highest impact to least impact. For each finding, the Affected Instances
column displays the number (m of n) of instances affected. Drilling down further on the
findings takes you to the Performance Findings Detail page.
Graphic
The table in the ADDM Performance Analysis section is divided into four columns
named Impact (%), Finding, Affected Instances, and Occurrences (last 24 hrs).
The Informational Findings section lists the areas that do not have a performance impact
and are for informational purpose only.
The Affected Instances chart shows how much each instance is impacted by these
findings. The display indicates the percentage impact for each instance.
Graphic
The Affected Instances chart in the Informational Findings section contains three
columns Name, Impact (%), and Status.
click the Performance tab on the Cluster Database Home page and
on the Cluster Database Performance page, make sure Real Time: 15 Second Refresh is
selected from the View Data drop-down list
Use PL/SQL to create a new AWR snapshot.
Code
y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`
z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`
DBNAME=`ps -ef | grep dbw0_RDB | grep -v grep | grep -v callout1 | awk '{ print $8 }' | sed 's/1//' | sed 's/ora_dbw0_//'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1
export ORACLE_SID=$I1NAME
$ORACLE_HOME/bin/sqlplus -s /NOLOG <<EOF
connect / as sysdba
exec dbms_workload_repository.create_snapshot
EOF
PL/SQL procedure successfully completed.
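The DBNAME derivation in this script strips the trailing instance number and the ora_dbw0_ prefix from the database writer's process name. A quick way to sanity-check that parsing without a running instance is to feed it a hypothetical sample value:

```shell
#!/bin/sh
# Feed the sed pipeline a hypothetical process name in place of live
# ps -ef output, to show what the two edits produce.
PROC=ora_dbw0_RDB1    # hypothetical 8th column of ps -ef
# Note: this works only because the sole '1' is the trailing instance
# number; a '1' earlier in the database name would be stripped instead.
DBNAME=`echo $PROC | sed 's/1//' | sed 's/ora_dbw0_//'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
echo "$DBNAME $I1NAME $I2NAME"
```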
You generate a workload on both instances of your cluster.
Using Database Control, determine the list of blocking locks in your database:
on the Performance page, click the Database Locks link in the Additional Monitoring Links
section of the page
on the Database Locks page, make sure that Blocking Locks is selected from the View drop-down list, and
The Database Locks page has a table with several columns. Some of the columns include Select,
Username, Sessions Blocked, Instance Name, Session ID, Serial Number, Process ID, SQL Hash
Value, Lock Type, Mode Held, Mode Requested, Object Type, Object Owner, Object Name, and
ROWID. All columns except the Username column are blank. The Username column contains the
text No locks of this type currently exist.
Graphic
The Average Active Sessions graph's Y-axis contains values such as 0.0, 0.4, and
0.8, and its X-axis contains values such as 1:14, 1:20, and 1:25.
Look at the Average Active Sessions graph. Then drill down to the Other wait class.
Click the Cluster Database locator link at the top of the page to return to the Cluster Database
Performance page.
From there you can now see the Average Active Sessions graph. Make sure that the View Data
field is set to Real Time: 15 Second Refresh.
Using the Throughput tabbed page graph underneath the Average Active Sessions graph, you
can see the transaction rate per second.
In the Average Active Sessions graph, click the Other link on the right. This takes you to the
Active Sessions By Instance: Other page.
On the Active Sessions By Instance: Other page, you can see the number of active sessions for
the Other wait class.
The Active Sessions By Instance: Other page has the Summary Chart graph that is similar to the
Average Active Sessions graph.
After the workload finishes, use PL/SQL to create a new AWR snapshot.
Code
y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`
z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`
DBNAME=`ps -ef | grep dbw0_RDB | grep -v grep | grep -v callout1 | awk '{ print $8 }' | sed 's/1//' | sed 's/ora_dbw0_//'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1
export ORACLE_SID=$I1NAME
$ORACLE_HOME/bin/sqlplus -s /NOLOG <<EOF
connect / as sysdba
exec dbms_workload_repository.create_snapshot
EOF
PL/SQL procedure successfully completed.
Using Database Control, review the latest ADDM run. You can see the following:
On the Cluster Database Home page, click the Advisor Central link.
The Advisor Central link is located in the Related Links section of the page.
On the Advisor Central page, make sure that the Advisory Type field is set to All Types, and that
the Advisor Runs field is set to Last Run. Click Go.
The Advisor Central page also contains the Task Name and Status fields.
In the Results table, select the latest ADDM run corresponding to Instance All. Then click View
Result. This takes you to the Automatic Database Diagnostic Monitor or ADDM page, and
You select the latest ADDM from the table in the Results section. This table has ten columns
named Select, Advisory Type, Name, Instance, Description, User, Status, Start Time, Duration
(seconds), and Expires In (days).
On the Automatic Database Diagnostic Monitor (ADDM) page, the ADDM Performance Analysis
shows you the consolidation of ADDM reports from all instances running in your cluster.
On the Automatic Database Diagnostic Monitor (ADDM) page, the Database Activity graph is
currently displayed. This graph is similar to the Average Active Sessions graph.
There is more information listed here:
you click the View Snapshots button to get the details of the snapshots used to create an ADDM
report
The other buttons in the ADDM Performance Analysis are Filters and View Report.
you then click the Report tab to generate the report and view its results, and
you can then click the Save to File button to save a copy of the report to your C: drive
You correct the previously found issue by creating a sequence number instead of using a
table. Using Database Control, and connected as user SYS, navigate to the Performance
page of your Cluster Database.
The steps to navigate to the Performance page are as follows:
click the Performance tab on the Cluster Database Home page and
on the Cluster Database Performance page, make sure Real Time: 15 Second Refresh is
selected from the View Data drop-down list
Use PL/SQL to create a new AWR snapshot.
Code
y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`
z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`
Look at the Average Active Sessions graph. Drill down to the System I/O wait class to view
System I/O wait class details.
Click the Cluster Database locator link at the top of the page to return to the Cluster Database
Performance page.
On the I/O tabbed page underneath the Average Active Sessions graph, you can see graphs
based on I/O functions.
This example shows the I/O Megabytes per Second by I/O Function graph.
Click the LGWR link of the I/O Requests per Second by I/O Function graph to see specific I/O
Requests per Second By Instance For I/O Function: LGWR graphs.
Click the I/O drop-down list and select Buffer Cache Reads to view the graphs specific to
Buffer Cache Reads on the I/O Megabytes per Second By Instance For I/O Function:
Buffer Cache Reads page.
You can drill down on any of the functions on the I/O Megabytes per Second by I/O
Function and I/O Requests per Second by I/O Function pages; these are just some
examples of the drill-down functionality available in the Enterprise Manager performance
pages.
After the workload finishes, use PL/SQL to create a new AWR snapshot.
Code
y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`   # first node name
z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`   # second node name
# Derive the database name from the DBW0 background process name
DBNAME=`ps -ef | grep dbw0_RDB | grep -v grep | grep -v callout1 | \
        awk '{ print $8 }' | sed 's/1//' | sed 's/ora_dbw0_//'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1
export ORACLE_SID=$I1NAME
$ORACLE_HOME/bin/sqlplus -s /NOLOG <<EOF
connect / as sysdba
exec dbms_workload_repository.create_snapshot
EOF
PL/SQL procedure successfully completed.
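The DBNAME derivation in the script above is worth unpacking. A minimal standalone sketch of the sed pipeline, using a hypothetical process name (in the real script it comes from column 8 of `ps -ef`):

```shell
# Hypothetical value of column 8 of `ps -ef` for the database writer
# of instance 1 of a database named RDB:
proc="ora_dbw0_RDB1"

# Strip the first "1" in the string (here, the trailing instance
# number), then the "ora_dbw0_" prefix, leaving the database name.
db=$(echo "$proc" | sed 's/1//' | sed 's/ora_dbw0_//')

echo "$db"
```

With the database name in hand, the script appends "1" and "2" to form the two instance SIDs (I1NAME and I2NAME). Note that `sed 's/1$//'` would be more robust than `s/1//` if the database name itself could contain a "1".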
Using Database Control, review the latest ADDM run:
On the Cluster Database Home page, click the Advisor Central link.
The Advisor Central link is located in the Related Links section of the Cluster Database Home
page.
On the Advisor Central page, make sure that the Advisory Type field is set to All Types and that
the Advisor Runs field is set to Last Run. Click Go.
The Advisory Type and Advisor Runs fields are located in the Search section of the Advisor Central
page.
In the Results table, select the latest ADDM run corresponding to Instance All. Then click View
Result. This takes you to the Automatic Database Diagnostic Monitor or ADDM page.
The latest ADDM is located in a table that has several columns such as Select, Advisory Type,
Name, Instance, Description, User, Status, Start Time, and Duration (seconds).
On the Automatic Database Diagnostic Monitor (ADDM) page, the ADDM Performance Analysis
table shows you the consolidation of ADDM reports from all instances running in your cluster. You
can see a message under the Impact (%) column that ADDM did not find any problems.
Summary
recognize how to discover RAC performance problems using ADDM and Enterprise Manager
Note
The Database ADDM report is generated on the AWR snapshot coordinator.
In Oracle Database 11g, you can run ADDM in an analysis mode that examines the
throughput performance of an entire cluster. When the advisor runs in this mode, it is
called database ADDM. You can also run the advisor for a single instance, which is equivalent
to Oracle Database 10g ADDM and is now called instance ADDM.
Database ADDM has access to AWR data generated by all instances, thereby making the
analysis of global resources more accurate. Both database and instance ADDM run on
continuous time periods that can contain instance startup and shutdown.
In the case of database ADDM, there may be several instances that are shut down or
started during the analysis period. You must, however, maintain the same database
version throughout the entire time period.
Database ADDM runs automatically after each snapshot is taken. The automatic instance
ADDM runs are the same as in Oracle Database 10g. You can also perform analysis on a
subset of instances in the cluster. This is called partial analysis ADDM.
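The three modes can also be invoked manually through the DBMS_ADDM package. A sketch in SQL*Plus follows; the task name and the snapshot and instance numbers are illustrative:

```
VAR tname VARCHAR2(60)

BEGIN
  -- Database ADDM: analyze all instances between two AWR snapshots
  :tname := 'DB ADDM run';
  DBMS_ADDM.ANALYZE_DB(:tname, 100, 101);

  -- Instance ADDM for instance 1 only:
  --   DBMS_ADDM.ANALYZE_INST(:tname, 100, 101, 1);

  -- Partial analysis ADDM for instances 1 and 2:
  --   DBMS_ADDM.ANALYZE_PARTIAL(:tname, '1,2', 100, 101);
END;
/

-- Retrieve the analysis report as text
SELECT DBMS_ADDM.GET_REPORT(:tname) FROM dual;
```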
An I/O capacity finding (the I/O system is overused) is a global finding because it concerns a
global resource affecting multiple instances. A local finding concerns a local resource or
issue that affects a single instance. For example, a CPU-bound instance results in a local
finding about the CPU.
Although ADDM can be used during application development to test changes to either the
application, the database system, or the hosting machines, database ADDM is targeted at
DBAs.
Question
Which mode of Automatic Database Diagnostic Monitor or ADDM analyzes a RAC
database cluster and reports on issues that affect the entire cluster as well as on
those that affect individual instances?
Options:
1.
Instance ADDM
2.
Database ADDM
3.
Partial analysis ADDM
Answer
Option 1: Incorrect. You can run the advisor for a single instance, which is
equivalent to Oracle Database 10g ADDM and is now called instance ADDM.
Option 2: Correct. A special mode of ADDM analyzes a RAC database cluster
and reports on issues that affect the entire cluster as well as on those that affect
individual instances. This mode is called database ADDM (as opposed to instance
ADDM, which already existed with Oracle Database 10g). Database ADDM for
RAC is not simply a report of reports. Rather, it has independent analysis that is
appropriate for RAC. The Database ADDM report is generated on the AWR
snapshot coordinator.
Option 3: Incorrect. You can perform analysis on a subset of instances in the
cluster. This is called partial analysis ADDM.
Correct answer(s):
2. Database ADDM
Question
Which are true statements about the database ADDM mode?
Options:
1.
2.
Database ADDM runs automatically after each snapshot is taken
3.
Database ADDM has access to AWR data generated by all instances
4.
Answer
Option 1: Incorrect. In Oracle Database 11g, you can create a period analysis
mode for ADDM that analyzes the throughput performance for an entire cluster.
When the advisor runs in this mode, it is called database ADDM.
Option 2: Correct. Database ADDM runs automatically after each snapshot is
taken. The automatic instance ADDM runs are the same as in Oracle Database
10g.
Option 3: Correct. Database ADDM has access to AWR data generated by all
instances, thereby making the analysis of global resources more accurate. Both
database and instance ADDM run on continuous time periods that can contain
instance startup and shutdown. In the case of database ADDM, there may be
several instances that are shut down or started during the analysis period. You
must, however, maintain the same database version throughout the entire time
period.
Option 4: Incorrect. You can perform analysis on a subset of instances in the
cluster. This is called partial analysis ADDM.
Correct answer(s):
2. Database ADDM runs automatically after each snapshot is taken
3. Database ADDM has access to AWR data generated by all instances
ADDM diagnoses the following in RAC:
lost blocks
information about interconnect devices (ADDM warns about the use of PUBLIC interfaces), and
the throughput of those devices and how much of it is used by Oracle, and for what purpose (global
cache or GC, locks, and parallel query or PQ)
The data sources that ADDM uses to diagnose RAC include wait events and Active Session
History (ASH).
Graphic
The table in the ADDM Performance Analysis section is divided into four columns
named Impact (%), Finding, Affected Instances, and Occurrences (last 24 hrs).
The Informational Findings section lists the areas that do not have a performance impact
and are for informational purposes only.
The Affected Instances chart shows how much each instance is impacted by these
findings. The display indicates the percentage impact for each instance.
Graphic
The Affected Instances chart in the Informational Findings section contains three
columns: Name, Impact (%), and Status.
click the Performance tab on the Cluster Database Home page and
on the Cluster Database Performance page, make sure Real Time: 15 Second Refresh is
selected from the View Data drop-down list
Use PL/SQL to create a new AWR snapshot.
Code
y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`   # first node name
z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`   # second node name
# Derive the database name from the DBW0 background process name
DBNAME=`ps -ef | grep dbw0_RDB | grep -v grep | grep -v callout1 | \
        awk '{ print $8 }' | sed 's/1//' | sed 's/ora_dbw0_//'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1
export ORACLE_SID=$I1NAME
$ORACLE_HOME/bin/sqlplus -s /NOLOG <<EOF
connect / as sysdba
exec dbms_workload_repository.create_snapshot
EOF
PL/SQL procedure successfully completed.
You generate a workload on both instances of your cluster.
Using Database Control, determine the list of blocking locks in your database:
on the Performance page, click the Database Locks link in the Additional Monitoring Links
section of the page
on the Database Locks page, make sure that Blocking Locks is selected from the View drop-down list
The Database Locks page has a table with several columns. Some of the columns include Select,
Username, Sessions Blocked, Instance Name, Session ID, Serial Number, Process ID, SQL Hash
Value, Lock Type, Mode Held, Mode Requested, Object Type, Object Owner, Object Name, and
ROWID. All columns except the Username column are blank. The Username column contains the
text No locks of this type currently exist.
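The same blocking information can also be pulled directly from the data dictionary. For example, the following sketch queries GV$SESSION, which in Oracle Database 11g reports the blocking session and its instance cluster-wide:

```
-- Blocked sessions across all RAC instances, with their blockers
SELECT inst_id, sid, serial#, username,
       blocking_instance, blocking_session
FROM   gv$session
WHERE  blocking_session IS NOT NULL;
```

An empty result corresponds to the "No locks of this type currently exist" message shown on the Database Locks page.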
Graphic
The Average Active Sessions graph has a Y-axis containing values such as 0.0,
0.4, and 0.8, and an X-axis containing values such as 1:14, 1:20, and 1:25.
Look at the Average Active Sessions graph. Then drill down to the Other wait class.
Click the Cluster Database locator link at the top of the page to return to the Cluster Database
Performance page.
From there you can now see the Average Active Sessions graph. Make sure that the View Data
field is set to Real Time: 15 Second Refresh.
Using the Throughput tabbed page graph underneath the Average Active Sessions graph, you
can see the transaction rate per second.
In the Average Active Sessions graph, click the Other link on the right. This takes you to the
Active Sessions By Instance: Other page.
On the Active Sessions By Instance: Other page, you can see the number of active sessions for
the Other wait class.
The Active Sessions By Instance: Other page has the Summary Chart graph that is similar to the
Average Active Sessions graph.
After the workload finishes, use PL/SQL to create a new AWR snapshot.
Code
y=`cat /home/oracle/nodeinfo | sed -n '1,1p'`   # first node name
z=`cat /home/oracle/nodeinfo | sed -n '2,2p'`   # second node name
# Derive the database name from the DBW0 background process name
DBNAME=`ps -ef | grep dbw0_RDB | grep -v grep | grep -v callout1 | \
        awk '{ print $8 }' | sed 's/1//' | sed 's/ora_dbw0_//'`
I1NAME=$DBNAME"1"
I2NAME=$DBNAME"2"
export ORACLE_HOME=/u01/app/oracle/product/11.1.0/db_1
export ORACLE_SID=$I1NAME
$ORACLE_HOME/bin/sqlplus -s /NOLOG <<EOF
connect / as sysdba
exec dbms_workload_repository.create_snapshot
EOF
PL/SQL procedure successfully completed.
Using Database Control, review the latest ADDM run. You can see the following:
On the Cluster Database Home page, click the Advisor Central link.
The Advisor Central link is located in the Related Links section of the page.
On the Advisor Central page, make sure that the Advisory Type field is set to All Types, and that
the Advisor Runs field is set to Last Run. Click Go.
The Advisor Central page also contains the Task Name and Status fields.
In the Results table, select the latest ADDM run corresponding to Instance All. Then click View
Result. This takes you to the Automatic Database Diagnostic Monitor or ADDM page.
You select the latest ADDM from the table in the Results section. This table has ten columns
named Select, Advisory Type, Name, Instance, Description, User, Status, Start Time, Duration
(seconds), and Expires In (days).
On the Automatic Database Diagnostic Monitor (ADDM) page, the ADDM Performance Analysis
shows you the consolidation of ADDM reports from all instances running in your cluster.
On the Automatic Database Diagnostic Monitor (ADDM) page, the Database Activity graph is
currently displayed. This graph is similar to the Average Active Sessions graph.
There is more information listed here:
you click the View Snapshots button to get the details of the snapshots used to create an ADDM
report (the other buttons in the ADDM Performance Analysis section are Filters and View Report)
you then click the Report tab to generate the report and view its results, and
you can then click the Save to File button to save a copy of the report to your C: drive
Summary
Database ADDM for RAC reports on issues that
affect the entire cluster and individual instances. When the advisor is run for a single
instance, it is called instance ADDM. Database ADDM runs automatically when AWR
snapshots are taken. When analysis is performed on a subset of instances in the cluster, it is
called partial analysis ADDM. ADDM diagnoses RAC using data sources that include
wait events and ASH.
Performance problems in a RAC environment can be manually discovered by using the
Enterprise Manager performance pages as well as ADDM.