
RAC Database Monitoring and Tuning


Using system statistics based on V$SYSSTAT enables characterization of the database
activity based on averages. It is the basis for many metrics and ratios used in various
tools and methods, such as AWR, Statspack, and Database Control.
In order to drill down to individual sessions or groups of sessions, V$SESSTAT is useful
when the important session identifiers to monitor are known. Its usefulness is enhanced
if an application fills in the MODULE and ACTION columns in V$SESSION.
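As a minimal sketch of this kind of drill-down, the query below joins V$SESSION, V$SESSTAT, and V$STATNAME to list two global cache statistics for sessions tagged with a given module. The module name ORDER_ENTRY is a hypothetical value used only for illustration.

```sql
-- Per-session global cache statistics for one application module.
-- 'ORDER_ENTRY' is a hypothetical MODULE value set by the application.
SELECT s.sid,
       s.module,
       s.action,
       n.name,
       st.value
FROM   v$session  s
JOIN   v$sesstat  st ON st.sid = s.sid
JOIN   v$statname n  ON n.statistic# = st.statistic#
WHERE  s.module = 'ORDER_ENTRY'
AND    n.name IN ('gc cr blocks received', 'gc current blocks received')
ORDER  BY st.value DESC;
```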
V$SEGMENT_STATISTICS is useful for RAC because it also tracks the number of CR
and current blocks received by the object.
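One possible way to find the segments that receive the most blocks from other instances is sketched below. The limit of ten rows is an arbitrary choice.

```sql
-- Top segments by CR and current blocks received over the interconnect.
SELECT *
FROM  (SELECT owner,
              object_name,
              object_type,
              statistic_name,
              value
       FROM   v$segment_statistics
       WHERE  statistic_name IN ('gc cr blocks received',
                                 'gc current blocks received')
       AND    value > 0
       ORDER  BY value DESC)
WHERE  ROWNUM <= 10;
```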
The RAC-relevant statistics can be grouped into:

 Global Cache Service statistics: gc cr blocks received, gc cr block receive
time, and so on
 Global Enqueue Service statistics: global enqueue gets, and so on
 Statistics for messages sent: gcs messages sent and ges messages sent
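The Global Cache Service statistics listed above can be combined into an average receive time per instance. The sketch below assumes that gc cr block receive time is reported in centiseconds, hence the factor of 10 to convert to milliseconds; verify the unit for your release before relying on the result.

```sql
-- Average CR block receive time per instance, in milliseconds.
-- Assumption: 'gc cr block receive time' is in centiseconds.
SELECT r.inst_id,
       r.value                                     AS cr_blocks_received,
       t.value                                     AS cr_receive_time_cs,
       ROUND(10 * t.value / NULLIF(r.value, 0), 2) AS avg_cr_receive_ms
FROM   gv$sysstat r
JOIN   gv$sysstat t ON t.inst_id = r.inst_id
WHERE  r.name = 'gc cr blocks received'
AND    t.name = 'gc cr block receive time'
ORDER  BY r.inst_id;
```

NULLIF avoids a divide-by-zero error on an instance that has not yet received any CR blocks.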

V$ENQUEUE_STATISTICS can be queried to determine which enqueue has the
highest impact on database service times and, eventually, response times.
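A minimal sketch of such a query, ranking enqueues by cumulative wait time, might look like this:

```sql
-- Enqueues ranked by cumulative wait time on the local instance.
SELECT *
FROM  (SELECT eq_type,
              eq_name,
              req_reason,
              total_req#,
              total_wait#,
              cum_wait_time
       FROM   v$enqueue_statistics
       WHERE  total_wait# > 0
       ORDER  BY cum_wait_time DESC)
WHERE  ROWNUM <= 10;
```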
V$INSTANCE_CACHE_TRANSFER indicates how many current and CR blocks per block
class are received from each instance, including how many transfers incurred a delay.
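For illustration, the following sketch lists the transfers received from each instance per block class, together with the counts of transfers that were delayed because the block was busy or the sending instance was congested:

```sql
-- Blocks received per sending instance and block class,
-- with the number of transfers that incurred a delay.
SELECT instance,
       class,
       cr_block,
       cr_busy,
       cr_congested,
       current_block,
       current_busy,
       current_congested
FROM   v$instance_cache_transfer
WHERE  cr_block + current_block > 0
ORDER  BY instance, class;
```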
In any database system, RAC or single-instance, the most significant performance
gains are usually obtained from traditional application-tuning techniques. The benefits of
those techniques are even more remarkable in a RAC database. In addition to
traditional application tuning, some of the techniques that are particularly important for
RAC include the following:
 Try to avoid long full-table scans to minimize GCS requests. Global CR requests add
overhead in this scenario because, when a query misses in the local cache, an attempt
is first made to find the data in another instance's cache, on the assumption that
another instance is likely to have cached the block.
 Automatic Segment Space Management can provide instance affinity to table
blocks.
 Increasing sequence caches improves instance affinity to index keys deriving
their values from sequences. That technique may result in significant
performance gains for multi-instance insert-intensive applications.
 Range or list partitioning may be very effective in conjunction with data-
dependent routing, if the workload can be directed to modify a particular range of
values from a particular instance.
 Hash partitioning may help to reduce buffer busy contention by making buffer
access distribution patterns sparser, enabling more buffers to be available for
concurrent access.
 In RAC, library cache and row cache operations are globally coordinated. So,
excessive parsing means additional interconnect traffic. Library cache locks are
heavily used, in particular by applications that use PL/SQL or Advanced
Queuing. Library cache locks are acquired in exclusive mode whenever a
package or procedure has to be recompiled.
 Because transaction locks are globally coordinated, they also deserve special
attention in RAC. For example, using tables instead of Oracle sequences to
generate unique numbers is not recommended because it may cause severe
contention even for a single-instance system.
 Indexes that are not selective do not improve query performance, but can
degrade DML performance. In RAC, unselective index blocks may be subject to
inter-instance contention, increasing the frequency of cache transfers for indexes
belonging to insert-intensive tables.
 Always verify that you use a private network for your interconnect, and that your
private network is configured properly. Ensure that each network link is operating in
full-duplex mode. Ensure that your network interfaces and Ethernet switches
support an MTU size of 9 KB. Note that a single-gigabit Ethernet interface can scale
up to ten thousand 8 KB blocks per second before saturation.
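Two of the techniques above, larger sequence caches and hash partitioning, can be sketched in DDL as follows. The schema, object names, cache size, and partition count are hypothetical values chosen only for illustration, and suitable settings depend on the workload.

```sql
-- Hypothetical sequence: a larger cache gives each instance its own
-- range of values, so index keys derived from it stay more local.
ALTER SEQUENCE app_owner.order_id_seq CACHE 1000 NOORDER;

-- Hypothetical insert-intensive table, hash partitioned to spread
-- concurrent access over more blocks.
CREATE TABLE app_owner.order_events (
  event_id NUMBER        NOT NULL,
  payload  VARCHAR2(400)
)
PARTITION BY HASH (event_id) PARTITIONS 16;
```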

Note: You can query the GV$CLUSTER_INTERCONNECTS view to see information about
the private interconnect:

SQL> select * from GV$CLUSTER_INTERCONNECTS;

INST_ID NAME  IP_ADDRESS      IS_PUBLIC SOURCE
------- ----- --------------- --------- -------------------------
      1 eth1  192.0.2.110     NO        Oracle Cluster Repository
      2 eth1  192.0.2.111     NO        Oracle Cluster Repository
      3 eth1  192.0.2.112     NO        Oracle Cluster Repository

Both Oracle Enterprise Manager Database Control and Grid Control are cluster-aware
and provide a central console to manage your cluster database. From the Cluster
Database Home page, you can do all of the following:
 View the overall system status, such as the number of nodes in the cluster and
their current status, so you do not have to access each individual database
instance for details.
 View the alert messages aggregated across all the instances with lists for the
source of each alert message.
 Review the issues that are affecting the entire cluster as well as those that are
affecting individual instances.
 Monitor cluster cache coherency statistics to help you identify processing trends
and optimize performance for your Oracle RAC environment. Cache coherency
statistics measure how well the data in caches on multiple instances is
synchronized.
 Determine whether any of the services for the cluster database are having
availability problems. A service is deemed to be a problem service if it is not
running on all preferred instances, if its response time thresholds are not met,
and so on.
 Review any outstanding Clusterware interconnect alerts.
AWR automatically generates snapshots of the performance data once every hour and
collects the statistics in the workload repository. In RAC environments, each AWR
snapshot captures data from all active instances within the cluster. The data for each
snapshot set that is captured for all active instances is from roughly the same point in
time. In addition, the data for each instance is stored separately and is identified with an
instance identifier. For example, the buffer_busy_wait statistic shows the number
of buffer waits on each instance. The AWR does not store data that is aggregated from
across the entire cluster. That is, the data is stored for each individual instance.

The statistics snapshots generated by the AWR can be evaluated by producing reports
displaying summary data such as load and cluster profiles based on regular statistics
and wait events gathered on each instance.
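As one possible way to produce such a report from SQL, the sketch below calls DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT for a single instance. The database identifier, instance number, and snapshot identifiers are placeholder values; substitute real ones from DBA_HIST_SNAPSHOT.

```sql
-- Text AWR report for one instance between two snapshots.
-- Arguments: dbid, instance number, begin snapshot, end snapshot.
-- All four values below are placeholders for illustration.
SELECT output
FROM   TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
               1234567890, 1, 100, 101));
```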

The AWR functions in a way similar to Statspack. The difference is that the AWR
automatically collects and maintains performance statistics for problem detection and
self-tuning purposes. Unlike in Statspack, in the AWR, there is only one snapshot_id
per snapshot across instances.
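The shared snapshot identifier can be observed in DBA_HIST_SNAPSHOT, where each snapshot appears once per instance under the same SNAP_ID. A minimal sketch:

```sql
-- One row per instance for each snapshot; rows for the same snapshot
-- share a SNAP_ID and have roughly the same interval times.
SELECT snap_id,
       instance_number,
       begin_interval_time,
       end_interval_time
FROM   dba_hist_snapshot
ORDER  BY snap_id, instance_number;
```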
Active Session History (ASH) is an integral part of the Oracle Database self-
management framework and is useful for diagnosing performance problems in Oracle
RAC environments. ASH report statistics provide details about Oracle Database session
activity. Oracle Database records information about active sessions for all active Oracle
RAC instances and stores this data in the System Global Area (SGA). Any session that
is connected to the database and is either using CPU or waiting for an event that does
not belong to the idle wait class is considered an active session. Sessions that are
waiting for an event in the idle wait class are not sampled.

ASH reports present a manageable set of data by capturing only information about
active sessions. The amount of the data is directly related to the work being performed,
rather than the number of sessions allowed on the system. ASH statistics that are
gathered over a specified duration can be put into ASH reports.

Each ASH report is divided into multiple sections to help you identify short-lived
performance problems that do not appear in the ADDM analysis. Two ASH report
sections that are specific to Oracle RAC are Top Cluster Events and Top Remote
Instance.

Top Cluster Events


The ASH report Top Cluster Events section is part of the Top Events report that is
specific to Oracle RAC. The Top Cluster Events report lists the events that account for
the highest percentage of session activity in the Cluster wait class, along with the
instance numbers of the affected instances. You can use this information to identify
which events and instances caused a high percentage of cluster wait events.
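A rough approximation of this section can be obtained directly from the in-memory ASH samples. The sketch below is not the query the ASH report itself uses, and the 15-minute window is an arbitrary choice.

```sql
-- Cluster wait class events by instance over the last 15 minutes.
SELECT inst_id,
       event,
       COUNT(*) AS samples
FROM   gv$active_session_history
WHERE  wait_class = 'Cluster'
AND    sample_time > SYSTIMESTAMP - INTERVAL '15' MINUTE
GROUP  BY inst_id, event
ORDER  BY samples DESC;
```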

Top Remote Instance


The ASH report Top Remote Instance section is part of the Top Load Profile report that
is specific to Oracle RAC. The Top Remote Instance report shows cluster wait events
along with the instance numbers of the instances that accounted for the highest
percentages of session activity. You can use this information to identify the instance that
caused the extended cluster wait period.
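A similar approximation for the remote instance view is sketched below. It assumes the REMOTE_INSTANCE# column of the ASH view, which is populated only while a session waits on a cluster event; it is an illustration, not the report's own query.

```sql
-- Which remote instance sessions were waiting on, per local instance.
SELECT inst_id,
       remote_instance#,
       event,
       COUNT(*) AS samples
FROM   gv$active_session_history
WHERE  wait_class = 'Cluster'
AND    remote_instance# IS NOT NULL
AND    sample_time > SYSTIMESTAMP - INTERVAL '15' MINUTE
GROUP  BY inst_id, remote_instance#, event
ORDER  BY samples DESC;
```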

Using the Automatic Database Diagnostic Monitor (ADDM), you can analyze the
information collected by AWR for possible performance problems with your Oracle
database. ADDM presents performance data from a clusterwide perspective, thus
enabling you to analyze performance on a global basis. In an Oracle RAC environment,
ADDM can analyze performance using data collected from all instances and present it
at different levels of granularity, including:
 Analysis for the entire cluster
 Analysis for a specific database instance
 Analysis for a subset of database instances

To perform these analyses, you can run the ADDM Advisor in Database ADDM for RAC
mode to perform an analysis of the entire cluster, in Local ADDM mode to analyze the
performance of an individual instance, or in Partial ADDM mode to analyze a subset of
instances. Database ADDM for RAC is not just a report of reports but has independent
analysis that is appropriate for RAC. You activate ADDM analysis using the advisor
framework through Advisor Central in Oracle Enterprise Manager, or through the
DBMS_ADVISOR and DBMS_ADDM PL/SQL packages.
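A minimal SQL*Plus sketch of a Database ADDM run through DBMS_ADDM follows. The task name and the snapshot identifiers 100 and 101 are placeholders. DBMS_ADDM.ANALYZE_INST and DBMS_ADDM.ANALYZE_PARTIAL are the corresponding calls for a single instance and for a subset of instances.

```sql
-- SQL*Plus bind variable holding the (hypothetical) task name.
VARIABLE tname VARCHAR2(60);

BEGIN
  :tname := 'ADDM_DB_EXAMPLE';
  -- Analyze the whole cluster between two placeholder snapshots.
  DBMS_ADDM.ANALYZE_DB(:tname, 100, 101);
END;
/

-- Display the resulting report.
SET LONG 1000000 PAGESIZE 0
SELECT DBMS_ADDM.GET_REPORT(:tname) FROM dual;
```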

Note: The Database ADDM report is generated on the AWR snapshot coordinator.


In Oracle Database 12c, you can create a period analysis mode for ADDM that
analyzes the throughput performance for an entire cluster. When the advisor runs in this
mode, it is called database ADDM. You can run the advisor for a single instance, which
is called instance ADDM.

Database ADDM has access to AWR data generated by all instances, thereby making
the analysis of global resources more accurate. Both database and instance ADDM run
on continuous time periods that can contain instance startup and shutdown. In the case
of database ADDM, there may be several instances that are shut down or started during
the analysis period. However, you must maintain the same database version throughout
the entire time period.

Database ADDM runs automatically after each snapshot is taken. You can also perform
analysis on a subset of instances in the cluster. This is called partial analysis ADDM.
An I/O capacity finding (the I/O system is overused) is a global finding because it concerns
a global resource affecting multiple instances. A local finding concerns a local resource
or issue that affects a single instance. For example, a CPU-bound instance results in a
local finding about the CPU. Although ADDM can be used during application
development to test changes to either the application, the database system, or the
hosting machines, database ADDM is targeted at DBAs.
Data sources are:
 Wait events (especially Cluster class and buffer busy)
 Active Session History (ASH) reports
 Instance cache transfer data
 Interconnect statistics (throughput, usage by component, pings)
ADDM analyzes the effects of RAC for both the entire database (DATABASE analysis
mode) and for each instance (INSTANCE analysis mode).
The Database Resource Manager (also called Resource Manager) enables you to
identify work by using services. It manages the relative priority of services within an
instance by binding services directly to consumer groups. When a client connects by
using a service, the consumer group is assigned transparently at connect time. This
enables the Resource Manager to manage the work requests by service in the order of
their importance.

For example, you define the AP and BATCH services to run on the same instance, and
assign AP to a high-priority consumer group and BATCH to a low-priority consumer
group. Sessions that connect to the database with the AP service specified in their TNS
connect descriptor get priority over those that connect to the BATCH service.
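The mapping in this example could be sketched with DBMS_RESOURCE_MANAGER as shown below. The consumer group names HIGH_PRIORITY and LOW_PRIORITY are hypothetical. A resource plan with directives for these groups, and the privilege for users to switch into them, would still be needed before the priorities take effect; both are omitted here.

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  -- Hypothetical consumer groups for the two services.
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'HIGH_PRIORITY',
    comment        => 'Sessions connecting through the AP service');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'LOW_PRIORITY',
    comment        => 'Sessions connecting through the BATCH service');

  -- Bind each service to its consumer group.
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.SERVICE_NAME,
    value          => 'AP',
    consumer_group => 'HIGH_PRIORITY');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.SERVICE_NAME,
    value          => 'BATCH',
    consumer_group => 'LOW_PRIORITY');

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```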

This offers benefits in managing workloads because priority is given to business
functions rather than the sessions that support those business functions.
Fast Application Notification (FAN) enables end-to-end, lights-out recovery of
applications and load balancing based on real transaction performance in a RAC
environment. With FAN, the continuous service built into Oracle Real Application
Clusters 11g is extended to applications and mid-tier servers. When the state of a
database service changes (for example, up, down, or not restarting), the new status is
posted to interested subscribers through FAN events. Applications use these events to
achieve very fast detection of failures and to rebalance connection pools following
failures, recovery, or planned changes. The easiest way to receive all the benefits of
FAN, with no effort, is to use a client that is integrated with FAN:

 Oracle Universal Connection Pool (UCP) for Java
 User extensible callouts
 Connection Manager (CMAN)
 Listeners
 Oracle Notification Service (ONS) API
 OCI Connection Pool or Session Pool
 Transparent Application Failover (TAF)
 ODP.NET Connection Pool

Note: Not all the preceding applications can receive all types of FAN events.

Traditionally, client or mid-tier applications connected to the database have relied on
connection timeouts, out-of-band polling mechanisms, or other custom solutions to
realize that a system component has failed. This approach has major implications for
application availability, because downtimes are extended and more noticeable.

With FAN, important high-availability events are pushed as soon as they are detected,
which results in a more efficient use of existing computing resources, and a better
integration with your enterprise applications, including mid-tier connection managers, or
IT management consoles, including trouble ticket loggers and email/paging servers.

FAN is, in fact, a distributed system that is enabled on each participating node. This
makes it very reliable and fault tolerant because the failure of one component is
detected by another. Therefore, event notification can be detected and pushed by any
of the participating nodes.
FAN events are tightly integrated with Oracle Data Guard Broker, Oracle JDBC implicit
connection cache, ODP.NET, TAF, and Enterprise Manager. For example, Oracle
Database 11g JDBC applications managing connection pools do not need custom code
development. They are automatically integrated with the ONS if implicit connection
cache and fast connection failover are enabled.

FAN delivers events pertaining to the list of managed cluster resources shown in the
slide.
The table describes each of the resources.
Note: SRV_PRECONNECT and SERVICEMETRICS are discussed later in this lesson.
