16 Understanding Operating System Resources
This chapter explains how to tune the operating system for optimal performance of the Oracle database
server.
See Also:
Your Oracle platform-specific documentation and your operating system
vendor's documentation
Oracle9i Database Performance Planning for a discussion of the
importance of operating system statistics
For example, if an application experiences excessive buffer busy waits, then the number of system calls
increases. If you reduce the buffer busy waits by tuning the application, then the number of system calls
decreases.
Using Operating System Caches
Operating systems and device controllers provide data caches that do not directly conflict with Oracle's own
cache management. Nonetheless, these structures can consume resources while offering little or no benefit to
performance. This is most noticeable on a UNIX system that has the database files in the UNIX file store; by
default all database I/O goes through the file system cache. On some UNIX systems, direct I/O is available to
the filestore. This arrangement allows the database files to be accessed within the UNIX file system,
bypassing the file system cache. It saves CPU resources and allows the file system cache to be dedicated to
non-database activity, such as program texts and spool files.
This problem does not occur on NT. All file requests by the database bypass the caches in the file system.
Although the operating system cache is often redundant because the Oracle buffer cache buffers blocks, there
are a number of cases where Oracle does not use the Oracle buffer cache. In these cases, using direct I/O,
which bypasses the UNIX or operating system cache, or using raw devices, which do not use the operating
system cache, may yield worse performance than using operating system buffering. Some examples of this
include the following:
You may want a mix with some files cached at the operating system level and others not.
Hardware Cache
Some underlying I/O subsystems implement hardware-level caching of disk reads and writes to speed up their
response times to I/O requests. It is important to ensure that such subsystems are configured to acknowledge
write requests only after the written data is guaranteed to be safe.
For example, consider a subsystem that implements a RAM cache that acknowledges writes as soon as the
written data is in the RAM cache. The Oracle database server considers this data to be safe. Then a power
failure occurs and the data is lost. This can lead to corruption, because the Oracle server has no way of
knowing that the write was lost. In order to overcome this problem, most I/O subsystems have a mechanism
for ensuring data in the RAM cache is not lost across power failures. If the subsystem cannot guarantee this,
then it is generally best to configure the system in such a way that a write is acknowledged only after the data
has been written to disk, rather than only to the RAM cache.
Asynchronous I/O
With synchronous I/O, when an I/O request is submitted to the operating system, the writing process blocks
until the write is confirmed as complete. It can then continue processing.
Asynchronous I/O allows a process to submit an I/O request but to then continue processing. It may then
check on the result of the I/O at a later time. It is also possible to submit several I/O requests and then collect
the status of those requests at a later time, thus allowing the operating system to parallelize any of those I/O
operations, where possible. Parallel processing can reduce the overall time to complete an operation.
Consider an extreme example: Imagine you want to write out four data blocks to four different files. With
synchronous I/O you must submit block 1, wait, submit block 2, wait, submit block 3, wait, submit block 4,
and wait. With asynchronous I/O, you can submit blocks 1, 2, 3, and 4 and then wait for all four blocks to
complete. Because you gave the operating system all four I/O requests at once, it can act on all the requests
in parallel. The total response time is only the duration of the longest I/O of the four, rather than the sum of all
four I/O durations.
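The four-block example above can be sketched in Python. This is only an illustration under stated assumptions: worker threads stand in for the operating system's asynchronous I/O interface, and the file names, the `BLOCK_SIZE` value, and the helper functions are hypothetical, not part of Oracle.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait

BLOCK_SIZE = 8192  # assumed block size, for illustration only


def write_block(path, block):
    """Write one block, force it to disk, and return the bytes written."""
    with open(path, "wb") as f:
        written = f.write(block)
        f.flush()
        os.fsync(f.fileno())
    return written


def write_blocks_concurrently(paths, blocks):
    """Submit every write first, then wait once for all of them."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = [pool.submit(write_block, p, b) for p, b in zip(paths, blocks)]
        wait(futures)  # collect the status of all requests at a later time
        return [f.result() for f in futures]


# Four blocks destined for four different (hypothetical) files.
tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, f"file{i}.dbf") for i in range(1, 5)]
blocks = [bytes([i]) * BLOCK_SIZE for i in range(1, 5)]
results = write_blocks_concurrently(paths, blocks)
print(results)
```

Because all four requests are outstanding at the same time, the elapsed time is bounded by the slowest write rather than by the sum of all four, provided the underlying devices can service them in parallel.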
Some platforms support asynchronous I/O by default, others need special configuration, and some only
support asynchronous I/O for certain underlying file system types.
With Oracle9i Release 2 (9.2), you can use the FILESYSTEMIO_OPTIONS initialization parameter to enable
or disable asynchronous I/O or direct I/O on file system files. This parameter is platform-specific and has a
default value that is best for a particular platform. It can be dynamically changed to update the default setting.
Memory Usage
Memory usage is affected by both buffer cache limits and initialization parameters.
The UNIX buffer cache consumes operating system memory resources. Although in some versions of UNIX
the UNIX buffer cache may be allocated a set amount of memory, it is common today for more sophisticated
memory management mechanisms to be used. Typically these will allow free memory pages to be used to
cache I/O. In such systems it is common for operating system reporting tools to show that there is no free
memory, which is not generally a problem. If processes require more memory, the memory used to cache I/O
data is usually released so that the process memory can be allocated.
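As a rough illustration of why a "no free memory" report is not alarming by itself, the following sketch separates free memory from memory that is only caching I/O. It assumes a Linux-style `/proc/meminfo` layout; the sample figures are invented, the helper names are hypothetical, and other UNIX variants report these values differently. Not every cached page is necessarily reclaimable, so treat the result as an upper bound.

```python
# Invented sample in the Linux /proc/meminfo format (an assumption).
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:          204800 kB
Buffers:          512000 kB
Cached:          9216000 kB
"""


def parse_meminfo(text):
    """Return a dict of field name -> size in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(fields):
    """Compare free memory with free memory plus I/O cache, as percentages."""
    total_kb = fields["MemTotal"]
    free_kb = fields.get("MemFree", 0)
    cache_kb = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return {
        "free_pct": 100.0 * free_kb / total_kb,
        "free_plus_cache_pct": 100.0 * (free_kb + cache_kb) / total_kb,
    }


summary = memory_summary(parse_meminfo(SAMPLE_MEMINFO))
print(summary)
```

In this invented sample, very little memory is reported as free, yet well over half of physical memory is cache that the operating system can usually hand back to processes on demand.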
The memory required by any one Oracle session depends on many factors. Typically the major contributing
factors are:
In Oracle9i, the PGA_AGGREGATE_TARGET initialization parameter gives greater control over a session's
memory usage.
Using Process Schedulers
Many processes, or threads on NT systems, are involved in the operation of Oracle. They all access the
shared memory resources in the SGA.
Be sure that all Oracle processes, both background and user processes, have the same process priority.
When you install Oracle, all background processes are given the default priority for the operating system. Do
not change the priorities of background processes. Verify that all user processes have the default operating
system priority.
Assigning different priorities to Oracle processes might exacerbate the effects of contention. The operating
system might not grant processing time to a low-priority process if a high-priority process also requests
processing time. If a high-priority process needs access to a memory resource held by a low-priority process,
then the high-priority process can wait indefinitely for the low-priority process to obtain the CPU, process the
request, and release the resource.
Additionally, do not bind Oracle background processes to CPUs. This can cause the bound processes to be
CPU-starved. This is especially the case when binding processes that fork off operating system threads. In
this case, the parent process and all its threads bind to the CPU.
Operating system resource managers are different from domains or other similar facilities. Domains provide
one or more completely separated environments within one system. Disk, CPU, memory, and all other
resources are dedicated to each domain and cannot be accessed from any other domain. Other similar
facilities completely separate just a portion of system resources into different areas, usually separate CPU or
memory areas. Like domains, the separate resource areas are dedicated only to the processing assigned to
that area; processes cannot migrate across boundaries. Unlike domains, all other resources (usually disk) are
accessed by all partitions on a system.
Oracle runs within domains, as well as within these other less complete partitioning constructs, as long as the
allocation of partitioned memory (RAM) resources is fixed, not dynamic.
Note:
Operating system resource managers prioritize resource allocation within a global pool of resources, usually a
domain or an entire system. Processes are assigned to groups, which are in turn assigned resources anywhere
within the resource pool.
Note:
Oracle is not supported for use with any operating system resource manager's memory
management and allocation facility. Oracle Database Resource Manager, which
provides resource allocation capabilities within an Oracle instance, cannot be used with
any operating system resource manager.
Caution:
When running under operating system resource managers, Oracle is supported only
when each instance is assigned to a dedicated operating system resource manager
group or managed entity. Also, the dedicated entity running all the instance's processes
must run at one priority (or resource consumption) level. Management of individual
Oracle processes at different priority levels is not supported. Severe consequences,
including instance crashes, can result.
Familiarize yourself with platform-specific issues so that you know what performance options the operating
system provides.
On UNIX systems, try to establish a good ratio between the amount of time the operating system spends
fulfilling system calls and doing process scheduling and the amount of time the application runs. The goal
should be to run 60% to 75% of the time in application mode (often called user mode) and 25% to 40% of
the time in operating system mode. If you find that the system is spending 50% of its time in each mode, then
determine what is wrong.
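The guideline above can be expressed as a small check. This is a sketch only: the function names are hypothetical, the CPU times are invented sample values, and the 40% limit is simply the upper end of the operating system mode range quoted in the text.

```python
def mode_split(user_seconds, system_seconds):
    """Return (user %, system %) of the non-idle CPU time."""
    busy = user_seconds + system_seconds
    if busy <= 0:
        raise ValueError("no CPU time recorded")
    user_pct = 100.0 * user_seconds / busy
    return user_pct, 100.0 - user_pct


def system_share_too_high(system_pct, limit=40.0):
    """True when operating system mode exceeds the guideline's upper bound."""
    return system_pct > limit


# Invented samples: one healthy system and one spending half its time in the OS.
healthy = mode_split(user_seconds=420.0, system_seconds=180.0)
suspect = mode_split(user_seconds=300.0, system_seconds=300.0)
print(healthy, system_share_too_high(healthy[1]))
print(suspect, system_share_too_high(suspect[1]))
```

A flagged result says only that something is consuming operating system time; it does not identify the cause.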
The ratio of time spent in each mode is only a symptom of the underlying problem, which might involve the
following:
Paging or swapping
Executing too many operating system calls
Running too many processes
If such conditions exist, then there is less time available for the application to run. The more time you can
release from the operating system side, the more transactions an application can perform.
Consider the paging parameters on a mainframe, and remember that Oracle can exploit a very large working
set.
Free memory in VAX or VMS environments is actually memory that is not mapped to any operating system
process. On a busy system, free memory likely contains pages belonging to one or more currently active
processes. When a process accesses such a page, a soft page fault takes place, and the page is included in
the working set for the process. If the process cannot expand its working set, then one of the pages currently
mapped by the process must be moved to the free set.
Any number of processes might have pages of shared memory within their working sets. The sum of the sizes
of the working sets can thus markedly exceed the available memory. When the Oracle server is running, the
SGA, the Oracle kernel code, and the Oracle Forms runtime executable are normally all sharable and
account for perhaps 80% or 90% of the pages accessed.
Understanding CPU
To address CPU problems, first establish appropriate expectations for the amount of CPU resources your
system should be using. Then, determine whether sufficient CPU resources are available and recognize when
your system is consuming too many resources. Begin by determining the amount of CPU resources the Oracle
instance utilizes with your system in the following three cases:
You can capture various workload snapshots using Statspack or the UTLBSTAT/UTLESTAT utility. Operating
system tools, such as vmstat, sar, and iostat on UNIX and Performance Monitor on NT, should be run
during the same time interval as UTLBSTAT/UTLESTAT to provide a complementary view of the overall
statistics.
Note:
Workload is an important factor when evaluating your system's level of CPU utilization. During peak
workload hours, 90% CPU utilization with 10% idle and waiting time can be acceptable. Even 30% utilization
at a time of low workload can be understandable. However, if your system shows high utilization at normal
workload, then there is no room for a peak workload. For example, Figure 16-1 illustrates workload over
time for an application having peak periods at 10:00 AM and 2:00 PM.
This example application has 100 users working 8 hours a day. Each user entering one transaction every 5
minutes translates into 9,600 transactions daily. Over an 8-hour period, the system must support 1,200
transactions an hour, which is an average of 20 transactions a minute. If the demand rate were constant, then
you could build a system to meet this average workload.
However, usage patterns are not constant and in this context, 20 transactions a minute can be understood as
merely a minimum requirement. If the peak rate you need to achieve is 120 transactions a minute, then you
must configure a system that can support this peak workload.
For this example, assume that at peak workload, Oracle uses 90% of the CPU resource. For a period of
average workload, then, Oracle uses no more than about 15% of the available CPU resource, as illustrated in
the following equation:
20 tpm / 120 tpm * 90% = 15% of available CPU resource
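The arithmetic in this example can be reproduced with a short sketch. The function names are hypothetical, and the calculation assumes, as the equation above does, that CPU consumption scales linearly with the transaction rate.

```python
def average_tpm(users, hours_per_day, minutes_between_transactions):
    """Return (daily transactions, average transactions per minute)."""
    per_user = hours_per_day * 60 / minutes_between_transactions
    daily = users * per_user
    return daily, daily / (hours_per_day * 60)


def cpu_share_at(rate_tpm, peak_tpm, peak_cpu_share):
    """Scale the peak CPU share down to a lower rate (linear assumption)."""
    return rate_tpm / peak_tpm * peak_cpu_share


# 100 users, 8 hours a day, one transaction every 5 minutes.
daily, avg_tpm = average_tpm(users=100, hours_per_day=8, minutes_between_transactions=5)
# Peak of 120 tpm at which Oracle uses 90% of the CPU.
avg_cpu = cpu_share_at(avg_tpm, peak_tpm=120, peak_cpu_share=0.90)
print(daily, avg_tpm, round(avg_cpu, 4))
```

Changing `users` in this sketch shows how quickly the average rate approaches the old peak as the user population grows.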
As users are added to an application, the workload can rise to what had previously been peak levels. No
further CPU capacity is then available for the new peak rate, which is higher than the previous peak.
Tuning, or the process of detecting and solving CPU problems from excessive consumption
Reducing the impact of peak load use patterns by prioritizing CPU resource allocation. Oracle's
Database Resource Manager does this by allocating and managing CPU resources among database
users and applications.
Context Switching
Oracle has several features for context switching, described in this section.
Post-wait Driver
An Oracle process needs to be able to post another Oracle process (give it a message) and also needs to be
able to wait to be posted.
For example, a foreground process may need to post LGWR to tell it to write out all blocks up to a given
point so that it can acknowledge a commit.
Often this post-wait mechanism is implemented through UNIX semaphores, but these can be resource
intensive. Therefore, some platforms supply a post-wait driver, typically a kernel device driver that is a
lightweight method of implementing a post-wait interface.
Oracle often needs to query the system time for timing information. This can involve an operating system call
that incurs a relatively costly context switch. Some platforms implement a memory-mapped timer that uses an
address within the process's virtual address space to contain the current time information. Reading the time
from this memory-mapped timer is less expensive than the overhead of a context switch for a system call.
List I/O is an application programming interface that allows several asynchronous I/O requests to be
submitted in a single system call, rather than submitting several I/O requests through separate system calls.
The main benefit of this feature is to reduce the number of context switches required.
Use operating system monitoring tools to determine what processes are running on the system as a whole. If
the system is too heavily loaded, check the memory, I/O, and process management areas described later in
this section.
Tools such as sar -u on many UNIX-based systems let you examine the level of CPU utilization on your
entire system. CPU utilization in UNIX is described in statistics that show user time, system time, idle time,
and time waiting for I/O. A CPU problem exists if idle time and time waiting for I/O are both close to zero
(less than 5%) at a normal or low workload.
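The rule of thumb above can be written as a simple predicate. This is a sketch under stated assumptions: the idle and I/O wait percentages are taken as already extracted from a tool such as sar, whose column layout varies by platform, and the function name and sample values are hypothetical.

```python
def cpu_problem_suspected(idle_pct, iowait_pct, threshold=5.0):
    """True when idle time and I/O wait time are both below the threshold."""
    return idle_pct < threshold and iowait_pct < threshold


# Invented samples taken at a normal or low workload.
samples = {
    "cpu_saturated": {"idle_pct": 1.0, "iowait_pct": 2.0},
    "io_bound": {"idle_pct": 2.0, "iowait_pct": 38.0},
    "healthy": {"idle_pct": 55.0, "iowait_pct": 3.0},
}
verdicts = {name: cpu_problem_suspected(**s) for name, s in samples.items()}
print(verdicts)
```

Only the first sample is flagged. In the second, the CPU has little idle time, but the large I/O wait share points at the I/O subsystem rather than at a CPU shortage.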
On NT, use Performance Monitor to examine CPU utilization. Performance Monitor provides statistics on
processor time, user time, privileged time, interrupt time, and DPC time. (NT Performance Monitor is not the
same as Performance Manager, which is an Oracle Enterprise Manager tool.)
Note:
This section describes how to check system CPU utilization on most UNIX-based and
NT systems. For other platforms, see your operating system documentation.
Use tools such as sar or vmstat on UNIX or Performance Monitor on NT to investigate the cause of
paging and swapping.
On UNIX, if the processing space becomes too large, then it can result in the page tables becoming too large.
This is not an issue on NT.
Checking I/O Management
Thrashing is an I/O management issue. Ensure that your workload fits into memory, so the machine is not
thrashing (swapping and paging processes in and out of memory). The operating system allocates fixed
portions of time during which CPU resources are available to your process. If the process wastes a large
portion of each time period checking to be sure that it can run and ensuring that all necessary components are
in the machine, then the process might be using only 50% of the time allotted to actually perform work.
The operating system can spend excessive time scheduling and switching processes. Examine the way in
which you are using the operating system, because you could be using too many processes. On NT systems,
do not overload your server with too many non-Oracle processes.
Context Switching
Due to operating-system-specific characteristics, your system could be spending a lot of time in context
switches. Context switching can be expensive, especially with a large SGA. Context switching is not an issue
on NT, which has only one process for each instance. All threads share the same page table.
There is a high cost in starting new operating system processes. Programmers often create single-purpose
processes, exit the process, and create a new one. Doing this re-creates and destroys the process each time.
Such logic uses excessive amounts of CPU, especially with applications that have large SGAs. This is
because you need to build the page tables each time. The problem is aggravated when you pin or lock shared
memory, because you have to access every page.
For example, if you have a 1 gigabyte SGA, then you might have page table entries for every 4 KB, and a
page table entry might be 8 bytes. You could end up with (1 GB / 4 KB) * 8 bytes of page table entries,
which is about 2 MB for each process that maps the SGA. This becomes expensive, because you need to
continually make sure that the page table is loaded.
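The page table estimate can be checked with a short calculation. This sketch uses the page size and entry size from the example; the function name and the figure of 500 processes are hypothetical, and it assumes each process carries its own page table entries for the whole SGA.

```python
def page_table_bytes(sga_bytes, page_bytes=4 * 1024, entry_bytes=8):
    """Bytes of page table entries needed to map an SGA of the given size."""
    entries = -(-sga_bytes // page_bytes)  # ceiling division
    return entries * entry_bytes


GIB = 1024 ** 3
per_process = page_table_bytes(1 * GIB)
# Hypothetical server with 500 processes, each mapping the full SGA.
total_for_500 = per_process * 500
print(per_process, total_for_500)
```

Under these assumptions a single process needs about 2 MB of page table entries, and a few hundred such processes together need on the order of 1 GB, which is why frequent process creation is costly with a large SGA.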