Optimizer Statistics With Oracle Database 19c
ORACLE WHITE PAPER / DECEMBER 9, 2019
INTRODUCTION
When the Oracle database was first introduced, the decision of how to execute a SQL statement
was determined by a Rule Based Optimizer (RBO). The Rule Based Optimizer, as the name implies,
followed a set of rules to determine the execution plan for a SQL statement.
In Oracle Database 7, the Cost Based Optimizer (CBO) was introduced to deal with the enhanced
functionality being added to the Oracle Database at that time, including parallel execution and
partitioning, and to take the actual data content and distribution into account. The Cost Based
Optimizer examines all of the possible plans for a SQL statement and picks the one with the lowest
cost, where cost represents the estimated resource usage for a given plan. The lower the cost, the
more efficient an execution plan is expected to be. In order for the Cost Based Optimizer to
accurately determine the cost for an execution plan, it must have information about all of the objects
(tables and indexes) accessed in the SQL statement, and information about the system on which the
SQL statement will be run.
DISCLAIMER
This document in any form, software or printed matter, contains proprietary information that is the
exclusive property of Oracle. Your access to and use of this confidential material is subject to the
terms and conditions of your Oracle software license and service agreement, which has been
executed and with which you agree to comply. This document and information contained herein may
not be disclosed, copied, reproduced or distributed to anyone outside Oracle without prior written
consent of Oracle. This document is not part of your license agreement nor can it be incorporated
into any contractual agreement with Oracle or its subsidiaries or affiliates.
This document is for informational purposes only and is intended solely to assist you in planning for
the implementation and upgrade of the product features described. It is not a commitment to deliver
any material, code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, and timing of any features or functionality described in this document
remains at the sole discretion of Oracle.
Due to the nature of the product architecture, it may not be possible to safely include all features
described in this document without risking significant destabilization of the code.
Figure 1: Optimizer Statistics stored in the data dictionary are used by the Oracle Optimizer to determine execution plans
Most types of optimizer statistics need to be gathered or refreshed periodically to ensure that they accurately reflect the nature of the data
that’s stored in the database. If, for example, the data in the database is highly volatile (perhaps there are tables that are rapidly and
continuously populated) then it will be necessary to gather statistics more frequently than if the data is relatively static. Database
administrators can choose to use manual or automatic processes to gather statistics and this topic is covered in the second paper of
this series1.
Column statistics include information on the number of distinct values in a column (NDV) as well as the minimum and maximum value
found in the column. You can view column statistics in the dictionary view USER_TAB_COL_STATISTICS. The optimizer uses the
column statistics information in conjunction with the table statistics (number of rows) to estimate the number of rows that will be
returned by a SQL operation. For example, if a table has 100 records, and the table access evaluates an equality predicate on a
column that has 10 distinct values, then the optimizer, assuming uniform data distribution, estimates the cardinality – the number of
rows returned - to be the number of rows in the table divided by the number of distinct values for the column or 100/10 = 10.
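For example, the NDV and the minimum and maximum values can be inspected directly in the dictionary. A minimal sketch (the table name is illustrative):
SELECT column_name, num_distinct, low_value, high_value
FROM user_tab_col_statistics
WHERE table_name = 'CUSTOMERS';
Note that LOW_VALUE and HIGH_VALUE are stored in an internal (RAW) format.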
1 Oracle White Paper: Best Practices for Gathering Optimizer Statistics with Oracle Database 19c
This Oracle Database 19c feature is available on certain Oracle Database platforms. Check the Oracle Database Licensing Guide for more information.
Real-time statistics extend the online statistics gathering techniques to conventional insert, update and merge DML operations. In order
to minimize the performance overhead of generating these statistics, only the most essential optimizer statistics are gathered during
DML operations. These essential statistics are used to augment the statistics gathered via the auto statistics gathering job (or the
DBMS_STATS API). The collection of remaining stats (such as the number of distinct values) is therefore deferred to the automatic
statistics gathering job, high-frequency stats gathering or the manual invocation of the DBMS_STATS API.
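Real-time statistics appear in the same dictionary views as conventional statistics. A sketch of identifying them, assuming the documented NOTES marker STATS_ON_CONVENTIONAL_DML (the table name is illustrative):
SELECT column_name, num_distinct, notes
FROM user_tab_col_statistics
WHERE table_name = 'SALES'
AND notes = 'STATS_ON_CONVENTIONAL_DML';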
Histograms
Histograms tell the optimizer about the distribution of data within a column. By default (without a histogram), the optimizer assumes a
uniform distribution of rows across the distinct values in a column. As described above, the optimizer calculates the cardinality for an
equality predicate by dividing the total number of rows in the table by the number of distinct values in the column used in the equality
predicate. If the data distribution in that column is not uniform (i.e., the data is skewed) then the cardinality estimate will be incorrect. In order
to accurately reflect a non-uniform data distribution, a histogram is required on the column. The presence of a histogram changes the
formula used by the optimizer to estimate a more accurate cardinality, and allows it therefore to generate a more accurate execution
plan.
Oracle automatically determines the columns that need histograms based on the column usage information (SYS.COL_USAGE$), and
the presence of a data skew. For example, Oracle will not automatically create a histogram on a unique column if it is only seen in
equality predicates.
There are four types of histograms: frequency, top-frequency, height-balanced, and hybrid. The appropriate histogram type is chosen
automatically by the Oracle database, based on the number of distinct values in the column. From Oracle Database 12c onwards,
height-balanced histograms are replaced by hybrid histograms2. The data dictionary view USER_TAB_COL_STATISTICS has a column
called HISTOGRAM, which reports what type of histogram is present on any particular table column.
Frequency Histograms
Frequency histograms are created when the number of distinct values in the column is less than the maximum number of buckets
allowed. This is 254 by default, but it can be modified using DBMS_STATS procedures up to a maximum of 2048 (beginning with
Oracle Database 12c).
2 Assuming the ESTIMATE_PERCENT parameter is set to “AUTO_SAMPLE_SIZE” in the DBMS_STATS.GATHER_*_STATS command used to gather the statistics. This is the default.
Figure 4 shows how this histogram can be viewed in the data dictionary. Compare the ENDPOINT_VALUE and ENDPOINT_NUMBER with
Chart 2 and FREQUENCY with Chart 1.
Note that endpoint values are numeric, so histograms on columns with non-numeric datatypes have their values encoded to a number.
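A sketch of viewing the histogram buckets in the dictionary (table and column names are illustrative):
SELECT endpoint_value, endpoint_number
FROM user_tab_histograms
WHERE table_name = 'SALES'
AND column_name = 'VAL'
ORDER BY endpoint_number;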
Once the histogram has been created, the optimizer can use it to estimate cardinalities more accurately. For example, if there is a
predicate “WHERE val = 53”, it is easy to see that the histogram can be used to establish (using endpoint value “53”) that the cardinality
will be 3. Predicates for values that aren’t present in the frequency histogram (such as “WHERE val=41”) will have an estimated
cardinality of 1. This is the lowest estimate the optimizer will use for cost calculations; it applies whether one row or zero rows would
actually be returned.
Top-Frequency Histograms
Traditionally, if a column has more distinct values than the number of buckets available (254 by default), a height-balanced or hybrid
histogram would be created. However, there are situations where most of the rows in the table have a small number of distinct values,
and remaining rows (with a large number of distinct values) make up a very small proportion of the total. In these circumstances, it can
be appropriate to create a frequency histogram for the majority of the rows in the table and ignore the statistically insignificant set of
rows (the ones with a high number of distinct values but few rows each). To choose a top-frequency histogram, the database must decide whether
“n” histogram buckets are enough to calculate cardinality accurately even though the number of distinct values in the column exceeds “n”.
It does this by counting how many distinct values there are in the top 99.6% of rows in the table for the column in question (99.6% is the
default, but this value is adjusted to take into account the number of histogram buckets available). If there are enough histogram
buckets available to accommodate the top-n distinct values, then a frequency histogram is created for these popular values.
Take, for example, a PRODUCT_SALES table, which contains sales information for a Christmas ornaments company. The table has 1.78
million rows and 632 distinct TIME_IDs, but the majority of the rows in PRODUCT_SALES are concentrated in fewer than 254 distinct TIME_IDs, as
the majority of Christmas ornaments are sold in December each year. A histogram is necessary on the TIME_ID column to make the
optimizer aware of the data skew in the column. In this case, a top-frequency histogram is created containing 254 buckets.
Figure 5 illustrates the idea behind a top-frequency histogram. You can see in principle that it is possible to identify the most significant
data (within the dotted lines) and use it to construct a frequency histogram with those values.
Hybrid Histograms
Figure 6 shows how the hybrid histogram has characteristics of both frequency and height-balanced histograms; a bucket can contain
multiple values and the endpoint number stores the cumulative frequency.
The endpoint repeat count indicates how many times the endpoint value is repeated; this can be seen by comparing the endpoint repeat
counts with the endpoint values in Figure 6.
Hybrid histograms are the default histogram type for columns with greater than 254 distinct values as long as the statistics are gathered
using the default estimate percent setting (ESTIMATE_PERCENT => DBMS_STATS.AUTO_SAMPLE_SIZE).
Extended Statistics
Extended statistics encompass two additional types of statistics: column groups and expression statistics.
Column Groups
In real-world data, there is often a relationship (correlation) between the data stored in different columns of the same table. For
example, in the CUSTOMERS table, the values in the CUST_STATE_PROVINCE column are influenced by the values in the COUNTRY_ID
column, as the state of California is only going to be found in the United States. Using only basic column statistics, the optimizer has no
way of knowing about these real-world relationships, and could potentially miscalculate the cardinality if multiple columns from the same
table are used in the where clause of a statement. The optimizer can be made aware of these real-world relationships by having
extended statistics on these columns as a group.
By creating statistics on a group of columns, the optimizer can compute a better cardinality estimate when several columns from the
same table are used together in a where clause of a SQL statement. You can use the function
DBMS_STATS.CREATE_EXTENDED_STATS to define a column group you want to have statistics gathered on as a group. Once a
column group has been created, Oracle will automatically maintain the statistics on that column group when statistics are gathered on
the table, just like it does for any ordinary column (Figure 8).
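For example, a column group on COUNTRY_ID and CUST_STATE_PROVINCE can be created as follows (a minimal sketch; the schema is assumed to be the current user):
SELECT dbms_stats.create_extended_stats(NULL, 'CUSTOMERS', '(COUNTRY_ID, CUST_STATE_PROVINCE)')
FROM dual;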
After creating the column group and re-gathering statistics, you will see an additional column, with a system-generated name, in the
dictionary view USER_TAB_COL_STATISTICS. This new column represents the column group (Figure 9).
The optimizer will now use the column group statistics for predicates like country_id='US' and cust_state_province='CA'
when they are used together in where clause predicates. Not all of the columns in the column group need to be present in the SQL
statement for the optimizer to use extended statistics; only a subset of the columns is necessary.
Auto column group detection automatically determines which column groups are beneficial for a table based on a given workload. Note
that this functionality does not create extended statistics for function-wrapped columns; it applies only to column groups. Auto
column group detection is a simple three-step process:
Oracle must observe a representative workload in order to determine the appropriate column groups. The workload can be provided in
a SQL Tuning Set or by monitoring a running system. The procedure DBMS_STATS.SEED_COL_USAGE should be used to indicate
the workload and to tell Oracle how long it should observe that workload. The following example turns on monitoring for 5 minutes
(300 seconds) on the current system.
begin
dbms_stats.seed_col_usage(null,null,300);
end;
/
The monitoring procedure records different information from the traditional column usage information you see in sys.col_usage$ and
stores it in sys.col_group_usage$. Information is stored for any SQL statement that is executed or explained during the monitoring
window. Once the monitoring window has finished, it is possible to review the column usage information recorded for a specific table
using the new function DBMS_STATS.REPORT_COL_USAGE. This function generates a report, which lists what columns from the table
were seen in filter predicates, join predicates and group by clauses in the workload:
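A sketch of generating the report for a single table (SET LONG is needed in SQL*Plus because the report is returned as a CLOB):
set long 100000
select dbms_stats.report_col_usage(user, 'CUSTOMERS') from dual;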
Calling the DBMS_STATS.CREATE_EXTENDED_STATS function for each table will automatically create the necessary column
groups based on the usage information captured during the monitoring window. Once the extended statistics have been created, they
will be automatically maintained whenever statistics are gathered on the table.
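A sketch of this call: when no extension is specified, the function creates the column groups recorded during the monitoring window.
select dbms_stats.create_extended_stats(user, 'CUSTOMERS') from dual;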
Alternatively, the column groups can be created manually by specifying the group as the third argument to the
DBMS_STATS.CREATE_EXTENDED_STATS function.
The final step is to re-gather statistics on the affected tables so that the newly created column groups will have statistics created for them:
exec dbms_stats.gather_table_stats(null,'CUSTOMERS')
This behavior has changed from Oracle Database 12c Release 2 onwards3. Automatic column group creation is OFF by default and is
controlled by a DBMS_STATS preference called AUTO_STAT_EXTENSIONS. With the default and recommended setting, SQL plan
directives will not be used to create column groups automatically:
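-- A minimal sketch of the default setting (OFF is the documented default value):
exec dbms_stats.set_global_prefs('AUTO_STAT_EXTENSIONS', 'OFF')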
3 See Oracle white paper: Optimizer with Oracle Database 12c Release 2
Extended statistics can also be created on expressions, so that the optimizer can better estimate the cardinality of predicates that apply
a function to a column. For example:
select dbms_stats.create_extended_stats(NULL,'CUSTOMERS','(UPPER(CUST_LAST_NAME))')
from dual;
Just as with column groups, statistics need to be re-gathered on the table after the expression statistics have been defined. After the
statistics have been gathered, an additional column with a system-generated name will appear in the dictionary view
USER_TAB_COL_STATISTICS representing the expression statistics. Just like for column groups, the detailed information about
expression statistics can be found in USER_STAT_EXTENSIONS.
Index Statistics
Index statistics provide information on the number of distinct values in the index (distinct keys), the depth of the index (blevel), the
number of leaf blocks in the index (leaf_blocks), and the clustering factor. The optimizer uses this information in conjunction with other
statistics to determine the cost of an index access. For example, the optimizer will use blevel, leaf_blocks and the table statistic
num_rows to determine the cost of an index range scan (when all predicates are on the leading edge of the index).
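These values can be viewed in the dictionary. A sketch (the index name is illustrative):
SELECT blevel, leaf_blocks, distinct_keys, clustering_factor
FROM user_ind_statistics
WHERE index_name = 'SALES_TIME_IDX';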
Incremental Statistics
Oracle Database 11g enhanced the statistics collection for partitioned tables with the introduction of incremental global statistics. If the
INCREMENTAL preference for a partitioned table is set to TRUE, the DBMS_STATS.GATHER_*_STATS parameter GRANULARITY
includes GLOBAL, and ESTIMATE_PERCENT is set to AUTO_SAMPLE_SIZE, Oracle will gather statistics on the new partition, and
accurately update all global level statistics by scanning only those partitions that have been added or modified, and not the entire table.
Incremental global statistics works by storing a synopsis for each partition in the table. A synopsis is statistical metadata for that
partition and the columns in the partition. Each synopsis is stored in the SYSAUX tablespace. Global statistics are then generated by
aggregating the partition level statistics and the synopses from each partition, thus eliminating the need to scan the entire table to
gather table level statistics (see Figure 12). When a new partition is added to the table, you only need to gather statistics for the new
partition. The global statistics will be automatically and accurately updated using the new partition synopsis and the existing partitions’
synopses.
Note that INCREMENTAL statistics does not apply to the sub-partitions. Statistics will be gathered as normal on the sub-partitions and
on the partitions. Only the partition statistics will be used to determine the global or table level statistics. Below are the steps necessary
to use incremental global statistics.
BEGIN
DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES', 'INCREMENTAL', 'TRUE');
END;
/
BEGIN
DBMS_STATS.GATHER_TABLE_STATS('SH', 'SALES');
END;
/
To check the current setting of INCREMENTAL for a given table, use DBMS_STATS.GET_PREFS.
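For example, a sketch for the SH.SALES table used above:
select dbms_stats.get_prefs('INCREMENTAL', 'SH', 'SALES') from dual;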
The Oracle Database includes a preference called INCREMENTAL_STALENESS that allows you to control when partition statistics will
be considered stale and not good enough to generate global level statistics. By default, INCREMENTAL_STALENESS is set to NULL,
which means partition level statistics are considered stale as soon as a single row changes (same as Oracle Database 11g).
Alternatively, it can be set to USE_STALE_PERCENT or USE_LOCKED_STATS. USE_STALE_PERCENT means the partition level
statistics will be used as long as the percentage of rows changed is less than the value of the preference STALE_PERCENT (10%
by default). USE_LOCKED_STATS means if statistics on a partition are locked, they will be used to generate global level statistics
regardless of how many rows have changed in that partition since statistics were last gathered.
In previous releases, it was not possible to generate the necessary statistics on the non-partitioned table to support incremental
statistics during the partition exchange operation. Instead, statistics had to be gathered on the partition after the exchange had taken
place, in order to ensure the global statistics could be maintained incrementally.
The necessary statistics (synopsis) can be created on the non-partitioned table prior to the exchange, so that statistics exchanged
during a partition exchange load can be used to maintain incrementally global statistics automatically. The new DBMS_STATS table
preference INCREMENTAL_LEVEL can be used to identify a non-partitioned table that will be used in partition exchange load. By setting
the INCREMENTAL_LEVEL to TABLE (default is PARTITION), Oracle will automatically create a synopsis for the table when statistics
are gathered. This table level synopsis will then become the partition level synopsis after the exchange has taken place.
For example:
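-- A sketch, assuming SALES_STAGING is a non-partitioned staging table that will be
-- exchanged into a partition of SH.SALES (the staging table name is illustrative):
BEGIN
DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES_STAGING', 'INCREMENTAL_LEVEL', 'TABLE');
END;
/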
From Oracle Database 12.2, DBMS_STATS provides a new algorithm for gathering NDV information, which results in much smaller
synopses with a level of accuracy similar to that of the previous algorithm.
When upgrading a pre-Oracle Database 12.2 system (that is using incremental statistics), there are three options:
Old synopses are not immediately deleted, and new partitions will have synopses in the new format. Mixed formats will potentially
yield less accurate statistics, but taking this option means that there is no need to re-gather all statistics in the foreground,
because the statistics auto job will re-gather statistics on partitions with old synopses so that they use the new format.
Eventually, all synopses will be in the new format and statistics will be accurate.
For new implementations in Oracle Database 19c, the recommendation is to use the default value of the APPROXIMATE_NDV_ALGORITHM
preference, REPEAT OR HYPERLOGLOG, so that new-format synopses will be used.
Restoring Statistics
When you gather statistics using DBMS_STATS, the original statistics are automatically kept as a backup in dictionary tables, and can be
easily restored by running DBMS_STATS.RESTORE_TABLE_STATS if the newly gathered statistics lead to any kind of problem. The
dictionary view DBA_TAB_STATS_HISTORY contains a list of timestamps when statistics were saved for each table.
The example below restores the statistics for the table SALES to what they were yesterday, and automatically invalidates all of the
cursors referencing the SALES table in the shared pool. We want to invalidate all of the cursors because we are restoring
yesterday's statistics and want them to take effect immediately. The value of the NO_INVALIDATE parameter determines whether
the cursors referencing the table will be invalidated or not.
BEGIN
DBMS_STATS.RESTORE_TABLE_STATS(ownname => 'SH',
tabname => 'SALES',
as_of_timestamp => SYSTIMESTAMP-1,
force => FALSE,
no_invalidate => FALSE);
END;
/
Pending Statistics
By default, when statistics are gathered, they are published (written) immediately to the appropriate dictionary tables and
instantaneously used by the optimizer. Beginning in Oracle Database 11g, it is possible to gather optimizer statistics but not have them
published immediately, and instead store them in an unpublished, 'pending' state. Instead of going into the usual dictionary tables, the
statistics are stored in pending tables so that they can be tested before they are published. These pending statistics can be enabled for
individual sessions, in a controlled fashion, which allows you to validate the statistics before they are published. To activate pending
statistics collection, you need to use one of the DBMS_STATS.SET_*_PREFS procedures to change the value of the parameter PUBLISH
from TRUE (the default) to FALSE for the object(s) for which you wish to create pending statistics.
BEGIN
DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES', 'PUBLISH', 'FALSE');
END;
/
Gather statistics on the object(s) as normal:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS('SH', 'SALES');
END;
/
The statistics gathered for these objects can be displayed using the dictionary views called USER_*_PENDING_STATS. You can tell
the optimizer to use pending statistics by issuing an ALTER SESSION command to set the initialization parameter
OPTIMIZER_USE_PENDING_STATS to TRUE and running a SQL workload. For tables accessed in the workload that do not have
pending statistics, the optimizer will use the current statistics in the standard data dictionary tables. Once you have validated the
pending statistics, you can publish them using the procedure DBMS_STATS.PUBLISH_PENDING_STATS.
BEGIN
DBMS_STATS.PUBLISH_PENDING_STATS('SH', 'SALES');
END;
/
Exporting and Importing Statistics
Before exporting statistics, you need to create a table to store the statistics using DBMS_STATS.CREATE_STAT_TABLE. After the table
has been created, you can export statistics from the data dictionary using the DBMS_STATS.EXPORT_*_STATS procedures. Once the
statistics have been packed into the statistics table, you can then use Oracle Data Pump to extract the statistics table from the production
database, and import it into the test database. Once the statistics table is successfully imported into the test system, you can import the
statistics into the data dictionary using the DBMS_STATS.IMPORT_*_STATS procedures.
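A sketch of the end-to-end flow (the statistics table name MY_STATS_TAB is illustrative):
-- On the production database:
exec dbms_stats.create_stat_table('SH', 'MY_STATS_TAB')
exec dbms_stats.export_table_stats('SH', 'SALES', stattab => 'MY_STATS_TAB')
-- Transport SH.MY_STATS_TAB to the test database (e.g. via Data Pump), then:
exec dbms_stats.import_table_stats('SH', 'SALES', stattab => 'MY_STATS_TAB')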
Copying Statistics
It is very common with range partitioned tables to have a new partition added to an existing table, and rows inserted into just that
partition. If end-users start to query the newly inserted data before statistics have been gathered, it is possible to get a suboptimal
execution plan due to stale statistics. One of the most common cases occurs when the value supplied in a where clause predicate is
outside the domain of values represented by the [minimum, maximum] column statistics. This is known as an ‘out-of-range’ error. In this
case, the optimizer prorates the selectivity based on the distance between the predicate value and the maximum value (assuming the
value is higher than the max): the farther the value is from the maximum or minimum value, the lower the selectivity will be.
The "Out of Range" condition can be prevented by using the DBMS_STATS.COPY_TABLE_STATS procedure. This procedure copies the
statistics of a representative source [sub] partition to the newly created and empty destination [sub] partition. It also copies the statistics
of the dependent objects: columns, local (partitioned) indexes, etc. The minimum and maximum values of the partitioning column are
adjusted as follows:
- If the partitioning type is HASH, the minimum and maximum values of the destination partition are the same as those of the source partition.
- If the partitioning type is LIST and the destination partition is not the DEFAULT partition, then the minimum value of the destination partition is set to the minimum value of the value list that describes the destination partition, and the maximum value is set to the maximum value of that value list.
- If the partitioning type is LIST and the destination partition is the DEFAULT partition, then the minimum and maximum values of the destination partition are set to the minimum and maximum values of the source partition.
- If the partitioning type is RANGE, then the minimum value of the destination partition is set to the high bound of the previous partition and the maximum value of the destination partition is set to the high bound of the destination partition, unless the high bound of the destination partition is MAXVALUE, in which case the maximum value of the destination partition is set to the high bound of the previous partition.
It can also scale the statistics (such as the number of blocks, or number of rows) based on the given scale_factor. Statistics such as
average row length and number of distinct values are not adjusted and are assumed to be the same in the destination partition.
The following command copies the statistics from the SALES_Q3_2002 range partition to the SALES_Q4_2002 partition of the SALES
table and scales the basic statistics by a factor of 2.
BEGIN
DBMS_STATS.COPY_TABLE_STATS('SH','SALES','SALES_Q3_2002','SALES_Q4_2002', 2);
END;
/
Comparing Statistics
One of the key reasons an execution plan can differ from one system to another is because optimizer statistics are different. For
example, statistics may be different in a test environment when compared to production if the data is not in sync. To identify differences
in statistics, the DBMS_STATS.DIFF_TABLE_STATS_* functions can be used to compare statistics for two different sources (denoted
source “A” and source “B”). For example, a table in schema1 can be compared with a table in schema2. It is also possible to compare
the statistics of an individual table at two different points in time or current statistics with pending statistics. For example, comparing
current statistics with yesterday's:
select report,
maxdiffpct
from dbms_stats.diff_table_stats_in_history(user,
'CUSTOMERS',
SYSDATE-1,
SYSDATE,
2);
Figure 13: Comparing statistics in CUSTOMERS from one day to the next
It is also possible to compare current statistics with statistics captured in a user statistics table, for example when statistics have been
exported on one system for comparison with another. On System 1:
exec dbms_stats.create_stat_table('APP1','APP1STAT')
exec dbms_stats.export_table_stats('APP1','CUSTOMERS',stattab=>'APP1STAT',statid=>'mystats1')
select report
from dbms_stats.diff_table_stats_in_stattab('APP1',
'CUSTOMERS',
'APP1STAT',
NULL,
1,
'mystats1');
The “DIFF” functions also compare the statistics of the dependent objects (indexes, columns, partitions), and displays all the statistics
for the object(s) from both sources if the difference between the statistics exceeds a specified threshold. The threshold can be specified
as an argument to the function; the default value is 10%. The statistics corresponding to the first source will be used as the basis for
computing the differential percentage.
Locking Statistics
In some cases, you may want to prevent any new statistics from being gathered on a table or schema by locking the statistics. Once
statistics are locked, no modifications can be made to those statistics until the statistics have been unlocked or unless the FORCE
parameter of the GATHER_*_STATS procedures has been set to TRUE.
Statistics can be locked and unlocked at either the table or partition level.
BEGIN
DBMS_STATS.LOCK_PARTITION_STATS('SH', 'SALES', 'SALES_Q3_2000');
END;
/
Figure 15: Hierarchy with locked statistics; table level lock trumps partition level unlock
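A sketch of locking at the table level and then overriding the lock with the FORCE parameter:
exec dbms_stats.lock_table_stats('SH', 'SALES')
-- Gathering statistics now raises an error unless FORCE is set to TRUE:
exec dbms_stats.gather_table_stats('SH', 'SALES', force => TRUE)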
Dynamic Statistics
Dynamic statistics allow the optimizer to augment existing statistics to get more accurate cardinality estimates for not only single table
accesses, but also joins and group-by predicates.
So, how and when will dynamic statistics be used? During the compilation of a SQL statement, the optimizer decides whether to use
dynamic statistics or not by considering whether the available statistics are sufficient to generate a good execution plan. If the available
statistics are not enough, dynamic statistics will be used in addition to the existing statistics information. It is typically used to
compensate for missing or insufficient statistics that would otherwise lead to a very bad plan. For the case where one or more of the
tables in the query does not have statistics, dynamic statistics are used by the optimizer to gather basic statistics on these tables before
optimizing the statement. The statistics gathered in this case are not as high a quality or as complete as the statistics gathered using
the DBMS_STATS package. This trade-off is made to limit the impact on the compile time of the statement.
There are circumstances where you may want to ask the optimizer to be more “aggressive” in its use of dynamic statistics. For
example, you might want to use dynamic statistics for statements containing complex predicate expressions where extended statistics
are not available or cannot be used. Consider a query that has non-equality where clause predicates on two correlated
columns. Standard statistics would not be sufficient and extended statistics cannot be used (because the predicates are not equalities). In
the following simple query against the SALES table, the optimizer assumes that each of the where clause predicates will reduce the
number of rows returned by the query. Based on the standard statistics, it determines the cardinality to be 20,197 when, in fact, the
number of rows returned is ten times higher, at 210,420.
SELECT count(*)
FROM sh.sales
WHERE cust_id < 2222
AND prod_id > 5;
Figure 16: Execution plan for complex predicates without dynamic sampling
With standard statistics, the optimizer is not aware of the correlation between the CUST_ID and PROD_ID in the SALES table. By setting
OPTIMIZER_DYNAMIC_SAMPLING to level 6, the optimizer will use dynamic statistics to gather additional information about the
complex predicate expression. The additional information provided by dynamic statistics allows the optimizer to generate a more
accurate cardinality estimate, and therefore a better performing execution plan.
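A sketch of enabling this for the current session and re-running the query:
ALTER SESSION SET optimizer_dynamic_sampling = 6;

SELECT count(*)
FROM sh.sales
WHERE cust_id < 2222
AND prod_id > 5;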
Dynamic sampling is controlled by the parameter OPTIMIZER_DYNAMIC_SAMPLING, which can be set to different levels (0-11). These
levels control two different things: when dynamic statistics kick in, and how large a sample size will be used to gather the statistics.
The greater the sample size, the bigger the impact dynamic sampling has on the compilation time of a query.
When set to 11, the optimizer will automatically decide if dynamic statistics will be useful and how much data should be sampled. The
optimizer bases its decision to use dynamic statistics on the complexity of the predicates used, the existing base statistics, and the total
execution time expected for the SQL statement. Dynamic statistics will, for example, kick in for situations where the optimizer previously
would have used a guess, such as queries with LIKE predicates and wildcards.
Figure 18: When OPTIMIZER_DYNAMIC_SAMPLING is set to level 11 dynamic sampling will be used instead of guesses
Given these criteria, it's likely that when set to level 11, dynamic sampling will kick in more often than it did before. This will extend the
parse time of a statement. In order to minimize the performance impact, the results of the dynamic sampling queries are cached in the
Server Result Cache in Oracle Database 12c Release 1 and (instead) in the SQL plan directives repository from Oracle Database 12c
Release 2 onwards. This allows other SQL statements to share the statistics gathered by dynamic sampling queries. The existence of
persisted dynamic sampling results can be seen in the database view, DBA_SQL_PLAN_DIRECTIVES, where the TYPE column value
is DYNAMIC_SAMPLING_RESULT (from Oracle Database 12c Release 2 onwards).
Adaptive dynamic sampling (i.e. level-11-style dynamic sampling) can be initiated at parse time even if
OPTIMIZER_DYNAMIC_SAMPLING is not set to 11. This can happen for parallel queries on large tables and for serial queries that have
relevant DYNAMIC_SAMPLING SQL plan directives. This behavior is enabled if the database parameter
OPTIMIZER_DYNAMIC_STATISTICS is set to TRUE (the default is FALSE).
System Statistics
System statistics are enabled by default, and are automatically initialized with default values. These defaults work well for most systems
(including Oracle Exadata) so, for this reason, it is not usually necessary to gather them manually. Note that they are not automatically
collected as part of the automatic statistics gathering job.
If you wish to gather system statistics manually, the default values will be overridden and this will affect the cost calculations made by
the Oracle Optimizer. This is likely to change SQL execution plans, so it is important to evaluate the benefit of the change before
implementing it on a production system.
To gather system statistics, DBMS_STATS.GATHER_SYSTEM_STATS can be used during a representative workload time window;
ideally at peak workload times. Alternatively, Oracle Databases on Exadata systems have an Exadata-specific option:
EXEC DBMS_STATS.GATHER_SYSTEM_STATS('EXADATA')
Statistics on Dictionary Tables
To restrict the automatic statistics gathering job to the dictionary tables, excluding user schemas, the global preference
AUTOSTATS_TARGET can be set to ORACLE:
BEGIN
DBMS_STATS.SET_GLOBAL_PREFS('AUTOSTATS_TARGET','ORACLE');
END;
/
Statistics can be manually gathered on the dictionary tables using the DBMS_STATS.GATHER_DICTIONARY_STATS procedure. You
must have both the ANALYZE ANY DICTIONARY and ANALYZE ANY system privileges, or the DBA role, to update dictionary statistics. It
is recommended that dictionary table statistics be maintained on a regular basis in a similar manner to user schemas.
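For example, gathering dictionary statistics manually is a single call:
EXEC DBMS_STATS.GATHER_DICTIONARY_STATS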
Fixed Object Statistics
Fixed object statistics are not gathered or maintained by the automatic statistics gathering job prior to Oracle Database 12c. You can
collect statistics manually on fixed objects using the DBMS_STATS.GATHER_FIXED_OBJECTS_STATS procedure.
BEGIN
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/
Some fixed tables will not have statistics even if GATHER_FIXED_OBJECTS_STATS has been executed. This is expected behavior because
some of the tables are explicitly skipped for performance reasons.
Expression Statistics
Beginning with Oracle Database 12c Release 2, optimizer expression tracking captures statistics for expressions that appear in the
database workload and persists this information in the data dictionary. The data is maintained by the Oracle Optimizer and is used by
the In-Memory Expressions (IME) feature of Oracle Database In-Memory.
Figure 19 shows how the statistics can be viewed. Notice how column usage information as well as expression usage is tracked.
The statistics are updated automatically every 15 minutes, and the data can also be flushed manually. The evaluation count is an
estimate and, similarly, the fixed cost is an estimate of the cost of executing the expression. Note that aggregate expressions (such as
SUM or MAX) are not tracked, and expressions that include columns from more than one table are also not tracked.
The LATEST snapshot presents the latest set of statistics captured and the CUMULATIVE presents the long-term cumulative values.
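A sketch of querying the tracked expressions, assuming the USER_EXPRESSION_STATISTICS view (the table name is illustrative):
SELECT expression_text, evaluation_count, fixed_cost, snapshot
FROM user_expression_statistics
WHERE table_name = 'CUSTOMERS';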
Conclusion
Now that you have been introduced to the types of statistics maintained by the Oracle Database, you should consider reading Best
Practices for Gathering Optimizer Statistics with Oracle Database 19c. This white paper covers how to maintain all types of statistics
effectively with minimal management overhead.
Worldwide Headquarters
500 Oracle Parkway, Redwood Shores, CA 94065 USA
Worldwide Inquiries
TELE + 1.650.506.7000 + 1.800.ORACLE1
FAX + 1.650.506.7200
oracle.com
CONNECT WITH US
Call +1.800.ORACLE1 or visit oracle.com. Outside North America, find your local office at oracle.com/contact.
Copyright © 2019, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are
subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed
orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any
liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be
reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or
registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks
of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 1219