Oracle Block Change Tracking
ABSTRACT
One of the new features in Oracle 10g is Fast Incremental Backups, which require that databases run with Block Change Tracking enabled. While the usage of Fast Incremental Backups has been discussed quite thoroughly in recent years, the internals of the change tracking implementation in Oracle 10g are still obscure. This presentation will show how change tracking works inside an Oracle instance and what the CTWR process does. It will illustrate how other processes are involved and discuss the overhead caused by Block Change Tracking. This information was obtained by means of experiments and advanced research techniques.
DISCLAIMER
I cannot provide any guarantee that the presented material is absolutely correct. There is no publicly available documentation on change tracking internals (at least, I'm not aware of any), so this paper is based purely on experiments and research, plus a few hints from more knowledgeable peers, for which I'm very grateful. Please take this material carefully and make sure you validate my assumptions before making critical decisions, either with your own tests or by requesting that Oracle Support provide more information if your business requires it.
was made right before the backup. Since usually only a handful of blocks changes between incremental backups, RMAN does a lot of useless work reading blocks not required for the backup. Block change tracking provides a way to identify the blocks required for a backup without scanning the whole datafile. After that, RMAN needs to read only the blocks that are really required for this incremental backup. However, the improvement in incremental backups requires some sacrifice during normal database operations. According to Oracle, this performance overhead is supposed to be minimal. Nevertheless, change tracking is disabled by default. Oracle 10g introduces a new special background process, the Change Tracking Writer (CTWR). This process takes care of logging information about changed blocks in the block change tracking file. Let's start with a closer look at this file.
To disable:
SQL> ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;
View V$BLOCK_CHANGE_TRACKING can be queried to find out the status of change tracking in the database.
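For example, the status and the location of the BCT file can be checked with a simple query (column names as documented for the 10g view):

```sql
SQL> SELECT status, filename, bytes
  2    FROM v$block_change_tracking;
```

STATUS is either ENABLED or DISABLED; when tracking is disabled, FILENAME and BYTES should be empty.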
The bulk of the BCT file is occupied by bitmap extents. Bitmaps are the bread and butter of change tracking: bitmap blocks store bit flags for every block (every chunk, to be precise) in the Oracle database. This is what CTWR updates when blocks are changed, and what RMAN reads to determine which blocks it needs to back up.
EXTENT MAP
The first 2176 blocks (including the first unused block) have a predefined layout. The rest of the BCT file is allocated in extents. On Linux I observed an extent size of 32K, or 64 blocks. It can be changed with the hidden parameter _bct_file_extent_size. The first block of an extent is the extent header and the rest contain the data. View X$KRCEXT exposes the extent map. The extent map only indicates whether an extent is used or not. The X$KRCEXT view is very simple and there are two interesting columns:
BNO - extent header block number
3 Paper 241
USED - flag; 0 = unused and 1 = used
Note that there is no information about the extent type.
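A quick sanity check of the extent map is to count used and unused extents (a sketch, assuming the query is run as SYS, since X$ tables are not visible to normal users):

```sql
SQL> SELECT used, COUNT(*)
  2    FROM x$krcext
  3   GROUP BY used;
```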
DATAFILE DESCRIPTORS
Oracle allocates a whole extent for datafile descriptors. The header block has type 0x2F and the other 63 blocks have type 0x30. Information about datafile descriptors is externalized via the X$KRCFDE fixed table. One 512-byte block contains 4 datafile descriptors. Consequently, one extent with 63 useful blocks fits up to 252 descriptors. Datafile descriptor extents are pre-allocated and formatted based on the db_files init.ora parameter. Below are the most important columns:
CTFBNO - change tracking file block number where the descriptor is located
FNO - absolute file number
CHUNK - chunk size in database blocks (four 8K blocks for a 32K chunk)
CRESCN - datafile creation SCN
CRETIME - datafile creation time
CURR_LOWSCN - start SCN for the current version
CURR_HIGHSCN - end SCN for the current version; for the current version it's set to the maximum possible SCN, 2^48-1
CURR_FIRST - header block number of the first bitmap extent for the current version
CURR_LAST - header block number of the last bitmap extent for the current version
CURR_EXTCNT - number of extents in the current version
CURR_VERCNT - current version number
CURR_VERTIME - current version time; the time when the version started
HIST_FIRST - header block number of the first bitmap extent for the previous version
HIST_LAST - header block number of the last bitmap extent for the previous version
HIST_EXTCNT - number of extents in the previous version
HIST_VERCNT - previous version number
HIST_VERTIME - previous version time
OLDEST_LOWSCN - start SCN for the oldest bitmap version available in the BCT file
Oracle should have called the feature chunk change tracking, because it keeps track of changes for a whole chunk and not for individual blocks. The chunk size seems to be 32K by default but can be changed with the underscore parameter _bct_chunk_size. Consecutive blocks are joined into one chunk, and the chunk is considered dirty if any of its blocks is changed. There are four 8K blocks in one chunk, or eight 4K blocks, or one 32K block, for example.
This approach probably simplifies the implementation, and change tracking file sizing doesn't depend on the datafile block size. 32K seems to have been chosen because this is the smallest chunk that fits the maximum Oracle block size on all platforms. It could be that on platforms with a maximum block size less than 32K, Oracle tracks changes in smaller chunks, i.e. with higher granularity. It would be interesting to try setting the chunk size to 16K and creating a tablespace with a 32K block size, to see what error Oracle would throw.
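Under these assumptions, the current-version fields of a datafile's descriptor can be inspected like this (file number 1 chosen just for illustration; run as SYS):

```sql
SQL> SELECT fno, chunk, curr_vercnt, curr_lowscn, curr_highscn, curr_extcnt
  2    FROM x$krcfde
  3   WHERE fno = 1;
```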
BITMAP EXTENTS
A bitmap extent is associated with a single datafile. One datafile has at least one bitmap extent for each bitmap version stored. The first block is the bitmap extent header. The other 63 blocks contain the bitmap. Each bit corresponds to a datafile chunk. When a chunk has not changed during a particular version, the bit is 0. If any of the blocks in the chunk is changed, the bit is set to 1. So far I have mentioned bitmap versions several times, so this is a good place to describe them.
BITMAP VERSIONS
Oracle tracks which blocks (actually, chunks) are changed between two consecutive incremental backups of a datafile and not only since the last backup. There are separate bitmaps covering each period between incremental backups. This timeframe and associated bitmaps are called versions.
Every backup has a checkpoint SCN associated with it. This is the SCN of the last checkpoint before the backup. Every change tracking bitmap version has a start SCN (or low SCN), which is equal to the checkpoint SCN of the previous backup. Every version except the current one has an end SCN (or high SCN), which is the checkpoint SCN of the next incremental backup. The current version has its end SCN set to the maximum possible SCN value, which is 2^48-1. Versions are associated with a datafile, and each datafile has its own set of versions even though they might be close to each other in terms of low and high SCNs. Every version has dedicated bitmaps and bitmap extents. We can now refine our understanding of bitmaps: bits associated with blocks that were changed between a version's low and high SCNs are set to 1. Otherwise, they are zeroes. The number of versions to keep is 8 by default and can be changed with the hidden parameter _bct_bitmaps_per_file.
BITMAP BLOCK
Every 512-byte block provides 488 bytes for bitmaps, which represent 3904 bits and can, therefore, cover 3904 chunks. Recall that every chunk is 32K. One block can track changes for 122 MB of a datafile (3904 * 32K). The whole extent contains 63 bitmap blocks and covers up to 7686 MB of one datafile. Note that this is only for one version. If a file is resized and doesn't fit into the existing number of extents, additional extents are allocated. Bitmaps are externalized via the X$KRCBIT fixed table. This X$ table contains a row for every bit that is set to 1 and nothing for bits set to 0. This makes sense, as only dirty chunks are of interest for backups. In other words, each row in X$KRCBIT represents a dirty chunk in one of the versions. X$KRCBIT has the following columns:
CTFBNO - block number of the extent header (X$KRCFBH)
FNO - absolute file number of a datafile (X$KRCFBH)
VERCNT - version number (X$KRCFBH)
VERTIME - timestamp when this version was started (X$KRCFBH)
BNO - first block number of a chunk in the datafile
BCT - chunk size in datafile blocks (X$KRCFDE.CHUNK)
All the columns are carried over from the extent header or the datafile descriptor, except the first block number of a chunk, which is inferred from the bit offset.
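The coverage arithmetic from the paragraph above can be verified right in SQL*Plus:

```sql
SQL> SELECT 488 * 8                  bits_per_block,
  2         488 * 8 * 32 / 1024      mb_per_block,
  3         63 * 488 * 8 * 32 / 1024 mb_per_extent
  4    FROM dual;
```

This returns 3904 bits per bitmap block, 122 MB covered per bitmap block, and 7686 MB covered per bitmap extent.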
Note that in this case the maximum number of existing datafiles can be up to:
318 extents / 8 versions = 39 datafiles.
In real life, some datafiles are just a few hundred megabytes and others are sized in tens of gigabytes, so there is no simple formula to fit all cases. Here is the complete statement that can be used to predict the maximum BCT file size based on the current database, when all datafiles are backed up with incremental backups at least 7 times:
SELECT (( (SELECT SUM(ceil(bytes / (7686 * 1024 * 1024))) * 8 bitmap_ext
             FROM v$datafile)
        + (SELECT ceil(VALUE / 252) file_descr_ext
             FROM v$parameter
            WHERE name = 'db_files')
        + 1) * 32 + 1088) / 1024 bct_file_size_mb
  FROM dual;
The statement is valid only for a single instance. In RAC, every instance uses its own bitmap extents, so the number of bitmap extents should be multiplied by the number of RAC instances. Oracle documentation has the following note about BCT file sizing: "For each datafile, a minimum of 320K of space is allocated in the change tracking file, regardless of the size of the file. Thus, if you have a large number of relatively small datafiles, the change tracking file is larger than for databases with a smaller number of larger datafiles containing the same data." Actually, space is not allocated right away; only one bitmap extent is allocated initially. As soon as we start performing incremental backups, new extents for new versions of bitmaps get allocated. I believe that only 8 extents are required for one small datafile, one for each version including the current one. However, I've been informed that 8 versions are kept plus the current version, plus one more extent for some overhead. I haven't been able to identify this overhead, and my tests showed that the 8 versions include the current one. Thus, I believe that 256K is the only required minimum for a datafile, provided it's part of an incremental backup.
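A sketch of the RAC adjustment, simply multiplying the bitmap extent count by the instance count from GV$INSTANCE (an assumption on my part; I have not verified this on a real cluster):

```sql
SELECT (( (SELECT SUM(ceil(bytes / (7686 * 1024 * 1024))) * 8
             FROM v$datafile)
          * (SELECT COUNT(*) FROM gv$instance)
        + (SELECT ceil(VALUE / 252)
             FROM v$parameter
            WHERE name = 'db_files')
        + 1) * 32 + 1088) / 1024 bct_file_size_mb
  FROM dual;
```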
ADDING DATAFILE
CREATE TABLESPACE TBS1 DATAFILE SIZE 128M;
First of all, a new extent is allocated. Extent allocation is not a straightforward procedure. As far as my observations go, there are some extents pre-allocated as a reserve. It seems that column RES_EXTCNT of table X$KRCCDS returns the number of extents in the reserve list, and column RES_FIRST in the same table shows the header block number of the first reserved extent. So Oracle either takes an extent from the reserve or allocates a new one and marks it as used, which we can see in the extent map table X$KRCEXT. The next step is to format a new datafile descriptor (X$KRCFDE). The current version start SCN is set to zero and the end SCN is set to 2^48-1. The version number is set to 1 unless there already was an existing datafile with this absolute file number; it seems that in this case the version is simply set to the next integer number. The bitmap extent header block gets formatted (X$KRCFBH), as well as the bitmap blocks themselves. Since Oracle formats (read: changes) several data blocks in the new datafile, the first chunks are marked as dirty in the bitmap blocks.
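If my reading of X$KRCCDS is right, the reserve list can be watched while datafiles are added (run as SYS; columns as observed in my tests):

```sql
SQL> SELECT res_extcnt, res_first
  2    FROM x$krccds;
```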
DROPPING TABLESPACE
Bitmap blocks are cleared (X$KRCBIT), bitmap extent headers are cleared (X$KRCFBH), as well as the datafile descriptor (X$KRCFDE). Extents are marked as unused in the extent map (X$KRCEXT), but not immediately, only after some time.
CHANGING BLOCKS
As I mentioned already, changes are tracked for 32K chunks of a datafile, so any block change in the chunk will render it dirty, and the bit must be set to 1. If the chunk was already dirty, then nothing will change in the BCT file. Basically, nothing needs to be done if the block's previous SCN is already higher than the datafile's corresponding X$KRCFDE.CURR_LOWSCN.
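One way to observe this is to change a block and then look for the dirty chunk in X$KRCBIT (table T1 and file number 5 are hypothetical; remember that CTWR flushes its buffer asynchronously, so the bit may appear only after a short delay):

```sql
SQL> UPDATE t1 SET n = n + 1 WHERE ROWNUM = 1;
SQL> COMMIT;
SQL> SELECT fno, bno, vercnt
  2    FROM x$krcbit
  3   WHERE fno = 5;
```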
DATAFILE RESTORE
If a single datafile needs to be restored, the BCT file keeps information on versions, and the datafile can be successfully backed up later using the change tracking optimization to read just the dirty chunks. I did a small test:
- Removed the datafile in the OS
- Set the datafile to offline mode
- Restored it from backup using RESTORE DATAFILE <#>;
- Recovered the datafile using RECOVER DATAFILE <#>;
- Brought the datafile online
All version information was preserved.
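The steps above can be sketched as follows (file number 5 is hypothetical; SQL*Plus and RMAN prompts shown for clarity):

```sql
-- after removing the datafile at the OS level
SQL>  ALTER DATABASE DATAFILE 5 OFFLINE;
RMAN> RESTORE DATAFILE 5;
RMAN> RECOVER DATAFILE 5;
SQL>  ALTER DATABASE DATAFILE 5 ONLINE;
```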
8 VERSIONS IMPACT
Let's walk through an example of a bi-weekly incremental backup cycle. A 2 TB data warehouse database containing 5 years' worth of data is backed up every other Sunday with an incremental level 0 backup. The full backup runs 20 hours. For the next 13 days an incremental level 1 cumulative backup is taken. A cumulative level 1 backup means that RMAN will need to copy the blocks changed since the last level 0 backup. The backup runs every morning after the nightly ETL batch completes. The batch changes about 1% of the database (including new data loaded, updated indexes and changes to the staging tables). Half of the changed blocks are in the staging area; the other half is new data loaded and indexes updated. This means that the first incremental level 1 cumulative backup is 0.5% of the database, or 10 GB. Each subsequent level 1 cumulative backup adds 0.25% of the database size to the previous size, so the sizes are 10 GB, 15 GB, 20 GB and so on, ending with 70 GB on the last level 1 backup before the next level 0 backup. Incremental backups take less than an hour, so they finish before users start their day and hit the database with their requests. Let's assume that we enabled change tracking just before the level 0 incremental backup and version number 1 is the current version. The incremental level 0 backup starts, and as soon as each datafile is backed up, the current version becomes 2.
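The daily backup sizes in this scenario follow a simple progression: 0.5% of 2 TB (10 GB) on day one, plus 0.25% of 2 TB (5 GB) for each following day, which can be listed with a quick query:

```sql
SQL> SELECT LEVEL day_n, 10 + (LEVEL - 1) * 5 backup_gb
  2    FROM dual
  3  CONNECT BY LEVEL <= 13;
```

The last row, day 13, gives the 70 GB mentioned above.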
Monday: the incremental backup kicks off and version 3 is the current version. No bitmap version is purged. RMAN is happily using the change tracking file to determine which blocks are needed for the backup; RMAN scans the bitmaps since the last level 0 backup, i.e. version 2 bitmaps.
Tuesday: the incremental backup kicks off and version 4 is the current version. No bitmap version is purged. RMAN again scans the bitmaps since the last level 0 backup. This time it needs bitmaps for versions 2 and 3. Some blocks might be marked dirty in both versions; in fact, those are the blocks in the staging area representing 0.25% of the database size, as we stated above.
Backups for the following days until Sunday work under the same scenario, using the bitmaps since version 2. Sunday's incremental level 1 cumulative backup does the same, but it now purges the oldest bitmap version. The current version is switched to number 9 on Sunday's backup, and version 1 needs to be purged, since Oracle keeps only 8 versions including the current one. This is not a problem, and RMAN can still use versions 2 through 8 to determine which blocks have been changed and must be backed up.
Second Monday: the incremental backup kicks off and version 10 becomes the current version. The bitmaps of version 2 are purged. Now RMAN cannot locate all the versions required to find all the dirty blocks changed since the incremental level 0 backup; it misses bitmap version 2 and cannot identify the blocks changed between the last level 0 and the first level 1 incremental backup. As a result, RMAN has to fall back to the old incremental backup method and scan the whole database. The consequences: a 10-hour incremental backup, IO subsystem performance degradation, and unhappy users because their requests take a few times longer than usual.
Running incremental backups for a while, it's possible to collect a historical ratio between the number of blocks read and the number and size of the backups. This would also account for compression. Note that the query above is just an example and it has the following limitations:
- Chunk size is hard-coded to 32K (could it vary on different platforms?)
- The first block overhead is not accounted for
- No special case when a required bitmap version is not available (purged) and the whole datafile must be read
- No case with backup optimization for level 0 (V$BACKUP_DATAFILE.USED_OPTIMIZATION)
- No case when no data blocks in a datafile are changed (no bitmap version, but the first block must be backed up anyway)
- Only a single datafile
- No accounting for an unavailable base incremental backup
Bitmap updates after direct path writes, done during the CTWR heartbeat, are not instrumented either, and they caused me quite a bit of confusion because I couldn't catch the moment when the bitmaps get updated in this case. Again, strace dotted all the i's and crossed the t's.
LATCHES
The latch "change tracking state change latch" is probably there to protect the shared pool area "change tracking sta". This latch is taken by the CTWR process during enabling and disabling of change tracking in the database, as well as during instance startup when the CTWR process starts. Two other related latches, "change tracking optimization SCN" and "change tracking consistent SCN", probably have something to do with the way CTWR tracks SCNs and, perhaps, an optimization so that CTWR doesn't need to update bits for blocks that are changed over and over again while the same bitmap version is current.
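Activity on these latches can be watched in V$LATCH while change tracking is being enabled and disabled:

```sql
SQL> SELECT name, gets, misses
  2    FROM v$latch
  3   WHERE name LIKE 'change tracking%';
```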
ENQUEUES
There are several new enqueues related to the block change tracking mechanism, but so far I can't clearly describe their roles:
- enq: CT - global space management
- enq: CT - local space management
- enq: CT - change stream ownership
- enq: CT - state
- enq: CT - state change gate 1
- enq: CT - state change gate 2
- enq: CT - CTWR process start/stop
- enq: CT - reading
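Their activity can be observed via V$ENQUEUE_STATISTICS, which in 10g breaks an enqueue type down by request reason:

```sql
SQL> SELECT eq_name, req_reason, total_req#, total_wait#
  2    FROM v$enqueue_statistics
  3   WHERE eq_type = 'CT';
```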
HIDDEN PARAMETERS
_bct_public_dba_buffer_size - total size of all public change tracking dba buffers, in bytes
_bct_initial_private_dba_buffer_size - initial number of entries in the private change tracking dba buffers
_bct_bitmaps_per_file - number of bitmaps to store for each datafile
_bct_file_block_size - block size of change tracking file, in bytes
_bct_file_extent_size - extent size of change tracking file, in bytes
_bct_chunk_size - change tracking datafile chunk size, in bytes
_bct_crash_reserve_size - change tracking reserved crash recovery SGA space, in bytes
_bct_buffer_allocation_size - size of one change tracking buffer allocation, in bytes
_bct_buffer_allocation_max - maximum size of all change tracking buffer allocations, in bytes
_bct_buffer_allocation_min_extents - minimum number of extents to allocate per buffer allocation
_bct_fixtab_file - change tracking file for fixed tables
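The list above can be pulled from a running instance with the usual X$KSPPI/X$KSPPCV join (run as SYS):

```sql
SQL> SELECT p.ksppinm  name,
  2         v.ksppstvl value,
  3         p.ksppdesc description
  4    FROM x$ksppi p, x$ksppcv v
  5   WHERE p.indx = v.indx
  6     AND p.ksppinm LIKE '\_bct%' ESCAPE '\';
```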
POSSIBLE BOTTLENECKS
I didn't observe any noticeable and measurable impact of enabling block change tracking. However, that doesn't mean there is none. Since all IO performed by CTWR is asynchronous with regard to the foreground sessions, and the number of IOs is minimal, there is little chance that this would be a bottleneck for the users' sessions. The only problem would be when the change tracking buffer becomes full and CTWR cannot keep up flushing it to the BCT file. However, this would probably be a side effect of other issues on the system, such as an extreme CPU capacity shortage, which is itself often not a root cause, or very poor IO performance. In both cases, chances are that more sensitive processes of the Oracle instance are impacted first.
REFERENCES
1. Advanced Research Techniques by Tanel Põder
2. Oracle Database Backup and Recovery Basics, Section 4.4
3. MetaLink Notes 262853.1 and 306112.1