HADR
PATRICK ZENG & LIWEN YEOW IBM-SAP INTEGRATION & SUPPORT CENTRE IBM SOFTWARE SOLUTIONS TORONTO LAB
04 August 2005
USE DB2 V8.2 HIGH AVAILABILITY DISASTER RECOVERY (HADR) IN AN SAP IMPLEMENTATION
Table of Contents
1. Executive summary
2. Introduction to high availability and disaster recovery
2.1 High availability
2.2 Disaster recovery
3. Introduction to HADR and client reroute
3.1 High Availability Disaster Recovery (HADR)
3.2 HADR restrictions
3.3 Automatic client reroute
3.4 Automatic client reroute limitations
3.5 Automatic client reroute and HADR
4. Setting up HADR and client reroute in SAP environment
4.1 Install SAP central instance
4.2 Install SAP database instance on the primary database host
4.3 Install SAP database instance on the standby database host
4.4 Set up HADR on the primary and the standby database host
4.5 Install additional SAP application servers
5. Running SAP with HADR and client reroute
5.1 Normal operation
5.2 Switching the roles of the primary and standby database
5.3 Failover when the primary database is physically down
5.4 Reintegrating a database after a failover
5.5 Restrictions and recommendations
6. Performance impact of HADR
6.1 Failover time
6.2 Impact of HADR synchronization mode
6.3 Impact of HADR log receiving buffer size
7. Taking backups from HADR standby image
7.1 Database backup on the standby server
7.2 Restoring to primary server
7.3 Restoring to standby server
8.
IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without any notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date.
SAP and SAP Business Information Warehouse are registered trademarks of SAP AG. Linux is a trademark of Linus Torvalds. IBM, RS/6000, AIX, OS/390, OS/400 and DB2 Universal Database are registered trademarks of IBM Corporation.
1. Executive summary
In today's world, where businesses serve customers around the globe on a 24x7 schedule, customers expect their computing systems to be 100% reliable. DB2 UDB has always been at the forefront of databases in providing such industrial-strength reliability. DB2 UDB V8.2 introduces two new features that give customers further options in this area: High Availability Disaster Recovery (HADR) and automatic client reroute. These features protect customers from production downtime in the event of a local hardware failure or a catastrophic site failure by duplicating the workload of the database to a separate site. They are shipped as part of the standard packaging of DB2 UDB ESE. In this paper, we show how these DB2 UDB features can be used with the world-leading ERP software, SAP R/3 4.7 Enterprise. We walk the reader through the steps necessary to set up DB2 UDB for both HADR and client reroute with SAP R/3 4.7 Enterprise. Detailed procedures and sample output are provided so that readers can follow each step and compare the results with their own.
A partial site failure can be caused by a hardware, network, or software (DB2 or operating system) failure. Without HADR, a partial site failure requires the database management system (DBMS) server or the machine where the database resides to be rebooted. The length of time it takes to restart the database and the machine where it resides is unpredictable. It can take several minutes before the database is brought back to a consistent state and made available. With HADR, the standby database can take over in seconds. Further, you can redirect the clients that were using the original primary database to the standby database (new primary database) by using automatic client reroute, or retry logic in the application. A complete site failure can occur when a disaster, such as a fire, causes the entire site to be destroyed. Because HADR uses TCP/IP for communication between the primary and standby databases, they can be situated in different locations. For example, your primary database might be located at your head office in one city, while your standby database is located at your sales office in another city. If a disaster occurs at the primary site, data availability is maintained by having the remote standby database take over as the
primary database with full DB2 functionality. After a takeover operation occurs, you can bring the original primary database back up and return it to its primary database status; this is known as failback. After the failed original primary server is repaired, it can rejoin the HADR pair as a standby database, provided the two copies of the database can be made consistent. After the original primary database is reintegrated into the HADR pair as the standby database, you can switch the roles of the databases so that the original primary database once again becomes the primary database. With HADR, you can choose the level of protection you want from potential loss of data by specifying one of three synchronization modes: synchronous, near synchronous, or asynchronous.
HADR is supported on DB2 UDB Enterprise Server Edition (ESE) as a no-charge option and on DB2 Express and Workgroup Editions as a separately charged option. However, it is not supported when you have multiple database partitions on ESE. The primary and standby databases must have the same operating system version and the same version of DB2 UDB, except for a short time during a rolling upgrade. The DB2 UDB release on the primary and standby databases must be the same bit size (32 or 64 bit). Reads on the standby database are not supported. Clients cannot connect to the standby database. Log archiving can only be performed by the current primary database. Normal backup operations are not supported on the standby database. Non-logged operations, such as changes to database configuration parameters and to the recovery history file, are not replicated to the standby database. Load operations with the COPY NO option specified are not supported. Use of Data Links is not supported.
o The DB2 UDB server installed on the alternate host must be the same version as (but may have a higher FixPak than) the DB2 UDB server installed on the original host.
o Regardless of whether you have authority to update the database directory on the client machine, the alternate server information is always kept in memory. In other words, if you do not have authority to update the database directory (or it is a read-only database directory), other applications cannot determine and use the alternate server, because the memory is not shared among applications.
o The same authentication is applied to all alternate locations. This means that the client cannot re-establish the database connection if the alternate location has a different authentication type than the original location.
o When there is a communication failure, all session resources, such as global temporary tables, identity, sequences, cursors, server options (SET SERVER OPTION) for federated processing, and special registers, are lost. The application is responsible for re-establishing the session resources in order to continue processing the work. You do not have to run any of the special register statements after the connection is re-established because DB2 UDB will replay the special
register statements that were issued before the communication error. However, some of the special registers will not be replayed; they are:
o SET ENCRYPTPW
o SET EVENT MONITOR STATE
o SET SESSION AUTHORIZATION
o SET TRANSFORM GROUP
Note: If the client is using the CLI, JCC Type 2, or JCC Type 4 driver, SQL statements that were prepared against the original server are implicitly re-prepared against the new server after the connection is re-established. However, embedded SQL routines (for example, SQC or SQX applications) are not re-prepared.
An alternate way to do automatic client reroute is to specify a second IP address (an alternate server location) in the DNS entry for the server. The client would not know about an alternate server, but at connect time, DB2 UDB would alternate between the IP addresses for the DNS entry.
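The limitation that session resources are lost after a reroute means the application needs some retry logic of its own. The following is a minimal Python sketch of that idea, not a description of any DB2 driver API: `ConnectionRerouted`, `run_with_reroute_retry`, `unit_of_work`, and `restore_session` are hypothetical names, and how a real driver signals a reroute is driver-specific and not modeled here.

```python
class ConnectionRerouted(Exception):
    """Hypothetical stand-in for the driver error raised after a successful reroute."""


def run_with_reroute_retry(unit_of_work, restore_session, max_attempts=3):
    """Run unit_of_work, rebuilding session state and retrying after a reroute."""
    for attempt in range(1, max_attempts + 1):
        try:
            return unit_of_work()
        except ConnectionRerouted:
            if attempt == max_attempts:
                raise  # give up and let the caller handle the failure
            # The connection now points at the alternate server, but session
            # resources (temporary tables, cursors, and so on) are gone.
            restore_session()
```

The caller passes the whole unit of work as a function so that it can be re-run from the start once the session state has been rebuilt.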
In our test, we installed the SAP R/3 Enterprise 4.7 Ext 200 Central Instance on Host C (lunen). Since the database instance resides on a remote host, you need to provide the database host name (phillipe) and the communication port number (54700) during the installation.
b.
Please enter the name (system ID) of the CENTRAL (!) monitoring mySAP system.
R/3 system ID : SVT
additional CENTRAL system y/[n] ? : n
INFO: creating ini file /usr/sap/tmp/sapccmsr/sapccmsr.ini.
INFO: Checking Distributed Statistical Records Library dsrlib.so
INFO: Distributed Statistical Records not configured, dsrlib.so not found.
INFO: CCMS version 20040229, 32 bit, multithreaded, Non-Unicode
      compiled at Aug 29 2004
      systemid 38(Intel x86 with Linux)
      relno 6200
      patch text patch collection 2004/4, OSS note 694057
      patchno 1622
      intno 20020600
      running on phillipe Linux 2.4.19-64GB-SMP #1 SMP Mon Oct 21 18:48:05 UTC 2002 i686
      pid 1267
INFO: Created Shared Memory Key 1008 (size 20000000)
INFO: Connected to Monitoring Segment [CCMS Monitoring Segment for phillipe, created with version CCMS version 20040229, 32 bit multithreaded, compiled at Aug 29 2004, kernel 6200_20020600_1622, platform 38(Intel x86 with Linux)]
      segment status WARM_UP
      segment started at Fri Sep 24 10:39:15 2004
      segment version 20040229
****************************************************
********************** SVT **********************
****************************************************
Please enter the logon info for an admin user of the central monitoring mySAP system [SVT].
The user should have system administrator privileges
client [000] :
user : ddic
language [EN] :
hostname of SVT message server : lunen
use Load Balancing n/[y] ? : n
group [PUBLIC] :
[optional] route string :
trace level [0] :
please enter password for [SVT:000:ddic]:
Try to connect ...
INFO: [SVT:000:DDIC] connected to SVT, host lunen, System Nr. 00, traceflag [ ]
INFO: SVT release is 620 , (kernel release 620 )
This program will act as registered RFC server lateron.
Please enter the info for a gateway of monitoring system SVT
gateway info: host: [lunen] service: [sapgw00] n/[y] ? : y
Gateway info ok
**** CCMS agent sapccmsr: RFC client functionality ****
This CCMS agent program sapccmsr is able to actively report alert data into the monitoring mySAP.com system [SVT]. To enable this feature, you have to setup the user CSMREG in [SVT]. (refer to SAP Online-help, search for 'CSMREG'). Alternatively use any user in [SVT] that has at least authorization to call per RFC function groups SALC, SALF, SALH, SALS, SAL_CACHE_RECEIVE, SCSMBK_DATA_OUT, SCSMBK_RECONCILE, SCSM_CEN_TOOL_MAIN, SYST, RFC1 ).
After entering the RFC logon info for the user, the password will be stored here on this machine in a Secure Storage.
client [000] :
user [CSMREG] : ddic
language [EN] :
hostname of SVT message server [lunen] :
use Load Balancing n/[y] ? : n
hostname of application server [lunen] :
system number (00 - 98) [00] :
[optional] route string :
trace level [0] :
please enter password for [SVT:000:ddic]:
Try to connect ...
INFO: [SVT:000:DDIC] connected to SVT, host lunen, System Nr. 00, traceflag [ ]
INFO: SVT release is 620 , (kernel release 620 ), CCMS version 20011212
INFO: RFC logon info for [SVT:000:ddic] can be updated at any time with -R option: sapccmsr -R <params>
INFO: Updated saprfc.ini in agent work directory /usr/sap/tmp/sapccmsr
INFO: Connected to SVT, CCMS version in ABAP: 20011212
INFO: successfully registered at SVT
INFO: Updated config file /usr/sap/tmp/sapccmsr/csmconf.
Start agent? n/[y] : y
INFO: Checking shared memory status of sapccmsr
INFO: CCMS agent sapccmsr working directory is /usr/sap/tmp/sapccmsr
INFO: CCMS agent sapccmsr config file is /usr/sap/tmp/sapccmsr/csmconf
INFO: Central Monitoring System is [SVT]. (found in config file)
INFO: Checking shared memory status of sapccmsr
INFO: CCMS version 20040229, 32 bit, multithreaded, Non-Unicode
      compiled at Aug 29 2004
      systemid 38(Intel x86 with Linux)
      relno 6200
      patch text patch collection 2004/4, OSS note 694057
      patchno 1622
      intno 20020600
      running on phillipe Linux 2.4.19-64GB-SMP #1 SMP Mon Oct 21 18:48:05 UTC 2002 i686
      pid 1267
4.4 Set up HADR on the primary and the standby database host
Steps: 1) Enable log archiving, configure other parameters on the primary database, and make an offline backup. For example: Listing 2. Update, configure, and back up DB on primary
DB2 UPDATE DB CFG FOR svt USING indexrec ACCESS
DB2 UPDATE DB CFG FOR svt USING logindexbuild ON
DB2 UPDATE DB CFG FOR svt USING logarchmeth1 "DISK:/db2/SVT/log_archive"
DB2 DEACTIVATE DB svt
DB2 BACKUP DATABASE svt TO "/db2/backup"
2) Move the backup image to the standby database host, and restore the database to the rollforward pending state. For example: Listing 3. Restore database on standby
DB2 RESTORE DB svt FROM /db2/backup REPLACE HISTORY FILE
SQL2539W Warning! Restoring to an existing database that is the same as the backup image database. The database files will be deleted.
Do you want to continue ? (y/n) y
3) Create the HADR local and remote service names in the /etc/services file on both the primary and standby database servers. For example: Listing 4. Update to /etc/services file
SVT_HADR_1 54711/tcp
SVT_HADR_2 54712/tcp
4) Set up HADR and the client reroute information for the primary database. For example: Listing 5. Update primary server configuration for HADR and Client Reroute
-- Configure databases for client reroute - Phillipe - DB2SVT - SVT
UPDATE ALTERNATE SERVER FOR DATABASE SVT USING HOSTNAME dartagnan PORT 54700;
-- Update HADR configuration parameters on primary database - Phillipe - DB2SVT - SVT
UPDATE DB CFG FOR SVT USING HADR_LOCAL_HOST phillipe;
UPDATE DB CFG FOR SVT USING HADR_LOCAL_SVC SVT_HADR_1;
UPDATE DB CFG FOR SVT USING HADR_REMOTE_HOST dartagnan;
UPDATE DB CFG FOR SVT USING HADR_REMOTE_SVC SVT_HADR_2;
UPDATE DB CFG FOR SVT USING HADR_REMOTE_INST DB2SVT;
UPDATE DB CFG FOR SVT USING HADR_SYNCMODE SYNC;
UPDATE DB CFG FOR SVT USING HADR_TIMEOUT 120;
5) Set up HADR and the client reroute information for the standby database. For example: Listing 6. Update standby server configuration for HADR and Client Reroute
-- Configure databases for client reroute - Dartagnan - DB2SVT - SVT
UPDATE ALTERNATE SERVER FOR DATABASE SVT USING HOSTNAME phillipe PORT 54700;
-- Update HADR configuration parameters on standby database - Dartagnan - DB2SVT - SVT
UPDATE DB CFG FOR SVT USING HADR_LOCAL_HOST dartagnan;
UPDATE DB CFG FOR SVT USING HADR_LOCAL_SVC SVT_HADR_2;
UPDATE DB CFG FOR SVT USING HADR_REMOTE_HOST phillipe;
UPDATE DB CFG FOR SVT USING HADR_REMOTE_SVC SVT_HADR_1;
UPDATE DB CFG FOR SVT USING HADR_REMOTE_INST DB2SVT;
UPDATE DB CFG FOR SVT USING HADR_SYNCMODE SYNC;
UPDATE DB CFG FOR SVT USING HADR_TIMEOUT 120;
6) Start up HADR on the standby database first. For example: Listing 7. Start HADR on standby server first
-- Start HADR on standby database - Dartagnan - DB2SVT - SVT
DEACTIVATE DATABASE SVT;
START HADR ON DATABASE SVT AS STANDBY
7) Start up HADR on the primary database. For example: Listing 8. Start HADR on primary server next
-- Start HADR on primary database - Phillipe - DB2SVT - SVT
DEACTIVATE DATABASE SVT;
START HADR ON DATABASE SVT AS PRIMARY
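The primary and standby configurations in Listings 5 and 6 mirror each other: each side names itself as the local host and its partner as the remote host and alternate server. The following Python sketch generates both statement sets from one description, which may help avoid swapped host or service names. The function name `hadr_config_statements` and its parameters are hypothetical helpers for illustration; only the generated DB2 statements come from the listings above.

```python
def hadr_config_statements(db, local_host, local_svc, remote_host, remote_svc,
                           remote_inst, syncmode="SYNC", timeout=120,
                           client_port=54700):
    """Return the client reroute and HADR statements for one side of the pair."""
    settings = [
        ("HADR_LOCAL_HOST", local_host),
        ("HADR_LOCAL_SVC", local_svc),
        ("HADR_REMOTE_HOST", remote_host),
        ("HADR_REMOTE_SVC", remote_svc),
        ("HADR_REMOTE_INST", remote_inst),
        ("HADR_SYNCMODE", syncmode),
        ("HADR_TIMEOUT", timeout),
    ]
    # The alternate server for client reroute is always the partner host.
    statements = [
        f"UPDATE ALTERNATE SERVER FOR DATABASE {db} "
        f"USING HOSTNAME {remote_host} PORT {client_port};"
    ]
    statements += [f"UPDATE DB CFG FOR {db} USING {name} {value};"
                   for name, value in settings]
    return statements


# Values taken from the test setup in this paper.
primary = hadr_config_statements("SVT", "phillipe", "SVT_HADR_1",
                                 "dartagnan", "SVT_HADR_2", "DB2SVT")
standby = hadr_config_statements("SVT", "dartagnan", "SVT_HADR_2",
                                 "phillipe", "SVT_HADR_1", "DB2SVT")
```

Each list can be written to a script file and run on the matching host in the same way as the listings above.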
After these steps, check the db2diag.log on both the primary and the standby database instances to see whether the HADR is set up correctly. You should be able to see the following entries in the db2diag.log on the primary database:
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to None (was None)

2004-10-01-16.24.05.601883-240 E22208G30 LEVEL: Event
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-Boot (was None)

2004-10-01-16.24.05.614868-240 I22516G323 LEVEL: Warning
PID : 19960 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-68
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:300
MESSAGE : Starting Replay Master on standby.

2004-10-01-16.24.05.615199-240 E22840G31 LEVEL: Event
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-LocalCatchup (was S-Boot)

2004-10-01-16.24.05.631105-240 I23158G39 LEVEL: Severe
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
MESSAGE : Failed to connect to primary. rc:
DATA #1 : Hexdump, 4 bytes
0xBFFFAE3C : 1900 0F81 ....

2004-10-01-16.24.05.641944-240 I23556G341 LEVEL: Severe
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
RETCODE : ZRC=0x810F0019=-2129723367=SQLO_CONN_REFUSED "Connection refused"

2004-10-01-16.24.05.661145-240 E23898G339 LEVEL: Warning
PID : 19960 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-68
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:920
MESSAGE : ADM1602W Rollforward recovery has been initiated.

2004-10-01-16.24.05.661511-240 E24238G382 LEVEL: Warning
PID : 19960 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-68
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:1740
MESSAGE : ADM1603I DB2 is invoking the forward phase of the database rollforward recovery.

2004-10-01-16.24.05.661743-240 I24621G413 LEVEL: Warning
PID : 19960 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-68
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:720
DATA #1 : String, 103 bytes
Invoking database rollforward forward recovery, lowtranlsn 00000006F84C000C minbufflsn 00000006F84C000C

2004-10-01-16.24.05.675999-240 I25035G353 LEVEL: Warning
PID : 19960 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-68
FUNCTION: DB2 UDB, recovery manager, sqlprecm, probe:2000
Using parallel recovery with 5 agents 2QSets 108 queues and 64 chunks

2004-10-01-16.24.05.766940-240 I25389G373 LEVEL: Error
PID : 24240 TID : 1024 PROC : db2logmgr (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, data protection, sqlpgRetrieveLogDisk, probe:3500
RETCODE : ZRC=0x860F000A=-2045837302=SQLO_FNEX "File not found."
DIA8411C A file "S0000239.LOG" could not be found.

2004-10-01-16.24.05.779207-240 I25763G294 LEVEL: Warning
PID : 24258 TID : 1024 PROC : db2shred (SVT)
INSTANCE: db2svt NODE : 000
APPHDL : 0-68
FUNCTION: DB2 UDB, recovery manager, sqlpshrEdu, probe:18300
MESSAGE : Maxing hdrLCUEndLsnRequested

2004-10-01-16.24.05.839260-240 E26058G333 LEVEL: Event
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchupPending (was S-LocalCatchup)

2004-10-01-16.24.18.040052-240 E26392G341 LEVEL: Event
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchupPending (was S-RemoteCatchupPending)

2004-10-01-16.24.18.059286-240 E26734G334 LEVEL: Event
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchup (was S-RemoteCatchupPending)

2004-10-01-16.24.18.059445-240 I27069G30 LEVEL: Warning
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSPrepareLogWrite, probe:10260
MESSAGE : RCUStartLsn 00000006F84C000C

2004-10-01-16.24.23.803640-240 E27672G324 LEVEL: Event
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-NearlyPeer (was S-RemoteCatchup)

2004-10-01-16.24.23.886153-240 E27997G315 LEVEL: Event
PID : 2425 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-Peer (was S-NearlyPeer)
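Reading the state changes out of a long db2diag.log by eye is error-prone. The following Python sketch extracts the "HADR state set to ... (was ...)" transitions shown in the listing above and checks whether the last one reached a peer state. The function names are hypothetical, and the sketch assumes only the CHANGE message wording seen in this listing; other DB2 levels may word it differently.

```python
import re

# Matches the CHANGE lines written by hdrSetHdrState in the listing above.
STATE_CHANGE = re.compile(r"HADR state set to (\S+) \(was (\S+)\)")


def hadr_state_transitions(log_text):
    """Return (old_state, new_state) pairs in the order they appear in the log."""
    return [(m.group(2), m.group(1)) for m in STATE_CHANGE.finditer(log_text)]


def reached_peer(log_text):
    """True if the most recent state change ended in S-Peer or P-Peer."""
    transitions = hadr_state_transitions(log_text)
    return bool(transitions) and transitions[-1][1] in ("S-Peer", "P-Peer")
```

For the standby log above, the last pair is ("S-NearlyPeer", "S-Peer"), so `reached_peer` returns True.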
You can also monitor the HADR status from the database snapshots. Listing 11. Monitor HADR using snapshot for database on primary server
HADR Status
Role = Primary
State = Peer
Synchronization mode = Sync
Connection status = Connected, 10/01/2004 16:24:18.039167
Heartbeats missed = 0
Local host = phillipe
Local service = SVT_HADR_1
Remote host = dartagnan
Remote service = SVT_HADR_2
Remote instance = DB2SVT
timeout(seconds) = 120
Primary log position(file, page, LSN) = S0000239.LOG, 8345, 00000006FA5593C3
Standby log position(file, page, LSN) = S0000239.LOG, 8342, 00000006FA556E4A
Log gap running average(bytes) = 11643
8) Obtain the alternate server information on the SAP Central Instance. With the client reroute information updated on both the primary and standby database server configurations and HADR started, the client reroute information (alternate server information) will be used to populate the database directory cache on the client machine upon establishing a connection to the Primary database. In this case, the SAP Central Instance host is the client machine. This is easily done by just establishing a connection to the primary database. Listing 13. Populate the DB directory cache on the client machine with alternate server info
lunen:svtadm 289> db2 list node directory

Node Directory

Number of entries in the directory = 1

Node 1 entry:
Node name = NODESVT
Comment = TCPIP Node for database SVT
Directory entry type = LOCAL
Protocol = TCPIP
Hostname = phillipe
Service name = sapdb2SVT
Number of entries in the directory = 1

Database 1 entry:
Database alias = SVT
Database name = SVT
Node name = NODESVT
Database release level = a.00
Comment =
Directory entry type = Remote
Catalog database partition number = -1
Alternate server hostname =
Alternate server port number =
lunen:svtadm 294> db2 connect to svt user sapsvt
Enter current password for sapsvt:

Database Connection Information

Database server = DB2/LINUX 8.2.0
SQL authorization ID = SAPSVT
Local database alias = SVT
lunen:svtadm 295> db2 list db directory

System Database Directory

Number of entries in the directory = 1

Database 1 entry:
Database alias = SVT
Database name = SVT
Node name = NODESVT
Database release level = a.00
Comment =
Directory entry type = Remote
Catalog database partition number = -1
Alternate server hostname = dartagnan
Alternate server port number = 54700
lunen:svtadm 296>
Note that the alternate server hostname and port number in the database directory were populated by the first connection made after the HADR and client reroute information was updated on the database server.
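The check described in this step can be scripted. The following Python sketch parses the text printed by `db2 list db directory` and reports the alternate server, or None if the fields are still empty. It assumes the output has one "label = value" pair per line and a single database entry, as on our client; the function names are hypothetical.

```python
def parse_directory_entry(text):
    """Collect 'label = value' pairs from db directory output into a dict."""
    entry = {}
    for line in text.splitlines():
        label, sep, value = line.partition("=")
        if sep:  # skip headings and prompts that contain no '='
            entry[label.strip()] = value.strip()
    return entry


def alternate_server(entry):
    """Return (hostname, port) if the alternate server is populated, else None."""
    host = entry.get("Alternate server hostname", "")
    port = entry.get("Alternate server port number", "")
    return (host, int(port)) if host and port else None
```

Before the first connection the function returns None; after it, the result for the directory shown above is ("dartagnan", 54700).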
And its database configuration shows it is running in HADR PRIMARY mode (only the relevant configuration parameters are shown below): Listing 15. Database configuration showing the role of the primary HADR server
phillipe:db2svt 724> db2 get db cfg for svt

Database Configuration for Database svt

Backup pending = NO
Database is consistent = NO
Rollforward pending = NO
Restore pending = NO
Log retain for recovery status = NO
User exit for logging status = YES
First active log file = S0000277.LOG
HADR database role = PRIMARY
HADR local host name (HADR_LOCAL_HOST) = phillipe
HADR local service name (HADR_LOCAL_SVC) = SVT_HADR_1
HADR remote host name (HADR_REMOTE_HOST) = dartagnan
HADR remote service name (HADR_REMOTE_SVC) = SVT_HADR_2
HADR instance name of remote server (HADR_REMOTE_INST) = DB2SVT
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC
First log archive method (LOGARCHMETH1) = DISK:/db2/SVT/log_archive/
Index re-creation time and redo index build (INDEXREC) = ACCESS
Log pages during index build (LOGINDEXBUILD) = ON
Its snapshot shows (only interesting snapshot data is shown below): Listing 16. Database snapshot showing HADR status on primary HADR server
Database Snapshot

Database name = SVT
Database path = /db3/db2/SVT/db2svt/NODE0000/SQL00001/
Input database alias = SVT
Database status = Active
Log to be redone for recovery (Bytes) = 2362
Log accounted for by dirty pages (Bytes) = 2362
File number of first active log = 277
File number of last active log = 296
File number of current active log = 277
File number of log being archived = Not applicable

HADR Status
Role = Primary
State = Peer
Synchronization mode = Sync
Connection status = Connected, 10/15/2004 09:09:34.377400
Heartbeats missed = 0
Local host = phillipe
Local service = SVT_HADR_1
Remote host = dartagnan
Remote service = SVT_HADR_2
Remote instance = DB2SVT
timeout(seconds) = 120
Primary log position(file, page, LSN) = S0000277.LOG, 0, 0000000790428945
Standby log position(file, page, LSN) = S0000277.LOG, 0, 00000007904286A1
Log gap running average(bytes) = 220
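The HADR Status block of the snapshot lends itself to a simple automated health check. The following Python sketch reads the status fields and computes how far the standby trails the primary at the moment of the snapshot. It assumes one "label = value" pair per line, and it treats the LSNs as hexadecimal byte offsets into the log stream, which is our reading of these values rather than something the snapshot states. The function names are hypothetical.

```python
def parse_hadr_status(snapshot_text):
    """Collect 'label = value' pairs from the snapshot text into a dict."""
    status = {}
    for line in snapshot_text.splitlines():
        label, sep, value = line.partition("=")
        if sep:
            status[label.strip()] = value.strip()
    return status


def lsn_gap(status):
    """Primary LSN minus standby LSN, with both read as hexadecimal numbers."""
    def lsn(field):
        # Field value looks like "S0000277.LOG, 0, 0000000790428945".
        return int(status[field].split(",")[-1].strip(), 16)
    return (lsn("Primary log position(file, page, LSN)")
            - lsn("Standby log position(file, page, LSN)"))


def is_healthy(status):
    """Peer state, a live connection, and no missed heartbeats."""
    return (status.get("State") == "Peer"
            and status.get("Connection status", "").startswith("Connected")
            and status.get("Heartbeats missed") == "0")
```

For the snapshot above, `lsn_gap` gives 0x8945 - 0x86A1 = 676, an instantaneous value that differs from the running average the snapshot reports.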
And its database configuration shows it is in rollforward pending state and running in HADR STANDBY mode (only interesting configuration parameters are shown below):
Listing 18. Database configuration showing state of database and HADR on standby server
dartagnan:db2svt 318> db2 get db cfg for svt

Database Configuration for Database svt

Backup pending = NO
Database is consistent = NO
Rollforward pending = DATABASE
Restore pending = YES
Log retain for recovery status = NO
User exit for logging status = YES
First active log file = S0000274.LOG
HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = dartagnan
HADR local service name (HADR_LOCAL_SVC) = SVT_HADR_2
HADR remote host name (HADR_REMOTE_HOST) = phillipe
HADR remote service name (HADR_REMOTE_SVC) = SVT_HADR_1
HADR instance name of remote server (HADR_REMOTE_INST) = DB2SVT
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC
First log archive method (LOGARCHMETH1) = DISK:/db2/SVT/log_archive/
Index re-creation time and redo index build (INDEXREC) = ACCESS
Log pages during index build (LOGINDEXBUILD) = ON
Its snapshot shows (only interesting snapshot data is shown below): Listing 19. Database snapshot showing HADR status on standby server
Database Snapshot

Database name = SVT
Database path = /db3/db2/SVT/db2svt/NODE0000/SQL00001/
Input database alias = SVT
Database status = Rollforward
File number of first active log = 277
File number of last active log = 296
File number of current active log = 277
File number of log being archived = Not applicable
Rollforward type = Database
Rollforward last committed timestamp = 10/15/2004 07:18:25
Rollforward log file being processed = 274
Rollforward status = Redo

Heartbeats missed = 0
Local host = dartagnan
Local service = SVT_HADR_2
Remote host = phillipe
Remote service = SVT_HADR_1
Remote instance = DB2SVT
timeout(seconds) = 120
Primary log position(file, page, LSN) = S0000277.LOG, 237, 0000000790515FFB
Standby log position(file, page, LSN) = S0000277.LOG, 237, 0000000790515FBD
Log gap running average(bytes) = 17450
lunen:svtadm 263> db2 list db directory

System Database Directory

Number of entries in the directory = 1

Database 1 entry:
Database alias = SVT
Database name = SVT
Node name = NODESVT
Database release level = a.00
Comment =
Directory entry type = Remote
Catalog database partition number = -1
Alternate server hostname = dartagnan
Alternate server port number = 54700
Figure 6. SAP GUI showing primary HADR server status using db6cockpit
You can also monitor the remote database host's operating system activity using OS07: Figure 7. Monitoring remote databases using OS07
In our test, we set up both CCMS agent and rfcoscol on the primary database host (phillipe) and only rfcoscol on the standby database host (dartagnan). So from transaction OS07, you can choose either SAPCCMSR.PHILLIPE.99 or SAPOSCOL_PHILLIPE to monitor OS activity on phillipe and SAPOSCOL_DARTAGNAN on dartagnan. Figure 8. Monitoring remote systems using CCMS or rfcoscol
On the primary database host (phillipe), you should see the following SAP agents running: Listing 21. SAP monitoring agents on primary server
Phillipe:db2svt 123> ps -ef | grep -i sap
root 7626 1 0 Sep24 ? 03:10:03 saposcol -l
svtadm 8209 1 0 Sep24 ? 00:01:04 sapccmsr -DCCMS
svtadm 8211 8209 0 Sep24 ? 00:00:12 sapccmsr -DCCMS
svtadm 8212 8211 0 Sep24 ? 00:13:13 sapccmsr -DCCMS
svtadm 8213 8211 0 Sep24 ? 00:00:07 sapccmsr -DCCMS
svtadm 8214 8211 0 Sep24 ? 00:02:43 sapccmsr -DCCMS
svtadm 5599 5598 0 11:49 ? 00:00:00 csh -c rfcoscol lunen sapgw00 55323439 GWHOST=lunen GWSERV=sapgw00 CONVID=55323439 pf=/usr/sap/SVT/SYS/profile/SVT_DVEBMGS00_lunen CPIC_TRACE=0 IDX=1 SNC_MODE=0
root 5647 5599 0 11:49 ? 00:00:00 rfcoscol lunen sapgw00 55323439 GWHOST=lunen GWSERV=sapgwSE CONVID=55323439 pf=/usr/sap/SVT/SYS/profile/SVT_DVEBMGS00_lunen CPIC_TRACE=0 IDX=1 SNC_MODE=0
On the standby database host dartagnan, you should see the following SAP agents running:
Listing 22. SAP monitoring agents on standby server
dartagnan:svtadm 37> ps -ef | grep sap
root   5036    1 0 11:49 ? 00:00:00 saposcol -l
svtadm 5057 5056 0 11:50 ? 00:00:00 csh -c rfcoscol lunen sapgw00 55407072 GWHOST=lunen GWSERV=sapgw00 CONVID=55407072 pf=/usr/sap/SVT/SYS/profile/SVT_DVEBMGS00_lunen CPIC_TRACE=0 IDX=1 SNC_MODE=0
root   5086 5057 0 11:50 ? 00:00:00 rfcoscol lunen sapgw00 55407072 GWHOST=lunen GWSERV=sapgwSE CONVID=55407072 pf=/usr/sap/SVT/SYS/profile/SVT_DVEBMGS00_lunen CPIC_TRACE=0 IDX=1 SNC_MODE=0
You should see the following messages in the db2diag.log. The database first completes the rollforward phase, then stops the Replay Master, and finally switches to the primary role.
Listing 24. db2diag.log from standby server showing phases of becoming the primary HADR server
2004-10-29-11.10.26.157832-240 I9061G410 LEVEL: Warning
PID : 12267 TID : 1024 PROC : db2redom (SVT)
INSTANCE: db2svt NODE : 000
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpPRecReadLog, probe:4630
MESSAGE : Last log for rollforward is incomplete, log number:
DATA #1 : Hexdump, 4 bytes
0x4D37CF50 : 3701 0000 7...

2004-10-29-11.10.26.368167-240 I9472G317 LEVEL: Warning
PID : 11891 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:1990
MESSAGE : nextLsn 0000000814505B74

2004-10-29-11.10.26.376534-240 E9790G379 LEVEL: Warning
PID : 11891 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:3600
MESSAGE : ADM1605I DB2 is invoking the backward phase of database rollforward recovery.

2004-10-29-11.10.26.376827-240 I10170G378 LEVEL: Warning
PID : 11891 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:2210
MESSAGE : Invoking database rollforward backward recovery, nextLsn: 0000000814505B74

2004-10-29-11.10.26.870813-240 I10549G409 LEVEL: Error
PID : 11891 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:670
MESSAGE : dbcb->logfhdr.firstDeleteFile:
DATA #1 : Hexdump, 4 bytes
0x30010264 : FFFF FFFF ....

2004-10-29-11.10.26.985579-240 E10959G350 LEVEL: Warning
PID : 11891 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:6600
MESSAGE : ADM1611W The rollforward recovery phase has been completed.

2004-10-29-11.10.26.986105-240 I11310G324 LEVEL: Warning
PID : 11891 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:9500
MESSAGE : Stopping Replay Master on standby.

2004-10-29-11.10.34.093102-240 E11635G309 LEVEL: Event
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-Peer (was S-Peer)
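When the db2diag.log is long, the HADR state changes are usually the quickest way to follow a takeover. The sketch below pulls them out of the log text with a regular expression. It is only an illustration: the function name hadr_state_changes is ours, and the pattern assumes the "HADR state set to X (was Y)" wording shown in the listing above.

```python
import re
from typing import List, Tuple

# Matches the CHANGE lines written by hdrSetHdrState, as shown in the listing.
STATE_CHANGE = re.compile(r"HADR state set to (\S+) \(was (\S+)\)")


def hadr_state_changes(diag_text: str) -> List[Tuple[str, str]]:
    """Return (old_state, new_state) pairs in the order they appear in the log."""
    return [(old, new) for new, old in STATE_CHANGE.findall(diag_text)]


# One CHANGE line copied from the listing above.
sample = "CHANGE : HADR state set to P-Peer (was S-Peer)\n"
changes = hadr_state_changes(sample)
print(changes)
```

In practice you would pass the contents of the db2diag.log file instead of the one-line sample.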
After the standby database becomes the new primary database, the SAP application server will reroute the connections from the old primary database to this new primary database:
Listing 25. Client Reroute connections established on "New" HADR server
dartagnan:db2svt 338> db2 list applications

Auth Id  Application    Appl.      Application Id                 DB       # of
         Name           Handle                                    Name     Agents
-------- -------------- ---------- ------------------------------ -------- -----
SAPSVT   dw.sapSVT_DVEB 617        G91A62B0.HDDC.059015212215     SVT      1
DB2SVT   db2recindex    616                                       SVT      1
SAPSVT   dw.sapSVT_DVEB 614        G91A62B0.HADC.059025214302     SVT      1
SAPSVT   dw.sapSVT_DVEB 615        G91A62B0.HBDC.059005221528     SVT      1
Its database snapshot shows that it has now assumed the HADR PRIMARY role (only the relevant information is shown below):
Listing 26. Database snapshot showing new HADR primary role
Database Snapshot
Database name = SVT
Database path = /db3/db2/SVT/db2svt/NODE0000/SQL00001/
Input database alias = SVT
Database status = Active
File number of first active log = 310
File number of last active log = 329
File number of current active log = 310
File number of log being archived = Not applicable

HADR Status
Role = Primary
State = Peer
Synchronization mode = Sync
Connection status = Connected, 10/29/2004 10:40:15.031467
Heartbeats missed = 0
Local host = dartagnan
Local service = SVT_HADR_2
Remote host = phillipe
Remote service = SVT_HADR_1
Remote instance = DB2SVT
timeout(seconds) = 120
Primary log position(file, page, LSN) = S0000310.LOG, 408, 000000081453C3C9
Standby log position(file, page, LSN) = S0000310.LOG, 408, 000000081453C3C9
Log gap running average(bytes) = 1000
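If you want to check the HADR role and state from a script rather than by eye, the "name = value" lines of the snapshot are easy to parse. The following sketch assumes one field per line, as DB2 prints it. The function name parse_snapshot and the health check are our own illustration and not an SAP or DB2 tool.

```python
def parse_snapshot(text: str) -> dict:
    """Split 'name = value' lines of snapshot output into a dictionary."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip headings and blank lines that carry no '='
            fields[name.strip()] = value.strip()
    return fields


# A few fields copied from the HADR status output above.
sample = """\
HADR Status
Role = Primary
State = Peer
Synchronization mode = Sync
Heartbeats missed = 0
"""

status = parse_snapshot(sample)
is_healthy_primary = (
    status["Role"] == "Primary"
    and status["State"] == "Peer"
    and status["Heartbeats missed"] == "0"
)
print(is_healthy_primary)
```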
The db2diag.log on the old primary server shows that the database first switches its role from primary to standby, then starts the log Replay Master, and finally starts the rollforward process and enters rollforward pending state.
Listing 28. db2diag.log from former primary server showing phases of role switch to standby
2004-10-29-11.10.24.350207-240 I6904G355 LEVEL: Warning
PID : 29765 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-304
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSwitchDbFromRuntimeToStandby, probe:50122
MESSAGE : copy_nextlsn 0000000814505B74

2004-10-29-11.10.24.367203-240 E7260G344 LEVEL: Event
PID : 29765 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-304
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-Peer (was P-Peer)

2004-10-29-11.10.24.383493-240 I7605G298 LEVEL: Severe
PID : 29755 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-305
FUNCTION: DB2 UDB, base sys utilities, sqlesrsu, probe:999
MESSAGE : free tran stuff

2004-10-29-11.10.24.384458-240 I7904G324 LEVEL: Warning
PID : 29755 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-305
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:300
MESSAGE : Starting Replay Master on standby.

2004-10-29-11.10.24.384954-240 I8229G307 LEVEL: Warning
PID : 3824 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSPrepareLogWrite, probe:10260
MESSAGE : RCUStartLsn 0000000814505B74

2004-10-29-11.10.35.872744-240 E8537G340 LEVEL: Warning
PID : 29755 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-305
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:920
MESSAGE : ADM1602W Rollforward recovery has been initiated.

2004-10-29-11.10.35.873052-240 E8878G383 LEVEL: Warning
PID : 29755 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-305
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:1740
MESSAGE : ADM1603I DB2 is invoking the forward phase of the database rollforward recovery.

2004-10-29-11.10.35.873251-240 I9262G414 LEVEL: Warning
PID : 29755 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-305
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:720
DATA #1 : String, 103 bytes
Invoking database rollforward forward recovery, lowtranlsn 0000000814505B74 minbufflsn 00000008143A400C

2004-10-29-11.10.35.884939-240 I9677G354 LEVEL: Warning
PID : 29755 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-305
FUNCTION: DB2 UDB, recovery manager, sqlprecm, probe:2000
MESSAGE : Using parallel recovery with 5 agents 27 QSets 108 queues and 64 chunks
Its database snapshot shows that it is in rollforward pending state and running in HADR STANDBY mode (only the relevant information is shown below):
HADR Status
Role = Standby
State = Peer
Synchronization mode = Sync
Connection status = Connected, 10/29/2004 10:40:13.269653
Heartbeats missed = 0
Local host = phillipe
Local service = SVT_HADR_1
Remote host = dartagnan
Remote service = SVT_HADR_2
Remote instance = DB2SVT
timeout(seconds) = 120
Primary log position(file, page, LSN) = S0000310.LOG, 408, 000000081453C455
Standby log position(file, page, LSN) = S0000310.LOG, 408, 000000081453C455
Log gap running average(bytes) = 885
Even after the database on phillipe has been switched to the standby role and all the database connections have been rerouted to the new primary database host dartagnan, the node directory information remains the same, and there is no alternate server for the node (instance). This is a limitation of the current DB2 client reroute implementation. Because of this limitation, you will not be able to monitor the new primary database on dartagnan in transaction DB6COCKPIT (ST04). Transaction ST04 will still display the snapshot of the database on the original primary host phillipe, which is probably not what you want. However, functions in transaction ST04 that rely only on the database connection, and not on the instance attachment, will continue to work properly.
You should see the following messages in the db2diag.log. The database first changes from S-Peer to S-RemoteCatchupPending because the connection to the primary was lost. The rollforward recovery then completes, and the Replay Master is stopped. Finally, the database switches to the primary role and stays in P-RemoteCatchupPending state. It cannot reach P-Peer state because the remote HADR partner is down.
Listing 33. db2diag.log from standby server after a FORCE takeover
2004-10-29-12.36.28.187952-240 E22497G325 LEVEL: Event
PID : 12262 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchupPending (was S-Peer)

2004-10-29-12.36.28.203705-240 I22823G411 LEVEL: Warning
PID : 22895 TID : 1024 PROC : db2redom (SVT)
INSTANCE: db2svt NODE : 000
APPHDL : 0-771
FUNCTION: DB2 UDB, recovery manager, sqlpPRecReadLog, probe:4630
MESSAGE : Last log for rollforward is incomplete, log number:
DATA #1 : Hexdump, 4 bytes
0x4C3A4390 : 3701 0000 7...

2004-10-29-12.36.28.426984-240 I23235G318 LEVEL: Warning
PID : 12267 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-771
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:1990
MESSAGE : nextLsn 0000000814578B92

2004-10-29-12.36.28.460209-240 E23554G380 LEVEL: Warning
PID : 12267 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-771
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:3600
MESSAGE : ADM1605I DB2 is invoking the backward phase of database rollforward recovery.

2004-10-29-12.36.28.460697-240 I23935G379 LEVEL: Warning
PID : 12267 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-771
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:2210
MESSAGE : Invoking database rollforward backward recovery, nextLsn: 0000000814578B92

2004-10-29-12.36.29.238202-240 I24315G410 LEVEL: Error
PID : 12267 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-771
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:670
MESSAGE : dbcb->logfhdr.firstDeleteFile:
DATA #1 : Hexdump, 4 bytes

2004-10-29-12.36.29.345022-240 E24726G351 LEVEL: Warning
PID : 12267 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-771
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:6600
MESSAGE : ADM1611W The rollforward recovery phase has been completed.

2004-10-29-12.36.29.345636-240 I25078G325 LEVEL: Warning
PID : 12267 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-771
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:9500
MESSAGE : Stopping Replay Master on standby.

2004-10-29-12.36.36.168746-240 E25404G341 LEVEL: Event
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-RemoteCatchupPending (was S-RemoteCatchupPending)
Its database snapshot shows:
Listing 34. Database snapshot of former standby HADR server after takeover
Database Snapshot
Database name = SVT
File number of first active log = 310
File number of last active log = 329
File number of current active log = 310
File number of log being archived = Not applicable

HADR Status
Role = Primary
State = Disconnected
Synchronization mode = Sync
Connection status = Disconnected, 10/29/2004 12:36:28.188239
Heartbeats missed = 0
Local host = dartagnan
Local service = SVT_HADR_2
Remote host = phillipe
Remote service = SVT_HADR_1
Remote instance = DB2SVT
timeout(seconds) = 120
Primary log position(file, page, LSN) = S0000310.LOG, 4072, 000000081538C855
Standby log position(file, page, LSN) = S0000000.LOG, 0, 0000000000000000
Log gap running average(bytes) = 0
After this, you will see the following messages in the db2diag.log on this host:
Listing 36. db2diag.log from former primary server after restarting
2004-10-29-14.27.14.110464-240 E31781G961 LEVEL: Event
PID : 8479 TID : 1024 PROC : db2star2
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, base sys utilities, DB2StartMain, probe:911
MESSAGE : ADM7513W Database manager has started.
START : DB2 DBM
. . . . . . . . . .

2004-10-29-14.27.27.936769-240 E32743G305 LEVEL: Event
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to None (was None)

2004-10-29-14.27.27.970986-240 E33049G307 LEVEL: Event
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-Boot (was None)

2004-10-29-14.27.28.012588-240 I33357G323 LEVEL: Warning
PID : 8652 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:300
MESSAGE : Starting Replay Master on standby.

2004-10-29-14.27.28.015452-240 I33681G401 LEVEL: Warning
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrHandleHsAck, probe:30445
MESSAGE : HADR: old primary reintegration as new standby discarding obsolete logs after hdrLCUEndLsnRequested 0000000814578B91

2004-10-29-14.27.28.015659-240 E34083G317 LEVEL: Event
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-LocalCatchup (was S-Boot)

2004-10-29-14.27.28.021135-240 E34401G339 LEVEL: Warning
PID : 8652 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:920
MESSAGE : ADM1602W Rollforward recovery has been initiated.

2004-10-29-14.27.28.021465-240 E34741G382 LEVEL: Warning
PID : 8652 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:1740
MESSAGE : ADM1603I DB2 is invoking the forward phase of the database rollforward recovery.

2004-10-29-14.27.28.025235-240 I35124G413 LEVEL: Warning
PID : 8652 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:720
DATA #1 : String, 103 bytes
Invoking database rollforward forward recovery, lowtranlsn 0000000814578B92 minbufflsn 0000000814560978

2004-10-29-14.27.28.075504-240 I35538G353 LEVEL: Warning
PID : 8652 TID : 1024 PROC : db2agnti (SVT)
INSTANCE: db2svt NODE : 000 DB : SVT
APPHDL : 0-11
FUNCTION: DB2 UDB, recovery manager, sqlprecm, probe:2000
MESSAGE : Using parallel recovery with 5 agents 27 QSets 108 queues and 64 chunks

2004-10-29-14.27.28.204285-240 E35892G333 LEVEL: Event
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchupPending (was S-LocalCatchup)

2004-10-29-14.27.28.220506-240 I36226G364 LEVEL: Warning
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduS, probe:20895
MESSAGE : Pair validation passed. Primary reintegration: hdrLCUEndLsnRequested: 0000000814578B91

2004-10-29-14.27.28.220717-240 I36591G356 LEVEL: Warning
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrNukeLogTail, probe:10445
MESSAGE : Primary reintegration: hdrNukeLogTail() called at LSN: 0000000814578B91

2004-10-29-14.28.33.222329-240 E36948G334 LEVEL: Event
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-RemoteCatchup (was S-RemoteCatchupPending)

2004-10-29-14.28.33.222550-240 I37283G307 LEVEL: Warning
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSPrepareLogWrite, probe:10260
MESSAGE : RCUStartLsn 0000000814578B92

2004-10-29-14.28.39.343019-240 E37591G324 LEVEL: Event
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-NearlyPeer (was S-RemoteCatchup)

2004-10-29-14.28.39.431901-240 E37916G315 LEVEL: Event
PID : 8972 TID : 1024 PROC : db2hadrs (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-Peer (was S-NearlyPeer)
On the current primary database, you will see the following messages in the db2diag.log:
Listing 37. db2diag.log from current primary server showing former primary server re-integrating as HADR standby server
2004-10-29-14.27.29.509990-240 I29576G322 LEVEL: Warning
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20482
MESSAGE : Old primary requesting rejoining HADR pair as a standby

2004-10-29-14.28.37.982717-240 E29899G334 LEVEL: Event
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-RemoteCatchup (was P-RemoteCatchupPending)

2004-10-29-14.28.37.986492-240 I30234G308 LEVEL: Warning
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP, probe:20445
MESSAGE : remote catchup starts at 000000081457800C

2004-10-29-14.28.40.630513-240 I30543G325 LEVEL: Warning
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrTransitionPtoNPeer, probe:10645
MESSAGE : near peer catchup starts at 000000081538DCDD

2004-10-29-14.28.40.730154-240 E30869G324 LEVEL: Event
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-NearlyPeer (was P-RemoteCatchup)

2004-10-29-14.28.40.732209-240 E31194G315 LEVEL: Event
PID : 12262 TID : 1024 PROC : db2hadrp (SVT)
INSTANCE: db2svt NODE : 000
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-Peer (was P-NearlyPeer)
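The two listings above each describe a chain of state changes that should end in Peer state. As an illustration, the sketch below checks that such a chain is unbroken (each "was" state equals the previously set state) and that it ends in S-Peer or P-Peer. The function name reached_peer is ours, and the state lists are typed in by hand from the CHANGE lines of the two listings.

```python
def reached_peer(transitions):
    """transitions: list of (new_state, previous_state) tuples in log order."""
    for (prev_new, _), (_, cur_old) in zip(transitions, transitions[1:]):
        if cur_old != prev_new:
            return False  # a state change is missing from the log
    return bool(transitions) and transitions[-1][0].endswith("-Peer")


# CHANGE lines of the reintegrating standby (former primary).
standby = [
    ("None", "None"),
    ("S-Boot", "None"),
    ("S-LocalCatchup", "S-Boot"),
    ("S-RemoteCatchupPending", "S-LocalCatchup"),
    ("S-RemoteCatchup", "S-RemoteCatchupPending"),
    ("S-NearlyPeer", "S-RemoteCatchup"),
    ("S-Peer", "S-NearlyPeer"),
]

# CHANGE lines of the current primary.
primary = [
    ("P-RemoteCatchup", "P-RemoteCatchupPending"),
    ("P-NearlyPeer", "P-RemoteCatchup"),
    ("P-Peer", "P-NearlyPeer"),
]

print(reached_peer(standby), reached_peer(primary))
```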
3) After the original primary database has rejoined the HADR pair as the standby database, you can choose to perform a failback operation to switch the roles of the databases to enable the original primary database to be once again the primary database. To perform this failback operation, follow the steps in 5.2.
5.5.3 Recommendations
1) HADR Synchronization Mode
SYNC mode is recommended for the best data protection. If you find that this mode is impacting the performance of your system, you could change to NEARSYNC mode. ASYNC mode is not recommended, as it could lead to a higher probability of data loss.
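To make the recommendation concrete, the following sketch builds the DB2 command that changes the synchronization mode. It only assembles the command string and does not run it. The helper name syncmode_command is ours, and the database name SVT is the one used in our test system.

```python
# The three synchronization modes discussed in the recommendation above.
VALID_SYNCMODES = ("SYNC", "NEARSYNC", "ASYNC")


def syncmode_command(db_name: str, mode: str) -> str:
    """Build the 'db2 update db cfg' command that sets HADR_SYNCMODE."""
    mode = mode.upper()
    if mode not in VALID_SYNCMODES:
        raise ValueError("unknown HADR synchronization mode: " + mode)
    return f"db2 update db cfg for {db_name} using HADR_SYNCMODE {mode}"


command = syncmode_command("SVT", "nearsync")
print(command)
```

Run the resulting command on both the primary and the standby database so that the two sides of the HADR pair are configured consistently.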
6.
One of the key things that most customers want to know before implementing HADR in a production environment is the impact it will have on performance. To get a sense of the performance impact, we conducted several tests on our test systems. Since these systems are not in a controlled environment, the performance numbers could vary from time to time.
6.1 Failover time
In a test conducted to measure failover time in a simulated production environment (see Figure 5. SAP HADR Setup), we ran a 600-user SD Benchmark (with 200 SD users on each application server) and forced a takeover (see section 5.3). The time taken for the clients on the application servers to reroute their connections and continue processing was about 15 seconds. These 15 seconds can be roughly broken down into the following three phases:
1. Tivoli System Automation (TSA) detects the primary server failure and initiates the takeover on the standby server. This phase takes at least 9 seconds. If TSA or other clustering software is not installed, you can initiate the takeover manually, which usually takes longer.
2. The standby server takes over, which includes replaying any logs it still has in memory, undoing any in-flight transactions, and opening the database for new transactions.
3. Client Reroute makes a new connection to the new primary server.
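The breakdown above implies an upper bound for the last two phases. The short calculation below uses only the two figures quoted in the text (about 15 seconds in total, at least 9 seconds for TSA detection). The variable names are ours.

```python
total_failover_s = 15      # observed client-visible failover time (approximate)
tsa_detection_min_s = 9    # minimum time for TSA to detect the failure and start the takeover

# Time left for the standby takeover plus the client reroute, at most.
remaining_max_s = total_failover_s - tsa_detection_min_s

# Share of the failover time spent on detection, at least.
detection_share = tsa_detection_min_s / total_failover_s

print(remaining_max_s, detection_share)
```

In other words, at least 60% of the observed failover time was spent detecting the failure, and the takeover and the client reroute together took no more than about 6 seconds.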
A third test was conducted to measure the impact of the log receive buffer size. By default, the log receive buffer size on the standby database is two times the value specified for the LOGBUFSZ configuration parameter on the primary database. There might be times when this size is not sufficient. For example, when the HADR synchronization mode is asynchronous and the primary and standby databases are in peer state, a high transaction load on the primary database might fill the log receive buffer on the standby database to capacity, and the log shipping operation from the primary database might stall. To manage these temporary peaks, you can increase the size of the log receive buffer on the standby database by modifying the DB2_HADR_BUF_SIZE registry variable. The workload chosen was the SD benchmark test with 600 users equally spread across 3 application servers.

Table 2. SD Benchmark performance with different DB2_HADR_BUF_SIZE settings

Base (default setting): HADR_SYNCMODE=ASYNC, LOGBUFSZ = 1024, DB2_HADR_BUF_SIZE=2048
Increased buffer size: HADR_SYNCMODE=ASYNC, LOGBUFSZ = 1024, DB2_HADR_BUF_SIZE=4096

HADR setup              Measurement            Central Instance lunen  Dialog Instance 1  Dialog Instance 2
Base (default setting)  Throughput (DS/sec)    11.93                   15.56              11.92
                        Average response time  6887                    3228               7387
Increased buffer size   Throughput (DS/sec)    10.68                   15.57              12.67
                        Average response time  9351                    3204               6346
The performance numbers above are not consistent across the SAP application servers. Therefore, with the workload we put on these servers, they do not indicate either a performance gain or a degradation due to the increased HADR log receive buffer size.
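The inconsistency is easier to see when the throughput figures from Table 2 are expressed as relative changes. The sketch below first derives the default log receive buffer size from LOGBUFSZ, as described above, and then computes the change per application server. The dictionary keys and variable names are ours.

```python
LOGBUFSZ = 1024
default_receive_buffer = 2 * LOGBUFSZ  # matches DB2_HADR_BUF_SIZE=2048 in the base run

# Throughput in DS/sec from Table 2: (base run, increased buffer size).
throughput = {
    "central instance lunen": (11.93, 10.68),
    "dialog instance 1": (15.56, 15.57),
    "dialog instance 2": (11.92, 12.67),
}

# Relative change in percent, rounded to one decimal place.
change_pct = {
    server: round((increased - base) / base * 100, 1)
    for server, (base, increased) in throughput.items()
}

for server, pct in change_pct.items():
    print(f"{server}: {pct:+.1f}%")
```

One server loses roughly a tenth of its throughput, one is unchanged, and one gains several percent, which is why we do not draw a conclusion from this test.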
F:\db2>db2 deactivate db SVT
DB20000I The DEACTIVATE DATABASE command completed successfully.

F:\db2>db2 list applications
SQL1611W No data was returned by Database System Monitor. SQLSTATE=00000
2. Use the split mirror function of the storage system to separate the mirrors.
3. Use OS tools, such as dd, tar, or gzip, to back up the DB2 image, which includes the database home directory (such as /DB2/<SID>/DB2<SID>) and all the tablespace container file systems (such as /DB2/<SID>/sapdata), to a backup location.
4. Reactivate the standby database:
2.
3.
Listing 40. Update database configuration parameters before rolling forward the log files
D:\Program Files\IBM\SQLLIB\BIN>db2 get db cfg for SVT

Database Configuration for Database SVT

Backup pending = NO
Database is consistent = NO
Rollforward pending = DATABASE
Restore pending = YES

HADR database role = STANDBY
HADR local host name (HADR_LOCAL_HOST) = dartagnan
HADR local service name (HADR_LOCAL_SVC) = SVT_hadr_2
HADR remote host name (HADR_REMOTE_HOST) = phillipe
HADR remote service name (HADR_REMOTE_SVC) = SVT_hadr_1
HADR instance name of remote server (HADR_REMOTE_INST) = DB2SVT
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = SYNC

(LOGARCHMETH1) = DISK:F:\db2\NODE0000\
D:\Program Files\IBM\SQLLIB\BIN>db2 stop hadr on db SVT
DB20000I The STOP HADR ON DATABASE command completed successfully.

D:\Program Files\IBM\SQLLIB\BIN>db2 update db cfg for SVT using hadr_local_host phillipe
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

D:\Program Files\IBM\SQLLIB\BIN>db2 update db cfg for SVT using hadr_remote_host dartagnan
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

D:\Program Files\IBM\SQLLIB\BIN>db2 update db cfg for SVT using hadr_remote_svc SVT_hadr_2
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

D:\Program Files\IBM\SQLLIB\BIN>db2 update db cfg for SVT using hadr_local_svc SVT_hadr_1
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

D:\Program Files\IBM\SQLLIB\BIN>db2 get db cfg for SVT

Database Configuration for Database SVT

Backup pending = NO
Database is consistent = NO
Rollforward pending = DATABASE
Restore pending = YES

HADR database role = STANDARD
HADR local host name = phillipe
HADR local service name = SVT_hadr_1
HADR remote host name = dartagnan
HADR remote service name = SVT_hadr_2
HADR instance name of remote server = DB2SVT
HADR timeout value = 120
HADR log write synchronization mode = SYNC

(LOGARCHMETH1) = DISK:F:\db2\NODE0000\
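The four updates in Listing 40 simply swap the local and remote HADR settings that were copied over with the standby image. The sketch below derives the new values and the corresponding commands from the standby configuration. It is an illustration only: the function name swap_hadr_endpoints is ours, and the script prints the commands rather than running them.

```python
def swap_hadr_endpoints(cfg: dict) -> dict:
    """Exchange the local and remote host/service settings of an HADR configuration."""
    return {
        "hadr_local_host": cfg["hadr_remote_host"],
        "hadr_local_svc": cfg["hadr_remote_svc"],
        "hadr_remote_host": cfg["hadr_local_host"],
        "hadr_remote_svc": cfg["hadr_local_svc"],
    }


# Values from the standby image before the update (first part of Listing 40).
standby_cfg = {
    "hadr_local_host": "dartagnan",
    "hadr_local_svc": "SVT_hadr_2",
    "hadr_remote_host": "phillipe",
    "hadr_remote_svc": "SVT_hadr_1",
}

primary_cfg = swap_hadr_endpoints(standby_cfg)
commands = [
    f"db2 update db cfg for SVT using {name} {value}"
    for name, value in primary_cfg.items()
]
for command in commands:
    print(command)
```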
4. Apply the log files from the original primary database to the restored database.
D:\Program Files\IBM\SQLLIB\BIN>db2 rollforward db SVT complete

Rollforward Status

Input database alias = SVT
Number of nodes have returned status = 1

Node number = 0
Rollforward status = not pending
Next log file to be read =
Log files processed = S0000000.LOG - S0000002.LOG
Last committed transaction = 2005-06-23-15.59.29.000000

DB20000I
5. Restart HADR (assuming the standby database is still up and running).
Please be aware that at this point, the primary database would be using a different log chain for archiving logs.
8. Appendix: Reference
DB2 UDB Data Recovery and High Availability Guide and Reference, V8.2
ftp://ftp.software.ibm.com/ps/products/db2/info/vr82/pdf/en_US/db2hae81.pdf

DB2 UDB Administration Guide: Implementation, V8.2
ftp://ftp.software.ibm.com/ps/products/db2/info/vr82/pdf/en_US/db2d2e81.pdf

Automating DB2 HADR Failover using IBM Tivoli System Automation for Multiplatforms
ftp://ftp.software.ibm.com/software/data/db2/linux/tsa_hadr.pdf

FlashCopy and Remote Volume Mirror for IBM Total Storage FAStT 900 in an SAP and DB2 Environment
http://w3.ncs.ibm.com/cspaper.nsf/HTitle/0BTOS-5ZZQFT?OpenDocument