IN - FS - TC - 1117 - Global CASHplus (GCP) Disaster Recovery Test
To: NAME
Cc:
From: Company X Internal Audit Department
Subject: Failover Observation – 10th November 2017
Date: 10th November 2017
Background
As part of the Company X Disaster Recovery (DR) program, a DR exercise was conducted on Friday,
November 10, 2017 to demonstrate the program's capability to fail over services performed at the
primary data center in Pune, India to the secondary data center in Hyderabad, India. The exercise was
designed to validate Company X's internal recovery capabilities and its ability to provide end-to-end
services from the secondary data center in Hyderabad. Once connected to the secondary data center,
the DR team validated the availability and readiness (replication) of the critical components used in
delivering and supporting Global CASHplus across the globe.
Scope
Global CASHplus
Objectives
Demonstrate that a planned failover could be undertaken within the Recovery Time
Objective (RTO) defined for an unplanned failover. The RTO was 4 to 8 hours.
o Maximum of 8 hours to fail over to the Pune backup data center (Hyderabad) using
alternate network connectivity, and to validate the availability and readiness (replication)
of the critical components used in delivering and supporting Global CASHplus (licensed)
across the globe.
Restore operations to the primary location and preserve the system status with respect to any
data processed while in failover.
Identify areas for improvement in the data center infrastructure, operational procedures and
staff awareness with respect to planned and unplanned failover events.
Approach
Prior to the date of the exercise, a scope document was created by the Disaster Recovery Team to
outline the plan and exercise requirements. This document was reviewed by Company X
Management.
On the date of the failover exercise, the in-scope production environments were brought down at the
primary data center in Pune and the corresponding DR environments were brought up at the
alternate data center in Hyderabad. Connection to the Pune backup data center (Hyderabad) was
successful using alternate network connectivity. In this case, all connections were made via standard
wireless internet dongles, readily available for purchase within India. Once connected, the DR team
validated the availability and readiness (replication) of the critical components used in delivering and
supporting Global CASHplus. Tests were conducted to demonstrate Company X’s ability to process
and validate end-to-end transactions from the alternate data center in Hyderabad.
Participants
The following key personnel were observed during the planning and execution of the exercise:
Other Company X IT staff also participated in the failover exercise, providing remote execution and
support throughout the exercise.
Test Results
Work is handed over for validation. Builds can begin, if necessary, at this point. Calculate RTO.
Internal testing performed by the DR Champions Team (14:40 to 16:00)
Master DB at DRP site
  o Verify Master DB replication is in sync at the DR site
  o Verify the Master DB clients can access the Master DB replication DB at the DR site
DRPDB for GCP
  o XXXR-LNXVM01
  o XXXR-LNXVM02
  o XXXR-LNXVM03
  o XXXR-LNXVM04
  o XXXR-LNXVM05
  o XXXR-LNXVM06
Verify GCP Applications and connectivity
  o Verify GCP QA Apps and connectivity to VM
Development – GCP (xxxA Bank)
  o Log in to Application
  o Query some screens
  o Connect & Validate the Databases
  o If possible, enter transaction
  o Check-in code
  o Update and commit
  o Any other tests
  o Generate an application build
  o Open compiler and work
Development – GCP (xxxB Bank)
  o Log in to Application
  o Query some screens
  o Connect & Validate the Databases
  o If possible, enter transaction
  o Check-in code
  o Update and commit
  o Any other tests
  o Generate an application build
  o Open compiler and work
Quality Assurance – GCP (QC)
  o Log in to Application
  o Connect & Validate the Databases
  o Submit a test case
  o If possible, enter transaction
Dev Support 1 – GCP (All Banks)
  o Log in to Application
  o Query some screens
  o Connect & Validate the Databases
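The checklist above lends itself to simple tracking, so that the team can see at a glance which validation steps are still open before calculating the RTO. The sketch below is a hypothetical illustration: the workstream and step names are copied from the checklist, but the `outstanding` helper and the recorded results are assumptions, not part of the DR team's tooling.

```python
# Hypothetical encoding of part of the Test Results checklist; the workstream
# and step names come from the memo, the recorded results below do not.
checklist: dict[str, list[str]] = {
    "Master DB at DRP site": [
        "Verify Master DB replication is in sync at the DR site",
        "Verify the Master DB clients can access the Master DB replication DB at the DR site",
    ],
    "Quality Assurance – GCP (QC)": [
        "Connect & Validate the Databases",
        "Submit a test case",
        "If possible, enter transaction",
    ],
}


def outstanding(checklist: dict[str, list[str]],
                results: dict[tuple[str, str], bool]) -> list[tuple[str, str]]:
    """Return (workstream, step) pairs that have not been recorded as passed."""
    return [
        (workstream, step)
        for workstream, steps in checklist.items()
        for step in steps
        if not results.get((workstream, step), False)
    ]


# Hypothetical results: only the replication check has been signed off so far.
results = {
    ("Master DB at DRP site",
     "Verify Master DB replication is in sync at the DR site"): True,
}
for workstream, step in outstanding(checklist, results):
    print(f"PENDING  {workstream}: {step}")
```

Keying results by (workstream, step) keeps identically named steps in different workstreams, such as "Connect & Validate the Databases", distinct.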
Metric            Objective        Actual
Recovery time     RTO 4 hours      RTA 59 minutes (11:10 to 12:09)
Recovery point    RPO < 2 hours    RPA < 2 hours
Recovery Time Objective (RTO) is measured from the point of declaration onward, not from the time of
failure. RTO defines the maximum amount of time, following a declaration, within which the system must
be made operational again at its recovery site.
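Because RTO is measured from declaration rather than from failure, the actual recovery time is simply the elapsed time between two clock readings. A minimal sketch, assuming that the 11:10 to 12:09 window in the results table runs from the point of declaration to the point at which the system was operational, and that both times fall on the same day; `elapsed_minutes` is a hypothetical helper name.

```python
from datetime import datetime


def elapsed_minutes(start: str, end: str) -> int:
    """Whole minutes between two same-day clock times given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Assumption: 11:10 is the point of declaration and 12:09 is when the system
# was operational at the recovery site.
rta_minutes = elapsed_minutes("11:10", "12:09")
rto_minutes = 4 * 60  # RTO of 4 hours from the results table
within_rto = rta_minutes <= rto_minutes
print(f"RTA {rta_minutes} min, RTO {rto_minutes} min, within RTO: {within_rto}")
# prints "RTA 59 min, RTO 240 min, within RTO: True"
```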
Recovery Point Objective (RPO) defines the maximum acceptable data loss, expressed as a period of time
prior to failure. RPO is the period of time before the event for which the newly recovered
environment’s data must ‘be-as-of’, that is, be current and include all transactions.
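In practice, the RPO test compares the gap between the last transaction replicated to the DR site and the moment of failure against the objective. The sketch below assumes that definition; the timestamps are hypothetical and are not taken from the exercise (which was a planned failover), and the helper names are illustrative only.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replicated: datetime, failure: datetime) -> timedelta:
    """Time between the last transaction replicated to the DR site and the failure.

    Transactions committed inside this window are the ones at risk of loss.
    """
    return failure - last_replicated


def meets_rpo(last_replicated: datetime, failure: datetime, rpo: timedelta) -> bool:
    # The memo states the RPO as "< 2 hours", so the check is a strict inequality.
    return data_loss_window(last_replicated, failure) < rpo


# Hypothetical timestamps for illustration only; not taken from the exercise.
failure = datetime(2017, 11, 10, 11, 0)
last_replicated = datetime(2017, 11, 10, 10, 15)
rpo = timedelta(hours=2)

print(data_loss_window(last_replicated, failure))  # 0:45:00
print(meets_rpo(last_replicated, failure, rpo))    # True
```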
Recovery Time Actuals / Recovery Point Actuals (RTA/RPA) are estimated in the breakdown of
recovery steps and confirmed during DR testing. If either actual exceeds its corresponding objective,
it is flagged as a gap in the plan.
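The gap rule is a direct comparison of each actual against its objective. A minimal sketch, assuming both are expressed as durations: the first entry uses the RTO and RTA from the results table, while the second entry is hypothetical and exists only to show what a flagged gap looks like. The `Capability` type and `find_gaps` function are illustrative names, not part of the DR plan.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Capability:
    name: str
    objective: timedelta  # RTO or RPO
    actual: timedelta     # RTA or RPA


def find_gaps(capabilities: list[Capability]) -> list[str]:
    """Return the names of capabilities whose actual exceeds the objective."""
    return [c.name for c in capabilities if c.actual > c.objective]


caps = [
    # From the results table: RTO 4 hours, RTA 59 minutes.
    Capability("Recovery time", objective=timedelta(hours=4), actual=timedelta(minutes=59)),
    # Hypothetical entry, not from this exercise, to illustrate a gap.
    Capability("Hypothetical slow restore", objective=timedelta(hours=4), actual=timedelta(hours=5)),
]
print(find_gaps(caps))  # ['Hypothetical slow restore']
```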
Summary of Observations
Within the scope of the exercise outlined above, Protiviti:
Observed, on November 10, 2017, the Company X Disaster Recovery (DR) program demonstrate its
capability to fail over services performed at the primary data center in Pune, India to the
secondary data center in Hyderabad, India.
Once connected, the DR team validated availability and readiness (replication) of the critical
components used in delivering and supporting Global CASHplus.
During the observation, it was noted that the projected test duration for the failover exercise
was set at 152 minutes. However, we observed that the exercise was completed within 59 minutes.
The benchmark should therefore be re-examined and revised accordingly.
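The size of the discrepancy can be quantified directly from the two figures in this observation. A minimal sketch: the 152-minute projection and 59-minute actual come from the memo, but the 25% review tolerance is a hypothetical threshold chosen only for illustration.

```python
def benchmark_variance(projected_min: float, actual_min: float) -> tuple[float, float]:
    """Return (minutes by which actual beat the projection, actual as a share of projected)."""
    return projected_min - actual_min, actual_min / projected_min


saved, ratio = benchmark_variance(152, 59)  # figures from this memo

# Hypothetical review threshold: flag the benchmark if the actual deviates
# from the projection by more than 25%.
TOLERANCE = 0.25
needs_review = abs(1 - ratio) > TOLERANCE

print(f"{saved:.0f} minutes under benchmark; actual was {ratio:.0%} of projected")
# prints "93 minutes under benchmark; actual was 39% of projected"
print("Benchmark needs review:", needs_review)
```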
Recommendations
Given the critical nature of a successful DR exercise, we recommend the following steps for the
Company X DR program:
Company X should review the projected failover exercise timings using measures of central
tendency (for example, the mean or median). Timings from previously conducted failover exercises
should be used as the base data to arrive at a revised benchmark timing at the activity level, which
should then be used to evaluate future failover exercises.
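One way to apply this recommendation is to take, for each activity, a central-tendency measure of its durations across past exercises and sum the per-activity figures into the overall benchmark. The sketch below uses the median; the activity names and historical durations are hypothetical and are not drawn from Company X records.

```python
from statistics import median

# Hypothetical per-activity durations (minutes) from previous failover exercises.
# These figures are illustrative only; they are not taken from this memo.
history: dict[str, list[float]] = {
    "Bring down production at Pune": [22, 18, 25, 20],
    "Bring up DR environments at Hyderabad": [35, 30, 41, 33],
    "Validate replication and connectivity": [15, 12, 14, 19],
}


def revised_benchmarks(history: dict[str, list[float]]) -> dict[str, float]:
    """Benchmark each activity at the median of its past durations."""
    return {activity: median(durations) for activity, durations in history.items()}


benchmarks = revised_benchmarks(history)
for activity, minutes in benchmarks.items():
    print(f"{activity}: {minutes} min")
print("Total:", sum(benchmarks.values()), "min")  # Total: 69.5 min
```

The median is used here rather than the mean because a single unusually fast or slow exercise shifts it less; either measure fits the recommendation.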