TA050 System Availability Strategy
Author: <Author>
Creation Date: May 29, 1999
Last Updated: June 10, 1999
Document Ref: <Document Reference Number>
Version: DRAFT 1A
Approvals:
<Approver 1>
<Approver 2>
Document Control
Change Record
Reviewers
Name Position
Distribution
Note To Holders:
If you receive an electronic copy of this document and print it out, please
write your name on the equivalent of the cover page, for document
control purposes.
If you receive a hard copy of this document, please write your name on
the front cover, for document control purposes.
Contents
Document Control
Introduction
    Purpose
    Scope
    Definitions
Critical Systems Availability
Introduction
Purpose
Scope
Definitions
Oracle Database Software Upgrade
Oracle Database Software Patch
Oracle Applications Client Software Upgrade
Oracle Applications Server Software Upgrade
Oracle Applications Software Patch
Oracle Database Cold Backup
database servers
application file servers
networks
data centers
client desktop machines
For each type of component, all possible physical failure events are
considered for every machine or component within the scope of this
analysis.
This section lists the causes, results and the strategy for providing
continuing availability through database server failure or for recovering
from the failure.
Failure Cause | Failure Code | Result | Availability/Recovery Strategy | Comments
Loss of power supply | DS-1 | Loss of all server resident application and database processing | Uninterrupted power supply |
Failed CPU | DS-2 | | |
Failed System Bus | DS-3 | | |
Failed Memory | DS-4 | | |
Loss or corruption of a single disk | DS-5 | Failure of database instances that access data or control structures on the disk | 2-way disk mirror |
 | DS-6 | Failure of application processing for processes that need to read application code from the failed disk | |
Loss of a disk I/O Controller | DS-7 | Failure of database instances that access data or control structures through that controller | 2-way disk mirror |
 | DS-8 | Failure of application processing for processes that need to read application code through the controller | |
<Failure Cause>
This section lists the causes, results, and the strategy for providing
continuing availability through application server failure, or for recovering
from the failure. This applies to the following application servers:
<Failure Cause>
Network Failure
Network failures can occur within both the local and the wide area
networks (LANs and WANs).
<Failure Cause>
Civil strife, earthquake, flood | DC-1 | Loss of all application and database processing resident in the data center | Geographically remote disaster recovery site in...
<Failure Cause>
Virus infection | PC-1 | Partial or complete loss of | Backup spare PCs | Assume PC will
<Failure Cause>
database software
application software
This section does not include failures that are caused by the failure of
physical system components. These failures are discussed in the Physical
(Hardware or Network) Component Failure section.
user errors
operations (maintenance) staff errors
software bugs
lack of adequate database space management
database.
interfaces
reporting and analysis
This section does not include failures that are caused by the failure of
physical system components. These failures are discussed in the Physical
(Hardware or Network) Component Failure section.
user errors
operations (maintenance) staff errors
software bugs (interfaces)
improper access to data for reporting and analysis
In the past, two-character date coding was an acceptable convention because
of the perceived cost of the additional disk and memory storage that full
four-character date encoding requires. As the year 2000 approached, it
became evident that a full four-character coding scheme was more
appropriate.
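The weakness of two-character years can be illustrated with a minimal sketch. The date strings and the helper name `is_before` are hypothetical, chosen only for illustration, and are not taken from any Oracle product.

```python
def is_before(a: str, b: str) -> bool:
    """Return True if date string a sorts before date string b.

    Both strings must use the same fixed-width, year-first layout.
    Plain string comparison then matches chronological order only
    when the year field is unambiguous.
    """
    return a < b


# Two-character years (YYMMDD): 31 Dec 1999 versus 1 Jan 2000.
two_char_result = is_before("991231", "000101")

# Four-character years (YYYYMMDD): the same two dates.
four_char_result = is_before("19991231", "20000101")

print(two_char_result)   # False: 1999 wrongly sorts after 2000
print(four_char_result)  # True: chronological order is preserved
```

Any interface or report that filters or sorts on a two-character year is exposed to this kind of error once dates span the century boundary.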
Interface Failure
Failure Cause | Failure Code | Result | Availability/Recovery Strategy | Comments
Reporting Failure
Invalid selection of data in query due to invalid date logic in query. | CD-10 | Report/Query may yield incorrect results. | Repair query, rerun report/query and validate results.
Maintenance Outages
Maintenance outages are planned outage events that are required to
perform some form of maintenance on the system. Examples of system
maintenance events include:
Database Maintenance
Cold database backup | DM-1 | Database and applications unavailable | Use hot backups as much as possible. Cold backups once per week.
Data archive
Data purge
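The effect of a planned outage such as the weekly cold backup (DM-1) on availability can be estimated with simple arithmetic. The sketch below is illustrative only: the two-hour outage duration and the function name `weekly_availability` are assumptions, not values from this document.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def weekly_availability(outage_hours: float, outages_per_week: int = 1) -> float:
    """Percentage of the week the system is available, given planned outages."""
    downtime = outage_hours * outages_per_week
    if not 0 <= downtime <= HOURS_PER_WEEK:
        raise ValueError("total downtime must be between 0 and 168 hours")
    return 100.0 * (HOURS_PER_WEEK - downtime) / HOURS_PER_WEEK


# Assumed: one cold backup per week lasting two hours.
print(round(weekly_availability(2.0), 2))  # 98.81
```

Substituting the measured duration of each maintenance event gives a quick check of whether the planned outages fit within the availability target.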
Software Maintenance
Maintenance Event | Code | Availability During Maintenance | Strategy | Comments
Open Issues
Closed Issues