IBM pSeries 690 Availability Best Practices White Paper
4.0 HACMP
4.1 Uni/SMP Mode
4.2 LPAR Mode
4.2.1 Clusters of LPAR Images Coupled to Uni/SMP Systems
4.2.2 Clusters of LPAR Images Across LPAR Systems
4.2.3 HACMP Clusters within One LPAR System
6.0 Conclusions
1.1 Purpose
The purpose of this paper is to describe, at a level of detail useful to a system administrator or operator, how to configure the pSeries 690 system to achieve higher levels of availability using a Best Practices approach.
There are many factors that influence the overall availability of a system. Some relate to the base design of the system, including which components act as single points of failure and which components are replicated through N+1 design or some other form of redundancy. These areas of the pSeries 690 are controlled by development, and single points of failure are minimized through various design techniques (refer to the white paper entitled “pSeries 690 RAS White Paper”).
Other factors are directly influenced by how the administrator of the system chooses to configure the various operating environments; these are the primary focus of this white paper. Still other factors, outside the scope of this white paper but for which consulting services are available from IBM, deal with how the system is administered and with the availability management policies and practices established by the IT department.
1.2 Approach
The first part of this white paper addresses the pSeries 690 system running in symmetric multiprocessing (SMP) mode and forms the basis that the rest of the paper references.
The second part addresses the availability factors associated with the Logical Partition (LPAR) mode of operation. Users interested in configuring an LPAR system for high availability should review the SMP sections before the LPAR sections, since most of the LPAR material refers back to the SMP material.
The next section addresses running HACMP in SMP and LPAR partitions, and is followed by the conclusions regarding configuring the system for availability.
This paper is best used in conjunction with the following documents and guides, which describe how to perform the actions proposed here:
pSeries 690 Installation Guide - SA38-0587-00
Hardware Management Console for pSeries Operations Guide
IBM pSeries 690 User’s Guide - SA38-0588-00
PCI Adapter Placement Reference
HACMP for AIX Installation Guide
Electronic Service Agent for pSeries User’s Guide
This section will begin by addressing base boot configuration options through use of Service Processor (SP) menus or through alternate methods such as AIX® command line instructions, AIX Diagnostic Service Aids, System Management Services (SMS) menus, or Web-based System Manager interfaces. Once configured, these options remain in effect until specifically changed by the user.
Following this section, boot and user disk configuration options will be reviewed. Disk configuration and I/O adapter configuration are two of the most significant areas over which the customer has direct control to affect overall system availability. Finally, service enablement and remote support topics will be addressed; these help to automate error reporting, the service call, and the dispatch of the service representative with the correct Field Replaceable Units (FRUs) to fix the system quickly and efficiently in the event of a failure.
Each of the individual settings is explained in the sections following the table. The last column in the table indicates whether there is an alternative method of changing the setting from the System Management Services (SMS) menus, the SMIT/SMITTY panels in the AIX operating system, or the AIX Diagnostic Service Aids interfaces. The user can choose whichever method they are most familiar with to effect the proposed changes.
(Table row) Memory Configuration/Deconfiguration Menu - Enable Memory Repeat Gard - Default: Enable - Recommended setting: Enable
Refer to the “pSeries 690 Installation Guide” for specific instructions on configuring the preferred settings through use of the SP menus.
2.1.1 Boot Time Options
Once these options are set utilizing the Service Processor Setup menus, they are maintained in NVRAM
and utilized on every system boot until they are specifically altered by the user.
2.1.2 OS Surveillance Setup
Surveillance is a function in which the service processor monitors the system, and the system monitors
the service processor. This monitoring is accomplished by periodic samplings called heartbeats.
Surveillance is available during two phases:
1. System firmware bringup (automatic)
2. Operating system runtime (optional)
Operating system surveillance is not enabled by default, allowing you to run operating systems that do
not support this service processor option. You can also use service processor menus and AIX service aids
to enable or disable operating system surveillance.
For operating system surveillance to work correctly, you must set these parameters:
Surveillance enable/disable
Surveillance interval
The maximum time the service processor should wait for a heartbeat from the operating system before timing out.
Surveillance delay
The length of time to wait from the time the operating system is started to when the first
heartbeat is expected. Surveillance does not take effect until the next time the operating system
is started after the parameters have been set.
If desired, you can initiate surveillance mode immediately from service aids. In addition to the three
options above, a fourth option allows you to select immediate surveillance, and rebooting of the system is
not necessarily required. If operating system surveillance is enabled (and system firmware has passed
control to the operating system), and the service processor does not detect any heartbeats from the
operating system, the service processor assumes the system is hung and takes action according to the
reboot/restart policy settings. If surveillance is selected from the service processor menus, which are only available at bootup, then surveillance is enabled by default as soon as the system boots. From service aids, the selection is optional.
OS Surveillance Setup menu (excerpt):
3. Surveillance Delay:
2 minutes
98. Return to Previous Menu
0>
– Surveillance
Can be set to Enabled or Disabled.
– Surveillance Time Interval
Can be set to any number from 2 through 255 (value is in minutes).
– Surveillance Delay
Can be set to any number from 0 through 255 (value is in minutes).
You can access this service aid directly from the AIX command line, by typing:
/usr/lpp/diagnostics/bin/uspchrp -s
2.1.3 Serial Port Snoop Setup
Use the Snoop Serial Port option to select the serial port to snoop.
Note: Only serial port 1 is supported.
After serial port snooping is correctly configured, at any point after the system is booted to AIX,
whenever the reset string is typed on the main console, the system uses the service processor reboot
policy to restart. Pressing Enter after the reset string is not required, so make sure that the string is not
common or trivial. A mixed-case string is recommended.
2.1.4 Scan Dump Log Policy
A scan dump is the collection of chip data that the service processor gathers after a
system malfunction, such as a checkstop or hang. The scan dump data may contain
chip scan rings, chip trace arrays, and SCOM contents.
The scan dump data are stored in the system control store. The size of the scan
dump area is approximately 4MB.
2.1.5 Unattended Start Mode
When ac power is restored, the system returns to the power state at the time ac loss occurred. For example, if the system was powered on when ac loss occurred, it reboots/restarts when power is restored. If the system was powered off when ac loss occurred, it remains off when power is restored.
2.1.5.1 SP Menu
When enabled, “Unattended Start Mode”...allows the system to recover from the loss
of ac power. If the system was powered-on when the ac loss occurred, the system reboots when
power is restored. If the system was powered-off when the ac loss occurred, the system remains off when
power is restored. You can access this service aid directly from the AIX command line, by typing:
/usr/lpp/diagnostics/bin/uspchrp -b
The purpose of this option is to allow the system to attempt to restart after experiencing a fatal error. The option is defined by a combination of two parameters, described in the table below. Choose the combination of settings that you are most comfortable with to achieve that effect.
Use OS-Defined restart policy - The default setting for GA1 was Yes. This causes the service processor
to refer to the OS Automatic Restart Policy setting and take action (the same action the operating system
would take if it could have responded to the problem causing the restart).
The default for systems shipped with GA2 & beyond is No. When this setting is No, or if the operating
system did not set a policy, the service processor refers to enable supplemental restart policy for its
action.
Enable supplemental restart policy - The default setting for GA1 was No. The default setting is Yes
for systems shipped with GA2 & beyond. If set to Yes, the service processor restarts the server when the
operating system loses control and either:
The Use OS-Defined restart policy is set to No.
The Use OS-Defined restart policy is set to Yes and the operating system has no automatic restart
policy.
The following table describes the relationship between the operating system and service processor restart controls in SMP mode:
With the recommended settings, the system will automatically attempt restart, regardless of the setting of
the OS automatic restart parameter.
The following table describes the interaction of the parameters in LPAR mode:
With the recommended changes, the system will always automatically attempt reboot if the OS automatic
restart parameter is set to true in each LPAR partition.
For proper operation in LPAR mode, this parameter should always be set to No. With the parameter set
to No in LPAR mode, the OS defined restart policy is used for partition crashes and the Supplemental
Restart policy is used for system crashes.
The default value for Use OS-Defined restart policy in GA1 was Yes. The default for systems shipped
with GA2 & beyond is No.
– Enable supplemental restart policy - The default setting for GA1 was No. The default for systems
shipped with GA2 & beyond is Yes.
In SMP mode, if set to Yes, the service processor restarts the system when the system loses control as
detected by service processor surveillance, and either:
The Use OS-Defined restart policy is set to No
OR
The Use OS-Defined restart policy is set to Yes, and the operating system
has no automatic restart policy.
In LPAR mode, if Use OS-Defined restart policy is set to No, then Enable supplemental restart policy controls whether the service processor restarts the system when the system loses control, as detected by service processor surveillance.
– Call-Out before restart (Enabled/Disabled) - This should be left in the default Disabled state for this system. Call out is handled through the Service Focal Point application running on the IBM Hardware Management Console for pSeries. Refer to the “Remote Support” and “Service Enablement” sections later in this white paper for more detail.
During boot time, the service processor does not configure processors or memory books that are marked
“bad”. If a processor or memory book is deconfigured, the processor or memory book remains offline for
subsequent reboots until it is replaced or Repeat Gard is disabled.
The Repeat Gard function also provides the user with the option of manually deconfiguring a processor
or memory book, or re-enabling a previously deconfigured processor or memory book. Both of these
menus are submenus under the System Information Menu. You can enable or disable CPU Repeat Gard
or Memory Repeat Gard using the Processor Configuration/Deconfiguration Menu, which is a submenu
under the System Information Menu.
To enable or disable Memory Repeat Gard, use menu option 77 of the Memory
Configuration/Deconfiguration Menu.
The failure history of each book is retained. If a book with a history of failures is brought back online by
disabling Repeat Gard, it remains online if it passes testing during the boot process. However, if Repeat
Gard is enabled, the book is taken offline again because of its history of failures.
2.1.8 SP Menus Which Should Not be Changed from Default Values
The following SP menus should be left in their default state for proper operation of the p690 system with
the HMC:
Enable/Disable Modem
Setup Modem Configuration
Setup Dial-out Phone Numbers
Select Modem Line Speed
These functions are provided on the HMC and should be configured there since the modem will be
supported from the HMC instead of attaching to the serial port on the CEC as on prior products.
AIX will attempt to migrate all resources associated with that processor to another processor and then
stop the defective processor.
These errors are not fatal and, as long as they remain rare occurrences, can be safely ignored. However,
when a pattern of failures seems to be developing on a specific processor, this pattern may indicate that
this component is likely to exhibit a fatal failure in the near future. This prediction is made by the firmware based on failure rates and threshold analysis.
AIX, on these systems, implements continuous hardware surveillance and regularly polls the firmware for
hardware errors. When the number of processor errors hits a threshold and the firmware recognizes that
there is a distinct probability that this system component will fail, the firmware returns an error report to
AIX. In all cases, AIX logs the error in the system error log. In addition, on multiprocessor systems,
depending on the type of failure, AIX attempts to stop using the untrustworthy processor and
deallocate it. This feature is called Dynamic Processor Deallocation.
At this point, the processor is also flagged by the firmware for persistent deallocation for subsequent reboots, until maintenance personnel replace the processor.
This processor deallocation is transparent to the vast majority of applications, including drivers and kernel extensions. However, you can use AIX published interfaces to determine whether an application or kernel extension is running on a multiprocessor machine, find out how many processors there are, and bind threads to specific processors.
The interface for binding processes or threads to processors uses logical CPU numbers. The logical CPU
numbers are in the range [0..N-1] where N is the total number of CPUs. To avoid breaking applications
or kernel extensions that assume no "holes" in the CPU numbering, AIX always makes it appear to applications as if the "last" (highest numbered) logical CPU is the one deallocated. For instance, on an
8-way SMP, the logical CPU numbers are [0..7]. If one processor is deallocated, the total number of
available CPUs becomes 7, and they are numbered [0..6]. Externally, it looks like CPU 7 has
disappeared, regardless of which physical processor failed. In the rest of this description, the term CPU is
used for the logical entity and the term processor for the physical entity.
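As an illustration of this logical numbering, the AIX bindprocessor command operates on logical CPU numbers (the process ID 12345 and CPU number 0 below are hypothetical examples):
bindprocessor -q
bindprocessor 12345 0
bindprocessor -u 12345
The first command lists the logical CPUs currently available for binding, the second binds process 12345 to logical CPU 0, and the third removes the binding so that the process can run on any remaining CPU after a deallocation.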
Applications or kernel extensions using processes/threads binding could potentially be broken if AIX
silently terminated their bound threads or forcefully moved them to another CPU when one of the
processors needs to be deallocated. Dynamic Processor Deallocation provides programming interfaces so
that those applications and kernel extensions can be notified that a processor deallocation is about to
happen. When these applications and kernel extensions get this notification, they are responsible for
moving their bound threads and associated resources (such as timer request blocks) away from the last logical CPU and adapting themselves to the new CPU configuration.
If, after notification of applications and kernel extensions, some of the threads are still bound to the last
logical CPU, the deallocation is aborted. In this case, AIX logs the fact that the deallocation has been
aborted in the error log and continues using the ailing processor. When the processor ultimately fails, it
creates a total system failure. Thus, it is important for applications or kernel extensions binding threads to
CPUs to get the notification of an impending processor deallocation, and act on this notice.
Even in the rare cases where the deallocation cannot go through, Dynamic Processor Deallocation still
gives advanced warning to system administrators. By recording the error in the error log, it gives them a
chance to schedule a maintenance operation on the system to replace the ailing component before a global
system failure occurs.
1. The firmware detects that a recoverable error threshold has been reached by one of the processors.
2. AIX logs the firmware error report in the system error log and, when executing on a machine supporting processor deallocation, starts the deallocation process.
3. AIX notifies non-kernel processes and threads bound to the last logical CPU.
4. AIX waits for all the bound threads to move away from the last logical CPU. If threads remain bound, AIX eventually times out (after ten minutes) and aborts the deallocation.
5. Otherwise, AIX invokes the previously registered High Availability Event Handlers (HAEHs). An
HAEH may return an error that will abort the deallocation.
6. Otherwise, AIX goes on with the deallocation process and ultimately stops the failing processor.
In case of failure at any point of the deallocation, AIX logs the failure with the reason why the
deallocation was aborted. The system administrator can look at the error log, take corrective action
(when possible) and restart the deallocation. For instance, if the deallocation was aborted because at least
one application did not unbind its bound threads, the system administrator could stop the application(s),
restart the deallocation (which should go through this time) and restart the application.
Dynamic Processor Deallocation can be enabled or disabled by changing the value of the cpugard
attribute of the ODM object sys0. The possible values for the attribute are enable and disable.
The default, in this version of AIX, is that the dynamic processor deallocation is disabled (the attribute
cpugard has a value of disable). System administrators who want to take advantage of this feature must
enable it using either the Web-based System Manager system menus, the SMIT System Environments
menu, or the chdev command.
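For example, a minimal command line sequence to check and then enable the feature might look like the following (the attribute name is shown as given in this paper; confirm it with lsattr on your AIX level):
lsattr -El sys0 -a cpugard
chdev -l sys0 -a cpugard=enable
The lsattr command displays the current value of the attribute, and chdev changes it to enable.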
Note: If processor deallocation is turned off, AIX still reports the errors in
the error log and you will see the error indicating that AIX was notified of
the problem with a CPU (CPU_FAILURE_PREDICTED, see the
following format).
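As a simple illustration, the predictive-failure entries can be reviewed from the AIX command line with the errpt command (the grep filter below assumes the error label shown above):
errpt | grep CPU_FAILURE_PREDICTED
errpt -a | more
The first command lists summary entries with that label; the second displays the detailed error reports.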
Sometimes the processor deallocation fails because, for example, an application did not move its bound
threads away from the last logical CPU. Once this problem has been fixed, by either unbinding (when it is
safe to do so) or stopping the application, the system administrator can restart the processor deallocation
process using the ha_star command.
ha_star -C
Physical processors are represented in the ODM data base by objects named procn where n is the physical
processor number (n is a decimal number). Like any other "device" represented in the ODM database,
processor objects have a state (Defined/Available) and attributes.
The state of a proc object is always Available as long as the corresponding processor is present,
regardless of whether it is usable by AIX. The state attribute of a proc object indicates if the processor is
used by AIX and, if not, the reason. This attribute can have three values:
enable
The processor is used by AIX.
disable
The processor has been dynamically deallocated by AIX.
faulty
The processor was declared defective by the firmware at boot time.
In the case of CPU errors, if a processor for which the firmware reports a predictive failure is successfully
deallocated by AIX, its state goes from enable to disable. Independently of AIX, this processor is also
flagged as defective in the firmware. Upon reboot, it will not be available to AIX and will have its state
set to faulty. But the ODM proc object is still marked Available. Only if the defective CPU was physically
removed from the system board or CPU board (if it were at all possible) would the proc object change to
Defined.
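A brief sketch of how this information can be inspected from the command line follows (the processor name proc3 is illustrative):
lsdev -Cc processor
lsattr -El proc3 -a state
The lsdev command lists the proc objects and their ODM state (Available or Defined), and lsattr displays the state attribute (enable, disable, or faulty) of an individual processor.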
Depending on the environment and the software packages installed, selecting this task displays the
following three subtasks:
PCI Hot-plug Manager
SCSI Hot-swap Manager
RAID Hot-plug Devices
To run the Hot-plug Task directly from the command line, type the following: diag -T"identifyRemove"
If you are running the diagnostics in Online Concurrent mode, run the Missing Options Resolution Procedure immediately after adding, removing or replacing any device. Start the Missing Options Resolution Procedure by running the diag -a command. If the Missing Options Resolution Procedure runs with no menus or prompts, then device configuration is complete. Otherwise, work through each menu to complete device configuration.
The Replace/Remove a PCI Hot-plug Adapter function is used to prepare a slot for adapter exchange.
The function lists all the PCI slots that support hot-plug and are occupied. The list includes the slot’s
physical location code and the device name of the resource installed in the slot. The adapter must be in
the Defined state before it can be prepared for hot-plug removal. When a slot is selected, the visual
indicator for the slot is set to the Identify state. After the slot location is confirmed, the visual indicator
for the specified PCI slot is set to the Action state. This means that the power for the PCI slot is off and the adapter can be removed or replaced.
The Identify a PCI Hot-plug Slot function is used to help identify the location of a PCI hot-plug
adapter. The function lists all the PCI slots that are occupied or empty and support hot-plug. When a slot
is selected for identification, the visual indicator for the slot is set to the Identify state.
The Unconfigure Devices function attempts to put the selected device, in the PCI hot-plug slot, into the
Defined state. This action must be done before any attempted hot-plug function. If the unconfigure
function fails, it is possible that the device is still in use by another application. In this case, the customer
or system administrator must be notified to quiesce the device.
The Configure Devices function allows a newly added adapter to be configured into the system for use.
This function should also be done when a new adapter is added to the system.
The Install/Configure Devices Added After IPL function attempts to install the necessary software
packages for any newly added devices. The software installation media or packages are required for this
function.
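A minimal command line sketch of this unconfigure/configure flow around an adapter exchange is shown below; the device name ent1 is a hypothetical example, and the exchange itself is performed through the Hot-plug Task functions described above:
lsslot -c pci
rmdev -l ent1
(exchange the adapter using the Replace/Remove a PCI Hot-plug Adapter function)
cfgmgr
The lsslot command lists the hot-plug PCI slots and the devices installed in them, rmdev places the adapter in the Defined state before removal, and cfgmgr configures the replacement adapter after the exchange.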
Standalone Diagnostics has restrictions on using the PCI Hot-plug Manager. For
example:
Adapters that are replaced must be exactly the same FRU part number as the adapter being
replaced.
New adapters cannot be added unless a device of the same FRU part number already exists in the
system, because the configuration information for the new adapter is not known after the
Standalone Diagnostics are booted.
The following functions are not available from the Standalone Diagnostics and will not display in
the list:
Add a PCI Hot-plug Adapter
Configure Device
Install/Configure Devices Added After IPL
You can run this task directly from the command line by typing the following command:
diag -d device -T"identifyRemove"
However, note that some devices support both the PCI Hot-plug task and the RAID Hot-plug Devices
task. If this is the case for the device specified, then the Hot-plug Task displays instead of the PCI
Hot-plug Manager menu.
The List the SES Devices function lists all the SCSI hot-swap slots and their contents. Status
information about each slot is also available. The status information available includes the slot number,
device name, whether the slot is populated and configured, and location.
The Identify a Device Attached to an SES Device function is used to help identify the location of a
device attached to a SES device. This function lists all the slots that support hot-swap that are occupied
or empty. When a slot is selected for identification, the visual indicator for the slot is set to the Identify
state.
The Attach a Device to an SES Device function lists all empty hot-swap slots that are available for the
insertion of a new device. After a slot is selected, the power is removed. If available, the visual indicator
for the selected slot is set to the Remove state. After the device is added, the visual indicator for the
selected slot is set to the Normal state, and power is restored.
The Replace/Remove a Device Attached to an SES Device function lists all populated hot-swap slots
that are available for removal or replacement of the devices. After a slot is selected, the device populating
that slot is Unconfigured; then the power is removed from that slot. If the Unconfigure operation fails, it
is possible that the device is in use by another application. In this case, the customer or system
administrator must be notified to quiesce the device. If the Unconfigure operation is successful, the visual
indicator for the selected slot is set to the Remove state. After the device is removed or replaced, the
visual indicator, if available for the selected slot, is set to the Normal state, and power is restored.
Note: Be sure that no other host is using the device before you remove it.
The Configure Added/Replaced Devices function runs the configuration manager on the parent
adapters that had child devices added or removed. This function ensures that the devices in the
configuration database are configured correctly. Standalone Diagnostics has restrictions on using the
SCSI Hot-plug Manager. For example:
Devices being used as replacement devices must be exactly the same type of device as the device
being replaced.
New devices may not be added unless a device of the same FRU part number already exists in the
system, because the configuration information for the new device is not known after the Standalone
Diagnostics are booted. You can run this task directly from the command line.
See the following command syntax:
diag -d device -T"identifyRemove"
OR
diag [-c] -d device -T"identifyRemove -a [identify|remove]"
Flag descriptions:
-a    Specifies the option under the task.
-c    Runs the task without displaying menus; only command line prompts are used. This flag is only applicable when running an option such as identify or remove.
-d    Indicates the SCSI device.
-T    Specifies the task to run.
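For example, using the syntax above with a hypothetical device name, the identify option can be run without menus as follows:
diag -c -d hdisk2 -T"identifyRemove -a identify"
This turns on the identify indicator for the slot containing hdisk2 using command line prompts only.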
Diagnostic Device list by setting PDiagAtt->attribute=test_mode. Error log analysis can be directed to
run at different times.
Problems are reported by a message to the system console, and a mail message is sent to all members of
the system group. The message contains the SRN.
The diagela program determines whether the error should be analyzed by the diagnostics. If the error
should be analyzed, a diagnostic application will be invoked and the error will be analyzed. No testing is
done. If the diagnostics determines that the error requires a service action, it sends a message to your
console and to all system groups. The message contains the SRN, or a corrective action.
If the resource cannot be tested because it is busy, error log analysis is performed. Hardware errors
logged against a resource can also be monitored by enabling Automatic Error Log Analysis. This allows
error log analysis to be performed every time a hardware error is put into the error log. If a problem is
detected, a message is posted to the system console and a mail message sent to the users belonging to the
system group containing information about the failure, such as the service request number.
The service aid provides the following functions:
Add or delete a resource to the periodic test list
Modify the time to test a resource
Display the periodic test list
Modify the error notification mailing list
Disable or Enable Automatic Error Log Analysis
To enable the Automatic Error Log Analysis feature, log in as root and type the following command:
/usr/lpp/diagnostics/bin/diagela ENABLE
To disable the Automatic Error Log Analysis feature, log in as root and type the
following command:
/usr/lpp/diagnostics/bin/diagela DISABLE
2.2.6 Save or Restore Hardware Management Policies
Use this service aid to save or restore the settings from Ring Indicate Power-On Policy, Surveillance
Policy, Remote Maintenance Policy and Reboot Policy. The following options are available:
Save Hardware Management Policies
This selection writes all of the settings for the hardware-management policies to the following file:
/etc/lpp/diagnostics/data/hmpolicies
When this service aid is run from standalone diagnostics, the flash update image file is copied to the file
system from diskette. The user must provide the image on a backup diskette because the user does not
have access to remote file systems or any other files that are on the system. If not enough space is
available, an error is reported, stating additional system memory is needed. After the file is copied, a
screen requests confirmation before continuing with the update flash. Continuing the update flash
reboots the system using the reboot -u command. You may receive a Caution: some process(es) wouldn't
die message during the reboot process. You can ignore this message. The current flash image is not
saved.
(Diagram: pSeries 690 I/O drawer layout - 16 DASD bays (positions 1 through 16) on the midplane, two I/O planars each with SCSI/SES components and RIO ports, and PCI slots 1 through 20.)
The user will establish the configuration using the following procedure, described here at a high level.
The user will need to use SMIT.
mirrorvg Command
Purpose
Mirrors all the logical volumes that exist on a given volume group. This command only applies to AIX
Version 4.2.1 or later.
Syntax
mirrorvg [ -S | -s ] [ -Q ] [ -c Copies ] [ -m ] VolumeGroup [ PhysicalVolume ... ]
Description
The mirrorvg command takes all the logical volumes on a given volume group and mirrors those logical
volumes. This same functionality may also be accomplished manually if you execute the mklvcopy
command for each individual logical volume in a volume group. As with mklvcopy, the target physical
drives to be mirrored with data must already be members of the volume group. To add disks to a volume
group, run the extendvg command.
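For example, to add a new disk to a volume group and then mirror the volume group (the volume group and disk names are illustrative):
extendvg datavg hdisk4
mirrorvg datavg
The extendvg command adds hdisk4 to datavg; mirrorvg then creates a second copy of each logical volume in the group.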
By default, mirrorvg attempts to mirror the logical volumes onto any of the disks in a volume group. If
you wish to control which drives are used for mirroring, you must include the list of disks in the input
parameters, PhysicalVolume. Mirror strictness is enforced. Additionally, mirrorvg mirrors the logical
volumes, using the default settings of the logical volume being mirrored. If you wish to violate mirror
strictness or affect the policy by which the mirror is created, you must execute the mirroring of all logical
volumes manually with the mklvcopy command.
When mirrorvg is executed, the default behavior of the command requires that the synchronization of the
mirrors must complete before the command returns to the user. If you wish to avoid the delay, use the -S
or -s option. Additionally, the default value of 2 copies is always used. To specify a value other than 2,
use the -c option.
Notes:
1. This command ignores striped logical volumes. Mirroring striped logical volumes
is not possible.
2. To use this command, you must either have root user authority or be a member of
the system group.
Attention: The mirrorvg command may take a significant amount of time to complete because of complex error checking, the number of logical volumes to mirror in the volume group, and the time it takes to synchronize the new mirrored logical volumes.
You can use the Web-based System Manager Volumes application (wsm lvm fast path) to run this
command. You could also use the System Management Interface Tool (SMIT) smit mirrorvg fast path
to run this command.
Flags
-c Copies: Specifies the minimum number of copies that each logical volume must have after the mirrorvg command has finished executing. It may be possible, through the independent use of mklvcopy, that some logical volumes have more than the minimum number specified after the mirrorvg command has executed. The minimum value is 2 and the maximum value is 3. A value of 1 is ignored.
-m Exact Map: Allows mirroring of logical volumes in the exact physical partition order of the original copy. This option requires you to specify the PhysicalVolume(s) where the exact map copy should be placed. If the space is insufficient for an exact mapping, the command will fail; add new drives or pick a different set of drives that will satisfy an exact logical volume mapping of the entire volume group. The designated disks must equal or exceed the size of the drives that are to be exactly mirrored, regardless of whether the entire disk is used. Also, if any logical volume to be mirrored is already mirrored, this command will fail.
-Q Quorum Keep: By default in mirrorvg, when a volume group's contents become mirrored, volume group quorum is disabled. If the user wishes to keep the volume group quorum requirement after mirroring is complete, this option should be used in the command. For later quorum changes, refer to the chvg command.
-S Background Sync: Returns the mirrorvg command immediately and starts a background syncvg of the volume group. With this option, it is not obvious when the mirrors have completely finished their synchronization. However, as portions of the mirrors become synchronized, they are immediately used by the operating system in mirror usage.
-s Disable Sync: Returns the mirrorvg command immediately without performing any type of mirror synchronization. If this option is used, the mirror may exist for a logical volume but is not used by the operating system until it has been synchronized with the syncvg command.
rootvg mirroring: When rootvg mirroring has completed, you must perform three additional tasks: bosboot, bootlist, and reboot. The bosboot command is required to customize the bootrec of the newly mirrored drive. The bootlist command needs to be performed to instruct the system which disks, and in what order, the mirrored boot process should use. Finally, the default of this command is for Quorum to be turned off; for this to take effect on a rootvg volume group, the system must be rebooted.
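A sketch of the complete rootvg sequence, using the illustrative disk names hdisk0 (original) and hdisk1 (new mirror), follows:
extendvg rootvg hdisk1
mirrorvg rootvg hdisk1
bosboot -a -d /dev/hdisk1
bootlist -m normal hdisk0 hdisk1
shutdown -Fr
The bosboot command writes a boot image to the newly mirrored drive, bootlist sets the preferred boot disk order, and the reboot allows the quorum change on rootvg to take effect.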
non-rootvg mirroring: When this volume group has been mirrored, the default command causes Quorum to be deactivated. The user must close all open logical volumes, then execute varyoffvg and varyonvg on the volume group for the system to understand that quorum is or is not needed for the volume group. If you do not revaryon the volume group, mirroring will still work correctly; however, any quorum changes will not have taken effect.
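For a non-rootvg volume group, the revaryon sequence might look like the following (the volume group name and mount point are illustrative):
umount /data
varyoffvg datavg
varyonvg datavg
mount /data
All logical volumes in the group, including any file systems, must be closed before varyoffvg will succeed; varying the group back on picks up the new quorum setting.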
rootvg and non-rootvg mirroring: The system dump devices, primary and secondary, should not be mirrored. In some systems, the paging device and the dump device are the same device; however, most users want the paging device mirrored. When mirrorvg detects that a dump device and the paging device are the same, the logical volume is mirrored automatically. If mirrorvg detects that the dump and paging devices are different logical volumes, the paging device is automatically mirrored, but the dump logical volume is not. The dump device can be queried and modified with the sysdumpdev command.
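For example, the current dump device assignments can be listed, and the primary dump device permanently changed to a dedicated (illustrative) logical volume, as follows:
sysdumpdev -l
sysdumpdev -P -p /dev/dumplv
The -l flag lists the current primary and secondary dump devices; -P with -p makes the change to the primary dump device permanent across reboots.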
Examples
1. To triply mirror a volume group, enter:
mirrorvg -c 3 workvg
The logical partitions in the logical volumes held on workvg now have three copies.
2. To get default mirroring of rootvg, enter:
mirrorvg rootvg
rootvg now has two copies.
3. To replace a bad disk drive in a mirrored volume group, enter
unmirrorvg workvg hdisk7
reducevg workvg hdisk7
rmdev -l hdisk7 -d
replace the disk drive, let the drive be renamed hdisk7
extendvg workvg hdisk7
mirrorvg workvg
Note: By default in this example, mirrorvg will try to create 2 copies for logical volumes
in workvg. It will try to create the new mirrors onto the replaced disk drive. However, if
the original system had been triply mirrored, there may be no new mirrors created onto
hdisk7, as other copies may already exist for the logical volumes.
4. To sync the newly created mirrors in the background, enter:
mirrorvg -S -c 3 workvg
5. To create an exact mapped volume group, enter:
mirrorvg -m datavg hdisk2 hdisk3
Implementation Specifics
Software Product/Option: Base Operating System/ AIX 3.2 to 4.1 Compatibility
Links
Standards Compliance: NONE
Files
/usr/sbin    Directory where the mirrorvg command resides.
Redundant Array of Independent Disks (RAID) is a term used to describe the technique of improving
data availability through the use of arrays of disks and various data-striping methodologies. Disk arrays
are groups of disk drives that work together to achieve higher data-transfer and I/O rates than those
provided by single large drives. An array is a set of multiple disk drives plus a specialized controller (an
array controller) that keeps track of how data is distributed across the drives. Data for a particular file is
written in segments to the different drives in the array rather than being written to a single drive.
Arrays can also provide data redundancy so that no data is lost if a single drive (physical disk) in the array
should fail. Depending on the RAID level, data is either mirrored or striped.
Subarrays are contained within an array subsystem. Depending on how you configure it, an array
subsystem can contain one or more sub-arrays, also referred to as Logical Units (LUN). Each LUN has
its own characteristics (RAID level, logical block size and logical unit size, for example). From the
operating system, each subarray is seen as a single hdisk with its own unique name.
RAID algorithms can be implemented as part of the operating system's file system software, or as part of
a disk device driver (common for RAID 0 and RAID 1). These algorithms can be performed by a locally
embedded processor on a hardware RAID adapter. Hardware RAID adapters generally provide better
performance than software RAID because embedded processors offload the main system processor by
performing the complex algorithms, sometimes employing specialized circuitry for data transfer and
manipulation.
Each of the RAID levels supported by disk arrays uses a different method of writing data and hence
provides different benefits.
RAID 0 is also known as data striping. RAID 0 is only designed to increase performance; there is no
redundancy, so any disk failures require reloading from backups. It is not recommended to use this level
for critical applications that require high availability.
RAID 1 is also known as disk mirroring. It is most suited to applications that require high data
availability, good read response times, and where cost is a secondary issue. The response time for writes
can be somewhat slower than for a single disk, depending on the write policy; the writes can either be
executed in parallel for speed or serially for safety. Select RAID Level 1 for applications with a high
percentage of read operations and where the cost is not the major concern.
RAID 2 is rarely used. It implements the same process as RAID 3, but can utilize multiple disk drives for
parity, while RAID 3 can use only one.
RAID 3 and RAID 2 are parallel process array mechanisms, where all drives in the array operate in
unison. Similar to data striping, information to be written to disk is split into chunks (a fixed amount of
data), and each chunk is written out to the same physical position on separate disks (in parallel). More
advanced versions of RAID 2 and 3 synchronize the disk spindles so that the reads and writes can truly
occur simultaneously (minimizing rotational latency buildups between disks). This architecture requires
parity information to be written for each stripe of data; the difference between RAID 2 and RAID 3 is
that RAID 2 can utilize multiple disk drives for parity, while RAID 3 can use only one. The LVM does not support RAID 3; therefore, a RAID 3 array must be used as a raw device from the host system.
RAID 3 provides redundancy without the high overhead incurred by mirroring in RAID 1.
RAID 4 addresses some of the disadvantages of RAID 3 by using larger chunks of data and striping the
data across all of the drives except the one reserved for parity. Write requests require a
read/modify/update cycle that creates a bottleneck at the single parity drive. Therefore, RAID 4 is not
used as often as RAID 5, which implements the same process, but without the parity volume bottleneck.
RAID 5, as has been mentioned, is very similar to RAID 4. The difference is that the parity information is
distributed across the same disks used for the data, thereby eliminating the bottleneck. Parity data is never
stored on the same drive as the chunks that it protects. This means that concurrent read and write
operations can now be performed, and there are performance increases due to the availability of an extra
disk (the disk previously used for parity). There are other enhancements possible to further increase data
transfer rates, such as caching simultaneous reads from the disks and transferring that information while
reading the next blocks. This can generate data transfer rates at up to the adapter speed.
RAID 6 is similar to RAID 5, but with additional parity information written that permits data recovery if
two disk drives fail. Extra parity disk drives are required, and write performance is slower than a similar
implementation of RAID 5.
The RAID 7 architecture gives data and parity the same privileges. The level 7 implementation allows
each individual drive to access data as fast as possible. This is achieved by three features:
RAID 10 - RAID-0+1
RAID-0+1, also known in the industry as RAID 10, implements block-interleave data striping and mirroring. RAID 10 is not formally recognized by the RAID Advisory Board (RAB), but it is an industry-standard term. In RAID 10, data is striped across multiple disk drives, and then those drives are mirrored to another set of drives.
RAID 10 provides an enhanced feature for disk mirroring that stripes data and copies the data across all
the drives of the array. The first stripe is the data stripe; the second stripe is the mirror (copy) of the first
data stripe, but it is shifted over one drive. Because the data is mirrored, the capacity of the logical drive
is 50 percent of the physical capacity of the hard disk drives in the array.
The PCI adapter slots controlled by the various PCI Host Bridges are shown in the diagram below.
For placement of redundant adapters or disks, refer to the following section on “I/O Drawer Additions”.
(Diagram: PCI structure of an I/O planar - Speedwagon PCI Host Bridges and EADS PCI-to-PCI bridges (PHB1 through PHB6) controlling the PCI adapter slots.)
For more information about your device and its capabilities, see the documentation
shipped with that device. For a list of supported adapters and a detailed discussion
about adapter placement, refer to the PCI Adapter Placement Reference, order number
SA38-0538.
2.4.1.1 Non Enhanced Error Handling (EEH) and Third Party Adapters
Because some devices and most third party adapters do not have enhanced error handling (EEH)
capabilities built into their device drivers, non-EEH I/O Adapters (IOA) on a PHB should be solely in
one partition - do not split the PHB between partitions as a failure on a non-EEH adapter will affect all
other adapters on that PHB.
This description uses the term “EADS”, which is IBM unique hardware in the I/O drawer controlling
each PCI slot. EADS, firmware, and device drivers act in concert to support EEH. The exploitation of
EADS error handling occurs in three scenarios:
During firmware probing of the PCI space during boot time configuration
During PCI hot-plug when RTAS configures a PCI adapter after insertion/replacement
During normal run-time operation or AIX diagnostic run.
At boot time, firmware can now deconfigure adapters which cause errors on the PCI bus, and continue
with the boot process, whereas in the past these types of failures would have caused machine checks and
prevented the system from booting. The basic firmware process is as follows:
1. Detect and configure Speedwagon (PHB) bridges.
2. Detect and configure EADS bridges. In each parent PHB, an “ibm,eeh-implemented” property is
added to indicate the number and addresses of all freeze mode capable slots under that PHB.
3. Set all EADS bridges to freeze on error (Bridge Arbitration, Config space 0x0040, bit 16 =1)
4. Probe for devices under each EADS bridge.
5. If a return value of 0xFFFFFFFF is returned on any of the PCI config cycles or direct load requests,
check the freeze state of the bridge (Bridge Interrupt Status, BAR 0 + 0x1234, bit 25:26)
6. If frozen, bypass the configuration of that adapter, leave in freeze mode, and continue with PCI
probing.
7. Before turning control over to AIX, firmware must return all slots to a freeze mode disabled state.
8. Device drivers that support freeze mode detection will re-enable freeze mode for their respective
adapters on a slot-by-slot basis, using the ibm,set-eeh-option RTAS call.
During PCI hot-plug, the firmware scenario would be similar to steps 3-8 above, but targeted to a single
EADS bridge and slot that is being configured by the ibm,configure-connector RTAS call.
To support EADS-based recovery during run-time operation, AIX device drivers and diagnostic must
detect the return values of 0xFFFFFFFF on MMIO Loads to the device address space, and then:
Use the ibm,read-slot-reset-state RTAS function to detect if the bridge is in freeze state
Reset the device using the ibm,set-slot-reset RTAS function
Attempt any appropriate recovery for device operations
The following diagram shows the advantages of configuring redundant adapters and disks across multiple
I/O drawers to eliminate potential single points of failure.
(Diagram: RIO cabling from the MCM GX ports (I/O slots 1 through 3) to I/O drawers 3 through 6 in the primary and expansion racks, illustrating how redundant adapters and disks can be spread across multiple I/O drawers.)
The following is the list of least to most preferred configurations for populating redundant adapters and
disks in the I/O drawers to provide increasing levels of redundancy and therefore increasing levels of
availability.
1. Redundant adapters/disks within same half of one I/O drawer (i.e. same I/O planar)
2. Redundant adapters/disks across I/O planars (half drawers) within same I/O drawer
3. Redundant adapters/disks across I/O drawer pairs on same I/O card (i.e. I/O drawer pairs 1 & 2 or
pairs 3 & 4 or pairs 5 & 6)
4. Redundant adapters/disks across I/O Driver Cards (i.e. I/O Drawer 1 or 2 and 3 or 4, etc.)
5. Redundant systems within an HACMP cluster.
Each increasing level in the list above removes additional single points of failure from the configuration. However, because of the focus on high-reliability components and the hardening of single-point-of-failure components, any configuration at or above configuration 2 (redundancy across I/O planars) is suggested. The customer will have to decide how much to invest in redundancy to obtain the required level of availability for their environment.
Note: These actions need to be performed on both HMCs if redundant HMCs are installed.
Refer to the “Hardware Management Console for pSeries Operations Guide” for additional information
on how to perform the service enablements described.
2.5.3 Establish HMC to OS Partition Network Link for Error Reporting and Management
In order for recovered errors which require service to be sent to the Service Focal Point application on the Hardware Management Console, there must be a network link from the operating system image running on the system to the HMC. This link is configured on the HMC utilizing the System Configuration set of tools.
2.5.3.1 Customizing Network Settings
Use this section to attach the HMC to a network.
Customize your network settings to edit IP (internet protocol) addresses, name services, and routing
information.
Note: Changes made to your HMC’s network settings do not take effect until you reboot the HMC.
To customize network settings, you must be a member of one of the following roles:
Advanced Operator
System Administrator
Service Representative
If a customer so chooses, a redundant HMC can be configured for system and service management
purposes. If a failure causes one of the HMCs to become unavailable, the customer can access the system
through the alternate HMC. Note that either HMC can be used if both are available. They are configured in a peer relationship (as opposed to Master/Slave): there is no limiting primary/backup relationship between them, and either can be used at any time.
Based on the above changes for LPAR mode, the following table represents the desired settings for
configuring for High Availability utilizing the SP menus or the alternate methods indicated.
Refer to the “Boot Options Utilizing SP Menus and Alternatives” section on page 6 of the paper for descriptions of how to invoke the recommended settings. Some of the options are only valid when configured in the Service Authority partition, as stated in the above table. Otherwise, the options should be enabled in each operating system image in order to be effective for that LPAR image.
If the firmware update image is on backup diskettes, perform the firmware update from the service processor
menus as a privileged user. If the firmware update image is in a file on the system, reboot the system in
SMP mode and follow the normal firmware update procedures. If the system is already in SMP mode,
follow the normal firmware update procedures.
Refer to section “Update System or SP Flash” on page 23 for AIX command or Service Aid to perform
this function.
(Diagram: PCI structure of an I/O planar - Speedwagon PCI Host Bridges and EADS PCI-to-PCI bridges (PHB1 through PHB6) - repeated here to illustrate the allocation of PHBs and slots among partitions.)
Linux partitions
Linux partitions must be given entire PHBs, since Linux device drivers are non-EEH.
AIX partitions
Non-EEH I/O Adapters (IOAs), including most third party adapters, on a PHB should be solely in one partition - do not split the PHB between partitions, as a failure on a non-EEH adapter will affect all other adapters on that PHB.
For LPARs that have non-EEH adapters, place all IOAs together (non-EEH and EEH), since a failure of a non-EEH adapter will affect the whole partition.
If all adapters support EEH, then a mix of LPARs on one PHB is allowed.
Note: This is an allowed configuration but is not the preferred one. See the following list for allocating adapters for increasing levels of availability within partitions.
1. LPARs use entire PHBs (4 slots or 3 slots plus integrated SCSI)
2. LPARs use entire I/O planars which is half a Bonnie & Clyde (10 slots plus 2 integrated SCSI)
3. LPARs use entire I/O drawer
4. LPARs use I/O drawer pairs on same I/O card (i.e. I/O drawer pairs 1 & 2 or pairs 3 & 4 or pairs 5
& 6)
The customer will have to choose which of the above recommended configurations will work best for them, taking into consideration the number of adapters to be configured in the partition, the adapter placement guidelines, and the availability and performance objectives for the partition.
3.4.2 I/O Drawer Additions
The only additional concern related to LPAR when adding I/O drawers, beyond what has been stated in the SMP section, is the proper allocation of these resources among the various partitions. Adding more I/O drawers increases the capacity available for configuring for High Availability. Please refer to the section “I/O Drawer Additions” on page 33, in conjunction with the above guidelines for assigning I/O adapters to partitions and the “PCI Adapter Placement Reference”, for determining preferred configurations.
Enable systems for automatic call home feature - Service Focal Point, “Enable / Disable Call Home” - Default: Unknown - Recommended: Set system enablement to call for service when a serviceable action is reached.
Configure Service Agent to automatically call for service on error - Service Agent, “Service Agent UI - Registration/Customization” - Default: Not configured - Recommended: Configure Service Agent parameters to allow automatic call for service on error. Note: Should be performed for each OS image.
Enable for gathering of extended error data - Service Focal Point, “Enable / Disable Extended Error Data Collection” - Default: Unknown - Recommended: Enable systems to gather extended error data for service events. Note: Should be performed for each OS image.
Connecting two HMCs to a single system for redundancy - System Management Environment, Navigation Area - Default: None - Recommended: Configure the redundant HMC in the navigation area for viewing events.
Schedule critical console data backups - System Configuration, “Scheduled Operations” - Default: None - Recommended: Back up critical console data (including service data) on a scheduled basis.
Note: These actions need to be performed on both HMCs if redundant HMCs are installed.
Most of the options in the table are described in the section “IBM Hardware Management Console for pSeries (HMC)” on page 35, with the exception of creating a Service Authority partition, which is described below. Note that certain options need to be configured for each operating system image in each of the LPARs, as specified in the table.
7. Select boot in partition standby as a boot mode so that the managed system boots to partition
standby mode.
8. Click OK to power on the managed system. In the Contents area, the managed system’s state changes
from No Power to Initializing . . . and then to Ready. When the state reads Ready and the Operator Panel
Value reads LPAR . . . , continue with the next step.
9. Select the managed system in the Contents area.
10. Select Create.
11. Select Partition.
Note: You cannot create a partition without first creating a default profile.
The system automatically prompts you to begin creating a default partition profile for that partition. Do
not click OK until you have assigned all the resources you want to your new partition.
12. Name the default partition profile that you are creating in the General tab in the Profile name field.
Use a unique name for each partition that you create, up to 31 characters long.
13. Click the Memory tab on the Lpar Profile panel. The HMC shows you the total amount of memory configured for use by the system, and prompts you to enter the amount of desired and required memory. Enter the amount of desired and required memory in 1 gigabyte (GB) and 256 megabyte (MB) increments. You must have a minimum of 1 GB for each partition.
Note: The HMC shows the total amount of installed, usable memory. It does not show how much memory the system is using at the time.
14. Click the Processor tab on the LPAR Profile window. The HMC shows the total number of
processors available for use on the system and prompts you to enter the number of processors you
desire and require.
15. Click the I/O tab on the LPAR Profile window. The left side of the dialog displays the I/O drawers
available and configured for use. The I/O Drawers field also displays the media drawer located on the
managed system itself. This grouped I/O is called Native I/O and is identified with the prefix CEC as
shown in the following example. Expand the I/O drawer tree by clicking the icon next to the drawer.
16. Click on the slot to learn more about the adapter installed in that slot. When you select a slot, the field
underneath the I/O drawer tree lists the slot’s class code (in the following example, Token Ring
controller) and physical location code ( P2-I7).
Note: Take note of the slot number when selecting a slot. The slots are not listed in sequential order.
17. Select the slot you want to assign to this partition profile and click Add. If you want to add another
slot, repeat this process. Slots are added individually to the profile; you cannot add more than one slot at
a time. Minimally, you should add a boot device to the required field.
18. Click the Other tab on the LPAR Profile window. This window allows you to set service authority
and boot mode policies for this partition profile. Click the box next to Set Service Authority if you want
this partition to be used by service technicians to perform system firmware updates.
19. Click the button next to the boot mode you want for this partition profile. The following example
shows that this user selected service authority for this partition and wants to boot the partition to Normal
mode.
20. Before clicking OK, review each tab of the Partition Properties dialog to ensure you have assigned
all of the resources you need for this partition profile. Click OK when you are finished
assigning resources in the Partition Properties dialog. The default profile appears underneath the
Managed System tree in the Contents area.
21. Now that you have created a partition, you must install an operating system on it for it to function.
To install an operating system on the partition, be sure you have the appropriate resources allocated to
the partition you want to activate and refer to the installation information shipped with your operating
system.
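As a sketch only, once AIX is running in the new partition, the memory and processor resources actually delivered by the partition profile can be confirmed from the AIX command line; output formats vary by AIX level.

   # Display the real memory (in KB) visible to this partition
   lsattr -El sys0 -a realmem

   # List the processors configured in this partition
   lsdev -Cc processor

If the values reported do not match the desired settings in the profile, review the profile's Memory and Processor tabs on the HMC.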
[Figure: Service Focal Point error reporting flow. Error logs in each operating system image are filtered and forwarded by Resource Monitoring and Control (RMC) over the customer network to the Service Focal Point and Service Agent gateway on the Hardware Management Console, which places the call home; recoverable and fatal hardware errors from the firmware/hypervisor and service processor reach the HMC over the private service links.]
The HMC requires a separate Ethernet adapter for connection to a customer network to enable direct
communication between the HMC and the operating systems running in the partitions.
Refer to section “Establish HMC to OS partition network link for error reporting and management” on
page 36 for a description of how to customize the network link between the HMC and the operating
system images. This procedure should be performed for each partition.
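As a minimal sketch of verifying this link from within a partition, the commands below check that the partition's network interfaces are configured and that the HMC is reachable over the customer network; the HMC hostname hmc1 is a placeholder, and command flags may vary slightly by AIX level.

   # Confirm the partition's network interfaces are up and have addresses
   netstat -in

   # Verify basic IP connectivity from the partition to the HMC
   # (hmc1 is a placeholder hostname for your console)
   ping -c 3 hmc1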
Standalone diagnostics may be run either from diagnostics CD-ROM media (of the correct level for the
machine you wish to run it on) or by booting standalone diagnostics from a NIM server. The first part of
this section deals with issues related to booting standalone diagnostics from CD-ROM media, specifically
within a static LPAR system: optimizing the logical assignment of the CD-ROM drive (and associated
devices attached to the same SCSI bus) so as to minimize the impact to users of the system should it
become necessary to run standalone diagnostics on a given partition. The next part discusses NIM boot
of standalone diagnostics.
NOTES:
1: Terminology: This information pertains to placement of CD-ROM or DVD-RAM drives (i.e. devices
capable of reading an AIX or diagnostics CD-ROM) whose intended use includes booting standalone
diagnostics from diagnostics CD-ROM media. This specifically excludes the use of the DVD-RAM drive
in the HMC, which is not intended to be used to boot AIX diagnostic media.
2: Definition: a CD-ROM drive is said to be "parked" in a given partition if it is not imminently needed
by any partition in the system, but is instead assigned to a location from which it can be moved to another
partition, if needed, with minimum impact to the users of the system. Alternatively, the drive may be left
in the partition where it is most likely to be used. If the CD-ROM drive is temporarily
needed by another partition, it (and its attached associated resources) may be logically assigned to the
requesting partition; however, because both the source partition containing the drive and the destination
partition must be rebooted whenever the drive is reassigned, it is desirable to minimize the movement
of the drive between partitions. In particular, avoid "parking" a drive in a partition where
the need to reboot that partition at an untimely moment would negatively impact users of that partition.
Once the need for the drive in a partition has passed (for instance, after running standalone AIX
diagnostics on that partition), the drive should be moved back to the partition where it is intended to
be "parked". This way the reboots can be done at a time when they have the least impact to
users of both the source and destination partitions, rather than having to move the drive at an inopportune
time when the destination partition urgently needs to have the CD-ROM drive assigned to it.
General recommendations to maximize availability (minimize the need to reboot partitions) when running
standalone diagnostics from CD-ROM:
If the managed system contains more than one CD-ROM drive, each drive should, if possible, be
physically connected to a SCSI controller associated with a different PCI "slot". (Some SCSI adapters
consist of two different SCSI controllers, both associated with the same PCI "slot"; even if the adapter is
integrated onto the system I/O planar, it can be thought of as occupying a PCI slot.) If possible, the
controllers in these "slots" should be assigned to different LPAR partitions.
Maximizing the number of CD-ROM drives in the system, up to the highest number supported, or such that
one CD-ROM drive is available in as many partitions as possible (to a maximum of one CD-ROM drive
per partition), is desirable to minimize the need for rebooting partitions in order to move a drive from
one partition to another.
If the system contains a partition with service authority, the CD-ROM drive should ideally be "parked" at
that partition (or, if the system has more than one drive, one drive should be "parked" at the service
partition). Assuming that the service partition may be more readily rebooted than other partitions, there
is less impact than if the CD-ROM drive were "parked" on a partition where rebooting
would be less desirable.
If devices that require processing of supplemental media are present on a system, they should, if possible,
be in the same partition. The diskette drive should be located in the same partition as the CD-ROM drive
from which one wants to run standalone diagnostics.
Standalone diagnostics boot from CD-ROM requires a functioning CD-ROM drive, associated SCSI
controller and PCI bus, plus a functioning path between the PCI bus, the CEC, and memory.
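To determine which partition currently holds a drive and which SCSI controller (and therefore which PCI slot) it depends on, the drive and its parent controller can be identified from the owning partition. The following is a sketch; the device name cd0 and the reported parent are examples only.

   # List the CD-ROM/DVD-RAM devices configured in this partition
   lsdev -Cc cdrom

   # Show the physical location code of the drive (identifies drawer and slot)
   lscfg -vl cd0

   # Show the parent SCSI controller of the drive
   lsdev -Cl cd0 -F parent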
If possible, to avoid having to reboot partitions when moving the CD-ROM drive between partitions, it is
desirable to have one partition (or another system external to the LPAR system) set up as a NIM server.
This allows the partitions to boot standalone diagnostics from the NIM server instead of from the
CD-ROM drive (and, as an additional benefit, can help administer the installation of AIX on partitions from
the NIM server). This requires that a network adapter tied to the same network as the NIM server be present
in each partition in which you want to be able to do a NIM boot of standalone diagnostics.
Instead of having to move a CD-ROM drive between source and destination partitions to be able to run
standalone diagnostics from CD-ROM, a NIM boot of standalone diagnostics requires a reboot of only
the partition on which one wants to run standalone diagnostics.
However, the system administrator must properly configure the NIM server and keep it
up to date with the images necessary to support its client partitions. Also, instead of supplemental
diskettes for adapters that otherwise require them with CD-ROM standalone diagnostics, the NIM server
image must be loaded with support for all the devices installed in the client partitions. Because
setup of the NIM server and clients is not trivial, it is highly recommended that, once it is set up, the
administrator attempt a NIM boot of standalone diagnostics from each partition that might
later rely on it, so that any setup problems can be debugged first on a working system. Also,
NIM boot of standalone diagnostics requires a functioning network adapter in the partition, as well as the
associated PCI bus and path to the CEC and system memory. Note, however, that this gives the service
person an alternative way to load standalone diagnostics (for example, if the CD-ROM drive, SCSI adapter, or
associated PCI logic is not functioning in a given partition).
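For reference, the sketch below shows one way a NIM boot of standalone diagnostics might be prepared, assuming a NIM master with a SPOT resource named spot1 and a client machine object named lpar3; these names are placeholders, and the exact options depend on your NIM configuration and AIX level.

   # On the NIM master: enable the diag operation for the client partition
   nim -o diag -a spot=spot1 lpar3

   # On the client partition: place the network adapter first in the normal
   # boot list, then reboot to load standalone diagnostics from the NIM master
   bootlist -m normal ent0 hdisk0
   shutdown -Fr

Performing this dry run on each client partition while the system is healthy is the easiest way to confirm the NIM setup before it is needed for service.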
4.0 HACMP
High Availability Cluster Multi-Processing (HACMP) allows the configuration of a cluster of pSeries systems that
provides highly available disk, network, and application resources. These systems may be
SMP systems or LPAR-based systems, and the clusters may be comprised of the following configurations:
- Multiple SMP systems
- One or more SMP systems and one or more operating system images on one or more LPAR systems
- LPAR OS images across multiple LPAR systems: one or more LPAR images from one system coupled
  to one or more images from another LPAR system
- LPAR OS images coupled on the same LPAR system: multiple OS images coupled on the same
  hardware platform. This configuration may be susceptible to hardware single points of failure and is
  recommended for protection from OS or application faults only.
If high availability is critical, then providing just a partial solution is usually not sufficient. In such
situations, we recommend implementing high availability failover capability between partitions in
separate servers. Small partitions may require only a relatively small separate partition or system as the failover target.
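As a hedged sketch, whether HACMP cluster services are active in a given SMP system or LPAR image can be checked from AIX with the commands below; subsystem names and the clstat utility path and flags vary between HACMP releases (the path shown applies to HACMP/ES).

   # Check that the HACMP cluster subsystems are active in this OS image
   lssrc -g cluster

   # Display cluster, node, and interface status in ASCII mode (HACMP/ES)
   /usr/es/sbin/cluster/clstat -a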
6.0 Conclusions
The ideas set forth in this paper are intended to help the system administrator or I/S manager plan
the configuration of the p690 for higher levels of availability. Each suggestion should be evaluated against the
ultimate availability objective to determine the benefit obtained versus the potential additional cost
incurred by implementing the recommendation.
As stated in the beginning of this document, there are many factors outside the scope of this white paper
which influence the ultimate availability achieved by your specific configuration. Topics such as the
training of the operations and support personnel, the physical environment and surroundings where the
system resides, and the operations policies and procedures are a few examples of these.
The procedures recommended in this white paper coupled with the inherent reliability of the p690 system
and its robust RAS features and functions should help facilitate configuring this system to meet your
availability requirements for years to come.
Additional assistance with performing availability analyses for your specific environment may be
contracted through IBM’s Global Services division or obtained through your marketing representative.
IBM Corporation
Marketing Communications
Server Group
Route 100
Somers, NY 10589
More details on IBM UNIX hardware, software and solutions may be found at:
ibm.com/servers/eserver/pseries.
You can find notices, including applicable legal information, trademark attribution, and notes on
benchmark and performance at www.ibm.com/servers/eserver/pseries/hardware/specnote.html
IBM, the IBM logo, the e-business logo, AIX, and pSeries are registered trademarks or
trademarks of the International Business Machines Corporation in the United States and/or other
countries. The list of all IBM marks can be found at:
http://iplswww.nas.ibm.com/wpts/trademarks/trademar.htm.
Other company, product and service names may be trademarks or service marks of others.
IBM may not offer the products, programs, services or features discussed herein in other
countries, and the information may be subject to change without notice.
IBM hardware products are manufactured from new parts, or new and used parts. Regardless,
our warranty terms apply.
All statements regarding IBM's future direction and intent are subject to change or withdrawal
without notice, and represent goals and objectives only.
Any performance data contained in this document was determined in a controlled environment.
Results obtained in other operating environments may vary significantly.