Elastic Storage Server
Version 5.3.1

Problem Determination Guide

IBM

GC27-9272-00
Note
Before using this information and the product it supports, read the information in “Notices” on page 97.
This edition applies to version 5.3.1 of the Elastic Storage Server (ESS) for Power, to version 5 release 0 modification
1 of the following product, and to all subsequent releases and modifications until otherwise indicated in new
editions:
v IBM Spectrum Scale RAID (product number 5641-GRS)
Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the
change.
IBM welcomes your comments; see the topic “How to submit your comments” on page ix. When you send
information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes
appropriate without incurring any obligation to you.
© Copyright IBM Corporation 2014, 2018.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
Tables  . . . . . . . . . . . . . . . . . . . . . . . .  v

Chapter 1. Drive call home in 5146 and 5148 systems . . . 1
Background and overview . . . . . . . . . . . . . . . . . 1
Installing the IBM Electronic Service Agent . . . . . . . 2
Login and activation  . . . . . . . . . . . . . . . . . . 3
Electronic Service Agent configuration  . . . . . . . . . 4
Creating problem report . . . . . . . . . . . . . . . . . 7
Uninstalling and reinstalling the IBM Electronic Service Agent . . 12
Test call home  . . . . . . . . . . . . . . . . . . . .  12
Callback Script Test  . . . . . . . . . . . . . . . . .  13
Post setup activities . . . . . . . . . . . . . . . . .  14

| Chapter 2. Software call home  . . . . . . . . . . . . 15

Chapter 3. Re-creating the NVR partitions . . . . . . .  17

Chapter 4. Re-creating NVRAM pdisks . . . . . . . . . .  19

Chapter 5. Steps to restore an I/O node . . . . . . . .  21

Chapter 6. Best practices for troubleshooting . . . . .  27
How to get started with troubleshooting . . . . . . . .  27
Back up your data . . . . . . . . . . . . . . . . . . .  27
Resolve events in a timely manner . . . . . . . . . . .  28
Keep your software up to date . . . . . . . . . . . . .  28
Subscribe to the support notification . . . . . . . . .  28
Know your IBM warranty and maintenance agreement details . . 29
Know how to report a problem  . . . . . . . . . . . . .  29

Chapter 7. Limitations  . . . . . . . . . . . . . . . .  31
Limit updates to Red Hat Enterprise Linux (ESS 5.3) . .  31

Chapter 10. Recovery Group Issues . . . . . . . . . . .  37

Chapter 12. Maintenance procedures  . . . . . . . . . .  43
Updating the firmware for host adapters, enclosures, and drives . . 43
Disk diagnosis  . . . . . . . . . . . . . . . . . . . .  44
Background tasks  . . . . . . . . . . . . . . . . . . .  45
Server failover . . . . . . . . . . . . . . . . . . . .  46
Data checksums  . . . . . . . . . . . . . . . . . . . .  46
Disk replacement  . . . . . . . . . . . . . . . . . . .  46
Other hardware service  . . . . . . . . . . . . . . . .  47
Replacing failed disks in an ESS recovery group: a sample scenario . . 47
Replacing failed ESS storage enclosure components: a sample scenario . . 52
Replacing a failed ESS storage drawer: a sample scenario . . 53
Replacing a failed ESS storage enclosure: a sample scenario . . 59
Replacing failed disks in a Power 775 Disk Enclosure recovery group: a sample scenario . . 66
Directed maintenance procedures available in the GUI . . 72
Replace disks . . . . . . . . . . . . . . . . . . . . .  72
Update enclosure firmware . . . . . . . . . . . . . . .  73
Update drive firmware . . . . . . . . . . . . . . . . .  73
Update host-adapter firmware  . . . . . . . . . . . . .  74
Start NSD . . . . . . . . . . . . . . . . . . . . . . .  74
Start GPFS daemon . . . . . . . . . . . . . . . . . . .  74
Increase fileset space  . . . . . . . . . . . . . . . .  75
Synchronize node clocks . . . . . . . . . . . . . . . .  75
Start performance monitoring collector service  . . . .  75
Start performance monitoring sensor service . . . . . .  76

Chapter 13. References  . . . . . . . . . . . . . . . .  77
Events  . . . . . . . . . . . . . . . . . . . . . . . .  77
Messages  . . . . . . . . . . . . . . . . . . . . . . .  77
Message severity tags . . . . . . . . . . . . . . . . .  77
IBM Spectrum Scale RAID messages  . . . . . . . . . . .  79
Related information
ESS information
http://www-01.ibm.com/support/knowledgecenter/SSYSP8_5.3.1/sts531_welcome.html
For the latest support information about IBM Spectrum Scale™ RAID, see the IBM Spectrum Scale RAID
FAQ in IBM Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSYSP8/gnrfaq.html
Switch information
ESS release updates are independent of switch updates. Therefore, it is recommended that Ethernet and
InfiniBand switches used with the ESS cluster be at their latest switch firmware levels. Customers are
responsible for upgrading their switches to the latest switch firmware. If switches were purchased
through IBM, review the minimum switch firmware used in validation of this ESS release, which is
available in the Customer networking considerations section of Elastic Storage Server: Quick Deployment Guide.
bold              Depending on the context, bold typeface sometimes represents path names, directories, or file
                  names.
bold underlined   bold underlined keywords are defaults. These take effect if you do not specify a different
                  keyword.
constant width    Examples and information that the system displays appear in constant-width typeface.
italic            Italics are also used for information unit titles, for the first use of a glossary term, and for
                  general emphasis in text.
<key>             Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For
                  example, <Enter> refers to the key on your terminal or workstation that is labeled with the
                  word Enter.
\                 In command examples, a backslash indicates that the command or coding example continues
                  on the next line. For example:
                  mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" \
                  -E "PercentTotUsed < 85" -m p "FileSystem space used"
{item}            Braces enclose a list from which you must choose an item in format and syntax descriptions.
[item]            Brackets enclose optional items in format and syntax descriptions.
<Ctrl-x>          The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means
                  that you hold down the control key while pressing <c>.
item...           Ellipses indicate that you can repeat the preceding item one or more times.
|                 In the left margin of the document, vertical lines indicate technical changes to the
                  information.
http://www.ibm.com/support/knowledgecenter/SSYSP8/sts_welcome.html
To contact the IBM Spectrum Scale development organization, send your comments to the following
email address:
ESS version 5.x automatically opens an IBM Service Request with service data, such as the location and
FRU number to carry out the service task. The drive call home feature is only supported for drives
installed in 5887, DCS3700 (1818), 5147-024 and 5147-084 enclosures in the 5146 and 5148 systems.
When a serviceable event occurs on one of the monitored servers, the Hardware Management Console
(HMC) generates a call home event. ESS 5.X provides additional Call Home capabilities for the drives in
the attached enclosures of ESS 5146 and ESS 5148 systems.
In ESS 5146 the HMC obtains the health status from the Flexible Service Process (FSP) of each server.
When there is a serviceable event detected by the FSP, it is sent to the HMC, which initiates a call home
event if needed. This function is not available in ESS 5148 systems.
When the pdisk state is ok, the pdisk is healthy and functioning normally. When the pdisk is in a
diagnosing state, the IBM Spectrum Scale RAID disk hospital is performing a diagnosis task after an
error has occurred.
The disk hospital is a key feature of IBM Spectrum Scale RAID that asynchronously diagnoses errors
and faults in the storage subsystem. When the pdisk is in a missing state, it indicates that IBM
Spectrum Scale RAID is unable to communicate with a disk. If a missing disk becomes reconnected and
functions properly, its state changes back to ok. For a complete list of pdisk states and further information
on pdisk configuration and administration, see IBM Spectrum Scale RAID: Administration.
Any pdisk that is in the dead, missing, failing, or slow state is known as a non-functioning pdisk. When
the disk hospital concludes that a disk is no longer operating effectively and the number of
non-functioning pdisks reaches or exceeds the replacement threshold of their declustered array, the disk
hospital adds the replace flag to the pdisk state. The replace flag indicates that the physical disk
corresponding to the pdisk must be replaced as soon as possible. When the pdisk state becomes
replace, the drive replacement callback script is run.
The callback script communicates with the Electronic Service Agent™ (ESA) over a REST API. The ESA is
installed in the ESS Management Server (EMS), and initiates a call home task. The ESA is responsible for
automatically opening a Service Request (PMR) with IBM support, and managing end-to-end life cycle of
the problem.
The IBM Electronic Service Agent is installed when the gssinstall command is run. The gssinstall
command can be used in one of the following ways depending on the system:
v For 5146 system:
gssinstall_ppc64 -u
v For 5148 system:
gssinstall_ppc64le -u
The rpm files for the esagent are found in the /install/gss/otherpkgs/rhels7/<arch>/gss directory.
Issue the following command to verify that the rpm for the esagent is installed:
rpm -qa | grep esagent
For example, the ESA GUI can be accessed at an address similar to the following:
https://192.168.45.20:5024/esa
The ESA uses port 5024 by default. It can be changed by using the ESA CLI if needed. For more
information on ESA, see IBM Electronic Service Agent. On the Welcome page, log in to the IBM Electronic
Service Agent GUI. If an untrusted site certificate warning is received, accept the certificate or click Yes to
proceed to the IBM Electronic Service Agent GUI. You can get context-sensitive help by selecting the
Help option located in the upper right corner.
After you have logged in, go to Main > Activate ESA to run the activation wizard. The activation
wizard requires valid contact, location, and connectivity information.
The All Systems menu option shows the node where ESA is installed. For example, ems1. The node
where ESA is installed is shown as PrimarySystem in the System Info. The ESA Status is shown as
Online only on the PrimarySystem node in the System Info tab.
Note: The ESA is not activated by default. In case it is not activated, you will get a message similar to
the following:
[root@ems1 tmp]# gsscallhomeconf -E ems1 --show
IBM Electronic Service Agent (ESA) is not activated.
Activated ESA using /opt/ibm/esa/bin/activator -C and retry.
In ESS, the ESA is only installed on the EMS, and automatically discovers the EMS as PrimarySystem.
The EMS and I/O Servers have to be registered to ESA as endpoints. The gsscallhomeconf command is
used to perform the registration task. The command also registers enclosures attached to the I/O servers
by default.
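For example, a registration run from the EMS might look like the following; the node names, node group,
and host name suffix are illustrative and follow the examples used later in this guide:
[root@ems1 ~]# gsscallhomeconf -E ems1 -N ems1,gss_ppc64 --suffix=-ib --register=all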
| The software call home is registered based on the customer information given while configuring the ESA
| agent. A software call home group auto is configured by default and the EMS node acts as the software
| call home server. The weekly and daily software call home data collection configuration is also activated
| by default.
| The software call home uses the ESA network connection settings to upload the data to IBM. The ESA
| agent network setup must be complete and working for the software call home to work.
| Note: You cannot configure the software call home without configuring the ESA. For more information,
| see Chapter 2, “Software call home,” on page 15.
usage: gsscallhomeconf [-h] [-N NODE-LIST | -G NODE-GROUP] [--show] [--prefix PREFIX] [--suffix SUFFIX]
| -E ESA-AGENT [--register {node,all}] [--no-swcallhome] [--crvpd]
[--serial SOLN-SERIAL] [--model SOLN-MODEL] [--verbose]
optional arguments:
-h, --help Show this help message and exit
-N NODE-LIST Provide a list of nodes to configure.
-G NODE-GROUP Provide name of node group.
--show Show callhome configuration details.
--prefix PREFIX Provide hostname prefix. Use = between --prefix and value if the value starts with -.
--suffix SUFFIX Provide hostname suffix. Use = between --suffix and value if the value starts with -.
-E ESA-AGENT Provide nodename for esa agent node
--register {node,all} Register endpoints(nodes, enclosure or all) with ESA.
| --no-swcallhome Do not configure software callhome while configuring hardware callhome
--crvpd Create vpd file.
--serial SOLN-SERIAL Provide ESS solution serial number.
--model SOLN-MODEL Provide ESS model.
--verbose Provide verbose output
The gsscallhomeconf command logs the progress and error messages in the /var/log/messages file. There
is also a --verbose option that provides more details of the progress, as well as error messages. The
following example displays the type of information sent to the /var/log/messages file in the EMS by the
gsscallhomeconf command.
[root@ems1 vpd]# grep ems1 /var/log/messages | grep gsscallhomeconf
Feb 8 01:37:39 ems1 gsscallhomeconf: [I] End point ems1-ib registered successfully with
systemid 802cd01aa0d3fc5137f006b7c9d95c26
Feb 8 01:37:40 ems1 gsscallhomeconf: [I] End point essio11-ib registered successfully
with systemid c7dba51e109c92857dda7540c94830d3
Feb 8 01:37:41 ems1 gsscallhomeconf: [I] End point essio12-ib registered successfully
with systemid 898fb33e04f5ea12f2f5c7ec0f8516d4
Feb 8 01:43:04 ems1 gsscallhomeconf: [I] ESA configuration for ESS Callhome is complete.
The endpoints are visible in the ESA portal after registration, as shown in the following figure:
Name
Shows the name of the endpoints that are discovered or registered.
SystemHealth
Shows the health of the discovered endpoints. A green check mark icon indicates that the discovered
system is working fine. A red (X) icon indicates that the discovered endpoint has some
problem.
ESAStatus
Shows that the endpoint is reachable. It is updated whenever there is a communication between
the ESA and the endpoint.
SystemType
Shows the type of system being used. Following are the various ESS device types that the ESA
supports.
Detailed information about the node can be obtained by selecting System Information. Here is an example
of the system information:
When an endpoint is successfully registered, the ESA assigns a unique system identification (system id) to
the endpoint. The system id can be viewed using the --show option.
For example:
{
"c14e80c240d92d51b8daae1d41e90f57": "G5CT018",
"c7dba51e109c92857dda7540c94830d3": "essio11-ib",
"898fb33e04f5ea12f2f5c7ec0f8516d4": "essio12-ib",
"802cd01aa0d3fc5137f006b7c9d95c26": "ems1-ib",
"524e48d68ad875ffbeeec5f3c07e1acf": "G5CT016"
}
When an event is generated by an endpoint, the node associated with the endpoint must provide the
system id of the endpoint as part of the event. The ESA then assigns a unique event id for the event. The
system ids of the endpoints are stored in a file called esaepinfo01.json in the /vpd directory of the EMS
and I/O servers that are registered. The following example displays a typical esaepinfo01.json file:
[root@ems1 vpd]# cat esaepinfo01.json
{
"encl": {
"G5CT016": "524e48d68ad875ffbeeec5f3c07e1acf",
"G5CT018": "c14e80c240d92d51b8daae1d41e90f57"
},
"esaagent": "ems1", "node": {
"ems1-ib": "802cd01aa0d3fc5137f006b7c9d95c26",
"essio11-ib": "c7dba51e109c92857dda7540c94830d3",
"essio12-ib": "898fb33e04f5ea12f2f5c7ec0f8516d4"
}
}
In the ESS 5146, the gsscallhomeconf command requires the ESS solution vpd file that contains the IBM
Machine Type and Model (MTM) and serial number information to be present. The vpd file is used by
the ESA in the call home event. If the vpd file is absent, the gsscallhomeconf command fails, and
displays an error message that the vpd file is missing. In this case, you can rerun the command with the
--crvpd option, and provide the serial number and model number using the --serial and --model
options, as shown in the sketch that follows. In ESS 5148, the vpd file is auto generated if not present.
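As a sketch of that recovery path, the command might be rerun as follows; the serial number and model
shown here are taken from the sample vpd file below and are illustrative only:
[root@ems1 ~]# gsscallhomeconf -E ems1 -N ems1,gss_ppc64 --suffix=-ib --crvpd --serial 219G17G --model GS2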
The system vpd information is stored in the essvpd01.json file in the EMS /vpd directory. Here is an
example of a vpd file:
[root@ems1 vpd]# cat essvpd01.json
{
"groupname": "ESSHMC", "model": "GS2",
"serial": "219G17G", "system": "ESS", "type": "5146"
}
For example, when replace is added to a pdisk state, indicating that the corresponding physical drive
needs to be replaced, an event request is sent to the ESA with the associated system id of the enclosure
where the physical drive resides. Once the ESA receives the request, it generates a call home event. Each
server in the ESS is configured to enable callback for IBM Spectrum Scale RAID related events. These
callbacks are configured during the cluster creation, and updated during the code upgrade. The ESA can
filter out duplicate events when event requests are generated from different nodes for the same physical
drive. The ESA returns an event identification value when the event is successfully processed. The ESA
portal updates the status of the endpoints. The following figure shows the status of the enclosures when an event has been reported.
The problem descriptions of the events can be seen by selecting the endpoint. You can select an endpoint
by clicking the red X. The following figure shows an example of the problem description.
Name
It is the serial number of the enclosure containing the drive to be replaced.
Description
It is a short description of the problem. It shows the ESS version or generation, the service task name, and
the location code. This field is used in the synopsis of the problem (PMR) report.
SRC
It is the Service Reference Code (SRC). An SRC identifies the system component area (for
example, DSK XXXXX) that detected the error, and additional codes describing the error
condition. It is used by the support team to perform further problem analysis and determine the
service tasks associated with the error code and event.
Time of Occurrence
It is the time when the event is reported to the ESA. The time is reported by the endpoints in the
UTC time format, which ESA displays in local format.
Problem details
Further details of a problem can be obtained by clicking the Details button. The following figure shows
an example of a problem detail.
If an event is successfully reported to the ESA, and an event ID is received from the ESA, the node
reporting the event uploads additional support data to the ESA, which is attached to the problem (PMR)
for further analysis by the IBM support team.
The callback script logs information in the /var/log/messages file during the problem reporting episode.
The following examples display the messages logged in the /var/log/messages file generated by the
essio11 node:
The call home monitoring protects against missing a call home due to the ESA missing a callback event.
If a problem report is not already created, the call home monitoring ensures that a problem report is
created.
Note: When the call home problem report is generated by the monitoring script, as opposed to being
triggered by the callback, the problem support data is not automatically uploaded. In this scenario,
IBM support can request support data from the customer.
Upload data
The following support data is uploaded when the system displays a drive replace notification:
v The output of the mmlspdisk command for the pdisk that is in the replace state (see the quick check after this note).
v Additional support data is provided only when the event is initiated as a response to a callback. The
following information is supplied in a .tgz file as additional support data:
Note: If a PMR is created because of the periodic checking of the replaced drive state, for example, when
the callback event is missed, additional support data is not provided.
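As a quick manual check of which pdisks are currently in the replace state (and therefore which mmlspdisk
output would be uploaded), a command similar to the following can be run on a recovery group server;
this is a verification aid, not part of the automated upload:
mmlspdisk all --replace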
The ESA rpm files must be removed manually if needed. Issue the following command to remove the
rpm files for the esagent:
yum remove esagent.pLinux-4.2.0-9.noarch
You can issue the following command to reinstall the rpm files for the esagent. The esagent requires the
gpfs.java file to be installed. The gpfs.java file is automatically installed by the gssinstall and
gssdeploy scripts. The dependencies might still not be resolved. In such cases, use the --nodeps option to
install it.
rpm -ivh --nodeps esagent.pLinux4.1.012.noarch
To test the callback script, select a pdisk from each enclosure alternating recovery groups. The purpose of
the test call home events is to ensure that all the attached enclosures can generate call home events by
using both the I/O servers in the building block.
For example, in a GS2 system with 5887 enclosures, one can select pdisks e1s02 (left RG) and e2s20 (right
RG). You must find the corresponding recovery group and active server for these pdisks. Send a disk
event to the ESA from the active recovery group server as shown in the following steps:
Examples:
1. ssh to essio11
gsscallhomeevent --event pdReplacePdisk
--eventName "Test symptom generated by Electronic Service Agent"
--rgName rg_essio11-ib --pdName e1s02
Here the recovery group is rg_essio11-ib, and the active server is essio11-ib.
2. ssh to essio12
gsscallhomeevent --event pdReplacePdisk
--eventName "Test symptom generated by Electronic Service Agent"
--rgName rg_essio12-ib --pdName e2s20
Here the recovery group is rg_essio12-ib, and the active server is essio12-ib.
Note: Ensure that you state Test symptom generated by Electronic Service Agent in the --eventName
option. Check in the ESA that the enclosure system health is showing the event. You might have to
refresh the screen to make the event visible.
For DCS3700 enclosures, the pdisks used to test call home can be e1d1s1 and e2d5s10 (e3d1s1, e4d5s10,
and so on), alternating between recovery groups. For 5148-084 enclosures, the pdisks used to test call
home can be e1d1s1 (or e1d1s1ssd) and e2d2s14 (e3d1s1, e4d2s14, and so on), alternating between the recovery groups.
| These details are shared with the IBM® support center for monitoring and problem determination. For
| more information on call home, see Installing call home and Understanding call home.
| The hardware and software call home can be set up using the following command:
| [root@ems1 ~]# gsscallhomeconf -E ems1 -N ems1,gss_ppc64 --suffix=-ib
| If you want to skip the software call home set up, use the following command:
| [root@ems3 ~]# gsscallhomeconf -E ems3 -N ems3,gss_ppc64 --suffix=-te --register=all --no-swcallhome
Although a total of 6 partitions are created, only 2 are actually used per I/O node, one for each NVR
pdisk. In some cases, the NVRAM partitions might need to be re-created, for example, after a
hardware or OS failure.
Before re-creating the NVR partitions, list all the existing partitions for sda. To list all partitions for sda,
run the following command:
parted /dev/sda unit KiB print
For optimal alignment, each partition must be exactly 2048000 KiB in size, and must be 1024 KiB apart
from each other.
In the sample output, the last end size pertains to Partition # 9, and has a value of 543247360 KiB.
To get the NVR partition's new start value, add 1024 KiB to the last end size value, and add 2048000 KiB
to the start value to determine the new end as shown:
1. NVR Partition 1 new start value = Last end size value + 1024 KiB = 543247360 KiB + 1024 KiB =
543248384 KiB
2. NVR Partition 1 new end = NVR Partition 1 new start value + 2048000 KiB = 543248384 KiB +
2048000 KiB = 545296384 KiB
To create the first NVR partition, run the following command:
parted /dev/sda mkpart logical 543248384KiB 545296384KiB
To get the new start for the second partition, you need to add 1024 KiB to the end size value of partition
1. Repeat the steps to calculate the start and end positions for the second partition as shown:
1. NVR Partition 2 new start = NVR Partition 1 end value + 1024 KiB = 545296384 KiB + 1024 KiB =
545297408 KiB
2. NVR Partition 2 new end = NVR Partition 2 new start value + 2048000 KiB = 545297408 KiB +
2048000 KiB = 547345408 KiB
Repeat the above steps four times to create a total of six partitions. When complete, the partitions list for
sda will look similar to the following:
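The calculation described above can also be scripted. The following is a minimal sketch that creates all six
NVR partitions, assuming the last existing partition ends at 543247360 KiB as in the example; verify the
actual end value from the parted output on your node before running it:
# Create six 2048000 KiB NVR partitions on /dev/sda, each starting 1024 KiB
# after the end of the previous partition (all values in KiB).
LAST_END=543247360        # end of the last existing partition (example value)
for i in 1 2 3 4 5 6; do
    START=$((LAST_END + 1024))
    END=$((START + 2048000))
    parted /dev/sda mkpart logical ${START}KiB ${END}KiB
    LAST_END=$END
done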
The NVRAM pdisks may stop functioning and go into a missing state. This could be due to a hardware
failure of the IPR card, or a corrupt or missing NVR OS partition caused by an OS failure. To fix this
problem, the NVRAM pdisks must be re-created.
You can find the pdisks that are in a missing state by running the mmlsrecoverygroup command.
mmlsrecoverygroup rg_gssio1 -L --pdisk | grep NVR
NVR no 1 2 0,0 1 3632 MiB 14 days inactive 0% low
n1s01 0, 0 NVR 1816 MiB missing
n2s01 0, 0 NVR 1816 MiB missing
Before re-creating the pdisks, ensure that all six NVRAM partitions exist on sda by using the following
command:
parted /dev/sda unit KiB print
Note: If the partitions are not present, you must re-create the 6 NVR partitions. For more
information, see "Re-creating the NVR partitions".
After you have verified the 6 NVR partitions, create a stanza file for each of the NVRAM devices that are
missing, and save it.
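The exact stanza contents depend on your configuration. The following is only a sketch of what a stanza
for a missing NVR pdisk might look like; the pdisk name is taken from the example output above, while
the device path and path attributes are assumptions that must be verified against IBM Spectrum Scale
RAID: Administration and your own partition layout:
# gssio1.stanza -- illustrative only; confirm the device path of the NVR partition
%pdisk: pdiskName=n1s01
        device=//gssio1/dev/sda10
        declusteredArray=NVR
        nPathActive=1
        nPathTotal=1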
Run the mmaddpdisk command using the stanza file that was created to replace the missing pdisks.
mmaddpdisk rg_gssio1 -F gssio1.stanza --replace
Run the mmlsrecoverygroup command to confirm the current state of the pdisks.
mmlsrecoverygroup rg_gssio1 -L --pdisk | grep NVR
n1s01 1, 1 NVR 1816 MiB ok
n2s01 1, 1 NVR 1816 MiB ok
Run the mmaddpdisk command to recreate the other missing NVRAM pdisks.
This process will restore the OS image as well as the required ESS software, drivers and firmware.
Note: For the following steps, we assume that the gssio1 node is the node that is being restored.
1. Disable the GPFS auto load using the mmchconfig command.
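A minimal sketch of this step, assuming gssio1 is the node being restored as stated in the note above:
[ems1]# mmchconfig autoload=no -N gssio1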
Note: If you are restoring gssio1, the active recovery group server for gssio1 should be gssio2. If it
is not set to gssio2, you need to run the mmchrecoverygroup command to change it.
[ems1]# mmchrecoverygroup rg_gssio1 --servers <NEW PRIMARY NODE>,<OLD PRIMARY NODE>
[root@gssio1 ~]# mmchrecoverygroup rg_gssio1 --servers gssio2,gssio1
[ems1]# mmlsrecoverygroup rg_gssio1 -L | grep "active recovery" -A2
active recovery group server servers
----------------------------------------------- -------
gssio2 gssio1,gssio2
Note: Depending on what needs to be updated, the node might reboot one or more times. You need
to wait until there is no process output before taking the next step.
8. Verify that the upgrade files have been copied to the I/O node sync directory, /install/gss/sync/
ppc64/.
[ems]# ssh <REPLACEMENT NODE> "ls /install/gss/sync/ppc64/"
[root@ems1]# ssh gssio2 "ls /install/gss/sync/ppc64/"
gssio2: mofed
Wait for the directory to sync. After the mofed directory is created, you can take the next step.
9. Copy the host files from the healthy node to the replacement node.
[ems]# scp /etc/hosts <REPLACEMENT NODE>:/etc/
[root@ems1 mofed]# scp /etc/hosts gssio2:/etc/
10. Configure the network on the replacement node.
If you had backed up the network files previously, you can copy them over to the node, and restart
the node. Verify that the names of the devices are consistent with the backed up version before
replacing the files.
Note: This code is executed on the replacement node, and the -p option is applied to an existing
healthy node.
13. Start GPFS on the recovered node, and enable the GPFS auto load.
a. Before starting GPFS, verify that the replacement node is still in DOWN state.
[ems]# mmgetstate -aL
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
------------------------------------------------------------------------------------
1 gssio1 2 2 5 active quorum node
2 gssio2 0 0 5 down quorum node
3 ems1 2 2 5 active quorum node
4 gsscomp1 2 2 5 active
5 gsscomp2 2 2 5 active
b. Start GPFS on the replacement node.
[ems]# mmstartup -N <REPLACEMENT NODE>
mmstartup: Starting GPFS ...
c. Verify that the replacement node is active.
[ems]# mmgetstate -aL
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
------------------------------------------------------------------------------------
1 gssio1 2 3 5 active quorum node
2 gssio2 2 3 5 active quorum node
3 ems1 2 3 5 active quorum node
4 gsscomp1 2 3 5 active
5 gsscomp2 2 3 5 active
d. Ensure that all the file systems are mounted on the replacement node.
[ems]# mmmount all -N <REPLACEMENT NODE>
[ems]# mmlsmount all -L
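To complete step 13, the GPFS auto load that was disabled in step 1 can be re-enabled; the placeholder
follows the convention used in the steps above:
[ems]# mmchconfig autoload=yes -N <REPLACEMENT NODE>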
If the partitions do not exist, you need to create them. For more information, see Chapter 3,
“Re-creating the NVR partitions,” on page 17
16. View the current NVR device status.
[ems1]# mmlsrecoverygroup rg_gssio1 -L --pdisk | egrep "n[0-9]s[0-9]"
n1s01 1, 1 NVR 1816 MiB ok
n2s01 0, 0 NVR 1816 MiB missing
Note: The missing NVR devices must be recreated or replaced. For more information, see Chapter 4,
“Re-creating NVRAM pdisks,” on page 19
For information on IBM Spectrum Scale issues and their resolution, see the IBM Spectrum Scale: Problem
Determination Guide in the IBM Spectrum Scale Knowledge Center.
When you experience issues with the system, go through the following steps to get started with
troubleshooting:
1. Check the events that are reported in various nodes of the cluster by using the mmhealth node
eventlog command.
2. Check the user action corresponding to the active events and take the appropriate action. For more
information on the events and corresponding user action, see “Events” on page 77.
3. Check for events that happened before the event you are trying to investigate. They might give you
an idea about the root cause of the problem. For example, if you see the events nfs_in_grace and
node_resumed reported a minute apart, you get an idea about the root cause of why NFS entered the
grace period: the node was resumed after a suspend.
4. Collect the details of the issues through logs, dumps, and traces. You can use various CLI commands
and Settings > Diagnostic Data GUI page to collect the details of the issues reported in the system.
5. Based on the type of issue, browse through the various topics that are listed in the troubleshooting
section and try to resolve the issue.
6. If you cannot resolve the issue by yourself, contact IBM Support.
Follow the guidelines in the following sections to avoid any issues while creating backups:
v GPFS™ backup data in IBM Spectrum Scale: Concepts, Planning, and Installation Guide
v Backup considerations for using IBM Spectrum Protect™ in IBM Spectrum Scale: Concepts, Planning, and
Installation Guide
v Configuration reference for using IBM Spectrum Protect with IBM Spectrum Scale™ in IBM Spectrum Scale:
Administration Guide
v Protecting data in a file system using backup in IBM Spectrum Scale: Administration Guide
v Backup procedure with SOBAR in IBM Spectrum Scale: Administration Guide
The following best practices help you to troubleshoot the issues that might arise in the data backup
process:
1. Enable the most useful messages in the mmbackup command by setting the MMBACKUP_PROGRESS_CONTENT
and MMBACKUP_PROGRESS_INTERVAL environment variables in the command environment prior to issuing
the mmbackup command, as shown in the sketch after this list. Setting MMBACKUP_PROGRESS_CONTENT=7
provides the most useful messages. For more information on these variables, see mmbackup command in
IBM Spectrum Scale: Command and Programming Reference.
2. If the mmbackup process is failing regularly, enable debug options in the backup process:
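The environment setup described in step 1 might look like the following minimal sketch; the file system
name gpfs0 is illustrative, and an IBM Spectrum Protect configuration is assumed to be in place:
export MMBACKUP_PROGRESS_CONTENT=7      # most detailed progress messages
export MMBACKUP_PROGRESS_INTERVAL=60    # report progress every 60 seconds
mmbackup gpfs0 -t incremental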
You can use the mmhealth node eventlog command to list the events that are reported in the system.
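For example, the following commands can be run on any cluster node; the first summarizes the health of
the node's components and the second lists its recent health events:
mmhealth node show
mmhealth node eventlog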
The Monitoring > Events GUI page lists all events reported in the system. You can also mark certain
events as read to change the status of the event in the events view. The status icons become gray in case
an error or warning is fixed or if it is marked as read. Some issues can be resolved by running a fix
procedure. Use the action Run Fix Procedure to do so. The Events page provides a recommendation for
which fix procedure to run next.
This can be done by checking the IBM support website to see if new code releases are available: IBM
Elastic Storage™ Server support website. The release notes provide information about new functions in a
release, plus any issues that are resolved with the new release. Update your code regularly if the release
notes indicate a potential issue.
Note: If a critical problem is detected in the field, IBM may send a flash, advising the user to contact
IBM for an efix. The efix, when applied, might resolve the issue.
Subscribe to support notifications by visiting the IBM support page on the following IBM website:
http://www.ibm.com/support/mynotifications.
By subscribing, you are informed of new and updated support site information, such as publications,
hints and tips, technical notes, product flashes (alerts), and downloads.
For more information on the IBM Warranty and maintenance details, see Warranties, licenses and
maintenance.
IBM maintains pages on the web where you can get information about IBM products and fee services,
product implementation and usage assistance, break and fix service support, and the latest technical
information. The following table provides the URLs of the IBM websites where you can find the support
information.
Table 2. IBM websites for help, services, and information
Website                                                        Address
IBM home page                                                  http://www.ibm.com
Directory of worldwide contacts                                http://www.ibm.com/planetwide
Support for ESS                                                IBM Elastic Storage Server support website
Support for IBM System Storage and IBM Total Storage products  http://www.ibm.com/support/entry/portal/product/system_storage/
Note: Available services, telephone numbers, and web links are subject to change without notice.
Make sure that you have taken steps to try to solve the problem yourself before you call. Some
suggestions for resolving the problem before calling IBM Support include:
v Check all hardware for issues beforehand.
v Use the troubleshooting information in your system documentation. The troubleshooting section of the
IBM Knowledge Center contains procedures to help you diagnose problems.
To check for technical information, hints, tips, and new device drivers or to submit a request for
information, go to the IBM Elastic Storage Server support website.
Information about your IBM storage system is available in the documentation that comes with the
product. That documentation includes printed documents, online documents, readme files, and help files
in addition to the IBM Knowledge Center.
ESS 5.3 supports Red Hat Enterprise Linux 7.3 (3.10.0-514.44.1 ppc64BE and LE). It is highly
recommended that you install only the following types of updates to RHEL:
v Security updates.
v Errata updates that are requested by IBM Service.
The system will return a gpfs.snap, an installcheck, and the data from each node.
CAUTION: This should be done as a last resort since the GUI configuration settings will be lost after
you execute the following steps:
a. Stop the GUI service.
systemctl stop gpfsgui
b. Drop the GUI schema from the postgres database.
psql postgres postgres -c "DROP SCHEMA FSCC CASCADE"
c. Start the GUI service.
systemctl start gpfsgui
declustered
arrays with
recovery group vdisks vdisks servers
------------------ ------- ------ -------
rg_rchgss1-hs 3 5 rchgss1-hs.gpfs.rchland.ibm.com,rchgss2-hs.gpfs.rchland.ibm.com
rg_rchgss2-hs 3 5 rchgss2-hs.gpfs.rchland.ibm.com,rchgss1-hs.gpfs.rchland.ibm.com
List the active recovery groups and their primary servers by using the following command:
mmlsrecoverygroup rg_rchgss1-hs -L | grep active -A2
Each of the recovery groups must be served by its own server. If the server is unavailable due to
maintenance or other issues, the recovery group must be served by an available server. After a failure or
maintenance event, when the recovery group's primary server becomes active again, it should
automatically begin serving its recovery group. You will find the following information in the
/var/adm/ras/mmfs.log.latest file on the recovery group server:
v Now serving recovery group rg_rchgss1-hs.
v Reason for takeover of rg_rchgss1-hs: 'primary server became ready'.
If the recovery group is not being served by its respective server, examine the GPFS log on that server for
errors that might prevent the server from serving the recovery group. If there are no issues, you can
manually activate the recovery group. For example, to allow rchgss1-hs.gpfs.rchland.ibm.com to serve
the rg_rchgss1-hs RG, execute:
mmchrecoverygroup rg_rchgss1-hs --active rchgss1-hs.gpfs.rchland.ibm.com
In certain situations, if an ESS server node experiences a disk failure, the disks may be marked down,
and do not automatically start. This can prevent the recovery group from becoming active. For more
information on troubleshooting disk problems, see Disk Issues in the IBM Spectrum Scale documentation.
Before troubleshooting further, ensure that GPFS is in the active state for the node in question by running
the mmgetstate command:
mmgetstate -a
Execute the mmlsdisk command to check the status of the disks. The -e option will only display disks
with errors.
mmlsdisk gpfs0 -e
Attention: Due to an earlier configuration change the file system may contain data that is at risk of being
lost.
In the above example, the disk is in the suspended state, hence the "to be emptied" status. Other disks
may be in a non-ready state, or their availability may be down, which prevents those disks from being
used by GPFS/ESS.
disk driver sector failure holds holds storage
name type size group metadata data status availability disk id pool remarks
------------------------------------ -------- ------ --------- -------- ----- ------------- ------------ -------------- --------
rg_rchgss1_hs_MetaData_1M_3W_1 nsd 512 30 Yes No ready up 1 system
rg_rchgss1_hs_Data_16M_2p_1 nsd 512 30 No Yes ready up 2 data desc
rg_rchgss2_hs_MetaData_1M_3W_1 nsd 512 30 Yes No ready up 3 system desc
rg_rchgss2_hs_Data_16M_2p_1 nsd 512 30 No Yes ready up 4 data desc
You can try to manually start the disks by running the mmchdisk command.
mmchdisk gpfs0 start -d rg_rchgss1_hs_MetaData_1M_3W_1
mmnsddiscover: Attempting to rediscover the disks. This may take a while ...
mmnsddiscover: Finished.
rchgss1-hs.gpfs.rchland.ibm.com: Rediscovered nsd server access to rg_rchgss1_hs_MetaData_1M_3W_1
Note: Depending on the number of disks that are down and their size, the mmnsddiscover command may
take a while to complete.
Obtain this information as quickly as you can after a problem is detected, so that error logs do not wrap
and system parameters, which are always changing, are captured as close to the point of failure as
possible. When a serious problem is detected, collect this information and then call IBM.
Information to collect for all problems related to IBM Spectrum Scale RAID
Regardless of the problem encountered with IBM Spectrum Scale RAID, the following data should be
available when you contact the IBM Support Center:
1. A description of the problem.
2. Output of the failing application, command, and so forth.
To collect the gpfs.snap data and the ESS tool logs, issue the following from the EMS:
gsssnap -g -i -n <IO node1>, <IOnode2>,... <ioNodeX>
3. A tar file generated by the gpfs.snap command that contains data from the nodes in the cluster. In
large clusters, the gpfs.snap command can collect data from certain nodes (for example, the affected
nodes, NSD servers, or manager nodes) using the -N option.
For more information about gathering data using the gpfs.snap command, see the IBM Spectrum Scale:
Problem Determination Guide.
If the gpfs.snap command cannot be run, collect these items:
a. Any error log entries that are related to the event:
v On a Linux node, create a tar file of all the entries in the /var/log/messages file from all nodes in
the cluster or the nodes that experienced the failure. For example, issue the following command
to create a tar file that includes all nodes in the cluster:
mmdsh -v -N all "cat /var/log/messages" > all.messages
v On an AIX® node, issue this command:
errpt -a
For more information about the operating system error log facility, see the IBM Spectrum Scale:
Problem Determination Guide.
b. A master GPFS log file that is merged and chronologically sorted for the date of the failure. (See
the IBM Spectrum Scale: Problem Determination Guide for information about creating a master GPFS
log file.)
c. If the cluster was configured to store dumps, collect any internal GPFS dumps written to that
directory relating to the time of the failure. The default directory is /tmp/mmfs.
d. On a failing Linux node, gather the installed software packages and the versions of each package
by issuing this command:
rpm -qa
e. On a failing AIX node, gather the name, most recent level, state, and description of all installed
software packages by issuing this command:
lslpp -l
When a delay or deadlock situation is suspected, the IBM Support Center will need additional
information to assist with problem diagnosis. If you have not done so already, make sure you have the
following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to IBM Spectrum Scale
RAID” on page 39.
2. The deadlock debug data collected automatically.
3. If the cluster size is relatively small and the maxFilesToCache setting is not high (less than 10,000),
issue the following command:
gpfs.snap --deadlock
If the cluster size is large or the maxFilesToCache setting is high (greater than 1M), issue the
following command:
gpfs.snap --deadlock --quick
For more information about the --deadlock and --quick options, see the IBM Spectrum Scale: Problem
Determination Guide .
When file system corruption or MMFS_FSSTRUCT errors are encountered, the IBM Support Center will
need additional information to assist with problem diagnosis. If you have not done so already, make sure
you have the following information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to IBM Spectrum Scale
RAID” on page 39.
2. Unmount the file system everywhere, then run mmfsck -n in offline mode and redirect it to an output
file, as shown in the sketch after this list.
The IBM Support Center will determine when and if you should run the mmfsck -y command.
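A minimal sketch of step 2, assuming a file system device named gpfs0 (illustrative) and that an offline
check has been requested:
mmumount gpfs0 -a                              # unmount the file system on all nodes
mmfsck gpfs0 -n > /tmp/mmfsck.gpfs0.out 2>&1   # offline check, no repairs, output captured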
When the GPFS daemon is repeatedly crashing, the IBM Support Center will need additional information
to assist with problem diagnosis. If you have not done so already, make sure you have the following
information available before contacting the IBM Support Center:
1. Everything that is listed in “Information to collect for all problems related to IBM Spectrum Scale
RAID” on page 39.
2. Make sure the /tmp/mmfs directory exists on all nodes. If this directory does not exist, the GPFS
daemon will not generate internal dumps.
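For example, the directory can be created on all nodes with mmdsh, which is also used elsewhere in this
chapter; this is only a sketch:
mmdsh -v -N all "mkdir -p /tmp/mmfs"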
For failures in non-IBM software, follow the problem-reporting procedures provided with that product.
To maintain system productivity, the vast majority of these failures must be handled automatically
without loss of data, without temporary loss of access to the data, and with minimal impact on the
performance of the system. Some failures require human intervention, such as replacing failed
components with spare parts or correcting faults that cannot be corrected by automated processes.
You can also use the ESS GUI to perform various maintenance tasks. The ESS GUI lists various
maintenance-related events in its event log in the Monitoring > Events page. You can set up email alerts
to get notified when such events are reported in the system. You can resolve these events or contact the
IBM Support Center for help as needed. The ESS GUI includes various maintenance procedures to guide
you through the fix process.
After creating a GPFS cluster, install the most current firmware for host adapters, enclosures, and drives
only if instructed to do so by IBM support. Then, address issues that occur because you have not
upgraded to a later version of ESS.
You can update the firmware either manually or with the help of directed maintenance procedures (DMP)
that are available in the GUI. The ESS GUI lists events in its event log in the Monitoring > Events page if
the host adapter, enclosure, or drive firmware is not up-to-date, compared to the currently-available
firmware packages on the servers. Select Run Fix Procedure from the Action menu for the
firmware-related event to launch the corresponding DMP in the GUI. For more information on the
available DMPs, see Directed maintenance procedures in Elastic Storage Server: Problem Determination Guide.
The most current firmware is packaged as the gpfs.gss.firmware RPM. You can find the most current
firmware on Fix Central.
1. Sign in with your IBM ID and password.
2. On the Find product tab:
a. In the Product selector field, type: IBM Spectrum Scale RAID and click on the arrow to the right
b. On the Installed Version drop-down menu, select: 5.0.0
c. On the Platform drop-down menu, select: Linux 64-bit,pSeries
d. Click on Continue
3. On the Select fixes page, select the most current fix pack.
4. Click on Continue
5. On the Download options page, select the radio button to the left of your preferred downloading
method. Make sure the check box to the left of Include prerequisites and co-requisite fixes (you
can select the ones you need later) has a check mark in it.
6. Click on Continue to go to the Download files... page and download the fix pack files.
The gpfs.gss.firmware RPM needs to be installed on all ESS server nodes. It contains the most current
updates of the following types of supported firmware for an ESS configuration:
v Host adapter firmware
Disk diagnosis
For information about disk hospital, see Disk hospital in IBM Spectrum Scale RAID: Administration.
When an individual disk I/O operation (read or write) encounters an error, IBM Spectrum Scale RAID
completes the NSD client request by reconstructing the data (for a read) or by marking the unwritten
data as stale and relying on successfully written parity or replica strips (for a write), and starts the disk
hospital to diagnose the disk. While the disk hospital is diagnosing, the affected disk will not be used for
serving NSD client requests.
Similarly, if an I/O operation does not complete in a reasonable time period, it is timed out, and the
client request is treated just like an I/O error. Again, the disk hospital will diagnose what went wrong. If
the timed-out operation is a disk write, the disk remains temporarily unusable until a pending timed-out
write (PTOW) completes.
The disk hospital then determines the exact nature of the problem. If the cause of the error was an actual
media error on the disk, the disk hospital marks the offending area on disk as temporarily unusable, and
overwrites it with the reconstructed data. This cures the media error on a typical HDD by relocating the
data to spare sectors reserved within that HDD.
If the disk reports that it can no longer write data, the disk is marked as readonly. This can happen when
no spare sectors are available for relocating in HDDs, or when the flash memory write endurance in SSDs
has been reached. Similarly, if a disk reports that it cannot function at all, for example it cannot spin up,
the disk hospital marks the disk as dead.
The disk hospital also maintains various forms of disk badness, which measure accumulated errors from
the disk, and of relative performance, which compare the performance of this disk to other disks in the
same declustered array. If the badness level is high, the disk can be marked dead. For less severe cases,
the disk can be marked failing.
Finally, the IBM Spectrum Scale RAID server might lose communication with a disk. This can either be
caused by an actual failure of an individual disk, or by a fault in the disk interconnect network. In this
case, the disk is marked as missing. If the relative performance of the disk drops below 66% of the other
disks for an extended period, the disk will be declared slow.
If a disk would have to be marked dead, missing, or readonly, and the problem affects individual disks
only (not a large set of disks), the disk hospital tries to recover the disk. If the disk reports that it is not
started, the disk hospital attempts to start the disk. If nothing else helps, the disk hospital power-cycles
the disk (assuming the JBOD hardware supports that), and then waits for the disk to return online.
Before actually reporting an individual disk as missing, the disk hospital starts a search for that disk by
polling all disk interfaces to locate the disk. Only after that fast poll fails is the disk actually declared
missing.
If a large set of disks has faults, the IBM Spectrum Scale RAID server can continue to serve read and
write requests, provided that the number of failed disks does not exceed the fault tolerance of either the
RAID code for the vdisk or the IBM Spectrum Scale RAID vdisk configuration data. When any disk fails,
the server begins rebuilding its data onto spare space. If the failure is not considered critical, rebuilding is
throttled while user workload is present, so that the impact on foreground performance is minimal.
In a multiple fault scenario, the server might not have enough disks to fulfill a request. More specifically,
such a scenario occurs if the number of unavailable disks exceeds the fault tolerance of the RAID code. If
some of the disks are only temporarily unavailable, and are expected back online soon, the server will
stall the client I/O and wait for the disk to return to service. Disks can be temporarily unavailable for
any of the following reasons:
v The disk hospital is diagnosing an I/O error.
v A timed-out write operation is pending.
v A user intentionally suspended the disk, for example because it is on a carrier with another failed disk
that has been removed for service.
If too many disks become unavailable for the primary server to proceed, it will fail over. In other words,
the whole recovery group is moved to the backup server. If the disks are not reachable from the backup
server either, then all vdisks in that recovery group become unavailable until connectivity is restored.
A vdisk will suffer data loss when the number of permanently failed disks exceeds the vdisk fault
tolerance. This data loss is reported to NSD clients when the data is accessed.
Background tasks
While IBM Spectrum Scale RAID primarily performs NSD client read and write operations in the
foreground, it also performs several long-running maintenance tasks in the background, which are
referred to as background tasks. The background task that is currently in progress for each declustered
array is reported in the long-form output of the mmlsrecoverygroup command. Table 3 describes the
long-running background tasks.
Table 3. Background tasks
Task Description
repair-RGD/VCD Repairing the internal recovery group data and vdisk configuration data from the failed disk
onto the other disks in the declustered array.
rebuild-critical Rebuilding virtual tracks that cannot tolerate any more disk failures.
rebuild-1r Rebuilding virtual tracks that can tolerate only one more disk failure.
rebuild-2r Rebuilding virtual tracks that can tolerate two more disk failures.
rebuild-offline Rebuilding virtual tracks where failures exceeded the fault tolerance.
rebalance Rebalancing the spare space in the declustered array for either a missing pdisk that was
discovered again, or a new pdisk that was added to an existing array.
scrub Scrubbing vdisks to detect any silent disk corruption or latent sector errors by reading the entire
virtual track, performing checksum verification, and performing consistency checks of the data
and its redundancy information. Any correctable errors found are fixed.
Data checksums
IBM Spectrum Scale RAID stores checksums of the data and redundancy information on all disks for each
vdisk. Whenever data is read from disk or received from an NSD client, checksums are verified. If the
checksum verification on a data transfer to or from an NSD client fails, the data is retransmitted. If the
checksum verification fails for data read from disk, the error is treated similarly to a media error:
v The data is reconstructed from redundant data on other disks.
v The data on disk is rewritten with reconstructed good data.
v The disk badness is adjusted to reflect the silent read error.
Disk replacement
You can use the ESS GUI for detecting failed disks and for disk replacement.
When one disk fails, the system will rebuild the data that was on the failed disk onto spare space and
continue to operate normally, but at slightly reduced performance because the same workload is shared
among fewer disks. With the default setting of two spare disks for each large declustered array, failure of
a single disk would typically not be a sufficient reason for maintenance.
When several disks fail, the system continues to operate even if there is no more spare space. The next
disk failure would make the system unable to maintain the redundancy the user requested during vdisk
creation. At this point, a service request is sent to a maintenance management application that requests
replacement of the failed disks and specifies the disk FRU numbers and locations.
In general, disk maintenance is requested when the number of failed disks in a declustered array reaches
the disk replacement threshold. By default, that threshold is identical to the number of spare disks. For a
more conservative disk replacement policy, the threshold can be set to smaller values using the
mmchrecoverygroup command.
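For example, a more conservative threshold of 1 might be set as follows; the recovery group and
declustered array names are illustrative, and the exact option names should be confirmed in the
mmchrecoverygroup documentation:
mmchrecoverygroup BB1RGL --declustered-array DA1 --replace-threshold 1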
Disk maintenance is performed using the mmchcarrier command with the --release option, which:
v Suspends any functioning disks on the carrier if the multi-disk carrier is shared with the disk that is
being replaced.
v If possible, powers down the disk to be replaced or all of the disks on that carrier.
v Turns on indicators on the disk enclosure and disk or carrier to help locate and identify the disk that
needs to be replaced.
v If necessary, unlocks the carrier for disk replacement.
After the disk is replaced and the carrier reinserted, another mmchcarrier command with the --replace
option powers on the disks.
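As a sketch of that sequence, using the recovery group and pdisk names from the example later in this
chapter:
mmchcarrier BB1RGL --release --pdisk e1d5s01   # prepare the carrier and identify the disk
# ... physically replace the disk and reinsert the carrier ...
mmchcarrier BB1RGL --replace --pdisk e1d5s01   # power the disks back on and bring the pdisk back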
You can replace the disk either manually or with the help of directed maintenance procedures (DMP) that
are available in the GUI. The ESS GUI lists events in its event log in the Monitoring > Events page if a
disk failure is reported in the system. Select the gnr_pdisk_replaceable event from the list of events and
then select Run Fix Procedure from the Action menu to launch the replace disk DMP in the GUI. For
more information, see Replace disks in Elastic Storage Server: Problem Determination Guide.
In the case that an IBM Spectrum Scale RAID server becomes permanently disabled, a manual failover
procedure exists that requires recabling to an alternate server. For more information, see the
mmchrecoverygroup command in the IBM Spectrum Scale: Command and Programming Reference. If both
the primary and backup IBM Spectrum Scale RAID servers for a recovery group fail, the recovery group
is unavailable until one of the servers is repaired.
Assume a GL4 building block on which the following two recovery groups are defined:
v BB1RGL, containing the disks in the left side of each drawer
v BB1RGR, containing the disks in the right side of each drawer
The data declustered arrays are defined according to GL4 best practices as follows:
v 58 pdisks per data declustered array
v Default disk replacement threshold value set to 2
The replacement threshold of 2 means that IBM Spectrum Scale RAID only requires disk replacement
when two or more disks fail in the declustered array; otherwise, rebuilding onto spare space or
reconstruction from redundancy is used to supply affected data. This configuration can be seen in the
output of mmlsrecoverygroup for the recovery groups, which are shown here for BB1RGL:
# mmlsrecoverygroup BB1RGL -L
declustered
recovery group arrays vdisks pdisks format version
----------------- ----------- ------ ------ --------------
BB1RGL 4 8 119 4.1.0.1
declustered checksum
vdisk RAID code array vdisk size block size granularity state remarks
------------------ ------------------ ----------- ---------- ---------- ----------- ----- -------
config data declustered array VCD spares actual rebuild spare space remarks
------------------ ------------------ ------------- --------------------------------- ----------------
rebuild space DA1 31 35 pdisk
rebuild space DA2 31 35 pdisk
config data max disk group fault tolerance actual disk group fault tolerance remarks
---------------- -------------------------------- --------------------------------- ----------------
rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance
system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor
vdisk max disk group fault tolerance actual disk group fault tolerance remarks
---------------- ----------------------------- --------------------------------- ----------------
ltip_BB1RGL 1 pdisk 1 pdisk
ltbackup_BB1RGL 0 pdisk 0 pdisk
lhome_BB1RGL 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
reserved1_BB1RGL 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
BB1RGLMETA1 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
BB1RGLDATA1 1 enclosure 1 enclosure
BB1RGLMETA2 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
BB1RGLDATA2 1 enclosure 1 enclosure
The indication that disk replacement is called for in this recovery group is the value of yes in the needs
service column for declustered array DA1.
The fact that DA1 is undergoing rebuild of its IBM Spectrum Scale RAID tracks that can tolerate one strip
failure is by itself not an indication that disk replacement is required; it merely indicates that data from a
failed disk is being rebuilt onto spare space. Only if the replacement threshold has been met will disks be
marked for replacement and the declustered array marked as needing service.
IBM Spectrum Scale RAID provides several indications that disk replacement is required:
v Entries in the Linux syslog
v The pdReplacePdisk callback, which can be configured to run an administrator-supplied script at the
moment a pdisk is marked for replacement
v The output from the following commands, which may be performed from the command line on any
IBM Spectrum Scale RAID cluster node (see the examples that follow):
1. mmlsrecoverygroup with the -L flag shows yes in the needs service column
2. mmlsrecoverygroup with the -L and --pdisk flags; this shows the states of all pdisks, which may be
examined for the replace pdisk state
3. mmlspdisk with the --replace flag, which lists only those pdisks that are marked for replacement
Note: Because the output of mmlsrecoverygroup -L --pdisk is long, this example shows only some of the
pdisks (but includes those marked for replacement).
# mmlsrecoverygroup BB1RGL -L --pdisk
declustered
recovery group arrays vdisks pdisks
----------------- ----------- ------ ------
BB1RGL 3 5 119
The preceding output shows that the following pdisks are marked for replacement:
v e1d5s01 in DA1
v e1d5s04 in DA1
The naming convention used during recovery group creation indicates that these disks are in Enclosure 1
Drawer 5 Slot 1 and Enclosure 1 Drawer 5 Slot 4. To confirm the physical locations of the failed disks, use
the mmlspdisk command to list information about the pdisks in declustered array DA1 of recovery group
BB1RGL that are marked for replacement:
# mmlspdisk BB1RGL --declustered-array DA1 --replace
pdisk:
replacementPriority = 0.98
name = "e1d5s01"
device = ""
recoveryGroup = "BB1RGL"
declusteredArray = "DA1"
state = "slow/noPath/systemDrain/noRGD/noVCD/replace"
.
.
.
pdisk:
replacementPriority = 0.98
name = "e1d5s04"
device = ""
recoveryGroup = "BB1RGL"
declusteredArray = "DA1"
state = "failing/noPath/systemDrain/noRGD/noVCD/replace"
.
.
.
The physical locations of the failed disks are confirmed to be consistent with the pdisk naming
convention and with the IBM Spectrum Scale RAID component database:
--------------------------------------------------------------------------------------
Disk Location User Location
--------------------------------------------------------------------------------------
pdisk e1d5s01 SV21314035-5-1 Rack BB1 U01-04, Enclosure BB1ENC1 Drawer 5 Slot 1
--------------------------------------------------------------------------------------
pdisk e1d5s04 SV21314035-5-4 Rack BB1 U01-04, Enclosure BB1ENC1 Drawer 5 Slot 4
--------------------------------------------------------------------------------------
The relationship between the enclosure serial number and the user location can be seen with the
mmlscomp command:
# mmlscomp --serial-number SV21314035
Note: In this example, it is assumed that two new disks with the appropriate Field Replaceable Unit
(FRU) code, as indicated by the fru attribute (90Y8597 in this case), have been obtained as replacements
for the failed pdisks e1d5s01 and e1d5s04.
IBM Spectrum Scale RAID assigns a priority to pdisk replacement. Disks with smaller values for the
replacementPriority attribute should be replaced first. In this example, the only failed disks are in DA1
and both have the same replacementPriority.
- Remove carrier.
- Replace disk in location SV21314035-5-1 with FRU 90Y8597.
- Reinsert carrier.
- Issue the following command:
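For this example, the command that completes the replacement of the first disk takes a form similar to
the following:
mmchcarrier BB1RGL --replace --pdisk e1d5s01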
- Remove carrier.
- Replace disk in location SV21314035-5-4 with FRU 90Y8597.
- Reinsert carrier.
- Issue the following command:
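For this example, the command that completes the replacement of the second disk takes a form similar
to the following:
mmchcarrier BB1RGL --replace --pdisk e1d5s04
After both disks are replaced, running mmlsrecoverygroup BB1RGL -L again shows the updated pdisk
count: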
declustered
recovery group arrays vdisks pdisks
----------------- ----------- ------ ------
BB1RGL 3 5 121
Notice that the temporary pdisks (e1d5s01#026 and e1d5s04#029) representing the now-removed physical
disks are counted toward the total number of pdisks in the recovery group BB1RGL and the declustered
array DA1. They exist until IBM Spectrum Scale RAID rebuild completes the reconstruction of the data
that they carried onto other disks (including their replacements). When rebuild completes, the temporary
pdisks disappear, the number of disks in DA1 will once again be 58, and the number of disks in BB1RGL
will once again be 119.
The mmlsenclosure command can be used to show you which enclosures need service along with the
specific component. A best practice is to run this command every day to check for failures.
# mmlsenclosure all -L --not-ok
needs
serial number service nodes
------------- ------- ------
SV21313971 yes c45f02n01-ib0.gpfs.net
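A minimal sketch of automating this daily check (the schedule and mail recipient are assumptions, and
the mail command must be available on the node) is a root crontab entry such as:
0 6 * * * /usr/lpp/mmfs/bin/mmlsenclosure all -L --not-ok | mail -s "ESS enclosure check" admin@example.com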
When you are ready to replace the failed component, use the mmchenclosure command to identify
whether it is safe to complete the repair action or whether IBM Spectrum Scale needs to be shut down
first:
# mmchenclosure SV21313971 --component fan --component-id 1_BOT_LEFT
In the following example, only the enclosure itself is being called out as having failed; the specific
component that has actually failed is not identified. This typically means that there are drive “Service
Action Required (Fault)” LEDs that have been turned on in the drawers. In such a situation, the
mmlspdisk all --not-ok command can be used to check for dead or failing disks.
mmlsenclosure all -L --not-ok
needs
serial number service nodes
------------- ------- ------
SV13306129 yes c45f01n01-ib0.gpfs.net
b. Determine the actual disk group fault tolerance of the vdisks in both recovery groups using the
mmlsrecoverygroup RecoveryGroupName -L command. The rg descriptor and all the vdisks must be
able to tolerate the loss of the item being replaced plus one other item. This is necessary because
the disk group fault tolerance code uses a definition of "tolerance" that includes the system
running in critical mode. Because putting the system into critical state is not advised, one other
item is required. For example, any of the following would be a valid fault tolerance for continuing with
drawer replacement: 1E+1D, 1D+1P, or 2D.
c. Compare the actual disk group fault tolerance with the disk group fault tolerance listed in Table 4
on page 54. If the system is using a mix of 2-fault-tolerant and 3-fault-tolerant vdisks, the
comparisons must be done with the weaker (2-fault-tolerant) values. If the fault tolerance can
tolerate at least the item being replaced plus one other item, then replacement can proceed. Go to
step 4.
4. Drawer Replacement procedure.
a. Quiesce the pdisks.
Choose one of the following methods to suspend all the pdisks in the drawer.
v Using the chdrawer sample script:
/usr/lpp/mmfs/samples/vdisk/chdrawer EnclosureSerialNumber DrawerNumber --release
v Manually using the mmchpdisk command:
for slotNumber in 01 02 03 04 05 06 ; do mmchpdisk LeftRecoveryGroupName --pdisk \
e{EnclosureNumber}d{DrawerNumber}s{$slotNumber} --suspend ; done
Example
The system is a GL4 with vdisks that have 4way mirroring and 8+3p RAID codes. Assume that the
drawer that contains pdisk e2d3s01 needs to be replaced because one of the drawer control modules has
failed (so that you only see one path to the drives instead of 2). This means that you are trying to replace
drawer 3 in enclosure 2. Assume that the drawer spans recovery groups rgL and rgR.
Examine the states of the pdisks and find that they are all ok.
> mmlsrecoverygroup rgL -L --pdisk | grep e2d3
e2d3s01 1, 2 DA1 1862 GiB normal ok
e2d3s02 1, 2 DA1 1862 GiB normal ok
e2d3s03 1, 2 DA1 1862 GiB normal ok
e2d3s04 1, 2 DA1 1862 GiB normal ok
e2d3s05 1, 2 DA1 1862 GiB normal ok
e2d3s06 1, 2 DA1 1862 GiB normal ok
Determine whether online replacement is theoretically possible by consulting Table 4 on page 54.
The system is ESS GL4, so according to the last column drawer replacement is theoretically possible.
Determine the actual disk group fault tolerance of the vdisks in both recovery groups.
> mmlsrecoverygroup rgL -L
declustered
recovery group arrays vdisks pdisks format version
----------------- ------------- ------ ------ --------------
rgL 4 5 119 4.2.0.1
declustered checksum
vdisk RAID code array vdisk size block size granularity state remarks
----------------- ----------------- ----------- ---------- ---------- ----------- ----- -------
logtip_rgL 2WayReplication NVR 48 MiB 2 MiB 4096 ok logTip
logtipbackup_rgL Unreplicated SSD 48 MiB 2 MiB 4096 ok logTipBackup
loghome_rgL 4WayReplication DA1 20 GiB 2 MiB 4096 ok log
md_DA1_rgL 4WayReplication DA1 101 GiB 512 KiB 32 KiB ok
da_DA1_rgL 8+3p DA1 110 TiB 8 MiB 32 KiB ok
config data declustered array VCD spares actual rebuild spare space remarks
----------------- ----------------- ----------- -------------------------------- ------------------------
rebuild space DA1 31 35 pdisk
rebuild space DA2 31 36 pdisk
config data max disk group fault tolerance actual disk group fault tolerance remarks
----------------- ------------------------------ --------------------------------- ------------------------
rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance
system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor
vdisk max disk group fault tolerance actual disk group fault tolerance remarks
----------------- ------------------------------ --------------------------------- ------------------------
logtip_rgL 1 pdisk 1 pdisk
logtipbackup_rgL 0 pdisk 0 pdisk
loghome_rgL 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
md_DA1_rgL 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
da_DA1_rgL 1 enclosure 1 enclosure
.
.
.
> mmlsrecoverygroup rgR -L
declustered
recovery group arrays vdisks pdisks format version
----------------- ------------- ------ ------ --------------
rgR 4 5 119 4.2.0.1
declustered checksum
vdisk RAID code array vdisk size block size granularity state remarks
----------------- ----------------- ----------- ---------- ---------- ----------- ----- -------
logtip_rgR 2WayReplication NVR 48 MiB 2 MiB 4096 ok logTip
logtipbackup_rgR Unreplicated SSD 48 MiB 2 MiB 4096 ok logTipBackup
loghome_rgR 4WayReplication DA1 20 GiB 2 MiB 4096 ok log
md_DA1_rgR 4WayReplication DA1 101 GiB 512 KiB 32 KiB ok
da_DA1_rgR 8+3p DA1 110 TiB 8 MiB 32 KiB ok
config data declustered array VCD spares actual rebuild spare space remarks
----------------- ----------------- ----------- -------------------------------- ------------------------
rebuild space DA1 31 35 pdisk
rebuild space DA2 31 36 pdisk
config data max disk group fault tolerance actual disk group fault tolerance remarks
----------------- ------------------------------ --------------------------------- ------------------------
rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance
system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor
vdisk max disk group fault tolerance actual disk group fault tolerance remarks
The rg descriptor has an actual fault tolerance of 1 enclosure + 1 drawer (1E+1D). The data vdisks have a
RAID code of 8+3P and an actual fault tolerance of 1 enclosure (1E). The metadata vdisks have a RAID
code of 4WayReplication and an actual fault tolerance of 1 enclosure + 1 drawer (1E+1D).
Compare the actual disk group fault tolerance with the disk group fault tolerance listed in Table 4 on
page 54.
The actual values match the table values exactly. Therefore, drawer replacement can proceed.
Choose one of the following methods to suspend all the pdisks in the drawer.
v Using the chdrawer sample script:
/usr/lpp/mmfs/samples/vdisk/chdrawer SV21106537 3 --release
v Manually using the mmchpdisk command:
for slotNumber in 01 02 03 04 05 06 ; do mmchpdisk rgL --pdisk e2d3s$slotNumber --suspend ; done
for slotNumber in 07 08 09 10 11 12 ; do mmchpdisk rgR --pdisk e2d3s$slotNumber --suspend ; done
Verify the states of the pdisks and find that they are all suspended.
Remove the drives; make sure to record the location of the drives and label them. You will need to
replace them in the corresponding slots of the new drawer later.
Prerequisite information:
v IBM Spectrum Scale 4.1.1 PTF8 or 4.2.1 PTF1 is a prerequisite for this procedure to work. If you are not
at one of these levels or higher, contact IBM.
v This procedure is intended to be done as a partnership between the storage administrator and a
hardware service representative. The storage administrator is expected to understand the IBM
Spectrum Scale RAID concepts and the locations of the storage enclosures. The storage administrator is
responsible for all the steps except those in which the hardware is actually being worked on.
v The pdisks in a drawer span two recovery groups; therefore, it is very important that you examine the
pdisks and the fault tolerance of the vdisks in both recovery groups when going through these steps.
v An underlying principle is that enclosure replacement should never deliberately put any vdisk into
critical state. When vdisks are in critical state, there is no redundancy and the next single sector or
I/O error can cause unavailability or data loss. If enclosure replacement is not possible without making
the system critical, then the ESS has to be shut down before the enclosure is removed. An example of
enclosure replacement follows these instructions.
1. If IBM Spectrum Scale is shut down: perform the enclosure replacement as soon as possible. Perform
steps 4b through 4h and then restart IBM Spectrum Scale.
2. Examine the states of the pdisks in the affected enclosure. If all the pdisk states are missing, dead, or
replace, then go to step 4b to perform drawer replacement as soon as possible without going through
any of the other steps in this procedure.
Assuming that you know the enclosure number and are using standard pdisk naming conventions,
you could use the following commands to display the pdisks and their states:
mmlsrecoverygroup LeftRecoveryGroupName -L --pdisk | grep e{EnclosureNumber}
mmlsrecoverygroup RightRecoveryGroupName -L --pdisk | grep e{EnclosureNumber}
3. Determine whether online replacement is possible.
a. Consult the following table to see if enclosure replacement is theoretically possible for this
configuration. The only required input at this step is the ESS model. The table shows each
possible ESS system as well as the configuration parameters for the systems. If the table indicates
that online replacement is impossible, IBM Spectrum Scale will need to be shut down (on at least
the two I/O servers involved) and you should go back to step 1. The fault tolerance notation uses
E for enclosure, D for drawer, and P for pdisk.
b. Determine the actual disk group fault tolerance of the vdisks in both recovery groups using the
mmlsrecoverygroup RecoveryGroupName -L command. The rg descriptor and all the vdisks must be
able to tolerate the loss of the item being replaced plus one other item. This is necessary because
the disk group fault tolerance code uses a definition of "tolerance" that includes the system
running in critical mode. Because putting the system into critical state is not advised, one other
item is required. For example, any of the following would be a valid fault tolerance for continuing with
enclosure replacement: 1E+1D or 1E+1P.
c. Compare the actual disk group fault tolerance with the disk group fault tolerance listed in Table 5
on page 60. If the system is using a mix of 2-fault-tolerant and 3-fault-tolerant vdisks, the
comparisons must be done with the weaker (2-fault-tolerant) values. If the fault tolerance can
tolerate at least the item being replaced plus one other item, then replacement can proceed. Go to
step 4.
4. Enclosure Replacement procedure.
a. Quiesce the pdisks.
For GL systems, issue the following commands for each drawer.
for slotNumber in 01 02 03 04 05 06 ; do mmchpdisk LeftRecoveryGroupName --pdisk \
e{EnclosureNumber}d{DrawerNumber}s{$slotNumber} --suspend ; done
Example
The system is a GL6 with vdisks that have 4way mirroring and 8+3p RAID codes. Assume that the
enclosure that contains pdisk e2d3s01 needs to be replaced. This means that you are trying to replace
enclosure 2.
Assume that the enclosure spans recovery groups rgL and rgR.
Examine the states of the pdisks and find that they are all ok instead of missing. (Given that you have
a failed enclosure, it is unlikely that all the drives would be in an ok state, but this is just an example.)
> mmlsrecoverygroup rgL -L --pdisk | grep e2
Determine whether online replacement is theoretically possible by consulting Table 5 on page 60.
The system is ESS GL6, so according to the last column enclosure replacement is theoretically possible.
Determine the actual disk group fault tolerance of the vdisks in both recovery groups.
> mmlsrecoverygroup rgL -L
declustered
recovery group arrays vdisks pdisks format version
----------------- ------------- ------ ------ --------------
rgL 4 5 177 4.2.0.1
declustered checksum
vdisk RAID code array vdisk size block size granularity state remarks
----------------- ----------------- ----------- ---------- ---------- ----------- ----- -------
logtip_rgL 2WayReplication NVR 48 MiB 2 MiB 4096 ok logTip
logtipbackup_rgL Unreplicated SSD 48 MiB 2 MiB 4096 ok logTipBackup
loghome_rgL 4WayReplication DA1 20 GiB 2 MiB 4096 ok log
md_DA1_rgL 4WayReplication DA1 101 GiB 512 KiB 32 KiB ok
da_DA1_rgL 8+3p DA1 110 TiB 8 MiB 32 KiB ok
config data declustered array VCD spares actual rebuild spare space remarks
----------------- ----------------- ----------- -------------------------------- ------------------------
rebuild space DA1 31 35 pdisk
vdisk max disk group fault tolerance actual disk group fault tolerance remarks
----------------- ------------------------------ --------------------------------- ------------------------
logtip_rgL 1 pdisk 1 pdisk
logtipbackup_rgL 0 pdisk 0 pdisk
loghome_rgL 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
md_DA1_rgL 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
da_DA1_rgL 1 enclosure + 1 drawer 1 enclosure + 1 drawer
.
.
.
> mmlsrecoverygroup rgR -L
declustered
recovery group arrays vdisks pdisks format version
----------------- ------------- ------ ------ --------------
rgR 4 5 177 4.2.0.1
declustered checksum
vdisk RAID code array vdisk size block size granularity state remarks
----------------- ----------------- ----------- ---------- ---------- ----------- ----- -------
logtip_rgR 2WayReplication NVR 48 MiB 2 MiB 4096 ok logTip
logtipbackup_rgR Unreplicated SSD 48 MiB 2 MiB 4096 ok logTipBackup
loghome_rgR 4WayReplication DA1 20 GiB 2 MiB 4096 ok log
md_DA1_rgR 4WayReplication DA1 101 GiB 512 KiB 32 KiB ok
da_DA1_rgR 8+3p DA1 110 TiB 8 MiB 32 KiB ok
config data declustered array VCD spares actual rebuild spare space remarks
----------------- ----------------- ----------- -------------------------------- ------------------------
rebuild space DA1 31 35 pdisk
config data max disk group fault tolerance actual disk group fault tolerance remarks
----------------- ------------------------------ --------------------------------- ------------------------
rg descriptor 1 enclosure + 1 drawer 1 enclosure + 1 drawer limiting fault tolerance
system index 2 enclosure 1 enclosure + 1 drawer limited by rg descriptor
vdisk max disk group fault tolerance actual disk group fault tolerance remarks
----------------- ------------------------------ --------------------------------- ------------------------
logtip_rgR 1 pdisk 1 pdisk
logtipbackup_rgR 0 pdisk 0 pdisk
loghome_rgR 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
md_DA1_rgR 3 enclosure 1 enclosure + 1 drawer limited by rg descriptor
da_DA1_rgR 1 enclosure + 1 drawer 1 enclosure + 1 drawer
The rg descriptor has an actual fault tolerance of 1 enclosure + 1 drawer (1E+1D). The data vdisks have a
RAID code of 8+3P and an actual fault tolerance of 1 enclosure (1E). The metadata vdisks have a RAID
code of 4WayReplication and an actual fault tolerance of 1 enclosure + 1 drawer (1E+1D).
Compare the actual disk group fault tolerance with the disk group fault tolerance listed in Table 5 on
page 60.
The actual values match the table values exactly. Therefore, enclosure replacement can proceed.
Verify the pdisks were suspended using the mmlsrecoverygroup command. You should see suspended as
part of the pdisk state.
> mmlsrecoverygroup rgL -L --pdisk | grep e2d
e2d1s01 0, 4 DA1 96 GiB normal ok/suspended
e2d1s02 0, 4 DA1 96 GiB normal ok/suspended
e2d1s04 0, 4 DA1 96 GiB normal ok/suspended
e2d1s05 0, 4 DA2 2792 GiB normal ok/suspended
e2d1s06 0, 4 DA2 2792 GiB normal ok/suspended
e2d2s01 0, 4 DA1 96 GiB normal ok/suspended
.
.
.
Remove the drives; make sure to record the location of the drives and label them. You will need to
replace them in the corresponding drawer slots of the new enclosure later.
Replace the drives in the corresponding drawer slots of the new enclosure.
Verify the SAS topology on the servers to ensure that all drives from the new storage enclosure are
present.
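If the pdisks were suspended manually with mmchpdisk, resume them with the --resume option. A
minimal sketch, assuming five drawers per enclosure and the same slot-to-recovery-group split used
earlier in this chapter, is:
for drawerNumber in 1 2 3 4 5 ; do
for slotNumber in 01 02 03 04 05 06 ; do mmchpdisk rgL --pdisk e2d${drawerNumber}s${slotNumber} --resume ; done
for slotNumber in 07 08 09 10 11 12 ; do mmchpdisk rgR --pdisk e2d${drawerNumber}s${slotNumber} --resume ; done
done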
Verify that the pdisks were resumed by using the mmlsrecoverygroup command.
> mmlsrecoverygroup rgL -L --pdisk | grep e2
Assume a fully-populated Power 775 Disk Enclosure (serial number 000DE37) on which the following
two recovery groups are defined:
v 000DE37TOP containing the disks in the top set of carriers
v 000DE37BOT containing the disks in the bottom set of carriers
The replacement threshold of 2 means that GNR will only require disk replacement when two or more
disks have failed in the declustered array; otherwise, rebuilding onto spare space or reconstruction from
redundancy will be used to supply affected data.
This configuration can be seen in the output of mmlsrecoverygroup for the recovery groups, shown here
for 000DE37TOP:
# mmlsrecoverygroup 000DE37TOP -L
declustered
recovery group arrays vdisks pdisks
----------------- ----------- ------ ------
000DE37TOP 5 9 192
declustered
vdisk RAID code array vdisk size remarks
------------------ ------------------ ----------- ---------- -------
000DE37TOPLOG 3WayReplication LOG 4144 MiB log
000DE37TOPDA1META 4WayReplication DA1 250 GiB
000DE37TOPDA1DATA 8+3p DA1 17 TiB
000DE37TOPDA2META 4WayReplication DA2 250 GiB
000DE37TOPDA2DATA 8+3p DA2 17 TiB
000DE37TOPDA3META 4WayReplication DA3 250 GiB
000DE37TOPDA3DATA 8+3p DA3 17 TiB
000DE37TOPDA4META 4WayReplication DA4 250 GiB
000DE37TOPDA4DATA 8+3p DA4 17 TiB
The indication that disk replacement is called for in this recovery group is the value of yes in the needs
service column for declustered array DA3.
The fact that DA3 (the declustered array on the disks in carrier slot 3) is undergoing rebuild of its RAID
tracks that can tolerate two strip failures is by itself not an indication that disk replacement is required; it
merely indicates that data from a failed disk is being rebuilt onto spare space. Only if the replacement
threshold has been met will disks be marked for replacement and the declustered array marked as
needing service.
Note: Because the output of mmlsrecoverygroup -L --pdisk for a fully-populated disk enclosure is very
long, this example shows only some of the pdisks (but includes those marked for replacement).
# mmlsrecoverygroup 000DE37TOP -L --pdisk
declustered
recovery group arrays vdisks pdisks
----------------- ----------- ------ ------
000DE37TOP 5 9 192
The preceding output shows that the following pdisks are marked for replacement:
v c014d3 in DA3
v c018d3 in DA3
The naming convention used during recovery group creation indicates that these are the disks in slot 3 of
carriers 14 and 18. To confirm the physical locations of the failed disks, use the mmlspdisk command to
list information about those pdisks in declustered array DA3 of recovery group 000DE37TOP that are
marked for replacement:
# mmlspdisk 000DE37TOP --declustered-array DA3 --replace
pdisk:
replacementPriority = 1.00
name = "c014d3"
device = "/dev/rhdisk158,/dev/rhdisk62"
recoveryGroup = "000DE37TOP"
declusteredArray = "DA3"
state = "dead/systemDrain/noRGD/noVCD/replace"
.
.
.
pdisk:
replacementPriority = 1.00
name = "c018d3"
device = "/dev/rhdisk630,/dev/rhdisk726"
recoveryGroup = "000DE37TOP"
declusteredArray = "DA3"
state = "dead/systemDrain/noRGD/noVCD/noData/replace"
.
.
.
Replacing the failed disks in a Power 775 Disk Enclosure recovery group
Note: In this example, it is assumed that two new disks with the appropriate Field Replaceable Unit
(FRU) code, as indicated by the fru attribute (74Y4936 in this case), have been obtained as replacements
for the failed pdisks c014d3 and c018d3.
- Remove carrier.
- Replace disk in location 78AD.001.000DE37-C14-D3 with FRU 74Y4936.
- Reinsert carrier.
- Issue the following command:
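For this example, the command that completes the replacement takes a form similar to the following:
mmchcarrier 000DE37TOP --replace --pdisk c014d3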
When the mmchcarrier --replace command returns successfully, GNR has resumed use of the other 3
disks. The failed pdisk may remain in a temporary form (indicated here by the name c014d3#162) until all
data from it has been rebuilt, at which point it is finally deleted. The new replacement disk, which has
assumed the name c014d3, will have RAID tracks rebuilt and rebalanced onto it. Notice that only one
block device name is mentioned as being formatted as a pdisk; the second path will be discovered in the
background.
declustered
recovery group arrays vdisks pdisks
----------------- ----------- ------ ------
000DE37TOP 5 9 193
Notice that the temporary pdisk c014d3#162 is counted in the total number of pdisks in declustered array
DA3 and in the recovery group, until it is finally drained and deleted.
Notice also that pdisk c018d3 is still marked for replacement, and that DA3 still needs service. This is
because GNR replacement policy expects all failed disks in the declustered array to be replaced once the
replacement threshold is reached. The replace state on a pdisk is not removed when the total number of
failed disks goes under the threshold.
declustered
recovery group arrays vdisks pdisks
----------------- ----------- ------ ------
000DE37TOP 5 9 192
Notice that both temporary pdisks have been deleted. This is because c014d3#162 has finished draining,
and because pdisk c018d3#166 had, before it was replaced, already been completely drained (as
evidenced by the noData flag). Declustered array DA3 no longer needs service and once again contains 47
pdisks, and the recovery group once again contains 192 pdisks.
The following table provides details of the available DMPs and the corresponding events.
Table 6. DMPs
DMP Event ID
Replace disks gnr_pdisk_replaceable
Update enclosure firmware enclosure_firmware_wrong
Update drive firmware drive_firmware_wrong
Update host-adapter firmware adapter_firmware_wrong
Start NSD disk_down
Start GPFS daemon gpfs_down
Increase fileset space inode_error_high and inode_warn_high
Synchronize Node Clocks time_not_in_sync
Start performance monitoring collector service pmcollector_down
Start performance monitoring sensor service pmsensors_down
Activate AFM performance monitoring sensors afm_sensors_inactive
Activate NFS performance monitoring sensors nfs_sensors_inactive
Activate SMB performance monitoring sensors smb_sensors_inactive
Replace disks
The replace disks DMP assists you to replace the disks.
The following are the corresponding event details and proposed solution:
v Event name: gnr_pdisk_replaceable
v Problem: The state of a physical disk is changed to “replaceable”.
v Solution: Replace the disk.
The ESS GUI detects if a disk is broken and whether it needs to be replaced. In this case, launch this
DMP to get support to replace the broken disks. You can use this DMP either to replace one disk or
multiple disks.
The DMP automatically launches in the corresponding mode depending on the situation. You can launch this
DMP from the following pages in the GUI and follow the wizard to release one or more disks:
v Monitoring > Hardware page: Select Replace Broken Disks from the Actions menu.
v Monitoring > Hardware page: Select the broken disk to be replaced in an enclosure and then select
Replace from the Actions menu.
v Monitoring > Events page: Select the gnr_pdisk_replaceable event from the event listing and then select
Run Fix Procedure from the Actions menu.
v Storage > Physical page: Select Replace Broken Disks from the Actions menu.
v Storage > Physical page: Select the disk to be replaced and then select Replace Disk from the Actions
menu.
The system issues the mmchcarrier command to replace disks as given in the following format:
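A form similar to the following is used (the <<...>> placeholders are illustrative):
mmchcarrier <<Disk_RecoveryGroup>> --release --pdisk <<Disk_Name>>
mmchcarrier <<Disk_RecoveryGroup>> --replace --pdisk <<Disk_Name>>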
| The system uses the following command on an mmvdisk-enabled environment to release and replace the
| disk:
| mmvdisk pdisk replace [--prepare | --cancel] --recovery-group DiskRecoveryGroup --pdisk DiskName
Update enclosure firmware
The following are the corresponding event details and the proposed solution:
v Event name: enclosure_firmware_wrong
v Problem: The reported firmware level of the environmental service module is not compliant with the
recommendation.
v Solution: Update the firmware.
If more than one enclosure is not running the newest version of the firmware, the system prompts to
update the firmware. The system issues the mmchfirmware command to update firmware as given in the
following format:
mmchfirmware --esms <<ESM_Name>> --cluster <<Cluster_Id>>
For all the enclosures: mmchfirmware --esms --cluster <<Cluster_Id>>
Update drive firmware
The following are the corresponding event details and the proposed solution:
v Event name: drive_firmware_wrong
v Problem: The reported firmware level of the physical disk is not compliant with the recommendation.
v Solution: Update the firmware.
If more than one disk is not running the newest version of the firmware, the system prompts to update
the firmware. The system issues the chfirmware command to update firmware as given in the following
format:
For example:
chfirmware --pdisks <<ENC123001/DRV-2>> --cluster 1857390657572243170
Update host-adapter firmware
The following are the corresponding event details and the proposed solution:
v Event name: adapter_firmware_wrong
v Problem: The reported firmware level of the host adapter is not compliant with the recommendation.
v Solution: Update the firmware.
If more than one host-adapter is not running the newest version of the firmware, the system prompts to
update the firmware. The system issues the chfirmware command to update firmware as given in the
following format:
For example:
chfirmware --hostadapter <<c45f02n04_HBA_2>> --cluster 1857390657572243170
For example:
chfirmware --pdisks --cluster 1857390657572243170
Start NSD
The Start NSD DMP assists you to start NSDs that are not working.
The following are the corresponding event details and the proposed solution:
v Event ID: disk_down
v Problem: The availability of an NSD is changed to “down”.
v Solution: Recover the NSD
The DMP provides the option to start the NSDs that are not functioning. If multiple NSDs are down, you
can select whether to recover only one NSD or all of them.
The system issues the mmchdisk command to recover NSDs as given in the following format:
/usr/lpp/mmfs/bin/mmchdisk <device> start -d <disk description>
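For example (the device and disk names are illustrative):
/usr/lpp/mmfs/bin/mmchdisk gpfs0 start -d gpfs1nsd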
Start GPFS daemon
The following are the corresponding event details and the proposed solution:
v Event ID: gpfs_down
v Problem: The GPFS daemon is down. GPFS is not operational on the node.
v Solution: Start GPFS daemon.
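The GPFS daemon can also be started manually on the affected node with the mmstartup command; for
example (the node name is illustrative):
/usr/lpp/mmfs/bin/mmstartup -N node1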
Increase fileset space
The procedure helps to increase the maximum number of inodes by a percentage of the already allocated
inodes. The following are the corresponding event details and the proposed solution:
v Event ID: inode_error_high and inode_warn_high
v Problem: The inode usage in the fileset reached an exhausted level.
v Solution: Increase the maximum number of inodes.
The system issues the mmchfileset command to increase the maximum number of inodes, as given in the
following format:
/usr/lpp/mmfs/bin/mmchfileset <Device> <Fileset> --inode-limit <inodesMaxNumber>
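For example (the device, fileset name, and inode limit are illustrative):
/usr/lpp/mmfs/bin/mmchfileset gpfs0 fileset1 --inode-limit 300000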
Synchronize node clocks
The procedure assists in fixing timing issues on a single node or on all nodes that are out of sync. The
following are the corresponding event details and the proposed solution:
v Event ID: time_not_in_sync
v Limitation: This DMP is not available in sudo wrapper clusters. In a sudo wrapper cluster, the user
name is different from 'root'. The system detects the user name by finding the parameter
GPFS_USER=<user name>, which is available in the file /usr/lpp/mmfs/gui/conf/gpfsgui.properties.
v Problem: The time on the node is not synchronized with the time on the GUI node. It differs by more
than 1 minute.
v Solution: Synchronize the time with the time on the GUI node.
The system issues the sync_node_time command as given in the following format to synchronize the time
in the nodes:
/usr/lpp/mmfs/gui/bin/sync_node_time <nodeName>
Start performance monitoring collector service
The following are the corresponding event details and the proposed solution:
v Event ID: pmcollector_down
v Limitation: This DMP is not available in sudo wrapper clusters when a remote pmcollector service is
used by the GUI. A remote pmcollector service is detected when a value other than localhost is
specified for ZIMonAddress in the configuration file that is located at: /usr/lpp/mmfs/gui/conf/
The system restarts the performance monitoring services by issuing the systemctl restart pmcollector
command.
The performance monitoring collector service might be on some other node of the current cluster. In this
case, the DMP first connects to that node, then restarts the performance monitoring collector service.
ssh <nodeAddress> systemctl restart pmcollector
In a sudo wrapper cluster, when the collector on a remote node is down, the DMP does not restart the
collector services by itself. You need to restart them manually.
Start performance monitoring sensor service
The following are the corresponding event details and the proposed solution:
v Event ID: pmsensors_down
v Limitation: This DMP is not available in sudo wrapper clusters. In a sudo wrapper cluster, the user
name is different from 'root'. The system detects the user name by finding the parameter
GPFS_USER=<user name>, which is available in the file /usr/lpp/mmfs/gui/conf/gpfsgui.properties.
v Problem: The performance monitoring sensor service pmsensor is not sending any data. The service
might be down or the difference between the time of the node and the node hosting the performance
monitoring collector service pmcollector is more than 15 minutes.
v Solution: Issue systemctl status pmsensors to verify the status of the sensor service. If pmsensor
service is inactive, issue systemctl start pmsensors.
The system restarts the sensors by issuing the systemctl restart pmsensors command.
Events
The recorded events are stored in a local database on each node. The user can get a list of recorded
events by using the mmhealth node eventlog command.
The following sections list the RAS events that are applicable to various components of the IBM Spectrum
Scale system:
Messages
This topic contains explanations for IBM Spectrum Scale RAID and ESS GUI messages.
For information about IBM Spectrum Scale messages, see the IBM Spectrum Scale: Problem Determination
Guide.
For IBM Spectrum Scale messages, the severity tag is optionally followed by a colon (:) and a number,
and surrounded by an opening and closing bracket ([ ]). For example:
[E] or [E:nnn]
If more than one substring within a message matches this pattern (for example, [A] or [A:nnn]), the
severity tag is the first such matching string.
When the severity tag includes a numeric code (nnn), this is an error code associated with the message. If
this were the only problem encountered by the command, the command return code would be nnn.
If a message does not have a severity tag, the message does not conform to this specification. You can
determine the message severity by examining the text or any supplemental information provided in the
message catalog, or by contacting the IBM Support Center.
For IBM Spectrum Scale messages, this priority can be used to filter the messages that are sent to the
error log on Linux. Filtering is controlled with the mmchconfig attribute systemLogLevel. The default for
systemLogLevel is error, which means that IBM Spectrum Scale will send all error [E], critical [X], and
alert [A] messages to the error log. The values allowed for systemLogLevel are: alert, critical, error,
warning, notice, configuration, informational, detail, or debug. Additionally, the value none can be
specified so no messages are sent to the error log.
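For example, a sketch of restricting the error log to warning and more severe messages is:
mmchconfig systemLogLevel=warning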
The following table lists the IBM Spectrum Scale message severity tags in order of priority:
Table 7. IBM Spectrum Scale message severity tags ordered by priority
Severity tag   Type of message (systemLogLevel attribute)   Meaning
A alert Indicates a problem where action must be taken immediately. Notify the
appropriate person to correct the problem.
X critical Indicates a critical condition that should be corrected immediately. The
system discovered an internal inconsistency of some kind. Command
execution might be halted or the system might attempt to continue despite
the inconsistency. Report these errors to IBM.
E error Indicates an error condition. Command execution might or might not
continue, but this error was likely caused by a persistent condition and will
remain until corrected by some other program or administrative action. For
example, a command operating on a single file or other GPFS object might
terminate upon encountering any condition of severity E. As another
example, a command operating on a list of files, finding that one of the files
has permission bits set that disallow the operation, might continue to
operate on all other files within the specified list of files.
W warning Indicates a problem, but command execution continues. The problem can be
a transient inconsistency. It can be that the command has skipped some
operations on some objects, or is reporting an irregularity that could be of
interest. For example, if a multipass command operating on many files
discovers during its second pass that a file that was present during the first
pass is no longer present, the file might have been removed by another
command or program.
N notice Indicates a normal but significant condition. These events are unusual, but
are not error conditions, and could be summarized in an email to
developers or administrators for spotting potential problems. No immediate
action is required.
C configuration Indicates a configuration change, such as creating a file system or removing
a node from the cluster.
I informational Indicates normal operation. This message by itself indicates that nothing is
wrong; no action is required.
D detail Indicates verbose operational messages; no action is required.
B debug Indicates debug-level messages that are useful to application developers for
debugging purposes. This information is not useful during operations.
For ESS GUI messages, error messages (E) have the highest priority and informational messages (I)
have the lowest priority.
The following table lists the ESS GUI message severity tags in order of priority:
Table 8. ESS GUI message severity tags ordered by priority
Severity tag Type of message Meaning
E Error Indicates a critical condition that should be corrected immediately. The
system discovered an internal inconsistency of some kind. Command
execution might be halted or the system might attempt to continue despite
the inconsistency. Report these errors to IBM.
For information about the severity designations of these messages, see “Message severity tags” on page
77.
User response: Correct the nsdRAIDTracks attribute and restart the GPFS daemon.

Explanation: The number of pdisks specified is not valid.
User response: Correct the input and retry the command.

6027-1853 [E] Recovery group recoveryGroupName does not exist or is not active.

6027-1858 [E] Cannot create declustered array arrayName; there can be at most number declustered arrays in a recovery group.
Explanation: The number of declustered arrays allowed in a recovery group has been exceeded.
User response: Reduce the number of declustered arrays in the input file and retry the command.

6027-1859 [E] Sector size of pdisk pdiskName is invalid.
Explanation: All pdisks in a recovery group must have the same physical sector size.
User response: Correct the input file to use a different disk and retry the command.

| 6027-1860 [E] Pdisk pdiskName must have a capacity of at least number bytes.
| Explanation: The pdisk must be at least as large as the indicated minimum size in order to be added to this declustered array.
| User response: Correct the input file and retry the command.

6027-1861 [W] Size of pdisk pdiskName is too large for declustered array arrayName. Only number of number bytes of that capacity will be used.
Explanation: For optimal utilization of space, pdisks added to this declustered array should be no larger than the indicated maximum size. Only the indicated portion of the total capacity of the pdisk will be available for use.
User response: Consider creating a new declustered array consisting of all larger pdisks.

6027-1862 [E] Cannot add pdisk pdiskName to declustered array arrayName; there can be at most number pdisks in a declustered array.
Explanation: The maximum number of pdisks that can be added to a declustered array was exceeded.
User response: None.

6027-1863 [E] Pdisk sizes within a declustered array cannot vary by more than number.
Explanation: The disk sizes within each declustered array must be nearly the same.
User response: Create separate declustered arrays for each disk size.

| 6027-1864 [E] At least one declustered array must contain number + vdisk configuration data spares or more pdisks and be eligible to hold vdisk configuration data.
| Explanation: When creating a new RAID recovery group, at least one of the declustered arrays in the recovery group must contain at least 2T+1 pdisks, where T is the maximum number of disk failures that can be tolerated within a declustered array. This is necessary in order to store the on-disk vdisk configuration data safely. This declustered array cannot have canHoldVCD set to no.
| User response: Supply at least the indicated number of pdisks in at least one declustered array of the recovery group, or do not specify canHoldVCD=no for that declustered array.

6027-1866 [E] Disk descriptor for diskName refers to an existing NSD.
Explanation: A disk being added to a recovery group appears to already be in-use as an NSD disk.
User response: Carefully check the disks given to tscrrecgroup, tsaddpdisk or tschcarrier. If you are certain the disk is not actually in-use, override the check by specifying the -v no option.

6027-1867 [E] Disk descriptor for diskName refers to an existing pdisk.
Explanation: A disk being added to a recovery group appears to already be in-use as a pdisk.
User response: Carefully check the disks given to tscrrecgroup, tsaddpdisk or tschcarrier. If you are certain the disk is not actually in-use, override the check by specifying the -v no option.

6027-1869 [E] Error updating the recovery group descriptor.
Explanation: Error occurred updating the RAID recovery group descriptor.
User response: Retry the command.

6027-1870 [E] Recovery group name name is already in use.
Explanation: The recovery group name already exists.
User response: Choose a new recovery group name using the characters a-z, A-Z, 0-9, and underscore, at most 63 characters in length.
6027-1871 [E] There is only enough free space to allocate number spare(s) in declustered array arrayName.
Explanation: Too many spares were specified.
User response: Retry the command with a valid number of spares.

6027-1872 [E] Recovery group still contains vdisks.
Explanation: RAID recovery groups that still contain vdisks cannot be deleted.
User response: Delete any vdisks remaining in this RAID recovery group using the tsdelvdisk command before retrying this command.

6027-1873 [E] Pdisk creation failed for pdisk pdiskName: err=errorNum.
Explanation: Pdisk creation failed because of the specified error.
User response: None.

6027-1874 [E] Error adding pdisk to a recovery group.
Explanation: tsaddpdisk failed to add new pdisks to a recovery group.
User response: Check the list of pdisks in the -d or -F parameter of tsaddpdisk.

6027-1875 [E] Cannot delete the only declustered array.
Explanation: Cannot delete the only remaining declustered array from a recovery group.
User response: Instead, delete the entire recovery group.

| 6027-1876 [E] Cannot remove declustered array arrayName because it is the only remaining declustered array with at least number pdisks eligible to hold vdisk configuration data.
| Explanation: The command failed to remove a declustered array because no other declustered array in the recovery group has sufficient pdisks to store the on-disk recovery group descriptor at the required fault tolerance level.
| User response: Add pdisks to another declustered array in this recovery group before removing this one.

6027-1877 [E] Cannot remove declustered array arrayName because the array still contains vdisks.
Explanation: Declustered arrays that still contain vdisks cannot be deleted.
User response: Delete any vdisks remaining in this declustered array using the tsdelvdisk command before retrying this command.

6027-1878 [E] Cannot remove pdisk pdiskName because it is the last remaining pdisk in declustered array arrayName. Remove the declustered array instead.
Explanation: The tsdelpdisk command can be used either to delete individual pdisks from a declustered array, or to delete a full declustered array from a recovery group. You cannot, however, delete a declustered array by deleting all of its pdisks -- at least one must remain.
User response: Delete the declustered array instead of removing all of its pdisks.

6027-1879 [E] Cannot remove pdisk pdiskName because arrayName is the only remaining declustered array with at least number pdisks.
Explanation: The command failed to remove a pdisk from a declustered array because no other declustered array in the recovery group has sufficient pdisks to store the on-disk recovery group descriptor at the required fault tolerance level.
User response: Add pdisks to another declustered array in this recovery group before removing pdisks from this one.

6027-1880 [E] Cannot remove pdisk pdiskName because the number of pdisks in declustered array arrayName would fall below the code width of one or more of its vdisks.
Explanation: The number of pdisks in a declustered array must be at least the maximum code width of any vdisk in the declustered array.
User response: Either add pdisks or remove vdisks from the declustered array.

6027-1881 [E] Cannot remove pdisk pdiskName because of insufficient free space in declustered array arrayName.
Explanation: The tsdelpdisk command could not delete a pdisk because there was not enough free space in the declustered array.
User response: Either add pdisks or remove vdisks from the declustered array.
6027-1882 [E] Cannot remove pdisk pdiskName; unable to drain the data from the pdisk.
Explanation: Pdisk deletion failed because the system could not find enough free space on other pdisks to drain all of the data from the disk.
User response: Either add pdisks or remove vdisks from the declustered array.

6027-1883 [E] Pdisk pdiskName deletion failed: process interrupted.
Explanation: Pdisk deletion failed because the deletion process was interrupted. This is most likely because of the recovery group failing over to a different server.
User response: Retry the command.

6027-1884 [E] Missing or invalid vdisk name.
Explanation: No vdisk name was given on the tscrvdisk command.
User response: Specify a vdisk name using the characters a-z, A-Z, 0-9, and underscore of at most 63 characters in length.

6027-1885 [E] Vdisk block size must be a power of 2.
Explanation: The -B or --blockSize parameter of tscrvdisk must be a power of 2.
User response: Reissue the tscrvdisk command with a correct value for block size.

6027-1886 [E] Vdisk block size cannot exceed maxBlockSize (number).
Explanation: The virtual block size of a vdisk cannot be larger than the value of the maxblocksize configuration attribute of the IBM Spectrum Scale mmchconfig command.
User response: Use a smaller vdisk virtual block size, or increase the value of maxBlockSize using mmchconfig maxblocksize=newSize.

6027-1887 [E] Vdisk block size must be between number and number for the specified code.
Explanation: An invalid vdisk block size was specified. The message lists the allowable range of block sizes.
User response: Use a vdisk virtual block size within the range shown, or use a different vdisk RAID code.

6027-1888 [E] Recovery group already contains number vdisks.
Explanation: The RAID recovery group already contains the maximum number of vdisks.
User response: Create vdisks in another RAID recovery group, or delete one or more of the vdisks in the current RAID recovery group before retrying the tscrvdisk command.

6027-1889 [E] Vdisk name vdiskName is already in use.
Explanation: The vdisk name given on the tscrvdisk command already exists.
User response: Choose a new vdisk name less than 64 characters using the characters a-z, A-Z, 0-9, and underscore.

6027-1890 [E] A recovery group may only contain one log home vdisk.
Explanation: A log vdisk already exists in the recovery group.
User response: None.

6027-1891 [E] Cannot create vdisk before the log home vdisk is created.
Explanation: The log vdisk must be the first vdisk created in a recovery group.
User response: Retry the command after creating the log home vdisk.

6027-1892 [E] Log vdisks must use replication.
Explanation: The log vdisk must use a RAID code that uses replication.
User response: Retry the command with a valid RAID code.

6027-1893 [E] The declustered array must contain at least as many non-spare pdisks as the width of the code.
Explanation: The RAID code specified requires a minimum number of disks larger than the size of the declustered array that was given.
User response: Place the vdisk in a wider declustered array or use a narrower code.

6027-1894 [E] There is not enough space in the declustered array to create additional vdisks.
Explanation: There is insufficient space in the declustered array to create even a minimum size vdisk with the given RAID code.
User response: Add additional pdisks to the declustered array, reduce the number of spares or use a different RAID code.
6027-1895 [E] Unable to create vdisk vdiskName because there are too many failed pdisks in declustered array declusteredArrayName.
Explanation: Cannot create the specified vdisk, because there are too many failed pdisks in the array.
User response: Replace failed pdisks in the declustered array and allow time for rebalance operations to more evenly distribute the space.

6027-1896 [E] Insufficient memory for vdisk metadata.
Explanation: There was not enough pinned memory for IBM Spectrum Scale to hold all of the metadata necessary to describe a vdisk.
User response: Increase the size of the GPFS page pool.

6027-1897 [E] Error formatting vdisk.
Explanation: An error occurred formatting the vdisk.
User response: None.

6027-1898 [E] The log home vdisk cannot be destroyed if there are other vdisks.
Explanation: The log home vdisk of a recovery group cannot be destroyed if vdisks other than the log tip vdisk still exist within the recovery group.
User response: Remove the user vdisks and then retry the command.

6027-1899 [E] Vdisk vdiskName is still in use.
Explanation: The vdisk named on the tsdelvdisk command is being used as an NSD disk.
User response: Remove the vdisk with the mmdelnsd command before attempting to delete it.

6027-3000 [E] No disk enclosures were found on the target node.
Explanation: IBM Spectrum Scale is unable to communicate with any disk enclosures on the node serving the specified pdisks. This might be because there are no disk enclosures attached to the node, or it might indicate a problem in communicating with the disk enclosures. While the problem persists, disk maintenance with the mmchcarrier command is not available.
User response: Check disk enclosure connections and run the command again. Use mmaddpdisk --replace as an alternative method of replacing failed disks.

6027-3001 [E] Location of pdisk pdiskName of recovery group recoveryGroupName is not known.
Explanation: IBM Spectrum Scale is unable to find the location of the given pdisk.
User response: Check the disk enclosure hardware.

6027-3002 [E] Disk location code locationCode is not known.
Explanation: A disk location code specified on the command line was not found.
User response: Check the disk location code.

6027-3003 [E] Disk location code locationCode was specified more than once.
Explanation: The same disk location code was specified more than once in the tschcarrier command.
User response: Check the command usage and run again.

6027-3004 [E] Disk location codes locationCode and locationCode are not in the same disk carrier.
Explanation: The tschcarrier command cannot be used to operate on more than one disk carrier at a time.
User response: Check the command usage and rerun.

6027-3005 [W] Pdisk in location locationCode is controlled by recovery group recoveryGroupName.
Explanation: The tschcarrier command detected that a pdisk in the indicated location is controlled by a different recovery group than the one specified.
User response: Check the disk location code and recovery group name.

6027-3006 [W] Pdisk in location locationCode is controlled by recovery group id idNumber.
Explanation: The tschcarrier command detected that a pdisk in the indicated location is controlled by a different recovery group than the one specified.
User response: Check the disk location code and recovery group name.
6027-3008 [E] Incorrect recovery group given for location.
Explanation: The mmchcarrier command detected that the specified recovery group name given does not match that of the pdisk in the specified location.
User response: Check the disk location code and recovery group name. If you are sure that the disks in the carrier are not being used by other recovery groups, it is possible to override the check using the --force-RG flag. Use this flag with caution as it can cause disk errors and potential data loss in other recovery groups.

6027-3009 [E] Pdisk pdiskName of recovery group recoveryGroupName is not currently scheduled for replacement.
Explanation: A pdisk specified in a tschcarrier or tsaddpdisk command is not currently scheduled for replacement.
User response: Make sure the correct disk location code or pdisk name was given. For the mmchcarrier command, the --force-release option can be used to override the check.

6027-3010 [E] Command interrupted.
Explanation: The mmchcarrier command was interrupted by a conflicting operation, for example the mmchpdisk --resume command on the same pdisk.
User response: Run the mmchcarrier command again.

6027-3011 [W] Disk location locationCode failed to power off.
Explanation: The mmchcarrier command detected an error when trying to power off a disk.
User response: Check the disk enclosure hardware. If the disk carrier has a lock and does not unlock, try running the command again or use the manual carrier release.

6027-3012 [E] Cannot find a pdisk in location locationCode.
Explanation: The tschcarrier command cannot find a pdisk to replace in the given location.

6027-3014 [E] Pdisk pdiskName of recovery group recoveryGroupName was expected to be replaced with a new disk; instead, it was moved from location locationCode to location locationCode.
Explanation: The mmchcarrier command expected a pdisk to be removed and replaced with a new disk. But instead of being replaced, the old pdisk was moved into a different location.
User response: Repeat the disk replacement procedure.

6027-3015 [E] Pdisk pdiskName of recovery group recoveryGroupName in location locationCode cannot be used as a replacement for pdisk pdiskName of recovery group recoveryGroupName.
Explanation: The tschcarrier command expected a pdisk to be removed and replaced with a new disk. But instead of finding a new disk, the mmchcarrier command found that another pdisk was moved to the replacement location.
User response: Repeat the disk replacement procedure, making sure to replace the failed pdisk with a new disk.

6027-3016 [E] Replacement disk in location locationCode has an incorrect type fruCode; expected type code is fruCode.
Explanation: The replacement disk has a different field replaceable unit type code than that of the original disk.
User response: Replace the pdisk with a disk of the same part number. If you are certain the new disk is a valid substitute, override this check by running the command again with the --force-fru option.

6027-3017 [E] Error formatting replacement disk diskName.
Explanation: An error occurred when trying to format a replacement pdisk.
User response: Check the replacement disk.
6027-3018 [E] A replacement for pdisk pdiskName of recovery group recoveryGroupName was not found in location locationCode.
Explanation: The tschcarrier command expected a pdisk to be removed and replaced with a new disk, but no replacement disk was found.
User response: Make sure a replacement disk was inserted into the correct slot.

6027-3019 [E] Pdisk pdiskName of recovery group recoveryGroupName in location locationCode was not replaced.
Explanation: The tschcarrier command expected a pdisk to be removed and replaced with a new disk, but the original pdisk was still found in the replacement location.
User response: Repeat the disk replacement, making sure to replace the pdisk with a new disk.

6027-3020 [E] Invalid state change, stateChangeName, for pdisk pdiskName.
Explanation: The tschpdisk command received a state change request that is not permitted.
User response: Correct the input and reissue the command.

6027-3021 [E] Unable to change identify state to identifyState for pdisk pdiskName: err=errorNum.
Explanation: The tschpdisk command failed on an identify request.
User response: Check the disk enclosure hardware.

6027-3022 [E] Unable to create vdisk layout.
Explanation: The tscrvdisk command could not create the necessary layout for the specified vdisk.
User response: Change the vdisk arguments and retry the command.

6027-3023 [E] Error initializing vdisk.
Explanation: The tscrvdisk command could not initialize the vdisk.
User response: Retry the command.

6027-3025 [E] Device deviceName does not exist or is not active on this node.
Explanation: The specified device was not found on this node.
User response: None.

6027-3026 [E] Recovery group recoveryGroupName does not have an active log home vdisk.
Explanation: The indicated recovery group does not have an active log vdisk. This may be because the log home vdisk has not yet been created, because a previously existing log home vdisk has been deleted, or because the server is in the process of recovery.
User response: Create a log home vdisk if none exists. Retry the command.

6027-3027 [E] Cannot configure NSD-RAID services on this node.
Explanation: NSD-RAID services are not supported on this operating system or node hardware.
User response: Configure a supported node type as the NSD-RAID server and restart the GPFS daemon.

6027-3028 [E] There is not enough space in declustered array declusteredArrayName for the requested vdisk size. The maximum possible size for this vdisk is size.
Explanation: There is not enough space in the declustered array for the requested vdisk size.
User response: Create a smaller vdisk, remove existing vdisks, or add additional pdisks to the declustered array.

6027-3029 [E] There must be at least number non-spare pdisks in declustered array declusteredArrayName to avoid falling below the code width of vdisk vdiskName.
Explanation: A change of spares operation failed because the resulting number of non-spare pdisks would fall below the code width of the indicated vdisk.
User response: Add additional pdisks to the declustered array.
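For message 6027-3028, a hedged sketch of checking the remaining capacity and retrying with a smaller vdisk; the recovery group name, stanza path, and size are assumptions for illustration only:
    # Show per-declustered-array capacity for the recovery group; the output should
    # include the remaining free space in each array.
    mmlsrecoverygroup rg_gssio1 -L
    # Edit the vdisk stanza to request a size no larger than the maximum reported in
    # the message (or target a different declustered array), then retry the creation.
    mmcrvdisk -F /tmp/vdisk.stanza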
6027-3043 [E] Log vdisks cannot have multiple use specifications.
Explanation: A vdisk can have usage vdiskLog, vdiskLogTip, or vdiskLogReserved, but not more than one.
User response: Retry the command with only one of the --log, --logTip, or --logReserved attributes.

6027-3044 [E] Unable to determine resource requirements for all the recovery groups served by node value: to override this check reissue the command with the -v no flag.
Explanation: A recovery group or vdisk is being created, but IBM Spectrum Scale cannot determine if there are enough non-stealable buffer resources to allow the node to successfully serve all the recovery groups at the same time once the new object is created.
User response: You can override this check by reissuing the command with the -v no flag.

6027-3045 [W] Buffer request exceeds the non-stealable buffer limit. Check the configuration attributes of the recovery group servers: pagepool, nsdRAIDBufferPoolSizePct, nsdRAIDNonStealableBufPct.
Explanation: The limit of non-stealable buffers has been exceeded. This is probably because the system is not configured correctly.
User response: Check the settings of the pagepool, nsdRAIDBufferPoolSizePct, and nsdRAIDNonStealableBufPct attributes and make sure the server has enough real memory to support the configured values. Use the mmchconfig command to correct the configuration.

6027-3047 [E] Location of pdisk pdiskName is not known.
Explanation: IBM Spectrum Scale is unable to find the location of the given pdisk.
User response: Check the disk enclosure hardware.

6027-3048 [E] Pdisk pdiskName is not currently scheduled for replacement.
Explanation: A pdisk specified in a tschcarrier or tsaddpdisk command is not currently scheduled for replacement.
User response: Make sure the correct disk location code or pdisk name was given. For the tschcarrier command, the --force-release option can be used to override the check.

6027-3049 [E] The minimum size for vdisk vdiskName is number.
Explanation: The vdisk size was too small.
User response: Increase the size of the vdisk and retry the command.

6027-3050 [E] There are already number suspended pdisks in declustered array arrayName. You must resume pdisks in the array before suspending more.
Explanation: The number of suspended pdisks in the declustered array has reached the maximum limit. Allowing more pdisks to be suspended in the array would put data availability at risk.
User response: Resume one or more suspended pdisks in the array by using the mmchcarrier or mmchpdisk commands, then retry the command.
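As a sketch of the kind of correction that message 6027-3045 calls for; the node names and the page pool value are assumptions for illustration, not values from this guide:
    # Display the current page pool setting; nsdRAIDBufferPoolSizePct and
    # nsdRAIDNonStealableBufPct can be checked the same way.
    mmlsconfig pagepool
    # Apply the same, larger page pool to both recovery group servers so that the
    # non-stealable buffer limit is no longer exceeded.
    mmchconfig pagepool=64G -N gssio1,gssio2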
6027-3063 [E] Unknown recovery group version version.
Explanation: The recovery group version named by the argument of the --version option was not recognized.
User response: Run the command with a new value for --version. The allowable values will be listed following this message.

6027-3064 [I] Allowable recovery group versions are:
Explanation: Informational message listing allowable recovery group versions.
User response: Run the command with one of the recovery group versions listed.

6027-3065 [E] The maximum size of a log tip vdisk is size.
Explanation: Running mmcrvdisk for a log tip vdisk failed because the size is too large.
User response: Correct the size parameter and run the command again.

6027-3066 [E] A recovery group may only contain one log tip vdisk.
Explanation: A log tip vdisk already exists in the recovery group.
User response: None.

6027-3067 [E] Log tip backup vdisks not supported by this recovery group version.
Explanation: Vdisks with usage type vdiskLogTipBackup are not supported by all recovery group versions.
User response: Upgrade the recovery group to a later version using the --version option of mmchrecoverygroup.

6027-3068 [E] The sizes of the log tip vdisk and the log tip backup vdisk must be the same.
Explanation: The log tip vdisk must be the same size as the log tip backup vdisk.
User response: Adjust the vdisk sizes and retry the mmcrvdisk command.

6027-3069 [E] Log vdisks cannot use code codeName.
Explanation: Log vdisks must use a RAID code that uses replication, or be unreplicated. They cannot use parity-based codes such as 8+2P.
User response: Retry the command with a valid RAID code.

6027-3070 [E] Log vdisk vdiskName cannot appear in the same declustered array as log vdisk vdiskName.
Explanation: No two log vdisks may appear in the same declustered array.
User response: Specify a different declustered array for the new log vdisk and retry the command.

6027-3071 [E] Device not found: deviceName.
Explanation: A device name given in an mmcrrecoverygroup or mmaddpdisk command was not found.
User response: Check the device name.

6027-3072 [E] Invalid device name: deviceName.
Explanation: A device name given in an mmcrrecoverygroup or mmaddpdisk command is invalid.
User response: Check the device name.

6027-3073 [E] Error formatting pdisk pdiskName on device diskName.
Explanation: An error occurred when trying to format a new pdisk.
User response: Check that the disk is working properly.

6027-3074 [E] Node nodeName not found in cluster configuration.
Explanation: A node name specified in a command does not exist in the cluster configuration.
User response: Check the command arguments.

6027-3075 [E] The --servers list must contain the current node, nodeName.
Explanation: The --servers list of a tscrrecgroup command does not list the server on which the command is being run.
User response: Check the --servers list. Make sure the tscrrecgroup command is run on a server that will actually serve the recovery group.

6027-3076 [E] Remote pdisks are not supported by this recovery group version.
Explanation: Pdisks that are not directly attached are not supported by all recovery group versions.
User response: Upgrade the recovery group to a later version using the --version option of mmchrecoverygroup.
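Several of the preceding messages (for example 6027-3063, 6027-3067, and 6027-3076) are resolved by upgrading the recovery group version. A brief sketch, with a placeholder recovery group name:
    # Show the recovery group's current version, declustered arrays, and vdisks.
    mmlsrecoverygroup rg_gssio1 -L
    # Upgrade the recovery group to the latest version supported by the installed software.
    mmchrecoverygroup rg_gssio1 --version LATEST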
6027-3084 [E] VCD spares feature must be enabled before being changed. Upgrade recovery group version to at least version to enable it.
Explanation: The vdisk configuration data (VCD) spares feature is not supported in the current recovery group version.
User response: Apply the recovery group version that is recommended in the error message and retry the command.

6027-3089 [E] Pdisk pdiskName location locationCode is already in use.
Explanation: The pdisk location that was specified in the command conflicts with another pdisk that is already in that location. No two pdisks can be in the same location.
User response: Specify a unique location for this pdisk.

| 6027-3092 [I] Recovery group recoveryGroupName assignment delay delaySeconds seconds for safe recovery.
| Explanation: The recovery group must wait before meta-data recovery. Prior disk lease for the failing manager must first expire.
| User response: None.

6027-3093 [E] Checksum granularity must be number or number for log vdisks.
Explanation: The only allowable values for the checksumGranularity attribute of a log vdisk are 512 and 4K.
User response: Change the checksumGranularity attribute of the vdisk, then retry the command.

6027-3094 [E] Due to the attributes of other log vdisks, the checksum granularity of this vdisk must be number.
Explanation: The checksum granularities of the log tip vdisk, the log tip backup vdisk, and the log home vdisk must all be the same.

6027-3098 [E] Pdisk name pdiskName is already in use in recovery group recoveryGroupName.
Explanation: The pdisk name already exists in the specified recovery group.
User response: Choose a pdisk name that is not already in use.

6027-3099 [E] Device with path(s) pathName is specified for both new pdisks pdiskName and pdiskName.
Explanation: The same device is specified for more than one pdisk in the stanza file. The device can have multiple paths, which are shown in the error message.
User response: Specify a different device for each new pdisk and run the command again.

6027-3800 [E] Device with path(s) pathName for new pdisk pdiskName is already in use by pdisk pdiskName of recovery group recoveryGroupName.
Explanation: The device specified for a new pdisk is already being used by an existing pdisk.
6027-3801 [E] The checksum granularity for log vdisks in declustered array declusteredArrayName of RG recoveryGroupName must be at least number bytes.
Explanation: Use a checksum granularity that is not smaller than the minimum value given. You can use the mmlspdisk command to view the logical block sizes of the pdisks in this array to identify which pdisks are driving the limit.
User response: Change the checksumGranularity attribute of the new log vdisk to the indicated value, and then retry the command.

6027-3802 [E] Pdisk pdiskName of RG recoveryGroupName has a logical block size of number bytes; the maximum logical block size for pdisks in declustered array declusteredArrayName cannot exceed the log checksum granularity of number bytes.
Explanation: The logical block size of pdisks added to this declustered array must not be larger than any log vdisk's checksum granularity.
User response: Use pdisks with a logical block size equal to or smaller than the log vdisk's checksum granularity.

6027-3803 [E] NSD format version 2 feature must be enabled before being changed. Upgrade recovery group version to at least recoveryGroupVersion to enable it.
Explanation: The NSD format version 2 feature is not supported in the current recovery group version.
User response: Apply the recovery group version recommended in the error message and retry the command.

| 6027-3804 [W] Skipping upgrade of pdisk pdiskName because the disk capacity of number bytes is less than the number bytes required for the new format.
| Explanation: The existing format of the indicated pdisk is not compatible with NSD V2 descriptors.
| User response: A complete format of the declustered array is required in order to upgrade to NSD V2.

| 6027-3806 [E] The device given for pdisk pdiskName has a logical block size of logicalBlockSize bytes, which is not supported by the recovery group version.
| Explanation: The current recovery group version does not support disk drives with the indicated logical block size.
| User response: Use a different disk device or upgrade the recovery group version and retry the command.

6027-3807 [E] NSD version 1 specified for pdisk pdiskName requires a disk with a logical block size of 512 bytes. The supplied disk has a block size of logicalBlockSize bytes. For this disk, you must use at least NSD version 2.
Explanation: The requested logical block size is not supported by NSD format version 1.
User response: Correct the input file to use a different disk or specify a higher NSD format version.

6027-3808 [E] Pdisk pdiskName must have a capacity of at least number bytes for NSD version 2.
Explanation: The pdisk must be at least as large as the indicated minimum size in order to be added to the declustered array.
User response: Correct the input file and retry the command.

| 6027-3809 [I] Pdisk pdiskName can be added as NSD version 1.
| Explanation: The pdisk has enough space to be configured as NSD version 1.
| User response: Specify NSD version 1 for this disk.

6027-3810 [W] Skipping the upgrade of pdisk pdiskName because no I/O paths are currently available.
Explanation: There is no I/O path available to the indicated pdisk.
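For block-size related messages such as 6027-3801, 6027-3802, and 6027-3806, the mmlspdisk output reports each pdisk's logical block size, which identifies the disks driving the limit. A sketch with placeholder names; the exact option and field names can vary by release:
    # List the pdisks in one declustered array and inspect the logical block size values.
    mmlspdisk rg_gssio1 --declustered-array DA1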
| 6027-3823 [E] Unknown node nodeName in the recovery group configuration.
| Explanation: A node name does not exist in the recovery group configuration manager.
| User response: Check for damage to the mmsdrfs file.

| 6027-3824 [E] The defined server serverName for recovery group recoveryGroupName could not be resolved.
| Explanation: The host name of the recovery group server could not be resolved by gethostbyName().
| User response: Fix the host name resolution.

| 6027-3825 [E] The defined server serverName for node class nodeClassName could not be resolved.
| Explanation: The host name of the recovery group server could not be resolved by gethostbyName().
| User response: Fix the host name resolution.

| 6027-3826 [A] Error reading volume identifier for recovery group recoveryGroupName from configuration file.
| Explanation: The volume identifier for the named recovery group could not be read from the mmsdrfs file. This should never occur.
| User response: Check for damage to the mmsdrfs file.

| 6027-3827 [A] Error reading volume identifier for vdisk vdiskName from configuration file.
| Explanation: The volume identifier for the named vdisk could not be read from the mmsdrfs file. This should never occur.
| User response: Check for damage to the mmsdrfs file.

| 6027-3828 [E] Vdisk vdiskName could not be associated with its recovery group recoveryGroupName and will be ignored.
| Explanation: The named vdisk cannot be associated with its recovery group.
| User response: Check for damage to the mmsdrfs file.

| 6027-3829 [E] A server list must be provided.
| Explanation: No server list is specified.
| User response: Specify a list of valid servers.

| 6027-3830 [E] Too many servers specified.
| Explanation: An input node list has too many nodes specified.
| User response: Verify the list of nodes and shorten the list to the supported number.

| 6027-3831 [E] A vdisk name must be provided.
| Explanation: A vdisk name is not specified.
| User response: Specify a vdisk name.

| 6027-3832 [E] A recovery group name must be provided.
| Explanation: A recovery group name is not specified.
| User response: Specify a recovery group name.

| 6027-3833 [E] Recovery group recoveryGroupName does not have an active root log group.
| Explanation: The root log group must be active before the operation is permitted.
| User response: Retry the command after the recovery group becomes fully active.

| 6027-3836 [I] Cannot retrieve MSID for device: devFileName.
| Explanation: Command usage message for tsgetmsid.
| User response: None.

| 6027-3838 [E] Unable to write new vdisk MDI.
| Explanation: The tscrvdisk command could not write the necessary vdisk MDI.
| User response: Retry the command.

| 6027-3839 [E] Unable to write update vdisk MDI.
| Explanation: The tscrvdisk command could not write the necessary vdisk MDI.
| User response: Retry the command.

| 6027-3840 [E] Unable to delete worker vdisk vdiskName err=errorNum.
| Explanation: The specified vdisk worker object could not be deleted.
| User response: Retry the command with a valid vdisk name.

| 6027-3841 [E] Unable to create new vdisk MDI.
| Explanation: The tscrvdisk command could not create the necessary vdisk MDI.
| User response: Retry the command.

| 6027-3843 [E] Error returned from node nodeName when preparing new pdisk pdiskName of RG recoveryGroupName for use: err errorNum.
| Explanation: The system received an error from the given node when trying to prepare a new pdisk for use.
| User response: Retry the command.

| 6027-3844 [E] Unable to prepare new pdisk pdiskName of RG recoveryGroupName for use: exit status exitStatus.
| Explanation: The system received an error from the tspreparenewpdiskforuse script when trying to prepare a new pdisk for use.
| User response: Check the new disk and retry the command.

| 6027-3846 [E] Pdisk state change pdiskState is not permitted.
| Explanation: An attempt was made to use the mmchpdisk command either to change an internal pdisk state or to create an invalid combination of states.
| User response: Some internal pdisk state flags can be set indirectly by running other commands. For example, the deleting state can be set by using the mmdelpdisk command.

| 6027-3847 [E] The serviceDrain state feature must be enabled to use this command. Upgrade the recovery group version to at least version to enable it.
| Explanation: The mmchpdisk command option --begin-service-drain was issued, but there are back-level nodes in the cluster that do not support this action.
| User response: Upgrade the nodes in the cluster to at least the specified version and run the command again.

| 6027-3848 [E] The simulated dead and failing state feature must be enabled to use this command. Upgrade the recovery group version to at least version to enable it.
| Explanation: The mmchpdisk command was used to set a simulated dead or failing pdisk state, but there are back-level nodes in the cluster that do not support this action.
| User response: Upgrade the nodes in the cluster to at least the specified version and run the command again.
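As noted in the 6027-3846 user response, the deleting state is set indirectly by deleting the pdisk. A hedged sketch; the recovery group and pdisk names are placeholders:
    # Deleting the pdisk drains its data onto the remaining pdisks in the declustered
    # array and moves the pdisk through the deleting state.
    mmdelpdisk rg_gssio1 --pdisk e2d3s04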
Notices
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that
only that IBM product, program, or service may be used. Any functionally equivalent product, program,
or service that does not infringe any IBM intellectual property right may be used instead. However, it is
the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 19-21,
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law:
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in
any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of
the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Corporation
Dept. 30ZA/Building 707
Mail Station P300
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
The licensed program described in this document and all licensed material available for it are provided
by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or
any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in other operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be the
same on generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and addresses used by an
actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Intel is a trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Glossary
This glossary provides terms and definitions for the ESS solution.
The following cross-references are used in this glossary:
v See refers you from a non-preferred term to the preferred term or from an abbreviation to the spelled-out form.
v See also refers you to a related or contrasting term.
For other terms and definitions, see the IBM Terminology website (opens in new window):
http://www.ibm.com/software/globalization/terminology
B
building block
    A pair of servers with shared disk enclosures attached.
BOOTP
    See Bootstrap Protocol (BOOTP).
Bootstrap Protocol (BOOTP)
    A computer networking protocol that is used in IP networks to automatically assign an IP address to network devices from a configuration server.
C
CEC See central processor complex (CPC).
central electronic complex (CEC)
    See central processor complex (CPC).
central processor complex (CPC)
    A physical collection of hardware that consists of channels, timers, main storage, and one or more central processors.
cluster
    A loosely-coupled collection of independent systems, or nodes, organized into a network for the purpose of sharing resources and communicating with each other. See also GPFS cluster.
cluster manager
    The node that monitors node status using disk leases, detects failures, drives recovery, and selects file system managers. The cluster manager is the node with the lowest node number among the quorum nodes that are operating at a particular time.
compute node
    A node with a mounted GPFS file system that is used specifically to run a customer job. ESS disks are not directly visible from and are not managed by this type of node.
CPC See central processor complex (CPC).
D
DA  See declustered array (DA).
datagram
    A basic transfer unit associated with a packet-switched network.
DCM See drawer control module (DCM).
declustered array (DA)
    A disjoint subset of the pdisks in a recovery group.
dependent fileset
    A fileset that shares the inode space of an existing independent fileset.
DFM See direct FSP management (DFM).
DHCP See Dynamic Host Configuration Protocol (DHCP).
direct FSP management (DFM)
    The ability of the xCAT software to communicate directly with the Power Systems server's service processor without the use of the HMC for management.
drawer control module (DCM)
    Essentially, a SAS expander on a storage enclosure drawer.
Dynamic Host Configuration Protocol (DHCP)
    A standardized network protocol that is used on IP networks to dynamically distribute such network configuration parameters as IP addresses for interfaces and services.
E
Elastic Storage Server (ESS)
    A high-performance, GPFS NSD solution
    made up of one or more building blocks that runs on IBM Power Systems servers. The ESS software runs on ESS nodes - management server nodes and I/O server nodes.
ESS Management Server (EMS)
    An xCAT server is required to discover the I/O server nodes (working with the HMC), provision the operating system (OS) on the I/O server nodes, and deploy the ESS software on the management node and I/O server nodes. One management server is required for each ESS system composed of one or more building blocks.
encryption key
    A mathematical value that allows components to verify that they are in communication with the expected server. Encryption keys are based on a public or private key pair that is created during the installation process. See also file encryption key (FEK), master encryption key (MEK).
ESS See Elastic Storage Server (ESS).
environmental service module (ESM)
    Essentially, a SAS expander that attaches to the storage enclosure drives. In the case of multiple drawers in a storage enclosure, the ESM attaches to drawer control modules.
ESM See environmental service module (ESM).
Extreme Cluster/Cloud Administration Toolkit (xCAT)
    Scalable, open-source cluster management software. The management infrastructure of ESS is deployed by xCAT.
F
failback
    Cluster recovery from failover following repair. See also failover.
failover
    (1) The assumption of file system duties by another node when a node fails. (2) The process of transferring all control of the ESS to a single cluster in the ESS when the other clusters in the ESS fail. See also cluster. (3) The routing of all transactions to a second controller when the first controller fails. See also cluster.
failure group
    A collection of disks that share common access paths or adapter connection, and could all become unavailable through a single hardware failure.
FEK See file encryption key (FEK).
file encryption key (FEK)
    A key used to encrypt sectors of an individual file. See also encryption key.
file system
    The methods and data structures used to control how data is stored and retrieved.
file system descriptor
    A data structure containing key information about a file system. This information includes the disks assigned to the file system (stripe group), the current state of the file system, and pointers to key files such as quota files and log files.
file system descriptor quorum
    The number of disks needed in order to write the file system descriptor correctly.
file system manager
    The provider of services for all the nodes using a single file system. A file system manager processes changes to the state or description of the file system, controls the regions of disks that are allocated to each node, and controls token management and quota management.
fileset
    A hierarchical grouping of files managed as a unit for balancing workload across a cluster. See also dependent fileset, independent fileset.
fileset snapshot
    A snapshot of an independent fileset plus all dependent filesets.
flexible service processor (FSP)
    Firmware that provides diagnosis, initialization, configuration, runtime error detection, and correction. Connects to the HMC.
FQDN
    See fully-qualified domain name (FQDN).
FSP See flexible service processor (FSP).
fully-qualified domain name (FQDN)
    The complete domain name for a specific computer, or host, on the Internet. The FQDN consists of two parts: the hostname and the domain name.
master encryption key (MEK)
    A key that is used to encrypt other keys. See also encryption key.
maximum transmission unit (MTU)
    The largest packet or frame, specified in octets (eight-bit bytes), that can be sent in a packet- or frame-based network, such as the Internet. The TCP uses the MTU to determine the maximum size of each packet in any transmission.
MEK See master encryption key (MEK).
metadata
    A data structure that contains access information about file data. Such structures include inodes, indirect blocks, and directories. These data structures are not accessible to user applications.
MS  See management server (MS).
MTU See maximum transmission unit (MTU).
N
Network File System (NFS)
    A protocol (developed by Sun Microsystems, Incorporated) that allows any host in a network to gain access to another host or netgroup and their file directories.
Network Shared Disk (NSD)
    A component for cluster-wide disk naming and access.
NSD volume ID
    A unique 16-digit hexadecimal number that is used to identify and access all NSDs.
node
    An individual operating-system image within a cluster. Depending on the way in which the computer system is partitioned, it can contain one or more nodes. In a Power Systems environment, synonymous with logical partition.
node descriptor
    A definition that indicates how IBM Spectrum Scale uses a node. Possible functions include: manager node, client node, quorum node, and non-quorum node.
node number
    A number that is generated and maintained by IBM Spectrum Scale as the cluster is created, and as nodes are added to or deleted from the cluster.
node quorum
    The minimum number of nodes that must be running in order for the daemon to start.
node quorum with tiebreaker disks
    A form of quorum that allows IBM Spectrum Scale to run with as little as one quorum node available, as long as there is access to a majority of the quorum disks.
non-quorum node
    A node in a cluster that is not counted for the purposes of quorum determination.
O
OFED See OpenFabrics Enterprise Distribution (OFED).
OpenFabrics Enterprise Distribution (OFED)
    An open-source software stack that includes software drivers, core kernel code, middleware, and user-level interfaces.
P
pdisk A physical disk.
PortFast
    A Cisco network function that can be configured to resolve any problems that could be caused by the amount of time STP takes to transition ports to the Forwarding state.
R
RAID See redundant array of independent disks (RAID).
RDMA See remote direct memory access (RDMA).
redundant array of independent disks (RAID)
    A collection of two or more disk physical drives that present to the host an image of one or more logical disk drives. In the event of a single physical device failure, the data can be read or regenerated from the other disk drives in the array due to data redundancy.
recovery
    The process of restoring access to file system data when a failure has occurred.
S
SAS See Serial Attached SCSI (SAS).
secure shell (SSH)
    A cryptographic (encrypted) network protocol for initiating text-based shell sessions securely on remote computers.
Serial Attached SCSI (SAS)
    A point-to-point serial protocol that moves data to and from such computer storage devices as hard drives and tape drives.
service network
    A private network that is dedicated to managing POWER8 servers. Provides Ethernet-based connectivity among the FSP, CPC, HMC, and management server.
SMP See symmetric multiprocessing (SMP).
Spanning Tree Protocol (STP)
    A network protocol that ensures a loop-free topology for any bridged Ethernet local area network.
X
xCAT See Extreme Cluster/Cloud Administration Toolkit.
Index
Special characters
/tmp/mmfs directory 39

A
array, declustered
    background tasks 45

B
back up data 27
background tasks 45
best practices for troubleshooting 27, 31, 33

C
call home
    5146 system 1
    5148 system 1
    background 1
    overview 1
    problem report 7
    problem report details 9
Call home
    monitoring 11
    Post setup activities 14
    test 12
    upload data 11
checksum
    data 46
commands
    errpt 39
    gpfs.snap 39
    lslpp 39
    mmlsdisk 40
    mmlsfs 40
    rpm 39
comments ix
components of storage enclosures
    replacing failed 52
contacting IBM 41

D
data checksum 46
declustered array
    background tasks 45
diagnosis, disk 44
directed maintenance procedure 72
    increase fileset space 75
    replace disks 72
    start gpfs daemon 74
    start NSD 74
    start performance monitoring collector service 75
    start performance monitoring sensor service 76
    synchronize node clocks 75
    update drive firmware 73
    update enclosure firmware 73
    update host-adapter firmware 74
directories
    /tmp/mmfs 39
disks
    diagnosis 44
    hardware service 47
    hospital 44
    maintaining 43
    replacement 46
    replacing failed 47, 66
DMP 72
    replace disks 72
    update drive firmware 73
    update enclosure firmware 73
    update host-adapter firmware 74
documentation
    on web vii
drive firmware
    updating 43

E
Electronic Service Agent
    activation 3
    configuration 4
    Installing 2
    login 3
    Reinstalling 12
    Uninstalling 12
enclosure components
    replacing failed 52
enclosure firmware
    updating 43
errpt command 39
events 77

F
failed disks, replacing 47, 66
failed enclosure components, replacing 52
failover, server 46
files
    mmfs.log 39
firmware
    updating 43

G
getting started with troubleshooting 27
GPFS
    events 77
    RAS events 77
GPFS log 39
gpfs.snap command 39
GUI
    directed maintenance procedure 72
    DMP 72
    logs 35
    Issues with loading GUI 35, 37
U
update drive firmware 73
update enclosure firmware 73
update host-adapter firmware 74
V
vdisks
data checksum 46
W
warranty and maintenance 29
web
documentation vii
resources vii
110 ESS 5.3.1: Problem Determination Guide
IBM®
Printed in USA
GC27-9272-00