DEAD Recovery Solution For 10GigEthr 03 Iocxgbe Adapters 5900-5001
10GigEthr-03(iocxgbe) adapters
1| HewlettPackardEnterprise
Executive Summary
The 10GigEthr-03 adapters have been observed to experience hardware or firmware errors,
such as an unrecoverable error (UE), that leave the LAN controller unresponsive and
eventually force the HPUX 10 Gigabit Ethernet driver (iocxgbe) to move to the DEAD state. Such
errors have little diagnostic value, but they do affect customer deployments, which
have to reboot the entire system for the LAN controllers to recover from such an error. This
document describes a scheme that attempts to return such DEAD adapters to a usable
state and avoid a full system reboot. This document replaces the one formerly titled
"Automatic recovery of the 10GigEthr-03(iocxgbe) adapters in DEAD state scenarios".
Problem Statement
When the 10GigEthr-03 adapter encounters a hardware or firmware issue such as an
unrecoverable error (UE), the HPUX iocxgbe network driver is forced to move to the DEAD state.
When a LAN controller chip encounters this problem, all the ports of that controller are
affected and moved to the DEAD state.
Automatic adapter recovery is a method of returning such a DEAD network I/O card to its normal
working state without any user intervention and without rebooting the server. It restores all
the affected functions of a configured port, whether configured as a LAN device or as an FCoE
device, to the normal working state.
Requirements
Automatic recovery
The B.11.31.1609 release of the HPUX 10GigEthr-03 (iocxgbe) driver introduces an automatic
recovery feature for iocxgbe adapters in the DEAD state.
The 10GigEthr-03 software bundle contains the iocxgbe driver with updates that support the
recovery mechanism and changes to handle UE scenarios.
In addition to the 10GigEthr-03 bundle, the iocxgbe driver requires other patches to
enable auto recovery.
The latest version of the 11.31 Networking Commands cumulative patch PHNE_44564, which
delivers a daemon and a startup script that handle the NIC recovery, must be installed.
The latest version of the LAN cumulative patch PHNE_44540, which delivers the modified
driver interface layer for DEAD event handling, must also be installed to support Automatic
adapter recovery.
The latest version of the FCoE driver as of September 2016 should be installed for interfaces
that support FCoE as well as 10 Gigabit Ethernet.
IMPORTANT: The following problem was fixed in later releases of the patches, that is,
'PHNE_44588 - LAN Cumulative patch' and 'PHNE_44655 - Networking Commands patch' or
later.
QXCR1001522640: lsof hangs when nicrecd daemon is running.
Note that these patches are not included in the Fusion 2018 release and need
to be installed separately.
Supported adapters
This solution will work on all the variants of the adapter i.e. LOM, Mezzanine, Standup and
Combo form factors.
Mezzanine
o NC553m 10Gb 2-port FlexFabric Converged Network Adapter
o NC552m Dual-Port Flex-10 10GbE BL-c Adapter
Stand-up cards
o AT111A HP PCIe 2-port CNA
o AT118A HP Integrity NC552SFP 2P 10GbE Adapter
o AT094A HP PCIe 2p 8Gb FC and 2p 1/10GbE Adapter
Supported adapters for manual DIO recovery:
Details about the network adapters supporting this feature are described in the "a) Overview of
the DIO manual recovery solution" section later in this document.
Limitations
The recovery solutions for non-DIO and DIO interfaces have the following limitations:
Limitation for non-DIO interfaces (Automatic recovery)
A minimum card firmware version of 4.9.416.12 is required. Ensure that the adapter
is updated with the latest firmware version before proceeding with the installation. Recovery
is not supported on firmware versions earlier than 4.9.416.12: when an adapter with older
firmware becomes unresponsive, the driver immediately detects it and marks the interface
as DEAD, but it does not recover the interface.
Currently, the Automatic adapter recovery feature is not supported for DIO devices. If an
interface has any of its functions exported to an HPVM guest as a DIO device, the recovery
fails, and the VSP needs to be rebooted to recover the DEAD interface.
A maximum of five recovery attempts is made for a particular adapter within 24 hours.
For example, if a particular card becomes unresponsive frequently, there will be up to five
attempts to recover the interface within a 24-hour period.
Beyond that, either the card is concluded to be faulty and needs to be replaced, or the
user needs to try manual recovery.
The order in which interfaces are displayed in lanscan/nwmgr can change after recovery.
However, the PPA/interface name (suffix) will not change.
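The minimum firmware rule above lends itself to a scripted pre-check. The following is a minimal sketch and not part of the HPE tooling: fw_at_least is a hypothetical helper name, and it assumes the firmware string consists only of numeric fields separated by dots, as in 4.9.416.12.

```shell
# fw_at_least VERSION MINIMUM
# Hypothetical helper: succeeds (exit 0) when VERSION >= MINIMUM.
# Assumes both arguments are dotted numeric strings such as 4.9.416.12.
fw_at_least() {
    awk -v v="$1" -v m="$2" 'BEGIN {
        nv = split(v, a, ".")
        nm = split(m, b, ".")
        n = (nv > nm) ? nv : nm
        # Compare field by field; a missing field counts as 0.
        for (i = 1; i <= n; i++) {
            x = a[i] + 0
            y = b[i] + 0
            if (x > y) exit 0
            if (x < y) exit 1
        }
        exit 0
    }'
}
```

On a live system the first argument could be taken from the "Firmware version:" line that nwmgr -q info -c lan<ppa> prints, and the second would be 4.9.416.12.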
Limitation for DIO interfaces (Manual recovery for DIO)
Refer to ‘b) Limitations of DIO manual recovery’ later in this document.
The Automatic adapter recovery feature can be accessed and installed on supported systems
from the HPE Software Depot.
To install the driver bundle and other dependent patches on the server with a single reboot,
follow these steps:
Step 1: Log in to the server as root.
Step 2: Back up the server before installing the product.
Step 3: Download all the depots into a directory (say /tmp).
Step 4: Verify that the depots are downloaded correctly using the following commands:
swlist -d -s /tmp/<depot_filename>
For example:
# swlist -d -s /tmp/10GigEthr-03_B.11.31.1609_HP-UX_B.11.31_IA.depot
# 10GigEthr-03 B.11.31.1609 PCIe 10 GbE;Supptd HW=580151/610609/613431B21,NC551/552/553,AT094/111/118A
# swlist -d -s /tmp/PHNE_44540.depot
# PHNE_44540 1.0 LAN cumulative patch
# swlist -d -s /tmp/PHNE_44564.depot
# PHNE_44564 1.0 Networking commands cumulative patch
The “hpe_patches” is now a mega bundle which contains all the depots downloaded. Installing
the mega bundle effectively installs all the depots together on the server.
Step 7: Install the /tmp/hpe_patches bundle using the swinstall tool:
swinstall -s /tmp/<depot_filename>
For example:
# swinstall -x autoreboot=true -s /tmp/hpe_patches/ \*
Important: Use this command on the server where the product is to be installed, running
standalone; do not perform this step over the network. When installation is complete, the
server reboots.
Step 8: To verify that the 10 Gigabit Ethernet driver installation was successful, use this
command:
what /stand/vmunix /stand/current/mod/* | grep iocxgbe
For example:
# what /stand/vmunix /stand/current/mod/* | grep iocxgbe
/stand/current/mod/iocxgbe:
iocxgbe_ilan Version: 3 Sep 14 2016
iocxgbe Revision: IOCXGBE_B.11.31.WR1609 Sep 14 2016
$Revision: iocxgbe: B.11.31.1609_LR
/stand/current/mod/iocxgbe.prep:
#
Step 9: To verify the patch installation was successful, use these commands:
swverify <patch_name>
For example:
# swverify PHNE_44540
# swverify PHNE_44564
Ensure that there are no errors displayed in the output.
Step 10: Verify the auto recovery daemon is active, by using this command:
ps -ef | grep nicrecd
For example,
# ps -ef | grep nicrecd
root 14389 1 0 16:29:46 ? 0:00 /usr/sbin/nicrecd
#
Important: On a HPVM configuration, the auto recovery related patches should be installed
ONLY on the host/VSP.
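The daemon check in Step 10 can be wrapped so that it yields an exit status instead of a line to read by eye. This is a sketch only: nicrecd_running is a hypothetical helper name, and it assumes the daemon appears in ps -ef output under the path /usr/sbin/nicrecd, as in the sample output above.

```shell
# nicrecd_running
# Hypothetical helper: reads "ps -ef" output on stdin and succeeds when a
# process whose command is /usr/sbin/nicrecd is listed. Matching the full
# path keeps a concurrent "grep nicrecd" from counting as a hit.
nicrecd_running() {
    awk '
        $0 ~ /(^|[ \t])\/usr\/sbin\/nicrecd([ \t]|$)/ { found = 1 }
        END { exit !found }
    '
}
```

A typical call would be ps -ef | nicrecd_running && echo "nicrecd is active".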
In the latter case, the driver mailbox commands will timeout, eventually resulting in a data path
transmit engine stall. The driver will no longer be able to send or receive any packets and the
interface will be marked DEAD due to command timeout.
The Automatic recovery feature will attempt to recover the interfaces from all such DEAD cases.
The following are examples of the messages logged in the syslog when an iocxgbe adapter
enters the DEAD state due to a UE:
Sep 23 15:28:19 hpsmbl3 vmunix: iocxgbe0/6725, 1474883899.771878 iocxgbe_mark_card_dead called from iocxgbe_watchdog_detected_dead
Sep 23 15:28:20 hpsmbl3 vmunix: iocxgbe0/1649, 1474883900.170283 iocxgbe_watchdog: uerr lo 0x4000000 0x4000000, hi 0x1080 0x0
Sep 23 15:28:20 hpsmbl3 vmunix: iocxgbe0/1653, 1474883900.170287 iocxgbe_watchdog: uerr hi bit 7 set: PMEM
Sep 23 15:28:20 hpsmbl3 vmunix: iocxgbe0/1665, 1474883900.170291 iocxgbe_watchdog: Reboot needed to recover device
Sep 23 15:28:20 hpsmbl3 vmunix: iocxgbe1/1649, 1474883900.171572 iocxgbe_watchdog: uerr lo 0x4000000 0x4000000, hi 0x1080 0x0
Sep 23 15:28:20 hpsmbl3 vmunix: iocxgbe1/1653, 1474883900.171575 iocxgbe_watchdog: uerr hi bit 7 set: PMEM
Sep 23 15:28:20 hpsmbl3 vmunix: iocxgbe1/1665, 1474883900.171579 iocxgbe_watchdog: Reboot needed to recover device
Sep 23 15:28:31 hpsmbl3 vmunix: iocxgbe1/1618, 1474883911.171575 iocxgbe_watchdog: uerr lo 0x4000020 0x4000000, hi 0x801080 0x0
Sep 23 15:28:31 hpsmbl3 vmunix: iocxgbe1/1622, 1474883911.171580 iocxgbe_watchdog: uerr lo bit 5 set: MPU
Sep 23 15:28:31 hpsmbl3 vmunix: iocxgbe1/1634, 1474883911.171585 iocxgbe_watchdog: Reboot needed to recover device
Sep 23 15:28:31 hpsmbl3 vmunix: iocxgbe1/1649, 1474883911.171588 iocxgbe_watchdog: uerr lo 0x4000020 0x4000000, hi 0x801080 0x0
Sep 23 15:28:31 hpsmbl3 vmunix: iocxgbe1/1653, 1474883911.171591 iocxgbe_watchdog: uerr hi bit 23 set: NETC
Sep 23 15:28:31 hpsmbl3 vmunix: iocxgbe1/1665, 1474883911.171595 iocxgbe_watchdog: Reboot needed to recover device
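The uerr values in these messages are bit masks, and the driver names individual bits in follow-up lines (bit 7 of the high word as PMEM, bit 23 of the high word as NETC, bit 5 of the low word as MPU). When reading such a log it can help to list which bit positions are set in a given hex value. The sketch below does only that; set_bits is a hypothetical helper name, it knows nothing about the meaning of the bits or of the second value in each pair, and it assumes a shell whose arithmetic expansion accepts 0x-prefixed constants.

```shell
# set_bits HEXVALUE
# Hypothetical helper: prints the positions (0 to 31) of the bits that are
# set in HEXVALUE, lowest first, separated by spaces.
set_bits() {
    sb_value=$(( $1 ))
    sb_bit=0
    sb_out=""
    while [ "$sb_bit" -lt 32 ]; do
        if [ $(( (sb_value >> sb_bit) & 1 )) -eq 1 ]; then
            sb_out="${sb_out:+$sb_out }$sb_bit"
        fi
        sb_bit=$(( sb_bit + 1 ))
    done
    printf '%s\n' "$sb_out"
}
```

For the high word 0x801080 shown above this lists bits 7, 12 and 23; the sample log names bits 7 and 23 only.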
The following are examples of the messages logged in the syslog when an iocxgbe adapter
enters the DEAD state due to a command timeout:
With the installation of the mentioned patches and depots, the automatic adapter recovery
feature is enabled by default. This means that the NIC recovery daemon is always alive on that
HPUX server. The daemon sleeps on an ioctl to the driver interface layer until it is
unblocked by a driver event.
When the 10GigEthr-03 (iocxgbe) driver detects that a particular interface is not responding, it
immediately marks the interface as DEAD and notifies the upper layers by sending an event.
The daemon, awakened by the driver event, confirms that the interface is in the DEAD state and
attempts to recover the interface by calling a driver API.
Once the recovery attempt is completed, a success or failure message is logged in the syslog
and the daemon goes back to sleep.
After the card has been resumed, a recovery message will be logged in syslog, for example:
Sep 23 11:44:16 hpsmbl3.in.rdlabs.hpecorp.net nicrecd[1353]: Interface: 42/0/1/0/0/0 recovered successfully.
If the recovery does not succeed, the adapter has a persistent error condition. A
failure message will be logged in syslog, for example:
Sep 23 11:44:16 hpsmbl3.in.rdlabs.hpecorp.net nicrecd[1353]: Interface: 42/0/1/0/0/0 recovery failed.
The auto recovery daemon also handles cases where multiple adapters go to the DEAD state
simultaneously.
If a particular adapter becomes DEAD frequently, there will be up to five attempts to
recover that interface within 24 hours.
Beyond that, a manual NIC recovery operation is required to restore the card.
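To see how often a given interface has been through recovery, the nicrecd success and failure messages can be tallied per hardware path. The following is a rough sketch: nicrecd_summary is a hypothetical helper name, it assumes each syslog entry sits on a single line in the format of the two examples above, and it counts over whatever log text it is given without applying the 24-hour window itself.

```shell
# nicrecd_summary [FILE...]
# Hypothetical helper: reads syslog text from the named files (or stdin) and
# prints one line per interface with its recovered/failed message counts.
nicrecd_summary() {
    awk '
        /nicrecd\[[0-9]+\]: Interface: / {
            # The hardware path is the word following "Interface:".
            path = ""
            for (i = 1; i < NF; i++)
                if ($i == "Interface:") path = $(i + 1)
            if (path == "") next
            if ($0 ~ /recovered successfully/)  ok[path]++
            else if ($0 ~ /recovery failed/)    failed[path]++
            else next
            seen[path] = 1
        }
        END {
            for (p in seen)
                printf "%s recovered=%d failed=%d\n", p, ok[p], failed[p]
        }
    ' "$@" | sort
}
```

It could be run as nicrecd_summary /var/adm/syslog/syslog.log, where the file name is the usual HPUX syslog location and should be adjusted if the system logs elsewhere.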
NOTE:
HPE recommends replacing a card that shows such erratic behaviour, because there is a high
probability that the I/O card is defective. If a user intends to keep using the same adapter,
the manual recovery option can be used.
However, a manual recovery attempt may also fail if there is a persistent error condition.
2. Execute the following command to confirm the driver state of the interface that is
DOWN:
nwmgr -q info -c lan<ppa> | grep "Driver state"
For example,
#nwmgr -q info -c lan1 | grep "Driver state"
Driver state: IOCXGBE_STATE_DEAD
#
#nwmgr -q info -c lan5 | grep "Driver state"
Driver state: IOCXGBE_STATE_DEAD
#
3. Use the following command to view all the related functions/hardware paths of the
iocxgbe interfaces:
lanscan
For example,
#lanscan
Hardware      Station        Crd Hdw   Net-Interface NM MAC   HP-DLPI DLPI
Path          Address        In# State Name PPA      ID Type  Support Mjr#
0/0/0/7/0/0/0 0xA0B3CC1CAF28 0   DOWN  lan0 snap0    1  ETHER Yes     119
0/0/0/7/0/0/1 0xA0B3CC1CAF2C 1   DOWN  lan1 snap1    2  ETHER Yes     119
0/0/0/7/0/0/2 0xA0B3CC1CAF2A 2   DOWN  lan2 snap2    3  ETHER Yes     119
0/0/0/7/0/0/3 0xA0B3CC1CAF2E 3   DOWN  lan3 snap3    4  ETHER Yes     119
0/0/0/7/0/0/4 0xA0B3CC1CAF2B 4   DOWN  lan4 snap4    5  ETHER Yes     119
0/0/0/7/0/0/5 0xA0B3CC1CAF2F 5   DOWN  lan5 snap5    6  ETHER Yes     119
0/0/0/7/0/0/6 0xA0B3CC1CAF20 6   DOWN  lan6 snap6    7  ETHER Yes     119
0/0/0/7/0/0/7 0xA0B3CC1CAF22 7   DOWN  lan7 snap7    8  ETHER Yes     119
0/0/0/3/0/0/0 0x10604B353B6C 8   UP    lan8 snap8    9  ETHER Yes     119
0/0/0/3/0/0/1 0x10604B353B70 9   UP    lan9 snap9    10 ETHER Yes     119
#
4. Execute the following command to confirm the firmware version of the DEAD interface:
nwmgr -q info -c lan<ppa> | grep "Firmware"
For example,
#nwmgr -q info -c lan0 | grep "Firmware"
Firmware version: 4.9.416.12
#
5. Execute the following command to recover the interface which is in DEAD state:
lanadmin -x recover <ppa>
For example,
# lanadmin -x recover 1
WARNING!!
Driver will reset the entire interface.
This means that ALL the lan/FCoE ports that are associated with this interface on the adapter
will be reset.
Recovery is not supported on an interface with one or more ports/functions exported to the
guest as a DIO device.
In such a case please abort this operation and reboot the VSP to recover the interface.
Do you really want to proceed? (y/n) [n]: y
Reset Success!!
Attempting recovery of the interface(s)...
Resuming h/w path 0/0/0/7/0/0/4
Resuming h/w path 0/0/0/7/0/0/5
Resuming h/w path 0/0/0/7/0/0/6
Resuming h/w path 0/0/0/7/0/0/7
6. Execute the following command to verify the interface has recovered successfully:
nwmgr -S iocxgbe
For example,
#nwmgr -S iocxgbe
Name/         Interface Station        Sub-    Interface  Related
ClassInstance State     Address        system  Type       Interface
============= ========= ============== ======= ========== =========
lan0          UP        0xA0B3CC1CAF28 iocxgbe 10GBASE-KR
lan1          UP        0xA0B3CC1CAF2C iocxgbe 10GBASE-KR
lan2          UP        0xA0B3CC1CAF2A iocxgbe 10GBASE-KR
lan3          UP        0xA0B3CC1CAF2E iocxgbe 10GBASE-KR
lan4          UP        0xA0B3CC1CAF2B iocxgbe 10GBASE-KR
lan5          UP        0xA0B3CC1CAF2F iocxgbe 10GBASE-KR
lan6          UP        0xA0B3CC1CAF20 iocxgbe 10GBASE-KR
lan7          UP        0xA0B3CC1CAF22 iocxgbe 10GBASE-KR
lan8          UP        0x10604B353B6C iocxgbe 10GBASE-KR
lan9          UP        0x10604B353B70 iocxgbe 10GBASE-KR
#
7. Verify the driver state of the recovered interface using the following command:
nwmgr -q info -c lan<ppa> | grep "Driver state"
Note that all the interfaces corresponding to a DEAD card will be restored.
For example,
#nwmgr -q info -c lan1 | grep "Driver state"
Driver state: IOCXGBE_STATE_ONLINE
#
#nwmgr -q info -c lan5 | grep "Driver state"
Driver state: IOCXGBE_STATE_ONLINE
#
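The driver-state check used in steps 2 and 7 can be repeated over several PPAs in one pass. The sketch below assumes the "Driver state: ..." line format shown in the examples above; driver_state and dead_ppas are hypothetical helper names, not commands delivered with the patches.

```shell
# driver_state
# Hypothetical helper: reads "nwmgr -q info -c lan<ppa>" output on stdin and
# prints the value of the "Driver state" line.
driver_state() {
    awk -F': *' '/Driver state/ { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# dead_ppas PPA...
# Hypothetical helper: prints each given PPA whose driver state is
# IOCXGBE_STATE_DEAD, one per line.
dead_ppas() {
    for dp_ppa in "$@"; do
        dp_state=$(nwmgr -q info -c "lan$dp_ppa" | driver_state)
        if [ "$dp_state" = "IOCXGBE_STATE_DEAD" ]; then
            printf '%s\n' "$dp_ppa"
        fi
    done
}
```

For example, dead_ppas 0 1 2 3 before recovery would list the PPAs still in the DEAD state, and the same call after recovery should print nothing.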
NOTE:
When a DEAD state is reported, all the ports of that particular adapter are affected
and marked as DEAD. Recovery should be attempted by choosing one of the DEAD
ports belonging to the failed LAN controller chip. Executing a single recover command is
sufficient to recover all the ports belonging to the failed chip.
Recovery will not proceed when a port is online/offline/suspended. It should be
attempted only on a dead port.
Two or more manual recoveries, whether on the same or on different interfaces,
cannot be executed simultaneously on a system. A second or later attempt will fail until the
currently executing command has finished.
Before attempting the manual recovery,
o The user need not stop the hpvmnet (vswitch) associated with the interface prior to the
operation.
o The user need not remove the interface from the APA failover group prior to the
operation.
o If the interface is a member of an APA failover group, after the recovery is done,
check whether the interface is available in the failover group by running:
nwmgr -v -S apa -c lan90x
(lan90x is the interface name of the failover group)
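Because a single recover command covers every port on the failed chip, it can be useful to reduce lanscan output to one candidate PPA per adapter. The following is a sketch under stated assumptions: one_port_per_adapter is a hypothetical helper name, it assumes the column layout of the lanscan example above (hardware path first, state fourth, lan name fifth), and it treats the hardware path without its last component as the adapter, which matches the paths nicrecd logs. A DOWN state in lanscan does not by itself mean DEAD, so each candidate still needs the nwmgr driver-state check before lanadmin -x recover is run.

```shell
# one_port_per_adapter
# Hypothetical helper: reads lanscan output on stdin and, for each adapter
# that has at least one DOWN port, prints "<adapter path> <first DOWN PPA>".
one_port_per_adapter() {
    awk '
        $4 == "DOWN" && $5 ~ /^lan[0-9]+$/ {
            card = $1
            sub(/\/[0-9]+$/, "", card)      # drop the function number
            if (!(card in seen)) {
                seen[card] = 1
                ppa = $5
                sub(/^lan/, "", ppa)
                print card, ppa
            }
        }
    '
}
```

With the lanscan example above as input, the only line printed is for adapter 0/0/0/7/0/0 with PPA 0.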
NOTE:
VC stands for Virtual Connect, which is used by blade servers.
For non-blade server users, only the information for "non-VC mode" applies.
Tunable
To enable or disable the automatic recovery feature, the 10GigEthr-03 driver startup
configuration file /etc/rc.config.d/hpiocxgbeconf has the following parameter:
HP_IOCXGBE_AUTO_RECOVERY= <yes/no> #default yes
To disable the auto recovery feature, edit the iocxgbe startup configuration file
as follows:
HP_IOCXGBE_AUTO_RECOVERY=no
Ensure there are no spaces at the beginning of the line.
However, irrespective of the tunable value, the NIC recovery daemon process is always
alive in sleeping mode.
Setting the tunable to 'no' does not stop the NIC recovery daemon; it only prevents recovery
from being attempted on a DEAD iocxgbe I/O card.
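The effective value of the tunable can be read back from the configuration file with a few lines of shell. This is a sketch only: auto_recovery_setting is a hypothetical helper name, it assumes an unquoted yes/no value, and it deliberately ignores lines that start with spaces, on the assumption (not confirmed by this document) that such lines are not honoured.

```shell
# auto_recovery_setting [FILE]
# Hypothetical helper: prints the last HP_IOCXGBE_AUTO_RECOVERY value found
# at the start of a line in FILE, or "yes" (the documented default) if the
# parameter is not set or the file cannot be read.
auto_recovery_setting() {
    ars_conf=${1:-/etc/rc.config.d/hpiocxgbeconf}
    ars_value=$(sed -n 's/^HP_IOCXGBE_AUTO_RECOVERY=\([A-Za-z]*\).*/\1/p' \
        "$ars_conf" 2>/dev/null | tail -n 1)
    printf '%s\n' "${ars_value:-yes}"
}
```

Called without an argument it reads /etc/rc.config.d/hpiocxgbeconf; a different path can be passed for testing.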
IOCXGBE B.11.31.2201 introduces a new feature to recover DEAD interfaces manually in DIO
cases and so avoid a system reboot. The following sections describe a scheme that attempts to
return such DEAD adapters to a usable state and avoid a full system reboot on Blade servers.
NOTE: This feature does not support SuperDome2 and other non-blade servers.
This solution also works when the interface is configured under APA, VLAN or Vswitch,
or is exported to a guest (HPVM/VPAR) as an AVIO device, although interfaces
that are not used as DIO can be recovered by the auto recovery solution.
Only a single recovery should be attempted, by choosing any one of the DEAD ports
belonging to the failed LAN controller chip. This is sufficient to recover all the ports
belonging to the failed chip. Take care not to run more than one
recovery attempt in parallel on the failed PPAs.
Depot: PHNE_44903.depot
With the installation of the mentioned patches and depots, the Manual DIO recovery feature
of the adapter is available for use.
# swlist -d -s /tmp/PHNE_44904.depot
# PHNE_44904 1.0 LAN cumulative patch
# swlist -d -s /tmp/PHNE_44903.depot
# PHNE_44903 1.0 Networking commands cumulative patch
The “hpe_patches” is now a mega bundle which contains all the depots downloaded. Installing
the mega bundle effectively installs all the depots together on the server.
Step 7: Install the /tmp/hpe_patches bundle using the swinstall tool:
swinstall -s /tmp/<depot_filename>
For example:
# swinstall -x autoreboot=true -s /tmp/hpe_patches/ \*
Important: Use this command on the server where the product is to be installed, running
standalone; do not perform this step over the network. When installation is complete, the
server reboots.
Step 8: To verify that the 10 Gigabit Ethernet driver installation was successful, use this
command:
what /stand/vmunix /stand/current/mod/* | grep iocxgbe
For example:
# what /stand/vmunix /stand/current/mod/* | grep iocxgbe
/stand/current/mod/iocxgbe:
iocxgbe_ilan Version: 3 Jan 11 2022
iocxgbe Revision: IOCXGBE_B.11.31.WR2201 Jan 11 2022
$Revision: iocxgbe: B.11.31.2201_LR
/stand/current/mod/iocxgbe.prep:
#
Step 9: To verify the patch installation was successful, use these commands:
swverify <patch_name>
For example:
# swverify PHNE_44903
# swverify PHNE_44904
Ensure that there are no errors displayed in the output.
Step 10: Verify the auto recovery daemon is active, by using this command:
ps -ef | grep nicrecd
For example,
# ps -ef | grep nicrecd
root 14389 1 0 16:29:46 ? 0:00 /usr/sbin/nicrecd -a
#
Step 11: Verify that the manual DIO recovery option is available in hpnicrecovery, by using this command:
# /sbin/init.d/hpnicrecovery
Usage: hpnicrecovery start|stop|restart|recover|resume|dio_recover
Default Usage:
hpnicrecovery dio_recover
Specific Usage:
hpnicrecovery dio_recover <VM_name> <Guest_HW_path>
For more detailed explanation, refer to the appendix "Detailed explanation of DIO manual recovery" in the White Paper
Usage:
hpnicrecovery resume <VSP/HOST HW_Path of the interface>
VSP/HOST HW_Path : Suspended state interface hardware path for dead recovery
For more detailed explanation, refer to the appendix "Detailed explanation of DIO manual recovery" in the White Paper
36 host 1/0/0/4/0/0/0(lan29) - HP Dual Port CNA 10Gb BL8X0c i4 Embedded CNIC
37 host 1/0/0/4/0/0/1(lan30) - HP Dual Port CNA 10Gb BL8X0c i4 Embedded CNIC
38 host(fc) 1/0/0/4/0/0/3 - HP 10Gb PCIe 2-port Embedded FCoE Adapter
39 host 1/0/0/4/0/0/5(lan31) - HP Dual Port CNA 10Gb BL8X0c i4 Embedded CNIC
40 host 1/0/0/4/0/0/7(lan32) - HP Dual Port CNA 10Gb BL8X0c i4 Embedded CNIC
49 vpar8 1/0/0/7/0/0/0 0/0/0/4/0(lan1) HP NC553m 2p 10GbE BL-c Mezzanine Adapter
50 vm1 1/0/0/7/0/0/1 0/0/2/0(lan1) HP NC553m 2p 10GbE BL-c Mezzanine Adapter
51 vm2 1/0/0/7/0/0/2 0/0/2/0(lan1) HP NC553m 2p 10GbE BL-c Mezzanine Adapter
52 host 1/0/0/7/0/0/3(lan44) - HP NC553m 2p 10GbE BL-c Mezzanine Adapter
53 host 1/0/0/7/0/0/4(lan45) - HP NC553m 2p 10GbE BL-c Mezzanine Adapter
54 vpar5 1/0/0/7/0/0/5 0/0/0/4/0(lan1) HP NC553m 2p 10GbE BL-c Mezzanine Adapter
55 dio 1/0/0/7/0/0/6 - HP NC553m 2p 10GbE BL-c Mezzanine Adapter
56 vpar7 1/0/0/7/0/0/7 0/0/0/4/0(lan1) HP NC553m 2p 10GbE BL-c Mezzanine Adapter
57 host 1/0/0/9/0/0/0(lan49) - HP NC552m 2p 10GbE BL-c Mezzanine Adapter
58 host 1/0/0/9/0/0/1(lan50) - HP NC552m 2p 10GbE BL-c Mezzanine Adapter
Above table shows the information of all the LAN ports distributed across
VMs/host/DIO pool on the system.
Compare the Guest H/W path of the dead adapter mentioned in the table.
Select the ID of the interface that should be recovered : 49 <-
===================================================
Please verify hpvmstatus shows all the owner VM/Vpar guests listed above are in ON state
Also S/W state of each port is in CLAIMED from ioscan output
running on each VM/Vpar guest
Done.
Suspending h/w path 1/0/0/7/0/0/5 in vpar5 (0/0/0/4/0(lan1))
Done.
Suspending h/w path 1/0/0/7/0/0/6 in DIO pool
Done.
Suspending h/w path 1/0/0/7/0/0/7 in vpar7 (0/0/0/4/0(lan1))
Done.
NOTE:
When a DEAD state is reported, all the ports of that particular adapter are
affected and marked as DEAD. Recovery should be attempted by choosing one of the
DEAD ports belonging to the failed LAN controller chip. Executing a single recover
command is sufficient to recover all the ports belonging to the failed chip.
The solution cannot verify in all cases that the selected adapter is dead: recovery will
proceed even when a port is online/offline/suspended, so select the dead adapter
from the list carefully. Recovery on a non-dead card might badly affect your system, so
attempt the manual DIO recovery only on a dead adapter.
Follow the instructions very carefully; wrong input may badly affect your
system and may even end in a system crash.
Two or more manual recoveries, whether on the same or on different interfaces,
cannot be executed simultaneously on a system. A second or later attempt will fail until the
currently executing command has finished.
Before attempting the manual recovery, check the syslog and
/var/opt/hprecovery/hprecoverylog to confirm that auto recovery failed because of a DIO
port, and make a note of the dead adapter.
ERROR: Interface 1/0/0/7/0/0/1 is in Guest OS DIO pool!!
Nov 07 06:43:46 hpbl1-s2 nicrecd[26732]: Interface: 1/0/0/7/0/0 recovery failed
Nov 07 06:43:46 hpbl1-s2:if some of the port(s) on the same NIC hardware are used for DIO,
Nov 07 06:43:46 hpbl1-s2:nicrecd cannot recover. Try manual recocvery by: /sbin/init.d/hpnicrecovery dio_recover
If the manual DIO recovery attempt fails, the user needs to get admin support.
g) Manual DIO recovery and auto recovery of non-DIO interfaces execution
Manual DIO recovery and auto recovery of non-DIO interfaces cannot run simultaneously.
Before starting the DIO manual recovery command, the user has to check whether an automatic
recovery operation is running for a DEAD interface on the VSP. If it is running, the user has to
wait for the automatic recovery operation to complete. The status of the auto recovery
operation can be checked in the /var/opt/hprecovery/hprecoverylog file.
In the following example, the user runs the dio_recover command while automatic recovery is
running in the background, so the dio_recover command exits with a warning message.
# /sbin/init.d/hpnicrecovery dio_recover
Warning !!
Logging
The following are examples of messages logged in the syslog when an iocxgbe adapter
enters the DEAD state:
Or
Or
Once the recovery attempt is completed, a success/failure message will be logged in the syslog:
Or
The following messages are examples of what will be logged in the syslog, if an interface has
reported DEAD state more than five times within 24 hours:
Sep 21 16:34:27 lansec724 nicrecd[14389]: CAUTION: 0/0/0/3/0/0 going to dead state too frequently!!
Sep 21 16:34:27 lansec724 nicrecd[14389]: It could be faulty. Please replace the part and continue...
Sep 21 16:34:27 lansec724 nicrecd[14389]: NIC Recovery operation failed at Interface 0/0/0/3/0/0.
Sep 21 16:34:27 lansec724 nicrecd[14389]: Manual NIC Recovery operation may be attempted.
The auto recovery daemon start message during system boot up:
Start NIC auto-recovery daemon.................................... OK
Sample message displayed by auto recovery daemon during system boot up without the LAN
cumulative patch PHNE_44540 installed:
Start NIC auto-recovery daemon.................................... N/A
Sample message displayed by auto recovery daemon in rc.log during system boot up without
the LAN cumulative patch PHNE_44540 installed:
Sample message displayed when the user attempts to manually recover an interface which is
NOT in DEAD state:
# lanadmin -x recover 8
WARNING!!
Driver will reset the entire interface.
This means that ALL the lan/FCoE ports that are associated with this interface on the
adapter will be reset.
Recovery is not supported on an interface with one or more ports/functions exported to
the guest as a DIO device.
In such a case please abort this operation and reboot the VSP to recover the interface.
Do you really want to proceed? (y/n) [n]: y
Sample message displayed when the user attempts to manually recover an interface on a guest
DIO device:
# lanadmin -x recover 10
WARNING!!
Driver will reset the entire interface.
This means that ALL the lan/FCoE ports that are associated with this interface on the
adapter will be reset.
Recovery is not supported on an interface with one or more ports/functions exported to
the guest as a DIO device.
In such a case please abort this operation and reboot the VSP to recover the interface.
Do you really want to proceed? (y/n) [n]: y
#
Sample message displayed when the user attempts to manually recover an interface on host
when some of its ports are exported to the guest DIO pool:
# lanadmin -x recover 21
WARNING!!
Driver will reset the entire interface.
This means that ALL the lan/FCoE ports that are associated with this interface on the
adapter will be reset.
Recovery is not supported on an interface with one or more ports/functions exported to
the guest as a DIO device.
In such a case please abort this operation and reboot the VSP to recover the interface.
Do you really want to proceed? (y/n) [n]:y
Once the DIO manual recovery attempt is completed, a success message will be logged in the
syslog:
Sample message displayed when the user attempts to manually recover an interface which is
NOT in DEAD state:
# /sbin/init.d/hpnicrecovery dio_recover
Warning !!
Sample message displayed when the user attempts to manually recover an interface on a guest
DIO device:
# /sbin/init.d/hpnicrecovery dio_recover
Warning !!
Manual recovery script not supported on Guest
If iocxgbe DIO interface goes DEAD,
Run manual recovery script from VSP
# /sbin/init.d/hpnicrecovery dio_recover
Warning !!
# /sbin/init.d/hpnicrecovery dio_recover
ERROR !!
III. Recovery operation of a dead adapter when a port is exported to a guest that is in OFF/EFI state.
Here the dio_recover command is executed on the VSP and a DEAD adapter is selected from the list of ports shown
in the dio_recover command output, but the dead adapter port "1/0/0/7/0/0/2" is exported to the guest "vm2",
which is in OFF/EFI state.
# /sbin/init.d/hpnicrecovery dio_recover
Delete the corresponding port from vm2 and keep it in DIO pool
Also find all ports which belongs to VM/Vpar guest OFF/EFI shell state from the
below list,
Delete the corresponding port from those VM/Vpar guest and keep it in DIO pool
Then rerun the script ...
Note:
In this case, the "1/0/0/7/0/0/2" port has to be removed from vm2 with hpvmmodify -d while the recovery is done.
After the recovery is completed, it can be re-added to vm2 using hpvmmodify -a.
# /sbin/init.d/hpnicrecovery dio_recover
Please verify hpvmstatus shows all the owner VM/Vpar guests listed above are in ON state
Also S/W state of each port is in CLAIMED from ioscan output
running on each VM/Vpar guest
Warning !!
IF any of port is not in claimed state or port belonging VM/Vpar guest is in OFF/EFI shell state
Then Delete the corresponding port from those VM/Vpar guest and keep it in DIO pool
And rerun the script ...
Note:
In this case, the "1/0/0/7/0/0/2" port has to be removed from vm2 with hpvmmodify -d while the recovery is done.
After the recovery is completed, it can be re-added to vm2 using hpvmmodify -a.
# /sbin/init.d/hpnicrecovery dio_recover
Warning !!
Failed to suspend the h/w path:1/0/0/7/0/0/2
If it is on unclaimed state,
Delete the port only from the Guest and keep it in DIO pool
Then rerun the script..
Note:
In this case, the "1/0/0/7/0/0/2" port has to be removed from vm2 with hpvmmodify -d while the recovery is done.
After the recovery is completed, it can be re-added to vm2 using hpvmmodify -a.
VI. Recovery operation of a dead adapter with manual resume of the adapter ports.
Here the dio_recover command is executed on the VSP and a DEAD adapter is selected from the list of ports shown
in the dio_recover command output, but the recovery operation exits before completing successfully
and shows the message below.
# /sbin/init.d/hpnicrecovery dio_recover
WARNING !!
In this condition do not rerun script again,
it may panic the VSP, So manually resume
each port of the DEAD adapter mentioned below;
Note:
In this case, the "1/0/0/7/0/0/0" and "1/0/0/7/0/0/1" ports were resumed successfully. To complete the recovery
operation, resume the remaining ports manually using the resume option.
Confirm? (y/n) :y
Recovering port 1/0/0/7/0/0/2 in DIO pool
Done.
References
For more information on HPUX 10 Gigabit Ethernet drivers, see http://www.hpe.com/info/10-gigabit-ethernet-docs