Troubleshooting BIG-IP Hardware
Presented by:

Emmanuel T. Jiki, Sr. NSE

[June 2016]
Objectives
• Introduction
• BIG-IP Hardware Overview (sub-systems)
• Lights-out management: Always-On Management (AOM)
• CPU sub-system
• Packet Path Subsystem
• Available BIG-IP Platform Types

• Field Replaceable Units (FRU)


• Power Supply Units
• Hard Drive Units
• RAID Management

• Serial Console
• Setting baud rate
• VIPRION system
• Appliance
• Available BIG-IP Platform Types

© F5 Networks, Inc 2
Objectives
• End User Diagnostics (EUD)
• Use in RMA procedures
• Platform Diagnostics Tool – Live EUD

• Best Practices & Recommendations

• Q/A

BIG-IP Hardware Overview
• Overview

• Hardware Troubleshooting Process Map

• BIG-IP Hardware Troubleshooting Flowchart


Introduction – Hardware Overview
• Available BIG-IP platforms include
• Virtual Edition (VE)
• Appliances
• VIPRION

• Key points:
• BIG-IP VE instances do not have serial numbers
• The VE registration key is used to open support cases
• More recent appliance platforms use the following serial number format:
• F5-123A-BC45
• Older appliance platforms use the following format:
• bipXXXXXXs
• VIPRION chassis and blades use the following serial number formats:
• chs012345s
• bld012345s
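
The serial number formats listed above can be told apart mechanically, which helps when sorting a batch of serials before opening support cases. The following is a minimal Python sketch, not an F5 tool: the regular expressions are inferred only from the example serials on this slide, and the assumption that X stands for a digit (and that letter case does not matter) is mine.

```python
import re

# Patterns inferred from the example serial numbers on the slide.
# The character classes (digits vs. letters) are assumptions, not an F5 specification.
SERIAL_PATTERNS = [
    ("appliance (newer format)", re.compile(r"^F5-[A-Z0-9]{4}-[A-Z0-9]{4}$", re.IGNORECASE)),
    ("appliance (older format)", re.compile(r"^bip\d{6}s$", re.IGNORECASE)),
    ("VIPRION chassis", re.compile(r"^chs\d{6}s$", re.IGNORECASE)),
    ("VIPRION blade", re.compile(r"^bld\d{6}s$", re.IGNORECASE)),
]


def classify_serial(serial: str) -> str:
    """Return a rough hardware category for a serial number, or 'unknown'."""
    candidate = serial.strip()
    for label, pattern in SERIAL_PATTERNS:
        if pattern.match(candidate):
            return label
    return "unknown"


if __name__ == "__main__":
    for example in ("F5-123A-BC45", "chs012345s", "bld012345s", "not-a-serial"):
        print(f"{example}: {classify_serial(example)}")
```

A result of "unknown" only means the serial did not match these assumed patterns. A VE instance has no serial number at all and is identified by its registration key instead.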

BIG-IP Hardware Troubleshooting Process Map

• Once a hardware problem is suspected:

• Wherever possible, confirm the hardware problem by running EUD
• Narrow the issue down to physical hardware or software by reviewing and validating all symptoms
• The flowchart on the next slide gives a rough guide to what to check
• Review the platform guide for the affected BIG-IP platform
• Use the AskF5 knowledge base to map symptoms against documented known issues

Hardware Troubleshooting Flowchart
[Flowchart; the layout is not recoverable from the extracted text. Node labels, in reading order:]
• START: Suspect an issue with the device
• Console messages
• Hardware branch: Appliance, VIPRION, FRU, Mainboard, PSU, SFP, Blade, Chassis, HDD
• Software branch: Unexpected reboot, Log analysis, Kernel panic, Configuration loading
• EUD
• Hardware fault detected? (Yes/No)
• RMA
• Software related
• RCA found? (Yes/No)
• Software upgrade / workaround
• Other remedial steps / escalation
Field Replaceable Units FRU
• Overview

• Power Supply Unit Troubleshooting and Replacement

• Power Supply Unit Troubleshooting Flowchart

• Hard drive Troubleshooting and Replacement

• Hard drive Troubleshooting Flowchart


Field Replaceable Units (FRU)
• Power Supply Units (PSU)
• VIPRION PSUs are located in the chassis (the chassis serial number is used to process the RMA)
• Verify whether the PSU is AC or DC

• Some VIPRION blades support field replaceable hard drives

• Hard drives on most appliance platforms are field replaceable

• Certain platforms have field replaceable fan trays

• * Consult the relevant platform guide for details of field replaceable units

PSU Troubleshooting and replacement
• Basic checks:
• Check that PSUs are properly installed and switched on, and look for loose or faulty connectors
• Verify that PSUs are connected to suitable power sources
• Test alternate power cords to isolate a faulty PSU
• Analyze the ltm logs, qkview, and LCD panel for power supply alarms and log messages
• Look on the console or in /var/log/ltm for messages such as:
010d0006:0: Chassis power supply <X> has experienced an issue. Status is as follows: FAN=bad; VINPUT=bad; VOUTPUT=bad
system_check[4753]: 010d0006:0: Chassis power supply <X> is not supplying power (status: 0): make sure it is plugged in.

• <X> refers to the power supply number

• tmsh commands can also be used to verify the state of PSUs

tmsh show sys hardware
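
When a log file is large, the two power supply messages quoted on this slide can be picked out with a short script. The Python sketch below is illustrative only: the regular expressions are modelled on those two sample messages, the function name parse_psu_alarm is hypothetical, and other software versions may word the messages differently.

```python
import re

# Regexes modelled on the two sample messages quoted on the slide.
# Assumption: <X> is always a decimal power supply number.
PSU_ISSUE = re.compile(
    r"010d0006:0: Chassis power supply (?P<psu>\d+) has experienced an issue\. "
    r"Status is as follows: (?P<status>.+)$"
)
PSU_NO_POWER = re.compile(
    r"010d0006:0: Chassis power supply (?P<psu>\d+) is not supplying power"
)


def parse_psu_alarm(line: str):
    """Return a dict describing a PSU alarm found in one log line, or None."""
    line = line.strip()
    match = PSU_ISSUE.search(line)
    if match:
        # Split "FAN=bad; VINPUT=bad; VOUTPUT=bad" into {"FAN": "bad", ...}
        fields = {}
        for item in match.group("status").split(";"):
            key, sep, value = item.strip().partition("=")
            if sep:
                fields[key] = value.rstrip(".")
        return {"psu": int(match.group("psu")), "kind": "issue", "fields": fields}
    match = PSU_NO_POWER.search(line)
    if match:
        return {"psu": int(match.group("psu")), "kind": "no_power", "fields": {}}
    return None


if __name__ == "__main__":
    samples = [
        "010d0006:0: Chassis power supply 2 has experienced an issue. "
        "Status is as follows: FAN=bad; VINPUT=bad; VOUTPUT=bad",
        "system_check[4753]: 010d0006:0: Chassis power supply 1 is not "
        "supplying power (status: 0): make sure it is plugged in.",
        "an unrelated log line",
    ]
    for sample in samples:
        print(parse_psu_alarm(sample))
```

To scan a real log, each line of a copy of /var/log/ltm can be passed to the function. Lines that return None are not PSU alarms of these two kinds.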

Power supply unit troubleshooting flowchart

Hard Drive Troubleshooting and Replacement
• Consult the “Platform Maintenance” section in the relevant platform guide for instructions on:
• Viewing the status of RAID 1 mirroring
• Identifying a faulty drive
• Properly swapping hard drives

• Run the fsck utility on a disk suspected of failing (AskF5 solution: SOL10328)

• Perform a logical and/or physical rebuild of the RAID array

• The AskF5 knowledge base has solution links for rebuilding RAID arrays (SOL12756)

HDD Troubleshooting Flowchart

Serial Console
• Baud rate settings
Serial console baud rate settings
• The default serial console baud rate on most BIG-IP platforms including VIPRION is 19200

BAUD RATE PROCEDURE: APPLIANCE
• View baud rate settings
• tmsh show sys console
• Set baud rate
• tmsh modify sys console baud-rate <value>
• Use AOM command menu to set AOM serial console baud rate
• Esc (
• Select either
• B --- Set console baud rate
• B --- AOM baud rate configurator

BAUD RATE PROCEDURE: VIPRION
• Connect to serial console of primary blade
• Verify and set baud rate for blade
• tmsh modify sys console baud-rate <value>
• To set baud rate for system hardware
• Access console port menu: Esc (
• Select:
• B --- Set baud rate
• Q --- Quit menu and return to console
• Save config and reboot blade
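
The tmsh step shared by both procedures can be wrapped in a small helper that rejects mistyped values before the command is sent to a device. This Python sketch only builds the argument list and does not execute anything. The list of accepted baud rates is an assumption on my part and should be confirmed against the platform guide or SOL13885; the function name build_baud_command is hypothetical.

```python
# Baud rates assumed here to be the ones the serial console accepts.
# This list is an assumption; confirm it for the platform in question.
ASSUMED_BAUD_RATES = (9600, 19200, 38400, 57600, 115200)

# Default stated earlier in this deck for most BIG-IP platforms, including VIPRION.
DEFAULT_BAUD_RATE = 19200


def build_baud_command(rate: int = DEFAULT_BAUD_RATE) -> list:
    """Return the tmsh argument list that sets the console baud rate."""
    if rate not in ASSUMED_BAUD_RATES:
        raise ValueError(f"unsupported baud rate: {rate}")
    return ["tmsh", "modify", "sys", "console", "baud-rate", str(rate)]


if __name__ == "__main__":
    print(" ".join(build_baud_command()))
    print(" ".join(build_baud_command(115200)))
```

Changing the system baud rate without matching the AOM baud rate can leave the console garbled (see SOL12821 and SOL10834 in the solution links), so the helper covers only the tmsh half of the procedure.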

End User Diagnostics (EUD)
• Overview

• VIPRION Specific EUD requirements

• EUD Options Menu

• Platform Diagnostics Tool


EUD – Overview

• A set of hardware diagnostic tests
• Provides reports on various hardware components
• EUD software is pre-installed on BIG-IP systems
• Verify the EUD version with the command:
• eud_info

• EUD should not be run while the system is in production

• Two methods to launch EUD:
• Select EUD from the grub menu when booting
• Boot to EUD from a USB storage device or USB CD-ROM

VIPRION specific EUD requirements

• Remove all blades from the chassis except the blade on which EUD will run

• Disconnect all network cables to upstream devices

• Leave only the console connection to the blade

• Do not run simultaneous instances of EUD on different blades in a VIPRION chassis

• Failing to follow the steps above could produce false positive test results

• * On VIPRION systems:
• The boot (grub) menu is not displayed by default after powering on the blade
• While the blade boots, press the Esc key during the countdown to display the boot menu
• Select the EUD option in the boot menu
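
The VIPRION requirements on this slide amount to a pre-flight checklist, which can be written down as a simple check before EUD is started. The Python sketch below is hypothetical throughout: the EudReadiness record and its fields are invented for illustration and would be filled in by hand by the engineer on site, not read from the device.

```python
from dataclasses import dataclass


@dataclass
class EudReadiness:
    """Hypothetical, manually recorded state of a VIPRION chassis before EUD."""
    blades_installed: int          # blades physically seated in the chassis
    network_cables_connected: int  # cables to upstream devices still attached
    console_connected: bool        # serial console attached to the blade under test
    eud_instances_running: int     # EUD sessions already running in the chassis


def eud_precheck(state: EudReadiness) -> list:
    """Return the list of requirements from the slide that are not yet met."""
    problems = []
    if state.blades_installed != 1:
        problems.append("leave only the blade under test in the chassis")
    if state.network_cables_connected > 0:
        problems.append("disconnect all network cables to upstream devices")
    if not state.console_connected:
        problems.append("connect the serial console to the blade under test")
    if state.eud_instances_running > 0:
        problems.append("do not start EUD while another instance is running")
    return problems


if __name__ == "__main__":
    state = EudReadiness(blades_installed=4, network_cables_connected=2,
                         console_connected=True, eud_instances_running=0)
    for problem in eud_precheck(state):
        print("NOT READY:", problem)
```

An empty list means every requirement on the slide has been recorded as met. It does not replace the instructions in the platform field testing guide.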

EUD options menu
• Option A to “Run All”

• Select Q to quit EUD prior to rebooting system

• Options available to display test report summary

Platform Diagnostics

• Platform diagnostics are a form of “Live EUD”
• Available on systems running 11.4.0 and later
• A reboot is not required after running platform diagnostics
• Note: Traffic might be impacted

• Only a limited subset of devices is tested:
• Hard drives – collect smartctl data from HDD
• PCI bus – verify that expected PCI bus addresses are active and responsive
• SSL/Compression hardware
• Requires TMMs to be shut down for the test
• Use command ‘bigstart stop’
• This test does impact production traffic

Platform Diagnostics II

• Two files are created when platform diagnostics are invoked:
• platform_check in the /var/log directory
• platform_check.xml in /var/db/platform_check

• Both files are rotated into .tgz files so multiple test results can be saved
• To invoke and run the tests, use the syntax:
• tmsh run util platform_check

• Hard drive:
• tmsh run util platform_check disk

• PCI bus:
• tmsh run util platform_check misc

• SSL and compression hardware:
• tmsh run util platform_check hwaccel
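
The mapping from component to platform_check sub-command can be kept in one table so that the traffic-impacting test is never launched by accident. The Python sketch below only assembles the command lines listed on this slide and does not run them. The helper names and the TRAFFIC_IMPACTING set are my own; the set records the one test that the previous slide says requires TMMs to be shut down.

```python
# Sub-commands as listed on the slide, with the component each one tests.
PLATFORM_CHECKS = {
    "disk": "Hard drives (smartctl data)",
    "misc": "PCI bus",
    "hwaccel": "SSL and compression hardware",
}

# The SSL/compression test needs TMMs stopped, so it interrupts production traffic.
TRAFFIC_IMPACTING = {"hwaccel"}


def platform_check_command(test=None) -> list:
    """Build the tmsh argument list for one test, or the bare command if test is None."""
    command = ["tmsh", "run", "util", "platform_check"]
    if test is None:
        return command
    if test not in PLATFORM_CHECKS:
        raise ValueError(f"unknown platform_check test: {test}")
    return command + [test]


def needs_tmm_stopped(test) -> bool:
    """Return True when the named test requires TMMs to be shut down first."""
    return test in TRAFFIC_IMPACTING


if __name__ == "__main__":
    for name, component in PLATFORM_CHECKS.items():
        note = " (stop TMMs first; traffic impact)" if needs_tmm_stopped(name) else ""
        print(f"{component}: {' '.join(platform_check_command(name))}{note}")
```

Keeping the traffic impact next to the command makes it easier to schedule the hwaccel test inside a maintenance window while running the disk and PCI checks at other times.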

Platform Diagnostics Output
• Output of HDD check

• Output of PCI check

• *** A failure reported by the diagnostics tool should lead to an expedited maintenance window for a detailed EUD check

Best Practices and Recommendations
• Summary
BIG-IP Hardware Best Practice

• A serial console connection helps build a timeline of events
• Instructions on running EUD are contained in the relevant platform field testing guide
• Narrowing the issue down to hardware or software speeds up the investigation
• EUD should be run to isolate hardware related problems
• Ensure all cables except console cables are disconnected
• On VIPRION systems, remove all blades except the blade to be tested

Summary and useful solution links
• Solution links

• Recap
Useful Solution Links & Information

• SOL13885: Setting the baud rate of the serial console port (11.x - 12.x)
• SOL12821: Modifying the baud rate configuration may result in garbled console output
• SOL13325: Setting the serial console baud rate on a VIPRION system (11.x - 12.x)
• SOL9403: Overview of the AOM subsystem
• SOL10834: The serial port baud rate must match the Always-On Management baud rate
• SOL9476: The F5 hardware/software compatibility matrix
• SOL14426: Hard disk error detection and correction improvements

Useful Solution Links II

• SOL4309: F5 platform life cycle support policy
• SOL13885: Setting the baud rate of the serial console port (11.x - 12.x)
• SOL3225: F5 End of Life policy
• SOL12534: The F5 RMA Field Technician Preparation Guide
• SOL12756: Repairing disk errors on RAID-capable BIG-IP platforms
• SOL15525: Users are unable to add RAID array members using the Configuration utility

Useful Solution Links III

• SOL15442: Using the BIG-IP platform diagnostics tool
• SOL7172: Overview of the End User Diagnostics software
• SOL14425: Error Message: crit smartd[PID]: Device: /dev/<sda|sdb>, <#> Currently unreadable (pending) sectors
• SOL15917: The RAID status reports undefined disk when only one SSD disk is populated
• SOL11525: Rebuilding the RAID configuration after a hard drive replacement
• SOL13131: Recovering from a corrupt drive partition (11.x)
• SOL7683: Connecting a serial terminal to a BIG-IP system

Useful Solution Links IV

• SOL12380: RAID capable BIG-IP platforms report the RAID status as degraded in the Configuration utility after a hard drive replacement
• SOL11965: Disabling RAID drive mirroring
• https://www.f5.com/pdf/customer-support/return-materials-authorization-ds.pdf
• https://wiki.kernel.org/
• Linux Raid

Recap
• Insert points

If I can be of further assistance please contact me:

Julio Hevia | [email protected], Emmanuel T. Jiki | [email protected]
