Guardium Data Protection - L4 Deploy - Troubleshoot - Troubleshooting Unix S-TAP Sev1 Issues - Presentation
Version: 1.1
Level 4 - Deployment
Contributors:
Tansel Zenginler
Principal, Learning Content Development
IBM Learning: Security
Petra Unglaub
Advisory Security Technical Specialist
IBM Learning: Security
August 2023 edition
NOTICES
This information was developed for products and services offered in the USA.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any
non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not
grant you any license to these patents. You can send license inquiries, in writing, to:
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement
may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes
will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s)
described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those
websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly
available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to
non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples
include the names of individuals, companies, brands, and products. All names and references for organizations and other business institutions used in
this deliverable’s scenarios are fictional. Any match with real organizations or institutions is coincidental. All names and associated information for
people in this deliverable’s scenarios are fictional. Any match with a real person is coincidental.
TRADEMARKS
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions
worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web
at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the
United States, and/or other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive licensee of Linus Torvalds, owner of the
mark on a worldwide basis.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
VMware, the VMware logo, VMware Cloud Foundation, VMware Cloud Foundation Service, VMware vCenter Server, and VMware vSphere are
registered trademarks or trademarks of VMware, Inc. or its subsidiaries in the United States and/or other jurisdictions.
Red Hat®, JBoss®, OpenShift®, Fedora®, Hibernate®, Ansible®, CloudForms®, RHCA®, RHCE®, RHCSA®, Ceph®, and Gluster® are trademarks or
registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.
© Copyright International Business Machines Corporation 2023.
This document may not be reproduced in whole or in part without the prior written permission from IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Table of Contents
Introduction
What type of problems are Severity 1 cases?
Cause
Diagnosing the problem
Key points
Specific problem considerations and logs to gather
Summary
Introduction
IBM Security® Guardium® Data Protection S-TAP is designed to minimize the impact to monitored
databases and servers. Extensive testing is performed to ensure that this is the case. However, in rare
situations there can be problems where performance or availability is affected by the S-TAP. These cases
are handled with the highest priority by Guardium support and development. If the problem occurs on a
production server, the issue is treated as a Severity 1 issue.
This guide shows how to troubleshoot these problems for UNIX S-TAP and gather the critical
information required to find the root cause. For Windows troubleshooting, see Troubleshooting Guardium
Windows S-TAP severity 1 issues.
What type of problems are Severity 1 cases?
Guardium Severity 1 issues are normally reserved for production environments that meet the System
Requirements as detailed in the related information below. The following list contains examples of issues
that are normally considered Severity 1:
• Guardium appliance is down
• Appliances are unable to communicate with the Central Manager (CM)
• Database is full
• Sniffer restarts more than once an hour
• S-TAP crashes the database server
• S-TAP severely affects database server performance
• S-TAP is not running
• Upgrade fails during a scheduled weekend service window
For other issues that you consider to be a Severity 1, provide a business impact statement including the
following details:
• Number of users affected by the problem
• Whether the system is a production, development, or test system
• Deadlines impacted by the issue
• Other information you deem relevant
Dial-in requirements
Customers should be aware of the remote dial-in requirements in case IBM Technical Support determines that a
remote dial-in is necessary. These requirements include having recorded the root passkeys for all
appliances beforehand.
Root cause analysis
The main aim of a Severity 1 case is to get the system back up and running as soon as possible. Once the
critical impact is alleviated, some root cause analysis might not be possible outside of normal business
hours because it can require extended analysis and time, as well as the availability of specific R&D
engineers. In these cases, the root cause analysis continues during normal business hours.
Do not set a case to Severity 1 for ongoing investigations. Severity 2 is more suitable and receives
high-priority attention during normal business hours. This avoids cases being moved between Severity 1
engineers as shifts change around the globe, and lets the case owner perform in-depth diagnostics, which
are significantly harder when cases change hands frequently.
Cause
Problems vary on a case-by-case basis but typically have one or more of the following causes:
• Defects in Guardium S-TAP
• Incorrect or suboptimal configuration
• Environmental factors, such as a high traffic level or a conflict with third-party products, combined with
either of the previous causes
Diagnosing the problem
There are some key points that apply to most of the problem types. For each specific problem, there are
considerations and logs to gather before contacting Support.
Key points
Diagnostic timing
The timing of a diagnostic capture is crucial to finding the root cause of these problems. For example, if
diagnostics are captured only after the S-TAP is stopped, it might be difficult to tell exactly what happened.
Diagnostics (including crash dumps, as described below) must be gathered while the problem is happening.
Without diagnostics taken during the problem, Support works on a best-effort basis to find the
root cause, but it might not be possible to determine the cause.
When an S-TAP, database, or server crashes, a dump file is required to determine the root cause. Without
a dump file, the exact root cause often cannot be determined with certainty. Work with server or database
administrators to ensure that the crash dump is collected. If there is any doubt about how to gather the files,
use the following technote as a reference:
How to collect core dumps if Guardium UNIX S-TAP is impacting the database or database server
If a dump file cannot be collected, Support works on a best-effort basis to find the root cause. In such
cases, a new S-TAP version might be provided without knowing whether it will resolve the issue. If the
problem returns with the new S-TAP version, the system administrator should be prepared to gather crash dumps.
When a database or server crashes, open a support case with the vendor so that they can provide their
analysis of the crash. If they find that Guardium is the cause, send the full vendor analysis to the
Guardium support case.
Guard_monitor
Guard_monitor, or S-TAP Watchdog, is a process that can monitor S-TAP performance and take
automatic actions. For example, it can stop the S-TAP, run diagnostics, and gather a core dump
when the S-TAP reaches 50% of server CPU or stops responding. For more information,
see S-TAP Monitor.
Important: Guard_monitor can automatically stop or restart the S-TAP. Before starting Guard_monitor in a
production environment, carefully review and test all the settings.
Guard_diag
Guard_diag is a script that gathers detailed troubleshooting information about the system and installed
agents. It should be provided for any issue involving UNIX S-TAP.
Stopping or uninstalling the S-TAP
Before stopping or uninstalling the S-TAP, review this guide in full and consider whether the appropriate
logs have been collected to troubleshoot the problem. If they have not, it might not be possible to determine
the root cause once the S-TAP is stopped.
If the S-TAP must be stopped, follow the steps in How the Guardium S-TAP Process is handled
throughout OS versions. In most cases, stopping the S-TAP stops the problem.
If the problem is caused by a defect in the S-TAP and the latest version is not installed, an APAR might
resolve the problem. APARs document known product defects and the versions in which they are fixed.
Search the support portal for key terms such as "Guardium Linux crash" and check whether any APARs exist.
Specific problem considerations and logs to gather
S-TAP process crashing
This problem is not likely to impact database or server performance but will cause a monitoring outage. If
S-TAP is crashing, Guard_monitor can be configured to collect the crash dump. Gather these logs before
contacting Support:
• guard_diag while the S-TAP is installed and soon after the crash happens
• S-TAP process crash dump
• Timing of the crashes. Is it correlated to any other event?
S-TAP high CPU or memory usage
S-TAP debug logging increases S-TAP CPU and memory usage. When debug logging is used intentionally for
troubleshooting, this increase is not a cause for concern. But if it is enabled accidentally or never disabled,
CPU or memory usage appears higher than normal. Ensure that tap_debug_output_level=0 is set in
guard_tap.ini unless debug logging is specifically needed for troubleshooting.
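A quick way to confirm the setting is to read it back from the file. The following shell sketch is not a Guardium tool: it assumes guard_tap.ini uses simple key=value lines, and the function names and the path you pass in are hypothetical (the file location depends on the installation).

```shell
#!/bin/sh
# Sketch only: report tap_debug_output_level from a guard_tap.ini file.
# Assumptions: simple key=value lines; the full path is passed as $1.
# On Solaris, /usr/bin/awk is the legacy awk; set AWK=nawk (or
# AWK=/usr/xpg4/bin/awk) before calling these functions.
AWK=${AWK:-awk}

# get_ini_value FILE KEY: print the value of the last KEY=... line in FILE,
# ignoring whitespace around the key and the value.
get_ini_value() {
    "$AWK" -F= -v key="$2" '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim the value
                val = v
            }
        }
        END { print val }
    ' "$1"
}

# check_debug_level FILE: return 0 if the level is 0, 1 if debug logging
# is enabled, and 2 if the key is missing.
check_debug_level() {
    level=$(get_ini_value "$1" tap_debug_output_level)
    if [ -z "$level" ]; then
        echo "tap_debug_output_level not found in $1"
        return 2
    elif [ "$level" = "0" ]; then
        echo "OK: tap_debug_output_level=0"
        return 0
    else
        echo "WARNING: tap_debug_output_level=$level (debug logging is enabled)"
        return 1
    fi
}
```

Usage, with a hypothetical path: check_debug_level /path/to/guard_tap.ini. A non-zero return code means the key is missing or debug logging is still enabled.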
CPU and memory usage cases are good candidates for guard_monitor so automatic action can be taken
when the problem happens. Gather these logs before contacting Support:
• At least one S-TAP process dump, triggered manually on the server while CPU or memory usage is high.
Commands to trigger a process dump vary by OS. To confirm the exact steps in your
environment, contact the server administrator.
• guard_diag taken while CPU or memory usage is high
• Timing of the problem. Is it correlated to any other event?
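Because the dump commands differ by OS, a helper that only prints the command for the server administrator to review can reduce mistakes. This sketch assumes the S-TAP process is named guard_stap and that gcore (Linux, from the gdb package; Solaris) or gencore (AIX) is available; it has no HP-UX entry, and the function names are hypothetical. These tools write a core image of a running process without terminating it, although the process is paused briefly while the image is written.

```shell
#!/bin/sh
# Sketch only: print (do not run) a command that writes a core image of a
# running S-TAP process. Assumptions: the process is named guard_stap, and
# gcore (Linux/Solaris) or gencore (AIX) is installed. Review the printed
# command with the server administrator before running it.

# stap_pids: print the PID of every process whose command name is guard_stap
# (bare name or full path, depending on how ps reports it).
stap_pids() {
    ps -e -o pid= -o comm= |
        awk '$2 == "guard_stap" || $2 ~ /\/guard_stap$/ { print $1 }'
}

# dump_command OS PID PREFIX: print the dump command for the given
# `uname -s` value. Each command writes the core image to PREFIX.PID.
dump_command() {
    case "$1" in
        Linux|SunOS)
            # gcore -o PREFIX PID writes PREFIX.PID
            echo "gcore -o $3 $2" ;;
        AIX)
            # gencore PID FILE writes exactly FILE
            echo "gencore $2 $3.$2" ;;
        *)
            echo "no dump command for OS '$1'; ask the server administrator" >&2
            return 1 ;;
    esac
}
```

Usage, writing to a file system with enough free space: for pid in $(stap_pids); do dump_command "$(uname -s)" "$pid" /var/tmp/stap_core; done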
Database crashing
Guardium can cause this problem only if A-TAP or EXIT is in use. In some cases, Guardium is
suspected, but the database vendor's analysis shows another cause. Gather these logs before contacting
Support:
• Database crash dump
• Database vendor analysis
• guard_diag
• Timing of the crashes. Is it correlated to any other event? Especially consider A-TAP or EXIT related
actions.
If the server crashes immediately when booting and no actions are possible, special steps are required:
1. Boot into single-user mode (failsafe mode on Solaris).
2. In guard_tap.ini, set ktap_installed=0. This prevents the S-TAP from opening the K-TAP device and
capturing traffic on the next boot, which prevents most crash issues.
3. Reboot the server to normal mode. If the server no longer crashes, proceed to step 5.
4. If the server still crashes, take further action to prevent the K-TAP from loading on the next boot,
which prevents rarer crashes. The exact actions for each OS are documented in Table 1 below. Reboot the
server to normal mode after the action.
5. Gather logs, including guard_diag. If the install directory was renamed, guard_diag prompts for the new
directory.
• Files to collect: server syslog, central_logger.log, and all module .log files, for example
<install directory>/STAP/current/STAP.log and <install directory>/KTAP/current/KTAP.log
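For step 2, the edit can be scripted so that it is repeatable during an outage. This is a sketch only, and the function name is hypothetical. It assumes key=value lines with no tab characters around the key, a POSIX shell (for example ksh or bash; on Solaris 10, /usr/xpg4/bin/sh), and that the file system holding guard_tap.ini is mounted read-write, which in single-user mode might require remounting it first. It avoids sed -i, which the native sed on AIX, Solaris, and HP-UX might not support.

```shell
#!/bin/sh
# Sketch only: set ktap_installed=0 in the guard_tap.ini file passed as $1.
# Assumptions: simple key=value lines, spaces (not tabs) around the key.
disable_ktap() {
    _ktap_ini=$1
    [ -f "$_ktap_ini" ] || { echo "no such file: $_ktap_ini" >&2; return 1; }
    grep '^ *ktap_installed *=' "$_ktap_ini" >/dev/null || {
        echo "ktap_installed not found in $_ktap_ini; edit the file manually" >&2
        return 1
    }
    # Keep a backup copy (preserving mode and timestamps) before editing.
    cp -p "$_ktap_ini" "$_ktap_ini.bak" || return 1
    _ktap_tmp="$_ktap_ini.tmp.$$"
    # Rewrite every ktap_installed line; all other lines pass through unchanged.
    sed 's/^ *ktap_installed *=.*/ktap_installed=0/' "$_ktap_ini.bak" > "$_ktap_tmp" || return 1
    # Copy the content back over the original so its owner and permissions stay the same.
    cat "$_ktap_tmp" > "$_ktap_ini" && rm -f "$_ktap_tmp"
}
```

Usage, with a hypothetical path: disable_ktap /path/to/guard_tap.ini. The original file is kept as guard_tap.ini.bak so the change can be reverted after the investigation.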
OS       Action
Linux    Rename the Guardium install directory. For example:
         mv /usr/local/guardium /usr/local/guardium_temp
AIX      Rename the /etc/drivers/guardium directory. For example:
         mv /etc/drivers/guardium /etc/drivers/guardium_temp
Solaris  Move the /kernel/drv/ktap*.conf file to a new directory. For example:
         mv /kernel/drv/ktap_107702.conf /kernel/drv/temp_dir/ktap_107702.conf
HP-UX    Move the /stand/current/mod/ktap* files to a new directory. For example:
         mv /stand/current/mod/ktap* /stand/current/mod/temp_dir/
Table 1 - Actions to prevent K-TAP loading
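To reduce the chance of running a Table 1 action on the wrong platform, the action can be selected from the output of uname -s (Linux, AIX, SunOS, or HP-UX). The sketch below only prints the command, and the function name is hypothetical. The paths are the examples from Table 1; the actual install directory and the Solaris ktap*.conf file name differ between systems. Because the target temp_dir directory must exist before files are moved into it, the printed Solaris and HP-UX commands create it first.

```shell
#!/bin/sh
# Sketch only: print (dry run) the Table 1 action for a `uname -s` value.
# Paths are the Table 1 examples; adjust them before running anything.
ktap_block_action() {
    case "$1" in
        Linux) echo "mv /usr/local/guardium /usr/local/guardium_temp" ;;
        AIX)   echo "mv /etc/drivers/guardium /etc/drivers/guardium_temp" ;;
        SunOS) echo "mkdir -p /kernel/drv/temp_dir && mv /kernel/drv/ktap*.conf /kernel/drv/temp_dir/" ;;
        HP-UX) echo "mkdir -p /stand/current/mod/temp_dir && mv /stand/current/mod/ktap* /stand/current/mod/temp_dir/" ;;
        *)     echo "no Table 1 entry for OS '$1'" >&2
               return 1 ;;
    esac
}
```

Usage: ktap_block_action "$(uname -s)", then review the printed command with the server administrator before running it.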
Summary
In most cases, Guardium Support and development resolve these types of problems. Case
resolutions that can be found without contacting Support include:
• Configuration problems, for example SGATE causing latency
• Vendor analysis showing that crashes are not caused by Guardium
• A known issue that is resolved by upgrading to the latest S-TAP version