Guardium Data Protection - L4 Deploy - Troubleshoot - Troubleshooting Unix S-TAP Sev1 Issues - Presentation


Date: August 2023

Version: 1.1

Level 4 - Deployment

Troubleshooting Guardium UNIX S-TAP severity 1 issues

Guardium Data Protection

Contributors:

Tansel Zenginler
Principal, Learning Content Development
IBM Learning: Security

Petra Unglaub
Advisory Security Technical Specialist
IBM Learning: Security
August 2023 edition

NOTICES
This information was developed for products and services offered in the USA.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any
non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not
grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
United States of America

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement
may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes
will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s)
described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those
websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly
available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to
non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples
include the names of individuals, companies, brands, and products. All names and references for organizations and other business institutions used in
this deliverable’s scenarios are fictional. Any match with real organizations or institutions is coincidental. All names and associated information for
people in this deliverable’s scenarios are fictional. Any match with a real person is coincidental.

TRADEMARKS
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions
worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web
at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the
United States, and/or other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive licensee of Linus Torvalds, owner of the
mark on a worldwide basis.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
VMware, the VMware logo, VMware Cloud Foundation, VMware Cloud Foundation Service, VMware vCenter Server, and VMware vSphere are
registered trademarks or trademarks of VMware, Inc. or its subsidiaries in the United States and/or other jurisdictions.
Red Hat®, JBoss®, OpenShift®, Fedora®, Hibernate®, Ansible®, CloudForms®, RHCA®, RHCE®, RHCSA®, Ceph®, and Gluster® are trademarks or
registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.
© Copyright International Business Machines Corporation 2023.
This document may not be reproduced in whole or in part without the prior written permission from IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Table of Contents
Introduction
What type of problems are Severity 1 cases?
Cause
Diagnosing the problem
Key points
Specific problem considerations and logs to gather
Summary
Introduction
IBM Security® Guardium® Data Protection S-TAP is designed to minimize the impact to monitored
databases and servers. Extensive testing is performed to ensure that this is the case. However, in rare
situations there can be problems where performance or availability is affected by the S-TAP. These cases
are handled with the highest priority by Guardium support and development. If the problem occurs on a
production server, the issue is treated as a severity 1 issue.

Severity 1 problems include the following types:


1. S-TAP process crashing
2. High CPU or memory usage of S-TAP on the database server
3. Database crashing
4. Availability or latency problems when accessing the database
5. Server OS crash or hang

This guide shows how to troubleshoot these problems for UNIX S-TAP and how to gather the critical
information required to find the root cause. For Windows troubleshooting, see Troubleshooting Guardium
Windows S-TAP severity 1 issues.

What type of problems are Severity 1 cases?


The IBM Support severity definitions are detailed in the IBM Support Guide.

Guardium Severity 1 issues are normally reserved for production environments meeting the System
Requirements as detailed in the related information below. The following list contains examples of issues
that are normally considered Severity 1:
• Guardium appliance is down
• Appliances unable to communicate with the Central Manager (CM)
• Database full
• Sniffer restarts more than once an hour
• S-TAP crashed DB server
• S-TAP severely affects DB server performance
• S-TAP is not running
• Upgrade failed during scheduled weekend service window

For other issues that you consider to be a Severity 1, provide a business impact statement including the
following details:
• Number of users affected by the problem
• Whether the system is a production, development, or test system
• Deadlines impacted by the issue
• Other information you deem relevant

Specific note on upgrades


To avoid potential upgrade problems, it's vital to install the latest required Health Check and check the
results just before the upgrade.

Dial In requirements
Customers should be aware of the remote dial-in requirements in case IBM Technical Support determines
that a remote dial-in session is necessary. These requirements include having previously recorded the
root passkeys for all appliances.
Root cause analysis
The main aim of a Severity 1 case is to get the system back up and running as soon as possible. Once
the critical impact is alleviated, some root cause analysis might not be possible outside of normal
business hours because it can require extended analysis, time, and the availability of specific R&D
engineers. In these cases, the root cause analysis continues during normal business hours.

IBM Support severity definitions (see related link below)


IBM Support works with you 24 hours a day, seven days a week to resolve Severity 1 problems provided
you have a technical resource available to work during those hours. If clients do not respond to updates
within 24 hours, IBM Support reserves the right to lower the severity until such time as the client is able to
work 24/7 with IBM Support.

Case severity levels


If you need Severity 1 attention on an existing case, update the case via the Service Request tool and set
the Severity accordingly.

Do not set a case to Severity 1 for ongoing investigations. Severity 2 is more suitable and receives
high priority attention during normal business hours. This avoids cases being moved between the
Severity 1 engineers as shifts change around the globe. The case owner can then perform in-depth
diagnostics, which is significantly harder when cases change hands frequently.

Cause
Problems vary on a case-by-case basis but are typically one or more of the following causes:
• Defects in Guardium S-TAP
• Incorrect or unoptimized configuration
• Environmental factors, such as high traffic levels or conflicts with third-party products, combined
with either of the previous causes
Diagnosing the problem
There are some key points that apply to most of the problem types. For each specific problem, there are
considerations and logs to gather before contacting Support.

Key points

Diagnostic timing

The timing of a diagnostic capture is crucial to finding the root cause of these problems. For example, if
diagnostics are captured only after S-TAP is stopped, it might be difficult to tell exactly what happened.
Diagnostics (including a crash dump, see below) must be gathered while the problem is happening.
Without diagnostics taken while the problem is happening, Support will work on a best effort basis to
find the root cause, but it might not be possible to determine it.

Crash dump file collection

When an S-TAP, database, or server crashes, a dump file is required to determine the root cause. Without
a dump file, the exact root cause often cannot be determined with certainty. Work with server or database
administrators to ensure that the crash dump is collected. If there is any doubt about how to gather the
files, use the following technote as a reference:
How to collect core dumps if Guardium UNIX S-TAP is impacting the database or database server

If a dump file cannot be collected, Support will work on a best effort basis to find the root cause. In such
cases, a new S-TAP version might be provided without knowing if it will resolve the issue. If the problem
returns with the new S-TAP version, the system administrator should be prepared to gather crash dumps.

Crash dump file vendor analysis

When a database or server crashes, open a support case to the vendor so that they can provide their
analysis of the crash. If they discover that Guardium is the cause, send the full vendor analysis to the
Guardium support case.

Guard_monitor

Guard_monitor, or S-TAP Watchdog, is a process that can monitor the S-TAP performance and take
automatic actions. For example, the S-TAP can be stopped, diagnostics run, and core dump gathered
when the S-TAP reaches 50% of server CPU or when the S-TAP is not responding. For more information,
see S-TAP Monitor.

Important: Guard_monitor can automatically stop or restart the S-TAP. Before starting Guard_monitor in a
production environment, carefully review and test all the settings.

Guard_diag

Guard_diag is a script that gathers detailed troubleshooting information about the system and installed
agents. It should be provided for any issue involving UNIX S-TAP.
Stopping or uninstalling the S-TAP

Before stopping or uninstalling S-TAP, review this write-up in full and consider whether appropriate logs
were collected to troubleshoot the problem. If appropriate logs are not collected, it might not be possible
to determine the root cause with S-TAP stopped.

If stopping the S-TAP is required, follow the steps in How the Guardium S-TAP Process is handled
throughout OS versions. In most cases, stopping the S-TAP stops the problem.

If required, uninstall S-TAP using one of the following processes:
• GIM instructions
• Non-GIM instructions

If S-TAP is uninstalled, reboot the DB Server before reinstalling.

S-TAP version and APARs

If the problem is caused by a defect in the S-TAP and the latest version is not installed, an APAR might
resolve the problem. APARs report known defects in the product and the versions that fix them. Search
for key terms such as "Guardium Linux crash" on the support portal and check whether any APARs exist.
Specific problem considerations and logs to gather
S-TAP process crashing
This problem is not likely to impact database or server performance but will cause a monitoring outage. If
S-TAP is crashing, Guard_monitor can be configured to collect the crash dump. Gather these logs before
contacting Support:
• guard_diag while the S-TAP is installed and soon after the crash happens
• S-TAP process crash dump
• Timing of the crashes. Is it correlated to any other event?

High CPU or memory usage of S-TAP on the database server


The S-TAP process can use several CPU cores depending on the number of K-TAP threads. The default is
one thread and the maximum is five. Therefore, it is important to consider the S-TAP share of total
server CPU. For example, on a 128-core server, the S-TAP might be using 100% of one CPU, which is
1/128th of the total server CPU (about 0.8%) and well within the normal range.
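
The arithmetic in the example above can be sketched as a small shell helper. This is only an illustration, not part of the Guardium tooling: the function name stap_total_cpu_pct is hypothetical, and it assumes the monitoring tool reports per-process CPU on a scale where 100 means one fully used core.

```shell
#!/bin/sh
# Hypothetical helper: convert a per-process CPU reading (100 = one full core)
# into a share of total server CPU.
# Arguments: $1 = process CPU reading, $2 = number of cores on the server.
stap_total_cpu_pct() {
    awk -v proc="$1" -v cores="$2" 'BEGIN {
        if (cores <= 0) { exit 1 }        # reject a zero or negative core count
        printf "%.2f\n", proc / cores     # share of total server CPU, in percent
    }'
}

# The example from the text: one fully used core on a 128-core server.
stap_total_cpu_pct 100 128
```

With the values from the text, the helper prints 0.78, which matches the rounded 0.8% figure. If the tool in use already reports a percentage of total server CPU, no conversion is needed.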

S-TAP debug logging causes an increase in S-TAP CPU or memory usage. When used intentionally for
troubleshooting, this increase is not a cause for concern. But if it is enabled accidentally or never disabled,
the CPU or memory usage will appear higher than normal. Ensure that tap_debug_output_level=0 is in the
guard_tap.ini unless specifically used for troubleshooting.
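
A quick way to confirm the debug setting is to read it straight from the file. The following is a minimal sketch, assuming guard_tap.ini stores parameters as key=value lines as shown above; the function name check_tap_debug_level is hypothetical, and the file path is passed in because its location depends on the install directory.

```shell
# Hypothetical helper: print tap_debug_output_level from the guard_tap.ini
# file given as $1.
# Exit status: 0 if the value is 0, 1 if it has any other value,
# 2 if the parameter is not found.
check_tap_debug_level() {
    ini_file="$1"
    # Take the first key=value line whose key matches exactly.
    level=$(awk -F= '$1 == "tap_debug_output_level" {
        gsub(/[ \t\r]/, "", $2); print $2; exit
    }' "$ini_file")
    if [ -z "$level" ]; then
        echo "tap_debug_output_level not found in $ini_file"
        return 2
    fi
    echo "tap_debug_output_level=$level"
    [ "$level" = "0" ]
}
```

For example, check_tap_debug_level /path/to/guard_tap.ini (the path is a placeholder) prints the current value and returns a non-zero status when debug logging is still enabled.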

CPU and memory usage cases are good candidates for guard_monitor so automatic action can be taken
when the problem happens. Gather these logs before contacting Support:
• At least one S-TAP process dump, triggered manually on the server when the CPU/memory is high.
Commands to trigger a process dump vary between OS. To confirm exact steps in your
environment, contact the server admin.
• guard_diag taken when the CPU/Memory is high
• Timing of problem. Is it correlated to any other event?

Database crashing
This problem can only be caused by Guardium if A-TAP or EXIT are in use. In some cases, Guardium is
suspected, but database vendor analysis shows another cause. Gather these logs before contacting
Support:
• Database crash dump
• Database vendor analysis
• guard_diag
• Timing of the crashes. Is it correlated to any other event? Especially consider A-TAP or EXIT related
actions.

Availability or latency problems when accessing database


Latency or connectivity issues are often caused by configuration, especially if S-GATE blocking or Query
Rewrite (QRW) functionality is configured. Ensure that you understand the latency tradeoffs involved with
these features. Review your configuration to check whether too many sessions are attached, which causes
latency. Setting firewall_installed or query_rewrite_installed to 0 (off) is a good test of whether these
features are the cause: if the latency goes away when they are off, they are the cause.

Also check that ktap_fast_tcp_verdict=1 and ktap_fast_shmem_verdict=1. When set to 0, these settings
can cause latency.
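
The four parameters named above can be reviewed together before changing anything. The following is a minimal sketch, assuming guard_tap.ini stores them as key=value lines; the function name show_latency_settings is hypothetical and the file path is supplied by the caller.

```shell
# Hypothetical helper: print the four latency-related parameters from the
# guard_tap.ini file given as $1, one key=value pair per line.
# A parameter that is absent from the file is printed as key=<not set>.
show_latency_settings() {
    ini_file="$1"
    for key in firewall_installed query_rewrite_installed \
               ktap_fast_tcp_verdict ktap_fast_shmem_verdict; do
        # Take the first line whose key matches exactly.
        value=$(awk -F= -v k="$key" '$1 == k {
            gsub(/[ \t\r]/, "", $2); print $2; exit
        }' "$ini_file")
        echo "$key=${value:-<not set>}"
    done
}
```

The output gives a single view of whether S-GATE or QRW is switched on and whether either fast verdict setting is at 0. The helper only reads the file and changes nothing.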
Gather these logs before contacting Support:
• guard_diag when latency is happening
• S-TAP process core dump when latency is happening
• If S-GATE or QRW is used, a slon capture run through the CLI 'support store slon' command with
the 'sgate' option, containing the problem sessions that cause latency
• Sniffer must gather
• Detailed description of the problem, including when it happens and any specific triggers

Server OS crash or hang


This problem can only be caused by Guardium if K-TAP is loaded. If possible, run guard_diag as soon as
the server comes back up. In such cases, it is critical to gather all the logs before uninstalling S-TAP.

Gather the following logs before contacting Support:


• Server crash dump must be collected for root cause analysis
• Server vendor analysis
• guard_diag

If the server is crashing immediately when booting and no actions are possible, special steps are required:
1. Boot into single user mode, or failsafe mode for Solaris.
2. In the guard_tap.ini, set ktap_installed=0. This prevents the K-TAP device from being opened and
traffic from being captured on the next boot, which prevents most crash issues.
3. Reboot the server to normal mode. If the server no longer crashes, proceed to step 5.
4. If the server is still crashing, take further action to prevent the K-TAP from loading on the next
boot, which prevents rarer crashes. The exact action for each OS is documented in Table 1 below.
Reboot the server to normal mode after the action.
5. Gather logs. The guard_diag prompts for the new install directory if the name has changed.
• Files to collect: server syslog, central_logger.log, and all module '.log' files, for example
<install directory>/STAP/current/STAP.log and <install directory>/KTAP/current/KTAP.log

OS        Action
Linux     Rename the Guardium install directory. For example:
          mv /usr/local/guardium /usr/local/guardium_temp
AIX       Rename the /etc/drivers/guardium directory. For example:
          mv /etc/drivers/guardium /etc/drivers/guardium_temp
Solaris   Move the /kernel/drv/ktap*.conf file to a new directory. For example:
          mv /kernel/drv/ktap_107702.conf /kernel/drv/temp_dir/ktap_107702.conf
HPUX      Move the /stand/current/mod/ktap* file to a new directory. For example:
          mv /stand/current/mod/ktap* /stand/current/mod/temp_dir/
Table 1 - Actions to prevent K-TAP loading
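
Step 2 of the procedure above, setting ktap_installed=0, can be sketched as a small shell function that keeps a backup of the original file. This is an illustration only: the function name disable_ktap and the .bak suffix are hypothetical choices, the path to guard_tap.ini must be supplied by the administrator, and the sketch assumes the parameter appears as a key=value line.

```shell
# Hypothetical helper: set ktap_installed=0 in the guard_tap.ini file given
# as $1. A copy of the original is kept next to it with a .bak suffix.
# If the file has no ktap_installed line, its content is left unchanged.
disable_ktap() {
    ini_file="$1"
    # Keep the original so the change can be reverted after troubleshooting.
    cp -p "$ini_file" "$ini_file.bak" || return 1
    # Rewrite only the ktap_installed line and copy every other line as-is.
    awk -F= '$1 == "ktap_installed" { print "ktap_installed=0"; next }
             { print }' "$ini_file.bak" > "$ini_file"
}
```

The function writes back into the existing file rather than replacing it, so ownership and permissions of guard_tap.ini stay as they were. To revert, copy the .bak file back over guard_tap.ini.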

Summary
In most cases, Guardium Support and development will resolve these types of problems. Case
resolutions that can be found without contacting Support include:
• Configuration problems, for example S-GATE causing latency
• Vendor analysis shows crashes not caused by Guardium
• Existing issue hit and resolved by upgrading to latest S-TAP
