Tools To Aid Debugging On AIX
AIX environment
The first thing we start with when a problem appears is the environment: the operating system
version and the hardware in use. This is an important step because you might want to check if
you have a reproducible environment where you can debug, or you may want to recreate the
exact environment.
System configuration
The following commands display the version, release, and maintenance levels of AIX.
Listing 2. AIX version, release, and maintenance levels
# instfix -i|grep AIX_ML
All filesets for 5.3.0.0_AIX_ML were found.
All filesets for 5300-01_AIX_ML were found.
All filesets for 5300-02_AIX_ML were found.
All filesets for 5300-03_AIX_ML were found.
All filesets for 5300-04_AIX_ML were found.
All filesets for 5300-05_AIX_ML were found.
All filesets for 5300-06_AIX_ML were found.
All filesets for 5300-07_AIX_ML were found.
# lslpp -h bos.rte
Fileset Level Action Status Date Time
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
bos.rte
5.3.0.50 COMMIT COMPLETE 10/17/07 16:34:57
5.3.0.60 COMMIT COMPLETE 03/11/08 16:08:59
5.3.7.0 COMMIT COMPLETE 03/12/08 11:28:55
# oslevel -r
5300-07
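As a quick cross-check, the highest maintenance level that instfix reports as complete should agree with the oslevel -r output. The following is a minimal sketch that assumes the line format shown in Listing 2; the function name latest_ml is illustrative and not part of AIX.

```shell
# latest_ml: read "instfix -i | grep AIX_ML" output on stdin and print the
# highest maintenance level whose filesets were all found.
# (Hypothetical helper; assumes the line format shown in Listing 2.)
latest_ml() {
    sed -n 's/^ *All filesets for \(.*\)_AIX_ML were found\.$/\1/p' |
        LC_ALL=C sort |
        tail -n 1
}

# Usage on AIX:
#   [ "$(instfix -i | grep AIX_ML | latest_ml)" = "$(oslevel -r)" ] && echo consistent
```

Lines that do not start with "All filesets for" are ignored, so an incomplete level would not be reported as the latest one.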
System uptime
#uptime
05:16PM up 2 days, 1:36, 4 users, load average: 1.95, 1.90, 1.80
If a program is terminated, depending on the termination type, a core file could have been
generated. A core file is the image of a terminated process — a dump of everything in memory at
the time of the crash. A core file is generated when any of the following occurs:
SIGQUIT — Quit
SIGILL — Invalid instruction
SIGTRAP — Trace trap
SIGIOT — End process
SIGEMT — EMT instruction
SIGFPE — Arithmetic exception, integer divided by 0, or floating-point exception
SIGBUS — Specification exception
SIGSEGV — Segmentation violation
SIGSYS — Parameter not valid to subroutine
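The error log records the signal as a number rather than a name (for example, SIGNAL NUMBER 11 in an errpt report), so a small lookup can help when reading it. The sketch below covers only the signals listed above. The numeric values are an assumption based on the AIX sys/signal.h header and should be verified with kill -l on the target system; the function name is hypothetical.

```shell
# core_signal_name: map a signal number to the name of a core-generating
# signal from the list above; returns non-zero for any other number.
# The numbers are an assumption taken from AIX's <sys/signal.h>; verify
# them on your system with "kill -l".
core_signal_name() {
    case "$1" in
        3)  echo SIGQUIT ;;
        4)  echo SIGILL ;;
        5)  echo SIGTRAP ;;
        6)  echo SIGIOT ;;
        7)  echo SIGEMT ;;
        8)  echo SIGFPE ;;
        10) echo SIGBUS ;;
        11) echo SIGSEGV ;;
        12) echo SIGSYS ;;
        *)  return 1 ;;
    esac
}

# Usage: core_signal_name 11
```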
Core files are not always generated when an application crashes, or they may be incomplete. If
this occurs, you may need to enable core file dumps or increase the core file size.
#ulimit -c
This command displays the current value, called the soft limit, of the core file size for the shell;
the limit applies to all processes started from that shell. If it is zero, raise it with
#ulimit -c <val>, where <val> can be at most the maximum value, called the hard limit. The
following command displays the hard limit:
#ulimit -Hc
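The relationship between the soft and hard limits can be sketched as a small helper that interprets the two values. This is only an illustration: the function name core_limit_advice and its messages are hypothetical, and each argument is assumed to be either a number or the word "unlimited", as printed by ulimit.

```shell
# core_limit_advice SOFT HARD
# Interpret the values printed by "ulimit -c" (soft) and "ulimit -Hc" (hard).
# Hypothetical helper; each value is a number or the word "unlimited".
core_limit_advice() {
    soft=$1
    hard=$2
    if [ "$soft" = "unlimited" ]; then
        echo "core file size is unlimited"
    elif [ "$soft" -ne 0 ]; then
        echo "core files enabled, soft limit: $soft"
    elif [ "$hard" = "0" ]; then
        echo "core files disabled and hard limit is 0; edit /etc/security/limits"
    else
        echo "core files disabled; raise with: ulimit -c $hard"
    fi
}

# Usage: core_limit_advice "$(ulimit -c)" "$(ulimit -Hc)"
```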
Edit the /etc/security/limits file and change <value> for soft and hard core size, respectively:
Attributes of interest:
Use the chcore command to change the settings and lscore to view the current core settings.
The gencore utility creates a core image of each specified process. The image can then be used
with a debugger like dbx.
The snapcore command gathers the core file, program, and libraries used by the program, then
compresses the information into a PAX file. The file can then be transmitted to a debug
environment, and can be used to identify and resolve a problem with the application.
Determine where the core file is created and which program caused it
If a core file has been created, there should be an error log entry logged by the error-logging
process, which is usually started when the first software failure occurs.
1. Retrieve the error log
# errpt -a
LABEL: CORE_DUMP
IDENTIFIER: C69F5C9B
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
11
USER'S PROCESS ID:
765972
FILE SYSTEM SERIAL NUMBER
8
INODE NUMBER
352516
CORE FILE NAME
/opt/IBM/InformationServer/Server/Projects/sample1/core
PROGRAM NAME
dsapi_slave
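The fields of most interest in this report are the signal number, the core file name, and the program name. A minimal sketch for pulling them out of errpt -a output follows. It assumes that each Detail Data label is followed by its value on the next non-blank line, as in the report above; the function name errpt_core_summary is hypothetical.

```shell
# errpt_core_summary: read "errpt -a" output on stdin and print the signal
# number, core file path, and program name of each core dump entry.
# Assumes each Detail Data label is followed by its value on the next
# non-blank line, as in the report above. (Hypothetical helper name.)
errpt_core_summary() {
    awk '
        /SIGNAL NUMBER/  { key = "signal";  next }
        /CORE FILE NAME/ { key = "core";    next }
        /PROGRAM NAME/   { key = "program"; next }
        key != "" && NF > 0 {
            sub(/^[ \t]+/, "")        # drop leading indentation
            print key "=" $0
            key = ""
        }
    '
}

# Usage on AIX: errpt -a -J CORE_DUMP | errpt_core_summary
```

For the entry shown above, this would print signal=11, the core file path under /opt/IBM/InformationServer, and program=dsapi_slave.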
To display a detailed report of all errors logged in the past 24 hours, use the errpt
command, as follows:
# date
Fri Nov 13 18:18:33 IST 2009
# errpt -a -s 1112181809
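The argument to -s is a start time in mmddhhmmyy form: in the example above, 1112181809 is November 12, 18:18, 2009, which is 24 hours before the date shown. The helper below is a small sketch that builds such a timestamp from its parts; the name errpt_stamp is hypothetical, and the caller is assumed to have already worked out the date and time wanted.

```shell
# errpt_stamp YEAR MONTH DAY HOUR MINUTE
# Print a timestamp in the mmddhhmmyy form used by "errpt -s".
# Hypothetical helper; awk handles the zero-padding of each field.
errpt_stamp() {
    awk -v y="$1" -v m="$2" -v d="$3" -v H="$4" -v M="$5" 'BEGIN {
        printf "%02d%02d%02d%02d%02d\n", m, d, H, M, y % 100
    }'
}

# Usage on AIX: errpt -a -s "$(errpt_stamp 2009 11 12 18 18)"
```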
The executable is located between the pipes on the right-hand side of the output and, in the case
below, it is uvsh.
Run dbx on the binary executable that caused the core dump. This will display the offending call.
Listing sys0
Useful attributes:
There are myriad tools on AIX for inspecting processes for application errors, hangs, and
crashes. We will discuss some of them here.
The following tools can be used to inspect the process or core in question. All the commands
start with proc<cmd>. Special care should be taken while inspecting a process in the production
environment since these tools actually stop the process while they inspect:
Watching a process
The truss command produces a trace of the system calls a process performs, the signals it
receives, and the machine faults it incurs. By default, user-level functions are not traced. To
enable tracing for all user-level functions, use truss -u '*' -p <pid>.
Useful options:
You cannot directly truss a command that runs as another user under SUID, because the system
identifies the process as not belonging to your user. The following error displays:
Log in as the user whom you need to investigate and find the PID of your shell using the
ps command.
Start a new session as root and truss the shell session.
This new session will log all the activity in the original shell. Run the failing command
and stop the truss. The truss.out file can be investigated to find the failure.
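When investigating truss.out, a useful first pass is to count the system calls that failed, grouped by error name. The sketch below assumes that truss marks a failed call with Err#<number> followed by the errno name (for example, Err#2 ENOENT); check this against your own output before relying on it. The function name truss_failures is hypothetical.

```shell
# truss_failures [FILE...]
# Count failed system calls in truss output, grouped by errno name,
# most frequent first. Reads stdin when no file is given.
# Assumes failures appear as "Err#<number> <ERRNO_NAME>" on the call line.
truss_failures() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^Err#[0-9]+$/) count[$(i + 1)]++
    }
    END { for (name in count) print count[name], name }' "$@" |
        sort -rn
}

# Usage: truss_failures truss.out
```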
In a typical database environment, or in applications that make extensive use of files, it can
be important to know the names of the files a process has open when debugging a problem.
procfiles -n <pid>
ncheck -i <inode>
If the status field of a client process stays in a FIN_WAIT state for long periods of time, or
the status field of a server process stays in CLOSE_WAIT for a long time, the processes are said
to be hanging, or a deadlock could have occurred.
Socket-to-process ID mapping
Run netstat -Aan, where -A shows the address of any protocol control blocks associated with
the sockets.
Run kdb and issue sockinfo on the address for the socket in question.
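To find the sockets worth passing to sockinfo, the netstat -Aan output can be filtered for the states mentioned earlier. This is a sketch only: it assumes the protocol control block address is the first column and the connection state is the last, and the function name waiting_sockets is hypothetical.

```shell
# waiting_sockets: read "netstat -Aan" output on stdin and print the PCB
# address and state of every socket in CLOSE_WAIT or a FIN_WAIT state.
# Assumes the PCB address is the first column and the state is the last.
waiting_sockets() {
    awk '$NF ~ /^(CLOSE_WAIT|FIN_WAIT_1|FIN_WAIT_2)$/ { print $1, $NF }'
}

# Usage on AIX: netstat -Aan | waiting_sockets
```

Running the filter a few minutes apart and comparing the results shows which sockets remain in the same state over time.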
Check the time field. If it is constant over time, a probable deadlock or hang could have
occurred.
#ps -mp <pid> -o THREAD
Data-segment settings
The LDR_CNTRL environment variable controls the number of data segments a process can use.
The following example defines one additional data segment:
export LDR_CNTRL=MAXDATA=0x10000000
start the process
unset LDR_CNTRL
This value greatly affects some of the memory-related issues on AIX. MAXDATA controls the
amount of memory available to malloc and is changed using LDR_CNTRL=MAXDATA=0xN0000000
(where N equals the number of segments).
For 32-bit programs, the default address-space model uses a single segment for user data and
the stack, with a maximum aggregate size close to 256 MB. If your application requires more
than that, a large or very large address-space model can be used by setting MAXDATA.
See AIX documents for more information about large program support.
The ldedit command can also be used to change the MAXDATA settings in the executable itself.
For 32-bit programs under the large address-space model, the maximum value allowed is
0x80000000; and under the very-large address-space model, it is 0xD0000000. For 64-bit
programs, any value can be specified, but the data area cannot extend past 0x06FFFFFFFFFFFFF8.
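The MAXDATA arithmetic for 32-bit programs under the large address-space model can be sketched as follows: each segment is 0x10000000 bytes (256 MB), so N segments give N x 256 MB, up to the 0x80000000 maximum stated above. The function names are hypothetical, and the very-large model is deliberately not covered.

```shell
# maxdata_for_segments N
# Print the LDR_CNTRL setting that requests N 256 MB data segments for a
# 32-bit program under the large address-space model (N from 1 to 8,
# since the maximum value given above is 0x80000000).
# Hypothetical helper name.
maxdata_for_segments() {
    case "$1" in
        [1-8]) printf 'LDR_CNTRL=MAXDATA=0x%s0000000\n' "$1" ;;
        *)     echo "segment count must be 1-8" >&2; return 1 ;;
    esac
}

# maxdata_size_mb N: data size in MB that N segments provide (256 MB each).
maxdata_size_mb() {
    echo $(( $1 * 256 ))
}

# Usage: maxdata_for_segments 4    # prints LDR_CNTRL=MAXDATA=0x40000000
#        maxdata_size_mb 4         # prints 1024
```

As an alternative to the export and unset pair shown earlier, the shell also accepts the assignment as a prefix to a single command, which limits the setting to that one process.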
The ps command reports memory allocated with malloc and does not include memory mapped with
mmap. svmon reports complete process memory utilization.
#export PSALLOC=early
To print information about active shared-memory segments, use: #ipcs -mop. To remove
shared-memory segments, use: ipcrm [ -m SharedMemoryID ] [ -M SharedMemoryKey ].
Conclusion
You have learned about some tools that can be used in a customer environment to help debug
problems. We have discussed a guided approach to debugging and some common problem areas,
along with the available AIX tools.