SAP Startup Troubleshooting Guide For Netweaver Application Server
wiki.scn.sap.com/wiki/display/SI/SAP+Startup+Troubleshooting+Guide+for+Netweaver+Application+Server
Purpose
This document is designed to help SAP customers troubleshoot and resolve the
majority of SAP system startup problems on their own, minimizing the business
impact and costs attached to a standard support ticket life cycle. As time goes by, new
topics and startup error scenarios will be added to this guide. Stay tuned!
NEWS from December 2016: if SAP is running on AIX 7.1 or 7.2 and you are unable to
start SAP after updating AIX, read SAP note 2402207.
1. Preliminary Analysis
This initial checklist covers mandatory settings that have proven to be the root cause
of startup issues on several occasions. With this in mind, please *DO NOT* skip any
phases.
There is NO POINT in moving the analysis further while the return code
is other than zero.
If the return code is other than zero, please involve your DBA/Network/OS
teams.
LD_LIBRARY_PATH=/usr/sap/<SID>/exe:$LD_LIBRARY_PATH;
export LD_LIBRARY_PATH;
/usr/sap/<SID>/<instance_nr>/exe/sapstartsrv
pf=/usr/sap/<SID>/SYS/profile/<SID>_<instance_nr>_<hostname> -D -u <SIDadm>
WINDOWS:
1. Run "services.msc";
2. Look for the relevant service SAP<SID>_<instance_nr>;
3. Check the command line assigned to it:
<DRIVE>:\usr\sap\<SID>\<instance>\exe\sapstartsrv pf=<DRIVE>:\usr\sap\
<SID>\SYS\profile\<START_PROFILE>
These start entries are mandatory in the profile used by the sapstartsrv process. You
can use the process list on UNIX and the service definition on WINDOWS to find out
exactly which profile is currently assigned to the sapstartsrv process.
Examples:
Entries needed for starting the dispatcher + work processes:
_DW=dw.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_0X=local rm -f $(_DW)
Execute_0X=local ln -s -f $(DIR_EXECUTABLE)/disp+work$(FT_EXE) $(_DW)
Start_Program_0X=local $(_DW) pf=$(_PF)
_MS = ms.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_0X = local rm -f $(_MS)
Execute_0X = local ln -s -f $(DIR_EXECUTABLE)/msg_server$(FT_EXE) $(_MS)
Restart_Program_0X = local $(_MS) pf=$(_PF)
_EN = en.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_0X = local rm -f $(_EN)
Execute_0X = local ln -s -f $(DIR_EXECUTABLE)/enserver$(FT_EXE) $(_EN)
Restart_Program_0X = local $(_EN) pf=$(_PF)
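The presence of these start entries can be checked with a short script. The following is a minimal sketch, not an SAP tool: the function name find_start_entries is hypothetical, and it assumes that profile comment lines begin with "#" and that the "0X" placeholder stands for a numeric suffix.

```python
import re

# Matches profile keys such as Start_Program_01 or Restart_Program_00.
START_ENTRY = re.compile(r"^\s*(Start_Program|Restart_Program)_(\d+)\s*=\s*(.+)$")


def find_start_entries(profile_text):
    """Return (key, command) pairs for every start entry found in a profile."""
    entries = []
    for line in profile_text.splitlines():
        # Assumption: lines starting with '#' are comments and are skipped.
        if line.lstrip().startswith("#"):
            continue
        match = START_ENTRY.match(line)
        if match:
            key = f"{match.group(1)}_{match.group(2)}"
            entries.append((key, match.group(3).strip()))
    return entries


# Sample text modelled on the dispatcher entries above (suffixes are illustrative).
sample_profile = """\
_DW=dw.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)
Execute_01=local rm -f $(_DW)
Execute_02=local ln -s -f $(DIR_EXECUTABLE)/disp+work$(FT_EXE) $(_DW)
Start_Program_01=local $(_DW) pf=$(_PF)
"""

print(find_start_entries(sample_profile))
```

An empty result for the profile assigned to sapstartsrv would suggest that the mandatory start entries are missing.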
It is mandatory that the SIDadm user is able to log on at operating system level
*WITHOUT ANY* other prompts such as:
1. Company policies;
2. User policies;
3. Next scheduled downtimes/System news;
4. Logging in as another user and switching to SIDadm with su/sudo;
This can cause several problems in different scenarios, but mainly during upgrade
procedures. See SAP note 1301712 for further details.
Examples
As an example, we will use a centralized system with the following three instances running
on the same host (ASCS, PAS and DI). The operating system level command should
return the following output:
(...) /usr/sap/<SID>/ASCS01/exe/sapstartsrv
pf=/usr/sap/<SID>/SYS/profile/<SID>_ASCS01_<hostname> -D
(...) /usr/sap/<SID>/DVEBMGS00/exe/sapstartsrv
pf=/usr/sap/<SID>/SYS/profile/<SID>_DVEBMGS00_<hostname> -D
(...) /usr/sap/<SID>/D02/exe/sapstartsrv
pf=/usr/sap/<SID>/SYS/profile/<SID>_D02_<hostname> -D
When the sapstartsrv service is not running, you can manually run the above
command for the desired instance.
If the service doesn't start and nothing is written to "sapstartsrv.log", the
binaries in the local exe folder may be damaged. In this case they have to be
replaced.
WINDOWS:
1. You should first configure the task manager output so that it displays the command line
related to the selected process:
2. Run task manager → Switch to tab "Processes" → Menu "view" → "columns" → 'PID'
and 'COMMAND LINE';
3. Then, using the task manager tab "Processes", look for the instance's sapstartsrv process
with a command line similar to:
<DRIVE>:\usr\sap\<SID>\<instance>\exe\sapstartsrv pf=<DRIVE>:\usr\sap\
<SID>\SYS\profile\<START_PROFILE>
Verify whether any process is already in "GREEN" or "YELLOW" status (e.g., the IGS).
All processes must have the "GRAY" status. Otherwise, the SAP Startup Agent
(sapstartsrv) will "ignore" the start command. Stop the instance first, so all processes are
stopped. Then, start the instance again.
1.7. PERMISSIONS:
UNIX:
"/TMP" DIRECTORY:
The permissions of the "/tmp" directory must include the sticky bit, which is indicated
by a final 't' in the permission string.
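The sticky bit can also be checked programmatically. The following is a minimal sketch using only the Python standard library; the function names has_sticky_bit and permission_string are hypothetical helpers chosen for illustration.

```python
import os
import stat


def has_sticky_bit(path):
    """Return True if the sticky bit (the final 't' in 'ls -ld' output) is set."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)


def permission_string(path):
    """Return the 'ls -l' style permission string of a path."""
    return stat.filemode(os.stat(path).st_mode)
```

Calling has_sticky_bit("/tmp") on the SAP host should return True; permission_string("/tmp") shows the full permission string for comparison with the "ls -ld /tmp" output.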
SAPSTARTSRV.LOG:
This file, located in "/usr/sap/<SID>/<INSTANCE>/work/", should be owned by the <SIDadm>
user. In specific cases where it is owned by root, sapstartsrv (running as the <SIDadm> user)
will not be able to access the file and thus the system will fail to start. The following entry
is seen in "/var/messages":
Unable to open trace file sapstartsrv.log. (Error 13 Permission denied)
SAPUXUSERCHECK FILE
The permissions for this file should look like this:
HOSTNAME
What IP is resolved for the local hostname (output of OS level command niping -v -H
<hostname>)? There is a known error described in SAP note 1054467 where the
loopback IP 127.0.0.1 is wrongly resolved for it.
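As a complement to niping, the hostname resolution can be cross-checked with a few lines of Python. This is only a sketch: the function name resolves_to_loopback is hypothetical, and the resolver argument exists solely so the check can be exercised without a real DNS or hosts lookup.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname, resolver=socket.gethostbyname):
    """Return (ip, is_loopback) for the IPv4 address the hostname resolves to."""
    ip = resolver(hostname)
    return ip, ipaddress.ip_address(ip).is_loopback
```

Running resolves_to_loopback(socket.gethostname()) on the SAP host and getting True as the second value would point to the situation described in SAP note 1054467.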
The main approach is to begin with the most recently updated files.
As an example, if the work process traces (dev_w##) were not updated after the
latest failed start attempt, then the dispatcher trace (dev_disp) needs to be checked, and
so on.
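Sorting the trace files by modification time makes this order visible at a glance. The following is a minimal sketch; the function name traces_newest_first is hypothetical, and it assumes that all developer traces in the work directory follow the "dev_*" naming seen in this guide.

```python
from pathlib import Path


def traces_newest_first(work_dir):
    """Return the dev_* trace files of a work directory, most recently modified first."""
    files = [p for p in Path(work_dir).glob("dev_*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, traces_newest_first("/usr/sap/<SID>/<INSTANCE>/work") (with the placeholders filled in) lists the traces to read first at the top.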
3. DISPATCHER (dev_disp):
1. The dispatcher trace is usually the first and most important trace file analyzed for
startup issues.
2. Most of the time, the reason for the startup failure will be here.
In the dispatcher trace, we initially look for critical error situations which directly
contributed to the startup failure. If no such straightforward occurrences can be found, then
we look for any situation which would, indirectly, lead to the same result. As an
example, another component has crashed and the dispatcher therefore has to shut itself down.
Example of error where it's possible to see the cause right away:
***LOG Q0K=> DpMsAttach, mscon ( <server_name>) [dpMessageSer 1652]
*** DP_FATAL_ERROR => DpMsAttach: local hostname '<server_name>' is resolved to
loopback address (cf. SAP note 1054467 for details)
*** DISPATCHER EMERGENCY SHUTDOWN ***
Example of dispatcher startup error caused by the crash of all work processes. The very
next step in this case is to check the "dev_w#" traces and find out what is killing the
disp+work processes and, ultimately, what is the cause of the instance failure:
*** ERROR => DpHdlDeadWp: W64 (pid 1898) died (severity=0, status=65280) [dpxxwp.c
1739]
DpTraceWpStatus: child (pid=1898) exited with exit code 255
*** DP_FATAL_ERROR => DpWPCheck: no more work processes
*** DISPATCHER EMERGENCY SHUTDOWN ***
*** IMPORTANT:
--> Whenever you suspect a dispatcher crash and find the following entries:
Please keep in mind that a manual shutdown took place. This is shown by the
"signal 2" entry and also by the "< (normal)" sign next to the shutdown entry.
This list covers the most common causes of a "dispatcher crash", along with their
resolutions:
I. "SI_EPORT_INUSE":
These error entries prevent the dispatcher from initializing, and they are caused not
by SAP but by network inconsistencies. When the dispatcher tries to start and bind its
default port 32<instance_number>, it finds that another process has already bound this
port. It therefore aborts, logging the entries below:
***LOG Q0I=> NiIBindSocket: bind (10013: WSAEACCES: Permission denied) [nixxi.cpp 3740] <
Problem...
*** ERROR => NiIBindSocket: SiBind failed for hdl 9/sock 1292
(SI_EPORT_INUSE/10013; I4; ST; 0.0.0.0:3202) [nixxi.cpp 3740]
*** ERROR => DpCommInit: NiBufListen (rc=-4) [dpxxdisp.c 10223]
*** DP_FATAL_ERROR => DpSapEnvInit: DpCommInit
*** DISPATCHER EMERGENCY SHUTDOWN ***
RESOLUTION:
You must involve your network team and find out what process is listening on port
32<instance_number> and ensure that the dispatcher port is free at the time it is
starting up. As an example, in UNIX you can try the OS command "netstat -an | grep
32<instance_number>" when searching for a process listening on that port.
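The port check can also be scripted. The following is a minimal sketch under stated assumptions: the function names default_ports and port_in_use are hypothetical, the port numbering follows the 32<instance_number> (dispatcher) and 33<instance_number> (gateway) patterns named in this guide, and the probe simply attempts a bind, so any bind failure (including a permission error) is reported as "in use".

```python
import socket


def default_ports(instance_nr):
    """Return the default dispatcher and gateway ports for an instance number."""
    nn = int(instance_nr)
    if not 0 <= nn <= 99:
        raise ValueError("instance number must be between 00 and 99")
    return {"dispatcher": 3200 + nn, "gateway": 3300 + nn}


def port_in_use(port, host="0.0.0.0"):
    """Return True if a TCP bind to host:port fails, i.e. the port cannot be bound."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False
```

With the instance stopped, port_in_use(default_ports("02")["dispatcher"]) returning True would indicate that another process holds port 3202. It does not say which process; netstat or a similar OS tool is still needed for that.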
The following error is seen in scenarios where the patch release of the dispatcher and the
message server is not the same. This can happen for a couple of reasons, but most
occurrences are related to SAPCPE failing to copy the newest files from the
GLOBAL EXE folder to the INSTANCE EXE folder, resulting in mixed releases.
RESOLUTION:
You must ensure that all instances are running with the same kernel release. As a
potential fix, you can manually copy the files from the global "EXE" folder to the instance
"EXE" folder.
OR
*** ERROR => SHM2_EsInit: invalid argument ES/SHM_MAX_SHARED_SEGS=2. Valid range is 1 ... 1d
[esuxshm2.c 1381]
*** ERROR => DpEmInit: EmInit failed (1) [dpInit.c 1475]
*** ERROR => DpMemInit: DpEmInit (-1) [dpInit.c 1339]
*** DP_FATAL_ERROR => DpSapEnvInit: DpMemInit
*** DISPATCHER EMERGENCY SHUTDOWN ***
RESOLUTION:
There are a few interconnected memory parameters that must satisfy each other's
requirements. Mainly, the following SAP notes have to be consulted:
ES/SHM_MAX_SHARED_SEGS <= ES/SHM_PROC_SEG_COUNT - ES/SHM_MAX_PRIV_SEGS
ES/SHM_PROC_SEG_COUNT >= ES/SHM_MAX_SHARED_SEGS + ES/SHM_MAX_PRIV_SEGS
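The two inequalities above express the same constraint, so a single comparison is enough to validate a set of values before restarting. The following is a minimal sketch; the function name es_segments_consistent is hypothetical and the numbers in the usage lines are made up for illustration, not recommended settings.

```python
def es_segments_consistent(max_shared_segs, proc_seg_count, max_priv_segs):
    """Check ES/SHM_MAX_SHARED_SEGS <= ES/SHM_PROC_SEG_COUNT - ES/SHM_MAX_PRIV_SEGS."""
    return max_shared_segs <= proc_seg_count - max_priv_segs


# Illustrative values only.
print(es_segments_consistent(max_shared_segs=60, proc_seg_count=64, max_priv_segs=4))
print(es_segments_consistent(max_shared_segs=2, proc_seg_count=64, max_priv_segs=63))
```

The first call satisfies the constraint and the second violates it, since only one segment would be left for shared use.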
IV. SI_EADDR_NAVAIL:
***LOG Q0I=> NiIBindSocket: bind (68: Can't assign requested address) [nixxi.cpp 3239]
*** ERROR => NiIBindSocket: SiBind failed for hdl 0 / sock 7
(SI_EADDR_NAVAIL/68; I4; DG; xx.y.zzz.ww:3200) [nixxi.cpp 3239]
*** ERROR => DpCommInit: NiDgHdlBindName failed: -16 [dpxxdisp.c 10263]
*** DP_FATAL_ERROR => DpSapEnvInit: DpCommInit
*** DISPATCHER EMERGENCY SHUTDOWN ***
RESOLUTION:
See PRELIMINARY CHECKLIST above, topic [1.7] and ensure that "localhost" is resolved
as loopback IP "127.0.0.1" and vice-versa.
Dispatcher is shutting itself down because there are no more work processes available
for processing of requests.
In order to pinpoint the cause, work processes traces have to be analyzed. Just follow
the initialization entries and soon the ERROR entries with the root cause will be seen.
RESOLUTION:
Whenever this error is the root cause for a dispatcher crash, it means that in this very
same trace you'll see lots of entries related to the crash of every single work process.
These entries look like this:
------------------------------
*** ERROR => DpHdlDeadWp: W0 (pid 21104) died (severity=0, status=65280)
[dpxxwp.c 1727]
DpTraceWpStatus: child (pid=21104) exited with exit code 255
------------------------------
We then search the dispatcher trace for the very first occurrence of a work
process that crashed, with entries similar to the ones above.
There we see that work process 0 (zero) died. We then need to move the
investigation to the work process trace. In this case it will be "dev_w0", located in the
work directory of the instance.
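Finding the first crashed work process can be automated with a regular expression over the dispatcher trace. The following is a minimal sketch based only on the entry format shown above; the function name first_dead_work_process is hypothetical.

```python
import re

# Pattern taken from the "DpHdlDeadWp" entries shown in this guide.
DEAD_WP = re.compile(r"DpHdlDeadWp: W(\d+) \(pid (\d+)\) died")


def first_dead_work_process(dev_disp_text):
    """Return details of the first work process reported dead, or None."""
    match = DEAD_WP.search(dev_disp_text)
    if match is None:
        return None
    wp_no, pid = int(match.group(1)), int(match.group(2))
    # The trace to inspect next is dev_w<number> in the instance work directory.
    return {"wp": wp_no, "pid": pid, "trace": f"dev_w{wp_no}"}


sample = """\
*** ERROR => DpHdlDeadWp: W0 (pid 21104) died (severity=0, status=65280)
DpTraceWpStatus: child (pid=21104) exited with exit code 255
"""
print(first_dead_work_process(sample))
```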
*** ERROR => not allowed to connect to message server via port XXXX
[dpxxdisp.c 12175]
*** ERROR => Please check your configuration (profile parameter
rdisp/msserv_internal) [dpxxdisp.c 12176]
DpHalt: shutdown server >PTSAPT01_QS1_03< (normal)
RESOLUTION:
Straightforward: the dispatcher shut down because the gateway reader could not be started.
RESOLUTION:
The gateway trace file "dev_rd" should be checked for the root cause. A very common cause
is the unavailability of the gateway's default port 33<instance_NR> during instance
startup.
If this is the case, the following entries are seen in "dev_rd":
***LOG Q0I=> NiIBindSocket: bind (226: Address already in use) [nixxi.cpp 3738]
*** ERROR => NiIBindSocket: SiBind failed for hdl 1/sock 11
(SI_EPORT_INUSE/226; I4; ST; 0.0.0.0:3310) [nixxi.cpp 3738]
*** In this case the instance number was 10, hence "0.0.0.0:3310".
If you face the same situation, you have to involve your network/OS teams and find
out what process has already bound the port.
VIII. Es2IResCreateShm: shmget failed:
This issue is mainly seen on SAP systems based on HP-UX servers.
*** ERROR => Es2IResCreateShm: shmget failed (size= 17980981248. Check OS kernel parameter
shmmax. (22: Invalid argument) [es2ux.c 219]
*** ERROR => Es2ResCreateFiles: Es2IResCreate failed [es2xx.c 1324]
*** ERROR => Es2ResCreate: Es2ResCreateFiles failed [es2xx.c 1507]
*** ERROR => DpEmInit: Es2ResCreate (1) [dpInit.c 1642]
*** ERROR => DpMemInit: DpEmInit (-1) [dpInit.c 1417]
*** DP_FATAL_ERROR => DpSapEnvInit: DpMemInit
*** DISPATCHER EMERGENCY SHUTDOWN ***
RESOLUTION:
You must correctly set the OS kernel parameters as per below SAP note:
This error is present only as of kernel 740. The SAP Kernel release 740 uses a multicast
address at the network 224.0.0.0/24 to dispatch tasks to the work processes.
When required settings related to multicast are not in place, the instance will not start
and in the "dev_disp.new" trace file, the following entries are logged:
RESOLUTION:
--> For customers with kernel 740, the following notes explain how to set up
multicast correctly:
--> For customers with a kernel equal to or higher than 741 PL 47, SAP recommends
using a new implementation based on events.
This feature is enabled with the following parameter (which only exists as of kernel
741 PL 47):
rdisp/queue/use_events_for_dispatching=on
--> This parameter's default value is ON starting from SAP kernel 742.
*** ERROR => e=22 semop(3145913,(0,-1,6144),1) (22: Invalid argument) [semux.c 577]
*** ERROR => DpITmSlotAllocate: SemRq [dpxxtool2.c 4333]
*** ERROR => e=22 semop(3145913,(0,-1,6144),1) (22: Invalid argument) [semux.c 577]
*** DP_FATAL_ERROR => DpTmDisconnect: SemRq
*** DISPATCHER EMERGENCY SHUTDOWN ***
RESOLUTION:
As these entries are related to a crash rather than to a failed startup attempt, the
resolution is to start the instance once again.
The easiest way to reproduce this scenario (and the most common cause of the
crash) is removing the instance's IPC keys at runtime which, for obvious reasons,
*SHOULD NOT BE DONE*.
*** ERROR => EgInit: requested segment size (105327362048 bytes) too large.Max size =
34359738367 bytes [egxx.c 244]
*** ERROR => DpEmInit: EmInit (1) [dpxxdisp.c 10757]
*** ERROR => DpMemInit: DpEmInit (-1) [dpxxdisp.c 10671]
*** DP_FATAL_ERROR => DpSapEnvInit: DpMemInit
*** DISPATCHER EMERGENCY SHUTDOWN ***
RESOLUTION:
The instance will fail to start and log the above entries in case the EG configuration is
not maintained as per below SAP notes:
Hint
SAP documentation exists for most of the possible cases. Search for it.
SAPSTARTSRV (through the SAPCONTROL interface) offers a variety of web methods that can
be used to monitor/manage the instances of the system. They are especially useful for
handling startup error scenarios. The "must-know" ones are listed below, along with
the general pattern for executing them:
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
testserver001, 0, 50013, 50014, 3, ABAP|GATEWAY|ICMAN|IGS, GREEN
testserver001, 1, 50113, 50114, 1, MESSAGESERVER|ENQUE, GREEN
testserver001, 2, 50213, 50214, 3, ABAP|GATEWAY|ICMAN|IGS, GREEN
4.1: "GetSystemInstanceList":
This one returns the list of active instances for the system, along with component list
and the current status.
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
testserver001, 0, 50013, 50014, 3, ABAP|GATEWAY|ICMAN|IGS, GREEN
testserver001, 1, 50113, 50114, 1, MESSAGESERVER|ENQUE, GREEN
testserver001, 2, 50213, 50214, 3, ABAP|GATEWAY|ICMAN|IGS, GREEN
EXPECTED OUTPUT:
- One entry per instance of the system, including ASCS and SCS instances;
- GREEN values for all entries in column "dispstatus";
- For all instances running on the same host (column hostname), there must be
a different value in "instanceNr".
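These expectations can be verified automatically once the output has been captured as text. The following is a minimal sketch; the function names parse_instance_list and check_instances are hypothetical, and the parser assumes the comma-separated layout shown above, ignoring any line without a comma.

```python
import csv
from collections import Counter


def parse_instance_list(output):
    """Parse the comma-separated part of GetSystemInstanceList output."""
    lines = [ln for ln in output.splitlines() if "," in ln]
    reader = csv.DictReader(lines, skipinitialspace=True)
    return [{k: (v or "").strip() for k, v in row.items() if k} for row in reader]


def check_instances(instances):
    """Return a list of findings that violate the expected output."""
    problems = []
    for inst in instances:
        if inst["dispstatus"] != "GREEN":
            problems.append(
                f"{inst['hostname']} instance {inst['instanceNr']} is {inst['dispstatus']}"
            )
    counts = Counter((i["hostname"], i["instanceNr"]) for i in instances)
    for (host, nr), count in counts.items():
        if count > 1:
            problems.append(f"instanceNr {nr} appears {count} times on {host}")
    return problems


sample = """\
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
testserver001, 0, 50013, 50014, 3, ABAP|GATEWAY|ICMAN|IGS, GREEN
testserver001, 1, 50113, 50114, 1, MESSAGESERVER|ENQUE, GREEN
testserver001, 2, 50213, 50214, 3, ABAP|GATEWAY|ICMAN|IGS, GREEN
"""
print(check_instances(parse_instance_list(sample)))
```

An empty list means every instance is GREEN and no instance number is repeated on a host.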
4.2: "GetProcessList":
The GetProcessList web method returns the list of components currently in place on any of the
instances seen in the first command. It comes in handy as soon as the web method
"GetSystemInstanceList" reports an instance as YELLOW. This happens
when at least one component has some kind of problem, so it is now possible to
check the affected process in more detail:
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
disp+work, Dispatcher, GREEN, Running, 2015 03 23 23:51:35, 263:27:26, 25549
igswd_mt, IGS Watchdog, GREEN, Running, 2015 03 23 23:51:35, 263:27:26, 25550
gwrd, Gateway, GREEN, Running, 2015 03 23 23:51:36, 263:27:25, 25590
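To spot the affected component quickly, the captured output can be filtered for processes that are not GREEN. The following is a minimal sketch assuming the seven-column layout shown above; the function name non_green_processes is hypothetical.

```python
def non_green_processes(output):
    """Return {process name: dispstatus} for every process that is not GREEN."""
    # Keep only lines with the seven comma-separated columns of GetProcessList.
    rows = [ln.split(",") for ln in output.splitlines() if ln.count(",") >= 6]
    if not rows:
        return {}
    header = [cell.strip() for cell in rows[0]]
    result = {}
    for cells in rows[1:]:
        record = dict(zip(header, (cell.strip() for cell in cells)))
        if record["dispstatus"] != "GREEN":
            result[record["name"]] = record["dispstatus"]
    return result


sample = """\
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
disp+work, Dispatcher, GREEN, Running, 2015 03 23 23:51:35, 263:27:26, 25549
igswd_mt, IGS Watchdog, GREEN, Running, 2015 03 23 23:51:35, 263:27:26, 25550
gwrd, Gateway, GREEN, Running, 2015 03 23 23:51:36, 263:27:25, 25590
"""
print(non_green_processes(sample))
```

The same filter can be used before a start attempt to confirm that no process is still GREEN or YELLOW, by comparing each status against "GRAY" instead.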
4.3: "GetVersionInfo":
Great for kernel consistency checks, this method returns the current kernel release and
patch level of each component within the instance given as an argument on
the command line.
Excluding the DB* libraries, all the remaining components of the instance must
always share the same kernel and patch release in a consistent scenario.
GetVersionInfo
OK
Filename, VersionInfo, Time
/usr/sap/WSO/DVEBMGS00/exe/sapstartsrv, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 19:29:07
/usr/sap/WSO/DVEBMGS00/exe/disp+work, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 19:36:32
/usr/sap/WSO/DVEBMGS00/exe/gwrd, 742, patch 28, changelist 1540128, RKS compatibility level 0,
optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:48:09
/usr/sap/WSO/DVEBMGS00/exe/msg_server, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:48:09
/usr/sap/WSO/DVEBMGS00/exe/dboraslib.so, 742, patch 5, changelist 1503525, RKS compatibility
level 0, dbgU (Jun 13 2014, 23:23:51), linuxx86_64, 2014 12 05 17:53:41
/usr/sap/WSO/DVEBMGS00/exe/dbmssslib.so, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:47:06
/usr/sap/WSO/DVEBMGS00/exe/dbdb2slib.so, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:47:03
/usr/sap/WSO/DVEBMGS00/exe/dbdb4slib.so, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 19:29:51
/usr/sap/WSO/DVEBMGS00/exe/dbdb6slib.so, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:47:06
/usr/sap/WSO/DVEBMGS00/exe/dbsybslib.so, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:47:08
/usr/sap/WSO/DVEBMGS00/exe/enserver, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:47:48
/usr/sap/WSO/DVEBMGS00/exe/icman, 742, patch 28, changelist 1540128, RKS compatibility level
0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:48:12
/usr/sap/WSO/DVEBMGS00/exe/sapwebdisp, 742, patch 28, changelist 1540128, RKS compatibility
level 0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:48:07
/usr/sap/WSO/DVEBMGS00/exe/jcontrol, 742, patch 28, changelist 1540128, RKS compatibility level
0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 18:49:16
/usr/sap/WSO/DVEBMGS00/exe/jlaunch, 742, patch 28, changelist 1540128, RKS compatibility level
0, optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 19:08:36
/usr/sap/WSO/DVEBMGS00/exe/jstart, 742, patch 28, changelist 1540128, RKS compatibility level 0,
optU (Nov 26 2014, 19:45:41), linuxx86_64, 2014 11 26 19:28:59
EXPECTED OUTPUT:
- All entries must share the same kernel release and patch. In this case 742 patch 28
(with the exception of the ones starting with db*).
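The consistency check itself is easy to script. The following is a minimal sketch; the function names kernel_versions and is_consistent are hypothetical, and it assumes that each entry starts with the file path followed by the release and the patch level, as in the output above. Files whose names start with "db" are skipped, matching the exception stated above.

```python
def kernel_versions(output):
    """Map each non-DB executable name to its (release, patch) pair."""
    versions = {}
    for line in output.splitlines():
        # Entries start with an absolute path; wrapped continuation lines do not.
        if not line.startswith("/"):
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue
        name = fields[0].rsplit("/", 1)[-1]
        if name.startswith("db"):
            continue
        versions[name] = (fields[1], fields[2])
    return versions


def is_consistent(versions):
    """Return True when all collected components share one release and patch."""
    return len(set(versions.values())) <= 1


sample = """\
/usr/sap/WSO/DVEBMGS00/exe/sapstartsrv, 742, patch 28, changelist 1540128
/usr/sap/WSO/DVEBMGS00/exe/disp+work, 742, patch 28, changelist 1540128
/usr/sap/WSO/DVEBMGS00/exe/dboraslib.so, 742, patch 5, changelist 1503525
"""
found = kernel_versions(sample)
print(found, is_consistent(found))
```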
Hint
There are hundreds of web methods available for the most diverse cases. They
can be listed by running just the "sapcontrol" command as the SIDadm user.
Hint
These instance management commands can help to quickly find out version and
specific component details.
CONCERNING THIS DOCUMENT...
The original concept of this document is that it will never be finished. Instead, it is constantly
improved, based on incoming feedback. For this reason, you are invited to share
whether or not the information presented here helped you to solve a startup problem and,
even more importantly, if it didn't. In that case you are welcome to share your own
scenario and describe the solution applied, along with its constraints and specifics.