DP4400 Monitoring - Troubleshooting - v1.0

Uploaded by

Huy Taxuan
DP4400

Health Monitoring and Troubleshooting

Sep 2018

1 © Copyright 2017 Dell Inc.


Internal Use - Confidential

DP 4400 Health Monitoring
• DP4400 uses the PowerTools agent to receive SNMP traps for hardware events
– The PowerTools agent is packaged and installed at the factory as part of the DP4400 ESXi image
– It is configured to send SNMP traps to the ACM private IP, 192.168.100.100
– The PowerTools configuration file is on ESXi at /scratch/dell/config/PTAgent.config

• The customer does not need to do anything manually to receive SNMP traps

• The dependency on the iDRAC IP has been removed

• Along with hardware events, events for vCenter, ESXi and the IDPA VMs are shown in the Health tab of the ACM UI

• Events from the last 30 days are persisted

• Only “critical” and “fatal” events are sent to ESRS, and only if ESRS is configured
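The retention and forwarding rules above can be pictured with a small sketch. This is illustrative only and is not ACM code: the event dictionaries, field names and function names are all hypothetical. Only the 30-day window and the two severities come from the slide.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 30                      # "last 30 days" from the slide
ESRS_SEVERITIES = {"critical", "fatal"}  # severities forwarded to ESRS


def prune_old_events(events, now):
    """Keep only events raised within the retention window."""
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return [event for event in events if event["time"] >= cutoff]


def events_for_esrs(events, esrs_configured):
    """Return the events that would be forwarded to ESRS."""
    if not esrs_configured:
        return []
    return [e for e in events if e["severity"].lower() in ESRS_SEVERITIES]


if __name__ == "__main__":
    # Hypothetical events; the field names are assumptions for illustration.
    now = datetime(2018, 9, 30)
    sample = [
        {"time": datetime(2018, 9, 29), "severity": "critical", "source": "hardware"},
        {"time": datetime(2018, 9, 28), "severity": "warning", "source": "vCenter"},
        {"time": datetime(2018, 8, 1), "severity": "fatal", "source": "ESXi"},
    ]
    kept = prune_old_events(sample, now)
    for event in events_for_esrs(kept, esrs_configured=True):
        print(event["source"], event["severity"])
```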

DP 4400 Health Monitoring
• ACM fetches vCenter events every 20 minutes

• The vCenter events window can be changed, if required, by editing the parameters in the following file on ACM

vi /usr/local/dataprotection/customscripts/HealthmonitorConfig.properties
shm.max_no_of_event=30
shm.max_time_in_min=30
shm.days_to_delete_event=30
shm.snmp_validator_period_in_mins=240
shm.snmp_validator_delay_in_mins=1
shm.vcenter_events_period_in_mins=20
shm.vcenter_events_delay_in_mins=0

• Restart the ACM service for the change to take effect

service dataprotection_webapp restart
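
As an alternative to editing the file by hand in vi, the change can be scripted. The sketch below is a minimal example, assuming the file contains only plain key=value lines as shown above. The function names are illustrative, and the sample text simply repeats the values from this slide. It works on text only, so writing the result back to the file and restarting the service remain manual steps.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def set_property(text, key, value):
    """Return the text with key set to value, leaving other lines untouched."""
    out, found = [], False
    for line in text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Values copied from the slide above.
SAMPLE = """\
shm.max_no_of_event=30
shm.max_time_in_min=30
shm.days_to_delete_event=30
shm.snmp_validator_period_in_mins=240
shm.snmp_validator_delay_in_mins=1
shm.vcenter_events_period_in_mins=20
shm.vcenter_events_delay_in_mins=0
"""

if __name__ == "__main__":
    # Example: fetch vCenter events every 10 minutes instead of 20.
    updated = set_property(SAMPLE, "shm.vcenter_events_period_in_mins", 10)
    print(parse_properties(updated)["shm.vcenter_events_period_in_mins"])
```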

DP 4400 Health Monitoring
• Demo

• 7a. DP4400-Health-Monitoring.mp4

DP4400
Troubleshooting

Troubleshooting – Log location

Functionality: Log location
• ACM/Component products - Deployment, Configuration and Network configuration, Health Monitoring: /usr/local/dataprotection/var/configmgr/server_data/logs/server.log
• Appliance Shutdown: /usr/local/dataprotection/var/configmgr/server_data/logs/ShutdownActivity.log
• Upgrade logs (upgrade in progress): /data01/tmp/patch/logs
• Upgrade logs (post upgrade): /data01/upgradeLogs

Troubleshooting – Diagnostic Report

• The report shows the error count and the warning count

• The diagnostic report describes each error and warning

Troubleshooting – Initial Network Configuration Failures

• Possible causes
– The IP is already in use
– A wrong gateway or subnet mask was provided

• Troubleshooting
– Check the diagnostic report in the ACM UI to determine the cause of the error
– If the diagnostic report is not clear, search for the “Failed” keyword in “/usr/local/dataprotection/var/configmgr/server_data/logs/server.log” to trace the step that failed
– After correcting the issue, click Retry to redo the configuration

• 7b. DP4400-Initial-network-configuration-failure
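
The keyword search in server.log can be scripted so that each hit comes with its line number. The sketch below is a minimal example and the function names are illustrative. The match is a case-sensitive substring test, which is an assumption about how the keyword appears in the log.

```python
SERVER_LOG = "/usr/local/dataprotection/var/configmgr/server_data/logs/server.log"


def find_keyword(lines, keyword="Failed"):
    """Yield (line_number, line) for every line that contains the keyword."""
    for number, line in enumerate(lines, start=1):
        if keyword in line:
            yield number, line.rstrip("\n")


def scan_log(path=SERVER_LOG, keyword="Failed"):
    """Scan a log file on disk and return all matching lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_keyword(handle, keyword))


if __name__ == "__main__":
    for number, line in scan_log():
        print(f"{number}: {line}")
```

Run it on ACM, or pass the path of a copied log to scan_log().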

Troubleshooting – Component Deployment & Configuration Failures

• Possible causes
– A wrong IP was entered when an IP range was not used
– A wrong gateway or subnet mask was provided
– The license is incorrect

• Troubleshooting
– Check the diagnostic report in the ACM UI to determine the cause of the error
– Search the logs for ‘Received notifyStatus from task’ statements. These lead to the log statements where the configuration progress is reported for the steps executed by the various tasks.
– Each task/step is reported in the IN_PROGRESS, FAILED or COMPLETED state. If you find a FAILED task log, look at the log statements of the same thread just before it to see the exact failure. The thread id in the log below is pool-2-thread-2.

2017-03-27 11:00:45,217 INFO [pool-2-thread-2]-abstracts.ProductPlugin: Received notifyStatus from task : DATA_DOMAIN:CONFIG:null:0:FAILED:4:10%:1:0
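
The "find the FAILED status, then read back on the same thread" procedure can be sketched as follows. This is a minimal example, assuming each log entry sits on one line and carries its thread id in square brackets, as in the sample above. The function names and the default of five context lines are illustrative choices.

```python
import re

# Matches the notifyStatus line and captures the thread id and the status field.
NOTIFY = re.compile(
    r"\[(?P<thread>[^\]]+)\].*Received notifyStatus from (?:the )?task\s*:\s*"
    r"(?P<status>\S+)"
)
THREAD = re.compile(r"\[([^\]]+)\]")


def thread_of(line):
    """Return the thread id found in square brackets, or None."""
    match = THREAD.search(line)
    return match.group(1) if match else None


def failed_task_context(lines, context=5):
    """For each FAILED notifyStatus line, return (thread, status, earlier lines
    logged by the same thread), limited to the last `context` lines."""
    results = []
    for index, line in enumerate(lines):
        match = NOTIFY.search(line)
        if not match or "FAILED" not in match.group("status").split(":"):
            continue
        thread = match.group("thread")
        before = [earlier for earlier in lines[:index] if thread_of(earlier) == thread]
        results.append((thread, match.group("status"), before[-context:]))
    return results


if __name__ == "__main__":
    path = "/usr/local/dataprotection/var/configmgr/server_data/logs/server.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        log_lines = handle.read().splitlines()
    for thread, status, before in failed_task_context(log_lines):
        print(f"FAILED on {thread}: {status}")
        for earlier in before:
            print("   ", earlier)
```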

Troubleshooting – Component Deployment & Configuration Failures

• Every product deployment has similar messages in the ACM logs. The example below shows the tasks for DDVE
• DDVE Configuration Task Logs
– Start Log: Executing Data Domain config task.
– Completion Log: Execution of Data Domain config task completed.
– Error Log: ApplianceException occurred while executing Datadomain config task.
• DDVE Deployment Task Logs
– Start Log: Executing ddve deployment task.
– Completion Log: DDVE deployed successfully.
– Error Log: Exception occurred while executing deploy Ddve task.
• DDVE VAPP Task Logs
– Start Log: Executing deploy DD vApp task.
– Completion Log: successfully executed command: reg set config_master.elms_locking_id_use_psnt=tru
– Error Log: Exception occurred while executing deploy DD vApp task.
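
The start, completion and error messages can be used as markers to work out how far a task got. The sketch below covers the DDVE configuration and deployment tasks listed above. The dictionary layout, the function name and the state labels are illustrative. Matching is by substring, without the trailing period, on the assumption that the messages appear verbatim in the ACM log.

```python
TASK_MARKERS = {
    "DDVE configuration": {
        "start": "Executing Data Domain config task",
        "completion": "Execution of Data Domain config task completed",
        "error": "ApplianceException occurred while executing Datadomain config task",
    },
    "DDVE deployment": {
        "start": "Executing ddve deployment task",
        "completion": "DDVE deployed successfully",
        "error": "Exception occurred while executing deploy Ddve task",
    },
}

STATE_FOR_MARKER = {"start": "in progress", "completion": "completed", "error": "failed"}


def task_state(lines, markers):
    """Return the state implied by the last marker seen in the log lines."""
    state = "not started"
    for line in lines:
        for name, text in markers.items():
            if text in line:
                state = STATE_FOR_MARKER[name]
    return state


if __name__ == "__main__":
    path = "/usr/local/dataprotection/var/configmgr/server_data/logs/server.log"
    with open(path, encoding="utf-8", errors="replace") as handle:
        log_lines = handle.read().splitlines()
    for task, markers in TASK_MARKERS.items():
        print(f"{task}: {task_state(log_lines, markers)}")
```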

Troubleshooting – Component Deployment & Configuration Failures

• If the DDVE license failed to apply, check whether the locking ID mentioned in the license matches the appliance serial number.

• If it is a non-critical component (DPA, DPS or CDRA), you can retry the deployment from the ACM Dashboard

• If it is a critical component (DDVE, AVE or DPC), retry the configuration after correcting the issue

• 7c. DP4400-DPS-failure.mp4

Troubleshooting – Product Integration Failures

• Possible causes
– Most cases are likely to be related to timing issues

• Troubleshooting
– Check the diagnostic report in the ACM UI to determine the cause of the error
– Integration tasks log messages like the following example from the server.log file on ACM
  Start Log: Executing IntegrateCDRATask
  Completion Log: Execution of IntegrateCDRATask completed.
  Error Log: Exception occurred while executing IntegrateCDRATask.

Troubleshooting – Server Health Monitoring

• If you see the message “Health monitor processes are down” on the Health Monitoring UI
– Check the service status on ACM
service dataprotection_database status
service rabbitmq-server status
– If a service is not active, restart that service

• If the ACM UI is not receiving hardware events, check
– The PowerTools agent service status on ESXi. It should be in the running state.
/etc/init.d/DellPTAgent status
– If it is not in the running state, start the service
/etc/init.d/DellPTAgent start
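
The two ACM service checks can be wrapped in a short script. The sketch below is a minimal example that treats a non-zero exit code from the status command as "not active". That follows the usual init-script convention but is an assumption about these particular services, so confirm against the command output. The function names are illustrative, and the script only reports and leaves the restart to the operator.

```python
import subprocess

# Service names from the slide above.
ACM_SERVICES = ["dataprotection_database", "rabbitmq-server"]


def run_status(service):
    """Run `service <name> status` and return (exit_code, combined_output)."""
    proc = subprocess.run(
        ["service", service, "status"],
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout + proc.stderr


def inactive_services(services, runner=run_status):
    """Return the services whose status command exits with a non-zero code."""
    return [name for name in services if runner(name)[0] != 0]


if __name__ == "__main__":
    down = inactive_services(ACM_SERVICES)
    if down:
        print("Not active:", ", ".join(down))
    else:
        print("All health monitoring services report active")
```

The runner parameter lets the check be exercised with a stub on a machine where the services are not installed.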

Troubleshooting – ESRS registration failures

• If you see the error message below during ESRS registration:

exception com.avamar.asn.service.ServiceException: Failed to register to ESRS : Device match not found for input device with Serial Number ELMAVM02180WRL and Product AVAMAR-GW

– Verify that the product serial number is present in the production database. Use the link http://sdgweb01/ESRSGSReports/Applications/ESRS3/DeviceExtract.aspx to verify the database

– If the database entry does not exist, contact the Install Base team

– If the database entry exists and ESRS registration still fails, check server.log on ACM for more information about the error
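
Before checking the database, the serial number and product can be pulled out of the error text. The sketch below is a minimal example. The regular expression is written against the one message shown on this slide, so other message variants may not match, and the function name is illustrative.

```python
import re

DEVICE_NOT_FOUND = re.compile(
    r"Device match not found for input device with Serial Number\s+(?P<serial>\S+)"
    r"\s+and Product\s+(?P<product>\S+)"
)


def parse_device_not_found(message):
    """Return {'serial': ..., 'product': ...} or None if the message differs."""
    normalized = " ".join(message.split())  # undo line wrapping in the log or UI
    match = DEVICE_NOT_FOUND.search(normalized)
    return match.groupdict() if match else None


if __name__ == "__main__":
    error = (
        "exception com.avamar.asn.service.ServiceException: Failed to register to ESRS\n"
        ": Device match not found for input device with Serial Number ELMAVM02180WRL\n"
        "and Product AVAMAR-GW"
    )
    print(parse_device_not_found(error))
```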

Troubleshooting – vCenter and ESXi

• As part of DP4400 deployment and configuration, a new user, “idpauser”, is created on both vCenter and ESXi.

• This is a limited-privilege user whose password is set to the common password provided by the customer
• VCSA: idpauser permissions:
"VirtualMachine.Interact.ToolsInstall", "VirtualMachine.Interact.PowerOff", "VirtualMachine.Interact.PowerOn",
"VirtualMachine.Interact.Reset", "VirtualMachine.Interact.DeviceConnection", "VirtualMachine.Interact.Suspend",
"VirtualMachine.Interact.ConsoleInteract", "VirtualMachine.Interact.AnswerQuestion", "Global.LogEvent",
"Global.Diagnostics", "System.Read"
• ESXi: idpauser permissions:
"System.View", "System.Read", "System.Anonymous", "Host.Config.Maintenance", "Host.Config.AutoStart",
"Host.Config.Network", "Host.Config.NetService", "VirtualMachine.Interact.PowerOn", "VirtualMachine.Interact.PowerOff",
"VirtualMachine.Provisioning.ReadCustSpecs", "VirtualMachine.Interact.Reset",
"VirtualMachine.State.RemoveSnapshot", "VirtualMachine.Interact.ConsoleInteract", "VApp.PowerOn", "VApp.PowerOff",
"Alarm.Acknowledge", "Network.Assign", "Network.Config", "Network.Delete", "Network.Move"

Troubleshooting – vCenter and ESXi

• The root user password for both vCenter and ESXi is randomly generated
• ACM saves the VCSA and ESXi root user passwords in encrypted format
• The VCSA root password is stored in the file
/usr/local/dataprotection/var/configmgr/server_data/config/componentCredentials.xml
<vCenterPassword>CIPHER_TEXT</vCenterPassword>

• The ESXi root password is stored in the file
/usr/local/dataprotection/var/configmgr/server_data/config/InfrastructureComponents.xml
<ESXi>
<password isEncrypted="true">CIPHER_TEXT</password>
</ESXi>
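
Locating the encrypted ESXi entry can be scripted before the value is handed to the decryption utility. The sketch below is a minimal example, assuming the file is well-formed XML without namespaces. The enclosing <components> element in the sample is hypothetical, since the slide shows only the <ESXi> fragment. The function returns the cipher text as stored and does not decrypt it.

```python
import xml.etree.ElementTree as ET


def esxi_password_entry(xml_text):
    """Return (cipher_text, is_encrypted) from the first <ESXi>/<password>
    element, or None if no such element exists."""
    root = ET.fromstring(xml_text)
    for esxi in root.iter("ESXi"):
        password = esxi.find("password")
        if password is not None:
            return password.text, password.get("isEncrypted") == "true"
    return None


# The <components> wrapper is an assumption; only the <ESXi> fragment is from the slide.
SAMPLE = """<components>
<ESXi>
<password isEncrypted="true">CIPHER_TEXT</password>
</ESXi>
</components>"""

if __name__ == "__main__":
    print(esxi_password_entry(SAMPLE))
```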

• If required for troubleshooting, Support can run a decryption utility on ACM to decrypt the password
