Node Reset & Add

The document discusses issues with Rubrik cluster node resets in versions prior to 5.2.0-p2. It provides steps to resolve a "/var/log is not a mountpoint" error during resets, including rebooting the node, checking logs, and running the sdreset script with various options like PRESERVE_HDD and KEEP_BROADCAST_INTERFACE to prevent data loss or network interruptions during an in-place node replacement. It also includes troubleshooting tips if the reset hangs or fails validation checks.


5.2 reset issues:
mainly hitting this --> "/var/log is not a mountpoint"
https://rubrik.atlassian.net/wiki/spaces/~peter.abromitis/pages/961970665/More+Upgrade+Woes+and+cluster+operations+we+ve+seen+in+CDM+5.2+and+later...#SDRESET-on-5.2-prior-to-5.2.0-p2-is-stuck
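
A quick way to confirm the node is actually in this state (assuming the standard util-linux tools are present on the node):
# Prints "is a mountpoint" / "is not a mountpoint" (exit 0 / 1)
mountpoint /var/log
# Shows the backing source and filesystem for /var/log, if any
findmnt /var/log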
To avoid it, start with a reboot:
1. reboot
sudo systemctl --force --force reboot
2. reconnect, check the logs, create .rubrik_install_in_progress so the session is not
disconnected during sdreset, remove the lock/cron files if they exist, then run sdreset:
sudo touch /home/ubuntu/.rubrik_install_in_progress
tail /tmp/sdtests/reset_node.out.txt
sudo ls /home/ubuntu/.rubrik_install_in_progress

sudo ls /var/lib/rubrik/sdreset_lock
sudo ls /etc/cron.d/sdreset_crontab
sudo rm -rf /home/ubuntu/.rubrik_install_in_progress
sudo rm /var/lib/rubrik/sdreset_lock
sudo /opt/rubrik/src/scripts/dev/sdreset.sh
rkcli cluster reset_node force

keep_broadcast_interface
skip_ipmi_network_reset
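
To follow the reset from a second SSH session, the same log checked in step 2 can be tailed (assuming sdreset keeps writing to that path):
tail -f /tmp/sdtests/reset_node.out.txt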

To run a customized reset script instead, edit it, make it executable, run it, and then clean up:
sudo vim /opt/rubrik/src/scripts/dev/sdreset_custom.sh


sudo chmod 777 /opt/rubrik/src/scripts/dev/sdreset_custom.sh;
cd /opt/rubrik/src/scripts/dev/;
sudo touch /home/ubuntu/.rubrik_install_in_progress;
sudo ./sdreset_custom.sh;
sudo rm -rf /home/ubuntu/.rubrik_install_in_progress
sudo rm /opt/rubrik/src/scripts/dev/sdreset_custom.sh

For an in-place node replacement, preserve the data on the HDDs:

date; time sudo PRESERVE_HDD=1 /opt/rubrik/src/scripts/dev/sdreset.sh

To keep the broadcast domain intact:
date; time sudo KEEP_BROADCAST_INTERFACE=1 /opt/rubrik/src/scripts/dev/sdreset.sh
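
If both the disk data and the broadcast domain need to be preserved, the two variables can presumably be combined in one invocation (assumption: the script reads each variable independently):
date; time sudo PRESERVE_HDD=1 KEEP_BROADCAST_INTERFACE=1 /opt/rubrik/src/scripts/dev/sdreset.sh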
After sdreset has completed successfully:
sudo mv /etc/cron.d/sdreset_crontab /etc/cron.d/.sdreset_crontab
If sdreset hangs for a couple of minutes on "Waiting for cluster config and node
monitor to be available":
ctrl+c
sudo reboot
rerun sdreset

Add a VLAN sub-interface on bond0 (VLAN ID 132 in this example):
sudo /sbin/vconfig add bond0 132
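
vconfig is deprecated on newer distributions; the iproute2 equivalent (same interface and VLAN ID assumed) would be:
# Create bond0.132 tagged with VLAN ID 132, then bring it up
sudo ip link add link bond0 name bond0.132 type vlan id 132
sudo ip link set dev bond0.132 up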

MANAGEMENT NETWORK:
---------------------------------------
IP Address:
Subnet Mask:
Gateway IP:

DATA NETWORK (OPTIONAL)
---------------------------------------
IP Address:
Subnet Mask:

IPMI NETWORK
---------------------------------------
IPMI IP Address:
Subnet Mask:
Gateway IP:

VLAN CONFIG:
---------------------------------------
VLAN ID:
VLAN IP:

rkcl exec all 'sudo /opt/rubrik/src/scripts/node-monitor/hw_health.sh 2> /dev/null | grep -A2 FRU' | paste - - -

Create a default route config (destination 0.0.0.0, netmask 0.0.0.0, gateway 10.133.232.1, interface bond0):
rubrik_tool.py create_route_config 0.0.0.0 0.0.0.0 10.133.232.1 bond0
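
To check whether a default route is currently installed on the node (standard iproute2, not Rubrik-specific; the Rubrik tool itself may only write the config):
ip route show default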

tail -f /var/log/node-monitor/current | grep -ie 'grace\|consec'

rktail -f /var/log/health-monitor/current | grep -ie 'grace\|consec'

tail -f /var/log/node-monitor/current | grep -ie 'validation'

scp -6 trusted-certificates.pem rksupport@\[fe80::3eec:efff:fe4f:a003%bond0.2226\]:/opt/rubrik/conf/release_signing/

scp -6 rubrik-image-8.0.0-p2-21860.tar.gz* rksupport@\[fe80:0:0:0:3eec:efff:fe4e:ec07%bond0\]:/home/rksupport/

scp rubrik-image-7.0.3-p2-16069.tar.gz* rksupport@<node-ip>:/home/rksupport/

sudo service avahi-daemon start; avahi-browse -rat | grep -i -A1 rvm | grep -i -A2 ipv6 | grep -v _rubrik._tcp | grep -A2 ^= | grep -A2 bond0
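
Before copying, reachability of a discovered link-local address can be sanity-checked (the address below is one of the examples used further down):
# -c 3: send three echo requests; the %bond0 zone is required for link-local addresses
ping6 -c 3 fe80::3eec:efff:fe20:e3ce%bond0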

scp trusted-certificates.pem rksupport@<node-ip>:/opt/rubrik/conf/release_signing/

scp -6 trusted-certificates.pem rksupport@\[fe80::3eec:efff:fe20:e3ce%bond0\]:/opt/rubrik/conf/release_signing/

rkcl exec RVM184S002313,RVM183S049171,RVM184S014090,RVM183S049338,RVMHM181S004249,RVMHM185S002700,RVMHM185S001993,RVMHM188S004896 "ifconfig bond0 | grep -i inet6"
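
On a single node, the same link-local address can be read with iproute2 directly (standard Linux, not Rubrik-specific):
# Show only IPv6 link-local (fe80::/10) addresses on bond0
ip -6 addr show dev bond0 scope link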

scp -6 /opt/rubrik/conf/release_signing/trusted-certificates.pem rksupport@[fe80:0:0:0:3eec:efff:fe3a:baf3%bond0]:/opt/rubrik/conf/release_signing/
scp -6 /opt/rubrik/conf/release_signing/trusted-certificates.pem rksupport@[fe80::3eec:efff:fe20:e3ce%bond0]:/opt/rubrik/conf/release_signing/

sudo /opt/rubrik/src/scripts/debug/FailJobTool.sh -jobId STAGE_CDM_SOFTWARE_GLOBAL_253dade3-3f55-40a0-b3dc-3a97791a3257 -instanceId 0
cqlsh -ksd -e "select * from job_instance where job_id='TIER_EXISTING_SNAPSHOTS@PARALLELIZABLE_EXECUTE_TIER_EXISTING_SNAPSHOTS_TIER_EXISTING_SNAPSHOTS_253eb2e2-8f74-4494-a293-b19381150eba_72f91df5-1504-4362-8968-916438185005###0_363' and instance_id=0"

sqlite3 /var/lib/rubrik/node_monitor_check_history.db "select id, datetime(timestamp / 1000, 'unixepoch', 'localtime'), check_name, success from check_history" | sort | cut -d\| -f2- | column -s\| -t | sed 's/0$/Fail/g' | sed 's/1$/Pass/g' | egrep -i 'OtherNodesChecker'
