How to recover from a failed Linux Exadata DB Server dbnodeupdate or rollback (Doc ID 1952372.1)
Modified: Oct 1, 2019 Type: HOWTO
In this Document
Goal
Solution
Boot the system from diagnostic iso (Grub 1 + Grub2)
Switch filesystem labels (Grub 1 + Grub2)
Restore /boot (Grub 1 + Grub 2)
Install grub (Grub 1 only - this step can be skipped for Grub2)
Install grub (Grub 2 only - this step can be skipped for Grub1)
References
APPLIES TO:
GOAL
When an update or rollback (possibly via dbnodeupdate.sh) fails, an Exadata database server may become unbootable. It
can also happen that a database server boots but fails to load the right modules or libraries after a broken or interrupted
update, effectively making the system unusable. As a result, users and administrators may be unable to log in to
the system, even via the console. This document describes how to recover from a failed update or rollback on an Exadata
DB server. It applies only to X2 and later DB servers where logical volume management (LVM) is in use and
where a backup was made to the inactive system partition prior to the update.
If the database server was configured with logical volume management and a backup was made prior to the update, there
is a way to 'rollback' to that backup, by 'switching' system partitions. When switching system partitions, the active logical
volume becomes inactive and the inactive (the backup) becomes active. In addition to the switching of active and inactive
system partitions, the /boot directory belonging to the system partition being made the new active system partition needs
to be restored. Also the Grub bootloader needs to be re-installed (for Grub1 only).
On systems where administrators can log in and run dbnodeupdate.sh, rollbacks should always be performed by
dbnodeupdate.sh. On systems where you cannot log in, switching system partitions, restoring /boot and
reinstalling the bootloader can only be done after first booting from the diagnostic iso. This note guides the reader
through the process of booting from the diagnostic iso and restoring the relevant components.
Note: Failing updates to Exadata 12.1.2.1.0 (Oracle Linux 6) or Oracle Linux 7 should in most cases roll back
automatically.
SOLUTION
Perform the following steps to roll back an Exadata database server to the inactive (backup) system partition:
Note:
Each of the steps below states which version of Grub it applies to. Be careful not to use Grub1
steps for Grub2 systems.
If you are able to query the rpm database and the query "rpm -aq | grep grub2" returns grub2
rpms, then you are on Grub2; otherwise you are on Grub1.
If you are unable to query the rpm database, check the grub configuration on your existing system.
It is also important to understand that the version of Grub on the image you are rolling back to
matters. EFI systems require additional steps such as mounting /boot/efi during the restore
steps.
The steps in this document are verified up to Exadata release 19.2.x.
The example in this document uses a system with /dev/sda1 being the first local disk used for
booting. This may vary depending on the hardware you have. For example, xvda is used for a
Xen domU.
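The Grub check from the note above can be scripted. The following is a minimal sketch, assuming the rpm database is still readable. The function name grub_generation is hypothetical; it wraps the grep from the note so that it reads a package list from standard input.

```shell
# grub_generation: read a package list (one package per line) on stdin and
# print "Grub2" if any grub2 package is present, otherwise "Grub1".
# The function name is illustrative and not part of any Exadata tooling.
grub_generation() {
    if grep -q 'grub2'; then
        echo "Grub2"
    else
        echo "Grub1"
    fi
}

# Usage on a system where the rpm database can be queried:
#   rpm -aq | grub_generation
```

If the rpm query itself fails, the result is not meaningful, and the grub configuration should be checked instead as described in the note.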
Boot the system from diagnostic iso (Grub 1 + Grub2)
The following steps are from 'Recovering Oracle Linux Database Server with Uncustomized Partitions' in the Oracle Exadata
Database Machine Maintenance Guide 12c Release 1 (12.1), duplicated here with extra information.
In the 'Oracle Exadata Database Machine Maintenance Guide', pick up at 'Attach the
/opt/oracle.SupportTools/diagnostics.iso file from any healthy database server as virtual media to the ILOM of the database
server to be restored' :
1. Copy the diagnostics.iso file to a directory on the machine using the ILOM interface, such as your desktop. The
diagnostics.iso file can be transferred from another Exadata database or cell server running the same operating
system.
For example, to rescue a failed Oracle Linux 6 update, an Oracle Linux 6 diagnostics.iso is required.
2. Log in to the ILOM web interface.
3. Select Remote Console from the Remote Control tab. This will start the console.
Note: Later systems such as X5 may have different ILOM menu options to accomplish the same.
4. Select the Devices menu.
13. While vmlinuz is loading, open another console using the serial ILOM console by connecting to the ILOM
by ssh.
14. Select (e) to enter the interactive diagnostics shell, when the following screen appears:
15. At the login screen, log in as root with the Oracle-supplied password.
16. This concludes this step.
Switch filesystem labels (Grub 1 + Grub2)
Rolling back essentially means switching filesystem labels. Exadata expects the active system partition to have filesystem label
'DBSYS' and the inactive system partition to have none.
In order to roll back, the following need to be identified:
1. the system partition running the OS that has the issues, and
2. the system partition to roll back to.
Standard Exadata Database Servers have an inactive and active system partition:
For regular Exadata installations and DOMU, the inactive and active system partitions are:
/dev/VGExaDb/LVDbSys1
/dev/VGExaDb/LVDbSys2
For DOM0:
/dev/VGExaDb/LVDbSys2
/dev/VGExaDb/LVDbSys3
Depending on the history of a machine (for example, a previous rollback took place), the active system partition can be one or the
other. There is nothing wrong with running a system on the LVDbSys3 system partition for DOM0 or on the LVDbSys2 system
partition for regular Exadata installations. The current active system partition can be determined by the command
imageinfo as follows:
[root@db01~]# imageinfo
Kernel version: 2.6.39-400.243.1.el6uek.x86_64 #1 SMP Wed Nov 26 09:15:35 PST 2014 x86_64
Image version: 12.1.2.1.0.141206.1
Image activated: 2014-12-08 21:26:51 -0800
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
If, in the above example, the active system partition has issues, a rollback to LVDbSys2 is required.
When booting from diagnostics.iso, the imageinfo command does not work, so other methods need to be used to
determine which system partition needs to be rolled back to. This can be done as follows:
Scroll back through the screen output of your dbnodeupdate patching session and search for 'Active LVM Name' and 'Inactive
LVM Name'.
When the session output is lost, mount both system partitions to find which system partition was active when things
failed. This can be done as follows:
Note: for DOM0 systems LVDbSys1 and LVDbSys2 should be replaced by LVDbSys2 and LVDbSys3
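The mount commands for this step are not reproduced above. As a minimal sketch, assuming the mount points /mnt/LVDbSys1 and /mnt/LVDbSys2 used elsewhere in this note and the logical volume paths listed earlier, the helper below only prints the commands so they can be reviewed before being run from the diagnostics shell. The function name print_mount_cmds is hypothetical.

```shell
# print_mount_cmds: print (do not run) the commands to mount the given
# system partitions of volume group VGExaDb under /mnt/<name>.
# Hypothetical helper; review the output before executing it.
print_mount_cmds() {
    for lv in "$@"; do
        echo "mkdir -p /mnt/$lv"
        echo "mount /dev/VGExaDb/$lv /mnt/$lv"
    done
}

# Regular installation or DOMU:
print_mount_cmds LVDbSys1 LVDbSys2
# DOM0 would use: print_mount_cmds LVDbSys2 LVDbSys3
```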
Run tail on the dbnodeupdate.log of both system partitions. The logfile with the most recent timestamp is on the system
partition that was active when things failed, and hence the rollback should be done to the other system partition.
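The comparison can be sketched as below. This is an illustration only: it compares file modification times with the shell's -nt test instead of reading the last log entries with tail, which assumes nothing touched the logs after the failure. The function name rollback_target and the MNT_ROOT variable are hypothetical; the log path is the one used in this note.

```shell
# rollback_target: given two mounted system partition names, print the one
# to roll back to, i.e. the one whose dbnodeupdate.log is NOT the newest.
# MNT_ROOT defaults to /mnt; it is overridable only to ease testing.
rollback_target() {
    root="${MNT_ROOT:-/mnt}"
    log_a="$root/$1/var/log/cellos/dbnodeupdate.log"
    log_b="$root/$2/var/log/cellos/dbnodeupdate.log"
    if [ "$log_a" -nt "$log_b" ]; then
        echo "$2"   # $1 was active when the update failed
    elif [ "$log_b" -nt "$log_a" ]; then
        echo "$1"   # $2 was active when the update failed
    else
        echo "undetermined" >&2
        return 1
    fi
}

# Usage: rollback_target LVDbSys1 LVDbSys2
```

When the result is "undetermined", or when in doubt, inspect the timestamps inside both logfiles manually before changing any labels.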
Example: If the most recent timestamp is in logfile /mnt/LVDbSys1/var/log/cellos/dbnodeupdate.log, then the rollback should
be done to LVDbSys2. After determining which system partition needs to be rolled back to, proceed with switching the filesystem
labels by removing the filesystem label from the system partition that has the issue (in this example we are
rolling back to LVDbSys2):
Note: for DOM0 systems the filesystem label should be DBSYSOVS instead of DBSYS
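The e2label commands for this step are not reproduced above. As a hedged sketch, assuming e2label is given an empty string to clear a label (its usage is "e2label device [new-label]"), the hypothetical helper below prints the two commands for a given pair of partitions rather than running them.

```shell
# print_label_switch: print (do not run) the e2label commands that move the
# active-partition label from the failed partition to the rollback target.
#   $1 = partition with the issue, $2 = partition to roll back to,
#   $3 = label (DBSYS by default, DBSYSOVS for DOM0 per the note above)
# Hypothetical helper, shown for illustration only.
print_label_switch() {
    label="${3:-DBSYS}"
    echo "e2label /dev/VGExaDb/$1 \"\""
    echo "e2label /dev/VGExaDb/$2 $label"
}

# Rolling back from LVDbSys1 to LVDbSys2 on a regular installation:
print_label_switch LVDbSys1 LVDbSys2
```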
Verify the actions by running the e2label command followed by the system partition without any other argument:
Note: for DOM0 systems LVDbSys1 and LVDbSys2 should be replaced by LVDbSys2 and LVDbSys3
Note: For X7-8 and X8-8 systems only, there are three additional steps:
2. If not available, bring up the LVM devices by performing the following actions:
3. Run the parted command to check which MD device has the fat32 partition for EFI. This can be either /dev/md125 or
/dev/md124:
-sh-4.1# parted /dev/md124 p
-sh-4.1# parted /dev/md125 p
If, for example, md124 has a fat32 partition as partition number 2, then md124 is the boot drive. This information is
used in a later step.
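Scanning the parted output for the EFI partition can be sketched as below. The helper name fat32_partition_numbers is hypothetical, and it assumes parted's usual table layout in which the partition number is the first column and the file system type appears on the same row.

```shell
# fat32_partition_numbers: read "parted <device> p" output on stdin and
# print the number of every partition whose row mentions fat32.
fat32_partition_numbers() {
    awk '$1 ~ /^[0-9]+$/ && /fat32/ { print $1 }'
}

# Usage from the diagnostics shell (device names as in the step above):
#   parted /dev/md124 p | fat32_partition_numbers
#   parted /dev/md125 p | fat32_partition_numbers
```

The device for which a partition number is printed is the one holding the EFI partition.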
This concludes this step.
Restore /boot (Grub 1 + Grub 2)
On a working Exadata database server, the /boot filesystem is mounted on top of /dev/sda1 and contains all files required
for booting the system, such as the kernel and Grub configuration. When rolling back to a previous image, the kernel and Grub
configuration belonging to that image should be restored to match the configuration.
On the mount point of the system partition being rolled back to, a file named 'boot_backup.tbz' should exist. Validate the
existence of this file by listing the directory contents of the mounted system partition being rolled back to (in this example
we are rolling back to LVDbSys2):
If the file is found (and you are on a non X7-8 or X8-8 system), proceed by mounting the /boot filesystem and restoring
the contents of the boot_backup.tbz file:
If you are on an EFI system (non X7-8 or X8-8) also mount the /boot/efi directory as follows:
If you are on an X7-8 or X8-8 (EFI system) mount the /boot and /boot/efi directory as follows:
Note: when creating directory /mnt/boot, ignore error messages saying the directory already exists
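For the non X7-8/X8-8, non-EFI case, the check and restore can be sketched as below. The original commands are not reproduced above, so this is an illustration under stated assumptions: /dev/sda1 holds /boot as in this note's example, the archive stores its entries under a leading boot/ directory (as in the example output that follows), and tar in the diagnostics shell supports -j for bzip2. The function name restore_boot is hypothetical.

```shell
# restore_boot: extract boot_backup.tbz from the mounted rollback partition
# into the parent directory of the mounted /boot, after checking it exists.
#   $1 = mount point of the partition being rolled back to
#   $2 = directory that contains the mounted boot/ (default /mnt)
restore_boot() {
    archive="$1/boot_backup.tbz"
    dest="${2:-/mnt}"
    if [ ! -f "$archive" ]; then
        echo "boot_backup.tbz not found in $1" >&2
        return 1
    fi
    # Entries are stored as boot/..., so extracting into $dest fills $dest/boot
    tar -C "$dest" -xjvf "$archive"
}

# Usage, after mounting /boot:
#   mkdir -p /mnt/boot
#   mount /dev/sda1 /mnt/boot
#   restore_boot /mnt/LVDbSys2
```

Stopping when the archive is missing avoids continuing the rollback with a /boot that does not match the image being rolled back to.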
Example output of the untar should look as follows (note that the kernel versions being restored depend on the image rolled back to;
also note this example is a Grub1 example):
boot/
boot/I_am_hd_boot
boot/symvers-2.6.39-400.128.17.el5uek.gz
boot/.vmlinuz-2.6.39-400.128.17.el5uek.hmac
boot/grub/
boot/grub/ufs2_stage1_5
boot/grub/grub.conf
boot/grub/minix_stage1_5
boot/grub/vstafs_stage1_5
boot/grub/jfs_stage1_5
boot/grub/iso9660_stage1_5
boot/grub/xfs_stage1_5
boot/grub/oracle.xpm.gz
boot/grub/menu.lst
boot/grub/fat_stage1_5
boot/grub/reiserfs_stage1_5
boot/grub/e2fs_stage1_5
boot/grub/splash.xpm.gz
boot/grub/stage1
boot/grub/stage2
boot/grub/ffs_stage1_5
boot/grub/device.map
boot/grub/stage2_eltorito
boot/grub/grub.stage.version
boot/System.map-2.6.39-400.128.17.el5uek
boot/initrd-2.6.39-400.128.17.el5uek.img
boot/lost+found/
boot/config-2.6.39-400.128.17.el5uek
boot/vmlinuz-2.6.39-400.128.17.el5uek
boot/initrd-2.6.39-400.128.17.el5uekkdump.img
Install grub (Grub 1 only - this step can be skipped for Grub2)
The last step in the rollback process is re-installation of the grub boot-loader. Perform the steps as follows (in this example we
are rolling back to LVDbSys2):
Verify Grub can be installed by searching for the marker restored in the previous step (commands are those entered at the grub> prompt):
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub> find /I_am_hd_boot
find /I_am_hd_boot
(hd0,0)
grub> quit
quit
The result of the above 'find' should be (hd0,0). If this is not the case, go back to the previous step.
After verifying, proceed with re-installing the grub boot-loader as follows (commands are those entered at the grub> prompt):
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
setup (hd0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd0)"... failed (this is not fatal)
Running "embed /grub/e2fs_stage1_5 (hd0)"... failed (this is not fatal)
Running "install /grub/stage1 (hd0) /grub/stage2 p /grub/grub.conf "... succeeded
Done.
grub> quit
quit
Install grub (Grub 2 only - this step can be skipped for Grub1)
-sh-4.1# for dir in proc sys dev; do mkdir -p /mnt/LVDbSys2/$dir && mount --bind /$dir /mnt/LVDbSys2/$dir; done
Bind mount the /mnt/boot onto the target filesystem (Sys2 in this example)
Verify the outcome of the following command matches your boot disk (sda in our example):
Install grub
Note: the below command can use either /dev/sda or /dev/md124 (or /dev/md125) depending on your system type as
shown earlier in this note.
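The individual commands for the Grub2 steps are not reproduced above. The sketch below is an assumption-based illustration of the sequence the steps describe: bind mount /mnt/boot into the target, then run the Grub2 installer inside a chroot against the boot device. Running grub2-install through chroot is an assumption and is not taken from this note, and EFI systems may need a different invocation. The hypothetical helper only prints the commands for review.

```shell
# print_grub2_install: print (do not run) a possible Grub2 reinstall sequence.
#   $1 = mount point of the partition being rolled back to
#   $2 = boot device, e.g. /dev/sda or /dev/md124 depending on system type
# ASSUMPTION: grub2-install is run via chroot; verify before use.
print_grub2_install() {
    echo "mount --bind /mnt/boot $1/boot"
    echo "chroot $1 grub2-install $2"
}

# Example for this note's case (rolling back to LVDbSys2, boot disk sda):
print_grub2_install /mnt/LVDbSys2 /dev/sda
```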
In the case of systems with /dev/md124 or /dev/md125 - run fsck on the 'p1' and 'p2' partitions. Example as follows:
After unmounting the filesystems that were mounted in the above steps, disconnect the CD-ROM image from the Devices
menu in the ILOM web interface and reboot the database server to boot into the previous (backup) system partition:
-sh-4.1# sync
-sh-4.1# umount /mnt/LVDbSys1
-sh-4.1# umount /mnt/LVDbSys2
-sh-4.1# umount /mnt/boot/efi # (if on EFI system)
-sh-4.1# umount /mnt/boot
-sh-4.1# reboot
REFERENCES
NOTE:1556257.1 - Exadata YUM Repository Population, One-Time Setup Configuration and YUM upgrades
NOTE:1947114.1 - How to boot Exadata database server with diagnostic ISO image
NOTE:1553103.1 - dbnodeupdate.sh and dbserver.patch.zip: Updating Exadata Database Server Software using the
DBNodeUpdate Utility and patchmgr