Why Is My Linux ECS Not Booting and Going Into Emergency Mode
Possible Causes
Emergency mode lets you recover the system even when the system fails to enter rescue mode.
In emergency mode, the system mounts only the root file system, in read-only mode. It does not
attempt to mount any other local file systems or activate network interfaces.
An error in the /etc/fstab file causes a file system mount to fail during startup.
Constraints
The operations in this section are applicable to Linux. The operations involve recovering the file
system, which may lead to data loss. Therefore, back up data before recovering the file system.
Solution
1. Enter the password of user root and press Enter to enter the recovery mode.
2. Run the following command to mount the root partition in read-write mode to modify the files in the
root directory:
# mount -o rw,remount /
3. Run the following command to try to mount all unmounted file systems:
# mount -a
o If the message "mount point does not exist" is displayed, the mount point is unavailable. In such a
case, create the mount point.
o If the message "no such device" is displayed, the file system device is unavailable. In such a case,
comment out or delete the mount line.
o If the message "an incorrect mount option was specified" is displayed, the mount parameters have
been incorrectly set. In such a case, correct the parameter setting.
4. Run the following command to open the /etc/fstab file and correct the faulty entry:
# vi /etc/fstab
Table 1 /etc/fstab parameters
Parameter      Description
[file system]  You are advised to specify the file system by UUID. To obtain the UUID of a device
               file system, run the blkid command.
               UUIDs are independent of the disk order. If the order of storage devices is changed
               manually, is changed by the BIOS, or changes because devices are removed and
               installed again, UUIDs still identify the storage devices reliably.
[type]         Specifies the type of the file system on the device or partition to be mounted. The
               following file systems are supported: ext2, ext3, ext4, reiserfs, xfs, jfs, smbfs,
               iso9660, vfat, ntfs, swap, and auto.
               If type is set to auto, the mount command guesses the type of the file system that
               is used, which is useful for removable media, such as CD-ROM and DVD.
[options]      Specifies the parameters used for mounting. Some parameters are available only for
               specific file systems. For example, defaults indicates that the default mounting
               parameters of a file system will be used.
               For more parameters, run the man mount command to view the manual page.
[dump]         The value can be 0 or 1. 0 indicates that data will not be backed up, and 1
               indicates that data will be backed up. If you have not installed dump, set the
               parameter to 0.
[pass]         The value can be 0, 1, or 2. 0 indicates that the file system will not be checked
               by fsck. 1 indicates the highest check priority, which is intended for the root
               file system, and 2 indicates the lower priority used for other file systems.
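The field rules in Table 1 can be checked mechanically before you reboot. The following is a minimal sketch, not part of the original procedure: check_fstab is a hypothetical helper name, and it assumes whitespace-separated fields in the usual order (file system, mount point, type, options, dump, pass), with dump and pass treated as 0 when they are omitted.

```shell
# check_fstab [FILE]: report structural problems in an fstab-style file.
# Hypothetical helper. Prints one line per problem, nothing if all is well.
check_fstab() {
    file=${1:-/etc/fstab}
    lineno=0
    while read -r fs dir type opts dump pass extra; do
        lineno=$((lineno + 1))
        # Skip blank lines and comment lines.
        case $fs in ''|'#'*) continue ;; esac
        # An entry needs at least four fields and at most six.
        if [ -z "$type" ] || [ -z "$opts" ] || [ -n "$extra" ]; then
            echo "line $lineno: expected 4 to 6 fields"
            continue
        fi
        # Value ranges taken from Table 1.
        case ${dump:-0} in 0|1) ;; *) echo "line $lineno: dump must be 0 or 1" ;; esac
        case ${pass:-0} in 0|1|2) ;; *) echo "line $lineno: pass must be 0, 1, or 2" ;; esac
        # Swap entries have no directory, so only check real mount points.
        if [ "$type" != swap ] && [ ! -d "$dir" ]; then
            echo "line $lineno: mount point $dir does not exist"
        fi
    done < "$file"
}
```

Running check_fstab with no argument reads /etc/fstab. It only inspects the text of the file and does not mount anything, so it complements rather than replaces mount -a.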
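5. Run the following command to verify that all file systems in /etc/fstab can be mounted:
# mount -a
6. Run the following command to restart the ECS:
# reboot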
7. Run the following command to check for file system errors:
# dmesg | grep -Ei "ext[2-4]|xfs" | grep -i error
NOTE:
o If the error message "I/O error... inode" is displayed, the fault is caused by a file system error.
o If no error is found in the logs, the fault is generally caused by a damaged superblock. The
superblock is the header of the file system. It records the status, size, and free blocks of the file
system.
o If the superblock of a file system is damaged, for example, because data was written to the
superblock by mistake, the system may fail to identify the file system. As a result, the system enters
emergency mode during startup. ext file systems keep backup copies of the superblock and store
them at block group boundaries on the device.
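The log filter can be tried on sample text before you run it against the kernel log. This is a small sketch under stated assumptions: fs_errors is a hypothetical helper name, and the matching is case-insensitive because kernel messages usually spell the driver name in upper case (for example EXT4-fs).

```shell
# fs_errors: read log lines on stdin and keep only those that mention an
# ext2/ext3/ext4 or xfs file system AND contain the word "error".
# Hypothetical helper; both matches ignore case.
fs_errors() {
    grep -Ei 'ext[2-4]|xfs' | grep -i 'error'
}

# On a live system (requires permission to read the kernel log):
#   dmesg | fs_errors
```

The range ext[2-4] covers ext2, ext3, and ext4 with a single bracket expression.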
8. Run the following command to unmount the directory where the file system error occurred:
# umount Mount point
9. Recover the damaged file system.
NOTICE:
Recovering the file system may lead to data loss. Back up data before the recovery.
o For the ext file system, run the following command to check whether the file system is faulty:
# fsck -n /dev/vdb1
NOTE:
If the message "The superblock could not be read or does not describe a correct ext2 filesystem" is
displayed, go to step 10.
To recover the file system, run the following command:
# fsck /dev/vdb1
o For the xfs file system, run the following command to check whether the file system is faulty:
# xfs_repair -n /dev/vdb1
To recover the file system, run the following command:
# xfs_repair /dev/vdb1
10. (Optional) If the message "The superblock could not be read or does not describe a correct ext2
filesystem" is displayed, the superblock is damaged. In such a case, use the superblock backup for
recovery.
Figure 2 Damaged superblock
Run the following command to replace the damaged superblock with the superblock backup:
# e2fsck -b 8193 /dev/vdb1
NOTE:
-b 8193 indicates that the backup superblock stored at block 8193 of the file system is used.
The location of the superblock backup varies depending on the block size of the file system. For a
file system with a 1 KB block size, the first backup is at block 8193; for a 2 KB block size,
it is at block 16384; for a 4 KB block size, it is at block 32768.
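The block numbers in the note follow from how ext file systems lay out block groups. The following is a minimal sketch of that arithmetic. It assumes the default layout, in which each block group holds 8 x block size blocks and the first backup sits at the start of block group 1; first_backup_superblock is a hypothetical helper name, and file systems created with non-default options may place backups elsewhere.

```shell
# first_backup_superblock BLOCK_SIZE_BYTES
# Prints the block number of the first backup superblock under the
# default ext2/ext3/ext4 layout. Hypothetical helper for illustration.
first_backup_superblock() {
    bs=$1
    # One bitmap block tracks 8 * block_size blocks, which sets the group size.
    blocks_per_group=$((bs * 8))
    # With 1 KB blocks, block 0 is reserved for the boot area and groups
    # start at block 1; with larger blocks, groups start at block 0.
    if [ "$bs" -eq 1024 ]; then
        first_data_block=1
    else
        first_data_block=0
    fi
    echo $((first_data_block + blocks_per_group))
}

first_backup_superblock 1024   # 1 KB blocks
first_backup_superblock 4096   # 4 KB blocks
```

A number obtained this way is the kind of value that e2fsck accepts after its -b option.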
# reboot
Wait for a while and then check to see if anything shows up when you
type in the command:
$ sudo grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
This presents you with a list of each memory controller's rows (DIMMs)
and their correctable error counts. Combined with dmidecode data on
memory channel, slot, and part number, this process helps you find the
faulty memory stick.
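On a machine with many DIMMs, most counters are zero, so it can help to print only the ones that are not. This is a rough sketch rather than a standard tool: edac_nonzero is a hypothetical helper name, and it assumes the same sysfs layout as the grep command above.

```shell
# edac_nonzero [ROOT]: print "path count" for every correctable-error
# counter under ROOT whose value is greater than zero.
# Hypothetical helper; ROOT defaults to the EDAC sysfs directory.
edac_nonzero() {
    root=${1:-/sys/devices/system/edac/mc}
    for f in "$root"/mc*/csrow*/ch*_ce_count; do
        # Skip the unexpanded pattern when no counter files exist.
        [ -r "$f" ] || continue
        count=$(cat "$f")
        if [ "$count" -gt 0 ] 2>/dev/null; then
            printf '%s %s\n' "$f" "$count"
        fi
    done
    return 0
}
```

Calling edac_nonzero with no argument reads the live counters; reading them may require root privileges on some systems.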
If it turns out that, say, the Apache web server isn't running, you can
start it with this:
$ sudo service apache2 start
In short, before jumping in to work out what's wrong, make sure you
work out which element is at fault. Only once you're sure you know
what a problem is do you know the right questions to ask or the next
level of troubleshooting to investigate.
I mean, sure, you know your car doesn't run, but first you need to
make sure there's gas in the tank before hauling the car off to the
shop for repairs.
3. Top
Another useful system debugging step is top, to check load average,
swap, and which processes are using resources. Top shows all of a
Linux server's currently running processes.
Specifically, top displays:
Line 1:
The time
How long the computer has been running
Number of users
Load average (the system load averaged over the last 1, 5, and 15
minutes)
Line 2:
Total number of tasks
Number of running tasks
Number of sleeping tasks
Number of stopped tasks
Number of zombie tasks
Line 3:
CPU usage as a percentage by the user
CPU usage as a percentage by system
CPU usage as a percentage by low-priority processes
CPU usage as a percentage by idle processes
CPU usage as a percentage by I/O wait
CPU usage as a percentage by hardware interrupts
CPU usage as a percentage by software interrupts
CPU usage as a percentage by steal time
Line 4:
Total system memory
Free memory
Memory used
Buffer cache
Line 5:
Total swap available
Total swap free
Total swap used
Available memory
This is followed by a line for each running process. It includes:
Process ID
User
Priority
Nice level
Virtual memory used by process
Resident memory used by process
Shareable memory
CPU used by process as a percentage
Memory used by process as a percentage
CPU time used by process
Command
That’s a wealth of useful troubleshooting information. Here are some
useful ways to get at it.
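The load average on line 1 is easier to judge when it is divided by the number of CPUs, because a load of 4 means something different on 2 cores than on 16. The following is a minimal sketch of that calculation; load_per_cpu is a hypothetical helper name, and the idea that sustained values above about 1.0 per CPU point to queued work is a common rule of thumb, not a hard limit.

```shell
# load_per_cpu "LOADAVG_LINE" CPU_COUNT
# Prints the 1-, 5-, and 15-minute load averages divided by CPU_COUNT.
# LOADAVG_LINE uses the /proc/loadavg format; only the first three
# fields are read. Hypothetical helper for illustration.
load_per_cpu() {
    echo "$1" | awk -v n="$2" '{ printf "%.2f %.2f %.2f\n", $1 / n, $2 / n, $3 / n }'
}

# On a live Linux system:
#   load_per_cpu "$(cat /proc/loadavg)" "$(nproc)"
```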
To find the process consuming the most memory, sort the process list
by pressing the M key. To see which processes are using the most
CPU, press P; and to sort by accumulated CPU time, press T. To more
easily see which column you're using for sorting, press the x key to
highlight it.
You can also interactively filter top's results by pressing o or O, which
displays the following prompt:
add filter #1 (ignoring case) as: [!]FLD?VAL
The extended iostat report displays the reads, writes, read KB, and
write KB delivered to the device per second. It also shows you the
average time for the I/O in
milliseconds (await). The bigger the await number, the more likely it is
that the drive is saturated with data requests, or it has a hardware
problem. Which is it? You might use top to see if MySQL (or whatever
DBMS you're using) is keeping your server busy. If there's no
application burning the midnight oil, then chances are your drive is
turning sour.
Another important result is found under %util, which measures device
utilization. This shows how busy the device is. Values
greater than 60% indicate poor storage performance. If the value is
close to 100%, the drive is nearing saturation.
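The 60% guideline can be applied automatically to a saved report. This is a rough sketch, assuming the figures come from iostat -x style output whose header row starts with "Device" and contains a %util column; busy_devices is a hypothetical helper name, and the column is located by name because its position differs between sysstat versions.

```shell
# busy_devices [THRESHOLD]: read iostat -x style output on stdin and print
# "device %util" for each device whose %util exceeds THRESHOLD (default 60).
# Hypothetical helper for illustration.
busy_devices() {
    awk -v limit="${1:-60}" '
        # CPU summary rows are not device rows, so forget the column index.
        $1 == "avg-cpu:" { col = 0; next }
        # Header row: remember which column holds %util.
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Device rows: compare the %util field numerically with the limit.
        col && NF >= col && $col + 0 > limit { print $1, $col }
    '
}

# On a live system with sysstat installed:
#   iostat -x 1 2 | busy_devices 60
```

The output lists only the devices worth a closer look; an empty result means no device crossed the threshold in the sampled interval.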
Be careful of what you're looking at. A logical disk device fronting
multiple back-end disks with 100% utilization may just mean that some
I/O is always being processed. What matters is what's happening on
those back-end disks. So, when you're looking at a logical drive, keep
in mind that the disk utilities aren't going to give you useful
information.
The most common way to access this log data is with the command:
$ journalctl -b
This shows you all the journal entries since the most recent reboot. If
your system required a reboot, you can review what happened during the
previous boot by using the command:
$ journalctl -b -1