Linux
1. Please document the real-time issues that a DevOps engineer faces with the Linux OS.
Please focus on the issues that occur most often.
2. Mini projects that freshers can do to gain expertise with the Linux OS.
For real-time issues, the format is given below. Please make it as detailed as possible so that freshers can
easily understand it.
Realtime issue: 1
Problem Statement: I am not able to make an SSH connection to a server.
Why does that occur:
Fix:
First, check whether the server is up and reachable over the network.
Ensure you have valid credentials (the right user, key, or password) to SSH into the server.
Run ssh in debug mode, i.e., ssh -vvv, to get detailed information about where the connection fails.
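The checks above can be sketched as a small helper. This is a minimal sketch, not a definitive procedure: the function name ssh_check and its arguments are hypothetical, and it assumes nc (netcat) and the OpenSSH client are installed.

```shell
# Hypothetical helper: ssh_check <host> [port]
ssh_check() {
    host=$1
    port=${2:-22}

    # Step 1: is the SSH port reachable? (-z: only probe, -w 5: 5 second timeout)
    if command -v nc >/dev/null 2>&1; then
        if ! nc -z -w 5 "$host" "$port"; then
            echo "port $port on $host is not reachable (server down, firewall, or wrong address)"
            return 1
        fi
    fi

    # Step 2: try a non-interactive login so authentication problems fail fast.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 -p "$port" "$host" true; then
        echo "ssh to $host works"
    else
        echo "ssh to $host failed; rerun with: ssh -vvv -p $port $host"
        return 1
    fi
}

# Usage (the host name is a placeholder):
#   ssh_check server.example.com
```

BatchMode=yes disables password prompts, so the helper only succeeds with key-based authentication; drop that option if the server uses passwords.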
Realtime issue: 2
Problem Statement: The system's CPU usage is consistently high, causing sluggish performance.
Why does that occur:
Pressure on any system resource or service (RAM, disk, Apache, etc.) can show up as high CPU usage.
Certain default settings or misconfigurations can lead to utilization issues (e.g., running the app with
more heap memory via JAVA_OPTS).
An application bug, such as a memory leak, can also cause it.
Under high traffic, the system can run out of resources (CPU and memory are fully occupied
and it can no longer respond).
Fix:
Check how much CPU and memory is in use (top), and check garbage collection activity for JVM applications.
Check the application logs and restart the application if required.
Realtime issue: 3
Problem Statement: The system frequently experiences out-of-memory errors or becomes
unresponsive due to excessive memory usage.
Fix:
Check how much memory is in use (top, free -h), and check garbage collection activity for JVM applications.
Check the application logs and restart the application if required.
Increase the RAM if required.
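As a rough illustration of the memory check, the sketch below reads /proc/meminfo directly instead of using top. It assumes a Linux kernel that exposes MemAvailable (3.14 or newer); the variable name mem_report is illustrative.

```shell
# Summarise memory usage from /proc/meminfo (values there are in kB).
mem_report=$(awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
        printf "total=%d MiB available=%d MiB used=%.0f%%\n",
               total / 1024, avail / 1024, (total - avail) * 100 / total
    }' /proc/meminfo)
echo "$mem_report"

# To see whether the kernel's OOM killer was involved (may need root):
#   dmesg | grep -i "out of memory"
```

A used percentage that stays high after the application is restarted suggests that the host needs more RAM rather than that one process is leaking.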
Realtime issue: 4
Problem Statement: A critical service or process is failing or not functioning as expected.
Why does that occur:
Fix:
Verify the status of the service using the 'systemctl' command.
Examine the system logs (e.g., '/var/log/syslog', '/var/log/messages') for error
messages related to the failing service.
Restart the service to see if it resolves the issue temporarily.
Check the service's configuration files for any misconfigurations or inconsistencies.
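The log examination step can be sketched as a small filter. The function name service_errors is hypothetical, and the service name in the usage lines (nginx) is only a placeholder. The log path depends on the distribution.

```shell
# Hypothetical helper: service_errors <service> <logfile>
# Prints the last 20 lines that mention the service and look like errors.
service_errors() {
    service=$1
    logfile=$2
    grep -i -- "$service" "$logfile" | grep -iE 'error|fail|fatal' | tail -n 20
}

# Usage (examples only):
#   service_errors nginx /var/log/syslog      # Debian/Ubuntu
#   service_errors nginx /var/log/messages    # RHEL/CentOS
# On systemd hosts the journal holds the same information:
#   systemctl status nginx
#   journalctl -u nginx --since "1 hour ago"
```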
Realtime issue: 5
Problem Statement: For the same file system, the df and du commands show different disk usage.
Why does that occur:
This is usually caused by deleting an open file: when someone deletes a log file that is still held
open by another process, the file name is removed but its inode and
data are not freed until that process closes the file. du no longer sees the file, while df still counts its blocks.
Fix:
With the help of 'lsof' we can find the deleted files under /var that are still open:
$ lsof /var | grep -E "^COMMAND|deleted"
To release the space, restart the owning process, or kill it by PID using the kill command.
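The behaviour can be reproduced safely with a temporary file. This is a minimal sketch that assumes a Linux /proc file system. File descriptor 3 and the variable names are arbitrary choices.

```shell
tmp=$(mktemp)
echo "some log data" > "$tmp"

exec 3< "$tmp"     # hold the file open on descriptor 3
rm "$tmp"          # the name is gone, but the inode and data remain

# The kernel marks the still-open file as deleted.
link=$(readlink "/proc/$$/fd/3")
echo "$link"

cat <&3            # the data is still readable through the open descriptor
exec 3<&-          # closing the descriptor is what actually frees the space
```

This is why restarting or stopping the process that holds the file open releases the disk space that df still reports as used.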
Realtime issue: 6
Problem Statement: The system reboots unexpectedly and processes restart.
Why does that occur:
CPU stress
RAM stress
Kernel fault
Hardware fault
Fix:
Check the system logs (e.g., /var/log/messages, dmesg) for the events leading up to the reboot.
If the hardware is faulty, replace the corresponding hardware.
If the kernel is faulty, recreate the instance.
Realtime issue: 7
Problem Statement:
yum returns an HTTP 403 (Forbidden) error when we try to install a package.
The HTTP 403 Forbidden response status code indicates that the server understands the request but
refuses to authorize it. The access is permanently forbidden and tied to the application logic, such as
insufficient rights to a resource.
Why does that occur?
The most common causes when installing a package with yum are:
Network configuration
A corrupt repo
Permission of packages
SELinux issue
Firewall rules
Fix:
Realtime issue: 8
Problem Statement:
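Can't cd into a directory, even though the user has sudo privileges.
Why does that occur:
The directory does not exist
Path confusion: a relative path was used where an absolute path was needed (or vice versa)
Restrictive permissions or ownership on a parent directory
No execute (search) permission on the target directory
The directory is hidden (its name starts with a dot), so it was overlooked or mistyped
Note that cd is a shell builtin, so "sudo cd" does not work; sudo privileges only help once you start a root shell (e.g., sudo -i)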
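Because cd needs execute (search) permission on every directory along the path, it helps to list the permissions of each component. The sketch below does that with a hypothetical helper named walk_perms. It expects an absolute path.

```shell
# Hypothetical helper: walk_perms <absolute-path>
# Prints permissions and ownership for the path and each parent up to /.
walk_perms() {
    p=$1
    while :; do
        ls -ld "$p" 2>/dev/null || echo "missing or unreadable: $p"
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
}

walk_perms /var/log
```

Any component reported as missing, or shown without an x bit for your user or group, is a likely reason the cd fails.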
Realtime issue: 9
Problem Statement:
Run an application (Java, Python) with a specific version of the runtime (e.g., java11, java17,
python2.7, python3.6).
Why does that occur?
Assume you have an application built on JDK 17 and you are trying to run it with JDK 11 (the default Java on
the machine).
Fix:
Check the default version of the software (e.g., java -version).
Use the update-alternatives command to switch to the required version.
Run with the absolute path where the software is installed (e.g., /usr/bin/java17 -jar jarpath).
If you downloaded a tar/zip bundle, set environment variables such as PATH in the
user's profile.
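For the tar/zip case, the profile change might look like the sketch below. The install location /opt/jdk-17 is an assumption made for illustration, so substitute the directory where the bundle was actually unpacked.

```shell
# Assumption: a JDK bundle was unpacked to /opt/jdk-17 (hypothetical path).
JAVA_HOME=/opt/jdk-17
export JAVA_HOME
PATH="$JAVA_HOME/bin:$PATH"
export PATH

# Show which java the shell will now pick up, if any.
command -v java || echo "no java found on PATH; check that $JAVA_HOME/bin exists"

# For packaged JDKs on Debian/Ubuntu, the system default can be switched with:
#   sudo update-alternatives --config java
```

Placing these lines in the user's ~/.profile or ~/.bashrc makes the change apply to new login sessions. Because the new directory is prepended, it takes priority over the system default.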
Realtime issue: 10
Problem Statement:
You are trying to run a script named "script.sh" on your Linux system, but you
encounter a permission denied error.
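A common cause of this error is that the script file lacks the execute bit. The sketch below reproduces the situation in a temporary directory and shows one way to resolve it. It assumes the temporary directory is not mounted with the noexec option.

```shell
workdir=$(mktemp -d)
cat > "$workdir/script.sh" <<'EOF'
#!/bin/sh
echo "hello from script.sh"
EOF

ls -l "$workdir/script.sh"        # a new file normally has no x bit
chmod u+x "$workdir/script.sh"    # grant execute permission to the owner

result=$("$workdir/script.sh")    # now the script can be run directly
echo "$result"
rm -rf "$workdir"

# Alternative that needs only read permission:
#   sh script.sh
```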
Realtime issue: 11
Problem Statement:
You are experiencing issues with log files on your Linux system. The logs are not being
generated or are not up to date, making it difficult to troubleshoot system issues.
Why it occurs?
1) Log file locations may be wrongly configured
2) The logging service (daemon) may not be running
3) The service may not have permission to write to the log file
4) Insufficient disk space
Fix:
1) Check log file locations: Linux systems typically have a centralized directory for log
files, usually located in the /var/log/ directory. Verify that the log files you expect to
see are present in this directory. Common log files include syslog, messages, auth.log,
and kern.log.
2) Verify logging services are running: Check if the logging services are running on your
system. One of the most common logging services is rsyslogd (older systems use syslogd). Use the
following command to check the status of the service:
sudo systemctl status rsyslog.service
Realtime issue: 12
Problem Statement:
Log files grow indefinitely, occupying excessive disk space and potentially impacting system
performance.
Fix
Log rotation is typically implemented with logrotate; journal size is capped through systemd-journald settings (e.g., SystemMaxUse in journald.conf).
It can be applied to any logs (web server logs, system logs, DB logs, application logs, etc.).
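A logrotate rule could look like the sketch below, which writes an example configuration to a temporary file so that it can be inspected first. The application name myapp and its log path are hypothetical. The directives themselves (daily, rotate, compress, missingok, notifempty) are standard logrotate options.

```shell
# Hypothetical application and log path; adjust to your service.
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
EOF
cat "$conf_file"

# Dry run (prints what would happen, changes nothing):
#   logrotate -d "$conf_file"
# To activate the rule, install the file as /etc/logrotate.d/myapp.
```

With these settings logrotate keeps seven compressed daily archives and skips missing or empty log files.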
Realtime issue: 13
Problem Statement:
The same script works well in one location, but when you copy it to a different location it does not
work as expected.
Why it occurs
The script may use relative paths to access other modules or files
Fix
Replace the relative paths with absolute paths so the script no longer depends on the directory it
is run from
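Where hard-coding absolute paths is not practical, one alternative is to build them from the script's own location at run time. This is a sketch of that alternative rather than the fix described above. The file name config/settings.conf is hypothetical.

```shell
# Resolve the directory that contains this script, regardless of where it is run from.
# Note: "$0" is only reliable when the script is executed, not when it is sourced.
SCRIPT_DIR=$(cd -- "$(dirname -- "$0")" && pwd)

# Build absolute paths from it instead of relying on the current directory.
CONFIG_FILE="$SCRIPT_DIR/config/settings.conf"   # hypothetical file
echo "script directory: $SCRIPT_DIR"
echo "config file:      $CONFIG_FILE"
```

With this pattern the script and its helper files can be moved together to any location without editing the paths.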
Realtime issue: 14
Problem Statement:
You have a directory named "Documents" that contains several large files. You want to
create a backup of these files in another directory named "Backup" without duplicating the
content. You also want to access and modify the backup files as if they were the original
files.
Fix
We can use soft links to solve this.
Create a soft link for each file from the "Documents" directory in the "Backup" directory.
Soft links, also known as symbolic links, are like shortcuts or pointers to another file or
directory. They reference the original file by its path, so edits made through the link change the
original, and the link breaks if the original is deleted.
ln -s /path/to/Documents/file.txt /path/to/Backup/file.txt
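To link every file rather than one, the command can be wrapped in a loop. The sketch below uses two temporary directories as stand-ins for Documents and Backup so that it can be run without touching real data.

```shell
docs=$(mktemp -d)      # stands in for /path/to/Documents
backup=$(mktemp -d)    # stands in for /path/to/Backup
echo "original content" > "$docs/file.txt"

# Create one symbolic link in the backup directory per file in the source.
for f in "$docs"/*; do
    ln -s "$f" "$backup/$(basename "$f")"
done

# Writing through the link modifies the original file.
echo "edited via the link" >> "$backup/file.txt"
content=$(cat "$docs/file.txt")
echo "$content"

ls -l "$backup"        # each entry shows "-> <original path>"
rm -rf "$docs" "$backup"
```

Because the links hold no data of their own, this saves space but does not protect against loss of the original files.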
Realtime issue: 15
Problem statement:
apt-get update and upgrade run very slowly on an Ubuntu/Debian server.
Why it occurs:
Slow updates and upgrades on an Ubuntu/Debian server are usually due to one of these
issues:
Mirrors issue
Name Servers issue
Repositories issue
Unknown issues
Fix:
To fix a slow apt-get update, check the following points:
Check that your /etc/apt/sources.list file contains the right source repositories.
Remove unwanted or unsupported source repositories.
Clean the apt-get cache.
Choose a fast DNS server.
# apt-get clean
# vi /etc/resolv.conf
The following two are Google's public DNS servers, chosen because they are widely available and
reliable. You can also use other fast, reliable DNS servers such as
Cloudflare.
nameserver 8.8.8.8
nameserver 8.8.4.4
Or the Cloudflare name servers
nameserver 1.1.1.1
nameserver 1.0.0.1
Now save and close the file.
Run apt-get update:
# apt-get update
Run an upgrade:
# apt-get upgrade
Finally, do a distribution upgrade:
# apt-get dist-upgrade
If DNS or the mirrors were the bottleneck, your download speed should now be noticeably better than before.
Realtime issue: 16
Problem statement:
You encounter a "Permission denied" error while mounting a file system.
Why it occurs
The currently logged-in user doesn't have permission to access either the remote folder or the
mount point
Fix
Make sure you have the necessary permissions to mount. Usually, you need root or
sudo privileges to mount file systems.
Check the file system permissions and ownership of the mount point directory.
Verify that the file system you are attempting to mount is supported by your Linux
distribution and that the necessary file system drivers are installed.
Realtime issue: 17
Problem statement:
You are experiencing a system slowdown due to a misbehaving application that is consuming
excessive resources. You need to identify the process causing the issue and terminate it to
restore system performance.
Fix
To identify and terminate the problematic process, I would follow these steps:
Check system resource usage: Use the top command or tools like htop to monitor
system resource usage in real-time. Look for processes that are consuming a
significant amount of CPU, memory, or other resources.
Identify the process: Note down the Process ID (PID) of the process causing the high
resource usage. The PID can be found in the leftmost column of the top or htop
output.
Investigate process details: Use the ps command with the PID to gather more
information about the process. For example, run ps -p <PID> -o pid,ppid,user,cmd
to retrieve details such as the process ID, parent process ID, user, and command.
Attempt graceful termination: First, try terminating the process gracefully using the
kill command with the SIGTERM signal. Execute kill <PID> to send the termination
signal to the process. Observe if the process terminates and if the system performance
improves.
Verify termination: Check if the process has indeed terminated by running the ps
command again. If the process is still running, proceed to the next step.
Forceful termination: If the process remains unresponsive, you can forcefully
terminate it using the kill command with the SIGKILL signal. Execute kill -9 <PID>
to send the SIGKILL signal, which will forcefully terminate the process.
Confirm termination and system performance: After terminating the process, monitor
the system again to ensure that the performance has improved and the misbehaving
process is no longer consuming excessive resources.
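The graceful-then-forceful sequence can be sketched as follows. The helper name stop_process and the five-second grace period are illustrative choices. The demo uses a harmless background sleep in place of a real misbehaving process.

```shell
# Hypothetical helper: stop_process <pid>
stop_process() {
    pid=$1
    # Ask politely first (SIGTERM is the default signal for kill).
    kill "$pid" 2>/dev/null || { echo "no such process: $pid"; return 1; }

    # Give the process up to 5 seconds to exit (kill -0 only tests for existence).
    i=0
    while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 5 ]; do
        sleep 1
        i=$((i + 1))
    done

    # Escalate only if it is still there.
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid ignored SIGTERM, sending SIGKILL"
        kill -9 "$pid"
    fi
}

# Demo with a harmless background process.
sleep 300 &
demo_pid=$!
stop_process "$demo_pid"
status=0
wait "$demo_pid" 2>/dev/null || status=$?
echo "exit status: $status"   # 143 = 128 + 15 (SIGTERM)
```

SIGKILL cannot be caught, so the process gets no chance to flush data or clean up. That is why it is kept as the last resort.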
Realtime issue: 18
Problem statement:
You are experiencing file system errors on your Linux system, causing issues with file access
and data integrity. You need to diagnose and fix these file system errors to restore normal
operation.
Fix
To diagnose and fix file system errors on Linux, follow these steps:
Identify the file system: Determine the file system type in use on the affected
partition. Run the command df -Th to list all mounted file systems and their types.
Note the file system type (e.g., ext4, XFS, Btrfs).
Unmount the partition: Before performing file system checks and repairs, it is
necessary to unmount the affected partition. Run the command sudo umount
/dev/sdX#, replacing /dev/sdX# with the appropriate device identifier and partition
number.
Check the file system: Run a file system check utility specific to the file system type.
For example:
For ext4: sudo fsck.ext4 /dev/sdX#
For XFS: sudo xfs_repair /dev/sdX#
For Btrfs: sudo btrfs check /dev/sdX#
The file system check utility will scan and identify any errors within the file system.
Fix the file system errors: Depending on the file system check utility, there may be an
option to automatically fix identified errors. For example, you can use the -y flag with
fsck.ext4 to answer yes to every repair prompt. xfs_repair has no -y flag: it repairs by default and
offers -n for a check-only run. However, exercise
caution as automated repairs may result in data loss or corruption.
If automated repairs are not possible or if you prefer manual intervention, follow the
instructions provided by the file system check utility to manually fix the identified
errors. This may involve removing or recovering corrupted files, rebuilding metadata,
or other corrective actions.
Verify file system integrity: Before remounting, run the check again to confirm that no
further errors are detected. For ext4, sudo fsck -f /dev/sdX# forces a full check even if the
file system is marked clean. Do not run fsck on a mounted file system.
Remount the partition: Once the file system errors are resolved, remount the partition
using the command sudo mount /dev/sdX# /mount/point, replacing /dev/sdX# with
the appropriate device identifier and partition number, and /mount/point with the
desired mount point.
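Before any repair, a read-only check is the safer first step. The sketch below maps a file system type to its check-only command and prints it instead of running it. The function name suggest_check is hypothetical, and /dev/sdX1 is a placeholder device.

```shell
# Hypothetical helper: suggest_check <fstype> <device>
# Prints a check-only command for the given file system type.
suggest_check() {
    fstype=$1
    device=$2
    case "$fstype" in
        ext2|ext3|ext4) echo "sudo fsck.ext4 -n $device" ;;            # -n: answer no, change nothing
        xfs)            echo "sudo xfs_repair -n $device" ;;           # -n: no-modify mode
        btrfs)          echo "sudo btrfs check --readonly $device" ;;  # read-only is also the default
        *)              echo "no known checker for $fstype"; return 1 ;;
    esac
}

suggest_check ext4 /dev/sdX1
```

The device must be unmounted before running any of the printed commands.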
Realtime issue: 19
Problem statement:
Encountering errors during package installation is a common issue in Linux.
Why it occurs
Dependency errors
Package not found
Permission errors
Conflict with existing packages
Network connection issues
Insufficient disk space
Fix
Dependency errors:
If you encounter dependency errors while installing a package, it means that the
package you're trying to install requires other packages or libraries to be installed
first. To resolve this:
Identify the missing dependencies mentioned in the error message.
Use your package manager (e.g., apt, yum, dnf) to install the missing dependencies.
Retry installing the package once the dependencies are installed.
Package not found: If you receive a "Package not found" error, it could mean that the
package name is incorrect, or the package repository is not properly configured. To
resolve this:
Double-check the package name for any typos.
Refresh your package metadata using the appropriate command for your package
manager (e.g., sudo apt update for APT, sudo yum makecache for YUM).
If the package is still not found, verify that the package repository is correctly
configured and accessible.
Permission errors: Permission errors may occur if you are trying to install packages
without sufficient privileges. To resolve this:
Ensure that you are using root or sudo privileges to install the package.
Use the sudo command before the package installation command (e.g., sudo apt
install package_name).
Conflict with existing packages: If there is a conflict between the package you are
trying to install and an already installed package, the installation may fail. To resolve
this:
Check the error message for details on the conflicting package or file.
If possible, uninstall or remove the conflicting package or file.
Retry the installation of the package.
Network connection issues: Installation errors can occur if your system is unable to
connect to the package repository due to network issues. To resolve this:
Check your network connection to ensure it is stable and functional.
If you are behind a proxy, configure the proxy settings on your system.
Try using a different mirror or repository server for package installation.
Insufficient disk space: If your system has insufficient disk space, package
installation may fail. To resolve this:
Check your available disk space using the df -h command.
Remove unnecessary files or free up disk space to make room for the package
installation.
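The disk space check can be scripted so that it runs before an installation. In this sketch the helper name free_kb and the 500 MB threshold are illustrative assumptions.

```shell
# Hypothetical helper: free_kb <path>
# Prints the free space, in kB, of the file system that holds <path>.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

need_kb=512000            # assumed requirement: roughly 500 MB
have_kb=$(free_kb /)

if [ "$have_kb" -lt "$need_kb" ]; then
    echo "only ${have_kb} kB free on /; clean up before installing"
else
    echo "${have_kb} kB free on /; enough space"
fi
```

The -P option keeps each file system on a single output line, which makes the awk column lookup reliable.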
Realtime issue: 20
Problem statement:
When trying to run an application on Linux, you get an error saying the port is already in use.
Why it occurs
If you receive an error indicating that a port is already in use, it means another process is
already listening on that port.
Fix
Identify the process using the port by running the command sudo netstat -tulnp | grep
<port_number>. Replace <port_number> with the actual port number. The -p option adds the
PID/program name column, which is not shown with -tuln alone.
Take note of the process ID (PID) listed in the output.
Terminate the process using the port by executing sudo kill <PID>, replacing <PID>
with the process ID obtained in the previous step.
Retry using the port or choose a different port if necessary.
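On systems where netstat is not installed, ss or lsof give the same information. The sketch below wraps both in a hypothetical helper named port_owner. Run it with sudo to see processes owned by other users.

```shell
# Hypothetical helper: port_owner <port>
# Shows which process is listening on the given TCP port.
port_owner() {
    port=$1
    if command -v ss >/dev/null 2>&1; then
        # -l listening, -t TCP, -n numeric, -p show owning process
        ss -ltnp "sport = :$port"
    elif command -v lsof >/dev/null 2>&1; then
        lsof -nP -iTCP:"$port" -sTCP:LISTEN
    else
        echo "neither ss nor lsof is installed"
        return 1
    fi
}

# Usage (8080 is just an example port):
#   sudo sh -c '. ./port_owner.sh; port_owner 8080'
```

Once the PID is known, prefer stopping the owning service cleanly over killing the process, since a supervised service may simply be restarted on the same port.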