Troubleshooting Common Issues
Ping the server by Hostname and IP Address:
Hostname/IP Address is pingable:
The issue might be on the client side since the server is reachable.
Hostname is not pingable but IP Address is pingable:
Likely a DNS issue. Check:
/etc/hosts
/etc/resolv.conf
/etc/nsswitch.conf
Test DNS Resolution:
Using nslookup, dig or host
Neither Hostname nor IP Address is pingable:
Check whether another server on the same network is also unreachable:
False (the other server responds): Issue is with this specific host/server.
True (the other server is also unreachable): Likely a broader network issue.
Log in via Virtual Console (if the server is powered on):
Check uptime using the uptime command.
Verify if the server has an IP and if the network interface is UP.
Run the command ip addr.
Ensure the network interface (e.g., eth0, ens33) is listed and in the "UP" state.
Ping the gateway and check routes.
Check SELinux and firewall rules.
Inspect physical cable connections.
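The decision tree above can be sketched as a small shell helper. This is only an illustration: classify_reachability and probe_server are hypothetical names, and the hostname and address in the usage line are placeholders. The branch for "hostname answers but the IP does not" is not covered by the steps above and is added here as an assumption.

```shell
# classify_reachability HOST_RC IP_RC
# Arguments are the exit codes of "ping by hostname" and "ping by IP"
# (0 means the ping succeeded). Prints which branch of the checklist applies.
classify_reachability() {
    local host_rc="$1" ip_rc="$2"
    if [ "$host_rc" -eq 0 ] && [ "$ip_rc" -eq 0 ]; then
        echo "reachable: check the client side"
    elif [ "$ip_rc" -eq 0 ]; then
        echo "dns: check /etc/hosts, /etc/resolv.conf, /etc/nsswitch.conf"
    elif [ "$host_rc" -eq 0 ]; then
        # Assumption: not part of the original checklist.
        echo "mismatch: the hostname resolves to a different address than the IP tested"
    else
        echo "unreachable: compare with another server, then use the virtual console"
    fi
}

# probe_server NAME IP
# Runs both pings (2 packets, 2 second timeout each) and prints the result.
probe_server() {
    local name="$1" ip="$2" host_rc=0 ip_rc=0
    ping -c 2 -W 2 "$name" >/dev/null 2>&1 || host_rc=$?
    ping -c 2 -W 2 "$ip" >/dev/null 2>&1 || ip_rc=$?
    classify_reachability "$host_rc" "$ip_rc"
}
```

Example call with placeholder values: probe_server web01.example.com 192.0.2.10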
Ping the server by Hostname and IP Address:
False: Follow troubleshooting steps from “Server is not reachable or unable to connect.”
True: Check service availability using the telnet command with the appropriate port:
True: The service is running.
False: The service is not reachable or running. Check:
Service status (using systemctl or equivalent commands).
Firewall/SELinux settings.
Service logs.
Service configuration.
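Where telnet is not installed, the same port test can be approximated with bash alone. The sketch below assumes bash and the coreutils timeout command are available; port_open is a hypothetical helper name, and the 3 second timeout is an arbitrary choice.

```shell
# port_open HOST PORT
# Returns 0 if a TCP connection succeeds, 1 if it fails, 2 on a bad port.
port_open() {
    local host="$1" port="$2"
    case "$port" in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 2 ;;
    esac
    # /dev/tcp/HOST/PORT is a bash redirection feature, not a real file.
    # Inside the child bash, $0 is the host and $1 is the port.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

For example, port_open localhost 80 && echo "port answers" reports whether anything accepts connections on port 80. A successful connection only shows that something is listening; the service status, logs, and configuration checks above still apply.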
Ping the server by Hostname and IP Address:
False: Follow troubleshooting steps from “Cannot Reach Server”
True: Check service availability using the telnet command with the SSH port:
True: The service is running:
Check if the issue is on the client side.
Verify:
User account is not disabled.
User has a valid shell (not nologin).
Root login is not disabled in the SSH configuration.
False: The service is not reachable or running. Check:
Service status (using systemctl or equivalent commands).
Firewall/SELinux settings.
Service logs.
Service configuration.
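The "valid shell" check can be scripted against a passwd-format file. This is a minimal sketch: login_shell_of and shell_allows_login are hypothetical helper names, and treating an empty result as a failure is a simplification (it covers both "no such user" and an empty shell field).

```shell
# login_shell_of USER [PASSWD_FILE]
# Prints the login shell recorded for USER (7th field of the passwd entry).
login_shell_of() {
    awk -F: -v u="$1" '$1 == u { print $7 }' "${2:-/etc/passwd}"
}

# shell_allows_login USER [PASSWD_FILE]
# Fails if the user is missing or the shell is nologin or false.
shell_allows_login() {
    local user_shell
    user_shell=$(login_shell_of "$@")
    case "$user_shell" in
        ''|*/nologin|*/false) return 1 ;;
        *) return 0 ;;
    esac
}
```

The other two checks usually need root: passwd -S USER shows whether the account is locked, and sshd -T prints the effective SSH configuration, including the permitrootlogin setting.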
Detect Performance Degradation:
Applications are slow or unresponsive.
Commands fail to execute (e.g., because the / filesystem is full).
Logging and other system operations fail.
Analyze the Issue:
Use the df command to identify the problematic filesystem.
Take Action:
Use du to find large files/directories in the affected filesystem.
Compress or remove large files.
Move files to another partition or server.
Check disk health with badblocks (e.g., badblocks -v /dev/sda).
Identify I/O-bound processes using iostat.
Move large files/directories to another filesystem and create a symbolic link at the original path.
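The df step can be automated with a short filter. The sketch below assumes the POSIX output format of df -P (use% in column 5, mount point in column 6); filesystems_over is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
# filesystems_over PCT
# Reads `df -P` output on stdin and prints "mountpoint use%" for every
# filesystem whose usage is at or above PCT.
filesystems_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

Typical use is df -P | filesystems_over 90, followed by something like du -xh /var 2>/dev/null | sort -rh | head -n 10 on the reported mount point to list its largest directories (sort -h is a GNU extension).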
Add a New Disk:
Simple Partition:
Add the disk to the VM.
Verify the new disk using lsblk (df lists only mounted filesystems, so it will not show a newly added disk).
Use fdisk to create a partition (preferably LVM).
Create a filesystem, mount it, and add it to fstab for persistence.
LVM Partition:
Add the disk to the VM.
Verify with lsblk (df lists only mounted filesystems).
Use fdisk to create an LVM partition.
Set up PV, VG, and LV.
Create a filesystem, mount it, and add it to fstab.
Extend LVM Partition:
Add a disk and create an LVM partition on it.
Add the new LVM partition (PV) to the existing VG.
Extend the LV and resize the filesystem.
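Because the LVM commands modify disks, the sketch below only prints the command sequence for review instead of running it. lvm_create_plan and lvm_extend_plan are hypothetical helper names, the device and volume names in the usage line are placeholders, and ext4 is an assumed filesystem type.

```shell
# lvm_create_plan PARTITION VG LV
# Prints (does not run) the commands for a new PV, VG, LV and filesystem.
lvm_create_plan() {
    local part="$1" vg="$2" lv="$3"
    echo "pvcreate $part"
    echo "vgcreate $vg $part"
    echo "lvcreate -n $lv -l 100%FREE $vg"
    echo "mkfs.ext4 /dev/$vg/$lv"      # ext4 is an assumption
}

# lvm_extend_plan PARTITION VG LV
# Prints (does not run) the commands that add PARTITION to an existing VG
# and grow LV into the new space.
lvm_extend_plan() {
    local part="$1" vg="$2" lv="$3"
    echo "pvcreate $part"
    echo "vgextend $vg $part"
    # -r asks lvextend to resize the filesystem together with the LV
    echo "lvextend -r -l +100%FREE /dev/$vg/$lv"
}
```

For example, lvm_extend_plan /dev/sdb1 vg_data lv_data prints three commands that can be checked and then run as root.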
Symptoms:
The system fails to boot.
Check Logs:
Investigate /var/log/messages, dmesg, and other log files.
Look for bad sector logs.
Run fsck if Bad Sectors are Found:
Reboot the system into rescue mode (e.g., boot from CD-ROM or ISO).
Select Option 1 to mount the original root filesystem under /mnt/sysimage.
Edit fstab entries or recreate the file using blkid.
Reboot the system.
fstab File
Symptoms:
The system fails to boot.
Check Logs:
Investigate /var/log/messages, dmesg, and other log files.
Look for bad sector logs.
Run fsck if Bad Sectors are Found:
Reboot the system into rescue mode (e.g., boot from CD-ROM or ISO).
Select Option 1 to mount the original root filesystem under /mnt/sysimage.
Edit fstab entries or recreate the file using blkid.
Reboot the system.
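When recreating fstab entries from blkid output, each line follows the same six-field layout. The helper below is a sketch: fstab_line is a hypothetical name, and the defaults mount option and the fsck pass numbers (1 for /, 2 for everything else) are common conventions rather than values taken from this guide.

```shell
# fstab_line UUID MOUNTPOINT FSTYPE
# Prints one fstab entry: device, mount point, type, options, dump, fsck pass.
fstab_line() {
    local uuid="$1" mountpoint="$2" fstype="$3" pass=2
    if [ "$mountpoint" = "/" ]; then
        pass=1                         # root filesystem is checked first
    fi
    printf 'UUID=%s\t%s\t%s\tdefaults\t0\t%s\n' "$uuid" "$mountpoint" "$fstype" "$pass"
}
```

The UUID of a partition can be read with blkid -s UUID -o value /dev/sda1 (the device name is a placeholder). In rescue mode the file to edit is /mnt/sysimage/etc/fstab, not the rescue environment's own /etc/fstab.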
cd to Directory (Even with Sudo Privileges)
Reasons and Resolutions:
Directory does not exist.
Pathname conflict (relative vs absolute path).
Parent directory permission or ownership issues.
Missing executable permissions on the target directory.
Hidden directory not visible.
Reasons and Resolutions:
Target directory or file does not exist.
Pathname conflict (relative vs absolute path) — ensure the path is complete.
Parent directory permission or ownership issues.
Target file permission or ownership issues — must have read permissions.
Hidden directory or file not visible.
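Most of the causes above can be narrowed down by walking the path one component at a time. The sketch below reports the first component that is missing or blocks access for the current user; first_blocked_component is a hypothetical helper name, and path components containing glob characters are not handled.

```shell
# first_blocked_component PATH
# Walks PATH from the root and prints the first component that is missing,
# is a directory without execute (search) permission, or, for the final
# component, is not readable. Prints "ok: PATH" if nothing blocks access.
first_blocked_component() {
    local path="$1" current="" part
    case "$path" in
        /*) ;;
        *) path="$PWD/$path" ;;        # make relative paths absolute
    esac
    local IFS=/
    for part in $path; do
        [ -n "$part" ] || continue     # skip empty fields from leading or double slashes
        current="$current/$part"
        if [ ! -e "$current" ]; then
            echo "missing: $current"
            return 1
        fi
        if [ -d "$current" ] && [ ! -x "$current" ]; then
            echo "no execute (search) permission: $current"
            return 1
        fi
    done
    [ -n "$current" ] || current=/
    if [ ! -r "$current" ]; then
        echo "no read permission: $current"
        return 1
    fi
    echo "ok: $current"
}
```

Run it as the affected user, since permission tests reflect whoever executes the script (root passes most of them). On systems with util-linux, namei -l PATH prints the owner and mode of every component and is a useful cross-check; ls -la shows hidden entries.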
Types of Memory:
Cache: L1, L2, L3.
RAM:
Usage details from free -h:
Total: Total assigned memory.
Used: Total memory actually in use.
Free: Memory not used for anything at all.
Shared: Shared memory.
Buff/Cache: Pages cached in memory.
Available: Estimate of memory available for starting new applications without swapping (free memory plus reclaimable cache).
Check /proc/meminfo for detailed metrics:
File active/inactive, Anon active/inactive.
Swap (Virtual Memory): Monitor and manage for system stability.
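The Available figure is usually the one to watch. As a rough sketch, the helper below computes it as a percentage of total memory from a meminfo-format file; mem_available_pct is a hypothetical name.

```shell
# mem_available_pct [FILE]
# Prints MemAvailable as a whole-number percentage of MemTotal.
# FILE defaults to /proc/meminfo; both fields are reported in kB there.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             if (total > 0) printf "%d\n", avail * 100 / total
             else exit 1
         }' "${1:-/proc/meminfo}"
}
```

A low value together with growing swap usage suggests real memory pressure, whereas a low Free value on its own is normal because the kernel uses idle memory for cache.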
Resolutions:
Identify high-memory processes using top, htop, or ps.
Check logs for OOM events and review memory overcommit settings in sysctl.conf.
Kill or restart memory-hogging processes/services.
Use nice (or renice for running processes) to prioritize critical processes; note that this adjusts CPU scheduling priority, not memory usage.
Add or extend swap space.
Install more physical RAM.
Steps to Add Swap Space:
Create a file using dd to reserve disk blocks for swap.
Set file permissions to 600 and assign root ownership.
Format the file for swap with mkswap.
Enable swap using swapon.
Add the swap file to fstab for persistence.
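The five steps map onto one command each. The sketch below prints the commands instead of running them, so they can be reviewed first; swapfile_plan is a hypothetical helper name, and the path and size in the usage line are placeholders.

```shell
# swapfile_plan FILE SIZE_MB
# Prints (does not run) the commands that create and enable a swap file.
swapfile_plan() {
    local file="$1" size_mb="$2"
    echo "dd if=/dev/zero of=$file bs=1M count=$size_mb"   # reserve the blocks
    echo "chown root:root $file"
    echo "chmod 600 $file"
    echo "mkswap $file"                                    # format as swap
    echo "swapon $file"                                    # enable immediately
    echo "echo '$file none swap sw 0 0' >> /etc/fstab"     # persist across reboots
}
```

For example, swapfile_plan /swapfile 2048 prints the sequence for a 2 GiB swap file. After running the commands as root, swapon --show or free -h confirms that the new swap space is active.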
Troubleshooting and Resolutions:
Command issues:
System-related commands may require root access.
User-defined scripts/commands might have restrictions.
Steps to troubleshoot:
Check permission or ownership of the command/script.
Ensure sudo privileges are configured.
Verify the absolute or relative path to the command/script.
Ensure the command is in the user's $PATH variable.
Confirm that the command is installed.
Check for missing or deleted command libraries.
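Several of these checks can be chained in one helper. This is a sketch under the assumption that ldd is available for the library check; diagnose_command is a hypothetical name.

```shell
# diagnose_command NAME
# Reports whether NAME resolves through $PATH, whether the resolved file is
# executable by the current user, and whether any shared library is missing.
diagnose_command() {
    local name="$1" path
    path=$(command -v "$name") || { echo "not found in PATH: $name"; return 1; }
    echo "resolved: $path"
    case "$path" in
        /*) ;;
        *) return 0 ;;                 # builtin, alias, or shell function
    esac
    [ -x "$path" ] || { echo "not executable by $(id -un): $path"; return 1; }
    if command -v ldd >/dev/null 2>&1 && ldd "$path" 2>/dev/null | grep -q 'not found'; then
        echo "missing shared libraries:"
        ldd "$path" | grep 'not found'
        return 1
    fi
}
```

Running diagnose_command as the affected user matters, because both $PATH and the permission test depend on who executes it. sudo -l lists what that user is allowed to run through sudo.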
Troubleshooting and Resolution:
System Reboot/Crash Reasons:
CPU stress.
RAM stress.
Kernel fault.
Hardware fault.
Process Restart Causes:
System reboot triggers process restarts.
Processes might restart themselves.
Watchdog applications:
Prevent high stress on system resources.
Restart or terminate processes causing excessive stress.
Troubleshooting Steps:
After logging in, check system status using commands like:
uptime, top, dmesg, journalctl, iostat -xz 1.
Examine log files under /var/log: syslog, boot.log, dmesg, messages (which of these exist depends on the distribution).
Check custom application log paths.
If the server is inaccessible, use a virtual console (e.g., iLO, iDRAC).
Open a support case with the vendor if needed.
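When reading through kernel and system logs, a first pass can filter for messages that often accompany the causes listed above. The pattern list in this sketch is an assumption and far from complete; crash_hints is a hypothetical helper name.

```shell
# crash_hints
# Reads log text on stdin and prints lines that commonly point to memory
# exhaustion, kernel faults, hardware faults, or watchdog activity.
crash_hints() {
    grep -Ei 'out of memory|oom-kill|kernel panic|hardware error|machine check|soft lockup|watchdog'
}
```

Possible inputs include dmesg | crash_hints for the current boot and journalctl -k -b -1 | crash_hints for the kernel messages of the previous boot (this needs a persistent journal). last -x reboot shutdown lists recorded reboots and shutdowns, which helps to tell a clean restart from a crash.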
IP Assignment Methods:
DHCP:
Fixed Allocation.
Dynamic Allocation.
Static IP.
Troubleshooting Steps:
Check network settings in the virtualization environment (e.g., VMware, VirtualBox).
Verify whether an IP address has been assigned.
Check the NIC status on the host using tools like lspci, nmcli.
Restart the network service.
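Whether an address has been assigned can be read from the one-line output of the ip command. The filter below is a sketch that assumes the field layout of ip -o -4 addr show (interface name in column 2, the word inet in column 3, address in column 4); ipv4_by_interface is a hypothetical helper name.

```shell
# ipv4_by_interface
# Reads `ip -o -4 addr show` output on stdin and prints
# "interface address/prefix" for every IPv4 address found.
ipv4_by_interface() {
    awk '$3 == "inet" { print $2, $4 }'
}
```

Use it as ip -o -4 addr show | ipv4_by_interface. An interface that is missing from the output has no IPv4 address. ip -br link and nmcli device status show the link state, and on systems managed by NetworkManager the network service is restarted with systemctl restart NetworkManager.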
Backup and Restore Steps:
The best option is to create an ACL file for directories/files before making bulk permission changes:
Backup file permissions: getfacl -R <dir> > permissions.acl.
Restore file permissions: setfacl --restore=permissions.acl.
Restore using a VM snapshot (not ideal for production environments).
Rebuild the VM (a safer option for long-term stability).
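One detail of the ACL approach is easy to miss: getfacl records path names as given (stripping a leading /), and setfacl --restore resolves them relative to the current directory. The sketch below sidesteps this by always working from inside the target directory. backup_acls and restore_acls are hypothetical helper names, and the acl package is assumed to be installed.

```shell
# backup_acls DIR OUTFILE
# Saves owner, group, permissions and ACLs of everything under DIR,
# with paths recorded relative to DIR.
backup_acls() {
    local dir="$1" out="$2"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    command -v getfacl >/dev/null 2>&1 || { echo "getfacl is not installed" >&2; return 1; }
    (cd "$dir" && getfacl -R .) > "$out"
}

# restore_acls DIR ACL_FILE
# Re-applies a backup made by backup_acls to the same directory.
restore_acls() {
    local dir="$1" acl_file="$2"
    [ -f "$acl_file" ] || { echo "no such backup file: $acl_file" >&2; return 1; }
    case "$acl_file" in
        /*) ;;
        *) acl_file="$PWD/$acl_file" ;;   # keep the path valid after cd
    esac
    (cd "$dir" && setfacl --restore="$acl_file")
}
```

A typical sequence is backup_acls /srv/app /root/app.acl before the bulk change and restore_acls /srv/app /root/app.acl if it goes wrong (both paths are placeholders). Keep the backup file outside the directory being changed.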
Tips for Managing Disk Partitions:
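After attaching a new disk to a VM, use lsblk to check whether it is visible. If it is not, rescan the SCSI host (the host number varies) using:
echo "- - -" > /sys/class/scsi_host/host0/scan
After increasing the size of an existing disk (e.g., sda), make the kernel re-read its size using:
echo 1 > /sys/block/sda/device/rescan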
Increasing the size of an existing disk appends additional space to the disk without affecting the existing file system or partition.
Creating a new filesystem on a block device overwrites the existing one and destroys its data.
A disk with an existing partition/filesystem can be reused by attaching its .vmdk file to another VM. After mounting, the data is unchanged.