# Troubleshooting Common Issues

<figure><img src="/files/YIlnmHblLWdl1Bou5Peb" alt=""><figcaption></figcaption></figure>

## **Cannot Reach Server**

1. **Ping the server by Hostname and IP Address:**
   * **Hostname/IP Address is pingable:**
     * The issue might be on the client side since the server is reachable.
   * **Hostname is not pingable but IP Address is pingable:**
     * Likely a DNS issue. Check:
       * `/etc/hosts`
       * `/etc/resolv.conf`
       * `/etc/nsswitch.conf`
       * **Test DNS Resolution:**
         * **Using `nslookup`, `dig`, or `host`**
   * **Neither Hostname nor IP Address is pingable:**
     * Ping another server on the same network:
       * **Reachable:** The issue is with this specific host/server.
       * **Also unreachable:** Likely a broader network issue.
     * Log in via Virtual Console (if the server is powered on):
       * Check uptime using command `uptime`.
       * Verify if the server has an IP and if the network interface is UP.
         * Run the command `ip addr`.
         * Ensure the network interface (e.g., `eth0`, `ens33`) is listed and in the "UP" state.
       * Ping the gateway and check routes.
       * Check SELinux and firewall rules.
       * Inspect physical cable connections.
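
The decision tree above can be sketched as a small shell helper. This is only a minimal sketch: the function names `diagnose_reachability` and `check_server` and the example host and address are hypothetical, and the `ping` flags assume the Linux iputils version.

```shell
# Map two ping exit codes (0 = reply received) onto the decision tree above.
diagnose_reachability() {
    local host_rc="$1" ip_rc="$2"
    if [ "$host_rc" -eq 0 ] && [ "$ip_rc" -eq 0 ]; then
        echo "reachable: investigate the client side"
    elif [ "$ip_rc" -eq 0 ]; then
        echo "dns: check /etc/hosts, /etc/resolv.conf, /etc/nsswitch.conf"
    elif [ "$host_rc" -eq 0 ]; then
        echo "mismatch: the hostname resolves to a different address"
    else
        echo "unreachable: compare with another host, then use the console"
    fi
}

# Ping by hostname and by IP address, then classify the result.
check_server() {
    local name="$1" ip="$2" host_rc=0 ip_rc=0
    ping -c 2 -W 2 "$name" >/dev/null 2>&1 || host_rc=$?
    ping -c 2 -W 2 "$ip" >/dev/null 2>&1 || ip_rc=$?
    diagnose_reachability "$host_rc" "$ip_rc"
}

# Example with a hypothetical host and address:
#   check_server app01.example.com 192.0.2.10
```

Keeping the classification separate from the `ping` calls makes the logic easy to check without network access.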

## **Cannot Reach Website or Application**

1. **Ping the server by Hostname and IP Address:**
   * **False:** Follow the troubleshooting steps in “Cannot Reach Server.”
   * **True:** Check service availability using the `telnet` command with the appropriate port:
     * **True:** The service is running.
     * **False:** The service is not reachable or running. Check:
       * Service status (using `systemctl` or equivalent commands).
       * Firewall/SELinux settings.
       * Service logs.
       * Service configuration.
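
Where `telnet` is not installed, the port check above might be approximated as follows. This sketch assumes `bash` (for its `/dev/tcp` pseudo-device) and coreutils `timeout` are available; the function name `port_open`, the host, and the `nginx` service name are hypothetical.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example with a hypothetical host and service:
#   port_open web01.example.com 443 && echo "port open" || echo "closed or filtered"
#   systemctl status nginx        # service status
#   journalctl -u nginx -n 50     # most recent service log lines
```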

## **Unable to SSH as Root or User**

1. **Ping the server by Hostname and IP Address:**
   * **False:** Follow the troubleshooting steps in “Cannot Reach Server.”
   * **True:** Check service availability using the `telnet` command with the SSH port:
     * **True:** The service is running:
       * Check if the issue is on the client side.
       * Verify:
         * User account is not disabled.
         * User has a valid shell (not `nologin`).
         * Root login is not disabled in the SSH configuration.
     * **False:** The service is not reachable or running. Check:
       * Service status (using `systemctl` or equivalent commands).
       * Firewall/SELinux settings.
       * Service logs.
       * Service configuration.
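
The account checks above could be sketched like this. The helper names and the user `alice` are hypothetical; `passwd -S` and `sshd -T` normally need root, and the exact `passwd -S` status letters vary by distribution.

```shell
# Return 0 when a login shell permits interactive logins.
shell_allows_login() {
    case "$1" in
        */nologin|*/false) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the login shell recorded for a user (field 7 of the passwd entry).
user_login_shell() {
    getent passwd "$1" | cut -d: -f7
}

# Example for a hypothetical user "alice" (the last two commands need root):
#   shell_allows_login "$(user_login_shell alice)" || echo "shell blocks login"
#   passwd -S alice                      # L or LK in the second field means locked
#   sshd -T | grep -i permitrootlogin    # effective root login setting
```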

## **Disk Space is Full or Adding/Extending Disk Space**

1. **Detect Performance Degradation:**
   * Applications are slow or unresponsive.
   * Commands fail to execute (e.g., when the `/` filesystem is full).
   * Logging and other system operations fail.
2. **Analyze the Issue:**
   * Use the `df` command to identify the problematic filesystem.
3. **Take Action:**
   * Use `du` to find large files/directories in the affected filesystem.
   * Compress or remove large files.
   * Move files to another partition or server.
   * Check disk health with `badblocks` (e.g., `badblocks -v /dev/sda`).
   * Identify I/O-bound processes using `iostat`.
   * Move large files/directories to another filesystem and leave a symbolic link at the original path.
4. **Add a New Disk:**
   * **Simple Partition:**
     * Add the disk to the VM.
     * Verify the new disk using `lsblk` or `fdisk -l` (`df` lists only mounted filesystems).
     * Use `fdisk` to create a partition (prefer the LVM layout below if the volume may need to grow later).
     * Create a filesystem, mount it, and add it to `fstab` for persistence.
   * **LVM Partition:**
     * Add the disk to the VM.
     * Verify with `lsblk` or `fdisk -l`.
     * Use `fdisk` to create an LVM partition.
     * Set up PV, VG, and LV.
     * Create a filesystem, mount it, and add it to `fstab`.
   * **Extend LVM Partition:**
     * Add a disk and create an LVM partition on it.
     * Add the new LVM partition (PV) to the existing VG.
     * Extend the LV and resize the filesystem.
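
The analysis and LVM extension steps above can be sketched as two helpers. This is an illustration under assumptions: `full_filesystems`, `lvm_extend_plan`, and the device, VG, and LV names are hypothetical, and the plan is only printed so it can be reviewed before anything is run as root.

```shell
# Print mount points at or above a usage threshold (percent).
# Reads `df -P` output on stdin; mount points containing spaces are not handled.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit + 0) print $6 }'
}

# Print (not run) the commands that add a partition to a VG and grow an LV.
lvm_extend_plan() {
    local part="$1" vg="$2" lv="$3"
    printf 'pvcreate %s\n' "$part"
    printf 'vgextend %s %s\n' "$vg" "$part"
    # -r asks lvextend to resize the filesystem together with the LV
    printf 'lvextend -r -l +100%%FREE /dev/%s/%s\n' "$vg" "$lv"
}

# Examples:
#   df -P | full_filesystems 90
#   du -xh /var 2>/dev/null | sort -rh | head -n 10   # largest directories under /var
#   lvm_extend_plan /dev/sdb1 vg_data lv_data          # hypothetical names
```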

## **Filesystem Corruption**

1. **Symptoms:**
   * The system fails to boot.
2. **Check Logs:**
   * Investigate `/var/log/messages`, `dmesg`, and other log files.
   * Look for bad sector logs.
3. **Run `fsck` if Bad Sectors are Found:**
   * Reboot the system into rescue mode (e.g., boot from CD-ROM or ISO).
   * Leave the affected filesystem unmounted; `fsck` must not run against a mounted filesystem.
   * Run `fsck` on the affected device (use `xfs_repair` for XFS).
   * Reboot the system.
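
Choosing a repair tool from the rescue shell might look like the sketch below. The helper name `repair_command` and the device names are hypothetical; it only prints the suggested command, which should be run against an unmounted device.

```shell
# Print a repair command suited to the filesystem type of a device.
repair_command() {
    local fstype="$1" dev="$2"
    case "$fstype" in
        ext2|ext3|ext4) echo "e2fsck -f $dev" ;;
        xfs)            echo "xfs_repair $dev" ;;
        *)              echo "fsck -t $fstype $dev" ;;
    esac
}

# Example for a hypothetical device:
#   fstype=$(blkid -o value -s TYPE /dev/sda2)
#   repair_command "$fstype" /dev/sda2
```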

## **Missing or Incorrect `fstab` File**

1. **Symptoms:**
   * The system fails to boot.
2. **Check Logs:**
   * Investigate `/var/log/messages`, `dmesg`, and other log files.
   * Look for mount errors caused by missing or incorrect `fstab` entries.
3. **Repair `fstab` in Rescue Mode:**
   * Reboot the system into rescue mode (e.g., boot from CD-ROM or ISO).
   * Select Option 1 to mount the original root filesystem under `/mnt/sysimage`.
   * Edit `fstab` entries or recreate the file using `blkid`.
   * Reboot the system.
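
Recreating an entry from `blkid` output could be sketched as follows. The helper name `fstab_entry`, the device, and the mount point are hypothetical, and the `defaults 0 0` fields are an assumption that may need adjusting per filesystem.

```shell
# Build an fstab line that identifies the filesystem by UUID.
fstab_entry() {
    local uuid="$1" mount_point="$2" fstype="$3"
    printf 'UUID=%s %s %s defaults 0 0\n' "$uuid" "$mount_point" "$fstype"
}

# Example from the rescue shell, for a hypothetical device mounted at /data:
#   uuid=$(blkid -o value -s UUID /dev/sdb1)
#   fstype=$(blkid -o value -s TYPE /dev/sdb1)
#   fstab_entry "$uuid" /data "$fstype" >> /mnt/sysimage/etc/fstab
#   mount -a    # after booting normally: reports entries that fail to mount
```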

## **Cannot `cd` to Directory (Even with Sudo Privileges)**

1. **Reasons and Resolutions:**
   * Directory does not exist.
   * Pathname conflict (relative vs absolute path).
   * Parent directory permission or ownership issues.
   * Missing executable permissions on the target directory.
   * Hidden directory not visible.
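
To find which component of a path blocks `cd`, the checks above can be walked one directory at a time. This is a minimal sketch: `first_problem` is a hypothetical helper that expects an absolute path, and when run as root the execute check always passes.

```shell
# Walk an absolute path from / and report the first component that is
# missing, not a directory, or not searchable by the current user.
first_problem() {
    local rest="${1#/}" current=""
    while [ -n "$rest" ]; do
        current="$current/${rest%%/*}"
        if [ ! -e "$current" ]; then echo "missing: $current"; return 1; fi
        if [ ! -d "$current" ]; then echo "not a directory: $current"; return 1; fi
        if [ ! -x "$current" ]; then echo "no execute permission: $current"; return 1; fi
        case "$rest" in
            */*) rest="${rest#*/}" ;;
            *)   rest="" ;;
        esac
    done
}

# Examples:
#   first_problem /var/lib/app/data   # hypothetical path; prints nothing when all is well
#   ls -la /var/lib                   # -a also lists hidden (dot) directories
#   namei -l /var/lib/app/data        # owner and mode of every path component
```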

## **Cannot Create Links**

1. **Reasons and Resolutions:**
   * Target directory or file does not exist.
   * Pathname conflict (relative vs absolute path) — ensure the path is complete.
   * Parent directory permission or ownership issues.
   * Target file permission or ownership issues — must have read permissions.
   * Hidden directory or file not visible.
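
A few of these checks can be performed before calling `ln`, as in the sketch below. `make_symlink` is a hypothetical helper; it expects an absolute target path so the link does not depend on the current directory.

```shell
# Create a symbolic link after checking the most common failure causes.
make_symlink() {
    local target="$1" link="$2" link_dir
    link_dir=$(dirname "$link")
    [ -e "$target" ] || { echo "target does not exist: $target" >&2; return 1; }
    [ -w "$link_dir" ] || { echo "cannot write to: $link_dir" >&2; return 1; }
    ln -s "$target" "$link"
}

# Example with hypothetical paths:
#   make_symlink /data/archive/big.log /var/log/big.log
```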

## **Running Out of Memory**

1. **Types of Memory:**
   * **Cache:** L1, L2, L3.
   * **RAM:**
     * Usage details from `free -h`:
       * **Total:** Total assigned memory.
       * **Used:** Total memory actually in use.
       * **Free:** Memory that is completely unused.
       * **Shared:** Shared memory.
       * **Buff/Cache:** Memory used by kernel buffers and the page cache.
       * **Available:** Estimate of the memory available for starting new applications without swapping.
     * Check `/proc/meminfo` for detailed metrics:
       * File active/inactive, Anon active/inactive.
   * **Swap (Virtual Memory):** Monitor and manage for system stability.
2. **Resolutions:**
   * Identify high-memory processes using `top`, `htop`, or `ps`.
   * Check logs for OOM events and review memory overcommit settings in `sysctl.conf`.
   * Kill or restart memory-hogging processes/services.
   * Use `nice` or `renice` to adjust the CPU scheduling priority of critical processes (this does not reduce memory usage).
   * Add or extend swap space.
   * Install more physical RAM.
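
As a rough illustration of the checks above, the share of available memory can be computed from `/proc/meminfo`. The helper name `available_percent` is hypothetical, and the `ps --sort` option assumes the procps version of `ps`.

```shell
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo-style text on stdin.
available_percent() {
    awk '/^MemTotal:/ { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Examples:
#   available_percent < /proc/meminfo
#   ps -eo pid,comm,rss --sort=-rss | head -n 6           # largest resident processes
#   journalctl -k | grep -i -E 'out of memory|oom'        # OOM killer events
#   sysctl vm.overcommit_memory vm.overcommit_ratio       # overcommit settings
```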

## **Add or Extend Swap Space**

1. **Steps to Add Swap Space:**
   * Create a file using `dd` to reserve disk blocks for swap.
   * Set file permissions to `600` and assign root ownership.
   * Format the file for swap with `mkswap`.
   * Enable swap using `swapon`.
   * Add the swap file to `fstab` for persistence.
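
The steps above can be sketched as a script that prints its commands by default. The `run` and `add_swapfile` helpers, the `DRY_RUN` variable, and the `/swapfile` path are assumptions made for illustration; set `DRY_RUN=0` and run as root to execute for real.

```shell
# Print each command unless DRY_RUN=0 is set (the real commands need root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create, secure, format, enable, and persist a swap file of size_mb MiB.
add_swapfile() {
    local file="$1" size_mb="$2"
    run dd if=/dev/zero of="$file" bs=1M count="$size_mb" &&
    run chown root:root "$file" &&
    run chmod 600 "$file" &&
    run mkswap "$file" &&
    run swapon "$file" &&
    echo "$file none swap sw 0 0" | run tee -a /etc/fstab
}

# Example (dry run by default):
#   add_swapfile /swapfile 2048
#   swapon --show    # verify after a real run
```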

## **Unable to Run Certain Commands**

1. **Troubleshooting and Resolutions:**
   * **Command issues:**
     * System-related commands may require root access.
     * User-defined scripts/commands might have restrictions.
   * **Steps to troubleshoot:**
     * Check permission or ownership of the command/script.
     * Ensure sudo privileges are configured.
     * Verify the absolute or relative path to the command/script.
     * Ensure the command is in the user's `$PATH` variable.
     * Confirm that the command is installed.
     * Check for missing or deleted command libraries.
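
Several of these checks can be combined into one helper, sketched below. `diagnose_command` is a hypothetical name, and the library check is skipped when `ldd` is not installed.

```shell
# Report where a command resolves, its permissions, and any missing libraries.
diagnose_command() {
    local name="$1" cmd_path
    cmd_path=$(command -v "$name") || { echo "not found in PATH: $name"; return 1; }
    echo "found: $cmd_path"
    case "$cmd_path" in
        /*) ;;
        *) return 0 ;;   # builtin, alias, or function: nothing to inspect on disk
    esac
    ls -l "$cmd_path"    # ownership and execute bits
    if command -v ldd >/dev/null 2>&1; then
        # Print only the shared libraries that cannot be resolved, if any.
        ldd "$cmd_path" 2>/dev/null | grep "not found" || true
    fi
}

# Examples:
#   diagnose_command rsync
#   echo "$PATH"    # directories searched for commands
#   sudo -l         # commands the current user may run through sudo
```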

## **System Unexpectedly Rebooting and Processes Restarting**

1. **Troubleshooting and Resolution:**
   * **System Reboot/Crash Reasons:**
     * CPU stress.
     * RAM stress.
     * Kernel fault.
     * Hardware fault.
   * **Process Restart Causes:**
     * System reboot triggers process restarts.
     * Processes might restart themselves.
     * Watchdog applications:
       * Prevent high stress on system resources.
       * Restart or terminate processes causing excessive stress.
   * **Troubleshooting Steps:**
     * After logging in, check system status using commands like:
       * `uptime`, `top`, `dmesg`, `journalctl`, `iostat -xz 1`.
     * Examine log files such as `/var/log/syslog`, `/var/log/boot.log`, `/var/log/dmesg`, and `/var/log/messages` (names vary by distribution).
     * Check custom application log paths.
     * If the server is inaccessible, use an out-of-band console (e.g., iLO, iDRAC).
     * Open a support case with the vendor if needed.
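
When reviewing logs after an unexpected reboot, a simple filter can surface the usual suspects. This is only a sketch: `crash_indicators` is a hypothetical helper, its pattern list is an assumption and far from exhaustive, and `journalctl -b -1` requires a persistent journal.

```shell
# Keep only log lines that commonly accompany crashes or forced restarts.
crash_indicators() {
    grep -i -E 'kernel panic|out of memory|oom-killer|machine check|hardware error|watchdog'
}

# Examples:
#   journalctl -k -b -1 | crash_indicators   # kernel log of the previous boot
#   last -x reboot shutdown | head           # recent reboot and shutdown records
#   uptime                                   # time since the last boot
```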

## **Unable to Get an IP Address**

1. **IP Assignment Methods:**
   * **DHCP:**
     * Fixed Allocation.
     * Dynamic Allocation.
   * **Static IP.**
2. **Troubleshooting Steps:**
   * Check network settings in the virtualization environment (e.g., VMware, VirtualBox).
   * Verify whether an IP address has been assigned.
   * Check the NIC status on the host using tools like `lspci`, `nmcli`.
   * Restart the network service.
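
Verifying whether an address has been assigned might be sketched as below. The helper name `ipv4_addresses` is hypothetical, and restarting `NetworkManager` assumes that it manages the interface.

```shell
# Print "interface address/prefix" pairs from `ip -o -4 addr show` output on stdin.
ipv4_addresses() {
    awk '{ print $2, $4 }'
}

# Examples:
#   ip -o -4 addr show | ipv4_addresses
#   nmcli device status                     # NIC state as seen by NetworkManager
#   lspci | grep -i ethernet                # confirm the NIC is detected
#   sudo systemctl restart NetworkManager   # restart the network service
```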

## **Backup and Restore File Permissions in Linux**

1. **Backup and Restore Steps:**
   * The best option is to create an ACL file for directories/files before making bulk permission changes:
     * Backup file permissions: `getfacl -R <dir> > permissions.acl`.
     * Restore file permissions: `setfacl --restore=permissions.acl`.
   * Restore using a VM snapshot (not ideal for production environments).
   * Rebuild the VM (a safer option for long-term stability).
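
The backup and restore commands above can be wrapped so that both run from the same directory, which matters because `getfacl -R` records relative paths. The helper names are hypothetical, and the backup file must be given as an absolute path.

```shell
# Save permissions and ACLs of everything under a directory.
# $1 = directory, $2 = absolute path of the backup file.
backup_perms() {
    ( cd "$1" && getfacl -R . ) > "$2"
}

# Restore them from the same directory so the relative paths match.
restore_perms() {
    ( cd "$1" && setfacl --restore="$2" )
}

# Example with hypothetical paths:
#   backup_perms /srv/app /root/app-permissions.acl
#   chmod -R ... /srv/app    # the bulk change being attempted
#   restore_perms /srv/app /root/app-permissions.acl
```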

## **Useful Tips Related to Disk Partitioning**

1. **Tips for Managing Disk Partitions:**
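   * After attaching a new disk to a VM, check for it with `lsblk`. If it does not appear, rescan the SCSI host (e.g., `host0`):
     * `echo "- - -" > /sys/class/scsi_host/host0/scan`.
   * After growing an existing disk (e.g., `sda`), rescan the device so the kernel sees the new size:
     * `echo 1 > /sys/block/sda/device/rescan`.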
   * Increasing the size of an existing disk appends additional space to the disk without affecting the existing file system or partition.
   * Recreating a filesystem on a block device overwrites the existing filesystem and destroys its data.
   * A disk with an existing partition/filesystem can be attached to another VM through its `.vmdk` file. Once mounted there, the data is unchanged.
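
When a newly attached disk does not show up, every SCSI host can be rescanned rather than guessing the host number. This is a sketch under assumptions: `scsi_scan_files` is a hypothetical helper, the optional argument exists only to point it at a different sysfs root, and writing to the scan files requires root.

```shell
# List the scan control file of every SCSI host under a sysfs root (default /sys).
scsi_scan_files() {
    local root="${1:-/sys}" f
    for f in "$root"/class/scsi_host/host*/scan; do
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Example (as root): rescan all hosts, then look for the new disk.
#   for f in $(scsi_scan_files); do echo "- - -" > "$f"; done
#   lsblk
```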


