Storage Troubleshooting
Storage problems on production Ubuntu servers range from simple disk-full situations to hardware failures to file system corruption. The key is knowing how to quickly diagnose which problem you are dealing with, because the urgency and response differ: a full disk needs immediate cleanup, a failing disk needs immediate backup and replacement, and file system corruption needs careful offline repair.
Disk full errors
# Symptom: "No space left on device" errors
# Step 1: Identify which partition is full
df -hT
df output showing full disk
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda1      ext4   50G   50G     0 100% /    ← Full!
# Step 2: Find what's using the space
sudo du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -10   # -x: stay on the root file system, skip other mounts
# Then drill down:
sudo du -h --max-depth=1 /var | sort -rh | head -10
sudo du -h --max-depth=1 /var/log | sort -rh | head -10
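The drill-down step can be automated. This is a minimal sketch, not a standard tool. The hypothetical `largest_path` function runs `du -x --max-depth=1` over and over and descends into the biggest subdirectory each time. It assumes GNU `du` (for `--max-depth`), and the default limit of four levels is arbitrary.

```bash
# Hypothetical helper: follow the largest subdirectory downward.
# Assumes GNU du (--max-depth); -x keeps du on one file system so other
# mounts under the starting directory are not counted.
largest_path() {
  local dir="$1" levels="${2:-4}" i=0 next
  while [ "$i" -lt "$levels" ]; do
    # du prints "size<TAB>path" for each subdirectory and for $dir itself;
    # sort largest first, skip $dir's own total, keep the biggest child.
    next=$(du -x --max-depth=1 -- "$dir" 2>/dev/null \
             | sort -rn \
             | awk -F '\t' -v d="$dir" '$2 != d { print $2; exit }')
    [ -n "$next" ] || break   # no subdirectories left: stop here
    dir="$next"
    i=$((i + 1))
  done
  printf '%s\n' "$dir"
}
```

Run it from a root shell (for example after `sudo -i`) so `du` can read every directory, then call something like `largest_path / 5`. It follows only the single largest branch at each level. A sibling directory that is almost as large will not be reported, so treat the result as a starting point for the manual `du` commands.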
# Step 3: Quick wins to free space immediately
sudo apt clean # Clean apt cache
sudo apt autoremove # Remove unused packages
sudo journalctl --vacuum-size=200M # Limit journal to 200 MB
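sudo truncate -s 0 /var/log/bigfile.log   # Empty a large log in place (placeholder name)
# (Deleting a log that a process still holds open does not free its space; truncating does.)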
# Step 4: Check for inode exhaustion (different cause, same symptom)
df -i # 100% IUse% = inode exhaustion
sudo find /var/spool -xdev -type f | wc -l   # A frequent culprit: huge numbers of small files
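Both the space check and the inode check can go through one filter. This is a minimal sketch that assumes the POSIX output format of `df -P` and `df -Pi`. In both formats `Use%`/`IUse%` is column 5 and the mount point starts at column 6. `over_threshold` is a hypothetical helper name, and 90% is an arbitrary default.

```bash
# Hypothetical helper: print file systems at or above a usage threshold.
# Reads `df -P` (space) or `df -Pi` (inodes) output on stdin; in both
# formats column 5 is the percentage and column 6 onward is the mount point.
over_threshold() {
  awk -v t="${1:-90}" '
    NR == 1 { next }                       # skip the header line
    {
      pct = $5
      sub(/%$/, "", pct)                   # "100%" -> "100"
      if (pct !~ /^[0-9]+$/) next          # IUse% is "-" on some file systems
      mnt = $6
      for (i = 7; i <= NF; i++) mnt = mnt " " $i   # rejoin mount points containing spaces
      if (pct + 0 >= t + 0) print pct "%\t" mnt
    }'
}
```

`df -P | over_threshold 90` lists file systems that are at least 90% full by space. `df -Pi | over_threshold 90` does the same for inodes. The sketch does not handle a device name that itself contains spaces, because that would shift the columns.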
I/O errors and bad sectors
# Symptom: errors in /var/log/syslog or kernel messages
sudo dmesg | grep -iE "error|I/O|bad sector|scsi|ata[0-9]" | tail -20   # ata[0-9] avoids matching words like "data"
dmesg I/O error messages
[1234567.890] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[1234567.891] ata1.00: failed command: READ FPDMA QUEUED
[1234567.892] end_request: I/O error, dev sdb, sector 104857600
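When the kernel log is long, counting errors per device shows which disk to check with SMART next. This is a minimal sketch. `io_errors_by_device` is a hypothetical helper that matches the `I/O error, dev <name>` text used by the kernel's block-layer error messages, such as `end_request:` on older kernels and `blk_update_request:` on newer ones. Other drivers may word their errors differently and would be missed.

```bash
# Hypothetical helper: count kernel "I/O error, dev <name>" messages per device.
# Reads kernel log text (dmesg or journalctl -k output) on stdin.
io_errors_by_device() {
  grep -oE 'I/O error, dev [a-z0-9]+' \
    | awk '{ count[$4]++ } END { for (d in count) print d, count[d] }' \
    | sort
}
```

`sudo dmesg | io_errors_by_device` prints one `device count` pair per line. `sudo journalctl -k | io_errors_by_device` does the same for the kernel messages kept in the journal.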
# Check SMART data (disk health monitoring)
sudo apt install -y smartmontools
sudo smartctl -a /dev/sdb | grep -E "SMART|Reallocated|Pending|Uncorrectable"
# Run a SMART short test (1-2 minutes)
sudo smartctl -t short /dev/sdb
# Wait, then check results:
sudo smartctl -a /dev/sdb | grep -A5 "SMART Self-test log"
# Non-zero (especially growing) Reallocated, Pending, or Uncorrectable counts → disk is failing
# Immediately: take a backup, plan replacement
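To make the SMART check scriptable (for example from cron), the three counters grepped for above can be tested directly. This is a minimal sketch that assumes the ATA attribute table printed by `smartctl -A`, with the attribute name in column 2 and the raw value in column 10. NVMe drives report health in a different format, which this hypothetical `smart_warnings` helper does not parse.

```bash
# Hypothetical helper: flag non-zero raw values for the SMART counters that
# most often point to a failing ATA/SATA disk. Reads `smartctl -A` output on
# stdin (column 2 = ATTRIBUTE_NAME, column 10 = RAW_VALUE). Exit status is 1
# when any of these counters is non-zero, 0 otherwise.
smart_warnings() {
  awk '
    $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
      if ($10 + 0 > 0) { print $2 "=" $10; bad = 1 }
    }
    END { if (bad) exit 1 }'
}
```

For example, `sudo smartctl -A /dev/sdb | smart_warnings || echo "back up /dev/sdb"` prints any non-zero counter. The non-zero exit status is what makes it usable in scripts.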
Slow disk performance
# Diagnose slow disk with iostat
sudo apt install -y sysstat
iostat -xdm 2 5   # Extended stats in MB/s, 2-second interval, 5 reports; the first report averages since boot
iostat output showing a busy disk
Device    r/s    w/s  rMB/s  wMB/s  await  %util
sda      2.00  45.00   0.10   2.25  125ms    98%   ← 98% utilization!
sdb      5.00   1.00   2.00   0.05    8ms    12%
# Find which process is causing high disk I/O (iotop is packaged separately)
sudo apt install -y iotop
sudo iotop -o   # -o = show only processes with active I/O
# Check for a runaway process writing logs
sudo lsof -p "$(pgrep -d, myapp)" | grep "\.log"   # myapp = your process name; pgrep -d, joins multiple PIDs with commas
# Check whether a kernel thread is responsible (shown in brackets, e.g. [jbd2/sda1-8])
sudo iotop -o -k -P   # -k = kilobytes, -P = processes only
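Reading `%util` by eye works interactively. For a scripted check, a minimal sketch like the hypothetical `busy_devices` below can filter `iostat` output. It assumes that `%util` is the last column, as in the sysstat versions whose device header begins with `Device` or `Device:`. The 80% default threshold is arbitrary.

```bash
# Hypothetical helper: print devices whose %util is at or above a threshold.
# Reads `iostat -dx` output on stdin and assumes %util is the last column.
busy_devices() {
  awk -v t="${1:-80}" '
    /^Device/ { in_table = 1; next }   # header ("Device" or "Device:") opens a table
    NF == 0   { in_table = 0; next }   # a blank line closes it
    in_table && $NF + 0 >= t + 0 { print $1 "\t" $NF "%" }'
}
```

`iostat -dxy 5 1 | busy_devices 80` takes one 5-second sample and prints each device at or above 80% utilization. The `-y` flag skips the since-boot report. On SSDs and RAID arrays, which serve many requests in parallel, a high `%util` alone does not prove a bottleneck, so check `await` as well.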
File system corruption
# Symptom: "structure needs cleaning" or read-only remount
sudo dmesg | grep -iE "ext4|xfs|corrupt|remount"   # -i: kernel messages use "EXT4-fs", "XFS", "Remounting"
# If the file system was remounted read-only automatically:
# 1. Determine which device
findmnt -O ro -t ext4,xfs   # mount output says "ro", never "read-only"
# 2. Unmount it and run fsck (this example assumes /data lives on /dev/sdb1)
sudo umount /data
sudo fsck -y /dev/sdb1   # ext4; -y answers "yes" to every repair prompt
# For XFS (the file system must be unmounted; add -n for a check-only run):
sudo xfs_repair /dev/sdb1
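# For the root file system, which cannot be unmounted while the system runs from it:
# boot into recovery mode and choose its fsck option, or boot a live USB and run:
# sudo fsck -y /dev/sda1
# sudo reboot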
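Because `fsck -y` and `xfs_repair` both change the disk, a read-only check first can show what a repair would touch. This is a minimal sketch. The hypothetical helpers refuse to proceed while the device is still mounted, which they test with `findmnt --source`. They then print a no-modify command: `e2fsck`'s `-n` for ext2/3/4, or `xfs_repair -n` for XFS. Any other file system type is deliberately rejected.

```bash
# Hypothetical helpers for a read-only check before any repair.

# Print a no-modify check command for the given file system type and device.
check_command() {
  local fstype="$1" device="$2"
  case "$fstype" in
    ext2|ext3|ext4) printf 'fsck.%s -n %s\n' "$fstype" "$device" ;;  # open read-only, answer "no"
    xfs)            printf 'xfs_repair -n %s\n' "$device" ;;         # no-modify mode
    *)              echo "unsupported file system: $fstype" >&2; return 1 ;;
  esac
}

# Return non-zero if the device is currently mounted.
# findmnt exits 0 only when it finds a matching mount.
ensure_unmounted() {
  if findmnt --source "$1" >/dev/null 2>&1; then
    echo "$1 is mounted; unmount it first" >&2
    return 1
  fi
}
```

For example, once `ensure_unmounted /dev/sdb1` succeeds, `check_command "$(sudo blkid -o value -s TYPE /dev/sdb1)" /dev/sdb1` prints the dry-run command to run with `sudo`. Move on to the repairing commands above only after reviewing its report.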
LVM troubleshooting
# Symptom: LV not available after reboot
sudo lvscan   # Any LV listed as 'inactive'?
sudo vgchange -ay   # Activate all volume groups
sudo lvscan   # Should now show ACTIVE
# Symptom: "Volume group not found"
sudo vgscan # Scan for volume groups on all disks
sudo vgchange -ay
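# Symptom: Disk added but VG still shows same free space
sudo pvscan --cache   # Rescan PVs
sudo pvs              # Is the new disk listed, and in which VG?
# A brand-new disk must be initialized and added to the VG (placeholder names):
#   sudo pvcreate /dev/sdX && sudo vgextend <vg_name> /dev/sdX
# An existing PV whose disk was enlarged needs: sudo pvresize /dev/sdX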
# LVM debugging: verbose scan output
sudo vgscan -v
sudo lvscan -v
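When several LVs fail to come up, listing only the inactive ones narrows things down. This is a minimal sketch that assumes `lvscan`'s usual one-line-per-LV format: `ACTIVE` or `inactive`, followed by the quoted LV path. `inactive_lvs` is a hypothetical helper name.

```bash
# Hypothetical helper: print the paths of LVs that `lvscan` reports as inactive.
# Assumes lines like:   inactive          '/dev/vg0/data' [50.00 GiB] inherit
inactive_lvs() {
  awk -v q="'" '$1 == "inactive" { path = $2; gsub(q, "", path); print path }'
}
```

`sudo lvscan | inactive_lvs` prints paths such as `/dev/vg0/data`. `sudo lvscan | inactive_lvs | xargs -r sudo lvchange -ay` activates only those LVs instead of every volume group.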
Conclusion
For disk-full errors: run df -h, then du --max-depth=1 to drill down, and df -i when space is free but inodes are exhausted. For I/O errors: check dmesg and smartctl -a; reallocated sectors mean the disk should be replaced soon. For slow I/O: use iostat -xd to identify the device, then iotop -o to identify the process. For file system corruption: unmount and run fsck -y (ext4) or xfs_repair (XFS). For LVM issues: start with vgchange -ay to make sure volume groups are activated, then use pvs/vgs/lvs to see the current state.
FAQ
Is Storage Troubleshooting important for Ubuntu administrators?
Yes. Full disks, failing drives, and file system corruption can all take production services down, and each needs a different response, so telling them apart quickly is a core part of day-to-day server administration.
Should I practice this on a live server?
Use a lab VM first. After you understand the command output and rollback path, apply the workflow carefully on real systems.
What should I do after reading this article?
Run the practice commands, write down what each one shows, and continue to the next article in the Ubuntu roadmap.
Need help with Ubuntu administration?
Work directly with Muhammad Irfan Aslam for Ubuntu Server, Linux, cloud, Docker, DevOps, CI/CD, or infrastructure troubleshooting support.