Disk Full Recovery

A full disk is a critical emergency: databases cannot write transaction logs, web servers cannot write access logs (and may stop serving requests), applications cannot write temporary files, and even simple commands like sudo may fail. Recovery requires identifying the culprit quickly, freeing enough space to get services working, and then implementing monitoring to prevent recurrence. Every minute a disk is at 100% risks data corruption or complete service failure.

Immediate impact of disk full

Services affected when disk is 100% full:

  MySQL/PostgreSQL:
    → MySQL: "ERROR 1114 (HY000): The table is full"
    → PostgreSQL: writes fail with "No space left on device"
    → InnoDB: cannot write ib_logfile (redo log) → crash

  nginx/Apache:
    → Cannot write access.log → may stop serving
    → PHP cannot write session files → login failures

  System:
    → cron jobs fail silently
    → User home directories cannot receive files
    → /tmp fills → many applications break

Finding what is using disk space

# Step 1: confirm which filesystem is full:
df -h

df -h output showing full disk

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   50G     0 100% /
# /dev/sda1 is at 100%

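# Step 2: find the biggest directories (-x stays on the full filesystem, skipping /proc and other mounts):
sudo du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -10
sudo du -xh --max-depth=1 /var 2>/dev/null | sort -rh | head -10
sudo du -xah --max-depth=1 /var/log 2>/dev/null | sort -rh | head -10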

du output showing space consumers

42G /var
38G /var/log
35G /var/log/nginx
34G /var/log/nginx/access.log.1
# A single uncompressed nginx log is consuming 34 GB
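
When the directory totals point at a location but not at a specific file, listing the largest individual files can shorten the search. The following is a minimal sketch, assuming GNU find (the default on Ubuntu); the function name largest_files is hypothetical and not part of the commands above.

```shell
# largest_files DIR [COUNT]: print the COUNT biggest regular files under DIR.
# -xdev keeps find on DIR's filesystem so other mounts are not scanned.
largest_files() {
    local dir="${1:-/}" count="${2:-10}"
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null |
        sort -rn | head -n "$count" |
        awk -F'\t' '{ printf "%.1f MiB\t%s\n", $1 / 1048576, $2 }'
}

# Example: the five largest files under /var/log
# largest_files /var/log 5
```

Most of /var is readable only by root, so it is simplest to define and run the function in a root shell (sudo -i). File names containing tabs or newlines are not handled by this sketch.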

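# Alternative: ncdu (interactive disk usage explorer)
# Installing a package needs free space; if this fails, run "sudo apt clean" first
sudo apt install -y ncdu
sudo ncdu -x /    # -x stays on one filesystem; navigate with arrow keys, press 'd' to delete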

Emergency cleanup

# Safe emergency cleanup options (ordered by risk):

# 1. Clean apt cache (always safe):
sudo apt clean    # Removes downloaded .deb files
sudo apt autoremove --purge    # Removes unneeded dependencies; review the list before confirming

# 2. Remove old logs (safe if you don't need them):
sudo journalctl --vacuum-size=100M    # Keep only last 100MB of journal logs
sudo find /var/log -name "*.gz" -type f -delete    # Delete compressed old logs

# 3. Truncate a large log file (preserves the file, empties it):
sudo truncate -s 0 /var/log/nginx/access.log
# Do NOT delete log files while the process has them open — truncate is safer

# 4. Clean Docker if present:
sudo docker system prune -f    # Remove stopped containers, unused networks, dangling images, build cache (add -a for all unused images)

⚠️ WARNING: Never use rm on a log file that a running service has open. The process keeps its file descriptor, so the space is not freed: the file still consumes disk space even though it no longer appears in directory listings. Use truncate -s 0 /path/to/file, or rename the file and signal the service to reopen its logs (for nginx: sudo kill -USR1 $(cat /var/run/nginx.pid)) so the old file is closed properly.
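
The behavior described in the warning can be reproduced safely in a scratch directory. This is a sketch under stated assumptions: it needs Linux with /proc mounted, and file descriptor 3 of the current shell stands in for a running service. The file name app.log and the variable names are illustrative only.

```shell
demo_dir=$(mktemp -d)
head -c 1048576 /dev/zero > "$demo_dir/app.log"   # 1 MiB stand-in for a log file

exec 3< "$demo_dir/app.log"    # hold the file open, as a running service would
rm "$demo_dir/app.log"         # the directory entry is gone...

# ...but the data is still referenced. readlink inherits descriptor 3,
# so /proc/self/fd/3 shows where that descriptor points.
target=$(readlink /proc/self/fd/3)
echo "$target"                 # the path now ends in "(deleted)"

exec 3<&-                      # closing the descriptor is what frees the space
rmdir "$demo_dir"

# On a real server, list deleted files that processes still hold open:
# sudo lsof +L1
```

If lsof +L1 shows a large deleted file, restarting or reloading the owning service releases the space.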

# After freeing space — force log rotation immediately:
sudo logrotate -f /etc/logrotate.d/nginx
sudo logrotate -f /etc/logrotate.d/mysql-server

# Verify space is now available:
df -h

Prevention and monitoring

# Alert when disk exceeds 80%:
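sudo tee /usr/local/bin/disk-monitor.sh > /dev/null << 'EOF'
#!/bin/bash
THRESHOLD=80
ALERT_EMAIL="admin@example.com"
# -P keeps each filesystem on one line; skip pseudo filesystems and
# read-only snap images (squashfs), which always report 100%.
df -P -x tmpfs -x devtmpfs -x squashfs | tail -n +2 | while IFS= read -r line; do
    usage=$(echo "$line" | awk '{print $5}' | tr -d '%')
    mount=$(echo "$line" | awk '{print $6}')
    if [[ "$usage" -gt "$THRESHOLD" ]]; then
        # Requires a working mail command (for example from the mailutils package)
        echo "DISK ALERT: $mount is ${usage}% full on $(hostname)" |
            mail -s "[ALERT] Disk ${usage}% on $(hostname)" "$ALERT_EMAIL"
    fi
done
EOF
sudo chmod +x /usr/local/bin/disk-monitor.sh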

# Run every 15 minutes via cron:
# */15 * * * * /usr/local/bin/disk-monitor.sh
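
Before relying on the alert, it helps to check the threshold comparison against canned input instead of waiting for a disk to fill. The sketch below is one way to do that; the function name over_threshold and the sample figures are hypothetical, and it assumes the POSIX output format of df -P.

```shell
# over_threshold LIMIT: read `df -P` output on stdin and print
# "<mount> <use%>" for every filesystem above LIMIT percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit + 0) print $6, use
    }'
}

# Made-up sample in df -P format, so the logic can be checked on a healthy machine.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 52428800 44564480 7864320 85% /
/dev/sdb1 104857600 20971520 83886080 20% /data'

printf '%s\n' "$sample" | over_threshold 80    # prints "/ 85"
```

Against the live system the same function can be fed with df -P | over_threshold 80. Mount points containing spaces are not handled by this sketch.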

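# Configure log rotation with size limits:
# /etc/logrotate.d/nginx: weekly, rotate 4, compress, maxsize 100M
# (maxsize rotates early once a log passes 100M; plain "size" would make logrotate ignore "weekly")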
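
As an illustration, the settings named above could be written out as a logrotate stanza like the one below. This is a sketch, not the file shipped by the nginx package: it uses maxsize rather than size so that the weekly schedule still applies, and the delaycompress, missingok, notifempty, sharedscripts, and postrotate lines are additions assumed here. It is written to a scratch path (an arbitrary choice) so it can be reviewed before being copied to /etc/logrotate.d/nginx.

```shell
# Write the example stanza to a scratch file for review.
conf="${TMPDIR:-/tmp}/nginx-logrotate.example"
cat > "$conf" << 'EOF'
/var/log/nginx/*.log {
    weekly
    maxsize 100M
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        if [ -f /run/nginx.pid ]; then kill -USR1 "$(cat /run/nginx.pid)"; fi
    endscript
}
EOF

# Dry run: -d prints what logrotate would do without touching any logs.
# sudo logrotate -d "$conf"
```

The size limit is only evaluated when logrotate runs, which is once a day by default on Ubuntu, so a log can still grow well past 100M between runs.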

Conclusion

The three biggest disk space consumers on production servers are typically unrotated application logs (multi-GB nginx/Apache access logs), MySQL binary logs (/var/lib/mysql/mysql-bin.XXXXXX), and Docker images and layer caches. Configure logrotate for every application that writes logs, and cap binary log retention in MySQL's my.cnf: expire_logs_days = 7 on older versions, or binlog_expire_logs_seconds = 604800 on MySQL 8.0 and later, where expire_logs_days is deprecated. Alert at 80% usage so there is time to investigate before hitting 100%: a disk that goes from 80% to 100% during a traffic spike can take down a production service before anyone can react.

FAQ

Is Disk Full Recovery important for Ubuntu administrators?

Yes. A full disk can stop databases, web servers, and even sudo from working, so finding and freeing space quickly is a core reliability and troubleshooting skill.

Should I practice this on a live server?

Use a lab VM first. After you understand the command output and rollback path, apply the workflow carefully on real systems.

What should I do after reading this article?

Run the practice commands, write down what each one shows, and continue to the next article in the Ubuntu roadmap.

