Troubleshooting Docker

Docker problems fall into a predictable set of categories: containers that refuse to start, networking failures between containers, volume permission errors, and hosts running out of disk space from accumulated images and logs. Knowing the right diagnostic commands for each category lets you resolve most issues in minutes instead of spending hours guessing. This guide covers the most common production Docker problems, with diagnostic commands and the output to expect from each.

Container fails to start

# Step 1: Check why a container exited:
docker ps -a    # Shows all containers including exited ones
# -A10 prints enough lines after "State" to reach ExitCode and Error:
docker inspect mycontainer | grep -A10 '"State"'

docker inspect state output (abridged)

"State": {
    "Status": "exited",
    "ExitCode": 1,
    "Error": "",
    "OOMKilled": false
}

# ExitCode 1 = application error. Read the logs:
docker logs mycontainer
docker logs --tail 50 mycontainer    # Last 50 lines only

# OOMKilled = true means the kernel killed the container for exceeding its memory limit
# Fix: raise the limit (docker run --memory) or find the memory leak

# Exit code 137 = 128 + 9: killed by SIGKILL, usually the OOM killer or a manual docker kill
# Exit code 126 = entrypoint found but not executable (permission denied)
# Exit code 127 = entrypoint command not found in the container's PATH
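
The exit-code rules above can be folded into a small lookup for use in scripts. This is a minimal sketch, not a Docker feature: explain_exit_code is a hypothetical helper name, and it assumes the usual shell convention that a code above 128 means 128 plus a signal number.

```shell
# Hypothetical helper: map a container exit code to its most likely cause.
explain_exit_code() {
  exit_code="$1"
  case "$exit_code" in
    0)   echo "clean exit" ;;
    1)   echo "application error, read docker logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found in PATH" ;;
    *)
      # Assumption: codes 129..192 follow the 128 + signal-number convention.
      if [ "$exit_code" -gt 128 ] && [ "$exit_code" -le 192 ]; then
        echo "killed by signal $((exit_code - 128))"
      else
        echo "application-defined exit code"
      fi
      ;;
  esac
}
```

For a real container, the code can be read with docker inspect --format '{{.State.ExitCode}}' mycontainer and passed to the function.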

# Test a container interactively to debug startup:
docker run --rm -it --entrypoint /bin/bash myimage:latest
# Images without bash (for example Alpine-based ones) need --entrypoint /bin/sh instead
# Now you can inspect the filesystem and test the startup command manually

Container networking problems

# Symptom: container A cannot reach container B by name
# Diagnosis: check both containers are on the same network
docker inspect containerA | grep -A20 '"Networks"'
docker inspect containerB | grep -A20 '"Networks"'

# If they are on the default bridge (not a user-defined network):
# → Default bridge does NOT support DNS by container name
# Fix: create a user-defined network and reconnect containers
docker network create mynet
docker network connect mynet containerA
docker network connect mynet containerB

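# Test connectivity between containers (ping and curl must exist in containerA's image):
docker exec containerA ping -c 3 containerB    # -c 3 stops after three packets instead of running forever
docker exec containerA curl http://containerB:8080/health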
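
The "same network" check can also be scripted instead of read by eye. This is a rough sketch under stated assumptions: shared_user_networks is a hypothetical helper that takes two space-separated lists of network names and prints the ones both containers share, skipping the default bridge because it does not resolve container names.

```shell
# Hypothetical helper: print user-defined networks present in both lists.
# $1 and $2 are space-separated network names, one list per container.
shared_user_networks() {
  for net_a in $1; do
    # The default bridge offers no DNS by container name, so it does not count.
    if [ "$net_a" = "bridge" ]; then
      continue
    fi
    for net_b in $2; do
      if [ "$net_a" = "$net_b" ]; then
        echo "$net_a"
      fi
    done
  done
  return 0
}

# Example with real containers (container names as used in this section):
#   fmt='{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}'
#   shared_user_networks "$(docker inspect --format "$fmt" containerA)" \
#                        "$(docker inspect --format "$fmt" containerB)"
```

Empty output means the two containers have no user-defined network in common, which is the case the docker network connect commands above are meant to fix.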

# Symptom: container cannot reach the internet
docker exec mycontainer ping -c 3 8.8.8.8

Expected: ping 8.8.8.8 succeeds

PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=118 time=2.4 ms
# If this FAILS, check iptables rules and ip_forward:
# sudo sysctl net.ipv4.ip_forward   (should be = 1)
# sudo iptables -L DOCKER-USER -n   (look for DROP rules blocking traffic)

Volume permission errors

# Symptom: application inside container cannot write to mounted volume
# Example error in logs: "Permission denied: /data/app.log"

# Diagnosis: check UID/GID mismatch
docker exec mycontainer id
# uid=1000(appuser) gid=1000(appuser)

# Check volume directory ownership on host (/var/lib/docker is readable only by root):
sudo ls -la /var/lib/docker/volumes/myapp-data/_data/
# drwxr-xr-x 2 root root 4096 Jun 9 14:30 .
# → Owned by root but container runs as uid=1000 → permission denied

# Fix: change ownership on host to match container UID:
sudo chown -R 1000:1000 /var/lib/docker/volumes/myapp-data/_data/

# Alternatively, fix ownership at startup from an entrypoint script.
# The container must start as root and the image must include gosu.
# In entrypoint.sh:
# chown -R appuser:appuser /data && exec gosu appuser "$@"
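
The UID comparison above can be automated before deciding whether a chown is needed. This is a small sketch, assuming a Linux host with GNU or BusyBox stat (the -c '%u' form); owner_matches is a hypothetical helper name, not part of Docker.

```shell
# Hypothetical helper: succeed when a directory's numeric owner equals a UID.
# Usage: owner_matches <directory> <uid>
owner_matches() {
  target_dir="$1"
  want_uid="$2"
  # stat -c '%u' prints the numeric owner UID (Linux stat syntax).
  have_uid=$(stat -c '%u' "$target_dir") || return 2
  if [ "$have_uid" = "$want_uid" ]; then
    return 0
  fi
  echo "owner of $target_dir is uid $have_uid, expected uid $want_uid" >&2
  return 1
}
```

From a root shell on the host, it could be called as owner_matches /var/lib/docker/volumes/myapp-data/_data/ "$(docker exec mycontainer id -u)", using the volume path and container name from the example above.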

Disk and resource exhaustion

# Symptom: "no space left on device" or Docker daemon errors
# Diagnosis: check Docker disk usage:
docker system df

docker system df output

TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          47        12        18.3GB    14.1GB (77%)
Containers      8         3         2.1GB     890MB (42%)
Local Volumes   23        8         45.2GB    12.4GB (27%)
Build Cache     156       0         8.7GB     8.7GB
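
To decide where cleanup pays off, the RECLAIMABLE column can be filtered by percentage. This is a rough sketch that assumes the table layout shown above, with columns separated by two or more spaces and the percentage in parentheses; reclaimable_over is a hypothetical helper name.

```shell
# Hypothetical helper: read `docker system df` output on stdin and print
# every resource type whose reclaimable share is at least <min_percent>.
# Rows without a percentage (the header, Build Cache) are skipped.
reclaimable_over() {
  awk -v min="$1" '
    match($0, /\([0-9]+%\)/) {
      # Take the digits between "(" and "%)".
      pct = substr($0, RSTART + 1, RLENGTH - 3) + 0
      # The type name is everything before the first run of 2+ blanks.
      name = $0
      sub(/[ \t][ \t]+.*/, "", name)
      if (pct >= min) printf "%s %d%%\n", name, pct
    }'
}

# Example: docker system df | reclaimable_over 40
```

With the sample table above and a threshold of 40, this prints the Images and Containers rows, since Local Volumes is only 27% reclaimable.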

# Targeted cleanup (each command removes only resources that no container is using):
docker image prune -a      # Remove all images not used by any container (not just dangling)
docker container prune     # Remove all stopped containers
docker volume prune        # Remove unused volumes (Docker 23.0+ removes only anonymous volumes unless you add -a)
docker builder prune -a    # Clear build cache

# Nuclear option: remove EVERYTHING not currently running:
docker system prune -a --volumes
# WARNING: This removes all stopped containers plus all unused images, volumes, and networks

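⚠️ WARNING: docker system prune -a --volumes first removes every stopped container, then deletes volumes that no remaining container uses. A database whose container is merely stopped can lose its persistent data this way. Docker 23.0 and later limit the volume cleanup to anonymous volumes, while older releases also delete named volumes. Always run docker system df -v and identify what's safe to remove before using this command.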

Conclusion

The diagnostic workflow for any Docker issue: (1) docker ps -a to see container status and exit codes; (2) docker logs to see what the application printed before dying; (3) docker inspect to see the full container configuration including network, volume mounts, and resource limits; (4) docker system df to check disk usage before doing any cleanup. Running docker system prune -f on a schedule (for example weekly via cron) prevents disk exhaustion from accumulating build caches and stopped containers; the -f flag skips the confirmation prompt, which would otherwise block an unattended run.
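
The four steps can be bundled into one function for quick triage. This is a minimal sketch, assuming a single container name as the only argument; docker_triage is a hypothetical name, and the flags used are the ones covered earlier in this guide.

```shell
# Hypothetical helper: run the four diagnostic steps for one container.
# Usage: docker_triage <container-name>
docker_triage() {
  container="$1"
  echo "== 1. status and exit code =="
  docker ps -a --filter "name=$container"
  echo "== 2. last 50 log lines =="
  docker logs --tail 50 "$container" 2>&1
  echo "== 3. container state =="
  docker inspect --format '{{json .State}}' "$container"
  echo "== 4. disk usage =="
  docker system df
}
```

Calling docker_triage mycontainer prints all four sections in order, which makes the output easy to paste into a ticket or a chat thread.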

FAQ

Is Troubleshooting Docker important for Ubuntu administrators?

Yes. It supports practical Ubuntu administration because it connects directly to server reliability, security, troubleshooting, or daily operations.

Should I practice this on a live server?

Use a lab VM first. After you understand the command output and rollback path, apply the workflow carefully on real systems.

What should I do after reading this article?

Run the practice commands, write down what each one shows, and continue to the next article in the Ubuntu roadmap.

