| name | docker-troubleshooting |
|---|---|
| description | Diagnose and fix common Docker problems including port conflicts, permission errors, disk space, networking issues, and build failures. |
| standards-version | 1.7.0 |
Use this skill when:
- A container fails to start or exits immediately
- Port binding errors occur (address already in use)
- Permission denied errors on the Docker socket or inside containers
- Docker runs out of disk space
- Image pulls fail with auth or network errors
- DNS resolution fails inside containers
- Build context is unexpectedly large or builds fail
- Docker Compose services fail dependency checks
- The Docker daemon won't start or is unresponsive
- Containers can't communicate across networks

Gather this information first:
- The error message or a description of the symptom
- `docker logs` output from the failing container (if applicable)
- OS and Docker version (`docker version`)
- Whether you are running Docker Desktop, Docker Engine, or rootless mode
- The Compose file, if using Docker Compose

Workflow:
1. Identify the symptom - categorize the problem (startup failure, networking, disk, permissions, build).
2. Gather diagnostics - collect logs, inspect containers, check system state.
3. Isolate the cause - use targeted commands to narrow down the root cause.
4. Apply the fix - make the minimal change needed to resolve the issue.
5. Verify the fix - confirm the container or build works as expected.
6. Prevent recurrence - add monitoring, a `.dockerignore`, health checks, or resource limits.

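
The "Verify the fix" step can be partly automated when the image defines a `HEALTHCHECK`. This is a minimal sketch, not a complete verification. `wait_for_healthy` is a hypothetical helper, and the one-second polling interval and 60-second default timeout are arbitrary choices. It also treats the first `unhealthy` report as a failure, even though a container can later recover.

```bash
# Hypothetical helper: poll until healthy, unhealthy, or timeout.
# Exit status: 0 healthy, 1 unhealthy or timed out, 2 no HEALTHCHECK / inspect failed.
wait_for_healthy() {
  local name="$1" timeout="${2:-60}" elapsed=0 status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # .State.Health only exists when the image or run command defines a HEALTHCHECK
    status=$(docker inspect --format \
      '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$name") || return 2
    case "$status" in
      healthy)   echo "$name is healthy"; return 0 ;;
      unhealthy) echo "$name is unhealthy"; return 1 ;;
      none)      echo "$name has no HEALTHCHECK"; return 2 ;;
    esac
    sleep 1  # still "starting"; poll again
    elapsed=$((elapsed + 1))
  done
  echo "$name not healthy after ${timeout}s (last status: $status)"
  return 1
}
```

For example, run `wait_for_healthy my-container 120` after restarting `my-container` with the fix applied.
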
User: I get "port is already allocated" when starting a container.
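
Agent: Something else is using that port. Find the process and stop it:

```bash
# Find what's using port 3000 (Linux/macOS; ss may need sudo to show process names)
lsof -i :3000
# or
ss -tlnp | grep ':3000'
# Then stop the process by PID
kill <pid>
```

```powershell
# On Windows PowerShell
netstat -ano | findstr :3000
# Then kill the process using the PID from the last column
taskkill /PID <pid> /F
```

Or check whether another Docker container, running or stopped, has the port published:

```bash
# List all containers, including stopped ones, that publish port 3000
docker ps -a --filter "publish=3000"
# Stop and remove the container holding the port (-f also stops a running one)
docker rm -f <container_id>
```

If you just need a different host port, remap it (host port 3001 forwards to container port 3000):

```bash
docker run -d -p 3001:3000 my-app:latest
```

User: I get "permission denied" trying to run Docker commands.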

Agent: This usually means your user isn't in the `docker` group:

```bash
# Add your user to the docker group
sudo usermod -aG docker $USER
# Apply the change (log out and back in, or start a new shell with newgrp)
newgrp docker
# Verify
docker ps
```

Membership in the `docker` group is equivalent to root access on the host, so grant it deliberately.

If you get permission denied on files inside the container, the cause is usually a UID mismatch between the host user and the container user:

```bash
# Check what user the container runs as (empty output means root)
docker inspect --format '{{.Config.User}}' my-container
# Fix ownership on the bind-mounted directory (1000:1000 is the container user's UID:GID here)
sudo chown -R 1000:1000 ./app-data
# Or run the container as your host UID/GID
docker run -u "$(id -u):$(id -g)" -v "$(pwd)/app-data:/data" my-app:latest
```

User: Docker is eating all my disk space.

Agent: Check what's consuming space, then prune:

```bash
# See disk usage breakdown
docker system df
docker system df -v # verbose, per-image/container/volume
# Remove stopped containers, dangling images, unused networks, and dangling build cache
docker system prune
# Nuclear option - also remove all unused images and unused volumes
# (Docker Engine 23.0+ prunes only anonymous volumes here)
docker system prune -a --volumes
```

For targeted cleanup:

```bash
# Remove dangling images only
docker image prune
# Remove all unused images (not just dangling)
docker image prune -a
# Remove stopped containers
docker container prune
# Remove unused volumes (careful - data loss; anonymous only on Docker Engine 23.0+)
docker volume prune
# Also remove unused named volumes (Docker Engine 23.0+)
docker volume prune -a
# Remove dangling build cache
docker builder prune
docker builder prune -a # remove all build cache
```

Set up automatic cleanup in cron or a scheduled task:

```
# crontab entry: every Sunday at 03:00, remove unused images created more than 7 days ago
0 3 * * 0 docker image prune -a --filter "until=168h" -f
```

User: My container starts and immediately exits.

Agent: Check the exit code and logs:

```bash
# See the status, including the exit code
docker ps -a --filter "name=my-container" --format "{{.Status}}"
# Or read the exit code directly
docker inspect --format '{{.State.ExitCode}}' my-container
# Get the logs
docker logs my-container
# If the container was removed, run it interactively to see the error
docker run -it my-app:latest
```

Common exit codes:
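- Exit 0 - the process finished successfully. Your CMD probably isn't a long-running process. Fix the entrypoint, or temporarily use `tail -f /dev/null` as the command to keep the container alive for debugging.
- Exit 1 - application error. Check the logs.
- Exit 127 - command not found. The binary in CMD/ENTRYPOINT doesn't exist in the image or isn't on `PATH`.
- Exit 137 - killed by SIGKILL (128 + 9). This is usually the OOM killer (confirm with `State.OOMKilled`): increase the memory limit or fix a memory leak. `docker kill` and a timed-out `docker stop` also produce 137.
- Exit 139 - segfault (SIGSEGV, 128 + 11). A native binary crashed.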
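
The same categories can be encoded in a small lookup, which helps when scripting triage. This is a sketch: `explain_exit_code` is a hypothetical helper. Codes 126 and 143 follow the general shell convention and are not part of the list above. Any other code above 128 is read as "killed by signal (code - 128)".

```bash
# Hypothetical helper: map a container exit code to a likely cause.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "0: main process exited normally; CMD is probably not long-running" ;;
    1)   echo "1: application error; check docker logs" ;;
    126) echo "126: command found but not executable; check file permissions" ;;
    127) echo "127: command not found; binary in CMD/ENTRYPOINT is missing" ;;
    137) echo "137: SIGKILL; check State.OOMKilled (also docker kill or a stop timeout)" ;;
    139) echo "139: SIGSEGV; a native binary crashed" ;;
    143) echo "143: SIGTERM; stopped by docker stop or an orchestrator" ;;
    *)
      # 128 + N conventionally means "terminated by signal N"
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "$code: killed by signal $((code - 128))"
      else
        echo "$code: application-specific; check the logs"
      fi
      ;;
  esac
}
```

For example: `explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' my-container)"`.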

To debug, override the entrypoint and inspect the image from a shell:

```bash
# Debug by overriding the entrypoint
docker run -it --entrypoint /bin/sh my-app:latest
# Inside the container: check that the binary exists
command -v node # or whatever your CMD uses
# Check that config files are present
ls -la /app/
```

User: My build context is huge and builds are slow.

Agent: Docker sends the entire build directory (the build context) to the builder. Add a `.dockerignore` to exclude files the build doesn't need:

```
# .dockerignore
node_modules
.git
.env
*.log
dist
coverage
.next
__pycache__
*.pyc
.venv
```
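
Here is a rough sketch for checking an existing `.dockerignore` against a list like this one. `missing_dockerignore_entries` is a hypothetical helper. It compares whole lines literally, so an equivalent pattern written differently (for example `**/node_modules`) is reported as missing.

```bash
# Hypothetical helper: print each suggested pattern that is not an exact line in FILE.
missing_dockerignore_entries() {
  local file="$1" entry
  shift
  for entry in "$@"; do
    # -x: match the whole line, -F: treat the pattern literally, -q: no output
    if [ ! -f "$file" ] || ! grep -qxF -- "$entry" "$file"; then
      echo "$entry"
    fi
  done
}
```

`missing_dockerignore_entries .dockerignore node_modules .git .env` prints only the entries that are not already listed.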
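
Check the context size:

```bash
# BuildKit (the default builder since Docker Engine 23.0) reports it as "transferring context: ..."
docker build --progress=plain -t test . 2>&1 | grep -i "transferring context"
# The legacy builder prints "Sending build context to Docker daemon 1.2GB" instead;
# a size like that means trouble
# Find large files and directories, including hidden ones such as .git
du -sh .[!.]* * 2>/dev/null | sort -rh | head -20
```

User: DNS doesn't work inside my containers.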

Agent: Test DNS resolution inside the container:

```bash
docker run --rm alpine nslookup google.com
docker run --rm alpine ping -c 2 google.com
```

If it fails, try these fixes in order:

```bash
# 1. Use Google DNS explicitly
docker run --dns 8.8.8.8 --rm alpine nslookup google.com
# 2. Check which resolvers the container actually gets
docker run --rm alpine cat /etc/resolv.conf
```

If an explicit DNS server works, configure it for the daemon permanently in `/etc/docker/daemon.json`:

```json
{
  "dns": ["8.8.8.8", "8.8.4.4"]
}
```

Then restart Docker with `sudo systemctl restart docker`.

On Docker Desktop for Windows/macOS, DNS issues can be caused by VPN software. Try restarting Docker Desktop or switching the network mode.
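
The first two checks can be combined to tell a DNS misconfiguration apart from a general connectivity failure. This sketch assumes the `alpine` image is available. `check_container_dns` is a hypothetical helper, and `google.com` and `8.8.8.8` are just the example host and resolver used above.

```bash
# Hypothetical helper: compare default container DNS with an explicit resolver.
check_container_dns() {
  local host="${1:-google.com}" resolver="${2:-8.8.8.8}"
  if docker run --rm alpine nslookup "$host" >/dev/null 2>&1; then
    echo "default DNS ok"
  elif docker run --rm --dns "$resolver" alpine nslookup "$host" >/dev/null 2>&1; then
    # Only the daemon-provided resolver fails: daemon.json or VPN DNS is the likely cause
    echo "default DNS fails, $resolver works: fix daemon or VPN DNS settings"
  else
    echo "both fail: check network connectivity, firewall, or VPN routing"
  fi
}
```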

| Tool | Purpose |
|---|---|
| `docker_containerLogs` | Get logs from a failing container without shell access |
| `docker_inspectContainer` | Check exit code, state, config, mounts, and network settings |
| `docker_listContainers` | Find stopped/crashed containers and their statuses |
| `docker_inspectImage` | Verify entrypoint, cmd, user, and exposed ports |
| `docker_listNetworks` | Check network configuration when containers can't communicate |
| `docker_diskUsage` | Identify what's consuming disk space |
| `docker_systemInfo` | Check daemon status, driver, and runtime configuration |
| `docker_listVolumes` | Find orphaned volumes consuming disk space |
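
If these tools are unavailable, you can get roughly equivalent information from the Docker CLI. The mapping below is an approximation, written as a hypothetical `cli_equivalent` helper. The tools may return more data, or data in a different shape.

```bash
# Hypothetical helper: print an approximate CLI fallback for each tool in the table.
cli_equivalent() {
  case "$1" in
    docker_containerLogs)    echo "docker logs --tail 100 <container>" ;;
    docker_inspectContainer) echo "docker inspect <container>" ;;
    docker_listContainers)   echo "docker ps -a" ;;
    docker_inspectImage)     echo "docker image inspect <image>" ;;
    docker_listNetworks)     echo "docker network ls" ;;
    docker_diskUsage)        echo "docker system df -v" ;;
    docker_systemInfo)       echo "docker info" ;;
    docker_listVolumes)      echo "docker volume ls" ;;
    *) echo "unknown tool: $1" >&2; return 1 ;;
  esac
}
```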

Diagnosing a crashed container:
1. `docker_listContainers` (all=true) - find the container and its status
2. `docker_containerLogs` - read the error output
3. `docker_inspectContainer` - check:
   - `State.ExitCode` (137 = SIGKILL, often OOM; 127 = missing binary; 1 = app error)
   - `State.OOMKilled` (true if killed for running out of memory)
   - `HostConfig.Memory` (check whether the limit is too low; 0 means no limit)
   - `Config.Cmd` and `Config.Entrypoint` (verify the command is correct)
   - `Mounts` (verify bind-mount sources exist on the host)
4. `docker_inspectImage` - verify the image has the expected CMD
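
Without the tools, steps 2 and 3 map onto `docker inspect` and `docker logs`. This is a sketch: `diagnose_crash` is a hypothetical helper, and it covers only the exit code, the OOM flag, the memory limit, and recent logs.

```bash
# Hypothetical helper: summarize why a container stopped (exit code, OOM, recent logs).
diagnose_crash() {
  local name="$1" fields="" code="" oom="" mem=""
  fields=$(docker inspect --format \
    '{{.State.ExitCode}}|{{.State.OOMKilled}}|{{.HostConfig.Memory}}' "$name") || return 1
  # Split the three '|'-separated fields
  IFS='|' read -r code oom mem <<EOF
$fields
EOF
  echo "exit code: $code"
  if [ "$oom" = "true" ]; then
    echo "OOM-killed (memory limit: $mem bytes; 0 means no limit)"
  fi
  echo "--- last 20 log lines ---"
  docker logs --tail 20 "$name" 2>&1
}
```

Usage: `diagnose_crash my-container`.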
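
Diagnosing networking issues:
1. `docker_listNetworks` - check which networks exist
2. `docker_inspectContainer` on both containers - verify:
   - `NetworkSettings.Networks` (are they attached to a common network?)
   - `NetworkSettings.Networks.<network>.IPAddress` (each container's address on that network; the top-level `NetworkSettings.IPAddress` only covers the default bridge)
3. `docker_systemInfo` - check that the daemon reports the expected network drivers and plugins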
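
Step 2 can be checked from the CLI by listing each container's networks and intersecting the two lists. This is a sketch; `container_networks` and `shared_networks` are hypothetical helpers. An empty result means the two containers share no network, so they cannot reach each other by name.

```bash
# Hypothetical helpers for checking whether two containers share a network.
container_networks() {
  # One network name per line, taken from the NetworkSettings.Networks map
  docker inspect --format \
    '{{range $net, $cfg := .NetworkSettings.Networks}}{{println $net}}{{end}}' "$1"
}

shared_networks() {
  local a b
  a=$(container_networks "$1" | sort -u)
  b=$(container_networks "$2" | sort -u)
  # Print each network of the first container that also appears for the second
  printf '%s\n' "$a" | while IFS= read -r net; do
    if [ -n "$net" ] && printf '%s\n' "$b" | grep -qxF -- "$net"; then
      echo "$net"
    fi
  done
}
```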

Common pitfalls:
- Checking running containers only - `docker ps` hides stopped containers. Always use `docker ps -a` or pass `all=true` to `docker_listContainers`.
- Ignoring the exit code - the exit code tells you the category of failure. Always check it before diving into logs.
- Pruning volumes without checking - `docker volume prune` deletes data permanently. List volumes with `docker volume ls` and inspect them first.
- Binding to 0.0.0.0 unintentionally - `-p 3000:3000` binds on all interfaces. Use `-p 127.0.0.1:3000:3000` for local-only access.
- VPN breaking Docker networking - VPNs often override DNS and routing. Try disconnecting the VPN to confirm it's the cause.
- WSL2 memory ballooning - Docker Desktop on Windows runs on WSL2, whose VM can hold on to a large share of host memory. Add a `.wslconfig` file to cap it:

  ```ini
  # %USERPROFILE%\.wslconfig
  [wsl2]
  memory=4GB
  processors=2
  ```

  Run `wsl --shutdown` and restart Docker Desktop for the change to take effect.
- Compose service names as hostnames - in Docker Compose, have services reach each other by service name (`db`) rather than by the generated container name (`my-project-db-1`), which depends on the project name and replica index.
- Stale containers blocking port allocation - a container in "Created" or "Exited" state can still hold a port if it wasn't properly cleaned up. Remove it with `docker rm`.
- Build cache causing stale results - a `RUN` step is cached by its command text. A step that fetches remote content (for example `apt-get update` or `git clone`) therefore won't pick up upstream changes. Try `docker build --no-cache` to rule out cache issues.
- ARM vs x86 image mismatch - on Apple Silicon Macs, an amd64 image runs under emulation and may fail or be slow. Use `--platform linux/arm64` or multi-arch images.
- Docker Desktop needing a restart - many Docker Desktop issues are resolved by restarting the application or resetting the Docker engine to factory defaults.
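
To audit a running container for the 0.0.0.0 pitfall above, filter the output of `docker port`. This is a sketch: `exposed_on_all_interfaces` is a hypothetical helper. It assumes `docker port` lines of the form `3000/tcp -> 0.0.0.0:3000`, or `3000/tcp -> [::]:3000` for IPv6.

```bash
# Hypothetical helper: read `docker port` output on stdin and print only the
# mappings bound to every IPv4 or IPv6 interface.
exposed_on_all_interfaces() {
  while IFS= read -r line; do
    case "$line" in
      *"-> 0.0.0.0:"*|*"-> [::]:"*) echo "$line" ;;
    esac
  done
}
```

Usage: `docker port my-container | exposed_on_all_interfaces`. Any output line is a port that other machines may be able to reach.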