Traefik Unhealthy and Self-Signed Certificate Fix
Problem
- Traefik container is UNHEALTHY
- Self-signed certificate instead of Let's Encrypt
- Certificate resolver errors persist
Root Cause Analysis
The certificate resolver configuration appears correct, but Traefik is not recognizing it. This causes:
- Fallback to Traefik's default self-signed certificate
- Healthcheck failures (if the unhealthy state is related to the certificate issue)
- "Nonexistent certificate resolver" errors in the logs
Possible Causes
- ACME file doesn't exist or has wrong permissions
- Traefik can't write to acme.json (volume mount issue)
- No network connectivity to Let's Encrypt's servers
- Configuration not applied (changes to Traefik's static configuration only take effect after the container is restarted or recreated)
Fix Procedure
Step 1: Verify ACME File
```shell
# On the VPS
cd /opt/po-platform

# Check whether acme.json exists
ls -la infrastructure/config/traefik/acme.json

# If it doesn't exist, create it. Traefik refuses to use acme.json
# if its permissions are more open than 600.
touch infrastructure/config/traefik/acme.json
chmod 600 infrastructure/config/traefik/acme.json

# Verify permissions
ls -l infrastructure/config/traefik/acme.json
# Should show: -rw------- (600)
```
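The create-and-chmod sequence can be wrapped in a small idempotent helper. This is a sketch, assuming GNU coreutils on the VPS (for `stat -c`); `ensure_acme_file` is a hypothetical name, not part of the platform. Because `touch` leaves an existing file's contents untouched, re-running it does not wipe stored certificates.

```shell
# Hypothetical helper: create acme.json if missing and enforce mode 600.
# Safe to re-run: touch only updates timestamps on an existing file.
ensure_acme_file() {
  local acme_file="$1"
  mkdir -p "$(dirname "$acme_file")" || return 1
  touch "$acme_file" || return 1
  chmod 600 "$acme_file" || return 1
  # Print the resulting octal mode (GNU stat) so the caller can confirm it
  stat -c '%a' "$acme_file"
}
```

Usage: `ensure_acme_file /opt/po-platform/infrastructure/config/traefik/acme.json` should print `600`.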
Step 2: Verify Volume Mount
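```shell
# Check the volume mount in shared.yml
grep "acme.json" infrastructure/compose/shared.yml
# Should show: - ../config/traefik/acme.json:/acme.json:rw
# :rw makes write access explicit (bind mounts are read-write by default);
# a :ro suffix would stop Traefik from saving certificates.
# The container path (/acme.json) must match the resolver's acme.storage setting.
```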
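The same check can be made pass/fail so it fits into a script. A minimal sketch: `check_acme_mount` is a hypothetical helper, and it assumes the mount is written in the short `host:container[:mode]` form used in shared.yml. A mount without a mode suffix is accepted, since bind mounts default to read-write.

```shell
# Hypothetical helper: confirm a compose file bind-mounts acme.json to /acme.json
# without a read-only (:ro) suffix.
check_acme_mount() {
  local compose_file="$1"
  # Match "...acme.json:/acme.json" with an optional ":rw", followed only by
  # whitespace or a closing quote; a ":ro" suffix therefore does not match.
  if grep -Eq 'acme\.json:/acme\.json(:rw)?[[:space:]"]*$' "$compose_file"; then
    echo "acme.json is mounted read-write at /acme.json"
  else
    echo "no read-write acme.json mount found in $compose_file" >&2
    return 1
  fi
}
```

Usage: `check_acme_mount infrastructure/compose/shared.yml`.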
Step 3: Check Traefik Logs
```shell
# Look for ACME-related errors
docker logs po-traefik 2>&1 | grep -i "acme\|certificate\|letsencrypt" | tail -30

# Look for permission errors and port conflicts (EADDRINUSE / "address already in use")
docker logs po-traefik 2>&1 | grep -i "permission\|EACCES\|EADDRINUSE\|address already in use" | tail -20
```
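For a quick count per category instead of scrolling through matches, the log can be piped into a small awk summary. This is a hypothetical helper; its patterns mirror the grep filters above, including Go's "address already in use" wording, which is how port conflicts typically appear in Traefik's logs. A line can count in more than one category (for example a permission error on `/acme.json`).

```shell
# Hypothetical helper: count Traefik log lines per category, reading logs on stdin.
summarize_traefik_errors() {
  awk '
    BEGIN { acme = 0; perm = 0; addr = 0 }
    { line = tolower($0) }
    line ~ /acme|certificate|letsencrypt/       { acme++ }
    line ~ /permission|eacces/                  { perm++ }
    line ~ /eaddrinuse|address already in use/  { addr++ }
    END {
      printf "acme/certificate: %d\n", acme
      printf "permission: %d\n", perm
      printf "address in use: %d\n", addr
    }
  '
}
```

Usage: `docker logs po-traefik 2>&1 | summarize_traefik_errors`.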
Step 4: Restart Traefik
```shell
# Stop Traefik
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared stop traefik

# Remove the container (to ensure a fresh start)
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared rm -f traefik

# Start Traefik
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared up -d traefik

# Wait a few seconds (the first healthcheck may take longer, depending on its interval)
sleep 5

# Check status
docker ps | grep traefik
# Should show: Up ... (healthy), not (unhealthy).
# "(health: starting)" means the first healthcheck has not completed yet.
```
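Instead of a fixed `sleep 5`, you can poll Docker's health status until it settles. A minimal sketch, assuming the container defines a healthcheck (otherwise `.State.Health` is empty and the loop simply times out); `wait_for_healthy` and its 60-second default are illustrative. The stop/rm/up sequence can also be collapsed into a single `up -d --force-recreate traefik` call with the same `-f` and `--env-file` arguments.

```shell
# Hypothetical helper: wait until Docker reports the container healthy.
# Returns 0 when healthy, 1 when unhealthy or on timeout.
wait_for_healthy() {
  local container="$1" timeout="${2:-60}" elapsed=0 status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # Possible values: starting, healthy, unhealthy
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    case "$status" in
      healthy)   echo "$container is healthy after ${elapsed}s"; return 0 ;;
      unhealthy) echo "$container reported unhealthy" >&2; return 1 ;;
    esac
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

Usage: `wait_for_healthy po-traefik 90`.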
Step 5: Trigger Certificate Request
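```shell
# Make an HTTP request: it should redirect to HTTPS. With Traefik v2+, the
# certificate is normally requested when the router is loaded, so this mainly
# confirms routing and the redirect.
curl -I http://qual.portugalodyssey.pt

# Or open it in a browser:
# http://qual.portugalodyssey.pt
```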
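To turn the redirect check into a pass/fail test, curl can report where a redirect points without following it (`-w '%{redirect_url}'`). A sketch with a hypothetical helper name; it assumes the HTTP router answers with a 3xx redirect to HTTPS, as described above.

```shell
# Hypothetical helper: verify that plain HTTP answers with a redirect to HTTPS.
check_https_redirect() {
  local url="$1" location
  # %{redirect_url} is the URL curl would go to next; -L is deliberately not used
  location=$(curl -s -o /dev/null -w '%{redirect_url}' "$url") || return 1
  case "$location" in
    https://*) echo "redirects to $location"; return 0 ;;
    *) echo "no HTTPS redirect (got: '${location:-none}')" >&2; return 1 ;;
  esac
}
```

Usage: `check_https_redirect http://qual.portugalodyssey.pt`.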
Step 6: Monitor Certificate Acquisition
```shell
# Follow Traefik logs for ACME activity
docker logs -f po-traefik 2>&1 | grep -i "acme\|certificate\|letsencrypt"

# In another terminal, watch acme.json
watch -n 2 'ls -lh /opt/po-platform/infrastructure/config/traefik/acme.json'
# The file grows from 0 bytes once the ACME account is registered,
# and again when the certificate is stored
```
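File size only shows that something was written. To see which domains actually have a stored certificate, you can read the `domain.main` entries from acme.json. A sketch assuming the Traefik v2+ layout (`{"<resolver>": {"Certificates": [{"domain": {"main": "..."}}]}}`); it uses grep/sed so it works on hosts without jq, and `list_acme_domains` is a hypothetical name. It prints nothing if no certificate has been stored yet.

```shell
# Hypothetical helper: list certificate domains stored in acme.json (Traefik v2+ layout).
list_acme_domains() {
  local acme_file="$1"
  [ -s "$acme_file" ] || { echo "acme.json is empty or missing" >&2; return 1; }
  # Extract the value of every "main" key, e.g. "main": "example.com"
  grep -o '"main": *"[^"]*"' "$acme_file" | sed 's/.*"main": *"\([^"]*\)"/\1/'
}
```

Usage: `list_acme_domains /opt/po-platform/infrastructure/config/traefik/acme.json` (run with sudo if your user cannot read the 600 file).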
Verification
After applying the fixes:
```shell
# 1. Check that Traefik is healthy
docker ps | grep traefik
# Should show: Up ... (healthy)

# 2. Check the served certificate (should not be self-signed)
curl -vI https://qual.portugalodyssey.pt 2>&1 | grep -i "issuer\|subject"
# The issuer should name Let's Encrypt; "TRAEFIK DEFAULT CERT" means Traefik
# is still serving its built-in self-signed certificate

# 3. Check that the certificate resolver errors are gone
docker logs po-traefik 2>&1 | grep -i "nonexistent certificate resolver" | wc -l
# Should output: 0 (no errors)
```
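Check 2 can also be done with openssl and classified automatically. A sketch with hypothetical helper names; it assumes the issuer strings currently in use (production certificates carry `O = Let's Encrypt`, staging ones a `(STAGING)` prefix) and Traefik's default certificate CN `TRAEFIK DEFAULT CERT`.

```shell
# Hypothetical helper: print the issuer of the certificate served on port 443.
cert_issuer() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Hypothetical helper: read an issuer line on stdin; succeed only for a
# production Let's Encrypt issuer.
is_letsencrypt_issuer() {
  local issuer
  issuer=$(cat)
  case "$issuer" in
    *STAGING*)                echo "staging certificate (not browser-trusted)" >&2; return 1 ;;
    *"Let's Encrypt"*)        echo "Let's Encrypt"; return 0 ;;
    *"TRAEFIK DEFAULT CERT"*) echo "Traefik default self-signed certificate" >&2; return 1 ;;
    *)                        echo "unexpected issuer: $issuer" >&2; return 1 ;;
  esac
}
```

Usage: `cert_issuer qual.portugalodyssey.pt | is_letsencrypt_issuer`.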
Troubleshooting
If Traefik Still Unhealthy
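```shell
# Run the healthcheck manually (requires ping to be enabled, e.g. --ping=true)
docker exec po-traefik traefik healthcheck --ping

# Check the Traefik API (requires the API on port 8080, e.g. --api.insecure=true)
curl -s http://localhost:8080/api/rawdata | jq '.routers' | head -20

# Check which certificate resolver each router references
# (the raw data lists routers, not the resolver definitions themselves)
curl -s http://localhost:8080/api/http/routers | jq '.[] | {name, status, certResolver: .tls.certResolver}'
```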
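The ping endpoint can also be probed from the host, which separates "Traefik is not responding" from "the healthcheck command is misconfigured". A sketch assuming ping is enabled on the default `traefik` entrypoint (port 8080) and that port is reachable on localhost; `traefik_ping` is a hypothetical name. A healthy Traefik answers `/ping` with the body `OK`.

```shell
# Hypothetical helper: query Traefik's ping endpoint and expect the body "OK".
traefik_ping() {
  local base_url="${1:-http://localhost:8080}" body
  body=$(curl -fsS --max-time 5 "${base_url}/ping") || {
    echo "ping endpoint unreachable at ${base_url}/ping" >&2
    return 1
  }
  if [ "$body" = "OK" ]; then
    echo "ping OK"
  else
    echo "unexpected ping response: $body" >&2
    return 1
  fi
}
```

Usage: `traefik_ping` (or `traefik_ping http://localhost:8080` to be explicit).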
If Self-Signed Certificate Persists
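- Check that the ACME file is writable from inside the Traefik container.
- Check network connectivity from the VPS to Let's Encrypt's ACME servers.
- Check Let's Encrypt rate limits:
  - See https://letsencrypt.org/docs/rate-limits/
  - If you are rate limited, wait for the limit to reset before retrying.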
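The first two checks can be scripted as follows. A sketch: the helper names are hypothetical, the writability test assumes the image ships a shell (the official Alpine-based Traefik image does) and that acme.json is mounted at `/acme.json` as in Step 2, and the connectivity test runs from the host, so it does not prove the container's own network path.

```shell
# Hypothetical helper: test that /acme.json is writable inside the container.
acme_writable_in_container() {
  local container="${1:-po-traefik}"
  docker exec "$container" sh -c 'test -w /acme.json' \
    && echo "/acme.json is writable inside $container"
}

# Hypothetical helper: fetch Let's Encrypt's production ACME directory;
# success means outbound HTTPS to the ACME server works from this host.
acme_directory_reachable() {
  curl -fsS --max-time 10 -o /dev/null https://acme-v02.api.letsencrypt.org/directory \
    && echo "ACME directory reachable"
}
```

Usage: `acme_writable_in_container po-traefik && acme_directory_reachable`.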
If Certificate Resolver Still Not Found
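- Verify that the resolver configuration (`--certificatesresolvers.<name>.acme.*` flags) is present in the container's command.
- Check the Traefik version with `docker exec po-traefik traefik version`; the `certificatesresolvers` syntax requires Traefik v2 or later.
- Try the TLS-ALPN-01 challenge instead (`--certificatesresolvers.<name>.acme.tlschallenge=true`), which validates over port 443 rather than port 80.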
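A "nonexistent certificate resolver" error usually means a router references a resolver name that the static configuration does not define (a typo, or a name that differs between labels and flags). The sketch below compares the two sets of names. It is a hypothetical helper that assumes CLI-flag style static configuration (`--certificatesresolvers.<name>...`) and `...tls.certresolver=<name>` (or `: <name>`) labels; it does not parse a separate traefik.yml file.

```shell
# Hypothetical helper: compare resolver names defined via flags with names
# referenced by tls.certresolver labels/flags. Reads compose text on stdin.
check_resolver_names() {
  local pairs defined used name status=0
  pairs=$(awk '
    {
      lower = tolower($0)
      # Static flag: --certificatesresolvers.<name>.acme...
      if ((i = index(lower, "--certificatesresolvers.")) > 0) {
        rest = substr($0, i + length("--certificatesresolvers."))
        sub(/\..*$/, "", rest)
        print "defined", rest
      }
      # Reference: ...tls.certresolver=<name> or ...tls.certresolver: <name>
      if ((i = index(lower, ".tls.certresolver")) > 0) {
        rest = substr($0, i + length(".tls.certresolver"))
        sub(/^[=:][ \t]*/, "", rest)
        sub(/[ \t"].*$/, "", rest)
        print "used", rest
      }
    }')
  defined=$(printf '%s\n' "$pairs" | awk '$1 == "defined" { print $2 }' | sort -u)
  used=$(printf '%s\n' "$pairs" | awk '$1 == "used" { print $2 }' | sort -u)
  for name in $used; do
    if printf '%s\n' "$defined" | grep -qxF "$name"; then
      echo "ok: $name"
    else
      echo "missing: $name"
      status=1
    fi
  done
  return "$status"
}
```

Usage, from `/opt/po-platform`: `cat infrastructure/compose/*.yml | check_resolver_names`. Any `missing:` line names a resolver that a router uses but Traefik never defines.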
Expected Timeline
- Immediate: Traefik should become healthy after restart
- After restart or the first HTTPS request: the ACME account initializes and the certificate is requested (may take 1-2 minutes)
- After certificate obtained: Self-signed certificate replaced, errors disappear