Traefik Certificate Lost Fix
Problem
After restarting services, Traefik certificates are lost:
- acme.json file is empty (0 bytes)
- Self-signed certificates are being served
- Certificate resolver errors: Router uses a nonexistent certificate resolver
- HTTP challenge errors: HTTP challenge is not enabled
Root Cause
When acme.json is empty or contains invalid JSON, Traefik cannot:
1. Read the ACME account information
2. Initialize the certificate resolver
3. Request new certificates from Let's Encrypt
This causes Traefik to fall back to self-signed certificates.
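The failure condition described above can be detected before anything is restarted. The following is a minimal sketch, not part of the repository: the function name `acme_state` is hypothetical, and it assumes `python3` is installed on the host, where it is used only to validate JSON syntax.

```shell
# Hypothetical helper: classify the state of an acme.json file.
# Assumption: python3 is available; it is used only as a JSON syntax checker.
acme_state() {
  local file="$1"
  if [ ! -e "$file" ]; then
    echo "missing"
  elif [ ! -s "$file" ]; then
    echo "empty"      # 0 bytes, the case described in this document
  elif ! python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$file" 2>/dev/null; then
    echo "invalid"    # non-empty but not parseable as JSON
  else
    echo "valid"
  fi
}

# Example (path taken from this document):
# acme_state infrastructure/config/traefik/acme.json
```

Anything other than `valid` suggests the file should be repaired as described in Step 1 before Traefik is restarted.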
Solution
Step 1: Fix acme.json File
# On VPS
cd /opt/po-platform
# Run the fix script
./infrastructure/scripts/fix-traefik-certificates.sh
# Or manually:
# 1. Ensure file exists and has correct permissions
touch infrastructure/config/traefik/acme.json
chmod 600 infrastructure/config/traefik/acme.json
# 2. Initialize with empty JSON object if file is empty
if [ ! -s infrastructure/config/traefik/acme.json ]; then
    echo '{}' > infrastructure/config/traefik/acme.json
    chmod 600 infrastructure/config/traefik/acme.json
fi
Step 2: Restart Traefik
# Stop Traefik
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared stop traefik
# Remove container to ensure fresh start
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared rm -f traefik
# Start Traefik
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared up -d traefik
# Wait a few seconds for Traefik to start
sleep 10
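A fixed `sleep 10` can be too short on a slow host and needlessly long on a fast one. As a hedged alternative, the sketch below polls until a condition holds. The helper names `wait_until` and `traefik_running` are hypothetical; the container name `po-traefik` is the one used in the log commands in this document.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached.
wait_until() {
  local timeout="$1"; shift
  local elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1        # gave up: the condition never became true
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Succeeds only while the po-traefik container reports a running state.
traefik_running() {
  [ "$(docker inspect -f '{{.State.Running}}' po-traefik 2>/dev/null)" = "true" ]
}

# Example:
# wait_until 30 traefik_running || echo "Traefik did not start within 30s"
```

A running container does not guarantee that the certificate resolver initialized, so the log check in the next step is still needed.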
Step 3: Verify Certificate Resolver
# Check Traefik logs for certificate resolver initialization
docker logs po-traefik 2>&1 | grep -i "certificatesresolver\|acme\|letsencrypt" | tail -30
# Should see:
# - Certificate resolver being configured
# - ACME account initialization attempts
# - No "nonexistent certificate resolver" errors (after initialization)
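Reading the log by eye can miss a recurring error. The sketch below counts the two error messages listed under Problem in whatever log text it receives on stdin. The function name `resolver_errors` is hypothetical, and the match strings are taken from this document rather than from an exhaustive list of Traefik messages.

```shell
# Hypothetical helper: read Traefik log text on stdin and report how many
# lines contain either of the two error messages quoted under Problem.
resolver_errors() {
  local count
  # grep -c exits non-zero when nothing matches, hence the "|| true".
  count=$(grep -ci -e "nonexistent certificate resolver" \
                   -e "HTTP challenge is not enabled" || true)
  echo "resolver-related errors: $count"
  [ "$count" -eq 0 ]  # exit status 0 only when the log is clean
}

# Example: only look at log lines from the last two minutes, so that
# errors from before the restart are ignored.
# docker logs --since 2m po-traefik 2>&1 | resolver_errors
```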
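Step 4: Wait for Certificates
Traefik will automatically request certificates from Let's Encrypt when:
1. A router with tls.certresolver=letsencrypt is configured for the domain (Traefik v2 and later take the domain from the router's Host rule or tls.domains, rather than waiting for the first request)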
2. The certificate resolver is properly initialized
3. The HTTP challenge can be completed
This may take 1-5 minutes for the first certificate request.
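While waiting, progress can be observed in acme.json itself. The sketch below counts stored certificate entries. It assumes a Traefik v2-style file layout in which each issued certificate is stored under a lowercase `"certificate"` key; verify this against your own file. The function name `count_acme_certs` is hypothetical.

```shell
# Hypothetical helper: count certificate entries stored in acme.json.
# Assumption: every stored certificate has a lowercase "certificate" key.
# This is a rough text match, not a JSON parse.
count_acme_certs() {
  local file="$1"
  [ -s "$file" ] || { echo 0; return; }   # missing or empty file: nothing stored
  grep -o '"certificate"[[:space:]]*:' "$file" | wc -l | tr -d ' '
}

# Example: poll every 10 seconds for up to 5 minutes.
# for _ in $(seq 30); do
#   [ "$(count_acme_certs infrastructure/config/traefik/acme.json)" -gt 0 ] && break
#   sleep 10
# done
```

A count that stays at 0 after several minutes points to the Troubleshooting section rather than to more waiting.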
Step 5: Verify Certificates
# Check if certificates are being issued
docker logs po-traefik 2>&1 | grep -i "certificate\|acme" | tail -20
# Test HTTPS endpoint (should not show self-signed certificate warning)
curl -I https://qual.portugalodyssey.pt
# Or check certificate details
echo | openssl s_client -servername qual.portugalodyssey.pt -connect qual.portugalodyssey.pt:443 2>/dev/null | openssl x509 -noout -issuer -subject -dates
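The issuer line printed by the openssl command above shows whether the served certificate came from Let's Encrypt or is Traefik's fallback. The sketch below classifies that line. The function name `classify_issuer` is hypothetical, and the string `TRAEFIK DEFAULT CERT` is assumed to be the common name of Traefik's built-in self-signed certificate.

```shell
# Hypothetical helper: classify an issuer line read from stdin.
# Assumption: Traefik's fallback certificate has CN "TRAEFIK DEFAULT CERT".
classify_issuer() {
  local issuer
  issuer=$(cat)
  case "$issuer" in
    *"Let's Encrypt"*)        echo "letsencrypt" ;;
    *"TRAEFIK DEFAULT CERT"*) echo "traefik-default" ;;
    *)                        echo "other" ;;
  esac
}

# Example:
# echo | openssl s_client -servername qual.portugalodyssey.pt \
#     -connect qual.portugalodyssey.pt:443 2>/dev/null \
#   | openssl x509 -noout -issuer | classify_issuer
```

A result of `traefik-default` means the resolver has not yet produced a certificate for that hostname.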
Prevention
Protect acme.json
Do not make acme.json read-only after certificates are issued: Traefik needs write access to the file to store new and renewed certificates. Keep the file at mode 600 and rely on regular backups instead.
Backup Certificates
# Create backup of acme.json
cp infrastructure/config/traefik/acme.json infrastructure/config/traefik/acme.json.backup.$(date +%Y%m%d)
# Restore from backup if needed
cp infrastructure/config/traefik/acme.json.backup.YYYYMMDD infrastructure/config/traefik/acme.json
chmod 600 infrastructure/config/traefik/acme.json
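A plain `cp` will happily copy an empty acme.json over the same day's good backup. The sketch below wraps the backup command above with a guard against that. The function name `backup_acme` is hypothetical; the backup naming scheme is the one used in this section.

```shell
# Hypothetical helper: back up acme.json only when it has content,
# so an empty file never replaces a good backup taken earlier that day.
backup_acme() {
  local src="$1"
  local dest
  if [ ! -s "$src" ]; then
    echo "refusing to back up empty or missing file: $src" >&2
    return 1
  fi
  dest="$src.backup.$(date +%Y%m%d)"
  # The backup contains private keys, so keep it at mode 600 as well.
  cp -p "$src" "$dest" && chmod 600 "$dest" && echo "$dest"
}

# Example:
# backup_acme infrastructure/config/traefik/acme.json
```

Running this from cron or before each deployment is one option; either way the backup should exist before Traefik is restarted.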
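Troubleshooting
Certificate Resolver Still Not Working
- Check file permissions: acme.json must be mode 600
- Check file content: acme.json must be non-empty, valid JSON
- Check Traefik configuration: the letsencrypt certificate resolver must be defined
- Check network connectivity: the server must be able to reach Let's Encrypt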
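The first two checks above can be sketched as a single function. This is an illustration under stated assumptions, not the commands from the original page: the function name `acme_file_report` is hypothetical, and it relies on GNU `stat` as found on typical Linux hosts. The commented commands for the last two checks are likewise suggestions.

```shell
# Hypothetical helper: print mode and size of acme.json and succeed only
# when the file is mode 600 and non-empty.
# Assumption: GNU stat (Linux); BSD/macOS stat uses different flags.
acme_file_report() {
  local file="$1"
  local mode size
  mode=$(stat -c '%a' "$file") || return 1
  size=$(wc -c < "$file" | tr -d ' ')
  echo "mode=$mode size=$size"
  [ "$mode" = "600" ] && [ "$size" -gt 0 ]
}

# Example:
# acme_file_report infrastructure/config/traefik/acme.json

# Possible commands for the remaining checks (assumptions, adapt as needed):
# docker inspect -f '{{json .Args}}' po-traefik | grep -i certificatesresolvers
# curl -sI https://acme-v02.api.letsencrypt.org/directory
```

The `docker inspect` line only helps if the resolver is configured through command-line arguments; if it lives in a static configuration file, inspect that file instead.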
HTTP Challenge Not Working
If the HTTP challenge fails:
1. Ensure port 80 is accessible from the internet
2. Check that DNS records point to your server
3. Verify Traefik can bind to port 80
4. Check that firewall rules allow port 80
Certificates Not Issuing
If certificates aren't being issued:
1. Check Let's Encrypt rate limits (for example, 5 certificates per week for the same exact set of hostnames, and 50 certificates per registered domain per week)
2. Verify domain ownership (DNS records)
3. Check Traefik logs for specific ACME errors
4. Ensure the HTTP challenge endpoint is accessible