Traefik Certificate Lost Fix

Problem

After restarting services, Traefik certificates are lost:

- The acme.json file is empty (0 bytes)
- Self-signed certificates are being served
- Certificate resolver errors: Router uses a nonexistent certificate resolver
- HTTP challenge errors: HTTP challenge is not enabled

Root Cause

When acme.json is empty or invalid JSON, Traefik cannot:

  1. Read the ACME account information
  2. Initialize the certificate resolver
  3. Request new certificates from Let's Encrypt

This causes Traefik to fall back to self-signed certificates.
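Before changing anything, it can help to confirm which of these states the file is actually in. The following is a minimal sketch, not part of the fix script: `acme_state` is a hypothetical helper name, and it assumes `python3` is installed on the VPS and uses it only as a JSON parser.

```shell
# Classify the state of an acme.json file.
# Prints one of: missing, empty, invalid, valid.
# Assumption: python3 is available; it is used only to parse the JSON.
acme_state() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -s "$1" ]; then
        echo "empty"
    elif python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$1" 2>/dev/null; then
        echo "valid"
    else
        echo "invalid"
    fi
}
```

For example, `acme_state infrastructure/config/traefik/acme.json` run from /opt/po-platform prints `empty` for the 0-byte case described above.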

Solution

Step 1: Fix acme.json File

# On VPS
cd /opt/po-platform

# Run the fix script
./infrastructure/scripts/fix-traefik-certificates.sh

# Or manually:
# 1. Ensure file exists and has correct permissions
touch infrastructure/config/traefik/acme.json
chmod 600 infrastructure/config/traefik/acme.json

# 2. Initialize with empty JSON object if file is empty
if [ ! -s infrastructure/config/traefik/acme.json ]; then
    echo '{}' > infrastructure/config/traefik/acme.json
    chmod 600 infrastructure/config/traefik/acme.json
fi
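The manual steps above only reinitialize a file that is empty; a file that is non-empty but not valid JSON passes the `-s` test and is left alone. A possible way to cover that case is sketched below. `reset_acme_json` and the `.broken.<timestamp>` suffix are hypothetical names chosen for illustration, not something the fix script is known to use.

```shell
# Reset acme.json to an empty JSON object with 600 permissions.
# Any existing non-empty content is copied aside first, so nothing is lost
# if the file turns out to have held usable certificates.
# The ".broken.<timestamp>" suffix is an illustrative naming choice.
reset_acme_json() {
    if [ -s "$1" ]; then
        cp -p "$1" "$1.broken.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    echo '{}' > "$1" && chmod 600 "$1"
}
```

Call it only after confirming that the current content is not valid JSON, for example `reset_acme_json infrastructure/config/traefik/acme.json`. Resetting a valid file forces Traefik to request every certificate again.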

Step 2: Restart Traefik

# Stop Traefik
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared stop traefik

# Remove container to ensure fresh start
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared rm -f traefik

# Start Traefik
docker compose -f infrastructure/compose/shared.yml --env-file infrastructure/compose/.env.shared up -d traefik

# Wait a few seconds for Traefik to start
sleep 10
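A fixed `sleep 10` can be too short on a loaded host and longer than needed otherwise. One alternative is to poll until the container reports that it is running. This is a rough sketch: `wait_until` and `container_running` are hypothetical helpers, and the attempt count and delay are arbitrary.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only when another attempt will follow.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Succeeds when Docker reports State.Running = true for the named container.
container_running() {
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}
```

For example, `wait_until 15 2 container_running po-traefik` waits up to about 30 seconds. A running container does not prove that the certificate resolver initialized, so the log check in Step 3 still applies.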

Step 3: Verify Certificate Resolver

# Check Traefik logs for certificate resolver initialization
docker logs po-traefik 2>&1 | grep -i "certificatesresolver\|acme\|letsencrypt" | tail -30

# Should see:
# - Certificate resolver being configured
# - ACME account initialization attempts
# - No "nonexistent certificate resolver" errors (after initialization)
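To turn the last expectation into a pass/fail check, the log can be piped through a small filter. This is a sketch under assumptions: `check_resolver_log` is a hypothetical helper, and the pattern matches the error text quoted in the Problem section plus a hyphenated "non-existent resolver" spelling that some Traefik versions may use.

```shell
# Read Traefik log lines on stdin.
# Prints any line that reports a missing certificate resolver and returns 1
# if at least one such line was found, 0 otherwise.
# The pattern covers "nonexistent certificate resolver" and, as an
# assumption about other versions, "non-existent resolver".
check_resolver_log() {
    if grep -iE 'non-?existent( certificate)? resolver'; then
        return 1
    fi
    return 0
}
```

For example: `docker logs --since 5m po-traefik 2>&1 | check_resolver_log && echo "no resolver errors"`.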

Step 4: Wait for Certificates

Traefik automatically requests certificates from Let's Encrypt once all of the following hold:

  1. A router with tls.certresolver=letsencrypt is loaded, with a Host rule that names the domain
  2. The certificate resolver is properly initialized
  3. The HTTP challenge can be completed

This may take 1-5 minutes for the first certificate request.

Step 5: Verify Certificates

# Check if certificates are being issued
docker logs po-traefik 2>&1 | grep -i "certificate\|acme" | tail -20

# Test HTTPS endpoint (should not show self-signed certificate warning)
curl -I https://qual.portugalodyssey.pt

# Or check certificate details
echo | openssl s_client -servername qual.portugalodyssey.pt -connect qual.portugalodyssey.pt:443 2>/dev/null | openssl x509 -noout -issuer -subject -dates
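The issuer line from the last command is what distinguishes a real certificate from the fallback. A small classifier makes that explicit. This is a sketch: `classify_issuer` is a hypothetical helper, and it assumes that Traefik's fallback certificate carries the name "TRAEFIK DEFAULT CERT" and that Let's Encrypt staging issuers contain "(STAGING)".

```shell
# Classify an "issuer=" line as printed by `openssl x509 -noout -issuer`.
# Prints: letsencrypt-staging, letsencrypt, traefik-default, or other.
# The matched strings are assumptions about how each issuer is named.
classify_issuer() {
    case "$1" in
        *"(STAGING)"*)            echo "letsencrypt-staging" ;;
        *"Let's Encrypt"*)        echo "letsencrypt" ;;
        *"TRAEFIK DEFAULT CERT"*) echo "traefik-default" ;;
        *)                        echo "other" ;;
    esac
}
```

For example: `classify_issuer "$(echo | openssl s_client -servername qual.portugalodyssey.pt -connect qual.portugalodyssey.pt:443 2>/dev/null | openssl x509 -noout -issuer)"`. A result of `traefik-default` means the self-signed fallback is still being served.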

Prevention

Protect acme.json

# Do not make acme.json read-only: Traefik needs write access to store
# new and renewed certificates. Keep permissions at 600 and rely on
# regular backups instead.

Backup Certificates

# Create backup of acme.json
cp infrastructure/config/traefik/acme.json infrastructure/config/traefik/acme.json.backup.$(date +%Y%m%d)

# Restore from backup if needed
cp infrastructure/config/traefik/acme.json.backup.YYYYMMDD infrastructure/config/traefik/acme.json
chmod 600 infrastructure/config/traefik/acme.json
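A plain `cp` will happily back up an empty acme.json, which is exactly the broken state this document describes. A guarded variant could look like the sketch below. `backup_acme_json` is a hypothetical helper that keeps the same `.backup.YYYYMMDD` naming as above.

```shell
# Copy acme.json to a dated backup, but only when the file has content.
# Prints the backup path on success; returns 1 without copying otherwise.
backup_acme_json() {
    if [ ! -s "$1" ]; then
        echo "refusing to back up missing or empty file: $1" >&2
        return 1
    fi
    backup="$1.backup.$(date +%Y%m%d)"
    # The backup contains private keys, so keep it as restricted as the original.
    cp -p "$1" "$backup" && chmod 600 "$backup" && echo "$backup"
}
```

For example: `backup_acme_json infrastructure/config/traefik/acme.json`. Running it from cron after certificates have been issued keeps one backup per day.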

Troubleshooting

Certificate Resolver Still Not Working

  1. Check file permissions:

    ls -l infrastructure/config/traefik/acme.json
    # Should show: -rw------- (600)
    

  2. Check file content:

    cat infrastructure/config/traefik/acme.json
    # Should show: {} (empty JSON object) or valid JSON with ACME data
    

  3. Check Traefik configuration:

    docker exec po-traefik traefik version
    docker exec po-traefik cat /etc/traefik/traefik.yml 2>/dev/null || echo "Using command-line args"
    

  4. Check network connectivity:

    # Traefik needs to reach Let's Encrypt servers
    docker exec po-traefik wget -q --spider https://acme-v02.api.letsencrypt.org/directory
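As a complement to the file content check in item 2, it can be useful to list which domains already have a certificate stored instead of reading the raw JSON. The sketch below is an assumption-heavy illustration: `list_acme_domains` is a hypothetical helper, it requires `python3`, and it assumes the file is laid out as a resolver name mapping to a `Certificates` list whose entries carry `domain.main`.

```shell
# Print "<resolver> <main domain>" for every certificate stored in acme.json.
# Assumptions: python3 is available, and the file has the layout
# {"<resolver>": {"Certificates": [{"domain": {"main": "..."}}]}}.
list_acme_domains() {
    python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1]) as fh:
    data = json.load(fh)

for resolver, content in sorted(data.items()):
    # "Certificates" may be absent or null before the first issuance.
    for cert in (content or {}).get("Certificates") or []:
        print(resolver, cert.get("domain", {}).get("main", "?"))
PY
}
```

An empty result for a file containing `{}` is expected right after the reset; entries should appear once Traefik has completed its first ACME order.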
    

HTTP Challenge Not Working

If the HTTP challenge fails:

  1. Ensure port 80 is accessible from the internet
  2. Check DNS records point to your server
  3. Verify Traefik can bind to port 80
  4. Check firewall rules allow port 80
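The local half of these checks (whether anything is bound to port 80 on the host) can be scripted. The sketch below parses `ss -ltn` output. `listening_on` is a hypothetical helper, and it assumes the local address is the fourth column, as in current iproute2 output.

```shell
# Read `ss -ltn` output on stdin and succeed if a socket listens on PORT.
# Usage: ss -ltn | listening_on 80
# Assumption: the local address:port is the fourth whitespace-separated column.
listening_on() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

For example: `ss -ltn | listening_on 80 || echo "nothing is listening on port 80"`. This only covers the bind check; reachability from the internet and firewall rules still have to be tested from outside the server.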

Certificates Not Issuing

If certificates aren't being issued:

  1. Check Let's Encrypt rate limits (50 certificates per registered domain per week, and 5 per week for the same exact set of hostnames)
  2. Verify domain ownership (DNS records)
  3. Check Traefik logs for specific ACME errors
  4. Ensure the HTTP challenge endpoint is accessible