Let's Encrypt Rate Limit Issue

Problem

Traefik logs show errors like:

error="unable to generate a certificate for the domains [qual.portugalodyssey.pt]: 
acme: error: 429 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: 
urn:ietf:params:acme:error:rateLimited :: too many certificates (5) already issued 
for this exact set of identifiers in the last 168h0m0s, retry after 2025-11-29 02:54:01 UTC"

Root Cause

Let's Encrypt has rate limits to prevent abuse:

  • 5 certificates per exact set of domains per 168 hours (7 days)
  • This limit applies to the exact combination of domains in each certificate request
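To make "exact set" concrete, the sketch below builds a comparison key for a list of hostnames. It is only an illustration: `normalize_domain_set` is a hypothetical helper, and it assumes that order and letter case do not matter when two sets are compared.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a canonical key for a set of hostnames so that
# two certificate requests can be compared.
# Assumption: order and letter case are irrelevant for the comparison.
normalize_domain_set() {
  printf '%s\n' "$@" | tr 'A-Z' 'a-z' | LC_ALL=C sort -u | paste -sd, -
}
```

Calling it with `api-qual.portugalodyssey.pt qual.portugalodyssey.pt`, or with the same names swapped, prints the same key, so both requests would count against the same allowance of 5. A request for `qual.portugalodyssey.pt` alone produces a different key and is a different set.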

This typically happens as a result of:

  1. Testing/debugging certificate configuration repeatedly
  2. Restarting services frequently during development
  3. Changing domain configurations multiple times
  4. Recreating containers in a way that triggers new certificate requests

Impact

  • New certificates for the affected domain set cannot be issued until the rate limit resets
  • Until then, Traefik serves its default self-signed certificate for those services, so browsers show warnings
  • The rate limit resets at the time given in the error message (usually within 7 days)

Solutions

Option 1: Wait for Rate Limit Reset

The rate limit resets automatically at the time given in the error message. Check the retry time:

retry after 2025-11-29 02:54:01 UTC

After this time, certificates will be issued automatically when Traefik retries.
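To see how long is left, the retry timestamp can be converted into a remaining wait. This is a minimal sketch, assuming GNU `date` (the `-d` option); `seconds_until` and `format_wait` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds between now and a timestamp such as
# "2025-11-29 02:54:01 UTC". Assumes GNU date (-d).
# An optional second argument overrides "now" (epoch seconds).
seconds_until() {
  local target now
  target=$(date -u -d "$1" +%s) || return 1
  now=${2:-$(date -u +%s)}
  echo $(( target - now ))
}

# Hypothetical helper: render a number of seconds as hours and minutes.
format_wait() {
  local s=$1
  if (( s <= 0 )); then
    echo "retry time has passed"
    return
  fi
  printf '%dh %dm\n' $(( s / 3600 )) $(( (s % 3600) / 60 ))
}
```

For example, `format_wait "$(seconds_until "2025-11-29 02:54:01 UTC")"` prints the remaining wait, or reports that the retry time has already passed.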

Option 2: Use Let's Encrypt Staging Environment (For Testing)

For development/testing, use the Let's Encrypt staging environment, which has much higher rate limits:

Modify infrastructure/compose/shared.yml:

command:
  # ... existing commands ...
  - --certificatesresolvers.letsencrypt.acme.email=contact@portugalodyssey.pt
  - --certificatesresolvers.letsencrypt.acme.storage=/acme.json
  - --certificatesresolvers.letsencrypt.acme.httpchallenge=true
  - --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
  - --certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory  # Add this line

Note: Staging certificates will show browser warnings (not trusted), but are useful for testing.
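After switching, it can help to confirm which certificate is actually being served. The sketch below is an assumption-laden illustration rather than part of the project tooling: `served_issuer` and `classify_issuer` are hypothetical helpers, and the classification assumes that staging issuers contain "(STAGING)" and that Traefik's fallback certificate is named "TRAEFIK DEFAULT CERT".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the issuer of the certificate served for a host.
# Needs network access and the openssl CLI.
served_issuer() {
  local host=$1
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Hypothetical helper: classify an issuer string.
# The staging check comes first because staging issuers also mention
# "Let's Encrypt".
classify_issuer() {
  case "$1" in
    *"(STAGING)"*)            echo "staging" ;;
    *"Let's Encrypt"*)        echo "production" ;;
    *"TRAEFIK DEFAULT CERT"*) echo "traefik-default" ;;
    *)                        echo "unknown" ;;
  esac
}
```

For example, `classify_issuer "$(served_issuer qual.portugalodyssey.pt)"` would be expected to print `staging` once the staging CA is in use, and `traefik-default` while no certificate has been issued.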

Option 3: Consolidate Certificates (Use SAN Certificates)

Instead of separate certificates for each domain, use a single certificate with multiple Subject Alternative Names (SANs). This reduces the number of certificate requests.

Example: Instead of separate certificates for:

  • qual.portugalodyssey.pt
  • api-qual.portugalodyssey.pt
  • auth-qual.portugalodyssey.pt

Use one certificate with all three domains.

However, Traefik automatically groups domains by router, so this may already be happening. The issue is that each router configuration change triggers a new certificate request.

Option 4: Request Certificate Exemption (For Legitimate High-Volume)

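If you have a legitimate need for more certificates, you can request a rate limit adjustment:

  • Submit a request to Let's Encrypt through their rate limit adjustment form
  • Explain your use case
  • They may grant a higher rate limit

Note: Let's Encrypt's rate limit documentation indicates that overrides are not offered for the per-exact-set limit shown in the error above, so this option mainly helps with other limits, such as certificates per registered domain.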

Prevention

  1. Avoid frequent restarts during development - Recreating containers can trigger new certificate requests
  2. Test certificate configuration carefully - Verify before deploying to production
  3. Use staging environment - For development/testing, use Let's Encrypt staging
  4. Monitor certificate requests - Check Traefik logs regularly
  5. Plan certificate changes - Batch certificate changes to avoid hitting limits

Verification

Check current rate limit status:

# Check Traefik logs for rate limit errors
docker logs po-traefik 2>&1 | grep -i "rateLimited\|429" | tail -20

# Check when rate limit resets
docker logs po-traefik 2>&1 | grep -i "retry after" | tail -5

Monitoring Rate Limits

Use the monitoring script to check current rate limit status:

# Make script executable (first time only)
chmod +x infrastructure/scripts/monitor-certificate-rate-limits.sh

# Run the monitor
./infrastructure/scripts/monitor-certificate-rate-limits.sh

This script will:

  • Show all rate limit errors from Traefik logs
  • Extract retry times for each affected domain set
  • Count affected domains
  • Check acme.json status
  • Provide recommendations
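If the script is not at hand, the log-parsing part can be approximated with standard tools. The function below is a rough sketch and not the project's script: `summarize_rate_limits` is a hypothetical name, and it assumes each error is logged on a single line in the format shown in the Problem section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Traefik log text on stdin and print one line per
# rate-limited domain set with its latest "retry after" time.
# Assumes each error is a single log line in the format shown above.
summarize_rate_limits() {
  grep 'rateLimited' \
    | sed -n 's/.*for the domains \[\([^]]*\)\].*retry after \([0-9-]* [0-9:]* UTC\).*/\1 => \2/p' \
    | LC_ALL=C sort \
    | awk -F' => ' '{ latest[$1] = $2 } END { for (d in latest) print d " => " latest[d] }' \
    | LC_ALL=C sort
}
```

A possible invocation is `docker logs po-traefik 2>&1 | summarize_rate_limits`. Piping the result through `wc -l` gives the number of affected domain sets.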

Current Situation (After Git Tracking Fix)

Date: December 1, 2025

After fixing the root cause where acme.json was being tracked by git and reset during deployments, Traefik is now regenerating all certificates. This is a one-time event that triggers rate limits because:

  1. acme.json was reset to {} (to fix the git tracking issue)
  2. Traefik attempts to regenerate ALL certificates at once
  3. Many domain sets hit the 5-certificate-per-168-hours limit

This is expected and will resolve automatically. After certificates are regenerated, they will persist because:

  • acme.json is now untracked by git (added to .gitignore)
  • The Makefile no longer resets the file unnecessarily
  • Future deployments won't overwrite certificates
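The git side of this fix can be checked from the repository. The sketch below is illustrative only: `check_acme_untracked` is a hypothetical helper, it must be run inside the repository, and the path passed to it is whatever location the repository uses for acme.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a file is neither tracked by git nor
# missing from the ignore rules. Run from inside the repository.
check_acme_untracked() {
  local file=$1 status=0
  if git ls-files --error-unmatch -- "$file" >/dev/null 2>&1; then
    echo "problem: $file is still tracked by git"
    status=1
  fi
  if ! git check-ignore -q -- "$file" 2>/dev/null; then
    echo "problem: $file is not matched by any ignore rule"
    status=1
  fi
  if (( status == 0 )); then
    echo "ok: $file is untracked and ignored"
  fi
  return "$status"
}
```

If the file is still tracked, `git rm --cached <path>` removes it from the index without deleting the local copy.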

Affected Domains (December 1, 2025)

Multiple domain sets are hitting rate limits:

  • loki.portugalodyssey.pt + loki.portugalodissey.pt
  • api-qual.portugalodyssey.pt + api-qual.portugalodissey.pt
  • s3-console.portugalodyssey.pt + s3-console.portugalodissey.pt
  • rabbitmq.portugalodyssey.pt + rabbitmq.portugalodissey.pt
  • s3.portugalodyssey.pt + s3.portugalodissey.pt
  • notification-qual.portugalodyssey.pt + notification-qual.portugalodissey.pt
  • prometheus.portugalodyssey.pt + prometheus.portugalodissey.pt
  • payment-qual.portugalodyssey.pt + payment-qual.portugalodissey.pt
  • auth-qual.portugalodyssey.pt + auth-qual.portugalodissey.pt
  • db.portugalodyssey.pt + db.portugalodissey.pt
  • files-qual.portugalodyssey.pt + files-qual.portugalodissey.pt
  • monitoring.portugalodyssey.pt + monitoring.portugalodissey.pt

Retry times: Around 2025-12-01 15:XX:XX UTC (check logs for exact times)

Next Steps

  1. Wait for rate limit reset - Certificates will be issued automatically after the retry time
  2. Monitor status - Use monitor-certificate-rate-limits.sh to check progress
  3. Verify certificates - After retry time, check acme.json for stored certificates
  4. This is a one-time issue - After certificates are regenerated, they will persist
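For step 3, the stored certificates can be listed straight from the file. This is a minimal sketch that assumes Traefik records each certificate's primary domain under a `"main"` key in acme.json; `list_cert_domains` is a hypothetical helper, and the text matching is deliberately crude (a JSON tool such as jq would be more robust).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the primary domain of every certificate stored in
# an acme.json file.
# Assumption: each certificate entry has a "main" key holding that domain.
list_cert_domains() {
  grep -o '"main": *"[^"]*"' "$1" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | LC_ALL=C sort -u
}
```

Run it against the host file that is mounted as /acme.json in the Traefik container. Empty output means no certificates have been stored yet, which is the expected state right after the file was reset to {}.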

References