Let's Encrypt Rate Limit Issue
Problem
Traefik logs show errors like:
error="unable to generate a certificate for the domains [qual.portugalodyssey.pt]:
acme: error: 429 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order ::
urn:ietf:params:acme:error:rateLimited :: too many certificates (5) already issued
for this exact set of identifiers in the last 168h0m0s, retry after 2025-11-29 02:54:01 UTC"
Root Cause
Let's Encrypt has rate limits to prevent abuse:
- 5 certificates per exact set of domains per 168 hours (7 days)
- This limit applies to the exact combination of domains in each certificate request
This happens when:
1. Testing/debugging certificate configuration repeatedly
2. Restarting services frequently during development
3. Changing domain configurations multiple times
4. Recreating containers that trigger new certificate requests
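To make the window concrete, here is a minimal sketch of the check the error message describes: count issuances inside the trailing 168 hours and compare the count with the limit of 5. The function name would_be_rate_limited and the epoch-second inputs are illustrative assumptions, and Let's Encrypt's real accounting is more involved, so treat this as a simplified model only.

```shell
# Simplified model of the "5 certificates per exact identifier set per 168h" limit.
# Usage: would_be_rate_limited NOW_EPOCH ISSUED_EPOCH...
# Exit status 0 (true) means a new request at NOW_EPOCH would be rejected.
# The body runs in a subshell so its variables do not leak into the caller.
would_be_rate_limited() (
  now=$1; shift
  limit=5
  window=$((168 * 3600))   # 168 hours expressed in seconds
  count=0
  for issued in "$@"; do
    # Only issuances inside the trailing window count against the limit.
    if [ "$issued" -le "$now" ] && [ $((now - issued)) -lt "$window" ]; then
      count=$((count + 1))
    fi
  done
  [ "$count" -ge "$limit" ]
)
```

Under this model, five restarts in one afternoon that each trigger a fresh request for the same domain set are enough to block that set for the rest of the window.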
Impact
- Certificates cannot be issued until the rate limit resets
- Until a valid certificate is issued, Traefik serves its default self-signed certificate for the affected domains, so browsers show warnings
- The rate limit resets after the specified time (usually within 7 days)
Solutions
Option 1: Wait for Rate Limit Reset (Recommended for Production)
The rate limit resets automatically. The reset time is the "retry after" timestamp at the end of the error message (2025-11-29 02:54:01 UTC in the example above).
After this time, certificates will be issued automatically when Traefik retries.
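The retry time can also be pulled out of the logs rather than read by eye. The sketch below assumes the log line has the shape shown in the Problem section and that a GNU or BusyBox date (with the -d option) is available; the function names extract_retry_after and seconds_until are made up for illustration.

```shell
# Print the "retry after" timestamp from each ACME rate-limit error read on stdin.
extract_retry_after() {
  sed -n 's/.*retry after \([0-9-]* [0-9:]* UTC\).*/\1/p'
}

# Print the whole seconds remaining until TIMESTAMP (never negative).
# Usage: seconds_until "YYYY-MM-DD HH:MM:SS UTC" [NOW_EPOCH]
# NOW_EPOCH defaults to the current time; it exists mainly to make testing easy.
seconds_until() (
  # Drop the trailing " UTC" and let -u force UTC interpretation instead.
  target=$(date -u -d "${1% UTC}" +%s) || exit 1
  now=${2:-$(date -u +%s)}
  remaining=$((target - now))
  [ "$remaining" -gt 0 ] || remaining=0
  echo "$remaining"
)
```

One possible use is `docker logs po-traefik 2>&1 | extract_retry_after | sort -u` to list the distinct retry times, then `seconds_until` on the latest one to see how long is left.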
Option 2: Use Let's Encrypt Staging Environment (For Testing)
For development/testing, use the Let's Encrypt staging environment, which has much higher rate limits:
Modify infrastructure/compose/shared.yml:
command:
# ... existing commands ...
- --certificatesresolvers.letsencrypt.acme.email=contact@portugalodyssey.pt
- --certificatesresolvers.letsencrypt.acme.storage=/acme.json
- --certificatesresolvers.letsencrypt.acme.httpchallenge=true
- --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
- --certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory # Add this line
Note: Staging certificates will show browser warnings (not trusted), but are useful for testing.
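After switching the CA server, it helps to confirm which issuer is actually behind the certificate being served. The following is a rough sketch, not part of the original setup: the function names are hypothetical, and the classification rests on two assumptions, namely that staging issuer names contain "(STAGING)" and that Traefik's fallback certificate carries the name "TRAEFIK DEFAULT CERT".

```shell
# Print the issuer line of the certificate served for HOST on port 443.
# Needs network access and the openssl CLI.
served_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Classify an issuer line read on stdin.
# The staging check comes first because staging names also contain "Let's Encrypt".
classify_issuer() (
  issuer=$(cat)
  case $issuer in
    *"(STAGING)"*)            echo staging ;;
    *"TRAEFIK DEFAULT CERT"*) echo traefik-default ;;
    *"Let's Encrypt"*)        echo production ;;
    *)                        echo unknown ;;
  esac
)
```

For example, `served_issuer qual.portugalodyssey.pt | classify_issuer` should print staging once the staging resolver has issued a certificate, and traefik-default while issuance is still blocked.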
Option 3: Consolidate Certificates (Use SAN Certificates)
Instead of separate certificates for each domain, use a single certificate with multiple Subject Alternative Names (SANs). This reduces the number of certificate requests.
Example: Instead of separate certificates for:
- qual.portugalodyssey.pt
- api-qual.portugalodyssey.pt
- auth-qual.portugalodyssey.pt
Use one certificate with all three domains.
However, Traefik automatically groups domains by router, so this may already be happening. The issue is that each router configuration change triggers a new certificate request.
Option 4: Request Certificate Exemption (For Legitimate High-Volume)
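If you have a legitimate need for more certificates, you can request a rate limit override:
- Submit Let's Encrypt's rate limit override request form
- Explain your use case
- They may grant a higher rate limit
Note that Let's Encrypt's rate limit documentation states that overrides are not offered for the per-identifier-set limit shown in the error above, so this option only helps with other limits (for example, certificates per registered domain).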
Prevention
- Avoid frequent restarts during development - Use staging environment for testing
- Test certificate configuration carefully - Verify before deploying to production
- Use staging environment - For development/testing, use Let's Encrypt staging
- Monitor certificate requests - Check Traefik logs regularly
- Plan certificate changes - Batch certificate changes to avoid hitting limits
Verification
Check current rate limit status:
# Check Traefik logs for rate limit errors
docker logs po-traefik 2>&1 | grep -i "rateLimited\|429" | tail -20
# Check when rate limit resets
docker logs po-traefik 2>&1 | grep -i "retry after" | tail -5
Monitoring Rate Limits
Use the monitoring script to check current rate limit status:
# Make script executable (first time only)
chmod +x infrastructure/scripts/monitor-certificate-rate-limits.sh
# Run the monitor
./infrastructure/scripts/monitor-certificate-rate-limits.sh
This script will:
- Show all rate limit errors from Traefik logs
- Extract retry times for each affected domain set
- Count affected domains
- Check acme.json status
- Provide recommendations
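For a quick look without the script, the grouping step can be approximated in a few lines of shell. This is only a sketch of the kind of parsing involved, not the contents of monitor-certificate-rate-limits.sh; the function name rate_limited_domain_sets is hypothetical, and it assumes each error appears on a single log line in the format shown in the Problem section.

```shell
# Read Traefik log lines on stdin and print "COUNT DOMAINS" for every domain set
# that appears in a rateLimited error, sorted by domain set.
rate_limited_domain_sets() {
  grep 'rateLimited' \
    | sed -n 's/.*for the domains \[\([^]]*\)\].*/\1/p' \
    | sort | uniq -c | sed 's/^ *//'
}
```

Typical usage would be `docker logs po-traefik 2>&1 | rate_limited_domain_sets`; the count is the number of logged errors per set, not the number of certificates issued.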
Current Situation (After Git Tracking Fix)
Date: December 1, 2025
After fixing the root cause where acme.json was being tracked by git and reset during deployments, Traefik is now regenerating all certificates. This is a one-time event that triggers rate limits because:
- acme.json was reset to {} (to fix the git tracking issue)
- Traefik attempts to regenerate ALL certificates at once
- Many domain sets hit the 5-certificate-per-168-hours limit
This is expected and will resolve automatically. After certificates are regenerated, they will persist because:
- acme.json is now untracked by git (added to .gitignore)
- The Makefile no longer resets the file unnecessarily
- Future deployments won't overwrite certificates
Affected Domains (December 1, 2025)
Multiple domain sets are hitting rate limits:
- loki.portugalodyssey.pt + loki.portugalodissey.pt
- api-qual.portugalodyssey.pt + api-qual.portugalodissey.pt
- s3-console.portugalodyssey.pt + s3-console.portugalodissey.pt
- rabbitmq.portugalodyssey.pt + rabbitmq.portugalodissey.pt
- s3.portugalodyssey.pt + s3.portugalodissey.pt
- notification-qual.portugalodyssey.pt + notification-qual.portugalodissey.pt
- prometheus.portugalodyssey.pt + prometheus.portugalodissey.pt
- payment-qual.portugalodyssey.pt + payment-qual.portugalodissey.pt
- auth-qual.portugalodyssey.pt + auth-qual.portugalodissey.pt
- db.portugalodyssey.pt + db.portugalodissey.pt
- files-qual.portugalodyssey.pt + files-qual.portugalodissey.pt
- monitoring.portugalodyssey.pt + monitoring.portugalodissey.pt
Retry times: Around 2025-12-01 15:XX:XX UTC (check logs for exact times)
Next Steps
- Wait for rate limit reset - Certificates will be issued automatically after the retry time
- Monitor status - Use monitor-certificate-rate-limits.sh to check progress
- Verify certificates - After retry time, check acme.json for stored certificates
- This is a one-time issue - After certificates are regenerated, they will persist
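The acme.json check can be done without opening the file by hand. The sketch below assumes each stored certificate has a "domain" object with a "main" field, and uses plain tr and sed so it runs without extra tools; a JSON-aware tool such as jq would be more robust. The function name stored_cert_domains is hypothetical, and SAN entries are not listed.

```shell
# Print the main domain of every certificate stored in the given acme.json, sorted.
# Prints nothing for an empty file such as "{}".
stored_cert_domains() {
  # Split on commas and braces first so each "main" field lands on its own line,
  # which keeps the sed pattern working for both pretty-printed and minified JSON.
  tr ',{}' '\n\n\n' < "$1" \
    | sed -n 's/.*"main": *"\([^"]*\)".*/\1/p' \
    | sort
}
```

Pass the path where acme.json is stored on the host. An empty result after the retry time has passed suggests Traefik has not yet re-requested the certificates.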