UptimeRobot: external uptime probes
External monitoring layer for the qualification environment. Replaces the Prometheus + Grafana stack retired in W1-2 (ADR-002).
Why external: an in-cluster monitoring stack can't tell you the cluster is down. UptimeRobot probes from outside the VPS, so any failure mode (network, Traefik, container crash, DNS) shows up as a missed probe.
Account
- Provider: UptimeRobot (uptimerobot.com)
- Tier: Free (sufficient for the 7 monitors below at 5-minute intervals, with email alerts)
- Owner: store login + API key in your password manager under po-platform / uptimerobot
- API key location: My Settings → API Settings → Main API Key
Monitors
The po- prefix keeps everything grouped in the UptimeRobot dashboard.
Friendly names follow po-<service>-<env> so future prod monitors slot
in cleanly (po-public-fo-prod, etc.).
| Friendly name | URL | Type | Anchor / keyword |
|---|---|---|---|
| po-public-fo-qual | https://qual.portugalodyssey.pt/ | HTTPS | 200 OK |
| po-partner-console-qual | https://partner.qual.portugalodyssey.pt/ | HTTPS | 200 OK |
| po-admin-console-qual | https://admin.qual.portugalodyssey.pt/ | HTTPS | 200 OK |
| po-cms-qual | https://cms.qual.portugalodyssey.pt/ | HTTPS | 200 OK (after redirect) |
| po-api-gateway-qual | https://api.qual.portugalodyssey.pt/health | HTTPS keyword | "status":"ok" |
| po-ai-service-qual | https://ai.qual.portugalodyssey.pt/health | HTTPS keyword | "status":"healthy" |
| po-sso-qual | https://sso.qual.portugalodyssey.pt/realms/portugal-odyssey/.well-known/openid-configuration | HTTPS keyword | "issuer" |

Settings (apply to every monitor):
- Monitoring interval: 5 minutes (free tier minimum)
- Alert threshold: 2 consecutive failures (avoids paging on transient blips)
- Alert contacts: email only by default (Telegram/SMS require paid tier)
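The table and settings above map fairly directly onto the form fields of UptimeRobot's v2 `newMonitor` endpoint. The sketch below expresses two of the rows as data and turns them into request payloads. The `Monitor` class and `to_payload` helper are hypothetical names, not part of any existing script. The numeric codes (`type` 1 for HTTP(s) and 2 for keyword, `keyword_type` 2 to alert when the keyword is absent, `interval` in seconds) are assumptions based on the public v2 API reference and should be checked against the current docs before use.

```python
from dataclasses import dataclass
from typing import Optional

INTERVAL_SECONDS = 300  # 5 minutes, the free-tier minimum noted above


@dataclass(frozen=True)
class Monitor:
    """One row of the monitor table (hypothetical helper type)."""
    friendly_name: str
    url: str
    keyword: Optional[str] = None  # None means a plain HTTP(s) check


def to_payload(monitor: Monitor, api_key: str) -> dict:
    """Build the form fields for a v2 newMonitor request."""
    payload = {
        "api_key": api_key,
        "format": "json",
        "friendly_name": monitor.friendly_name,
        "url": monitor.url,
        "interval": str(INTERVAL_SECONDS),
        "type": "1",  # assumed: 1 = HTTP(s)
    }
    if monitor.keyword is not None:
        payload["type"] = "2"  # assumed: 2 = keyword
        payload["keyword_type"] = "2"  # assumed: 2 = alert when keyword is absent
        payload["keyword_value"] = monitor.keyword
    return payload


# Two rows from the table above; the remaining five follow the same shape.
MONITORS = [
    Monitor("po-public-fo-qual", "https://qual.portugalodyssey.pt/"),
    Monitor(
        "po-api-gateway-qual",
        "https://api.qual.portugalodyssey.pt/health",
        '"status":"ok"',
    ),
]
```

The alert threshold and alert contacts are not modelled here; they would still need to be set per monitor or per contact.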
Provisioning
Option A: script (recommended)
The script is idempotent: existing monitors with matching friendly names are skipped. To re-provision a single monitor, delete it in the UI first.
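The script itself is not reproduced on this page. As an illustration of the skip-if-exists behaviour described above, here is a minimal sketch, not the project's actual script. It assumes the v2 `getMonitors` and `newMonitor` endpoints accept form-encoded POSTs with `api_key` and `format=json`, and that `getMonitors` returns a `monitors` array with a `friendly_name` per entry. The function names `post`, `missing_monitors`, and `provision` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.uptimerobot.com/v2"  # assumed v2 base URL


def post(method: str, fields: dict) -> dict:
    """POST form fields to a v2 API method and decode the JSON reply."""
    data = urllib.parse.urlencode(fields).encode()
    request = urllib.request.Request(f"{API_BASE}/{method}", data=data)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def missing_monitors(desired: list, existing_names: set) -> list:
    """Return the desired monitors whose friendly name is not yet present."""
    return [m for m in desired if m["friendly_name"] not in existing_names]


def provision(api_key: str, desired: list) -> list:
    """Create only the monitors that do not exist yet; return their names."""
    auth = {"api_key": api_key, "format": "json"}
    reply = post("getMonitors", auth)
    existing = {m["friendly_name"] for m in reply.get("monitors", [])}
    created = []
    for monitor in missing_monitors(desired, existing):
        # Each desired entry carries the newMonitor fields (url, type, ...).
        post("newMonitor", {**auth, **monitor})
        created.append(monitor["friendly_name"])
    return created
```

Matching on friendly name alone is what makes a second run a no-op, and it is also why a monitor must be deleted in the UI before it can be re-created. `getMonitors` is paginated, so an account with many monitors would need to loop over pages; with 7 monitors a single call should be enough.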
Option B: UptimeRobot web UI
- Log in at uptimerobot.com
- + New monitor
- Pick HTTP(s) for the rows whose type is HTTPS (anchor 200 OK), and Keyword for the rows whose type is HTTPS keyword, entering the anchor string from the table as the keyword
- Friendly name + URL exactly as in the table
- Monitoring interval: 5 minutes; threshold: 2 consecutive failures
- Alert contact: select your email contact
- Repeat for all 7 entries
Verifying alerts
Don't trust an unverified alert pipeline — run a deliberate fail to confirm email delivery:
ssh root@31.97.159.7 docker stop po-public-fo-qual
# Wait 10–15 minutes (2 × 5-min interval)
# An email should arrive at the configured contact
ssh root@31.97.159.7 docker start po-public-fo-qual
# A recovery email should follow within ~5 min
Skipping this step is exactly how the prior 4-month CI silence (Jan–Apr 2026) went unnoticed — silent alert pipelines are worse than no monitoring.
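If the email does not arrive during the test, it helps to know whether the probe registered the outage at all or whether only delivery failed. The sketch below interprets the `status` field of entries returned by the v2 `getMonitors` endpoint. The status codes (0 paused, 1 not checked yet, 2 up, 8 seems down, 9 down) are taken from the public API reference and assumed to be current; the helper names and the sample data are illustrative only.

```python
# Status codes as listed in the v2 API reference (assumed current).
STATUS_LABELS = {
    0: "paused",
    1: "not checked yet",
    2: "up",
    8: "seems down",
    9: "down",
}


def describe(monitors: list) -> dict:
    """Map each friendly name to a readable status label."""
    return {
        m["friendly_name"]: STATUS_LABELS.get(m["status"], f"unknown ({m['status']})")
        for m in monitors
    }


def is_down(monitors: list, friendly_name: str) -> bool:
    """True once the named monitor is reported as seems-down or down."""
    return any(
        m["friendly_name"] == friendly_name and m["status"] in (8, 9)
        for m in monitors
    )


# Hand-written sample in the shape of the "monitors" array from getMonitors.
sample = [
    {"friendly_name": "po-public-fo-qual", "status": 9},
    {"friendly_name": "po-cms-qual", "status": 2},
]
print(describe(sample))
print(is_down(sample, "po-public-fo-qual"))
```

A monitor that shows as down with no email received points at the alert contact configuration rather than the probe.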
When to update
- New service deployed with a public URL → add a monitor (mirror the closest pattern in the table)
- URL renamed → delete old monitor, create new (UptimeRobot doesn't support URL edits without losing history; cheaper to recreate)
- Going to prod → duplicate every po-*-qual row as po-*-prod once prod hostnames are live
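Because friendly names follow the po-<service>-<env> pattern, the prod names can be derived mechanically from the qual ones. The helper below is a hypothetical sketch of that renaming step only. It deliberately does not guess prod URLs, which have to be supplied once the prod hostnames exist.

```python
import re

# po-<service>-<env>, where the service part may itself contain hyphens.
NAME_PATTERN = re.compile(r"^po-(?P<service>[a-z0-9-]+)-(?P<env>qual|prod)$")


def to_prod_name(friendly_name: str) -> str:
    """Turn a po-*-qual friendly name into its po-*-prod counterpart."""
    match = NAME_PATTERN.match(friendly_name)
    if match is None or match.group("env") != "qual":
        raise ValueError(f"not a qual monitor name: {friendly_name!r}")
    return f"po-{match.group('service')}-prod"


print(to_prod_name("po-public-fo-qual"))
print(to_prod_name("po-api-gateway-qual"))
```

Rejecting names that do not match the pattern keeps a stray or already-prod monitor from being duplicated by accident.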
Telegram integration (optional, paid tier)
Free-tier UptimeRobot only does email. If you want Telegram pings (matching the project's existing Telegram channel), two options:
- Pro tier ($7/mo) unlocks webhook + Telegram alert contacts.
- Free workaround: configure UptimeRobot to email a parser address (e.g. an n8n / Zapier hook) that reposts to your bot's Telegram channel.
Defer until alert volume justifies it.
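If the free workaround is ever taken up, the reposting step could look roughly like the sketch below. It assumes the email parser (n8n, Zapier, or similar) hands over the alert email's subject line as a string, and it uses the Telegram Bot API `sendMessage` method. The function names, the subject text, and the token and chat ID values are placeholders, not taken from the project.

```python
import json
import urllib.request


def build_telegram_request(
    bot_token: str, chat_id: str, subject: str
) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request for an alert."""
    body = json.dumps({"chat_id": chat_id, "text": f"UptimeRobot: {subject}"}).encode()
    return urllib.request.Request(
        f"https://api.telegram.org/bot{bot_token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def forward_alert(bot_token: str, chat_id: str, subject: str) -> int:
    """Send the alert to Telegram and return the HTTP status code."""
    request = build_telegram_request(bot_token, chat_id, subject)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Splitting the request construction from the network call keeps the message format easy to check without a live bot token.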