UptimeRobot — external uptime probes

External monitoring layer for the qualification environment. Replaces the Prometheus + Grafana stack retired in W1-2 (ADR-002).

Why external: an in-cluster monitoring stack can't tell you the cluster is down. UptimeRobot probes from outside the VPS, so any failure that makes a service unreachable from the internet (network, Traefik, container crash, DNS) shows up as a failed probe.

Account

  • Provider: UptimeRobot (uptimerobot.com)
  • Tier: Free (sufficient for the 7 monitors below at 5-minute intervals, with email alerts)
  • Credentials: store the login and API key in your password manager under po-platform / uptimerobot
  • API key location: My Settings → API Settings → Main API Key

Monitors

The po- prefix keeps everything grouped in the UptimeRobot dashboard. Friendly names follow po-<service>-<env> so future prod monitors slot in cleanly (po-public-fo-prod, etc.).
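The naming convention can be expressed as two small shell helpers. This is only an illustrative sketch: the function names monitor_name and is_valid_monitor_name are hypothetical, and the check assumes qual and prod are the only environments.

```shell
# Build a friendly name following the po-<service>-<env> convention.
# $1 = service (e.g. public-fo), $2 = environment (qual or prod).
monitor_name() {
  printf 'po-%s-%s\n' "$1" "$2"
}

# Return 0 if the name starts with po- and ends in -qual or -prod
# with a non-empty service part in between (assumed environments).
is_valid_monitor_name() {
  case "$1" in
    po-?*-qual|po-?*-prod) return 0 ;;
    *) return 1 ;;
  esac
}

monitor_name public-fo qual   # prints po-public-fo-qual
monitor_name public-fo prod   # prints po-public-fo-prod
```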

Friendly name | URL | Type | Anchor / keyword
po-public-fo-qual | https://qual.portugalodyssey.pt/ | HTTPS | 200 OK
po-partner-console-qual | https://partner.qual.portugalodyssey.pt/ | HTTPS | 200 OK
po-admin-console-qual | https://admin.qual.portugalodyssey.pt/ | HTTPS | 200 OK
po-cms-qual | https://cms.qual.portugalodyssey.pt/ | HTTPS | 200 OK (after redirect)
po-api-gateway-qual | https://api.qual.portugalodyssey.pt/health | HTTPS | keyword "status":"ok"
po-ai-service-qual | https://ai.qual.portugalodyssey.pt/health | HTTPS | keyword "status":"healthy"
po-sso-qual | https://sso.qual.portugalodyssey.pt/realms/portugal-odyssey/.well-known/openid-configuration | HTTPS | keyword "issuer"
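Before creating a keyword monitor, it can help to confirm locally that the anchor string really appears in the response. The sketch below assumes the keyword check is a literal substring match on the response body. The helper names body_has_keyword and probe_keyword are hypothetical.

```shell
# Return 0 when the text on stdin contains the literal keyword in $1.
# grep -F matches a fixed string, so the quotes and colon in
# '"status":"ok"' need no escaping; -q suppresses output.
body_has_keyword() {
  grep -qF -- "$1"
}

# Fetch URL $1 (following redirects, failing on HTTP errors) and
# test the body for keyword $2. Requires curl.
probe_keyword() {
  body=$(curl -fsSL --max-time 30 "$1") || return 1
  printf '%s' "$body" | body_has_keyword "$2"
}

# Example (performs a real request):
#   probe_keyword https://api.qual.portugalodyssey.pt/health '"status":"ok"' && echo up
```

Because the match is literal, a response serialized as "status": "ok" (with a space after the colon) would not match the anchor "status":"ok".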

Settings (apply to every monitor):

  • Monitoring interval: 5 minutes (free tier minimum)
  • Alert threshold: 2 consecutive failures (avoids paging on transient blips)
  • Alert contacts: email only by default (Telegram/SMS require paid tier)
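The interval and threshold together bound how long an outage can last before an alert fires. As a rough sketch, assume probes are evenly spaced and the alert fires on the Nth consecutive failed probe, ignoring probe timeouts and email delivery time. The function name alert_delay_minutes is hypothetical.

```shell
# Print the best-case and worst-case alert delay in minutes.
# $1 = probe interval in minutes, $2 = consecutive failures required.
# Best case: the outage starts just before a probe, so only the
# remaining (N - 1) intervals elapse. Worst case: it starts just
# after a successful probe, so N full intervals elapse.
alert_delay_minutes() {
  printf '%d %d\n' "$(( $1 * ($2 - 1) ))" "$(( $1 * $2 ))"
}

alert_delay_minutes 5 2   # prints: 5 10
```

With the settings above, an outage is therefore reported between 5 and 10 minutes after it starts, under these assumptions.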

Provisioning

Option A: provisioning script

export UPTIMEROBOT_API_KEY=u1234567-abcdef...
./infrastructure/scripts/uptimerobot-provision.sh

The script is idempotent: existing monitors with matching friendly names are skipped. To re-provision a single monitor, delete it in the UI first.
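The skip-if-exists behaviour could be implemented roughly as follows. This is not the contents of uptimerobot-provision.sh, which is not reproduced here. It is a minimal sketch that assumes the UptimeRobot v2 API endpoints getMonitors and newMonitor, and that curl and jq are installed. All function names are hypothetical.

```shell
# Return 0 if friendly name $1 appears as a whole line on stdin.
# -x requires a full-line match, so po-cms-qual-old does not count.
name_exists() {
  grep -qxF -- "$1"
}

# List existing friendly names, one per line (needs curl and jq).
list_monitor_names() {
  curl -fsS https://api.uptimerobot.com/v2/getMonitors \
    -d "api_key=${UPTIMEROBOT_API_KEY}" -d "format=json" \
    | jq -r '.monitors[].friendly_name'
}

# Create an HTTP(s) monitor (type=1, 300-second interval) unless the
# name is already present. $1 = name, $2 = URL, $3 = existing names.
ensure_http_monitor() {
  name="$1"; url="$2"; existing="$3"
  if printf '%s\n' "$existing" | name_exists "$name"; then
    echo "skip $name"
    return 0
  fi
  echo "create $name"
  curl -fsS https://api.uptimerobot.com/v2/newMonitor \
    -d "api_key=${UPTIMEROBOT_API_KEY}" -d "format=json" \
    -d "type=1" -d "interval=300" \
    --data-urlencode "friendly_name=${name}" \
    --data-urlencode "url=${url}" >/dev/null
}

# Example (performs real API calls):
#   existing=$(list_monitor_names)
#   ensure_http_monitor po-cms-qual https://cms.qual.portugalodyssey.pt/ "$existing"
```

Matching on the friendly name alone means a monitor whose URL has changed is still skipped, which is consistent with the advice to delete it in the UI before re-provisioning.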

Option B: UptimeRobot web UI

  1. Log in at uptimerobot.com
  2. + New monitor
  3. Pick HTTP(s) for the rows whose anchor is 200 OK, and Keyword for the rows that list a keyword in the table above
  4. Friendly name + URL exactly as in the table
  5. Monitoring interval: 5 minutes; threshold: 2 consecutive failures
  6. Alert contact: select your email contact
  7. Repeat for all 7 entries

Verifying alerts

Don't trust an unverified alert pipeline — run a deliberate fail to confirm email delivery:

ssh root@31.97.159.7 docker stop po-public-fo-qual
# Wait 10–15 minutes (2 × 5-min interval)
# An email should arrive at the configured contact
ssh root@31.97.159.7 docker start po-public-fo-qual
# A recovery email should follow within ~5 min

Skipping this step is exactly how the prior 4-month CI silence (Jan–Apr 2026) went unnoticed — silent alert pipelines are worse than no monitoring.

When to update

  • New service deployed with a public URL → add a monitor (mirror the closest pattern in the table)
  • URL renamed → delete old monitor, create new (UptimeRobot doesn't support URL edits without losing history; cheaper to recreate)
  • Going to prod → duplicate every po-*-qual row as po-*-prod once prod hostnames are live

Telegram integration (optional, paid tier)

Free-tier UptimeRobot only does email. If you want Telegram pings (matching the project's existing Telegram channel), two options:

  1. Pro tier ($7/mo) unlocks webhook + Telegram alert contacts.
  2. Free workaround: configure UptimeRobot to email a parser address (e.g. an n8n / Zapier hook) that reposts to your bot's Telegram channel.
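For the free workaround, the repost step could look something like the sketch below, which uses the Telegram Bot API sendMessage method. The environment variables TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID, the function names, and the message format are all assumptions. How the parser extracts the monitor name and status from the email is left to the n8n / Zapier side.

```shell
# Build the alert text. $1 = monitor friendly name, $2 = new status.
format_alert() {
  printf '[UptimeRobot] %s is %s\n' "$1" "$2"
}

# Post message $1 to a Telegram chat through the Bot API (needs curl).
# --data-urlencode keeps spaces and brackets in the text intact.
send_telegram() {
  curl -fsS "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    -d "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=$1" >/dev/null
}

# Example (performs a real request):
#   send_telegram "$(format_alert po-public-fo-qual DOWN)"
```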

Defer until alert volume justifies it.