Runbook: enable MinIO as gitlab-runner distributed cache backend

Status: ready to execute. Operator-gated. Tracks: Riff #147. Effort: ~30 min operator time + 1 verification cycle. Prerequisite: dev-laptop gitlab-runner must be running (it is, since 2026-05-19).

Why

After the 2026-05-17/18 incident — cache: clauses hanging every CI job for 1h on the (now-retired) srv884655 runner — sC stripped all cache: clauses from .gitlab-ci.yml. The cold-cache cost is ~30–60s per service npm ci. Acceptable but optimisable: a real distributed cache backend would let us reintroduce caches safely.

CI moved to the dev-laptop runner on 2026-05-19 (per ci-runner-architecture.md). The cache backend of choice is now MinIO on the qual VPS, which already runs there for file-service uploads.

Steps

1. Provision MinIO bucket + access key

SSH to qual VPS (po-platform@31.97.159.7). MinIO admin UI is at https://minio.qual.portugalodyssey.pt (or via the mc CLI in the container).

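# Inside the minio-qual container (or with mc configured against the alias).
# Single quotes make the variables expand inside the container, where the
# MinIO root credentials are set, rather than in the host shell:
docker exec po-minio-qual sh -c \
  'mc alias set local http://localhost:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"'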

# Create bucket + lifecycle (objects are deleted 7 days after creation).
# Older mc releases spell the second command `mc ilm add`.
docker exec po-minio-qual mc mb local/gitlab-cache
docker exec po-minio-qual mc ilm rule add local/gitlab-cache \
  --expire-days 7

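# Create a dedicated user for the runner. First create the policy in the
# admin UI policy editor, scoped to read/write on gitlab-cache only:
#   Policy name: gitlab-cache-rw
#   Resources:   arn:aws:s3:::gitlab-cache     (s3:ListBucket)
#                arn:aws:s3:::gitlab-cache/*   (s3:GetObject, s3:PutObject, s3:DeleteObject)
# Generate the secret into a variable first so it can be recorded afterwards;
# the user name (gitlab-runner-cache) doubles as the access key.
RUNNER_CACHE_SECRET="$(openssl rand -hex 16)"
docker exec po-minio-qual mc admin user add local gitlab-runner-cache \
  "$RUNNER_CACHE_SECRET"
docker exec po-minio-qual mc admin policy attach local gitlab-cache-rw \
  --user gitlab-runner-cache
echo "$RUNNER_CACHE_SECRET"   # copy into the password manager, then clear the terminal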
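
The gitlab-cache-rw policy can also be created from the CLI instead of the UI. The sketch below only writes the policy document. The helper name write_cache_policy and the /tmp path are assumptions, and mc admin policy create assumes a recent mc release (older releases used mc admin policy add). Note that s3:ListBucket is a bucket-level action, so it is granted on the bucket ARN, while the object actions are granted on the objects under it.

```shell
# Hypothetical helper: write an S3 policy document for the cache bucket.
# Bucket-level actions (ListBucket) target the bucket ARN; object-level
# actions target the objects inside it.
write_cache_policy() {
  out_file=$1
  cat > "$out_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::gitlab-cache"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::gitlab-cache/*"]
    }
  ]
}
EOF
}

# Usage (not run here): copy the file into the container, then load it.
#   write_cache_policy /tmp/gitlab-cache-rw.json
#   docker cp /tmp/gitlab-cache-rw.json po-minio-qual:/tmp/gitlab-cache-rw.json
#   docker exec po-minio-qual mc admin policy create local gitlab-cache-rw \
#     /tmp/gitlab-cache-rw.json
```

The policy has to exist before mc admin policy attach is run, whichever way it is created.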

Capture the access key (the user name, gitlab-runner-cache) and the generated secret; both are pasted into the runner config below. Store them in your password manager and in ~/.secrets/gitlab-runner-cache.env on the dev laptop.
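
One way to write the env file so that it is never world-readable is sketched below. The helper name write_cache_env and the variable names CACHE_S3_ACCESS_KEY / CACHE_S3_SECRET_KEY are illustrative assumptions; nothing in the runner reads this file, it is only a local record.

```shell
# Hypothetical helper: store the runner's cache credentials in an env file
# readable only by the current user. Variable names are illustrative.
write_cache_env() {
  env_file=$1
  access_key=$2
  secret_key=$3
  mkdir -p "$(dirname "$env_file")"
  # Create (or truncate) the file with a restrictive mode before any
  # secret is written to it.
  ( umask 077 && : > "$env_file" )
  chmod 600 "$env_file"
  printf 'CACHE_S3_ACCESS_KEY=%s\nCACHE_S3_SECRET_KEY=%s\n' \
    "$access_key" "$secret_key" >> "$env_file"
}

# Usage (not run here):
#   write_cache_env "$HOME/.secrets/gitlab-runner-cache.env" \
#     gitlab-runner-cache '<secret from step 1>'
```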

2. Edit /etc/gitlab-runner/config.toml on the dev laptop

sudo cp /etc/gitlab-runner/config.toml \
  /etc/gitlab-runner/config.toml.bak.$(date +%Y-%m-%d-pre-cache)
sudo nano /etc/gitlab-runner/config.toml

Add (or replace) the [runners.cache] block under the relevant [[runners]] entry. The runner tag is dev-runner:

[[runners]]
  name = "jmeireles-Latitude-5401"
  url = "https://gitlab.com/"
  token = "..."  # unchanged
  executor = "docker"
  # ... other existing settings ...

  [runners.cache]
    Type = "s3"
    Path = "po-platform"   # namespace inside the bucket
    Shared = false          # no cross-runner sharing; cache paths stay per runner
    [runners.cache.s3]
      ServerAddress = "minio.qual.portugalodyssey.pt"
      AccessKey = "<from step 1>"
      SecretKey = "<from step 1>"
      BucketName = "gitlab-cache"
      BucketLocation = "us-east-1"  # MinIO ignores; field is required
      Insecure = false               # MinIO is fronted by Traefik + LE

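Verify the runner registration and restart (gitlab-runner verify checks the connection to GitLab; it does not exercise the cache settings):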

sudo gitlab-runner verify
sudo systemctl restart gitlab-runner
sudo journalctl -u gitlab-runner -n 50 --no-pager

You should see Configuration loaded with no errors.
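
Since the original incident was a hang on the cache step, it may be worth confirming from the dev laptop that MinIO answers quickly before running any pipeline. The sketch below uses MinIO's unauthenticated liveness endpoint with a hard timeout. The helper name check_minio_live is an assumption, and it presumes the hostname routes to the S3 API port rather than only to the admin console.

```shell
# Hypothetical helper: succeed only if the MinIO liveness endpoint answers
# within the time limit (default 10 seconds).
check_minio_live() {
  base_url=$1
  timeout_s=${2:-10}
  curl --fail --silent --show-error --max-time "$timeout_s" \
    --output /dev/null "$base_url/minio/health/live"
}

# Usage (not run here):
#   check_minio_live https://minio.qual.portugalodyssey.pt && echo "MinIO reachable"
```

A non-zero exit here (timeout, TLS error, or an HTTP error status) is a reason to stop before the smoke test.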

3. Smoke test

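Push a tiny change to frontends/public-fo/ (e.g. a no-op comment edit) and watch lint-frontend in the pipeline. Because all cache: clauses were stripped, the job only exercises the backend if lint-frontend carries a cache: block on the smoke branch (the template from step 4 works):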

git checkout -b sA/cache-smoke main
echo "// cache smoke" >> frontends/public-fo/src/main.tsx
git commit -am "[sA] chore(ci): smoke MinIO cache backend"
git push -u origin sA/cache-smoke

In the GitLab CI logs for lint-frontend:

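  • First push: Checking cache for <key>... reports a miss (nothing is in the bucket yet), and the end of the job shows the upload to MinIO followed by Created cache. If the log says No URL provided instead, the runner did not pick up the [runners.cache.s3] settings and is only caching locally.
  • Second push (push any trivial edit): logs should show Successfully extracted cache, and the install step should be visibly faster than on the first push.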
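
To confirm that the archive actually reached MinIO rather than only the runner's local disk, the bucket can be listed on the qual VPS after the first pipeline. This is a minimal sketch; the helper names are assumptions, while the container, alias, bucket and Path prefix are the ones used in this runbook.

```shell
# Hypothetical helper: list cache objects under the runner's Path prefix.
list_cache_objects() {
  docker exec po-minio-qual mc ls --recursive local/gitlab-cache/po-platform/
}

# Hypothetical helper: succeed if at least one object exists under the prefix.
cache_uploaded() {
  [ -n "$(list_cache_objects)" ]
}

# Usage (not run here):
#   cache_uploaded && echo "cache object present" || echo "bucket prefix is empty"
```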

If the first run hangs again on Checking cache..., abort and roll back via sudo systemctl stop gitlab-runner && sudo cp /etc/gitlab-runner/config.toml.bak.<date>-pre-cache /etc/gitlab-runner/config.toml && sudo systemctl start gitlab-runner (the backup from step 2 is named config.toml.bak.<YYYY-MM-DD>-pre-cache). Then file a follow-up Riff with the journalctl output.

4. Reintroduce per-job caches (after smoke is green)

In .gitlab-ci/frontend.yml, add to the relevant jobs. Suggested set:

.frontend_cache_template: &frontend_cache_template
  cache:
    key:
      files:
        - frontends/public-fo/package-lock.json
    paths:
      - frontends/public-fo/node_modules/

lint-frontend:
  <<: *frontend_cache_template
  # ... rest unchanged

type-check-frontend:
  <<: *frontend_cache_template
  # ...

test-frontend-public-fo:
  <<: *frontend_cache_template
  # ...
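
One caveat with caching node_modules/: npm ci removes an existing node_modules directory before it installs, so a restored cache only saves time if the job skips the install on a hit. Because the cache key is derived from package-lock.json, a restored directory implies an unchanged lockfile. A minimal sketch for the job's script section follows; the helper name install_unless_cached is an assumption.

```shell
# Hypothetical helper for a job's script: section.
# npm ci deletes any existing node_modules before installing, so run it
# only when the cache did not restore the directory.
install_unless_cached() {
  app_dir=$1
  if [ -d "$app_dir/node_modules" ]; then
    echo "node_modules restored from cache; skipping npm ci"
  else
    ( cd "$app_dir" && npm ci )
  fi
}

# Usage (not run here):
#   install_unless_cached frontends/public-fo
```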

Mirror for partner-console and admin-console (replace path with the appropriate frontends/<app>/). For test-services-integration, use a multi-path cache:

test-services-integration:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - services/*/node_modules/
      - .npm/

Do NOT reintroduce a project-global cache: clause at the top of .gitlab-ci.yml. That's the structural risk that bit us 2026-05-17 — if the cache backend is misconfigured, every job inheriting the global will hang. Per-job caches isolate the blast radius.

5. Update the CI doc

After the smoke is green:

  • Add a one-paragraph note to ci-runner-architecture.md under "Operations" pointing here.
  • Flip Riff #147 to done.

Rollback plan

If the cache backend starts hanging jobs again (1h timeouts return):

sudo systemctl stop gitlab-runner
sudo cp /etc/gitlab-runner/config.toml.bak.<date>-pre-cache /etc/gitlab-runner/config.toml
sudo systemctl start gitlab-runner

Then revert the per-job cache: blocks in .gitlab-ci/frontend.yml. The pipeline returns to the current cold-cache state.
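
If more than one backup has accumulated, the most recent one can be located rather than typed by hand. This is a small sketch; the helper name latest_config_backup is an assumption, and it selects by modification time.

```shell
# Hypothetical helper: print the most recently modified backup of config.toml,
# or nothing if no backup exists in the directory.
latest_config_backup() {
  config_dir=${1:-/etc/gitlab-runner}
  ls -t "$config_dir"/config.toml.bak.* 2>/dev/null | head -n 1
}

# Usage (not run here):
#   sudo cp "$(latest_config_backup)" /etc/gitlab-runner/config.toml
```

Check the printed path before copying; an empty result means there is no backup to restore.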

Acceptance criteria (Riff #147)

  • gitlab-runner config points to MinIO gitlab-cache bucket on minio.qual.portugalodyssey.pt.
  • ✅ Two sequential pipelines on the same branch show the second hitting cached node_modules/ (download + extract OK).
  • ✅ Per-job cache: clauses reintroduced without 1h timeouts.

Cross-references

  • Riff #147: tasks-prod
  • CI runner architecture: ci-runner-architecture.md
  • Incident context: VPS pathology + cache hang sequence — docs/ai/sessions/active.md § "VPS pathology" (2026-05-18, sC)
  • gitlab-runner cache config docs: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnerscaches3-section