Skip to content

ADR 003: Managed Database Migrations (dbmate)

Date: 2026-04-26 Status: Implemented

Context

Until this ADR, schemas reached the database via Docker bind-mounts: infrastructure/scripts/*.sql files were mounted into /docker-entrypoint-initdb.d/ on the postgres container. This pattern has one decisive failure mode — the entrypoint scripts only run on first init of an empty volume. Once postgres has booted once, any new SQL file added to that directory is silently ignored.

Wave 1 demonstrated the gap twice:

  • W1-4 introduced rag-vectors-schema.sql. Existing dev/qual/prod Postgres volumes never see it; the rag database gets created (via the parallel init-multiple-databases.sh) but stays empty until an operator manually runs psql -d rag -f rag-vectors-schema.sql.
  • The calendar service has 11 numbered SQL files (calendar-migration-002 through 011) that grew over time. Each new one had to be applied to existing dbs by hand. One of them (calendar-migration-009) ended up in services/calendar-service/migrations/ instead of infrastructure/scripts/ — an out-of-place file the community used to track manual application.

CLAUDE.md flagged this as a known gap: "There is no migration runner... CI does not apply migrations... high risk of code/schema drift at deploy time."

The product is still pre-launch — qual has only test data; prod has no real customers — so this is the right moment to fix it without paying the cost of preserving live data.

Decision

Adopt dbmate as the migration runner.

Why dbmate

  • Single static binary (Go), runs as a Docker image (amacneil/dbmate:latest). No language coupling — works equally well for Node, Python, and Strapi services that share the same Postgres.
  • Raw SQL files, not a DSL or ORM. Matches po-platform's existing convention (no ORM in any service).
  • Up + down migrations in one file with -- migrate:up / -- migrate:down markers. Reversibility is opt-in per migration.
  • Tracks state in schema_migrations table inside each managed database — no external state store required.
  • No application bootstrap dependency: services don't need to import a migration library; the runner is invoked separately.

Considered and rejected:

  • golang-migrate — similar shape, slightly more sophisticated CLI than po-platform needs. dbmate's simpler surface won.
  • sqitch — verify/deploy/revert with explicit dependencies. Heavyweight for current scale.
  • alembic — Python-only, tied to SQLAlchemy. Wrong fit for the raw-SQL convention shared by Node + Python services.
  • flyway — Java tooling. No Java elsewhere in the stack.

Layout

infrastructure/migrations/
├── iam/
│   └── 00001_initial_schema.sql
├── calendar/
│   ├── 00001_initial_schema.sql       (rolls up historical 001-007)
│   ├── 00008_activity_log.sql
│   ├── 00009_activity_log_translation_keys.sql
│   ├── 00010_connection_template_binding.sql
│   └── 00011_exception_sync_color.sql
└── rag/
    └── 00001_pgvector_initial.sql

One directory per managed database. Migration filenames keep historical numbering where it carries meaning (calendar's 008-011 stay aligned with their original sequence) but the runner only cares about lexicographic order.

Runner

infrastructure/scripts/migrate.sh is the entrypoint. It dispatches based on $ENV (dev / qual / prod):

  • Picks the right Docker network (po-postgres-network for dev, po-shared-network for qual/prod).
  • Picks the right database suffix ("" / _qual / _prod).
  • Iterates over each managed database (iam, calendar, rag) — or a single one when make db-migrate-dev DB=calendar is specified.
  • Runs amacneil/dbmate:latest --no-dump-schema up against each.

Makefile exposes:

Target Effect
make db-migrate-dev Apply migrations against the dev postgres (chains automatically off make dev)
make db-migrate-qual Apply against qual (manual; CI integration is a P2.5 follow-up)
make db-migrate-prod Apply against prod (manual gate, matches the prod-deploy doctrine)
make db-new-migration DB=<name> NAME=<slug> Generate a stub migration via dbmate new

What the postgres container does now

init-multiple-databases.sh is still bind-mounted into docker-entrypoint-initdb.d — it creates the empty databases listed in POSTGRES_MULTIPLE_DATABASES on first init. Nothing else is mounted. The previous schema mounts (iam-schema.sql, calendar-schema.sql, rag-vectors-schema.sql) are deleted. After first init, the postgres has the right databases but empty schemas; make db-migrate-dev populates them.

Consequences

Positive

  • Schema changes deploy reliably to existing databases, not just fresh inits.
  • No schema drift between developers' local dbs and qual/prod — the schema_migrations table makes the applied set inspectable.
  • Reversibility is now possible per migration (down blocks captured for every new migration; legacy migrations document their irreversibility honestly).
  • One migration source of truth under infrastructure/migrations/ — kills the previous split between infrastructure/scripts/calendar-migration-*.sql and services/calendar-service/migrations/calendar-migration-009-*.sql.
  • Consistent across managed databases — iam, calendar, and rag all follow the same pattern. New databases (e.g., when a future service needs its own) plug in with a new directory.

Negative

  • Two-step boot in dev: docker compose up brings postgres up empty; make db-migrate-dev is the second step. Mitigated by chaining migration into the make dev target — the user-visible flow is still one command.
  • Greenfield baseline: this commit drops the docker-entrypoint mounts. Existing dev/qual/prod databases that were initialized via the old mount mechanism continue to work as-is (their schemas already exist) — but their schema_migrations table is empty, so a dbmate up would attempt to re-apply migrations and conflict on CREATE TABLE. The platform's pre-launch state (test data only, willingness to wipe) makes the decision cheap: on first deploy after this ADR, the postgres volumes for dev/qual/prod must be wiped. From that point forward, dbmate manages everything.
  • No automatic CI integration yet: make db-migrate-qual is invoked manually after deploy. Future work: add a CI step that runs migrations before app services restart in qual.
  • dbmate is a vendor dependency: pin a major version (amacneil/dbmate:latest is fine for now; pin to 2.x if breaking changes land).

Migration cutover sequence

The cutover is wrapped in two make targets — operators don't need to remember the sequence.

Dev (laptop)

make db-reset-dev

Wipes po-postgres-data, restarts the dev stack, applies all migrations. 5-second abort window before the destructive step.

Qual + Prod (single shared Postgres on the VPS)

make db-reset-shared CONFIRM=yes

Stops qual + prod + shared compose stacks on the VPS, wipes the shared po-postgres-data volume, brings the shared stack back up, applies db-migrate-qual AND db-migrate-prod (since the same physical postgres serves both with _qual/_prod suffixed databases), then brings qual + prod stacks back up. The CONFIRM=yes gate prevents accidental invocation.

After the cutover, ongoing migrations run automatically as part of deploy-qual / deploy-prod in CI (see "CI integration" below). Operators run make db-migrate-{qual,prod} manually only when iterating outside the deploy pipeline.

CI integration

.gitlab-ci/infrastructure.yml's ssh_deploy_template runs bash infrastructure/scripts/migrate.sh ${MIGRATE_ENV} after the registry pull and before docker compose up -d. MIGRATE_ENV is derived from CI_ENVIRONMENT_NAME (productionprod, anything else → qual). Idempotency: dbmate skips already-applied migrations; the step is a no-op when nothing new lands.

A failed migration aborts the deploy — the old app keeps running against the old schema until we fix forward. This is the safer default than half-deploying new code against old schema.

Running migrations from a laptop

The runner script (infrastructure/scripts/migrate.sh) needs the po-shared-network Docker network for qual/prod. That network only exists on the VPS. So make db-migrate-qual / make db-migrate-prod SSH to the VPS and run the script there. make db-migrate-dev runs locally against po-postgres-network.

Password sourcing for qual/prod: the runner reads POSTGRES_PASSWORD from infrastructure/compose/.env.shared on the VPS automatically. No need to pass secrets through SSH command-lines.

Future migrations

New schema changes go through make db-new-migration DB=<db> NAME=<snake_case>. Edit the generated infrastructure/migrations/<db>/NNNNN_<name>.sql file, fill in the -- migrate:up and -- migrate:down blocks, commit. CI/operator runs make db-migrate-<env> as part of the deploy pipeline.

The schema_migrations table prevents accidental re-application; dbmate skips already-applied migrations by version.

Follow-ups

  • CI integration for qual — add a db-migrate-qual step to .gitlab-ci/infrastructure.yml after the qual deploy. Tracked in P2.5 follow-ups.
  • Consider per-service schemas later — currently iam, calendar, rag are the only managed databases; the rest (booking, experience, analytics, partner, contract, etc.) have no schema files committed (each service manages its own via raw SQL at runtime). When that pattern becomes painful, add directories under infrastructure/migrations/ and grow the runner's database list.
  • Track dbmate version pins if/when the :latest tag introduces a breaking change.

References