Characterization Test Pattern
How to write characterization tests for legacy po-platform services. Read this before starting any Phase 3 backfill slice.
What characterization tests are
Characterization tests lock the current, observed behavior of a system so that future refactors fail loudly if they drift. Unlike TDD (write a failing test first, then make it pass), characterization testing starts with working code: you run the code, record what it does, and write a test that asserts exactly that outcome. The goal is not correctness — bugs that exist today get locked in too, then fixed under cover of the green suite. Plan #026's strategic stance: backfill tests are a safety net for refactors, not a bug-hunt. File bugs as Riff issues; fix them in separate slices after the lock is in place.
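The record-then-assert loop can be sketched in a few lines. The example below is illustrative only: `legacyFeeCents` is a hypothetical stand-in for legacy code (it is not a po-platform function), and `driftReport` is a made-up helper that plays the role a Jest assertion would play in a real spec. Note that the pinned value for 1030 is arguably a bug, and it is locked in anyway.

```typescript
// Hypothetical legacy function standing in for real service code.
// Quirk: it truncates fractional cents instead of rounding.
function legacyFeeCents(amountCents: number): number {
  return Math.floor((amountCents * 25) / 1000); // 2.5% fee
}

interface Observation {
  amount: number;
  fee: number;
}

// Step 1: run the code and record what it does today.
const inputs = [1000, 1030, 0];
const observed: Observation[] = inputs.map((amount) => ({
  amount,
  fee: legacyFeeCents(amount),
}));

// Step 2: pin the recorded values. 1030 -> 25 looks wrong (25.75 would
// round to 26), but the characterization test asserts it as-is.
const pinned: Observation[] = [
  { amount: 1000, fee: 25 },
  { amount: 1030, fee: 25 },
  { amount: 0, fee: 0 },
];

// Step 3: any later refactor that changes an output shows up as drift.
function driftReport(actual: Observation[], expected: Observation[]): string[] {
  const drifts: string[] = [];
  expected.forEach((exp, i) => {
    const got = actual[i]?.fee;
    if (got !== exp.fee) {
      drifts.push(`amount=${exp.amount}: expected ${exp.fee}, got ${String(got)}`);
    }
  });
  return drifts;
}

console.log(driftReport(observed, pinned)); // empty while behavior is unchanged
```

If a refactor "fixes" the rounding, the report becomes non-empty, which is the signal to stop and decide whether the change belongs in a separate bug-fix slice.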
The three-layer pattern
Each layer catches a distinct class of defect. All three layers are needed for complete boundary coverage.
Layer 1: Unit (mock DB + mock externals)
Purpose: verify service business logic in isolation. Fast (<2s wall-time for a full service). Catches logic errors, conditional branches, error-propagation paths.
When to write: every service method branch, including error paths. If a code path exists, there should be a unit test for it.
When NOT enough: unit tests with mocked DB cannot catch SQL typos, missing columns, FK violations, or raw-query logic bugs. They also cannot catch HTTP shape regressions (NestJS pipe behavior, guard order, route-parameter types). You need layers 2 and 3 for those.
Code template (NestJS service, Jest):
// src/modules/payments/payments.service.spec.ts
import { Test, TestingModule } from '@nestjs/testing';
import { PaymentsService } from './payments.service';
import { DatabaseService } from '../../database/database.service';
import { StripeService } from '../stripe/stripe.service';
describe('PaymentsService', () => {
let service: PaymentsService;
let dbQuery: jest.Mock;
let stripeCreate: jest.Mock;
beforeEach(async () => {
dbQuery = jest.fn();
stripeCreate = jest.fn();
const module: TestingModule = await Test.createTestingModule({
providers: [
PaymentsService,
{ provide: DatabaseService, useValue: { query: dbQuery } },
{ provide: StripeService, useValue: { paymentIntents: { create: stripeCreate } } },
],
}).compile();
service = module.get(PaymentsService);
});
it('creates a payment intent and persists it', async () => {
stripeCreate.mockResolvedValue({ id: 'pi_test', client_secret: 'cs_test' });
dbQuery.mockResolvedValue({ rows: [{ id: 'uuid-1', stripe_intent_id: 'pi_test' }] });
const result = await service.createIntent({ amount: 1000, currency: 'eur', partnerId: 'p-1' });
expect(result.stripeIntentId).toBe('pi_test');
expect(dbQuery).toHaveBeenCalledWith(expect.stringContaining('INSERT INTO payments'), expect.any(Array));
});
});
Layer 2: Supertest E2E (HTTP shape + guards)
Purpose: verify the full HTTP stack — routing, ValidationPipe (rejects bad input), ParseUUIDPipe (rejects non-UUID path params), auth guards (401/403 on missing/wrong JWT), and response shape. Does not hit a real database; external services remain mocked.
What it catches that unit tests don't:
- Route path mismatches (e.g. /payments vs /payments/intents)
- ValidationPipe stripping unexpected fields (whitelist: true)
- ParseUUIDPipe rejecting non-UUID IDs before the service method is called
- Auth guard order (global guard vs route-level override)
- Response serialization shape (what actually comes over the wire)
Code template (NestJS + supertest):
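// test/integration/payments/payments.controller.e2e-spec.ts
import * as request from 'supertest';
import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication, ValidationPipe } from '@nestjs/common';
import { AppModule } from '../../../src/app.module';
import { DatabaseService } from '../../../src/database/database.service';
import { StripeService } from '../../../src/modules/stripe/stripe.service';
// NOTE: `validJwt` must be defined before these tests run (for example, minted
// by a service-specific test helper); how it is created is not shown in this template.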
describe('PaymentsController (e2e)', () => {
let app: INestApplication;
beforeAll(async () => {
const module: TestingModule = await Test.createTestingModule({
imports: [AppModule],
})
.overrideProvider(DatabaseService).useValue({ query: jest.fn().mockResolvedValue({ rows: [] }) })
.overrideProvider(StripeService).useValue({ paymentIntents: { create: jest.fn().mockResolvedValue({ id: 'pi_test', client_secret: 'cs_test' }) } })
.compile();
app = module.createNestApplication();
app.useGlobalPipes(new ValidationPipe({ whitelist: true, forbidNonWhitelisted: false }));
await app.init();
});
afterAll(() => app.close());
it('POST /payments/intents → 201 with valid body', () =>
request(app.getHttpServer())
.post('/payments/intents')
.set('Authorization', `Bearer ${validJwt}`)
.send({ amount: 1000, currency: 'eur', partnerId: 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11' })
.expect(201)
.expect((res) => expect(res.body.stripeIntentId).toBeDefined()));
it('POST /payments/intents → 401 without JWT', () =>
request(app.getHttpServer())
.post('/payments/intents')
.send({ amount: 1000, currency: 'eur', partnerId: 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11' })
.expect(401));
it('POST /payments/intents → 400 with invalid UUID partnerId', () =>
request(app.getHttpServer())
.post('/payments/intents')
.set('Authorization', `Bearer ${validJwt}`)
.send({ amount: 1000, currency: 'eur', partnerId: 'not-a-uuid' })
.expect(400));
});
Layer 3: @po/test-db integration (real Postgres)
Purpose: verify raw SQL correctness against the real schema. Catches UNIQUE/CHECK/FK constraint violations, missing columns, wrong data types, NULL-handling bugs in queries, and transaction-isolation invariants.
What it catches that supertest doesn't:
- SQL typos or wrong column names (supertest mocks the DB, so bad SQL never runs)
- Schema constraint enforcement (UNIQUE violations, CHECK failures, FK rejections)
- Raw-query logic bugs (wrong JOIN, bad WHERE, missing RETURNING)
- Data-shape discrepancies between the audit and the real schema (see Pitfalls section)
Code template (@po/test-db + withRollback):
// test/integration/payments/insert-payment.integration-spec.ts
import { testPool } from '../setup';
import { withRollback } from '@po/test-db';
describe('payments — insertPayment', () => {
it('inserts a payment row with pending status', async () => {
await withRollback(testPool, async (client) => {
const result = await client.query(
`INSERT INTO payments (id, stripe_intent_id, amount, currency, status, partner_id)
VALUES (gen_random_uuid(), $1, $2, $3, 'pending', $4)
RETURNING *`,
['pi_test_123', 1000, 'eur', 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'],
);
expect(result.rows[0].status).toBe('pending');
expect(result.rows[0].stripe_intent_id).toBe('pi_test_123');
expect(result.rows[0].created_at).toBeDefined();
});
});
it('rejects duplicate stripe_intent_id (UNIQUE constraint)', async () => {
await withRollback(testPool, async (client) => {
const insert = `INSERT INTO payments (id, stripe_intent_id, amount, currency, status, partner_id)
VALUES (gen_random_uuid(), $1, $2, $3, 'pending', $4)`;
const args = ['pi_dup', 500, 'eur', 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'];
await client.query(insert, args);
await expect(client.query(insert, args)).rejects.toThrow(/unique|duplicate/i);
});
});
});
Shared test setup:
// test/integration/setup.ts
import { Pool } from 'pg';
export const testPool = new Pool({
host: process.env.TEST_DB_HOST ?? 'localhost',
port: Number(process.env.TEST_DB_PORT ?? 5432),
database: process.env.TEST_DB_NAME ?? 'payment_test',
user: process.env.TEST_DB_USER ?? 'postgres',
password: process.env.TEST_DB_PASSWORD ?? 'postgres',
});
afterAll(() => testPool.end());
Add forceExit: true to jest.config.ts for the integration test project — long-lived pools prevent clean Jest shutdown otherwise:
// jest.config.ts (integration project section)
{
displayName: 'integration',
testMatch: ['**/*.integration-spec.ts'],
forceExit: true,
}
Layer 3 pre-flight: service tsconfig.json must set isolatedModules: true.
@po/test-db is consumed as a workspace symlink ("@po/test-db": "file:../../packages/test-db"), and the main field of its package.json points at src/index.ts (raw TypeScript). When ts-jest transforms an integration spec, it follows the symlink and tries to fully type-check packages/test-db/src/with-rollback.ts. TypeScript module resolution can't find pg's types because pg is a peerDependency of test-db and no node_modules directory exists at packages/test-db/.
Without the setting, integration specs fail at compile time with error TS2307: Cannot find module 'pg' or its corresponding type declarations. Setting "isolatedModules": true in the consumer service's compilerOptions makes ts-jest do per-file transpilation without cross-package type-checking. Type-checking remains intact in CI via tsc --noEmit.
// services/<service>/tsconfig.json
{
"compilerOptions": {
"isolatedModules": true,
"...": "other options"
}
}
This is a per-service requirement until a shared tsconfig.base.json is introduced for services/*.
Pre-flight: per-service inventory
Before writing a single test, build an inventory of the service's test surface. This drives sizing and shapes which spec files to create. The payment-service audit (payment-service-audit.md) is the worked example.
What to inventory:
| Item | How to count | Drives |
|---|---|---|
| Controllers + routes | grep for @Get, @Post, @Put, @Delete, @Patch; count distinct route paths | Supertest spec count (2 specs per protected route: happy + auth-fail; 1 for public routes) |
| Repository methods | grep for this.db.query / client.query; cluster by the operation they perform; count distinct clusters | @po/test-db integration spec count (1-2 specs per cluster: happy + constraint edge case) |
| Consumers + webhook handlers | grep for @RabbitSubscribe, @EventPattern, or service process() method branches | Integration spec count (1 per logical family; add idempotency test per consumer) |
| Existing test files | list *.spec.ts + *.test.ts; classify each as unit/supertest/integration | Identifies gaps; avoids duplication |
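As a rough illustration of the "Controllers + routes" row, the grep can be approximated in TypeScript. This is a sketch under stated assumptions: `listRoutes` and `countDistinctRoutes` are hypothetical helpers (not part of any po-platform tooling), the regex only recognizes decorators with a literal string path or no argument, and the sample controller is invented so the block runs on its own.

```typescript
interface RouteHit {
  method: string;
  path: string;
}

// Extracts NestJS route decorators with a literal string path (or no path).
// Decorators that take arrays, constants, or template strings are not matched.
function listRoutes(source: string): RouteHit[] {
  const pattern = /@(Get|Post|Put|Delete|Patch)\(\s*(?:'([^']*)'|"([^"]*)")?\s*\)/g;
  const hits: RouteHit[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    hits.push({
      method: (match[1] ?? '').toUpperCase(),
      path: match[2] ?? match[3] ?? '',
    });
  }
  return hits;
}

// Distinct routes are keyed on method + path within one controller file.
function countDistinctRoutes(source: string): number {
  return new Set(listRoutes(source).map((r) => `${r.method} ${r.path}`)).size;
}

// Hypothetical controller source, inlined so the sketch is self-contained.
const sampleController = `
@Controller('payments')
export class PaymentsController {
  @Post('intents')
  create() {}

  @Get(':id')
  findOne() {}

  @Get()
  list() {}
}
`;

console.log(listRoutes(sampleController));
console.log(countDistinctRoutes(sampleController)); // 3
```

A plain grep is usually enough in practice; a script like this only helps when the inventory should be regenerated repeatedly or fed into the sizing table.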
Sizing output table (fill in per service):
| Surface | Count | Estimated specs | Estimated time (Opus) |
|---|---|---|---|
| Protected routes | N | N × 2 supertest | ~10 min each |
| Public / signature-auth routes | M | M × 1 supertest | ~10 min each |
| Repository method clusters | R | R × 1–2 integration | ~15 min each |
| Consumer / webhook families | C | C × 1 integration + idempotency | ~20 min each |
| Total | | up to ~(2N + M + 2R + 2C) specs | |
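The spec-count arithmetic in the table can be written out as a small function. This is a sketch only: `Inventory`, `SpecEstimate`, and `estimateSpecs` are hypothetical names, the sample counts are invented, and time estimates are deliberately left out because the table does not say whether "each" means per spec or per route.

```typescript
interface Inventory {
  protectedRoutes: number;   // N
  publicRoutes: number;      // M
  repoClusters: number;      // R
  consumerFamilies: number;  // C
}

interface SpecEstimate {
  supertest: number;
  integrationMin: number;
  integrationMax: number;
  totalMin: number;
  totalMax: number;
}

function estimateSpecs(inv: Inventory): SpecEstimate {
  // Protected routes: happy + auth-fail. Public routes: happy only.
  const supertest = inv.protectedRoutes * 2 + inv.publicRoutes;
  // Each consumer family: 1 integration spec + 1 idempotency spec.
  const consumerSpecs = inv.consumerFamilies * 2;
  // Each repository cluster: 1 to 2 specs.
  const integrationMin = inv.repoClusters + consumerSpecs;
  const integrationMax = inv.repoClusters * 2 + consumerSpecs;
  return {
    supertest,
    integrationMin,
    integrationMax,
    totalMin: supertest + integrationMin,
    totalMax: supertest + integrationMax,
  };
}

// Invented counts for illustration, not taken from any real service audit.
const example = estimateSpecs({
  protectedRoutes: 8,
  publicRoutes: 3,
  repoClusters: 6,
  consumerFamilies: 2,
});
console.log(example); // totalMax equals 2N + M + 2R + 2C = 35
```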
Critical pre-flight step: always run \d <table> against the real schema (psql -d <service>_test) for every table you plan to write integration tests for. Audits can be wrong — see Pitfalls. Discovering a missing column mid-test is a 30-minute derail.
Day-by-day execution structure
The payment-service pilot (Slice E) ran across 3 days. This structure is repeatable for any Phase 3 backfill slice.
Day 1: Foundation + supertest layer (~1–1.5h on Opus with plan + harness)
- Read the per-service inventory (or create one if it doesn't exist yet).
- Wire DatabaseService Pool injection (see next section; do this first, because it is the prerequisite for Days 2 and 3).
- Create the test/integration/ directory and a shared setup.ts (singleton pool + afterAll(pool.end)).
- Write supertest specs, one file per controller module. Happy path + auth-fail per protected route; happy-only for public/signature-auth routes.
- Add the service to infrastructure/ci/test-allowlist.txt once all supertest specs pass locally.
Day 2: DB integration layer (~1–1.5h on Opus)
- Cluster raw SQL queries into repository method groups (e.g., all INSERT INTO transfers queries = one cluster).
- Write one integration spec file per cluster using withRollback. Each spec: 1 happy path + 1-2 edge cases (UNIQUE/CHECK/FK violations, NULL handling).
- Insert parent rows inside the same withRollback block for FK-constrained tables.
- Verify the full suite still runs in <30s combined.
Day 3: Consumer + webhook handler integration (~30 min–1h on Opus, scope to depth)
- Write integration specs for consumers and event handlers: inject TEST_POOL_TOKEN, use unique IDs + explicit DELETE cleanup (see Patterns section).
- Add an idempotency test per consumer: call twice with the same ID, assert exactly one side-effect.
- Scope explicitly: don't try to cover all webhook branches in one slice. Pick the highest-risk families. Defer the rest to follow-up Riff stories.
- Write CHARACTERIZATION-NOTES.md: coverage shipped, deferred items, surprises, sizing actuals.
The DatabaseService Pool-injection prerequisite
This is the first task in any Phase 3 backfill slice. Do not skip it.
Most po-platform services inject DatabaseService, which constructs its own pg.Pool from environment config. To write @po/test-db integration tests that exercise service methods end-to-end (not just raw SQL), the service must accept an external pool so that tests can pass in the test pool.
The pattern established in Slice E (task 1.2):
// src/database/database.service.ts
import { Injectable, Optional, Inject } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { Pool, PoolClient } from 'pg';
export const TEST_POOL_TOKEN = 'TEST_POOL_TOKEN';
@Injectable()
export class DatabaseService {
private readonly pool: Pool;
constructor(
private readonly configService: ConfigService,
@Optional() @Inject(TEST_POOL_TOKEN) externalPool?: Pool,
) {
this.pool = externalPool ?? new Pool({
host: configService.get('DB_HOST'),
port: configService.get('DB_PORT'),
database: configService.get('DB_NAME'),
user: configService.get('DB_USER'),
password: configService.get('DB_PASSWORD'),
});
}
async query<T = any>(sql: string, params?: any[]): Promise<{ rows: T[] }> {
return this.pool.query(sql, params);
}
}
In test modules, register the test pool as the TEST_POOL_TOKEN provider:
const module = await Test.createTestingModule({
imports: [AppModule],
})
.overrideProvider(TEST_POOL_TOKEN)
.useValue(testPool)
.compile();
This replaces the service's internal pool with the test pool for the duration of the test module. Note: this does not give per-test transactional rollback (the service calls this.pool.query internally, not through the withRollback client). Use the unique-IDs + explicit DELETE cleanup pattern for service-method integration tests (see Patterns section).
Patterns from Slice E
Per-test transactional rollback (withRollback)
Use for table-level integration specs where you control the SQL directly:
it('inserts and reads back a refund row', async () => {
await withRollback(testPool, async (client) => {
// Arrange — insert prerequisite rows in same transaction
await client.query(`INSERT INTO payments (id, ...) VALUES ($1, ...)`, [paymentId, ...]);
// Act
const { rows } = await client.query(
`INSERT INTO refunds (id, payment_id, amount, status) VALUES (gen_random_uuid(), $1, $2, 'pending') RETURNING *`,
[paymentId, 500],
);
// Assert
expect(rows[0].status).toBe('pending');
});
// All rows rolled back — next test starts clean
});
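The implementation of withRollback lives in @po/test-db and is not reproduced in this document. The sketch below only illustrates the general BEGIN / ROLLBACK shape that such a helper is assumed to have; `withRollbackSketch`, `PoolLike`, and `ClientLike` are hypothetical names, and the real helper may differ in signature and error handling. The interfaces mirror the subset of pg's pool and client API used here (connect, query, release).

```typescript
// Minimal structural types covering only what the sketch needs.
interface ClientLike {
  query(sql: string, params?: unknown[]): Promise<unknown>;
  release(): void;
}

interface PoolLike {
  connect(): Promise<ClientLike>;
}

// Assumed shape of a rollback helper: every statement issued through
// `client` runs inside one transaction that is always rolled back.
async function withRollbackSketch<T>(
  pool: PoolLike,
  fn: (client: ClientLike) => Promise<T>,
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    return await fn(client);
  } finally {
    try {
      // Runs on success and on failure, so no test data is ever committed.
      await client.query('ROLLBACK');
    } finally {
      // Return the connection to the pool even if ROLLBACK itself fails.
      client.release();
    }
  }
}
```

The key property is that isolation only covers queries issued through the `client` passed to the callback. Anything the code under test sends through its own pool connection runs outside the transaction, which is why service-method tests need a different cleanup pattern.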
Unique-IDs + explicit DELETE cleanup (service-method integration)
Use when the service owns its pool internally (after TEST_POOL_TOKEN injection) and withRollback can't wrap the service's queries:
const testEventId = `evt_test_${Date.now()}_${Math.random().toString(36).slice(2)}`;
afterEach(async () => {
await testPool.query(`DELETE FROM webhooks_log WHERE event_id = $1`, [testEventId]);
await testPool.query(`DELETE FROM payments WHERE stripe_intent_id = $1`, [testEventId]);
});
it('processes a payment_intent.succeeded event', async () => {
// Arrange
const stripeEvent = buildStripeEvent('payment_intent.succeeded', { id: testEventId, ... });
// Act
const result = await webhooksService.process(stripeEvent);
// Assert DB side-effect via testPool
const { rows } = await testPool.query(`SELECT * FROM payments WHERE stripe_intent_id = $1`, [testEventId]);
expect(rows[0].status).toBe('succeeded');
expect(result.status).toBe('ok');
});
Foreign-key-aware test data
Insert parent rows inside the same withRollback block — FK constraints are live against the real schema:
await withRollback(testPool, async (client) => {
// Parent row first
await client.query(`INSERT INTO payments (id, stripe_intent_id, ...) VALUES ($1, $2, ...)`, [parentId, 'pi_parent']);
// Child row (FK: refunds.payment_id → payments.id)
await client.query(`INSERT INTO refunds (id, payment_id, ...) VALUES (gen_random_uuid(), $1, ...)`, [parentId]);
});
CHECK / UNIQUE constraint assertions
it('rejects negative amount (CHECK constraint)', async () => {
await withRollback(testPool, async (client) => {
await expect(
client.query(`INSERT INTO payments (id, amount, ...) VALUES (gen_random_uuid(), -1, ...)`)
).rejects.toThrow(/violates check constraint/i);
});
});
it('rejects duplicate stripe_intent_id (UNIQUE constraint)', async () => {
await withRollback(testPool, async (client) => {
const insert = `INSERT INTO payments (id, stripe_intent_id, ...) VALUES (gen_random_uuid(), $1, ...)`;
await client.query(insert, ['pi_dup']);
await expect(client.query(insert, ['pi_dup'])).rejects.toThrow(/unique|duplicate/i);
});
});
Gotcha: continuing after a constraint rejection inside withRollback needs a SAVEPOINT.
Postgres puts the surrounding transaction into the aborted state after any failing query. Subsequent queries in the same txn fail with current transaction is aborted, commands ignored until end of transaction block — including the positive-case follow-on INSERT that proves the constraint is per-column-group, not per-column. Wrap the expected-to-fail INSERT in a SAVEPOINT and ROLLBACK to it before continuing:
it('UNIQUE is per-(context, slug), not per-slug', async () => {
await withRollback(testPool, async (client) => {
await client.query(`INSERT INTO action_reasons (context, slug, name)
VALUES ('contract.void', 'dup', 'first')`);
// SAVEPOINT lets the txn survive the expected rejection.
await client.query('SAVEPOINT dup');
await expect(
client.query(`INSERT INTO action_reasons (context, slug, name)
VALUES ('contract.void', 'dup', 'second')`)
).rejects.toThrow(/unique|duplicate/i);
await client.query('ROLLBACK TO SAVEPOINT dup');
// Without the SAVEPOINT, this INSERT would fail with
// "current transaction is aborted" even though the data is valid.
const ok = await client.query(`INSERT INTO action_reasons (context, slug, name)
VALUES ('contract.changes_requested', 'dup', 'third')
RETURNING id`);
expect(ok.rows).toHaveLength(1);
});
});
If the failing query is the LAST query in the block (no follow-on positive case), SAVEPOINT isn't strictly needed — withRollback will ROLLBACK the whole txn at block exit.
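When several specs need the SAVEPOINT dance, it can be factored into a helper. The following is a sketch, not part of @po/test-db: `captureRejection` and `Queryable` are hypothetical names, and the helper assumes the client passed in is the one already inside the surrounding transaction. SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT are standard Postgres statements.

```typescript
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Runs `attempt` inside a SAVEPOINT. If the attempt rejects, the transaction
// is rolled back to the savepoint (leaving it usable) and the error is
// returned. If the attempt unexpectedly succeeds, undefined is returned.
async function captureRejection(
  client: Queryable,
  savepointName: string,
  attempt: () => Promise<unknown>,
): Promise<unknown> {
  // Savepoint names are interpolated into SQL, so restrict them to identifiers.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(savepointName)) {
    throw new Error(`invalid savepoint name: ${savepointName}`);
  }
  await client.query(`SAVEPOINT ${savepointName}`);
  try {
    await attempt();
  } catch (err) {
    await client.query(`ROLLBACK TO SAVEPOINT ${savepointName}`);
    return err;
  }
  await client.query(`RELEASE SAVEPOINT ${savepointName}`);
  return undefined;
}
```

Inside a spec, the returned error can be asserted on directly (for example, matching its message against /unique|duplicate/i), and the follow-on positive-case INSERT then runs on the same client without hitting the aborted-transaction error.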
Idempotency tests for consumers / webhook handlers
Every consumer and webhook handler must have an idempotency test. Stripe webhooks redeliver; RabbitMQ messages can redeliver. Same message in twice → exactly one side-effect:
it('processes a duplicate event only once (idempotency)', async () => {
const stripeEvent = buildStripeEvent('payment_intent.succeeded', { id: testEventId, ... });
const result1 = await webhooksService.process(stripeEvent);
const result2 = await webhooksService.process(stripeEvent);
// Second call is a no-op
expect(result1.status).toBe('ok');
expect(result2.status).toBe('duplicate');
// Exactly one DB row
const { rows } = await testPool.query(
`SELECT COUNT(*) AS cnt FROM webhooks_log WHERE event_id = $1`, [testEventId]
);
expect(Number(rows[0].cnt)).toBe(1);
});
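To make the asserted property concrete, here is an in-memory sketch of a handler that satisfies it. How webhooksService actually deduplicates is not shown in this document; the sketch assumes a dedupe-by-event-ID guard, with a Set standing in for whatever persistent record the real service keeps. `InMemoryWebhookHandler` and its fields are hypothetical.

```typescript
interface WebhookEvent {
  id: string;
  type: string;
}

type ProcessResult = { status: 'ok' | 'duplicate' };

// Hypothetical handler: the Set stands in for a persistent dedupe record.
class InMemoryWebhookHandler {
  private readonly seenEventIds = new Set<string>();
  public sideEffects = 0;

  process(event: WebhookEvent): ProcessResult {
    if (this.seenEventIds.has(event.id)) {
      // Redelivery: acknowledge without repeating the side-effect.
      return { status: 'duplicate' };
    }
    this.seenEventIds.add(event.id);
    this.sideEffects += 1; // stands in for the DB write
    return { status: 'ok' };
  }
}

const handler = new InMemoryWebhookHandler();
const event: WebhookEvent = { id: 'evt_test_1', type: 'payment_intent.succeeded' };
const first = handler.process(event);
const second = handler.process(event);
console.log(first.status, second.status, handler.sideEffects); // ok duplicate 1
```

A handler that fails the idempotency spec typically lacks the guard entirely or checks it after performing the side-effect, so the second delivery writes a second row.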
Common pitfalls
Audit-vs-reality schema drift
The pre-execution audit may assume columns that don't exist (or miss columns that do). The Slice E pilot hit this: transfer_reversals was assumed to have a status column, but it doesn't — lifecycle status is inferred from stripe_reversal_id presence and the parent transfers.status='reversed' field.
Rule: always run \d <table> against the real *_test schema before writing integration tests for that table. Never trust the audit's column list without verifying.
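The comparison itself is a two-way set difference. The sketch below assumes the real column names have already been collected, for instance by reading the \d output or querying information_schema.columns; `diffColumns` and `ColumnDrift` are hypothetical names, and the column lists are illustrative rather than the actual transfer_reversals schema (only status and stripe_reversal_id come from the pilot's findings).

```typescript
interface ColumnDrift {
  missingInSchema: string[]; // the audit assumed these, the table lacks them
  missingInAudit: string[];  // the table has these, the audit missed them
}

function diffColumns(auditColumns: string[], schemaColumns: string[]): ColumnDrift {
  const audit = new Set(auditColumns);
  const schema = new Set(schemaColumns);
  return {
    missingInSchema: auditColumns.filter((c) => !schema.has(c)),
    missingInAudit: schemaColumns.filter((c) => !audit.has(c)),
  };
}

// Illustrative lists: 'id' and 'transfer_id' are invented placeholders.
const drift = diffColumns(
  ['id', 'transfer_id', 'status'],
  ['id', 'transfer_id', 'stripe_reversal_id'],
);
console.log(drift);
```

Any non-empty list is a reason to correct the audit before writing the integration spec for that table.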
Don't fix bugs during characterization
If a test reveals that the implementation does something unexpected (wrong error code, missing field, silent data loss), file a Riff issue and write the test to assert the current wrong behavior. This feels wrong — it isn't. The test suite's job at this stage is to give you a safety net for refactoring, not to enforce correct behavior. Fix bugs in separate slices after the lock is in place.
Mock-everything unit tests don't catch SQL bugs
A unit test that mocks DatabaseService.query cannot catch a query that references a non-existent column. The query never runs. Only the @po/test-db integration layer runs real SQL. If a service has 100% unit coverage but 0% integration coverage, SQL bugs are invisible until prod.
forceExit: true is required for integration specs with pools
Jest's default behavior is to wait for all handles to close before exiting. A pg.Pool with idle connections keeps the process alive indefinitely. Without forceExit: true in the integration jest config, npm test hangs after all specs pass. Set it in the integration project config block, not globally (it masks legitimate leaks in unit tests).
Scope day-3 explicitly: defer, don't skip
Consumer and webhook-handler coverage is the hardest to write and easiest to under-scope. When time-boxing day-3, pick the highest-risk families (e.g., the happy-path + idempotency for the primary consumer), ship them, and file follow-up Riff stories for the remaining branches. A partial day-3 with explicit deferral notes is better than a rushed day-3 with low-confidence specs.
CI status reality check
Integration specs do not run in CI today. The test-services GitLab job has no services: block wiring a Postgres container. Integration specs pass locally (against dev compose's postgres) but are invisible to CI.
Local recipe to run integration specs:
make dev-up # bring postgres up (docker compose)
make test-db-reset # rebuild *_test schemas from migrations
cd services/<service-name>
npm run test:integration # runs *.integration-spec.ts
The gap: until a follow-up Riff wires services: postgres:15 into .gitlab-ci/services.yml, integration specs are developer-local only. The CI gate catches unit + supertest specs (which mock the DB), not real-DB integration specs. Track this via the Riff filed in CHARACTERIZATION-NOTES.md (item 4 in "What was deferred").
Sizing rule of thumb
With Opus + a per-service plan prompt + the harness already in place:
| Service complexity | Three-layer characterization | Notes |
|---|---|---|
| Small/medium (≤15 routes, ≤20 repo methods, ≤3 consumers) | ~0.5 day | contract-service, partner-service, document-signing-service, strapi-cms |
| Large/complex (>15 routes, complex algorithms, large webhook fan-out) | ~1 day | experience-service DAG orchestrator, ai-service tool surface |
| Pure-logic module (no DB, no broker, no externals) | ~1-2h | calendar-service availability module (Phase 3 Rank 1) |
Basis: Slice E actual was ~3.5h wall-time for payment-service (11 routes, ~22 repo methods, 1 consumer + 14 webhook branches), against a conservative 18-26h original estimate — roughly 5-7× faster than human-only. Phase 3 Rank 1 (calendar-service availability) was ~1.5h actual vs 1-2 day estimate (~10× faster) because the module is pure-logic — all four services are stateless transforms over input data.
The harness matters: first-time harness setup (done once in Slice B) is the expensive part. Phase 3 services inherit @po/test-db, the jest config patterns, and this template. Budget 10-15 min for the TEST_POOL_TOKEN wiring per service; the rest is mechanical spec-filling.
Pre-flight check: pure-logic vs DB-dependent. Run grep -rn "DatabaseService\|RabbitmqService" src/modules/<module>/ first. Zero matches means the module is pure-logic — skip the three-layer pattern entirely and write unit specs directly. This is what calendar-service availability turned out to be (Phase 3 Rank 1, 2026-05-11).