# Architecture Simplification Review (Riff #2): 2026-05-30
Read-only analysis of everything deployed per `infrastructure/compose/qualification.yml` + `infrastructure.yml`, cross-checked against real callers (grep evidence). Produced by an explorer agent; reviewed by sC. This is an analysis and proposal: no service has been removed. Mothball decisions are the operator's (José's) call and are tracked as Riffs #225–#227.
## Ground truth: the stack is already lean
The W1 simplification (search-service, mcp-server, observability stack) and W2 (booking-service, analytics-service, user-management mothball, rag-service fold) are done and confirmed. Notably, Prometheus / Grafana / Loki / Promtail are absent from every compose file; they were already cut in W1. The `system-overview.md` §1/§6/§7 prose still describes them (plus Elasticsearch and Qdrant) as present. That is stale narrative, not deployed reality (→ Riff #227 doc-fix).
13 app services are deployed; shared infra is postgres (pgvector) + redis + rabbitmq. Every infra component traces to a real, load-bearing caller (RabbitMQ → booking-confirm consumer; Redis → gateway CMS cache; MinIO → file-service; pgvector → RAG search). Nothing to cut in infra.
## Mothball candidates (3 services, near-zero launch risk)
| Component | Verdict | Evidence | Risk | Effort |
|---|---|---|---|---|
| review-service | mothball (keep source) | Zero frontend callers (grep for `/api/reviews` / `reviewService` is empty across all 4 FEs). experience-service's `ReviewClient` is dead (declared, injected nowhere). Emits `review.created`, but no consumer binds it. No reviews exist pre-launch. | Low: DSAR loses one fail-soft fan-out target; gateway `/api/reviews` 404s (no caller) | ~1–2h |
| contract-service | defer post-launch (pair w/ doc-signing) | Partner-contract lifecycle only. Callers are the partner-/admin-console contract pages plus the notification contract-event consumer. All of it is on the partner-onboarding path, which is gated (Riff #200 Option B: needs Strapi↔partner-service linkage + per-partner Stripe Connect). Not on the booking / RAG / web-presence launch paths. | Med: consoles need a feature flag or must accept dead nav | ~half day |
| document-signing-service | defer post-launch (with contract) | Only callers are contract-service sign/countersign and an admin test UI. Carries the `.p12` cert secret, a storage volume, and a slow JVM start (the 6×20s smoke retry exists for this). Pure partner-onboarding machinery. | Med: coupled to contract-service | ~1h to stop deploying |
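The "6×20s smoke retry" in the document-signing row is a CI workaround for the slow JVM start. As a minimal sketch, assuming it means up to 6 probe attempts spaced 20 s apart (the function, type, and parameter names below are hypothetical, not the actual `deploy-qual` script), the mechanism looks like this:

```typescript
// Hypothetical sketch of a "6×20s" smoke retry: up to `attempts` probes,
// sleeping `delayMs` after each failed probe except the last.
// Names and defaults are illustrative, not the real deploy-qual script.
type Probe = () => Promise<boolean>;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function smokeWithRetry(
  probe: Probe,
  attempts = 6,
  delayMs = 20_000,
  sleep: Sleep = realSleep,
): Promise<{ ok: boolean; tries: number }> {
  for (let tries = 1; tries <= attempts; tries++) {
    // e.g. an HTTP health check against document-signing-service
    if (await probe()) return { ok: true, tries };
    // Wait only if another attempt remains, giving the JVM time to finish starting.
    if (tries < attempts) await sleep(delayMs);
  }
  return { ok: false, tries: attempts };
}
```

The injectable `sleep` keeps the sketch testable. With the real timer, a service that never comes up costs 5 waits × 20 s = 100 s of sleeping before CI gives up, which is the cost that disappears once document-signing-service stops deploying.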
docling-serve: keep (it is a real RAG-ingest dependency), but `mem_limit: 1g` / `cpus: 1.0` is over-provisioned for ~10 ingests/day on a VPS that has already hit CPU-steal throttling. Consider halving after checking `docker stats` under a real ingest. (Resource trim, not removal; minor.)
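One way to make "consider halving after checking `docker stats`" concrete is a headroom check: halve the configured limits and confirm the observed peaks still fit with some margin. This is a sketch under assumptions: the 25% headroom default and the helper names are hypothetical, and the peak values would come from sampling `docker stats` during a real ingest (its CPU % divided by 100 gives CPUs in use).

```typescript
// Hypothetical headroom check for trimming docling-serve's limits.
// Handles only the single-letter suffixes used in the compose file (b/k/m/g).
const UNITS: Record<string, number> = { b: 1, k: 1024, m: 1024 ** 2, g: 1024 ** 3 };

function parseMem(limit: string): number {
  const match = /^(\d+(?:\.\d+)?)([bkmg])$/i.exec(limit.trim());
  if (!match) throw new Error(`unrecognised memory limit: ${limit}`);
  return parseFloat(match[1]) * UNITS[match[2].toLowerCase()];
}

// Peaks sampled from `docker stats` during a real ingest.
interface ObservedPeaks {
  memBytes: number; // peak MEM USAGE
  cpus: number;     // peak CPU % / 100 (e.g. 85% -> 0.85)
}

// True if halving both limits still leaves `headroom` (assumed 25%) above each peak.
function canHalve(memLimit: string, cpus: number, peaks: ObservedPeaks, headroom = 0.25): boolean {
  const halvedMem = parseMem(memLimit) / 2;
  const halvedCpus = cpus / 2;
  return peaks.memBytes * (1 + headroom) <= halvedMem &&
         peaks.cpus * (1 + headroom) <= halvedCpus;
}
```

For example, a 300 MiB / 30% CPU peak would pass `canHalve("1g", 1.0, ...)`, while a 450 MiB peak would not, since 450 MiB × 1.25 exceeds the halved 512 MiB limit.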
Everything else (partner/experience/payment/notification/file/auth/ai-service/api-gateway/strapi/keycloak/calendar) traces to a launch-critical path: keep. (calendar-service is live: `experience-service/scoring.service.ts` → `calendarClient.checkAvailability()` on the intent→plan path reached from public-fo.)
## Dead code found (mention-only per house rules)
- `services/experience-service/src/common/clients/review.client.ts`: `ReviewClient` is declared but instantiated nowhere. It would be removed with the review-service mothball.
- review-service event publishing: `review.created` on `reviews.events` has subscribers only in docs.
- `system-overview.md` §1 ASCII diagram, §6 "Elasticsearch required for production search", and §7 observability all describe removed or never-deployed components.
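The "declared, instantiated nowhere" finding is a reference-count check. A minimal in-memory sketch of that check, assuming source files are already loaded as a path-to-contents map (the function name and the sample paths are hypothetical; the review itself used grep):

```typescript
// Hypothetical in-memory version of the caller grep: list files, other than
// the declaring one, whose text mentions `symbol` as a whole identifier.
function findReferences(
  files: Record<string, string>,
  symbol: string,
  declaringFile: string,
): string[] {
  // Escape regex metacharacters, then require word boundaries so that
  // e.g. "ReviewClientFactory" is not counted as a use of "ReviewClient".
  const escaped = symbol.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`\\b${escaped}\\b`);
  return Object.keys(files)
    .filter((path) => path !== declaringFile && pattern.test(files[path]))
    .sort();
}
```

The word-boundary match suits identifiers such as `ReviewClient`; for route strings like `/api/reviews` a plain substring search fits better. A textual hit (for instance an unused import) still counts as a reference, so a non-empty result needs the same manual look the review did.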
## Recommendation
Mothball review-service first (highest confidence, ~1–2h, mirrors the proven user-management mothball). Treat contract-service + document-signing-service as a deferred pair to revisit when partner onboarding (Riff #200 Option B) is actually activated. Fix the stale system-overview.md. Net: 3 of 13 deployed services off, near-zero launch risk.
## Decision on Riff #226 (contract + doc-signing defer): documented, NOT executed yet
José's call (2026-05-31): document the deferral decision and the nav-degradation plan, but do not stop deploying the two services now. Rationale: the deferral is gated on Riff #200 Option B (partner onboarding), which is not live. Stopping deployment today would remove two working services before the condition that decides their fate exists, saving almost nothing (idle services cost little on the current VPS) while taking on console-degradation risk early. So #226 is ready to execute but parked until #200 activates; it is not a present action.
### Execution plan (run this when Riff #200 Option B activates)
When partner onboarding goes live: if the contract→signing path is genuinely exercised, keep both services (they are now load-bearing); if #200 lands without contract signing, execute the defer:
- **Stop deploying both.** Remove (or comment out) the `contract-service` and `document-signing-service` blocks from `infrastructure/compose/qualification.yml`; keep source in-tree (mothball, not delete; this mirrors user-management). Drop them from `production.yml` too (they were never deployed there).
- **Console nav degradation = feature flag** (chosen strategy, not dead links). Gate the contract nav entries in partner-console and admin-console behind a build-time flag (e.g. `VITE_CONTRACTS_ENABLED=false`). Hide the contract pages/links rather than leaving them pointing at now-dead routes: 404s are a visibly broken UX, whereas a hidden nav item is clean. The two consoles' contract pages stay in source and are re-enabled by flipping the flag when #200 activates.
- **Notification consumer.** The `contract-event-consumer` binding in notification-service can stay (it is inert with no producer) or be guarded by the same flag; low effort, no rush.
- **`.p12` cert + storage volume.** With document-signing-service undeployed, the `secrets/signing/certificate.p12` mount and its volume are unused; leave the secret in place (a re-deploy needs it). The 6×20s JVM smoke-retry hack in CI's `deploy-qual` becomes dead and should be removed alongside.
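A minimal sketch of the build-time flag gating, assuming a simple nav-item list (the `NavItem` shape, the `feature` tag, and the helper names are hypothetical; the consoles' actual nav code is not shown in this review). In a Vite build the raw value would be read from `import.meta.env.VITE_CONTRACTS_ENABLED`, which Vite replaces statically at build time; here it is passed in as a parameter so the sketch runs outside Vite.

```typescript
// Hypothetical nav model; the real consoles' nav structures may differ.
interface NavItem {
  label: string;
  path: string;
  feature?: "contracts"; // tag for entries gated by VITE_CONTRACTS_ENABLED
}

// Vite exposes VITE_* env values as strings (or undefined when unset).
// Only an explicit "false" hides contracts, so builds that never set the
// flag keep today's behaviour.
function contractsEnabled(raw: string | undefined): boolean {
  return raw !== "false";
}

// Drop contract entries (rather than linking to dead routes) when the flag is off.
function visibleNav(items: NavItem[], rawFlag: string | undefined): NavItem[] {
  const showContracts = contractsEnabled(rawFlag);
  return items.filter((item) => item.feature !== "contracts" || showContracts);
}
```

Defaulting to "shown" when the flag is unset matches the "until then: no change" stance: nothing in the consoles moves until someone deliberately builds with `VITE_CONTRACTS_ENABLED=false`.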
Effort when triggered: ~half day (console coordination). Risk: med, all in the two consoles. Until then: no change; both services keep deploying.
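The trigger logic of this section reduces to a small decision table. Sketched as a function (the type and field names are hypothetical labels for the conditions described above, not real configuration):

```typescript
// Hypothetical encoding of the Riff #226 trigger; field names are labels, not real config.
type Riff226Action = "no-change" | "keep-both" | "execute-defer";

interface Riff200State {
  optionBLive: boolean;              // partner onboarding (Riff #200 Option B) activated
  contractSigningExercised: boolean; // the contract -> signing path is genuinely used
}

function riff226Action(state: Riff200State): Riff226Action {
  // Parked: until #200 activates, both services keep deploying.
  if (!state.optionBLive) return "no-change";
  // Load-bearing once contracts are really signed; otherwise run the defer plan.
  return state.contractSigningExercised ? "keep-both" : "execute-defer";
}
```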