Architecture Simplification Review (Riff #2) — 2026-05-30

Read-only analysis of everything deployed per infrastructure/compose/qualification.yml + infrastructure.yml, cross-checked against real callers (grep evidence). Produced by an explorer agent; reviewed by sC. This is an analysis + proposal — no service has been removed. Mothball decisions are the operator's (José) call; tracked as Riffs #225–#227.

Ground truth: the stack is already lean

The W1 simplification (search-service, mcp-server, observability stack) and W2 (booking-service, analytics-service, user-management mothball, rag-service fold) are done and confirmed. Notably: Prometheus / Grafana / Loki / Promtail are absent from every compose file — already cut in W1. The system-overview.md §1/§6/§7 prose still describes them (plus Elasticsearch + Qdrant) as present — that is stale narrative, not deployed reality (→ Riff #227 doc-fix).

13 app services deployed; shared infra = postgres(pgvector) + redis + rabbitmq. Every infra component traces to a real load-bearing caller (RabbitMQ → booking-confirm consumer; Redis → gateway CMS cache; MinIO → file-service; pgvector → RAG search). Nothing to cut in infra.
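
One way to keep this audit repeatable is to record each infra component next to the callers that justify it, then list any component left without one. The sketch below is illustrative only: the map structure and the `orphanedInfra` helper are assumptions, and the entries are copied from the paragraph above rather than derived from the compose files.

```typescript
// Infra component -> callers that depend on it.
// Entries mirror the review text; the data structure itself is hypothetical.
const infraCallers: Record<string, string[]> = {
  rabbitmq: ["booking-confirm consumer"],
  redis: ["gateway CMS cache"],
  minio: ["file-service"],
  pgvector: ["RAG search"],
};

// Returns the components with no recorded caller (candidates to cut).
function orphanedInfra(callers: Record<string, string[]>): string[] {
  return Object.keys(callers).filter((name) => callers[name].length === 0);
}

const orphans = orphanedInfra(infraCallers);
console.log(orphans.length === 0 ? "nothing to cut" : orphans.join(", "));
```

With the entries above the list is empty, which matches the "nothing to cut in infra" conclusion.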

Mothball candidates (3 services, near-zero launch risk)

| Component | Verdict | Evidence | Risk | Effort |
|---|---|---|---|---|
| review-service | mothball (keep source) | Zero frontend callers (/api/reviews / reviewService grep empty across all 4 FEs). experience-service ReviewClient is dead (declared, injected nowhere). Emits review.created but no consumer binds it. No reviews exist pre-launch. | Low: DSAR loses one fail-soft fan-out target; gateway /api/reviews 404s (no caller) | ~1–2h |
| contract-service | defer post-launch (pair w/ doc-signing) | Partner-contract lifecycle only. Callers = partner/admin-console contract pages + notification contract-event consumer. All on the partner-onboarding path, which is gated (Riff #200 Option B: needs Strapi↔partner-service linkage + per-partner Stripe Connect). Not on booking / RAG / web-presence launch paths. | Med: consoles need a feature-flag or accept dead nav | ~half day |
| document-signing-service | defer post-launch (with contract) | Only caller = contract-service sign/countersign + an admin test UI. Carries the .p12 cert secret + storage volume + JVM-slow-start (the 6×20s smoke retry exists for this). Pure partner-onboarding machinery. | Med: coupled to contract-service | ~1h to stop deploying |
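
The caller evidence in the table rests on grep-style searches across the frontends. A minimal sketch of that kind of check follows, assuming the sources have already been loaded into memory. The `SourceTree` type, the `findReferences` helper, and the two sample files are hypothetical; they are not the actual tooling or repository contents.

```typescript
// Hypothetical in-memory view of a repo: path -> file contents.
type SourceTree = Record<string, string>;

interface Hit {
  path: string;
  line: number; // 1-based line number
  text: string;
}

// Returns every line that mentions any needle (plain substring match, no regex).
function findReferences(tree: SourceTree, needles: string[]): Hit[] {
  const hits: Hit[] = [];
  for (const path of Object.keys(tree)) {
    tree[path].split("\n").forEach((text, i) => {
      if (needles.some((n) => text.indexOf(n) !== -1)) {
        hits.push({ path, line: i + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Illustrative contents only; not taken from the real frontends.
const frontends: SourceTree = {
  "public-fo/src/api/booking.ts": "export const bookingUrl = '/api/bookings';",
  "partner-console/src/nav.ts": "export const nav = ['/contracts', '/profile'];",
};

const reviewHits = findReferences(frontends, ["/api/reviews", "reviewService"]);
const contractHits = findReferences(frontends, ["/contracts"]);
console.log(reviewHits.length, contractHits.length);
```

An empty result for the review needles corresponds to the "zero frontend callers" verdict, while a non-empty result for the contract needles corresponds to the console pages that make contract-service a deferral rather than a straight mothball.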

docling-serve — keep (real RAG-ingest dependency) but mem_limit: 1g / cpus: 1.0 is over-provisioned for ~10 ingests/day on a VPS that already hit CPU-steal throttling. Consider halving after checking docker stats under a real ingest. (Resource trim, not removal — minor.)

Everything else (partner/experience/payment/notification/file/auth/ai-service/api-gateway/strapi/keycloak/calendar) traces to a launch-critical path, so keep. (calendar-service is live: experience-service/scoring.service.ts → calendarClient.checkAvailability() on the intent→plan path reached from public-fo.)

Dead code found (mention-only per house rules)

  • services/experience-service/src/common/clients/review.client.ts: ReviewClient is declared but instantiated nowhere. Would be removed with the review-service mothball.
  • review-service event publishing — review.created on reviews.events has subscribers only in docs.
  • system-overview.md §1 ASCII diagram + §6 "Elasticsearch required for production search" + §7 observability — all describe removed/never-deployed components.
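
The "published but never consumed" finding can be expressed as a simple comparison between what services publish and what consumers bind. The sketch below is an assumption-laden illustration: the `Publication` and `Binding` shapes are invented for the example, and the `contracts.events` / `contract.signed` entries are hypothetical placeholders. Only `review.created` on `reviews.events` comes from the review itself.

```typescript
interface Publication {
  service: string;
  exchange: string;
  routingKey: string;
}

interface Binding {
  consumer: string;
  exchange: string;
  routingKey: string;
}

// Returns publications whose (exchange, routingKey) pair no consumer binds.
// Exact match only: topic wildcards (* and #) are deliberately not handled.
function unboundPublications(pubs: Publication[], bindings: Binding[]): Publication[] {
  return pubs.filter(
    (p) => !bindings.some((b) => b.exchange === p.exchange && b.routingKey === p.routingKey),
  );
}

const publications: Publication[] = [
  { service: "review-service", exchange: "reviews.events", routingKey: "review.created" },
  // Hypothetical entry, for contrast only.
  { service: "contract-service", exchange: "contracts.events", routingKey: "contract.signed" },
];

const consumerBindings: Binding[] = [
  // Hypothetical binding, for contrast only.
  { consumer: "notification-service", exchange: "contracts.events", routingKey: "contract.signed" },
];

const deadEvents = unboundPublications(publications, consumerBindings);
console.log(deadEvents.map((e) => `${e.exchange}:${e.routingKey}`).join(", "));
```

A real check would need the actual binding declarations and wildcard-aware matching; this version only shows the shape of the comparison.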

Recommendation

Mothball review-service first (highest confidence, ~1–2h, mirrors the proven user-management mothball). Treat contract-service + document-signing-service as a deferred pair to revisit when partner onboarding (Riff #200 Option B) is actually activated. Fix the stale system-overview.md. Net: 3 of 13 deployed services off, near-zero launch risk.

Decision — Riff #226 (contract + doc-signing defer): documented, NOT executed yet

José's call (2026-05-31): document the deferral decision + nav-degradation plan, but do not stop deploying the two services now. Rationale: the deferral is gated on Riff #200 Option B (partner onboarding), which is not live. Stopping deployment today would remove two working services before their replacement-gating condition exists, buying ~nothing (idle services cost little on the current VPS) while taking on console-degradation risk early. So #226 is ready-to-execute, parked until #200 activates — not a present action.

Execution plan (run this when Riff #200 Option B activates)

When partner onboarding goes live and the contract→signing path is genuinely exercised, either keep both services (they're now load-bearing) or, if #200 lands without contract signing, execute the defer:

  1. Stop deploying both. Remove (or comment) the contract-service and document-signing-service blocks from infrastructure/compose/qualification.yml; keep source in-tree (mothball, not delete — mirrors user-management). Drop them from production.yml too (they were never deployed there).
  2. Console nav degradation = feature-flag (chosen strategy, not dead links). Gate the contract nav entries in partner-console + admin-console behind a build-time flag (e.g. VITE_CONTRACTS_ENABLED=false). Hide the contract pages/links rather than leaving them pointing at now-dead routes (404s are a visibly broken UX; a hidden nav item is clean). The two consoles' contract pages stay in source, re-enabled by flipping the flag when #200 activates.
  3. Notification consumer. The contract-event-consumer binding in notification-service can stay (it's inert with no producer) or be guarded by the same flag — low effort, no rush.
  4. .p12 cert + storage volume. With document-signing-service undeployed, the secrets/signing/certificate.p12 mount and its volume are unused; leave the secret in place (re-deploy needs it). Note the 6×20s JVM smoke-retry hack in CI's deploy-qual becomes dead and should be removed alongside.
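
The flag-gated nav from step 2 could look roughly like the sketch below. It assumes each nav entry carries a marker saying it depends on the contract feature; the `NavItem` shape, `requiresContracts`, and the helper names are hypothetical, and the consoles' real nav code may be structured differently. The flag is passed in as a parameter so the logic stays testable outside a Vite build.

```typescript
interface NavItem {
  label: string;
  path: string;
  requiresContracts?: boolean; // hypothetical marker for contract-dependent entries
}

// Build-time flags arrive as strings; only the literal "true" enables the feature.
function contractsEnabled(flag: string | undefined): boolean {
  return flag === "true";
}

// Hides contract entries when the flag is off instead of leaving dead links.
function visibleNav(items: NavItem[], flag: string | undefined): NavItem[] {
  const enabled = contractsEnabled(flag);
  return items.filter((item) => enabled || !item.requiresContracts);
}

// Illustrative nav only; not the consoles' actual entries.
const consoleNav: NavItem[] = [
  { label: "Dashboard", path: "/" },
  { label: "Contracts", path: "/contracts", requiresContracts: true },
];

// In a Vite app the value would come from import.meta.env.VITE_CONTRACTS_ENABLED.
const hiddenNav = visibleNav(consoleNav, "false");
const fullNav = visibleNav(consoleNav, "true");
console.log(hiddenNav.map((i) => i.label), fullNav.map((i) => i.label));
```

Treating an unset flag as disabled keeps the default safe. The route definitions would likely need the same guard so that a bookmarked deep link does not land on a page whose backend is gone.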

Effort when triggered: ~half day (console coordination). Risk: med, all in the two consoles. Until then: no change — both services keep deploying.