Skip to content

ADR 004: MCP + RAG Integration Pattern

Date: 2025-12-22 Status: Accepted — superseded in part by ADR-002 W2-7 (rag-service folded into ai-service, 2026-04-30); see Evolution below.

Note: Retrospective ADR written 2026-04-26 to document a decision that predates the lifecycle/plan discipline. The original work landed across 90d7bca (2025-12-22, "integrate RAG service and document management features") and was extended via subsequent ai-service / mcp-server commits through Q1 2026.

Context

Portugal Odyssey is an AI-native marketplace: the assembly of multi-service "Experiences" depends on semantic matching against partner-supplied service descriptions and uploaded documents (insurance, certifications, brochures). Three needs emerged together:

  1. Document understanding. Partners upload heterogenous PDFs/DOCX/XLSX. Keyword search is insufficient — Experience matching needs semantic retrieval over content extracted from those documents.
  2. LLM orchestration. Several flows (translate, summarize, sanitize, moderate, enhance partner-authored copy) need a vendor-agnostic LLM facade rather than each Node service shipping its own OpenAI client.
  3. Agent surface. External agents (Cursor, Claude Desktop, Antigravity, future internal agents) needed a stable way to invoke platform capabilities — search the catalog, query partner docs, run the LLM tools — without learning a bespoke REST API per service.

The Node-heavy stack (NestJS / Express) was a poor fit for the LangChain/LangGraph and Docling Python tooling we needed. The platform's primary store (Postgres) had no vector capability at the time.

Decision

Adopt a two-Python-service split with Model Context Protocol (MCP) as the agent-facing contract:

  1. rag-service (FastAPI + asyncpg) owns ingestion. Subscribes to document.uploaded on RabbitMQ, calls Docling Serve to convert source files to Markdown, generates embeddings (OpenAI text-embedding-3-small, 1536 dim), and stores vectors keyed by partner_id / service_id / document_id. Exposes /api/v1/search, /api/v1/chat, /api/v1/convert over plain HTTP for internal callers (experience-service, admin-console, partner-service).

  2. ai-service (FastAPI + LangGraph) owns reasoning. Hosts the LLM tool implementations (ai.translate, ai.summarize, ai.analyze_sentiment, ai.sanitize, ai.moderate, ai.enhance) and the agentic orchestration layer.

  3. MCP as the single agent surface. Both Python services expose their capabilities as MCP tools mounted at /api/v1/mcp so any MCP-compliant client gets a consistent, namespaced toolset (ai.*, rag.*). API-key + (later) Keycloak JWT auth on the MCP mount.

  4. Vector store: initially Qdrant (separate container). Reconsidered and consolidated into Postgres+pgvector under ADR-002 (W1-4) once the actual workload — one vector per partner document, partner-scoped filter queries, no hybrid/sparse search — proved Qdrant's differentiators unused.

Why MCP rather than ad-hoc REST

REST endpoints would have worked for the internal callers, but MCP gives external agents a discoverable, self-describing tool registry. The same Python code serves both audiences — internal HTTP for service-to-service speed, MCP for agents — without duplicating contracts.

Consequences

Positive

  • Right-tool fit: Python lives where it's strongest (LangChain ecosystem, Docling, embeddings); the rest of the platform stays on Node.
  • Single agent surface: external agents authenticate once and discover the full toolset, instead of stitching together per-service APIs.
  • Decoupled ingestion: RabbitMQ-driven document processing means partner-service / file-service don't block on Docling; failures are observable per message.
  • Vendor-agnostic LLM: the ai.* tools abstract OpenAI/Anthropic/Gemini behind a stable contract.

Negative

  • Three services where two might do. ai-service and rag-service overlapped conceptually (both Python, both FastAPI, both touching LLMs). Resolved in W2-7 (2026-04-30): rag-service folded into ai-service.
  • MCP duplication during 2025-12 → 2026-04. The platform briefly ran three MCP surfaces — mcp-server (TS proxy), ai-service (native), rag-service (native) — before ADR-002 W1-5 collapsed the TS proxy and made ai-service the single externally-routed MCP host (mcp-dev / ai-qual / ai subdomains).
  • Vector store over-specified initially. Qdrant ran for ~4 months before W1-4 swapped it for pgvector. The cost was a container, a public Traefik route at qdrant.portugalodyssey.pt, and ~1 GB of memory — not catastrophic, but evidence of the "AI-native means everything-bespoke" reflex worth flagging.
  • Auth gap. Keycloak JWT validation on the MCP endpoint was deferred at creation time (see ADR-002 W1-5 verification notes); only API-key auth landed initially.

Evolution

  • ADR-002 W1-4 (2026-04-26) — replaced Qdrant with pgvector inside the existing Postgres.
  • ADR-002 W1-5 (2026-04-26) — consolidated mcp-server (TS) into ai-service; single MCP URL per environment.
  • ADR-002 W2-7 (2026-04-30) — folded rag-service into ai-service. Vector pipeline (vector_service.py), Docling client (docling_service.py), document-ingest RabbitMQ consumer, and the /api/v1/{search,chat,convert} HTTP endpoints all moved in-process. The 3 rag.* MCP tools are now local function calls instead of httpx round-trips. The MCP contract that external agents depend on is unchanged.

End state (post-W2-7): one Python service (ai-service) hosts the entire AI capability layer — RAG ingestion, retrieval, LLM tools, MCP surface — backed by one vector store (Postgres+pgvector, table rag_document_vectors).

References

  • services/ai-service/app/services/mcp_server.py — current MCP tool registry (9 tools, all in-process post-W2-7)
  • services/ai-service/app/services/vector_service.py — pgvector-backed retrieval (post-W2-7)
  • services/ai-service/app/services/docling_service.py — Docling client (post-W2-7)
  • services/ai-service/app/api/endpoints/rag_ops.py/search, /chat, /convert (post-W2-7)
  • docs/developers/architecture/ai-architecture.md — high-level AI stack diagram
  • ADR-002 §"Candidate 5" / §"Candidate 2" / §"W2-7" — consolidation decisions
  • CLAUDE.md §"MCP & Agent Integration" — current platform MCP endpoint reference