AI & RAG Architecture

The Portugal Odyssey platform features a Retrieval-Augmented Generation (RAG) system that powers intelligent experience matching and exposes platform capabilities to external AI agents through a single MCP surface.

W2-7 (2026-04-30): the standalone rag-service was folded into ai-service. The vector pipeline, Docling integration, document-ingest RabbitMQ consumer, and the /api/v1/{search,chat,convert} HTTP endpoints all moved in-process. The 3 rag.* MCP tools are now local function calls instead of httpx proxies. The rag_qual / rag_prod Postgres databases and the docling-serve sidecar are unchanged.

1. High-Level AI Stack

graph TD
    PC["Partner Console"] --> FS["File Service"]
    FS -->|"Publish doc.uploaded / doc.deleted"| RMQ["RabbitMQ"]
    RMQ -->|"Consume"| AS["AI Service (Python)"]

    subgraph AI_Infrastructure [AI Infrastructure Layer]
        AS -->|"Convert"| DS["Docling Serve (sidecar)"]
        AS -->|"Store/Search"| PG["Postgres + pgvector"]
        AS -->|"Reasoning"| LLM["LLM Providers (OAI/Anthropic/Gemini)"]
    end

    subgraph Agentic_Layer [Agentic Layer]
        AS -->|"Expose 9 tools"| MCP["MCP endpoint (ai.* / ai.qual.* / mcp-dev.*)"]
        Agent["External AI Agents"] -->|"Query / Reason"| MCP
    end

    ES["Experience Service"] -->|"/api/v1/chat (HTTP)"| AS
    PSV["Partner Service"] -->|"service_text.process.requested"| RMQ

2. Key Components

AI Service (`services/ai-service`)

The platform's capability layer — single external MCP surface, RAG ingestion + retrieval pipeline, and partner-text moderation worker. Built with FastAPI + LangChain (langgraph was previously listed but never imported; dropped 2026-04-29 per Riff #31).

Hosts the MCP endpoint at ai.portugalodyssey.pt/api/v1/mcp (prod) / ai.qual.portugalodyssey.pt/... (qual) / mcp-dev.portugalodyssey.pt/... (dev). Auth via X-API-Key header (MCP_API_KEYS env, key:principal semicolon-separated) or Keycloak JWT (Authorization: Bearer ..., validated against the realm's JWKS).
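The API-key path described above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function names `parse_mcp_api_keys` and `principal_for` and the sample keys are hypothetical, and it assumes keys themselves contain no `:` or `;` characters.

```python
from __future__ import annotations

import hmac


def parse_mcp_api_keys(raw: str) -> dict[str, str]:
    """Parse 'key:principal;key:principal' into {key: principal}."""
    keys: dict[str, str] = {}
    for position, entry in enumerate(raw.split(";")):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled semicolons
        key, sep, principal = entry.partition(":")
        if not sep or not key or not principal:
            # Report the position only, so the secret never reaches logs.
            raise ValueError(f"malformed MCP_API_KEYS entry at position {position}")
        keys[key] = principal
    return keys


def principal_for(api_key: str | None, keys: dict[str, str]) -> str | None:
    """Return the principal for an X-API-Key header value, or None if unknown."""
    if not api_key:
        return None
    for known, principal in keys.items():
        # Constant-time comparison avoids leaking key contents via timing.
        if hmac.compare_digest(known.encode(), api_key.encode()):
            return principal
    return None


# Hypothetical value of the MCP_API_KEYS environment variable.
KEYS = parse_mcp_api_keys("k-cursor:cursor-agent;k-claude:claude-desktop;")
```

A request handler would read the `X-API-Key` header, call `principal_for`, and reject the request when it returns `None`. The Keycloak JWT path is not covered by this sketch.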

Exposes 9 tools, all in-process since W2-7:
- 6 ai.* (translate, summarize, analyze_sentiment, sanitize, moderate, enhance)
- 3 rag.* (search_services, query_rag, convert_document)

Plus plain HTTP at /api/v1/{translate,summarize,sentiment,sanitize,enhance,search,chat,convert} for internal callers.

No agent loop here. ai-service exposes capabilities (LLM tools, RAG, MCP) — not orchestration. The DAG orchestrator that implements the AI-curated Experience thesis lives in experience-service (see docs/implementation-plans/021-experience-service-dag-orchestrator/). experience-service consumes ai-service via internal HTTP — mirroring how partner-service consumes it for the service_text.process.requested text-moderation pipeline (partner.events exchange → ai_service_text_tasks queue).

Postgres + pgvector

Vector storage. Single table rag_document_vectors in the rag database (qual: rag_qual, prod: rag_prod), with an IVFFlat cosine index (lists=100, sized for ~100k–1M vectors). 1536-dimensional embeddings (OpenAI text-embedding-3-small). W1-4 (2026-04-26) consolidated this from a separate Qdrant service into the existing shared Postgres — see docs/developers/adr/ADR-002-wave1-service-consolidation.md.
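To make the ranking concrete, here is a small pure-Python sketch of what a cosine-distance search with a partner filter computes. It is an exact search over toy 3-dimensional vectors, whereas the IVFFlat index performs an approximate search over 1536-dimensional ones. The row layout `(id, partner_id, embedding)` and the function names are assumptions for illustration.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=2, partner_id=None):
    """Exact nearest neighbours over (id, partner_id, embedding) rows."""
    # Filter first, mirroring a WHERE clause on the partner_id column.
    candidates = [r for r in rows if partner_id is None or r[1] == partner_id]
    ranked = sorted(candidates, key=lambda r: cosine_distance(query, r[2]))
    return [r[0] for r in ranked[:k]]


# Toy data: real embeddings have 1536 dimensions.
ROWS = [
    ("doc-a", "p1", [1.0, 0.0, 0.0]),
    ("doc-b", "p1", [0.0, 1.0, 0.0]),
    ("doc-c", "p2", [0.9, 0.1, 0.0]),
]
RESULT_ALL = top_k([1.0, 0.0, 0.0], ROWS, k=2)
RESULT_P1 = top_k([1.0, 0.0, 0.0], ROWS, k=2, partner_id="p1")
```

Without a filter the two closest documents are `doc-a` and `doc-c`; restricting to partner `p1` yields `doc-a` and `doc-b`, even though `doc-b` is orthogonal to the query.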

Docling Serve

Sidecar microservice for advanced document parsing. Converts complex formats (PDF, DOCX, XLSX) into clean Markdown, preserving the semantic structure that high-quality RAG depends on. One instance runs per environment: docling-serve (dev), docling-serve-qual, and docling-serve-prod.

Model Context Protocol (MCP)

A single platform-wide MCP endpoint hosted by ai-service lets any MCP-compliant agent (Cursor, Claude Desktop, Google Antigravity, custom agents) use the platform's knowledge through standardized tools. Before W1-5 (2026-04-26), MCP was hosted by a dedicated mcp-server Node service; that service was consolidated into ai-service. Before W2-7 (2026-04-30), the 3 rag.* tools were httpx proxies to a separate rag-service; they are now in-process.

3. Core Workflows

Document Ingestion

Partner uploads a document via the Partner Console.
File Service stores the file; Partner Service publishes document.uploaded on partner.events.
AI Service (RabbitMQ consumer, queue ai_service_rag_documents) consumes the event:
- Calls Docling Serve to convert the document to Markdown.
- Performs content safety checks (placeholder keyword filter; primary moderation runs upstream via ai.moderate).
- Generates embeddings via OpenAI text-embedding-3-small (LangChain OpenAIEmbeddings).
- Stores the vector in rag_document_vectors (partner_id / service_id filterable columns).
- Calls back Partner Service with the status (PROCESSING → READY / FAILED) and broadcasts it via the Notification Service WebSocket.
AI Service also binds document.deleted on the same queue and removes the corresponding vector, so deleting a partner document never leaves orphaned embeddings.
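The ingestion steps above can be sketched as a single handler with its external dependencies injected. This is an illustrative outline under stated assumptions, not the actual consumer: the event field names (`document_id`, `file_ref`, `partner_id`, `service_id`), the `IngestDeps` container, and the keyword list are all hypothetical, and the RabbitMQ wiring is omitted.

```python
from dataclasses import dataclass
from typing import Callable

BLOCKED_KEYWORDS = {"forbidden-term"}  # hypothetical placeholder keyword filter


@dataclass
class IngestDeps:
    convert: Callable[[str], str]        # file reference -> Markdown (Docling Serve)
    embed: Callable[[str], list]         # Markdown -> embedding vector
    store: Callable[[dict], None]        # write a row to rag_document_vectors
    report: Callable[[str, str], None]   # (document_id, status) callback


def handle_document_uploaded(event: dict, deps: IngestDeps) -> str:
    """Run convert -> safety check -> embed -> store, reporting status throughout."""
    doc_id = event["document_id"]
    deps.report(doc_id, "PROCESSING")
    try:
        markdown = deps.convert(event["file_ref"])
        lowered = markdown.lower()
        if any(word in lowered for word in BLOCKED_KEYWORDS):
            raise ValueError("content failed safety check")
        deps.store({
            "document_id": doc_id,
            "partner_id": event["partner_id"],
            "service_id": event.get("service_id"),
            "embedding": deps.embed(markdown),
        })
    except Exception:
        # Any failure in the pipeline is surfaced to the partner as FAILED.
        deps.report(doc_id, "FAILED")
        return "FAILED"
    deps.report(doc_id, "READY")
    return "READY"


# Stubbed dependencies stand in for Docling, the embedding model, and Postgres.
statuses, stored = [], []
deps = IngestDeps(
    convert=lambda ref: f"# Converted {ref}",
    embed=lambda text: [0.0] * 1536,
    store=stored.append,
    report=lambda doc_id, status: statuses.append(status),
)
outcome = handle_document_uploaded(
    {"document_id": "d1", "file_ref": "menu.pdf", "partner_id": "p1"}, deps
)
```

Injecting the dependencies keeps the status transitions testable without a broker, a database, or an embedding provider.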

Intelligent Service Matching

The Experience Service uses ai-service's /api/v1/chat endpoint (AiClientService, was RagClientService pre-W2-7) to match partner services with client preferences. Instead of keyword search, it performs semantic search over curated service descriptions and documents to find the best fit for an experience itinerary.

External Agent Tool Use

External AI agents authenticate to the MCP endpoint, call any of the 9 tools, and receive responses. All tools execute in-process inside ai-service.

4. Multi-LLM Support

The platform is vendor-agnostic and supports:
- OpenAI: GPT-4o-mini for ai.* tools, GPT-4-turbo for RAG completions, text-embedding-3-small for embeddings
- Anthropic: Claude 3.5 Sonnet/Opus
- Google: Gemini 1.5 Pro

LLM choice is configurable per-tool in ai-service.
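One way per-tool model selection could work is a prefix table with explicit overrides. The sketch below is an assumption about the shape of such a configuration, not a description of ai-service's settings: `MODEL_BY_PREFIX`, `DEFAULT_MODEL`, `resolve_model`, and the override value are hypothetical names.

```python
from __future__ import annotations

DEFAULT_MODEL = "gpt-4o-mini"

# Defaults taken from the list above: ai.* tools vs. RAG completions.
MODEL_BY_PREFIX = {
    "ai.": "gpt-4o-mini",
    "rag.": "gpt-4-turbo",
}


def resolve_model(tool: str, overrides: dict[str, str] | None = None) -> str:
    """Pick a model for a tool: exact override, then longest prefix, then default."""
    overrides = overrides or {}
    if tool in overrides:
        return overrides[tool]
    # Longest matching prefix wins so more specific entries take priority.
    for prefix in sorted(MODEL_BY_PREFIX, key=len, reverse=True):
        if tool.startswith(prefix):
            return MODEL_BY_PREFIX[prefix]
    return DEFAULT_MODEL


# Hypothetical override: route one tool to a different provider's model.
OVERRIDES = {"ai.moderate": "some-other-model"}
```

With this layout, `resolve_model("rag.query_rag")` selects the RAG completion model, while `resolve_model("ai.moderate", OVERRIDES)` honours the per-tool override.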

5. Extensibility

The agentic layer via MCP ensures that as the platform grows, new AI capabilities can be exposed as tools without refactoring core services. New tools land directly in ai-service — there is now one Python service to extend, not two.
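The idea of adding a tool without touching other services can be illustrated with a small in-process registry. This sketch does not use an MCP SDK and is not how ai-service registers tools; the `tool` decorator, `call_tool`, the stand-in `sanitize` body, and the `ai.word_count` tool are all hypothetical.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., object]] = {}


def tool(name: str):
    """Register a function under a namespaced tool name such as 'ai.sanitize'."""
    def register(fn):
        if name in TOOLS:
            raise ValueError(f"duplicate tool name: {name}")
        TOOLS[name] = fn
        return fn
    return register


@tool("ai.sanitize")
def sanitize(text: str) -> str:
    # Stand-in body that collapses whitespace; the real tool's behaviour may differ.
    return " ".join(text.split())


@tool("ai.word_count")  # hypothetical new capability
def word_count(text: str) -> int:
    return len(text.split())


def call_tool(name: str, **arguments):
    """Dispatch a tool call as a local function call."""
    try:
        fn = TOOLS[name]
    except KeyError:
        raise LookupError(f"unknown tool: {name}") from None
    return fn(**arguments)
```

Adding a capability is then a matter of defining one decorated function; the dispatch path and existing tools stay unchanged.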