ADR 010: Inter-session agent communication
Date: 2026-05-27 (drafted) · Ratified: 2026-05-30 (by José, after the sC field-input + ASCII-name-style folds)
Status: Accepted — rollout in progress: active.md schema (Presence column + Session · Name cell), start-session.sh name-pool, session-heartbeat.sh post-commit hook, lefthook.yml registration. The git-log fallback for liveness (per § 4) is always-on regardless of hook installation.
Context
2–3 Claude Code orchestrator sessions (sA/sB/sC) work the same repo
concurrently, each in its own git worktree, committing direct to main. They are
separate processes, not always running simultaneously (gaps of hours), with no
shared memory. They must coordinate: hand off work, claim ownership, ask/answer,
and relay human (José) instructions.
Today's mechanisms — git-committed queue/to-<session>.md files, the active.md
claim scoreboard, and Telegram (which reaches only one session) — work but leak in
seven documented ways: the shared-tree race (closed by Layer 0), the Telegram
single-consumer gap (José hand-relayed two tasks to sA on 2026-05-27 because
Telegram is single-holder), lossy human relay, ownership ambiguity, a tracker
outage (today: tasks-prod tunnel down ~10:00→10:58Z, ~60 min, backlog.md
bridged), plus two failure modes sC surfaced from the field on 2026-05-28:
(6) stale-claim / liveness gap — an active.md row can be "true" for hours
across breaks, crashes, or sleep with no mechanical liveness signal (today: a
session row stood for ~36h across a domestic break); the 6h reap rule is a
heuristic, not enforcement. (7) identity-vs-process drift — sX is a
per-process prefix, not a stable identity; a fresh Claude that re-claims sA
shares the prefix but not the prior process's memory. Cross-session continuity
rides on git history + memory files, not on a real identity.
The findings doc
surveys A2A, MCP-as-transport, multi-agent frameworks, RabbitMQ, Postgres, and the
git/Riff incumbents against nine requirements.
Decisive constraint: sessions are not always connected (hours-long gaps).
This rules out liveness/broker transports (RabbitMQ, A2A daemons) and favours a
durable shared store — the blackboard pattern. Git already provides durability,
worktree-isolated concurrency (Layer 0), audit (git log), and a human bridge,
with no new always-on infra. The only genuine gaps are presence and a
lossless ack chain.
Decision
Adopt a git-blackboard spine + Riff task-channel + Telegram human-bridge
three-tier protocol. Reject RabbitMQ as the bus (solves live fan-out, not our
async-durable need). Defer A2A to scale; defer Postgres session_bus to a felt
real-time need.
Identity & addressing (foundation)
A session has two identities, both addressable:
- Technical id sX (sA/sB/sC, …): load-bearing (worktree suffix, commit-msg lefthook check, scoreboard column, branch convention, [sX] commit prefix). Ephemeral per-process: a fresh Claude that re-claims sA inherits the ID but not the prior process's memory.
- Display name (per-project, short, memorable, project-themed): durable role label for human coordination and Riff assignee. Mitigates failure mode 7 (identity drift): a new process re-claims sA and adopts the name from active.md, so the cross-session role identity is preserved even when in-memory continuity is not. Suggested pools per José's style ratification (2026-05-28): po-platform = Portuguese places (Sintra, Douro, Algarve, Tejo, Madeira, Porto, Lisboa, Coimbra, Cascais, Faro) or seafaring (Caravela, Bussola, Sextante, Quadrante, Cabo); codecomedy-platform = comedy-themed (owner's call); generic fallback = constellations / weather / herbs / board-game pieces (still ASCII-only). Mechanical rules (firm): plain ASCII letters [A-Za-z] only, with no accents, diacritics, or special chars (US-keyboard ergonomics: accented characters cost 2 strokes apiece and compound across a session); ≤12 chars; no spaces; unique within the project's pool. Style within those bounds is the project owner's call.
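The firm naming rules above are mechanical enough to check in code. The following is a minimal sketch of such a check, not the actual start-session.sh logic; the function name `validate_display_name` and the case-insensitive uniqueness comparison are assumptions made for illustration.

```python
import re

# Firm rules from the ADR: plain ASCII letters only, at most 12 characters,
# no spaces, unique within the project's pool.
NAME_RE = re.compile(r"[A-Za-z]{1,12}")


def validate_display_name(name: str, taken: set[str]) -> list[str]:
    """Return a list of rule violations; an empty list means the name is acceptable."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("must be 1-12 plain ASCII letters [A-Za-z], no spaces")
    # Assumption: uniqueness is compared case-insensitively; the ADR only says "unique".
    if name.lower() in {t.lower() for t in taken}:
        problems.append("already taken in this project's pool")
    return problems
```

Returning a list of violations rather than a boolean lets a prompt show every problem at once when a human picks a name by hand.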
Address resolution. A message's to: may use either form, plus a lane alias
or broadcast — all resolve via the active.md row:
| Form | Resolves via | Example |
|---|---|---|
| to:sA (technical id) | active.md row matching sX | direct, exact session |
| to:Sintra (display name) | active.md row matching name | same target, human-friendly |
| to:web-presence-owner (lane alias) | active.md row whose capabilities claim the lane | role-based, no need to know who holds it |
| to:all | every active row | broadcast |
start-session.sh is the canonical place to assign/prompt a name from the
project's pool and persist it in the worktree's active.md row alongside sX,
last_seen, and capabilities.
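A minimal sketch of the four address forms resolving against active.md rows, under stated assumptions: the `Row` shape and the `resolve` function are hypothetical, and the mapping from a `<lane>-owner` alias to a `<lane>-lane` capability is inferred from the examples in this ADR rather than specified by it.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """One active.md row. Field names mirror the ADR; the in-memory shape is an assumption."""
    sid: str                                   # technical id, e.g. "sA"
    name: str                                  # display name, e.g. "Sintra"
    capabilities: set[str] = field(default_factory=set)


def resolve(to: str, rows: list[Row]) -> list[Row]:
    """Resolve a `to:` value to zero or more active rows."""
    if to == "all":
        return list(rows)                      # broadcast: every active row
    for row in rows:
        if to in (row.sid, row.name):          # technical id or display name
            return [row]
    # Assumption: a lane alias "<lane>-owner" matches a "<lane>-lane" capability.
    if to.endswith("-owner"):
        lane = to[: -len("-owner")] + "-lane"
        return [r for r in rows if lane in r.capabilities]
    return []
```

An empty result means the address is unknown or the lane is currently unowned; what a sender should do in that case is left to the protocol.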
1. Message schema (Tier 1: queue/to-<session>.md, and queue/to-all.md for broadcast)
Each message is a markdown block with a structured header:
## <ISO-8601-ts> · from:<sX> · to:<sY|all> · type:<msg|task|handoff|ack|question|answer|relay> · ref:<Riff#|commit|path|—> · status:<sent>
<body — markdown>
→ status:seen <ts> · acked <ts> · resolved <ts> (owner advances in place)
- from/to: session IDs; to:all is broadcast; a lane alias (to:web-presence-owner) resolves to the current owner via active.md.
- type: relay marks a human→agent message injected via Telegram (keeps the bridge lossless + audited).
- ref: the Riff #, commit SHA, or file path the message is about.
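To make the header format concrete, here is a minimal parsing sketch. It assumes fields are separated by " · " exactly as in the template above and that every field is mandatory; `parse_header` is a hypothetical helper, not an existing tool in the repo.

```python
MESSAGE_TYPES = {"msg", "task", "handoff", "ack", "question", "answer", "relay"}
REQUIRED = {"from", "to", "type", "ref", "status"}


def parse_header(line: str) -> dict[str, str]:
    """Parse a '## <ts> · from:.. · to:.. · type:.. · ref:.. · status:..' header line."""
    if not line.startswith("## "):
        raise ValueError("not a message header")
    ts, *pairs = [part.strip() for part in line[3:].split(" · ")]
    fields = {"ts": ts}
    for pair in pairs:
        key, _, value = pair.partition(":")    # split on the first colon only
        fields[key] = value
    missing = REQUIRED - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if fields["type"] not in MESSAGE_TYPES:
        raise ValueError(f"unknown type: {fields['type']}")
    return fields
```

Splitting each pair on the first colon only keeps values such as file paths intact, and the timestamp is taken positionally so its own colons are never split.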
2. State machine (the ack chain)
The recipient/owner advances the status (edits the → status: line in their
own commit). sent→seen proves the message was caught; acked = will act;
resolved = done; superseded = obsoleted by a later message. A relay is lossless
because the sender can later read the advanced status, not just assume delivery.
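The ack chain can be sketched as a small transition table. This is an illustration only: the ADR names the statuses but does not say whether steps may be skipped or from which states superseded is reachable, so the strict ordering and the superseded edges below are assumptions.

```python
# Forward transitions named in the ADR: sent -> seen -> acked -> resolved.
# Assumption: "superseded" may be reached from any non-terminal state.
TRANSITIONS = {
    "sent": {"seen", "superseded"},
    "seen": {"acked", "superseded"},
    "acked": {"resolved", "superseded"},
    "resolved": set(),
    "superseded": set(),
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not allowed from `current`."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

A check like this could run in a commit hook over edited status lines, which would turn the discipline into something closer to enforcement.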
3. Ownership (single-owner invariant)
- A unit of work (Riff or lane) has at most one owner at a time. The owner is recorded in both the active.md claim row and the Riff assignee (durable record-of-truth for task-scoped work).
- Verbs: claim (write the row / set assignee), release (clear it), handoff(to:sX) (a type:handoff message + reassign). No distributed lock; the protocol is cooperative, and the visible claim is the lock.
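The three verbs and the single-owner invariant can be modelled in a few lines. This is an in-memory sketch with hypothetical names (`ClaimBoard`, `OwnershipError`); the real protocol is cooperative and records claims in active.md and the Riff assignee, so nothing here is enforced by the system itself.

```python
class OwnershipError(Exception):
    """Raised when a verb would violate the single-owner invariant."""


class ClaimBoard:
    """In-memory stand-in for the claim rows: unit of work -> owning session."""

    def __init__(self) -> None:
        self.owners: dict[str, str] = {}

    def claim(self, unit: str, session: str) -> None:
        holder = self.owners.get(unit)
        if holder is not None and holder != session:
            raise OwnershipError(f"{unit} is already owned by {holder}")
        self.owners[unit] = session

    def release(self, unit: str, session: str) -> None:
        if self.owners.get(unit) != session:
            raise OwnershipError(f"{session} does not own {unit}")
        del self.owners[unit]

    def handoff(self, unit: str, from_session: str, to_session: str) -> None:
        # Reassign in one step so the unit never has zero or two owners.
        if self.owners.get(unit) != from_session:
            raise OwnershipError(f"{from_session} does not own {unit}")
        self.owners[unit] = to_session
```

A session could run the same checks against the parsed active.md before editing its row, as a guard against accidental double claims.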
4. Presence beacon (mechanical, not hand-maintained)
active.md rows carry last_seen (UTC) + capabilities (e.g.
has-telegram, web-presence-lane). A lefthook post-commit hook bumps
last_seen for the worktree's sX row on each commit → presence ≈ recent commit
activity. Fallback when hooks are absent: derive liveness from the latest [sX]
commit in git log (the existing 6h reap rule is the coarse form). Capabilities
let a sender route by ability ("who has Telegram?") without asking.
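A sketch of how a reader might turn last_seen into a liveness verdict. The 6h threshold is the reap rule named above; taking the more recent of the hook-bumped value and the latest [sX] commit time is an assumption about how the fallback combines with the hook, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REAP_AFTER = timedelta(hours=6)  # the existing 6h reap rule


def effective_last_seen(
    row_last_seen: Optional[datetime], latest_commit: Optional[datetime]
) -> Optional[datetime]:
    """Use the active.md value when present, else the latest [sX] commit time.

    Assumption: when both exist, the more recent of the two wins.
    """
    candidates = [t for t in (row_last_seen, latest_commit) if t is not None]
    return max(candidates) if candidates else None


def is_live(last_seen: Optional[datetime], now: datetime) -> bool:
    """A session counts as live if it was seen within the reap window."""
    return last_seen is not None and now - last_seen <= REAP_AFTER
```

Passing `now` explicitly keeps the check deterministic, which matters when the same rule is applied to historical rows during a post-mortem.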
5. Riff as the active channel (task-scoped)
Task-coupled coordination lives in tasks-prod (DB-backed, multi-client, no
single-holder, human web UI): assignee = owner, labels = lane/role, comments
= ack/decision log, dependencies = handoff ordering. MVP needs no schema change.
Atomic handoff = assignee + status swap in a single update_task call
(per sC's field practice — the doctrine "comment is human narrative; status is
the protocol" applies). The active-channel feature upgrade (a to_session
field, an "addressed to me, unread" inbox query backed by a per-(task, user)
last_seen_at so unread = updated_at > last_seen_at, and handoff/ack
comment kinds) is specified in the findings doc §4 and filed as Riff #221.
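The unread rule is small enough to state as code. The sketch below uses plain dicts as a stand-in for task rows; the field names `id`, `assignee`, and `updated_at` and the `inbox` helper are illustrative assumptions, not the Riff schema.

```python
from datetime import datetime
from typing import Optional


def is_unread(updated_at: datetime, last_seen_at: Optional[datetime]) -> bool:
    """ADR rule: unread = updated_at > last_seen_at."""
    # Assumption: no last_seen_at row yet means the task was never read.
    return last_seen_at is None or updated_at > last_seen_at


def inbox(tasks: list[dict], me: str, last_seen: dict[int, datetime]) -> list[int]:
    """Ids of tasks addressed to `me` that changed since `me` last looked at them."""
    return [
        t["id"]
        for t in tasks
        if t["assignee"] == me and is_unread(t["updated_at"], last_seen.get(t["id"]))
    ]
```

Because last_seen_at is tracked per (task, user), one session reading a task does not mark it read for any other session.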
Implementation note (2026-05-31, sA review of codecomedy-platform PRs #233–#236, draft).
The feature was built as five slices; code reviewed against this ADR. Two
"owner's-call" items were resolved by the implementer and accepted on review:
- P2 addressing = approach B (name-shape inference; a single assignee TEXT
field where a registered session's id sA and display name Sintra are
interchangeable, resolved via a session_presence registry). The illustrative
"to_session field" wording above is superseded — no new column; the
requirement (both forms resolve to the same target) is met by resolveAssigneeForms
expanding the filter to both forms, falling back to a literal match for humans /
unregistered strings.
- P5 presence = explicit-only (sessions call upsert_session_presence; no
server-side auto-bump on other MCP calls). ⚠️ This partially deviates from §4's
"mechanical, not hand-maintained" intent on the Riff tier: an active session
that never self-registers is invisible to resolve_session, the P2 assignee
expansion, and list_active_sessions. Accepted for this round (registration must
be explicit anyway, since it carries display_name/lane/capabilities), with a
recommended fast-follow: auto-bump last_seen_at on cheap authenticated calls
(list_tasks/get_task/update_task/add_comment) for already-registered
sessions, so working sessions don't silently fall off presence — making the Riff
beacon as mechanical as the git-side post-commit hook.
- Schema keys on username TEXT (Authentik subject ids), not uuid — fine for
po-platform, which addresses sessions by setting TASKS_USER=<session-id> per MCP.
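The approach-B behaviour described above (both forms resolve to the same target, with a literal fallback) can be illustrated as follows. This is a Python sketch of the described behaviour only, not the code in the reviewed PRs; the registry shape (session id mapped to display name) is an assumption.

```python
def resolve_assignee_forms(value: str, registry: dict[str, str]) -> set[str]:
    """Expand an assignee filter to every form that names the same session.

    `registry` maps session id -> display name (a stand-in for session_presence).
    """
    for session_id, display_name in registry.items():
        if value in (session_id, display_name):
            return {session_id, display_name}
    # Humans and unregistered strings fall back to a literal match.
    return {value}
```

Filtering tasks by `assignee in resolve_assignee_forms(...)` then returns the same rows whether the caller wrote the id or the display name.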
Integration prerequisites (po-platform side; gated on cc-platform merge + prod-deploy + MCP restart):
1. Each session's MCP must run with a distinct TASKS_USER (sA/sB/sC), or all
sessions share one inbox/presence row and the per-session semantics collapse.
2. Sessions must self-register via upsert_session_presence at start and heartbeat
it — the in-Riff mirror of the active.md claim + post-commit beacon. Until the
PRs deploy, the git blackboard (Tier 1) remains the sole live channel.
The migrations land additively on the shared cc_prod instance (po-platform's Riff
project lives there too); merge + deploy timing is the human's go.
6. Human bridge
Telegram remains the single-holder human↔agent bridge — appropriate for a bridge,
unfit as the bus. The holding session relays José's instructions into Tier 1 as
type:relay blocks. José can also edit any queue file / Riff directly to inject or
arbitrate.
Consequences
Positive: no new always-on infrastructure; durable + concurrent + auditable by
construction; the lossy relay becomes tracked (relay + status); presence stops
being guesswork; ownership has explicit verbs + a single-owner invariant; everything
degrades gracefully (a session with no MCP/Telegram can still read/write the bus on
git pull).
Negative / costs: still poll-on-pull — no push; near-real-time handoffs wait
for the next git pull. Status advancement is a discipline backed only by a hook,
not hard-enforced. The Riff channel depends on tunnel uptime (down ~60 min on
2026-05-27, per Context) → it complements, never replaces, the git spine.
Upgrade triggers (revisit this ADR), in increasing weight:
- (a) Lightweight push — file-watcher notify (sC's primary ask, 2026-05-28).
A tiny per-session loop that runs git fetch and then watches git log main..origin/main -- docs/ai/sessions/queue/to-<me>.md
surfaces "N new messages" on the next interactive turn. Zero new infra, zero
schema change; closes the wake-up gap that turned the Telegram→sA relay
twice-painful today. Recommended first upgrade if poll-on-pull latency hurts.
- (b) Heavier real-time — Postgres session_bus + LISTEN/NOTIFY for
true push + row-locks; costs a daemon + "only up with the dev stack."
- (c) At scale — A2A (Agent Cards + Tasks) for cross-machine / many-agent
/ untrusted peers.
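Upgrade trigger (a) could be sketched as below. It assumes the goal is to spot commits that exist on origin/main but not yet locally, which in git's two-dot syntax is the range main..origin/main, and it counts commits touching the queue file as a proxy for "N new messages". The function names are hypothetical.

```python
import subprocess


def count_commits(log_output: str) -> int:
    """Count commits in `git log --oneline` output (one non-empty line per commit)."""
    return sum(1 for line in log_output.splitlines() if line.strip())


def new_message_commits(session: str, repo: str = ".") -> int:
    """Fetch, then count unpulled commits that touch this session's queue file."""
    path = f"docs/ai/sessions/queue/to-{session}.md"
    subprocess.run(["git", "-C", repo, "fetch", "--quiet", "origin"], check=True)
    result = subprocess.run(
        ["git", "-C", repo, "log", "--oneline", "main..origin/main", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return count_commits(result.stdout)
```

A session would call `new_message_commits("sA")` at the start of each interactive turn and surface the count if it is non-zero; the fetch does not modify the working tree, so it is safe to run mid-task.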
RabbitMQ stays rejected: sC's field check confirmed a session shell can reach
it (docker exec po-rabbitmq …), but (i) mixing with the business messaging
vhost is a category error; (ii) the persistent-consumer requirement clashes with
async/idle sessions that spawn, work in bursts, and respawn with zero in-memory
state; (iii) auditability beats real-time for our workflow — git log -- queue/to-sX.md
remains grep-able months later, broker messages are ephemeral once consumed.
Rollout: operationalized as Layer 6 of parallel-sessions.md v4 (message
schema, status lifecycle, presence cells, ownership verbs). The post-commit
heartbeat hook and the active.md last_seen/capabilities columns are the only
mechanical additions; the message-header + status convention is documentation +
discipline, adoptable immediately.
Cross-project extension (Layer 7, 2026-05-31)
Layers 0–6 cover sessions within one repo (sA/sB/sC on po-platform). A
second axis surfaced once the Riff product itself was built by an agent in a
different project (codecomedy-platform): how do sessions in different projects
coordinate?
The constraint that picks the channel is the same one as §Decision, one level up:
Tier 1 (the git blackboard) is per-repo — a session in repo A cannot git pull
repo B's queue/; Tier 3 (Telegram) is a single-holder human bridge. Only the
shared tasks-prod instance (Tier 2) is reachable from sessions in different
projects. So cross-project coordination must ride Tier 2 — it is the only
cross-project bus we have.
Identity must be project-qualified. sX and display names are unique only
within a project's pool — po:sA ≠ fit:sA. Cross-project addressing therefore
uses a project-qualified handle: <project-tag>:<sX> or
<project-tag>:<DisplayName> (po:sA, cc:Bicho), <tag>:* to broadcast a
project's sessions.
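A minimal sketch of parsing a project-qualified handle. The helper names are hypothetical, and rejecting unqualified handles outright is an assumption; the ADR does not say whether a bare sA should default to the sender's own project.

```python
def parse_handle(handle: str) -> tuple[str, str]:
    """Split '<project-tag>:<target>' into (project_tag, target)."""
    project, sep, target = handle.partition(":")
    if not sep or not project or not target:
        raise ValueError(f"not a project-qualified handle: {handle!r}")
    return project, target


def is_project_broadcast(handle: str) -> bool:
    """True for '<tag>:*', which addresses every session in that project."""
    return parse_handle(handle)[1] == "*"
```

The target is returned as-is, so it may be a technical id or a display name; resolving it to a session remains a per-project lookup.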
Decision (José, 2026-05-31): a convention-only channel now; the mechanical
backing later.
- Now (convention): a dedicated tasks-prod project — "Agent Comms
(cross-project)" (9cf68a60-fa35-4707-9371-c775f2542bf5) — is the shared
board. One task per thread; title <from> → <to>: <subject>; labels
from:<tag>/to:<tag>; body carries the ADR §1 header at cross-project scope;
comments carry replies + ack. Zero new code; reuses tasks/comments/labels. Its
pinned CONVENTION task is the normative usage doc.
- Later (mechanical): project-qualified session_presence
(PK (project_id, session_id)) + cross-project resolve_session + a typed
cross-project message primitive. Filed as RIFF-004 in the "Riff - Agents
Feedback" product backlog. This is also why migration 059's missing
project_id is a pre-merge blocker (flagged on PR #236 / Riff #221): the
same shared-instance collision, one layer down.
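The "Now (convention)" thread layout can be sketched as a small builder. It assumes the title uses full project-qualified handles and that the labels use only the project tag; the pinned CONVENTION task is normative on both points, so treat this as an illustration. `thread_task` is a hypothetical helper.

```python
def thread_task(from_handle: str, to_handle: str, subject: str) -> dict:
    """Build title and labels for one cross-project thread task."""
    from_tag = from_handle.split(":", 1)[0]   # project tag of the sender
    to_tag = to_handle.split(":", 1)[0]       # project tag of the recipient
    return {
        "title": f"{from_handle} → {to_handle}: {subject}",
        "labels": [f"from:{from_tag}", f"to:{to_tag}"],
    }
```

Keeping the labels at project granularity lets a session filter the board for everything addressed to its project, including broadcasts.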
This extension is additive to Layers 0–6: intra-project sessions keep using the per-repo git blackboard as their spine; only genuinely cross-project traffic goes to the Agent Comms board.