Architecture
PlanVault follows a layered architecture with clear separation between the HTTP API layer, the planning engine, the FSM execution runtime, and the persistence layer. The backend is Scala; the frontend is a React SPA.
Pipeline
End-to-end orchestration runs from integrated APIs and the tool catalog through PlanVault to deterministic execution, summarised in the four stages below. The vertical client message path is described in the Data flow subsection further down.
1. Your APIs: OpenAPI / Swagger specs, MCP servers, webhooks
2. Tool Selection: 500+ tools in catalog, vector + FTS search, adaptive routing
3. AI Planning: the LLM produces an execution plan (structured JSON or DSL) behind a HITL approval gate
4. Execution: event-sourced FSM, AES-256-GCM encryption, real-time SSE stream
System components
- HTTP API Layer
Exposes the Runtime API (/api/v1/…): project-scoped routes (sessions, messages, tools, scheduled jobs, and related operations) plus public inbound organisation webhook triggers at POST /api/v1/orgs/{orgId}/webhooks/{triggerKey}. The Admin API (/admin/v1/…) covers organisation, project, and console configuration (Keycloak JWT). SSE endpoints stream live execution. The OpenAPI document is generated automatically; the interactive explorer at /api-docs publishes the public contract (platform-operator `/superadmin/v1` routes are typically served by a separate process and are not part of that bundle). Errors follow RFC 7807; use X-Request-Id for correlation.
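For error handling, a minimal sketch of decoding the RFC 7807 problem shape on the client side; the case class mirrors the RFC's standard members, and the logging helper is illustrative:

```scala
// RFC 7807 standard members; PlanVault-specific extension members, if any,
// are omitted here.
final case class ProblemDetail(
  `type`: String,           // URI reference identifying the error class
  title: String,            // short, human-readable summary
  status: Int,              // HTTP status code, duplicated in the body
  detail: Option[String],   // occurrence-specific explanation
  instance: Option[String]  // URI reference for this specific occurrence
)

// Keep the X-Request-Id response header next to the problem body so failures
// can be correlated with server-side logs.
def logFailure(requestId: Option[String], p: ProblemDetail): Unit =
  println(s"[X-Request-Id=${requestId.getOrElse("n/a")}] ${p.status} ${p.title}: ${p.detail.getOrElse("")}")
```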
- Planning Engine
Receives user prompts with a shortlisted set of tools, constructs a system prompt including scenario instructions and context variables, and calls the configured LLM via the proxy. Depending on plannerMode, the response is parsed as structured JSON or a text DSL within <script> tags. Both paths compile into a single execution plan for the FSM.
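Whatever the planner mode, the parsed response normalises into one internal plan shape before it reaches the runtime. A sketch of what such a model could look like (names and fields are illustrative, not the actual schema):

```scala
// Illustrative plan model: both plannerMode outputs (structured JSON and the
// <script> DSL) compile into this one shape for the FSM.
sealed trait PlanStep
final case class ToolCall(
  tool: String,               // catalog tool name
  args: Map[String, String],  // arguments resolved from context variables
  onError: String = "fail"    // per-step error policy (illustrative)
) extends PlanStep
final case class Respond(text: String) extends PlanStep // final answer to the user

final case class ExecutionPlan(steps: List[PlanStep])
```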
- Execution Runtime
Session execution is modelled as a crash-resilient finite-state machine on Apache Pekko: state transitions are journaled for recovery after restarts; after completion the ephemeral journal is removed while long-term encrypted history stays in the configured event store.
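A minimal sketch of the journal-first pattern on Pekko Persistence Typed; the command/event protocol here is illustrative, not PlanVault's actual one:

```scala
import org.apache.pekko.persistence.typed.PersistenceId
import org.apache.pekko.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

// Illustrative session FSM: the event is journaled *before* the state changes,
// so a crashed process can replay the journal and resume where it stopped.
object SessionFsm {
  sealed trait Command
  final case class StartStep(tool: String)    extends Command
  final case class StepFinished(tool: String) extends Command

  sealed trait Event
  final case class StepStarted(tool: String)   extends Event
  final case class StepCompleted(tool: String) extends Event

  final case class State(inFlight: Set[String] = Set.empty)

  def apply(sessionId: String): EventSourcedBehavior[Command, Event, State] =
    EventSourcedBehavior[Command, Event, State](
      persistenceId = PersistenceId.ofUniqueId(s"session-$sessionId"),
      emptyState = State(),
      commandHandler = (_, cmd) =>
        cmd match {
          case StartStep(t)    => Effect.persist(StepStarted(t))   // journal first
          case StepFinished(t) => Effect.persist(StepCompleted(t)) // then mutate
        },
      eventHandler = (state, evt) =>
        evt match {
          case StepStarted(t)   => state.copy(inFlight = state.inFlight + t)
          case StepCompleted(t) => state.copy(inFlight = state.inFlight - t)
        }
    )
}
```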
- Tool Executor
Dispatches tool calls to external services via HTTP (REST), MCP protocol, or outbound webhook POST. Secrets are decrypted from their scope and injected into request headers/bodies at execution time — values never reach the LLM prompt. Tool approvalPolicy, project planApprovalMode, and session autoApprovePlan control human-in-the-loop pauses before side effects.
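Late binding can be pictured as placeholder substitution at dispatch time. A hedged sketch (the {{secret:NAME}} placeholder syntax and the SecretScope interface are illustrative):

```scala
// Illustrative late binding of secrets: the LLM only ever sees the placeholder
// name; the decrypted value exists just before the request is dispatched.
final case class ToolRequest(url: String, headers: Map[String, String], body: String)

// Assumed interface over the envelope-encryption layer described below.
trait SecretScope { def decrypt(name: String): String }

def injectSecrets(req: ToolRequest, scope: SecretScope): ToolRequest = {
  val Placeholder = """\{\{secret:([A-Za-z0-9_-]+)\}\}""".r // e.g. {{secret:API_TOKEN}}
  def resolve(s: String): String =
    Placeholder.replaceAllIn(s, m =>
      java.util.regex.Matcher.quoteReplacement(scope.decrypt(m.group(1))))
  req.copy(headers = req.headers.view.mapValues(resolve).toMap, body = resolve(req.body))
}
```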
- Adaptive Retrieval
Narrows the full tool catalog to a relevant shortlist per query. Strategies include vector similarity, PostgreSQL full-text search (FTS), hierarchical centroid routing, and scenario-based boosts — all fused via weighted scoring with usage statistics. Configurable per project and org with auto mode that adapts based on tool count thresholds.
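As an illustration of how the fusion step could combine strategies, here is a toy Reciprocal Rank Fusion with a usage boost (the weights, the constant k, and the additive boost are assumptions):

```scala
// Toy fusion: RRF combines rank lists from vector and FTS retrieval, then a
// usage-statistics boost nudges historically successful tools upward.
def fuse(
    vectorRanked: List[String],      // tool names, best first
    ftsRanked: List[String],
    usageBoost: Map[String, Double], // e.g. derived from historical success counts
    k: Int = 60                      // common RRF damping constant
): List[String] = {
  def rrf(ranked: List[String]): Map[String, Double] =
    ranked.zipWithIndex.map { case (tool, i) => tool -> 1.0 / (k + i + 1) }.toMap

  val scores = (rrf(vectorRanked).toSeq ++ rrf(ftsRanked).toSeq)
    .groupMapReduce(_._1)(_._2)(_ + _) // sum RRF contributions per tool
    .map { case (tool, s) => tool -> (s + usageBoost.getOrElse(tool, 0.0)) }

  scores.toList.sortBy(-_._2).map(_._1)
}
```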
- Persistence Layer
PostgreSQL stores organisations, projects, sessions, tool catalog metadata, scenarios, audit logs, and GDPR exports. Long-term encrypted session history in hosted SaaS is typically separated from the ephemeral execution journal; self-hosted operators choose storage mode (PostgreSQL or filesystem). Optional Redis handles idempotency keys on message and action endpoints.
Data flow
The public Runtime API orchestrates the full pipeline: tool selection → planning → execution. The FSM runs asynchronously — after POST /api/v1/projects/{projectId}/sessions/{id}/messages the client typically receives HTTP 202 Accepted and monitors progress via SSE (GET …/sessions/{id}/chat) or history polling.
Client → Runtime API
POST /api/v1/projects/{projectId}/sessions/{id}/messages — the client submits a natural-language prompt.
Adaptive Retrieval → tool shortlist
The retrieval pipeline selects the optimal strategy (Direct / FlatRag / FullRag / HierarchicalRag) based on catalog size and narrows thousands of tools to a focused shortlist.
Planning engine → LLM → execution plan
Tool signatures and session context are sent to the LLM via the proxy. The model returns a structured execution plan — never raw function calls.
Session → execution runtime
Inside the session a crash-resilient FSM is created; state transitions are journaled for recovery before they take effect.
Tool execution → external services
Each plan step is executed against your APIs (webhooks, MCP servers, OpenAPI endpoints). Results are encrypted and stored as session events.
SSE stream → Client
Execution progress, tool results, and final output stream back to the client in real time via Server-Sent Events.
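A minimal end-to-end client sketch in Scala using java.net.http. The base URL, IDs, bearer key, and the JSON message body are placeholders; consult the API reference for the actual request schema:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object SessionClient {
  def main(args: Array[String]): Unit = {
    val base      = "https://planvault.example.com" // placeholder host
    val projectId = "proj_123"                      // placeholder IDs
    val sessionId = "sess_456"
    val client    = HttpClient.newHttpClient()

    // 1. Submit the prompt. The runtime answers 202 Accepted and continues
    //    asynchronously; Idempotency-Key makes client retries safe.
    val post = HttpRequest.newBuilder()
      .uri(URI.create(s"$base/api/v1/projects/$projectId/sessions/$sessionId/messages"))
      .header("Authorization", "Bearer <project-api-key>")
      .header("Content-Type", "application/json")
      .header("Idempotency-Key", java.util.UUID.randomUUID().toString)
      .POST(HttpRequest.BodyPublishers.ofString("""{"text":"Refund order #1042"}""")) // assumed body shape
      .build()
    println(client.send(post, HttpResponse.BodyHandlers.ofString()).statusCode()) // expect 202

    // 2. Watch execution over SSE; events arrive as "data: {...}" lines.
    val sse = HttpRequest.newBuilder()
      .uri(URI.create(s"$base/api/v1/projects/$projectId/sessions/$sessionId/chat"))
      .header("Authorization", "Bearer <project-api-key>")
      .header("Accept", "text/event-stream")
      .build()
    client.send(sse, HttpResponse.BodyHandlers.ofLines())
      .body()
      .filter(_.startsWith("data: "))
      .forEach(line => println(line.stripPrefix("data: ")))
  }
}
```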
Architecture Decision Records
Key architectural decisions behind PlanVault and the reasoning that led to each choice.
Why adaptive tool selection?
Enterprise API landscapes routinely exceed 500 endpoints. Passing all tool signatures into a single LLM prompt is infeasible: context windows are finite, latency grows roughly linearly with prompt length, and plan quality degrades as irrelevant tools add noise. A fixed-strategy approach forces operators to choose between coverage and quality.
Decision: implement a 4-tier adaptive retrieval pipeline that automatically selects the optimal strategy based on catalog size.
• Direct (≤20 tools) — all tools in prompt, zero retrieval overhead
• FlatRag (≤100) — vector similarity narrows the set
• FullRag (≤200) — hybrid vector + FTS with Reciprocal Rank Fusion
• HierarchicalRag (200+) — centroid-based group routing before vector search
Why centroids instead of an LLM classifier? The classic approach (pass each group description to an LLM and ask it to choose) adds 1–2 seconds of latency before every agent action and scales linearly with the number of groups. Instead, PlanVault stores a mean L2-normalised embedding (centroid) for each tool_group in the tool_group_centroids table. Finding the nearest K groups is a single pgvector query at the DB level (~10–50ms) vs ~1000–2000ms for an LLM call. This keeps plan quality constant regardless of catalog size while keeping prompt token usage minimal. Scenario-based boosting adds an adaptive feedback loop: the Semantic Routing Cache tracks which tools succeed for which query patterns and promotes them in future shortlists.
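The centroid lookup reduces to one SQL query. A hedged sketch over JDBC (the connection string and the group_id column name are illustrative; tool_group_centroids is the table named above):

```scala
import java.sql.DriverManager

// Hedged sketch: nearest-K centroid lookup with pgvector over plain JDBC.
// Requires the PostgreSQL JDBC driver on the classpath.
def nearestGroups(queryEmbedding: Array[Float], k: Int = 5): List[String] = {
  val conn = DriverManager.getConnection("jdbc:postgresql://localhost/planvault")
  try {
    val literal = queryEmbedding.mkString("[", ",", "]") // pgvector text form
    val st = conn.prepareStatement(
      // "<=>" is pgvector's cosine-distance operator; with L2-normalised
      // centroids it orders results identically to Euclidean distance.
      "SELECT group_id FROM tool_group_centroids ORDER BY centroid <=> ?::vector LIMIT ?"
    )
    st.setString(1, literal)
    st.setInt(2, k)
    val rs = st.executeQuery()
    Iterator.continually(rs).takeWhile(_.next()).map(_.getString("group_id")).toList
  } finally conn.close()
}
```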
Why event sourcing?
AI agent execution is inherently non-deterministic and long-running. Traditional request-response patterns fail when an agent crashes mid-plan, when a tool call times out, or when a human approval gate pauses execution for hours. Losing execution state in any of these scenarios means lost work, duplicate side effects, and broken audit trails.
Decision: build the execution runtime as a crash-resilient finite-state machine (FSM) on Apache Pekko: every state transition is persisted as an event before it takes effect; after a failure the journal is replayed and execution is reconciled.
• Encrypted long-term session history is stored separately from the ephemeral execution journal; storage mode is configured per deployment (`session-store.mode`)
• The Idempotency-Key header with Redis reduces duplicate mutations on client retries
The result: sessions survive restarts, network partitions, and infrastructure failures through explicit recovery states and idempotency.
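The Redis-backed idempotency check can be pictured as a first-writer-wins SET NX. A minimal sketch with the Jedis client (the key layout and the 24-hour TTL are assumptions):

```scala
import redis.clients.jedis.Jedis
import redis.clients.jedis.params.SetParams

// Illustrative first-writer-wins guard: SET NX succeeds exactly once per key,
// so a retried POST carrying the same Idempotency-Key is seen as a duplicate.
def isFirstDelivery(jedis: Jedis, projectId: String, idempotencyKey: String): Boolean = {
  val key = s"idem:$projectId:$idempotencyKey"              // illustrative key layout
  val res = jedis.set(key, "1", SetParams.setParams().nx().ex(86400))
  res == "OK"                                               // null => key already existed
}
```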
Why envelope encryption?
Multi-tenant platforms store sensitive data from multiple organisations in shared infrastructure. A single database encryption key means one breach exposes every tenant. Column-level encryption with a shared key only protects against disk theft, not application-layer compromise. Regulatory frameworks (GDPR, HIPAA, SOC 2) increasingly require tenant-isolated key management.
Decision: envelope encryption with a per-organisation AES-256-GCM DEK; wrapping follows deployment mode (**typical SaaS:** Google **Tink** with an **AWS KMS**–backed key encryption key; **legacy:** direct AWS KMS wrap of DEK bytes; **self-hosted:** operator-managed KEK).
• Each organisation receives a unique DEK at creation
• The `organizations.dek_wrap` column selects the unwrap stack (Tink vs legacy KMS)
• Async DEK rotation re-encrypts data in background batches; reads stay available, and new encrypted writes during rotation use the pending DEK version
• Crypto-shredding semantics are tied to org deletion / GDPR flows rather than a single toggle
• Secrets are stored encrypted and only resolved at execution time — they never appear in LLM prompts
• A separate HMAC signing key pseudonymises external user identifiers and is distinct from KMS/Tink KEK material
This yields tenant-isolated keys; KEK material stays outside application code — IAM in cloud deployments or operator secret stores when self-hosted.
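The mechanics, minus KMS/Tink, fit in a few lines of plain JCA. A hedged sketch of the envelope pattern (in real deployments the KEK is held by KMS or Tink rather than generated locally as here):

```scala
import java.security.SecureRandom
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec

// Envelope pattern: a per-org DEK encrypts data; a KEK wraps the DEK.
// Only the wrapped DEK and the ciphertext are ever persisted.
object Envelope {
  private val rng = new SecureRandom()

  def newAes256Key(): SecretKey = {
    val kg = KeyGenerator.getInstance("AES")
    kg.init(256)
    kg.generateKey()
  }

  // AES-256-GCM encrypt: returns the IV prepended to ciphertext + auth tag.
  def encrypt(key: SecretKey, plaintext: Array[Byte]): Array[Byte] = {
    val iv = new Array[Byte](12); rng.nextBytes(iv) // 96-bit GCM nonce
    val c = Cipher.getInstance("AES/GCM/NoPadding")
    c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))
    iv ++ c.doFinal(plaintext)
  }

  def main(args: Array[String]): Unit = {
    val kek = newAes256Key()                          // stands in for the KMS/Tink KEK
    val dek = newAes256Key()                          // per-organisation data key
    val wrappedDek = encrypt(kek, dek.getEncoded)     // what actually gets stored
    val ciphertext = encrypt(dek, "session event".getBytes("UTF-8"))
    println(s"wrapped DEK: ${wrappedDek.length} bytes, data: ${ciphertext.length} bytes")
  }
}
```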
Why separate planning from execution?
Most agent frameworks give the LLM direct control over side effects: the model decides what to call and the framework immediately executes it. This makes the model a runtime controller with no safety boundary — one hallucinated tool call can mutate production data, and there is no consistent point to insert approval gates, retry policies, or audit logging.
Decision: separate planning (the LLM produces a structured execution plan) from execution (deterministic FSM runtime).
• By default, the LLM receives tool signatures and metadata, not raw payloads; the only exception is explicitly enabled bounded evidence replan for read-only tools. Secrets are never sent into the prompt.
• The output is a structured execution plan, not a direct function call
• The runtime validates the plan, applies approval gates, and executes steps one at a time
• Each step has explicit retry, timeout, and error-handling policies
• The plan can be reviewed, summarised (via utility model), and approved before any side effect occurs
This architecture makes AI orchestration auditable, controllable, and safe for regulated environments where uncontrolled model-driven execution is unacceptable.
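To make the gate concrete, an illustrative check combining the project planApprovalMode and session autoApprovePlan settings named in the Tool Executor section (the mode values and precedence shown are assumptions, and tool-level approvalPolicy would add a third layer):

```scala
// Illustrative HITL gate: execution pauses before side effects unless the
// settings allow auto-approval. Mode strings and precedence are assumed.
sealed trait Gate
case object AutoApprove   extends Gate
case object PauseForHuman extends Gate

def planGate(planApprovalMode: String, autoApprovePlan: Boolean): Gate =
  planApprovalMode match {
    case "always"             => PauseForHuman // project forces review
    case "never"              => AutoApprove   // project disables the gate
    case _ if autoApprovePlan => AutoApprove   // session opted in
    case _                    => PauseForHuman // default to the safe side
  }
```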
Operational Capabilities
Beyond the orchestration core, PlanVault includes a set of operational capabilities critical for production: adaptive tool selection with Semantic Routing Cache and feedback loops, replay debugging, run-level tracing, automatic OpenAPI drift correction, crash recovery, and scheduled execution.
Feedback Loops for Scenarios and Plans
The adaptive tool selection system does not just execute queries — it refines routing from every outcome. Scenario ranking considers execution outcomes, HITL rejections, and explicit user feedback (like/dislike) to improve future tool selection without retraining the model.
• Signal-aware scenario scoring: weighted execution failures, HITL rejects, and explicit likes/dislikes (toy sketch after this list)
• Penalisation on HITL reject for contributing auto-scenario vectors
• POST …/sessions/{sessionId}/feedback records like/dislike for a terminal run (Runtime project key with sessionWrite scope or member JWT); optional X-Operator-Id header when using an API key
• Encrypted feedback events tied to runs/scenarios feed ranking updates
• Console chat surfaces feedback controls after completed runs
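As a toy illustration of signal-aware scoring (the linear form and the weights are assumptions, not PlanVault's actual formula):

```scala
// Toy signal-aware score: successes raise a scenario's influence; failures,
// HITL rejects, and dislikes lower it. Weights are made up for illustration.
final case class ScenarioSignals(successes: Int, failures: Int, hitlRejects: Int, likes: Int, dislikes: Int)

def scenarioScore(s: ScenarioSignals): Double = {
  val runs        = math.max(1, s.successes + s.failures)
  val successRate = s.successes.toDouble / runs
  successRate - 0.5 * s.hitlRejects - 0.2 * s.dislikes + 0.1 * s.likes
}
```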
Scenarios & Adaptive Tool Selection
Scenarios are the key mechanism for ensuring the right tools are selected for every query and that selection accuracy improves with each execution. Scenarios work as cached mappings from query patterns to tool sets and are created in two ways: manually by administrators (priority 2–100) or automatically after every successful execution (priority 1).
Auto scenarios build an adaptive routing loop that operates without operator intervention. After every successful execution, the system boosts the ranking of the matched tools for similar queries in the future. Matching uses the **Semantic Routing Cache** (organisation-scoped query embeddings; raw end-user prompts are not stored) — organisation owners can disable it under General settings. If an execution fails, the scenario receives a negative signal — its success_rate drops and its influence on tool selection decreases. HITL rejections and explicit user feedback (like/dislike) are also factored in. Routing quality improves without retraining the LLM.
Manual scenarios give full control: an administrator explicitly maps sample queries to tool sets and can add a systemInstruction — extra planner prompt text with {{key}} placeholders filled from session contextVars. This guarantees that for typical business processes the system always selects the same proven set of tools — a stable, predictable scenario that covers the majority of cases.
• Automatic scenario creation after successful executions — routing optimises without manual intervention
• Semantic Routing Cache — embeddings for semantic matching; enabling/disabling the org-wide cache is **OWNER-only** (Admin API); Routing Cache and Suggested Patterns tabs in the console are available to OWNER, ADMIN, and DEVELOPER (MEMBER cannot manage scenarios)
• Manual scenarios with priority 2–100 for critical business processes — predictable outcomes
• Success rate tracking: every auto-scenario has a success metric updated after each execution
• Weighted fusion with retrieval RRF scores and usage boost — scenarios complement search, not replace it
• systemInstruction with {{key}} placeholders for context-dependent planner behaviour
• Group caps prevent any single integration from dominating the shortlist
• Test selection endpoint for dry-run of the full pipeline without actual execution
Debugging, Replay & Failure Analysis
Debugging complex AI-orchestrated workflows is expensive: re-running entire scenarios, spending tokens, re-invoking external systems. PlanVault provides a debug/replay layer for efficient failure analysis.
• Modes frozen and live reuse the saved plan snapshot; frozen replays recorded tool outputs, live performs real HTTP/MCP/webhook calls (optional JSON parameter/tool overrides); see the model after this list
• replan_frozen / replan_live re-run planning from the original prompt with frozen or live side effects respectively
• Checkpoint replay (frozen/live only): resume from tool:<name> plus optional completion index — incompatible with replan_* modes
• Plan snapshots are persisted per run for replay inputs
• Runtime API (Keycloak JWT only, sessionHistoryRead scope + org debug content access): POST /api/v1/projects/{projectId}/sessions/{sessionId}/runs/{runId}/replay, GET …/runs/{runId}/replay-status, GET …/runs/{runId}/replay-runs, GET …/runs/{runId}/diff?compareWith={otherRunId} — rate limits apply
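The mode and checkpoint compatibility rules above can be captured in a small model (illustrative types, not the actual API):

```scala
// Illustrative model of the four replay modes described above.
sealed trait ReplayMode
case object Frozen       extends ReplayMode // saved plan + recorded tool outputs
case object Live         extends ReplayMode // saved plan + real side effects
case object ReplanFrozen extends ReplayMode // re-plan from prompt, frozen effects
case object ReplanLive   extends ReplayMode // re-plan from prompt, live effects

// Checkpoint resume applies to Frozen/Live only (incompatible with replans).
final case class Checkpoint(tool: String, completionIndex: Option[Int])
final case class ReplayRequest(mode: ReplayMode, checkpoint: Option[Checkpoint]) {
  require(checkpoint.isEmpty || mode == Frozen || mode == Live,
    "checkpoint replay is only valid for frozen/live modes")
}
```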
Run Tracing & Operational Diagnostics
Every run receives a full diagnostics stream with graph-oriented events: from selection and planning through every tool call to the terminal state. This enables step-level execution analysis.
• Persisted diagnostics rows: seq_no, visibility, graph-normalised kinds (node_entered, node_completed, branch_taken, node_paused for HITL, terminal for completion/errors), graph node ID, duration, outcome, error taxonomy
• GET …/runs/{runId}/timeline — product-safe timeline; GET …/runs/{runId}/diagnostics — detailed rows (Keycloak JWT required; project API keys are rejected)
• Correlation: requestId and traceId on diagnostic rows when present
• Planner/tool latency, replan, HITL, and failure signals surface in the graph
• Console Run diagnostics page (JWT users): execution graph and timeline
• Retention: diagnostics are pruned after a short default window unless org session retention extends it — not an indefinite audit archive
OpenAPI Auto-Healer
OpenAPI specs drift: fields become required, types change, responses stop matching the imported schema. When an OpenAPI-backed HTTP tool fails on PlanVault’s HTTP replan path, Auto-Healer ingests the status code and error snippet, diagnoses likely drift with deterministic rules (optional LLM-assisted proposals), and applies or queues patches under policy.
• Ingestion from OpenAPI HTTP tool failures on the planner replan path (HTTP status + sanitised error text)
• Deterministic diagnosis: missing fields, type mismatch, nullable/required drift, enum/content-type mismatches (some classes stay review-only)
• Policy modes: auto-apply supported patches or escalate risky changes to operators
• Apply creates a new tool revision (integrations rebound); audit logs record OPENAPI_TOOL_SPEC_HEALED
• Admin API (Keycloak JWT): cursor-paged list, fetch one event, actions retry | apply | reject | review
Runtime Recovery & Execution Semantics
PlanVault formalises every run lifecycle through canonical statuses (queued, planning, awaiting_confirmation, awaiting_slots, executing, awaiting_external_signal, completed, failed, interrupted, needs_manual_recovery, and related transitions). After a process restart the runtime reconciles open runs automatically.
• Explicit run lifecycle with canonical states and transitions via FSM events
• First-class run persistence in a dedicated table with recovery policy
• Run-level source of truth independent of coarse session status
• Restart reconciliation: open runs automatically recover after a process restart
• Idempotency keys to prevent duplicate mutations during crash recovery
• SSE and lifecycle webhooks reflect run-level transitions
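The canonical statuses map naturally onto an ADT. A sketch (which states count as terminal is an assumption here):

```scala
// Canonical run statuses from the lifecycle above.
sealed trait RunStatus
case object Queued                 extends RunStatus
case object Planning               extends RunStatus
case object AwaitingConfirmation   extends RunStatus // HITL approval gate
case object AwaitingSlots          extends RunStatus
case object Executing              extends RunStatus
case object AwaitingExternalSignal extends RunStatus
case object Completed              extends RunStatus
case object Failed                 extends RunStatus
case object Interrupted            extends RunStatus
case object NeedsManualRecovery    extends RunStatus

// Restart reconciliation only has to consider non-terminal runs. Treating
// completed/failed/interrupted as terminal is an assumption of this sketch.
def needsReconciliation(s: RunStatus): Boolean = s match {
  case Completed | Failed | Interrupted => false
  case _                                => true
}
```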
Scheduled Execution
PlanVault supports delayed and recurring orchestration. One-shot delays use durable scheduled jobs (native schedule_execution tool or Scheduled Jobs API). Recurring schedules use hourly/daily/weekly rules with an IANA timezone — not arbitrary cron strings surfaced in the product UI.
• Native schedule_execution enqueues a durable job to open a new session or resume an existing one after delaySeconds or an absolute runAt — capped by maxScheduleHorizonDays
• Recurring schedules (Admin/API): hourly, daily, or weekly rules with timezone + recurrence payload; targets new_session_prompt or resume_session
• Worker dispatch survives process restarts; short sleep-style waits inside a plan remain separate from business scheduling
• Job state, retries, and encrypted schedule secrets persist in PostgreSQL
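The two scheduling styles reduce to two payload shapes. An illustrative model (field names follow the terms above; the exact schemas may differ):

```scala
// Illustrative shapes for one-shot and recurring scheduling.
sealed trait Schedule

final case class OneShot(
    delaySeconds: Option[Long],         // relative delay...
    runAt: Option[java.time.Instant]    // ...or an absolute timestamp
) extends Schedule                      // both capped by maxScheduleHorizonDays

sealed trait Recurrence
case object Hourly extends Recurrence
case object Daily  extends Recurrence
case object Weekly extends Recurrence

final case class Recurring(
    rule: Recurrence,
    timezone: java.time.ZoneId,         // IANA zone, e.g. ZoneId.of("Europe/Berlin")
    target: String                      // "new_session_prompt" or "resume_session"
) extends Schedule
```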