Getting started

Overview

PlanVault is an enterprise AI orchestration platform. Clients send natural-language prompts; an LLM generates structured execution plans with dynamic routing to the right tools (REST APIs, MCP servers, webhooks). The platform handles multi-tenant isolation, adaptive tool selection, webhook and API confirmations, secret management, multi-provider LLM routing, event-sourced FSM execution with crash recovery, and provides both a web console for administration and a Runtime API for programmatic integration.

This guide covers the system architecture, core concepts, request lifecycle, and security model. For REST API endpoints, request/response schemas, authentication details, and interactive Swagger — see the API documentation page.

Key capabilities:
• Multi-tenant architecture with organisation/project isolation and role-based access
• Dual planner modes: structured JSON Schema and Python-like DSL — same internal plan representation, same runtime
• Adaptive tool retrieval with vector search, FTS, hierarchical centroid routing, and scenario-based fusion
• Event-sourced FSM execution with crash recovery (Apache Pekko persistence)
• Envelope encryption (AES-256-GCM) for all secrets, provider keys, and session data at rest
• Real-time SSE streaming for execution progress
• Human-in-the-loop: plan approval and slot/form input
• Lifecycle webhooks for backend integration (outbound POST for subscribed types: completed, failed, requires_action, interrupted, recovery_required)
• Inbound webhook triggers to start sessions from external systems (Slack, GitHub, CI)
• Data export, per–external-user erasure workflows, and configurable session retention for portability and subject-rights handling
• Self-hosted and air-gapped VPC deployment support


Technical Advantages

PlanVault is built around capabilities that matter in production: scaling to thousands of tools without context overflow, handling multi-megabyte API responses, encrypting every byte at rest, and recovering from crashes mid-execution. This section covers the technical differentiators in detail.

Intelligent tool selection

Most LLM-based agent platforms are limited to 128–200 tools per prompt context. PlanVault’s 4-tier adaptive retrieval scales to thousands of registered tools — the planner always receives a focused, relevant shortlist. On cold starts (before the feedback loop kicks in) semantically similar APIs may compete, but auto-scenarios quickly adapt the ranking.

• Automatic OpenAPI → tools ingestion with full lifecycle: versioning, embedding generation, search document construction
• 4-tier adaptive strategy with configurable thresholds (defaults: Direct ≤20 tools, FlatRag ≤100, FullRag ≤200, HierarchicalRag 200+)
• Scenario-based boosting — manual templates (priority 2–100) and auto scenarios with Semantic Routing Cache (semantic embedding match; raw user prompts are not stored) plus success_rate tracking
• Per-group caps prevent any single service from dominating the shortlist
• Test selection endpoint — dry-run the full pipeline without real execution (POST …/tools/test-selection)
• Configurable fusion weights (RRF-K, usage boost cap, centroid top-K) per org/project
• Hard cap at 200 tools in the final shortlist; retrievalMaxTools default 30
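The fusion step can be sketched as reciprocal rank fusion over the per-retriever rankings; the constant `k` plays the role of the configurable RRF-K weight, and a capped usage boost stands in for the feedback loop. This is a minimal sketch of the technique, not PlanVault's actual implementation; function names, defaults, and tool ids are illustrative.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60, usage_boost=None, boost_cap=0.05):
    """Sketch of reciprocal rank fusion (RRF): merge several ranked tool
    lists (e.g. vector search + FTS) into one shortlist.

    rankings    -- list of ranked lists of tool ids, best first
    k           -- RRF constant (the configurable "RRF-K" weight)
    usage_boost -- optional {tool_id: boost} from the feedback loop,
                   clamped at boost_cap so popular tools cannot dominate
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, tool in enumerate(ranked, start=1):
            scores[tool] += 1.0 / (k + rank)          # classic RRF term
    for tool, boost in (usage_boost or {}).items():
        scores[tool] += min(boost, boost_cap)          # capped usage boost
    return sorted(scores, key=scores.get, reverse=True)

vector = ["billing.refund", "crm.lookup", "mail.send"]
fts = ["crm.lookup", "tickets.create", "billing.refund"]
print(rrf_fuse([vector, fts])[:2])
```

A larger `k` flattens the contribution of individual ranks, which is one way to damp popularity bias without dropping any candidate outright.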

Large response handling

API responses of 1–20 MB don’t break the agent. PlanVault extracts what matters before data becomes runtime context; the planner does not receive raw payloads by default, and bounded evidence replan for read-only tools is explicitly opt-in.

• Input & output schema flattening — nested OpenAPI structures are converted into flat parameter lists, reducing hallucinations during plan generation
• resultJsonPath on webhook tool execution details — extract a specific fragment from a large JSON response before it enters execution scope
• get_field / set_field / merge stdlib tools — the planner can work with large objects incrementally without pulling everything into the prompt
• Output field visibility cap in prompt generators — the LLM sees only the first five output schema fields, not an entire 500-field JSON definition
• On the **evidence replan** path with `postSuccessReplan.redaction` enabled, large JSON is trimmed via configurable maxDepth (default 4), string caps, and optional key drops before the planner sees a fragment — ordinary tool results are not universally depth-truncated for the model
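The redaction knobs above (depth cap, string caps, key drops) can be sketched as one recursive trim. The function below is an illustration of the pattern under those assumptions, not PlanVault's code; the `<truncated>` marker and parameter names are invented for the sketch.

```python
def redact(value, max_depth=4, max_str=200, drop_keys=frozenset(), _depth=0):
    """Sketch of replan-path redaction: trim a large JSON-like structure
    before any fragment reaches the planner."""
    if _depth >= max_depth:
        return "<truncated>"              # depth cap (default maxDepth = 4)
    if isinstance(value, dict):
        return {k: redact(v, max_depth, max_str, drop_keys, _depth + 1)
                for k, v in value.items() if k not in drop_keys}
    if isinstance(value, list):
        return [redact(v, max_depth, max_str, drop_keys, _depth + 1)
                for v in value]
    if isinstance(value, str) and len(value) > max_str:
        return value[:max_str] + "…"      # string cap
    return value

doc = {"token": "secret", "items": [{"body": "x" * 500, "meta": {"a": {"b": 1}}}]}
print(redact(doc, max_depth=3, drop_keys={"token"}))
```

Key drops happen before recursion into a dict, so a dropped key's entire subtree never reaches the model.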

Envelope encryption

Bank-grade encryption. Secrets never reach the LLM. Every byte at rest is encrypted with the organisation’s unique key.

• AES-256-GCM with per-organisation Data Encryption Key (DEK)
• DEK wrapping follows deployment mode (**default SaaS:** **Tink** + **AWS KMS** KEK; **legacy:** direct KMS; **self-hosted:** operator KEK)
• Async DEK rotation with batch re-encryption — no downtime for reads; new encrypted writes during healthy rotation use the pending DEK version
• Secrets never placed into LLM prompts — only variable names as handles; FSM decrypts and injects values at tool execution time
• Session events are always encrypted at rest in the configured session event store (hosted and self-hosted defaults differ; self-hosted: PostgreSQL or filesystem per `session-store.mode`)
• External user IDs hashed with HMAC-SHA256 before storage (never stored in plaintext)
• Documented threat model: database compromise, stolen API keys, JWT forgery, prompt injection
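The external-user-ID pseudonymisation can be sketched with the standard library alone. The per-organisation key and its name here are assumptions for the sketch; PlanVault's actual key derivation is not shown in this document.

```python
import hashlib
import hmac

def hash_external_user_id(org_key: bytes, external_id: str) -> str:
    """Sketch of HMAC-SHA256 pseudonymisation: only this digest is stored,
    never the plaintext id. A keyed hash (rather than plain SHA-256)
    prevents offline dictionary attacks on predictable ids, and a
    per-organisation key keeps digests uncorrelated across tenants."""
    digest = hmac.new(org_key, external_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()

print(hash_external_user_id(b"org-scoped-secret", "slack:U12345"))
```

The same input and key always yield the same digest, so the hash still works as a stable lookup key for erasure and export workflows.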

Multi-protocol integration

Import your OpenAPI spec. Connect MCP servers. Wire up webhooks. All tools land in one catalog, one selection pipeline, one execution runtime.

• OpenAPI / Swagger — automatic import from JSON or YAML with full schema parsing, auto-embedding generation, and search document construction
• MCP (Model Context Protocol) — stdio and remote HTTP transport; tools synced into the org catalog automatically
• Outbound webhooks — PlanVault calls external services (n8n, Zapier, Make, custom endpoints) with resultJsonPath for response filtering
• Inbound webhooks — external systems trigger new PlanVault sessions (Kafka, event buses, CI) via HMAC-SHA256 signed requests
• Unified catalog — all tool sources share the same adaptive selection pipeline and execution runtime
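Verifying an HMAC-SHA256-signed inbound request follows a standard pattern, sketched below. The shared-secret handling and the way the signature is transported (e.g. which header carries it) are assumptions; check the API documentation for the exact contract.

```python
import hashlib
import hmac

def verify_inbound_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Sketch of inbound-webhook verification: recompute HMAC-SHA256 over
    the raw request body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(expected, signature_hex)

secret = b"shared-webhook-secret"
body = b'{"event": "ci.finished", "status": "green"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_inbound_signature(secret, body, sig))  # → True
```

Verification must run over the raw bytes as received; re-serialising parsed JSON before hashing will break the signature.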

Self-hosted deployment

Deploy in your VPC, data centre, or air-gapped network. Customer-managed infrastructure keeps tenant data and catalog metadata under your control; LLM traffic goes only to backends you configure.

• Full stack deploys on customer infrastructure (Docker, Kubernetes, or bare-metal)
• Organisation DEK wrapping is driven by your self-hosted KEK configuration — not universally “AWS KMS in your customer account”; development stacks may use a compatible KMS-like endpoint
• An LLM proxy layer enables local models (Ollama, vLLM, custom base URLs) without mandatory external providers
• Outbound calls go only to backends you configure (LLM vendors, integrated APIs); there is no separate PlanVault product telemetry in self-hosted deployments
• Optional `session-store.mode=postgres` or `file` for durable session events; file mode is typically single-node — see the public self-hosted setup guide
• GDPR export and erasure out of the box (organisation, project, external user)
• Configurable session retention per org with automatic pruning
• Ephemeral execution state is cleaned up after runs, including across crash/restart

Crash recovery & event sourcing

Agent crashes mid-execution? PlanVault reconciles the run automatically, keeps encrypted history, and separates recoverable states from manual recovery.

• A durable execution journal lets the runtime reconcile state after a crash and resume only safe transitions
• Long-term encrypted event history is stored separately from the ephemeral execution journal
• Finished runs clear the temporary journal after durable events are confirmed
• Run lifecycle uses explicit statuses for interruptions and cases that need manual recovery
• The session message queue serialises concurrent prompts
• The Idempotency-Key header with Redis supports safe client retries within a typical TTL
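The Idempotency-Key behaviour can be illustrated with an in-process stand-in for the Redis-backed store: a repeated key inside the TTL replays the stored response instead of re-executing the request. The class, TTL value, and response shape below are invented for the sketch.

```python
import time

class IdempotencyCache:
    """Sketch of Idempotency-Key handling: first request with a key runs
    for real; retries within the TTL replay the cached response."""

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (expires_at, response)

    def execute(self, key: str, handler):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry and entry[0] > now:
            return entry[1]               # safe retry: replay, don't re-run
        response = handler()              # first attempt: execute the request
        self._entries[key] = (now + self.ttl, response)
        return response

cache = IdempotencyCache(ttl_seconds=60)
calls = []
make_session = lambda: calls.append(1) or {"sessionId": "s-1"}
cache.execute("idem-abc", make_session)
cache.execute("idem-abc", make_session)   # retry: handler is not re-run
print(len(calls))  # → 1
```

This is why a client that times out on session creation can simply resend the same request with the same key: either the original response is replayed, or the request runs for the first time.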

Human-in-the-loop

Keep humans in control. Plans can require approval before execution; agents can request additional information via slot forms.

• Plan approval before execution — three-layer policy: tool level (approvalPolicy: always/default/auto_ok), project level (planApprovalMode: require/auto), and session level (autoApprovePlan: true)
• Auto-approve bypasses the HITL wait and writes a PLAN_AUTO_APPROVED audit row with approvalSource; a tool with approvalPolicy=always blocks auto-approval and writes PLAN_AUTO_APPROVE_BLOCKED
• Changing planApprovalMode at the project level is recorded in the audit log with a timestamp and actor (PROJECT_PLAN_APPROVAL_MODE_CHANGED)
• Slot filling — the agent can pause and request additional data from the user via structured input forms
• SSE streaming for real-time execution UX (GET …/sessions/{id}/chat); confirm_plan_result events include an approvalSource field
• Plan summary via utility model — a separate LLM call generates a plain-language numbered summary (max 5 items) for non-technical approvers
• Lifecycle webhooks: session.requires_action when approval or slots are needed; session.completed / session.failed for terminal outcomes; when subscribed, session.interrupted and session.recovery_required cover crash/recovery semantics; session.completed payloads include approvalMode
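One plausible reading of the three-layer policy can be written down as a small resolver. The precedence shown (tool `always` blocks everything, then session, then project) follows the bullets above, but the exact ordering and return shape are assumptions for the sketch, not the authoritative rule set.

```python
def resolve_plan_approval(tool_policies, plan_approval_mode, auto_approve_plan):
    """Sketch of three-layer plan-approval resolution.
    tool_policies      -- approvalPolicy per tool in the plan
                          ("always" / "default" / "auto_ok")
    plan_approval_mode -- project-level planApprovalMode ("require" / "auto")
    auto_approve_plan  -- session-level autoApprovePlan flag
    Returns ("auto", source) or ("human", reason)."""
    # Layer 1: any approvalPolicy=always tool blocks auto-approval
    # (cf. the PLAN_AUTO_APPROVE_BLOCKED audit row)
    if any(p == "always" for p in tool_policies):
        return ("human", "blocked_by_tool_policy")
    # Layers 2-3: session flag or project mode may auto-approve
    if auto_approve_plan or plan_approval_mode == "auto":
        return ("auto", "session" if auto_approve_plan else "project")
    return ("human", "project_requires_approval")

print(resolve_plan_approval(["default", "always"], "auto", True))
```

Note how the tool layer wins even when both the session and the project ask for auto-approval: the plan still waits for a human.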

LLM budget control

Control costs at every level. Set token and spend caps per organisation and per project with automatic enforcement.

• Per-org and per-project budgets: token count and/or USD caps per billing period (calendar month or rolling 30 days)
• Multi-provider LLM routing (OpenAI, Anthropic, Google, local models via custom api_base)
• Model override per project — different projects can use different models and cost profiles
• Provider API key encryption — all vendor credentials encrypted with the org DEK
• Automatic enforcement: HTTP 403 with specific error codes (ORG_LLM_BUDGET_TOKENS_EXCEEDED, PROJECT_LLM_BUDGET_SPEND_EXCEEDED) when limits are hit
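The enforcement path can be sketched as a gate that checks each scope's usage against its caps and surfaces an error code in the documented style. The data shapes are invented for illustration, and treating "reaching the cap" as exceeded is an assumption of this sketch.

```python
def check_llm_budget(usage, limits):
    """Sketch of budget enforcement: compare period usage with token and
    USD caps per scope; return (status, error_code). Assumes org caps
    are checked before project caps, which is not documented here."""
    for scope in ("ORG", "PROJECT"):
        caps = limits.get(scope, {})
        used = usage.get(scope, {})
        if "tokens" in caps and used.get("tokens", 0) >= caps["tokens"]:
            return (403, f"{scope}_LLM_BUDGET_TOKENS_EXCEEDED")
        if "spend_usd" in caps and used.get("spend_usd", 0.0) >= caps["spend_usd"]:
            return (403, f"{scope}_LLM_BUDGET_SPEND_EXCEEDED")
    return (200, None)

usage = {"ORG": {"tokens": 1_000_000, "spend_usd": 12.0},
         "PROJECT": {"tokens": 50_000, "spend_usd": 30.0}}
limits = {"ORG": {"tokens": 5_000_000}, "PROJECT": {"spend_usd": 25.0}}
print(check_llm_budget(usage, limits))  # → (403, 'PROJECT_LLM_BUDGET_SPEND_EXCEEDED')
```

A scope with no configured cap simply never trips, so token-only and spend-only budgets compose naturally.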

Scoped API keys (HRN)

Fine-grained access control for API integrations. Each key carries explicit scopes matching specific resource patterns.

• Each API key carries HRN-based scopes (e.g. hrn:project:session:create, hrn:project:tools:read, hrn:project:*)
• Per-project key quota comes from the billing plan / config (hosted default: 25) — one primary (full access) plus additional scoped keys
• Key rotation without downtime — new key issued instantly, old hash invalidated
• Key preview (last 4 characters) for identification without exposing the full secret
• Keys hashed (SHA-256) before storage; plaintext shown only at creation or rotation
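Matching a granted scope such as `hrn:project:*` against a required one like `hrn:project:session:create` can be sketched as segment-by-segment comparison with a trailing wildcard. The exact wildcard semantics (e.g. whether `*` may appear mid-scope) are an assumption here; the sketch only allows it as a tail.

```python
def scope_allows(granted: str, required: str) -> bool:
    """Sketch of HRN scope matching: segments must match one by one,
    and a trailing "*" in the granted scope matches the remainder."""
    g, r = granted.split(":"), required.split(":")
    for i, seg in enumerate(g):
        if seg == "*":
            return True                   # wildcard covers the rest
        if i >= len(r) or seg != r[i]:
            return False
    return len(g) == len(r)               # exact match, no leftover segments

def key_allows(key_scopes, required):
    """A key grants access if any of its scopes matches."""
    return any(scope_allows(s, required) for s in key_scopes)

scopes = ["hrn:project:tools:read", "hrn:project:session:*"]
print(key_allows(scopes, "hrn:project:session:create"))  # → True
print(key_allows(scopes, "hrn:project:tools:write"))     # → False
```

Because matching is per-segment, `hrn:project:tools:read` never accidentally grants `hrn:project:tools:write`, while `hrn:project:*` covers both.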

Integration examples

Production-ready integration examples for common patterns, available in the open-source planvault-examples repository.

• React SSE chat — real-time streaming chat UI with session management
• Kafka → webhook — event-driven session creation (Scala, Java, Python variants)
• MCP stdio — Python + SQLite tool server connected via Model Context Protocol
• n8n workflows — outbound + inbound webhook integration patterns
• Bash E2E smoke test — quick deployment verification script

PlanVault/planvault-examples on GitHub

Why PlanVault

PlanVault was designed for production enterprise workloads from day one. The table below compares key capabilities against popular alternatives across the dimensions that matter most for regulated, high-scale deployments.

| Capability | PlanVault | OpenAI Assistants | LangGraph | CrewAI |
|---|---|---|---|---|
| Tool limit per session | 1 000+ tools in catalog; adaptive shortlist per turn (hard cap 200) | 128 | Manual, typically ≤20 | Manual, typically ≤20 |
| Large response handling | Schema flattening, JSONPath extraction, stdlib tools, depth truncation | Token limit only | Custom code required | Custom code required |
| Encryption at rest | AES-256-GCM envelope, per-org DEK; SaaS: Tink + AWS KMS KEK; legacy KMS; self-hosted KEK | Provider-managed | Not built-in | Not built-in |
| On-premise / air-gapped | Full stack, local LLMs via built-in LLM proxy | No (API-only) | Self-managed infra, cloud LLM | Self-managed infra, cloud LLM |
| Human-in-the-loop | Plan approval, slot filling, webhooks | Not built-in | Custom implementation | Custom implementation |
| Crash recovery | Event-sourced FSM, auto-recovery, idempotency keys | No (thread-bound, provider-managed state) | Checkpoints (manual) | No |
| Multi-protocol integration | OpenAPI, MCP, webhooks — unified catalog | Custom functions only | Custom tools only | Custom tools only |
| Routing latency | DB-level centroid routing for large catalogs (typically milliseconds vs LLM classifiers) | Internal OpenAI implementation | LLM as classifier (slow) | LLM as classifier (slow) |
| Popularity bias protection | Logarithmic RRF smoothing | None | None | None |
| Adaptive routing (feedback) | Auto-scenarios with success weight updates | None | Manual prompt tuning | Manual prompt tuning |
| Built-in audit trail | Immutable audit log (append-only), all approvals/rejections with timestamps and details, configurable retention | None | None | None |
Comparison methodology
Comparison based on publicly documented capabilities as of Q2 2026. Third-party platforms may have updated their feature sets since. PlanVault values are verified against the codebase.


Support page

API and documentation questions: [email protected]