Architecture, runtime, and developer workflow
Review how PlanVault routes tools, validates plans, executes controlled runs, records audit events, and supports replay diagnostics. Bring Your Own Key (BYOK) for AI providers keeps credentials under your control.
A model plan becomes a validated, replayable execution trace
PlanValidator passed
Schemas, params, and the terminal path are checked before execution.

Approval required
The runtime pauses side effects until policy approval.

Model-safe context
Secrets stay out of the prompt, and raw large outputs stay out of the planner by default; bounded evidence for read-only replans must be explicitly enabled.

The LLM returns a plan, not side effects
plan "notify_customers"
orders = list_orders(limit: 10)
for order in orders
email = get_field(order, "billing_email")
if email.present?
notify_customer(order.id, email)
reply("Notifications queued")

PlanVault validates and executes
tool shortlist: done
PlanValidator: done
approval gate: waiting
reply: queued

Every step leaves evidence
Approval event
The approval policy, actor, and decision are preserved with the run.

Secret boundary
Secrets resolve inside the runtime and never become prompt text.

Replay trail
Diagnostics keep the trace inspectable without exposing raw output by default.

Controls for risks you cannot leave to the model
Data boundaries, routing, recovery, secrets, and large responses are controlled by the system, not prompt instructions.
Data stays under your control
Deploy inside your VPC or an air-gapped environment, connect local models through LiteLLM, and keep data inside your own perimeter.
Tool scale without prompt chaos
PlanVault narrows a large service catalog to the relevant tools before planning, so the model does not see thousands of descriptions or spend tokens on routing. A Semantic Routing Cache accumulates anonymized embedding vectors from successful runs and improves retrieval accuracy over time — without ever storing raw query text.
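A minimal sketch of centroid-based shortlisting, assuming each tool has an accumulated centroid embedding: the user query is embedded once, compared against every tool centroid by cosine similarity, and only the top-k tools reach the planner. The vectors, tool names, and `shortlist` helper are illustrative, not PlanVault's actual API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(query_vec, tool_centroids, k=3):
    """Rank tools by similarity of the query embedding to each tool's
    centroid and keep only the top-k as planner context."""
    scored = sorted(
        ((cosine(query_vec, c), name) for name, c in tool_centroids.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]

# Toy 3-dimensional "embeddings" for demonstration only.
centroids = {
    "list_orders":     [0.9, 0.1, 0.0],
    "notify_customer": [0.1, 0.9, 0.1],
    "refund_order":    [0.2, 0.1, 0.9],
}
print(shortlist([0.8, 0.3, 0.1], centroids, k=2))
```

Because only anonymized vectors are stored, the cache can improve this ranking over time without retaining the query text itself.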
Recovery without uncontrolled side effects
Every execution step is recorded as an event. After a failure, a run can recover or stop safely without zombie processes or duplicated mutations.
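The recovery idea can be sketched as replaying an event journal on restart: steps the journal proves completed are skipped, so no mutation runs twice. Event names and the `resume` helper are hypothetical, not PlanVault's wire format.

```python
def completed_steps(events):
    """Fold the event journal into the set of steps whose side
    effects are already committed."""
    done = set()
    for ev in events:
        if ev["type"] == "STEP_COMPLETED":
            done.add(ev["step"])
    return done

def resume(plan_steps, events):
    """On restart, skip journal-proven completed steps and
    re-examine the rest."""
    done = completed_steps(events)
    return [s for s in plan_steps if s not in done]

journal = [
    {"type": "STEP_STARTED",   "step": "list_orders"},
    {"type": "STEP_COMPLETED", "step": "list_orders"},
    {"type": "STEP_STARTED",   "step": "notify_customer"},
    # crash here: notify_customer started but never completed
]
print(resume(["list_orders", "notify_customer", "reply"], journal))
```

A started-but-uncompleted mutating step still needs an idempotency check before re-running, which is exactly the safeguard the runtime owns rather than the model.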
Late-bound secrets, encrypted at rest
Secrets are late-bound through scoped references; prompts receive variable names, not plaintext credentials.
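One way to picture late binding, with an illustrative `{{secret:NAME}}` reference syntax (not PlanVault's actual format): the planner only ever sees the reference, and the runtime substitutes the value at call time.

```python
import re

SECRET_STORE = {"STRIPE_KEY": "sk_demo_123"}  # hypothetical secret store

REF = re.compile(r"\{\{secret:(\w+)\}\}")

def planner_view(params):
    # The planner prompt only ever contains the reference, never the value.
    return params

def resolve(params):
    """Resolve {{secret:NAME}} references at execution time,
    inside the runtime boundary."""
    def sub(m):
        return SECRET_STORE[m.group(1)]
    return {k: REF.sub(sub, v) for k, v in params.items()}

params = {"auth_header": "Bearer {{secret:STRIPE_KEY}}"}
print(planner_view(params))  # reference only
print(resolve(params))       # plaintext exists only at call time
```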
Large responses without context overflow
The runtime works with large service responses without pushing raw payloads into the LLM by default; read-only tools can explicitly opt into a bounded evidence excerpt for replanning.
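Depth truncation, one of the techniques named above, can be sketched as cutting a nested payload at a maximum depth so the planner sees the shape of a response without the raw bulk. The function and placeholder token are illustrative.

```python
def truncate_depth(value, max_depth=2, placeholder="<truncated>"):
    """Recursively cut a nested payload at max_depth, replacing
    deeper containers with a placeholder marker."""
    if max_depth == 0:
        return placeholder if isinstance(value, (dict, list)) else value
    if isinstance(value, dict):
        return {k: truncate_depth(v, max_depth - 1, placeholder)
                for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_depth(v, max_depth - 1, placeholder) for v in value]
    return value

resp = {"order": {"id": 7, "lines": [{"sku": "A", "meta": {"w": 1}}]}}
print(truncate_depth(resp, max_depth=2))
# {'order': {'id': 7, 'lines': '<truncated>'}}
```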
Where PlanVault sits next to your agent stack
PlanVault is the governance and execution layer that wraps your agent stack. Any LangChain, LangGraph, CrewAI, AutoGen, DSPy, OpenAI Assistants, or MCP agent runs as a PlanVault tool, with tool calls, secrets, approvals, and recovery handled outside the model and every run recorded for audit and replay.
PlanVault keeps tool calls, secrets, approvals, and recovery outside the model.
| Capability | PlanVault | OpenAI (Assistants / Responses) | Agent frameworks | Hosted agent SaaS |
| --- | --- | --- | --- | --- |
| Tool routing | Centroid-based DB routing; shortlist built before the LLM | Internal OpenAI implementation | LLM as classifier (slow) | LLM-as-classifier; not centroid-based |
| Deployment | Self-hosted · VPC-ready | External API dependency | Self-managed infra, cloud LLM | Hosted SaaS; no self-host |
| Catalog scalability | Dynamic routing from catalogs of thousands of tools; adaptive shortlist per step | 128 tools (Assistants API); context-bounded (Responses API) | Manual assignment, typically ≤20 per agent | Manual catalog; ≤ low hundreds |
| Crash recovery | Event-sourced state; crash-resilient recovery | No (thread-bound, provider-managed state) | Checkpoints (manual); recovery semantics stay app-owned | Conversation-bound; no event journal |
| Large responses | Schema flattening, JSONPath extraction, stdlib tools, depth truncation | Token limit only | Custom code required | Token limit; no JSONPath / flattening |
From existing services to governed AI execution
Five steps show where the model stops and the governed runtime takes over: tool selection, validation, execution, audit, and replay.
Connect the services that already do the work
Start with what your team already has: APIs, MCP servers, webhooks, session context, and correlation metadata. PlanVault turns them into a tool catalog that is easier to select from and safer to plan against.
OpenAPI/Swagger import runs asynchronously and versions tool definitions.
MCP servers and outbound webhooks become normal tools in the same catalog.
Flattened schemas, descriptions, and search documents make tools easier to retrieve and to present in planner prompts.
X-Request-Id, traceparent, tags, and metadata help correlate runs with your systems.
PlanVault prepares the shortlist and context before the LLM
Before the model is called, PlanVault narrows a large catalog into a relevant shortlist. Selection combines retrieval, scenarios, usage signals, and project limits so the model sees prepared context instead of the whole enterprise catalog.
Manual and auto-recorded scenarios provide boosts and optional planner guidance.
Adaptive retrieval moves from direct selection to FTS/vector/hierarchical search as the catalog grows. A Semantic Routing Cache accumulates anonymized embedding vectors after successful runs, improving retrieval precision over time without storing any raw query text.
Hybrid fusion combines retrieval score, scenario boost, and capped usage signal.
Secrets and raw large outputs do not become planner prompt content by default; bounded-evidence replanning is an explicit opt-in for read-only tools.
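The hybrid fusion mentioned above can be sketched as a weighted sum of the three signals, with the usage signal capped so popularity cannot drown out relevance. The weights, cap, and function name are illustrative assumptions, not PlanVault's actual formula.

```python
def fused_score(retrieval, scenario_boost=0.0, usage=0.0,
                usage_cap=0.2, w_retrieval=0.7, w_scenario=0.2, w_usage=0.1):
    """Combine retrieval score, scenario boost, and a capped usage
    signal into a single ranking score for the shortlist."""
    capped_usage = min(usage, usage_cap)  # cap prevents runaway popularity bias
    return (w_retrieval * retrieval
            + w_scenario * scenario_boost
            + w_usage * capped_usage)

# Even a heavily used tool (usage=1.0) only contributes the capped 0.2.
print(fused_score(retrieval=0.8, scenario_boost=0.5, usage=1.0))
```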
The model plans, but does not execute side effects
The planner can use different providers through LiteLLM, and the planning mode depends on the selected model. The LLM creates a plan from allowed context; execution, secrets, and side effects stay in runtime.
The planning model is resolved at the session, project, organization, or global default level.
Structured JSON and Python-like DSL planning modes are supported.
A utility model can handle short auxiliary responses such as slot summaries.
An embedding model builds vectors for tools and scenarios in the background so future requests can find relevant tools faster.
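The model-resolution order above amounts to a simple override chain; a minimal sketch, with placeholder model names:

```python
def resolve_model(session=None, project=None, organization=None,
                  global_default="default-model"):
    """Walk the override chain: a session choice beats project,
    project beats organization, organization beats the global default."""
    for choice in (session, project, organization):
        if choice:
            return choice
    return global_default

print(resolve_model(project="project-model"))  # project override wins
print(resolve_model())                         # falls back to the global default
```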
PlanVault validates the plan and executes it deterministically
The plan is validated and executed through a reactive JVM actor backend. Approval policies, idempotency safeguards, error handling, recovery, and event sourcing live here, instead of letting the model directly control tool calls.
PlanValidator checks tools, params, variables, and exactly one terminal reply/fail.
The execution graph runs through a crash-resilient engine with event-sourced state.
HITL gates depend on project/session/tool policies, not arbitrary LLM decisions.
Retry and replan paths are bounded by configuration and runtime control.
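Two of the PlanValidator checks named above can be sketched directly: every tool must exist in the catalog, and the plan must contain exactly one terminal reply/fail step. Real validation also covers params, variables, and schemas; the plan shape here is an assumption for illustration.

```python
def validate_plan(steps, catalog):
    """Minimal checks in the spirit of PlanValidator: known tools
    only, and exactly one terminal reply/fail step."""
    errors = []
    terminals = [s for s in steps if s["tool"] in ("reply", "fail")]
    if len(terminals) != 1:
        errors.append(f"expected exactly one terminal step, found {len(terminals)}")
    for s in steps:
        if s["tool"] not in catalog and s["tool"] not in ("reply", "fail"):
            errors.append(f"unknown tool: {s['tool']}")
    return errors

catalog = {"list_orders", "notify_customer"}
plan = [{"tool": "list_orders"}, {"tool": "notify_customer"}, {"tool": "reply"}]
print(validate_plan(plan, catalog))  # an empty list means the plan passed
```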
Get traceable output, audit, and production workflow integration
The result is more than a user-facing answer. PlanVault leaves an operational trail: SSE, history, diagnostics, replay, audit logs, encrypted data, and lifecycle hooks for your systems.
SSE and history expose planGraph, tool events, approvals, errors, and terminal states.
Diagnostics and replay help investigate runs without unsafe raw-content access by default.
Envelope encryption, retention, export/erasure, and pseudonymisation support GDPR-oriented operations.
Lifecycle webhooks and APIs let you integrate incrementally with the team and services you already have.
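A client consuming the SSE run stream only needs the standard event-stream format; a minimal parser sketch, with illustrative `planGraph` and `approval` payloads standing in for real run events:

```python
def parse_sse(stream_text):
    """Parse a Server-Sent Events stream into (event, data) pairs.
    Follows the standard 'event:' / 'data:' / blank-line framing."""
    events, name, data = [], "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            # A blank line dispatches the accumulated event.
            events.append((name, "\n".join(data)))
            name, data = "message", []
    return events

raw = (
    "event: planGraph\n"
    'data: {"nodes": 4}\n'
    "\n"
    "event: approval\n"
    'data: {"state": "waiting"}\n'
    "\n"
)
print(parse_sse(raw))
```

In production a streaming HTTP client would feed chunks into the same framing logic instead of a complete string.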
Architecture, integration, and runtime details for the team that will operate it
Inspect the full execution path: service integration, tool selection, validation, diagnostics, and replay. The backend uses a reactive JVM actor architecture for concurrent, stateful workflows.
Developer workflow for integrating and operating AI execution
Engineers connect services, inspect the tool catalog, investigate diagnostics, and replay runs safely without reverse-engineering every prompt or payload.
Connect existing service contracts
Import OpenAPI, connect MCP servers, or add webhooks without rewriting services around an agent framework.
See what the model is allowed to use
Review flattened schemas, search documents, scenario guidance, and the shortlist that will become planner context.
Find integration failures faster
Diagnostics surface contract drift, failed tool calls, validation errors, and runtime state without manual payload archaeology.
Recover from persisted execution state
Replay and history let teams reproduce a run from the controlled event log instead of reconstructing it from chat.
Debug the system boundary, not the model transcript
Instead of debugging agent behavior from model logs, teams inspect service contracts, selected context, runtime state, and the replay path.
OpenAPI + MCP + webhooks indexed
Shortlist, schemas, and scenario guidance visible
Contract drift marked for shadow version
Replay from persisted run events