Architecture, runtime, and developer workflow

Review how PlanVault routes tools, validates plans, executes controlled runs, records audit events, and supports replay diagnostics. Bring Your Own Key (BYOK) for AI providers keeps credentials under your control.

app.planvault.ai/runs/pv-2481
Product proof

A model plan becomes a validated, replayable execution trace

Plan only · Runtime controlled · Audit ready
Validation

PlanValidator passed

Schemas, params, and terminal path checked before execution.
Action control

Approval required

Runtime pauses side effects until policy approval.
Data boundary

Model-safe context

Secrets stay out of the prompt, and raw large outputs stay out of the planner by default; bounded evidence for read-only replans must be explicitly enabled.
Model output

The LLM returns a plan, not side effects

plan "notify_customers"
  orders = list_orders(limit: 10)
  for order in orders
    email = get_field(order, "billing_email")
    if email.present?
      notify_customer(order.id, email)
  reply("Notifications queued")
No plaintext secrets in prompt
Allowed tools and parameters only
Exactly one terminal reply/fail path
Validation + runtime

PlanVault validates and executes

tool shortlist

done
Adaptive retrieval selected the allowed tools.

PlanValidator

done
Tools, params, variables, and terminal state checked.

approval gate

waiting
Policy pauses side effects before customer notification.

reply

queued
Final response waits for controlled execution state.
Audit / replay

Every step leaves evidence

Approval event

Approval policy, actor, and decision are preserved with the run.

Secret boundary

Secrets resolve inside runtime and never become prompt text.

Replay trail

Diagnostics keep the trace inspectable without exposing raw output by default.
Replay available for this run
Technical differentiators

Controls for risks you cannot leave to the model

Data boundaries, routing, recovery, secrets, and large responses are controlled by the system, not prompt instructions.

Data boundary

Data stays under your control

Deploy inside your VPC or an air-gapped capable environment, connect local models through LiteLLM, and keep data inside your own perimeter.

Tool routing

Tool scale without prompt chaos

PlanVault narrows a large service catalog to the relevant tools before planning, so the model does not see thousands of descriptions or spend tokens on routing. A Semantic Routing Cache accumulates anonymized embedding vectors from successful runs and improves retrieval accuracy over time, without ever storing raw query text.
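The centroid idea can be sketched in a few lines. This is an illustration, not PlanVault internals: the function names and toy three-dimensional vectors are invented. Each tool accumulates a running-mean centroid of anonymized embeddings from successful runs, and a query is routed by cosine similarity against those centroids.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def update_centroid(centroid, count, vector):
    # Running mean: fold one more anonymized embedding into the tool's
    # centroid without keeping the raw query around.
    new = [(c * count + v) / (count + 1) for c, v in zip(centroid, vector)]
    return new, count + 1

def shortlist(query_vec, centroids, k=2):
    # Rank tools by similarity between the query embedding and each centroid.
    ranked = sorted(centroids, key=lambda name: cosine(query_vec, centroids[name]),
                    reverse=True)
    return ranked[:k]

centroids = {
    "list_orders":     [0.9, 0.1, 0.0],
    "notify_customer": [0.1, 0.9, 0.1],
    "get_invoice":     [0.0, 0.2, 0.9],
}
print(shortlist([0.8, 0.3, 0.0], centroids))  # an orders-like query
```

Because only centroids are stored, the cache can improve routing from usage history while the queries themselves are never persisted.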

Recovery

Recovery without uncontrolled side effects

Every execution step is recorded as an event. After a failure, a run can recover or stop safely without zombie processes or duplicated mutations.
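A minimal sketch of how event sourcing plus idempotency keys can prevent duplicated mutations after a crash. The `journal` structure and key format here are hypothetical, for illustration only:

```python
journal = []  # append-only event log, persisted per run

def execute_step(step_id, action, done):
    key = f"run-42:{step_id}"          # idempotency key scoped to run + step
    if key in done:
        return "skipped"               # already applied before the crash
    result = action()
    journal.append({"step": step_id, "key": key, "status": "done"})
    done.add(key)
    return result

def recover(journal):
    # Replaying the journal rebuilds the set of completed steps,
    # so re-execution cannot duplicate a mutation.
    return {e["key"] for e in journal if e["status"] == "done"}

done = recover(journal)
print(execute_step(1, lambda: "sent", done))   # executes: "sent"
done = recover(journal)                        # simulate restart after a crash
print(execute_step(1, lambda: "sent", done))   # replay detects it: "skipped"
```

The point is that recovery is derived from recorded events, not from asking the model what already happened.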

Secrets

Late-bound secrets, encrypted at rest

Secrets are late-bound through scoped references; prompts receive variable names, not plaintext credentials.
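The late-binding pattern can be illustrated as follows. The `${secret:NAME}` reference syntax and the `vault` dict are assumptions made for this sketch, not PlanVault's actual formats:

```python
vault = {"SMTP_TOKEN": "s3cr3t-value"}  # stands in for encrypted-at-rest storage

def render_for_prompt(params):
    # What the model sees: the scoped reference, never the plaintext.
    return params

def resolve_at_call_time(params):
    # The runtime swaps references for real values only at the moment
    # of the tool call, outside the prompt.
    out = {}
    for k, v in params.items():
        if isinstance(v, str) and v.startswith("${secret:") and v.endswith("}"):
            out[k] = vault[v[len("${secret:"):-1]]
        else:
            out[k] = v
    return out

params = {"token": "${secret:SMTP_TOKEN}", "to": "a@example.com"}
print(render_for_prompt(params)["token"])      # the reference
print(resolve_at_call_time(params)["token"])   # plaintext, runtime-only
```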

Large outputs

Large responses without context overflow

The runtime works with large service responses without pushing raw payloads into the LLM by default; read-only tools can explicitly opt into a bounded evidence excerpt for replanning.
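Depth truncation, one of the techniques this implies, can be sketched generically. This is not the runtime's actual implementation; it only shows the idea of letting the planner see a payload's shape rather than its bulk:

```python
def truncate_depth(value, max_depth=2):
    # Replace anything nested deeper than max_depth with a marker.
    if max_depth == 0:
        return "<truncated>" if isinstance(value, (dict, list)) else value
    if isinstance(value, dict):
        return {k: truncate_depth(v, max_depth - 1) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_depth(v, max_depth - 1) for v in value]
    return value

payload = {"order": {"id": 7, "lines": [{"sku": "A", "meta": {"big": "..."}}]}}
print(truncate_depth(payload, max_depth=2))
# {'order': {'id': 7, 'lines': '<truncated>'}}
```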

Head-to-head

Where PlanVault sits next to your agent stack

PlanVault is the governance and execution layer that wraps your agent stack: any LangChain, LangGraph, CrewAI, AutoGen, DSPy, OpenAI Assistants, or MCP agent runs as a PlanVault tool. Tool calls, secrets, approvals, and recovery are handled outside the model, and every run is recorded for audit and replay.

PlanVault keeps tool calls, secrets, approvals, and recovery outside the model.

Tool routing
  PlanVault (governed control layer): centroid-based DB routing; shortlist built before the LLM
  OpenAI Assistants: internal OpenAI implementation
  LangGraph / CrewAI: LLM as classifier (slow)
  Credal: LLM-as-classifier; not centroid-based

Self-hosted · VPC-ready
  PlanVault: self-hosted in customer VPC; air-gapped capable; local models via LiteLLM
  OpenAI Assistants: external API dependency
  LangGraph / CrewAI: self-managed infra, cloud LLM
  Credal: hosted SaaS; no self-host

Catalog scalability
  PlanVault: dynamic routing from catalogs of thousands of tools; adaptive shortlist per step
  OpenAI Assistants: 128 tools (Assistants API); context-bounded (Responses API)
  LangGraph / CrewAI: manual assignment, typically ≤20 per agent
  Credal: manual catalog; ≤ low hundreds

Crash recovery
  PlanVault: event-sourced FSM, auto-recovery, idempotency keys
  OpenAI Assistants: no (thread-bound, provider-managed state)
  LangGraph / CrewAI: checkpoints (manual); recovery semantics stay app-owned
  Credal: conversation-bound; no event journal

Large responses
  PlanVault: schema flattening, JSONPath extraction, stdlib tools, depth truncation
  OpenAI Assistants: token limit only
  LangGraph / CrewAI: custom code required
  Credal: token limit; no JSONPath / flattening

Execution journey

From existing services to governed AI execution

Five steps show where the model stops and the governed runtime takes over: tool selection, validation, execution, audit, and replay.

Connect · 01 / 05

Connect the services that already do the work

Start with what your team already has: APIs, MCP servers, webhooks, session context, and correlation metadata. PlanVault turns them into a tool catalog that is easier to select from and safer to plan against.

OpenAPI/Swagger import runs asynchronously and versions tool definitions.

MCP servers and outbound webhooks become normal tools in the same catalog.

Flattened schemas, descriptions, and search documents make tools easier for retrieval and planner prompts.

X-Request-Id, traceparent, tags, and metadata help correlate runs with your systems.

Read integration docs
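As a rough illustration of how an OpenAPI operation might be flattened into a catalog entry with a search document for retrieval. The field names here are invented for the sketch, not PlanVault's schema:

```python
# One operation from an imported OpenAPI document (simplified).
op = {
    "operationId": "listOrders",
    "summary": "List recent orders",
    "parameters": [{"name": "limit", "schema": {"type": "integer"}}],
}

def to_tool(op):
    # Flatten parameters into a simple name -> type map.
    params = {p["name"]: p["schema"]["type"] for p in op.get("parameters", [])}
    return {
        "name": op["operationId"],
        "params": params,
        # Search document: the text the retriever indexes for shortlisting.
        "search_doc": f'{op["operationId"]} {op["summary"]} {" ".join(params)}',
    }

tool = to_tool(op)
print(tool["search_doc"])
```

MCP tools and webhooks can be normalized into the same entry shape, which is what makes one catalog searchable regardless of where a tool came from.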
Prepare · 02 / 05

PlanVault prepares the shortlist and context before the LLM

Before the model is called, PlanVault narrows a large catalog into a relevant shortlist. Selection combines retrieval, scenarios, usage signals, and project limits so the model sees prepared context instead of the whole enterprise catalog.

Manual and auto-recorded scenarios provide boosts and optional planner guidance.

Adaptive retrieval moves from direct selection to FTS/vector/hierarchical search as the catalog grows. A Semantic Routing Cache accumulates anonymized embedding vectors after successful runs, improving retrieval precision over time without storing any raw query text.

Hybrid fusion combines retrieval score, scenario boost, and capped usage signal.

Secrets and raw large outputs do not become planner prompt content by default; bounded-evidence replanning must be explicitly enabled for read-only tools.

Read tool selection docs
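The hybrid fusion rule described above can be sketched as a capped sum. The weights and cap value here are invented for illustration; the point is that the usage signal is bounded so popularity cannot outrank relevance:

```python
def fused_score(retrieval, scenario_boost, usage, usage_cap=0.2):
    # Retrieval score plus scenario boost plus a capped usage signal.
    return retrieval + scenario_boost + min(usage, usage_cap)

# A frequently used but less relevant tool cannot win purely on usage,
# because the usage term saturates at the cap.
print(fused_score(0.9, 0.0, 0.05))  # relevant, rarely used
print(fused_score(0.4, 0.0, 5.0))   # popular, less relevant
```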
Plan · 03 / 05

The model plans, but does not execute side effects

The planner can use different providers through LiteLLM, and the planning mode depends on the selected model. The LLM creates a plan from allowed context; execution, secrets, and side effects stay in runtime.

The model is resolved at the session, project, organization, or global default level.

Structured JSON and Python-like DSL planning modes are supported.

A utility model can handle short auxiliary responses such as slot summaries.

An embedding model builds vectors for tools and scenarios in the background so future requests can find relevant tools faster.

Read planner architecture
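The scope resolution order can be sketched as a simple fall-through, most specific scope first. Function and default names are illustrative, not PlanVault's API:

```python
def resolve_model(session=None, project=None, organization=None,
                  default="global-default"):
    # Most specific scope wins: session -> project -> organization -> global.
    for choice in (session, project, organization):
        if choice is not None:
            return choice
    return default

print(resolve_model(project="project-model"))              # project override
print(resolve_model(session="local-llm", project="x"))     # session wins
print(resolve_model())                                     # global default
```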
Execute · 04 / 05

PlanVault validates the plan and executes it deterministically

The plan is validated and executed through a reactive JVM actor backend. Approval policies, idempotency safeguards, error handling, recovery, and event sourcing live here, instead of letting the model directly control tool calls.

PlanValidator checks tools, params, variables, and exactly one terminal reply/fail.

The execution graph runs through a crash-resilient engine with event-sourced state.

HITL gates depend on project/session/tool policies, not arbitrary LLM decisions.

Retry and replan paths are bounded by configuration and runtime control.

Read runtime controls
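A toy version of the validator checks described above, including the exactly-one-terminal rule. The plan structure and allow-list are invented for this sketch; the real PlanValidator also checks variables and schemas:

```python
# Allowed tools and their permitted parameter names (illustrative).
ALLOWED = {
    "list_orders":     {"limit"},
    "notify_customer": {"order_id", "email"},
    "reply":           {"message"},
    "fail":            {"reason"},
}

def validate(plan):
    errors = []
    terminals = 0
    for step in plan:
        tool, params = step["tool"], step.get("params", {})
        if tool not in ALLOWED:
            errors.append(f"unknown tool: {tool}")
        elif not set(params) <= ALLOWED[tool]:
            errors.append(f"bad params for {tool}: {set(params) - ALLOWED[tool]}")
        if tool in ("reply", "fail"):
            terminals += 1
    if terminals != 1:
        errors.append(f"expected exactly one terminal step, found {terminals}")
    return errors

plan = [
    {"tool": "list_orders", "params": {"limit": 10}},
    {"tool": "notify_customer", "params": {"order_id": 1, "email": "a@b.c"}},
    {"tool": "reply", "params": {"message": "Notifications queued"}},
]
print(validate(plan))  # no errors
```

A plan that fails any check is rejected before the runtime executes a single side effect.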
Operate · 05 / 05

Get traceable output, audit, and production workflow integration

The result is more than a user-facing answer. PlanVault leaves an operational trail: SSE, history, diagnostics, replay, audit logs, encrypted data, and lifecycle hooks for your systems.

SSE and history expose planGraph, tool events, approvals, errors, and terminal states.

Diagnostics and replay help investigate runs without unsafe raw-content access by default.

Envelope encryption, retention, export/erasure, and pseudonymisation support GDPR-oriented operations.

Lifecycle webhooks and APIs let you integrate incrementally with the team and services you already have.

Read security and operations
For engineering teams

Architecture, integration, and runtime details for the team that will operate it

Inspect the full execution path: service integration, tool selection, validation, diagnostics, and replay. The backend uses a reactive JVM actor architecture for concurrent, stateful workflows.

Developer workflow

Developer workflow for integrating and operating AI execution

Engineers connect services, inspect the tool catalog, investigate diagnostics, and replay runs safely without reverse-engineering every prompt or payload.

Integrate

Connect existing service contracts

Import OpenAPI, connect MCP servers, or add webhooks without rewriting services around an agent framework.

Inspect

See what the model is allowed to use

Review flattened schemas, search documents, scenario guidance, and the shortlist that will become planner context.

Debug

Find integration failures faster

Diagnostics surface contract drift, failed tool calls, validation errors, and runtime state without manual payload archaeology.

Replay

Recover from persisted execution state

Replay and history let teams reproduce a run from the controlled event log instead of reconstructing it from chat.

Engineering handoff

Debug the system boundary, not the model transcript

Instead of debugging agent behavior from model logs, teams inspect service contracts, selected context, runtime state, and the replay path.

Catalog

OpenAPI + MCP + webhooks indexed

Context

Shortlist, schemas, and scenario guidance visible

Failure

Contract drift marked for shadow version

Recovery

Replay from persisted run events

Versioned tool definitions
Diagnostics without raw content by default
Replay from event-sourced state
Docs/API path for production integration