Core Concepts
PlanVault is built around a small set of well-defined abstractions. Understanding these concepts is essential for both configuring the platform and integrating with the Runtime API.
Multi-tenancy model
PlanVault uses a two-level hierarchy: Organisation → Project. An organisation is the top-level tenant unit that groups team members, a shared tool catalog, LLM provider configurations, organisation-wide secrets, scenarios, and budget limits. A project is a deployable unit within an org — it has its own Runtime API keys, an enabled subset of org tools, model overrides, project-level secrets, and sessions.
This separation allows a single PlanVault deployment to serve multiple teams or products with isolated configurations and data. Rows are scoped by organisation and project identifiers; each organisation has its own encryption key (DEK), so data is cryptographically isolated between tenants. Every member has one of four organisation roles — OWNER, ADMIN, DEVELOPER, or MEMBER — controlling console and Admin API access and (for Runtime JWT calls) the HRN scope bundle; see **Roles & access control** below.
Session
A session represents a single user interaction thread. Each session is driven by a crash-resilient executor that manages the full lifecycle: prompt → planning → execution → response. Sessions support multiple sequential prompts (follow-ups) within the same conversation context.
During execution, intermediate states and tool payloads are written to an ephemeral journal for crash recovery. After completion or failure, the temporary journal is removed; the long-term encrypted history remains in the configured event store.
Each **run** (one user prompt through the stack) has a detailed lifecycle: queued → planning → optionally awaiting_confirmation / awaiting_slots / awaiting_external_signal → executing → completed or failed; interrupted and needs_manual_recovery cover actor loss or operator recovery. A session aggregates runs and can be started from the console, Runtime API, inbound webhook triggers, or scheduled/recurring jobs.
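The run lifecycle above can be sketched as a transition table. The state names come from this page; the exact set of allowed transitions is an assumption for illustration only.

```python
# Run lifecycle sketch. State names are from the docs; the transition
# map itself is an illustrative assumption, not the authoritative FSM.
RUN_TRANSITIONS = {
    "queued": {"planning"},
    "planning": {"awaiting_confirmation", "awaiting_slots",
                 "awaiting_external_signal", "executing", "failed"},
    "awaiting_confirmation": {"executing", "failed"},
    "awaiting_slots": {"executing", "failed"},
    "awaiting_external_signal": {"executing", "failed"},
    "executing": {"completed", "failed", "interrupted"},
    "interrupted": {"executing", "needs_manual_recovery"},
    "needs_manual_recovery": {"executing", "failed"},
    "completed": set(),   # terminal
    "failed": set(),      # terminal
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a run may move from `current` to `target`."""
    return target in RUN_TRANSITIONS.get(current, set())
```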
Plan
The planner produces a validated execution plan that the FSM interprets. Plans reference tools by normalised names with named parameters, can branch (if/else), and iterate (for item in collection — no while loops in the grammar). Secret and context variable names are metadata supplied to the LLM; decrypted values are injected into execution scope by the FSM, never sent to the model.
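To make the grammar concrete, here is a hypothetical plan shape with a branch and a loop, plus a small walker that collects every tool a plan can reach. The key names (`call`, `if`, `then`, `else`, `for`, `body`) and tool names are invented for illustration; the real plan format is internal to the planner.

```python
# Hypothetical plan structure — key and tool names are illustrative
# assumptions, not the platform's actual plan schema.
plan = [
    {"call": "crm.lookup_contact", "args": {"email": "{{user_email}}"}},
    {"if": "contact_found",
     "then": [{"call": "mailer.send_welcome", "args": {"to": "{{user_email}}"}}],
     "else": [{"call": "crm.create_contact", "args": {"email": "{{user_email}}"}}]},
    # "for item in collection" — the grammar allows bounded iteration,
    # but no while loops.
    {"for": "item", "in": "open_tickets",
     "body": [{"call": "tracker.close_ticket", "args": {"id": "item.id"}}]},
]

def referenced_tools(steps):
    """Collect every tool name reachable from a plan, including branches and loops."""
    tools = []
    for step in steps:
        if "call" in step:
            tools.append(step["call"])
        if "if" in step:
            tools += referenced_tools(step.get("then", []))
            tools += referenced_tools(step.get("else", []))
        if "for" in step:
            tools += referenced_tools(step.get("body", []))
    return tools
```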
Tool
Tools are planner-facing capabilities organised by service (integration group). Sources include OpenAPI endpoints (Swagger JSON/YAML import), synced MCP servers, native intrinsic functions (stdlib), and outbound webhooks (for n8n, Make, Zapier integration). Tools are imported at the organisation level into a shared catalog, then selectively enabled per project.
During a session, the full enabled tool set is narrowed to a shortlist via the adaptive retrieval pipeline (see Tool Selection Pipeline below). The planner receives only the shortlisted tools. At execution time the platform dispatches calls via REST, MCP, or webhook — injecting decrypted secrets from execution scope into request parameters.
Adaptive retrieval & scenarios
When a prompt arrives, the system selects the most relevant tools from the project’s enabled catalog. This avoids sending hundreds of tool definitions to the LLM, which would exceed context limits and degrade plan quality. The pipeline has three conceptual stages: scenario contributions, adaptive retrieval, and hybrid fusion.
Scenarios are cached mappings from query patterns to tool lists. Manual scenarios (created by admins, priority 2–100) explicitly map sample queries to tools and may include systemInstruction — extra planner prompt text with {{key}} placeholders filled from session contextVars. Auto scenarios (priority 1) extend selection using the **Semantic Routing Cache**: anonymised query embeddings stored in your organisation database (no raw end-user prompt text persisted); organisation owners can disable it under General settings.
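Placeholder filling in systemInstruction can be sketched as simple template substitution. The `{{key}}` syntax is from the docs; the behaviour for unknown keys (left intact here) is an assumption.

```python
import re

def render_instruction(template: str, context_vars: dict) -> str:
    """Fill {{key}} placeholders from session contextVars.

    Unknown keys are left intact — an assumption; check the platform's
    actual behaviour before relying on it.
    """
    def substitute(match):
        key = match.group(1)
        return str(context_vars.get(key, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)
```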
After retrieval and scenario matching, results are fused: scenario boosts + retrieval RRF scores + capped usage boost (log-scaled tool usage counters). Group caps prevent one integration from dominating. The final shortlist is bounded by retrievalMaxTools (default 30) and a hard cap of 200. The same pipeline can be tested via POST …/tools/test-selection or the Debug screen in project settings.
Secrets & encryption
Secrets are organised in three layers: Organisation → Project → Session. At execution time, layers are merged — the most specific level wins (session overrides project, project overrides org). Values are encrypted at rest using AES-256-GCM envelope encryption with the organisation DEK (wrapping uses Tink+KMS on typical SaaS, legacy direct KMS, or operator KEK when self-hosted); a separate HMAC signing key pseudonymises external user identifiers.
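The "most specific wins" merge maps directly onto dictionary merging, where later layers override earlier ones. A minimal sketch:

```python
def merge_secrets(org: dict, project: dict, session: dict) -> dict:
    """Merge the three secret layers; the most specific level wins.

    Later dicts override earlier keys: session > project > org.
    """
    return {**org, **project, **session}
```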
Secret values are never placed into the LLM prompt. The model sees only parameter names (secret handles); the FSM decrypts and injects actual values into HTTP tool requests at execution time. LLM provider API keys and custom backend credentials stored under Providers are also DEK-encrypted.
Webhooks
Inbound webhooks are public trigger endpoints (POST /api/v1/orgs/{orgId}/webhooks/{triggerKey}) that allow external systems — Slack, GitHub, CI pipelines — to start new sessions automatically. Each trigger is protected by an HMAC signature to verify authenticity.
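An external caller must compute the HMAC itself before hitting the trigger endpoint. A minimal signing sketch — the header name, digest encoding, and exact bytes covered by the signature are assumptions; consult the API reference for the authoritative scheme.

```python
import hashlib
import hmac

def sign_trigger_body(body: bytes, secret: str) -> str:
    """HMAC-SHA256 hex digest over the raw request body.

    Assumption: the signature covers the body bytes as sent and is
    hex-encoded; verify against the API reference before integrating.
    """
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
```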
Outbound lifecycle webhooks are per-project POST notifications for subscribed event types (session.completed, session.failed, session.requires_action, session.interrupted, session.recovery_required). HTTPS is typical; HTTP or private targets require organisation outbound URL policy opt-in. The payload includes session identifiers, tags, metadata, and a compact data object — not full chat history. Signed with X-Signature (HMAC-SHA256 over recursively sorted JSON bytes). Up to 4 delivery attempts with backoff (5 s, 30 s, 2 min). See the API documentation for field schemas and integration details.
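Verifying X-Signature requires reproducing the "recursively sorted JSON bytes" canonicalisation. A sketch using the standard library — `json.dumps(sort_keys=True)` sorts keys at every nesting level; the compact separators and UTF-8 encoding are assumptions to confirm against the field schemas.

```python
import hashlib
import hmac
import json

def canonical_bytes(payload: dict) -> bytes:
    """Serialise the payload with recursively sorted keys.

    sort_keys=True applies at every nesting depth; compact separators
    and UTF-8 encoding are assumptions here.
    """
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()

def verify_signature(payload: dict, secret: str, x_signature: str) -> bool:
    """Recompute HMAC-SHA256 over canonical bytes and compare in constant time."""
    expected = hmac.new(secret.encode(), canonical_bytes(payload),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, x_signature)
```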
Request Lifecycle
Below is the end-to-end flow from the moment a client creates a session to the final response. Each step is handled asynchronously — the client is never blocked waiting for LLM or tool execution.
1. Session creation — the client calls POST /api/v1/projects/{projectId}/sessions with optional externalUserId, tags, contextVars, metadata, secrets, and autoApprovePlan. The server creates the session record, starts session execution, and returns sessionId.
2. Prompt submission — the client POSTs …/sessions/{id}/messages with the user text. The server persists a prompt event, responds HTTP 202 Accepted with messageId, and schedules work on the session message queue.
3. Tool selection — the Runtime engine loads enabled project tools and merged secrets (organisation + project + session). The adaptive pipeline narrows candidates (vector/FTS/clustering → scenario merge → hybrid fusion → bounded shortlist).
4. Planning — the planner resolves the planner mode, builds the system prompt (project prompt, scenario instructions, tool definitions, secret/context names), and calls the LLM via the proxy. The response compiles into an execution plan.
5. Human-in-the-loop (optional) — when plan approval is required, execution pauses. A lifecycle webhook (session.requires_action) fires if configured. The client acts via POST …/actions.
6. Execution — the FSM walks the plan step by step. Each tool call resolves secrets from scope, performs HTTP/MCP/webhook calls, and records results. Progress streams via SSE (GET …/chat).
7. Completion — on success the encrypted history is kept, ephemeral execution state is cleaned up, and auto-scenarios may be updated. When lifecycle webhooks are configured, terminal outcomes fire session.completed or session.failed; if subscribed, session.interrupted or session.recovery_required cover crash/recovery semantics. The assistant reply is available via GET …/history.
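The client side of steps 1–2 can be sketched as request builders. To keep the example offline, requests are built as plain dicts; send them with any HTTP client. The host is hypothetical, and the messages path prefix is elided in the docs above, so its full shape here is an assumption.

```python
BASE = "https://planvault.example.com"  # hypothetical host

def create_session_request(project_id: str, context_vars: dict) -> dict:
    """Build the session-creation request (endpoint as documented)."""
    return {
        "method": "POST",
        "url": f"{BASE}/api/v1/projects/{project_id}/sessions",
        "json": {"contextVars": context_vars, "autoApprovePlan": False},
    }

def send_message_request(project_id: str, session_id: str, text: str) -> dict:
    """Build the prompt-submission request.

    The docs elide the path prefix before /sessions/{id}/messages; the
    prefix used here is an assumption. The server answers 202 Accepted
    with a messageId; poll history or subscribe to the SSE chat stream
    for the reply.
    """
    return {
        "method": "POST",
        "url": f"{BASE}/api/v1/projects/{project_id}/sessions/{session_id}/messages",
        "json": {"text": text},
    }
```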
Tool Selection Pipeline
Tool selection determines which subset of a project’s enabled tools is presented to the LLM planner for any given query. Sending all tools every time is impractical — a large org may have hundreds of tools, which would exceed context limits and hurt plan quality.
Retrieval modes
auto mode dynamically selects the retrieval strategy based on the number of enabled project tools:
• ≤ retrievalTier1Max (default 20) — Direct: every enabled tool goes to the planner with no RAG.
• ≤ retrievalTier2Max (default 100) — flat_rag: vector similarity search only.
• ≤ retrievalTier3Max (default 200) — full_rag: Reciprocal Rank Fusion (RRF) combining vector search + PostgreSQL full-text search.
• Above retrievalTier3Max — hierarchical_rag: centroid-based group routing → narrowed search within matched groups (no separate LLM classifier).
These fields are configured under Tool retrieval in project and organisation settings (JSON keys retrievalTier1Max / retrievalTier2Max / retrievalTier3Max). Manual mode overrides: lexical (keyword only), flat_rag, full_rag, hierarchical_rag.
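The tier thresholds reduce to a simple mapping from enabled-tool count to strategy. A sketch using the documented defaults (the lowercase `"direct"` label is an assumption for the no-RAG tier):

```python
def pick_retrieval_mode(tool_count: int,
                        tier1: int = 20,    # retrievalTier1Max
                        tier2: int = 100,   # retrievalTier2Max
                        tier3: int = 200) -> str:  # retrievalTier3Max
    """Map the enabled-tool count to the auto retrieval strategy."""
    if tool_count <= tier1:
        return "direct"           # every tool goes straight to the planner
    if tool_count <= tier2:
        return "flat_rag"         # vector similarity only
    if tool_count <= tier3:
        return "full_rag"         # RRF over vector + full-text search
    return "hierarchical_rag"     # centroid-based group routing first
```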
Fusion & ranking
After retrieval and scenario matching, all candidates are scored: wRrf × retrievalRRF + scenarioBoost + capped usageBoost (log-scaled usage_count). Manual scenarios (priority 2–100) produce stronger boosts than auto-detected ones (priority 1, scaled by success_rate). Group caps limit how many tools from a single service can enter the shortlist. The final list is bounded by retrievalMaxTools (default 30) and a hard system limit of 200 tools, with stdlib tools appended unconditionally. The usageBoostCapRrfOffset parameter (default 280) is a mathematical safeguard against popularity bias. Without it, frequently called tools (e.g. sendEmail, log_event) would accumulate unbounded usage scores and crowd out rare but critical APIs from the context window. The cap ensures adaptive retrieval stays grounded in semantic relevance (RRF) rather than merely reflecting historical call frequency.
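The scoring formula can be sketched as a single function. The docs give the formula only in outline, so the exact log scaling and the way usageBoostCapRrfOffset bounds the usage term are assumptions here.

```python
import math

def fused_score(rrf: float, scenario_boost: float, usage_count: int,
                w_rrf: float = 1.0, usage_cap: float = 280.0) -> float:
    """score = wRrf * retrievalRRF + scenarioBoost + capped usage boost.

    Assumptions: log1p scaling of usage_count, and usage_cap standing in
    for the role usageBoostCapRrfOffset plays in bounding the term.
    """
    usage_boost = min(math.log1p(usage_count), usage_cap)
    return w_rrf * rrf + scenario_boost + usage_boost
```

Whatever the exact constants, the shape matters: the usage term grows slowly and is capped, so a frequently called tool can nudge the ranking but never outweigh semantic relevance from RRF.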
Test the full pipeline without starting a session: POST /api/v1/projects/{projectId}/tools/test-selection returns the retrieval strategy, per-source tool lists, and the merged shortlist. An optional toolRetrievalMode parameter lets you override the mode for a single call. The same data is visualised in the Debug: Tools and Plan screen under project settings.