Composable Runtime Architecture
This chapter explains what actually happens on the platform side when the host system sends a
message with createMessage (POST /conversations/{conversation_id}/messages). The short
version: the platform assembles a runtime environment on demand — the right repository, the
right skills, the right storage, the right agent runtime — runs the agent inside an isolated
sandbox that holds zero credentials, streams the result back, and tears the environment down.
Nothing about that environment is baked in; everything is composed, per request, from data the
Integration API already manages.
Understanding this model matters for integrators because every knob on the API —
repository overrides, skill narrowing, runtime.agent_type, sticky sandboxes, secrets
pass-through, approvals — is a handle on one stage of this composition pipeline. This chapter maps
the knobs to the machinery.
1. The composable-runtime thesis: capabilities are files
The platform’s core design commitment is that an agent’s capabilities are files in a git
repository — not rows in a proprietary capability database, not code compiled into the platform.
A skill is a folder with a SKILL.md manifest (frontmatter + instructions + optional scripts and
resources). Agent instructions, personas, and tool configurations follow the same
convention-defined, file-based format.
The platform’s job is not to define capabilities — it is to compose them, per request, into a concrete runtime environment:

    Repository registry       → which capability trees exist (registerRepository)
    Assignment cascade        → which tree applies to THIS request
    Skill-access intersection → which skills from that tree this run may use
    Runtime selection         → which agent harness executes them (runtime.agent_type)
    Sandbox provisioning      → an isolated environment with exactly that composition

The repository registry
Repositories are top-level registry entries, registered once with registerRepository
(POST /repositories) together with a pre-authenticated credential
(createCredential, POST /credentials — the git token is write-only and never readable
again). A repository dictates which skills exist: listRepositorySkills
(GET /repositories/{repository_id}/skills) enumerates them, syncRepository
(POST /repositories/{repository_id}/sync) re-scans the tree after a push, and
createRepositorySkill (POST /repositories/{repository_id}/skills) authors a new skill
directly into the tree — committed to git like any other change.
The resolution cascade
A registered repository is then assigned down a cascade, most specific wins:
flowchart LR
M["message.repository_id<br/><i>one-off override</i>"] --> C["conversation.repository_id<br/><i>thread override</i>"]
C --> U["user.default_repository_id<br/><i>per-user override</i>"]
U --> R["role.repository_id<br/><i>role override</i>"]
R --> T["tenant default<br/><i>attached via attachTenantRepository</i>"]
style M fill:#dbeafe,stroke:#1d4ed8
style T fill:#dcfce7,stroke:#15803d
| Level | Set via | Typical use |
|---|---|---|
| Tenant default | attachTenantRepository (PUT /tenants/{tenant_id}/repositories/{repository_id}, {is_default: true}) | The baseline capability tree every user in the tenant falls through to. |
| Role override | createRole / updateRole (role.repository_id) | A job function that works from a different tree (e.g. a dispatcher role pinned to a dispatch-tools repo). |
| User override | updateUser (user.default_repository_id) | A single power user piloting a new capability tree. |
| Conversation override | createConversation (repository_id) | One thread that should run against a specific tree for its whole lifetime. |
| Message override | createMessage (repository_id) | The “magic thing for one command” — a single turn executed against a different tree, without touching any stored configuration. |
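
The cascade above amounts to a first-match lookup from the most specific level to the least. The following is a minimal sketch of that rule, not the platform's implementation; the function name, the level labels, and the repository IDs are illustrative assumptions.

```python
from typing import Optional

# Order mirrors the cascade table: most specific level first.
CASCADE_ORDER = ("message", "conversation", "user", "role", "tenant")


def resolve_repository(assignments: dict[str, Optional[str]]) -> tuple[str, str]:
    """Return (level, repository_id) for the most specific level that is set."""
    for level in CASCADE_ORDER:
        repo_id = assignments.get(level)
        if repo_id:
            return level, repo_id
    raise LookupError("no repository assigned at any cascade level")


# Hypothetical IDs: the role pins a repo and nothing more specific is set.
example = {
    "message": None,
    "conversation": None,
    "user": None,
    "role": "repo_dispatch_tools",
    "tenant": "repo_baseline",
}
resolved = resolve_repository(example)
```

Supplying a message-level value in the same dictionary would win over the role, without changing any of the stored assignments.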
The effective skill set
Once the effective repository is resolved, the skills that actually enter the run are an intersection of four filters. Each layer can only narrow, never widen:

    effective_skills = skills of the effective repository (what exists)
        ∩ role skill_access (what the role permits)
        ∩ conversation.selected_skill_ids (what this thread opted into)
        ∩ message.skill_ids (what this turn narrowed to)

Integrators can inspect the resolution at every level without running anything:
listUserSkills (GET /users/{user_id}/skills) returns a user’s effective skills with
provenance (via {role_id, repository_id}), and listRoleSkills
(GET /roles/{role_id}/skills) resolves a single role.
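
The intersection can be sketched with plain sets. This is an illustration under one stated assumption: a filter that was not supplied (None here) leaves the set unchanged, which the text above does not spell out. The function name and skill IDs are hypothetical.

```python
from typing import Iterable, Optional


def effective_skills(
    repository_skills: Iterable[str],
    role_skill_access: Optional[Iterable[str]] = None,
    selected_skill_ids: Optional[Iterable[str]] = None,
    message_skill_ids: Optional[Iterable[str]] = None,
) -> set[str]:
    """Intersect the four filters; None is assumed to mean 'did not narrow'."""
    result = set(repository_skills)
    for layer in (role_skill_access, selected_skill_ids, message_skill_ids):
        if layer is not None:
            result &= set(layer)
    return result


# A message naming a skill the repository does not contain cannot widen the set.
skills = effective_skills(
    repository_skills=["invoice-lookup", "refund", "dispatch"],
    role_skill_access=["invoice-lookup", "refund"],
    message_skill_ids=["invoice-lookup", "not-in-repo"],
)
```

Because every step is an intersection, the result is always a subset of the repository's skills, whatever the lower layers request.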
Why this design pays off
- Versioned, auditable capability. Every capability change is a git commit: diffable, revertable, attributable. “What could the agent do on March 3rd?” is a git log question, not a forensic reconstruction.
- Per-tenant customization without platform code. A new tenant vertical, a new playbook, a reworked skill set: all of it is repository content. The platform binary never changes; a new capability profile is a new repo (or branch) plus one attachTenantRepository call.
- Instant capability updates. Push to the repository, call syncRepository, and the next message runs with the updated tree. No deployment, no restart, no migration.
- Least-capability by construction. Because every cascade layer intersects downward, the narrowest intent wins. A message that names two skills runs with at most two skills. The LLM’s context contains nothing else, which is both a security property and a quality property (see next section).
2. The platform controls the run — not just an API wrapper
A common integration failure mode is treating an agent platform as a thin proxy in front of an LLM API: prompt in, tokens out, everything else is the caller’s problem. This platform is the opposite. It owns the LLM run end-to-end: what enters the model’s context, what the model is allowed to do mid-run, what the caller sees while it runs, and what happens when the model wants something it doesn’t have.
| Concern | Thin API wrapper | This platform |
|---|---|---|
| Context contents | Caller concatenates strings and hopes | Composed from the resolved repo + effective skills + conversation history + workspace posture; nothing ungranted is even visible to the agent |
| Capability scope | Prompt-level pleading (“do not use tool X”) | Structural: ungranted skills are absent from the sandbox filesystem — the agent cannot list, read, or invoke them |
| Latency UX | Dead air until first token | Filler/gate: optional low-latency filler deltas (flagged data.filler: true) bridge the gap while the full run warms |
| Dangerous actions | Caller inspects output after the fact | Mid-run HITL gate: the run parks on approval_required and cannot proceed without a cryptographically signed approval |
| Secrets | In the prompt or env, visible to the model | Never in the run at all — the agent sees aliases; the egress proxy resolves them on the wire |
| Output | An unstructured token stream | A typed NDJSON event grammar with monotonic seq, distinct event types, and a mandatory terminal event |
The individual mechanisms:
- Skill narrowing keeps context lean. Skill instructions consume context-window budget, and irrelevant instructions measurably degrade model behavior. Because the effective skill set is an intersection (§1), a host that knows a turn only needs invoice-lookup can say so on createMessage via skill_ids, and the run’s context contains that skill and nothing else. Narrowing is simultaneously a quality control (focused context), a cost control (fewer tokens), and a security control (smaller action surface).
- Context composition is platform-owned. The platform assembles what enters the agent loop: the resolved repository tree, the filtered skills, the conversation history, per-message env parameters, and a system-prompt description of the workspace (which paths are writable, what each storage zone is for). The host system supplies intent; the platform supplies a provably-scoped environment.
- Filler/gate for latency UX. Provisioning an isolated environment takes real milliseconds. When filler is enabled (cascading tenant settings → conversation → message, most specific wins), the stream opens with one or two short content_delta events flagged data.filler: true (a natural spoken-style opener the host can render or suppress) while the full run warms behind it. The real agent continues from the filler; the flag lets hosts distinguish the two. See Streaming Contract for the event grammar.
- HITL approval gates. Mid-run, the agent can declare that it needs something it doesn’t have: permission for a consequential action, or a secret that was never provided. The run emits approval_required (carrying an Approval object with requested_items[]) and parks in awaiting_approval. Resolution goes through approveApproval (POST /approvals/{approval_id}/approve) or denyApproval (POST /approvals/{approval_id}/deny) and requires a signed assertion made with a per-tenant approver key that is distinct from the integration API key, so the transport that carries an approval cannot mint one. §5 covers how this composes with secrets.
- Structured streaming. Every run emits typed NDJSON events (message_start, content_delta, queued, approval_required, resumed, message_end, error) with a monotonic seq for gap and truncation detection, and a guaranteed terminal event. The host never scrapes free text to infer run state.
The takeaway for integrators: the Integration API is not “send prompt, get completion.” It is “declare intent and constraints; the platform manufactures a scoped run and reports it back in a typed protocol.”
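
On the host side, the typed stream can be consumed with a small reader that checks seq continuity and insists on a terminal event. The sketch below is illustrative only. It assumes that seq increases by exactly 1 per event, that message_end and error are the terminal types, and that delta text lives in a hypothetical data.text field; none of these details are confirmed above, so check them against the Streaming Contract.

```python
import json
from typing import Iterable, Iterator

# Assumption: these two event types end a stream.
TERMINAL_TYPES = {"message_end", "error"}


class StreamIntegrityError(Exception):
    """Raised when the stream has a seq gap or ends without a terminal event."""


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Parse NDJSON lines, checking seq continuity and the terminal event."""
    previous_seq = None
    terminated = False
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between events
        event = json.loads(line)
        seq = event["seq"]
        # Assumption: seq increases by exactly 1 between consecutive events.
        if previous_seq is not None and seq != previous_seq + 1:
            raise StreamIntegrityError(f"gap: expected {previous_seq + 1}, got {seq}")
        previous_seq = seq
        yield event
        if event["type"] in TERMINAL_TYPES:
            terminated = True
            break
    if not terminated:
        raise StreamIntegrityError("stream truncated before a terminal event")


def visible_text(events: Iterable[dict], show_filler: bool = False) -> str:
    """Concatenate content_delta text, optionally dropping filler deltas."""
    parts = []
    for event in events:
        if event["type"] != "content_delta":
            continue
        data = event.get("data", {})
        if data.get("filler") and not show_filler:
            continue
        parts.append(data.get("text", ""))  # "text" is a hypothetical field name
    return "".join(parts)


sample = [
    '{"seq": 1, "type": "message_start", "data": {}}',
    '{"seq": 2, "type": "content_delta", "data": {"text": "One moment. ", "filler": true}}',
    '{"seq": 3, "type": "content_delta", "data": {"text": "Invoice 42 is paid."}}',
    '{"seq": 4, "type": "message_end", "data": {}}',
]
answer = visible_text(read_events(sample))
```

A host that renders filler would pass show_filler=True; a host that suppresses it gets only the agent's real output, and in both cases a missing terminal event surfaces as an error instead of a silently short reply.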
3. Pluggable agent runtimes
The composition pipeline in §1 deliberately does not assume any particular agent implementation.
The files-in-a-repository convention (skills as SKILL.md folders, instructions as markdown) is
an industry convention, not a vendor lock — multiple agent harnesses read the same layout.
The conversation’s runtime.agent_type field selects which harness executes the run:

    {
      "runtime": {
        "agent_type": "claude-agent-sdk",
        "mode": "sticky",
        "sticky_ttl_seconds": 900
      }
    }

agent_type is an open enum: claude-agent-sdk (the default), codex, deepagent, with
more added over time without a breaking API change. The default comes from tenant settings;
createConversation can override it per conversation. Unknown values are rejected with a
validation-error problem listing the runtimes the deployment actually has installed.
The Agent Runtime Interface
Every runtime sits behind the same abstraction: the Agent Runtime Interface. A conforming runtime is an image that reads its prompt and composed workspace from defined locations, streams run events in a declared format, honors the platform’s env-var contract (including routing all outbound traffic through the platform proxy), and exits cleanly. Anything that meets the contract slots in; nothing else in the platform changes.
What the abstraction guarantees to the host system, regardless of which agent_type runs:
| Guarantee | Meaning |
|---|---|
| Same Integration API contract | createConversation, createMessage, the NDJSON event grammar, approvals, secrets — identical request/response shapes for every runtime. Switching agent_type changes zero lines of host code. |
| Same skill conventions | The effective repository and effective skill set (§1) are composed identically. A skill authored once works across runtimes that honor the convention. |
| Same isolation | Every runtime executes inside the sandbox model of §4 — read-only repo, alias-only credentials, proxy-only egress. A runtime cannot opt out of the security envelope. |
| Same observability | Runs emit the same typed event stream and land in the same history (listMessages), with the same usage accounting, whichever harness produced them. |
What can legitimately differ between runtimes: reasoning style and quality, tool-use behavior,
latency and cost profile, and which optional conventions (sub-agents, slash commands) each harness
supports. Those are selection criteria for choosing an agent_type — not integration risks.
This is why agent_type is safe to expose as a caller-facing knob: it selects an engine inside a
fixed chassis. The chassis — API contract, capability composition, sandbox, streaming — is
invariant.
4. Sandbox-per-run security: the environment an agent actually gets
Every agent run executes inside its own isolated sandbox, a disposable environment provisioned for the run and destroyed after it (or, for sticky conversations, kept for the lease and then destroyed; §5). The sandbox is where the composed capability set (§1) becomes a concrete filesystem, and where the platform’s central security claim is enforced:
The LLM is treated as an untrusted component. It never holds a real credential — not the git token, not the host system’s API keys, not conversation secrets, not even the key for its own upstream LLM provider.
flowchart TB
subgraph control["Control plane (trusted — holds real secrets)"]
API["Integration API<br/>composition & scheduling"]
Vault[("Vault<br/>credentials (crd_) ·<br/>conversation secrets<br/>alias → value")]
API --- Vault
end
subgraph sandbox["Sandbox — one isolated environment per run (untrusted, LLM-driven)"]
direction TB
Agent["Agent runtime<br/>(runtime.agent_type)"]
RO["/workspace/repo — read-only<br/>resolved repository tree +<br/>filtered effective skills"]
RW["writable zones<br/>per-run scratch space ·<br/>user storage · conversation storage<br/>(S3-style buckets)"]
Agent --- RO
Agent --- RW
end
Proxy["Egress proxy<br/>resolves aliases → real values<br/>ON THE WIRE, at the boundary"]
Ext["External systems<br/>host APIs · SaaS · data stores"]
API -->|"provisions run:<br/>repo + skills + aliases + env"| sandbox
Agent -->|"outbound call carrying<br/>ALIAS only, e.g. {{secret:CRM_API_KEY}}"| Proxy
Proxy <-->|"alias lookup<br/>(logged per resolution)"| Vault
Proxy -->|"real credential substituted"| Ext
Ext --> Proxy --> Agent
Agent -.->|"any other egress path"| Blocked["✕ denied<br/>default-deny network policy"]
style control fill:#dbeafe,stroke:#1d4ed8
style sandbox fill:#dcfce7,stroke:#15803d
style Proxy fill:#fef3c7,stroke:#b45309
style Ext fill:#fee2e2,stroke:#b91c1c
style Blocked fill:#fee2e2,stroke:#b91c1c,stroke-dasharray: 5 5
Filesystem: what the agent sees
| Zone | Access | Lifetime | Contents |
|---|---|---|---|
| Repository mount | Read-only | Frozen for this run | The resolved repository tree with the filtered effective skill set — ungranted skills are physically absent, not merely “disallowed” |
| Run scratch space | Read-write | Destroyed with the run | Temporary working files |
| Conversation storage | Read-write | Life of the conversation | The conversation’s own S3-style bucket (conversation.storage) — files persist across turns in this thread |
| User storage | Read-write | Persistent | The user’s auto-attached S3-style bucket (user.storage; BYO-linkable via updateUser) — notes and artifacts that carry across all of the user’s conversations |
Two properties are worth underlining:
- The capability tree is immutable during a run. The repository mount is read-only, so a prompt-injected agent cannot edit its own skills or instructions to persist a compromise. Skill authoring flows through the control plane (createRepositorySkill) and lands as an auditable git commit, never through a running agent.
- Skill filtering is structural, not advisory. The intersection from §1 is applied by removing ungranted skills from the mounted view before the agent starts. The agent cannot list, read, or invoke what is not there. There is no “ignore previous instructions” path around a file that does not exist.
Network: one door, and it resolves credentials
The sandbox’s network posture is default-deny egress with exactly one door: the credential-resolving forward proxy.
- The agent’s environment contains aliases only: opaque tokens like {{secret:CRM_API_KEY}} referencing vault entries (registry credentials created with createCredential, or conversation-scoped secrets supplied via putConversationSecrets / the secrets field on createMessage; see §5).
- When the agent makes an outbound call, the proxy resolves the alias and substitutes the real value on the wire, at the boundary, outside the agent’s process. The response returns through the same boundary. Every resolution is logged: which run, which alias, which destination.
- Any egress that does not go through the proxy — direct socket, metadata endpoints, cluster services, sibling sandboxes — is denied by network policy.
The consequence: a fully compromised agent can exfiltrate no secret, because there is no secret in its memory, its environment, its filesystem, or its logs to exfiltrate. Aliases are worthless outside the proxy, and the proxy only resolves them for destinations within the credential’s scope. Combined with the HITL rule that approvals require a signing key the runtime never holds (§2), even a rogue agent cannot grant itself anything.
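
To make the boundary substitution concrete, here is a toy model of alias resolution with a scope check and a per-resolution audit record. It is a sketch under stated assumptions, not the proxy's actual design: the EgressProxy and VaultEntry classes, the allowed_hosts notion of "credential scope", and the example hosts are all hypothetical; only the {{secret:ALIAS}} token shape and the logged fields (run, alias, destination) come from the text above.

```python
import re
from dataclasses import dataclass, field

ALIAS_PATTERN = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")


@dataclass
class VaultEntry:
    value: str
    allowed_hosts: frozenset[str]  # hypothetical model of the credential's scope


@dataclass
class EgressProxy:
    vault: dict[str, VaultEntry]
    audit_log: list[dict] = field(default_factory=list)

    def resolve_headers(self, run_id: str, host: str, headers: dict[str, str]) -> dict[str, str]:
        """Return a copy of the headers with aliases replaced by real values."""

        def substitute(match: re.Match) -> str:
            alias = match.group(1)
            entry = self.vault.get(alias)
            # Refuse unknown aliases and destinations outside the scope.
            if entry is None or host not in entry.allowed_hosts:
                raise PermissionError(f"alias {alias} not resolvable for {host}")
            self.audit_log.append({"run": run_id, "alias": alias, "destination": host})
            return entry.value

        return {name: ALIAS_PATTERN.sub(substitute, value) for name, value in headers.items()}


proxy = EgressProxy({"CRM_API_KEY": VaultEntry("s3cr3t", frozenset({"crm.example.com"}))})
agent_headers = {"Authorization": "Bearer {{secret:CRM_API_KEY}}"}
wire_headers = proxy.resolve_headers("run_1", "crm.example.com", agent_headers)
```

The agent-side dictionary is never mutated, so the real value exists only in the copy that leaves the boundary, and the same alias sent to an out-of-scope host raises instead of resolving.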
Process: defense in depth
Isolation never rests on a single mechanism. Inside the sandbox, the agent process runs:
- as a non-root user, with all Linux capabilities dropped and privilege escalation disabled;
- on a read-only root filesystem — only the designated writable zones accept writes;
- under a seccomp allowlist that blocks mount, namespace, tracing, and other escape-adjacent syscalls;
- with mandatory-access-control profiles (AppArmor/SELinux) layered on where the substrate provides them;
- under hard CPU, memory, and disk quotas, so a runaway run degrades itself, not its neighbors.
These layers are independent: any single one failing still leaves a confused or adversarial agent receiving a clean permission error from the kernel rather than silently succeeding. And because every run gets a fresh sandbox, there is no residue: nothing written by one run is visible to the next except what was deliberately persisted to conversation or user storage.
5. How the Integration API knobs drive this machinery
Every runtime-facing knob on the API maps onto one stage of the pipeline above.
Sticky vs pooled sandboxes — runtime.mode
The trade is latency vs density:
- pooled (default): each createMessage claims a pre-warmed sandbox from the shared pool, composes the workspace, runs, releases. Best density; per-turn composition cost.
- sticky: createConversation with runtime.mode: "sticky" and sticky_ttl_seconds dedicates a sandbox to the conversation for the TTL. Turns land in an already-composed, already-warm environment, which makes this the low-latency choice for interactive UX. Each message refreshes the lease; updateConversation (PATCH /conversations/{conversation_id}) extends or releases it early. Observe the lease via runtime.sandbox_state (warm | active | expired) and runtime.expires_at.
Stickiness changes when the sandbox is destroyed — never what it may do. A sticky sandbox has the identical read-only-repo, alias-only, proxy-only posture as a pooled one, and tenant settings can cap the maximum TTL and the number of concurrent sticky sandboxes.
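
A host can use the lease fields to predict whether the next turn will land in a warm sandbox. The helper below is a rough sketch: it assumes the conversation's runtime object is available as a dictionary and that runtime.expires_at is an RFC 3339 timestamp, neither of which is stated above. The function name is hypothetical.

```python
from datetime import datetime, timezone


def lease_is_live(runtime: dict, now: datetime) -> bool:
    """True if a sticky sandbox should still be held for the conversation at `now`."""
    if runtime.get("mode") != "sticky":
        return False
    if runtime.get("sandbox_state") == "expired":
        return False
    expires_at = runtime.get("expires_at")
    if not expires_at:
        return False
    # Assumption: RFC 3339 timestamp; normalize a trailing Z for fromisoformat.
    deadline = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return now < deadline


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
runtime = {"mode": "sticky", "sandbox_state": "warm", "expires_at": "2025-01-01T12:10:00Z"}
will_be_warm = lease_is_live(runtime, now)
```

A check like this is only a hint for UX decisions such as whether to enable filler; the platform remains the authority on the lease.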
Capacity — on_capacity and getCapacity
Sandboxes are real, bounded resources. When the pool is exhausted, the caller chooses the failure mode per request:
- on_capacity: "reject" (default): immediate 429 capacity-exhausted problem with Retry-After; the host system owns the retry.
- on_capacity: "hold": the request queues. The stream first emits {type: "queued", data: {position, retry_hint}} events, then proceeds when a sandbox frees (bounded by a documented maximum hold, after which it fails capacity-exhausted).
getCapacity (GET /capacity) exposes pool state
({warm_available, sticky_active, at_capacity}) so integrators can pre-check before dispatching,
or feed back-pressure into their own routing.
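
With on_capacity: "reject", the retry loop belongs to the host. A minimal sketch of that loop follows. The send callable stands in for whatever HTTP client the adapter uses and is assumed to return a (status, headers) pair; the sketch also assumes Retry-After carries delay-seconds, although HTTP permits a date there as well.

```python
import time
from typing import Callable

Response = tuple[int, dict[str, str]]


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); while it returns 429, wait Retry-After seconds and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, headers = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        # Assumption: Retry-After is a number of seconds; default to 1 if absent.
        sleep(float(headers.get("Retry-After", "1")))
        status, headers = send()
    return status, headers


# Simulated exchange: one capacity rejection, then success.
responses = iter([(429, {"Retry-After": "2"}), (200, {})])
waits: list[float] = []
result = send_with_retry(lambda: next(responses), sleep=waits.append)
```

Injecting sleep keeps the helper testable and lets a host replace the wait with its own scheduler or routing back-pressure.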
Runtime selection — runtime.agent_type
Set per conversation at createConversation (tenant settings supply the default). Per §3, this
swaps the engine inside a fixed chassis — the sandbox posture, capability composition, event
grammar, and approval mechanics are identical across runtimes.
Env and secrets pass-through — env and secrets on createMessage
Both ride on the message so the host system stays stateless, with nothing to vault or persist on the adapter side:
- env: plaintext, non-secret run parameters (locale, feature flags, request context). These are visible to the agent; the spec is emphatic that secret material must never travel here.
- secrets: a write-only alias → value map. Values are vaulted at the API boundary, scoped to the conversation, and never appear in any response, log, or history. The run receives only the aliases; the egress proxy resolves them on outbound calls (§4). Manage them independently of messages via putConversationSecrets (PUT /conversations/{conversation_id}/secrets), listConversationSecrets (aliases and metadata only, never values), and deleteConversationSecret (DELETE /conversations/{conversation_id}/secrets/{alias}).
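
An adapter can enforce the env/secrets split before a request leaves the host. The sketch below is one possible guard, not part of the API: the content field name, the name-based heuristic, and both helper functions are assumptions made for illustration; only the env and secrets fields themselves come from the text above.

```python
import re

# Heuristic only: env names that look like secret material.
SECRET_NAME_HINT = re.compile(r"(key|token|secret|password)", re.IGNORECASE)


def build_message_body(content: str, env: dict[str, str], secrets: dict[str, str]) -> dict:
    """Assemble a createMessage body, refusing secret-looking names in env."""
    suspicious = sorted(name for name in env if SECRET_NAME_HINT.search(name))
    if suspicious:
        raise ValueError(f"move these out of env and into secrets: {suspicious}")
    # "content" is a hypothetical field name for the message text.
    return {"content": content, "env": dict(env), "secrets": dict(secrets)}


def redact_for_logging(body: dict) -> dict:
    """Copy of the body that is safe to log: aliases kept, values masked."""
    redacted = dict(body)
    redacted["secrets"] = {alias: "***" for alias in body.get("secrets", {})}
    return redacted


body = build_message_body(
    "Look up invoice 42",
    env={"LOCALE": "en-GB"},
    secrets={"CRM_API_KEY": "s3cr3t"},
)
loggable = redact_for_logging(body)
```

The redaction helper mirrors the platform's own behavior on the host side: aliases may appear in logs, values may not.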
Approvals — the HITL weave
Approvals close the loop between §2’s gate and §4’s vault. The composed flow:
- Mid-run, the agent determines it needs something, commonly a credential it was never given. It raises an Approval with requested_items: [{kind: "secret", alias: "CRM_API_KEY", ...}]; the stream emits approval_required and the message parks in awaiting_approval. A sticky sandbox keeps the run warm until the approval’s expires_at; a pooled run checkpoints and re-hydrates on approval.
- The host system surfaces the request to its own approval authority and calls approveApproval (POST /approvals/{approval_id}/approve) with a signed assertion over {approval_id, decision, exp}, made with the per-tenant approver key, which is deliberately distinct from the integration API key. Optionally, the approve body’s secrets map supplies the requested value, which is vaulted on arrival like any other secret.
- The run resumes (resumed event) with the new alias available; the proxy resolves it on the agent’s outbound call; the run streams to message_end. A denial or expiry ends the message as failed with a problem-typed error event. Pending work is discoverable via listApprovals (GET /approvals?status=pending) and getApproval.
The guarantee this weave delivers: the agent can ask for anything, but can obtain nothing on its own. The value never enters the runtime (aliases only), the approval cannot be forged by anything inside the platform’s request path (the runtime holds no signing key, and the integration key can transport but not mint an approval), and every step — request, decision, resolution — is an auditable event.
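
The shape of a signed assertion over {approval_id, decision, exp} can be illustrated as follows. The signature scheme is not specified above, so this sketch substitutes HMAC-SHA256 from the Python standard library purely for illustration; a real deployment may well use an asymmetric scheme, and the function names, canonical JSON encoding, and assertion layout are all assumptions.

```python
import hashlib
import hmac
import json


def _canonical(claims: dict) -> bytes:
    # Sorted keys and no whitespace, so signer and verifier hash the same bytes.
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()


def sign_assertion(approver_key: bytes, approval_id: str, decision: str, exp: int) -> dict:
    """Sign the three claims with the approver key (HMAC stand-in)."""
    claims = {"approval_id": approval_id, "decision": decision, "exp": exp}
    signature = hmac.new(approver_key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_assertion(approver_key: bytes, assertion: dict, now: int) -> bool:
    """Accept only an untampered, unexpired assertion made with the same key."""
    claims = assertion["claims"]
    expected = hmac.new(approver_key, _canonical(claims), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, assertion["signature"]) and now < claims["exp"]


approver_key = b"per-tenant-approver-key"   # hypothetical key material
integration_key = b"integration-api-key"    # a different key, as the text requires
assertion = sign_assertion(approver_key, "apr_123", "approve", exp=2_000)
accepted = verify_assertion(approver_key, assertion, now=1_000)
forged = sign_assertion(integration_key, "apr_123", "approve", exp=2_000)
rejected = verify_assertion(approver_key, forged, now=1_000)
```

The point of the illustration is the separation of keys: an assertion produced with the integration key fails verification, which is the property that lets the transport carry approvals without being able to mint them.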
One request, end to end
Putting all five knobs together, a single createMessage traverses the full pipeline:
- Resolve: walk the repository cascade; intersect the four skill filters; pick the conversation’s agent_type.
- Admit: sticky lease available, warm-pool claim, or the on_capacity path (reject / queued events).
- Compose: mount the resolved repo read-only with the filtered skill view; attach conversation and user storage; inject env and the alias set; append the workspace posture to the system prompt.
- Run: the selected runtime executes inside the hardened sandbox; filler deltas bridge warm-up if enabled; tool calls exit only through the credential-resolving proxy.
- Gate: if the agent raises an Approval, park; resume only on a signed approval (optionally carrying the requested secret).
- Deliver & tear down: stream to message_end, persist the message to history, release or destroy the sandbox per runtime.mode.
Every stage is observable through the API (runtime.sandbox_state, getCapacity, the event
stream, listApprovals), and every stage is driven by data the host system controls through the
same API. That is the composable-runtime promise: capability is configuration, execution is
disposable, and trust is structural.
6. Observability: point the deployment at your OTEL destination
Every shiftagent deployment is instrumented with OpenTelemetry end to end, and every deployment can be configured with an OTEL destination: an OTLP endpoint of the operator’s choosing. Observability is a deployment-configuration concern, not a code change:

    # Deployment configuration (Helm values)
    observability:
      otelEndpoint: "http://otel-collector.observability.svc.cluster.local:4318"

which surfaces to the services as the standard OpenTelemetry environment contract:

    OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.observability.svc.cluster.local:4318
    OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

What flows to that destination:
| Signal | What it covers |
|---|---|
| Traces | One trace per agent run: provisioning lookups, sandbox acquisition (warm-pool claim or sticky-lease reuse), repository/skill resolution, LLM calls (OTel GenAI semantic conventions — model, token usage), egress-proxy calls, stream completion |
| Metrics | Time-to-first-token and turn latency, sandbox pool utilization (getCapacity counters), queued/held request counts, approval wait times, provisioning cold-path rates, per-tenant token usage |
| Logs | Structured service logs with request_id correlation to traces; never message content or secret material |
Three properties matter for a host-system embedding:
- Vendor-neutral. The destination is any OTLP-compatible backend: a host-operated OTel collector, SigNoz, Datadog, Honeycomb, Grafana Tempo, Langfuse/LangSmith for the LLM spans, or the host system’s existing observability pipeline. The platform only requires that the OTLP endpoint be reachable from the deployment.
- Self-hosted collector is replaceable. A self-hosted install ships with a bundled OTel collector by default; operators can point otelEndpoint at their own collector instead and the bundled one is never in the path.
- Per-tenant fan-out. Tenants can be configured with additional OTLP exporters so the same telemetry ships into a downstream customer’s own observability stack, following the same inherit-and-tighten model as every other tenant setting.
The adapter should propagate its X-Request-Id (and W3C traceparent, if the host system runs
OpenTelemetry too) on every Integration API call — that stitches host-side spans and
shiftagent-side spans into one distributed trace across the fusion boundary.
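
For adapters that do not already run an OpenTelemetry SDK, the two correlation headers can be built by hand. This is a small sketch: the helper names are hypothetical, the request ID is assumed to be a UUID, and the traceparent value follows the W3C Trace Context layout (version, 32-hex trace ID, 16-hex parent ID, flags). A host that does run OpenTelemetry would normally let its SDK propagator inject traceparent instead.

```python
import secrets
import uuid
from typing import Optional


def new_traceparent() -> str:
    """Build a W3C traceparent value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 32 lowercase hex characters
    parent_id = secrets.token_hex(8)   # 16 lowercase hex characters
    return f"00-{trace_id}-{parent_id}-01"  # 01 marks the trace as sampled


def correlation_headers(
    request_id: Optional[str] = None,
    traceparent: Optional[str] = None,
) -> dict[str, str]:
    """Headers to attach to every Integration API call."""
    return {
        "X-Request-Id": request_id or str(uuid.uuid4()),
        "traceparent": traceparent or new_traceparent(),
    }


headers = correlation_headers(request_id="req-123")
```

Passing an existing traceparent from the host's current span, when there is one, is what joins the two sides into a single trace; generating a fresh one only starts a new trace at the adapter.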
Related documents
- Integration Guide: architecture overview, auth model, and external-ID conventions.
- Provisioning Flow: how tenants, repositories, roles, and users get wired together before the first conversation.
- Streaming Contract: the full NDJSON event grammar, including queued, approval_required, resumed, and filler-flagged deltas.
- Adapter Implementation Guide: the stateless adapter’s duties, including approval transport and the no-secret-logging rules.