Composable Runtime Architecture

This chapter explains what actually happens on the platform side when the host system sends a message with createMessage (POST /conversations/{conversation_id}/messages). The short version: the platform assembles a runtime environment on demand — the right repository, the right skills, the right storage, the right agent runtime — runs the agent inside an isolated sandbox that holds zero credentials, streams the result back, and tears the environment down. Nothing about that environment is baked in; everything is composed, per request, from data the Integration API already manages.

Understanding this model matters for integrators because every knob on the API — repository overrides, skill narrowing, runtime.agent_type, sticky sandboxes, secrets pass-through, approvals — is a handle on one stage of this composition pipeline. This chapter maps the knobs to the machinery.

1. The composable-runtime thesis: capabilities are files

The platform’s core design commitment is that an agent’s capabilities are files in a git repository — not rows in a proprietary capability database, not code compiled into the platform. A skill is a folder with a SKILL.md manifest (frontmatter + instructions + optional scripts and resources). Agent instructions, personas, and tool configurations follow the same convention-defined, file-based format.

The platform’s job is not to define capabilities — it is to compose them, per request, into a concrete runtime environment:

Repository registry          →  which capability trees exist (registerRepository)
Assignment cascade           →  which tree applies to THIS request
Skill-access intersection    →  which skills from that tree this run may use
Runtime selection            →  which agent harness executes them (runtime.agent_type)
Sandbox provisioning         →  an isolated environment with exactly that composition

The repository registry

Repositories are top-level registry entries, registered once with registerRepository (POST /repositories) together with a pre-authenticated credential (createCredential, POST /credentials — the git token is write-only and never readable again). A repository dictates which skills exist: listRepositorySkills (GET /repositories/{repository_id}/skills) enumerates them, syncRepository (POST /repositories/{repository_id}/sync) re-scans the tree after a push, and createRepositorySkill (POST /repositories/{repository_id}/skills) authors a new skill directly into the tree — committed to git like any other change.

The resolution cascade

A registered repository is then assigned down a cascade; the most specific level that is set wins:

flowchart LR
    M["message.repository_id<br/><i>one-off override</i>"] --> C["conversation.repository_id<br/><i>thread override</i>"]
    C --> U["user.default_repository_id<br/><i>per-user override</i>"]
    U --> R["role.repository_id<br/><i>role override</i>"]
    R --> T["tenant default<br/><i>attached via attachTenantRepository</i>"]
    style M fill:#dbeafe,stroke:#1d4ed8
    style T fill:#dcfce7,stroke:#15803d

Level	Set via	Typical use
Tenant default	`attachTenantRepository` (`PUT /tenants/{tenant_id}/repositories/{repository_id}`, `{is_default: true}`)	The baseline capability tree every user in the tenant falls through to.
Role override	`createRole` / `updateRole` (`role.repository_id`)	A job function that works from a different tree (e.g. a dispatcher role pinned to a dispatch-tools repo).
User override	`updateUser` (`user.default_repository_id`)	A single power user piloting a new capability tree.
Conversation override	`createConversation` (`repository_id`)	One thread that should run against a specific tree for its whole lifetime.
Message override	`createMessage` (`repository_id`)	The “magic thing for one command” — a single turn executed against a different tree, without touching any stored configuration.
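The cascade amounts to a first-match lookup from the most specific level to the least. The following is a minimal sketch of that rule, not the platform's implementation; the `overrides` dictionary shape and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Cascade levels from most to least specific, mirroring the table above.
CASCADE = ("message", "conversation", "user", "role", "tenant")


def resolve_repository(overrides: Dict[str, Optional[str]]) -> Tuple[str, str]:
    """Return (level, repository_id) for the most specific level that is set.

    `overrides` maps a level name to a repository id, or to None when that
    level carries no override. This shape is hypothetical.
    """
    for level in CASCADE:
        repo_id = overrides.get(level)
        if repo_id:
            return level, repo_id
    raise LookupError("no repository assigned at any cascade level")
```

For example, `resolve_repository({"role": "repo_dispatch", "tenant": "repo_base"})` returns `("role", "repo_dispatch")`: the role override shadows the tenant default because no message, conversation, or user override is present.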

The effective skill set

Once the effective repository is resolved, the skills that actually enter the run are an intersection of four filters — each layer can only narrow, never widen:

effective_skills =
      skills of the effective repository        (what exists)
    ∩ role skill_access                         (what the role permits)
    ∩ conversation.selected_skill_ids           (what this thread opted into)
    ∩ message.skill_ids                         (what this turn narrowed to)

Integrators can inspect the resolution at every level without running anything: listUserSkills (GET /users/{user_id}/skills) returns a user’s effective skills with provenance (via {role_id, repository_id}), and listRoleSkills (GET /roles/{role_id}/skills) resolves a single role.
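The intersection can be sketched in a few lines. This is an illustration under one stated assumption: a layer that is not supplied (`None`) is treated as "does not narrow", whereas an empty list narrows to nothing. The source does not spell out how absent layers behave, so treat that choice as hypothetical.

```python
from typing import Iterable, Optional, Set


def effective_skills(
    repo_skills: Iterable[str],
    role_skill_access: Optional[Iterable[str]] = None,
    conversation_selected: Optional[Iterable[str]] = None,
    message_skill_ids: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Intersect the four filters; each layer can only narrow the result."""
    result = set(repo_skills)
    for layer in (role_skill_access, conversation_selected, message_skill_ids):
        if layer is not None:  # assumption: None means "this layer does not narrow"
            result &= set(layer)
    return result
```

Because every step is a set intersection, a skill named at the message level that the role does not permit simply drops out; no layer can add a skill back.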

Why this design pays off

Versioned, auditable capability. Every capability change is a git commit — diffable, revertable, attributable. “What could the agent do on March 3rd?” is a git log question, not a forensic reconstruction.
Per-tenant customization without platform code. A new tenant vertical, a new playbook, a reworked skill set — all of it is repository content. The platform binary never changes; a new capability profile is a new repo (or branch) plus one attachTenantRepository call.
Instant capability updates. Push to the repository, call syncRepository, and the next message runs with the updated tree. No deployment, no restart, no migration.
Least-capability by construction. Because every cascade layer intersects downward, the narrowest intent wins. A message that names two skills runs with at most two skills — the LLM’s context contains nothing else, which is both a security property and a quality property (see next section).

2. The platform controls the run — not just an API wrapper

A common integration failure mode is treating an agent platform as a thin proxy in front of an LLM API: prompt in, tokens out, everything else is the caller’s problem. This platform is the opposite: it owns the LLM run end-to-end — what enters the model’s context, what the model is allowed to do mid-run, what the caller sees while it runs, and what happens when the model wants something it doesn’t have.

Concern	Thin API wrapper	This platform
Context contents	Caller concatenates strings and hopes	Composed from the resolved repo + effective skills + conversation history + workspace posture; nothing ungranted is even visible to the agent
Capability scope	Prompt-level pleading (“do not use tool X”)	Structural: ungranted skills are absent from the sandbox filesystem — the agent cannot list, read, or invoke them
Latency UX	Dead air until first token	Filler/gate: optional low-latency filler deltas (flagged `data.filler: true`) bridge the gap while the full run warms
Dangerous actions	Caller inspects output after the fact	Mid-run HITL gate: the run parks on `approval_required` and cannot proceed without a cryptographically signed approval
Secrets	In the prompt or env, visible to the model	Never in the run at all — the agent sees aliases; the egress proxy resolves them on the wire
Output	An unstructured token stream	A typed NDJSON event grammar with monotonic `seq`, distinct event types, and a mandatory terminal event

The individual mechanisms:

Skill narrowing keeps context lean. Skill instructions consume context-window budget, and irrelevant instructions measurably degrade model behavior. Because the effective skill set is an intersection (§1), a host that knows a turn only needs invoice-lookup can say so on createMessage via skill_ids — and the run’s context contains that skill and nothing else. Narrowing is simultaneously a quality control (focused context), a cost control (fewer tokens), and a security control (smaller action surface).
Context composition is platform-owned. The platform assembles what enters the agent loop: the resolved repository tree, the filtered skills, the conversation history, per-message env parameters, and a system-prompt description of the workspace (which paths are writable, what each storage zone is for). The host system supplies intent; the platform supplies a provably-scoped environment.
Filler/gate for latency UX. Provisioning an isolated environment takes measurable time. When filler is enabled (the setting cascades tenant settings → conversation → message, most specific wins), the stream opens with one or two short content_delta events flagged data.filler: true. These form a natural, spoken-style opener that the host can render or suppress while the full run warms behind it. The real agent continues from the filler; the flag lets hosts distinguish the two. See Streaming Contract for the event grammar.
HITL approval gates. Mid-run, the agent can declare that it needs something it doesn’t have — permission for a consequential action, or a secret that was never provided. The run emits approval_required (carrying an Approval object with requested_items[]) and parks in awaiting_approval. Resolution goes through approveApproval (POST /approvals/{approval_id}/approve) or denyApproval (POST /approvals/{approval_id}/deny) and requires a signed assertion made with a per-tenant approver key that is distinct from the integration API key — the transport that carries an approval cannot mint one. §5 covers how this composes with secrets.
Structured streaming. Every run emits typed NDJSON events (message_start, content_delta, queued, approval_required, resumed, message_end, error) with a monotonic seq for gap and truncation detection, and a guaranteed terminal event. The host never scrapes free text to infer run state.
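A host-side consumer can lean on these guarantees instead of scraping text. The sketch below is illustrative only and makes several assumptions that the Streaming Contract would need to confirm: each event is a JSON object with `type`, `seq`, and `data`; `seq` increases by exactly one per event; the terminal events are `message_end` and `error`; and delta text lives in a hypothetical `data.text` field.

```python
import json
from typing import Iterable, List, Optional, Tuple

TERMINAL_TYPES = {"message_end", "error"}  # assumption: these end the stream


class StreamIntegrityError(Exception):
    """Raised when the stream has a seq gap or no terminal event."""


def consume_stream(lines: Iterable[str]) -> Tuple[str, str, str]:
    """Return (answer_text, filler_text, terminal_event_type)."""
    last_seq: Optional[int] = None
    answer: List[str] = []
    filler: List[str] = []
    terminal: Optional[str] = None
    for raw in lines:
        if not raw.strip():
            continue  # tolerate blank keep-alive lines
        if terminal is not None:
            raise StreamIntegrityError("event received after terminal event")
        event = json.loads(raw)
        seq = event["seq"]
        # Assumption: seq increments by one, so any jump signals a lost event.
        if last_seq is not None and seq != last_seq + 1:
            raise StreamIntegrityError(f"gap: expected seq {last_seq + 1}, got {seq}")
        last_seq = seq
        data = event.get("data") or {}
        if event["type"] == "content_delta":
            # Filler deltas are kept apart so the host can render or suppress them.
            (filler if data.get("filler") else answer).append(data.get("text", ""))
        elif event["type"] in TERMINAL_TYPES:
            terminal = event["type"]
    if terminal is None:
        raise StreamIntegrityError("stream truncated: no terminal event")
    return "".join(answer), "".join(filler), terminal
```

Keeping filler text in its own buffer means a host that suppresses filler still records exactly what the real agent said.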

The takeaway for integrators: the Integration API is not “send prompt, get completion.” It is “declare intent and constraints; the platform manufactures a scoped run and reports it back in a typed protocol.”

3. Pluggable agent runtimes

The composition pipeline in §1 deliberately does not assume any particular agent implementation. The files-in-a-repository convention (skills as SKILL.md folders, instructions as markdown) is an industry convention, not a form of vendor lock-in: multiple agent harnesses read the same layout.

The conversation’s runtime.agent_type field selects which harness executes the run:

{
  "runtime": {
    "agent_type": "claude-agent-sdk",
    "mode": "sticky",
    "sticky_ttl_seconds": 900
  }
}

agent_type is an open enum — claude-agent-sdk (the default), codex, deepagent, with more added over time without a breaking API change. The default comes from tenant settings; createConversation can override it per conversation. Unknown values are rejected with a validation-error problem listing the runtimes the deployment actually has installed.

The Agent Runtime Interface

Every runtime sits behind the same abstraction — the Agent Runtime Interface. A conforming runtime is an image that reads its prompt and composed workspace from defined locations, streams run events in a declared format, honors the platform’s env-var contract (including routing all outbound traffic through the platform proxy), and exits cleanly. Anything that meets the contract slots in; nothing else in the platform changes.

What the abstraction guarantees to the host system, regardless of which agent_type runs:

Guarantee	Meaning
Same Integration API contract	`createConversation`, `createMessage`, the NDJSON event grammar, approvals, secrets — identical request/response shapes for every runtime. Switching `agent_type` changes zero lines of host code.
Same skill conventions	The effective repository and effective skill set (§1) are composed identically. A skill authored once works across runtimes that honor the convention.
Same isolation	Every runtime executes inside the sandbox model of §4 — read-only repo, alias-only credentials, proxy-only egress. A runtime cannot opt out of the security envelope.
Same observability	Runs emit the same typed event stream and land in the same history (`listMessages`), with the same usage accounting, whichever harness produced them.

What can legitimately differ between runtimes: reasoning style and quality, tool-use behavior, latency and cost profile, and which optional conventions (sub-agents, slash commands) each harness supports. Those are selection criteria for choosing an agent_type — not integration risks.

This is why agent_type is safe to expose as a caller-facing knob: it selects an engine inside a fixed chassis. The chassis — API contract, capability composition, sandbox, streaming — is invariant.

4. Sandbox-per-run security: the environment an agent actually gets

Every agent run executes inside its own isolated sandbox — a disposable environment provisioned for the run and destroyed after it (or, for sticky conversations, kept for the lease and then destroyed; §5). The sandbox is where the composed capability set (§1) becomes a concrete filesystem, and where the platform’s central security claim is enforced:

The LLM is treated as an untrusted component. It never holds a real credential — not the git token, not the host system’s API keys, not conversation secrets, not even the key for its own upstream LLM provider.

flowchart TB
    subgraph control["Control plane (trusted — holds real secrets)"]
        API["Integration API<br/>composition & scheduling"]
        Vault[("Vault<br/>credentials (crd_) ·<br/>conversation secrets<br/>alias → value")]
        API --- Vault
    end

    subgraph sandbox["Sandbox — one isolated environment per run (untrusted, LLM-driven)"]
        direction TB
        Agent["Agent runtime<br/>(runtime.agent_type)"]
        RO["/workspace/repo — read-only<br/>resolved repository tree +<br/>filtered effective skills"]
        RW["writable zones<br/>per-run scratch space ·<br/>user storage · conversation storage<br/>(S3-style buckets)"]
        Agent --- RO
        Agent --- RW
    end

    Proxy["Egress proxy<br/>resolves aliases → real values<br/>ON THE WIRE, at the boundary"]
    Ext["External systems<br/>host APIs · SaaS · data stores"]

    API -->|"provisions run:<br/>repo + skills + aliases + env"| sandbox
    Agent -->|"outbound call carrying<br/>ALIAS only, e.g. {{secret:CRM_API_KEY}}"| Proxy
    Proxy <-->|"alias lookup<br/>(logged per resolution)"| Vault
    Proxy -->|"real credential substituted"| Ext
    Ext --> Proxy --> Agent

    Agent -.->|"any other egress path"| Blocked["✕ denied<br/>default-deny network policy"]

    style control fill:#dbeafe,stroke:#1d4ed8
    style sandbox fill:#dcfce7,stroke:#15803d
    style Proxy fill:#fef3c7,stroke:#b45309
    style Ext fill:#fee2e2,stroke:#b91c1c
    style Blocked fill:#fee2e2,stroke:#b91c1c,stroke-dasharray: 5 5

Filesystem: what the agent sees

Zone	Access	Lifetime	Contents
Repository mount	Read-only	Frozen for this run	The resolved repository tree with the filtered effective skill set — ungranted skills are physically absent, not merely “disallowed”
Run scratch space	Read-write	Destroyed with the run	Temporary working files
Conversation storage	Read-write	Life of the conversation	The conversation’s own S3-style bucket (`conversation.storage`) — files persist across turns in this thread
User storage	Read-write	Persistent	The user’s auto-attached S3-style bucket (`user.storage`; BYO-linkable via `updateUser`) — notes and artifacts that carry across all of the user’s conversations

Two properties are worth underlining:

The capability tree is immutable during a run. The repository mount is read-only, so a prompt-injected agent cannot edit its own skills or instructions to persist a compromise. Skill authoring flows through the control plane (createRepositorySkill) and lands as an auditable git commit — never through a running agent.
Skill filtering is structural, not advisory. The intersection from §1 is applied by removing ungranted skills from the mounted view before the agent starts. The agent cannot list, read, or invoke what is not there. There is no “ignore previous instructions” path around a file that does not exist.

Network: one door, and it resolves credentials

The sandbox’s network posture is default-deny egress with exactly one door: the credential-resolving forward proxy.

The agent’s environment contains aliases only — opaque tokens like {{secret:CRM_API_KEY}} referencing vault entries (registry credentials created with createCredential, or conversation-scoped secrets supplied via putConversationSecrets / the secrets field on createMessage; see §5).
When the agent makes an outbound call, the proxy resolves the alias and substitutes the real value on the wire, at the boundary — outside the agent’s process. The response returns through the same boundary. Every resolution is logged: which run, which alias, which destination.
Any egress that does not go through the proxy — direct socket, metadata endpoints, cluster services, sibling sandboxes — is denied by network policy.

The consequence: a fully compromised agent can exfiltrate no secret, because there is no secret in its memory, its environment, its filesystem, or its logs to exfiltrate. Aliases are worthless outside the proxy, and the proxy only resolves them for destinations within the credential’s scope. Combined with the HITL rule that approvals require a signing key the runtime never holds (§2), even a rogue agent cannot grant itself anything.
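To make the boundary behavior concrete, here is a minimal sketch of alias substitution with a destination-scope check and a per-resolution audit record. It is not the proxy's actual implementation: the vault shape (alias mapped to a value plus a set of allowed hosts), the function name, and the choice to substitute only in headers are all assumptions for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

ALIAS_PATTERN = re.compile(r"\{\{secret:([A-Za-z0-9_]+)\}\}")

# Hypothetical vault shape: alias -> (real value, hosts the credential is scoped to)
Vault = Dict[str, Tuple[str, Set[str]]]


class EgressDenied(Exception):
    """Raised when an alias is unknown or out of scope for the destination."""


def resolve_outbound_headers(
    headers: Dict[str, str],
    destination_host: str,
    vault: Vault,
    audit_log: List[Tuple[str, str, str]],
    run_id: str,
) -> Dict[str, str]:
    """Substitute aliases in header values, outside the agent's process."""

    def substitute(match: "re.Match[str]") -> str:
        alias = match.group(1)
        if alias not in vault:
            raise EgressDenied(f"unknown alias {alias}")
        value, allowed_hosts = vault[alias]
        if destination_host not in allowed_hosts:
            raise EgressDenied(f"{alias} is not scoped to {destination_host}")
        # Every resolution is logged: which run, which alias, which destination.
        audit_log.append((run_id, alias, destination_host))
        return value

    return {name: ALIAS_PATTERN.sub(substitute, value) for name, value in headers.items()}
```

The agent only ever constructs the input headers, so the real value exists solely in the returned mapping on the proxy side of the boundary.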

Process: defense in depth

Isolation never rests on a single mechanism. Inside the sandbox, the agent process runs:

as a non-root user, with all Linux capabilities dropped and privilege escalation disabled;
on a read-only root filesystem — only the designated writable zones accept writes;
under a seccomp allowlist that blocks mount, namespace, tracing, and other escape-adjacent syscalls;
with mandatory-access-control profiles (AppArmor/SELinux) layered on where the substrate provides them;
under hard CPU, memory, and disk quotas, so a runaway run degrades itself, not its neighbors.

These layers are independent: if any single one fails, the others still hold, so a confused or adversarial agent receives a clean permission error from the kernel rather than silently succeeding. And because every run gets a fresh sandbox, there is no residue: nothing written by one run is visible to the next except what was deliberately persisted to conversation or user storage.

5. How the Integration API knobs drive this machinery

Every runtime-facing knob on the API maps onto one stage of the pipeline above.

Sticky vs pooled sandboxes — `runtime.mode`

The trade is latency vs density:

pooled (default): each createMessage claims a pre-warmed sandbox from the shared pool, composes the workspace, runs, releases. Best density; per-turn composition cost.
sticky: createConversation with runtime.mode: "sticky" and sticky_ttl_seconds dedicates a sandbox to the conversation for the TTL. Turns land in an already-composed, already-warm environment — the low-latency choice for interactive UX. Each message refreshes the lease; updateConversation (PATCH /conversations/{conversation_id}) extends or releases it early. Observe the lease via runtime.sandbox_state (warm | active | expired) and runtime.expires_at.

Stickiness changes when the sandbox is destroyed — never what it may do. A sticky sandbox has the identical read-only-repo, alias-only, proxy-only posture as a pooled one, and tenant settings can cap the maximum TTL and the number of concurrent sticky sandboxes.
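The lease behavior described above can be modeled as a small state holder. This is a sketch under assumptions, not the platform's scheduler: it assumes a requested TTL above the tenant cap is clamped rather than rejected, and that each message resets the expiry to "now plus TTL". Timestamps are passed in explicitly so the logic is easy to reason about.

```python
from dataclasses import dataclass


@dataclass
class StickyLease:
    ttl_seconds: int
    expires_at: float

    @classmethod
    def open(cls, now: float, requested_ttl: int, tenant_max_ttl: int) -> "StickyLease":
        # Assumption: the tenant cap clamps the TTL instead of failing the request.
        ttl = min(requested_ttl, tenant_max_ttl)
        return cls(ttl_seconds=ttl, expires_at=now + ttl)

    def on_message(self, now: float) -> bool:
        """Refresh the lease if it is still live; return False if it already expired."""
        if now >= self.expires_at:
            return False
        self.expires_at = now + self.ttl_seconds
        return True
```

A host could mirror this model locally to predict whether the next turn will land in a warm sandbox, then confirm against `runtime.expires_at` from the API.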

Capacity — `on_capacity` and `getCapacity`

Sandboxes are real, bounded resources. When the pool is exhausted, the caller chooses the failure mode per request:

on_capacity: "reject" (default) — immediate 429 capacity-exhausted problem with Retry-After; the host system owns the retry.
on_capacity: "hold" — the request queues: the stream first emits {type: "queued", data: {position, retry_hint}} events, then proceeds when a sandbox frees (bounded by a documented maximum hold, after which it fails capacity-exhausted).

getCapacity (GET /capacity) exposes pool state ({warm_available, sticky_active, at_capacity}) so integrators can pre-check before dispatching, or feed back-pressure into their own routing.
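With `on_capacity: "reject"`, the host owns the retry. One way to structure that is sketched below, with the transport injected as a callable so the loop is independent of any HTTP library. The `(status, headers)` tuple shape, the exact `Retry-After` header key, and the fallback wait are assumptions; the sketch also only understands the delta-seconds form of `Retry-After` and falls back to a default for anything else.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]  # (HTTP status, response headers)


def send_with_capacity_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    default_wait: float = 1.0,
) -> Response:
    """Call `send` until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            break
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait  # e.g. an HTTP-date value, not parsed in this sketch
        sleep(max(wait, 0.0))
    return status, headers
```

Hosts that prefer not to retry at all can instead send `on_capacity: "hold"` and let the platform queue the request.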

Runtime selection — `runtime.agent_type`

Set per conversation at createConversation (tenant settings supply the default). Per §3, this swaps the engine inside a fixed chassis — the sandbox posture, capability composition, event grammar, and approval mechanics are identical across runtimes.

Env and secrets pass-through — `env` and `secrets` on `createMessage`

Both ride on the message so the host system stays stateless — nothing to vault or persist on the adapter side:

env — plaintext, non-secret run parameters (locale, feature flags, request context). These are visible to the agent; the spec is emphatic that secret material must never travel here.
secrets — a write-only alias → value map. Values are vaulted at the API boundary, scoped to the conversation, and never appear in any response, log, or history. The run receives only the aliases; the egress proxy resolves them on outbound calls (§4). Manage them independently of messages via putConversationSecrets (PUT /conversations/{conversation_id}/secrets), listConversationSecrets (aliases and metadata only — never values), and deleteConversationSecret (DELETE /conversations/{conversation_id}/secrets/{alias}).
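Because secret values ride on the request body, the adapter has to keep them out of its own logs. The helper below is a hypothetical illustration of that duty: it returns a copy of a request body that is safe to log, keeping aliases (which `listConversationSecrets` exposes anyway) and masking values. The body shape beyond the `env` and `secrets` keys is an assumption.

```python
from typing import Any, Dict


def redact_for_logging(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of a createMessage body with secret values masked."""
    safe = dict(body)  # the caller's dict is left untouched
    secrets = safe.get("secrets")
    if isinstance(secrets, dict):
        # Aliases stay visible for debugging; values never reach the log line.
        safe["secrets"] = {alias: "[REDACTED]" for alias in secrets}
    return safe
```

The `env` map is passed through unchanged because it is defined as non-secret; if a value would need masking, it belongs in `secrets` instead.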

Approvals — the HITL weave

Approvals close the loop between §2’s gate and §4’s vault. The composed flow:

Mid-run, the agent determines it needs something — commonly a credential it was never given. It raises an Approval with requested_items: [{kind: "secret", alias: "CRM_API_KEY", ...}]; the stream emits approval_required and the message parks in awaiting_approval. A sticky sandbox keeps the run warm until the approval’s expires_at; a pooled run checkpoints and re-hydrates on approval.
The host system surfaces the request to its own approval authority and calls approveApproval (POST /approvals/{approval_id}/approve) with a signed assertion over {approval_id, decision, exp} — made with the per-tenant approver key, which is deliberately distinct from the integration API key. Optionally, the approve body’s secrets map supplies the requested value, which is vaulted on arrival like any other secret.
The run resumes (resumed event) with the new alias available; the proxy resolves it on the agent’s outbound call; the run streams to message_end. A denial or expiry ends the message in the failed state with a problem-typed error event. Pending work is discoverable via listApprovals (GET /approvals?status=pending) and getApproval.

The guarantee this weave delivers: the agent can ask for anything, but can obtain nothing on its own. The value never enters the runtime (aliases only), the approval cannot be forged by anything inside the platform’s request path (the runtime holds no signing key, and the integration key can transport but not mint an approval), and every step — request, decision, resolution — is an auditable event.
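The shape of a signed assertion over {approval_id, decision, exp} can be illustrated as follows. The source does not specify the signature scheme, so this sketch uses HMAC-SHA256 from the Python standard library purely as a stand-in; a real deployment may well use an asymmetric scheme, and the token encoding, function names, and integer epoch `exp` are all assumptions.

```python
import base64
import hashlib
import hmac
import json


def _mac(key: bytes, payload: bytes) -> bytes:
    # Stand-in signature primitive; the platform's real scheme is not specified here.
    return hmac.new(key, payload, hashlib.sha256).digest()


def sign_assertion(approver_key: bytes, approval_id: str, decision: str, exp: int) -> str:
    """Sign the three claims with the approver key (never the integration API key)."""
    claims = {"approval_id": approval_id, "decision": decision, "exp": exp}
    payload = json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return ".".join(
        base64.urlsafe_b64encode(part).decode("ascii")
        for part in (payload, _mac(approver_key, payload))
    )


def verify_assertion(approver_key: bytes, token: str, now: int) -> dict:
    """Check the signature and expiry; return the claims if both hold."""
    payload_b64, mac_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    if not hmac.compare_digest(_mac(approver_key, payload), base64.urlsafe_b64decode(mac_b64)):
        raise ValueError("signature mismatch")
    claims = json.loads(payload)
    if claims["exp"] <= now:
        raise ValueError("assertion expired")
    return claims
```

Whatever scheme is used, the property that matters is the one shown here: a party holding only the integration API key cannot produce a token that verifies, so it can carry an approval but not create one.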

One request, end to end

Putting all five knobs together, a single createMessage traverses the full pipeline:

Resolve — walk the repository cascade; intersect the four skill filters; pick the conversation’s agent_type.
Admit — sticky lease available, warm-pool claim, or the on_capacity path (reject / queued events).
Compose — mount the resolved repo read-only with the filtered skill view; attach conversation and user storage; inject env and the alias set; append the workspace posture to the system prompt.
Run — the selected runtime executes inside the hardened sandbox; filler deltas bridge warm-up if enabled; tool calls exit only through the credential-resolving proxy.
Gate — if the agent raises an Approval, park; resume only on a signed approval (optionally carrying the requested secret).
Deliver & tear down — stream to message_end, persist the message to history, release or destroy the sandbox per runtime.mode.

Every stage is observable through the API (runtime.sandbox_state, getCapacity, the event stream, listApprovals), and every stage is driven by data the host system controls through the same API. That is the composable-runtime promise: capability is configuration, execution is disposable, and trust is structural.

6. Observability: point the deployment at your OTEL destination

Every shiftagent deployment is instrumented with OpenTelemetry end to end, and every deployment can be configured with an OTEL destination — an OTLP endpoint of the operator’s choosing. Observability is a deployment-configuration concern, not a code change:

# Deployment configuration (Helm values)
observability:
  otelEndpoint: "http://otel-collector.observability.svc.cluster.local:4318"

which surfaces to the services as the standard OpenTelemetry environment contract:

OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.observability.svc.cluster.local:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

What flows to that destination:

Signal	What it covers
Traces	One trace per agent run: provisioning lookups, sandbox acquisition (warm-pool claim or sticky-lease reuse), repository/skill resolution, LLM calls (OTel GenAI semantic conventions — model, token usage), egress-proxy calls, stream completion
Metrics	Time-to-first-token and turn latency, sandbox pool utilization (`getCapacity` counters), queued/held request counts, approval wait times, provisioning cold-path rates, per-tenant token usage
Logs	Structured service logs with `request_id` correlation to traces; never message content or secret material

Three properties matter for a host-system embedding:

Vendor-neutral. The destination is any OTLP-compatible backend: a host-operated OTel collector, SigNoz, Datadog, Honeycomb, Grafana Tempo, Langfuse or LangSmith for the LLM spans, or the host system’s existing observability pipeline. The platform only requires that the OTLP endpoint be reachable from the deployment.
Self-hosted collector is replaceable. A self-hosted install ships with a bundled OTel collector by default; operators can point otelEndpoint at their own collector instead and the bundled one is never in the path.
Per-tenant fan-out. Tenants can be configured with additional OTLP exporters so the same telemetry ships into a downstream customer’s own observability stack — the same inherit-and-tighten model as every other tenant setting.

The adapter should propagate its X-Request-Id (and W3C traceparent, if the host system runs OpenTelemetry too) on every Integration API call — that stitches host-side spans and shiftagent-side spans into one distributed trace across the fusion boundary.
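A minimal sketch of building those correlation headers is shown below. The `traceparent` value follows the W3C Trace Context layout (version `00`, a 32-hex-character trace id, a 16-hex-character parent span id, and a 2-hex-character flags field). The function name is hypothetical, and in a host that already runs OpenTelemetry the SDK's propagator would normally inject `traceparent` rather than hand-built code like this.

```python
import secrets
import uuid
from typing import Dict, Optional


def correlation_headers(
    request_id: Optional[str] = None,
    trace_id: Optional[str] = None,
    span_id: Optional[str] = None,
    sampled: bool = True,
) -> Dict[str, str]:
    """Build X-Request-Id and W3C traceparent headers for an Integration API call."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return {
        "X-Request-Id": request_id or str(uuid.uuid4()),
        "traceparent": f"00-{trace_id}-{span_id}-{flags}",
    }
```

Passing the host's current trace and span ids, instead of letting the helper generate fresh ones, is what links the two sides into a single trace.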

Integration Guide — architecture overview, auth model, and external-ID conventions.
Provisioning Flow — how tenants, repositories, roles, and users get wired together before the first conversation.
Streaming Contract — the full NDJSON event grammar, including queued, approval_required, resumed, and filler-flagged deltas.
Adapter Implementation Guide — the stateless adapter’s duties, including approval transport and the no-secret-logging rules.