Context management
Long-running agents can hit input token ceilings fast. Set contextStrategy on AgentConfig to control how the conversation shrinks as it grows:
const agent: AgentConfig = { name: 'long-runner', model: 'claude-sonnet-4-6', // Pick one: contextStrategy: { type: 'sliding-window', maxTurns: 20 }, // contextStrategy: { type: 'summarize', maxTokens: 80_000, summaryModel: 'claude-haiku-4-5' }, // contextStrategy: { type: 'compact', maxTokens: 100_000, preserveRecentTurns: 4 }, // contextStrategy: { type: 'custom', compress: (messages, estimatedTokens) => ... },}| Strategy | When to reach for it |
|---|---|
sliding-window | Cheapest. Keep the last N turns, drop the rest. |
summarize | Send old turns to a summary model; keep the summary in place of the originals. |
compact | Rule-based: truncate large assistant text blocks and tool results, keep recent turns intact. No extra LLM call. |
custom | Supply your own compress(messages, estimatedTokens) function. |
Compressing Tool Results
Section titled “Compressing Tool Results”Tool outputs persist in the conversation history across turns even after the agent has acted on them. In long runs this can consume a significant portion of the context budget.
compressToolResults replaces already-consumed tool results (those followed by an assistant response) with a short marker before each new LLM call:
const agent: AgentConfig = { name: 'long-runner', model: 'claude-sonnet-4-6', // Enable with the default threshold (500 chars): compressToolResults: true, // Or only compress results longer than N characters: // compressToolResults: { minChars: 2000 },}| Value | Behaviour |
|---|---|
true | Compress results longer than 500 characters (default threshold) |
{ minChars: N } | Compress results longer than N characters |
false / undefined | Disabled (default) |
Notes:
- Error tool results are never compressed.
- Delegation
tool_resultblocks (fromdelegate_to_agent) are exempt — the parent agent always retains the full sub-agent output. - Works alongside
contextStrategy; combine both for maximum context headroom.
Truncating Tool Output
Section titled “Truncating Tool Output”maxToolOutputChars caps the raw output length for every tool used by an agent. Outputs longer than the limit are truncated to a head + tail excerpt with a marker in between. This applies at execution time, before the result enters the conversation.
const agent: AgentConfig = { name: 'long-runner', model: 'claude-sonnet-4-6', maxToolOutputChars: 10_000, // truncate any single tool output to 10 k chars}Per-tool maxOutputChars (set on ToolDefinition) takes priority over the agent-level maxToolOutputChars.
Preserving Reasoning Across Providers
Section titled “Preserving Reasoning Across Providers”Reasoning models (OpenAI o-series, DeepSeek reasoner, Anthropic extended thinking, Gemini thought summaries) emit intermediate reasoning that the framework extracts as ReasoningBlocks with a provenance field identifying the producing adapter. By default, only same-provider blocks with a valid signature are echoed back; everything else is silently dropped on outbound conversion to avoid the receiving model rejecting an unsigned thinking block or to keep prompt size predictable.
preserveReasoningAsText opts into a <thinking>...</thinking> text fallback: whenever an outbound conversion encounters a reasoning block the target adapter cannot natively echo, the block is downgraded to inline text and prepended to the next assistant message:
const agent: AgentConfig = { name: 'cross-provider', model: 'gpt-5', provider: 'openai', // Enable text fallback for reasoning blocks the target adapter can't echo. preserveReasoningAsText: true, // Defaults to ON when preserveReasoningAsText is true; head+tail truncate // each block to 1200 chars. Override to tune, or set false to disable. // compressReasoningText: { minChars: 4000 },}When the fallback fires:
| Source provenance | Target adapter capability | Behaviour with preserveReasoningAsText: true |
|---|---|---|
Matches target ('anthropic' → Anthropic) and has signature | 'own-issued' | Native echo (unchanged) |
| Matches target but no signature | 'own-issued' | Text fallback |
Foreign (e.g. 'openai' → Anthropic) | 'own-issued' | Text fallback |
Matches target ('deepseek' → DeepSeek) in a tool-calling conversation | 'tool-use-only' | Native reasoning_content echo (per DeepSeek V4 spec, see #251) |
Matches target but no tool_use anywhere in history | 'tool-use-only' | Dropped (DeepSeek spec ignores non-tool reasoning) |
Foreign (e.g. 'openai' → DeepSeek) | 'tool-use-only' | Text fallback |
Matches target ('bedrock' → Bedrock) and has signature or is redacted | 'own-issued' | Native echo via reasoningContent.{reasoningText,redactedContent} (see #223) |
| Any | 'never' (OpenAI, Azure, Copilot, AI SDK, etc.) | Text fallback |
Redacted reasoning (Anthropic safety-filtered) emits the placeholder <thinking>[redacted]</thinking> to signal that reasoning occurred without leaking the opaque payload.
Notes:
- Disabled by default to avoid silently inflating prompt tokens.
- Default-on truncation (
compressReasoningText) is mandatory for safety on long chain-of-thought; disable only when debugging. - Some local OpenAI-compatible models may echo
<thinking>text back into their assistant response, which can trip the loop detector. Seeexamples/patterns/cross-provider-reasoning.tsfor the failure mode and mitigations. - Bedrock has
capabilities.echoesReasoning === 'own-issued': signed reasoning blocks (reasoningContent.reasoningText.signature) and redacted blocks (reasoningContent.redactedContent) round-trip natively on bothchat()andstream(), in both inbound extraction and outbound serialization (see #223). 'tool-use-only'(DeepSeek V4) is the only capability where same-provider echo works without the user opting intopreserveReasoningAsText— it’s forced on internally because the DeepSeek API requires it.