Skip to content

Context management

Long-running agents can hit input token ceilings fast. Set contextStrategy on AgentConfig to control how the conversation shrinks as it grows:

const agent: AgentConfig = {
name: 'long-runner',
model: 'claude-sonnet-4-6',
// Pick one:
contextStrategy: { type: 'sliding-window', maxTurns: 20 },
// contextStrategy: { type: 'summarize', maxTokens: 80_000, summaryModel: 'claude-haiku-4-5' },
// contextStrategy: { type: 'compact', maxTokens: 100_000, preserveRecentTurns: 4 },
// contextStrategy: { type: 'custom', compress: (messages, estimatedTokens) => ... },
}
StrategyWhen to reach for it
sliding-windowCheapest. Keep the last N turns, drop the rest.
summarizeSend old turns to a summary model; keep the summary in place of the originals.
compactRule-based: truncate large assistant text blocks and tool results, keep recent turns intact. No extra LLM call.
customSupply your own compress(messages, estimatedTokens) function.

Tool outputs persist in the conversation history across turns even after the agent has acted on them. In long runs this can consume a significant portion of the context budget.

compressToolResults replaces already-consumed tool results (those followed by an assistant response) with a short marker before each new LLM call:

const agent: AgentConfig = {
name: 'long-runner',
model: 'claude-sonnet-4-6',
// Enable with the default threshold (500 chars):
compressToolResults: true,
// Or only compress results longer than N characters:
// compressToolResults: { minChars: 2000 },
}
ValueBehaviour
trueCompress results longer than 500 characters (default threshold)
{ minChars: N }Compress results longer than N characters
false / undefinedDisabled (default)

Notes:

  • Error tool results are never compressed.
  • Delegation tool_result blocks (from delegate_to_agent) are exempt — the parent agent always retains the full sub-agent output.
  • Works alongside contextStrategy; combine both for maximum context headroom.

maxToolOutputChars caps the raw output length for every tool used by an agent. Outputs longer than the limit are truncated to a head + tail excerpt with a marker in between. This applies at execution time, before the result enters the conversation.

const agent: AgentConfig = {
name: 'long-runner',
model: 'claude-sonnet-4-6',
maxToolOutputChars: 10_000, // truncate any single tool output to 10 k chars
}

Per-tool maxOutputChars (set on ToolDefinition) takes priority over the agent-level maxToolOutputChars.

Reasoning models (OpenAI o-series, DeepSeek reasoner, Anthropic extended thinking, Gemini thought summaries) emit intermediate reasoning that the framework extracts as ReasoningBlocks with a provenance field identifying the producing adapter. By default, only same-provider blocks with a valid signature are echoed back; everything else is silently dropped on outbound conversion to avoid the receiving model rejecting an unsigned thinking block or to keep prompt size predictable.

preserveReasoningAsText opts into a <thinking>...</thinking> text fallback: whenever an outbound conversion encounters a reasoning block the target adapter cannot natively echo, the block is downgraded to inline text and prepended to the next assistant message:

const agent: AgentConfig = {
name: 'cross-provider',
model: 'gpt-5',
provider: 'openai',
// Enable text fallback for reasoning blocks the target adapter can't echo.
preserveReasoningAsText: true,
// Defaults to ON when preserveReasoningAsText is true; head+tail truncate
// each block to 1200 chars. Override to tune, or set false to disable.
// compressReasoningText: { minChars: 4000 },
}

When the fallback fires:

Source provenanceTarget adapter capabilityBehaviour with preserveReasoningAsText: true
Matches target ('anthropic' → Anthropic) and has signature'own-issued'Native echo (unchanged)
Matches target but no signature'own-issued'Text fallback
Foreign (e.g. 'openai' → Anthropic)'own-issued'Text fallback
Matches target ('deepseek' → DeepSeek) in a tool-calling conversation'tool-use-only'Native reasoning_content echo (per DeepSeek V4 spec, see #251)
Matches target but no tool_use anywhere in history'tool-use-only'Dropped (DeepSeek spec ignores non-tool reasoning)
Foreign (e.g. 'openai' → DeepSeek)'tool-use-only'Text fallback
Matches target ('bedrock' → Bedrock) and has signature or is redacted'own-issued'Native echo via reasoningContent.{reasoningText,redactedContent} (see #223)
Any'never' (OpenAI, Azure, Copilot, AI SDK, etc.)Text fallback

Redacted reasoning (Anthropic safety-filtered) emits the placeholder <thinking>[redacted]</thinking> to signal that reasoning occurred without leaking the opaque payload.

Notes:

  • Disabled by default to avoid silently inflating prompt tokens.
  • Default-on truncation (compressReasoningText) is mandatory for safety on long chain-of-thought; disable only when debugging.
  • Some local OpenAI-compatible models may echo <thinking> text back into their assistant response, which can trip the loop detector. See examples/patterns/cross-provider-reasoning.ts for the failure mode and mitigations.
  • Bedrock has capabilities.echoesReasoning === 'own-issued': signed reasoning blocks (reasoningContent.reasoningText.signature) and redacted blocks (reasoningContent.redactedContent) round-trip natively on both chat() and stream(), in both inbound extraction and outbound serialization (see #223).
  • 'tool-use-only' (DeepSeek V4) is the only capability where same-provider echo works without the user opting into preserveReasoningAsText — it’s forced on internally because the DeepSeek API requires it.