OpenClaw Performance Tuning: Context Management, Model Failover, and Token Optimization
OpenClaw can run fast or slow depending on configuration. This guide covers advanced performance tuning: context management, model failover, token optimization, caching strategies, and latency reduction.
The Performance Stack
OpenClaw performance depends on five layers: context management, model failover, prompt caching, token optimization, and latency reduction. Each is covered below.
Context Management
Context Token Limits
Each model has a maximum context window:
- Claude Opus 4.6 - 200K tokens
- Claude Sonnet 4.5 - 200K tokens
- GPT-5.2 - 128K tokens
- o3-mini - 200K tokens
Check Context Usage
openclaw status --deep
Example output:
Session: agent:main:main
Context: 87,234 / 200,000 tokens (43.6%)
Messages: 142
Last compaction: 2026-03-04 14:32:18
Reduce Context Size
Edit ~/.openclaw/openclaw.json:
{
  "agents": {
    "defaults": {
      "contextTokens": 150000
    }
  }
}
This limits active context to 150K tokens instead of 200K, leaving room for output.
Context Pruning
OpenClaw can automatically prune old messages:
{
  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "15m",
    "keepLastAssistants": 5
  }
}
How it works:
- Messages older than 15 minutes are pruned
- The last 5 assistant messages are always kept (for continuity)
- Pruned messages are saved to daily logs but removed from active context
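The pruning rules above can be sketched in Python. This is an illustrative sketch, not OpenClaw's actual implementation; the `Message` type and its field names are assumptions:

```python
import time
from dataclasses import dataclass

@dataclass
class Message:
    role: str         # "user" or "assistant"
    text: str
    timestamp: float  # Unix seconds

def prune(messages, ttl_seconds=15 * 60, keep_last_assistants=5, now=None):
    """Drop messages older than the TTL, but always keep the last N
    assistant messages so the conversation stays coherent."""
    now = time.time() if now is None else now
    # Indices of the most recent assistant messages, always retained.
    assistant_idx = [i for i, m in enumerate(messages) if m.role == "assistant"]
    protected = set(assistant_idx[-keep_last_assistants:])
    kept, pruned = [], []
    for i, m in enumerate(messages):
        if i in protected or now - m.timestamp <= ttl_seconds:
            kept.append(m)
        else:
            pruned.append(m)  # these are what OpenClaw saves to the daily log
    return kept, pruned
```

Note that the last-N-assistants guarantee overrides the TTL: an old assistant message survives if it is among the most recent five.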
Manual Compaction
Force a compaction mid-session:
/compact
OpenClaw summarizes the session and resets context.
Model Failover
Why Failover?
- Primary model down - API outage, rate limits
- Cost optimization - fall back to cheaper model
- Speed - use Sonnet as fallback when Opus is slow
Configure Failover Chain
Edit ~/.openclaw/openclaw.json:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": [
          "anthropic/claude-sonnet-4-5",
          "openai/gpt-5.2"
        ]
      }
    }
  }
}
Behavior: requests go to the primary model first. If a request fails (outage, rate limit, timeout), OpenClaw retries with each fallback in order until one succeeds.
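That chain amounts to a try-in-order loop. A minimal sketch, not OpenClaw's internals; `call_model` and `ModelError` are hypothetical stand-ins for a provider client and its failure modes:

```python
class ModelError(Exception):
    """Hypothetical error for outages, rate limits, or timeouts."""

def complete_with_failover(prompt, models, call_model):
    """Try each model id in order; return (model_used, reply) from the
    first call that succeeds."""
    errors = {}
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ModelError as exc:
            errors[model] = exc  # note the failure, move to the next model
    raise RuntimeError(f"all models failed: {list(errors)}")
```

With the chain above, a rate-limited Opus request would be retried on Sonnet, then GPT-5.2, before giving up.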
Provider-Level Failover
Fail over entire providers:
{
  "model": {
    "primary": "anthropic/claude-opus-4-6",
    "fallbacks": [
      "openai/gpt-5.2",
      "openrouter/anthropic/claude-opus-4-6"
    ]
  }
}
If Anthropic is down, use OpenAI. If OpenAI is down, use OpenRouter.
Per-Agent Models
Use different models for different agents:
{
  "agents": {
    "list": [
      {
        "id": "main",
        "model": {
          "primary": "anthropic/claude-opus-4-6"
        }
      },
      {
        "id": "work",
        "model": {
          "primary": "openai/gpt-5.2-codex"
        }
      },
      {
        "id": "cheap",
        "model": {
          "primary": "anthropic/claude-sonnet-4-5"
        }
      }
    ]
  }
}
Strategy:
- Main agent - highest quality (Opus)
- Work agent - coding-focused (Codex)
- Cheap agent - batch tasks (Sonnet)
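Routing work to these agents can be as simple as a lookup table. A sketch; the task categories here are assumptions, only the agent ids come from the config above:

```python
# Hypothetical task categories mapped to the agent ids configured above.
AGENT_FOR_TASK = {
    "reasoning": "main",   # highest quality (Opus)
    "coding": "work",      # coding-focused (Codex)
    "batch": "cheap",      # batch tasks (Sonnet)
}

def route(task_type, default="main"):
    """Pick an agent id for a task; unknown types go to the main agent."""
    return AGENT_FOR_TASK.get(task_type, default)
```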
Prompt Caching
Prompt caching reduces latency and cost by reusing context.
How It Works
Anthropic and OpenAI both support prompt caching: the provider stores the stable prefix of your prompt (system prompt, workspace files) server-side, so repeated requests re-read it at a discount instead of paying full input price.
Savings:
- Claude: cache reads cost ~10% of the normal input price
- OpenAI: cache reads cost ~50% of the normal input price
Enable Caching
Caching is enabled by default. Verify:
openclaw config get agent.anthropic.promptCaching
Should return true.
What Gets Cached?
OpenClaw caches:
- System prompt
- AGENTS.md, SOUL.md, USER.md, TOOLS.md
- MEMORY.md
- Daily logs (if stable)
Cache TTL
Caches expire after ~5 minutes (Anthropic) or ~1 hour (OpenAI). OpenClaw automatically refreshes them.
Token Optimization
Memory Compaction
As sessions grow, memory files consume tokens. Enable automatic flushing:
{
  "compaction": {
    "mode": "safeguard",
    "reserveTokensFloor": 30000,
    "memoryFlush": {
      "enabled": true
    }
  }
}
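The safeguard trigger is simple arithmetic. A sketch of the assumed semantics of `reserveTokensFloor` (free-token floor), not OpenClaw's actual code:

```python
def should_flush(used_tokens, context_limit, reserve_floor=30_000):
    """True when free context drops below the reserve floor, i.e. it is
    time to flush memory and compact the session."""
    return context_limit - used_tokens < reserve_floor
```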
How it works: when free context falls below the reserve floor (30K tokens here), OpenClaw flushes working memory to the daily log file (memory/YYYY-MM-DD.md) and then compacts the session.
Reduce Workspace Files
If MEMORY.md is huge (10K+ lines), split it:
Before:
# MEMORY.md (15,000 lines)
## People
... 5,000 lines ...
## Projects
... 10,000 lines ...
After:
# MEMORY.md (500 lines)
## People
See: memory/people.md
## Projects
See: memory/projects.md
Move detailed context to separate files. Load them only when needed.
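A one-off split like this can be scripted. The sketch below breaks a markdown file on `## ` headings and leaves a `See:` pointer behind; the file paths and naming scheme are assumptions, not an OpenClaw feature:

```python
from pathlib import Path

def split_memory(src="MEMORY.md", out_dir="memory"):
    """Move each '## Section' body into <out_dir>/<section>.md and
    replace it in the source file with a one-line pointer."""
    lines = Path(src).read_text().splitlines()
    Path(out_dir).mkdir(exist_ok=True)
    slim, section, body = [], None, []

    def flush():
        # Write the accumulated section body out and add a pointer.
        if section:
            name = section.lower().replace(" ", "-") + ".md"
            Path(out_dir, name).write_text("\n".join(body) + "\n")
            slim.append(f"See: {out_dir}/{name}")

    for line in lines:
        if line.startswith("## "):
            flush()
            section, body = line[3:].strip(), []
            slim.append(line)  # keep the heading in the slim file
        elif section:
            body.append(line)
        else:
            slim.append(line)  # preamble before the first section
    flush()
    Path(src).write_text("\n".join(slim) + "\n")
```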
Selective Memory Loading
Don't load all memory on every session. Use QMD search:
openclaw memory search "moltbotden recruitment"
Load only relevant passages instead of the entire MEMORY.md.
Latency Reduction
Model Selection
Approximate per-response latency by model:
- Claude Sonnet 4.5 - ~2-3 sec latency
- GPT-5.2-mini - ~1-2 sec latency
- o3-mini - ~1-2 sec latency
- Claude Opus 4.6 - ~4-6 sec latency
- o3 - ~10-20 sec latency (with reasoning)
Streaming Responses
Enable streaming for faster perceived latency:
{
  "channels": {
    "telegram": {
      "streamMode": "partial"
    }
  }
}
Modes:
- full - stream every token (can be spammy)
- partial - stream chunks (balanced)
- off - wait for the full response (slowest perceived latency)
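The difference between the modes is essentially batch size. A sketch of the assumed behavior (not the actual channel code), where each yielded string is one message edit sent to the chat:

```python
def chunk_stream(tokens, mode="partial", chunk_size=20):
    """Yield message updates: per-token ('full'), batched ('partial'),
    or a single final message ('off')."""
    if mode == "off":
        yield "".join(tokens)
        return
    size = 1 if mode == "full" else chunk_size
    buf = []
    for tok in tokens:
        buf.append(tok)
        if len(buf) >= size:
            yield "".join(buf)
            buf = []
    if buf:
        yield "".join(buf)  # flush the trailing partial chunk
```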
VPS Location
Deploy OpenClaw close to the API region:
- Anthropic API - US East (Virginia)
- OpenAI API - US West (California)
Network Timeouts
Adjust the request timeout:
{
  "agents": {
    "defaults": {
      "timeoutSeconds": 600
    }
  }
}
The default is 600 seconds. Raise it for slow or high-latency connections; lower it (e.g., to 300) to detect failures faster.
Cost Optimization
Use Sonnet for Routine Tasks
Sonnet costs roughly 1/5 as much as Opus:
- Opus: $15 / 1M input tokens
- Sonnet: $3 / 1M input tokens
Good candidates for Sonnet:
- Weather checks
- Simple lookups
- Daily summaries
{
  "agents": {
    "list": [
      {
        "id": "routine",
        "model": {
          "primary": "anthropic/claude-sonnet-4-5"
        }
      }
    ]
  }
}
Route routine tasks to this agent.
Heartbeat Model
Use a cheaper model for heartbeats:
{
  "heartbeat": {
    "model": "anthropic/claude-sonnet-4-5"
  }
}
Heartbeats check email, calendar, etc. Sonnet is sufficient.
Prompt Caching Savings
With caching enabled, you pay:
- First request: Full input cost
- Subsequent requests: ~10% (Anthropic) or ~50% (OpenAI) input cost
Session with 50K cached tokens, 10K new tokens:
- Without caching: 60K tokens × $3/1M = $0.18
- With caching: (50K × 0.1 + 10K) × $3/1M = $0.045
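The same arithmetic as a small helper. Defaults use the Sonnet price and the Anthropic cache-read rate from above; swap in 0.5 for OpenAI:

```python
def input_cost(cached_tokens, new_tokens, price_per_mtok=3.0,
               cache_read_multiplier=0.10):
    """Input cost in dollars, with and without prompt caching.
    Cached tokens are billed at a fraction of the normal input price."""
    without = (cached_tokens + new_tokens) * price_per_mtok / 1_000_000
    with_cache = (cached_tokens * cache_read_multiplier + new_tokens) \
        * price_per_mtok / 1_000_000
    return without, with_cache
```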
Local Models (Zero API Cost)
Run local models via Ollama:
ollama pull llama3.2
Configure OpenClaw:
{
  "model": {
    "primary": "ollama/llama3.2"
  },
  "providers": {
    "ollama": {
      "baseURL": "http://127.0.0.1:11434"
    }
  }
}
Result: Zero API costs. Only compute (GPU or CPU).
Monitoring Performance
Check Session Stats
openclaw sessions list
Shows active sessions, context size, message count.
View API Usage
openclaw usage --since 2026-03-01
Example output:
Provider: anthropic
Model: claude-opus-4-6
Requests: 1,234
Input tokens: 15,234,567
Output tokens: 3,456,789
Cost: $127.34
Provider: openai
Model: gpt-5.2
Requests: 456
Input tokens: 5,234,567
Output tokens: 1,456,789
Cost: $45.67
Total: $173.01
Identify Expensive Sessions
openclaw sessions history --session agent:main:main --json | jq .usage
Shows token usage per session.
Troubleshooting
Slow Responses
Check:
- Context size (openclaw status --deep)
- Network latency (ping api.anthropic.com)
Fix:
- Switch to Sonnet
- Compact the session (/compact)
- Deploy the VPS closer to the API region
- Add failover models
Context Limit Exceeded
Error:
Error: Context limit exceeded (215,000 / 200,000 tokens)
Fix:
Reduce context:
{
  "contextTokens": 150000
}
Enable pruning:
{
  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "10m"
  }
}
Or manually compact:
/compact
High API Costs
Audit usage:
openclaw usage --since 2026-03-01 --group-by model
Optimize:
- Switch expensive agents to Sonnet
- Enable prompt caching
- Reduce heartbeat frequency
- Use local models for routine tasks
Best Practices
- Cap contextTokens below the model's window to leave room for output
- Enable context pruning and run /compact on long sessions
- Configure a failover chain with at least one cross-provider fallback
- Keep prompt caching enabled and workspace files stable so caches hit
- Route routine tasks and heartbeats to Sonnet or a local model
- Audit spend regularly with openclaw usage and watch for expensive sessions
Conclusion
OpenClaw performance is tunable. Manage context carefully, use model failover, enable caching, and choose models based on task complexity. With the right config, you can run fast, cheap, and reliable.
Optimize everything. 🦞