What is Extended Thinking?
Extended thinking is Claude's capability to engage in explicit, step-by-step reasoning before producing a final response. When enabled, Claude "thinks out loud" in a structured way, breaking down complex problems, exploring approaches, and working through solutions systematically.
This improves performance on tasks requiring:
- Multi-step reasoning
- Mathematical problem-solving
- Code architecture decisions
- Complex analysis
- Strategic planning
- Logical deduction
How Extended Thinking Differs
Standard response:
- Claude generates output directly
- Reasoning is implicit
- Good for straightforward queries
Extended thinking:
- Claude explicitly works through the problem
- Thinking is visible and structured
- Better for complex or multi-step problems
When to Use Extended Thinking
Ideal Use Cases
Mathematical and logical problems:
Solve this optimization problem step by step:
A factory produces two products. Product A requires 2 hours of machine time
and 3 hours of labor, generating $50 profit. Product B requires 3 hours of
machine time and 2 hours of labor, generating $40 profit. The factory has
120 machine hours and 100 labor hours available weekly. Maximize profit.
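Because this prompt has a single checkable answer, it also works well as a test case for Claude's reasoning. Below is a minimal sketch for checking the result. It assumes production comes in whole units, which the prompt does not actually say, and it brute-forces every feasible integer plan. `best_plan` is a hypothetical helper name.

```python
def best_plan(machine_hours=120, labor_hours=100):
    """Brute-force the integer production plan that maximizes weekly profit.

    Returns (profit, units_of_A, units_of_B).
    """
    best = (0, 0, 0)
    for a in range(machine_hours // 2 + 1):      # A needs 2 machine hours per unit
        for b in range(machine_hours // 3 + 1):  # B needs 3 machine hours per unit
            machine = 2 * a + 3 * b
            labor = 3 * a + 2 * b
            if machine <= machine_hours and labor <= labor_hours:
                profit = 50 * a + 40 * b
                if profit > best[0]:
                    best = (profit, a, b)
    return best


print(best_plan())  # prints (1880, 12, 32)
```

For these numbers the linear-programming optimum is already integral. Both constraints bind at 12 units of A and 32 of B, so assuming whole units does not change the answer.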
Complex code architecture:
Design the database schema and API architecture for a real-time
collaborative document editor with:
- Multiple users editing simultaneously
- Version history with diff tracking
- Offline support with conflict resolution
- Permission levels (view, comment, edit, admin)
Strategic analysis:
Our SaaS startup has $500K runway. Revenue is $30K MRR with 5% growth.
Analyze three strategic options:
1. Focus on growth (raise customer-acquisition spend, hire salespeople)
2. Focus on product (hire engineers, build features)
3. Focus on efficiency (cut costs, extend runway)
Provide financial projections and risk assessment.
When NOT to Use
Avoid extended thinking for:
- Simple factual questions
- Straightforward content generation
- Tasks where speed matters more than depth
- Queries with obvious answers
Extended thinking consumes significantly more tokens, so enable it deliberately.
API Implementation
Enabling Extended Thinking
import anthropic
client = anthropic.Anthropic()  # reads the ANTHROPIC_API_KEY environment variable
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,  # total output cap, thinking tokens included
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # max tokens for thinking; must be >= 1024 and < max_tokens
    },
    messages=[
        {
            "role": "user",
            "content": "Design an algorithm to detect fraud in credit card transactions...",
        }
    ],
)
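The API rejects thinking configurations that break two rules. `budget_tokens` must be at least 1,024, and it must be smaller than `max_tokens`. The sketch below checks both rules on the client side, on the assumption that you want requests to fail fast before they are sent. `validate_thinking` is a hypothetical helper and is not part of the SDK. It also ignores beta features such as interleaved thinking, where different limits apply.

```python
MIN_THINKING_BUDGET = 1_024  # documented minimum for budget_tokens


def validate_thinking(max_tokens, thinking):
    """Raise ValueError if a thinking config breaks the standard budget rules."""
    if thinking.get("type") != "enabled":
        return  # nothing to check when thinking is off
    budget = thinking["budget_tokens"]
    if budget < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}, got {budget}")
    if budget >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget}) must be less than max_tokens ({max_tokens})"
        )


# The request above passes: 1024 <= 10000 < 16000.
validate_thinking(16000, {"type": "enabled", "budget_tokens": 10000})
```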
Understanding the Response
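Extended thinking responses contain both thinking blocks and text blocks. Each thinking block also has a signature field, which the API uses to verify the block. If you include the block in a later request, pass it back unmodified:
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me break down this fraud detection problem...",
      "signature": "..."
    },
    {
      "type": "text",
      "text": "Here's my recommended fraud detection algorithm..."
    }
  ],
  "usage": {
    "input_tokens": 156,
    "output_tokens": 4523
  }
}
usage has no separate thinking-token counter, because thinking tokens are included in output_tokens. On Claude 4 models the API returns a summary of the thinking, but you are billed for the full thinking tokens generated. As a result, output_tokens will not match the length of the visible text.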
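If you call the HTTP endpoint directly instead of using the SDK, the same response arrives as JSON. Below is a minimal sketch for splitting it, assuming the JSON has already been parsed into a dict. `split_content` is a hypothetical helper. It also counts `redacted_thinking` blocks. The API can return these when part of the reasoning is encrypted for safety reasons, and they contain no readable text.

```python
import json


def split_content(response: dict) -> dict:
    """Separate a parsed Messages API response into thinking and answer text."""
    thinking_parts, text_parts, redacted = [], [], 0
    for block in response.get("content", []):
        kind = block.get("type")
        if kind == "thinking":
            thinking_parts.append(block["thinking"])
        elif kind == "redacted_thinking":
            redacted += 1  # encrypted reasoning: nothing human-readable to show
        elif kind == "text":
            text_parts.append(block["text"])
    return {
        "thinking": "\n".join(thinking_parts),
        "text": "".join(text_parts),
        "redacted_blocks": redacted,
        # Thinking is already counted inside output_tokens.
        "output_tokens": response.get("usage", {}).get("output_tokens", 0),
    }


raw = """
{"content": [
  {"type": "thinking", "thinking": "Let me break down...", "signature": "..."},
  {"type": "redacted_thinking", "data": "..."},
  {"type": "text", "text": "Here's my recommended algorithm..."}
 ],
 "usage": {"input_tokens": 156, "output_tokens": 4523}}
"""
parts = split_content(json.loads(raw))
print(parts["text"], parts["redacted_blocks"], parts["output_tokens"])
```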
Processing Thinking and Text
def process_response(response):
    thinking_parts = []
    text_parts = []
    for block in response.content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # redacted_thinking blocks contain no readable text and are skipped
    return {
        "thinking": "\n".join(thinking_parts),
        "response": "".join(text_parts),
        # usage has no thinking_tokens field; thinking is counted in output_tokens
        "output_tokens": response.usage.output_tokens,
    }
Streaming Extended Thinking
with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "..."}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                print("🤔 Thinking: ", end="")
            elif event.content_block.type == "text":
                print("\n📝 Response: ", end="")
        elif event.type == "content_block_delta":
            # signature_delta events carry no text and are ignored here
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
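The same delta handling also works outside the SDK, for example when replaying logged server-sent events in a test. Below is a minimal sketch. It assumes each event has already been parsed into a dict shaped like the API's streaming events, where `content_block_delta` events carry a `thinking_delta`, `text_delta`, or `signature_delta` payload. `collect_stream` and the sample events are hypothetical.

```python
def collect_stream(events):
    """Rebuild the thinking and answer text from parsed streaming events."""
    thinking, text = [], []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # block starts/stops and message-level events add no text
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            thinking.append(delta["thinking"])
        elif delta["type"] == "text_delta":
            text.append(delta["text"])
        # signature_delta carries the thinking block's signature, not text
    return "".join(thinking), "".join(text)


sample_events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Check the "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "constraints."}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "Done."}},
    {"type": "content_block_stop", "index": 1},
]
thinking, answer = collect_stream(sample_events)
```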
Budget Management
Setting Token Budgets
thinking={
    "type": "enabled",
    "budget_tokens": 5000,  # minimum 1024; must also be less than max_tokens
}
Budget guidelines:
| Task Complexity | Recommended Budget (tokens) |
|---|---|
| Single-step reasoning | 1,024 - 2,000 |
| Multi-step problems | 3,000 - 5,000 |
| Complex analysis | 5,000 - 10,000 |
| Deep research/design | 10,000 - 20,000 |
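`budget_tokens` must stay below `max_tokens`, so a larger budget also means raising `max_tokens` to leave room for the answer. Below is a minimal sketch that turns the table above into request settings. The tier names, the use of each range's upper end, and the default 4,000-token allowance for the final answer are all assumptions. The sketch also does not check each model's maximum output limit.

```python
# Upper end of each range in the budget table (tier names are hypothetical).
BUDGETS = {
    "single_step": 2_000,
    "multi_step": 5_000,
    "complex_analysis": 10_000,
    "deep_research": 20_000,
}
MIN_BUDGET = 1_024  # API minimum for budget_tokens


def thinking_settings(tier, answer_tokens=4_000):
    """Return max_tokens and thinking kwargs for a complexity tier."""
    if answer_tokens < 1:
        raise ValueError("answer_tokens must be positive")
    budget = max(BUDGETS[tier], MIN_BUDGET)
    return {
        # Reserve room for the final answer so budget_tokens < max_tokens.
        "max_tokens": budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }
```

For example, passing `**thinking_settings("complex_analysis")` to `client.messages.create` requests a 10,000-token thinking budget and sets `max_tokens` to 14,000.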
Cost Considerations
Thinking tokens are billed at the model's output-token rate:
| Model | Thinking cost (USD per 1M tokens) |
|---|---|
| Claude Opus 4.5 | $25.00 |
| Claude Sonnet 4.5 | $15.00 |
| Claude Haiku 4.5 | $5.00 |
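Thinking tokens are already counted in `output_tokens`, so a per-request cost estimate needs only the two usage fields. Below is a minimal sketch. The rates are parameters because prices change. The example assumes Claude Sonnet 4.5 list prices of $3 per million input tokens and $15 per million output tokens. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Approximate USD cost of one request.

    Rates are USD per million tokens. output_tokens already includes thinking
    tokens, so thinking needs no separate term.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Usage figures from the example response earlier in this guide.
cost = estimate_cost(156, 4523, input_rate=3.00, output_rate=15.00)
print(f"${cost:.4f}")  # prints $0.0683
```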
Optimization Strategies
Prompting for Better Thinking
prompt = """
Analyze this problem systematically.
Problem:
{problem_description}
In your thinking:
1. First identify all relevant constraints
2. Consider multiple approaches before committing
3. Evaluate trade-offs explicitly
4. Verify your reasoning before concluding
Then provide your final recommendation.
"""
Iterative Deepening
For very complex problems, use multiple passes:
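# First pass: high-level analysis
problem = "..."  # placeholder: your problem description
initial = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=8000,  # must exceed budget_tokens
    thinking={"type": "enabled", "budget_tokens": 3000},
    messages=[{"role": "user", "content": f"High-level analysis of: {problem}"}],
)
# Second pass: deep dive, with the first answer (thinking blocks included) as context
detailed = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": f"High-level analysis of: {problem}"},
        {"role": "assistant", "content": initial.content},
        {"role": "user", "content": "Dive deeper into critical issues."},
    ],
)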
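The two-pass pattern extends to any number of passes. Below is a minimal sketch. It assumes you wrap the API call in a function you supply (`send`, hypothetical) that takes the message history and a thinking budget and returns the assistant's content. Keeping the call injectable lets you test the history-building logic without network access. `deepen` is also a hypothetical name.

```python
def deepen(problem, follow_ups, budgets, send):
    """Run one pass per budget, feeding each answer back as context.

    follow_ups: user prompts for passes 2..n (one fewer than budgets).
    send(messages, budget_tokens) -> assistant content for that pass.
    """
    if len(budgets) != len(follow_ups) + 1:
        raise ValueError("need exactly one budget per pass")
    messages = [{"role": "user", "content": f"High-level analysis of: {problem}"}]
    for pass_no, budget in enumerate(budgets):
        messages.append({"role": "assistant", "content": send(messages, budget)})
        if pass_no < len(follow_ups):
            messages.append({"role": "user", "content": follow_ups[pass_no]})
    return messages


# Possible wiring to the SDK (assumes the `client` created earlier):
# def send(messages, budget):
#     reply = client.messages.create(
#         model="claude-sonnet-4-5-20250929",
#         max_tokens=budget + 4000,
#         thinking={"type": "enabled", "budget_tokens": budget},
#         messages=messages,
#     )
#     return reply.content
```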
Best Practices
Do:
- Use for genuinely complex problems
- Set a budget that matches the task's complexity
- Prompt for structured thinking
- Monitor quality and token usage
Don't:
- Enable for simple queries
- Set extremely high budgets by default
- Ignore thinking content
- Enable it carelessly in latency-sensitive applications
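To support the "monitor quality and token usage" practice, below is a minimal sketch of a running tally. `UsageTracker` is a hypothetical helper. It accepts any object with `input_tokens` and `output_tokens` attributes, such as `response.usage` from the Python SDK. Thinking counts toward `output_tokens`, so the average gives a rough view of how much thinking plus answer each request consumes.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running token totals across requests."""

    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0  # includes thinking tokens

    def record(self, usage):
        """Add one response's usage (e.g. response.usage from the SDK)."""
        self.requests += 1
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    @property
    def average_output(self):
        """Mean output tokens per request, or 0.0 before any requests."""
        return self.output_tokens / self.requests if self.requests else 0.0


tracker = UsageTracker()
# After each API call: tracker.record(response.usage)
```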
Frequently Asked Questions
Does extended thinking guarantee better answers?
It improves performance on complex reasoning tasks but isn't always necessary. Match the tool to the task.
Can I see Claude's thinking in the web interface?
Yes, thinking is displayed in a collapsible section when enabled in Claude Desktop or claude.ai.
How does this relate to chain-of-thought prompting?
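Chain-of-thought prompting asks Claude to write its reasoning into the normal response. Extended thinking is a built-in mode: Claude produces its reasoning in separate thinking blocks, under a token budget you set, before writing the final answer.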
Can I disable thinking for specific turns?
Yes. The thinking parameter is set per request: include it for complex turns and omit it for simple follow-ups.
Extended thinking: When the problem deserves more thought, Claude thinks more deeply.