What is Extended Thinking?

Extended thinking is Claude's capability to engage in explicit, step-by-step reasoning before producing a final response. When enabled, Claude "thinks out loud" in a structured way, breaking down complex problems, exploring approaches, and working through solutions systematically.

This improves performance on tasks requiring:

Multi-step reasoning

Mathematical problem-solving

Code architecture decisions

Complex analysis

Strategic planning

Logical deduction

How Extended Thinking Differs

Standard response:

Claude generates output directly

Reasoning is implicit

Good for straightforward queries

Extended thinking:

Claude explicitly works through the problem

Thinking is visible and structured

Better for complex or multi-step problems

When to Use Extended Thinking

Ideal Use Cases

Mathematical and logical problems:

Solve this optimization problem step by step:
A factory produces two products. Product A requires 2 hours of machine time
and 3 hours of labor, generating $50 profit. Product B requires 3 hours of
machine time and 2 hours of labor, generating $40 profit. The factory has
120 machine hours and 100 labor hours available weekly. Maximize profit.
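
When evaluating Claude's answer to a prompt like this one, an independent reference result is useful. The sketch below is not part of the Anthropic API. It solves the same two-variable linear program by checking every corner point of the feasible region, since a linear objective reaches its maximum at a vertex. The names `CONSTRAINTS` and `solve_lp` are illustrative assumptions.

```python
from itertools import combinations

# Each constraint has the form a*x + b*y <= c,
# where x = units of Product A and y = units of Product B.
CONSTRAINTS = [
    (2, 3, 120),   # machine hours
    (3, 2, 100),   # labor hours
    (-1, 0, 0),    # x >= 0
    (0, -1, 0),    # y >= 0
]

def solve_lp(constraints, profit=(50, 40), eps=1e-9):
    """Return (best_profit, x, y) by evaluating every feasible corner point."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundary lines have no single intersection
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(a * x + b * y <= c + eps for a, b, c in constraints):
            value = profit[0] * x + profit[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

best_profit, units_a, units_b = solve_lp(CONSTRAINTS)
print(best_profit, units_a, units_b)
```

For the numbers in the prompt, this works out to 12 units of Product A and 32 units of Product B, for a weekly profit of $1,880.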

Complex code architecture:

Design the database schema and API architecture for a real-time
collaborative document editor with:
- Multiple users editing simultaneously
- Version history with diff tracking
- Offline support with conflict resolution
- Permission levels (view, comment, edit, admin)

Strategic analysis:

Our SaaS startup has $500K runway. Revenue is $30K MRR with 5% growth.
Analyze three strategic options:
1. Focus on growth (increase CAC, hire salespeople)
2. Focus on product (hire engineers, build features)
3. Focus on efficiency (cut costs, extend runway)

Provide financial projections and risk assessment.

When NOT to Use

Avoid extended thinking for:

Simple factual questions

Straightforward content generation

Tasks where speed matters more than depth

Queries with obvious answers

Extended thinking uses significantly more tokens and adds latency, so use it intentionally.

API Implementation

Enabling Extended Thinking

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Max tokens for thinking; must be less than max_tokens
    },
    messages=[
        {
            "role": "user",
            "content": "Design an algorithm to detect fraud in credit card transactions..."
        }
    ]
)

Understanding the Response

Extended thinking responses contain both thinking blocks and text blocks. Each thinking block also carries a signature field, and thinking tokens are counted as part of output_tokens rather than reported in a separate usage field:

{
    "content": [
        {
            "type": "thinking",
            "thinking": "Let me break down this fraud detection problem...",
            "signature": "..."
        },
        {
            "type": "text",
            "text": "Here's my recommended fraud detection algorithm..."
        }
    ],
    "usage": {
        "input_tokens": 156,
        "output_tokens": 4523
    }
}

Processing Thinking and Text

def process_response(response):
    thinking_parts = []
    text_parts = []

    # A response can contain several blocks of each type, so collect them all
    for block in response.content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)

    return {
        "thinking": "\n".join(thinking_parts),
        "response": "".join(text_parts),
        # usage has no separate thinking count; output_tokens includes thinking
        "output_tokens": response.usage.output_tokens
    }

Streaming Extended Thinking

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "..."}]
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                print("🤔 Thinking: ", end="")
            else:
                print("\n📝 Response: ", end="")

        elif event.type == "content_block_delta":
            # Dispatch on the delta type rather than probing attributes
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
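
If the streamed output needs to be stored rather than printed, the same loop can collect the deltas into two strings. The following is a minimal sketch: `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mimic only the fields used here, so the logic can be tried without an API call.

```python
from types import SimpleNamespace

def collect_stream(events):
    """Split streamed deltas into (thinking, text) strings."""
    thinking_parts, text_parts = [], []
    for event in events:
        if event.type != "content_block_delta":
            continue  # block start/stop and message events carry no content
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
        # Other delta types (for example signature_delta) are skipped.
    return "".join(thinking_parts), "".join(text_parts)

def _delta(kind, **fields):
    """Build a stand-in content_block_delta event for a dry run."""
    return SimpleNamespace(type="content_block_delta",
                           delta=SimpleNamespace(type=kind, **fields))

fake_events = [
    SimpleNamespace(type="content_block_start"),
    _delta("thinking_delta", thinking="Check the "),
    _delta("thinking_delta", thinking="constraints."),
    _delta("signature_delta", signature="..."),
    _delta("text_delta", text="Here is "),
    _delta("text_delta", text="the answer."),
]

thinking, text = collect_stream(fake_events)
print(thinking)
print(text)
```

With a real request, the stream object from `client.messages.stream(...)` would be passed in place of `fake_events`.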

Budget Management

Setting Token Budgets

thinking={
    "type": "enabled",
    "budget_tokens": 5000   # Minimum: 1,024; must be less than max_tokens
}

Budget guidelines:

Task Complexity	Recommended Budget (tokens)
Single-step reasoning	1,024 - 2,000
Multi-step problems	3,000 - 5,000
Complex analysis	5,000 - 10,000
Deep research/design	10,000 - 20,000
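
One way to apply these guidelines in code is a small lookup that returns a ready-made thinking configuration. This is a sketch under stated assumptions: the complexity labels and the `thinking_config` helper are hypothetical, the budgets are the upper ends of the ranges in the guidelines, and the checks assume the API's minimum of 1,024 budget tokens and its requirement that budget_tokens stay below max_tokens.

```python
MIN_BUDGET = 1024  # assumed API minimum for budget_tokens

# Upper end of each guideline range; the keys are hypothetical labels.
BUDGET_BY_COMPLEXITY = {
    "single_step": 2_000,
    "multi_step": 5_000,
    "complex_analysis": 10_000,
    "deep_design": 20_000,
}

def thinking_config(complexity, max_tokens):
    """Return a thinking dict for the given complexity label."""
    budget = max(BUDGET_BY_COMPLEXITY[complexity], MIN_BUDGET)
    if budget >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget}) must be less than max_tokens ({max_tokens})"
        )
    return {"type": "enabled", "budget_tokens": budget}

config = thinking_config("multi_step", max_tokens=16000)
print(config)
```

The returned dict can be passed as the thinking argument of client.messages.create. Raising an error early, instead of silently lowering the budget, makes a misconfigured max_tokens easier to spot.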

Cost Considerations

Thinking tokens are billed at output token rates (list prices at the time of writing):

Model	Thinking Cost (per 1M tokens)
Claude Opus 4.5	$25.00
Claude Sonnet 4.5	$15.00
Claude Haiku 4.5	$5.00
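
Because thinking is billed as output, a rough cost estimate only needs the output token count and the per-million rate. The sketch below uses a hypothetical helper, `estimate_output_cost`, with the 4,523 output tokens from the sample response earlier and the Claude Sonnet 4.5 rate of $15.00 per million tokens. It ignores input tokens and any caching discounts.

```python
def estimate_output_cost(output_tokens, usd_per_million):
    """Estimate the output-side cost in USD; thinking tokens are included in output."""
    return output_tokens / 1_000_000 * usd_per_million

cost = estimate_output_cost(4523, 15.00)
print(f"${cost:.4f}")  # prints $0.0678
```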

Optimization Strategies

Prompting for Better Thinking

prompt = """
Analyze this problem systematically.

Problem:
{problem_description}

In your thinking:
1. First identify all relevant constraints
2. Consider multiple approaches before committing
3. Evaluate trade-offs explicitly
4. Verify your reasoning before concluding

Then provide your final recommendation.
"""

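The template above is a plain string, not an f-string, so the {problem_description} placeholder has to be filled in before sending. A minimal sketch, assuming a hypothetical `build_prompt` helper and an invented example problem:

```python
PROMPT_TEMPLATE = """
Analyze this problem systematically.

Problem:
{problem_description}

In your thinking:
1. First identify all relevant constraints
2. Consider multiple approaches before committing
3. Evaluate trade-offs explicitly
4. Verify your reasoning before concluding

Then provide your final recommendation.
"""

def build_prompt(problem_description):
    """Fill the template; braces inside the description are left untouched."""
    return PROMPT_TEMPLATE.format(
        problem_description=problem_description.strip()
    ).strip()

# Hypothetical problem statement, used only to show the substitution.
filled = build_prompt("Our checkout API times out under load. Find the bottleneck.")
print(filled)
```

The filled string can then be used as the content of the user message in a request with thinking enabled.
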
Iterative Deepening

For very complex problems, use multiple passes:

# First pass: high-level analysis
initial = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=8000,   # must exceed budget_tokens
    thinking={"type": "enabled", "budget_tokens": 3000},
    messages=[{"role": "user", "content": f"High-level analysis of: {problem}"}]
)

# Second pass: deep dive, replaying the first exchange as conversation history
detailed = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,  # must exceed budget_tokens
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": f"High-level analysis of: {problem}"},
        {"role": "assistant", "content": initial.content},
        {"role": "user", "content": "Dive deeper into critical issues."}
    ]
)

Best Practices

Do:

Use for genuinely complex problems
Set appropriate budget based on complexity
Prompt for structured thinking
Monitor quality and token usage

Don't:

Enable for simple queries
Set extremely high budgets by default
Ignore thinking content
Use carelessly in latency-sensitive applications

Frequently Asked Questions

Does extended thinking guarantee better answers?

It improves performance on complex reasoning tasks but isn't always necessary. Match the tool to the task.

Can I see Claude's thinking in the web interface?

Yes, thinking is displayed in a collapsible section when enabled in Claude Desktop or claude.ai.

How does this relate to chain-of-thought prompting?

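Chain-of-thought prompting asks Claude to write out its reasoning as part of the normal output. Extended thinking is an API-level feature: Claude reasons in dedicated thinking blocks, with their own token budget, before it writes the final answer.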

Can I disable thinking for specific turns?

Yes, the thinking parameter is per-request. Enable for complex turns, disable for simple follow-ups.
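
Since the setting is per-request, an application can decide turn by turn whether to include it. The sketch below assumes a hypothetical `request_kwargs` helper that builds the arguments for client.messages.create and simply omits the thinking key for simple turns. The default budget and max_tokens values are illustrative.

```python
def request_kwargs(prompt, use_thinking, budget_tokens=5000,
                   max_tokens=16000, model="claude-sonnet-4-5-20250929"):
    """Build keyword arguments for one request, with thinking on or off."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if use_thinking:
        # Only complex turns pay for a thinking budget.
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return kwargs

complex_turn = request_kwargs("Design a sharding strategy for this schema.", True)
simple_turn = request_kwargs("Thanks, can you shorten that summary?", False)

# With a configured client, each dict would be sent as:
# response = client.messages.create(**complex_turn)
```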

Extended thinking: When the problem deserves more thought, Claude thinks more deeply.
