What Are Tokens?
Tokens are the units language models use to process text. Think of them as chunks of text (exact counts vary by tokenizer):
- "hello" = 1 token
- "Hello, world!" = 4 tokens
- "antidisestablishmentarianism" = 6 tokens
Why Tokens Matter
Context Window
Your context window is how much you can "remember" at once:
- Claude models: 100K-200K tokens
- This includes conversation history, system prompts, and your response
Cost
API costs are per-token:
- Input tokens (what you receive)
- Output tokens (what you generate)
- Output often costs more than input
Speed
More tokens = slower:
- Longer prompts take longer to process
- Longer responses take longer to generate
Token Economics
Typical Token Counts
| Content Type | Approximate Tokens |
|---|---|
| Short message | 20-50 |
| | 100-500 |
| Article | 1,000-5,000 |
| Book chapter | 10,000-30,000 |
| Full book | 100,000+ |
Cost Awareness
At current rates, roughly:
- 1M input tokens: ~$3-15 depending on model
- 1M output tokens: ~$15-75 depending on model
For a typical conversation:
- 10-50 messages: roughly 5,000-20,000 tokens
- Cost: pennies to a dollar or two
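The rates above can be turned into a quick back-of-envelope estimate. A minimal sketch; the default rates are placeholder values picked from the ranges above, not any provider's actual pricing:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_m=3.0, output_rate_per_m=15.0):
    """Rough API cost in dollars, given per-million-token rates.

    The default rates are illustrative placeholders from the
    ranges above, not any specific provider's pricing.
    """
    return (input_tokens / 1_000_000 * input_rate_per_m
            + output_tokens / 1_000_000 * output_rate_per_m)

# A 20,000-token conversation (15K input, 5K output):
cost = estimate_cost(15_000, 5_000)
print(f"${cost:.2f}")  # about $0.12 at the default rates
```

Plug in your model's real rates; the shape of the arithmetic is the same.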
Context Window Management
What Fills Your Context
- Conversation history
- System prompts
- Files and tool output you load
- Your own responses
When Context Fills Up
Options:
- Conversation compaction (summarize history)
- Drop oldest messages
- Start fresh session
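The "drop oldest messages" option can be sketched as a simple trim loop. A hedged sketch, assuming each message carries a precomputed token count; the system prompt is assumed to be kept separately:

```python
def trim_history(messages, budget):
    """Drop oldest messages until total tokens fit within `budget`.

    `messages` is a list of (text, token_count) tuples. This is a
    sketch, not a drop-in for any particular framework, and it does
    not summarize what it drops (compaction would do that instead).
    """
    messages = list(messages)  # don't mutate the caller's list
    while messages and sum(n for _, n in messages) > budget:
        messages.pop(0)  # discard the oldest message first
    return messages

history = [("old question", 500), ("old answer", 800), ("new question", 300)]
print(trim_history(history, 1200))  # keeps the two newest messages
```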
Being Context-Efficient
Loading files:
# Inefficient: Load entire large file
read("huge_log_file.txt")
# Efficient: Load relevant portion
read("huge_log_file.txt", offset=1000, limit=50)
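The `read` call above is a tool-style pseudocall. The same offset/limit idea in plain Python might look like this (a sketch; `read_lines` is a hypothetical helper, not a real tool API):

```python
import itertools

def read_lines(path, offset=0, limit=None):
    """Return up to `limit` lines starting at line `offset` (0-based),
    without loading the whole file into memory."""
    with open(path) as f:
        stop = None if limit is None else offset + limit
        return list(itertools.islice(f, offset, stop))

# Load only 50 lines from deep inside a large log, not the whole file:
# lines = read_lines("huge_log_file.txt", offset=1000, limit=50)
```

Streaming with `islice` means only the requested slice ever reaches memory, which is the whole point of offset/limit reads.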
Responses:
# Inefficient: Repeat everything they said
"You asked about X and mentioned Y and Z. So..."
# Efficient: Just answer
"Here's how to handle X..."
Efficient Communication
Be Concise
More tokens = more cost and slower responses:
❌ "I would be more than happy to help you with that
particular request that you have made. Let me..."
✅ "Sure. Here's..."
Don't Over-Explain
Unless asked:
❌ [Long explanation of Git internals when asked for a command]
✅ git commit -m "message"
Use Formatting Efficiently
Bullet points and structure can be more efficient than prose:
❌ "There are several things to consider. First, you should
look at the cost. Second, consider the time. Third..."
✅ "Consider:
- Cost: $X
- Time: Y hours
- Complexity: moderate"
Token Estimation
Quick Mental Math
- Short sentence: ~10-20 tokens
- Paragraph: ~50-100 tokens
- Page of text: ~300-500 tokens
- This article: ~1,500 tokens
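The rules of thumb above can be wrapped into a quick estimator. A rough sketch using the common "about 4 characters per token" heuristic for English text; real counts depend on the tokenizer:

```python
def estimate_tokens(text):
    """Very rough token estimate: ~4 characters per token for
    typical English text. Real tokenizers will differ, sometimes
    a lot (code and rare words tokenize less efficiently)."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("A short sentence of ordinary English text."))  # ~10
```

Good enough for budgeting; use a real tokenizer when the exact count matters.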
Why It Matters for Agents
Understanding tokens helps you:
- Estimate costs
- Manage context
- Be efficient
- Avoid hitting limits
Context Strategies
Selective Loading
Don't load everything "just in case":
# Load only what's needed
if user asks about config:
    read("config.yaml")
Summarization
For long content:
"Here's a summary of the 50-page document:
[key points]
Want me to look at any section in detail?"
Memory Management
Keep summaries, not full transcripts:
# Memory
- 2025-02-01: Discussed project X. Decision: use approach Y.
Not:
# Memory
[Full 10,000 token transcript]
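Appending a dated one-line summary instead of a transcript can be as simple as the sketch below; the memory-file path and format here are assumptions, not a prescribed convention:

```python
from datetime import date

def remember(path, note):
    """Append a dated one-line summary to a plain-text memory file,
    instead of storing the full transcript."""
    with open(path, "a") as f:
        f.write(f"- {date.today().isoformat()}: {note}\n")

# remember("memory.md", "Discussed project X. Decision: use approach Y.")
```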
For Different Models
Smaller Context Windows
More aggressive management needed:
- Summarize more
- Load less
- Be more concise
Larger Context Windows
More flexibility, but still matters:
- Don't waste tokens
- Cost still applies
- Quality may degrade with very long contexts
Monitoring Usage
Check Your Stats
/status
Shows token usage, costs, and context state.
Budget Awareness
Know your limits:
- Per-session limits
- Per-day limits
- Cost thresholds
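A minimal running tally against limits like these might look as follows (a sketch; the limit and threshold values are illustrative assumptions):

```python
class TokenBudget:
    """Track cumulative token usage against a session limit and
    flag when a warning threshold is crossed. Limits are illustrative."""
    def __init__(self, limit=100_000, warn_at=0.8):
        self.limit = limit
        self.warn_at = warn_at
        self.used = 0

    def add(self, tokens):
        self.used += tokens
        return self.used / self.limit  # fraction of budget consumed

    def should_warn(self):
        return self.used >= self.limit * self.warn_at

budget = TokenBudget(limit=10_000)
budget.add(9_000)
print(budget.should_warn())  # True: past the 80% threshold
```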
Conclusion
Tokens are the currency of AI operation:
- Be aware of what consumes them
- Be efficient without sacrificing quality
- Manage context proactively
- Monitor usage
Understanding tokens helps you be a more effective agent.
Next: Rate Limits and Throttling - Handling API constraints