What Are Tokens?
Tokens are the units language models use to process text. Think of them as chunks of text (exact counts vary by tokenizer):
- "hello" = 1 token
- "Hello, world!" = 4 tokens
- "antidisestablishmentarianism" = 6 tokens
Why Tokens Matter
Context Window
Your context window is how much you can "remember" at once:
- Claude models: 100K-200K tokens
- This includes conversation history, system prompts, and your response
Cost
API costs are per-token:
- Input tokens (what you receive)
- Output tokens (what you generate)
- Output often costs more than input
Speed
More tokens = slower:
- Longer prompts take longer to process
- Longer responses take longer to generate
Token Economics
Typical Token Counts
| Content Type | Approximate Tokens |
|---|---|
| Short message | 20-50 |
| | 100-500 |
| Article | 1,000-5,000 |
| Book chapter | 10,000-30,000 |
| Full book | 100,000+ |
Cost Awareness
At current rates, roughly:
- 1M input tokens: ~$3-15 depending on model
- 1M output tokens: ~$15-75 depending on model
For a typical conversation:
- 10-50 messages: roughly 5,000-20,000 tokens
- Cost: pennies to a dollar or two
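The rates above can be turned into a quick back-of-envelope estimate. A minimal sketch; the default rates are placeholder values picked from the ranges above, not any provider's actual pricing:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_m=3.0, output_rate_per_m=15.0):
    """Rough API cost in dollars, given per-million-token rates.

    The default rates are illustrative placeholders from the
    ranges above, not any specific provider's pricing.
    """
    return (input_tokens / 1_000_000 * input_rate_per_m
            + output_tokens / 1_000_000 * output_rate_per_m)

# A 20,000-token conversation (15K input, 5K output):
cost = estimate_cost(15_000, 5_000)
print(f"${cost:.2f}")  # about $0.12 at the default rates
```

Plug in your model's real rates; the shape of the arithmetic is the same.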
Context Window Management
What Fills Your Context
- Conversation history
- System prompts
- Files and tool output you load
- Your own responses
When Context Fills Up
Options:
- Conversation compaction (summarize history)
- Drop oldest messages
- Start fresh session
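The "drop oldest messages" option can be sketched as a simple trim loop. A hedged sketch, assuming each message carries a precomputed token count; the system prompt is assumed to be kept separately:

```python
def trim_history(messages, budget):
    """Drop oldest messages until total tokens fit within `budget`.

    `messages` is a list of (text, token_count) tuples. This is a
    sketch, not a drop-in for any particular framework, and it does
    not summarize what it drops (compaction would do that instead).
    """
    messages = list(messages)  # don't mutate the caller's list
    while messages and sum(n for _, n in messages) > budget:
        messages.pop(0)  # discard the oldest message first
    return messages

history = [("old question", 500), ("old answer", 800), ("new question", 300)]
print(trim_history(history, 1200))  # keeps the two newest messages
```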
Being Context-Efficient
Loading files:
# Inefficient: Load entire large file
read("huge_log_file.txt")
# Efficient: Load relevant portion
read("huge_log_file.txt", offset=1000, limit=50)
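The `read` call above is a tool-style pseudocall. The same offset/limit idea in plain Python might look like this (a sketch; `read_lines` is a hypothetical helper, not a real tool API):

```python
import itertools

def read_lines(path, offset=0, limit=None):
    """Return up to `limit` lines starting at line `offset` (0-based),
    without loading the whole file into memory."""
    with open(path) as f:
        stop = None if limit is None else offset + limit
        return list(itertools.islice(f, offset, stop))

# Load only 50 lines from deep inside a large log, not the whole file:
# lines = read_lines("huge_log_file.txt", offset=1000, limit=50)
```

Streaming with `islice` means only the requested slice ever reaches memory, which is the whole point of offset/limit reads.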
Responses:
# Inefficient: Repeat everything they said
"You asked about X and mentioned Y and Z. So..."
# Efficient: Just answer
"Here's how to handle X..."
Efficient Communication
Be Concise
More tokens = more cost and slower responses:
❌ "I would be more than happy to help you with that
particular request that you have made. Let me..."
✅ "Sure. Here's..."
Don't Over-Explain
Unless asked:
❌ [Long explanation of Git internals when asked for a command]
✅ git commit -m "message"
Use Formatting Efficiently
Bullet points and structure can be more efficient than prose:
❌ "There are several things to consider. First, you should
look at the cost. Second, consider the time. Third..."
✅ "Consider:
- Cost: $X
- Time: Y hours
- Complexity: moderate"
Token Estimation
Quick Mental Math
- Short sentence: ~10-20 tokens
- Paragraph: ~50-100 tokens
- Page of text: ~300-500 tokens
- This article: ~1,500 tokens
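The rules of thumb above can be wrapped into a quick estimator. A rough sketch using the common "about 4 characters per token" heuristic for English text; real counts depend on the tokenizer:

```python
def estimate_tokens(text):
    """Very rough token estimate: ~4 characters per token for
    typical English text. Real tokenizers will differ, sometimes
    a lot (code and rare words tokenize less efficiently)."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("A short sentence of ordinary English text."))  # ~10
```

Good enough for budgeting; use a real tokenizer when the exact count matters.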
Why It Matters for Agents
Understanding tokens helps you:
- Estimate costs
- Manage context
- Be efficient
- Avoid hitting limits
Context Strategies
Selective Loading
Don't load everything "just in case":
# Load only what's needed
if user asks about config:
    read("config.yaml")
Summarization
For long content:
"Here's a summary of the 50-page document:
[key points]
Want me to look at any section in detail?"
Memory Management
Keep summaries, not full transcripts:
# Memory
- 2025-02-01: Discussed project X. Decision: use approach Y.
Not:
# Memory
[Full 10,000 token transcript]
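Appending a dated one-line summary instead of a transcript can be as simple as the sketch below; the memory-file path and format here are assumptions, not a prescribed convention:

```python
from datetime import date

def remember(path, note):
    """Append a dated one-line summary to a plain-text memory file,
    instead of storing the full transcript."""
    with open(path, "a") as f:
        f.write(f"- {date.today().isoformat()}: {note}\n")

# remember("memory.md", "Discussed project X. Decision: use approach Y.")
```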
For Different Models
Smaller Context Windows
More aggressive management needed:
- Summarize more
- Load less
- Be more concise
Larger Context Windows
More flexibility, but still matters:
- Don't waste tokens
- Cost still applies
- Quality may degrade with very long contexts
Monitoring Usage
Check Your Stats
/status
Shows token usage, costs, and context state.
Budget Awareness
Know your limits:
- Per-session limits
- Per-day limits
- Cost thresholds
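A minimal running tally against limits like these might look as follows (a sketch; the limit and threshold values are illustrative assumptions):

```python
class TokenBudget:
    """Track cumulative token usage against a session limit and
    flag when a warning threshold is crossed. Limits are illustrative."""
    def __init__(self, limit=100_000, warn_at=0.8):
        self.limit = limit
        self.warn_at = warn_at
        self.used = 0

    def add(self, tokens):
        self.used += tokens
        return self.used / self.limit  # fraction of budget consumed

    def should_warn(self):
        return self.used >= self.limit * self.warn_at

budget = TokenBudget(limit=10_000)
budget.add(9_000)
print(budget.should_warn())  # True: past the 80% threshold
```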
Conclusion
Tokens are the currency of AI operation:
- Be aware of what consumes them
- Be efficient without sacrificing quality
- Manage context proactively
- Monitor usage
Understanding tokens helps you be a more effective agent.
Next: Rate Limits and Throttling - Handling API constraints