ralph-loops
> **First time?** Read [SETUP.md](./SETUP.md) first to install dependencies and verify your setup.
Installation
npx clawhub@latest install ralph-loops
View the full skill documentation and source below.
Documentation
Ralph Loops Skill
Autonomous AI agent loops for iterative development. Based on Geoffrey Huntley's Ralph Wiggum technique, as documented by Clayton Farr.
Script: skills/ralph-loops/scripts/ralph-loop.mjs
Dashboard: skills/ralph-loops/dashboard/ (run with node server.mjs)
Templates: skills/ralph-loops/templates/
Archive: ~/clawd/logs/ralph-archive/
⚠️ Don't Block the Conversation!
When running a Ralph loop, don't monitor it synchronously. The loop runs as a separate Claude CLI process — you can keep chatting.
❌ Wrong (blocks conversation):
Start loop → sleep 60 → poll → sleep 60 → poll → ... (6 minutes of silence)
✅ Right (stays responsive):
Start loop → "It's running, I'll check periodically" → keep chatting → check on heartbeats
How to monitor without blocking:
1. node ralph-loop.mjs ... (runs in background)
2. process poll when asked or during heartbeats
The loop is autonomous; that's the whole point. Don't babysit it at the cost of ignoring your human.
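One way to keep the loop out of the conversation's way is to detach it and check a PID file on demand. The sketch below is a minimal illustration, not the skill's own mechanism: `start_detached`, `check_detached`, and the log and PID file locations are hypothetical.

```shell
# Hypothetical helpers: start a long-running command without waiting on it,
# then check on it later without blocking.

# start_detached LOG PIDFILE CMD [ARGS...]
# The subshell body "( ... )" keeps variables local and returns immediately;
# the job keeps running after the function returns.
start_detached() (
  log=$1; pidfile=$2; shift 2
  nohup "$@" >"$log" 2>&1 &
  echo $! >"$pidfile"
)

# check_detached LOG PIDFILE
# Prints "running" or "finished", followed by the last few log lines.
check_detached() (
  log=$1; pidfile=$2
  if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    echo "running"
  else
    echo "finished"
  fi
  tail -n 5 "$log"
)
```

For example, `start_detached /tmp/ralph.log /tmp/ralph.pid node skills/ralph-loops/scripts/ralph-loop.mjs --prompt ./PROMPT_build.md --model opus --max 20 --done RALPH_DONE` returns at once, and `check_detached /tmp/ralph.log /tmp/ralph.pid` can be run whenever the human asks or on a heartbeat.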
Trigger Phrases
When human says:
| Phrase | Action |
|---|---|
| "Interview me about system X" | Start Phase 1 requirements interview |
| "Start planning system X" | Run ./loop.sh plan (needs specs first) |
| "Start building system X" | Run ./loop.sh build (needs plan first) |
| "Ralph loop over X" | ASK which phase (see below) |
When Human Says "Ralph Loop" — Clarify the Phase!
Don't assume which phase. Ask:
"Which type of Ralph loop are we doing?
1️⃣ Interview — I'll ask you questions to build specs (Phase 1)
2️⃣ Planning — I'll iterate on an implementation plan (Phase 2)
3️⃣ Building — I'll implement from a plan, one task per iteration (Phase 3)
4️⃣ Generic — Simple iterative refinement on a single topic"
Then proceed based on their answer:
| Choice | Action |
|---|---|
| Interview | Use templates/requirements-interview.md protocol |
| Planning | Need specs first → run planning loop with PROMPT_plan.md |
| Building | Need plan first → run build loop with PROMPT_build.md |
| Generic | Create prompt file, run ralph-loop.mjs directly |
Generic Ralph Loop Flow (Phase 4)
For simple iterative refinement (not full system builds):
Create the prompt file at /tmp/ralph-prompt-<task>.md, then run:
node skills/ralph-loops/scripts/ralph-loop.mjs \
  --prompt "/tmp/ralph-prompt-<task>.md" \
  --model opus \
  --max 10 \
  --done "RALPH_DONE"
Core Philosophy
"Human roles shift from 'telling the agent what to do' to 'engineering conditions where good outcomes emerge naturally through iteration.'"
— Clayton Farr
Three principles drive everything:
Three-Phase Workflow
┌─────────────────────────────────────────────────────────────────────┐
│ Phase 1: REQUIREMENTS │
│ Human + LLM conversation → JTBD → Topics → specs/*.md │
├─────────────────────────────────────────────────────────────────────┤
│ Phase 2: PLANNING │
│ Gap analysis (specs vs code) → IMPLEMENTATION_PLAN.md │
├─────────────────────────────────────────────────────────────────────┤
│ Phase 3: BUILDING │
│ One task per iteration → fresh context → backpressure → commit │
└─────────────────────────────────────────────────────────────────────┘
Phase 1: Requirements (Talk to Human)
Goal: Understand what to build BEFORE building it.
This is the most important phase. Use structured conversation to:
1. Identify the Jobs to Be Done (JTBD)
   - What user need or outcome are we solving?
   - Not features, but outcomes
2. Break the JTBD into topics of concern
   - Each topic = one distinct aspect/component
   - Use the "one sentence without 'and'" test
   - ✓ "The color extraction system analyzes images to identify dominant colors"
   - ✗ "The user system handles authentication, profiles, and billing" → 3 topics
3. Write a spec for each topic
   - One markdown file per topic in specs/
   - Capture requirements, acceptance criteria, edge cases
Template: templates/requirements-interview.md
Phase 2: Planning (Gap Analysis)
Goal: Create a prioritized task list without implementing anything.
Uses PROMPT_plan.md in the loop:
- Study all specs
- Study existing codebase
- Compare specs vs code (gap analysis)
- Generate IMPLEMENTATION_PLAN.md with prioritized tasks
- NO implementation, planning only
Usually completes in 1-2 iterations.
Phase 3: Building (One Task Per Iteration)
Goal: Implement tasks one at a time with fresh context.
Uses PROMPT_build.md in the loop:
- Take one task from IMPLEMENTATION_PLAN.md per iteration: fresh context → backpressure → commit
Key insight: One task per iteration keeps context lean. The agent stays in the "smart zone" instead of accumulating cruft.
Why fresh context matters:
- No accumulated mistakes — Each iteration starts clean; previous errors don't compound
- Full context budget — 200K tokens for THIS task, not shared with finished work
- Reduced hallucination — Shorter contexts = more grounded responses
- Natural checkpoints — Each commit is a save point; easy to revert single iterations
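The fresh-context idea can be sketched as a loop that starts a brand-new process for every iteration and stops on a completion marker or an iteration cap. This is an illustration only, not the contents of loop.sh or ralph-loop.mjs; `ralph_iterate` is a hypothetical name, and the marker mirrors the `--done "RALPH_DONE"` flag used with the Node.js wrapper.

```shell
# Hypothetical sketch of the core loop.
# ralph_iterate MAX DONE_MARKER CMD [ARGS...]
#   MAX = 0 means unlimited. CMD is re-run from scratch on every iteration,
#   so no conversational state carries over between iterations.
ralph_iterate() {
  max=$1; marker=$2; shift 2
  i=1
  while [ "$max" -eq 0 ] || [ "$i" -le "$max" ]; do
    echo "--- iteration $i ---"
    out=$("$@") || echo "iteration $i exited non-zero"
    printf '%s\n' "$out"
    case $out in
      *"$marker"*)
        echo "done marker seen after $i iteration(s)"
        return 0 ;;
    esac
    i=$((i + 1))
  done
  echo "stopped at max iterations ($max)"
  return 1
}
```

Assuming the Claude CLI reads the prompt from standard input in print mode, a call might look like `ralph_iterate 20 RALPH_DONE sh -c 'claude -p --dangerously-skip-permissions < PROMPT_build.md'`. Because each iteration is a separate process, the only things that persist are on disk: the code, the plan, and the git history.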
File Structure
project/
├── loop.sh # Ralph loop script
├── PROMPT_plan.md # Planning mode instructions
├── PROMPT_build.md # Building mode instructions
├── AGENTS.md # Operational guide (~60 lines max)
├── IMPLEMENTATION_PLAN.md # Prioritized task list (generated)
└── specs/ # Requirement specs
├── topic-a.md
├── topic-b.md
└── ...
File Purposes
| File | Purpose | Who Creates |
|---|---|---|
| `specs/*.md` | Source of truth for requirements | Human + Phase 1 |
| `PROMPT_plan.md` | Instructions for planning mode | Copy from template |
| `PROMPT_build.md` | Instructions for building mode | Copy from template |
| `AGENTS.md` | Build/test/lint commands | Human + Ralph |
| `IMPLEMENTATION_PLAN.md` | Task list with priorities | Ralph (Phase 2) |
Project Organization (Systems)
For Clawdbot systems, each Ralph project lives in /systems//:
systems/
├── health-tracker/ # Example system
│ ├── specs/
│ │ ├── daily-tracking.md
│ │ └── test-scheduling.md
│ ├── PROMPT_plan.md
│ ├── PROMPT_build.md
│ ├── AGENTS.md
│ ├── IMPLEMENTATION_PLAN.md # ← exists = past Phase 1
│ └── src/
└── activity-planner/
├── specs/ # ← empty = still in Phase 1
└── ...
Phase Detection (Auto)
Detect current phase by checking what files exist:
| What Exists | Current Phase | Next Action |
|---|---|---|
| Nothing / empty `specs/` | Phase 1: Requirements | Run requirements interview |
| `specs/*.md` but no `IMPLEMENTATION_PLAN.md` | Ready for Phase 2 | Run `./loop.sh plan` |
| `specs/*.md` + `IMPLEMENTATION_PLAN.md` | Phase 2 or 3 | Review plan, run `./loop.sh build` |
| Plan shows all tasks complete | Done | Archive or iterate |
# What phase are we in?
[ -z "$(ls specs/ 2>/dev/null)" ] && echo "Phase 1: Need specs" && exit
[ ! -f IMPLEMENTATION_PLAN.md ] && echo "Phase 2: Need plan" && exit
echo "Phase 3: Ready to build (or done)"
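The table's last row (all tasks complete) can be folded into the same check. The sketch below assumes the plan tracks tasks as Markdown checkboxes (`- [ ]` open, `- [x]` done); that format is an assumption, not something the templates are known to guarantee, and `detect_phase` is a hypothetical helper.

```shell
# Hypothetical helper: report the current Ralph phase for a project directory.
# Assumes IMPLEMENTATION_PLAN.md marks tasks with Markdown checkboxes.
detect_phase() (
  cd "${1:-.}" || exit 1
  if [ -z "$(ls specs/ 2>/dev/null)" ]; then
    echo "Phase 1: Need specs"
  elif [ ! -f IMPLEMENTATION_PLAN.md ]; then
    echo "Phase 2: Need plan"
  elif grep -q '^ *- \[ ]' IMPLEMENTATION_PLAN.md; then
    # At least one unchecked task remains.
    echo "Phase 3: Ready to build"
  elif grep -q '^ *- \[[xX]]' IMPLEMENTATION_PLAN.md; then
    echo "Done: every task is checked off"
  else
    # No checkboxes found, so completion cannot be inferred.
    echo "Phase 3: Ready to build (or done)"
  fi
)
```

Because the body runs in a subshell, `detect_phase systems/health-tracker` does not change the caller's working directory.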
JTBD Breakdown
The hierarchy matters:
JTBD (Job to Be Done)
└── Topic of Concern (1 per spec file)
└── Tasks (many per topic, in IMPLEMENTATION_PLAN.md)
Example:
- JTBD: "Help designers create mood boards"
- Topics:
  - Image collection → specs/image-collection.md
  - Color extraction → specs/color-extraction.md
  - Layout system → specs/layout-system.md
  - Sharing → specs/sharing.md
- Tasks: Each spec generates multiple implementation tasks
Topic Scope Test
Can you describe the topic in one sentence without "and"?
If you need "and" or "also", it's probably multiple topics. Split it.
When to split:
- Multiple verbs in the description → separate topics
- Different user personas involved → separate topics
- Could be implemented by different teams → separate topics
- Has its own failure modes → probably its own topic
Example split:
❌ "User management handles registration, authentication, profiles, and permissions"
✅ Split into:
- "Registration creates new user accounts from email/password"
- "Authentication verifies user identity via login flow"
- "Profiles let users view and edit their information"
- "Permissions control what actions users can perform"
Counter-example (don't split):
✅ Keep together:
"Color extraction analyzes images and returns dominant color palettes"
Why: "analyzes" and "returns" are steps in one operation, not separate concerns.
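The "and"/"also" test can be automated as a first-pass filter. The helper below is a hypothetical sketch: it only flags a sentence for human review, because, as the color-extraction counter-example shows, an "and" joining two steps of one operation is fine.

```shell
# Hypothetical helper: flag topic descriptions that may cover several topics.
# Matches "and" / "also" as whole words, case-insensitively.
topic_scope_check() {
  if printf '%s\n' "$1" | grep -qiwE 'and|also'; then
    echo "review: may be multiple topics"
  else
    echo "ok"
  fi
}
```

A flagged sentence is a prompt to apply the "When to split" criteria above, not an automatic verdict.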
Backpressure Mechanisms
Autonomous loops converge when wrong outputs get rejected. Three layers:
1. Downstream Gates (Hard)
Tests, type-checking, linting, build validation. Deterministic.
# In AGENTS.md
## Validation
- Tests: `npm test`
- Typecheck: `npm run typecheck`
- Lint: `npm run lint`
2. Upstream Steering (Soft)
Existing code patterns guide the agent. It discovers conventions through exploration.
3. LLM-as-Judge (Subjective)
For subjective criteria (tone, UX, aesthetics), use another LLM call with binary pass/fail.
Start with hard gates. Add LLM-as-judge for subjective criteria only after mechanical backpressure works.
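Hard gates are easiest to reason about when they run in a fixed order and the first failure rejects the iteration. The sketch below illustrates that idea with a hypothetical `run_gates` helper; it is not how the templates wire validation.

```shell
# Hypothetical helper: run each gate command in order, stop at the first failure.
# Each argument is one complete command line, executed with sh -c.
run_gates() {
  for gate in "$@"; do
    printf 'gate: %s\n' "$gate"
    if ! sh -c "$gate"; then
      printf 'REJECTED by: %s\n' "$gate"
      return 1
    fi
  done
  echo "all gates passed"
}
```

With the AGENTS.md commands shown earlier, a call would be `run_gates "npm test" "npm run typecheck" "npm run lint"`; a non-zero return means the iteration's output should not be committed.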
Prompt Structure
Geoffrey's prompts follow a numbered pattern:
| Section | Purpose |
|---|---|
| 0a-0d | Orient: Study specs, source, current plan |
| 1-4 | Main instructions: What to do this iteration |
| 999+ | Guardrails: Invariants (higher number = more critical) |
The Numbered Guardrails Pattern
Guardrails use escalating numbers (99999, 999999, 9999999...) to signal priority:
99999. Important: Capture the why in documentation.
999999. Important: Single sources of truth, no migrations.
9999999. Create git tags after successful builds.
99999999. Add logging if needed to debug.
999999999. Keep IMPLEMENTATION_PLAN.md current.
Why this works:
The "Important:" prefix is deliberate — it triggers Claude's attention.
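The escalating prefixes are tedious to type by hand. As a small convenience sketch (the `number_guardrails` helper is hypothetical), the function below reads one guardrail per line and prepends 99999, 999999, and so on, matching the pattern in the example above.

```shell
# Hypothetical helper: read guardrails on stdin, one per line, and
# prefix each with an ever-longer run of 9s (99999, 999999, ...).
number_guardrails() {
  prefix=99999
  while IFS= read -r rule; do
    [ -n "$rule" ] || continue      # skip blank lines
    printf '%s. %s\n' "$prefix" "$rule"
    prefix="${prefix}9"
  done
}
```

Reordering the input lines reorders the priorities, since later lines receive longer numbers.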
Key Language Patterns
Use Geoffrey's specific phrasing — it matters:
- "study" (not "read" or "look at")
- "don't assume not implemented" (critical!)
- "using parallel subagents" / "up to N subagents"
- "only 1 subagent for build/tests" (backpressure control)
- "Ultrathink" (deep reasoning trigger)
- "capture the why"
- "keep it up to date"
- "resolve them or document them"
Quick Start
1. Set Up Project Structure
mkdir -p myproject/specs
cd myproject
git init # Ralph expects git for commits
# Copy templates
cp .//templates/PROMPT_plan.md .
cp .//templates/PROMPT_build.md .
cp .//templates/AGENTS.md .
cp .//templates/loop.sh .
chmod +x loop.sh
2. Customize Templates (Required!)
PROMPT_plan.md — Replace [PROJECT_GOAL] with your actual goal:
# Before:
ULTIMATE GOAL: We want to achieve [PROJECT_GOAL].
# After:
ULTIMATE GOAL: We want to achieve a fully functional mood board app with image upload and color extraction.
PROMPT_build.md — Adjust source paths if not using src/:
# Before:
0c. For reference, the application source code is in `src/*`.
# After:
0c. For reference, the application source code is in `lib/*`.
AGENTS.md — Update build/test/lint commands for your stack.
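The placeholder substitution can be scripted so that it is not forgotten. This is a sketch under assumptions: `set_project_goal` and `list_placeholders` are hypothetical helpers, and only the `[PROJECT_GOAL]` placeholder is known from the template excerpt above.

```shell
# Hypothetical helper: replace [PROJECT_GOAL] in a prompt file with the given text.
# set_project_goal FILE "goal text"   (goal must be a single line)
set_project_goal() {
  # Escape characters that are special in a sed replacement (&, \, and the | delimiter).
  goal=$(printf '%s' "$2" | sed 's/[&|\\]/\\&/g')
  sed "s|\[PROJECT_GOAL]|$goal|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Hypothetical helper: list any remaining [UPPER_CASE] placeholders with line numbers.
# Exits non-zero when none are left.
list_placeholders() {
  grep -n '\[[A-Z_][A-Z_]*]' "$@"
}
```

Running `list_placeholders PROMPT_plan.md PROMPT_build.md` before the first loop is a cheap check that no template text was left unfilled.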
3. Phase 1: Requirements Gathering (Don't Skip!)
This phase happens WITH the human. Use the interview template:
cat .//templates/requirements-interview.md
The workflow:
- Write each topic to specs/topic-name.md
Example output:
specs/
├── image-collection.md
├── color-extraction.md
├── layout-system.md
└── sharing.md
4. Phase 2: Planning
./loop.sh plan
Wait for IMPLEMENTATION_PLAN.md to be generated (usually 1-2 iterations). Review it — this is your task list.
5. Phase 3: Building
./loop.sh build 20 # Max 20 iterations
Watch it work. Add backpressure (tests, lints) as patterns emerge. Check commits for progress.
Loop Script Options
./loop.sh # Build mode, unlimited
./loop.sh 20 # Build mode, max 20 iterations
./loop.sh plan # Plan mode, unlimited
./loop.sh plan 5 # Plan mode, max 5 iterations
Or use the Node.js wrapper for more control:
node skills/ralph-loops/scripts/ralph-loop.mjs \
--prompt "./PROMPT_build.md" \
--model opus \
--max 20 \
--done "RALPH_DONE"
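loop.sh ships with the templates and is not reproduced here. The sketch below shows one plausible way the positional arguments above could be interpreted; `parse_loop_args` is hypothetical, and treating 0 as "unlimited" is an assumption.

```shell
# Hypothetical sketch: interpret "[plan|build] [max_iterations]".
# Prints "<mode> <max>", where max = 0 stands for unlimited.
parse_loop_args() (
  mode=build
  max=0
  case "${1:-}" in
    plan|build) mode=$1; shift ;;
  esac
  case "${1:-}" in
    ''|*[!0-9]*) ;;     # missing or non-numeric: keep unlimited
    *) max=$1 ;;
  esac
  echo "$mode $max"
)
```

Under these assumptions `parse_loop_args plan 5` prints `plan 5`, and a bare `parse_loop_args` prints `build 0`.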
When to Regenerate the Plan
Plans drift. Regenerate when:
- Ralph is going off track (implementing wrong things)
- Plan feels stale or doesn't match current state
- Too much clutter from completed items
- You've made significant spec changes
- You're confused about what's actually done
./loop.sh plan
Regeneration cost is one Planning loop. Cheap compared to Ralph going in circles.
Safety
Ralph requires --dangerously-skip-permissions to run autonomously. This bypasses Claude's permission system entirely.
Philosophy: "It's not if it gets popped, it's when. And what is the blast radius?"
Protections:
- Run in isolated environments (Docker, VM)
- Only the API keys needed for the task
- No access to private data beyond requirements
- Restrict network connectivity where possible
- Escape hatches: Ctrl+C stops the loop; git reset --hard reverts uncommitted changes
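"Only the API keys needed for the task" can be approximated even without a container by starting the loop with a scrubbed environment. This is a minimal sketch, not a substitute for Docker or a VM; `run_with_minimal_env` is a hypothetical helper, and the set of variables passed through is an assumption to adjust per project.

```shell
# Hypothetical helper: run a command with an almost empty environment.
# Only PATH, HOME, and ANTHROPIC_API_KEY are passed through; every other
# exported variable (other tokens, cloud credentials) is dropped.
run_with_minimal_env() {
  env -i \
    PATH="$PATH" \
    HOME="$HOME" \
    ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-}" \
    "$@"
}
```

For example, `run_with_minimal_env ./loop.sh build 20` keeps unrelated secrets in the parent shell out of the agent's reach, which shrinks the blast radius if the loop misbehaves.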
Cost Expectations
| Task Type | Model | Iterations | Est. Cost |
|---|---|---|---|
| Generate plan | Opus | 1-2 | $0.50-1.00 |
| Implement simple feature | Opus | 3-5 | $1.00-2.00 |
| Implement complex feature | Opus | 10-20 | $3.00-8.00 |
| Full project buildout | Opus | 50+ | $15-50+ |
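The table can be turned into an iteration cap for a given budget. The complex-feature row implies roughly $0.30 to $0.40 per iteration ($3.00 / 10 and $8.00 / 20); treating that as a flat per-iteration cost is an assumption, and `max_iterations_for` is a hypothetical helper. Amounts are in cents so that plain integer arithmetic suffices.

```shell
# Hypothetical helper: how many iterations fit in a budget?
# max_iterations_for BUDGET_CENTS COST_PER_ITERATION_CENTS
max_iterations_for() {
  echo $(( $1 / $2 ))
}
```

With an $8.00 budget at $0.40 per iteration, `./loop.sh build "$(max_iterations_for 800 40)"` caps the run at 20 iterations.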
Real-World Results
From Geoffrey Huntley:
- 6 repos generated overnight at YC hackathon
- $50k contract completed for $297 in API costs
- Created entire programming language over 3 months
Advanced: Running as Sub-Agent
For long loops, spawn as sub-agent so main session stays responsive:
sessions_spawn({
task: `cd /path/to/project && ./loop.sh build 20
Summarize what was implemented when done.`,
label: "ralph-build",
model: "opus"
})
Check progress:
sessions_list({ kinds: ["spawn"] })
sessions_history({ label: "ralph-build", limit: 5 })
Troubleshooting
Ralph keeps implementing the same thing
- Plan is stale → regenerate with ./loop.sh plan
- Backpressure missing → add tests that catch duplicates
Ralph goes in circles
- Add more specific guardrails to prompts
- Check if specs are ambiguous
- Regenerate plan
Context getting bloated
- Ensure one task per iteration (check prompt)
- Keep AGENTS.md under 60 lines
- Move status/progress to IMPLEMENTATION_PLAN.md, not AGENTS.md
Tests not running
- Check AGENTS.md has correct validation commands
- Ensure backpressure section in prompt references AGENTS.md
Edge Cases
Projects Without Git
The loop script expects git for commits and pushes. For projects without version control:
Option 1: Initialize git anyway (recommended)
git init
git add -A
git commit -m "Initial commit before Ralph"
Option 2: Modify the prompts
- Remove git-related guardrails from PROMPT_build.md
- Remove the git push section from loop.sh
- Use file backups instead: add cp -r src/ backups/iteration-$ITERATION/ to loop.sh
Option 3: Use tarball snapshots
# Add to loop.sh before each iteration:
mkdir -p snapshots   # tar does not create the target directory
tar -czf "snapshots/pre-iteration-$ITERATION.tar.gz" src/
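Snapshots are only useful if rolling back is as easy as taking them. The sketch below pairs the tar command with a restore step; both function names are hypothetical, and the `snapshots/` and `src/` paths follow the example above.

```shell
# Hypothetical helpers for git-free checkpoints. Run from the project root.

# snapshot_iteration N: archive src/ before iteration N.
snapshot_iteration() {
  mkdir -p snapshots
  tar -czf "snapshots/pre-iteration-$1.tar.gz" src/
}

# restore_iteration N: throw away the current src/ and restore
# the state captured before iteration N.
restore_iteration() {
  if [ ! -f "snapshots/pre-iteration-$1.tar.gz" ]; then
    echo "no snapshot for iteration $1" >&2
    return 1
  fi
  rm -rf src/
  tar -xzf "snapshots/pre-iteration-$1.tar.gz"
}
```

Removing `src/` before extracting matters: it also discards files that the bad iteration created, which a plain extract would leave behind.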
Very Large Codebases
For codebases with 100K+ lines:
- Reduce subagent parallelism: Change "up to 500 parallel Sonnet subagents" to "up to 50" in prompts
- Scope narrowly: Use focused specs that target specific directories
- Add path restrictions: In AGENTS.md, note which directories are in-scope
- Consider workspace splitting: Treat large modules as separate Ralph projects
When Claude CLI Isn't Available
The methodology works with any Claude interface:
Claude API directly:
# Replace loop.sh with API calls using curl or a script
curl \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "content-type: application/json" \
-d '{"model": "claude-sonnet-4-20250514", "max_tokens": 8192, "messages": [...]}'
Alternative agents:
- Aider: aider --opus --auto-commits
- Continue.dev: Use with Claude API key
- Cursor: Composer mode with PROMPT files as context
The key principles (one task per iteration, fresh context, backpressure) apply regardless of tooling.
Non-Node.js Projects
Adapt AGENTS.md for your stack:
| Stack | Build | Test | Lint |
|---|---|---|---|
| Python | pip install -e . | pytest | ruff check . |
| Go | go build ./... | go test ./... | golangci-lint run |
| Rust | cargo build | cargo test | cargo clippy |
| Ruby | bundle install | rspec | rubocop |
Also adjust the source paths in the prompts (src/* → your source directory).
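As a starting point for AGENTS.md, the test command can be guessed from the manifest file a project contains. This is a hypothetical sketch: the manifest-to-stack mapping is a common convention rather than part of the skill, and `npm test` for Node.js comes from the AGENTS.md example earlier.

```shell
# Hypothetical helper: suggest a test command from the project's manifest file.
suggest_test_command() (
  cd "${1:-.}" || exit 1
  if [ -f package.json ]; then
    echo "npm test"
  elif [ -f pyproject.toml ] || [ -f setup.py ]; then
    echo "pytest"
  elif [ -f go.mod ]; then
    echo "go test ./..."
  elif [ -f Cargo.toml ]; then
    echo "cargo test"
  elif [ -f Gemfile ]; then
    echo "rspec"
  else
    echo "unknown stack: fill in AGENTS.md by hand" >&2
    exit 1
  fi
)
```

The output is a suggestion to review, since a project may wrap its tests in a Makefile or a custom script.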
Learn More
- Geoffrey Huntley:
- Clayton Farr's Playbook:
- Geoffrey's Fork:
Credits
Built by Johnathan & Q — a human-AI dyad.
- Twitter: [@spacepixel]()
- ClawdHub: [clawhub.ai/skills/ralph-loops]()