cron-retry
Auto-retry failed cron jobs on connection recovery.
Installation
npx clawhub@latest install cron-retry
Documentation
Cron Retry Skill
Automatically detect and retry cron jobs that failed due to network/connection errors.
Quick Start (Heartbeat Integration)
Add this to your HEARTBEAT.md:
## Cron Recovery Check
Check for cron jobs with lastStatus: "error". If the error matches network patterns (connection error, sendMessage failed, fetch failed, ETIMEDOUT, ECONNREFUSED), retry the job using cron tool with action: "run" and the job ID. Report what was recovered.
That's it. On each heartbeat, failed network jobs get retried automatically.
How It Works
1. List all jobs via cron list
2. Select jobs with lastStatus = "error" and enabled = true
3. Check whether lastError matches network-related patterns
4. Retry matching jobs via cron run
Network Error Patterns (Retryable)
These errors indicate transient network issues worth retrying:
- Network request.*failed
- Connection error
- ECONNREFUSED
- ETIMEDOUT
- ENOTFOUND
- sendMessage.*failed
- fetch failed
- socket hang up
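As a rough illustration, the patterns above can be expressed as regular expressions and checked against a job's lastError text. This is a minimal sketch, not the skill's actual implementation: the names NETWORK_ERROR_PATTERNS and isNetworkError are hypothetical, and case-insensitive matching is an assumption.

```typescript
// Retryable patterns from the list above.
// Assumption: matching is case-insensitive; the source does not say either way.
const NETWORK_ERROR_PATTERNS: RegExp[] = [
  /Network request.*failed/i,
  /Connection error/i,
  /ECONNREFUSED/i,
  /ETIMEDOUT/i,
  /ENOTFOUND/i,
  /sendMessage.*failed/i,
  /fetch failed/i,
  /socket hang up/i,
];

// Hypothetical helper: true when the error text matches at least one pattern.
function isNetworkError(lastError: string | undefined): boolean {
  if (!lastError) return false; // no error text, nothing to match
  return NETWORK_ERROR_PATTERNS.some((pattern) => pattern.test(lastError));
}
```

Under these assumptions, a message such as "Network request for 'sendMessage' failed!" would count as retryable, while an auth error such as "401 Unauthorized" would not.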
What Gets Retried vs Skipped
Retried:
- Network timeouts
- Connection refused
- Message send failures
- DNS lookup failures
Skipped (not retried):
- Logic errors (bad config, missing data)
- Auth failures
- Disabled jobs
- Jobs that just ran successfully
Manual Recovery Check
To check and retry failed jobs manually:
# List all jobs and their status
clawdbot cron list
# Find failed jobs
clawdbot cron list | jq '.jobs[] | select(.state.lastStatus == "error") | {name, error: .state.lastError}'
# Retry a specific job
clawdbot cron run --id <JOB_ID>
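For readers who prefer to do the filtering in code rather than with jq, the same selection might look like the sketch below. It assumes the list output is JSON with the shape implied by the jq filter above (a top-level jobs array whose entries carry name and state); findFailedJobs and ListedJob are hypothetical names.

```typescript
// Shape assumed from the jq filter above; the real output may carry more fields.
interface ListedJob {
  name: string;
  state?: { lastStatus?: string; lastError?: string };
}

// Mirrors: .jobs[] | select(.state.lastStatus == "error") | {name, error: .state.lastError}
function findFailedJobs(
  listJson: string,
): { name: string; error: string | undefined }[] {
  const parsed = JSON.parse(listJson) as { jobs?: ListedJob[] };
  return (parsed.jobs ?? [])
    .filter((job) => job.state?.lastStatus === "error") // keep failed jobs only
    .map((job) => ({ name: job.name, error: job.state?.lastError }));
}
```

The function takes the raw JSON text so it can be fed whatever the list command prints, and it tolerates jobs that have no state yet.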
Agent Implementation
When implementing the heartbeat check:
1. Call cron tool with action: "list"
2. For each job in response.jobs:
   - Skip if job.enabled !== true
   - Skip if job.state.lastStatus !== "error"
   - Check if job.state.lastError matches network patterns
   - If retryable: call cron tool with action: "run", jobId: job.id
3. Report: "Recovered X jobs" or "No failed jobs to recover"
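The steps above could be sketched roughly as follows. The listJobs and runJob callbacks are hypothetical stand-ins for the cron tool's "list" and "run" actions, and the CronJob shape is inferred from the field names in the steps. Both callbacks are synchronous only to keep the sketch short; a real tool call would likely be asynchronous.

```typescript
// Hypothetical shape inferred from the steps above; the real response may differ.
interface CronJob {
  id: string;
  enabled?: boolean;
  state?: { lastStatus?: string; lastError?: string };
}

// Combined form of the retryable network patterns documented earlier.
// Assumption: case-insensitive matching.
const RETRYABLE =
  /Network request.*failed|Connection error|ECONNREFUSED|ETIMEDOUT|ENOTFOUND|sendMessage.*failed|fetch failed|socket hang up/i;

// listJobs and runJob stand in for cron tool calls with action "list" and "run".
function recoverFailedJobs(
  listJobs: () => { jobs: CronJob[] },
  runJob: (jobId: string) => void,
): string {
  const recovered: string[] = [];
  for (const job of listJobs().jobs) {
    if (job.enabled !== true) continue; // skip disabled jobs
    if (job.state?.lastStatus !== "error") continue; // skip jobs that did not fail
    if (!RETRYABLE.test(job.state?.lastError ?? "")) continue; // skip non-network errors
    runJob(job.id); // retry the job
    recovered.push(job.id);
  }
  return recovered.length > 0
    ? `Recovered ${recovered.length} jobs`
    : "No failed jobs to recover";
}
```

Passing the tool actions in as callbacks keeps the filtering logic easy to exercise with canned job lists before wiring it to a live scheduler.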
Example Scenario
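A job ends with lastStatus: "error", lastError: "Network request for 'sendMessage' failed!". That message matches the Network request.*failed pattern, so the next heartbeat retries the job.
Safety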
- Only retries transient network errors
- Respects job enabled state
- Won't create retry loops (checks lastRunAtMs)
- Reports all recovery attempts
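The source says only that the skill checks lastRunAtMs to avoid retry loops, without describing how. One way such a guard could work is a cooldown: retry a failed job only if its last run is older than some minimum interval. The sketch below is an assumption throughout; isOutsideCooldown and the five-minute RETRY_COOLDOWN_MS value are hypothetical.

```typescript
// Hypothetical cooldown; the source does not specify any value.
const RETRY_COOLDOWN_MS = 5 * 60 * 1000;

// Hypothetical guard: true when enough time has passed since the job last ran.
function isOutsideCooldown(
  lastRunAtMs: number | undefined,
  nowMs: number,
  cooldownMs: number = RETRY_COOLDOWN_MS,
): boolean {
  if (lastRunAtMs === undefined) return true; // no recorded run, nothing to wait for
  return nowMs - lastRunAtMs >= cooldownMs;
}
```

With a guard like this, a job that keeps failing would be retried at most once per cooldown window rather than on every heartbeat.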