
cron-retry

Auto-retry failed cron jobs on connection recovery.

Installation

npx clawhub@latest install cron-retry

Documentation

Cron Retry Skill

Automatically detect and retry cron jobs that failed due to network/connection errors.

Quick Start (Heartbeat Integration)

Add this to your HEARTBEAT.md:

## Cron Recovery Check
Check for cron jobs with lastStatus: "error". If the error matches a network pattern (connection error, sendMessage failed, fetch failed, ETIMEDOUT, ECONNREFUSED), retry the job using the cron tool with action: "run" and the job ID. Report what was recovered.

That's it. On each heartbeat, jobs that failed with a network error are retried automatically.

How It Works

  • On heartbeat, check all cron jobs via cron list

  • Filter for jobs where lastStatus = "error" and enabled = true

  • Check if lastError matches network-related patterns

  • Re-run eligible jobs via cron run

  • Report results

Network Error Patterns (Retryable)

These errors indicate transient network issues worth retrying:

• Network request.*failed
• Connection error
• ECONNREFUSED
• ETIMEDOUT
• ENOTFOUND
• sendMessage.*failed
• fetch failed
• socket hang up
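
As a minimal sketch, the patterns above can be treated as regular expressions and searched for in a job's lastError string. The helper name is_retryable_error and the case-insensitive matching are assumptions made for illustration; the skill itself does not prescribe either.

```python
import re

# Patterns copied from the list above; case-insensitive matching is an assumption.
NETWORK_ERROR_PATTERNS = [
    r"Network request.*failed",
    r"Connection error",
    r"ECONNREFUSED",
    r"ETIMEDOUT",
    r"ENOTFOUND",
    r"sendMessage.*failed",
    r"fetch failed",
    r"socket hang up",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in NETWORK_ERROR_PATTERNS]


def is_retryable_error(last_error):
    """Return True if the error text matches any network pattern (hypothetical helper)."""
    if not last_error:
        return False
    return any(rx.search(last_error) for rx in _COMPILED)
```

Under these assumptions, a message such as "Network request for 'sendMessage' failed!" is classified as retryable, while an auth error such as "401 Unauthorized" is not.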

What Gets Retried vs Skipped

Retried:
• Network timeouts
• Connection refused
• Message send failures
• DNS lookup failures

Skipped (not retried):
• Logic errors (bad config, missing data)
• Auth failures
• Disabled jobs
• Jobs that just ran successfully

Manual Recovery Check

To check and retry failed jobs manually:

    # List all jobs and their status
    clawdbot cron list

    # Find failed jobs
    clawdbot cron list | jq '.jobs[] | select(.state.lastStatus == "error") | {name, error: .state.lastError}'

    # Retry a specific job
    clawdbot cron run --id <JOB_ID>
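
For scripting outside the shell, the jq filter above could be mirrored in Python roughly as follows. This is a sketch that assumes the list output is the JSON document the jq filter expects (a top-level jobs array whose entries carry name and state fields); the function name failed_jobs is hypothetical.

```python
import json


def failed_jobs(cron_list_json):
    """Return name and error for each job whose last status is "error".

    cron_list_json: JSON text with a top-level "jobs" array (assumed shape).
    """
    data = json.loads(cron_list_json)
    return [
        {"name": job.get("name"), "error": job.get("state", {}).get("lastError")}
        for job in data.get("jobs", [])
        if job.get("state", {}).get("lastStatus") == "error"
    ]
```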

Agent Implementation

When implementing the heartbeat check:

    1. Call cron tool with action: "list"
    2. For each job in response.jobs:
       - Skip if job.enabled !== true
       - Skip if job.state.lastStatus !== "error"
       - Check if job.state.lastError matches network patterns
       - If retryable: call cron tool with action: "run", jobId: job.id
    3. Report: "Recovered X jobs" or "No failed jobs to recover"
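
The steps above can be sketched in Python as follows. The job dictionaries use the field names given in this document (id, name, enabled, state.lastStatus, state.lastError), but their exact shape is an assumption. The run_job callback stands in for the cron tool's "run" action and is hypothetical, as is the function name recover_failed_jobs.

```python
import re

# The network patterns from this document, combined into one expression.
# Case-insensitive matching is an assumption.
NETWORK_ERROR = re.compile(
    r"Network request.*failed|Connection error|ECONNREFUSED|ETIMEDOUT|"
    r"ENOTFOUND|sendMessage.*failed|fetch failed|socket hang up",
    re.IGNORECASE,
)


def recover_failed_jobs(jobs, run_job):
    """Retry enabled jobs whose last run failed with a network error.

    jobs: list of job dicts from the cron "list" action (assumed shape).
    run_job: hypothetical callback that triggers the cron "run" action for a job ID.
    """
    recovered = []
    for job in jobs:
        state = job.get("state") or {}
        if job.get("enabled") is not True:
            continue  # skip disabled jobs
        if state.get("lastStatus") != "error":
            continue  # skip jobs whose last run did not fail
        if not NETWORK_ERROR.search(state.get("lastError") or ""):
            continue  # skip logic, auth, and other non-network errors
        run_job(job["id"])
        recovered.append(job.get("name", job["id"]))
    if recovered:
        return f"Recovered {len(recovered)} jobs: {', '.join(recovered)}"
    return "No failed jobs to recover"
```

Passing the runner in as a callback keeps the filtering logic separate from however the cron tool is actually invoked, which also makes the sketch easy to exercise with plain dictionaries.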

Example Scenario

  • 7:00 PM — Evening briefing cron fires

  • Network hiccup — Telegram send fails

  • Job marked lastStatus: "error", lastError: "Network request for 'sendMessage' failed!"

  • 7:15 PM — Connection restored, heartbeat runs

  • Skill detects the failed job, sees it's a network error

  • Retries the job → briefing delivered

  • Reports: "Recovered 1 job: evening-wrap-briefing"

Safety

• Only retries transient network errors
• Respects job enabled state
• Won't create retry loops (checks lastRunAtMs)
• Reports all recovery attempts
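
The document does not spell out how lastRunAtMs is checked. One possible reading, sketched below, is a cooldown: a job is only retried if its last run is older than some window, so a job that was just retried and failed again is not re-run on back-to-back heartbeats. The 10-minute window, the location of the field under job.state, and the helper name outside_cooldown are all assumptions.

```python
import time

RETRY_COOLDOWN_MS = 10 * 60 * 1000  # hypothetical 10-minute cooldown


def outside_cooldown(job, now_ms=None, cooldown_ms=RETRY_COOLDOWN_MS):
    """Return True if the job's last run is old enough to retry (assumed policy)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Assumed field location: alongside lastStatus and lastError under "state".
    last_run_ms = (job.get("state") or {}).get("lastRunAtMs")
    if last_run_ms is None:
        return True  # no recorded run, so nothing to loop on
    return now_ms - last_run_ms >= cooldown_ms
```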