
skill-evaluator

Evaluate Clawdbot skills for quality, reliability, and publish-readiness using a multi-framework rubric (ISO 25010, Shneiderman, Norman, OpenSSF, and others).

Installation

npx clawhub@latest install skill-evaluator

View the full skill documentation and source below.

Documentation

Skill Evaluator

Evaluate skills across 25 criteria using a hybrid automated + manual approach.

Quick Start

1. Run automated checks

python3 scripts/eval-skill.py /path/to/skill
python3 scripts/eval-skill.py /path/to/skill --json    # machine-readable
python3 scripts/eval-skill.py /path/to/skill --verbose  # show all details

Checks: file structure, frontmatter, description quality, script syntax, dependency audit, credential scan, env var documentation.
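One of these automated checks, the credential scan, can be sketched as a regex pass over the skill's files. This is a minimal illustration, not the actual implementation in eval-skill.py; the patterns shown are hypothetical examples:

```python
import re
from pathlib import Path

# Hypothetical credential patterns; the real eval-skill.py may use a
# different or larger set.
CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                     # AWS access key id
    re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"), # PEM private key
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*['\"][^'\"]{8,}"),
]

def scan_for_credentials(skill_dir):
    """Return (file, line_number) pairs where a pattern matched."""
    hits = []
    for path in Path(skill_dir).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in CREDENTIAL_PATTERNS):
                hits.append((str(path), lineno))
    return hits
```

Any hit would typically be reported as a finding rather than failing silently, so the evaluator can surface the exact file and line.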

2. Manual assessment

Use the rubric at references/rubric.md to score 25 criteria across 8 categories (0–4 each, 100 total). Each criterion has concrete descriptions per score level.
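The rubric arithmetic is simple: 25 criteria scored 0–4 each sum to a 100-point total. A minimal sketch (criterion names are up to the rubric, not this function):

```python
def total_score(scores):
    """scores: dict mapping criterion name -> int in 0..4.

    Returns the 0-100 total used by the verdict table.
    """
    if len(scores) != 25:
        raise ValueError("rubric expects exactly 25 criteria")
    for name, s in scores.items():
        if not 0 <= s <= 4:
            raise ValueError(f"{name}: score {s} outside 0-4")
    return sum(scores.values())
```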

3. Write the evaluation

Copy assets/EVAL-TEMPLATE.md to the skill directory as EVAL.md. Fill in automated results + manual scores.

Evaluation Process

  • Run eval-skill.py — get the automated structural score

  • Read the skill's SKILL.md — understand what it does

  • Read/skim the scripts — assess code quality, error handling, testability

  • Score each manual criterion using references/rubric.md — concrete criteria per level

  • Prioritize findings as P0 (blocks publishing) / P1 (should fix) / P2 (nice to have)

  • Write EVAL.md in the skill directory with scores + findings
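The prioritized findings from the steps above can be represented as simple records. A sketch (the `Finding` shape is illustrative, not a format EVAL.md mandates):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    priority: str  # "P0" blocks publishing, "P1" should fix, "P2" nice to have
    summary: str

def blocks_publishing(findings):
    """A skill is blocked from publishing while any P0 finding remains open."""
    return any(f.priority == "P0" for f in findings)
```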

Categories (8 categories, 25 criteria)

| # | Category | Source Framework | Criteria |
|---|----------|------------------|----------|
| 1 | Functional Suitability | ISO 25010 | Completeness, Correctness, Appropriateness |
| 2 | Reliability | ISO 25010 | Fault Tolerance, Error Reporting, Recoverability |
| 3 | Performance / Context | ISO 25010 + Agent | Token Cost, Execution Efficiency |
| 4 | Usability — AI Agent | Shneiderman, Gerhardt-Powals | Learnability, Consistency, Feedback, Error Prevention |
| 5 | Usability — Human | Tognazzini, Norman | Discoverability, Forgiveness |
| 6 | Security | ISO 25010 + OpenSSF | Credentials, Input Validation, Data Safety |
| 7 | Maintainability | ISO 25010 | Modularity, Modifiability, Testability |
| 8 | Agent-Specific | Novel | Trigger Precision, Progressive Disclosure, Composability, Idempotency, Escape Hatches |

Interpreting Scores

| Range | Verdict | Action |
|-------|---------|--------|
| 90–100 | Excellent | Publish confidently |
| 80–89 | Good | Publishable, note known issues |
| 70–79 | Acceptable | Fix P0s before publishing |
| 60–69 | Needs Work | Fix P0+P1 before publishing |
| <60 | Not Ready | Significant rework needed |
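The score-to-verdict mapping above is a straightforward set of bands. A minimal sketch:

```python
def verdict(score):
    """Map a 0-100 total score to its verdict band and recommended action."""
    if score >= 90:
        return ("Excellent", "Publish confidently")
    if score >= 80:
        return ("Good", "Publishable, note known issues")
    if score >= 70:
        return ("Acceptable", "Fix P0s before publishing")
    if score >= 60:
        return ("Needs Work", "Fix P0+P1 before publishing")
    return ("Not Ready", "Significant rework needed")
```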

Deeper Security Scanning

This evaluator covers security basics (credentials, input validation, data safety), but for thorough security audits of skills under development, consider SkillLens (npx skilllens scan). It checks for exfiltration, code execution, persistence, privilege bypass, and prompt injection — complementary to the quality focus here.

Dependencies

• Python 3.6+ (for eval-skill.py)
• PyYAML (pip install pyyaml) — for frontmatter parsing in automated checks
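The PyYAML dependency is used for frontmatter parsing along these lines; this is a sketch of the general technique, and the actual parsing in eval-skill.py may differ:

```python
import yaml  # PyYAML: pip install pyyaml

def parse_frontmatter(text):
    """Extract the YAML frontmatter block delimited by '---' lines.

    Returns the parsed mapping, or None if no well-formed frontmatter exists.
    """
    if not text.startswith("---"):
        return None
    parts = text.split("---", 2)
    if len(parts) < 3:
        return None  # unterminated frontmatter block
    return yaml.safe_load(parts[1])
```

`yaml.safe_load` is preferred over `yaml.load` here because skill files are untrusted input and `safe_load` refuses to construct arbitrary Python objects.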