Cookbook Audit: Comprehensive Notebook Review and Quality Assurance
The cookbook audit skill from Anthropic Cookbooks equips AI assistants to review Anthropic Cookbook notebooks against comprehensive rubrics and style guidelines. Rather than offering subjective critiques, the skill delivers structured audits that evaluate narrative quality, code quality, technical accuracy, and actionability using consistent scoring criteria.
What This Skill Does
This skill conducts multi-dimensional audits of notebook content, combining automated technical checks with manual review against established style guidelines. It evaluates how well notebooks teach through demonstration, whether code follows best practices, whether learning objectives map to conclusions, and whether content builds agency rather than just showing steps.
The audit process integrates automated validation (checking for hardcoded API keys, deprecated patterns, missing dependencies) with qualitative assessment of narrative structure, code presentation, and educational effectiveness. Automated scripts catch technical issues like credential exposure or invalid model names, while manual review evaluates whether introductions hook with problems, code blocks have explanatory text, and conclusions provide actionable guidance.
Scoring uses a 20-point scale across four dimensions: narrative quality (5 points), code quality (5 points), technical accuracy (5 points), and actionability/understanding (5 points). Each dimension has specific criteria that keep assessment objective rather than impressionistic. The skill generates detailed reports identifying strengths, critical issues, and prioritized recommendations with concrete examples.
Getting Started
Before auditing any notebook, read the style guide at style_guide.md. This canonical document contains templates, good/bad examples, and detailed standards for introductions, prerequisites, code presentation, and conclusions. The style guide provides the baseline against which notebooks are evaluated.
The audit workflow begins by identifying the target notebook, then running automated validation with validate_notebook.py. This Python script checks technical requirements, scans for hardcoded credentials using detect-secrets with custom patterns, and generates clean markdown output in the tmp/ folder (which is gitignored to avoid committing review artifacts).
The markdown conversion excludes cell outputs while preserving code and explanatory text, making manual review more efficient than reading raw Jupyter notebook JSON. This markdown serves as the primary review artifact, saving context and improving readability during evaluation.
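The conversion described above can be sketched with nothing but the standard library, since notebooks are JSON documents with a `cells` list. This is an illustrative approximation, not the actual conversion code in validate_notebook.py; `notebook_to_markdown` is a hypothetical helper name.

```python
import json

def notebook_to_markdown(nb_json: str) -> str:
    """Render a notebook's cells as markdown, dropping all cell outputs."""
    nb = json.loads(nb_json)
    fence = "`" * 3
    parts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # v4 notebooks may store source as a list of lines
            source = "".join(source)
        if cell["cell_type"] == "markdown":
            parts.append(source)
        elif cell["cell_type"] == "code":
            # Keep the code, fenced; the cell's "outputs" are intentionally not emitted
            parts.append(fence + "python\n" + source + "\n" + fence)
    return "\n\n".join(parts)
```

Markdown and code survive; output payloads, execution counts, and cell metadata do not, which is what makes the artifact easier to review than raw notebook JSON.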
Key Features
Automated Technical Validation: The validation script catches common issues before human review—hardcoded API keys, deprecated API patterns, invalid model names, missing dependency specifications, verbose output not suppressed with %%capture, and improper environment variable handling.
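The flavor of these automated checks can be sketched as a per-cell rule set. The rules below are illustrative assumptions, not the actual logic of validate_notebook.py, which may check more things and check them differently.

```python
import re

def check_cell(source: str) -> list[str]:
    """Flag common issues in a single code cell (illustrative rules only)."""
    issues = []
    # Package installs should be wrapped in %%capture to suppress noisy output
    if "pip install" in source and "%%capture" not in source:
        issues.append("package install output not suppressed with %%capture")
    # Keys belong in environment variables, never in string literals
    if re.search(r"api_key\s*=\s*[\"'][^\"']+[\"']", source):
        issues.append("hardcoded API key string")
    # Assumed heuristic for older model-name generations
    if re.search(r"claude-\d", source):
        issues.append("possibly deprecated model name")
    return issues
```

A script like this runs over every code cell and aggregates findings, so a reviewer starts from a clean technical baseline before reading for narrative quality.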
Credential Scanning: Integration with detect-secrets prevents accidental credential exposure. Custom patterns defined in scripts/detect-secrets/plugins.py check against baselines, ensuring notebooks don't leak API keys or secrets.
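The baseline idea can be shown without the detect-secrets dependency: a finding only matters if it is not already recorded in an allowlisted baseline. The key pattern below is an assumed shape for illustration; the actual skill delegates detection to detect-secrets custom plugins rather than a hand-rolled regex.

```python
import re

# Assumed key shape for illustration; real keys are longer and detection
# in the skill is handled by detect-secrets plugins, not this regex.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}")

def new_secrets(text: str, baseline: set[str]) -> set[str]:
    """Candidate secrets found in text that are not already in the baseline."""
    return {m.group(0) for m in KEY_PATTERN.finditer(text)} - baseline
```

Anything returned by `new_secrets` blocks the audit until the credential is removed or deliberately added to the baseline.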
Style Guide Compliance: Comprehensive guidelines cover problem-first introductions with Terminal Learning Objectives (TLOs) and Enabling Learning Objectives (ELOs), prerequisite and setup patterns, core content structure with explanatory text before/after code blocks, and conclusions mapping back to learning objectives.
Structured Scoring: The 20-point scoring system with four 5-point dimensions provides objective evaluation. Narrative quality assesses whether notebooks lead with problems rather than machinery. Code quality checks explanatory text, comment quality, and best practices. Technical accuracy verifies executability and API pattern validity. Actionability measures whether users can apply learning beyond the specific example.
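The four-dimension rubric maps naturally onto a small data structure. This is a sketch of how scores might be recorded and totaled, assuming integer 0-5 scores per dimension; the skill's actual report format may differ.

```python
from dataclasses import dataclass

@dataclass
class AuditScore:
    """One 0-5 score per dimension; field names mirror the four rubric dimensions."""
    narrative: int
    code_quality: int
    technical_accuracy: int
    actionability: int

    def __post_init__(self) -> None:
        # Enforce the rubric's 0-5 range on every dimension
        for name, value in vars(self).items():
            if not 0 <= value <= 5:
                raise ValueError(f"{name} must be in 0..5, got {value}")

    @property
    def total(self) -> int:
        """Overall score out of 20."""
        return (self.narrative + self.code_quality
                + self.technical_accuracy + self.actionability)
```

For example, `AuditScore(narrative=4, code_quality=5, technical_accuracy=3, actionability=4)` totals 16/20, and the per-dimension fields make it obvious that technical accuracy is where revision effort should go.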
Comprehensive Reporting: Audit outputs include executive summaries with overall scores and key insights, detailed dimension-by-dimension scoring with justifications, specific recommendations prioritized by impact, and concrete examples showing improvements with line references to style guide templates.
Usage Examples
When reviewing a notebook about agentic workflows, the audit checks whether the introduction hooks with the problem (teams spending hours on manual tasks) rather than leading with machinery (building an agent system). If the intro lists SDK methods instead of explaining value delivered, the audit flags this as a narrative quality issue and references style guide examples showing problem-first framing.
For code quality assessment, the skill verifies that every code block has explanatory text before it describing what is about to happen, and text after it explaining what was learned. Blocks without context receive deductions. Comments explaining "what" the code does are flagged, since code should be self-documenting; comments explaining "why" a particular approach was chosen are preferred.
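The adjacency rule above is mechanical enough to sketch: walk the cell list and flag code cells that lack a markdown neighbor on either side. This is an illustrative check, not the skill's actual implementation.

```python
def cells_missing_context(cells: list[dict]) -> list[int]:
    """Indices of code cells lacking a markdown cell immediately before and after."""
    flagged = []
    for i, cell in enumerate(cells):
        if cell["cell_type"] != "code":
            continue
        has_before = i > 0 and cells[i - 1]["cell_type"] == "markdown"
        has_after = i + 1 < len(cells) and cells[i + 1]["cell_type"] == "markdown"
        if not (has_before and has_after):
            flagged.append(i)
    return flagged
```

A check this strict produces some false positives (two consecutive code cells that form one logical step, for instance), which is exactly why the automated pass feeds into manual review rather than replacing it.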
Technical accuracy evaluation checks whether notebooks run without modification (except API keys), use current API patterns, reference valid model names (claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-5), define model names as constants at the top, and suppress noisy output appropriately.
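The constants-at-the-top pattern the rubric asks for might look like the sketch below. `ask` is a hypothetical helper name, and the call shape follows the Anthropic Messages API; in a real notebook the client would be an `anthropic.Anthropic()` instance.

```python
# Define the model once at the top of the notebook so a version bump
# is a one-line change. Model name taken from the valid names above.
MODEL = "claude-sonnet-4-5"

def ask(client, prompt: str) -> str:
    """Send a single-turn message and return the text of the first content block."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```

Notebooks that instead scatter string literals like `"claude-3-opus-20240229"` across cells fail this check twice: once for the deprecated name and once for the missing constant.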
Best Practices
Always read the style guide first, even if you think you know the standards. The style guide evolves, contains updated templates, and provides the exact examples to reference in audit feedback. Skipping this step leads to inconsistent reviews.
Run automated validation before manual review. Catching technical issues programmatically saves human review time for qualitative assessment. Fix automated findings first, then evaluate narrative and educational effectiveness.
Use the markdown output for manual review rather than raw notebook JSON. The cleaned markdown focuses attention on content and structure without distraction from output cell metadata or encoding artifacts.
Provide specific examples when reporting issues. Don't just say "introduction is weak"—show the problematic text, explain why it fails (leads with machinery instead of problem), and provide concrete improvement suggestions referencing style guide templates.
Score each dimension independently before calculating overall scores. This prevents halo effects where strong narrative quality influences technical accuracy scoring. Evaluate objectively against rubric criteria for each dimension.
When to Use This Skill
Use this skill when reviewing Anthropic Cookbook submissions before publication. Every notebook should undergo audit to maintain consistent quality and educational effectiveness across the cookbook collection.
The skill is valuable when onboarding new cookbook contributors. Running audits on early submissions with detailed feedback helps contributors understand standards and improve future work without extensive back-and-forth.
It's ideal for periodic quality reviews of existing notebooks. As standards evolve and API patterns change, auditing published notebooks identifies content needing updates to maintain currency and consistency.
When NOT to Use This Skill
Don't use this skill for reviewing non-Anthropic cookbook content. The style guidelines and rubrics are specific to Anthropic's educational philosophy—problem-first framing, agency-building, learning contracts. Other documentation may have different valid approaches.
Avoid using it as the sole quality gate. While the skill catches many issues, human judgment about domain-specific technical accuracy, appropriate complexity for target audiences, and pedagogical effectiveness remains essential.
It's not appropriate for draft notebooks in early development stages. Let authors develop complete first drafts before formal audits. Premature review on incomplete work wastes effort and discourages experimentation.
Don't expect the skill to make content decisions. It evaluates how well notebooks execute chosen approaches against standards but doesn't determine whether the subject matter is appropriate or the examples are the best choices for teaching particular concepts.
Related Skills
This skill complements skill-creator for developing new skills with quality standards, doc for document review workflows, and jupyter-notebook for working with notebook formats.
Source
This skill is maintained by Anthropic Cookbooks.