
clawdbites

Extract recipes from Instagram reels.


Installation

npx clawhub@latest install clawdbites


Instagram Recipe Extractor

Extract recipes from Instagram reels using a multi-layered approach:

  • Caption parsing: instant; check the description first
  • Audio transcription: Whisper (runs locally, no API key)
  • Frame analysis: a vision model reads on-screen text

No Instagram login is required; works on public reels.

    When to Use

    • User sends an Instagram reel link
    • User mentions "recipe from Instagram" or "save this reel"
    • User wants to extract recipe details from a video post

    How It Works (MANDATORY FLOW)

    ALWAYS follow this complete flow; do not stop after the caption if instructions are missing:

    1. User sends an Instagram reel URL
    2. Extract metadata using yt-dlp (--dump-json)
    3. Parse the caption for recipe details
    4. Check completeness: does the caption have BOTH ingredients AND instructions?
       - ✅ YES: Present the recipe
       - ❌ NO (missing instructions or incomplete): Automatically proceed to audio transcription; do NOT stop or ask the user
    5. If audio transcription is needed:
       - Download video: yt-dlp -o "/tmp/reel.mp4" "URL"
       - Extract audio: ffmpeg -y -i /tmp/reel.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/reel.wav
       - Transcribe: whisper /tmp/reel.wav --model base --output_format txt --output_dir /tmp
       - Merge caption ingredients with audio instructions
    6. Present a clean, formatted recipe (combining caption + audio as needed)
    7. User decides what to do (save to notes, add to wishlist, etc.)

    Completeness check heuristics:

    • Has ingredients = contains 3+ quantity+item patterns (e.g., "1 cup flour", "2 lbs chicken")

    • Has instructions = contains action verbs (blend, cook, bake, mix, pour, add) + sequence OR numbered steps
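A minimal Python sketch of the two completeness heuristics. The unit list, the regexes, and the reading of "+ sequence" as "at least two distinct action verbs" are assumptions, not part of the skill; tune them against real captions.

```python
import re

# Assumed unit vocabulary; the skill text only gives examples (cup, tbsp, oz, lbs).
UNITS = r"(?:cups?|tbsp|tsp|oz|lbs?|g|ml|cloves?|cans?|slices?)"
# Quantity (1, 1.5, 1/2, ½) + optional unit + start of an item word: "1 cup flour", "24oz steak".
QTY_ITEM = re.compile(rf"(?:\d+(?:[./]\d+)?|[½¼¾])\s*{UNITS}?\s+[a-z]", re.IGNORECASE)
ACTION_VERB = re.compile(r"\b(blend|cook|bake|mix|pour|add)\b", re.IGNORECASE)
# A line starting with "1." or "1)" followed by text.
NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def has_ingredients(text: str) -> bool:
    """True when the text contains 3+ quantity+item patterns."""
    return len(QTY_ITEM.findall(text)) >= 3


def has_instructions(text: str) -> bool:
    """Numbered steps, or (assumption) at least two distinct action verbs."""
    if NUMBERED_STEP.search(text):
        return True
    verbs = {verb.lower() for verb in ACTION_VERB.findall(text)}
    return len(verbs) >= 2


def caption_is_complete(text: str) -> bool:
    """Caption needs BOTH ingredients AND instructions to skip audio transcription."""
    return has_ingredients(text) and has_instructions(text)
```

Time expressions such as "20 minutes" also match the quantity pattern, so the threshold of three is a rough filter rather than a parser.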


    Extraction Command

    yt-dlp --dump-json "URL" 2>/dev/null

    Key fields from JSON output:

    • description — The caption containing the recipe

    • uploader — Creator's name

    • channel — Creator's handle

    • webpage_url — Original URL

    • like_count — Popularity indicator
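The extraction call can also be driven from Python. In this sketch (function names are hypothetical) the URL is passed as a separate argv element, so no shell quoting is needed, and only the key fields listed above are kept.

```python
import json
import subprocess

# The fields listed above; any key missing from the JSON comes back as None.
KEY_FIELDS = ("description", "uploader", "channel", "webpage_url", "like_count")


def metadata_command(url: str) -> list[str]:
    """argv for the metadata-only call (no video download)."""
    return ["yt-dlp", "--dump-json", url]


def pick_fields(dump_json: str) -> dict:
    """Parse yt-dlp's JSON output and keep only KEY_FIELDS."""
    info = json.loads(dump_json)
    return {key: info.get(key) for key in KEY_FIELDS}


def fetch_metadata(url: str) -> dict:
    # check=True raises CalledProcessError if yt-dlp fails (e.g. a private account);
    # stderr is captured rather than sent to /dev/null.
    result = subprocess.run(metadata_command(url), capture_output=True, text=True, check=True)
    return pick_fields(result.stdout)
```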


    Recipe Parsing

    Look for these patterns in the caption:

    Macros:

    • "X Calories | Xg P | Xg C | Xg F"

    • "Macros per serving"

    • "Cal/Protein/Carbs/Fat"


    Ingredients:
    • Lines starting with quantities (1 cup, 2 tbsp, 24oz)

    • Lines with measurement units

    • Emoji bullet points (🥩 🌽 🧀 etc.)


    Sections:
    • "For the [component]:"

    • "Ingredients:"

    • "Instructions:"

    • "Directions:"
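A sketch of two of the parsing rules: the pipe-separated macro line and the section headers. It assumes each header sits on its own line ending in a colon; inline forms such as "Ingredients: 1 cup oats" are not handled, and the regexes are illustrative.

```python
import re

# "585 Calories | 56g P | 25g C | 28g F" style line (case-insensitive).
MACROS = re.compile(
    r"(\d+)\s*cal(?:ories)?\s*\|\s*(\d+)\s*g\s*P\s*\|\s*(\d+)\s*g\s*C\s*\|\s*(\d+)\s*g\s*F",
    re.IGNORECASE,
)
# Headers such as "Ingredients:", "Instructions:", "Directions:", "For the sauce:".
SECTION = re.compile(r"^\s*(ingredients|instructions|directions|for the [^:]+):\s*$", re.IGNORECASE)


def parse_macros(caption: str):
    """Return a macros dict, or None when no macro line is present."""
    match = MACROS.search(caption)
    if not match:
        return None
    calories, protein, carbs, fat = map(int, match.groups())
    return {"calories": calories, "protein": protein, "carbs": carbs, "fat": fat}


def split_sections(caption: str) -> dict:
    """Map each section header (lower-cased) to the non-empty lines under it."""
    sections = {}
    current = None
    for line in caption.splitlines():
        header = SECTION.match(line)
        if header:
            current = header.group(1).lower()
            sections[current] = []
        elif current and line.strip():
            sections[current].append(line.strip())
    return sections
```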


    Output Format

    Present extracted recipe cleanly:

    ## [Recipe Name]
    *From @[handle]*
    
    **Macros (per serving):** X cal | Xg P | Xg C | Xg F
    
    ### Ingredients
    - [ingredient 1]
    - [ingredient 2]
    ...
    
    ### Instructions
    1. [step 1]
    2. [step 2]
    ...
    
    ---
    Source: [original URL]
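Filling the template programmatically might look like this. The input dict is assumed to use the same keys as the wishlist entry below (name, handle, sourceUrl, macros, ingredients, instructions); the function name is hypothetical, and the macros line is skipped when no macros were found.

```python
def render_recipe(recipe: dict) -> str:
    """Render the output template from a recipe dict (assumed wishlist-style keys)."""
    lines = [f"## {recipe['name']}", f"*From {recipe['handle']}*", ""]
    macros = recipe.get("macros")
    if macros:
        lines += [
            f"**Macros (per serving):** {macros['calories']} cal | {macros['protein']}g P | "
            f"{macros['carbs']}g C | {macros['fat']}g F",
            "",
        ]
    lines += ["### Ingredients"] + [f"- {item}" for item in recipe["ingredients"]] + [""]
    lines += ["### Instructions"] + [f"{i}. {step}" for i, step in enumerate(recipe["instructions"], 1)]
    lines += ["", "---", f"Source: {recipe['sourceUrl']}"]
    return "\n".join(lines)
```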

    User Actions After Extraction

    Let the user decide what to do:

    • "Save to my recipes" → Save to Apple Notes (if meal-planner skill available)

    • "Add to wishlist" → Save to memory/recipe-wishlist.json

    • "Just show me" → Display only, no save

    • "Plan this for next week" → Hand off to meal-planner skill


    Wishlist Storage

    Optional storage for recipes the user wants to try later:

    memory/recipe-wishlist.json:

    {
      "recipes": [
        {
          "name": "Recipe Name",
          "source": "instagram",
          "sourceUrl": "",
          "handle": "@creator",
          "addedDate": "2026-01-26",
          "tried": false,
          "macros": {
            "calories": 585,
            "protein": 56,
            "carbs": 25,
            "fat": 28,
            "servings": 3
          },
          "ingredients": [...],
          "instructions": [...]
        }
      ]
    }
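A sketch of the "Add to wishlist" action: load memory/recipe-wishlist.json (or start from an empty {"recipes": []}), append the entry with tried set to false and today's date, and write it back. The helper name is hypothetical, and it does not deduplicate by sourceUrl.

```python
import json
from datetime import date
from pathlib import Path


def add_to_wishlist(path, recipe: dict) -> dict:
    """Append a recipe to the wishlist file, creating the file if missing. Returns the stored entry."""
    wishlist_path = Path(path)
    if wishlist_path.exists():
        data = json.loads(wishlist_path.read_text(encoding="utf-8"))
    else:
        data = {"recipes": []}
    # Defaults first, so any of these keys supplied in `recipe` take precedence.
    entry = {"source": "instagram", "tried": False, "addedDate": date.today().isoformat(), **recipe}
    data["recipes"].append(entry)
    wishlist_path.parent.mkdir(parents=True, exist_ok=True)
    wishlist_path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return entry
```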

    Error Handling

    If yt-dlp fails:

    • Check that the URL is a valid Instagram reel URL

    • The account may be private; if so, inform the user

    • As a fallback, suggest the user paste the caption text manually


    If no recipe found in caption (IMPORTANT):

    After extracting, scan the caption for recipe indicators:

    • Ingredient quantities (numbers + units like oz, cups, tbsp, lbs)

    • Recipe sections ("For the...", "Ingredients:", "Instructions:")

    • Cooking verbs (bake, cook, sauté, mix, combine)

    • Macro information (calories, protein, carbs, fat)


    If none found, tell the user clearly:

    "I pulled the caption but it doesn't look like the recipe is there — it might just be a teaser or the recipe is only shown in the video itself. Here's what the caption says:
    [show caption]
    A few options:
    1. Check the comments — sometimes creators post recipes there
    2. Check their bio link — might lead to the full recipe
    3. Describe what you saw in the video and I can help find a similar recipe"

    Recipe detection heuristics:

    HAS_RECIPE if caption contains:
    - 3+ ingredient-like patterns (quantity + food item)
    - OR "recipe" + ingredient list
    - OR macro breakdown + ingredients
    - OR numbered/bulleted instructions
    
    NO_RECIPE if caption is:
    - Mostly hashtags
    - Just a description/teaser
    - Under 100 characters
    - No quantities or measurements
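The two rule lists can disagree (a short caption with three measured ingredients is both "under 100 characters" and clearly a recipe), so this sketch assumes the HAS_RECIPE rules take precedence and returns NO_RECIPE otherwise. The regexes and the "recipe + at least one quantity" reading of "recipe + ingredient list" are assumptions.

```python
import re

# Assumed patterns (hypothetical), not taken from the skill itself.
QTY = re.compile(r"(?:\d+(?:[./]\d+)?|[½¼¾])\s*(?:cups?|tbsp|tsp|oz|lbs?|g)?\s+[a-z]", re.IGNORECASE)
MACRO_WORDS = re.compile(r"\b(?:calories|cal|protein|carbs|fat)\b", re.IGNORECASE)
# Numbered steps ("1. Sear") or bullets followed by a word ("- Sear"), but not "- #hashtag".
STEP_LINE = re.compile(r"^\s*(?:\d+[.)]|[-•])\s+[a-z]", re.IGNORECASE | re.MULTILINE)


def classify_caption(caption: str) -> str:
    qty_count = len(QTY.findall(caption))
    if qty_count >= 3:                                  # 3+ ingredient-like patterns
        return "HAS_RECIPE"
    if "recipe" in caption.lower() and qty_count >= 1:  # "recipe" + ingredients
        return "HAS_RECIPE"
    if MACRO_WORDS.search(caption) and qty_count >= 1:  # macro breakdown + ingredients
        return "HAS_RECIPE"
    if STEP_LINE.search(caption):                       # numbered/bulleted instructions
        return "HAS_RECIPE"
    # Captions matching none of the rules above, such as hashtag-only teasers.
    return "NO_RECIPE"
```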

    Integration with meal-planner

    The meal-planner skill can reference this skill:

    • When planning meals, check wishlist for untried recipes

    • Suggest wishlist recipes that match pantry items

    • Mark recipes as "tried" after they're used in a meal plan
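One way the meal-planner side could use the wishlist: rank untried recipes by how many pantry items appear in their ingredient text, and set tried once a recipe is planned. The substring matching and both function names are hypothetical.

```python
def suggest_from_wishlist(wishlist: dict, pantry: list[str]) -> list[str]:
    """Names of untried recipes that use pantry items, most matches first."""
    pantry_items = [item.lower() for item in pantry]
    scored = []
    for recipe in wishlist.get("recipes", []):
        if recipe.get("tried"):
            continue
        ingredients = " ".join(recipe.get("ingredients", [])).lower()
        matches = sum(1 for item in pantry_items if item in ingredients)
        if matches:
            scored.append((matches, recipe["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep wishlist order
    return [name for _, name in scored]


def mark_tried(wishlist: dict, name: str) -> None:
    """Flag a recipe as tried after it is used in a meal plan."""
    for recipe in wishlist.get("recipes", []):
        if recipe["name"] == name:
            recipe["tried"] = True
```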


    Audio Transcription (V2) — MANDATORY FALLBACK

    When caption is missing instructions, ALWAYS transcribe the audio automatically. Do not stop and ask the user — just do it. This is the most common case since creators often put ingredients in captions but speak the instructions.

    Step 1: Download video

    yt-dlp -o "/tmp/reel.mp4" "URL"

    Step 2: Extract audio

    ffmpeg -y -i /tmp/reel.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/reel.wav

    Step 3: Transcribe with Whisper

    whisper /tmp/reel.wav --model base --output_format txt --output_dir /tmp

    (If whisper is not on your PATH, call the pip-installed script by its full path.)

    Step 4: Parse transcript for recipe

    Look for spoken cooking instructions and any ingredients mentioned verbally. Whisper writes the transcript to /tmp/reel.txt.
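Steps 1 to 3 chained from Python, as a sketch. The function names and the workdir parameter are illustrative; passing a fresh directory per reel (for example from tempfile.mkdtemp()) avoids reusing a stale reel.mp4 left by an earlier run, since yt-dlp skips files that already exist.

```python
import subprocess
from pathlib import Path


def transcription_commands(url: str, workdir: str = "/tmp") -> list[list[str]]:
    """Steps 1-3 as argv lists (no shell quoting needed)."""
    video = f"{workdir}/reel.mp4"
    audio = f"{workdir}/reel.wav"
    return [
        ["yt-dlp", "-o", video, url],
        ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", audio],
        ["whisper", audio, "--model", "base", "--output_format", "txt", "--output_dir", workdir],
    ]


def transcribe_reel(url: str, workdir: str = "/tmp") -> str:
    for argv in transcription_commands(url, workdir):
        subprocess.run(argv, check=True)  # stop at the first failing step
    # Whisper names the transcript after the input file: reel.wav -> reel.txt
    return (Path(workdir) / "reel.txt").read_text(encoding="utf-8")
```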

    Inference for Missing Measurements

    ALWAYS infer quantities when not provided. Never present a recipe without amounts — estimate based on context and standard package sizes.

    Vague Language → Specific Amounts

    | What they say | Inferred amount |
    |---|---|
    | "some chicken" | ~1 lb |
    | "a bit of garlic" | 2-3 cloves |
    | "handful of spinach" | ~2 cups |
    | "drizzle of oil" | 1-2 tbsp |
    | "season to taste" | ½ tsp salt, ¼ tsp pepper |
    | "splash of soy sauce" | 1-2 tbsp |
    | "a few tablespoons" | 2-3 tbsp |
    | "some rice" | 1 cup dry |
    | "cheese on top" | ½-1 cup shredded |
    | "diced onion" | 1 medium onion |
    | "bell peppers" | 2 peppers |

    Standard Package Sizes (when item mentioned without amount)

    | Ingredient | Standard package | Inferred amount |
    |---|---|---|
    | Puff pastry | ~17oz box (2 sheets) | 1 sheet |
    | Ground beef/turkey | 1 lb pack | 1 lb |
    | Chicken breast | ~1.5 lb pack | 1.5 lbs |
    | Sausage links | 14oz / 4-5 links | 1 package |
    | Bacon | 12oz / 12 slices | ½ package (6 slices) |
    | Shredded cheese | 8oz bag | 1-2 cups |
    | Tortillas | 8-10 count | 1 package |
    | Canned beans | 15oz can | 1 can |
    | Broth/stock | 32oz carton | 1-2 cups |
    | Pasta | 16oz box | 8oz (half box) |
    | Rice | 2 lb bag | 1-2 cups dry |
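The two tables can be encoded as lookups. This sketch includes only a subset of rows, and the substring matching is an assumption; the caller is expected to tag any returned amount with *(estimated)*.

```python
from typing import Optional

# Subset of the two tables above (hypothetical encoding); extend with the remaining rows.
VAGUE_PHRASES = {
    "some chicken": "~1 lb",
    "a bit of garlic": "2-3 cloves",
    "handful of spinach": "~2 cups",
    "drizzle of oil": "1-2 tbsp",
    "splash of soy sauce": "1-2 tbsp",
    "some rice": "1 cup dry",
    "diced onion": "1 medium onion",
}
PACKAGE_DEFAULTS = {
    "ground beef": "1 lb",
    "ground turkey": "1 lb",
    "chicken breast": "1.5 lbs",
    "shredded cheese": "1-2 cups",
    "canned beans": "1 can",
    "pasta": "8oz (half box)",
}


def infer_amount(mention: str) -> Optional[str]:
    """Amount to use when the source gives none, or None if no rule applies."""
    text = mention.lower()
    # Vague phrases are checked before bare item names.
    for phrase, amount in VAGUE_PHRASES.items():
        if phrase in text:
            return amount
    for item, amount in PACKAGE_DEFAULTS.items():
        if item in text:
            return amount
    return None
```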

    Context-Aware Scaling

    By recipe type:

    • Stir fry for 2 → 1 lb protein, 4 cups veggies

    • Soup/stew → 1.5-2 lbs protein, 4 cups broth

    • Sheet pan meal → 1.5 lbs protein, 3-4 cups veggies

    • Appetizers → smaller portions, estimate ~12-15 pieces per batch


    By servings mentioned:
    • "Serves 4" → Scale standard amounts for 4

    • "Meal prep for the week" → Assume 5-8 servings

    • No servings mentioned → Default to 4 servings


    By protein target (if user has macro goals):
    • 40-50g protein per serving → ~6-8oz cooked meat per portion

    • Scale recipe protein accordingly
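The servings defaults and linear scaling above, sketched in Python. Choosing 6 for "meal prep" (inside the stated 5-8 range) and treating standard amounts as written for the 4-serving default are assumptions; the function names are hypothetical.

```python
import re

DEFAULT_SERVINGS = 4
MEAL_PREP_SERVINGS = 6  # assumption: a value inside the 5-8 range above


def infer_servings(text: str) -> int:
    """'Serves N' wins; 'meal prep' implies a batch; otherwise the default of 4."""
    match = re.search(r"\bserves\s+(\d+)", text, re.IGNORECASE)
    if match:
        return int(match.group(1))
    if "meal prep" in text.lower():
        return MEAL_PREP_SERVINGS
    return DEFAULT_SERVINGS


def scale(amount: float, target_servings: int, base_servings: int = DEFAULT_SERVINGS) -> float:
    """Linear scaling of an amount written for base_servings."""
    return amount * target_servings / base_servings
```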


    Output Format

    Always present inferred amounts clearly:

    ### Ingredients
    - 1 lb ground turkey *(estimated)*
    - 1 medium onion, diced *(estimated)*
    - 2 cups broth *(estimated based on typical soup)*

    Mark inferred quantities with *(estimated)* so the user can tell which amounts came from the source and which were inferred.

    Combined Extraction Flow

    1. TRY CAPTION (instant)
       └── yt-dlp --dump-json → parse description
       └── Recipe found? → DONE ✅
       └── Check for "pinned" / "in comments" / "check comments" → FLAG
       
    2. IF FLAGGED: CHECK FOR CREATOR COMMENT
       └── Look through comments for creator's username
       └── If creator comment found with recipe → DONE ✅
       └── If not found → continue + notify user
    
    3. TRY AUDIO (30-60 sec)
       └── Download video
       └── Extract audio with ffmpeg
       └── Transcribe with Whisper (base model)
       └── Parse transcript for recipe
       └── Infer missing measurements
       └── Recipe found? → DONE ✅
    
    4. PRESENT RESULTS + PROMPT IF NEEDED
       └── Show what was extracted from audio
       └── If "pinned" was flagged, tell user:
           "The creator mentioned the full recipe is pinned in the comments.
            I extracted what I could from the audio, but if you want the 
            exact measurements, paste the pinned comment here and I'll 
            merge it with what I found."
       
    5. TRY FRAME ANALYSIS (if audio incomplete)
       └── Extract 5-8 key frames with ffmpeg
       └── Send to Claude vision
       └── Ask: "Extract any recipe text, ingredients, or measurements shown"
       └── Merge findings with audio transcript
       
    6. FALLBACK (nothing found)
       └── Inform user: "Recipe wasn't in caption or audio/video"
       └── Offer: search for similar recipe based on video title/description
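A skeleton of the flow above with the layers passed in as callables, so it can be exercised without network access. It covers steps 1, 3, 5 and 6; the comment lookup in step 2 is left out because, as the pinned-comment section notes, yt-dlp cannot reach pinned comments without login. It also runs the frame step before presenting anything, which is one reading of "if audio incomplete". All names are hypothetical.

```python
# Hints from step 1; the Pinned Comment Detection section lists more phrases.
PINNED_HINTS = ("pinned", "in comments", "check comments")


def is_complete(recipe) -> bool:
    """A recipe is complete once it has both ingredients and instructions."""
    return bool(recipe and recipe.get("ingredients") and recipe.get("instructions"))


def merge(base: dict, extra) -> dict:
    """Keep fields already found; fill empty ones from the later layer."""
    merged = dict(base)
    for key, value in (extra or {}).items():
        if not merged.get(key):
            merged[key] = value
    return merged


def extract_recipe(caption, parse_caption, transcribe_audio, read_frames) -> dict:
    """Each layer callable returns a partial recipe dict or None."""
    pinned_flag = any(hint in caption.lower() for hint in PINNED_HINTS)
    recipe = parse_caption(caption) or {}
    sources = ["caption"] if recipe else []
    # Later layers run only while something is still missing.
    for name, layer in (("audio", transcribe_audio), ("frames", read_frames)):
        if is_complete(recipe):
            break
        found = layer()
        if found:
            recipe = merge(recipe, found)
            sources.append(name)
    # complete=False corresponds to step 6: tell the user and offer a similar recipe.
    return {"recipe": recipe, "sources": sources,
            "complete": is_complete(recipe), "pinned_flag": pinned_flag}
```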

    Frame Analysis

    Extract key frames and analyze with vision model.

    Extract frames:

    # Extract 1 frame every 5 seconds
    ffmpeg -i /tmp/reel.mp4 -vf "fps=1/5" /tmp/frame_%02d.jpg
    
    # Or extract every 30th frame (about 1 frame per second for 30 fps video)
    ffmpeg -i /tmp/reel.mp4 -vf "select='not(mod(n,30))'" -vsync vfr /tmp/frame_%02d.jpg
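Neither command above targets an exact frame count. To get the 5-8 key frames mentioned in the combined flow, the fps filter rate can be set to frames ÷ duration, with the duration taken from the duration field of the yt-dlp JSON (assumed present). The count is approximate because the filter rounds to the nearest output timestamp; the function name and default of 6 frames are hypothetical.

```python
def evenly_spaced_frames_command(video: str, duration_s: float, n_frames: int = 6,
                                 pattern: str = "/tmp/frame_%02d.jpg") -> list[str]:
    """ffmpeg argv that samples roughly n_frames across the whole clip."""
    if duration_s <= 0 or n_frames <= 0:
        raise ValueError("duration_s and n_frames must be positive")
    # fps=N/D emits one frame every D/N seconds, i.e. about N frames over D seconds.
    return ["ffmpeg", "-i", video, "-vf", f"fps={n_frames}/{duration_s:g}", pattern]
```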

    Send to vision model:
    Use Claude's image analysis to read each frame:

    • Recipe cards / title screens

    • Ingredient lists shown on screen

    • Measurements in text overlays

    • Step-by-step instructions displayed


    Vision prompt:
    Analyze this frame from a cooking video. Extract any:
    - Recipe name or title
    - Ingredients with quantities
    - Cooking instructions
    - Nutritional information / macros
    - Any other recipe-related text shown
    
    If no recipe text is visible, respond with "No recipe text found."

    Merge strategy:

    • Audio transcript = primary source (spoken instructions)

    • Frame analysis = supplement (exact measurements, recipe cards)

    • Combine both; prefer exact measurements read from frames over amounts inferred from audio
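A sketch of the merge rule. The text only fixes that measurements from frames beat amounts inferred from audio; the full ranking here (caption, then frames, then spoken audio, then estimated) is an assumption. Keeping the winning source alongside the quantity lets the output still mark estimates.

```python
# Assumed ranking, lower is better. The text only fixes frames > inferred-from-audio.
SOURCE_RANK = {"caption": 0, "frames": 1, "audio": 2, "estimated": 3}


def merge_ingredients(*candidates):
    """candidates: (item, quantity, source) tuples from any layer.

    Returns {item: (quantity, source)}, keeping the best-ranked source per item.
    """
    best = {}
    for item, quantity, source in candidates:
        key = item.strip().lower()
        if key not in best or SOURCE_RANK[source] < SOURCE_RANK[best[key][1]]:
            best[key] = (quantity, source)
    return best
```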


    Pinned Comment Detection

    Scan caption for these phrases (case-insensitive):

    • "recipe pinned"

    • "pinned in comments"

    • "check comments"

    • "in the comments"

    • "comment below"

    • "recipe below"

    • "full recipe in comments"
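Phrase detection reduces to a case-insensitive substring check. This sketch (function name hypothetical) also collapses line breaks and repeated spaces so a phrase split across lines still matches.

```python
PINNED_PHRASES = (
    "recipe pinned",
    "pinned in comments",
    "check comments",
    "in the comments",
    "comment below",
    "recipe below",
    "full recipe in comments",
)


def mentions_pinned_recipe(caption: str) -> bool:
    """True if any of the phrases above appears, ignoring case and line breaks."""
    text = " ".join(caption.lower().split())  # collapse newlines / repeated spaces
    return any(phrase in text for phrase in PINNED_PHRASES)
```

Generic calls to action such as "Comment below if you want more!" also trigger "comment below", so a hit is a reason to show the notice, not proof that the comments contain a recipe.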


    If detected, flag and notify user after extraction:

    "Heads up — the creator said the recipe is pinned in the comments.
    I got what I could from the audio, but yt-dlp can't access pinned comments
    without login. If you want the exact recipe, copy the pinned comment and
    send it to me — I'll format it properly."

    Requirements

    • yt-dlp: brew install yt-dlp
    • ffmpeg: brew install ffmpeg
    • whisper: pip3 install openai-whisper (runs locally, no API key)
    • No Instagram login required for public reels