ai-video-gen

End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, and voice-over.

Installation

npx clawhub@latest install ai-video-gen

Documentation

AI Video Generation Skill

Generate complete videos from text descriptions using AI.

Capabilities

  • Image Generation - DALL-E 3, Stable Diffusion, Flux

  • Video Generation - LumaAI, Runway, Replicate models

  • Voice-over - OpenAI TTS, ElevenLabs

  • Video Editing - FFmpeg assembly, transitions, overlays

Quick Start

    # Generate a complete video
    python skills/ai-video-gen/generate_video.py --prompt "A sunset over mountains" --output sunset.mp4
    
    # Just images to video
    python skills/ai-video-gen/images_to_video.py --images img1.png img2.png --output result.mp4
    
    # Add voiceover
    python skills/ai-video-gen/add_voiceover.py --video input.mp4 --text "Your narration" --output final.mp4

Setup

Required API Keys

Add to your environment or .env file:

    # Image Generation (pick one)
    OPENAI_API_KEY=sk-...              # DALL-E 3
    REPLICATE_API_TOKEN=r8_...         # Stable Diffusion, Flux
    
    # Video Generation (pick one)
    LUMAAI_API_KEY=luma_...           # LumaAI Dream Machine
    RUNWAY_API_KEY=...                # Runway ML
    REPLICATE_API_TOKEN=r8_...        # Multiple models
    
    # Voice (optional)
    OPENAI_API_KEY=sk-...             # OpenAI TTS
    ELEVENLABS_API_KEY=...            # ElevenLabs
    
    # Or use FREE local options (no API needed)
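
Before running the pipeline it can help to see which providers are actually usable with the keys you have set. The following is a minimal sketch using only the standard library; the variable names come from the list above, while the `PROVIDER_KEYS` grouping and the `configured_providers` helper are hypothetical and not part of the skill. It reads the process environment, so values kept only in a .env file need to be loaded first (for example with python-dotenv's `load_dotenv()`).

```python
import os

# Environment variable names are taken from the list above; the provider
# labels and this grouping are illustrative assumptions, not part of the skill.
PROVIDER_KEYS = {
    "image": {"DALL-E 3": "OPENAI_API_KEY", "Replicate": "REPLICATE_API_TOKEN"},
    "video": {
        "LumaAI": "LUMAAI_API_KEY",
        "Runway": "RUNWAY_API_KEY",
        "Replicate": "REPLICATE_API_TOKEN",
    },
    "voice": {"OpenAI TTS": "OPENAI_API_KEY", "ElevenLabs": "ELEVENLABS_API_KEY"},
}


def configured_providers(env=None):
    """Return {stage: [provider, ...]} for providers whose key is set and non-empty."""
    env = os.environ if env is None else env
    return {
        stage: [name for name, var in keys.items() if env.get(var)]
        for stage, keys in PROVIDER_KEYS.items()
    }


if __name__ == "__main__":
    for stage, names in configured_providers().items():
        print(f"{stage}: {', '.join(names) or 'none configured'}")
```

Because one key can serve several stages (OPENAI_API_KEY covers both DALL-E 3 and OpenAI TTS), it only needs to appear once in the .env file.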

Install Dependencies

    pip install openai requests pillow replicate python-dotenv

FFmpeg

FFmpeg must be installed and available on your PATH. In the author's environment it is already installed via winget.
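
To confirm that FFmpeg is reachable from Python, a quick check along these lines may be useful. It is a sketch, not part of the skill: `find_ffmpeg` and `ffmpeg_version_line` are hypothetical helper names, and the only FFmpeg option used is `-version`.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path):
    """Return the first line printed by `ffmpeg -version`."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH")
    else:
        print(ffmpeg_version_line(path))
```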

Usage Examples

1. Text to Video (Full Pipeline)

    python skills/ai-video-gen/generate_video.py \
      --prompt "A futuristic city at night with flying cars" \
      --duration 5 \
      --voiceover "Welcome to the future" \
      --output future_city.mp4
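
The same command can be driven from another Python program. The sketch below only assembles the flags shown above and hands them to `subprocess`; the helper names are hypothetical, and it assumes the script exits with a non-zero status on failure, which the documentation does not state.

```python
import subprocess
import sys

SCRIPT = "skills/ai-video-gen/generate_video.py"  # path as shown in the examples


def build_generate_command(prompt, output, duration=None, voiceover=None):
    """Build the argument list for generate_video.py from the documented flags."""
    cmd = [sys.executable, SCRIPT, "--prompt", prompt]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if voiceover is not None:
        cmd += ["--voiceover", voiceover]
    cmd += ["--output", output]
    return cmd


def generate(prompt, output, **options):
    """Run the script; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_generate_command(prompt, output, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.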

2. Multiple Scenes

    python skills/ai-video-gen/multi_scene.py \
      --scenes "Morning sunrise" "Busy city street" "Peaceful night" \
      --duration 3 \
      --output day_in_life.mp4

3. Image Sequence to Video

    python skills/ai-video-gen/images_to_video.py \
      --images frame1.png frame2.png frame3.png \
      --fps 24 \
      --output animation.mp4
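
For context, this is roughly what assembling numbered frames with FFmpeg looks like when done directly. The source of images_to_video.py is not shown here, so this is one common approach rather than a description of what the script does; `build_ffmpeg_command` and `frames_to_video` are hypothetical names. It assumes the frames are numbered sequentially (frame1.png, frame2.png, ...) and that FFmpeg was built with libx264.

```python
import subprocess


def build_ffmpeg_command(pattern, fps, output):
    """Build an FFmpeg command that encodes numbered frames into an H.264 MP4."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-framerate", str(fps),   # input frame rate for the image sequence
        "-i", pattern,            # e.g. frame%d.png matches frame1.png, frame2.png, ...
        "-c:v", "libx264",        # H.264 video encoder
        "-pix_fmt", "yuv420p",    # pixel format that most players can decode
        output,
    ]


def frames_to_video(pattern, fps, output):
    """Run FFmpeg; raises CalledProcessError if encoding fails."""
    subprocess.run(build_ffmpeg_command(pattern, fps, output), check=True)
```

Calling `frames_to_video("frame%d.png", 24, "animation.mp4")` would correspond to the example above. With yuv420p, libx264 expects the frame width and height to be even numbers.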

Workflow Options

Budget Mode (free or low-cost)

  • Image: Stable Diffusion (local or free API)
  • Video: Open-source models
  • Voice: OpenAI TTS (low-cost) or a free TTS engine
  • Edit: FFmpeg

Quality Mode (Paid)

  • Image: DALL-E 3 or Midjourney
  • Video: Runway Gen-3 or LumaAI
  • Voice: ElevenLabs
  • Edit: FFmpeg + effects

Scripts Reference

  • generate_video.py - Main end-to-end generator
  • images_to_video.py - Convert image sequence to video
  • add_voiceover.py - Add narration to existing video
  • multi_scene.py - Create multi-scene videos
  • edit_video.py - Apply effects, transitions, overlays

API Cost Estimates

  • DALL-E 3: ~$0.04-0.08 per image
  • Replicate: ~$0.01-0.10 per generation
  • LumaAI: $0-0.50 per 5 seconds (free tier available)
  • Runway: ~$0.05 per second
  • OpenAI TTS: ~$0.015 per 1K characters
  • ElevenLabs: ~$0.30 per 1K characters (higher quality)
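
As a worked example of how these rates add up, the sketch below estimates one run that uses DALL-E 3, Runway, and OpenAI TTS. The rates are copied from the list above; the `estimate_cost` helper and the assumption that a run is billed as "N images + S seconds of video + narration characters" are illustrative, since the skill's actual call pattern is not documented here.

```python
# (low, high) USD rates copied from the estimates above.
DALLE3_PER_IMAGE = (0.04, 0.08)
RUNWAY_PER_SECOND = (0.05, 0.05)
OPENAI_TTS_PER_1K_CHARS = (0.015, 0.015)


def estimate_cost(num_images, video_seconds, narration):
    """Return a (low, high) USD estimate for one DALL-E 3 + Runway + OpenAI TTS run."""
    chars_in_thousands = len(narration) / 1000
    parts = [
        (num_images, DALLE3_PER_IMAGE),
        (video_seconds, RUNWAY_PER_SECOND),
        (chars_in_thousands, OPENAI_TTS_PER_1K_CHARS),
    ]
    low = sum(qty * rate[0] for qty, rate in parts)
    high = sum(qty * rate[1] for qty, rate in parts)
    return low, high


if __name__ == "__main__":
    low, high = estimate_cost(1, 5, "Welcome to the future")
    print(f"Estimated cost: ${low:.2f}-${high:.2f}")
```

For one image, a 5-second clip, and the 21-character narration "Welcome to the future", this comes to roughly $0.29-0.33, almost all of it from the video generation step.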

Examples

See the examples/ folder for sample outputs and prompts.