pollinations

Pollinations.ai: text, images, videos, audio with 25+ models.

Installation

npx clawhub@latest install pollinations

Documentation

Pollinations 🧬

Unified AI platform for text, images, videos, and audio generation with 25+ models.

API Key

Get free or paid keys at

Secret Keys (sk_): Server-side, no rate limits (recommended)

A key is optional for many operations (free tier available)

Store the key in an environment variable:

export POLLINATIONS_API_KEY="sk_your_key_here"
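
Because a key is optional for many operations, a script may want to send the Authorization header only when one is configured. The following is a minimal sketch under that assumption; the helper name pollinations_auth_header is hypothetical and not part of the skill's scripts.

```shell
# Hypothetical helper: print the Authorization header line when a key is set.
# Prints nothing when POLLINATIONS_API_KEY is unset or empty, so the caller
# can fall back to an unauthenticated (free tier) request.
pollinations_auth_header() {
  if [ -n "${POLLINATIONS_API_KEY:-}" ]; then
    printf 'Authorization: Bearer %s\n' "$POLLINATIONS_API_KEY"
  fi
}
```

A caller can capture the output and pass it to curl with -H only when it is non-empty.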

Quick Start

Text Generation

Simple text generation:

curl ""

Chat completions (OpenAI-compatible):

curl -X POST  \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $POLLINATIONS_API_KEY" \
  -d '{
    "model": "openai",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Use script: scripts/chat.sh for easy chat completions

Image Generation

curl ""

Use script: scripts/image.sh for image generation

Audio Generation (TTS)

curl -X POST  \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-audio",
    "messages": [
      {"role": "system", "content": "You are a text reader. Read the user text exactly without responding, adding conversation, or changing anything."},
      {"role": "user", "content": "Say: Hello world"}
    ],
    "modalities": ["text", "audio"],
    "audio": {"voice": "nova", "format": "mp3"}
  }'

Use script: scripts/tts.sh for text-to-speech

API Endpoints

Base URLs

Chat/Text:

Simple Text:

Image:

Video: (generates video)

Supported Operations

1. Text/Chat Generation

Models: OpenAI, Claude, Gemini, Mistral, DeepSeek, Grok, Qwen Coder, Perplexity, and 20+ more

Common models: openai, claude, gemini, mistral, deepseek, qwen, gpt-4, o1, o3

Parameters:

model (string): Model name/ID

messages (array): Chat messages with roles (system/user/assistant)

temperature (number): 0-2, default 1

max_tokens (number): Max response length

top_p (number): Nucleus sampling, default 1

seed (number): Reproducibility (-1 for random)

jsonMode (boolean): Force JSON response

reasoning_effort (string): For o1/o3/R1 (high/medium/low/minimal/none)

thinking_budget (number): Tokens for reasoning (thinking models)
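
As an illustration of how the parameters above might be combined, the sketch below builds a chat completions request body in plain shell. It assumes that temperature and seed are sent as top-level fields of the JSON body, as in other OpenAI-compatible APIs; the helper names json_escape and chat_payload are hypothetical.

```shell
# Escape backslashes and double quotes so text can sit inside a JSON string.
# Assumption: the input is single-line text without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: chat_payload MODEL MESSAGE [TEMPERATURE] [SEED]
# Prints a single-line JSON body with one user message.
chat_payload() {
  model=$1
  content=$(json_escape "$2")
  temperature=${3:-1}   # documented default is 1
  seed=${4:--1}         # -1 asks for a random seed
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"temperature":%s,"seed":%s}\n' \
    "$model" "$content" "$temperature" "$seed"
}
```

The output can be passed to curl with -d "$(chat_payload openai "Hello")". For arbitrary multi-line input, a JSON-aware tool is a safer choice than sed-based escaping.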

Vision support: Include image_url in message content for multi-modal:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {"type": "image_url", "image_url": {"url": ""}}
  ]
}

2. Image Generation

Models: flux (default), turbo, gptimage, kontext, seedream, nanobanana, nanobanana-pro

Parameters:

model (string): Model selection

width/height (number): 16-2048px, default 1024

seed (number): Reproducibility

negative_prompt (string): What to avoid

nologo (boolean): Remove watermark

private (boolean): Private generation

safe (boolean): Enable NSFW filter

enhance (boolean): AI prompt enhancement

quality (string): low/medium/high/hd (gptimage)

transparent (boolean): Transparent background (gptimage)

count (number): 1-4 images (premium)

image (string): Input image URL (image-to-image)

Format: Returns binary image data (determined by Content-Type header)
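
The size limits above are easy to check before sending a request. The sketch below validates width and height against the documented 16-2048 range and assembles a query string; it assumes the parameters are passed as URL query parameters (as the enhance=true tip suggests), and the helper name image_query is hypothetical.

```shell
# Hypothetical helper: image_query [WIDTH] [HEIGHT] [SEED] [MODEL]
# Prints a query string, or returns 1 if a dimension is not a whole
# number in the documented 16-2048 range.
image_query() {
  width=${1:-1024}    # documented default
  height=${2:-1024}   # documented default
  for dim in "$width" "$height"; do
    case $dim in
      ''|*[!0-9]*)
        echo "dimension must be a whole number: $dim" >&2
        return 1 ;;
    esac
    if [ "$dim" -lt 16 ] || [ "$dim" -gt 2048 ]; then
      echo "dimension out of range (16-2048): $dim" >&2
      return 1
    fi
  done
  query="width=$width&height=$height&model=${4:-flux}"
  # Add a seed only when the caller supplied one.
  if [ -n "${3:-}" ]; then
    query="$query&seed=$3"
  fi
  printf '%s\n' "$query"
}
```

The resulting string can be appended after a "?" to the image URL for the prompt.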

3. Image to Image

Use the same image endpoint and pass the source image URL in the image parameter.

4. Video Generation

Models: veo (4-8s), seedance (2-10s)

Parameters:

model (string): veo or seedance

width/height (number): Dimensions

duration (number): Seconds (veo: 4/6/8, seedance: 2-10)

aspectRatio (string): 16:9 or 9:16

audio (boolean): Enable audio (veo only)

image (string): Input image URL (frame interpolation: image[0]=first, image[1]=last)

negative_prompt (string): What to avoid

seed (number): Reproducibility

private/safe (boolean): Privacy/safety options

Format: Returns binary video data
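
Since the two video models accept different durations, a client-side check can catch invalid requests early. This is a minimal sketch of the duration rules listed above (veo: 4, 6, or 8 seconds; seedance: 2 to 10 seconds); the helper name valid_video_duration is hypothetical.

```shell
# Hypothetical helper: valid_video_duration MODEL SECONDS
# Returns 0 when SECONDS is allowed for MODEL, 1 otherwise.
valid_video_duration() {
  case $2 in
    ''|*[!0-9]*) return 1 ;;   # reject anything that is not a whole number
  esac
  case $1 in
    veo)
      case $2 in
        4|6|8) return 0 ;;
        *) return 1 ;;
      esac ;;
    seedance)
      [ "$2" -ge 2 ] && [ "$2" -le 10 ] ;;
    *)
      return 1 ;;              # unknown model
  esac
}
```

For example, valid_video_duration veo 6 succeeds, while valid_video_duration veo 5 fails.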

5. Audio Generation (TTS)

Models: openai-audio

Voices: alloy, echo, fable, onyx, nova, shimmer, coral, verse, ballad, ash, sage, amuch, dan

Formats: mp3, wav, flac, opus, pcm16

Parameters:

model: openai-audio

modalities: ["text", "audio"]

audio.voice: Voice selection

audio.format: Output format

Note: Use "Say:" prefix in user message for direct text reading
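
The TTS request body can be assembled the same way as a chat request. The sketch below checks the voice and format against the lists above, adds the "Say:" prefix, and reuses the system prompt from the Quick Start example. The helper name tts_payload is hypothetical, and the nova and mp3 defaults are assumptions taken from that example rather than documented defaults.

```shell
# Hypothetical helper: tts_payload TEXT [VOICE] [FORMAT]
# Prints a single-line JSON body, or returns 1 for an unlisted voice or format.
tts_payload() {
  voice=${2:-nova}    # assumed default, taken from the Quick Start example
  format=${3:-mp3}    # assumed default, taken from the Quick Start example
  case $voice in
    alloy|echo|fable|onyx|nova|shimmer|coral|verse|ballad|ash|sage|amuch|dan) ;;
    *) echo "unknown voice: $voice" >&2; return 1 ;;
  esac
  case $format in
    mp3|wav|flac|opus|pcm16) ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
  # Escape backslashes and double quotes; assumes single-line text.
  text=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model":"openai-audio","messages":[{"role":"system","content":"%s"},{"role":"user","content":"Say: %s"}],"modalities":["text","audio"],"audio":{"voice":"%s","format":"%s"}}\n' \
    "You are a text reader. Read the user text exactly without responding, adding conversation, or changing anything." \
    "$text" "$voice" "$format"
}
```

The output can be sent with curl -d "$(tts_payload "Hello world")" to the chat completions endpoint.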

6. Audio Transcription

Use chat completions endpoint with vision/audio-capable models:

Models: gemini, gemini-large, gemini-legacy, openai-audio

Upload audio file as binary input

Include transcription prompt in system message

7. Image Analysis

Use chat completions with vision models:

Models: Any vision-capable model (gemini, claude, openai)

Include image_url in message content

8. Video Analysis

Use chat completions with video-capable models:

Models: gemini, claude, openai

Upload video file as binary input

Include analysis prompt

Scripts

`scripts/chat.sh`

Interactive chat completions with model selection and options.

Usage:

scripts/chat.sh "your message here"
scripts/chat.sh "your message" --model claude --temp 0.7

`scripts/image.sh`

Generate images from text prompts.

Usage:

scripts/image.sh "a sunset over mountains"
scripts/image.sh "a sunset" --model flux --width 1024 --height 1024 --seed 123

`scripts/tts.sh`

Convert text to speech.

Usage:

scripts/tts.sh "Hello world"
scripts/tts.sh "Hello world" --voice nova --format mp3 --output hello.mp3

Tips

Free tier available: Many operations work without an API key (rate limited)

OpenAI-compatible: Use chat endpoint with existing OpenAI integrations

Reproducibility: Use seed parameter for consistent outputs

Image enhancement: Enable enhance=true for AI-improved prompts

Video interpolation: Pass two images with image[0]=first&image[1]=last for veo

Audio reading: Always use "Say:" prefix and proper system prompt for TTS

API Documentation

Full docs:
