audiopod

Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumen.

Installation

npx clawhub@latest install audiopod

Documentation

AudioPod AI

Full audio processing API: music generation, stem separation, TTS, noise reduction, transcription, speaker separation, wallet management.

Setup

pip install audiopod  # Python
npm install audiopod  # Node.js

Auth: set the AUDIOPOD_API_KEY environment variable, or pass the key to the client constructor.

Getting an API Key

  • Sign up at (free, no credit card required)
  • Go to
  • Click "Create API Key" and copy the key (starts with ap_)
  • Add funds to your wallet at (pay-as-you-go, no subscription)

Then create a client:

    from audiopod import AudioPod
    client = AudioPod()  # uses AUDIOPOD_API_KEY env var
    # or: client = AudioPod(api_key="ap_...")
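The key lookup described above (explicit argument first, then the AUDIOPOD_API_KEY environment variable) can be sketched as a small pre-flight helper. The function name `resolve_api_key` is hypothetical and not part of the SDK; the only facts taken from this document are the variable name and the `ap_` prefix.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, preferring an explicit argument over the environment.

    Hypothetical helper: it only checks what this document states, namely
    that the key lives in AUDIOPOD_API_KEY and starts with "ap_".
    """
    key = (explicit or env.get("AUDIOPOD_API_KEY", "")).strip()
    if not key:
        raise ValueError("No API key: set AUDIOPOD_API_KEY or pass one explicitly")
    if not key.startswith("ap_"):
        # Keys are documented as starting with "ap_"; anything else is
        # most likely a copy/paste mistake.
        raise ValueError("API key does not start with 'ap_'")
    return key
```

It could then be used as `client = AudioPod(api_key=resolve_api_key())`, which fails early with a readable message instead of at the first request.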

    AI Music Generation

    Generate songs, rap, instrumentals, samples, and vocals from text prompts.

    Tasks: text2music (song with vocals), text2rap (rap), prompt2instrumental (instrumental), lyric2vocals (vocals only), text2samples (loops/samples), audio2audio (style transfer), songbloom

    Python SDK

    # Generate a full song with lyrics
    result = client.music.song(
        prompt="Upbeat pop, synth, drums, 120 bpm, female vocals, radio-ready",
        lyrics="Verse 1:\nWalking down the street on a sunny day\n\nChorus:\nWe're on fire tonight!",
        duration=60
    )
    print(result["output_url"])
    
    # Generate rap
    result = client.music.rap(
        prompt="Lo-Fi Hip Hop, 100 BPM, male rap, melancholy, keyboard chords",
        lyrics="Verse 1:\nStarted from the bottom, now we climbing...",
        duration=60
    )
    
    # Generate instrumental (no lyrics needed)
    result = client.music.instrumental(
        prompt="Atmospheric ambient soundscape, uplifting, driving mood",
        duration=30
    )
    
    # Generic generate with explicit task
    result = client.music.generate(
        prompt="Electronic dance music, high energy",
        task="text2samples",  # any task type
        duration=30
    )
    
    # Async: submit then poll
    job = client.music.create(
        prompt="Chill lofi beat", 
        duration=30, 
        task="prompt2instrumental"
    )
    result = client.music.wait_for_completion(job["id"], timeout=600)
    
    # Get available genre presets
    presets = client.music.get_presets()
    
    # List/manage jobs
    jobs = client.music.list(skip=0, limit=50)
    job = client.music.get(job_id=123)
    client.music.delete(job_id=123)

    cURL

    # Song with lyrics
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"upbeat pop, synth, 120bpm, female vocals", "lyrics":"Walking down the street...", "audio_duration":60}'
    
    # Rap
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"Lo-Fi Hip Hop, male rap, 100 BPM", "lyrics":"Started from the bottom...", "audio_duration":60}'
    
    # Instrumental
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"ambient soundscape, uplifting", "audio_duration":30}'
    
    # Samples/loops
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"drum loop, sad mood", "audio_duration":15}'
    
    # Vocals only
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"prompt":"clean vocals, happy", "lyrics":"Eternal chorus of unity...", "audio_duration":30}'
    
    # Check job status / get result
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Get genre presets
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List jobs
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Delete job
    curl -X DELETE "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    Parameters

    | Field | Required | Description |
    |---|---|---|
    | prompt | yes | Style/genre description |
    | lyrics | for song/rap/vocals | Song lyrics with verse/chorus structure |
    | audio_duration | no | Duration in seconds (default: 30). The Python SDK examples pass this as duration |
    | genre_preset | no | Genre preset name (from presets endpoint) |
    | display_name | no | Track display name |
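As a rough illustration of how the fields in this table fit together, the sketch below builds the JSON body for a raw HTTP request and checks the "required" column before anything is sent. `build_music_payload` is a hypothetical helper, and mapping "song/rap/vocals" to the tasks text2music, text2rap, and lyric2vocals is my reading of the task list, not something the API is known to enforce.

```python
from typing import Any, Dict, Optional

# Assumption: these are the tasks the table means by "song/rap/vocals".
TASKS_REQUIRING_LYRICS = {"text2music", "text2rap", "lyric2vocals"}


def build_music_payload(task: str, prompt: str,
                        lyrics: Optional[str] = None,
                        audio_duration: int = 30,
                        genre_preset: Optional[str] = None,
                        display_name: Optional[str] = None) -> Dict[str, Any]:
    """Build a request body using the field names from the parameter table."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if task in TASKS_REQUIRING_LYRICS and not (lyrics and lyrics.strip()):
        raise ValueError(f"lyrics are required for task {task!r}")
    if audio_duration <= 0:
        raise ValueError("audio_duration must be positive")

    payload: Dict[str, Any] = {"prompt": prompt, "audio_duration": audio_duration}
    optional = {
        "lyrics": lyrics,
        "genre_preset": genre_preset,
        "display_name": display_name,
    }
    # Only send optional fields that were actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dict can be serialized with `json.dumps` and sent as the `-d` body shown in the cURL examples.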

    Stem Separation

    Split audio into individual instrument/vocal tracks.

    Modes

    | Mode | Stems | Output | Use Case |
    |---|---|---|---|
    | single | 1 | Specified stem only | Vocal isolation, drum extraction |
    | two | 2 | vocals + instrumental | Karaoke tracks |
    | four | 4 | vocals, drums, bass, other | Standard remixing (default) |
    | six | 6 | + guitar, piano | Full instrument separation |
    | producer | 8 | + kick, snare, hihat | Beat production |
    | studio | 12 | + cymbals, sub_bass, synth | Professional mixing |
    | mastering | 16 | Maximum detail | Forensic analysis |

    Single stem options: vocals, drums, bass, guitar, piano, other
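The mode table can be mirrored client-side to catch typos before uploading a large file. This is a minimal sketch: `build_stem_fields` is a hypothetical helper, and the rule that `stem` is only accepted together with `mode="single"` is an assumption inferred from the table, not confirmed API behavior.

```python
from typing import Dict, Optional

# Stem counts copied from the mode table.
STEM_COUNTS = {
    "single": 1, "two": 2, "four": 4, "six": 6,
    "producer": 8, "studio": 12, "mastering": 16,
}
SINGLE_STEM_OPTIONS = {"vocals", "drums", "bass", "guitar", "piano", "other"}


def build_stem_fields(mode: str = "four", stem: Optional[str] = None) -> Dict[str, str]:
    """Return the form fields (mode, and stem if relevant) for an extraction request."""
    if mode not in STEM_COUNTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(STEM_COUNTS)}")
    if mode == "single":
        if stem not in SINGLE_STEM_OPTIONS:
            raise ValueError(f"mode 'single' needs stem in {sorted(SINGLE_STEM_OPTIONS)}")
        return {"mode": mode, "stem": stem}
    if stem is not None:
        # Assumption: a stem name is meaningless outside single mode.
        raise ValueError("stem is only valid with mode='single'")
    return {"mode": mode}
```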

    Python SDK

    # Sync: extract and wait for result
    result = client.stems.separate(
        url="",
        mode="six",
        timeout=600
    )
    for stem, url in result["download_urls"].items():
        print(f"{stem}: {url}")
    
    # From local file
    result = client.stems.separate(file="/path/to/song.mp3", mode="four")
    
    # Single stem extraction
    result = client.stems.separate(
        url="",
        mode="single",
        stem="vocals"
    )
    
    # Async: submit then poll
    job = client.stems.extract(url="", mode="six")
    print(f"Job ID: {job['id']}")
    status = client.stems.status(job["id"])
    # or wait:
    result = client.stems.wait_for_completion(job["id"], timeout=600)
    
    # List available modes
    modes = client.stems.modes()
    
    # Job management
    jobs = client.stems.list(skip=0, limit=50, status="COMPLETED")
    job = client.stems.get(job_id=1234)
    client.stems.delete(job_id=1234)

    cURL

    # Extract from URL
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "url=" \
      -F "mode=six"
    
    # Extract from file
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "file=@/path/to/song.mp3" \
      -F "mode=four"
    
    # Single stem
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "url=URL" \
      -F "mode=single" \
      -F "stem=vocals"
    
    # Check job status
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List available modes
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List jobs (filter by status: PENDING, PROCESSING, COMPLETED, FAILED)
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Get specific job
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Delete job
    curl -X DELETE "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    Response Format

    {
      "id": 1234,
      "status": "COMPLETED",
      "download_urls": {
        "vocals": "",
        "drums": "",
        "bass": "",
        "other": ""
      },
      "quality_scores": {
        "vocals": 0.95,
        "drums": 0.88
      }
    }
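A response shaped like the one above can be post-processed to pair each stem with its quality score and to flag stems that may need a second look. The helper names and the 0.9 threshold are illustrative assumptions; note that `quality_scores` in the sample does not cover every stem, so missing scores are reported as `None` rather than guessed.

```python
from typing import Dict, List, Optional, Tuple


def summarize_stems(response: Dict) -> List[Tuple[str, str, Optional[float]]]:
    """Return (stem, download_url, score_or_None) for a completed job."""
    if response.get("status") != "COMPLETED":
        raise RuntimeError(
            f"job {response.get('id')} is not complete: {response.get('status')}"
        )
    scores = response.get("quality_scores", {})
    return [
        (name, url, scores.get(name))
        for name, url in response.get("download_urls", {}).items()
    ]


def low_quality_stems(response: Dict, threshold: float = 0.9) -> List[str]:
    """Names of stems whose reported score is below the threshold."""
    return [
        name
        for name, _url, score in summarize_stems(response)
        if score is not None and score < threshold
    ]
```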

    Text to Speech

    Generate speech from text with 50+ voices in 60+ languages. Supports voice cloning.

    Voice Types

    • 50+ production-ready voices — multilingual, supporting 60+ languages with auto-detection
    • Custom clones — clone any voice with ~5 seconds of audio sample

    Python SDK

    # Generate speech and wait for result
    result = client.voice.generate(
        text="Hello, world! This is a test.",
        voice_id=123,
        speed=1.0
    )
    print(result["output_url"])
    
    # Async: submit then poll
    job = client.voice.speak(
        text="Hello world",
        voice_id=123,
        speed=1.0
    )
    status = client.voice.get_job(job["id"])
    result = client.voice.wait_for_completion(job["id"], timeout=300)
    
    # List all available voices
    voices = client.voice.list()
    for v in voices:
        print(f"{v['id']}: {v['name']}")
    
    # Clone a voice (needs ~5 sec audio sample)
    new_voice = client.voice.create(
        name="My Voice Clone",
        audio_file="./sample.mp3",
        description="Cloned from recording"
    )
    
    # Get/delete voice
    voice = client.voice.get(voice_id=123)
    client.voice.delete(voice_id=123)

    cURL (Raw HTTP — most reliable)

    # List all voices
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Generate speech (FORM DATA, not JSON!)
    curl -X POST "" \
      -H "Authorization: Bearer $AUDIOPOD_API_KEY" \
      -d "input_text=Hello world, this is a test" \
      -d "audio_format=mp3" \
      -d "speed=1.0"
    
    # Poll job status
    curl "" \
      -H "Authorization: Bearer $AUDIOPOD_API_KEY"
    
    # SDK-style endpoints (alternative)
    # Generate via SDK endpoint
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"text":"Hello world","voice_id":123,"speed":1.0}'
    
    # Poll via SDK endpoint
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List voices (SDK endpoint)
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Clone a voice
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "name=My Voice" \
      -F "file=@sample.mp3" \
      -F "description=Cloned voice"
    
    # Delete voice
    curl -X DELETE "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    Generate Parameters

    | Field | Required | Description |
    |---|---|---|
    | input_text | yes | Text to speak (max 5000 chars). Use input_text for raw HTTP, text for SDK |
    | audio_format | no | mp3, wav, ogg (default: mp3) |
    | speed | no | 0.25 - 4.0 (default: 1.0) |
    | language | no | ISO code, auto-detected if omitted |
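The limits in this table (5000 characters, three formats, speed between 0.25 and 4.0) are easy to check locally before spending credits. The sketch below assumes the raw HTTP form-data endpoint, so it uses `input_text` and returns string values; `build_tts_form` is a hypothetical name.

```python
from typing import Dict, Optional

ALLOWED_FORMATS = {"mp3", "wav", "ogg"}
MAX_CHARS = 5000
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_tts_form(input_text: str, audio_format: str = "mp3",
                   speed: float = 1.0, language: Optional[str] = None) -> Dict[str, str]:
    """Validate TTS parameters and return them as form fields."""
    if not input_text.strip():
        raise ValueError("input_text is required")
    if len(input_text) > MAX_CHARS:
        raise ValueError(f"input_text is {len(input_text)} chars; max is {MAX_CHARS}")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"audio_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    form = {"input_text": input_text, "audio_format": audio_format, "speed": str(speed)}
    if language:
        # Omit the field entirely so the service can auto-detect the language.
        form["language"] = language
    return form
```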

    Response Format

    // Generate response
    {"job_id": 12345, "status": "pending", "credits_reserved": 25}
    
    // Status response (completed)
    {"status": "completed", "output_url": ""}
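The submit-then-poll flow implied by these two responses can be sketched without committing to any particular HTTP client: the caller supplies a function that fetches the status dict. `wait_for_job` is a hypothetical helper, not the SDK's `wait_for_completion`. Status strings are lower-cased before comparison because this document shows both "completed" (TTS) and "COMPLETED"/"FAILED" (stems).

```python
import time
from typing import Callable, Dict


def wait_for_job(fetch_status: Callable[[], Dict],
                 interval: float = 2.0,
                 timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Dict:
    """Call fetch_status() until the job completes, fails, or the timeout passes."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        state = str(job.get("status", "")).lower()
        if state == "completed":
            return job
        if state == "failed":
            raise RuntimeError(f"job failed: {job}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {state or 'unknown'} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable; in normal use only `fetch_status` needs to be supplied.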

    Important Notes

    • Raw HTTP generate endpoint uses form data, not JSON. Field is input_text not text
    • SDK endpoint (/api/v1/voice/tts/generate) uses JSON with field text
    • Output files may be WAV disguised as .mp3 — convert with ffmpeg -i output.mp3 -c:a aac real.m4a
    • ~55 credits per generation, wallet-based billing
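Since an output file may be WAV data under an .mp3 name, it can help to check the container signature before deciding whether to run ffmpeg. A WAV file begins with the ASCII bytes `RIFF`, a 4-byte size, then `WAVE`; the sketch below tests only that signature and makes no claim about how the service actually encodes its output.

```python
def looks_like_wav(header: bytes) -> bool:
    """True if the first 12 bytes carry the RIFF/WAVE container signature."""
    return len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE"


def file_is_wav(path: str) -> bool:
    """Read just enough of the file to inspect its signature."""
    with open(path, "rb") as fh:
        return looks_like_wav(fh.read(12))
```

If `file_is_wav("output.mp3")` returns `True`, the file is a candidate for the ffmpeg conversion mentioned in the notes above.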

    Speaker Separation

    Separate audio by speaker with automatic diarization.

    Python SDK

    # Diarize and wait for result
    result = client.speaker.identify(
        file="./meeting.mp3",
        num_speakers=3,  # optional hint for accuracy
        timeout=600
    )
    for segment in result["segments"]:
        print(f"Speaker {segment['speaker']}: {segment['text']} [{segment['start']:.1f}s - {segment['end']:.1f}s]")
    
    # From URL
    result = client.speaker.identify(
        url="",
        num_speakers=2
    )
    
    # Async: submit then poll
    job = client.speaker.diarize(
        file="./meeting.mp3",
        num_speakers=3
    )
    result = client.speaker.wait_for_completion(job["id"], timeout=600)
    
    # Job management
    jobs = client.speaker.list(skip=0, limit=50, status="COMPLETED")
    job = client.speaker.get(job_id=123)
    client.speaker.delete(job_id=123)
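Once a result with `segments` is available, a common follow-up is to total the speaking time per speaker. The sketch assumes each segment carries the `speaker`, `start`, and `end` keys used in the print loop above; `talk_time_by_speaker` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping


def talk_time_by_speaker(segments: Iterable[Mapping]) -> Dict[Hashable, float]:
    """Sum segment durations per speaker, longest talker first."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for seg in segments:
        # Guard against malformed segments where end precedes start.
        totals[seg["speaker"]] += max(0.0, seg["end"] - seg["start"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```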

    cURL

    # Diarize from file
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "file=@meeting.mp3" \
      -F "num_speakers=3"
    
    # Diarize from URL
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "url=" \
      -F "num_speakers=2"
    
    # Check job status
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List jobs
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Delete job
    curl -X DELETE "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    Speech to Text (Transcription)

    Transcribe audio/video with speaker diarization, word-level timestamps, and multiple output formats.

    Python SDK

    # Transcribe URL and wait
    result = client.transcription.transcribe(
        url="",
        speaker_diarization=True,
        min_speakers=2,
        max_speakers=5,
        timeout=600
    )
    print(f"Language: {result['detected_language']}")
    for seg in result["segments"]:
        print(f"[{seg['start']:.1f}s] {seg.get('speaker','?')}: {seg['text']}")
    
    # Batch: multiple URLs at once
    result = client.transcription.transcribe(
        urls=["", ""],
        speaker_diarization=True
    )
    
    # Upload local file
    job = client.transcription.upload(
        file_path="./recording.mp3",
        language="en",
        speaker_diarization=True
    )
    result = client.transcription.wait_for_completion(job["id"], timeout=600)
    
    # Async: submit then poll
    job = client.transcription.create(
        url="",
        language="en",
        speaker_diarization=True,
        word_timestamps=True,
        min_speakers=2,
        max_speakers=4
    )
    result = client.transcription.wait_for_completion(job["id"], timeout=600)
    
    # Get transcript in different formats
    transcript_json = client.transcription.get_transcript(job_id=123, format="json")
    transcript_srt = client.transcription.get_transcript(job_id=123, format="srt")
    transcript_vtt = client.transcription.get_transcript(job_id=123, format="vtt")
    transcript_txt = client.transcription.get_transcript(job_id=123, format="txt")
    
    # Job management
    jobs = client.transcription.list(skip=0, limit=50, status="COMPLETED")
    job = client.transcription.get(job_id=123)
    client.transcription.delete(job_id=123)

    cURL

    # Transcribe from URL
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"url":"","enable_speaker_diarization":true,"word_timestamps":true}'
    
    # Transcribe multiple URLs
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"urls":["URL1","URL2"],"enable_speaker_diarization":true}'
    
    # Upload file for transcription
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "files=@recording.mp3" \
      -F "language=en" \
      -F "enable_speaker_diarization=true"
    
    # Get job status
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Get transcript in specific format (json, srt, vtt, txt)
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List jobs
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Delete job
    curl -X DELETE "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    Parameters

    | Field | Required | Description |
    |---|---|---|
    | url / urls | yes (or file) | URL(s) to transcribe (YouTube, SoundCloud, direct links) |
    | language | no | ISO 639-1 code (auto-detected if omitted) |
    | enable_speaker_diarization | no | Enable speaker identification (default: false) |
    | min_speakers / max_speakers | no | Speaker count hints for better diarization |
    | word_timestamps | no | Enable word-level timestamps (default: true) |

    Output Formats

    • json — Full structured output with segments, timestamps, speakers
    • srt — SubRip subtitle format
    • vtt — WebVTT subtitle format
    • txt — Plain text transcript
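To show how the json and srt outputs relate, here is a minimal local conversion from segments to SubRip text. It assumes segments with `start`, `end`, `text`, and an optional `speaker`, as in the SDK example, and it is not claimed to match the service's own srt output byte for byte. Requesting `format="srt"` from the API remains the simpler route.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker is not None else seg["text"]
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```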

    Noise Reduction

    Remove background noise from audio/video files.

    Python SDK

    # Denoise and wait for result
    result = client.denoiser.denoise(file="./noisy-audio.mp3", timeout=600)
    print(f"Clean audio: {result['output_url']}")
    
    # From URL
    result = client.denoiser.denoise(url="")
    
    # Async: submit then poll
    job = client.denoiser.create(file="./noisy-audio.mp3")
    result = client.denoiser.wait_for_completion(job["id"], timeout=600)
    
    # From URL (async)
    job = client.denoiser.create(url="")
    
    # Job management
    jobs = client.denoiser.list(skip=0, limit=50, status="COMPLETED")
    job = client.denoiser.get(job_id=123)
    client.denoiser.delete(job_id=123)

    cURL

    # Denoise from file
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "file=@noisy-audio.mp3"
    
    # Denoise from URL
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "url="
    
    # Check job status
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # List jobs
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Delete job
    curl -X DELETE "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    Wallet & Billing

    Check balance, estimate costs, and view usage history.

    Python SDK

    # Get current balance
    balance = client.wallet.get_balance()
    print(f"Balance: ${balance['balance_usd']}")
    
    # Check if balance is sufficient for an operation
    check = client.wallet.check_balance(
        service_type="stem_extraction",
        duration_seconds=180
    )
    print(f"Sufficient: {check['sufficient']}")
    
    # Estimate cost before running
    estimate = client.wallet.estimate_cost(
        service_type="transcription",
        duration_seconds=300
    )
    print(f"Cost: ${estimate['cost_usd']}")
    
    # Get pricing for all services
    pricing = client.wallet.get_pricing()
    
    # View usage history
    usage = client.wallet.get_usage(page=1, limit=50)
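The balance and estimate calls above can be combined into a single pre-flight check. This sketch assumes the responses expose `balance_usd` and `cost_usd` as shown in the print statements; whether those values arrive as numbers or strings is not documented here, so both are converted through `Decimal`. `can_afford` and `reserve_usd` are hypothetical names.

```python
from decimal import Decimal
from typing import Mapping


def can_afford(balance: Mapping, estimate: Mapping, reserve_usd: str = "0") -> bool:
    """True if the wallet covers the estimated cost while keeping a reserve."""
    # str() first so floats such as 0.1 do not carry binary rounding noise.
    available = Decimal(str(balance["balance_usd"])) - Decimal(reserve_usd)
    return available >= Decimal(str(estimate["cost_usd"]))
```

For example, `can_afford(client.wallet.get_balance(), estimate)` could gate a long transcription job.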

    cURL

    # Get balance
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Check balance sufficiency
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"service_type":"stem_extraction","duration_seconds":180}'
    
    # Estimate cost
    curl -X POST "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"service_type":"transcription","duration_seconds":300}'
    
    # Get pricing
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    
    # Usage history
    curl "" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    API Endpoint Summary

    | Service | Endpoint | Method |
    |---|---|---|
    | Music | /api/v1/music/{task} | POST |
    | Music jobs | /api/v1/music/jobs/{id} | GET/DELETE |
    | Music presets | /api/v1/music/presets | GET |
    | Stems | /api/v1/stem-extraction/api/extract | POST (multipart) |
    | Stems status | /api/v1/stem-extraction/status/{id} | GET |
    | Stems modes | /api/v1/stem-extraction/modes | GET |
    | Stems jobs | /api/v1/stem-extraction/jobs | GET |
    | TTS generate | /api/v1/voice/voices/{uuid}/generate | POST (form data) |
    | TTS generate (SDK) | /api/v1/voice/tts/generate | POST (JSON) |
    | TTS status | /api/v1/voice/tts-jobs/{id}/status | GET |
    | TTS status (SDK) | /api/v1/voice/tts/status/{id} | GET |
    | Voice list | /api/v1/voice/voice-profiles | GET |
    | Voice list (SDK) | /api/v1/voice/voices | GET |
    | Speaker | /api/v1/speaker/diarize | POST (multipart) |
    | Speaker jobs | /api/v1/speaker/jobs/{id} | GET/DELETE |
    | Transcribe URL | /api/v1/transcribe/transcribe | POST (JSON) |
    | Transcribe upload | /api/v1/transcribe/transcribe-upload | POST (multipart) |
    | Transcript output | /api/v1/transcribe/jobs/{id}/transcript?format= | GET |
    | Transcribe jobs | /api/v1/transcribe/jobs | GET |
    | Denoise | /api/v1/denoiser/denoise | POST (multipart) |
    | Denoise jobs | /api/v1/denoiser/jobs/{id} | GET/DELETE |
    | Wallet balance | /api/v1/api-wallet/balance | GET |
    | Wallet pricing | /api/v1/api-wallet/pricing | GET |
    | Wallet usage | /api/v1/api-wallet/usage | GET |
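For raw HTTP use, the path templates in this table can be kept in one place and expanded on demand. The sketch below copies a few rows as an example; `build_url` is a hypothetical helper, and the base URL is deliberately a required argument because this document does not state the API host.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# A subset of the endpoint table; extend with further rows as needed.
ENDPOINTS = {
    "music": "/api/v1/music/{task}",
    "music_job": "/api/v1/music/jobs/{id}",
    "stems_status": "/api/v1/stem-extraction/status/{id}",
    "transcript": "/api/v1/transcribe/jobs/{id}/transcript",
    "wallet_balance": "/api/v1/api-wallet/balance",
}


def build_url(base_url: str, name: str,
              query: Optional[Mapping[str, str]] = None,
              **path_params: object) -> str:
    """Expand a named path template against base_url and append query parameters."""
    # A missing path parameter raises KeyError instead of producing a broken URL.
    path = ENDPOINTS[name].format(**path_params)
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urlencode(query)
    return url
```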

    Auth Headers

    Two auth styles work:

    • X-API-Key: ap_... — works for most endpoints

    • Authorization: Bearer ap_... — works for TTS generate/status
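Choosing between the two header styles can be automated by looking at the request path. The sketch follows the cURL examples in this document, where Bearer auth appears only on the raw TTS generate and status endpoints. The path patterns are taken from the endpoint summary, and `auth_headers` is a hypothetical helper rather than SDK behavior.

```python
from typing import Dict


def uses_bearer(path: str) -> bool:
    """True for the raw TTS generate/status paths listed in the endpoint summary."""
    is_tts_generate = path.startswith("/api/v1/voice/voices/") and path.endswith("/generate")
    is_tts_status = path.startswith("/api/v1/voice/tts-jobs/")
    return is_tts_generate or is_tts_status


def auth_headers(api_key: str, path: str) -> Dict[str, str]:
    """Return the auth header that the examples use for the given path."""
    if uses_bearer(path):
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}
```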


    Known Issues

    • SDK method signatures may differ from raw API — when in doubt, use cURL examples
    • TTS output is stored on Cloudflare R2; download it via the output_url field in the job status response
    • TTS output files may be WAV disguised as .mp3 — convert with ffmpeg before sending via WhatsApp