audiopod
Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumen.
Installation
npx clawhub@latest install audiopod
Documentation
AudioPod AI
Full audio processing API: music generation, stem separation, TTS, noise reduction, transcription, speaker separation, wallet management.
Setup
pip install audiopod # Python
npm install audiopod # Node.js
Auth: set the AUDIOPOD_API_KEY environment variable, or pass the key to the client constructor.
Getting an API Key
API keys start with ap_.
from audiopod import AudioPod
client = AudioPod() # uses AUDIOPOD_API_KEY env var
# or: client = AudioPod(api_key="ap_...")
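The two auth options above can be wrapped in a small helper that resolves the key before the client is constructed. This is a minimal sketch: resolve_api_key is a hypothetical name, not part of the audiopod SDK, and the ap_ prefix check is an assumption based on the key format shown in the examples.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the API key from an explicit argument or AUDIOPOD_API_KEY.

    Hypothetical helper, not part of the audiopod SDK.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("AUDIOPOD_API_KEY")
    if not key:
        raise ValueError("Set AUDIOPOD_API_KEY or pass api_key explicitly")
    # Assumption: keys use the ap_ prefix shown in the examples above.
    if not key.startswith("ap_"):
        raise ValueError("API key does not start with 'ap_'")
    return key
```

Failing early here gives a clearer error than an authentication failure on the first API call.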
AI Music Generation
Generate songs, rap, instrumentals, samples, and vocals from text prompts.
Tasks: text2music (song with vocals), text2rap (rap), prompt2instrumental (instrumental), lyric2vocals (vocals only), text2samples (loops/samples), audio2audio (style transfer), songbloom
Python SDK
# Generate a full song with lyrics
result = client.music.song(
prompt="Upbeat pop, synth, drums, 120 bpm, female vocals, radio-ready",
lyrics="Verse 1:\nWalking down the street on a sunny day\n\nChorus:\nWe're on fire tonight!",
duration=60
)
print(result["output_url"])
# Generate rap
result = client.music.rap(
prompt="Lo-Fi Hip Hop, 100 BPM, male rap, melancholy, keyboard chords",
lyrics="Verse 1:\nStarted from the bottom, now we climbing...",
duration=60
)
# Generate instrumental (no lyrics needed)
result = client.music.instrumental(
prompt="Atmospheric ambient soundscape, uplifting, driving mood",
duration=30
)
# Generic generate with explicit task
result = client.music.generate(
prompt="Electronic dance music, high energy",
task="text2samples", # any task type
duration=30
)
# Async: submit then poll
job = client.music.create(
prompt="Chill lofi beat",
duration=30,
task="prompt2instrumental"
)
result = client.music.wait_for_completion(job["id"], timeout=600)
# Get available genre presets
presets = client.music.get_presets()
# List/manage jobs
jobs = client.music.list(skip=0, limit=50)
job = client.music.get(job_id=123)
client.music.delete(job_id=123)
cURL
# Song with lyrics
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"upbeat pop, synth, 120bpm, female vocals", "lyrics":"Walking down the street...", "audio_duration":60}'
# Rap
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"Lo-Fi Hip Hop, male rap, 100 BPM", "lyrics":"Started from the bottom...", "audio_duration":60}'
# Instrumental
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"ambient soundscape, uplifting", "audio_duration":30}'
# Samples/loops
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"drum loop, sad mood", "audio_duration":15}'
# Vocals only
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"clean vocals, happy", "lyrics":"Eternal chorus of unity...", "audio_duration":30}'
# Check job status / get result
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Get genre presets
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Parameters
| Field | Required | Description |
| prompt | yes | Style/genre description |
| lyrics | for song/rap/vocals | Song lyrics with verse/chorus structure |
| audio_duration | no | Duration in seconds (default: 30). The Python SDK examples pass it as duration |
| genre_preset | no | Genre preset name (from presets endpoint) |
| display_name | no | Track display name |
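The rules in this table can be checked before a request is sent. The sketch below builds a JSON-ready body for the raw HTTP API. build_music_payload and LYRIC_TASKS are hypothetical names, and the mapping of "song/rap/vocals" to the text2music, text2rap, and lyric2vocals tasks is taken from the task list earlier in this section.

```python
# Tasks that need lyrics according to the parameter table (song, rap, vocals).
LYRIC_TASKS = {"text2music", "text2rap", "lyric2vocals"}


def build_music_payload(task, prompt, lyrics=None, audio_duration=30,
                        genre_preset=None, display_name=None):
    """Build a JSON-ready request body; hypothetical helper, not SDK code."""
    if not prompt:
        raise ValueError("prompt is required")
    if task in LYRIC_TASKS and not lyrics:
        raise ValueError(f"lyrics are required for task {task!r}")
    payload = {"prompt": prompt, "audio_duration": audio_duration}
    optional = {"lyrics": lyrics, "genre_preset": genre_preset,
                "display_name": display_name}
    # Only send optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```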
Stem Separation
Split audio into individual instrument/vocal tracks.
Modes
| Mode | Stems | Output | Use Case |
| single | 1 | Specified stem only | Vocal isolation, drum extraction |
| two | 2 | vocals + instrumental | Karaoke tracks |
| four | 4 | vocals, drums, bass, other | Standard remixing (default) |
| six | 6 | + guitar, piano | Full instrument separation |
| producer | 8 | + kick, snare, hihat | Beat production |
| studio | 12 | + cymbals, sub_bass, synth | Professional mixing |
| mastering | 16 | Maximum detail | Forensic analysis |
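Since processing more stems than needed is likely wasteful, one option is to pick the smallest mode that covers the required stem count. The sketch below copies the counts from the table; smallest_mode_for is a hypothetical helper, and client.stems.modes() remains the authoritative source for the live list.

```python
# Stem counts copied from the Modes table above.
STEM_COUNTS = {"single": 1, "two": 2, "four": 4, "six": 6,
               "producer": 8, "studio": 12, "mastering": 16}


def smallest_mode_for(min_stems):
    """Return the mode with the fewest stems that still yields min_stems."""
    candidates = [(count, mode) for mode, count in STEM_COUNTS.items()
                  if count >= min_stems]
    if not candidates:
        raise ValueError(f"no mode provides {min_stems} stems")
    return min(candidates)[1]
```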
Python SDK
# Sync: extract and wait for result
result = client.stems.separate(
url="",
mode="six",
timeout=600
)
for stem, url in result["download_urls"].items():
print(f"{stem}: {url}")
# From local file
result = client.stems.separate(file="/path/to/song.mp3", mode="four")
# Single stem extraction
result = client.stems.separate(
url="",
mode="single",
stem="vocals"
)
# Async: submit then poll
job = client.stems.extract(url="", mode="six")
print(f"Job ID: {job['id']}")
status = client.stems.status(job["id"])
# or wait:
result = client.stems.wait_for_completion(job["id"], timeout=600)
# List available modes
modes = client.stems.modes()
# Job management
jobs = client.stems.list(skip=0, limit=50, status="COMPLETED")
job = client.stems.get(job_id=1234)
client.stems.delete(job_id=1234)
cURL
# Extract from URL
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url=" \
-F "mode=six"
# Extract from file
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "file=@/path/to/song.mp3" \
-F "mode=four"
# Single stem
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url=URL" \
-F "mode=single" \
-F "stem=vocals"
# Check job status
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List available modes
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs (filter by status: PENDING, PROCESSING, COMPLETED, FAILED)
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Get specific job
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Response Format
{
"id": 1234,
"status": "COMPLETED",
"download_urls": {
"vocals": "",
"drums": "",
"bass": "",
"other": ""
},
"quality_scores": {
"vocals": 0.95,
"drums": 0.88
}
}
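A response in this shape can be screened for weak separations before the stems are downloaded. This is a sketch only: the 0.9 threshold is an arbitrary assumption, as is the reading of quality_scores as a 0 to 1 scale where higher is better.

```python
import json

sample = json.loads("""
{"id": 1234, "status": "COMPLETED",
 "download_urls": {"vocals": "", "drums": "", "bass": "", "other": ""},
 "quality_scores": {"vocals": 0.95, "drums": 0.88}}
""")


def low_quality_stems(job, threshold=0.9):
    """List stems whose quality score falls below threshold.

    Stems without a score are skipped rather than treated as failures.
    """
    scores = job.get("quality_scores", {})
    return sorted(stem for stem in job.get("download_urls", {})
                  if stem in scores and scores[stem] < threshold)


print(low_quality_stems(sample))  # ['drums']
```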
Text to Speech
Generate speech from text with 50+ voices in 60+ languages. Supports voice cloning.
Voice Types
- 50+ production-ready voices — multilingual, supporting 60+ languages with auto-detection
- Custom clones — clone any voice with ~5 seconds of audio sample
Python SDK
# Generate speech and wait for result
result = client.voice.generate(
text="Hello, world! This is a test.",
voice_id=123,
speed=1.0
)
print(result["output_url"])
# Async: submit then poll
job = client.voice.speak(
text="Hello world",
voice_id=123,
speed=1.0
)
status = client.voice.get_job(job["id"])
result = client.voice.wait_for_completion(job["id"], timeout=300)
# List all available voices
voices = client.voice.list()
for v in voices:
print(f"{v['id']}: {v['name']}")
# Clone a voice (needs ~5 sec audio sample)
new_voice = client.voice.create(
name="My Voice Clone",
audio_file="./sample.mp3",
description="Cloned from recording"
)
# Get/delete voice
voice = client.voice.get(voice_id=123)
client.voice.delete(voice_id=123)
cURL (Raw HTTP — most reliable)
# List all voices
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Generate speech (FORM DATA, not JSON!)
curl -X POST "" \
-H "Authorization: Bearer $AUDIOPOD_API_KEY" \
-d "input_text=Hello world, this is a test" \
-d "audio_format=mp3" \
-d "speed=1.0"
# Poll job status
curl "" \
-H "Authorization: Bearer $AUDIOPOD_API_KEY"
# SDK-style endpoints (alternative)
# Generate via SDK endpoint
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"text":"Hello world","voice_id":123,"speed":1.0}'
# Poll via SDK endpoint
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List voices (SDK endpoint)
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Clone a voice
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "name=My Voice" \
-F "file=@sample.mp3" \
-F "description=Cloned voice"
# Delete voice
curl -X DELETE "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Generate Parameters
| Field | Required | Description |
| input_text | yes | Text to speak (max 5000 chars). Use input_text for raw HTTP, text for SDK |
| audio_format | no | mp3, wav, ogg (default: mp3) |
| speed | no | 0.25 - 4.0 (default: 1.0) |
| language | no | ISO code, auto-detected if omitted |
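The limits in this table can be enforced client-side before the form is posted. The following sketch validates the fields and returns a URL-encoded body suitable for the raw HTTP endpoint. build_tts_form is a hypothetical helper, and it assumes the documented ranges are inclusive.

```python
from urllib.parse import urlencode

ALLOWED_FORMATS = {"mp3", "wav", "ogg"}


def build_tts_form(input_text, audio_format="mp3", speed=1.0, language=None):
    """Validate against the table above and return a form-encoded body."""
    if not input_text or len(input_text) > 5000:
        raise ValueError("input_text must be 1 to 5000 characters")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio_format: {audio_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    fields = {"input_text": input_text, "audio_format": audio_format,
              "speed": speed}
    if language:
        fields["language"] = language  # omitted means auto-detect
    return urlencode(fields)
```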
Response Format
// Generate response
{"job_id": 12345, "status": "pending", "credits_reserved": 25}
// Status response (completed)
{"status": "completed", "output_url": ""}
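Because generation returns a pending job first, a caller has to poll until the status changes. The sketch below shows one way to do that with a caller-supplied fetch_status function, so it does not depend on any particular endpoint or SDK method. wait_for_job is a hypothetical name, and treating "failed" as a terminal state for TTS jobs is an assumption borrowed from the stem job statuses.

```python
import time


def wait_for_job(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until the job finishes or timeout elapses.

    fetch_status is a caller-supplied function returning a status dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        # Responses above use "completed"; stem jobs use "COMPLETED".
        state = str(job.get("status", "")).lower()
        if state == "completed":
            return job
        if state == "failed":
            raise RuntimeError(f"job failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)
```

Injecting sleep keeps the loop easy to test without real delays.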
Important Notes
- Raw HTTP generate endpoint uses form data, not JSON. Field is
input_textnottext - SDK endpoint (
/api/v1/voice/tts/generate) uses JSON with fieldtext - Output files may be WAV disguised as .mp3 — convert with
ffmpeg -i output.mp3 -c:a aac real.m4a - ~55 credits per generation, wallet-based billing
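Whether a downloaded file is really WAV can be checked from its first bytes before deciding to convert it. The sketch below relies on the standard RIFF/WAVE container signature (the bytes RIFF at offset 0 and WAVE at offset 8). The function names are hypothetical.

```python
def looks_like_wav(header):
    """True if the first 12 bytes match the RIFF/WAVE container signature."""
    return len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def sniff_file(path):
    """Read just enough of a file to classify it as 'wav' or 'other'."""
    with open(path, "rb") as fh:
        return "wav" if looks_like_wav(fh.read(12)) else "other"
```

If sniff_file("output.mp3") returns "wav", the ffmpeg conversion from the notes above applies.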
Speaker Separation
Separate audio by speaker with automatic diarization.
Python SDK
# Diarize and wait for result
result = client.speaker.identify(
file="./meeting.mp3",
num_speakers=3, # optional hint for accuracy
timeout=600
)
for segment in result["segments"]:
print(f"Speaker {segment['speaker']}: {segment['text']} [{segment['start']:.1f}s - {segment['end']:.1f}s]")
# From URL
result = client.speaker.identify(
url="",
num_speakers=2
)
# Async: submit then poll
job = client.speaker.diarize(
file="./meeting.mp3",
num_speakers=3
)
result = client.speaker.wait_for_completion(job["id"], timeout=600)
# Job management
jobs = client.speaker.list(skip=0, limit=50, status="COMPLETED")
job = client.speaker.get(job_id=123)
client.speaker.delete(job_id=123)
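Once segments are available, a common follow-up is to total the talk time per speaker. The sketch below assumes only the segment keys read in the loop above (speaker, start, end). The demo data is made up for illustration.

```python
from collections import defaultdict


def talk_time(segments):
    """Sum seconds spoken per speaker from diarization segments."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


# Made-up segments in the shape the loop above reads.
demo = [
    {"speaker": "A", "text": "Hi", "start": 0.0, "end": 2.5},
    {"speaker": "B", "text": "Hello", "start": 2.5, "end": 4.0},
    {"speaker": "A", "text": "Bye", "start": 4.0, "end": 5.0},
]
print(talk_time(demo))  # {'A': 3.5, 'B': 1.5}
```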
cURL
# Diarize from file
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "file=@meeting.mp3" \
-F "num_speakers=3"
# Diarize from URL
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url=" \
-F "num_speakers=2"
# Check job status
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Speech to Text (Transcription)
Transcribe audio/video with speaker diarization, word-level timestamps, and multiple output formats.
Python SDK
# Transcribe URL and wait
result = client.transcription.transcribe(
url="",
speaker_diarization=True,
min_speakers=2,
max_speakers=5,
timeout=600
)
print(f"Language: {result['detected_language']}")
for seg in result["segments"]:
print(f"[{seg['start']:.1f}s] {seg.get('speaker','?')}: {seg['text']}")
# Batch: multiple URLs at once
result = client.transcription.transcribe(
urls=["", ""],
speaker_diarization=True
)
# Upload local file
job = client.transcription.upload(
file_path="./recording.mp3",
language="en",
speaker_diarization=True
)
result = client.transcription.wait_for_completion(job["id"], timeout=600)
# Async: submit then poll
job = client.transcription.create(
url="",
language="en",
speaker_diarization=True,
word_timestamps=True,
min_speakers=2,
max_speakers=4
)
result = client.transcription.wait_for_completion(job["id"], timeout=600)
# Get transcript in different formats
transcript_json = client.transcription.get_transcript(job_id=123, format="json")
transcript_srt = client.transcription.get_transcript(job_id=123, format="srt")
transcript_vtt = client.transcription.get_transcript(job_id=123, format="vtt")
transcript_txt = client.transcription.get_transcript(job_id=123, format="txt")
# Job management
jobs = client.transcription.list(skip=0, limit=50, status="COMPLETED")
job = client.transcription.get(job_id=123)
client.transcription.delete(job_id=123)
cURL
# Transcribe from URL
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"","enable_speaker_diarization":true,"word_timestamps":true}'
# Transcribe multiple URLs
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"urls":["URL1","URL2"],"enable_speaker_diarization":true}'
# Upload file for transcription
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "files=@recording.mp3" \
-F "language=en" \
-F "enable_speaker_diarization=true"
# Get job status
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Get transcript in specific format (json, srt, vtt, txt)
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Parameters
| Field | Required | Description |
| url / urls | yes (or file) | URL(s) to transcribe (YouTube, SoundCloud, direct links) |
| language | no | ISO 639-1 code (auto-detected if omitted) |
| enable_speaker_diarization | no | Enable speaker identification (default: false) |
| min_speakers / max_speakers | no | Speaker count hints for better diarization |
| word_timestamps | no | Enable word-level timestamps (default: true) |
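The Python SDK examples pass speaker_diarization, while the raw HTTP examples and this table use enable_speaker_diarization. When moving between the two, a small rename step can help. This is a sketch under the assumption that the other field names are identical in both styles. FIELD_NAMES and to_http_fields are hypothetical.

```python
# SDK keyword -> raw HTTP field, as far as the examples above show.
FIELD_NAMES = {"speaker_diarization": "enable_speaker_diarization"}


def to_http_fields(**sdk_kwargs):
    """Rename SDK-style keywords to raw HTTP field names, dropping None."""
    return {FIELD_NAMES.get(k, k): v
            for k, v in sdk_kwargs.items() if v is not None}
```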
Output Formats
- json — Full structured output with segments, timestamps, speakers
- srt — SubRip subtitle format
- vtt — WebVTT subtitle format
- txt — Plain text transcript
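To illustrate how the json and srt formats relate, the sketch below renders segments locally as SubRip cues. It is not the service's implementation, and requesting format="srt" from the API is the simpler route. It assumes each segment has start, end, and text keys, with times in seconds.

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by SubRip."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{span}\n{seg['text']}\n")
    return "\n".join(cues)
```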
Noise Reduction
Remove background noise from audio/video files.
Python SDK
# Denoise and wait for result
result = client.denoiser.denoise(file="./noisy-audio.mp3", timeout=600)
print(f"Clean audio: {result['output_url']}")
# From URL
result = client.denoiser.denoise(url="")
# Async: submit then poll
job = client.denoiser.create(file="./noisy-audio.mp3")
result = client.denoiser.wait_for_completion(job["id"], timeout=600)
# From URL (async)
job = client.denoiser.create(url="")
# Job management
jobs = client.denoiser.list(skip=0, limit=50, status="COMPLETED")
job = client.denoiser.get(job_id=123)
client.denoiser.delete(job_id=123)
cURL
# Denoise from file
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "file=@noisy-audio.mp3"
# Denoise from URL
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-F "url="
# Check job status
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# List jobs
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Delete job
curl -X DELETE "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
Wallet & Billing
Check balance, estimate costs, and view usage history.
Python SDK
# Get current balance
balance = client.wallet.get_balance()
print(f"Balance: ${balance['balance_usd']}")
# Check if balance is sufficient for an operation
check = client.wallet.check_balance(
service_type="stem_extraction",
duration_seconds=180
)
print(f"Sufficient: {check['sufficient']}")
# Estimate cost before running
estimate = client.wallet.estimate_cost(
service_type="transcription",
duration_seconds=300
)
print(f"Cost: ${estimate['cost_usd']}")
# Get pricing for all services
pricing = client.wallet.get_pricing()
# View usage history
usage = client.wallet.get_usage(page=1, limit=50)
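The balance check above can serve as a guard in front of expensive jobs. The sketch below wraps it so that a shortfall raises instead of being silently ignored. It relies only on the check_balance call and the sufficient key shown above. ensure_funds, InsufficientBalance, and FakeWallet are hypothetical, and FakeWallet exists only so the example runs offline.

```python
class InsufficientBalance(RuntimeError):
    """Raised when the wallet cannot cover a planned operation."""


def ensure_funds(wallet, service_type, duration_seconds):
    """Call wallet.check_balance and raise if the result is not sufficient."""
    check = wallet.check_balance(service_type=service_type,
                                 duration_seconds=duration_seconds)
    if not check.get("sufficient"):
        raise InsufficientBalance(
            f"not enough balance for {duration_seconds}s of {service_type}")
    return check


class FakeWallet:
    """Stand-in for client.wallet so the sketch runs offline."""

    def __init__(self, sufficient):
        self.sufficient = sufficient

    def check_balance(self, service_type, duration_seconds):
        return {"sufficient": self.sufficient}
```

In real use, client.wallet would be passed in place of FakeWallet.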
cURL
# Get balance
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Check balance sufficiency
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"service_type":"stem_extraction","duration_seconds":180}'
# Estimate cost
curl -X POST "" \
-H "X-API-Key: $AUDIOPOD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"service_type":"transcription","duration_seconds":300}'
# Get pricing
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
# Usage history
curl "" \
-H "X-API-Key: $AUDIOPOD_API_KEY"
API Endpoint Summary
| Service | Endpoint | Method |
| Music | /api/v1/music/{task} | POST |
| Music jobs | /api/v1/music/jobs/{id} | GET/DELETE |
| Music presets | /api/v1/music/presets | GET |
| Stems | /api/v1/stem-extraction/api/extract | POST (multipart) |
| Stems status | /api/v1/stem-extraction/status/{id} | GET |
| Stems modes | /api/v1/stem-extraction/modes | GET |
| Stems jobs | /api/v1/stem-extraction/jobs | GET |
| TTS generate | /api/v1/voice/voices/{uuid}/generate | POST (form data) |
| TTS generate (SDK) | /api/v1/voice/tts/generate | POST (JSON) |
| TTS status | /api/v1/voice/tts-jobs/{id}/status | GET |
| TTS status (SDK) | /api/v1/voice/tts/status/{id} | GET |
| Voice list | /api/v1/voice/voice-profiles | GET |
| Voice list (SDK) | /api/v1/voice/voices | GET |
| Speaker | /api/v1/speaker/diarize | POST (multipart) |
| Speaker jobs | /api/v1/speaker/jobs/{id} | GET/DELETE |
| Transcribe URL | /api/v1/transcribe/transcribe | POST (JSON) |
| Transcribe upload | /api/v1/transcribe/transcribe-upload | POST (multipart) |
| Transcript output | /api/v1/transcribe/jobs/{id}/transcript?format= | GET |
| Transcribe jobs | /api/v1/transcribe/jobs | GET |
| Denoise | /api/v1/denoiser/denoise | POST (multipart) |
| Denoise jobs | /api/v1/denoiser/jobs/{id} | GET/DELETE |
| Wallet balance | /api/v1/api-wallet/balance | GET |
| Wallet pricing | /api/v1/api-wallet/pricing | GET |
| Wallet usage | /api/v1/api-wallet/usage | GET |
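The path templates in this table can be expanded into full URLs with a small helper. In the sketch below the host is a placeholder, since this document does not state the API base URL. endpoint_url is a hypothetical name.

```python
def endpoint_url(base_url, template, **params):
    """Fill a path template such as /api/v1/music/jobs/{id}."""
    return base_url.rstrip("/") + template.format(**params)


# Placeholder host; substitute the real API base URL.
BASE = "https://api.example.invalid"
print(endpoint_url(BASE, "/api/v1/music/jobs/{id}", id=123))
print(endpoint_url(BASE, "/api/v1/stem-extraction/status/{id}", id=1234))
```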
Auth Headers
Two auth styles work:
- X-API-Key: ap_... (works for most endpoints)
- Authorization: Bearer ap_... (works for TTS generate/status)
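A request helper can choose between the two header styles listed above. This is a minimal sketch; auth_headers and its tts_raw flag are hypothetical names, and which endpoints accept which style should be confirmed against the live API.

```python
def auth_headers(api_key, tts_raw=False):
    """Pick the header style described above.

    tts_raw=True selects the Bearer form used by the raw TTS
    generate/status calls; everything else uses X-API-Key.
    """
    if tts_raw:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}
```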
Known Issues
- SDK method signatures may differ from raw API — when in doubt, use cURL examples
- TTS output is stored on Cloudflare R2; download it via output_url in the job status
- TTS output files may be WAV disguised as .mp3; convert with ffmpeg before sending via WhatsApp