discord-voice
Real-time voice conversations in Discord with Claude AI.
Installation
npx clawhub@latest install discord-voice
View the full skill documentation and source below.
Documentation
Discord Voice Plugin for Clawdbot
Real-time voice conversations in Discord voice channels. Join a voice channel, speak, and have your words transcribed, processed by Claude, and spoken back.
Features
- Join/Leave Voice Channels: Via slash commands, CLI, or agent tool
- Voice Activity Detection (VAD): Automatically detects when users are speaking
- Speech-to-Text: Whisper API (OpenAI) or Deepgram
- Streaming STT: Real-time transcription with Deepgram WebSocket (~1s latency reduction)
- Agent Integration: Transcribed speech is routed through the Clawdbot agent
- Text-to-Speech: OpenAI TTS or ElevenLabs
- Audio Playback: Responses are spoken back in the voice channel
- Barge-in Support: Stops speaking immediately when user starts talking
- Auto-reconnect: Automatic heartbeat monitoring and reconnection on disconnect
Requirements
- Discord bot with voice permissions (Connect, Speak, Use Voice Activity)
- API keys for STT and TTS providers
- System dependencies for voice: ffmpeg (audio processing)
- Native build tools for @discordjs/opus and sodium-native
Installation
1. Install System Dependencies
# Ubuntu/Debian
sudo apt-get install ffmpeg build-essential python3
# Fedora/RHEL
sudo dnf install ffmpeg gcc-c++ make python3
# macOS
brew install ffmpeg
2. Install via ClawdHub
clawdhub install discord-voice
Or manually:
cd ~/.clawdbot/extensions
git clone <repository-url> discord-voice
cd discord-voice
npm install
3. Configure in clawdbot.json
{
"plugins": {
"entries": {
"discord-voice": {
"enabled": true,
"config": {
"sttProvider": "whisper",
"ttsProvider": "openai",
"ttsVoice": "nova",
"vadSensitivity": "medium",
"allowedUsers": [], // Empty = allow all users
"silenceThresholdMs": 1500,
"maxRecordingMs": 30000,
"openai": {
"apiKey": "sk-..." // Or use OPENAI_API_KEY env var
}
}
}
}
}
}
4. Discord Bot Setup
Ensure your Discord bot has these permissions:
- Connect - Join voice channels
- Speak - Play audio
- Use Voice Activity - Detect when users speak
Add these to your bot's OAuth2 URL or configure in Discord Developer Portal.
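For example, an invite URL granting just those three permissions might look like the following (36700160 is the combined permission integer for Connect + Speak + Use Voice Activity; replace YOUR_CLIENT_ID with your application's client ID):
https://discord.com/api/oauth2/authorize?client_id=YOUR_CLIENT_ID&permissions=36700160&scope=bot%20applications.commands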
Configuration
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Enable/disable the plugin |
| sttProvider | string | "whisper" | "whisper" or "deepgram" |
| streamingSTT | boolean | true | Use streaming STT (Deepgram only, ~1s faster) |
| ttsProvider | string | "openai" | "openai" or "elevenlabs" |
| ttsVoice | string | "nova" | Voice ID for TTS |
| vadSensitivity | string | "medium" | "low", "medium", or "high" |
| bargeIn | boolean | true | Stop speaking when user talks |
| allowedUsers | string[] | [] | User IDs allowed (empty = all) |
| silenceThresholdMs | number | 1500 | Silence before processing (ms) |
| maxRecordingMs | number | 30000 | Max recording length (ms) |
| heartbeatIntervalMs | number | 30000 | Connection health check interval (ms) |
| autoJoinChannel | string | undefined | Channel ID to auto-join on startup |
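For example, a config block (inside the plugin's config shown in step 3) that pairs Deepgram streaming STT with ElevenLabs TTS and auto-joins a channel on startup; IDs and keys are placeholders:
{
"sttProvider": "deepgram",
"streamingSTT": true,
"ttsProvider": "elevenlabs",
"vadSensitivity": "medium",
"autoJoinChannel": "123456789012345678",
"deepgram": { "apiKey": "..." },
"elevenlabs": { "apiKey": "...", "voiceId": "21m00Tcm4TlvDq8ikWAM" }
}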
Provider Configuration
OpenAI (Whisper + TTS)
{
"openai": {
"apiKey": "sk-...",
"whisperModel": "whisper-1",
"ttsModel": "tts-1"
}
}
ElevenLabs (TTS only)
{
"elevenlabs": {
"apiKey": "...",
"voiceId": "21m00Tcm4TlvDq8ikWAM", // Rachel
"modelId": "eleven_multilingual_v2"
}
}
Deepgram (STT only)
{
"deepgram": {
"apiKey": "...",
"model": "nova-2"
}
}
Usage
Slash Commands (Discord)
Once registered with Discord, use these commands:
- /voice join - Join a voice channel
- /voice leave - Leave the current voice channel
- /voice status - Show voice connection status
CLI Commands
# Join a voice channel
clawdbot voice join <channelId>
# Leave voice
clawdbot voice leave --guild <guildId>
# Check status
clawdbot voice status
Agent Tool
The agent can use the discord_voice tool:
Join voice channel 1234567890
The tool supports actions:
- join - Join a voice channel (requires channelId)
- leave - Leave the voice channel
- speak - Speak text in the voice channel
- status - Get current voice status
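The exact tool schema isn't documented here, but based on the actions above a join request plausibly carries an action plus a channel ID, roughly like (field names are an assumption):
{ "action": "join", "channelId": "1234567890" }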
How It Works
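At a high level, audio received from the voice channel is transcribed, routed through the Clawdbot agent, and the reply is synthesized and played back. A minimal sketch of that loop using @discordjs/voice follows; transcribe, runAgent, and synthesize are hypothetical stand-ins for the configured providers and the agent, not the plugin's actual internals:
import { Readable } from "node:stream";
import {
  createAudioPlayer,
  createAudioResource,
  EndBehaviorType,
  VoiceConnection,
} from "@discordjs/voice";

// Hypothetical stand-ins for the configured STT/TTS providers and the Clawdbot agent.
const transcribe = async (opusChunks: Buffer[]): Promise<string> => "...";
const runAgent = async (text: string): Promise<string> => "...";
const synthesize = async (text: string): Promise<Buffer> => Buffer.alloc(0);

function handleSpeaker(connection: VoiceConnection, userId: string) {
  const player = createAudioPlayer();
  connection.subscribe(player);

  // Record the user until they go quiet (compare silenceThresholdMs).
  const opusStream = connection.receiver.subscribe(userId, {
    end: { behavior: EndBehaviorType.AfterSilence, duration: 1500 },
  });

  const chunks: Buffer[] = [];
  opusStream.on("data", (chunk: Buffer) => chunks.push(chunk));
  opusStream.on("end", async () => {
    const text = await transcribe(chunks);  // STT (Opus decoding omitted here)
    const reply = await runAgent(text);     // route through the Clawdbot agent
    const audio = await synthesize(reply);  // TTS
    // Playback relies on ffmpeg to probe/transcode the synthesized audio.
    player.play(createAudioResource(Readable.from([audio])));
  });
}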
Streaming STT (Deepgram)
When using Deepgram as your STT provider, streaming mode is enabled by default. This provides:
- ~1 second faster end-to-end latency
- Real-time feedback with interim transcription results
- Automatic keep-alive to prevent connection timeouts
- Fallback to batch transcription if streaming fails
{
"sttProvider": "deepgram",
"streamingSTT": true, // default
"deepgram": {
"apiKey": "...",
"model": "nova-2"
}
}
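As an illustration of what streaming mode looks like under the hood, here is a minimal sketch using the @deepgram/sdk live client. This is a hypothetical reimplementation, not the plugin's code, and SDK details may differ between versions:
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

// Open a live (WebSocket) transcription session.
const live = deepgram.listen.live({ model: "nova-2", interim_results: true });

live.on(LiveTranscriptionEvents.Open, () => {
  // Audio chunks from the voice receiver are forwarded as they arrive,
  // instead of waiting for the full recording (hence the ~1s latency win):
  // live.send(audioChunk);

  // Keep the socket alive during pauses in speech.
  setInterval(() => live.keepAlive(), 10_000);
});

live.on(LiveTranscriptionEvents.Transcript, (event) => {
  const text = event.channel.alternatives[0]?.transcript ?? "";
  if (event.is_final && text) {
    console.log("final:", text);    // handed to the agent
  } else if (text) {
    console.log("interim:", text);  // real-time feedback
  }
});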
Barge-in Support
When enabled (default), the bot will immediately stop speaking if a user starts talking. This creates a more natural conversational flow where you can interrupt the bot.
To disable (let the bot finish speaking):
{
"bargeIn": false
}
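Conceptually, barge-in is just wiring the voice receiver's speaking events to the audio player. A minimal sketch with @discordjs/voice, illustrative rather than the plugin's actual implementation:
import { AudioPlayer, VoiceConnection } from "@discordjs/voice";

function enableBargeIn(connection: VoiceConnection, player: AudioPlayer, bargeIn: boolean) {
  if (!bargeIn) return;

  // The receiver emits "start" when Discord reports a user has begun speaking.
  connection.receiver.speaking.on("start", (userId) => {
    console.log(`[discord-voice] ${userId} started speaking, stopping playback`);
    player.stop(); // cut TTS playback immediately so the user can interrupt
  });
}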
Auto-reconnect
The plugin includes automatic connection health monitoring:
- Heartbeat checks every 30 seconds (configurable)
- Auto-reconnect on disconnect with exponential backoff
- Max 3 attempts before giving up
[discord-voice] Disconnected from voice channel
[discord-voice] Reconnection attempt 1/3
[discord-voice] Reconnected successfully
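One way to approximate that behavior with @discordjs/voice is to listen for the Disconnected state and retry rejoin() with exponential backoff. This is a sketch under those assumptions, not the plugin's exact logic:
import { setTimeout as wait } from "node:timers/promises";
import { VoiceConnection, VoiceConnectionStatus, entersState } from "@discordjs/voice";

function monitorConnection(connection: VoiceConnection, maxAttempts = 3) {
  connection.on(VoiceConnectionStatus.Disconnected, async () => {
    console.log("[discord-voice] Disconnected from voice channel");

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      await wait(1_000 * 2 ** (attempt - 1)); // exponential backoff: 1s, 2s, 4s
      console.log(`[discord-voice] Reconnection attempt ${attempt}/${maxAttempts}`);
      connection.rejoin();
      try {
        await entersState(connection, VoiceConnectionStatus.Ready, 5_000);
        console.log("[discord-voice] Reconnected successfully");
        return;
      } catch {
        // still not ready; fall through and try again
      }
    }

    connection.destroy(); // give up after maxAttempts
  });
}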
VAD Sensitivity
- low: Picks up quiet speech, may trigger on background noise
- medium: Balanced (recommended)
- high: Requires louder, clearer speech
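The plugin's exact thresholds aren't documented, but conceptually the presets map to how loud a frame must be before it counts as speech. A purely illustrative sketch with hypothetical threshold values:
// Hypothetical RMS thresholds per sensitivity preset (illustrative only).
const VAD_THRESHOLDS = { low: 0.01, medium: 0.03, high: 0.08 } as const;

type Sensitivity = keyof typeof VAD_THRESHOLDS;

// Returns true if a 16-bit PCM frame is loud enough to be treated as speech.
function isSpeech(frame: Int16Array, sensitivity: Sensitivity): boolean {
  let sumSquares = 0;
  for (const sample of frame) sumSquares += (sample / 32768) ** 2;
  const rms = Math.sqrt(sumSquares / frame.length);
  return rms > VAD_THRESHOLDS[sensitivity];
}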
Troubleshooting
"Discord client not available"
Ensure the Discord channel is configured and the bot is connected before using voice.
Opus/Sodium build errors
Install build tools, then rebuild the native modules:
npm install -g node-gyp
npm rebuild @discordjs/opus sodium-native
No audio heard
Check that the bot has the Speak permission in the channel and that your TTS provider's API key is configured.
Transcription not working
Check that your STT provider's API key is configured and review the debug logs (see below).
Enable debug logging
DEBUG=discord-voice clawdbot gateway start
Environment Variables
| Variable | Description |
DISCORD_TOKEN | Discord bot token (required) |
OPENAI_API_KEY | OpenAI API key (Whisper + TTS) |
ELEVENLABS_API_KEY | ElevenLabs API key |
DEEPGRAM_API_KEY | Deepgram API key |
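For example, as shell exports (values are placeholders; only set the keys for the providers you use):
export DISCORD_TOKEN="..."
export OPENAI_API_KEY="sk-..."
export ELEVENLABS_API_KEY="..."
export DEEPGRAM_API_KEY="..."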
Limitations
- Only one voice channel per guild at a time
- Maximum recording length: 30 seconds (configurable)
- Requires stable network for real-time audio
- TTS playback has a slight delay while the response is synthesized