discord-voice

Real-time voice conversations in Discord with Claude AI.

Installation

npx clawhub@latest install discord-voice

Documentation

Discord Voice Plugin for Clawdbot

Real-time voice conversations in Discord voice channels. Join a voice channel, speak, and have your words transcribed, processed by Claude, and spoken back.

Features

  • Join/Leave Voice Channels: Via slash commands, CLI, or agent tool
  • Voice Activity Detection (VAD): Automatically detects when users are speaking
  • Speech-to-Text: Whisper API (OpenAI) or Deepgram
  • Streaming STT: Real-time transcription with Deepgram WebSocket (~1s latency reduction)
  • Agent Integration: Transcribed speech is routed through the Clawdbot agent
  • Text-to-Speech: OpenAI TTS or ElevenLabs
  • Audio Playback: Responses are spoken back in the voice channel
  • Barge-in Support: Stops speaking immediately when user starts talking
  • Auto-reconnect: Automatic heartbeat monitoring and reconnection on disconnect

Requirements

  • Discord bot with voice permissions (Connect, Speak, Use Voice Activity)
  • API keys for STT and TTS providers
  • System dependencies for voice:
      - ffmpeg (audio processing)
      - Native build tools for @discordjs/opus and sodium-native

Installation

1. Install System Dependencies

# Ubuntu/Debian
sudo apt-get install ffmpeg build-essential python3

# Fedora/RHEL
sudo dnf install ffmpeg gcc-c++ make python3

# macOS
brew install ffmpeg

2. Install via ClawdHub

clawdhub install discord-voice

Or manually:

cd ~/.clawdbot/extensions
git clone <repository-url> discord-voice
cd discord-voice
npm install

3. Configure in clawdbot.json

{
  "plugins": {
    "entries": {
      "discord-voice": {
        "enabled": true,
        "config": {
          "sttProvider": "whisper",
          "ttsProvider": "openai",
          "ttsVoice": "nova",
          "vadSensitivity": "medium",
          "allowedUsers": [],  // Empty = allow all users
          "silenceThresholdMs": 1500,
          "maxRecordingMs": 30000,
          "openai": {
            "apiKey": "sk-..."  // Or use OPENAI_API_KEY env var
          }
        }
      }
    }
  }
}

4. Discord Bot Setup

Ensure your Discord bot has these permissions:

  • Connect - Join voice channels
  • Speak - Play audio
  • Use Voice Activity - Detect when users speak


Add these permissions to your bot's OAuth2 invite URL, or configure them in the Discord Developer Portal.

Configuration

Option               Type      Default    Description
enabled              boolean   true       Enable/disable the plugin
sttProvider          string    "whisper"  "whisper" or "deepgram"
streamingSTT         boolean   true       Use streaming STT (Deepgram only, ~1s faster)
ttsProvider          string    "openai"   "openai" or "elevenlabs"
ttsVoice             string    "nova"     Voice ID for TTS
vadSensitivity       string    "medium"   "low", "medium", or "high"
bargeIn              boolean   true       Stop speaking when user talks
allowedUsers         string[]  []         User IDs allowed (empty = all)
silenceThresholdMs   number    1500       Silence before processing (ms)
maxRecordingMs       number    30000      Max recording length (ms)
heartbeatIntervalMs  number    30000      Connection health check interval (ms)
autoJoinChannel      string    undefined  Channel ID to auto-join on startup
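
One way to picture how these options and their defaults combine is a typed config with a merge step. The TypeScript sketch below is illustrative only and is not taken from the plugin's source: the names VoiceConfig, DEFAULTS, resolveConfig, and isUserAllowed are assumptions, while the option names and default values come from the table above.

```typescript
// Hypothetical types mirroring the options table; not the plugin's real source.
type SttProvider = "whisper" | "deepgram";
type TtsProvider = "openai" | "elevenlabs";
type VadSensitivity = "low" | "medium" | "high";

interface VoiceConfig {
  enabled: boolean;
  sttProvider: SttProvider;
  streamingSTT: boolean;
  ttsProvider: TtsProvider;
  ttsVoice: string;
  vadSensitivity: VadSensitivity;
  bargeIn: boolean;
  allowedUsers: string[];
  silenceThresholdMs: number;
  maxRecordingMs: number;
  heartbeatIntervalMs: number;
  autoJoinChannel?: string;
}

// Default values as documented in the options table.
const DEFAULTS: VoiceConfig = {
  enabled: true,
  sttProvider: "whisper",
  streamingSTT: true,
  ttsProvider: "openai",
  ttsVoice: "nova",
  vadSensitivity: "medium",
  bargeIn: true,
  allowedUsers: [],
  silenceThresholdMs: 1500,
  maxRecordingMs: 30000,
  heartbeatIntervalMs: 30000,
};

// Overlay user-supplied options on top of the defaults.
function resolveConfig(user: Partial<VoiceConfig>): VoiceConfig {
  return { ...DEFAULTS, ...user };
}

// An empty allowedUsers list means every user is allowed.
function isUserAllowed(config: VoiceConfig, userId: string): boolean {
  return config.allowedUsers.length === 0 || config.allowedUsers.indexOf(userId) !== -1;
}
```

For example, resolveConfig({ sttProvider: "deepgram" }) keeps silenceThresholdMs at 1500 and streamingSTT at true while switching the STT provider.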

Provider Configuration

OpenAI (Whisper + TTS)

{
  "openai": {
    "apiKey": "sk-...",
    "whisperModel": "whisper-1",
    "ttsModel": "tts-1"
  }
}

ElevenLabs (TTS only)

{
  "elevenlabs": {
    "apiKey": "...",
    "voiceId": "21m00Tcm4TlvDq8ikWAM",  // Rachel
    "modelId": "eleven_multilingual_v2"
  }
}

Deepgram (STT only)

{
  "deepgram": {
    "apiKey": "...",
    "model": "nova-2"
  }
}

Usage

Slash Commands (Discord)

Once registered with Discord, use these commands:

  • /voice join - Join a voice channel
  • /voice leave - Leave the current voice channel
  • /voice status - Show voice connection status


CLI Commands

# Join a voice channel
clawdbot voice join <channelId>

# Leave voice
clawdbot voice leave --guild <guildId>

# Check status
clawdbot voice status

Agent Tool

The agent can use the discord_voice tool:

Join voice channel 1234567890

The tool supports actions:

  • join - Join a voice channel (requires channelId)
  • leave - Leave voice channel
  • speak - Speak text in the voice channel
  • status - Get current voice status
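
The four actions can be modeled as a discriminated union, which makes the "join requires channelId" rule explicit. This is a hedged sketch rather than the tool's real schema: VoiceToolInput and parseVoiceToolInput are hypothetical names, and the text field on speak is an assumption (the documentation only names channelId).

```typescript
// Hypothetical input shape for the discord_voice tool; the real schema may differ.
type VoiceToolInput =
  | { action: "join"; channelId: string }
  | { action: "leave" }
  | { action: "speak"; text: string } // "text" is an assumed field name
  | { action: "status" };

interface RawToolInput {
  action?: unknown;
  channelId?: unknown;
  text?: unknown;
}

// Validate loosely-typed input and narrow it to one of the four actions.
function parseVoiceToolInput(raw: RawToolInput): VoiceToolInput {
  switch (raw.action) {
    case "join":
      if (typeof raw.channelId !== "string" || raw.channelId === "") {
        throw new Error("join requires channelId");
      }
      return { action: "join", channelId: raw.channelId };
    case "leave":
      return { action: "leave" };
    case "speak":
      if (typeof raw.text !== "string" || raw.text === "") {
        throw new Error("speak requires text");
      }
      return { action: "speak", text: raw.text };
    case "status":
      return { action: "status" };
    default:
      throw new Error("Unknown action: " + String(raw.action));
  }
}
```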


How It Works

  • Join: Bot joins the specified voice channel
  • Listen: VAD detects when users start/stop speaking
  • Record: Audio is buffered while user speaks
  • Transcribe: On silence, audio is sent to STT provider
  • Process: Transcribed text is sent to Clawdbot agent
  • Synthesize: Agent response is converted to audio via TTS
  • Play: Audio is played back in the voice channel
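
The Transcribe, Process, Synthesize, and Play steps above can be sketched as a single async function over injected stages. This is a minimal illustration under stated assumptions, not the plugin's implementation: PipelineStages and handleUtterance are hypothetical names, and skipping empty transcripts is an assumed behavior.

```typescript
// Hypothetical stage signatures; the plugin's real interfaces may differ.
interface PipelineStages {
  transcribe(audio: Uint8Array): Promise<string>; // STT provider
  askAgent(text: string): Promise<string>; // Clawdbot agent
  synthesize(text: string): Promise<Uint8Array>; // TTS provider
  play(audio: Uint8Array): Promise<void>; // playback in the voice channel
}

// Runs one conversational turn over the audio buffered while the user spoke.
// Returns the transcript and reply, or null when nothing was transcribed.
async function handleUtterance(
  audio: Uint8Array,
  stages: PipelineStages,
): Promise<{ transcript: string; reply: string } | null> {
  const transcript = (await stages.transcribe(audio)).trim();
  if (transcript.length === 0) {
    return null; // assumed: empty transcripts are not sent to the agent
  }
  const reply = await stages.askAgent(transcript);
  const speech = await stages.synthesize(reply);
  await stages.play(speech);
  return { transcript, reply };
}
```

Passing the stages in as an object keeps the flow independent of which STT or TTS provider is configured.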

Streaming STT (Deepgram)

When using Deepgram as your STT provider, streaming mode is enabled by default. This provides:

  • ~1 second faster end-to-end latency
  • Real-time feedback with interim transcription results
  • Automatic keep-alive to prevent connection timeouts
  • Fallback to batch transcription if streaming fails

To use streaming STT:

{
  "sttProvider": "deepgram",
  "streamingSTT": true,  // default
  "deepgram": {
    "apiKey": "...",
    "model": "nova-2"
  }
}
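
The fallback to batch transcription might look like the following sketch. It assumes both transcribers share one signature and that any streaming error triggers the fallback; Transcriber and transcribeWithFallback are hypothetical names, not part of the plugin or of Deepgram's SDK.

```typescript
// Hypothetical shared signature for a streaming and a batch transcriber.
type Transcriber = (audio: Uint8Array) => Promise<string>;

// Try the streaming path first; on any error, retry once via batch transcription.
async function transcribeWithFallback(
  audio: Uint8Array,
  streaming: Transcriber,
  batch: Transcriber,
): Promise<{ text: string; mode: "streaming" | "batch" }> {
  try {
    return { text: await streaming(audio), mode: "streaming" };
  } catch {
    // Assumed policy: every streaming failure falls back to batch.
    return { text: await batch(audio), mode: "batch" };
  }
}
```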

Barge-in Support

When enabled (default), the bot will immediately stop speaking if a user starts talking. This creates a more natural conversational flow where you can interrupt the bot.

To disable (let the bot finish speaking):

{
  "bargeIn": false
}
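
Conceptually, barge-in ties a "user started speaking" event to stopping the current playback. The sketch below shows that idea with a hypothetical Playback interface and BargeInController class; it does not use or describe the plugin's actual audio player.

```typescript
// Hypothetical handle for whatever is currently playing in the voice channel.
interface Playback {
  stop(): void;
}

class BargeInController {
  private current: Playback | null = null;
  private readonly enabled: boolean;

  constructor(enabled: boolean) {
    this.enabled = enabled; // corresponds to the bargeIn option
  }

  // Call when the bot begins speaking a response.
  startPlayback(playback: Playback): void {
    this.current = playback;
  }

  // Call when the response finishes playing on its own.
  finishPlayback(): void {
    this.current = null;
  }

  // Call when VAD reports that a user started talking.
  // Returns true if an active playback was interrupted.
  onUserSpeechStart(): boolean {
    if (!this.enabled || this.current === null) {
      return false;
    }
    this.current.stop();
    this.current = null;
    return true;
  }
}
```

With bargeIn set to false, onUserSpeechStart never stops playback, so the bot finishes its sentence.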

Auto-reconnect

The plugin includes automatic connection health monitoring:

  • Heartbeat checks every 30 seconds (configurable)
  • Auto-reconnect on disconnect with exponential backoff
  • Max 3 attempts before giving up

If the connection drops, you'll see logs like:

[discord-voice] Disconnected from voice channel
[discord-voice] Reconnection attempt 1/3
[discord-voice] Reconnected successfully
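
Reconnecting with exponential backoff and a cap of three attempts can be sketched as below. The 1000 ms base delay, the doubling factor, and the function name reconnectWithBackoff are assumptions made for illustration; the plugin's actual delays are not documented here. The log strings reuse the messages shown above.

```typescript
// Retry `connect` up to maxAttempts times, doubling the wait after each failure.
// baseDelayMs and the doubling factor are assumed values, not documented ones.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(() => resolve(), ms);
    }),
  log: (message: string) => void = (message) => console.log(message),
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    log("[discord-voice] Reconnection attempt " + attempt + "/" + maxAttempts);
    try {
      await connect();
      log("[discord-voice] Reconnected successfully");
      return true;
    } catch {
      if (attempt < maxAttempts) {
        // Waits 1x, 2x, 4x, ... the base delay between attempts.
        await sleep(baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  return false; // gave up after maxAttempts failures
}
```

The sleep and log parameters are injectable so the retry policy can be exercised without real timers.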

VAD Sensitivity

  • low: Picks up quiet speech, may trigger on background noise
  • medium: Balanced (recommended)
  • high: Requires louder, clearer speech
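
One simple way to get the behavior described above is an energy threshold on each audio frame, where "low" uses the smallest threshold and "high" the largest. This sketch assumes 16-bit PCM frames and an RMS-based detector; the threshold numbers and the names RMS_THRESHOLDS, frameRms, and isSpeech are illustrative assumptions, and the plugin's actual detector may work differently.

```typescript
type Sensitivity = "low" | "medium" | "high";

// Assumed thresholds on normalized RMS energy (0..1); not the plugin's values.
const RMS_THRESHOLDS: Record<Sensitivity, number> = {
  low: 0.01, // triggers on quiet speech and possibly background noise
  medium: 0.03,
  high: 0.08, // needs louder, clearer speech
};

// Root-mean-square energy of a 16-bit PCM frame, normalized to 0..1.
function frameRms(samples: Int16Array): number {
  if (samples.length === 0) {
    return 0;
  }
  const sumSquares = samples.reduce((acc, raw) => {
    const s = raw / 32768;
    return acc + s * s;
  }, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// A frame counts as speech when its energy reaches the configured threshold.
function isSpeech(samples: Int16Array, sensitivity: Sensitivity): boolean {
  return frameRms(samples) >= RMS_THRESHOLDS[sensitivity];
}
```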

Troubleshooting

"Discord client not available"

Ensure the Discord channel is configured and the bot is connected before using voice.

Opus/Sodium build errors

Install build tools:

npm install -g node-gyp
npm rebuild @discordjs/opus sodium-native

No audio heard

  • Check bot has Connect + Speak permissions
  • Check bot isn't server muted
  • Verify TTS API key is valid

Transcription not working

  • Check STT API key is valid
  • Check audio is being recorded (see debug logs)
  • Try adjusting VAD sensitivity

Enable debug logging

DEBUG=discord-voice clawdbot gateway start

Environment Variables

Variable            Description
DISCORD_TOKEN       Discord bot token (required)
OPENAI_API_KEY      OpenAI API key (Whisper + TTS)
ELEVENLABS_API_KEY  ElevenLabs API key
DEEPGRAM_API_KEY    Deepgram API key
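
Since a key can come from either the config file or an environment variable, a lookup helper might resemble the sketch below. The precedence (config value first, then environment) is an assumption, and resolveApiKey and ENV_VAR_BY_PROVIDER are hypothetical names; only the variable names come from the table above.

```typescript
// Environment variable names from the table above, keyed by provider.
const ENV_VAR_BY_PROVIDER = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
} as const;

type ProviderName = keyof typeof ENV_VAR_BY_PROVIDER;
type EnvMap = Record<string, string | undefined>;

// Assumed precedence: an explicit config value wins, otherwise use the env var.
function resolveApiKey(
  provider: ProviderName,
  configKey: string | undefined,
  env: EnvMap,
): string {
  const envVar = ENV_VAR_BY_PROVIDER[provider];
  const key = configKey || env[envVar];
  if (!key) {
    throw new Error("Missing API key for " + provider + ": set it in config or " + envVar);
  }
  return key;
}
```

In a Node.js process the third argument would typically be process.env.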

Limitations

  • Only one voice channel per guild at a time
  • Maximum recording length: 30 seconds (configurable)
  • Requires stable network for real-time audio
  • TTS output may have slight delay due to synthesis

License

MIT