Speech & TranscriptionDocumentedScanned

edge-tts

|.

Share:

Installation

npx clawhub@latest install edge-tts

View the full skill documentation and source below.

Documentation

Edge-TTS Skill

Overview

Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.

Quick Start

When you detect TTS intent from triggers or user request:

  • Call the tts tool (Clawdbot built-in) to convert text to speech

  • The tool returns a MEDIA: path

  • Clawdbot routes the audio to the current channel
  • // Example: Built-in tts tool usage
    tts("Your text to convert to speech")
    // Returns: MEDIA: /path/to/audio.mp3

    Trigger Detection

    Recognize "tts" keyword as TTS requests. The skill automatically filters out TTS-related keywords from text before conversion to avoid converting the trigger words themselves to audio.

    Advanced Customization

    Using the Node.js Scripts

    For more control, use the bundled scripts directly:

    TTS Converter

    cd scripts
    npm install
    node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3

    Options:

    • --voice, -v: Voice name (default: en-US-AriaNeural)

    • --lang, -l: Language code (e.g., en-US, es-ES)

    • --format, -o: Output format (default: audio-24khz-48kbitrate-mono-mp3)

    • --pitch: Pitch adjustment (e.g., +10%, -20%, default)

    • --rate, -r: Rate adjustment (e.g., +10%, -20%, default)

    • --volume: Volume adjustment (e.g., +0%, -10%, default)

    • --save-subtitles, -s: Save subtitles as JSON file

    • --output, -f: Output file path (default: tts_output.mp3)

    • --proxy, -p: Proxy URL (e.g., )

    • --timeout: Request timeout in milliseconds (default: 10000)

    • --list-voices, -L: List available voices


    Configuration Manager


    cd scripts
    npm install
    node config-manager.js --set-voice en-US-AriaNeural
    
    node config-manager.js --set-rate +10%
    
    node config-manager.js --get
    
    node config-manager.js --reset

    Voice Selection

    Common voices (use --list-voices for full list):

    English:

    • en-US-MichelleNeural (female, natural, default)

    • en-US-AriaNeural (female, natural)

    • en-US-GuyNeural (male, natural)

    • en-GB-SoniaNeural (female, British)

    • en-GB-RyanNeural (male, British)


    Other Languages:
    • es-ES-ElviraNeural (Spanish, Spain)

    • fr-FR-DeniseNeural (French)

    • de-DE-KatjaNeural (German)

    • ja-JP-NanamiNeural (Japanese)

    • zh-CN-XiaoxiaoNeural (Chinese)

    • ar-SA-ZariyahNeural (Arabic)


    Rate Guidelines

    Rate values use percentage format:

    • "default": Normal speed

    • "-20%" to "-10%": Slow, clear (tutorials, stories, accessibility)

    • "+10%" to "+20%": Slightly fast (summaries)

    • "+30%" to "+50%": Fast (news, efficiency)


    Output Formats

    Choose audio quality based on use case:

    • audio-24khz-48kbitrate-mono-mp3: Standard quality (voice notes, messages)

    • audio-24khz-96kbitrate-mono-mp3: High quality (presentations, content)

    • audio-48khz-96kbitrate-stereo-mp3: Highest quality (professional audio, music)


    Resources

    scripts/tts-converter.js

    Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing.

    scripts/config-manager.js

    Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores config in ~/.tts-config.json.

    scripts/package.json

    NPM package configuration with node-edge-tts dependency.

    references/node_edge_tts_guide.md

    Complete documentation for node-edge-tts npm package including:
    • Full voice list by language
    • Prosody options (rate, pitch, volume)
    • Usage examples (CLI and Module)
    • Subtitle generation
    • Output formats
    • Best practices and limitations

    Voice Testing

    Test different voices and preview audio quality at:

    Refer to this when you need specific voice details or advanced features.

    Installation

    To use the bundled scripts:

    cd /home/user/clawd/skills/public/tts-skill/scripts
    npm install

    This installs:

    • node-edge-tts - TTS library

    • commander - CLI argument parsing


    Workflow

  • Detect intent: Check for "tts" trigger or keyword in user message

  • Choose method: Use built-in tts tool for simple requests, or scripts/tts-converter.js for customization

  • Generate audio: Convert the target text (message, search results, summary)

  • Return to user: The tts tool returns a MEDIA: path; Clawdbot handles delivery
  • Testing

    Basic Test

    Run the test script to verify TTS functionality:
    cd /home/user/clawd/skills/public/edge-tts/scripts
    npm test
    This generates a test audio file and verifies the TTS service is working.

    Voice Testing

    Test different voices and preview audio quality at:

    Integration Test

    Use the built-in tts tool for quick testing:
    // Example: Test TTS with default settings
    tts("This is a test of the TTS functionality.")

    Configuration Test

    Verify configuration persistence:
    cd /home/user/clawd/skills/public/edge-tts/scripts
    node config-manager.js --get
    node config-manager.js --set-voice en-US-GuyNeural
    node config-manager.js --get

    Troubleshooting

    • Test connectivity: Run npm test to check if TTS service is accessible
    • Check voice availability: Use node tts-converter.js --list-voices to see available voices
    • Verify proxy settings: If using proxy, test with node tts-converter.js "test" --proxy - **Check audio output**: The test should generate test-output.mp3 in the scripts directory ## Notes - node-edge-tts uses Microsoft Edge's online TTS service (updated, working authentication) - No API key needed (free service) - Output is MP3 format by default - Requires internet connection - Supports subtitle generation (JSON format with word-level timing) - **Temporary File Handling**: By default, audio files are saved to the system's temporary directory (/tmp/edge-tts-temp/ on Unix, C:\Users\\AppData\Local\Temp\edge-tts-temp\ on Windows) with unique filenames (e.g., tts_1234567890_abc123.mp3). Files are not automatically deleted - the calling application (Clawdbot) should handle cleanup after use. You can specify a custom output path with the --output option if permanent storage is needed. - **TTS keyword filtering**: The skill automatically filters out TTS-related keywords (tts, TTS, text-to-speech) from text before conversion to avoid converting the trigger words themselves to audio - For repeated preferences, use config-manager.js to set defaults - **Default voice**: en-US-MichelleNeural (female, natural) - Neural voices (ending in Neural`) provide higher quality than Standard voices