Speech & TranscriptionDocumentedScanned

voice-agent

Local Voice Input/Output for Agents using the AI Voice Agent API.

Share:

Installation

npx clawhub@latest install voice-agent

View the full skill documentation and source below.

Documentation

Voice Agent

This skill allows you to speak and listen to the user using the local Voice Agent API.

Behavior Guidelines

  • Audio First: When the user communicates via audio (files), your PRIMARY mode of response is Audio File.
  • Silent Delivery: When sending an audio response, DO NOT send a text explanation like "I sent an audio". Just send the audio file.
  • Workflow:
1. User sends audio. 2. You usage transcribe to read it. 3. You think of a response. 4. You usage synthesize to generate the audio file. 5. You send the file. 6. STOP. Do not add text commentary.

Tools

Transcribe File

To transcribe an audio file (Speech-to-Text), run the client script with the transcribe command.
python3 {baseDir}/scripts/client.py transcribe "/path/to/audio/file.ogg"

Synthesize to File

To generate audio from text and save it to a file (Text-to-Speech), run the client script with the synthesize command.
python3 {baseDir}/scripts/client.py synthesize "Text to speak" --output "/path/to/output.mp3"

Health Check

To check if the voice agent API is running and healthy:
python3 {baseDir}/scripts/client.py health

Service Management

If the Health Check fails or you receive a connection error, the service may be stopped. You can attempt to start it by running:
{baseDir}/scripts/start.sh