MonoClaw

Text-to-Speech

Mona can speak her responses using text-to-speech (TTS). TTS is available on the CLI, Telegram, Discord, and other platforms, and can be enabled or disabled per platform.

TTS providers

| Provider | Quality | Cost | Best for |
|---|---|---|---|
| Edge TTS | Good | Free | Everyday use |
| ElevenLabs | Excellent | Paid | Professional voice, cloning |

Edge TTS (default)

Edge TTS is free and requires no API key. It uses Microsoft's online TTS service.

Setup

Edge TTS is already included in the base install. Select it as the provider and configure the voice:

monoclaw config set tts.provider edge-tts
monoclaw config set tts.edge-tts.voice "zh-HK-HiuMaanNeural"

Available voices

List available voices:

monoclaw tts voices --provider edge-tts

Popular voices:

| Voice | Language | Style |
|---|---|---|
| en-US-AriaNeural | English (US) | Natural, neutral |
| en-GB-SoniaNeural | English (UK) | Professional |
| zh-HK-HiuMaanNeural | Cantonese | Natural |
| zh-CN-XiaoxiaoNeural | Mandarin | Friendly |

ElevenLabs (premium)

ElevenLabs provides the highest-quality voices and supports voice cloning. It requires the tts-premium extra and an API key.

Setup

cd ~/.monoclaw/monoclaw-runtime
uv pip install -e ".[tts-premium]"
monoclaw config set tts.provider elevenlabs
monoclaw config set ELEVENLABS_API_KEY "your-key"

See the dedicated ElevenLabs guide for detailed configuration.

Using TTS

CLI

/voice on

Mona speaks all responses. Press Ctrl+B to send voice messages.

Telegram

In voice mode, TTS is enabled automatically. To also enable TTS for text replies:

monoclaw config set telegram.tts.enabled true

Discord

Enable TTS in voice channels or for text replies:

monoclaw config set discord.tts.enabled true

TTS configuration

# ~/.monoclaw/config.yaml
tts:
  provider: edge-tts
  edge-tts:
    voice: "en-US-AriaNeural"
    rate: "+0%"      # Speaking rate
    pitch: "+0Hz"    # Pitch adjustment
  elevenlabs:
    voice_id: "21m00Tcm4TlvDq8ikWAM"
    model: "eleven_multilingual_v2"
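
The rate and pitch values in the sample config are signed strings ("+0%" and "+0Hz"). As a minimal sketch, assuming that format (a sign, an integer, then "%" for rate or "Hz" for pitch), the hypothetical helpers below build and check such values before they are written to config.yaml. None of these function names are part of MonoClaw.

```python
import re

# Assumed formats, inferred from the sample config above:
# rate is a signed percentage ("+0%"), pitch is a signed Hz offset ("+0Hz").
RATE_RE = re.compile(r"[+-]\d+%")
PITCH_RE = re.compile(r"[+-]\d+Hz")


def format_rate(percent: int) -> str:
    """Build a rate string with an explicit sign, e.g. -10 -> '-10%'."""
    return f"{percent:+d}%"


def format_pitch(hz: int) -> str:
    """Build a pitch string with an explicit sign, e.g. 5 -> '+5Hz'."""
    return f"{hz:+d}Hz"


def is_valid_rate(value: str) -> bool:
    """Check that the whole string matches the assumed rate format."""
    return RATE_RE.fullmatch(value) is not None


def is_valid_pitch(value: str) -> bool:
    """Check that the whole string matches the assumed pitch format."""
    return PITCH_RE.fullmatch(value) is not None
```

For example, format_rate(-10) yields a value that could be used as the rate setting to slow speech slightly for long technical explanations.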

Platform-specific settings

tts:
  cli:
    enabled: true
  telegram:
    enabled: false    # Only in voice mode
  discord:
    enabled: true
  slack:
    enabled: false    # Slack doesn't support voice well
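
To illustrate how the per-platform switches above might be read, here is a minimal sketch that looks up tts.<platform>.enabled in a dictionary shaped like the YAML. The function name tts_enabled and the fallback behaviour for unlisted platforms are assumptions for illustration, not MonoClaw's actual lookup logic.

```python
from typing import Any, Mapping


def tts_enabled(config: Mapping[str, Any], platform: str, default: bool = False) -> bool:
    """Return the tts.<platform>.enabled flag, or `default` if it is not set."""
    tts = config.get("tts") or {}
    platform_cfg = tts.get(platform)
    # Keys such as "provider" hold plain strings, not per-platform sections.
    if not isinstance(platform_cfg, Mapping):
        return default
    return bool(platform_cfg.get("enabled", default))


# A dictionary mirroring the platform-specific settings shown above.
config = {
    "tts": {
        "provider": "edge-tts",
        "cli": {"enabled": True},
        "telegram": {"enabled": False},
        "discord": {"enabled": True},
        "slack": {"enabled": False},
    }
}
```

With this config, tts_enabled(config, "discord") is true and tts_enabled(config, "slack") is false; a platform with no entry falls back to the default.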

Best practices

  • Use Edge TTS for most cases: it is free and the quality is good enough for everyday use
  • Reserve ElevenLabs for presentations: the quality is higher, but it is billed per character
  • Match voice to language: use Chinese voices for Chinese text
  • Adjust rate for long content: speak slightly slower for technical explanations
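
The "match voice to language" advice can be sketched as a small script check. This is a rough illustration, not MonoClaw behaviour: pick_voice and contains_cjk are hypothetical helpers, and the voice names are taken from the popular voices listed earlier. Detecting Chinese characters cannot tell Cantonese from Mandarin (and Japanese kanji fall in the same range), so the Chinese voice is left as a parameter.

```python
def contains_cjk(text: str) -> bool:
    """True if any character lies in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def pick_voice(
    text: str,
    chinese_voice: str = "zh-HK-HiuMaanNeural",
    default_voice: str = "en-US-AriaNeural",
) -> str:
    """Choose a Chinese voice for text containing Chinese characters,
    otherwise fall back to the default voice."""
    return chinese_voice if contains_cjk(text) else default_voice
```

A caller who mostly writes Mandarin could pass chinese_voice="zh-CN-XiaoxiaoNeural" instead of relying on the Cantonese default used here.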

Troubleshooting

| Problem | Fix |
|---|---|
| "TTS not available" | Install the appropriate extra |
| "Voice not found" | Check the voice ID is correct |
| "API key invalid" | Verify your ElevenLabs key |
| Audio not playing | Check system volume and audio output |
| Garbled speech | Ensure TTS language matches text language |