Text-to-Speech

Mona can speak her responses using text-to-speech (TTS). It works on the CLI, Telegram, Discord, and other supported platforms.

TTS providers

Provider	Quality	Cost	Best for
Edge TTS	Good	Free	Everyday use
ElevenLabs	Excellent	Paid	Professional voice, cloning

Edge TTS (default)

Edge TTS is free and requires no API key. It uses Microsoft's online TTS service.

Setup

Edge TTS is included in the base install, so there is nothing extra to install. Select it as the provider and choose a voice:

monoclaw config set tts.provider edge-tts
monoclaw config set tts.edge-tts.voice "zh-HK-HiuMaanNeural"

Available voices

List available voices:

monoclaw tts voices --provider edge-tts

Popular voices:

Voice	Language	Style
`en-US-AriaNeural`	English (US)	Natural, neutral
`en-GB-SoniaNeural`	English (UK)	Professional
`zh-HK-HiuMaanNeural`	Cantonese	Natural
`zh-CN-XiaoxiaoNeural`	Mandarin	Friendly
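The names in this table follow a `<language>-<REGION>-<Name>Neural` pattern. As a rough illustration, the Python sketch below splits a name into those parts, which can help spot a typo before Mona reports "Voice not found". `parse_voice` and `VoiceName` are hypothetical helpers, not part of Monoclaw, and the pattern is inferred from the voices listed here rather than from an official specification.

```python
import re
from typing import NamedTuple

# Pattern inferred from the table above, e.g. "zh-HK-HiuMaanNeural".
_VOICE_RE = re.compile(r"(?P<lang>[a-z]{2,3})-(?P<region>[A-Z]{2})-(?P<name>.+Neural)")


class VoiceName(NamedTuple):
    language: str  # language code, e.g. "zh"
    region: str    # region code, e.g. "HK"
    name: str      # speaker name, e.g. "HiuMaanNeural"

    @property
    def locale(self) -> str:
        """Language and region joined, e.g. "zh-HK"."""
        return f"{self.language}-{self.region}"


def parse_voice(voice: str) -> VoiceName:
    """Split an Edge TTS short name into its parts; raise ValueError if malformed."""
    match = _VOICE_RE.fullmatch(voice)
    if match is None:
        raise ValueError(f"not an Edge TTS short name: {voice!r}")
    return VoiceName(match["lang"], match["region"], match["name"])


if __name__ == "__main__":
    v = parse_voice("zh-HK-HiuMaanNeural")
    print(v.locale, v.name)
```

A name that fails this check (for example, one missing the `Neural` suffix) is worth comparing against the output of `monoclaw tts voices --provider edge-tts`.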

ElevenLabs (premium)

ElevenLabs offers the highest-quality voices, including voice cloning. It is a paid service and requires an API key.

Setup

cd ~/.monoclaw/monoclaw-runtime
uv pip install -e ".[tts-premium]"

monoclaw config set tts.provider elevenlabs
monoclaw config set ELEVENLABS_API_KEY "your-key"
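ElevenLabs synthesis is a request to its public REST API: a POST to `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` with the key in an `xi-api-key` header. The standard-library sketch below only builds that request, which makes it easy to check a key or voice ID outside Mona. It assumes the default `voice_id` and `model` from the TTS configuration section; `build_tts_request` is a hypothetical helper, and Mona's provider may call the API differently (for example, through the official SDK).

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text: str, api_key: str,
                      voice_id: str = "21m00Tcm4TlvDq8ikWAM",
                      model: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) an ElevenLabs text-to-speech request."""
    body = json.dumps({"text": text, "model_id": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=body,
        headers={
            "xi-api-key": api_key,            # an invalid key is rejected by the API
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",           # ask for MP3 audio in the response
        },
        method="POST",
    )


if __name__ == "__main__":
    import os

    # Assumption: the key is exported in your shell for this standalone test.
    req = build_tts_request("Hello from Mona.", os.environ["ELEVENLABS_API_KEY"])
    with urllib.request.urlopen(req) as resp, open("hello.mp3", "wb") as out:
        out.write(resp.read())
```

Sending the request, as the `__main__` block does, assumes you have exported the key as the environment variable `ELEVENLABS_API_KEY`; that is separate from storing it with `monoclaw config set`.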

See the dedicated ElevenLabs guide for detailed configuration.

Using TTS

CLI

Turn on voice mode from the CLI:

/voice on

With voice mode on, Mona speaks every response. Press Ctrl+B to send a voice message.

Telegram

In voice mode, TTS is automatic. To also enable TTS for text replies:

monoclaw config set telegram.tts.enabled true

Discord

Enable TTS in voice channels or for text replies:

monoclaw config set discord.tts.enabled true

TTS configuration

# ~/.monoclaw/config.yaml
tts:
  provider: edge-tts
  edge-tts:
    voice: "en-US-AriaNeural"
    rate: "+0%"      # Speaking rate offset ("-10%" is slower)
    pitch: "+0Hz"    # Pitch offset in hertz
  elevenlabs:
    voice_id: "21m00Tcm4TlvDq8ikWAM"
    model: "eleven_multilingual_v2"
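Both `rate` and `pitch` are signed offsets from the voice's default, which is why the sign is written even for zero. A minimal sketch, assuming the provider accepts exactly the `+N%` and `+NHz` shapes shown above (the helpers `prosody` and `validate_prosody` are hypothetical):

```python
import re

# Shapes taken from the config above: signed integer percent / signed integer Hz.
_RATE_RE = re.compile(r"[+-]\d+%")
_PITCH_RE = re.compile(r"[+-]\d+Hz")


def prosody(rate_percent: int = 0, pitch_hz: int = 0) -> dict[str, str]:
    """Format numeric offsets as the rate/pitch strings used in config.yaml."""
    # The "+" format flag always emits a sign, so 0 becomes "+0".
    return {"rate": f"{rate_percent:+d}%", "pitch": f"{pitch_hz:+d}Hz"}


def validate_prosody(rate: str, pitch: str) -> None:
    """Raise ValueError if either string is not in the signed format."""
    if not _RATE_RE.fullmatch(rate):
        raise ValueError(f"rate must look like '+0%' or '-10%', got {rate!r}")
    if not _PITCH_RE.fullmatch(pitch):
        raise ValueError(f"pitch must look like '+0Hz' or '-5Hz', got {pitch!r}")
```

For example, `prosody(-10)` returns `{"rate": "-10%", "pitch": "+0Hz"}`, a slightly slower delivery.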

Platform-specific settings

tts:
  cli:
    enabled: true
  telegram:
    enabled: false    # Speak only in voice mode
  discord:
    enabled: true
  slack:
    enabled: false    # Slack doesn't support voice well
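Combined with the note that TTS is automatic in voice mode, these flags suggest a simple decision rule for each reply. The sketch below encodes one plausible reading: voice mode always speaks, and otherwise the `tts.<platform>.enabled` flag decides, with a missing entry treated as off. `should_speak` is hypothetical, and Mona's real precedence may differ.

```python
from typing import Any


def should_speak(config: dict[str, Any], platform: str, voice_mode: bool) -> bool:
    """Decide whether a reply on `platform` should be spoken (assumed rule)."""
    if voice_mode:
        return True  # voice mode always uses TTS
    platform_cfg = config.get("tts", {}).get(platform, {})
    return bool(platform_cfg.get("enabled", False))  # missing entry means off


# The platform-specific settings above, as a parsed dict.
config = {
    "tts": {
        "cli": {"enabled": True},
        "telegram": {"enabled": False},
        "discord": {"enabled": True},
        "slack": {"enabled": False},
    }
}
```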

Best practices

Use Edge TTS for most cases: it's free, and the quality is good enough for everyday use.
Reserve ElevenLabs for presentations: the quality is higher, but usage is billed per character.
Match the voice to the language: use a Cantonese or Mandarin voice for Chinese text.
Adjust the rate for long content: a slightly slower rate, such as "-10%", suits technical explanations.
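The last two practices can be automated with a simple heuristic. This hypothetical sketch picks a Chinese voice from the table when the text contains CJK ideographs and slows the rate for long replies. The 600-character threshold and the `-10%` rate are arbitrary illustrative values, and because written Chinese doesn't reveal whether Cantonese or Mandarin is intended, the caller chooses the variant.

```python
# Hypothetical defaults drawn from the "Popular voices" table.
CHINESE_VOICES = {"cantonese": "zh-HK-HiuMaanNeural", "mandarin": "zh-CN-XiaoxiaoNeural"}
DEFAULT_VOICE = "en-US-AriaNeural"
LONG_TEXT_CHARS = 600   # arbitrary threshold for "long content"
SLOW_RATE = "-10%"      # slightly slower than the voice's default


def has_cjk(text: str) -> bool:
    """True if the text contains a CJK Unified Ideograph (U+4E00..U+9FFF).

    Heuristic only: Japanese kanji fall in the same range.
    """
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def choose_tts_settings(text: str, chinese: str = "cantonese") -> dict[str, str]:
    """Pick a voice matching the text's script and slow down long replies."""
    voice = CHINESE_VOICES[chinese] if has_cjk(text) else DEFAULT_VOICE
    rate = SLOW_RATE if len(text) > LONG_TEXT_CHARS else "+0%"
    return {"voice": voice, "rate": rate}
```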

Troubleshooting

Problem	Fix
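"TTS not available"	Install the extra for your provider (ElevenLabs needs `.[tts-premium]`)
"Voice not found"	Check that the voice name or ID is spelled correctly; for Edge TTS, list valid names with `monoclaw tts voices --provider edge-tts`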
"API key invalid"	Verify your ElevenLabs key
Audio not playing	Check system volume and audio output
Garbled speech	Use a voice whose language matches the text