Text-to-Speech
Mona can speak her responses using text-to-speech (TTS). TTS is available on the CLI, Telegram, Discord, and other platforms.
TTS providers
| Provider | Quality | Cost | Best for |
|---|---|---|---|
| Edge TTS | Good | Free | Everyday use |
| ElevenLabs | Excellent | Paid | Professional voices, voice cloning |
Edge TTS (default)
Edge TTS is free and requires no API key. It uses Microsoft's online TTS service, so it needs an internet connection.
Setup
Edge TTS is already included in the base install. To select it and set the voice:
monoclaw config set tts.provider edge-tts
monoclaw config set tts.edge-tts.voice "zh-HK-HiuMaanNeural"
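The dotted keys passed to config set appear to mirror the nesting of ~/.monoclaw/config.yaml: tts.edge-tts.voice lines up with the voice entry under tts, then edge-tts, in the TTS configuration example further down. The Python sketch below illustrates that mapping with a hypothetical set_dotted helper. It is an assumption about how the keys nest, not monoclaw's own implementation.

```python
from typing import Any


def set_dotted(config: dict[str, Any], key: str, value: Any) -> None:
    """Set a nested value from a dotted key such as "tts.edge-tts.voice".

    Hypothetical helper: illustrates how a dotted config key appears to map
    onto nested sections of config.yaml. Not monoclaw's actual code.
    """
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        # Create each intermediate section if it does not exist yet.
        node = node.setdefault(part, {})
    node[leaf] = value


# The two commands above, expressed as updates to an in-memory config.
config: dict[str, Any] = {}
set_dotted(config, "tts.provider", "edge-tts")
set_dotted(config, "tts.edge-tts.voice", "zh-HK-HiuMaanNeural")
print(config)
# {'tts': {'provider': 'edge-tts', 'edge-tts': {'voice': 'zh-HK-HiuMaanNeural'}}}
```

Voice names contain hyphens but no dots, so splitting the key on "." is safe for the keys shown on this page.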
Available voices
List available voices:
monoclaw tts voices --provider edge-tts
Popular voices:
| Voice | Language | Style |
|---|---|---|
| en-US-AriaNeural | English (US) | Natural, neutral |
| en-GB-SoniaNeural | English (UK) | Professional |
| zh-HK-HiuMaanNeural | Cantonese | Natural |
| zh-CN-XiaoxiaoNeural | Mandarin | Friendly |
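Most Edge TTS voice names follow a <language>-<REGION>-<Name>Neural pattern, so the locale can be read off the name. As a minimal sketch, assuming you have voice names as a plain list of strings (here, the popular voices from the table), the hypothetical helpers below filter voices by language code or by full locale:

```python
def voice_locale(short_name: str) -> str:
    """Return the locale prefix of an Edge TTS voice name, e.g. "zh-HK"."""
    # Assumes names shaped like "<language>-<REGION>-<Name>Neural".
    language, region, _rest = short_name.split("-", 2)
    return f"{language}-{region}"


def voices_for_language(voices: list[str], language: str) -> list[str]:
    """Keep voices matching a language code ("zh") or a full locale ("zh-HK")."""
    matches = []
    for voice in voices:
        locale = voice_locale(voice)
        if language == locale or language == locale.split("-")[0]:
            matches.append(voice)
    return matches


popular = [
    "en-US-AriaNeural",
    "en-GB-SoniaNeural",
    "zh-HK-HiuMaanNeural",
    "zh-CN-XiaoxiaoNeural",
]
print(voices_for_language(popular, "zh"))     # ['zh-HK-HiuMaanNeural', 'zh-CN-XiaoxiaoNeural']
print(voices_for_language(popular, "en-GB"))  # ['en-GB-SoniaNeural']
```

Filtering by the bare language code ("zh") returns both Cantonese and Mandarin voices; pass the full locale when the region matters.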
ElevenLabs (premium)
ElevenLabs offers the highest-quality voices, including voice cloning. It requires a paid ElevenLabs API key.
Setup
cd ~/.monoclaw/monoclaw-runtime
uv pip install -e ".[tts-premium]"
monoclaw config set tts.provider elevenlabs
monoclaw config set ELEVENLABS_API_KEY "your-key"
See the dedicated ElevenLabs guide for detailed configuration.
Using TTS
CLI
Turn on voice mode in the CLI:
/voice on
With voice mode on, Mona speaks all of her responses. Press Ctrl+B to send a voice message.
Telegram
In voice mode, TTS is automatic. To also enable TTS for text replies:
monoclaw config set telegram.tts.enabled true
Discord
Enable TTS in voice channels or for text replies:
monoclaw config set discord.tts.enabled true
TTS configuration
# ~/.monoclaw/config.yaml
tts:
  provider: edge-tts
  edge-tts:
    voice: "en-US-AriaNeural"
    rate: "+0%"     # Speaking rate
    pitch: "+0Hz"   # Pitch adjustment
  elevenlabs:
    voice_id: "21m00Tcm4TlvDq8ikWAM"
    model: "eleven_multilingual_v2"
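The rate and pitch values are signed offsets: a percentage for rate and a hertz value for pitch, as in the "+0%" and "+0Hz" defaults above. The sketch below builds and checks strings in that shape, assuming monoclaw passes them to Edge TTS unchanged. format_rate, format_pitch and validate_edge_tts_settings are hypothetical helpers, and the regular expressions are an assumption modeled on the format the edge-tts Python package accepts.

```python
import re

# Signed percentage for rate (e.g. "-10%") and signed hertz offset for pitch
# (e.g. "+5Hz"), matching the "+0%" / "+0Hz" values in the config above.
RATE_PATTERN = re.compile(r"^[+-]\d+%$")
PITCH_PATTERN = re.compile(r"^[+-]\d+Hz$")


def format_rate(percent: int) -> str:
    """Turn an integer offset into a rate string such as "+0%" or "-10%"."""
    return f"{percent:+d}%"


def format_pitch(hertz: int) -> str:
    """Turn an integer offset into a pitch string such as "+0Hz" or "-5Hz"."""
    return f"{hertz:+d}Hz"


def validate_edge_tts_settings(settings: dict[str, str]) -> list[str]:
    """Return a list of problems with rate/pitch (empty when both look valid)."""
    problems = []
    # Missing keys fall back to the neutral defaults shown in the config.
    rate = settings.get("rate", "+0%")
    pitch = settings.get("pitch", "+0Hz")
    if not RATE_PATTERN.match(rate):
        problems.append(f"rate {rate!r} should look like '+0%' or '-10%'")
    if not PITCH_PATTERN.match(pitch):
        problems.append(f"pitch {pitch!r} should look like '+0Hz' or '-5Hz'")
    return problems


edge_settings = {"voice": "en-US-AriaNeural", "rate": format_rate(-10), "pitch": format_pitch(0)}
print(edge_settings)  # {'voice': 'en-US-AriaNeural', 'rate': '-10%', 'pitch': '+0Hz'}
print(validate_edge_tts_settings(edge_settings))  # []
print(validate_edge_tts_settings({"rate": "slow"}))
```

Formatting with {:+d} always emits the sign, so "+0%" is produced for zero rather than "0%".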
Platform-specific settings
tts:
  cli:
    enabled: true
  telegram:
    enabled: false  # Only in voice mode
  discord:
    enabled: true
  slack:
    enabled: false  # Slack doesn't support voice well
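The comment on telegram suggests that enabled controls spoken text replies, while voice mode speaks regardless. A minimal sketch of that decision, assuming the same rule applies on every platform and that unlisted platforms default to off (both assumptions, as is the should_speak helper):

```python
from typing import Any

# Mirrors the platform entries under tts in the YAML above.
PLATFORM_TTS: dict[str, Any] = {
    "cli": {"enabled": True},
    "telegram": {"enabled": False},
    "discord": {"enabled": True},
    "slack": {"enabled": False},
}


def should_speak(platform: str, settings: dict[str, Any], voice_mode: bool = False) -> bool:
    """Decide whether a reply on `platform` should be spoken aloud.

    Hypothetical rules: voice mode always speaks, otherwise the platform's
    `enabled` flag decides, and a platform missing from `settings` is off.
    """
    if voice_mode:
        return True
    return bool(settings.get(platform, {}).get("enabled", False))


for name in ["cli", "telegram", "slack"]:
    print(name, should_speak(name, PLATFORM_TTS))
print("telegram (voice mode)", should_speak("telegram", PLATFORM_TTS, voice_mode=True))
```

Defaulting unknown platforms to off keeps a new integration silent until it is explicitly enabled.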
Best practices
- Use Edge TTS for most cases: it's free and good enough for everyday use.
- Reserve ElevenLabs for presentations: the quality is higher, but you pay per character.
- Match the voice to the language: use a Chinese voice, such as zh-HK-HiuMaanNeural for Cantonese, for Chinese text.
- Slow the rate for long content: a slightly slower rate helps with technical explanations.
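The last two practices can be automated. The sketch below is one hypothetical approach: it treats text as Chinese when at least 30% of its non-space characters are CJK ideographs, and slows replies longer than 800 characters by 10%. The thresholds, the pick_voice_settings helper and the default voices are illustrative choices, not monoclaw settings. Script detection cannot tell Cantonese from Mandarin, so pass chinese_voice explicitly for Mandarin text.

```python
def is_mostly_chinese(text: str, threshold: float = 0.3) -> bool:
    """True if at least `threshold` of non-space characters are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    # U+4E00..U+9FFF is the CJK Unified Ideographs block (common Chinese characters).
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars) >= threshold


def pick_voice_settings(
    text: str,
    chinese_voice: str = "zh-HK-HiuMaanNeural",
    default_voice: str = "en-US-AriaNeural",
    long_text_chars: int = 800,
) -> dict[str, str]:
    """Choose an Edge TTS voice by script and slow down long replies."""
    voice = chinese_voice if is_mostly_chinese(text) else default_voice
    # Hypothetical rule: replies longer than long_text_chars get a 10% slower rate.
    rate = "-10%" if len(text) > long_text_chars else "+0%"
    return {"voice": voice, "rate": rate}


print(pick_voice_settings("你好，今天天气很好"))  # {'voice': 'zh-HK-HiuMaanNeural', 'rate': '+0%'}
print(pick_voice_settings("Hello there"))        # {'voice': 'en-US-AriaNeural', 'rate': '+0%'}
```

The returned keys match the voice and rate entries under tts.edge-tts, so the result could be merged into those settings.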
Troubleshooting
| Problem | Fix |
|---|---|
| "TTS not available" | Install the extra for your provider (tts-premium for ElevenLabs) |
| "Voice not found" | Check that the voice ID is spelled correctly; list Edge TTS voices with monoclaw tts voices --provider edge-tts |
| "API key invalid" | Verify your ElevenLabs API key |
| Audio not playing | Check the system volume and audio output device |
| Garbled speech | Use a voice whose language matches the text |
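If the voice ID has a typo, the closest valid names are usually easy to spot. As a sketch, assuming you have the valid voice names as a list (here, the Edge TTS voices from the table above), Python's difflib can suggest near matches. check_voice is a hypothetical helper, not a monoclaw command.

```python
import difflib


def check_voice(voice: str, known_voices: list[str]) -> str:
    """Report whether a voice exists and, if not, suggest close matches."""
    if voice in known_voices:
        return f"{voice} is available"
    # difflib ranks candidates by similarity; cutoff=0.6 drops weak matches.
    suggestions = difflib.get_close_matches(voice, known_voices, n=3, cutoff=0.6)
    if suggestions:
        return f"Voice not found: {voice}. Did you mean: {', '.join(suggestions)}?"
    return f"Voice not found: {voice}. No similar voices found."


known = [
    "en-US-AriaNeural",
    "en-GB-SoniaNeural",
    "zh-HK-HiuMaanNeural",
    "zh-CN-XiaoxiaoNeural",
]
print(check_voice("zh-HK-HiuMaanNeural", known))  # zh-HK-HiuMaanNeural is available
print(check_voice("en-US-AriaNeurl", known))
```

The comparison is case-sensitive, so a name typed in the wrong case is reported as missing but still gets the correctly cased name as a suggestion when it is similar enough.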