Configuring Models
The model you choose determines Mona's capabilities, cost, and latency. MonoClaw supports 20+ providers and any OpenAI-compatible endpoint.
Quick setup
monoclaw model
This interactive wizard walks you through provider selection, model choice, and API key configuration.
Supported providers
| Provider | Setup | Notes |
|---|---|---|
| OpenRouter | API key | Multi-provider routing, free tier available |
| Anthropic | API key | Claude models, high quality |
| OpenAI | API key | GPT models, Codex |
| DeepSeek | API key | Cost-effective, strong reasoning |
| Kimi / Moonshot | API key | Coding specialist |
| Alibaba Cloud | API key | Qwen models |
| Hugging Face | HF_TOKEN | 20+ open models |
| AWS Bedrock | IAM / aws configure | Enterprise-grade |
| NVIDIA NIM | API key | Nemotron models |
| GitHub Copilot | OAuth / token | Copilot subscription models |
| Custom Endpoint | Base URL + key | vLLM, SGLang, Ollama, etc. |
Setting API keys
monoclaw config set OPENROUTER_API_KEY sk-or-...
monoclaw config set ANTHROPIC_API_KEY sk-ant-...
monoclaw config set OPENAI_API_KEY sk-...
Secrets are stored in ~/.monoclaw/.env.
Minimum context requirement
Mona requires a context window of at least 64,000 tokens. Models with smaller windows cannot maintain enough working memory for multi-step tool-calling workflows and are rejected at startup.
Most hosted models meet this easily:
- Claude: 200K–1M tokens
- GPT-5.5: 1M tokens
- Gemini: 1M tokens
- Qwen3: 262K tokens
- DeepSeek V4: 1M tokens
If you're running a local model, set its context size to at least 64K:
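# llama.cpp (the server binary is named llama-server in current builds)
./llama-server --ctx-size 65536
# Ollama (sets the server's default context length)
OLLAMA_CONTEXT_LENGTH=65536 ollama serve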
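As a quick sanity check on the numbers above, the following minimal sketch compares a few common local context sizes against the 64,000-token minimum. The function name and the simple `>=` comparison are illustrative assumptions, not MonoClaw's actual startup check.

```python
# Minimum context window stated in this guide, in tokens.
MIN_CONTEXT_TOKENS = 64_000


def meets_context_minimum(context_length: int, minimum: int = MIN_CONTEXT_TOKENS) -> bool:
    """Return True if a model's context window is large enough (hypothetical helper)."""
    return context_length >= minimum


# Common local settings: 8K, 32K, and the 65536 (64 * 1024) used in the commands above.
for ctx in (8_192, 32_768, 65_536):
    status = "ok" if meets_context_minimum(ctx) else "too small"
    print(f"{ctx:>6} tokens: {status}")
```

Note that 65536 is 64 × 1024, so it clears the 64,000-token minimum with a small margin.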
Custom endpoints
For self-hosted models (vLLM, SGLang, Ollama, llama.cpp):
monoclaw config set model.custom.endpoint "http://localhost:8000/v1"
monoclaw config set model.custom.api_key "sk-local"
monoclaw config set model.custom.name "local-llama"
Or in config.yaml:
model:
  default: "custom"
  custom:
    endpoint: "http://localhost:8000/v1"
    api_key: "${LOCAL_API_KEY}"
    name: "local-llama"
    context_length: 65536
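Before pointing Mona at a self-hosted server, it can help to confirm that the endpoint really speaks the OpenAI-compatible API. The sketch below is one way to do that, assuming the server exposes the standard `GET /models` route under the base URL. The helper names are hypothetical and this is not part of MonoClaw.

```python
import json
import urllib.request


def build_models_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible model listing route."""
    url = endpoint.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_model_ids(endpoint: str, api_key: str, timeout: float = 5.0) -> list:
    """Return the model IDs the server advertises (raises if it is unreachable)."""
    request = build_models_request(endpoint, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-compatible servers wrap the model list in a top-level "data" array.
    return [model["id"] for model in payload.get("data", [])]
```

Calling `list_model_ids("http://localhost:8000/v1", "sk-local")` against a running server should return a list that includes the name you configured, such as `local-llama`.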
Provider routing
Configure fallback providers to use when your primary is unavailable:
model:
  default: "anthropic/claude-sonnet-4"
  fallback:
    - "openai/gpt-5.5"
    - "openrouter/openai/gpt-4o"
Mona automatically retries with a fallback provider if the primary fails.
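To make the fallback behavior concrete, here is a minimal sketch of the general pattern. It assumes that fallbacks are tried in the order listed and that any exception counts as a failure. Both are assumptions for illustration, and `complete_with_fallback` is a hypothetical helper rather than MonoClaw's implementation.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(
    models: Sequence[str], call: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each model in order; return (model, result) for the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # assumption: any error triggers the next fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def fake_call(model: str) -> str:
    """Stand-in for a real provider request; simulates a primary outage."""
    if model.startswith("anthropic/"):
        raise ConnectionError("provider unavailable")
    return "ok"


chain = ["anthropic/claude-sonnet-4", "openai/gpt-5.5", "openrouter/openai/gpt-4o"]
used, result = complete_with_fallback(chain, fake_call)
print(used, result)
```

With the simulated outage, the request lands on the first fallback in the list.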
Credential pools
For high-volume deployments, distribute requests across multiple API keys:
model:
  provider: openrouter
  credential_pool:
    - key: "${OPENROUTER_KEY_1}"
    - key: "${OPENROUTER_KEY_2}"
    - key: "${OPENROUTER_KEY_3}"
Keys are rotated round-robin.
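Round-robin rotation means each request takes the next key in the list and wraps back to the first after the last. The class below is a small illustration of that idea; `CredentialPool` and `next_key` are hypothetical names, and MonoClaw's real rotation logic may differ in details such as thread safety or skipping rate-limited keys.

```python
from itertools import cycle
from typing import Iterable


class CredentialPool:
    """Hand out API keys in a fixed, repeating order (illustrative only)."""

    def __init__(self, keys: Iterable[str]) -> None:
        key_list = list(keys)
        if not key_list:
            raise ValueError("credential pool needs at least one key")
        self._keys = cycle(key_list)

    def next_key(self) -> str:
        """Return the key to use for the next request."""
        return next(self._keys)


pool = CredentialPool(["key-1", "key-2", "key-3"])
sequence = [pool.next_key() for _ in range(5)]
print(sequence)
```

Over five requests the pool yields the three keys in order and then starts again from the first.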
Model aliases
Create short aliases for frequently used models:
model:
  aliases:
    fast: "openai/gpt-5.4-mini"
    smart: "anthropic/claude-opus-4.7"
    cheap: "deepseek/deepseek-v4-flash"
Use them in chat:
/model fast
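Conceptually, an alias is a lookup from a short name to a full model identifier. The sketch below shows that mapping using the aliases from the example config. `resolve_model` is a hypothetical helper, and passing unknown names through unchanged is an assumption about how full model IDs would be handled.

```python
# Aliases copied from the example config above.
ALIASES = {
    "fast": "openai/gpt-5.4-mini",
    "smart": "anthropic/claude-opus-4.7",
    "cheap": "deepseek/deepseek-v4-flash",
}


def resolve_model(name: str, aliases: dict = ALIASES) -> str:
    """Expand an alias to its full model ID; pass other names through unchanged."""
    return aliases.get(name, name)


print(resolve_model("fast"))
print(resolve_model("anthropic/claude-sonnet-4"))
```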
Verifying your setup
monoclaw doctor
This checks:
- API key validity
- Model availability
- Context length compatibility
- Provider endpoint reachability