Modal
Modal is a serverless compute platform that lets you run Mona on GPUs and CPUs in the cloud, scaling from zero to thousands of workers.
When to use Modal
- You need GPU acceleration for local model inference
- You want ephemeral, isolated execution environments
- You want to scale concurrent agent sessions horizontally
- You don't want to manage servers
Installation
The Modal extra is included in the [all] install. If you installed the minimal bundle:
cd ~/.monoclaw/monoclaw-runtime
uv pip install -e ".[modal]"
Authentication
- Sign up at modal.com
- Install the Modal CLI:
pip install modal - Run
modal token newto authenticate
Configure MonoClaw
Set Modal as your terminal backend:
monoclaw config set terminal.backend modal
Or configure in config.yaml:
terminal:
backend: modal
modal:
app_name: "monoclaw-agent"
gpu: "a10g" # or "t4", "a100", "h100"
cpu: 4
memory: 16384 # MB
timeout: 3600 # seconds
How it works
When Mona needs to run a command:
- MonoClaw spins up a Modal sandbox
- The command executes inside the sandbox
- Output streams back to Mona in real time
- The sandbox shuts down when idle (or stays warm if configured)
Cost optimization
Modal charges only for compute time. Tips to minimize costs:
- Use
keep_warm: 1to keep one sandbox warm for fast responses - Use cheaper GPUs (
t4) for light workloads - Set
timeoutlow to prevent runaway processes - Use
spotinstances for non-critical tasks
Example: GPU-accelerated local model
Run a local model inside Modal:
model:
default: "custom"
custom:
endpoint: "https://your-modal-app.modal.run/v1"
api_key: "${MODAL_API_KEY}"
Deploy an OpenAI-compatible endpoint on Modal using vLLM or TGI, then point Mona at it.
Limitations
- Cold start latency: 5–30 seconds for the first command if no sandbox is warm
- Network: Sandboxes have internet access but no persistent local storage
- Secrets: Pass secrets via environment variables, not command arguments