Type what you want in natural language. Kai translates it to the right shell command — entirely on your machine. Inference runs in-process via Candle, which is compiled into the binary; the default model is Qwen 3 1.7B. No accounts. No API keys. No daemons. No data leaves your device.
Modern small models are good enough for shell commands — and running them locally beats sending every keystroke to someone else's server.
Every prompt, every directory listing, every command stays on your machine. There is no upstream to leak to.
No subscriptions, no rate limits, no per-token fees. Use it on a plane, on flaky café Wi-Fi, or on an air-gapped box.
Qwen 3 0.6B / 1.7B / 4B today; more as Candle adds them. Edit one line in config.toml and the next launch downloads a new brain.
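For example, moving to the 4B variant is a one-line change (the full config file appears below):

```toml
# ~/.config/kai/config.toml
model = "qwen3:4b"   # was "qwen3:1.7b"; downloaded on next launch
```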
```
You type          Kai (Rust binary)          Candle (in-process)
─────────         ─────────────────          ───────────────────
"list rust  ──>   classify input      ──>    Qwen 3 1.7B (Q4)
 files"           + gather context           running on Metal/CPU
                  (cwd, git, OS)
                                      <──    "find . -name '*.rs'"
                  show confirm UI
[Enter]     ──>   PTY ──> your shell runs it
```
Kai wraps your existing shell (zsh / fish / bash) in a PTY. Plain commands like ls pass through untouched; natural language is routed to the in-process model. No daemon.
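A session might look like this (the directory contents and the confirm prompt's exact rendering are illustrative, not a verbatim capture):

```
$ ls                       # plain command: passes through untouched
Cargo.toml  README.md  src
$ list rust files          # natural language: routed to the model
  → find . -name '*.rs'    [Enter] run · [Esc] cancel
```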
Prebuilt binaries are provided for macOS (Apple Silicon / Intel) and Linux. A Homebrew tap is in preparation.
First launch downloads the model (~1.1 GB) from Hugging Face and caches it in ~/.cache/huggingface/hub/; subsequent launches start instantly. On an M-series Mac, inference runs at 30–80 tok/s via Metal, fast enough that generated commands feel immediate.
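To verify the weights landed in the local cache (output shown here is illustrative; the size reflects the 1.7B model on an otherwise empty cache):

```
$ du -sh ~/.cache/huggingface/hub/
1.1G    /Users/you/.cache/huggingface/hub/
```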
Defaults work out of the box. To change the model or compute device, edit ~/.config/kai/config.toml:
```toml
model  = "qwen3:1.7b"   # or "qwen3:0.6b" / "qwen3:4b"
device = "auto"         # or "cpu" / "metal" / "cuda"
```
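For the curious, here is a rough sketch of how the device value could map onto Candle's device constructors; this is an illustration, not Kai's actual source, and pick_device is a hypothetical helper:

```rust
use candle_core::{Device, Result};

/// Hypothetical helper, not Kai's actual code: one plausible way the
/// `device` setting could resolve to a Candle device.
fn pick_device(setting: &str) -> Result<Device> {
    match setting {
        "cpu"   => Ok(Device::Cpu),
        "metal" => Device::new_metal(0), // first Metal GPU (macOS)
        "cuda"  => Device::new_cuda(0),  // first CUDA GPU (Linux/NVIDIA)
        // "auto": prefer an accelerator, fall back to the CPU.
        _ => Device::new_metal(0)
            .or_else(|_| Device::new_cuda(0))
            .or_else(|_| Ok(Device::Cpu)),
    }
}
```

Trying Metal before CUDA mirrors the supported platforms: Apple Silicon Macs get Metal, Linux boxes with NVIDIA GPUs get CUDA, and everything else runs on CPU.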