TL;DR — Set OPENAI_API_BASE=https://api.jusinfer.com/v1 and OPENAI_API_KEY=jinf_... then run `aider --model openai/jusInfer-auto`. Aider's per-turn full-context re-send pattern means it benefits more than most agents from per-call model routing — typical savings 60-80% with no quality regression.
Aider on a budget — cut your bill 70% with one env var
Aider is among the best command-line pair-programmers on the planet. It's also expensive to run on default settings because its design — re-send the relevant repo context every turn — burns tokens. This post shows how to route Aider through a cheaper inference endpoint without changing anything about how you use it.
The two env vars that change everything
Aider respects the standard OpenAI env vars. Set these in your shell:
export OPENAI_API_BASE="https://api.jusinfer.com/v1"
export OPENAI_API_KEY="jinf_your_key_here"
Then run Aider with an explicit model name so it knows to use the OpenAI provider path:
aider --model openai/jusInfer-auto
That's it. Aider now sends every call to jusInfer, which routes per-call to the cheapest capable model. Average bill drops 60-80% on a typical refactor session.
Why Aider gets expensive
Aider operates by sending the entire content of every file you've added to the chat with every turn. This is correct behavior — the model needs context. But it means:
- A 5-turn refactor on three 300-line files = ~30k prompt tokens × 5 turns = 150k tokens just in repeated context.
- At Sonnet 4.5 rates that's $0.45 in prompt-only spend per session.
- Multiply by 30 sessions/day × 4 weeks = $540/month per engineer.
jusInfer's mitigation isn't magic — it's that most of those turns don't need Sonnet. A "fix this lint error" turn that comes after a successful "implement feature X" turn can route to an 8B-parameter model for 1/50th the price. The next turn ("now add tests") can route back up. You don't pick — we do.
Pinning a model when you need to
Sometimes you want a specific model for a session — say, you're doing a tricky algorithmic refactor and you want Sonnet end-to-end. Pass it explicitly:
aider --model openai/anthropic/claude-sonnet-4.5
# or
aider --model openai/openai/gpt-5
jusInfer normalizes the provider prefix — you don't need accounts at each upstream. The openai/ outer prefix tells Aider to use the OpenAI client path (which then hits jusInfer). The inner anthropic/... or openai/... is the model id we route to.
Verifying it works
After a session, run:
aider --usage
You'll see token counts. Cross-reference against your jusInfer dashboard at jusinfer.com/developer → Usage tab. The numbers should match within a token or two (Aider counts client-side; jusInfer counts server-side; they diverge only on streamed responses with mid-stream cutoffs).
What works, what to watch for
| Feature | Works? | Notes |
|---|---|---|
/add files | ✅ | normal repo-context flow |
/drop files | ✅ | |
/diff edits | ✅ | Aider's "diff" edit format works because models we route to all support it |
/ask mode | ✅ | reads files, doesn't propose edits |
/code mode | ✅ | |
| Voice mode | ✅ | uses Aider's Whisper integration; transcription stays local |
| Git auto-commits | ✅ | unrelated to inference |
| Custom edit formats | ✅ | --edit-format diff and udiff both work |
| Repo map | ✅ | the auto-generated repo map gets included in context |
| Image inputs | ✅ | auto-routes to a vision-capable model |
Architect mode (--architect) | ✅ | uses two models — architect + editor; both go through jusInfer |
Architect mode + cost
Aider's --architect mode is great for big features (architect proposes, editor implements). On default settings it uses Sonnet for both, which doubles cost. Aider lets you specify them separately:
aider --architect \
--architect-model openai/anthropic/claude-sonnet-4.5 \
--editor-model openai/qwen3-coder-480b
This pattern — frontier model for planning, cheap model for execution — typically halves architect-mode cost with no quality regression on the architect's output.
Cache-aware modes
If you're on a model that supports prompt caching (Sonnet 4.5, GPT-5), Aider passes the cache-control hints through. jusInfer forwards them. Your second turn in a session uses cached context, which is up to 90% cheaper. Watch your dashboard to verify cache hits land — the savings show as a separate line on the usage breakdown.
Setup checklist
- Install Aider:
pipx install aider-install && aider-install - Mint a jusInfer key at jusinfer.com/developer → Keys tab.
- Set the env vars above in
~/.zshrcor~/.bashrc. source ~/.zshrccdinto your repo and runaider --model openai/jusInfer-auto.- Use Aider normally for a day. Check spend on the dashboard.
Related reading
- OpenAI-compatible drop-in (full reference for every harness)
- The cheapest LLM API for coding agents in 2026
- Use jusInfer with Claude Code
- Use jusInfer with OpenCode
Raw markdown: /blog/aider-cheap-inference.md