Skip to content
2026-05-26 · Kalmantic

TL;DR — Set OPENAI_API_BASE=https://api.jusinfer.com/v1 and OPENAI_API_KEY=jinf_... then run `aider --model openai/jusInfer-auto`. Aider's per-turn full-context re-send pattern means it benefits more than most agents from per-call model routing — typical savings 60-80% with no quality regression.

Aider on a budget — cut your bill 70% with one env var

Aider is among the best command-line pair-programmers on the planet. It's also expensive to run on default settings because its design — re-send the relevant repo context every turn — burns tokens. This post shows how to route Aider through a cheaper inference endpoint without changing anything about how you use it.

The two env vars that change everything

Aider respects the standard OpenAI env vars. Set these in your shell:

export OPENAI_API_BASE="https://api.jusinfer.com/v1"
export OPENAI_API_KEY="jinf_your_key_here"

Then run Aider with an explicit model name so it knows to use the OpenAI provider path:

aider --model openai/jusInfer-auto

That's it. Aider now sends every call to jusInfer, which routes per-call to the cheapest capable model. Average bill drops 60-80% on a typical refactor session.

Why Aider gets expensive

Aider operates by sending the entire content of every file you've added to the chat with every turn. This is correct behavior — the model needs context. But it means:

  • A 5-turn refactor on three 300-line files = ~30k prompt tokens × 5 turns = 150k tokens just in repeated context.
  • At Sonnet 4.5 rates that's $0.45 in prompt-only spend per session.
  • Multiply by 30 sessions/day × 4 weeks = $540/month per engineer.

jusInfer's mitigation isn't magic — it's that most of those turns don't need Sonnet. A "fix this lint error" turn that comes after a successful "implement feature X" turn can route to an 8B-parameter model for 1/50th the price. The next turn ("now add tests") can route back up. You don't pick — we do.

Pinning a model when you need to

Sometimes you want a specific model for a session — say, you're doing a tricky algorithmic refactor and you want Sonnet end-to-end. Pass it explicitly:

aider --model openai/anthropic/claude-sonnet-4.5
# or
aider --model openai/openai/gpt-5

jusInfer normalizes the provider prefix — you don't need accounts at each upstream. The openai/ outer prefix tells Aider to use the OpenAI client path (which then hits jusInfer). The inner anthropic/... or openai/... is the model id we route to.

Verifying it works

After a session, run:

aider --usage

You'll see token counts. Cross-reference against your jusInfer dashboard at jusinfer.com/developer → Usage tab. The numbers should match within a token or two (Aider counts client-side; jusInfer counts server-side; they diverge only on streamed responses with mid-stream cutoffs).

What works, what to watch for

FeatureWorks?Notes
/add filesnormal repo-context flow
/drop files
/diff editsAider's "diff" edit format works because models we route to all support it
/ask modereads files, doesn't propose edits
/code mode
Voice modeuses Aider's Whisper integration; transcription stays local
Git auto-commitsunrelated to inference
Custom edit formats--edit-format diff and udiff both work
Repo mapthe auto-generated repo map gets included in context
Image inputsauto-routes to a vision-capable model
Architect mode (--architect)uses two models — architect + editor; both go through jusInfer

Architect mode + cost

Aider's --architect mode is great for big features (architect proposes, editor implements). On default settings it uses Sonnet for both, which doubles cost. Aider lets you specify them separately:

aider --architect \
  --architect-model openai/anthropic/claude-sonnet-4.5 \
  --editor-model openai/qwen3-coder-480b

This pattern — frontier model for planning, cheap model for execution — typically halves architect-mode cost with no quality regression on the architect's output.

Cache-aware modes

If you're on a model that supports prompt caching (Sonnet 4.5, GPT-5), Aider passes the cache-control hints through. jusInfer forwards them. Your second turn in a session uses cached context, which is up to 90% cheaper. Watch your dashboard to verify cache hits land — the savings show as a separate line on the usage breakdown.

Setup checklist

  1. Install Aider: pipx install aider-install && aider-install
  2. Mint a jusInfer key at jusinfer.com/developer → Keys tab.
  3. Set the env vars above in ~/.zshrc or ~/.bashrc.
  4. source ~/.zshrc
  5. cd into your repo and run aider --model openai/jusInfer-auto.
  6. Use Aider normally for a day. Check spend on the dashboard.

Raw markdown: /blog/aider-cheap-inference.md

aideropenai-compatiblecost-optimizationcustom-base-urlai-pair-programming