TL;DR — In Cline → Settings → API Provider, pick "OpenAI Compatible". Set Base URL to https://api.jusinfer.com/v1, paste a jinf_ token, and use jusInfer-auto as the model. Cline keeps full autonomous-edit and tool-use behavior; jusInfer picks the cheapest capable model per step.
Cline + custom endpoint — cut your bill 60%+
Cline (formerly Claude Dev) is the most polished autonomous-edit agent in VS Code. It's also expensive on default settings because every "read file → propose edit → run test" loop step talks to a frontier model. Cline natively supports any OpenAI-compatible base URL, so you can route through jusInfer and drop the bill 60-80% without changing your workflow.
What you'll change
Two values in Cline's settings panel. No code, no extension reinstall, no workflow change.
Setup
1. Mint a jusInfer API key
Sign in at jusinfer.com/login with Google or Microsoft. Open jusinfer.com/developer → Keys tab → Mint key. Copy the jinf_… token (shown once).
2. Configure Cline
Open VS Code's Cline panel → click the gear icon → API Provider dropdown → select "OpenAI Compatible". Then fill in:
Base URL: https://api.jusinfer.com/v1
API Key: jinf_your_key_here
Model ID: jusInfer-auto
Click Save.
3. Verify
Start a Cline task — say, "rename this function and update all call sites." It should work exactly as before. After it finishes, open jusinfer.com/developer → Usage tab. You'll see the spend tick up at a fraction of what Sonnet 4.5 would cost.
What you get
| Behavior | Status |
|---|---|
| Autonomous edit mode | ✅ unchanged |
| Tool use (read_file, write_file, execute_command) | ✅ unchanged |
| Multi-step plans | ✅ unchanged |
| File context / repo awareness | ✅ unchanged |
| Image inputs (when needed) | ✅ auto-routes to vision-capable model |
| Streaming responses | ✅ |
| Cost per task | down 60-80% on real workloads |
Pinning a specific model
jusInfer-auto lets us pick. If you want every Cline call to use a specific upstream — say Sonnet 4.5 for hard refactors or Hermes 4 405B for tool-heavy stuff — use the provider-prefixed model id instead:
Model ID: anthropic/claude-sonnet-4.5
# or
Model ID: nousresearch/hermes-4-405b
# or
Model ID: openai/gpt-5
jusInfer normalizes provider prefixes — you don't need separate accounts at each upstream.
Why Cline benefits so much from per-call routing
Cline operates on tight loops: read → think → edit → verify → repeat. A typical task is 10-30 LLM calls. Most of those calls are tactical (apply a small edit, parse a tool result) and don't need a frontier model. A few are strategic (decide architecture, debug a hard failure) and do.
jusInfer looks at each request's shape (context length, tool-call pattern, recent turn history) and routes accordingly. Tactical calls go to an 8-30B parameter model that's fast and cheap; strategic calls go to Sonnet or GPT-5. You get the same task-completion rate at a fraction of the cost.
Real numbers from a real team
A 6-engineer team using Cline daily switched from default Sonnet 4.5 to jusInfer in March 2026. Before: $480/seat/month average. After: $115/seat/month over the following 30 days. No quality regression on their internal eval (which is a fixed set of 20 real refactor tasks they grade by hand). The savings came from 80% of their daily Cline traffic routing to mid-tier models that handle "apply this small edit" perfectly well.
Common gotchas
- The model ID must exist in your Cline settings dropdown. Some Cline versions require you to type
jusInfer-autoin a custom-models field before it appears as selectable. - Tool-call shape is normalized. Cline expects OpenAI-style tool calls; jusInfer normalizes Anthropic-format responses transparently so it doesn't matter which upstream actually runs.
- Streaming is forwarded end-to-end. If your Cline UI lags on responses, that's network — not the gateway.
What about Cline's Anthropic-direct path?
Cline also supports the Anthropic provider directly. That path uses the Anthropic Messages API, not OpenAI Chat Completions. To route Anthropic-shape traffic through jusInfer, see Use jusInfer with Claude Code — same two env vars (ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN), and Cline picks them up if you set them in your shell before launching VS Code.
Setup checklist
- Sign up at jusinfer.com/login.
- Mint a
jinf_key at /developer → Keys. - Cline → Settings → API Provider: OpenAI Compatible.
- Base URL:
https://api.jusinfer.com/v1. API Key: your jinf_ token. Model:jusInfer-auto. - Run a normal Cline task. Check spend on the dashboard.
- Set per-user caps in Tenant tab if multi-engineer.
Related reading
- OpenAI-compatible drop-in (Cursor, Aider, Continue, Goose…)
- Your Cursor bill is too high — three ways to cut it
- Aider on a budget
- The cheapest LLM API for coding agents in 2026
Raw markdown: /blog/cline-custom-endpoint.md