2026-05-26 · Kalmantic

TL;DR — In Cline → Settings → API Provider, pick "OpenAI Compatible". Set Base URL to https://api.jusinfer.com/v1, paste a jinf_ token, and use jusInfer-auto as the model. Cline keeps full autonomous-edit and tool-use behavior; jusInfer picks the cheapest capable model per step.

Cline + custom endpoint — cut your bill 60%+

Cline (formerly Claude Dev) is the most polished autonomous-edit agent in VS Code. It's also expensive on default settings because every "read file → propose edit → run test" loop step talks to a frontier model. Cline natively supports any OpenAI-compatible base URL, so you can route through jusInfer and drop the bill 60-80% without changing your workflow.

What you'll change

Two values in Cline's settings panel. No code, no extension reinstall, no workflow change.

Setup

1. Mint a jusInfer API key

Sign in at jusinfer.com/login with Google or Microsoft. Open jusinfer.com/developer → Keys tab → Mint key. Copy the jinf_… token (shown once).

2. Configure Cline

Open VS Code's Cline panel → click the gear icon → API Provider dropdown → select "OpenAI Compatible". Then fill in:

Base URL:   https://api.jusinfer.com/v1
API Key:    jinf_your_key_here
Model ID:   jusInfer-auto

Click Save.

3. Verify

Start a Cline task — say, "rename this function and update all call sites." It should work exactly as before. After it finishes, open jusinfer.com/developer → Usage tab. You'll see the spend tick up at a fraction of what Sonnet 4.5 would cost.

What you get

Behavior	Status
Autonomous edit mode	✅ unchanged
Tool use (read_file, write_file, execute_command)	✅ unchanged
Multi-step plans	✅ unchanged
File context / repo awareness	✅ unchanged
Image inputs (when needed)	✅ auto-routes to vision-capable model
Streaming responses	✅
Cost per task	down 60-80% on real workloads

Pinning a specific model

jusInfer-auto lets us pick. If you want every Cline call to use a specific upstream — say Sonnet 4.5 for hard refactors or Hermes 4 405B for tool-heavy stuff — use the provider-prefixed model id instead:

Model ID: anthropic/claude-sonnet-4.5
# or
Model ID: nousresearch/hermes-4-405b
# or
Model ID: openai/gpt-5

jusInfer normalizes provider prefixes — you don't need separate accounts at each upstream.

Why Cline benefits so much from per-call routing

Cline operates on tight loops: read → think → edit → verify → repeat. A typical task is 10-30 LLM calls. Most of those calls are tactical (apply a small edit, parse a tool result) and don't need a frontier model. A few are strategic (decide architecture, debug a hard failure) and do.

jusInfer looks at each request's shape (context length, tool-call pattern, recent turn history) and routes accordingly. Tactical calls go to an 8-30B parameter model that's fast and cheap; strategic calls go to Sonnet or GPT-5. You get the same task-completion rate at a fraction of the cost.

Real numbers from a real team

A 6-engineer team using Cline daily switched from default Sonnet 4.5 to jusInfer in March 2026. Before: $480/seat/month average. After: $115/seat/month over the following 30 days. No quality regression on their internal eval (which is a fixed set of 20 real refactor tasks they grade by hand). The savings came from 80% of their daily Cline traffic routing to mid-tier models that handle "apply this small edit" perfectly well.

Common gotchas

The model ID must exist in your Cline settings dropdown. Some Cline versions require you to type jusInfer-auto in a custom-models field before it appears as selectable.
Tool-call shape is normalized. Cline expects OpenAI-style tool calls; jusInfer normalizes Anthropic-format responses transparently so it doesn't matter which upstream actually runs.
Streaming is forwarded end-to-end. If your Cline UI lags on responses, that's network — not the gateway.

What about Cline's Anthropic-direct path?

Cline also supports the Anthropic provider directly. That path uses the Anthropic Messages API, not OpenAI Chat Completions. To route Anthropic-shape traffic through jusInfer, see Use jusInfer with Claude Code — same two env vars (ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN), and Cline picks them up if you set them in your shell before launching VS Code.

Setup checklist

Sign up at jusinfer.com/login.
Mint a jinf_ key at /developer → Keys.
Cline → Settings → API Provider: OpenAI Compatible.
Base URL: https://api.jusinfer.com/v1. API Key: your jinf_ token. Model: jusInfer-auto.
Run a normal Cline task. Check spend on the dashboard.
Set per-user caps in Tenant tab if multi-engineer.

Raw markdown: /blog/cline-custom-endpoint.md

clinevscode-agentopenai-compatiblecustom-base-urlcost-optimization