2026-05-27 · Kalmantic

TL;DR — In Roo Code → Settings → API Configuration, pick "OpenAI Compatible". Set Base URL to https://api.jusinfer.com/v1, paste a jinf_ token, and use jusInfer-auto. All four modes (Code, Architect, Ask, Debug) keep their behavior — jusInfer picks the cheapest capable model per turn.

Roo Code + custom endpoint — fork the agent, not the bill

Roo Code is a Cline fork that added persistent multi-mode workflows (Code, Architect, Ask, Debug) and a richer prompt-customization surface. It's excellent for long-running refactors and architecture conversations — and like Cline, it'll happily burn through Claude tokens on default settings. Roo natively supports any OpenAI-compatible base URL, so you can route through jusInfer and keep all four modes working as designed.

What you'll change

Two values in Roo Code's settings panel. No extension reinstall, no mode behavior change, no workflow change.

Setup

1. Mint a jusInfer API key

Sign in at jusinfer.com/login with Google or Microsoft. Open jusinfer.com/developer → Keys tab → Mint key. Copy the jinf_… token (shown once).

2. Configure Roo Code

Open Roo Code in VS Code → click the gear icon → API Configuration.

API Provider: OpenAI Compatible
Base URL: https://api.jusinfer.com/v1
API Key: paste the jinf_… token
Model: jusInfer-auto
Model Context Window: leave at default (Roo auto-detects from the first response)

Save. That's it.

3. Verify

Open any project, hit Code mode, type read package.json and tell me what this app does. You should see a normal Roo response. Open the Roo output panel — request lines now show api.jusinfer.com instead of api.anthropic.com.

Mode-by-mode notes

Mode	What changes	What doesn't
Code	Picks a smaller model for "read file" steps, a stronger one for "rewrite this function"	Tool use, file edits, terminal commands
Architect	Routes long-context planning to a high-context model regardless of cost (it's the highest-leverage call in a session)	Plan-then-implement separation
Ask	Single-shot Q&A goes to a fast, cheap model — usually 10-15ms TTFT	Conversation persistence
Debug	Stack-trace analysis lands on a reasoning-capable model with `reasoning_effort=medium` by default	Test-run loop, breakpoint inspection

What if Roo's auto-detection picks the wrong context window?

Set it explicitly. In Advanced Settings, override Context Window to 200000 (DeepSeek v4 Pro's window, currently jusInfer's default upstream). If you later hit a model with a smaller window, jusInfer will route around it; the override just prevents Roo from truncating your prompt before the request leaves.

Cost comparison — same 4-hour Roo session

Setup	Approximate session cost	Notes
Roo + Claude Sonnet (direct)	$18-25	Baseline; mixed-mode session, ~40 tool calls
Roo + GPT-4.1 (direct)	$14-19	Cheaper input, comparable output
Roo + jusInfer-auto	$4-7	Smaller model on tool-use steps; reasoning model only when needed

(These are sample sessions, not benchmarks. Your numbers will vary with how much you bounce between Architect and Code modes.)

What Roo doesn't lose

Multi-mode workflow — all four modes route through the same endpoint; the mode switch is client-side
Custom instructions — Roo's mode-level custom instructions go in the system prompt; jusInfer passes them through unchanged
Prompt history — stored locally in your VS Code profile; not touched by the endpoint change
Approval flows — Roo's "review before applying" toggles are client-side; unaffected

When to stay on Roo's default provider

Two cases:

Strict provider compliance — your team has a contract with Anthropic or OpenAI that requires direct billing. jusInfer is a passthrough; the underlying provider sees jusInfer's account, not yours.
Provider-specific features Roo doesn't abstract — e.g. Claude's prompt caching headers, if Roo exposes them in a future release. As of today, all mode behaviors are abstractable through OpenAI-compatible calls.

Switching back

Same gear icon, change API Provider back to your prior choice. No state is lost. jusInfer keys keep working until you revoke them — you can A/B between endpoints without churn.