Skip to content
2026-05-27 · Kalmantic

TL;DR — In Roo Code → Settings → API Configuration, pick "OpenAI Compatible". Set Base URL to https://api.jusinfer.com/v1, paste a jinf_ token, and use jusInfer-auto. All four modes (Code, Architect, Ask, Debug) keep their behavior — jusInfer picks the cheapest capable model per turn.

Roo Code + custom endpoint — fork the agent, not the bill

Roo Code is a Cline fork that added persistent multi-mode workflows (Code, Architect, Ask, Debug) and a richer prompt-customization surface. It's excellent for long-running refactors and architecture conversations — and like Cline, it'll happily burn through Claude tokens on default settings. Roo natively supports any OpenAI-compatible base URL, so you can route through jusInfer and keep all four modes working as designed.

What you'll change

Two values in Roo Code's settings panel. No extension reinstall, no mode behavior change, no workflow change.

Setup

1. Mint a jusInfer API key

Sign in at jusinfer.com/login with Google or Microsoft. Open jusinfer.com/developerKeys tab → Mint key. Copy the jinf_… token (shown once).

2. Configure Roo Code

Open Roo Code in VS Code → click the gear icon → API Configuration.

  • API Provider: OpenAI Compatible
  • Base URL: https://api.jusinfer.com/v1
  • API Key: paste the jinf_… token
  • Model: jusInfer-auto
  • Model Context Window: leave at default (Roo auto-detects from the first response)

Save. That's it.

3. Verify

Open any project, hit Code mode, type read package.json and tell me what this app does. You should see a normal Roo response. Open the Roo output panel — request lines now show api.jusinfer.com instead of api.anthropic.com.

Mode-by-mode notes

ModeWhat changesWhat doesn't
CodePicks a smaller model for "read file" steps, a stronger one for "rewrite this function"Tool use, file edits, terminal commands
ArchitectRoutes long-context planning to a high-context model regardless of cost (it's the highest-leverage call in a session)Plan-then-implement separation
AskSingle-shot Q&A goes to a fast, cheap model — usually 10-15ms TTFTConversation persistence
DebugStack-trace analysis lands on a reasoning-capable model with reasoning_effort=medium by defaultTest-run loop, breakpoint inspection

What if Roo's auto-detection picks the wrong context window?

Set it explicitly. In Advanced Settings, override Context Window to 200000 (DeepSeek v4 Pro's window, currently jusInfer's default upstream). If you later hit a model with a smaller window, jusInfer will route around it; the override just prevents Roo from truncating your prompt before the request leaves.

Cost comparison — same 4-hour Roo session

SetupApproximate session costNotes
Roo + Claude Sonnet (direct)$18-25Baseline; mixed-mode session, ~40 tool calls
Roo + GPT-4.1 (direct)$14-19Cheaper input, comparable output
Roo + jusInfer-auto$4-7Smaller model on tool-use steps; reasoning model only when needed

(These are sample sessions, not benchmarks. Your numbers will vary with how much you bounce between Architect and Code modes.)

What Roo doesn't lose

  • Multi-mode workflow — all four modes route through the same endpoint; the mode switch is client-side
  • Custom instructions — Roo's mode-level custom instructions go in the system prompt; jusInfer passes them through unchanged
  • Prompt history — stored locally in your VS Code profile; not touched by the endpoint change
  • Approval flows — Roo's "review before applying" toggles are client-side; unaffected

When to stay on Roo's default provider

Two cases:

  1. Strict provider compliance — your team has a contract with Anthropic or OpenAI that requires direct billing. jusInfer is a passthrough; the underlying provider sees jusInfer's account, not yours.
  2. Provider-specific features Roo doesn't abstract — e.g. Claude's prompt caching headers, if Roo exposes them in a future release. As of today, all mode behaviors are abstractable through OpenAI-compatible calls.

Switching back

Same gear icon, change API Provider back to your prior choice. No state is lost. jusInfer keys keep working until you revoke them — you can A/B between endpoints without churn.

Further reading

roo-codevscode-agentopenai-compatiblecustom-base-urlcost-optimization