TL;DR — In Roo Code → Settings → API Configuration, pick "OpenAI Compatible". Set Base URL to https://api.jusinfer.com/v1, paste a jinf_ token, and use jusInfer-auto. All four modes (Code, Architect, Ask, Debug) keep their behavior — jusInfer picks the cheapest capable model per turn.
Roo Code + custom endpoint — fork the agent, not the bill
Roo Code is a Cline fork that added persistent multi-mode workflows (Code, Architect, Ask, Debug) and a richer prompt-customization surface. It's excellent for long-running refactors and architecture conversations — and like Cline, it'll happily burn through Claude tokens on default settings. Roo natively supports any OpenAI-compatible base URL, so you can route through jusInfer and keep all four modes working as designed.
What you'll change
Two values in Roo Code's settings panel. No extension reinstall, no mode behavior change, no workflow change.
Setup
1. Mint a jusInfer API key
Sign in at jusinfer.com/login with Google or Microsoft. Open jusinfer.com/developer → Keys tab → Mint key. Copy the jinf_… token (shown once).
2. Configure Roo Code
Open Roo Code in VS Code → click the gear icon → API Configuration.
- API Provider:
OpenAI Compatible - Base URL:
https://api.jusinfer.com/v1 - API Key: paste the
jinf_…token - Model:
jusInfer-auto - Model Context Window: leave at default (Roo auto-detects from the first response)
Save. That's it.
3. Verify
Open any project, hit Code mode, type read package.json and tell me what this app does. You should see a normal Roo response. Open the Roo output panel — request lines now show api.jusinfer.com instead of api.anthropic.com.
Mode-by-mode notes
| Mode | What changes | What doesn't |
|---|---|---|
| Code | Picks a smaller model for "read file" steps, a stronger one for "rewrite this function" | Tool use, file edits, terminal commands |
| Architect | Routes long-context planning to a high-context model regardless of cost (it's the highest-leverage call in a session) | Plan-then-implement separation |
| Ask | Single-shot Q&A goes to a fast, cheap model — usually 10-15ms TTFT | Conversation persistence |
| Debug | Stack-trace analysis lands on a reasoning-capable model with reasoning_effort=medium by default | Test-run loop, breakpoint inspection |
What if Roo's auto-detection picks the wrong context window?
Set it explicitly. In Advanced Settings, override Context Window to 200000 (DeepSeek v4 Pro's window, currently jusInfer's default upstream). If you later hit a model with a smaller window, jusInfer will route around it; the override just prevents Roo from truncating your prompt before the request leaves.
Cost comparison — same 4-hour Roo session
| Setup | Approximate session cost | Notes |
|---|---|---|
| Roo + Claude Sonnet (direct) | $18-25 | Baseline; mixed-mode session, ~40 tool calls |
| Roo + GPT-4.1 (direct) | $14-19 | Cheaper input, comparable output |
| Roo + jusInfer-auto | $4-7 | Smaller model on tool-use steps; reasoning model only when needed |
(These are sample sessions, not benchmarks. Your numbers will vary with how much you bounce between Architect and Code modes.)
What Roo doesn't lose
- Multi-mode workflow — all four modes route through the same endpoint; the mode switch is client-side
- Custom instructions — Roo's mode-level custom instructions go in the system prompt; jusInfer passes them through unchanged
- Prompt history — stored locally in your VS Code profile; not touched by the endpoint change
- Approval flows — Roo's "review before applying" toggles are client-side; unaffected
When to stay on Roo's default provider
Two cases:
- Strict provider compliance — your team has a contract with Anthropic or OpenAI that requires direct billing. jusInfer is a passthrough; the underlying provider sees jusInfer's account, not yours.
- Provider-specific features Roo doesn't abstract — e.g. Claude's prompt caching headers, if Roo exposes them in a future release. As of today, all mode behaviors are abstractable through OpenAI-compatible calls.
Switching back
Same gear icon, change API Provider back to your prior choice. No state is lost. jusInfer keys keep working until you revoke them — you can A/B between endpoints without churn.
Further reading
- Cline + custom endpoint — the parent project's setup
- Custom agent harness on an OpenAI-compatible base URL — what every harness needs
- jusInfer API reference — every supported field including
reasoning_effortandthinking