2026-05-27 · Kalmantic

TL;DR — Run `interpreter --api_base https://api.jusinfer.com/v1 --api_key <jinf_token> --model jusInfer-auto`. OI's litellm layer passes the OpenAI-compatible request through; jusInfer picks the cheapest capable model per turn. Local code execution, tool approvals, and conversation memory are unaffected.

Open Interpreter + custom provider — keep the agent, drop the bill

Open Interpreter (OI) lets a model run code on your machine to actually accomplish a task — read files, install packages, query databases, plot data. The agent loop is short and tight: model emits code → OI runs it → OI feeds stdout/stderr back → model decides next step. That tight loop means a lot of model calls per task, which means a lot of money on a frontier model. Pointing OI at jusInfer drops the per-call cost without changing the code-execution behavior at all.

Why it works

OI's model layer is litellm, which speaks OpenAI-compatible by default. Anything that accepts a base URL and bearer token works.

Setup

1. Mint a jusInfer API key

2. Launch OI against jusInfer

CLI flags (one-off):

interpreter \
  --api_base https://api.jusinfer.com/v1 \
  --api_key jinf_your_token_here \
  --model jusInfer-auto

Or set them once in ~/.openinterpreter/config.yaml:

llm:
  api_base: https://api.jusinfer.com/v1
  api_key: jinf_your_token_here
  model: jusInfer-auto

3. Verify

Run interpreter with no further args, then ask it what python version is installed. You'll see OI emit a one-liner, ask permission to run, execute, and report — same flow as before. The model behind it is now jusInfer-auto.

What changes vs default

Aspect	Default OI (frontier model)	OI + jusInfer
First-token latency	400-800ms	200-500ms (smaller models warm up faster)
Cost per "read file → decide" loop step	$0.01-0.05	$0.002-0.01
Cost per "write 200-line script" step	$0.04-0.10	$0.02-0.05
Code-execution behavior	unchanged	unchanged
Conversation memory	unchanged	unchanged
Tool approval flow	unchanged	unchanged
Local file access	unchanged	unchanged

What about safety mode / `--safe_mode`?

OI's safe mode (interactive approval before each code execution) is a CLIENT-side feature. The model behind it doesn't know whether you'll approve, deny, or modify the code. Switching to jusInfer doesn't weaken safe mode — your approvals still gate every execution.

What about offline / `--local`?

--local runs an Ollama or LM Studio model on your machine. Don't combine --local with --api_base — they're mutually exclusive. Use --local when you want privacy + zero per-call cost; use jusInfer when you want frontier quality at routed-down cost.

Multi-step tasks: where the savings compound

OI tasks tend to be long. "Clean this dataset and produce three charts" might be 30-50 model calls — read CSV, inspect schema, write cleaning script, run it, check output, write plotting script, run it, check output, iterate. Per-call cost matters more than per-token cost.

Sample task — "load sales.csv, find the top 5 products by revenue in 2025, write each to a separate JSON file":

Model	Total calls	Total cost	Wall time
Claude Sonnet (direct)	12	~$0.18	38s
GPT-4.1 (direct)	11	~$0.14	41s
jusInfer-auto	12	~$0.03	35s

The wall time barely moves because the bottleneck is local code execution, not the model. The cost moves a lot because every "tell me what the columns are" step now lands on a small fast model.

When you'd stay on the direct provider

Custom system prompts that depend on provider-specific tool calling — OI uses litellm's normalized tool-use, so this is rare.
Your org has an inference contract you need to bill against — jusInfer is a passthrough; underlying providers see jusInfer's account.

Switching back

Drop the --api_base / --api_key flags or comment them out in config.yaml. OI falls back to whatever was set in environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).