2026-05-26 · Kalmantic

TL;DR — Run `goose configure`, pick OpenAI as the provider, point host to https://api.jusinfer.com and paste a jinf_ token. Goose's tool-use loop keeps working exactly as before; per-call routing drops typical bills 60-80%.

Use Goose (Block) with a custom provider

Goose is Block's open-source agent toolkit — strong on extensions (called "goose-extensions"), tool-use, and giving you full control over the agent loop. It accepts any OpenAI-compatible provider, which makes pointing it at jusInfer a 5-minute config change with immediate cost savings.

The 5-minute setup

goose configure

When prompted:

? Which provider?
  ↳ OpenAI

? Host (without trailing slash)
  ↳ https://api.jusinfer.com

? API key
  ↳ jinf_your_key_here

? Model
  ↳ jusInfer-auto

Goose stores this in ~/.config/goose/config.yaml. Verify:

GOOSE_PROVIDER__TYPE: openai
GOOSE_PROVIDER__HOST: https://api.jusinfer.com
GOOSE_PROVIDER__API_KEY: jinf_…
GOOSE_PROVIDER__MODEL: jusInfer-auto

Done. Start a session:

goose session start

Getting an API key

Sign in at jusinfer.com/login → jusinfer.com/developer → Keys tab → Mint key. Copy the jinf_… token (shown once). Paste into the configure prompt above.

What works

Feature	Status
`goose session start` interactive loop	✅
Tool extensions (`goose-extensions`)	✅
Memory between sessions	✅
Custom system prompts	✅
Streaming responses	✅
Plan-then-execute mode	✅
Image inputs (vision)	✅ auto-routes to vision-capable model

Pinning a specific model

jusInfer-auto lets jusInfer pick per call. If you want every Goose interaction to hit a specific upstream:

GOOSE_PROVIDER__MODEL: anthropic/claude-sonnet-4.5
# or
GOOSE_PROVIDER__MODEL: nousresearch/hermes-4-405b
# or
GOOSE_PROVIDER__MODEL: openai/gpt-5

jusInfer normalizes provider prefixes — you don't need separate accounts at each upstream.

Why Goose benefits from per-call routing

Goose sessions tend to be tool-heavy: the agent calls developer__list_files, then developer__read_file, then developer__write_file, then developer__shell to run tests. That's 4 LLM round-trips just to apply one edit and verify it. Most of those round-trips are tactical (parse a directory listing, propose a small edit) and don't need a frontier model.

jusInfer's per-call routing sends the tactical steps to a small fast model (Qwen3 8B, Hermes 70B) and reserves the frontier model for steps that actually need reasoning (architect a refactor, debug a confusing failure). On real Goose workloads we see 60-80% cost reduction with no task-completion regression.

Goose-extensions still work

If you have custom goose-extensions (Slack integration, internal API wrappers, custom tool sets), they keep working unchanged — extensions communicate via Goose's tool-call protocol, which is OpenAI-compatible, which is what jusInfer speaks. No extension-side change required.

Common gotchas

HOST is the base URL without /v1 — Goose appends /v1/chat/completions itself. So set https://api.jusinfer.com, not https://api.jusinfer.com/v1.
Streaming is on by default in Goose. If you see "Connection closed by peer" mid-stream, that's network, not the gateway — retry the session.
Goose retries failed tool calls up to N times. If you hit 429 RATE_LIMIT on a burst, either wait or raise the per-key rpm in your dashboard → Tenant tab.

Multi-engineer setup

Each engineer mints their own jinf_ key via /developer (separate keys = per-user spend attribution). They each run goose configure once with their own key. The owner can set per-user monthly soft caps under the Tenant tab to prevent any one engineer's runaway loop from draining the team wallet.

Setup checklist

Sign up at jusinfer.com/login.
Mint a jinf_ key at /developer → Keys.
goose configure → OpenAI provider, host https://api.jusinfer.com, paste key.
goose session start, run a normal workflow.
Check spend on the Usage tab.
(Multi-engineer) set per-user caps under Tenant tab.

Raw markdown: /blog/goose-custom-provider.md

gooseblockopenai-compatiblecustom-provideragent-toolkitai-coding