Skip to content
Inference endpoints · Agentic + CoWork · Tuned per workload

Two workloads. One sharp API.

jusInfer serves Agentic and CoWork — autonomous agents and human-in-the-loop teams — each with its own tuned endpoints: the Claw line, Hermes, and OpenCowork. Point at one API; we resolve the right model for the moment and bill you once.

$1 per user / month · first user free · pre-paid credits, no expiry

2Workload classes
0Vendor lock-in
1API, one bill
$0Until you ship
01 · The two workloads

Agents and CoWork get different tuning and endpoints. That's the difference.

Most gateways give every request the same treatment. jusInfer sorts each request into a workload class first — because a tool-looping agent and a human drafting in a chat window want very different things from a model. Each class owns its own roster, router, system prompt, parameters, and upstream endpoints.

WORKLOAD CLASS
Agentic

Autonomous agents that plan, loop, and call tools.

  • Many sequential calls per task
  • Heavy tool / MCP function-calling
  • Context accumulates across steps
  • Reasoning where it counts, cheap where it doesn't

Serves Claude Code, OpenCode, Cursor, Aider, Cline, Continue, and your own agents.

WORKLOAD CLASS
CoWork

Collaborative work — humans and agents, together.

  • Turn-based, latency-sensitive chat & drafting
  • Multi-agent collaboration over shared state
  • Tuned for conversational quality, not tool loops
  • Home of the OpenCowork harness

Serves Interactive assist, multi-agent teams, and the first-party OpenCowork client.

Fig. A — Set workload on the request, target a workload alias, or bind it to your API key. Unset requests are inferred.

02 · How it works

Two pillars, one abstraction. You see one bill.

The workload class is the seam where both of our pillars hang. Per request, we tune for your kind of work, then route to the model that wins on cost and capability — and the answer keeps improving as the market does. Your code never changes.

PILLAR
Workload-tuned optimization
  • ·Per-class system prompt + guardrails
  • ·Per-class parameters (reasoning, tool encouragement)
  • ·Per-class model roster + endpoints
  • ·The right model, tuned for your kind of work
PILLAR
Cost routing
  • ·Cheapest capable model per request
  • ·Cache and reuse across the team
  • ·Capacity arbitrage across providers
  • ·Opaque by default, inspectable on demand

Fig. B — The two pillars. We don't sell you a model menu; we sell the right model for the work, chosen on every call.

03 · Mission
Our mission is to make intelligence affordable. The frontier keeps moving — what should not move is the cost of using it.
04 · How we're cheaper

Old-world supply chain. Neo-world delivery.

We borrow what works from a century of supply-chain optimization — inventory, routing, bin-packing, hedging — and apply it to the new substrate of LLM inference.

01

Right-model routing

Every task is graded on difficulty before it runs. Most tasks don't need the biggest model — and don't get one.

02

Cache and reuse

Prompts, tool calls, and intermediate plans are cached aggressively across the team. The same work is never paid for twice.

03

Capacity arbitrage

We buy across providers, regions, and time-of-day. When one provider spikes, we route around it — you never notice.

05 · Pricing

One platform fee. Pre-paid credits for inference.

$1 per user per month is the platform fee. First user free. Credits are pre-paid in packs ($5, $10, larger). The gateway debits the actual inference cost per request. Credits roll over and don't expire. Hit zero, requests pause until you top up. No negative bill is possible.

Platform — $1 / user / month

Billed monthly

First user free. Each additional teammate is $1/mo, billed via Stripe and prorated when you add or remove members. Solo accounts stay free forever.

Credits — pre-paid, pay per request

Pre-paid · no expiry

Buy a pack ($5, $10, larger) and the gateway debits actual inference cost per request. Credits roll over and don't expire. Hit zero, requests pause until you top up — no negative bill possible.

Plans

JustAvailable now
Just

Drop-in for your coding agent. OpenAI- and Anthropic-API compatible. Tuned for coding workloads.

  • Works with Claude Code, OpenCode, Cursor, Aider, Cline
  • OpenAI-compatible API
  • Anthropic-API compatible
  • Best-model-per-task routing, inspectable on any call
  • Pay only for what runs
Start coding
ProNext
Pro

Less hand-holding. Multi-repo context. Background learning.

  • Everything in Just
  • Multi-repo aware — agent reasons across boundaries
  • RL loop on your repo — learns your patterns over time
  • Background agents for long-running plans
  • Custom routing policies
Waitlist
OrgLater
Org

Autonomous execution with audit-grade compliance.

  • Everything in Pro
  • SOC 2 · EU AI Act
  • Self-improving codebase — patches, deps, refactors
  • Org-wide agent fleets with role boundaries
  • Custom SLAs and dedicated capacity
Contact
06 · Add-ons

Bolt-ons for teams ready to coordinate.

Optional add-ons that layer on top of any plan. Priced per tenant. Both ship after the core plans stabilize.

Add-onComing soon

Shared configs

One config, every workstation.

Routing rules, prompt templates, tool allowlists, and review policies pushed instantly to every developer. No drift between workstations.

Add-onComing soon

AgenticPM

Issue tracker becomes the execution plan.

Agents pick up tickets, draft PRs, attach evidence, and report progress on the kanban your team already uses. Humans stay in the loop on merges.

07 · FAQ

Plain answers.

How does billing actually work?

Two things: a small seat fee ($1/user/month, first user free), and pre-paid credit packs ($5, $10, more) that cover inference. The seat fee pays for collaboration; credits cover the model compute. You can't go negative — if credits hit zero, requests pause until you top up.

What happens to unused credits?

They roll over forever. There's no monthly reset, no expiry. The wallet just accumulates until you spend it.

How are you cheaper?

We optimize execution across models. Old-world supply-chain techniques — inventory, hedging, bin-packing — applied to neo-world intelligence delivery. Most of your tasks do not need the biggest model; we make sure they do not get one.

Which models do you use?

The ones that win the cost-vs-capability tradeoff for your task, right now. The mix improves as the market improves. The model behind any call is opaque by default but inspectable on demand — you can ask "what ran?" anytime.

What's the difference between the plans?

Just (today) is the drop-in: point your existing agent at api.jusinfer.com and you get best-model-per-task routing on a single repo. Pro adds multi-repo awareness and an RL loop that learns your patterns over time. Org layers SOC 2 / EU AI Act compliance and a self-improving codebase on top. You only pay for what's shipping.

What are you launching next?

CLI and a desktop app are on the roadmap. The first focus is a VS Code plugin — that is where most coding actually happens.

Can I see which model ran?

Yes. By default we keep the model opaque so you can focus on shipping. On any call, you can prompt for the routing decision or flip the inspector on for the session.

What about my data?

Your code is yours. We do not train on it. Routing metadata is kept only as long as needed to make the next decision better.

Ready

Pick your workload. We'll handle the rest.