2026-05-27 · Kalmantic

TL;DR — OpenRouter = price-shopping marketplace. Helicone = observability proxy bolted on top of your direct provider keys. Portkey = enterprise routing + guardrails. jusInfer = coding-agent-specific router with per-step model selection. Pick the one whose primary axis matches your highest-pain problem.

LLM gateway comparison 2026

"LLM gateway" gets used to mean four different products. Picking the wrong shape is the most common mistake we see — a team installs an observability proxy when they actually needed a router, or installs a router when they actually needed cost telemetry on their existing provider keys.

The four shapes:

Shape	Example	Optimizes for	What you give up
Marketplace	OpenRouter	Lowest-price access to many models	No per-step routing intelligence; pay-per-use markup
Observability proxy	Helicone	Telemetry, logs, retries on YOUR keys	No price optimization; you still pay the underlying provider directly
Enterprise router	Portkey	Governance, fallbacks, PII redaction	Heavier setup; priced for orgs not individuals
Coding-agent router	jusInfer	Per-step model selection for coding workloads	Not optimized for chat, image, embeddings

OpenRouter

What it is: a marketplace. You send a request specifying a model id (anthropic/claude-sonnet-4, deepseek/deepseek-chat, etc.); OpenRouter forwards to the cheapest-currently-available provider hosting that model and charges you their price plus a small markup.

Best for: getting access to many models behind a single key, especially open-weight models hosted by multiple inference providers (Together, Fireworks, DeepInfra, etc.) where price varies day to day.

Where it's weak for coding agents: you still pick the model. OpenRouter doesn't watch your conversation and route an "is this file a JSON config?" question to a small model. If you pin Claude Sonnet in your harness, every step uses Claude Sonnet.

Helicone

What it is: a proxy you point your existing OpenAI/Anthropic SDK at by swapping the base URL. Your keys stay yours; Helicone logs every request, computes per-user cost, exposes a dashboard. Free tier covers a generous quota.

Best for: teams that already standardized on a direct provider and want telemetry without changing their billing model.

Where it's weak: no price optimization layer. If your bill is $X today with the direct provider, your bill is still $X with Helicone — you just see it itemized. Adding routing requires migrating to a different product class.

Portkey

What it is: an enterprise-shaped router with guardrails (PII redaction, prompt firewalls), multi-provider fallback chains, cost budgets per virtual key, and an audit-log surface.

Best for: orgs with a procurement process, compliance requirements, or multi-team budget allocation. The configuration surface is rich because that's what enterprises need.

Where it's heavier than you want: for an individual developer or a small team, Portkey's setup-to-value time is longer than the alternatives. The features pay back at scale, not at one-developer scale.

jusInfer

What it is: a router specifically for coding-agent traffic. Same OpenAI-compatible API surface, but the model selection inside the gateway is informed by what a coding loop looks like — short "read this file" turns get cheap models, "rewrite this 400-line function" gets a stronger one, "design this system" gets reasoning-capable.

Best for: people running Claude Code, Cursor, Cline, OpenCode, Aider, or any other coding agent and watching their monthly bill grow with usage.

Where it's weak: not optimized for image generation, embeddings, fine-tuning, or chat-style consumer products. The per-step routing logic encodes "what does a coding turn look like" — applied to chatbot traffic, it just looks like a regular cheap-by-default proxy.

Decision matrix

Pick by your highest-pain problem:

Your pain	Pick
"I want access to 100+ models behind one key, even niche open-weight ones"	OpenRouter
"My provider bill is fine, but I have no visibility into who's burning the most"	Helicone
"Compliance + budgets + multi-team governance is the bottleneck"	Portkey
"My coding-agent bill is growing faster than my team"	jusInfer
"I want all of the above"	Compose two: Helicone in front of jusInfer, for example. They're not exclusive.

What about combining?

You can. Helicone is content-neutral — it'll happily proxy any OpenAI-compatible endpoint, including jusInfer's. Stack order:

your harness  →  Helicone (telemetry)  →  jusInfer (per-step routing)  →  underlying provider

Your Helicone dashboard shows the request to jusInfer; jusInfer's dashboard shows the routed underlying model. Two layers of visibility, one bill at the bottom.

What you can't usefully stack: two price-optimization layers (OpenRouter behind jusInfer, or vice versa). One of them ends up just being a passthrough that adds latency without changing the routed model.

Honest weaknesses of each (we've used all four)

OpenRouter — quality varies day to day as it shifts between sub-providers. Fine for batch, occasionally surprising for interactive coding.
Helicone — sampling at scale needs paid plan; free tier hits limits quickly under heavy agent traffic.
Portkey — config-as-data is powerful but takes a day to internalize for a new user. Not a "5-minute install."
jusInfer — narrower scope; if your workload is 50% coding agents and 50% something else, the savings only land on the coding half.

When the right answer is "none, go direct"

Three cases:

You're under $20/month of inference — gateway cost (free or otherwise) is solving a problem you don't have.
You're locked to one model intentionally (a research project measuring one specific model's behavior).
Your tooling doesn't support custom base URLs (rare in 2026; even Cursor and Copilot have it now).