2026-05-26 · Kalmantic

TL;DR: "OpenAI-compatible" means an endpoint accepts the same HTTP request shape as OpenAI's Chat Completions API. Any client built for OpenAI can use it by changing two values (base_url and api_key). The standard isn't perfect (streaming, tool calls, and JSON mode are where compatibility breaks down), but it's the closest thing the LLM space has to a universal protocol.

OpenAI-compatible API, explained

In 2024, OpenAI's Chat Completions API became the de facto standard for talking to LLMs. By 2026, "OpenAI-compatible" appears on the homepage of nearly every inference provider. It's the most important interoperability standard in AI infrastructure — but the term hides real differences in what compatibility actually means.

The 30-second version

When a provider says "OpenAI-compatible API," they mean: their endpoint accepts requests in OpenAI's Chat Completions shape (POST /v1/chat/completions with messages, tools, stream, etc.) and returns responses in OpenAI's response shape.

The practical implication: any client library built for OpenAI works against their endpoint by changing two values — the base URL and the API key. No code changes. No new SDK.

from openai import OpenAI  # official OpenAI Python SDK

# Before: OpenAI direct
client = OpenAI(api_key="sk-…")

# After: any OpenAI-compatible provider
client = OpenAI(
    base_url="https://api.example-provider.com/v1",
    api_key="provider_specific_key",
)
# Everything else stays the same.

That's the whole standard. Or at least, that's the marketing version. The real story is more nuanced.

What's actually in the spec

OpenAI doesn't publish a formal "OpenAI-compatible" specification. There's no compliance suite, no version mark, no certification. Compatibility is judged by behavior. Roughly, an endpoint is OpenAI-compatible if it correctly handles:

| Surface | What "correct" means |
| --- | --- |
| POST /v1/chat/completions | Accepts messages, returns choices[0].message.content |
| messages shape | system/user/assistant/tool roles |
| stream: true | Returns Server-Sent Events with the same delta-format chunks |
| tools / function_call | Accepts tool-schema array, returns tool calls in the response |
| response_format: json_object | Constrains output to valid JSON when requested |
| temperature, top_p, max_tokens, seed | Honored if the model supports them |
| usage.prompt_tokens + usage.completion_tokens in responses | For billing reconciliation |

A provider can claim "OpenAI-compatible" while only handling the basics (non-streaming, no tool calls). That makes the term ambiguous on its own: what you really want to know is which subset they support.
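As a minimal sketch of what "handles the basics" could mean in practice, the function below checks a parsed non-streaming response for the fields listed above. The function name and the exact set of checks are assumptions made for illustration, not an official conformance test.

```python
from typing import Any


def check_chat_completion_shape(resp: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the basics look right."""
    problems: list[str] = []

    choices = resp.get("choices")
    if not isinstance(choices, list) or not choices:
        problems.append("choices is missing or empty")
    else:
        first = choices[0]
        message = first.get("message") if isinstance(first, dict) else None
        if not isinstance(message, dict):
            problems.append("choices[0].message is missing")
        else:
            if message.get("role") != "assistant":
                problems.append("choices[0].message.role is not 'assistant'")
            # content may be null when the model answers with tool calls instead.
            if message.get("content") is None and not message.get("tool_calls"):
                problems.append("message has neither content nor tool_calls")

    usage = resp.get("usage")
    if not isinstance(usage, dict):
        problems.append("usage is missing")
    else:
        for field in ("prompt_tokens", "completion_tokens"):
            if not isinstance(usage.get(field), int):
                problems.append(f"usage.{field} is missing or not an integer")

    return problems
```

Run it on the JSON body of a plain request; any strings it returns name the part of the surface the provider does not cover.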

The compatibility cliffs

Compatibility breaks down at three places. If you're picking a provider, test these specifically:

1. Streaming + tool calls together

The hardest case. Many providers handle streaming OR tool calls fine in isolation but mangle them together. Tool-call deltas arrive in fragments; the client must accumulate them into a complete JSON object. Some providers emit valid streamed tool calls; some emit them all in one chunk at the end; some emit them in a non-standard shape that breaks downstream parsing.

Test: make a streamed call with a tool, log the raw SSE bytes. Verify the tool-call object is reconstructable using the OpenAI SDK's accumulator.
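To make the accumulation step concrete, here is a hand-rolled sketch of it (not the OpenAI SDK's own accumulator). It assumes each chunk has already been parsed into a dict and that tool-call fragments arrive under choices[].delta.tool_calls, keyed by index, with the function name sent once and the arguments string sent in pieces. The fallback to index 0 is an assumption for providers that omit the field.

```python
import json
from typing import Any, Iterable


def accumulate_tool_calls(chunks: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streamed tool-call deltas into complete tool calls, ordered by index."""
    calls: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for part in delta.get("tool_calls") or []:
                # Assumption: treat a missing index as 0 (single tool call).
                index = part.get("index", 0)
                call = calls.setdefault(index, {"id": None, "name": "", "arguments": ""})
                if part.get("id"):
                    call["id"] = part["id"]
                function = part.get("function") or {}
                # The name is expected once; keep the first non-empty value.
                if function.get("name") and not call["name"]:
                    call["name"] = function["name"]
                # Argument fragments are concatenated in arrival order.
                if function.get("arguments"):
                    call["arguments"] += function["arguments"]
    return [calls[i] for i in sorted(calls)]


def parsed_arguments(call: dict[str, Any]) -> dict[str, Any]:
    """Decode the accumulated arguments string; raises ValueError if it is not valid JSON."""
    return json.loads(call["arguments"])
```

If parsed_arguments raises on the accumulated string, the provider's streamed tool calls are not reconstructable in the standard way.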

2. Anthropic-shape ↔ OpenAI-shape translation

When a provider hosts both Anthropic and open-weights models behind one OpenAI-compatible endpoint, they have to translate. Anthropic's Messages API differs from Chat Completions in several places — system prompts live in their own field, tool calls have different schemas, stop_reason values are different. Some providers normalize cleanly; some leak the underlying differences.

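Test: call the same prompt against an Anthropic model and an OpenAI model via the same provider. Compare the response shapes. The generated text will differ, but the structure at the API surface should be identical: the same fields, the same types, and OpenAI-style finish_reason values.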
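One way to run that comparison is to reduce each response to its structure (key names and value types) and diff the two. This is a rough sketch: shape_of and shape_diff are hypothetical helper names, and lists are summarized by their first element only.

```python
from typing import Any


def shape_of(value: Any) -> Any:
    """Reduce a JSON value to its structure: key names and value types only."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Describe a list by the shape of its first element; empty lists stay empty.
        return [shape_of(value[0])] if value else []
    return type(value).__name__


def _diff(sa: Any, sb: Any, path: str) -> list[str]:
    if isinstance(sa, dict) and isinstance(sb, dict):
        out: list[str] = []
        for key in sorted(set(sa) | set(sb)):
            if key not in sa:
                out.append(f"{path}.{key}: only in second")
            elif key not in sb:
                out.append(f"{path}.{key}: only in first")
            else:
                out.extend(_diff(sa[key], sb[key], f"{path}.{key}"))
        return out
    if isinstance(sa, list) and isinstance(sb, list):
        # An empty list carries no element shape, so there is nothing to compare.
        return _diff(sa[0], sb[0], f"{path}[0]") if sa and sb else []
    return [] if sa == sb else [f"{path}: {sa!r} vs {sb!r}"]


def shape_diff(first: Any, second: Any) -> list[str]:
    """List the structural differences between two parsed responses."""
    return _diff(shape_of(first), shape_of(second), "$")
```

An empty result means the two responses are interchangeable as far as a client's parser is concerned; a leaked stop_reason or a missing usage field shows up as a path.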

3. usage.cost, prompt caching, and other extensions

OpenAI's response includes usage.prompt_tokens + usage.completion_tokens. It does not include cost — you have to multiply tokens × your per-token price. Some providers extend the response with usage.cost, prompt-cache metadata, or other useful extras. These extensions are non-standard but harmless if your client ignores unknown fields.

Test: log a response, check for non-standard fields. If they exist, your client SDK should pass them through (most do).
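The arithmetic and the extension check can be sketched as follows. The prices are passed in as US dollars per million tokens, which is an assumed unit, and the set of "standard" usage fields reflects what this sketch treats as OpenAI's own fields rather than a published spec.

```python
from typing import Any

# Assumption: usage fields treated as standard for the purposes of this sketch.
STANDARD_USAGE_FIELDS = {
    "prompt_tokens",
    "completion_tokens",
    "total_tokens",
    "prompt_tokens_details",
    "completion_tokens_details",
}


def estimate_cost_usd(
    usage: dict[str, Any],
    prompt_usd_per_mtok: float,
    completion_usd_per_mtok: float,
) -> float:
    """Multiply token counts by per-million-token prices to get a cost estimate."""
    total = (
        usage["prompt_tokens"] * prompt_usd_per_mtok
        + usage["completion_tokens"] * completion_usd_per_mtok
    )
    return total / 1_000_000


def extension_fields(usage: dict[str, Any]) -> list[str]:
    """Return usage keys that are not in the standard set, sorted by name."""
    return sorted(set(usage) - STANDARD_USAGE_FIELDS)
```

For example, 1,000 prompt tokens at a hypothetical $3 per million plus 500 completion tokens at $15 per million comes to about $0.0105. Comparing that figure with a provider's usage.cost, when present, is a quick billing sanity check.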

Why the standard matters so much

Before OpenAI compatibility became universal, switching LLM providers meant rewriting the client side of your agent. Different SDKs, different response shapes, different error handling. Provider lock-in was real and expensive.

Post-2024, it's a config change. Your Cursor, Aider, Claude Code, Cline, Continue, Goose, custom harness — any of them can point at any OpenAI-compatible endpoint without code changes. The cost of switching providers dropped from "1-2 weeks of engineering" to "5 minutes of config."

That has secondary effects:

  • Provider competition is real. No lock-in means providers compete on quality + price + latency, not on having captured your codebase.
  • Router-tier services are possible. Routers like jusInfer work because the underlying endpoints all speak the same protocol; we can route per call without coordinating with each provider individually.
  • Open-weights infrastructure is viable. Together, Fireworks, Cloudflare Workers AI, etc. all serve open-weights models behind OpenAI-compatible endpoints. That makes Llama / Qwen / Hermes / DeepSeek / Mistral interchangeable with frontier models from your client's perspective.
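Since switching providers comes down to two values, the switch can live entirely in configuration. The sketch below is illustrative only: the registry entries, the EXAMPLE_PROVIDER_API_KEY variable name, and the client_kwargs helper are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Provider:
    base_url: str
    api_key_env: str  # name of the environment variable holding the key


# Hypothetical registry; add one entry per OpenAI-compatible endpoint you use.
PROVIDERS = {
    "openai": Provider("https://api.openai.com/v1", "OPENAI_API_KEY"),
    "example": Provider("https://api.example-provider.com/v1", "EXAMPLE_PROVIDER_API_KEY"),
}


def client_kwargs(name: str, env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build the two values an OpenAI-style client needs for the named provider."""
    env = os.environ if env is None else env
    provider = PROVIDERS[name]
    key = env.get(provider.api_key_env)
    if not key:
        raise KeyError(f"{provider.api_key_env} is not set")
    return {"base_url": provider.base_url, "api_key": key}
```

With the OpenAI Python SDK installed, OpenAI(**client_kwargs("example")) would then construct a client for that provider without touching the rest of the code.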

What OpenAI-compatible is NOT

Three things people often assume "OpenAI-compatible" guarantees, but it usually doesn't:

  1. The same model quality. An OpenAI-compatible endpoint serving Llama 4 8B will typically return shorter, less reasoning-dense responses than OpenAI direct serving GPT-5. The API shape is identical; the model is not.
  2. The same pricing model. Some providers charge per token, some per request, some per second of compute. OpenAI-compatible says nothing about how you're billed.
  3. The same SLA / uptime. Compatibility is about the response surface, not the operational guarantees behind it. Look at the provider's status page separately.

What about the Responses API?

OpenAI shipped a newer API, /v1/responses, in March 2025 for stateful conversations. It's separate from Chat Completions. Calling something "OpenAI-compatible" today still usually means Chat Completions. Responses-API support is rarer, which makes it a differentiator for forward-looking providers.

(jusInfer supports both: /v1/chat/completions and /v1/responses. Most third-party tools still only emit Chat Completions calls.)

How to test a provider's claim in 5 minutes

# 1. Mint a key with the provider. Note their base URL.
BASE_URL="https://api.example-provider.com/v1"
KEY="your_key"

# 2. Non-streaming round trip.
curl "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"model":"<their model>","messages":[{"role":"user","content":"Hi"}]}'

# 3. Streaming round trip. -N turns off curl's output buffering so chunks print as they arrive.
curl -N "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"model":"<their model>","messages":[{"role":"user","content":"Hi"}],"stream":true}'

# 4. Tool-call round trip (streaming). -N turns off curl's output buffering.
curl -N "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{
    "model":"<their model>",
    "messages":[{"role":"user","content":"What is the weather in Tokyo?"}],
    "tools":[{"type":"function","function":{"name":"get_weather","parameters":{"type":"object","properties":{"city":{"type":"string"}}}}}],
    "stream":true
  }'

# Pass = the stream contains tool_calls deltas with the name "get_weather",
# and their "arguments" fragments concatenate to {"city":"Tokyo"}.

If steps 2-4 all produce valid OpenAI-shape responses, the provider is genuinely OpenAI-compatible for the basics. If any of them break, that's a real-world limit to know before committing.
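The pass criterion for step 4 can also be checked mechanically. The sketch below assumes you have saved the raw output of the streaming call as text lines, that each event is a single `data: {...}` line, and that the stream ends with `data: [DONE]`. Both function names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse_lines(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the JSON payload of each data line until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def streamed_tool_call(lines: Iterable[str]) -> tuple[str, dict[str, Any]]:
    """Rebuild the first tool call (index 0) as (name, arguments).

    Raises ValueError if the concatenated arguments are not valid JSON,
    which includes the case where no tool call was streamed at all.
    """
    name = ""
    arguments = ""
    for chunk in parse_sse_lines(lines):
        for choice in chunk.get("choices", []):
            for part in (choice.get("delta") or {}).get("tool_calls") or []:
                if part.get("index", 0) != 0:
                    continue
                function = part.get("function") or {}
                name = name or function.get("name") or ""
                arguments += function.get("arguments") or ""
    return name, json.loads(arguments)
```

For the request in step 4, a passing provider should yield the name "get_weather" and arguments equal to {"city": "Tokyo"}.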

Why this matters for coding agents

Coding agents are heavy on tool calls — read file, write file, run command, search codebase, propose edit. Every step is a tool call. A provider that's "OpenAI-compatible" for chat but mangles streamed tool calls will break your agent in subtle ways (responses get cut off, tool arguments arrive malformed, the agent retries forever).

When you're picking an endpoint for a coding agent, the "OpenAI-compatible" claim isn't enough. You want streaming + tool-call compatibility, specifically tested on your workload's shape.
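On the agent side, one defensive measure against the "retries forever" failure is to validate tool arguments and cap the number of attempts. This is a sketch under stated assumptions: request_tool_call stands in for whatever function in your harness returns the raw arguments string, and the default of three attempts is arbitrary.

```python
import json
from typing import Any, Callable, Optional


def parse_tool_arguments(raw: Any) -> Optional[dict[str, Any]]:
    """Return the decoded arguments object, or None if it is malformed."""
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return None
    # Tool arguments are expected to be a JSON object, not a bare value.
    return parsed if isinstance(parsed, dict) else None


def call_with_retry_budget(
    request_tool_call: Callable[[], Any],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Retry a tool-call request a bounded number of times, then fail loudly."""
    for _ in range(max_attempts):
        arguments = parse_tool_arguments(request_tool_call())
        if arguments is not None:
            return arguments
    raise RuntimeError(
        f"tool arguments were malformed in {max_attempts} consecutive attempts"
    )
```

Failing loudly after a fixed budget turns a silent provider incompatibility into an error you can see in logs, instead of an agent that loops.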

jusInfer normalizes tool calls across underlying providers (Anthropic, OpenAI, Together-hosted open-weights models, etc.) so the response your agent sees is always OpenAI-shape regardless of which model actually ran. That's the value of a routing layer that takes compatibility seriously.

Setup checklist (jusInfer)

  1. Sign up at jusinfer.com/login.
  2. Mint a jinf_ key at /developer → Keys.
  3. Change base_url in your client to https://api.jusinfer.com/v1.
  4. Test with the 4 curl calls above.
  5. Point your real agent at the new base URL.
