2026-05-26 · Kalmantic

TL;DR — If your custom agent harness uses the OpenAI Python or Node SDK, change two values — base_url to https://api.jusinfer.com/v1 and api_key to a jinf_ token. No code changes beyond the client constructor. Tool-call shapes are normalized server-side.

Your custom agent harness — point it at a cheaper endpoint in one line

Some teams build their own coding-agent harness instead of using Claude Code, OpenCode, or Cursor. The reasons vary — domain-specific tools, privacy, in-house benchmarks, or just a strong opinion about how an agent loop should work. Names you see in the wild: in-house "OpenClaw" / "nemoClaw" style projects, Hermes-based agent harnesses, custom CrewAI / LangGraph orchestrations, internal forks of Aider.

All of them have the same property: if the harness speaks OpenAI Chat Completions, it speaks jusInfer. This post is the universal setup, plus the three things to verify before you ship.

The universal config

99% of custom agents use one of two HTTP client patterns. Here are both.

Pattern 1 — OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.jusinfer.com/v1",
    api_key=os.environ["JUSINFER_API_KEY"],  # jinf_… from /developer
)

# Everything else stays the same.
resp = client.chat.completions.create(
    model="jusInfer-auto",        # or any provider/model id you want pinned
    messages=conversation,
    tools=tool_schemas,
    stream=True,
)

Pattern 2 — Raw fetch / requests

import requests, os

resp = requests.post(
    "https://api.jusinfer.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['JUSINFER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "jusInfer-auto",
        "messages": conversation,
        "tools": tool_schemas,
        "stream": True,
    },
    stream=True,
)

That's the whole integration. The hard part — model selection per call, upstream provider routing, failover — happens server-side.

Mapping your existing config

Most custom harnesses have a config block like:

LLM_PROVIDER = "anthropic"   # or "openai", "together", etc.
LLM_MODEL    = "claude-sonnet-4-5"
LLM_BASE_URL = "https://api.anthropic.com"
LLM_API_KEY  = "sk-ant-..."

To switch to jusInfer, only three lines change:

LLM_PROVIDER = "openai"      # we speak OpenAI shape regardless of underlying model
LLM_MODEL    = "jusInfer-auto"  # or pin a specific provider/model id
LLM_BASE_URL = "https://api.jusinfer.com/v1"
LLM_API_KEY  = "jinf_..."

If your harness has hard-coded "anthropic" / "openai" / "together" branches in code, the simplest fix is to add a "jusInfer" branch that uses the OpenAI client path. Or just use the OpenAI branch and point its base URL at us.

What works for any custom harness

Feature	Status
`messages` shape (system/user/assistant/tool)	✅
Streaming via SSE	✅
Tool calls (OpenAI shape)	✅
Tool calls (Anthropic shape)	✅ we normalize
`response_format: json_object`	✅
Parallel tool calls	✅ when the underlying model supports it
Vision inputs (image_url)	✅ auto-routes to a vision-capable model
Logprobs	⚠️ pass-through if upstream supports; ignored if not
Fine-tunes	❌ we don't host fine-tunes; use the provider directly for those

Three things to verify before shipping

1. Tool-call shape on YOUR specific model id

If you pin a specific model (e.g. nousresearch/hermes-4-405b or qwen/qwen3-coder-480b), test that tool calls land in the shape your agent expects. We normalize but edge cases exist — log one tool call end-to-end and inspect the JSON.

2. Cost per task on YOUR workload

Don't trust the homepage promise. Run 10 real tasks through your harness, check the usage tab. If it's not 50%+ cheaper than your prior setup, something's miscoded (often: you're sending the same prompt twice because of a retry bug in your harness).

3. Failure mode when upstream stutters

We retry transparently when an upstream model 502s or rate-limits. Test this: kill your network for 5 seconds mid-stream. Your agent should see a 502 UPSTREAM_ERROR only if all our retries failed (rare). Otherwise the stream resumes from the same response with no duplicate text.

Per-user accounting in a multi-user agent

If your harness serves multiple users and you need per-user spend attribution:

Mint one jinf_ key per user via POST /v1/keys (JWT-authed; humans sign in first).
Each user's calls go through their own key.
The usage dashboard breaks down spend per key.
Set per-user soft caps via POST /v1/tenant/members/:id/cap so a single user can't blow the team budget.

For a single shared key (single tenant) we still attribute by user_id if your agent passes it in headers — see API reference.

A note on the names

Got asked about specific custom harnesses recently — "OpenClaw", "nemoClaw", "Hermes agents". These aren't standardized products in the sense Cursor or Claude Code are. They're patterns or in-house projects. If you're building one of them or its equivalent, this guide covers you. If "OpenClaw" or "nemoClaw" is something more specific you want a tailored guide for, email hello@jusinfer.com with a link to your repo and we'll write one.

Setup checklist

Sign up at jusinfer.com/login.
Mint a jinf_ key at /developer.
Set base URL + key in your harness config.
Run one task end-to-end. Verify response on the Usage tab.
Set per-user caps if multi-tenant.
Set rate limits in the Tenant tab if you expect burst traffic.

Raw markdown: /blog/custom-agent-harness-openai-compatible.md

custom-agentopenai-compatibleagent-harnessopenclawnemoclawhermes-agentsbyo-agent