---
title: Your custom agent harness — point it at a cheaper endpoint in one line
description: Building your own coding-agent harness (OpenClaw, nemoClaw, Hermes-based, in-house)? If it speaks OpenAI Chat Completions, it speaks jusInfer. Here's the universal config plus what to watch for.
tldr: If your custom agent harness uses the OpenAI Python or Node SDK, change two values — base_url to https://api.jusinfer.com/v1 and api_key to a jinf_ token. No code changes beyond the client constructor. Tool-call shapes are normalized server-side.
date: 2026-05-26
author: jusInfer
cluster: integration
tags: custom-agent, openai-compatible, agent-harness, openclaw, nemoclaw, hermes-agents, byo-agent
---

# Your custom agent harness — point it at a cheaper endpoint in one line

Some teams build their own coding-agent harness instead of using Claude Code, OpenCode, or Cursor. The reasons vary — domain-specific tools, privacy, in-house benchmarks, or just a strong opinion about how an agent loop should work. Names you see in the wild: in-house "OpenClaw" / "nemoClaw" style projects, Hermes-based agent harnesses, custom CrewAI / LangGraph orchestrations, internal forks of Aider.

All of them have the same property: **if the harness speaks OpenAI Chat Completions, it speaks jusInfer**. This post is the universal setup, plus the three things to verify before you ship.

## The universal config

Most custom agents use one of two HTTP client patterns. Here are both.

### Pattern 1 — OpenAI Python SDK

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.jusinfer.com/v1",
    api_key=os.environ["JUSINFER_API_KEY"],  # jinf_… from /developer
)

# Everything else stays the same.
resp = client.chat.completions.create(
    model="jusInfer-auto",        # or any provider/model id you want pinned
    messages=conversation,
    tools=tool_schemas,
    stream=True,
)
```
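
With `stream=True` and `tools` set, tool calls arrive as fragments spread over several chunks, so the harness has to stitch them together before dispatching. A minimal sketch, assuming the standard OpenAI streaming layout (fragments keyed by `index`, `function.arguments` as JSON text split across chunks). `accumulate_tool_calls` is a hypothetical helper name, and it works on plain dicts, for example `chunk.choices[0].delta.model_dump()` from the SDK:

```python
import json


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete tool calls.

    `deltas` is an iterable of `choices[0].delta` dicts in OpenAI streaming shape.
    """
    calls = {}  # fragment index -> partially assembled tool call
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["name"] += fn["name"]
            if fn.get("arguments"):
                call["arguments"] += fn["arguments"]
    # Arguments are JSON text split across chunks; parse only once complete.
    return [
        {**call, "arguments": json.loads(call["arguments"] or "{}")}
        for _, call in sorted(calls.items())
    ]
```

Parsing the arguments only after the stream ends avoids feeding half a JSON object to your tool dispatcher.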

### Pattern 2 — Raw fetch / requests

```python
import os

import requests

resp = requests.post(
    "https://api.jusinfer.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['JUSINFER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "jusInfer-auto",
        "messages": conversation,
        "tools": tool_schemas,
        "stream": True,
    },
    stream=True,
)
```

That's the whole integration. The hard part — model selection per call, upstream provider routing, failover — happens server-side.
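
With the raw `requests` pattern you also have to read the SSE stream yourself. A minimal sketch, assuming the endpoint follows the usual OpenAI convention of `data: {json}` lines terminated by a `data: [DONE]` sentinel. `iter_sse_json` is a hypothetical helper name, not part of any SDK:

```python
import json


def iter_sse_json(lines):
    """Yield parsed JSON payloads from SSE lines, stopping at the [DONE] sentinel."""
    for raw in lines:
        # requests' iter_lines() yields bytes; accept str too for easy testing.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```

Usage against the response above would look like `for chunk in iter_sse_json(resp.iter_lines()): ...`.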

## Mapping your existing config

Most custom harnesses have a config block like:

```python
LLM_PROVIDER = "anthropic"   # or "openai", "together", etc.
LLM_MODEL    = "claude-sonnet-4-5"
LLM_BASE_URL = "https://api.anthropic.com"
LLM_API_KEY  = "sk-ant-..."
```

To switch to jusInfer, change all four values:

```python
LLM_PROVIDER = "openai"      # we speak OpenAI shape regardless of underlying model
LLM_MODEL    = "jusInfer-auto"  # or pin a specific provider/model id
LLM_BASE_URL = "https://api.jusinfer.com/v1"
LLM_API_KEY  = "jinf_..."
```

If your harness has hard-coded "anthropic" / "openai" / "together" branches in code, the simplest fix is to add a `"jusInfer"` branch that uses the OpenAI client path. Or just use the OpenAI branch and point its base URL at us.
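
One way to add that branch without touching the rest of the harness is a small provider table that feeds the OpenAI client constructor. This is a sketch under assumptions: the `PROVIDERS` table, the environment variable names, and `openai_client_kwargs` are all hypothetical, so adapt them to whatever your config layer already looks like.

```python
import os

# Hypothetical provider table; the "jusInfer" entry reuses the OpenAI client path.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
    "jusInfer": {"base_url": "https://api.jusinfer.com/v1", "key_env": "JUSINFER_API_KEY"},
}


def openai_client_kwargs(provider, env=os.environ):
    """Return keyword arguments for `OpenAI(...)` for an OpenAI-shaped provider."""
    try:
        entry = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
    # A missing key raises KeyError here, which fails fast at startup.
    return {"base_url": entry["base_url"], "api_key": env[entry["key_env"]]}
```

The harness would then build its client with `OpenAI(**openai_client_kwargs(LLM_PROVIDER))`.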

## What works for any custom harness

| Feature | Status |
|---|---|
| `messages` shape (system/user/assistant/tool) | ✅ |
| Streaming via SSE | ✅ |
| Tool calls (OpenAI shape) | ✅ |
| Tool calls (Anthropic shape) | ✅ we normalize |
| `response_format: json_object` | ✅ |
| Parallel tool calls | ✅ when the underlying model supports it |
| Vision inputs (image_url) | ✅ auto-routes to a vision-capable model |
| Logprobs | ⚠️ pass-through if upstream supports; ignored if not |
| Fine-tunes | ❌ we don't host fine-tunes; use the provider directly for those |

## Three things to verify before shipping

### 1. Tool-call shape on YOUR specific model id

If you pin a specific model (e.g. `nousresearch/hermes-4-405b` or `qwen/qwen3-coder-480b`), test that tool calls land in the shape your agent expects. We normalize but edge cases exist — log one tool call end-to-end and inspect the JSON.
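
To make that inspection repeatable, you can run each logged tool call through a small shape check. A minimal sketch, assuming your agent expects the standard OpenAI tool-call object (`id`, `type: "function"`, and `function.arguments` as a JSON-encoded string). `check_openai_tool_call` is a hypothetical helper:

```python
import json


def check_openai_tool_call(tool_call):
    """Return a list of problems found in one tool call dict; empty means OK."""
    problems = []
    if not isinstance(tool_call.get("id"), str) or not tool_call["id"]:
        problems.append("missing or empty id")
    if tool_call.get("type") != "function":
        problems.append("type is not 'function'")
    fn = tool_call.get("function")
    if not isinstance(fn, dict):
        return problems + ["missing function object"]
    if not isinstance(fn.get("name"), str) or not fn["name"]:
        problems.append("missing function.name")
    args = fn.get("arguments")
    if not isinstance(args, str):
        problems.append("function.arguments is not a JSON string")
    else:
        try:
            if not isinstance(json.loads(args), dict):
                problems.append("function.arguments does not decode to an object")
        except json.JSONDecodeError:
            problems.append("function.arguments is not valid JSON")
    return problems
```

Running this over the first few tool calls from each pinned model id surfaces shape drift before your tool dispatcher hits it.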

### 2. Cost per task on YOUR workload

Don't trust the homepage promise. Run 10 real tasks through your harness, check the usage tab. If it's not 50%+ cheaper than your prior setup, something's miscoded (often: you're sending the same prompt twice because of a retry bug in your harness).
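
Both checks are easy to script. The sketch below assumes you log each outgoing request body and know your total spend for the same 10 tasks before and after the switch. The function names are hypothetical, and the costs in the usage line are made-up illustration values, not measurements:

```python
import hashlib
import json
from collections import Counter


def savings_percent(baseline_cost, new_cost):
    """Percentage saved relative to the baseline spend for the same tasks."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - new_cost) * 100 / baseline_cost


def duplicate_requests(request_bodies):
    """Return {digest: count} for request bodies that were sent more than once."""
    digests = Counter(
        hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        for body in request_bodies
    )
    return {digest: n for digest, n in digests.items() if n > 1}
```

For example, `savings_percent(10.0, 4.0)` reports 60% saved, and a non-empty result from `duplicate_requests` points at a retry bug rather than a pricing problem.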

### 3. Failure mode when upstream stutters

We retry transparently when an upstream model 502s or rate-limits. Test this: kill your network for 5 seconds mid-stream. Your agent should see a `502 UPSTREAM_ERROR` only if all our retries failed (rare). Otherwise the stream resumes from the same response with no duplicate text.
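
For the rare case where the `502` does reach your agent, the harness needs its own policy. One option is a bounded retry with exponential backoff around the whole model call. This is a sketch, not a recommendation from the API docs: `UpstreamError` stands in for whatever exception your HTTP layer raises on a 502, and `with_backoff` is a hypothetical helper.

```python
import time


class UpstreamError(Exception):
    """Hypothetical exception your harness raises on a 502 response."""


def with_backoff(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()`; on UpstreamError wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return call()
        except UpstreamError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the agent loop
            sleep(base_delay * 2 ** attempt)
```

If the failed call had already streamed partial text, discard that partial output before retrying so the conversation does not end up with duplicated content.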

## Per-user accounting in a multi-user agent

If your harness serves multiple users and you need per-user spend attribution:

1. Mint one `jinf_` key per user via `POST /v1/keys` (JWT-authed; humans sign in first).
2. Each user's calls go through their own key.
3. The usage dashboard breaks down spend per key.
4. Set per-user soft caps via `POST /v1/tenant/members/:id/cap` so a single user can't blow the team budget.

For a single shared key (single tenant) we still attribute by `user_id` if your agent passes it in headers — see [API reference](/docs/api-reference/).
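
On the harness side, the per-user key flow mostly comes down to picking the right key when you build the client. A minimal sketch, assuming you store minted keys in some mapping from your own user ids to `jinf_` tokens. `key_store` and `client_kwargs_for_user` are hypothetical names, and the request body for minting keys is not shown here because it is not covered in this post:

```python
def client_kwargs_for_user(user_id, key_store):
    """Look up the caller's own jinf_ key and build OpenAI client kwargs."""
    try:
        api_key = key_store[user_id]
    except KeyError:
        raise LookupError(f"no jusInfer key minted for user {user_id!r}") from None
    return {"base_url": "https://api.jusinfer.com/v1", "api_key": api_key}
```

Failing loudly on a missing key, rather than falling back to a shared one, keeps spend from being attributed to the wrong user.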

## A note on the names

We were recently asked about specific custom harnesses: "OpenClaw", "nemoClaw", "Hermes agents". These aren't standardized products in the sense that Cursor or Claude Code are. They're patterns or in-house projects. If you're building one of them or its equivalent, this guide covers you. If "OpenClaw" or "nemoClaw" is something more specific that you want a tailored guide for, email hello@jusinfer.com with a link to your repo and we'll write one.

## Setup checklist

1. Sign up at [jusinfer.com/login](https://jusinfer.com/login).
2. Mint a `jinf_` key at [/developer](https://jusinfer.com/developer).
3. Set base URL + key in your harness config.
4. Run one task end-to-end. Verify that the request shows up on the [Usage tab](https://jusinfer.com/developer).
5. Set per-user caps if multi-tenant.
6. Set rate limits in the [Tenant tab](https://jusinfer.com/developer) if you expect burst traffic.

## Related reading

- [OpenAI-compatible drop-in (every popular harness, step by step)](/docs/openai-drop-in/)
- [Hermes models for coding agents](/blog/hermes-models-and-coding-agents/)
- [Inference endpoints for coding agents — what's different](/blog/inference-endpoint-coding-agents/)
- [API reference](/docs/api-reference/)

---

*Raw markdown: [/blog/custom-agent-harness-openai-compatible.md](/blog/custom-agent-harness-openai-compatible.md)*
