chore(model): remove inert --name flag from `obol model setup custom` by bussyjd · Pull Request #509 · ObolNetwork/obol-stack

bussyjd · 2026-05-21T10:09:26Z

Summary

What changed:

Removed --name flag from obol model setup custom.
Dropped the name parameter from model.AddCustomEndpoint; the signature is now AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey).
Removed OBOL_LLM_NAME env var from flows/lib.sh::route_llm_via_obol_cli and flows/buy-external.sh.
Updated CLAUDE.md, internal/embed/skills/monetize-guide/SKILL.md, and .agents/skills/obol-stack-dev/references/llm-routing.md example commands and env-var lists.

Why it matters:
The --name flag was 100% inert — it never reached ModelEntry, never persisted to the LiteLLM ConfigMap, and never influenced obol model list/prefer/status/sync/remove or any routing. The string was echoed in two log lines and passed as a UI label to RestartLiteLLM on the fallback path. Nothing else. Its help text said "informational only — LiteLLM keys the route by --model, not --name".

The trap: operators run obol model setup custom --name foo --model my/model …, then call the route as foo, and get:

litellm.BadRequestError: You passed in model=foo.
There are no healthy deployments for this model.

This is the same error string that the v0.10.0-rc1 upgrade report attributed to a "cache survives obol stack up" bug. Five fresh-cluster probes on rc3 could not reproduce the cache bug; the reliable reproducer turned out to be calling the route by the user-given --name rather than the --model value LiteLLM actually keys on. Removing the flag eliminates that UX trap.

Risk level: low

Commit under test: b9ff172 (this PR), parent f8df92e (tag v0.10.0-rc3)

Base branch: main

Scope

Validation

CI checks:

Check	Status	Link
Unit tests (touched packages)	✅ pass	local — `go test ./cmd/obol/ ./internal/model/ -count=1`
Full unit suite	⚠️ 1 pre-existing fail	local — see below
Shell syntax	✅ pass	local — `bash -n flows/*.sh`
Release-smoke (flows 01-12)	⚠️ 5 fails, all env-related	local — see report below
Live cluster smoke (`obol model setup custom` end-to-end)	✅ pass	local

Unit tests:

$ go test ./... -count=1
ok      github.com/ObolNetwork/obol-stack/cmd/obol      1.476s
ok      github.com/ObolNetwork/obol-stack/internal/model      0.731s
ok      ... (39 packages total)
FAIL    github.com/ObolNetwork/obol-stack/internal/stack      8.202s
  └── TestWarnIfNoChatModel_EmitsWarnWhenNoModels

PRE-EXISTING: Reproduces on clean `main` HEAD (f8df92e / v0.10.0-rc3 tag)
with this PR's changes stashed. Test asserts the warn is on stderr but
the message arrives on stdout. Unrelated to this PR.

Integration tests:

SKIPPED — internal/openclaw integration tests expect host Ollama on
:11434; QA host runs Unsloth Studio on :8888 instead. Not exercised
by this change either way (--name was never used by Ollama setup path).

Flow tests:

Flow	Network	QA machine label	Worktree	Result	Artifacts
flow-02-stack-init-up	n/a	macOS Docker Desktop	none (in-tree)	PASS	`.tmp/release-smoke-20260521-134647/flow-02-stack-init-up.log`
flow-05-network	n/a	macOS Docker Desktop	none (in-tree)	PASS	`.tmp/release-smoke-20260521-134647/flow-05-network.log`
flow-07-sell-verify	n/a	macOS Docker Desktop	none (in-tree)	PASS	`.tmp/release-smoke-20260521-134647/flow-07-sell-verify.log`
flow-08-buy	base-sepolia	macOS Docker Desktop	none (in-tree)	PASS	`.tmp/release-smoke-20260521-134647/flow-08-buy.log`
flow-09-lifecycle	n/a	macOS Docker Desktop	none (in-tree)	PASS	`.tmp/release-smoke-20260521-134647/flow-09-lifecycle.log`
flow-10-anvil-facilitator	local	macOS Docker Desktop	none (in-tree)	PASS	`.tmp/release-smoke-20260521-134647/flow-10-anvil-facilitator.log`
flow-01-prerequisites	n/a	macOS Docker Desktop	none (in-tree)	FAIL (env)	Unsloth `/v1/models` returns 401 to unauthenticated probe
flow-03-inference	n/a	macOS Docker Desktop	none (in-tree)	FAIL (env)	`endpoint validation failed: ... context deadline exceeded` — Unsloth Studio 27B cold-start exceeds `ValidateCustomEndpoint`'s 60s timeout. CLI parsed args correctly — no `--name` regression.
flow-04-agent	n/a	macOS Docker Desktop	none (in-tree)	FAIL (env)	cascades from flow-03 (no model registered)
flow-06-sell-setup	n/a	macOS Docker Desktop	none (in-tree)	FAIL (env)	hard-coded preflight for Ollama at `:11434`; QA host runs Unsloth
flow-11-dual-stack	base-sepolia	macOS Docker Desktop	none (in-tree)	FAIL (env)	same Unsloth cold-start timeout during alice's `obol model setup custom` validation. Preflight (Alice ETH, Bob USDC, facilitator) all PASS.

Release smoke:

$ OBOL_LLM_ENDPOINT=http://host.k3d.internal:8888/v1 \
  OBOL_LLM_MODEL=unsloth/Qwen3.6-27B-MTP-GGUF \
  OBOL_LLM_API_KEY=<unsloth-studio-jwt> \
  bash flows/release-smoke.sh

| Flow                       | Result | FAIL lines | SKIP lines | Exit code |
| -------------------------- | ------ | ---------: | ---------: | --------: |
| flow-01-prerequisites      | FAIL   |          1 |          0 |         1 |
| flow-02-stack-init-up      | PASS   |          0 |          0 |         0 |
| flow-03-inference          | FAIL   |          6 |          0 |         1 |
| flow-04-agent              | FAIL   |          1 |          0 |         1 |
| flow-05-network            | PASS   |          0 |          0 |         0 |
| flow-06-sell-setup         | FAIL   |          1 |          0 |         1 |
| flow-07-sell-verify        | PASS   |          0 |          0 |         0 |
| flow-10-anvil-facilitator  | PASS   |          0 |          0 |         0 |
| flow-08-buy                | PASS   |          0 |          0 |         0 |
| flow-09-lifecycle          | PASS   |          0 |          0 |         0 |
| flow-11-dual-stack         | FAIL   |          0 |          0 |         1 |

Release smoke failed: 5 flow(s)

Failure attribution: Zero failures involve --name parsing. grep "unknown flag\|flag provided but not defined" .tmp/release-smoke-*/*.log returns nothing. Every obol model setup custom invocation parsed arguments and reached endpoint validation correctly. All 5 failures are upstream of the CLI:

Unsloth Studio auth — /v1/models requires bearer token; flow-01's simple unauthenticated probe gets 401.
Unsloth Studio cold-start — first inference call on the 27B GGUF triggers model load, exceeding ValidateCustomEndpoint's 60s timeout. Surfaces in flow-03 and flow-11.
flow-06 Ollama hardcode — flow-06-sell-setup.sh preflight checks localhost:11434; QA host runs Unsloth, not Ollama.
flow-04 — cascades from flow-03 leaving LiteLLM without the routed model.

A vLLM/llama.cpp QA host without auth would not hit any of these. Six flows pass cleanly including all on-chain payment flows (flow-08 buy, flow-09 lifecycle).

Live Chain Evidence

Do not include private keys, seed phrases, passwords, hostnames, personal paths, or raw bearer tokens.

Network: base-sepolia (flow-08, flow-11)

RPC/provider: default free-tier fallback (no paid RPC set this run)

Facilitator: https://x402.gcp.obol.tech (reachable, supports Base Sepolia exact)

Contracts and tokens:

Name	Address	Version / notes
USDC (Base Sepolia)	`0x036CbD53842c5426634e7929541eC2318f3dCF7e`	facilitator-default

Wallet roles:

Role	Address	Source
Alice / seller / register	`0xC0De030F6C37f490594F93fB99e2756703c4297E`	flow-11 derived from REMOTE_SIGNER_PRIVATE_KEY
Bob / buyer / payer	`0x57b0eF875DeB5A37301F1640E469a2129Da9490E`	flow-11 derived from REMOTE_SIGNER_PRIVATE_KEY (2nd derive)
Facilitator / receiver	n/a	hosted x402-rs

Balances:

Token	Address	Before	After	Expected delta	Actual delta
USDC	`0x036CbD…CF7e`	Bob 4.95 USDC	Bob 4.95 USDC	0 (no purchase fired — flow-11 failed at LLM setup)	0

Transaction receipts: none on-chain this run (PR is CLI-only, no settlement path touched).

Runtime Evidence

QA environment:

Item	Value
OS / arch	macOS Darwin 25.5.0 / arm64
Backend	Docker Desktop + k3d v5.8.3
Tool versions	go1.25.5, k3d 5.8.3, helm 3.x, helmfile 1.4.x, kubectl 1.35.x
QA agent/model	Hermes (nousresearch/hermes-agent:v2026.5.7) + Unsloth Studio serving unsloth/Qwen3.6-27B-MTP-GGUF

Images:

Component	Image	Tag / digest	Source
obol-agent (Hermes)	`nousresearch/hermes-agent`	v2026.5.7	docker.io
LiteLLM	`ghcr.io/berriai/litellm:main-stable`	upstream	ghcr
x402-verifier / serviceoffer-controller / x402-buyer / demo-server / public-storefront	`ghcr.io/obolnetwork/<name>`	`:latest` (locally built, OBOL_FORCE_REBUILD_LOCAL_DEV_IMAGES=true)	in-tree Dockerfiles

Kubernetes / stack:

Item	Value
Stack IDs	`smart-dinosaur` (validation phase), `wondrous-crane` (release-smoke) — both torn down
Namespaces	standard set (llm, x402, hermes-obol-agent, erpc, monitoring, traefik, obol-frontend)
Pod readiness	all default infra pods Running 2/2 or 1/1 during validation
Cleanup result	release-smoke's `cleanup_stacks` trap removed test workspaces on exit

Model and routing:

Item	Value
Agent/model used	Hermes → LiteLLM → claude-sonnet-4-6 (Anthropic via ANTHROPIC_API_KEY) and unsloth/Qwen3.6-27B-MTP-GGUF (Unsloth Studio on host)
LiteLLM route	claude-sonnet-4-6 + paid/* + anthropic/* + unsloth/Qwen3.6-27B-MTP-GGUF
Paid endpoint status	not exercised this PR
Auth token source	LITELLM_MASTER_KEY from `kubectl get secret litellm-secrets -n llm`; Unsloth JWT from POST `/api/auth/login`

Artifacts and logs:

Artifact	Location / link	Notes
Release-smoke run	`.tmp/release-smoke-20260521-134647/`	11 per-flow logs + `RELEASE_REPORT.md`
Pre-merge live smoke	tmux pane `obol-0:qa.0`	4 chat probes, all HTTP 200

Demo readiness:

Item	Status	Notes
Seller visible / registered	n/a	not in scope of this PR
Buyer discovery works	✅	flow-08 buy and flow-09 lifecycle both PASS in release-smoke
Paid route works	✅	flow-08 PASS
Settlement visible on-chain	n/a	no settlement triggered this PR

Review Notes

Known gaps:

Unsloth Studio is not natively supported by obol model setup custom — it requires a bearer JWT and has slow cold-start for large GGUFs that exceeds the 60s validation timeout. Adding first-class Unsloth support (alongside Ollama) would let release-smoke run cleanly on hosts without vLLM. Out of scope for this PR.
A separate bug surfaced during validation: obol model setup custom hot-add path fails with [Errno 30] Read-only file system: '/etc/litellm/config.yaml' because the ConfigMap is mounted RO. The CLI correctly falls back to a deployment restart, so users aren't blocked, but every custom-endpoint setup pays the full ~90s rollout cost. Worth a follow-up.
Pre-existing unit-test failure: TestWarnIfNoChatModel_EmitsWarnWhenNoModels in internal/stack is broken on main HEAD as of this PR (asserts warn-on-stderr but the message lands on stdout). Predates this branch.

Follow-ups:

Native Unsloth support in obol model setup (auth handling, longer first-call timeout).
Fix hot-add by writing the merged config to an emptyDir or making the mount RW.
Fix TestWarnIfNoChatModel_EmitsWarnWhenNoModels (separate issue).
Re-validate Issue #2 (per-agent Hermes crashloop) on a Linux k3d host. It cannot be reproduced on macOS Docker Desktop due to virtiofs ownership translation, but source inspection on rc3 confirms agent_render.go::agentPodSpec still lacks an init container.

Reviewer focus:

Confirm the signature change AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey) (no name) is acceptable — the function is only called from cmd/obol/model.go and has no test mocks. Repo-wide grep returns zero stale callers.
Confirm the RestartLiteLLM(cfg, u, modelName) fallback label change is OK — the third arg is a UI-only string.
Confirm OBOL_LLM_NAME removal from flows/buy-external.sh doesn't break any external automation referencing it (none found in this repo or in ~/.config).

The `--name` flag on `obol model setup custom` was documented as informational only and never participated in any routing or persistence:

- ModelEntry has only `model_name` (route key) + `litellm_params`; the CLI `--name` value was never written to either.
- `detectProvider` (used by `obol model list/status`) inspects `entry.ModelName` + `entry.LiteLLMParams.Model` prefixes; the `--name` string never reached it.
- It was only echoed back in two log lines and passed as a UI label to `RestartLiteLLM` on the hot-add fallback path.

This caused confusion in QA: an operator running

obol model setup custom --name foo --model my/model ...

would later call the route as `foo` and get LiteLLM's

BadRequestError: ... There are no healthy deployments for this model.

(The same error message the operator at #v0.10.0-rc1-upgrade-report attributed to a cache-survives-stack-up bug. Five fresh-cluster probes on rc3 could not reproduce the cache bug; the consistent reproducer turned out to be calling the route by the user-given `--name` rather than the actual registered `--model` value.)

Changes:

- cmd/obol/model.go: drop --name flag from modelSetupCustomCommand
- internal/model/model.go: drop name parameter from AddCustomEndpoint; fallback RestartLiteLLM label now uses modelName
- flows/lib.sh: route_llm_via_obol_cli no longer reads OBOL_LLM_NAME or passes --name
- flows/buy-external.sh: OBOL_LLM_NAME env var removed (orphan)
- CLAUDE.md / monetize-guide SKILL.md / llm-routing.md: example commands and env-var lists drop --name / OBOL_LLM_NAME

bussyjd wants to merge 1 commit into main from chore/remove-unused-model-name-flag
