chore(model): remove inert --name flag from obol model setup custom#509
Open
bussyjd wants to merge 1 commit into
The `--name` flag on `obol model setup custom` was documented as
informational only and never participated in any routing or persistence:
- ModelEntry has only `model_name` (route key) + `litellm_params`; the
CLI `--name` value was never written to either.
- `detectProvider` (used by `obol model list/status`) inspects
`entry.ModelName` + `entry.LiteLLMParams.Model` prefixes; the `--name`
string never reached it.
- It was only echoed back in two log lines and passed as a UI label to
`RestartLiteLLM` on the hot-add fallback path.
This caused confusion in QA: an operator running

    obol model setup custom --name foo --model my/model ...

would later call the route as `foo` and get LiteLLM's

    BadRequestError: ... There are no healthy deployments for this model.
(This is the same error message that the operator at
#v0.10.0-rc1-upgrade-report attributed to a cache-survives-stack-up bug.
Five fresh-cluster probes on rc3 could not reproduce the cache bug; the
consistent reproducer turned out to be calling the route by the
user-given `--name` rather than the actual registered `--model` value.)
Changes:
- cmd/obol/model.go: drop --name flag from modelSetupCustomCommand
- internal/model/model.go: drop name parameter from AddCustomEndpoint;
fallback RestartLiteLLM label now uses modelName
- flows/lib.sh: route_llm_via_obol_cli no longer reads OBOL_LLM_NAME or
passes --name
- flows/buy-external.sh: OBOL_LLM_NAME env var removed (orphan)
- CLAUDE.md / monetize-guide SKILL.md / llm-routing.md: example commands
and env-var lists drop --name / OBOL_LLM_NAME
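For scripts that still pass `--name`, the sketch below shows what a stale invocation would look like after the flag is dropped. It assumes parsing in the style of Go's standard `flag` package; the CLI framework actually used by `obol` is not stated here, and `parseCustom` and the `--endpoint` flag name are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseCustom is a hypothetical stand-in for the `setup custom` flag set
// after the change: there is no --name flag any more.
func parseCustom(args []string) (endpoint, model string, err error) {
	fs := flag.NewFlagSet("custom", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	fs.StringVar(&endpoint, "endpoint", "", "endpoint URL (assumed flag name)")
	fs.StringVar(&model, "model", "", "model name, which is also the route key")
	err = fs.Parse(args)
	return endpoint, model, err
}

func main() {
	// A stale invocation that still passes --name is rejected at parse time.
	_, _, err := parseCustom([]string{"--name", "foo", "--model", "my/model"})
	fmt.Println("stale invocation:", err)

	// The updated invocation parses cleanly.
	_, model, err := parseCustom([]string{"--model", "my/model"})
	fmt.Println("updated invocation:", model, err)
}
```

Failing at parse time is arguably the better outcome: the operator learns immediately that the name is not accepted, rather than discovering later that the route does not exist.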
Summary
What changed:
- Removed the `--name` flag from `obol model setup custom`.
- Removed the `name` parameter from the `model.AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey)` signature.
- Removed the `OBOL_LLM_NAME` env var from `flows/lib.sh::route_llm_via_obol_cli` and `flows/buy-external.sh`.
- Updated the example commands and env-var lists in `CLAUDE.md`, `internal/embed/skills/monetize-guide/SKILL.md`, and `.agents/skills/obol-stack-dev/references/llm-routing.md`.

Why it matters:
The `--name` flag was 100% inert: it never reached `ModelEntry`, never persisted to the LiteLLM ConfigMap, and never influenced `obol model list/prefer/status/sync/remove` or any routing. The string was echoed in two log lines and passed as a UI label to `RestartLiteLLM` on the fallback path. Nothing else. Its help text described it as informational only and noted that LiteLLM keys the route by `--model`, not `--name`.

The trap: operators run `obol model setup custom --name foo --model my/model …`, then call the route as `foo`, and get LiteLLM's `BadRequestError: ... There are no healthy deployments for this model.`

This was the same error string that the v0.10.0-rc1 upgrade report attributed to a "cache survives `obol stack up`" bug. After five fresh-cluster probes on rc3 could not reproduce the cache bug, the reliable reproducer turned out to be calling the route by the user-given `--name` rather than the `--model` value LiteLLM actually keys on. Removing the flag eliminates that UX trap.

Risk level: low
Commit under test: b9ff172 (this PR), parent f8df92e (tag v0.10.0-rc3)
Base branch: main
Scope
Validation
CI checks:
- `go test ./cmd/obol/ ./internal/model/ -count=1`
- `bash -n flows/*.sh`
- (… `obol model setup custom` end-to-end)

Unit tests:
Integration tests:
Flow tests:
- `.tmp/release-smoke-20260521-134647/flow-02-stack-init-up.log`
- `.tmp/release-smoke-20260521-134647/flow-05-network.log`
- `.tmp/release-smoke-20260521-134647/flow-07-sell-verify.log`
- `.tmp/release-smoke-20260521-134647/flow-08-buy.log`
- `.tmp/release-smoke-20260521-134647/flow-09-lifecycle.log`
- `.tmp/release-smoke-20260521-134647/flow-10-anvil-facilitator.log`
- `/v1/models` returns 401 to unauthenticated probe
- `endpoint validation failed: ... context deadline exceeded`: Unsloth Studio 27B cold-start exceeds `ValidateCustomEndpoint`'s 60s timeout. CLI parsed args correctly; no `--name` regression.
- … `:11434`; QA host runs Unsloth
- … `obol model setup custom` validation. Preflight (Alice ETH, Bob USDC, facilitator) all PASS.

Release smoke:
Failure attribution: Zero failures involve `--name` parsing. `grep "unknown flag\|flag provided but not defined" .tmp/release-smoke-*/*.log` returns nothing. Every `obol model setup custom` invocation parsed arguments and reached endpoint validation correctly. All 5 failures are upstream of the CLI:
- `/v1/models` requires a bearer token; flow-01's simple unauthenticated probe gets 401.
- Unsloth Studio 27B cold-start exceeds `ValidateCustomEndpoint`'s 60s timeout. Surfaces in flow-03 and flow-11.
- `flow-06-sell-setup.sh` preflight checks `localhost:11434`; QA host runs Unsloth, not Ollama.

A vLLM/llama.cpp QA host without auth would not hit any of these. Six flows pass cleanly, including all on-chain payment flows (flow-08 buy, flow-09 lifecycle).
Live Chain Evidence
Do not include private keys, seed phrases, passwords, hostnames, personal paths, or raw bearer tokens.
Network: base-sepolia (flow-08, flow-11)
RPC/provider: default free-tier fallback (no paid RPC set this run)
Facilitator: `https://x402.gcp.obol.tech` (reachable, supports Base Sepolia exact)
Contracts and tokens:
- `0x036CbD53842c5426634e7929541eC2318f3dCF7e`

Wallet roles:
- `0xC0De030F6C37f490594F93fB99e2756703c4297E`
- `0x57b0eF875DeB5A37301F1640E469a2129Da9490E`

Balances:
- `0x036CbD…CF7e`

Transaction receipts: none on-chain this run (PR is CLI-only, no settlement path touched).
Runtime Evidence
QA environment:
Images:
- `nousresearch/hermes-agent`
- `ghcr.io/berriai/litellm:main-stable`
- `ghcr.io/obolnetwork/<name>:latest` (locally built, `OBOL_FORCE_REBUILD_LOCAL_DEV_IMAGES=true`)

Kubernetes / stack:
- `smart-dinosaur` (validation phase), `wondrous-crane` (release-smoke); both torn down
- `cleanup_stacks` trap removed test workspaces on exit

Model and routing:
- … `kubectl get secret litellm-secrets -n llm`; Unsloth JWT from POST `/api/auth/login`

Artifacts and logs:
- `.tmp/release-smoke-20260521-134647/RELEASE_REPORT.md`
- `obol-0:qa.0`

Demo readiness:
Review Notes
Known gaps:
- … `obol model setup custom`: it requires a bearer JWT and has a slow cold-start for large GGUFs that exceeds the 60s validation timeout. Adding first-class Unsloth support (alongside Ollama) would let release-smoke run cleanly on hosts without vLLM. Out of scope for this PR.
- … `obol model setup custom` hot-add path fails with `[Errno 30] Read-only file system: '/etc/litellm/config.yaml'` because the ConfigMap is mounted RO. The CLI correctly falls back to a deployment restart, so users aren't blocked, but every custom-endpoint setup pays the full ~90s rollout cost. Worth a follow-up.
- `TestWarnIfNoChatModel_EmitsWarnWhenNoModels` in `internal/stack` is broken on `main` HEAD as of this PR (asserts warn-on-stderr but the message lands on stdout). Predates this branch.

Follow-ups:
- … `obol model setup` (auth handling, longer first-call timeout).
- … `TestWarnIfNoChatModel_EmitsWarnWhenNoModels` (separate issue).
- … `agent_render.go::agentPodSpec` still lacks an init container.

Reviewer focus:
- `AddCustomEndpoint(cfg, u, endpoint, modelName, apiKey)` (no `name`) is acceptable: the function is only called from `cmd/obol/model.go` and has no test mocks. Repo-wide grep returns zero stale callers.
- `RestartLiteLLM(cfg, u, modelName)` fallback label change is OK: the third arg is a UI-only string.
- `OBOL_LLM_NAME` removal from `flows/buy-external.sh` doesn't break any external automation referencing it (none found in this repo or in `~/.config`).
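The hot-add fallback noted under Known gaps, together with the label change flagged for reviewers, follows a common try-fast-path-then-restart shape. The sketch below is a rough illustration under stated assumptions: `applyModel`, the two hook functions, and `errReadOnly` are hypothetical, and the real `RestartLiteLLM` takes `cfg` and `u` arguments that are omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// errReadOnly mimics the "[Errno 30] Read-only file system" failure.
var errReadOnly = errors.New("read-only file system")

// applyModel tries the fast hot-add path first and falls back to a restart.
// hotAdd and restart are hypothetical hooks; the string passed to restart is
// only a label, and it is now the model name rather than a separate --name.
func applyModel(modelName string, hotAdd, restart func(label string) error) (string, error) {
	if err := hotAdd(modelName); err == nil {
		return "hot-add", nil
	}
	// Fast path failed: pay the rollout cost instead of failing the setup.
	if err := restart(modelName); err != nil {
		return "", fmt.Errorf("restart fallback: %w", err)
	}
	return "restart", nil
}

func main() {
	var restartLabel string

	// Simulate a read-only config mount: hot-add always fails.
	hotAdd := func(string) error { return errReadOnly }
	restart := func(label string) error {
		restartLabel = label
		return nil
	}

	path, err := applyModel("my/model", hotAdd, restart)
	fmt.Println(path, restartLabel, err)
}
```

Because the label is display-only in this shape, swapping it from the user-given name to the model name changes what is shown but not which route is registered.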