Skip to content

feat(sdk): support inference_v3 config_file mount#136

Merged
V2arK merged 7 commits into
mainfrom
honglin/sdk-config-file-support
May 21, 2026
Merged

feat(sdk): support inference_v3 config_file mount#136
V2arK merged 7 commits into
mainfrom
honglin/sdk-config-file-support

Conversation

@V2arK
Copy link
Copy Markdown
Contributor

@V2arK V2arK commented May 21, 2026

Summary

  • Bump platform-api-python-client 4.9.04.10.0 so ConfigFileMount is importable (verified by unpacking the wheel from PyPI; the models/config_file_mount.py file is present).
  • New helper centml/sdk/utils/config_file.py exposing load_config_file_mount(path, mount_path, filename=None) -> ConfigFileMount — reads a UTF-8 text file off disk and wraps it. Pattern mirrors the existing centml/sdk/utils/client_certs.py (free function, no class wrapping of generated client models).
  • Helper is deliberately validation-free — content size cap, filename charset, mount_path rules are owned by the server (platform PRs #3656 and #3667). Keeping limits server-only avoids SDK drift when server limits change.
  • examples/sdk/create_inference.py updated to demonstrate the helper in context; all existing fields preserved.

No CLI changes. No re-export from centml.sdk top-level (consistent with how client_certs is consumed today).

Why this shape

Decision Why
Free function (not subclass ConfigFileMount) Matches client_certs.py; no precedent for subclassing generated models.
filename defaults to os.path.basename(path) Common case is mounting nginx.conf as nginx.conf.
mount_path has no default The in-container path is the whole point of the field.
UTF-8 text mode read Server contract is content: str. Binary configs not in scope.
No client-side validation Server returns 422 with byte counts on overrun (PR #3667); SDK should not drift from server.
==4.10.0 exact pin Matches existing requirements.txt discipline.

Test plan

  • pytest --sanity tests/ — 47 passed (4 new tests in test_sdk_config_file_helper.py)
  • Black --check clean on new files
  • mypy clean on new files (pre-existing auth.py import-untyped is unchanged)
  • PyPI wheel for platform-api-python-client==4.10.0 verified to ship models/config_file_mount.py
  • Manual smoke on local cluster: created inference_v3 with config_file=load_config_file_mount(tmp_conf, "/etc/nginx/conf.d/default.conf") against localhost:16000 (minikube). get_inference(id).config_file roundtripped byte-perfect — filename matched basename, mount_path preserved, content matched the source file. Cleaned up via delete().
Unit test Asserts
test_default_filename_from_basename filename defaults to basename(path), content + mount_path passthrough
test_explicit_filename_overrides_basename explicit filename arg wins
test_utf8_multibyte_content_roundtrips CJK text round-trips byte-perfect (catches binary-mode regression)
test_missing_file_raises_filenotfound helper does not swallow I/O errors

@V2arK V2arK self-assigned this May 21, 2026
@V2arK V2arK force-pushed the honglin/sdk-config-file-support branch 2 times, most recently from 9a5c707 to 33f8ba3 Compare May 21, 2026 19:30
V2arK added 2 commits May 21, 2026 15:33
Bump platform-api-python-client to 4.10.0 so ConfigFileMount is
importable, and add a thin helper in centml/sdk/utils/config_file.py
that reads a file off disk and returns a populated ConfigFileMount.
Example create_inference.py updated to show the helper in context.

Server (platform PRs #3656 and #3667) owns all field-level validation
(64 KiB content cap, filename charset, mount_path rules); the helper
deliberately stays validation-free so the SDK does not drift when
server limits change.

Signed-off-by: Honglin Cao <hocao@nvidia.com>
mount_path is the parent directory inside the container; the chart
concatenates mount_path + "/" + filename for the final volumeMount target.
The original example treated mount_path as the full file path, which
mounted the file as a subdirectory and crashed nginx with an OCI
"cannot create subdirectories in <path>: not a directory" error.

- Helper comment now states the mount_path/filename relationship.
- Example uses mount_path="/etc/nginx/conf.d", path="./default.conf"
  so the file lands at /etc/nginx/conf.d/default.conf as nginx expects.
- Unit tests use directory mount_paths to match real usage.

Discovered while running pod-level e2e smoke against local minikube.

Signed-off-by: Honglin Cao <hocao@nvidia.com>
@V2arK V2arK force-pushed the honglin/sdk-config-file-support branch from 33f8ba3 to 5b90163 Compare May 21, 2026 19:33
Comment thread examples/sdk/create_inference.py Outdated
V2arK added 2 commits May 21, 2026 16:57
…file

Per review feedback on #136: add a separate vLLM example showing
config_file used to mount a chat-template Jinja file that vLLM
consumes via --chat-template. Existing nginx example stays as the
minimal config_file demo; this one shows the realistic LLM-serving
shape (vllm/vllm-openai image, command override, GPU instance).

Signed-off-by: Honglin Cao <hocao@nvidia.com>
scripts/format.sh uses --skip-magic-trailing-comma; load_config_file_mount
call fits on one line under line-length 120 so collapse the multi-line form.

Signed-off-by: Honglin Cao <hocao@nvidia.com>
@V2arK V2arK requested a review from michaelshin May 21, 2026 21:02
Comment thread examples/sdk/create_inference_vllm.py Outdated
michaelshin's review: a config_file is most useful when it carries the
full vLLM startup config (model, dtype, gpu-memory-utilization,
speculative-config, tool-call-parser, etc.) consumed via --config,
not just a chat template.

- Switch the example command to `--config /etc/vllm/vllm_config.yaml`.
- Add sibling examples/sdk/vllm_config.yaml with the Llama-3.1-8B +
  EAGLE3 speculative-decoding setup from review feedback so the example
  is self-runnable.

Signed-off-by: Honglin Cao <hocao@nvidia.com>
Comment thread examples/sdk/create_inference_vllm.py Outdated
Found during dev B200 verification (Qwen2.5-0.5B variant of this shape):

- vllm/vllm-openai image has python3 only, no `python` symlink — the
  K8s container fails to start with `exec: "python": executable file
  not found in $PATH` because the platform's `command` field overrides
  the image's ENTRYPOINT entirely.
- The platform helm path passes numeric arg tokens to the Rollout spec
  as integers, and the K8s API server rejects them (`args[3] ... must
  be of type string: "integer"`). Moving `port: 8000` into the YAML
  config keeps every CLI token a non-numeric string while still letting
  vLLM pick up the port — and matches the "all config in one file"
  intent of --config.

Verified end-to-end on dev (cluster c-01-c-11-centml-org, hw x1-large-b200):
created deployment via SDK → rollout HEALTHY at t+76s → POST
/v1/chat/completions returned HTTP 200 with a real Qwen completion.

Signed-off-by: Honglin Cao <hocao@nvidia.com>
@V2arK V2arK marked this pull request as ready for review May 21, 2026 21:51
@V2arK V2arK requested a review from michaelshin May 21, 2026 21:51
@V2arK
Copy link
Copy Markdown
Contributor Author

V2arK commented May 21, 2026

End-to-end verified on dev with a B200 instance (82decf1):

  • Created deployment via SDK with the example shape (config_file=load_config_file_mount("./vllm_config.yaml", "/etc/vllm"), command="python3 -m vllm.entrypoints.openai.api_server --config /etc/vllm/vllm_config.yaml")
  • Cluster c-01-c-11-centml-org, hw x1-large-b200 (1 GPU), Qwen2.5-0.5B for the test (open model, no HF gating; example file keeps your Llama-3.1-8B + EAGLE3 config verbatim)
  • ArgoCD synced, vLLM container started, image read the mounted YAML, model loaded, torch.compile + cudagraph capture done, rollout HEALTHY at t+76s
  • POST https:///v1/chat/completions returned HTTP 200 with a real Qwen completion ('Hap!', 4 completion tokens)
  • Deleted via SDK

Two example fixes uncovered by the run (in 82decf1):

  • python -> python3 (vllm/vllm-openai image has no python symlink and the platform's command field overrides ENTRYPOINT, so exec: python: not found killed the container)
  • moved port: 8000 from CLI flag into vllm_config.yaml (the platform helm path serializes numeric CLI args as ints, which the K8s API rejects with args[3] ... must be of type string: integer)

Marked ready for re-review.

@michaelshin
Copy link
Copy Markdown
Contributor

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82decf1dca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# so the resulting file lands at mount_path/filename. Pass an inline
# ConfigFileMount(filename=..., mount_path=..., content=...) if the
# content is already in memory.
config_file=load_config_file_mount(path="./default.conf", mount_path="/etc/nginx/conf.d"),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ship referenced config file or make mount optional

create_inference.py now unconditionally calls load_config_file_mount("./default.conf", ...), but this repository does not include examples/sdk/default.conf, so the example raises FileNotFoundError before any deployment call when run as-is. This makes the existing example non-runnable unless users discover and create an extra file manually.

Useful? React with 👍 / 👎.

Comment thread centml/sdk/utils/config_file.py Outdated
# Field-level validation (size cap, filename charset, mount_path rules) is
# left to the API so SDK doesn't drift when server limits change.
def load_config_file_mount(path: str, mount_path: str, filename: Optional[str] = None) -> ConfigFileMount:
with open(path, "r", encoding="utf-8") as f:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Preserve original line endings in config file content

Reading with open(path, "r", encoding="utf-8") uses universal-newline translation, so files with \r\n or \r line endings are normalized to \n before upload. In environments where config line endings are significant (for example Windows-authored configs consumed byte-sensitively), this silently alters the mounted content.

Useful? React with 👍 / 👎.

- helper: open with newline="" so CRLF/CR line endings reach the server
  byte-faithful instead of being silently normalized to \n. Adds a
  regression test for a Windows-authored config (b"line1\r\nline2\r\n").
- example: ship examples/sdk/default.conf so create_inference.py runs
  as-is without the user having to discover and create an extra file
  (matches how create_inference_vllm.py ships vllm_config.yaml).

Signed-off-by: Honglin Cao <hocao@nvidia.com>
@V2arK
Copy link
Copy Markdown
Contributor Author

V2arK commented May 21, 2026

Addressed codex review in 009be9b:

P2 — create_inference.py references missing ./default.conf: shipped examples/sdk/default.conf with a trivial nginx config so the example runs as-is, matching how create_inference_vllm.py ships vllm_config.yaml.

P3 — universal-newline translation strips CRLF: changed open(path, "r", encoding="utf-8") to open(path, "r", encoding="utf-8", newline="") so Windows-authored configs reach the server byte-faithful instead of being silently normalized to \n. Added a regression test test_preserves_crlf_line_endings that locks in b"line1\r\nline2\r\n" round-tripping intact (the test failed against the prior open() call, passes now).

Unit suite: 5/5 (was 4/4 + new CRLF case). Black/mypy clean.

@V2arK
Copy link
Copy Markdown
Contributor Author

V2arK commented May 21, 2026

@michaelshin I will test it from end to end again to make sure it works then I will merge it

@V2arK V2arK merged commit b132fd0 into main May 21, 2026
5 checks passed
@V2arK V2arK deleted the honglin/sdk-config-file-support branch May 21, 2026 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants