feat(sdk): support inference_v3 config_file mount by V2arK · Pull Request #136 · CentML/centml-python-client

V2arK · 2026-05-21T18:47:40Z

Summary

Bump platform-api-python-client 4.9.0 → 4.10.0 so ConfigFileMount is importable (verified by unpacking the wheel from PyPI; the models/config_file_mount.py file is present).
New helper centml/sdk/utils/config_file.py exposing load_config_file_mount(path, mount_path, filename=None) -> ConfigFileMount — reads a UTF-8 text file off disk and wraps it. Pattern mirrors the existing centml/sdk/utils/client_certs.py (free function, no class wrapping of generated client models).
Helper is deliberately validation-free — content size cap, filename charset, mount_path rules are owned by the server (platform PRs #3656 and #3667). Keeping limits server-only avoids SDK drift when server limits change.
examples/sdk/create_inference.py updated to demonstrate the helper in context; all existing fields preserved.

No CLI changes. No re-export from centml.sdk top-level (consistent with how client_certs is consumed today).

Why this shape

Decision	Why
Free function (not subclass `ConfigFileMount`)	Matches `client_certs.py`; no precedent for subclassing generated models.
`filename` defaults to `os.path.basename(path)`	Common case is mounting `nginx.conf` as `nginx.conf`.
`mount_path` has no default	The in-container path is the whole point of the field.
UTF-8 text mode read	Server contract is `content: str`. Binary configs not in scope.
No client-side validation	Server returns 422 with byte counts on overrun (PR #3667); SDK should not drift from server.
`==4.10.0` exact pin	Matches existing `requirements.txt` discipline.

Test plan

pytest --sanity tests/ — 47 passed (4 new tests in test_sdk_config_file_helper.py)
Black --check clean on new files
mypy clean on new files (pre-existing auth.py import-untyped is unchanged)
PyPI wheel for platform-api-python-client==4.10.0 verified to ship models/config_file_mount.py
Manual smoke on local cluster: created inference_v3 with config_file=load_config_file_mount(tmp_conf, "/etc/nginx/conf.d/default.conf") against localhost:16000 (minikube). get_inference(id).config_file roundtripped byte-perfect — filename matched basename, mount_path preserved, content matched the source file. Cleaned up via delete().

Unit test	Asserts
`test_default_filename_from_basename`	filename defaults to `basename(path)`, content + mount_path passthrough
`test_explicit_filename_overrides_basename`	explicit `filename` arg wins
`test_utf8_multibyte_content_roundtrips`	CJK text round-trips byte-perfect (catches binary-mode regression)
`test_missing_file_raises_filenotfound`	helper does not swallow I/O errors

Bump platform-api-python-client to 4.10.0 so ConfigFileMount is importable, and add a thin helper in centml/sdk/utils/config_file.py that reads a file off disk and returns a populated ConfigFileMount. Example create_inference.py updated to show the helper in context. Server (platform PRs #3656 and #3667) owns all field-level validation (64 KiB content cap, filename charset, mount_path rules); the helper deliberately stays validation-free so the SDK does not drift when server limits change. Signed-off-by: Honglin Cao <hocao@nvidia.com>

mount_path is the parent directory inside the container; the chart concatenates mount_path + "/" + filename for the final volumeMount target. The original example treated mount_path as the full file path, which mounted the file as a subdirectory and crashed nginx with an OCI "cannot create subdirectories in <path>: not a directory" error. - Helper comment now states the mount_path/filename relationship. - Example uses mount_path="/etc/nginx/conf.d", path="./default.conf" so the file lands at /etc/nginx/conf.d/default.conf as nginx expects. - Unit tests use directory mount_paths to match real usage. Discovered while running pod-level e2e smoke against local minikube. Signed-off-by: Honglin Cao <hocao@nvidia.com>

…file Per review feedback on #136: add a separate vLLM example showing config_file used to mount a chat-template Jinja file that vLLM consumes via --chat-template. Existing nginx example stays as the minimal config_file demo; this one shows the realistic LLM-serving shape (vllm/vllm-openai image, command override, GPU instance). Signed-off-by: Honglin Cao <hocao@nvidia.com>

scripts/format.sh uses --skip-magic-trailing-comma; load_config_file_mount call fits on one line under line-length 120 so collapse the multi-line form. Signed-off-by: Honglin Cao <hocao@nvidia.com>

michaelshin's review: a config_file is most useful when it carries the full vLLM startup config (model, dtype, gpu-memory-utilization, speculative-config, tool-call-parser, etc.) consumed via --config, not just a chat template. - Switch the example command to `--config /etc/vllm/vllm_config.yaml`. - Add sibling examples/sdk/vllm_config.yaml with the Llama-3.1-8B + EAGLE3 speculative-decoding setup from review feedback so the example is self-runnable. Signed-off-by: Honglin Cao <hocao@nvidia.com>

Found during dev B200 verification (Qwen2.5-0.5B variant of this shape): - vllm/vllm-openai image has python3 only, no `python` symlink — the K8s container fails to start with `exec: "python": executable file not found in $PATH` because the platform's `command` field overrides the image's ENTRYPOINT entirely. - The platform helm path passes numeric arg tokens to the Rollout spec as integers, and the K8s API server rejects them (`args[3] ... must be of type string: "integer"`). Moving `port: 8000` into the YAML config keeps every CLI token a non-numeric string while still letting vLLM pick up the port — and matches the "all config in one file" intent of --config. Verified end-to-end on dev (cluster c-01-c-11-centml-org, hw x1-large-b200): created deployment via SDK → rollout HEALTHY at t+76s → POST /v1/chat/completions returned HTTP 200 with a real Qwen completion. Signed-off-by: Honglin Cao <hocao@nvidia.com>

V2arK · 2026-05-21T21:52:11Z

End-to-end verified on dev with a B200 instance (82decf1):

Created deployment via SDK with the example shape (config_file=load_config_file_mount("./vllm_config.yaml", "/etc/vllm"), command="python3 -m vllm.entrypoints.openai.api_server --config /etc/vllm/vllm_config.yaml")
Cluster c-01-c-11-centml-org, hw x1-large-b200 (1 GPU), Qwen2.5-0.5B for the test (open model, no HF gating; example file keeps your Llama-3.1-8B + EAGLE3 config verbatim)
ArgoCD synced, vLLM container started, image read the mounted YAML, model loaded, torch.compile + cudagraph capture done, rollout HEALTHY at t+76s
POST https:///v1/chat/completions returned HTTP 200 with a real Qwen completion ('Hap!', 4 completion tokens)
Deleted via SDK

Two example fixes uncovered by the run (in 82decf1):

python -> python3 (vllm/vllm-openai image has no python symlink and the platform's command field overrides ENTRYPOINT, so exec: python: not found killed the container)
moved port: 8000 from CLI flag into vllm_config.yaml (the platform helm path serializes numeric CLI args as ints, which the K8s API rejects with args[3] ... must be of type string: integer)

Marked ready for re-review.

michaelshin · 2026-05-21T21:52:55Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82decf1dca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T21:56:43Z

+            # so the resulting file lands at mount_path/filename. Pass an inline
+            # ConfigFileMount(filename=..., mount_path=..., content=...) if the
+            # content is already in memory.
+            config_file=load_config_file_mount(path="./default.conf", mount_path="/etc/nginx/conf.d"),


Ship referenced config file or make mount optional

create_inference.py now unconditionally calls load_config_file_mount("./default.conf", ...), but this repository does not include examples/sdk/default.conf, so the example raises FileNotFoundError before any deployment call when run as-is. This makes the existing example non-runnable unless users discover and create an extra file manually.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-21T21:56:43Z

+# Field-level validation (size cap, filename charset, mount_path rules) is
+# left to the API so SDK doesn't drift when server limits change.
+def load_config_file_mount(path: str, mount_path: str, filename: Optional[str] = None) -> ConfigFileMount:
+    with open(path, "r", encoding="utf-8") as f:


Preserve original line endings in config file content

Reading with open(path, "r", encoding="utf-8") uses universal-newline translation, so files with \r\n or \r line endings are normalized to \n before upload. In environments where config line endings are significant (for example Windows-authored configs consumed byte-sensitively), this silently alters the mounted content.

Useful? React with 👍 / 👎.

- helper: open with newline="" so CRLF/CR line endings reach the server byte-faithful instead of being silently normalized to \n. Adds a regression test for a Windows-authored config (b"line1\r\nline2\r\n"). - example: ship examples/sdk/default.conf so create_inference.py runs as-is without the user having to discover and create an extra file (matches how create_inference_vllm.py ships vllm_config.yaml). Signed-off-by: Honglin Cao <hocao@nvidia.com>

V2arK · 2026-05-21T21:58:41Z

Addressed codex review in 009be9b:

P2 — create_inference.py references missing ./default.conf: shipped examples/sdk/default.conf with a trivial nginx config so the example runs as-is, matching how create_inference_vllm.py ships vllm_config.yaml.

P3 — universal-newline translation strips CRLF: changed open(path, "r", encoding="utf-8") to open(path, "r", encoding="utf-8", newline="") so Windows-authored configs reach the server byte-faithful instead of being silently normalized to \n. Added a regression test test_preserves_crlf_line_endings that locks in b"line1\r\nline2\r\n" round-tripping intact (the test failed against the prior open() call, passes now).

Unit suite: 5/5 (was 4/4 + new CRLF case). Black/mypy clean.

V2arK · 2026-05-21T21:59:53Z

@michaelshin I will test it from end to end again to make sure it works then I will merge it

V2arK self-assigned this May 21, 2026

V2arK force-pushed the honglin/sdk-config-file-support branch 2 times, most recently from 9a5c707 to 33f8ba3 Compare May 21, 2026 19:30

V2arK added 2 commits May 21, 2026 15:33

V2arK force-pushed the honglin/sdk-config-file-support branch from 33f8ba3 to 5b90163 Compare May 21, 2026 19:33

michaelshin reviewed May 21, 2026

View reviewed changes

Comment thread examples/sdk/create_inference.py Outdated

V2arK added 2 commits May 21, 2026 16:57

style(examples): black-format create_inference.py

608e3c4

scripts/format.sh uses --skip-magic-trailing-comma; load_config_file_mount call fits on one line under line-length 120 so collapse the multi-line form. Signed-off-by: Honglin Cao <hocao@nvidia.com>

V2arK requested a review from michaelshin May 21, 2026 21:02

michaelshin reviewed May 21, 2026

View reviewed changes

Comment thread examples/sdk/create_inference_vllm.py Outdated

michaelshin reviewed May 21, 2026

View reviewed changes

Comment thread examples/sdk/create_inference_vllm.py Outdated

V2arK marked this pull request as ready for review May 21, 2026 21:51

V2arK requested a review from michaelshin May 21, 2026 21:51

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

michaelshin approved these changes May 21, 2026

View reviewed changes

V2arK merged commit b132fd0 into main May 21, 2026
5 checks passed

V2arK deleted the honglin/sdk-config-file-support branch May 21, 2026 22:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sdk): support inference_v3 config_file mount#136

feat(sdk): support inference_v3 config_file mount#136
V2arK merged 7 commits into
mainfrom
honglin/sdk-config-file-support

V2arK commented May 21, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

V2arK commented May 21, 2026

Uh oh!

michaelshin commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Uh oh!

V2arK commented May 21, 2026

Uh oh!

V2arK commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

V2arK commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this shape

Test plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

V2arK commented May 21, 2026

Uh oh!

michaelshin commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

V2arK commented May 21, 2026

Uh oh!

V2arK commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

V2arK commented May 21, 2026 •

edited

Loading