feat(sdk): support inference_v3 config_file mount#136
Conversation
9a5c707 to
33f8ba3
Compare
Bump platform-api-python-client to 4.10.0 so ConfigFileMount is importable, and add a thin helper in centml/sdk/utils/config_file.py that reads a file off disk and returns a populated ConfigFileMount. Example create_inference.py updated to show the helper in context. Server (platform PRs #3656 and #3667) owns all field-level validation (64 KiB content cap, filename charset, mount_path rules); the helper deliberately stays validation-free so the SDK does not drift when server limits change. Signed-off-by: Honglin Cao <hocao@nvidia.com>
mount_path is the parent directory inside the container; the chart concatenates mount_path + "/" + filename for the final volumeMount target. The original example treated mount_path as the full file path, which mounted the file as a subdirectory and crashed nginx with an OCI "cannot create subdirectories in <path>: not a directory" error. - Helper comment now states the mount_path/filename relationship. - Example uses mount_path="/etc/nginx/conf.d", path="./default.conf" so the file lands at /etc/nginx/conf.d/default.conf as nginx expects. - Unit tests use directory mount_paths to match real usage. Discovered while running pod-level e2e smoke against local minikube. Signed-off-by: Honglin Cao <hocao@nvidia.com>
33f8ba3 to
5b90163
Compare
…file Per review feedback on #136: add a separate vLLM example showing config_file used to mount a chat-template Jinja file that vLLM consumes via --chat-template. Existing nginx example stays as the minimal config_file demo; this one shows the realistic LLM-serving shape (vllm/vllm-openai image, command override, GPU instance). Signed-off-by: Honglin Cao <hocao@nvidia.com>
scripts/format.sh uses --skip-magic-trailing-comma; load_config_file_mount call fits on one line under line-length 120 so collapse the multi-line form. Signed-off-by: Honglin Cao <hocao@nvidia.com>
michaelshin's review: a config_file is most useful when it carries the full vLLM startup config (model, dtype, gpu-memory-utilization, speculative-config, tool-call-parser, etc.) consumed via --config, not just a chat template. - Switch the example command to `--config /etc/vllm/vllm_config.yaml`. - Add sibling examples/sdk/vllm_config.yaml with the Llama-3.1-8B + EAGLE3 speculative-decoding setup from review feedback so the example is self-runnable. Signed-off-by: Honglin Cao <hocao@nvidia.com>
Found during dev B200 verification (Qwen2.5-0.5B variant of this shape): - vllm/vllm-openai image has python3 only, no `python` symlink — the K8s container fails to start with `exec: "python": executable file not found in $PATH` because the platform's `command` field overrides the image's ENTRYPOINT entirely. - The platform helm path passes numeric arg tokens to the Rollout spec as integers, and the K8s API server rejects them (`args[3] ... must be of type string: "integer"`). Moving `port: 8000` into the YAML config keeps every CLI token a non-numeric string while still letting vLLM pick up the port — and matches the "all config in one file" intent of --config. Verified end-to-end on dev (cluster c-01-c-11-centml-org, hw x1-large-b200): created deployment via SDK → rollout HEALTHY at t+76s → POST /v1/chat/completions returned HTTP 200 with a real Qwen completion. Signed-off-by: Honglin Cao <hocao@nvidia.com>
|
End-to-end verified on dev with a B200 instance (82decf1):
Two example fixes uncovered by the run (in 82decf1):
Marked ready for re-review. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82decf1dca
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # so the resulting file lands at mount_path/filename. Pass an inline | ||
| # ConfigFileMount(filename=..., mount_path=..., content=...) if the | ||
| # content is already in memory. | ||
| config_file=load_config_file_mount(path="./default.conf", mount_path="/etc/nginx/conf.d"), |
There was a problem hiding this comment.
Ship referenced config file or make mount optional
create_inference.py now unconditionally calls load_config_file_mount("./default.conf", ...), but this repository does not include examples/sdk/default.conf, so the example raises FileNotFoundError before any deployment call when run as-is. This makes the existing example non-runnable unless users discover and create an extra file manually.
Useful? React with 👍 / 👎.
| # Field-level validation (size cap, filename charset, mount_path rules) is | ||
| # left to the API so SDK doesn't drift when server limits change. | ||
| def load_config_file_mount(path: str, mount_path: str, filename: Optional[str] = None) -> ConfigFileMount: | ||
| with open(path, "r", encoding="utf-8") as f: |
There was a problem hiding this comment.
Preserve original line endings in config file content
Reading with open(path, "r", encoding="utf-8") uses universal-newline translation, so files with \r\n or \r line endings are normalized to \n before upload. In environments where config line endings are significant (for example Windows-authored configs consumed byte-sensitively), this silently alters the mounted content.
Useful? React with 👍 / 👎.
- helper: open with newline="" so CRLF/CR line endings reach the server byte-faithful instead of being silently normalized to \n. Adds a regression test for a Windows-authored config (b"line1\r\nline2\r\n"). - example: ship examples/sdk/default.conf so create_inference.py runs as-is without the user having to discover and create an extra file (matches how create_inference_vllm.py ships vllm_config.yaml). Signed-off-by: Honglin Cao <hocao@nvidia.com>
|
Addressed codex review in 009be9b: P2 — P3 — universal-newline translation strips CRLF: changed Unit suite: 5/5 (was 4/4 + new CRLF case). Black/mypy clean. |
|
@michaelshin I will test it from end to end again to make sure it works then I will merge it |
Summary
platform-api-python-client4.9.0→4.10.0soConfigFileMountis importable (verified by unpacking the wheel from PyPI; themodels/config_file_mount.pyfile is present).centml/sdk/utils/config_file.pyexposingload_config_file_mount(path, mount_path, filename=None) -> ConfigFileMount— reads a UTF-8 text file off disk and wraps it. Pattern mirrors the existingcentml/sdk/utils/client_certs.py(free function, no class wrapping of generated client models).examples/sdk/create_inference.pyupdated to demonstrate the helper in context; all existing fields preserved.No CLI changes. No re-export from
centml.sdktop-level (consistent with howclient_certsis consumed today).Why this shape
ConfigFileMount)client_certs.py; no precedent for subclassing generated models.filenamedefaults toos.path.basename(path)nginx.confasnginx.conf.mount_pathhas no defaultcontent: str. Binary configs not in scope.==4.10.0exact pinrequirements.txtdiscipline.Test plan
pytest --sanity tests/— 47 passed (4 new tests intest_sdk_config_file_helper.py)--checkclean on new filesauth.pyimport-untyped is unchanged)platform-api-python-client==4.10.0verified to shipmodels/config_file_mount.pyconfig_file=load_config_file_mount(tmp_conf, "/etc/nginx/conf.d/default.conf")againstlocalhost:16000(minikube).get_inference(id).config_fileroundtripped byte-perfect —filenamematched basename,mount_pathpreserved,contentmatched the source file. Cleaned up viadelete().test_default_filename_from_basenamebasename(path), content + mount_path passthroughtest_explicit_filename_overrides_basenamefilenamearg winstest_utf8_multibyte_content_roundtripstest_missing_file_raises_filenotfound