Describe the Bug
CloudAI resolves hook scenarios from conf/hook relative to the process working directory, even when an external caller supplies absolute paths for the system config, test scenario, tests directory, and output directory.
This prevents orchestration tools from running hook-bearing scenarios outside the CloudAI checkout. The error tells callers to ensure a hook directory exists under the working directory, but cloudai run and cloudai dry-run do not currently provide a way to configure that directory.
We encountered this while validating NVIDIA CloudAI through CloudAI Autotune. Autotune invoked CloudAI from its own repository and preserved the failure in its run log.
Observed on NVIDIA/cloudai main at 79f8bdb2.
Steps to Reproduce
From a working directory outside the CloudAI checkout:
autotune smoke-cloudai \
../cloudai/conf/common/test_scenario/nccl_test.toml \
--cloudai-bin ../cloudai/.venv/bin/cloudai \
--system-config ../cloudai/conf/common/system/example_slurm_cluster.toml \
--tests-dir ../cloudai/conf/common/test \
--runs-dir runs/integration \
--timeout-sec 60
CloudAI exits with code 1:
[ERROR] Pre-test hook 'nccl_test' not found in hook mapping. A corresponding hook should exist under 'conf/hook'. Ensure that a proper hook directory is set under the working directory.
The scenario, system config, and tests directory are all absolute or explicitly configurable, but hook lookup remains relative to the caller's current working directory.
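The failure mode can be illustrated in isolation. The following is a minimal sketch, not CloudAI's actual lookup code: `HOOK_DIR`, `hook_dir_exists_relative`, and `demo` are hypothetical names chosen for illustration. It only assumes that the hook directory is referenced by the relative path `conf/hook`, as the error message suggests.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical stand-in for a lookup that hard-codes a relative hook directory.
HOOK_DIR = Path("conf/hook")


def hook_dir_exists_relative() -> bool:
    """Check for the hook directory relative to the current working directory."""
    return HOOK_DIR.is_dir()


def demo() -> tuple[bool, bool]:
    """Return (found_from_checkout, found_from_elsewhere)."""
    original_cwd = Path.cwd()
    with tempfile.TemporaryDirectory() as checkout, \
            tempfile.TemporaryDirectory() as elsewhere:
        # Simulate a checkout that contains conf/hook.
        (Path(checkout) / "conf" / "hook").mkdir(parents=True)
        try:
            os.chdir(checkout)
            found_from_checkout = hook_dir_exists_relative()
            # Simulate an external caller running from its own repository.
            os.chdir(elsewhere)
            found_from_elsewhere = hook_dir_exists_relative()
        finally:
            # Restore the working directory before the temp dirs are removed.
            os.chdir(original_cwd)
    return found_from_checkout, found_from_elsewhere


if __name__ == "__main__":
    print(demo())
```

The same relative path is found or missed depending only on where the process was started, regardless of how the other inputs are specified.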
Expected Behavior
run and dry-run should allow callers to provide the hook configuration directory explicitly, similar to --tests-dir, so external orchestration does not need to change its working directory or recreate CloudAI's conf/hook layout.
Existing in-repository commands should continue to use conf/hook by default.
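One possible shape for the requested behavior is sketched below. This is an assumption about how such an option could look, not an existing CloudAI interface: the `--hook-dir` flag, `build_parser`, and `resolve_hook_dir` are hypothetical, and the parser is a standalone `argparse` example rather than CloudAI's real CLI definition.

```python
import argparse
from pathlib import Path

# Default taken from the issue text: in-repository runs keep using conf/hook.
DEFAULT_HOOK_DIR = Path("conf/hook")


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a hypothetical --hook-dir option."""
    parser = argparse.ArgumentParser(prog="cloudai-sketch")
    parser.add_argument("--tests-dir", type=Path, required=True)
    # Hypothetical flag: not part of the current CLI as described in this issue.
    parser.add_argument("--hook-dir", type=Path, default=DEFAULT_HOOK_DIR)
    return parser


def resolve_hook_dir(args: argparse.Namespace) -> Path:
    """Return an absolute hook directory.

    A relative value (including the default) is resolved against the current
    working directory, which preserves today's in-repository behavior.
    """
    return args.hook_dir.expanduser().resolve()


if __name__ == "__main__":
    parsed = build_parser().parse_args(
        ["--tests-dir", "conf/common/test", "--hook-dir", "/opt/cloudai/conf/hook"]
    )
    print(resolve_hook_dir(parsed))
```

With a default of `conf/hook`, commands run from inside the checkout would behave as they do now, while an external caller could pass an explicit directory alongside `--tests-dir`.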
Screenshots
Not applicable. The full failure is textual and reproducible with the command above.
Additional Context
AI was used only for context and guidance.