Allow external callers to configure the hook directory #952

Description

@shreyaskommuri

Describe the Bug

CloudAI resolves hook scenarios from conf/hook relative to the process working directory, even when an external caller supplies absolute paths for the system config, test scenario, tests directory, and output directory.

This prevents orchestration tools from running hook-bearing scenarios outside the CloudAI checkout. The error tells callers to ensure a hook directory exists under the working directory, but cloudai run and cloudai dry-run do not currently provide a way to configure that directory.

We encountered this while validating NVIDIA CloudAI through CloudAI Autotune. Autotune invoked CloudAI from its own repository and preserved the failure in its run log.

Observed on NVIDIA/cloudai main at 79f8bdb2.

Steps to Reproduce

From a working directory outside the CloudAI checkout:

autotune smoke-cloudai \
  ../cloudai/conf/common/test_scenario/nccl_test.toml \
  --cloudai-bin ../cloudai/.venv/bin/cloudai \
  --system-config ../cloudai/conf/common/system/example_slurm_cluster.toml \
  --tests-dir ../cloudai/conf/common/test \
  --runs-dir runs/integration \
  --timeout-sec 60

CloudAI exits with code 1:

[ERROR] Pre-test hook 'nccl_test' not found in hook mapping. A corresponding hook should exist under 'conf/hook'. Ensure that a proper hook directory is set under the working directory.

The scenario, system config, and tests directory are all absolute or explicitly configurable, but hook lookup remains relative to the caller's current working directory.
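
The mechanism can be illustrated without CloudAI itself. The following is a minimal sketch, assuming only that the lookup uses the relative path `conf/hook` named in the error message; the directory names `cloudai` and `autotune` and the helper functions are hypothetical stand-ins, not CloudAI code. It shows that the same relative path is found from inside a checkout and missed from any other working directory.

```python
import os
import tempfile
from pathlib import Path

# Relative path taken from the error message; resolved against the CWD.
HOOK_DIR = Path("conf/hook")


def hook_dir_visible_from(cwd: Path) -> bool:
    """Return True if the relative HOOK_DIR exists when the process runs in cwd."""
    previous = Path.cwd()
    os.chdir(cwd)
    try:
        return HOOK_DIR.is_dir()
    finally:
        # Always restore the original working directory.
        os.chdir(previous)


def demo() -> tuple[bool, bool]:
    """Build a fake checkout and a fake caller directory, then probe both."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "cloudai"   # hypothetical checkout location
        caller = Path(tmp) / "autotune"    # hypothetical external caller
        (checkout / HOOK_DIR).mkdir(parents=True)
        caller.mkdir()
        return hook_dir_visible_from(checkout), hook_dir_visible_from(caller)


if __name__ == "__main__":
    print(demo())
```

Passing absolute paths for every other input does not change this result, because the hook path is never derived from those inputs.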

Expected Behavior

run and dry-run should allow callers to provide the hook configuration directory explicitly, similar to --tests-dir, so external orchestration does not need to change its working directory or recreate CloudAI's conf/hook layout.

Existing in-repository commands should continue to use conf/hook by default.
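
One possible shape for the requested option, as a sketch rather than a proposal for the actual implementation: the flag name `--hook-dir`, the parser name `cloudai-sketch`, and the helper `resolve_hook_dir` are all assumptions made for illustration and do not exist in CloudAI today. The sketch keeps `conf/hook` as the default so in-repository invocations behave as before, and lets an external caller override it.

```python
import argparse
from pathlib import Path

# Default preserves today's behavior: relative to the working directory.
DEFAULT_HOOK_DIR = Path("conf/hook")


def build_parser() -> argparse.ArgumentParser:
    """Standalone parser for illustration; not CloudAI's real CLI definition."""
    parser = argparse.ArgumentParser(prog="cloudai-sketch")
    parser.add_argument("--tests-dir", type=Path, required=True)
    # Hypothetical flag: lets callers point at a hook directory explicitly.
    parser.add_argument(
        "--hook-dir",
        type=Path,
        default=DEFAULT_HOOK_DIR,
        help="Directory containing hook scenarios (default: conf/hook).",
    )
    return parser


def resolve_hook_dir(args: argparse.Namespace) -> Path:
    """Turn the configured hook directory into an absolute path."""
    return args.hook_dir.expanduser().resolve()


if __name__ == "__main__":
    parsed = build_parser().parse_args(
        ["--tests-dir", "conf/common/test", "--hook-dir", "../cloudai/conf/hook"]
    )
    print(resolve_hook_dir(parsed))
```

Resolving the path once, right after argument parsing, would also let the error message report the absolute directory that was searched.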

Screenshots

Not applicable. The full failure is textual and reproducible with the command above.

Additional Context

AI was used only for context and guidance.
