
[HWORKS-2761] Compute Resources Usage view should only show Kueue-reachable nodes #576

Open
o-alex wants to merge 1 commit into logicalclocks:main from o-alex:HWORKS-2761

o-alex commented May 13, 2026

Summary

  • Adds a user-facing page user_guides/projects/scheduling/compute_resources.md explaining the Compute Resources Usage card: the totals, the per-node breakdown, the Queue: filter dropdown (any + per-LocalQueue), the access-notice tooltip, and the fallback when Kueue is not in use.
  • Adds an admin-facing page setup_installation/admin/compute_resources.md covering the LocalQueue → ClusterQueue → ResourceFlavor walk, the node-label / node-taint matching rules, the Kueue ClusterRole RBAC, and a troubleshooting table.
  • Both pages cross-link. Nav entries added under Kubernetes Scheduling (user) and Setup and Administration > Administration (admin).

Test plan

  • mkdocs build -s is clean (no new errors; pre-existing warnings about databricks/*.md not being in nav are unchanged).
  • markdownlint-cli2 clean on both new files.
  • Visual review: nav entries appear in the right sections; cross-links resolve; the three new screenshots render.

🤖 Generated with Claude Code


Copilot AI left a comment

Pull request overview

Adds end-user and admin documentation for the Hopsworks Compute Resources Usage view, clarifying how per-project node visibility is determined (Kueue reachability) and how to interpret/filter the UI.

Changes:

  • Add a new user guide explaining the Compute Resources Usage card (totals, per-node breakdown, Queue/Labels filtering, access notice, and non-Kueue fallback).
  • Add a new admin guide explaining the LocalQueue → ClusterQueue → ResourceFlavor mapping and required RBAC, plus troubleshooting.
  • Update mkdocs.yml navigation to include both new pages.

Reviewed changes

Copilot reviewed 5 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| mkdocs.yml | Adds nav entries for the new user and admin documentation pages. |
| docs/user_guides/projects/scheduling/compute_resources.md | New end-user guide for reading/filtering the Compute Resources Usage view. |
| docs/setup_installation/admin/compute_resources.md | New admin guide describing how Kueue objects map to node visibility and required RBAC. |
| .claude/docs/README.md | Updates contributor command guidance (mike deploy/serve) and preview notes. |
| .claude/CLAUDE.md | Updates contributor command guidance (mike deploy/serve) and preview notes. |

Comments suppressed due to low confidence (1)

docs/setup_installation/admin/compute_resources.md:79

  • These See also links use relative .md paths. Prefer MkDocs autorefs/heading IDs (e.g., [Kueue][kueue]) to avoid fragile links under mike versioning.
- [Compute Resources Usage](../../user_guides/projects/scheduling/compute_resources.md) — the end-user view this configuration drives.
- [Kueue](../../user_guides/projects/scheduling/kueue_details.md) — overview of the Kueue abstractions referenced above.

Comment thread docs/user_guides/projects/scheduling/compute_resources.md Outdated
Comment thread docs/setup_installation/admin/compute_resources.md Outdated
Comment thread docs/setup_installation/admin/compute_resources.md Outdated
Comment thread .claude/docs/README.md Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 9 changed files in this pull request and generated 1 comment.

Comment thread docs/user_guides/projects/scheduling/compute_resources.md Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 9 changed files in this pull request and generated 1 comment.

Comment thread docs/user_guides/projects/scheduling/compute_resources.md Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 9 changed files in this pull request and generated 2 comments.

Comment thread .claude/docs/README.md
uv pip install "git+https://github.com/logicalclocks/hopsworks-api.git@main#subdirectory=python" # needed for Python API section
touch docs/javadoc; uv run mkdocs serve; rm docs/javadoc # preview with live reload
touch docs/javadoc; uv run mkdocs build -s; rm docs/javadoc # build in strict mode
uv run mike deploy <version> latest --update-alias # build a versioned bundle (use repo's current version, e.g. 4.4); first time only: `uv run mike set-default latest`
Comment thread .claude/CLAUDE.md
uv pip install "git+https://github.com/logicalclocks/hopsworks-api.git@main#subdirectory=python" # install Python API (needed for API docs section)
touch docs/javadoc; uv run mkdocs build -s; rm docs/javadoc # build (strict)
uv run mkdocs serve # preview with live reload
uv run mike deploy <version> latest --update-alias # build a versioned bundle to the gh-pages worktree (use repo's current version, e.g. 4.4); first time only: `uv run mike set-default latest`
o-alex force-pushed the HWORKS-2761 branch 4 times, most recently from fb85b70 to 9322e9f, on May 15, 2026 13:18
…chable nodes

https://hopsworks.atlassian.net/browse/HWORKS-2761

The "Compute Resources Usage" view used to list every node in the
cluster regardless of whether the current Hopsworks project could
schedule on it, ignoring Kueue's per-project scoping. Users saw
capacity they could not consume and had no way to tell which queue
any given node belonged to.

The fix is project-aware filtering driven by Kueue's standard
`LocalQueue -> ClusterQueue -> ResourceFlavor` chain: the backend
narrows the node list and exposes a per-queue map, the frontend
renders a Queue filter plus an access notice explaining the filter,
loadtest exercises the matching logic across varied topologies, and
the docs walk users and admins through the new behaviour.
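
For readers unfamiliar with the chain, the walk described above can be sketched roughly as follows. This is an illustrative sketch only, not the Hopsworks backend code: the function name `reachable_nodes` and the plain-dict inputs are hypothetical, and it assumes a node is reachable when it carries every label in a flavor's `nodeLabels`. Taint and toleration matching, which the admin guide also covers, is deliberately left out.

```python
from typing import Dict, List


def reachable_nodes(
    local_queues: Dict[str, str],          # LocalQueue name -> ClusterQueue name
    cluster_queues: Dict[str, List[str]],  # ClusterQueue name -> ResourceFlavor names
    flavors: Dict[str, Dict[str, str]],    # ResourceFlavor name -> nodeLabels
    nodes: Dict[str, Dict[str, str]],      # node name -> node labels
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each LocalQueue to the nodes it can reach.

    Assumption: label matching only; taints and tolerations are ignored.
    """
    per_queue: Dict[str, List[str]] = {}
    for lq_name, cq_name in local_queues.items():
        matched: List[str] = []
        for node_name, labels in nodes.items():
            for flavor_name in cluster_queues.get(cq_name, []):
                if flavor_name not in flavors:
                    continue  # dangling flavor reference: skip it
                wanted = flavors[flavor_name]
                # The node matches when every flavor label is present with the same value.
                if all(labels.get(k) == v for k, v in wanted.items()):
                    matched.append(node_name)
                    break  # one matching flavor is enough
        per_queue[lq_name] = matched
    return per_queue
```

The returned per-queue map corresponds loosely to what the Queue filter would need: the "any" option is the union of all lists, and each LocalQueue entry is one list.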

Two new pages cover the view: an end-user guide for reading the
per-node breakdown, the Queue filter, and the access notice; and an
admin guide covering how the project-to-node mapping is derived,
the ClusterRole the Hopsworks service account needs, and a
troubleshooting table for the fallback states.

The troubleshooting row for the "all nodes shown, no notice" state
(Kueue absent, no LocalQueues, or missing RBAC) tells admins to
disambiguate via `kubectl get crd resourceflavors.kueue.x-k8s.io`
for installation and `kubectl auth can-i list
localqueues.kueue.x-k8s.io -n <project-ns>
--as=system:serviceaccount:<hopsworks-ns>:<hopsworks-sa>` for RBAC.
Three screenshots illustrate the card expanded, the Queue dropdown,
and the filtered grid. A `{#kueue-details}` anchor on
`kueue_details.md` lets the new pages autoref to it without
colliding with the `#kueue` section heading in `kube_scheduler.md`.

Signed-off-by: Alex Ormenisan <alex@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>