Fix: raise CPU-sim scheduler idle cap via platform-defined constant#845
Open
ChaoZheng109 wants to merge 1 commit into
Open
Fix: raise CPU-sim scheduler idle cap via platform-defined constant#845ChaoZheng109 wants to merge 1 commit into
ChaoZheng109 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces platform-specific constants for MAX_IDLE_ITERATIONS, allowing for a significantly higher threshold in simulation environments compared to onboard hardware to prevent false deadlock timeouts during slow tasks. Review feedback recommends scaling the STALL_LOG_INTERVAL relative to MAX_IDLE_ITERATIONS instead of using a hardcoded value, which would prevent excessive log volume in simulation while maintaining consistent diagnostic frequency across platforms.
30b2340 to
20cd44d
Compare
Fixes hw-native-sys#840 The scheduler's MAX_IDLE_ITERATIONS was a single runtime constant (800000) shared by sim and onboard. On CPU sim, idle iterations accumulate far faster than real kernel progress, so matmul-heavy kernels (e.g. compressor_ratio4) hit the cap and fail with PTO2_ERROR_SCHEDULER_TIMEOUT (-100) while the kernel is still making forward progress. Move the cap to a platform-defined PLATFORM_MAX_IDLE_ITERATIONS in each backend's spin_hint.h (the existing per-backend AICPU header), so the runtime consumes one symbol while sim and onboard supply their own values: - onboard: 800000 (unchanged hardware behavior) - sim: 8000000 (10x the onboard cap; conservative bump, tune if needed) scheduler_types.h aliases MAX_IDLE_ITERATIONS = PLATFORM_MAX_IDLE_ITERATIONS, mirroring the existing MAX_AICPU_THREADS = PLATFORM_MAX_AICPU_THREADS pattern. STALL_LOG_INTERVAL is now derived as MAX_IDLE_ITERATIONS / 2 instead of a fixed 400000, so the stall-diagnostic log fires ~once halfway to timeout on every platform. This preserves the prior onboard behavior exactly (800000 / 2 == 400000). Applied to both a5 and a2a3 trees. The onboard spin_hint.h headers also gain the standard license block required by the pre-commit header check. Note: the repro kernel lives in the PTO/pypto repo, so the sim value should be validated against compressor_ratio4 there. The PTO-ISA TMatmulNzZn optimizations suggested in the issue are out of scope for this repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
MAX_IDLE_ITERATIONSwas a single runtime constant (800000) shared by sim andonboard. On CPU sim, idle scheduler iterations accumulate far faster than real
kernel progress, so matmul-heavy kernels (e.g.
compressor_ratio4) hit the capand fail with
PTO2_ERROR_SCHEDULER_TIMEOUT(run_prepared failed with code -100) while the kernel is still making forward progress.This moves the cap to a platform-defined
PLATFORM_MAX_IDLE_ITERATIONSineach backend's
spin_hint.h(the existing per-backend AICPU header), so theruntime consumes a single symbol while sim and onboard supply their own values:
800000(unchanged hardware behavior)8000000(10× the onboard cap — a conservative bump, see caveat below)Changes
PLATFORM_MAX_IDLE_ITERATIONSadded to all fourspin_hint.h(a5/a2a3 ×sim/onboard). The sim variant's doc comment already described this exact
MAX_IDLE_ITERATIONSinteraction, so the cap is co-located there.scheduler_types.haliasesMAX_IDLE_ITERATIONS = PLATFORM_MAX_IDLE_ITERATIONS,mirroring the existing
MAX_AICPU_THREADS = PLATFORM_MAX_AICPU_THREADSpattern.The runtime stays platform-agnostic — no sim/onboard
#ifdefin runtime code.STALL_LOG_INTERVALis now derived asMAX_IDLE_ITERATIONS / 2instead of afixed
400000, so the stall-diagnostic log fires ~once halfway to timeout onevery platform (the larger sim cap would otherwise spam stall warnings).
Preserves prior onboard behavior exactly (
800000 / 2 == 400000).spin_hint.hheaders gain the standard license block required bythe pre-commit header check.
a5anda2a3trees.Design note
The runtime consumes one platform-provided symbol; the sim/onboard split lives
entirely in the platform layer (same precedent as
SPIN_WAIT_HINT()andPLATFORM_MAX_AICPU_THREADS). A wall-clock timeout viaget_sys_cnt_aicpu()was considered but rejected to keep the idle hot path a plain integer compare
(lower per-iteration overhead, no clock noise).
Scope / caveats
8000000) is not empirically calibrated. The repro kernelcompressor_ratio4.pylives in the PTO/pypto repo, not here, so it could notbe reproduced or measured in this repo.
8000000is a 10× bump over theonboard cap; for reference, the sibling
host_build_graphruntime alreadyruns CPU sim with a
50000000idle cap. The value should be validatedagainst
compressor_ratio4in the PTO env and raised if a slow kernel stillfalse-times-out (a one-line change).
TMatmulNzZnoptimizations suggested in the issue are in thePTO-ISA repo and out of scope for this change.
Testing
tensormap_and_ringbuffersim st test passes on both archs (dummy_task)compressor_ratio4on a2a3sim in PTO env (requires PTO_ISA_ROOT)Fixes #840