Skip to content

Fixing GFX Executor missing threadfence_system from v1.67#327

Merged
gilbertlee-amd merged 2 commits into
ROCm:candidatefrom
gilbertlee-amd:ThreadfenceFix
Jun 29, 2026
Merged

Fixing GFX Executor missing threadfence_system from v1.67#327
gilbertlee-amd merged 2 commits into
ROCm:candidatefrom
gilbertlee-amd:ThreadfenceFix

Conversation

@gilbertlee-amd

Copy link
Copy Markdown
Collaborator

The threadfence_system prior to timestamp collection in v1.67 was accidentally removed which could cause artificially inflated Transfer times because stop timestamp could be collected prior to all writes being drained. This should not have impacted GFX Executor times.

@gilbertlee-amd gilbertlee-amd requested review from a team as code owners June 28, 2026 07:07
@nileshnegi nileshnegi requested a review from Copilot June 28, 2026 07:09
Comment thread src/header/TransferBench.hpp

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Restores a missing system-scope thread fence before collecting stop timestamps in GFX kernels (addressing inflated timing due to undrained writes), and bumps the project version/release notes to v1.68.

Changes:

  • Add __threadfence_system() before synchronization + timestamp collection in GFX device kernels.
  • Adjust CPU-side timing start placement in RunGpuExecutor and filter per-transfer cycle aggregation for small/empty sub-executors.
  • Bump version to 1.68 and add a v1.68.00 changelog entry.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/header/TransferBench.hpp Adds system-scope fences before timing, adjusts GPU executor CPU timing start point, and filters per-transfer cycle aggregation.
CHANGELOG.md Adds v1.68.00 release notes describing the timing fixes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/header/TransferBench.hpp Outdated
Comment thread src/header/TransferBench.hpp Outdated
Comment thread CHANGELOG.md Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@gilbertlee-amd gilbertlee-amd merged commit 594eb70 into ROCm:candidate Jun 29, 2026
7 of 10 checks passed
@gilbertlee-amd gilbertlee-amd deleted the ThreadfenceFix branch June 29, 2026 16:10
gilbertlee-amd added a commit that referenced this pull request Jun 29, 2026
* Fixing GFX Executor missing threadfence_system from v1.67 (#327)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants