@AlexAUT AlexAUT commented Oct 22, 2025

Adds support for lowering and in `UpdateAsyncWaitCnt`.

Note that on `gfx1250`, async_loads use `asynccnt`, which is tracked separately from register and TDM loads, so async loads can finish out of order relative to them. This means `UpdateAsyncWaitCnt` should ignore register and TDM loads, and, unlike on `GFX9`, no performance remark is emitted for register loads.
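To illustrate why the separate counter matters, here is a toy model of the counting, not the actual `UpdateAsyncWaitCnt` pass. It assumes an in-order counter, so a wait must allow as many outstanding operations as were issued on that counter after the load being waited for. The op kinds, the `wait_count` helper, and the GFX9-style shared-counter set are all hypothetical names made up for this sketch.

```python
# Toy model only: op kinds and helper names are hypothetical, not Triton IR.
ASYNC = "async_load"
REGISTER = "register_load"
TDM = "tdm_load"


def wait_count(ops_issued_after, tracked_kinds):
    """Count later ops that share the counter and may still be in flight.

    ops_issued_after: op kinds issued after the async load we wait for.
    tracked_kinds: op kinds that increment the counter being waited on.
    """
    return sum(1 for kind in ops_issued_after if kind in tracked_kinds)


later_ops = [ASYNC, REGISTER, TDM, ASYNC, REGISTER]

# Assumed gfx1250-style model: only async loads are tracked by the counter,
# so register and TDM loads are ignored.
separate_counter = wait_count(later_ops, {ASYNC})

# Assumed GFX9-style model: register loads share the counter with async
# loads, so interleaved register loads change the count.
shared_counter = wait_count(later_ops, {ASYNC, REGISTER})

print(separate_counter, shared_counter)
```

In this model the separate counter yields a count that depends only on the async loads, which is why interleaved register loads no longer need to be flagged.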

Intrinsics will be replaced by ROCDL ops once we have bumped LLVM.

@AlexAUT AlexAUT marked this pull request as ready for review October 22, 2025 11:02
@antiagainst antiagainst merged commit a2fdd73 into triton-lang:main Oct 24, 2025
9 checks passed