@AlexAUT AlexAUT commented Nov 3, 2025

#8575 moved `UpdateAsyncWaitCount` from `TTIR->TTGIR` (before converting to buffer ops) to `TTGIR->LLVM`. This means we now have to include `amdgpu.buffer_load_to_local` when counting outstanding async instructions. This change is also required for Gluon, where we directly emit buffer ops.

Ignoring them causes a performance regression because we emit overly conservative waits.
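
To illustrate why the set of counted ops matters, here is a toy model in Python. It is not the actual pass: ops are plain strings rather than MLIR operations, `allowed_outstanding` and the `async_copy` name are hypothetical, and the counting rule (async ops issued after the awaited one may remain in flight) is a simplifying assumption. Only `amdgpu.buffer_load_to_local` is taken from the description above.

```python
# Toy model only: op names are plain strings, not real MLIR operations.
BUFFER_LOAD = "amdgpu.buffer_load_to_local"  # name taken from the PR description
ASYNC_COPY = "async_copy"  # hypothetical stand-in for other async copy ops


def allowed_outstanding(ops, awaited_index, wait_index, counted_kinds):
    """Count async ops issued after the awaited op and before the wait.

    In this model those ops may still be in flight when the wait returns,
    so the count is the loosest wait value that still guarantees the
    awaited op has completed.
    """
    between = ops[awaited_index + 1:wait_index]
    return sum(1 for op in between if op in counted_kinds)


# Three buffer loads are issued, then we wait for the first one (index 0).
ops = [BUFFER_LOAD, BUFFER_LOAD, BUFFER_LOAD, "wait"]

# Counting buffer loads: the two later loads may stay outstanding.
with_buffer_ops = allowed_outstanding(ops, 0, 3, {ASYNC_COPY, BUFFER_LOAD})

# Ignoring buffer loads: the count falls to zero, so the wait drains everything.
without_buffer_ops = allowed_outstanding(ops, 0, 3, {ASYNC_COPY})
```

In this sketch the second result is still safe but stricter than necessary, which is the kind of conservative wait described above.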

@AlexAUT AlexAUT marked this pull request as ready for review November 3, 2025 16:28
@antiagainst antiagainst merged commit 5a6f410 into triton-lang:main Nov 3, 2025
9 checks passed
tmoreau89 pushed a commit to tmoreau89/triton that referenced this pull request Dec 1, 2025
…computations (triton-lang#8621)
