Skip to content

Conversation

@neildhar
Copy link
Collaborator

@neildhar neildhar commented Nov 14, 2025

We currently force initialisation of operands that have not yet been
visited with setToEntryState. This means that the order in which
values are visited can change the results of the analysis.

This can be a source of bugs. For example, the lowering for
AsyncCopyGlobalToLocalOp validates that the load addresses permit
sufficient vectorisation, however, this is up to the analysis actually
recovering the same information it had when the async copy was created.
Otherwise, we crash during lowering. I have an actual repro for this
but it has been very difficult to minimise it enough to make it suitable
for an lit test:
https://gist.github.com/neildhar/7eea6a312afa39d1cc83dc12627c2ba3

Populating the operands in this way also means that we have to handle
control flow like ForOp and IfOp explicitly in setToEntryState,
because we may attempt to populate their results when we visit their
users.

Instead, when we encounter an operation whose operands have not yet been
encountered, skip over the operation entirely. We can revisit it once
the operands have actually been visited. This improves the quality of
the analysis, and leaves the handling of control flow to the dataflow
framework.

We currently force initialisation of operands that have not yet been
visited with `setToEntryState`. This means that the order in which
values are visited can change the results of the analysis.

This can be a source of bugs. For example, the lowering for
`AsyncCopyGlobalToLocalOp` validates that the load addresses permit
sufficient vectorisation, however, this is up to the analysis actually
recovering the same information it had when the async copy was created.
Otherwise, we crash during lowering. I have an actual repro for this
but it has been very difficult to minimise it enough to make it suitable
for an lit test:
https://gist.github.com/neildhar/7eea6a312afa39d1cc83dc12627c2ba3

Populating the operands in this way also means that we have to handle
control flow like `ForOp` and `IfOp` explicitly in `setToEntryState`,
because we may attempt to populate their results when we visit their
users.

Instead, when we encounter an operation whose operands have not yet been
encountered, skip over the operation entirely. We can revisit it once
the operands have actually been visited. This improves the quality of
the analysis, and leaves the handling of control flow to the dataflow
framework.
@neildhar neildhar force-pushed the fix-axisinfo-unvisited branch from 68fdec0 to a9cfa5c Compare November 14, 2025 06:38
@neildhar neildhar marked this pull request as ready for review November 14, 2025 06:42
@neildhar neildhar requested a review from ptillet as a code owner November 14, 2025 06:42
@ThomasRaoux
Copy link
Collaborator

Why is it expected to be better? Is it possible to write a test where the results are better?
@Mogball, does that match what you expect?

@neildhar
Copy link
Collaborator Author

@ThomasRaoux Yes, although it has been very difficult to minimise into something that would be suitable for a test. The bug in the PR description is an example of this. Existing tests should already exercise the changes in this PR though, since we cannot delete the control flow handling without the fix to skip over operations with unvisited operands.

I think while we're operating on structured control flow, this probably is not any better. The framework visits all the operations in order, with the exception of the scf ops, but we special case them by initialising them with the best case. So forcing all the operands to be initialised should only affect scf ops, which we handle.

However, once the structured control flow is lowered out, the order in which things are visited becomes important. If we visit an operation before its operands are visited. We may end up pessimising the analysis, because we assign that operand the most conservative state. This is what happens in the bug in the PR description. Performing the analysis on the structured control flow in an earlier pass determines that we can use AsyncCopyGlobalToLocal. However, when we actually try to lower the async copy in LoadStoreOpToLLVM, we can no longer recover that information because the analysis produces a worse result.

@ThomasRaoux ThomasRaoux requested a review from Mogball November 14, 2025 17:58
Copy link
Collaborator

@Mogball Mogball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me and is the right way to handle this.

// Control flow operations are initialized with "unknown" state:
// the maximum possible divisibility, contiguity, and constancy.
} else if (isa<gpu::WarpSpecializePartitionsOp>(op)) {
// Initialize the arguments to gpu::WarpSpecializePartitionsOp with
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument states of WarpSpecializeOp should be forwarded from its operands. Unfortunately it doesn't implement RegionBranchOpInterface so you might have to implement this yourself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm planning to look a little more into this, but it seemed orthogonal to this PR. With this PR, we no longer try to populate the state of every operand that hasn't been visited. However, WarpSpecializePartitionsOp is special because we walk the IR and restart the analysis on it. So the framework will call setToEntryState on the block args to it (rather than us calling it ourselves from visitOperation).

When that happens, we want to start with the maximum divisibility, contiguity and constancy, so that we don't start from an artificially restricted state.

@ThomasRaoux ThomasRaoux merged commit e33219f into triton-lang:main Nov 14, 2025
9 checks passed
@neildhar neildhar deleted the fix-axisinfo-unvisited branch November 14, 2025 22:32
peterbell10 added a commit that referenced this pull request Nov 18, 2025
We are seeing failures in the coalesce pass caused by axis info being
unset for some values.
peterbell10 added a commit that referenced this pull request Nov 18, 2025
…" (#8754)

We are seeing failures in the coalesce pass caused by axis info being
unset for some values.
@neildhar neildhar restored the fix-axisinfo-unvisited branch November 18, 2025 16:44
tmoreau89 pushed a commit to tmoreau89/triton that referenced this pull request Dec 1, 2025
)

We currently force initialisation of operands that have not yet been
visited with `setToEntryState`. This means that the order in which
values are visited can change the results of the analysis.

This can be a source of bugs. For example, the lowering for
`AsyncCopyGlobalToLocalOp` validates that the load addresses permit
sufficient vectorisation, however, this is up to the analysis actually
recovering the same information it had when the async copy was created.
Otherwise, we crash during lowering. I have an actual repro for this
but it has been very difficult to minimise it enough to make it suitable
for an lit test:
https://gist.github.com/neildhar/7eea6a312afa39d1cc83dc12627c2ba3

Populating the operands in this way also means that we have to handle
control flow like `ForOp` and `IfOp` explicitly in `setToEntryState`,
because we may attempt to populate their results when we visit their
users.

Instead, when we encounter an operation whose operands have not yet been
encountered, skip over the operation entirely. We can revisit it once
the operands have actually been visited. This improves the quality of
the analysis, and leaves the handling of control flow to the dataflow
framework.
tmoreau89 pushed a commit to tmoreau89/triton that referenced this pull request Dec 1, 2025
…n-lang#8723)" (triton-lang#8754)

We are seeing failures in the coalesce pass caused by axis info being
unset for some values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants