-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fix handling of unvisited operands in AxisInfoAnalysis #8723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We currently force initialisation of operands that have not yet been visited with `setToEntryState`. This means that the order in which values are visited can change the results of the analysis. This can be a source of bugs. For example, the lowering for `AsyncCopyGlobalToLocalOp` validates that the load addresses permit sufficient vectorisation, however, this is up to the analysis actually recovering the same information it had when the async copy was created. Otherwise, we crash during lowering. I have an actual repro for this but it has been very difficult to minimise it enough to make it suitable for an lit test: https://gist.github.com/neildhar/7eea6a312afa39d1cc83dc12627c2ba3 Populating the operands in this way also means that we have to handle control flow like `ForOp` and `IfOp` explicitly in `setToEntryState`, because we may attempt to populate their results when we visit their users. Instead, when we encounter an operation whose operands have not yet been encountered, skip over the operation entirely. We can revisit it once the operands have actually been visited. This improves the quality of the analysis, and leaves the handling of control flow to the dataflow framework.
68fdec0 to
a9cfa5c
Compare
|
Why is it expected to be better? Is it possible to write a test where the results are better? |
|
@ThomasRaoux Yes, although it has been very difficult to minimise into something that would be suitable for a test. The bug in the PR description is an example of this. Existing tests should already exercise the changes in this PR though, since we cannot delete the control flow handling without the fix to skip over operations with unvisited operands. I think while we're operating on structured control flow, this probably is not any better. The framework visits all the operations in order, with the exception of the scf ops, but we special case them by initialising them with the best case. So forcing all the operands to be initialised should only affect scf ops, which we handle. However, once the structured control flow is lowered out, the order in which things are visited becomes important. If we visit an operation before its operands are visited. We may end up pessimising the analysis, because we assign that operand the most conservative state. This is what happens in the bug in the PR description. Performing the analysis on the structured control flow in an earlier pass determines that we can use |
Mogball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me and is the right way to handle this.
| // Control flow operations are initialized with "unknown" state: | ||
| // the maximum possible divisibility, contiguity, and constancy. | ||
| } else if (isa<gpu::WarpSpecializePartitionsOp>(op)) { | ||
| // Initialize the arguments to gpu::WarpSpecializePartitionsOp with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument states of WarpSpecializeOp should be forwarded from its operands. Unfortunately it doesn't implement RegionBranchOpInterface so you might have to implement this yourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm planning to look a little more into this, but it seemed orthogonal to this PR. With this PR, we no longer try to populate the state of every operand that hasn't been visited. However, WarpSpecializePartitionsOp is special because we walk the IR and restart the analysis on it. So the framework will call setToEntryState on the block args to it (rather than us calling it ourselves from visitOperation).
When that happens, we want to start with the maximum divisibility, contiguity and constancy, so that we don't start from an artificially restricted state.
We are seeing failures in the coalesce pass caused by axis info being unset for some values.
…" (#8754) We are seeing failures in the coalesce pass caused by axis info being unset for some values.
) We currently force initialisation of operands that have not yet been visited with `setToEntryState`. This means that the order in which values are visited can change the results of the analysis. This can be a source of bugs. For example, the lowering for `AsyncCopyGlobalToLocalOp` validates that the load addresses permit sufficient vectorisation, however, this is up to the analysis actually recovering the same information it had when the async copy was created. Otherwise, we crash during lowering. I have an actual repro for this but it has been very difficult to minimise it enough to make it suitable for an lit test: https://gist.github.com/neildhar/7eea6a312afa39d1cc83dc12627c2ba3 Populating the operands in this way also means that we have to handle control flow like `ForOp` and `IfOp` explicitly in `setToEntryState`, because we may attempt to populate their results when we visit their users. Instead, when we encounter an operation whose operands have not yet been encountered, skip over the operation entirely. We can revisit it once the operands have actually been visited. This improves the quality of the analysis, and leaves the handling of control flow to the dataflow framework.
…n-lang#8723)" (triton-lang#8754) We are seeing failures in the coalesce pass caused by axis info being unset for some values.
We currently force initialisation of operands that have not yet been
visited with
setToEntryState. This means that the order in whichvalues are visited can change the results of the analysis.
This can be a source of bugs. For example, the lowering for
AsyncCopyGlobalToLocalOpvalidates that the load addresses permitsufficient vectorisation, however, this is up to the analysis actually
recovering the same information it had when the async copy was created.
Otherwise, we crash during lowering. I have an actual repro for this
but it has been very difficult to minimise it enough to make it suitable
for an lit test:
https://gist.github.com/neildhar/7eea6a312afa39d1cc83dc12627c2ba3
Populating the operands in this way also means that we have to handle
control flow like
ForOpandIfOpexplicitly insetToEntryState,because we may attempt to populate their results when we visit their
users.
Instead, when we encounter an operation whose operands have not yet been
encountered, skip over the operation entirely. We can revisit it once
the operands have actually been visited. This improves the quality of
the analysis, and leaves the handling of control flow to the dataflow
framework.