[AMD] Improve layout selection in optimize-lds-usage pass to prefer swizzled layouts #7750
@antiagainst Hi! Just wanted to follow up on this PR. I'm happy to address any concerns or make changes if needed. Thanks!
antiagainst left a comment
Thanks for the patch! I left some comments. @alefimov-amd, who touches this part frequently, is out of office now; I'd prefer to have him take a look too next week. BTW, do you have any statistics on the improvements?
Just a general question about swizzled vs. padded layouts. It looks like the heuristic should generally choose the right option, but do we have any way for kernels to specify exactly which one they want, so they can explore the LDS storage options?
@antiagainst Thanks for the suggestions. Regarding the benchmarks, I prepared the file below with several scenarios where this pass is required and compared the outputs of
@guacamoleo If you're asking whether users have control over which layout is used, there currently isn't a way to specify this manually. The system defaults to swizzled layouts, since they typically provide better performance and more efficient memory access patterns. It's worth noting that padded layouts have been phased out in NVIDIA's backend. In the AMD backend, padded layouts exist to reduce LDS consumption, which you can read more about in this PR.
With Gluon we will have that ability: you can directly program shared memory, and thus the layout therein.
# PR Description

## What changed

This PR optimizes the `optimize-lds-usage` pass by improving its layout selection strategy for convert-layout operations. The changes include:

- Removed the hardcoded preference for padded layouts in the AMD-specific LDS optimization pass
- Enhanced the `estimateResourcesForReplacement` method to calculate memory requirements for both padded and swizzled layouts
- Modified the layout selection logic to prefer swizzled layouts when LDS limits allow, falling back to padded layouts only when necessary

## Why this change was needed

The `optimize-lds-usage` pass previously enforced padded layouts based on the assumption that "padded conversion seems more friendly with this optimization." While this approach reduced LDS consumption, it came with significant performance penalties:

- Hundreds of additional lines of LLVM IR generated for the padded versions
- Reduced vectorization

**The New Approach:**

- Prioritizes swizzled layouts for better performance and vectorization
- Only falls back to padded layouts when the swizzled versions exceed LDS limits
- Maintains the pass's core functionality of reducing shared memory consumption through intermediate buffering
- Achieves a better balance between LDS usage and execution performance
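The selection rule described above (prefer a swizzled layout when it fits within the LDS limit, otherwise fall back to a padded one) can be sketched roughly as follows. This is a minimal Python illustration, not the pass's actual C++ code: `ScratchLayout`, `ResourceEstimate`, `choose_layout`, and all byte values are hypothetical. The per-layout LDS sizes are taken as inputs rather than computed the way `estimateResourcesForReplacement` does, and both treating an estimate equal to the limit as fitting and returning `None` when neither layout fits are assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScratchLayout(Enum):
    # Hypothetical labels for the two scratch-buffer layout kinds.
    SWIZZLED = "swizzled"
    PADDED = "padded"


@dataclass(frozen=True)
class ResourceEstimate:
    # Hypothetical LDS footprint (in bytes) of one candidate replacement,
    # estimated separately for each layout kind.
    swizzled_bytes: int
    padded_bytes: int


def choose_layout(estimate: ResourceEstimate, lds_limit: int) -> Optional[ScratchLayout]:
    """Prefer swizzled; fall back to padded only if swizzled exceeds the limit.

    Returns None when neither layout fits (an assumption of this sketch).
    """
    # Swizzled is tried first because it tends to vectorize better.
    if estimate.swizzled_bytes <= lds_limit:
        return ScratchLayout.SWIZZLED
    # Padded is used only when the swizzled footprint is too large.
    if estimate.padded_bytes <= lds_limit:
        return ScratchLayout.PADDED
    return None
```

For example, with a hypothetical 64 KiB budget, `choose_layout(ResourceEstimate(swizzled_bytes=81920, padded_bytes=40960), 65536)` returns `ScratchLayout.PADDED`, because only the padded estimate fits. Ordering the checks this way makes the swizzled layout the default and keeps the padded layout as a fallback rather than a fixed preference.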