src: gpu: intel: jit: enable tiled grf allocation for buffers#4762
hidefromkgb wants to merge 1 commit into main
Pull request overview
This PR introduces an “access map” allocation attribute to describe per-buffer GRF access patterns, enabling nGEN lowering to allocate GRF buffers as non-contiguous tiles to reduce fragmentation and allow larger effective buffers.
Changes:
- Inject a new `access_map_alloc_attr_t` into IR allocations based on observed load/store and selected IR op access patterns.
- Teach IR-to-nGEN lowering to allocate GRF buffers using the access map (tiled allocation) instead of requiring contiguous GRF ranges.
- Extend hashing utilities to support hashing `std::pair` (used by the new access map representation).
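The last item can be pictured with a minimal sketch. It assumes the access map is a vector of integer pairs and uses a boost-style `hash_combine`; the names `hash_combine` and `get_hash` here are hypothetical illustrations and are not taken from `utils.hpp`.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not the oneDNN implementation): mixes a new hash
// value into a running seed, in the style of boost::hash_combine.
inline std::size_t hash_combine(std::size_t seed, std::size_t value) {
    return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Hashes a pair by combining the hashes of its two members.
template <typename A, typename B>
std::size_t get_hash(const std::pair<A, B> &p) {
    std::size_t h = std::hash<A>()(p.first);
    return hash_combine(h, std::hash<B>()(p.second));
}

// Hashes a vector of pairs (the assumed shape of an access map) by folding
// the element hashes into a seed derived from the vector length.
template <typename A, typename B>
std::size_t get_hash(const std::vector<std::pair<A, B>> &v) {
    std::size_t h = std::hash<std::size_t>()(v.size());
    for (const auto &p : v)
        h = hash_combine(h, get_hash(p));
    return h;
}
```

Two access maps with the same entries in the same order hash to the same value, which is what an attribute that participates in IR hashing would need.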
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/gpu/intel/pool/jit/ir_builder.cpp | Runs the new access-map attribute injection pass in the pooling JIT pipeline. |
| src/gpu/intel/conv/jit/ir_builder.cpp | Runs the new access-map attribute injection pass in the convolution JIT pipeline. |
| src/gpu/intel/jit/pass/alloc.hpp | Declares inject_access_map_attribute(). |
| src/gpu/intel/jit/pass/alloc.cpp | Implements access-map collection/injection based on IR buffer accesses. |
| src/gpu/intel/gemm/jit/dsl/ir/core.hpp | Defines the new access_map_alloc_attr_t allocation attribute type. |
| src/gpu/intel/gemm/jit/dsl/ir/codegen/codegen.cpp | Implements tiled GRF allocation driven by the access map; adds fallback “unmapped” allocation path. |
| src/gpu/intel/gemm/jit/include/internal/utils.hpp | Adds hashing support for std::pair to enable hashing access-map vectors. |
| src/gpu/intel/jit/pass/bank_conflict.cpp | Removes the special-case skip for zero-points-related calls during bank-conflict attribute injection. |
2c8f5c0 to 603b859
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
603b859 to a5cf8ae
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
    auto regs = utils::safe_divide(a.second, grf_size);
    allocs.emplace_back(a.first, ra.alloc_range(regs, base_bundle));
The regs parameter (line 380) is shadowed by the local auto regs on line 418, making the function parameter completely unused. This will likely trigger -Wshadow compiler warnings. Consider either removing the parameter (since the function computes its own per-chunk register counts from the access map) or renaming the local variable on line 418 to something like chunk_regs to avoid the shadowing.
Suggested change:

    - auto regs = utils::safe_divide(a.second, grf_size);
    - allocs.emplace_back(a.first, ra.alloc_range(regs, base_bundle));
    + auto chunk_regs = utils::safe_divide(a.second, grf_size);
    + allocs.emplace_back(
    +         a.first, ra.alloc_range(chunk_regs, base_bundle));
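For illustration, here is a minimal sketch of the other option mentioned in the comment: dropping the unused parameter and keeping only a per-chunk local. Everything in it is an assumption made for the example: the function name `chunk_reg_counts`, the `access_map_t` alias, the (byte offset, byte size) layout of each entry, and the assert that stands in for `utils::safe_divide`. It also assumes the local in the PR lives in a nested scope such as a loop body, since C++ only permits shadowing a parameter there.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Assumed layout: each access-map entry is (byte offset, byte size).
using access_map_t = std::vector<std::pair<int, int>>;

// Hypothetical stand-in for the helper under review. It takes no `regs`
// parameter: the register count is derived per chunk from the access map,
// so there is nothing left for the loop-local variable to shadow.
inline std::vector<int> chunk_reg_counts(
        const access_map_t &map, int grf_size) {
    assert(grf_size > 0);
    std::vector<int> counts;
    for (const auto &a : map) {
        // Stand-in for an exact division check: sizes are assumed to be
        // whole multiples of the GRF size.
        assert(a.second % grf_size == 0);
        int chunk_regs = a.second / grf_size;
        counts.push_back(chunk_regs);
    }
    return counts;
}
```

Either fix removes the `-Wshadow` warning; removing the parameter additionally makes it explicit that the caller's total register count is not consulted.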
make test
make test perf-gpu
Another byproduct of #4540, formerly a part of #4650.
Buffers can now be allocated non-contiguously, which mitigates GRF fragmentation and enables larger blocks.
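A toy model of the idea, as a sketch only: `toy_reg_file_t` and `alloc_tiled` are hypothetical names, the register file is a plain free/busy bitmap rather than nGEN's allocator, bundles are ignored, and each access-map entry is assumed to be a (byte offset, byte size) pair whose size is a multiple of the GRF size.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy register file: free_[i] is true while register i is available.
class toy_reg_file_t {
public:
    explicit toy_reg_file_t(int nregs) : free_(nregs, true) {}

    // Marks registers [base, base + len) as busy.
    void claim(int base, int len) {
        for (int i = base; i < base + len; i++) {
            assert(free_[i]);
            free_[i] = false;
        }
    }

    // Marks registers [base, base + len) as free again.
    void release(int base, int len) {
        for (int i = base; i < base + len; i++)
            free_[i] = true;
    }

    // Claims the first run of `len` consecutive free registers and returns
    // its base index, or returns -1 (claiming nothing) if no run exists.
    int alloc_range(int len) {
        assert(len > 0);
        int run = 0;
        for (int i = 0; i < (int)free_.size(); i++) {
            run = free_[i] ? run + 1 : 0;
            if (run == len) {
                int base = i - len + 1;
                claim(base, len);
                return base;
            }
        }
        return -1;
    }

private:
    std::vector<bool> free_;
};

// Allocates one register range per access-map entry. Returns
// (byte offset, base register) pairs, or an empty vector on failure,
// in which case all partially claimed ranges are released.
inline std::vector<std::pair<int, int>> alloc_tiled(toy_reg_file_t &rf,
        const std::vector<std::pair<int, int>> &access_map, int grf_size) {
    std::vector<std::pair<int, int>> allocs;
    for (const auto &a : access_map) {
        assert(a.second % grf_size == 0);
        int chunk_regs = a.second / grf_size;
        int base = rf.alloc_range(chunk_regs);
        if (base < 0) {
            for (int i = 0; i < (int)allocs.size(); i++)
                rf.release(allocs[i].second, access_map[i].second / grf_size);
            return {};
        }
        allocs.emplace_back(a.first, base);
    }
    return allocs;
}
```

With 8 registers of which registers 2 and 5 are busy, a contiguous request for 4 registers fails, while the same 128 bytes described as two 64-byte tiles (with a 32-byte GRF) fit into registers 0..1 and 3..4.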