src: gpu: intel: jit: enable tiled grf allocation for buffers#4762

Open
hidefromkgb wants to merge 1 commit into main from aguskov/jit_tiled_alloc

Conversation

@hidefromkgb
Contributor

Another byproduct of #4540, formerly a part of #4650.
Buffers can now be allocated non-contiguously, which mitigates GRF fragmentation and enables larger blocks.

hidefromkgb requested a review from a team as a code owner on March 4, 2026 03:29
github-actions bot added the platform:gpu-intel label (Codeowner: @oneapi-src/onednn-gpu-intel) on March 4, 2026
hidefromkgb requested a review from Copilot on March 4, 2026 03:32

Copilot AI left a comment

Pull request overview

This PR introduces an “access map” allocation attribute to describe per-buffer GRF access patterns, enabling nGEN lowering to allocate GRF buffers as non-contiguous tiles to reduce fragmentation and allow larger effective buffers.

Changes:

  • Inject a new access_map_alloc_attr_t into IR allocations based on observed load/store and selected IR op access patterns.
  • Teach IR-to-nGEN lowering to allocate GRF buffers using the access map (tiled allocation) instead of requiring contiguous GRF ranges.
  • Extend hashing utilities to support hashing std::pair (used by the new access map representation).
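The idea behind the second bullet can be illustrated with a toy model. This is only a sketch under stated assumptions, not the PR's code and not nGEN's register allocator: `toy_grf_file_t`, `alloc_tiled`, and `tile_t` are hypothetical names, and the register file is modelled as a simple free bitmap with first-fit search. It shows how a request that fails as one contiguous range can still succeed when split into tiles.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy register file (not nGEN): true means the register is free.
struct toy_grf_file_t {
    std::vector<bool> free_regs;

    explicit toy_grf_file_t(std::size_t n) : free_regs(n, true) {}

    // Mark registers [base, base + len) as in use.
    void claim(std::size_t base, std::size_t len) {
        assert(base + len <= free_regs.size());
        for (std::size_t i = base; i < base + len; i++)
            free_regs[i] = false;
    }

    // First-fit search for `len` consecutive free registers.
    // Returns the base register, or nullopt if no such run exists.
    std::optional<std::size_t> alloc_range(std::size_t len) {
        assert(len > 0);
        std::size_t run = 0;
        for (std::size_t i = 0; i < free_regs.size(); i++) {
            run = free_regs[i] ? run + 1 : 0;
            if (run == len) {
                std::size_t base = i + 1 - len;
                claim(base, len);
                return base;
            }
        }
        return std::nullopt;
    }
};

// Hypothetical tile record: {offset within the buffer (in registers), base register}.
using tile_t = std::pair<std::size_t, std::size_t>;

// Allocate a buffer as independent tiles, one contiguous range per tile.
// On failure the register file is restored and nullopt is returned.
std::optional<std::vector<tile_t>> alloc_tiled(
        toy_grf_file_t &grf, const std::vector<std::size_t> &tile_regs) {
    toy_grf_file_t snapshot = grf;
    std::vector<tile_t> tiles;
    std::size_t offset = 0;
    for (std::size_t len : tile_regs) {
        auto base = grf.alloc_range(len);
        if (!base) {
            grf = snapshot; // roll back partially allocated tiles
            return std::nullopt;
        }
        tiles.emplace_back(offset, *base);
        offset += len;
    }
    return tiles;
}
```

With 8 registers and registers 2 and 5 already claimed, a 4-register contiguous request fails in this model, while two 2-register tiles fit at bases 0 and 3. The tile boundaries here are chosen by hand; in the PR they come from the access map.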

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/gpu/intel/pool/jit/ir_builder.cpp | Runs the new access-map attribute injection pass in the pooling JIT pipeline. |
| src/gpu/intel/conv/jit/ir_builder.cpp | Runs the new access-map attribute injection pass in the convolution JIT pipeline. |
| src/gpu/intel/jit/pass/alloc.hpp | Declares inject_access_map_attribute(). |
| src/gpu/intel/jit/pass/alloc.cpp | Implements access-map collection/injection based on IR buffer accesses. |
| src/gpu/intel/gemm/jit/dsl/ir/core.hpp | Defines the new access_map_alloc_attr_t allocation attribute type. |
| src/gpu/intel/gemm/jit/dsl/ir/codegen/codegen.cpp | Implements tiled GRF allocation driven by the access map; adds fallback “unmapped” allocation path. |
| src/gpu/intel/gemm/jit/include/internal/utils.hpp | Adds hashing support for std::pair to enable hashing access-map vectors. |
| src/gpu/intel/jit/pass/bank_conflict.cpp | Removes the special-case skip for zero-points-related calls during bank-conflict attribute injection. |
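
For readers unfamiliar with the hashing change listed for utils.hpp, a minimal sketch of hashing a `std::pair` and a vector of pairs might look like the following. The names `hash_combine` and `get_hash` and the boost-style mixing formula are assumptions for illustration; the actual helpers in utils.hpp are not shown in this conversation and may differ.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: fold `value` into a running `seed`
// using the widely used boost-style combine formula.
inline std::size_t hash_combine(std::size_t seed, std::size_t value) {
    return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Hash a pair by hashing each member and combining the results in order.
template <typename A, typename B>
std::size_t get_hash(const std::pair<A, B> &p) {
    std::size_t h = std::hash<A>()(p.first);
    return hash_combine(h, std::hash<B>()(p.second));
}

// Hash a vector by folding in its size and then every element's hash.
// With the pair overload above, this covers vectors of pairs.
template <typename T>
std::size_t get_hash(const std::vector<T> &v) {
    std::size_t h = v.size();
    for (const auto &e : v)
        h = hash_combine(h, get_hash(e));
    return h;
}
```

Two access maps with the same entries in the same order hash to the same value, which is what a hash-based container or cache key needs. Whether the entries are {offset, size} pairs is an assumption based on the snippet under review.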

hidefromkgb force-pushed the aguskov/jit_tiled_alloc branch 3 times, most recently from 2c8f5c0 to 603b859, on March 5, 2026 04:46
hidefromkgb requested a review from Copilot on March 5, 2026 04:49

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


Comment on lines +418 to +419
auto regs = utils::safe_divide(a.second, grf_size);
allocs.emplace_back(a.first, ra.alloc_range(regs, base_bundle));

Copilot AI Mar 5, 2026

The regs parameter (line 380) is shadowed by the local auto regs on line 418, making the function parameter completely unused. This will likely trigger -Wshadow compiler warnings. Consider either removing the parameter (since the function computes its own per-chunk register counts from the access map) or renaming the local variable on line 418 to something like chunk_regs to avoid the shadowing.

Suggested change
- auto regs = utils::safe_divide(a.second, grf_size);
- allocs.emplace_back(a.first, ra.alloc_range(regs, base_bundle));
+ auto chunk_regs = utils::safe_divide(a.second, grf_size);
+ allocs.emplace_back(
+         a.first, ra.alloc_range(chunk_regs, base_bundle));
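
To make the effect of the rename concrete, here is a small standalone sketch. It is not the function from codegen.cpp: `safe_divide` is a hypothetical stand-in for `utils::safe_divide`, `chunk_t` assumes each access-map entry is an {offset, size in bytes} pair, and the final sanity check against `regs` is purely illustrative. The point is that once the local is called `chunk_regs`, the `regs` parameter is no longer hidden and can still be used.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for utils::safe_divide: division that must be exact.
inline int safe_divide(int a, int b) {
    assert(b != 0 && a % b == 0);
    return a / b;
}

// Assumed shape of an access-map entry: {offset in bytes, size in bytes}.
using chunk_t = std::pair<int, int>;

// Computes the register count of every chunk in the access map.
std::vector<int> chunk_reg_counts(
        int regs, int grf_size, const std::vector<chunk_t> &access_map) {
    std::vector<int> counts;
    int total = 0;
    for (const auto &a : access_map) {
        // Distinct name: nothing is shadowed, so -Wshadow stays quiet
        // and `regs` below still refers to the function parameter.
        int chunk_regs = safe_divide(a.second, grf_size);
        counts.push_back(chunk_regs);
        total += chunk_regs;
    }
    // Illustrative check only (not taken from the PR): the chunks are
    // expected to add up to the requested register count.
    assert(total == regs);
    return counts;
}
```

If the parameter really is unused after the change, removing it, as the comment above also suggests, would be the simpler fix.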

@hidefromkgb
Contributor Author

make test
set test_scope=NIGHTLY
disable test_device_cpu
disable benchdnn_all
enable benchdnn_pool
enable benchdnn_conv
enable benchdnn_deconv
enable benchdnn_reorder
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg

@hidefromkgb
Contributor Author

make test perf-gpu
set primitive=pool conv deconv reorder

Labels

platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel

2 participants