[AMD] Rewrite extract_slice op implementation #7128

plognjen · 2025-06-10T15:27:24Z

This PR refactors the extract_slice operation to support two major improvements:

1) Relaxed Layout Constraints: the operation now allows more flexible source and destination layouts, aligning better with linear layouts.
2) Support for Arbitrary Tensor Ranks: extract_slice is no longer limited to 2D tensors and can now handle tensors of any rank.

The "extract_slice" operation enables extracting a slice of a tensor in registers.
It supports the following arguments:
* source: the base tensor on which to create a view tensor
* offsets: offsets into the base tensor at which to create the view

In distributed layouts, tensors are divided into CTA tiles.
A CTA tile represents the smallest contiguous portion of a tensor that is distributed across all threads and warps within a workgroup. The extract_slice operation extracts a portion of the tensor that aligns with CTA tile boundaries.
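
As a rough illustration of the alignment requirement described above, the following Python sketch checks whether a requested slice falls on CTA tile boundaries. The function name `is_cta_tile_aligned`, its parameters, and the exact rule used (every offset and every result extent is a multiple of the CTA tile extent along that dimension, and the slice stays in bounds) are assumptions made for illustration; they are not taken from the op's verifier.

```python
def is_cta_tile_aligned(src_shape, offsets, result_shape, cta_tile_shape):
    """Hypothetical check, not the actual verifier logic.

    Returns True when the slice [offsets, offsets + result_shape) lies inside
    the source tensor and starts and ends on CTA tile boundaries.
    """
    rank = len(src_shape)
    # All arguments must describe a tensor of the same rank.
    if not (len(offsets) == len(result_shape) == len(cta_tile_shape) == rank):
        return False
    for size, off, res, tile in zip(src_shape, offsets, result_shape, cta_tile_shape):
        # The slice must be non-empty and stay within the source tensor.
        if off < 0 or res <= 0 or off + res > size:
            return False
        # Assumed alignment rule: offset and extent are multiples of the tile extent.
        if off % tile != 0 or res % tile != 0:
            return False
    return True


# A 128x64 tensor with an assumed 32x32 CTA tile.
print(is_cta_tile_aligned((128, 64), (32, 0), (64, 64), (32, 32)))  # True
print(is_cta_tile_aligned((128, 64), (16, 0), (64, 64), (32, 32)))  # False
```

The check is written per dimension, so the same function covers tensors of any rank.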

This op is designed to work on logical tensors directly, avoiding the need for complex layout reinterpretation or reshaping.
For example, the tt.split operation only supports splitting along the innermost dimension,
and requires that the resulting innermost dimension provide 2 elements per thread, distributed across registers.
In contrast, the extract_slice op imposes no constraints on the extraction dimension or on the sizes of the dimensions.
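
To make the "logical tensor" view concrete, here is a minimal Python sketch of rank-agnostic slicing along an arbitrary dimension. It models only the logical semantics (which elements end up in the result), not how values are distributed across registers, threads, or warps. The helper `extract_slice_logical`, its `sizes` parameter, and the row-major flat-list representation are assumptions introduced for this example.

```python
from itertools import product


def extract_slice_logical(data, shape, offsets, sizes):
    """Hypothetical model of slicing a row-major flat list `data` of logical `shape`.

    Returns the elements of the sub-tensor that starts at `offsets` and has
    extents `sizes`, in row-major order. Works for any rank.
    """
    rank = len(shape)
    # Row-major strides: the last dimension is contiguous.
    strides = [1] * rank
    for d in range(rank - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    out = []
    # Visit every index of the result and map it back into the source.
    for idx in product(*(range(n) for n in sizes)):
        flat = sum((off + i) * s for off, i, s in zip(offsets, idx, strides))
        out.append(data[flat])
    return out


# A 2x3x4 tensor holding 0..23; slice along the middle dimension.
src = list(range(24))
result = extract_slice_logical(src, (2, 3, 4), (0, 1, 0), (2, 2, 4))
print(result)
```

Here the slice is taken along the middle dimension of a rank-3 tensor, which is the kind of extraction that an innermost-dimension-only split cannot express directly.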

third_party/amd/include/Dialect/TritonAMDGPU/IR/TritonAMDGPUOps.td

third_party/amd/lib/TritonAMDGPUDialectToLLVM/ExtractSliceOpToLLVM.cpp

third_party/amd/lib/Dialect/TritonAMDGPU/IR/Dialect.cpp

plognjen · 2025-06-12T20:35:44Z

@antiagainst I addressed the comments. Thanks for the review!

antiagainst

LGTM now. Thanks for tidying it up! Adding @ThomasRaoux to take another look to make sure this also looks good.

third_party/amd/lib/Utils/Utility.cpp

ravil-mobile · 2025-06-20T08:40:45Z

> LGTM now. Thanks for tidying it up! Adding @ThomasRaoux to take another look to make sure this also looks good.

Hi @ThomasRaoux. Would you have time to have a look at this PR?

ThomasRaoux

LGTM, thanks for improving this

third_party/amd/lib/Dialect/TritonAMDGPU/IR/Dialect.cpp

Co-authored-by: Ognjen Plavsic <plognjen@amd.com>
Co-authored-by: Lei Zhang <antiagainst@gmail.com>

plognjen requested review from antiagainst, ptillet and zhanglx13 as code owners June 10, 2025 15:27

plognjen mentioned this pull request Jun 10, 2025

[AMD] Relax conditions in ExtractSlice verifier #6417

Closed

antiagainst requested changes Jun 12, 2025

antiagainst changed the title ~~Complete rewrite of extract_slice op~~ [AMD] Rewrite extract_slice op implementation Jun 12, 2025

oplavsic added 3 commits June 12, 2025 20:25

Complete rewrite of extract_slice op

c8b69a1

Fix documentation

32937b6

Address review comments

f961aff

plognjen force-pushed the extract_slice_rewrite branch from 680299e to f961aff June 12, 2025 20:30

Add utils dir

56b5ed0

antiagainst approved these changes Jun 13, 2025

third_party/amd/lib/Utils/Utility.cpp (outdated, resolved)

ThomasRaoux approved these changes Jun 20, 2025

third_party/amd/lib/Dialect/TritonAMDGPU/IR/Dialect.cpp (outdated, resolved)

antiagainst added 2 commits June 20, 2025 17:03

Merge remote-tracking branch 'origin/main' into extract_slice_rewrite

6a2af81

Fix some nits

b9c344d

antiagainst merged commit 5b7bc04 into triton-lang:main Jun 20, 2025
9 checks passed

antiagainst mentioned this pull request Jun 22, 2025

[AMD] adjusted calculation of sizePerThread in ExtractSliceOpToLLVM.cpp #6295

Closed
