[AMD][Gluon] Expose buffer_load and buffer_store to Gluon #7738

zwu-2025 · 2025-08-01T18:11:39Z

Expose AMD buffer_load and buffer_store to Gluon. Example usage looks like:

def buffer_ldst_kernel(x, y):
    layout: ttgl.constexpr = ttgl.BlockedLayout(size_per_thread=[1, 1], threads_per_warp=[1, 64], warps_per_cta=[4, 1], order=[1, 0])
    offsets = ttgl.arange(0, 64 * 64, layout=layout)
    a = ttgl.amd.cdna3.buffer_load(ptr=x, offsets=offsets)
    ttgl.amd.cdna3.buffer_store(stored_value=a, ptr=y, offsets=offsets)

peterbell10 · 2025-08-01T19:57:23Z

python/triton/experimental/gluon/language/_semantic.py

+        layout = ttgl._unwrap_if_constexpr(layout)
+
+        ret_ty = ttgl.distributed_type(element_type, shape, layout)
+        handle = self.builder.create_buffer_load(ret_ty.to_ir(self.builder), ptr, offsets, cache_modifier, mask, other)


Can you infer the layout from the offsets and the dtype from the pointer, and add defaults for mask and other? You probably also need to add broadcasting.

My general feedback would be to think about this as a language API, not just a wrapper to emit IR.
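To make the request above concrete, here is a minimal sketch of the inference being asked for: the result takes its layout from the offsets, its dtype from the pointer, and mask and other default to None and broadcast against the offsets. `Pointer`, `Tensor`, `broadcast_shape`, and `buffer_load_result_type` are hypothetical stand-ins written for illustration, not Gluon classes or the code that was merged.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical stand-ins for Gluon's pointer and tensor types (assumptions).
@dataclass(frozen=True)
class Pointer:
    element_ty: str  # e.g. "fp16"


@dataclass(frozen=True)
class Tensor:
    shape: Tuple[int, ...]
    layout: str
    dtype: str


def broadcast_shape(a, b):
    """NumPy-style broadcast of two shapes; raises ValueError on mismatch."""
    rank = max(len(a), len(b))
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(y if x == 1 else x)
    return tuple(out)


def buffer_load_result_type(ptr: Pointer, offsets: Tensor,
                            mask: Optional[Tensor] = None,
                            other: Optional[Tensor] = None) -> Tensor:
    """Infer the result type instead of asking the caller for it."""
    shape = offsets.shape
    if mask is not None:
        shape = broadcast_shape(shape, mask.shape)
    if other is not None:
        shape = broadcast_shape(shape, other.shape)
    # Layout comes from the offsets, dtype from the pointer's element type.
    return Tensor(shape=shape, layout=offsets.layout, dtype=ptr.element_ty)
```

With this shape of API the caller passes only ptr and offsets in the common case, which matches the example usage in the PR description.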

peterbell10 · 2025-08-01T19:58:17Z

python/triton/experimental/gluon/language/_semantic.py

+        handle = self.builder.create_buffer_store(stored_value, ptr, offsets, cache_modifier, mask)
+        return ttgl.tensor(handle, ttgl.void)


I want to move away from tl.void. I don't think it adds any value.

Suggested change

-        handle = self.builder.create_buffer_store(stored_value, ptr, offsets, cache_modifier, mask)
-        return ttgl.tensor(handle, ttgl.void)
+        self.builder.create_buffer_store(stored_value, ptr, offsets, cache_modifier, mask)
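The effect of this suggestion can be sketched outside Triton: the store emits the op for its side effect and returns nothing, rather than wrapping the handle in a void-typed tensor. `RecordingBuilder` is a hypothetical stand-in for the IR builder, and the default values for cache_modifier and mask are assumptions for illustration.

```python
class RecordingBuilder:
    """Hypothetical stand-in for the IR builder; it only records emitted ops."""

    def __init__(self):
        self.ops = []

    def create_buffer_store(self, stored_value, ptr, offsets, cache_modifier, mask):
        self.ops.append(("buffer_store", stored_value, ptr, offsets, cache_modifier, mask))


def buffer_store(builder, stored_value, ptr, offsets, cache_modifier="", mask=None):
    # Emit the store for its side effect only; the function implicitly returns None.
    builder.create_buffer_store(stored_value, ptr, offsets, cache_modifier, mask)
```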

peterbell10 · 2025-08-01T19:58:31Z

python/triton/experimental/gluon/language/amd/cdna3/__init__.py

+
+
+@builtin
+def create_buffer_load(ptr, element_type, offsets, cache, mask, layout, other, _semantic=None):


The create_ prefix is only used by the builder, not in the language.

Suggested change

-def create_buffer_load(ptr, element_type, offsets, cache, mask, layout, other, _semantic=None):
+def buffer_load(ptr, element_type, offsets, cache, mask, layout, other, _semantic=None):

antiagainst · 2025-08-04T21:29:04Z

python/triton/experimental/gluon/language/amd/cdna3/__init__.py

+    element_type = ptr.type.scalar.element_ty
+
+    if mask is not None:
+        assert mask.shape == shape, "offsets must have the same shape as offsets"


"mask must have .." :)

This is not addressed yet?
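For reference, a minimal sketch of the check with the message corrected as the review asks. `check_mask_shape` is a hypothetical helper name, and the exact wording after "mask must have" is an assumption based on the original assertion text.

```python
def check_mask_shape(mask_shape, offsets_shape):
    """Fail with a message that names the mask, not the offsets, as the culprit."""
    assert tuple(mask_shape) == tuple(offsets_shape), (
        "mask must have the same shape as offsets, "
        f"got {tuple(mask_shape)} vs {tuple(offsets_shape)}"
    )
```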

antiagainst

Nice, just one final comment from me. Also, @peterbell10, please take another look.

antiagainst

LGTM. I'll land once @peterbell10 is okay.

[Gluon][AMD] support buffer_load and buffer_store

678c109

peterbell10 reviewed Aug 1, 2025

zwu-2025 changed the title ~~Expose buffer_load and buffer_store in Gluon~~ Expose buffer_load and buffer_store to Gluon Aug 1, 2025

borontion mentioned this pull request Aug 1, 2025

[AMD][Gluon] Expose buffer load to local op #7746

Merged

antiagainst requested changes Aug 2, 2025

antiagainst changed the title ~~Expose buffer_load and buffer_store to Gluon~~ [AMD][Gluon] Expose buffer_load and buffer_store to Gluon Aug 2, 2025

[AMD][Gluon] Expose buffer_load and buffer_store

132a8fa

zwu-2025 force-pushed the buffer_ldst branch 3 times, most recently from d6fcb54 to 79fdfea, August 4, 2025 02:13

comments resolve

9025389

zwu-2025 force-pushed the buffer_ldst branch from 79fdfea to 9025389, August 4, 2025 02:14

antiagainst mentioned this pull request Aug 4, 2025

[AMD] knob for within_2gb check for specialization #7720

Closed

Refactor: call GluonBuilder directly skipping _semantic, following the same style as the implementation in other backend

cf0bf03

zwu-2025 force-pushed the buffer_ldst branch from 9f9a847 to cf0bf03, August 4, 2025 18:11

antiagainst requested changes Aug 4, 2025

zwu-2025 added 5 commits August 4, 2025 18:57

export to cdna4 and other comments resolve

d799068

Merge branch 'main' into buffer_ldst

e8c1a52

test rename

99d6b1f

test rename

c292820

Merge branch 'buffer_ldst' of github.com:zwu-2025/triton into buffer_ldst

f5b8169

antiagainst marked this pull request as ready for review August 5, 2025 00:44

antiagainst approved these changes Aug 5, 2025

zwu-2025 added 2 commits August 5, 2025 00:38

Merge branch 'buffer_ldst' of github.com:zwu-2025/triton into buffer_ldst

54d0f4e

Merge branch 'buffer_ldst' of github.com:zwu-2025/triton into buffer_ldst

9f2441b

peterbell10 reviewed Aug 5, 2025

zwu-2025 added 3 commits August 5, 2025 11:42

Merge branch 'main' into buffer_ldst

8f0ea65

comments resolve

a4233e2

support buffer_[load|store] with broadcast

531c9bc

antiagainst requested changes Aug 5, 2025

comment resolve

c6a8f5d

antiagainst approved these changes Aug 5, 2025

peterbell10 approved these changes Aug 5, 2025

antiagainst merged commit 376b9b9 into triton-lang:main Aug 5, 2025
9 checks passed

zwu-2025 deleted the buffer_ldst branch August 14, 2025 18:09

antiagainst mentioned this pull request Sep 10, 2025

[AMD][Gluon] Expose buffer_atomic_rmw to Gluon #8112

Merged

3 participants