-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[AMD][Gluon] Expose buffer_load and buffer_store to Gluon #7738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| layout = ttgl._unwrap_if_constexpr(layout) | ||
|
|
||
| ret_ty = ttgl.distributed_type(element_type, shape, layout) | ||
| handle = self.builder.create_buffer_load(ret_ty.to_ir(self.builder), ptr, offsets, cache_modifier, mask, other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you infer the layout from the the offsets, the dtype from the pointer and add defaults for mask and other? You probably also need to add broadcasting.
My general feedback would be to think about this as a language API, not just a wrapper to emit IR.
| handle = self.builder.create_buffer_store(stored_value, ptr, offsets, cache_modifier, mask) | ||
| return ttgl.tensor(handle, ttgl.void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to move away from tl.void. I don't think it adds any value.
| handle = self.builder.create_buffer_store(stored_value, ptr, offsets, cache_modifier, mask) | |
| return ttgl.tensor(handle, ttgl.void) | |
| self.builder.create_buffer_store(stored_value, ptr, offsets, cache_modifier, mask) |
|
|
||
|
|
||
| @builtin | ||
| def create_buffer_load(ptr, element_type, offsets, cache, mask, layout, other, _semantic=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The create_ prefix is only used by the builder, not in the language.
| def create_buffer_load(ptr, element_type, offsets, cache, mask, layout, other, _semantic=None): | |
| def buffer_load(ptr, element_type, offsets, cache, mask, layout, other, _semantic=None): |
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
d6fcb54 to
79fdfea
Compare
same style as the implementation in other backend
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
| element_type = ptr.type.scalar.element_ty | ||
|
|
||
| if mask is not None: | ||
| assert mask.shape == shape, "offsets must have the same shape as offsets" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"mask must have .." :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not addressed yet?
antiagainst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice just one final comments from me. Also @peterbell10 to take another look.
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
python/triton/experimental/gluon/language/amd/cdna3/__init__.py
Outdated
Show resolved
Hide resolved
antiagainst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'll land once @peterbell10 is okay.
Expose AMD buffer_load and buffer_store Gluon. Example usage looks like:
```
def buffer_ldst_kernel(x, y):
layout: ttgl.constexpr = ttgl.BlockedLayout(size_per_thread=[1, 1], threads_per_warp=[1, 64], warps_per_cta=[4, 1], order=[1, 0])
offsets = ttgl.arange(0, 64 * 64, layout=layout)
a = ttgl.amd.cdna3.buffer_load(ptr=x, offsets=offsets)
ttgl.amd.cdna3.buffer_store(stored_value=a, ptr=y, offsets=offsets)
```
Expose AMD buffer_load and buffer_store Gluon. Example usage looks like: