
pull[bot] commented Dec 20, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


…2411)

* rowwise colwise RHT group quant v1

Signed-off-by: Zhongbo Zhu <[email protected]>

* remove local array RW

Signed-off-by: Zhongbo Zhu <[email protected]>

* change wait_barrier

Signed-off-by: Zhongbo Zhu <[email protected]>

* fast math options

Signed-off-by: Zhongbo Zhu <[email protected]>

* use mult to replace div

Signed-off-by: Zhongbo Zhu <[email protected]>

* format

Signed-off-by: Zhongbo Zhu <[email protected]>

* bulk move random states

Signed-off-by: Zhongbo Zhu <[email protected]>

* greptile

Signed-off-by: Zhongbo Zhu <[email protected]>

* lint

Signed-off-by: Zhongbo Zhu <[email protected]>

* revert to use divides

Signed-off-by: Zhongbo Zhu <[email protected]>

* avoid fp32 bf16 round-trip in RHT cast fusion

Signed-off-by: Zhongbo Zhu <[email protected]>

* trigger fast math by toggling NVTE_RHT_CAST_FUSION_USE_FAST_MATH

Signed-off-by: Zhongbo Zhu <[email protected]>

* integrate row col rht fusion, functional

Signed-off-by: Zhongbo Zhu <[email protected]>

* numerics aligned

Signed-off-by: Zhongbo Zhu <[email protected]>

* style

Signed-off-by: Zhongbo Zhu <[email protected]>

* remove device sync

Signed-off-by: Zhongbo Zhu <[email protected]>

* 128 padding

Signed-off-by: Zhongbo Zhu <[email protected]>

* revert colwise rng state creation because of row-col fused kernel

Signed-off-by: Zhongbo Zhu <[email protected]>

* fix CI, linter

Signed-off-by: Zhongbo Zhu <[email protected]>

* refactor RS for generating two random values

Signed-off-by: Zhongbo Zhu <[email protected]>

* Avoid invalid configs with templated kernel

Signed-off-by: Tim Moon <[email protected]>

* fix acc pipeline init with 0 arrival count

Signed-off-by: Zhongbo Zhu <[email protected]>

* restore rowwise-only mode

Signed-off-by: Zhongbo Zhu <[email protected]>

* switch to dynamic atomic scheduler

Signed-off-by: Zhongbo Zhu <[email protected]>

* Avoid instantiating group RHT+cast kernel without row-wise or col-wise output

Signed-off-by: Tim Moon <[email protected]>

* Include fast math option in quantization config

Signed-off-by: Tim Moon <[email protected]>

* Fix linter warnings and review nits

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use TE license

Signed-off-by: Tim Moon <[email protected]>

* Fix bug where kernel is always launched on stream

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore BF16 intermediate downcast in fused RHT-cast kernels

Signed-off-by: Tim Moon <[email protected]>

* fix numerical test of grouped kernel

Signed-off-by: Zhongbo Zhu <[email protected]>

* Make sure row-wise and col-wise quantization use different RNG seeds

Signed-off-by: Tim Moon <[email protected]>

* Restore autoformatter

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhongbo Zhu <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
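The commit series fuses a random Hadamard transform (RHT) into grouped row-wise and column-wise quantization. For readers unfamiliar with the transform, here is a minimal pure-Python sketch, assuming the common construction of a random +/-1 sign flip followed by an orthonormal Walsh-Hadamard transform on fixed-size blocks. It is not the kernel's code; the helper names and the 16-element block size are hypothetical.

```python
import math
import random


def fwht(values):
    """In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so it is orthonormal."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = values[i], values[i + h]
                values[i], values[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    for i in range(n):
        values[i] *= scale
    return values


def random_signs(n, seed):
    """Hypothetical helper: a reproducible random +/-1 diagonal."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(block, signs):
    """Apply H @ diag(signs) to one block."""
    return fwht([s * x for s, x in zip(signs, block)])


def inverse_rht(block, signs):
    """Undo rht: the normalized H is its own inverse, and so is diag(signs)."""
    return [s * y for s, y in zip(signs, fwht(list(block)))]


block = [0.0] * 15 + [8.0]  # a 16-element block with a single outlier
signs = random_signs(16, seed=0)
mixed = rht(block, signs)
print(max(abs(v) for v in mixed))  # 2.0: the outlier is spread over all 16 entries
```

Spreading an outlier across the block shrinks the block's maximum magnitude while preserving its norm, which is presumably why the transform is applied before a low-precision cast.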
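Two commits first replace a division with multiplication by a reciprocal ("use mult to replace div") and later revert it ("revert to use divides"), while fast math remains available behind NVTE_RHT_CAST_FUSION_USE_FAST_MATH. The sketch below shows the generic reason the two are not interchangeable: `x / d` rounds once, whereas `x * (1 / d)` rounds twice. It uses Python's float64 for illustration; the kernel presumably works in fp32, where the same effect applies. The function names are hypothetical.

```python
def scale_by_division(x, d):
    # One correctly rounded operation.
    return x / d


def scale_by_reciprocal(x, d):
    # Fast-math style: the reciprocal is rounded, then the product is rounded again.
    inv_d = 1.0 / d
    return x * inv_d


print(scale_by_division(3.0, 10.0))    # 0.3
print(scale_by_reciprocal(3.0, 10.0))  # 0.30000000000000004

mismatches = sum(
    scale_by_division(float(x), 10.0) != scale_by_reciprocal(float(x), 10.0)
    for x in range(1, 1001)
)
print(mismatches > 0)  # True: x = 3 alone is a mismatch
```

The multiply is usually cheaper on GPUs, so the choice trades a last-bit difference against speed, which fits keeping division as the default and fast math as an opt-in.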
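The series also toggles whether the fused RHT-cast kernels downcast their intermediate result to BF16 ("avoid fp32 bf16 round-trip", later "Restore BF16 intermediate downcast in fused RHT-cast kernels"). The sketch below shows what such a downcast does to a single value: fp32 to bfloat16 with round-to-nearest-even, using only `struct`. `to_bf16` is a hypothetical helper, and NaN and overflow handling are omitted.

```python
import struct


def to_bf16(x):
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float.

    Sketch only: NaN and overflow handling are omitted.
    """
    # struct.pack("<f") first rounds the Python float to fp32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits of the fp32 word. Adding 0x7FFF, plus 1 when
    # the lowest kept bit is odd, makes the truncation round to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bf16(1.0 + 2**-7))           # 1.0078125: exactly representable
print(to_bf16(1.0 + 2**-8))           # 1.0: a tie, rounded to the even neighbor
print(to_bf16(1.0 + 2**-8 + 2**-20))  # 1.0078125: just above the tie, rounds up
```

Because bfloat16 keeps only 7 explicit mantissa bits, whether the intermediate passes through this rounding step changes the values fed to the final quantization, which is why the choice affects numerical parity between fused and reference paths.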
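Several commits deal with random states and seeds, including "Make sure row-wise and col-wise quantization use different RNG seeds". Assuming these random values drive stochastic rounding (an assumption; the commits do not say so explicitly), the sketch below rounds to an integer grid stochastically, with separate generator streams for the row-wise and column-wise copies. Python's `random.Random` and the `seed + 1` derivation are hypothetical stand-ins; a GPU kernel would more likely use a counter-based generator.

```python
import math
import random


def stochastic_round(x, rng):
    """Round x to floor(x) or floor(x) + 1, rounding up with probability frac(x)."""
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


seed = 1234                            # hypothetical user-provided seed
rowwise_rng = random.Random(seed)      # hypothetical stream for the row-wise copy
colwise_rng = random.Random(seed + 1)  # hypothetical, distinct stream for the col-wise copy

row_samples = [stochastic_round(2.3, rowwise_rng) for _ in range(100_000)]
col_samples = [stochastic_round(2.3, colwise_rng) for _ in range(100_000)]
print(sorted(set(row_samples)))             # [2, 3]
print(sum(row_samples) / len(row_samples))  # close to 2.3: unbiased on average
```

With a shared stream, the row-wise and column-wise copies of a tensor would round in correlated ways; separate streams keep their rounding errors independent.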
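The commit "switch to dynamic atomic scheduler" suggests that persistent workers claim tiles from a shared atomic counter instead of receiving a fixed assignment up front. Under that reading, the sketch below emulates the pattern with Python threads; the lock stands in for a hardware `atomicAdd`, and every name is hypothetical. Each tile index is handed out exactly once, and faster workers simply claim more tiles, which helps when the tensors in a group differ in size.

```python
import threading


class AtomicCounter:
    """Hypothetical stand-in for a global atomic counter (fetch-and-add)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, n=1):
        # Return the old value and advance it, like atomicAdd on a GPU.
        with self._lock:
            old = self._value
            self._value += n
            return old


def worker(counter, num_tiles, claimed):
    # Keep claiming the next unprocessed tile until none remain.
    while True:
        tile = counter.fetch_add()
        if tile >= num_tiles:
            return
        claimed.append(tile)  # a real kernel would quantize the tile here


def run_dynamic_schedule(num_tiles, num_workers):
    counter = AtomicCounter()
    claims = [[] for _ in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(counter, num_tiles, claims[w]))
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claims


claims = run_dynamic_schedule(num_tiles=1000, num_workers=8)
all_tiles = sorted(tile for per_worker in claims for tile in per_worker)
print(all_tiles == list(range(1000)))  # True: every tile claimed exactly once
```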
pull[bot] locked and limited conversation to collaborators Dec 20, 2025
pull[bot] added the ⤵️ pull label Dec 20, 2025
pull[bot] merged commit eb8e792 into dumpmemory:main Dec 20, 2025
10 checks passed