[AMD]Clamp Results in Downcasting to FP8E4M3 and FP8E5M2 #7337

knwng · 2025-06-27T02:38:43Z

Resolves https://github.com/ROCm/triton-internal/issues/849 and https://github.com/ROCm/triton-internal/issues/850.

There are several conversion ops on the NV side using satfinite mode, but on the AMD side, some of those are in non-saturation mode. We need to align AMD ops with NV.

For example, fp32 to OCP fp8 on MI350 is lowered to `ROCDL::CvtScaleF32PkFp8F32Op` and eventually to `v_cvt_scalef32_pk_fp8_f32`, which, according to the ISA, operates in non-saturation mode. On the NV side, the same conversion is lowered to `cvt.rn.satfinite.e4m3x2.f32`, which operates in saturation mode.

Other examples include:

| Conversion        | ROCDL dialect                 | Instruction                |
| ----------------- | ----------------------------- | -------------------------- |
| fp32 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8F32Op  | v_cvt_scalef32_pk_fp8_f32  |
| fp32 to fp8e5m2   | ROCDL::CvtScaleF32PkBf8F32Op  | v_cvt_scalef32_pk_bf8_f32  |
| fp16 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8F16Op  | v_cvt_scalef32_pk_fp8_f16  |
| fp16 to fp8e5m2   | ROCDL::CvtScaleF32PkBf8F16Op  | v_cvt_scalef32_pk_bf8_f16  |
| bf16 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8Bf16Op | v_cvt_scalef32_pk_fp8_bf16 |
| bf16 to fp8e5m2   | ROCDL::CvtScaleF32PkBf8Bf16Op | v_cvt_scalef32_pk_bf8_bf16 |

This PR fixes the issue by enabling the `FP16_OVFL` flag in the MODE register before these conversion instructions.
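To make the difference between the two modes concrete, here is a minimal Python sketch of the saturation step only. It is not the code from this PR and does not model the hardware instructions or rounding. The function names `clamp_satfinite` and `overflow_without_saturation` are hypothetical, and the non-saturating behavior shown (NaN for fp8e4m3fn, infinity for fp8e5m2) is an assumption taken from the OCP FP8 format conventions rather than from the AMD ISA document.

```python
import math

# Largest finite magnitudes of the two OCP FP8 formats.
FP8_MAX = {"e4m3fn": 448.0, "e5m2": 57344.0}


def clamp_satfinite(x: float, fmt: str) -> float:
    """Model the saturation step of a satfinite downcast (rounding not modeled).

    Out-of-range values, including +/-inf, are clamped to the largest finite
    value of the target format; NaN is passed through unchanged.
    """
    if math.isnan(x):
        return x
    limit = FP8_MAX[fmt]
    return max(-limit, min(limit, x))


def overflow_without_saturation(x: float, fmt: str) -> float:
    """Simplified model of a non-saturating downcast (assumption, see lead-in).

    e4m3fn has no infinity encoding, so overflow becomes NaN;
    e5m2 has infinities, so overflow becomes +/-inf.
    """
    if math.isnan(x) or abs(x) <= FP8_MAX[fmt]:
        return x
    if fmt == "e4m3fn":
        return math.nan
    return math.copysign(math.inf, x)


if __name__ == "__main__":
    for fmt in FP8_MAX:
        for value in (1.5, 1000.0, -1.0e6):
            print(fmt, value,
                  clamp_satfinite(value, fmt),
                  overflow_without_saturation(value, fmt))
```

With saturation, an out-of-range input such as `1000.0` maps to the finite maximum of the target format, which is the behavior the NV lowering already provides and the AMD lowering is being aligned with.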

Created with @ravil-mobile

python/test/unit/language/test_conversions.py

third_party/amd/lib/TritonAMDGPUToLLVM/ElementwiseOpToLLVM.cpp

python/triton_kernels/tests/test_mxfp.py

ravil-mobile and others added 4 commits June 26, 2025 15:33

[AMD] Added clamping for Fp8E4M3 and Fp8E5M2

7bb732a

enable fp16_ovfl

0cee5d5

Merge remote-tracking branch 'ravil/ravil/mi350-downcast-fp8' into en…

305061d

…able_fp16_ovfl

x

5fa3339

antiagainst requested changes Jun 27, 2025

knwng added 2 commits June 26, 2025 22:01

fix instr scheduling

6d3f4d4

resolve comments

5ee4165

antiagainst reviewed Jun 27, 2025

python/triton_kernels/tests/test_mxfp.py (resolved)

resolve comments

e88e7ba

knwng requested a review from antiagainst June 27, 2025 05:11

antiagainst approved these changes Jun 27, 2025

antiagainst marked this pull request as ready for review June 27, 2025 05:18

antiagainst requested review from ptillet and zhanglx13 as code owners June 27, 2025 05:18

antiagainst merged commit ddacd46 into triton-lang:main Jun 27, 2025
9 checks passed

antiagainst mentioned this pull request Jun 27, 2025

[AMD] Added clamping for Fp8E4M3 and Fp8E5M2 #7327

Closed

knwng deleted the enable_fp16_ovfl branch July 31, 2025 00:58
