@knwng knwng commented Jun 27, 2025

Resolved https://github.com/ROCm/triton-internal/issues/849 and https://github.com/ROCm/triton-internal/issues/850.

There are several conversion ops on the NV side using satfinite mode, but on the AMD side, some of those are in non-saturation mode. We need to align AMD ops with NV.

For example, the fp32 to OCP fp8 conversion on MI350 is lowered to `ROCDL::CvtScaleF32PkFp8F32Op` and eventually to `v_cvt_scalef32_pk_fp8_f32`, which, according to the ISA, operates in non-saturation mode. On the NV side, the same conversion is lowered to `cvt.rn.satfinite.e4m3x2.f32`, which operates in saturation mode.

The affected conversions include:

| Conversion | ROCDL dialect | Instruction |
| ----------------- | ----------------------------- | -------------------------- |
| fp32 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8F32Op | v_cvt_scalef32_pk_fp8_f32 |
| fp32 to fp8e5m2 | ROCDL::CvtScaleF32PkBf8F32Op | v_cvt_scalef32_pk_bf8_f32 |
| fp16 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8F16Op | v_cvt_scalef32_pk_fp8_f16 |
| fp16 to fp8e5m2 | ROCDL::CvtScaleF32PkBf8F16Op | v_cvt_scalef32_pk_bf8_f16 |
| bf16 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8Bf16Op | v_cvt_scalef32_pk_fp8_bf16 |
| bf16 to fp8e5m2 | ROCDL::CvtScaleF32PkBf8Bf16Op | v_cvt_scalef32_pk_bf8_bf16 |

This PR fixes the issue by enabling the `FP16_OVFL` flag in the Mode register before these conversion instructions.
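
For readers unfamiliar with the two modes, the difference can be sketched in plain Python. This is only an illustrative model, not the code from this PR: it covers the overflow step alone and ignores rounding, and the helper `handle_overflow` and its parameters are hypothetical names. The non-saturating branch (overflow becomes inf when the format has one, NaN otherwise) is an assumption about typical fp8 behavior, not a statement of what the AMD ISA specifies.

```python
import math

# Largest finite magnitudes of the two OCP fp8 formats.
FP8_E4M3FN_MAX = 448.0    # fp8e4m3fn has no infinity encoding
FP8_E5M2_MAX = 57344.0    # fp8e5m2 has infinity encodings


def handle_overflow(x: float, max_finite: float, has_inf: bool, saturate: bool) -> float:
    """Model only the overflow step of an fp8 conversion (hypothetical helper).

    Rounding to the fp8 grid is deliberately left out.
    """
    if math.isnan(x):
        return x  # NaN stays NaN in both modes
    if abs(x) <= max_finite:
        return x  # in range: overflow handling does not apply
    if saturate:
        # Saturation ("satfinite"-style): clamp to the largest finite value.
        return math.copysign(max_finite, x)
    # Non-saturation (assumed): overflow to inf if the format has one, else NaN.
    return math.copysign(math.inf, x) if has_inf else math.nan


if __name__ == "__main__":
    print(handle_overflow(1000.0, FP8_E4M3FN_MAX, has_inf=False, saturate=True))
    print(handle_overflow(1000.0, FP8_E4M3FN_MAX, has_inf=False, saturate=False))
    print(handle_overflow(-1e6, FP8_E5M2_MAX, has_inf=True, saturate=True))
    print(handle_overflow(-1e6, FP8_E5M2_MAX, has_inf=True, saturate=False))
```

With `saturate=True`, out-of-range inputs stay finite, which is the behavior the NV lowering already provides and that the AMD lowering is being aligned with.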

Created with @ravil-mobile

@knwng knwng requested a review from antiagainst June 27, 2025 05:11
@antiagainst antiagainst marked this pull request as ready for review June 27, 2025 05:18
@antiagainst antiagainst merged commit ddacd46 into triton-lang:main Jun 27, 2025
9 checks passed
@knwng knwng deleted the enable_fp16_ovfl branch July 31, 2025 00:58
tie-pilot-qxw pushed a commit to tie-pilot-qxw/triton that referenced this pull request Aug 30, 2025
…g#7337)

There are several conversion ops on the NV side using `satfinite` mode,
but on the AMD side, some of those are in non-saturation mode. We need
to align AMD ops with NV.

For example, the fp32 to OCP fp8 conversion on MI350 is lowered to
`ROCDL::CvtScaleF32PkFp8F32Op` and eventually to
`v_cvt_scalef32_pk_fp8_f32`, which, according to the ISA, operates in
non-saturation mode. On the NV side, the same conversion is lowered to
`cvt.rn.satfinite.e4m3x2.f32`, which operates in saturation mode.

The affected conversions include:

| Conversion | ROCDL dialect | Instruction |
| ----------------- | ----------------------------- | -------------------------- |
| fp32 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8F32Op | v_cvt_scalef32_pk_fp8_f32 |
| fp32 to fp8e5m2 | ROCDL::CvtScaleF32PkBf8F32Op | v_cvt_scalef32_pk_bf8_f32 |
| fp16 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8F16Op | v_cvt_scalef32_pk_fp8_f16 |
| fp16 to fp8e5m2 | ROCDL::CvtScaleF32PkBf8F16Op | v_cvt_scalef32_pk_bf8_f16 |
| bf16 to fp8e4m3fn | ROCDL::CvtScaleF32PkFp8Bf16Op | v_cvt_scalef32_pk_fp8_bf16 |
| bf16 to fp8e5m2 | ROCDL::CvtScaleF32PkBf8Bf16Op | v_cvt_scalef32_pk_bf8_bf16 |

This PR fixes the issue by enabling the `FP16_OVFL` flag in the Mode
register before these conversion instructions.

---------

Co-authored-by: ravil-mobile <ravil.aviva.com@gmail.com>