Contributor

@xiaohuguo2023 xiaohuguo2023 commented Oct 27, 2025

The Problem with the Original Formula

The original formula is:

tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
  • Issue with large positive x in float32 (largest finite value ≈ 3.4 × 10^38):
    • When x = 20: e^(40) ≈ 2.4 × 10^17 → still representable
    • When x = 50: e^(100) ≈ 2.7 × 10^43 → overflows to infinity
    • Result: (∞ - 1)/(∞ + 1) = ∞/∞ = NaN
  • For negative x: the formula works because e^(2x) → 0, giving (0 - 1)/(0 + 1) = -1
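The point at which the original formula breaks can be estimated directly. The sketch below assumes float32 arithmetic (the `f` in `fast_tanhf`) and uses the standard float32 maximum; the names `FLT_MAX` and `overflow_threshold` are illustrative only and do not come from the PR.

```python
import math

# Largest finite float32 value (assumption: the kernel computes in float32).
FLT_MAX = 3.4028234663852886e38

# e^(2x) exceeds FLT_MAX once 2x > ln(FLT_MAX), i.e. x > ln(FLT_MAX) / 2.
overflow_threshold = math.log(FLT_MAX) / 2.0

print(f"e^(2x) overflows float32 for x > {overflow_threshold:.2f}")
```

Under these assumptions the threshold is roughly 44.36, which is why x = 20 is still safe while x = 50 is not.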

The Numerically Stable Solution

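  • For positive x: reformulation
tanh(x) = (e^(2x) - 1) / (e^(2x) + 1) = (e^(2x) + 1 - 2) / (e^(2x) + 1) = 1 - 2/(e^(2x) + 1)
    • If e^(2x) overflows to ∞, then 2/(∞ + 1) = 0 and the result is exactly 1 instead of NaN
  • For negative x: using symmetry (tanh is odd), with x > 0
tanh(-x) = (e^(-2x) - 1) / (e^(-2x) + 1) = -(e^(2x) - 1) / (e^(2x) + 1) = -tanh(x) = -1 × (1 - 2/(e^(2|x|) + 1))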

Unified formulation:

tanh(x) = sign(x) × (1 - 2/(e^(2|x|) + 1))
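
As an illustration only, the following Python sketch compares the original and unified formulas under emulated float32 rounding. It is not the Triton AMD backend code (which is not shown in this conversation); the helper names `f32`, `exp32`, `tanh_naive`, and `tanh_stable` are hypothetical, and float32 is emulated with the standard library rather than real GPU arithmetic.

```python
import math
import struct

FLT_MAX = 3.4028234663852886e38  # largest finite float32 value


def f32(v: float) -> float:
    """Round a Python float to float32, saturating to +/-inf on overflow."""
    if math.isnan(v) or math.isinf(v):
        return v
    if abs(v) > FLT_MAX:
        return math.copysign(math.inf, v)
    return struct.unpack("f", struct.pack("f", v))[0]


def exp32(v: float) -> float:
    """exp() with float32-like overflow behaviour (inf instead of an exception)."""
    try:
        return f32(math.exp(v))
    except OverflowError:
        return math.inf


def tanh_naive(x: float) -> float:
    """Original formula: (e^(2x) - 1) / (e^(2x) + 1)."""
    e = exp32(2.0 * x)
    return f32(f32(e - 1.0) / f32(e + 1.0))  # inf / inf yields NaN


def tanh_stable(x: float) -> float:
    """Unified formula: sign(x) * (1 - 2 / (e^(2|x|) + 1))."""
    e = exp32(2.0 * abs(x))
    magnitude = f32(1.0 - f32(2.0 / f32(e + 1.0)))  # 2 / inf yields 0
    return math.copysign(magnitude, x)


for x in (-50.0, -1.0, 0.5, 20.0, 50.0):
    print(x, tanh_naive(x), tanh_stable(x), math.tanh(x))
```

In this emulation the naive version returns NaN at x = 50 while the stable version saturates to 1, and both agree with `math.tanh` to float32 precision for moderate inputs.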

@xiaohuguo2023 xiaohuguo2023 changed the title from "reimplement fast_tanhf() to avoid overflow" to "[AMD]: reimplement fast_tanhf() to avoid overflow" Oct 28, 2025
@xiaohuguo2023 xiaohuguo2023 marked this pull request as ready for review October 29, 2025 07:36
Collaborator

@zhanglx13 zhanglx13 left a comment

lgtm!

@antiagainst antiagainst enabled auto-merge (squash) October 29, 2025 15:13
auto-merge was automatically disabled October 29, 2025 16:18

Head branch was pushed to by a user without write access

@zhanglx13 zhanglx13 enabled auto-merge (squash) October 29, 2025 16:23
@antiagainst antiagainst disabled auto-merge October 29, 2025 16:38
Collaborator

@antiagainst antiagainst left a comment

Actually can you fix up the builder API following #8572?

@antiagainst antiagainst enabled auto-merge (squash) October 29, 2025 22:02
@antiagainst antiagainst disabled auto-merge October 29, 2025 22:26
@antiagainst antiagainst merged commit 3f5eb50 into triton-lang:main Oct 29, 2025
8 of 9 checks passed
@xiaohuguo2023 xiaohuguo2023 deleted the fix_fast_tanhf branch October 29, 2025 22:34
naromero77amd pushed a commit to ROCm/triton that referenced this pull request Nov 8, 2025
(cherry picked from commit 3f5eb50)
naromero77amd pushed a commit to ROCm/triton that referenced this pull request Nov 8, 2025
(cherry picked from commit 3f5eb50)
naromero77amd pushed a commit to ROCm/triton that referenced this pull request Nov 10, 2025
(cherry picked from commit 3f5eb50)
(cherry picked from commit 60297e6)
jataylo pushed a commit to ROCm/triton that referenced this pull request Nov 11, 2025
…900)

(cherry picked from commit 3f5eb50)

Co-authored-by: xiaohuguo2023 <149615094+xiaohuguo2023@users.noreply.github.com>
jataylo pushed a commit to ROCm/triton that referenced this pull request Nov 11, 2025
…901)

(cherry picked from commit 3f5eb50)

Co-authored-by: xiaohuguo2023 <149615094+xiaohuguo2023@users.noreply.github.com>
jataylo pushed a commit to ROCm/triton that referenced this pull request Nov 11, 2025
…902)

(cherry picked from commit 3f5eb50)
(cherry picked from commit 60297e6)

Co-authored-by: xiaohuguo2023 <149615094+xiaohuguo2023@users.noreply.github.com>
tmoreau89 pushed a commit to tmoreau89/triton that referenced this pull request Dec 1, 2025
phambinhfin added a commit to ROCm/jax that referenced this pull request Jan 20, 2026
AMD CDNA3 (MI300X/gfx942) does not have a hardware tanh instruction like
NVIDIA's PTX tanh.approx. Instead of using PTX inline assembly (which
doesn't work on ROCm), we use OCML's __ocml_tanh_f32 function.

Triton's AMD backend lowers this using a numerically stable fast exp-based
formula: tanh(x) = sign(x) * (1 - 2/(e^(2|x|) + 1))

This implementation:
- For f32: calls __ocml_tanh_f32 directly via extern_elementwise
- For f16/bf16: extends to f32, calls __ocml_tanh_f32, truncates back
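
The extend/compute/truncate pattern for half precision can be sketched in plain Python. This is only an illustration of the rounding steps: `math.tanh` stands in for `__ocml_tanh_f32`, the helper names `to_f16`, `to_f32`, and `tanh_f16` are hypothetical, and bf16 is omitted because the standard library has no bf16 format.

```python
import math
import struct


def to_f16(v: float) -> float:
    """Round a Python float to IEEE half precision (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", v))[0]


def to_f32(v: float) -> float:
    """Round a Python float to IEEE single precision (struct format 'f')."""
    return struct.unpack("f", struct.pack("f", v))[0]


def tanh_f16(x_f16: float) -> float:
    """Extend an f16 input to f32, evaluate tanh there, truncate back to f16."""
    x_f32 = to_f32(x_f16)             # widening f16 -> f32 is exact
    y_f32 = to_f32(math.tanh(x_f32))  # stand-in for the f32 tanh call
    return to_f16(y_f32)              # narrow the f32 result back to f16


print(tanh_f16(to_f16(0.5)), tanh_f16(to_f16(10.0)))
```

Because tanh is bounded by ±1, the narrowing step cannot overflow f16; only precision is lost.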

Also fixes the bf16 skip condition to only apply to CUDA (not ROCm).

References:
- Triton PR #7780: triton-lang/triton#7780
- Triton PR #8551: triton-lang/triton#8551
- NVIDIA PTX ISA: https://docs.nvidia.com/cuda/parallel-thread-execution/
- AMD CDNA3 ISA: https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf