0.0.4 (2024-05-01)
- pytorch 2.3 support
- gpu sampling kernels (top-p, top-k)
- more gqa group sizes
- add mma instructions for fp8 (#179) (d305798)
- mma rowsum for fp8 (#180) (5af935c)
- support any num_heads for get_alibi_slope (#200) (b217a6f)
0.0.3 (2024-03-08)
- adding sm_scale field for all attention APIs (#145) (85d4018)
- enable head_dim=256 for attention kernels (#132) (0372acc)
- pytorch api of fp8 kv-cache (#156) (66ee066)
- support ALiBi (#146) (383518b)
- bugfix to pr 135 (#136) (3d55c71)
- fix bugs introduced in #132 (#135) (9b7b0b9)
- fix FindThrust.cmake (#161) (30fa584)