Stars
A bunch of kernels that might make stuff slower 😉
Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
An extremely fast Python package and project manager, written in Rust.
Tile primitives for speedy kernels
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Fully open reproduction of DeepSeek-R1
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
Scalable RL solution for advanced reasoning of language models
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages"
noise_step: Training in 1.58b With No Gradient Memory
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule