Stars
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Ongoing research on training transformer models at scale
CL-bench: A Benchmark for Context Learning
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
We introduce BabyVision, a benchmark revealing the infancy of AI vision.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
An Open Foundation Model and Benchmark to Accelerate Generative Recommendation
All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.
Secure and fast microVMs for serverless computing.
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Accelerating MoE with IO and Tile-aware Optimizations
Easily and securely send things from one computer to another 🐊 📦
Simple, safe way to store and distribute tensors
Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
The official implementation for [NeurIPS 2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
FlashInfer: Kernel Library for LLM Serving
slime is an LLM post-training framework for RL Scaling.
Supercharge Your LLM with the Fastest KV Cache Layer
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
A cross-platform, header-only, easy-to-use, high-performance HTTP framework in modern C++ (C++20)
Official implementation of "Continuous Autoregressive Language Models"
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
A PyTorch native platform for training generative AI models