Solution of the NTIRE 2024 Challenge on Efficient Super-Resolution
Janus-Series: Unified Multimodal Understanding and Generation Models
📚 Collection of awesome generation acceleration resources.
Official PyTorch implementation of "Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think" (ICLR 2025)
Towards Unified Deep Image Deraining: A Survey and A New Benchmark
Unified KV Cache Compression Methods for Auto-Regressive Models
A paper list of recent works on token compression for ViT and VLM
[NeurIPS 2024] official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone".
Sample codes for my CUDA programming book
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
This repo contains the code for a 1D tokenizer and generator
A method to increase the speed and lower the memory footprint of existing vision transformers.
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
[NeurIPS 24] PromptFix: You Prompt and We Fix the Photo
CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
APOLLO: SGD-like Memory, AdamW-level Performance
This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality"
[ICLR25] High-performance Image Tokenizers for VAR and AR
Solve puzzles. Improve your pytorch.
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at batch sizes of up to 16-32 tokens.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
4-bit quantization of LLaMA using GPTQ
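To make the quantization entries above concrete, here is a minimal sketch of what 4-bit weight quantization means: weights are rounded to 16 integer levels with a per-group scale. This is plain round-to-nearest for illustration only, not the GPTQ algorithm itself (GPTQ additionally compensates rounding error using second-order information); all names here are hypothetical.

```python
import numpy as np

def quantize_rtn_4bit(w, group_size=4):
    """Toy round-to-nearest 4-bit quantization with per-group scales.

    Illustration only: GPTQ improves on this by correcting the rounding
    error of each weight using Hessian information from calibration data.
    """
    w = np.asarray(w, dtype=np.float32)
    groups = w.reshape(-1, group_size)
    # Symmetric scale: map the largest |w| in each group onto the int4
    # range [-8, 7] (here we use 7 so +/- extremes are representable).
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from int4 codes and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.array([0.1, -0.5, 0.7, 0.02, 1.2, -1.1, 0.3, 0.0], dtype=np.float32)
q, s = quantize_rtn_4bit(w)
w_hat = dequantize_4bit(q, s)
max_err = np.abs(w - w_hat).max()  # bounded by half a scale step per group
```

The per-group scale is the key design choice: smaller groups track local weight magnitudes more closely (lower error) at the cost of storing more scale values alongside the 4-bit codes.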