Starred repositories
Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
LLM-Powered Semi-Structured Table Question Answering
verl: Volcano Engine Reinforcement Learning for LLMs
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Ongoing research training transformer models at scale
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
MAGI-1: Autoregressive Video Generation at Scale
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
The simplest, fastest repository for training/finetuning small-sized VLMs.
Janus-Series: Unified Multimodal Understanding and Generation Models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Enjoy the magic of Diffusion models!
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
A unified inference and post-training framework for accelerated video generation.
Scripts and doc for https://www.dolthub.com/repositories/chenditc/investment_data
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
A PyTorch native platform for training generative AI models
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.
[CVPR2025] We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference ima…



