- Alibaba DAMO Academy
- Hangzhou, China
- https://kyonhuang.top/
- @KyonHuang
Starred repositories
Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration
A Telegram bot to recommend arXiv papers
📚 Collection of token reduction for model compression resources.
A paper list of recent works on token compression for ViT and VLM
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Accelerating Diffusion Transformers with Token-wise Feature Caching
This is the official library of "ProFD: Prompt-guided Feature Disentangling for Occluded Person Re-Identification"
📚 Collection of awesome generation acceleration resources.
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
A generative speech model for daily dialogue.
Example models using DeepSpeed
[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
AcadHomepage: A Modern and Responsive Academic Personal Homepage
ModelScope: bring the notion of Model-as-a-Service to life.
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
my attempt at implementing the DiffEdit paper (WIP)
Speed up Stable Diffusion with this one simple trick!
[CVPR 2024] Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
[ICLR2023] PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Jupyter notebook on Gumbel-max and Gumbel-softmax tricks