CVPR 2024 decisions are now available on OpenReview!
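Note 1: Everyone is welcome to open an issue to share CVPR 2024 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and comprehensive roundups, see: https://github.com/amusi/daily-paper-computer-vision
Scan the QR code to join the CVer academic exchange group, the largest computer vision and AI community on Knowledge Planet (知识星球)! It is updated daily and shares the latest learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and related topics.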
- 3DGS(Gaussian Splatting)
- Avatars
- Backbone
- CLIP
- MAE
- Embodied AI
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- NAS
- OCR
- NeRF
- DETR
- Prompt
- Diffusion Models
- ReID (Re-Identification)
- Long-Tail Distribution
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblur
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- Medical Image
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Detection
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Scene Graph Generation
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Datasets
- New Tasks
- Others
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
- Homepage: https://city-super.github.io/scaffold-gs/
- Paper: https://arxiv.org/abs/2312.00109
- Code: https://github.com/city-super/Scaffold-GS
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
- Homepage: https://shunyuanzheng.github.io/GPS-Gaussian
- Paper: https://arxiv.org/abs/2312.02155
- Code: https://github.com/ShunyuanZheng/GPS-Gaussian
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
- Paper: https://arxiv.org/abs/2312.02134
- Code: https://github.com/huliangxiao/GaussianAvatar
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
- Paper: https://arxiv.org/abs/2311.14521
- Code: https://github.com/buaacyw/GaussianEditor
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
- Homepage: https://ingra14m.github.io/Deformable-Gaussians/
- Paper: https://arxiv.org/abs/2309.13101
- Code: https://github.com/ingra14m/Deformable-3D-Gaussians
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
- Homepage: https://yihua7.github.io/SC-GS-web/
- Paper: https://arxiv.org/abs/2312.14937
- Code: https://github.com/yihua7/SC-GS
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
- Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/
- Paper: https://arxiv.org/abs/2312.16812
- Code: https://github.com/oppo-us-research/SpacetimeGaussians
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
- Homepage: https://fictionarry.github.io/DNGaussian/
- Paper: https://arxiv.org/abs/2403.06912
- Code: https://github.com/Fictionarry/DNGaussian
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
- Paper: https://arxiv.org/abs/2310.08528
- Code: https://github.com/hustvl/4DGaussians
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
- Paper: https://arxiv.org/abs/2310.08529
- Code: https://github.com/hustvl/GaussianDreamer
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
- Paper: https://arxiv.org/abs/2312.02134
- Code: https://github.com/huliangxiao/GaussianAvatar
Real-Time Simulated Avatar from Head-Mounted Sensors
- Homepage: https://www.zhengyiluo.com/SimXR/
- Paper: https://arxiv.org/abs/2403.06862
RepViT: Revisiting Mobile CNN From ViT Perspective
- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
- Paper: https://arxiv.org/abs/2312.03818
- Code: https://github.com/SunzeY/AlphaCLIP
FairCLIP: Harnessing Fairness in Vision-Language Learning
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
- Homepage: https://tai-wang.github.io/embodiedscan/
- Paper: https://arxiv.org/abs/2312.16170
- Code: https://github.com/OpenRobotLab/EmbodiedScan
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
- Homepage: https://iranqin.github.io/MP5.github.io/
- Paper: https://arxiv.org/abs/2312.07472
- Code: https://github.com/IranQin/MP5
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
- Paper: https://arxiv.org/abs/2312.08963
- Code: https://github.com/yyvhang/lemon_3d
An Empirical Study of Scaling Law for OCR
- Paper: https://arxiv.org/abs/2401.00028
- Code: https://github.com/large-ocr-model/large-ocr-model.github.io
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
- Paper: https://arxiv.org/abs/2403.00303
- Code: https://github.com/PriNing/ODM
PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF
- Paper: https://arxiv.org/abs/2311.13099
- Code: https://github.com/FYTalon/pienerf/
DETRs Beat YOLOs on Real-time Object Detection
- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
- Paper: https://arxiv.org/abs/2311.04257
- Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2
Link-Context Learning for Multimodal LLMs
- Paper: https://arxiv.org/abs/2308.07891
- Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
- Paper: https://arxiv.org/abs/2311.17911
- Code: https://github.com/shikiw/OPERA
Making Large Multimodal Models Understand Arbitrary Visual Prompts
- Homepage: https://vip-llava.github.io/
- Paper: https://arxiv.org/abs/2312.00784
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
- Paper: https://arxiv.org/abs/2310.00582
- Code: https://github.com/SY-Xuan/Pink
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
- Paper: https://arxiv.org/abs/2311.08046
- Code: https://github.com/PKU-YuanGroup/Chat-UniVi
OneLLM: One Framework to Align All Modalities with Language
- Paper: https://arxiv.org/abs/2312.03700
- Code: https://github.com/csuhan/OneLLM
VTimeLLM: Empower LLM to Grasp Video Moments
- Paper: https://arxiv.org/abs/2311.18445
- Code: https://github.com/huangb23/VTimeLLM
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
- Paper: https://arxiv.org/abs/2403.10254
- Code: https://github.com/924973292/EDITOR
InstanceDiffusion: Instance-level Control for Image Generation
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion
Residual Denoising Diffusion Models
- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM
DeepCache: Accelerating Diffusion Models for Free
- Paper: https://arxiv.org/abs/2312.00858
- Code: https://github.com/horseee/DeepCache
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
- Homepage: https://tianhao-qi.github.io/DEADiff/
- Paper: https://arxiv.org/abs/2403.06951
- Code: https://github.com/Tianhao-Qi/DEADiff_code
SVGDreamer: Text Guided SVG Generation with Diffusion Model
- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/
InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model
- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion
MMA-Diffusion: MultiModal Attack on Diffusion Models
- Paper: https://arxiv.org/abs/2311.17516
- Code: https://github.com/yangyijune/MMA-Diffusion
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt
RepViT: Revisiting Mobile CNN From ViT Perspective
- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
- Paper: https://arxiv.org/abs/2403.02781
- Code: https://github.com/zhengli97/PromptKD
FairCLIP: Harnessing Fairness in Vision-Language Learning
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
DETRs Beat YOLOs on Real-time Object Detection
- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
- Paper: https://arxiv.org/abs/2312.01220
- Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation
YOLO-World: Real-Time Open-Vocabulary Object Detection
- Paper: https://arxiv.org/abs/2401.17270
- Code: https://github.com/AILab-CVC/YOLO-World
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
- Paper: https://arxiv.org/abs/2310.12790
- Code: https://github.com/mala-lab/AHL
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
- Paper: https://arxiv.org/abs/2403.04700
- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
- Paper: https://arxiv.org/abs/2312.04265
- Code: https://github.com/w1oves/Rein
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
- Paper: https://arxiv.org/abs/2311.15537
- Code: https://github.com/xb534/SED
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
- Paper: https://arxiv.org/abs/2402.17228
- Code: https://github.com/DearCaat/RRT-MIL
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
- Paper: https://arxiv.org/abs/2402.17300
- Code: https://github.com/Luffy03/VoCo
ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
- Paper: https://arxiv.org/abs/2311.15264
- Code: https://github.com/nicoboou/chada_vit
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
- Paper: https://arxiv.org/abs/2310.08370
- Code: https://github.com/Nightmare-n/UniPAD
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
- Paper: https://arxiv.org/abs/2311.17663
- Code: https://github.com/haomo-ai/Cam4DOcc
Memory-based Adapters for Online 3D Scene Perception
- Paper: https://arxiv.org/abs/2403.06974
- Code: https://github.com/xuxw98/Online3D
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
- Paper: https://arxiv.org/abs/2306.15670
- Code: https://github.com/hustvl/Symphonies
A Real-world Large-scale Dataset for Roadside Cooperative Perception
- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
- Paper: https://arxiv.org/abs/2403.07535
- Code: https://github.com/Junda24/AFNet
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
- Paper: https://arxiv.org/abs/2312.08371
- Code: https://github.com/kuanchihhuang/PTT
UniMODE: Unified Monocular 3D Object Detection
Edit One for All: Interactive Batch Image Editing
- Homepage: https://thaoshibe.github.io/edit-one-for-all
- Paper: https://arxiv.org/abs/2401.10219
- Code: https://github.com/thaoshibe/edit-one-for-all
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
- Homepage: https://maskint.github.io
- Paper: https://arxiv.org/abs/2312.12468
Residual Denoising Diffusion Models
- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM
Boosting Image Restoration via Priors from Pre-trained Models
SeD: Semantic-Aware Discriminator for Image Super-Resolution
- Paper: https://arxiv.org/abs/2402.19387
- Code: https://github.com/lbc12345/SeD
APISR: Anime Production Inspired Real-World Anime Super-Resolution
- Paper: https://arxiv.org/abs/2403.01598
- Code: https://github.com/Kiteretsu77/APISR
InstanceDiffusion: Instance-level Control for Image Generation
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
- Homepage: https://eclipse-t2i.vercel.app/
- Paper: https://arxiv.org/abs/2312.04655
- Code: https://github.com/eclipse-t2i/eclipse-inference
Instruct-Imagen: Image Generation with Multi-modal Instruction
Residual Denoising Diffusion Models
- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM
UniGS: Unified Representation for Image Generation and Segmentation
Multi-Instance Generation Controller for Text-to-Image Synthesis
- Paper: https://arxiv.org/abs/2402.05408
- Code: https://github.com/limuloo/migc
SVGDreamer: Text Guided SVG Generation with Diffusion Model
- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/
InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model
- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion
Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following
- Paper: https://arxiv.org/abs/2311.17002
- Code: https://github.com/ali-vilab/Ranni
Vlogger: Make Your Dream A Vlog
- Paper: https://arxiv.org/abs/2401.09414
- Code: https://github.com/Vchitect/Vlogger
VBench: Comprehensive Benchmark Suite for Video Generative Models
- Homepage: https://vchitect.github.io/VBench-project/
- Paper: https://arxiv.org/abs/2311.17982
- Code: https://github.com/Vchitect/VBench
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
- Homepage: https://haozhexie.com/project/city-dreamer/
- Paper: https://arxiv.org/abs/2309.00610
- Code: https://github.com/hzxie/city-dreamer
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
- Paper: https://arxiv.org/abs/2311.11284
- Code: https://github.com/EnVision-Research/LucidDreamer
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
- Paper: https://arxiv.org/abs/2311.17005
- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2
Logit Standardization in Knowledge Distillation
- Paper: https://arxiv.org/abs/2403.01427
- Code: https://github.com/sunshangquan/logit-standardization-KD
Efficient Dataset Distillation via Minimax Diffusion
- Paper: https://arxiv.org/abs/2311.15529
- Code: https://github.com/vimar-gu/MinimaxDiffusion
Neural Markov Random Field for Stereo Matching
- Paper: https://arxiv.org/abs/2403.11193
- Code: https://github.com/aeolusguan/NMRF
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
- Homepage: https://zhangce01.github.io/HiKER-SGG/
- Paper: https://arxiv.org/abs/2403.12033
- Code: https://github.com/zhangce01/HiKER-SGG
KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos
- Homepage: https://lixinustc.github.io/projects/KVQ/
- Paper: https://arxiv.org/abs/2402.07220
- Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024
A Real-world Large-scale Dataset for Roadside Cooperative Perception
- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper
Object Recognition as Next Token Prediction
- Paper: https://arxiv.org/abs/2312.02142
- Code: https://github.com/kaiyuyue/nxtp
ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks
- Paper: https://arxiv.org/abs/2306.14525
- Code: https://parameternet.github.io/
Seamless Human Motion Composition with Blended Positional Encodings
- Paper: https://arxiv.org/abs/2402.15509
- Code: https://github.com/BarqueroGerman/FlowMDM
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
- Homepage: https://ll3da.github.io/
- Paper: https://arxiv.org/abs/2311.18651
- Code: https://github.com/Open3DA/LL3DA
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
- Homepage: https://clova-tool.github.io/
- Paper: https://arxiv.org/abs/2312.10908
MoMask: Generative Masked Modeling of 3D Human Motions
- Paper: https://arxiv.org/abs/2312.00063
- Code: https://github.com/EricGuo5513/momask-codes
Amodal Ground Truth and Completion in the Wild
- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
- Paper: https://arxiv.org/abs/2312.17247
- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild
Improved Visual Grounding through Self-Consistent Explanations
- Paper: https://arxiv.org/abs/2312.04554
- Code: https://github.com/uvavision/SelfEQ
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
- Homepage: https://chenshuang-zhang.github.io/imagenet_d/
- Paper: https://arxiv.org/abs/2403.18775
- Code: https://github.com/chenshuang-zhang/imagenet_d
Learning from Synthetic Human Group Activities
- Homepage: https://cjerry1243.github.io/M3Act/
- Paper: https://arxiv.org/abs/2306.16772
- Code: https://github.com/cjerry1243/M3Act
MindBridge: A Cross-Subject Brain Decoding Framework
