Vision Transformers with Self-Distilled Registers (NeurIPS 2025 Spotlight)

If you like our PH-Reg, please give us a star ⭐ on GitHub to stay up to date with the latest changes.

Teaser Image

This repository contains the official PyTorch implementation for our NeurIPS 2025 paper, Vision Transformers with Self-Distilled Registers.

Environment Requirements

To train PH-Reg, please install the following packages. We used Python 3.10 in our experiments.

pip install -r requirements_eval.txt
pip install numpy==1.26.4
pip install matplotlib scipy scikit-image scikit-learn h5py

pip install openmim
mim install mmengine==0.8.4 
mim install mmcv==2.0.1 
mim install mmsegmentation==1.1.1

pip install transformers==4.37.2
pip install accelerate
pip install diffusers
pip install timm

pip install open-clip-torch==2.31.0
pip install imageio
pip install openai-clip
pip install opencv-python

pip install yapf==0.40.1

Training

Please download the Flickr30k dataset from https://shannon.cs.illinois.edu/DenotationGraph/

For a single GPU, please run:

python3 distill_main.py --data_root $YOUR_Flickr_PATH$ --save_dir $YOUR_CHECKPOINT_PATH$ --pretrained_path 'facebook/dinov2-base'

For multiple GPUs, please run:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --mixed_precision='bf16' distill_main.py --data_root $YOUR_Flickr_PATH$ --save_dir $YOUR_CHECKPOINT_PATH$ --pretrained_path 'facebook/dinov2-base'
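At a high level, registers are extra learnable tokens appended to the ViT's input sequence alongside the [CLS] and patch tokens; they absorb artifact activations and are discarded before dense prediction. The sketch below illustrates only this token layout — all names and shapes are illustrative and do not reflect the repo's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim, num_registers = 196, 768, 4

# Patch tokens plus one [CLS] token, as in a standard ViT (e.g. 14x14 patches).
cls_and_patches = rng.standard_normal((1, 1 + num_patches, dim))

# Learnable register tokens, shared across images and broadcast over the batch.
registers = rng.standard_normal((1, num_registers, dim))

# The transformer processes [CLS] + patches + registers jointly;
# the register slots are dropped from the output before downstream use.
tokens = np.concatenate([cls_and_patches, registers], axis=1)
print(tokens.shape)  # (1, 201, 768)
dense_features = tokens[:, 1:1 + num_patches]  # registers excluded
```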

Demo

We provide demo code for performing inference and visualization. You can also find a detailed tutorial on the denoising process in the same file.

Before running it, please download the distilled CLIP weights from link.
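The denoising tutorial in the demo is based on the idea of distilling artifact-free targets from a frozen teacher. A common way to obtain such targets is to average teacher features over several spatially re-aligned augmented views, which suppresses view-dependent artifacts. The snippet below is a conceptual sketch of that objective with random stand-in features — it is not the repo's training code, and every name here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
num_views, num_patches, dim = 8, 196, 768

# Hypothetical teacher features from several augmented views of one image,
# assumed already re-aligned to a common patch grid.
teacher_views = rng.standard_normal((num_views, num_patches, dim))

# Averaging across views yields a "denoised" distillation target.
denoised_target = teacher_views.mean(axis=0)

# Hypothetical student (ViT with registers) features on the clean image.
student_features = rng.standard_normal((num_patches, dim))

# Self-distillation objective: mean-squared error against the denoised target.
loss = float(np.mean((student_features - denoised_target) ** 2))
```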

Citation

If you find our project useful, please consider citing our paper 📝 and giving a star ⭐.

@misc{chen2025visiontransformersselfdistilledregisters,
      title={Vision Transformers with Self-Distilled Registers}, 
      author={Yinjie Chen and Zipeng Yan and Chong Zhou and Bo Dai and Andrew F. Luo},
      year={2025},
      eprint={2505.21501},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.21501}, 
}

Acknowledgments

We gratefully thank the authors of CLIP, SCLIP, ClearCLIP, NACLIP, MMSegmentation, and DINOv2, on whose work our code is based.
