This repository contains the official PyTorch implementation for our NeurIPS 2025 paper, Vision Transformers with Self-Distilled Registers.
To train PH-Reg, please install the following packages. We used Python 3.10 in our experiments.
```shell
pip install -r requirements_eval.txt
pip install numpy==1.26.4
pip install matplotlib scipy scikit-image scikit-learn h5py
pip install openmim
mim install mmengine==0.8.4
mim install mmcv==2.0.1
mim install mmsegmentation==1.1.1
pip install transformers==4.37.2
pip install accelerate
pip install diffusers
pip install timm
pip install open-clip-torch==2.31.0
pip install imageio
pip install openai-clip
pip install opencv-python
pip install yapf==0.40.1
```
Please download the Flickr30k dataset from https://shannon.cs.illinois.edu/DenotationGraph/.
For a single GPU, please run:

```shell
python3 distill_main.py --data_root $YOUR_Flickr_PATH$ --save_dir $YOUR_CHECKPOINT_PATH$ --pretrained_path 'facebook/dinov2-base'
```

For multiple GPUs, please run:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --mixed_precision='bf16' distill_main.py --data_root $YOUR_Flickr_PATH$ --save_dir $YOUR_CHECKPOINT_PATH$ --pretrained_path 'facebook/dinov2-base'
```
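To give a rough intuition for what the training script optimizes, the sketch below shows a generic feature-distillation step: a frozen teacher produces target features and a trainable student is regressed onto them with an MSE loss. This is only an illustrative toy (linear layers on random tokens, hypothetical names); PH-Reg's actual objective, teacher construction, and register mechanism are defined in `distill_main.py`.

```python
# Generic feature-distillation step (illustrative toy, not the repo's API).
import torch
import torch.nn as nn

torch.manual_seed(0)

dim, n_patches = 768, 256           # DINOv2-base token dim, 16x16 patch grid
teacher = nn.Linear(dim, dim)       # stand-in for the frozen teacher ViT
student = nn.Linear(dim, dim)       # stand-in for the trainable student ViT
for p in teacher.parameters():
    p.requires_grad_(False)         # teacher stays frozen

opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
x = torch.randn(2, n_patches, dim)  # dummy patch tokens for one batch

losses = []
for _ in range(50):
    with torch.no_grad():
        target = teacher(x)         # target features from the frozen teacher
    loss = nn.functional.mse_loss(student(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

After a few steps the student's features move toward the teacher's targets, i.e. `losses` decreases.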
We provide demo code for performing inference and visualization. You can also find a detailed tutorial on the denoising process in the same file.
Before using it, please download the distilled CLIP weights from link.
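For readers new to dense-feature visualization, a common way to inspect ViT patch tokens (used in many DINOv2-style demos) is to project them onto their top-3 PCA components and render the result as an RGB image. The snippet below is a minimal, self-contained sketch on random tokens; the grid size and token dimension are assumptions matching DINOv2-base on a 224x224 input, and the actual visualization pipeline is in the demo code.

```python
# PCA-to-RGB visualization of dense patch features (minimal sketch on dummy data).
import numpy as np

rng = np.random.default_rng(0)
h = w = 16                                   # 224x224 input, patch size 14 -> 16x16 grid
dim = 768                                    # DINOv2-base token dimension
tokens = rng.standard_normal((h * w, dim))   # dummy patch tokens

# PCA via SVD on mean-centered tokens
centered = tokens - tokens.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
rgb = centered @ vt[:3].T                    # project onto the top-3 components

# min-max normalize each channel to [0, 1] and reshape to an image
rgb = (rgb - rgb.min(axis=0)) / (rgb.max(axis=0) - rgb.min(axis=0))
image = rgb.reshape(h, w, 3)                 # ready for matplotlib's imshow
```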
If you find our project useful, please consider citing our paper 📝 and giving a star ⭐.
```bibtex
@misc{chen2025visiontransformersselfdistilledregisters,
      title={Vision Transformers with Self-Distilled Registers},
      author={Yinjie Chen and Zipeng Yan and Chong Zhou and Bo Dai and Andrew F. Luo},
      year={2025},
      eprint={2505.21501},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.21501},
}
```

We gratefully thank the authors of CLIP, SCLIP, ClearCLIP, NACLIP, MMSegmentation, and DINOv2, on whose code our implementation is based.
