Ruoxin Chen1, Junwei Xi2, Zhiyuan Yan3, Keyue Zhang1, Shuang Wu1,
Jingyi Xie4, Xu Chen2, Lei Xu5, Isabel Guan6†, Taiping Yao1†, Shouhong Ding1
1Tencent YouTu Lab 2East China University of Science and Technology 3Peking University
4Renmin University of China 5Shenzhen University 6Hong Kong University of Science and Technology
2025/09: 🎉 Accepted by NeurIPS 2025 as Spotlight.
JPEG compression with a quality factor of 96 is applied to the synthetic images in GenImage, ForenSynths, and AIGCDetectionBenchmark to mitigate format bias (a minimal sketch of this re-compression step follows the table). The number of generators in each benchmark is given in parentheses: G denotes GAN, D diffusion, and AR auto-regressive models. All values are accuracy (%); where a benchmark contains multiple generator subsets, the mean ± standard deviation over those subsets is reported. Among the 11 benchmarks, Chameleon, Synthwildx, WildRF, and Bfree-Online are the 4 in-the-wild datasets. Notably, DDA is the first detector to achieve over 80% cross-data accuracy on Chameleon.
| Benchmark | NPR (CVPR'24) | UnivFD (CVPR'23) | FatFormer (CVPR'24) | SAFE (KDD'25) | C2P-CLIP (AAAI'25) | AIDE (ICLR'25) | DRCT (ICML'24) | AlignedForensics (ICLR'25) | DDA (ours) |
|---|---|---|---|---|---|---|---|---|---|
| GenImage (1G + 7D) | 51.5 ± 6.3 | 64.1 ± 10.8 | 62.8 ± 10.4 | 50.3 ± 1.2 | 74.4 ± 8.4 | 61.2 ± 11.9 | 84.7 ± 2.7 | 79.0 ± 22.7 | 91.7 ± 7.8 |
| DRCT-2M (16D) | 37.3 ± 15.0 | 61.8 ± 8.9 | 52.2 ± 5.7 | 59.3 ± 19.2 | 59.2 ± 9.9 | 64.6 ± 11.8 | 90.5 ± 7.4 | 95.5 ± 6.1 | 98.1 ± 1.4 |
| DDA-COCO (5D) | 42.2 ± 5.4 | 52.4 ± 1.5 | 51.7 ± 1.5 | 49.9 ± 0.3 | 51.3 ± 0.6 | 50.0 ± 0.4 | 60.2 ± 4.3 | 86.5 ± 19.1 | 92.2 ± 10.6 |
| EvalGEN (3D + 2AR) | 2.9 ± 2.7 | 15.4 ± 14.2 | 45.6 ± 33.1 | 1.1 ± 0.6 | 38.9 ± 31.2 | 19.1 ± 11.1 | 77.8 ± 5.4 | 68.0 ± 20.7 | 97.2 ± 4.2 |
| Synthbuster (9D) | 50.0 ± 2.6 | 67.8 ± 14.4 | 56.1 ± 10.7 | 46.5 ± 20.8 | 68.5 ± 11.4 | 53.9 ± 18.6 | 84.8 ± 3.6 | 77.4 ± 25.0 | 90.1 ± 5.6 |
| ForenSynths (11G) | 47.9 ± 22.6 | 77.7 ± 16.1 | 90.0 ± 11.8 | 49.7 ± 2.7 | 92.0 ± 10.1 | 59.4 ± 24.6 | 73.9 ± 13.4 | 53.9 ± 7.1 | 81.4 ± 13.9 |
| AIGCDetectionBenchmark (7G + 10D) | 53.1 ± 12.2 | 72.5 ± 17.3 | 85.0 ± 14.9 | 50.3 ± 1.1 | 81.4 ± 15.6 | 63.6 ± 13.9 | 81.4 ± 12.2 | 66.6 ± 21.6 | 87.8 ± 12.6 |
| Chameleon (Unknown) | 59.9 | 50.7 | 51.2 | 59.2 | 51.1 | 63.1 | 56.6 | 71.0 | 82.4 |
| Synthwildx (3D) | 49.8 ± 10.0 | 52.3 ± 11.3 | 52.1 ± 8.2 | 49.1 ± 0.7 | 57.1 ± 4.2 | 48.8 ± 0.8 | 55.1 ± 1.8 | 78.8 ± 17.8 | 90.9 ± 3.1 |
| WildRF (Unknown) | 63.5 ± 13.6 | 55.3 ± 5.7 | 58.9 ± 8.0 | 57.2 ± 18.5 | 59.6 ± 7.7 | 58.4 ± 12.9 | 50.6 ± 3.5 | 80.1 ± 10.3 | 90.3 ± 3.5 |
| Bfree-Online (Unknown) | 49.5 | 49.0 | 50.0 | 50.5 | 50.0 | 53.1 | 55.7 | 68.5 | 95.1 |
| Avg ACC | 46.1 ± 16.1 | 56.3 ± 16.5 | 59.6 ± 14.6 | 47.6 ± 16.0 | 62.1 ± 15.6 | 54.1 ± 12.8 | 70.1 ± 14.6 | 75.0 ± 11.1 | 90.7 ± 5.3 |
| Min ACC | 2.9 | 15.4 | 45.6 | 1.1 | 38.9 | 19.1 | 50.6 | 53.9 | 81.4 |
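The quality-factor-96 re-compression referenced above is plain JPEG re-encoding. A minimal sketch using Pillow is shown below; it is not the released preprocessing script, and the function name and paths are illustrative only.

```python
# Minimal sketch of re-saving synthetic images as JPEG with quality factor 96
# to mitigate format bias; recompress_jpeg and the paths are illustrative only.
from pathlib import Path
from PIL import Image

def recompress_jpeg(src_dir: str, dst_dir: str, quality: int = 96) -> None:
    """Re-encode every readable image under src_dir as a JPEG in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*")):
        try:
            img = Image.open(img_path).convert("RGB")  # drop alpha so JPEG can encode it
        except OSError:
            continue  # skip directories and non-image files
        img.save(dst / f"{img_path.stem}.jpg", format="JPEG", quality=quality)

# Example usage (paths are placeholders):
# recompress_jpeg("GenImage/fake", "GenImage/fake_jpeg96")
```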
All evaluation benchmarks used in our experiments are obtained from publicly available sources.
We sincerely thank the original authors for providing these valuable AIGI detection datasets.
| Benchmark | Paper | Download |
|---|---|---|
| GenImage | GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image | Google Drive |
| DRCT-2M | DRCT: Diffusion Reconstruction Contrastive Training towards Universe Detection of Diffusion Generated Images | ModelScope |
| Synthbuster | Synthbuster: Towards Detection of Diffusion Model Generated Images | Official Page |
| ForenSynths | CNN-generated images are surprisingly easy to spot... for now | Google Drive · CMU Box |
| AIGCDetectionBenchmark | A Comprehensive Benchmark for AI-generated Image Detection | ModelScope |
| Chameleon | A Sanity Check for AI-generated Image Detection | Contact: tattoo.ysl@gmail.com |
| SynthwildX | Raising the Bar of AI-generated Image Detection with CLIP | GitHub |
| WildRF | Real-Time Deepfake Detection in the Real-World | Google Drive |
| Bfree-Online | A Bias-Free Training Paradigm for More General AI-generated Image Detection | Official Download |
| DDA-COCO | Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable | ModelScope |
| EvalGEN | Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable | HuggingFace |
- The training dataset has been released on ModelScope and HuggingFace.
- The checkpoint has been released on ModelScope and HuggingFace.
- The DDA-COCO benchmark has been released on ModelScope and HuggingFace.
- The EvalGEN benchmark has been released on ModelScope and HuggingFace.
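For reference, a hedged sketch of fetching these assets with `huggingface_hub` is shown below; the repo IDs are placeholders, not the actual repository names, and should be replaced with the IDs linked from the release notes above.

```python
# Hypothetical download sketch using huggingface_hub; the repo IDs below are
# placeholders only -- substitute the actual DDA repository IDs before running.
from huggingface_hub import snapshot_download

# Model weights (repo_type defaults to "model").
ckpt_dir = snapshot_download(repo_id="<org>/DDA-checkpoint")

# Benchmark data are hosted as dataset repositories.
dda_coco_dir = snapshot_download(repo_id="<org>/DDA-COCO", repo_type="dataset")

print(ckpt_dir, dda_coco_dir)
```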
- Release arXiv paper with complete BibTeX citation
- Release checkpoint and inference code
- Release training set and training script
- Release code for DDA data construction
If you have any questions or suggestions, please feel free to contact us at cusmochen@tencent.com.
This WeChat group brings together researchers working on AI-generated image detection, including authors of Effort (ICML 2025 Oral), DRCT (ICML 2024 Spotlight), and related work. Our goal is to build a focused community where researchers can exchange ideas and inspire new directions in AIGI detection.
Part of this codebase is adapted from UniversalFakeDetect. Huge thanks to the original authors for sharing their excellent work!
If you find this repository useful for your work, please consider citing it as follows:
@inproceedings{chen2025dual,
title={Dual Data Alignment Makes {AI}-Generated Image Detector Easier Generalizable},
author={Ruoxin Chen and Junwei Xi and Zhiyuan Yan and Ke-Yue Zhang and Shuang Wu and Jingyi Xie and Xu Chen and Lei Xu and Isabel Guan and Taiping Yao and Shouhong Ding},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=C39ShJwtD5}
}
