Clone or download the repository, replace the API key in OPENAI_API_KEY.yaml, and run demo.py.
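A minimal sketch of loading the key in Python, assuming OPENAI_API_KEY.yaml stores the key under a single top-level field (the field name `OPENAI_API_KEY` below is an assumption, not necessarily the repository's actual schema):

```python
import yaml  # pip install pyyaml
from openai import OpenAI

# Load the API key from the config file; the field name is assumed.
with open("OPENAI_API_KEY.yaml") as f:
    api_key = yaml.safe_load(f)["OPENAI_API_KEY"]

client = OpenAI(api_key=api_key)
```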
See our other project for movement prediction: H-Splitter
This project explores the potential of Large Language Models (LLMs) in zero-shot anomaly detection for safe visual navigation.
With the assistance of the state-of-the-art real-time open-world object detection model YOLO-World and specialized prompts, the proposed framework identifies anomalies within camera-captured frames, including any possible obstacles, and then generates concise, audio-delivered descriptions that emphasize the abnormalities, assisting safe visual navigation in complex circumstances.
Moreover, our proposed framework leverages the advantages of LLMs and the open-vocabulary object detection model to achieve dynamic scenario switching, which allows users to transition smoothly from scene to scene and addresses a key limitation of traditional visual navigation.
Furthermore, this project explores the performance contribution of different prompt components, offering a vision for future improvements in visual accessibility and paving the way for LLMs in video anomaly detection and vision-language understanding.
We apply the latest YOLO-World for the open-world object detection task so the system adapts to any scenario and situation. The detection classes are generated by GPT-4 and can be replaced dynamically.
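A minimal sketch of this idea using the `ultralytics` YOLO-World API; the class list and weight file name are illustrative, not the repository's actual configuration:

```python
from ultralytics import YOLOWorld

# Load an open-vocabulary YOLO-World checkpoint (weight file name is illustrative).
model = YOLOWorld("yolov8s-world.pt")

# Detection classes can come from an LLM and be swapped at runtime,
# enabling the dynamic scenario switch described above.
indoor_classes = ["person", "chair", "table", "door", "stairs"]
model.set_classes(indoor_classes)

# Run detection on a camera frame (here, an image path for simplicity).
results = model.predict("frame.jpg", conf=0.25)
for box in results[0].boxes:
    print(results[0].names[int(box.cls)], float(box.conf))
```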
We apply GPT-3.5 for fast responses and low cost. We tested GPT-4 and GPT-4V but found them not cost-effective.
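A minimal sketch of the LLM call using the official `openai` Python client; the prompt wording and detection summary format are placeholder assumptions, not the project's actual prompts:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask the LLM to turn detections into a short, audio-friendly warning.
# The detection summary string here is an assumption for illustration.
detections = "person (center, large), chair (left, small)"
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system",
         "content": "You describe navigation hazards in one concise sentence."},
        {"role": "user",
         "content": f"Detected objects: {detections}. Describe any anomaly."},
    ],
)
print(response.choices[0].message.content)
```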
We implemented an H-Splitter to assist object detection and categorize the detected objects into 3 different types based on priority.
See our other project for more info: H-Splitter
We use YOLO-World with the H-Splitter for universal object detection. For any object that falls (a) in Area 3, or (b) in Area 1/2 while covering at least 15% of the window size, we record the corresponding frame as an anomaly (see the sketch below). We set this YOLO-World-H configuration as the ground truth for the benchmark.
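A minimal sketch of this anomaly rule, assuming axis-aligned boxes in pixel coordinates and a precomputed area assignment (the H-Splitter geometry itself is defined in the companion project):

```python
def box_area_ratio(box, frame_w, frame_h):
    """Fraction of the window covered by a bounding box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (frame_w * frame_h)

def is_anomaly(box, area_id, frame_w, frame_h, size_thresh=0.15):
    """YOLO-World-H rule: Area 3 is always an anomaly; Area 1/2 only
    when the object covers at least 15% of the window."""
    if area_id == 3:
        return True
    if area_id in (1, 2) and box_area_ratio(box, frame_w, frame_h) >= size_thresh:
        return True
    return False

# Example: a large box in Area 1 of a 1280x720 frame is flagged.
print(is_anomaly((200, 100, 900, 600), area_id=1, frame_w=1280, frame_h=720))  # True
```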
We pre-set the system with 3 sensitivity levels for reporting emergencies: low, normal, and high. We find that low sensitivity works well for daily use due to its low false-alarm rate.
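A minimal sketch of how such sensitivity presets could be wired up; the threshold values and parameter names below are purely illustrative assumptions, not the project's actual settings:

```python
# Hypothetical mapping from sensitivity level to reporting thresholds.
# Higher sensitivity -> lower thresholds -> more (possibly false) alarms.
SENSITIVITY_PRESETS = {
    "low":    {"conf_thresh": 0.50, "size_thresh": 0.20},
    "normal": {"conf_thresh": 0.35, "size_thresh": 0.15},
    "high":   {"conf_thresh": 0.20, "size_thresh": 0.10},
}

def should_report(confidence, size_ratio, sensitivity="low"):
    preset = SENSITIVITY_PRESETS[sensitivity]
    return confidence >= preset["conf_thresh"] and size_ratio >= preset["size_thresh"]
```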
We compare VisionGPT at low sensitivity with the ground truth to evaluate its performance. We find that VisionGPT achieves high accuracy and produces fewer false positives (unnecessary reports).
Please cite our work if you find this project helpful.
@article{wang2024visiongpt,
  title={VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation},
  author={Wang, Hao and Qin, Jiayou and Bastola, Ashish and Chen, Xiwen and Suchanek, John and Gong, Zihao and Razi, Abolfazl},
  journal={arXiv preprint arXiv:2403.12415},
  year={2024}
}