A Benchmark for Retrieval-Augmented Visually-rich Generation with Multi-modal Automated Evaluation
Official implementation of
"RAViG-Bench: A Benchmark for Retrieval-Augmented Visually-rich Generation with Multi-modal Automated Evaluation"
RAViG-Bench is a comprehensive benchmark for evaluating Retrieval-Augmented Visually-rich Generation (RAViG) systems—models that generate HTML/CSS code enriched with visual design informed by retrieved reference documents. Our benchmark introduces a multi-dimensional evaluation framework covering execution correctness, design quality, and content quality, supported by automated, multi-modal assessment tools.
- Multi-dimensional Evaluation: Assesses models across three critical dimensions:
  - Execution Correctness: Validates that generated HTML is syntactically correct, well-structured, and renderable.
  - Design Quality: Assesses visual design quality via automated screenshot analysis and heuristic checks.
  - Content Quality: Evaluates the reasonableness, comprehensiveness, and faithfulness of generated content using LLM-based prompting.
```
config/
└── authorization.json          # API keys and URLs for LLM services (e.g., OpenAI, Gemini)
execution_eval/                 # ✅ Functionality Validation
└── check_html.py               # Validates HTML syntax, structure, and renderability
design_eval/                    # 🎨 Design Quality Assessment
├── screenshot-tool/
│   ├── module_screenshot.py    # Captures screenshots of sections based on H1/H2 headings
│   └── web_screenshot.py       # Takes full-page screenshots
├── big_charts.py               # Detects oversized chart elements
├── big_svg.py                  # Identifies excessively large SVG components
├── missing.py                  # Flags missing expected UI elements
├── occlusion.py                # Detects overlapping or occluded content
├── color_detect.py             # Evaluates text/background color contrast (WCAG compliance)
├── color_detect_chart.py       # Specialized contrast check for chart elements
├── overflow_detect.py          # Identifies layout overflow issues
└── merge_results.py            # Aggregates all design evaluation metrics
information_eval/               # 📝 Content Quality Evaluation
├── information_prompts/        # Prompt templates for assessing Reasonableness, Comprehensiveness, and Faithfulness
└── information_eval_report.sh  # Script to run LLM-based content evaluation
functions/                      # Helper utilities
data/                           # Dataset and project-related data
├── dataset/                    # RAViG-Bench dataset
├── few_shots/                  # Few-shot examples for design_eval
└── test_case/                  # Sample inputs
```
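To make the heuristic design checks concrete: `color_detect.py` evaluates text/background contrast against WCAG. The snippet below is a minimal, self-contained sketch of the standard WCAG 2.1 contrast-ratio formula, shown for illustration only; it is not the benchmark's actual implementation.

```python
def _linearize(channel: int) -> float:
    """Convert one sRGB channel (0-255) to linear light, per WCAG 2.1."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio between two colors, in the range 1.0-21.0."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background yields the maximum ratio of 21.0;
# WCAG AA requires at least 4.5 for normal-size text.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))
```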
Run the full evaluation pipeline on your model's generated HTML outputs:

```bash
bash run_eval.sh --base_dir <OUTPUT_DIR> --infer_file_name <GENERATED_HTML_FILE> --model_name <YOUR_MODEL_NAME>
```

Note: Before running, ensure:

- `config/authorization.json` contains valid API keys for any required LLM services.
- The `<OUTPUT_DIR>` contains your generated HTML files.
- Parameters in `run_eval.sh` (e.g., paths, thresholds) are adjusted as needed.
This command will:
- Validate HTML execution correctness
- Assess visual design quality via automated screenshot analysis and heuristic checks
- Evaluate content quality using LLM-based prompting
Results are saved as structured JSON reports under the specified `--base_dir`.
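The report file names and schemas are defined by the evaluation scripts; a quick, generic way to inspect whatever was written (the `base_dir` value below is a hypothetical example) is to glob the JSON files:

```python
import glob
import json
import os

base_dir = "outputs/my_model"  # the same value passed to --base_dir (hypothetical path)

# Print each JSON report path and its top-level keys for a quick overview.
for path in sorted(glob.glob(os.path.join(base_dir, "**", "*.json"), recursive=True)):
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    keys = list(report.keys()) if isinstance(report, dict) else f"<{type(report).__name__}>"
    print(f"{path}: {keys}")
```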
To test the pipeline on sample data:

```bash
bash run_eval_demo.sh
```

This runs a lightweight demo using built-in example outputs and prints summarized metrics to the console. It is useful for verifying the installation and understanding the output format.
1. Python Environment

   We recommend Python ≥ 3.10. Create a virtual environment:

   ```bash
   python -m venv ravig_env
   source ravig_env/bin/activate      # Linux/Mac
   # or: ravig_env\Scripts\activate   # Windows
   ```
2. Install Requirements

   ```bash
   pip install -r requirements.txt
   ```

   (Ensure `requirements.txt` is included in your repo with dependencies like `selenium`, `Pillow`, `beautifulsoup4`, `openai`, etc.)
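   If the file is not present, a minimal `requirements.txt` along these lines (package list taken from the note above; versions are illustrative and left unpinned) would cover the dependencies mentioned:

   ```text
   selenium
   Pillow
   beautifulsoup4
   openai
   ```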
3. Configure API Keys

   Edit `config/authorization.json`:
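   The exact schema is whatever the repo's scripts expect; the sketch below only illustrates the kind of entries ("API keys and URLs for LLM services") this file holds, and every key name here is an assumption:

   ```jsonc
   {
     // Hypothetical structure; check the loading code in functions/ or the eval scripts
     // for the actual field names.
     "openai": {
       "api_key": "sk-...",
       "base_url": "https://api.openai.com/v1"
     },
     "gemini": {
       "api_key": "<YOUR_GEMINI_API_KEY>"
     }
   }
   ```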
4. Install Browser Drivers (for screenshot tools)

   Ensure ChromeDriver or GeckoDriver is installed and in your `PATH` if using the Selenium-based screenshot tools.
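   To sanity-check the driver setup independently of the benchmark scripts (this standalone snippet is not part of the repo), you can take a headless-Chrome screenshot with Selenium:

   ```python
   from selenium import webdriver
   from selenium.webdriver.chrome.options import Options

   # Headless Chrome; assumes a matching chromedriver is discoverable on PATH
   # (recent Selenium versions can also resolve the driver automatically).
   options = Options()
   options.add_argument("--headless=new")
   options.add_argument("--window-size=1280,2000")

   driver = webdriver.Chrome(options=options)
   try:
       driver.get("file:///absolute/path/to/sample.html")  # hypothetical local HTML file
       driver.save_screenshot("sample.png")                 # captures the current viewport
   finally:
       driver.quit()
   ```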
This project is licensed under the Apache License 2.0.