Support Multi-Modal LLM #234
Conversation
Summary of Changes
Hello @hiyuchang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the framework by integrating multi-modal capabilities into its Large Language Model (LLM) operations. The primary goal is to allow models to process and generate content that combines text with visual data, such as images and videos. This involves extending core data structures to accommodate multi-modal inputs, introducing new API methods for multi-modal chat and generation, and updating the training infrastructure to properly handle these new data types. The changes lay the groundwork for developing and training advanced multi-modal LLMs within the system.
Highlights
- Multi-Modal LLM Support: This PR introduces comprehensive support for multi-modal Large Language Models (LLMs) within the framework, enabling the processing and generation of responses that incorporate both text and visual data (images and videos).
- New Multi-Modal API Endpoints: New `chat_mm` and `generate_mm` methods have been added to the `ModelWrapper` and specifically implemented for `vllm_model`, allowing users to interact with multi-modal LLMs through chat and generation interfaces (see the first sketch after this list).
- Enhanced Data Structures for Multi-Modality: The core `Experience` data structure has been extended to store `multi_modal_data` and `multi_modal_inputs`, ensuring that multi-modal information is properly captured and propagated throughout the system, from data loading to training (see the second sketch after this list).
- Multi-Modal Workflow Integration: A new `SimpleMMWorkflow` has been introduced, demonstrating how multi-modal data can be integrated into existing task workflows, particularly for tasks involving both text and images.
- Trainer Updates for Multi-Modal Data: The training pipeline, specifically the `verl_trainer`, has been updated to correctly handle and process multi-modal inputs, including the dynamic calculation of `position_ids` for models like Qwen2VL (see the third sketch after this list).
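To make the new surface concrete, here is a rough usage sketch of the `chat_mm` and `generate_mm` entry points. Only the method names come from this PR; the message layout, keyword arguments, and the pre-built `model_wrapper` object are assumptions for illustration, not the framework's actual API.

```python
# Hypothetical usage of the new multi-modal methods. Only the names
# chat_mm / generate_mm come from the PR; the message schema, keyword
# arguments, and model_wrapper construction are illustrative guesses.
from PIL import Image

# model_wrapper is assumed to be an already-initialized ModelWrapper
# backed by vllm_model (construction omitted here).
image = Image.open("chart.png")  # any local image file

# Chat-style call: the user turn carries both text and the raw image,
# which the wrapper forwards to the underlying vLLM engine.
messages = [
    {"role": "user", "content": "Describe the chart in this image.", "images": [image]}
]
chat_response = model_wrapper.chat_mm(messages)

# Generation-style call: a plain prompt plus the accompanying image(s).
completion = model_wrapper.generate_mm(
    prompts=["Caption this image:"],
    images=[image],
)
```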
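In the same spirit, a minimal sketch of how the extended `Experience` record might carry visual data next to token ids. The field names `multi_modal_data` and `multi_modal_inputs` come from the PR; the constructor shape, the other fields, and the dictionary keys shown are assumptions.

```python
# Sketch of an Experience carrying visual data alongside token ids. Only
# multi_modal_data / multi_modal_inputs are named by this PR; every other
# field, key, and tensor shape below is an illustrative assumption.
import torch
from PIL import Image

image = Image.open("question_figure.png")

experience = Experience(  # the framework's rollout record, extended by this PR
    tokens=torch.tensor([101, 2054, 2003, 1996, 3437, 102]),
    prompt_length=6,
    # Raw media handed to the rollout engine (e.g. vLLM) at generation time.
    multi_modal_data={"image": [image]},
    # Preprocessed tensors (e.g. from a Hugging Face processor) that the
    # trainer later feeds into the model's forward pass.
    multi_modal_inputs={
        "pixel_values": torch.randn(1024, 1176),
        "image_grid_thw": torch.tensor([[1, 32, 32]]),
    },
)
```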
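Finally, a hedged sketch of the trainer-side branching for `position_ids`: Qwen2VL-style models need rotary position indices that are aware of the visual token grid, while text-only models keep ordinary cumulative positions. The helper below is hypothetical; only the fact that `verl_trainer` now computes `position_ids` dynamically for models like Qwen2VL comes from the PR.

```python
import torch

def build_position_ids(input_ids, attention_mask, model_type, image_grid_thw=None):
    """Hypothetical helper illustrating the branching described in the PR."""
    if model_type == "qwen2_vl" and image_grid_thw is not None:
        # In the actual trainer this branch would delegate to the model's
        # multi-modal rope-index routine (not reproduced here): Qwen2VL's
        # position ids span temporal / height / width axes rather than a
        # single flat sequence index.
        raise NotImplementedError("delegate to the model's rope-index helper")
    # Text-only fallback: positions count valid tokens under the mask.
    return torch.clamp(attention_mask.long().cumsum(dim=-1) - 1, min=0)
```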
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code Review
This pull request adds foundational support for multi-modal models, including new methods for handling image and video data in experiences and workflows. The changes are extensive, touching upon configuration, data structures, model wrappers, and the training pipeline. My review focuses on improving code correctness, performance, and maintainability. Key feedback includes fixing a potential crash in experience creation, addressing performance bottlenecks in multi-modal data processing, and correcting logic in the new VLM and workflow implementations. Overall, this is a great step towards multi-modal capabilities, and the following suggestions aim to solidify this new functionality.
/unittest-module-common

/unittest-module-trainer

[CTRF test report: Summary, Failed Tests, Tests (GitHub Test Reporter by CTRF 💚)]
/unittest-module-common

[CTRF test report: Summary, Tests (GitHub Test Reporter by CTRF 💚)]
/unittest-module-trainer

[CTRF test report: Summary, Skipped, Tests (GitHub Test Reporter by CTRF 💚)]
Description
- `vllm_model.chat_mm` and `vllm_model.generate_mm`
- `min_pixels` and `max_pixels` for the processor (see the sketch after this list)
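The `min_pixels` and `max_pixels` knobs presumably bound how many visual tokens the processor produces per image. Below is a small sketch of how such limits are typically passed to a Hugging Face processor; the Qwen2-VL checkpoint name and the exact pixel budgets are illustrative choices, not values taken from this PR.

```python
# Sketch of bounding the vision processor's resolution with min_pixels /
# max_pixels; the checkpoint name and budgets are illustrative, not from the PR.
from transformers import AutoProcessor

min_pixels = 256 * 28 * 28    # floor on the per-image pixel budget
max_pixels = 1280 * 28 * 28   # ceiling, keeps large images from blowing up memory

processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
```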
Checklist
Please check the following items before the code is ready to be reviewed.