Skip to content

Conversation

@hiyuchang
Copy link
Collaborator

@hiyuchang hiyuchang commented Aug 28, 2025

Description

  • Support vllm_model.chat_mm and vllm_model.generate_mm
  • Support API service for VLM
  • Allow Users to define min_pixels and max_pixels for processor

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @hiyuchang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the framework by integrating multi-modal capabilities into its Large Language Model (LLM) operations. The primary goal is to allow models to process and generate content that combines text with visual data, such as images and videos. This involves extending core data structures to accommodate multi-modal inputs, introducing new API methods for multi-modal chat and generation, and updating the training infrastructure to properly handle these new data types. The changes lay the groundwork for developing and training advanced multi-modal LLMs within the system.

Highlights

  • Multi-Modal LLM Support: This PR introduces comprehensive support for multi-modal Large Language Models (LLMs) within the framework, enabling the processing and generation of responses that incorporate both text and visual data (images and videos).
  • New Multi-Modal API Endpoints: New chat_mm and generate_mm methods have been added to the ModelWrapper and specifically implemented for vllm_model, allowing users to interact with multi-modal LLMs through chat and generation interfaces.
  • Enhanced Data Structures for Multi-Modality: The core Experience data structure has been extended to store multi_modal_data and multi_modal_inputs, ensuring that multi-modal information is properly captured and propagated throughout the system, from data loading to training.
  • Multi-Modal Workflow Integration: A new SimpleMMWorkflow has been introduced, demonstrating how multi-modal data can be integrated into existing task workflows, particularly for tasks involving both text and images.
  • Trainer Updates for Multi-Modal Data: The training pipeline, specifically the verl_trainer, has been updated to correctly handle and process multi-modal inputs, including the dynamic calculation of position_ids for models like Qwen2VL.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds foundational support for multi-modal models, including new methods for handling image and video data in experiences and workflows. The changes are extensive, touching upon configuration, data structures, model wrappers, and the training pipeline. My review focuses on improving code correctness, performance, and maintainability. Key feedback includes fixing a potential crash in experience creation, addressing performance bottlenecks in multi-modal data processing, and correcting logic in the new VLM and workflow implementations. Overall, this is a great step towards multi-modal capabilities, and the following suggestions aim to solidify this new functionality.

@hiyuchang
Copy link
Collaborator Author

/unittest-module-common

@hiyuchang hiyuchang changed the title [WIP] Support Multi-Modal LLM Support Multi-Modal LLM Sep 2, 2025
@hiyuchang
Copy link
Collaborator Author

/unittest-module-trainer

@github-actions
Copy link

github-actions bot commented Sep 2, 2025

Summary

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Other ❓ Flaky 🍂 Duration ⏱️
31 21 10 0 0 0 26ms

Failed Tests

Failed Tests ❌ Fail Message
❌ tests/common/vllm_test.py::ModelWrapperTest_0::test_generate The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_1::test_generate The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_2::test_generate The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_3::test_generate The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_4::test_generate The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_5::test_generate The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_6::test_generate The test failed in the call phase
❌ tests/common/vllm_test.py::TestAPIServer::test_api The test failed in the call phase
❌ tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls The test failed in the call phase
❌ tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls The test failed in the call phase

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 3ms
tests/common/config_test.py::TestConfig::test_config_flatten 1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 4ms
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_gather 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 2ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 1ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 1ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate 1ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate 1ms
tests/common/vllm_test.py::ModelWrapperTest_5::test_generate 1ms
tests/common/vllm_test.py::ModelWrapperTest_6::test_generate 1ms
tests/common/vllm_test.py::TestAPIServer::test_api 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 1ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 1ms

Github Test Reporter by CTRF 💚

@hiyuchang
Copy link
Collaborator Author

/unittest-module-common

@github-actions
Copy link

github-actions bot commented Sep 2, 2025

Summary

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Other ❓ Flaky 🍂 Duration ⏱️
31 31 0 0 0 0 337ms

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 3ms
tests/common/config_test.py::TestConfig::test_config_flatten 1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 4ms
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_gather 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 38ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 16ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 16ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate 54ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate 48ms
tests/common/vllm_test.py::ModelWrapperTest_5::test_generate 36ms
tests/common/vllm_test.py::ModelWrapperTest_6::test_generate 48ms
tests/common/vllm_test.py::TestAPIServer::test_api 24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 21ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 19ms

Github Test Reporter by CTRF 💚

@hiyuchang
Copy link
Collaborator Author

/unittest-module-trainer

@github-actions
Copy link

github-actions bot commented Sep 2, 2025

Summary

Tests 📝 Passed ✅ Failed ❌ Skipped ⏭️ Other ❓ Flaky 🍂 Duration ⏱️
14 13 0 1 0 0 743ms

Skipped

Tests Status
tests/trainer/trainer_test.py::TestTrainerMultiModal::test_trainer skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer 134ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 61ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 45ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 44ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 49ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 57ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 62ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 31ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 29ms
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 29ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue 67ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue 69ms
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 56ms
tests/trainer/trainer_test.py::TestTrainerMultiModal::test_trainer ⏭️ 1ms

Github Test Reporter by CTRF 💚

@pan-x-c pan-x-c merged commit 6b5edec into modelscope:main Sep 3, 2025
1 check passed
yaochaorui pushed a commit to yaochaorui/Trinity-RFT that referenced this pull request Sep 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants