[Feat] Allow user to set `train_batch_size` #177

hiyuchang · 2025-08-11T10:08:07Z

Description

As the title says: allow users to set `train_batch_size`.
This PR also fixes some bugs in the config manager.

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist

Summary of Changes

Hello @hiyuchang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've separated the batch_size into explore_batch_size and train_batch_size. This clarifies the different roles of batch sizes in the system, with explore_batch_size governing data collection and train_batch_size controlling the actual training process. This change also includes several bug fixes in the configuration management and updates across the codebase to ensure consistency and correctness.

Highlights

Introduction of train_batch_size: A new configuration parameter train_batch_size has been added to BufferConfig to explicitly define the batch size used for model training. The existing batch_size now specifically refers to the batch size for data exploration.
Unified Batch Size Logic: The internal read_batch_size has been deprecated and replaced with train_batch_size across various components, including buffer readers, sample strategies, and policy loss functions, streamlining how batch sizes are handled internally.
Improved Configuration Management: The configuration manager UI and backend logic have been updated to correctly present and process both explore_batch_size and train_batch_size, enhancing user clarity and preventing common configuration errors.
Simplified PPO Mini-Batch Calculation: The calculation of ppo_mini_batch_size no longer requires manual multiplication by repeat_times, as it now directly leverages the train_batch_size, leading to more intuitive and less error-prone batch sizing for PPO algorithms.
Comprehensive Documentation and Example Updates: All relevant documentation, tutorial examples, and configuration files have been updated to reflect the new train_batch_size parameter and the refined batch size semantics, ensuring users have accurate and up-to-date guidance.
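
To make the split between the two batch sizes concrete, here is a minimal sketch. It is not the project's actual code: `BufferConfigSketch` and `resolve_train_batch_size` are hypothetical names, and the fallback to `batch_size * repeat_times` when `train_batch_size` is unset is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferConfigSketch:
    """Hypothetical stand-in for BufferConfig; only the two batch-size fields."""

    batch_size: int  # number of tasks the explorer handles per step
    train_batch_size: Optional[int] = None  # experiences the trainer reads per step


def resolve_train_batch_size(buffer: BufferConfigSketch, repeat_times: int) -> int:
    """Return the effective train batch size.

    Assumption: if the user leaves train_batch_size unset, fall back to
    batch_size * repeat_times (one experience per rollout of every task).
    """
    if buffer.train_batch_size is not None:
        return buffer.train_batch_size
    return buffer.batch_size * repeat_times


# Unset: derived from the explore batch size and repeat_times.
print(resolve_train_batch_size(BufferConfigSketch(batch_size=4), repeat_times=8))  # 32
# Set explicitly: the user's value wins.
print(resolve_train_batch_size(BufferConfigSketch(batch_size=4, train_batch_size=16), repeat_times=8))  # 16
```

Under this reading, a user who sets `train_batch_size` directly no longer has to multiply by `repeat_times` by hand when sizing PPO mini-batches.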

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

hiyuchang · 2025-08-11T10:10:16Z

/unittest-all

gemini-code-assist

Code Review

This pull request introduces train_batch_size as a new configuration parameter to distinguish it from batch_size, which is now used for the number of tasks. The changes are consistently applied across the codebase, including documentation, examples, tests, and core logic. This is a good refactoring that improves clarity.

I've found a few issues:

An error message in trinity/common/verl_config.py was not updated to use the new parameter name.
The default_eval_workflow_type configuration seems to have been unintentionally removed from the config manager UI (trinity/manager/config_manager.py), which prevents users from setting it.

Overall, the changes are well-implemented. Addressing the identified issues will further improve the quality of the code.
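
The first issue is about an error message that still names the old parameter. The actual check in `trinity/common/verl_config.py` is not shown in this thread, so the following is only a sketch of what a message using the new name could look like. `check_mini_batch` is a hypothetical helper, and the divisibility rule is an assumption.

```python
def check_mini_batch(train_batch_size: int, ppo_mini_batch_size: int) -> None:
    """Hypothetical validation; the divisibility rule is an assumption."""
    if train_batch_size % ppo_mini_batch_size != 0:
        # The message names train_batch_size, not the older batch_size.
        raise ValueError(
            f"train_batch_size ({train_batch_size}) must be divisible by "
            f"ppo_mini_batch_size ({ppo_mini_batch_size})"
        )


check_mini_batch(32, 8)  # passes silently
```

Naming the parameter the user actually sets in the message makes a misconfiguration easier to trace back to the config file.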

hiyuchang · 2025-08-11T11:26:24Z

/unittest-all

hiyuchang · 2025-08-12T02:26:30Z

/unittest-all

github-actions · 2025-08-12T02:52:19Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
93	91	2	0	0	0	1.5s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	The test failed in the call phase due to an assertion error

Tests

Test Name	Status	Duration
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_correct_bias_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_duplicate_add_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_grpo_args	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_reward_variance_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_step_wise_grpo_strategy	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer	✅	3ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	6ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer	✅	4ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	36ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	53ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	49ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	45ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	20ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	❌	19ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer	✅	1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	66ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	61ms
tests/explorer/explorer_test.py::TestExplorerWithAddStrategy::test_explorer	✅	36ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	19ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable	✅	1ms
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	28ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	62ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	65ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	87ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	102ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	51ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	50ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer	✅	1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer	✅	135ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	48ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer	✅	43ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	56ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	34ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	❌	26ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue	✅	65ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue	✅	61ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	1ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins	✅	4ms

Github Test Reporter by CTRF 💚

hiyuchang · 2025-08-12T04:52:23Z

/unittest-module-common

github-actions · 2025-08-12T04:58:06Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
26	26	0	0	0	0	299ms

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	2ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	36ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	52ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	47ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	47ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	22ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms

hiyuchang · 2025-08-12T04:58:17Z

/unittest-module-trainer

github-actions · 2025-08-12T05:07:15Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
9	9	0	0	0	0	480ms

Tests

Test Name	Status	Duration
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer	✅	2ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer	✅	135ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	62ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer	✅	44ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	54ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	31ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	29ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue	✅	60ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue	✅	56ms

add train_batch_size

261b6d7

gemini-code-assist bot reviewed Aug 11, 2025

hiyuchang added 2 commits August 11, 2025 19:24

fix config

957e237

fix yaml

0ab649a

pan-x-c reviewed Aug 11, 2025

trinity/common/config.py (outdated, resolved)

trinity/common/config.py (outdated, resolved)

trinity/common/verl_config.py (outdated, resolved)

hiyuchang added 3 commits August 12, 2025 09:37

fix comment

9c8bfbd

fix test

ed9deaa

Merge branch 'main' into feat/add_train_bs

28fd606

fix test

2ede699

fix eval_workflow_type

165fd2c

pan-x-c approved these changes Aug 12, 2025

pan-x-c merged commit 952c2d3 into modelscope:main Aug 12, 2025
2 checks passed
