@hiyuchang (Collaborator) commented Jul 11, 2025

Description

  1. Refactor RewardFn in reward_fn.py to fit rm-gallery
  2. Move the basic (math/countdown) rewards into basic_reward.py
  3. Add reward_fn_args
  4. Add MathRMWorkflow and use it in grpo_math.yaml
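
To illustrate how a reward-function class and a `reward_fn_args` mapping could fit together, here is a minimal sketch. It is not the code from this PR: the registry, the `register` decorator, `ToyMathReward`, `build_reward_fn`, and the `correct_score`/`wrong_score` parameters are all hypothetical names chosen for the example, and the real `RewardFn` interface in `reward_fn.py` may differ.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Type

# Hypothetical registry mapping a config-friendly name to a reward class.
REWARD_REGISTRY: Dict[str, Type["RewardFn"]] = {}


def register(name: str) -> Callable[[Type["RewardFn"]], Type["RewardFn"]]:
    """Class decorator that stores the class in REWARD_REGISTRY under `name`."""

    def decorator(cls: Type["RewardFn"]) -> Type["RewardFn"]:
        REWARD_REGISTRY[name] = cls
        return cls

    return decorator


class RewardFn(ABC):
    """Illustrative base class (assumed shape, not the project's actual API)."""

    @abstractmethod
    def __call__(self, response: str, truth: str) -> Dict[str, float]:
        """Return a mapping from reward name to score."""


@register("toy_math_reward")
class ToyMathReward(RewardFn):
    """Exact-match reward whose scores are configurable via constructor args."""

    def __init__(self, correct_score: float = 1.0, wrong_score: float = 0.0):
        self.correct_score = correct_score
        self.wrong_score = wrong_score

    def __call__(self, response: str, truth: str) -> Dict[str, float]:
        is_correct = response.strip() == truth.strip()
        score = self.correct_score if is_correct else self.wrong_score
        return {"accuracy": score}


def build_reward_fn(name: str, reward_fn_args: Optional[dict] = None) -> RewardFn:
    """Look up a registered class and construct it with the given keyword args."""
    return REWARD_REGISTRY[name](**(reward_fn_args or {}))
```

In this sketch, a config entry would supply the name and the argument mapping, for example `build_reward_fn("toy_math_reward", {"correct_score": 2.0})`, so that scoring parameters can change without touching the reward class itself.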

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@hiyuchang
Collaborator Author

/run-unittest

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 | 0 | 4ms |

Tests

Test Name Status Flaky Duration

Github Test Reporter by CTRF 💚

@hiyuchang
Collaborator Author

/run-unittest

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 40 | 40 | 0 | 0 | 0 | 0 | 1.2s |

Tests

Test Name Status Flaky Duration
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer 3ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer 6ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer 3ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 4ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/vllm_test.py::TestModelWrapperSyncV0::test_generate 42ms
tests/common/vllm_test.py::TestModelWrapperAsyncV0::test_generate 40ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV0::test_generate 50ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV1::test_generate 50ms
tests/common/vllm_test.py::TestModelWrapperAsyncV1::test_generate 38ms
tests/common/vllm_test.py::TestAPIServer::test_api 24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask 1ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer 1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 128ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer 101ms
tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool 22ms
tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool_with_auxiliary_models 4ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable 1ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer 1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer 266ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 100ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer 59ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 85ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 43ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode 109ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins 4ms


@hiyuchang
Collaborator Author

/run-unittest

@hiyuchang changed the title from "[WIP] Refactor RewardFn" to "Refactor RewardFn" on Jul 14, 2025
@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 50 | 50 | 0 | 0 | 0 | 0 | 1.3s |

Tests

Test Name Status Flaky Duration
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer 3ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 1ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 4ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer 4ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 4ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/vllm_test.py::TestModelWrapperSyncV0::test_generate 41ms
tests/common/vllm_test.py::TestModelWrapperAsyncV0::test_generate 40ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV0::test_generate 50ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV1::test_generate 51ms
tests/common/vllm_test.py::TestModelWrapperAsyncV1::test_generate 38ms
tests/common/vllm_test.py::TestAPIServer::test_api 24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask 1ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer 1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 96ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer 83ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 19ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 8ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 13ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable 1ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer 1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer 247ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 94ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer 59ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 85ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 43ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue 90ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue 97ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins 4ms


@pan-x-c (Collaborator) commented Jul 14, 2025

/run-unittest

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 51 | 50 | 0 | 1 | 0 | 0 | 1.3s |

Skipped

Tests Status
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer 3ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 1ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 4ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer 4ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 4ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/vllm_test.py::TestModelWrapperSyncV0::test_generate 42ms
tests/common/vllm_test.py::TestModelWrapperAsyncV0::test_generate 40ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV0::test_generate 50ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV1::test_generate 51ms
tests/common/vllm_test.py::TestModelWrapperAsyncV1::test_generate 38ms
tests/common/vllm_test.py::TestAPIServer::test_api 24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask 1ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer 1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 77ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer 96ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 19ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 8ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 13ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow ⏭️ 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable 1ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer 1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer 244ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 93ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer 60ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 94ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 45ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue 96ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue 88ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins 4ms


@pan-x-c merged commit 63d4920 into modelscope:main on Jul 14, 2025
1 check passed