Conversation

@HYLcool
Collaborator

@HYLcool HYLcool commented Jun 30, 2025

Description

  • Support activating the experience pipeline between the explorer and trainer. The pipeline reshapes rewards and augments the responses of experiences produced by the explorer, then sends them to the trainer.
    • Add a new gsm8k example that uses the experience pipeline for reward shaping, along with a basic, naive reward-shaping architecture.
    • Add a new service route for the experience pipeline, which stays active for the whole training procedure, and a stop_all route that stops all services when training finishes.
    • Users only need to specify the names of the input/output buffers for the experience pipeline; during config checking, these are resolved from the existing buffers in the explorer or trainer.
  • Others:
    • Add explicit type casting after initializing the data class.
    • Remove the old, unused schema code.
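
To illustrate the idea of reshaping rewards between an input buffer and an output buffer, here is a minimal sketch. It is not the code added in this PR: the `Experience` class, `length_penalty`, and `run_pipeline` are hypothetical names chosen for illustration, and plain Python lists stand in for the explorer and trainer buffers.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Experience:
    """Hypothetical stand-in for an explorer experience (not the project's class)."""
    response: str
    reward: float


def length_penalty(exp: Experience, max_chars: int = 200, weight: float = 0.1) -> float:
    """Illustrative shaping term: penalize responses longer than max_chars."""
    return -weight if len(exp.response) > max_chars else 0.0


def run_pipeline(
    input_buffer: Iterable[Experience],
    shapers: List[Callable[[Experience], float]],
) -> List[Experience]:
    """Read experiences, add every shaping term to the reward, return the output buffer."""
    output_buffer = []
    for exp in input_buffer:
        shaped = exp.reward + sum(shaper(exp) for shaper in shapers)
        output_buffer.append(replace(exp, reward=shaped))
    return output_buffer


# Lists stand in for the explorer's output buffer and the trainer's input buffer.
explorer_output = [Experience("42", 1.0), Experience("x" * 500, 1.0)]
trainer_input = run_pipeline(explorer_output, [length_penalty])
```

Keeping each shaping term as a separate function means further terms can be added to the `shapers` list without changing the loop.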

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@HYLcool HYLcool self-assigned this Jun 30, 2025
@HYLcool HYLcool added the enhancement New feature or request label Jun 30, 2025
@HYLcool
Collaborator Author

HYLcool commented Jun 30, 2025

All unit tests for the data module pass.

@HYLcool
Collaborator Author

HYLcool commented Jun 30, 2025

/run-unittest

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 40 | 39 | 1 | 0 | 0 | 0 | 0 | 1.2s |

Failed Tests

| Failed Tests ❌ | Fail Message |
| --- | --- |
| ❌ tests/common/config_test.py::TestConfig::test_all_examples_are_valid | The test failed in the call phase due to an exception |

Flaky Tests

No flaky tests ✨

Skipped

No skipped tests ✨

Tests

Test Name Status Flaky Duration
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer 3ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer 3ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer 3ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 4ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/vllm_test.py::TestModelWrapperSyncV0::test_generate 42ms
tests/common/vllm_test.py::TestModelWrapperAsyncV0::test_generate 40ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV0::test_generate 49ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV1::test_generate 52ms
tests/common/vllm_test.py::TestModelWrapperAsyncV1::test_generate 38ms
tests/common/vllm_test.py::TestAPIServer::test_api 24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask 1ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer 1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 119ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer 124ms
tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool 22ms
tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool_with_auxiliary_models 4ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable 1ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer 1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer 264ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 93ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer 67ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 70ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 53ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode 113ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins 4ms

Github Test Reporter by CTRF 💚

* set ray_namespace for StorageConfigs to the global ray_namespace if they are not set
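
The defaulting rule described in this commit message can be sketched as follows. This is an assumption-laden illustration, not the project's code: the reduced `StorageConfig` and `Config` classes and the `check_and_update` method are hypothetical, and only the `ray_namespace` field name comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StorageConfig:
    """Hypothetical, reduced storage config; only the fields used here."""
    name: str
    ray_namespace: Optional[str] = None


@dataclass
class Config:
    """Hypothetical global config holding the shared namespace and its buffers."""
    ray_namespace: str = "default"
    buffers: List[StorageConfig] = field(default_factory=list)

    def check_and_update(self) -> None:
        """Fill in unset per-buffer namespaces from the global value."""
        for buffer in self.buffers:
            if buffer.ray_namespace is None:
                buffer.ray_namespace = self.ray_namespace


config = Config(
    ray_namespace="trinity",
    buffers=[StorageConfig("explorer_output"), StorageConfig("custom", "other")],
)
config.check_and_update()
```

An explicitly set per-buffer namespace is left untouched; only missing values inherit the global one.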
@HYLcool
Collaborator Author

HYLcool commented Jun 30, 2025

/run-unittest

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 40 | 40 | 0 | 0 | 0 | 0 | 0 | 1.2s |

Failed Tests

No failed tests ✨

Flaky Tests

No flaky tests ✨

Skipped

No skipped tests ✨

Tests

Test Name Status Flaky Duration
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer 3ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer 2ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer 3ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 4ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/vllm_test.py::TestModelWrapperSyncV0::test_generate 41ms
tests/common/vllm_test.py::TestModelWrapperAsyncV0::test_generate 40ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV0::test_generate 50ms
tests/common/vllm_test.py::TestModelWrapperAsyncTPV1::test_generate 51ms
tests/common/vllm_test.py::TestModelWrapperAsyncV1::test_generate 38ms
tests/common/vllm_test.py::TestAPIServer::test_api 24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask 1ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer 1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 123ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer 120ms
tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool 22ms
tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool_with_auxiliary_models 4ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable 1ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer 1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer 260ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 98ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer 72ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 69ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 52ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode 115ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins 4ms

Github Test Reporter by CTRF 💚

@pan-x-c
Collaborator

pan-x-c commented Jun 30, 2025

LGTM

@yxdyc yxdyc requested a review from lingzhq June 30, 2025 09:30
Collaborator

@yxdyc yxdyc left a comment

Only two minor issues; everything else LGTM. After this PR, can we reply to this pending issue?

@HYLcool
Collaborator Author

HYLcool commented Jun 30, 2025

Only two minor issues; everything else LGTM. After this PR, can we reply to this pending issue?

Not yet. This implementation is only integrated into RftDataset, but it could be transformed into Data-Juicer along with the buffer module in Trinity.

@yxdyc yxdyc merged commit 339d658 into main Jun 30, 2025
3 checks passed
@HYLcool HYLcool deleted the feat/exp_pipeline branch June 30, 2025 12:18
Labels

enhancement New feature or request

4 participants