Support Experience Pipeline #105

HYLcool · 2025-06-30T06:35:42Z

Description

Adds support for an experience pipeline between the explorer and the trainer. When activated, the pipeline reshapes rewards and augments the responses of experiences from the explorer, then sends them to the trainer.
- Adds a new gsm8k example that uses the experience pipeline for reward shaping, together with a basic, naive reward-shaping architecture.
- Adds a new service route for the experience pipeline, which stays active for the whole training procedure, and a stop_all route that stops all services when training finishes.
- Users only need to specify the names of the input/output buffers for the experience pipeline. These buffers are derived from the existing buffers in the explorer or trainer during config checking.
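To make the data flow above concrete, here is a minimal sketch of a pipeline that drains an input buffer, applies a naive additive reward shaping, and writes to an output buffer. It is not the code from this PR: the `Experience` fields, `shape_reward`, `run_pipeline`, the `quality_score` stat, and the use of `queue.Queue` as a buffer are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from queue import Empty, Queue
from typing import Callable, Dict


@dataclass
class Experience:
    """Hypothetical stand-in for an experience produced by the explorer."""

    response: str
    reward: float
    stats: Dict[str, float] = field(default_factory=dict)


def shape_reward(exp: Experience, stat_key: str, weight: float) -> Experience:
    """Naive additive shaping: new reward = reward + weight * stats[stat_key]."""
    bonus = weight * exp.stats.get(stat_key, 0.0)
    return Experience(exp.response, exp.reward + bonus, dict(exp.stats))


def run_pipeline(
    input_buffer: Queue,
    output_buffer: Queue,
    shaper: Callable[[Experience], Experience],
) -> int:
    """Drain input_buffer, apply shaper to each item, push results to output_buffer."""
    processed = 0
    while True:
        try:
            exp = input_buffer.get_nowait()
        except Empty:
            break  # input buffer is exhausted
        output_buffer.put(shaper(exp))
        processed += 1
    return processed


# Assumed wiring: the explorer's output buffer feeds the pipeline,
# and the pipeline's output buffer feeds the trainer.
explorer_output: Queue = Queue()
trainer_input: Queue = Queue()
explorer_output.put(Experience("4", reward=1.0, stats={"quality_score": 0.5}))
explorer_output.put(Experience("5", reward=0.0))

count = run_pipeline(
    explorer_output,
    trainer_input,
    lambda e: shape_reward(e, stat_key="quality_score", weight=0.2),
)
shaped = [trainer_input.get_nowait() for _ in range(count)]
```

In this sketch an experience without the chosen stat keeps its original reward, so shaping is a no-op unless an upstream step has attached the score.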
Other changes:
- Add explicit type casting after initializing the data class.
- Remove the old, obsolete schema code.
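As an illustration of the explicit type casting mentioned above, one common pattern is to normalize field types in a dataclass's `__post_init__` hook. This is a sketch under assumptions: the class and field names here are hypothetical, and the PR may perform the cast elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experience:
    """Hypothetical data class; field names are illustrative only."""

    reward: Optional[float] = None
    prompt_length: int = 0

    def __post_init__(self) -> None:
        # Values loaded from a dataset or JSON may arrive as strings or
        # other numeric types, so cast them to the declared types.
        if self.reward is not None:
            self.reward = float(self.reward)
        self.prompt_length = int(self.prompt_length)


exp = Experience(reward="0.5", prompt_length="12")
```

Dataclass annotations are not enforced at runtime, which is why a cast like this is needed when inputs come from loosely typed sources.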

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

HYLcool · 2025-06-30T06:38:05Z

All unit tests for the data module pass.

HYLcool · 2025-06-30T06:41:12Z

/run-unittest

github-actions · 2025-06-30T07:01:58Z

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 40 | 39 | 1 | 0 | 0 | 0 | 0 | 1.2s |

Failed Tests

| Failed Tests ❌ | Fail Message |
| --- | --- |
| ❌ tests/common/config_test.py::TestConfig::test_all_examples_are_valid | The test failed in the call phase due to an exception |

Flaky Tests

No flaky tests ✨

Skipped

No skipped tests ✨

Tests

| Test Name | Status | Duration |
| --- | --- | --- |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss | ✅ | 1ms |
| tests/buffer/file_test.py::TestFileBuffer::test_file_buffer | ✅ | 3ms |
| tests/buffer/file_test.py::TestFileBuffer::test_file_reader | ✅ | 1ms |
| tests/buffer/file_test.py::TestFileBuffer::test_file_writer | ✅ | 2ms |
| tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer | ✅ | 3ms |
| tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer | ✅ | 3ms |
| tests/common/config_test.py::TestConfig::test_all_examples_are_valid | ❌ | 1ms |
| tests/common/config_test.py::TestConfig::test_load_default_config | ✅ | 4ms |
| tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion | ✅ | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion | ✅ | 1ms |
| tests/common/vllm_test.py::TestModelWrapperSyncV0::test_generate | ✅ | 42ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncV0::test_generate | ✅ | 40ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncTPV0::test_generate | ✅ | 49ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncTPV1::test_generate | ✅ | 52ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncV1::test_generate | ✅ | 38ms |
| tests/common/vllm_test.py::TestAPIServer::test_api | ✅ | 24ms |
| tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask | ✅ | 1ms |
| tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer | ✅ | 1ms |
| tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer | ✅ | 119ms |
| tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer | ✅ | 124ms |
| tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool | ✅ | 22ms |
| tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool_with_auxiliary_models | ✅ | 4ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable | ✅ | 1ms |
| tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer | ✅ | 1ms |
| tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer | ✅ | 264ms |
| tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer | ✅ | 93ms |
| tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer | ✅ | 67ms |
| tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer | ✅ | 70ms |
| tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer | ✅ | 53ms |
| tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode | ✅ | 113ms |
| tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins | ✅ | 4ms |

Github Test Reporter by CTRF 💚

HYLcool · 2025-06-30T07:08:50Z

/run-unittest

github-actions · 2025-06-30T07:29:45Z

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 40 | 40 | 0 | 0 | 0 | 0 | 0 | 1.2s |

Failed Tests

No failed tests ✨

Flaky Tests

No flaky tests ✨

Skipped

No skipped tests ✨

Tests

| Test Name | Status | Duration |
| --- | --- | --- |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss | ✅ | 1ms |
| tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss | ✅ | 1ms |
| tests/buffer/file_test.py::TestFileBuffer::test_file_buffer | ✅ | 3ms |
| tests/buffer/file_test.py::TestFileBuffer::test_file_reader | ✅ | 1ms |
| tests/buffer/file_test.py::TestFileBuffer::test_file_writer | ✅ | 2ms |
| tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer | ✅ | 2ms |
| tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer | ✅ | 3ms |
| tests/common/config_test.py::TestConfig::test_all_examples_are_valid | ✅ | 1ms |
| tests/common/config_test.py::TestConfig::test_load_default_config | ✅ | 4ms |
| tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion | ✅ | 1ms |
| tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion | ✅ | 1ms |
| tests/common/vllm_test.py::TestModelWrapperSyncV0::test_generate | ✅ | 41ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncV0::test_generate | ✅ | 40ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncTPV0::test_generate | ✅ | 50ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncTPV1::test_generate | ✅ | 51ms |
| tests/common/vllm_test.py::TestModelWrapperAsyncV1::test_generate | ✅ | 38ms |
| tests/common/vllm_test.py::TestAPIServer::test_api | ✅ | 24ms |
| tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask | ✅ | 1ms |
| tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer | ✅ | 1ms |
| tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer | ✅ | 123ms |
| tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer | ✅ | 120ms |
| tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool | ✅ | 22ms |
| tests/explorer/runner_pool_test.py::RunnerPoolTest::test_runner_pool_with_auxiliary_models | ✅ | 4ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow | ✅ | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable | ✅ | 1ms |
| tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer | ✅ | 1ms |
| tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer | ✅ | 260ms |
| tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer | ✅ | 98ms |
| tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer | ✅ | 72ms |
| tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer | ✅ | 69ms |
| tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer | ✅ | 52ms |
| tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode | ✅ | 115ms |
| tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins | ✅ | 4ms |

Github Test Reporter by CTRF 💚

pan-x-c · 2025-06-30T09:25:35Z

LGTM

yxdyc

Only two minor issues; the rest LGTM. After this PR, can we reply to this pending issue?

docs/sphinx_doc/source/tutorial/example_data_functionalities.md

HYLcool · 2025-06-30T11:47:45Z

> Only two minor issues; the rest LGTM. After this PR, can we reply to this pending issue?

Not yet. This implementation is currently integrated only into RftDataset, but it could later be moved into Data-Juicer along with the buffer module in Trinity.

HYLcool added 20 commits June 24, 2025 15:55

* prepare the initial config files for exp pipeline

b52809f

+ add basic reward shaping func

1430035

Merge branch 'main' into feat/exp_pipeline

061407c

Merge branch 'main' into feat/exp_pipeline

d8e9331

# Conflicts: # trinity/cli/launcher.py

- remove common.schema

78da769

* allow async exp pipeline

04f64aa

+ add a new route for stopping async pipelines

Merge branch 'main' into feat/exp_pipeline

fe0407f

+ add more logs

56dd112

+ add buffer check and sync for experience pipeline

510b2af

* set several default values for format config

f1f6ba0

* check and update configs in the data module server

* convert experience to dict before converting to dataset

f78b6e7

* fix conversion bugs in dataset

979ab5a

* explicitly cast type after init of Experience

* fix bugs

e359179

* update configs of exp_pipeline

d9d4773

+ init ray in the same namespace for data processor

d9501cf

+ release output buffer after the active iterator is finished

* update example docs for experience pipeline

d16f0a8

* after pre-commit

d5e46f3

Merge branch 'main' into feat/exp_pipeline

a1cdc7f

Merge branch 'main' into feat/exp_pipeline

a1b3b01

* fix dataset buffer logics and tests

55800f6

HYLcool requested review from chenyushuo, hiyuchang, pan-x-c and yxdyc June 30, 2025 06:35

HYLcool self-assigned this Jun 30, 2025

HYLcool added the enhancement New feature or request label Jun 30, 2025

* update ray init method

c10dd93

* set ray_namespace for StorageConfigs to the global ray_namespace if they are not set

* ignore dj configs when checking example validation

17c91aa

HYLcool added 3 commits June 30, 2025 16:32

* move data processor related funcs to data/utils.py

062722f

* after pre-commit

974f3ab

+ add missing docs

60abb01

pan-x-c approved these changes Jun 30, 2025

yxdyc requested a review from lingzhq June 30, 2025 09:30

yxdyc reviewed Jun 30, 2025

docs/sphinx_doc/source/tutorial/example_data_functionalities.md (outdated, resolved)

docs/sphinx_doc/source/tutorial/example_data_functionalities.md (resolved)

HYLcool added 2 commits June 30, 2025 19:43

+ fix typo and add infos about how to set api keys.

266ba19

* after pre-commit

471a93d

yxdyc approved these changes Jun 30, 2025

yxdyc merged commit 339d658 into main Jun 30, 2025
3 checks passed

HYLcool deleted the feat/exp_pipeline branch June 30, 2025 12:18
