Enhance SFT/DPO reader #226

pan-x-c · 2025-08-26T09:53:45Z

Description

Add Formatter to format SFT/DPO data.
Remove CHATPAIR PromptType
Remove ReadStrategy from buffer reader
Remove some unused fields in FormatConfig

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) data reading capabilities by introducing a new, dedicated Formatter abstraction. This refactoring centralizes data processing logic, making the readers more modular and easier to maintain. Key changes include the removal of the CHATPAIR prompt type and the ReadStrategy enum, simplifying the overall data pipeline and configuration. The update also includes general code cleanup and new test coverage for the introduced formatters.

Highlights

New Formatter Abstraction: Introduced a new Formatter abstraction and concrete implementations (SFTMessagesFormatter, DPOPlaintextFormatter, etc.) to centralize and standardize the logic for processing and tokenizing SFT and DPO data.
Data Reader Refactoring: Refactored SFTDataReader and DPODataReader to leverage the new Formatter classes, significantly simplifying their internal logic by delegating data preparation and tokenization.
Removal of CHATPAIR PromptType: Eliminated the CHATPAIR prompt type from the system, streamlining the supported data formats for SFT and DPO tasks.
Removal of ReadStrategy: Removed the ReadStrategy enum and its associated parameters from the BufferReader interface and all its implementations, simplifying data retrieval mechanisms.
Configuration Cleanup: Cleaned up the FormatConfig by removing unused and deprecated fields, improving configuration clarity and maintainability.
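
To make the formatter idea concrete, here is a minimal sketch of what such an abstraction could look like. The class names echo those in the summary above, but the `format` method, the `FormattedSample` record, the key names (`prompt_key`, `response_key`, `messages_key`), and the whitespace tokenizer are all assumptions made for illustration; they are not taken from the code merged in this PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FormattedSample:
    """Hypothetical output record: tokens plus a mask marking response tokens."""
    tokens: List[str]
    action_mask: List[bool]


class Formatter(ABC):
    """Hypothetical base class: turns one raw dataset row into a training sample."""

    def __init__(self, tokenize: Callable[[str], List[str]]):
        self.tokenize = tokenize

    @abstractmethod
    def format(self, row: Dict) -> FormattedSample:
        """Convert a raw row into a FormattedSample."""


class SFTPlaintextFormatter(Formatter):
    """Sketch: the row holds a plain prompt string and a plain response string."""

    def __init__(self, tokenize, prompt_key="prompt", response_key="response"):
        super().__init__(tokenize)
        self.prompt_key = prompt_key
        self.response_key = response_key

    def format(self, row: Dict) -> FormattedSample:
        prompt = self.tokenize(row[self.prompt_key])
        response = self.tokenize(row[self.response_key])
        # Only response tokens are marked as trainable.
        mask = [False] * len(prompt) + [True] * len(response)
        return FormattedSample(prompt + response, mask)


class SFTMessagesFormatter(Formatter):
    """Sketch: the row holds a list of {"role": ..., "content": ...} messages."""

    def __init__(self, tokenize, messages_key="messages"):
        super().__init__(tokenize)
        self.messages_key = messages_key

    def format(self, row: Dict) -> FormattedSample:
        tokens: List[str] = []
        mask: List[bool] = []
        for message in row[self.messages_key]:
            piece = self.tokenize(message["content"])
            tokens += piece
            # Mark only assistant turns as trainable.
            mask += [message["role"] == "assistant"] * len(piece)
        return FormattedSample(tokens, mask)


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (an assumption); a real reader would use a model tokenizer."""
    return text.split()


sample = SFTPlaintextFormatter(whitespace_tokenize).format(
    {"prompt": "What is 2 + 2 ?", "response": "It is 4"}
)
```

With this shape, a reader only needs to pick a formatter and call `format` on each row, which is the kind of delegation the highlights describe.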

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

pan-x-c · 2025-08-26T09:55:19Z

/unittest-all

gemini-code-assist

Code Review

This pull request introduces a significant and valuable refactoring by creating Formatter classes to handle SFT and DPO data formatting. This greatly simplifies the data reader implementations and improves code organization. The removal of the deprecated CHATPAIR prompt type and the ReadStrategy further streamlines the codebase. The changes to FormatConfig also contribute to better clarity.

I've found a couple of critical issues in the new file_reader.py that would cause runtime errors, and some minor inconsistencies in the new formatter.py file. Please see my detailed comments for suggestions on how to fix them.

trinity/buffer/reader/file_reader.py

trinity/buffer/schema/formatter.py

pan-x-c · 2025-08-26T10:32:44Z

/unittest-all

github-actions · 2025-08-26T11:05:49Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
117	117	0	0	0	0	1.9s

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	11ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer	✅	2ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	2ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	1ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	5ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	1ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer	✅	4ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	✅	1ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	✅	1ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	✅	1ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	37ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	16ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	16ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	54ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	49ms
tests/common/vllm_test.py::ModelWrapperTest_5::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_6::test_generate	✅	46ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	25ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	21ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	19ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer	✅	1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	66ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	51ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	199ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	19ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable	✅	1ms
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	29ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	65ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	65ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	98ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	87ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	51ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	52ms
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1ms
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21ms
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	✅	21ms
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	✅	14ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer	✅	137ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	53ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	48ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	46ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	52ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	32ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	29ms
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	30ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue	✅	68ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue	✅	67ms
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	57ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	1ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	1ms
tests/utils/log_test.py::LogTest::test_actor_log	✅	2ms
tests/utils/log_test.py::LogTest::test_group_by_node	✅	2ms
tests/utils/log_test.py::LogTest::test_no_actor_log	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote	✅	6ms
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class	✅	3ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-08-27T02:36:00Z

/unittest-module-cli

github-actions · 2025-08-27T02:36:56Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
3	3	0	0	0	0	6ms

Tests

Test Name	Status	Duration
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	✅	2ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	✅	1ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	✅	1ms

Copilot

Pull Request Overview

This PR enhances the SFT/DPO data reader by introducing a new Formatter abstraction and cleaning up configuration classes. The changes focus on simplifying data formatting for both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) workflows while removing deprecated functionality.

Adds new Formatter classes to handle SFT/DPO data formatting with better separation of concerns
Removes deprecated CHATPAIR PromptType and ReadStrategy enum from constants
Cleans up FormatConfig by removing unused fields and reorganizing remaining ones
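
As a rough sketch of how these changes could fit together, the example below shows a reader whose `read` method takes only a batch size (no strategy argument) and delegates per-row formatting to a formatter object. Every name in it (`DPOPlaintextSketchFormatter`, `ListReader`, and the `prompt`/`chosen`/`rejected` keys) is hypothetical and chosen for illustration; the actual interfaces in this PR may differ.

```python
from typing import Dict, Iterable, List, Optional


class DPOPlaintextSketchFormatter:
    """Hypothetical formatter: maps a raw row to a prompt/chosen/rejected triple."""

    def __init__(self, prompt_key="prompt", chosen_key="chosen", rejected_key="rejected"):
        self.prompt_key = prompt_key
        self.chosen_key = chosen_key
        self.rejected_key = rejected_key

    def format(self, row: Dict[str, str]) -> Dict[str, str]:
        # Normalize dataset-specific column names into one fixed schema.
        return {
            "prompt": row[self.prompt_key],
            "chosen": row[self.chosen_key],
            "rejected": row[self.rejected_key],
        }


class ListReader:
    """Hypothetical reader over an in-memory dataset.

    read() accepts only an optional batch size; there is no read-strategy
    parameter, mirroring the simplification described above.
    """

    def __init__(self, rows: Iterable[Dict[str, str]], formatter, batch_size: int = 2):
        self.rows = list(rows)
        self.formatter = formatter
        self.batch_size = batch_size
        self.offset = 0

    def read(self, batch_size: Optional[int] = None) -> List[Dict[str, str]]:
        size = batch_size or self.batch_size
        if self.offset >= len(self.rows):
            raise StopIteration("dataset exhausted")
        batch = self.rows[self.offset:self.offset + size]
        self.offset += len(batch)
        # All formatting is delegated to the formatter.
        return [self.formatter.format(row) for row in batch]


rows = [{"question": "Hi", "good": "Hello!", "bad": "Go away."}]
reader = ListReader(rows, DPOPlaintextSketchFormatter("question", "good", "bad"))
batch = reader.read()
```

Keeping the column-name mapping inside the formatter means the reader itself does not need to know whether it is serving SFT or DPO data.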

Reviewed Changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.

File	Description
trinity/common/constants.py	Removes deprecated imports, `CHATPAIR` enum value, and entire `ReadStrategy` enum
trinity/common/config.py	Reorganizes and removes unused fields from `FormatConfig` and `DataProcessorConfig`
trinity/cli/launcher.py	Moves import and removes unused `log_level` parameter from `run` function
trinity/buffer/schema/formatter.py	Introduces new `Formatter` abstract class and concrete implementations for SFT/DPO data
trinity/buffer/reader/sql_reader.py	Removes `ReadStrategy` parameter from read methods
trinity/buffer/reader/queue_reader.py	Removes `ReadStrategy` parameter and related validation logic
trinity/buffer/reader/file_reader.py	Refactors to use new formatter classes and removes inline formatting logic
trinity/buffer/ray_wrapper.py	Removes `ReadStrategy` parameter and simplifies read logic to use LFU only
trinity/buffer/buffer_reader.py	Removes `ReadStrategy` parameter from abstract interface
tests/cli/launcher_test.py	Adds comprehensive tests for launcher functionality
tests/buffer/formatter_test.py	Adds tests for all new formatter classes
examples/dpo_humanlike/dpo.yaml	Updates comment to remove reference to deprecated `chatpair` option
docs/sphinx_doc/source/tutorial/trinity_configs.md	Updates documentation to reflect config changes
docs/sphinx_doc/source/tutorial/example_reasoning_basic.md	Removes `chatpair` from documentation
docs/sphinx_doc/source/tutorial/example_dpo.md	Removes `chatpair` from documentation

Comments suppressed due to low confidence (2)

trinity/buffer/schema/formatter.py:1

The variable self.prompt_type is not defined in this class. It should be meta.format.prompt_type based on the context.

from abc import ABC, abstractmethod

trinity/buffer/reader/file_reader.py:1

The _get_assistant_message method is no longer used after the refactoring to use formatters. This method should be removed to avoid dead code.

"""Filed based buffer reader."""

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

trinity/buffer/reader/file_reader.py

pan-x-c added 3 commits August 26, 2025 17:11

add formatter

72073bd

fix tests

e306220

add launcher tests

eb80aa1

gemini-code-assist bot reviewed Aug 26, 2025

trinity/buffer/reader/file_reader.py (outdated, resolved)

trinity/buffer/reader/file_reader.py (outdated, resolved)

trinity/buffer/schema/formatter.py (outdated, resolved)

trinity/buffer/schema/formatter.py (outdated, resolved)

pan-x-c added 2 commits August 26, 2025 18:06

fix reader

373b2fd

add docs

7af4e7a

pan-x-c requested a review from Copilot August 26, 2025 11:22

This comment was marked as outdated.

fix comments

58c1b7c

hiyuchang approved these changes Aug 27, 2025

pan-x-c requested a review from Copilot August 27, 2025 02:37

Copilot AI reviewed Aug 27, 2025

trinity/buffer/reader/file_reader.py (resolved)

pan-x-c merged commit eebced3 into modelscope:main Aug 27, 2025
2 checks passed

pan-x-c mentioned this pull request Aug 27, 2025

fix sft example #212

Closed

4 tasks

2 participants
