modelscope · pan-x-c · Jul 25, 2025 · Jul 24, 2025
diff --git a/docs/sphinx_doc/source/api_reference.rst b/docs/sphinx_doc/source/api_reference.rst
@@ -0,0 +1,18 @@
+.. _api-reference:
+
+API Reference
+=============
+
+This page lists the main API modules of Trinity-RFT. Click a module name to see its detailed documentation.
+
+.. toctree::
+   :maxdepth: 1
+   :glob:
+
+   build_api/trinity.buffer
+   build_api/trinity.explorer
+   build_api/trinity.trainer
+   build_api/trinity.algorithm
+   build_api/trinity.manager
+   build_api/trinity.common
+   build_api/trinity.utils
diff --git a/docs/sphinx_doc/source/index.rst b/docs/sphinx_doc/source/index.rst
@@ -35,19 +35,14 @@ Welcome to Trinity-RFT's documentation!
 
 .. toctree::
    :maxdepth: 2
+   :hidden:
    :caption: FAQ
 
    tutorial/faq.md
 
 .. toctree::
-   :maxdepth: 1
-   :glob:
+   :maxdepth: 2
+   :hidden:
    :caption: API Reference
 
-   build_api/trinity.buffer
-   build_api/trinity.explorer
-   build_api/trinity.trainer
-   build_api/trinity.algorithm
-   build_api/trinity.manager
-   build_api/trinity.common
-   build_api/trinity.utils
+   api_reference
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
@@ -6,7 +6,6 @@
 # Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
 
 
-
 ## 🚀 News
 
 * [2025-07] Trinity-RFT v0.2.0 is released.
@@ -82,6 +81,7 @@ It is designed to support diverse application scenarios and serve as a unified p
 ![Trinity-RFT-data-pipelines](../assets/trinity-data-pipelines.png)
 
 </details>
+<br>
 
 
 
@@ -90,12 +90,12 @@ It is designed to support diverse application scenarios and serve as a unified p
 
 * **Adaptation to New Scenarios:**
 
-  Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class.  ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md))
+  Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class.  ([Example](/tutorial/example_multi_turn.md))
 
 
 * **RL Algorithm Development:**
 
-  Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes.  ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md))
+  Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes.  ([Example](/tutorial/example_mix_algo.md))
 
 
 * **Low-Code Usage:**
@@ -301,39 +301,39 @@ For studio users, click "Run" in the web interface.
 
 Tutorials for running different RFT modes:
 
-+ [Quick example: GRPO on GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)
-+ [Off-policy RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)
-+ [Fully asynchronous RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)
-+ [Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)
++ [Quick example: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
++ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
++ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
++ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)
 
 
 Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario:
 
-+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)
++ [Multi-turn tasks](/tutorial/example_multi_turn.md)
 
 
 Tutorials for data-related functionalities:
 
-+ [Advanced data processing & human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md)
++ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md)
 
 
 Tutorials for RL algorithm development/research with Trinity-RFT:
 
-+ [RL algorithm development with Trinity-RFT](./docs/sphinx_doc/source/tutorial/example_mix_algo.md)
++ [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md)
 
 
-Guidelines for full configurations: see [this document](./docs/sphinx_doc/source/tutorial/trinity_configs.md)
+Guidelines for full configurations: see [this document](/tutorial/trinity_configs.md)
 
 
 Guidelines for developers and researchers:
 
-+ [Build new RL scenarios](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
-+ [Implement new RL algorithms](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
++ [Build new RL scenarios](/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
++ [Implement new RL algorithms](/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
 
 
 
 
-For some frequently asked questions, see [FAQ](./docs/sphinx_doc/source/tutorial/faq.md).
+For some frequently asked questions, see [FAQ](/tutorial/faq.md).
 
 
 

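Reviewer note: the `main.md` hunks above replace every repository-relative link (`./docs/sphinx_doc/source/...`) with a link that starts at `/`, which the Markdown parser used by Sphinx is assumed here to resolve against the documentation source root. A minimal sketch of how the same rewrite could be applied mechanically is shown below; `rewrite_doc_links` and `OLD_PREFIX` are hypothetical names for illustration and are not part of this PR.

```python
import re

# Assumption: the repository-relative prefix that README-style links use.
OLD_PREFIX = "./docs/sphinx_doc/source/"

# Matches the start of a Markdown link target that begins with OLD_PREFIX.
_LINK_RE = re.compile(r"\]\(" + re.escape(OLD_PREFIX))


def rewrite_doc_links(markdown_text: str) -> str:
    """Turn README-style link targets into source-root-relative ones."""
    return _LINK_RE.sub("](/", markdown_text)


sample = "+ [Off-policy RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)"
print(rewrite_doc_links(sample))
```

Links that do not start with the old prefix, such as the `../assets/` image paths, are left untouched by this pattern.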
diff --git a/docs/sphinx_doc/source/tutorial/example_data_functionalities.md b/docs/sphinx_doc/source/tutorial/example_data_functionalities.md
@@ -8,7 +8,7 @@ In this example, you will learn how to apply the data processor of Trinity-RFT t
 2. how to configure the data processor
 3. what the data processor can do
 
-Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of the README file](../main.md),
+Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of Quickstart](example_reasoning_basic.md),
 and store the base url and api key in the environment variables `OPENAI_BASE_URL` and `OPENAI_API_KEY` for some agentic or API-model usages if necessary.
 
 ### Data Preparation

diff --git a/docs/sphinx_doc/source/tutorial/example_multi_turn.md b/docs/sphinx_doc/source/tutorial/example_multi_turn.md
@@ -14,7 +14,37 @@ To run the ALFworld and WebShop env, you need to setup the corresponding environ
 - ALFworld is a text-based interactive environment that simulates household scenarios. Agents need to understand natural language instructions and complete various domestic tasks like finding objects, moving items, and operating devices in a virtual home environment.
 - WebShop is a simulated online shopping environment where AI agents learn to shop based on user requirements. The platform allows agents to browse products, compare options, and make purchase decisions, mimicking real-world e-commerce interactions.
 
-You may refer to their original environment to complete the setup.
+<br>
+<details>
+<summary>Guidelines for preparing the ALFWorld environment</summary>
+
+1. Install with pip: `pip install "alfworld[full]"`
+
+2. Export the path: `export ALFWORLD_DATA=/path/to/alfworld/data`
+
+3. Download the environment: `alfworld-download`
+
+Now you can find the environment in `$ALFWORLD_DATA` and continue with the following steps.
+</details>
+
+<details>
+<summary>Guidelines for preparing the WebShop environment</summary>
+
+1. Install Python 3.8.13
+
+2. Install Java
+
+3. Download the source code: `git clone https://github.com/princeton-nlp/webshop.git webshop`
+
+4. Create and activate a conda environment: `conda create -n webshop python=3.8.13`, then `conda activate webshop`
+
+5. Install requirements into the `webshop` virtual environment via the `setup.sh` script: `./setup.sh [-d small|all]`
+
+Now you can continue with the following steps.
+</details>
+<br>
+
+Refer to the original repositories for more details.
 - For ALFWorld, refer to the [ALFWorld](https://github.com/alfworld/alfworld) repository.
 - For WebShop, refer to the [WebShop](https://github.com/princeton-nlp/WebShop) repository.
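Reviewer note: the ALFWorld steps added in this hunk rely on `ALFWORLD_DATA` being exported before `alfworld-download` runs. A small preflight check along the following lines could catch a missing or empty data directory before a run starts. This is only a sketch: `check_alfworld_data` is a hypothetical helper, not part of Trinity-RFT or ALFWorld, and it assumes nothing about the layout of the downloaded data beyond the directory being non-empty.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_alfworld_data(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "ok" or a short message describing what is missing."""
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    raw = env.get("ALFWORLD_DATA")
    if not raw:
        return "ALFWORLD_DATA is not set"
    path = Path(raw)
    if not path.is_dir():
        return f"ALFWORLD_DATA does not point to a directory: {path}"
    # An empty directory suggests the download step has not been run yet.
    if not any(path.iterdir()):
        return f"ALFWORLD_DATA is empty; run `alfworld-download`: {path}"
    return "ok"


if __name__ == "__main__":
    print(check_alfworld_data())
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to exercise without changing the shell environment.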