chore: initial import of standalone agentscope project
247
docs/tutorial/en/src/task_tuner.py
Normal file
@@ -0,0 +1,247 @@
# -*- coding: utf-8 -*-
"""
.. _tuner:

Tuner
=================

AgentScope provides the ``tuner`` module for training agent applications using reinforcement learning (RL).
This tutorial will guide you through how to leverage the ``tuner`` module to improve agent performance on specific tasks, including:

- Introducing the core components of the ``tuner`` module
- Demonstrating the key code required for the tuning workflow
- Showing how to configure and run the tuning process

Main Components
~~~~~~~~~~~~~~~~~~~
The ``tuner`` module introduces three core components essential for RL-based agent training:

- **Task Dataset**: A collection of tasks for training and evaluating the agent.
- **Workflow Function**: Encapsulates the agent's logic to be tuned.
- **Judge Function**: Evaluates the agent's performance on tasks and provides reward signals for tuning.

In addition, ``tuner`` provides several configuration classes for customizing the tuning process, including:

- **TunerModelConfig**: Model configurations for tuning purposes.
- **AlgorithmConfig**: Specifies the RL algorithm (e.g., GRPO, PPO) and its parameters.

Implementation
~~~~~~~~~~~~~~~~~~~
This section demonstrates how to use ``tuner`` to train a simple math agent.

Task Dataset
--------------------
The task dataset contains tasks for training and evaluating your agent.

Your dataset should follow the Hugging Face `datasets <https://huggingface.co/docs/datasets/quickstart>`_ format, which can be loaded with ``datasets.load_dataset``. For example:

.. code-block:: text

    my_dataset/
        ├── train.jsonl  # training samples
        └── test.jsonl   # evaluation samples

Suppose your ``train.jsonl`` contains:

.. code-block:: json

    {"question": "What is 2 + 2?", "answer": "4"}
    {"question": "What is 4 + 4?", "answer": "8"}

Before starting tuning, you can verify that your dataset is loaded correctly with:

.. code-block:: python

    from agentscope.tuner import DatasetConfig

    dataset = DatasetConfig(path="my_dataset", split="train")
    dataset.preview(n=2)
    # Output the first two samples to verify correct loading
    # [
    #   {
    #     "question": "What is 2 + 2?",
    #     "answer": "4"
    #   },
    #   {
    #     "question": "What is 4 + 4?",
    #     "answer": "8"
    #   }
    # ]

Workflow Function
--------------------
The workflow function defines how the agent interacts with the environment and makes decisions. All workflow functions should follow the input/output signature defined in ``agentscope.tuner.WorkflowType``.

Below is an example workflow function using a ReAct agent to answer math questions:
"""

from typing import Dict, Optional
from agentscope.agent import ReActAgent
from agentscope.formatter import OpenAIChatFormatter
from agentscope.message import Msg
from agentscope.model import ChatModelBase
from agentscope.tuner import WorkflowOutput


async def example_workflow_function(
    task: Dict,
    model: ChatModelBase,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> WorkflowOutput:
    """An example workflow function for tuning.

    Args:
        task (`Dict`): The task information.
        model (`ChatModelBase`): The chat model used by the agent.
        auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
            chat models, generally used to simulate the behavior of other
            non-training agents in multi-agent scenarios.

    Returns:
        `WorkflowOutput`: The output generated by the workflow.
    """
    agent = ReActAgent(
        name="react_agent",
        sys_prompt="You are a helpful math problem solving agent.",
        model=model,
        formatter=OpenAIChatFormatter(),
    )

    response = await agent.reply(
        msg=Msg(
            "user",
            task["question"],
            role="user",
        ),  # extract question from task
    )

    return WorkflowOutput(  # return the response
        response=response,
    )


# %%
# You can directly run this workflow function with a task dictionary and a ``DashScopeChatModel`` / ``OpenAIChatModel`` to test its correctness before formal training. For example:

import asyncio
import os
from agentscope.model import DashScopeChatModel

task = {"question": "What is 123 plus 456?", "answer": "579"}
model = DashScopeChatModel(
    model_name="qwen-max",
    api_key=os.environ["DASHSCOPE_API_KEY"],
)
workflow_output = asyncio.run(example_workflow_function(task, model))
assert isinstance(
    workflow_output.response,
    Msg,
), "In this example, the response should be a Msg instance."
print("\nWorkflow response:", workflow_output.response.get_text_content())
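
# %%
# Before formal training, it can also help to check the dataset file itself,
# since the workflow reads ``task["question"]`` and the judge function reads
# ``task["answer"]``. The sketch below is not part of the ``tuner`` API:
# ``check_jsonl`` is a hypothetical helper that uses only the standard
# library and assumes the two string fields used in this tutorial. It returns
# the line numbers of records that are malformed or missing a field.

import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def check_jsonl(
    path: str,
    required: Tuple[str, ...] = ("question", "answer"),
) -> List[int]:
    """Return the 1-based line numbers of records that fail the check."""
    bad_lines = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(line_no)  # not valid JSON
                continue
            if not isinstance(record, dict) or any(
                not isinstance(record.get(key), str) for key in required
            ):
                bad_lines.append(line_no)  # missing or non-string field
    return bad_lines


# Demonstrate on a throwaway file; the second record lacks ``answer``.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "train.jsonl"
    sample_path.write_text(
        '{"question": "What is 2 + 2?", "answer": "4"}\n'
        '{"question": "What is 4 + 4?"}\n',
        encoding="utf-8",
    )
    bad = check_jsonl(str(sample_path))

print("Malformed records at lines:", bad)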

# %%
#
# Judge Function
# --------------------
# The judge function evaluates the agent's performance on a given task and provides a reward signal for tuning.
# All judge functions should follow the input/output signature defined in ``agentscope.tuner.JudgeType``.
# Below is a simple judge function that compares the agent's response with the ground truth answer:

from typing import Any
from agentscope.tuner import JudgeOutput


async def example_judge_function(
    task: Dict,
    response: Any,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> JudgeOutput:
    """A very simple judge function only for demonstration.

    Args:
        task (`Dict`): The task information.
        response (`Any`): The response field from the WorkflowOutput.
        auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
            chat models for LLM-as-a-Judge purpose.

    Returns:
        `JudgeOutput`: The reward assigned by the judge.
    """
    ground_truth = task["answer"]
    reward = 1.0 if ground_truth in response.get_text_content() else 0.0
    return JudgeOutput(reward=reward)


judge_output = asyncio.run(
    example_judge_function(
        task,
        workflow_output.response,
    ),
)
print(f"Judge reward: {judge_output.reward}")
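
# %%
# Substring matching is lenient: the ground truth ``"4"`` also matches a
# response such as ``"The answer is 14"``. A stricter variant, sketched here
# as an assumption rather than as part of the ``tuner`` API, compares the
# ground truth with the last number found in the response text.
# ``last_number_reward`` is a hypothetical helper that works on plain
# strings, so a judge function could call it with
# ``response.get_text_content()`` and ``task["answer"]``.

import re
from typing import Optional


def last_number_reward(response_text: Optional[str], ground_truth: str) -> float:
    """Return 1.0 if the last number in the text equals the ground truth."""
    # Collect integers and decimals, with an optional leading minus sign.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response_text or "")
    if not numbers:
        return 0.0
    try:
        return 1.0 if float(numbers[-1]) == float(ground_truth) else 0.0
    except ValueError:
        return 0.0  # the ground truth is not numeric


print(last_number_reward("123 plus 456 is 579.", "579"))
print(last_number_reward("The answer is 14", "4"))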

# %%
# The judge function can also be locally tested in the same way as shown above before formal training to ensure its logic is correct.
#
# .. tip::
#     You can leverage existing `MetricBase <https://github.com/agentscope-ai/agentscope/blob/main/src/agentscope/evaluate/_metric_base.py>`_ implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward.
#
# Configuration and Running
# ~~~~~~~~~~~~~~~~~~~~~~~~~
# Finally, you can configure and run the tuning process using the ``tuner`` module.
# Before starting, ensure that `Trinity-RFT <https://github.com/agentscope-ai/Trinity-RFT>`_ is installed in your environment, as it is required for tuning.
#
# Below is an example of configuring and starting the tuning process:
#
# .. note::
#     This example is for demonstration only. For a complete runnable example, see `Tune ReActAgent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent>`_.
#
# .. code-block:: python
#
#     from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig
#     # your workflow / judge function here...
#
#     if __name__ == "__main__":
#         dataset = DatasetConfig(path="my_dataset", split="train")
#         model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384)
#         algorithm = AlgorithmConfig(
#             algorithm_type="multi_step_grpo",
#             group_size=8,
#             batch_size=32,
#             learning_rate=1e-6,
#         )
#         tune(
#             workflow_func=example_workflow_function,
#             judge_func=example_judge_function,
#             model=model,
#             train_dataset=dataset,
#             algorithm=algorithm,
#         )
#
# Here, ``DatasetConfig`` configures the training dataset, ``TunerModelConfig`` sets the parameters for the trainable model, and ``AlgorithmConfig`` specifies the reinforcement learning algorithm and its hyperparameters.
#
# .. tip::
#     The ``tune`` function is based on `Trinity-RFT <https://github.com/agentscope-ai/Trinity-RFT>`_ and internally converts input parameters to a YAML configuration.
#     Advanced users can skip the ``model``, ``train_dataset``, and ``algorithm`` arguments and instead provide a YAML config file path via the ``config_path`` argument.
#     Using a configuration file is recommended for fine-grained control and to leverage advanced Trinity-RFT features. See the Trinity-RFT `Configuration Guide <https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html>`_ for more options.
#
# Save the above code as ``main.py`` and run it with:
#
# .. code-block:: bash
#
#     ray start --head
#     python main.py
#
# Checkpoints and logs are automatically saved to the ``checkpoints/AgentScope`` directory under your workspace, with each run in a timestamped sub-directory. TensorBoard logs can be found in ``monitor/tensorboard`` within each run's sub-directory.
#
# .. code-block:: text
#
#     your_workspace/
#         └── checkpoints/
#             └── AgentScope/
#                 └── Experiment-20260104185355/  # each run saved in a sub-directory with timestamp
#                     ├── monitor/
#                     │   └── tensorboard/  # tensorboard logs
#                     └── global_step_x/    # saved model checkpoints at step x
#
# .. tip::
#     For more tuning examples, refer to the `tuner directory <https://github.com/agentscope-ai/agentscope-samples/tree/main/tuner>`_ of the AgentScope-Samples repository.