# ACEBench Example

This is an example of agent-oriented evaluation in AgentScope.
It uses [ACEBench](https://github.com/ACEBench/ACEBench) as the benchmark and runs
a ReAct agent with a [Ray](https://github.com/ray-project/ray)-based evaluator, which supports
**distributed** and **parallel** evaluation.

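
The evaluator's role can be pictured as a fan-out: each benchmark task goes to a worker, the agent answers it, and the per-task scores are gathered at the end. The sketch below illustrates only that pattern and is not AgentScope's evaluator API. `Task`, `run_agent`, `evaluate_one`, and `evaluate_parallel` are hypothetical names, the agent is a canned stub rather than a ReAct agent, scoring is a plain exact match, and Python's `concurrent.futures.ThreadPoolExecutor` stands in for Ray so the example runs with the standard library alone.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical benchmark item: a question and its expected answer."""
    task_id: str
    question: str
    expected: str


def run_agent(question: str) -> str:
    """Hypothetical stub: a real setup would call a ReAct agent here."""
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    return canned.get(question, "unknown")


def evaluate_one(task: Task) -> tuple[str, bool]:
    """Run the agent on one task and score it by exact match."""
    answer = run_agent(task.question)
    return task.task_id, answer == task.expected


def evaluate_parallel(tasks: list[Task], max_workers: int = 4) -> dict[str, bool]:
    """Fan tasks out to a thread pool (standing in for Ray) and collect results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, one (task_id, passed) pair per task
        return dict(pool.map(evaluate_one, tasks))


if __name__ == "__main__":
    tasks = [
        Task("t1", "2 + 2", "4"),
        Task("t2", "capital of France", "Paris"),
        Task("t3", "largest planet", "Jupiter"),
    ]
    results = evaluate_parallel(tasks)
    accuracy = sum(results.values()) / len(results)
    print(results)                     # {'t1': True, 't2': True, 't3': False}
    print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.67
```

With Ray, the same fan-out would typically turn `evaluate_one` into a task decorated with `@ray.remote`, launch it with `evaluate_one.remote(task)`, and collect results with `ray.get(...)`, which lets the work spread across processes or machines instead of threads in one process. Threads are a reasonable stand-in here because agent evaluation is often dominated by waiting on model API calls.
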
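
To run the example, first install AgentScope, then start the evaluation with the following command, replacing `{data_dir}` with the benchmark data directory and `{result_dir}` with the directory for the evaluation results: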
```bash
python main.py --data_dir {data_dir} --result_dir {result_dir}
```

## Further Reading

- [ACEBench](https://github.com/ACEBench/ACEBench)
- [Ray](https://github.com/ray-project/ray)