|
| 1 | +Tau2 agent — how to run experiments |
| 2 | +================================= |
| 3 | + |
| 4 | +This document shows the minimal steps to run tau2 experiments locally. |
| 5 | + |
| 6 | +*Steps* |
| 7 | +1) Configure your API key |
| 8 | +```bash |
| 9 | +echo "policy_base_url: https://api.openai.com/v1 |
| 10 | +policy_api_key: your-openai-api-key |
| 11 | +policy_model_name: gpt-4.1-2025-04-14" > env.yaml |
| 12 | +``` |
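As a quick sanity check on the file written in this step, the sketch below reports which of the three keys are missing. The helper name `check_env_yaml` is hypothetical and not part of any tool used here. It only checks that each key starts a line; it does not validate the values.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not a NeMo Gym command):
# report which of the three keys from step 1 are missing from a YAML file.
check_env_yaml() {
    local file="${1:-env.yaml}"
    local missing=0
    local key
    for key in policy_base_url policy_api_key policy_model_name; do
        # Match the key only at the start of a line, followed by a colon.
        if ! grep -q "^${key}:" "$file" 2>/dev/null; then
            echo "missing key: ${key}"
            missing=1
        fi
    done
    # Exit status 0 when all keys are present, 1 otherwise.
    return "$missing"
}
```

Typical use would be `check_env_yaml env.yaml && echo "env.yaml looks complete"`.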
| 13 | + |
| 14 | +2) Set up Tau^2 data |
| 15 | + |
| 16 | +- Download the `tau2` folder (https://github.com/sierra-research/tau2-bench/tree/main/data/tau2). |
| 17 | +- Save it to `resources_servers/tau2_bench/data/`. |
| 18 | +- Configure the data path (*don't forget* to adjust the path for your machine): |
| 19 | +```bash |
| 20 | +export TAU2_DATA_DIR="/your_path/to/resources_servers/tau2_bench/data/" |
| 21 | +``` |
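Before launching the server, it may help to confirm that the exported path points where you expect. The sketch below uses a hypothetical helper, `check_tau2_data_dir`. It assumes the downloaded folder keeps the name `tau2` directly under the data directory, which is inferred from the download step above rather than confirmed by the tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration): confirm the data
# directory exists and contains a `tau2` subfolder.
check_tau2_data_dir() {
    # Use the first argument if given, otherwise fall back to TAU2_DATA_DIR.
    local dir="${1:-${TAU2_DATA_DIR:-}}"
    if [ -z "$dir" ]; then
        echo "TAU2_DATA_DIR is not set"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    # ${dir%/} drops one trailing slash so the joined path has no "//".
    if [ ! -d "${dir%/}/tau2" ]; then
        echo "no tau2 folder under: $dir"
        return 1
    fi
    echo "ok: ${dir%/}/tau2"
}
```

Calling `check_tau2_data_dir` with no argument checks the currently exported `TAU2_DATA_DIR`.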
| 22 | + |
| 23 | +3) Launch the NemoGym server |
| 24 | +- In the *first terminal*, launch the server. |
| 25 | + |
| 26 | +Example server for `openai_model`: |
| 27 | +```bash |
| 28 | +config_paths="responses_api_agents/tau2_agent/configs/tau2_agent.yaml,\ |
| 29 | +responses_api_models/openai_model/configs/openai_model.yaml,\ |
| 30 | +resources_servers/tau2_bench/configs/tau2_bench.yaml" |
| 31 | + |
| 32 | +ng_run "+config_paths=[$config_paths]" \ |
| 33 | ++tau2_agent.responses_api_agents.tau2_agent.resources_server.name=tau2_bench_resources_server |
| 34 | +``` |
| 35 | + |
| 36 | +Example server for `vllm_model`: |
| 37 | +```bash |
| 38 | +config_paths="responses_api_agents/tau2_agent/configs/tau2_agent.yaml,\ |
| 39 | +responses_api_models/vllm_model/configs/vllm_model.yaml,\ |
| 40 | +resources_servers/tau2_bench/configs/tau2_bench.yaml" |
| 41 | + |
| 42 | +ng_run "+config_paths=[$config_paths]" \ |
| 43 | + +tau2_agent.responses_api_agents.tau2_agent.resources_server.name=tau2_bench_resources_server \ |
| 44 | ++policy_model.responses_api_models.vllm_model.return_token_id_information=true |
| 45 | +``` |
| 46 | + |
| 47 | +4) Prepare experiment input |
| 48 | +- Prepare an input JSONL file describing which domain/task(s) to run, and pass its path via `input_jsonl_fpath`. An example is provided in `resources_servers/tau2_bench/data/example_retail_demo.jsonl`. |
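To see how many records an input file holds before running anything, a small check like the one below can be useful. It is only a sketch: `count_jsonl_records` is a hypothetical helper, and it assumes one JSON record per non-empty line without looking at the record schema, which this document does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration): count the non-empty
# lines in a JSONL file, treating each one as a single record.
count_jsonl_records() {
    local file="$1"
    if [ ! -f "$file" ]; then
        echo "file not found: $file" >&2
        return 1
    fi
    # grep -c prints the number of lines with a non-whitespace character.
    # It exits with status 1 when the count is 0, so that status is masked.
    grep -c '[^[:space:]]' "$file" || true
}
```

For instance, `count_jsonl_records resources_servers/tau2_bench/data/example_retail_demo.jsonl` prints the number of records in the bundled example.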
| 49 | + |
| 50 | +5) Collect rollouts from Tau^2 Bench (separate terminal) |
| 51 | +- In a *second terminal*, run the rollout script to start the experiment: |
| 52 | + |
| 53 | +```bash |
| 54 | +ng_collect_rollouts +agent_name=tau2_agent \ |
| 55 | + +input_jsonl_fpath=resources_servers/tau2_bench/data/example_retail_demo.jsonl \ |
| 56 | + +output_jsonl_fpath=resources_servers/tau2_bench/data/example_retail_demo_rollouts.jsonl \ |
| 57 | + +limit=1 \ |
| 58 | + +num_repeats=1 \ |
| 59 | + +num_samples_in_parallel=null |
| 60 | +``` |