Add OpenHands SDK SFT converter by neubig · Pull Request #233 · neulab/agent-data-protocol

neubig · 2026-05-25T02:57:38Z

Summary

Add agents/openhands_sdk, a V1 OpenHands Software Agent SDK SFT converter that emits OpenAI chat-completions style records with native assistant.tool_calls, tool messages, SDK-style text content blocks, and a tools array.
Replace dataset-local api.py stubs with canonical metadata.json files for all 50 standardized datasets.
Bump standardized schema to 1.4.0 and rename per-trajectory available_apis to available_custom_tools.
Add dataset metadata fields:
- custom_tools: OpenAI-standard function tool specs for custom dataset tools.
- code_enabled: exact dataset-level set of CodeAction.language values.
- browser_enabled: dataset-level browser capability flag.
Update OpenHands v0, OpenHands v0 MCP, SWE-agent, and OpenHands SDK SFT converters to read metadata.json and reject action/metadata inconsistencies.
Regenerate OpenHands SDK SFT samples from metadata-driven tool availability.

OpenHands SDK Behavior

code_enabled: ["bash"] enables the SDK TerminalTool representation.
Any non-bash language in code_enabled intentionally fails OpenHands SDK conversion for now.
browser_enabled: true enables the SDK browser tool family.
available_custom_tools narrows the custom tool set for a trajectory; when absent, all metadata custom_tools are available.
SDK-native aliases such as str_replace_editor, submit, finish, and browser-shaped actions are mapped to SDK-native tool names instead of duplicated as custom tools.
Browser detection is context-aware so non-browser homonyms remain custom tools, e.g. SWE-agent goto(line_number) and Android click(x, y).

Current Dataset Status

Metadata files added: 50/50 datasets with sample_std.json.
Dataset-local api.py files removed: all replaced by metadata.json.
OpenHands SDK SFT samples regenerated for datasets compatible with current SDK code handling.
OpenHands SDK SFT samples intentionally removed for datasets with non-bash code actions:
- agenttuning_db: mysql
- code_feedback: cpp, go, javascript, python
- codeactinstruct: python
- jupyter-agent-dataset: python
- omniact: python
- openhands: bash, python

Validation

uv run --with ruff ruff check .
uv run --with pytest --with pydantic --with Pillow --with beautifulsoup4 --with tqdm --with datasets --with matplotlib --with pre-commit pytest -q tests/test_*.py
python scripts/check_schema_version_bump.py --base-ref origin/main --head-ref HEAD

Results:

Ruff: passed
Full test suite: 461 passed, 14 skipped, 4 warnings
Schema version bump check: passed (1.3.0 -> 1.4.0)

Notes

The non-bash SDK gap is intentional in this PR. Future work can define harness-specific mappings for Python and other programming languages without weakening the harness-independent metadata schema.

This PR description was updated by an AI agent on behalf of the user.

Co-authored-by: openhands <openhands@all-hands.dev>

…-sft-converter

…ft-converter

github-actions · 2026-05-26T00:24:49Z

PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

…ft-converter

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions

🟡 Acceptable — The core converter (agents/openhands_sdk/std_to_sft.py) is solid: clean OpenAI chat-completions output, proper tool-call/result pairing with explicit IDs, good browser-action heuristics, and metadata-driven tool selection. Tests exercise real code paths and cover important edge cases (legacy XML finish, parallel tool calls, explicit IDs). Two documentation issues need attention.

Also note: .agents/skills/custom-codereview-guide.md (not modified in this PR, so no inline comment) still says api.py is required for ApiAction datasets (line 16) and validates against available_apis (lines 24–25). After this PR merges those review rules will fire incorrectly on every future dataset PR — api.py no longer exists and the field is now available_custom_tools. This should be updated in the same commit that updates AGENTS.md.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM. Large fan-out (~370 files, 50 datasets updated) increases the chance of a silent regression. The stale AGENTS.md and reviewer-skill guide could mislead future contributors and automated reviewers.

VERDICT: ✅ Worth merging after fixing the stale documentation.

KEY INSIGHT: The api.py → metadata.json migration is complete in code but incomplete in documentation — AGENTS.md and the reviewer skill still enforce the old contract.

Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26491199686

github-actions · 2026-05-27T04:48:11Z

@@ -89,6 +89,7 @@ Example correct format:
 - Every `ApiAction.kwargs` object must validate by calling the function in `api.py`; include required parameters such as the `message` argument for `finish`.
 - If a trajectory has `available_apis`, it must be a top-level list of function names that is a subset of that dataset's `api.py` functions, and every `ApiAction.function` used in that trajectory must appear in the list.
 - Only populate `available_apis` for datasets that have `api.py` and whose source data explicitly specifies per-instance tool/API availability. Do not fill it with all functions from `api.py`, and do not infer it merely from the APIs used in the trajectory.
+- If a trajectory has `CodeAction` entries, it must have top-level `available_code_languages` containing exactly the `CodeAction.language` values used in that trajectory.


🟠 Important: This new line was added alongside lines 88–91 and line 15 which still reference the old api.py contract and the renamed field available_apis. This PR removes every dataset-local api.py and renames the trajectory field to available_custom_tools, but the surrounding rules were not updated. Future contributors will add api.py files and use available_apis in JSON — both now wrong.

Required updates in the same file:

Line 15 (directory tree): api.py (required if ApiAction is used) → metadata.json (required when sample_std.json exists)

Lines 88–89: replace api.py with metadata.json custom_tools

Lines 90–91: available_apis → available_custom_tools; api.py functions → metadata.json custom_tools

Line 49: remove api.py from the fix-and-regenerate list

Line 172: api.py function signature → metadata.json custom_tools schema

github-actions · 2026-05-27T04:48:11Z

@@ -0,0 +1,28 @@
+# OpenHands SDK Validation Runs


🟡 Suggestion: This directory adds ~150 files (~26,000 lines). Per summary.json, 46 of 50 datasets are pending_rerun — only agenttuning_db and screenagent actually completed a run. Every example.py is a 523-line script that is byte-for-byte identical across all 50 datasets except three top-level constants (DATASET_NAME, RECORD_INDEX, RECORD_ID).

Consider:

Replacing the 50 copies with a single parameterised script, or

Not committing the 46 pending_rerun placeholder stubs — they add no validation signal.

The 50× duplication will make the harness expensive to maintain when the SDK API changes.

neubig · 2026-05-28T04:28:21Z

Closing this in favor of a smaller replacement PR that generates OpenHands SDK SFT data through SDK event/message construction instead of the hand-rolled OpenAI-chat conversion and large validation artifact set.

neubig force-pushed the codex/openhands-sdk-sft-converter branch from 3c4ba33 to f1e6ecd Compare May 25, 2026 03:51

add OpenHands SDK SFT converter

e80b092

Co-authored-by: openhands <openhands@all-hands.dev>

neubig force-pushed the codex/openhands-sdk-sft-converter branch from f1e6ecd to e80b092 Compare May 25, 2026 12:02

openhands-agent added 13 commits May 25, 2026 08:13

add matched tool call links to ADP schema

5ebc27e

require matched tool calls to have results

4f53417

Merge branch 'codex/match-tool-calls-in-adp' into codex/openhands-sdk…

ae7c519

…-sft-converter

validate OpenHands SDK tool call logs

6b797c1

Use dataset task prefixes for SDK validation logs

a117fbd

Indent PR completion logs

cc575ca

Run SDK validation through agent loop

c0792a9

Merge remote-tracking branch 'origin/main' into codex/openhands-sdk-s…

36ecdb6

…ft-converter

Refresh SDK validation after schema merge

918c895

Apply pre-commit formatting

d163374

Refine SDK validation trajectories

d826912

Track available code action languages

61ea0c5

Replace dataset API stubs with metadata

0dc0875

openhands-agent added 5 commits May 25, 2026 20:42

Fix metadata schema version and pre-commit

2cab3cb

Merge remote-tracking branch 'origin/main' into codex/openhands-sdk-s…

bf54d3e

…ft-converter

Handle agenttuning_db mysql metadata

d3b9efa

Refresh OpenHands SDK validation examples

40f2e66

Co-authored-by: openhands <openhands@all-hands.dev>

chore: apply validation formatting

f48e012

Co-authored-by: openhands <openhands@all-hands.dev>

neubig marked this pull request as ready for review May 27, 2026 04:41

github-actions Bot reviewed May 27, 2026

View reviewed changes

neubig closed this May 28, 2026

neubig mentioned this pull request May 28, 2026

Add SDK-backed OpenHands SFT converter #241

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add OpenHands SDK SFT converter#233

Add OpenHands SDK SFT converter#233
neubig wants to merge 19 commits into
mainfrom
codex/openhands-sdk-sft-converter

neubig commented May 25, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 27, 2026

Uh oh!

github-actions Bot May 27, 2026

Uh oh!

neubig commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

neubig commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

OpenHands SDK Behavior

Current Dataset Status

Validation

Notes

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

neubig commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

neubig commented May 25, 2026 •

edited

Loading