Peer review request: AANA verifier-gated architecture for DataAgentBench
Summary
I would like to submit the Alignment-Aware Neural Architecture (AANA) platform for DataAgentBench maintainer review as a candidate verifier-gated data-agent architecture.
AANA is a runtime architecture for verifier-grounded correction. It wraps a base generator with explicit verifier modules, evidence retrieval, a correction policy, and an alignment gate so an agent action or answer can be routed to accept, revise, retrieve, ask, refuse, or defer before final output.
I am not submitting a DAB leaderboard score in this issue. The README describes a leaderboard submission as a JSON file with repeated runs across all queries and datasets, and current public submissions vary between website-tier 5-run submissions and the README's broader 50-run guidance. I do not want to overclaim a DAB result before wiring AANA into the DAB harness and running the required query set.
Why AANA is relevant to DAB
DataAgentBench stresses agents on realistic enterprise data workloads:
- Multi-database integration.
- Ill-formatted key joins.
- Unstructured text transformation.
- Domain knowledge.
- Read-only query/tool use.
- Final answer validation.
Those are exactly the surfaces where a verifier-gated architecture can be useful:
- Gate generated SQL or tool plans against read-only and dataset-scope constraints.
- Require evidence/source IDs and freshness metadata before final answers.
- Route underspecified joins or missing schema evidence to retrieve or ask instead of hallucinating.
- Track answer provenance and verifier decisions without publishing raw prompts or evidence text.
- Defer high-risk data operations when database configuration, schema, or validator evidence is incomplete.
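The gate behaviors above can be sketched as a small routing function. This is an illustrative sketch only: the `VerifierReport` fields and the ordering of checks are assumptions for exposition, not AANA's or DAB's actual interfaces.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ACCEPT = "accept"
    REVISE = "revise"
    RETRIEVE = "retrieve"
    ASK = "ask"
    REFUSE = "refuse"
    DEFER = "defer"

@dataclass
class VerifierReport:
    read_only_ok: bool     # SQL/tool plan contains no write operations
    in_dataset_scope: bool # plan stays inside the allowed datasets
    schema_resolved: bool  # join keys are backed by schema evidence
    has_evidence: bool     # source IDs and freshness metadata present
    underspecified: bool   # join intent is ambiguous; needs a user question
    high_risk: bool        # config/schema/validator evidence incomplete

def route(report: VerifierReport) -> Action:
    """Map verifier findings to a gate decision (illustrative policy only)."""
    if not report.read_only_ok or not report.in_dataset_scope:
        return Action.REFUSE    # hard constraint violation: block outright
    if report.high_risk:
        return Action.DEFER     # escalate instead of acting on incomplete evidence
    if report.underspecified:
        return Action.ASK       # ambiguous join: ask rather than guess
    if not report.schema_resolved:
        return Action.RETRIEVE  # fetch schema evidence, don't hallucinate
    if not report.has_evidence:
        return Action.REVISE    # answer lacks provenance; regenerate with sources
    return Action.ACCEPT
```

The key design point is that refuse/defer checks run before retrieve/revise, so hard constraints can never be "corrected around".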
Architecture under review
- System model: S = (f_theta, E_phi, R, Pi_psi, G), where:
  - f_theta: base model or data-agent generator.
  - E_phi: verifier stack for factual, SQL/tool, schema, policy, and task constraints.
  - R: retrieval or grounding module for schema, query, validator, and run-log evidence.
  - Pi_psi: correction policy that chooses revise/retrieve/ask/refuse/defer paths.
  - G: alignment gate that blocks direct acceptance unless verifier and AIx criteria pass.
- AIx output: normalized score, layer components, risk tier, beta, decision, and hard blockers.
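A minimal sketch of how S = (f_theta, E_phi, R, Pi_psi, G) could compose at runtime, with Pi_psi and G realized as the loop's control flow. All names are hypothetical, and the `AIxDecision` fields shown are a subset of the AIx output listed above:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AIxDecision:
    score: float             # normalized AIx score in [0, 1]
    risk_tier: str           # e.g. "low" / "medium" / "high"
    decision: str            # accept / revise / retrieve / ask / refuse / defer
    hard_blockers: list[str] # constraint violations that force a block

@dataclass
class System:
    generate: Callable[[str], str]             # f_theta: base generator
    verify: Callable[[str, str], AIxDecision]  # E_phi plus AIx scoring
    retrieve: Callable[[str], str]             # R: evidence for the query

def run(system: System, query: str, max_rounds: int = 3) -> tuple[str, str]:
    """Correction loop: the gate G blocks direct acceptance until verifiers pass."""
    evidence = ""
    answer = system.generate(query)
    for _ in range(max_rounds):
        d = system.verify(answer, evidence)
        if d.hard_blockers:
            return ("", "refuse")          # G: hard blockers always win
        if d.decision == "accept":
            return (answer, "accept")
        if d.decision == "retrieve":
            evidence = system.retrieve(query)
        # Pi_psi: revise with whatever evidence has been gathered so far
        answer = system.generate(query + "\n" + evidence)
    return ("", "defer")  # budget exhausted: defer rather than emit unverified output
```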
Fresh benchmark evidence already run
I ran AANA locally on the HarmActionsEval-style dataset from Agent-Action-Guard as adjacent agent-action safety evidence:
eval_outputs/benchmark_scout/aana_harmactions_latest_results.json
These are not DataAgentBench scores. They are included only to show that the AANA gate has been exercised on a public agent-action benchmark before attempting a DAB run.
Proposed DAB integration path
- Implement an AANA-wrapped data agent using DAB's customized-agent path.
- Use AANA verifiers around tool/query planning, schema evidence, read-only constraints, and final-answer grounding.
- Run a pilot query first to validate the integration and log format.
- Run the accepted DAB submission tier across all datasets and queries.
- Submit a PR with the required JSON result file and an agent configuration note covering base model, run count, dataset hints, verifier settings, and caveats.
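To make the integration path concrete, here is a thin wrapper sketch for the read-only gate around SQL tool calls. The `base_agent` interface and class names are assumptions for illustration, not DAB's actual customized-agent API, and a production gate would parse statements properly rather than keyword-match:

```python
class AANAWrappedAgent:
    """Hypothetical wrapper: intercepts the base agent's SQL tool calls and
    routes each through a read-only gate before it reaches the database.
    Verifier decisions are logged for provenance without raw prompt text."""

    # Naive allow-list of read-only statement verbs (illustrative only).
    READ_ONLY_PREFIXES = ("select", "show", "describe", "explain", "with")

    def __init__(self, base_agent):
        self.base_agent = base_agent
        self.decision_log = []  # provenance: gate decisions, no raw prompts

    def run_query(self, sql: str):
        verb = sql.strip().split(None, 1)[0].lower() if sql.strip() else ""
        allowed = verb in self.READ_ONLY_PREFIXES
        self.decision_log.append(
            {"tool": "sql", "verb": verb, "decision": "accept" if allowed else "refuse"}
        )
        if not allowed:
            raise PermissionError(f"read-only gate refused statement: {verb!r}")
        return self.base_agent.run_query(sql)
```

The same interception pattern would apply to schema-evidence and final-answer-grounding verifiers, each appending its decision to the provenance log.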
Review request
Would the DAB maintainers be willing to review AANA as a candidate verifier-gated data-agent architecture and advise which submission tier/format is preferred before I produce a full DAB score submission?