Make dcx analytics track the BigQuery Agent Analytics SDK as the upstream product contract for BigQuery agent analytics.
The goal is not to blindly clone every Python library feature into the Rust CLI.
The goal is to make sure that when the SDK exposes a stable analytics workflow,
dcx analytics stays aligned in:
- command inventory
- flag names and semantics
- environment-variable behavior where applicable
- output schema and exit-code behavior
- evaluator names and execution semantics
- documentation and examples
Based on the SDK README.md, SDK.md, and cli.py, the upstream CLI surface
currently includes:
- doctor
- get-trace
- evaluate
- insights
- drift
- distribution
- hitl-metrics
- list-traces
- views create-all
- views create
- categorical-eval
- categorical-views
The upstream library also includes broader non-CLI capabilities:
- trace reconstruction and filtering
- LLM-as-judge
- multi-trial evaluation
- grader pipelines
- eval suite management
- eval validation
- long-horizon memory
- context graph
- remote function deployment
- continuous query templates
dcx analytics currently ships:
- doctor
- evaluate
- get-trace
- list-traces
- insights
- drift
- distribution
- hitl-metrics
- views create-all
So the current gaps versus the SDK CLI are:
- missing views create
- missing categorical-eval
- missing categorical-views
- partial evaluate parity
- partial flag parity on existing commands
Treat the SDK as the source of truth for BigQuery agent analytics.
Treat dcx as the source of truth for:
- Rust CLI packaging and UX
- cross-source Data Cloud expansion outside BigQuery
- CLI-wide contracts such as --sanitize, JSON output, skill integration, shell completions, and Gemini manifest support
That means:
- SDK-defined analytics workflows should converge into dcx analytics
- dcx can add CLI-wide behavior, but should not silently drift on analytics semantics
- any intentional divergence must be documented explicitly
To keep this tractable, split alignment into three layers.
These should stay aligned continuously:
- command names
- flag names
- evaluator names
- primary output fields
- exit-code semantics
- env-var shortcuts for project and dataset
These should land in dcx shortly after the SDK stabilizes them:
- new analytics CLI commands
- new evaluators and judge criteria
- new report fields that materially affect automation
These do not need immediate direct dcx analytics commands:
- multi-trial evaluation
- grader pipeline composition
- eval suite lifecycle APIs
- context graph APIs
- memory service
- deployment helpers such as remote functions and continuous queries
Those may later surface as:
- new dcx analytics commands
- doc-only references
- config-driven workflows
- or stay Python-library-only if they are not natural CLI operations
Define one generated compatibility contract for analytics.
Recommended artifacts:
- scripts/update_analytics_sdk_contract.sh
- tests/fixtures/upstream_sdk_latest/cli.py
- tests/fixtures/upstream_sdk_latest/SDK.md
- tests/fixtures/analytics_sdk_contract.json
- docs/analytics_sdk_contract.md
The contract should record, per command:
- command name
- flags
- required vs optional flags
- env vars
- output format fields
- exit-code semantics
- upstream source file / doc reference
- parity status: exact, intentional divergence, missing, or dcx extension
This becomes the artifact reviewers use instead of ad hoc comparisons.
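One possible shape for a single contract entry is sketched below. This is a minimal illustration, not the actual schema: the class names (`FlagEntry`, `CommandEntry`), the field names, and the `to_contract_json` helper are all hypothetical, and the example values are placeholders rather than verified facts about the SDK.

```python
import json
from dataclasses import asdict, dataclass, field

# Parity statuses named in the plan; the tuple name is hypothetical.
PARITY_STATUSES = ("exact", "intentional divergence", "missing", "dcx extension")


@dataclass
class FlagEntry:
    name: str
    required: bool = False


@dataclass
class CommandEntry:
    name: str
    flags: list = field(default_factory=list)          # list of FlagEntry
    env_vars: list = field(default_factory=list)       # environment variable names
    output_fields: list = field(default_factory=list)  # primary output field names
    exit_codes: dict = field(default_factory=dict)     # e.g. {"2": "infrastructure error"}
    source_ref: str = ""                                # upstream file or doc reference
    parity: str = "missing"

    def __post_init__(self):
        # Reject statuses outside the agreed vocabulary early.
        if self.parity not in PARITY_STATUSES:
            raise ValueError(f"unknown parity status: {self.parity!r}")


def to_contract_json(entries):
    """Serialize entries deterministically so regenerated output diffs cleanly."""
    commands = {e.name: asdict(e) for e in sorted(entries, key=lambda e: e.name)}
    return json.dumps({"commands": commands}, indent=2, sort_keys=True)


# Illustrative entry only: views create is listed above as missing in dcx.
example = CommandEntry(
    name="views create",
    source_ref="tests/fixtures/upstream_sdk_latest/cli.py",
    parity="missing",
)
print(to_contract_json([example]))
```

Sorting commands and keys keeps the generated JSON stable across runs, which matters if a CI job later compares regenerated output against the checked-in file.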
Need to add:
- dcx analytics views create
- dcx analytics categorical-eval
- dcx analytics categorical-views
SDK supports evaluator values:
- latency
- error_rate
- turn_count
- token_efficiency
- ttft
- cost
- llm-judge
dcx currently supports a narrower set and uses kebab-case names such as error-rate.
Plan:
- support both the SDK canonical names and dcx aliases where needed
- choose one canonical JSON output spelling
- document aliases explicitly
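The alias plan above could be implemented as a small normalization step. The sketch below assumes the SDK spellings listed earlier are the canonical ones and that aliases differ only by case and by hyphen versus underscore; the function and variable names are hypothetical. Note that a blanket hyphen-to-underscore replacement would not work, because llm-judge is itself hyphenated in the SDK.

```python
# SDK evaluator values as listed in this plan, treated here as canonical.
SDK_EVALUATORS = (
    "latency",
    "error_rate",
    "turn_count",
    "token_efficiency",
    "ttft",
    "cost",
    "llm-judge",
)


def _fold(name: str) -> str:
    """Collapse case and separator differences so aliases compare equal."""
    return name.strip().lower().replace("_", "-")


# Lookup from folded spelling to the canonical SDK spelling.
_BY_FOLDED = {_fold(name): name for name in SDK_EVALUATORS}


def canonical_evaluator(name: str) -> str:
    """Map an SDK name or a kebab-case alias to the canonical SDK spelling."""
    try:
        return _BY_FOLDED[_fold(name)]
    except KeyError:
        raise ValueError(f"unknown evaluator: {name!r}") from None


print(canonical_evaluator("error-rate"))  # error_rate
print(canonical_evaluator("llm-judge"))   # llm-judge
```

With this approach the canonical JSON output spelling is whatever appears in `SDK_EVALUATORS`, and every accepted alias is derivable from that one list, which keeps the alias documentation easy to generate.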
Examples from the SDK CLI:
- get-trace supports --session-id or --trace-id
- insights supports --max-sessions
- distribution supports --mode and --top-k
- views create exists alongside views create-all
Each existing dcx command should be audited against the SDK definition.
Even when command names match, the automation contract may drift.
Need to align:
- top-level JSON field names
- evaluation summary fields
- per-session result fields
- trace payload shape
- error envelope expectations where analytics commands wrap SDK concepts
dcx can preserve its global {"error":"..."} envelope, but success payloads
should stay close to SDK report structures wherever practical.
Add a small updater that fetches the latest upstream SDK contract from:
- main/src/bigquery_agent_analytics/cli.py
- main/SDK.md
- optionally main/README.md
Store the fetched files under:
tests/fixtures/upstream_sdk_latest/
Then generate:
- tests/fixtures/analytics_sdk_contract.json
- docs/analytics_sdk_contract.md
Recommended operating model:
- scheduled update job: fetch latest SDK main, regenerate the contract, and open a PR when the mapping changes
- normal CI: use the checked-in generated files only
That gives dcx a dynamic mapping from the latest SDK without making routine
CI runs depend on live GitHub availability.
Generate one compatibility table that maps:
- upstream SDK command -> dcx analytics command
- upstream flag -> dcx flag
- status and rationale
This should live in version control and be regenerated whenever:
- the upstream SDK changes
- dcx analytics changes
Reviewers should edit only intentional divergence notes, not the generated command inventory itself.
Add tests that assert:
- help output includes the expected commands and flags
- evaluator enum accepts the expected values
- JSON payload contains required keys
- exit codes match SDK semantics where defined
These should be local contract tests in Rust, driven by the checked-in generated contract, not live SDK integration tests.
Some divergence is fine, but it must be explicit.
Examples likely to remain divergent:
- --sanitize
- json|table|text renderer details
- DCX_* environment variables vs BQ_AGENT_*
- cross-source Data Cloud concepts outside BigQuery analytics
Known current divergences to seed into the initial generated contract:
- --table (dcx) vs --table-id (SDK): flag name mismatch
- --location defaults to "US" in dcx, None in SDK
- drift --min-coverage and drift --exit-code: dcx extensions, not in SDK
- infrastructure error exit code: SDK uses exit 2, dcx does not
- --limit missing on evaluate, insights, drift, distribution in dcx
Track these in the compatibility table with a reason.
The updater should parse the latest SDK sources for:
- command names from @app.command and views_app.command
- flags/options from Typer definitions
- evaluator names from _CODE_EVALUATORS
- judge criteria from _LLM_JUDGES
- env vars from shared options
- output and exit-code notes from SDK.md
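The command-name step of the parsing described above can be sketched with Python's standard `ast` module, which reads the fetched cli.py without importing or executing it. This is a minimal sketch under stated assumptions: the `extract_commands` name is hypothetical, the sample source is invented for illustration and is not the real cli.py, and the fallback from function name to command name (underscores replaced by hyphens) reflects Typer's default naming as understood here. Mapping views_app entries to the "views ..." prefix is left to the caller.

```python
import ast


def extract_commands(source, app_names=("app", "views_app")):
    """Return sorted (app_name, command_name) pairs for <app>.command decorators."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            call = dec if isinstance(dec, ast.Call) else None
            target = call.func if call else dec
            if not (
                isinstance(target, ast.Attribute)
                and target.attr == "command"
                and isinstance(target.value, ast.Name)
                and target.value.id in app_names
            ):
                continue
            name = None
            if call:
                # Explicit name given positionally or as name="...".
                if call.args and isinstance(call.args[0], ast.Constant):
                    if isinstance(call.args[0].value, str):
                        name = call.args[0].value
                for kw in call.keywords:
                    if kw.arg == "name" and isinstance(kw.value, ast.Constant):
                        name = kw.value.value
            if name is None:
                # Assumed default: derive the command name from the function name.
                name = node.name.lower().replace("_", "-")
            found.append((target.value.id, name))
    return sorted(found)


# Invented sample input for illustration; not taken from the SDK.
SAMPLE = '''
@app.command()
def doctor(): ...

@app.command("get-trace")
def get_trace_cmd(): ...

@views_app.command("create-all")
def views_create_all(): ...

@app.command()
def list_traces(): ...
'''

for app_name, command in extract_commands(SAMPLE):
    print(app_name, command)
```

Parsing the syntax tree rather than importing the module means the updater does not need the SDK's dependencies installed, which keeps the scheduled job lightweight.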
The updater should emit:
- analytics_sdk_contract.json
- analytics_sdk_contract.md
- a machine-readable diff summary such as:
  - new upstream commands
  - removed upstream commands
  - new flags
  - removed flags
  - changed evaluator values
CI workflow:
- pull request CI should validate dcx against the checked-in generated contract
- a scheduled workflow should refresh from latest SDK main
- when upstream changes, the workflow should open or update a tracking PR
This is the safest way to get "latest SDK" mapping without making local builds or PR CI flaky.
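The machine-readable diff summary that the updater emits could be computed by comparing two contract snapshots. The sketch below assumes a contract shaped as a dict with a `commands` mapping (each command holding a `flags` list of objects with a `name` key) and a top-level `evaluators` list; that shape, the summary keys, and the function names are assumptions for illustration only.

```python
def _flag_names(command):
    """Collect flag names from one command entry (assumed shape)."""
    return {flag["name"] for flag in command.get("flags", [])}


def diff_contracts(old, new):
    """Summarize what changed between two contract snapshots."""
    old_cmds = old.get("commands", {})
    new_cmds = new.get("commands", {})
    old_evals = set(old.get("evaluators", []))
    new_evals = set(new.get("evaluators", []))
    summary = {
        "new_commands": sorted(set(new_cmds) - set(old_cmds)),
        "removed_commands": sorted(set(old_cmds) - set(new_cmds)),
        "new_flags": {},
        "removed_flags": {},
        "evaluators_added": sorted(new_evals - old_evals),
        "evaluators_removed": sorted(old_evals - new_evals),
    }
    # Flag changes are only meaningful for commands present in both snapshots.
    for name in sorted(set(old_cmds) & set(new_cmds)):
        before, after = _flag_names(old_cmds[name]), _flag_names(new_cmds[name])
        if after - before:
            summary["new_flags"][name] = sorted(after - before)
        if before - after:
            summary["removed_flags"][name] = sorted(before - after)
    return summary


def has_changes(summary):
    """True when any section of the summary is non-empty."""
    return any(summary.values())


# Illustrative snapshots; the contents are placeholders.
old = {
    "commands": {"get-trace": {"flags": [{"name": "--session-id"}]}},
    "evaluators": ["latency"],
}
new = {
    "commands": {
        "get-trace": {"flags": [{"name": "--session-id"}, {"name": "--trace-id"}]},
        "views create": {"flags": []},
    },
    "evaluators": ["latency", "ttft"],
}
print(diff_contracts(old, new))
```

A scheduled workflow could open or update the tracking PR only when `has_changes` returns true, so unchanged upstream fetches produce no noise.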
Deliverables:
- add scripts/update_analytics_sdk_contract.sh
- add fetched upstream fixtures under tests/fixtures/upstream_sdk_latest/
- generate docs/analytics_sdk_contract.md
- generate tests/fixtures/analytics_sdk_contract.json
- classify every current difference as exact / missing / intentional
Done when:
- every dcx analytics command is mapped to an upstream SDK command or marked as dcx-specific by generated output ✓
Deliverables:
- add dcx analytics views create
- add dcx analytics categorical-eval
- add dcx analytics categorical-views
Done when:
- dcx analytics matches the current SDK command inventory for stable CLI analytics workflows ✓
Deliverables:
- add missing evaluator values (ttft, cost, llm-judge)
- support SDK-compatible evaluator spellings
- add missing flags on get-trace, insights, distribution, and others as required by the contract table
- implement LIMIT in SQL for evaluate (6 templates) and distribution
- add runtime warnings for placeholder flags (--criterion, --strict, --mode, --top-k, --trace-id alias)
- add validate_session_id() for SQL injection prevention on list-traces
- add FLAG_OVERRIDES mechanism for accurate contract classification
Done when:
- the compatibility table shows no missing items for stable flags/evaluators ✓
Deliverables:
- align success JSON payloads where practical
- add BqxError::InfraError variant with exit code 2 (SDK-compatible)
- change generic error exit from 1 to 2 in main.rs
- document any remaining intentional differences
- add 6 output-key regression tests for all major result structs
- add 4 exit-code regression tests
Done when:
- analytics automation examples from SDK docs can be translated to dcx without semantic surprises ✓
Deliverables:
- add contract-check CI job that regenerates the contract from checked-in fixtures and fails if the output differs
- add a scheduled sdk-sync workflow (weekly Monday 09:00 UTC) that fetches the latest upstream SDK, regenerates the contract, and opens a PR
- configure git identity for bot commits on CI runners
- ensure sdk-sync label is created idempotently before use
- use git diff -I to ignore cosmetic date changes in staleness check
Done when:
- SDK analytics changes produce a visible, reviewable generated delta in dcx ✓
Use this rule for future changes:
- if the SDK adds or changes a stable analytics CLI command, open a matching dcx analytics tracking issue within one release cycle
- if dcx intentionally diverges, document it in the compatibility table in the same PR
- do not merge analytics UX changes in dcx without checking the upstream SDK contract
- add updater script
- add generated contract JSON/Markdown
- add fetched upstream fixtures
- add views create
- add categorical-eval
- add categorical-views
- add ttft
- add cost
- add llm-judge
- support SDK naming aliases
- align output keys
- align exit behavior
- add regression coverage
This alignment effort is complete when:
- dcx analytics covers the stable SDK CLI analytics commands
- evaluator and flag semantics are documented and tested
- all remaining differences are intentional and written down
- the latest SDK contract can be re-fetched and regenerated by script
- CI has a lightweight drift check so the two surfaces do not silently diverge
- a scheduled updater keeps dcx tracking latest upstream changes