Plan: Keep `dcx analytics` Aligned with BigQuery Agent Analytics SDK

Goal

Make dcx analytics track the BigQuery Agent Analytics SDK as the upstream product contract for BigQuery agent analytics.

The goal is not to clone every Python library feature into the Rust CLI. It is to ensure that, whenever the SDK exposes a stable analytics workflow, dcx analytics stays aligned in:

command inventory
flag names and semantics
environment-variable behavior where applicable
output schema and exit-code behavior
evaluator names and execution semantics
documentation and examples

What Upstream Looks Like Today

Based on the SDK README.md, SDK.md, and cli.py, the upstream CLI surface currently includes:

doctor
get-trace
evaluate
insights
drift
distribution
hitl-metrics
list-traces
views create-all
views create
categorical-eval
categorical-views

The upstream library also includes broader non-CLI capabilities:

trace reconstruction and filtering
LLM-as-judge
multi-trial evaluation
grader pipelines
eval suite management
eval validation
long-horizon memory
context graph
remote function deployment
continuous query templates

Current `dcx analytics` Surface

dcx analytics currently ships:

doctor
evaluate
get-trace
list-traces
insights
drift
distribution
hitl-metrics
views create-all

So the current gaps versus the SDK CLI are:

missing views create
missing categorical-eval
missing categorical-views
partial evaluate parity
partial flag parity on existing commands

Alignment Principle

Treat the SDK as the source of truth for BigQuery agent analytics.

Treat dcx as the source of truth for:

Rust CLI packaging and UX
cross-source Data Cloud expansion outside BigQuery
CLI-wide contracts such as --sanitize, JSON output, skill integration, shell completions, and Gemini manifest support

That means:

SDK-defined analytics workflows should converge into dcx analytics
dcx can add CLI-wide behavior, but should not silently drift on analytics semantics
any intentional divergence must be documented explicitly

Scope Boundary

To keep this tractable, split alignment into three layers.

Layer 1: Required parity

These should stay aligned continuously:

command names
flag names
evaluator names
primary output fields
exit-code semantics
env-var shortcuts for project and dataset

Layer 2: Fast-follow parity

These should land in dcx shortly after the SDK stabilizes them:

new analytics CLI commands
new evaluators and judge criteria
new report fields that materially affect automation

Layer 3: Optional CLI surfacing

These do not need immediate direct dcx analytics commands:

multi-trial evaluation
grader pipeline composition
eval suite lifecycle APIs
context graph APIs
memory service
deployment helpers such as remote functions and continuous queries

Those may later surface as:

new dcx analytics commands
doc-only references
config-driven workflows
or stay Python-library-only if they are not natural CLI operations

Contract Model

Define one generated compatibility contract for analytics.

Recommended artifacts:

scripts/update_analytics_sdk_contract.sh
tests/fixtures/upstream_sdk_latest/cli.py
tests/fixtures/upstream_sdk_latest/SDK.md
tests/fixtures/analytics_sdk_contract.json
docs/analytics_sdk_contract.md

The generated contract should record, per command:

command name
flags
required vs optional flags
env vars
output format fields
exit-code semantics
upstream source file / doc reference
parity status:
- exact
- intentional divergence
- missing
- dcx extension

This becomes the artifact reviewers use instead of ad hoc comparisons.
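
As a rough illustration of the record described above, one per-command entry could be modelled as follows. The field names (`required_flags`, `upstream_ref`, and so on) are assumptions made for this sketch; the plan specifies what is recorded, not the schema. The sample entry uses `views create`, which the gap map lists as missing.

```python
import json
from dataclasses import dataclass, field, asdict

# The four parity statuses named in the contract model.
PARITY_STATUSES = ("exact", "intentional divergence", "missing", "dcx extension")


@dataclass
class CommandContract:
    """One per-command contract record (hypothetical field names)."""
    command: str
    required_flags: list = field(default_factory=list)
    optional_flags: list = field(default_factory=list)
    env_vars: list = field(default_factory=list)
    output_fields: list = field(default_factory=list)
    exit_codes: dict = field(default_factory=dict)
    upstream_ref: str = ""
    parity: str = "missing"

    def __post_init__(self):
        # Reject statuses outside the agreed vocabulary.
        if self.parity not in PARITY_STATUSES:
            raise ValueError(f"unknown parity status: {self.parity!r}")


entry = CommandContract(command="views create", upstream_ref="cli.py", parity="missing")
contract_json = json.dumps(asdict(entry), indent=2, sort_keys=True)
print(contract_json)
```

Keeping the status vocabulary closed means a typo in a hand-edited divergence note fails loudly instead of producing a fifth, unreviewed status.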

Current Gap Map

1. Command parity gaps

Need to add:

dcx analytics views create
dcx analytics categorical-eval
dcx analytics categorical-views

2. `evaluate` parity gaps

SDK supports evaluator values:

latency
error_rate
turn_count
token_efficiency
ttft
cost
llm-judge

dcx currently supports a narrower set and uses kebab-case names such as error-rate.

Plan:

support both the SDK canonical names and dcx aliases where needed
choose one canonical JSON output spelling
document aliases explicitly
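
The alias plan above can be sketched as a small normalization step. This is only an illustration in Python; the real dcx implementation would live in the Rust CLI, and treating the SDK spelling as the canonical one is an assumption, since the plan leaves that choice open.

```python
# Evaluator values the SDK supports, as listed above.
SDK_EVALUATORS = (
    "latency",
    "error_rate",
    "turn_count",
    "token_efficiency",
    "ttft",
    "cost",
    "llm-judge",
)

# Index every evaluator by a separator-insensitive key so that
# "error-rate" and "error_rate" resolve to the same canonical name.
_BY_KEY = {name.replace("_", "-"): name for name in SDK_EVALUATORS}


def normalize_evaluator(name: str) -> str:
    """Return the canonical (SDK) spelling for an evaluator name or alias."""
    key = name.strip().replace("_", "-")
    if key not in _BY_KEY:
        raise ValueError(f"unknown evaluator: {name!r}")
    return _BY_KEY[key]


print(normalize_evaluator("error-rate"))  # kebab-case dcx alias
print(normalize_evaluator("llm_judge"))   # snake_case alias of llm-judge
```

Normalizing once at the CLI boundary keeps the JSON output spelling consistent regardless of which alias the user typed.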

3. Existing command flag gaps

Examples from the SDK CLI:

get-trace supports --session-id or --trace-id
insights supports --max-sessions
distribution supports --mode and --top-k
views create exists alongside views create-all

Each existing dcx command should be audited against the SDK definition.

4. Output-shape gaps

Even when command names match, the automation contract may drift.

Need to align:

top-level JSON field names
evaluation summary fields
per-session result fields
trace payload shape
error envelope expectations where analytics commands wrap SDK concepts

dcx can preserve its global {"error":"..."} envelope, but success payloads should stay close to SDK report structures wherever practical.

Proposed Architecture

1. Dynamic upstream fetch + checked-in cache

Add a small updater that fetches the latest upstream SDK contract from:

main/src/bigquery_agent_analytics/cli.py
main/SDK.md
optionally main/README.md

Store the fetched files under:

tests/fixtures/upstream_sdk_latest/

Then generate:

tests/fixtures/analytics_sdk_contract.json
docs/analytics_sdk_contract.md

Recommended operating model:

scheduled update job: fetch latest SDK main, regenerate the contract, and open a PR when the mapping changes
normal CI: use the checked-in generated files only

That gives dcx a dynamic mapping from the latest SDK without making routine CI runs depend on live GitHub availability.

2. Generated compatibility table in-repo

Generate one compatibility table that maps:

upstream SDK command -> dcx analytics command
upstream flag -> dcx flag
status and rationale

This should live in version control and be regenerated whenever:

the upstream SDK changes
dcx analytics changes

Reviewers should edit only intentional divergence notes, not the generated command inventory itself.

3. Golden CLI contract tests

Add tests that assert:

help output includes the expected commands and flags
evaluator enum accepts the expected values
JSON payload contains required keys
exit codes match SDK semantics where defined

These should be local contract tests in Rust, driven by the checked-in generated contract, not live SDK integration tests.
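
The required-keys assertion could take roughly the following shape. The tests themselves are meant to be written in Rust; Python is used here only to show the logic. The key names and the sample payload are hypothetical, not taken from the SDK or dcx.

```python
import json


def missing_keys(payload_json: str, required_keys) -> list:
    """Return the required top-level keys absent from a JSON payload."""
    payload = json.loads(payload_json)
    return sorted(key for key in required_keys if key not in payload)


# Hypothetical contract fragment and command output, for illustration only.
required = ["evaluator", "summary", "sessions"]
sample_output = '{"evaluator": "latency", "summary": {}}'

print(missing_keys(sample_output, required))
```

A test would assert that the returned list is empty for each command's captured output, so a failure names exactly which keys have drifted.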

4. Intentional divergence registry

Some divergence is fine, but it must be explicit.

Examples likely to remain divergent:

--sanitize
json|table|text renderer details
DCX_* environment variables vs BQ_AGENT_*
cross-source Data Cloud concepts outside BigQuery analytics

Known current divergences to seed into the initial generated contract:

--table (dcx) vs --table-id (SDK) — flag name mismatch
--location defaults to "US" in dcx, None in SDK
drift --min-coverage and drift --exit-code — dcx extensions, not in SDK
infrastructure error exit code: SDK uses exit 2, dcx does not
--limit missing on evaluate, insights, drift, distribution in dcx

Track these in the compatibility table with a reason.
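
One way to picture the registry is as an override table consulted before the default classification. The sketch below is hypothetical and is not the actual FLAG_OVERRIDES mechanism; the flag sets are a small illustrative subset built from the divergences listed above, and labelling the --table / --table-id mismatch as intentional is an assumption.

```python
def classify_flags(sdk_flags, dcx_flags, overrides):
    """Map each flag to (status, reason); overrides win over the defaults."""
    statuses = {}
    for flag in sorted(set(sdk_flags) | set(dcx_flags)):
        if flag in overrides:
            statuses[flag] = overrides[flag]
        elif flag in sdk_flags and flag in dcx_flags:
            statuses[flag] = ("exact", "")
        elif flag in sdk_flags:
            statuses[flag] = ("missing", "")
        else:
            statuses[flag] = ("dcx extension", "")
    return statuses


# Illustrative subset for the drift command.
sdk_drift = {"--table-id", "--limit"}
dcx_drift = {"--table", "--min-coverage", "--exit-code"}
overrides = {
    "--table-id": ("intentional divergence", "dcx spells this flag --table"),
    "--table": ("intentional divergence", "SDK spells this flag --table-id"),
}

for flag, (status, reason) in classify_flags(sdk_drift, dcx_drift, overrides).items():
    print(f"{flag:16} {status:24} {reason}")
```

Without the overrides, the renamed flag would show up twice, once as missing and once as a dcx extension, which is the misclassification an override table avoids.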

Dynamic Mapping Flow

Updater inputs

The updater should parse the latest SDK sources for:

command names from @app.command and views_app.command
flags/options from Typer definitions
evaluator names from _CODE_EVALUATORS
judge criteria from _LLM_JUDGES
env vars from shared options
output and exit-code notes from SDK.md
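
Command-name extraction could be done with Python's `ast` module rather than regular expressions. The sketch below is a minimal version under stated assumptions: the sample source is a made-up stand-in for cli.py, and the parser handles only positional command names, not a `name=` keyword.

```python
import ast

# Hypothetical stand-in for the upstream cli.py; only parsed, never imported.
SAMPLE_CLI = '''
import typer

app = typer.Typer()
views_app = typer.Typer()

@app.command("get-trace")
def get_trace():
    pass

@app.command()
def doctor():
    pass

@views_app.command("create-all")
def views_create_all():
    pass
'''


def extract_commands(source: str, groups: dict) -> list:
    """groups maps a Typer app variable name to its command prefix."""
    commands = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for deco in node.decorator_list:
            # Accept both @app.command and @app.command(...).
            call = deco if isinstance(deco, ast.Call) else None
            target = call.func if call else deco
            if not (isinstance(target, ast.Attribute)
                    and target.attr == "command"
                    and isinstance(target.value, ast.Name)
                    and target.value.id in groups):
                continue
            if call and call.args and isinstance(call.args[0], ast.Constant):
                name = call.args[0].value            # explicit command name
            else:
                name = node.name.replace("_", "-")   # Typer's default naming
            commands.append(f"{groups[target.value.id]} {name}".strip())
    return sorted(commands)


print(extract_commands(SAMPLE_CLI, {"app": "", "views_app": "views"}))
```

Parsing the syntax tree avoids importing the SDK or its dependencies, so the updater can run against a fetched file alone.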

Updater outputs

The updater should emit:

analytics_sdk_contract.json
analytics_sdk_contract.md
a machine-readable diff summary such as:
- new upstream commands
- removed upstream commands
- new flags
- removed flags
- changed evaluator values
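
The diff summary could be computed by comparing the previous and the regenerated contract. The contract shape used here (`commands` mapping to flag lists, plus an `evaluators` list) and the key names in the summary are assumptions for illustration; the sample data is also illustrative.

```python
import json


def diff_contracts(old: dict, new: dict) -> dict:
    """Summarize what changed between two contract snapshots."""
    old_cmds, new_cmds = old["commands"], new["commands"]
    shared = sorted(set(old_cmds) & set(new_cmds))
    new_flags, removed_flags = {}, {}
    for cmd in shared:
        added = sorted(set(new_cmds[cmd]) - set(old_cmds[cmd]))
        dropped = sorted(set(old_cmds[cmd]) - set(new_cmds[cmd]))
        if added:
            new_flags[cmd] = added
        if dropped:
            removed_flags[cmd] = dropped
    return {
        "new_upstream_commands": sorted(set(new_cmds) - set(old_cmds)),
        "removed_upstream_commands": sorted(set(old_cmds) - set(new_cmds)),
        "new_flags": new_flags,
        "removed_flags": removed_flags,
        # Evaluators present in exactly one of the two snapshots.
        "changed_evaluators": sorted(set(old["evaluators"]) ^ set(new["evaluators"])),
    }


old = {"commands": {"insights": ["--max-sessions"], "views create-all": []},
       "evaluators": ["latency", "error_rate"]}
new = {"commands": {"insights": ["--max-sessions", "--limit"],
                    "views create-all": [], "views create": []},
       "evaluators": ["latency", "error_rate", "ttft"]}

summary = diff_contracts(old, new)
print(json.dumps(summary, indent=2))
```

An empty summary gives the scheduled job a simple signal that no PR is needed.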

CI policy

pull request CI should validate dcx against the checked-in generated contract
a scheduled workflow should refresh from latest SDK main
when upstream changes, the workflow should open or update a tracking PR

This is the safest way to get "latest SDK" mapping without making local builds or PR CI flaky.

Milestones

Milestone A: Build the dynamic contract generator — Complete

Deliverables:

add scripts/update_analytics_sdk_contract.sh
add fetched upstream fixtures under tests/fixtures/upstream_sdk_latest/
generate docs/analytics_sdk_contract.md
generate tests/fixtures/analytics_sdk_contract.json
classify every current difference as exact / intentional divergence / missing / dcx extension

Done when:

every dcx analytics command is mapped to an upstream SDK command or marked as dcx-specific by generated output ✓

Milestone B: Reach CLI command parity — Complete

Deliverables:

add dcx analytics views create
add dcx analytics categorical-eval
add dcx analytics categorical-views

Done when:

dcx analytics matches the current SDK command inventory for stable CLI analytics workflows ✓

Milestone C: Reach flag and evaluator parity — Complete

Deliverables:

add missing evaluator values (ttft, cost, llm-judge)
support SDK-compatible evaluator spellings
add missing flags on get-trace, insights, distribution, and others as required by the contract table
implement LIMIT in SQL for evaluate (6 templates) and distribution
add runtime warnings for placeholder flags (--criterion, --strict, --mode, --top-k, --trace-id alias)
add validate_session_id() for SQL injection prevention on list-traces
add FLAG_OVERRIDES mechanism for accurate contract classification

Done when:

the compatibility table shows no missing items for stable flags/evaluators ✓

Milestone D: Reach output and exit-code parity — Complete

Deliverables:

align success JSON payloads where practical
add BqxError::InfraError variant with exit code 2 (SDK-compatible)
change generic error exit from 1 to 2 in main.rs
document any remaining intentional differences
add 6 output-key regression tests for all major result structs
add 4 exit-code regression tests

Done when:

analytics automation examples from SDK docs can be translated to dcx without semantic surprises ✓

Milestone E: Automate drift detection — Complete

Deliverables:

add contract-check CI job that regenerates the contract from checked-in fixtures and fails if the output differs
add a scheduled sdk-sync workflow (weekly Monday 09:00 UTC) that fetches the latest upstream SDK, regenerates the contract, and opens a PR
configure git identity for bot commits on CI runners
ensure sdk-sync label is created idempotently before use
use git diff -I to ignore cosmetic date changes in staleness check

Done when:

SDK analytics changes produce a visible, reviewable generated delta in dcx ✓

Operational Policy

Use this rule for future changes:

if the SDK adds or changes a stable analytics CLI command, open a matching dcx analytics tracking issue within one release cycle
if dcx intentionally diverges, document it in the compatibility table in the same PR
do not merge analytics UX changes in dcx without checking the upstream SDK contract

Recommended First PRs

PR 1: Dynamic contract generator

add updater script
add generated contract JSON/Markdown
add fetched upstream fixtures

PR 2: Missing command parity

add views create
add categorical-eval
add categorical-views

PR 3: Evaluator parity

add ttft
add cost
add llm-judge
support SDK naming aliases

PR 4: Output and exit-code audit

align output keys
align exit behavior
add regression coverage

Definition of Done

This alignment effort is complete when:

dcx analytics covers the stable SDK CLI analytics commands
evaluator and flag semantics are documented and tested
all remaining differences are intentional and written down
the latest SDK contract can be re-fetched and regenerated by script
CI has a lightweight drift check so the two surfaces do not silently diverge
a scheduled updater keeps dcx tracking latest upstream changes
