test(e2e): run onboarding assertions in scenario runner #4657

Draft
cv wants to merge 1 commit into ci/e2e-scenario-dry-run-summary from test/e2e-onboarding-assertions-runner

Conversation

cv (Collaborator) commented Jun 2, 2026

Summary

Runs declared YAML onboarding assertions from the shell scenario runner after setup/onboarding and before expected-state validation. Dry-run mode traces and reports the assertions without executing live assertion scripts.

Related Issue

Refs #3588.

Changes

  • Resolve onboarding_assertions from plan.json and nemoclaw_scenarios/scenarios.yaml inside runtime/run-scenario.sh.
  • Execute positive and negative preflight onboarding assertions at the correct point in the runner flow.
  • Add dry-run tracing and regression expectations for the baseline OpenClaw scenario.
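
A minimal sketch of the flow described above, not the code in this PR. The helper `assertion_script_for`, the `E2E_DRY_RUN` variable, and the PASS/FAIL line format are hypothetical stand-ins; the real runner resolves IDs from plan.json and scripts from scenarios.yaml, which this sketch replaces with a hard-coded lookup.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assertion_script_for, E2E_DRY_RUN, and the output
# format are illustrative assumptions, not the runner's actual code.
set -eu

E2E_ROOT="${E2E_ROOT:-$(pwd)}"
E2E_DRY_RUN="${E2E_DRY_RUN:-1}"

# Stand-in for the id -> script lookup that the runner derives from metadata.
assertion_script_for() {
  case "$1" in
    preflight-passed) echo "onboarding_assertions/preflight/00-preflight-passed.sh" ;;
    *) return 1 ;;
  esac
}

# Runs each assertion ID given as an argument, in order.
run_onboarding_assertions() {
  local id script full_path
  echo "== onboarding assertions =="
  for id in "$@"; do
    script="$(assertion_script_for "${id}")" || {
      echo "FAIL ${id}: unknown onboarding assertion" >&2
      return 1
    }
    if [ "${E2E_DRY_RUN}" = "1" ]; then
      # Dry-run: trace and report the assertion without executing its script.
      echo "PASS ${id} (dry-run, skipped ${script})"
      continue
    fi
    full_path="${E2E_ROOT}/${script}"
    if [ ! -f "${full_path}" ]; then
      echo "FAIL ${id}: missing script ${script}" >&2
      return 1
    fi
    # Live mode: a non-zero exit from the script fails the whole run.
    bash "${full_path}" || {
      echo "FAIL ${id}" >&2
      return 1
    }
    echo "PASS ${id}"
  done
}
```

Calling `run_onboarding_assertions preflight-passed` with `E2E_DRY_RUN=1` prints the section header and a traced PASS line without touching the script; with `E2E_DRY_RUN=0` it executes the script under `E2E_ROOT` and fails closed on an unknown ID, a missing script, or a non-zero exit.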

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Additional verification run:

  • bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --dry-run
  • E2E_CONTEXT_DIR=$(mktemp -d) bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-no-docker-preflight-negative --dry-run
  • npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts test/e2e-scenario/framework-tests/e2e-scenario-resolver.test.ts --silent=false --reporter=default
  • npx prek run --files test/e2e-scenario/runtime/run-scenario.sh test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

@cv cv self-assigned this Jun 2, 2026

copy-pr-bot Bot commented Jun 2, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

coderabbitai Bot commented Jun 2, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5ac1f7ab-9b24-4068-bed5-761f78bdb168

github-actions Bot commented Jun 2, 2026

E2E Advisor Recommendation

Required E2E: e2e-scenarios/ubuntu-repo-cloud-openclaw, e2e-scenarios/ubuntu-no-docker-preflight-negative, e2e-scenarios/ubuntu-invalid-nvidia-key-negative
Optional E2E: e2e-scenarios/ubuntu-gateway-port-conflict-negative, e2e-scenarios/ubuntu-repo-cloud-hermes

Dispatch hint: ubuntu-repo-cloud-openclaw,ubuntu-no-docker-preflight-negative,ubuntu-invalid-nvidia-key-negative

Full advisor summary

E2E Recommendation Advisor

Base: origin/ci/e2e-scenario-dry-run-summary
Head: HEAD
Confidence: medium

Required E2E

  • e2e-scenarios/ubuntu-repo-cloud-openclaw (medium): Primary canonical Ubuntu repo + Docker + cloud OpenClaw scenario. It is the scenario explicitly updated in the framework test and should verify onboarding assertion ordering around install, onboarding, gateway, sandbox, and dry-run expected-state validation.
  • e2e-scenarios/ubuntu-no-docker-preflight-negative (medium): Covers the preflight expected-failure path where run_onboarding_assertions is now called after forced Docker/preflight failure and before failure matching exits.
  • e2e-scenarios/ubuntu-invalid-nvidia-key-negative (medium): Covers an onboarding expected-failure scenario using the expected_failure branch now affected by onboarding assertion execution and side-effect checks.

Optional E2E

  • e2e-scenarios/ubuntu-gateway-port-conflict-negative (medium): Additional negative onboarding confidence for a different failure class and side-effect boundary around gateway startup/port conflicts.
  • e2e-scenarios/ubuntu-repo-cloud-hermes (medium): Useful adjacent coverage that the shared onboarding assertion hook does not regress the Hermes cloud onboarding scenario family.

New E2E recommendations

  • shell-scenario-runner-coverage (high): The existing scenario workflow dispatches test/e2e-scenario/scenarios/run.ts --dry-run, while this PR directly changes test/e2e-scenario/runtime/run-scenario.sh. Add a workflow-dispatched E2E job for the shell runner so future changes to run-scenario.sh are covered outside Vitest framework tests.
    • Suggested test: Add an E2E workflow/job that runs: bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --dry-run and bash test/e2e-scenario/runtime/run-scenario.sh ubuntu-no-docker-preflight-negative --dry-run, uploading E2E_CONTEXT_DIR artifacts on failure.
  • live-onboarding-assertions (medium): Dry-run validates ordering and trace output, but live assertion scripts can fail due to PATH, credentials, sandbox, or preflight state differences. A small live smoke would catch integration bugs in assertion script invocation.
    • Suggested test: Add a live Ubuntu repo cloud OpenClaw shell-runner smoke that executes onboarding assertions before suite dispatch, gated on NVIDIA_API_KEY and using existing cleanup/artifact conventions.

Dispatch hint

  • Workflow: .github/workflows/e2e-scenarios.yaml
  • jobs input: ubuntu-repo-cloud-openclaw,ubuntu-no-docker-preflight-negative,ubuntu-invalid-nvidia-key-negative

github-actions Bot commented Jun 2, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/ci/e2e-scenario-dry-run-summary
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Scenario runtime runner code changed in test/e2e-scenario/runtime/run-scenario.sh, adding onboarding assertion execution across normal and negative scenario paths. Runtime/runner changes can affect every scenario, so the full scenario fan-out is required.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts
  • test/e2e-scenario/runtime/run-scenario.sh

github-actions Bot commented Jun 2, 2026

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Top item: Constrain onboarding assertion script execution to trusted paths

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Onboarding assertion script resolution in run-scenario.sh: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `run_onboarding_assertions()` parses `plan.json` for IDs, parses `scenarios.yaml` for scripts, and then executes the resulting path.
  • Constrain metadata-driven assertion script execution (test/e2e-scenario/runtime/run-scenario.sh:275): The new runner resolves an assertion script from repository YAML and executes `bash "${full_path}"` after prefixing it with `E2E_ROOT`. The value is quoted, so this is not direct shell metacharacter injection, but the execution point does not normalize the path or enforce that it remains under an expected directory such as `test/e2e-scenario/onboarding_assertions/`. It also re-reads `scenarios.yaml` instead of executing already-resolved assertion metadata from `plan.json`, leaving the runner with a second source of truth for script paths.
    • Recommendation: Before executing, validate assertion IDs and script paths against a narrow allowlist: normalize/realpath the path, reject absolute paths and `..`, require the resolved file to be under the onboarding assertions directory, and consider having the resolver emit `{id, script, assertion_id}` into `plan.json` so the runner executes the resolved plan rather than re-parsing YAML.
    • Evidence: `run_onboarding_assertions()` loads `scenarios?.onboarding_assertions?.[id]`, reads `assertion.script`, builds `full_path="${E2E_ROOT}/${script}"`, checks only `-f`, and then runs `bash "${full_path}"`.
  • Add negative and live-mode coverage for onboarding assertion execution (test/e2e-scenario/framework-tests/e2e-scenario-first-migration.test.ts:85): The committed test update verifies only the positive dry-run path where assertion scripts are intentionally skipped. The new behavior also has important live and failure semantics: unknown assertion IDs, missing or non-contained scripts, assertion script failures, and negative-preflight assertion execution should fail closed at the intended point.
    • Recommendation: Add framework tests that exercise `run_onboarding_assertions()` through `run-scenario.sh` with a temporary or fixture plan/metadata path if needed: successful live execution of a harmless assertion, unknown assertion ID, missing script, rejected path traversal/non-contained script, script failure exit propagation, and the negative-preflight dry-run path claimed in the PR verification.
    • Evidence: The added assertions check `== onboarding assertions ==` and dry-run PASS output for `ubuntu-repo-cloud-openclaw`; there is no committed test covering actual `bash "${full_path}"` execution or the error branches in `run_onboarding_assertions()`.
  • Live preflight assertion may false-fail on benign Docker/container log text (test/e2e-scenario/runtime/run-scenario.sh:475): This PR begins running the existing `preflight-passed` assertion in live positive scenarios. That assertion fails if `onboard.log` contains broad terms such as `docker`, `container`, `daemon`, or `socket`, even if the onboarding completed successfully and the log only contains informational text. The new call site makes that broad grep part of the live scenario gate.
    • Recommendation: Tighten the assertion to match explicit failure/error patterns rather than standalone infrastructure terms, or base it on structured onboarding/preflight status where available. Add a regression test with a successful onboarding log that mentions Docker/container informationally.
    • Evidence: `run-scenario.sh` now calls `run_onboarding_assertions()` after successful onboarding; `onboarding_assertions/preflight/00-preflight-passed.sh` fails on `grep -Eiq "preflight.*(fail|error)|docker|container|daemon|socket" "${E2E_CONTEXT_DIR}/onboard.log"`.
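
The path-containment and log-matching recommendations above could be sketched roughly as follows. This is an illustration under stated assumptions, not the fix adopted in this PR: `resolve_assertion_script` and `log_has_preflight_failure` are hypothetical names, and the tightened pattern is one possible choice rather than the assertion's real regex.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: both function names and the tightened pattern are
# assumptions made for illustration.
set -eu

# Resolve a metadata-supplied script path and require it to stay under
# <root>/onboarding_assertions/. Prints the resolved path on success.
resolve_assertion_script() {
  local root="$1" script="$2" allowed dir resolved
  case "${script}" in
    /*) echo "rejected absolute path: ${script}" >&2; return 1 ;;
    ..|../*|*/../*|*/..) echo "rejected '..' in path: ${script}" >&2; return 1 ;;
  esac
  # cd + pwd -P resolves symlinked directories without requiring realpath.
  allowed="$(cd "${root}/onboarding_assertions" 2>/dev/null && pwd -P)" || {
    echo "missing assertions directory under ${root}" >&2; return 1
  }
  dir="$(cd "$(dirname "${root}/${script}")" 2>/dev/null && pwd -P)" || {
    echo "missing directory for ${script}" >&2; return 1
  }
  resolved="${dir}/$(basename "${script}")"
  # The trailing slash keeps sibling directories with a shared prefix out.
  case "${resolved}" in
    "${allowed}"/*) ;;
    *) echo "rejected path outside ${allowed}: ${script}" >&2; return 1 ;;
  esac
  # Limitation: a script file that is itself a symlink is not resolved here.
  [ -f "${resolved}" ] || { echo "missing script: ${script}" >&2; return 1; }
  printf '%s\n' "${resolved}"
}

# Succeeds only when the log pairs a preflight/infrastructure term with an
# explicit failure word, instead of matching the infrastructure term alone.
log_has_preflight_failure() {
  grep -Eiq '(preflight|docker|container|daemon|socket).*(fail|error)' "$1"
}
```

A runner could execute `bash "$(resolve_assertion_script "${E2E_ROOT}" "${script}")"` only when resolution succeeds. The pattern is still text-based, so a line such as "container started without errors" would match; a structured preflight status, where available, would be more robust.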

🌱 Nice ideas

  • None.

This is an automated advisory review. A human maintainer must make the final merge decision.

wscurran added the labels "area: e2e" (End-to-end tests, nightly failures, or validation infrastructure) and "feature" (PR adds or expands user-visible functionality) Jun 3, 2026
wscurran commented Jun 3, 2026


Related open issues:
