Rename launch_scripts/{active → h100}/ and {flaky → h100-flaky}/ so all
directories are named after their target hardware. Add a parallel GB200
track that runs the same tests on {runner_prefix}-gb200-x2 runners.
- launch_scripts/gb200/: thin wrapper scripts that exec into h100/; one
per h100/ script for full L0/L1/L2 parity at launch
- launch_scripts/gb200-flaky/: empty placeholder; move a GB200 wrapper
here when it breaks on GB200 but not H100
- cicd-main.yml: generate-gb200-test-matrix job, three
cicd-functional-tests-gb200-l{0,1,2} jobs and a gb200-flaky job using
{runner_prefix}-gb200-x2; all gated on vars.GB200_RUNNER_PREFIX being
set so environments without GB200 runners skip cleanly
- configure: propagates expect_gb200_l{0,1,2} outputs; Nemo_CICD_Test
validates them the same way as H100 tiers
- test-template action: default script_dir updated from "active" to "h100"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test
The variable gate caused GB200 jobs to be silently skipped when
GB200_RUNNER_PREFIX was not set as a repo variable. Since GB200 should
always run with the same trigger conditions as H100 (using the same
runner_prefix but -gb200-x2 suffix), remove the gate entirely.
Also simplify Nemo_CICD_Test: GB200 skip-checks reuse EXPECT_L{0,1,2}
rather than a separate EXPECT_GB200_L{0,1,2} set.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test
Signed-off-by: oliver könig <okoenig@nvidia.com>
Updated PR description and workflow to use the literal runner label
/ok to test
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test 4ee0e61
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test adf09fe
…ility
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test a6c0a9a
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
/ok to test 417129d
Summary
Per-hardware brokenness workflow:
Directory layout
```
tests/functional_tests/launch_scripts/
├── h100/
│   ├── active/   ← H100 tests (run normally)
│   └── flaky/    ← H100 tests known to be flaky
└── gb200/
    ├── active/   ← GB200 wrapper scripts (delegate to h100/active/ via exec)
    └── flaky/    ← GB200 tests known to be flaky
```
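The wrapper layout above could be produced by a small generator. The following is a minimal sketch under stated assumptions, not the PR's actual tooling: the function name `generate_gb200_wrappers` and the wrapper body are hypothetical, and the PR may simply commit hand-written wrappers. It assumes each GB200 wrapper shares its filename with the H100 script it delegates to.

```bash
# Hypothetical helper (not part of the PR): write one GB200 wrapper per
# H100 active script. $1 is the launch_scripts root containing h100/active/.
generate_gb200_wrappers() {
  local root="$1" src name
  mkdir -p "${root}/gb200/active"
  for src in "${root}"/h100/active/*.sh; do
    # If the glob matched nothing it stays literal; skip that case.
    [ -e "$src" ] || continue
    name="$(basename "$src")"
    # Do not recreate a wrapper that was deliberately parked in gb200/flaky/.
    if [ -e "${root}/gb200/flaky/${name}" ]; then
      continue
    fi
    # Assumed wrapper body: resolve the sibling h100/active/ script by
    # filename and replace the current process with it via exec.
    cat > "${root}/gb200/active/${name}" <<'EOF'
#!/usr/bin/env bash
# Thin GB200 wrapper: delegate to the H100 script of the same name.
set -euo pipefail
here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
exec bash "${here}/../../h100/active/$(basename "${BASH_SOURCE[0]}")" "$@"
EOF
    chmod +x "${root}/gb200/active/${name}"
  done
}
```

Because the wrapper resolves its target relative to its own location, the same file keeps working after it is moved from `gb200/active/` to `gb200/flaky/`: both directories sit two levels below the `launch_scripts` root.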
Build matrix
Example: marking a test as broken on GB200 only
```bash
git mv tests/functional_tests/launch_scripts/gb200/active/L0_Launch_recipes_llama_1b.sh \
  tests/functional_tests/launch_scripts/gb200/flaky/
```
H100 continues to run `h100/active/L0_Launch_recipes_llama_1b.sh` unaffected.
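To check the effect of such a move locally, one option is to list what each hardware track would pick up from its `active/` directory. This is only a sketch: `list_active_tests` is a hypothetical helper, and it assumes that the tier (L0, L1, L2) is encoded as a filename prefix, as in `L0_Launch_recipes_llama_1b.sh`. The real matrix-generation job in `cicd-main.yml` may select scripts differently.

```bash
# Hypothetical helper (not part of the PR): print the names, without the
# .sh suffix, of the scripts in <root>/<hardware>/active that belong to
# the given tier. Assumes the tier is the filename prefix before "_".
list_active_tests() {
  local root="$1" hardware="$2" tier="$3" script
  for script in "${root}/${hardware}/active/${tier}"_*.sh; do
    # If the glob matched nothing it stays literal; skip that case.
    [ -e "$script" ] || continue
    basename "$script" .sh
  done
}
```

Running `list_active_tests tests/functional_tests/launch_scripts gb200 L0` before and after the `git mv` should show the moved test disappearing from the GB200 list, while the same call with `h100` is unchanged.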
Test plan
🤖 Generated with Claude Code