Skip to content

Conversation

@xibinliu
Copy link
Collaborator

@xibinliu xibinliu commented Jan 9, 2026

Description

Trigger the CI with the stable and nightly images

Tests

Image Build and Test workflows run by this PR: here

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@xibinliu xibinliu force-pushed the xibin/ci branch 5 times, most recently from 76827fc to d932d98 Compare January 9, 2026 03:34
@xibinliu xibinliu marked this pull request as ready for review January 9, 2026 04:04
@xibinliu xibinliu force-pushed the xibin/ci branch 3 times, most recently from 0f982ee to 6e5c3de Compare January 9, 2026 04:45
@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@xibinliu xibinliu force-pushed the xibin/ci branch 4 times, most recently from 86f6416 to 3909d48 Compare January 22, 2026 21:08
@xibinliu xibinliu force-pushed the xibin/ci branch 2 times, most recently from a0b8391 to f237ce3 Compare January 23, 2026 20:29
@xibinliu xibinliu force-pushed the xibin/ci branch 2 times, most recently from 1db9c4e to acdad52 Compare January 23, 2026 20:57
@xibinliu xibinliu force-pushed the xibin/ci branch 3 times, most recently from 4c71073 to 6f617e1 Compare January 24, 2026 05:56
@xibinliu xibinliu force-pushed the xibin/ci branch 7 times, most recently from 47a7f7d to 2f286d7 Compare January 24, 2026 06:35
# It runs automatically daily at 12am UTC, on Pull Requests, or manually via Workflow Dispatch.

name: Build Images
name: Build and Test Images
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we split this file or add an option to distinguish stable image building and testing from nightly building and testing ? Assuming that nightly might be frequently broken, my concern is that bundling the two would prevent us from publishing stable images to pypi

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build and tests jobs are independent for stable and nightly images, so they won't impact each other. If we want to keep them in one file depends on how the pypi publishing process use the output from this workflow. @SurbhiJainUSC could you comment here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow uploads the docker image to PyPI. This is independent of pypi release process. Also, we will be able to release stable docker image to GCR even if nightly docker image fails.

with:
flavor: ${{ matrix.flavor }}
base_image: ${{ matrix.image }}:${{ needs.setup.outputs.image_date }}
is_scheduled_run: ${{ github.event_name == 'schedule' }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably just be true throughout this file. The idea for this flag is to run more tests when this isn't affecting a PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trigger could be scheduled or manual for this workflow. But it probably okay to run full tests even with manual trigger. I now changed is_scheduled_run to true throughout the file.

required: false
type: boolean
default: false
worker_group:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to make worker_group and total_workers parameters ? do these change across invocations from uploadDockerImages and build_and_test_maxtext ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I removed the "worker_group" and "total_workers" as parameters from the coordinator, and set them based on the other input (cpu or not).

|| (contains(inputs.flavor, 'gpu') && 'a100-40gb-4' || 'X64') }}
cloud_runner: >-
${{ contains(inputs.flavor, 'tpu') && 'linux-x86-ct6e-180-4tpu'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using complex selection operators here consider creating a setup job that maps the flavor to specific outputs that would then be used as parameters. This centralizes the logic and provides implicit validation. Something like this:

configure:
outputs:
device_type: ${{ steps.map.outputs.device_type }}
device_name: ${{ steps.map.outputs.device_name }}
# ... other outputs
steps:
- id: map
run: |
case "${{ inputs.flavor }}" in
"tpu-unit")
echo "device_type=tpu" >> $GITHUB_OUTPUT
echo "device_name=v6e-4" >> $GITHUB_OUTPUT
# also for pytest markers etc
;;
"gpu-unit")
echo "device_type=cuda12" >> $GITHUB_OUTPUT
echo "device_name=a100-40gb-4" >> $GITHUB_OUTPUT
;;
*)
echo "::error::Unsupported flavor: ${{ inputs.flavor }}"
exit 1
;;
esac

execute-test-package:
needs: configure
uses: ./.github/workflows/run_tests_against_package.yml
with:
device_type: ${{ needs.configure.outputs.device_type }}
# ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "configure" job has to be run a VM, isn't this heavier than just using selection operators?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants