3 changes: 2 additions & 1 deletion .github/workflows/containers.yml
@@ -45,12 +45,13 @@ jobs:

build-containers:
needs: filter
if: needs.filter.outputs.test == 'true'
strategy:
fail-fast: false
matrix:
arch: [arm64, x64]
runs-on: [self-hosted, '${{ matrix.arch }}']
steps:
- uses: actions/checkout@v6
if: needs.filter.outputs.test == 'true'
- uses: ./.github/actions/build-container
if: needs.filter.outputs.test == 'true'
72 changes: 72 additions & 0 deletions .github/workflows/ghcr-cleanup.yml
@@ -0,0 +1,72 @@
# GHCR Buildcache Cleanup
#
# The Spack CI (spack.yml) pushes binary packages to an OCI buildcache at
# ghcr.io/awslabs/palace-develop-testing. Over time, this accumulates stale
# versions as dependencies get rebuilt with new hashes.
#
# This workflow periodically prunes old versions, keeping the most recent ones
# so that the cache remains useful without growing unboundedly.
#
# ## How GHCR package linking works
#
# GitHub Packages (GHCR) can be linked to a repository, which grants repo
# collaborators with write access the ability to manage packages. This link
# also allows the GITHUB_TOKEN in workflows to delete package versions.
#
# The link is established by adding an OCI config label:
#
# org.opencontainers.image.source=https://github.com/awslabs/palace
#
# We used `crane mutate --label` to add this to a tag in the package, which
# caused GHCR to auto-link the package to this repo. Once linked, the
# association persists at the GitHub level regardless of individual tag labels.
Comment on lines +20 to +22
Contributor:
Maybe I am just naive but who is "we" here? Our CI jobs do this?

#
# If the package is ever deleted and recreated (e.g., a full cache rebuild),
# the link must be re-established. To do this:
#
# 1. Install crane: https://github.com/google/go-containerregistry
# 2. Authenticate: echo $TOKEN | crane auth login ghcr.io -u USER --password-stdin
# 3. Label any tag:
# crane mutate ghcr.io/awslabs/palace-develop-testing:index.spack \
# --label org.opencontainers.image.source=https://github.com/awslabs/palace
# 4. Verify the link:
# gh api orgs/awslabs/packages/container/palace-develop-testing \
# --jq '.repository.full_name'
#
# Or as a GitHub Actions step:
Contributor:
I am interpreting this as "one needs to do this one-off operation on the index.spack in order to delete binaries in the future"?
Which begs the question: why don't we do this in our GH Actions, then?

Member Author:
Correct. This is a one-off operation, but it is not cheap (if you do it naively). I thought about adding it to our actions, but I found that processing all the packages we have in our binary cache would take several hours.
We could do something smarter, but I figured that we won't need to do this ever again (unless we change the name of the palace-develop-testing package), so I didn't bother.

#
# - name: Install crane
# run: |
# curl -sL "https://github.com/google/go-containerregistry/releases/download/v0.20.3/go-containerregistry_Linux_x86_64.tar.gz" \
# | tar xz -C /usr/local/bin crane
#
# - name: Link GHCR package to repository
# run: |
# echo "${{ secrets.GITHUB_TOKEN }}" | crane auth login ghcr.io -u ${{ github.actor }} --password-stdin
# crane mutate ghcr.io/awslabs/palace-develop-testing:index.spack \
# --label org.opencontainers.image.source=https://github.com/${{ github.repository }}
name: Cleanup GHCR Buildcache

on:
schedule:
- cron: '0 6,18 * * *' # Twice daily at 6:00 and 18:00 UTC
Contributor:
Maybe just once a week is fine? Curious how you decided on this cadence.

Member Author:
The action can only remove up to 100 versions each time it runs.

workflow_dispatch:

permissions:
packages: write

jobs:
cleanup:
runs-on: ubuntu-latest
steps:
# Each Spack spec is a tagged version in the container package. We keep
# the most recent 500 versions (roughly 5-10 full build sets) and delete
# older ones. The action deletes at most 100 per run.
Comment on lines +62 to +64
Contributor:
I would love to be a bit more scientific about this if possible.
I am reading this as: it deletes anything past n=500.
That seems potentially too many deletions if we end up expanding the matrix of builds that we want to support. Ideally, one could delete based on the age of an installation or the time since it was last used.
In other words, it would be ideal to clean up based on a different criterion: any binary older than a week that also hasn't been used in the last week.

- name: Delete old buildcache versions
uses: actions/delete-package-versions@v5
with:
package-name: palace-develop-testing
package-type: container
owner: awslabs
min-versions-to-keep: 500
token: ${{ secrets.GITHUB_TOKEN }}
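The retention policy configured above (keep the newest 500 versions, delete at most 100 per run) can be sketched as plain selection logic. This is an illustration of what the workflow's settings imply, not the actual implementation of actions/delete-package-versions; the version records and the `select_versions_to_delete` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_VERSIONS_TO_KEEP = 500   # mirrors min-versions-to-keep in the workflow
MAX_DELETIONS_PER_RUN = 100  # the action's per-run deletion cap

def select_versions_to_delete(versions):
    """Given (id, created_at) tuples, return the ids to delete:
    everything beyond the newest MIN_VERSIONS_TO_KEEP, oldest first,
    capped at MAX_DELETIONS_PER_RUN per invocation."""
    newest_first = sorted(versions, key=lambda v: v[1], reverse=True)
    stale = newest_first[MIN_VERSIONS_TO_KEEP:]   # older than the keep window
    oldest_first = list(reversed(stale))
    return [vid for vid, _ in oldest_first[:MAX_DELETIONS_PER_RUN]]

# Hypothetical example: 650 versions, one pushed per hour (id 0 is newest).
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
versions = [(i, now - timedelta(hours=i)) for i in range(650)]
to_delete = select_versions_to_delete(versions)
print(len(to_delete))  # 100: capped per run, even though 150 are stale
print(min(to_delete))  # 550: only versions beyond the 500 newest are touched
```

With 150 stale versions but a 100-per-run cap, a single run leaves 50 behind, which is why the workflow runs on a recurring schedule rather than once.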