feat: improve CVMFS catalog placement #295
Open
wdconinc wants to merge 5 commits into
Script walks a deployed container sandbox (e.g., a CVMFS path) and reports per-catalog 'owned entry' counts alongside suggestions for directories that are over- or under-catalogenized.

Usage: .ci/cvmfs_catalog_analysis [--min-entries N] [--max-entries N] [--depth N] <PREFIX>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
System directories (/etc, /usr and two levels of subdirectories) were missing catalog boundaries entirely, leaving all their file entries in the root catalog. Add placement in containers/debian/Dockerfile.

In the /opt tree:
- Remove per-package-subdir catalogs (lib/, bin/, share/, etc.) by dropping maxdepth from 3 to 2 in the /opt/software find command. These ~1933 catalogs changed atomically with the package and added unnecessary CVMFS traversal overhead.
- Add arch-level catalog for linux-x86_64_v2 (depth 1 under /opt/software) to keep the /opt/software root catalog small and future-proof multi-arch.
- Add first-level subdirectory catalogs for /opt/local (Spack view) so that a single-package update only invalidates the relevant view subdirs (lib/, bin/, etc.), not the entire merged view.
- Add missing catalogs for /opt/detector (and per-version subdirs), /opt/benchmarks, /opt/campaigns, /opt/spack-environment, and /opt/spack-packages.

Net effect vs eic_xl:nightly baseline:
  Before: ~2537 catalogs (/opt only)
  After: ~832 catalogs (entire container)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Improves CVMFS .cvmfscatalog boundary placement in the EIC container images to reduce catalog traversal overhead on CVMFS clients, and adds a helper tool to analyze catalog placement under a deployed container prefix.
Changes:
- Refines catalog boundary creation under `/opt/software` and expands coverage to additional `/opt/*` trees in `containers/eic/Dockerfile`.
- Adds catalog boundaries for large system directories (`/etc`, `/usr` and selected subtrees) in `containers/debian/Dockerfile`.
- Introduces `.ci/cvmfs_catalog_analysis`, a Python utility to report per-catalog “owned” entry counts and suggest missing boundaries.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `containers/eic/Dockerfile` | Adjusts `.cvmfscatalog` placement under `/opt/software`, `/opt/local`, and additional `/opt/*` directories. |
| `containers/debian/Dockerfile` | Adds `.cvmfscatalog` boundaries for `/etc` and `/usr` (and selected `/usr` subtrees). |
| `.ci/cvmfs_catalog_analysis` | New CLI tool to analyze catalog placement and flag large non-catalog directories. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Reintroduce cvmfs catalog touch commands for benchmarks and campaigns.
Contributor
Author
Capybara diffs are only due to a timestamp mismatch. These changes don't affect physics.
Summary
This PR addresses two separate but related improvements to `.cvmfscatalog` placement in the containers deployed to CVMFS at `singularity.opensciencegrid.org`.

New tool: `.ci/cvmfs_catalog_analysis`

A Python script that can be run against any deployed container prefix to analyze catalog placement.

For each catalog boundary it reports the number of entries owned by that catalog (not delegated to a nested catalog), and flags large directories that have no catalog yet. Useful for validating catalog placement after each deployment.
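The "owned entries" count described above could be computed roughly as follows. This is a minimal sketch and not the code in `.ci/cvmfs_catalog_analysis`: the function name `owned_entry_counts` is hypothetical, the prefix itself is treated as the root catalog, and each entry is attributed to the catalog that owns the directory containing it (a simplification of how CVMFS records nested catalog mountpoints).

```python
import os

CATALOG_MARKER = ".cvmfscatalog"


def owned_entry_counts(prefix):
    """Return {catalog_root: number of entries owned by that catalog}.

    Hypothetical helper for illustration only. A directory containing a
    .cvmfscatalog file starts a new (nested) catalog for its contents.
    """
    counts = {prefix: 0}
    stack = [(prefix, prefix)]  # (directory to scan, catalog that owns it)
    while stack:
        directory, owner = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                # Every entry is charged to the catalog owning its parent.
                counts[owner] += 1
                if entry.is_dir(follow_symlinks=False):
                    child_owner = owner
                    marker = os.path.join(entry.path, CATALOG_MARKER)
                    if os.path.exists(marker):
                        # Contents below this point belong to a nested catalog.
                        child_owner = entry.path
                        counts.setdefault(child_owner, 0)
                    stack.append((entry.path, child_owner))
    return counts
```

Directories whose owning catalog accumulates a large count without any nested boundary are the candidates a tool like this would flag.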
Catalog placement improvements
Problem 1: System directories missing catalogs.

`/etc`, `/usr`, and all their subdirectories had no catalog boundaries, so the entire `/usr/bin` (740 entries), `/usr/include` (1229 entries), `/usr/lib` (994 entries), etc. were loaded as part of the root `/` catalog. Fixed in `containers/debian/Dockerfile`.

Problem 2: Over-catalogenized `/opt/software` sub-directories.

The previous `find -maxdepth 3` created ~1933 catalogs for `lib/`, `bin/`, `share/`, `include/`, `.spack/` inside each package. These directories always change atomically with the package; having separate catalogs adds traversal overhead with no benefit. Fixed by using `-maxdepth 2` (per-package, not per-subdir).

Problem 3: Missing catalogs for several `/opt` directories.

`/opt/detector`, `/opt/benchmarks`, `/opt/campaigns`, `/opt/spack-environment`, and `/opt/spack-packages` had no catalog boundaries. The `/opt/local` Spack view (13 000+ owned entries) had no sub-directory catalogs.

Before / After (based on `eic_xl:nightly` baseline)

- `/opt/software`
- `/opt/local`
- `/opt/detector`, benchmarks, campaigns, env
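To illustrate the depth limit discussed under Problem 2, the sketch below touches a `.cvmfscatalog` marker in every directory down to a given depth. It is not the Dockerfile change itself, which uses a `find` command whose exact expression is not shown here. The function name `place_catalog_markers` and the assumed layout (`<root>/<arch>/<package>/<subdir>`) are assumptions for illustration.

```python
from pathlib import Path


def place_catalog_markers(root, max_depth):
    """Touch .cvmfscatalog in each directory at depth 1..max_depth below root.

    Hypothetical helper: with max_depth=2 and the assumed layout
    <root>/<arch>/<package>/<subdir>, arch and package directories get a
    marker while lib/, bin/, etc. inside a package do not.
    """
    placed = []
    frontier = [Path(root)]
    for _ in range(max_depth):
        next_frontier = []
        for directory in frontier:
            for child in sorted(directory.iterdir()):
                # Skip symlinks so markers are not written outside the tree.
                if child.is_dir() and not child.is_symlink():
                    (child / ".cvmfscatalog").touch()
                    placed.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return placed
```

Calling it with `max_depth=3` on the same tree would also mark each package's subdirectories, which is the over-catalogenized situation the PR removes.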