feat: improve CVMFS catalog placement#295

Open
wdconinc wants to merge 5 commits into master from copilot/cvmfs-catalog-improvements

@wdconinc
Contributor

Summary

This PR addresses two separate but related improvements to .cvmfscatalog placement in the containers deployed to CVMFS at singularity.opensciencegrid.org.

New tool: .ci/cvmfs_catalog_analysis

A Python script that can be run against any deployed container prefix to analyze catalog placement:

.ci/cvmfs_catalog_analysis [--min-entries N] [--max-entries N] [--depth N] <PREFIX>

For each catalog boundary it reports the number of entries owned by that catalog (not delegated to a nested catalog), and flags large directories that have no catalog yet. Useful for validating catalog placement after each deployment.
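
As a rough illustration of what "owned entries" means here, the sketch below walks a prefix and attributes every entry to the nearest enclosing directory that contains a `.cvmfscatalog` marker. It is not the code of `.ci/cvmfs_catalog_analysis`; the names `owned_entry_counts` and `CATALOG_MARKER` are hypothetical, and two choices are assumptions: the prefix itself is treated as a catalog root, and the directory entry of a nested catalog is counted toward its parent.

```python
import os

CATALOG_MARKER = ".cvmfscatalog"  # marker file that starts a nested catalog


def owned_entry_counts(prefix):
    """Map each catalog root under `prefix` to the number of entries it owns.

    Assumption: `prefix` is treated as a catalog root even without a marker,
    and entries below a nested catalog are not counted toward the parent.
    """
    counts = {prefix: 0}
    stack = [(prefix, prefix)]  # (directory to scan, catalog root that owns it)
    while stack:
        directory, owner = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                counts[owner] += 1  # the entry itself belongs to `owner`
                if not entry.is_dir(follow_symlinks=False):
                    continue
                if os.path.exists(os.path.join(entry.path, CATALOG_MARKER)):
                    # Nested catalog boundary: contents are delegated to it.
                    counts[entry.path] = 0
                    stack.append((entry.path, entry.path))
                else:
                    stack.append((entry.path, owner))
    return counts
```

Calling `owned_entry_counts(prefix)` on a deployed container prefix and sorting the result by count would surface the catalogs that own the most entries.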

Catalog placement improvements

Problem 1: System directories missing catalogs.
/etc, /usr, and all their subdirectories had no catalog boundaries, so the entire /usr/bin (740 entries), /usr/include (1229 entries), /usr/lib (994 entries), etc. were loaded as part of the root / catalog. Fixed in containers/debian/Dockerfile.

Problem 2: Over-catalogenized /opt/software sub-directories.
The previous find -maxdepth 3 created ~1933 catalogs for lib/, bin/, share/, include/, .spack/ inside each package. These directories always change atomically with the package — having separate catalogs adds traversal overhead with no benefit. Fixed by using -maxdepth 2 (per-package, not per-subdir).
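
The effect of the depth limit can be sketched in Python. The exact `find` expression in the Dockerfile is not shown in this description, so the sketch assumes it selects directories at depths 1 through N below the root; `dirs_up_to_depth` and `place_catalogs` are hypothetical helper names.

```python
import os
from pathlib import Path


def dirs_up_to_depth(root, max_depth):
    """Return directories below `root` at depth 1..max_depth, sorted."""
    root = Path(root)
    found = []
    for current, dirnames, _ in os.walk(root):
        depth = len(Path(current).relative_to(root).parts)
        if depth >= max_depth:
            dirnames[:] = []  # prune: do not descend past max_depth
        if depth >= 1:
            found.append(Path(current))
    return sorted(found)


def place_catalogs(root, max_depth):
    """Create an empty .cvmfscatalog marker in each selected directory."""
    placed = []
    for directory in dirs_up_to_depth(root, max_depth):
        marker = directory / ".cvmfscatalog"
        marker.touch()
        placed.append(marker)
    return placed
```

With a layout of `<arch>/<package>/{lib,bin,...}`, a depth of 2 places markers at the architecture and package levels only, while a depth of 3 would also mark every `lib/`, `bin/`, and similar subdirectory.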

Problem 3: Missing catalogs for several /opt directories.
/opt/detector, /opt/benchmarks, /opt/campaigns, /opt/spack-environment, and /opt/spack-packages had no catalog boundaries. The /opt/local Spack view (13 000+ owned entries) had no sub-directory catalogs.

Before / After (based on eic_xl:nightly baseline)

| | Before | After |
|---|---|---|
| Catalogs under /opt/software | ~2535 | ~605 |
| Catalogs under /opt/local | 1 | ~17 |
| Catalogs in system dirs | 1 (root only) | ~192 |
| Catalogs for /opt/detector, benchmarks, campaigns, env | 0 | ~18 |
| Total | ~2537 | ~832 |
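
The totals in this comparison can be cross-checked with a few lines of Python. The counts are the approximate figures quoted above; the dictionary keys are shorthand labels, and the reduction percentage is derived here rather than stated in the PR.

```python
# Approximate catalog counts quoted in the Before / After comparison.
before = {"/opt/software": 2535, "/opt/local": 1, "system dirs": 1, "other /opt dirs": 0}
after = {"/opt/software": 605, "/opt/local": 17, "system dirs": 192, "other /opt dirs": 18}

total_before = sum(before.values())
total_after = sum(after.values())
reduction = 1 - total_after / total_before  # fraction of catalogs removed

print(f"before={total_before} after={total_after} reduction={reduction:.0%}")
```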

wdconinc and others added 2 commits May 18, 2026 14:59
Script walks a deployed container sandbox (e.g., a CVMFS path) and
reports per-catalog 'owned entry' counts alongside suggestions for
directories that are over- or under-catalogenized.

Usage:
  .ci/cvmfs_catalog_analysis [--min-entries N] [--max-entries N] [--depth N] <PREFIX>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
System directories (/etc, /usr and two levels of subdirectories) were
missing catalog boundaries entirely, leaving all their file entries in
the root catalog.  Add placement in containers/debian/Dockerfile.

In the /opt tree:
- Remove per-package-subdir catalogs (lib/, bin/, share/, etc.) by
  dropping maxdepth from 3 to 2 in the /opt/software find command.
  These ~1933 catalogs changed atomically with the package and added
  unnecessary CVMFS traversal overhead.
- Add arch-level catalog for linux-x86_64_v2 (depth 1 under /opt/software)
  to keep the /opt/software root catalog small and future-proof multi-arch.
- Add first-level subdirectory catalogs for /opt/local (Spack view) so
  that a single-package update only invalidates the relevant view
  subdirs (lib/, bin/, etc.), not the entire merged view.
- Add missing catalogs for /opt/detector (and per-version subdirs),
  /opt/benchmarks, /opt/campaigns, /opt/spack-environment, and
  /opt/spack-packages.

Net effect vs eic_xl:nightly baseline:
  Before: ~2537 catalogs (/opt only)
  After:  ~832 catalogs (entire container)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 18, 2026 20:00
@wdconinc wdconinc changed the title from "feat(containers): improve CVMFS catalog placement" to "feat: improve CVMFS catalog placement" May 18, 2026
@wdconinc wdconinc enabled auto-merge (squash) May 18, 2026 20:05
@wdconinc wdconinc requested a review from veprbl May 18, 2026 20:07
Contributor

Copilot AI left a comment

Pull request overview

Improves CVMFS .cvmfscatalog boundary placement in the EIC container images to reduce catalog traversal overhead on CVMFS clients, and adds a helper tool to analyze catalog placement under a deployed container prefix.

Changes:

  • Refines catalog boundary creation under /opt/software and expands coverage to additional /opt/* trees in containers/eic/Dockerfile.
  • Adds catalog boundaries for large system directories (/etc, /usr and selected subtrees) in containers/debian/Dockerfile.
  • Introduces .ci/cvmfs_catalog_analysis, a Python utility to report per-catalog “owned” entry counts and suggest missing boundaries.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| containers/eic/Dockerfile | Adjusts .cvmfscatalog placement under /opt/software, /opt/local, and additional /opt/* directories. |
| containers/debian/Dockerfile | Adds .cvmfscatalog boundaries for /etc and /usr (and selected /usr subtrees). |
| .ci/cvmfs_catalog_analysis | New CLI tool to analyze catalog placement and flag large non-catalog directories. |

Comment thread containers/eic/Dockerfile Outdated
Comment thread containers/debian/Dockerfile Outdated
Comment thread .ci/cvmfs_catalog_analysis Outdated
Comment thread .ci/cvmfs_catalog_analysis
Comment thread .ci/cvmfs_catalog_analysis Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 18, 2026 20:12
Reintroduce cvmfs catalog touch commands for benchmarks and campaigns.
@wdconinc wdconinc review requested due to automatic review settings May 18, 2026 20:16
@wdconinc
Contributor Author

The Capybara diffs are due only to a timestamp mismatch. These changes don't affect physics.

Copilot AI review requested due to automatic review settings May 18, 2026 23:47
@wdconinc wdconinc review requested due to automatic review settings May 18, 2026 23:50