
Smart resource reachability analysis for shrink #17

@avelino


Problem

The current --shrink flag is heuristic: it removes known-unnecessary files by pattern (.git and other SCM metadata, LICENSE files, etc.). This is safe but leaves significant savings unrealized, because many resources inside uberjars are never accessed at runtime yet cannot be identified by simple pattern matching.

Inspiration

GraalVM's native-image performs reachability analysis on resources — only resources that are actually referenced by reachable code are included in the final binary. Their build reports show that erroneous resource inclusion (e.g., overly broad regex patterns in resource configuration) is one of the most common causes of bloated binaries.

Their size optimization guide demonstrates that removing unreachable resources and code can reduce binary size by 40%+ in some cases.

Expected Outcome

An enhanced --shrink that goes beyond pattern matching to perform actual analysis of resource usage:

Analysis Layers

  1. Class reference analysis: scan bytecode for getResource(), getResourceAsStream(), and similar calls to identify which resources are actually loaded
  2. Namespace dependency analysis: for Clojure AOT classes, trace namespace require chains to identify orphan namespaces (dev/test utilities bundled in production)
  3. Duplicate detection: find identical files included multiple times from different dependencies (common with LICENSE, NOTICE, META-INF/services)
  4. Native library analysis: detect platform-specific native libs for other platforms (e.g., Linux binary bundling .dll files from a Windows dependency)
  5. Dev-dependency detection: identify resources from dependencies that are typically dev-only (test frameworks, REPL utilities, documentation generators)
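
As a rough illustration of layer 1, here is a minimal Python sketch (not jbundle's implementation) that reads the string literals from a class file's constant pool and treats any resource whose path never appears as a literal as a removal candidate. The names string_constants and unreferenced_resources are hypothetical. It assumes resource paths are passed as compile-time string constants; it does not check that a literal actually reaches getResource(), so it over-approximates what is referenced, and paths built at runtime are missed entirely.

```python
import struct

# Payload size in bytes for each fixed-width constant-pool tag
# (class file format, JVM specification section 4.4).
_FIXED_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
                12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def string_constants(class_bytes):
    """Return the set of string literals found in a class file's constant pool."""
    magic, _minor, _major, count = struct.unpack_from(">IHHH", class_bytes, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    pos = 10
    utf8 = {}          # constant-pool index -> decoded CONSTANT_Utf8 text
    string_refs = []   # Utf8 indexes referenced by CONSTANT_String entries
    index = 1
    while index < count:
        tag = class_bytes[pos]
        pos += 1
        if tag == 1:  # CONSTANT_Utf8: u2 length followed by the bytes
            (length,) = struct.unpack_from(">H", class_bytes, pos)
            pos += 2
            # The JVM uses "modified UTF-8"; plain UTF-8 decoding is close
            # enough for ASCII resource paths, which is all this sketch needs.
            utf8[index] = class_bytes[pos:pos + length].decode("utf-8", errors="replace")
            pos += length
        else:
            size = _FIXED_SIZES.get(tag)
            if size is None:
                raise ValueError(f"unknown constant-pool tag {tag}")
            if tag == 8:  # CONSTANT_String: u2 index of its Utf8 entry
                string_refs.append(struct.unpack_from(">H", class_bytes, pos)[0])
            pos += size
        # Long and Double constants occupy two constant-pool slots.
        index += 2 if tag in (5, 6) else 1
    return {utf8[i] for i in string_refs if i in utf8}


def unreferenced_resources(resource_paths, class_files):
    """List resource paths that never appear as a string literal in any class."""
    literals = set()
    for data in class_files:
        # getResource("/a/b") and getResource("a/b") can name the same jar entry.
        literals |= {s.lstrip("/") for s in string_constants(data)}
    return sorted(p for p in resource_paths if p not in literals)
```

Anything this returns is only a candidate: reflection, configuration-driven loading, and string concatenation can all reference a resource without its full path appearing as a literal.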

Shrink Levels

jbundle build --shrink              # current behavior (safe patterns only)
jbundle build --shrink aggressive   # add reachability analysis
jbundle build --shrink report       # show what would be removed without removing

Categories of Removable Content

| Category | Example | Risk |
|---|---|---|
| SCM metadata | .git/, .svn/, pom.properties | None |
| Duplicate licenses | Multiple LICENSE.txt from deps | None |
| Wrong-platform natives | .dll in Linux build, .dylib in Linux | None |
| Unreferenced resources | XML configs for unused features | Low |
| Dev/test namespaces | *-test.class, dev/*.class | Low |
| Unused service providers | META-INF/services/ for unused interfaces | Medium |
| Unused class files | Classes from deps never referenced | Medium |
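
To make the "wrong-platform natives" category concrete, a minimal Python sketch might classify entries by file extension alone. The extension-to-OS mapping and the function name wrong_platform_natives are assumptions made for illustration; a real check would also need to handle versioned names such as libfoo.so.1 and CPU architecture, which this ignores.

```python
import posixpath

# Assumed mapping from native-library extension to the OS that loads it.
NATIVE_EXT_OS = {
    ".so": "linux",
    ".dll": "windows",
    ".dylib": "macos",
    ".jnilib": "macos",
}


def wrong_platform_natives(entry_paths, target_os):
    """Return jar entries that look like native libraries for another OS."""
    flagged = []
    for path in entry_paths:
        ext = posixpath.splitext(path)[1].lower()
        lib_os = NATIVE_EXT_OS.get(ext)
        # Entries with no recognized native extension are never flagged.
        if lib_os is not None and lib_os != target_os:
            flagged.append(path)
    return flagged
```

For example, with target_os set to "linux", a .dll and a .dylib entry are flagged while a .so entry and ordinary class files are left alone.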

Safety Mechanism

  • --shrink (no argument) remains safe and conservative (current behavior)
  • --shrink aggressive performs deeper analysis but may break apps with highly dynamic resource loading
  • --shrink report (or integration with analyze) shows potential savings without removing anything
  • Allow --shrink-keep <pattern> to whitelist resources that analysis marks as removable but the user knows are needed
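
One possible reading of --shrink-keep, sketched in Python: after analysis produces a list of removable entries, any entry matching a keep pattern is pulled back out. The issue does not specify the pattern syntax, so shell-style globs (via the standard fnmatch module) are an assumption here, and apply_keep_patterns is a hypothetical name.

```python
from fnmatch import fnmatchcase


def apply_keep_patterns(removable, keep_patterns):
    """Split removable entries into (to_remove, kept) using glob keep patterns."""
    to_remove, kept = [], []
    for path in removable:
        # fnmatchcase is case-sensitive on every platform; note that "*"
        # also matches "/" here, so "templates/*" covers nested directories.
        if any(fnmatchcase(path, pattern) for pattern in keep_patterns):
            kept.append(path)
        else:
            to_remove.append(path)
    return to_remove, kept
```

Returning the kept entries separately would let a report mode show which removals the user overrode.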

Impact Estimate

For a typical Clojure web application uberjar:

  • Current --shrink: removes ~5-15% of the uberjar's size (metadata, licenses, SCM)
  • With reachability analysis: could remove ~20-40% of the uberjar's size (unused deps, wrong-platform natives, dev resources)
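
These percentages are estimates. One way to measure a single category on a real jar, in the spirit of the report mode, is to hash every entry and count the bytes held by redundant copies. The Python sketch below does this for the duplicate-detection layer only; duplicate_report and its result keys are hypothetical, and it counts uncompressed bytes, so the saving in the compressed jar will differ.

```python
import hashlib
import zipfile
from collections import defaultdict


def duplicate_report(jar):
    """Group identical entries in a jar (path or file object) by content hash."""
    by_digest = defaultdict(list)
    total = 0
    with zipfile.ZipFile(jar) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            total += info.file_size
            if info.file_size == 0:
                continue  # empty files are all identical but save nothing
            digest = hashlib.sha256(zf.read(info)).hexdigest()
            by_digest[digest].append((info.filename, info.file_size))

    groups = []
    redundant = 0
    for entries in by_digest.values():
        if len(entries) > 1:
            groups.append(sorted(name for name, _ in entries))
            # Keeping one copy, every additional copy is redundant.
            redundant += entries[0][1] * (len(entries) - 1)
    return {
        "groups": sorted(groups),
        "total_bytes": total,
        "redundant_bytes": redundant,
        "percent": 100.0 * redundant / total if total else 0.0,
    }
```

This only reports. Whether a duplicate stored under a different path can actually be dropped still depends on nothing loading it by that path.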


Labels: area/jlink (jdeps, modules, and minimal runtime); performance (size or speed optimization)
