Skip to content

jar: support unconventional jar names#1467

Merged
github-actions[bot] merged 1 commit intoquay:mainfrom
RTann:jar-unidentified
Apr 7, 2025
Merged

jar: support unconventional jar names#1467
github-actions[bot] merged 1 commit intoquay:mainfrom
RTann:jar-unidentified

Conversation

@RTann
Copy link
Copy Markdown
Contributor

@RTann RTann commented Jan 14, 2025

Some JAR files just have bad names 🤷. Claircore should still continue to search for inner JARs in case the found JAR embeds valid JARs. Before this, we just stopped looking through any top-level JAR file with an unconventional name.

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.

Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):

A JAR file is essentially a zip file that contains an optional META-INF directory.

So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?

@RTann RTann marked this pull request as ready for review January 16, 2025 00:11
@RTann RTann requested a review from a team as a code owner January 16, 2025 00:11
@RTann RTann requested review from crozzy and hdonnay and removed request for a team January 16, 2025 00:11
daynewlee
daynewlee previously approved these changes Mar 10, 2025
Copy link
Copy Markdown
Contributor

@daynewlee daynewlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look ok to me. Maybe get @crozzy to have another look.

Copy link
Copy Markdown
Contributor

@crozzy crozzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic essentially looks good to me, just a few comments on the tests

@RTann
Copy link
Copy Markdown
Contributor Author

RTann commented Mar 18, 2025

@crozzy when you re-review can you also let me know your thoughts about the following:

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.

Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):

A JAR file is essentially a zip file that contains an optional META-INF directory.

So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?

@crozzy
Copy link
Copy Markdown
Contributor

crozzy commented Mar 25, 2025

@crozzy when you re-review can you also let me know your thoughts about the following:

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:
return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.
Also, I realized the packagescanner does not consider a JAR file a valid JAR unless it has a META-INF directory. According to the JAR spec from the last few LTS releases (11, 17, and 21) as well as the latest non-LTS release (23):
A JAR file is essentially a zip file that contains an optional META-INF directory.
So, the META-INF directory is not required, so we may want to consider dropping that constraint. Thoughts?

For point 1, Yeah I agree it'd be nicer to know from whence the data came and I think that's the intention of PackageDB (at least for the Java indexer). Whether it's another PR or a commit in this PR is up to you.

For point 2, I'm trying to judge the balance between making sure we're getting every corner-case / doing excess processing / inflating storage with sub-par/unmatchable data. I think it's probably worth documenting what the spec says, I don't think it warrants changing the current flow (in any case it should be another PR to update that logic).

crozzy
crozzy previously approved these changes Mar 25, 2025
Copy link
Copy Markdown
Contributor

@crozzy crozzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in current state, if you adjust the path in the PackageDB dismiss the review and I'll re-review.

@RTann
Copy link
Copy Markdown
Contributor Author

RTann commented Mar 25, 2025

@crozzy I noticed I kept a for loop in the test which set the SHAs to nil even though I added the cmpopts.IgnoreFields(Info{}, "SHA"), so I removed the loop. Can I get another approval?

@RTann
Copy link
Copy Markdown
Contributor Author

RTann commented Mar 25, 2025

@crozzy I also remembered what I was asking about for

When testing, I realized we cannot really tell the difference between JARs and "inner" JARs. I'm wondering if I should also update the package name to be the full path instead of just the final portion. That is:

return testdata/inner/inner.jar:BOOT-INF/lib/log4j-api-2.14.jar:META-INF/inner-jar/log4j-2.14.0.jar instead of META-INF/inner-jar/log4j-2.14.0.jar.

so looking at the jar_test.go we can see the names of packages. These packages are buried pretty deep inside the top-level JAR file (see the README). So when a user sees these packages related to this JAR, they don't really know how to fix them. Where are these coming from? It may be worth updating the package DB to make it clear these are from inner JARs. zi can do that in a followup

@RTann
Copy link
Copy Markdown
Contributor Author

RTann commented Mar 25, 2025

PR to adjust PackageDB: #1503

crozzy
crozzy previously approved these changes Apr 1, 2025
Copy link
Copy Markdown
Contributor

@crozzy crozzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RTann
Copy link
Copy Markdown
Contributor Author

RTann commented Apr 2, 2025

@crozzy sorry can you 👍 one last time? I rebased, and I also realized I never pushed the change to the comment you requested

@RTann RTann requested a review from crozzy April 2, 2025 18:29
Signed-off-by: RTann <rtannenb@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@RTann
Copy link
Copy Markdown
Contributor Author

RTann commented Apr 7, 2025

/fast-forward

@github-actions github-actions bot merged commit cc06bf4 into quay:main Apr 7, 2025
6 checks passed
@RTann RTann deleted the jar-unidentified branch April 7, 2025 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants