ENH: Add support for BrotliDecode filter (PDF 2.0) #3223 #3254
ash01ish wants to merge 2 commits into py-pdf:main
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3254      +/-   ##
==========================================
- Coverage   97.43%   97.28%   -0.16%
==========================================
  Files          55       55
  Lines       10022    10055      +33
  Branches     1842     1848       +6
==========================================
+ Hits         9765     9782      +17
- Misses        149      163      +14
- Partials      108      110       +2
Force-pushed from 1573164 to d339bc8.
stefan6419846 left a comment
Thanks for the PR. Is the official specification available in the meantime?
Apart from this, I have left some comments I stumbled upon during the review.
Force-pushed from 274ed94 to 4dd3fa0.
Reverting the requirements.txt files to main will cause an error that the package is not installed. Is there any other way to get it installed? I have added it to dev.in and ci.in, expecting the requirements files to be generated while the pipeline runs.
The official specification for Brotli compression in PDF is not yet released. However, the PDF Association has announced its upcoming inclusion in PDF 2.0. Sample PDF files are available for developers to begin testing: https://pdfa.org/brotli-compression-coming-to-pdf/
You can change the |
Force-pushed from 8b4514f to b7a5245.
Force-pushed from 672f080 to a4fdd61.
def test_ccitt_fax_decode():
    data = b""
    parameters = DictionaryObject(
As mentioned previously, please revert all unrelated changes.
Yes. I have staged a commit that will revert these changes. It seems I ran ruff fix and it reformatted everything. Apologies, this will be fixed.
This still seems to be unresolved?
def test_brotli_missing_installation_mocked():
    """Verify BrotliDecode raises ImportError if brotli is not installed."""
    # Import pypdf.filters *after* patching sys.modules
    import pypdf.filters
Do we really need this as a local import?
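For context on the technique under discussion, here is a minimal sketch of simulating a missing optional dependency by patching `sys.modules`, without importing anything inside the test body. The helpers `load_optional`, `require`, and `simulate_missing` are hypothetical names for illustration, not pypdf functions, and the stdlib module `colorsys` is only a stand-in for `brotli`.

```python
import importlib
import sys
from unittest import mock


def load_optional(name: str):
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(name: str):
    """Return the named module or raise ImportError with a clear message."""
    module = load_optional(name)
    if module is None:
        raise ImportError(f"{name} library not installed.")
    return module


def simulate_missing(name: str):
    """Return the error message seen when the named module is unavailable."""
    # A None entry in sys.modules makes any import of that name fail.
    # patch.dict restores the original mapping when the block exits.
    with mock.patch.dict(sys.modules, {name: None}):
        try:
            require(name)
        except ImportError as exc:
            return str(exc)
    return None
```

Because the lookup happens at call time rather than at module import time, this sketch needs no reload or local import. Code that binds the optional module at import time, as the test above appears to assume, would behave differently.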
Force-pushed from 1f83b52 to 025226a.
stefan6419846 left a comment
Thanks for your patience. I have added some hopefully final remarks. Additionally, I did some local testing and the results looked correct, although support in other common tools is missing, which complicates verifying the behavior.
Force-pushed from aee4bf8 to 3ba2235.
if brotli is None:
    raise ImportError("Brotli library not installed. Required for BrotliDecode filter.")
result = brotli.decompress(data)
if len(result) > BrotliDecode.MAX_OUTPUT_SIZE:
This will not help with security, as it detects limit overflows only after all data has already been decompressed into memory, so an OOM can occur before the check is ever reached.
The configuration value should follow the usual pattern. In addition, please use the proper API of the brotli library for limiting the output length and detecting unprocessed data.
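To illustrate the difference between checking the size after decompression and limiting it during decompression, here is a minimal sketch of the pattern. It deliberately uses the standard library's `zlib` rather than brotli, because the `max_length` argument and the `eof` and `unused_data` attributes of zlib decompression objects are documented. Whether and how the installed brotli binding offers an equivalent limit is not confirmed in this thread. `MAX_OUTPUT_SIZE` and `bounded_decompress` are hypothetical names, not pypdf's configuration or API.

```python
import zlib

# Hypothetical limit for illustration only, not pypdf's actual setting.
MAX_OUTPUT_SIZE = 1_000_000


def bounded_decompress(data: bytes, max_output: int = MAX_OUTPUT_SIZE) -> bytes:
    """Decompress zlib data, never producing more than max_output + 1 bytes."""
    decompressor = zlib.decompressobj()
    # Request one byte beyond the limit: an overflow becomes detectable
    # without the full output ever being held in memory.
    result = decompressor.decompress(data, max_output + 1)
    if len(result) > max_output:
        raise ValueError(f"Decompressed data exceeds {max_output} bytes")
    if not decompressor.eof:
        raise ValueError("Compressed stream is truncated")
    if decompressor.unused_data:
        raise ValueError("Unprocessed data after end of compressed stream")
    return result
```

With this shape, a highly compressible payload is rejected after at most `max_output + 1` bytes of output, and trailing or truncated input is reported instead of being silently ignored.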
Force-pushed from 7723cf6 to 3155e04.
Implements the BrotliDecode filter as specified in ISO 32000-2:2020, Section 7.4.11. Adds necessary constants, integrates the filter into the decoding logic, includes brotli as an optional dependency, adds unit tests, and updates documentation.
Closes #3223