Skip to content

ENH: Add support for BrotliDecode filter (PDF 2.0) #3223#3254

Open
ash01ish wants to merge 2 commits intopy-pdf:mainfrom
ash01ish:feat/add-brotli-decode
Open

ENH: Add support for BrotliDecode filter (PDF 2.0) #3223#3254
ash01ish wants to merge 2 commits intopy-pdf:mainfrom
ash01ish:feat/add-brotli-decode

Conversation

@ash01ish
Copy link
Copy Markdown

Implements the BrotliDecode filter as specified in ISO 32000-2:2020, Section 7.4.11. Adds necessary constants, integrates the filter into the decoding logic, includes brotli as an optional dependency, adds unit tests, and updates documentation.

Closes #3223

@ash01ish ash01ish changed the title feat: Add support for BrotliDecode filter (PDF 2.0) #3223 ENH: Add support for BrotliDecode filter (PDF 2.0) #3223 Apr 13, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2025

Codecov Report

❌ Patch coverage is 51.51515% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.28%. Comparing base (3155e04) to head (38cd4d7).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pypdf/filters.py 48.38% 14 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3254      +/-   ##
==========================================
- Coverage   97.43%   97.28%   -0.16%     
==========================================
  Files          55       55              
  Lines       10022    10055      +33     
  Branches     1842     1848       +6     
==========================================
+ Hits         9765     9782      +17     
- Misses        149      163      +14     
- Partials      108      110       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ash01ish ash01ish force-pushed the feat/add-brotli-decode branch 7 times, most recently from 1573164 to d339bc8 Compare April 14, 2025 02:15
Copy link
Copy Markdown
Collaborator

@stefan6419846 stefan6419846 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Is the official specification available in the meantime?

Apart from this, I have left some comments I stumbled upon during the review.

Comment thread CHANGELOG.md Outdated
Comment thread docs/modules/constants.rst Outdated
Comment thread pypdf/constants.py Outdated
Comment thread pypdf/constants.py Outdated
Comment thread pypdf/filters.py Outdated
Comment thread requirements/ci.txt Outdated
Comment thread requirements/dev.txt Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
@ash01ish ash01ish force-pushed the feat/add-brotli-decode branch 2 times, most recently from 274ed94 to 4dd3fa0 Compare April 14, 2025 09:21
@ash01ish
Copy link
Copy Markdown
Author

Reverting the requirements.txt files to main will show an error that the package is not installed. Is there any other way to get them installed? - I have added them in dev.in and ci.in - expecting that the req will be generated while running the pipeline.

@ash01ish
Copy link
Copy Markdown
Author

Is the official specification available in the meantime?

The official specification for Brotli compression in PDF is not yet released. However, the PDF Association has announced its upcoming inclusion in PDF 2.0. Sample PDF files are available for developers to begin testing. https://pdfa.org/brotli-compression-coming-to-pdf/

@stefan6419846
Copy link
Copy Markdown
Collaborator

Reverting the requirements.txt files to main will show an error that the package is not installed. Is there any other way to get them installed? - I have added them in dev.in and ci.in - expecting that the req will be generated while running the pipeline.

You can change the .txt variants, but ONLY for the actual dependency.

@ash01ish ash01ish force-pushed the feat/add-brotli-decode branch from 8b4514f to b7a5245 Compare April 14, 2025 12:05
Comment thread pypdf/filters.py Outdated
Comment thread pypdf/filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
@ash01ish ash01ish force-pushed the feat/add-brotli-decode branch from 672f080 to a4fdd61 Compare April 14, 2025 15:18
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py

def test_ccitt_fax_decode():
data = b""
parameters = DictionaryObject(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned previously, please revert all unrelated changes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I have staged a commit which will revert of the changes - It seems I have run ruff fix and everything formatted. Apologies - this will be fixed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems to be unresolved?

Comment thread tests/test_filters.py
Comment thread tests/test_filters.py Outdated
def test_brotli_missing_installation_mocked():
"""Verify BrotliDecode raises ImportError if brotli is not installed."""
# Import pypdf.filters *after* patching sys.modules
import pypdf.filters
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this as local imports?

@ash01ish ash01ish requested a review from stefan6419846 April 17, 2025 17:08
@ash01ish ash01ish force-pushed the feat/add-brotli-decode branch from 1f83b52 to 025226a Compare April 21, 2025 11:10
Comment thread requirements/dev.txt Outdated
Comment thread requirements/ci.txt Outdated
Comment thread requirements/ci-3.11.txt Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Copy link
Copy Markdown
Collaborator

@stefan6419846 stefan6419846 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience. I have added some hopefully final remarks. Additionally, I did some local testing and the results looked correct, although missing support in other common tools is missing and thus complicates verifying the behavior.

@ash01ish ash01ish force-pushed the feat/add-brotli-decode branch from aee4bf8 to 3ba2235 Compare May 9, 2025 11:24
@ash01ish ash01ish requested a review from stefan6419846 May 9, 2025 13:37
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py Outdated
Comment thread tests/test_filters.py
Comment thread tests/test_filters.py Outdated
@stefan6419846 stefan6419846 added the needs-change The PR/issue cannot be handled as issue and needs to be improved label Jun 5, 2025
@ash01ish ash01ish requested a review from stefan6419846 June 5, 2025 16:56
Comment thread pypdf/filters.py Outdated
Copy link
Copy Markdown
Collaborator

@stefan6419846 stefan6419846 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There currently are merge conflicts and the limit handling is wrong. Could you please have another look at it, as well as reviewing the missing coverage?

Comment thread pypdf/filters.py Outdated
if brotli is None:
raise ImportError("Brotli library not installed. Required for BrotliDecode filter.")
result = brotli.decompress(data)
if len(result) > BrotliDecode.MAX_OUTPUT_SIZE:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not help with security, as it will detect limit overflows only after all data has been processed. This makes OOMs more likely.

The configuration value should follow the usual pattern, apart from using the proper API of the brotli library for limiting the output length and detecting unprocessed data.

@ash01ish ash01ish closed this Apr 10, 2026
@ash01ish ash01ish force-pushed the feat/add-brotli-decode branch from 7723cf6 to 3155e04 Compare April 10, 2026 01:18
@ash01ish ash01ish reopened this Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-change The PR/issue cannot be handled as issue and needs to be improved

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BrotliDecode filter

2 participants