BUG: honor XObject /ColorSpace hint when deriving image mode by eerose2603 · Pull Request #3498 · py-pdf/pypdf

eerose2603 · 2025-10-18T03:37:03Z

Summary

Honor the XObject /ColorSpace hint when deriving Pillow image modes in _get_mode_and_invert_color, especially when nested/indirect color spaces collapse or are ambiguous. This prevents mode mis-detection and preserves CMYK inversion semantics.

Motivation / Problem

In real-world PDFs the image XObject often carries a correct /ColorSpace, while the resolved color space provided to _get_imagemode(...) may be:

NullObject (e.g., after following references),

ICCBased without a usable /Alternate,

Indexed / Separation / DeviceN chains that don’t yield a direct Device* at the point of resolution.

Previously we:

Special-cased only /DeviceRGB, ignoring /DeviceGray and /DeviceCMYK.

Didn’t pass any prior/hinted mode into _get_imagemode(...).

Had no fallback if _get_imagemode(...) returned "".

This could yield the wrong final mode and miss CMYK inversion.

Approach

Compute preferred_mode from IA.COLOR_SPACE:

/DeviceRGB → "RGB", /DeviceGray → "L", /DeviceCMYK → "CMYK".

Pass preferred_mode to _get_imagemode(...) as prev_mode for both:

the <8 bpc branch and

the general branch (including the “2 components if not explicitly gray” rule).

If the derived mode is still "", fall back to preferred_mode.

If the fallback is CMYK, set invert_color=True (consistent with current CMYK handling elsewhere).

Effects / Compatibility

No change when _get_imagemode(...) already returns a concrete mode.

Only ambiguous/degenerate cases change—and they now align with the XObject’s declared /ColorSpace.

Keeps existing decode and flate/JPX handling untouched.

Tests

Added tests asserting that, when color_space collapses (e.g., NullObject), we still:

choose "RGB" for a /DeviceRGB hint (no invert),

choose "L" for /DeviceGray (no invert),

choose "CMYK" for /DeviceCMYK and set invert_color=True.

Preserve hinted color spaces when deriving XObject image modes

stefan6419846 · 2025-10-18T10:20:37Z

Thanks for the PR. Could you please elaborate on the reasons for this change? Additionally, please have a look at the CI failures for us to continue.

eerose2603 · 2025-10-26T05:58:49Z

What changed & why

_get_mode_and_invert_color now treats the XObject’s /ColorSpace entry as an authoritative hint for the final Pillow mode and propagates that hint through the existing _get_imagemode(...) logic:

We derive a preferred_mode from IA.COLOR_SPACE:

/DeviceRGB → "RGB", /DeviceGray → "L", /DeviceCMYK → "CMYK".

We pass preferred_mode as prev_mode to _get_imagemode(...) for both the <8 bpc path and the general path.
This ensures 1/2/4-bit and nested/indirect color spaces respect the hint.

If _get_imagemode(...) collapses to "" (e.g., NullObject, ambiguous Indexed/Separation/DeviceN chains), we fall back to the hint and set invert_color=True when the hint is CMYK—matching existing behavior where CMYK implies inversion.

Problem this solves

Some PDFs declare a reliable /ColorSpace on the image XObject, but the resolved color space passed to _get_imagemode(...) can be ambiguous or collapse (e.g., NullObject, ICCBased without usable Alternate, certain DeviceN/Indexed cases). Previously we only special-cased /DeviceRGB, so the function could return "" (unknown) and lose the author intent—leading to wrong modes (e.g., grayscale treated as RGB) or missed CMYK inversion.

Compatibility

If _get_imagemode(...) returns a concrete mode, nothing changes.

The hint is only used as a tie-breaker / fallback and to seed prev_mode, so it doesn’t override explicit resolvable color spaces.

stefan6419846 · 2025-10-26T09:59:26Z

My above requests still hold through. Where do we have such an example? Could we please have a proper test for this? (I do not think that the possibly AI-generated initial and previous comment explain this issue in a suitable way to make it clear to me.)

pypdf/_xobj_image_helpers.py:

6df34d7

Preserve hinted color spaces when deriving XObject image modes

stefan6419846 added needs-discussion The PR/issue needs more discussion before we can continue needs-change The PR/issue cannot be handled as issue and needs to be improved labels Oct 20, 2025

eerose2603 added 2 commits October 26, 2025 16:41

fixed ci issues

dfb9240

Merge branch 'main' into fixed_Image_Handling

d43cc17

eerose2603 changed the title ~~Preserve hinted color spaces when deriving XObject image modes~~ fix(images): honor /ColorSpace hint when deriving XObject image modes Oct 26, 2025

eerose2603 added 2 commits October 26, 2025 16:49

Fixed CI issue

dd65f32

Fixed CI issue

7d28c3f

eerose2603 changed the title ~~fix(images): honor /ColorSpace hint when deriving XObject image modes~~ BUG: honor XObject /ColorSpace hint when deriving image mode Oct 26, 2025

Fixed CI issue

1351297

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: honor XObject /ColorSpace hint when deriving image mode#3498

BUG: honor XObject /ColorSpace hint when deriving image mode#3498
eerose2603 wants to merge 6 commits intopy-pdf:mainfrom
eerose2603:fixed_Image_Handling

eerose2603 commented Oct 18, 2025 •

edited

Loading

Uh oh!

stefan6419846 commented Oct 18, 2025

Uh oh!

eerose2603 commented Oct 26, 2025

Uh oh!

stefan6419846 commented Oct 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

eerose2603 commented Oct 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stefan6419846 commented Oct 18, 2025

Uh oh!

eerose2603 commented Oct 26, 2025

Uh oh!

stefan6419846 commented Oct 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eerose2603 commented Oct 18, 2025 •

edited

Loading