Decode all streams in a PDF #3635
What is a good way to do the following:
Python code in this discussion would be ideal, preferably using this function as it already exists: Line 766 in 219153e
Optionally, are there any files that contain all or most of the filter types and inline images to test this with?
Replies: 2 comments 3 replies
As pypdf is written, this will only work for filters that are not image-only and therefore do not rely on external libraries like Pillow or jbig2dec.

If you do not care about using internal APIs, something like this works:

```python
from pypdf import PdfWriter
from pypdf.generic import DecodedStreamObject, EncodedStreamObject

writer = PdfWriter(clone_from="resources/crazyones.pdf")

# writer._objects is an internal list, so this may break between releases.
for index, obj in enumerate(writer._objects):
    if not isinstance(obj, EncodedStreamObject):
        continue
    # get_data() applies the stream's filters and returns the decoded bytes.
    new_stream = DecodedStreamObject()
    new_stream.set_data(obj.get_data())
    # Copy every dictionary entry except the filter, which no longer applies.
    for key, value in dict(obj).items():
        if key not in {"/Filter"}:
            new_stream[key] = value
    writer._objects[index] = new_stream

writer.write("out.pdf")
```

I have not tested this with inline images or similar though, and relying on internal APIs is not recommended.
It should not be necessary to use this function explicitly.
I am not aware of such a file, and it is rather uncommon to have one, except for explicit testing purposes.
@stefan6419846 thank you for the code above; it is a good method. What would be the reverse, to revert it back? You "lose" the original filter, so say you want to encode all the streams using zlib/deflate (so that each stream has the filter FlateDecode).
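
To make the zlib/deflate part of that question concrete, here is a minimal standard-library sketch of what re-encoding a single stream amounts to. It is not pypdf code: the helper names `flate_encode_stream` and `flate_decode_stream` are hypothetical, and the returned dictionary is only a plain-Python stand-in for the `/Filter` and `/Length` entries a real stream dictionary would carry.

```python
import zlib
from typing import Dict, Tuple


def flate_encode_stream(raw: bytes, level: int = -1) -> Tuple[Dict[str, object], bytes]:
    """Compress raw stream bytes with zlib/deflate (hypothetical helper).

    Returns the dictionary entries a FlateDecode stream would need,
    together with the compressed payload.
    """
    encoded = zlib.compress(raw, level)
    entries = {"/Filter": "/FlateDecode", "/Length": len(encoded)}
    return entries, encoded


def flate_decode_stream(encoded: bytes) -> bytes:
    """Reverse of flate_encode_stream: recover the raw stream bytes."""
    return zlib.decompress(encoded)


# Round trip on some content-stream-like bytes.
raw = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 20
entries, encoded = flate_encode_stream(raw)
assert flate_decode_stream(encoded) == raw
print(entries["/Filter"], len(raw), "bytes raw,", entries["/Length"], "bytes encoded")
```

Inside pypdf itself, stream objects appear to provide a `flate_encode()` method that returns an encoded stream, so a loop like the one in the reply above could probably swap each `DecodedStreamObject` for `obj.flate_encode()`. I have not verified that against inline images or every pypdf version, so treat it as an assumption.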