Using hierarchical.pdf, which has two pages with AcroForm fields that are grouped
under a single non-terminal AcroForm field, in
Python 3.13.5 (main, Jun 11 2025, 15:36:57) [Clang 19.1.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pypdf
>>> writer = pypdf.PdfWriter()
>>> writer.append('hierarchical.pdf', pages=[0])
>>> with open("result.pdf", "wb") as output:
...     writer.write(output)
...
(False, <_io.BufferedWriter name='result.pdf'>)
>>> quit()
and looking into result.pdf, one can see that it contains multiple page objects and
that the old page tree root is included but not referenced from anywhere in the new
document structure.
My conjecture:
I suspect that the AcroForm fields for the page that was not appended have not been
removed because of the common non-terminal parent. This might have led to the widget
annotations of the "to be removed" page not being removed and therefore to the page
being kept, which in turn keeps the old page tree root, as it is referenced from that page.
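A rough way to check the observation about extra page objects without a PDF inspector is to count page dictionaries in the raw bytes of result.pdf. This is only a sketch: the helper `count_page_dicts` is hypothetical (not part of pypdf), and it assumes the objects are written uncompressed, i.e. not packed into object streams.

```python
import re
from pathlib import Path

# Matches "/Type /Page" but not "/Type /Pages" (negative lookahead on letters).
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")
# Matches page tree nodes, "/Type /Pages".
PAGES_RE = re.compile(rb"/Type\s*/Pages(?![A-Za-z])")


def count_page_dicts(data: bytes) -> tuple[int, int]:
    """Return (number of /Page dicts, number of /Pages nodes) found in raw PDF bytes.

    Assumption: objects are stored uncompressed; dictionaries inside
    object streams would not be seen by this byte-level search.
    """
    return len(PAGE_RE.findall(data)), len(PAGES_RE.findall(data))


if __name__ == "__main__":
    path = Path("result.pdf")
    if path.exists():
        pages, nodes = count_page_dicts(path.read_bytes())
        print(f"/Page objects: {pages}, /Pages nodes: {nodes}")
```

For a file that really contains only the single appended page, one would expect one `/Page` object and one `/Pages` node; higher counts would be consistent with the leftover objects described above.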
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
# macOS-26.4-arm64-arm-64bit-Mach-O
$ python -c "import pypdf;print(pypdf._debug_versions)"
# pypdf==6.10.1, crypt_provider=('cryptography', '45.0.4'), PIL=11.3.0
Code + PDF
hierarchical.pdf
Python 3.13.5 (main, Jun 11 2025, 15:36:57) [Clang 19.1.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pypdf
>>> writer = pypdf.PdfWriter()
>>> writer.append('hierarchical.pdf', pages=[0])
>>> with open("result.pdf", "wb") as output:
...     writer.write(output)
...
(False, <_io.BufferedWriter name='result.pdf'>)
>>> quit()
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
My test PDF, hierarchical.pdf, was derived from https://github.com/mozilla/pdf.js/blob/9159afd633bf02ebcb38766d75b508c0d8ee7c39/test/pdfs/form_two_pages.pdf and is therefore probably covered by the Apache-2.0 license.
Traceback
None