
Fix missing closing tags in multi-page PAGE XML output #4506

Merged: stweil merged 2 commits into main from copilot/fix-bug-in-application on Feb 8, 2026


Contributor

Copilot AI commented Feb 8, 2026

Multi-page TIFF input produces malformed PAGE XML: every page except the last is missing its </Page> closing tag, so the output fails validation.

Changes

  • GetPAGEText(): Add the </Page> closing tag at the end of the method (line 1126)
  • EndDocumentHandler(): Remove the </Page> tag so that only </PcGts> is emitted (line 686)
  • Test: Added the PAGEXMLMultiPageClosingTags test in unittest/baseapi_test.cc to verify that every page is properly closed in multi-page documents

Each page now generates a complete <Page>...</Page> element. The document handler manages only the outer <PcGts>...</PcGts> envelope, matching the pattern used in altorenderer.cpp.
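To make the new division of labor concrete, here is a minimal C++ sketch. `PageXmlWriterSketch`, `BeginDocument()`, `PageText()`, `EndDocument()` and `RenderDocument()` are hypothetical names chosen for illustration; the actual change lives in Tesseract's `GetPAGEText()` and `EndDocumentHandler()`, whose code is not reproduced here. Page content is elided as `...`, as in the examples in this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the responsibility split after this PR.
// These names are illustrative only; they are NOT Tesseract's
// TessPAGERenderer / GetPAGEText implementation.
struct PageXmlWriterSketch {
  // Document level: open the outer <PcGts> envelope once.
  static std::string BeginDocument() { return "<PcGts>\n"; }

  // Page level: emit one complete <Page>...</Page> element per page.
  static std::string PageText(int page_number) {
    assert(page_number >= 0);
    std::string xml = "  <Page>";
    xml += "...";        // page content (regions, lines, words) elided
    xml += "</Page>\n";  // closed here, not in EndDocument()
    return xml;
  }

  // Document level: close only the envelope; no </Page> any more.
  static std::string EndDocument() { return "</PcGts>\n"; }
};

// Drives the writer the way a multi-page renderer would.
std::string RenderDocument(int page_count) {
  std::string out = PageXmlWriterSketch::BeginDocument();
  for (int page = 0; page < page_count; ++page) {
    out += PageXmlWriterSketch::PageText(page);
  }
  out += PageXmlWriterSketch::EndDocument();
  return out;
}
```

Calling `RenderDocument(3)` yields the same shape as the "After" example. Moving `</Page>` out of `PageText()` and back into `EndDocument()` would reproduce the "Before" shape, in which only the last page is closed.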

Before:

<PcGts>
  <Page>...      <!-- unclosed -->
  <Page>...      <!-- unclosed -->
  <Page>...</Page>
</PcGts>

After:

<PcGts>
  <Page>...</Page>
  <Page>...</Page>
  <Page>...</Page>
</PcGts>
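One way to see why the "Before" output fails validation is a small stack-based check that tracks only `<PcGts>` and `<Page>` nesting. `PageTagsBalanced` is a hypothetical helper, not part of Tesseract, and it is not a full XML parser: it skips comments, processing instructions and all other elements, and does not handle a self-closing `<Page/>`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns true if <PcGts> and <Page> tags are properly
// nested and closed. Sketch only: every other tag (and comments, which parse
// here as a tag named "!--") is skipped, and self-closing <Page/> is unsupported.
bool PageTagsBalanced(const std::string& xml) {
  std::vector<std::string> open;  // stack of currently open element names
  for (auto pos = xml.find('<'); pos != std::string::npos;
       pos = xml.find('<', pos + 1)) {
    const bool closing = pos + 1 < xml.size() && xml[pos + 1] == '/';
    const auto name_start = pos + (closing ? 2 : 1);
    const auto name_end = xml.find_first_of(" \t\r\n>/", name_start);
    if (name_end == std::string::npos) return false;  // truncated tag
    const std::string name = xml.substr(name_start, name_end - name_start);
    if (name != "PcGts" && name != "Page") continue;  // not tracked
    if (!closing) {
      // A new <Page> while another <Page> is still open means a missing </Page>.
      if (name == "Page" && !open.empty() && open.back() == "Page") return false;
      open.push_back(name);
    } else {
      if (open.empty() || open.back() != name) return false;  // mismatched close
      open.pop_back();
    }
  }
  return open.empty();  // anything left open was never closed
}
```

Fed the "Before" snippet, the check fails at the second `<Page>`, which opens while the first one is still open; the "After" snippet passes.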

Testing

The new test validates:

  • Each page has exactly one opening <Page> tag
  • Each page has exactly one closing </Page> tag
  • Opening and closing tags match (balanced XML)
  • Individual page outputs don't contain document envelope tags
  • Multiple pages each close properly
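The per-page checks listed above can be approximated with plain string counting. The sketch below is hypothetical (`CountOccurrences`, `CountPageOpenTags`, `PageOutputIsSelfContained` and `AllPagesSelfContained` are illustrative names) and does not reproduce the actual `PAGEXMLMultiPageClosingTags` test; it assumes each page's XML is available as a separate string.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Counts non-overlapping occurrences of `needle` in `text`.
std::size_t CountOccurrences(const std::string& text, const std::string& needle) {
  if (needle.empty()) return 0;  // avoid an endless loop on an empty needle
  std::size_t count = 0;
  for (auto pos = text.find(needle); pos != std::string::npos;
       pos = text.find(needle, pos + needle.size())) {
    ++count;
  }
  return count;
}

// Counts opening <Page> tags (with or without attributes). "</Page>" never
// matches because its '<' is followed by '/', not 'P'.
std::size_t CountPageOpenTags(const std::string& xml) {
  const std::string prefix = "<Page";
  std::size_t count = 0;
  for (auto pos = xml.find(prefix); pos != std::string::npos;
       pos = xml.find(prefix, pos + prefix.size())) {
    const auto next = pos + prefix.size();
    if (next < xml.size() &&
        (xml[next] == '>' ||
         std::isspace(static_cast<unsigned char>(xml[next])))) {
      ++count;  // a real <Page ...> tag, not some longer element name
    }
  }
  return count;
}

// Applies the per-page checks to ONE page's output.
bool PageOutputIsSelfContained(const std::string& page_xml) {
  return CountPageOpenTags(page_xml) == 1 &&                   // one opening tag
         CountOccurrences(page_xml, "</Page>") == 1 &&          // one closing tag
         page_xml.find("<Page") < page_xml.find("</Page>") &&  // open before close
         CountOccurrences(page_xml, "<PcGts") == 0 &&           // no envelope start
         CountOccurrences(page_xml, "</PcGts>") == 0;           // no envelope end
}

// Every page of a multi-page document must pass on its own.
bool AllPagesSelfContained(const std::vector<std::string>& pages) {
  return std::all_of(pages.begin(), pages.end(), PageOutputIsSelfContained);
}
```

A page list shaped like the old output, for example `{"<Page>1", "<Page>2", "<Page>3</Page>"}`, fails `AllPagesSelfContained` because the first two pages never close.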
Original prompt

Please create a pull request which fixes this bug.



Copilot AI changed the title from "[WIP] Fix bug in application functionality" to "Fix typo in bug report template: umame → uname" Feb 8, 2026
Copilot AI requested a review from stweil February 8, 2026 18:24
Copilot AI changed the title from "Fix typo in bug report template: umame → uname" to "Fix missing closing tags in multi-page PAGE XML output" Feb 8, 2026
Member

stweil commented Feb 8, 2026

@copilot, please rebase this PR to the latest code in the main branch. In addition, it would be nice to add a CI test for this issue.

Contributor Author

Copilot AI commented Feb 8, 2026


Done! The PR is now rebased on the latest main branch (the typo fix was already merged in 9c393bb). I've also added a CI test PAGEXMLMultiPageClosingTags in unittest/baseapi_test.cc that validates:

  • Each page has exactly one opening and closing <Page> tag
  • Individual page outputs don't contain document envelope tags
  • Multiple pages each close properly

Commit: b8e2455

Copilot AI and others added 2 commits February 8, 2026 19:50
Co-authored-by: Stefan Weil <sw@weilnetz.de>
Co-authored-by: Stefan Weil <sw@weilnetz.de>
stweil force-pushed the copilot/fix-bug-in-application branch from b8e2455 to bafb1fc February 8, 2026 18:52
stweil marked this pull request as ready for review February 8, 2026 19:29
stweil merged commit a6dac7d into main Feb 8, 2026
4 of 6 checks passed
stweil deleted the copilot/fix-bug-in-application branch February 8, 2026 19:31

2 participants