Multiple OMML equations in one paragraph concatenated into a single display block #3121

@smroels

omml_multi_equation_paragraph.docx

Bug

When a single <w:p> paragraph contains multiple <m:oMath> sibling elements, docling
concatenates all of them into one $$ display block instead of emitting each as a separate
equation. The equations are also output in an unexpected order.

Given a paragraph with three sibling display equations (a = b, c = d, e = f), the
expected output is three separate display blocks:

$$a = b$$

$$c = d$$

$$e = f$$

The actual output is a single block with all content merged and the order scrambled (the first equation, a = b, comes last):

$$c=de=fa=b$$

This likely originates in the paragraph-level equation handling in
docling/backend/docx/ms_word_backend.py, where sibling <m:oMath> nodes within one <w:p>
are not iterated and split into individual equation items.
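As a rough illustration of the splitting described above, here is a minimal sketch using only the Python standard library. It is not docling's code: `split_equations` is a hypothetical helper, the sample XML is a hand-written stand-in for the attached file, and it assumes each equation's text lives in `<m:t>` runs directly under sibling `<m:oMath>` children of the paragraph.

```python
import xml.etree.ElementTree as ET

# Standard OOXML namespaces for WordprocessingML and Office Math.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "m": "http://schemas.openxmlformats.org/officeDocument/2006/math",
}


def split_equations(paragraph: ET.Element) -> list[str]:
    """Return one string per direct-child <m:oMath>, in document order.

    Hypothetical helper for illustration; not part of docling.
    """
    equations = []
    for omath in paragraph.findall("m:oMath", NS):
        # Join only the <m:t> text runs that belong to this equation.
        text = "".join(t.text or "" for t in omath.iter(f"{{{NS['m']}}}t"))
        equations.append(text)
    return equations


# Hand-written stand-in for the paragraph in the attached DOCX (assumed content).
SAMPLE = f"""
<w:p xmlns:w="{NS['w']}" xmlns:m="{NS['m']}">
  <m:oMath><m:r><m:t>a=b</m:t></m:r></m:oMath>
  <m:oMath><m:r><m:t>c=d</m:t></m:r></m:oMath>
  <m:oMath><m:r><m:t>e=f</m:t></m:r></m:oMath>
</w:p>
"""

paragraph = ET.fromstring(SAMPLE.strip())
for eq in split_equations(paragraph):
    # Emit each equation as its own display block.
    print(f"$${eq}$$")
```

Iterating the direct children keeps the equations in document order and yields one item per `<m:oMath>`, which is the behavior the expected output calls for.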

Steps to reproduce

  1. Download the attached DOCX file.

  2. Run:

    docling --from docx --to md --output . omml_multi_equation_paragraph.docx

  3. Inspect omml_multi_equation_paragraph.md. The three equations (a = b, c = d, e = f)
    appear concatenated in a single $$ block rather than as three separate blocks.
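The inspection in step 3 can also be done mechanically. The following is a small sketch that counts `$$` display blocks in Markdown text; `display_blocks` is a hypothetical helper, and the two strings are copied from the expected and actual output quoted in this report rather than read from the generated file.

```python
import re

# One $$ ... $$ display block; non-greedy so adjacent blocks stay separate.
DISPLAY_BLOCK = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def display_blocks(markdown: str) -> list[str]:
    """Return the stripped contents of every $$...$$ block in the text."""
    return [match.group(1).strip() for match in DISPLAY_BLOCK.finditer(markdown)]


# Expected and actual output as quoted in this report.
expected_md = "$$a = b$$\n\n$$c = d$$\n\n$$e = f$$\n"
actual_md = "$$c=de=fa=b$$\n"

print(len(display_blocks(expected_md)), display_blocks(expected_md))
print(len(display_blocks(actual_md)), display_blocks(actual_md))
```

To check a real run, read omml_multi_equation_paragraph.md and pass its contents to `display_blocks`; a count of 1 instead of 3 reproduces the bug.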

The DOCX contains a single paragraph with three sibling <m:oMath> elements:

<w:p>
  <m:oMath>  <!-- a = b -->  </m:oMath>
  <m:oMath>  <!-- c = d -->  </m:oMath>
  <m:oMath>  <!-- e = f -->  </m:oMath>
</w:p>

This structure is produced naturally by Microsoft Word when a user places multiple display
equations in the same paragraph (e.g., by pressing Enter within an equation block and
continuing to type).
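To confirm that a given DOCX really has this structure, one can count the direct `<m:oMath>` children of each paragraph in `word/document.xml`. The sketch below uses only the standard library; `omath_counts` and `make_stand_in` are hypothetical names, and the in-memory archive is a simplified stand-in (it contains only `word/document.xml`), not the attached file.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def omath_counts(docx) -> list[int]:
    """For each <w:p> in word/document.xml, count its direct <m:oMath> children.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [len(p.findall(f"{{{M}}}oMath")) for p in root.iter(f"{{{W}}}p")]


def make_stand_in() -> io.BytesIO:
    """Build a minimal in-memory archive mimicking the reported structure."""
    body = "".join(
        f"<m:oMath><m:r><m:t>{eq}</m:t></m:r></m:oMath>"
        for eq in ("a=b", "c=d", "e=f")
    )
    xml = (
        f'<w:document xmlns:w="{W}" xmlns:m="{M}">'
        f"<w:body><w:p>{body}</w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


counts = omath_counts(make_stand_in())
print(counts)
```

Passing the path of the attached file instead of the stand-in should show a paragraph with a count of 3 if the file matches the structure described above.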

Docling version

Docling version: 2.79.0
Docling Core version: 2.69.0
Docling IBM Models version: 3.12.0
Docling Parse version: 5.5.0
Python: cpython-312 (3.12.12)
Platform: Linux-5.14.0-611.16.1.el9_7.x86_64-x86_64-with-glibc2.34

Python version

Python 3.12.12

Attachments

  • omml_multi_equation_paragraph.docx — minimal three-equation paragraph that triggers the bug

Labels

bug
