feat: add DoclingServeConverter integration #3173

Open
SyedShahmeerAli12 wants to merge 3 commits into deepset-ai:main from SyedShahmeerAli12:feat/docling-serve-integration

@SyedShahmeerAli12
Contributor

Summary

Adds a DoclingServeConverter component that converts documents using a running docling-serve HTTP server, without any heavy ML dependencies (no PyTorch required).

  • Accepts URLs, local file paths, and ByteStream sources
  • Supports MARKDOWN, TEXT, and JSON export formats
  • Optional API key authentication via Haystack Secret
  • Both synchronous (run) and asynchronous (arun) execution
  • 27 unit tests, all passing
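One way the converter could tell the three accepted source kinds apart is sketched below. It is a minimal illustration, not the PR's code: `FakeByteStream` is a hypothetical stand-in for Haystack's `ByteStream` (so the snippet runs without Haystack installed), and treating only `http`/`https` strings as URLs is an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Union
from urllib.parse import urlparse


@dataclass
class FakeByteStream:
    """Hypothetical stand-in for haystack.dataclasses.ByteStream (raw data plus meta)."""

    data: bytes
    meta: dict[str, Any] = field(default_factory=dict)


Source = Union[str, Path, FakeByteStream]


def classify_source(source: Source) -> str:
    """Return 'url', 'file', or 'bytes' for one converter input (assumed rules)."""
    if isinstance(source, FakeByteStream):
        return "bytes"
    # Only http(s) strings count as URLs in this sketch.
    if isinstance(source, str) and urlparse(source).scheme in ("http", "https"):
        return "url"
    # Everything else is treated as a local path; existence would be checked later.
    return "file"
```

Strings with any other scheme, or none at all, fall through to the local-path branch here.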

Closes #2960

Test plan

  • 27 unit tests passing
  • Lint clean (ruff check, ruff format)
  • Integration test requires a running docling-serve instance (pytest -m integration)

Adds a new `docling-serve-haystack` integration with a `DoclingServeConverter`
component that converts documents via a remote DoclingServe HTTP server instead
of loading heavy ML dependencies locally (no PyTorch required).

- Supports URLs, local file paths, and ByteStream sources
- Export formats: Markdown (default), plain text, JSON
- Both sync `run()` and async `arun()` methods
- Configurable conversion options, timeout, and optional API key auth
- Full unit test suite (mocked httpx) + integration test markers
- CI workflow, labeler, coverage comment, and root README table entry
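As a sketch of how the three export formats might map onto a docling-serve response, assume the server returns the converted document under `document` with `md_content`, `text_content`, and `json_content` fields. Those field names are an assumption about docling-serve's response schema, and `ExportFormat` and `extract_content` are hypothetical names, not the PR's API.

```python
import json
from enum import Enum
from typing import Any


class ExportFormat(Enum):
    """Hypothetical enum; values are assumed docling-serve `to_formats` names."""

    MARKDOWN = "md"
    TEXT = "text"
    JSON = "json"


# Assumed response field that carries each format's output.
RESPONSE_FIELD = {
    ExportFormat.MARKDOWN: "md_content",
    ExportFormat.TEXT: "text_content",
    ExportFormat.JSON: "json_content",
}


def extract_content(response: dict[str, Any], fmt: ExportFormat = ExportFormat.MARKDOWN) -> str:
    """Return the exported text for `fmt` from a docling-serve-style response body."""
    key = RESPONSE_FIELD[fmt]
    content = response.get("document", {}).get(key)
    if content is None:
        raise ValueError(f"response has no {key!r} field")
    # A JSON export may arrive as a parsed object; serialize it so callers always get a str.
    if not isinstance(content, str):
        content = json.dumps(content)
    return content
```

Markdown is the default argument, matching the "Markdown (default)" behavior described above.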

Closes deepset-ai#2960

Adds a new DoclingServeConverter component that converts documents
by sending them to a running docling-serve HTTP server. Supports
local files, URLs, and ByteStreams; markdown, text, and JSON export
formats; optional API key authentication; and both sync (run) and
async (arun) execution.

Closes deepset-ai#2960
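A common way to offer both `run` and `arun` without duplicating logic is to share request building and response parsing and vary only the transport call. The sketch below assumes that pattern; `ConverterSketch`, its injected `post`/`apost` callables, and the payload field names are hypothetical and are not taken from the PR.

```python
import asyncio
from typing import Any, Awaitable, Callable

Payload = dict[str, Any]


class ConverterSketch:
    """Hypothetical converter shape: build and parse once, swap only the transport."""

    def __init__(
        self,
        post: Callable[[Payload], Payload],
        apost: Callable[[Payload], Awaitable[Payload]],
    ) -> None:
        # In a real component these would wrap httpx.Client.post / httpx.AsyncClient.post.
        self._post = post
        self._apost = apost

    def _build(self, url: str) -> Payload:
        # Shared request construction (field names are assumptions).
        return {"sources": [{"kind": "http", "url": url}]}

    def _parse(self, body: Payload) -> str:
        # Shared response handling, identical for both code paths.
        return body["document"]["md_content"]

    def run(self, url: str) -> str:
        return self._parse(self._post(self._build(url)))

    async def arun(self, url: str) -> str:
        return self._parse(await self._apost(self._build(url)))


if __name__ == "__main__":
    def fake_post(payload: Payload) -> Payload:
        return {"document": {"md_content": "converted " + payload["sources"][0]["url"]}}

    async def fake_apost(payload: Payload) -> Payload:
        await asyncio.sleep(0)  # stands in for awaiting network I/O
        return fake_post(payload)

    conv = ConverterSketch(fake_post, fake_apost)
    print(conv.run("https://example.com/a.pdf"))
    print(asyncio.run(conv.arun("https://example.com/a.pdf")))
```

Because `_build` and `_parse` are shared, the two entry points cannot drift apart; only the awaited call differs.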
SyedShahmeerAli12 requested a review from a team as a code owner April 16, 2026 20:12
SyedShahmeerAli12 requested a review from julian-risch and removed the request for a team April 16, 2026 20:12
github-actions (bot) added the topic:CI and type:documentation labels Apr 16, 2026
@SyedShahmeerAli12
Contributor Author

SyedShahmeerAli12 commented Apr 16, 2026

Hey @julian-risch, this implements the DoclingServeConverter as described in #2960.

Key design decisions:

  • Used httpx instead of requests for native async support (arun())
  • api_key uses Haystack's Secret class for secure serialization
  • convert_options is a single dict instead of individual parameters, which keeps the signature cleaner and forward-compatible
    with new docling-serve options
  • Sources are base64-encoded and sent as JSON to /v1/convert/source (avoids multipart complexity)
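The base64-in-JSON approach from the last point could look roughly like this. The endpoint path comes from the description above; the payload field names (`sources`, `kind`, `base64_string`, `filename`, `to_formats`), the `X-Api-Key` header, and the `build_request` helper are assumptions to check against the docling-serve API docs, not the PR's implementation.

```python
import base64
import json
from typing import Any, Optional

# Endpoint named in the PR description.
CONVERT_PATH = "/v1/convert/source"


def build_request(
    base_url: str,
    files: dict[str, bytes],
    urls: list[str],
    convert_options: Optional[dict[str, Any]] = None,
    api_key: Optional[str] = None,
) -> tuple[str, dict[str, str], str]:
    """Return (url, headers, json_body) for one conversion call (hypothetical helper)."""
    # Local files and ByteStreams travel inline as base64 text, so no multipart upload is needed.
    sources: list[dict[str, str]] = [
        {"kind": "file", "filename": name, "base64_string": base64.b64encode(data).decode("ascii")}
        for name, data in files.items()
    ]
    # Remote documents are passed by URL and fetched by the server itself.
    sources += [{"kind": "http", "url": u} for u in urls]

    # Defaults first, then the user's dict on top, so unknown server options pass through untouched.
    options: dict[str, Any] = {"to_formats": ["md"], **(convert_options or {})}

    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # In the component the key would come from Secret.resolve_value().
        headers["X-Api-Key"] = api_key

    body = json.dumps({"options": options, "sources": sources})
    return base_url.rstrip("/") + CONVERT_PATH, headers, body
```

Merging `convert_options` over the defaults is what makes the single-dict design forward-compatible: an option the component has never heard of is simply forwarded to the server.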

@SyedShahmeerAli12
Contributor Author

SyedShahmeerAli12 commented May 6, 2026

Merge conflicts resolved; the branch is now up to date with main. Ready for review.

Development

Successfully merging this pull request may close these issues.

Add new docling-serve integration
