mindee · felixdittrich92 · Jun 23, 2026 · Jun 22, 2026 · Jun 22, 2026 · Jun 22, 2026
diff --git a/docs/source/getting_started/installing.rst b/docs/source/getting_started/installing.rst
@@ -15,7 +15,6 @@ Install the last stable release of the package using `pip <https://pip.pypa.io/e
 
     pip install python-doctr
 
-
 We strive towards reducing framework-specific dependencies to a minimum, but some necessary features are developed by third-parties for specific frameworks. To avoid missing some dependencies for a specific framework, you can install specific builds as follows:
 
 .. code:: bash
@@ -24,6 +23,12 @@ We strive towards reducing framework-specific dependencies to a minimum, but som
     # or with preinstalled packages for visualization & html & contrib module support
     pip install "python-doctr[viz,html,contrib]"
 
+Available optional extras:
+
+* ``viz``: installs ``matplotlib`` and ``mplcursors`` for result visualization (e.g. ``Page.show()``)
+* ``html``: installs ``weasyprint`` for reading HTML documents via :func:`~doctr.io.read_html`
+* ``contrib``: installs ``onnxruntime`` for the :class:`~doctr.contrib.ArtefactDetector` contrib module
+
 
 Via Git
 =======
@@ -35,3 +40,13 @@ Install the library in developer mode:
 
     git clone https://github.com/mindee/doctr.git
     pip install -e doctr/.
+
+
+Via Docker
+==========
+
+Official Docker images are available on the `GitHub Container Registry <https://github.com/mindee/doctr/pkgs/container/doctr>`_.
+
+.. code:: bash
+
+    docker run -it ghcr.io/mindee/doctr:latest bash
diff --git a/docs/source/getting_started/quickstart.rst b/docs/source/getting_started/quickstart.rst
@@ -0,0 +1,110 @@
+
+**********
+Quickstart
+**********
+
+This page shows you how to get OCR results from a document in just a few lines of code.
+For more details see :ref:`using_models`.
+
+
+Load a document
+===============
+
+docTR can read PDFs, images, and web pages:
+
+.. code:: python3
+
+    from doctr.io import DocumentFile
+
+    # From a PDF
+    doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
+    # From one or more images
+    doc = DocumentFile.from_images("path/to/your/img.jpg")
+    doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
+    # From a web page URL (requires the html extra: pip install "python-doctr[html]")
+    doc = DocumentFile.from_url("https://www.example.com")
+
+
+Run OCR
+=======
+
+.. code:: python3
+
+    from doctr.io import DocumentFile
+    from doctr.models import ocr_predictor
+
+    doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
+    model = ocr_predictor(pretrained=True)
+    result = model(doc)
+
+The predictor uses ``db_resnet50`` for text detection and ``crnn_vgg16_bn`` for text recognition by default.
+You can choose any combination of :ref:`supported architectures <using_models>`.
+
+
+Inspect the output
+==================
+
+The result is a :class:`~doctr.io.Document` object.
+
+Render as plain text::
+
+    print(result.render())
+
+Export as a nested dictionary (JSON-serialisable)::
+
+    import json
+    print(json.dumps(result.export(), indent=2))
+
+Visualize on screen (requires the ``viz`` extra: ``pip install "python-doctr[viz]"``)::
+
+    result.pages[0].show()
+
+
+Multi-page PDF end-to-end example
+==================================
+
+The following snippet runs OCR on every page of a PDF, prints the plain text of each page, and saves the structured output as JSON:
+
+.. code:: python3
+
+    import json
+    from doctr.io import DocumentFile
+    from doctr.models import ocr_predictor
+
+    model = ocr_predictor(pretrained=True)
+    doc = DocumentFile.from_pdf("path/to/multi_page.pdf")
+    result = model(doc)
+
+    # Plain text: one string per page
+    for page_idx, page in enumerate(result.pages):
+        print(f"--- Page {page_idx + 1} ---")
+        print(page.render())
+
+    # Structured output: JSON-serialisable dict
+    output = result.export()
+    with open("ocr_output.json", "w") as f:
+        json.dump(output, f, indent=2)
+
+
+Common pitfalls
+===============
+
+.. note::
+
+   * **Visualization** requires the ``viz`` extra (installs ``matplotlib`` and ``mplcursors``):
+     ``pip install "python-doctr[viz]"``.  Without it, ``result.show()`` and
+     ``result.pages[0].show()`` fail with an ``ImportError`` (of which ``ModuleNotFoundError`` is a subclass).
+   * **HTML input** requires the ``html`` extra: ``pip install "python-doctr[html]"``.
+   * **Image format**: pass file paths or NumPy ``uint8`` arrays shaped ``(H, W, C)`` in
+     RGB order.  Grayscale arrays must be converted to 3-channel before use.
+   * **Pretrained weights** are downloaded on first use and cached locally.  Later calls load them from the cache instead of downloading them again.
+   * **PDF pages are returned as images**: ``DocumentFile.from_pdf`` returns one
+     NumPy array per page, so ``result.pages[i]`` corresponds to the PDF page at zero-based index *i*.
+
+
+Next steps
+==========
+
+* :doc:`../using_doctr/using_models` - full predictor guide, architecture benchmarks, GPU usage.
+* :doc:`../using_doctr/custom_models_training` - train and load your own models.
+* :doc:`../using_doctr/sharing_models` - share your trained models on Hugging Face Hub.
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -18,6 +18,7 @@ Main Features
 -------------
 
 * |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
+* |:page_facing_up:| Layout analysis predictor for detecting document regions (tables, figures, headers, …)
 * |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
 * |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
 * |:zap:| Optimized for inference speed on both CPU & GPU
@@ -32,6 +33,7 @@ Main Features
    :hidden:
 
    getting_started/installing
+   getting_started/quickstart
    notebooks
 
 

diff --git a/docs/source/modules/io.rst b/docs/source/modules/io.rst
@@ -20,6 +20,12 @@ A Word is an uninterrupted sequence of characters.
 
 .. autoclass:: Word
 
+Prediction
+^^^^^^^^^^
+A Prediction is a Word with an additional crop orientation field indicating the detected text rotation angle.
+
+.. autoclass:: Prediction
+
 Line
 ^^^^
 A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
@@ -33,6 +39,13 @@ An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, l
 
 .. autoclass:: Artefact
 
+LayoutElement
+^^^^^^^^^^^^^
+
+A LayoutElement is a region predicted by a layout detection model (e.g. Title, Text, Table, Page-header, Page-footer). Layout regions are attached to a :class:`Page` when the ``ocr_predictor`` / ``kie_predictor`` is run with ``detect_layout=True``.
+
+.. autoclass:: LayoutElement
+
 Block
 ^^^^^
 A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
@@ -49,6 +62,17 @@ A Page is a collection of Blocks that were on the same physical page.
    .. automethod:: show
 
 
+KIEPage
+^^^^^^^
+
+A KIEPage is returned by the :py:func:`kie_predictor <doctr.models.kie_predictor>`. It groups predictions by
+semantic class rather than by spatial layout.
+
+.. autoclass:: KIEPage
+
+   .. automethod:: show
+
+
 Document
 ^^^^^^^^
 
@@ -59,6 +83,17 @@ A Document is a collection of Pages.
    .. automethod:: show
 
 
+KIEDocument
+^^^^^^^^^^^
+
+A KIEDocument is a collection of :class:`KIEPage` elements, returned by the
+:py:func:`kie_predictor <doctr.models.kie_predictor>`.
+
+.. autoclass:: KIEDocument
+
+   .. automethod:: show
+
+
 File reading
 ------------
 

diff --git a/docs/source/modules/models.rst b/docs/source/modules/models.rst
@@ -83,6 +83,8 @@ doctr.models.layout
 
 .. autofunction:: doctr.models.layout.lw_detr_m
 
+.. autofunction:: doctr.models.layout.layout_predictor
+
 
 doctr.models.recognition
 ------------------------
@@ -124,3 +126,13 @@ doctr.models.factory
 .. autofunction:: doctr.models.factory.from_hub
 
 .. autofunction:: doctr.models.factory.push_to_hf_hub
+
+
+doctr.models.utils
+------------------
+
+.. currentmodule:: doctr.models.utils
+
+.. autofunction:: export_model_to_onnx
+
+.. autofunction:: add_whitelist
diff --git a/docs/source/modules/utils.rst b/docs/source/modules/utils.rst
@@ -14,13 +14,19 @@ Easy-to-use functions to make sense of your model's predictions.
 
 .. autofunction:: visualize_page
 
+.. autofunction:: visualize_kie_page
+
+.. autofunction:: draw_boxes
+
 Reconstitution
 ---------------
 
 .. currentmodule:: doctr.utils.reconstitution
 
 .. autofunction:: synthesize_page
 
+.. autofunction:: synthesize_kie_page
+
 
 .. _metrics:
 

diff --git a/docs/source/using_doctr/custom_models_training.rst b/docs/source/using_doctr/custom_models_training.rst
@@ -1,3 +1,5 @@
+.. _custom_models_training:
+
 Train your own model
 ====================
 
@@ -54,18 +56,24 @@ Load a custom recognition model trained on another vocabulary as the default one
     predictor = ocr_predictor(det_arch='linknet_resnet18', reco_arch=reco_model, pretrained=True)
 
 
-Load a custom layout analysis model trained on another set of classes as the default one:
+Plug a custom layout analysis model (trained on another set of classes) directly into the OCR pipeline so the detected regions are attached to every page:
 
 .. code:: python3
 
     import torch
-    from doctr.models import layout_predictor, lw_detr_s
-    from doctr.datasets import VOCABS
+    from doctr.models import ocr_predictor, lw_detr_s
 
-    layout_model = lw_detr_s(pretrained=False, class_names=["class_name_1", "class_name_2", ...])
+    # Custom layout model with your own class names
+    layout_model = lw_detr_s(pretrained=False, class_names=["heading", "paragraph", "figure", "table"])
     layout_model.from_pretrained('<path_to_pt>')
 
-    predictor = layout_predictor(layout_arch=layout_model, pretrained=True)
+    # Pass it through `layout_arch`, exactly as for the detection / recognition models
+    predictor = ocr_predictor(pretrained=True, detect_layout=True, layout_arch=layout_model)
+
+    result = predictor(doc)  # `doc` is assumed to be loaded beforehand, e.g. with DocumentFile.from_pdf(...)
+    # The regions (with your custom class names) are available on each page
+    print([(region.type, region.confidence) for region in result.pages[0].layout])
+
 
 Load a custom trained KIE detection model:
 

diff --git a/docs/source/using_doctr/sharing_models.rst b/docs/source/using_doctr/sharing_models.rst
@@ -1,40 +1,40 @@
 Share your model with the community
 ===================================
 
-docTR's focus is on open source, so if you also feel in love with than we appreciate sharing your trained model with the community.
-To make it easy for you, we have integrated a interface to the huggingface hub.
+docTR's focus is on open source, and if you feel the same way, we appreciate you sharing your trained model with the community.
+To make it easy for you, we have integrated an interface to the Hugging Face Hub.
 
 .. currentmodule:: doctr.models.factory
 
 
-Loading from Huggingface Hub
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Loading from Hugging Face Hub
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This section shows how you can easily load a pretrained model from the Huggingface Hub.
+This section shows how you can easily load a pretrained model from the Hugging Face Hub.
 
 .. code:: python3
 
     from doctr.io import DocumentFile
     from doctr.models import ocr_predictor, from_hub
     image = DocumentFile.from_images(['data/example.jpg'])
-    # Load a custom detection model from huggingface hub
+    # Load a custom detection model from the Hugging Face Hub
     det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
-    # Load a custom recognition model from huggingface hub
+    # Load a custom recognition model from the Hugging Face Hub
     reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
-    # You can easily plug in this models to the OCR predictor
+    # You can easily plug these models into the OCR predictor
     predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
     result = predictor(image)
 
 
-Pushing to the Huggingface Hub
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Pushing to the Hugging Face Hub
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-You can also push your trained model to the Huggingface Hub.
+You can also push your trained model to the Hugging Face Hub.
 You need only to provide the task type (classification, detection, recognition or obj_detection), a name for your trained model (NOTE:
 existing repositories will not be overwritten) and the model name itself.
 
 - Prerequisites:
-    - Huggingface account (you can easy create one at https://huggingface.co/)
+    - Hugging Face account (you can easily create one at https://huggingface.co/)
     - installed Git LFS (check installation at: https://git-lfs.github.com/) in the repository
 
 .. code:: python3
@@ -68,6 +68,8 @@ We suggest using the following naming conventions for your models:
 
 **Recognition:** ``doctr-<architecture>-<vocab>``
 
+**Layout:** ``doctr-<architecture>``
+
 
 Classification
 --------------
@@ -101,3 +103,13 @@ Recognition
 +---------------------------------+---------------------------------------------------+---------------------+------------------------+
 | parseq                          | rania-sr/doctr-model-v1-arabic                    | arabic              | PyTorch                |
 +---------------------------------+---------------------------------------------------+---------------------+------------------------+
+
+
+Layout
+------
+
++---------------------------------+---------------------------------------------------+------------------------+
+|        **Architecture**         |            **Repo_ID**                            |     **Framework**      |
++=================================+===================================================+========================+
+| lw_detr_s (dummy)               | Felix92/doctr-dummy-torch-lw-detr-s               | PyTorch                |
++---------------------------------+---------------------------------------------------+------------------------+
diff --git a/docs/source/using_doctr/using_model_export.rst b/docs/source/using_doctr/using_model_export.rst
@@ -76,7 +76,7 @@ Further information can be found in the `PyTorch documentation <https://pytorch.
         mobilenet_v3_small_page_orientation(pretrained=True).eval()
     )
 
-    predictor = models.ocr_predictor(
+    predictor = ocr_predictor(
         detection_model, recognition_model, assume_straight_pages=False
     )
     # NOTE: Only required for non-straight pages (`assume_straight_pages=False`) and non-disabled orientation classification