diff --git a/docs/source/getting_started/installing.rst b/docs/source/getting_started/installing.rst
index 17637625b1..8552a2434e 100644
--- a/docs/source/getting_started/installing.rst
+++ b/docs/source/getting_started/installing.rst
@@ -15,7 +15,6 @@ Install the last stable release of the package using `pip <https://pip.pypa.io/e
 
     pip install python-doctr
 
-
 We strive towards reducing framework-specific dependencies to a minimum, but some necessary features are developed by third-parties for specific frameworks. To avoid missing some dependencies for a specific framework, you can install specific builds as follows:
 
 .. code:: bash
@@ -24,6 +23,12 @@ We strive towards reducing framework-specific dependencies to a minimum, but som
     # or with preinstalled packages for visualization & html & contrib module support
     pip install "python-doctr[viz,html,contrib]"
 
+Available optional extras:
+
+* ``viz``: installs ``matplotlib`` and ``mplcursors`` for result visualization (e.g. ``Page.show()``)
+* ``html``: installs ``weasyprint`` for reading HTML documents via :func:`~doctr.io.read_html`
+* ``contrib``: installs ``onnxruntime`` for the :class:`~doctr.contrib.ArtefactDetector` contrib module
+
 
 Via Git
 =======
@@ -35,3 +40,13 @@ Install the library in developer mode:
 
     git clone https://github.com/mindee/doctr.git
     pip install -e doctr/.
+
+
+Via Docker
+==========
+
+Official Docker images are available on the `GitHub Container Registry <https://github.com/mindee/doctr/pkgs/container/doctr>`_.
+
+.. code:: bash
+
+    docker run -it ghcr.io/mindee/doctr:latest bash
diff --git a/docs/source/getting_started/quickstart.rst b/docs/source/getting_started/quickstart.rst
new file mode 100644
index 0000000000..94818a2415
--- /dev/null
+++ b/docs/source/getting_started/quickstart.rst
@@ -0,0 +1,110 @@
+
+**********
+Quickstart
+**********
+
+This page shows you how to get OCR results from a document in just a few lines of code.
+For more details, see :ref:`using_models`.
+
+
+Load a document
+===============
+
+docTR can read PDFs, images, and web pages:
+
+.. code:: python3
+
+    from doctr.io import DocumentFile
+
+    # From a PDF
+    doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
+    # From one or more images
+    doc = DocumentFile.from_images("path/to/your/img.jpg")
+    doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
+    # From a web page URL (requires the html extra: pip install "python-doctr[html]")
+    doc = DocumentFile.from_url("https://www.example.com")
+
+
+Run OCR
+=======
+
+.. code:: python3
+
+    from doctr.io import DocumentFile
+    from doctr.models import ocr_predictor
+
+    doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
+    model = ocr_predictor(pretrained=True)
+    result = model(doc)
+
+The predictor uses ``fast_base`` for text detection and ``crnn_vgg16_bn`` for text recognition by default.
+You can choose any combination of :ref:`supported architectures <using_models>`.
+
+
+Inspect the output
+==================
+
+The result is a :class:`~doctr.io.Document` object.
+
+Render as plain text::
+
+    print(result.render())
+
+Export as a nested dictionary (JSON-serializable)::
+
+    import json
+    print(json.dumps(result.export(), indent=2))
+
+Visualize on screen (requires the ``viz`` extra: ``pip install "python-doctr[viz]"``)::
+
+    result.pages[0].show()
+
+
+Multi-page PDF end-to-end example
+==================================
+
+The following snippet runs OCR on every page of a PDF, prints the plain text of each page, and saves the structured output to a JSON file:
+
+.. code:: python3
+
+    import json
+    from doctr.io import DocumentFile
+    from doctr.models import ocr_predictor
+
+    model = ocr_predictor(pretrained=True)
+    doc = DocumentFile.from_pdf("path/to/multi_page.pdf")
+    result = model(doc)
+
+    # Plain text: one string per page
+    for page_idx, page in enumerate(result.pages):
+        print(f"--- Page {page_idx + 1} ---")
+        print(page.render())
+
+    # Structured output: a JSON-serializable dict
+    output = result.export()
+    with open("ocr_output.json", "w") as f:
+        json.dump(output, f, indent=2)
+
+
+Common pitfalls
+===============
+
+.. note::
+
+   * **Visualization** requires the ``viz`` extra (installs ``matplotlib`` and ``mplcursors``):
+     ``pip install "python-doctr[viz]"``.  Calls to ``result.show()`` or
+     ``result.pages[0].show()`` raise an ``ImportError`` without it.
+   * **HTML input** requires the ``html`` extra: ``pip install "python-doctr[html]"``.
+   * **Image format**: the predictor expects NumPy ``uint8`` arrays shaped ``(H, W, C)`` in
+     RGB order, which is what ``DocumentFile`` returns.  Grayscale arrays must be converted to 3-channel before use.
+   * **Pretrained weights** are downloaded on first use and cached locally, so subsequent runs load them from the cache.
+   * **PDF pages are returned as images**: ``DocumentFile.from_pdf`` returns one
+     NumPy array per page, so ``result.pages[i]`` corresponds to PDF page ``i + 1`` (indices start at 0).
+
+
+Next steps
+==========
+
+* :doc:`../using_doctr/using_models` - full predictor guide, architecture benchmarks, GPU usage.
+* :doc:`../using_doctr/custom_models_training` - train and load your own models.
+* :doc:`../using_doctr/sharing_models` - share your trained models on the Hugging Face Hub.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 76ca44eed6..1cfdb298b6 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -18,6 +18,7 @@ Main Features
 -------------
 
 * |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
+* |:page_facing_up:| Layout analysis predictor for detecting document regions (tables, figures, headers, …)
 * |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
 * |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
 * |:zap:| Optimized for inference speed on both CPU & GPU
@@ -32,6 +33,7 @@ Main Features
    :hidden:
 
    getting_started/installing
+   getting_started/quickstart
    notebooks
 
 
diff --git a/docs/source/modules/io.rst b/docs/source/modules/io.rst
index 7ac74025b0..56d88e014d 100644
--- a/docs/source/modules/io.rst
+++ b/docs/source/modules/io.rst
@@ -20,6 +20,12 @@ A Word is an uninterrupted sequence of characters.
 
 .. autoclass:: Word
 
+Prediction
+^^^^^^^^^^
+A Prediction is a Word-like element produced by the KIE predictor: a recognized value with its confidence and geometry, grouped under its predicted class in a :class:`KIEPage`.
+
+.. autoclass:: Prediction
+
 Line
 ^^^^
 A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
@@ -33,6 +39,13 @@ An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, l
 
 .. autoclass:: Artefact
 
+LayoutElement
+^^^^^^^^^^^^^
+
+A LayoutElement is a region predicted by a layout detection model (e.g. Title, Text, Table, Page-header, Page-footer). Layout regions are attached to a :class:`Page` when the ``ocr_predictor`` / ``kie_predictor`` is run with ``detect_layout=True``.
+
+.. autoclass:: LayoutElement
+
 Block
 ^^^^^
 A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
@@ -49,6 +62,17 @@ A Page is a collection of Blocks that were on the same physical page.
    .. automethod:: show
 
 
+KIEPage
+^^^^^^^
+
+A KIEPage is the page-level element of the :py:meth:`kie_predictor <doctr.models.kie_predictor>` output. It groups predictions by
+semantic class rather than by spatial layout.
+
+.. autoclass:: KIEPage
+
+   .. automethod:: show
+
+
 Document
 ^^^^^^^^
 
@@ -59,6 +83,17 @@ A Document is a collection of Pages.
    .. automethod:: show
 
 
+KIEDocument
+^^^^^^^^^^^
+
+A KIEDocument is a collection of :class:`KIEPage` elements, returned by the
+:py:meth:`kie_predictor <doctr.models.kie_predictor>`.
+
+.. autoclass:: KIEDocument
+
+   .. automethod:: show
+
+
 File reading
 ------------
 
diff --git a/docs/source/modules/models.rst b/docs/source/modules/models.rst
index 55ce88a365..f30c7d1dda 100644
--- a/docs/source/modules/models.rst
+++ b/docs/source/modules/models.rst
@@ -83,6 +83,8 @@ doctr.models.layout
 
 .. autofunction:: doctr.models.layout.lw_detr_m
 
+.. autofunction:: doctr.models.layout.layout_predictor
+
 
 doctr.models.recognition
 ------------------------
@@ -124,3 +126,13 @@ doctr.models.factory
 .. autofunction:: doctr.models.factory.from_hub
 
 .. autofunction:: doctr.models.factory.push_to_hf_hub
+
+
+doctr.models.utils
+------------------
+
+.. currentmodule:: doctr.models.utils
+
+.. autofunction:: export_model_to_onnx
+
+.. autofunction:: add_whitelist
diff --git a/docs/source/modules/utils.rst b/docs/source/modules/utils.rst
index a80f663c37..59bb8f7939 100644
--- a/docs/source/modules/utils.rst
+++ b/docs/source/modules/utils.rst
@@ -14,6 +14,10 @@ Easy-to-use functions to make sense of your model's predictions.
 
 .. autofunction:: visualize_page
 
+.. autofunction:: visualize_kie_page
+
+.. autofunction:: draw_boxes
+
 Reconstitution
 ---------------
 
@@ -21,6 +25,8 @@ Reconstitution
 
 .. autofunction:: synthesize_page
 
+.. autofunction:: synthesize_kie_page
+
 
 .. _metrics:
 
diff --git a/docs/source/using_doctr/custom_models_training.rst b/docs/source/using_doctr/custom_models_training.rst
index 9b28df0fbb..461ab22acc 100644
--- a/docs/source/using_doctr/custom_models_training.rst
+++ b/docs/source/using_doctr/custom_models_training.rst
@@ -1,3 +1,5 @@
+.. _custom_models_training:
+
 Train your own model
 ====================
 
@@ -54,18 +56,24 @@ Load a custom recognition model trained on another vocabulary as the default one
     predictor = ocr_predictor(det_arch='linknet_resnet18', reco_arch=reco_model, pretrained=True)
 
 
-Load a custom layout analysis model trained on another set of classes as the default one:
+Plug a custom layout analysis model (trained on another set of classes) directly into the OCR pipeline so the detected regions are attached to every page:
 
 .. code:: python3
 
     import torch
-    from doctr.models import layout_predictor, lw_detr_s
-    from doctr.datasets import VOCABS
+    from doctr.models import ocr_predictor, lw_detr_s
 
-    layout_model = lw_detr_s(pretrained=False, class_names=["class_name_1", "class_name_2", ...])
+    # Custom layout model with your own class names
+    layout_model = lw_detr_s(pretrained=False, class_names=["heading", "paragraph", "figure", "table"])
     layout_model.from_pretrained('<path_to_pt>')
 
-    predictor = layout_predictor(layout_arch=layout_model, pretrained=True)
+    # Pass it through `layout_arch`, exactly as for the detection / recognition models
+    predictor = ocr_predictor(pretrained=True, detect_layout=True, layout_arch=layout_model)
+
+    result = predictor(doc)  # `doc` is a document loaded with doctr.io.DocumentFile
+    # The regions (with your custom class names) are available on each page
+    print([(region.type, region.confidence) for region in result.pages[0].layout])
+
 
 Load a custom trained KIE detection model:
 
diff --git a/docs/source/using_doctr/sharing_models.rst b/docs/source/using_doctr/sharing_models.rst
index b2dcbfbc6f..d7206040eb 100644
--- a/docs/source/using_doctr/sharing_models.rst
+++ b/docs/source/using_doctr/sharing_models.rst
@@ -1,40 +1,40 @@
 Share your model with the community
 ===================================
 
-docTR's focus is on open source, so if you also feel in love with than we appreciate sharing your trained model with the community.
-To make it easy for you, we have integrated a interface to the huggingface hub.
+docTR's focus is on open source, so if you feel the same way, we would be glad if you shared your trained model with the community.
+To make it easy for you, we have integrated an interface to the Hugging Face Hub.
 
 .. currentmodule:: doctr.models.factory
 
 
-Loading from Huggingface Hub
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Loading from Hugging Face Hub
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This section shows how you can easily load a pretrained model from the Huggingface Hub.
+This section shows how you can easily load a pretrained model from the Hugging Face Hub.
 
 .. code:: python3
 
     from doctr.io import DocumentFile
     from doctr.models import ocr_predictor, from_hub
     image = DocumentFile.from_images(['data/example.jpg'])
-    # Load a custom detection model from huggingface hub
+    # Load a custom detection model from the Hugging Face Hub
     det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
-    # Load a custom recognition model from huggingface hub
+    # Load a custom recognition model from the Hugging Face Hub
     reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
-    # You can easily plug in this models to the OCR predictor
+    # You can easily plug these models into the OCR predictor
     predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
     result = predictor(image)
 
 
-Pushing to the Huggingface Hub
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Pushing to the Hugging Face Hub
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-You can also push your trained model to the Huggingface Hub.
+You can also push your trained model to the Hugging Face Hub.
 You need only to provide the task type (classification, detection, recognition or obj_detection), a name for your trained model (NOTE:
 existing repositories will not be overwritten) and the model name itself.
 
 - Prerequisites:
-    - Huggingface account (you can easy create one at https://huggingface.co/)
+    - Hugging Face account (you can easily create one at https://huggingface.co/)
     - installed Git LFS (check installation at: https://git-lfs.github.com/) in the repository
 
 .. code:: python3
@@ -68,6 +68,8 @@ We suggest using the following naming conventions for your models:
 
 **Recognition:** ``doctr-<architecture>-<vocab>``
 
+**Layout:** ``doctr-<architecture>``
+
 
 Classification
 --------------
@@ -101,3 +103,13 @@ Recognition
 +---------------------------------+---------------------------------------------------+---------------------+------------------------+
 | parseq                          | rania-sr/doctr-model-v1-arabic                    | arabic              | PyTorch                |
 +---------------------------------+---------------------------------------------------+---------------------+------------------------+
+
+
+Layout
+------
+
++---------------------------------+---------------------------------------------------+------------------------+
+|        **Architecture**         |            **Repo_ID**                            |     **Framework**      |
++=================================+===================================================+========================+
+| lw_detr_s (dummy)               | Felix92/doctr-dummy-torch-lw-detr-s               | PyTorch                |
++---------------------------------+---------------------------------------------------+------------------------+
diff --git a/docs/source/using_doctr/using_model_export.rst b/docs/source/using_doctr/using_model_export.rst
index a3c18fea9c..7cf94accf8 100644
--- a/docs/source/using_doctr/using_model_export.rst
+++ b/docs/source/using_doctr/using_model_export.rst
@@ -76,7 +76,7 @@ Further information can be found in the `PyTorch documentation <https://pytorch.
         mobilenet_v3_small_page_orientation(pretrained=True).eval()
     )
 
-    predictor = models.ocr_predictor(
+    predictor = ocr_predictor(
         detection_model, recognition_model, assume_straight_pages=False
     )
     # NOTE: Only required for non-straight pages (`assume_straight_pages=False`) and non-disabled orientation classification
diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst
index 4f67f73ead..3c589b2799 100644
--- a/docs/source/using_doctr/using_models.rst
+++ b/docs/source/using_doctr/using_models.rst
@@ -1,3 +1,5 @@
+.. _using_models:
+
 Choosing the right model
 ========================
 
@@ -10,6 +12,27 @@ For a given task, docTR provides a Predictor, which is composed of 2 components:
 * Model: a deep learning model, implemented with all supported deep learning backends (PyTorch) along with its specific post-processor to make outputs structured and reusable.
 
 
+Which predictor should I use?
+------------------------------
+
+.. list-table::
+   :widths: 60 40
+   :header-rows: 1
+
+   * - I want to…
+     - Use
+   * - Extract all text (words, lines, layout hierarchy) from a document
+     - :py:meth:`ocr_predictor <doctr.models.ocr_predictor>`
+   * - Detect document regions by type (tables, figures, headers, …)
+     - :py:meth:`layout_predictor <doctr.models.layout_predictor>`
+   * - Get word bounding-boxes only, without recognition
+     - :py:meth:`detection_predictor <doctr.models.detection_predictor>`
+   * - Transcribe pre-cropped word images to strings
+     - :py:meth:`recognition_predictor <doctr.models.recognition_predictor>`
+
+To load your own trained models, see :doc:`custom_models_training`; to share them, see :doc:`sharing_models`.
+
+
 Text Detection
 --------------
 
@@ -17,12 +40,11 @@ The task consists of localizing textual elements in a given image.
 While those text elements can represent many things, in docTR, we will consider uninterrupted character sequences (words). Additionally, the localization can take several forms: from straight bounding boxes (delimited by the 2D coordinates of the top-left and bottom-right corner), to polygons, or binary segmentation (flagging which pixels belong to this element, and which don't).
 Our latest detection models works with rotated and skewed documents!
 
-Available architectures
-^^^^^^^^^^^^^^^^^^^^^^^
+Available detection architectures
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The following architectures are currently supported:
 
-* :py:meth:`linknet_resnet18 <doctr.models.detection.linknet_resnet18>`
 * :py:meth:`linknet_resnet34 <doctr.models.detection.linknet_resnet34>`
 * :py:meth:`linknet_resnet50 <doctr.models.detection.linknet_resnet50>`
 * :py:meth:`db_resnet50 <doctr.models.detection.db_resnet50>`
@@ -70,7 +92,7 @@ Seconds per iteration (with a batch size of 1) is computed after a warmup phase
 Detection predictors
 ^^^^^^^^^^^^^^^^^^^^
 
-:py:meth:`detection_predictor <doctr.models.detection.detection_predictor>` wraps your detection model to make it easily useable with your favorite deep learning framework seamlessly.
+:py:meth:`detection_predictor <doctr.models.detection.detection_predictor>` wraps your detection model together with its pre-processing and post-processing, so that it can be used seamlessly in your pipeline.
 
 .. code:: python3
 
@@ -81,12 +103,11 @@ Detection predictors
     out = model([dummy_img])
 
 You can pass specific boolean arguments to the predictor:
-* `pretrained`: if you want to use a model that has been pretrained on a specific dataset, setting `pretrained=True` this will load the corresponding weights. If `pretrained=False`, which is the default, would otherwise lead to a random initialization and would lead to no/useless results.
-* `assume_straight_pages`: if you work with straight documents only, it will fit straight bounding boxes to the text areas.
-* `preserve_aspect_ratio`: if you want to preserve the aspect ratio of your documents while resizing before sending them to the model.
-* `symmetric_pad`: if you choose to preserve the aspect ratio, it will pad the image symmetrically and not from the bottom-right.
 
-For instance, this snippet will instantiates a detection predictor able to detect text on rotated documents while preserving the aspect ratio:
+* ``pretrained``: set it to ``True`` to load weights pretrained on a specific dataset. With ``pretrained=False`` (the default), the model is randomly initialized and will not produce useful results.
+* ``assume_straight_pages``: if you only work with straight documents, the predictor will fit straight bounding boxes to the text areas.
+* ``preserve_aspect_ratio``: whether to preserve the aspect ratio of your documents when resizing them before they are sent to the model.
+* ``symmetric_pad``: when the aspect ratio is preserved, whether to pad the image symmetrically rather than only on the bottom and right sides.
 
 .. code:: python3
 
@@ -100,8 +121,8 @@ Text Recognition
 The task consists of transcribing the character sequence in a given image.
 
 
-Available architectures
-^^^^^^^^^^^^^^^^^^^^^^^
+Available recognition architectures
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The following architectures are currently supported:
 
@@ -156,14 +177,14 @@ While most of our recognition models were trained on our french vocab (cf. :ref:
     print(predictor.model.cfg['vocab'])
 
 
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
+*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the models' capabilities*
 
 Seconds per iteration (with a batch size of 64) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`.
 
 
 Recognition predictors
 ^^^^^^^^^^^^^^^^^^^^^^
-:py:meth:`recognition_predictor <doctr.models.recognition.recognition_predictor>` wraps your recognition model to make it easily useable with your favorite deep learning framework seamlessly.
+:py:meth:`recognition_predictor <doctr.models.recognition.recognition_predictor>` wraps your recognition model together with its pre-processing and post-processing, so that it can be used seamlessly in your pipeline.
 
 .. code:: python3
 
@@ -181,8 +202,8 @@ The task consists of localizing and classifying visual elements in a given image
 This is a more general task than text detection, as it can be used to detect and classify any type of visual element in a document, such as tables, figures, headers, footers, etc.
 Our latest layout models works with rotated and skewed documents!
 
-Available architectures
-^^^^^^^^^^^^^^^^^^^^^^^
+Available layout architectures
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The following architectures are currently supported:
 
@@ -210,7 +231,7 @@ Seconds per iteration (with a batch size of 1) is computed after a warmup phase
 Layout predictors
 ^^^^^^^^^^^^^^^^^
 
-:py:meth:`layout_predictor <doctr.models.layout.layout_predictor>` wraps your layout model to make it easily useable with your favorite deep learning framework seamlessly.
+:py:meth:`layout_predictor <doctr.models.layout.layout_predictor>` wraps your layout model together with its pre-processing and post-processing, so that it can be used seamlessly in your pipeline.
 
 .. code:: python3
 
@@ -221,12 +242,13 @@ Layout predictors
     out = model([dummy_img])
 
 You can pass specific boolean arguments to the predictor:
-* `pretrained`: if you want to use a model that has been pretrained on a specific dataset, setting `pretrained=True` this will load the corresponding weights. If `pretrained=False`, which is the default, would otherwise lead to a random initialization and would lead to no/useless results.
-* `assume_straight_pages`: if you work with straight documents only, it will fit straight bounding boxes to the text areas.
-* `preserve_aspect_ratio`: if you want to preserve the aspect ratio of your documents while resizing before sending them to the model.
-* `symmetric_pad`: if you choose to preserve the aspect ratio, it will pad the image symmetrically and not from the bottom-right.
 
-For instance, this snippet will instantiates a layout predictor able to detect text on rotated documents while preserving the aspect ratio:
+* ``pretrained``: set it to ``True`` to load weights pretrained on a specific dataset. With ``pretrained=False`` (the default), the model is randomly initialized and will not produce useful results.
+* ``assume_straight_pages``: if you only work with straight documents, the predictor will fit straight bounding boxes to the detected regions.
+* ``preserve_aspect_ratio``: whether to preserve the aspect ratio of your documents when resizing them before they are sent to the model.
+* ``symmetric_pad``: when the aspect ratio is preserved, whether to pad the image symmetrically rather than only on the bottom and right sides.
+
+For instance, this snippet instantiates a layout predictor able to detect layout regions on rotated documents while preserving the aspect ratio:
 
 .. code:: python3
 
@@ -239,8 +261,8 @@ End-to-End OCR
 
 The task consists of both localizing and transcribing textual elements in a given image.
 
-Available architectures
-^^^^^^^^^^^^^^^^^^^^^^^
+Available OCR architectures
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 You can use any combination of detection and recognition models supported by docTR.
 
@@ -280,7 +302,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl
 All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
 Explanations about the metrics being used are available in :ref:`metrics`.
 
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
+*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the models' capabilities*
 
 
 Two-stage approaches
@@ -308,6 +330,10 @@ Additional arguments which can be passed to the `ocr_predictor` are:
 
 * `export_as_straight_boxes`: If you work with rotated and skewed documents but you still want to export straight bounding boxes and not polygons, set it to True.
 * `straighten_pages`: If you want to straighten the pages before sending them to the detection model, set it to True.
+* `detect_orientation`: If you want to estimate the general page orientation and add it to each page, set it to True.
+* `detect_language`: If you want to predict the language of the text on each page, set it to True.
+* `detect_layout`: If you want to run a layout detection model on each page and attach the detected regions to each page, set it to True (default: False).
+* `layout_arch`: The layout architecture name (e.g. ``'lw_detr_s'``, ``'lw_detr_m'``) or your own (fine-tuned) layout model instance to use when ``detect_layout=True``.
 
 For instance, this snippet instantiates an end-to-end ocr_predictor working with rotated documents, which preserves the aspect ratio of the documents, and returns polygons:
 
@@ -319,7 +345,7 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with
 
 Additionally, you can change the batch size of the underlying detection and recognition predictors to optimize the performance depending on your hardware:
 
-* `det_bs`: batch size for the detection model (default: 2)
+* `det_bs`: batch size for the detection model (default: 2); also used for the layout model when ``detect_layout=True``
 * `reco_bs`: batch size for the recognition model (default: 128)
 
 .. code:: python3
@@ -341,6 +367,34 @@ For example to disable the automatic grouping of lines into blocks:
     model = ocr_predictor(pretrained=True, resolve_blocks=False)
 
 
+Detecting the document layout
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In addition to running the :py:meth:`layout_predictor <doctr.models.layout.layout_predictor>` standalone, you can plug a layout detection model directly into the end-to-end pipeline by setting ``detect_layout=True``. The detected regions (e.g. Title, Text, Table, Page-header, Page-footer) are attached to every :class:`Page <doctr.io.Page>` and can be accessed through ``page.layout``, exported alongside the rest of the page, and rendered with :py:meth:`show <doctr.io.Page.show>`.
+
+.. code:: python3
+
+    from doctr.io import DocumentFile
+    from doctr.models import ocr_predictor
+
+    model = ocr_predictor(pretrained=True, detect_layout=True)
+    doc = DocumentFile.from_images("path/to/your/doc.jpg")
+    result = model(doc)
+
+    # Access the detected layout regions of the first page
+    for region in result.pages[0].layout:
+        print(region.type, region.confidence, region.geometry)
+
+    # The layout is part of the exported representation
+    export = result.pages[0].export()
+    print(export["layout"])
+
+    # Overlay both text and layout regions (use display_layout=False to hide the regions)
+    result.pages[0].show()
+
+The same ``detect_layout`` / ``layout_arch`` arguments are available for the :py:meth:`kie_predictor <doctr.models.kie_predictor>`.
+
+
 Running the predictors on GPU
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -369,6 +423,7 @@ The same approach applies to all standalone predictors:
 * `detection_predictor`
 * `crop_orientation_predictor`
 * `page_orientation_predictor`
+* `layout_predictor`
 
 Just create the predictor instance and move it to the appropriate device.
 To enable **half-precision inference**, you can append `.half()` after moving the predictor to the device.
@@ -378,6 +433,7 @@ What should I do with the output?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The ocr_predictor returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
+When the predictor is created with ``detect_layout=True``, each `Page` additionally carries a list of `LayoutElement` regions under ``page.layout``.
 To get a better understanding of our document model, check our :ref:`document_structure` section
 
 Here is a typical `Document` layout::
@@ -617,3 +673,4 @@ learned confusions, or a ``{forbidden_char: allowed_char}`` dict to override spe
     handle = add_whitelist(predictor, VOCABS["latin"], strategy="nearest")
     out = predictor(input_page)
     handle.remove()
+
diff --git a/doctr/io/elements.py b/doctr/io/elements.py
index c9cad3a12a..33506de9c2 100644
--- a/doctr/io/elements.py
+++ b/doctr/io/elements.py
@@ -22,7 +22,7 @@
 except ModuleNotFoundError:
     pass
 
-__all__ = ["Element", "Word", "Artefact", "Line", "Prediction", "Block", "Page", "KIEPage", "Document", "LayoutElement"]
+__all__ = ["Element", "Word", "Artefact", "Line", "Prediction", "Block", "Page", "KIEPage", "KIEDocument", "Document", "LayoutElement"]
 
 
 class Element(NestedObject):
diff --git a/doctr/models/classification/magc_resnet/pytorch.py b/doctr/models/classification/magc_resnet/pytorch.py
index 2f79467a83..0272973aba 100644
--- a/doctr/models/classification/magc_resnet/pytorch.py
+++ b/doctr/models/classification/magc_resnet/pytorch.py
@@ -152,7 +152,7 @@ def magc_resnet31(pretrained: bool = False, **kwargs: Any) -> ResNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the ResNet architecture
 
     Returns:
diff --git a/doctr/models/classification/mobilenet/pytorch.py b/doctr/models/classification/mobilenet/pytorch.py
index 801d3b6fed..a6b78c249e 100644
--- a/doctr/models/classification/mobilenet/pytorch.py
+++ b/doctr/models/classification/mobilenet/pytorch.py
@@ -137,7 +137,7 @@ def mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) -> mobilenetv3.M
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the MobileNetV3 architecture
 
     Returns:
@@ -160,7 +160,7 @@ def mobilenet_v3_small_r(pretrained: bool = False, **kwargs: Any) -> mobilenetv3
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the MobileNetV3 architecture
 
     Returns:
@@ -187,7 +187,7 @@ def mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) -> mobilenetv3.M
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the MobileNetV3 architecture
 
     Returns:
@@ -213,7 +213,7 @@ def mobilenet_v3_large_r(pretrained: bool = False, **kwargs: Any) -> mobilenetv3
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the MobileNetV3 architecture
 
     Returns:
@@ -240,7 +240,7 @@ def mobilenet_v3_small_crop_orientation(pretrained: bool = False, **kwargs: Any)
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the MobileNetV3 architecture
 
     Returns:
@@ -266,7 +266,7 @@ def mobilenet_v3_small_page_orientation(pretrained: bool = False, **kwargs: Any)
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the MobileNetV3 architecture
 
     Returns:
diff --git a/doctr/models/classification/resnet/pytorch.py b/doctr/models/classification/resnet/pytorch.py
index 426b2d3d05..d072963e15 100644
--- a/doctr/models/classification/resnet/pytorch.py
+++ b/doctr/models/classification/resnet/pytorch.py
@@ -247,7 +247,7 @@ def resnet18(pretrained: bool = False, **kwargs: Any) -> TVResNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the ResNet architecture
 
     Returns:
@@ -274,7 +274,7 @@ def resnet31(pretrained: bool = False, **kwargs: Any) -> ResNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the ResNet architecture
 
     Returns:
@@ -306,7 +306,7 @@ def resnet34(pretrained: bool = False, **kwargs: Any) -> TVResNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the ResNet architecture
 
     Returns:
@@ -332,7 +332,7 @@ def resnet34_wide(pretrained: bool = False, **kwargs: Any) -> ResNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the ResNet architecture
 
     Returns:
@@ -364,7 +364,7 @@ def resnet50(pretrained: bool = False, **kwargs: Any) -> TVResNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the ResNet architecture
 
     Returns:
diff --git a/doctr/models/classification/textnet/pytorch.py b/doctr/models/classification/textnet/pytorch.py
index 6f90219194..fb38381c1a 100644
--- a/doctr/models/classification/textnet/pytorch.py
+++ b/doctr/models/classification/textnet/pytorch.py
@@ -143,7 +143,7 @@ def textnet_tiny(pretrained: bool = False, **kwargs: Any) -> TextNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the TextNet architecture
 
     Returns:
@@ -190,7 +190,7 @@ def textnet_small(pretrained: bool = False, **kwargs: Any) -> TextNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the TextNet architecture
 
     Returns:
@@ -237,7 +237,7 @@ def textnet_base(pretrained: bool = False, **kwargs: Any) -> TextNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the TextNet architecture
 
     Returns:
diff --git a/doctr/models/classification/vgg/pytorch.py b/doctr/models/classification/vgg/pytorch.py
index 823126ef86..0708dae6b9 100644
--- a/doctr/models/classification/vgg/pytorch.py
+++ b/doctr/models/classification/vgg/pytorch.py
@@ -92,7 +92,7 @@ def vgg16_bn_r(pretrained: bool = False, **kwargs: Any) -> tv_vgg.VGG:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on ImageNet
+        pretrained: If True, returns a model pre-trained on ImageNet
         **kwargs: keyword arguments of the VGG architecture
 
     Returns:
diff --git a/doctr/models/classification/vip/pytorch.py b/doctr/models/classification/vip/pytorch.py
index 907de205bd..dc00166b76 100644
--- a/doctr/models/classification/vip/pytorch.py
+++ b/doctr/models/classification/vip/pytorch.py
@@ -247,7 +247,7 @@ def vip_tiny(pretrained: bool = False, **kwargs: Any) -> VIPNet:
     https://github.com/cxfyxl/VIPTR/blob/main/modules/VIPTRv2.py)
 
     Args:
-        pretrained: whether to load pretrained weights
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: optional arguments
 
     Returns:
@@ -276,7 +276,7 @@ def vip_base(pretrained: bool = False, **kwargs: Any) -> VIPNet:
     https://github.com/cxfyxl/VIPTR/blob/main/modules/VIPTRv2.py)
 
     Args:
-        pretrained: whether to load pretrained weights
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: optional arguments
 
     Returns:
@@ -309,7 +309,7 @@ def _vip(
 
     Args:
         arch: architecture key
-        pretrained: load pretrained weights?
+        pretrained: If True, returns a model pre-trained on our classification dataset
         ignore_keys: layer keys to ignore
         **kwargs: arguments passed to VIPNet
 
diff --git a/doctr/models/classification/vit/pytorch.py b/doctr/models/classification/vit/pytorch.py
index fae95ebd70..a356d673a2 100644
--- a/doctr/models/classification/vit/pytorch.py
+++ b/doctr/models/classification/vit/pytorch.py
@@ -150,7 +150,7 @@ def vit_s(pretrained: bool = False, **kwargs: Any) -> VisionTransformer:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the VisionTransformer architecture
 
     Returns:
@@ -180,7 +180,7 @@ def vit_b(pretrained: bool = False, **kwargs: Any) -> VisionTransformer:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the VisionTransformer architecture
 
     Returns:
diff --git a/doctr/models/classification/vit_det/pytorch.py b/doctr/models/classification/vit_det/pytorch.py
index 0862a05a90..c5c1ad468b 100644
--- a/doctr/models/classification/vit_det/pytorch.py
+++ b/doctr/models/classification/vit_det/pytorch.py
@@ -296,7 +296,7 @@ def vit_det_s(pretrained: bool = False, **kwargs: Any) -> VisionDetectionTransfo
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
         **kwargs: keyword arguments of the VisionDetectionTransformer architecture
 
     Returns:
@@ -328,7 +328,7 @@ def vit_det_m(pretrained: bool = False, **kwargs: Any) -> VisionDetectionTransfo
     >>> out = model(input_tensor)
 
     Args:
-        pretrained: boolean, True if model is pretrained
+        pretrained: If True, returns a model pre-trained on our classification dataset
-        **kwargs: keyword arguments of the VisionTransformer architecture
+        **kwargs: keyword arguments of the VisionDetectionTransformer architecture
 
     Returns:
diff --git a/doctr/models/detection/differentiable_binarization/pytorch.py b/doctr/models/detection/differentiable_binarization/pytorch.py
index 848bb009bd..526c39e8c0 100644
--- a/doctr/models/detection/differentiable_binarization/pytorch.py
+++ b/doctr/models/detection/differentiable_binarization/pytorch.py
@@ -353,7 +353,7 @@ def db_resnet34(pretrained: bool = False, **kwargs: Any) -> DBNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
         **kwargs: keyword arguments of the DBNet architecture
 
     Returns:
@@ -386,7 +386,7 @@ def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
         **kwargs: keyword arguments of the DBNet architecture
 
     Returns:
@@ -419,7 +419,7 @@ def db_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) -> DBNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
         **kwargs: keyword arguments of the DBNet architecture
 
     Returns:
diff --git a/doctr/models/detection/fast/pytorch.py b/doctr/models/detection/fast/pytorch.py
index 15003bf459..9d9b1a75c5 100644
--- a/doctr/models/detection/fast/pytorch.py
+++ b/doctr/models/detection/fast/pytorch.py
@@ -374,7 +374,7 @@ def fast_tiny(pretrained: bool = False, **kwargs: Any) -> FAST:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
-        **kwargs: keyword arguments of the DBNet architecture
+        **kwargs: keyword arguments of the FAST architecture
 
     Returns:
@@ -401,7 +401,7 @@ def fast_small(pretrained: bool = False, **kwargs: Any) -> FAST:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
-        **kwargs: keyword arguments of the DBNet architecture
+        **kwargs: keyword arguments of the FAST architecture
 
     Returns:
@@ -428,7 +428,7 @@ def fast_base(pretrained: bool = False, **kwargs: Any) -> FAST:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
-        **kwargs: keyword arguments of the DBNet architecture
+        **kwargs: keyword arguments of the FAST architecture
 
     Returns:
diff --git a/doctr/models/detection/linknet/pytorch.py b/doctr/models/detection/linknet/pytorch.py
index bf973e07d7..611e73ad44 100644
--- a/doctr/models/detection/linknet/pytorch.py
+++ b/doctr/models/detection/linknet/pytorch.py
@@ -307,7 +307,7 @@ def linknet_resnet18(pretrained: bool = False, **kwargs: Any) -> LinkNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
         **kwargs: keyword arguments of the LinkNet architecture
 
     Returns:
@@ -337,7 +337,7 @@ def linknet_resnet34(pretrained: bool = False, **kwargs: Any) -> LinkNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
         **kwargs: keyword arguments of the LinkNet architecture
 
     Returns:
@@ -367,7 +367,7 @@ def linknet_resnet50(pretrained: bool = False, **kwargs: Any) -> LinkNet:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
         **kwargs: keyword arguments of the LinkNet architecture
 
     Returns:
diff --git a/doctr/models/layout/lw_detr/pytorch.py b/doctr/models/layout/lw_detr/pytorch.py
index 521f6df7cc..42a140f824 100644
--- a/doctr/models/layout/lw_detr/pytorch.py
+++ b/doctr/models/layout/lw_detr/pytorch.py
@@ -789,7 +789,7 @@ def lw_detr_s(pretrained: bool = False, **kwargs: Any) -> LWDETR:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
-        **kwargs: keyword arguments of the LinkNet architecture
+        **kwargs: keyword arguments of the LWDETR architecture
 
     Returns:
@@ -820,7 +820,7 @@ def lw_detr_m(pretrained: bool = False, **kwargs: Any) -> LWDETR:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text detection dataset
+        pretrained: If True, returns a model pre-trained on our text detection dataset
-        **kwargs: keyword arguments of the LinkNet architecture
+        **kwargs: keyword arguments of the LWDETR architecture
 
     Returns:
diff --git a/doctr/models/recognition/crnn/pytorch.py b/doctr/models/recognition/crnn/pytorch.py
index d8299c890f..be2e2030ee 100644
--- a/doctr/models/recognition/crnn/pytorch.py
+++ b/doctr/models/recognition/crnn/pytorch.py
@@ -279,7 +279,7 @@ def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
         **kwargs: keyword arguments of the CRNN architecture
 
     Returns:
@@ -299,7 +299,7 @@ def crnn_mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) -> CRNN:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
         **kwargs: keyword arguments of the CRNN architecture
 
     Returns:
@@ -325,7 +325,7 @@ def crnn_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) -> CRNN:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
         **kwargs: keyword arguments of the CRNN architecture
 
     Returns:
diff --git a/doctr/models/recognition/master/pytorch.py b/doctr/models/recognition/master/pytorch.py
index a8dd64882e..8acfe2d1a5 100644
--- a/doctr/models/recognition/master/pytorch.py
+++ b/doctr/models/recognition/master/pytorch.py
@@ -325,7 +325,7 @@ def master(pretrained: bool = False, **kwargs: Any) -> MASTER:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
-        **kwargs: keywoard arguments passed to the MASTER architecture
+        **kwargs: keyword arguments passed to the MASTER architecture
 
     Returns:
diff --git a/doctr/models/recognition/parseq/pytorch.py b/doctr/models/recognition/parseq/pytorch.py
index ee3619d7cd..b75f8a3561 100644
--- a/doctr/models/recognition/parseq/pytorch.py
+++ b/doctr/models/recognition/parseq/pytorch.py
@@ -482,7 +482,7 @@ def parseq(pretrained: bool = False, **kwargs: Any) -> PARSeq:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
         **kwargs: keyword arguments of the PARSeq architecture
 
     Returns:
diff --git a/doctr/models/recognition/sar/pytorch.py b/doctr/models/recognition/sar/pytorch.py
index 4e7c9c47e6..b3ee6c7a59 100644
--- a/doctr/models/recognition/sar/pytorch.py
+++ b/doctr/models/recognition/sar/pytorch.py
@@ -389,7 +389,7 @@ def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
         **kwargs: keyword arguments of the SAR architecture
 
     Returns:
diff --git a/doctr/models/recognition/utils.py b/doctr/models/recognition/utils.py
index d2c9dbce9d..3cdc85f3aa 100644
--- a/doctr/models/recognition/utils.py
+++ b/doctr/models/recognition/utils.py
@@ -20,12 +20,11 @@ def merge_strings(a: str, b: str, overlap_ratio: float) -> str:
     Returns:
         A merged character sequence.
 
-    Example::
-        >>> from doctr.models.recognition.utils import merge_strings
-        >>> merge_strings('abcd', 'cdefgh', 0.5)
-        'abcdefgh'
-        >>> merge_strings('abcdi', 'cdefgh', 0.5)
-        'abcdefgh'
+    >>> from doctr.models.recognition.utils import merge_strings
+    >>> merge_strings('abcd', 'cdefgh', 0.5)
+    'abcdefgh'
+    >>> merge_strings('abcdi', 'cdefgh', 0.5)
+    'abcdefgh'
     """
     seq_len = min(len(a), len(b))
     if seq_len <= 1:  # One sequence is empty or will be after cropping in next step, return both to keep data
@@ -78,10 +77,9 @@ def merge_multi_strings(seq_list: list[str], overlap_ratio: float, last_overlap_
     Returns:
         A merged character sequence
 
-    Example::
-        >>> from doctr.models.recognition.utils import merge_multi_strings
-        >>> merge_multi_strings(['abc', 'bcdef', 'difghi', 'aijkl'], 0.5, 0.1)
-        'abcdefghijkl'
+    >>> from doctr.models.recognition.utils import merge_multi_strings
+    >>> merge_multi_strings(['abc', 'bcdef', 'difghi', 'aijkl'], 0.5, 0.1)
+    'abcdefghijkl'
     """
     if not seq_list:
         return ""
diff --git a/doctr/models/recognition/viptr/pytorch.py b/doctr/models/recognition/viptr/pytorch.py
index beb0e8567b..279b8ee06c 100644
--- a/doctr/models/recognition/viptr/pytorch.py
+++ b/doctr/models/recognition/viptr/pytorch.py
@@ -261,7 +261,7 @@ def viptr_tiny(pretrained: bool = False, **kwargs: Any) -> VIPTR:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
         **kwargs: keyword arguments of the VIPTR architecture
 
     Returns:
diff --git a/doctr/models/recognition/vitstr/pytorch.py b/doctr/models/recognition/vitstr/pytorch.py
index 80bfebca70..87dc653c22 100644
--- a/doctr/models/recognition/vitstr/pytorch.py
+++ b/doctr/models/recognition/vitstr/pytorch.py
@@ -239,7 +239,7 @@ def vitstr_small(pretrained: bool = False, **kwargs: Any) -> ViTSTR:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
-        kwargs: keyword arguments of the ViTSTR architecture
+        **kwargs: keyword arguments of the ViTSTR architecture
 
     Returns:
@@ -268,7 +268,7 @@ def vitstr_base(pretrained: bool = False, **kwargs: Any) -> ViTSTR:
     >>> out = model(input_tensor)
 
     Args:
-        pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
+        pretrained: If True, returns a model pre-trained on our text recognition dataset
-        kwargs: keyword arguments of the ViTSTR architecture
+        **kwargs: keyword arguments of the ViTSTR architecture
 
     Returns:
diff --git a/doctr/models/recognition/zoo.py b/doctr/models/recognition/zoo.py
index a89c57738a..a28234c949 100644
--- a/doctr/models/recognition/zoo.py
+++ b/doctr/models/recognition/zoo.py
@@ -71,12 +71,11 @@ def recognition_predictor(
 ) -> RecognitionPredictor:
     """Text recognition architecture.
 
-    Example::
-        >>> import numpy as np
-        >>> from doctr.models import recognition_predictor
-        >>> model = recognition_predictor(pretrained=True)
-        >>> input_page = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
-        >>> out = model([input_page])
+    >>> import numpy as np
+    >>> from doctr.models import recognition_predictor
+    >>> model = recognition_predictor(pretrained=True)
+    >>> input_page = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
+    >>> out = model([input_page])
 
     Args:
         arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
diff --git a/doctr/models/zoo.py b/doctr/models/zoo.py
index bfa8026943..188692a77d 100644
--- a/doctr/models/zoo.py
+++ b/doctr/models/zoo.py
@@ -189,8 +189,8 @@ def kie_predictor(
     """End-to-end KIE architecture using one model for localization, and another for text recognition.
 
     >>> import numpy as np
-    >>> from doctr.models import ocr_predictor
-    >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+    >>> from doctr.models import kie_predictor
+    >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
     >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
     >>> out = model([input_page])
 
diff --git a/doctr/utils/metrics.py b/doctr/utils/metrics.py
index 5d01cc8bd1..edc1e2d7eb 100644
--- a/doctr/utils/metrics.py
+++ b/doctr/utils/metrics.py
@@ -558,6 +558,7 @@ def reset(self) -> None:
 
 class ObjectDetectionMetric:
     r"""Implements a COCO-style object detection metric (mAP@[.5:.95]) inspired by the COCO evaluation protocol.
+
     The aggregated metrics are computed as follows:
 
     .. math::
@@ -577,11 +578,12 @@ class ObjectDetectionMetric:
         \sum\limits_{c \in \mathcal{C}} AP_t(c)
 
     where:
-        - :math:`\mathcal{B}` is the set of possible bounding boxes,
-        - :math:`\mathcal{C}` is the set of possible class indices,
-        - :math:`S` are confidence scores associated to predictions,
-        - :math:`\mathcal{T} = \{0.5, 0.55, \dots, 0.95\}` is the set of IoU thresholds,
-        - :math:`AP_t(c)` is the Average Precision for class :math:`c`
+
+    - :math:`\mathcal{B}` is the set of possible bounding boxes,
+    - :math:`\mathcal{C}` is the set of possible class indices,
+    - :math:`S` denotes the confidence scores associated with predictions,
+    - :math:`\mathcal{T} = \{0.5, 0.55, \dots, 0.95\}` is the set of IoU thresholds,
+    - :math:`AP_t(c)` is the Average Precision for class :math:`c`
-        at IoU threshold :math:`t`.
+      at IoU threshold :math:`t`.
 
     For a given class and IoU threshold, predictions from all images are
@@ -589,8 +591,9 @@ class ObjectDetectionMetric:
 
     Each prediction is greedily matched to the unmatched ground-truth box
     with the highest IoU, provided that:
-        - the IoU is greater than or equal to the threshold,
-        - the ground-truth box has not already been matched.
+
+    - the IoU is greater than or equal to the threshold,
+    - the ground-truth box has not already been matched.
 
     True positives and false positives are accumulated to build a
     precision-recall curve.
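The per-class, per-threshold procedure described in the `ObjectDetectionMetric` docstring can be illustrated with a minimal NumPy sketch. It assumes `(xmin, ymin, xmax, ymax)` boxes and computes AP as the area under the precision envelope (all-point interpolation). The helpers `box_iou`, `greedy_tp`, `average_precision` and `class_map` are hypothetical and are not doctr's implementation, whose interpolation scheme may differ. Predictions are visited in descending score order. Each one takes the unmatched ground-truth box with the highest IoU, and it counts as a true positive only if that IoU reaches the threshold.

```python
import numpy as np


def box_iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (xmin, ymin, xmax, ymax) format."""
    xmin = np.maximum(pred[:, None, 0], gt[None, :, 0])
    ymin = np.maximum(pred[:, None, 1], gt[None, :, 1])
    xmax = np.minimum(pred[:, None, 2], gt[None, :, 2])
    ymax = np.minimum(pred[:, None, 3], gt[None, :, 3])
    inter = np.clip(xmax - xmin, 0, None) * np.clip(ymax - ymin, 0, None)
    area_pred = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_pred[:, None] + area_gt[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def greedy_tp(pred: np.ndarray, scores: np.ndarray, gt: np.ndarray, iou_thresh: float) -> np.ndarray:
    """Flag each prediction, sorted by descending score, as a true positive or not."""
    order = np.argsort(-scores, kind="stable")
    tp = np.zeros(len(pred), dtype=bool)
    if len(gt) == 0:
        return tp  # no ground truth: every prediction is a false positive
    iou = box_iou(pred[order], gt)
    matched = np.zeros(len(gt), dtype=bool)
    for i in range(len(order)):
        # Only ground-truth boxes that are still unmatched are eligible
        candidates = np.where(matched, -1.0, iou[i])
        j = int(np.argmax(candidates))
        if candidates[j] >= iou_thresh:
            matched[j] = True
            tp[i] = True
    return tp


def average_precision(tp: np.ndarray, num_gt: int) -> float:
    """Area under the precision-recall curve (all-point interpolation)."""
    if num_gt == 0 or len(tp) == 0:
        return 0.0
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(~tp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Precision envelope: make precision non-increasing as recall grows
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def class_map(pred: np.ndarray, scores: np.ndarray, gt: np.ndarray) -> float:
    """Mean AP over the IoU thresholds T = {0.5, 0.55, ..., 0.95} for a single class."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(greedy_tp(pred, scores, gt, t), len(gt)) for t in thresholds]))


# Usage: the second prediction duplicates the first, so it becomes a false positive
pred = np.array([[0, 0, 10, 10], [0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
gt = np.array([[0, 0, 10, 10], [20, 20, 30, 31]], dtype=float)
print(greedy_tp(pred, scores, gt, 0.5).tolist())  # [True, False, True]
print(round(class_map(pred, scores, gt), 4))  # 0.8
```

Averaging `class_map` over all classes gives the double average in the mAP formula above. The third prediction only overlaps its ground truth with IoU of about 0.91, so it stops counting at the 0.95 threshold. That lowers the class score from 5/6 to 0.8.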
diff --git a/doctr/utils/visualization.py b/doctr/utils/visualization.py
index bf634fe1d9..27a5a15ded 100644
--- a/doctr/utils/visualization.py
+++ b/doctr/utils/visualization.py
@@ -400,13 +400,14 @@ def visualize_kie_page(
 
 
 def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: tuple[int, int, int] | None = None, **kwargs) -> None:
-    """Draw an array of relative straight boxes on an image
+    """Draw an array of relative straight boxes on an image.
 
     Args:
-        boxes: array of relative boxes, of shape (*, 4)
+        boxes: array of relative boxes, of shape ``(*, 4)``
         image: np array, float32 or uint8
         color: color to use for bounding box edges
         **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+
     """
     h, w = image.shape[:2]
     # Convert boxes to absolute coords
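The first step of `draw_boxes` is to convert the relative boxes to absolute pixel coordinates (the `# Convert boxes to absolute coords` context above). A minimal sketch of that step follows. It assumes relative straight boxes are ordered `(xmin, ymin, xmax, ymax)` and leaves out the matplotlib drawing. `relative_to_absolute` is a hypothetical helper, not part of doctr's API.

```python
import numpy as np


def relative_to_absolute(boxes: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Scale relative (xmin, ymin, xmax, ymax) boxes of shape (*, 4) to integer pixel coordinates."""
    h, w = image.shape[:2]
    abs_boxes = boxes.astype(np.float64)  # astype returns a copy, so `boxes` is left untouched
    abs_boxes[..., [0, 2]] *= w  # x coordinates scale with the image width
    abs_boxes[..., [1, 3]] *= h  # y coordinates scale with the image height
    return abs_boxes.round().astype(int)


image = np.zeros((100, 200, 3), dtype=np.uint8)
print(relative_to_absolute(np.array([[0.1, 0.2, 0.5, 0.6]]), image).tolist())  # [[20, 20, 100, 60]]
```

The `...` index keeps any leading batch dimensions, which matches the `(*, 4)` shape in the docstring.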