Merged
17 changes: 16 additions & 1 deletion docs/source/getting_started/installing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@ Install the last stable release of the package using `pip <https://pip.pypa.io/e

pip install python-doctr


We strive towards reducing framework-specific dependencies to a minimum, but some necessary features are developed by third-parties for specific frameworks. To avoid missing some dependencies for a specific framework, you can install specific builds as follows:

.. code:: bash
Expand All @@ -24,6 +23,12 @@ We strive towards reducing framework-specific dependencies to a minimum, but som
# or with preinstalled packages for visualization & html & contrib module support
pip install "python-doctr[viz,html,contrib]"

Available optional extras:

* ``viz``: installs ``matplotlib`` and ``mplcursors`` for result visualisation (e.g. ``Page.show()``)
* ``html``: installs ``weasyprint`` for reading HTML documents via :func:`~doctr.io.read_html`
* ``contrib``: installs ``onnxruntime`` for the :class:`~doctr.contrib.ArtefactDetector` contrib module
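
Before calling a feature that depends on an extra, it can help to check whether the packages behind it are importable. The sketch below uses only the standard library. The mapping of extras to module names follows the list above; the helper name ``installed_extras`` is made up for illustration and is not part of docTR.

```python
from importlib.util import find_spec

# Modules behind each optional extra, as listed above.
EXTRA_MODULES = {
    "viz": ["matplotlib", "mplcursors"],
    "html": ["weasyprint"],
    "contrib": ["onnxruntime"],
}


def installed_extras(extra_modules=EXTRA_MODULES):
    """Map each extra to True when all of its modules can be found."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    # Illustrative helper, not a docTR function.
    for extra, available in installed_extras().items():
        print(f"{extra}: {'installed' if available else 'missing'}")
```

An extra reported as missing can be added later with ``pip install "python-doctr[<extra>]"``.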


Via Git
=======
Expand All @@ -35,3 +40,13 @@ Install the library in developer mode:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.


Via Docker
==========

Official Docker images are available on the `GitHub Container Registry <https://github.com/mindee/doctr/pkgs/container/doctr>`_.

.. code:: bash

docker run -it ghcr.io/mindee/doctr:latest bash
110 changes: 110 additions & 0 deletions docs/source/getting_started/quickstart.rst
@@ -0,0 +1,110 @@

**********
Quickstart
**********

This page shows you how to get OCR results from a document in just a few lines of code.
For more details see :ref:`using_models`.


Load a document
===============

docTR can read PDFs, images, and web pages:

.. code:: python3

from doctr.io import DocumentFile

# From a PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# From one or more images
doc = DocumentFile.from_images("path/to/your/img.jpg")
doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
# From a URL (requires the "html" extra: pip install "python-doctr[html]")
doc = DocumentFile.from_url("https://www.example.com")


Run OCR
=======

.. code:: python3

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
model = ocr_predictor(pretrained=True)
result = model(doc)

The predictor uses ``db_resnet50`` for text detection and ``crnn_vgg16_bn`` for text recognition by default.
You can choose any combination of :ref:`supported architectures <using_models>`.


Inspect the output
==================

The result is a :class:`~doctr.io.Document` object.

Render as plain text::

print(result.render())

Export as a nested dictionary (JSON-serialisable)::

import json
print(json.dumps(result.export(), indent=2))

Visualize on screen (requires the ``viz`` extra: ``pip install "python-doctr[viz]"``)::

result.pages[0].show()


Multi-page PDF end-to-end example
==================================

The following snippet processes every page of a PDF and collects the plain-text output:

.. code:: python3

import json
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_pdf("path/to/multi_page.pdf")
result = model(doc)

# Plain text: one string per page
for page_idx, page in enumerate(result.pages):
    print(f"--- Page {page_idx + 1} ---")
    print(page.render())

# Structured output: JSON-serialisable dict
output = result.export()
with open("ocr_output.json", "w") as f:
    json.dump(output, f, indent=2)
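
The exported dictionary can also be walked directly. The sketch below flattens it into ``(page_idx, value, confidence)`` tuples. It assumes the nested ``pages`` / ``blocks`` / ``lines`` / ``words`` layout with ``value`` and ``confidence`` keys on each word, and it runs on a small hand-written stand-in rather than real OCR output. ``iter_words`` is an illustrative helper, not part of docTR.

```python
def iter_words(exported):
    """Yield (page_idx, value, confidence) for every word in an export dict."""
    for page_idx, page in enumerate(exported.get("pages", [])):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield page_idx, word["value"], word["confidence"]


# Hand-written stand-in for `result.export()` (illustrative values only).
sample = {
    "pages": [
        {"blocks": [{"lines": [{"words": [
            {"value": "Hello", "confidence": 0.99},
            {"value": "world", "confidence": 0.42},
        ]}]}]},
        {"blocks": []},
    ]
}

# Keep only the words whose confidence clears an arbitrary 0.5 threshold.
confident = [value for _, value, conf in iter_words(sample) if conf >= 0.5]
print(confident)  # ['Hello']
```

With real output, ``iter_words(result.export())`` would replace the stand-in, provided the export follows the assumed layout.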


Common pitfalls
===============

.. note::

* **Visualization** requires the ``viz`` extra (installs ``matplotlib`` and ``mplcursors``):
``pip install "python-doctr[viz]"``. Calls to ``result.show()`` or
``result.pages[0].show()`` raise a ``ModuleNotFoundError`` without it.
* **HTML input** requires the ``html`` extra: ``pip install "python-doctr[html]"``.
* **Image format**: pass file paths or NumPy ``uint8`` arrays shaped ``(H, W, C)`` in
RGB order. Grayscale arrays must be converted to 3-channel before use.
* **Pretrained weights** are downloaded on first use and cached locally. Subsequent calls load them from the cache and skip the download.
* **PDF pages are returned as images**: ``DocumentFile.from_pdf`` returns one
NumPy array per page, so ``result.pages[i]`` corresponds to the *i*-th PDF page.
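
The image-format pitfall is the easiest one to hit when arrays come from another library. The sketch below converts a grayscale array to the 3-channel ``uint8`` layout described above. It assumes NumPy is installed and that pixel values are already in the 0-255 range; ``to_rgb_uint8`` is an illustrative helper, not a docTR function.

```python
import numpy as np


def to_rgb_uint8(image):
    """Return an (H, W, 3) uint8 array from a grayscale or RGB array."""
    image = np.asarray(image)
    if image.ndim == 2:  # (H, W) grayscale
        image = image[..., np.newaxis]
    if image.shape[-1] == 1:  # (H, W, 1) grayscale
        image = np.repeat(image, 3, axis=-1)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"expected (H, W, 3) after conversion, got {image.shape}")
    # Assumes values are already in 0-255; no rescaling is done here.
    return image.astype(np.uint8)


gray = np.zeros((32, 64), dtype=np.uint8)
rgb = to_rgb_uint8(gray)
print(rgb.shape, rgb.dtype)  # (32, 64, 3) uint8
```

Repeating the single channel three times keeps the image visually identical while matching the expected shape.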


Next steps
==========

* :doc:`../using_doctr/using_models` - full predictor guide, architecture benchmarks, GPU usage.
* :doc:`../using_doctr/custom_models_training` - train and load your own models.
* :doc:`../using_doctr/sharing_models` - share your trained models on Hugging Face Hub.
2 changes: 2 additions & 0 deletions docs/source/index.rst
Expand Up @@ -18,6 +18,7 @@ Main Features
-------------

* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:page_facing_up:| Layout analysis predictor for detecting document regions (tables, figures, headers, …)
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
* |:zap:| Optimized for inference speed on both CPU & GPU
Expand All @@ -32,6 +33,7 @@ Main Features
:hidden:

getting_started/installing
getting_started/quickstart
notebooks


35 changes: 35 additions & 0 deletions docs/source/modules/io.rst
Expand Up @@ -20,6 +20,12 @@ A Word is an uninterrupted sequence of characters.

.. autoclass:: Word

Prediction
^^^^^^^^^^
A Prediction is a Word with an additional crop orientation field indicating the detected text rotation angle.

.. autoclass:: Prediction

Line
^^^^
A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
Expand All @@ -33,6 +39,13 @@ An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, l

.. autoclass:: Artefact

LayoutElement
^^^^^^^^^^^^^

A LayoutElement is a region predicted by a layout detection model (e.g. Title, Text, Table, Page-header, Page-footer). Layout regions are attached to a :class:`Page` when the ``ocr_predictor`` / ``kie_predictor`` is run with ``detect_layout=True``.

.. autoclass:: LayoutElement

Block
^^^^^
A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
Expand All @@ -49,6 +62,17 @@ A Page is a collection of Blocks that were on the same physical page.
.. automethod:: show


KIEPage
^^^^^^^

A KIEPage is returned by the :py:func:`kie_predictor <doctr.models.kie_predictor>`. It groups predictions by
semantic class rather than by spatial layout.

.. autoclass:: KIEPage

.. automethod:: show


Document
^^^^^^^^

Expand All @@ -59,6 +83,17 @@ A Document is a collection of Pages.
.. automethod:: show


KIEDocument
^^^^^^^^^^^

A KIEDocument is a collection of :class:`KIEPage` elements, returned by the
:py:func:`kie_predictor <doctr.models.kie_predictor>`.

.. autoclass:: KIEDocument

.. automethod:: show


File reading
------------

12 changes: 12 additions & 0 deletions docs/source/modules/models.rst
Expand Up @@ -83,6 +83,8 @@ doctr.models.layout

.. autofunction:: doctr.models.layout.lw_detr_m

.. autofunction:: doctr.models.layout.layout_predictor


doctr.models.recognition
------------------------
Expand Down Expand Up @@ -124,3 +126,13 @@ doctr.models.factory
.. autofunction:: doctr.models.factory.from_hub

.. autofunction:: doctr.models.factory.push_to_hf_hub


doctr.models.utils
------------------

.. currentmodule:: doctr.models.utils

.. autofunction:: export_model_to_onnx

.. autofunction:: add_whitelist
6 changes: 6 additions & 0 deletions docs/source/modules/utils.rst
Expand Up @@ -14,13 +14,19 @@ Easy-to-use functions to make sense of your model's predictions.

.. autofunction:: visualize_page

.. autofunction:: visualize_kie_page

.. autofunction:: draw_boxes

Reconstitution
---------------

.. currentmodule:: doctr.utils.reconstitution

.. autofunction:: synthesize_page

.. autofunction:: synthesize_kie_page


.. _metrics:

18 changes: 13 additions & 5 deletions docs/source/using_doctr/custom_models_training.rst
@@ -1,3 +1,5 @@
.. _custom_models_training:

Train your own model
====================

Expand Down Expand Up @@ -54,18 +56,24 @@ Load a custom recognition model trained on another vocabulary as the default one
predictor = ocr_predictor(det_arch='linknet_resnet18', reco_arch=reco_model, pretrained=True)


Load a custom layout analysis model trained on another set of classes as the default one:
Plug a custom layout analysis model (trained on another set of classes) directly into the OCR pipeline so the detected regions are attached to every page:

.. code:: python3

import torch
from doctr.models import layout_predictor, lw_detr_s
from doctr.datasets import VOCABS
from doctr.models import ocr_predictor, lw_detr_s

layout_model = lw_detr_s(pretrained=False, class_names=["class_name_1", "class_name_2", ...])
# Custom layout model with your own class names
layout_model = lw_detr_s(pretrained=False, class_names=["heading", "paragraph", "figure", "table"])
layout_model.from_pretrained('<path_to_pt>')

predictor = layout_predictor(layout_arch=layout_model, pretrained=True)
# Pass it through `layout_arch`, exactly as for the detection / recognition models
predictor = ocr_predictor(pretrained=True, detect_layout=True, layout_arch=layout_model)

result = predictor(doc)
# The regions (with your custom class names) are available on each page
print([(region.type, region.confidence) for region in result.pages[0].layout])
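
Once regions are attached to a page, a common follow-up is to drop low-confidence regions and group the rest by class. The sketch below does this on stand-in objects that expose only the ``type`` and ``confidence`` attributes used above. ``Region``, ``group_regions`` and the 0.5 threshold are illustrative assumptions, not docTR API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Region:
    """Stand-in for a layout region; only the two attributes used above."""
    type: str
    confidence: float


def group_regions(regions, min_confidence=0.5):
    """Group regions by class name, skipping those below the threshold."""
    grouped = defaultdict(list)
    for region in regions:
        if region.confidence >= min_confidence:
            grouped[region.type].append(region)
    return dict(grouped)


regions = [
    Region("heading", 0.91),
    Region("paragraph", 0.87),
    Region("paragraph", 0.32),  # dropped: below the threshold
    Region("table", 0.66),
]
counts = {name: len(items) for name, items in group_regions(regions).items()}
print(counts)  # {'heading': 1, 'paragraph': 1, 'table': 1}
```

With a real result, ``result.pages[0].layout`` would take the place of the hand-written ``regions`` list.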


Load a custom trained KIE detection model:

36 changes: 24 additions & 12 deletions docs/source/using_doctr/sharing_models.rst
@@ -1,40 +1,40 @@
Share your model with the community
===================================

docTR's focus is on open source, so if you also feel in love with than we appreciate sharing your trained model with the community.
To make it easy for you, we have integrated a interface to the huggingface hub.
docTR's focus is on open source, and if you feel the same way, we appreciate you sharing your trained model with the community.
To make it easy for you, we have integrated an interface to the Hugging Face Hub.

.. currentmodule:: doctr.models.factory


Loading from Huggingface Hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading from Hugging Face Hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section shows how you can easily load a pretrained model from the Huggingface Hub.
This section shows how you can easily load a pretrained model from the Hugging Face Hub.

.. code:: python3

from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
# Load a custom detection model from the Hugging Face Hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
# Load a custom recognition model from the Hugging Face Hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug in this models to the OCR predictor
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)


Pushing to the Huggingface Hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pushing to the Hugging Face Hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also push your trained model to the Huggingface Hub.
You can also push your trained model to the Hugging Face Hub.
You need only to provide the task type (classification, detection, recognition or obj_detection), a name for your trained model (NOTE:
existing repositories will not be overwritten) and the model name itself.

- Prerequisites:
- Huggingface account (you can easy create one at https://huggingface.co/)
- Hugging Face account (you can easily create one at https://huggingface.co/)
- installed Git LFS (check installation at: https://git-lfs.github.com/) in the repository

.. code:: python3
Expand Down Expand Up @@ -68,6 +68,8 @@ We suggest using the following naming conventions for your models:

**Recognition:** ``doctr-<architecture>-<vocab>``

**Layout:** ``doctr-<architecture>``


Classification
--------------
Expand Down Expand Up @@ -101,3 +103,13 @@ Recognition
+---------------------------------+---------------------------------------------------+---------------------+------------------------+
| parseq | rania-sr/doctr-model-v1-arabic | arabic | PyTorch |
+---------------------------------+---------------------------------------------------+---------------------+------------------------+


Layout
------

+---------------------------------+---------------------------------------------------+------------------------+
| **Architecture** | **Repo_ID** | **Framework** |
+=================================+===================================================+========================+
| lw_detr_s (dummy) | Felix92/doctr-dummy-torch-lw-detr-s | PyTorch |
+---------------------------------+---------------------------------------------------+------------------------+
2 changes: 1 addition & 1 deletion docs/source/using_doctr/using_model_export.rst
Expand Up @@ -76,7 +76,7 @@ Further information can be found in the `PyTorch documentation <https://pytorch.
mobilenet_v3_small_page_orientation(pretrained=True).eval()
)

predictor = models.ocr_predictor(
predictor = ocr_predictor(
detection_model, recognition_model, assume_straight_pages=False
)
# NOTE: Only required for non-straight pages (`assume_straight_pages=False`) and non-disabled orientation classification