diff --git a/docs/source/getting_started/installing.rst b/docs/source/getting_started/installing.rst index 17637625b1..8552a2434e 100644 --- a/docs/source/getting_started/installing.rst +++ b/docs/source/getting_started/installing.rst @@ -15,7 +15,6 @@ Install the last stable release of the package using `pip `_. + +.. code:: bash + + docker run -it ghcr.io/mindee/doctr:latest bash diff --git a/docs/source/getting_started/quickstart.rst b/docs/source/getting_started/quickstart.rst new file mode 100644 index 0000000000..94818a2415 --- /dev/null +++ b/docs/source/getting_started/quickstart.rst @@ -0,0 +1,110 @@ + +********** +Quickstart +********** + +This page shows you how to get OCR results from a document in just a few lines of code. +For more details see :ref:`using_models`. + + +Load a document +=============== + +docTR can read PDFs, images, and web pages: + +.. code:: python3 + + from doctr.io import DocumentFile + + # From a PDF + doc = DocumentFile.from_pdf("path/to/your/doc.pdf") + # From one or more images + doc = DocumentFile.from_images("path/to/your/img.jpg") + doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"]) + # From a URL (requires the ``html`` extra: pip install "python-doctr[html]") + doc = DocumentFile.from_url("https://www.example.com") + + +Run OCR +======= + +.. code:: python3 + + from doctr.io import DocumentFile + from doctr.models import ocr_predictor + + doc = DocumentFile.from_pdf("path/to/your/doc.pdf") + model = ocr_predictor(pretrained=True) + result = model(doc) + +The predictor uses ``db_resnet50`` for text detection and ``crnn_vgg16_bn`` for text recognition by default. +You can choose any combination of :ref:`supported architectures `. + + +Inspect the output +================== + +The result is a :class:`~doctr.io.Document` object. 
+ +Render as plain text:: + + print(result.render()) + +Export as a nested dictionary (JSON-serialisable):: + + import json + print(json.dumps(result.export(), indent=2)) + +Visualize on screen (requires the ``viz`` extra: ``pip install "python-doctr[viz]"``):: + + result.pages[0].show() + + +Multi-page PDF end-to-end example +================================== + +The following snippet processes every page of a PDF, prints the plain-text output, and writes the structured export to a JSON file: + +.. code:: python3 + + import json + from doctr.io import DocumentFile + from doctr.models import ocr_predictor + + model = ocr_predictor(pretrained=True) + doc = DocumentFile.from_pdf("path/to/multi_page.pdf") + result = model(doc) + + # Plain text: one string per page + for page_idx, page in enumerate(result.pages): + print(f"--- Page {page_idx + 1} ---") + print(page.render()) + + # Structured output: JSON-serialisable dict + output = result.export() + with open("ocr_output.json", "w") as f: + json.dump(output, f, indent=2) + + +Common pitfalls +=============== + +.. note:: + + * **Visualization** requires the ``viz`` extra (installs ``matplotlib`` and ``mplcursors``): + ``pip install "python-doctr[viz]"``. Calls to ``result.show()`` or + ``result.pages[0].show()`` raise a ``ModuleNotFoundError`` without it. + * **HTML input** requires the ``html`` extra: ``pip install "python-doctr[html]"``. + * **Image format**: pass file paths or NumPy ``uint8`` arrays shaped ``(H, W, C)`` in + RGB order. Grayscale arrays must be converted to 3-channel before use. + * **Pretrained weights** are downloaded on first use and cached locally. Subsequent calls reuse the cached weights and skip the download. + * **PDF pages are returned as images**: ``DocumentFile.from_pdf`` returns one + NumPy array per page, so ``result.pages[i]`` corresponds to the *i*-th PDF page. + + +Next steps +========== + +* :doc:`../using_doctr/using_models` - full predictor guide, architecture benchmarks, GPU usage. 
+* :doc:`../using_doctr/custom_models_training` - train and load your own models. +* :doc:`../using_doctr/sharing_models` - share your trained models on Hugging Face Hub. diff --git a/docs/source/index.rst b/docs/source/index.rst index 76ca44eed6..1cfdb298b6 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -18,6 +18,7 @@ Main Features ------------- * |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters +* |:page_facing_up:| Layout analysis predictor for detecting document regions (tables, figures, headers, …) * |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor * |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract * |:zap:| Optimized for inference speed on both CPU & GPU @@ -32,6 +33,7 @@ Main Features :hidden: getting_started/installing + getting_started/quickstart notebooks diff --git a/docs/source/modules/io.rst b/docs/source/modules/io.rst index 7ac74025b0..56d88e014d 100644 --- a/docs/source/modules/io.rst +++ b/docs/source/modules/io.rst @@ -20,6 +20,12 @@ A Word is an uninterrupted sequence of characters. .. autoclass:: Word +Prediction +^^^^^^^^^^ +A Prediction is a Word with an additional crop orientation field indicating the detected text rotation angle. + +.. autoclass:: Prediction + Line ^^^^ A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines). @@ -33,6 +39,13 @@ An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, l .. autoclass:: Artefact +LayoutElement +^^^^^^^^^^^^^ + +A LayoutElement is a region predicted by a layout detection model (e.g. Title, Text, Table, Page-header, Page-footer). Layout regions are attached to a :class:`Page` when the ``ocr_predictor`` / ``kie_predictor`` is run with ``detect_layout=True``. + +.. 
autoclass:: LayoutElement + Block ^^^^^ A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath). @@ -49,6 +62,17 @@ A Page is a collection of Blocks that were on the same physical page. .. automethod:: show +KIEPage +^^^^^^^ + +A KIEPage is returned by the :py:meth:`kie_predictor `. It groups predictions by +semantic class rather than by spatial layout. + +.. autoclass:: KIEPage + + .. automethod:: show + + Document ^^^^^^^^ @@ -59,6 +83,17 @@ A Document is a collection of Pages. .. automethod:: show +KIEDocument +^^^^^^^^^^^ + +A KIEDocument is a collection of :class:`KIEPage` elements, returned by the +:py:meth:`kie_predictor `. + +.. autoclass:: KIEDocument + + .. automethod:: show + + File reading ------------ diff --git a/docs/source/modules/models.rst b/docs/source/modules/models.rst index 55ce88a365..f30c7d1dda 100644 --- a/docs/source/modules/models.rst +++ b/docs/source/modules/models.rst @@ -83,6 +83,8 @@ doctr.models.layout .. autofunction:: doctr.models.layout.lw_detr_m +.. autofunction:: doctr.models.layout.layout_predictor + doctr.models.recognition ------------------------ @@ -124,3 +126,13 @@ doctr.models.factory .. autofunction:: doctr.models.factory.from_hub .. autofunction:: doctr.models.factory.push_to_hf_hub + + +doctr.models.utils +------------------ + +.. currentmodule:: doctr.models.utils + +.. autofunction:: export_model_to_onnx + +.. autofunction:: add_whitelist diff --git a/docs/source/modules/utils.rst b/docs/source/modules/utils.rst index a80f663c37..59bb8f7939 100644 --- a/docs/source/modules/utils.rst +++ b/docs/source/modules/utils.rst @@ -14,6 +14,10 @@ Easy-to-use functions to make sense of your model's predictions. .. autofunction:: visualize_page +.. autofunction:: visualize_kie_page + +.. autofunction:: draw_boxes + Reconstitution --------------- @@ -21,6 +25,8 @@ Reconstitution .. autofunction:: synthesize_page +.. autofunction:: synthesize_kie_page + .. 
_metrics: diff --git a/docs/source/using_doctr/custom_models_training.rst b/docs/source/using_doctr/custom_models_training.rst index 9b28df0fbb..461ab22acc 100644 --- a/docs/source/using_doctr/custom_models_training.rst +++ b/docs/source/using_doctr/custom_models_training.rst @@ -1,3 +1,5 @@ +.. _custom_models_training: + Train your own model ==================== @@ -54,18 +56,24 @@ Load a custom recognition model trained on another vocabulary as the default one predictor = ocr_predictor(det_arch='linknet_resnet18', reco_arch=reco_model, pretrained=True) -Load a custom layout analysis model trained on another set of classes as the default one: +Plug a custom layout analysis model (trained on another set of classes) directly into the OCR pipeline so the detected regions are attached to every page: .. code:: python3 import torch - from doctr.models import layout_predictor, lw_detr_s - from doctr.datasets import VOCABS + from doctr.models import ocr_predictor, lw_detr_s - layout_model = lw_detr_s(pretrained=False, class_names=["class_name_1", "class_name_2", ...]) + # Custom layout model with your own class names + layout_model = lw_detr_s(pretrained=False, class_names=["heading", "paragraph", "figure", "table"]) layout_model.from_pretrained('') - predictor = layout_predictor(layout_arch=layout_model, pretrained=True) + # Pass it through `layout_arch`, exactly as for the detection / recognition models + predictor = ocr_predictor(pretrained=True, detect_layout=True, layout_arch=layout_model) + + result = predictor(doc) + # The regions (with your custom class names) are available on each page + print([(region.type, region.confidence) for region in result.pages[0].layout]) + Load a custom trained KIE detection model: diff --git a/docs/source/using_doctr/sharing_models.rst b/docs/source/using_doctr/sharing_models.rst index b2dcbfbc6f..d7206040eb 100644 --- a/docs/source/using_doctr/sharing_models.rst +++ b/docs/source/using_doctr/sharing_models.rst @@ -1,40 +1,40 @@ Share 
your model with the community =================================== -docTR's focus is on open source, so if you also feel in love with than we appreciate sharing your trained model with the community. -To make it easy for you, we have integrated a interface to the huggingface hub. +docTR's focus is on open source, and if you feel the same way, we appreciate you sharing your trained model with the community. +To make it easy for you, we have integrated an interface to the Hugging Face Hub. .. currentmodule:: doctr.models.factory -Loading from Huggingface Hub -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Loading from Hugging Face Hub +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -This section shows how you can easily load a pretrained model from the Huggingface Hub. +This section shows how you can easily load a pretrained model from the Hugging Face Hub. .. code:: python3 from doctr.io import DocumentFile from doctr.models import ocr_predictor, from_hub image = DocumentFile.from_images(['data/example.jpg']) - # Load a custom detection model from huggingface hub + # Load a custom detection model from the Hugging Face Hub det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large') - # Load a custom recognition model from huggingface hub + # Load a custom recognition model from the Hugging Face Hub reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french') - # You can easily plug in this models to the OCR predictor + # You can easily plug these models in to the OCR predictor predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model) result = predictor(image) -Pushing to the Huggingface Hub -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Pushing to the Hugging Face Hub +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -You can also push your trained model to the Huggingface Hub. +You can also push your trained model to the Hugging Face Hub. 
You need only to provide the task type (classification, detection, recognition or obj_detection), a name for your trained model (NOTE: existing repositories will not be overwritten) and the model name itself. - Prerequisites: - - Huggingface account (you can easy create one at https://huggingface.co/) + - Hugging Face account (you can easily create one at https://huggingface.co/) - installed Git LFS (check installation at: https://git-lfs.github.com/) in the repository .. code:: python3 @@ -68,6 +68,8 @@ We suggest using the following naming conventions for your models: **Recognition:** ``doctr--`` +**Layout:** ``doctr-`` + Classification -------------- @@ -101,3 +103,13 @@ Recognition +---------------------------------+---------------------------------------------------+---------------------+------------------------+ | parseq | rania-sr/doctr-model-v1-arabic | arabic | PyTorch | +---------------------------------+---------------------------------------------------+---------------------+------------------------+ + + +Layout +------ + ++---------------------------------+---------------------------------------------------+------------------------+ +| **Architecture** | **Repo_ID** | **Framework** | ++=================================+===================================================+========================+ +| lw_detr_s (dummy) | Felix92/doctr-dummy-torch-lw-detr-s | PyTorch | ++---------------------------------+---------------------------------------------------+------------------------+ diff --git a/docs/source/using_doctr/using_model_export.rst b/docs/source/using_doctr/using_model_export.rst index a3c18fea9c..7cf94accf8 100644 --- a/docs/source/using_doctr/using_model_export.rst +++ b/docs/source/using_doctr/using_model_export.rst @@ -76,7 +76,7 @@ Further information can be found in the `PyTorch documentation ` + * - Detect document regions by type (tables, figures, headers, …) + - :py:meth:`layout_predictor ` + * - Get word bounding-boxes only, without 
recognition + - :py:meth:`detection_predictor ` + * - Transcribe pre-cropped word images to strings + - :py:meth:`recognition_predictor ` + +For :doc:`custom model loading ` or sharing models, see the dedicated pages. + + Text Detection -------------- @@ -17,12 +40,11 @@ The task consists of localizing textual elements in a given image. While those text elements can represent many things, in docTR, we will consider uninterrupted character sequences (words). Additionally, the localization can take several forms: from straight bounding boxes (delimited by the 2D coordinates of the top-left and bottom-right corner), to polygons, or binary segmentation (flagging which pixels belong to this element, and which don't). Our latest detection models works with rotated and skewed documents! -Available architectures -^^^^^^^^^^^^^^^^^^^^^^^ +Available detection architectures +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following architectures are currently supported: -* :py:meth:`linknet_resnet18 ` * :py:meth:`linknet_resnet34 ` * :py:meth:`linknet_resnet50 ` * :py:meth:`db_resnet50 ` @@ -70,7 +92,7 @@ Seconds per iteration (with a batch size of 1) is computed after a warmup phase Detection predictors ^^^^^^^^^^^^^^^^^^^^ -:py:meth:`detection_predictor ` wraps your detection model to make it easily useable with your favorite deep learning framework seamlessly. +:py:meth:`detection_predictor ` wraps your detection model to make it easily usable with your favorite deep learning framework seamlessly. .. code:: python3 @@ -81,12 +103,11 @@ Detection predictors out = model([dummy_img]) You can pass specific boolean arguments to the predictor: -* `pretrained`: if you want to use a model that has been pretrained on a specific dataset, setting `pretrained=True` this will load the corresponding weights. If `pretrained=False`, which is the default, would otherwise lead to a random initialization and would lead to no/useless results. 
-* `assume_straight_pages`: if you work with straight documents only, it will fit straight bounding boxes to the text areas. -* `preserve_aspect_ratio`: if you want to preserve the aspect ratio of your documents while resizing before sending them to the model. -* `symmetric_pad`: if you choose to preserve the aspect ratio, it will pad the image symmetrically and not from the bottom-right. -For instance, this snippet will instantiates a detection predictor able to detect text on rotated documents while preserving the aspect ratio: +* ``pretrained``: if you want to use a model that has been pretrained on a specific dataset, setting ``pretrained=True`` will load the corresponding weights. If ``pretrained=False`` (the default), the model is randomly initialized and will produce no useful results. +* ``assume_straight_pages``: if you work with straight documents only, it will fit straight bounding boxes to the text areas. +* ``preserve_aspect_ratio``: if you want to preserve the aspect ratio of your documents while resizing before sending them to the model. +* ``symmetric_pad``: if you choose to preserve the aspect ratio, it will pad the image symmetrically and not from the bottom-right. .. code:: python3 @@ -100,8 +121,8 @@ Text Recognition The task consists of transcribing the character sequence in a given image. -Available architectures -^^^^^^^^^^^^^^^^^^^^^^^ +Available recognition architectures +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following architectures are currently supported: @@ -156,14 +177,14 @@ While most of our recognition models were trained on our french vocab (cf. 
:ref: print(predictor.model.cfg['vocab']) -*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities* +*Disclaimer: both FUNSD subsets combined have 30595 word-level crops which might not be representative enough of the model capabilities* Seconds per iteration (with a batch size of 64) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. Recognition predictors ^^^^^^^^^^^^^^^^^^^^^^ -:py:meth:`recognition_predictor ` wraps your recognition model to make it easily useable with your favorite deep learning framework seamlessly. +:py:meth:`recognition_predictor ` wraps your recognition model to make it easily usable with your favorite deep learning framework seamlessly. .. code:: python3 @@ -181,8 +202,8 @@ The task consists of localizing and classifying visual elements in a given image This is a more general task than text detection, as it can be used to detect and classify any type of visual element in a document, such as tables, figures, headers, footers, etc. Our latest layout models works with rotated and skewed documents! -Available architectures -^^^^^^^^^^^^^^^^^^^^^^^ +Available layout architectures +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following architectures are currently supported: @@ -210,7 +231,7 @@ Seconds per iteration (with a batch size of 1) is computed after a warmup phase Layout predictors ^^^^^^^^^^^^^^^^^ -:py:meth:`layout_predictor ` wraps your layout model to make it easily useable with your favorite deep learning framework seamlessly. +:py:meth:`layout_predictor ` wraps your layout model to make it easily usable with your favorite deep learning framework seamlessly. .. 
code:: python3 @@ -221,12 +242,13 @@ Layout predictors out = model([dummy_img]) You can pass specific boolean arguments to the predictor: -* `pretrained`: if you want to use a model that has been pretrained on a specific dataset, setting `pretrained=True` this will load the corresponding weights. If `pretrained=False`, which is the default, would otherwise lead to a random initialization and would lead to no/useless results. -* `assume_straight_pages`: if you work with straight documents only, it will fit straight bounding boxes to the text areas. -* `preserve_aspect_ratio`: if you want to preserve the aspect ratio of your documents while resizing before sending them to the model. -* `symmetric_pad`: if you choose to preserve the aspect ratio, it will pad the image symmetrically and not from the bottom-right. -For instance, this snippet will instantiates a layout predictor able to detect text on rotated documents while preserving the aspect ratio: +* ``pretrained``: if you want to use a model that has been pretrained on a specific dataset, setting ``pretrained=True`` will load the corresponding weights. If ``pretrained=False`` (the default), the model is randomly initialized and will produce no useful results. +* ``assume_straight_pages``: if you work with straight documents only, it will fit straight bounding boxes to the detected regions. +* ``preserve_aspect_ratio``: if you want to preserve the aspect ratio of your documents while resizing before sending them to the model. +* ``symmetric_pad``: if you choose to preserve the aspect ratio, it will pad the image symmetrically and not from the bottom-right. + +For instance, this snippet instantiates a layout predictor able to detect layout regions on rotated documents while preserving the aspect ratio: .. code:: python3 @@ -239,8 +261,8 @@ End-to-End OCR The task consists of both localizing and transcribing textual elements in a given image. 
-Available architectures -^^^^^^^^^^^^^^^^^^^^^^^ +Available OCR architectures +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can use any combination of detection and recognition models supported by docTR. @@ -280,7 +302,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). Explanations about the metrics being used are available in :ref:`metrics`. -*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities* +*Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities* Two-stage approaches @@ -308,6 +330,10 @@ Additional arguments which can be passed to the `ocr_predictor` are: * `export_as_straight_boxes`: If you work with rotated and skewed documents but you still want to export straight bounding boxes and not polygons, set it to True. * `straighten_pages`: If you want to straighten the pages before sending them to the detection model, set it to True. +* `detect_orientation`: If you want to estimate the general page orientation and add it to each page, set it to True. +* `detect_language`: If you want to predict the language of the text on each page, set it to True. +* `detect_layout`: If you want to run a layout detection model on each page and attach the detected regions to each page, set it to True (default: False). +* `layout_arch`: The layout architecture name (e.g. ``'lw_detr_s'``, ``'lw_detr_m'``) or your own (fine-tuned) layout model instance to use when ``detect_layout=True``. 
For instance, this snippet instantiates an end-to-end ocr_predictor working with rotated documents, which preserves the aspect ratio of the documents, and returns polygons: @@ -319,7 +345,7 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with Additionally, you can change the batch size of the underlying detection and recognition predictors to optimize the performance depending on your hardware: -* `det_bs`: batch size for the detection model (default: 2) +* `det_bs`: batch size for the detection model (default: 2) - will also be used for the layout model if ``detect_layout=True`` * `reco_bs`: batch size for the recognition model (default: 128) .. code:: python3 @@ -341,6 +367,34 @@ For example to disable the automatic grouping of lines into blocks: model = ocr_predictor(pretrained=True, resolve_blocks=False) +Detecting the document layout +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In addition to running the :py:meth:`layout_predictor ` standalone, you can plug a layout detection model directly into the end-to-end pipeline by setting ``detect_layout=True``. The detected regions (e.g. Title, Text, Table, Page-header, Page-footer) are attached to every :class:`Page ` and can be accessed through ``page.layout``, exported alongside the rest of the page, and rendered with :py:meth:`show `. + +.. 
code:: python3 + + from doctr.io import DocumentFile + from doctr.models import ocr_predictor + + model = ocr_predictor(pretrained=True, detect_layout=True) + doc = DocumentFile.from_images("path/to/your/doc.jpg") + result = model(doc) + + # Access the detected layout regions of the first page + for region in result.pages[0].layout: + print(region.type, region.confidence, region.geometry) + + # The layout is part of the exported representation + export = result.pages[0].export() + print(export["layout"]) + + # Overlay both text and layout regions (use display_layout=False to hide the regions) + result.pages[0].show() + +The same ``detect_layout`` / ``layout_arch`` arguments are available for the :py:meth:`kie_predictor `. + + Running the predictors on GPU ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -369,6 +423,7 @@ The same approach applies to all standalone predictors: * `detection_predictor` * `crop_orientation_predictor` * `page_orientation_predictor` +* `layout_predictor` Just create the predictor instance and move it to the appropriate device. To enable **half-precision inference**, you can append `.half()` after moving the predictor to the device. @@ -378,6 +433,7 @@ What should I do with the output? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ocr_predictor returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`). +When ``detect_layout=True`` was passed, each `Page` additionally carries a list of `LayoutElement` regions under ``page.layout``. 
To get a better understanding of our document model, check our :ref:`document_structure` section Here is a typical `Document` layout:: @@ -617,3 +673,4 @@ learned confusions, or a ``{forbidden_char: allowed_char}`` dict to override spe handle = add_whitelist(predictor, VOCABS["latin"], strategy="nearest") out = predictor(input_page) handle.remove() + diff --git a/doctr/io/elements.py b/doctr/io/elements.py index c9cad3a12a..33506de9c2 100644 --- a/doctr/io/elements.py +++ b/doctr/io/elements.py @@ -22,7 +22,7 @@ except ModuleNotFoundError: pass -__all__ = ["Element", "Word", "Artefact", "Line", "Prediction", "Block", "Page", "KIEPage", "Document", "LayoutElement"] +__all__ = ["Element", "Word", "Artefact", "Line", "Prediction", "Block", "Page", "KIEPage", "KIEDocument", "Document", "LayoutElement"] class Element(NestedObject): diff --git a/doctr/models/classification/magc_resnet/pytorch.py b/doctr/models/classification/magc_resnet/pytorch.py index 2f79467a83..0272973aba 100644 --- a/doctr/models/classification/magc_resnet/pytorch.py +++ b/doctr/models/classification/magc_resnet/pytorch.py @@ -152,7 +152,7 @@ def magc_resnet31(pretrained: bool = False, **kwargs: Any) -> ResNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the ResNet architecture Returns: diff --git a/doctr/models/classification/mobilenet/pytorch.py b/doctr/models/classification/mobilenet/pytorch.py index 801d3b6fed..a6b78c249e 100644 --- a/doctr/models/classification/mobilenet/pytorch.py +++ b/doctr/models/classification/mobilenet/pytorch.py @@ -137,7 +137,7 @@ def mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) -> mobilenetv3.M >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the 
MobileNetV3 architecture Returns: @@ -160,7 +160,7 @@ def mobilenet_v3_small_r(pretrained: bool = False, **kwargs: Any) -> mobilenetv3 >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the MobileNetV3 architecture Returns: @@ -187,7 +187,7 @@ def mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) -> mobilenetv3.M >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the MobileNetV3 architecture Returns: @@ -213,7 +213,7 @@ def mobilenet_v3_large_r(pretrained: bool = False, **kwargs: Any) -> mobilenetv3 >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the MobileNetV3 architecture Returns: @@ -240,7 +240,7 @@ def mobilenet_v3_small_crop_orientation(pretrained: bool = False, **kwargs: Any) >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the MobileNetV3 architecture Returns: @@ -266,7 +266,7 @@ def mobilenet_v3_small_page_orientation(pretrained: bool = False, **kwargs: Any) >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the MobileNetV3 architecture Returns: diff --git a/doctr/models/classification/resnet/pytorch.py b/doctr/models/classification/resnet/pytorch.py index 426b2d3d05..d072963e15 100644 --- a/doctr/models/classification/resnet/pytorch.py +++ b/doctr/models/classification/resnet/pytorch.py @@ -247,7 +247,7 @@ def 
resnet18(pretrained: bool = False, **kwargs: Any) -> TVResNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the ResNet architecture Returns: @@ -274,7 +274,7 @@ def resnet31(pretrained: bool = False, **kwargs: Any) -> ResNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the ResNet architecture Returns: @@ -306,7 +306,7 @@ def resnet34(pretrained: bool = False, **kwargs: Any) -> TVResNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the ResNet architecture Returns: @@ -332,7 +332,7 @@ def resnet34_wide(pretrained: bool = False, **kwargs: Any) -> ResNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the ResNet architecture Returns: @@ -364,7 +364,7 @@ def resnet50(pretrained: bool = False, **kwargs: Any) -> TVResNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the ResNet architecture Returns: diff --git a/doctr/models/classification/textnet/pytorch.py b/doctr/models/classification/textnet/pytorch.py index 6f90219194..fb38381c1a 100644 --- a/doctr/models/classification/textnet/pytorch.py +++ b/doctr/models/classification/textnet/pytorch.py @@ -143,7 +143,7 @@ def textnet_tiny(pretrained: bool = False, **kwargs: Any) -> TextNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained 
+ pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the TextNet architecture Returns: @@ -190,7 +190,7 @@ def textnet_small(pretrained: bool = False, **kwargs: Any) -> TextNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the TextNet architecture Returns: @@ -237,7 +237,7 @@ def textnet_base(pretrained: bool = False, **kwargs: Any) -> TextNet: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the TextNet architecture Returns: diff --git a/doctr/models/classification/vgg/pytorch.py b/doctr/models/classification/vgg/pytorch.py index 823126ef86..0708dae6b9 100644 --- a/doctr/models/classification/vgg/pytorch.py +++ b/doctr/models/classification/vgg/pytorch.py @@ -92,7 +92,7 @@ def vgg16_bn_r(pretrained: bool = False, **kwargs: Any) -> tv_vgg.VGG: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on ImageNet + pretrained: If True, returns a model pre-trained on ImageNet **kwargs: keyword arguments of the VGG architecture Returns: diff --git a/doctr/models/classification/vip/pytorch.py b/doctr/models/classification/vip/pytorch.py index 907de205bd..dc00166b76 100644 --- a/doctr/models/classification/vip/pytorch.py +++ b/doctr/models/classification/vip/pytorch.py @@ -247,7 +247,7 @@ def vip_tiny(pretrained: bool = False, **kwargs: Any) -> VIPNet: https://github.com/cxfyxl/VIPTR/blob/main/modules/VIPTRv2.py) Args: - pretrained: whether to load pretrained weights + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: optional arguments Returns: @@ -276,7 +276,7 @@ def vip_base(pretrained: bool = False, **kwargs: Any) -> VIPNet: 
https://github.com/cxfyxl/VIPTR/blob/main/modules/VIPTRv2.py) Args: - pretrained: whether to load pretrained weights + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: optional arguments Returns: @@ -309,7 +309,7 @@ def _vip( Args: arch: architecture key - pretrained: load pretrained weights? + pretrained: If True, returns a model pre-trained on our classification dataset ignore_keys: layer keys to ignore **kwargs: arguments passed to VIPNet diff --git a/doctr/models/classification/vit/pytorch.py b/doctr/models/classification/vit/pytorch.py index fae95ebd70..a356d673a2 100644 --- a/doctr/models/classification/vit/pytorch.py +++ b/doctr/models/classification/vit/pytorch.py @@ -150,7 +150,7 @@ def vit_s(pretrained: bool = False, **kwargs: Any) -> VisionTransformer: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the VisionTransformer architecture Returns: @@ -180,7 +180,7 @@ def vit_b(pretrained: bool = False, **kwargs: Any) -> VisionTransformer: >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the VisionTransformer architecture Returns: diff --git a/doctr/models/classification/vit_det/pytorch.py b/doctr/models/classification/vit_det/pytorch.py index 0862a05a90..c5c1ad468b 100644 --- a/doctr/models/classification/vit_det/pytorch.py +++ b/doctr/models/classification/vit_det/pytorch.py @@ -296,7 +296,7 @@ def vit_det_s(pretrained: bool = False, **kwargs: Any) -> VisionDetectionTransfo >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the VisionDetectionTransformer architecture Returns: @@ 
-328,7 +328,7 @@ def vit_det_m(pretrained: bool = False, **kwargs: Any) -> VisionDetectionTransfo >>> out = model(input_tensor) Args: - pretrained: boolean, True if model is pretrained + pretrained: If True, returns a model pre-trained on our classification dataset **kwargs: keyword arguments of the VisionTransformer architecture Returns: diff --git a/doctr/models/detection/differentiable_binarization/pytorch.py b/doctr/models/detection/differentiable_binarization/pytorch.py index 848bb009bd..526c39e8c0 100644 --- a/doctr/models/detection/differentiable_binarization/pytorch.py +++ b/doctr/models/detection/differentiable_binarization/pytorch.py @@ -353,7 +353,7 @@ def db_resnet34(pretrained: bool = False, **kwargs: Any) -> DBNet: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the DBNet architecture Returns: @@ -386,7 +386,7 @@ def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the DBNet architecture Returns: @@ -419,7 +419,7 @@ def db_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) -> DBNet: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the DBNet architecture Returns: diff --git a/doctr/models/detection/fast/pytorch.py b/doctr/models/detection/fast/pytorch.py index 15003bf459..9d9b1a75c5 100644 --- a/doctr/models/detection/fast/pytorch.py +++ b/doctr/models/detection/fast/pytorch.py @@ -374,7 +374,7 @@ def fast_tiny(pretrained: bool = 
False, **kwargs: Any) -> FAST: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the DBNet architecture Returns: @@ -401,7 +401,7 @@ def fast_small(pretrained: bool = False, **kwargs: Any) -> FAST: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the DBNet architecture Returns: @@ -428,7 +428,7 @@ def fast_base(pretrained: bool = False, **kwargs: Any) -> FAST: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the DBNet architecture Returns: diff --git a/doctr/models/detection/linknet/pytorch.py b/doctr/models/detection/linknet/pytorch.py index bf973e07d7..611e73ad44 100644 --- a/doctr/models/detection/linknet/pytorch.py +++ b/doctr/models/detection/linknet/pytorch.py @@ -307,7 +307,7 @@ def linknet_resnet18(pretrained: bool = False, **kwargs: Any) -> LinkNet: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the LinkNet architecture Returns: @@ -337,7 +337,7 @@ def linknet_resnet34(pretrained: bool = False, **kwargs: Any) -> LinkNet: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the LinkNet architecture Returns: @@ -367,7 +367,7 @@ def 
linknet_resnet50(pretrained: bool = False, **kwargs: Any) -> LinkNet: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the LinkNet architecture Returns: diff --git a/doctr/models/layout/lw_detr/pytorch.py b/doctr/models/layout/lw_detr/pytorch.py index 521f6df7cc..42a140f824 100644 --- a/doctr/models/layout/lw_detr/pytorch.py +++ b/doctr/models/layout/lw_detr/pytorch.py @@ -789,7 +789,7 @@ def lw_detr_s(pretrained: bool = False, **kwargs: Any) -> LWDETR: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the LinkNet architecture Returns: @@ -820,7 +820,7 @@ def lw_detr_m(pretrained: bool = False, **kwargs: Any) -> LWDETR: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text detection dataset + pretrained: If True, returns a model pre-trained on our text detection dataset **kwargs: keyword arguments of the LinkNet architecture Returns: diff --git a/doctr/models/recognition/crnn/pytorch.py b/doctr/models/recognition/crnn/pytorch.py index d8299c890f..be2e2030ee 100644 --- a/doctr/models/recognition/crnn/pytorch.py +++ b/doctr/models/recognition/crnn/pytorch.py @@ -279,7 +279,7 @@ def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset **kwargs: keyword arguments of the CRNN architecture Returns: @@ -299,7 +299,7 @@ def crnn_mobilenet_v3_small(pretrained: bool = False, **kwargs: Any) -> CRNN: >>> out = model(input_tensor) Args: - 
pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset **kwargs: keyword arguments of the CRNN architecture Returns: @@ -325,7 +325,7 @@ def crnn_mobilenet_v3_large(pretrained: bool = False, **kwargs: Any) -> CRNN: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset **kwargs: keyword arguments of the CRNN architecture Returns: diff --git a/doctr/models/recognition/master/pytorch.py b/doctr/models/recognition/master/pytorch.py index a8dd64882e..8acfe2d1a5 100644 --- a/doctr/models/recognition/master/pytorch.py +++ b/doctr/models/recognition/master/pytorch.py @@ -325,7 +325,7 @@ def master(pretrained: bool = False, **kwargs: Any) -> MASTER: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset **kwargs: keyword arguments passed to the MASTER architecture Returns: diff --git a/doctr/models/recognition/parseq/pytorch.py b/doctr/models/recognition/parseq/pytorch.py index ee3619d7cd..b75f8a3561 100644 --- a/doctr/models/recognition/parseq/pytorch.py +++ b/doctr/models/recognition/parseq/pytorch.py @@ -482,7 +482,7 @@ def parseq(pretrained: bool = False, **kwargs: Any) -> PARSeq: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset **kwargs: keyword arguments of the PARSeq architecture Returns: diff --git a/doctr/models/recognition/sar/pytorch.py b/doctr/models/recognition/sar/pytorch.py index 4e7c9c47e6..b3ee6c7a59 100644 --- a/doctr/models/recognition/sar/pytorch.py +++
b/doctr/models/recognition/sar/pytorch.py @@ -389,7 +389,7 @@ def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset **kwargs: keyword arguments of the SAR architecture Returns: diff --git a/doctr/models/recognition/utils.py b/doctr/models/recognition/utils.py index d2c9dbce9d..3cdc85f3aa 100644 --- a/doctr/models/recognition/utils.py +++ b/doctr/models/recognition/utils.py @@ -20,12 +20,11 @@ def merge_strings(a: str, b: str, overlap_ratio: float) -> str: Returns: A merged character sequence. - Example:: - >>> from doctr.models.recognition.utils import merge_strings - >>> merge_strings('abcd', 'cdefgh', 0.5) - 'abcdefgh' - >>> merge_strings('abcdi', 'cdefgh', 0.5) - 'abcdefgh' + >>> from doctr.models.recognition.utils import merge_strings + >>> merge_strings('abcd', 'cdefgh', 0.5) + 'abcdefgh' + >>> merge_strings('abcdi', 'cdefgh', 0.5) + 'abcdefgh' """ seq_len = min(len(a), len(b)) if seq_len <= 1: # One sequence is empty or will be after cropping in next step, return both to keep data @@ -78,10 +77,9 @@ def merge_multi_strings(seq_list: list[str], overlap_ratio: float, last_overlap_ Returns: A merged character sequence - Example:: - >>> from doctr.models.recognition.utils import merge_multi_strings - >>> merge_multi_strings(['abc', 'bcdef', 'difghi', 'aijkl'], 0.5, 0.1) - 'abcdefghijkl' + >>> from doctr.models.recognition.utils import merge_multi_strings + >>> merge_multi_strings(['abc', 'bcdef', 'difghi', 'aijkl'], 0.5, 0.1) + 'abcdefghijkl' """ if not seq_list: return "" diff --git a/doctr/models/recognition/viptr/pytorch.py b/doctr/models/recognition/viptr/pytorch.py index beb0e8567b..279b8ee06c 100644 --- a/doctr/models/recognition/viptr/pytorch.py +++ b/doctr/models/recognition/viptr/pytorch.py @@ -261,7 +261,7 @@ def viptr_tiny(pretrained: 
bool = False, **kwargs: Any) -> VIPTR: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset **kwargs: keyword arguments of the VIPTR architecture Returns: diff --git a/doctr/models/recognition/vitstr/pytorch.py b/doctr/models/recognition/vitstr/pytorch.py index 80bfebca70..87dc653c22 100644 --- a/doctr/models/recognition/vitstr/pytorch.py +++ b/doctr/models/recognition/vitstr/pytorch.py @@ -239,7 +239,7 @@ def vitstr_small(pretrained: bool = False, **kwargs: Any) -> ViTSTR: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset kwargs: keyword arguments of the ViTSTR architecture Returns: @@ -268,7 +268,7 @@ def vitstr_base(pretrained: bool = False, **kwargs: Any) -> ViTSTR: >>> out = model(input_tensor) Args: - pretrained (bool): If True, returns a model pre-trained on our text recognition dataset + pretrained: If True, returns a model pre-trained on our text recognition dataset kwargs: keyword arguments of the ViTSTR architecture Returns: diff --git a/doctr/models/recognition/zoo.py b/doctr/models/recognition/zoo.py index a89c57738a..a28234c949 100644 --- a/doctr/models/recognition/zoo.py +++ b/doctr/models/recognition/zoo.py @@ -71,12 +71,11 @@ def recognition_predictor( ) -> RecognitionPredictor: """Text recognition architecture. 
- Example:: - >>> import numpy as np - >>> from doctr.models import recognition_predictor - >>> model = recognition_predictor(pretrained=True) - >>> input_page = (255 * np.random.rand(32, 128, 3)).astype(np.uint8) - >>> out = model([input_page]) + >>> import numpy as np + >>> from doctr.models import recognition_predictor + >>> model = recognition_predictor(pretrained=True) + >>> input_page = (255 * np.random.rand(32, 128, 3)).astype(np.uint8) + >>> out = model([input_page]) Args: arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn') diff --git a/doctr/models/zoo.py b/doctr/models/zoo.py index bfa8026943..188692a77d 100644 --- a/doctr/models/zoo.py +++ b/doctr/models/zoo.py @@ -189,8 +189,8 @@ def kie_predictor( """End-to-end KIE architecture using one model for localization, and another for text recognition. >>> import numpy as np - >>> from doctr.models import ocr_predictor - >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True) + >>> from doctr.models import kie_predictor + >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True) >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8) >>> out = model([input_page]) diff --git a/doctr/utils/metrics.py b/doctr/utils/metrics.py index 5d01cc8bd1..edc1e2d7eb 100644 --- a/doctr/utils/metrics.py +++ b/doctr/utils/metrics.py @@ -558,6 +558,7 @@ def reset(self) -> None: class ObjectDetectionMetric: r"""Implements a COCO-style object detection metric (mAP@[.5:.95]) inspired by the COCO evaluation protocol. + The aggregated metrics are computed as follows: .. 
math:: @@ -577,11 +578,12 @@ class ObjectDetectionMetric: \sum\limits_{c \in \mathcal{C}} AP_t(c) where: - - :math:`\mathcal{B}` is the set of possible bounding boxes, - - :math:`\mathcal{C}` is the set of possible class indices, - - :math:`S` are confidence scores associated to predictions, - - :math:`\mathcal{T} = \{0.5, 0.55, \dots, 0.95\}` is the set of IoU thresholds, - - :math:`AP_t(c)` is the Average Precision for class :math:`c` + + - :math:`\mathcal{B}` is the set of possible bounding boxes, + - :math:`\mathcal{C}` is the set of possible class indices, + - :math:`S` are confidence scores associated to predictions, + - :math:`\mathcal{T} = \{0.5, 0.55, \dots, 0.95\}` is the set of IoU thresholds, + - :math:`AP_t(c)` is the Average Precision for class :math:`c` at IoU threshold :math:`t`. For a given class and IoU threshold, predictions from all images are @@ -589,8 +591,9 @@ class ObjectDetectionMetric: Each prediction is greedily matched to the unmatched ground-truth box with the highest IoU, provided that: - - the IoU is greater than or equal to the threshold, - - the ground-truth box has not already been matched. + + - the IoU is greater than or equal to the threshold, + - the ground-truth box has not already been matched. True positives and false positives are accumulated to build a precision-recall curve. diff --git a/doctr/utils/visualization.py b/doctr/utils/visualization.py index bf634fe1d9..27a5a15ded 100644 --- a/doctr/utils/visualization.py +++ b/doctr/utils/visualization.py @@ -400,13 +400,14 @@ def visualize_kie_page( def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: tuple[int, int, int] | None = None, **kwargs) -> None: - """Draw an array of relative straight boxes on an image + """Draw an array of relative straight boxes on an image. 
Args: - boxes: array of relative boxes, of shape (*, 4) + boxes: array of relative boxes, of shape ``(*, 4)`` image: np array, float32 or uint8 color: color to use for bounding box edges **kwargs: keyword arguments from `matplotlib.pyplot.plot` + """ h, w = image.shape[:2] # Convert boxes to absolute coords