Banyan Ingest

banyan_ingest is a Python module that prepares documents for use in GenAI and LLM applications.

Rather than reinventing the wheel, banyan_ingest builds on existing state-of-the-art tools to provide this capability.

Installation

In a Python environment (conda, venv, etc.), run the following:

cd PATH_TO_REPO/
pip install .

Poppler must also be installed on your system.
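As a quick sanity check before running anything, the sketch below looks for Poppler's command-line tools on your PATH. It is not part of banyan_ingest: the helper name missing_poppler_tools is hypothetical, and the assumption is that finding the standard Poppler executables pdftoppm and pdfinfo is a reasonable proxy for "Poppler is installed".

```python
import shutil

# Executables shipped with Poppler. Which of these banyan_ingest actually
# relies on is an assumption; adjust the tuple to match your setup.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Poppler tools not found on PATH:", ", ".join(missing))
    else:
        print("Poppler appears to be installed.")
```

If any tool is reported missing, install Poppler with your system's package manager (or conda) and run the check again.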

Supported Tools and File Formats

Currently we provide support for marker (link here) and NVIDIA's nemotron-parse models (link here). To install the necessary dependencies for these tools, use pip install .[marker] or pip install .[nemotronparse], respectively.

Note: Please follow the usage guidelines and licenses of these tools.
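After installing an optional extra, it can help to confirm that the packages it pulls in are visible to your interpreter. The following is a minimal sketch using only the standard library. The helper is_importable is hypothetical, and the import name "marker" for the marker extra is an assumption; substitute whatever top-level modules your chosen extra actually installs.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


# "marker" is an assumed import name for the marker extra; edit as needed.
for name in ("banyan_ingest", "marker"):
    status = "available" if is_importable(name) else "not installed"
    print(f"{name}: {status}")
```

Using find_spec avoids actually importing heavy OCR dependencies just to check that they are present.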

Examples

The example_XXX.py scripts show basic processing of PDF documents using different OCR tools under the hood.

