AnteaterFind

AnteaterFind is a full-stack web application and indexer written in Python and React. It was written from the ground up and can handle over fifty thousand web pages with a query response time under 300 ms.

Features

Memory light:
- The indexer never loads the entire index into memory at once, making it well suited to large sets of files.
- The search module reads tokens from the index one at a time, so it also never holds the entire index in memory.
Fast query response time:
- The indexer keeps a secondary index of each token's position in the main index, allowing O(1) token lookup at search time.
- When retrieving documents, the search module processes tokens in an optimized order to avoid retrieving unnecessary results for the boolean AND query.
Elegant web frontend:
- The lightweight React frontend delegates search tasks to the Python backend through HTTP GET requests, allowing for a simple design.
- Most search functions are animated, improving the user experience.
ChatGPT summaries:
- Given an OpenAI API key, the search module uses ChatGPT to summarize retrieved web pages.
- ChatGPT summarization occurs after query retrieval, so it does not slow down the search results.
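
The token-position index and the AND-query ordering described under "Fast query response time" can be illustrated with a minimal sketch. This is not AnteaterFind's actual code: the one-line-per-token file layout, the in-memory table of (byte offset, document count) pairs, and the function names build_index, read_postings, and boolean_and are all assumptions made for this example.

```python
import io

def build_index(postings, index_file):
    """Write one 'token<TAB>doc ids' line per token (hypothetical layout).

    Returns a table mapping token -> (byte offset, document count).
    """
    table = {}
    for token in sorted(postings):
        doc_ids = sorted(postings[token])
        table[token] = (index_file.tell(), len(doc_ids))
        line = f"{token}\t{','.join(str(d) for d in doc_ids)}\n"
        index_file.write(line.encode("utf-8"))
    return table

def read_postings(token, table, index_file):
    """Seek straight to one token's line and parse only that line."""
    if token not in table:
        return set()
    offset, _ = table[token]
    index_file.seek(offset)
    line = index_file.readline().decode("utf-8").rstrip("\n")
    _, doc_ids = line.split("\t")
    return {int(d) for d in doc_ids.split(",")}

def boolean_and(tokens, table, index_file):
    """Intersect posting lists, rarest token first, stopping once empty."""
    # A token missing from the table means no document can match.
    if not tokens or any(t not in table for t in tokens):
        return set()
    # Sort by document count so the smallest list bounds the result size.
    ordered = sorted(set(tokens), key=lambda t: table[t][1])
    result = read_postings(ordered[0], table, index_file)
    for token in ordered[1:]:
        if not result:
            break  # remaining posting lists are never read
        result &= read_postings(token, table, index_file)
    return result

# Small in-memory demonstration; a real index would live in a file on disk.
postings = {"anteater": {1, 2, 3, 5}, "find": {2, 3}, "search": {3, 4, 5}}
index_file = io.BytesIO()
table = build_index(postings, index_file)
print(sorted(boolean_and(["anteater", "search", "find"], table, index_file)))  # [3]
```

Only the small offset table stays in memory. Each lookup costs one dictionary access and one seek, and starting with the rarest token keeps the intermediate result as small as possible.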

Installation and Usage

Prerequisites

Python 3.9 or higher
pip package manager
npm package manager

Install Dependencies

To install the Python dependencies for this project, clone the repository and install the requirements:

# Clone the repository
git clone <...link to anteaterfind repo>
cd AnteaterFind

# Install the Python dependencies
pip install -r requirements.txt

To install the React dependencies, change to the search-frontend directory and run npm install:

# Change to the frontend directory
cd search-frontend

# Install npm dependencies
npm install

Indexing

Run the indexer from your command line:

# Change to the repository's base directory if you are not already there
# Replace the path with the path to your zip containing the documents to index
python start_index.py path/to/documents.zip

# To run the indexer with simhash, which eliminates similar documents, use:
python start_index.py path/to/documents.zip -s
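
For readers unfamiliar with simhash, the following is a minimal sketch of the general technique, not the code behind the -s flag. The 64-bit fingerprint size, the use of MD5 as the token hash, the 0.9 similarity threshold, and the function names are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width (assumed for this sketch)

def simhash(tokens):
    """Compute a 64-bit simhash fingerprint from a list of tokens."""
    weights = [0] * BITS
    for token, count in Counter(tokens).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits
        for i in range(BITS):
            # Each token votes +count or -count on every bit position.
            if (token_hash >> i) & 1:
                weights[i] += count
            else:
                weights[i] -= count
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")

def is_near_duplicate(a, b, threshold=0.9):
    """Treat two fingerprints as similar if enough of their bits agree."""
    similarity = 1 - hamming_distance(a, b) / BITS
    return similarity >= threshold

page_a = "the anteater searches the index for tokens".split()
page_b = "tokens for the index the anteater searches".split()
print(is_near_duplicate(simhash(page_a), simhash(page_b)))  # True: same tokens, same fingerprint
```

Because the fingerprint depends only on token counts, pages with mostly the same words produce fingerprints that differ in few bits, which lets an indexer skip near-duplicates without comparing full page text.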

Search

The Search component needs two separate terminals: one for the backend and one for the frontend. In the first terminal, start the backend:

# Run the backend with no ChatGPT summaries
python search_server.py

# Optional: run the backend with ChatGPT summaries
# Replace placeholder parameters
python search_server.py path/to/documents.zip OPENAI-API-KEY

Leave this terminal running and open a second terminal to run the frontend:

# Start the npm server
cd search-frontend
npm start

Open a web browser and navigate to localhost to view and use the search engine.

Attribution

The logo, an anteater with a magnifying glass over it, was generated by DALL-E.
