This project is an AI-powered chatbot that answers natural language questions about wage data using LangChain, Ollama, and ChromaDB. It can also display relevant visualizations based on user queries.
The goal is to let users interactively ask questions about wage-related data (e.g., average salary by education, gender gap, job class distributions) and receive context-aware responses backed by large language models and vector similarity search.
- LangChain: For chaining prompts and building the LLM-based pipeline
- Ollama: For running local LLMs (question answering + embeddings)
- LLMs used:
  - `llama3.2` for generating answers
  - `mxbai-embed-large` for embeddings
- ChromaDB: For storing and retrieving semantically similar wage data
- Matplotlib: For plotting distributions and comparisons
- Pandas: For data manipulation
- `main.py` – Handles chat input, retrieval, response generation, and plot calls
- `calculator.py` – Loads CSV data, embeds documents, and retrieves relevant rows using ChromaDB
- `draw_plot.py` – Contains functions for plotting wage distributions and averages
- `wage_data.csv` – Your wage dataset (must be placed in the same directory)
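
The retrieval step described for `calculator.py` can be sketched roughly as follows. This is a minimal illustration of similarity-based row retrieval, not the project's actual code: it swaps the `mxbai-embed-large` embeddings and ChromaDB for a toy word-count embedding and an in-memory list so that it runs without any services. The function names (`embed`, `cosine`, `retrieve`) and the sample rows are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


# Hypothetical rows, rendered as text the way a CSV row might be before embedding.
rows = [
    "education: college grad, sex: female, jobclass: information, wage: 118.9",
    "education: hs grad, sex: male, jobclass: industrial, wage: 75.0",
    "education: advanced degree, sex: male, jobclass: information, wage: 154.7",
]

top = retrieve("wage for hs grad industrial jobs", rows, k=1)
print(top[0])
```

In the real pipeline the retrieved rows would be passed to the LLM as context; a learned embedding model also matches on meaning rather than on shared words, which this toy version cannot do.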
- Install dependencies: `pip install -r requirements.txt`
- Start Ollama and pull the required models:
  - `ollama pull llama3.2`
  - `ollama pull mxbai-embed-large`
- Make sure your CSV file is named `wage_data.csv` and is in the project directory.
- Run the chatbot: `python main.py`

Example questions:

- "What is the average wage by education?"
- "Is there a gender wage gap?"
- "Show me the wage distribution"
- "Plot average wage by job class"
- "Give me a bar chart by gender"
- "Draw the pie chart of wage by sex"
- The model uses semantic search to retrieve relevant rows before answering.
- Basic keyword detection is used to trigger relevant plots (e.g., education, gender, distribution).
- You can extend the logic with more advanced LLM-driven plot generation.
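
The keyword detection mentioned above might look something like this. It is a sketch under assumptions: the keyword table, the plot names, and the `detect_plot` function are all hypothetical and are not taken from `main.py`.

```python
from typing import Optional

# Hypothetical keyword table: plot name -> trigger phrases.
PLOT_KEYWORDS = {
    "education": ("education", "degree"),
    "gender": ("gender", "sex"),
    "jobclass": ("job class", "jobclass"),
    "distribution": ("distribution", "histogram"),
}


def detect_plot(query: str) -> Optional[str]:
    """Return the first plot whose trigger phrase appears in the query, else None."""
    text = query.lower()
    for plot, keywords in PLOT_KEYWORDS.items():
        if any(phrase in text for phrase in keywords):
            return plot
    return None


print(detect_plot("Plot average wage by job class"))
print(detect_plot("Is there a gender wage gap?"))
```

Substring matching like this is crude: the order of the table decides which plot wins when a query mentions several topics, and a query with no known keyword triggers no plot at all. That limitation is one reason to consider the LLM-driven approach.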
Zahra
Feel free to fork, extend, or contribute!