Zahra-7696/wage-data-rag-chatbot

💰 Wage Data Chatbot (LangChain + Ollama + ChromaDB)

This project is an AI-powered chatbot that answers natural language questions about wage data using LangChain, Ollama, and ChromaDB. It can also display relevant visualizations based on user queries.


🎯 Aim

To allow users to interactively ask questions about wage-related data (e.g., average salary by education, gender gap, job class distributions) and receive intelligent, context-aware responses backed by large language models and vector similarity search.


🛠 Tech Stack

  • LangChain: For chaining prompts and building the LLM-based pipeline
  • Ollama: For running local LLMs (question answering + embeddings)
  • LLMs Used:
    • llama3.2 for generating answers
    • mxbai-embed-large for embeddings
  • ChromaDB: For storing and retrieving semantically similar wage data
  • Matplotlib: For plotting distributions and comparisons
  • Pandas: For data manipulation
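To make the retrieval step concrete, here is a minimal sketch of similarity search over embedded rows. It is a toy stand-in, not the project's code: the three-dimensional vectors, the sample rows, and the names `cosine_similarity`, `top_k`, and `TOY_STORE` are all invented for illustration. In the real pipeline the vectors would come from mxbai-embed-large and the search would be done by ChromaDB, whose index and distance metric may differ from the cosine similarity used here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k stored rows most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up rows and made-up embeddings (hypothetical, for illustration only).
TOY_STORE = [
    ("education: College Grad, wage: 118.9", [0.9, 0.1, 0.0]),
    ("jobclass: Industrial, wage: 75.0", [0.1, 0.9, 0.1]),
    ("education: HS Grad, wage: 70.5", [0.8, 0.2, 0.1]),
]

# A query vector pointing in the "education" direction of the toy space.
results = top_k([1.0, 0.0, 0.0], TOY_STORE, k=2)
print(results)
```

With these toy vectors, the two education rows rank above the job class row because their vectors point in nearly the same direction as the query.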

📁 File Structure

  • main.py – Handles chat input, retrieval, response generation, and plot calls
  • calculator.py – Loads CSV data, embeds documents, and retrieves relevant rows using ChromaDB
  • draw_plot.py – Contains functions for plotting wage distributions and averages
  • wage_data.csv – Your wage dataset (must be placed in the same directory)
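As a rough illustration of the loading step described for calculator.py, the sketch below turns CSV rows into text documents that could then be embedded. It uses only the standard library and an in-memory sample. The column names (`education`, `sex`, `jobclass`, `wage`), the sample values, and the helpers `row_to_text` and `load_documents` are assumptions made for this example, not the actual contents of wage_data.csv or calculator.py.

```python
import csv
import io

# Hypothetical sample standing in for wage_data.csv; real columns may differ.
SAMPLE_CSV = """education,sex,jobclass,wage
HS Grad,Male,Industrial,70.5
College Grad,Female,Information,118.9
"""


def row_to_text(row):
    """Flatten one CSV row into a 'column: value' string suitable for embedding."""
    return ", ".join(f"{column}: {value}" for column, value in row.items())


def load_documents(csv_file):
    """Read rows from an open CSV file and return one document dict per row."""
    reader = csv.DictReader(csv_file)
    return [
        {"id": str(index), "text": row_to_text(row)}
        for index, row in enumerate(reader)
    ]


documents = load_documents(io.StringIO(SAMPLE_CSV))
for doc in documents:
    print(doc["id"], doc["text"])
```

Giving each row a stable `id` is a common choice because it lets a vector store skip or update rows that were already embedded on a previous run.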

▶️ How to Run

  1. Install dependencies:
pip install -r requirements.txt
  2. Start Ollama and pull the required models:
ollama pull llama3.2
ollama pull mxbai-embed-large
  3. Make sure your CSV file is named wage_data.csv and is in the project directory.
  4. Run the chatbot:
python main.py

🧠 Example Questions

  • "What is the average wage by education?"
  • "Is there a gender wage gap?"
  • "Show me the wage distribution"
  • "Plot average wage by job class"
  • "Give me a bar chart by gender"
  • "Draw the pie chart of wage by sex"

📌 Notes

  • The chatbot uses semantic search to retrieve relevant rows, which the LLM then uses as context for its answer.
  • Basic keyword detection is used to trigger relevant plots (e.g., education, gender, distribution).
  • You can extend the logic with more advanced LLM-driven plot generation.
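The keyword-based plot trigger mentioned in the notes could look roughly like the sketch below. This is an assumption about the general approach, not the code in main.py: the function name `detect_plot`, the `PLOT_KEYWORDS` table, and the specific keywords are hypothetical.

```python
# Hypothetical mapping from plot type to the keywords that trigger it.
PLOT_KEYWORDS = {
    "education": ("education", "degree"),
    "gender": ("gender", "sex"),
    "jobclass": ("job class", "jobclass"),
    "distribution": ("distribution", "histogram"),
}


def detect_plot(question):
    """Return the first plot type whose keyword appears in the question, else None."""
    text = question.lower()
    for plot_name, keywords in PLOT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return plot_name
    return None


for question in (
    "What is the average wage by education?",
    "Draw the pie chart of wage by sex",
    "Show me the wage distribution",
):
    print(question, "->", detect_plot(question))
```

Plain substring matching is simple but brittle: it matches on the first keyword found and cannot tell a bar chart request from a pie chart request, which is where the LLM-driven extension suggested above would help.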

👩‍💻 Maintainer

Zahra
Feel free to fork, extend, or contribute!
