This documentation folder contains resources and instructions related to the development of the RAG-based chatbot plugin for Jenkins.
Below is a brief explanation of the key subdirectories:
chatbot-core/: Core logic of the chatbot.data/: Data-related files and scripts.collection/: Scripts to collect data from Jenkins Docs, Jenkins Plugins, Discourse, StackOverflow.preprocessing/: Scripts to clean, filter, the collected data before chunking.raw/: Output directory for collected data.processed/: Output directory for cleaned and filtered data.
utils/: Contains utils for the chatbot-core directory(e.g. logger).rag/: Core logic of the RAGembedding/: Scripts to embed the chunks.vectorstore/: Scripts to store the embeddings into a vector database.retriever/: Scripts to perform the semantic search across the vector database.
api/: FastAPI application that exposes the chatbot via a REST API.main.py: Entry point to run the FastAPI app.routes/: Defines the HTTP endpoints.services/: Contains the business logic, representing the service layer .models/: Contains schemas and LLM interface.prompts/: Utilities for prompt formatting and construction.config/: Loads and provides application configuration (YAML-based).
tests/: Tests for the backend.unit/: Contains the unit tests and the related configuration.integration/: Contains the integration tests and the related configuration.
requirements.txt: Python dependencies.requirements-cpu.txt: Python dependencies for cpu only usage.
docs/: Developer documentation. Inside it mirrors the codebase layout.frontend/: Directory for the frontend React application.
To quickly start, in the root directory a Makefile contains many ready-to-go targets, that allow to run basic flows(e.g. running the API).
In this doc file we'll use these targets, without going into the details of the scripts and the implementation reasonings. For further information you can visit the package-related doc files.
For the setup you can follow Setup Guide.
The first thing we want to be able to do is running the whole data pipeline. The data pipeline comprehends the following phases:
- Data Collection
- Preprocessing
- Chunking
- Embedding
- Storage
Note: the collection of StackOverflow's data is not included; to include it you must follow this.
So starting from the identification of the data sources(e.g. Jenkins Official Documentation) the data pipeline will collect it, process it, and finally store it in a vector database(FAISS) to later perform semantic search.
To run the following pipeline you can use the run-data-pipeline target:
make run-data-pipelineNote: for more details on the scripts and on the single processes you can visit the docs under
docs/chatbot-core/data/anddocs/chatbot-core/rag/.
Another key component in this repo is the backend, that allows to run the API that serves the chatbot functionalities. Also for the API there is a specific target in the Makefile, that does all the setup, installing the correct dependencies in the virtual environment, and runs the API.
To run it you can use the api target:
make apiNote: for more details on the API and on the architectural overview you can visit the docs under
docs/chatbot-core/api/.
For the UI the project relies on a React application, that is then built and injected into the Jenkins UI. A target in the Makefile allows to run the flow that builds the React app.
You can do that by running:
make build-frontendAfter running this command you can run Jenkins (mvn hpi:run).
Note: for more details on the frontend you can visit the docs under
docs/frontend/.
For both the frontend and the backend we have a suite of unit and integration tests. Also in this case we have a specific target to run all the tests.
To run the tests:
make run-test