@Kannav02 Kannav02 commented Jul 9, 2025

This PR introduces a complete dataset generation and evaluation pipeline for creating high-quality question-answer pairs from OpenROAD documentation. The system includes automated QA pair generation using Gemini 2.5 Pro and comprehensive quality-evaluation metrics.

The code and logic are adapted from the following guide:
https://huggingface.co/learn/cookbook/en/rag_evaluation

The following files have been added under the dataset_gen_eval folder:

  • eval_dataset.py: Evaluation script that loads generated QA pairs and applies quality metrics
  • generate_qa_pairs.py: Automated QA pair generation using Gemini 2.5 Pro to create factoid questions from domain-specific document chunks
  • ingest_doc.py: Document processing pipeline that chunks and indexes PDF/Markdown files into FAISS
  • quality_agents.py: Custom DeepEval metrics implementation with three quality assessment classes
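To make the ingestion step concrete, here is a minimal sketch of the kind of chunking that ingest_doc.py performs before indexing into FAISS. The function name, chunk size, and overlap values are illustrative assumptions, not the PR's actual parameters, and the embedding/FAISS indexing step is omitted:

```python
# Hypothetical sketch of the chunking step: split a document into fixed-size,
# overlapping character chunks before embedding and indexing. The real script
# operates on PDF/Markdown files; a plain string stands in here.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding/indexing."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "OpenROAD is an open-source RTL-to-GDSII flow. " * 40
chunks = chunk_text(doc, chunk_size=200, overlap=20)
# Consecutive chunks share a 20-character overlap so no sentence is
# split without context at a chunk boundary.
```

Overlapping chunks are a common choice for RAG ingestion because a question's answer may straddle a chunk boundary; the overlap keeps that text intact in at least one chunk.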
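For a feel of what one of the quality-assessment classes in quality_agents.py might look like, here is a simplified, runnable stand-in. The PR's actual metrics are DeepEval metrics backed by an LLM judge; this sketch swaps the judge for a cheap lexical-overlap "groundedness" score so it runs without API keys. The class name, threshold, and scoring rule are illustrative assumptions:

```python
# Hypothetical, simplified quality metric: scores how much of an answer's
# vocabulary is supported by the source context. A real DeepEval metric would
# subclass deepeval's BaseMetric and call an LLM judge instead.

class GroundednessMetric:
    """Fraction of the answer's terms that appear in the source context."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.score = 0.0
        self.passed = False

    def measure(self, answer: str, context: str) -> float:
        answer_terms = set(answer.lower().split())
        context_terms = set(context.lower().split())
        if not answer_terms:
            return 0.0
        self.score = len(answer_terms & context_terms) / len(answer_terms)
        self.passed = self.score >= self.threshold
        return self.score

metric = GroundednessMetric(threshold=0.5)
metric.measure(
    answer="OpenROAD performs global placement",
    context="OpenROAD performs global placement and detailed routing",
)
# Every answer term appears in the context, so the score is 1.0 and
# the metric passes its threshold.
```

Filtering generated QA pairs through metrics like this (groundedness, relevance, answerability are typical axes) is the standard recipe from the linked Hugging Face RAG-evaluation cookbook.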
