MachineDotDev/pdf-autorag-qa

🎸 AutoRAG Audio Equipment Q&A Pipeline

GPU-Accelerated Q&A Extraction and RAG Evaluation Pipeline for Audio Equipment Documentation

Transform audio equipment manuals into high-quality training datasets through automated Q&A generation, FAISS vector indexing, and RAG-enhanced evaluation using Llama-3-8B-Instruct.

🚀 What This Pipeline Does

This repository implements a complete Dual RAG AutoRAG Pipeline that:

  1. 📄 Extracts Q&A pairs from audio equipment PDFs using GPU-accelerated LLM processing
  2. 🎯 Generates 9 matrix combinations (3 difficulty levels × 3 creativity styles)
  3. 🔍 Builds dual vector stores with both Standard and Adaptive RAG approaches (4 FAISS indices)
  4. ⚡ Evaluates both RAG approaches with scientific A/B testing and comparison analysis
  5. 📊 Produces domain-specific insights with dual approach performance comparison
  6. 🎓 Generates training datasets using the winning RAG approach based on empirical results

🏗️ Dual RAG Pipeline Architecture

graph LR
    A[📄 Audio Manual PDF] --> B[🎸 Q&A Generation<br/>Matrix 3×3]
    B --> C[🔍 Top-K Selection]
    C --> D[🧠 Dual Vector Store Builder]
    
    D --> E1[🔹 Standard RAG<br/>Answer Embeddings]
    D --> E2[🔸 Adaptive RAG<br/>Q+A Embeddings]
    
    E1 --> F1[💾 CPU/GPU Indices<br/>Standard]
    E2 --> F2[💾 CPU/GPU Indices<br/>Adaptive]
    
    F1 --> G1[⚡ Standard RAG Eval]
    F2 --> G2[⚡ Adaptive RAG Eval]
    
    G1 --> H[📊 A/B Comparison]
    G2 --> H
    
    H --> I[🏆 Winner Selection]
    I --> J[🎓 Training Dataset]
    
    style A fill:#f9d71c
    style H fill:#e74c3c
    style J fill:#27ae60

3×3 Matrix Generation Strategy

| Difficulty | High Creativity (0.9) | Balanced (0.7) | Conservative (0.3) |
|---|---|---|---|
| Basic | Broad, creative questions | Standard questions | Focused, literal |
| Intermediate | Complex scenarios | Technical details | Specific procedures |
| Advanced | Expert-level analysis | Professional insights | Precise specifications |
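The nine combinations can be enumerated programmatically. The following is a minimal sketch, not the pipeline's own code: the temperatures come from the table above, while the style labels (`high_creativity`, `balanced`, `conservative`) and the output naming scheme are assumptions modeled on the `outputs/qa_basic_balanced.jsonl` example used elsewhere in this README.

```python
from itertools import product

# Temperatures are taken from the matrix table; the label strings and
# the output file naming scheme are assumptions for illustration.
DIFFICULTIES = ["basic", "intermediate", "advanced"]
CREATIVITY = {"high_creativity": 0.9, "balanced": 0.7, "conservative": 0.3}


def matrix_combinations():
    """Return one run config per (difficulty, creativity) cell of the 3x3 matrix."""
    return [
        {
            "difficulty": difficulty,
            "creativity": style,
            "temperature": temperature,
            "output": f"outputs/qa_{difficulty}_{style}.jsonl",
        }
        for difficulty, (style, temperature) in product(DIFFICULTIES, CREATIVITY.items())
    ]


if __name__ == "__main__":
    for combo in matrix_combinations():
        print(combo["output"], combo["temperature"])
```

Each resulting config could then be passed to one `cli_pdf_qa.py` invocation via `--difficulty-levels` and `--temperature`.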

🛠️ Quick Start

Prerequisites

  • GPU Required: NVIDIA GPU with CUDA support (L40S recommended)
  • Python: 3.9+
  • Storage: ~10GB for models + datasets

Installation

# Clone the repository
git clone <repository-url>
cd autorag

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies with Poetry
poetry install

# Set up Hugging Face token
export HF_TOKEN="your_hugging_face_token_here"

Run the Complete Pipeline

# Trigger the full AutoRAG pipeline via GitHub Actions
gh workflow run pdf-qa-autorag.yaml \
  --field input_file="pdfs/UAFX_Ruby_63_Top_Boost_Amplifier_Manual.pdf" \
  --field model_name="meta-llama/Meta-Llama-3-8B-Instruct" \
  --field top_k_selection="50"

Or run components individually:

# 1. Generate Q&A pairs (example: basic difficulty, balanced creativity)
python cli_pdf_qa.py \
  pdfs/UAFX_Ruby_63_Top_Boost_Amplifier_Manual.pdf \
  --output outputs/qa_basic_balanced.jsonl \
  --difficulty-levels basic \
  --temperature 0.7 --top-p 0.9

# 2. Select best pairs
python qa_pair_selector.py \
  --qa-artifacts-dir outputs \
  --output-dir rag_input \
  --top-k 50

# 3. Build dual vector stores (Standard + Adaptive RAG)
python qa_faiss_builder.py \
  --qa-pairs-file rag_input/selected_qa_pairs.json \
  --output-dir rag_store

# 4. Run parallel RAG evaluation (both approaches)
# Standard RAG
python qa_autorag_evaluator.py \
  --qa-pairs-file rag_input/selected_qa_pairs.json \
  --qa-faiss-index rag_store/qa_faiss_index_standard_gpu.bin \
  --output-dir autorag_results/standard_rag

# Adaptive RAG  
python qa_autorag_evaluator.py \
  --qa-pairs-file rag_input/selected_qa_pairs.json \
  --qa-faiss-index rag_store/qa_faiss_index_adaptive_gpu.bin \
  --output-dir autorag_results/adaptive_rag

# 5. Compare RAG approaches
python rag_comparison_analyzer.py \
  --standard-results autorag_results/standard_rag \
  --adaptive-results autorag_results/adaptive_rag \
  --output-file autorag_results/rag_comparison_report.json

# 6. Domain-specific evaluation
python domain_eval_gpu.py \
  --config audio_equipment_domain_questions.json \
  --results-dir outputs

📁 Project Structure

autorag/
├── 🎸 pdfs/                                    # Audio equipment manuals
│   └── UAFX_Ruby_63_Top_Boost_Amplifier_Manual.pdf
├── ⚙️  qa_extraction_lib/                      # Core extraction library
│   ├── pdf_generator.py                       # PDF text processing
│   ├── prompt_manager.py                      # LLM prompt templates
│   └── text_processing.py                     # Text chunking & preprocessing
├── 🔧 Pipeline Scripts
│   ├── cli_pdf_qa.py                         # Main Q&A generator (9 matrix combinations)
│   ├── qa_pair_selector.py                   # Top-K selection algorithm
│   ├── qa_faiss_builder.py                   # GPU FAISS index builder
│   ├── qa_autorag_evaluator.py               # RAG vs Base model evaluation
│   ├── training_dataset_generator.py          # High-quality dataset generator
│   └── domain_eval_gpu.py                    # Audio equipment domain evaluator
├── 🎯 Configuration
│   ├── audio_equipment_domain_questions.json  # Domain-specific evaluation config
│   └── pyproject.toml                         # Poetry dependencies and project config
├── 🤖 .github/workflows/
│   └── pdf-qa-autorag.yaml                   # Complete CI/CD pipeline
└── 📊 Output Directories (auto-created)
    ├── outputs/           # Generated Q&A pairs (9 matrix files)
    ├── rag_input/         # Selected pairs + metadata
    ├── rag_store/         # FAISS indices + embeddings
    └── autorag_results/   # Evaluation reports + training datasets

🎯 Key Features

🚀 Dual RAG Architecture

  • 4 FAISS indices (CPU/GPU × Standard/Adaptive) for comprehensive evaluation
  • Standard RAG: Traditional answer-only embeddings (speed-optimized)
  • Adaptive RAG: Combined Q+A embeddings (quality-optimized)
  • Scientific A/B testing with quantitative performance comparison
  • Automatic winner selection based on empirical results
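The difference between the two approaches comes down to which text is embedded for each Q&A pair. The sketch below illustrates that choice only; the field names `question` and `answer` and the `Q: ... A: ...` layout are assumptions, and the actual formatting in `qa_faiss_builder.py` may differ.

```python
def embedding_text(pair, approach):
    """Return the string that would be embedded for one Q&A pair.

    'standard' embeds the answer only; 'adaptive' embeds question and
    answer together. Field names and the Q/A layout are hypothetical.
    """
    if approach == "standard":
        return pair["answer"]
    if approach == "adaptive":
        return f"Q: {pair['question']}\nA: {pair['answer']}"
    raise ValueError(f"unknown approach: {approach!r}")


if __name__ == "__main__":
    pair = {
        "question": "What does the gain knob control?",
        "answer": "It sets the amount of preamp drive.",
    }
    for approach in ("standard", "adaptive"):
        print(approach, "->", repr(embedding_text(pair, approach)))
```

Embedding the question alongside the answer gives the index more text to match a user query against, at the cost of longer inputs to the embedding model.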

📊 Comprehensive Evaluation

  • Standard vs Adaptive RAG head-to-head comparison
  • RAG vs Base Model performance analysis
  • BERT-Score semantic evaluation for both approaches
  • Domain relevance scoring with dual approach insights
  • Uncertainty detection and confidence calibration
  • Performance metrics (speed vs quality trade-offs)
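One way a head-to-head comparison could pick a winner is sketched below. This is an illustrative decision rule, not the logic of `rag_comparison_analyzer.py`: the per-question fields `bert_f1` and `latency_ms`, the `min_f1_gain` threshold, and the latency tie-break are all assumptions.

```python
from statistics import mean


def pick_winner(standard, adaptive, min_f1_gain=0.01):
    """Compare two lists of per-question results and name a winner.

    Each result is a dict with hypothetical keys 'bert_f1' and 'latency_ms'.
    The approach with the higher mean F1 wins if the gap is at least
    min_f1_gain; otherwise the faster approach wins.
    """
    summary = {
        name: {
            "mean_f1": mean(r["bert_f1"] for r in rows),
            "mean_latency_ms": mean(r["latency_ms"] for r in rows),
        }
        for name, rows in (("standard", standard), ("adaptive", adaptive))
    }
    gain = summary["adaptive"]["mean_f1"] - summary["standard"]["mean_f1"]
    if abs(gain) >= min_f1_gain:
        winner = "adaptive" if gain > 0 else "standard"
    else:
        # Quality is effectively tied, so prefer the lower latency.
        winner = min(summary, key=lambda name: summary[name]["mean_latency_ms"])
    return winner, summary
```

Keeping the summary alongside the winner makes the selection rationale available for the comparison report.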

πŸŽ“ Training-Ready Outputs

  • Winner-based training data using best-performing RAG approach
  • High-quality Q&A pairs filtered by semantic similarity and comparison results
  • JSONL format compatible with popular training frameworks
  • Metadata preservation (difficulty, creativity, source tracking, RAG comparison scores)
  • Quality metrics and approach selection rationale for dataset curation
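A training record with preserved metadata might look like the following sketch. The schema is hypothetical: the keys `instruction`, `output`, `metadata`, and `rag_approach` are assumptions, and the records produced by `training_dataset_generator.py` may be structured differently.

```python
import json


def to_training_record(pair, winner):
    """Map one selected Q&A pair to a training record (hypothetical schema)."""
    return {
        "instruction": pair["question"],
        "output": pair["answer"],
        "metadata": {
            "difficulty": pair.get("difficulty"),
            "creativity": pair.get("creativity"),
            "source": pair.get("source"),
            "rag_approach": winner,
        },
    }


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


if __name__ == "__main__":
    pair = {
        "question": "What does the gain knob control?",
        "answer": "It sets the amount of preamp drive.",
        "difficulty": "basic",
        "creativity": "balanced",
    }
    print(to_jsonl([to_training_record(pair, "adaptive")]), end="")
```

Writing the returned string to a `.jsonl` file yields one record per line, which is the layout most fine-tuning data loaders expect.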

🔬 Evaluation Metrics

The pipeline provides multi-dimensional evaluation:

| Metric Category | Measures | Good For |
|---|---|---|
| Semantic Quality | BERT-Score F1, Precision, Recall | Answer accuracy (both approaches) |
| Domain Relevance | Audio equipment term frequency | Specialization (Standard vs Adaptive) |
| Response Length | Word count, token count | Completeness comparison |
| Uncertainty | "I don't know" phrase detection | Confidence calibration |
| Retrieval Quality | Dense + sparse score combination | Context relevance (dual comparison) |
| Performance | Retrieval/generation time (ms) | Speed vs quality trade-offs |
| Approach Comparison | Standard vs Adaptive metrics | Winner selection criteria |
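Two of the simpler metrics, uncertainty detection and response length, can be sketched in a few lines. This is an illustration under stated assumptions: only "I don't know" is named in this README, so the other phrases in the list are hypothetical additions, and the whitespace-based word count is a simplification.

```python
# Only "i don't know" comes from the README; the other phrases are
# assumed examples of what an uncertainty detector might look for.
UNCERTAINTY_PHRASES = (
    "i don't know",
    "i'm not sure",
    "i am not sure",
    "cannot determine",
)


def is_uncertain(answer):
    """Return True if the answer contains a known uncertainty phrase."""
    text = answer.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(phrase in text for phrase in UNCERTAINTY_PHRASES)


def word_count(answer):
    """Count whitespace-separated words as a rough response-length measure."""
    return len(answer.split())
```

For example, `is_uncertain("I don't know the output impedance.")` is true, while a direct factual answer is not flagged.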

🎸 Audio Equipment Domain

Specifically tuned for guitar amplifiers and effects:

  • Domain Terms: amplifier, guitar, tone, distortion, overdrive, gain, EQ, tube, preamp, etc.
  • Question Categories: Technical specifications, setup procedures, troubleshooting, comparisons
  • Knowledge Areas: Impedance matching, tube saturation, power handling, signal processing
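A term-frequency style relevance score can be sketched as the share of words in a response that are domain terms. The term set below is the one listed above (without the "etc."); the formula itself is an assumption, and the scoring in `domain_eval_gpu.py` may weight or normalize terms differently.

```python
import re

# Domain terms as listed in this README; the real config may contain more.
DOMAIN_TERMS = {
    "amplifier", "guitar", "tone", "distortion", "overdrive",
    "gain", "eq", "tube", "preamp",
}


def domain_relevance(text):
    """Return the fraction of alphabetic words in text that are domain terms."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in DOMAIN_TERMS)
    return hits / len(words)
```

A raw fraction like this penalizes long answers full of ordinary words, so a production scorer would likely normalize differently.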

📚 Detailed Documentation

For in-depth technical details on each component:

Component Documentation

Architecture Overview

Each component document includes technical implementation details, configuration options, performance characteristics, and use cases.


🚀 Advanced Usage

Custom PDF Processing

# Process your own audio equipment manual
python cli_pdf_qa.py your_manual.pdf \
  --chunk-size 600 \
  --batch-size 4 \
  --difficulty-levels basic intermediate \
  --quantize  # Enable for lower GPU memory
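To illustrate what a chunk size controls, here is a minimal chunker sketch. It assumes the size is counted in whitespace-separated words and adds a hypothetical `overlap` parameter; the README does not say whether `--chunk-size` counts words, characters, or tokens, so the behavior of `text_processing.py` may differ.

```python
def chunk_words(text, chunk_size=600, overlap=50):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear whole in one chunk. Both units are assumptions.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Smaller chunks reduce GPU memory per prompt but give the model less context per question.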

Fine-tune Selection Criteria

# More aggressive filtering
python qa_pair_selector.py \
  --qa-artifacts-dir outputs \
  --top-k 25 \
  --min-quality-threshold 0.7
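The effect of combining `--top-k` with `--min-quality-threshold` can be sketched as a filter followed by a ranked cut. The `quality` field is a hypothetical per-pair score; how `qa_pair_selector.py` actually computes and applies quality is not described here.

```python
def select_top_k(pairs, top_k=25, min_quality=0.7):
    """Keep pairs at or above min_quality, then return the top_k best.

    Each pair is a dict with a hypothetical numeric 'quality' field.
    """
    kept = [pair for pair in pairs if pair["quality"] >= min_quality]
    ranked = sorted(kept, key=lambda pair: pair["quality"], reverse=True)
    return ranked[:top_k]
```

With a strict threshold, fewer than `top_k` pairs may survive, so downstream steps should not assume an exact count.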

Custom Domain Configuration

Edit audio_equipment_domain_questions.json to:

  • Add new domain terms
  • Create custom evaluation questions
  • Modify confidence templates

🎯 Expected Results

After running the complete dual RAG pipeline, expect:

  • ~500-1000 Q&A pairs from a typical amplifier manual
  • 50+ high-quality pairs selected for dual RAG evaluation
  • 4 GPU/CPU FAISS indices with sub-millisecond query times
  • Comparative analysis showing Standard vs Adaptive performance differences
  • Domain relevance scores typically 0.6-0.8 for in-domain questions (both approaches)
  • BERT-Score improvements of 0.1-0.3 F1 with RAG vs base model
  • Winner determination and deployment recommendations
  • Training dataset generated from best-performing approach

🤝 Contributing

This pipeline is designed for audio equipment domain specialization. To adapt for other domains:

  1. Replace PDF: Add your domain-specific documentation to pdfs/
  2. Update domain config: Modify audio_equipment_domain_questions.json
  3. Adjust prompts: Edit templates in qa_extraction_lib/prompt_manager.py
  4. Update workflow: Change default paths in .github/workflows/pdf-qa-autorag.yaml

📝 License

This project demonstrates advanced RAG pipeline techniques for domain-specific knowledge extraction. Built with modern ML tools including PyTorch, Transformers, FAISS, and Llama-3.

Key Technologies: Python 3.9+, PyTorch 2.1+, Transformers 4.42+, FAISS GPU, Sentence-Transformers, BERT-Score


🎸 Ready to amplify your audio equipment knowledge with AI? Let's rock! 🤘

About

PDF Autorag with QA Pairs
