A comprehensive pipeline for conducting systematic literature reviews on LLM-powered wargaming research. This tool automates paper discovery, screening, information extraction, and analysis.
This pipeline implements the systematic review protocol defined in review_protocol_v0_3.md for studying LLM-powered wargames. It provides:
- Multi-source paper harvesting from Google Scholar, arXiv, Semantic Scholar, and Crossref
- Intelligent deduplication using DOI matching and fuzzy title comparison
- PDF fetching with fallback strategies including Sci-Hub
- LLM-powered extraction of key information using OpenAI GPT-4
- Failure mode detection using regex patterns
- Visualization generation for publication trends and analysis
- Export packaging with Zenodo integration
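The deduplication idea (exact DOI match, falling back to fuzzy title comparison) can be illustrated with a minimal stdlib sketch; the function name and threshold below are illustrative, not the pipeline's actual implementation:

```python
from difflib import SequenceMatcher


def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Illustrative duplicate check: exact DOI match, else fuzzy title ratio."""
    if a.get("doi") and a.get("doi") == b.get("doi"):
        return True
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold
```

Case-folding before comparison catches the common case where two sources report the same title with different capitalization.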
```
WordplayWorkshop2025/
├── src/lit_review/
│   ├── harvesters/     # Paper discovery modules
│   ├── processing/     # Data cleaning and PDF handling
│   ├── extraction/     # LLM extraction and tagging
│   ├── analysis/       # Failure detection and metrics
│   ├── visualization/  # Chart generation
│   └── utils/          # Configuration and utilities
├── data/
│   ├── raw/            # Harvested papers
│   ├── processed/      # Screening progress
│   ├── extracted/      # Extraction results
│   └── templates/      # Data structure templates
├── outputs/            # Visualizations and exports
├── pdf_cache/          # Downloaded PDFs
├── logs/               # SQLite logs
├── tests/              # Comprehensive test suite
├── notebooks/          # Jupyter notebooks
├── scripts/            # Utility scripts
├── config/             # Configuration files
└── run.py              # CLI interface
```
- Python 3.13+
- UV package manager
- OpenAI API key (for LLM extraction)
- Optional: Semantic Scholar API key
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd WordplayWorkshop2025
  ```

- Create a virtual environment with UV:

  ```bash
  uv venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  uv pip install -e .
  ```

- Copy and configure settings:

  ```bash
  cp config/config.yaml.example config/config.yaml
  # Edit config/config.yaml with your API keys and preferences
  ```

Use run.py as the main entry point. Do NOT use scripts in the scripts/ directory for pipeline execution; they are deprecated and may use incorrect settings (such as limiting sources).
Search for papers from ALL configured sources (arXiv, Semantic Scholar, Google Scholar, CrossRef):

```bash
python run.py harvest
```

Or specify sources explicitly:

```bash
python run.py harvest --sources arxiv semantic_scholar google_scholar crossref
```

Options:
- `--query`: Use preset queries or provide a custom search string (default: uses the query from config)
- `--sources`: Specify sources (default: ALL configured sources)
- `--max-results`: Maximum results per source (default: 100)
- `--parallel/--no-parallel`: Enable parallel searching

Note: The harvest command now automatically saves a snapshot of your configuration alongside the results for reproducibility.
Generate an Excel file for manual screening:

```bash
python run.py prepare-screen --input data/raw/papers_raw.csv
```

The Excel file includes:
- Paper metadata and abstracts
- Screening decision columns
- Data validation for include/exclude decisions
- Statistics and instructions sheets
Use an LLM to extract structured information:

```bash
python run.py extract --input data/processed/screening_progress.csv
```

Extracts:
- Venue type (conference, journal, workshop, tech-report)
- Game type (seminar, matrix, digital, hybrid)
- Open-ended vs quantitative classification
- LLM family and role
- Evaluation metrics
- Failure modes
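The extracted fields map naturally onto one structured record per paper; a hypothetical sketch of that row schema (field names mirror the list above, not the tool's exact column names):

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionRecord:
    """One paper's structured extraction (illustrative schema)."""
    venue_type: str   # conference | journal | workshop | tech-report
    game_type: str    # seminar | matrix | digital | hybrid
    open_ended: bool  # open-ended vs quantitative
    llm_family: str   # e.g. "GPT-4"
    llm_role: str     # e.g. "player", "adjudicator"
    eval_metrics: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
```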
Create charts and analysis:

```bash
python run.py visualise --input data/extracted/extraction.csv
```

Generates:
- Publication timeline
- Venue distribution
- Failure modes frequency
- LLM families usage
- Game types distribution
- Creative-Analytical Scale distribution (1-7)
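The publication timeline, for instance, reduces to counting papers per year before plotting; a minimal sketch of that aggregation step (plotting itself omitted, field name assumed):

```python
from collections import Counter


def publication_timeline(papers: list[dict]) -> list[tuple[int, int]]:
    """Count papers per publication year, sorted chronologically."""
    counts = Counter(int(p["year"]) for p in papers if p.get("year"))
    return sorted(counts.items())
```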
Package results for sharing:
```bash
python run.py export \
  --papers data/raw/papers_raw.csv \
  --extraction data/extracted/extraction.csv
```

Creates a ZIP package with:
- All data files (CSV format)
- Visualizations
- README and metadata
- Optional: Upload to Zenodo for DOI
Edit config/config.yaml to customize:

```yaml
search:
  queries:
    preset1: '"LLM" AND ("wargaming" OR "wargame")'
    preset2: '"Large Language Model" AND "strategic game"'
  sources:
    google_scholar:
      enabled: true
      max_results: 100

api_keys:
  openai: ${OPENAI_API_KEY}  # Can use environment variables
  semantic_scholar: your-key-here

failure_vocabularies:
  escalation: [escalation, nuclear, brinkmanship]
  bias: [bias, biased, unfair, skew]
  hallucination: [hallucination, confabulate, fabricate]
```

The scripts/ directory contains various utility and test scripts, but DO NOT use them for running the pipeline. These scripts may:
- Use hardcoded source lists (e.g., only arxiv + semantic_scholar)
- Skip important sources like Google Scholar and CrossRef
- Not save configuration for reproducibility
- Create inconsistent results
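The failure vocabularies from the configuration can be compiled into word-boundary regexes and matched against abstracts; a hedged sketch of that detection step (function name illustrative, vocabulary copied from the config example):

```python
import re

VOCAB = {
    "escalation": ["escalation", "nuclear", "brinkmanship"],
    "bias": ["bias", "biased", "unfair", "skew"],
    "hallucination": ["hallucination", "confabulate", "fabricate"],
}

# One case-insensitive, word-bounded pattern per failure-mode category
PATTERNS = {
    mode: re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    for mode, terms in VOCAB.items()
}


def detect_failure_modes(text: str) -> list[str]:
    """Return the failure-mode categories whose vocabulary appears in text."""
    return [mode for mode, pat in PATTERNS.items() if pat.search(text)]
```

The `\b` anchors keep substrings from firing (e.g. "bias" does not match inside "biased"; that form is covered by its own vocabulary entry).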
Always use the main `run.py` CLI or `python -m src.lit_review` for all pipeline operations.
Configure LLM extraction in config/config.yaml:

```yaml
extraction:
  model: gpt-4  # or gpt-3.5-turbo
  temperature: 0.3
  max_tokens: 4000
```

Example invocations with a custom query and source limits:

```bash
python run.py harvest --query '"transformer model" AND "military simulation"'
python run.py harvest --sources arxiv,crossref --max-results 50
```

Check pipeline status:

```bash
python run.py status
```

Shows:
- Log summary by level
- Recent activity
- Error tracking
```bash
# Run all tests
./scripts/run_tests.sh

# Run specific test categories
./scripts/run_tests.sh unit
./scripts/run_tests.sh fast
```

```bash
# Run all tests
make test

# Run tests with coverage
make test-coverage

# Run tests verbosely
make test-verbose
```

```bash
# Format code
make format

# Run linting
make lint

# Run type checking
make type-check

# Run all quality checks
make all
```

Pre-commit hooks are configured to run automatically before each commit:
```bash
make pre-commit
```

Common issues:
- Import errors: Ensure you're in the virtual environment
- API rate limits: Configure rate limits in config/config.yaml
- PDF download failures: Check your internet connection and try alternative Sci-Hub mirrors
- LLM extraction errors: Verify your OpenAI API key and quota
Enable detailed logging:

```bash
python run.py --debug harvest
```

Query the SQLite log database:

```python
from lit_review.utils import LoggingDatabase

db = LoggingDatabase('logs/logging.db')
errors = db.query_logs(level='ERROR')
```

To contribute:
- Fork the repository
- Create a feature branch
- Run tests: `./scripts/run_tests.sh`
- Submit a pull request
If you use this pipeline in your research, please cite:

```bibtex
@software{llm_wargame_review,
  title  = {Literature Review Pipeline for LLM-Powered Wargames},
  author = {Your Name},
  year   = {2024},
  url    = {repository-url}
}
```

[Specify your license here]
This pipeline was developed to support systematic reviews of LLM-powered wargaming research. Special thanks to the developers of the scholarly, arxiv, and other libraries that make this tool possible.