A batteries-included scientific research project template for reproducible experiments.
- Python 3.12+ with modern type annotations
- Pixi for reproducible conda-forge environments across platforms
- DVC for data versioning and ML pipeline management
- Hydra + hydra-zen for type-safe configuration management
- JupyterLab with LSP, Git integration, and MyST rendering
- Jupyter notebooks: executed .ipynb files committed with outputs, rendered directly by MyST
- Marimo reactive notebook environment
- MyST-MD for publication-quality documentation
- Ruff for fast linting and formatting
- Pre-commit hooks for code quality enforcement
- 14 GitHub Actions workflows for CI/CD, reproducibility, and automation
- Cross-platform support (Linux x86-64, macOS ARM64)
- CITATION.cff for academic citations
- Hydra multirun sweeps for hyperparameter optimization

research_template/
├── .github/workflows/       # 14 GitHub Actions workflows
├── .devcontainer/           # VS Code / Codespaces dev container
├── configs/                 # Hydra configuration hierarchy
│   ├── train.yaml           # Main config with defaults
│   ├── model/               # Model configs (baseline, transformer)
│   └── data/                # Data configs (small, full)
├── data/                    # Data directories (DVC-managed)
│   ├── raw/
│   ├── processed/
│   └── external/
├── docs/                    # MyST documentation
├── marimo_notebooks/        # Marimo reactive notebooks
├── notebooks/               # Jupyter notebooks (.ipynb, MyST-rendered)
├── results/                 # Experiment results (DVC-managed)
├── scripts/                 # Entry point scripts
├── src/research_notebook/   # Source package
│   ├── data/                # Data loading utilities
│   ├── models/              # Model implementations
│   ├── trainers/            # Training loops
│   └── utils/               # Utility functions
└── tests/                   # Test suite

Install Pixi:
curl -fsSL https://pixi.sh/install.sh | bash

git clone https://github.com/jejjohnson/research_notebook
cd research_notebook
pixi install

# Preprocess data
pixi run preprocess
# Train with default config
pixi run train
# Train with parameter overrides
pixi run train training.lr=0.001 model=transformer
# Evaluate
pixi run evaluate

| Environment | Features | Use Case |
|---|---|---|
| default | dev | Testing, linting, training |
| docs | docs | Building MyST documentation |
| jupyterlab | dev + jupyterlab | Interactive notebooks |
| marimo | dev + marimo | Reactive notebooks |

# Activate specific environment
pixi run -e jupyterlab lab
pixi run -e marimo marimo-edit
pixi run -e docs docs-build

This template uses both Hydra and hydra-zen for configuration management.
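
As a rough mental model for the `-m` sweeps used in the commands that follow, the sketch below expands comma-separated override values into one override list per job. It is a standard-library illustration only, not Hydra's implementation: `expand_sweep` is a hypothetical helper, and Hydra's real sweeper supports more syntax than plain comma lists.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated override values into one override list per job.

    Hypothetical helper for illustration; it only understands `key=a,b,c`.
    """
    keys: list[str] = []
    choices: list[list[str]] = []
    for item in overrides:
        key, _, value = item.partition("=")
        keys.append(key)
        choices.append(value.split(","))
    # Cartesian product over all swept values: one combination per job.
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*choices)
    ]


jobs = expand_sweep(["training.lr=0.001,0.01,0.1", "model=baseline,transformer"])
for index, job in enumerate(jobs):
    print(index, " ".join(job))
```

Sweeping three learning rates against two model configs yields six jobs, which is why sweeps over several keys grow quickly.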
# Use default config
pixi run train
# Override parameters
pixi run train training.lr=0.001 training.epochs=50
# Use different model config
pixi run train model=transformer
# Hyperparameter sweep
pixi run train -m training.lr=0.001,0.01,0.1

pixi run train-zen

hydra-zen eliminates YAML boilerplate by defining configs as Python dataclasses:
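# BaselineModel is the project's model class; import it from wherever it is defined.
from hydra_zen import builds, make_config, launch

# builds() creates a dataclass config that instantiates BaselineModel with these defaults.
ModelConfig = builds(BaselineModel, hidden_size=64, num_layers=2)
# make_config() assembles a top-level config from named fields.
ExperimentConfig = make_config(model=ModelConfig, seed=42)

This template uses DVC for data and experiment tracking.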
# Add data to DVC
dvc add data/raw/dataset.csv
# Run the full pipeline
dvc repro
# Check pipeline status
dvc status
# View pipeline DAG
dvc dag
# Compare metrics across experiments
dvc metrics diff

The DVC pipeline is defined in dvc.yaml:
- preprocess: Processes raw data → data/processed/
- train: Trains model → results/metrics/train_metrics.json
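
To illustrate how the train stage's metrics file feeds into a comparison like `dvc metrics diff`, here is a minimal standard-library sketch. The helpers `write_metrics` and `diff_metrics` and the metric names `loss` and `accuracy` are assumptions made for the example; the template's actual training script and metric names may differ, and DVC computes its own diff between Git revisions.

```python
import json
import tempfile
from pathlib import Path


def write_metrics(path: Path, metrics: dict[str, float]) -> None:
    """Write metrics as a flat JSON object, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


def diff_metrics(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Return new - old for every metric present in both runs."""
    return {name: new[name] - old[name] for name in sorted(old.keys() & new.keys())}


# Use a temporary directory so the demo does not touch the real results/ folder.
with tempfile.TemporaryDirectory() as tmp:
    metrics_path = Path(tmp) / "results" / "metrics" / "train_metrics.json"
    write_metrics(metrics_path, {"loss": 0.25, "accuracy": 1.0})
    current = json.loads(metrics_path.read_text())

baseline = {"loss": 0.5, "accuracy": 0.75}  # hypothetical earlier run
changes = diff_metrics(baseline, current)
print(changes)
```

Keeping the metrics file a flat JSON object with numeric values makes per-metric comparisons like this straightforward.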
Documentation is built with MyST-MD using the book theme.
# Serve locally with live reload
pixi run -e docs docs-serve
# Build static HTML
pixi run -e docs docs-build

Docs are automatically deployed to GitHub Pages on every push to main.
Notebooks are committed as executed .ipynb files under notebooks/, with
cell outputs embedded. MyST renders them directly in the docs site (no
conversion step), so figures and prose stay together in a single source of
truth. Jupytext is still available for optional .py ↔ .ipynb pairing if
you prefer cleaner diffs during editing.
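
Because the docs rely on notebooks being committed in an executed state, a quick pre-commit style check can help. The sketch below is one possible approach using only the standard library: it reads the notebook JSON and reports code cells that have source but no execution count. `unexecuted_code_cells` and `load_notebook` are hypothetical helpers, not part of the template.

```python
import json
from pathlib import Path


def load_notebook(path: Path) -> dict:
    """Load an .ipynb file, which is plain JSON on disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def unexecuted_code_cells(notebook: dict) -> list[int]:
    """Return indices of non-empty code cells that have no execution count."""
    missing: list[int] = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a single string or a list of line strings.
        source = "".join(cell.get("source", []))
        if source.strip() and cell.get("execution_count") is None:
            missing.append(index)
    return missing


# A tiny in-memory notebook: one markdown cell, one executed and one unexecuted code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 1, "outputs": []},
        {"cell_type": "code", "metadata": {}, "source": ["2 + 2"],
         "execution_count": None, "outputs": []},
    ],
}
print(unexecuted_code_cells(notebook))
```

For files on disk, pass the result of `load_notebook(Path("notebooks/your_notebook.ipynb"))` instead, where the filename is a placeholder.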
Full-featured JupyterLab with LSP, Git integration, MyST rendering, and spell checking:
pixi run -e jupyterlab lab

Reactive, reproducible notebooks in pure Python:
pixi run -e marimo marimo-edit

Marimo notebooks in marimo_notebooks/ are stored as .py files,
making them diff-friendly and importable as regular Python modules.

| Workflow | Trigger | Description |
|---|---|---|
| ci.yml | push/PR | pytest on ubuntu + macos |
| lint.yml | push/PR | Ruff linting |
| typecheck.yml | push/PR | ty type checking |
| pages.yml | push to main | Build + deploy MyST docs |
| dvc-check.yml | DVC file changes | Validate DVC pipeline |
| notebooks.yml | notebook changes | Validate .ipynb structure via nbformat |
| reproducibility.yml | weekly schedule | Full dvc repro |
| experiment-report.yml | PR | DVC metrics diff comment |
| citation.yml | CITATION.cff changes | Validate citation file |
| pixi-update.yml | monthly schedule | Update pixi lockfile |
| codeql.yml | push/PR/schedule | Security scanning |
| conventional-commits.yml | PR | Validate PR title |
| label-pr.yml | PR | Auto-label PRs |
| pre-commit-autoupdate.yml | weekly schedule | Update pre-commit hooks |

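
To check a PR title locally before the conventional-commits workflow runs, a small regex along the following lines may be enough. This is a sketch under assumptions: the allowed type list and the exact rules enforced by the workflow are not stated here, so `ALLOWED_TYPES` and `is_valid_title` are illustrative rather than a copy of the workflow's logic.

```python
import re

# Assumed set of commit types; the workflow's configured list may differ.
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# Shape: type(optional-scope)!: subject
TITLE_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(ALLOWED_TYPES)})"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def is_valid_title(title: str) -> bool:
    """Return True if the title follows the type(scope): subject shape."""
    return TITLE_PATTERN.match(title) is not None


for title in [
    "feat(models): add transformer baseline",
    "fix!: handle empty dataset",
    "Update README",
]:
    print(f"{title!r}: {is_valid_title(title)}")
```

The first two titles pass and the third fails because it has no type prefix.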
If you use this template, please cite it using the metadata in CITATION.cff:
@software{johnson2026researchnotebook,
author = {Johnson, Juan Emmanuel},
title = {Research Notebook},
year = {2024},
url = {https://github.com/jejjohnson/research_notebook},
}

Add BibTeX references to references.bib. They are automatically available
in MyST documentation and Jupyter notebooks with the jupyterlab-myst extension.
Connect your GitHub repository to Zenodo for automatic DOI assignment on releases.
This template was inspired by:
- jejjohnson/pypackage_template: library-focused Python package template
- DrivenData Cookiecutter Data Science
- Hydra documentation
- DVC documentation