Research Notebook

CI Docs License: MIT Python 3.12+ Pixi DVC

A batteries-included scientific research project template for reproducible experiments.

Features

  • 🐍 Python 3.12+ with modern type annotations
  • 📦 Pixi for reproducible conda-forge environments across platforms
  • 📊 DVC for data versioning and ML pipeline management
  • ⚙️ Hydra + hydra-zen for type-safe configuration management
  • 📓 JupyterLab with LSP, Git integration, and MyST rendering
  • 📓 Jupyter notebooks: executed .ipynb files committed with outputs, rendered directly by MyST
  • 🌊 Marimo reactive notebook environment
  • 📚 MyST-MD for publication-quality documentation
  • 🔍 Ruff for fast linting and formatting
  • 🔒 Pre-commit hooks for code quality enforcement
  • 🚀 14 GitHub Actions workflows for CI/CD, reproducibility, and automation
  • 🖥️ Cross-platform support (Linux x86-64, macOS ARM64)
  • 📝 CITATION.cff for academic citations
  • 🔬 Hydra multirun sweeps for hyperparameter optimization

Project Structure

research_template/
├── .github/workflows/    # 14 GitHub Actions workflows
├── .devcontainer/        # VS Code / Codespaces dev container
├── configs/              # Hydra configuration hierarchy
│   ├── train.yaml        # Main config with defaults
│   ├── model/            # Model configs (baseline, transformer)
│   └── data/             # Data configs (small, full)
├── data/                 # Data directories (DVC-managed)
│   ├── raw/
│   ├── processed/
│   └── external/
├── docs/                 # MyST documentation
├── marimo_notebooks/     # Marimo reactive notebooks
├── notebooks/            # Jupyter notebooks (.ipynb, MyST-rendered)
├── results/              # Experiment results (DVC-managed)
├── scripts/              # Entry point scripts
├── src/research_notebook/        # Source package
│   ├── data/             # Data loading utilities
│   ├── models/           # Model implementations
│   ├── trainers/         # Training loops
│   └── utils/            # Utility functions
└── tests/                # Test suite

Quick Start

Prerequisites

Install Pixi:

curl -fsSL https://pixi.sh/install.sh | bash

Installation

git clone https://github.com/jejjohnson/research_notebook
cd research_notebook
pixi install

Run an Experiment

# Preprocess data
pixi run preprocess

# Train with default config
pixi run train

# Train with parameter overrides
pixi run train training.lr=0.001 model=transformer

# Evaluate
pixi run evaluate
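
To make the override syntax concrete, here is a minimal, stdlib-only sketch of how a dotted `key=value` argument such as `training.lr=0.001` could be folded into a nested config. It is an illustration, not Hydra's actual implementation: `apply_override` and the sample default values are hypothetical, and parsing values with `ast.literal_eval` is an assumption (Hydra has its own override grammar). Config-group selections such as `model=transformer` work differently in Hydra, which loads a file from `configs/model/`; this sketch covers plain value overrides only.

```python
import ast
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one dotted key=value override applied."""
    key, _, raw = override.partition("=")
    try:
        value = ast.literal_eval(raw)  # numbers, booleans, lists, ...
    except (ValueError, SyntaxError):
        value = raw  # anything else is kept as a plain string
    result = copy.deepcopy(config)
    node = result
    *parents, leaf = key.split(".")
    for name in parents:
        # Walk down the nested dicts, creating levels that do not exist yet
        node = node.setdefault(name, {})
    node[leaf] = value
    return result


# Hypothetical defaults standing in for configs/train.yaml
defaults = {"training": {"lr": 0.01, "epochs": 10}, "seed": 42}
updated = apply_override(defaults, "training.lr=0.001")
print(updated["training"])
```

The original dictionary is left untouched, so the defaults can be reused for several runs.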

Environments

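| Environment | Features         | Use Case                    |
|-------------|------------------|-----------------------------|
| default     | dev              | Testing, linting, training  |
| docs        | docs             | Building MyST documentation |
| jupyterlab  | dev + jupyterlab | Interactive notebooks       |
| marimo      | dev + marimo     | Reactive notebooks          |
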
# Run a task in a specific environment
pixi run -e jupyterlab lab
pixi run -e marimo marimo-edit
pixi run -e docs docs-build

Configuration Management

This template uses both Hydra and hydra-zen for configuration management.

Classic Hydra (YAML-based)

# Use default config
pixi run train

# Override parameters
pixi run train training.lr=0.001 training.epochs=50

# Use different model config
pixi run train model=transformer

# Hyperparameter sweep
pixi run train -m training.lr=0.001,0.01,0.1
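
As a rough picture of what a multirun sweep does with comma-separated values, the sketch below expands each override into its choices and takes the Cartesian product, producing one override list per job. This mirrors the grid behaviour of Hydra's default sweeper but is not its code; `expand_sweep` is a hypothetical helper, and the second override reuses the `baseline` and `transformer` model configs listed under `configs/model/`.

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated override values into one override list per job."""
    choices = []
    for override in overrides:
        key, _, values = override.partition("=")
        choices.append([f"{key}={value}" for value in values.split(",")])
    # One job per combination of choices across all swept keys
    return [list(job) for job in product(*choices)]


jobs = expand_sweep(["training.lr=0.001,0.01,0.1", "model=baseline,transformer"])
for index, job in enumerate(jobs):
    print(index, job)
```

Three learning rates and two models give six jobs, so sweep size grows multiplicatively with each swept parameter.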

hydra-zen (Python-based, type-safe)

pixi run train-zen

hydra-zen eliminates YAML boilerplate by defining configs as Python dataclasses:

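from hydra_zen import builds, make_config, launch

# Assumed import path; adjust to where BaselineModel lives under src/research_notebook/models/
from research_notebook.models import BaselineModel

# builds() generates a dataclass config that instantiates BaselineModel with these arguments
ModelConfig = builds(BaselineModel, hidden_size=64, num_layers=2)
# make_config() assembles the top-level experiment config
ExperimentConfig = make_config(model=ModelConfig, seed=42)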
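
For readers new to dataclass-based configs, the following stdlib-only sketch shows roughly the kind of structure that the two calls above stand for, written out by hand. It does not use hydra-zen and is not how the library is implemented: `BaselineModel` here is a hypothetical stand-in class, and `build_model` hard-codes the target that hydra-zen would normally record for you.

```python
from dataclasses import asdict, dataclass, field


class BaselineModel:
    """Hypothetical stand-in for the template's model class."""

    def __init__(self, hidden_size: int, num_layers: int) -> None:
        self.hidden_size = hidden_size
        self.num_layers = num_layers


@dataclass
class ModelConfig:
    """Hand-written analogue of a generated model config."""

    hidden_size: int = 64
    num_layers: int = 2


@dataclass
class ExperimentConfig:
    """Top-level config that nests the model config and a seed."""

    model: ModelConfig = field(default_factory=ModelConfig)
    seed: int = 42


def build_model(config: ModelConfig) -> BaselineModel:
    """Turn the typed config into a model instance."""
    return BaselineModel(**asdict(config))


cfg = ExperimentConfig()
model = build_model(cfg.model)
print(type(model).__name__, model.hidden_size, model.num_layers, cfg.seed)
```

Because the fields are typed Python attributes, a misspelled parameter name fails at construction time rather than surfacing later as a missing YAML key.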

Experiment Tracking

This template uses DVC for data and experiment tracking.

# Add data to DVC
dvc add data/raw/dataset.csv

# Run the full pipeline
dvc repro

# Check pipeline status
dvc status

# View pipeline DAG
dvc dag

# Compare metrics across experiments
dvc metrics diff

The DVC pipeline is defined in dvc.yaml:

  1. preprocess: Processes raw data → data/processed/
  2. train: Trains model → results/metrics/train_metrics.json
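
For the train stage to feed `dvc metrics diff`, it has to leave its metrics in a file DVC can read; JSON is one such format. Below is a minimal sketch of that final step, assuming a hypothetical `write_metrics` helper and placeholder metric names and values (the template's real training script may differ). The example writes into a temporary directory so it has no side effects; in the pipeline the target would be `results/metrics/train_metrics.json`.

```python
import json
import tempfile
from pathlib import Path


def write_metrics(metrics: dict[str, float], path: Path) -> None:
    """Write metrics as sorted, indented JSON, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "results" / "metrics" / "train_metrics.json"
    # Placeholder values, not results from the template
    write_metrics({"loss": 0.25, "accuracy": 0.9}, out)
    print(out.read_text())
```

Sorting the keys keeps the file stable between runs, which makes Git and DVC diffs easier to read.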

Documentation

Documentation is built with MyST-MD using the book theme.

# Serve locally with live reload
pixi run -e docs docs-serve

# Build static HTML
pixi run -e docs docs-build

Docs are automatically deployed to GitHub Pages on every push to main.

Notebook Environments

Notebooks are committed as executed .ipynb files under notebooks/, with cell outputs embedded. MyST renders them directly in the docs site (no conversion step), so figures and prose stay together in a single source of truth. Jupytext is still available for optional .py ↔ .ipynb pairing if you prefer cleaner diffs during editing.
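
Since the docs site renders whatever outputs are stored in the file, it can help to check that a notebook was actually executed before committing it. The sketch below is one possible check, not part of the template: `unexecuted_cells` is a hypothetical helper that relies on the nbformat v4 convention that executed code cells carry a non-null `execution_count`. The sample notebook is built in memory; a real one could be loaded with `json.load`.

```python
def unexecuted_cells(notebook: dict) -> list[int]:
    """Return indices of non-empty code cells that have no execution count."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a string or a list of lines; join handles both
        has_source = bool("".join(cell.get("source", [])).strip())
        if has_source and cell.get("execution_count") is None:
            missing.append(index)
    return missing


# Hypothetical three-cell notebook: markdown, executed code, unexecuted code
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 1,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                    "execution_count": 1,
                }
            ],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print('hi')"],
            "execution_count": None,
            "outputs": [],
        },
    ],
}
print(unexecuted_cells(notebook))
```

Checking `execution_count` rather than `outputs` avoids flagging cells that ran but legitimately printed nothing, such as import cells.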

JupyterLab

Full-featured JupyterLab with LSP, Git integration, MyST rendering, and spell checking:

pixi run -e jupyterlab lab

Marimo

Reactive, reproducible notebooks in pure Python:

pixi run -e marimo marimo-edit

Marimo notebooks in marimo_notebooks/ are stored as .py files, making them diff-friendly and importable as regular Python modules.

CI/CD Workflows

| Workflow                  | Trigger              | Description                            |
|---------------------------|----------------------|----------------------------------------|
| ci.yml                    | push/PR              | pytest on ubuntu + macos               |
| lint.yml                  | push/PR              | Ruff linting                           |
| typecheck.yml             | push/PR              | ty type checking                       |
| pages.yml                 | push to main         | Build + deploy MyST docs               |
| dvc-check.yml             | DVC file changes     | Validate DVC pipeline                  |
| notebooks.yml             | notebook changes     | Validate .ipynb structure via nbformat |
| reproducibility.yml       | weekly schedule      | Full dvc repro                         |
| experiment-report.yml     | PR                   | DVC metrics diff comment               |
| citation.yml              | CITATION.cff changes | Validate citation file                 |
| pixi-update.yml           | monthly schedule     | Update pixi lockfile                   |
| codeql.yml                | push/PR/schedule     | Security scanning                      |
| conventional-commits.yml  | PR                   | Validate PR title                      |
| label-pr.yml              | PR                   | Auto-label PRs                         |
| pre-commit-autoupdate.yml | weekly schedule      | Update pre-commit hooks                |
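
To check a PR title locally before conventional-commits.yml does, a small regular expression along the following lines may be enough. This is a sketch under assumptions: the list of allowed types follows common Conventional Commits usage and may not match what the workflow actually enforces, and `is_conventional_title` is a hypothetical helper.

```python
import re

# Assumed set of types; the workflow's real configuration may allow others
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

TITLE_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(ALLOWED_TYPES)})"  # required type
    r"(\((?P<scope>[^()\s]+)\))?"             # optional (scope)
    r"(?P<breaking>!)?"                       # optional breaking-change marker
    r": (?P<description>\S.*)$"               # colon, space, description
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title has the form type(scope)!: description."""
    return TITLE_PATTERN.match(title) is not None


for title in [
    "feat(models): add transformer config",
    "fix!: handle empty dataset",
    "Update README",
]:
    print(title, is_conventional_title(title))
```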

Academic

Citation

If you use this template, please cite it using the metadata in CITATION.cff:

@software{johnson2026researchnotebook,
  author = {Johnson, Juan Emmanuel},
  title  = {Research Notebook},
  year   = {2024},
  url    = {https://github.com/jejjohnson/research_notebook},
}

References

Add BibTeX references to references.bib. They are automatically available in MyST documentation and Jupyter notebooks with the jupyterlab-myst extension.

Zenodo

Connect your GitHub repository to Zenodo for automatic DOI assignment on releases.

Acknowledgments

This template was inspired by:
