PyRenew Rethinking #645

@cdc-mitzimorris

I asked claude.ai to review the PyRenew codebase and the elements of PyRenew currently used by pyrenew-hew, with an eye towards adding the observation process, latent infections process, and model builder components developed in https://github.com/cdcent/cfa-pyrenew-hierarchical (which is really just a staging repo for things to be added to PyRenew). Its response follows.

PyRenew Redesign Proposal

Executive Summary

Based on analysis of PyRenew, cfa-pyrenew-hierarchical, and pyrenew-hew, the core value of PyRenew lies in:

  1. RandomVariable metaclass - foundational abstraction
  2. convolve.py utilities - essential for renewal math
  3. Transformation system - clean parameter constraints
  4. Time utilities - date/MMWR week handling

Much of the current API is either unused, superseded by newer patterns, or overly specialized.
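
For readers less familiar with the "renewal math" referred to above, the discrete renewal equation computes new infections as I(t) = R(t) * sum_s g(s) * I(t - s), where g is the generation interval PMF. The following is a minimal pure-Python sketch of that recursion, not PyRenew's implementation; the function name `renewal_infections` and its argument layout are assumptions made for illustration.

```python
from typing import Sequence


def renewal_infections(
    rt: Sequence[float],
    gen_int_pmf: Sequence[float],
    init_infections: Sequence[float],
) -> list[float]:
    """Discrete renewal equation: I(t) = R(t) * sum_s g(s) * I(t - s).

    gen_int_pmf[0] is the probability of a generation interval of one step.
    init_infections holds one value per lag, ordered oldest first.
    """
    n_lags = len(gen_int_pmf)
    if len(init_infections) != n_lags:
        raise ValueError("need one initial infection value per lag")
    history = list(init_infections)
    new_infections = []
    for r in rt:
        recent = history[-n_lags:]
        # Lag 1 pairs with the most recent value, lag 2 with the one before.
        force = sum(g * i for g, i in zip(gen_int_pmf, reversed(recent)))
        infections_t = r * force
        history.append(infections_t)
        new_infections.append(infections_t)
    return new_infections
```

With `rt=[2.0, 1.0]`, a two-step PMF `[0.5, 0.5]`, and initial infections `[10.0, 10.0]`, the first step sees a force of infection of 10 and the second a force of 15.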


Part 1: Modules & Classes to Simplify or Remove

Tier 1: DEPRECATE (superseded by new observation/latent patterns)

| Module | Class | Reason |
| --- | --- | --- |
| pyrenew/model/ | RtInfectionsRenewalModel | Monolithic; replaced by composable latent + observation pattern |
| pyrenew/model/ | HospitalAdmissionsModel | Specialized model that should be a composition of generic components |
| pyrenew/latent/ | HospitalAdmissions | Should be a Counts observation process, not a latent component |
| pyrenew/observation/ | PoissonObservation | Replaced by Counts + PoissonNoise |
| pyrenew/observation/ | NegativeBinomialObservation | Replaced by Counts + NegativeBinomialNoise |

Tier 2: SIMPLIFY (overly complex or redundant)

| Module | Class | Issue |
| --- | --- | --- |
| pyrenew/process/ | RtPeriodicDiffARProcess | Too specialized; should be composed from simpler pieces |
| pyrenew/process/ | RtWeeklyDiffARProcess | Convenience wrapper that hides composition pattern |
| pyrenew/latent/ | InfectionInitializationProcess + 3 methods | Complex abstraction; cfa-pyrenew-hierarchical uses simpler prevalence-based init |
| pyrenew/deterministic/ | DeterministicPMF | Keep but consider merging with DeterministicVariable |

Tier 3: KEEP (core value)

| Module | Class/Function | Why Essential |
| --- | --- | --- |
| pyrenew/metaclass.py | RandomVariable | Foundation of entire system |
| pyrenew/metaclass.py | Model | MCMC integration layer |
| pyrenew/convolve.py | new_convolve_scanner, compute_delay_ascertained_incidence | Core renewal math |
| pyrenew/randomvariable/ | DistributionalVariable, TransformedVariable | Heavily used wrappers |
| pyrenew/randomvariable/ | Hierarchical priors (HierarchicalNormalPrior, etc.) | New, well-designed |
| pyrenew/process/ | ARProcess, DifferencedProcess, IIDRandomSequence | Building blocks |
| pyrenew/process/ | RandomWalk, PeriodicEffect | Common patterns |
| pyrenew/latent/ | Infections, InfectionsWithFeedback | Core renewal equation |
| pyrenew/observation/ | BaseObservationProcess, Counts, CountsBySite, Measurements | New generic framework |
| pyrenew/observation/ | Noise models (PoissonNoise, NegativeBinomialNoise, HierarchicalNormalNoise) | Composable noise |
| pyrenew/transformation/ | All | Clean, useful |
| pyrenew/time.py | All | Essential utilities |
| pyrenew/distutil.py | PMF validation | Essential utilities |
| pyrenew/datasets/ | Reference distributions | Useful defaults |

Part 2: Staged Redesign Plan

Stage 1: Consolidate Observation Framework

Goal: Make the new generic observation pattern the primary API

  1. Mark PoissonObservation and NegativeBinomialObservation as deprecated
  2. Update all examples to use Counts + noise model pattern
  3. Move HospitalAdmissions out of latent/ (it's an observation, not latent)
  4. Ensure BaseObservationProcess has clear documentation
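
To make the "Counts + noise model" pattern concrete, here is a minimal pure-Python sketch of the idea: a counts observation process owns the ascertainment rate and delay distribution, and delegates the likelihood to an interchangeable noise object. Every class name here (`CountNoise`, `PoissonNoiseSketch`, `NegativeBinomialNoiseSketch`, `CountsSketch`) and every method signature is hypothetical; none of this is the actual PyRenew or cfa-pyrenew-hierarchical API.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence


class CountNoise(Protocol):
    """Anything that can score an observed count against an expected count."""

    def log_prob(self, observed: int, expected: float) -> float: ...


@dataclass(frozen=True)
class PoissonNoiseSketch:
    def log_prob(self, observed: int, expected: float) -> float:
        return observed * math.log(expected) - expected - math.lgamma(observed + 1)


@dataclass(frozen=True)
class NegativeBinomialNoiseSketch:
    concentration: float  # larger values approach the Poisson limit

    def log_prob(self, observed: int, expected: float) -> float:
        k, mu, phi = observed, expected, self.concentration
        return (
            math.lgamma(k + phi) - math.lgamma(phi) - math.lgamma(k + 1)
            + phi * math.log(phi / (phi + mu))
            + k * math.log(mu / (phi + mu))
        )


@dataclass(frozen=True)
class CountsSketch:
    ascertainment_rate: float
    delay_pmf: Sequence[float]  # delay_pmf[d] = P(observed d steps after infection)
    noise: CountNoise

    def expected(self, infections: Sequence[float]) -> list[float]:
        """Delay-convolved, ascertained incidence for each time point."""
        out = []
        for t in range(len(infections)):
            total = sum(
                p * infections[t - d]
                for d, p in enumerate(self.delay_pmf)
                if t - d >= 0
            )
            out.append(self.ascertainment_rate * total)
        return out

    def log_likelihood(self, infections: Sequence[float], observed: Sequence[int]) -> float:
        # Time points with zero expected count are skipped to avoid log(0).
        return sum(
            self.noise.log_prob(k, mu)
            for k, mu in zip(observed, self.expected(infections))
            if mu > 0
        )
```

Swapping `PoissonNoiseSketch()` for `NegativeBinomialNoiseSketch(concentration=...)` changes the likelihood without touching the delay or ascertainment logic, which is the point of separating the two.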

Stage 2: Integrate Latent Infection Processes from cfa-pyrenew-hierarchical

Goal: Add hierarchical and partitioned infection patterns

  1. Add HierarchicalInfections - multi-subpopulation renewal with Rt deviations
  2. Add PartitionedInfections - single renewal with allocation to subpopulations
  3. Add protocol-based TemporalProcess for Rt dynamics (RandomWalk, AR1, DifferencedAR1)
  4. Standardize four-tuple output: (infections_juris, infections_all, infections_obs, infections_unobs)
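
As an illustration of what "protocol-based" could mean for item 3, the sketch below uses `typing.Protocol` so that any object with a matching `sample()` method can drive Rt, with no shared base class required. The method signature, the class names `RandomWalkSketch` and `AR1Sketch`, and the helper `log_rt_path` are assumptions; innovations are passed in explicitly to keep the example deterministic, whereas a real implementation would presumably sample them.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class TemporalProcess(Protocol):
    """Structural interface: any object with a sample() method qualifies."""

    def sample(self, innovations: Sequence[float]) -> list[float]: ...


class RandomWalkSketch:
    def __init__(self, init: float = 0.0) -> None:
        self.init = init

    def sample(self, innovations: Sequence[float]) -> list[float]:
        path, x = [], self.init
        for eps in innovations:
            x = x + eps  # x_t = x_{t-1} + eps_t
            path.append(x)
        return path


class AR1Sketch:
    def __init__(self, autoreg: float, init: float = 0.0) -> None:
        self.autoreg = autoreg
        self.init = init

    def sample(self, innovations: Sequence[float]) -> list[float]:
        path, x = [], self.init
        for eps in innovations:
            x = self.autoreg * x + eps  # x_t = autoreg * x_{t-1} + eps_t
            path.append(x)
        return path


def log_rt_path(
    process: TemporalProcess, baseline: float, innovations: Sequence[float]
) -> list[float]:
    """Add process deviations to a baseline log Rt."""
    if not isinstance(process, TemporalProcess):
        raise TypeError("process must provide a sample() method")
    return [baseline + dev for dev in process.sample(innovations)]
```

Note that `runtime_checkable` only checks that a `sample` attribute exists, not its signature, so static type checking remains the main safeguard.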

Stage 3: Add Model Composition Layer

Goal: Make multi-signal model building easy

  1. Add ModelBuilder that:
    - Automatically computes n_initialization_points from all components
    - Routes infections to observations based on infection_resolution()
    - Validates component compatibility at build time
  2. Add MultiSignalModel that orchestrates latent + multiple observations
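
The three builder responsibilities listed above can be sketched as follows. This is a toy under stated assumptions, not the ModelBuilder from cfa-pyrenew-hierarchical: the rule that `n_initialization_points` is the maximum history length any component needs, the resolution labels `"aggregate"` and `"subpop"`, and all class and field names are hypothetical. Only the method name `infection_resolution()` comes from the proposal.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObservationSketch:
    name: str
    n_delay_points: int  # infection history this signal needs
    resolution: str      # which infection series this signal consumes

    def infection_resolution(self) -> str:
        return self.resolution


@dataclass
class ModelBuilderSketch:
    n_generation_interval_points: int
    supported_resolutions: frozenset = frozenset({"aggregate", "subpop"})
    observations: list = field(default_factory=list)

    def add_observation(self, obs: ObservationSketch) -> "ModelBuilderSketch":
        self.observations.append(obs)
        return self

    def n_initialization_points(self) -> int:
        # Assumed rule: the longest lookback among all components.
        lookbacks = [self.n_generation_interval_points]
        lookbacks += [o.n_delay_points for o in self.observations]
        return max(lookbacks)

    def validate(self) -> list[str]:
        """Collect every compatibility problem instead of stopping at the first."""
        problems = []
        if not self.observations:
            problems.append("no observation processes added")
        names = [o.name for o in self.observations]
        for name in sorted({n for n in names if names.count(n) > 1}):
            problems.append(f"duplicate observation name: {name}")
        for o in self.observations:
            if o.infection_resolution() not in self.supported_resolutions:
                problems.append(
                    f"{o.name}: unsupported resolution {o.infection_resolution()!r}"
                )
        return problems

    def build(self) -> dict:
        problems = self.validate()
        if problems:
            raise ValueError("; ".join(problems))
        return {
            "n_initialization_points": self.n_initialization_points(),
            "routing": {o.name: o.infection_resolution() for o in self.observations},
        }
```

Failing in `build()` rather than at sample time means an incompatible component is reported before any inference is attempted.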

Stage 4: Deprecate Monolithic Models

Goal: Remove pre-composed models that hide composition

  1. Deprecate RtInfectionsRenewalModel with migration guide
  2. Deprecate HospitalAdmissionsModel with migration guide
  3. Keep for 1-2 versions with deprecation warnings
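
One common way to keep a class alive for a release or two while steering users away from it is a thin subclass that emits a `DeprecationWarning` from the standard `warnings` module. The sketch below shows the mechanism only; `ComposedModelSketch` and `LegacyModelSketch` are hypothetical stand-ins, not PyRenew classes, and the warning text is illustrative.

```python
import warnings


class ComposedModelSketch:
    """Stand-in for a model assembled from latent + observation components."""

    def __init__(self, latent: str, observation: str) -> None:
        self.latent = latent
        self.observation = observation


class LegacyModelSketch(ComposedModelSketch):
    """Stand-in for a monolithic model kept temporarily for compatibility."""

    def __init__(self, *args, **kwargs) -> None:
        warnings.warn(
            "LegacyModelSketch is deprecated; compose latent and observation "
            "components instead (see the migration guide).",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this __init__
        )
        super().__init__(*args, **kwargs)


# Usage: the old entry point still works but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = LegacyModelSketch(latent="renewal", observation="counts")
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so the migration guide should also be linked from the release notes.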

Stage 5: Simplify Process Classes

Goal: Remove over-specialized Rt processes

  1. Deprecate RtPeriodicDiffARProcess and RtWeeklyDiffARProcess
  2. Show composition pattern: DifferencedProcess(ARProcess(...)) + PeriodicEffect(...)
  3. Simplify infection initialization to prevalence-based approach
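
The composition in item 2 can be spelled out numerically: an AR(1) process on the first differences of log Rt, integrated back to levels, plus a repeating periodic offset. The functions below are a plain-Python sketch of that arithmetic with hypothetical names (`ar1`, `undifference`, `periodic_effect`, `composed_log_rt`); they are not the PyRenew classes, and innovations are passed in rather than sampled.

```python
from itertools import accumulate
from typing import Sequence


def ar1(innovations: Sequence[float], autoreg: float, init: float = 0.0) -> list[float]:
    """AR(1) path: x_t = autoreg * x_{t-1} + eps_t."""
    path, x = [], init
    for eps in innovations:
        x = autoreg * x + eps
        path.append(x)
    return path


def undifference(diffs: Sequence[float], init: float) -> list[float]:
    """Integrate first differences; the result has len(diffs) + 1 entries."""
    return list(accumulate(diffs, initial=init))


def periodic_effect(offsets: Sequence[float], n_timepoints: int) -> list[float]:
    """Repeat one period of offsets until n_timepoints values are produced."""
    return [offsets[t % len(offsets)] for t in range(n_timepoints)]


def composed_log_rt(
    innovations: Sequence[float],
    autoreg: float,
    init_log_rt: float,
    offsets: Sequence[float],
) -> list[float]:
    """Differenced AR(1) trend plus a periodic effect, on the log scale."""
    trend = undifference(ar1(innovations, autoreg), init_log_rt)
    effect = periodic_effect(offsets, len(trend))
    return [a + b for a, b in zip(trend, effect)]
```

For example, innovations `[1.0, 0.0, 0.0]` with `autoreg=0.5` give differences `[1.0, 0.5, 0.25]`, a trend of `[0.0, 1.0, 1.5, 1.75]` from an initial value of 0, and the periodic offsets are then added element-wise.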

Part 3: Tutorial Recommendations

DROP (no longer relevant or duplicated)

| Tutorial | Reason |
| --- | --- |
| extending_pyrenew.md | Duplicates custom_randomvariables.md |
| extending_pyrenew-gfm.md | Same content, different format |

KEEP & UPDATE

| Tutorial | Updates Needed |
| --- | --- |
| getting_started.md | Good foundation; add forward reference to generic observation framework |
| basic_renewal_model.md | Update to show composition pattern instead of RtInfectionsRenewalModel |
| custom_randomvariables.md | Good; rename to "Extending PyRenew" after dropping duplicate |
| observation_processes_counts.md | Excellent; already aligned with new patterns |
| observation_processes_measurements.md | Excellent; shows Wastewater subclass pattern |
| day_of_the_week.md | Keep; practical feature |
| periodic_effects.md | Keep; useful for seasonality |

UPDATE SIGNIFICANTLY

| Tutorial | Changes |
| --- | --- |
| hospital_admissions_model.md | Rewrite to use Counts observation + generic latent, not HospitalAdmissionsModel |

ADD (missing critical content)

| New Tutorial | Content |
| --- | --- |
| Multi-Signal Models | How to combine hospital + wastewater + ED in one model; use ModelBuilder pattern |
| Hierarchical Infections | Multi-subpopulation modeling with HierarchicalInfections; partial pooling |
| Semi-Observed Models | Wastewater covers part of population; unobserved subpopulation inference |

Part 4: Missing Documentation

README Gaps

  1. No clear "when to use PyRenew" - needs positioning vs. EpiNow2, epidemia, etc.
  2. No quick-start code example - just links to tutorials
  3. No general architecture diagram - the existing mermaid chart is too specific (HospitalAdmissions only)
  4. No component catalog - hard to discover what's available
  5. Missing: installation troubleshooting for JAX/NumPyro issues

API Documentation Gaps

  1. No docstrings on many classes - especially newer observation processes
  2. No type hints in many places
  3. No usage examples in docstrings
  4. Inconsistent validate() documentation - when is it called? what should it check?

Conceptual Documentation Gaps

  1. No "how PyRenew thinks" guide - the layer model (latent → observation → composition)
  2. No glossary - terms like "ascertainment", "generation interval", "infection resolution"
  3. No decision tree - "which component should I use for X?"
  4. No performance guide - when to use scan vs vectorized, JAX tracing tips

Part 5: API Clarity Improvements

Interface Simplification

  1. Standardize return types: All latent processes should return named tuples, not raw arrays
  2. Standardize sample() signatures: Consistent parameter names across similar components
  3. Add __repr__ methods: Make debugging easier
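
A small sketch of the first and third points, using only the standard library: a `NamedTuple` gives named access while staying unpackable like a raw tuple, and a dataclass supplies a readable `__repr__` automatically. The four field names are taken from the Stage 2 four-tuple; the class names `InfectionsOutput` and `PrevalenceInitSketch`, and the placeholder values, are hypothetical.

```python
from dataclasses import dataclass
from typing import NamedTuple


class InfectionsOutput(NamedTuple):
    """Named version of the four-tuple; values here are placeholders."""

    infections_juris: tuple
    infections_all: tuple
    infections_obs: tuple
    infections_unobs: tuple


@dataclass(frozen=True)
class PrevalenceInitSketch:
    """Dataclasses generate __repr__ from their fields."""

    name: str
    initial_prevalence: float


out = InfectionsOutput(
    infections_juris=(1.0,),
    infections_all=(2.0,),
    infections_obs=(3.0,),
    infections_unobs=(4.0,),
)
init = PrevalenceInitSketch(name="I0", initial_prevalence=0.01)
```

Callers can then write `out.infections_obs` instead of remembering that it is element 2, and `repr(init)` shows the component's configuration in a traceback or notebook.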

Error Messages

  1. Add validation at construction time, not just sample time
  2. Clear error messages when components are incompatible
  3. Warnings when using deprecated patterns
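
As an example of validating at construction time, the sketch below checks a delay PMF in a dataclass `__post_init__`, so a bad input fails when the component is created rather than later during sampling. `DelayPMFSketch`, `try_build`, the tolerance, and the message wording are all assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DelayPMFSketch:
    name: str
    probs: Tuple[float, ...]

    def __post_init__(self) -> None:
        # Runs immediately after __init__, before the object can be used.
        if len(self.probs) == 0:
            raise ValueError(f"{self.name}: PMF must have at least one entry")
        if any(p < 0 for p in self.probs):
            raise ValueError(f"{self.name}: PMF entries must be non-negative")
        total = math.fsum(self.probs)
        if not math.isclose(total, 1.0, abs_tol=1e-6):
            raise ValueError(f"{self.name}: PMF sums to {total:.6g}, expected 1")


def try_build(name: str, probs: Sequence[float]) -> Optional[str]:
    """Return the validation error message, or None if construction succeeds."""
    try:
        DelayPMFSketch(name, tuple(probs))
    except ValueError as err:
        return str(err)
    return None
```

Including the component name in each message tells the user which of several PMFs in a multi-signal model is at fault.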

Part 6: Recommended Package Structure (Post-Refactor)

pyrenew/
├── core/
│   ├── metaclass.py      # RandomVariable, Model
│   ├── convolve.py       # Convolution utilities
│   └── math.py           # Mathematical utilities
├── variables/
│   ├── distributional.py # DistributionalVariable
│   ├── transformed.py    # TransformedVariable
│   ├── deterministic.py  # DeterministicVariable, DeterministicPMF
│   └── hierarchical.py   # HierarchicalNormalPrior, etc.
├── processes/
│   ├── temporal.py       # TemporalProcess protocol, AR, RandomWalk
│   ├── periodic.py       # PeriodicEffect, DayOfWeekEffect
│   └── differenced.py    # DifferencedProcess
├── latent/
│   ├── infections.py     # Infections, InfectionsWithFeedback
│   ├── hierarchical.py   # HierarchicalInfections
│   └── partitioned.py    # PartitionedInfections
├── observations/
│   ├── base.py           # BaseObservationProcess
│   ├── counts.py         # Counts, CountsBySite
│   ├── measurements.py   # Measurements (base for continuous)
│   ├── wastewater.py     # Wastewater-specific
│   └── noise.py          # All noise models
├── model/
│   ├── builder.py        # ModelBuilder
│   └── multisignal.py    # MultiSignalModel
├── utils/
│   ├── time.py           # Date/time utilities
│   ├── arrays.py         # Array utilities
│   └── validation.py     # PMF validation, etc.
├── transforms/           # Transformations
├── datasets/             # Reference distributions
└── deprecated/           # Old models with deprecation warnings
    ├── rtinfections.py   # RtInfectionsRenewalModel
    └── admissions.py     # HospitalAdmissionsModel


Summary: Priority Actions

  1. Immediate: Consolidate duplicate tutorials, update README with architecture overview
  2. Short-term: Add deprecation warnings to old observation classes; document new Counts/Measurements as primary
  3. Medium-term: Integrate HierarchicalInfections and PartitionedInfections from cfa-pyrenew-hierarchical
  4. Medium-term: Add ModelBuilder for multi-signal model composition
  5. Long-term: Deprecate monolithic models; reorganize package structure
