Process

Matt Dray edited this page Feb 9, 2026 · 5 revisions

Caution

This wiki is in development. It's incomplete and content may change without notice. Contact the Data Science team members named in the CODEOWNERS file for details.

Caution

This guidance is brief and under construction. The existing ATS table (referred to below) contains only tagged runs, but this isn't optimal. We plan to develop an Azure Table Storage table for all model runs, with columns that flag run-stage and sites. We also plan to add run-stage and site metadata to entities in that table through a terminal user interface (TUI), which is currently in the concept repo (nhp_tag_runs_tui).

Where

At the time of writing, you'll need to tag runs in more than one location:

  • Azure Table Storage (ATS)
  • Results files directly
    • Old-style results, i.e. stored as a single *.json.gz
    • New-style results, i.e. stored in a directory of parquets and params.json

The ATS table was designed to be the single source of truth and to supersede the need to add metadata to the literal *.json.gz and params.json files.

Repos like this one (nhp_tagged_runs), nhp_tagged_runs_params and nhp_compare_mitigation_predictions_app are now dependent on the content of the ATS table.

Others, like nhp_output_reports, still depend on the old-style results having run-stage metadata on them (although there's a PR to correct this). Eventually, all repos that need this information should read it from the ATS table.

I (MD) don't think any projects are actually gleaning run-stage information from the params.json in the new-style results directories. Run-stage tags on new-style results have, however, been kept consistent with those on the old-style results. This is mostly for consistency, and because we expect to switch to using new-style results exclusively in our products (though an ATS solution would supersede the need to tag new-style results too). Ask MD for more information.

How

ATS

You can update the relevant ATS table programmatically: find the metadata for the specific run to be tagged, then add or update that entity (row) in the table, using its unique RowKey value. Python is the better option for this because R's {AzureStor} package doesn't have methods for tables.

There are some functions you may find useful in nhp_tag_runs_tui for this task (access the table, fetch all scenarios, update an entity).
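The fetch-then-merge step above can be sketched in Python. This is a minimal, hedged sketch: it assumes the azure-data-tables package for the real client, and the column names (`RunStage`, `Sites`) and key values shown are illustrative, not the actual table schema — check the ATS table or the nhp_tag_runs_tui functions for the real names.

```python
def tag_run_entity(table_client, partition_key, row_key, run_stage, sites):
    """Fetch the entity (row) for one run and merge in run-stage/site tags.

    `table_client` is expected to behave like azure.data.tables.TableClient.
    Column names here ("RunStage", "Sites") are assumptions, not the real schema.
    """
    entity = table_client.get_entity(partition_key=partition_key, row_key=row_key)
    entity["RunStage"] = run_stage
    entity["Sites"] = ",".join(sites)
    # A merge-style update keeps any existing columns we didn't touch
    table_client.update_entity(entity, mode="merge")
    return entity

# Usage with a real client (requires the azure-data-tables package;
# connection string, table name, and keys below are placeholders):
# from azure.data.tables import TableClient
# client = TableClient.from_connection_string(conn_str, table_name="runs")
# tag_run_entity(client, "model_runs", "some-row-key", "final", ["RXX"])
```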

You can also use the web interface or Microsoft Azure Storage Explorer to add entities to the table, but this is prone to error.

Results files

See the 'Process (superseded)' section of this wiki for how to tag a *.json.gz file containing old-style results. The process is similar for new-style results, except it's the params.json within each results directory that you'll need to tag.