About

Caution

This wiki is in development. It's incomplete and content may change without notice. Contact the Data Science team members named in the CODEOWNERS file for details.

Purpose

Some of our New Hospital Programme (NHP) products use parameters and results from specific model scenarios. The output reports repository relies on them in particular, as do tools like the Compare NHP Activity Mitigation Predictions app. We label these special scenarios with a 'run stage' tag to flag them for use.

This wiki contains information about these tags and how to perform the tagging process.

Tag set

Tags are in the format 'final_report_ndg2'. This example identifies a scenario whose results are to be used in a 'final' outputs report, where the scheme selected non-demographic variant (NDG) 2 from the available parameters.

Note

More tags may be added or retired in future and this wiki may not reflect the current state.

The main tags in use are:

final_report_ndg2 (primary scenario for outputs reports)
final_report_ndg3 (secondary 'comparison' scenario for outputs reports)
validation_report_ndg2 (primary scenario for validation reports)
validation_report_ndg3 (secondary 'comparison' scenario for validation reports)

'Retired' main tags are:

final_report_ndg1 (original 'comparison' scenario for outputs reports, superseded by the NDG3 variant)

Warning

The main tags above are integral to the reporting process. Do not amend them unless directed by a scheme via a model relationship manager (MRM), or to correct a mistake.

Optional tags include:

initial_ndg* (schemes' first-recognised runs)
intermediate_ndg* (scenarios tagged as a 'savepoint' for reference, but not used in reporting)

Every scheme should eventually have scenarios labelled with the main tags (or retired tags). Optional tags should be considered ephemeral and are mainly used for ad hoc work.
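The tag structure described above can be checked in code. The following is a minimal sketch, not code from any NHP repository: the tag names are copied from the lists above, while `parse_run_stage`, `RunStage` and the `status` labels are hypothetical names introduced for illustration. It assumes the `*` in the optional tags stands for a numeric NDG variant.

```python
import re
from typing import NamedTuple

# Tag names copied from the lists above; grouping them into sets is illustrative.
MAIN_TAGS = {
    "final_report_ndg2",
    "final_report_ndg3",
    "validation_report_ndg2",
    "validation_report_ndg3",
}
RETIRED_TAGS = {"final_report_ndg1"}
OPTIONAL_STAGES = {"initial", "intermediate"}

# Assumption: a tag is '<stage>_ndg<number>', e.g. 'final_report_ndg2'.
_TAG_PATTERN = re.compile(r"^(?P<stage>[a-z_]+)_ndg(?P<variant>\d+)$")


class RunStage(NamedTuple):
    stage: str        # e.g. 'final_report'
    ndg_variant: int  # e.g. 2
    status: str       # 'main', 'retired', 'optional' or 'unrecognised'


def parse_run_stage(tag: str) -> RunStage:
    """Split a run stage tag into its parts and classify it."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Not a run stage tag: {tag!r}")
    stage = match["stage"]
    variant = int(match["variant"])
    if tag in MAIN_TAGS:
        status = "main"
    elif tag in RETIRED_TAGS:
        status = "retired"
    elif stage in OPTIONAL_STAGES:
        status = "optional"
    else:
        status = "unrecognised"
    return RunStage(stage, variant, status)
```

Because the tag set may change, a helper like this would need its sets kept in step with the tags actually in use.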

Storage

Location

Currently, we store a lookup table of these tagged scenarios in Azure Table Storage (ATS). There's one row ('entity' in ATS parlance) per tagged scenario. Columns contain useful metadata, such as the scenario name, create datetime, run stage and the path from which the results blobs can be downloaded. The table also has columns for site codes split by activity type, which are important for filtering results data when reporting.

Note

The active ATS lookup table of tagged runs contains only scenarios that have a tag. In future, we may switch to a table that contains all scenarios in our results container.

Originally, we tagged scenario results files directly in the Azure blob container. This was error-prone and meant tagging multiple files per scenario, because we have both older-style zipped results files and newer-style results directories. An ATS table lets us handle tagging in one location only.

Note

At time of writing, the ATS table is the canonical source for tags. Active and new products should consult the table. Some older products may still contain code that consults results-file metadata directly (nhp_output_adhocs, for example).

ATS lookups are faster than crawling all the scenario results files for metadata. Queries can also be run server-side, so only matching entities are returned. Reading an ATS table also limits the number of requests we need to make to Azure.

Dictionary

Each ATS table entity (row) is a scenario. Columns are:

PartitionKey: scheme code
RowKey: unique, concatenated scenario name and create datetime
Timestamp: created automatically on upload
scenario: the scenario name
create_datetime: in the form YYYYMMDD_HHMMSS
run_stage: 'final_report_ndg2', etc
app_version: the model version that the scenario was run on, e.g. 'v4.2'
results_dir: path to the directory containing aggregated results (parquets + params.json), which currently exists only for runs on v3.1 or later
results_file: path to the old-style results file (a single zipped JSON)
sites_ip, sites_op, sites_aae: comma-delimited set of site codes that the scheme has specified for their reports (e.g. 'XYZ1,XYZ2'; can also be 'ALL' for all sites, or 'unknown')
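As an illustration of how the columns above might be consumed, here is a minimal sketch that converts one entity (as a plain dictionary) into a typed record. `TaggedScenario`, `parse_sites` and `scenario_from_entity` are hypothetical names, not part of any NHP package. Two choices are assumptions: preferring `results_dir` over `results_file` when both are present, and passing the special values 'ALL' and 'unknown' through unchanged so the caller decides how to handle them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class TaggedScenario:
    scheme: str
    scenario: str
    created: datetime
    run_stage: str
    app_version: str
    results_path: Optional[str]
    sites: Dict[str, Tuple[str, ...]]  # keyed by 'ip', 'op', 'aae'


def parse_sites(raw: str) -> Tuple[str, ...]:
    """Split a comma-delimited site string into a tuple of codes.

    'ALL' and 'unknown' come back as one-item tuples; interpreting
    them is left to the caller.
    """
    return tuple(code.strip() for code in raw.split(",") if code.strip())


def scenario_from_entity(entity: Mapping[str, Any]) -> TaggedScenario:
    """Build a TaggedScenario from a dict with the columns listed above."""
    # Assumption: prefer the newer results directory when it is present.
    results_path = entity.get("results_dir") or entity.get("results_file") or None
    return TaggedScenario(
        scheme=entity["PartitionKey"],
        scenario=entity["scenario"],
        # create_datetime is documented as YYYYMMDD_HHMMSS.
        created=datetime.strptime(entity["create_datetime"], "%Y%m%d_%H%M%S"),
        run_stage=entity["run_stage"],
        app_version=entity["app_version"],
        results_path=results_path,
        sites={
            kind: parse_sites(entity.get(f"sites_{kind}", "unknown"))
            for kind in ("ip", "op", "aae")
        },
    )
```

The sketch does not read RowKey, because the exact way the scenario name and create datetime are concatenated is not specified here.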

Access

Deployed table

This repository, nhp_tagged_runs, contains a document that, when rendered on schedule, grabs a snapshot of the ATS table and presents it in a deployed interactive lookup table (login/permissions required). Note that this:

is intended primarily as a lookup tool for the Data Science team
may not reflect the current up-to-the-moment 'truth'
is not intended for purposes of governance
does not avoid the need for schemes and MRMs to keep their own records

The table contains a link to the outputs app for each scenario so you can see visualisations of the results and download the data.

Azure

If authorised and logged in, developers can view the ATS lookup table directly through the web interface or Microsoft Azure Storage Explorer. You can also work programmatically with the table via the Azure Tables client library in Python, or httr2 in R to query the table API (there is no dedicated ATS package for R).

Note

Please ask a member of the Data Science team for the location and name of this table, if you are authorised to access it.
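For programmatic access from Python, a server-side query might look like the sketch below. It uses the `azure-data-tables` package (`TableClient.from_connection_string` and `query_entities` with a parameterised filter). The function names `build_filter` and `fetch_tagged_scenarios` are hypothetical, and authenticating with a connection string is an assumption; your setup may use a different credential. The table name is deliberately left as an argument.

```python
from typing import Any, Dict, Iterator, Tuple


def build_filter(scheme: str, run_stage: str) -> Tuple[str, Dict[str, str]]:
    """Return an OData filter and its parameters for one scheme and tag.

    PartitionKey holds the scheme code and run_stage holds the tag,
    as described in the Dictionary section.
    """
    query_filter = "PartitionKey eq @scheme and run_stage eq @run_stage"
    return query_filter, {"scheme": scheme, "run_stage": run_stage}


def fetch_tagged_scenarios(
    conn_str: str, table_name: str, scheme: str, run_stage: str
) -> Iterator[Dict[str, Any]]:
    """Yield matching entities as plain dicts; filtering happens server-side."""
    # Imported here so build_filter stays usable without the SDK installed.
    # Install with: pip install azure-data-tables
    from azure.data.tables import TableClient

    query_filter, parameters = build_filter(scheme, run_stage)
    with TableClient.from_connection_string(conn_str, table_name=table_name) as client:
        for entity in client.query_entities(query_filter, parameters=parameters):
            yield dict(entity)
```

Calling `list(fetch_tagged_scenarios(conn_str, table_name, "XYZ", "final_report_ndg2"))` would then return the tagged scenarios for a scheme with the (made-up) code 'XYZ'. Passing values through `parameters` rather than formatting them into the filter string avoids quoting mistakes.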
