Variability vs Uncertainty

When a parameter is reported in a paper, it is usually presented as a central estimate together with an associated parameter indicating its “spread”. This associated parameter may describe the uncertainty of the parameter or its variability. These terms are sometimes used interchangeably, but this is a mistake, as they refer to two distinct concepts. The REDCap extraction form includes separate sections for uncertainty and variability. This page seeks to clarify exactly how such associated parameters should be extracted.

Central Estimate

When extracting a parameter in REDCap, the first section of the form captures the central estimate of the parameter in question. A central estimate might be a mean, a median, or some other type. This is recorded in the “Parameter value type” field, in yellow below. In most cases, where there is a single value for the central estimate, this value is entered in the field marked in red below. If, however, three or more estimates are given for the parameter, perhaps disaggregated by location, age, time period, etc., we do not extract each estimate as a separate parameter. Instead, we report the minimum and maximum of these values in the parameter range field marked in pink below. This is referred to as the “rule of three” and is further described in the “Parameter Data” section of the wiki. (A later section in the form titled “Disaggregation” allows you to detail what the parameters are disaggregated by.) Always check with your pathogen lead whether any exceptions to this rule are in place for a specific pathogen.
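The rule of three described above can be sketched in a few lines of Python. This is only an illustration of the decision, not part of REDCap: the function name and the dictionary keys (`value`, `lower_bound`, `upper_bound`) are hypothetical and do not correspond to actual form field names.

```python
from typing import Dict, List, Optional, Sequence


def apply_rule_of_three(estimates: Sequence[float]) -> List[Dict[str, Optional[float]]]:
    """Return the records to extract for a set of reported central estimates.

    Hypothetical helper: the keys below are illustrative, not REDCap field names.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if len(estimates) >= 3:
        # Three or more disaggregated estimates: extract one record holding
        # only the minimum and maximum of the reported values.
        return [{"value": None,
                 "lower_bound": min(estimates),
                 "upper_bound": max(estimates)}]
    # One or two estimates: each is extracted as its own parameter.
    return [{"value": v, "lower_bound": None, "upper_bound": None}
            for v in estimates]


# Made-up values, e.g. one estimate per age group.
records = apply_rule_of_three([2.0, 5.5, 3.1])
print(records)
```

Pathogen-specific exceptions to the rule are not modelled here.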

Lastly, in the field marked in green below, you report whether this parameter is an “Observed sample statistic” (as is often the case for naive case fatality ratios or infectious periods, for example) or an “Estimated parameter” (for example, a model-fit estimate of the reproduction number).

Central_example

Uncertainty

The next section in the parameter extraction page provides fields to extract the uncertainty in the parameter. Uncertainty captures, as the name suggests, the extent to which we are confident in the central value extracted above. In general, therefore, we would only expect to extract uncertainty for parameters that were marked as an “estimated parameter” in the field marked in green above.

For example, a study that reports on time spent in hospital may say that the mean time spent in hospital was 7 days, with a range of 1-15 days across all patients. We have full certainty in these values: each data point is an observed, lived experience. The range therefore does not describe the uncertainty of the estimate, but rather its variability, that is, how much the value varies across settings or patients.

However, if a paper fits a model to obtain a mean R0 of 1.7 (95% CrI 1.3-1.8), the situation is different: the reproduction number is not a directly measurable, observable quantity. It is an “estimated parameter”, and our confidence in that value will improve as more data become available. We would extract the 95% CrI as an uncertainty, as shown below.

Uncertainty_example

A more subtle example: a study may report that the mean incubation period is 5 days, with a standard error of 1.2 days. In this instance, the standard error expresses how confident we are that the reported mean reflects the true population mean, given the sample of data gathered. Thus, the standard error is a measure of uncertainty, even though the underlying statistic is computed from observed data.
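The difference between a standard deviation (variability) and a standard error (uncertainty) can be seen numerically. The sketch below uses made-up incubation periods, not data from any paper: the standard deviation stays roughly the same as the sample grows, while the standard error of the mean shrinks.

```python
import math
import statistics
from typing import Dict, Sequence


def spread_summary(sample: Sequence[float]) -> Dict[str, float]:
    """Return the mean, sample standard deviation and standard error of the mean."""
    n = len(sample)
    sd = statistics.stdev(sample)   # variability: spread of individual observations
    se = sd / math.sqrt(n)          # uncertainty: precision of the estimated mean
    return {"mean": statistics.mean(sample), "sd": sd, "se": se}


# Hypothetical incubation periods in days (illustrative only).
small_sample = [3, 4, 5, 6, 7]
large_sample = small_sample * 20  # same spread, twenty times as many observations

small = spread_summary(small_sample)
large = spread_summary(large_sample)

print(f"n=5:   sd={small['sd']:.2f}  se={small['se']:.2f}")
print(f"n=100: sd={large['sd']:.2f}  se={large['se']:.2f}")
```

More data makes us more confident in the mean (smaller standard error), but it does not make patients more alike (similar standard deviation).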

The distinction can be difficult to intuit. In most instances, you can tell whether an associated parameter is a measure of uncertainty or variability by checking whether its type appears in the “Type” drop-down menu of the relevant section. For example, “95% CrI” is a type option in the Uncertainty section of the REDCap form, but not in the Variability section.
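As a rough aide-memoire, the check described above amounts to a lookup. The sets below are an assumption: they contain only the types named on this page, and the actual REDCap drop-down menus may list more or different options, so the form itself remains the reference.

```python
# Assumed, partial lists built only from the types mentioned on this page;
# the real REDCap drop-down menus may differ.
UNCERTAINTY_TYPES = {"95% CrI", "95% CI", "Standard Error"}
VARIABILITY_TYPES = {"Standard Deviation", "Variance", "Interquartile Range", "Range"}


def section_for(type_label: str) -> str:
    """Suggest which section of the form an associated parameter belongs to."""
    if type_label in UNCERTAINTY_TYPES:
        return "Uncertainty"
    if type_label in VARIABILITY_TYPES:
        return "Variability"
    # Not recognised: do not guess.
    return "Unknown: consult a member of the team"


for label in ["95% CrI", "Standard Deviation", "+/- (unstated)"]:
    print(label, "->", section_for(label))
```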

If in doubt, please consult with a member of the team.

Variability

Variability is an associated parameter that describes the heterogeneity of the population. This may include standard deviations, variance, interquartile ranges, and more.

The first question in the section asks whether there is such an associated parameter to extract. If you select “Yes”, fields for further information will appear. You will next be asked for the type of associated parameter you wish to extract. If the associated parameter you wish to extract (such as standard deviation or range) does not appear in this drop-down menu, it may indicate that you are entering the value in the wrong section. In that case, reach out to a member of the team for clarification.

Variability Uncertainty

These associated parameters can, on occasion, have their own associated uncertainty. For example, a model may fit both a mean and a standard deviation for a parameter, in which case the central value (the mean) may have its own 95% CrI, and the variability (the standard deviation) may also have its own 95% CrI. For this situation, there is an additional “Variability Uncertainty” section for extracting the uncertainty surrounding the associated variability parameter.
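One way to picture how the four pieces fit together is as a single record with four slots. The sketch below is purely illustrative: the class, its attribute names and the numbers are hypothetical and do not mirror the actual REDCap field names or any published values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ParameterExtraction:
    """Hypothetical container mirroring the four parts of the form."""
    value: float                       # central estimate
    value_type: str                    # e.g. "Mean"
    statistical_approach: str          # "Observed sample statistic" or "Estimated parameter"
    uncertainty_type: Optional[str] = None
    uncertainty_interval: Optional[Tuple[float, float]] = None
    variability_type: Optional[str] = None
    variability_value: Optional[float] = None
    variability_uncertainty_type: Optional[str] = None
    variability_uncertainty_interval: Optional[Tuple[float, float]] = None

    def __post_init__(self) -> None:
        # Variability uncertainty only makes sense if a variability value exists.
        if (self.variability_uncertainty_interval is not None
                and self.variability_value is None):
            raise ValueError("variability uncertainty requires a variability value")


# Made-up numbers: a fitted mean and standard deviation, each with a 95% CrI.
example = ParameterExtraction(
    value=9.0, value_type="Mean", statistical_approach="Estimated parameter",
    uncertainty_type="95% CrI", uncertainty_interval=(8.1, 10.2),
    variability_type="Standard Deviation", variability_value=4.0,
    variability_uncertainty_type="95% CrI",
    variability_uncertainty_interval=(3.2, 5.1),
)
print(example)
```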

Shape and Scale

Not all parameters are reported as a central value with an associated variability parameter. Some are instead described by two distinct values. Most commonly, a parameter may be described by a probability distribution parameterised by a “shape” and a “scale” parameter. In these instances, the top “central” parameter field is used for the first (shape) parameter (you will find “shape” as an option in the parameter type drop-down menu, marked in yellow above), and the associated parameter in the Variability section is used for the second (“scale” or “rate”) parameter. A worked example is given as Example 2 in the Examples section below.
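To see why the shape and scale pair replaces the usual central value and spread, it can help to convert them back into a mean and standard deviation. The sketch below uses the standard formulas for the Weibull and gamma distributions; the function names are hypothetical, and the conversion is shown only for intuition, not as a step in the extraction.

```python
import math
from typing import Tuple


def weibull_mean_sd(shape: float, scale: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Weibull(shape, scale) distribution."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 ** 2)


def gamma_mean_sd(shape: float, scale: float) -> Tuple[float, float]:
    """Mean and standard deviation of a gamma(shape, scale) distribution."""
    return shape * scale, math.sqrt(shape) * scale


def rate_to_scale(rate: float) -> float:
    """For the gamma distribution, a rate is the reciprocal of the scale."""
    return 1.0 / rate


# Made-up parameter values, for illustration only.
print(weibull_mean_sd(shape=1.0, scale=2.0))             # shape 1 is the exponential case
print(gamma_mean_sd(shape=4.0, scale=rate_to_scale(2.0)))
```

Whether a paper reports a scale or a rate changes these numbers substantially, which is why the type of the second parameter should be recorded exactly as stated in the paper.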

Examples

Example 1

The first example is from Xiao et al. (2003). Amongst other parameters, they report an incubation period for SARS:

Example 1

The central value here is a mean estimate of 2.4 days. That is extracted at the top of the parameter form like so:

Example1_central_solution

The additional parameters include a +/- of 1.5 days and a range of 1-5 days. As this is an observed sample statistic, neither of these values is a measure of uncertainty: both describe the variability of the parameter, and there is no uncertainty to extract. Only one associated parameter can be extracted. Ordinarily, we would prioritise the single-value variability parameter. However, the text does not say what the +/- 1.5 days represents. While it is most likely a standard deviation, we do not infer information that the text does not explicitly state. As such, the +/- 1.5 days is not suitable for extraction, and we instead extract the range like so:

Example1_variability
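The reasoning in Example 1 can be summarised as a small decision procedure. This is a sketch under stated assumptions: the function name, the tuple layout and the set of “single-value” types are hypothetical, and the priority order simply restates the text above (prefer an explicitly stated single-value measure, never guess an unstated type).

```python
from typing import List, Optional, Tuple

# Assumed set of single-value variability types, for illustration only.
SINGLE_VALUE_TYPES = {"Standard Deviation", "Variance"}

# Each candidate is (type as stated in the paper, or None if unstated; value).
Candidate = Tuple[Optional[str], object]


def choose_variability(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the one variability parameter to extract, or None if none is usable."""
    # Drop anything whose type the paper does not explicitly state.
    stated = [c for c in candidates if c[0] is not None]
    # Prefer a single-value measure when one is explicitly stated.
    for candidate in stated:
        if candidate[0] in SINGLE_VALUE_TYPES:
            return candidate
    return stated[0] if stated else None


# Example 1: the +/- 1.5 days has no stated type, so the range is chosen.
print(choose_variability([(None, 1.5), ("Range", (1, 5))]))
```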

Example 2

The second example is from Krow-Lucal et al. (2017). This paper also reports on incubation periods, this time for Zika. They report that a Weibull distribution best fits their incubation period data, and provide the estimated parameters of such a distribution in the technical appendix:

Example 2

We shall extract the second row, for confirmed Zika cases. In this instance, where the parameter is represented by a distribution, we extract the first (shape) parameter in the first field of the parameter form, like so:

Example2_central

Because the statistical approach here is “Estimated parameter”, we should be on the lookout for associated uncertainty. Indeed, the paper also provides a 95% CI for this shape parameter, which we can include in the uncertainty field, like so:

Example2_central_uncertainty

The second, scale, parameter, can now be included as an associated parameter in the Variability section, like so:

Example2_variability

And lastly, the uncertainty interval (95% CI) of the scale parameter can be included in the Variability Uncertainty section, like so:

Example2_variability_uncertainty

Historical Extractions

During the Ebola review, we identified a potentially significant issue with the way we were extracting delay distributions. The measure of uncertainty was not sufficiently well characterised. In particular, it was unclear whether an extracted value referred to the variability of the underlying sample (for example, the standard deviation of the observed incubation periods, in days) or to the uncertainty of the parameter estimate (for example, the standard deviation of the estimator itself, such as the standard deviation of the posterior mean of the incubation period).

During the extraction, we ‘overloaded’ the Parameter uncertainty section of the extraction form and mixed the two types of spread described above. This caused several challenges in the Ebola analysis and highlighted that we needed to address the problem properly. The good news is that this did not affect the Marburg paper and had only a very minor impact on the Lassa work (which has been taken care of).

To address this issue, we applied an interim fix to the SARS-CoV-1 extractions, as they were already ongoing and we were limited in the changes we could implement: we did not want to re-extract all the papers completed so far (36% complete when we uncovered the problem).
