Parameter extractions
- Please see the example page for specific examples of data extraction of this information. If still in doubt after looking at the wiki and these examples, please contact the parameter lead (see home page/sign-up sheet) as a first port of call for questions.
- We are extracting everything as presented in the paper, even if you think the author(s) made an error. In that case, please mark the paper down in the quality assessment and describe the suspected error in the notes.
- DO NOT use commas in any field. If you need to separate items within a field, please use a semi-colon.
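The comma rule is easy to motivate if the extracted fields end up in a comma-separated file (an assumption here): an unquoted comma inside a field shifts every later column. A minimal sketch of a pre-submission check, using hypothetical helper names and a hypothetical record (neither is part of the extraction tooling):

```python
def fields_with_commas(record: dict) -> list:
    """Return the names of fields whose text contains a comma (not allowed)."""
    return [name for name, text in record.items() if "," in str(text)]

def split_items(field: str) -> list:
    """Split a multi-item field on the agreed separator, a semi-colon."""
    return [item.strip() for item in field.split(";") if item.strip()]

# Hypothetical record: the location field follows the rule, the notes field does not.
record = {
    "location": "Kenema; Kailahun",
    "notes": "Value from Table 2, main text",
}
print(fields_with_commas(record))       # ['notes']
print(split_items(record["location"]))  # ['Kenema', 'Kailahun']

# Why it matters: naively splitting an unquoted line on commas
# turns the single notes field into two columns.
line = "Kenema; Kailahun,Value from Table 2, main text"
print(len(line.split(",")))             # 3 columns instead of 2
```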
Some papers report parameters with a high degree of disaggregation (e.g. by age, sex or location), or report multiple values for a parameter estimated using different methods (e.g. for reproduction numbers and evolutionary rates). We have established the following rules to ease the extraction process.
For non-location-related disaggregation (age/sex etc) or for parameters estimated using different models and data subsets (e.g. for reproduction numbers and evolutionary rates), please remember the rule of 3. If a paper presents three or more estimated values for a parameter, e.g. seroprevalence for three or more age groups, extract these as a range of central estimates and specify that disaggregated/multiple estimates are available and what the parameter is disaggregated by.
Please remember that this range must still exclude values based on sample sizes <10: if the total sample is 10 or more but some disaggregated results (e.g. by age) are based on samples <10, those results should not be extracted. The rule of 3 applies only when there are 3 or more parameter estimates, each based on a sample size of 10 or more.
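A minimal sketch of this eligibility check, assuming each disaggregated estimate comes with its own sample size. The `Estimate` class, the helper names and the data are hypothetical, for illustration only:

```python
from dataclasses import dataclass

MIN_SAMPLE_SIZE = 10  # estimates based on samples < 10 are never extracted

@dataclass
class Estimate:
    label: str        # hypothetical: the subgroup, e.g. an age group
    value: float      # central estimate for that subgroup
    sample_size: int  # sample size the estimate is based on

def eligible(estimates: list) -> list:
    """Drop estimates based on fewer than MIN_SAMPLE_SIZE observations."""
    return [e for e in estimates if e.sample_size >= MIN_SAMPLE_SIZE]

def rule_of_three_applies(estimates: list) -> bool:
    """True if 3+ eligible estimates remain, i.e. extract a range of central estimates."""
    return len(eligible(estimates)) >= 3

# Hypothetical seroprevalence by age group; the 60+ group has n < 10.
by_age = [
    Estimate("0-14", 4.2, 120),
    Estimate("15-29", 7.9, 210),
    Estimate("30-59", 11.5, 180),
    Estimate("60+", 20.0, 6),
]
print([e.label for e in eligible(by_age)])  # ['0-14', '15-29', '30-59']
print(rule_of_three_applies(by_age))        # True
print(rule_of_three_applies(by_age[2:]))    # False: only one eligible estimate
```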
Each pathogen has different rules on location aggregation, which we state here:
- Marburg; Ebola; MERS; RVF: Location is included within the rule of 3.
- Lassa; SARS; Zika; Nipah: Extract all disaggregated values when the disaggregation is by location down to admin level 2 (sub-regions, e.g., districts). However, please respect the rule of three for estimates below the admin 2 level (e.g. by neighbourhood).
It is important that we do not aggregate over key methodological differences between parameter estimates. This means that the rule of three does not apply when there are differences in non-contextual fields such as:
- Statistical approach: observed sample statistic/estimated parameter
- Genomic - Gene
- Ratio_prevalence_rate - method
- Ratio_prevalence_rate - case definition
We should, however, still aggregate over reproduction number and genomic parameter estimation methods (e.g. transmission model/branching process for reproduction numbers, molecular clock/root-to-tip regression for evolutionary parameters), and we should aggregate within groups (e.g. if there are multiple seroprevalence adjustment methods, group the estimates into a range, select statistical approach: estimated parameter, and tick disaggregated by method).
A contextual exception to the rule of 3 is for serology: when serology estimates differ in outbreak timing (e.g. if one set of parameter estimates comes before an outbreak while another set comes after), it is important that these are extracted separately, and not summarised as a range of central estimates.
We are only extracting parameters that are estimated from or fitted to actual data. Do not extract parameters from non-fitted, theoretical transmission models (i.e., where parameters have been selected from other studies/randomly).
- Parameter type – this will give you a drop-down of all the parameters we want to extract.
- Parameter(s) from figure only - we are not extracting data from figures. If a parameter is available in figure form only, tick this box. Do not tick this box if any parameters are available for extraction from the remaining text/tables, even when additional parameters are available in figures (in this case, state in the notes section that additional parameters are available in the figures). Parameter context and other extractable information may still be available when parameters are only available in figures, and should be extracted as normal.
Please note: Other than easily accessible xls or csv files, if parameters are reported only in a separate, programming-language-specific data file (e.g. an RData file), do not extract parameter values and tick the "from figure only" box. (Note: update tick box text for future pathogens)
- From supplement – tick this box if the parameter values are only found in the supplementary material. Knowing that the information isn’t in the main text makes it much easier to find again later.
- Parameter value – the value stated in the paper, whether in free text, a table or a figure caption. Note that we are not extracting anything from the figures themselves or performing any calculations. Do not extract a value for this field when there are 3 or more central parameter values (rule of 3), unless one of these values is clearly presented as the primary result (e.g. summarised over the full data set, or reported as the preferred method).
Please note: While it may be possible to identify or infer some values from a figure (e.g. range, dates, median, mode), these values should not be extracted unless explicitly written as text.
- Parameter exponent - the exponent x in scientific notation, i.e. 10^x, where the default is 0 (10^0 = 1). Note that this should primarily reflect the central value, but it also applies to the parameter range boxes and the parameter uncertainty section below.
Please note: If the exponent varies across range and uncertainty values, extract the values as presented (i.e. these may not match with the central value exponent extracted) and state the values that have a different exponent and the value of this exponent in the notes. This will be accounted for in post-processing.
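The value and exponent fields combine as value × 10^exponent. A sketch with hypothetical numbers (and a hypothetical helper name), including an uncertainty bound reported with a different exponent, which is extracted as written and flagged in the notes for post-processing:

```python
def to_float(value: float, exponent: int = 0) -> float:
    """Combine a value field and its exponent field: value * 10**exponent."""
    return value * 10.0 ** exponent

# Hypothetical extraction: central value 2.4 with exponent -3, but the paper
# reports the upper CI bound as 0.52 x 10^-2. The bound is entered as 0.52 and
# its different exponent (-2) is stated in the notes; post-processing then
# combines each value with its own exponent.
central = to_float(2.4, -3)   # 0.0024
ci_upper = to_float(0.52, -2) # 0.0052
print(central < ci_upper)     # True
```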
- Parameter range of central estimates - the lower and upper values here correspond to the minimum and maximum values of the parameter across any dimension of disaggregation. For example, if the CFR is disaggregated by age and occupation, the lower value may be for a particular age group while the upper value might be for a particular occupation. These fields should only be filled according to the rule of 3 (please refer to the paragraph at the top of this page for pathogen-specific rules on this threshold of 3):
  - For up to 2 central parameter values, extract individual values into separate parameter sheets and leave the range fields blank.
  - If there are 3 or more central parameter values, extract the range.
  - If there are 3 or more groups, and there is a primary central parameter estimate among these values (e.g. summarised over the full data set, or reported as the preferred method), extract the range as described and extract the central parameter value, including all uncertainty, variation and contextual information for this value. Note that the exponent input above also applies to the parameter range.
Please note: There is a second range in the paired parameter variability drop-down list (see variability section). This pertains only to the range of the primary central parameter estimate, if available, and should not be confused with the range of disaggregated central parameters.
Please note: If all the data used to generate a central estimate is presented, this can be captured in the range for the primary central parameter estimate only (in the variability section), not the range of central parameter estimates. This does not count as disaggregation. Instead we would tick the data available box in the context section.
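The rule of 3 for the central value and range fields can be sketched as follows, assuming the estimates have already been filtered for sample size. The function and field names are hypothetical, not the form's own:

```python
from typing import Optional

def central_fields(values: list, primary: Optional[float] = None) -> list:
    """Return one dict per parameter sheet.

    values: eligible disaggregated central estimates (already filtered for n >= 10).
    primary: the value presented as the primary result, if the paper gives one.
    """
    if len(values) < 3:
        # Up to 2 central values: one sheet each, range fields left blank.
        return [{"value": v, "range_lower": None, "range_upper": None} for v in values]
    # 3+ values: a single sheet holding the range. The central value is filled
    # only when a primary estimate exists; otherwise it stays blank and
    # 'Disaggregated data available only' would be ticked.
    return [{"value": primary, "range_lower": min(values), "range_upper": max(values)}]

print(central_fields([1.2, 1.8]))
print(central_fields([1.5, 2.3, 1.9]))
print(central_fields([1.5, 2.3, 1.9], primary=1.8))
```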
- Parameter value reported as inverse - tick this box if the inverse of a parameter of interest is reported instead of the parameter itself, e.g. if a (fitted) recovery rate is reported instead of the infectious period. This has been extended to include percentage complements, e.g., if a paper reports 80% survival, this can be extracted as the inverse CFR. Note that ticking the inverse will also apply to the parameter range provided above.
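A minimal sketch of the two kinds of inverse, using a hypothetical helper name. The rate case assumes an exponentially distributed period (constant rate), where the mean duration is 1/rate; the same reasoning applies to an EIP reported as 1/EIP:

```python
def from_inverse(reported: float, kind: str) -> float:
    """Recover the parameter of interest from its reported inverse."""
    if kind == "rate":
        # e.g. a fitted recovery rate per day -> mean infectious period in days
        return 1.0 / reported
    if kind == "percent_complement":
        # e.g. 80% survival -> 20% CFR
        return 100.0 - reported
    raise ValueError(f"unknown inverse kind: {kind}")

print(from_inverse(0.1, "rate"))               # 10.0
print(from_inverse(80, "percent_complement"))  # 20.0
```

Note that in both cases the form stores the reported value with the box ticked; the conversion is left to post-processing.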
- Unit – per week, percent, days, etc. If the inverse of a parameter is provided (as described above), please select the units of the parameter itself, e.g. if a (fitted) recovery rate is provided instead of the infectious period, tick 'Parameter value reported as inverse' and select days (or the reported time unit) as the unit.
Please note: Point estimate and percentage were separate fields in the previous iteration of the form. Now, to extract 73%, put 73 in the parameter value field and choose percent as the unit.
- Parameter value type – mean, median, shape, etc. Please note that multiple measures of central tendency (or variability - see following section) may be provided, especially when the entire distribution of a parameter is presented. To avoid extracting multiple measures of centrality and variability for the same parameter, and to avoid bias, only one parameter value type can be extracted. Central parameter types are prioritised based on the available variability/uncertainty types as follows:
  - When SD/variance/CIs are available: extract the mean.
  - Else when only IQR/CrIs are available: extract the median.
  - If a mode is presented, it should be prioritised after the mean or median.
  - If Weibull distribution parameters are presented: prioritise extraction of the shape and scale parameters rather than mean/CIs or median/CrIs. We can get the mean/CIs from shape/scale analytically but can only get shape/scale from mean/CIs numerically.
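The Weibull point can be checked with a short sketch. The mean and SD follow analytically from shape k and scale λ, as mean = λ·Γ(1 + 1/k) and SD = λ·sqrt(Γ(1 + 2/k) − Γ(1 + 1/k)²), while going back from mean/SD needs a numerical search. Bisection on the shape is used here purely for illustration, and the search bracket is an assumption:

```python
import math

def weibull_mean_sd(shape: float, scale: float) -> tuple:
    """Analytic mean and SD of a Weibull distribution with the given shape and scale."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 ** 2)

def _weibull_cv(shape: float) -> float:
    """Coefficient of variation (SD / mean); it depends on the shape only."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return math.sqrt(g2 - g1 ** 2) / g1

def weibull_shape_scale(mean: float, sd: float, tol: float = 1e-10) -> tuple:
    """Recover (shape, scale) from mean and SD numerically, by bisection on the shape."""
    target = sd / mean
    lo, hi = 0.05, 100.0  # assumed search bracket for the shape
    if not _weibull_cv(hi) <= target <= _weibull_cv(lo):
        raise ValueError("coefficient of variation outside the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if _weibull_cv(mid) > target:
            lo = mid  # CV too large: the shape must increase (CV falls as shape rises)
        else:
            hi = mid
    shape = (lo + hi) / 2
    return shape, mean / math.gamma(1 + 1 / shape)

mean, sd = weibull_mean_sd(2.0, 10.0)
print(round(mean, 2), round(sd, 2))      # 8.86 4.63
shape, scale = weibull_shape_scale(mean, sd)
print(round(shape, 6), round(scale, 6))  # 2.0 10.0
```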
- Statistical approach – if the central parameter estimates are summarised directly from empirical data, select “Observed sample statistic”. If the central parameter is estimated using a transmission or other kind of model, select “Estimated parameter” (e.g. an adjusted seroprevalence would be an estimated parameter, not an observed sample statistic). Due to limited data sources, the Oropouche systematic review was extended to include data from case studies.
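Parameter uncertainty represents the confidence in the central estimate and decreases with more data. Variability, by contrast, describes the spread of the data themselves and does not shrink with more data (observed ranges, in particular, tend to widen as more data are collected).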
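A small simulation illustrates the distinction. The data are hypothetical draws from a population with mean 10 and SD 2: uncertainty (here the standard error of the mean) narrows as more data are added, the SD stays near the underlying value, and the observed range can only widen, because the first 10 draws are a subset of all 1000:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible
# Hypothetical observations from a population with mean 10 and SD 2.
draws = [random.gauss(10.0, 2.0) for _ in range(1000)]

def summarise(sample: list) -> dict:
    sd = statistics.stdev(sample)  # variability of the observations
    return {
        "n": len(sample),
        "sd": sd,
        "se_of_mean": sd / len(sample) ** 0.5,  # uncertainty of the mean
        "range": max(sample) - min(sample),     # observed range of the data
    }

small = summarise(draws[:10])
large = summarise(draws)
for s in (small, large):
    print(s["n"], round(s["sd"], 2), round(s["se_of_mean"], 3), round(s["range"], 2))
```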
- Do not extract uncertainty unless a primary central parameter is also extracted. Uncertainty must correspond with the primary central parameter. If a primary central parameter is not available, i.e. only the “parameter range of central estimates” is available (e.g. Rt 1.5-2.3), do not extract any uncertainty.
- Single-type uncertainty can be extracted only if an interval of paired uncertainty is unavailable. This may include the standard error or a measurement of the posterior variation: posterior variance, posterior standard deviation, or posterior coefficient of variation.
- Paired uncertainty, which includes confidence and credible intervals, is the option you will use most of the time.
- Please note that the exponent input parameter fields also apply to the uncertainty value fields, and this should be specified in the notes if these differ from the central parameter.
Distribution type – use this when a study states that the uncertainty around your value
As stated above, and in contrast to uncertainty, variability does not shrink with additional data. Variability extraction fields broadly follow the same structure as the primary central parameter fields (value type, exponent, inverse, unit, extracted from supplement, statistical approach), and the same rules apply to these fields.
- Do not extract a range of variability values over disaggregated datasets; the value(s) extracted here must correspond only with the primary central value previously extracted.
- This section may also apply to any secondary parameters associated with the primary parameter (e.g., if the primary parameter is shape, scale should be extracted here, and selected under “value type”).
- As with the central parameter, only one variability value type can be extracted. These should be prioritised in line with the value type of the central parameter as follows:
  - If the mean has been extracted, prioritise:
    - SD
    - Variance
  - If the median has been extracted:
    - IQR
    - Range
  - If the shape has been extracted:
    - Scale
    - Rate
  - If other kinds of variability are presented, you will need to make a judgement call as to which is the most informative, adding a note describing this measure.
  - If the full data is presented, extract the range of these values that corresponds to the central value and mark this under data availability in the context section.
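The value-type priorities for the central value and for variability can be summarised as a small lookup. A sketch with hypothetical type names (the form's drop-down labels may differ); the fall-back order where the rules do not decide is an assumption:

```python
from typing import Optional

# Preferred variability types, in order, for each central value type.
VARIABILITY_PRIORITY = {
    "mean": ["sd", "variance"],
    "median": ["iqr", "range"],
    "shape": ["scale", "rate"],
}

def choose_central(central: set, spread: set, weibull: bool = False) -> Optional[str]:
    """Pick the single central value type to extract."""
    if weibull and "shape" in central:
        return "shape"  # Weibull shape/scale take priority over mean or median
    if "mean" in central and spread & {"sd", "variance", "ci"}:
        return "mean"
    if "median" in central and spread & {"iqr", "cri"}:
        return "median"
    for value_type in ("mean", "median", "mode"):  # assumed fall-back order
        if value_type in central:
            return value_type
    return None

def choose_variability(central_type: str, spread: set) -> Optional[str]:
    """Pick the single variability type that pairs with the chosen central type."""
    for value_type in VARIABILITY_PRIORITY.get(central_type, []):
        if value_type in spread:
            return value_type
    return None  # none of the listed types: judgement call, explained in a note

reported_central = {"mean", "median"}
reported_spread = {"sd", "iqr"}
chosen = choose_central(reported_central, reported_spread)
print(chosen, choose_variability(chosen, reported_spread))  # mean sd
```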
If variability is reported as a single value (e.g. SD), not a paired value, there may also be uncertainty reported associated with this value. These fields follow the same structure as the “parameter uncertainty” section.
If the variability is reported as a distribution, this can be extracted here.
- Disaggregated data available – tick this box if you can find the parameter disaggregated by age groups, occupation etc. in line with the rule of 3.
- Disaggregated data available only – tick this box if there is no primary central parameter value.
- Disaggregated by – please note that disaggregated by 'Method' includes both the choice of model and any sensitivity analyses that involve varying model parameters given a particular model choice.
- Sex – the sex composition of your study population. If you have 99 men and 1 woman you would still put “both” in this option.
- Sample size – number of participants/samples tested etc.
- Setting – how was the study conducted?
- Group – demographic i.e. who was sampled?
- Age min and age max – these must be number fields. If your sample is people over 18, you would put age min = 18 and leave age max blank. Please do not try to insert things like “18+”, as this will make post-processing much harder.
- Country – where was the study undertaken?
- Location reported - i.e. Kerry Town Ebola Treatment Centre. If multiple locations are reported, DO NOT USE A COMMA TO SEPARATE THE LOCATIONS. Please use a semi-colon “;” instead.
- Location type - e.g. district, state, province, country or hospital, i.e. however the location is described by the paper.
- Survey start and end dates - report the dates of the sampling, not the outbreak (although these may reflect one another).
- Timing – when in the outbreak was this study undertaken?
Please note: If it is a serological study before and after an outbreak, extract the seroprevalence separately for each serology survey (see example 3), deviating from the rule of 3.
- Urban or rural area – was the study context an urban (e.g. urban/city/town) or rural (e.g. rural/agricultural/village) area, or both? This should be "Unspecified" if not explicitly described.
- Data availability – options include: as an attachment, with a DOI, on Github, on another platform. If the data are presented in full in the paper, mark as “included in attachment”.
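Because age min and age max must be numbers, a description such as “18+” has to be split into numeric fields before entry. A hypothetical helper (not part of the extraction form) sketching that conversion for the two most common patterns:

```python
import re
from typing import Optional, Tuple

NUMBER = r"(\d+(?:\.\d+)?)"

def age_fields(text: str) -> Tuple[float, Optional[float]]:
    """Convert an age description into numeric (age min, age max); None means blank."""
    text = text.strip()
    match = re.fullmatch(NUMBER + r"\s*\+", text)  # "18+" -> (18, blank)
    if match:
        return float(match.group(1)), None
    match = re.fullmatch(NUMBER + r"\s*-\s*" + NUMBER, text)  # "5-14" -> (5, 14)
    if match:
        return float(match.group(1)), float(match.group(2))
    raise ValueError(f"cannot convert {text!r}; enter the ages as numbers instead")

print(age_fields("18+"))   # (18.0, None)
print(age_fields("5-14"))  # (5.0, 14.0)
```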
This section extracts details regarding parameters estimated from pathogen genetic sequences. Please note that:
- Sequences may be obtained from human and non-human hosts, with the caveat that experimental genomic data should not be included, e.g. if mice were inoculated to see how mutation rates change under non-standard conditions.
- These values are often in the supplemental material, so if genetic sequences or phylogenetic analyses are presented, check the supplement.
- We are not extracting parameters associated with selection pressure or synonymous/nonsynonymous mutations unless, based on data or methodological limitations, they have only been able to calculate substitution rate from nonsynonymous mutations (in that case specify this in the 'Gene' field, similar to in vitro experiments - see ‘Gene’ description).
- If substitution rates are calculated for subgroups (e.g. 'clades,' 'strains,' 'branches', etc), report the global estimate and indicate disaggregated data is available in the Parameter Disaggregation section.
- If no parameters were derived from genetic sequences, then this section can be skipped even if sequencing was performed and reported.
- Parameter type - substitution rate, evolutionary rate, and mutation rate are different ways of describing the speed at which genetic changes accumulate in a population. When selecting the parameter type, choose the parameter type and units based on the wording used by the authors in the article. If multiple terms are used for the same measure (e.g. substitution rate is used in the text, evolutionary rate is used in the table), choose either the most frequently used term or default to substitution rate (if the units are substitutions per site per year).
- Parameter unit - as always, units are very important for these parameters. The most common unit is substitutions per site per year. If units are not clear or they do not match the available options in the drop-down menu, select 'unspecified'.
- Parameter value type – as with other parameters, select the value type as specified in the paper. This is often the median, but ‘central - unspecified’ may be appropriate if the value type is not adequately described by the authors.
- Statistical approach – for the most part, substitution/evolutionary rates are estimated parameters, as opposed to observed sample statistics. However, if no detail is given on how these parameters were calculated, please select ‘unspecified’ and make sure this is reflected in your QA responses.
- Gene - type the portion of the pathogen’s genome used to estimate any extracted parameters (e.g. growth rate, substitution rate). This can be a gene, a gene segment, a codon position, or a more generic description (e.g. ‘whole genome’ or ‘intergenic positions’). If parameter values are independently estimated for different portions of the genome, please enter each on a separate parameter value form, following the rule of 3.
Please note: Unlikely based on inclusion/exclusion criteria, but just in case: If a mutation rate is estimated by in vitro experiments of recombinant variants (for example, measuring the rate of mutation in an inserted gene, such as green fluorescent protein [GFP]), enter the name of the inserted gene used, even though this gene might not be naturally occurring in the virus's genome. In addition, such experiments may measure different types of mutations (SNPs vs indels). If this is the case, enter the type of mutation used to calculate the rate (e.g. GFP-SNP, to signify that SNP mutations in the GFP gene were used to calculate the mutation rate).
- Newly sequenced data available - select the data check box if the study sequenced new pathogen isolates and their accession numbers have been provided for retrieval from a public database. If sequences are available, but no parameters of interest were estimated using this data, do not check this box.
- Sample size – the number of sequences used, as described by the paper. This may or may not include any “outgroup” strains (reference genomes that contextualise the strain of interest).
- Setting/Group – as with the sample size, answer based on all sequences used in the analysis. This is often poorly described, as sequences are often obtained from public databases where the setting is unknown.
- Country – if available, list the countries of origin of all sequences used in the analysis.
- Start/End date – give the earliest (start) and most recent (end) dates of specimen collection that gave rise to sequences included in the analysis. This might only be a year.
- Data availability – this can be ‘Yes- on another platform’ if the authors provide a full list of accession numbers and the public database for the sequences used in the analysis.
Please note: it is often difficult to complete the context for phylogenetic analyses because much of the data comes from public databases or different studies that are not well described in the current analysis. For all context questions, answer based on all sequences included in the analysis where evolutionary/substitution rate was calculated.
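Because these rates are small numbers in compound units, the value, exponent and unit fields must be read together. A hypothetical illustration: a value of 1.2 with exponent -3 and unit substitutions per site per year, applied to a genome of 19,000 sites (an arbitrary length chosen for illustration):

```python
def substitutions_per_genome_per_year(value: float, exponent: int, genome_length: int) -> float:
    """Expected substitutions across the genome per year, from a rate in substitutions/site/year."""
    rate_per_site_per_year = value * 10.0 ** exponent
    return rate_per_site_per_year * genome_length

# Hypothetical: 1.2 x 10^-3 substitutions/site/year on a 19,000-site genome.
per_year = substitutions_per_genome_per_year(1.2, -3, 19_000)
print(round(per_year, 1))  # 22.8
```

The same number read with a different unit (e.g. per site per day) would imply a rate hundreds of times faster, which is why 'unspecified' is preferable to guessing.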
The epidemic growth rate (often denoted
- Parameter type - Growth rate ($$r$$)
- Parameter unit - units are particularly important for the growth rate, which is measured per unit of time (e.g. per day or per week). Possible options include per hour, per day, per week, per month and per year, which should cover most cases; if they do not, please use “unspecified” and describe the unit in the notes.
- Sample size – the sample size for the growth rate isn’t necessarily intuitive so we don’t collect that information here.
Please note: please use the notes field to add any information you think may be useful for interpretation, e.g. the type of data the estimate is based on (e.g. incidence of cases, incidence of deaths, or genomic data).
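To see why the unit matters, a minimal sketch converting a continuous-time growth rate between the hour, day and week options (month and year are left out because their length in days is ambiguous; the helper name is hypothetical):

```python
import math

# Length of each time unit in days.
DAYS_PER_UNIT = {"per hour": 1 / 24, "per day": 1.0, "per week": 7.0}

def convert_growth_rate(r: float, from_unit: str, to_unit: str) -> float:
    """Convert a continuous-time growth rate r between time units."""
    return r * DAYS_PER_UNIT[to_unit] / DAYS_PER_UNIT[from_unit]

r_day = 0.1  # hypothetical growth rate per day
print(convert_growth_rate(r_day, "per day", "per week"))  # about 0.7 per week
# Incidence is multiplied by exp(r) per unit of time: about 1.105 per day here.
print(math.exp(r_day))
```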
The doubling time (or halving time) is a parameter that translates the epidemic growth rate r (see above) into a more intuitive metric, namely the time it takes for the incidence to double (if $$r>0$$) or halve (if $$r<0$$).
- Parameter type - doubling or halving time, depending on whether the central estimate corresponds to a growth rate that is $$>0$$ or $$<0$$, respectively, irrespective of the uncertainty around the central estimate. Specifically, you can have estimates of the growth rate with an uncertainty interval spanning zero, which are consistent with the epidemic either growing or declining. In those cases, we still classify $$T_d$$ as a doubling time (i.e. the epidemic is growing) or a halving time (i.e. the epidemic is declining) based only on the central estimate of r. This may also lead to uncertainty intervals for the doubling/halving time that extend to infinity; for an example, see the doubling time estimates for Nigeria in Table 2 of this paper: https://www.nejm.org/doi/full/10.1056/NEJMoa1411100.
- Parameter unit - as for the growth rate, units are particularly important for the doubling time, which is measured in units of time. Possible options include hours, days, weeks and months, which should cover most cases; if they do not, please use “unspecified” and explain the unit in the notes.
- Sample size – the sample size for the doubling time isn’t necessarily intuitive, so we don’t collect that information here.
Please note: please use the notes field to add any information you think may be useful for interpretation, e.g. the type of data the estimate is based on (e.g. incidence of cases, incidence of deaths, or genomic data).
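The doubling time is $$T_d = \ln 2 / r$$ for $$r > 0$$, and the halving time is $$\ln 2 / |r|$$ for $$r < 0$$. A minimal sketch (hypothetical helper and numbers) that classifies on the central estimate only and maps an interval for r that spans zero to a time interval extending to infinity:

```python
import math

LN2 = math.log(2)

def doubling_or_halving(r_central: float, r_lower: float, r_upper: float):
    """Classify on the central r, then map the r interval to a time interval."""
    if r_central == 0:
        raise ValueError("r = 0: neither doubling nor halving")
    if r_central > 0:
        label = "doubling time"
        central = LN2 / r_central
        shortest = LN2 / r_upper  # fastest growth
        longest = LN2 / r_lower if r_lower > 0 else math.inf
    else:
        label = "halving time"
        central = LN2 / -r_central
        shortest = LN2 / -r_lower  # fastest decline
        longest = LN2 / -r_upper if r_upper < 0 else math.inf
    return label, central, (shortest, longest)

# Hypothetical r = 0.1 per day (95% CI -0.02 to 0.2): the interval spans zero,
# so the doubling time is about 6.9 days with an interval of about 3.5 days to infinity.
print(doubling_or_halving(0.1, -0.02, 0.2))
```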
The attack rate is the proportion of an at-risk population contracting the disease during a specified time interval.
- Parameter type - we distinguish between attack rates (at the general population level) and secondary attack rates, i.e. the attack rate for a sub-population in a specific setting (households, hospital wards, etc).
- Parameter value, unit, and exponent - the attack rate is often reported as a percentage or as a rate, e.g. 52 people per 10,000 people.
  - If the attack rate is reported as a percentage, extract it as a percentage, as you would for any other parameter.
  - If the attack rate is reported as a rate:
    - leave the parameter unit blank;
    - use the exponent box to record the denominator of the rate;
    - e.g. for 52/10,000 people, put 52 in the value box and -4 in the exponent box.
- Ratio/Prevalence/Rate - numerator/denominator – please extract the numerator and denominator of the value (or central value) of the attack rate.
- Ratio/Prevalence/Rate – method – naïve or adjusted, as described in the severity section.
- Ratio/Prevalence/Rate – case definition – this refers to the numerator of the attack rate.
Please note: PCR prevalence is not the same as attack rate, so should not ordinarily be extracted under this parameter type. However, if the paper describes it as an attack rate, it should be extracted as an attack rate, but please describe this in the notes.
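A minimal sketch of the rate-to-fields rule and of a naive attack rate from the numerator/denominator fields. Both helper names are hypothetical, and the first only handles denominators that are powers of 10:

```python
import math

def rate_to_fields(cases: int, per: int) -> tuple:
    """e.g. 52 per 10,000 -> (value 52, exponent -4); per must be a power of 10."""
    exponent = -round(math.log10(per))
    if 10 ** -exponent != per:
        raise ValueError("denominator is not a power of 10")
    return cases, exponent

def attack_rate_percent(numerator: int, denominator: int) -> float:
    """Naive attack rate from the numerator/denominator fields, as a percentage."""
    return 100.0 * numerator / denominator

print(rate_to_fields(52, 10_000))          # (52, -4)
print(attack_rate_percent(52, 10_000))     # 0.52
```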
We are extracting the basic (R0) and effective (Re) reproduction numbers. For vector-borne diseases (e.g. Zika), these are further broken down into human and mosquito R0 and Re components, which together make up the overall R0 or Re (e.g. R0human * R0mosquito = R0, although other formulations may be described). As always, these parameters should be selected as identified by the paper.
Please note: For reproduction numbers describing transmission between animals, choose Reproduction number (Basic R0/Effective Re), then specify “animal-animal” under “transmission pathway”, or choose “other” and specify the transmission pathway in the notes (e.g. animal-mosquito). Transmission from non-vector animals to humans does not constitute a reproduction number transmission pathway.
- Parameter type
  - R0/Re basic - Human-to-human or human-to-vector-to-human transmission paths
  - R0 human - Human to vector (or as defined by the paper)
  - R0 mosquito - Vector to human
- Method – specify the method used from:
  - Renewal equations / branching process - includes EpiEstim & Wallinga and Teunis for example - typically gives Re. Please see model extraction notes on how to identify renewal equations and branching processes.
  - Growth rate - will typically use Wallinga and Lipsitch to convert an estimated growth rate into reproduction number.
  - Compartmental model - fitted to data and where the parameters are then converted into a reproduction number.
  - Next generation matrix - typically gives R0.
  - Empirical - e.g. they reconstructed the transmission tree from contact tracing data, then counted secondary cases for each case - gives Re.
  - Genomic methods – please name the gene used in the notes.
  - Other - please write in notes.
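For the growth rate method, the Wallinga and Lipsitch conversion expresses the reproduction number as $$R = 1/M(-r)$$, where $$M$$ is the moment generating function of the generation interval distribution. A sketch assuming a gamma-distributed generation interval, for which this has a closed form (the function name and numbers are hypothetical, papers may use other distributions, and r and the generation interval must share a time unit):

```python
def r_to_R_gamma(r: float, gi_mean: float, gi_sd: float) -> float:
    """R = 1 / M(-r) for a gamma generation interval with shape a and scale b,
    where M(-r) = (1 + r*b)**(-a), so R = (1 + r*b)**a."""
    a = (gi_mean / gi_sd) ** 2  # gamma shape
    b = gi_sd ** 2 / gi_mean    # gamma scale
    if 1 + r * b <= 0:
        raise ValueError("growth rate too negative for this generation interval")
    return (1 + r * b) ** a

# Hypothetical: r = 0.1 per day, generation interval mean 5 days and SD 5 days.
# SD = mean gives shape 1 (exponential), where R reduces to 1 + r * mean.
print(r_to_R_gamma(0.1, 5.0, 5.0))  # 1.5
```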
- Transmission pathway – specify whether the reproduction number represents transmission from human to human, vector to human, animal to human or animal to animal.
- Sample size – the sample size for the reproduction number isn’t necessarily intuitive and depends on the estimation method as follows:
  - Empirical: the number of infector-infectee transmission pairs.
  - Genomic: the number of genomic samples.
  - Any other (Renewal equations/Branching process/Growth rate/Compartmental model/Next generation matrix): report the number of cases. Please note that the number of timesteps with observations may also be relevant to the precision of the estimate, and the user should either refer to the duration field (the start and end dates of the sample collection) or to the study itself to know more (including any uncertainty estimates that are not captured here).
Overdispersion refers to the variation in infectiousness among individuals within a population. In many infectious diseases (e.g. SARS, Ebola virus disease), most people infect very few others, while a small number of "superspreaders" cause many secondary infections. This creates a highly skewed distribution of transmission events. When analysing transmission, we model Z, the number of secondary infections caused by each infected person, whose distribution is called the offspring distribution.
The standard approach uses a negative binomial distribution with two key parameters:
- R0 (or Re during specific time periods): The average number of secondary infections per case.
- k: The overdispersion parameter that quantifies how much individual infectiousness varies.
The negative binomial distribution arises as:
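- Each individual has their own reproductive number $$\nu$$ (the expected number of people they will infect).
- These individual values follow a Gamma distribution with mean $$R_0$$ and shape (dispersion) $$k$$: $$\nu \sim \mathrm{Gamma}(R_0, k)$$.
- Given $$\nu$$, an individual's number of secondary cases is Poisson distributed, $$Z \mid \nu \sim \mathrm{Poisson}(\nu)$$, so the resulting number of secondary cases follows $$Z \sim \mathrm{NegBinomial}(R_0, k)$$.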
Please note that smaller values of k indicate greater overdispersion.
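The negative binomial probabilities follow directly from $$R_0$$ and $$k$$. A small sketch with a hypothetical $$R_0 = 2$$ shows how the share of cases that infect nobody, $$P(Z=0) = (1 + R_0/k)^{-k}$$, grows as $$k$$ gets smaller:

```python
import math

def negbin_pmf(z: int, R0: float, k: float) -> float:
    """P(Z = z) for a negative binomial offspring distribution with mean R0 and dispersion k."""
    log_p = (
        math.lgamma(z + k) - math.lgamma(z + 1) - math.lgamma(k)
        + k * math.log(k / (k + R0))
        + z * math.log(R0 / (k + R0))
    )
    return math.exp(log_p)

R0 = 2.0  # hypothetical
for k in (0.1, 0.5, 10.0):
    p_zero = negbin_pmf(0, R0, k)  # share of cases that infect nobody
    print(f"k = {k}: P(Z = 0) = {p_zero:.2f}")
# prints 0.74, 0.45 and 0.16 for k = 0.1, 0.5 and 10
```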
- Parameter type - Overdispersion
Please note: There are two parameters that we extract under the "overdispersion" parameter:
- k: from a branching process model with a negative binomial distribution.
- The maximum number of superspreading cases: the highest number of secondary infections attributed to a single case or environment.
We will not extract any other overdispersion parameters. If the overdispersion parameter is estimated using a different branching process distribution or with a compartmental model, please capture this in a note somewhere sensible (e.g. a reproduction number parameter form note, if one exists, or the model extraction form note). Overdispersion estimates from compartmental models require additional parameter and contextual information to understand and should therefore be considered on a case-by-case basis.
- Parameter unit -
  - If k: unitless, leave this field blank.
  - If max secondary infections: choose "Max nr. of cases superspreading (related to case)" or "Max nr. of cases superspreading (in environment)", as appropriate.
Please note: k values are directly associated with a specific Re value, and multiple time-specific Re-k pairs may be reported e.g. before/after interventions are implemented.
- It is important that we capture the associated Re value(s): please write these values in the notes.
- If multiple overdispersion values are presented, these should be extracted separately, along with all associated Re values, although the rule of 3 still applies.
This is intended for pathogens (e.g. MERS) where there is both human to human (h2h) and animal to human (a2h) transmission, and aims to capture the relative magnitude of these two routes of infection in humans.
- Parameter type - one of two parameters can be selected from the drop-down menu:
  - Relative contribution - human to human.
  - Relative contribution - zoonotic to human.
- Parameter value and unit - we expect these to be proportions or percentages, so if a study estimates 60% of infections in humans to be from h2h infection, you would select "relative contribution: human to human" and enter "60" as parameter value and "percentage" as unit. Or if the study instead reported the opposite, i.e. 40% of infections in humans to be due to infection from animals, in this case you would select "relative contribution: zoonotic to human" and enter "40" as parameter value and "percentage" as unit.
Please note: For RVF, human to human transmission has not been observed. We do, however, want to distinguish between the relative contributions of mosquitoes and livestock to human infection. We extract the zoonotic contribution to infection as above and assume that the zoonotic to human and mosquito to human contributions sum to 1. If only the mosquito to human contribution is presented, tick “parameter value reported as inverse”, since the reported value equals 1 - zoonotic contribution.
Seasonal forcing was an attempt to capture seasonal patterns of transmission. This is difficult to standardise and has since been retired and replaced with the “Seasonality included” section of the model extraction form.
These parameters all refer to time intervals in the natural history of infection of the host.
-
Parameter type – Human delays are selected from the following:
- Generation Time - the generation time is the time interval between infector exposure/infection and infectee exposure/infection. It may be used in reproduction number estimation, but given the difficulties in its observation, it may be replaced by the serial interval (see below).
- Serial Interval - the serial interval is the time interval between infector symptom onset and infectee symptom onset. It is frequently used in reproduction number estimation, as a substitute for the generation time.
- Latent Period - the latent period is the time interval between exposure/infection and infectiousness. It is sometimes used interchangeably with the incubation period (see below). It may also be referred to as the latency period, the exposed period, or the pre-infectious period.
- Incubation Period - the incubation period is the time interval between exposure to infection and symptom onset. It often coincides with the latent period, but may be shorter (symptom onset before infectiousness, e.g. SARS) or longer (infectiousness before symptom onset, e.g. Covid-19). It may also be referred to as the intrinsic incubation period (in the context of vector-borne diseases) or a subclinical infection.
- Infectious Period - the infectious period is the time interval during which the host remains infectious. It directly follows the latent period (see above). It may also be referred to as the infective period, the contagious period, the transmission period or the communicability period.
- Time in Care - the time in care is the time interval between admission to care and discharge from care or death. Unless there is a delay in receiving care, it directly follows the time from symptom onset to care seeking. It may vary according to health outcome and is typically highly skewed. It may also be referred to as the length of stay (LOS).
- Symptom onset to admission to care - as defined.
- Symptom onset to discharge/recovery - as defined.
- Symptom onset to death - as defined.
- Admission to care to discharge/recovery - as defined.
- Admission to care to death - as defined.
- Other Human Delays - Human delays other than those listed above may also be reported, e.g. time from seeking care to admission to care, admission to care to intubation, time spent on mechanical ventilation etc. To record these, please go to the 'Other human delay' section and fill out the start time and end time as described in the article.
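Every human delay above is an interval between a start event and an end event, which is also how an 'Other human delay' is recorded. The sketch below is only an illustration: the `Delay` type, the `HUMAN_DELAYS` lookup and the `other_delay` helper are hypothetical, and the comma check mirrors the rule that no field may contain commas.

```python
from typing import NamedTuple


class Delay(NamedTuple):
    start: str  # event that opens the interval
    end: str    # event that closes the interval


# Hypothetical lookup paraphrasing the definitions above.
HUMAN_DELAYS = {
    "Generation Time": Delay("infector infection", "infectee infection"),
    "Serial Interval": Delay("infector symptom onset", "infectee symptom onset"),
    "Latent Period": Delay("infection", "onset of infectiousness"),
    "Incubation Period": Delay("infection", "symptom onset"),
    "Infectious Period": Delay("onset of infectiousness", "end of infectiousness"),
    "Time in Care": Delay("admission to care", "discharge from care or death"),
    "Symptom onset to admission to care": Delay("symptom onset", "admission to care"),
    "Symptom onset to discharge/recovery": Delay("symptom onset", "discharge/recovery"),
    "Symptom onset to death": Delay("symptom onset", "death"),
    "Admission to care to discharge/recovery": Delay("admission to care", "discharge/recovery"),
    "Admission to care to death": Delay("admission to care", "death"),
}


def other_delay(start: str, end: str) -> Delay:
    """Record an 'Other human delay' by its start and end events, as worded in the article."""
    start, end = start.strip(), end.strip()
    if not start or not end:
        raise ValueError("both a start and an end event are required")
    if "," in start or "," in end:
        raise ValueError("do not use commas in any field; use a semi-colon instead")
    return Delay(start, end)
```

For instance, admission to care to intubation would be recorded as `other_delay("admission to care", "intubation")`.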
Parameter type - Mosquito delays are selected from the following:
- Extrinsic incubation period – The extrinsic incubation period (EIP) is the time between mosquito infection and the point at which the mosquito can transmit the pathogen. The EIP is often studied in experiments, but for these reviews we only extract EIPs obtained outside experimental conditions (e.g. estimated with a model, or otherwise obtained in a way that keeps the paper within the inclusion/exclusion criteria). Sometimes it is reported as 1/EIP, i.e. the rate of transition from mosquito infection to mosquito transmission; this is most common with mathematical models. In that case, please extract the value of 1/EIP and mark it as an inverse parameter.
- Time to viral clearance - this parameter is an artefact from previous extraction form versions and is unlikely to be useful.
- Human to mosquito generation time
- Mosquito to human generation time
As for the standard generation time, the human to mosquito and mosquito to human generation times are the time intervals between infector exposure/infection (of the human or mosquito) and infectee exposure/infection (of the mosquito or human).
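When a paper reports 1/EIP, the extracted value is a rate rather than a duration. A minimal sketch of the later conversion, assuming the rate is per day and constant, in which case its reciprocal is the mean EIP; the function name is hypothetical. Note that for the EIP the inverse flag implies a reciprocal, whereas for the RVF contribution it implies a complement.

```python
def eip_days(value: float, reported_as_inverse: bool) -> float:
    """Return the extrinsic incubation period in days.

    If the value was extracted as the rate 1/EIP (assumed to be per day),
    the mean EIP is its reciprocal; otherwise the value is already a duration.
    """
    if value <= 0:
        raise ValueError("an EIP or its rate must be positive")
    return 1.0 / value if reported_as_inverse else value
```

For example, a transition rate of 0.1 per day corresponds to a mean EIP of 10 days.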
Zika congenital syndrome (microcephaly)/miscarriage probability will most likely be reported as a proportion, percentage, or ratio with a numerator/denominator.
- Parameter type - select Zika congenital syndrome (Microcephaly) or Miscarriage rate.
- Parameter value – please extract the central estimate (proportion or percentage) if available.
- Parameter unit – select percentage or leave blank for proportion.
- Numerator and denominator - fill in the numerator and denominator. The sample size should be equal to the denominator and should only include infected pregnant women, not infants.
- Case definition - ideally, the definition should be among confirmed infected pregnant women, but please record the case definition of the infected pregnant women.
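Because the sample size should equal the denominator, a quick consistency check can catch transcription slips before the form is submitted. This is a hypothetical helper, not part of REDCap. Since we extract values as presented, a disagreement between the reported value and the numerator/denominator should be recorded in the notes and in the quality assessment, not corrected.

```python
from typing import List, Optional


def check_proportion(value: Optional[float], unit: Optional[str],
                     numerator: int, denominator: int, sample_size: int,
                     tol: float = 0.005) -> List[str]:
    """Return a list of problems with an extracted proportion (empty if none).

    `unit` is "percentage" or None (proportion); `tol` is the allowed rounding
    difference on the proportion scale (0.005 = half a percentage point).
    """
    problems = []
    if sample_size != denominator:
        problems.append("sample size should equal the denominator")
    if denominator <= 0 or not 0 <= numerator <= denominator:
        problems.append("numerator must lie between 0 and a positive denominator")
    elif value is not None:
        reported = value / 100 if unit == "percentage" else value
        if abs(reported - numerator / denominator) > tol:
            problems.append("reported value disagrees with numerator/denominator")
    return problems
```

For example, 12.5% with a numerator of 5, a denominator of 40 and a sample size of 40 raises no problems.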
Please note: Zika congenital syndrome: the outcome in the numerator could be Zika congenital syndrome (which includes microcephaly and other neurological conditions) or only microcephaly. Please extract both and record in the notes which outcome was reported. For the Zika manuscript, extracted data were later post-processed and labelled according to the outcome.
Please note: Miscarriage rate: the outcome in the numerator could refer to general pregnancy loss (any loss of a pregnancy from conception to stillbirth), miscarriage (loss in the earlier stages of pregnancy), or stillbirth (loss in the later stages of pregnancy). Please extract each and define in the notes which outcome was reported.
Parameter type - we extract case fatality ratios (CFR), infection fatality ratios (IFR), and the proportions of infections that are symptomatic and asymptomatic.
- Case fatality ratio (CFR) - the proportion of cases who end up dying of the disease. Note this depends on the case definition used, as the denominator is people identified as "cases". All CFRs should be extracted, even when a subset of the population is selected (e.g. severe cases); make sure to describe the population denominator in the context and notes.
- Infection fatality ratio (IFR) - the proportion of infected individuals who die from the disease (harder to calculate but less dependent on context).
- Symptomatic proportion of infections - the proportion of total infections that are symptomatic.
- Asymptomatic proportion of infections - the proportion of total infections that are asymptomatic.
- Parameter value - we don't do any calculations ourselves, i.e. if a paper quotes the number of deaths and number of cases but not a CFR, we don't calculate the CFR, but we can still extract the numerator and denominator as described below.
- Ratio/Prevalence/Rate values – please extract the numerator and denominator that generate the severity ratio. In line with the rule of 3, only extract the numerator and denominator of the central CFR value, even if disaggregated numerators and denominators are available. If there is no central value, do not extract any numerator or denominator. If the numerator and denominator are presented, but the percentage severity is not, extract the numerator, denominator and context, but leave the central value blank.
Ratio/Prevalence/Rate – method - we extract information about the method used to calculate CFR (or IFR), mainly whether it is:
- a "naive" method, i.e. percentage mortality computed as total deaths divided by total cases (or infections); this is biased because many cases or infections may not yet have final status information, so the naive estimate is typically an underestimate of the true CFR (or IFR).
- an adjusted method, which accounts in some way for infections or cases with unknown final status (e.g. calculates deaths / (deaths + recoveries) or uses a more sophisticated approach).
- an unknown method.
- Ratio/Prevalence/Rate – case definition – choose from confirmed, clinically diagnosed/symptomatic, lab confirmed, epidemiologically linked (contact tracing), probable, suspected, other (write in notes, if so), or unspecified. The case definition represents the denominator of the severity estimate.
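We do not calculate CFRs ourselves, but a toy example may help when deciding whether a paper used a naive or an adjusted method. The numbers are invented for illustration, and deaths / (deaths + recoveries) is only one of many possible adjustments.

```python
def naive_cfr(deaths: int, cases: int) -> float:
    """Naive CFR: total deaths / total cases, including cases whose outcome is still unknown."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


def resolved_cfr(deaths: int, recoveries: int) -> float:
    """Adjusted CFR: deaths / (deaths + recoveries), i.e. only cases with a known final status."""
    if deaths + recoveries <= 0:
        raise ValueError("no cases with a known outcome")
    return deaths / (deaths + recoveries)


# Invented outbreak snapshot: 100 cases, 20 deaths, 40 recoveries, 40 still unresolved.
print(naive_cfr(20, 100))    # 0.2
print(resolved_cfr(20, 40))  # about 0.333
```

If some of the 40 unresolved cases later die, the naive estimate rises towards the final CFR, which is why it typically underestimates during an ongoing outbreak.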
These parameters refer to estimates of seroprevalence in the paper. This may also be referred to as antibody prevalence. These parameters will all be expressed as a proportion or percentage of the population. Only extract seroprevalences from “real” populations, i.e. do not extract from papers estimating an assay's sensitivity or specificity. PCR prevalence may be reported with seroprevalence, but is not in our parameter list and should not be extracted.
Parameter type:
- IgG - the prevalence of IgG antibodies.
- IgM - the prevalence of IgM antibodies.
- PRNT - PRNT refers to a plaque reduction neutralization test, which is a test for neutralizing antibodies. This option is generalisable to other unlisted neutralising assays.
- HAI/HI - HAI refers to a hemagglutination inhibition assay, which is another test for neutralizing antibodies in the blood.
- IFA - IFA refers to an immunofluorescence assay, which detects antibodies and is used to estimate seroprevalence in a population.
- Other – specify in notes: If the assay is unlisted, choose this option and describe the test in the notes. Note that this option appears at the end of the REDCap dropdown list!
- Unspecified - if there is no assay specified, but it is indicated that some people had antibodies, then use this option. If multiple antibodies are tested for in the same test, then use this option and describe the test in the parameter notes.
Please note: If the serology test and the antibodies tested for differ, prioritise the antibodies tested for in the parameter type, not the test. Describe this difference in the notes.
- Parameter value – note that we extract null results (0%), and 0% can be inferred from the text if the result is stated only in words, e.g. "we found no seropositive individuals".
- Parameter unit – most results will be given as percentages.
- Ratio/Prevalence/Rate - numerator/denominator – please extract the numerator and denominator of the value (or central value) of the seroprevalence.
- Ratio/Prevalence/Rate – method – naïve or adjusted, where adjustments may account for survey design biases and sensitivity/specificity.
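If a paper applies any such correction, the method is adjusted. For orientation, one common correction for assay performance is the Rogan-Gladen estimator, sketched below; papers may use other approaches (e.g. Bayesian models or survey weights), so treat this as an example rather than a checklist.

```python
def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Correct an apparent (test-positive) prevalence for assay sensitivity and specificity.

    true = (apparent + specificity - 1) / (sensitivity + specificity - 1),
    clipped to [0, 1] because sampling error can push the raw estimate outside it.
    All arguments are proportions.
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1) / denominator
    return min(1.0, max(0.0, estimate))
```

For example, an apparent seroprevalence of 10% with 90% sensitivity and 95% specificity corresponds to roughly 5.9% after adjustment.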
Please note: Seroprevalence studies often use more than one assay. For example, an initial ELISA may be followed by a neutralisation test to confirm the result, e.g. because of cross-reactivity. In these cases, please extract both the initial and the neutralisation seroprevalence estimates reported in the paper, making sure to select the relevant assay type, denominator and context each time. The denominator for the neutralisation test should be as reported (for example, but not exclusively, a subset of the samples tested by ELISA).
Please note: Typically, the sample type will be serum. For some pathogens (e.g. Nipah), seroprevalence (both IgG and IgM) may be based on sample types other than serum, e.g. cerebrospinal fluid. If that is the case, please record the sample type in the notes field for that parameter extraction.
We are extracting general information about risk factors in the included papers. Choose 'Risk Factors' from the Parameter Type drop-down menu, then move to the Risk Factors section below. Papers may only test whether a risk factor is significant (e.g. with chi-squared or Fisher's exact tests) or may quantify the risk relationship (e.g. with logistic regression). We are not extracting the values of odds ratios, risk ratios, etc. or information on the direction of the risk factor (i.e. whether it increases or decreases risk), because this requires context that we are not extracting (e.g. the definition of the reference group).
- Risk factor outcome - here, choose the outcome for which the risk factor was evaluated. There may be multiple outcomes, each of which should be extracted with a new parameter form. The outcome, not the risk factor, must be associated with the pathogen of interest (e.g. we are interested in age as a risk factor for infection with the pathogen, but we are not interested in infection with the pathogen as a risk factor for infection with a different pathogen). It is sometimes difficult to distinguish between an infection risk factor and a serology risk factor, since infection is sometimes determined using a serological assay, e.g. PRNT or IgM. If the author specifies the outcome, please extract it as written. If the author does not specify it, e.g. there is just a significant difference in X between group A and group B, then extract outcomes based on PCR tests as infection risk factors, and outcomes based on any other assay (PRNT, IgM, IgG, HAI, IFA etc.) as serology risk factors.
- Risk factor name - this is the name of the risk factor, i.e. the characteristic that defines the population groups being compared, e.g. age, occupation, ...
- Risk factor occupation - if you have chosen 'Occupation' in the previous question, choose the occupation(s) that correspond(s) most closely to that described in the paper.
- Risk factor significant - choose whether the risk factor(s) is/are significant or not. We extract both univariate (naive) and multivariate (adjusted) risk factors, including when both are available for the same factor. Significance should be extracted according to the definition used in the paper, so it may not be consistent across papers. If the significance level is not 0.05, add the significance level to the notes.
- Risk factor adjusted - choose whether the estimates for the risk factors are adjusted or unadjusted. If risk factor significance is estimated using multiple methods, tick "other" under disaggregated data available and make a note in the article form. If different methods disagree over whether a risk factor is significant, extract the risk factor as both significant and insignificant.
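The default rule for unspecified outcomes (PCR-based outcomes are infection risk factors; outcomes from any other assay are serology risk factors) can be written as a small decision function. The function name and the outcome labels are hypothetical placeholders, not the exact REDCap options.

```python
from typing import Optional


def risk_factor_outcome(author_outcome: Optional[str], test: str) -> str:
    """Choose the risk factor outcome for one analysis.

    The outcome stated by the author always wins. Otherwise, outcomes defined
    by a PCR test (including RT-PCR or qPCR) count as infection, and outcomes
    defined by any other assay (PRNT, IgM, IgG, HAI, IFA, ...) as serology.
    """
    if author_outcome:
        return author_outcome
    if "PCR" in test.upper():
        return "Infection"
    return "Serology"
```

For example, an unspecified outcome measured by IgG ELISA would be recorded as a serology risk factor.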
Please note: During extraction, we group the risk factors in the paper by those that are significant/insignificant and those that are adjusted/naive. This means that for each outcome, there may be a total of 4 parameters to extract: significant/adjusted, significant/naïve, insignificant/adjusted, insignificant/naïve.
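The grouping can be sketched as follows, assuming each analysed risk factor for one outcome has been noted as a (name, significant, adjusted) triple. The function names are hypothetical, and joining names with semi-colons follows the no-commas rule for fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[bool, bool]  # (significant, adjusted)


def group_risk_factors(results: Iterable[Tuple[str, bool, bool]]) -> Dict[Key, List[str]]:
    """Group one outcome's risk factors into at most four parameters."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for name, significant, adjusted in results:
        names = groups[(significant, adjusted)]
        if name not in names:  # each factor is listed once per group
            names.append(name)
    return dict(groups)


def as_field(names: List[str]) -> str:
    """Join risk factor names for a single field using semi-colons, never commas."""
    return "; ".join(names)


results = [
    ("Age", True, False),
    ("Sex", False, False),
    ("Age", True, True),
    ("Occupation", True, False),  # significant when naive...
    ("Occupation", False, True),  # ...but not when adjusted, so it appears in two groups
]
for (significant, adjusted), names in group_risk_factors(results).items():
    print(significant, adjusted, as_field(names))
```

Here the four groups become the four parameter forms for this outcome; an empty group simply means that parameter is not extracted.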