58 | 58 | #' might produce biased predictions. In particular for mixed models, using |
59 | 59 | #' `"response"` is recommended, because averaging across random effects groups |
60 | 60 | #' is more accurate. |
61 | | -#' @param estimate Character string, indicating the type of target population |
62 | | -#' predictions refer to. This dictates how the predictions are "averaged" over |
63 | | -#' the non-focal predictors, i.e. those variables that are not specified in |
64 | | -#' `by` or `contrast`. We can roughly distinguish between "modelbased" and |
65 | | -#' "empirical" predictions. |
66 | | -#' - `"typical"` (default): Predictions are made for observations that are |
67 | | -#' represented by a data grid, which is built from all combinations of the |
68 | | -#' predictor levels in `by` (the focal predictors). `"typical"` then takes the |
69 | | -#' mean value for non-focal numeric predictors and marginalizes over the |
70 | | -#' factor levels of non-focal predictors, which computes a kind of "weighted |
70 | | -#' average" for the values at which these terms are held constant. These |
72 | | -#' predictions are useful for comparing defined "groups" and are still a good |
73 | | -#' representation of the sample, because all possible values and levels of the |
74 | | -#' non-focal predictors are considered (averaged over). It answers the |
75 | | -#' question, "What would be the average outcome for a 'typical' observation?", |
76 | | -#' where 'typical' refers to subjects represented by (i.e., that share the |
77 | | -#' characteristics from) the data grid. This approach is the one taken by |
78 | | -#' default in the `emmeans` package. |
79 | | -#' - `"average"`: Predictions are made for each observation in the sample. Then, |
80 | | -#' the average of all predictions is calculated within all groups (or levels) |
81 | | -#' of the focal predictors defined in `by`. These predictions are the closest |
82 | | -#' representation of the sample, because `estimate = "average"` averages |
83 | | -#' across the full sample, where groups (in `by`) are not represented by a |
84 | | -#' balanced data grid, but rather the empirical distributions of the |
85 | | -#' characteristics of the sample. It answers the question, "What is the |
86 | | -#' predicted value for an average observation (from a certain group in `by`) |
87 | | -#' in my data?". |
88 | | -#' - `"population"`: Each observation is "cloned" multiple times, where each |
89 | | -#' duplicate gets one of the levels from the focal predictors in `by`. We then |
90 | | -#' have one "original" and several copies of that original, each varying in |
91 | | -#' the levels of the focal predictors. Hence, the sample is replicated |
92 | | -#' multiple times to produce "counterfactuals" and then takes the average of |
93 | | -#' these predicted values (aggregated/grouped by the focal predictors). It can |
94 | | -#' be considered as extrapolation to a hypothetical target population. |
95 | | -#' Counterfactual predictions are useful, insofar as the results can also be |
96 | | -#' transferred to other contexts (Dickerman and Hernan, 2020). It answers the |
97 | | -#' question, "What is the predicted response value for the 'average' |
98 | | -#' observation in *the broader target population*?". It does not only refer to |
99 | | -#' the actual data in your observed sample, but also "what would be if" we had |
100 | | -#' more data, or if we had data from a different sample. |
101 | | -#' |
102 | | -#' In other words, the distinction between estimate types resides in whether |
103 | | -#' the predictions are made for: |
104 | | -#' - *modelbased predictions* (focus lies on _predictors_), which are useful to |
105 | | -#' look at differences between typical groups, or for visualization |
106 | | -#' - A specific individual from the sample (i.e., a specific combination of |
107 | | -#' predictor values for focal and non-focal predictors): this is what is obtained |
108 | | -#' when using [`estimate_relation()`] and the other prediction functions. |
109 | | -#' - A typical individual from the sample: obtained with |
110 | | -#' `estimate_means(..., estimate = "typical")` |
111 | | -#' - *empirical predictions* (focus lies on _predictions_ of the outcome), which |
112 | | -#' are useful if you want realistic predictions of your outcome, assuming that |
113 | | -#' the sample is representative for a special population (option `"average"`), |
114 | | -#' or useful for "what-if" scenarios, especially if you want to make unbiased |
115 | | -#' comparisons (G-computation, option `"population"`) |
116 | | -#' - The average individual from the sample: obtained with |
117 | | -#' `estimate_means(..., estimate = "average")` |
118 | | -#' - The broader, hypothetical target population: obtained with |
119 | | -#' `estimate_means(..., estimate = "population")` |
| 61 | +#' @param estimate The `estimate` argument determines how predictions are |
| 62 | +#' averaged ("marginalized") over variables not specified in `by` or `contrast` |
| 63 | +#' (non-focal predictors). It controls whether predictions represent a "typical" |
| 64 | +#' individual, an "average" individual from the sample, or an "average" |
| 65 | +#' individual from a broader population. |
| 66 | +#' - `"typical"` (default): Calculates predictions for a balanced data grid |
| 67 | +#' representing all combinations of focal predictor levels (specified in `by`). |
| 68 | +#' For non-focal numeric predictors, it uses the mean; for non-focal |
| 69 | +#' categorical predictors, it marginalizes (averages) over the levels. This |
| 70 | +#' represents a "typical" observation based on the data grid and is useful for |
| 71 | +#' comparing groups. It answers: "What would the average outcome be for a |
| 72 | +#' 'typical' observation?". This is the default approach when estimating |
| 73 | +#' marginal means using the *emmeans* package. |
| 74 | +#' - `"average"`: Calculates predictions for each observation in the sample and |
| 75 | +#' then averages these predictions within each group defined by the focal |
| 76 | +#' predictors. This reflects the sample's actual distribution of non-focal |
| 77 | +#' predictors, not a balanced grid. It answers: "What is the predicted value |
| 78 | +#' for an average observation in my data?" |
| 79 | +#' - `"population"`: "Clones" each observation, creating copies with all |
| 80 | +#' possible combinations of focal predictor levels. It then averages the |
| 81 | +#' predictions across these "counterfactual" observations (non-observed |
| 82 | +#' permutations) within each group. This extrapolates to a hypothetical |
| 83 | +#' broader population, considering "what if" scenarios. It answers: "What is |
| 84 | +#' the predicted response for the 'average' observation in a broader possible |
| 85 | +#' target population?" This approach entails more assumptions about the |
| 86 | +#' likelihood of different combinations, but may generalize better. |
120 | 87 | #' |
121 | 88 | #' You can set a default option for the `estimate` argument via `options()`, |
122 | 89 | #' e.g. `options(modelbased_estimate = "average")` |
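
The three `estimate` options described in this hunk can be sketched in base R. This is only an illustration of the averaging strategies, not the package's implementation: the `mtcars` model, the choice of `cyl` as focal predictor, and the equal weighting of non-focal factor levels for `"typical"` are all assumptions made for the example.

```r
# Illustrative sketch only (base R, no modelbased calls).
# Assumed setup: `cyl` is the focal predictor; `wt` and `am` are non-focal.
dat <- mtcars
dat$cyl <- factor(dat$cyl)
dat$am <- factor(dat$am)
model <- lm(mpg ~ cyl + wt + am, data = dat)

focal_levels <- levels(dat$cyl)

# "typical": balanced data grid. Numeric non-focal predictors are fixed at
# their mean; non-focal factor levels are averaged with equal weight
# (an assumption of this sketch).
grid <- expand.grid(
  cyl = factor(focal_levels, levels = focal_levels),
  am = factor(levels(dat$am), levels = levels(dat$am)),
  wt = mean(dat$wt)
)
grid_pred <- predict(model, newdata = grid)
typical <- c(tapply(grid_pred, grid$cyl, mean))

# "average": predict for every observed row, then average the predictions
# within the observed groups of the focal predictor.
obs_pred <- predict(model, newdata = dat)
average <- c(tapply(obs_pred, dat$cyl, mean))

# "population": clone the full sample once per focal level, overwrite the
# focal predictor in each clone, and average the counterfactual predictions.
population <- sapply(focal_levels, function(lvl) {
  clone <- dat
  clone$cyl <- factor(lvl, levels = focal_levels)
  mean(predict(model, newdata = clone))
})

print(rbind(typical, average, population))
```

Because this example model has no interactions, `"typical"` and `"population"` give the same differences between groups and differ only by a constant shift, whereas `"average"` also reflects how `wt` and `am` are distributed within each observed `cyl` group.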