Commit 273c68f

revise docs
1 parent e890354 commit 273c68f

4 files changed: +104 -254 lines changed

R/estimate_means.R

Lines changed: 26 additions & 59 deletions
@@ -58,65 +58,32 @@
 #' might produce biased predictions. In particular for mixed models, using
 #' `"response"` is recommended, because averaging across random effects groups
 #' is more accurate.
-#' @param estimate Character string, indicating the type of target population
-#' predictions refer to. This dictates how the predictions are "averaged" over
-#' the non-focal predictors, i.e. those variables that are not specified in
-#' `by` or `contrast`. We can roughly distinguish between "modelbased" and
-#' "empirical" predictions.
-#' - `"typical"` (default): Predictions are made for observations that are
-#' represented by a data grid, which is built from all combinations of the
-#' predictor levels in `by` (the focal predictors). `"typical"` then takes the
-#' mean value for non-focal numeric predictors and marginalizes over the
-#' factor levels of non-focal predictors, which computes a kind of "weighted
-#' average" for the values at which these terms are hold constant. These
-#' predictions are useful for comparing defined "groups" and are still a good
-#' representation of the sample, because all possible values and levels of the
-#' non-focal predictors are considered (averaged over). It answers the
-#' question, "What would be the average outcome for a 'typical' observation?",
-#' where 'typical' refers to subjects represented by (i.e., that share the
-#' characteristics from) the data grid. This approach is the one taken by
-#' default in the `emmeans` package.
-#' - `"average"`: Predictions are made for each observation in the sample. Then,
-#' the average of all predictions is calculated within all groups (or levels)
-#' of the focal predictors defined in `by`. These predictions are the closest
-#' representation of the sample, because `estimate = "average"` averages
-#' across the full sample, where groups (in `by`) are not represented by a
-#' balanced data grid, but rather the empirical distributions of the
-#' characteristics of the sample. It answers the question, "What is the
-#' predicted value for an average observation (from a certain group in `by`)
-#' in my data?".
-#' - `"population"`: Each observation is "cloned" multiple times, where each
-#' duplicate gets one of the levels from the focal predictors in `by`. We then
-#' have one "original" and several copies of that original, each varying in
-#' the levels of the focal predictors. Hence, the sample is replicated
-#' multiple times to produce "counterfactuals" and then takes the average of
-#' these predicted values (aggregated/grouped by the focal predictors). It can
-#' be considered as extrapolation to a hypothetical target population.
-#' Counterfactual predictions are useful, insofar as the results can also be
-#' transferred to other contexts (Dickerman and Hernan, 2020). It answers the
-#' question, "What is the predicted response value for the 'average'
-#' observation in *the broader target population*?". It does not only refer to
-#' the actual data in your observed sample, but also "what would be if" we had
-#' more data, or if we had data from a different sample.
-#'
-#' In other words, the distinction between estimate types resides in whether
-#' the prediction are made for:
-#' - *modelbased predictions* (focus lies on _predictors_), which are useful to
-#' look at differences between typical groups, or for visualization
-#' - A specific individual from the sample (i.e., a specific combination of
-#' predictor values for focal and non-focal predictors): this is what is obtained
-#' when using [`estimate_relation()`] and the other prediction functions.
-#' - A typical individual from the sample: obtained with
-#' `estimate_means(..., estimate = "typical")`
-#' - *empirical predictions* (focus lies on _predictions_ of the outcome), which
-#' are useful if you want realistic predictions of your outcome, assuming that
-#' the sample is representative for a special population (option `"average"`),
-#' or useful for "what-if" scenarios, especially if you want to make unbiased
-#' comparisons (G-computation, option `"population"`)
-#' - The average individual from the sample: obtained with
-#' `estimate_means(..., estimate = "average")`
-#' - The broader, hypothetical target population: obtained with
-#' `estimate_means(..., estimate = "population")`
+#' @param estimate The `estimate` argument determines how predictions are
+#' averaged ("marginalized") over variables not specified in `by` or `contrast`
+#' (non-focal predictors). It controls whether predictions represent a "typical"
+#' individual, an "average" individual from the sample, or an "average"
+#' individual from a broader population.
+#' - `"typical"` (Default): Calculates predictions for a balanced data grid
+#' representing all combinations of focal predictor levels (specified in `by`).
+#' For non-focal numeric predictors, it uses the mean; for non-focal
+#' categorical predictors, it marginalizes (averages) over the levels. This
+#' represents a "typical" observation based on the data grid and is useful for
+#' comparing groups. It answers: "What would the average outcome be for a
+#' 'typical' observation?". This is the default approach when estimating
+#' marginal means using the *emmeans* package.
+#' - `"average"`: Calculates predictions for each observation in the sample and
+#' then averages these predictions within each group defined by the focal
+#' predictors. This reflects the sample's actual distribution of non-focal
+#' predictors, not a balanced grid. It answers: "What is the predicted value
+#' for an average observation in my data?"
+#' - `"population"`: "Clones" each observation, creating copies with all
+#' possible combinations of focal predictor levels. It then averages the
+#' predictions across these "counterfactual" observations (non-observed
+#' permutations) within each group. This extrapolates to a hypothetical
+#' broader population, considering "what if" scenarios. It answers: "What is
+#' the predicted response for the 'average' observation in a broader possible
+#' target population? This approach entails more assumptions about the
+#' likelihood of different combinations, but can be more apt to generalize.
 #'
 #' You can set a default option for the `estimate` argument via `options()`,
 #' e.g. `options(modelbased_estimate = "average")`
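
The three averaging strategies described in the revised `estimate` documentation can be sketched by hand in base R. This is only an illustration under stated assumptions, not the package's implementation: the `mtcars` data, the additive model formula, the choice of `cyl` as the focal predictor, and the equal weighting of non-focal factor levels in the "typical" case are all assumptions made for this example.

```r
# Illustrative sketch only; data, model and focal predictor are assumptions.
d <- mtcars
d$cyl <- factor(d$cyl)  # focal predictor (the role of `by`)
d$am  <- factor(d$am)   # non-focal categorical predictor
m <- lm(mpg ~ cyl + am + wt, data = d)  # wt: non-focal numeric predictor

# "typical": balanced data grid. The non-focal numeric predictor is held at
# its mean; predictions are averaged with equal weight over the levels of
# the non-focal factor (an assumption of this sketch).
grid <- expand.grid(
  cyl = levels(d$cyl),
  am  = levels(d$am),
  wt  = mean(d$wt)
)
grid$pred <- predict(m, newdata = grid)
typical <- tapply(grid$pred, grid$cyl, mean)

# "average": predict for every observed row, then average the predictions
# within each observed group of the focal predictor.
obs_pred <- predict(m, newdata = d)
average <- tapply(obs_pred, d$cyl, mean)

# "population": clone the full sample once per focal level, set the focal
# predictor to that level for every row, and average the counterfactual
# predictions.
population <- sapply(levels(d$cyl), function(lv) {
  cf <- d
  cf$cyl <- factor(lv, levels = levels(d$cyl))
  mean(predict(m, newdata = cf))
})

print(rbind(typical, average, population))
```

In this sketch, "typical" and "population" differ only in how the non-focal predictors are weighted (balanced grid versus the whole sample's distribution), while "average" uses the distribution actually observed within each group.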

man/estimate_contrasts.Rd

Lines changed: 26 additions & 65 deletions
Generated file; diff not rendered by default.

man/estimate_means.Rd

Lines changed: 26 additions & 65 deletions
Generated file; diff not rendered by default.
