--- title: "Confidence intervals and tests in emmeans" author: "emmeans package, Version `r packageVersion('emmeans')`" output: emmeans::.emm_vignette vignette: > %\VignetteIndexEntry{Confidence intervals and tests} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE, results = "hide", message = FALSE} require("emmeans") knitr::opts_chunk$set(fig.width = 4.5, class.output = "ro") ``` ## Contents {#contents} This vignette describes various ways of summarizing `emmGrid` objects. 1. [`summary()`, `confint()`, and `test()`](#summary) 2. [Back-transforming to response scale](#tran) (See also the ["transformations" vignette](transformations.html)) 3. [Multiplicity adjustments](#adjust) 4. [Using "by" variables](#byvars) 5. [Joint (omnibus) tests](#joint) 6. [Testing equivalence, noninferiority, nonsuperiority](#equiv) 7. Graphics (in ["basics" vignette](basics.html#plots)) [Index of all vignette topics](vignette-topics.html) ## `summary()`, `confint()`, and `test()` {#summary} The most important method for `emmGrid` objects is `summary()`. For one thing, it is called by default when you display an `emmeans()` result. The `summary()` function has a lot of options, and the detailed documentation via `help("summary.emmGrid")` is worth a look. For ongoing illustrations, let's re-create some of the objects in the ["basics" vignette](basics.html) for the `pigs` example: ```{r} pigs.lm1 <- lm(log(conc) ~ source + factor(percent), data = pigs) pigs.rg <- ref_grid(pigs.lm1) pigs.emm.s <- emmeans(pigs.rg, "source") ``` Just `summary()` by itself will produce a summary that varies somewhat according to context. It does this by setting different defaults for the `infer` argument, which consists of two logical values, specifying confidence intervals and tests, respectively. [The exception is models fitted using MCMC methods, where `summary()` is diverted to the `hpd.summary()` function, a preferable summary for many Bayesians.] The summary of a newly made reference grid will show just estimates and standard errors, but not confidence intervals or tests (that is, `infer = c(FALSE, FALSE)`). The summary of an `emmeans()` result, as we see above, will have intervals, but no tests (i.e., `infer = c(TRUE, FALSE)`); and the result of a `contrast()` call (see [comparisons and contrasts](comparisons.html)) will show test statistics and *P* values, but not intervals (i.e., `infer = c(FALSE, TRUE)`). There are courtesy methods `confint()` and `test()` that just call `summary()` with the appropriate `infer` setting; for example, ```{r} test(pigs.emm.s) ``` It is not particularly useful, though, to test these EMMs against the default of zero -- which is why tests are not usually shown. It makes a lot more sense to test them against some target concentration, say 40. And suppose we want to do a one-sided test to see if the concentration is greater than 40. Remembering that the response is log-transformed in this model, ```{r} test(pigs.emm.s, null = log(40), side = ">") ``` It is also possible to add calculated columns to the summary, via the `calc` argument. The calculations can include any columns up through `df` in the summary, as well as any variable in the object's `grid` slot. 
It is also possible to add calculated columns to the summary, via the `calc` argument. The calculations can include any columns up through `df` in the summary, as well as any variable in the object's `grid` slot. Among the latter are usually weights in a column named `.wgt.`, and we can use that to include sample size in the summary:
```{r}
confint(pigs.emm.s, calc = c(n = ~.wgt.))
```

[Back to Contents](#contents)

## Back-transforming {#tran}

Transformations and link functions are supported in several ways in **emmeans**, making this a complex topic worthy of [its own vignette](transformations.html). Here, we show just the most basic approach. Namely, specifying the argument `type = "response"` will cause the displayed results to be back-transformed to the response scale, when a transformation or link function is incorporated in the model. For example, let's try the preceding `test()` call again:
```{r}
test(pigs.emm.s, null = log(40), side = ">", type = "response")
```
Note what changes and what doesn't change. In the `test()` call, we *still* use the log of 40 as the null value; `null` must always be specified on the linear-prediction scale, in this case the log. In the output, the displayed estimates, as well as the `null` value, are shown back-transformed. As well, the standard errors are altered (using the delta method). However, the *t* ratios and *P* values are identical to the preceding results. That is, the tests themselves are still conducted on the linear-predictor scale (as is noted in the output).

Similar statements apply to confidence intervals on the response scale:
```{r}
confint(pigs.emm.s, side = ">", level = .90, type = "response")
```
With `side = ">"`, a *lower* confidence limit is computed on the log scale, then that limit is back-transformed to the response scale. (We have also illustrated how to change the confidence level.)

[Back to Contents](#contents)

## Multiplicity adjustments {#adjust}

Both tests and confidence intervals may be adjusted for simultaneous inference. Such adjustments ensure that the confidence coefficient for a whole set of intervals is at least the specified level, or control for multiplicity in a whole family of tests. This is done via the `adjust` argument. For `ref_grid()` and `emmeans()` results, the default is `adjust = "none"`. For most `contrast()` results, the default is something else, depending on what type of contrasts are created. For example, pairwise comparisons default to `adjust = "tukey"`, i.e., the Tukey HSD method. The `summary()` function sometimes *changes* `adjust` if it is inappropriate. For example, with
```{r}
confint(pigs.emm.s, adjust = "tukey")
```
the adjustment is changed to the Sidak method, because the Tukey adjustment is inappropriate unless you are doing pairwise comparisons.

###### {#adjmore}
An adjustment method that is usually appropriate is Bonferroni; however, it can be quite conservative. Using `adjust = "mvt"` is the closest to being the "exact" all-around "single-step" method, as it uses the multivariate *t* distribution (and the **mvtnorm** package) with the same covariance structure as the estimates to determine the adjustment. However, this comes at high computational expense, as the computations are done using simulation techniques. For a large set of tests (and especially confidence intervals), the computational lag becomes noticeable if not intolerable.
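With only three EMMs in the present family, though, the computation is quick enough to try. Here is a small added illustration (results may differ slightly between runs, since simulation is involved):
```{r}
# "mvt" adjustment for a family of three simultaneous intervals
confint(pigs.emm.s, adjust = "mvt")
```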
For tests, `adjust` increases the *P* values over those otherwise obtained with `adjust = "none"`. Compare the following adjusted tests with the unadjusted ones previously computed.
```{r}
test(pigs.emm.s, null = log(40), side = ">", adjust = "bonferroni")
```

[Back to Contents](#contents)

## "By" variables {#byvars}

Sometimes you want to break a summary down into smaller pieces; for this purpose, the `by` argument in `summary()` is useful. For example,
```{r}
confint(pigs.rg, by = "source")
```
If there is also an `adjust` in force when `by` variables are used, the adjustment is made *separately* for each `by` group; e.g., in the above, we would be adjusting for sets of 4 intervals, not all 12 together.

There can be a `by` specification in `emmeans()` (or equivalently, a `|` in the formula); and if so, it is passed on to `summary()` and used unless overridden by another `by`. Here are examples, not run:
```{r eval = FALSE}
emmeans(pigs.lm1, ~ percent | source)   ### same results as above
summary(.Last.value, by = "percent")    ### grouped the other way
```

Specifying `by = NULL` will remove all grouping.

### Simple comparisons {#simple}

There is also a `simple` argument for `contrast()` that is in essence the inverse of `by`; the contrasts are run using everything *except* the specified variables as `by` variables. To illustrate, let's consider the model for `pigs` that includes the interaction (so that the levels of one factor compare differently at levels of the other factor):
```{r}
pigsint.lm <- lm(log(conc) ~ source * factor(percent), data = pigs)
pigsint.rg <- ref_grid(pigsint.lm)
contrast(pigsint.rg, "consec", simple = "percent")
```
In fact, we may do *all* one-factor comparisons by specifying `simple = "each"`. This typically produces a lot of output, so use it with care.

[Back to Contents](#contents)

## Joint tests {#joint}

From the above, we already know how to test individual results. For pairwise comparisons (details in [the "comparisons" vignette](comparisons.html)), we might do
```{r}
pigs.prs.s <- pairs(pigs.emm.s)
pigs.prs.s
```
But suppose we want an *omnibus* test that all these comparisons are zero. Easy enough, using the `joint` argument in `test()` (note: the `joint` argument is *not* available in `summary()`; only in `test()`):
```{r}
test(pigs.prs.s, joint = TRUE)
```
Notice that there are three comparisons, but only 2 d.f. for the test, as cautioned in the message. The test produced with `joint = TRUE` is a "type III" test (assuming the default equal weights are used to obtain the EMMs). See more on these types of tests for higher-order effects in the ["interactions" vignette section on contrasts](interactions.html#contrasts).

###### {#joint_tests}
For convenience, there is also a `joint_tests()` function that performs joint tests of contrasts among each term in a model or `emmGrid` object.
```{r}
joint_tests(pigsint.rg)
```
The tests of main effects are of families of contrasts; those for interaction effects are for interaction contrasts. These results are essentially the same as a "Type III ANOVA", but they may differ when there are empty cells or other non-estimability issues, or when generalizations such as unequal weighting are involved. (Another distinction is that sums of squares and mean squares are not shown; that is because these really are tests of contrasts among predictions, and they may or may not correspond to model sums of squares.)

One may use `by` variables with `joint_tests()`. For example:
```{r}
joint_tests(pigsint.rg, by = "source")
```

In some models, it is possible to specify `submodel = "type2"`, thereby obtaining something akin to a Type II analysis of variance.
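A minimal sketch of that usage (added here, not run), assuming the fitted model supports the `submodel` option:
```{r eval = FALSE}
# joint tests akin to a Type II ANOVA
joint_tests(pigsint.lm, submodel = "type2")
```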
See the [messy-data vignette](messy-data.html#type2submodel) for a worked example.

[Back to Contents](#contents)

## Testing equivalence, noninferiority, and nonsuperiority {#equiv}

The `delta` argument in `summary()` or `test()` allows the user to specify a threshold value to use in a test of equivalence, noninferiority, or nonsuperiority. An equivalence test is a kind of backwards significance test, where small *P* values are associated with small differences relative to the specified threshold value `delta`. The help page for `summary.emmGrid` gives the details of these tests. Suppose, in the present example, that we consider two sources to be equivalent if they are within 25% of each other. We can test this as follows:
```{r}
test(pigs.prs.s, delta = log(1.25), adjust = "none")
```
By our 25% standard, the *P* value is quite small for comparing soy and skim, providing statistical evidence that their difference is enough smaller than the threshold to consider them equivalent.

[Back to Contents](#contents)

## Graphics {#graphics}

Graphical displays of `emmGrid` objects are described in the ["basics" vignette](basics.html#plots).

[Index of all vignette topics](vignette-topics.html)