complete {tidyr}

R Documentation

Complete a data frame with missing combinations of data

Description

Turns implicit missing values into explicit missing values. This is a wrapper around expand(), dplyr::full_join() and replace_na() that's useful for completing missing combinations of data.
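The relationship described above can be sketched by chaining the three functions by hand. This is an illustration rather than the package's actual implementation; the data frame `df` and its columns `x`, `y` and `n` are made up for the example.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical data: the combination (x = 2, y = "b") is implicitly missing
df <- tibble(
  x = c(1, 1, 2),
  y = c("a", "b", "a"),
  n = c(10, 20, 30)
)

# Manual version: expand to all combinations, join the data back on,
# then replace the NA created for the new row
manual <- df %>%
  expand(x, y) %>%
  full_join(df, by = c("x", "y")) %>%
  replace_na(list(n = 0))

# The same request expressed with complete()
direct <- complete(df, x, y, fill = list(n = 0))

manual
direct
```

Both results should contain four rows, with the added `(2, "b")` row carrying `n = 0`.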

Usage

complete(data, ..., fill = list(), explicit = TRUE)

Arguments

`data`	A data frame.
`...`	Specification of columns to expand. Columns can be atomic vectors or lists. Because `complete()` wraps `expand()`, the forms below are written with `expand()` but apply equally to `complete()`. To find all unique combinations of `x`, `y` and `z`, including those not present in the data, supply each variable as a separate argument: `expand(df, x, y, z)`. To find only the combinations that occur in the data, use `nesting`: `expand(df, nesting(x, y, z))`. You can combine the two forms. For example, `expand(df, nesting(school_id, student_id), date)` would produce a row for each present school-student combination for all possible dates. When used with factors, `expand()` uses the full set of levels, not just those that appear in the data. If you want to use only the values seen in the data, use `forcats::fct_drop()`. When used with continuous variables, you may need to fill in values that do not appear in the data: to do so, use expressions like `year = 2010:2020` or `year = full_seq(year, 1)`.
`fill`	A named list that for each variable supplies a single value to use instead of `NA` for missing combinations.
`explicit`	Should both implicit (newly created) and explicit (pre-existing) missing values be filled by `fill`? By default, this is `TRUE`, but if set to `FALSE` this will limit the fill to only implicit missing values.
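The notes on continuous variables and factors in the `...` argument can be illustrated with a small sketch. The data frame `obs` and its columns are hypothetical, and base R's `droplevels()` is used here as a stand-in for `forcats::fct_drop()` so that the example needs no extra package.

```r
library(tidyr)

# Hypothetical data: year 2011 is absent, and level "M" is never observed
obs <- tibble::tibble(
  year = c(2010, 2012, 2013),
  size = factor(c("S", "L", "S"), levels = c("S", "M", "L")),
  n = c(1, 2, 3)
)

# Continuous variable: full_seq() fills the gap, so 2011 gets a row of NAs
by_year <- complete(obs, year = full_seq(year, 1))

# Factor: all declared levels are used, so "M" appears in the expansion
all_levels <- expand(obs, size)

# Dropping unused levels first limits the expansion to observed values
seen_levels <- expand(obs, size = droplevels(size))

by_year
all_levels
seen_levels
```

In this sketch `by_year` should have one row per year from 2010 to 2013, `all_levels` three rows, and `seen_levels` two.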

Details

With grouped data frames, complete() operates within each group. Because of this, you cannot complete a grouping column.
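A short sketch of the difference between ungrouped and grouped completion, using a made-up `sales` data frame (the names `store`, `product` and `units` are assumptions for illustration only):

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical data: store "B" has never sold product "y"
sales <- tibble(
  store = c("A", "A", "B"),
  product = c("x", "y", "x"),
  units = c(5, 3, 7)
)

# Ungrouped: every store is crossed with every product seen anywhere,
# which adds a row for store "B", product "y"
ungrouped <- complete(sales, store, product)

# Grouped by store: only the products seen within each store are used,
# so no rows are added for this data
grouped <- sales %>%
  group_by(store) %>%
  complete(product)

ungrouped
grouped
```

Note that `store` is not passed to `complete()` in the grouped call, since it is the grouping column.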

Examples

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble(
  group = c(1:2, 1, 2),
  item_id = c(1:2, 2, 3),
  item_name = c("a", "a", "b", "b"),
  value1 = c(1, NA, 3, 4),
  value2 = 4:7
)
df

# Generate all possible combinations of `group`, `item_id`, and `item_name`
# (whether or not they appear in the data)
complete(df, group, item_id, item_name)

# Cross all possible `group` values with the unique pairs of
# `(item_id, item_name)` that already exist in the data
complete(df, group, nesting(item_id, item_name))

# Within each `group`, generate all possible combinations of
# `item_id` and `item_name` that occur in that group
df %>%
  group_by(group) %>%
  complete(item_id, item_name)

# You can also choose to fill in missing values. By default, both implicit
# (new) and explicit (pre-existing) missing values are filled.
complete(
  df,
  group,
  nesting(item_id, item_name),
  fill = list(value1 = 0, value2 = 99)
)

# You can limit the fill to only implicit missing values by setting
# `explicit` to `FALSE`
complete(
  df,
  group,
  nesting(item_id, item_name),
  fill = list(value1 = 0, value2 = 99),
  explicit = FALSE
)

Package tidyr version 1.2.0