hoist {tidyr} | R Documentation |
hoist(), unnest_longer(), and unnest_wider() provide tools for
rectangling, collapsing deeply nested lists into regular columns.
hoist() allows you to selectively pull components of a list-column out
into their own top-level columns, using the same syntax as purrr::pluck().
unnest_wider() turns each element of a list-column into a column, and
unnest_longer() turns each element of a list-column into a row.
unnest_auto() picks between unnest_wider() and unnest_longer()
based on heuristics described below.
Learn more in vignette("rectangle").
hoist(
.data,
.col,
...,
.remove = TRUE,
.simplify = TRUE,
.ptype = NULL,
.transform = NULL
)
unnest_longer(
data,
col,
values_to = NULL,
indices_to = NULL,
indices_include = NULL,
names_repair = "check_unique",
simplify = TRUE,
ptype = NULL,
transform = NULL
)
unnest_wider(
data,
col,
names_sep = NULL,
simplify = TRUE,
strict = FALSE,
names_repair = "check_unique",
ptype = NULL,
transform = NULL
)
unnest_auto(data, col)
.data, data |
A data frame. |
.col, col |
List-column to extract components from. For For |
... |
Components of .col to turn into columns. The column names must be unique in a call to hoist(). |
.remove |
If |
.simplify, simplify |
If |
.ptype, ptype |
Optionally, a named list of prototypes declaring the desired output type of each component. Alternatively, a single empty prototype can be supplied, which will be applied to all components. Use this argument if you want to check that each element has the type you expect when simplifying. If a |
.transform, transform |
Optionally, a named list of transformation functions applied to each component. Alternatively, a single function can be supplied, which will be applied to all components. Use this argument if you want to transform or parse individual elements as they are extracted. When both |
values_to |
A string giving the column name (or names) to store the
unnested values in. If multiple columns are specified in |
indices_to |
A string giving the column name (or names) to store the
inner names or positions (if not named) of the values. If multiple
columns are specified in |
indices_include |
A single logical value specifying whether or not to
add an index column. If any value has inner names, the index column will be
a character vector of those names, otherwise it will be an integer vector
of positions. If If |
names_repair |
Used to check that output data frame has valid names. Must be one of the following options:
See |
names_sep |
If If the values being unnested are unnamed and |
strict |
A single logical specifying whether or not to apply strict
vctrs typing rules. If |
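As a rough illustration of how a few of these arguments combine, the sketch below uses small made-up tibbles. The `pets` and `score_df` data and the output column names `score` and `key` are assumptions chosen for illustration, not part of the original examples.

```r
library(tidyr)

# Hypothetical data for illustration only
pets <- tibble(
  id = 1:2,
  info = list(
    list(species = "cat", colour = "grey"),
    list(species = "dog", colour = "brown")
  )
)

# hoist(): pull out `species`, upper-casing each element via .transform and
# declaring the expected output type via .ptype
hoisted <- hoist(
  pets, info,
  species = "species",
  .transform = list(species = toupper),
  .ptype = list(species = character())
)
hoisted

# unnest_longer(): choose the names of the value and index columns
score_df <- tibble(
  id = 1:2,
  scores = list(c(a = 1, b = 2), c(a = 3))
)
long <- unnest_longer(score_df, scores, values_to = "score", indices_to = "key")
long
```

Because `.remove = TRUE` by default, `species` no longer appears inside `info` after hoisting; the remaining components stay in the list-column.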
The three unnest() functions differ in how they change the shape of the
output data frame:
unnest_wider() preserves the rows, but changes the columns.
unnest_longer() preserves the columns, but changes the rows.
unnest() can change both rows and columns.
These principles guide their behaviour when they are called with a
non-primary data type. For example, if you unnest_wider() a list of data
frames, the number of rows must be preserved, so each column is turned into
a list column of length one. Or if you unnest_longer() a list of data
frames, the number of columns must be preserved, so it creates a packed
column. I'm not sure if these behaviours are useful in practice, but
they are theoretically pleasing.
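The shape rules for unnest_wider() and unnest_longer() can be checked on a small example. This is a minimal sketch with a made-up tibble (`shape_df` is a hypothetical name); it uses an unnamed list-column so that unnest_longer() does not need an extra index column.

```r
library(tidyr)

# Hypothetical data for illustration only
shape_df <- tibble(x = 1:2, y = list(1:3, 4:5))

wide <- unnest_wider(shape_df, y, names_sep = "_")
long <- unnest_longer(shape_df, y)

# unnest_wider(): same number of rows as the input, more columns
nrow(wide) == nrow(shape_df)

# unnest_longer(): same number of columns as the input, more rows
ncol(long) == ncol(shape_df)
```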
unnest_auto() heuristics
unnest_auto() inspects the inner names of the list-col:
If all elements are unnamed, it uses unnest_longer(indices_include = FALSE).
If all elements are named, and there's at least one name in common across all components, it uses unnest_wider().
Otherwise, it falls back to unnest_longer(indices_include = TRUE).
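The three cases above can be tried out with small inputs. The tibbles below are made up for illustration (their names and contents are assumptions), and unnest_auto() should report which function it chose in a message.

```r
library(tidyr)

# Case 1: all elements unnamed
unnamed <- tibble(id = 1:2, vals = list(1:2, 3L))
auto_unnamed <- unnest_auto(unnamed, vals)

# Case 2: all elements named, with the name `a` in common
shared <- tibble(id = 1:2, vals = list(c(a = 1, b = 2), c(a = 3, c = 4)))
auto_shared <- unnest_auto(shared, vals)

# Case 3: all elements named, but no name in common
disjoint <- tibble(id = 1:2, vals = list(c(a = 1), c(b = 2)))
auto_disjoint <- unnest_auto(disjoint, vals)

names(auto_unnamed)
names(auto_shared)
names(auto_disjoint)
```

Comparing the column names of the three results shows which direction was picked in each case, and whether an index column was added.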
For complex inputs where you need to rectangle a nested list according to a specification, see the tibblify package.
library(tidyr)

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "blue tang",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df
# Turn all components of metadata into columns
df %>% unnest_wider(metadata)
# Choose not to simplify list-cols of length-1 elements
df %>% unnest_wider(metadata, simplify = FALSE)
df %>% unnest_wider(metadata, simplify = list(color = FALSE))
# Extract only specified components
df %>% hoist(metadata,
  "species",
  first_film = list("films", 1L),
  third_film = list("films", 3L)
)
df %>%
  unnest_wider(metadata) %>%
  unnest_longer(films)
# unnest_longer() is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
df %>% unnest_longer(y)
# Automatically creates names if widening
df %>% unnest_wider(y)
# But you'll usually want to provide names_sep:
df %>% unnest_wider(y, names_sep = "_")
# And similarly if the vectors are named
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_wider(y)
df %>% unnest_longer(y)
# Both unnest_wider() and unnest_longer() allow you to unnest multiple
# columns at once. This is particularly useful with unnest_longer(), where
# unnesting sequentially would generate a cartesian product of the rows.
df <- tibble(
  x = 1:2,
  y = list(1:2, 3:4),
  z = list(5:6, 7:8)
)
unnest_longer(df, c(y, z))
unnest_longer(unnest_longer(df, y), z)
# With JSON, it is common for empty elements to be represented by `list()`
# rather than their typed equivalent, like `integer()`
json <- list(
  list(x = 1:2, y = 1:2),
  list(x = list(), y = 3:4),
  list(x = 3L, y = list())
)
df <- tibble(json = json)
# The defaults of `unnest_wider()` treat empty types (like `list()`) as `NULL`.
# This chains nicely into `unnest_longer()`.
wide <- unnest_wider(df, json)
wide
unnest_longer(wide, c(x, y))
# To instead enforce strict vctrs typing rules, use `strict`
wide_strict <- unnest_wider(df, json, strict = TRUE)
wide_strict
try(unnest_longer(wide_strict, c(x, y)))