hoist {tidyr} R Documentation

Rectangle a nested list into a tidy tibble

Description

Maturing lifecycle

hoist(), unnest_longer(), and unnest_wider() provide tools for rectangling, that is, collapsing deeply nested lists into regular columns. hoist() allows you to selectively pull components of a list-column out into their own top-level columns, using the same syntax as purrr::pluck(). unnest_wider() turns each element of a list-column into a column, and unnest_longer() turns each element of a list-column into a row. unnest_auto() picks between unnest_wider() and unnest_longer() based on the heuristics described below.

Learn more in vignette("rectangle").

Usage

hoist(.data, .col, ..., .remove = TRUE, .simplify = TRUE,
  .ptype = list())

unnest_longer(data, col, values_to = NULL, indices_to = NULL,
  indices_include = NULL, names_repair = "check_unique",
  simplify = TRUE, ptype = list())

unnest_wider(data, col, names_sep = NULL, simplify = TRUE,
  names_repair = "check_unique", ptype = list())

unnest_auto(data, col)

Arguments

.data, data

A data frame.

.col, col

List-column to extract components from.

...

Components of .col to turn into columns in the form col_name = "pluck_specification". You can pluck by name with a character vector, by position with an integer vector, or with a combination of the two with a list. See purrr::pluck() for details.

.remove

If TRUE, the default, will remove extracted components from .col. This ensures that each value lives only in one place.
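As a rough illustration of .remove, the sketch below hoists the same component twice, once with the default and once with .remove = FALSE, and then looks at what is left in the list-column. The data frame df_rm and its contents are made up for this example.

```r
library(tidyr)

# Hypothetical data: one list-column with two named components per row
df_rm <- tibble(
  id = 1:2,
  meta = list(
    list(species = "dragon", color = "black"),
    list(species = "clownfish", color = "blue")
  )
)

# Default (.remove = TRUE): the hoisted component is taken out of meta
removed <- hoist(df_rm, meta, species = "species")
names(removed$meta[[1]])

# .remove = FALSE: the component is copied and meta is left untouched
kept <- hoist(df_rm, meta, species = "species", .remove = FALSE)
names(kept$meta[[1]])
```

Keeping the default avoids having the same value stored both in the new column and inside the list-column.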

.simplify

If TRUE, will attempt to simplify lists of length-1 vectors to an atomic vector.

.ptype

Optionally, a named list of prototypes declaring the desired output type of each component.
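The following sketch shows how .simplify and .ptype might be combined with hoist(). The data frame df_pt and the component names n and tag are hypothetical, and the integer prototype is only one possible choice; the cast is expected to succeed here because the stored doubles are whole numbers.

```r
library(tidyr)

# Hypothetical data: each row stores a length-1 double under "n"
df_pt <- tibble(
  id = 1:2,
  meta = list(
    list(n = 1, tag = "a"),
    list(n = 2, tag = "b")
  )
)

# Default: length-1 components are simplified to an atomic vector
simplified <- hoist(df_pt, meta, n = "n")
class(simplified$n)

# .simplify = FALSE: the new column stays a list
as_list <- hoist(df_pt, meta, n = "n", .simplify = FALSE)
class(as_list$n)

# .ptype: request an integer column for the component named "n"
typed <- hoist(df_pt, meta, n = "n", .ptype = list(n = integer()))
class(typed$n)
```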

values_to

Name of column to store vector values. Defaults to col.

indices_to

A string giving the name of the column which will contain the inner names or positions (if not named) of the values. Defaults to col with an _id suffix.

indices_include

Add an index column? Defaults to TRUE when col has inner names.
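A small sketch of how values_to and indices_to can be used together with unnest_longer(). The column names "value" and "name" and the data frame df_idx are chosen only for this illustration.

```r
library(tidyr)

# Hypothetical data: a list-column of named numeric vectors
df_idx <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)

# Store the values in "value" and the inner names in "name"
long <- unnest_longer(df_idx, y, values_to = "value", indices_to = "name")
long
```

Each element of y becomes its own row, so the two input rows expand to five output rows.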

names_repair

Used to check that output data frame has valid names. Must be one of the following options:

  • "minimal": no name repair or checks, beyond basic existence.

  • "unique": make sure names are unique and not empty.

  • "check_unique": (the default) no name repair, but check they are unique.

  • "universal": make the names unique and syntactic.

  • a function: apply custom name repair.

  • tidyr_legacy: use the name repair from tidyr 0.8.

  • a formula: a purrr-style anonymous function (see rlang::as_function()).

See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.
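To make the difference between "check_unique" and "unique" concrete, here is a hedged sketch in which an outer column and an inner component share the same name. The data frame df_clash is invented for the example, and the exact form of the repaired names may vary between versions, so only their uniqueness is relied on.

```r
library(tidyr)

# Hypothetical data: the outer column `a` clashes with the inner name `a`
df_clash <- tibble(a = 1, y = list(list(a = 2)))

# With the default "check_unique", the duplicated name is expected to
# raise an error; tryCatch() turns that into a logical flag
clash_errors <- tryCatch(
  {
    unnest_wider(df_clash, y)
    FALSE
  },
  error = function(e) TRUE
)
clash_errors

# With "unique", the duplicated names are repaired instead
repaired <- unnest_wider(df_clash, y, names_repair = "unique")
names(repaired)
```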

simplify

If TRUE, will attempt to simplify lists of length-1 vectors to an atomic vector.

ptype

Optionally, supply a data frame prototype for the output cols, overriding the default that will be guessed from the combination of individual values.

names_sep

If NULL, the default, the names of new columns will come directly from the inner data frame.

If a string, the names of the new columns will be formed by pasting together the outer column name with the inner names, separated by names_sep.
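The effect of names_sep can be sketched as follows; df_sep is a made-up data frame, and "_" is just one possible separator.

```r
library(tidyr)

# Hypothetical data: a list-column of named numeric vectors
df_sep <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)

# Default: new columns take the inner names directly
wide_default <- unnest_wider(df_sep, y)
names(wide_default)

# With names_sep: inner names are prefixed with the outer column name
wide_prefixed <- unnest_wider(df_sep, y, names_sep = "_")
names(wide_prefixed)
```

Prefixing is mainly useful when several list-columns share inner names, since it keeps the widened columns distinguishable.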

Unnest variants

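The three unnest() functions differ in how they change the shape of the output data frame:

  • unnest_wider() preserves the rows, but changes the columns.

  • unnest_longer() preserves the columns, but changes the rows.

  • unnest() can change both rows and columns.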

These principles guide their behaviour when they are called with a non-primary data type. For example, if you unnest_wider() a list of data frames, the number of rows must be preserved, so each column is turned into a list column of length one. Or if you unnest_longer() a list of data frames, the number of columns must be preserved, so it creates a packed column. I'm not sure whether these behaviours are useful in practice, but they are theoretically pleasing.
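The shape rules can be checked on a small case. In the sketch below, df_shape is a hypothetical data frame, and indices_include = FALSE is set explicitly so that unnest_longer() does not add an index column for the named vectors.

```r
library(tidyr)

# Hypothetical data: two rows, one list-column of named vectors
df_shape <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)

# unnest_wider(): same number of rows, more columns
wider <- unnest_wider(df_shape, y)
dim(wider)

# unnest_longer(): same number of columns, more rows
longer <- unnest_longer(df_shape, y, indices_include = FALSE)
dim(longer)
```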

unnest_auto() heuristics

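unnest_auto() inspects the inner names of the list-col:

  • If all elements are unnamed, it uses unnest_longer(indices_include = FALSE).

  • If all elements are named, and there's at least one name in common across all components, it uses unnest_wider().

  • Otherwise, it falls back to unnest_longer(indices_include = TRUE).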
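A minimal sketch of unnest_auto() on two made-up inputs, one whose elements have no inner names and one whose elements share inner names. It assumes that unnamed elements lead to a longer result and that elements with common names lead to a wider result; unnest_auto() normally prints a message saying which function it picked.

```r
library(tidyr)

# Hypothetical data: elements without inner names
df_unnamed <- tibble(x = 1:2, y = list(1:3, 4:5))
auto_long <- unnest_auto(df_unnamed, y)
dim(auto_long)

# Hypothetical data: elements that share the inner names a and b
df_named <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
auto_wide <- unnest_auto(df_named, y)
dim(auto_wide)
```

Because the choice depends on the data, calling unnest_wider() or unnest_longer() directly is the more predictable option in scripts.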

Examples

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "clownfish",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df

# Turn all components of metadata into columns
df %>% unnest_wider(metadata)

# Extract only specified components
df %>% hoist(metadata,
  species = "species",
  first_film = list("films", 1L),
  third_film = list("films", 3L)
)

df %>%
  unnest_wider(metadata) %>%
  unnest_longer(films)

# unnest_longer() is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
df %>% unnest_longer(y)

# Automatically creates names if widening
df %>% unnest_wider(y)

# And similarly if the vectors are named
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_wider(y)
df %>% unnest_longer(y)


[Package tidyr version 1.0.0 Index]