Releases: tidyverse/tidyr

tidyr 1.3.1

24 Jan 18:11
  • pivot_wider() now uses .by and |> syntax for the dplyr helper message to
    identify duplicates (@boshek, #1516)

tidyr 1.3.0

24 Jan 21:16

New features

  • New family of consistent string separating functions:
    separate_wider_delim(), separate_wider_position(),
    separate_wider_regex(), separate_longer_delim(), and
    separate_longer_position(). These functions are thorough refreshes of
    separate() and extract(), featuring improved performance, greater
    consistency, a polished API, and a new approach for handling problems. They
    use stringr and supersede extract(), separate(), and separate_rows()
    (#1304).

  • nest() gains a .by argument which allows you to specify the columns to
    nest by (rather than the columns to nest, i.e. through ...). Additionally,
    the .key argument is no longer deprecated, and is used whenever ... isn't
    specified (#1458).

  • unnest_longer() gains a keep_empty argument like unnest() (#1339).

  • pivot_longer() gains a cols_vary argument for controlling the ordering of
    the output rows relative to their original row number (#1312).

  • New datasets who2, household, cms_patient_experience, and
    cms_patient_care to demonstrate various tidying challenges (#1333).
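
As a rough illustration of three of the features above, the following sketch assumes tidyr 1.3.0 or later is installed. The toy data frames and their column names (id, x, g, y, a, b) are made up for the example and are not taken from the release notes.

```r
library(tidyr)

# Hypothetical toy data; column names are illustrative only
df <- tibble(id = 1:2, x = c("a-b", "c-d"))

# Split `x` on "-" into two new columns, replacing the original column
wide <- separate_wider_delim(df, x, delim = "-", names = c("left", "right"))

# Nest every column except `g`, giving one row per distinct value of `g`
df2 <- tibble(g = c("a", "a", "b"), y = 1:3)
nested <- nest(df2, .by = g)

# cols_vary = "slowest" keeps the rows of each input column together
df3 <- tibble(id = 1:2, a = 1:2, b = 3:4)
long <- pivot_longer(df3, -id, cols_vary = "slowest")
```

With the default cols_vary = "fastest", the output rows would instead cycle through the pivoted columns within each original row.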

Breaking changes

  • The ... argument of both pivot_longer() and pivot_wider() has been
    moved to the front of the function signature, after the required arguments
    but before the optional ones. Additionally, pivot_longer_spec(),
    pivot_wider_spec(), build_longer_spec(), and build_wider_spec() have
    all gained ... arguments in a similar location. This change allows us to
    more easily add new features to the pivoting functions without breaking
    existing CRAN packages and user scripts.

    pivot_wider() provides temporary backwards compatible support for the case
    of a single unnamed argument that previously was being positionally matched to
    id_cols. This one special case still works, but will throw a warning
    encouraging you to explicitly name the id_cols argument.

    To read more about this pattern, see
    Data, dots, details in the
    tidyverse design guide (#1350).
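
In practice, the safest way to adapt to this change is to name every optional argument. A minimal sketch, assuming tidyr 1.3.0 or later and a made-up data frame:

```r
library(tidyr)

# Hypothetical toy data; column names are illustrative only
df <- tibble(id = c(1, 1), name = c("a", "b"), value = c(10, 20))

# Name id_cols explicitly instead of relying on positional matching
wide <- pivot_wider(df, id_cols = id, names_from = name, values_from = value)
```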

Lifecycle changes

  • All functions deprecated in tidyr 1.0 and 1.2 (the old lazyeval functions
    ending in _ and various arguments to unnest()) now warn on every use.
    They will be made defunct in 2024 (#1406).

Rectangling

  • unnest_longer() now consistently drops rows with either NULL or empty
    vectors (like integer()) by default. Set the new keep_empty argument to
    TRUE to retain them. Previously, keep_empty = TRUE was implicitly being
    used for NULL, while keep_empty = FALSE was being used for empty vectors,
    which was inconsistent with all other tidyr verbs with this argument (#1363).

  • unnest_longer() now uses "" in the index column for fully unnamed
    vectors. It also now consistently uses NA in the index column for empty
    vectors that are "kept" by keep_empty = TRUE (#1442).

  • unnest_wider() now errors if any values being unnested are unnamed and
    names_sep is not provided (#1367).

  • unnest_wider() now generates automatic names for partially unnamed
    vectors. Previously it only generated them for fully unnamed vectors,
    resulting in a strange mix of automatic names and name-repaired names (#1367).
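
The keep_empty and names_sep behaviors described above can be sketched as follows. This assumes tidyr 1.3.0 or later; the data frames are invented for illustration.

```r
library(tidyr)

# Hypothetical list-column with a regular vector, an empty vector, and NULL
df <- tibble(id = 1:3, x = list(1:2, integer(), NULL))

# Default: rows whose element is NULL or empty are dropped
dropped <- unnest_longer(df, x)

# keep_empty = TRUE retains those rows, with a missing value in `x`
kept <- unnest_longer(df, x, keep_empty = TRUE)

# Unnamed elements need names_sep so that column names can be generated
df_w <- tibble(x = list(1:2, 3:4))
widened <- unnest_wider(df_w, x, names_sep = "_")
```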

Bug fixes and minor improvements

General

  • Most tidyr functions now consistently disallow renaming during tidy-selection.
    Renaming was never meaningful in these functions, and previously either had no
    effect or caused problems (#1449, #1104).

  • tidyr errors (including input validation) have been thoroughly reviewed
    and should generally be more likely to point you in the right direction
    (#1313, #1400).

  • uncount() is now generic so implementations can be provided for objects
    other than data frames (@mgirlich, #1358).

  • uncount() gains a ... argument. It comes between the required and the
    optional arguments (@mgirlich, #1358).

  • nest(), complete(), expand(), and fill() now document their support
    for grouped data frames created by dplyr::group_by() (#952).

  • All built in datasets are now standard tibbles (#1459).

  • R >=3.4.0 is now required, in line with the tidyverse standard of supporting
    the previous 5 minor releases of R.

  • rlang >=1.0.4 and vctrs >=0.5.2 are now required (#1344, #1470).

  • Removed dependency on ellipsis in favor of equivalent functions in rlang
    (#1314).

Nesting, packing, and chopping

  • unnest(), unchop(), unnest_longer(), and unnest_wider() better handle
    lists with additional classes (#1327).

  • pack(), unpack(), chop(), and unchop() all gain an error_call
    argument, which in turn improves some of the error calls shown in nest()
    and various unnest() adjacent functions (#1446).

  • chop(), unpack(), and unchop() all gain ..., which must be empty
    (#1447).

  • unpack() does a better job of reporting column name duplication issues and
    gives better advice about how to resolve them using names_sep. This also
    improves errors from functions that use unpack(), like unnest() and
    unnest_wider() (#1425, #1367).

Pivoting

  • pivot_longer() no longer supports interpreting values_ptypes = list()
    and names_ptypes = list() as NULL. An empty list() is now interpreted as
    a <list> prototype to apply to all columns, which is consistent with how any
    other 0-length value is interpreted (#1296).

  • pivot_longer(values_drop_na = TRUE) is faster when there aren't any missing
    values to drop (#1392, @mgirlich).

  • pivot_longer() is now more memory efficient due to the usage of
    vctrs::vec_interleave() (#1310, @mgirlich).

  • pivot_longer() now throws a slightly better error message when
    values_ptypes or names_ptypes is provided and the coercion can't be made
    (#1364).

  • pivot_wider() now throws a better error message when a column selected by
    names_from or values_from is also selected by id_cols (#1318).

  • pivot_wider() is now faster when names_sep is provided (@mgirlich, #1426).

  • pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), and
    build_wider_spec() all gain an error_call argument, resulting in better
    error reporting in pivot_longer() and pivot_wider() (#1408).

Missing values

  • fill() now works correctly when there is a column named .direction in
    data (#1319, @tjmahr).

  • replace_na() is faster when there aren't any missing values to replace
    (#1392, @mgirlich).

  • The documentation of the replace argument of replace_na() now mentions
    that replace is always cast to the type of data (#1317).

tidyr 1.2.1

09 Sep 14:51
  • Hot patch release to resolve R CMD check failures.

tidyr 1.2.0

02 Feb 01:34
6152aae

Pivoting

  • pivot_wider() gains new names_expand and id_expand arguments for turning
    implicit missing factor levels and variable combinations into explicit ones.
    This is similar to the drop argument from spread() (#770).

  • pivot_wider() gains a new names_vary argument for controlling the
    ordering when combining names_from values with values_from column names
    (#839).

  • pivot_wider() gains a new unused_fn argument for controlling how to
    summarize unused columns that aren't involved in the pivoting process (#990,
    thanks to @mgirlich for an initial implementation).

  • pivot_longer()'s names_transform and values_transform arguments now
    accept a single function which will be applied to all of the columns
    (#1284, thanks to @smingerson for an initial implementation).

  • pivot_longer()'s names_ptypes and values_ptypes arguments now
    accept a single empty ptype which will be applied to all of the columns
    (#1284).
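
Two of these pivoting additions can be sketched as below, assuming tidyr 1.2.0 or later. The data frames and the column names (wave, key, val) are hypothetical.

```r
library(tidyr)

# A single function passed to names_transform applies to all name columns
df <- tibble(id = 1:2, x_1 = c(1, 2), x_2 = c(3, 4))
long <- pivot_longer(
  df, -id,
  names_to = "wave",
  names_prefix = "x_",
  names_transform = as.integer
)

# names_expand = TRUE turns the unused factor level "b" into an explicit column
df_f <- tibble(id = 1, key = factor("a", levels = c("a", "b")), val = 1)
wide <- pivot_wider(df_f, names_from = key, values_from = val, names_expand = TRUE)
```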

Nesting

  • unnest() and unchop()'s ptype argument now accepts a single empty
    ptype which will be applied to all cols (#1284).

  • unpack() now silently skips over any non-data frame columns specified by
    cols. This matches the existing behavior of unchop() and unnest()
    (#1153).

Rectangling

  • unnest_wider() and unnest_longer() can now unnest multiple columns at
    once (#740).

  • unnest_longer()'s indices_to and values_to arguments now accept
    a glue specification, which is useful when unnesting multiple columns.

  • For hoist(), unnest_longer(), and unnest_wider(), if a ptype is
    supplied, but that column can't be simplified, the result will be a list-of
    column where each element has type ptype (#998).

  • unnest_wider() gains a new strict argument which controls whether or not
    strict vctrs typing rules should be applied. It defaults to FALSE for
    backwards compatibility, and because it is often more useful to be lax
    when unnesting JSON, which doesn't always map one-to-one with R's types
    (#1125).

  • hoist(), unnest_longer(), and unnest_wider()'s simplify argument now
    accepts a named list of TRUE or FALSE to control simplification on a per
    column basis (#995).

  • hoist(), unnest_longer(), and unnest_wider()'s transform argument now
    accepts a single function which will be applied to all components (#1284).

  • hoist(), unnest_longer(), and unnest_wider()'s ptype argument now
    accepts a single empty ptype which will be applied to all components (#1284).
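
Unnesting several aligned list-columns in one call might look like this. It is a minimal sketch assuming tidyr 1.2.0 or later, with an invented data frame.

```r
library(tidyr)

# Hypothetical data: `a` and `b` have matching element lengths in each row
df <- tibble(
  id = 1:2,
  a = list(1:2, 3L),
  b = list(c("x", "y"), "z")
)

# Both list-columns are unnested together, row by row
long <- unnest_longer(df, c(a, b))
```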

Grids

  • complete() gains a new explicit argument for limiting fill to only
    implicit missing values. This is useful if you don't want to fill in
    pre-existing missing values (#1270).

  • complete() gains a grouped data frame method. This generates a more correct
    completed data frame when groups are involved (#396, #966).

  • complete() and expand() no longer allow you to complete or expand on a
    grouping column. This was never well-defined since completion/expansion on a
    grouped data frame happens "within" each group and otherwise has the
    potential to produce erroneous results (#1299).
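
The difference made by the explicit argument can be seen in a small sketch like the one below, which assumes tidyr 1.2.0 or later and uses made-up data.

```r
library(tidyr)

# Hypothetical data: the first row already holds an explicit NA in `v`
df <- tibble(g = c("a", "b"), t = c(1, 2), v = c(NA, 5))

# Default (explicit = TRUE): implicit and pre-existing missing values are filled
filled_all <- complete(df, g, t, fill = list(v = 0))

# explicit = FALSE: only the newly created combinations are filled
filled_implicit <- complete(df, g, t, fill = list(v = 0), explicit = FALSE)
```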

Missing values

  • drop_na(), replace_na(), and fill() have been updated to utilize vctrs.
    This means that you can use these functions on a wider variety of column
    types, including lubridate's Period types (#1094), data frame columns, and
    the rcrd type from vctrs.

  • replace_na() no longer allows the type of data to change when the
    replacement is applied. replace will now always be cast to the type of
    data before the replacement is made. For example, this means that using a
    replacement value of 1.5 on an integer column is no longer allowed.
    Similarly, replacing missing values in a list-column must now be done with
    list("foo") rather than just "foo".

  • replace_na() no longer replaces empty atomic elements in list-columns
    (like integer(0)). The only value that is replaced in a list-column is
    NULL (#1168).

  • drop_na() no longer drops empty atomic elements from list-columns
    (like integer(0)). The only value that is dropped in a list-column is
    NULL (#1228).
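
The stricter replace_na() casting rules can be illustrated roughly as follows, assuming tidyr 1.2.0 or later. The data frame is invented for the example.

```r
library(tidyr)

# Hypothetical data with an integer column and a list-column
df <- tibble(n = c(1L, NA), l = list("a", NULL))

# Replacements must be castable to the column type; list-columns need list()
out <- replace_na(df, list(n = 0L, l = list("foo")))

# A lossy replacement (1.5 into an integer column) is rejected with an error
err <- tryCatch(replace_na(df, list(n = 1.5)), error = function(e) e)
```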

Bug fixes and minor improvements

General

  • @mgirlich is now a tidyr author in recognition of his significant and
    sustained contributions.

  • All lazyeval variants of tidyr verbs have been soft-deprecated. Expect them
    to move to the defunct stage in the next minor release of tidyr (#1294).

  • any_of() and all_of() from tidyselect are now re-exported (#1217).

  • dplyr >= 1.0.0 is now required.

Pivoting

  • pivot_wider() now gives better advice about how to identify duplicates when
    values are not uniquely identified (#1113).

  • pivot_wider() now throws a more informative error when values_fn doesn't
    result in a single summary value (#1238).

  • pivot_wider() and pivot_longer() now generate more informative
    errors related to name repair (#987).

  • pivot_wider() now works correctly when values_fill is a data frame.

  • pivot_wider() no longer accidentally retains values_from when pivoting
    a zero row data frame (#1249).

  • pivot_wider() now correctly handles the case where an id column name
    collides with a value from names_from (#1107).

  • pivot_wider() and pivot_longer() now both check that the spec columns
    .name and .value are character vectors. Additionally, the .name
    column must be unique (#1107).

  • pivot_wider()'s names_from and values_from arguments are now
    required if their default values of name and value don't correspond to
    columns in data. Additionally, they must identify at least 1 column
    in data (#1240).

  • pivot_wider()'s values_fn argument now correctly allows anonymous
    functions (#1114).

  • pivot_wider_spec() now works correctly with a 0-row data frame and a spec
    that doesn't identify any rows (#1250, #1252).

  • pivot_longer()'s names_ptypes argument is now applied after
    names_transform for consistency with the rectangling functions
    (i.e. hoist()) (#1233).

  • check_pivot_spec() is a new developer facing function for validating a pivot
    spec argument. This is only useful if you are extending pivot_longer() or
    pivot_wider() with new S3 methods (#1087).

Nesting

  • The nest() generic now avoids computing on .data, making it more
    compatible with lazy tibbles (#1134).

  • The .names_sep argument of the data.frame method for nest() is now
    actually used (#1174).

  • unnest()'s ptype argument now works as expected (#1158).

  • unpack() no longer drops empty columns specified through cols (#1191).

  • unpack() now works correctly with data frame columns containing 1 row but
    0 columns (#1189).

  • chop() now works correctly with data frames with 0 rows (#1206).

  • chop()'s cols argument is no longer optional. This matches the
    behavior of cols seen elsewhere in tidyr (#1205).

  • unchop() now respects ptype when unnesting a non-list column (#1211).

Rectangling

  • hoist() no longer accidentally removes elements that have duplicated names
    (#1259).

Grids

  • The grouped data frame methods for complete() and expand() now move the
    group columns to the front of the result (in addition to the columns you
    completed on or expanded, which were already moved to the front). This should
    make more intuitive sense, as you are completing or expanding "within" each
    group, so the group columns should be the first thing you see (#1289).

  • complete() now applies fill even when no columns to complete are
    specified (#1272).

  • expand(), crossing(), and nesting() now correctly retain NA values of
    factors (#1275).

  • expand_grid(), expand(), nesting(), and crossing() now silently
    apply name repair to automatically named inputs. This avoids a number of
    issues resulting from duplicate truncated names
    (#1116, #1221, #1092, #1037, #992).

  • expand_grid(), expand(), nesting(), and crossing() now allow
    columns from unnamed data frames to be used in expressions after that
    data frame was specified, like expand_grid(tibble(x = 1), y = x). This
    is more consistent with how tibble() behaves.

  • expand_grid(), expand(), nesting(), and crossing() now work
    correctly with data frames containing 0 columns but >0 rows (#1189).

  • expand_grid(), expand(), nesting(), and crossing() now return a 1
    row data frame when no inputs are supplied, which is more consistent with
    prod() == 1L and the idea that computations involving the number of
    combinations computed from an empty set should return 1 (#1258).
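
Two of the grid behaviors above, sketched under the assumption that tidyr 1.2.0 or later is installed (the inputs are illustrative only):

```r
library(tidyr)

# Columns of an unnamed data frame can be used in later expressions
grid <- expand_grid(tibble(x = 1:2), y = x * 10)

# With no inputs, the result has one row and no columns
empty <- expand_grid()
```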

Missing values

  • drop_na() no longer drops missing values from all columns when a tidyselect
    expression that results in 0 columns being selected is used (#1227).

  • fill() now treats NaN like any other missing value (#982).

tidyr 1.1.4

27 Sep 21:20
  • expand_grid() is now about twice as fast and pivot_wider() is a bit faster
    (@mgirlich, #1130).

  • unchop() is now much faster, which propagates through to various functions,
    such as unnest(), unnest_longer(), unnest_wider(), and
    separate_rows() (@mgirlich, @DavisVaughan, #1127).

  • unnest() is now much faster (@mgirlich, @DavisVaughan, #1127).

  • unnest() no longer allows unnesting a list-col containing a mix of vector
    and data frame elements. Previously, this only worked by accident, and is
    considered an off-label usage of unnest() that has now become an error.

tidyr 1.1.3

03 Mar 12:39
  • tidyr verbs no longer have "default" methods for lazyeval fallbacks. This
    means that you'll get clearer error messages (#1036).

  • uncount() now errors for non-integer weights and gives a clearer error message
    for negative weights (@mgirlich, #1069).

  • You can once again unnest dates (#1021, #1089).

  • pivot_wider() works with data.table and empty key variables (@mgirlich, #1066).

  • separate_rows() works for factor columns (@mgirlich, #1058).

tidyr 1.1.2

27 Aug 12:27
  • separate_rows() returns to 1.1.0 behaviour for empty strings
    (@rjpatm, #1014).

tidyr 1.1.1

31 Jul 15:16

tidyr 1.1.0

20 May 13:24

General features

  • pivot_longer(), hoist(), unnest_wider(), and unnest_longer() gain
    new transform arguments; these allow you to transform values "in flight".
    They are partly needed because vctrs coercion rules have become stricter,
    but they give you greater flexibility than was available previously (#921).

  • Arguments that use tidy selection syntax are now clearly documented and
    have been updated to use tidyselect 1.1.0 (#872).

Pivoting improvements

  • Both pivot_wider() and pivot_longer() are considerably more performant,
    thanks largely to improvements in the underlying vctrs code
    (#790, @DavisVaughan).

  • pivot_longer() now supports names_to = character() which prevents the
    name column from being created (#961).

    library(tidyr)
    df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
    # names_to = character() drops the name column; only id and value remain
    df %>% pivot_longer(-id, names_to = character())
    
  • pivot_longer() no longer creates a .copy variable in the presence of
    duplicate column names. This makes it more consistent with the handling
    of non-unique specs.

  • pivot_longer() automatically disambiguates non-unique outputs, which can
    occur when the input variables include some additional component that you
    don't care about and want to discard (#792, #793).

    df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
    # Each call discards the numeric suffix, so x_1 and x_2 map to the same name
    df %>% pivot_longer(-id, names_pattern = "(.)_.")
    df %>% pivot_longer(-id, names_sep = "_", names_to = c("name", NA))
    df %>% pivot_longer(-id, names_sep = "_", names_to = c(".value", NA))
    
  • pivot_wider() gains a names_sort argument which allows you to sort
    column names in order. The default, FALSE, orders columns by their
    first appearance (#839). In a future version, I'll consider changing the
    default to TRUE.

  • pivot_wider() gains a names_glue argument that allows you to construct
    output column names with a glue specification.

  • pivot_wider() arguments values_fn and values_fill can now be single
    values; you now only need to use a named list if you want to use different
    values for different value columns (#739, #746). They also get improved
    errors if they're not of the expected type.
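
The names_glue and single-function values_fn features might be used as in the sketch below. It assumes tidyr 1.1.0 or later; the data frames and the "{key}_val" template are made up for illustration.

```r
library(tidyr)

# Build output column names from a glue template over the names_from column
df <- tibble(id = c(1, 1), key = c("a", "b"), val = c(1, 2))
wide <- pivot_wider(df, names_from = key, values_from = val,
                    names_glue = "{key}_val")

# A single function in values_fn summarizes duplicates in every value column
df_dup <- tibble(id = c(1, 1), key = c("a", "a"), val = c(1, 2))
summed <- pivot_wider(df_dup, names_from = key, values_from = val,
                      values_fn = sum)
```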

Rectangling

  • hoist() now automatically names pluckers that are a single string (#837).
    It now errors if you use duplicated column names (@mgirlich, #834), and now uses
    rlang::list2() behind the scenes (which means that you can now use !!!
    and :=) (#801).

  • unnest_longer(), unnest_wider(), and hoist() do a better job
    simplifying list-cols. They no longer add unneeded unspecified() when
    the result is still a list (#806), and work when the list contains
    non-vectors (#810, #848).

  • unnest_wider(names_sep = "") now provides default names for unnamed inputs,
    suppressing the many previous name repair messages (#742).
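
Automatic naming of single-string pluckers in hoist() can be sketched like this, assuming tidyr 1.1.0 or later and a hypothetical list-column called meta:

```r
library(tidyr)

# Hypothetical list-column of records; names are illustrative only
df <- tibble(meta = list(
  list(name = "a", n = 1),
  list(name = "b", n = 2)
))

# The plucker "name" is a single string, so the new column is named "name"
out <- hoist(df, meta, "name")
```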

Nesting

  • pack() and nest() gain a .names_sep argument that allows you to strip outer
    names from inner names, in a symmetrical way to how the same argument to
    unpack() and unnest() combines inner and outer names (#795, #797).

  • unnest_wider() and unnest_longer() can now unnest list_of columns. This
    is important for unnesting columns created from nest() and with
    pivot_wider(), which will create list_of columns if the id columns are
    non-unique (#741).
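
The name stripping performed by .names_sep can be sketched as follows. This assumes tidyr 1.1.0 or later; the column names x_a and x_b are invented for the example.

```r
library(tidyr)

# Hypothetical data whose column names share the outer prefix "x_"
df <- tibble(g = 1, x_a = 1, x_b = 2)

# The outer name "x" and the separator are stripped from the inner names
nested <- nest(df, x = c(x_a, x_b), .names_sep = "_")
inner_names <- names(nested$x[[1]])
```

Calling unnest() on the result with the same separator would combine the outer and inner names again.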

Bug fixes and minor improvements

  • chop() now creates list-columns of class vctrs::list_of(). This helps
    keep track of the type in case the chopped data frame is empty, allowing
    unchop() to reconstitute a data frame with the correct number and types
    of column even when there are no observations.

  • drop_na() now preserves attributes of unclassed vectors (#905).

  • expand(), expand_grid(), crossing(), and nesting() once again
    evaluate their inputs iteratively, so you can refer to freshly created
    columns, e.g. crossing(x = seq(-2, 2), y = x) (#820).

  • expand(), expand_grid(), crossing(), and nesting() gain a
    .name_repair argument, giving you control over their name repair strategy
    (@jeffreypullin, #798).

  • extract() lets you use NA in into, as documented (#793).

  • extract(), separate(), hoist(), unnest_longer(), and unnest_wider()
    give a better error message if col is missing (#805).

  • pack()'s first argument is now .data instead of data (#759).

  • pivot_longer() now errors if values_to is not a length-1 character vector
    (#949).

  • pivot_longer() and pivot_wider() are now generic so implementations
    can be provided for objects other than data frames (#800).

  • pivot_wider() can now pivot data frame columns (#926).

  • unite(na.rm = TRUE) now works for all types of variable, not just character
    vectors (#765).

  • unnest_wider() gives a better error message if you attempt to unnest
    multiple columns (#740).

  • unnest_auto() works when the input data contains a column called col
    (#959).

tidyr 1.0.3

07 May 12:03
ca5ed76

Compatibility with vctrs 0.3.0. Note that because of stricter semantics in vctrs, the names_ptypes and values_ptypes arguments have become less useful. We will fix this in the next version with more general transformation arguments.