Releases: tidyverse/tidyr
tidyr 1.3.1
tidyr 1.3.0
New features
- New family of consistent string separating functions:
  `separate_wider_delim()`, `separate_wider_position()`,
  `separate_wider_regex()`, `separate_longer_delim()`, and
  `separate_longer_position()`. These functions are thorough refreshes of
  `separate()` and `extract()`, featuring improved performance, greater
  consistency, a polished API, and a new approach for handling problems. They
  use stringr and supersede `extract()`, `separate()`, and `separate_rows()`
  (#1304).
- `nest()` gains a `.by` argument which allows you to specify the columns to
  nest by (rather than the columns to nest, i.e. through `...`). Additionally,
  the `.key` argument is no longer deprecated, and is used whenever `...` isn't
  specified (#1458).
- `unnest_longer()` gains a `keep_empty` argument like `unnest()` (#1339).
- `pivot_longer()` gains a `cols_vary` argument for controlling the ordering of
  the output rows relative to their original row number (#1312).
- New datasets `who2`, `household`, `cms_patient_experience`, and
  `cms_patient_care` to demonstrate various tidying challenges (#1333).
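As a rough illustration of two of the features above, the sketch below splits a delimited column with `separate_wider_delim()` and nests with `.by`. It assumes tidyr 1.3.0 or later is installed; the data frames and column names (`df`, `scores`, `letter`, `number`) are made up for the example.

```r
library(tidyr)

# Hypothetical data: the last value has no "-" delimiter.
df <- tibble(id = 1:3, x = c("a-1", "b-2", "c"))

# Split `x` on "-" into two columns. With too_few = "align_start", rows
# with too few pieces are padded with NA on the right instead of erroring.
wide <- separate_wider_delim(
  df, x,
  delim = "-",
  names = c("letter", "number"),
  too_few = "align_start"
)

# Name the column to nest *by*; every other column goes into the
# list-column, which is called `data` by default.
scores <- tibble(g = c("a", "a", "b"), score = c(1, 2, 3))
nested <- nest(scores, .by = g)
```

Setting `too_few = "debug"` instead adds diagnostic columns to the output, which can help when it is not obvious which rows are malformed.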
Breaking changes
- The `...` argument of both `pivot_longer()` and `pivot_wider()` has been
  moved to the front of the function signature, after the required arguments
  but before the optional ones. Additionally, `pivot_longer_spec()`,
  `pivot_wider_spec()`, `build_longer_spec()`, and `build_wider_spec()` have
  all gained `...` arguments in a similar location. This change allows us to
  more easily add new features to the pivoting functions without breaking
  existing CRAN packages and user scripts.

  `pivot_wider()` provides temporary backwards compatible support for the case
  of a single unnamed argument that previously was being positionally matched to
  `id_cols`. This one special case still works, but will throw a warning
  encouraging you to explicitly name the `id_cols` argument.

  To read more about this pattern, see "Data, dots, details" in the
  tidyverse design guide (#1350).
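A minimal sketch of the recommended calling style after this change, assuming tidyr 1.3.0 or later; the data frame `df` and its columns are invented for illustration.

```r
library(tidyr)

# Hypothetical long-format data with one column (`extra`) we do not
# want to use as an identifier.
df <- tibble(
  id = c(1, 1, 2, 2),
  extra = c("p", "q", "r", "s"),
  name = c("x", "y", "x", "y"),
  value = 1:4
)

# Name `id_cols` explicitly rather than relying on positional matching.
# Columns not used as ids, names, or values (here `extra`) are dropped.
wide <- pivot_wider(df, id_cols = id, names_from = name, values_from = value)
```

Naming every optional argument keeps scripts working if further arguments are added after `...` in later releases.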
Lifecycle changes
- All functions deprecated in tidyr 1.0 and 1.2 (the old lazyeval functions
  ending in `_` and various arguments to `unnest()`) now warn on every use.
  They will be made defunct in 2024 (#1406).
Rectangling
- `unnest_longer()` now consistently drops rows with either `NULL` or empty
  vectors (like `integer()`) by default. Set the new `keep_empty` argument to
  `TRUE` to retain them. Previously, `keep_empty = TRUE` was implicitly being
  used for `NULL`, while `keep_empty = FALSE` was being used for empty vectors,
  which was inconsistent with all other tidyr verbs with this argument (#1363).
- `unnest_longer()` now uses `""` in the index column for fully unnamed
  vectors. It also now consistently uses `NA` in the index column for empty
  vectors that are "kept" by `keep_empty = TRUE` (#1442).
- `unnest_wider()` now errors if any values being unnested are unnamed and
  `names_sep` is not provided (#1367).
- `unnest_wider()` now generates automatic names for partially unnamed
  vectors. Previously it only generated them for fully unnamed vectors,
  resulting in a strange mix of automatic names and name-repaired names (#1367).
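The `keep_empty` behaviour described above can be sketched as follows, assuming tidyr 1.3.0 or later. The list-column `x` is a made-up example containing one ordinary vector, one `NULL`, and one empty vector.

```r
library(tidyr)

df <- tibble(id = 1:3, x = list(1:2, NULL, integer()))

# Default: rows whose element is NULL or empty are dropped.
dropped <- unnest_longer(df, x)

# keep_empty = TRUE: those rows are retained, with a missing value in `x`.
kept <- unnest_longer(df, x, keep_empty = TRUE)
```

Here `dropped` keeps only the two rows that come from `id == 1`, while `kept` has one additional row each for `id == 2` and `id == 3`.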
Bug fixes and minor improvements
General
- Most tidyr functions now consistently disallow renaming during tidy-selection.
  Renaming was never meaningful in these functions, and previously either had no
  effect or caused problems (#1449, #1104).
- tidyr errors (including input validation) have been thoroughly reviewed
  and should generally be more likely to point you in the right direction
  (#1313, #1400).
- `uncount()` is now generic so implementations can be provided for objects
  other than data frames (@mgirlich, #1358).
- `uncount()` gains a `...` argument. It comes between the required and the
  optional arguments (@mgirlich, #1358).
- `nest()`, `complete()`, `expand()`, and `fill()` now document their support
  for grouped data frames created by `dplyr::group_by()` (#952).
- All built in datasets are now standard tibbles (#1459).
- R >=3.4.0 is now required, in line with the tidyverse standard of supporting
  the previous 5 minor releases of R.
- rlang >=1.0.4 and vctrs >=0.5.2 are now required (#1344, #1470).
- Removed dependency on ellipsis in favor of equivalent functions in rlang
  (#1314).
Nesting, packing, and chopping
- `unnest()`, `unchop()`, `unnest_longer()`, and `unnest_wider()` better handle
  lists with additional classes (#1327).
- `pack()`, `unpack()`, `chop()`, and `unchop()` all gain an `error_call`
  argument, which in turn improves some of the error calls shown in `nest()`
  and various `unnest()` adjacent functions (#1446).
- `chop()`, `unpack()`, and `unchop()` all gain `...`, which must be empty
  (#1447).
- `unpack()` does a better job of reporting column name duplication issues and
  gives better advice about how to resolve them using `names_sep`. This also
  improves errors from functions that use `unpack()`, like `unnest()` and
  `unnest_wider()` (#1425, #1367).
Pivoting
- `pivot_longer()` no longer supports interpreting `values_ptypes = list()`
  and `names_ptypes = list()` as `NULL`. An empty `list()` is now interpreted as
  a `<list>` prototype to apply to all columns, which is consistent with how any
  other 0-length value is interpreted (#1296).
- `pivot_longer(values_drop_na = TRUE)` is faster when there aren't any missing
  values to drop (#1392, @mgirlich).
- `pivot_longer()` is now more memory efficient due to the usage of
  `vctrs::vec_interleave()` (#1310, @mgirlich).
- `pivot_longer()` now throws a slightly better error message when
  `values_ptypes` or `names_ptypes` is provided and the coercion can't be made
  (#1364).
- `pivot_wider()` now throws a better error message when a column selected by
  `names_from` or `values_from` is also selected by `id_cols` (#1318).
- `pivot_wider()` is now faster when `names_sep` is provided (@mgirlich, #1426).
- `pivot_longer_spec()`, `pivot_wider_spec()`, `build_longer_spec()`, and
  `build_wider_spec()` all gain an `error_call` argument, resulting in better
  error reporting in `pivot_longer()` and `pivot_wider()` (#1408).
Missing values
- `fill()` now works correctly when there is a column named `.direction` in
  `data` (#1319, @tjmahr).
- `replace_na()` is faster when there aren't any missing values to replace
  (#1392, @mgirlich).
- The documentation of the `replace` argument of `replace_na()` now mentions
  that `replace` is always cast to the type of `data` (#1317).
tidyr 1.2.1
- Hot patch release to resolve R CMD check failures.
tidyr 1.2.0
Pivoting
- `pivot_wider()` gains new `names_expand` and `id_expand` arguments for turning
  implicit missing factor levels and variable combinations into explicit ones.
  This is similar to the `drop` argument from `spread()` (#770).
- `pivot_wider()` gains a new `names_vary` argument for controlling the
  ordering when combining `names_from` values with `values_from` column names
  (#839).
- `pivot_wider()` gains a new `unused_fn` argument for controlling how to
  summarize unused columns that aren't involved in the pivoting process (#990,
  thanks to @mgirlich for an initial implementation).
- `pivot_longer()`'s `names_transform` and `values_transform` arguments now
  accept a single function which will be applied to all of the columns
  (#1284, thanks to @smingerson for an initial implementation).
- `pivot_longer()`'s `names_ptypes` and `values_ptypes` arguments now
  accept a single empty ptype which will be applied to all of the columns
  (#1284).
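The effect of `names_vary` and `unused_fn` is easiest to see on a small example. The sketch below assumes tidyr 1.2.0 or later; `df` and its columns are hypothetical.

```r
library(tidyr)

# One id, two time points, two value columns, and an extra column (`note`)
# that takes no part in the pivot.
df <- tibble(
  id = c(1, 1),
  when = c("a", "b"),
  x = 1:2,
  y = 3:4,
  note = c(10, 20)
)

# Default ("fastest"): the `names_from` values vary fastest, so all `x`
# columns come before all `y` columns.
fastest <- pivot_wider(
  df, id_cols = id, names_from = when, values_from = c(x, y)
)

# "slowest": columns are grouped by `names_from` value instead.
slowest <- pivot_wider(
  df, id_cols = id, names_from = when, values_from = c(x, y),
  names_vary = "slowest"
)

# unused_fn: keep `note` by summarising it within each id.
summarised <- pivot_wider(
  df, id_cols = id, names_from = when, values_from = c(x, y),
  unused_fn = max
)
```

Without `unused_fn`, a column such as `note` that is not in `id_cols`, `names_from`, or `values_from` is dropped from the result.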
Nesting
- `unnest()` and `unchop()`'s `ptype` argument now accepts a single empty
  ptype which will be applied to all `cols` (#1284).
- `unpack()` now silently skips over any non-data frame columns specified by
  `cols`. This matches the existing behavior of `unchop()` and `unnest()`
  (#1153).
Rectangling
- `unnest_wider()` and `unnest_longer()` can now unnest multiple columns at
  once (#740).
- `unnest_longer()`'s `indices_to` and `values_to` arguments now accept
  a glue specification, which is useful when unnesting multiple columns.
- For `hoist()`, `unnest_longer()`, and `unnest_wider()`, if a `ptype` is
  supplied, but that column can't be simplified, the result will be a list-of
  column where each element has type `ptype` (#998).
- `unnest_wider()` gains a new `strict` argument which controls whether or not
  strict vctrs typing rules should be applied. It defaults to `FALSE` for
  backwards compatibility, and because it is often more useful to be lax
  when unnesting JSON, which doesn't always map one-to-one with R's types
  (#1125).
- `hoist()`, `unnest_longer()`, and `unnest_wider()`'s `simplify` argument now
  accepts a named list of `TRUE` or `FALSE` to control simplification on a per
  column basis (#995).
- `hoist()`, `unnest_longer()`, and `unnest_wider()`'s `transform` argument now
  accepts a single function which will be applied to all components (#1284).
- `hoist()`, `unnest_longer()`, and `unnest_wider()`'s `ptype` argument now
  accepts a single empty ptype which will be applied to all components (#1284).
Grids
- `complete()` gains a new `explicit` argument for limiting `fill` to only
  implicit missing values. This is useful if you don't want to fill in
  pre-existing missing values (#1270).
- `complete()` gains a grouped data frame method. This generates a more correct
  completed data frame when groups are involved (#396, #966).
- `complete()` and `expand()` no longer allow you to complete or expand on a
  grouping column. This was never well-defined since completion/expansion on a
  grouped data frame happens "within" each group and otherwise has the
  potential to produce erroneous results (#1299).
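To make the distinction between implicit and pre-existing missing values concrete, here is a small sketch assuming tidyr 1.2.0 or later. The data frame `df` is invented: it has one explicit `NA` in `v` and two combinations of `g` and `t` that are absent altogether.

```r
library(tidyr)

df <- tibble(g = c("a", "b"), t = c(1, 2), v = c(NA, 5))

# Default (explicit = TRUE): `fill` is applied to both the newly created
# rows and the NA that was already present.
all_filled <- complete(df, g, t, fill = list(v = 0))

# explicit = FALSE: only rows created by complete() are filled; the
# pre-existing NA in the first row is left alone.
implicit_only <- complete(df, g, t, fill = list(v = 0), explicit = FALSE)
```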
Missing values
- `drop_na()`, `replace_na()`, and `fill()` have been updated to utilize vctrs.
  This means that you can use these functions on a wider variety of column
  types, including lubridate's Period types (#1094), data frame columns, and
  the rcrd type from vctrs.
- `replace_na()` no longer allows the type of `data` to change when the
  replacement is applied. `replace` will now always be cast to the type of
  `data` before the replacement is made. For example, this means that using a
  replacement value of `1.5` on an integer column is no longer allowed.
  Similarly, replacing missing values in a list-column must now be done with
  `list("foo")` rather than just `"foo"`.
- `replace_na()` no longer replaces empty atomic elements in list-columns
  (like `integer(0)`). The only value that is replaced in a list-column is
  `NULL` (#1168).
- `drop_na()` no longer drops empty atomic elements from list-columns
  (like `integer(0)`). The only value that is dropped in a list-column is
  `NULL` (#1228).
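The casting rule for `replace_na()` can be sketched as below, assuming tidyr 1.2.0 or later. The data frame and the replacement values are made up for illustration.

```r
library(tidyr)

df <- tibble(n = c(1L, NA), l = list(1:2, NULL))

# `0` can be cast to integer without loss, and the list-column
# replacement is wrapped in list().
out <- replace_na(df, list(n = 0, l = list("none")))

# `1.5` cannot be cast to integer without loss, so this call errors.
# The error is caught here only so the script keeps running.
bad <- tryCatch(
  replace_na(df, list(n = 1.5)),
  error = function(e) "error"
)
```

After the first call, `n` is still an integer column, which is the point of the change: the replacement adapts to the data rather than the other way round.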
Bug fixes and minor improvements
General
- @mgirlich is now a tidyr author in recognition of his significant and
  sustained contributions.
- All lazyeval variants of tidyr verbs have been soft-deprecated. Expect them
  to move to the defunct stage in the next minor release of tidyr (#1294).
- `any_of()` and `all_of()` from tidyselect are now re-exported (#1217).
- dplyr >= 1.0.0 is now required.
Pivoting
- `pivot_wider()` now gives better advice about how to identify duplicates when
  values are not uniquely identified (#1113).
- `pivot_wider()` now throws a more informative error when `values_fn` doesn't
  result in a single summary value (#1238).
- `pivot_wider()` and `pivot_longer()` now generate more informative
  errors related to name repair (#987).
- `pivot_wider()` now works correctly when `values_fill` is a data frame.
- `pivot_wider()` no longer accidentally retains `values_from` when pivoting
  a zero row data frame (#1249).
- `pivot_wider()` now correctly handles the case where an id column name
  collides with a value from `names_from` (#1107).
- `pivot_wider()` and `pivot_longer()` now both check that the spec columns
  `.name` and `.value` are character vectors. Additionally, the `.name`
  column must be unique (#1107).
- `pivot_wider()`'s `names_from` and `values_from` arguments are now
  required if their default values of `name` and `value` don't correspond to
  columns in `data`. Additionally, they must identify at least 1 column
  in `data` (#1240).
- `pivot_wider()`'s `values_fn` argument now correctly allows anonymous
  functions (#1114).
- `pivot_wider_spec()` now works correctly with a 0-row data frame and a `spec`
  that doesn't identify any rows (#1250, #1252).
- `pivot_longer()`'s `names_ptypes` argument is now applied after
  `names_transform` for consistency with the rectangling functions
  (i.e. `hoist()`) (#1233).
- `check_pivot_spec()` is a new developer facing function for validating a pivot
  `spec` argument. This is only useful if you are extending `pivot_longer()` or
  `pivot_wider()` with new S3 methods (#1087).
Nesting
- The `nest()` generic now avoids computing on `.data`, making it more
  compatible with lazy tibbles (#1134).
- The `.names_sep` argument of the data.frame method for `nest()` is now
  actually used (#1174).
- `unnest()`'s `ptype` argument now works as expected (#1158).
- `unpack()` no longer drops empty columns specified through `cols` (#1191).
- `unpack()` now works correctly with data frame columns containing 1 row but
  0 columns (#1189).
- `chop()` now works correctly with data frames with 0 rows (#1206).
- `chop()`'s `cols` argument is no longer optional. This matches the
  behavior of `cols` seen elsewhere in tidyr (#1205).
- `unchop()` now respects `ptype` when unnesting a non-list column (#1211).
Rectangling
- `hoist()` no longer accidentally removes elements that have duplicated names
  (#1259).
Grids
- The grouped data frame methods for `complete()` and `expand()` now move the
  group columns to the front of the result (in addition to the columns you
  completed on or expanded, which were already moved to the front). This should
  make more intuitive sense, as you are completing or expanding "within" each
  group, so the group columns should be the first thing you see (#1289).
- `complete()` now applies `fill` even when no columns to complete are
  specified (#1272).
- `expand()`, `crossing()`, and `nesting()` now correctly retain `NA` values of
  factors (#1275).
- `expand_grid()`, `expand()`, `nesting()`, and `crossing()` now silently
  apply name repair to automatically named inputs. This avoids a number of
  issues resulting from duplicate truncated names
  (#1116, #1221, #1092, #1037, #992).
- `expand_grid()`, `expand()`, `nesting()`, and `crossing()` now allow
  columns from unnamed data frames to be used in expressions after that
  data frame was specified, like `expand_grid(tibble(x = 1), y = x)`. This
  is more consistent with how `tibble()` behaves.
- `expand_grid()`, `expand()`, `nesting()`, and `crossing()` now work
  correctly with data frames containing 0 columns but >0 rows (#1189).
- `expand_grid()`, `expand()`, `nesting()`, and `crossing()` now return a 1
  row data frame when no inputs are supplied, which is more consistent with
  `prod() == 1L` and the idea that computations involving the number of
  combinations computed from an empty set should return 1 (#1258).
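Two of the `expand_grid()` behaviours listed above can be checked directly. This is a small sketch assuming tidyr 1.2.0 or later; the second call reuses the example given in the notes.

```r
library(tidyr)

# No inputs: the result has one row and zero columns, mirroring prod() == 1.
empty <- expand_grid()

# Columns of an unnamed data frame can be referenced by later expressions.
dependent <- expand_grid(tibble(x = 1), y = x)
```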
Missing values
tidyr 1.1.4
- `expand_grid()` is now about twice as fast and `pivot_wider()` is a bit faster
  (@mgirlich, #1130).
- `unchop()` is now much faster, which propagates through to various functions,
  such as `unnest()`, `unnest_longer()`, `unnest_wider()`, and
  `separate_rows()` (@mgirlich, @DavisVaughan, #1127).
- `unnest()` is now much faster (@mgirlich, @DavisVaughan, #1127).
- `unnest()` no longer allows unnesting a list-col containing a mix of vector
  and data frame elements. Previously, this only worked by accident, and is
  considered an off-label usage of `unnest()` that has now become an error.
tidyr 1.1.3
- tidyr verbs no longer have "default" methods for lazyeval fallbacks. This
  means that you'll get clearer error messages (#1036).
- `uncount()` now errors for non-integer weights and gives a clearer error
  message for negative weights (@mgirlich, #1069).
- `pivot_wider()` works with data.table and empty key variables
  (@mgirlich, #1066).
- `separate_rows()` works for factor columns (@mgirlich, #1058).
tidyr 1.1.2
- `separate_rows()` returns to 1.1.0 behaviour for empty strings
  (@rjpatm, #1014).
tidyr 1.1.1
- New tidyr logo!
- stringi dependency has been removed; this was a substantial dependency that
  made tidyr hard to compile in resource constrained environments
  (@rjpat, #936).
- Replace Rcpp with cpp11. See https://cpp11.r-lib.org/articles/motivations.html
  for reasons why.
tidyr 1.1.0
General features
- `pivot_longer()`, `hoist()`, `unnest_wider()`, and `unnest_longer()` gain
  new `transform` arguments; these allow you to transform values "in flight".
  They are partly needed because vctrs coercion rules have become stricter,
  but they give you greater flexibility than was available previously (#921).
- Arguments that use tidy selection syntax are now clearly documented and
  have been updated to use tidyselect 1.1.0 (#872).
Pivoting improvements
- Both `pivot_wider()` and `pivot_longer()` are considerably more performant,
  thanks largely to improvements in the underlying vctrs code
  (#790, @DavisVaughan).
- `pivot_longer()` now supports `names_to = character()` which prevents the
  name column from being created (#961).

  ```r
  df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
  df %>% pivot_longer(-id, names_to = character())
  ```
- `pivot_longer()` no longer creates a `.copy` variable in the presence of
  duplicate column names. This makes it more consistent with the handling
  of non-unique specs.
- `pivot_longer()` automatically disambiguates non-unique outputs, which can
  occur when the input variables include some additional component that you
  don't care about and want to discard (#792, #793).

  ```r
  df <- tibble(id = 1:3, x_1 = 1:3, x_2 = 4:6)
  df %>% pivot_longer(-id, names_pattern = "(.)_.")
  df %>% pivot_longer(-id, names_sep = "_", names_to = c("name", NA))
  df %>% pivot_longer(-id, names_sep = "_", names_to = c(".value", NA))
  ```
- `pivot_wider()` gains a `names_sort` argument which allows you to sort
  column names in order. The default, `FALSE`, orders columns by their
  first appearance (#839). In a future version, I'll consider changing the
  default to `TRUE`.
- `pivot_wider()` gains a `names_glue` argument that allows you to construct
  output column names with a glue specification.
- `pivot_wider()` arguments `values_fn` and `values_fill` can now be single
  values; you now only need to use a named list if you want to use different
  values for different value columns (#739, #746). They also get improved
  errors if they're not of the expected type.
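The `names_glue`, `values_fn`, and `values_fill` changes combine naturally. The following sketch assumes tidyr 1.1.0 or later; the data frame, the glue template `"val_{key}"`, and the choice of `sum` are illustrative only.

```r
library(tidyr)

# id 1 has two values for key "a" and none for "b"; id 2 has only "b".
df <- tibble(
  id = c(1, 1, 2),
  key = c("a", "a", "b"),
  val = c(1, 2, 3)
)

wide <- pivot_wider(
  df,
  names_from = key,
  values_from = val,
  names_glue = "val_{key}", # build output names from a glue template
  values_fn = sum,          # a single function, applied to every value column
  values_fill = 0           # a single fill value for missing combinations
)
```

A named list is only needed for `values_fn` or `values_fill` when different value columns require different treatment.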
Rectangling
- `hoist()` now automatically names pluckers that are a single string (#837).
  It errors if you use duplicated column names (@mgirlich, #834), and now uses
  `rlang::list2()` behind the scenes (which means that you can now use `!!!`
  and `:=`) (#801).
- `unnest_longer()`, `unnest_wider()`, and `hoist()` do a better job
  simplifying list-cols. They no longer add unneeded `unspecified()` when
  the result is still a list (#806), and work when the list contains
  non-vectors (#810, #848).
- `unnest_wider(names_sep = "")` now provides default names for unnamed inputs,
  suppressing the many previous name repair messages (#742).
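The automatic naming of single-string pluckers in `hoist()` might look like this in practice. It is a sketch assuming tidyr 1.1.0 or later; the nested `user` column and the name `first_lang` are invented for the example.

```r
library(tidyr)

# A list-column of records, as might result from parsing JSON.
df <- tibble(
  user = list(
    list(name = "Ann", langs = list("R", "SQL")),
    list(name = "Bo", langs = list("Go"))
  )
)

out <- hoist(
  df, user,
  "name",                           # single string: the column is named "name"
  first_lang = list("langs", 1L)    # deeper plucker: needs an explicit name
)
```

Both hoisted components are scalars in every row, so they are simplified to ordinary character columns.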
Nesting
- `pack()` and `nest()` gain a `.names_sep` argument that allows you to strip
  outer names from inner names, in a symmetrical way to how the same argument to
  `unpack()` and `unnest()` combines inner and outer names (#795, #797).
- `unnest_wider()` and `unnest_longer()` can now unnest `list_of` columns. This
  is important for unnesting columns created from `nest()` and with
  `pivot_wider()`, which will create `list_of` columns if the id columns are
  non-unique (#741).
Bug fixes and minor improvements
- `chop()` now creates list-columns of class `vctrs::list_of()`. This helps
  keep track of the type in case the chopped data frame is empty, allowing
  `unchop()` to reconstitute a data frame with the correct number and types
  of column even when there are no observations.
- `drop_na()` now preserves attributes of unclassed vectors (#905).
- `expand()`, `expand_grid()`, `crossing()`, and `nesting()` once again
  evaluate their inputs iteratively, so you can refer to freshly created
  columns, e.g. `crossing(x = seq(-2, 2), y = x)` (#820).
- `expand()`, `expand_grid()`, `crossing()`, and `nesting()` gain a
  `.name_repair` giving you control over their name repair strategy
  (@jeffreypullin, #798).
- `extract()` lets you use `NA` in `into`, as documented (#793).
- `extract()`, `separate()`, `hoist()`, `unnest_longer()`, and `unnest_wider()`
  give a better error message if `col` is missing (#805).
- `pack()`'s first argument is now `.data` instead of `data` (#759).
- `pivot_longer()` now errors if `values_to` is not a length-1 character vector
  (#949).
- `pivot_longer()` and `pivot_wider()` are now generic so implementations
  can be provided for objects other than data frames (#800).
- `pivot_wider()` can now pivot data frame columns (#926).
- `unite(na.rm = TRUE)` now works for all types of variable, not just character
  vectors (#765).
- `unnest_wider()` gives a better error message if you attempt to unnest
  multiple columns (#740).
- `unnest_auto()` works when the input data contains a column called `col`
  (#959).
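Two of the fixes above are simple to demonstrate. The sketch below assumes tidyr 1.1.0 or later; the first call is the example from the notes and the data frame `df` is made up.

```r
library(tidyr)

# Inputs are evaluated in order, so `y` can refer to the freshly created `x`.
grid <- crossing(x = seq(-2, 2), y = x)

# na.rm = TRUE now drops missing values from non-character columns too.
df <- tibble(a = c(1, NA), b = c("x", "y"))
united <- unite(df, "ab", a, b, na.rm = TRUE)
```

In the second row of `united`, the missing `a` is dropped before pasting, so no separator or literal "NA" appears in the result.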
tidyr 1.0.3
Compatibility with vctrs 0.3.0. Note that because of stricter semantics in vctrs, the `names_ptypes` and `values_ptypes` arguments have become less useful. We will fix this in the next version with more general transformation arguments.