-
Notifications
You must be signed in to change notification settings - Fork 991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent data.table assignment by reference behaviour #759
Comments
Thanks @tunaaa once again for the report. Side note: It's okay to just link to the SO issue. In fact it'd be nice if you edited your question to include the original link to your issues so that we can update the post once a decision is made and the issue is closed. |
Updated SO post. |
I wonder if it makes sense to extend this feature for grouping operations? library(data.table
dt = data.table(a=1)
dt[a > 2, x1 := "col will appear"]
# a x1
#1: 1 NA
dt[a > 2, x2 := "col wont appear", by=a]
# a x1
#1: 1 NA It would be probably quite costly to determine data type for NA columns. |
Now, the inconsistency is the other way around, unfortunately. As a matter of fact I have to fix a bunch of code which is relying on the old behaviour of version 1.9.6 and 1.9.7 (until recently).
used to yield in
Now, with the latests commit it returns
Please, note that my If this feature will stay in
I'm happy to hear about any other suggestions which will save me from the additional work to fix this. Potential solutionPerhaps, we have to distinguish here between a simple select statement in At least, this needs to be added to the list of POTENTIALLY BREAKING CHANGES in |
I'm sorry, what is your expected output? I may have missed it |
@UweBlock current approach is more consistent, it always add a column, and fills its will values depending on |
@jangorecki Agree. That was the thinking. To make the shape and type of the output not depend on data (whether the i expression returns all-FALSE). Assuming by 'current approach' you mean 'new approach'. @UweBlock Two questions. On :
What is this expression please (roughly) and why does it break on empty input? I'm thinking we should focus on your RHS of := so that it returns numeric(), integer() or character() on empty input. And you must have switches in your current code after this statement then? To cope with the new b column existing or not depending on whether i returns all-FALSE or not? |
Here's an example of old behaviour we're thinking is undesirable :
New behaviour :
|
Thank you for your immediate reactions and suggestions, @MichaelChirico, @jangorecki, @mattdowle. Two updates:
Now, with the fixes in place I feel comfortable with the new approach. However, I will spent some more time on the MWE. |
Thanks for the update @UweBlock. We'll keep the change then unless anything else comes to light and move the news item up to the 'potentially breaking changes' section as you suggested. |
I did something very similar to this in data.table 1.9.6:
now (since 1.9.8), the same code gives:
In hundreds of different piece of code I used the 1.9.6 inconsistency as a no-op case, expecting that the stop would never be evaluated if the RHS had a length of 0. I understand the purpose of the change in data.table and I am looking for the best way to remediate my code and possibly an option to restore old functionality to give me time to remediate my code. Would it be possible add an option to restore this "inconsistency", like what was done for the other potentially breaking changes introduced in 1.9.8? |
@brianbolt all breaking changes were listed in news file so be sure to check there. Also be aware if your application works fine you don't have to update package and your code. Multiple versions of same package can be managed by using R libraries. This is pretty old change already, I don't think there is easy way to suppress evaluation here. |
@jangorecki I don't feel Nov 2016 (when 1.9.8 was released) is that old and other possibly breaking changes were given options (to be removed in 2 years) for using the old behavior in a deprecated way. That being said, I am aware that installing the old package in a local library is possible and it is what I am doing as a workaround for now until I can remediate my code. For anyone else that might need to do this. Here is 1 way to install 1.9.6:
|
I find this part of the documentation very confusing now as it seems to indicate that I am using data.table correctly. Essentially as a for loop.
In my case, on which rows = 0 so I would expect (like in a normal for loop) that j would not be evaluated. I really do think this should be an option if not the default. |
@brianbolt I see your point, but another way of thinking about it is that evaluation of
|
just got bit by this... maybe worth adding to the jan's suggestion to the FAQ? e.g., I just tried:
And was (somewhat) surprised to see the error:
|
@MichaelChirico Late reply by 7 months (!) but are you still surprised by that error? It looks good and safe behaviour to return error in that case, to me. |
Hmm. Can't say I remember what exactly I was trying to do. Certainly my syntax looks weird. |
Original SO post: http://stackoverflow.com/questions/25145112/inconsistent-data-table-assignment-by-reference-behaviour
When assigning by reference with a
data.table
using a column from a seconddata.table
, the results are inconsistent. When there are no matches by the key columns of bothdata.table
s, it appears the assigment expressiony := y
is totally ignored - not evenNA
s are returned.However, when there is a partial match, non-matching columns have a placeholder
NA
.This wreaks havoc on later code that assumes a
y
column exists in all cases. Otherwise I keep having to write cumbersome additional checks to take into account both cases.Is there an elegant way around this inconsistency?
The text was updated successfully, but these errors were encountered: