Submitted by: James Sams; Assigned to: Nobody; R-Forge link
I have a file with three fields: two string fields and an integer field. In 99% of cases, the string fields aren't even quoted due to their simplicity. However, I have one line in a file that looks like:
233,"A ""EMBEDDED"" QUOTE FIELD",morechars
fread fails to read this line: it treats the second quote as closing the string field and then expects a separator:
# "Expected sep (',') but '"' ends field 2 on line 828 when reading data:".
(Actual data not used due to confidentiality concerns.)
read.csv properly interprets this as three columns:
1) 233
2) A "EMBEDDED" QUOTE FIELD
3) morechars
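The read.csv behaviour described above can be reproduced with a minimal sketch like the following. The sample text is hypothetical (the real data is confidential): the first row is an invented plain row, and the second row is the example line from this report.

```r
# Hypothetical sample: an invented plain row, then the problem line from the report.
csv_text <- '1,plain,abc\n233,"A ""EMBEDDED"" QUOTE FIELD",morechars\n'

# Base R read.csv treats a doubled quote inside a quoted field as one literal quote.
df <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

print(ncol(df))   # number of columns parsed
print(df$V2[2])   # second field of the problem line
```

With header = FALSE, read.csv names the columns V1, V2 and V3, so df$V2[2] is the field that contained the doubled quotes.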
IME, there are two ways that CSV-type files handle embedded quotes: with a backslash escape (\") and by doubling them up, as is done here (""). Well, at least two unambiguous ways. Note that it isn't uncommon to see this kind of field without the outer quotes. The reason for this, as I understand it, is that some programs only include the outer quotes if the field contains the designated field separator. Otherwise, these programs rely on the escaping mechanism (either backslash or doubling) to handle single or double quotes, etc. Of course, CSV files aren't standardized, so there may be other cases. Hopefully this is helpful information though.
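To make the two conventions concrete, here is a rough sketch of unescaping a single raw field under either style. The function name unescape_field and its style argument are hypothetical, made up for illustration; this is not how fread or read.csv work internally, and it ignores edge cases such as a backslash-escaped quote at the very end of an unquoted field.

```r
# Hypothetical helper: turn one raw CSV field into its literal value.
# style = "doubled" treats "" as an embedded quote; "backslash" treats \" as one.
unescape_field <- function(field, style = c("doubled", "backslash")) {
  style <- match.arg(style)
  # Strip one pair of enclosing quotes, if the field has them.
  if (nchar(field) >= 2 && startsWith(field, '"') && endsWith(field, '"')) {
    field <- substr(field, 2, nchar(field) - 1)
  }
  if (style == "doubled") {
    gsub('""', '"', field, fixed = TRUE)    # "" becomes "
  } else {
    gsub('\\"', '"', field, fixed = TRUE)   # \" becomes "
  }
}

quoted   <- unescape_field('"A ""EMBEDDED"" QUOTE FIELD"')
unquoted <- unescape_field('A ""EMBEDDED"" QUOTE FIELD')
escaped  <- unescape_field('"A \\"EMBEDDED\\" QUOTE FIELD"', style = "backslash")
```

All three calls are meant to yield the same literal value, which covers the quoted, unquoted and backslash-escaped variants mentioned above.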
I see several other bug reports about fread's handling of quoted fields, but this seems to be a different issue than the others. Thus the separate report. Apologies if you consider it to be a duplicate report.
Embedded quotes and doubled-up quotes should now be handled in v1.9.4, whether or not they are inside a quoted field. The report seems to be from much earlier this year. There's still a problem if an embedded newline occurs after a doubled-up quote. To do: check and add more tests on this one, document, add to README and close.