Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read csv file issue #991

Closed
hehhuang opened this issue Oct 24, 2019 · 6 comments
Closed

read csv file issue #991

hehhuang opened this issue Oct 24, 2019 · 6 comments

Comments

@hehhuang
Copy link

var Excel = require("exceljs");
// read from a file
var workbook = new Excel.Workbook();
workbook.csv.readFile("3.csv")
.then(worksheet => {
// use workbook or worksheet
console.log(worksheet.getColumn(1).values);
});
/output :
[ <1 empty item>,
'name',
2001-04-30T16:00:00.000Z,
2001-04-30T16:00:00.000Z,
'george's Grocery aa55' ]
/
but the csv file is like this:
name
George's Grocery 005
Quinn's Supermarket #5
george's Grocery aa55
the issue is that if the value contains space number(or other Special symbol), it will output a date type value.For example "a 123" will output "0122-12-31T15:54:17.000Z". How to resolve this.(xlsx file can work well).

@LibertyNJ
Copy link
Contributor

I am encountering a similar issue, except it does not appear to be tied to a space or a special symbol.

String 0020Z6TNY parses as a string: 0020Z6TNY. (expected).
String 0020Z6TRL parses as a string: 0020Z6TRL. (expected).
String 00210PRG1 parses as a date: 1920-12-01T06:00:00.000Z. (unexpected).
String 00210PRGS parses as a date: 1920-12-01T05:00:00.000Z. (unexpected).
String 00211MDN2 parses as a date: 1921-01-01T07:00:00.000Z. (unexpected).

I wonder if this could be related to day.js date validation?

As a workaround I passed in an options object that specifies how each column should be parsed. This allowed me to override the behavior on a column by column basis. Changing the dateFormats part of the options object does not appear to work, which again may point back at day.js.

@LibertyNJ
Copy link
Contributor

LibertyNJ commented Nov 1, 2019

I looked into it a little bit more, and this is what I found. I would submit a pull request but there are a few different things going on between exceljs and dayjs, and I'm not sure what would be the right approach to rectifying it.

CSV.createInputStream's default map attempts to convert a cell's value into a date with dayjs. If the result passes a test as a valid date, it is returned as a date object for the value for the cell. However, the implementation of dayjs in exceljs is not extended with the customParseFormat plugin. Another issue is that it looks like dayjs does not accept an array of formats as an argument, only a single string as a primitive or a property in an object.

So instead of the intended behavior, dayjs defaults to parsing anything that passes a permissive regex test into a valid date, which exceljs then returns as the value for the cell. I tested the regex and had it pass and fail as reflected in my test cases above, but it did not do the same for hehhuang's examples.

The regex is
/^(\d{4})-?(\d{1,2})-?(\d{0,2})[^0-9]*(\d{1,2})?:?(\d{1,2})?:?(\d{1,2})?.?(\d{1,3})?$/
which is satisfied by anything that passes
/^\d{4}-?\d/

@hehhuang, is the example you presented exactly the same as you used when you experienced the issue? If so, these may be two separate issues.

@hehhuang
Copy link
Author

hehhuang commented Nov 1, 2019

Thanks for reply.My solution is to convert csv file to xlsx file that will not execute the date conversion operation.That will solve this issue correctly.

@damianaiamad
Copy link

damianaiamad commented Nov 11, 2019

I'm seeing a similar issue with data like this being converted to dates:
RN-56785
GG-74510
KS-49593

Problem occurs with ExcelJS v2 & v3.

Works ok by downgrading to v1.15.0.

@damianaiamad
Copy link

damianaiamad commented Nov 12, 2019

There is a nasty issue waiting in the wings. Even when using dayjs with customParseFormat, dayjs does not strictly enforce the format (and seems to have no option to do so). This results in non-dates still getting interpreted as dates even when using a carefully specified date format.

I hacked up exceljs to fix the issue noted by @LibertyNJ above and discovered this issue.

Consider the following program:

const dayjs = require('dayjs');
const customParseFormat = require('dayjs/plugin/customParseFormat');
dayjs.extend(customParseFormat);

const date = dayjs("NZ1019316", "D/M/YY");
console.log("Valid?", date.isValid());
console.log("Date?", date.format());

Output:

Valid? true
Date? 2032-07-10T00:00:00 10:00

See iamkun/dayjs#605

@damianaiamad
Copy link

So the reason exceljs v1 worked is because it uses momentjs rather than dayjs. Unfortunately dayjs is not a drop-in replacement in this instance. I humbly suggest reverting to momentjs.

This code in exceljs v1 (exceljs/lib/csv/csv.js) works as desired: handles array of date formats, true means date formats are strictly enforced.
v1
const dt = moment(datum, dateFormats, true);

New code: dateFormats array is invalid (can be solved with loop), true is ignored since dayjs does not have a strict mode (insurmountable with current dayjs).

v3
const dt = dayjs(datum, dateFormats, true);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants