-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
col-cache: optimize for performance #1489
Conversation
lib/utils/col-cache.js
Outdated
@@ -107,41 113,95 @@ const colCache = { | |||
|
|||
// check if value looks like an address | |||
validateAddress(value) { | |||
if (!value.match(/^[A-Z] \d $/)) { | |||
// if (!value.match(/^[A-Z] \d $/)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please benchmark if your new implementation is faster than this:
const addressRegex = /^[A-Z] \d $/;
// ...
validateAddress(value) {
if (!addressRegex.test(value)) {
throw new Error(`Invalid Address: ${value}`);
}
return true;
}
This one would be a lot easier to read and maintain :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npm run benchmark shows that addressRegex.test
is slightly slower, I'll switch to this if you wish.
npm run benchmark on myfreeer:col-cache-performance(8a47ae7) with validateAddress implementation mentioned above
> [email protected] benchmark D:\UserData\projects\exceljs
> node --expose-gc benchmark
####################################################
WARMUP: Current memory usage: 8.87 MB
WARMUP: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file streams profiling finished in 6090ms
WARMUP: Current memory usage (before GC): 46.75 MB
WARMUP: Current memory usage (after GC): 9.6 MB
####################################################
RUN 1: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file streams profiling finished in 6063ms
RUN 1: Current memory usage (before GC): 44.24 MB
RUN 1: Current memory usage (after GC): 9.75 MB
####################################################
RUN 2: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file streams profiling finished in 5970ms
RUN 2: Current memory usage (before GC): 67.58 MB
RUN 2: Current memory usage (after GC): 9.74 MB
####################################################
RUN 3: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file streams profiling finished in 6220ms
RUN 3: Current memory usage (before GC): 69.27 MB
RUN 3: Current memory usage (after GC): 9.79 MB
####################################################
WARMUP: Current memory usage: 9.47 MB
WARMUP: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file async iteration profiling finished in 6015ms
WARMUP: Current memory usage (before GC): 40.56 MB
WARMUP: Current memory usage (after GC): 9.83 MB
####################################################
RUN 1: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file async iteration profiling finished in 5956ms
RUN 1: Current memory usage (before GC): 37.71 MB
RUN 1: Current memory usage (after GC): 9.82 MB
####################################################
RUN 2: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file async iteration profiling finished in 5870ms
RUN 2: Current memory usage (before GC): 37.96 MB
RUN 2: Current memory usage (after GC): 9.85 MB
####################################################
RUN 3: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file async iteration profiling finished in 5902ms
RUN 3: Current memory usage (before GC): 39.57 MB
RUN 3: Current memory usage (after GC): 9.88 MB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah let's do it - other than that, the PR looks great!
Doing a profiling in chrome dev tools shows that the function decodeAddress and validateAddress in lib/utils/col-cache.js are taking considerable amount of time, mainly on the cache lookup and regexp matching. This would make that faster, especially with large worksheets.
8a47ae7
to
e8731f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!
Summary
Doing a profiling in chrome dev tools shows that the function
decodeAddress
andvalidateAddress
inlib/utils/col-cache.js
are taking considerable amount of time, mainly on the cache lookup and regexp matching. This would make that faster, especally with large worksheets.Test plan
Existing unit test and
npm run benchmark
result below.Env: Windows 10 x64, node.js 12.18.4 LTS x64
npm run benchmark on master(a682091)
npm run benchmark on myfreeer:col-cache-performance(8a47ae7)