col-cache: optimize for performance #1489

Merged
merged 1 commit into exceljs:master from myfreeer:col-cache-performance
Oct 5, 2020

Conversation

@myfreeer (Contributor) commented Oct 5, 2020

Summary

Profiling in Chrome DevTools shows that the functions decodeAddress and validateAddress in lib/utils/col-cache.js take a considerable amount of time, mainly on cache lookups and regexp matching. This change makes them faster, especially with large worksheets.
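
For context, the general idea is to decode an A1-style address with character codes instead of a regexp capture. A minimal sketch of that idea follows; it is purely illustrative and the actual change in lib/utils/col-cache.js may differ in detail:

// Illustrative sketch only (not the code in this PR): decoding an A1-style
// address with charCodeAt instead of a regexp capture.
function decodeAddressSketch(value) {
  let col = 0;
  let row = 0;
  let i = 0;
  // column letters: 'A' (65) .. 'Z' (90)
  while (i < value.length) {
    const c = value.charCodeAt(i);
    if (c < 65 || c > 90) break;
    col = col * 26 + (c - 64);
    i++;
  }
  // row digits: '0' (48) .. '9' (57)
  while (i < value.length) {
    const c = value.charCodeAt(i);
    if (c < 48 || c > 57) break;
    row = row * 10 + (c - 48);
    i++;
  }
  return {col, row, address: value};
}

// decodeAddressSketch('AB12') -> { col: 28, row: 12, address: 'AB12' }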

Test plan

Existing unit tests and the npm run benchmark results below.

Env: Windows 10 x64, node.js 12.18.4 LTS x64

npm run benchmark on master (a682091)
> [email protected] benchmark D:\UserData\projects\exceljs
> node --expose-gc benchmark


####################################################
WARMUP: Current memory usage: 8.89 MB
WARMUP: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file streams profiling finished in 8354ms
WARMUP: Current memory usage (before GC): 153.16 MB
WARMUP: Current memory usage (after GC): 41.63 MB

####################################################
RUN 1: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file streams profiling finished in 8174ms
RUN 1: Current memory usage (before GC): 116.57 MB
RUN 1: Current memory usage (after GC): 25.67 MB

####################################################
RUN 2: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file streams profiling finished in 8048ms
RUN 2: Current memory usage (before GC): 142.09 MB
RUN 2: Current memory usage (after GC): 41.6 MB

####################################################
RUN 3: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file streams profiling finished in 8242ms
RUN 3: Current memory usage (before GC): 115.43 MB
RUN 3: Current memory usage (after GC): 25.67 MB

####################################################
WARMUP: Current memory usage: 25.33 MB
WARMUP: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file async iteration profiling finished in 8120ms
WARMUP: Current memory usage (before GC): 148.28 MB
WARMUP: Current memory usage (after GC): 41.65 MB

####################################################
RUN 1: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file async iteration profiling finished in 8181ms
RUN 1: Current memory usage (before GC): 110.38 MB
RUN 1: Current memory usage (after GC): 25.67 MB

####################################################
RUN 2: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file async iteration profiling finished in 8215ms
RUN 2: Current memory usage (before GC): 148.74 MB
RUN 2: Current memory usage (after GC): 41.64 MB

####################################################
RUN 3: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file async iteration profiling finished in 8259ms
RUN 3: Current memory usage (before GC): 112.38 MB
RUN 3: Current memory usage (after GC): 25.68 MB


npm run benchmark on myfreeer:col-cache-performance (8a47ae7)

> [email protected] benchmark D:\UserData\projects\exceljs
> node --expose-gc benchmark


####################################################
WARMUP: Current memory usage: 8.89 MB
WARMUP: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file streams profiling finished in 6030ms
WARMUP: Current memory usage (before GC): 67.6 MB
WARMUP: Current memory usage (after GC): 9.65 MB

####################################################
RUN 1: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file streams profiling finished in 5787ms
RUN 1: Current memory usage (before GC): 53.34 MB
RUN 1: Current memory usage (after GC): 9.75 MB

####################################################
RUN 2: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file streams profiling finished in 5926ms
RUN 2: Current memory usage (before GC): 41.67 MB
RUN 2: Current memory usage (after GC): 9.76 MB

####################################################
RUN 3: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file streams profiling finished in 5808ms
RUN 3: Current memory usage (before GC): 40.32 MB
RUN 3: Current memory usage (after GC): 9.76 MB

####################################################
WARMUP: Current memory usage: 9.47 MB
WARMUP: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file async iteration profiling finished in 5808ms
WARMUP: Current memory usage (before GC): 48.61 MB
WARMUP: Current memory usage (after GC): 9.77 MB

####################################################
RUN 1: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file async iteration profiling finished in 5837ms
RUN 1: Current memory usage (before GC): 38.44 MB
RUN 1: Current memory usage (after GC): 9.77 MB

####################################################
RUN 2: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file async iteration profiling finished in 5800ms
RUN 2: Current memory usage (before GC): 51.21 MB
RUN 2: Current memory usage (after GC): 9.85 MB

####################################################
RUN 3: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file async iteration profiling finished in 5793ms
RUN 3: Current memory usage (before GC): 48.64 MB
RUN 3: Current memory usage (after GC): 9.83 MB
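
For reference, the before/after-GC memory figures in these logs rely on node being started with --expose-gc. A minimal sketch of that kind of measurement is shown below; the file name and label format are illustrative and this is not the actual exceljs benchmark script:

// Sketch of how "before GC" / "after GC" heap figures can be measured.
// Run with: node --expose-gc measure-memory.js
function logMemory(label) {
  const mb = process.memoryUsage().heapUsed / 1024 / 1024;
  console.log(`Current memory usage (${label}): ${mb.toFixed(2)} MB`);
}

logMemory('before GC');
if (global.gc) {
  // global.gc is only available when node is started with --expose-gc
  global.gc();
  logMemory('after GC');
}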


@@ -107,41 +113,95 @@ const colCache = {

// check if value looks like an address
validateAddress(value) {
if (!value.match(/^[A-Z]+\d+$/)) {
// if (!value.match(/^[A-Z]+\d+$/))
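
The hunk above removes the value.match call and keeps the old regexp as a comment, replacing it with per-character code checks. A rough sketch of that approach is shown here; it is illustrative only (not the exact code from commit 8a47ae7), and note that the review below settles on a precompiled regexp with .test instead:

// Illustrative sketch of a per-character validateAddress without a regexp.
function validateAddressSketch(value) {
  let i = 0;
  // one or more column letters 'A' (65) .. 'Z' (90)
  while (i < value.length) {
    const c = value.charCodeAt(i);
    if (c < 65 || c > 90) break;
    i++;
  }
  const letters = i;
  // one or more row digits '0' (48) .. '9' (57), with nothing after them
  while (i < value.length) {
    const c = value.charCodeAt(i);
    if (c < 48 || c > 57) break;
    i++;
  }
  if (letters === 0 || i === letters || i !== value.length) {
    throw new Error(`Invalid Address: ${value}`);
  }
  return true;
}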
@alubbe (Member) commented Oct 5, 2020

could you please benchmark if your new implementation is faster than this:

const addressRegex = /^[A-Z]+\d+$/;
// ...
  validateAddress(value) {
    if (!addressRegex.test(value)) {
      throw new Error(`Invalid Address: ${value}`);
    }
    return true;
  }

This one would be a lot easier to read and maintain :D

@myfreeer (Contributor, Author) replied:

npm run benchmark shows that addressRegex.test is slightly slower; I'll switch to this if you wish.

npm run benchmark on myfreeer:col-cache-performance (8a47ae7) with the validateAddress implementation mentioned above

> [email protected] benchmark D:\UserData\projects\exceljs
> node --expose-gc benchmark


####################################################
WARMUP: Current memory usage: 8.87 MB
WARMUP: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file streams profiling finished in 6090ms
WARMUP: Current memory usage (before GC): 46.75 MB
WARMUP: Current memory usage (after GC): 9.6 MB

####################################################
RUN 1: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file streams profiling finished in 6063ms
RUN 1: Current memory usage (before GC): 44.24 MB
RUN 1: Current memory usage (after GC): 9.75 MB

####################################################
RUN 2: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file streams profiling finished in 5970ms
RUN 2: Current memory usage (before GC): 67.58 MB
RUN 2: Current memory usage (after GC): 9.74 MB

####################################################
RUN 3: huge xlsx file streams profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file streams profiling finished in 6220ms
RUN 3: Current memory usage (before GC): 69.27 MB
RUN 3: Current memory usage (after GC): 9.79 MB

####################################################
WARMUP: Current memory usage: 9.47 MB
WARMUP: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
WARMUP: huge xlsx file async iteration profiling finished in 6015ms
WARMUP: Current memory usage (before GC): 40.56 MB
WARMUP: Current memory usage (after GC): 9.83 MB

####################################################
RUN 1: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 1: huge xlsx file async iteration profiling finished in 5956ms
RUN 1: Current memory usage (before GC): 37.71 MB
RUN 1: Current memory usage (after GC): 9.82 MB

####################################################
RUN 2: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 2: huge xlsx file async iteration profiling finished in 5870ms
RUN 2: Current memory usage (before GC): 37.96 MB
RUN 2: Current memory usage (after GC): 9.85 MB

####################################################
RUN 3: huge xlsx file async iteration profiling started
Reading worksheet 1
Reading row 50000
Reading row 100000
Reading worksheet 2
Reading row 150000
Processed 2 worksheets and 150002 rows
RUN 3: huge xlsx file async iteration profiling finished in 5902ms
RUN 3: Current memory usage (before GC): 39.57 MB
RUN 3: Current memory usage (after GC): 9.88 MB
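
As a standalone way to compare the two validateAddress variants outside the full workbook benchmark, a hypothetical micro-benchmark sketch is shown below; the function names, sample addresses, and iteration count are made up for illustration:

// Hypothetical micro-benchmark: precompiled regexp .test vs a
// per-character check. Names, samples, and counts are illustrative only.
const addressRegex = /^[A-Z]+\d+$/;

function validateWithRegex(value) {
  return addressRegex.test(value);
}

function validateWithCharCodes(value) {
  let i = 0;
  while (i < value.length && value.charCodeAt(i) >= 65 && value.charCodeAt(i) <= 90) i++;
  const letters = i;
  while (i < value.length && value.charCodeAt(i) >= 48 && value.charCodeAt(i) <= 57) i++;
  return letters > 0 && i > letters && i === value.length;
}

const samples = ['A1', 'ZZ123', 'AB9999', 'a1', '1A'];
for (const [name, fn] of [['regex.test', validateWithRegex], ['charCodeAt', validateWithCharCodes]]) {
  const start = process.hrtime.bigint();
  let valid = 0;
  for (let i = 0; i < 1e6; i++) {
    if (fn(samples[i % samples.length])) valid++;
  }
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${name}: ${ms.toFixed(1)} ms (${valid} valid results)`);
}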

@alubbe (Member) replied:

yeah let's do it - other than that, the PR looks great!

@myfreeer myfreeer force-pushed the col-cache-performance branch from 8a47ae7 to e8731f8 on October 5, 2020 14:20
@alubbe (Member) left a comment:

thank you!

@alubbe alubbe merged commit 624d72f into exceljs:master Oct 5, 2020
@myfreeer myfreeer deleted the col-cache-performance branch October 5, 2020 14:32