
perf: improve for source providing huge list of items #1980

Merged
merged 4 commits into hrsh7th:main from yioneko:perf-up on Oct 20, 2024

Conversation

yioneko
Contributor

@yioneko yioneko commented Jul 9, 2024

Problem

nvim-cmp is still not ideally responsive when dealing with a huge list of completion items, especially with tailwindcss, which sends thousands of candidates on each completion response: #1009 #1828 #1949. Although some improvements have been made before, cmp is still noticeably slower than other completion engines like vscode and coc.nvim in extreme cases.

Attempt to improve

This PR extracts two commits from my experimental perf fork branch that I think are worth upstreaming. The details and rationale are explained in the commit descriptions.

The commits introduce large changes to the codebase, and I'm not sure whether you'd like to accept them. Since the changes are made solely to improve performance in extreme cases, it's totally fine with me if you decide they are not worth it.

Comparison

Recorded at a110e12: [screen recording: cmp2]

PR: [screen recording: cmp1]

@epheien

epheien commented Jul 9, 2024

Good job.
This plugin is still very useful.
Unfortunately, the author doesn't seem to have much time to maintain it.

@yioneko yioneko force-pushed the perf-up branch 3 times, most recently from e3d9729 to 9033adc on July 15, 2024 01:05
@idelice

idelice commented Aug 4, 2024

@hrsh7th please let's get this performance boost in asap

@Shougo

Shougo commented Aug 4, 2024

But it seems to be in draft state. Is it ready to merge?

@idelice

idelice commented Aug 4, 2024

I tested this by using @yioneko's fork and there is a huge difference in the autocompletion. Everything works the same, just better.

I also tested it by forking the original nvim-cmp and copying this PR's changes over, and things seem to break. It could be because it's a few commits behind, which is kind of a deal-breaker!

@yioneko
Contributor Author

yioneko commented Aug 4, 2024

@idelice Could you expand on what is breaking? I just rebased the branch onto main locally and it seems to work fine as before. I've been using this personally for about a month without issues, and I think this is generally ready for review if no more problems are found.

@idelice

idelice commented Aug 4, 2024

> @idelice Could you expand on what is breaking? I just rebased the branch onto main locally and it seems to work fine as before. I've been using this personally for about a month without issues, and I think this is generally ready for review if no more problems are found.

I don't know. I'm not familiar enough with Lua to pinpoint what caused my user testing to fail.

@Shougo

Shougo commented Aug 4, 2024

Hm... I think more testers are needed.

@xzbdmw

xzbdmw commented Aug 5, 2024

Works well so far 😄

@lopi-py

lopi-py commented Aug 26, 2024

It works fine for me, no issues. I'll be using this fork for now, @yioneko thanks for the improvement!

@tronikelis

This is awesome, but why is it still a draft? @yioneko, do you feel like it's not ready yet?

…ters

This mainly addresses the perf issue caused by a large number of calls to
`entry.new`. Previously, every `cache.ensure` call in that code path
created an anonymous function, and it seems that LuaJIT just could not
inline it. Function creation is not expensive in LuaJIT, but the
overhead is noticeable if every `cache.ensure` call creates a function.
The first improvement is to consolidate the cache callback and attach it to
the metatable of `entry`. This ensures that every created entry instance
shares the same cache callback and no new functions are created over and
over, which reduces RAM usage and GC overhead.

To improve it further, some frequently accessed fields of entry, like
`completion_item` and `offset`, are refactored to use simple table access
instead of the getter pattern. The current cached getter is implemented
using `cache.ensure`, which introduces two more levels of function calls
on each access: `cache.key` and `cache.get`. The overhead is acceptable in
general but becomes noticeable when the number of entries is large: a simple
`completion_item` field access costs 4 function calls for each item.

All of the changes in this commit are just constant-time
optimizations, but the difference is huge when tested against a language
server providing a large number of entries, like tailwindcss.

`entry.get_vim_item` is a very heavy call, especially when the user does
complex work in item formatting. Delay its call until window display so
that `performance.max_view_entries` applies to it.
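
To make the caching change concrete, here is a minimal, self-contained sketch of the idea; the Entry table, field names, and memoization logic below are simplified stand-ins rather than the actual nvim-cmp cache/entry API.

-- Simplified stand-in for the optimization described above (not nvim-cmp code).
local Entry = {}
Entry.__index = Entry

function Entry.new(completion_item)
  return setmetatable({
    completion_item = completion_item, -- plain field: one cheap table access
    _cache = {},                       -- per-instance memo table
  }, Entry)
end

-- Before: a fresh anonymous closure is created on every call just to memoize
-- a value, so thousands of entries mean thousands of short-lived functions.
function Entry:offset_with_closure()
  if self._cache.offset == nil then
    local compute = function() -- new function object on every call
      return #self.completion_item.label
    end
    self._cache.offset = compute()
  end
  return self._cache.offset
end

-- After: the compute function is defined once on the metatable and shared by
-- every entry instance, so no per-call function allocation and less GC work.
function Entry._compute_offset(self)
  return #self.completion_item.label
end

function Entry:offset_shared()
  if self._cache.offset == nil then
    self._cache.offset = Entry._compute_offset(self)
  end
  return self._cache.offset
end

local e = Entry.new({ label = 'bg-red-500' })
print(e:offset_with_closure(), e:offset_shared()) --> 10  10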
@yioneko
Contributor Author

yioneko commented Aug 28, 2024

I think this is generally ready for review. It might not matter whether the PR status is draft or open, as it seems it will take a while to receive a response from the owner, so I'll just change it to indicate that the branch here is open for user testing.

About the CI failure: leafo/gh-actions-lua#49. It passed before the rebase, and all the tests pass in my local environment.

@yioneko yioneko marked this pull request as ready for review August 28, 2024 09:54
@pidgeon777

Hopefully this will be merged, can't wait to test it.

@xzbdmw

xzbdmw commented Sep 2, 2024

Tested with cmp-dictionary, and with 1000 items this branch is 6x faster, 300ms vs 2s 🚀

@rockyzhang24

I know I could use this PR personally, but I still hope it gets merged, as I believe it will benefit every user. Sincere apologies for the ping, @hrsh7th.

@pidgeon777

I tested the PR and I also noticed a big improvement in performance.

@EuCaue

EuCaue commented Sep 18, 2024

I tested this PR and noticed a significant improvement in performance. I haven't encountered any bugs. (yet?)

WilliamHsieh added a commit to WilliamHsieh/dotfiles that referenced this pull request Sep 26, 2024
@pidgeon777

Hello! I would like to kindly ask if there is any news about the testing outcome.

@hrsh7th
Owner

hrsh7th commented Oct 9, 2024

Sorry for the late reply. I'll do the review.

}
-- https://www.lua.org/pil/11.6.html
-- do not use '..' to allocate multiple strings
local cache_key = string.format('%s:%d:%d:%d:%d:%d:%d', input, self.resolved_completion_item and 1 or 0, matching_config.disallow_fuzzy_matching and 1 or 0, matching_config.disallow_partial_matching and 1 or 0, matching_config.disallow_prefix_unmatching and 1 or 0, matching_config.disallow_partial_fuzzy_matching and 1 or 0, matching_config.disallow_symbol_nonprefix_matching and 1 or 0)
Owner

In my opinion this does not affect performance.
The issue of allocating multiple strings only arises when concatenating strings in a loop.

(But I can accept this.)
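
(As a side note on the PIL 11.6 reference in the code comment above, the loop case mentioned here can be reproduced with a small standalone snippet; this is illustrative only and not code from the PR.)

-- Illustrative only: repeated '..' in a loop re-copies the growing string on
-- every iteration, while table.concat assembles the result once at the end.
local looped = ''
local parts = {}
for i = 1, 1000 do
  looped = looped .. tostring(i)   -- quadratic total copying across the loop
  parts[#parts + 1] = tostring(i)  -- just stores the pieces
end
local joined = table.concat(parts) -- single pass to build the final string
assert(looped == joined)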

Contributor

I don't believe that is true, especially for nested expressions, as every expression separately has to be evaluated and each evaluation will allocate a string.
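
(A small sketch of what "nested expressions" means here, assuming Lua 5.1/LuaJIT concatenation semantics; illustrative only, not code from the PR.)

local a, b, c, d = ('x'):rep(100), ('y'):rep(100), ('z'):rep(100), ('w'):rep(100)

-- A flat chain compiles to a single CONCAT over a range of registers, so only
-- the final string needs to be allocated.
local flat = a .. b .. c .. d

-- Parenthesized sub-expressions are evaluated separately, so the two
-- intermediate strings are allocated before the final result.
local nested = (a .. b) .. (c .. d)

assert(flat == nested)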

Owner

My benchmark is here.

local ITERATION = 1000000000

local function consume(_)
end

local function run(name, f)
  collectgarbage('collect')
  collectgarbage('setpause')
  local s = os.clock()
  for _ = 0, ITERATION, 1 do
    consume(f())
  end
  local e = os.clock()
  print(('%s'):format(name))
  print(('    %s seconds'):format(e - s))
  print(('    %s kb'):format(collectgarbage('count')))
end

local a = string.rep('a', 100)
local b = string.rep('b', 100)
local c = string.rep('c', 100)
local d = string.rep('d', 100)
local e = string.rep('e', 100)
local f = string.rep('f', 100)
local g = string.rep('g', 100)

vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))

for i = 1, 10 do
  run(('[%s] format'):format(i), function()
    return string.format('%s%s%s%s%s%s%s', a, b, c, d, e, f, g)
  end)
  run(('[%s] syntax'):format(i), function()
    return a .. b .. c .. d .. e .. f .. g
  end)
end
Lua version Lua 5.1 (JIT)
[1] format
    0.309662 seconds
    7442.2548828125 kb
[1] syntax
    0.287758 seconds
    7442.9619140625 kb
[2] format
    0.288337 seconds
    7443.9189453125 kb
[2] syntax
    0.289305 seconds
    7444.6806640625 kb
[3] format
    0.28824099999999 seconds
    7445.8564453125 kb
[3] syntax
    0.288509 seconds
    7447.3330078125 kb
[4] format
    0.28775099999999 seconds
    7446.8486328125 kb
[4] syntax
    0.29039950000001 seconds
    7446.5986328125 kb
[5] format
    0.288085 seconds
    7446.4580078125 kb
[5] syntax
    0.290503 seconds
    7446.3955078125 kb
[6] format
    0.28749599999998 seconds
    7446.4111328125 kb
[6] syntax
    0.287892 seconds
    7446.3955078125 kb
[7] format
    0.28724800000001 seconds
    7446.4111328125 kb
[7] syntax
    0.287656 seconds
    7446.3955078125 kb
[8] format
    0.28767599999998 seconds
    7446.4111328125 kb
[8] syntax
    0.28742099999999 seconds
    7446.4111328125 kb
[9] format
    0.28775399999998 seconds
    7446.4111328125 kb
[9] syntax
    0.28724700000001 seconds
    7446.4111328125 kb
[10] format
    0.28726999999998 seconds
    7446.4111328125 kb
[10] syntax
    0.287116 seconds
    7446.3955078125 kb

Contributor
@lewis6991 lewis6991 Oct 17, 2024

Can you try with nested expressions (using ())?

Also, the benchmark uses the same strings, so you'll get the benefit of string interning.

Owner

Benchmark updated.

However, it was necessary to use math.random.
This may have become a benchmark for math.random.

local ITERATION = 10000000

local function consume(_)
end

local function run(name, f)
  collectgarbage('collect')
  collectgarbage('stop')
  local s = os.clock()
  for _ = 0, ITERATION, 1 do
    consume(f())
  end
  local e = os.clock()
  print(('%s'):format(name))
  print(('    %s seconds'):format(e - s))
  print(('    %s kb'):format(collectgarbage('count')))
  collectgarbage('restart')
end


vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))

for i = 1, 10 do
  run(('[%s] format'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return string.format('%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s', a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c)
  end)
  run(('[%s] syntax'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return (a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c)
  end)
end
Lua version Lua 5.1 (JIT)
[1] format
    0.173224 seconds
    7593.55078125 kb
[1] syntax
    0.161859 seconds
    6650.484375 kb
[2] format
    0.158104 seconds
    6652.5 kb
[2] syntax
    0.159953 seconds
    6654.12890625 kb
[3] format
    0.158455 seconds
    6656.18359375 kb
[3] syntax
    0.157741 seconds
    6656.9921875 kb
[4] format
    0.160288 seconds
    6645.15234375 kb
[4] syntax
    0.160823 seconds
    6644.81640625 kb
[5] format
    0.161516 seconds
    6644.77734375 kb
[5] syntax
    0.159043 seconds
    6644.62890625 kb
[6] format
    0.159526 seconds
    6644.71484375 kb
[6] syntax
    0.159805 seconds
    6644.62890625 kb
[7] format
    0.160698 seconds
    6644.71484375 kb
[7] syntax
    0.159825 seconds
    6644.62890625 kb
[8] format
    0.160317 seconds
    6644.71484375 kb
[8] syntax
    0.161488 seconds
    6644.62890625 kb
[9] format
    0.159925 seconds
    6644.71484375 kb
[9] syntax
    0.159897 seconds
    6644.62890625 kb
[10] format
    0.15959 seconds
    6644.70703125 kb
[10] syntax
    0.159551 seconds
    6644.62890625 kb

Contributor

Good to know. Thanks again for doing that. I'll reference this for future discussions.

Owner

Sorry, my benchmark was wrong.

It seems that using string.format instead does affect performance.

local ITERATION = 1000000

local function consume(data)
  _G.data = data
end

local function run(name, f)
  collectgarbage('collect')
  collectgarbage('stop')
  local s = os.clock()
  for _ = 0, ITERATION, 1 do
    consume(f())
  end
  local e = os.clock()
  print(('%s'):format(name))
  print(('    %s seconds'):format(e - s))
  print(('    %s kb'):format(collectgarbage('count')))
  collectgarbage('restart')
end


vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))

for i = 1, 10 do
  run(('[%s] format'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return string.format('%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s', a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c)
  end)
  run(('[%s] syntax'):format(i), function()
    local a = tostring(math.random())
    local b = tostring(math.random())
    local c = tostring(math.random())
    return (a .. b .. c .. a .. b .. c .. a .. b) .. (c .. (a .. b) .. (c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c) .. a .. b .. c .. (a .. b .. c .. a .. b .. c) .. a .. b .. (c .. a .. b .. c) .. a .. b .. c)
  end)
end
Lua version Lua 5.1 (JIT)
[1] format
    2.233753 seconds
    1088391.6679688 kb
[1] syntax
    3.371306 seconds
    2050993.4609375 kb
[2] format
    1.949485 seconds
    1088394.8554688 kb
[2] syntax
    3.23479 seconds
    2051043.8085938 kb
[3] format
    2.216916 seconds
    1088433.4296875 kb
[3] syntax
    3.398898 seconds
    2051061.4921875 kb
[4] format
    1.848707 seconds
    1088388.6757813 kb
[4] syntax
    3.150517 seconds
    2051042.03125 kb
[5] format
    1.838001 seconds
    1088439.6914063 kb
[5] syntax
    3.136715 seconds
    2051044.0859375 kb
[6] format
    1.865727 seconds
    1088417.75 kb
[6] syntax
    3.187182 seconds
    2051002.7890625 kb
[7] format
    1.915407 seconds
    1088402.2890625 kb
[7] syntax
    3.131942 seconds
    2051061.3242188 kb
[8] format
    1.882645 seconds
    1088402.6679688 kb
[8] syntax
    3.171707 seconds
    2051076.8320313 kb
[9] format
    1.829586 seconds
    1088384.7734375 kb
[9] syntax
    3.060516 seconds
    2050994.234375 kb
[10] format
    1.831612 seconds
    1088394.1328125 kb
[10] syntax
    3.126076 seconds
    2051067.203125 kb

Contributor

Sorry, do you mean string.format was always faster? What specifically is different in the last run?

Owner

The difference is:

local function consume(data)
  _G.data = data -- ← ADD THIS
end

Probably, this disables JIT compilation, which makes the syntax (`..`) case slower.

The results show that string.format is consistently faster.
And there's no speed difference in a JIT environment.

Owner

I had prepared a consume function to properly control JIT compilation, but it didn't seem to work.
I fixed this, so I think the benchmark is probably working properly now.
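
(For reference, LuaJIT also exposes an explicit switch for this. The sketch below uses the standard `jit` module and is not code from this PR; it just shows how one benchmarked function could be pinned to the interpreter instead of relying on a side effect to block compilation.)

-- Illustrative only: force the interpreter for a single function under LuaJIT.
local function concat_case(a, b, c)
  return a .. b .. c
end

jit.off(concat_case)  -- disable JIT compilation for this function only
print(concat_case('x', 'y', 'z'))
-- jit.on(concat_case) would re-enable compilation afterwards.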

@hrsh7th
Owner

hrsh7th commented Oct 17, 2024

My last review subject is entry.lua, which is a bit complicated. It will take some time.

@hrsh7th hrsh7th merged commit 1a1d7ec into hrsh7th:main Oct 20, 2024
2 checks passed