perf: improve for source providing huge list of items #1980
Conversation
Good job.
Force-pushed from e3d9729 to 9033adc
@hrsh7th please let's get this performance boost in ASAP
But it seems to be in draft state. Is it ready to merge?
I tested this by forking @yioneko's repo and there is a huge difference in the autocompletion. Everything works the same, just better. I also tested it by forking the original nvim-cmp and copying this PR's changes over, and things seem to break. Could be because it's a few commits behind, which is kind of a deal-breaker!
@idelice Could you expand on what is breaking? I just rebased the branch onto main locally and it seems to work fine as before. I've been using this for about one month personally without issues, and I think this is generally ready for review if no more problems are found.
I don't know. I'm not familiar enough with Lua to pinpoint what caused my user testing to fail
Hm... I think more testers are needed. |
Works well so far 😄 |
It works fine for me, no issues. I'll be using this fork for now. @yioneko, thanks for the improvement!
This is awesome, but why is it still a draft? @yioneko, do you feel like it's not ready yet?
…ters This mainly addresses the perf issue caused by the large number of calls to `entry.new`. Previously, every `cache.ensure` call in its code path created an anonymous function, and it seems that LuaJIT just could not inline it. Function creation is not expensive in LuaJIT, but the overhead is noticeable if every `cache.ensure` call creates a new function. The first improvement is to consolidate the cache callback and attach it to the metatable of `entry`. This ensures that every created entry instance shares the same cache callback and no new functions are frequently created, which reduces RAM usage and GC overhead. To improve things further, some frequently accessed fields of entry like `completion_item` and `offset` are refactored to use simple table access instead of the getter pattern. The current cached getter is implemented with `cache.ensure`, which introduces two more levels of function calls on each access: `cache.key` and `cache.get`. The overhead is acceptable in isolation but noticeable when the number of entries is large: you need to call 4 functions for a simple `completion_item` field access on each item. All of the changes in this commit are just constant-time optimizations, but the difference is huge when tested with a language server providing a large number of entries, like tailwindcss.
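The difference between the two patterns can be sketched with a minimal, self-contained example (the names `ensure`, `offset_per_call`, and `Entry` are illustrative stand-ins, not the actual nvim-cmp code):

```lua
-- A tiny cache helper, standing in for cache.ensure.
local function ensure(cache, key, fn)
  if cache[key] == nil then
    cache[key] = fn()
  end
  return cache[key]
end

-- Pattern A: every access allocates a fresh closure for the cache callback,
-- which LuaJIT cannot easily inline and which adds GC pressure.
local function offset_per_call(entry)
  return ensure(entry.cache, 'offset', function() -- new function object each call
    return #entry.completion_item.label
  end)
end

-- Pattern B: compute once in the constructor and store a plain field; methods
-- live on one shared metatable, so no per-access allocation happens.
local Entry = {}
Entry.__index = Entry

function Entry.new(completion_item)
  local self = setmetatable({ completion_item = completion_item, cache = {} }, Entry)
  self.offset = #completion_item.label -- a plain table read on every access
  return self
end

local e = Entry.new({ label = 'color' })
assert(e.offset == 5 and offset_per_call(e) == 5)
```

Both patterns compute the same value; the point is purely allocation behavior per access.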
`entry.get_vim_item` is a very heavy call, especially when the user does complex work in item formatting. Delay the call until the window is displayed, so that `performance.max_view_entries` applies to it.
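A hedged sketch of the idea, assuming a `max_view_entries` limit of 10 and a stand-in `format_item` for the user's expensive formatting function (these names are illustrative, not the real nvim-cmp API):

```lua
local max_view_entries = 10

-- Stand-in for user-configured item formatting (the expensive part).
local function format_item(entry)
  return string.upper(entry.label)
end

-- Eager: formats every entry up front, even thousands that are never shown.
local function render_eager(entries)
  local items = {}
  for i, entry in ipairs(entries) do
    items[i] = format_item(entry)
  end
  return items
end

-- Deferred: formats only up to max_view_entries at display time.
local function render_deferred(entries)
  local items = {}
  for i = 1, math.min(#entries, max_view_entries) do
    items[i] = format_item(entries[i])
  end
  return items
end
```

With a source that returns thousands of candidates, the deferred variant bounds the formatting cost by the view limit instead of the response size.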
I think this is generally ready for review. It might not matter whether the PR status is draft or open, since it seems it will take a while to get a response from the owner, so I'll just change it to indicate that this branch is open to user testing. About the failing CI: leafo/gh-actions-lua#49 — it passed before the rebase, and all tests pass in my local environment.
Hopefully this will be merged, can't wait to test it. |
Tested with cmp-dictionary, and with 1000 items this branch is 6x faster, 300ms vs 2s 🚀 |
I know I could just use this PR personally, but I still hope it gets merged, as I believe it would benefit every user. I sincerely apologize for the ping, @hrsh7th.
I tested the PR and I also noticed a big improvement in performance. |
I tested this PR and noticed a significant improvement in performance. I haven't encountered any bugs. (yet?) |
Hello! I would like to kindly ask if there is any news about the testing outcome |
Sorry for the late reply. I'll review it.
}
-- https://www.lua.org/pil/11.6.html
-- do not use '..' to allocate multiple strings
local cache_key = string.format('%s:%d:%d:%d:%d:%d:%d', input, self.resolved_completion_item and 1 or 0, matching_config.disallow_fuzzy_matching and 1 or 0, matching_config.disallow_partial_matching and 1 or 0, matching_config.disallow_prefix_unmatching and 1 or 0, matching_config.disallow_partial_fuzzy_matching and 1 or 0, matching_config.disallow_symbol_nonprefix_matching and 1 or 0)
In my opinion this does not affect performance.
The issue of allocating multiple strings only arises when concatenating strings in a loop.
(But I can accept this.)
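For reference, the PIL 11.6 advice is primarily about loops: repeated `..` inside a loop allocates a fresh intermediate string on each iteration, while `table.concat` joins the pieces in one pass. A small, self-contained illustration of that general rule (not code from the PR):

```lua
-- Repeated '..' in a loop: each iteration allocates a new intermediate string
-- holding everything accumulated so far (quadratic total copying).
local function concat_loop(parts)
  local s = ''
  for _, p in ipairs(parts) do
    s = s .. p
  end
  return s
end

-- table.concat: collects the pieces and joins them once.
local function concat_once(parts)
  return table.concat(parts)
end

local parts = {}
for i = 1, 1000 do
  parts[i] = tostring(i)
end
assert(concat_loop(parts) == concat_once(parts)) -- same result, fewer allocations
```

A single expression like the `cache_key` above concatenates a fixed, small number of operands, which is the case being debated here.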
I don't believe that is true, especially for nested expressions, as every expression separately has to be evaluated and each evaluation will allocate a string.
My benchmark is here.
local ITERATION = 1000000000
local function consume(_)
end
local function run(name, f)
collectgarbage('collect')
collectgarbage('setpause')
local s = os.clock()
for _ = 0, ITERATION, 1 do
consume(f())
end
local e = os.clock()
print(('%s'):format(name))
print((' %s seconds'):format(e - s))
print((' %s kb'):format(collectgarbage('count')))
end
local a = string.rep('a', 100)
local b = string.rep('b', 100)
local c = string.rep('c', 100)
local d = string.rep('d', 100)
local e = string.rep('e', 100)
local f = string.rep('f', 100)
local g = string.rep('g', 100)
vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))
for i = 1, 10 do
run(('[%s] format'):format(i), function()
return string.format('%s%s%s%s%s%s%s', a, b, c, d, e, f, g)
end)
run(('[%s] syntax'):format(i), function()
return a .. b .. c .. d .. e .. f .. g
end)
end
Lua version Lua 5.1 (JIT)
[1] format
0.309662 seconds
7442.2548828125 kb
[1] syntax
0.287758 seconds
7442.9619140625 kb
[2] format
0.288337 seconds
7443.9189453125 kb
[2] syntax
0.289305 seconds
7444.6806640625 kb
[3] format
0.28824099999999 seconds
7445.8564453125 kb
[3] syntax
0.288509 seconds
7447.3330078125 kb
[4] format
0.28775099999999 seconds
7446.8486328125 kb
[4] syntax
0.29039950000001 seconds
7446.5986328125 kb
[5] format
0.288085 seconds
7446.4580078125 kb
[5] syntax
0.290503 seconds
7446.3955078125 kb
[6] format
0.28749599999998 seconds
7446.4111328125 kb
[6] syntax
0.287892 seconds
7446.3955078125 kb
[7] format
0.28724800000001 seconds
7446.4111328125 kb
[7] syntax
0.287656 seconds
7446.3955078125 kb
[8] format
0.28767599999998 seconds
7446.4111328125 kb
[8] syntax
0.28742099999999 seconds
7446.4111328125 kb
[9] format
0.28775399999998 seconds
7446.4111328125 kb
[9] syntax
0.28724700000001 seconds
7446.4111328125 kb
[10] format
0.28726999999998 seconds
7446.4111328125 kb
[10] syntax
0.287116 seconds
7446.3955078125 kb
Can you try with nested expressions (using `()`)?
Also the benchmark uses the same strings so you'll get the benefit of string interning.
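To illustrate the interning point (a small aside, not code from the PR): Lua keeps distinct string contents in an intern table, so a benchmark that rebuilds the same strings over and over mostly hits that table instead of performing fresh allocations, which is why randomized inputs give a fairer picture.

```lua
-- Two different ways of producing the same byte sequence yield equal strings;
-- with interning they are also the same underlying object, so the comparison
-- is effectively a pointer check.
local a = table.concat({ 'foo', 'bar' })
local b = string.format('%s%s', 'foo', 'bar')
assert(a == b)

-- Randomized inputs produce previously unseen strings on every iteration,
-- forcing genuine allocations in both the '..' and string.format variants.
math.randomseed(42)
local r1 = tostring(math.random())
local r2 = tostring(math.random())
assert(r1 ~= r2) -- distinct contents, so genuinely new strings
```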
Benchmark updated.
However, it was necessary to use math.random to avoid interning, so this may have partly become a benchmark of math.random itself.
local ITERATION = 10000000
local function consume(_)
end
local function run(name, f)
collectgarbage('collect')
collectgarbage('stop')
local s = os.clock()
for _ = 0, ITERATION, 1 do
consume(f())
end
local e = os.clock()
print(('%s'):format(name))
print((' %s seconds'):format(e - s))
print((' %s kb'):format(collectgarbage('count')))
collectgarbage('restart')
end
vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))
for i = 1, 10 do
run(('[%s] format'):format(i), function()
local a = tostring(math.random())
local b = tostring(math.random())
local c = tostring(math.random())
return string.format('%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s', a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c)
end)
run(('[%s] syntax'):format(i), function()
local a = tostring(math.random())
local b = tostring(math.random())
local c = tostring(math.random())
return (a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c)
end)
end
Lua version Lua 5.1 (JIT)
[1] format
0.173224 seconds
7593.55078125 kb
[1] syntax
0.161859 seconds
6650.484375 kb
[2] format
0.158104 seconds
6652.5 kb
[2] syntax
0.159953 seconds
6654.12890625 kb
[3] format
0.158455 seconds
6656.18359375 kb
[3] syntax
0.157741 seconds
6656.9921875 kb
[4] format
0.160288 seconds
6645.15234375 kb
[4] syntax
0.160823 seconds
6644.81640625 kb
[5] format
0.161516 seconds
6644.77734375 kb
[5] syntax
0.159043 seconds
6644.62890625 kb
[6] format
0.159526 seconds
6644.71484375 kb
[6] syntax
0.159805 seconds
6644.62890625 kb
[7] format
0.160698 seconds
6644.71484375 kb
[7] syntax
0.159825 seconds
6644.62890625 kb
[8] format
0.160317 seconds
6644.71484375 kb
[8] syntax
0.161488 seconds
6644.62890625 kb
[9] format
0.159925 seconds
6644.71484375 kb
[9] syntax
0.159897 seconds
6644.62890625 kb
[10] format
0.15959 seconds
6644.70703125 kb
[10] syntax
0.159551 seconds
6644.62890625 kb
Good to know. Thanks again for doing that. I'll reference this for future discussions.
Sorry, my benchmark was wrong.
It seems that using `string.format` does affect performance after all.
local ITERATION = 1000000
local function consume(data)
_G.data = data
end
local function run(name, f)
collectgarbage('collect')
collectgarbage('stop')
local s = os.clock()
for _ = 0, ITERATION, 1 do
consume(f())
end
local e = os.clock()
print(('%s'):format(name))
print((' %s seconds'):format(e - s))
print((' %s kb'):format(collectgarbage('count')))
collectgarbage('restart')
end
vim.print(('Lua version %s (%s)'):format(_VERSION, type(jit) == 'table' and 'JIT' or 'PUC'))
for i = 1, 10 do
run(('[%s] format'):format(i), function()
local a = tostring(math.random())
local b = tostring(math.random())
local c = tostring(math.random())
return string.format('%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s', a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c, a, b, c)
end)
run(('[%s] syntax'):format(i), function()
local a = tostring(math.random())
local b = tostring(math.random())
local c = tostring(math.random())
return (a .. b .. c .. a .. b .. c .. a .. b) .. (c .. (a .. b) .. (c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c .. a .. b .. c) .. a .. b .. c .. (a .. b .. c .. a .. b .. c) .. a .. b .. (c .. a .. b .. c) .. a .. b .. c)
end)
end
Lua version Lua 5.1 (JIT)
[1] format
2.233753 seconds
1088391.6679688 kb
[1] syntax
3.371306 seconds
2050993.4609375 kb
[2] format
1.949485 seconds
1088394.8554688 kb
[2] syntax
3.23479 seconds
2051043.8085938 kb
[3] format
2.216916 seconds
1088433.4296875 kb
[3] syntax
3.398898 seconds
2051061.4921875 kb
[4] format
1.848707 seconds
1088388.6757813 kb
[4] syntax
3.150517 seconds
2051042.03125 kb
[5] format
1.838001 seconds
1088439.6914063 kb
[5] syntax
3.136715 seconds
2051044.0859375 kb
[6] format
1.865727 seconds
1088417.75 kb
[6] syntax
3.187182 seconds
2051002.7890625 kb
[7] format
1.915407 seconds
1088402.2890625 kb
[7] syntax
3.131942 seconds
2051061.3242188 kb
[8] format
1.882645 seconds
1088402.6679688 kb
[8] syntax
3.171707 seconds
2051076.8320313 kb
[9] format
1.829586 seconds
1088384.7734375 kb
[9] syntax
3.060516 seconds
2050994.234375 kb
[10] format
1.831612 seconds
1088394.1328125 kb
[10] syntax
3.126076 seconds
2051067.203125 kb
Sorry, do you mean `string.format` was always faster? What specifically is different in the last run?
The difference is
local function consume(data)
_G.data = data -- ← ADD THIS
end
Probably, this disables JIT compilation, which makes the `syntax` case slower.
The results show that `string.format` is consistently faster when the loop is not JIT-compiled, and that there is no speed difference in a JIT-compiled environment.
I had prepared a consume function to properly control JIT compilation, but it didn't seem to work.
I fixed this, so I think the benchmark is probably working properly now.
My last review subject is entry.lua, which is a bit complicated. It will take some time.
Problem
nvim-cmp is still not ideally responsive when dealing with huge lists of completion items, especially with tailwindcss, which sends thousands of candidates on each completion response: #1009 #1828 #1949. Though some improvements have been made before, cmp is still apparently slower in extreme cases than other completion engines like VSCode's and coc.nvim.
Attempt to improve
This PR extracts two commits from my experimental perf branch fork which I think could be better upstreamed. The details and rationales are explained in the commit descriptions.
The commits introduce substantial changes to the codebase, and I'm not sure if you'd like to accept them. Since the changes are made solely to improve performance in extreme cases, it's totally fine with me if you consider them not worth it.
Comparison
Recorded at a110e12:
PR: