Is your feature request related to a problem? Please describe.
num_batch can greatly improve inference performance at the cost of higher VRAM usage.
Depending on the task, it can be beneficial to trade some context size for a larger num_batch, or vice versa.
For example:
- Fast responses: num_batch: 2048, num_ctx: 8192
- Larger context, but a bit slower: num_batch: 512, num_ctx: 32768
- Middle ground: num_batch: 1024, num_ctx: 16384
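For reference, both of these are standard Ollama runtime options, so they can already be set per-request through Ollama's API. A rough sketch (the model name `llama3` is just an example):

```shell
# Pass num_batch/num_ctx as per-request options to a local Ollama instance
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello",
  "options": {
    "num_batch": 2048,
    "num_ctx": 8192
  }
}'
```

Exposing these fields in Open WebUI would presumably just mean forwarding them in the `options` object like this.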
Describe the solution you'd like
It would be great if you could set num_batch in Open WebUI.
It would also be really useful if basic parameters such as num_ctx, num_batch, temperature, top_k/top_p, num_keep, etc. were available directly in the chat interface, without having to go into Settings -> Advanced and tweak them there each time.
Describe alternatives you've considered
Right now I'm creating several copies of models, with the num_ctx/num_batch values in their names, in order to quickly switch between settings. It works, but it's painful.
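Concretely, the workaround looks something like this (a sketch; the base model and preset names are just examples): one Modelfile per preset, then a model created from each.

```
# Modelfile.fast — preset favoring throughput over context size
FROM llama3
PARAMETER num_batch 2048
PARAMETER num_ctx 8192
```

```shell
# Register the preset as its own model, then repeat for each variant
ollama create llama3-fast -f Modelfile.fast
```

Every new preset means another Modelfile and another entry in the model list, which is exactly the friction this request is about.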