feat: allow setting num_gpu parameter #2877

Open
mherrmann3 opened this issue Jun 6, 2024 · 3 comments


@mherrmann3

Is your feature request related to a problem? Please describe.
To avoid having to create a new modelfile just to adjust or fine-tune the number of layers offloaded to the GPU, make this setting (num_gpu) user-configurable; Ollama lists it among the most common parameters.

Describe the solution you'd like
Implement and add 'num_gpu (Ollama)' in the 'Advanced Params' section of a model in Workspace > Models.
(I would not add it to the 'Advanced Parameters' section of Settings > General, as the number and size of layers are model- and quant-specific.)

Describe alternatives you've considered
Creating a new Ollama model(file) with an adjusted num_gpu. This works, but it is cumbersome if one wants to iterate on num_gpu (or change it quickly when the GPU is busy with other things/models); see the sketch below.
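For reference, the workaround looks roughly like this (a sketch only; the model name and layer count are placeholders, and it relies on Ollama accepting the undocumented num_gpu key in a modelfile):

```
# Modelfile — sketch of the current workaround; model and layer count are examples
FROM mixtral:8x7b
PARAMETER num_gpu 33
```

followed by:

```
ollama create mixtral-gpu33 -f Modelfile
```

Every change to the layer count means editing the file and re-running ollama create, which is exactly the friction this request is about.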

Additional context
num_gpu is not specified1 in the official Ollama docs as a valid PARAMETER, but it is supported by the API (see the sketch after the footnotes).

Footnotes

  1. like use_mmap, use_mlock, and num_thread, which are already configurable in open_webui.
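As a minimal sketch of what "supported by the API" means in practice (assuming a local Ollama instance on its default port; the model name and layer count are placeholders):

```python
# Sketch: setting num_gpu per request via Ollama's /api/generate "options" field.
# Assumes Ollama is reachable at localhost:11434; model and value are examples.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mixtral:8x7b",
        "prompt": "Hello",
        "stream": False,
        "options": {"num_gpu": 33},  # number of layers to offload to the GPU
    },
    timeout=600,
)
print(response.json()["response"])
```

open-webui already makes use_mmap, use_mlock, and num_thread configurable, so exposing num_gpu would presumably just add one more key of the same kind.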

@Qualzz

Qualzz commented Jul 4, 2024

bumping this

@derpyhue

derpyhue commented Jul 4, 2024

derpyhue/openwebui_num_gpu@fff91f7
Editing the files in this commit enables changing the num_gpu layers setting.
It might need a bit of polishing.
This is my first time committing something to GitHub, but I hope it helps!
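In spirit, the change boils down to forwarding num_gpu along with the other advanced params when the Ollama options payload is built. A very rough sketch (all names here are hypothetical, not the actual open-webui code; see the commit above for the real diff):

```python
# Hypothetical sketch — names do not match the real open-webui code;
# the linked commit contains the actual change.
FORWARDED_OLLAMA_OPTIONS = {
    "use_mmap", "use_mlock", "num_thread",
    "num_gpu",  # newly forwarded: layers to offload to the GPU
}

def build_ollama_options(advanced_params: dict) -> dict:
    """Keep only the advanced params that should be passed to Ollama's 'options' field."""
    return {k: v for k, v in advanced_params.items() if k in FORWARDED_OLLAMA_OPTIONS}

# Example: build_ollama_options({"num_gpu": 33, "unrelated": 1}) -> {"num_gpu": 33}
```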

@JKratto

JKratto commented Jul 10, 2024

Thank you for this. I am looking forward to the merge. Ollama changed its memory allocation strategy (or so I think), and suddenly the whole model "does not fit in VRAM" (only 30/33 layers are offloaded to the GPU). In reality it can fit all 33/33 layers while still leaving about 25% of VRAM free. The performance penalty in the 30/33 scenario is roughly a 70% loss of throughput (Mixtral 8x7B). Being able to adjust this setting for my machine would go a long way, as I would not have to create my own model to overcome this issue. It's not a big problem; it just seems cleaner. :)
