
feat: direct llama.cpp integration #1483

Open
tjbck opened this issue Apr 10, 2024 · 8 comments

@tjbck
Contributor

tjbck commented Apr 10, 2024

No description provided.

@jukofyork

jukofyork commented Apr 10, 2024

Just a quick follow-up to say it seems to work fine:

  • I had to change the llama.cpp server port to 8081 so it doesn't clash with OpenWebUI (e.g. ./server --port 8081 ...).
  • Then, in the OpenWebUI settings, I set the OpenAI API base URL to http://127.0.0.1:8081/v1 and set the API Key to something non-blank (e.g. none).

With that, it seems to be calling the OAI-like API endpoint on the llama.cpp server fine (a quick sanity check is sketched below). It wasn't obvious that I needed to append /v1 to the URL and make sure the API Key wasn't blank, though; I had to work that out by trial and error.
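
Here's a minimal sketch of how the endpoint can be sanity-checked from Python. The "default" model name is just a placeholder (the llama.cpp server should answer with whichever model it has loaded), and "none" is the dummy non-blank API key mentioned above.

import requests

BASE_URL = "http://127.0.0.1:8081/v1"   # note the /v1 suffix
API_KEY = "none"                        # any non-blank placeholder

# Ask the llama.cpp server's OpenAI-compatible chat endpoint for a short reply.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "default",  # placeholder; the server uses the model it loaded
        "messages": [{"role": "user", "content": "Say hi in one word."}],
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

If this prints a reply, the same base URL and API key should work from the OpenWebUI settings.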

The only difference I can see is that there is no little "information" icon like there was with Ollama models, even though the OAI-like API endpoint is clearly being called and the llama.cpp server logs stats like these:

{
  "tid": "140627543928832",
  "timestamp": 1712766280,
  "level": "INFO",
  "function": "print_timings",
  "line": 313,
  "msg": "prompt eval time     =     129.89 ms /    55 tokens (    2.36 ms per token,   423.43 tokens per second)",
  "id_slot": 0,
  "id_task": 13,
  "t_prompt_processing": 129.892,
  "n_prompt_tokens_processed": 55,
  "t_token": 2.3616727272727274,
  "n_tokens_second": 423.42869460782805
}

I'll report back if I can see any other major differences, but otherwise 👍

@jukofyork

I've used this quite a bit with the llama.cpp server now, and the only problem I've come across is that pressing the stop button doesn't actually disconnect or stop the generation. This was a problem with the Ollama server too and was fixed AFAIK:

#1166
#1170

It would be helpful if this could be added to the OpenAI API code too, as otherwise the only way currently to stop runaway LLMs is to Control-C the running server and restart it.
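
Here's a rough sketch of the kind of cancellation being asked for, against the OAI-compatible streaming endpoint: start a streamed completion, then drop the connection when the user hits stop. It assumes the backend aborts generation once it notices the client has disconnected mid-stream, which would need to be confirmed for llama.cpp's ./server; the model name is again just a placeholder.

import requests

# Start a streamed chat completion against the OAI-compatible endpoint.
resp = requests.post(
    "http://127.0.0.1:8081/v1/chat/completions",
    headers={"Authorization": "Bearer none"},
    json={
        "model": "default",  # placeholder, as above
        "messages": [{"role": "user", "content": "Write a very long story."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
)

for i, line in enumerate(resp.iter_lines()):
    if line:
        print(line.decode()[:80])  # SSE chunks look like: data: {...}
    if i >= 20:        # pretend the user pressed the stop button here
        resp.close()   # drop the connection so the server can abort generation
        break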

@jukofyork

Another thing that might be helpful would be an option to hide the "Modelfiles" and "Prompts" menu items on the left, as these can't be used with the OpenAI API and just add clutter.

@tjbck
Contributor Author

tjbck commented Apr 14, 2024

@jukofyork I'll start working on this feature after #665; we should strive to keep all the core features.

@DenisSergeevitch

Small update: the stop-generation button is still an issue.

@justinh-rahb
Collaborator

@DenisSergeevitch that is unrelated to the issue being discussed here. Let's keep discussion of the stop generation function here:

@tjbck
Contributor Author

tjbck commented Jun 13, 2024

Related: #1166

> I'm sorry, looks like it was my mistake or something with my setup (reverse proxies?) caused a problem. Can confirm that everything works as expected with the current open webui and ollama docker. :) thanks for the great software

#1568

> I've completely ditched Ollama and just moved over to the llama.cpp server and just want to say thanks as it's working really smoothly with Open-WebUI! 👍

@jukofyork @DenisSergeevitch @SN4K3D @0x7CFE

Correct me if I'm wrong, but the stop-generation button not actually stopping is only an issue when running LLMs with Ollama on CPU only, and the vast majority of us see no issue terminating a response with the stop button. Could anyone confirm this with the latest version? I appreciate it!

@SN4K3D

SN4K3D commented Jul 24, 2024

I can confirm the issue was with Ollama running LLMs on CPU only. Today I tried the latest version and the stop button works; it stops all the threads Ollama launches.
Thanks, everyone, for your work, it's appreciated.
