
feat: direct llama.cpp integration #1483

Open
tjbck opened this issue Apr 10, 2024 · 8 comments

@tjbck
Contributor

tjbck commented Apr 10, 2024

No description provided.

@jukofyork

jukofyork commented Apr 10, 2024

Just a quick follow-up to say it seems to work fine:

  • I had to change the llama.cpp server port to 8081 so it doesn't clash with OpenWebUI (e.g. ./server --port 8081 ...).
  • Then, in the OpenWebUI settings, I set the OpenAI API base URL to http://127.0.0.1:8081/v1 and set the API Key to something non-blank (e.g. none).

With that, it seems to be calling the OAI-like API endpoint on the llama.cpp server fine (a quick sanity check is sketched below). It wasn't obvious that I needed to append /v1 to the URL and make sure the API Key wasn't blank, though; I had to work that out by trial and error.
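
Here's a minimal sketch of how the endpoint can be sanity-checked from Python. The "default" model name is just a placeholder (the llama.cpp server should answer with whichever model it has loaded), and "none" is the dummy non-blank API key mentioned above.

import requests

BASE_URL = "http://127.0.0.1:8081/v1"   # note the /v1 suffix
API_KEY = "none"                        # any non-blank placeholder

# Ask the llama.cpp server's OpenAI-compatible chat endpoint for a short reply.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "default",  # placeholder; the server uses the model it loaded
        "messages": [{"role": "user", "content": "Say hi in one word."}],
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

If this prints a reply, the same base URL and API key should work from the OpenWebUI settings.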

The only difference I can see is that there is no little "information" icon like there was with Ollama models, even though the OAI-like API endpoint is clearly being called and the llama.cpp server logs stats like these:

{
  "tid": "140627543928832",
  "timestamp": 1712766280,
  "level": "INFO",
  "function": "print_timings",
  "line": 313,
  "msg": "prompt eval time     =     129.89 ms /    55 tokens (    2.36 ms per token,   423.43 tokens per second)",
  "id_slot": 0,
  "id_task": 13,
  "t_prompt_processing": 129.892,
  "n_prompt_tokens_processed": 55,
  "t_token": 2.3616727272727274,
  "n_tokens_second": 423.42869460782805
}

I'll report back if I can see any other major differences, but otherwise 👍

@jukofyork

I've used this quite a bit with the llama.cpp server now, and the only problem I've come across is that pressing the stop button doesn't actually disconnect or stop the generation. This was a problem with the Ollama server too and was fixed AFAIK:

#1166
#1170

It would be helpful if this could be added to the OpenAI API code too, as otherwise the only way currently to stop runaway LLMs is to Control-C the running server and restart it.
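
Here's a rough sketch of the kind of cancellation being asked for, against the OAI-compatible streaming endpoint: start a streamed completion, then drop the connection when the user hits stop. It assumes the backend aborts generation once it notices the client has disconnected mid-stream, which would need to be confirmed for llama.cpp's ./server; the model name is again just a placeholder.

import requests

# Start a streamed chat completion against the OAI-compatible endpoint.
resp = requests.post(
    "http://127.0.0.1:8081/v1/chat/completions",
    headers={"Authorization": "Bearer none"},
    json={
        "model": "default",  # placeholder, as above
        "messages": [{"role": "user", "content": "Write a very long story."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
)

for i, line in enumerate(resp.iter_lines()):
    if line:
        print(line.decode()[:80])  # SSE chunks look like: data: {...}
    if i >= 20:        # pretend the user pressed the stop button here
        resp.close()   # drop the connection so the server can abort generation
        break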

@jukofyork

Another thing that might be helpful would be an option to hide the "Modelfiles" and "Prompts" menu items on the left, as these can't be used with the OpenAI API and just add clutter.

@tjbck
Contributor Author

tjbck commented Apr 14, 2024

@jukofyork I'll start working on this feature after #665; we should strive to keep all the core features.

@DenisSergeevitch

Small update: the stop-generation button is still an issue.

@justinh-rahb
Collaborator

@DenisSergeevitch that is unrelated to the issue being discussed here. Let's keep discussion of the stop generation function here:

@tjbck
Contributor Author

tjbck commented Jun 13, 2024

Related: #1166

> I'm sorry, looks like it was my mistake or something with my setup (reverse proxies?) caused a problem. Can confirm that everything works as expected with the current open webui and ollama docker. :) thanks for the great software

#1568

> I've completely ditched Ollama and just moved over to the llama.cpp server and just want to say thanks as it's working really smoothly with Open-WebUI! 👍

@jukofyork @DenisSergeevitch @SN4K3D @0x7CFE

Correct me if I'm wrong, but the stop-generation button not actually stopping is only an issue when running LLMs with Ollama on CPU only, and the vast majority of us see no issue terminating a response with the stop button. Could anyone confirm this with the latest version? I appreciate it!

@SN4K3D

SN4K3D commented Jul 24, 2024

I can confirm the issue was with Ollama running LLMs on CPU only. Today I tried the latest version and the stop button works; it stops all the threads Ollama launches.
Thanks, everyone, for your work, it's appreciated.
