
Feature Request: Maintain Context Length for Title Generation Using Same Model for Chat #3106

Closed
chrisoutwright opened this issue Jun 12, 2024 · 5 comments

Comments

@chrisoutwright

Is your feature request related to a problem? Please describe.
The current implementation of title generation in Open WebUI does not maintain the chat's context length (unless it is kept at the default, or the model definition already contains a num_ctx that overrides the default), which causes the model to be reloaded every time a title is generated. This leads to performance issues and inconvenience: when a chat is started, the model is reloaded with a different context length to generate the title, and then loaded yet again to continue the chat.
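
To illustrate the reload behaviour, here is a minimal sketch (not Open WebUI code) of the kind of requests involved, assuming a local Ollama server on the default port and a pulled llama3 model; a chat request with an overridden num_ctx followed by a title request without the override forces the model to be reloaded twice:

import requests

OLLAMA_URL = "http://localhost:11434"

def chat(prompt, num_ctx=None):
    # Plain Ollama /api/chat request; num_ctx is only sent when overridden.
    body = {
        "model": "llama3",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    if num_ctx is not None:
        body["options"] = {"num_ctx": num_ctx}
    return requests.post(f"{OLLAMA_URL}/api/chat", json=body).json()

chat("Hello", num_ctx=8192)               # chat turn: model loads with num_ctx 8192
chat("Generate a title: Hello")           # title request: default 2048 -> model reloads
chat("Follow-up question", num_ctx=8192)  # next chat turn: reloads again with 8192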

Describe the solution you'd like
I propose enhancing the generate_openai_chat_completion function to maintain the context length during title generation when starting a chat. The function should check whether the current model has been used before and, if that model is also the one configured for title generation, reuse the previously stored context length so the model does not have to be reloaded in the Ollama backend. This can be done safely: if the default num_ctx has not been changed in the general settings, no override is sent at all; otherwise, the same context length defined there is reused.
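
As a rough, hypothetical sketch (the helper names below are made up; only generate_openai_chat_completion is taken from the actual codebase), the backend could remember the num_ctx last sent per model and reuse it for the title request:

# Hypothetical sketch, not actual Open WebUI code.
_last_num_ctx = {}  # model id -> num_ctx last sent by a chat completion request

def remember_num_ctx(model_id, options):
    # Called from the chat path (e.g. inside generate_openai_chat_completion)
    # whenever a request carries an explicit num_ctx override.
    if options and options.get("num_ctx"):
        _last_num_ctx[model_id] = options["num_ctx"]

def title_generation_options(model_id):
    # Reuse the chat's context length if one was recorded; otherwise send no
    # override at all, so the Ollama backend keeps the model as already loaded.
    num_ctx = _last_num_ctx.get(model_id)
    return {"num_ctx": num_ctx} if num_ctx else {}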

Describe alternatives you've considered
An alternative would be to modify the generate_title function to accept a separate context-length parameter. However, this approach would require changes in other parts of the codebase and would also force users to manually keep that context length in sync with the num_ctx the backend has loaded for the model.

Additional context
This feature request is relevant for Open WebUI users who frequently generate titles during chat initiation using LLMs. By addressing the current issue with maintaining context length, we can improve overall performance and user experience in this specific use case of using a model for chat and title generation in tandem.

@chrisoutwright
Author

The current workaround is to not set the context length in Open WebUI via parameters (i.e. leave it at the default) and instead specify it in the model's blob config file. This has a limitation, though: as soon as the default context length is overridden via Open WebUI parameters, the value in the config file is no longer honored, and title generation falls back to 2048 even though a larger value is set in the Open WebUI parameters.

@boshk0

boshk0 commented Jun 19, 2024

I found a workaround! :) You need to create your own modelfile via Admin -> Models and then use it for both chat and title generation:

Deepseek-v2-4096

FROM deepseek-v2:16b
SYSTEM """Answer in English"""
PARAMETER num_ctx 4096
PARAMETER temperature 0.0
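
(Side note, not from this thread: if you prefer the CLI over the Admin -> Models page, the same variant can presumably be created directly in Ollama with ollama create deepseek-v2-4096 -f ./Modelfile, where the Modelfile contains the four lines above; the model name is just an example. You would then still select it for chat and title generation as described.)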

@derpyhue

derpyhue commented Jul 4, 2024

It is also possible to disable Title Auto-Generation in the settings menu.
It will set your first prompt as the chat name.

[screenshot of the settings menu]

@chrisoutwright
Author

I found a workaround! :) You need to create your own modelfile using the Admin -> Models and then use it for chat and Title generation:

Deepseek-v2-4096

FROM deepseek-v2:16b
SYSTEM """Answer in English"""
PARAMETER num_ctx 4096
PARAMETER temperature 0.0

Did not work for me. It only works when num_ctx is set in the original model's blob file; for a derived one it still used 2000.
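
(One way to check what the derived model actually carries, not mentioned in this thread, would be to run ollama show --modelfile deepseek-v2-4096 on the Ollama side; if the PARAMETER num_ctx line does not appear there, the derived model was likely never created on the Ollama side with that parameter.)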

@jessalfredsen

I found a workaround! :) You need to create your own modelfile using the Admin -> Models and then use it for chat and Title generation:

Deepseek-v2-4096

FROM deepseek-v2:16b
SYSTEM """Answer in English"""
PARAMETER num_ctx 4096
PARAMETER temperature 0.0

Worked for me.
I copied the original Llama3.1 model file from Ollama, modified it, and used Admin -> Models to create a 'new' version of the original llama3.1.
Seems to have worked: the endless model reloading has stopped. Thx @boshk0

@tjbck tjbck closed this as completed Sep 21, 2024