Feature Request: Maintain Context Length for Title Generation Using Same Model for Chat #3106
Comments
The current workaround is to leave the context length undefined in Open WebUI's parameters (keeping the default) and instead specify it in the model's blob config file. This approach has a limitation: if the default context length is overridden via Open WebUI parameters, the value in the config file is no longer observed, and title generation falls back to a context length of 2048 regardless of what was set in Open WebUI.
I found a workaround! :) Create your own modelfile under Admin -> Models and then use it for both chat and title generation: Deepseek-v2-4096 FROM deepseek-v2:16b
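For reference, the modelfile described in the comment above would look roughly like this (a sketch based on the name and base model quoted there; the `PARAMETER num_ctx` line is an assumption about what the commenter set, chosen to match the `-4096` suffix, so that chat and title generation both load the model with the same context length):

```
# Hypothetical modelfile for "Deepseek-v2-4096"
FROM deepseek-v2:16b
PARAMETER num_ctx 4096
```

Baking `num_ctx` into the model itself means Ollama sees the same effective context length for every request against that model, so no reload is triggered between title generation and chat.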
Did not work for me. It only works when num_ctx is set in the original model's blob file; for a derived model it still used the 2048 default.
Worked for me. |
Is your feature request related to a problem? Please describe.
The current implementation of the title generation functionality in Open WebUI does not maintain the chat's context length (unless it is kept at the default, or the model definition already contains a num_ctx that overrides the default). As a result, the model is reloaded every time a title is generated at chat initiation: once with the title generation's context length, and then again with the chat's context length to continue the conversation. This causes performance issues and inconvenience for users.
Describe the solution you'd like
I propose enhancing the generate_openai_chat_completion function to maintain the context length during title generation when starting a chat. The function should check whether the current model has been used before and, if the same model is configured for title generation, reuse the previously stored context length to avoid reloading the model in the Ollama backend. This can be done safely: if the default num_ctx has not been changed via the general settings, no override is sent in the request; otherwise, the same context length defined there is reused.
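The proposed check can be sketched as a small helper (hypothetical; the function name, the shape of the stored chat options, and the 2048 default are assumptions for illustration, not Open WebUI's actual API):

```python
def title_generation_options(chat_options: dict, default_num_ctx: int = 2048) -> dict:
    """Build the Ollama request options for title generation so the
    model is not reloaded with a different context length.

    chat_options: the options the ongoing chat was started with
                  (e.g. {"num_ctx": 8192}).
    Returns an options dict to send with the title-generation request.
    """
    num_ctx = chat_options.get("num_ctx")
    if num_ctx is None or num_ctx == default_num_ctx:
        # Default was never overridden: send no num_ctx at all, so the
        # backend keeps whatever it already has loaded.
        return {}
    # Reuse the chat's context length, avoiding a reload in Ollama.
    return {"num_ctx": num_ctx}
```

The key design point is the first branch: sending no override at all when the default is unchanged, rather than sending an explicit 2048, keeps the behavior identical for users who never touch the setting.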
Describe alternatives you've considered
An alternative solution would be to modify the generate_title function to accept a separate context-length parameter. However, this would require changes in other parts of the codebase, and it would also require users to manually keep that context length in sync with the num_ctx the backend has loaded for the model.
Additional context
This feature request is relevant to Open WebUI users who use the same model for chat and title generation. Maintaining the context length across both would remove the redundant model reloads and improve performance and user experience for this common setup.