Display rate limits information from OpenAI headers when returning 429 from their API #3743
Comments
You're likely hitting the 30,000 tokens per minute rate limit, which will return a 429 if your input exceeds it.
I did some more testing, and I agree with this but am not sure why it's happening. If I start a new chat in owui, I don't get the 429. Why should a long chat get it? Is it sending the entire context/conversation back to OpenAI with every request?

The exercise has been to create a script for a video. I had it do research with webpages I provided and then come up with an outline. We iterated on the outline for a bit, and then I had it write a script for each section of the outline. It got through section 5 before everything failed.

The only way I can think this would happen is if each request sent back something from the requests up to that point, which makes each subsequent request longer until I pop over the 30k TPM limit on a single request. At the same time, that sort of thing seems completely illogical because it would make API requests ever larger over time within the same chat window. The request that fails simply says, "Generate the script for section 6," although it also fails after that if I say "hello." It's like the whole chat session is busted. What am I missing?
That's how LLMs work: you have to send the entire conversation every time you want the model to output something. Open WebUI currently does not truncate or summarise any previous messages. The context clip filter can be used to retain only the last n messages, and another potential strategy that hasn't been implemented is to have an LLM summarise the past messages.
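For illustration, here is a minimal sketch of what such a clip filter does. This is not Open WebUI's actual filter code; the function name and default are made up:

```python
# Hypothetical sketch of a context clip filter (not Open WebUI's actual
# implementation): keep the system prompt, drop all but the last n messages.
def clip_context(messages: list[dict], n: int = 8) -> list[dict]:
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    return system + rest[-n:]

# The payload sent to the API stays bounded no matter how long the chat gets.
# history = [...full chat...]
# payload = {"model": "gpt-4o", "messages": clip_context(history)}
```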
Hrm...ok. I was not aware of that. You can close this request if you don't think it's useful to implement. Thanks for taking the time to explain things.
Is your feature request related to a problem? Please describe.
I'm getting a 429 error when using gpt-4o, but when I query the same endpoint with the same API key, the headers show that I'm not above my rate limits at all.
Describe the solution you'd like
If open-webui receives a 429 from OpenAI, include the header information that shows the current limit status and when they reset:
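OpenAI documents these rate-limit headers on its API responses (the values below are illustrative; on a 429, a retry-after header is also typically included):

```
x-ratelimit-limit-requests: 10000
x-ratelimit-limit-tokens: 30000
x-ratelimit-remaining-requests: 9999
x-ratelimit-remaining-tokens: 29500
x-ratelimit-reset-requests: 6ms
x-ratelimit-reset-tokens: 1s
retry-after: 2
```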
Describe alternatives you've considered
Short of running my own program to extract this information, I don't have any alternative. I can't see what OWUI is sending or why it would trigger a rate limit on the OpenAI side of things. It's been returning a 429 for over an hour, so my only recourse is to go back to using the ChatGPT webui to finish my project.
Additional context
This almost feels like a bug of some kind because I should get the same error when I make a query directly with the same API key to the same endpoint. I don't. Since OWUI is returning the response code from OpenAI, it would be helpful if it logged any additional information about what it's sending and what the response headers contained. Here's the script I wrote to query the current limits:
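(The original script wasn't preserved in this thread. A minimal sketch of a script that does this, assuming a tiny chat completion against the standard endpoint and printing the documented x-ratelimit-* headers:)

```python
# Sketch of a script to query current OpenAI rate limits (not the author's
# original). Sends a minimal chat completion and prints the rate-limit
# headers from the response.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
)
print("status:", resp.status_code)
for name, value in resp.headers.items():
    if name.lower().startswith("x-ratelimit") or name.lower() == "retry-after":
        print(f"{name}: {value}")
```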