feat: smart context length management #1268
Comments
Great idea. I think it would be beneficial to cache this litellm file anyway, since it contains useful information including max_tokens. While it may not be useful for local Ollama models, the Ollama Modelfile syntax supports the num_ctx parameter, which can be queried via the API. A good strategy may be to leverage the litellm JSON data for external models like OpenAI, and presume that every Ollama model uses the Ollama default context length. It is still beneficial to retain configurability for cases where you don't require the maximum context, or where the information is absent.
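A minimal sketch of that lookup strategy, assuming the litellm metadata is the model_prices_and_context_window.json file from the BerriAI/litellm repository and that entries expose max_tokens / max_input_tokens fields (treat the URL, field names, and fallback value as assumptions):

```python
import json
import urllib.request

# Assumed location of litellm's model metadata; in practice this should be
# cached locally rather than fetched on every request.
LITELLM_URL = "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"

# Hypothetical fallback for models not listed (e.g. local Ollama models).
DEFAULT_CONTEXT_LENGTH = 2048


def load_litellm_metadata() -> dict:
    """Download and parse the litellm model metadata JSON."""
    with urllib.request.urlopen(LITELLM_URL) as resp:
        return json.loads(resp.read())


def get_context_length(model: str, metadata: dict, override: int | None = None) -> int:
    """Resolve a model's context length: user override > litellm data > default."""
    if override is not None:  # keep configurability, as suggested above
        return override
    entry = metadata.get(model, {})
    # litellm entries commonly carry "max_input_tokens" / "max_tokens" (assumed keys).
    return entry.get("max_input_tokens") or entry.get("max_tokens") or DEFAULT_CONTEXT_LENGTH
```

An Ollama-specific path could instead read num_ctx from the model's parameters via the API, with the same user override taking precedence.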
Let me add some more information to this. It is very much needed if you use APIs, and not only the OpenAI API but also others like OpenRouter or Infermatic. Some model endpoints simply fail when you exceed the context length (returning error 400), while others incur massive cost for the user (cost grows with context size, so truncation might be a good option in those cases). Unfortunately, there is no standardized tokenization endpoint defined in the OpenAI-compatible API. OpenAI recommends using https://github.com/openai/tiktoken on the client side. As a workaround I use AutoTokenizer from Transformers:

```python
def get_token_count(self, prompt: str, raw: bool = False) -> int:
    """
    Get the token count of the given prompt.
    If raw is True then we don't count BOS and EOS tokens.
    """
    if self.tokenizer is None:
        raise ValueError("Tokenizer is not selected")
    return len(self.tokenizer.encode(prompt, add_special_tokens=False)) if raw else len(self.tokenizer.encode(prompt))
```

I made it generic because sometimes I don't want the BOS/EOS tokens counted (HF tokenizers add BOS by default, but not EOS). There are a few issues here; for one, the model names reported by the API providers often don't match the Hugging Face repositories of their tokenizers, so a mapping is needed:
```python
TOKENIZER_MAP = [
# TotalGPT/Infermatic.ai models
("Midnight-Miqu-70B-v1.5", "sophosympatheia/Midnight-Miqu-70B-v1.5"),
("CodeLlama-13b-Instruct-hf", "codellama/CodeLlama-13b-Instruct-hf"),
("MiquMaid-v3-70B", "NeverSleep/MiquMaid-v3-70B"),
("UNA-SimpleSmaug-34b-v1beta", "fblgit/UNA-SimpleSmaug-34b-v1beta"),
("L3-MS-Astoria-70b", "Steelskull/L3-MS-Astoria-70b"),
("Mixtral-8x7B-Instruct-v0.1", "mistralai/Mixtral-8x7B-Instruct-v0.1"),
("miquliz-120b-v2.0", "wolfram/miquliz-120b-v2.0"),
("Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss", "NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss"),
("Smaug-Llama-3-70B-Instruct", "abacusai/Smaug-Llama-3-70B-Instruct"),
# OpenRouter models
("mistralai/mistral-7b-instruct:free", "mistralai/Mistral-7B-Instruct-v0.1"),
("alpindale/goliath-120b", "alpindale/goliath-120b"),
("sao10k/fimbulvetr-11b-v2", "Sao10K/Fimbulvetr-11B-v2"),
("cognitivecomputations/dolphin-mixtral-8x7b", "cognitivecomputations/dolphin-2.6-mixtral-8x7b"),
("cohere/command-r", "CohereForAI/c4ai-command-r-v01"),
("cohere/command-r-plus", "CohereForAI/c4ai-command-r-plus"),
("meta-llama/llama-3-70b-instruct", "Undi95/Meta-Llama-3-8B-hf"),
("neversleep/llama-3-lumimaid-70b", "NeverSleep/Llama-3-Lumimaid-70B-v0.1"),
# Generic fallback guesses
("8x22B", "mistralai/Mixtral-8x22B-v0.1"),
("llama-3", "Undi95/Meta-Llama-3-8B-hf"),
("l3", "Undi95/Meta-Llama-3-8B-hf")
]
```

Also, feel free to use the above mapping as a starting point. As a last-effort fallback I simply use the gpt2 tokenizer.
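A minimal sketch of how the above mapping could be wired up with transformers.AutoTokenizer, falling back to gpt2 when nothing matches; the resolve_tokenizer helper and its case-insensitive substring matching are illustrative assumptions, not the exact implementation:

```python
from transformers import AutoTokenizer


def resolve_tokenizer(model_name: str):
    """Pick a Hugging Face tokenizer for an API model name using TOKENIZER_MAP."""
    for pattern, hf_repo in TOKENIZER_MAP:
        # Earlier entries take precedence; the generic guesses sit at the end.
        if pattern.lower() in model_name.lower():
            return AutoTokenizer.from_pretrained(hf_repo)
    # Last-effort fallback: gpt2 gives a rough (not exact) token count.
    return AutoTokenizer.from_pretrained("gpt2")


# Example usage with a model name as reported by the API provider:
tokenizer = resolve_tokenizer("sao10k/fimbulvetr-11b-v2")
print(len(tokenizer.encode("Hello, world!")))
```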
The Filter function from #3247 will resolve this. You can essentially write your own custom middleware and install it with Functions.
See https://openwebui.com/f/hub/context_clip_filter. Feedback wanted here!
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
e.g. if messages.length > 10, slice the messages array down to the most recent ones.
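A rough sketch of that heuristic, assuming the cutoff of 10 from the comment above and that the system message should be preserved (both assumptions; in Open WebUI this logic would live inside a filter):

```python
MAX_MESSAGES = 10  # illustrative cutoff taken from the comment above


def clip_messages(messages: list[dict]) -> list[dict]:
    """Keep the system prompt (if any) plus the most recent chat messages."""
    if len(messages) <= MAX_MESSAGES:
        return messages
    system = [m for m in messages if m.get("role") == "system"]
    chat = [m for m in messages if m.get("role") != "system"]
    keep = max(MAX_MESSAGES - len(system), 1)
    return system + chat[-keep:]
```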