
feat: User can use Stop Words presets and add custom #3536

Open
2 of 3 tasks
Tracked by #3025
dan-homebrew opened this issue Sep 3, 2024 · 2 comments
Labels: category: model settings (Inference params, presets, templates) · category: threads & chat (Threads & chat UI UX issues) · P2: nice to have (Nice to have feature) · type: feature request (A new feature)

Comments


dan-homebrew commented Sep 3, 2024

Problem

  • Many users want to enter multiple stop words.
  • The "stop" section for configuring the model does not make it clear whether it should be a comma-separated list, a newline-separated list, or something else.
  • It would be nifty (and, I think, consistent with how other apps do it) if each token you insert became a standalone entry in the box, for example: <endofstring>, <new_sentence>, <this_is_the_end>.
  • You could then click to delete any individual "stop" tag you created. Hopefully that is clear.
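A minimal sketch of the pill behaviour described above, written in Python purely for illustration (Jan's actual UI code is not part of this issue). The helper names `add_stop_word` and `remove_stop_word` are hypothetical: each committed entry becomes one tag, empty and duplicate entries are ignored, and deleting removes exactly one tag.

```python
def add_stop_word(tags: list[str], entry: str) -> list[str]:
    """Commit one typed entry as a standalone tag (pill).

    Entries are kept verbatim: whitespace can be significant in a stop
    sequence (e.g. a newline), so nothing is trimmed.
    """
    # Ignore empty input and entries that are already present.
    if not entry or entry in tags:
        return list(tags)
    return [*tags, entry]


def remove_stop_word(tags: list[str], token: str) -> list[str]:
    """Click-to-delete: drop a single tag and leave the others untouched."""
    return [t for t in tags if t != token]


tags: list[str] = []
for typed in ["<endofstring>", "<new_sentence>", "<this_is_the_end>", "<endofstring>", ""]:
    tags = add_stop_word(tags, typed)
print(tags)  # ['<endofstring>', '<new_sentence>', '<this_is_the_end>']

tags = remove_stop_word(tags, "<new_sentence>")
print(tags)  # ['<endofstring>', '<this_is_the_end>']
```

Committing one entry at a time also sidesteps the comma-vs-newline ambiguity, since a comma or a newline can then be a stop word in its own right.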

Solution

From a technical standpoint, there are 2 cases:

  • Models from the cortexso source: the stop tokens are included in the <model_id>.yaml file, so the Jan app can simply read them from there and use them as the default.
  • Models from other sources: when cortex.cpp reads a model from a GGUF file, that file also contains the stop word list; cortex.cpp reads it and writes it to <model_id>.yaml, and the Jan app can then read <model_id>.yaml to set the default.
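In both cases the app ends up reading the same file. A minimal sketch, assuming `<model_id>.yaml` has already been parsed into a dict (for example by a YAML library) and that the list lives under a key named `stop`; the key name and the `default_stop_words` helper are assumptions for illustration.

```python
from typing import Any


def default_stop_words(model_config: dict[str, Any]) -> list[str]:
    """Return the default stop words from a parsed <model_id>.yaml mapping.

    Assumes the list is stored under a key named "stop" (an assumption).
    """
    # A missing key or an explicit null both mean "no defaults".
    stop = model_config.get("stop") or []
    # Be lenient: accept a single string as well as a list of strings.
    if isinstance(stop, str):
        return [stop]
    return [s for s in stop if isinstance(s, str) and s]


config = {"stop": ["<|eot_id|>", "<|end_of_text|>"]}  # e.g. already parsed from YAML
print(default_stop_words(config))  # ['<|eot_id|>', '<|end_of_text|>']
```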

Format

Most models define stop tokens in the format <{content}>, where the content differs for each model architecture. So I think we can predefine a map from model architecture to a list of stop words for users to choose from, like this:

  • llama3: ["<|eot_id|>", "<|end_of_text|>"]
  • mistral: ["</s>"]
  • ...

Presets for 5 or 6 popular model architectures are probably enough, plus another option that lets users enter whatever they want (this may only be for power users or developers, since most users will probably stick with the default configuration).
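A sketch of the proposed architecture-to-preset map, merged with the YAML defaults and any custom entries. `ARCH_PRESETS` and `resolve_stop_words` are hypothetical names, and only the two architectures listed above are filled in. The merge order (defaults, then preset, then custom, duplicates dropped) is one reasonable choice, not a decided behaviour.

```python
from typing import Iterable, Optional

# Hypothetical preset table: architecture name -> stop words.
# Only the two architectures named in this issue are included.
ARCH_PRESETS: dict[str, list[str]] = {
    "llama3": ["<|eot_id|>", "<|end_of_text|>"],
    "mistral": ["</s>"],
}


def resolve_stop_words(
    defaults: Iterable[str],
    preset_arch: Optional[str] = None,
    custom: Iterable[str] = (),
) -> list[str]:
    """Merge YAML defaults, an optional architecture preset and custom entries.

    Order is preserved (defaults first); duplicates and empty strings are dropped.
    An unknown architecture simply contributes no preset entries.
    """
    preset = ARCH_PRESETS.get(preset_arch, []) if preset_arch else []
    merged: list[str] = []
    for token in [*defaults, *preset, *custom]:
        if token and token not in merged:
            merged.append(token)
    return merged


print(resolve_stop_words(["<|eot_id|>"], "llama3", ["<this_is_the_end>"]))
# ['<|eot_id|>', '<|end_of_text|>', '<this_is_the_end>']
```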

Design

Figma: https://www.figma.com/design/DYfpMhf8qiSReKvYooBgDV/Jan-App-(3rd-version)?node-id=8281-97234&t=qb7yU8r2PAayVdNW-4

  • Use a tag-like interface where each stop word is in its own removable "pill"
  • Users can add new tags and remove existing ones

Default Stop Words:

  • These come from the model's YAML file
  • Should be visually distinct, and users should be advised not to remove them (model creators usually choose them carefully for optimal performance, so removing them could degrade the model's behavior or output quality).
  • Users should understand these are recommended for the model
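To keep default stop words visually distinct while still letting users remove them, each tag can carry its origin. A minimal sketch with hypothetical names (`StopWord`, `remove_tag`): removal is allowed, but the caller learns when a model default was removed so the UI can show a warning.

```python
from dataclasses import dataclass
from typing import Literal

Source = Literal["default", "custom"]


@dataclass(frozen=True)
class StopWord:
    text: str
    # "default" = read from <model_id>.yaml; "custom" = added by the user.
    source: Source


def remove_tag(tags: list[StopWord], text: str) -> tuple[list[StopWord], bool]:
    """Remove the tag with this text.

    Returns the remaining tags and whether a model default was removed,
    so the UI can warn that output quality may be affected.
    """
    removed_default = any(t.text == text and t.source == "default" for t in tags)
    remaining = [t for t in tags if t.text != text]
    return remaining, removed_default


tags = [StopWord("<|eot_id|>", "default"), StopWord("<this_is_the_end>", "custom")]
remaining, warn = remove_tag(tags, "<|eot_id|>")
print([t.text for t in remaining], warn)  # ['<this_is_the_end>'] True
```

Flagging rather than blocking matches "recommended not to be removed" instead of locking the defaults outright.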


Predefined Options:

Offer a preset dropdown or quick-select of common stop words based on the model architecture


Custom/User-Added Stop Words:

  • Added by the user and removable


Task

dan-homebrew changed the title "Stop Word Settings" to "ux: Stop Word Settings" on Sep 3, 2024
dan-homebrew added the needs verification label (Needs to be verified, unsure if true) on Sep 3, 2024

dan-homebrew commented Sep 3, 2024

@nguyenhoangthuan99 However, some clarifications are needed from the Inference team. What is the stop token format?

  • I think the existing "Stop word" (as per the UX above) is incorrect
  • We need to be technically accurate
  • Should we prefill the <> for the user? For example:
      <|special_token|>
      <|end_of_text|>
      <|eom_id|>

dan-homebrew changed the title "ux: Stop Word Settings" to "feat: Stop Word Settings" on Sep 3, 2024
imtuyethan added the needs designs label on Sep 3, 2024
imtuyethan self-assigned this on Sep 3, 2024
nguyenhoangthuan99 commented Sep 4, 2024

A model's stop words can be a list, so I think the setting should accept a list of entries.

From a technical standpoint, I think there are 2 cases we can follow:

  • Models from the cortexso source: the stop tokens are included in the <model_id>.yaml file, so the Jan app can simply read them from there and use them as the default.
  • Models from other sources: when cortex.cpp reads a model from a GGUF file, that file also contains the stop word list; cortex.cpp reads it and writes it to <model_id>.yaml, and the Jan app can then read <model_id>.yaml to set the default.

Format
Most models define stop tokens in the format <{content}>, where the content differs for each model architecture. So I think we can predefine a map from model architecture to a list of stop words for users to choose from, like this:

  • llama3: ["<|eot_id|>", "<|end_of_text|>"]
  • mistral: ["</s>"]
  • ...

Presets for 5 or 6 popular model architectures are probably enough, plus another option that lets users enter whatever they want (this may only be for power users or developers, since most users will probably stick with the default configuration).

imtuyethan removed the needs designs and needs verification labels on Sep 5, 2024
dan-homebrew changed the title "feat: Stop Word Settings" to "feat: User can use Stop Words presets and add custom" on Sep 11, 2024
0xSage added the category: threads & chat, P2: nice to have, type: feature request, and category: model settings labels and removed the category: engines label on Oct 14, 2024
Projects
Status: Scheduled
Development

No branches or pull requests

5 participants