chore: Adding correct example for model parameters with examples #741
Some clarification about ctx_len and max_tokens:

Q: Why is there no ctx_len in the OpenAI API?
OpenAI has a pre-defined ctx_len for each of their models (for example, gpt-3.5-turbo has a ctx_len of 4096), and all the models are

Q: What is the difference between ctx_len and max_token?
- max_token: the maximum number of tokens you allow the model to generate during inference.
- ctx_len: the upper limit of tokens that can be processed during inference on the backend. The relationship between ctx_len and max_token: chat_token + max_token ≤ ctx_len.

Q: What will happen if I input a chat that is longer than the ctx_len (max_token + chat_token > ctx_len)?
In this scenario a context shift will happen: inference cuts the extra context that does not fit into ctx_len and continues normally, but the model may lose some memory of whatever falls outside ctx_len.

Q: What value should I set as the default param for max_token?
In practice you should set ctx_len and max_token to the same value, and it should follow the maximum token count of the model. An example can be checked at https://huggingface.co/TheBloke/neural-chat-7B-v3-1-AWQ. So normally, just set both values to whatever is specified on the page where you downloaded the model.
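To make the context-shift rule concrete, here is a minimal sketch in Python. `apply_context_shift`, `CTX_LEN`, and `MAX_TOKEN` are hypothetical names for illustration only, not part of Jan's or Nitro's actual implementation:

```python
# Illustrative sketch of the context-shift behaviour described above.
# All names and numbers here are assumptions for the example, not Jan's
# actual implementation.
from typing import List

CTX_LEN = 4096   # upper limit of tokens the backend processes per inference
MAX_TOKEN = 512  # tokens the model is allowed to generate

def apply_context_shift(chat_tokens: List[int],
                        max_token: int = MAX_TOKEN,
                        ctx_len: int = CTX_LEN) -> List[int]:
    """Trim the chat so that chat_token + max_token <= ctx_len holds."""
    budget = ctx_len - max_token       # room left for the chat itself
    if len(chat_tokens) <= budget:
        return chat_tokens             # everything fits, no shift needed
    # Context shift: drop the oldest tokens; the model "loses memory" of
    # whatever falls outside ctx_len.
    return chat_tokens[-budget:]
```

For example, with ctx_len = 4096 and max_token = 512, a chat of 4000 tokens would be trimmed to its most recent 3584 tokens before inference.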
Problem
We do not have a clear example on the Jan docs page right now.
Success Criteria
A simple example of one model-loading case.
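As a starting point, here is a sketch of what a single model-loading example could look like, assuming a Nitro-style local server. The URLs, port, and payload fields below are assumptions for illustration, not the verified Jan API:

```python
# Hypothetical docs example: load one model and chat with it, with ctx_len
# and max_tokens set to the same value, as recommended above. The endpoint
# paths, port, and field names are assumptions modelled on a Nitro-style
# local server, not verified against the current Jan docs.
import requests

BASE = "http://localhost:3928"  # assumed local server address

# Load the model with an explicit context length taken from the model card.
requests.post(f"{BASE}/inferences/llamacpp/loadmodel", json={
    "llama_model_path": "/models/neural-chat-7b-v3-1.Q4_K_M.gguf",  # example path
    "ctx_len": 4096,  # follow the maximum tokens listed where you got the model
})

# Chat with the loaded model; max_tokens matches ctx_len per the guidance above.
resp = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 4096,
})
print(resp.json())
```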