feat: IQ quants support #2631

Closed
mr-september opened this issue Apr 5, 2024 · 8 comments

@mr-september

Problem
GGUF models quantized with IQ quants fail to load.

Success Criteria
The model loads and generates responses as usual.

Additional context
IQ quants: ggerganov/llama.cpp#4773

Example model with both traditional Q and new IQ quants: https://huggingface.co/bartowski/Starling_Monarch_Westlake_Garten-7B-v0.1-GGUF

@mr-september mr-september added the type: feature request A new feature label Apr 5, 2024
@Van-QA Van-QA added type: bug Something isn't working type: feature request A new feature and removed type: feature request A new feature type: bug Something isn't working labels Apr 6, 2024
@Van-QA
Contributor

Van-QA commented Apr 6, 2024

Hi @mr-september,

  • In my testing, the imported GGUF model (Starling_Monarch_Westlake_Garten-7B-v0.1-Q2_K) works. Can you tell us more about the issue you are facing?

Many thanks

@Van-QA Van-QA self-assigned this Apr 6, 2024
@mr-september
Author

mr-september commented Apr 6, 2024

Thanks for the quick reply. I think the model you tested uses a traditional quant; could you please check with an IQ-quant model? For example:
[screenshot]

Also, are there any logs or other output I could share to help with troubleshooting?

@Van-QA
Contributor

Van-QA commented Apr 6, 2024

Thanks, @mr-september.
We were able to reproduce the issue using Starling_Monarch_Westlake_Garten-7B-v0.1-IQ4_XS.gguf. The dev team will investigate it soon.

@Van-QA
Contributor

Van-QA commented Apr 10, 2024

Hi @mr-september,

With Jan v0.4.10-368 ✅, Starling_Monarch_Westlake_Garten-7B-v0.1-IQ4_XS.gguf now generates responses. Would you like to try it as well?

Thank you

@mr-september
Author

Beautiful, it's working flawlessly! Very impressive turnaround!

@mr-september
Author

Hi, I think the latest nightly (-376), which I installed when prompted to update at startup, broke IQ-quant support again. Rolling back to -368 still works.

@mr-september mr-september reopened this Apr 14, 2024
@Van-QA Van-QA modified the milestones: v0.4.11, v0.4.12 Apr 15, 2024
@Van-QA Van-QA mentioned this issue Apr 15, 2024
@Van-QA
Contributor

Van-QA commented Apr 16, 2024

Hi @mr-september, sorry for the inconvenience. The Nitro build that supports IQ quants is currently facing many issues, so we had to temporarily revert it. We are working on a fix.
cc: @CameronNg @vansangpfiev

@Van-QA
Contributor

Van-QA commented Apr 17, 2024

Hi @mr-september, the latest nightly build, Jan v0.4.11-386, resolves the issue with IQ quants. Thanks.
