Model Wishlist #156
Please let us know what model architectures you would like to be added!
An up-to-date todo list is below. Please feel free to contribute any model; a PR without device mapping, ISQ, etc. will still be merged!

Language models
Multimodal models
Embedding models

Comments
qwen1.5-72B-Chat
llama3
@NiuBlibing, we have llama3 support ready: the README has a few examples. I will add Qwen support shortly.
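For anyone finding this later, a minimal sketch of such a run (the model ID is illustrative, and the flags follow the `mistralrs-server` usage shown elsewhere in this thread; the README has the authoritative examples):

```
./mistralrs-server --interactive-mode plain --model-id meta-llama/Meta-Llama-3-8B-Instruct
```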
@NiuBlibing, I just added Qwen2 support. Quantized Qwen2 support will be added in the next few days.
Hello!
@cargecla1, yes! It will be a great use case for ISQ.
@francis2tm, yes. I plan on supporting Llava and embedding models this week.
@NiuBlibing, you can run Qwen now with ISQ, which will quantize it.
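As a sketch of an ISQ invocation (the model ID is an example; `--isq Q4K` follows the flag usage shown later in this thread):

```
./mistralrs-server --isq Q4K --interactive-mode plain --model-id Qwen/Qwen1.5-72B-Chat
```

With ISQ, the full-precision safetensors weights are quantized in place at load time, so no pre-quantized GGUF file is needed.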
Would be nice to support at least one strong vision-language model: https://huggingface.co/openbmb/MiniCPM-V-2 or https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5, with an option to compute the visual frontend model on CPU. You might find it easier to ship the visual transformer part via ONNX.
Would love to see some DeepSeek-VL; this model is better than Llava and supports multiple images per prompt.
Also, outside the LLM world, would love to see support for https://github.com/cvg/LightGlue :) but not sure if that's possible ...
Could you add support for GGUF-quantized Phi-3-Mini to the wishlist? Currently, this fails (built from master):

```
./mistralrs-server gguf -m PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed -t microsoft/Phi-3-mini-128k-instruct -f /home/jett/Downloads/llms/Phi-3-mini-128k-instruct-q3_K_S.gguf
2024-04-29T03:08:35.180939Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: false
2024-04-29T03:08:35.180975Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-29T03:08:35.180982Z INFO mistralrs_server: Loading model `microsoft/Phi-3-mini-128k-instruct` on Cpu...
2024-04-29T03:08:35.180989Z INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-04-29T03:08:35.181017Z INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"
2024-04-29T03:08:35.181048Z INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
2024-04-29T03:08:35.181122Z INFO hf_hub: Token file not found "/home/jett/.cache/huggingface/token"
2024-04-29T03:08:35.181133Z INFO mistralrs_core::utils::tokens: Could not load token at "/home/jett/.cache/huggingface/token", using no HF token.
Error: Unknown GGUF architecture `phi3`
```
It'll be great to see WizardLM-2 and suzume. And thanks for a great tool!
Command-R and Command-R+ from Cohere would be amazing 🙏
T5
Supporting a vision language or multimodal model is very high priority right now.
I'll add this one too.
I will look into it!
Yes, absolutely, I think it should be easy. In the meantime, you can use ISQ to get the same speed; a sketch follows these replies.
Thanks! I think suzume is just a finetuned Llama, so that can be used already. I'll add WizardLM.
Yes, I'll add those.
Yes, I'll add those. T5 will be a nice smaller model.
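A minimal sketch of that ISQ workaround for Phi-3 (same assumed flag layout as the Qwen example above; the model ID is taken from the failing command):

```
./mistralrs-server --isq Q4K --interactive-mode plain --model-id microsoft/Phi-3-mini-128k-instruct
```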
@EricLBuehler Thanks for your reply, for adding my suggestion to the model wishlist, and for developing such an awesome project! It's very appreciated :)
Congrats on your great work!
It would be nice to add some embedding models like nomic-embed-text.
Hello, first of all, I want to express my appreciation for the excellent work your team has accomplished on the mistral.rs engine. It's a great project. I am currently developing a personal AI assistant using Rust, and I believe integrating additional features into your engine could significantly enhance its utility and appeal. Specifically, adding support for Whisper and incorporating Text-to-Speech (TTS) functionality, such as StyleTTS or similar technologies, would be incredibly beneficial. This would enable the engine to handle LLM inference, speech-to-text, and text-to-speech in a unified system with near-real-time performance. Implementing these features could transform the engine into a more versatile tool for developers like myself who are keen on building more integrated and efficient AI applications.
@EricLBuehler Woah, thank you so much! This will be lovely for us folks with less powerful computers or size constraints, you're awesome :)
@jett06, my pleasure! I just fixed a small bug (in case you saw the strange behavior), so it should be all ready to go now!
IBM's Granite series Code Models.
I'm working on it now: chenwanqq/candle-llava
@sammcj that would be great, I can add that.
Not sure I'm up to the task yet. However, I noticed that candle added support for quantized Qwen2; can we re-use this?
@bachp yes. If you want to add that, feel free. I can take a look in a few days.
I'd like to try mistralai/Mistral-Nemo-Instruct-2407: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407. Sounds like it has a similar architecture to Mistral 7B, so hoping it won't be too much work.
@joshpopelka20 I just merged this in #595!
@EricLBuehler thanks for adding that feature. I haven't been able to get it to run as I'm having an issue with the paged attention code. I'll add an issue to track it and give more details.
Codestral Mamba
Hello, thank you for open-sourcing this project! I would be interested in running Mistral Large Instruct 2407 GGUF. Trying to run inference on the Q5_K_S quant with mistral.rs commit 38fb942, I get:

```
MISTRALRS_DEBUG=1 ./target/release/./mistralrs-server --port 1234 --throughput gguf --quantized-model-id $D/models/ --quantized-filename Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf
2024-07-28T17:47:45.176615Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-07-28T17:47:45.176929Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-07-28T17:47:45.177583Z INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-07-28T17:47:45.180482Z INFO mistralrs_core::pipeline::paths: Loading `Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf` locally at `$D/models/Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf`
2024-07-28T17:47:45.181595Z INFO mistralrs_core::pipeline::gguf: Loading model `$D/models/` on cpu.
2024-07-28T17:47:45.266615Z INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: llama
general.basename: Mistral
general.file_type: 16
general.finetune: Instruct
general.languages: en, fr, de, es, it, pt, zh, ja, ru, ko
general.license: other
general.license.link: https://mistral.ai/licenses/MRL-0.1.md
general.license.name: mrl
general.name: Mistral Large Instruct 2407
general.quantization_version: 2
general.size_label: Large
general.type: model
general.version: 2407
llama.attention.head_count: 96
llama.attention.head_count_kv: 8
llama.attention.layer_norm_rms_epsilon: 0.00001
llama.block_count: 88
llama.context_length: 131072
llama.embedding_length: 12288
llama.feed_forward_length: 28672
llama.rope.dimension_count: 128
llama.rope.freq_base: 1000000
llama.vocab_size: 32768
quantize.imatrix.chunks_count: 148
quantize.imatrix.dataset: /training_dir/calibration_datav3.txt
quantize.imatrix.entries_count: 616
quantize.imatrix.file: /models_out/Mistral-Large-Instruct-2407-GGUF/Mistral-Large-Instruct-2407.imatrix
split.count: 3
split.no: 0
split.tensors.count: 795
2024-07-28T17:47:45.267503Z INFO mistralrs_core::pipeline::gguf: Debug is enabled, wrote the names and information about each tensor to `mistralrs_gguf_tensors.txt`.
2024-07-28T17:47:45.316860Z INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `llama`, kind: `Unigram`, num tokens: 32768, num added tokens: 0, num merges: 0, num scores: 32768
2024-07-28T17:47:45.316880Z INFO mistralrs_core::gguf::gguf_tokenizer: Tokenizer: Tokenizer(TokenizerImpl { normalizer: Some(Sequence(Sequence { normalizers: [Prepend(Prepend { prepend: "▁" }), Replace(Replace { pattern: String(" "), content: "▁", regex: SysRegex { regex: Regex { raw: 0x571b958c9200 } } })] })), pre_tokenizer: None, model: Unigram(Unigram { vocab: 32768, unk_id: Some(0), byte_fallback: true }), post_processor: None, decoder: Some(Sequence(Sequence { decoders: [Replace(Replace { pattern: String("▁"), content: " ", regex: SysRegex { regex: Regex { raw: 0x571b958c9500 } } }), ByteFallback(ByteFallback { type_: MustBe!("ByteFallback") }), Fuse(Fuse { type_: MustBe!("Fuse") }), Strip(Strip { content: ' ', start: 1, stop: 0 })] })), added_vocabulary: AddedVocabulary { added_tokens_map: {"</s>": 2, "<unk>": 0, "<s>": 1}, added_tokens_map_r: {2: AddedToken { content: "</s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, 0: AddedToken { content: "<unk>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, 1: AddedToken { content: "<s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }}, added_tokens: [], special_tokens: [AddedToken { content: "<s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, AddedToken { content: "</s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, AddedToken { content: "<unk>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }], special_tokens_set: {"<unk>", "<s>", "</s>"}, split_trie: (AhoCorasick(dfa::DFA(
D 000000: \x00-\x0E => 0
F 000016:
* 000032: \x00-\x0E => 0
matches: 1
* 000048: \x00-\x0E => 0
matches: 2
* 000064: \x00-\x0E => 0
matches: 0
>000080: \x00-\x02 => 80, \x03 => 208, \x04-\x0E => 80
000096: \x00-\x02 => 0, \x03 => 208, \x04-\x0E => 0
000112: \x00-\x02 => 80, \x03 => 208, \x04-\n => 80, \x0B => 128, \x0C-\x0E => 80
000128: \x00-\x02 => 80, \x03 => 208, \x04 => 80, \x05 => 32, \x06-\x0E => 80
000144: \x00-\x02 => 80, \x03 => 208, \x04 => 80, \x05 => 64, \x06-\x0E => 80
000160: \x00-\x02 => 80, \x03 => 208, \x04-\x08 => 80, \t => 176, \n-\x0E => 80
000176: \x00-\x02 => 80, \x03 => 208, \x04-\x06 => 80, \x07 => 192, \x08-\x0E => 80
000192: \x00-\x02 => 80, \x03 => 208, \x04 => 80, \x05 => 48, \x06-\x0E => 80
000208: \x00 => 80, \x01 => 112, \x02 => 80, \x03 => 208, \x04-\n => 80, \x0B => 144, \x0C => 80, \r => 160, \x0E => 80
match kind: LeftmostLongest
prefilter: true
state length: 14
pattern length: 3
shortest pattern length: 3
longest pattern length: 5
alphabet length: 15
stride: 16
byte classes: ByteClasses(0 => [0-46], 1 => [47], 2 => [48-59], 3 => [60], 4 => [61], 5 => [62], 6 => [63-106], 7 => [107], 8 => [108-109], 9 => [110], 10 => [111-114], 11 => [115], 12 => [116], 13 => [117], 14 => [118-255])
memory usage: 992
)
), [1, 2, 0]), split_normalized_trie: (AhoCorasick(dfa::DFA(
D 000000: \x00 => 0
F 000001:
>000002: \x00 => 2
000003: \x00 => 0
match kind: LeftmostLongest
prefilter: false
state length: 4
pattern length: 0
shortest pattern length: 18446744073709551615
longest pattern length: 0
alphabet length: 1
stride: 1
byte classes: ByteClasses(0 => [0-255])
memory usage: 16
)
), []), encode_special_tokens: false }, truncation: None, padding: None })
2024-07-28T17:47:45.318706Z INFO mistralrs_core::gguf::chat_template: Discovered and using GGUF chat template: `{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n{%- else %}\n {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n {%- endif %}\n {%- if message['role'] == 'user' %}\n {%- if loop.last and system_message is defined %}\n {{- '[INST] ' system_message '\n\n' message['content'] '[/INST]' }}\n {%- else %}\n {{- '[INST] ' message['content'] '[/INST]' }}\n {%- endif %}\n {%- elif message['role'] == 'assistant' %}\n {{- ' ' message['content'] eos_token}}\n {%- else %}\n {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n {%- endif %}\n{%- endfor %}\n`
Error: cannot find tensor info for output_norm.weight
```

Attached: mistralrs_gguf_tensors.txt
Hi guys, thanks for the awesome work. Is there any plan to support Idefics3 and InternVL2?
Hey, thanks for this awesome work, as it allows people with fewer resources to run LLMs and VLMs on their machines. Are we planning to support TTS, STT, and image generation models as well? There is a lot of buzz around FLUX.1 these days. There are also some good open-source models out there for voice cloning, etc. But once again, I must appreciate projects like these that help out the community. 🥇
@bhupesh-sf, yes, I'm planning to expand into the multimodal space with a broad variety of models. As you suggested, TTS, STT, and image generation are all on the table, as well as embedding models. @dancixx, yes, I plan to add Idefics 3 at least!
So does it support Deepseek Coder yet?
I'd appreciate it if Qwen2-VL could be considered: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
I suggest considering the addition of https://huggingface.co/openbmb/MiniCPM3-4B as well.
Pixtral! https://mistral.ai/news/pixtral-12b/
Here are two open image-text-to-image-text models (these are the only ones I know of):
Anole - a Chameleon finetune with image output
Lumina-mGPT - an independent project by Chinese researchers
The Lumina project also made many text-to-other-modes models in their Lumina-T2X subproject.
Qwen-2.5
Quick status update: We have added the FLUX and Llama 3.2 Vision models recently, with Parler TTS support coming in #791. If anyone would be able to attempt an implementation of any of the requested models, that would be incredible! My idea of the priority of adding models is:
Please feel free to revise this order!
@youcefs21 I think Pixtral would certainly be an interesting addition! I will take a look at adding that.
@ethanc8 thanks for linking the models! I think Chameleon would be a cool add; the Anole model seems very interesting.
@pigfoot We could add QwenVL-2 :)
@dancixx Idefics 3 support was recently merged into transformers, so work can begin on that too.
@bhupesh-sf I have added FLUX, with Parler TTS being implemented in #791!
@oldgithubman I just merged support for Qwen 2.5 in #805!
Move Qwen2-VL up the list, @EricLBuehler. It is super strong and super important, especially with the HTTP API server (and improved Metal support ;-)).
@ChristianWeyer that sounds good :)
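For reference, the server exposes an OpenAI-compatible HTTP API, so once a model is loaded a request along these lines should work (the port matches the `--port 1234` example earlier in this thread; the "model" field value is a placeholder):

```
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```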
@EricLBuehler Could you look at adding support for Aryn/deformable-detr-DocLayNet? I'd like to be able to segment PDFs, screenshots, and output of headless Chrome for further processing and storing in a vector DB. Ideally, I'd love to integrate mistral.rs into swiftide to pull the whole thing together locally and offline. I've opened an issue on swiftide (356) to add support for the same workflow. EDIT: There's a parked swiftide issue to add support for mistralrs (56).
Anole seems quite interesting; I would prioritize it over Pixtral as it can generate images as well.
Another interesting, highly multimodal model is Emu3-Gen, which can take in images, video, and text, and output images, video, and text; its video generation is slightly better than OpenSora, and it's also able to extend existing videos. You can see example generations on the website.
Please support the glm-4-9b-chat model: https://huggingface.co/THUDM/glm-4-9b-chat.
https://huggingface.co/jinaai/jina-embeddings-v3

```
$ mistralrs-server --isq Q4K --interactive-mode plain --model-id jinaai/jina-embeddings-v3
Error: Unsupported Huggging Face Transformers -CausalLM model class `XLMRobertaModel`. Please raise an issue.
```
Is there any way this library would support Stable Diffusion in the near future (as I saw that FLUX is already supported), with quantization and LoRA adapter capabilities?
Support for black-forest-labs/FLUX.1-Fill-dev in .uqff format.
The new QwQ model that came out a week or so ago. It's very well regarded on the LocalLLaMA subreddit (it seems to be the community's darling at the moment), and tinkering with it myself has also yielded positive results. To be able to use it in mistral.rs would be fantastic!