Insights: EricLBuehler/mistral.rs
Overview
1 Release published by 1 person

- v0.3.4, published Nov 28, 2024
18 Pull requests merged by 4 people

- Bitsandbytes quantization: loading and kernels (#967, merged Dec 4, 2024)
- Fix example gguf_locally to match chat template requirements (#966, merged Dec 3, 2024)
- Fix Metal FP8 quantization (#962, merged Dec 2, 2024)
- Use llguidance library for constraints (including JSON schemas) (#899, merged Dec 2, 2024)
- Improve test speeds on Windows (#961, merged Dec 1, 2024)
- Ensure support for CUDA cc 5.3 (#960, merged Dec 1, 2024)
- Fix completion API behavior of `best_of` (#959, merged Dec 1, 2024)
- Fix `append_sliding_window` (#958, merged Dec 1, 2024)
- Set minimum rustc version to 1.82 (#957, merged Dec 1, 2024)
- Perplexity calculations with imatrix (#952, merged Dec 1, 2024)
- Support imatrix quantization for vision models (#950, merged Nov 30, 2024)
- Implement imatrix for ISQ (#949, merged Nov 30, 2024)
- Better diffusion interactive mode (#948, merged Nov 29, 2024)
- fix(docs): fix broken link (#945, merged Nov 29, 2024)
- Use CUDA_COMPUTE_CAP if nvidia-smi not found (#944, merged Nov 28, 2024)
- Prepare for v0.3.4 (#942, merged Nov 28, 2024)
- Expose a public tokenization API (#940, merged Nov 28, 2024)
- Implement the Idefics 3 models (Idefics 3, SmolVLM-Instruct) (#939, merged Nov 27, 2024)
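The perplexity work above (#952) refers to the standard metric: the exponential of the mean negative log-likelihood over a token sequence. The following is a minimal sketch of that formula only, not of mistral.rs's actual imatrix-aware implementation:

```rust
/// Perplexity over a sequence, given each token's natural-log probability:
/// PPL = exp(-(1/N) * sum(log p_i)).
fn perplexity(logprobs: &[f64]) -> f64 {
    let n = logprobs.len() as f64;
    (-logprobs.iter().sum::<f64>() / n).exp()
}

fn main() {
    // A model that is uniformly uncertain over 4 tokens has logprob ln(1/4)
    // per token, so its perplexity is exactly 4.
    let lp = (0.25f64).ln();
    println!("{:.1}", perplexity(&[lp; 3])); // prints "4.0"
}
```

A perplexity of 4 reads as "the model is as uncertain as a uniform choice among 4 tokens", which is why lower is better when evaluating quantization quality.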
1 Pull request opened by 1 person

- Bitsandbytes quantization ISQ support (#968, opened Dec 4, 2024)
37 Issues closed by 5 people

- Failed to parse Cargo.toml: [workspace] missing field `package` (#883, closed Dec 2, 2024)
- 0.3.2 #891 build failure on Windows 11 (#896, closed Dec 2, 2024)
- AICI -> llguidance? (#754, closed Dec 2, 2024)
- mistral-server with n>1 only returns one result (#955, closed Dec 2, 2024)
- Phi-3 outputs garbage on master (#956, closed Dec 2, 2024)
- `is_streaming: true` gives unreachable code panic (#953, closed Dec 1, 2024)
- Build failure on Ubuntu 24.04 in src/cast.cu (#951, closed Dec 1, 2024)
- Feature request: Add importance matrix / available RAM calculations to ISQ (#377, closed Nov 30, 2024)
- Container image fails to start with 'Unable to dynamically load the "cuda" shared library' (#478, closed Nov 28, 2024)
- RequestMessage with tool_calls in Assistant message (#793, closed Nov 28, 2024)
- How does M1 performance compare with llama.cpp or Ollama? (#673, closed Nov 28, 2024)
- CUDA error when running Gemma 2 (#715, closed Nov 28, 2024)
- mistral does not support NVIDIA V100 (compute_cap <= 800) (#305, closed Nov 28, 2024)
- Batched & chunked prefill (#216, closed Nov 28, 2024)
- Add a C API and provide shared and static libraries (#258, closed Nov 28, 2024)
- Enabling prefix cache for Llama 3 GGUF (#347, closed Nov 28, 2024)
- Are there any plans to support AWQ and GPTQ in the future? (#418, closed Nov 28, 2024)
- Support loading tokenizer from `sentencepiece` model (#407, closed Nov 28, 2024)
- Compilation failure with `--features flash-attn` (#485, closed Nov 28, 2024)
- Extending AnyMoE to support heterogeneous expert types (#544, closed Nov 28, 2024)
- Installation from PyPI doesn't work (#548, closed Nov 28, 2024)
- /usr/bin/ld: cannot find -lstdc : No such file or directory (#553, closed Nov 28, 2024)
- Python Metal package runs on CPU (#555, closed Nov 28, 2024)
- Streamed inference not as smooth (fast?) as with e.g. Ollama - Llama 3.1 (#630, closed Nov 28, 2024)
- Llama 3.2 on macOS: "Metal contiguous affine I64 not implemented" (#807, closed Nov 28, 2024)
- Unexpected CUDA out of memory for minimal example (#819, closed Nov 28, 2024)
- How to use a UQFF file locally without sending requests to Hugging Face? (#821, closed Nov 28, 2024)
- 0.3.1 #862 new build failure, stops at mistralrs-quant (#866, closed Nov 28, 2024)
- Docker build failure: mistralrs-quant fails with "No such file or directory" error (#893, closed Nov 28, 2024)
- Vision interactive mode for GGUF models (#882, closed Nov 28, 2024)
- CUDA out of memory with a presumed "full" offload to CPU (#751, closed Nov 28, 2024)
- cuda error not found (#654, closed Nov 28, 2024)
- Pre-built binary for macOS Silicon does not seem to use Metal / GPU (#629, closed Nov 28, 2024)
- WSL2 Docker error loading Llama 3.1 GGUF (#679, closed Nov 28, 2024)
- Slow CUDA inference speed (#763, closed Nov 28, 2024)
- CUDA_ERROR_ILLEGAL_ADDRESS when running Llama 3 and Llama 3.1 (#783, closed Nov 28, 2024)
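One of the closed items above, "Batched & chunked prefill" (#216), concerns a common inference-engine technique: instead of running one forward pass over the whole prompt, the prompt is processed in fixed-size chunks so memory stays bounded. A hedged sketch of the chunking loop only, where `process_chunk` is a hypothetical stand-in for the model's forward pass:

```rust
/// Split a prompt into fixed-size chunks and "prefill" each in turn.
/// Returns the number of tokens processed.
fn chunked_prefill(prompt_tokens: &[u32], chunk_size: usize) -> usize {
    let mut processed = 0;
    for chunk in prompt_tokens.chunks(chunk_size) {
        process_chunk(chunk); // a real engine would extend the KV cache here
        processed += chunk.len();
    }
    processed
}

/// Hypothetical placeholder for a model forward pass over one chunk.
fn process_chunk(_chunk: &[u32]) {}

fn main() {
    let tokens: Vec<u32> = (0..10).collect();
    // 10 tokens with chunk_size 4 -> chunks of 4, 4, and 2.
    println!("{}", chunked_prefill(&tokens, 4)); // prints "10"
}
```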
7 Issues opened by 5 people

- Fast-forward tokens with llguidance (#965, opened Dec 2, 2024)
- Parallel computation of mask in constrained sampling (#964, opened Dec 2, 2024)
- Rejection sampling for `top_p` etc. (#963, opened Dec 2, 2024)
- Possible problem with candle 0.8.0: doesn't build on a GTX 1650 (CC 7.5) nor a GTX 1070 (CC 6.1) (#954, opened Dec 1, 2024)
- Create and load standalone quantized UQFF models (#947, opened Nov 29, 2024)
- DiffusionArchitecture not found in Python package (#943, opened Nov 28, 2024)
- Flash Attention not building (#941, opened Nov 27, 2024)
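The sampling issues above (#963, #964) build on top-p (nucleus) truncation: keeping the smallest set of highest-probability tokens whose mass reaches p, and dropping the tail. The sketch below shows only that standard truncation step, not the rejection-sampling variant the issue proposes:

```rust
/// Standard top-p (nucleus) truncation: sort tokens by descending
/// probability and keep them until the cumulative mass reaches `p`.
fn top_p_filter(mut probs: Vec<(u32, f64)>, p: f64) -> Vec<(u32, f64)> {
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut kept = Vec::new();
    let mut cumulative = 0.0;
    for (id, prob) in probs {
        kept.push((id, prob));
        cumulative += prob;
        if cumulative >= p {
            break; // nucleus reached: the remaining tail is discarded
        }
    }
    kept
}

fn main() {
    let probs = vec![(0, 0.5), (1, 0.3), (2, 0.15), (3, 0.05)];
    // 0.5 + 0.3 = 0.8 < 0.9, then + 0.15 = 0.95 >= 0.9, so 3 tokens survive.
    println!("{}", top_p_filter(probs, 0.9).len()); // prints "3"
}
```

After truncation, the surviving probabilities are renormalized and one token is sampled from them; the rejection-sampling proposal aims to avoid the full sort.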
9 Unresolved conversations

Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

- Phi-3.5-vision-Instruct multiple images loading (#795, commented on Nov 27, 2024 • 0 new comments)
- Error: Unable to run LoRA - adapter files are empty (#929, commented on Nov 28, 2024 • 0 new comments)
- Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading utanh_bf16 (#850, commented on Nov 28, 2024 • 0 new comments)
- Integrating Mistral.rs with Swiftide (#843, commented on Nov 28, 2024 • 0 new comments)
- Multi-image and multi-prompt issue using Mistral.rs (#853, commented on Nov 30, 2024 • 0 new comments)
- Model Wishlist (#156, commented on Dec 2, 2024 • 0 new comments)
- Confusion around loading a GGUF locally (#922, commented on Dec 2, 2024 • 0 new comments)
- Couldn't run any vision model (#935, commented on Dec 2, 2024 • 0 new comments)
- Add Parler TTS (#791, commented on Dec 2, 2024 • 0 new comments)